# Scraping Steam Downloadable Content (DLC) data

1. In order to scrape DLC reviews, I first scraped information on all DLCs (i.e. app ids) on Steam.
    - To scrape, I used an open source scraper written by [Andre Perunicic](https://github.com/prncc/steam-scraper)
        - Detailed documentation of how the scraper works is provided by the author [here](https://blog.scrapinghub.com/2017/07/07/scraping-the-steam-game-store-with-scrapy/).     
2. I had to make some modifications to the scraper code because I was interested in scraping DLC reviews from the [steam store](http://store.steampowered.com/app/684630/American_Truck_Simulator__New_Mexico/), rather than the game reviews from the [community review page](http://steamcommunity.com/app/270880/reviews/) (The original scraper was only set up to do the latter)

### Handling AJAX response

<div style="text-align: left; display:inline-block"><img src="images/ajax_ex.png" style="width: 700px"/></div><br/><br/>
<h4>Because the Steam store dynamically loads user reviews, I tracked the AJAX request url using the chrome developer tool (shown below)</h4><br/><br/>
<div style="text-align: left; display:inline-block"><img src="images/chrome_dev.png" style="width: 700px"/></div><br/>

<h4>Then, I inspected the html elements to scrape certain info from the reviews (shown below)</h4><br/><br/>
<div style="text-align: left; display:inline-block"><img src="images/parse_html.png" style="width: 700px"/></div><br/><br/>

### Modifying the original code

- In order to scrape fields not already coded in the original code, I made modifications to **steam/items.py**, where the loaders are instantiated
```python
class ReviewItem(scrapy.Item()):
    # fields already set up in the original code
    # ...
    # ...
    
    # my custom fields
    user_url = scrapy.Field()
    date = scrapy.Field()
    num_reviews = scrapy.Field(
        output_processor=Compose(TakeFirst(), str_to_int)
    )
```

- Using BeautifulSoup, I grabbed all the html elements I wanted to scrape by modifying the **steam/spiders/review_spider.py** code

```python
class ReviewSpider(scrapy.Spider):
    name = 'reviews'
    
    # original code
    # ...
    # ...
    
    # function to parse page
    def parse(self, response):
        product_id = get_product_id(response)
        
        parsed_json = json.loads(response.body)
        parsed_html = BeautifulSoup(parsed_json['html'])
        
        # load all reviews from current AJAX request
        reviews = parsed_html.findAll('div', attrs={'class': 'review_box'})
        
        # if found, load each review and scrape data
        if reviews:
            for i, review in enumerate(reviews):
                yield load_review(review, product_id)
                
        # if there are more reviews waiting to be loaded
        form = parsed_html.findAll('div', attrs={"id": "LoadMoreReviewsrecent"})
        if form:
            offset = re.findall("start_offset=(\d+)", response.url)[0]
            yield self.request_next_page(response.url, offset)
```


- Because the AJAX response returns at maximum 20 reviews at a time before making another call to load the next 20, I wrote a custom function that grabs the next 20 reviews until I have scraped all the reviews (this custom function is called above inside the `if form:` block)
```python
def request_next_page(self, url, offset):
    new_offset = str(int(offset) + 20)
    request_url = url.replace(f"start_offset={offset}", f"start_offset={new_offset}")
    
    return Request(url=request_url, method="GET", callback=self.parse)
```

```python
def load_review(review, product_id):
    # original code
    # ...
    # ...
    
    # my code
    from bs4 import BeautifulSoup
    import json
    
    # user_id
    try:
        user_id = re.findall("profiles/(\d+)/", review.select(".avatar > a")[0]['href'])
        if user_id:
            loader.add_value('user_id', user_id[0])
    except:
        pass
    
    # more ...
```

### Scraping

After all the modifications made, I simply ran the scrapy spiders to do all the heavy lifting for me!
```
$ scrapy crawl <spider name> -o <output.file>
```

# Cleaning scraped DLC data

In [1]:
import pandas as pd
import numpy as np
import re

In [2]:
# load dlc data
dlcs_raw = pd.read_json('raw_data/dlcs_all.json')
dlcs_raw = dlcs_raw.drop(columns=['metascore','early_access'])
print(dlcs_raw.shape[0], dlcs_raw.shape[1])
dlcs_raw.head(100)

12841 17


Unnamed: 0,app_name,description,developer,discount_price,genres,id,parent_game,parent_game_url,price,publisher,release_date,reviews_url,sentiment,specs,tags,title,url
0,Stellar Interface - Murum Charta,"[, About This Content, ►, Murum Charta, Commun...",ImaginationOverflow,,"[Action, Casual, Indie]",777450,Stellar Interface,http://store.steampowered.com/app/517330/,Free,,2017-12-30,http://steamcommunity.com/app/777450/reviews/?...,,"[Single-player, Downloadable Content, Steam Ac...","[Action, Indie, Casual]",Stellar Interface - Murum Charta,http://store.steampowered.com/app/777450/Stell...
1,"SUPER ARMY OF TENTACLES 3, Winter Outfit Pack ...","[, About This Content, Unlocks three new ultra...",Stegalosaurus Game Development,,"[Adventure, Indie, RPG]",694780,Super Army of Tentacles 3: The Search for Army...,http://store.steampowered.com/app/592200/,0.99,,2018-01-11,http://steamcommunity.com/app/694780/reviews/?...,,"[Single-player, Downloadable Content, Steam Ac...","[Adventure, RPG, Indie]","SUPER ARMY OF TENTACLES 3, Winter Outfit Pack ...",http://store.steampowered.com/app/694780/SUPER...
2,X-Plane 11 - Add-on: Aerosoft - Airport Wilmin...,"[, About This Content, The Wilmington Internat...",Drawbridge Designs,,[Simulation],623613,X-Plane 11,http://store.steampowered.com/app/269950/,19.99,Aerosoft GmbH,2018-01-11,http://steamcommunity.com/app/623613/reviews/?...,,"[Single-player, Online Multi-Player, Local Mul...",[Simulation],X-Plane 11 - Add-on: Aerosoft - Airport Wilmin...,http://store.steampowered.com/app/623613/XPlan...
3,Train Simulator: RhB Enhancement Pack 02 Add-On,"[, About This Content, Get ready for another h...",Thomson Interactive,,[Simulation],642803,Train Simulator,http://store.steampowered.com/app/24010/,19.99,Dovetail Games - Trains,2018-01-11,http://steamcommunity.com/app/642803/reviews/?...,5 user reviews,"[Single-player, Downloadable Content, Steam Ac...",[Simulation],Train Simulator: RhB Enhancement Pack 02 Add-On,http://store.steampowered.com/app/642803/Train...
4,Waiting For the Loop Official Soundtrack and EP,"[, About This Content, With its three dark ele...",Bryan Minus,,"[Adventure, Casual, Indie]",722550,Waiting for the Loop,http://store.steampowered.com/app/717830/,1.99,Minus Equals Plus,2018-01-11,http://steamcommunity.com/app/722550/reviews/?...,,"[Single-player, Downloadable Content]","[Adventure, Indie, Casual]",Waiting For the Loop Official Soundtrack and EP,http://store.steampowered.com/app/722550/Waiti...
5,Banyu Lintar Angin - Little Storm - Deluxe Edi...,"[, About This Content, Banyu Lintar Angin is a...",Mojiken Studio,,"[Casual, Indie]",760610,Banyu Lintar Angin - Little Storm -,http://store.steampowered.com/app/744800/,2.99,Toge Productions,2017-03-03,http://steamcommunity.com/app/760610/reviews/?...,2 user reviews,"[Single-player, Downloadable Content]","[Indie, Casual]",Banyu Lintar Angin - Little Storm - Deluxe Edi...,http://store.steampowered.com/app/760610/Banyu...
6,A Raven Monologue Fan Pack,"[, About This Content, A Raven Monologue is a ...",Mojiken Studio,,"[Casual, Indie]",763280,A Raven Monologue,http://store.steampowered.com/app/744810/,2.99,Toge Productions,2018-01-11,http://steamcommunity.com/app/763280/reviews/?...,1 user reviews,"[Single-player, Downloadable Content]","[Indie, Casual]",A Raven Monologue Fan Pack,http://store.steampowered.com/app/763280/A_Rav...
7,Truth: Disorder - Soundtrack,"[, About This Content, Disorderia (Soundtracks...",JustE A,,[Indie],783750,Truth: Disorder,http://store.steampowered.com/app/755350/,0.99,JustE Publishing,2018-01-11,http://steamcommunity.com/app/783750/reviews/?...,,"[Single-player, Downloadable Content, Steam Ac...",[Indie],Truth: Disorder - Soundtrack,http://store.steampowered.com/app/783750/Truth...
8,FURIDASHI - PREMIUM CAR: 2015 STRONGER,"[, About This Content, DLC with full upgraded ...",Drift Physics Crew,1.88,"[Racing, Simulation, Sports]",756720,FURIDASHI: Drift Cyber Sport,http://store.steampowered.com/app/658570/,2.99,Drift Physics Crew,2018-01-11,http://steamcommunity.com/app/756720/reviews/?...,,"[Single-player, Multi-player, Online Multi-Pla...","[Simulation, Racing, Sports]",FURIDASHI - PREMIUM CAR: 2015 STRONGER,http://store.steampowered.com/app/756720/FURID...
9,FURIDASHI - PREMIUM CAR: 1986 AE-86S,"[, About This Content, DLC with full upgraded ...",Drift Physics Crew,1.88,"[Racing, Simulation, Sports]",756729,FURIDASHI: Drift Cyber Sport,http://store.steampowered.com/app/658570/,2.99,Drift Physics Crew,2018-01-11,http://steamcommunity.com/app/756729/reviews/?...,,"[Single-player, Multi-player, Online Multi-Pla...","[Simulation, Racing, Sports]",FURIDASHI - PREMIUM CAR: 1986 AE-86S,http://store.steampowered.com/app/756729/FURID...


In [3]:
len(dlcs_raw['developer'].unique())

2933

In [4]:
cleaned_dlcs = dlcs_raw

#### Drop rows with missing pub + dev + parent game info

In [5]:
# inspect missing publisher / developer

# Create variable with TRUE if publisher is missing
missing_pub = pd.isnull(cleaned_dlcs['publisher']) == True
# Create variable with TRUE if developer is missing
missing_dev = pd.isnull(cleaned_dlcs['developer']) == True
# Create variable with TRUE if parent game is missing
missing_parent = pd.isnull(cleaned_dlcs['parent_game_url']) == True

# Drop rows where all three fields are missing
no_pub_dev_parent = cleaned_dlcs[missing_pub & missing_dev & missing_parent]
print(len(no_pub_dev_parent))
cleaned_dlcs.drop(no_pub_dev_parent.index, inplace=True)

# Drop rows where parent game info is missing
no_parent = cleaned_dlcs[missing_parent]
print(len(no_parent))
cleaned_dlcs.drop(no_parent.index, inplace=True)

14
144


  app.launch_new_instance()


#### Drop rows with missing tags or specs (missing genres won't be dropped; it could be inferred from parent game)

In [6]:
# inspect missing specs, tags, and genres

# Create variable with TRUE if specs is missing
missing_specs = pd.isnull(cleaned_dlcs['specs']) == True
# Create variable with TRUE if tags is missing
missing_tags = pd.isnull(cleaned_dlcs['tags']) == True
# Create variable with TRUE if genres is missing
missing_genres = pd.isnull(cleaned_dlcs['genres']) == True

# Select all cases where all three fields are missing
no_spec_tag_genre = cleaned_dlcs[missing_specs & missing_tags & missing_genres]
no_specs = cleaned_dlcs[missing_specs]
no_tags = cleaned_dlcs[missing_tags]
no_genres = cleaned_dlcs[missing_genres]
#print(no_tags)
print('missing all 3: '+ str(len(no_spec_tag_genre)))
print('missing specs: '+ str(len(no_specs)))
print('missing tags: '+ str(len(no_tags)))
print('missing genres: '+ str(len(no_genres)))

cleaned_dlcs.drop(no_tags.index, inplace=True)

missing all 3: 0
missing specs: 0
missing tags: 111
missing genres: 145


#### Extract genres, tags, and specs

In [7]:
dlc_genres = {}
dlc_genres_count = {}
#unpack dlc genre and put it in dictionary
def unpack_genre(row):
    """This function unpacks the list of genres and stores it into a dictionary"""
    genres = row['genres']
    print(genres)
    try:
        for i in range(len(genres)):
            if genres[i] not in dlc_genres:
                dlc_genres[genres[i]] = 0
                dlc_genres_count[genres[i]] = 0
            else:
                dlc_genres_count[genres[i]] += 1
    except:
        pass

dlcs_raw.apply(unpack_genre, axis=1)

['Action', 'Casual', 'Indie']
['Adventure', 'Indie', 'RPG']
['Simulation']
['Simulation']
['Adventure', 'Casual', 'Indie']
['Casual', 'Indie']
['Casual', 'Indie']
['Indie']
['Racing', 'Simulation', 'Sports']
['Racing', 'Simulation', 'Sports']
['Racing', 'Simulation', 'Sports']
['Racing', 'Simulation', 'Sports']
['Racing', 'Simulation', 'Sports']
['Casual']
['Casual', 'Free to Play', 'Indie', 'Simulation', 'Sports', 'Strategy']
['Simulation']
['Action', 'Adventure', 'Indie', 'RPG']
['Simulation']
['Indie', 'Simulation']
['Indie', 'Simulation']
['Indie', 'Simulation']
['Indie', 'Simulation']
['Racing']
['Racing']
['Racing']
['Simulation']
['Simulation']
['Adventure', 'Indie', 'Strategy']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Casual', 'Simulation']
['Design &amp; Illustration', 'Web Publishing']
['Casual', 'Simulation']
['Casual', 'Sim

['Simulation']
['Strategy']
['Free to Play', 'Simulation']
['Action']
['Action']
['Action']
['Action']
['Indie']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action', 'Indie']
['Action', 'Adventure']
['Indie', 'RPG']
['Action']
['Casual', 'Free to Play', 'Indie']
['Casual', 'Free to Play', 'Indie']
['Simulation']
['Action', 'Indie']
['Adventure', 'Indie']
['Action', 'RPG']
['Adventure', 'Indie']
['Casual', 'Simulation']
['Racing', 'Sports']
['Indie', 'RPG']
['Action', 'Indie', 'Racing']
['Strategy']
['Strategy']
['Strategy']
['Strategy']
['Strategy']
['Action', 'RPG']
['Strategy']
['Strategy']
['Strategy']
['Strategy']
['Strategy']
['Strategy']
['Indie', 'RPG']
['Strategy']
['Action', 'Indie', 'Strategy']
['Action', 'Indie', 'Strategy']
['Action', 'Indie', 'Strategy']
['Indie']
['RPG']
['Casual', 'Simulation']
['Racing', 'Sports']
['Casual', 'Simulation']
['Casual', 'Simulation']
nan
['Design &amp; Illustration', 'Web Publishing']

['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Indie']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Simulation']
['Indie', 'RPG', 'Strategy']
['Action', 'Casual', 'Indie']
['Action']
['Action', 'Adventure', 'Free to Play', 'Sports', 'Strategy']
['Action']
['Action', 'Adventure', 'Free to Play', 'Sports', 'Strategy']
['Casual', 'Simulation']
['Indie', 'Massively Multiplayer', 'RPG']
['Action', 'Indie']
['Simulation']
['Action', 'Casual', 'Indie']
['RPG', 'Strategy']
['Action', 'Adventure', 'Indie', 'RPG']
['Action']
['Adventure', 'Casual', 'Indie', 'RPG']
['Action', 'Indie', 'RPG']
['Action', 'Adventure', 'Indie']
['Action', 'Strategy']
['Action', 'Strategy']
['Action', 'Indie', 'Simulation']
['Adventure', 'Casual', 'Indie']
['RPG']
['RPG']
['RPG']
['Action', 'Indie']
['Action', 'Strategy']
['Indie', 'RPG', 'Strategy']
['RPG']
['Simulation']
['Casual', 'Strategy']
['Adventure', 'Indie', 'RPG']
['Action', 'Adventure', 'Indie']
['Action', 'Adventure', 'Indie', 'RPG']

['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Casual', 'Simulation']
['Casual', 'Indie']
['Casual', 'Simulation']
['Simulation']
['Action']
['Action', 'Indie']
['Action', 'Indie']
['Action', 'Indie']
['Action', 'Indie']
['Design &amp; Illustration', 'Web Publishing']
['Action', 'Adventure', 'Indie', 'RPG']
['Action', 'Indie']
['Simulation']
['Simulation', 'Strategy']
['Utilities']
['Simulation']
['Utilities']
['Utilities']
['Utilities']
['Utilities']
nan
['Action', 'Adventure', 'Indie', 'Simulation']
['Adventure', 'Indie']
['Simulation']
['Racing', 'Simulation', 'Sports']
['Casual', 'Simulation']
['Racing', 'Simulation', 'Sports']
['Action', 'Indie']
['Action', 'Adventure']
['Casual', 'Simulation']
['Action', 'Adventure']
['Animation &amp; Modeling', 'Design &amp; Illustration', 'Education', 'Software Training', 'Utilities']
['Adventure', 'Casual', 'Indie', 'RPG']
['Action', 'Adventure']
['Simulation']
['Animation &amp; Modeling', 'Design &amp

['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Casual', 'Free to Play', 'Simulation', 'Sport

['Action', 'Indie']
['Action']
['Adventure', 'Indie']
['Racing']
['Simulation', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Action', 'Adventure']
['Action', 'Adventure']
['Adventure', 'Indie', 'RPG']
['Action', 'Indie', 'Massively Multiplayer', 'RPG']
['Action', 'Massively Multiplayer', 'RPG']
['Action', 'Massively Multiplayer', 'RPG']
['Action', 'Adventure']
['Strategy']
['Simulation', 'Strategy']
['Simulation']
['Simulation']
['Indie']
['Casual', 'Free to Play', 'Indie', 'Simulation']
['Casual', 'Free to Play', 'Indie', 'Simulation']
['Design &amp; Illustration', 'Web Publishing']
['Casual', 'Simulation']
['Design &amp; Illustration', 'Web Publishing']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action']
['Action', 'Casual', 'Indie']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Indie', 'Strategy']
['Desi

['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Adventure', 'Strategy']
['Casual', 'Simulation']
['Action', 'Indie']
['Casual', 'Simulation']
['Adventure', 'Strategy']
['Action']
['Adventure', 'Indie']
['Adventure', 'Indie']
['Casual', 'Free to Play', 'Simulation']
['Casual', 'Simulation']
['Action', 'Indie']
['Casual', 'Indie', 'RPG', 'Simulation', 'Strategy']
['Action', 'Indie', 'RPG']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Indie', 'RPG']
['Casual', 'Simulation']
['Action']
['Design &amp; Illustration', 'Web Publishing']
['Casual', 'Simulation']
['Action', 'Adventure', 'Indie']
['Casual', 'Simulation']
['Design &amp; Illustration', 'Web Publishing']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Casual', 'Simulation']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Ind

['Action']
['Free to Play', 'Strategy']
['Free to Play', 'Strategy']
['RPG']
['Utilities']
['RPG']
['RPG']
['RPG']
['RPG']
['RPG']
['Design &amp; Illustration', 'Utilities']
['Massively Multiplayer', 'RPG']
['Massively Multiplayer', 'RPG']
['Massively Multiplayer', 'RPG']
['Massively Multiplayer', 'RPG']
['Massively Multiplayer', 'RPG']
['Massively Multiplayer', 'RPG']
['RPG', 'Strategy']
['Strategy']
['Action', 'Adventure', 'Indie']
['Adventure', 'Indie', 'RPG']
['Adventure', 'RPG', 'Strategy']
['Indie', 'Simulation', 'Strategy']
['Simulation']
['Simulation']
['Adventure', 'Free to Play', 'Indie', 'Massively Multiplayer', 'RPG']
['Adventure', 'Free to Play', 'Indie', 'Massively Multiplayer', 'RPG']
['Adventure', 'Free to Play', 'Indie', 'Massively Multiplayer', 'RPG']
['Adventure', 'Indie', 'RPG']
['Action', 'Indie']
['Action', 'Indie']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Ind

['Action', 'Indie', 'Massively Multiplayer']
['Adventure', 'Casual']
['Adventure', 'Casual']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Simulation', 'Strategy']
['Action', 'Simulation', 'Strategy']
['Indie', 'Strategy']
['Action', 'Indie']
['Simulation', 'Sports']
['Indie', 'Simulation', 'Strategy']
['Action']
['Free to Play', 'Indie', 'Strategy']
['Adventure', 'Casual', 'Free to Play', 'Indie', 'RPG', 'Simulation', 'Strategy']
['RPG']
['Indie', 'RPG', 'Strategy']
['Adventure', 'Casual', 'Simulation']
['Indie', 'Simulation']
['Adventure', 'Casual', 'Indie', 'Simulation']
['Animation &amp; Modeling', 'Video Production']
['Indie', 'Racing']
['Racing', 'Simulation', 'Sports']
['Racing', 'Simulation', 'Sports']
['Action', 'Free to Play', 'Indie', 'Strategy']
['Action', 'Cas

['Adventure', 'Casual', 'Indie', 'Simulation']
['Casual', 'Free to Play', 'Indie', 'Simulation']
['Action', 'Casual', 'Indie']
['Animation &amp; Modeling', 'Design &amp; Illustration', 'Education', 'Software Training', 'Utilities']
['Action', 'Indie']
['Casual']
['Casual']
['Casual']
['Indie', 'Strategy']
['Action', 'Adventure', 'Indie', 'Strategy']
['Simulation']
['Casual']
['Action', 'Adventure', 'Indie']
['Action', 'Adventure']
['Simulation']
['Action', 'Adventure']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Indie', 'Simulation']
['Casual', 'Simulation']
['Strategy']
['Indie', 'Simulation']
['Indie']
['Simulation']
['Simulation', 'Strategy']
['Simulation']
['Action', 'Indie']
['Action', 'Free to Play']
['Action', 'Ca

['Casual', 'Simulation']
['Casual', 'Indie', 'Strategy']
['Action', 'Indie', 'Racing', 'Strategy']
['Adventure', 'Indie', 'Simulation', 'Strategy']
['Casual', 'Indie', 'RPG']
['Strategy']
['Action', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Casual', 'Indie', 'RPG']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'Simulation', 'Strategy']
['Action', 'Indie', 'RPG']
['Indie', 'Simulation', 'Strategy']
['Casual', 'Simulation']
['Action', 'Adventure']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Casual', 'Indie']
['Racing']
['Indie']
['Indie', 'RPG', 'Strategy']
['Action', 'Free to Play', 'Indie', 'Strategy']
['Free to Play']
['Free to Play']
['Action', 'Adventure', 'Indie']
['Free to Play']
['Free to Play']
['Action', 'Free

['Action', 'Free to Play', 'Strategy']
['Indie']
['Racing', 'Simulation', 'Sports']
['Action', 'RPG']
['Simulation']
['Simulation']
['Indie', 'RPG', 'Strategy']
['Action', 'Adventure', 'Indie']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Action']
['Indie', 'RPG']
['Action']
['Simulation']
['Free to Play', 'Massively Multiplayer', 'RPG']
['Adventure', 'Simulation', 'Sports']
['Action', 'Adventure', 'Indie', 'RPG']
['Action', 'Adventure', 'Indie']
['Casual', 'Simulation']
['Free to Play', 'Massively Multiplayer', 'RPG']
['Free to Play', 'Massively Multiplayer', 'RPG']
['Free to Play', 'Massively Multiplayer', 'RPG']
['Free to Play', 'Massively Multiplayer', 'RPG']
['Free to Play', 'Massively Multiplayer', 'RPG']
['Free to Play', 'Massively Multiplayer', 'RPG']
['Action', 'Free to Play', 'Massively Multiplayer']
['Action', 'Free to Play', 'Massively Multiplayer']
['Action', 'Free to Play', 'Massively Multiplayer']
[

['Casual', 'Simulation']
['Action', 'RPG']
['Action', 'RPG']
['Action', 'Adventure', 'Indie']
['Casual', 'Simulation']
['Simulation']
['Simulation']
['Adventure', 'Indie', 'RPG']
['Adventure', 'Indie', 'RPG']
['Adventure', 'Indie', 'RPG']
['Action', 'Adventure', 'Indie', 'RPG', 'Simulation']
['Adventure', 'Indie']
['Indie', 'Simulation', 'Strategy']
['Casual', 'Indie']
['Adventure', 'Indie']
['Action', 'RPG']
['Adventure', 'Indie']
['Indie', 'RPG', 'Strategy']
['Action', 'Adventure', 'Casual', 'Indie', 'Strategy']
['Simulation']
['Indie', 'Simulation']
['Indie', 'Simulation']
['Simulation']
['Simulation']
['Simulation']
['Action', 'Casual', 'Indie', 'Racing', 'Simulation', 'Sports']
['Action', 'Casual', 'Indie', 'Racing', 'Simulation', 'Sports']
['Action', 'Casual', 'Indie', 'Racing', 'Simulation', 'Sports']
['Adventure', 'Indie', 'RPG']
['Free to Play', 'Massively Multiplayer', 'Simulation', 'Sports']
['Action', 'Adventure', 'Free to Play', 'Massively Multiplayer', 'RPG']
['Simulation

['Action', 'Indie', 'Strategy']
['Casual', 'Indie', 'RPG', 'Simulation', 'Strategy']
['Free to Play', 'Strategy']
['Action', 'Adventure', 'Indie']
['Action', 'Casual', 'Indie']
['Action', 'Adventure', 'Casual', 'Indie']
['Action', 'Adventure', 'RPG']
['Casual', 'Free to Play', 'Indie', 'Simulation', 'Sports', 'Strategy']
['Action', 'Adventure', 'Indie']
['Action', 'Adventure', 'Indie']
['Action', 'Casual', 'Free to Play', 'Indie']
['Racing', 'Simulation', 'Sports']
['Action', 'Adventure', 'Indie', 'RPG', 'Strategy']
['Action', 'Casual', 'Massively Multiplayer']
['Action', 'Casual', 'Massively Multiplayer']
['Action', 'Casual', 'Massively Multiplayer']
['Action', 'Adventure', 'Indie']
['Adventure', 'Casual', 'Indie', 'Simulation']
['Adventure', 'Indie']
['Action', 'Indie', 'RPG']
['Adventure', 'Casual', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Action', 'Indie']
['Action']
['Action']
['Action']
['Adventure', 'Casual', 'Indie']
['Simulation']
['Simulation']
['Design &amp; Illustr

['RPG', 'Strategy']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Simulation']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Action', 'Adventure', 'Casual', 'Free to Play', 'Indie', 'Simulation']
['Free to Play', 'Indie', 'Massively Multiplayer', 'Strategy']
['Adventure', 'Indie', 'RPG']
['Action', 'Indie']
['Indie', 'Strategy']
['Indie', 'Strategy']
['Adventure', 'Indie', 'RPG']
['Simulation', 'Strategy']
['Action', 'Adventure', 'Indie']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Action', 'Adventure', 'Indie']
['Adventure', 'Indie', 'RPG']
['Free to Play', 'Indie', 'Massively Multiplayer', 'Strategy']
['Free to Play', 'Indie', 'Massively Multiplayer', 'Strategy']
['Free to Play', 'Indie', 'Massively Multiplayer', 'Strategy']
['Casual', 'Free to Play', 'Simulation', 'Sports']
['Free to Play', 'Indie']
['Indie', 'RPG', 'Strategy']
['Casual', 'Free to Play', 'Simulation', 'Sp

['Action', 'Adventure']
['Casual', 'Simulation']
['Simulation']
['Indie', 'RPG', 'Strategy']
['Simulation']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Action']
['Indie', 'RPG', 'Strategy']
['Simulation']
['Simulation']
['Casual', 'Indie', 'RPG', 'Simulation']
['Simulation']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Indie', 'RPG', 'Strategy']
['Adventure', 'Indie', 'RPG']
['Free to Play', 'Indie', 'Simulation']
['Action', 'Adventure', 'Indie', 'Simulation']
['Adventure', 'Casual', 'Indie', 'RPG']
['Indie', 'Racing', 'Simulation', 'Sports']
['Action']
['Strategy']
['Simulation']
['Simulation']
['Simulation']
['Strategy']
['RPG']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Casual', 'Indie']
['Indie', 'Simulation', 'Strategy']
['Action']
['Action']
[

['Action', 'Indie', 'Racing', 'Sports']
['Action', 'Indie']
['Action', 'Indie']
['Action']
['Action']
['Action']
['Adventure', 'Indie']
['Action', 'Casual', 'Indie']
['Action', 'Free to Play', 'Massively Multiplayer', 'Strategy']
['Indie', 'Simulation']
['Action', 'Free to Play', 'Massively Multiplayer', 'Strategy']
['Indie', 'RPG']
['Adventure', 'Indie']
['Action']
['Indie', 'Strategy']
['Action']
['Action']
['Action']
['Action']
['Adventure', 'Indie']
['Action', 'Adventure', 'Casual', 'Indie']
['Simulation']
['Action', 'Casual', 'Indie']
['Action', 'Indie']
['Action', 'Indie']
['Action', 'Adventure']
['Action', 'Adventure']
['Adventure']
['Free to Play', 'Racing', 'Simulation', 'Sports']
['Adventure']
['Action']
['Action', 'Indie']
['Casual', 'Simulation']
['Action']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Massively Multiplayer', 'RPG']
['Action', 'Massively Multiplayer', 'RPG']
['Action', 'Massively Multiplayer', 'RPG']
['Action', 'Mass

['Action', 'Adventure', 'Free to Play', 'Massively Multiplayer', 'RPG']
['Indie', 'Simulation']
['Simulation']
['Action', 'Indie', 'Simulation', 'Strategy']
['Indie', 'Simulation']
['Action', 'Adventure', 'Free to Play', 'Massively Multiplayer', 'RPG']
['Massively Multiplayer', 'Strategy']
['Indie', 'Simulation']
['Indie', 'Simulation']
['Indie', 'Sports']
['Simulation']
['Action', 'Casual', 'Free to Play', 'RPG', 'Simulation', 'Strategy']
['Simulation']
['Casual', 'Indie', 'Simulation']
['Adventure', 'Indie', 'RPG']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Action', 'Casual', 'Indie', 'Racing', 'Strategy']
['Racing', 'Simulation']
['Adventure', 'Casual', 'Free to Play', 'Indie', 'Simulation']
['Action', 'Free to Play', 'Indie', 'Strategy']
['Action', 'Casual', 'Free to Play', 'Indie']
['Action']
['Free to Play', 'Indie', 'Strategy']
['

['Simulation']
['Simulation']
['Action', 'Indie']
['Indie', 'Simulation']
['Animation &amp; Modeling', 'Design &amp; Illustration', 'Education', 'Software Training', 'Utilities']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Adventure']
['Action', 'Adventure']
['Action', 'Adventure', 'RPG']
['Simulation']
['Action', 'Adventure', 'Indie', 'Racing']
['Action', 'Adventure', 'Indie', 'Racing']
['Action', 'Strategy']
['Action', 'Strategy']
['Casual', 'Indie', 'Simulation']
['Action']
['Action']
['Strategy']
['Strategy']
['Adventure', 'Indie']
['Indie', 'Strategy']
['Animation &amp; Modeling', 'Video Production']
['Action', 'Adventure', 'Strategy']
['Simulation']
['Indie']
['Action', 'Indie']
['Action', 'Indie']
['Simulation']
['Strategy']
['Animation &amp; Modeling', 'Design &am

['Casual', 'Indie', 'Strategy']
['Action', 'Adventure', 'Casual', 'Indie', 'RPG']
['Simulation']
['Simulation']
['Action', 'RPG']
['Free to Play', 'RPG']
['Free to Play', 'RPG']
['Free to Play', 'RPG']
['Simulation', 'Strategy']
['Racing', 'Sports']
['Action', 'RPG']
['Casual', 'Simulation']
['Design &amp; Illustration', 'Utilities', 'Web Publishing']
['Action']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Design &amp; Illustration', 'Web Publishing']
['Casual', 'Indie', 'Strategy']
['Action', 'RPG']
['Action', 'Adventure']
['Action', 'Adventure']
['Action', 'RPG', 'Simulation']
['Adventure', 'Free to Play', 'Indie']
['Action', 'Adventure']
['Action', 'Adventure']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Adventure']
['Action', 'Adventure']
['Action', 'Adventure']
['Action', 'Adventure']
['Action']
['Action', 'Adventure']
['Action', 'Adventure']
['Action', 'Adventure']
['Action', 'Adventur

['Casual', 'Simulation']
['Strategy']
['Action', 'Indie', 'Strategy']
['Strategy']
['Action', 'Indie']
['Strategy']
['Strategy']
['Action', 'Adventure']
['Action', 'Strategy', 'Indie']
['Simulation']
['Indie', 'Casual']
['Racing']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Racing']
['Casual', 'Simulation']
['Racing']
['Casual', 'Simulation']
['Strategy']
['Strategy']
['Strategy']
['Strategy']
['Simulation']
['Strategy']
['Indie', 'Strategy']
['Action']
['Action', 'Adventure']
['Strategy']
['Simulation']
['Strategy']
['Action', 'Adventure']
['Action', 'RPG']
['Action', 'RPG']
['Action', 'Adventure']
['Simulation']
['Simulation']
['Simulation']
['Action', 'RPG']
['Strategy']
['Strategy']
['Strategy']
['Simulation']
['Strategy']
['Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Indie']
['Strategy']
['Simulation']
['Casual', 'Simulation']
['Strategy']
['Action']
['Action']
['Simulation']
['Action', 'Adventure']
['Action']
['Action']
['Action']
['Action']

['Indie', 'RPG', 'Strategy']
['Simulation']
['Action', 'Indie', 'Massively Multiplayer']
['Casual', 'Indie']
nan
['Casual', 'Indie']
['Casual', 'Indie']
['Adventure', 'Indie']
['Action', 'Adventure', 'Casual', 'Indie']
['Action', 'Indie', 'Strategy']
['Action']
['Utilities']
['Utilities']
['Utilities']
['Utilities']
['Action', 'Adventure', 'Casual', 'Indie', 'RPG']
['Utilities']
['Action', 'Indie']
['Action', 'Free to Play']
['Action', 'Casual', 'Free to Play', 'Indie', 'Racing']
['Indie', 'Simulation']
['Action', 'Casual', 'Free to Play', 'Indie', 'Racing']
['Action', 'Casual', 'Free to Play', 'Indie', 'Racing']
['Casual', 'Indie', 'RPG', 'Simulation', 'Strategy']
['Adventure', 'Casual', 'Indie', 'Simulation', 'Sports']
['Action', 'Adventure', 'RPG']
['Racing', 'Simulation']
['Adventure', 'Casual', 'Indie', 'Simulation', 'Strategy']
['Free to Play', 'Indie', 'Simulation', 'Sports']
['Free to Play', 'Indie', 'Simulation', 'Sports']
['Racing', 'Simulation', 'Sports']
['Free to Play', 'I

0        None
1        None
2        None
3        None
4        None
5        None
6        None
7        None
8        None
9        None
10       None
11       None
12       None
13       None
14       None
15       None
16       None
17       None
18       None
19       None
20       None
21       None
23       None
24       None
25       None
26       None
27       None
28       None
29       None
30       None
         ... 
12811    None
12812    None
12813    None
12814    None
12815    None
12816    None
12817    None
12818    None
12819    None
12820    None
12821    None
12822    None
12823    None
12824    None
12825    None
12826    None
12827    None
12828    None
12829    None
12830    None
12831    None
12832    None
12833    None
12834    None
12835    None
12836    None
12837    None
12838    None
12839    None
12840    None
Length: 12572, dtype: object

In [8]:
# save dictionaries as csv
(pd.DataFrame.from_dict(data=dlc_genres_count, orient='index').
to_csv('dlc_genres_count.csv', header=True))

(pd.DataFrame.from_dict(data=dlc_genres, orient='index').
to_csv('dlc_genres.csv', header=True))

In [9]:
dlc_specs = {}
dlc_specs_count = {}
#unpack dlc genre and put it in dictionary
def unpack_specs(row):
    """This function unpacks the list of specs and stores it into a dictionary"""
    specs = row['specs']
    print(specs)

    try:
        for i in range(len(specs)):
            if specs[i] not in dlc_specs:
                dlc_specs[specs[i]] = 0
                dlc_specs_count[specs[i]] = 0
            else:
                dlc_specs_count[specs[i]] += 1
    except:
        pass

dlcs_raw.apply(unpack_specs, axis=1)

['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud', 'Stats']
['Single-player', 'Downloadable Content', 'Steam Achievements']
['Single-player', 'Online Multi-Player', 'Local Multi-Player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Workshop']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Steam Achievements']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Local Multi-Player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Local Multi-Player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Local Multi-Pl

['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Trading Cards']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Trading Cards']
['Single-player', 'MMO', 'Co-op', 'Downloadable Content', 'Partial Controller Support']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Stats', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Captions available', 'Partial Controller Support', 'Includes level editor']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Stats', 'Steam Leaderboards']
['S

['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Workshop', 'Steam Cloud', 'Stats', 'Steam Leaderboards', 'Includes level editor']
['Single-player', 'Downloadable Content', 'Steam Achievements']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Leaderboards']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Workshop', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Cloud', 'Valve Anti-Cheat enabled', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Cloud', 'Valve Anti-Cheat enabled', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content

['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Downloadable Content', 'Captions available', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Cross-Platform Multiplayer', 'Downloadable Content']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Leaderboards']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'St

['Single-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Workshop', 'Steam Cloud', 'Steam Leaderboards']
['Multi-player', 'Cross-Platform Multip

['Single-player', 'Multi-player', 'MMO', 'Co-op', 'Cross-Platform Multiplayer', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Stats', 'Steam Leaderboards', 'Commentary available']
['Single-player', 'Downloadable Content', 'Full controller support']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Downloadable Content', 'In-App Purchases']
['Single-player', 'Downloadable Content', 'Full controller support']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Full controller support']
['Single-player', 'Downloadable Content']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards']
['Single-player', 'Downloadable Cont

['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-p

['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Downloadable Content']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial 

['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content']
['Single-player', 'Multi-player', 'Co-op', 'Cross-Platform Multiplayer', 'Downloadable Content', 'Steam Achievements', 'Steam Workshop', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Local Multi-Player', 'Co-op', 'Online Co-op', 'Local Co-op', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Captions available', 'Steam Workshop', 'Partial Controller Support', 'Steam Leaderboards', 'Includes level editor']
['Single-player', 'Multi-player', 'Downloadable Content']
['Multi-player', 'Co-op', 'Cross-Platform Multiplayer', 'Downloadable Content']
['Multi-player', 'Co-op', 'Cross-Platform Multiplayer', 'Downloadable Content']
['Single-player', 'Multi-player', 'Co-op', 'Shared/Split Screen', 'Cross-Platform Multiplayer', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Card

['Single-player', 'Downloadable Content']
['Single-player', 'Local Multi-Player', 'Local Co-op', 'Shared/Split Screen', 'Downloadable Content', 'Full controller support', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'HTC Vive', 'Tracked Motion Controllers', 'Seated', 'Standing']
['Single-player', 'Downloadable Content', 'HTC Vive', 'Tracked Motion Controllers', 'Seated', 'Standing']
['Single-player', 'Downloadable Content', 'HTC Vive', 'Tracked Motion Controllers', 'Seated', 'Standing']
['Single-player', 'Downloadable Content', 'HTC Vive', 'Tracked Motion Controllers', 'Seated', 'Standing']
['Single-player', 'Downloadable Content']
['Single-player', 'Co-op', 'Shared/Split Screen', 'Downloadable Content', 'In-App Purchases', 'Steam Cloud']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Steam Trading Cards']
['Multi-player', 'Online Multi-Player', 'MMO', 'Co-op', 'Online Co-op', 'Downloadable Content', 'Steam Trading Cards']
['Single-play

['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Trading Cards', 'Partial Controller Support']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Local Multi-Player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'In-App Purchases', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support']
['Multi-player', 'Downloadable Content']
['Multi-player', 'Downloadable Content']
['Single-player', 'Online Multi-Player', 'Local Multi-Player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Captions available', 

['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Downloadable Content']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Steam Workshop']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support']
['Single-player', 'Downloadable Content']
['Multi-player', 'Downloadable Content',

['Single-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Captions available']
['Single-player', 'Online Multi-Player', 'Local Multi-Player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Online Multi-Player', 'Downloadable Content', 'Partial Controller Support', 'Steam Cloud']
['Multi-player', 'MMO', 'Downloadable Content', 'Steam Trading Cards', 'In-App Purchases']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Workshop', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Full controller support', 'Steam Workshop', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Downloadable Content', 'Full cont

['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Shared/Split Screen', 'Cross-Platform Multiplayer', 'Downloadable Content']
['Single-player', 'Multi-player', 'Online Multi-Player', 'Local Multi-Player', 'Shared/Split Screen', 'Cross-Platform Multiplayer', 'Downloadable Content']
['

['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Steam Cloud', 'Valve Anti-Cheat enabled', 'Stats', 'Steam Leaderboards']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Steam Workshop']
['Single-player', 'Multi-player', 'Cross-Platform Multiplayer', 'Downloadable Content', 'Steam Achievements', 'Steam Workshop', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Workshop', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Co-op', 'Shared/Split Screen', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Downloadable Content']
['Multi-player', 'Co-op', 'Cross-Platform Multiplayer', 'Do

['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards']
['Single-player', 'Multi-player', 'Downloadable Content', 'Partial Controller Support']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Workshop', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Captions available', 'Steam Cloud']
['Single-player', 'Downloadable Content']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Trading Cards', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Trading Cards', 'Partial Controller Support']
['Single-playe

['Single-player', 'Downloadable Content', 'Partial Controller Support', 'Includes level editor']
['Single-player', 'Downloadable Content', 'Partial Controller Support', 'Includes level editor']
['Single-player', 'Multi-player', 'Cross-Platform Multiplayer', 'Downloadable Content']
['Single-player', 'Multi-player', 'Cross-Platform Multiplayer', 'Downloadable Content']
['Single-player', 'Multi-player', 'Cross-Platform Multiplayer', 'Downloadable Content']
['Single-player', 'Multi-player', 'Cross-Platform Multiplayer', 'Downloadable Content']
['Single-player', 'Multi-player', 'Downloadable Content', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Downloadable Content', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Downloadable Content', 'Partial Controller Support']
['Single-player', 'Multi-player', 'Download

['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Full controller support', 'Steam Cloud']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'St

['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support'

['MMO', 'Downloadable Content']
['MMO', 'Co-op', 'Downloadable Content', 'Steam Achievements']
['MMO', 'Downloadable Content']
['MMO', 'Downloadable Content']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Trading Cards', 'Steam Cloud']
['MMO', 'Downloadable Content']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Steam Leaderboards']
['MMO', 'Downloadable Content']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Ac

['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Steam Trading Cards', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Workshop', 'Steam Cloud', 'Steam Leaderboards', 'Includes level editor']
['Single-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud']
['Single-player', 'Multi-player', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Steam Trading Cards', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Co-op', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements', 'Full controller support', 'Partial Controller Support', 'Steam Leaderboards']
['Single-player', 'Multi-player', 'Co-op', 'Shared/Split Screen', 'Downloadable Content', 'Steam Achievements'

['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Multi-player', 'Downloadable Content']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Multi-player', 'Co-op', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Downloadable Content', 'Steam Achievements', 'Partial Controller Support', 'Steam Cloud', 'Steam Leaderboards']
['Single-player', 'Downloadable Con

0        None
1        None
2        None
3        None
4        None
5        None
6        None
7        None
8        None
9        None
10       None
11       None
12       None
13       None
14       None
15       None
16       None
17       None
18       None
19       None
20       None
21       None
23       None
24       None
25       None
26       None
27       None
28       None
29       None
30       None
         ... 
12811    None
12812    None
12813    None
12814    None
12815    None
12816    None
12817    None
12818    None
12819    None
12820    None
12821    None
12822    None
12823    None
12824    None
12825    None
12826    None
12827    None
12828    None
12829    None
12830    None
12831    None
12832    None
12833    None
12834    None
12835    None
12836    None
12837    None
12838    None
12839    None
12840    None
Length: 12572, dtype: object

In [10]:
# save dictionaries as csv
(pd.DataFrame.from_dict(data=dlc_specs_count, orient='index').
to_csv('dlc_specs_count.csv', header=True))

(pd.DataFrame.from_dict(data=dlc_specs, orient='index').
to_csv('dlc_specs.csv', header=True))

In [11]:
dlc_tags = {}
dlc_tags_count = {}
#unpack dlc tags and put it in dictionary
def unpack_tags(row):
    """This function unpacks the list of genres and stores it into a dictionary"""
    tags = row['tags']
    print(tags)
    try:
        for i in range(len(tags)):
            if tags[i] not in dlc_tags:
                dlc_tags[tags[i]] = 0
                dlc_tags_count[tags[i]] = 0
            else:
                dlc_tags_count[tags[i]] += 1
    except:
        pass

dlcs_raw.apply(unpack_tags, axis=1)

['Action', 'Indie', 'Casual']
['Adventure', 'RPG', 'Indie']
['Simulation']
['Simulation']
['Adventure', 'Indie', 'Casual']
['Indie', 'Casual']
['Indie', 'Casual']
['Indie']
['Simulation', 'Racing', 'Sports']
['Simulation', 'Racing', 'Sports']
['Simulation', 'Racing', 'Sports']
['Simulation', 'Racing', 'Sports']
['Simulation', 'Racing', 'Sports']
['Casual']
['Strategy', 'Free to Play', 'Indie', 'Casual', 'Simulation', 'Sports']
['Simulation']
['Action', 'Adventure', 'RPG', 'Indie']
['Simulation']
['Indie', 'Simulation']
['Indie', 'Simulation']
['Indie', 'Simulation']
['Indie', 'Simulation']
['Racing', 'Simulation']
['Racing', 'Simulation', 'Driving', 'Multiplayer', 'Singleplayer']
['Racing', 'Simulation']
['Simulation']
['Simulation']
['Strategy', 'Adventure', 'Indie']
['Design & Illustration', 'Web Publishing']
['Design & Illustration', 'Web Publishing']
['Design & Illustration', 'Web Publishing']
['Design & Illustration', 'Web Publishing']
['Casual', 'Simulation']
['Design & Illustrat

['Action', 'Adventure']
['Action', 'Horror', 'Survival Horror']
['Action', 'RPG']
['Casual', 'Simulation']
['Action', 'RPG']
['Massively Multiplayer', 'Indie', 'Casual', 'RPG', 'Free to Play']
['Simulation']
['RPG', 'Indie']
['Action', 'Racing', 'Sports']
['Simulation', 'Trains', 'Realistic', 'Singleplayer', 'Driving', 'Building', 'Open World', 'Sandbox', 'Casual', 'Family Friendly', 'Level Editor', 'TrackIR', 'City Builder', 'Atmospheric', 'Relaxing', 'Moddable', 'Sports']
['Adventure', 'Indie', 'Action', 'Horror', 'Atmospheric', 'Singleplayer']
['Strategy', 'RPG', 'Indie', 'Anime']
['Racing', 'Sports', 'Free to Play']
['Action', 'Free to Play', 'Indie', 'Massively Multiplayer', 'RPG', 'Shooter']
['Adventure', 'RPG', 'Indie']
['Strategy', 'Simulation', 'RPG']
['Adventure', 'Indie']
['Strategy', 'Indie', 'Casual']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Design & Illustration', 'Web Publishing']
['Utilities', 'Animation & Modeling', 'Education', 'Software Training', 'Design 

['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Action', 'Indie', 'Survival', 'Shooter', '2D', 'Sci-fi', 'Fantasy', 'Space', 'Arcade', 'Atmospheric', 'Singleplayer', 'Stylized', "Shoot 'Em Up", 'Anime']
['Indie', 'Casual']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Indie']
['Simulation']
['Simulation']
['Casual', 'Simulation']
['Simulation']
['Strategy']
['Adventure', 'Indie']
['Strategy', 'Action', 'Adventure']
['Strategy', 'Adventure', 'Indie']
['Casual', 'Simulation']
['Strategy', 'Indie']
['Design & Illustration', 'Web Publishing']
['Action', 'Adventure', 'Indie']
['Strategy', 'Action', 'Adventure', 'RPG', 'Indie', 'Simulation', 'Survival', 'Post-apocalyptic', 'Choose Your Own Adventure', 'Atmospheric', 'Realistic', 'Open World', 'Great Soundtrack', 'Action RPG', 'Turn-Based Combat', 'Survival Horror']
['Action', 'Indie', 'Casual', 'Music', 'Difficult', 'Minimalist', 'Music-Based Procedural Generation', 'Fast-Paced', 'Arcade']
['Adventure', 'Indie']

['Adventure', 'Indie', 'Casual', 'Survival', 'Hidden Object', 'Choose Your Own Adventure', 'Choices Matter', 'Supernatural', 'Horror', '2D', 'Atmospheric', 'Singleplayer', 'Stylized', 'Dark', 'Anime', 'Action-Adventure', 'Point & Click', 'Psychological Horror', 'Lore-Rich']
['Design & Illustration', 'Utilities']
['Simulation', 'Strategy']
['Strategy', 'RPG', 'Indie']
['Strategy', 'RPG', 'Indie']
['Strategy', 'RPG', 'Indie', 'Turn-Based', 'Fantasy', 'Online Co-Op', 'Party-Based RPG']
['Strategy', 'RPG', 'Indie']
['Strategy', 'RPG', 'Indie']
['Strategy', 'Sports']
['Strategy', 'RPG', 'Indie']
['Casual', 'Simulation']
['Strategy', 'RPG', 'Indie']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Adventure']
['Action', 'Adventure']
['Strategy', 'RPG', 'Indie']
['Strategy', 'RPG', 'Indie']
['Action', 'Indie']
['Strategy', 'RPG', 'Indie']
['Strategy', 'Indie', 'Simulation']
['Simulation', 'Racing', 'Sports']
['Indie', 'Casual', 'S

['Simulation']
['Strategy', 'Action', 'Indie']
['Simulation']
['Indie', 'Casual', 'Free to Play']
['Strategy', 'Indie', 'Casual', 'Simulation', 'Sports']
['Action', 'RPG', 'Indie']
['Action', 'RPG', 'Indie']
['Action', 'RPG', 'Indie']
['Action', 'RPG', 'Indie']
['Adventure', 'Action']
['Indie', 'Action', 'RPG', 'Rogue-like', 'Rhythm', 'Great Soundtrack']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Adventure', 'RPG', 'Indie']
['Indie', 'Casual']
['Simulation']
['Indie', 'Casual']
['Adventure', 'Indie', 'Simulation']
['Adventure', 'Indie', 'Simulation']
['Action', 'RPG']
['Action', 'Adventure', 'Indie', 'Casual', 'Platformer', 'Great Soundtrack', 'Arcade', 'Steampunk', '2D', 'Sci-fi', 'Minimalist', 'Funny', 'Atmospheric', 'Singleplayer', 'Cartoony', 'Stylized', 'Dark', 'Gore', 'Assassin', 'Abstract']
['Action', 'RPG']
['Action', 'RPG']
['Adventure', 'Indie', 'Casual']
['Free to Play', 'Adventure', 'Action'

['Strategy', 'Casual', 'Simulation']
['Strategy', 'Casual', 'Simulation']
['Strategy', 'Casual', 'Simulation']
['Casual', 'Free to Play', 'Indie', 'Simulation', 'Strategy', 'Sports']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Design & Illustration', 'Web Publishing']
['Strategy', 'RPG', 'Indie', 'Free to Play']
['Indie', 'Adventure', 'Action', 'Casual', 'VR', 'Horror', 'Singleplayer', 'Difficult', 'Dark', 'Atmospheric', 'Psychological', 'Blood', 'Gore', 'Zombies', 'Demons', 'Survival', 'Survival Horror', 'Violent', 'Exploration', 'Cinematic']
['Strategy', 'Racing', 'Sports', 'Simulation']
['Strategy', 'Simulation', 'Indie', 'Multiplayer']
['Simulation', 'Indie']
['Indie', 'Casual', 'Simulation']
['Indie', 'Strategy', 'Simulation']
['Strategy', 'Adventure', 'RPG', 'Massively Multiplayer', 'Indie']
['Utilities', 'Animation & Modeling', 'Design & Illustration', 'Video Production', 'Software Training', 'Web Publishing', 'Education', 'Game Development']
['A

['Strategy', 'Action']
['Adventure', 'RPG', 'Indie']
['Adventure', 'RPG']
['Adventure', 'Indie', 'Casual', 'Simulation']
['Indie', 'Casual', 'Simulation', 'Free to Play']
['Action', 'Indie', 'Casual', 'Gore', 'Violent', 'Nudity', 'Sexual Content']
['Animation & Modeling', 'Design & Illustration', 'Utilities', 'Education', 'Software Training', 'Game Development']
['Action', 'Indie']
['Casual']
['Casual']
['Casual']
['Strategy', 'Indie']
['Strategy', 'Action', 'Adventure', 'Indie']
['Simulation', 'Multiplayer', 'Open World']
['Casual']
['Action', 'Adventure', 'Indie']
['Action', 'Adventure']
['Simulation']
['Action', 'Adventure']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Indie', 'Simulation']
['Casual', 'Simulation']
['S

['Strategy', 'Adventure', 'RPG', 'Indie', 'Casual']
['Strategy', 'Adventure', 'RPG', 'Indie', 'Casual']
['Simulation', 'Racing', 'Sports']
['Action']
['Strategy', 'Action', 'Adventure', 'RPG', 'Massively Multiplayer', 'Free to Play']
['Action', 'Racing', 'Indie', 'Sports']
['Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Gore', 'Violent']
['Casual', 'Simulation']
['Action']
['Action', 'Indie']
['Indie', 'Action']
['Action', 'Indie']
['Strategy', 'Simulation']
['Simulation']
['Strategy', 'Simulation']
['Simulation']
['Racing']
['Simulation']
['Adventure', 'Indie']
['Indie', 'Casual', 'Simulation']
['Simulation']
['Simulation']
['Simulation']
['Strategy', 'Simulation', 'Grand Strategy', 'Space', '4X', 'Sci-fi']
['Simulation']
['Strategy', 'Simulation', 'Grand Strategy', 'Historical']
['Simulation']
['Strategy', 'Simulation']
['Adventure', 'Indie', 'Horror', 'Survival Horror', 'Online Co-Op', 'Funny', 'First-Person', 'Replay Value', 'Exploration', 'Controller',

['Strategy', 'RPG', 'Indie', 'Story Rich', 'Sandbox', 'Software']
['Strategy', 'RPG', 'Indie']
['Indie', 'Simulation']
['Simulation', 'Racing', 'Sports', 'Indie']
['Utilities', 'Benchmark']
['Utilities', 'Benchmark']
['Action', 'Indie', 'Sports']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Action']
['Simulation']
['Simulation']
['Simulation']
['Simulation']
['Simulation']
['Strategy', 'Action', 'Indie']
['Simulation']
['Strategy', 'Simulation']
['Adventure', 'RPG', 'Indie']
['Simulation']
['RPG']
['Action', 'Adventure', 'Indie']
['Action', 'Adventure']
['Massively Multiplayer', 'Strategy', 'Free to Play', 'Indie', 'Card Game', 'Turn-Based', 'Fantasy', 'Action', 'Adventure', 'Multiplayer', 'Trading Card Game', 'Tactical']
['Casual', 'Simulation']
['Action', 'Adventure']
['Action', 'Adventure']
['Adventure']
['Action', 'Adventure']
['Action', 'Adventure']
['Simulation', 'Indie']
['Simulation', 'Indie']
['Adventure', 'Ind

['Adventure', 'Action', 'LEGO', 'Dinosaurs']
['Action', 'Adventure', 'LEGO', 'Dinosaurs']
['Action', 'Adventure', 'LEGO']
['Adventure', 'RPG', 'Indie']
['Action', 'Indie']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Simulation']
['Simulation']
['Adventure', 'RPG', 'Indie', 'Casual', 'Simulation']
['Strategy', 'RPG', 'Indie']
['Casual', 'Simulation']
['Sports', 'Strategy', 'Action', 'Massively Multiplayer', 'Casual', 'Free to Play']
['Strategy', 'Action', 'Massively Multiplayer', 'Casual', 'Sports', 'Free to Play']
['Massively Multiplayer', 'Casual', 'Sports', 'Strategy', 'Action', 'Free to Play']
['Strategy', 'RPG', 'Indie']
['Strategy', 'Action', 'Indie']
['Strategy', 'Action', 'Indie']
['Strategy', 'Action', 'Indie']
['Action', 'Simulation', 'RPG', 'Space Sim', 'Space', 'Relaxing']
['Strategy', 'Action', 'Adventure', 'Indie']
['Action', 'RPG', 'Indie']
['Adventure', 'RPG', 'Indie', 'Simulation']
['Adventure']
['Strategy', 'Indie', 'Sports']
['Simulati

['Strategy', 'Free to Play', 'Indie']
['Strategy', 'Free to Play', 'Indie']
['Strategy', 'RPG', 'Simulation']
['Simulation', 'Strategy']
['Strategy', 'RPG', 'Simulation']
['Design & Illustration', 'Web Publishing']
['Simulation', 'Indie']
['Design & Illustration', 'Web Publishing']
['Strategy', 'RPG', 'Indie']
['Free to Play', 'Indie', 'Casual', 'Great Soundtrack', 'Anime', 'Visual Novel']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Design & Illustration', 'Web Publishing']
['Casual', 'Simulation']
['Casual', 'Simulation']
['RPG']
['Casual', 'Simulation']
['RPG']
['Strategy', 'Indie', 'Casual']
['RPG']
['RPG']
['RPG']
['Action', 'Survival', 'Zombies']
['Simulation', 'Flight']
['RPG']
['Simulation', 'Realistic', 'Family Friendly', 'TrackIR', 'Trains', 'Open World', 'Sandbox', 'Physics', 'Atmospheric']
['RPG']
['Action', 'Strategy', 'Indie', 'FPS', 'RTS']
['Strategy', 'Adventure', 'RPG']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'FPS', 'Multiplayer', 'Free to P

['Casual', 'Simulation']
['Action', 'RPG']
['Action', 'Indie', 'Strategy']
['Action']
['Simulation']
['Indie']
['Action', 'Indie']
['Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action']
['Casual', 'Simulation']
['Action']
['Simulation']
['Casual', 'Simulation']
['Action']
['Action', 'Sniper']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Action', 'Indie', 'Simulation']
['Simulation']
['Strategy']
['Strategy', 'RPG']
['Strategy']
['Action', 'FPS', 'Post-apocalyptic', 'Singleplayer']
['Strategy', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Casual', 'Simulation']
['Strategy']
['Strategy', 'Action', 'Indie']
['Strategy']
['Action', 'Indie']
['Strategy']
['Strategy']
['Action', 'Adventure']
['Indie', 'Strategy', 'Action']
['Simulation', 'Trains', 'Family Friendly', 'TrackIR', 'Co-op', 'Open Worl

0        None
1        None
2        None
3        None
4        None
5        None
6        None
7        None
8        None
9        None
10       None
11       None
12       None
13       None
14       None
15       None
16       None
17       None
18       None
19       None
20       None
21       None
23       None
24       None
25       None
26       None
27       None
28       None
29       None
30       None
         ... 
12811    None
12812    None
12813    None
12814    None
12815    None
12816    None
12817    None
12818    None
12819    None
12820    None
12821    None
12822    None
12823    None
12824    None
12825    None
12826    None
12827    None
12828    None
12829    None
12830    None
12831    None
12832    None
12833    None
12834    None
12835    None
12836    None
12837    None
12838    None
12839    None
12840    None
Length: 12572, dtype: object

In [12]:
print(len(dlc_tags))

285


In [13]:
dlc_tags_count

{'2D': 65,
 '2D Fighter': 9,
 '3D Platformer': 3,
 '3D Vision': 0,
 '4 Player Local': 9,
 '4X': 13,
 '6DOF': 3,
 'Abstract': 1,
 'Action': 4260,
 'Action RPG': 15,
 'Action-Adventure': 8,
 'Adventure': 2522,
 'Agriculture': 0,
 'Aliens': 5,
 'Alternate History': 7,
 'America': 2,
 'Animation & Modeling': 111,
 'Anime': 75,
 'Arcade': 34,
 'Arena Shooter': 1,
 'Assassin': 6,
 'Asynchronous Multiplayer': 4,
 'Atmospheric': 221,
 'Audio Production': 42,
 'Base Building': 12,
 'Based On A Novel': 4,
 'Batman': 10,
 "Beat 'em up": 3,
 'Benchmark': 1,
 'Blood': 8,
 'Board Game': 20,
 'Building': 41,
 'Bullet Hell': 9,
 'CRPG': 3,
 'Capitalism': 0,
 'Card Game': 2,
 'Cartoon': 4,
 'Cartoony': 3,
 'Casual': 3046,
 'Character Customization': 6,
 'Choices Matter': 14,
 'Choose Your Own Adventure': 10,
 'Cinematic': 2,
 'City Builder': 38,
 'Class-Based': 1,
 'Classic': 18,
 'Clicker': 5,
 'Co-op': 107,
 'Co-op Campaign': 0,
 'Cold War': 5,
 'Colorful': 1,
 'Comedy': 28,
 'Comic Book': 2,
 'Compe

In [18]:
# save dictionaries as csv
(pd.DataFrame.from_dict(data=dlc_tags_count, orient='index').
to_csv('dlc_tags_count.csv', header=True))

(pd.DataFrame.from_dict(data=dlc_tags, orient='index').
to_csv('dlc_tags.csv', header=True))

#### Cleaning

In [15]:
cleaned_dlcs.drop(columns=['discount_price', 'url', 'app_name', 'parent_game', 'reviews_url'], inplace=True)
print(len(cleaned_dlcs))
cleaned_dlcs.head()

12572


Unnamed: 0,description,developer,genres,id,parent_game_url,price,publisher,release_date,sentiment,specs,tags,title
0,"[, About This Content, ►, Murum Charta, Commun...",ImaginationOverflow,"[Action, Casual, Indie]",777450,http://store.steampowered.com/app/517330/,Free,,2017-12-30,,"[Single-player, Downloadable Content, Steam Ac...","[Action, Indie, Casual]",Stellar Interface - Murum Charta
1,"[, About This Content, Unlocks three new ultra...",Stegalosaurus Game Development,"[Adventure, Indie, RPG]",694780,http://store.steampowered.com/app/592200/,0.99,,2018-01-11,,"[Single-player, Downloadable Content, Steam Ac...","[Adventure, RPG, Indie]","SUPER ARMY OF TENTACLES 3, Winter Outfit Pack ..."
2,"[, About This Content, The Wilmington Internat...",Drawbridge Designs,[Simulation],623613,http://store.steampowered.com/app/269950/,19.99,Aerosoft GmbH,2018-01-11,,"[Single-player, Online Multi-Player, Local Mul...",[Simulation],X-Plane 11 - Add-on: Aerosoft - Airport Wilmin...
3,"[, About This Content, Get ready for another h...",Thomson Interactive,[Simulation],642803,http://store.steampowered.com/app/24010/,19.99,Dovetail Games - Trains,2018-01-11,5 user reviews,"[Single-player, Downloadable Content, Steam Ac...",[Simulation],Train Simulator: RhB Enhancement Pack 02 Add-On
4,"[, About This Content, With its three dark ele...",Bryan Minus,"[Adventure, Casual, Indie]",722550,http://store.steampowered.com/app/717830/,1.99,Minus Equals Plus,2018-01-11,,"[Single-player, Downloadable Content]","[Adventure, Indie, Casual]",Waiting For the Loop Official Soundtrack and EP


In [16]:
# check data types
for column in cleaned_dlcs.columns:
    print(str(column) + ': ' +  str(cleaned_dlcs[column].dtype))

description: object
developer: object
genres: object
id: int64
parent_game_url: object
price: object
publisher: object
release_date: object
sentiment: object
specs: object
tags: object
title: object


In [17]:
# check for missing data
for column in cleaned_dlcs.columns:
    if np.any(pd.isnull(cleaned_dlcs[column])) == True:
        print(column)

description
developer
genres
price
publisher
release_date
sentiment
title


#### Investigate missing data

In [19]:
# inspect missing description
print(len(cleaned_dlcs.loc[pd.isnull(cleaned_dlcs['description']), 'description']))
cleaned_dlcs.iloc[683]

52


description        [, About This Content, DCS: Flaming Cliffs 3 i...
developer                                          Eagle Dynamics SA
genres                                                  [Simulation]
id                                                            249320
parent_game_url            http://store.steampowered.com/app/223750/
price                                                          39.99
publisher                                     The Fighter Collection
release_date                                              2012-11-08
sentiment                                              Very Positive
specs              [Single-player, Multi-player, Downloadable Con...
tags                                  [Simulation, Flight, Military]
title                                          DCS: Flaming Cliffs 3
Name: 706, dtype: object

In [20]:
# fill in missing description with empty string
cleaned_dlcs['description'].fillna('', inplace=True)

In [21]:
print(cleaned_dlcs['price'].unique())

['Free' 0.99 19.99 1.99 2.99 11.99 9.99 199.99 39.99 7.99 10.99 24.99 6.99
 8.99 14.99 13.99 4.99 5.99 7.49 3.49 3.99 nan 1.49 1.5899999999999999
 12.99 29.99 2.49 16.99 49.99 5.0 44.99 17.99 299.99 34.99 99.99 69.99
 15.99 21.99 79.99 10.0 'Free to Play' 15.0 289.99 4.49 18.99 64.99
 'Third-party' 'Play Now' 5.49 89.99 6.49 1.29 54.99 23.99
 1.3900000000000001 36.99 31.99 'Free To Play' 199.0 49.0 99.0 59.99 27.99
 2.66 249.99 499.99 26.98 119.99 131.4 22.99 995.0 27.49 6.0 3.39 19.95
 149.99 38.85 71.7 20.0 30.0 74.99 7.0 20.99 42.99 'Install Theme' 41.99
 4.29 59.95 109.99 1.25]


In [26]:
# drop row with price third-party
cleaned_dlcs.loc[cleaned_dlcs['price'] =='Third-party'].index
cleaned_dlcs.drop(cleaned_dlcs.loc[cleaned_dlcs['price'] =='Third-party'].index, inplace=True)

In [23]:
# fill in missing price (nan is usually for free products)
cleaned_dlcs['price'].fillna(0.0, inplace=True)

In [29]:
# clean price data
cleaned_dlcs.loc[cleaned_dlcs['price'] == 'Free', 'price'] = 0.0
cleaned_dlcs.loc[cleaned_dlcs['price'] == 'Play Now', 'price'] = 0.0
cleaned_dlcs.loc[cleaned_dlcs['price'] == 'Free to Play', 'price'] = 0.0
cleaned_dlcs.loc[cleaned_dlcs['price'] == 'Free To Play', 'price'] = 0.0
cleaned_dlcs.loc[cleaned_dlcs['price'] == 'Install Theme', 'price'] = 0.0

In [32]:
print(cleaned_dlcs['price'].unique())
cleaned_dlcs['price'] = cleaned_dlcs.price.astype(float)

[0.0 0.99 19.99 1.99 2.99 11.99 9.99 199.99 39.99 7.99 10.99 24.99 6.99
 8.99 14.99 13.99 4.99 5.99 7.49 3.49 3.99 1.49 1.5899999999999999 12.99
 29.99 2.49 16.99 49.99 5.0 44.99 17.99 299.99 34.99 99.99 69.99 15.99
 21.99 79.99 10.0 15.0 289.99 4.49 18.99 64.99 5.49 89.99 6.49 1.29 54.99
 23.99 1.3900000000000001 36.99 31.99 199.0 49.0 99.0 59.99 27.99 2.66
 249.99 499.99 26.98 119.99 131.4 22.99 995.0 27.49 6.0 3.39 19.95 149.99
 38.85 71.7 20.0 30.0 74.99 7.0 20.99 42.99 41.99 4.29 59.95 109.99 1.25]


In [36]:
# extract parent game app id from parent game url
def get_parent_game_id(row):
    """This function extracts the game app id from the url"""
    import re
    
    parent_url = row['parent_game_url']
    
    return re.findall('app/(\d+)/',row['parent_game_url'])[0]

In [49]:
cleaned_dlcs['parent_game_id'] = cleaned_dlcs.apply(get_parent_game_id, axis=1)
cleaned_dlcs['parent_game_id'] = cleaned_dlcs['parent_game_id'].astype(int)

In [38]:
cleaned_dlcs.head()

Unnamed: 0,description,developer,genres,id,parent_game_url,price,publisher,release_date,sentiment,specs,tags,title,parent_game_id
0,"[, About This Content, ►, Murum Charta, Commun...",ImaginationOverflow,"[Action, Casual, Indie]",777450,http://store.steampowered.com/app/517330/,0.0,,2017-12-30,,"[Single-player, Downloadable Content, Steam Ac...","[Action, Indie, Casual]",Stellar Interface - Murum Charta,517330
1,"[, About This Content, Unlocks three new ultra...",Stegalosaurus Game Development,"[Adventure, Indie, RPG]",694780,http://store.steampowered.com/app/592200/,0.99,,2018-01-11,,"[Single-player, Downloadable Content, Steam Ac...","[Adventure, RPG, Indie]","SUPER ARMY OF TENTACLES 3, Winter Outfit Pack ...",592200
2,"[, About This Content, The Wilmington Internat...",Drawbridge Designs,[Simulation],623613,http://store.steampowered.com/app/269950/,19.99,Aerosoft GmbH,2018-01-11,,"[Single-player, Online Multi-Player, Local Mul...",[Simulation],X-Plane 11 - Add-on: Aerosoft - Airport Wilmin...,269950
3,"[, About This Content, Get ready for another h...",Thomson Interactive,[Simulation],642803,http://store.steampowered.com/app/24010/,19.99,Dovetail Games - Trains,2018-01-11,5 user reviews,"[Single-player, Downloadable Content, Steam Ac...",[Simulation],Train Simulator: RhB Enhancement Pack 02 Add-On,24010
4,"[, About This Content, With its three dark ele...",Bryan Minus,"[Adventure, Casual, Indie]",722550,http://store.steampowered.com/app/717830/,1.99,Minus Equals Plus,2018-01-11,,"[Single-player, Downloadable Content]","[Adventure, Indie, Casual]",Waiting For the Loop Official Soundtrack and EP,717830


In [39]:
cleaned_dlcs = cleaned_dlcs.drop(columns=['parent_game_url', 'sentiment'])

In [43]:
cleaned_dlcs.head()

Unnamed: 0,description,developer,genres,id,price,publisher,release_date,specs,tags,title,parent_game_id
0,"[, About This Content, ►, Murum Charta, Commun...",ImaginationOverflow,"[Action, Casual, Indie]",777450,0.0,,2017-12-30,"[Single-player, Downloadable Content, Steam Ac...","[Action, Indie, Casual]",Stellar Interface - Murum Charta,517330
1,"[, About This Content, Unlocks three new ultra...",Stegalosaurus Game Development,"[Adventure, Indie, RPG]",694780,0.99,,2018-01-11,"[Single-player, Downloadable Content, Steam Ac...","[Adventure, RPG, Indie]","SUPER ARMY OF TENTACLES 3, Winter Outfit Pack ...",592200
2,"[, About This Content, The Wilmington Internat...",Drawbridge Designs,[Simulation],623613,19.99,Aerosoft GmbH,2018-01-11,"[Single-player, Online Multi-Player, Local Mul...",[Simulation],X-Plane 11 - Add-on: Aerosoft - Airport Wilmin...,269950
3,"[, About This Content, Get ready for another h...",Thomson Interactive,[Simulation],642803,19.99,Dovetail Games - Trains,2018-01-11,"[Single-player, Downloadable Content, Steam Ac...",[Simulation],Train Simulator: RhB Enhancement Pack 02 Add-On,24010
4,"[, About This Content, With its three dark ele...",Bryan Minus,"[Adventure, Casual, Indie]",722550,1.99,Minus Equals Plus,2018-01-11,"[Single-player, Downloadable Content]","[Adventure, Indie, Casual]",Waiting For the Loop Official Soundtrack and EP,717830


In [44]:
# check for missing data
for column in cleaned_dlcs.columns:
    if np.any(pd.isnull(cleaned_dlcs[column])) == True:
        print(column)

developer
genres
publisher
release_date
title


In [45]:
print(len(cleaned_dlcs.loc[pd.isnull(cleaned_dlcs['release_date'])]))

34


In [46]:
len(cleaned_dlcs)

12571

In [53]:
# check data types
for column in cleaned_dlcs.columns:
    print(str(column) + ': ' +  str(cleaned_dlcs[column].dtype))

description: object
developer: object
genres: object
id: int64
price: float64
publisher: object
release_date: object
specs: object
tags: object
title: object
parent_game_id: int64


In [308]:
cleaned_dlcs.head()

Unnamed: 0,description,developer,genres,id,price,publisher,release_date,specs,tags,title,parent_game_id
0,"[, About This Content, ►, Murum Charta, Commun...",ImaginationOverflow,"[Action, Casual, Indie]",777450,0.0,,2017-12-30,"[Single-player, Downloadable Content, Steam Ac...","[Action, Indie, Casual]",Stellar Interface - Murum Charta,517330
1,"[, About This Content, Unlocks three new ultra...",Stegalosaurus Game Development,"[Adventure, Indie, RPG]",694780,0.99,,2018-01-11,"[Single-player, Downloadable Content, Steam Ac...","[Adventure, RPG, Indie]","SUPER ARMY OF TENTACLES 3, Winter Outfit Pack ...",592200
2,"[, About This Content, The Wilmington Internat...",Drawbridge Designs,[Simulation],623613,19.99,Aerosoft GmbH,2018-01-11,"[Single-player, Online Multi-Player, Local Mul...",[Simulation],X-Plane 11 - Add-on: Aerosoft - Airport Wilmin...,269950
3,"[, About This Content, Get ready for another h...",Thomson Interactive,[Simulation],642803,19.99,Dovetail Games - Trains,2018-01-11,"[Single-player, Downloadable Content, Steam Ac...",[Simulation],Train Simulator: RhB Enhancement Pack 02 Add-On,24010
4,"[, About This Content, With its three dark ele...",Bryan Minus,"[Adventure, Casual, Indie]",722550,1.99,Minus Equals Plus,2018-01-11,"[Single-player, Downloadable Content]","[Adventure, Indie, Casual]",Waiting For the Loop Official Soundtrack and EP,717830


In [309]:
cleaned_dlcs['publisher'].unique()

array(['', 'Aerosoft GmbH', 'Dovetail Games - Trains', ..., 'Motion Twin',
       'KoheiGallery', 'IzHard'], dtype=object)

In [51]:
cleaned_dlcs['developer'] = cleaned_dlcs['developer'].astype(str)
cleaned_dlcs['publisher'] = cleaned_dlcs['publisher'].astype(str)
cleaned_dlcs['title'] = cleaned_dlcs['title'].astype(str)
cleaned_dlcs['price'] = cleaned_dlcs['price'].astype(float)

In [52]:
cleaned_dlcs.to_csv('dlcs_cleaned.csv')

# Cleaning scraped DLC reviews data

In [7]:
# load scraped dlc reviews
data1 = pd.read_json('raw_data/reviews_1.json', dtype=False)
data2 = pd.read_json('raw_data/reviews_2.json', dtype=False)
data3 = pd.read_json('raw_data/reviews_3.json', dtype=False)
data4 = pd.read_json('raw_data/reviews_4.json', dtype=False)
data5 = pd.read_json('raw_data/reviews_5.json', dtype=False)

In [8]:
reviews_raw = pd.DataFrame(columns=data1.columns)
reviews_raw = reviews_raw.append(data1).append(data2).append(data3).append(data4)
print(len(reviews_raw))
reviews_raw.head()

73128


Unnamed: 0,compensation,date,hours,num_reviews,product_id,products,recommended,text,user_id,user_url,username
0,False,January-13,,35,642803,468.0,Recommended,"Okay, get thisyou can open the cab's side door...",,taschi,🚂🚃Taschi🚃🚃
1,False,January-12,,6,760610,888.0,Recommended,Bought for the pure respect of the original pr...,7.656119799989941e+16,,Crimson-Albedo
2,False,January-11,,37,760610,213.0,Recommended,One of my favorite pages is 23 1920.pngThey ar...,,MoeShan,MoeShan
3,False,January-15,,8,779280,164.0,Recommended,i have found my queen.,,plachtaA17,I know de wae
4,False,January-14,,94,779280,304.0,Recommended,Should be called: The Flat Pack,,disabledchildlite,Putotyra


In [9]:
# reset index and drop extraneous index column
reviews_raw = reviews_raw.reset_index().drop(columns='index')

#### Grabbing user_ids from the user_urls

In [10]:
# Steam has different ways to route to the user profiles
# by user_url: http://steamcommunity.com/id/user_url/
# by user id:  http://steamcommunity.com/profiles/user_id

api_key = 'your api key'
my_id = 'your steam id'

In [11]:
# write a function that grabs the user_id based on their user_url (if user_id is missing) 
# user_ids are required to make Steam API calls later
grabbed_user_ids = pd.Series(index = reviews_raw.index)
def get_user_id(row):
    """This function gets the user steamID64 from the user_url"""
    from urllib.request import Request, urlopen
    import json
    
    # if user_id is null, we want to get the user_id from the user_url
    if pd.isnull(row['user_id']):
        url = """
        http://api.steampowered.com/ISteamUser/ResolveVanityURL/v1/?key={}&steamid={}&format=json&vanityurl={}
        """.format(api_key, my_id, row['user_url'])
    
        print(row['user_url'])
        req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        data = urlopen(req).read().decode('utf-8')
        feed = json.loads(data)
        #print(feed)
        
        try:
            # grab user_id if it exists
            user_id = feed['response']['steamid']
            
            # append to instantiated series
            grabbed_user_ids.iloc[row.name] = user_id
            
            return user_id
        except:
            pass

In [13]:
# grab the user ids and print to a separate file in order to retrieve their owned games later
grabbed_user_ids = pd.Series(index = reviews_raw.index)
reviews_raw['grabbed_user_id'] = reviews_raw.apply(get_user_id, axis=1)
np.savetxt('user_id_1.txt', grabbed_user_ids, fmt='%s')

#### Unify user info columns to get the final user id

In [16]:
reviews_raw.head()

Unnamed: 0,compensation,date,hours,num_reviews,product_id,products,recommended,text,user_id,user_url,username,grabbed_user_id
0,False,January-13,,35,642803,468.0,Recommended,"Okay, get thisyou can open the cab's side door...",,taschi,🚂🚃Taschi🚃🚃,7.656119806701994e+16
1,False,January-12,,6,760610,888.0,Recommended,Bought for the pure respect of the original pr...,7.656119799989941e+16,,Crimson-Albedo,
2,False,January-11,,37,760610,213.0,Recommended,One of my favorite pages is 23 1920.pngThey ar...,,MoeShan,MoeShan,7.656119815192173e+16
3,False,January-15,,8,779280,164.0,Recommended,i have found my queen.,,plachtaA17,I know de wae,7.656119814127744e+16
4,False,January-14,,94,779280,304.0,Recommended,Should be called: The Flat Pack,,disabledchildlite,Putotyra,7.656119806765352e+16


In [17]:
def combine_user_id(row):
    """This function combines the user_id and grabbed_user_id column into a new column"""
    
    # if grabbed_user_id is null, we want to get the user_id from the user_id (assuming it exists)
    if pd.isnull(row['grabbed_user_id']):
        if not pd.isnull(row['user_id']):
            return row['user_id']
    # if grabbed_user_id is not null, return original value
    else:
        return row['grabbed_user_id']

In [19]:
# perform apply operations to combine the grabbed_user_id and user_id
reviews_raw['final_user_id'] = reviews_raw.apply(combine_user_id, axis=1)

In [20]:
# check how many null values exist in the final_user_id column
null_user_id = reviews_raw.loc[pd.isnull(reviews_raw['final_user_id']), 'final_user_id']
print('Number of user data that cannot be tracked: '+str(len(null_user_id)))

Number of user data that cannot be tracked: 4


In [21]:
# inspect the null data rows
print(null_user_id)
print(reviews_raw.iloc[6174])
print(reviews_raw.iloc[31853])

6174     None
31853    None
49031    None
59322    None
Name: final_user_id, dtype: object
compensation                                                   False
date                                                    June-21-2017
hours                                                            NaN
num_reviews                                                       51
product_id                                                    210937
products                                                         117
recommended                                              Recommended
text               my favorite character i always played during t...
user_id                                                          NaN
user_url                                       Wrath_Of_The_Ancients
username                                                      Mhorti
grabbed_user_id                                                  NaN
final_user_id                                                   None
Name: 6174, 

In [23]:
# save current process to file
#reviews_raw.to_csv('raw_data/raw_dlc_review_data_1.csv')

In [25]:
reviews_raw.head()

Unnamed: 0,compensation,date,hours,num_reviews,product_id,products,recommended,text,user_id,user_url,username,grabbed_user_id,final_user_id
0,False,January-13,,35,642803,468.0,Recommended,"Okay, get thisyou can open the cab's side door...",,taschi,🚂🚃Taschi🚃🚃,7.656119806701994e+16,76561198067019938
1,False,January-12,,6,760610,888.0,Recommended,Bought for the pure respect of the original pr...,7.656119799989941e+16,,Crimson-Albedo,,76561197999899413
2,False,January-11,,37,760610,213.0,Recommended,One of my favorite pages is 23 1920.pngThey ar...,,MoeShan,MoeShan,7.656119815192173e+16,76561198151921721
3,False,January-15,,8,779280,164.0,Recommended,i have found my queen.,,plachtaA17,I know de wae,7.656119814127744e+16,76561198141277447
4,False,January-14,,94,779280,304.0,Recommended,Should be called: The Flat Pack,,disabledchildlite,Putotyra,7.656119806765352e+16,76561198067653526


### Cleaning

In [27]:
# check data types
cleaned_reviews = reviews_raw
for column in cleaned_reviews.columns:
    print(str(column) + ': ' +  str(cleaned_reviews[column].dtype))

compensation: object
date: object
hours: float64
num_reviews: object
product_id: object
products: float64
recommended: object
text: object
user_id: object
user_url: object
username: object
grabbed_user_id: object
final_user_id: object


In [28]:
# check for missing data
for column in cleaned_reviews.columns:
    if np.any(pd.isnull(cleaned_reviews[column])) == True:
        print(column)

hours
products
user_id
user_url
username
grabbed_user_id
final_user_id


#### Investigate missing data

In [30]:
# inspecting null columns of products, seems like NaN was pulled for private user profiles
private_users = cleaned_reviews.loc[pd.isnull(cleaned_reviews['products']) == True]
print(len(private_users))
private_users.index

66


Int64Index([ 2595,  5440,  8426,  9419, 15259, 15633, 15648, 15722, 15764,
            16118, 16247, 16806, 17655, 18553, 23001, 26733, 29398, 29595,
            29612, 29694, 30033, 30928, 32197, 32571, 32620, 34029, 35605,
            35652, 36039, 36194, 38084, 38538, 39738, 42890, 45200, 46640,
            46977, 48217, 49655, 49738, 50874, 51149, 51596, 52182, 54485,
            54488, 55437, 56445, 56468, 57183, 64459, 64508, 64609, 64614,
            65975, 66466, 68562, 68564, 68677, 68849, 68972, 69234, 69960,
            70533, 70564, 70946],
           dtype='int64')

In [33]:
# inspect missing ids

# Create variable with TRUE if user_id is missing
missing_id = pd.isnull(cleaned_reviews['user_id']) == True
# Create variable with TRUE if grabbed_user_id is missing
missing_grabbed_id = pd.isnull(cleaned_reviews['grabbed_user_id']) == True

# Select all cases where user ID is missing
no_id = cleaned_reviews[missing_id & missing_grabbed_id]
no_id.index

Int64Index([6174, 31853, 49031, 59322], dtype='int64')

In [34]:
# drop rows with missing user ids and products
drop_indices = private_users.index.append(no_id.index)
print(len(drop_indices))

# Drop rows with missing product information and user id information
cleaned_reviews = cleaned_reviews.drop(drop_indices)

70


In [36]:
# check if the rows were dropped correctly
print('before dropping rows with missing products: ' + str(len(reviews_raw)))
print('after dropping rows with missing products: ' + str(len(cleaned_reviews)))

before dropping rows with missing products: 73128
after dropping rows with missing products: 73058


In [37]:
# fill in missing hours
cleaned_reviews["hours"].fillna(0.0, inplace=True)

In [43]:
# fill in missing text fields with empty string
cleaned_reviews["text"] = cleaned_reviews["text"].fillna('')

In [44]:
# check progress so far
for column in cleaned_reviews.columns:
    if np.any(pd.isnull(cleaned_reviews[column])) == True:
        print(column)

user_id
user_url
username
grabbed_user_id


In [46]:
# drop unnecessary columns
cleaned_reviews = cleaned_reviews.drop(columns=['user_id', 'user_url', 'username', 'grabbed_user_id'])
cleaned_reviews.head()

Unnamed: 0,compensation,date,hours,num_reviews,product_id,products,recommended,text,final_user_id
0,False,January-13,0.0,35,642803,468.0,Recommended,"Okay, get thisyou can open the cab's side door...",76561198067019938
1,False,January-12,0.0,6,760610,888.0,Recommended,Bought for the pure respect of the original pr...,76561197999899413
2,False,January-11,0.0,37,760610,213.0,Recommended,One of my favorite pages is 23 1920.pngThey ar...,76561198151921721
3,False,January-15,0.0,8,779280,164.0,Recommended,i have found my queen.,76561198141277447
4,False,January-14,0.0,94,779280,304.0,Recommended,Should be called: The Flat Pack,76561198067653526


#### Convert data types

In [48]:
cleaned_reviews['compensation'].unique()

array([False, True], dtype=object)

In [49]:
# convert compensation to boolean
cleaned_reviews.loc[cleaned_reviews['compensation'] == True, 'compensation'] = 1
cleaned_reviews.loc[cleaned_reviews['compensation'] == False, 'compensation'] = 0

In [51]:
cleaned_reviews['recommended'].unique()

array(['Recommended', 'Not Recommended'], dtype=object)

In [52]:
# convert recommended to boolean
cleaned_reviews.loc[cleaned_reviews['recommended'] == 'Recommended', 'recommended'] = 1
cleaned_reviews.loc[cleaned_reviews['recommended'] == 'Not Recommended', 'recommended'] = 0

In [113]:
cleaned_reviews.head(20)

Unnamed: 0,compensation,date,hours,num_reviews,product_id,products,recommended,text,final_user_id
0,0,January-13,0.0,35,642803,468.0,1,"Okay, get thisyou can open the cab's side door...",76561198067019938
1,0,January-12,0.0,6,760610,888.0,1,Bought for the pure respect of the original pr...,76561197999899413
2,0,January-11,0.0,37,760610,213.0,1,One of my favorite pages is 23 1920.pngThey ar...,76561198151921721
3,0,January-15,0.0,8,779280,164.0,1,i have found my queen.,76561198141277447
4,0,January-14,0.0,94,779280,304.0,1,Should be called: The Flat Pack,76561198067653526
5,0,January-14,0.0,11,779280,90.0,1,Nice Skill and Damage,76561198374084021
6,0,January-13,0.0,22,779280,127.0,1,For my waifu.,76561198029850210
7,0,January-12,0.0,13,779280,259.0,1,"as I review in main gameyou can't play as ""Sak...",76561198056989242
8,0,July-8-2014,0.1,2,25940,197.0,1,Under System requirements it says: Controller ...,76561198037549250
9,0,December-6-2017,0.0,19,8650,8.0,0,Can't get it to work. What a shame.,76561198343391469


In [124]:
# process date so that both pandas and psql will understand
def format_date(date):
    date = str(date)
    if len(date) < 2:
        return '0' + date
    else:
        return date
    
def convert_date(row):
    """This function converts the date string to date ISO 8601 format"""
    import datetime
    
    # mapping string to int
    map_month = {'January': 1, 'February': 2, 'March': 3, 'April': 4, 'May': 5, 'June': 6, 'July': 7, 'August': 8, 'September': 9, 'October': 10, 'November': 11, 'December': 12}
    
    # process string and map to dateformat that datetime will understand
    date_list = row['date'].split('-')
    month = format_date(str(map_month[date_list[0]]))
    day = format_date(str(date_list[1]))
    if len(date_list) == 2:
        year = '2018'
    else:
        year = str(date_list[2])
    
    date = month + day + year

    return datetime.datetime.strptime(date, "%m%d%Y").isoformat()

In [125]:
cleaned_reviews['date_iso'] = cleaned_reviews.apply(convert_date, axis=1)

In [128]:
cleaned_reviews = cleaned_reviews.drop(columns=['date'])
cleaned_reviews.head()

Unnamed: 0,compensation,hours,num_reviews,product_id,products,recommended,text,final_user_id,date_iso
0,0,0.0,35,642803,468.0,1,"Okay, get thisyou can open the cab's side door...",76561198067019938,2018-01-13T00:00:00
1,0,0.0,6,760610,888.0,1,Bought for the pure respect of the original pr...,76561197999899413,2018-01-12T00:00:00
2,0,0.0,37,760610,213.0,1,One of my favorite pages is 23 1920.pngThey ar...,76561198151921721,2018-01-11T00:00:00
3,0,0.0,8,779280,164.0,1,i have found my queen.,76561198141277447,2018-01-15T00:00:00
4,0,0.0,94,779280,304.0,1,Should be called: The Flat Pack,76561198067653526,2018-01-14T00:00:00


In [135]:
# check data types
for column in cleaned_reviews.columns:
    print(str(column) + ': ' +  str(cleaned_reviews[column].dtype))

compensation: int64
hours: float64
num_reviews: int64
product_id: int64
products: float64
recommended: int64
text: object
final_user_id: int64
date_iso: object


In [136]:
cleaned_reviews.head()

Unnamed: 0,compensation,hours,num_reviews,product_id,products,recommended,text,final_user_id,date_iso
0,0,0.0,35,642803,468.0,1,"Okay, get thisyou can open the cab's side door...",76561198067019938,2018-01-13T00:00:00
1,0,0.0,6,760610,888.0,1,Bought for the pure respect of the original pr...,76561197999899413,2018-01-12T00:00:00
2,0,0.0,37,760610,213.0,1,One of my favorite pages is 23 1920.pngThey ar...,76561198151921721,2018-01-11T00:00:00
3,0,0.0,8,779280,164.0,1,i have found my queen.,76561198141277447,2018-01-15T00:00:00
4,0,0.0,94,779280,304.0,1,Should be called: The Flat Pack,76561198067653526,2018-01-14T00:00:00


In [129]:
cleaned_reviews.to_csv('reviews_cleaned.csv')

# Cleaning game data

In [2]:
games_raw = pd.read_json('raw_data/games_all.json')

In [3]:
print(len(games_raw))
games_raw.head()

32645


Unnamed: 0,app_name,description,developer,discount_price,early_access,genres,id,metascore,parent_game,parent_game_url,price,publisher,release_date,reviews_url,sentiment,specs,tags,title,url
0,The Hardest Thing,"[, About This Game, Do you love puzzles? What ...",ChillFun,,False,[Strategy],773510.0,,,,14.99,ChillFun,2018-01-17,http://steamcommunity.com/app/773510/reviews/?...,,"[Single-player, Partial Controller Support, St...","[Strategy, Cartoon, Puzzle, First-Person, Phys...",The Hardest Thing,http://store.steampowered.com/app/773510/The_H...
1,Serious Office,"[, About This Game, , Dear Bob,, I know you kn...",NOS3D,,True,"[Action, Casual, Indie, Massively Multiplayer,...",776170.0,,,,,NOS3D,Jan 2018,http://steamcommunity.com/app/776170/reviews/?...,,"[Multi-player, Online Multi-Player, Partial Co...","[Early Access, Strategy, Action, Massively Mul...",Serious Office,http://store.steampowered.com/app/776170/Serio...
2,OVERVIEW,"[, About This Game, OVERVIEW is a 30 minutes n...",,,False,,751110.0,,,,,,,http://steamcommunity.com/app/751110/reviews/?...,,"[Single-player, HTC Vive, Oculus Rift, Tracked...","[Adventure, Casual, Simulation]",,http://store.steampowered.com/app/751110/OVERV...
3,Global Soccer Manager 2017,"[, About This Game, In Global Soccer Manager 2...","gsmpcgame,globalsoccermanager,Andrea Hochstein",,False,"[Casual, Indie, Simulation, Sports, Strategy]",625700.0,,,,9.99,gsmpcgame,2017-05-24,http://steamcommunity.com/app/625700/reviews/?...,Positive,"[Single-player, Steam Achievements, Steam Trad...","[Sports, Strategy, Simulation, Indie, Casual, ...",Global Soccer Manager 2017,http://store.steampowered.com/app/625700/Globa...
4,Full Metal Furies - Soundtrack,"[, About This Content, Loved the game? Then w...",A Shell in the Pit,,False,"[Action, Adventure, Indie, RPG]",788060.0,,Full Metal Furies,http://store.steampowered.com/app/416600/,7.99,Cellar Door Games,2018-01-17,http://steamcommunity.com/app/788060/reviews/?...,,"[Single-player, Co-op, Online Co-op, Local Co-...","[Action, Adventure, RPG, Indie]",Full Metal Furies - Soundtrack,http://store.steampowered.com/app/788060/Full_...


In [12]:
game_specs

{'Captions available': 0,
 'Co-op': 0,
 'Commentary available': 0,
 'Cross-Platform Multiplayer': 0,
 'Downloadable Content': 0,
 'Full controller support': 0,
 'Game demo': 0,
 'Gamepad': 0,
 'HTC Vive': 0,
 'In-App Purchases': 0,
 'Includes Source SDK': 0,
 'Includes level editor': 0,
 'Keyboard / Mouse': 0,
 'Local Co-op': 0,
 'Local Multi-Player': 0,
 'MMO': 0,
 'Mods': 0,
 'Mods (require HL1)': 0,
 'Mods (require HL2)': 0,
 'Multi-player': 0,
 'Oculus Rift': 0,
 'Online Co-op': 0,
 'Online Multi-Player': 0,
 'Partial Controller Support': 0,
 'Room-Scale': 0,
 'Seated': 0,
 'Shared/Split Screen': 0,
 'Single-player': 0,
 'Standing': 0,
 'Stats': 0,
 'Steam Achievements': 0,
 'Steam Cloud': 0,
 'Steam Leaderboards': 0,
 'Steam Trading Cards': 0,
 'Steam Turn Notifications': 0,
 'Steam Workshop': 0,
 'SteamVR Collectibles': 0,
 'Tracked Motion Controllers': 0,
 'Valve Anti-Cheat enabled': 0,
 'Windows Mixed Reality': 0}

In [10]:
game_specs = {}
game_specs_count = {}
#unpack dlc tags and put it in dictionary
def unpack_specs(row):
    """This function unpacks the list of genres and stores it into a dictionary"""
    specs = row['specs']
    #print(tags)
    try:
        for i in range(len(specs)):
            if specs[i] not in game_specs:
                game_specs[specs[i]] = 0
                game_specs_count[specs[i]] = 0
            else:
                game_specs_count[specs[i]] += 1
    except:
        pass

games_raw.apply(unpack_specs, axis=1)

0        None
1        None
2        None
3        None
4        None
5        None
6        None
7        None
8        None
9        None
10       None
11       None
12       None
13       None
14       None
15       None
16       None
17       None
18       None
19       None
20       None
21       None
22       None
23       None
24       None
25       None
26       None
27       None
28       None
29       None
         ... 
32615    None
32616    None
32617    None
32618    None
32619    None
32620    None
32621    None
32622    None
32623    None
32624    None
32625    None
32626    None
32627    None
32628    None
32629    None
32630    None
32631    None
32632    None
32633    None
32634    None
32635    None
32636    None
32637    None
32638    None
32639    None
32640    None
32641    None
32642    None
32643    None
32644    None
Length: 32645, dtype: object

In [13]:
# save dictionaries as csv
(pd.DataFrame.from_dict(data=game_specs_count, orient='index').
to_csv('game_specs_count.csv', header=True))

(pd.DataFrame.from_dict(data=game_specs, orient='index').
to_csv('game_specs.csv', header=True))

In [64]:
game_tags = {}
game_tags_count = {}
#unpack dlc tags and put it in dictionary
def unpack_tags(row):
    """This function unpacks the list of genres and stores it into a dictionary"""
    tags = row['tags']
    #print(tags)
    try:
        for i in range(len(tags)):
            if tags[i] not in game_tags:
                game_tags[tags[i]] = 0
                game_tags_count[tags[i]] = 0
            else:
                game_tags_count[tags[i]] += 1
    except:
        pass

games_raw.apply(unpack_tags, axis=1)

0        None
1        None
2        None
3        None
4        None
5        None
6        None
7        None
8        None
9        None
10       None
11       None
12       None
13       None
14       None
15       None
16       None
17       None
18       None
19       None
20       None
21       None
22       None
23       None
24       None
25       None
26       None
27       None
28       None
29       None
         ... 
32615    None
32616    None
32617    None
32618    None
32619    None
32620    None
32621    None
32622    None
32623    None
32624    None
32625    None
32626    None
32627    None
32628    None
32629    None
32630    None
32631    None
32632    None
32633    None
32634    None
32635    None
32636    None
32637    None
32638    None
32639    None
32640    None
32641    None
32642    None
32643    None
32644    None
Length: 32645, dtype: object

In [69]:
# save dictionaries as csv
(pd.DataFrame.from_dict(data=game_tags_count, orient='index').
to_csv('game_tags_count.csv', header=True))

(pd.DataFrame.from_dict(data=game_tags, orient='index').
to_csv('game_tags.csv', header=True))

In [70]:
game_genres = {}
game_genres_count = {}
#unpack dlc genre and put it in dictionary
def unpack_genre(row):
    """This function unpacks the list of genres and stores it into a dictionary"""
    genres = row['genres']
    #print(genres)
    try:
        for i in range(len(genres)):
            if genres[i] not in game_genres:
                game_genres[genres[i]] = 0
                game_genres_count[genres[i]] = 0
            else:
                game_genres_count[genres[i]] += 1
    except:
        pass

games_raw.apply(unpack_genre, axis=1)

0        None
1        None
2        None
3        None
4        None
5        None
6        None
7        None
8        None
9        None
10       None
11       None
12       None
13       None
14       None
15       None
16       None
17       None
18       None
19       None
20       None
21       None
22       None
23       None
24       None
25       None
26       None
27       None
28       None
29       None
         ... 
32615    None
32616    None
32617    None
32618    None
32619    None
32620    None
32621    None
32622    None
32623    None
32624    None
32625    None
32626    None
32627    None
32628    None
32629    None
32630    None
32631    None
32632    None
32633    None
32634    None
32635    None
32636    None
32637    None
32638    None
32639    None
32640    None
32641    None
32642    None
32643    None
32644    None
Length: 32645, dtype: object

In [72]:
# save dictionaries as csv
(pd.DataFrame.from_dict(data=game_genres_count, orient='index').
to_csv('game_genres_count.csv', header=True))

(pd.DataFrame.from_dict(data=game_genres, orient='index').
to_csv('game_genres.csv', header=True))

In [365]:
cleaned_games = games_raw

In [366]:
# check data types
def check_df(df):    
    for column in cleaned_games.columns:
        if np.any(pd.isnull(cleaned_games[column])) == True:
            print(str(column) + ': ' +  str(cleaned_games[column].dtype) + "    (has NaN)")
        else:
            print(str(column) + ': ' +  str(cleaned_games[column].dtype))

In [294]:
check_df(cleaned_games)

description: object    (has NaN)
developer: object    (has NaN)
genres: object    (has NaN)
id: float64    (has NaN)
parent_game: object    (has NaN)
parent_game_url: object    (has NaN)
price: object    (has NaN)
publisher: object    (has NaN)
release_date: object    (has NaN)
sentiment: object    (has NaN)
specs: object    (has NaN)
tags: object    (has NaN)
title: object    (has NaN)


In [367]:
cleaned_games = cleaned_games.drop(columns=['app_name', 'discount_price', 'metascore', 'reviews_url', 'url', 'early_access'])

In [69]:
cleaned_games.head()

Unnamed: 0,description,developer,genres,id,parent_game,parent_game_url,price,publisher,release_date,sentiment,specs,tags,title
0,"[, About This Game, Do you love puzzles? What ...",ChillFun,[Strategy],773510.0,,,14.99,ChillFun,2018-01-17,,"[Single-player, Partial Controller Support, St...","[Strategy, Cartoon, Puzzle, First-Person, Phys...",The Hardest Thing
1,"[, About This Game, , Dear Bob,, I know you kn...",NOS3D,"[Action, Casual, Indie, Massively Multiplayer,...",776170.0,,,,NOS3D,Jan 2018,,"[Multi-player, Online Multi-Player, Partial Co...","[Early Access, Strategy, Action, Massively Mul...",Serious Office
2,"[, About This Game, OVERVIEW is a 30 minutes n...",,,751110.0,,,,,,,"[Single-player, HTC Vive, Oculus Rift, Tracked...","[Adventure, Casual, Simulation]",
3,"[, About This Game, In Global Soccer Manager 2...","gsmpcgame,globalsoccermanager,Andrea Hochstein","[Casual, Indie, Simulation, Sports, Strategy]",625700.0,,,9.99,gsmpcgame,2017-05-24,Positive,"[Single-player, Steam Achievements, Steam Trad...","[Sports, Strategy, Simulation, Indie, Casual, ...",Global Soccer Manager 2017
4,"[, About This Content, Loved the game? Then w...",A Shell in the Pit,"[Action, Adventure, Indie, RPG]",788060.0,Full Metal Furies,http://store.steampowered.com/app/416600/,7.99,Cellar Door Games,2018-01-17,,"[Single-player, Co-op, Online Co-op, Local Co-...","[Action, Adventure, RPG, Indie]",Full Metal Furies - Soundtrack


In [368]:
# inspect missing publisher / developer

# Create variable with TRUE if publisher is missing
missing_pub = pd.isnull(cleaned_games['publisher']) == True
# Create variable with TRUE if developer is missing
missing_dev = pd.isnull(cleaned_games['developer']) == True
# Create variable with False if parent game is missing (i.e. True if parent game exists, thus is a DLC)
existing_parent = pd.isnull(cleaned_games['parent_game_url']) == False

# Drop rows where all three fields are missing
#no_pub_dev_parent = cleaned_dlcs[missing_pub & missing_dev & missing_parent]
#print(len(no_pub_dev_parent))
#cleaned_dlcs.drop(no_pub_dev_parent.index, inplace=True)

# Drop rows where parent game exists because this is a DLC
exist_parent = cleaned_games[existing_parent]
print(len(exist_parent))
cleaned_games.drop(exist_parent.index, inplace=True)

12680


In [369]:
print(len(cleaned_games))

19965


#### Original scraper choked on games with VR info because the html structure is different for these games

- need to re-scrape these games for info

In [277]:
need_more_info = cleaned_games[missing_pub & missing_dev]
#need_more_info.tail()

  """Entry point for launching an IPython kernel.


In [400]:
x = 'Jan 2018'
def standardize_date(x):
    from datetime import datetime, date
    
    try:
        if datetime.strptime(x, '%Y-%m-%d'):
            return x
    except:
        for fmt in ["%b %d, %Y", "%B %d, %Y"]:
            try:
                return datetime.strptime(x, fmt).strftime("%Y-%m-%d")
            except:
                fmt_fail = True

        for fmt in ["%b %Y", "%B %Y"]:
            try:
                d = datetime.strptime(x, fmt)
                d = d.replace(day=1)
                return d.strftime("%Y-%m-%d")
            except:
                fmt_fail = True
        
print(standardize_date(x))

2018-01-01


In [263]:
# set up beautiful soup to scrape info for these 3k games
# title, developer, genres, parent game, publisher, release_date, sentiment
from urllib.request import Request, urlopen
import json
from bs4 import BeautifulSoup
import re

info = pd.DataFrame(index=need_more_info.index, columns=['title','genres','developer','publisher','release_date'])
def scrape_game_info(row):
    print(row.name)
    url = "http://store.steampowered.com/app/{}".format(row['id'])
#     url = 'http://store.steampowered.com/app/251810/Leadwerks_Game_Engine/'
    page = urlopen(url) 
    soup = BeautifulSoup(page,'lxml') # scrape page
    details = soup.findAll("div", {"class": "details_block"}) # fetch details block div

    import collections
    def flatten(l):
        for el in l:
            if isinstance(el, collections.Iterable) and not isinstance(el, (str, bytes)):
                yield from flatten(el)
            else:
                yield el

    #print(details)
    for item in details:
        title = re.findall("Title:\<\/b\>\s+([\w+\s+\w+]+)\<br", str(item))
        #print(title)
        #import pdb; pdb.set_trace()
        if title:
            detail = item.get_text().split('<br/>')
            #print(detail)

            items = item.get_text().split('\n')
            items2 = list(flatten([item.split(':') for item in items]))
            items2 = [x for x in items2 if x]
            #print(items2)

            for prop, name in [
                ('Title', 'title'),
                ('Genre', 'genres'),
                ('Developer', 'developer'),
                ('Publisher', 'publisher'),
                ('Release Date', 'release_date')
            ]:
                #print(prop,name)
                try:
                    foundIndex = items2.index(prop)
                    if prop == 'Genre':
                        value = items2[foundIndex+1].strip().split(',')
                        info.loc[row.name,'genres'] = value
                        print('genre: ' + str(value))
                    elif prop == 'Release Date':
                        value = standardize_date(items2[foundIndex+1].strip())
                        info.loc[row.name,'release_date'] = value
                        print(value)
                    else:
                        value = items2[foundIndex+1].strip()
                        info.loc[row.name,name] = value
                        print(value)
                except:
                    pass

In [265]:
cleaned_games[missing_pub & missing_dev].apply(scrape_game_info, axis=1)

  """Entry point for launching an IPython kernel.


2
OVERVIEW
genre: ['Adventure', ' Casual', ' Simulation']
Orbital Views
Orbital Views
2018-01-01
33
35
Second Nature
2017-09-18
56
The Iron Lady
2012-04-10
57
LIV Client
genre: ['Utilities', ' Video Production', ' Web Publishing']
LIV Inc
LIV Inc
2018-01-01
70
Awakening
genre: ['Animation & Modeling', ' Design & Illustration', ' Utilities']
AwingSoft
AwingSoft
2017-08-15
78
Mech League Boxing
genre: ['Action', ' Simulation', ' Sports']
VRGEN
VRGEN
2018-01-10
109
The Defector
2018-01-06
111
347
1439
War Thunder
genre: ['Action', ' Free to Play', ' Massively Multiplayer', ' Simulation']
Gaijin Entertainment
Gaijin Entertainment
2013-08-15
1474
1480
Legend of Dungeon
genre: ['Action', ' Indie', ' RPG']
Robot Loves Kitty
2013-09-13
1576
1615
1616
1631
1638
1639
1692
Leadwerks Game Engine
genre: ['Animation & Modeling', ' Design & Illustration', ' Education', ' Software Training', ' Utilities']
Leadwerks Software
Leadwerks Software
2014-01-06
1700
1701
1702
1703
2265
Infinity Runner
genre: 

5830
Sketchfab VR
genre: ['Adventure']
Sketchfab
Sketchfab
2016-05-13
5831
Dolphin Defense
genre: ['Action', ' Adventure', ' Indie']
Tanner Thayer
Tanner Thayer
2016-05-16
5844
Gods of Egypt
2016-05-17
5853
5864
5887
Time Machine VR
genre: ['Adventure', ' Indie', ' Simulation']
Minority Media Inc.
Minority Media Inc.
2016-05-19
5889
Canis Belli
2016-03-11
5900
Holodaze
genre: ['Action', ' Casual', ' Indie']
Sysdia Games
Sysdia Games
2016-05-19
5907
bloxyz
genre: ['Casual', ' Indie']
Svution
Svution
2016-05-19
5910
VR Karts SteamVR
genre: ['Casual', ' Indie', ' Racing']
Viewpoint Games
Viewpoint Games
2016-05-20
5915
Please Hold
2016-04-05
5916
5932
Surge
genre: ['Indie']
House of Secrets
House of Secrets
2016-05-19
5949
vrAMP
genre: ['Casual', ' Indie', ' Early Access']
Oriku Inc
Oriku Inc
2016-05-27
5964
Leave The Nest
genre: ['Action', ' Casual', ' Indie']
Kaio Interactive
Kaio Interactive
2016-05-27
5970
5971
Waltz of the Wizard
genre: ['Adventure', ' Indie', ' Simulation']
Aldin Dy

Artstage
genre: ['Animation & Modeling', ' Design & Illustration', ' Early Access']
Soy-software
Soy-software
2016-09-01
6804
Caketomino
genre: ['Casual', ' Indie']
Smoketree Studios
Smoketree Studios
2016-09-02
6807
Kubz VR
genre: ['Casual', ' Indie']
bibimbapstudio
bibimbapstudio
2016-09-05
6819
6820
GE Neuro
genre: ['Casual', ' Simulation']
Kite & Lightning
Kite & Lightning
2016-09-05
6835
Ten Little Roosters
2014-11-05
6836
Compadres
2016-09-06
6842
Temple of the Apsara
genre: ['Adventure', ' Indie']
Attraction Studios
Attraction Studios
2016-09-06
6843
6858
CloudBound
genre: ['Action', ' Adventure', ' Casual', ' Indie', ' Early Access']
Silicon Storm
Silicon Storm
2016-09-07
6868
Cockroach VR
genre: ['Casual', ' Indie']
TANTANMEN
TANTANMEN
2016-09-08
6878
Final Fleet
genre: ['Action', ' Adventure', ' Indie', ' Early Access']
Team2Bit
Team2Bit
2016-09-08
6882
Mars Odyssey
genre: ['Simulation']
Steel Wool Studios
Steel Wool Studios
2016-09-08
6889
Raptor Valley
genre: ['Action', ' A

7751
7752
Kingspray Graffiti VR
genre: ['Simulation']
Andrew Bates
Infectious Ape
2016-12-06
7756
7757
7758
7772
7790
7796
Broken Blue
genre: ['Casual', ' Free to Play', ' Indie', ' Simulation']
BacklotRealities
Cacti Council, BacklotRealities
2016-12-05
7798
WackyMoles
genre: ['Action', ' Casual', ' Simulation', ' Sports']
CrystalGame
CrystalGame
2016-12-12
7800
7802
Bleach
2004-10-05
7806
Tornuffalo
genre: ['Action']
RealityRig
RealityRig
2016-12-12
7837
Wipe Out VR
genre: ['Simulation']
PnagaeaVR
PnagaeaVR
2016-11-10
7844
Arma 3 Community Guide Series
2013-03-07
7856
Children of Colossus
genre: ['Action', ' Indie']
Colossus Interactive
Colossus Interactive
2016-12-15
7863
Alpine Ski VR
genre: ['Casual', ' Racing', ' Sports']
Suchworks Ltd
Suchworks Ltd
2016-12-16
7865
7866
Snow Fortress
genre: ['Action', ' Casual', ' Indie', ' Simulation', ' Strategy']
Mythical City Games
Mythical City Games
2016-12-16
7877
Dragon Skies VR
genre: ['Action', ' Adventure', ' RPG']
Tristan Hodges
FoxLe

8396
HOMEBOUND
genre: ['Action', ' Adventure', ' Casual']
Quixel
Quixel
2017-02-16
8406
8418
8423
IrreVRsible
genre: ['Action', ' Indie']
Raptor-Lab
Raptor-Lab
2016-05-24
8428
Neon Arena
genre: ['Action', ' Casual', ' Indie']
Axyos Games
Axyos Games
2017-02-20
8445
8446
Game of Aces
2016-09-09
8447
8448
8453
8455
Climbtime
genre: ['Casual', ' Indie']
Saluda Systems
Saluda Systems
2017-05-03
8460
MixCast Studio
genre: ['Education', ' Utilities', ' Video Production']
Blueprint Reality Inc.
Blueprint Reality Inc.
2017-02-22
8467
8468
Stolen Steel VR
genre: ['Action', ' Casual', ' Indie', ' Simulation']
Impromptu Games
Impromptu Games
2017-02-22
8481
Skills Hockey VR
genre: ['Sports', ' Early Access']
Justin Jimmo
Justin Jimmo
2017-02-23
8483
VR Meditation SkyRun
genre: ['Education']
Weekend Soft
Weekend Soft
2017-02-23
8484
Seabed Prelude
genre: ['Adventure', ' Casual', ' Indie', ' Simulation']
MythicOwl
MythicOwl
2017-02-24
8487
Magazime Editor
genre: ['Action', ' Indie', ' Simulation']


9228
VectorWars VR
genre: ['Action', ' Adventure', ' Casual', ' Indie']
Red Iron Labs
Red Iron Labs
2017-05-01
9231
VRIQ
genre: ['Free to Play', ' Simulation']
3DIQ
3DIQ
2017-05-03
9263
9299
Bounty Killer
genre: ['Action', ' Indie', ' RPG', ' Early Access']
Galaxy Game Studio
Galaxy Game Studio
2017-05-09
9302
Goaltender VR
genre: ['Casual', ' Sports', ' Early Access']
RunByCoffee
RunByCoffee
2017-05-09
9314
Blastercell
genre: ['Action', ' Casual', ' Indie']
Robot Games
SmokinSkull
2017-05-10
9315
Puppy Doge VR
genre: ['Casual', ' Indie', ' Simulation']
Chun Y.
Chun Y.
2017-05-10
9318
Battlezone
genre: ['Action']
Rebellion
Rebellion
2017-05-11
9322
Speed and Scream
genre: ['Casual', ' Indie', ' Simulation']
iNFINITE Production
iNFINITE Production
2017-04-30
9354
9356
Domain Defense VR
genre: ['Casual', ' Indie', ' Strategy']
Inclusion Studios
Inclusion Studios
2017-05-11
9357
Void Rangers
genre: ['Action', ' Indie']
Martin Senovsky
Martin Senovsky
2017-05-11
9362
9363
9364
9365
Making 

9906
9909
9931
The Dark Tapes
2017-03-17
9935
Retro Block VR
genre: ['Casual', ' Indie']
Mocove Studio
Mocove Studio
2017-07-10
9936
Tunnel Runner VR
genre: ['Casual', ' Simulation']
Mocove Studio
Mocove Studio
2017-07-10
9938
Survivor VR
genre: ['Action', ' Adventure', ' Simulation', ' Strategy']
Lucid Dream Studio
Lucid Dream Studio
2017-07-12
9960
9961
CloudCity VR
genre: ['Indie', ' Simulation', ' Strategy']
Floating Point Interactive
Floating Point Interactive
2017-07-13
9962
Balloon Chair Death Match
genre: ['Action']
Climax Studios Ltd
Climax Studios Ltd
2017-07-13
9974
Space Dream VR
genre: ['Adventure', ' Casual', ' Indie']
Davis3D
Davis3D
None
10002
10011
Simple Creature
2017-07-11
10012
Breakdance Academy
2017-07-11
10015
10023
Mortal Blitz
genre: ['Action']
Skonec Entertainment
Skonec Entertainment
2017-07-19
10024
OldMaidGirl
genre: ['Casual']
Vrai
Vrai
2017-07-19
10026
Cutlass
genre: ['Action', ' Adventure', ' Indie', ' Simulation']
Demigon
Demigon
2017-07-20
10031
Ultraw

Butterfly Moment
genre: ['Casual']
David Mulder
Red Splat Games
2017-09-22
10886
DWVR
genre: ['Action', ' Indie']
Mad Triangles
Mad Triangles
2017-09-26
10902
Waiting on Mary
2017-09-26
10903
The Magic Door
2007-03-04
10912
Cirque du Soleil
genre: ['Casual']
Felix and Paul Studios
Felix and Paul Studios
2017-09-01
10921
End of Days
genre: ['Early Access']
Life Art Studios
Life Art Studios
2017-09-27
10925
Moon VR Video Player
genre: ['Animation & Modeling', ' Education', ' Utilities', ' Video Production']
Rock VR
Rock VR
2017-09-27
10926
Gurugedara
genre: ['Free to Play', ' Massively Multiplayer']
Osmium
Osmium
2017-10-01
10927
10930
10932
10934
RuneSage
genre: ['Adventure', ' Indie', ' RPG']
George Gilbert
George Gilbert
2017-09-28
10943
STYLY
genre: ['Design & Illustration', ' Early Access']
Psychic VR Lab Co., Ltd.
Psychic VR Lab Co., Ltd.
2017-08-15
10958
Shoot Loop VR
genre: ['Action', ' Indie', ' Simulation']
ARLOOPA Inc.
ARLOOPA Inc.
2017-09-28
10962
Dronihilation VR
genre: ['Ac

11739
Naval Legends
2017-11-18
11748
Cove Point Fun Center VR
genre: ['Casual', ' Indie', ' Sports']
Armet Games
Armet Games
2017-11-20
11759
11769
11774
11775
11799
Vector Velocity
genre: ['Action', ' Casual']
David Mulder
Red Splat Games
2017-11-23
11800
Harvest Simulator VR
genre: ['Simulation']
Sakis25 Games
Bolt Virtual
2017-11-25
11806
Sprint Vector
genre: ['Action', ' Indie', ' Racing', ' Sports']
Survios
Survios
None
11809
Titanic VR
genre: ['Action', ' Simulation', ' Early Access']
Immersive VR Education Ltd
Immersive VR Education Ltd
2017-11-24
11825
Dream Channel
genre: ['Adventure', ' Casual', ' Free to Play']
Dream Channel
Dream Channel
2017-11-27
11828
11834
11841
AMON
genre: ['Casual', ' Indie']
Lykke Studios
Lykke Studios
2017-11-28
11852
11853
Z for Zachariah
2015-10-20
11854
Woman in Gold
2015-06-26
11862
11874
11877
Unknightly
genre: ['Casual', ' Indie', ' Early Access']
Portal Studios
Portal Studios
2017-11-30
11878
11879
11880
11884
Medieval Mayhem
genre: ['Casual'

12663
12669
REGENESIS Arcade DELUXE
genre: ['Action', ' Early Access']
Hyperbook Studio
Blue Technology Sp. z o.o.
2017-12-18
12676
Major League Gladiators
genre: ['Action', ' Free to Play', ' Indie']
Team Major League Gladiators
Team Major League Gladiators
2017-12-18
12688
WizzBall
genre: ['Action', ' Indie', ' Sports']
NRVR Studios
NRVR Studios
2017-12-15
12691
APEX Tournament
genre: ['Action', ' Indie', ' Early Access']
Douglas Liang
Fantom Fathom LLC
2017-12-15
12704
12717
12739
Zero Days VR
genre: ['Casual', ' Indie']
Scatter
Scatter
2017-06-08
12743
Show It 2 Me
genre: ['Casual', ' Indie']
Titmouse
Titmouse
2017-12-14
12747
The First Class VR
genre: ['Indie', ' Simulation']
Light & Digital Technology
Zodiac Interactive
2017-12-14
12759
YouTube VR
genre: ['Early Access']
Google
Google
2017-12-14
12762
12781
Manhattan
2015-12-16
12784
Nurse Jackie
2010-01-11
12786
Rocketjump
2016-09-13
12788
Mad Men
2008-09-25
12789
Orch Star
genre: ['Action', ' Indie', ' Strategy', ' Early Access

Fragments
genre: ['Adventure', ' Indie']
Pulsarium
Pulsarium
2017-11-09
13668
Naklua VR
genre: ['Adventure', ' Casual', ' Indie', ' Sports', ' Early Access']
Fly Dream Dev
Fly Dream Dev
2017-11-10
13678
13702
Miracle In The Woods
2017-11-08
13704
The Only Living Boy in New York
2017-10-20
13706
Order Up VR
genre: ['Casual', ' Indie', ' Simulation', ' Early Access']
Gambit Games Studio, LLC
Gambit Games Studio, LLC
2017-11-08
13709
vBuilder
genre: ['Animation & Modeling']
TSMOUNT, Inc.
TSMOUNT, Inc.
2017-11-08
13722
Roman Sacrifice in Córdoba
genre: ['Adventure', ' Casual']
Lithodomos VR
Lithodomos VR
2017-11-06
13740
Hide N Seek VR
genre: ['Casual', ' Indie', ' Early Access']
Antoine Rigitano
Infiniverse
2017-11-07
13741
SUPERHYPERCUBE
genre: ['Casual', ' Indie']
Kokoromi
Polytron
2017-11-07
13742
13745
Gravity Tunnel VR
genre: ['Simulation']
Qverty.com
Qverty.com
2017-11-06
13751
Sweaty Palms
genre: ['Action', ' Indie', ' Early Access']
Delattre & Harger
Delattre & Harger
2017-11-06
1

Stonehenge VR SANDBOX
genre: ['Casual']
VoyagerVR
VoyagerVR
2017-10-10
14409
14411
King of Spin VR
genre: ['Action', ' Casual', ' Simulation', ' Sports']
Wicked Witch
Wicked Witch
2017-10-10
14436
The Mix
2017-10-10
14445
14446
Knife Club VR
genre: ['Action', ' Indie', ' Early Access']
LiquidFire Entertainment
LiquidFire Entertainment
2017-10-09
14460
ASCII Wars
genre: ['Strategy']
James Oliver
James Oliver
None
14462
Toy Goblins
genre: ['Action', ' Adventure', ' Indie', ' RPG', ' Strategy', ' Early Access']
Lemonauts
Lemonauts
2017-10-05
14464
VR Triber
genre: ['Action', ' Adventure', ' Casual', ' Free to Play', ' Indie', ' Massively Multiplayer', ' Racing', ' Simulation', ' Sports', ' Early Access']
VR Triber
VR Triber
2017-09-26
14474
Portrait Drawing Fundamentals Course
2012-11-30
14478
Healing The Stupid
2015-03-01
14480
The Zero Dome
genre: ['Action', ' Early Access']
Pillow Head Games
Pillow Head Games
None
14485
Marshmallow Melee
genre: ['Action', ' Adventure', ' Indie', ' RPG'

Kill Switch
2017-06-16
15594
Wingless
genre: ['Action', ' Casual', ' Indie']
Kentoo Sp. z o.o.
Kentoo Sp. z o.o.
2017-08-22
15628
15631
15644
VR Table Sports
genre: ['Casual', ' Indie', ' Simulation', ' Sports']
Happy Bat
Happy Bat
2017-08-19
15646
15671
Chocolate
genre: ['Casual', ' Indie']
Gentle Manhands
Viacom NEXT
2017-08-17
15689
The Museum of ThroughView
genre: ['Adventure', ' Indie', ' Simulation']
Erwin Wolf
MarWin Studios, ThroughView
2017-08-17
15693
15712
VR2Space
genre: ['Casual', ' Indie', ' Simulation']
Infinite Void Ltd.
Indie
2017-08-16
15729
How to be a Latin Lover
2017-06-20
15737
15741
ChainMan
genre: ['Action', ' Adventure', ' Indie']
ouka-ichi-mon.inc
ouka-ichi-mon.inc
2017-08-15
15743
CollabHub
genre: ['Design & Illustration', ' Utilities', ' Early Access']
CollabHub
CollabHub
2017-08-14
15752
MermaidVR Video Player
genre: ['Utilities', ' Video Production']
Mermaid VR Inc.
Mermaid VR Inc.
2017-08-15
15758
Impromptu Vector Field Painter
genre: ['Animation & Modeli

Dabda
genre: ['Adventure', ' Casual', ' Indie']
2017-06-30
16629
Block Rocking Beats
genre: ['Casual', ' Indie', ' Simulation', ' Early Access']
Sander Sneek
SNAKE Productions
2017-06-30
16632
16635
16637
Puttyface
genre: ['Free to Play', ' Indie', ' Simulation', ' Early Access']
the3dCrew
the3dCrew
2017-03-10
16639
16651
16673
16674
16676
Ghost Ship
2017-06-27
16691
Gus Track Adventures VR
genre: ['Action', ' Adventure', ' Indie']
maxlvlgames
maxlvlgames
2017-06-27
16692
16718
16719
Xion
genre: ['Action', ' Indie', ' Early Access']
Zenz VR
Zenz VR
2017-06-23
16722
Luxin Time
genre: ['Action', ' Casual', ' Indie', ' Early Access']
ExtraSaucySauce
ExtraSaucySauce
2017-06-23
16724
Escape Together
genre: ['Adventure', ' Casual', ' Indie', ' Strategy']
2017-06-21
16727
16748
Miss Bernard Said
2017-06-21
16750
Kemono Friends
2017-06-21
16751
16753
Hundred
2017-06-21
16754
Big Order
2017-06-21
16755
16756
Lost Village
2017-06-21
16758
16759
Space Patrol Luluco
2017-06-21
16760
16762
Diesel E

17327
Elevator VR
genre: ['Casual', ' Indie', ' Simulation']
のdev
Babaroga, LLC
2017-05-15
17333
Eastwood VR
genre: ['Action', ' Indie', ' Simulation', ' Early Access']
RAV3 Interactive
RAV3 Interactive
2017-05-05
17354
AM Model Viewer
genre: ['Animation & Modeling', ' Design & Illustration']
Ao Mariko 2016
Ao Mariko 2016
2017-05-17
17367
Blobby Tennis
genre: ['Indie', ' Simulation', ' Sports']
SlinDev
SlinDev
2017-05-12
17370
Mythlink
genre: ['Action', ' Indie', ' Strategy']
Home Point Games
Home Point Games
None
17386
Immersion Chess
genre: ['Casual', ' Free to Play', ' Sports', ' Strategy']
Immersion
Immersion
2017-05-15
17395
Freeze Climbing
genre: ['Action', ' Indie', ' Simulation']
Bottino.Games
Bottino.Games
2017-05-12
17429
Cyberdrifter
genre: ['Action', ' Indie', ' Early Access']
Scott Wilson
UziGames
2017-05-09
17434
17436
Pixvana 360 Production Series
2017-05-08
17438
17439
Under the Canopy
2017-05-10
17440
17441
The Hunger Games 360
2012-08-18
17442
RED CUBE VR
genre: ['Act

18153
Virtual Sports
genre: ['Action', ' Casual', ' Simulation', ' Sports']
Free Range Games
Vive Studios
2017-03-29
18154
Happy Penguin VR
genre: ['Action', ' Adventure', ' Indie', ' Simulation', ' Sports']
Bellcat Game
Bellcat Game
2017-03-31
18156
Planet Defender
genre: ['Action', ' Adventure', ' Casual', ' Indie']
Fevolution Innovation Inc.
Fevolution Innovation Inc.
2017-03-30
18163
18164
Bungo Stray Dogs
2017-03-30
18165
INFINITI VR
genre: ['Free to Play']
The Pulse
The Pulse
2017-03-28
18168
18169
18175
18177
18185
18193
18195
Narcosis
genre: ['Adventure', ' Indie']
Honor Code, Inc.
Honor Code, Inc.
2017-03-28
18216
18218
Ze VR
genre: ['Action', ' Casual', ' Indie']
Funny Bit Games
Funny Bit Games
2017-03-11
18225
Storm VR
genre: ['Adventure', ' Casual', ' Indie', ' Simulation']
TeamStormVR
TeamStormVR
2016-09-23
18227
Weelco VR
genre: ['Animation & Modeling', ' Utilities', ' Video Production']
weelco_vr
weelco_vr
2017-03-21
18255
Mekside VR
genre: ['Indie', ' Massively Multipla

Rest House
genre: ['Adventure', ' Early Access']
studio Don Quixote
studio Don Quixote, Minotaur production
2017-02-08
18931
Dimensional Rift
genre: ['Free to Play', ' Indie']
Team Seven
Team Seven
2017-02-08
18932
Blue Effect VR
genre: ['Action', ' Indie']
DIVR Labs
DIVR Labs
2016-09-29
18934
Pyro VR
genre: ['Action', ' Casual', ' Simulation', ' Early Access']
Virtual Light VR
Virtual Light VR
2017-02-08
18939
18954
Archery Practice VR
genre: ['Action', ' Adventure', ' Casual', ' Indie', ' RPG']
Virtual Rage Studios LLC
Virtual Rage Studios LLC
2017-02-08
18958
Manchester By The Sea
2017-02-07
18959
The 9th Life of Louis Drax
2017-02-07
18962
Boogeyman 2
genre: ['Indie', ' Strategy']
Barry McCabe
Clockwork Wolf
2017-02-07
18964
Derail Valley
genre: ['Simulation', ' Early Access']
Altfuture
Altfuture
None
18966
18967
Dirty Dancing
1987-08-21
18970
VR Golf Online
genre: ['Action', ' Casual', ' Sports']
MAUMGOLF Co.,Ltd.
Kakao Games Corp.
2017-02-08
18972
MonkeyKing VR
genre: ['Action', 

Man From Shaolin
2013-02-26
19614
Escape from Zellman Orbital
genre: ['Adventure', ' Early Access']
Ken Richlin
Ken Richlin
2016-12-16
19618
19619
19633
19636
Special Delivery
genre: ['Action', ' Casual', ' Indie', ' Simulation']
Meerkat Gaming
Meerkat Gaming
2016-12-16
19640
We Are Stars
genre: ['Casual']
NSC Creative
NSC Creative
2016-12-16
19641
Perfect
genre: ['Simulation']
nDreams / Near Light
nDreams
2016-12-16
19646
Kunlun Fight
genre: ['Action', ' Sports', ' Early Access']
Touch Art Technology Co.,Ltd
Touch Art Technology Co.,Ltd
2016-12-17
19652
Singing Stones VR
genre: ['Casual', ' Indie', ' Simulation']
Chingis LLC
Chingis LLC
2016-12-16
19664
Defense of Castle Chilly
genre: ['Action', ' Indie', ' Early Access']
Lord of the Stack
Lord of the Stack
2016-12-15
19667
Elephant Express VR
genre: ['Action', ' Adventure', ' Casual']
Blob Lab
Blob Group Ltd
2016-12-15
19671
OneManVurgeR
genre: ['Action', ' Casual', ' Simulation']
Dazzle Inc.
Dazzle Inc.
2016-12-15
19690
VR Journey
g

Pirate Defense
genre: ['Strategy']
Rushil Reddy
Rushil Reddy
2016-11-11
20274
Percussive VR
genre: ['Casual', ' Simulation', ' Early Access']
Jamhack Games
Jamhack Games
2016-11-11
20276
Speech Trainer
genre: ['Software Training']
Wolf In Motion Ltd
Wolf In Motion Ltd
2016-11-11
20281
Castle Must Be Mine
genre: ['Casual', ' Indie', ' Simulation', ' Strategy', ' Early Access']
TheMiddleGray
TheMiddleGray
2016-11-11
20285
Nightcap
2016-11-11
20291
20307
Queendoom
genre: ['Action']
EP Games®
EP Games®
2016-11-10
20308
20311
Lonelyland VR
Catbox
Catbox
2016-11-09
20315
The Fastest Fist
genre: ['Action', ' Sports']
Touch Art Technology Co.,Ltd
Touch Art Technology Co.,Ltd
2016-10-30
20323
20327
VR Battle Grid
genre: ['Action', ' Casual', ' Free to Play', ' Indie', ' Simulation', ' Strategy']
Fred Sauer
Fred Sauer
2016-11-08
20329
20330
HandPass VR
genre: ['Indie', ' Sports']
Constructive Media
Constructive Media
2016-11-24
20331
Con Man
2016-11-08
20332
Hell Or High Water
2016-11-08
20334
W

21222
21223
Sword Master VR
genre: ['Action', ' Indie', ' Simulation', ' Sports']
Master Indie
Master Indie
2016-09-23
21224
Bitslap
genre: ['Indie']
Comrex AG
Comrex AG
2016-09-23
21226
What Goes Up
2014-09-12
21228
Top Floor
genre: ['Indie', ' Simulation']
SmartVR Studio
SmartVR Studio
2016-09-23
21234
21236
Antboy 3
2015-01-01
21238
Paranormal Island
2013-01-01
21259
21260
21261
Edge Guardian
genre: ['Action', ' Indie', ' Early Access']
Hypothermic Games
Hypothermic Games
2016-09-21
21265
Come On Down
2016-09-21
21268
Pierhead Arcade
genre: ['Casual', ' Indie', ' Simulation']
Mechabit Ltd
Mechabit Ltd
2016-09-21
21270
21271
21272
Escape Station
genre: ['Indie', ' Early Access']
Paul Thornton
Escape Reality UKVR , SoftApps
2016-09-20
21274
Candy Kingdom VR
genre: ['Action', ' Adventure', ' Casual', ' Indie']
Gameplay Studio VR
Gameplay Studio VR
2016-09-20
21279
Francois BOUILLE
genre: ['Education', ' Utilities']
François Bouille - Experience 360
François Bouille - Experience 360
201

21993
Russian VR Coasters
genre: ['Action', ' Casual', ' Indie', ' Simulation']
Funny Twins
Funny Twins
2016-07-27
22013
Broomball VR
genre: ['Adventure']
Rushil Reddy
Broomball Inc.
2016-07-28
22018
Pong Champion VR
genre: ['Action', ' Casual', ' Indie', ' Simulation', ' Sports', ' Early Access']
DegaSolutions
DegaSolutions
2016-07-26
22032
Roomscale Tower
genre: ['Action', ' Adventure', ' Early Access']
DuplicatorStudio
DuplicatorStudio
2016-07-26
22036
Screwball
2016-05-06
22040
The Debt
2016-07-08
22049
Candy Smash VR
genre: ['Action', ' Casual', ' Indie', ' Simulation', ' Sports']
Wadup Games
Wadup Games
2016-07-26
22050
Book Of Merlin
genre: ['Simulation']
乐客游戏
乐客游戏
2016-07-22
22054
22055
VRMultigames
genre: ['Action', ' Casual', ' Free to Play', ' Indie', ' Sports']
Mad Triangles
Mad Triangles
2016-07-25
22062
Audio Arena
genre: ['Action']
Skydome Studios
Skydome Studios
2016-07-25
22067
MageWorks
genre: ['Casual', ' Early Access']
Earthborn Interactive, LLC
Earthborn Interactiv

Octane
2009-06-30
23249
Vampire
2013-08-20
23250
23253
The Hurt Locker
2009-06-26
23255
Reservoir Dogs
1992-10-23
23256
Rounders
1999-02-09
23257
Jackie Brown
2002-08-20
23258
FROM DUSK TILL DAWN
1996-01-19
23259
Sin City
2005-04-01
23260
23261
American Psycho
2000-04-14
23262
23263
Daybreakers
2010-05-11
23264
23265
War
2008-01-01
23266
Cop Land
1998-04-21
23267
Cooties
2015-12-01
23268
Snitch
2013-02-22
23269
Bad Lieutenant
1993-08-18
23270
Escape From New York
2016-04-22
23271
This is Spinal Tap
1984-03-02
23272
Bad Santa
2003-11-26
23273
23274
Return of the Living Dead 3
2001-08-28
23275
Hero
2004-11-30
23276
Leprechaun 2
1995-09-26
23277
Russkies
1987-11-06
23278
23279
Leprechaun 3
2001-02-27
23280
Sicario
2016-01-05
23281
Two Family House
2000-10-06
23282
The Hunger Games
2012-03-23
23283
Divergent
2014-03-21
23284
23285
23286
Escape Plan
2013-10-18
23287
Twilight
2009-06-01
23288
23289
23290
23291
The Last Witch Hunter
2016-02-02
23292
23293
Dredd
2013-01-08
23294
The Expendable

Nevermind
genre: ['Adventure', ' Indie']
Flying Mollusk
Flying Mollusk
2015-09-29
25614
Euclidean
genre: ['Action', ' Casual', ' Indie']
Alpha Wave Entertainment
AAD Productions
2015-09-25
25653
Planetship
genre: ['Action', ' Adventure', ' Indie']
John Lawrence
John Lawrence
2015-02-06
25687
25701
25723
The Man From Orlando
2012-09-07
25748
Future Farmer
2008-08-01
25750
Gillespie
2010-08-01
25751
25779
The Facility
genre: ['Action', ' Adventure', ' Early Access']
PolyDigital
PolyDigital
2015-09-03
25788
Mad Max Beyond Thunderdome
1985-07-10
25789
25793
25809
Metal Gear Solid Legacy
2015-09-01
25899
Universe Sandbox ²
genre: ['Casual', ' Indie', ' Simulation', ' Early Access']
Giant Army
Giant Army
2015-08-24
25972
26041
26056
Metro Warp
genre: ['Casual', ' Indie', ' Strategy']
Another Yeti
Another Yeti
2015-08-05
26058
REST DAYS
2014-12-10
26060
26061
26085
THE BASEMENT
2015-05-15
26089
JUMP
genre: ['Action', ' Indie']
Endeavor One Inc.
Endeavor One Inc.
2015-07-30
26092
Diving Normal

Progeny VR
genre: ['Indie', ' RPG', ' Simulation']
Silverstring Media Inc.
Silverstring Media Inc.
2018-01-16
32644
BrickWorks 360
2017-09-22


2        None
33       None
35       None
56       None
57       None
70       None
78       None
109      None
111      None
347      None
1439     None
1474     None
1480     None
1576     None
1615     None
1616     None
1631     None
1638     None
1639     None
1692     None
1700     None
1701     None
1702     None
1703     None
2265     None
2362     None
2410     None
2707     None
2883     None
2928     None
         ... 
32261    None
32275    None
32280    None
32281    None
32284    None
32341    None
32342    None
32416    None
32439    None
32440    None
32451    None
32456    None
32460    None
32462    None
32474    None
32482    None
32484    None
32486    None
32488    None
32505    None
32509    None
32558    None
32559    None
32575    None
32576    None
32577    None
32580    None
32581    None
32586    None
32644    None
Length: 3170, dtype: object

In [371]:
info = pd.read_csv('info.csv', index_col=0)
info

Unnamed: 0,title,genres,developer,publisher,release_date
2,OVERVIEW,"['Adventure', ' Casual', ' Simulation']",Orbital Views,Orbital Views,2018-01-01
33,,,,,
35,Second Nature,,,,2017-09-18
56,The Iron Lady,,,,2012-04-10
57,LIV Client,"['Utilities', ' Video Production', ' Web Publi...",LIV Inc,LIV Inc,2018-01-01
70,Awakening,"['Animation & Modeling', ' Design & Illustrati...",AwingSoft,AwingSoft,2017-08-15
78,Mech League Boxing,"['Action', ' Simulation', ' Sports']",VRGEN,VRGEN,2018-01-10
109,The Defector,,,,2018-01-06
111,,,,,
347,,,,,


In [372]:
#info.to_csv('info.csv')
cleaned_games.loc[info.index, "developer"] = info['developer']
cleaned_games.loc[info.index, "publisher"] = info['publisher']
cleaned_games.loc[info.index, "title"] = info['title']
cleaned_games.loc[info.index, "genres"] = info['genres']
cleaned_games.loc[info.index, "release_date"] = info['release_date']

In [373]:
cleaned_games.head()

Unnamed: 0,description,developer,genres,id,parent_game,parent_game_url,price,publisher,release_date,sentiment,specs,tags,title
0,"[, About This Game, Do you love puzzles? What ...",ChillFun,[Strategy],773510.0,,,14.99,ChillFun,2018-01-17,,"[Single-player, Partial Controller Support, St...","[Strategy, Cartoon, Puzzle, First-Person, Phys...",The Hardest Thing
1,"[, About This Game, , Dear Bob,, I know you kn...",NOS3D,"[Action, Casual, Indie, Massively Multiplayer,...",776170.0,,,,NOS3D,Jan 2018,,"[Multi-player, Online Multi-Player, Partial Co...","[Early Access, Strategy, Action, Massively Mul...",Serious Office
2,"[, About This Game, OVERVIEW is a 30 minutes n...",Orbital Views,"['Adventure', ' Casual', ' Simulation']",751110.0,,,,Orbital Views,2018-01-01,,"[Single-player, HTC Vive, Oculus Rift, Tracked...","[Adventure, Casual, Simulation]",OVERVIEW
3,"[, About This Game, In Global Soccer Manager 2...","gsmpcgame,globalsoccermanager,Andrea Hochstein","[Casual, Indie, Simulation, Sports, Strategy]",625700.0,,,9.99,gsmpcgame,2017-05-24,Positive,"[Single-player, Steam Achievements, Steam Trad...","[Sports, Strategy, Simulation, Indie, Casual, ...",Global Soccer Manager 2017
5,"[, About This Game, , Survived By is a free-to...",Human Head Studios,"[Action, Free to Play, Indie, Massively Multip...",606140.0,,,,Digital Extremes,Early 2018,,"[Multi-player, Online Multi-Player, MMO, Co-op...","[Free to Play, Massively Multiplayer, Indie, R...",Survived By


In [374]:
cleaned_games.drop(columns=['parent_game','parent_game_url'], inplace=True)

In [303]:
cleaned_games.head()

Unnamed: 0,description,developer,genres,id,price,publisher,release_date,sentiment,specs,tags,title
0,"[, About This Game, Do you love puzzles? What ...",ChillFun,[Strategy],773510.0,14.99,ChillFun,2018-01-17,,"[Single-player, Partial Controller Support, St...","[Strategy, Cartoon, Puzzle, First-Person, Phys...",The Hardest Thing
1,"[, About This Game, , Dear Bob,, I know you kn...",NOS3D,"[Action, Casual, Indie, Massively Multiplayer,...",776170.0,,NOS3D,Jan 2018,,"[Multi-player, Online Multi-Player, Partial Co...","[Early Access, Strategy, Action, Massively Mul...",Serious Office
2,"[, About This Game, OVERVIEW is a 30 minutes n...",Orbital Views,"[Adventure, Casual, Simulation]",751110.0,,Orbital Views,2018-01-01,,"[Single-player, HTC Vive, Oculus Rift, Tracked...","[Adventure, Casual, Simulation]",OVERVIEW
3,"[, About This Game, In Global Soccer Manager 2...","gsmpcgame,globalsoccermanager,Andrea Hochstein","[Casual, Indie, Simulation, Sports, Strategy]",625700.0,9.99,gsmpcgame,2017-05-24,Positive,"[Single-player, Steam Achievements, Steam Trad...","[Sports, Strategy, Simulation, Indie, Casual, ...",Global Soccer Manager 2017
5,"[, About This Game, , Survived By is a free-to...",Human Head Studios,"[Action, Free to Play, Indie, Massively Multip...",606140.0,,Digital Extremes,Early 2018,,"[Multi-player, Online Multi-Player, MMO, Co-op...","[Free to Play, Massively Multiplayer, Indie, R...",Survived By


#### Inspect data

In [375]:
# inspect missing ids, specs, tags, and genres

# Create variable with TRUE if id is missing
missing_id = pd.isnull(cleaned_games['id']) == True
# Create variable with TRUE if game specs is missing
missing_specs = pd.isnull(cleaned_games['specs']) == True
# Create variable with TRUE if game tags is missing
missing_tags = pd.isnull(cleaned_games['tags']) == True
# Create variable with TRUE if parent game is missing
missing_genres = pd.isnull(cleaned_games['genres']) == True

# Select all cases where all three fields are missing
no_spec_tag_genre = cleaned_games[missing_specs & missing_tags & missing_genres]
no_specs = cleaned_games[missing_specs]
no_tags = cleaned_games[missing_tags]
no_genres = cleaned_games[missing_genres]
no_id = cleaned_games[missing_id]

print('missing id: '+ str(len(no_id)))
print('missing all 3: '+ str(len(no_spec_tag_genre)))
print('missing specs: '+ str(len(no_specs)))
print('missing tags: '+ str(len(no_tags)))
print('missing genres: '+ str(len(no_genres)))

#cleaned_games.drop(no_tags.index, inplace=True)

missing id: 2
missing all 3: 12
missing specs: 680
missing tags: 60
missing genres: 1638


In [310]:
# inspect no id rows
no_id

Unnamed: 0,description,developer,genres,id,price,publisher,release_date,sentiment,specs,tags,title
347,,,,,39.99,,,,,,
899,"[, About This Game, Batman: Arkham City builds...","Rocksteady Studios,Feral Interactive (Mac)","[Action, Adventure]",,19.99,"Warner Bros. Interactive Entertainment, Feral ...",2012-09-07,Overwhelmingly Positive,"[Single-player, Steam Achievements, Steam Trad...","[Action, Open World, Batman, Adventure, Stealt...",Batman: Arkham City - Game of the Year Edition


In [376]:
# drop rows with no game id
cleaned_games.drop(no_id.index, inplace=True)

In [377]:
# inspect missing all 3
no_spec_tag_genre = no_spec_tag_genre.drop(347)
cleaned_games.drop(no_spec_tag_genre.index, inplace=True)

In [378]:
cleaned_games['id']=cleaned_games['id'].astype(int)
cleaned_games.head()

Unnamed: 0,description,developer,genres,id,price,publisher,release_date,sentiment,specs,tags,title
0,"[, About This Game, Do you love puzzles? What ...",ChillFun,[Strategy],773510,14.99,ChillFun,2018-01-17,,"[Single-player, Partial Controller Support, St...","[Strategy, Cartoon, Puzzle, First-Person, Phys...",The Hardest Thing
1,"[, About This Game, , Dear Bob,, I know you kn...",NOS3D,"[Action, Casual, Indie, Massively Multiplayer,...",776170,,NOS3D,Jan 2018,,"[Multi-player, Online Multi-Player, Partial Co...","[Early Access, Strategy, Action, Massively Mul...",Serious Office
2,"[, About This Game, OVERVIEW is a 30 minutes n...",Orbital Views,"['Adventure', ' Casual', ' Simulation']",751110,,Orbital Views,2018-01-01,,"[Single-player, HTC Vive, Oculus Rift, Tracked...","[Adventure, Casual, Simulation]",OVERVIEW
3,"[, About This Game, In Global Soccer Manager 2...","gsmpcgame,globalsoccermanager,Andrea Hochstein","[Casual, Indie, Simulation, Sports, Strategy]",625700,9.99,gsmpcgame,2017-05-24,Positive,"[Single-player, Steam Achievements, Steam Trad...","[Sports, Strategy, Simulation, Indie, Casual, ...",Global Soccer Manager 2017
5,"[, About This Game, , Survived By is a free-to...",Human Head Studios,"[Action, Free to Play, Indie, Massively Multip...",606140,,Digital Extremes,Early 2018,,"[Multi-player, Online Multi-Player, MMO, Co-op...","[Free to Play, Massively Multiplayer, Indie, R...",Survived By


In [379]:
print(len(pd.isnull(info['genres'])))

3170


In [380]:
to_drop = no_tags.index.append(no_genres.index).append(no_specs.index)
to_drop_unique = to_drop.unique()

In [381]:
len(to_drop_unique)

1943

In [382]:
for ix in to_drop_unique:
#     print(ix)
    try: 
        cleaned_games.drop(ix, inplace=True)
    except:
        pass
#to_drop_unique.drop([347, 5697, 7073, 7075, 7076, 7077, 7078, 9099, 21105, 21106, 24123, 27983])
#cleaned_games.drop(to_drop_unique)

In [383]:
len(cleaned_games)

18021

In [356]:
cleaned_games.head()

Unnamed: 0,description,developer,genres,id,price,publisher,release_date,sentiment,specs,tags,title
0,"[, About This Game, Do you love puzzles? What ...",ChillFun,[Strategy],773510,14.99,ChillFun,2018-01-17,,"[Single-player, Partial Controller Support, St...","[Strategy, Cartoon, Puzzle, First-Person, Phys...",The Hardest Thing
1,"[, About This Game, , Dear Bob,, I know you kn...",NOS3D,"[Action, Casual, Indie, Massively Multiplayer,...",776170,,NOS3D,Jan 2018,,"[Multi-player, Online Multi-Player, Partial Co...","[Early Access, Strategy, Action, Massively Mul...",Serious Office
2,"[, About This Game, OVERVIEW is a 30 minutes n...",Orbital Views,"[Adventure, Casual, Simulation]",751110,,Orbital Views,2018-01-01,,"[Single-player, HTC Vive, Oculus Rift, Tracked...","[Adventure, Casual, Simulation]",OVERVIEW
3,"[, About This Game, In Global Soccer Manager 2...","gsmpcgame,globalsoccermanager,Andrea Hochstein","[Casual, Indie, Simulation, Sports, Strategy]",625700,9.99,gsmpcgame,2017-05-24,Positive,"[Single-player, Steam Achievements, Steam Trad...","[Sports, Strategy, Simulation, Indie, Casual, ...",Global Soccer Manager 2017
5,"[, About This Game, , Survived By is a free-to...",Human Head Studios,"[Action, Free to Play, Indie, Massively Multip...",606140,,Digital Extremes,Early 2018,,"[Multi-player, Online Multi-Player, MMO, Co-op...","[Free to Play, Massively Multiplayer, Indie, R...",Survived By


#### re-format date, clean/inspect price, clean sentiment

In [402]:
cleaned_games['release_date'] = cleaned_games['release_date'].apply(lambda x: standardize_date(x))

In [404]:
# drop NaN release dates -- these are games yet to be released
missing_date = cleaned_games[pd.isnull(cleaned_games['release_date'])]
for i in missing_date.index:
    try:
        cleaned_games.drop(i, inplace=True)
    except:
        pass

In [408]:
# inspect missing prices
len(cleaned_games.loc[pd.isnull(cleaned_games['price'])])

664

In [409]:
# drop NaN prices -- some are not released, some are no longer availabe etc
missing_price = cleaned_games[pd.isnull(cleaned_games['price'])]
for i in missing_price.index:
    try:
        cleaned_games.drop(i, inplace=True)
    except:
        pass

In [411]:
cleaned_games['price'].unique()

array([14.99, 9.99, 19.99, 0.99, 1.99, 2.99, 'Free', 4.99, 8.99, 6.99,
       3.99, 5.99, 12.99, 39.99, 'Free to Play', 7.99, 29.99, 10.99, 2.49,
       15.99, 54.99, 89.99, 4.49, 18.99, 15.0, 11.99, 13.99, 0.98,
       'Free Demo', 24.99, 'Free To Play', 99.99, 74.48, 160.91, 49.99,
       19.98, 34.99, 59.95, 17.99, 59.99, 9.53, 0.74, 16.99, 689.75,
       189.96, 13.37, 79.99, 23.99, 129.99, 12.0, 44.99, 15.98, 0.5, 72.49,
       1.0, 0.59, 4.05, 1.47, 0.49, 31.99, 11.66, 4.79, 20.0, 9.98, 9.69,
       'Play the Demo', 3.98, 9.0, 2.0, 1.9500000000000002, 1.79, 1.5,
       2.66, 11.15, 1.48, 254.72, 6.66, 6.48, 3.34, 26.99, 399.99, 149.99,
       10.97, 10.96, 2.47, 40.0, 199.99, 9.08, 3.33, 289.97, 202.76, 44.98,
       20.99, 1.29, 'Free Mod', 11.96, 16.06, 5.35, 5.98,
       1.5899999999999999, 2.3, 0.9500000000000001, 1.98, 172.24, 1.49,
       0.79, 6.27, 24.94, 19.95, 10.74, 0.66, 7.12, 64.99, 5.83, 3.49,
       69.99, 2.89, 4.0, 87.94, 8.98, 9.95, 19.12, 3.0, 25.0, 13.47, 36.6

In [419]:
# clean price data
cleaned_games.loc[cleaned_games['price'] == 'Free', 'price'] = 0.0
cleaned_games.loc[cleaned_games['price'] == 'Free Demo', 'price'] = 0.0
cleaned_games.loc[cleaned_games['price'] == 'Play Now', 'price'] = 0.0
cleaned_games.loc[cleaned_games['price'] == 'Free to Play', 'price'] = 0.0
cleaned_games.loc[cleaned_games['price'] == 'Free To Play', 'price'] = 0.0
cleaned_games.loc[cleaned_games['price'] == 'Install Theme', 'price'] = 0.0
cleaned_games.loc[cleaned_games['price'] == 'Third-party', 'price'] = 0.0
cleaned_games.loc[cleaned_games['price'] == 'Play for Free!', 'price'] = 0.0
cleaned_games.loc[cleaned_games['price'] == 'Install Now', 'price'] = 0.0

In [420]:
cleaned_games['price'].unique()

array([14.99, 9.99, 19.99, 0.99, 1.99, 2.99, 0.0, 4.99, 8.99, 6.99, 3.99,
       5.99, 12.99, 39.99, 7.99, 29.99, 10.99, 2.49, 15.99, 54.99, 89.99,
       4.49, 18.99, 15.0, 11.99, 13.99, 0.98, 24.99, 99.99, 74.48, 160.91,
       49.99, 19.98, 34.99, 59.95, 17.99, 59.99, 9.53, 0.74, 16.99, 689.75,
       189.96, 13.37, 79.99, 23.99, 129.99, 12.0, 44.99, 15.98, 0.5, 72.49,
       1.0, 0.59, 4.05, 1.47, 0.49, 31.99, 11.66, 4.79, 20.0, 9.98, 9.69,
       'Play the Demo', 3.98, 9.0, 2.0, 1.9500000000000002, 1.79, 1.5,
       2.66, 11.15, 1.48, 254.72, 6.66, 6.48, 3.34, 26.99, 399.99, 149.99,
       10.97, 10.96, 2.47, 40.0, 199.99, 9.08, 3.33, 289.97, 202.76, 44.98,
       20.99, 1.29, 'Free Mod', 11.96, 16.06, 5.35, 5.98,
       1.5899999999999999, 2.3, 0.9500000000000001, 1.98, 172.24, 1.49,
       0.79, 6.27, 24.94, 19.95, 10.74, 0.66, 7.12, 64.99, 5.83, 3.49,
       69.99, 2.89, 4.0, 87.94, 8.98, 9.95, 19.12, 3.0, 25.0, 13.47, 36.67,
       3.29, 299.99, 9.9, 12.89, 21.99, 1.87, 20.94,

In [423]:
index_1 = cleaned_games.loc[cleaned_games['price'] == 'Starting at $449.00'].index
index_2 = cleaned_games.loc[cleaned_games['price'] == 'Free Mod'].index
index_3 = cleaned_games.loc[cleaned_games['price'] == 'Free Movie'].index
index_4 = cleaned_games.loc[cleaned_games['price'] == 'Play WARMACHINE: Tactics Demo'].index
index_5 = cleaned_games.loc[cleaned_games['price'] == 'Free to Try'].index
index_6 = cleaned_games.loc[cleaned_games['price'] == 'Play the Demo'].index
print(index_1, index_2, index_3, index_4, index_5, index_6)

Int64Index([], dtype='int64') Int64Index([13421, 27416, 28521, 28522], dtype='int64') Int64Index([], dtype='int64') Int64Index([28187], dtype='int64') Int64Index([], dtype='int64') Int64Index([6101], dtype='int64')


In [424]:
to_drop = index_1.append(index_2).append(index_3).append(index_4).append(index_5).append(index_6)

In [427]:
for i in to_drop:
    try:
        cleaned_games.drop(i, inplace=True)
    except:
        pass

In [428]:
cleaned_games['price'].unique()

array([14.99, 9.99, 19.99, 0.99, 1.99, 2.99, 0.0, 4.99, 8.99, 6.99, 3.99,
       5.99, 12.99, 39.99, 7.99, 29.99, 10.99, 2.49, 15.99, 54.99, 89.99,
       4.49, 18.99, 15.0, 11.99, 13.99, 0.98, 24.99, 99.99, 74.48, 160.91,
       49.99, 19.98, 34.99, 59.95, 17.99, 59.99, 9.53, 0.74, 16.99, 689.75,
       189.96, 13.37, 79.99, 23.99, 129.99, 12.0, 44.99, 15.98, 0.5, 72.49,
       1.0, 0.59, 4.05, 1.47, 0.49, 31.99, 11.66, 4.79, 20.0, 9.98, 9.69,
       3.98, 9.0, 2.0, 1.9500000000000002, 1.79, 1.5, 2.66, 11.15, 1.48,
       254.72, 6.66, 6.48, 3.34, 26.99, 399.99, 149.99, 10.97, 10.96, 2.47,
       40.0, 199.99, 9.08, 3.33, 289.97, 202.76, 44.98, 20.99, 1.29, 11.96,
       16.06, 5.35, 5.98, 1.5899999999999999, 2.3, 0.9500000000000001,
       1.98, 172.24, 1.49, 0.79, 6.27, 24.94, 19.95, 10.74, 0.66, 7.12,
       64.99, 5.83, 3.49, 69.99, 2.89, 4.0, 87.94, 8.98, 9.95, 19.12, 3.0,
       25.0, 13.47, 36.67, 3.29, 299.99, 9.9, 12.89, 21.99, 1.87, 20.94,
       7.49, 0.89, 10.0, 17.88, 2.4

In [429]:
print(len(cleaned_games.loc[cleaned_games['sentiment'] == 'Overwhelmingly Positive']))
print(len(cleaned_games.loc[cleaned_games['sentiment'] == 'Very Positive']))
print(len(cleaned_games.loc[cleaned_games['sentiment'] == 'Positive']))
print(len(cleaned_games.loc[cleaned_games['sentiment'] == 'Mostly Positive']))

print(len(cleaned_games.loc[cleaned_games['sentiment'] == 'Mixed']))

print(len(cleaned_games.loc[cleaned_games['sentiment'] == 'Mostly Negative']))
print(len(cleaned_games.loc[cleaned_games['sentiment'] == 'Negative']))
print(len(cleaned_games.loc[cleaned_games['sentiment'] == 'Very Negative']))
print(len(cleaned_games.loc[cleaned_games['sentiment'] == 'Overwhelmingly Negative']))

sentiment_mapping = {'Mixed': 0, 'Mostly Positive': 1, 'Positive': 2, 'Very Positive': 3, 'Overwhelmingly Positive': 4, 'Mostly Negative': -1, 'Negative': -2, 'Very Negative': -3, 'Overwhelmingly Negative': -4 }
cleaned_games.sentiment.unique()

278
3249
1967
2216
3363
647
112
25
6


array([nan, 'Positive', '4 user reviews', 'Mixed', '1 user reviews',
       '3 user reviews', '5 user reviews', 'Overwhelmingly Positive',
       'Very Positive', 'Mostly Positive', '8 user reviews',
       '2 user reviews', '7 user reviews', '6 user reviews', 'Negative',
       'Mostly Negative', '9 user reviews', 'Overwhelmingly Negative',
       'Very Negative'], dtype=object)

In [430]:
def parse_sentiment(row):
    """This function mapps the sentiments into numerical values"""
    sentiment_mapping = {'Mixed': 0, 'Mostly Positive': 1, 'Positive': 2, 'Very Positive': 3, 'Overwhelmingly Positive': 4, 'Mostly Negative': -1, 'Negative': -2, 'Very Negative': -3, 'Overwhelmingly Negative': -4 }
    if row['sentiment'] in sentiment_mapping:
        return sentiment_mapping[row['sentiment']]
    else:
        return 0
cleaned_games['sentiment_num'] = cleaned_games.apply(parse_sentiment, axis=1)

In [431]:
cleaned_games = cleaned_games.drop(columns='sentiment')

In [433]:
cleaned_games.head()

Unnamed: 0,description,developer,genres,id,price,publisher,release_date,specs,tags,title,sentiment_num
0,"[, About This Game, Do you love puzzles? What ...",ChillFun,[Strategy],773510,14.99,ChillFun,2018-01-17,"[Single-player, Partial Controller Support, St...","[Strategy, Cartoon, Puzzle, First-Person, Phys...",The Hardest Thing,0
3,"[, About This Game, In Global Soccer Manager 2...","gsmpcgame,globalsoccermanager,Andrea Hochstein","[Casual, Indie, Simulation, Sports, Strategy]",625700,9.99,gsmpcgame,2017-05-24,"[Single-player, Steam Achievements, Steam Trad...","[Sports, Strategy, Simulation, Indie, Casual, ...",Global Soccer Manager 2017,2
6,"[, About This Game, Experience the pureness an...",Media Art,"[Adventure, Casual]",770880,9.99,Big Fish Games,2018-01-17,[Single-player],"[Adventure, Casual]",Love Story: The Beach Cottage,0
7,"[, About This Game, See how a game created by ...",AIREM,"[Adventure, Indie]",725780,9.99,IQ Publishing,2018-01-17,"[Single-player, Full controller support]","[Adventure, Indie, Horror]",PLAY WITH ME,0
8,"[, About This Game, From the creators of Rogue...",Cellar Door Games,"[Action, Adventure, Indie, RPG]",416600,19.99,Cellar Door Games,2018-01-17,"[Single-player, Co-op, Online Co-op, Local Co-...","[Indie, Action, Adventure, RPG, Co-op, Beat 'e...",Full Metal Furies,0


In [434]:
print(len(cleaned_games))

17098


In [436]:
# check for missing data
check_df(cleaned_games)

description: object    (has NaN)
developer: object    (has NaN)
genres: object
id: int64
price: object
publisher: object    (has NaN)
release_date: object
specs: object
tags: object
title: object
sentiment_num: int64


In [437]:
cleaned_games['description'].fillna('', inplace=True)

In [439]:
cleaned_games[pd.isnull(cleaned_games['developer'])]

Unnamed: 0,description,developer,genres,id,price,publisher,release_date,specs,tags,title,sentiment_num
278,"[, About This Game, Get the Shot. Get the Game...",,"[Simulation, Sports]",12690,9.99,"ValuSoft, Retroism",2009-07-07,[Single-player],"[Simulation, Hunting, Sports, America]",Hunting Unlimited 2010,1
657,"[, About This Game, Welcome to the new Wild We...",,"[Action, Adventure]",33420,9.99,Ubisoft,2011-09-13,"[Single-player, Multi-player, Co-op, Steam Ach...","[Action, Adventure, FPS, Co-op, Western, Shoot...",Call of Juarez®: The Cartel,0
1429,"[, Reviews, , “Somewhere during the journey yo...",,[Indie],239350,14.99,Mossmouth,2013-08-08,"[Single-player, Shared/Split Screen, Steam Ach...","[Rogue-like, Platformer, Indie, Difficult, 2D,...",Spelunky,3
1476,"[, Reviews, , “If you can endure a half hour’s...",,"[Action, Adventure, Indie, Simulation, Early A...",246300,9.99,Matthew C Cohen,2012-10-19,"[Single-player, Steam Trading Cards, Partial C...","[Early Access, Horror, Indie, Adventure, Simul...",Paranormal,0
1493,"[, About This Game, Cast down from power by a ...",,[Action],242960,6.99,Square Enix,2002-03-29,[Single-player],"[Action, Vampire, Adventure, Dark Fantasy, Fan...",Blood Omen 2: Legacy of Kain,1
1506,"[, Reviews, , “If you're not the diehard RPGer...",,[RPG],253940,4.99,Topware Interactive,1999-10-31,"[Single-player, Steam Trading Cards, Steam Cloud]","[RPG, JRPG, Turn-Based, Singleplayer, Isometri...",Septerra Core,1
1507,"[, Reviews, , “Players who engage in get a won...",,"[Simulation, Strategy]",227020,19.99,Kalypso Media Digital,2013-09-27,"[Single-player, Multi-player, Steam Achievemen...","[Strategy, Simulation, Trading, Historical, Ec...",Rise of Venice,0
1513,"[, About This Game, In 2455 AD, Kage Mishima u...",,[Action],242980,6.99,Square Enix,2000-05-23,[Single-player],"[Action, FPS, Classic, Time Travel, Singleplay...",Daikatana,0
1517,"[, About This Game, , 20 years after the origi...",,"[Action, Adventure, RPG]",245730,9.99,Ubisoft,2013-10-01,[Single-player],"[Action, Adventure, RPG, Platformer, Cyberpunk...",Flashback,0
1525,"[, Reviews, , “The flow overall is excellent, ...",,"[Action, Indie, RPG]",248710,4.99,Forever Entertainment S. A.,2013-07-07,[Single-player],"[RPG, Action, Indie, Hack and Slash, Action RP...",Iesabel,-1


In [443]:
# Create variable with TRUE if publisher is missing
missing_pub = pd.isnull(cleaned_games['publisher']) == True
# Create variable with TRUE if developer is missing
missing_dev = pd.isnull(cleaned_games['developer']) == True

len(cleaned_games[missing_pub & missing_dev])
# drop rows that have missing pub and dev
cleaned_games.drop(cleaned_games[missing_pub & missing_dev].index, inplace=True)

In [444]:
cleaned_games.head()

Unnamed: 0,description,developer,genres,id,price,publisher,release_date,specs,tags,title,sentiment_num
0,"[, About This Game, Do you love puzzles? What ...",ChillFun,[Strategy],773510,14.99,ChillFun,2018-01-17,"[Single-player, Partial Controller Support, St...","[Strategy, Cartoon, Puzzle, First-Person, Phys...",The Hardest Thing,0
3,"[, About This Game, In Global Soccer Manager 2...","gsmpcgame,globalsoccermanager,Andrea Hochstein","[Casual, Indie, Simulation, Sports, Strategy]",625700,9.99,gsmpcgame,2017-05-24,"[Single-player, Steam Achievements, Steam Trad...","[Sports, Strategy, Simulation, Indie, Casual, ...",Global Soccer Manager 2017,2
6,"[, About This Game, Experience the pureness an...",Media Art,"[Adventure, Casual]",770880,9.99,Big Fish Games,2018-01-17,[Single-player],"[Adventure, Casual]",Love Story: The Beach Cottage,0
7,"[, About This Game, See how a game created by ...",AIREM,"[Adventure, Indie]",725780,9.99,IQ Publishing,2018-01-17,"[Single-player, Full controller support]","[Adventure, Indie, Horror]",PLAY WITH ME,0
8,"[, About This Game, From the creators of Rogue...",Cellar Door Games,"[Action, Adventure, Indie, RPG]",416600,19.99,Cellar Door Games,2018-01-17,"[Single-player, Co-op, Online Co-op, Local Co-...","[Indie, Action, Adventure, RPG, Co-op, Beat 'e...",Full Metal Furies,0


In [445]:
cleaned_games.to_csv('games_cleaned.csv')

# Summary

In [448]:
print('Total # of games: ' + str(len(cleaned_games)))
print('Total # of dlcs: ' + str(len(cleaned_dlcs)))
print('Total # of reviews: ' + str(len(cleaned_reviews)))

Total # of games: 17068
Total # of dlcs: 13884
Total # of reviews: 73058
