# Data Cleaning and Merging

In this notebook, we clean and merge the Steam Store and SteamSpy datasets.

The notebook proceeds in three steps:
1. Removing duplicates within each dataset
2. Merging the two datasets
3. Cleaning the merged dataset of missing/null values and combining columns
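
The three steps can be illustrated on a pair of tiny made-up frames. This is a minimal sketch, not the notebook's code: the values are invented, and the inner join on `steam_appid`/`appid` and the `has_metacritic` flag are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-ins for the two raw datasets; the key columns mirror
# the real ones (steam_appid, appid), but all values are made up
store = pd.DataFrame({
    "steam_appid": [10.0, 20.0, 20.0, 30.0],
    "name": ["Counter-Strike", "Team Fortress Classic", "Team Fortress Classic", "Day of Defeat"],
    "metacritic": [88.0, np.nan, np.nan, 79.0],
})
spy = pd.DataFrame({
    "appid": [10.0, 20.0, 30.0, 30.0],
    "positive": [193192.0, 5416.0, 5007.0, 5007.0],
})

# 1. Remove duplicates within each dataset, keyed on the app id
store = store.drop_duplicates(subset=["steam_appid"])
spy = spy.drop_duplicates(subset=["appid"])

# 2. Merge on the app id (an inner join is an assumption for this sketch)
merged = store.merge(spy, left_on="steam_appid", right_on="appid", how="inner")

# 3. Clean up: drop the now-redundant key column from the right-hand table
#    and flag games that have no Metacritic score
merged = merged.drop(columns=["appid"])
merged["has_metacritic"] = merged["metacritic"].notna()

print(merged)
```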

---

## Import Libraries

In this section, we import all the libraries used in this notebook.

In [1]:
# For Calculation and Data Manipulation
import numpy as np
import pandas as pd

# For file export and folder creation
import os

# For data storage
import sqlite3

# Helper functions defined in utils.py (see the Functions section)
from utils import num_col_null, pkl_output, unpack_cell

# this setting widens how many characters pandas will display in a column:
pd.options.display.max_colwidth = 400

# this setting allows us to see up to 50 columns
pd.options.display.max_columns = 50

---

## Functions

In this section, we summarize the helper functions used in this notebook. They are defined in [utils.py](./utils.py).

1. Number of columns with null values: `num_col_null`
2. Pickle file output: `pkl_output`
3. Unpack dictionary from column: `unpack_cell`
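
The helpers themselves live in utils.py and are not reproduced here. To give a rough idea of what two of them might do, here are hypothetical stand-ins inferred only from their names and one-line descriptions above; the real implementations, signatures and return values may differ.

```python
import pandas as pd


def num_col_null(df: pd.DataFrame) -> int:
    """Hypothetical stand-in: count the columns that contain at least one null."""
    return int(df.isna().any().sum())


def unpack_cell(cell, key):
    """Hypothetical stand-in: pull `key` out of a dict stored in a cell.

    Returns None when the cell is missing or is not a dict (e.g. NaN).
    """
    if isinstance(cell, dict):
        return cell.get(key)
    return None


# Usage on a toy frame shaped like the Steam Store data
df = pd.DataFrame({
    "name": ["Counter-Strike", "Team Fortress Classic"],
    "metacritic": [{"score": 88}, float("nan")],
})
print(num_col_null(df))  # 1
scores = df["metacritic"].apply(unpack_cell, key="score")
```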

---

## Read data file

First, we read in the two pickle files containing the raw data, which were extracted using a notebook similar to the previous one.

In [2]:
# read pickle file
df_steamstore = pd.read_pickle('../data/steam_game_data.pkl')
df_steamspy = pd.read_pickle('../data/steamspy_game_data.pkl')
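
Pickle suits this raw data because many columns, such as `categories`, `metacritic` and `price_overview`, hold Python dicts and lists. The sketch below uses a made-up one-row frame and a temporary directory to show that `read_pickle` returns those objects intact, whereas a CSV round trip turns them into plain strings. Since unpickling can execute arbitrary code, pickle files should only be loaded from trusted sources.

```python
import os
import tempfile

import pandas as pd

# Toy frame with a dict-valued cell, like the `metacritic` column above
df = pd.DataFrame({
    "steam_appid": [10.0],
    "metacritic": [{"score": 88}],
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "demo.pkl")
    csv_path = os.path.join(tmp, "demo.csv")

    df.to_pickle(pkl_path)
    df.to_csv(csv_path, index=False)

    from_pkl = pd.read_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)

# Pickle gives back the original Python object; CSV gives back its string form
print(type(from_pkl.loc[0, "metacritic"]))  # <class 'dict'>
print(type(from_csv.loc[0, "metacritic"]))  # <class 'str'>
```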

In [3]:
# see df shape and size
print(f"Shape of Steam Store data : {df_steamstore.shape}")
print(f"First 3 rows of Steam Store data")
df_steamstore.head(3)

Shape of Steam Store data : (51749, 38)
First 3 rows of Steam Store data


Unnamed: 0,about_the_game,background,categories,content_descriptors,detailed_description,developers,genres,header_image,is_free,linux_requirements,mac_requirements,metacritic,name,package_groups,packages,pc_requirements,platforms,price_overview,publishers,recommendations,release_date,required_age,screenshots,short_description,steam_appid,support_info,supported_languages,type,website,dlc,achievements,demos,movies,controller_support,reviews,legal_notice,drm_notice,ext_user_account_notice
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","{'score': 88, 'url': 'https://www.metacritic.com/game/pc/counter-strike?ftag=MCD-06-10aaa1f'}",Counter-Strike,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 
'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...","[574941, 7]","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}",[Valve],{'total': 118599},"{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,10.0,"{'url': 'http://steamcommunity.com/app/10', 'email': ''}","English<strong>*</strong>, French<strong>*</strong>, German<strong>*</strong>, Italian<strong>*</strong>, Spanish - Spain<strong>*</strong>, Simplified Chinese<strong>*</strong>, Traditional Chinese<strong>*</strong>, Korean<strong>*</strong><br><strong>*</strong>languages with full audio support",game,,,,,,,,,,
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",,Team Fortress Classic,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 
'option_text': 'Team Fortress Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],{'total': 4486},"{'coming_soon': False, 'date': '1 Apr, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,"{'url': '', 'email': ''}","English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",game,,,,,,,,,,
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","{'score': 79, 'url': 'https://www.metacritic.com/game/pc/day-of-defeat?ftag=MCD-06-10aaa1f'}",Day of Defeat,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day 
of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price...",[30],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],{'total': 3126},"{'coming_soon': False, 'date': '1 May, 2003'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.",30.0,"{'url': '', 'email': ''}","English, French, German, Italian, Spanish - Spain",game,http://www.dayofdefeat.com/,,,,,,,,,


In [4]:
# see df info
print(f"Info on Steam Store data")
df_steamstore.info()

Info on Steam Store data
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51749 entries, 0 to 51748
Data columns (total 38 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   about_the_game           51566 non-null  object 
 1   background               51566 non-null  object 
 2   categories               50613 non-null  object 
 3   content_descriptors      51566 non-null  object 
 4   detailed_description     51566 non-null  object 
 5   developers               51431 non-null  object 
 6   genres                   51474 non-null  object 
 7   header_image             51566 non-null  object 
 8   is_free                  51566 non-null  float64
 9   linux_requirements       51566 non-null  object 
 10  mac_requirements         51566 non-null  object 
 11  metacritic               3725 non-null   object 
 12  name                     51749 non-null  object 
 13  package_groups           51566 non-null  object 
 1

In [5]:
# see df shape and size
print(f"Shape of SteamSpy data : {df_steamspy.shape}")
print(f"First 3 rows of SteamSpy data")
df_steamspy.head(3)

Shape of SteamSpy data : (51749, 20)
First 3 rows of SteamSpy data


Unnamed: 0,appid,average_2weeks,average_forever,ccu,developer,discount,genre,initialprice,languages,median_2weeks,median_forever,name,negative,owners,positive,price,publisher,score_rank,tags,userscore
0,10.0,212.0,8690.0,16837.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,Counter-Strike,4944.0,"10,000,000 .. 20,000,000",193192.0,999,Valve,,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0
1,20.0,0.0,2752.0,77.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,Team Fortress Classic,896.0,"5,000,000 .. 10,000,000",5416.0,499,Valve,,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0
2,30.0,0.0,4250.0,139.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain",0.0,28.0,Day of Defeat,557.0,"5,000,000 .. 10,000,000",5007.0,499,Valve,,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0


In [6]:
# see df info
print(f"Info on SteamSpy data")
df_steamspy.info()

Info on SteamSpy data
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51749 entries, 0 to 51748
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   appid            51749 non-null  float64
 1   average_2weeks   51749 non-null  float64
 2   average_forever  51749 non-null  float64
 3   ccu              51749 non-null  float64
 4   developer        51749 non-null  object 
 5   discount         51727 non-null  object 
 6   genre            51749 non-null  object 
 7   initialprice     51727 non-null  object 
 8   languages        51727 non-null  object 
 9   median_2weeks    51749 non-null  float64
 10  median_forever   51749 non-null  float64
 11  name             51749 non-null  object 
 12  negative         51749 non-null  float64
 13  owners           51749 non-null  object 
 14  positive         51749 non-null  float64
 15  price            51720 non-null  object 
 16  publisher        51749 non-null  obj

---

## Removing duplicates within dataset

Before merging the two datasets, we remove the duplicate rows within each dataset.

#### Steam Store

Treating `steam_appid` as the unique identifier, the code below shows that there are `1537` duplicate rows in the Steam Store dataset.

In [7]:
# create copy for cleaning
df_steamstore_clean = df_steamstore.copy()

# show that there are duplicated rows within the table
print(f"Example of duplicated rows")
df_steamstore_clean.loc[df_steamstore_clean['steam_appid'].duplicated(keep=False), ['steam_appid', 'name']].sort_values(by=['steam_appid']).head(6)

Example of duplicated rows


Unnamed: 0,steam_appid,name
7,80.0,Counter-Strike: Condition Zero
8,80.0,Counter-Strike: Condition Zero
30,1300.0,SiN Episodes: Emergence
31,1300.0,SiN Episodes: Emergence
364,1620.0,Jagged Alliance 2 Gold
363,1620.0,Jagged Alliance 2 Gold
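
The `keep` argument of `duplicated` controls which rows are flagged. With the default `keep='first'`, every copy except the first is flagged, so the `1537` figure counts the extra rows that `drop_duplicates` removes, not the number of distinct duplicated ids. With `keep=False`, every copy is flagged, which is why the example table lists each game twice. A toy illustration with made-up ids:

```python
import pandas as pd

# Toy ids: 80.0 appears twice and 1300.0 appears three times
ids = pd.Series([10.0, 80.0, 80.0, 1300.0, 1300.0, 1300.0])

# keep="first" (the default) flags every copy except the first one,
# so its sum is the number of rows drop_duplicates would remove
extra_rows = ids.duplicated().sum()

# keep=False flags every row whose id occurs more than once
all_copies = ids.duplicated(keep=False).sum()

print(extra_rows)  # 3
print(all_copies)  # 5
```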


In [8]:
# Number of duplicated rows
# adding `end="\r"` to the print call keeps the output on the same line
print(f"Number of duplicated rows {len(df_steamstore_clean.loc[df_steamstore_clean['steam_appid'].duplicated(), 'steam_appid'])}")

# drop duplicated rows
print(f"dropping duplicated rows")
df_steamstore_clean.drop_duplicates(subset=['steam_appid'], inplace=True)

# confirm there are no more duplicated rows
print(f"Number of duplicated rows {len(df_steamstore_clean.loc[df_steamstore_clean['steam_appid'].duplicated(), 'steam_appid'])}")

Number of duplicated rows 1537
dropping duplicated rows
Number of duplicated rows 0
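
`drop_duplicates` keeps the original index labels of the surviving rows, which is why the `info()` output further down reports `Int64Index: 50212 entries, 0 to 51748` rather than a contiguous range. If a contiguous index is wanted, `reset_index(drop=True)` or the `ignore_index=True` argument (available since pandas 1.0) renumbers the rows. A toy example with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    "steam_appid": [10.0, 80.0, 80.0, 1300.0],
    "name": ["Counter-Strike", "Counter-Strike: Condition Zero",
             "Counter-Strike: Condition Zero", "SiN Episodes: Emergence"],
})

deduped = df.drop_duplicates(subset=["steam_appid"])
print(deduped.index.tolist())  # [0, 1, 3]: label 2 is gone, leaving a gap

# Either renumber afterwards ...
renumbered = deduped.reset_index(drop=True)
# ... or let drop_duplicates do it directly
renumbered_too = df.drop_duplicates(subset=["steam_appid"], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```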


In [9]:
# see df shape and size
print(f"Shape of Steam Store data : {df_steamstore_clean.shape}")
print(f"First 3 rows of Steam Store data")
df_steamstore_clean.head(3)

Shape of Steam Store data : (50212, 38)
First 3 rows of Steam Store data


Unnamed: 0,about_the_game,background,categories,content_descriptors,detailed_description,developers,genres,header_image,is_free,linux_requirements,mac_requirements,metacritic,name,package_groups,packages,pc_requirements,platforms,price_overview,publishers,recommendations,release_date,required_age,screenshots,short_description,steam_appid,support_info,supported_languages,type,website,dlc,achievements,demos,movies,controller_support,reviews,legal_notice,drm_notice,ext_user_account_notice
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","{'score': 88, 'url': 'https://www.metacritic.com/game/pc/counter-strike?ftag=MCD-06-10aaa1f'}",Counter-Strike,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 
'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...","[574941, 7]","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}",[Valve],{'total': 118599},"{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,10.0,"{'url': 'http://steamcommunity.com/app/10', 'email': ''}","English<strong>*</strong>, French<strong>*</strong>, German<strong>*</strong>, Italian<strong>*</strong>, Spanish - Spain<strong>*</strong>, Simplified Chinese<strong>*</strong>, Traditional Chinese<strong>*</strong>, Korean<strong>*</strong><br><strong>*</strong>languages with full audio support",game,,,,,,,,,,
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",,Team Fortress Classic,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 
'option_text': 'Team Fortress Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],{'total': 4486},"{'coming_soon': False, 'date': '1 Apr, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,"{'url': '', 'email': ''}","English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",game,,,,,,,,,,
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","{'score': 79, 'url': 'https://www.metacritic.com/game/pc/day-of-defeat?ftag=MCD-06-10aaa1f'}",Day of Defeat,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day 
of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price...",[30],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],{'total': 3126},"{'coming_soon': False, 'date': '1 May, 2003'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.",30.0,"{'url': '', 'email': ''}","English, French, German, Italian, Spanish - Spain",game,http://www.dayofdefeat.com/,,,,,,,,,


In [10]:
# see df info
print(f"Info on Steam Store data")
df_steamstore_clean.info()

Info on Steam Store data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 50212 entries, 0 to 51748
Data columns (total 38 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   about_the_game           50030 non-null  object 
 1   background               50030 non-null  object 
 2   categories               49091 non-null  object 
 3   content_descriptors      50030 non-null  object 
 4   detailed_description     50030 non-null  object 
 5   developers               49900 non-null  object 
 6   genres                   49941 non-null  object 
 7   header_image             50030 non-null  object 
 8   is_free                  50030 non-null  float64
 9   linux_requirements       50030 non-null  object 
 10  mac_requirements         50030 non-null  object 
 11  metacritic               3651 non-null   object 
 12  name                     50212 non-null  object 
 13  package_groups           50030 non-null  object 
 1

#### Steam Spy

Treating `appid` as the unique identifier, the code below shows that there are `1521` duplicate rows in the SteamSpy dataset.

In [11]:
# create copy for cleaning
df_steamspy_clean = df_steamspy.copy()

# show that there are duplicated rows within the table
print(f"Example of duplicated rows")
df_steamspy_clean.loc[df_steamspy_clean['appid'].duplicated(keep=False), ['appid', 'name']].sort_values(by=['appid']).head(6)

Example of duplicated rows


Unnamed: 0,appid,name
107,3420.0,Iggle Pop Deluxe
108,3420.0,Iggle Pop Deluxe
123,3580.0,The Wizard's Pen
124,3580.0,The Wizard's Pen
131,3730.0,Aliens versus Predator Classic 2000
132,3730.0,Aliens versus Predator Classic 2000


In [12]:
# Number of duplicated rows
# can add `end="\r"` into the print statement to ensure printing on same line
print(f"Number of duplicated rows {len(df_steamspy_clean.loc[df_steamspy_clean['appid'].duplicated(), 'appid'])}")

# drop duplicated rows
print("dropping duplicated rows")
df_steamspy_clean.drop_duplicates(subset=['appid'], inplace=True)

# confirm there are no more duplicated rows
print(f"Number of duplicated rows {len(df_steamspy_clean.loc[df_steamspy_clean['appid'].duplicated(), 'appid'])}")

Number of duplicated rows 1521
dropping duplicated rows
Number of duplicated rows 0
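The example table above shows both copies of each duplicated `appid`, while the count reports only `1521`. That is because `keep=False` flags every member of a duplicated group, whereas the default `keep="first"` flags only the extra copies that `drop_duplicates` removes. A minimal sketch on a small, made-up DataFrame (the values are hypothetical, not taken from the SteamSpy data):

```python
import pandas as pd

# Hypothetical stand-in for the SteamSpy data
toy = pd.DataFrame({
    "appid": [10.0, 20.0, 20.0, 30.0, 30.0, 30.0],
    "name": ["A", "B", "B", "C", "C", "C"],
})

# keep=False flags every row of a duplicated group (useful for inspection)
all_copies = toy["appid"].duplicated(keep=False)
# default keep="first" flags only the extra copies, i.e. the rows that get dropped
extra_copies = toy["appid"].duplicated()

print(all_copies.sum())    # 5
print(extra_copies.sum())  # 3

# drop_duplicates keeps the first occurrence of each appid by default
deduped = toy.drop_duplicates(subset=["appid"])
print(deduped["appid"].tolist())  # [10.0, 20.0, 30.0]
```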


In [13]:
# see df shape and size
print(f"Shape of SteamSpy data : {df_steamspy_clean.shape}")
print("First 3 rows of SteamSpy data")
df_steamspy_clean.head(3)

Shape of SteamSpy data : (50228, 20)
First 3 rows of SteamSpy data


Unnamed: 0,appid,average_2weeks,average_forever,ccu,developer,discount,genre,initialprice,languages,median_2weeks,median_forever,name,negative,owners,positive,price,publisher,score_rank,tags,userscore
0,10.0,212.0,8690.0,16837.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,Counter-Strike,4944.0,"10,000,000 .. 20,000,000",193192.0,999,Valve,,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0
1,20.0,0.0,2752.0,77.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,Team Fortress Classic,896.0,"5,000,000 .. 10,000,000",5416.0,499,Valve,,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0
2,30.0,0.0,4250.0,139.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain",0.0,28.0,Day of Defeat,557.0,"5,000,000 .. 10,000,000",5007.0,499,Valve,,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0


In [14]:
# see df info
print("Info on SteamSpy data")
df_steamspy_clean.info()

Info on SteamSpy data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 50228 entries, 0 to 51748
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   appid            50228 non-null  float64
 1   average_2weeks   50228 non-null  float64
 2   average_forever  50228 non-null  float64
 3   ccu              50228 non-null  float64
 4   developer        50228 non-null  object 
 5   discount         50206 non-null  object 
 6   genre            50228 non-null  object 
 7   initialprice     50206 non-null  object 
 8   languages        50206 non-null  object 
 9   median_2weeks    50228 non-null  float64
 10  median_forever   50228 non-null  float64
 11  name             50228 non-null  object 
 12  negative         50228 non-null  float64
 13  owners           50228 non-null  object 
 14  positive         50228 non-null  float64
 15  price            50199 non-null  object 
 16  publisher        50228 non-null  

---

## Merging both files

Now that we have removed the duplicates within the two datasets, we will merge them into one.

We will perform the merge using an **Inner Merge / Inner Join**. This keeps only the games whose ID appears in both datasets, so every row of the merged data has information from both sources. It does not guarantee that there are no missing values: individual columns can still contain nulls. Running the code below shows that `50205` rows are common to both datasets.

After merging the two datasets, we will proceed to clean the data.

In [15]:
# find the number of common values between the two datasets after 'inner' merge is applied
df_steamspy_clean['appid'].isin(df_steamstore_clean['steam_appid']).value_counts()

True     50205
False       23
Name: appid, dtype: int64
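Because `appid` is now unique on both sides, the number of `True` values from `isin` should equal the number of rows an inner merge returns. A minimal sketch on hypothetical toy frames; it also shows how an outer merge with `indicator=True` labels rows that exist on only one side, which is one way to list the SteamSpy games (here, `23` of them) that the inner merge leaves out:

```python
import pandas as pd

# Hypothetical toy frames standing in for the two cleaned datasets
store = pd.DataFrame({"steam_appid": [10.0, 20.0, 30.0], "about": ["a", "b", "c"]})
spy = pd.DataFrame({"appid": [10.0, 30.0, 40.0], "ccu": [5.0, 7.0, 9.0]})

# number of SteamSpy IDs that also appear in the store data
print(spy["appid"].isin(store["steam_appid"]).sum())  # 2

inner = pd.merge(store, spy, left_on="steam_appid", right_on="appid", how="inner")
print(len(inner))  # 2

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'
outer = pd.merge(store, spy, left_on="steam_appid", right_on="appid",
                 how="outer", indicator=True)
unmatched = outer.loc[outer["_merge"] == "right_only", "appid"]
print(unmatched.tolist())  # [40.0]
```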

In [16]:
# merge the two datasets
df_steam = pd.merge(df_steamstore_clean,   # 'left' dataset
                    df_steamspy_clean,     # 'right' dataset
                    left_on="steam_appid", # key column in the 'left' dataset
                    right_on="appid",      # key column in the 'right' dataset
                    how="inner",           # the default; other options are 'left', 'right', 'outer', 'cross'
                   )

In [17]:
# ensure that all identified rows are merged together
print(f"Shape of Steam Store data : {df_steamstore_clean.shape}")
print(f"Shape of SteamSpy data : {df_steamspy_clean.shape}")
print(f"Shape of merged data : {df_steam.shape}")

Shape of Steam Store data : (50212, 38)
Shape of SteamSpy data : (50228, 20)
Shape of merged data : (50205, 58)


In [18]:
print("First 3 rows of merged data")
df_steam.head(3)

First 3 rows of merged data


Unnamed: 0,about_the_game,background,categories,content_descriptors,detailed_description,developers,genres,header_image,is_free,linux_requirements,mac_requirements,metacritic,name_x,package_groups,packages,pc_requirements,platforms,price_overview,publishers,recommendations,release_date,required_age,screenshots,short_description,steam_appid,...,controller_support,reviews,legal_notice,drm_notice,ext_user_account_notice,appid,average_2weeks,average_forever,ccu,developer,discount,genre,initialprice,languages,median_2weeks,median_forever,name_y,negative,owners,positive,price,publisher,score_rank,tags,userscore
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","{'score': 88, 'url': 'https://www.metacritic.com/game/pc/counter-strike?ftag=MCD-06-10aaa1f'}",Counter-Strike,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 
'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...","[574941, 7]","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}",[Valve],{'total': 118599},"{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,10.0,...,,,,,,10.0,212.0,8690.0,16837.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,Counter-Strike,4944.0,"10,000,000 .. 20,000,000",193192.0,999,Valve,,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",,Team Fortress Classic,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 
'option_text': 'Team Fortress Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],{'total': 4486},"{'coming_soon': False, 'date': '1 Apr, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,...,,,,,,20.0,0.0,2752.0,77.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,Team Fortress Classic,896.0,"5,000,000 .. 10,000,000",5416.0,499,Valve,,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","{'score': 79, 'url': 'https://www.metacritic.com/game/pc/day-of-defeat?ftag=MCD-06-10aaa1f'}",Day of Defeat,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day 
of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price...",[30],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],{'total': 3126},"{'coming_soon': False, 'date': '1 May, 2003'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.",30.0,...,,,,,,30.0,0.0,4250.0,139.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain",0.0,28.0,Day of Defeat,557.0,"5,000,000 .. 10,000,000",5007.0,499,Valve,,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0
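The merged data has $38 + 20 = 58$ columns: since the key columns have different names, both `steam_appid` and `appid` are kept, and because both datasets have a `name` column, pandas renames them with its default suffixes `_x` (left) and `_y` (right), which is why `name_x` and `name_y` appear in the output above. A minimal sketch on hypothetical toy frames, showing the default suffixes, custom ones via the `suffixes` parameter, and one way to check whether the two name columns agree:

```python
import pandas as pd

# Hypothetical toy frames that share a `name` column
store = pd.DataFrame({"steam_appid": [10.0, 20.0],
                      "name": ["Counter-Strike", "Team Fortress Classic"]})
spy = pd.DataFrame({"appid": [10.0, 20.0],
                    "name": ["Counter-Strike", "Team Fortress Classic"]})

merged = pd.merge(store, spy, left_on="steam_appid", right_on="appid")
print(list(merged.columns))  # ['steam_appid', 'name_x', 'appid', 'name_y']

# custom suffixes make the origin of each column explicit
merged_named = pd.merge(store, spy, left_on="steam_appid", right_on="appid",
                        suffixes=("_store", "_spy"))
print(list(merged_named.columns))  # ['steam_appid', 'name_store', 'appid', 'name_spy']

# count rows where the two name columns disagree before dropping one of them
print((merged["name_x"] != merged["name_y"]).sum())  # 0
```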


In [19]:
# see dataset info
df_steam.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 50205 entries, 0 to 50204
Data columns (total 58 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   about_the_game           50023 non-null  object 
 1   background               50023 non-null  object 
 2   categories               49084 non-null  object 
 3   content_descriptors      50023 non-null  object 
 4   detailed_description     50023 non-null  object 
 5   developers               49893 non-null  object 
 6   genres                   49934 non-null  object 
 7   header_image             50023 non-null  object 
 8   is_free                  50023 non-null  float64
 9   linux_requirements       50023 non-null  object 
 10  mac_requirements         50023 non-null  object 
 11  metacritic               3650 non-null   object 
 12  name_x                   50205 non-null  object 
 13  package_groups           50023 non-null  object 
 14  packages              

In [20]:
# see if there is any null data
num_col_null(df_steam)

Number of columns with null values: 40



Unnamed: 0,column,number of null values,percentage
0,about_the_game,182,0.003625
1,background,182,0.003625
2,categories,1121,0.022328
3,content_descriptors,182,0.003625
4,detailed_description,182,0.003625
5,developers,312,0.006215
6,genres,271,0.005398
7,header_image,182,0.003625
8,is_free,182,0.003625
9,linux_requirements,182,0.003625


We see that $40$ out of $58$ columns have missing data.

At first glance, the majority of these columns have at least 182 missing values. Checking further (by running the code below), we find that these 182 missing values come from the same rows across the columns (with the exception of 4), so we will remove these 182 rows.

In [21]:
# rows that have null values in `about_the_game`
num_col_null(df_steam.loc[df_steam['about_the_game'].isnull(),:])

Number of columns with null values: 40



Unnamed: 0,column,number of null values,percentage
0,about_the_game,182,1.0
1,background,182,1.0
2,categories,182,1.0
3,content_descriptors,182,1.0
4,detailed_description,182,1.0
5,developers,182,1.0
6,genres,182,1.0
7,header_image,182,1.0
8,is_free,182,1.0
9,linux_requirements,182,1.0
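The check above restricts the data to the rows where `about_the_game` is null and confirms that the other columns are null on those same rows. A minimal sketch of the same idea on a hypothetical toy frame, followed by two equivalent ways of dropping those rows (a boolean `notnull()` filter, as used in the next cell, or `dropna(subset=...)`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: rows 1 and 3 are "empty" store entries
toy = pd.DataFrame({
    "about_the_game": ["x", np.nan, "y", np.nan],
    "background": ["u", np.nan, "v", np.nan],
    "metacritic": [np.nan, np.nan, 80, np.nan],
})

missing_about = toy["about_the_game"].isnull()
# True for a column if it is null on every row where about_the_game is null
print(toy.loc[missing_about].isnull().all())

# Two equivalent ways to drop those rows
a = toy[toy["about_the_game"].notnull()]
b = toy.dropna(subset=["about_the_game"])
print(a.equals(b))  # True
```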


In [22]:
# create copy for cleaning
df_steam_clean = df_steam.copy()

# keep only rows where `about_the_game` is not null (removes the 182 rows)
df_steam_clean = df_steam_clean[df_steam_clean['about_the_game'].notnull()]

In [23]:
# see df shape and size
print(f"Shape of merged data : {df_steam_clean.shape}")
print("First 3 rows of merged data")
df_steam_clean.head(3)

Shape of merged data : (50023, 58)
First 3 rows of merged data


Unnamed: 0,about_the_game,background,categories,content_descriptors,detailed_description,developers,genres,header_image,is_free,linux_requirements,mac_requirements,metacritic,name_x,package_groups,packages,pc_requirements,platforms,price_overview,publishers,recommendations,release_date,required_age,screenshots,short_description,steam_appid,...,controller_support,reviews,legal_notice,drm_notice,ext_user_account_notice,appid,average_2weeks,average_forever,ccu,developer,discount,genre,initialprice,languages,median_2weeks,median_forever,name_y,negative,owners,positive,price,publisher,score_rank,tags,userscore
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","{'score': 88, 'url': 'https://www.metacritic.com/game/pc/counter-strike?ftag=MCD-06-10aaa1f'}",Counter-Strike,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 
'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...","[574941, 7]","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}",[Valve],{'total': 118599},"{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,10.0,...,,,,,,10.0,212.0,8690.0,16837.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,Counter-Strike,4944.0,"10,000,000 .. 20,000,000",193192.0,999,Valve,,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",,Team Fortress Classic,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 
'option_text': 'Team Fortress Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],{'total': 4486},"{'coming_soon': False, 'date': '1 Apr, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,...,,,,,,20.0,0.0,2752.0,77.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,Team Fortress Classic,896.0,"5,000,000 .. 10,000,000",5416.0,499,Valve,,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","{'score': 79, 'url': 'https://www.metacritic.com/game/pc/day-of-defeat?ftag=MCD-06-10aaa1f'}",Day of Defeat,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day 
of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price...",[30],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],{'total': 3126},"{'coming_soon': False, 'date': '1 May, 2003'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.",30.0,...,,,,,,30.0,0.0,4250.0,139.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain",0.0,28.0,Day of Defeat,557.0,"5,000,000 .. 10,000,000",5007.0,499,Valve,,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0


In [24]:
# see df info
print(f"Info on merged data")
df_steam_clean.info()

Info on merged data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 50023 entries, 0 to 50204
Data columns (total 58 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   about_the_game           50023 non-null  object 
 1   background               50023 non-null  object 
 2   categories               49084 non-null  object 
 3   content_descriptors      50023 non-null  object 
 4   detailed_description     50023 non-null  object 
 5   developers               49893 non-null  object 
 6   genres                   49934 non-null  object 
 7   header_image             50023 non-null  object 
 8   is_free                  50023 non-null  float64
 9   linux_requirements       50023 non-null  object 
 10  mac_requirements         50023 non-null  object 
 11  metacritic               3650 non-null   object 
 12  name_x                   50023 non-null  object 
 13  package_groups           50023 non-null  object 
 14  pa

In [25]:
# see if there is any null data
print(f"In merged dataset -", end=" ")
num_col_null(df_steam_clean)

In merged dataset - Number of columns with null values: 23



| | column | number of null values | percentage |
|---|---|---|---|
| 0 | categories | 939 | 0.018771 |
| 1 | developers | 130 | 0.002599 |
| 2 | genres | 89 | 0.001779 |
| 3 | metacritic | 46373 | 0.927034 |
| 4 | packages | 6870 | 0.137337 |
| 5 | price_overview | 7439 | 0.148712 |
| 6 | recommendations | 38810 | 0.775843 |
| 7 | screenshots | 74 | 0.001479 |
| 8 | supported_languages | 33 | 0.00066 |
| 9 | website | 21473 | 0.429263 |
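`num_col_null` lives in `utils.py`, which is not reproduced here. As a rough guide to reading its output, here is a minimal sketch of a helper that prints and tabulates the same three quantities. `num_col_null_sketch` is a hypothetical name, and the real function may differ, for example in how it orders rows or limits the display.

```python
import numpy as np
import pandas as pd


def num_col_null_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for `utils.num_col_null`: summarise nulls per column."""
    null_counts = df.isnull().sum()
    # keep only the columns that actually contain null values, in column order
    null_counts = null_counts[null_counts > 0]
    print(f"Number of columns with null values: {len(null_counts)}")
    # 'percentage' is a fraction of all rows, e.g. 939 / 50023 is about 0.018771
    return pd.DataFrame({
        'column': null_counts.index,
        'number of null values': null_counts.values,
        'percentage': (null_counts / len(df)).values,
    })


toy = pd.DataFrame({'a': [1, np.nan, 3, 4], 'b': ['x', 'y', None, None], 'c': [1, 2, 3, 4]})
summary = num_col_null_sketch(toy)
print(summary)
```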


Some columns still have a large amount of missing data. We will drop every column in which more than $25\%$ of the rows are missing, i.e. more than `df_steam_clean.shape[0] // 4` rows.

In [26]:
# threshold: columns missing more than a quarter (25%) of the rows are dropped
print(f'Drop columns with more than {df_steam_clean.shape[0] // 4} missing rows\n')

# identify columns that should be dropped
column_dropped = df_steam_clean.columns[(df_steam_clean.isnull().sum()) > (df_steam_clean.shape[0] // 4)]
print(f"Number of columns dropped: {len(column_dropped)}")
print(f'Columns dropped: {list(column_dropped)}')

# drop the columns
df_steam_clean.drop(columns=column_dropped, inplace=True)

Drop columns with more than 12505 missing rows

Number of columns dropped: 11
Columns dropped: ['metacritic', 'recommendations', 'website', 'dlc', 'achievements', 'demos', 'controller_support', 'reviews', 'legal_notice', 'drm_notice', 'ext_user_account_notice']
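The same rule can be written as a fraction of rows, which may read more naturally than a row count. This is a minimal sketch assuming the threshold is "strictly more than 25% missing"; `drop_sparse_columns` is a hypothetical helper and the toy frame is made up. Because a null count is an integer, `count > n // 4` and `count / n > 0.25` select the same columns.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing_fraction: float = 0.25) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose share of null values exceeds the threshold."""
    # the mean of a boolean frame is the fraction of True (null) values per column
    missing_fraction = df.isnull().mean()
    to_drop = missing_fraction[missing_fraction > max_missing_fraction].index
    return df.drop(columns=to_drop)


toy = pd.DataFrame({
    'kept': [1, np.nan, 3, 4],          # 25% missing: not strictly above the threshold
    'dropped': [np.nan, np.nan, 3, 4],  # 50% missing
    'full': [1, 2, 3, 4],               # no missing values
})
print(drop_sparse_columns(toy).columns.tolist())
```

Returning a new frame instead of using `inplace=True` makes it easy to compare the shapes before and after the drop.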


In [27]:
# see df shape and size
print(f"Shape of merged data : {df_steam_clean.shape}")
print(f"First 3 rows of merged data")
df_steam_clean.head(3)

Shape of merged data : (50023, 47)
First 3 rows of merged data


Unnamed: 0,about_the_game,background,categories,content_descriptors,detailed_description,developers,genres,header_image,is_free,linux_requirements,mac_requirements,name_x,package_groups,packages,pc_requirements,platforms,price_overview,publishers,release_date,required_age,screenshots,short_description,steam_appid,support_info,supported_languages,type,movies,appid,average_2weeks,average_forever,ccu,developer,discount,genre,initialprice,languages,median_2weeks,median_forever,name_y,negative,owners,positive,price,publisher,score_rank,tags,userscore
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Counter-Strike,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 
'is_free_license...","[574941, 7]","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}",[Valve],"{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,10.0,"{'url': 'http://steamcommunity.com/app/10', 'email': ''}","English<strong>*</strong>, French<strong>*</strong>, German<strong>*</strong>, Italian<strong>*</strong>, Spanish - Spain<strong>*</strong>, Simplified Chinese<strong>*</strong>, Traditional Chinese<strong>*</strong>, Korean<strong>*</strong><br><strong>*</strong>languages with full audio support",game,,10.0,212.0,8690.0,16837.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,Counter-Strike,4944.0,"10,000,000 .. 
20,000,000",193192.0,999,Valve,,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Team Fortress Classic,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 
'option_text': 'Team Fortress Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],"{'coming_soon': False, 'date': '1 Apr, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,"{'url': '', 'email': ''}","English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",game,,20.0,0.0,2752.0,77.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,Team Fortress Classic,896.0,"5,000,000 .. 
10,000,000",5416.0,499,Valve,,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",[Valve],"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0.0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Day of Defeat,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': 
False, 'price...",[30],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],"{'coming_soon': False, 'date': '1 May, 2003'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.",30.0,"{'url': '', 'email': ''}","English, French, German, Italian, Spanish - Spain",game,,30.0,0.0,4250.0,139.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain",0.0,28.0,Day of Defeat,557.0,"5,000,000 .. 10,000,000",5007.0,499,Valve,,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0


In [28]:
# see df info
print(f"Info on merged data")
df_steam_clean.info()

Info on merged data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 50023 entries, 0 to 50204
Data columns (total 47 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   about_the_game        50023 non-null  object 
 1   background            50023 non-null  object 
 2   categories            49084 non-null  object 
 3   content_descriptors   50023 non-null  object 
 4   detailed_description  50023 non-null  object 
 5   developers            49893 non-null  object 
 6   genres                49934 non-null  object 
 7   header_image          50023 non-null  object 
 8   is_free               50023 non-null  float64
 9   linux_requirements    50023 non-null  object 
 10  mac_requirements      50023 non-null  object 
 11  name_x                50023 non-null  object 
 12  package_groups        50023 non-null  object 
 13  packages              43153 non-null  object 
 14  pc_requirements       50023 non-null  object 
 15 

In [29]:
# see if there is any null data
print(f"In merged dataset -", end=" ")
num_col_null(df_steam_clean)

In merged dataset - Number of columns with null values: 12



| | column | number of null values | percentage |
|---|---|---|---|
| 0 | categories | 939 | 0.018771 |
| 1 | developers | 130 | 0.002599 |
| 2 | genres | 89 | 0.001779 |
| 3 | packages | 6870 | 0.137337 |
| 4 | price_overview | 7439 | 0.148712 |
| 5 | screenshots | 74 | 0.001479 |
| 6 | supported_languages | 33 | 0.00066 |
| 7 | movies | 2297 | 0.045919 |
| 8 | discount | 2 | 4e-05 |
| 9 | initialprice | 2 | 4e-05 |


---

## Data cleaning - null values

Before merging the two datasets, we removed the duplicates within each of them. We now turn to the null values that remain in the merged data.

#### `price_overview`, `initialprice`, `discount`, `price`

Let us start with the column that has the largest number of missing values, together with the related price columns that also have missing values:
- `price_overview`: `7439` null values
- `initialprice`: `2` null values
- `discount`: `2` null values
- `price`: `2` null values

In [30]:
# check whether the missing values in `discount`, `initialprice` and `price` are in the same rows as those in `price_overview`
num_col_null(df_steam_clean.loc[(df_steam_clean['price_overview'].isnull()), 
                                ['price_overview', 'initialprice', 'discount', 'price']])

Number of columns with null values: 4



| | column | number of null values | percentage |
|---|---|---|---|
| 0 | price_overview | 7439 | 1.0 |
| 1 | initialprice | 1 | 0.000134 |
| 2 | discount | 1 | 0.000134 |
| 3 | price | 1 | 0.000134 |


In [31]:
# check whether the missing values in `discount` and `initialprice` are in the same rows as those in `price` and `price_overview`
num_col_null(df_steam_clean.loc[(df_steam_clean['price_overview'].isnull()) | (df_steam_clean['price'].isnull()), 
                                ['price_overview', 'initialprice', 'discount', 'price']])

Number of columns with null values: 4



| | column | number of null values | percentage |
|---|---|---|---|
| 0 | price_overview | 7439 | 0.999866 |
| 1 | initialprice | 2 | 0.000269 |
| 2 | discount | 2 | 0.000269 |
| 3 | price | 2 | 0.000269 |


In [32]:
# see number of rows with the null values identified
len(df_steam_clean.loc[(df_steam_clean['price_overview'].isnull()) | (df_steam_clean['price'].isnull()), 
                                ['price_overview', 'initialprice', 'discount', 'price']])

7440

Together, the null values in these four columns span `7440` rows: the `7439` rows missing `price_overview`, plus one row that has a `price_overview` but is missing `initialprice`, `discount` and `price`.

Cleaning therefore needs to be performed on these `7440` rows. We will first look at the columns in detail. Another column, `is_free`, is also related to pricing, so we include it when we look at the data.
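Counting those rows can also be done in one step: `isnull().any(axis=1)` flags a row when any of the listed columns is null. Since all the `initialprice`, `discount` and `price` nulls fall inside the `price_overview`/`price` union above, this should give the same `7440` rows on the real data. The toy frame below is invented for illustration.

```python
import numpy as np
import pandas as pd

price_cols = ['price_overview', 'initialprice', 'discount', 'price']

# made-up rows: row 1 lacks only `price_overview`, row 2 lacks only the SteamSpy prices
toy = pd.DataFrame({
    'price_overview': [{'final': 525}, None, {'final': 999}, {'final': 100}],
    'initialprice': [525, 499, np.nan, 100],
    'discount': [0, 0, np.nan, 0],
    'price': [525, 499, np.nan, 100],
})

# True for every row in which at least one of the price columns is null
any_null = toy[price_cols].isnull().any(axis=1)
print(int(any_null.sum()))
```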

In [33]:
# see first 3 rows of the columns
list_price_col = ['price_overview', 'initialprice', 'discount', 'price', 'is_free']

df_steam_clean[list_price_col].head(3)

| | price_overview | initialprice | discount | price | is_free |
|---|---|---|---|---|---|
| 0 | `{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}` | 999 | 0 | 999 | 0.0 |
| 1 | `{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}` | 499 | 0 | 499 | 0.0 |
| 2 | `{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}` | 499 | 0 | 499 | 0.0 |


In [34]:
# values within `is_free` column
print(f"for merge dataset, values found in 'is_free' column and the counts")
df_steam_clean['is_free'].value_counts()

for merge dataset, values found in 'is_free' column and the counts


0.0    44907
1.0     5116
Name: is_free, dtype: int64

`is_free` only takes the values `0` and `1`, so it is effectively a boolean column, with `1` meaning `True` and `0` meaning `False`. We will convert the column from `float64` to `int64`.
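Before casting, it may be worth confirming that the column really contains only `0` and `1`: `astype` to an integer type would silently truncate a value such as `0.5` and fails on `NaN`. A minimal sketch on a made-up series:

```python
import pandas as pd

# made-up values mirroring the float64 `is_free` column
is_free = pd.Series([0.0, 1.0, 0.0, 1.0], name='is_free')

# only cast when every value is exactly 0 or 1; NaN or 0.5 would fail this check
if not is_free.isin([0, 1]).all():
    raise ValueError("is_free contains values other than 0 and 1")

# an explicit 'int64' avoids depending on the platform's default integer size
is_free_int = is_free.astype('int64')
print(is_free_int.tolist())
```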

In [35]:
# convert Dtype of column to omit decimal
df_steam_clean['is_free'] = df_steam_clean['is_free'].astype(int)

# values in `is_free` column for null values in price_overview
print(f"breakdown of values in `is_free` for the 7439 rows with null values")
df_steam_clean.loc[(df_steam_clean['price_overview'].isnull()), 'is_free'].value_counts()

breakdown of values in `is_free` for the 7439 rows with null values


1    5111
0    2328
Name: is_free, dtype: int64

Out of the `5116` free games, 5 have a non-null `price_overview`. Let us take a look at these 5 rows.

In [36]:
# see the 5 lines with data for free games
df_steam_clean.loc[((df_steam_clean['price_overview'].notnull()) & (df_steam_clean['is_free']==1)), list_price_col]

| | price_overview | initialprice | discount | price | is_free |
|---|---|---|---|---|---|
| 5430 | `{'currency': 'SGD', 'initial': 1500, 'final': 1500, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'Free'}` | 1499 | 90 | 149 | 1 |
| 7220 | `{'currency': 'SGD', 'initial': 1120, 'final': 1120, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'Free'}` | 999 | 0 | 999 | 1 |
| 14592 | `{'currency': 'SGD', 'initial': 110, 'final': 110, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'Free'}` | 99 | 0 | 99 | 1 |
| 28467 | `{'currency': 'SGD', 'initial': 1850, 'final': 1850, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'Free'}` | 1999 | 0 | 1999 | 1 |
| 44168 | `{'currency': 'SGD', 'initial': 2280, 'final': 2280, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'Free'}` | 1510 | 0 | 1510 | 1 |


These 5 rows alone already show conflicting data: for example, row `5430` has an `initialprice` of `1499` with a `discount` of `90`, while its `price_overview` shows `1500` with no discount. As our ultimate aim is to analyse Steam data, we will prefer the Steam Store data over the SteamSpy data wherever the two conflict, as in the rows above.
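For columns that appear in both sources, such as `name_x` (presumably the Steam Store side of the merge) and `name_y` (SteamSpy), one way to apply this rule is `Series.combine_first`, which keeps the first series' value and falls back to the second only where the first is null. A sketch with made-up values:

```python
import pandas as pd

# made-up values: `name_x` plays the Steam Store column, `name_y` the SteamSpy one
df = pd.DataFrame({
    'name_x': ['Counter-Strike', None, 'Day of Defeat'],
    'name_y': ['Counter-Strike', 'Team Fortress Classic', 'DoD'],
})

# keep the Steam Store value; fall back to SteamSpy only where it is null
df['name'] = df['name_x'].combine_first(df['name_y'])
print(df['name'].tolist())
```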

Part of the difference between the two sources comes from currency. SteamSpy prices are in US cents (`999` means 9.99 US dollars) and `discount` is a percentage, whereas each Steam Store `price_overview` states its own currency (for example `SGD` or `EUR`). We will handle this at a later stage, when we separate the details into their own columns.

Each `price_overview` value is a dictionary of 6 key-value pairs, and the 5 rows above all share the pair `'final_formatted': 'Free'`.

As such, wherever `is_free` is **True** (`1`) and `price_overview` is null, we will fill in `{'currency': '', 'initial': 0, 'final': 0, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'Free'}`.

In [37]:
# fill price_overview with dictionary if `is_free` is 1 and price_overview is null
df_steam_clean['price_overview'] = [{'currency': '', 'initial': 0, 'final': 0, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'Free'}
                                    if ((row['is_free'] == 1)&(pd.isna(row['price_overview']))) else row['price_overview'] 
                                    for index, row in df_steam_clean.iterrows()]
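`iterrows` builds a new Series for every row, which is slow on 50,000 rows. Here is a minimal sketch of the same fill that walks the two columns with `zip` instead. `FREE_PRICE` is a hypothetical constant holding the dictionary above, and the toy frame is made up. `dict(FREE_PRICE)` gives each row its own copy, so editing one row later does not change the others.

```python
import pandas as pd

# hypothetical constant holding the "free game" dictionary from the text above
FREE_PRICE = {'currency': '', 'initial': 0, 'final': 0, 'discount_percent': 0,
              'initial_formatted': '', 'final_formatted': 'Free'}

# made-up rows: free with no price, free with a store price, paid with no price
df = pd.DataFrame({
    'is_free': [1, 1, 0],
    'price_overview': [None, {'currency': 'SGD', 'final_formatted': 'Free'}, None],
})

# zip walks the two columns in step, avoiding the per-row Series that iterrows builds
df['price_overview'] = [
    dict(FREE_PRICE) if (free == 1 and pd.isna(price)) else price
    for free, price in zip(df['is_free'], df['price_overview'])
]
print(df['price_overview'].tolist())
```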

In [38]:
# values in `is_free` column for null values in price_overview, 'price', 'initialprice', 'discount'
print(f"breakdown of values in `is_free` for the (7440-5111) rows with null values")
df_steam_clean.loc[(df_steam_clean['price_overview'].isnull()) | (df_steam_clean['price'].isnull()), 'is_free'].value_counts()

breakdown of values in `is_free` for the (7440-5111) rows with null values


0    2329
Name: is_free, dtype: int64

In [39]:
# see the missing values
num_col_null(df_steam_clean.loc[(df_steam_clean['price_overview'].isnull()) | (df_steam_clean['price'].isnull()), 
                                list_price_col])

Number of columns with null values: 4



| | column | number of null values | percentage |
|---|---|---|---|
| 0 | price_overview | 2328 | 0.999571 |
| 1 | initialprice | 2 | 0.000859 |
| 2 | discount | 2 | 0.000859 |
| 3 | price | 2 | 0.000859 |


2329 rows remain, and none of these games is marked as free. We will use the values in `initialprice`, `discount` and `price` to build their `price_overview` dictionaries, and drop the 2 rows that are missing data in all 3 of these columns.

In [40]:
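# remove `2` rows with missing data; `.copy()` makes this an independent frame,
# so the cell-by-cell assignments that follow do not trigger a SettingWithCopyWarning
df_steam_clean = df_steam_clean[(df_steam_clean['price'].notnull())].copy()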

In [41]:
# see the missing values
num_col_null(df_steam_clean.loc[(df_steam_clean['price_overview'].isnull()), list_price_col])

Number of columns with null values: 1



| | column | number of null values | percentage |
|---|---|---|---|
| 0 | price_overview | 2327 | 1.0 |


In [42]:
# fill each null `price_overview` with a dictionary built from the SteamSpy columns
for index, row in df_steam_clean.loc[(df_steam_clean['price_overview'].isnull())].iterrows():
    # create a new dictionary for every row; a single dictionary shared across
    # iterations would leave every filled row pointing to the last row's values
    temp = {
        'currency': 'USD',
        'initial': row['initialprice'],
        'final': row['price'],
        'discount_percent': row['discount'],
        'initial_formatted': int(row['initialprice'])/100,
        'final_formatted': int(row['price'])/100,
    }
    # `.at` stores the dictionary as a single cell value
    df_steam_clean.at[index, 'price_overview'] = temp
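A loop-free sketch of the same fill, assuming the SteamSpy values are prices in US cents stored either as numbers or as strings of digits. `steamspy_price_dict` is a hypothetical helper, and unlike the cell above it converts `initial`, `final` and `discount_percent` to `int` so that they match the integer values in the Steam Store dictionaries.

```python
import pandas as pd


def steamspy_price_dict(initialprice, discount, price):
    """Hypothetical helper: build a Steam-Store-style dict from SteamSpy prices in US cents."""
    initial, final = int(initialprice), int(price)
    return {
        'currency': 'USD',
        'initial': initial,
        'final': final,
        'discount_percent': int(discount),
        'initial_formatted': initial / 100,  # 999 cents -> 9.99
        'final_formatted': final / 100,
    }


# made-up rows; SteamSpy values are assumed to possibly arrive as strings
df = pd.DataFrame({
    'price_overview': [None, {'currency': 'SGD', 'initial': 525, 'final': 525}, None],
    'initialprice': ['999', '525', '1999'],
    'discount': ['50', '0', '0'],
    'price': ['499', '525', '1999'],
})

missing = df['price_overview'].isna()
# a fresh dictionary is created for every missing row; existing values are kept
df['price_overview'] = [
    steamspy_price_dict(ip, disc, p) if is_missing else current
    for is_missing, current, ip, disc, p in zip(
        missing, df['price_overview'], df['initialprice'], df['discount'], df['price'])
]
print(df.loc[0, 'price_overview'])
```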

In [43]:
# see the missing values
num_col_null(df_steam_clean.loc[(df_steam_clean['price_overview'].isnull()), list_price_col])

Number of columns with null values: 0



| | column | number of null values | percentage |
|---|---|---|---|


This completes the cleaning of the price-related columns. Now, let us look at the rest of the data.

In [44]:
# see if there is any null data
print(f"In merged dataset -", end=" ")
num_col_null(df_steam_clean)

In merged dataset - Number of columns with null values: 7



| | column | number of null values | percentage |
|---|---|---|---|
| 0 | categories | 939 | 0.018772 |
| 1 | developers | 130 | 0.002599 |
| 2 | genres | 89 | 0.001779 |
| 3 | packages | 6869 | 0.137322 |
| 4 | screenshots | 74 | 0.001479 |
| 5 | supported_languages | 33 | 0.00066 |
| 6 | movies | 2297 | 0.045921 |


#### `packages`

`packages` now has the largest number of missing values. Another column, `package_groups`, also contains details about packages, so let us examine the two columns together.

In [45]:
# see head of both columns
df_steam_clean[['packages', 'package_groups']].head()

Unnamed: 0,packages,package_groups
0,"[574941, 7]","[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 'is_free_license..."
1,[29],"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Team Fortress Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license..."
2,[30],"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price..."
3,[31],"[{'name': 'default', 'title': 'Buy Deathmatch Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 31, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Deathmatch Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': Fal..."
4,[32],"[{'name': 'default', 'title': 'Buy Half-Life: Opposing Force', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 32, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Opposing Force - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': ..."


In [46]:
# values in `package_groups` column for null values in `packages`
df_steam_clean.loc[df_steam_clean['packages'].isnull(),'package_groups'].value_counts()

[]    6869
Name: package_groups, dtype: int64

In [47]:
# `packages` values in rows where `package_groups` is an empty list but `packages` is not null
df_steam_clean.loc[[not bool(value) for value in df_steam_clean['package_groups']]&
                   (df_steam_clean['packages'].notnull()),
                   'packages'].value_counts()

[619]       1
[371232]    1
[338240]    1
[324141]    1
[324084]    1
           ..
[331024]    1
[325604]    1
[196449]    1
[443194]    1
[624460]    1
Name: packages, Length: 292, dtype: int64

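For every row with a missing `packages` value, `package_groups` is an empty list.

The reverse does not hold: `292` rows have an empty `package_groups` but a non-null `packages`. Since `value_counts` sorts in descending order by default and the top count is `1`, each of these `packages` values is unique, so there is no common value that could serve as a placeholder.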
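Both checks can also be written with `Series.str.len()`, which returns the length of list values as well as strings. A minimal sketch on a made-up frame; `empty_groups` and `orphan_packages` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# made-up rows mirroring the `packages` / `package_groups` pattern above
df = pd.DataFrame({
    'packages': [[574941, 7], np.nan, [619]],
    'package_groups': [[{'name': 'default'}], [], []],
})

# Series.str.len() also works on list values and returns their length
empty_groups = df['package_groups'].str.len().eq(0)

# check 1: every row with null `packages` has an empty `package_groups`
null_packages_have_empty_groups = bool(empty_groups[df['packages'].isnull()].all())

# check 2: rows with an empty `package_groups` that still list packages
orphan_packages = df.loc[empty_groups & df['packages'].notnull(), 'packages']
print(null_packages_have_empty_groups, orphan_packages.tolist())
```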

We will fill the null values in `packages` with `0`, because `fillna` does not accept a list as the fill value, so running
```python 
df.fillna(value=[])
``` 
raises an error.

In [48]:
df_steam_clean['packages'] = df_steam_clean['packages'].fillna(value=0)
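Filling with `0` mixes integers into a column that otherwise holds lists. If a uniform type is preferred later on, one alternative is to replace every non-list cell with its own empty list. This sketch uses a made-up series and is an assumption about how the column might be used, not what the notebook does.

```python
import numpy as np
import pandas as pd

# made-up `packages` values with one missing entry
packages = pd.Series([[574941, 7], np.nan, [29]], name='packages')

# fillna(value=[]) is rejected, so swap every non-list cell for its own empty list
packages_filled = packages.apply(lambda value: value if isinstance(value, list) else [])
print(packages_filled.tolist())
```

With every cell a list, a length-based check such as `packages_filled.str.len()` works without special-casing the placeholder.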

Now that we have finished cleaning the `packages` column, we will look at the next column with the highest number of missing values. 

In [49]:
# see if there is any null data
print(f"In merged dataset -", end=" ")
num_col_null(df_steam_clean)

In merged dataset - Number of columns with null values: 6



| | column | number of null values | percentage |
|---|---|---|---|
| 0 | categories | 939 | 0.018772 |
| 1 | developers | 130 | 0.002599 |
| 2 | genres | 89 | 0.001779 |
| 3 | screenshots | 74 | 0.001479 |
| 4 | supported_languages | 33 | 0.00066 |
| 5 | movies | 2297 | 0.045921 |


#### `movies`, `screenshots`

`movies` now has the largest number of missing values, and `screenshots` is a similar column. Let us take a look at both columns.

In [50]:
df_steam_clean[['movies', 'screenshots']].head(20)

Unnamed: 0,movies,screenshots
0,,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps..."
1,,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps..."
2,,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps..."
3,,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/40/0000000142.600x338.jpg?t=1568752159', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/40/0000000142.1920x1080.jpg?t=1568752159'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/40/0000000143.600x338.jpg?t=1568752159', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps..."
4,,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/50/0000000155.600x338.jpg?t=1579628243', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/50/0000000155.1920x1080.jpg?t=1579628243'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/50/0000000156.600x338.jpg?t=1579628243', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps..."
5,,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/60/ss_2bf674132e28385f168dbc46ff55eea7be8c8886.600x338.jpg?t=1599518374', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/60/ss_2bf674132e28385f168dbc46ff55eea7be8c8886.1920x1080.jpg?t=1599518374'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/60/ss_079e05789d40144896f9b16fd49c..."
6,,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/70/0000002354.600x338.jpg?t=1591048039', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/70/0000002354.1920x1080.jpg?t=1591048039'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/70/0000002343.600x338.jpg?t=1591048039', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps..."
7,,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/80/0000002528.600x338.jpg?t=1602535977', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/80/0000002528.1920x1080.jpg?t=1602535977'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/80/0000002529.600x338.jpg?t=1602535977', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps..."
8,,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/130/0000000127.600x338.jpg?t=1579629868', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/130/0000000127.1920x1080.jpg?t=1579629868'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/130/0000000128.600x338.jpg?t=1579629868', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/a..."
9,"[{'id': 904, 'name': 'Half-Life 2 Trailer', 'thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/904/movie.jpg?t=1569623096', 'webm': {'480': 'http://cdn.akamai.steamstatic.com/steam/apps/904/movie480.webm?t=1569623096', 'max': 'http://cdn.akamai.steamstatic.com/steam/apps/904/movie_max.webm?t=1569623096'}, 'mp4': {'480': 'http://cdn.akamai.steamstatic.com/steam/apps/904/movie480.mp4?t=...","[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/220/0000001864.600x338.jpg?t=1591063154', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/220/0000001864.1920x1080.jpg?t=1591063154'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/220/0000001865.600x338.jpg?t=1591063154', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/a..."


The `movies` and `screenshots` columns hold the links and details for the trailers, promotional videos (PVs) and images shown on each game's store page. As we will not be analysing the content of these media, we will create two new columns, `has_movies` and `has_screenshots`, with the value `1` where the game has these details and `0` where it does not.

We will not 'clean' the original `movies` and `screenshots` columns, and we will leave them untouched even at the end, when we create copies of the columns that we will be unpivoting.

In [51]:
# create new column `has_movies`: 1 if `movies` has details, 0 if it is null
# (vectorised; `Series.iteritems()` was removed in pandas 2.0)
df_steam_clean['has_movies'] = df_steam_clean['movies'].notnull().astype('int64')

# create new column `has_screenshots`: 1 if `screenshots` has details, 0 if it is null
df_steam_clean['has_screenshots'] = df_steam_clean['screenshots'].notnull().astype('int64')

In [52]:
# see the new column value counts
df_steam_clean['has_movies'].value_counts()

1    47724
0     2297
Name: has_movies, dtype: int64

In [53]:
# see the new column value counts
df_steam_clean['has_screenshots'].value_counts()

1    49947
0       74
Name: has_screenshots, dtype: int64
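As a sanity check, the number of `0`s in each flag should equal the number of nulls in its source column (for example, 2297 for `movies`). The sketch below shows the flag logic and that check on a small made-up frame; `add_presence_flag` is a hypothetical helper, not part of `utils.py`.

```python
import numpy as np
import pandas as pd


def add_presence_flag(df: pd.DataFrame, source_col: str, flag_col: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `df` with a 0/1 column that is 1 where `source_col` is non-null."""
    out = df.copy()
    out[flag_col] = out[source_col].notna().astype("int64")
    return out


# Made-up frame standing in for df_steam_clean: lists for present values, None/NaN for missing ones
toy = pd.DataFrame({
    "movies": [[{"id": 904}], None, [{"id": 1}], np.nan],
    "screenshots": [[{"id": 0}], [{"id": 0}], None, [{"id": 0}]],
})

toy = add_presence_flag(toy, "movies", "has_movies")
toy = add_presence_flag(toy, "screenshots", "has_screenshots")

# Each 0 in a flag should correspond to exactly one null in its source column
for source, flag in [("movies", "has_movies"), ("screenshots", "has_screenshots")]:
    assert (toy[flag] == 0).sum() == toy[source].isna().sum()

print(toy["has_movies"].tolist())       # [1, 0, 1, 0]
print(toy["has_screenshots"].tolist())  # [1, 1, 0, 1]
```

Note that `notna()` treats an empty list as present; if the store data ever returned `[]` instead of a null, a length-based check would be needed instead.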

Now that we have created indicator columns for `movies` and `screenshots`, let us take a look at the remaining columns with null values.

In [54]:
# see if there is any null data
print(f"In merged dataset -", end=" ")
num_col_null(df_steam_clean)

In merged dataset - Number of columns with null values: 6



Unnamed: 0,column,number of null values,percentage
0,categories,939,0.018772
1,developers,130,0.002599
2,genres,89,0.001779
3,screenshots,74,0.001479
4,supported_languages,33,0.00066
5,movies,2297,0.045921
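The real `num_col_null` lives in `utils.py` and is not shown here, so the sketch below is only a guess at its behaviour, reconstructed from the output format above. It also makes one detail explicit: the `percentage` values appear to be fractions of rows rather than 0 to 100 percentages (939 / 50021 ≈ 0.0188, i.e. about 1.88% of rows for `categories`).

```python
import numpy as np
import pandas as pd


def null_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for utils.num_col_null: one row per column that contains nulls."""
    counts = df.isna().sum()        # nulls per column, in column order
    counts = counts[counts > 0]     # keep only columns with at least one null
    print(f"Number of columns with null values: {len(counts)}")
    return pd.DataFrame({
        "column": counts.index,
        "number of null values": counts.values,
        # share of rows that are null: a fraction between 0 and 1, not a percentage
        "percentage": (counts / len(df)).values,
    })


# Made-up frame: `genres` has no nulls, so it is left out of the summary
toy = pd.DataFrame({
    "categories": [["Multi-player"], None, ["PvP"], ["Co-op"]],
    "genres": [["Action"], ["Action"], ["RPG"], ["Indie"]],
    "movies": [None, None, [{"id": 904}], np.nan],
})
summary = null_summary(toy)  # prints "Number of columns with null values: 2"
print(summary)
```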


#### `developers`

This information is also found in a separate column, `developer`, so let us compare the two columns.

In [55]:
# compare the two columns side by side
df_steam_clean[['developers','developer']].head(20)

Unnamed: 0,developers,developer
0,[Valve],Valve
1,[Valve],Valve
2,[Valve],Valve
3,[Valve],Valve
4,[Gearbox Software],Gearbox Software
5,[Valve],Valve
6,[Valve],Valve
7,[Valve],Valve
8,[Gearbox Software],Gearbox Software
9,[Valve],Valve


In the rows shown, both columns hold the same developer names; `developers` stores them as a list and `developer` as a string. As such, we will drop `developers`, the column that contains missing values, and keep `developer`.
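Since only a handful of rows are inspected above, the same comparison could be run over every row before dropping the column. The sketch below assumes that `developer` holds the names from `developers` joined by `", "`; that separator is an assumption about the SteamSpy format, and `compare_list_and_string` is a hypothetical helper. It reports rows where both columns are present but disagree, and how many rows have a `developer` value but no `developers` value (the rows that dropping `developers` keeps covered).

```python
import pandas as pd


def compare_list_and_string(df: pd.DataFrame, list_col: str, str_col: str, sep: str = ", "):
    """Hypothetical helper comparing a list-valued column with a string column.

    Assumes `str_col` holds the names from `list_col` joined by `sep`.
    Returns (rows where both are present but disagree, count of rows only `str_col` covers).
    """
    both = df[list_col].notna() & df[str_col].notna()
    subset = df[both]
    joined = subset[list_col].apply(sep.join)     # ["A", "B"] -> "A, B"
    mismatched = subset[joined != subset[str_col]]
    only_in_str = df[list_col].isna() & df[str_col].notna()
    return mismatched, int(only_in_str.sum())


# Made-up frame in the shape of the two columns above
toy = pd.DataFrame({
    "developers": [["Valve"], ["Gearbox Software"], None, ["Studio A", "Studio B"]],
    "developer": ["Valve", "Gearbox Software", "Studio C", "Studio A, Studio B"],
})
mismatched, only_in_string = compare_list_and_string(toy, "developers", "developer")
print(len(mismatched), only_in_string)  # 0 1
```

On the real data, this would be called as `compare_list_and_string(df_steam_clean, 'developers', 'developer')` before the drop; any mismatched rows would be worth inspecting by hand.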

In [56]:
# drop `developers` column
df_steam_clean.drop(columns=["developers"], inplace=True)

In [57]:
# see df shape and size
print(f"Shape of merged data : {df_steam_clean.shape}")
print(f"First 3 rows of merged data")
df_steam_clean.head(3)

Shape of merged data : (50021, 48)
First 3 rows of merged data


Unnamed: 0,about_the_game,background,categories,content_descriptors,detailed_description,genres,header_image,is_free,linux_requirements,mac_requirements,name_x,package_groups,packages,pc_requirements,platforms,price_overview,publishers,release_date,required_age,screenshots,short_description,steam_appid,support_info,supported_languages,type,movies,appid,average_2weeks,average_forever,ccu,developer,discount,genre,initialprice,languages,median_2weeks,median_forever,name_y,negative,owners,positive,price,publisher,score_rank,tags,userscore,has_movies,has_screenshots
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Counter-Strike,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 
'is_free_license...","[574941, 7]","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}",[Valve],"{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,10.0,"{'url': 'http://steamcommunity.com/app/10', 'email': ''}","English<strong>*</strong>, French<strong>*</strong>, German<strong>*</strong>, Italian<strong>*</strong>, Spanish - Spain<strong>*</strong>, Simplified Chinese<strong>*</strong>, Traditional Chinese<strong>*</strong>, Korean<strong>*</strong><br><strong>*</strong>languages with full audio support",game,,10.0,212.0,8690.0,16837.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,Counter-Strike,4944.0,"10,000,000 .. 
20,000,000",193192.0,999,Valve,,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0,0,1
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.","[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Team Fortress Classic,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Team 
Fortress Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],"{'coming_soon': False, 'date': '1 Apr, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,"{'url': '', 'email': ''}","English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",game,,20.0,0.0,2752.0,77.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,Team Fortress Classic,896.0,"5,000,000 .. 
10,000,000",5416.0,499,Valve,,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0,0,1
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...","[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Day of Defeat,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 
'price...",[30],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],"{'coming_soon': False, 'date': '1 May, 2003'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.",30.0,"{'url': '', 'email': ''}","English, French, German, Italian, Spanish - Spain",game,,30.0,0.0,4250.0,139.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain",0.0,28.0,Day of Defeat,557.0,"5,000,000 .. 10,000,000",5007.0,499,Valve,,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0,0,1


In [58]:
# see df info
print(f"Info on merged data")
df_steam_clean.info()

Info on merged data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 50021 entries, 0 to 50204
Data columns (total 48 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   about_the_game        50021 non-null  object 
 1   background            50021 non-null  object 
 2   categories            49082 non-null  object 
 3   content_descriptors   50021 non-null  object 
 4   detailed_description  50021 non-null  object 
 5   genres                49932 non-null  object 
 6   header_image          50021 non-null  object 
 7   is_free               50021 non-null  int32  
 8   linux_requirements    50021 non-null  object 
 9   mac_requirements      50021 non-null  object 
 10  name_x                50021 non-null  object 
 11  package_groups        50021 non-null  object 
 12  packages              50021 non-null  object 
 13  pc_requirements       50021 non-null  object 
 14  platforms             50021 non-null  object 
 15 

In [59]:
# see if there is any null data
print(f"In merged dataset -", end=" ")
num_col_null(df_steam_clean)

In merged dataset - Number of columns with null values: 5



Unnamed: 0,column,number of null values,percentage
0,categories,939,0.018772
1,genres,89,0.001779
2,screenshots,74,0.001479
3,supported_languages,33,0.00066
4,movies,2297,0.045921


#### `supported_languages`

This information is also found in a separate column, `languages`, so let us compare the two columns.

In [60]:
# compare the two columns side by side
df_steam_clean[['supported_languages','languages']].head(20)

Unnamed: 0,supported_languages,languages
0,"English<strong>*</strong>, French<strong>*</strong>, German<strong>*</strong>, Italian<strong>*</strong>, Spanish - Spain<strong>*</strong>, Simplified Chinese<strong>*</strong>, Traditional Chinese<strong>*</strong>, Korean<strong>*</strong><br><strong>*</strong>languages with full audio support","English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean"
1,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese","English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese"
2,"English, French, German, Italian, Spanish - Spain","English, French, German, Italian, Spanish - Spain"
3,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese","English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese"
4,"English, French, German, Korean","English, French, German, Korean"
5,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese","English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese"
6,"English<strong>*</strong>, French<strong>*</strong>, German<strong>*</strong>, Italian<strong>*</strong>, Spanish - Spain<strong>*</strong>, Simplified Chinese<strong>*</strong>, Traditional Chinese<strong>*</strong>, Korean<strong>*</strong><br><strong>*</strong>languages with full audio support","English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean"
7,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean","English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean"
8,"English, French, German","English, French, German"
9,"English<strong>*</strong>, French<strong>*</strong>, German<strong>*</strong>, Italian<strong>*</strong>, Korean<strong>*</strong>, Spanish - Spain<strong>*</strong>, Russian<strong>*</strong>, Simplified Chinese, Traditional Chinese, Dutch, Danish, Finnish, Japanese, Norwegian, Polish, Portuguese, Swedish, Thai<br><strong>*</strong>languages with full audio support","English, French, German, Italian, Korean, Spanish - Spain, Russian, Simplified Chinese, Traditional Chinese, Dutch, Danish, Finnish, Japanese, Norwegian, Polish, Portuguese, Swedish, Thai"


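Apart from the HTML markup in `supported_languages` (the `<strong>*</strong>` markers and the trailing note flag the languages with full audio support), both columns list the same languages in the rows shown. As such, we will drop `supported_languages`, the column that contains missing values, and keep `languages`.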
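To check that claim beyond the rows shown, the markup can be stripped from `supported_languages` and the result compared with `languages`. The sketch below is based only on the markup visible in the rows above; other variants of the footnote may exist in the full dataset. Both function names are hypothetical. `full_audio_languages` recovers the one piece of information that dropping the column loses.

```python
import re
from typing import List

# Patterns derived only from the supported_languages values shown above
AUDIO_MARKER = "<strong>*</strong>"
FOOTNOTE = re.compile(r"<br>\s*<strong>\*</strong>\s*languages with full audio support\s*$")
ANY_TAG = re.compile(r"<[^>]+>")


def strip_language_markup(text: str) -> str:
    """Turn a supported_languages string into the plain comma-separated form used by `languages`."""
    text = FOOTNOTE.sub("", text)          # drop the trailing footnote first
    text = text.replace(AUDIO_MARKER, "")  # drop the per-language full-audio markers
    return ANY_TAG.sub("", text).strip()   # drop any remaining tags


def full_audio_languages(text: str) -> List[str]:
    """List the languages flagged with the full-audio marker."""
    text = FOOTNOTE.sub("", text)
    return [part.replace(AUDIO_MARKER, "").strip() for part in text.split(",") if AUDIO_MARKER in part]


# Shortened sample in the same format as the rows above
sample = ("English<strong>*</strong>, French<strong>*</strong>, German<strong>*</strong>, Russian, Thai"
          "<br><strong>*</strong>languages with full audio support")
print(strip_language_markup(sample))  # English, French, German, Russian, Thai
print(full_audio_languages(sample))   # ['English', 'French', 'German']
```

Before the drop, mapping `strip_language_markup` over `df_steam_clean['supported_languages'].dropna()` and comparing the result with `df_steam_clean['languages']` on the same index would show whether any row differs beyond the markup.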

In [61]:
# drop `supported_languages` column
df_steam_clean.drop(columns=["supported_languages"], inplace=True)

In [62]:
# see df shape and size
print(f"Shape of merged data : {df_steam_clean.shape}")
print(f"First 3 rows of merged data")
df_steam_clean.head(3)

Shape of merged data : (50021, 47)
First 3 rows of merged data


Unnamed: 0,about_the_game,background,categories,content_descriptors,detailed_description,genres,header_image,is_free,linux_requirements,mac_requirements,name_x,package_groups,packages,pc_requirements,platforms,price_overview,publishers,release_date,required_age,screenshots,short_description,steam_appid,support_info,type,movies,appid,average_2weeks,average_forever,ccu,developer,discount,genre,initialprice,languages,median_2weeks,median_forever,name_y,negative,owners,positive,price,publisher,score_rank,tags,userscore,has_movies,has_screenshots
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Counter-Strike,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 
'is_free_license...","[574941, 7]","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}",[Valve],"{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,10.0,"{'url': 'http://steamcommunity.com/app/10', 'email': ''}",game,,10.0,212.0,8690.0,16837.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,Counter-Strike,4944.0,"10,000,000 .. 20,000,000",193192.0,999,Valve,,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0,0,1
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.","[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Team Fortress Classic,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Team 
Fortress Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],"{'coming_soon': False, 'date': '1 Apr, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,"{'url': '', 'email': ''}",game,,20.0,0.0,2752.0,77.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,Team Fortress Classic,896.0,"5,000,000 .. 10,000,000",5416.0,499,Valve,,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0,0,1
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...","[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Day of Defeat,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 
'price...",[30],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],"{'coming_soon': False, 'date': '1 May, 2003'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.",30.0,"{'url': '', 'email': ''}",game,,30.0,0.0,4250.0,139.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain",0.0,28.0,Day of Defeat,557.0,"5,000,000 .. 10,000,000",5007.0,499,Valve,,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0,0,1


In [63]:
# see df info
print(f"Info on merged data")
df_steam_clean.info()

Info on merged data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 50021 entries, 0 to 50204
Data columns (total 47 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   about_the_game        50021 non-null  object 
 1   background            50021 non-null  object 
 2   categories            49082 non-null  object 
 3   content_descriptors   50021 non-null  object 
 4   detailed_description  50021 non-null  object 
 5   genres                49932 non-null  object 
 6   header_image          50021 non-null  object 
 7   is_free               50021 non-null  int32  
 8   linux_requirements    50021 non-null  object 
 9   mac_requirements      50021 non-null  object 
 10  name_x                50021 non-null  object 
 11  package_groups        50021 non-null  object 
 12  packages              50021 non-null  object 
 13  pc_requirements       50021 non-null  object 
 14  platforms             50021 non-null  object 
 15 

In [64]:
# see if there is any null data
print(f"In merged dataset -", end=" ")
num_col_null(df_steam_clean)

In merged dataset - Number of columns with null values: 4



Unnamed: 0,column,number of null values,percentage
0,categories,939,0.018772
1,genres,89,0.001779
2,screenshots,74,0.001479
3,movies,2297,0.045921
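`num_col_null` lives in utils.py and is not reproduced in this notebook. Judging only from its output, a function along the following lines would give the same kind of summary. `null_summary` is a hypothetical name, not the utils.py code. Note that the `percentage` column is really a fraction of rows (for example 939 / 50021 ≈ 0.018772), not a value out of 100.

```python
import pandas as pd


def null_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for utils.num_col_null: one row per column that has nulls."""
    counts = df.isnull().sum()
    counts = counts[counts > 0]  # keep only columns with at least one null
    print(f"Number of columns with null values: {len(counts)}")
    return pd.DataFrame({
        "column": counts.index.tolist(),
        "number of null values": counts.values,
        "percentage": counts.values / len(df),  # fraction of rows, not multiplied by 100
    })


# made-up toy data
toy = pd.DataFrame({"a": [1, None, 3, 4], "b": ["x", "y", None, None], "c": [1, 2, 3, 4]})
summary = null_summary(toy)  # prints: Number of columns with null values: 2
print(summary)
```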


#### `categories`, `genres`

As the null values in each of these columns make up less than 2% of the data, we will look at how many rows are affected in total.

In [65]:
# calculate number of rows with null values
no_rows_with_null = len(df_steam_clean.loc[(df_steam_clean['categories'].isnull()) | 
                                        (df_steam_clean['genres'].isnull()),:])
print(f"Number of rows with null values: {no_rows_with_null}. Percentage: {no_rows_with_null/len(df_steam_clean)}")

Number of rows with null values: 1006. Percentage: 0.020111553147677975
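The cell above counts a row once even when both columns are null, which is why 1006 is smaller than 939 + 89 = 1028. The same row-wise count can be written by checking whether any of the selected columns is null in each row; a sketch on made-up toy data:

```python
import pandas as pd

# made-up toy rows: row 0 complete, row 1 missing categories,
# row 2 missing genres, row 3 missing both
toy = pd.DataFrame({
    "categories": [["Multi-player"], None, ["PvP"], None],
    "genres": [["Action"], ["Indie"], None, None],
})

# True for rows where at least one of the two columns is null
row_has_null = toy[["categories", "genres"]].isnull().any(axis=1)
n_rows = int(row_has_null.sum())
print(f"Number of rows with null values: {n_rows}. Percentage: {row_has_null.mean():.4f}")
```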


Combining both conditions, rows with missing `categories` or `genres` make up around $2\%$ of the data. As this is a small fraction, we will drop these rows.

In [66]:
# remove `1006` rows with missing data
df_steam_clean = df_steam_clean[~((df_steam_clean['categories'].isnull()) | 
                                  (df_steam_clean['genres'].isnull()))]
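pandas' `dropna` with a `subset` expresses the same filter. Limiting `subset` matters here: a plain `dropna()` would also drop rows whose only nulls are in `movies` or `screenshots`. A sketch on made-up toy data comparing the two forms:

```python
import pandas as pd

toy = pd.DataFrame({
    "categories": [["Multi-player"], None, ["PvP"], None],
    "genres": [["Action"], ["Indie"], None, None],
    "movies": [None, None, None, ["trailer"]],
})

# boolean-mask version, as in the cell above
masked = toy[~(toy["categories"].isnull() | toy["genres"].isnull())]

# dropna version: only the listed columns are checked, so nulls in `movies` are ignored
dropped = toy.dropna(subset=["categories", "genres"])

print(dropped.index.tolist())  # [0]
```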

In [67]:
# see df shape and size
print(f"Shape of merged data : {df_steam_clean.shape}")
print(f"First 3 rows of merged data")
df_steam_clean.head(3)

Shape of merged data : (49015, 47)
First 3 rows of merged data


Unnamed: 0,about_the_game,background,categories,content_descriptors,detailed_description,genres,header_image,is_free,linux_requirements,mac_requirements,name_x,package_groups,packages,pc_requirements,platforms,price_overview,publishers,release_date,required_age,screenshots,short_description,steam_appid,support_info,type,movies,appid,average_2weeks,average_forever,ccu,developer,discount,genre,initialprice,languages,median_2weeks,median_forever,name_y,negative,owners,positive,price,publisher,score_rank,tags,userscore,has_movies,has_screenshots
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,"[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Counter-Strike,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 
'is_free_license...","[574941, 7]","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}",[Valve],"{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,10.0,"{'url': 'http://steamcommunity.com/app/10', 'email': ''}",game,,10.0,212.0,8690.0,16837.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,Counter-Strike,4944.0,"10,000,000 .. 20,000,000",193192.0,999,Valve,,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0,0,1
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.","[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Team Fortress Classic,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Team 
Fortress Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],"{'coming_soon': False, 'date': '1 Apr, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,"{'url': '', 'email': ''}",game,,20.0,0.0,2752.0,77.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,Team Fortress Classic,896.0,"5,000,000 .. 10,000,000",5416.0,499,Valve,,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0,0,1
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...","[{'id': '1', 'description': 'Action'}]",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Day of Defeat,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 
'price...",[30],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",[Valve],"{'coming_soon': False, 'date': '1 May, 2003'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.",30.0,"{'url': '', 'email': ''}",game,,30.0,0.0,4250.0,139.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain",0.0,28.0,Day of Defeat,557.0,"5,000,000 .. 10,000,000",5007.0,499,Valve,,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0,0,1


In [68]:
# see df info
print(f"Info on merged data")
df_steam_clean.info()

Info on merged data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 47 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   about_the_game        49015 non-null  object 
 1   background            49015 non-null  object 
 2   categories            49015 non-null  object 
 3   content_descriptors   49015 non-null  object 
 4   detailed_description  49015 non-null  object 
 5   genres                49015 non-null  object 
 6   header_image          49015 non-null  object 
 7   is_free               49015 non-null  int32  
 8   linux_requirements    49015 non-null  object 
 9   mac_requirements      49015 non-null  object 
 10  name_x                49015 non-null  object 
 11  package_groups        49015 non-null  object 
 12  packages              49015 non-null  object 
 13  pc_requirements       49015 non-null  object 
 14  platforms             49015 non-null  object 
 15 

In [69]:
# see if there is any null data
print(f"In merged dataset -", end=" ")
num_col_null(df_steam_clean)

In merged dataset - Number of columns with null values: 2



Unnamed: 0,column,number of null values,percentage
0,screenshots,51,0.00104
1,movies,2192,0.044721


As indicated previously, we created 2 new columns, `has_movies` and `has_screenshots`, to capture this quirk, so we will leave the remaining null values in `screenshots` and `movies` as they are. Next, we will look at the columns to determine whether any further cleaning is required.


---

## Data cleaning - Column

We will look at some of the columns in detail to see whether we need to transform their contents or drop certain columns that duplicate others.

#### `steam_appid`, `appid`

These two columns appear to be the identifier of each game. Let us confirm that the two columns are equal to each other.

In [70]:
# shows number of rows that have different 'steam_appid' and 'appid'
mismatch = df_steam_clean['steam_appid'] != df_steam_clean['appid']
print(f"Number of rows that have different 'steam_appid' and 'appid' : {mismatch.sum()}. Percentage: {mismatch.mean():.4f}")

Number of rows that have different 'steam_appid' and 'appid' : 0. Percentage: 0.0000


There are no rows where `steam_appid` and `appid` differ. This is expected, as we previously merged the two datasets on these two columns.
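The merge itself is not part of this section. Assuming it was an inner merge along the lines of `merge(left_on='steam_appid', right_on='appid')`, the toy example below shows why both key columns survive with identical values, and where the `_x`/`_y` suffixes on `name` come from (pandas' default `suffixes`). The two small tables are made up.

```python
import pandas as pd

# made-up stand-ins for the Steam Store and SteamSpy tables
store = pd.DataFrame({"steam_appid": [10, 20, 30], "name": ["Counter-Strike", "GUN™", "Game C"]})
spy = pd.DataFrame({"appid": [10, 20, 40], "name": ["Counter-Strike", "GUN", "Game D"]})

# inner merge on differently named key columns: both key columns are kept,
# and the shared non-key column `name` is split into name_x / name_y
merged = store.merge(spy, left_on="steam_appid", right_on="appid", how="inner")

print(merged)
print((merged["steam_appid"] != merged["appid"]).sum())  # 0
```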

We will drop `appid` from the table. 

In [71]:
# drop `appid` column
df_steam_clean = df_steam_clean.drop(columns=["appid"])

#### `name_x`, `name_y`

These two columns appear to hold the name of the game in each row, so we expect them to be identical. Let us confirm this.

In [72]:
# shows number of rows that have different 'name_x' and 'name_y'
mismatch = df_steam_clean['name_x'] != df_steam_clean['name_y']
print(f"Number of rows that have different 'name_x' and 'name_y' : {mismatch.sum()}. Percentage: {mismatch.mean():.4f}")

Number of rows that have different 'name_x' and 'name_y' : 819. Percentage: 0.0167


The discrepancy affects less than $2\%$ of the rows. Let us take a look at the differences.

In [73]:
# see the first 10 rows
df_steam_clean.loc[(~(df_steam_clean['name_x'] == df_steam_clean['name_y'])), ['name_x', 'name_y']].head(10)

Unnamed: 0,name_x,name_y
64,Gumboy - Crazy Adventures™,Gumboy - Crazy Adventures
65,RIP - Trilogy™,RIP - Trilogy
66,Vigil: Blood Bitterness™,Vigil: Blood Bitterness
69,GUN™,GUN
70,Call of Duty®,Call of Duty
71,Call of Duty® 2,Call of Duty 2
73,RollerCoaster Tycoon® 3: Platinum,RollerCoaster Tycoon 3: Platinum
86,FlatOut 2™,FlatOut 2
113,Mystery P.I.™ - The Lottery Ticket,Mystery P.I. - The Lottery Ticket
114,Amazing Adventures The Lost Tomb™,Amazing Adventures The Lost Tomb


The differences between the two columns appear to come from trademark symbols (™ and ®) that are present in `name_x` but not in `name_y`. We will keep `name_x`, which retains the symbols, and drop `name_y`. We will also rename `name_x` to `name`.
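Only 10 of the 819 mismatches are shown above. One way to check whether trademark symbols explain all of them is to strip the symbols and compare again; rows that still differ would need a closer look. A sketch on toy rows modelled on the output above; the symbol set `™®©` is an assumption about which marks appear.

```python
import pandas as pd

names = pd.DataFrame({
    "name_x": ["Call of Duty® 2", "GUN™", "FlatOut 2™", "RollerCoaster Tycoon® 3: Platinum"],
    "name_y": ["Call of Duty 2", "GUN", "FlatOut 2", "RollerCoaster Tycoon 3: Platinum"],
})

# remove trademark, registered and copyright symbols, then trim stray spaces
stripped = names["name_x"].str.replace(r"[™®©]", "", regex=True).str.strip()

# rows that still differ after stripping
still_different = names.loc[stripped != names["name_y"], ["name_x", "name_y"]]
print(len(still_different))  # 0
```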

In [74]:
# drop `name_y` column
df_steam_clean = df_steam_clean.drop(columns=["name_y"])

# rename the column
df_steam_clean.rename(columns={"name_x" : "name"}, inplace=True)

#### `publishers`, `publisher`

These two columns appear to hold the publisher(s) of the game in each row, so we expect them to be identical. Let us confirm this.

In [75]:
# shows number of rows that have different 'publishers' and 'publisher'
mismatch = df_steam_clean['publishers'] != df_steam_clean['publisher']
print(f"Number of rows that have different 'publishers' and 'publisher' : {mismatch.sum()}. Percentage: {mismatch.mean():.4f}")

Number of rows that have different 'publishers' and 'publisher' : 49015. Percentage: 1.0000


Every row differs between the two columns. Let us take a look at the first 10 rows of data.

In [76]:
# see the first 10 rows
df_steam_clean.loc[(~(df_steam_clean['publishers'] == df_steam_clean['publisher'])), ['publishers', 'publisher']].head(10)

Unnamed: 0,publishers,publisher
0,[Valve],Valve
1,[Valve],Valve
2,[Valve],Valve
3,[Valve],Valve
4,[Valve],Valve
5,[Valve],Valve
6,[Valve],Valve
7,[Valve],Valve
8,[Valve],Valve
9,[Valve],Valve


The difference arises because `publishers` stores each value as a list, while `publisher` stores it as a string. We will drop `publishers`, which stores the data as a list.
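Before dropping `publishers`, one way to confirm that the lists carry nothing beyond the string is to join each list into a single string and compare. The `", "` separator is an assumption about how SteamSpy formats multiple publishers, and the toy rows are made up.

```python
import pandas as pd

pubs = pd.DataFrame({
    "publishers": [["Valve"], ["Publisher A", "Publisher B"]],
    "publisher": ["Valve", "Publisher A, Publisher B"],
})

# Series.str.join concatenates the elements of each list with the separator
joined = pubs["publishers"].str.join(", ")
print((joined != pubs["publisher"]).sum())  # 0
```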

In [77]:
# drop `publishers` column
df_steam_clean = df_steam_clean.drop(columns=["publishers"])

#### `genres`, `genre`

These two columns appear to hold the genre(s) of the game in each row, so we expect them to be identical. Let us confirm this.

In [78]:
# shows number of rows that have different 'genres' and 'genre'
mismatch = df_steam_clean['genres'] != df_steam_clean['genre']
print(f"Number of rows that have different 'genres' and 'genre' : {mismatch.sum()}. Percentage: {mismatch.mean():.4f}")

Number of rows that have different 'genres' and 'genre' : 49015. Percentage: 1.0000


Every row differs between the two columns. Let us take a look at the first 10 rows of data.

In [79]:
# see the first 10 rows
df_steam_clean.loc[(~(df_steam_clean['genres'] == df_steam_clean['genre'])), ['genres', 'genre']].head(10)

Unnamed: 0,genres,genre
0,"[{'id': '1', 'description': 'Action'}]",Action
1,"[{'id': '1', 'description': 'Action'}]",Action
2,"[{'id': '1', 'description': 'Action'}]",Action
3,"[{'id': '1', 'description': 'Action'}]",Action
4,"[{'id': '1', 'description': 'Action'}]",Action
5,"[{'id': '1', 'description': 'Action'}]",Action
6,"[{'id': '1', 'description': 'Action'}]",Action
7,"[{'id': '1', 'description': 'Action'}]",Action
8,"[{'id': '1', 'description': 'Action'}]",Action
9,"[{'id': '1', 'description': 'Action'}]",Action


The difference arises because `genres` stores a list of dictionaries, while `genre` stores a plain string.

Although each dictionary in `genres` also holds an `id`, the description is what we need for our analysis, and `genre` already provides it as a string.

We will therefore store the `id` values in a new column, `genre_id`, and then drop `genres`, which stores the data as a list of dictionaries.
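`unpack_cell` is defined in utils.py and not shown here. From how it is used (a new `genre_id` column, and a returned list of `id`/`description` dictionaries that later becomes `df_genre`), a function along these lines would behave similarly. `unpack_ids` is a hypothetical name, and whether the real helper stores a list of ids per row is an assumption.

```python
import pandas as pd


def unpack_ids(df: pd.DataFrame, source_col: str, new_col: str, key: str) -> list:
    """Hypothetical stand-in for utils.unpack_cell.

    Adds `new_col` holding the `key` values from each cell's list of dicts and
    returns every distinct dict seen, e.g. {'id': '1', 'description': 'Action'}.
    """
    seen = []
    extracted = []
    for cell in df[source_col]:
        extracted.append([d[key] for d in cell])
        for d in cell:
            if d not in seen:  # dicts are unhashable, so de-duplicate with a list
                seen.append(d)
    df[new_col] = pd.Series(extracted, index=df.index)  # modifies df in place
    return seen


toy = pd.DataFrame({"genres": [
    [{"id": "1", "description": "Action"}],
    [{"id": "1", "description": "Action"}, {"id": "23", "description": "Indie"}],
]})
mapping = unpack_ids(toy, "genres", "genre_id", "id")
print(toy["genre_id"].tolist())  # [['1'], ['1', '23']]
print(mapping)
```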

In [80]:
# store all possible dictionaries
list_genres = unpack_cell(df_steam_clean, 'genres', 'genre_id', 'id')

100%|███████████████████████████████████████████████████████████████████████████████████████████████| 49015/49015 [00:00<00:00, 104318.50it/s]


In [81]:
# drop `genres` column
df_steam_clean = df_steam_clean.drop(columns=["genres"])

--- 

## Data Output

Now that we have finished cleaning the data, we will export the dataframe to a pickle file for data organisation.

In [82]:
# export file to pkl
pkl_output('../data/clean_steam_all.pkl', df_steam_clean)
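`pkl_output` is also in utils.py. The `os` import at the top ("For file exportion folder creation") suggests it creates the output folder before writing. A minimal sketch under that assumption; `save_pickle` is a hypothetical name, and a temporary directory stands in for `../data`.

```python
import os
import tempfile

import pandas as pd


def save_pickle(path: str, df: pd.DataFrame) -> None:
    """Hypothetical stand-in for utils.pkl_output: create the folder if needed, then pickle."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    df.to_pickle(path)


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "data", "clean_steam_all.pkl")
    df = pd.DataFrame({"steam_appid": [10, 20], "name": ["Counter-Strike", "Team Fortress Classic"]})
    save_pickle(out_path, df)
    restored = pd.read_pickle(out_path)  # round trip to check the file
    print(restored.equals(df))  # True
```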

We will also export a dataframe that maps each genre `id` to its description, for reference.

In [83]:
df_genre = pd.DataFrame(list_genres).sort_values(by='id').reset_index(drop=True)
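The `id` values are strings (as the quoted values in `list_duplicated_values` show), so `sort_values(by='id')` orders them character by character, which is why `18` comes before `2` in the SQL output at the end. If numeric order were wanted, `sort_values` accepts a `key` function (pandas 1.1+). Changing the sort order would also shift the row positions that the hard-coded index list in In [90] relies on, so this is shown on toy data only.

```python
import pandas as pd

genre = pd.DataFrame({"id": ["2", "18", "1", "23"], "description": ["Strategy", "Sports", "Action", "Indie"]})

# string ids sort lexically: '1' < '18' < '2' < '23'
lexical = genre.sort_values(by="id")["id"].tolist()

# converting inside `key` sorts by numeric value while the column stays as strings
numeric = genre.sort_values(by="id", key=lambda s: s.astype(int))["id"].tolist()

print(lexical)  # ['1', '18', '2', '23']
print(numeric)  # ['1', '2', '18', '23']
```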

As the English description is not consistently the first or the last row for each `id`, we cannot use `duplicated` (with `keep='first'` or `keep='last'`) to drop the other rows.

Let's find the unique `id` values and build a list of the row indices to keep.

In [84]:
list_all_values = list(df_genre['id'].unique())

In [85]:
list_duplicated_values = list(df_genre.loc[df_genre['id'].duplicated(keep=False), 'id'].unique())

In [86]:
list_no_duplicated = []

for value in list_all_values:
    if value not in list_duplicated_values:
        list_no_duplicated.append(value)

In [87]:
list_index_keep = []
# Series.iteritems() was removed in pandas 2.0; Series.items() works in all versions
for index, row in df_genre['id'].items():
    if row in list_no_duplicated:
        list_index_keep.append(index)
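The loops in In [84] to In [87] collect the indices of rows whose `id` appears only once. The same list can be built without explicit loops, because `duplicated(keep=False)` flags every member of a duplicated group. A sketch on a made-up toy frame:

```python
import pandas as pd

df_genre_toy = pd.DataFrame({
    "id": ["1", "1", "18", "18", "4"],
    "description": ["Action", "Ação", "Sport", "Sports", "Casual"],
})

# keep=False marks all rows of a duplicated id, so negating it
# selects the ids that occur exactly once
unique_mask = ~df_genre_toy["id"].duplicated(keep=False)
list_index_keep_toy = df_genre_toy.index[unique_mask].tolist()
print(list_index_keep_toy)  # [4]
```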

In [88]:
list_duplicated_values

['1', '18', '2', '23', '25', '28', '3', '37', '4', '70', '73', '74', '9']

In [89]:
for value in list_duplicated_values:
    print(f"for id : {value}")
    print(df_genre.loc[df_genre['id'] == value,'description'].index.tolist())
    print(df_genre.loc[df_genre['id'] == value,'description'].to_numpy())
    print()

for id : 1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
['Action' 'Ação' 'Aksiyon' 'Aktion' 'Akcja' 'Экшены' 'アクション' 'Azione'
 'Acción' 'Akční']

for id : 18
[10, 11, 12]
['Sport' 'Sports' 'Спортивные игры']

for id : 2
[13, 14, 15, 16, 17, 18, 19]
['Stratégie' 'Strategie' 'Strategi' 'Strategy' 'Estratégia' 'Стратегии'
 '策略']

for id : 23
[20, 21, 22, 23, 24, 25, 26, 27, 28]
['インディー' 'Nezávislé' '独立' 'Indépendant' 'Інді' 'Bağımsız Yapımcı' 'Инди'
 'Niezależne' 'Indie']

for id : 25
[29, 30, 31, 32, 33, 34, 35, 36, 37, 38]
['Äventyr' 'Приключенческие игры' 'Aventura' 'Aventure' 'Macera'
 'Adventure' 'Avventura' 'Eventyr' '冒险' 'Abenteuer']

for id : 28
[39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
['Simulação' 'Simulazione' 'Simulation' '模拟' 'Simulationen' 'Симуляторы'
 'Simuladores' 'Симулятори' 'Symulacje' 'Simülasyon']

for id : 3
[50, 51, 52, 53, 54, 55, 56]
['Rollespil (RPG)' 'RPG (rollspel)' 'Ролевые игры' 'RPG' 'Rol' '角色扮演'
 'GDR']

for id : 37
[57, 58, 59, 60, 61, 62, 63, 64, 65]
['Gratis at spille

By inspecting the output above and picking the English description for each duplicated `id`, we obtained the following list of row indices to keep.

In [90]:
for index in [0, 11, 16, 28, 34, 41, 53, 60, 74, 89, 93, 94, 102]:
    list_index_keep.append(index)

In [91]:
df_genre = df_genre.loc[list_index_keep,:].sort_index()

In [92]:
# export file to pkl as precaution
pkl_output('../data/genre_id.pkl', df_genre)

In [93]:
# connect to the SQLite DB file (the file is created if it does not exist)
con = sqlite3.connect('../data/steam_db.db')

In [94]:
# storing the df in the steam_db db
df_genre.to_sql(name='genre_mapping',con=con, index=False, if_exists='replace')

33

In [95]:
df_genre.shape

(33, 2)
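The connection `con` opened above is never closed. Using a `sqlite3.Connection` directly in a `with` block only commits or rolls back; it does not close the connection, so `contextlib.closing` is one way to guarantee the file handle is released. A sketch of the same write/read round trip, with an in-memory database standing in for `../data/steam_db.db` and a parameterised lookup added for illustration:

```python
import sqlite3
from contextlib import closing

import pandas as pd

df = pd.DataFrame({"id": ["1", "18", "2"], "description": ["Action", "Sports", "Strategy"]})

# closing() calls con.close() when the block exits
with closing(sqlite3.connect(":memory:")) as con:
    df.to_sql(name="genre_mapping", con=con, index=False, if_exists="replace")
    # parameterised query: look up one description by id (ids are stored as text)
    row = pd.read_sql("SELECT description FROM genre_mapping WHERE id = ?", con, params=("18",))
    everything = pd.read_sql("SELECT * FROM genre_mapping", con)

print(row["description"].tolist())  # ['Sports']
print(everything.shape)  # (3, 2)
```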

In [96]:
sql_query = '''
SELECT *
FROM genre_mapping
'''

pd.read_sql(sql_query, con)

Unnamed: 0,id,description
0,1,Action
1,18,Sports
2,2,Strategy
3,23,Indie
4,25,Adventure
5,28,Simulation
6,29,Massively Multiplayer
7,3,RPG
8,37,Free to Play
9,4,Casual
