# Data Organisation and Merging

In this notebook, we will clean and merge the data.

The notebook proceeds in two steps:
1. Update the values within the columns through cleaning, or create new columns through feature engineering
2. Export the result to a database file
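
As a rough sketch of step 2, assuming the export goes through `DataFrame.to_sql` into a SQLite database: the table name `games`, the toy frame `df_demo`, and the in-memory connection are illustrative assumptions, not the notebook's actual schema. Nested values such as dictionaries are serialised to JSON text first, because SQLite has no column type for Python objects.

```python
import json
import sqlite3

import pandas as pd

# Toy stand-in for the cleaned data (values taken from the sample rows).
df_demo = pd.DataFrame(
    {
        "steam_appid": [10, 20],
        "name": ["Counter-Strike", "Team Fortress Classic"],
        "platforms": [
            {"windows": True, "mac": True, "linux": True},
            {"windows": True, "mac": True, "linux": True},
        ],
    }
)

# SQLite cannot store Python dicts, so serialise them to JSON strings.
df_demo["platforms"] = df_demo["platforms"].apply(json.dumps)

# ":memory:" keeps the sketch self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
df_demo.to_sql("games", conn, if_exists="replace", index=False)

# Read the table back to confirm the round trip.
df_back = pd.read_sql("SELECT * FROM games", conn)
conn.close()
```

Reading the table back and applying `json.loads` to the `platforms` column recovers the original dictionaries.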

---

## Import Libraries

This section imports all the libraries used in this notebook.

In [1]:
# For Calculation and Data Manipulation
import numpy as np
import pandas as pd
import math

# For folder creation when exporting files
import os

# for datetime conversion
import datetime

# for data storing
import sqlite3

# for unpacking nested cells (see the Functions section)
from utils import unpack_cell

# this setting widens how many characters pandas will display in a column:
pd.options.display.max_colwidth = 400

# this setting allows us to see up to 50 columns
pd.options.display.max_columns = 50

---

## Functions

This section summarises the functions used in this notebook. They are defined in [utils.py](./utils.py).

1. `unpack_cell`: unpacks a cell that contains a dictionary or a list of dictionaries
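
The body of `unpack_cell` lives in utils.py and is not reproduced in this notebook. Purely to illustrate the idea, a helper of this kind might look like the sketch below. The name `unpack_cell_sketch`, its `key` parameter, and its return convention are assumptions and may differ from the real function.

```python
def unpack_cell_sketch(cell, key):
    """Pull the values stored under `key` out of a dict or a list of dicts.

    Hypothetical helper for illustration; not the actual `unpack_cell`.
    """
    if isinstance(cell, dict):
        # A single dictionary: return its value as a one-element list.
        return [cell[key]] if key in cell else []
    if isinstance(cell, list):
        # A list of dictionaries: collect the value from each entry that has it.
        return [item[key] for item in cell if isinstance(item, dict) and key in item]
    # Anything else (None, NaN, plain strings) yields an empty list.
    return []


# Cell values copied from the sample rows of the dataset.
categories = [{"id": 1, "description": "Multi-player"}, {"id": 49, "description": "PvP"}]
release_date = {"coming_soon": False, "date": "1 Nov, 2000"}

category_names = unpack_cell_sketch(categories, "description")
release_text = unpack_cell_sketch(release_date, "date")
```

Returning a list in every case keeps the result type uniform, which makes it easier to apply the helper across a whole column.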

---

## Read data file

First, we read in the pickle file containing the raw data, which was extracted using a notebook similar to the previous code notebook.

In [2]:
# read pickle file
df_steam = pd.read_pickle('../data/clean_steam_all.pkl')

In [3]:
# see df shape and size
print(f"Shape of data : {df_steam.shape}")
print("First 3 rows of Store data")
df_steam.head(3)

Shape of data : (49015, 44)
First 3 rows of Store data


Unnamed: 0,about_the_game,background,categories,content_descriptors,detailed_description,header_image,is_free,linux_requirements,mac_requirements,name,package_groups,packages,pc_requirements,platforms,price_overview,release_date,required_age,screenshots,short_description,steam_appid,support_info,type,movies,average_2weeks,average_forever,ccu,developer,discount,genre,initialprice,languages,median_2weeks,median_forever,negative,owners,positive,price,publisher,score_rank,tags,userscore,has_movies,has_screenshots,genre_id
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Counter-Strike,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...","[574941, 7]","{'minimum': '  
<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}","{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,10.0,"{'url': 'http://steamcommunity.com/app/10', 'email': ''}",game,,212.0,8690.0,16837.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,4944.0,"10,000,000 .. 20,000,000",193192.0,999,Valve,,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0,0,1,[1]
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Team Fortress Classic,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Team Fortress Classic - S$5.25', 
'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}","{'coming_soon': False, 'date': '1 Apr, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,"{'url': '', 'email': ''}",game,,0.0,2752.0,77.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,896.0,"5,000,000 .. 10,000,000",5416.0,499,Valve,,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0,0,1,[1]
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Day of Defeat,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price...",[30],"{'minimum': '  
<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}","{'coming_soon': False, 'date': '1 May, 2003'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.",30.0,"{'url': '', 'email': ''}",game,,0.0,4250.0,139.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain",0.0,28.0,557.0,"5,000,000 .. 10,000,000",5007.0,499,Valve,,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0,0,1,[1]


In [4]:
# see df info
print("Info on Steam data")
df_steam.info()

Info on Steam data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 44 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   about_the_game        49015 non-null  object 
 1   background            49015 non-null  object 
 2   categories            49015 non-null  object 
 3   content_descriptors   49015 non-null  object 
 4   detailed_description  49015 non-null  object 
 5   header_image          49015 non-null  object 
 6   is_free               49015 non-null  int32  
 7   linux_requirements    49015 non-null  object 
 8   mac_requirements      49015 non-null  object 
 9   name                  49015 non-null  object 
 10  package_groups        49015 non-null  object 
 11  packages              49015 non-null  object 
 12  pc_requirements       49015 non-null  object 
 13  platforms             49015 non-null  object 
 14  price_overview        49015 non-null  object 
 15  

In [5]:
# create a backup copy of the data
df_steam_copy = df_steam.copy()
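
One caveat worth keeping in mind for a backup like this: `DataFrame.copy()` copies the frame, but Python objects stored in object columns (the dictionaries and lists seen above) are not copied recursively, so both frames keep pointing at the same objects. The sketch below demonstrates this on a toy frame; `df_demo`, `backup`, and `independent` are illustrative names, and the `copy.deepcopy` step is one possible workaround rather than something this notebook does.

```python
import copy

import pandas as pd

df_demo = pd.DataFrame({"platforms": [{"windows": True, "mac": True}]})

# Default copy: new frame, but the dict inside is the same object.
backup = df_demo.copy()

# Workaround sketch: deep-copy each cell so the backup owns its own dicts.
independent = df_demo.copy()
independent["platforms"] = independent["platforms"].map(copy.deepcopy)

# Mutate the dict in place through the original frame.
df_demo["platforms"].iloc[0]["mac"] = False

shared_value = backup["platforms"].iloc[0]["mac"]            # follows the mutation
independent_value = independent["platforms"].iloc[0]["mac"]  # unaffected
```

This only matters when cells are mutated in place. Assigning new values to a column of `df_steam` leaves the backup untouched.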

---

## Data Organisation

Now let us look at the columns and determine how to group the data.
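
One way to support this decision, sketched here on a toy frame rather than on `df_steam` itself, is to tabulate which Python types each column holds. Columns whose cells are `dict` or `list` are the ones that need unpacking. The frame `df_demo` and the name `type_summary` are illustrative assumptions.

```python
import pandas as pd

# Toy frame mirroring three kinds of columns found in the dataset.
df_demo = pd.DataFrame(
    {
        "name": ["Counter-Strike", "Day of Defeat"],
        "categories": [
            [{"id": 1, "description": "Multi-player"}],
            [{"id": 8, "description": "Valve Anti-Cheat enabled"}],
        ],
        "platforms": [{"windows": True}, {"windows": True}],
    }
)

# For each column, count which Python types its cells hold.
type_summary = {
    col: df_demo[col].map(lambda v: type(v).__name__).value_counts().to_dict()
    for col in df_demo.columns
}
```

Printing `type_summary` gives one small dictionary of type counts per column, which is quicker to scan than the full rows.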

In [6]:
# see first 10 rows of dataset
# 10 rows are shown so that a value in `movies` is visible
df_steam.head(10)

Unnamed: 0,about_the_game,background,categories,content_descriptors,detailed_description,header_image,is_free,linux_requirements,mac_requirements,name,package_groups,packages,pc_requirements,platforms,price_overview,release_date,required_age,screenshots,short_description,steam_appid,support_info,type,movies,average_2weeks,average_forever,ccu,developer,discount,genre,initialprice,languages,median_2weeks,median_forever,negative,owners,positive,price,publisher,score_rank,tags,userscore,has_movies,has_screenshots,genre_id
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Counter-Strike,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...","[574941, 7]","{'minimum': '  
<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}","{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000132.1920x1080.jpg?t=1602535893'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/10/0000000133.600x338.jpg?t=1602535893', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...",Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,10.0,"{'url': 'http://steamcommunity.com/app/10', 'email': ''}",game,,212.0,8690.0,16837.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,4944.0,"10,000,000 .. 20,000,000",193192.0,999,Valve,,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0,0,1,[1]
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Team Fortress Classic,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Team Fortress Classic - S$5.25', 
'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}","{'coming_soon': False, 'date': '1 Apr, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000164.1920x1080.jpg?t=1579634708'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/20/0000000165.600x338.jpg?t=1579634708', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,"{'url': '', 'email': ''}",game,,0.0,2752.0,77.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,896.0,"5,000,000 .. 10,000,000",5416.0,499,Valve,,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0,0,1,[1]
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Day of Defeat,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price...",[30],"{'minimum': '  
<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}","{'coming_soon': False, 'date': '1 May, 2003'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000169.1920x1080.jpg?t=1512413490'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/30/0000000170.600x338.jpg?t=1512413490', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.",30.0,"{'url': '', 'email': ''}",game,,0.0,4250.0,139.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain",0.0,28.0,557.0,"5,000,000 .. 10,000,000",5007.0,499,Valve,,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0,0,1,[1]
3,"Enjoy fast-paced multiplayer gaming with Deathmatch Classic (a.k.a. DMC). Valve's tribute to the work of id software, DMC invites players to grab their rocket launchers and put their reflexes to the test in a collection of futuristic settings.",https://cdn.akamai.steamstatic.com/steam/apps/40/page_bg_generated_v6b.jpg?t=1568752159,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [], 'notes': None}","Enjoy fast-paced multiplayer gaming with Deathmatch Classic (a.k.a. DMC). Valve's tribute to the work of id software, DMC invites players to grab their rocket launchers and put their reflexes to the test in a collection of futuristic settings.",https://cdn.akamai.steamstatic.com/steam/apps/40/header.jpg?t=1568752159,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Deathmatch Classic,"[{'name': 'default', 'title': 'Buy Deathmatch Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 31, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Deathmatch Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': Fal...",[31],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  
<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}","{'coming_soon': False, 'date': '1 Jun, 2001'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/40/0000000142.600x338.jpg?t=1568752159', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/40/0000000142.1920x1080.jpg?t=1568752159'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/40/0000000143.600x338.jpg?t=1568752159', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Enjoy fast-paced multiplayer gaming with Deathmatch Classic (a.k.a. DMC). Valve's tribute to the work of id software, DMC invites players to grab their rocket launchers and put their reflexes to the test in a collection of futuristic settings.",40.0,"{'url': '', 'email': ''}",game,,0.0,5083.0,5.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,7.0,412.0,"5,000,000 .. 10,000,000",1854.0,499,Valve,,"{'Action': 629, 'FPS': 139, 'Classic': 107, 'Multiplayer': 96, 'Shooter': 94, 'First-Person': 70, 'Arena Shooter': 44, 'Old School': 33, 'Sci-fi': 33, 'Competitive': 23, 'Fast-Paced': 15, 'Retro': 14, 'Gore': 14, 'Co-op': 13, 'Difficult': 12, '1990's': 8}",0.0,0,1,[1]
4,"Return to the Black Mesa Research Facility as one of the military specialists assigned to eliminate Gordon Freeman. Experience an entirely new episode of single player action. Meet fierce alien opponents, and experiment with new weaponry. Named 'Game of the Year' by the Academy of Interactive Arts and Sciences.",https://cdn.akamai.steamstatic.com/steam/apps/50/page_bg_generated_v6b.jpg?t=1579628243,"[{'id': 2, 'description': 'Single-player'}, {'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [], 'notes': None}","Return to the Black Mesa Research Facility as one of the military specialists assigned to eliminate Gordon Freeman. Experience an entirely new episode of single player action. Meet fierce alien opponents, and experiment with new weaponry. Named 'Game of the Year' by the Academy of Interactive Arts and Sciences.",https://cdn.akamai.steamstatic.com/steam/apps/50/header.jpg?t=1579628243,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Half-Life: Opposing Force,"[{'name': 'default', 'title': 'Buy Half-Life: Opposing Force', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 32, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Opposing Force - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': ...",[32],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, 
Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}","{'coming_soon': False, 'date': '1 Nov, 1999'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/50/0000000155.600x338.jpg?t=1579628243', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/50/0000000155.1920x1080.jpg?t=1579628243'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/50/0000000156.600x338.jpg?t=1579628243', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","Return to the Black Mesa Research Facility as one of the military specialists assigned to eliminate Gordon Freeman. Experience an entirely new episode of single player action. Meet fierce alien opponents, and experiment with new weaponry. Named 'Game of the Year' by the Academy of Interactive Arts and Sciences.",50.0,"{'url': 'https://help.steampowered.com', 'email': ''}",game,,0.0,3223.0,139.0,Gearbox Software,0,Action,499,"English, French, German, Korean",0.0,156.0,664.0,"5,000,000 .. 10,000,000",13298.0,499,Valve,,"{'FPS': 881, 'Action': 322, 'Classic': 251, 'Sci-fi': 248, 'Singleplayer': 225, 'Shooter': 220, 'First-Person': 187, 'Aliens': 172, '1990's': 133, 'Adventure': 114, 'Atmospheric': 105, 'Military': 91, 'Story Rich': 74, 'Silent Protagonist': 65, 'Great Soundtrack': 50, 'Gore': 38, 'Puzzle': 35, 'Co-op': 31, 'Moddable': 29, 'Retro': 18}",0.0,0,1,[1]
5,"A futuristic action game that challenges your agility as well as your aim, Ricochet features one-on-one and team matches played in a variety of futuristic battle arenas.",https://cdn.akamai.steamstatic.com/steam/apps/60/page_bg_generated_v6b.jpg?t=1599518374,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","A futuristic action game that challenges your agility as well as your aim, Ricochet features one-on-one and team matches played in a variety of futuristic battle arenas.",https://cdn.akamai.steamstatic.com/steam/apps/60/header.jpg?t=1599518374,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Ricochet,"[{'name': 'default', 'title': 'Buy Ricochet', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 33, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Ricochet - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price_in_cents_...",[33],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 
'initial_formatted': '', 'final_formatted': 'S$5.25'}","{'coming_soon': False, 'date': '1 Nov, 2000'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/60/ss_2bf674132e28385f168dbc46ff55eea7be8c8886.600x338.jpg?t=1599518374', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/60/ss_2bf674132e28385f168dbc46ff55eea7be8c8886.1920x1080.jpg?t=1599518374'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/60/ss_079e05789d40144896f9b16fd49c...","A futuristic action game that challenges your agility as well as your aim, Ricochet features one-on-one and team matches played in a variety of futuristic battle arenas.",60.0,"{'url': '', 'email': ''}",game,,0.0,3159.0,8.0,Valve,0,Action,499,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,6.0,840.0,"5,000,000 .. 10,000,000",3676.0,499,Valve,,"{'Action': 585, 'FPS': 128, 'Multiplayer': 104, 'Classic': 89, 'First-Person': 78, 'Sci-fi': 54, 'Shooter': 52, 'Space': 42, 'Cyberpunk': 38, 'Memes': 31, 'Platformer': 27, 'Psychological Horror': 26, '3D': 25, 'Conspiracy': 25, 'Old School': 23, 'Retro': 23, 'Cult Classic': 17, 'Competitive': 16, 'Sports': 10, 'Great Soundtrack': 8}",0.0,0,1,[1]
6,"Named Game of the Year by over 50 publications, Valve's debut title blends action and adventure with award-winning technology to create a frighteningly realistic world where players must think to survive. Also includes an exciting multiplayer mode that allows you to play against friends and enemies around the world.",https://cdn.akamai.steamstatic.com/steam/apps/70/page_bg_generated_v6b.jpg?t=1591048039,"[{'id': 2, 'description': 'Single-player'}, {'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 23, 'description': 'Steam Cloud'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 41, 'description': 'Remote Play on Phone'}, {'id': 42, 'description': 'Remote Play on Tablet'}, {'id': 44, 'description': 'Remote Pl...","{'ids': [], 'notes': None}","Named Game of the Year by over 50 publications, Valve's debut title blends action and adventure with award-winning technology to create a frighteningly realistic world where players must think to survive. 
Also includes an exciting multiplayer mode that allows you to play against friends and enemies around the world.",https://cdn.akamai.steamstatic.com/steam/apps/70/header.jpg?t=1591048039,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Half-Life,"[{'name': 'default', 'title': 'Buy Half-Life', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 34, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Half-Life - S$10.00', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price_in_cen...","[34, 292347]","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 1000, 'final': 1000, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$10.00'}","{'coming_soon': False, 'date': '8 Nov, 1998'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/70/0000002354.600x338.jpg?t=1591048039', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/70/0000002354.1920x1080.jpg?t=1591048039'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/70/0000002343.600x338.jpg?t=1591048039', 'path_full': 
'https://cdn.akamai.steamstatic.com/steam/apps...","Named Game of the Year by over 50 publications, Valve's debut title blends action and adventure with award-winning technology to create a frighteningly realistic world where players must think to survive. Also includes an exciting multiplayer mode that allows you to play against friends and enemies around the world.",70.0,"{'url': 'http://steamcommunity.com/app/70', 'email': ''}",game,,30.0,1557.0,753.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",36.0,145.0,2482.0,"1,000,000 .. 2,000,000",69068.0,999,Valve,,"{'FPS': 2234, 'Sci-fi': 1764, '1990's': 1759, 'Singleplayer': 1758, 'Action': 1757, 'Multiplayer': 1757, 'Shooter': 1750, 'Classic': 1743, 'First-Person': 1743, 'Story Rich': 1740, 'Aliens': 1735, 'Silent Protagonist': 1733, 'Atmospheric': 1728, 'Adventure': 1720, 'Moddable': 1719, 'Action-Adventure': 1711, 'Gore': 1709, 'Retro': 1708, 'Difficult': 1698, 'PvP': 1694}",0.0,0,1,[1]
7,"With its extensive Tour of Duty campaign, a near-limitless number of skirmish modes, updates and new content for Counter-Strike's award-winning multiplayer game play, plus over 12 bonus single player missions, Counter-Strike: Condition Zero is a tremendous offering of single and multiplayer content.",https://cdn.akamai.steamstatic.com/steam/apps/80/page_bg_generated_v6b.jpg?t=1602535977,"[{'id': 2, 'description': 'Single-player'}, {'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","{'ids': [], 'notes': None}","With its extensive Tour of Duty campaign, a near-limitless number of skirmish modes, updates and new content for Counter-Strike's award-winning multiplayer game play, plus over 12 bonus single player missions, Counter-Strike: Condition Zero is a tremendous offering of single and multiplayer content.",https://cdn.akamai.steamstatic.com/steam/apps/80/header.jpg?t=1602535977,0,[],[],Counter-Strike: Condition Zero,"[{'name': 'default', 'title': 'Buy Counter-Strike: Condition Zero', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Counter-Strike: Condition Zero - S$10.00', 'option_description': '', 'can_get_free_license': '0'...",[7],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 1000, 'final': 1000, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$10.00'}","{'coming_soon': False, 'date': '1 Mar, 2004'}",0.0,"[{'id': 0, 'path_thumbnail': 
'https://cdn.akamai.steamstatic.com/steam/apps/80/0000002528.600x338.jpg?t=1602535977', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/80/0000002528.1920x1080.jpg?t=1602535977'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/80/0000002529.600x338.jpg?t=1602535977', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps...","With its extensive Tour of Duty campaign, a near-limitless number of skirmish modes, updates and new content for Counter-Strike's award-winning multiplayer game play, plus over 12 bonus single player missions, Counter-Strike: Condition Zero is a tremendous offering of single and multiplayer content.",80.0,"{'url': 'http://steamcommunity.com/app/80', 'email': ''}",game,,235.0,2500.0,564.0,Valve,0,Action,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",235.0,38.0,1817.0,"5,000,000 .. 10,000,000",18673.0,999,Valve,,"{'Action': 1358, 'FPS': 1013, 'Shooter': 748, 'Multiplayer': 605, 'First-Person': 461, 'Singleplayer': 395, 'Classic': 392, 'Tactical': 369, 'Team-Based': 313, 'Competitive': 301, 'Strategy': 175, 'Online Co-Op': 175, 'Military': 174, 'Adventure': 87, 'Survival': 70, 'Atmospheric': 61, 'Open World': 43, 'Old School': 43, 'Simulation': 39, 'Dark': 39}",0.0,0,1,[1]
8,"Made by Gearbox Software and originally released in 2001 as an add-on to Half-Life, Blue Shift is a return to the Black Mesa Research Facility in which you play as Barney Calhoun, the security guard sidekick who helped Gordon out of so many sticky situations.",https://cdn.akamai.steamstatic.com/steam/apps/130/page_bg_generated_v6b.jpg?t=1579629868,"[{'id': 2, 'description': 'Single-player'}, {'id': 44, 'description': 'Remote Play Together'}]","{'ids': [], 'notes': None}","Made by Gearbox Software and originally released in 2001 as an add-on to Half-Life, Blue Shift is a return to the Black Mesa Research Facility in which you play as Barney Calhoun, the security guard sidekick who helped Gordon out of so many sticky situations.",https://cdn.akamai.steamstatic.com/steam/apps/130/header.jpg?t=1579629868,0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}",Half-Life: Blue Shift,"[{'name': 'default', 'title': 'Buy Half-Life: Blue Shift', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 35, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Half-Life: Blue Shift - 3,99€', 'option_description': '', 'can_get_free_license': '0', 'is_free_license'...",[35],"{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  
'}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'EUR', 'initial': 399, 'final': 399, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '3,99€'}","{'coming_soon': False, 'date': '1 Jun, 2001'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/130/0000000127.600x338.jpg?t=1579629868', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/130/0000000127.1920x1080.jpg?t=1579629868'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/130/0000000128.600x338.jpg?t=1579629868', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/a...","Made by Gearbox Software and originally released in 2001 as an add-on to Half-Life, Blue Shift is a return to the Black Mesa Research Facility in which you play as Barney Calhoun, the security guard sidekick who helped Gordon out of so many sticky situations.",130.0,"{'url': 'https://help.steampowered.com', 'email': ''}",game,,0.0,2553.0,74.0,Gearbox Software,0,Action,499,"English, French, German",0.0,85.0,904.0,"10,000,000 .. 20,000,000",9493.0,499,Valve,,"{'FPS': 457, 'Action': 283, 'Sci-fi': 213, 'Singleplayer': 192, 'Shooter': 173, 'Aliens': 150, 'Classic': 139, 'First-Person': 138, 'Zombies': 91, 'Adventure': 89, '1990's': 79, 'Short': 74, 'Atmospheric': 69, 'Silent Protagonist': 49, 'Story Rich': 45, 'Puzzle': 23, 'Moddable': 22, 'Great Soundtrack': 21, 'Retro': 19, 'Old School': 10}",0.0,0,1,[1]
9,"1998. HALF-LIFE sends a shock through the game industry with its combination of pounding action and continuous, immersive storytelling. Valve's debut title wins more than 50 game-of-the-year awards on its way to being named ""Best PC Game Ever"" by PC Gamer, and launches a franchise with more than eight million retail units sold worldwide.<br><br>\r\n\t\tNOW. By taking the suspense, challenge an...",https://cdn.akamai.steamstatic.com/steam/apps/220/page_bg_generated_v6b.jpg?t=1591063154,"[{'id': 2, 'description': 'Single-player'}, {'id': 22, 'description': 'Steam Achievements'}, {'id': 29, 'description': 'Steam Trading Cards'}, {'id': 13, 'description': 'Captions available'}, {'id': 18, 'description': 'Partial Controller Support'}, {'id': 23, 'description': 'Steam Cloud'}, {'id': 16, 'description': 'Includes Source SDK'}, {'id': 41, 'description': 'Remote Play on Phone'}, {'id...","{'ids': [], 'notes': None}","1998. HALF-LIFE sends a shock through the game industry with its combination of pounding action and continuous, immersive storytelling. Valve's debut title wins more than 50 game-of-the-year awards on its way to being named ""Best PC Game Ever"" by PC Gamer, and launches a franchise with more than eight million retail units sold worldwide.<br><br>\r\n\t\tNOW. 
By taking the suspense, challenge an...",https://cdn.akamai.steamstatic.com/steam/apps/220/header.jpg?t=1591063154,0,[],"{'minimum': '<strong>Minimum:</strong><br><ul class=""bb_ul""><li><strong>OS:</strong> Leopard 10.5.8, Snow Leopard 10.6.3, or higher<br></li><li><strong>Memory:</strong> 1 GB RAM<br></li><li><strong>Graphics:</strong> Nvidia GeForce8 or higher, ATI X1600 or higher, Intel HD 3000 or higher</li></ul>'}",Half-Life 2,"[{'name': 'default', 'title': 'Buy Half-Life 2', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 36, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Half-Life 2 - S$10.00', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price_in...","[36, 289444, 469]","{'minimum': '<strong>Minimum:</strong><br><ul class=""bb_ul""><li><strong>OS:</strong> Windows 7, Vista, XP<br></li><li><strong>Processor:</strong> 1.7 Ghz<br></li><li><strong>Memory:</strong> 512 MB RAM<br></li><li><strong>Graphics:</strong> DirectX 8.1 level Graphics Card (requires support for SSE)<br></li><li><strong>Storage:</strong> 6500 MB available space</li></ul>'}","{'windows': True, 'mac': True, 'linux': True}","{'currency': 'SGD', 'initial': 1000, 'final': 1000, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$10.00'}","{'coming_soon': False, 'date': '16 Nov, 2004'}",0.0,"[{'id': 0, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/220/0000001864.600x338.jpg?t=1591063154', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/apps/220/0000001864.1920x1080.jpg?t=1591063154'}, {'id': 1, 'path_thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/220/0000001865.600x338.jpg?t=1591063154', 'path_full': 'https://cdn.akamai.steamstatic.com/steam/a...","1998. 
HALF-LIFE sends a shock through the game industry with its combination of pounding action and continuous, immersive storytelling. Valve's debut title wins more than 50 game-of-the-year awards on its way to being named ""Best PC Game Ever"" by PC Gamer, and launches a franchise with more than eight million retail units sold worldwide.",220.0,"{'url': 'http://steamcommunity.com/app/220', 'email': ''}",game,"[{'id': 904, 'name': 'Half-Life 2 Trailer', 'thumbnail': 'https://cdn.akamai.steamstatic.com/steam/apps/904/movie.jpg?t=1569623096', 'webm': {'480': 'http://cdn.akamai.steamstatic.com/steam/apps/904/movie480.webm?t=1569623096', 'max': 'http://cdn.akamai.steamstatic.com/steam/apps/904/movie_max.webm?t=1569623096'}, 'mp4': {'480': 'http://cdn.akamai.steamstatic.com/steam/apps/904/movie480.mp4?t=...",256.0,1108.0,1249.0,Valve,0,Action,999,"English, French, German, Italian, Korean, Spanish - Spain, Russian, Simplified Chinese, Traditional Chinese, Dutch, Danish, Finnish, Japanese, Norwegian, Polish, Portuguese, Swedish, Thai",266.0,452.0,3579.0,"10,000,000 .. 20,000,000",135164.0,999,Valve,,"{'FPS': 3872, 'Action': 2754, 'Sci-fi': 2403, 'Classic': 2196, 'Singleplayer': 2126, 'Story Rich': 1827, 'Shooter': 1645, 'First-Person': 1536, 'Adventure': 1314, 'Dystopian ': 1158, 'Atmospheric': 1075, 'Zombies': 976, 'Silent Protagonist': 958, 'Physics': 930, 'Aliens': 897, 'Great Soundtrack': 885, 'Horror': 708, 'Puzzle': 689, 'Multiplayer': 642, 'Moddable': 469}",0.0,1,1,[1]


### Data manipulation

Having inspected the rows above, we will manipulate the data further before splitting the columns into groups. 

#### `categories`

We see that `categories` is in the form of a list of dictionaries. We will unpack each dictionary into 2 columns (`categories_id` and `categories_description`) and drop the original column. 

In [7]:
# unpack categories
# id
list_cat = unpack_cell(df_steam, 'categories', 'categories_id', 'id')
# description
list_cat_2 = unpack_cell(df_steam, 'categories', 'categories_description', 'description')
# list_cat remains unchanged! We assign the result again; you can run the last line to verify this claim
# list_cat_2 == list_cat

100%|████████████████████████████████████████████████████████████████████████| 49015/49015 [00:00<00:00, 104109.57it/s]
100%|█████████████████████████████████████████████████████████████████████████| 49015/49015 [00:00<00:00, 52441.84it/s]


In [8]:
# view first 5 rows of unpacked content to ensure unpack was successful
df_steam[['categories', 'categories_description', 'categories_id']].head()

Unnamed: 0,categories,categories_description,categories_id
0,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","[Multi-player, PvP, Online PvP, Shared/Split Screen PvP, Valve Anti-Cheat enabled]","[1, 49, 36, 37, 8]"
1,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","[Multi-player, PvP, Online PvP, Shared/Split Screen PvP, Valve Anti-Cheat enabled, Remote Play Together]","[1, 49, 36, 37, 8, 44]"
2,"[{'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}]","[Multi-player, Valve Anti-Cheat enabled]","[1, 8]"
3,"[{'id': 1, 'description': 'Multi-player'}, {'id': 49, 'description': 'PvP'}, {'id': 36, 'description': 'Online PvP'}, {'id': 37, 'description': 'Shared/Split Screen PvP'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","[Multi-player, PvP, Online PvP, Shared/Split Screen PvP, Valve Anti-Cheat enabled, Remote Play Together]","[1, 49, 36, 37, 8, 44]"
4,"[{'id': 2, 'description': 'Single-player'}, {'id': 1, 'description': 'Multi-player'}, {'id': 8, 'description': 'Valve Anti-Cheat enabled'}, {'id': 44, 'description': 'Remote Play Together'}]","[Single-player, Multi-player, Valve Anti-Cheat enabled, Remote Play Together]","[2, 1, 8, 44]"


In [9]:
# drop `categories` column
df_steam = df_steam.drop(columns=["categories"])
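
The helper `unpack_cell` lives in `utils.py` and is not reproduced in this notebook. Purely as an illustration of the idea, the sketch below pulls one key out of every dictionary in a list. The name `collect_key`, the toy cells, and the handling of non-list cells are all assumptions; the real helper may behave differently.

```python
def collect_key(cell, key):
    """Return the values stored under `key` in a list of dictionaries.

    Anything that is not a list (for example a placeholder for a missing
    value) yields an empty list. This fallback is an assumption of the sketch.
    """
    if not isinstance(cell, list):
        return []
    return [item[key] for item in cell if isinstance(item, dict) and key in item]


# toy cells shaped like the `categories` column
cells = [
    [{'id': 1, 'description': 'Multi-player'},
     {'id': 8, 'description': 'Valve Anti-Cheat enabled'}],
    [{'id': 2, 'description': 'Single-player'}],
    0,  # assumed placeholder for a missing value
]

# one output list per new column
categories_id = [collect_key(cell, 'id') for cell in cells]
categories_description = [collect_key(cell, 'description') for cell in cells]

print(categories_id)
print(categories_description)
```

With pandas, a function like this could be applied to a column of such cells through `Series.apply`, passing `key` as a keyword argument.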

#### `content_descriptors`

We see that `content_descriptors` is in the form of a dictionary with the keys `ids` and `notes`. However, there seems to be a lot of 'empty' information, suggesting that this column will not provide much detail. Let us look at the values within. 

In [10]:
# look at content_descriptors
df_steam['content_descriptors'].value_counts()

{'ids': [], 'notes': None}                                                                                                                                                                                                                  42675
{'ids': [2, 5], 'notes': None}                                                                                                                                                                                                                691
{'ids': [1, 5], 'notes': None}                                                                                                                                                                                                                240
{'ids': [5], 'notes': None}                                                                                                                                                                                                                   184
{'ids': [1, 2, 5], 'notes': None

We see that out of 49015 rows, 42675 (more than 87\%) contain no information. We will drop this column, as we do not foresee it being useful for analysis or modelling. 

In [11]:
# drop column
df_steam = df_steam.drop(columns=["content_descriptors"])
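
The 87\% figure follows directly from the counts in the `value_counts()` output. The snippet below redoes that arithmetic and shows, on a few made-up cells, one way to flag a cell as empty. The helper name `is_empty_descriptor` and the toy cells are hypothetical.

```python
def is_empty_descriptor(cell):
    """True when a content_descriptors cell has no ids and no notes."""
    return (
        isinstance(cell, dict)
        and not cell.get('ids')
        and cell.get('notes') is None
    )


# counts taken from the value_counts() output
n_rows = 49015
n_empty = 42675
share_empty = n_empty / n_rows
print(f"Share of empty cells: {share_empty:.2%}")

# toy cells for illustration only
cells = [
    {'ids': [], 'notes': None},
    {'ids': [2, 5], 'notes': None},
    {'ids': [], 'notes': None},
]
flags = [is_empty_descriptor(cell) for cell in cells]
print(flags)
```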

#### `linux_requirements`, `mac_requirements`, `pc_requirements`

We see that these 3 columns are in the form of dictionaries. We will unpack the `minimum` entry of each dictionary into a new column and drop the original columns. 

In [12]:
# unpack linux_requirements
list_linux = unpack_cell(df_steam, 'linux_requirements', 'min_linux_requirements', 'minimum')

# unpack mac_requirements
list_mac = unpack_cell(df_steam, 'mac_requirements', 'min_mac_requirements', 'minimum')

# unpack pc_requirements
list_pc = unpack_cell(df_steam, 'pc_requirements', 'min_pc_requirements', 'minimum')

100%|█████████████████████████████████████████████████████████████████████████| 49015/49015 [00:03<00:00, 12297.23it/s]
100%|██████████████████████████████████████████████████████████████████████████| 49015/49015 [00:10<00:00, 4838.15it/s]
100%|███████████████████████████████████████████████████████████████████████████| 49015/49015 [02:52<00:00, 283.51it/s]


In [13]:
# view first 5 rows of content
df_steam[['linux_requirements', 'min_linux_requirements', 'mac_requirements', 'min_mac_requirements','pc_requirements', 'min_pc_requirements']].head()

Unnamed: 0,linux_requirements,min_linux_requirements,mac_requirements,min_mac_requirements,pc_requirements,min_pc_requirements
0,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t"
1,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t"
2,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t"
3,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t"
4,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card'}","Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card","{'minimum': 'Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection'}","Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection","{'minimum': '  <p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  <p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>  '}","\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t"


In [14]:
# drop columns
df_steam = df_steam.drop(columns=["linux_requirements",'mac_requirements', 'pc_requirements',])

#### `package_groups` and `packages`

We see that `package_groups` and `packages` are in the form of a list containing dictionaries or a plain list. These two columns tell us whether there was a package deal for the game being sold. 

After consideration, we do not foresee the content of these columns having much impact on our analysis. As such, we will create a new column that shows the number of packages and drop the two columns. 

In [15]:
# previously, packages was cleaned by inputting 0 to the missing values
df_steam['num_packages'] = df_steam['packages'].apply(lambda x : 0 if x == 0 else len(x))

# alternative - list comprehension
# (Series.items() is used because Series.iteritems() was removed in pandas 2.0)
# df_steam['num_packages'] = [0 if row == 0 else len(row) for index, row in df_steam['packages'].items()]

In [16]:
# view first 5 rows of content
df_steam[['package_groups', 'packages', 'num_packages']].head()

Unnamed: 0,package_groups,packages,num_packages
0,"[{'name': 'default', 'title': 'Buy Counter-Strike', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 7, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Counter-Strike: Condition Zero - 8,19€', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...","[574941, 7]",2
1,"[{'name': 'default', 'title': 'Buy Team Fortress Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 29, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Team Fortress Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license...",[29],1
2,"[{'name': 'default', 'title': 'Buy Day of Defeat', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 30, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Day of Defeat - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': False, 'price...",[30],1
3,"[{'name': 'default', 'title': 'Buy Deathmatch Classic', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 31, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Deathmatch Classic - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': Fal...",[31],1
4,"[{'name': 'default', 'title': 'Buy Half-Life: Opposing Force', 'description': '', 'selection_text': 'Select a purchase option', 'save_text': '', 'display_type': 0, 'is_recurring_subscription': 'false', 'subs': [{'packageid': 32, 'percent_savings_text': ' ', 'percent_savings': 0, 'option_text': 'Opposing Force - S$5.25', 'option_description': '', 'can_get_free_license': '0', 'is_free_license': ...",[32],1


In [17]:
# drop columns
df_steam = df_steam.drop(columns=['package_groups', 'packages'])

#### `platforms`

We see that `platforms` is in the form of a dictionary. We will unpack the column into 3 columns, one per platform. 
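The `unpack_cell` helper lives in `utils.py` and its implementation is not shown in this notebook. As a rough illustration of what unpacking a dictionary column involves, the sketch below uses plain pandas on a hypothetical two-row frame (`toy` and `flags` are made-up names). It is an alternative under those assumptions, not the helper itself.

```python
import pandas as pd

# hypothetical sample that mimics the shape of the `platforms` column
toy = pd.DataFrame(
    {"platforms": [
        {"windows": True, "mac": True, "linux": True},
        {"windows": True, "mac": False, "linux": False},
    ]},
    index=[0, 7],
)

# expand each dictionary into its own set of columns, keeping the original index
flags = pd.DataFrame(toy["platforms"].tolist(), index=toy.index)

# convert booleans to 1/0 and name the columns like the ones used in this notebook
flags = flags.astype(int).add_suffix("_platform")

toy = toy.join(flags)
print(toy[["windows_platform", "linux_platform", "mac_platform"]])
```

Passing `index=toy.index` matters here because the row labels of `df_steam` are not contiguous, so the new columns must be aligned on the existing index rather than on position.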

In [18]:
# unpack platforms to windows
list_windows_platform = unpack_cell(df_steam, 'platforms', 'windows_platform', 'windows')

# unpack platforms to linux
list_linux_platform = unpack_cell(df_steam, 'platforms', 'linux_platform', 'linux')

# unpack platforms to mac
list_mac_platform = unpack_cell(df_steam, 'platforms', 'mac_platform', 'mac')

100%|████████████████████████████████████████████████████████████████████████| 49015/49015 [00:00<00:00, 377142.57it/s]
100%|████████████████████████████████████████████████████████████████████████| 49015/49015 [00:00<00:00, 467719.60it/s]
100%|████████████████████████████████████████████████████████████████████████| 49015/49015 [00:00<00:00, 235901.66it/s]


In [19]:
# change column to 1 and 0 instead of boolean values
df_steam['windows_platform'] = df_steam['windows_platform'].apply(lambda x : 1 if x else 0)
df_steam['linux_platform'] = df_steam['linux_platform'].apply(lambda x : 1 if x else 0)
df_steam['mac_platform'] = df_steam['mac_platform'].apply(lambda x : 1 if x else 0)

# alternative
# list comprehension to change column to 1 and 0 instead of boolean values
# (Series.items() is used because Series.iteritems() was removed in pandas 2.0)
# df_steam['windows_platform'] = [1 if row else 0 for index, row in df_steam['windows_platform'].items()]
# df_steam['linux_platform'] = [1 if row else 0 for index, row in df_steam['linux_platform'].items()]
# df_steam['mac_platform'] = [1 if row else 0 for index, row in df_steam['mac_platform'].items()]

In [20]:
# view first 5 rows of content
df_steam[['platforms', 'windows_platform', 'linux_platform', 'mac_platform']].head()

Unnamed: 0,platforms,windows_platform,linux_platform,mac_platform
0,"{'windows': True, 'mac': True, 'linux': True}",1,1,1
1,"{'windows': True, 'mac': True, 'linux': True}",1,1,1
2,"{'windows': True, 'mac': True, 'linux': True}",1,1,1
3,"{'windows': True, 'mac': True, 'linux': True}",1,1,1
4,"{'windows': True, 'mac': True, 'linux': True}",1,1,1


In [21]:
# drop columns
df_steam = df_steam.drop(columns=['platforms'])

#### `price_overview`

We see that `price_overview` is in the form of a dictionary. We will unpack the column into 4 columns: `currency`, `initial_price`, `final_price` and `discount_percent`. Although there is more data in the dictionary, the remaining entries are just formatted versions of the 4 values identified. 

Keeping `initialprice`, `price` and `discount` alongside the new columns would be confusing. We will take the data collected from the Steam store as the main source and drop these 3 columns together with `price_overview` after cleaning.

In [22]:
# unpack price_overview to currency
list_currency = unpack_cell(df_steam, 'price_overview', 'currency', 'currency')

# unpack price_overview to initial_price
list_initial_price = unpack_cell(df_steam, 'price_overview', 'initial_price', 'initial')

# unpack price_overview to final_price
list_final_price = unpack_cell(df_steam, 'price_overview', 'final_price', 'final')

# unpack price_overview to discount_percent
list_discount_percent = unpack_cell(df_steam, 'price_overview', 'discount_percent', 'discount_percent')

100%|█████████████████████████████████████████████████████████████████████████| 49015/49015 [00:01<00:00, 27500.38it/s]
100%|█████████████████████████████████████████████████████████████████████████| 49015/49015 [00:01<00:00, 39140.90it/s]
100%|█████████████████████████████████████████████████████████████████████████| 49015/49015 [00:01<00:00, 47797.39it/s]
100%|█████████████████████████████████████████████████████████████████████████| 49015/49015 [00:01<00:00, 45405.62it/s]


In [23]:
# view first 5 rows of content
df_steam[['price_overview', 'currency', 'initial_price', 'discount_percent', 'final_price']].head()

Unnamed: 0,price_overview,currency,initial_price,discount_percent,final_price
0,"{'currency': 'EUR', 'initial': 819, 'final': 819, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': '8,19€'}",EUR,819,0,819
1,"{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",SGD,525,0,525
2,"{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",SGD,525,0,525
3,"{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",SGD,525,0,525
4,"{'currency': 'SGD', 'initial': 525, 'final': 525, 'discount_percent': 0, 'initial_formatted': '', 'final_formatted': 'S$5.25'}",SGD,525,0,525
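Judging from the sample rows above, the prices appear to be stored in minor units (for example `819` alongside `'8,19€'`), and the currency differs between rows. The sketch below is a small illustration on hypothetical data copied from those rows, showing how a price could be converted to major units. The column name `final_price_major` is an assumption made for this example and is not used elsewhere in the notebook.

```python
import pandas as pd

# values copied from the first two sample rows above
prices = pd.DataFrame(
    {
        "currency": ["EUR", "SGD"],
        "initial_price": [819, 525],
        "final_price": [819, 525],
        "discount_percent": [0, 0],
    }
)

# assumption: prices are stored in minor units (cents), so divide by 100
prices["final_price_major"] = prices["final_price"] / 100

print(prices[["currency", "final_price", "final_price_major"]])
```

Because the rows mix currencies, these values should not be compared directly without grouping by `currency` or converting to a common currency first.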


In [24]:
# drop columns
df_steam = df_steam.drop(columns=['price_overview', 'initialprice', 'price', 'discount'])

#### `release_date`

We see that `release_date` is in the form of a dictionary. We will replace the column with just the release date itself. Although there is more data in the dictionary, we decided that only the date is required. 

In [25]:
# replace release_date
list_release_date = unpack_cell(df_steam, 'release_date', 'release_date', 'date')

100%|██████████████████████████████████████████████████████████████████████████| 49015/49015 [00:24<00:00, 2038.12it/s]


In [26]:
# view first 5 rows of content
df_steam[['release_date']].head()

Unnamed: 0,release_date
0,"1 Nov, 2000"
1,"1 Apr, 1999"
2,"1 May, 2003"
3,"1 Jun, 2001"
4,"1 Nov, 1999"


#### `screenshots` and `movies`

As we had previously created the columns `has_screenshots` and `has_movies`, we will only create two more columns that show the count of the screenshots and movies within each cell. 

As we are not expecting to perform any type of analysis on the media files, we will drop the original columns after getting the counts. 

In [27]:
# apply lambda function: count the items when the cell holds a list, otherwise 0
df_steam['num_screenshots'] = df_steam['screenshots'].apply(lambda x : len(x) if isinstance(x, list) else 0)
df_steam['num_movies'] = df_steam['movies'].apply(lambda x : len(x) if isinstance(x, list) else 0)

# list comprehension
# (Series.items() is used because Series.iteritems() was removed in pandas 2.0)
# screenshots
# df_steam['num_screenshots'] = [len(row) if isinstance(row, list) else 0 for index, row in df_steam['screenshots'].items()]

# movies
# df_steam['num_movies'] = [len(row) if isinstance(row, list) else 0 for index, row in df_steam['movies'].items()]

In [28]:
# view first 5 rows of content
df_steam[['num_screenshots', 'has_screenshots', 'num_movies', 'has_movies']].head()

Unnamed: 0,num_screenshots,has_screenshots,num_movies,has_movies
0,13,1,0,0
1,5,1,0,0
2,5,1,0,0
3,4,1,0,0
4,5,1,0,0


In [29]:
# drop columns
df_steam = df_steam.drop(columns=['screenshots', 'movies'])

#### `support_info`

We see that `support_info` is in the form of a dictionary. We will unpack the column into 2 columns, `support_url` and `support_email`, and drop the original column. 

In [30]:
# unpack support_info to support_url
list_support_url = unpack_cell(df_steam, 'support_info', 'support_url', 'url')

# unpack support_info to support_email
list_support_email = unpack_cell(df_steam, 'support_info', 'support_email', 'email')

100%|███████████████████████████████████████████████████████████████████████████| 49015/49015 [01:43<00:00, 472.10it/s]
100%|███████████████████████████████████████████████████████████████████████████| 49015/49015 [01:47<00:00, 455.63it/s]


In [31]:
# view first 5 rows of content
df_steam[['support_info', 'support_url', 'support_email']].head()

Unnamed: 0,support_info,support_url,support_email
0,"{'url': 'http://steamcommunity.com/app/10', 'email': ''}",http://steamcommunity.com/app/10,
1,"{'url': '', 'email': ''}",,
2,"{'url': '', 'email': ''}",,
3,"{'url': '', 'email': ''}",,
4,"{'url': 'https://help.steampowered.com', 'email': ''}",https://help.steampowered.com,


In [32]:
# drop column
df_steam = df_steam.drop(columns=['support_info'])

#### `owners`

We see that the `owners` column consists of two values that form a range. We will split the values and store them in two columns, `min_owners` and `max_owners`, before dropping the original column. 

In [33]:
# create dummy list to hold the values
temp_min = []
temp_max = []

for index, row in df_steam.loc[:,['owners']].iterrows():
    # each row holds a single value, so take the first element of the list,
    # remove the thousands separators (","),
    # then split the range on " .. "
    temp_2 = (row.tolist())[0].replace(",","").split(" .. ")
    
    # convert the strings to integers
    temp_2 = [int(num) for num in temp_2]
    
    # append values
    temp_min.append(min(temp_2))
    temp_max.append(max(temp_2))

# create column
df_steam['min_owners'] = temp_min
df_steam['max_owners'] = temp_max
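The same split can also be expressed without an explicit loop. The sketch below is one possible vectorised alternative, shown on a hypothetical two-row frame whose values are copied from the sample output. It assumes every `owners` value follows the `"low .. high"` pattern; rows that do not match would produce missing values and make the integer conversion fail.

```python
import pandas as pd

# hypothetical sample that mimics the `owners` column
toy = pd.DataFrame(
    {"owners": ["10,000,000 .. 20,000,000", "5,000,000 .. 10,000,000"]}
)

bounds = (
    toy["owners"]
    .str.replace(",", "", regex=False)   # drop the thousands separators
    .str.extract(r"(\d+) \.\. (\d+)")    # capture the two ends of the range
    .astype("int64")                     # strings -> integers
)

# take min and max per row, mirroring the loop-based version
toy["min_owners"] = bounds.min(axis=1)
toy["max_owners"] = bounds.max(axis=1)

print(toy)
```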

In [34]:
# view first 5 rows of content
df_steam[['owners', 'min_owners', 'max_owners']].head()

Unnamed: 0,owners,min_owners,max_owners
0,"10,000,000 .. 20,000,000",10000000,20000000
1,"5,000,000 .. 10,000,000",5000000,10000000
2,"5,000,000 .. 10,000,000",5000000,10000000
3,"5,000,000 .. 10,000,000",5000000,10000000
4,"5,000,000 .. 10,000,000",5000000,10000000


In [35]:
# drop column
df_steam = df_steam.drop(columns=['owners'])

#### `score_rank`

We see that the `score_rank` values are mostly empty, with the rest taking just 4 different scores. 

In [36]:
# look at score_rank
df_steam['score_rank'].value_counts()

       48977
99        14
98        11
100       10
97         3
Name: score_rank, dtype: int64

We see that out of 49015 rows, 48977 (around 99.9\%) hold no information. We will drop this column, as we do not foresee it being useful for any analysis or modelling purpose. 
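As a quick check of that proportion, the counts printed above can be plugged into a small calculation. This is only a sketch: the counts are typed in by hand from the output, and it assumes the empty value is an empty string.

```python
import pandas as pd

# counts copied from the value_counts() output above
counts = pd.Series({"": 48977, "99": 14, "98": 11, "100": 10, "97": 3})

total_rows = counts.sum()
share_empty = counts[""] / total_rows

print(f"Total rows      : {total_rows}")
print(f"Share of empties: {share_empty:.2%}")
```

On the full data, `(df_steam['score_rank'] == '').mean()` would give the same share directly, under the same assumption about how the empty value is stored.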

In [37]:
# drop `score_rank` column
df_steam = df_steam.drop(columns=["score_rank"])

Although the `tags` column still needs to be manipulated, we will do that after creating a new dataframe for the column. 

Let us take a look at the data after the manipulation. 

In [38]:
# see df shape and size
print(f"Shape of data : {df_steam.shape}")
print(f"First 3 rows of Store data")
df_steam.head(3)

Shape of data : (49015, 46)
First 3 rows of Store data


Unnamed: 0,about_the_game,background,detailed_description,header_image,is_free,name,release_date,required_age,short_description,steam_appid,type,average_2weeks,average_forever,ccu,developer,genre,languages,median_2weeks,median_forever,negative,positive,publisher,tags,userscore,has_movies,has_screenshots,genre_id,categories_id,categories_description,min_linux_requirements,min_mac_requirements,min_pc_requirements,num_packages,windows_platform,linux_platform,mac_platform,currency,initial_price,final_price,discount_percent,num_screenshots,num_movies,support_url,support_email,min_owners,max_owners
0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0,Counter-Strike,"1 Nov, 2000",0.0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. 
Your team's success affects your role.,10.0,game,212.0,8690.0,16837.0,Valve,Action,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",116.0,239.0,4944.0,193192.0,Valve,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}",0.0,0,1,[1],"[1, 49, 36, 37, 8]","[Multi-player, PvP, Online PvP, Shared/Split Screen PvP, Valve Anti-Cheat enabled]","Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card","Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection","\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t",2,1,1,1,EUR,819,819,0,13,0,http://steamcommunity.com/app/10,,10000000,20000000
1,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0,Team Fortress Classic,"1 Apr, 1999",0.0,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. 
Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",20.0,game,0.0,2752.0,77.0,Valve,Action,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",0.0,16.0,896.0,5416.0,Valve,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}",0.0,0,1,[1],"[1, 49, 36, 37, 8, 44]","[Multi-player, PvP, Online PvP, Shared/Split Screen PvP, Valve Anti-Cheat enabled, Remote Play Together]","Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card","Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection","\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t",1,1,1,1,SGD,525,525,0,5,0,,,5000000,10000000
2,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0,Day of Defeat,"1 May, 2003",0.0,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. 
Missions are based on key historical operations.",30.0,game,0.0,4250.0,139.0,Valve,Action,"English, French, German, Italian, Spanish - Spain",0.0,28.0,557.0,5007.0,Valve,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}",0.0,0,1,[1],"[1, 8]","[Multi-player, Valve Anti-Cheat enabled]","Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card","Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection","\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t",1,1,1,1,SGD,525,525,0,5,0,,,5000000,10000000


In [39]:
# see df info
print(f"Info on Steam data")
df_steam.info()

Info on Steam data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 46 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   about_the_game          49015 non-null  object 
 1   background              49015 non-null  object 
 2   detailed_description    49015 non-null  object 
 3   header_image            49015 non-null  object 
 4   is_free                 49015 non-null  int32  
 5   name                    49015 non-null  object 
 6   release_date            49015 non-null  object 
 7   required_age            49015 non-null  object 
 8   short_description       49015 non-null  object 
 9   steam_appid             49015 non-null  float64
 10  type                    49015 non-null  object 
 11  average_2weeks          49015 non-null  float64
 12  average_forever         49015 non-null  float64
 13  ccu                     49015 non-null  float64
 14  developer          

### Data Grouping

We will create copies from `df_steam` according to the below groupings and perform further manipulation where necessary. 

0) *E.g. table_name<br>
'column_name'*

1) main: <br>
'steam_appid', 'name', 'release_date',  'type', 'developer', 'publisher', 'num_packages'

2) genre: <br>
'steam_appid','genre_id', 'genre'

3) categories: <br>
'steam_appid','categories_id', 'categories_description'

4) description: <br>
'steam_appid', 'about_the_game', 'background', 'detailed_description', 'short_description'

5) price: <br>
'steam_appid', 'is_free', 'currency', 'initial_price', 'final_price', 'discount_percent'

6) statistics: <br>
'steam_appid', 'average_2weeks', 'average_forever', 'ccu', 'median_2weeks', 'median_forever', 'negative', 'positive', 'userscore', 'min_owners', 'max_owners'

7) media: <br>
'steam_appid', 'header_image', 'has_movies', 'num_movies', 'has_screenshots', 'num_screenshots'

8) requirements: <br>
'steam_appid', 'required_age', 'min_linux_requirements', 'linux_platform', 'min_mac_requirements', 'mac_platform', 'min_pc_requirements', 'windows_platform'

9) tag: <br>
'steam_appid', 'tags'

10) language: <br>
'steam_appid', 'languages'

11) support_info: <br>
'steam_appid', 'support_url', 'support_email'


In [40]:
# create copies of data based on above groupings
df_main = df_steam[['steam_appid', 'name', 'release_date', 'type', 'developer', 'publisher', 'num_packages']].copy()

df_genre = df_steam[['steam_appid', 'genre_id', 'genre']].copy()

df_categories = df_steam[['steam_appid', 'categories_id', 'categories_description']].copy()

df_price = df_steam[['steam_appid', 'is_free', 'currency', 'initial_price', 'final_price', 'discount_percent']].copy()

df_description = df_steam[['steam_appid', 'about_the_game', 'background', 'detailed_description', 'short_description']].copy()

df_statistics = df_steam[['steam_appid', 'average_2weeks', 'average_forever', 'ccu', 'median_2weeks', 'median_forever', 'negative', 'positive', 'userscore', 'min_owners', 'max_owners']].copy()

df_media = df_steam[['steam_appid', 'header_image', 'has_movies', 'num_movies', 'has_screenshots', 'num_screenshots']].copy()

df_requirements = df_steam[['steam_appid', 'required_age', 'min_linux_requirements', 'linux_platform', 'min_mac_requirements', 'mac_platform', 'min_pc_requirements', 'windows_platform']].copy()

df_tag = df_steam[['steam_appid', 'tags']].copy()

df_language = df_steam[['steam_appid', 'languages']].copy()

df_support_info = df_steam[['steam_appid', 'support_url', 'support_email']].copy()

Next, we will check whether any further data manipulation is needed for each of the datasets. 

#### `main`

Let us look at the data within the `main` group. 

In [41]:
# see df shape and size
print(f"Shape of 'main' data : {df_main.shape}")
print(f"First 3 rows of 'main' data")
df_main.head(3)

Shape of 'main' data : (49015, 7)
First 3 rows of 'main' data


Unnamed: 0,steam_appid,name,release_date,type,developer,publisher,num_packages
0,10.0,Counter-Strike,"1 Nov, 2000",game,Valve,Valve,2
1,20.0,Team Fortress Classic,"1 Apr, 1999",game,Valve,Valve,1
2,30.0,Day of Defeat,"1 May, 2003",game,Valve,Valve,1


In [42]:
# see df info
print(f"Info on 'main' data")
df_main.info()

Info on 'main' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   steam_appid   49015 non-null  float64
 1   name          49015 non-null  object 
 2   release_date  49015 non-null  object 
 3   type          49015 non-null  object 
 4   developer     49015 non-null  object 
 5   publisher     49015 non-null  object 
 6   num_packages  49015 non-null  int64  
dtypes: float64(1), int64(1), object(5)
memory usage: 3.0+ MB


We will update the Dtype for `release_date`.

In [43]:
# count the rows whose date would become NaT if we force the conversion with errors='coerce'
pd.to_datetime(df_main['release_date'], errors='coerce').isnull().sum()

223

We see that if we were to force the conversion, 223 of the 49015 rows (about 0.45\%, i.e. less than 0.5\%) will lose their release date. This loss is acceptable and we will go ahead with the conversion. 
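To make the effect of `errors='coerce'` concrete, the sketch below parses a small hypothetical series. The values `"Coming soon"` and the empty string are made-up examples of entries that cannot be parsed; the actual unparseable values in the data are not shown in this notebook.

```python
import pandas as pd

# hypothetical sample: two valid dates and two entries that are not dates
dates = pd.Series(["1 Nov, 2000", "Coming soon", "1 Apr, 1999", ""])

# unparseable entries become NaT instead of raising an error
parsed = pd.to_datetime(dates, errors="coerce")

n_lost = parsed.isnull().sum()
share_lost = n_lost / len(dates)

print(parsed)
print(f"Rows without a date after conversion: {n_lost} ({share_lost:.0%})")
```

Note that the rows themselves are kept; only their date becomes missing.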

In [44]:
# convert Dtype for release_date
# normalize() keeps only the date part (the time is set to midnight)
df_main['release_date'] = pd.to_datetime(df_main['release_date'], errors='coerce').dt.normalize()

In [45]:
# see df info
print(f"Info on 'main' data")
df_main.info()

Info on 'main' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   steam_appid   49015 non-null  float64       
 1   name          49015 non-null  object        
 2   release_date  48792 non-null  datetime64[ns]
 3   type          49015 non-null  object        
 4   developer     49015 non-null  object        
 5   publisher     49015 non-null  object        
 6   num_packages  49015 non-null  int64         
dtypes: datetime64[ns](1), float64(1), int64(1), object(4)
memory usage: 3.0+ MB


#### `genre`

Let us look at the data within the `genre` group. 

In [46]:
# see df shape and size
print(f"Shape of 'genre' data : {df_genre.shape}")
print(f"First 3 rows of 'genre' data")
df_genre.head(3)

Shape of 'genre' data : (49015, 3)
First 3 rows of 'genre' data


Unnamed: 0,steam_appid,genre_id,genre
0,10.0,[1],Action
1,20.0,[1],Action
2,30.0,[1],Action


In [47]:
# see df info
print(f"Info on 'genre' data")
df_genre.info()

Info on 'genre' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   steam_appid  49015 non-null  float64
 1   genre_id     49015 non-null  object 
 2   genre        49015 non-null  object 
dtypes: float64(1), object(2)
memory usage: 1.5+ MB


Some games have more than one id within the `genre_id` column. We will break them out into one indicator column per genre id. 

In [48]:
# break up genre_id using explode and crosstab
df_genre = df_genre.join(pd.crosstab((df_genre['genre_id'].explode()).index, (df_genre['genre_id'].explode())))
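To see what the `explode` and `crosstab` combination does, here is a small sketch on a hypothetical three-row frame (`toy`, `exploded` and `indicators` are made-up names, and the ids are assumed to be strings). `explode` gives one row per id while repeating the original index label, and `crosstab` then counts each id per index label.

```python
import pandas as pd

# hypothetical sample with a non-contiguous index, like df_genre
toy = pd.DataFrame(
    {
        "steam_appid": [10.0, 20.0, 30.0],
        "genre_id": [["1"], ["1", "23"], ["2", "23"]],
    },
    index=[0, 1, 5],
)

# one row per (game, genre id); the index label is repeated for each id
exploded = toy["genre_id"].explode()

# rows: original index labels, columns: genre ids, values: counts (0 or 1 here)
indicators = pd.crosstab(exploded.index, exploded)

# attach the indicator columns back to the original rows
result = toy.join(indicators)
print(result)
```

Calling `explode` once and reusing the result, as done here, avoids computing it twice.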

In [49]:
# see df shape and size
print(f"Shape of 'genre' data : {df_genre.shape}")
print(f"First 3 rows of 'genre' data")
df_genre.head(3)

Shape of 'genre' data : (49015, 36)
First 3 rows of 'genre' data


Unnamed: 0,steam_appid,genre_id,genre,1,18,2,23,25,28,29,3,37,4,50,51,52,53,54,55,56,57,58,59,60,70,71,72,73,74,80,81,82,83,84,85,9
0,10.0,[1],Action,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,20.0,[1],Action,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,30.0,[1],Action,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [50]:
# see df info
print(f"Info on 'genre' data")
df_genre.info()

Info on 'genre' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 36 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   steam_appid  49015 non-null  float64
 1   genre_id     49015 non-null  object 
 2   genre        49015 non-null  object 
 3   1            49015 non-null  int64  
 4   18           49015 non-null  int64  
 5   2            49015 non-null  int64  
 6   23           49015 non-null  int64  
 7   25           49015 non-null  int64  
 8   28           49015 non-null  int64  
 9   29           49015 non-null  int64  
 10  3            49015 non-null  int64  
 11  37           49015 non-null  int64  
 12  4            49015 non-null  int64  
 13  50           49015 non-null  int64  
 14  51           49015 non-null  int64  
 15  52           49015 non-null  int64  
 16  53           49015 non-null  int64  
 17  54           49015 non-null  int64  
 18  55           49015 non-nu

In [51]:
# rename column name
df_genre.rename(columns = {elem: 'genre_id_'+elem if elem not in ['steam_appid', 'genre_id', 'genre'] else elem for elem in df_genre.columns.tolist()}, inplace = True)

In [52]:
# change list to string for db storing
df_genre['genre_id'] = df_genre['genre_id'].apply(lambda x : ",".join([str(elem) for elem in x]))
df_genre['genre'] = df_genre['genre'].apply(lambda x : (",".join([str(elem) for elem in x]) if isinstance(x, list) else x))

In [53]:
# see df shape and size
print(f"Shape of 'genre' data : {df_genre.shape}")
print(f"First 3 rows of 'genre' data")
df_genre.head(3)

Shape of 'genre' data : (49015, 36)
First 3 rows of 'genre' data


Unnamed: 0,steam_appid,genre_id,genre,genre_id_1,genre_id_18,genre_id_2,genre_id_23,genre_id_25,genre_id_28,genre_id_29,genre_id_3,genre_id_37,genre_id_4,genre_id_50,genre_id_51,genre_id_52,genre_id_53,genre_id_54,genre_id_55,genre_id_56,genre_id_57,genre_id_58,genre_id_59,genre_id_60,genre_id_70,genre_id_71,genre_id_72,genre_id_73,genre_id_74,genre_id_80,genre_id_81,genre_id_82,genre_id_83,genre_id_84,genre_id_85,genre_id_9
0,10.0,1,Action,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,20.0,1,Action,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,30.0,1,Action,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [54]:
# see df info
print(f"Info on 'genre' data")
df_genre.info()

Info on 'genre' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 36 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   steam_appid  49015 non-null  float64
 1   genre_id     49015 non-null  object 
 2   genre        49015 non-null  object 
 3   genre_id_1   49015 non-null  int64  
 4   genre_id_18  49015 non-null  int64  
 5   genre_id_2   49015 non-null  int64  
 6   genre_id_23  49015 non-null  int64  
 7   genre_id_25  49015 non-null  int64  
 8   genre_id_28  49015 non-null  int64  
 9   genre_id_29  49015 non-null  int64  
 10  genre_id_3   49015 non-null  int64  
 11  genre_id_37  49015 non-null  int64  
 12  genre_id_4   49015 non-null  int64  
 13  genre_id_50  49015 non-null  int64  
 14  genre_id_51  49015 non-null  int64  
 15  genre_id_52  49015 non-null  int64  
 16  genre_id_53  49015 non-null  int64  
 17  genre_id_54  49015 non-null  int64  
 18  genre_id_55  49015 non-nu

To interpret the `genre_id_#` columns, we will use the `genre_mapping` table that was previously saved in the database.

In [55]:
# connecting to the DB file
con = sqlite3.connect('../data/steam_db.db')

In [56]:
# see first 5 rows of the `genre_mapping` table

sql_query = '''
SELECT *
FROM genre_mapping
LIMIT 5
'''

pd.read_sql(sql_query, con)

Unnamed: 0,id,description
0,1,Action
1,18,Sports
2,2,Strategy
3,23,Indie
4,25,Adventure
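
As a minimal sketch of how this mapping could be used to read the `genre_id_#` columns, the snippet below builds a small in-memory SQLite table with the same `id` and `description` columns and turns it into a rename dictionary. The in-memory database, the sample rows, the `genre_<description>` naming and the names `demo_con`, `rename_map` and `df_demo` are assumptions for illustration only; they are not part of the notebook's pipeline.

```python
import sqlite3
import pandas as pd

# hypothetical in-memory stand-in for '../data/steam_db.db'
demo_con = sqlite3.connect(":memory:")
pd.DataFrame(
    {"id": ["1", "18", "2"], "description": ["Action", "Sports", "Strategy"]}
).to_sql("genre_mapping", demo_con, index=False)

# read the mapping back and build {one-hot column name: readable label}
df_map = pd.read_sql("SELECT id, description FROM genre_mapping", demo_con)
rename_map = {
    f"genre_id_{row.id}": f"genre_{row.description}"
    for row in df_map.itertuples(index=False)
}

# toy one-hot frame that follows the df_genre column naming
df_demo = pd.DataFrame({"steam_appid": [10.0], "genre_id_1": [1], "genre_id_18": [0]})
df_readable = df_demo.rename(columns=rename_map)
print(df_readable.columns.tolist())
demo_con.close()
```

Keys in `rename_map` that do not match a column are simply ignored by `rename`, so the dictionary can cover more ids than a given frame contains.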


#### `categories`

Let us look at the data within `categories` group. 

In [57]:
# see df shape and size
print(f"Shape of 'categories' data : {df_categories.shape}")
print(f"First 3 rows of 'categories' data")
df_categories.head(3)

Shape of 'categories' data : (49015, 3)
First 3 rows of 'categories' data


Unnamed: 0,steam_appid,categories_id,categories_description
0,10.0,"[1, 49, 36, 37, 8]","[Multi-player, PvP, Online PvP, Shared/Split Screen PvP, Valve Anti-Cheat enabled]"
1,20.0,"[1, 49, 36, 37, 8, 44]","[Multi-player, PvP, Online PvP, Shared/Split Screen PvP, Valve Anti-Cheat enabled, Remote Play Together]"
2,30.0,"[1, 8]","[Multi-player, Valve Anti-Cheat enabled]"


In [58]:
# see df info
print(f"Info on 'categories' data")
df_categories.info()

Info on 'categories' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 3 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   steam_appid             49015 non-null  float64
 1   categories_id           49015 non-null  object 
 2   categories_description  49015 non-null  object 
dtypes: float64(1), object(2)
memory usage: 2.5+ MB


Some games have more than 1 id within the `categories_id` column. Similar to what we did for `genres`, we will break them into individual columns. 

In [59]:
df_categories = df_categories.join(pd.crosstab(df_categories['categories_id'].explode().index, df_categories['categories_id'].explode()))
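
To make the `explode` plus `crosstab` step easier to follow, here is a minimal sketch on toy data. The frame `df_demo` and its values are made up for illustration; only the technique mirrors the cell above.

```python
import pandas as pd

# toy frame with a non-contiguous index, like the real data
df_demo = pd.DataFrame(
    {"steam_appid": [10.0, 20.0, 30.0], "categories_id": [[1, 8], [1, 8, 44], [2]]},
    index=[0, 1, 5],
)

# one row per (game, id) pair; the original index label is repeated
exploded = df_demo["categories_id"].explode()

# count each id per index label, giving a 0/1 indicator table
indicators = pd.crosstab(exploded.index, exploded)

# join the indicator columns back onto the original rows by index label
df_onehot = df_demo.join(indicators)
print(df_onehot)
```

Because the indicator table is keyed by the original index labels, the join lines up correctly even when the index has gaps.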

In [60]:
# rename column name
df_categories.rename(columns = {elem: 'categories_id_'+str(elem) if elem not in ['steam_appid', 'categories_id', 'categories_description'] else elem for elem in df_categories.columns.tolist()}, inplace = True)

In [61]:
# change list to string for db storing
df_categories['categories_id'] = df_categories['categories_id'].apply(lambda x : ",".join([str(elem) for elem in x]))
df_categories['categories_description'] = df_categories['categories_description'].apply(lambda x : (",".join([str(elem) for elem in x]) if type(x) == list else x))

In [62]:
# see df shape and size
print(f"Shape of 'categories' data : {df_categories.shape}")
print(f"First 3 rows of 'categories' data")
df_categories.head(3)

Shape of 'categories' data : (49015, 40)
First 3 rows of 'categories' data


Unnamed: 0,steam_appid,categories_id,categories_description,categories_id_1,categories_id_2,categories_id_6,categories_id_8,categories_id_9,categories_id_13,categories_id_14,categories_id_15,categories_id_16,categories_id_17,categories_id_18,categories_id_19,categories_id_20,categories_id_22,categories_id_23,categories_id_24,categories_id_25,categories_id_27,categories_id_28,categories_id_29,categories_id_30,categories_id_31,categories_id_32,categories_id_35,categories_id_36,categories_id_37,categories_id_38,categories_id_39,categories_id_40,categories_id_41,categories_id_42,categories_id_43,categories_id_44,categories_id_47,categories_id_48,categories_id_49,categories_id_51
0,10.0,14936378,"Multi-player,PvP,Online PvP,Shared/Split Screen PvP,Valve Anti-Cheat enabled",1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0
1,20.0,1493637844,"Multi-player,PvP,Online PvP,Shared/Split Screen PvP,Valve Anti-Cheat enabled,Remote Play Together",1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,1,0
2,30.0,18,"Multi-player,Valve Anti-Cheat enabled",1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [63]:
# see df info
print(f"Info on 'categories' data")
df_categories.info()

Info on 'categories' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 40 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   steam_appid             49015 non-null  float64
 1   categories_id           49015 non-null  object 
 2   categories_description  49015 non-null  object 
 3   categories_id_1         49015 non-null  int64  
 4   categories_id_2         49015 non-null  int64  
 5   categories_id_6         49015 non-null  int64  
 6   categories_id_8         49015 non-null  int64  
 7   categories_id_9         49015 non-null  int64  
 8   categories_id_13        49015 non-null  int64  
 9   categories_id_14        49015 non-null  int64  
 10  categories_id_15        49015 non-null  int64  
 11  categories_id_16        49015 non-null  int64  
 12  categories_id_17        49015 non-null  int64  
 13  categories_id_18        49015 non-null  int64  
 14  categories_i

We need a mapping from id number to description, so we will build one from `list_cat`. 

Let's look at the unique `id` values and build a list of the row indices to keep. 

In [64]:
# create df
df_cat_map = pd.DataFrame(list_cat)

In [65]:
# create unique id list
list_all_values = list(df_cat_map['id'].unique())
list_duplicated_values = list(df_cat_map.loc[df_cat_map['id'].duplicated(keep=False), 'id'].unique())

In [66]:
# create unique id list without duplicates
list_no_duplicated = []

for value in list_all_values:
    if value not in list_duplicated_values:
        list_no_duplicated.append(value)

In [67]:
# create unique id list for keeping
list_index_keep = []
for index, row in df_cat_map['id'].items():
    if row in list_no_duplicated:
        list_index_keep.append(index)
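
The same selection of rows whose `id` occurs only once can also be expressed with a boolean mask. The sketch below uses a toy mapping (the rows in `df_demo` are made up) and is meant as an equivalent shortcut, not as a replacement for the cells above.

```python
import pandas as pd

# toy mapping: id 1 appears twice (two languages), ids 2 and 3 once each
df_demo = pd.DataFrame(
    {
        "id": [1, 2, 1, 3],
        "description": ["Multi-player", "Single-player", "Multijoueur", "Steam Cloud"],
    }
)

# True for every row whose id occurs more than once
is_duplicated = df_demo["id"].duplicated(keep=False)

# index labels of the rows whose id is unique
index_keep = df_demo.index[~is_duplicated].tolist()
print(index_keep)
```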

In [68]:
# print out the duplicated id values for comparing
for value in list_duplicated_values:
    print(f"for id : {value}")
    print(df_cat_map.loc[df_cat_map['id'] == value,'description'].index.tolist())
    print(df_cat_map.loc[df_cat_map['id'] == value,'description'].to_numpy())
    print()

for id : 1
[0, 41, 48, 55, 75, 96, 107, 121, 134, 168, 175, 199, 212, 225]
['Multi-player' '多人' 'Çok Oyunculu' 'Mehrspieler' 'Для нескольких игроков'
 'Багатокористувацька гра' 'Multijogador' 'Multijoueur' 'Multiplayer'
 'Multigiocatore' 'Többjátékos' 'Wieloosobowa' 'Multijugador'
 'Režim pro více hráčů']

for id : 49
[1, 76, 144, 213]
['PvP' 'Против игроков' 'JxJ' 'JcJ']

for id : 36
[2, 63, 77, 145, 169, 205, 214]
['Online PvP' 'Online-PvP' 'Против игроков (по сети)' 'JxJ online'
 'PvP online' 'Sieciowe PvP' 'JcJ en línea']

for id : 37
[3, 137, 146, 157, 197, 200]
['Shared/Split Screen PvP' 'PvP-Spiele mit geteiltem Bildschirm'
 'JxJ em tela dividida/compartilhada' 'Против игроков (общий экран)'
 'JcJ en écran partagé' 'PvP na wspólnym/dzielonym ekranie']

for id : 8
[4, 173]
['Valve Anti-Cheat enabled' 'Valve Anti-Cheat attivato']

for id : 2
[6, 32, 34, 40, 47, 65, 74, 95, 106, 114, 120, 165, 167, 174, 190, 219]
['Single-player' 'Jednoosobowa' 'Einzelspieler' '单人' 'Tek Oyunculu'
 

After running the code above and comparing the descriptions for each duplicated id, we obtained the following list of row indices to keep (the English descriptions). 

In [69]:
# append the list of values
list_index_keep.extend([0,1,2,3,4,6,7,8,9,10,11,12,13,15,16,17,18,19,21,22,23,24,25,27,28,30,31,64])

In [70]:
# create the unique values
df_cat_map = df_cat_map.loc[list_index_keep,:].sort_index()

In [71]:
# create the unique id dataframe
df_cat_map.reset_index(drop=True)

Unnamed: 0,id,description
0,1,Multi-player
1,49,PvP
2,36,Online PvP
3,37,Shared/Split Screen PvP
4,8,Valve Anti-Cheat enabled
5,44,Remote Play Together
6,2,Single-player
7,23,Steam Cloud
8,41,Remote Play on Phone
9,42,Remote Play on Tablet


#### `description`

Let us look at the data within `description` group. 

In [72]:
# see df shape and size
print(f"Shape of 'description' data : {df_description.shape}")
print(f"First 3 rows of 'description' data")
df_description.head(3)

Shape of 'description' data : (49015, 5)
First 3 rows of 'description' data


Unnamed: 0,steam_appid,about_the_game,background,detailed_description,short_description
0,10.0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.
1,20.0,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes."
2,30.0,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations."


In [73]:
# see df info
print(f"Info on 'description' data")
df_description.info()

Info on 'description' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   steam_appid           49015 non-null  float64
 1   about_the_game        49015 non-null  object 
 2   background            49015 non-null  object 
 3   detailed_description  49015 non-null  object 
 4   short_description     49015 non-null  object 
dtypes: float64(1), object(4)
memory usage: 3.3+ MB


The data seems to be okay and no further manipulation is required. 

#### `price`

Let us look at the data within `price` group. 

In [74]:
# see df shape and size
print(f"Shape of 'price' data : {df_price.shape}")
print(f"First 3 rows of 'price' data")
df_price.head(3)

Shape of 'price' data : (49015, 6)
First 3 rows of 'price' data


Unnamed: 0,steam_appid,is_free,currency,initial_price,final_price,discount_percent
0,10.0,0,EUR,819,819,0
1,20.0,0,SGD,525,525,0
2,30.0,0,SGD,525,525,0


In [75]:
# see df info
print(f"Info on 'price' data")
df_price.info()

Info on 'price' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   steam_appid       49015 non-null  float64
 1   is_free           49015 non-null  int32  
 2   currency          49015 non-null  object 
 3   initial_price     49015 non-null  object 
 4   final_price       49015 non-null  object 
 5   discount_percent  49015 non-null  object 
dtypes: float64(1), int32(1), object(4)
memory usage: 3.4+ MB


In [76]:
# update the data 
df_price['initial_price'] = df_price['initial_price'].apply(pd.to_numeric).astype("float64")
df_price['final_price'] = df_price['final_price'].apply(pd.to_numeric).astype("float64")
df_price['discount_percent'] = df_price['discount_percent'].apply(pd.to_numeric).astype("float64")
df_price['currency'] = df_price['currency'].astype("str")
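
`pd.to_numeric` also accepts a whole Series, so the conversion can be done in one call per column. A minimal sketch, assuming the object-typed values are numeric strings (the toy Series `s_price` is made up):

```python
import pandas as pd

# toy price column stored as strings in an object-typed Series
s_price = pd.Series(["819", "525", "0"], dtype="object")

# convert the whole column in one call, then cast to float64
s_numeric = pd.to_numeric(s_price).astype("float64")
print(s_numeric.dtype)
```

If some entries could not be parsed, passing `errors="coerce"` to `pd.to_numeric` would turn them into `NaN` instead of raising an error.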

In [77]:
# see df shape and size
print(f"Shape of 'price' data : {df_price.shape}")
print(f"First 3 rows of 'price' data")
df_price.head(3)

Shape of 'price' data : (49015, 6)
First 3 rows of 'price' data


Unnamed: 0,steam_appid,is_free,currency,initial_price,final_price,discount_percent
0,10.0,0,EUR,819.0,819.0,0.0
1,20.0,0,SGD,525.0,525.0,0.0
2,30.0,0,SGD,525.0,525.0,0.0


In [78]:
# see df info
print(f"Info on 'price' data")
df_price.info()

Info on 'price' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   steam_appid       49015 non-null  float64
 1   is_free           49015 non-null  int32  
 2   currency          49015 non-null  object 
 3   initial_price     49015 non-null  float64
 4   final_price       49015 non-null  float64
 5   discount_percent  49015 non-null  float64
dtypes: float64(4), int32(1), object(1)
memory usage: 3.4+ MB


#### `statistics`

Let us look at the data within `statistics` group. 

In [79]:
# see df shape and size
print(f"Shape of 'statistics' data : {df_statistics.shape}")
print(f"First 3 rows of 'statistics' data")
df_statistics.head(3)

Shape of 'statistics' data : (49015, 11)
First 3 rows of 'statistics' data


Unnamed: 0,steam_appid,average_2weeks,average_forever,ccu,median_2weeks,median_forever,negative,positive,userscore,min_owners,max_owners
0,10.0,212.0,8690.0,16837.0,116.0,239.0,4944.0,193192.0,0.0,10000000,20000000
1,20.0,0.0,2752.0,77.0,0.0,16.0,896.0,5416.0,0.0,5000000,10000000
2,30.0,0.0,4250.0,139.0,0.0,28.0,557.0,5007.0,0.0,5000000,10000000


In [80]:
# see df info
print(f"Info on 'statistics' data")
df_statistics.info()

Info on 'statistics' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   steam_appid      49015 non-null  float64
 1   average_2weeks   49015 non-null  float64
 2   average_forever  49015 non-null  float64
 3   ccu              49015 non-null  float64
 4   median_2weeks    49015 non-null  float64
 5   median_forever   49015 non-null  float64
 6   negative         49015 non-null  float64
 7   positive         49015 non-null  float64
 8   userscore        49015 non-null  float64
 9   min_owners       49015 non-null  int64  
 10  max_owners       49015 non-null  int64  
dtypes: float64(9), int64(2)
memory usage: 5.5 MB


We will create two more columns, `review_score` and `review_percent`.

For `review_score`, we will take `positive` - `negative` <br>
For `review_percent`, we will take `review_score` / (ceil(max(`review_score`) / 1000) * 1000) * 100, that is, each score as a percentage of the maximum score rounded up to the next 1000.

In [81]:
df_statistics['review_score'] = df_statistics['positive'] - df_statistics['negative']

In [82]:
# ceil the maximum up to the next 1000 to avoid a review_percent of exactly 100
df_statistics['review_percent'] = ((df_statistics['review_score'] / (math.ceil(max(df_statistics['review_score'])/1000)*1000) ) * 100)
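
A small worked example of the two formulas, using made-up review counts (`df_demo` and `scale` are illustrative names):

```python
import math
import pandas as pd

# toy review counts (hypothetical values)
df_demo = pd.DataFrame(
    {"positive": [4500.0, 300.0, 10.0], "negative": [250.0, 100.0, 40.0]}
)
df_demo["review_score"] = df_demo["positive"] - df_demo["negative"]

# round the largest score up to the next multiple of 1000
scale = math.ceil(df_demo["review_score"].max() / 1000) * 1000

# express every score as a percentage of that rounded maximum
df_demo["review_percent"] = df_demo["review_score"] / scale * 100
print(scale)
print(df_demo)
```

Note that `review_score`, and therefore `review_percent`, is negative for a game with more negative than positive reviews.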

In [83]:
# see df shape and size
print(f"Shape of 'statistics' data : {df_statistics.shape}")
print(f"First 3 rows of 'statistics' data")
df_statistics.head(3)

Shape of 'statistics' data : (49015, 13)
First 3 rows of 'statistics' data


Unnamed: 0,steam_appid,average_2weeks,average_forever,ccu,median_2weeks,median_forever,negative,positive,userscore,min_owners,max_owners,review_score,review_percent
0,10.0,212.0,8690.0,16837.0,116.0,239.0,4944.0,193192.0,0.0,10000000,20000000,188248.0,3.966456
1,20.0,0.0,2752.0,77.0,0.0,16.0,896.0,5416.0,0.0,5000000,10000000,4520.0,0.095238
2,30.0,0.0,4250.0,139.0,0.0,28.0,557.0,5007.0,0.0,5000000,10000000,4450.0,0.093763


In [84]:
# see df info
print(f"Info on 'statistics' data")
df_statistics.info()

Info on 'statistics' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   steam_appid      49015 non-null  float64
 1   average_2weeks   49015 non-null  float64
 2   average_forever  49015 non-null  float64
 3   ccu              49015 non-null  float64
 4   median_2weeks    49015 non-null  float64
 5   median_forever   49015 non-null  float64
 6   negative         49015 non-null  float64
 7   positive         49015 non-null  float64
 8   userscore        49015 non-null  float64
 9   min_owners       49015 non-null  int64  
 10  max_owners       49015 non-null  int64  
 11  review_score     49015 non-null  float64
 12  review_percent   49015 non-null  float64
dtypes: float64(11), int64(2)
memory usage: 6.2 MB


#### `media`

Let us look at the data within `media` group. 

In [85]:
# see df shape and size
print(f"Shape of 'media' data : {df_media.shape}")
print(f"First 3 rows of 'media' data")
df_media.head(3)

Shape of 'media' data : (49015, 6)
First 3 rows of 'media' data


Unnamed: 0,steam_appid,header_image,has_movies,num_movies,has_screenshots,num_screenshots
0,10.0,https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,0,0,1,13
1,20.0,https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,0,0,1,5
2,30.0,https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,0,0,1,5


In [86]:
# see df info
print(f"Info on 'media' data")
df_media.info()

Info on 'media' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   steam_appid      49015 non-null  float64
 1   header_image     49015 non-null  object 
 2   has_movies       49015 non-null  int64  
 3   num_movies       49015 non-null  int64  
 4   has_screenshots  49015 non-null  int64  
 5   num_screenshots  49015 non-null  int64  
dtypes: float64(1), int64(4), object(1)
memory usage: 3.6+ MB


The data seems to be okay and no further manipulation is required. 

#### `requirements`

Let us look at the data within `requirements` group. 

In [87]:
# see df shape and size
print(f"Shape of 'requirements' data : {df_requirements.shape}")
print(f"First 3 rows of 'requirements' data")
df_requirements.head(3)

Shape of 'requirements' data : (49015, 8)
First 3 rows of 'requirements' data


Unnamed: 0,steam_appid,required_age,min_linux_requirements,linux_platform,min_mac_requirements,mac_platform,min_pc_requirements,windows_platform
0,10.0,0.0,"Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card",1,"Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection",1,"\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t",1
1,20.0,0.0,"Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card",1,"Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection",1,"\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t",1
2,30.0,0.0,"Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card",1,"Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection",1,"\r\n\t\t\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t<p><strong>Recommended:</strong> 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection<br /></p>\r\n\t\t\t",1


In [88]:
# see df info
print(f"Info on 'requirements' data")
df_requirements.info()

Info on 'requirements' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   steam_appid             49015 non-null  float64
 1   required_age            49015 non-null  object 
 2   min_linux_requirements  17453 non-null  object 
 3   linux_platform          49015 non-null  int64  
 4   min_mac_requirements    21594 non-null  object 
 5   mac_platform            49015 non-null  int64  
 6   min_pc_requirements     48978 non-null  object 
 7   windows_platform        49015 non-null  int64  
dtypes: float64(1), int64(3), object(4)
memory usage: 4.4+ MB


In [89]:
# strip control characters and HTML tags from the PC requirements text
for elem in ["\r", "\n", "\t", "<p>", "</p>", "<br />", "<strong>", "</strong>"]:
    df_requirements['min_pc_requirements'] = df_requirements['min_pc_requirements'].apply(lambda x : x.replace(elem,"") if type(x) == str else x)
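
Removing each tag without leaving a space joins neighbouring sentences together (for example `ConnectionRecommended:`). As an alternative sketch, not the approach used in this notebook, a regular expression can replace every tag and whitespace run with a single space. The helper `strip_markup` and the toy Series `s_req` are hypothetical, and this version removes all HTML tags rather than only the ones listed above.

```python
import re
import pandas as pd

# toy requirement strings; None stands in for a missing value
s_req = pd.Series([
    "\r\n\t<p><strong>Minimum:</strong> 500 mhz processor<br /></p>"
    "\r\n\t<p><strong>Recommended:</strong> 800 mhz processor<br /></p>",
    None,
])

def strip_markup(text):
    """Replace HTML tags and whitespace runs with single spaces."""
    if not isinstance(text, str):
        return text  # leave missing values untouched
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()

s_clean = s_req.apply(strip_markup)
print(s_clean[0])
```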

In [90]:
# see df shape and size
print(f"Shape of 'requirements' data : {df_requirements.shape}")
print(f"First 3 rows of 'requirements' data")
df_requirements.head(3)

Shape of 'requirements' data : (49015, 8)
First 3 rows of 'requirements' data


Unnamed: 0,steam_appid,required_age,min_linux_requirements,linux_platform,min_mac_requirements,mac_platform,min_pc_requirements,windows_platform
0,10.0,0.0,"Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card",1,"Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection",1,"Minimum: 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet ConnectionRecommended: 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection",1
1,20.0,0.0,"Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card",1,"Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection",1,"Minimum: 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet ConnectionRecommended: 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection",1
2,30.0,0.0,"Minimum: Linux Ubuntu 12.04, Dual-core from Intel or AMD at 2.8 GHz, 1GB Memory, nVidia GeForce 8600/9600GT, ATI/AMD Radeaon HD2600/3600 (Graphic Drivers: nVidia 310, AMD 12.11), OpenGL 2.1, 4GB Hard Drive Space, OpenAL Compatible Sound Card",1,"Minimum: OS X Snow Leopard 10.6.3, 1GB RAM, 4GB Hard Drive Space,NVIDIA GeForce 8 or higher, ATI X1600 or higher, or Intel HD 3000 or higher Mouse, Keyboard, Internet Connection",1,"Minimum: 500 mhz processor, 96mb ram, 16mb video card, Windows XP, Mouse, Keyboard, Internet ConnectionRecommended: 800 mhz processor, 128mb ram, 32mb+ video card, Windows XP, Mouse, Keyboard, Internet Connection",1


In [91]:
# see df info
print(f"Info on 'requirements' data")
df_requirements.info()

Info on 'requirements' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   steam_appid             49015 non-null  float64
 1   required_age            49015 non-null  object 
 2   min_linux_requirements  17453 non-null  object 
 3   linux_platform          49015 non-null  int64  
 4   min_mac_requirements    21594 non-null  object 
 5   mac_platform            49015 non-null  int64  
 6   min_pc_requirements     48978 non-null  object 
 7   windows_platform        49015 non-null  int64  
dtypes: float64(1), int64(3), object(4)
memory usage: 4.4+ MB


#### `tag`

Let us look at the data within `tag` group. 

In [92]:
# see df shape and size
print(f"Shape of 'tag' data : {df_tag.shape}")
print(f"First 3 rows of 'tag' data")
df_tag.head(3)

Shape of 'tag' data : (49015, 2)
First 3 rows of 'tag' data


Unnamed: 0,steam_appid,tags
0,10.0,"{'Action': 5379, 'FPS': 4801, 'Multiplayer': 3362, 'Shooter': 3327, 'Classic': 2758, 'Team-Based': 1844, 'First-Person': 1692, 'Competitive': 1588, 'Tactical': 1323, '1990's': 1181, 'e-sports': 1173, 'PvP': 865, 'Old School': 751, 'Military': 623, 'Strategy': 604, 'Survival': 296, 'Score Attack': 285, '1980s': 256, 'Assassin': 223, 'Violent': 65}"
1,20.0,"{'Action': 745, 'FPS': 306, 'Multiplayer': 257, 'Classic': 232, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 188, 'Class-Based': 181, 'First-Person': 169, '1990's': 132, 'Old School': 106, 'Co-op': 89, 'Competitive': 68, 'Fast-Paced': 61, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}"
2,30.0,"{'FPS': 788, 'World War II': 249, 'Multiplayer': 202, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 131, 'Classic': 126, 'First-Person': 105, 'Class-Based': 77, 'Military': 64, 'Historical': 57, 'Tactical': 40, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}"


In [93]:
# see df info
print(f"Info on 'tag' data")
df_tag.info()

Info on 'tag' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   steam_appid  49015 non-null  float64
 1   tags         49015 non-null  object 
dtypes: float64(1), object(1)
memory usage: 2.1+ MB


We see that the `tags` column holds one dictionary per game. We will expand these dictionaries into one column per tag.

In [94]:
# update tags column
df_tag['tags'] = df_tag['tags'].apply(lambda x : {} if type(x) != dict else x)

In [95]:
# merge the broken down data with the original data
# reuse df_tag's index so the join aligns each game with its own tags
df_tag = df_tag.join(pd.DataFrame(df_tag['tags'].tolist(), index=df_tag.index))
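
`join` aligns rows on index labels, and `df_tag` keeps a non-contiguous index (49015 entries labelled 0 to 50204), so the expanded frame has to carry the same labels for every game to receive its own tag counts. A minimal sketch on toy data (`df_demo` and its values are made up):

```python
import pandas as pd

# toy frame with a gap in the index, like the real df_tag
df_demo = pd.DataFrame(
    {
        "steam_appid": [10.0, 20.0, 30.0],
        "tags": [{"Action": 5, "FPS": 3}, {"Action": 2}, {"FPS": 7}],
    },
    index=[0, 1, 5],
)

# expand each dictionary into columns, reusing the original index labels
df_expanded = pd.DataFrame(df_demo["tags"].tolist(), index=df_demo.index)

# join aligns on index labels, so every game keeps its own tag counts
df_joined = df_demo.drop(columns=["tags"]).join(df_expanded)
print(df_joined)
```

Without `index=df_demo.index`, the expanded frame would be labelled 0, 1, 2, and the row labelled 5 would find no match in the join.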

In [96]:
# drop tag column
df_tag = df_tag.drop(columns = ['tags'])

In [97]:
# get columns that were created
list_tag_col = list(df_tag.columns)
list_tag_col.remove("steam_appid")
type(list_tag_col)

list

In [98]:
# fillna in the columns created
# tag counts are numbers of user-defined tags, so they can never be negative;
# we therefore use a negative sentinel (-9999) to mark tags a game does not have
df_tag[list_tag_col] = df_tag[list_tag_col].fillna(-9999)

In [99]:
# see df shape and size
print(f"Shape of 'tag' data : {df_tag.shape}")
print(f"First 3 rows of 'tag' data")
df_tag.head(3)

Shape of 'tag' data : (49015, 429)
First 3 rows of 'tag' data


Unnamed: 0,steam_appid,Action,FPS,Multiplayer,Shooter,Classic,Team-Based,First-Person,Competitive,Tactical,1990's,e-sports,PvP,Old School,Military,Strategy,Survival,Score Attack,1980s,Assassin,Violent,Hero Shooter,Class-Based,Co-op,Fast-Paced,...,Escape Room,Spelling,Roguelike Deckbuilder,Action RTS,VR Only,Skateboarding,Battle Royale,Wrestling,Steam Machine,Hockey,Boss Rush,Social Deduction,Baseball,Jet,Asymmetric VR,Faith,BMX,Hardware,Foreign,Electronic,360 Video,8-bit Music,Rock Music,Instrumental Music,Masterpiece
0,10.0,5379.0,4801.0,3362.0,3327.0,2758.0,1844.0,1692.0,1588.0,1323.0,1181.0,1173.0,865.0,751.0,623.0,604.0,296.0,285.0,256.0,223.0,65.0,-9999.0,-9999.0,-9999.0,-9999.0,...,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0
1,20.0,745.0,306.0,257.0,206.0,232.0,188.0,169.0,68.0,-9999.0,132.0,-9999.0,-9999.0,106.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,45.0,213.0,181.0,89.0,61.0,...,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0
2,30.0,160.0,788.0,202.0,188.0,126.0,131.0,105.0,-9999.0,40.0,-9999.0,-9999.0,-9999.0,16.0,64.0,13.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,77.0,34.0,-9999.0,...,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0,-9999.0


In [100]:
# see df info
print(f"Info on 'tag' data")
df_tag.info()

Info on 'tag' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Columns: 429 entries, steam_appid to Masterpiece
dtypes: float64(429)
memory usage: 161.8 MB


#### `language`

Let us look at the data within the `language` group. 

In [101]:
# see df shape and size
print(f"Shape of 'language' data : {df_language.shape}")
print(f"First 3 rows of 'language' data")
df_language.head(3)

Shape of 'language' data : (49015, 2)
First 3 rows of 'language' data


Unnamed: 0,steam_appid,languages
0,10.0,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean"
1,20.0,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese"
2,30.0,"English, French, German, Italian, Spanish - Spain"


In [102]:
# see df info
print(f"Info on 'language' data")
df_language.info()

Info on 'language' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   steam_appid  49015 non-null  float64
 1   languages    49015 non-null  object 
dtypes: float64(1), object(1)
memory usage: 2.1+ MB


Most games support more than one language, so we will split the `languages` column into one indicator column per language. 

In [103]:
# clean the text within the languages column
df_language['languages'] = df_language['languages'].str.lower()

# remove markup and audio-support annotations (applied in this order, as literal strings)
for token in ["\r", "#", "[/b]", "[b]", "lang_", "(all with full audio support)",
              "full_audio", "(full audio)", "*"]:
    df_language['languages'] = df_language['languages'].str.replace(token, "", regex=False)

# treat newlines as separators, then remove all spaces
df_language['languages'] = df_language['languages'].str.replace("\n", ",", regex=False)
df_language['languages'] = df_language['languages'].str.replace(" ", "", regex=False)

# split to list for dataframe creating
# df_language['languages'] = (df_language['languages'].apply(lambda x : x.split(",")))

In [104]:
df_language = df_language.join(pd.crosstab((df_language['languages'].apply(lambda x : x.split(","))).explode().index, 
                                           (df_language['languages'].apply(lambda x : x.split(","))).explode()))

In [105]:
# drop the unwanted by-product: the empty-string column created by empty entries in the split
df_language = df_language.drop(columns="")
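The split-and-count step above can be illustrated on a small toy dataframe. This is only a sketch: the names `toy`, `exploded`, `indicators`, `toy_wide` and `shortcut` and the sample values are made up for illustration and are not part of the notebook. It also shows `Series.str.get_dummies`, which appears to give the same indicator columns in one call.

```python
import pandas as pd

# toy data standing in for df_language (values are made up for illustration)
toy = pd.DataFrame(
    {
        "steam_appid": [10.0, 20.0, 30.0],
        "languages": ["english,french", "english,,korean", "german"],
    },
    index=[0, 1, 5],  # non-contiguous index, as in the real dataframe
)

# split each string into a list, then give every language its own row
exploded = toy["languages"].str.split(",").explode()

# count (row index, language) pairs: one row per game, one column per language
indicators = pd.crosstab(exploded.index, exploded)

# the doubled comma in the second row yields an empty-string column, which we drop
indicators = indicators.drop(columns="")

# join aligns on the index, so each game receives its own indicator row
toy_wide = toy.join(indicators)
print(toy_wide)

# a possible shortcut: str.get_dummies splits on the separator and ignores empty entries
shortcut = toy["languages"].str.get_dummies(sep=",")
```

The `crosstab` route mirrors what the notebook does, while `str.get_dummies` avoids both the double split and the later drop of the `""` column.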

In [106]:
# see df shape and size
print(f"Shape of 'language' data : {df_language.shape}")
print(f"First 3 rows of 'language' data")
df_language.head(3)

Shape of 'language' data : (49015, 34)
First 3 rows of 'language' data


Unnamed: 0,steam_appid,languages,arabic,bulgarian,czech,danish,dutch,english,finnish,french,german,greek,hungarian,italian,japanese,korean,norwegian,notsupported,polish,portuguese,portuguese-brazil,romanian,russian,simplifiedchinese,slovakian,spanish,spanish-latinamerica,spanish-spain,swedish,thai,traditionalchinese,turkish,ukrainian,vietnamese
0,10.0,"english,french,german,italian,spanish-spain,simplifiedchinese,traditionalchinese,korean",0,0,0,0,0,1,0,1,1,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0
1,20.0,"english,french,german,italian,spanish-spain,korean,russian,simplifiedchinese,traditionalchinese",0,0,0,0,0,1,0,1,1,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,1,0,0,0
2,30.0,"english,french,german,italian,spanish-spain",0,0,0,0,0,1,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


In [107]:
# see df info
print(f"Info on 'language' data")
df_language.info()

Info on 'language' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 34 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   steam_appid           49015 non-null  float64
 1   languages             49015 non-null  object 
 2   arabic                49015 non-null  int64  
 3   bulgarian             49015 non-null  int64  
 4   czech                 49015 non-null  int64  
 5   danish                49015 non-null  int64  
 6   dutch                 49015 non-null  int64  
 7   english               49015 non-null  int64  
 8   finnish               49015 non-null  int64  
 9   french                49015 non-null  int64  
 10  german                49015 non-null  int64  
 11  greek                 49015 non-null  int64  
 12  hungarian             49015 non-null  int64  
 13  italian               49015 non-null  int64  
 14  japanese              49015 non-null  int64  


#### `support_info`

Let us look at the data within the `support_info` group. 

In [108]:
# see df shape and size
print(f"Shape of 'support_info' data : {df_support_info.shape}")
print(f"First 3 rows of 'support_info' data")
df_support_info.head(3)

Shape of 'support_info' data : (49015, 3)
First 3 rows of 'support_info' data


Unnamed: 0,steam_appid,support_url,support_email
0,10.0,http://steamcommunity.com/app/10,
1,20.0,,
2,30.0,,


In [109]:
# see df info
print(f"Info on 'support_info' data")
df_support_info.info()

Info on 'support_info' data
<class 'pandas.core.frame.DataFrame'>
Int64Index: 49015 entries, 0 to 50204
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   steam_appid    49015 non-null  float64
 1   support_url    49015 non-null  object 
 2   support_email  49015 non-null  object 
dtypes: float64(1), object(2)
memory usage: 2.5+ MB


The data seems to be okay and no further manipulation is required. 

## Data output to database

We will perform data output to the database file for the following dataframes:
1) df_main
2) df_genre
3) df_categories and df_cat_map
4) df_description
5) df_price
6) df_statistics
7) df_media
8) df_requirements
9) df_tag
10) df_language
11) df_support_info

In [110]:
# storing the df in the steam_db db
df_main.to_sql(name='main',con=con, index=False, if_exists='replace')

49015

In [111]:
# storing the df in the steam_db db
df_genre.to_sql(name='genre',con=con, index=False, if_exists='replace')

49015

In [112]:
# storing the df in the steam_db db
df_categories.to_sql(name='categories',con=con, index=False, if_exists='replace')

49015

In [113]:
# storing the df in the steam_db db
df_cat_map.to_sql(name='categories_mapping',con=con, index=False, if_exists='replace')

36

In [114]:
# storing the df in the steam_db db
df_description.to_sql(name='description',con=con, index=False, if_exists='replace')

49015

In [115]:
# storing the df in the steam_db db
df_price.to_sql(name='price',con=con, index=False, if_exists='replace')

49015

In [116]:
# storing the df in the steam_db db
df_statistics.to_sql(name='statistics',con=con, index=False, if_exists='replace')

49015

In [117]:
# storing the df in the steam_db db
df_media.to_sql(name='media',con=con, index=False, if_exists='replace')

49015

In [118]:
# storing the df in the steam_db db
df_requirements.to_sql(name='requirements',con=con, index=False, if_exists='replace')

49015

In [119]:
# storing the df in the steam_db db
df_tag.to_sql(name='tag',con=con, index=False, if_exists='replace')

49015

In [120]:
# storing the df in the steam_db db
df_language.to_sql(name='language',con=con, index=False, if_exists='replace')

49015

In [121]:
# storing the df in the steam_db db
df_support_info.to_sql(name='support_info',con=con, index=False, if_exists='replace')

49015
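The export cells above all repeat the same `to_sql` call, so they could be condensed into a loop over a mapping of table names to dataframes. The sketch below uses an in-memory database and two toy dataframes; `demo_con`, `demo_tables` and `rows_written` are hypothetical names used only for illustration, and the notebook's own `con` and dataframes would take their place.

```python
import sqlite3

import pandas as pd

# in-memory database and toy dataframes, used only for illustration
demo_con = sqlite3.connect(":memory:")
demo_tables = {
    "main": pd.DataFrame({"steam_appid": [10.0, 20.0], "name": ["a", "b"]}),
    "price": pd.DataFrame({"steam_appid": [10.0, 20.0], "price": [9.99, 4.99]}),
}

# write every dataframe under its table name, replacing any existing table
rows_written = {}
for table_name, df in demo_tables.items():
    df.to_sql(name=table_name, con=demo_con, index=False, if_exists="replace")
    # count the rows back from the database instead of relying on to_sql's return value
    count = pd.read_sql(f"SELECT COUNT(*) AS n FROM {table_name}", demo_con)
    rows_written[table_name] = int(count["n"].iloc[0])

print(rows_written)
```

Keeping the table names and dataframes in one mapping also keeps them in sync with any later list of tables to check.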

In [122]:
# create list of sql table names
list_df_names = ['main', 'genre', 'genre_mapping', 'categories', 'categories_mapping', 'description', 'price', 
                 'statistics', 'media', 'requirements', 'tag', 'language', 'support_info']

# test if all tables have been successfully created
sql_query_1 = """
SELECT *
FROM
"""
sql_query_2 = """
LIMIT 5;
"""

for table_name in list_df_names:
    print(pd.read_sql((sql_query_1 + " "+ table_name + " " + sql_query_2), con))

   steam_appid                       name         release_date  type  \
0         10.0             Counter-Strike  2000-11-01 00:00:00  game   
1         20.0      Team Fortress Classic  1999-04-01 00:00:00  game   
2         30.0              Day of Defeat  2003-05-01 00:00:00  game   
3         40.0         Deathmatch Classic  2001-06-01 00:00:00  game   
4         50.0  Half-Life: Opposing Force  1999-11-01 00:00:00  game   

          developer publisher  num_packages  
0             Valve     Valve             2  
1             Valve     Valve             1  
2             Valve     Valve             1  
3             Valve     Valve             1  
4  Gearbox Software     Valve             1  
   steam_appid genre_id   genre  genre_id_1  genre_id_18  genre_id_2  \
0         10.0        1  Action           1            0           0   
1         20.0        1  Action           1            0           0   
2         30.0        1  Action           1            0           0   
3  
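
Besides printing the first rows of each table, the database catalogue can be queried to confirm which tables exist. The following is a minimal sketch on an in-memory database; `demo_con`, `expected`, `existing` and `missing` are hypothetical names, and the expected list here is deliberately short.

```python
import sqlite3

import pandas as pd

# in-memory database with a single toy table, for illustration only
demo_con = sqlite3.connect(":memory:")
pd.DataFrame({"steam_appid": [10.0]}).to_sql("main", demo_con, index=False)

# table names we expect to find (hypothetical, shortened list)
expected = ["main", "genre"]

# sqlite_master is SQLite's built-in catalogue of tables, indexes and views
existing = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type = 'table'", demo_con
)["name"].tolist()

# report any expected table that has not been created
missing = [name for name in expected if name not in existing]
print(missing)
```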

SQLite's `ALTER TABLE` cannot add a PRIMARY KEY to an existing table, so `steam_appid` cannot be set as the primary key on the tables created by `to_sql` above. [[link](https://sqlite.org/faq.html#q11)]

In [None]:
# cell to run if able to update primary key

# create list of sql table names
#list_df_names = ['main', 'genre', 'categories', 'description', 'price', 
#                 'statistics', 'media', 'requirements', 'tag', 'language', 'support_info']

# make steam_appid the PRIMARY KEY for all tables
# 'ALTER TABLE `example_table` ADD PRIMARY KEY (`ID_column`);'

#sql_query_1 = """ALTER TABLE """
#sql_query_2 = """ ADD PRIMARY KEY ("steam_appid"); """

#for table_name in list_df_names:
#    print(table_name)
#    pd.read_sql((sql_query_1 + table_name + sql_query_2), con)
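
One possible workaround, sketched here under assumptions rather than taken from the notebook, is to create the table with the primary key declared up front and then load the dataframe with `if_exists='append'`, which keeps the existing table definition. The names `demo_con` and `demo_df`, the two-column schema, and the integer `steam_appid` values are all illustrative; the real tables would need their full column lists and `steam_appid` cast to an integer type.

```python
import sqlite3

import pandas as pd

# in-memory database and toy dataframe, for illustration only
demo_con = sqlite3.connect(":memory:")
demo_df = pd.DataFrame({"steam_appid": [10, 20, 30], "name": ["a", "b", "c"]})

# declare the schema first, including the PRIMARY KEY
demo_con.execute("DROP TABLE IF EXISTS main")
demo_con.execute(
    """
    CREATE TABLE main (
        steam_appid INTEGER PRIMARY KEY,
        name TEXT
    )
    """
)

# 'append' inserts the rows into the existing table instead of recreating it
demo_df.to_sql(name="main", con=demo_con, index=False, if_exists="append")

# PRAGMA table_info reports pk = 1 for the primary key column
schema = pd.read_sql("PRAGMA table_info(main)", demo_con)
print(schema[["name", "type", "pk"]])
```

The trade-off is that each table's columns and types must be written out by hand instead of being inferred by `to_sql`.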

## Conclusion

With this, we have completed the cleaning and grouping of the data. 

The schema is as follows. The diagram was created using https://dbdiagram.io/home

![schema](../images/schema.png)