# Data Collection (Sample)

***This code notebook shows the functions used and collects a sample of the data. The notebook can be modified to collect all data by changing the parameters.***

## Gathering Data

As part of our data collection, and for the purpose of learning, we will obtain the required data through APIs. An API is typically how developers give controlled access to the databases and information on a server.

Below are the two APIs that we will be using for the data collection.

1. https://partner.steamgames.com/doc/webapi_overview
2. https://steamspy.com/about

Valve (the company behind Steam) has an API available at https://partner.steamgames.com/. An API such as this allows anyone to interface with data on a website in a controlled way, usually providing a host of useful features to the end-user.

SteamSpy is a Steam stats-gathering service and crucially has data easily available through its own API. It provides a number of useful metrics including an estimation for total owners of each game.

## Problem Statement

Using data from Steam Games combined with data from Steam Spy, I will seek to identify the top 10 categories and genres that are available on Steam. 

Because the collected data lacks unique user data and user responses, a content-based recommender will be built. Given the name of a game as input, it will suggest other games the user may be interested in.

Should time permit, the recommender can be built into an application for others to use.

## Data Collection method

Functions are written to collect data using the APIs.

The total is estimated at $51,749$ game details. Collection is limited to 5,000 game details per iteration, because requesting more than 5,000 triggers the following error:<br> `'Connection failed: Too many connections'`
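As a quick check of the arithmetic, the number of iterations needed at the 5,000 limit can be worked out as below. The variable names are illustrative only and do not come from the notebook.

```python
import math

total_games = 51_749   # estimated number of game details, from the text above
per_iteration = 5_000  # limit per iteration before the connection error appears

# Number of iterations needed to cover every game.
iterations = math.ceil(total_games / per_iteration)

# Size of the final, partial iteration.
last_iteration = total_games - (iterations - 1) * per_iteration

print(iterations, last_iteration)
```

This gives 11 iterations, the last one covering 1,749 games.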

The data was fully collected at the end of **February 2022**, and all data is accurate to that point.

The collected data is stored in *pickle (.pkl)* format to preserve its integrity. This results in a larger file but prevents integrity issues during cleaning.

## Data Cleaning

After data collection is completed, we will then merge the two files into one and keep only required columns for the analysis and modelling. 

As opposed to pickle, we will store the data into a *database (.db)*. Due to the large number of columns, we will split the data into various tables. 

---

## Import Libraries

In this section, we will import all the libraries that will be used in this notebook. 

In [1]:
# To read url
import requests

# For Calculation and Data Manipulation
import numpy as np
import pandas as pd

# For folder creation when exporting `.pkl` files
import os

# for datetime conversion
import datetime

# for data collection server buffer time
import time

# import functions for downloading data
from utils import get_request, pkl_output, get_game_data, steam_data_request, steamspy_data_request, batch_process

# this setting widens how many characters pandas will display in a column:
pd.options.display.max_colwidth = 400

---

## Functions

In this section, we summarise all the functions that are used in the notebook.

1. `get_request` : Generic function to get requests from an API
2. `pkl_output` : Function to save dataframe to file
3. `get_game_data` : Function to get one batch of game data
4. `steam_data_request` : Function to get data from Steam
5. `steamspy_data_request` : Function to get data from SteamSpy
6. `batch_process` : Function to process data obtained in batches


---

### Function : `get_request`

`get_request` is a generic function for sending GET requests to an API. It takes in 2 parameters:
1. URL in string; and
2. API parameters in dictionary form.

The API parameters supplied are passed into the GET request automatically, depending on the API.

The function handles two scenarios when getting the response:
1. If an SSL error occurs during extraction, we wait 5 seconds and then ask the user whether they would like to retry (by calling the function again), printing feedback while the function runs.
2. If there is no response, we wait 10 seconds before retrying.

In both scenarios, the loop ends once the user chooses not to continue getting the response.
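The two scenarios above can be sketched roughly as follows. This is not the `get_request` from `utils`: that helper presumably builds on `requests`, which the notebook imports, whereas this sketch uses the standard library's `urllib` so it runs without third-party packages. The name `get_request_sketch` and the `opener`, `confirm` and `sleep` hooks are assumptions added so the retry logic can be exercised without a network or a keyboard.

```python
import json
import ssl
import time
import urllib.error
import urllib.parse
import urllib.request


def get_request_sketch(url, parameters=None, *, opener=urllib.request.urlopen,
                       confirm=input, sleep=time.sleep):
    """Return the parsed JSON response, or None if the user stops retrying.

    `opener`, `confirm` and `sleep` are hypothetical hooks (not in the notebook).
    """
    query = urllib.parse.urlencode(parameters or {})
    full_url = f"{url}?{query}" if query else url

    while True:
        try:
            with opener(full_url) as response:
                payload = json.loads(response.read().decode("utf-8") or "null")
        except (ssl.SSLError, urllib.error.URLError) as err:
            # Scenario 1: SSL error (this sketch treats any URL error the same way).
            print(f"Connection error: {err}")
            wait = 5
        else:
            if payload:
                return payload
            # Scenario 2: the server answered with nothing usable.
            print("No response received.")
            wait = 10

        sleep(wait)
        # In both scenarios the loop ends when the user declines to retry.
        if confirm("Retry? (y/n): ").strip().lower() != "y":
            return None
```

A call such as `get_request_sketch('https://steamspy.com/api.php', {'request': 'all', 'page': 0})` mirrors how the notebook calls `get_request`.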

### Function : `pkl_output`

`pkl_output` is a generic function to save a dataframe into a pkl file. It takes in 2 parameters:
1. filename and path in string; and
2. dataframe.

The function writes the dataframe to the given path as a pkl file.
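A minimal sketch of such a helper is shown below, assuming it creates the target folder when missing (the notebook imports `os` for folder creation). It uses the standard library's `pickle` so it runs on any picklable object; with a dataframe, `DataFrame.to_pickle(filename)` is the pandas equivalent of the `pickle.dump` call. The name `pkl_output_sketch` is hypothetical.

```python
import os
import pickle


def pkl_output_sketch(filename, data):
    """Write `data` to `filename` as a pickle, creating the folder if needed."""
    folder = os.path.dirname(filename)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with open(filename, "wb") as handle:
        pickle.dump(data, handle)
```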

---

## Generate List of App IDs

Every app on the Steam store has a unique `app ID`, even if the name is the same. This will be our `unique identifier`, used to match apps between the two extracted datasets and, eventually, to merge the tables of data.

As such, we will generate a list of `App IDs` which will be used to build our data sets. While it is possible to generate the list of `App IDs` from the Steam API (https://api.steampowered.com/ISteamApps/GetAppList/v2/), that list has a large number of entries and may include demos and videos, which we would not be able to tell apart from the `App ID` alone.

SteamSpy provides an `'all'` request, supplying some information on the apps they track. While it does not supply all information about each app, it provides a good starting point. 

After getting the response, we will store it into a pandas dataframe. 

In [2]:
%%time

# define url and parameters to get all App IDs
url_steamspy = 'https://steamspy.com/api.php'
param_appid = {'request' : 'all', 'page': 0}

# show the page currently being scraped
print('\rCurrent page: 0')

# request 'all' from steamspy and parse into dataframe
json_data = get_request(url = url_steamspy, parameters= param_appid)
steam_spy_all_df = pd.DataFrame.from_dict(json_data, orient='index')

# create page counter
counter = 1

# placeholder of length 1000 so that the while loop runs at least once
data_add = ['temp']*1000

# loop for appid extraction
# each page holds up to 1000 entries, so the loop continues while the last page returned 1000
while len(data_add) == 1000:
    
    # buffer time between requests
    # the API indicates a limit of one request every 60s
    time.sleep(61)
    
    # show the page currently being scraped
    print(f'\rCurrent page: {counter}')
    
    # update 'page' parameter
    param_appid['page'] = counter
    
    # create dataframe by getting the json data
    data_add = pd.DataFrame.from_dict(get_request(url = url_steamspy, parameters= param_appid), orient='index')
    
    # concat the additional data
    steam_spy_all_df = pd.concat([steam_spy_all_df, data_add])
    
    # update counter
    counter += 1
    
    # stop after 5 pages to keep this sample run short (used for testing the code below)
    # comment out lines A1 and A2 to collect all data, which takes around an hour
    if counter == 5:    # Line A1
        break           # Line A2

# save the extraction to a pkl file for later use, as the full extraction takes about an hour
# the file name is prefixed with 'sample' because lines A1 and A2 limit this run to 5 pages
pkl_output('../data/sample_app_id_and_game_list.pkl', steam_spy_all_df.sort_values('appid'))

# create dataframe for app_list, keeping only App ID and name
app_list = steam_spy_all_df[['appid', 'name']].sort_values('appid').reset_index(drop=True)
app_list.rename(columns={'appid': 'app_id', 'name': 'game_name'}, inplace=True)

Current page: 0
Current page: 1
Current page: 2
Current page: 3
Current page: 4
Wall time: 4min 10s


In [3]:
# look at app list shape and data
print(app_list.shape)
app_list.head()

(5000, 2)


Unnamed: 0,app_id,game_name
0,10,Counter-Strike
1,20,Team Fortress Classic
2,30,Day of Defeat
3,40,Deathmatch Classic
4,50,Half-Life: Opposing Force


In [4]:
# look at app list info to see if there is any null value
app_list.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   app_id     5000 non-null   int64 
 1   game_name  5000 non-null   object
dtypes: int64(1), object(1)
memory usage: 78.2+ KB


---
## Define download logic

Now that we have the `app_list` dataframe, we can iterate over the App IDs and request individual game data from the servers. 

As there is a lot of data to retrieve from the internet, we will avoid attempting to retrieve it all at once, since any error or connection time-out could cause the loss of all retrieved data. For this reason, we define a function to download and process the requests in batches, appending each batch to an external file while keeping track of the highest index.

This will allow us to easily restart the process if an error is encountered, and it also means we can complete the download across multiple sessions.

### Function : `get_game_data`

Used to return one batch of game data to the batch process function. It takes in 4 parameters:

1. index for starting value corresponding to `app_list`
2. index for stopping value corresponding to `app_list`
3. function that will be used to scrape data
4. integer indicating number of seconds in between each request

The function returns a dataframe to the batch process function.
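The behaviour described above can be sketched as below. To stay dependency-free, this version takes `app_list` as a plain list of `(app_id, game_name)` pairs and returns a list of dictionaries rather than a dataframe. Passing `app_list` in explicitly, the `sleep` hook and the name `get_game_data_sketch` are assumptions, not the notebook's actual signature.

```python
import time


def get_game_data_sketch(app_list, start, stop, fn, pause, sleep=time.sleep):
    """Collect one batch of records for app_list[start:stop].

    `app_list` is a list of (app_id, game_name) pairs in this sketch.
    """
    records = []
    for index, (app_id, game_name) in enumerate(app_list[start:stop], start=start):
        print(f"Current index: {index}")
        records.append(fn(app_id, game_name))  # one request per game
        sleep(pause)                           # buffer time between requests
    return records
```

A dataframe can then be built from the returned list in a single step with `pd.DataFrame(records)`.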

### Function : `steam_data_request`

Used to scrape data from Steam. It takes in 2 parameters:

1. unique identifier of the game
2. name of the game

The function returns a dictionary containing the game data.
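The parsing step of such a function might look like the sketch below. It assumes the Steam storefront response is keyed by the app ID as a string and carries a `success` flag alongside a `data` object; this shape should be verified against a live response. The name `parse_steam_response_sketch` and the fallback dictionary are hypothetical.

```python
def parse_steam_response_sketch(json_data, app_id, game_name):
    """Return the game's data dictionary, or a minimal placeholder on failure."""
    entry = (json_data or {}).get(str(app_id), {})
    if entry.get("success") and "data" in entry:
        return entry["data"]
    # Keep the identifiers so the row can still be matched by app ID later.
    return {"name": game_name, "steam_appid": app_id}
```

Returning a placeholder instead of raising keeps a long batch run going when a single app has no store data.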

### Function : `steamspy_data_request`

This is the third of the three functions defined to get data. It is used to scrape data from SteamSpy and takes in 2 parameters:

1. unique identifier of the game
2. name of the game

The function returns a dictionary containing the game data.

### Function : `batch_process`

Saves the dataframes from batch scraping into a pkl file. This function takes in 8 parameters:

1. a function used to scrape data
2. dataframe containing `app_id` and `game_name`
3. file path and name of file output
4. starting index of the dataframe to scrape
5. ending index of the dataframe to scrape
6. batch size of each batch request
7. integer indicating pause time between each request
8. integer indicating pause time between each batch

The function saves the output into a pkl file.
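How the starting index, ending index and batch size translate into batches can be sketched as follows. `batch_ranges` is a hypothetical helper, not part of `utils`, and it does not handle the `-1` default for the ending index.

```python
def batch_ranges(begin, end, batchsize):
    """Split the half-open index range [begin, end) into (start, stop) pairs."""
    return [(start, min(start + batchsize, end))
            for start in range(begin, end, batchsize)]


# Same settings as the sample runs in this notebook: 20 games in batches of 10.
for start, stop in batch_ranges(begin=0, end=20, batchsize=10):
    print(f"lines {start} to {stop - 1}")
```

Restarting with `begin` set to the first index that has not yet been exported would resume an interrupted download without repeating completed batches.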

Once all functions are loaded, we will start the data extraction for Steam and Steamspy.

---

## Download Steam Game

We will start downloading the game data for the games identified in our `app_list`. 

In [5]:
%%time

# to obtain sample data only,
# run this cell instead of the remaining cells in this section
# the lines identified as code must be uncommented

# pkl file name to be saved
sample_steam_filename = '../data/sample_steam_game_data.pkl'    # code

# create empty pickle file for function usage using empty dataframe
empty_df = pd.DataFrame()   # code
pkl_output(sample_steam_filename, empty_df)   # code

# last run index, default is 0 to start scraping
sample_steam_index_value = 0

# download game data from steam based on app_list
# below are all code
batch_process(
    fn=steam_data_request,                # function used to scrape data
    app_list=app_list,                    # dataframe containing app_id and game_name
    data_filename=sample_steam_filename,  # folder/file path and name of file. E.g. '../data/name.pkl'
    begin=sample_steam_index_value,       # starting index of scraping. Default is 0
    end=20,                               # last index of scraping. Default is -1
    batchsize=10,                         # size of each batch iteration. Default is 1000
    pause=5,                              # pause time in seconds between requests. Default is 2 seconds
    batch_pause=180                       # pause time in seconds between batches. Default is 300 seconds
)

# read in sample and look at dataframe
sample_steam_game_df = pd.read_pickle(sample_steam_filename)
sample_steam_game_df.info()

Starting at index 0

Starting lines 0 to 9 scrapping
Current index: 0
Current index: 1
Current index: 2
Current index: 3
Current index: 4
Current index: 5
Current index: 6
Current index: 7
Current index: 8
Current index: 9
Data exported for lines 0 to 9
Starting lines 10 to 19 scrapping
Current index: 10
Current index: 11
Current index: 12
Current index: 13
Current index: 14
Current index: 15
Current index: 16
Current index: 17
Current index: 18
Current index: 19
Data exported for lines 10 to 19

All batches complete. 20 games extracted
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 34 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   type                  20 non-null     object
 1   name                  20 non-null     object
 2   steam_appid           20 non-null     int64 
 3   required_age          20 non-null     int64 
 4   is_free               20 non-null     bool  
 5   detailed_description  20 non-null     object
 6   about_the_game        20 non-null     object
 7   short_description     20 non-null     object
 8   supported_languages   20 non-null     object
 9   header_image          20 non-null     object
 10  website               9 non-null      object
 11  pc_requirements       20 non-null     object
 12  mac_requirements      20 non-null     object
 13  linux_requirements    20 non-null     obje

In [6]:
print(sample_steam_game_df.shape)
sample_steam_game_df.head()

(20, 34)


Unnamed: 0,type,name,steam_appid,required_age,is_free,detailed_description,about_the_game,short_description,supported_languages,header_image,...,recommendations,release_date,support_info,background,background_raw,content_descriptors,dlc,achievements,demos,movies
0,game,Counter-Strike,10,0,False,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,"English<strong>*</strong>, French<strong>*</strong>, German<strong>*</strong>, Italian<strong>*</strong>, Spanish - Spain<strong>*</strong>, Simplified Chinese<strong>*</strong>, Traditional Chinese<strong>*</strong>, Korean<strong>*</strong><br><strong>*</strong>languages with full audio support",https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1602535893,...,{'total': 119700},"{'coming_soon': False, 'date': '1 Nov, 2000'}","{'url': 'http://steamcommunity.com/app/10', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated_v6b.jpg?t=1602535893,https://cdn.akamai.steamstatic.com/steam/apps/10/page_bg_generated.jpg?t=1602535893,"{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",,,,
1,game,Team Fortress Classic,20,0,False,"One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.","One of the most popular online action games of all time, Team Fortress Classic features over nine character classes -- from Medic to Spy to Demolition Man -- enlisted in a unique style of online team warfare. Each character class possesses unique weapons, items, and abilities, as teams compete online in a variety of game play modes.","English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",https://cdn.akamai.steamstatic.com/steam/apps/20/header.jpg?t=1579634708,...,{'total': 4547},"{'coming_soon': False, 'date': '1 Apr, 1999'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated_v6b.jpg?t=1579634708,https://cdn.akamai.steamstatic.com/steam/apps/20/page_bg_generated.jpg?t=1579634708,"{'ids': [2, 5], 'notes': 'Includes intense violence and blood.'}",,,,
2,game,Day of Defeat,30,0,False,"Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations. And, as war rages, players must work together with their squad to accomplish a variety...","Enlist in an intense brand of Axis vs. Allied teamplay set in the WWII European Theatre of Operations. Players assume the role of light/assault/heavy infantry, sniper or machine-gunner class, each with a unique arsenal of historical weaponry at their disposal. Missions are based on key historical operations.","English, French, German, Italian, Spanish - Spain",https://cdn.akamai.steamstatic.com/steam/apps/30/header.jpg?t=1512413490,...,{'total': 3158},"{'coming_soon': False, 'date': '1 May, 2003'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated_v6b.jpg?t=1512413490,https://cdn.akamai.steamstatic.com/steam/apps/30/page_bg_generated.jpg?t=1512413490,"{'ids': [], 'notes': None}",,,,
3,game,Deathmatch Classic,40,0,False,"Enjoy fast-paced multiplayer gaming with Deathmatch Classic (a.k.a. DMC). Valve's tribute to the work of id software, DMC invites players to grab their rocket launchers and put their reflexes to the test in a collection of futuristic settings.","Enjoy fast-paced multiplayer gaming with Deathmatch Classic (a.k.a. DMC). Valve's tribute to the work of id software, DMC invites players to grab their rocket launchers and put their reflexes to the test in a collection of futuristic settings.","Enjoy fast-paced multiplayer gaming with Deathmatch Classic (a.k.a. DMC). Valve's tribute to the work of id software, DMC invites players to grab their rocket launchers and put their reflexes to the test in a collection of futuristic settings.","English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",https://cdn.akamai.steamstatic.com/steam/apps/40/header.jpg?t=1568752159,...,{'total': 1515},"{'coming_soon': False, 'date': '1 Jun, 2001'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/40/page_bg_generated_v6b.jpg?t=1568752159,https://cdn.akamai.steamstatic.com/steam/apps/40/page_bg_generated.jpg?t=1568752159,"{'ids': [], 'notes': None}",,,,
4,game,Half-Life: Opposing Force,50,0,False,"Return to the Black Mesa Research Facility as one of the military specialists assigned to eliminate Gordon Freeman. Experience an entirely new episode of single player action. Meet fierce alien opponents, and experiment with new weaponry. Named 'Game of the Year' by the Academy of Interactive Arts and Sciences.","Return to the Black Mesa Research Facility as one of the military specialists assigned to eliminate Gordon Freeman. Experience an entirely new episode of single player action. Meet fierce alien opponents, and experiment with new weaponry. Named 'Game of the Year' by the Academy of Interactive Arts and Sciences.","Return to the Black Mesa Research Facility as one of the military specialists assigned to eliminate Gordon Freeman. Experience an entirely new episode of single player action. Meet fierce alien opponents, and experiment with new weaponry. Named 'Game of the Year' by the Academy of Interactive Arts and Sciences.","English, French, German, Korean",https://cdn.akamai.steamstatic.com/steam/apps/50/header.jpg?t=1579628243,...,{'total': 11627},"{'coming_soon': False, 'date': '1 Nov, 1999'}","{'url': 'https://help.steampowered.com', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/50/page_bg_generated_v6b.jpg?t=1579628243,https://cdn.akamai.steamstatic.com/steam/apps/50/page_bg_generated.jpg?t=1579628243,"{'ids': [], 'notes': None}",,,,


## Download Steamspy Game

We will start downloading the game data from SteamSpy for the games identified in our `app_list`.

In [7]:
%%time

# to obtain sample data only,
# run this cell instead of the remaining cells in this section
# the lines identified as code must be uncommented

# pkl file name to be saved
sample_steamspy_filename = '../data/sample_steamspy_game_data.pkl'    # code

# create empty pickle file for function usage using empty dataframe
empty_df = pd.DataFrame()
pkl_output(sample_steamspy_filename, empty_df)   # code

# last run index, default is 0 to start scraping
sample_steamspy_index_value = 0

# download game data from steamspy based on app_list
# below are all code
batch_process(
    fn=steamspy_data_request,                # function used to scrape data
    app_list=app_list,                       # dataframe containing app_id and game_name
    data_filename=sample_steamspy_filename,  # folder/file path and name of file. E.g. '../data/name.pkl'
    begin=sample_steamspy_index_value,       # starting index of scraping. Default is 0
    end=20,                                  # last index of scraping. Default is -1
    batchsize=10,                            # size of each batch iteration. Default is 1000
    pause=2,                                 # pause time in seconds between requests. Default is 2 seconds
    batch_pause=120                          # pause time in seconds between batches. Default is 300 seconds
)

# read in sample and look at dataframe
sample_steamspy_game_df = pd.read_pickle(sample_steamspy_filename)
sample_steamspy_game_df.info()

Starting at index 0

Starting lines 0 to 9 scrapping
Current index: 0
Current index: 1
Current index: 2
Current index: 3
Current index: 4
Current index: 5
Current index: 6
Current index: 7
Current index: 8
Current index: 9
Data exported for lines 0 to 9
Starting lines 10 to 19 scrapping
Current index: 10
Current index: 11
Current index: 12
Current index: 13
Current index: 14
Current index: 15
Current index: 16
Current index: 17
Current index: 18
Current index: 19
Data exported for lines 10 to 19

All batches complete. 20 games extracted
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   appid            20 non-null     int64 
 1   name             20 non-null     object
 2   developer        20 non-null     object
 3   publisher        20 non-null     object
 4   score_rank       20 non-null     object
 5   positive         20 non-null     int64 
 6   negative         20 non-null     int64 
 7   userscore        20 non-null     int64 
 8   owners           20 non-null     object
 9   average_forever  20 non-null     int64 
 10  average_2weeks   20 non-null     int64 
 11  median_forever   20 non-null     int64 
 12  median_2weeks    20 non-null     int64 
 13  price            20 non-null     object
 14  initialprice     20 non-null     object
 15  discount         20 non-nul

In [8]:
print(sample_steamspy_game_df.shape)
sample_steamspy_game_df.head()

(20, 20)


Unnamed: 0,appid,name,developer,publisher,score_rank,positive,negative,userscore,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,ccu,languages,genre,tags
0,10,Counter-Strike,Valve,Valve,,194529,4993,0,"10,000,000 .. 20,000,000",11319,108,216,142,999,999,0,14447,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",Action,"{'Action': 5383, 'FPS': 4807, 'Multiplayer': 3367, 'Shooter': 3332, 'Classic': 2763, 'Team-Based': 1850, 'First-Person': 1697, 'Competitive': 1592, 'Tactical': 1328, '1990's': 1182, 'e-sports': 1178, 'PvP': 868, 'Old School': 753, 'Military': 625, 'Strategy': 607, 'Survival': 297, 'Score Attack': 285, '1980s': 258, 'Assassin': 223, 'Violent': 65}"
1,20,Team Fortress Classic,Valve,Valve,,5487,905,0,"5,000,000 .. 10,000,000",245,0,22,0,499,499,0,108,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",Action,"{'Action': 746, 'FPS': 307, 'Multiplayer': 258, 'Classic': 233, 'Hero Shooter': 213, 'Shooter': 206, 'Team-Based': 189, 'Class-Based': 182, 'First-Person': 169, '1990's': 133, 'Old School': 106, 'Co-op': 89, 'Competitive': 69, 'Fast-Paced': 62, 'Retro': 55, 'Online Co-Op': 51, 'Violent': 45, 'Mod': 36, 'Funny': 35, 'Remake': 35}"
2,30,Day of Defeat,Valve,Valve,,5052,557,0,"5,000,000 .. 10,000,000",786,909,9,909,499,499,0,134,"English, French, German, Italian, Spanish - Spain",Action,"{'FPS': 789, 'World War II': 250, 'Multiplayer': 203, 'Shooter': 188, 'Action': 160, 'War': 151, 'Team-Based': 132, 'Classic': 125, 'First-Person': 105, 'Class-Based': 78, 'Military': 65, 'Historical': 57, 'Tactical': 41, 'Singleplayer': 37, 'Co-op': 34, 'Difficult': 18, 'Old School': 16, 'Retro': 14, 'World War I': 14, 'Strategy': 13}"
3,40,Deathmatch Classic,Valve,Valve,,1876,417,0,"5,000,000 .. 10,000,000",231,0,17,0,499,499,0,6,"English, French, German, Italian, Spanish - Spain, Korean, Russian, Simplified Chinese, Traditional Chinese",Action,"{'Action': 630, 'FPS': 140, 'Classic': 108, 'Multiplayer': 97, 'Shooter': 94, 'First-Person': 70, 'Arena Shooter': 45, 'Old School': 33, 'Sci-fi': 33, 'Competitive': 24, 'Fast-Paced': 16, 'Retro': 14, 'Gore': 14, 'Co-op': 13, 'Difficult': 12, '1990's': 8}"
4,50,Half-Life: Opposing Force,Gearbox Software,Valve,,13562,675,0,"1,000,000 .. 2,000,000",440,37,161,37,499,499,0,91,"English, French, German, Korean",Action,"{'FPS': 883, 'Action': 324, 'Classic': 252, 'Sci-fi': 249, 'Singleplayer': 226, 'Shooter': 221, 'First-Person': 187, 'Aliens': 173, '1990's': 134, 'Adventure': 115, 'Atmospheric': 105, 'Military': 92, 'Story Rich': 74, 'Silent Protagonist': 66, 'Great Soundtrack': 51, 'Gore': 39, 'Puzzle': 35, 'Co-op': 32, 'Moddable': 29, 'Retro': 17}"


By changing the parameters, we will be able to get the complete data for all 51,749 games from both servers.

Once full data is obtained, we will clean and merge both datasets before conducting EDA and building a recommender.