## Extracting Data from SteamSpy and the Steam Store

For the first part of our analysis, I decided to extract the data directly from the Steam Store and SteamSpy using their APIs. Although similar datasets are available from sites like Kaggle, I wanted to challenge myself to gather the data through the APIs. [Nik Davis's blog post](https://nik-davis.github.io/posts/2019/steam-data-collection/) gives a general idea of the process for extracting Steam app data.

## API References

- https://partner.steamgames.com/doc/webapi
- https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI
- https://steamapi.xpaw.me/#
- https://steamspy.com/api.php

## Import Libraries

In [1]:
# standard library imports
import csv
import datetime as dt
import json
import os
import statistics
import time

# third-party imports
import numpy as np
import pandas as pd
import requests

# customisations - ensure tables show all columns
pd.set_option("display.max_columns", 100)

## Get a List of App IDs on SteamSpy

App IDs are identification numbers for each Steam game. We repeat the request for the default first page plus pages 1 to 5 of SteamSpy's results, which returns about 6,000 Steam app IDs in total.

In [2]:
def get_request(url, parameters=None):
    """Return json-formatted response of a get request using optional parameters.
    
    Parameters
    ----------
    url : string
    parameters : {'parameter': 'value'}
        parameters to pass as part of get request
    
    Returns
    -------
    json_data
        json-formatted response (dict-like)
    """
    try:
        response = requests.get(url=url, params=parameters)
    except requests.exceptions.SSLError as s:
        print('SSL Error:', s)
        
        for i in range(5, 0, -1):
            print('\rWaiting... ({})'.format(i), end='')
            time.sleep(1)
        print('\rRetrying.' + ' '*10)
        
        # recursively try again
        return get_request(url, parameters)
    
    if response:
        return response.json()
    else:
        # a falsy response means an error status code, usually from too many requests; wait and try again
        print('No response, waiting 10 seconds...')
        time.sleep(10)
        print('Retrying.')
        return get_request(url, parameters)
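
`get_request` retries by calling itself, with no upper limit on the number of attempts. As a minimal sketch of an alternative, the helper below retries a bounded number of times and doubles the wait after each failure. The name `call_with_retries`, its parameters, and the default values are assumptions for illustration and are not part of the original notebook.

```python
import time


def call_with_retries(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-None result or the retries run out.

    fetch       : zero-argument callable returning parsed data, or None on failure
    max_retries : retries allowed after the first attempt (hypothetical default)
    base_delay  : seconds to wait before the first retry, doubled on each later retry
    sleep       : injectable sleep function, so the logic can be tested without waiting
    """
    for attempt in range(max_retries + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_retries:
            # wait 1x, 2x, 4x, ... the base delay before trying again
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError('no response after {} attempts'.format(max_retries + 1))
```

Here `fetch` could wrap a single `requests.get` call that returns `None` when the response is falsy. Raising after the last attempt makes a persistent outage visible instead of looping forever.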

In [3]:
url = "https://steamspy.com/api.php"
parameters = {"request": 'all'}

# request 'all' from steam spy and parse into dataframe
json_data = get_request(url, parameters=parameters)
steam_spy_all = pd.DataFrame.from_dict(json_data, orient='index')

# generate sorted app_list from steamspy data
app_list = steam_spy_all[['appid', 'name']].sort_values('appid').reset_index(drop=True)

# display first few rows
app_list

Unnamed: 0,appid,name
0,10,Counter-Strike
1,20,Team Fortress Classic
2,30,Day of Defeat
3,40,Deathmatch Classic
4,50,Half-Life: Opposing Force
...,...,...
995,1593500,God of War
996,1599340,Lost Ark
997,1794680,Vampire Survivors
998,1832640,Mirror 2: Project X


In [4]:
url = "https://steamspy.com/api.php?request=all&page=1"

# request 'all' from steam spy and parse into dataframe
json_data1 = get_request(url, parameters=parameters)
steam_spy_all1 = pd.DataFrame.from_dict(json_data1, orient='index')

# generate sorted app_list from steamspy data
app_list1 = steam_spy_all1[['appid', 'name']].sort_values('appid').reset_index(drop=True)


# display first few rows
app_list1

Unnamed: 0,appid,name
0,1200,Red Orchestra: Ostfront 41-45
1,1500,Darwinia
2,1510,Uplink
3,1520,DEFCON
4,1530,Multiwinia
...,...,...
995,1569040,Football Manager 2022
996,1621690,Core Keeper
997,1668800,Maze Mania: The Ultimate 3D Maze Game
998,1721470,Poppy Playtime


In [5]:
url = "https://steamspy.com/api.php?request=all&page=2"

# request 'all' from steam spy and parse into dataframe
json_data2 = get_request(url, parameters=parameters)
steam_spy_all2 = pd.DataFrame.from_dict(json_data2, orient='index')

# generate sorted app_list from steamspy data
app_list2 = steam_spy_all2[['appid', 'name']].sort_values('appid').reset_index(drop=True)

# display first few rows
app_list2

Unnamed: 0,appid,name
0,2210,Quake 4
1,2270,Wolfenstein 3D
2,2300,DOOM II
3,2330,QUAKE II Mission Pack: The Reckoning
4,2420,The Ship: Single Player
...,...,...
995,1672970,Minecraft Dungeons
996,1674470,ELYON
997,1677740,Stumble Guys
998,1798880,Corridors


In [6]:
url = "https://steamspy.com/api.php?request=all&page=3"

# request 'all' from steam spy and parse into dataframe
json_data3 = get_request(url, parameters=parameters)
steam_spy_all3 = pd.DataFrame.from_dict(json_data3, orient='index')

# generate sorted app_list from steamspy data
app_list3 = steam_spy_all3[['appid', 'name']].sort_values('appid').reset_index(drop=True)

# display first few rows
app_list3

Unnamed: 0,appid,name
0,1309,SiN Episodes: Emergence
1,1700,Arx Fatalis
2,2360,HeXen: Beyond Heretic
3,2370,HeXen: Deathkings of the Dark Citadel
4,4780,Medieval II: Total War Kingdoms
...,...,...
995,1802330,INVITATION To FEAR
996,1821060,Find The Sunbed
997,1827870,DarkHouse
998,1835350,道不可道


In [7]:
url = "https://steamspy.com/api.php?request=all&page=4"

# request 'all' from steam spy and parse into dataframe
json_data4 = get_request(url, parameters=parameters)
steam_spy_all4 = pd.DataFrame.from_dict(json_data4, orient='index')

# generate sorted app_list from steamspy data
app_list4 = steam_spy_all4[['appid', 'name']].sort_values('appid').reset_index(drop=True)

# display first few rows
app_list4

Unnamed: 0,appid,name
0,1300,SiN Episodes: Emergence
1,2340,QUAKE II Mission Pack: Ground Zero
2,2450,Bloody Good Time
3,2610,GUN
4,2640,Call of Duty: United Offensive
...,...,...
995,1713170,Air Hunter
996,1724290,Outergalactic Aliens Pinball
997,1771340,Miasma: Citizens of Free Thought
998,1841630,Armor Clash 1 Remake [RTS]


In [8]:
url = "https://steamspy.com/api.php?request=all&page=5"

# request 'all' from steam spy and parse into dataframe
json_data5 = get_request(url, parameters=parameters)
steam_spy_all5 = pd.DataFrame.from_dict(json_data5, orient='index')

# generate sorted app_list from steamspy data
app_list5 = steam_spy_all5[['appid', 'name']].sort_values('appid').reset_index(drop=True)

# display first few rows
app_list5

Unnamed: 0,appid,name
0,2390,Heretic: Shadow of the Serpent Riders
1,2450,Bloody Good Time
2,2540,RIP - Trilogy
3,2710,Act of War: Direct Action
4,2920,Sub Command
...,...,...
995,1810240,Do Something
996,1842410,Deadly Racing Duel
997,1873960,Dungeon Crawler
998,1889620,AI Roguelite


In [9]:
# combine the app lists from the requests above into one dataframe
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
all_app_list = pd.concat([app_list, app_list1, app_list2, app_list3, app_list4, app_list5])

In [10]:
all_app_list

Unnamed: 0,appid,name
0,10,Counter-Strike
1,20,Team Fortress Classic
2,30,Day of Defeat
3,40,Deathmatch Classic
4,50,Half-Life: Opposing Force
...,...,...
995,1810240,Do Something
996,1842410,Deadly Racing Duel
997,1873960,Dungeon Crawler
998,1889620,AI Roguelite
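
The six request cells above differ only in the page number, so they could be collapsed into one loop. The sketch below is one way to do that. It also drops repeated app IDs, since the outputs above show the same ID on more than one page (2450 appears on both page 4 and page 5). `collect_app_list` and `fetch_page` are hypothetical names, and `fetch_page` stands in for a call to `get_request`, so the logic can be shown without network access.

```python
def collect_app_list(fetch_page, pages):
    """Merge several SteamSpy 'all' pages into one sorted, de-duplicated list.

    fetch_page : callable taking a page number and returning a dict keyed by appid
                 (hypothetical stand-in for a get_request call)
    pages      : iterable of page numbers to request
    """
    seen = {}
    for page in pages:
        for app in fetch_page(page).values():
            # the same appid can appear on more than one page; keep the first copy
            seen.setdefault(app['appid'], app['name'])
    return [{'appid': appid, 'name': name} for appid, name in sorted(seen.items())]


# usage with the notebook's get_request (needs network access, so left commented out):
# records = collect_app_list(
#     lambda page: get_request("https://steamspy.com/api.php",
#                              {"request": "all", "page": page}),
#     range(6))
# all_app_list = pd.DataFrame(records)
```

Building the dataframe from the merged records also gives it a fresh 0-based index, which the page-by-page concatenation does not.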


In [11]:
# Store into a csv file for gathering other information later
def get_app_data(start, stop, parser, pause):
    """Return list of app data generated from parser.
    
    parser : function to handle request
    """
    app_data = []
    
    # iterate through each row of all_app_list, confined by start and stop
    for index, row in all_app_list[start:stop].iterrows():
        print('Current index: {}'.format(index), end='\r')
        
        appid = row['appid']
        name = row['name']

        # retrieve app data for a row, handled by supplied parser, and append to list
        data = parser(appid, name)
        app_data.append(data)

        time.sleep(pause) # prevent overloading api with requests
    
    return app_data


def process_batches(parser, all_app_list, download_path, data_filename, index_filename,
                    columns, begin=0, end=-1, batchsize=100, pause=1):
    """Process app data in batches, writing directly to file.
    
    parser : custom function to format request
    all_app_list : dataframe of appid and name
    download_path : path to store data
    data_filename : filename to save app data
    index_filename : filename to store highest index written
    columns : column names for file
    
    Keyword arguments:
    
    begin : starting index (get from index_filename, default 0)
    end : index to finish (defaults to end of all_app_list)
    batchsize : number of apps to write in each batch (default 100)
    pause : time to wait after each api request (default 1)
    
    returns: none
    """
    print('Starting at index {}:\n'.format(begin))
    
    # by default, process all apps in app_list
    if end == -1:
        end = len(all_app_list) + 1
    
    # generate array of batch begin and end points
    batches = np.arange(begin, end, batchsize)
    batches = np.append(batches, end)
    
    apps_written = 0
    batch_times = []
    
    for i in range(len(batches) - 1):
        start_time = time.time()
        
        start = batches[i]
        stop = batches[i+1]
        
        app_data = get_app_data(start, stop, parser, pause)
        
        rel_path = os.path.join(download_path, data_filename)
        
        # writing app data to file
        with open(rel_path, 'a', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=columns, extrasaction='ignore')
            
            for j in range(3,0,-1):
                print("\rAbout to write data, don't stop script! ({})".format(j), end='')
                time.sleep(0.5)
            
            writer.writerows(app_data)
            print('\rExported lines {}-{} to {}.'.format(start, stop-1, data_filename), end=' ')
            
        apps_written += len(app_data)
        
        idx_path = os.path.join(download_path, index_filename)
        
        # writing last index to file
        with open(idx_path, 'w') as f:
            index = stop
            print(index, file=f)
            
        # logging time taken
        end_time = time.time()
        time_taken = end_time - start_time
        
        batch_times.append(time_taken)
        mean_time = statistics.mean(batch_times)
        
        est_remaining = (len(batches) - i - 2) * mean_time
        
        remaining_td = dt.timedelta(seconds=round(est_remaining))
        time_td = dt.timedelta(seconds=round(time_taken))
        mean_td = dt.timedelta(seconds=round(mean_time))
        
        print('Batch {} time: {} (avg: {}, remaining: {})'.format(i, time_td, mean_td, remaining_td))
            
    print('\nProcessing batches complete. {} apps written'.format(apps_written))
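
To make the batching in `process_batches` easier to follow, here is a small plain-Python sketch of how the batch boundaries are derived. It mirrors the `np.arange` plus `np.append` step, using `range` instead of NumPy. The helper name `batch_bounds` is hypothetical and not part of the notebook.

```python
def batch_bounds(begin, end, batchsize):
    """Return (start, stop) index pairs covering begin..end in steps of batchsize."""
    starts = list(range(begin, end, batchsize))
    edges = starts + [end]  # the final edge closes the last, possibly shorter, batch
    return list(zip(edges[:-1], edges[1:]))


# 250 apps in batches of 100: two full batches and one partial batch
print(batch_bounds(0, 250, 100))
```

Because the index file stores the `stop` value of the last batch written, passing that value back in as `begin` resumes at the first unwritten row.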

In [12]:
def reset_index(download_path, index_filename):
    """Reset index in file to 0."""
    rel_path = os.path.join(download_path, index_filename)
    
    with open(rel_path, 'w') as f:
        print(0, file=f)
        

def get_index(download_path, index_filename):
    """Retrieve index from file, returning 0 if file not found."""
    try:
        rel_path = os.path.join(download_path, index_filename)

        with open(rel_path, 'r') as f:
            index = int(f.readline())
    
    except FileNotFoundError:
        index = 0
        
    return index


def prepare_data_file(download_path, filename, index, columns):
    """Create file and write headers if index is 0."""
    if index == 0:
        rel_path = os.path.join(download_path, filename)

        with open(rel_path, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=columns)
            writer.writeheader()

## Retrieve Steam Data

The Steam Store API (store.steampowered.com) provides the descriptive features of each game, such as the detailed description, system requirements, and reviews.

In [13]:
def parse_steam_request(appid, name):
    """Unique parser to handle data from Steam Store API.
    
    Returns : json formatted data (dict-like)
    """
    url = "http://store.steampowered.com/api/appdetails/"
    parameters = {"appids": appid}
    
    json_data = get_request(url, parameters=parameters)
    json_app_data = json_data[str(appid)]
    
    if json_app_data['success']:
        data = json_app_data['data']
    else:
        data = {'name': name, 'steam_appid': appid}
        
    return data


# Set file parameters
download_path = '../data/download/test'
steam_app_data = 'steam_app_data_full.csv'
steam_index = 'steam_index.txt'

steam_columns = [
    'type', 'name', 'steam_appid', 'required_age', 'is_free', 'controller_support',
    'dlc', 'detailed_description', 'about_the_game', 'short_description', 'fullgame',
    'supported_languages', 'header_image', 'website', 'pc_requirements', 'mac_requirements',
    'linux_requirements', 'legal_notice', 'drm_notice', 'ext_user_account_notice',
    'developers', 'publishers', 'demos', 'price_overview', 'packages', 'package_groups',
    'platforms', 'metacritic', 'reviews', 'categories', 'genres', 'screenshots',
    'movies', 'recommendations', 'achievements', 'release_date', 'support_info',
    'background', 'content_descriptors'
]

# Overwrites last index for demonstration (would usually store highest index so can continue across sessions)
reset_index(download_path, steam_index)

# Retrieve last index downloaded from file
index = get_index(download_path, steam_index)

# Wipe or create data file and write headers if index is 0
prepare_data_file(download_path, steam_app_data, index, steam_columns)

# Process the full app list; pass end= and batchsize= to limit a demonstration run
process_batches(
    parser=parse_steam_request,
    all_app_list=all_app_list,
    download_path=download_path,
    data_filename=steam_app_data,
    index_filename=steam_index,
    columns=steam_columns,
    begin=index)

Starting at index 0:

Exported lines 0-99 to steam_app_data_full.csv. Batch 0 time: 0:02:14 (avg: 0:02:14, remaining: 2:14:05)
Exported lines 100-199 to steam_app_data_full.csv. Batch 1 time: 0:02:09 (avg: 0:02:12, remaining: 2:09:28)
Exported lines 200-299 to steam_app_data_full.csv. Batch 2 time: 0:02:09 (avg: 0:02:11, remaining: 2:06:27)
Exported lines 300-399 to steam_app_data_full.csv. Batch 3 time: 0:02:11 (avg: 0:02:11, remaining: 2:04:12)
Exported lines 400-499 to steam_app_data_full.csv. Batch 4 time: 0:02:12 (avg: 0:02:11, remaining: 2:02:18)
Exported lines 500-599 to steam_app_data_full.csv. Batch 5 time: 0:02:13 (avg: 0:02:11, remaining: 2:00:26)
Exported lines 600-699 to steam_app_data_full.csv. Batch 6 time: 0:02:13 (avg: 0:02:12, remaining: 1:58:24)
Exported lines 700-799 to steam_app_data_full.csv. Batch 7 time: 0:02:14 (avg: 0:02:12, remaining: 1:56:27)
Exported lines 800-899 to steam_app_data_full.csv. Batch 8 time: 0:02:12 (avg: 0:02:12, remaining: 1:54:13)
No respon

Retrying.
Exported lines 5400-5499 to steam_app_data_full.csv. Batch 54 time: 0:02:36 (avg: 0:02:24, remaining: 0:14:25)
Exported lines 5500-5599 to steam_app_data_full.csv. Batch 55 time: 0:02:19 (avg: 0:02:24, remaining: 0:12:00)
No response, waiting 10 seconds...
Retrying.
No response, waiting 10 seconds...
Retrying.
No response, waiting 10 seconds...
Retrying.
Exported lines 5600-5699 to steam_app_data_full.csv. Batch 56 time: 0:02:45 (avg: 0:02:24, remaining: 0:09:37)
Exported lines 5700-5799 to steam_app_data_full.csv. Batch 57 time: 0:02:13 (avg: 0:02:24, remaining: 0:07:13)
Exported lines 5800-5899 to steam_app_data_full.csv. Batch 58 time: 0:02:18 (avg: 0:02:24, remaining: 0:04:48)
No response, waiting 10 seconds...
Retrying.
No response, waiting 10 seconds...
Retrying.
Exported lines 5900-5999 to steam_app_data_full.csv. Batch 59 time: 0:02:33 (avg: 0:02:24, remaining: 0:02:24)
Exported lines 6000-6000 to steam_app_data_full.csv. Batch 60 time: 0:00:02 (avg: 0:02:22, remainin

In [15]:
# inspect downloaded data
steam_app_data = pd.read_csv('../data/download/test/steam_app_data_full.csv')
steam_app_data

Unnamed: 0,type,name,steam_appid,required_age,is_free,controller_support,dlc,detailed_description,about_the_game,short_description,fullgame,supported_languages,header_image,website,pc_requirements,mac_requirements,linux_requirements,legal_notice,drm_notice,ext_user_account_notice,developers,publishers,demos,price_overview,packages,package_groups,platforms,metacritic,reviews,categories,genres,screenshots,movies,recommendations,achievements,release_date,support_info,background,content_descriptors
0,game,Counter-Strike,10,0,False,,,Disfruta del juego de acción en línea n° 1 en ...,Disfruta del juego de acción en línea n° 1 en ...,Disfruta del juego de acción en línea n° 1 en ...,,"Inglés<strong>*</strong>, Francés<strong>*</st...",https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '<p><strong>Mínimo:</strong> proce...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...",,,,['Valve'],['Valve'],,"{'currency': 'EUR', 'initial': 819, 'final': 8...","[574941, 7]","[{'name': 'default', 'title': 'Comprar Counter...","{'windows': True, 'mac': True, 'linux': True}","{'score': 88, 'url': 'https://www.metacritic.c...",,"[{'id': 1, 'description': 'Multijugador'}, {'i...","[{'id': '1', 'description': 'Acción'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 119660},,"{'coming_soon': False, 'date': '1 NOV 2000'}","{'url': 'http://steamcommunity.com/app/10', 'e...",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [2, 5], 'notes': 'Includes intense vio..."
1,game,Team Fortress Classic,20,0,False,,,One of the most popular online action games of...,One of the most popular online action games of...,One of the most popular online action games of...,,"English, French, German, Italian, Spanish - Sp...",https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...",,,,['Valve'],['Valve'],,"{'currency': 'USD', 'initial': 499, 'final': 4...",[29],"[{'name': 'default', 'title': 'Buy Team Fortre...","{'windows': True, 'mac': True, 'linux': True}",,,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 4544},,"{'coming_soon': False, 'date': 'Apr 1, 1999'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [2, 5], 'notes': 'Includes intense vio..."
2,game,Day of Defeat,30,0,False,,,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,,"English, French, German, Italian, Spanish - Spain",https://cdn.akamai.steamstatic.com/steam/apps/...,http://www.dayofdefeat.com/,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...",,,,['Valve'],['Valve'],,"{'currency': 'EUR', 'initial': 399, 'final': 3...",[30],"[{'name': 'default', 'title': 'Buy Day of Defe...","{'windows': True, 'mac': True, 'linux': True}","{'score': 79, 'url': 'https://www.metacritic.c...",,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 3159},,"{'coming_soon': False, 'date': '1 May, 2003'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"
3,game,Deathmatch Classic,40,0,False,,,Enjoy fast-paced multiplayer gaming with Death...,Enjoy fast-paced multiplayer gaming with Death...,Enjoy fast-paced multiplayer gaming with Death...,,"English, French, German, Italian, Spanish - Sp...",https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...",,,,['Valve'],['Valve'],,"{'currency': 'USD', 'initial': 499, 'final': 4...",[31],"[{'name': 'default', 'title': 'Buy Deathmatch ...","{'windows': True, 'mac': True, 'linux': True}",,,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 1517},,"{'coming_soon': False, 'date': 'Jun 1, 2001'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"
4,game,Half-Life: Opposing Force,50,0,False,,,Return to the Black Mesa Research Facility as ...,Return to the Black Mesa Research Facility as ...,Return to the Black Mesa Research Facility as ...,,"English, French, German, Korean",https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...",,,,['Gearbox Software'],['Valve'],,"{'currency': 'EUR', 'initial': 399, 'final': 3...",[32],"[{'name': 'default', 'title': 'Buy Half-Life: ...","{'windows': True, 'mac': True, 'linux': True}",,,"[{'id': 2, 'description': 'Single-player'}, {'...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 11611},,"{'coming_soon': False, 'date': '1 Nov, 1999'}","{'url': 'https://help.steampowered.com', 'emai...",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5995,game,Do Something,1810240,0,True,full,,You buy a game that you can pass to the end. F...,You buy a game that you can pass to the end. F...,"After a few hours of epidemic, you go straight...",,"English<strong>*</strong>, Simplified Chinese,...",https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '<strong>Minimum:</strong><br><ul ...,{'minimum': '<strong>Minimum:</strong><br><ul ...,{'minimum': '<strong>Minimum:</strong><br><ul ...,,,,['StrelitziaGames'],[' StrelitziaGames'],,,,[],"{'windows': True, 'mac': False, 'linux': False}",,,"[{'id': 2, 'description': 'Single-player'}, {'...","[{'id': '1', 'description': 'Action'}, {'id': ...","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...","[{'id': 256859678, 'name': 'early alpha', 'thu...",,,"{'coming_soon': False, 'date': 'Dec 15, 2021'}","{'url': '', 'email': 'strelitziareg@gmail.com'}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [2, 5], 'notes': 'This game contains b..."
5996,game,Deadly Racing Duel,1842410,0,False,,,Test your driving skills in this unusual racin...,Test your driving skills in this unusual racin...,Deadly Racing Duel - in the role of a racer wh...,,English<strong>*</strong><br><strong>*</strong...,https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '<strong>Minimum:</strong><br><ul ...,[],[],,,,['DDRACE'],['kovalevviktor'],,"{'currency': 'USD', 'initial': 1099, 'final': ...",[662854],"[{'name': 'default', 'title': 'Buy Deadly Raci...","{'windows': True, 'mac': False, 'linux': False}",,,"[{'id': 2, 'description': 'Single-player'}]","[{'id': '23', 'description': 'Indie'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...","[{'id': 256863765, 'name': 'Deadly Racing Duel...",,,"{'coming_soon': False, 'date': 'Jan 31, 2022'}","{'url': '', 'email': 'kovalevviktorst@gmail.com'}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"
5997,game,Dungeon Crawler,1873960,0,False,,,"<img src=""https://cdn.akamai.steamstatic.com/s...","<img src=""https://cdn.akamai.steamstatic.com/s...","Dungeon Crawler is a round-based, third-person...",,English,https://cdn.akamai.steamstatic.com/steam/apps/...,https://discord.gg/v57Ap2Yf93,{'minimum': '<strong>Minimum:</strong><br><ul ...,{'minimum': '<strong>Minimum:</strong><br><ul ...,{'minimum': '<strong>Minimum:</strong><br><ul ...,Copyright: Yanfei Schönberner,,,['Jinxi'],['Jinxi'],,"{'currency': 'USD', 'initial': 599, 'final': 5...",[674613],"[{'name': 'default', 'title': 'Buy Dungeon Cra...","{'windows': True, 'mac': False, 'linux': False}",,,"[{'id': 2, 'description': 'Single-player'}]","[{'id': '4', 'description': 'Casual'}, {'id': ...","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...","[{'id': 256872518, 'name': 'Dungeon Crawler Tr...",,"{'total': 87, 'highlighted': [{'name': 'Zombie...","{'coming_soon': False, 'date': 'Feb 25, 2022'}","{'url': 'https://discord.gg/v57Ap2Yf93', 'emai...",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [5], 'notes': 'Fantasy Violence, Mild ..."
5998,game,AI Roguelite,1889620,0,False,,,<strong>ATTENTION: This game requires an NVIDI...,<strong>ATTENTION: This game requires an NVIDI...,"Infinite text-based RPG, powered by cutting-ed...",,English,https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '<strong>Minimum:</strong><br><ul ...,[],[],,,,['Max Loh'],['Max Loh'],,"{'currency': 'USD', 'initial': 499, 'final': 4...",[680517],"[{'name': 'default', 'title': 'Buy AI Roguelit...","{'windows': True, 'mac': False, 'linux': False}",,,"[{'id': 2, 'description': 'Single-player'}]","[{'id': '25', 'description': 'Adventure'}, {'i...","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...","[{'id': 256874827, 'name': 'traler_v3_try2', '...",,,"{'coming_soon': False, 'date': 'Mar 2, 2022'}","{'url': 'www.maxloh.com', 'email': 'max@maxloh...",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"
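
Several columns above, such as `platforms`, `genres`, and `price_overview`, hold dictionaries and lists that were written to the CSV as text. A minimal sketch of turning those cells back into Python objects with `ast.literal_eval` follows. The helper name `parse_literal` is an assumption for illustration, and the sample strings are copied from the cells shown in the table.

```python
import ast


def parse_literal(cell):
    """Turn a stringified dict or list from the CSV back into a Python object.

    Returns None for empty or non-string cells (pandas reads blank cells as NaN)
    and for text that is not a valid Python literal.
    """
    if not isinstance(cell, str) or not cell.strip():
        return None
    try:
        return ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return None


platforms = parse_literal("{'windows': True, 'mac': True, 'linux': True}")
genres = parse_literal("[{'id': '1', 'description': 'Action'}]")
print(platforms['linux'], genres[0]['description'])
```

Applied to a whole column, this would look like `steam_app_data['platforms'].apply(parse_literal)`. `ast.literal_eval` only evaluates literals, which makes it a safer choice than `eval` for data that came from an external API.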


## Retrieve SteamSpy Data

SteamSpy provides statistical data for each game, such as price, average playtime, and concurrent players, as of the time of collection (March 2022).

In [16]:
def parse_steamspy_request(appid, name):
    """Parser to handle SteamSpy API data."""
    url = "https://steamspy.com/api.php"
    parameters = {"request": "appdetails", "appid": appid}
    
    json_data = get_request(url, parameters)
    return json_data


# set files and columns
download_path = '../data/download/test'
steamspy_data = 'steamspy_data_full.csv'
steamspy_index = 'steamspy_index.txt'

steamspy_columns = [
    'appid', 'name', 'developer', 'publisher', 'score_rank', 'positive',
    'negative', 'userscore', 'owners', 'average_forever', 'average_2weeks',
    'median_forever', 'median_2weeks', 'price', 'initialprice', 'discount',
    'languages', 'genre', 'ccu', 'tags'
]

reset_index(download_path, steamspy_index)
index = get_index(download_path, steamspy_index)

# Wipe data file if index is 0
prepare_data_file(download_path, steamspy_data, index, steamspy_columns)

process_batches(
    parser=parse_steamspy_request,
    all_app_list=all_app_list,
    download_path=download_path, 
    data_filename=steamspy_data,
    index_filename=steamspy_index,
    columns=steamspy_columns,
    begin=index,
    pause=0.5
)

Starting at index 0:

Exported lines 0-99 to steamspy_data_full.csv. Batch 0 time: 0:01:27 (avg: 0:01:27, remaining: 1:26:45)
Exported lines 100-199 to steamspy_data_full.csv. Batch 1 time: 0:01:26 (avg: 0:01:27, remaining: 1:25:06)
Exported lines 200-299 to steamspy_data_full.csv. Batch 2 time: 0:01:26 (avg: 0:01:26, remaining: 1:23:30)
Exported lines 300-399 to steamspy_data_full.csv. Batch 3 time: 0:01:26 (avg: 0:01:26, remaining: 1:21:57)
Exported lines 400-499 to steamspy_data_full.csv. Batch 4 time: 0:01:26 (avg: 0:01:26, remaining: 1:20:28)
Exported lines 500-599 to steamspy_data_full.csv. Batch 5 time: 0:01:26 (avg: 0:01:26, remaining: 1:18:59)
Exported lines 600-699 to steamspy_data_full.csv. Batch 6 time: 0:01:26 (avg: 0:01:26, remaining: 1:17:28)
Exported lines 700-799 to steamspy_data_full.csv. Batch 7 time: 0:01:25 (avg: 0:01:26, remaining: 1:15:58)
Exported lines 800-899 to steamspy_data_full.csv. Batch 8 time: 0:01:25 (avg: 0:01:26, remaining: 1:14:28)
Exported lines 900

In [17]:
steamspy_data = pd.read_csv('../data/download/test/steamspy_data_full.csv')
steamspy_data

Unnamed: 0,appid,name,developer,publisher,score_rank,positive,negative,userscore,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,languages,genre,ccu,tags
0,10,Counter-Strike,Valve,Valve,,194508,4995,0,"10,000,000 .. 20,000,000",12298,3385,202,6680,999,999,0,"English, French, German, Italian, Spanish - Sp...",Action,14724,"{'Action': 5383, 'FPS': 4807, 'Multiplayer': 3..."
1,20,Team Fortress Classic,Valve,Valve,,5485,905,0,"5,000,000 .. 10,000,000",624,0,23,0,499,499,0,"English, French, German, Italian, Spanish - Sp...",Action,95,"{'Action': 746, 'FPS': 307, 'Multiplayer': 258..."
2,30,Day of Defeat,Valve,Valve,,5052,557,0,"5,000,000 .. 10,000,000",735,909,10,909,499,499,0,"English, French, German, Italian, Spanish - Spain",Action,134,"{'FPS': 789, 'World War II': 250, 'Multiplayer..."
3,40,Deathmatch Classic,Valve,Valve,,1876,417,0,"5,000,000 .. 10,000,000",1362,0,19,0,499,499,0,"English, French, German, Italian, Spanish - Sp...",Action,7,"{'Action': 630, 'FPS': 140, 'Classic': 108, 'M..."
4,50,Half-Life: Opposing Force,Gearbox Software,Valve,,13557,675,0,"5,000,000 .. 10,000,000",651,37,130,37,499,499,0,"English, French, German, Korean",Action,116,"{'FPS': 883, 'Action': 324, 'Classic': 252, 'S..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5995,1810240,Do Something,StrelitziaGames,StrelitziaGames,,3,7,0,"100,000 .. 200,000",0,0,0,0,0,0,0,"English, Simplified Chinese, Russian, Japanese...","Action, Adventure",1,"{'Zombies': 253, 'Action': 252, 'Survival Horr..."
5996,1842410,Deadly Racing Duel,DDRACE,kovalevviktor,,9,1,0,"100,000 .. 200,000",0,0,0,0,1099,1099,0,English,Indie,0,"{'Difficult': 248, 'Rogue-like': 243, 'Pixel G..."
5997,1873960,Dungeon Crawler,Jinxi,Jinxi,,5,3,0,"100,000 .. 200,000",0,0,0,0,599,599,0,English,"Casual, Indie, RPG, Strategy",0,"{'Turn-Based Combat': 443, 'Strategy': 439, 'D..."
5998,1889620,AI Roguelite,Max Loh,Max Loh,,4,2,0,"100,000 .. 200,000",0,0,0,0,499,499,0,English,"Adventure, Indie, RPG, Early Access",0,"{'Early Access': 448, 'RPG': 407, 'Text-Based'..."
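
The `owners` column above is stored as a text range such as `10,000,000 .. 20,000,000`, which cannot be used directly in numeric comparisons. As a sketch of one way to handle it, the hypothetical helper `parse_owners` below splits the range into integer lower and upper bounds.

```python
def parse_owners(owners):
    """Split an owners range like '10,000,000 .. 20,000,000' into two ints."""
    low, high = owners.split('..')
    return int(low.replace(',', '').strip()), int(high.replace(',', '').strip())


low, high = parse_owners('10,000,000 .. 20,000,000')
print(low, high)
```

Whether to analyse the lower bound, the upper bound, or a midpoint is a modelling choice. The midpoint is only a rough stand-in, because SteamSpy reports a range and not a single count.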


## Next Steps

We were able to gather data for roughly 6,000 games. Note that the collection took about 5 hours to run, since requests have to be spaced out to avoid being timed out for requesting too much information in a short amount of time. Next, we will clean the data to understand our features and decide which metrics to use in the analysis.