# Steam Data

In this notebook I'm adding all the methods required to download game data.

## appids, name and last_modified
I want to start by downloading the basic information that Steam provides about each game through its API.
We do this by calling the IStoreService/GetAppList API endpoint.
Each call returns at most 50,000 appids, so we need to iterate through multiple pages, passing the `last_appid` of each response to the next request, until all appids are downloaded.

Whenever we want to update our data, we can resume from the last appid already stored in our DataFrame.
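
The paging and resume logic described above can be sketched independently of the network call. This is a minimal sketch rather than the notebook's actual implementation: `iter_new_apps` and `fetch_page` are hypothetical names, and `fake_fetch` stands in for the real request so the example runs offline. It assumes, as the download loop in this notebook does, that a response carries `last_appid` only while more pages remain.

```python
def iter_new_apps(fetch_page, last_appid=0, page_size=50000):
    """Yield app records page by page, starting after `last_appid`.

    `fetch_page(last_appid, page_size)` is a hypothetical callable that
    returns the parsed 'response' dict of one GetAppList call.
    """
    while last_appid is not None:
        response = fetch_page(last_appid, page_size)
        yield from response.get("apps", [])
        # Assumption: 'last_appid' is present only while more pages remain.
        last_appid = response.get("last_appid")


# Offline stand-in for the API: three made-up apps served two per page.
FAKE_APPS = [{"appid": 10}, {"appid": 20}, {"appid": 30}]

def fake_fetch(last_appid, page_size):
    remaining = [app for app in FAKE_APPS if app["appid"] > last_appid]
    page = remaining[:page_size]
    response = {"apps": page}
    if len(remaining) > page_size:
        response["last_appid"] = page[-1]["appid"]
    return response


all_ids = [app["appid"] for app in iter_new_apps(fake_fetch, page_size=2)]
# Resuming from the highest appid already stored fetches only newer apps.
new_ids = [app["appid"] for app in iter_new_apps(fake_fetch, last_appid=20, page_size=2)]
```

With the real endpoint, `fetch_page` would wrap the request and return `result.json()['response']`, and the starting `last_appid` could come from `apps['appid'].max()`.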

In [2]:
!pip install requests
!pip install pandas
!pip install beautifulsoup4



In [3]:
import requests
import json
import pandas as pd
import numpy as np

In [4]:
import os  # keys are read from the environment rather than hardcoded in the notebook
STEAMAPIS_KEY = os.environ["STEAMAPIS_KEY"]  # Key for https://steamapis.com/
STEAM_API_KEY = os.environ["STEAM_API_KEY"]  # Key for https://api.steampowered.com/

In [14]:
def from_last_appid(last_appid, count):
    """Request the page of up to `count` apps that follows `last_appid`."""
    url = "https://api.steampowered.com/IStoreService/GetAppList/v1/"
    params = {"key": STEAM_API_KEY, "max_results": count, "last_appid": last_appid}
    return requests.get(url, params=params)

response = {"last_appid": 0}
apps = []

# Keep paging while the response carries a 'last_appid' to continue from
while 'last_appid' in response:
    result = from_last_appid(response['last_appid'], 50000)
    result.raise_for_status()  # stop on an HTTP error instead of retrying the same page forever
    response = result.json()['response']
    apps += response.get('apps', [])

apps = pd.DataFrame(apps)
apps.head()

   appid                       name  last_modified  price_change_number
0     10             Counter-Strike     1666823513             16899276
1     20      Team Fortress Classic     1579634708             16899276
2     30              Day of Defeat     1512413490             16899276
3     40         Deathmatch Classic     1568752159             16899276
4     50  Half-Life: Opposing Force     1579628243             16899276


In [16]:
apps.to_pickle("steam_data.pkl")

# Store Images

Luckily, the image URLs depend only on the appid, so we can add them to the DataFrame easily.

In [None]:
# apps["store_image_header"] = apps[apps[
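
Building a URL from an appid could look like the sketch below. The URL pattern is an assumption that does not come from this notebook, and `header_image_url` is a hypothetical helper, so the pattern should be checked against the `src` of a real store page before it is used for every app.

```python
# Assumption: this URL pattern is not taken from the notebook; verify it
# against the header image of a real store page before relying on it.
HEADER_IMAGE_TEMPLATE = "https://cdn.akamai.steamstatic.com/steam/apps/{appid}/header.jpg"

def header_image_url(appid):
    """Build the (assumed) header image URL for one appid."""
    return HEADER_IMAGE_TEMPLATE.format(appid=int(appid))

url = header_image_url(10)
```

Applied to the DataFrame, this would be something like `apps["store_image_header"] = apps["appid"].map(header_image_url)`.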

# Store Page Scraping

Additional data can be scraped from the store page, so let's do that now.

In [12]:
from bs4 import BeautifulSoup

In [97]:
def get_store_page(appid):
    """Download the HTML of the Steam store page for one app."""
    return requests.get(f"https://store.steampowered.com/app/{appid}").text

html_doc = get_store_page(10)

In [98]:
doc = BeautifulSoup(html_doc, "html.parser")

In [62]:
description = doc.find("div", class_="game_description_snippet").text.strip()

In [73]:
header_image = doc.find("img", class_="game_header_image_full")['src']

In [102]:
# Join the names of all linked developers into one comma-separated string
dev_links = doc.find("div", id="developers_list").find_all("a")
developers = ", ".join(dev.text for dev in dev_links)
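
The lookups above raise an error when an element is missing, because `find` returns `None` in that case. A minimal sketch of a more tolerant version follows; `parse_store_page` is a hypothetical helper and `SAMPLE_HTML` is made-up markup that mimics only the three elements used in this notebook, not a real store page.

```python
from bs4 import BeautifulSoup

def parse_store_page(html_doc):
    """Extract description, header image and developers, tolerating missing elements."""
    doc = BeautifulSoup(html_doc, "html.parser")

    snippet = doc.find("div", class_="game_description_snippet")
    image = doc.find("img", class_="game_header_image_full")
    dev_list = doc.find("div", id="developers_list")

    return {
        "description": snippet.text.strip() if snippet is not None else None,
        "header_image": image.get("src") if image is not None else None,
        "developers": (
            ", ".join(a.text for a in dev_list.find_all("a"))
            if dev_list is not None
            else None
        ),
    }

# Made-up HTML that contains only the elements looked up above.
SAMPLE_HTML = """
<div class="game_description_snippet"> A sample description. </div>
<img class="game_header_image_full" src="https://example.com/header.jpg">
<div id="developers_list"><a>Studio A</a>, <a>Studio B</a></div>
"""

parsed = parse_store_page(SAMPLE_HTML)
empty = parse_store_page("<html><body>No game here</body></html>")
```

Returning `None` for absent fields lets a batch run continue past pages that do not follow the expected layout.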

# SteamSpy Data

SteamSpy provides an API with some useful data, so we can add that to the data frame.

We first reload the saved data and add a column for each field that the API returns:

In [5]:
apps = pd.read_pickle("steam_data.pkl")

In [129]:
# Create new columns, initially filled with empty strings, for the data from SteamSpy
fields = ['developer', 'publisher', 'score_rank', 'positive', 'negative', 'userscore', 'owners', 'average_forever', 'average_2weeks', 'median_forever', 'median_2weeks', 'price', 'initialprice', 'discount', 'ccu', 'languages', 'genre', 'tags']

for field in fields:
    apps[field] = ""

In [6]:
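def get_steamspy_all(page):
    """Fetch one page of the SteamSpy 'all' listing."""
    return requests.get(f"https://steamspy.com/api.php?request=all&page={page}").json()

def get_steamspy_page(appid):
    """Fetch the SteamSpy details for a single app."""
    return requests.get(f"https://steamspy.com/api.php?request=appdetails&appid={appid}").json()

def add_to_apps(result):
    """Copy one SteamSpy record into the matching row of `apps` and save the DataFrame."""
    appid = result.pop('appid')
    row = apps['appid'] == appid

    for key in result:
        if key != 'tags':
            apps.loc[row, key] = result[key]
        elif result['tags'] != []:
            # tags arrive as a dict keyed by tag name, or as [] when there are none
            apps.loc[row, "tags"] = ", ".join(result['tags'].keys())
    # Saving after every app is slow but keeps progress if the loop is interrupted
    apps.to_pickle("steam_data.pkl")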
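
The tag handling in `add_to_apps` can be isolated into a small pure function, which makes it easy to check without touching the DataFrame. This is only a sketch: `flatten_steamspy_record` is a hypothetical name, and the two records are made up to mirror the two shapes that the code above distinguishes (a dict of tags, or an empty list).

```python
def flatten_steamspy_record(result):
    """Return a copy of a SteamSpy record with 'tags' as a comma-separated string."""
    flat = dict(result)  # copy so the caller's dict is left untouched
    tags = flat.get("tags")
    # Following the check in add_to_apps: tags are a dict, or [] when absent.
    flat["tags"] = ", ".join(tags.keys()) if isinstance(tags, dict) else ""
    return flat

# Made-up records shaped like the fields used in this notebook.
with_tags = flatten_steamspy_record(
    {"appid": 10, "price": "999", "tags": {"Action": 5, "FPS": 4}}
)
without_tags = flatten_steamspy_record({"appid": 20, "price": "0", "tags": []})
```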


In [7]:
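# Only process apps that have not been filled in yet, so the loop can be resumed
pending_ids = apps.loc[apps["price"] == "", "appid"]
for appid in pending_ids:
    add_to_apps(get_steamspy_page(appid))
    pending = (apps["price"] == "").sum()
    print(appid, "done", len(apps) - pending, "pending", pending)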


735320 done 18837 pending 57793
735350 done 18838 pending 57792
735410 done 18839 pending 57791
735450 done 18840 pending 57790
735460 done 18841 pending 57789
735490 done 18842 pending 57788
735500 done 18843 pending 57787
735520 done 18844 pending 57786
735530 done 18845 pending 57785
735540 done 18846 pending 57784
735550 done 18847 pending 57783
735570 done 18848 pending 57782
735580 done 18849 pending 57781
735600 done 18850 pending 57780
735620 done 18851 pending 57779
735650 done 18852 pending 57778
735680 done 18853 pending 57777
735810 done 18854 pending 57776
735850 done 18855 pending 57775
736050 done 18856 pending 57774
736110 done 18857 pending 57773
736180 done 18858 pending 57772
736190 done 18859 pending 57771
736200 done 18860 pending 57770
736220 done 18861 pending 57769
736230 done 18862 pending 57768
736240 done 18863 pending 57767
736250 done 18864 pending 57766
736260 done 18865 pending 57765
736290 done 18866 pending 57764
736300 done 18867 pending 57763
736340 d

KeyboardInterrupt: 
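
Since a run over every app takes long enough to be interrupted by hand, the loop could pause between requests and save in batches instead of after every app. The sketch below is one possible shape under stated assumptions: `process_pending` and its four callables are hypothetical, and the `delay` value is a placeholder rather than SteamSpy's documented rate limit.

```python
import time

def process_pending(pending_ids, fetch, store, save, delay=1.0, save_every=100):
    """Fetch and store each pending id, pausing between requests and saving in batches.

    Returns the number of ids processed. All callables are hypothetical
    stand-ins: `fetch(appid)` returns a record, `store(record)` writes it
    into the table, and `save()` persists the table.
    """
    done = 0
    try:
        for appid in pending_ids:
            store(fetch(appid))
            done += 1
            if done % save_every == 0:
                save()
            time.sleep(delay)  # assumed polite pause; check the API's rate limit
    finally:
        save()  # also runs on KeyboardInterrupt, so finished work is kept
    return done

# Offline demonstration with made-up ids and in-memory stand-ins.
stored, saves = [], []
count = process_pending(
    [1, 2, 3, 4, 5],
    fetch=lambda appid: {"appid": appid},
    store=stored.append,
    save=lambda: saves.append(len(stored)),
    delay=0,
    save_every=2,
)
```

In this notebook, `fetch` would be `get_steamspy_page`, `store` a variant of `add_to_apps` without its own `to_pickle` call, and `save` a function that calls `apps.to_pickle("steam_data.pkl")`.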

In [278]:
print("done", apps[apps["price"] != ""].count().appid, "pending",  apps[apps["price"] == ""].count().appid)

done 9296 pending 67334


In [277]:
apps.to_pickle("steam_data.pkl")

In [None]:
# todo: reviews = positive + negative

In [None]:
# todo: log(reviews) + histogram

In [None]:
# todo: boolean genres

In [None]:
# todo: boolean tags
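
The todo items above could be approached along the following lines. This is a sketch on a made-up three-row DataFrame, not on the real data, and it assumes that the `genre` values are separated by ", " in the same way as the `tags` strings built earlier.

```python
import numpy as np
import pandas as pd

# Made-up rows; the SteamSpy columns start out as "", so unfilled rows hold empty strings.
demo = pd.DataFrame({
    "appid": [10, 20, 30],
    "positive": [90, 5, ""],
    "negative": [10, 5, ""],
    "genre": ["Action, Indie", "Indie", ""],
})

# reviews = positive + negative, with "" turned into NaN first
positive = pd.to_numeric(demo["positive"], errors="coerce")
negative = pd.to_numeric(demo["negative"], errors="coerce")
demo["reviews"] = positive + negative

# log1p keeps apps with zero reviews finite; NaN stays NaN
demo["log_reviews"] = np.log1p(demo["reviews"])

# One boolean column per genre; assumes genres are separated by ", "
genre_flags = demo["genre"].str.get_dummies(sep=", ").astype(bool).add_prefix("genre_")
demo = demo.join(genre_flags)
```

The same `str.get_dummies` call would apply to the `tags` column, and `demo["log_reviews"].hist()` would draw the histogram if matplotlib is installed.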