# Steam Data

In this notebook I'm adding all the methods required to download game data.

## appids, name and last_modified
I want to start by downloading the information that Steam's API provides about each game.
We do this by calling the IStoreService/GetAppList API endpoint.
It returns at most 50,000 appids per call, so we need to iterate through multiple pages to download all of them.

Whenever we want to update our data, we can resume from the last appid in our dataframe.
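
The paging and the incremental update described above can be sketched without touching the network. This is only an illustration: `collect_apps`, `fake_fetch` and `FAKE_PAGES` are hypothetical names that do not appear elsewhere in this notebook, and the sketch assumes a response carries a `last_appid` cursor only while more pages remain.

```python
def collect_apps(fetch_page, start_appid=0):
    """Collect apps page by page, starting after start_appid.

    fetch_page is a hypothetical callable: it takes a last_appid and
    returns the 'response' dict of one GetAppList call.
    """
    apps = []
    last_appid = start_appid
    while last_appid is not None:
        response = fetch_page(last_appid)
        apps.extend(response.get("apps", []))
        # Assumption: the cursor is present only while more pages remain.
        last_appid = response.get("last_appid")
    return apps


# Stand-in for the real API so the sketch runs offline.
FAKE_PAGES = {
    0: {"apps": [{"appid": 10}, {"appid": 20}], "last_appid": 20},
    20: {"apps": [{"appid": 30}]},
}


def fake_fetch(last_appid):
    return FAKE_PAGES[last_appid]


# Full download: start from appid 0.
all_apps = collect_apps(fake_fetch)

# Incremental update: resume from the highest appid already stored.
known_max = max(app["appid"] for app in all_apps[:2])
new_apps = collect_apps(fake_fetch, start_appid=known_max)
```

With a real dataframe, the starting point would be something like `apps["appid"].max()`, and `fetch_page` would wrap the actual HTTP request.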

In [17]:
!pip install pandas
!pip install beautifulsoup4

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


In [253]:
import requests
import json
import pandas as pd
import numpy as np

In [251]:
STEAMAPIS_KEY="fD9Ddf9K93IJtVk3PbYySF0ghxU" # Key for https://steamapis.com/
STEAM_API_KEY="EE167EC066B7A3EBB2F6B2392E72AD42" # Key for https://api.steampowered.com/

In [14]:
def from_last_appid(last_appid, count):
    # Request one page of up to `count` apps, starting after last_appid.
    url = "https://api.steampowered.com/IStoreService/GetAppList/v1/"
    params = {"key": STEAM_API_KEY, "max_results": count, "last_appid": last_appid}
    return requests.get(url, params=params)

response = {"last_appid": 0}
apps = []

# Keep paging for as long as the response carries a 'last_appid' cursor.
while 'last_appid' in response:
    result = from_last_appid(response['last_appid'], 50000)
    result.raise_for_status()  # fail on an HTTP error instead of retrying the same page forever
    response = result.json()['response']
    apps += response.get('apps', [])

apps = pd.DataFrame(apps)
apps.head()

,appid,name,last_modified,price_change_number
0,10,Counter-Strike,1666823513,16899276
1,20,Team Fortress Classic,1579634708,16899276
2,30,Day of Defeat,1512413490,16899276
3,40,Deathmatch Classic,1568752159,16899276
4,50,Half-Life: Opposing Force,1579628243,16899276


In [16]:
apps.to_pickle("steam_data.pkl")

# Store Images

Luckily, images are a function of the appid, so we can add the urls for them easily.

In [None]:
apps["store_image_header"] = apps[apps[
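
A minimal sketch of how such a column could be built. The URL template below is an assumption (this notebook does not state the actual pattern, and Steam may serve images from other paths), and `header_url` is a hypothetical helper.

```python
# Assumed URL template; the notebook does not state the real pattern.
HEADER_URL_TEMPLATE = "https://cdn.akamai.steamstatic.com/steam/apps/{appid}/header.jpg"


def header_url(appid):
    """Build the (assumed) header image URL for one appid."""
    return HEADER_URL_TEMPLATE.format(appid=appid)


# Plain-Python demonstration with two appids from the table above.
demo_urls = {appid: header_url(appid) for appid in (10, 20)}
```

On the dataframe itself this would be a one-liner along the lines of `apps["store_image_header"] = apps["appid"].map(header_url)`.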

# Store Page Scraping

Additional data can be scraped from the store page, so let's do that now.

In [19]:
from bs4 import BeautifulSoup

In [97]:
def get_store_page(appid):
    # Fetch the raw HTML of the Steam store page for this appid.
    return requests.get(f"https://store.steampowered.com/app/{appid}").text

html_doc = get_store_page(10)

In [98]:
doc = BeautifulSoup(html_doc, "html.parser")

In [62]:
description = doc.find("div", class_="game_description_snippet").text.strip()

In [73]:
header_image = doc.find("img", class_="game_header_image_full")['src']

In [102]:
developers = []
for dev in doc.find("div", id="developers_list").find_all("a"):
    developers.append(dev.text)
developers = ", ".join(developers)
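
The three lookups above each call `.text` or `['src']` directly on the result of `find`, which is `None` when no element matches, so a page missing any of these elements would raise an error. A hedged sketch of one way to bundle them into a single function that returns `None` for missing pieces instead. `parse_store_page` and `SAMPLE_HTML` are hypothetical; the sample markup only imitates the class names and id used above and is not real store HTML.

```python
from bs4 import BeautifulSoup


def parse_store_page(html_doc):
    """Extract description, header image and developers from store HTML.

    Each field is None when the corresponding element is not found.
    """
    doc = BeautifulSoup(html_doc, "html.parser")
    snippet = doc.find("div", class_="game_description_snippet")
    image = doc.find("img", class_="game_header_image_full")
    dev_list = doc.find("div", id="developers_list")
    return {
        "description": snippet.text.strip() if snippet else None,
        "header_image": image.get("src") if image else None,
        "developers": ", ".join(a.text for a in dev_list.find_all("a")) if dev_list else None,
    }


# Made-up markup that mimics the elements the notebook looks for.
SAMPLE_HTML = """
<div class="game_description_snippet">  Play the classic.  </div>
<img class="game_header_image_full" src="https://example.com/header.jpg">
<div id="developers_list"><a href="#">Valve</a>, <a href="#">Hidden Path</a></div>
"""

parsed = parse_store_page(SAMPLE_HTML)
empty = parse_store_page("<html></html>")
```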

# SteamSpy Data

SteamSpy provides an API with some useful data, so we can add that to the dataframe.

We first add the fields that the API returns as empty columns:

In [254]:
apps = pd.read_pickle("steam_data.pkl")

In [129]:
fields = ['developer', 'publisher', 'score_rank', 'positive', 'negative', 'userscore', 'owners', 'average_forever', 'average_2weeks', 'median_forever', 'median_2weeks', 'price', 'initialprice', 'discount', 'ccu', 'languages', 'genre', 'tags']

# Add every SteamSpy field as a column of empty strings; "" marks a row that has not been filled in yet.
for field in fields:
    apps[field] = ""

In [281]:
def get_steamspy_all(page):
    return requests.get(f"https://steamspy.com/api.php?request=all&page={page}").json()
    
def get_steamspy_page(appid):
    return requests.get(f"https://steamspy.com/api.php?request=appdetails&appid={appid}").json()

def add_to_apps(result):
    appid = result.pop('appid')

    for key in result:
        if key != 'tags':
            apps.loc[apps['appid'] == appid, key] = result[key]
        else:
            if result['tags'] != []:
                apps.loc[apps['appid'] == appid, "tags"] = ", ".join(result['tags'].keys())
    apps.to_pickle("steam_data.pkl")
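
`add_to_apps` rewrites the whole pickle after every single app, which gets slow with tens of thousands of rows. A sketch of an alternative loop that saves only every few apps and pauses between requests. `fill_pending` and all of its parameters are hypothetical, and the one-second default delay is an assumption about what the API tolerates, not a documented value from this notebook.

```python
import time


def fill_pending(pending_ids, fetch, apply_result, save,
                 save_every=100, delay=1.0, sleep=time.sleep):
    """Fetch details for each pending appid.

    Pauses `delay` seconds between requests and calls `save` only every
    `save_every` apps, plus once at the end for any unsaved remainder.
    """
    done = 0
    for appid in pending_ids:
        apply_result(fetch(appid))
        done += 1
        if done % save_every == 0:
            save()
        sleep(delay)
    if done % save_every != 0:
        save()  # flush the apps processed since the last checkpoint
    return done


# Offline demonstration with stand-ins for the network and the pickle file.
saves = []
stored = {}
count = fill_pending(
    [10, 20, 30],
    fetch=lambda appid: {"appid": appid, "price": "0"},
    apply_result=lambda result: stored.update({result["appid"]: result}),
    save=lambda: saves.append(len(stored)),
    save_every=2,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
```

In this notebook, `fetch` would be `get_steamspy_page`, `apply_result` a variant of `add_to_apps` without the `to_pickle` call, and `save` something like `lambda: apps.to_pickle("steam_data.pkl")`.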


In [None]:
# Iterate over the apps that still have no SteamSpy data (empty price).
for appid in apps[apps["price"] == ""].appid:
    add_to_apps(get_steamspy_page(appid))
    print(appid, "done", apps[apps["price"] != ""].count().appid, "pending", apps[apps["price"] == ""].count().appid)


452000 done 9421 pending 67209
452060 done 9422 pending 67208
452120 done 9423 pending 67207
452180 done 9424 pending 67206
452230 done 9425 pending 67205
452240 done 9426 pending 67204
452320 done 9427 pending 67203
452340 done 9428 pending 67202
452410 done 9429 pending 67201
452420 done 9430 pending 67200
452440 done 9431 pending 67199
452450 done 9432 pending 67198
452510 done 9433 pending 67197
452570 done 9434 pending 67196
452650 done 9435 pending 67195
452710 done 9436 pending 67194
452860 done 9437 pending 67193
452920 done 9438 pending 67192
452930 done 9439 pending 67191
452970 done 9440 pending 67190
453030 done 9441 pending 67189
453090 done 9442 pending 67188
453100 done 9443 pending 67187
453130 done 9444 pending 67186
453220 done 9445 pending 67185
453270 done 9446 pending 67184
453290 done 9447 pending 67183
453300 done 9448 pending 67182
453310 done 9449 pending 67181
453320 done 9450 pending 67180
453340 done 9451 pending 67179
453350 done 9452 pending 67178
453390 d

In [278]:
print("done", apps[apps["price"] != ""].count().appid, "pending",  apps[apps["price"] == ""].count().appid)

done 9296 pending 67334


In [277]:
apps.to_pickle("steam_data.pkl")

In [None]:
# todo: reviews = positive + negative

In [None]:
# todo: log(reviews) + histogram

In [None]:
# todo: boolean genres

In [None]:
# todo: boolean tags
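
One possible way to approach the todos above, shown on a tiny made-up frame rather than the real data. It assumes that `positive` and `negative` hold numbers or the `""` placeholder, and that `tags` is the comma-separated string built by `add_to_apps`. The `demo` frame, the `log_reviews` column and the `tag_` prefix are all hypothetical choices.

```python
import numpy as np
import pandas as pd

# Made-up rows; the last one imitates an app that has not been filled in yet.
demo = pd.DataFrame({
    "appid": [1, 2, 3],
    "positive": [100, 5, ""],
    "negative": [50, 2, ""],
    "tags": ["Action, Indie", "Indie", ""],
})

# reviews = positive + negative; the "" placeholder becomes NaN.
positive = pd.to_numeric(demo["positive"], errors="coerce")
negative = pd.to_numeric(demo["negative"], errors="coerce")
demo["reviews"] = positive + negative

# log(1 + reviews), so that apps with zero reviews stay finite.
demo["log_reviews"] = np.log1p(demo["reviews"])

# One boolean column per tag, split on the ", " separator.
tag_flags = demo["tags"].str.get_dummies(sep=", ").astype(bool).add_prefix("tag_")
demo = demo.join(tag_flags)
```

A histogram could then be drawn with `demo["log_reviews"].hist()`, which needs matplotlib installed. The same `get_dummies` call might work for the `genre` column if it uses the same separator.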