# Steam Data

In this notebook I'm adding all the methods required to download game data.

## appids, name and last_modified
I want to start by downloading the basic information that Steam exposes for each app through its API: the appid, name and last-modified timestamp.
We do this by calling the IStoreService/GetAppList API endpoint.
Each call returns at most 50,000 apps, so we need to page through the results to download all appids.

Whenever we want to update our data, we can resume from the last appid stored in our dataframe.
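The paging idea can be sketched as a small generator. This is only an illustration: `iter_apps`, `fetch_page` and `fake_fetch` are hypothetical names, and it assumes (as the download loop in this notebook does) that a page without a `last_appid` key is the final one. The fake fetch function stands in for the real HTTP call so the sketch runs offline.

```python
def iter_apps(fetch_page, last_appid=0, max_pages=100):
    """Yield app dicts page by page, starting after `last_appid`.

    `fetch_page` is any callable that takes a last_appid and returns the
    'response' dict of one page (hypothetical interface for this sketch).
    """
    for _ in range(max_pages):  # hard cap so a misbehaving API cannot loop forever
        response = fetch_page(last_appid)
        yield from response.get("apps", [])
        if "last_appid" not in response:
            return  # assumed to mean: no more pages
        last_appid = response["last_appid"]


def fake_fetch(last_appid):
    # Two made-up pages standing in for the real endpoint.
    pages = {
        0: {"apps": [{"appid": 10}, {"appid": 20}], "last_appid": 20},
        20: {"apps": [{"appid": 30}]},
    }
    return pages.get(last_appid, {})


# Full download, then an incremental update that resumes after appid 20.
print([app["appid"] for app in iter_apps(fake_fetch)])                  # [10, 20, 30]
print([app["appid"] for app in iter_apps(fake_fetch, last_appid=20)])   # [30]
```

For an update run, `last_appid` would be something like the largest appid already in the dataframe.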

In [7]:
!pip install pandas
!pip install beautifulsoup4



In [1]:
import requests
import json
import pandas as pd
import numpy as np

In [3]:
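import os
# Read the keys from environment variables (names chosen here) instead of hardcoding secrets.
STEAMAPIS_KEY = os.environ.get("STEAMAPIS_KEY", "")  # Key for https://steamapis.com/
STEAM_API_KEY = os.environ.get("STEAM_API_KEY", "")  # Key for https://api.steampowered.com/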

In [14]:
def from_last_appid(last_appid, count):
    # Request one page of apps, starting after `last_appid`.
    url = "https://api.steampowered.com/IStoreService/GetAppList/v1/"
    params = {"key": STEAM_API_KEY, "max_results": count, "last_appid": last_appid}
    return requests.get(url, params=params)

response = {"last_appid": 0}
apps = []

# The loop ends once a page no longer includes 'last_appid'.
while 'last_appid' in response:
    result = from_last_appid(response['last_appid'], 50000)
    # Fail loudly on an HTTP error instead of retrying the same page forever.
    result.raise_for_status()
    response = result.json()['response']
    apps += response.get('apps', [])

apps = pd.DataFrame(apps)
apps.head()

Unnamed: 0,appid,name,last_modified,price_change_number
0,10,Counter-Strike,1666823513,16899276
1,20,Team Fortress Classic,1579634708,16899276
2,30,Day of Defeat,1512413490,16899276
3,40,Deathmatch Classic,1568752159,16899276
4,50,Half-Life: Opposing Force,1579628243,16899276


In [16]:
apps.to_pickle("steam_data.pkl")

# Store Images

Luckily, images are a function of the appid, so we can add the urls for them easily.

In [None]:
# apps["store_image_header"] = apps[apps[
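As a rough sketch of what "a function of the appid" could look like: the helper below builds a header image URL from an appid. The name `store_image_header_url` and the CDN URL pattern are assumptions, not something confirmed in this notebook, so the pattern should be checked against a few real store pages before relying on it.

```python
def store_image_header_url(appid):
    # Assumed URL pattern for the store header image; verify before use.
    return f"https://cdn.akamai.steamstatic.com/steam/apps/{appid}/header.jpg"


# Example with the first few appids from the table above.
for appid in (10, 20, 30):
    print(appid, store_image_header_url(appid))
```

With the dataframe loaded, something like `apps["store_image_header"] = apps["appid"].map(store_image_header_url)` would fill the whole column in one go.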

# Store Page Scraping

Additional data can be scraped from the store page, so let's do that now.

In [12]:
from bs4 import BeautifulSoup

In [97]:
def get_store_page(appid):
    # Fetch the raw HTML of the app's Steam store page.
    return requests.get(f"https://store.steampowered.com/app/{appid}").text

html_doc = get_store_page(10)

In [98]:
doc = BeautifulSoup(html_doc, "html.parser")

In [62]:
description = doc.find("div", class_="game_description_snippet").text.strip()

In [73]:
header_image = doc.find("img", class_="game_header_image_full")['src']

In [102]:
# Join the names of all linked developers into one comma-separated string.
developers = ", ".join(
    dev.text for dev in doc.find("div", id="developers_list").find_all("a")
)
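The lookups above fail with an exception when an element is missing, because `find` returns `None` in that case. A minimal sketch of a more defensive version follows; `parse_store_page` is a hypothetical helper, it uses the same selectors as the cells above, and the sample HTML is made up so the example runs without hitting the store.

```python
from bs4 import BeautifulSoup


def parse_store_page(html_doc):
    """Extract description, header image and developers; None where missing."""
    doc = BeautifulSoup(html_doc, "html.parser")
    snippet = doc.find("div", class_="game_description_snippet")
    image = doc.find("img", class_="game_header_image_full")
    dev_list = doc.find("div", id="developers_list")
    return {
        "description": snippet.get_text(strip=True) if snippet else None,
        "header_image": image.get("src") if image else None,
        "developers": ", ".join(a.get_text(strip=True) for a in dev_list.find_all("a"))
        if dev_list
        else None,
    }


# Made-up page fragment, for illustration only.
sample_html = (
    '<div class="game_description_snippet"> A shooter. </div>'
    '<img class="game_header_image_full" src="https://example.com/header.jpg">'
    '<div id="developers_list"><a>Valve</a><a>Other</a></div>'
)
print(parse_store_page(sample_html))
print(parse_store_page("<html></html>"))  # every field is None
```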

# SteamSpy Data

SteamSpy provides an API with some useful data, so we can add that to the data frame.

We first add a column for each field that the API returns:

In [129]:
# create new, empty columns for the data from SteamSpy
fields = ['developer', 'publisher', 'score_rank', 'positive', 'negative', 'userscore', 'owners', 'average_forever', 'average_2weeks', 'median_forever', 'median_2weeks', 'price', 'initialprice', 'discount', 'ccu', 'languages', 'genre', 'tags']

# Assigning a scalar broadcasts "" to every row.
for field in fields:
    apps[field] = ""

The next step is to load the existing data, define the methods used to download info, and then run the process.

In [None]:
apps = pd.read_pickle("steam_data.pkl")

In [3]:
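def get_steamspy_all(page):
    # One page of SteamSpy's "all" listing.
    return requests.get(f"https://steamspy.com/api.php?request=all&page={page}").json()

def get_steamspy_page(appid):
    # SteamSpy details for a single app.
    return requests.get(f"https://steamspy.com/api.php?request=appdetails&appid={appid}").json()

def add_to_apps(result):
    # Copy one SteamSpy result into the matching row of `apps`.
    appid = result.pop('appid')
    row = apps['appid'] == appid

    for key in result:
        if key != 'tags':
            apps.loc[row, key] = result[key]
        elif result['tags'] != []:
            # 'tags' is a dict keyed by tag name; keep only the names.
            apps.loc[row, "tags"] = ", ".join(result['tags'].keys())
    # Save after every app so an interrupted run can pick up where it stopped.
    apps.to_pickle("steam_data.pkl")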
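The tag handling is the only non-obvious part of that step, so here it is in isolation. This is a sketch, not the notebook's actual code path: `flatten_steamspy` is a hypothetical helper, and the sample results are made up. It assumes, as the cell above does, that `tags` is a dict keyed by tag name, or an empty list when an app has no tags.

```python
def flatten_steamspy(result):
    """Return (appid, row) where row holds only flat values."""
    row = dict(result)  # copy so the caller's dict is left untouched
    appid = row.pop("appid")
    tags = row.pop("tags", None)
    # Keep only the tag names; an empty list (no tags) becomes "".
    row["tags"] = ", ".join(tags) if isinstance(tags, dict) else ""
    return appid, row


# Made-up results, for illustration only.
with_tags = {"appid": 10, "price": "999", "tags": {"Action": 5000, "FPS": 4500}}
without_tags = {"appid": 20, "price": "499", "tags": []}

print(flatten_steamspy(with_tags))     # (10, {'price': '999', 'tags': 'Action, FPS'})
print(flatten_steamspy(without_tags))  # (20, {'price': '499', 'tags': ''})
```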


In [8]:
# Only fetch apps that have no SteamSpy data yet (price is still "").
for appid in apps[apps["price"] == ""].appid:
    add_to_apps(get_steamspy_page(appid))
    pending = (apps["price"] == "").sum()
    print(appid, "done", len(apps) - pending, "pending", pending)


1745840 done 59571 pending 17059
1745860 done 59572 pending 17058
1745880 done 59573 pending 17057
1745920 done 59574 pending 17056
1745960 done 59575 pending 17055
1745990 done 59576 pending 17054
1746010 done 59577 pending 17053
1746030 done 59578 pending 17052
1746040 done 59579 pending 17051
1746080 done 59580 pending 17050
1746110 done 59581 pending 17049
1746120 done 59582 pending 17048
1746130 done 59583 pending 17047
1746150 done 59584 pending 17046
1746210 done 59585 pending 17045
1746230 done 59586 pending 17044
1746280 done 59587 pending 17043
1746340 done 59588 pending 17042
1746370 done 59589 pending 17041
1746400 done 59590 pending 17040
1746420 done 59591 pending 17039
1746500 done 59592 pending 17038
1746520 done 59593 pending 17037
1746530 done 59594 pending 17036
1746540 done 59595 pending 17035
1746560 done 59596 pending 17034
1746590 done 59597 pending 17033
1746620 done 59598 pending 17032
1746630 done 59599 pending 17031
1746640 done 59600 pending 17030
1746650 do

In [278]:
print("done", apps[apps["price"] != ""].count().appid, "pending",  apps[apps["price"] == ""].count().appid)

done 9296 pending 67334


In [277]:
apps.to_pickle("steam_data.pkl")

In [None]:
# todo: reviews = positive + negative

In [None]:
# todo: log(reviews) + histogram

In [None]:
# todo: boolean genres

In [None]:
# todo: boolean tags
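One possible way to approach these todos, as a sketch only. The small frame below is made up and stands in for `apps`. It assumes that `genre` is a comma-separated string using the same `", "` separator this notebook uses for `tags`, and that unfetched apps still hold `""` in the numeric fields. The histogram is left out.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for `apps`; the values are illustrative only.
sample = pd.DataFrame({
    "appid": [1, 2, 3],
    "positive": [90, "", 5],
    "negative": [10, "", 5],
    "genre": ["Action, Indie", "", "Indie"],
})

# Fields may still be "" for apps not fetched yet, so coerce to numbers first.
positive = pd.to_numeric(sample["positive"], errors="coerce").fillna(0)
negative = pd.to_numeric(sample["negative"], errors="coerce").fillna(0)
sample["reviews"] = positive + negative

# log1p avoids -inf for apps with zero reviews.
sample["log_reviews"] = np.log1p(sample["reviews"])

# One boolean column per genre, split on the assumed ", " separator.
genre_flags = sample["genre"].str.get_dummies(sep=", ").astype(bool).add_prefix("genre_")
sample = sample.join(genre_flags)
print(sample)
```

The same `str.get_dummies` call would work for the `tags` column, though with many distinct tags it produces a very wide frame.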