# Steam Data

This notebook collects the functions needed to download Steam game data.

## appids, name and last_modified
We start by downloading the basic information that Steam exposes for every app through its API.
We do this by calling the IStoreService/GetAppList API endpoint.
Each call returns at most 50,000 appids, so we page through the results until every appid has been downloaded.

To update the data later, we can resume from the last appid already in our dataframe.

In [2]:
!pip install pandas
!pip install beautifulsoup4


In [10]:
import requests
import json
import pandas as pd
import numpy as np

In [11]:
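import os

# Read the API keys from environment variables instead of hard-coding them in the notebook.
# The variable names below are a convention chosen here; set them before starting Jupyter.
STEAMAPIS_KEY = os.environ["STEAMAPIS_KEY"]  # Key for https://steamapis.com/
STEAM_API_KEY = os.environ["STEAM_API_KEY"]  # Key for https://api.steampowered.com/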

In [14]:
def from_last_appid(last_appid, count):
    """Request one page of GetAppList, starting after `last_appid`."""
    url = "https://api.steampowered.com/IStoreService/GetAppList/v1/"
    params = {"key": STEAM_API_KEY, "max_results": count, "last_appid": last_appid}
    return requests.get(url, params=params)

response = {"last_appid": 0}
apps = []

# Keep requesting pages until a response no longer carries `last_appid`.
while 'last_appid' in response:
    result = from_last_appid(response['last_appid'], 50000)
    result.raise_for_status()  # stop on an HTTP error instead of retrying the same page forever
    response = result.json()['response']
    apps += response.get('apps', [])

apps = pd.DataFrame(apps)
apps.head()

Unnamed: 0,appid,name,last_modified,price_change_number
0,10,Counter-Strike,1666823513,16899276
1,20,Team Fortress Classic,1579634708,16899276
2,30,Day of Defeat,1512413490,16899276
3,40,Deathmatch Classic,1568752159,16899276
4,50,Half-Life: Opposing Force,1579628243,16899276


In [16]:
apps.to_pickle("steam_data.pkl")
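
The incremental update mentioned earlier (resuming from the last appid in the dataframe) could look roughly like the sketch below. `update_apps` and the `fetch_page` callable are hypothetical names introduced here; `fetch_page(last_appid, count)` is assumed to return the parsed `response` dict of one GetAppList page. Note that this only appends apps with a higher appid and does not refresh rows whose `last_modified` has changed.

```python
import pandas as pd

def update_apps(apps, fetch_page, page_size=50000):
    """Append apps with an appid above the largest one already in `apps`.

    `fetch_page(last_appid, count)` is a hypothetical callable returning the
    parsed `response` dict of a single GetAppList page.
    """
    last_appid = int(apps["appid"].max()) if len(apps) else 0
    new_rows = []
    while True:
        response = fetch_page(last_appid, page_size)
        new_rows += response.get("apps", [])
        # Same stop condition as the download loop: no `last_appid`, no more pages.
        if "last_appid" not in response:
            break
        last_appid = response["last_appid"]
    if not new_rows:
        return apps
    return pd.concat([apps, pd.DataFrame(new_rows)], ignore_index=True)
```

In the notebook this might be called as `apps = update_apps(pd.read_pickle("steam_data.pkl"), lambda last, n: from_last_appid(last, n).json()["response"])`.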

# Store Images

Luckily, images are a function of the appid, so we can add the urls for them easily.

In [None]:
# apps["store_image_header"] = apps[apps[
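
As a rough sketch of the idea above, the header image URL can be built from the appid alone. The URL pattern in `HEADER_URL` is an assumption based on how Steam has commonly served header images; it is not taken from this notebook, and some apps may use a different path. `add_header_urls` is a hypothetical helper name.

```python
import pandas as pd

# Assumed URL pattern for header images; verify it against a few store pages first.
HEADER_URL = "https://cdn.akamai.steamstatic.com/steam/apps/{appid}/header.jpg"

def add_header_urls(apps):
    """Return a copy of `apps` with a `store_image_header` column derived from `appid`."""
    out = apps.copy()
    out["store_image_header"] = out["appid"].map(lambda a: HEADER_URL.format(appid=a))
    return out
```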

# Store Page Scraping

Additional data can be scraped from the store page, so let's do that now.

In [12]:
from bs4 import BeautifulSoup

In [97]:
def get_store_page(appid):
    """Download the HTML of an app's Steam store page."""
    return requests.get(f"https://store.steampowered.com/app/{appid}").text

html_doc = get_store_page(10)

In [98]:
doc = BeautifulSoup(html_doc, "html.parser")

In [62]:
description = doc.find("div", class_="game_description_snippet").text.strip()

In [73]:
header_image = doc.find("img", class_="game_header_image_full")['src']

In [102]:
developers = []
for dev in doc.find("div", id="developers_list").find_all("a"):
    developers.append(dev.text)
developers = ", ".join(developers)
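
The three lookups above raise an exception when an element is missing, which can happen if a store page has a different layout or the request is redirected elsewhere. A more defensive version might look like the sketch below; `parse_store_page` is a hypothetical helper, and it uses the same selectors as the cells above.

```python
from bs4 import BeautifulSoup

def parse_store_page(html_doc):
    """Extract description, header image and developers, using None for anything missing."""
    doc = BeautifulSoup(html_doc, "html.parser")
    snippet = doc.find("div", class_="game_description_snippet")
    image = doc.find("img", class_="game_header_image_full")
    dev_list = doc.find("div", id="developers_list")
    return {
        "description": snippet.text.strip() if snippet else None,
        "header_image": image.get("src") if image else None,
        "developers": (
            ", ".join(a.text.strip() for a in dev_list.find_all("a"))
            if dev_list else None
        ),
    }
```

With the earlier cell in place, `parse_store_page(html_doc)` would return all three values for appid 10 in one dict.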

# SteamSpy Data

SteamSpy provides an API with some useful data, so we can add that to the dataframe.

We first create an empty column for each field that the API returns:

In [13]:
apps = pd.read_pickle("steam_data.pkl")

In [129]:
# Create an empty-string column for each field returned by SteamSpy.
fields = ['developer', 'publisher', 'score_rank', 'positive', 'negative', 'userscore', 'owners', 'average_forever', 'average_2weeks', 'median_forever', 'median_2weeks', 'price', 'initialprice', 'discount', 'ccu', 'languages', 'genre', 'tags']

for field in fields:
    apps[field] = ""

In [14]:
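def get_steamspy_all(page):
    """Fetch one page of SteamSpy's `all` listing."""
    return requests.get(f"https://steamspy.com/api.php?request=all&page={page}").json()

def get_steamspy_page(appid):
    """Fetch SteamSpy's details for a single app."""
    return requests.get(f"https://steamspy.com/api.php?request=appdetails&appid={appid}").json()

def add_to_apps(result):
    """Copy one SteamSpy result into its row of `apps` and save the dataframe."""
    appid = result.pop('appid')

    for key in result:
        if key != 'tags':
            apps.loc[apps['appid'] == appid, key] = result[key]
        elif result['tags'] != []:
            # `tags` is a dict keyed by tag name (or an empty list); keep only the names.
            apps.loc[apps['appid'] == appid, "tags"] = ", ".join(result['tags'].keys())
    # Saving after every app is slow, but it preserves progress if the loop is interrupted.
    apps.to_pickle("steam_data.pkl")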


In [19]:
# Fetch details for every app that has no price yet, so the loop can be rerun after a failure.
for appid in apps[apps["price"] == ""].appid:
    add_to_apps(get_steamspy_page(appid))
    pending = (apps["price"] == "").sum()
    print(appid, "done", len(apps) - pending, "pending", pending)


708340 done 17784 pending 58846
708360 done 17785 pending 58845
708370 done 17786 pending 58844
708420 done 17787 pending 58843
708430 done 17788 pending 58842
708450 done 17789 pending 58841
708510 done 17790 pending 58840
708550 done 17791 pending 58839
708580 done 17792 pending 58838
708590 done 17793 pending 58837
708630 done 17794 pending 58836
708640 done 17795 pending 58835
708680 done 17796 pending 58834
708710 done 17797 pending 58833
708720 done 17798 pending 58832
708760 done 17799 pending 58831
708780 done 17800 pending 58830
708820 done 17801 pending 58829
708830 done 17802 pending 58828
708850 done 17803 pending 58827
708870 done 17804 pending 58826
708890 done 17805 pending 58825
708910 done 17806 pending 58824
708920 done 17807 pending 58823
708930 done 17808 pending 58822
708950 done 17809 pending 58821
708970 done 17810 pending 58820
708990 done 17811 pending 58819
709010 done 17812 pending 58818
709020 done 17813 pending 58817
709040 done 17814 pending 58816
709050 d

ConnectionError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
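
A dropped connection like the one above stops the whole loop. One possible way to make it more resilient is to retry each request a few times with an increasing delay, as in the sketch below. `fetch_with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary choices rather than values recommended by SteamSpy.

```python
import time
import requests

def fetch_with_retries(fetch, appid, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `fetch(appid)`, retrying on connection errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(appid)
        except requests.exceptions.ConnectionError:
            # Give up after the last attempt and let the error propagate.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

In the loop above, `get_steamspy_page(id)` could then be replaced with `fetch_with_retries(get_steamspy_page, id)`.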

In [278]:
print("done", apps[apps["price"] != ""].count().appid, "pending",  apps[apps["price"] == ""].count().appid)

done 9296 pending 67334


In [277]:
apps.to_pickle("steam_data.pkl")

In [None]:
# todo: reviews = positive + negative

In [None]:
# todo: log(reviews) + histogram
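
A minimal sketch of the two todos above, assuming `positive` and `negative` hold either numbers or the empty-string placeholder used earlier. It uses `log1p`, i.e. log(1 + reviews), rather than a plain log so that apps with zero reviews do not produce negative infinity. `add_review_counts` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def add_review_counts(apps):
    """Return a copy of `apps` with `reviews` and `log_reviews` columns."""
    out = apps.copy()
    # Empty strings (apps not fetched yet) become NaN and are counted as 0 reviews.
    positive = pd.to_numeric(out["positive"], errors="coerce").fillna(0)
    negative = pd.to_numeric(out["negative"], errors="coerce").fillna(0)
    out["reviews"] = positive + negative
    out["log_reviews"] = np.log1p(out["reviews"])
    return out
```

The histogram could then be drawn with `add_review_counts(apps)["log_reviews"].hist(bins=50)`, which requires matplotlib to be installed.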

In [None]:
# todo: boolean genres

In [None]:
# todo: boolean tags
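
The boolean columns from the last two todos could be sketched as follows. The `tags` column was joined with `", "` earlier in this notebook; that `genre` uses the same separator is an assumption about SteamSpy's output and should be checked against the data. `add_boolean_columns` and the column prefixes are hypothetical names.

```python
import pandas as pd

def add_boolean_columns(apps, column, prefix):
    """Add one boolean column per distinct value in a ', '-separated string column."""
    dummies = apps[column].fillna("").str.get_dummies(sep=", ").astype(bool)
    return apps.join(dummies.add_prefix(prefix))
```

For example, `add_boolean_columns(apps, "tags", "tag_")` would add one `tag_...` column per tag. With thousands of distinct tags this produces a very wide dataframe, so filtering to the most common tags first may be worthwhile.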