# Steam Data

In this notebook I'm adding all the methods required to download game data.

## appids, name and last_modified
I want to start by downloading the basic information that Steam provides about each game through its API: the appid, the name, and the last modification time.
We do this by calling the IStoreService/GetAppList API endpoint.
The endpoint returns at most 50,000 appids per call, so we need to page through the results to download all of them.

Whenever we want to update our data, we can resume from the last appid in our dataframe.
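The paging and resuming idea can be sketched independently of the network call. This is a minimal sketch, not the notebook's own code: `fetch_all_apps` and its `fetch_page` argument are hypothetical names, and `fetch_page` is assumed to return the `response` dictionary of one GetAppList call (an `apps` list plus a `last_appid` key while more pages remain).

```python
def fetch_all_apps(fetch_page, last_appid=0, page_size=50000):
    """Collect app records page by page, continuing after last_appid.

    fetch_page is a hypothetical callable (last_appid, page_size) -> dict
    that returns the 'response' part of one GetAppList call.
    """
    collected = []
    while True:
        response = fetch_page(last_appid, page_size)
        collected.extend(response.get("apps", []))
        # Assumption: a response without 'last_appid' is the final page.
        if "last_appid" not in response:
            return collected
        last_appid = response["last_appid"]


# Fake pages standing in for the real API, keyed by the last_appid requested.
fake_pages = {
    0: {"apps": [{"appid": 10}, {"appid": 20}], "last_appid": 20},
    20: {"apps": [{"appid": 30}]},
}

def fake_fetch(last_appid, page_size):
    return fake_pages[last_appid]

all_apps = fetch_all_apps(fake_fetch)
new_apps = fetch_all_apps(fake_fetch, last_appid=20)
```

To update an existing dataframe, one would pass something like `apps["appid"].max()` as `last_appid` and append the returned records.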

In [2]:
!pip install pandas
!pip install beautifulsoup4



In [10]:
import requests
import json
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup

In [3]:
STEAMAPIS_KEY="fD9Ddf9K93IJtVk3PbYySF0ghxU" # Key for https://steamapis.com/
STEAM_API_KEY="EE167EC066B7A3EBB2F6B2392E72AD42" # Key for https://api.steampowered.com/

In [14]:
def from_last_appid(last_appid, count):
    # Request one page of up to `count` apps, continuing after last_appid.
    url = f"https://api.steampowered.com/IStoreService/GetAppList/v1/?key={STEAM_API_KEY}&max_results={count}&last_appid={last_appid}"
    return requests.get(url)

response = {"last_appid": 0}
apps = []

# Keep paging while the response carries a 'last_appid' to continue from.
while 'last_appid' in response:
    result = from_last_appid(response['last_appid'], 50000)
    # Stop on an HTTP error instead of requesting the same page forever.
    result.raise_for_status()
    response = result.json()['response']
    apps += response.get('apps', [])

apps = pd.DataFrame(apps)
apps.head()

   appid                       name  last_modified  price_change_number
0     10             Counter-Strike     1666823513             16899276
1     20      Team Fortress Classic     1579634708             16899276
2     30              Day of Defeat     1512413490             16899276
3     40         Deathmatch Classic     1568752159             16899276
4     50  Half-Life: Opposing Force     1579628243             16899276


In [16]:
apps.to_pickle("steam_data.pkl")

# Store Images

Luckily, the image URLs are a function of the appid, so we can add them to the dataframe easily.

In [None]:
# apps["store_image_header"] = apps[apps[
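Since the cell above is left unfinished, here is one way the idea could look. This is only a sketch: the URL pattern and the names `HEADER_URL_TEMPLATE` and `store_header_url` are assumptions that do not come from the notebook, and the pattern may not hold for every app.

```python
# Assumed URL pattern; not taken from the notebook and not guaranteed for every app.
HEADER_URL_TEMPLATE = "https://cdn.akamai.steamstatic.com/steam/apps/{appid}/header.jpg"

def store_header_url(appid):
    """Build the assumed header image URL for one appid."""
    return HEADER_URL_TEMPLATE.format(appid=appid)

example_url = store_header_url(10)
```

With the dataframe loaded, the column could then be filled with `apps["appid"].map(store_header_url)`.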

# Store Page Scraping

Additional data can be scraped from the store page, so let's do that now.

In [12]:
from bs4 import BeautifulSoup

In [13]:
apps = pd.read_pickle("steam_data.pkl")

In [31]:
# Add empty columns for the fields scraped from the store page.
apps["description"] = ""
apps["release_date"] = ""
apps["game_description"] = ""

In [33]:
def get_steam_page(appid):
    # Download the store page for one app and parse it.
    return BeautifulSoup(
        requests.get(f"https://store.steampowered.com/app/{appid}").text,
        "html.parser"
    )

def get_steam_store_data(appid):
    doc = get_steam_page(appid)
    # Map each dataframe column to the CSS class of the <div> that holds it.
    selectors = {
        "description": "game_description_snippet",
        "release_date": "date",
        "game_description": "game_area_description",
    }
    for column, css_class in selectors.items():
        element = doc.find("div", class_=css_class)
        # A page may lack an element; in that case leave the cell untouched.
        if element is not None:
            apps.loc[apps["appid"] == appid, column] = element.text.strip()


In [None]:
for appid in apps[apps["description"] == ""].appid:
    get_steam_store_data(appid)
    print(appid, "done", apps[apps["description"] != ""].count().appid, "pending",  apps[apps["description"] == ""].count().appid)


# SteamSpy Data

SteamSpy provides an API with some useful data, so we can add that to the dataframe.

We first add columns for the fields that the API returns:

In [129]:
# create new columns for data from SteamSpy
fields = [
    'developer', 'publisher', 'score_rank', 'positive', 'negative', 'userscore',
    'owners', 'average_forever', 'average_2weeks', 'median_forever', 'median_2weeks',
    'price', 'initialprice', 'discount', 'ccu', 'languages', 'genre', 'tags',
]

# Fill every new column with empty strings; "" marks apps not downloaded yet.
for field in fields:
    apps[field] = ""

The next step is to load the existing data, define the methods used to download the SteamSpy info, and then run the download loop.

In [4]:
apps = pd.read_pickle("steam_data.pkl")

In [5]:
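def get_steamspy_all(page):
    # Fetch one page of the bulk "all" listing.
    return requests.get(f"https://steamspy.com/api.php?request=all&page={page}").json()

def get_steamspy_page(appid):
    # Fetch the detailed record for a single app.
    return requests.get(f"https://steamspy.com/api.php?request=appdetails&appid={appid}").json()

def add_to_apps(result):
    # Copy every field of a SteamSpy record into the matching row of apps.
    appid = result.pop('appid')
    row = apps['appid'] == appid

    for key in result:
        if key != 'tags':
            apps.loc[row, key] = result[key]
        elif result['tags'] != []:
            # tags is a dict keyed by tag name; keep only the names.
            apps.loc[row, "tags"] = ", ".join(result['tags'].keys())
    # Save after every app so an interrupted run can resume.
    apps.to_pickle("steam_data.pkl")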


In [6]:
# Only visit apps whose price is still the "" placeholder, so reruns skip finished rows.
for appid in apps[apps["price"] == ""].appid:
    add_to_apps(get_steamspy_page(appid))
    print(appid, "done", apps[apps["price"] != ""].count().appid, "pending",  apps[apps["price"] == ""].count().appid)
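The loop above sends one request per app as fast as the responses come back. If the API enforces a poll rate, a small wrapper can space the calls out. This is a sketch under assumptions: `throttled` is a hypothetical helper, and the one-second default interval is an assumed value that should be checked against the SteamSpy API notes.

```python
import time

def throttled(func, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so consecutive calls start at least min_interval seconds apart.

    min_interval=1.0 is an assumed value, not one taken from the notebook.
    clock and sleep are injectable so the wrapper can be tested without waiting.
    """
    last_call = [None]

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_interval - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper


# Demonstration with a fake clock, so nothing actually sleeps.
now = [0.0]
slept = []

def fake_sleep(seconds):
    slept.append(seconds)
    now[0] += seconds

double = throttled(lambda x: x * 2, min_interval=1.0,
                   clock=lambda: now[0], sleep=fake_sleep)
first = double(2)
second = double(3)
```

In the notebook this could be applied once, for example `get_steamspy_page = throttled(get_steamspy_page)`, before running the loop.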


In [278]:
print("done", apps[apps["price"] != ""].count().appid, "pending",  apps[apps["price"] == ""].count().appid)

done 9296 pending 67334


In [277]:
apps.to_pickle("steam_data.pkl")

In [None]:
# todo: reviews = positive + negative

In [None]:
# todo: log(reviews) + histogram
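The two todos above could be approached along these lines. This is a sketch on a made-up three-row `sample` frame rather than the real data; the choice of `log10(1 + reviews)` and of `np.histogram` with three bins are assumptions, not decisions taken in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for apps; "" marks rows not downloaded yet.
sample = pd.DataFrame({
    "appid": [10, 20, 30],
    "positive": [900, 90, ""],
    "negative": [99, 9, ""],
})

# Coerce to numbers first, so the "" placeholder becomes NaN instead of breaking the sum.
positive = pd.to_numeric(sample["positive"], errors="coerce")
negative = pd.to_numeric(sample["negative"], errors="coerce")
sample["reviews"] = positive + negative

# log10(1 + x) keeps apps with zero reviews finite.
sample["log_reviews"] = np.log10(1 + sample["reviews"])

# Bin the known values; counts and bin_edges can be plotted with any charting library.
counts, bin_edges = np.histogram(sample["log_reviews"].dropna(), bins=3)
```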

In [None]:
# todo: boolean genres

In [None]:
# todo: boolean tags
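For the boolean genre and tag columns, pandas' `Series.str.get_dummies` is one option. The sketch below assumes that both columns hold names separated by `", "` (the `tags` column is written that way by `add_to_apps`; the `genre` format is an assumption) and uses a hypothetical helper `add_boolean_columns` on made-up sample rows.

```python
import pandas as pd

# Hypothetical sample rows; "" marks an app without tags.
sample = pd.DataFrame({
    "appid": [10, 20],
    "genre": ["Action", "Action, Indie"],
    "tags": ["FPS, Shooter", ""],
})

def add_boolean_columns(frame, column, prefix):
    """Return frame with one boolean column per distinct value found in column."""
    dummies = frame[column].str.get_dummies(sep=", ").astype(bool).add_prefix(prefix)
    return frame.join(dummies)

sample = add_boolean_columns(sample, "genre", "genre_")
sample = add_boolean_columns(sample, "tags", "tag_")
```

The prefixes keep a genre and a tag with the same name from colliding in one column.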