# Steam reviews deep dive

This notebook will act as a live document as I use it to source, clean and analyze a dataset sourced from Steam. The notebook will overall be split into two sections:
* Getting the database
* Analyzing the database

Once the database is formed, it will most likely be huge. Instead of storing it on github, I will upload it to kaggle, at which point I will load said database for the sections afterwards. 



## Steam Reviews Get!


#### Objectives
* Source a dataset of all English reviews on steam!
    * Access publicly available api calls for store.steampowered
    * end with a dataframe for which a row will hold the app title/id, review author details, and review details
* Write and save dataset to Kaggle for convenience
* Thoroughly document process to help learn and also later reference as a tutorial

#### Concerns
* Avoiding DOS
    * I'm making an obscene number of requests from Valve what happens if I request too much?
    * Can I limit the amount of requests I make and request in a piecemeal fashion?
        * From research, it looks like steam limits the amount of requests allowed to 100,000 per calendar day. 
        * Approach two different ways: sleep an amount every request to ensure I don't go over (865ms per request) or count requests via a global variable and sleep on 100,000
        

In [21]:
import pandas as pd
import numpy as np
import requests
import time

In [51]:


URL= "https://store.steampowered.com/appreviews/317710?json=1"
cursor = '*'
PARAMS = {
    'filter':'updated',
    'language':'english',
    'cursor': cursor,
    'purchase_type':'all',
    'num_per_page' : 1,
}

1533420
r = requests.get(URL, PARAMS)
r=r.json()

In [52]:
print(r)

print(r["query_summary"]['total_reviews'])
print(r["cursor"])

396
AoJ4pdf2/IADdfC0ugM=


In [94]:
# print(r["reviews"])
df = pd.json_normalize(r["reviews"])
# df.head()

In [28]:
def getAppReviews(app, step, pause): #get all reviews associated with an app given an appid, size to get reviews with, and a pause between each request (seconds)
    URL = "https://store.steampowered.com/appreviews/" + str(app) +"?json=1"

    tally=step
    PARAMS = {
        'language':'english',
        'filter':'updated',
        'purchase_type':'all',
        'cursor': '*',  #initial value defined in steam's documentation
        'num_per_page' : step,
    }

    r = requests.get(URL, PARAMS)
    r=r.json()
    nCursor = {'cursor' : r["cursor"]}
    PARAMS.update(nCursor)
    total = r["query_summary"]['total_reviews']
    df = pd.json_normalize(r["reviews"])
    while tally < total:
        time.sleep(pause)
        r = requests.get(URL, PARAMS)
        r=r.json()
        df = pd.concat([df, (pd.json_normalize(r["reviews"]))])
        tally += step
        nCursor = {'cursor' : r["cursor"]}
        PARAMS.update(nCursor)
    return df
    

In [70]:
df = getAppReviews(317710, 51, 0.87)

In [65]:
df.shape

(396, 20)

We now have a method to get reviews given a specific app. Next we get a list of all apps to call from.

In [51]:
apps = requests.get("https://api.steampowered.com/ISteamApps/GetAppList/v2/")
apps=apps.json()

In [52]:
df = pd.json_normalize(apps["applist"]["apps"])

In [53]:
df.head()

Unnamed: 0,appid,name
0,1829051,
1,1808781,
2,1983382,
3,1828741,
4,216938,Pieterw test app76 ( 216938 )


In [54]:
dc=df.replace(r'^\s*$', np.nan, regex=True)
dc=dc.dropna()
dc.shape
dc.reset_index(drop=True, inplace=True)
dc.head()

Unnamed: 0,appid,name
0,216938,Pieterw test app76 ( 216938 )
1,660010,test2
2,660130,test3
3,1460200,Merge Nymphs
4,1479780,Our Apartment


In [39]:
apps = dc
all_reviews = pd.DataFrame
for id in apps['appid']:
    df = getAppReviews(apptest, 51, 0.87)
    


216938

What am I doing next? 

* write code to iterate through the applist
    * check if pd is empty
    * add to major overall database
* save database to file to upload to kaggle

    

In [42]:
#Test Value cell
apptest = 1462040
df = getAppReviews(apptest, 51, 0.87)
print(df.empty)


False


In [43]:
id=1462040
df.insert(0,dc,value)

Unnamed: 0,recommendationid,language,review,timestamp_created,timestamp_updated,voted_up,votes_up,votes_funny,weighted_vote_score,comment_count,steam_purchase,received_for_free,written_during_early_access,author.steamid,author.num_games_owned,author.num_reviews,author.playtime_forever,author.playtime_last_two_weeks,author.playtime_at_review,author.last_played
0,118403162,english,"Remake and it's on PC, what more can you ask f...",1657218023,1657218023,True,0,0,0.0,0,True,False,False,76561198017661797,46,3,870,870,870,1657217906
1,118402906,english,Great revisit to the game. I loved the cut sce...,1657217747,1657217747,True,0,0,0.0,0,True,False,False,76561198073669555,41,2,3096,3096,3071,1657218885
2,118402709,english,"This is game isn't just bad, I can almost feel...",1657217543,1657217543,False,0,0,0.4589985311031341,0,True,False,False,76561198069565559,176,11,4557,4364,4557,1657209829
3,117643060,english,i love Aerith,1656238808,1657213036,True,0,0,0.0,0,True,False,False,76561198860089878,135,6,1343,1343,1343,1657117141
4,118397635,english,this remake is in good hands,1657211954,1657211954,True,0,0,0.0,0,True,False,False,76561198050524284,61,21,2970,2970,2883,1657217271


In [55]:
dc.index[dc['appid']=='1462040']


Int64Index([], dtype='int64')