<h1>Predicting game prices</h1>
<h5 style="margin-left: 2rem">By: Elad Ben-Haim, Shalev Hadar</h5>
<br/>
<br/>

<h4>נושא המחקר</h4>
<table dir="rtl">
    <tr>
        <th>
            נושא המחקר
        </th>
        <th>
            פירוט הנושא
        </th>
    </tr>
    <tr>
        <td>
        האם ניתן לחזות מחיר של משחק בעוד x זמן?
        </td>
        <td>
        מתי הכי ישתלם לקנות את המשחק בעתיד
        </td>
    </tr>
    <tr>
        <td>
        האם ניתן לחזור מתי למוכר הכי משתלם לעשות מבצע על המשחק?
        </td>
        <td>
        מתי הכי כדאי למוכר לעשות מבצע כדי להביא עוד שחקנים ולהמשיך למכור עם הרווח הגדול ביותר
        </td>
    </tr>
</table>
<br/>

<br/>

<h4>נתוני המחקר ודרכי ניתוח</h4>
<table dir="rtl">
    <tr>
        <th>
            נתוני המחקר & דרכי ניתוח
        </th>
        <th>
             ואיך ננתח אותם
        </th>
    </tr>
    <tr>
        <td>
		פירוט הנתונים: פרטים פיננסיים הוא נמכר בזמן X ואז לחזות בכמה הוא ימכר בזמן Y ובנוסף פרטים על המשחק - כמו שם, ז'אנר פופולריות וכו'.
        </td>
        <td>
            <p style="font-size: 1.1rem">
                נשתמש ב-Crawling על אתר isThereAnyDeal(Fig.3)<br/>
                כדי לא לקבל הודעת שגיאה על שימוש יתר, נשתמש ב-PROXY כדי לא להחסם ע"י isThereAnyDeal<br/>
                ונייבא משם את כל המידע הדרוש כדי לחזות את המחיר של משחק בעוד X זמן מסוים.<br/>
                ראינו שלכל משחק קיים באיזור ה1000+ (Fig.2) רשומות של log (Fig.1) של המחיר שלו ביחס לזמן, ולחנות בה הוא נמכר ואת העלייה\ירידה במחיר של המשחק ביחד ללוג הקודם.<br/>
                בעזרת STEAM API נוציא את ז'אנר המשחק, שנת הייצור ועוד פרטים מורכבים יותר על המשחק עצמו<br/>
                ולבסוף נצרף לכל לוג את פרטי המשחק ונקבל Dataset בגודל n = כמות המשחקים, x = כמות הלוגים, כלומר n*x<br/>
                אנחנו מעוניינים כרגע לקחת את 100 המשחקים הראשונים ולפי מה שראינו כמות הלוגים בדרך כלל היא לפחות 1000 אז נקבל בסביבות ה100,000+ רשומות.<br/>
            </p>
        </td>
    </tr>
    <tr>
        <td>
		דרכי ניתוח: נשתמש בכלים שלמדנו במהלך הקורס לעבד\ללמוד מהנתונים כמה מידע שאנו צריכים למטרה זו
        </td>
        <td>
            <p style="font-size: 1.1rem">
                ננתח את הDataFrame, בעזרת טבלאות יחסי משתנים, סטטיסטיקות, ולבסוף ננסה ללמד מכונה שתחזה את התאריך של המחיר הזול ביותר בשנה מסוימת, ואת המחיר אשר יביא את כמות המכירות הגדול ביותר
            </p>
        </td>
    </tr>
</table>
<br/>
<table style="width:100%;grid-template-rows: 1fr 1fr 1fr;">
<tr>
    <td>
        <figure>
            <img src="images\log_table_for_ds3.png" alt="Is there any deal log table">
            <figcaption>Fig.1 - The main crawled data source</figcaption>
        </figure>
    </td>
    <td>
        <figure>
            <img src="images\number_of_logs_for_ds3.png" alt="Example for number of rows in a typical game">
            <figcaption>Fig.2 - Example for number of rows in a typical game (Dark souls 3) </figcaption>
        </figure>
    </td>
    <td>
        <figure>
            <img src="images\is_there_any_deal_site_example_ds3.png" alt="Is there any deal game page">
            <figcaption>Fig.3 - IsThereAnyDeal game page</figcaption>
        </figure>
    </td>
</tr>

<h4>Importing</h6>

In [214]:
import requests
import bs4
from bs4 import BeautifulSoup
import random
import itertools
import re
import pandas as pd
import time


<h4>Global functions</h4>

In [215]:
def get_html_response(url: str, proxy: str = None, params: list = None) -> requests.Response:
    time.sleep(1)
    if (proxy is not None):
        return requests.get(url, proxies={"http": proxy, "https":proxy}, params=params)
    else:
        return requests.get(url, params=params)

def get_response_as_beautiful_soup(req: requests.Response) -> BeautifulSoup:
    return BeautifulSoup(req.text, 'html.parser')

<br/>

<h4>Defining proxies for scraping</h4>

<h5>Get proxy list response html website</h5>
<h6>get the html as response object instead of getting the html again and again</h6>

In [216]:
# Get the html of the proxy list website
def get_proxy_list_html() -> requests.Response:
    # Website to get free proxies
    return get_html_response('https://free-proxy-list.net/')

In [217]:
proxies_response = get_proxy_list_html()

<h5>Scrape proxy ip addresses</h5>
<h6>gets the ip addresses as a list, shuffles them and returns an iterator to cycle through when making scrape requests</h6>

In [218]:
def get_proxy_list() -> itertools.cycle:
    soup = get_response_as_beautiful_soup(proxies_response)
    proxy_soup_list = soup.select('#list > div > div.table-responsive > div > table > tbody > tr')
    proxy_list = list(map(lambda i: i.select('td:nth-child(1)')[0].text + ':' + i.select('td:nth-child(2)')[0].text, proxy_soup_list))
    length = len(proxy_list)
    random.shuffle(proxy_list)
    return itertools.cycle(proxy_list), length

In [219]:
proxy_list, proxy_list_length = get_proxy_list()
current_proxy = next(proxy_list)

In [220]:
def get_proxied(url: str, params: list = None) -> requests.Response:
    i = 0
    while(i < proxy_list_length / 2):
        try:
            response = get_html_response(url, current_proxy, params=params)
            time.sleep(2)
            return response
        except:
            i += 1
        finally:
            current_proxy = next(proxy_list)

    raise RuntimeError('Half of the proxies provided don`t work.')

<br/>

<h4>Scrape isThereAnyDeal website</h4>
<h6>Steps:</h6>
<ol>
<li>Crawl list of top 100 trending games</li>
<li>For each game:</li>
<ul>
    <li>get the game details from steam API using "appId" scraped either from PC Gaming Wiki or Steam Ladder links</li>
    <li>mine price Log history on isThereAnyDeal</li>
    <li>mine Number of sales of the game</li>
<ul>
</ol>

In [221]:
is_there_any_deal_url = 'https://isthereanydeal.com'
steam_api_url = 'https://store.steampowered.com/api/appdetails'

<h5>Get list of 100 top trending games</h5>

In [222]:
def get_is_there_any_deal_games_response() -> requests.Response:
    #filteredUrl = 'https://isthereanydeal.com/?by=trending:desc#/filter:&pl/windows,&drm/steam,steam,-dlc,-type/6,-type/8,-type/7,&releaseyear/2015/2020;/options:all'
    filteredUrl = 'https://isthereanydeal.com/'
    return get_html_response(filteredUrl)

In [223]:
is_there_any_deal_games_response = get_is_there_any_deal_games_response()
print(is_there_any_deal_games_response)

<Response [200]>


In [224]:
def add_game_to_dataframe(df: pd.DataFrame, game: dict) -> pd.DataFrame:
    return df.append(pd.DataFrame([game]), ignore_index=True)

In [225]:
def get_steam_api_game_response(steamId: str) -> requests.Response:
    return get_html_response(steam_api_url, params={'appids': steamId})

In [226]:
def delete_if_exists(d: dict, *keys: list) -> dict:
    for key in keys:
        if d.get(key) is not None:
            del d[key]
    return d

In [227]:
def get_steam_api_game_details(steamId: str) -> dict:
    data = get_steam_api_game_response(steamId).json()
    if data.get(steamId) is None or data.get(steamId).get('data') is None:
        return {}
    data = data[steamId]['data']
    data = delete_if_exists(data,
    'type',
    'name',
    'steam_appid',
    'dlc',
    'detailed_description',
    'about_the_game',
    'short_description',
    'fullgame',
    'header_image',
    'website',
    'pc_requirements',
    'mac_requirements',
    'linux_requirements',
    'legal_notice',
    'price_overview',
    'packages',
    'package_groups',
    'screenshots',
    'achievements',
    'background',
    'content_descriptors',
    'support_info',
    'ext_user_account_notice',
    'reviews'
    )

    if data.get('demos') is not None:
        data['has_demos'] = len(data['demos']) >= 1
        del data['demos']

    if data.get('movies') is not None:
        data['num_of_game_videos'] = len(data['movies'])
        del data['movies']

    if data.get('packages') is not None:
        data['num_of_packages_game_is_in'] = len(data['packages'])
        del data['packages']


    if data.get('metacritic') is not None:
        data['metacritic_score'] = data['metacritic']['score'] / 100
        del data['metacritic']

    if data.get('platforms') is not None:
        data['windows_supported'] = data['platforms']['windows']
        data['mac_supported'] = data['platforms']['mac']
        data['linux_supported'] = data['platforms']['linux']
        del data['platforms']
    
    if data.get('recommendations') is not None:
        data['total_steam_recommendations'] = data['recommendations']['total']
        del data['recommendations']

    if data.get('categories') is not None:
        for category in data['categories']:
            data['category.'+str(category['id'])]=True
        del data['categories']
    
    if data.get('genres') is not None:
        for genre in data['genres']:
            data['genre.'+str(genre['id'])]=True
        del data['genres']
    
    if data.get('developers') is not None:
        for developer in data['developers']:
            data[f'developer.' + developer.strip().replace(' ', '_')]=True
        del data['developers']
    
    if data.get('publishers') is not None:
        for publisher in data['publishers']:
            data[f'publisher.' + publisher.strip().replace(' ', '_')]=True
        del data['publishers']
        
    if data.get('supported_languages') is not None:
        for language in data['supported_languages']\
                            .replace('<strong>*</strong>', '')\
                            .replace('<br/>', '')\
                            .replace('<br>', '')\
                            .replace('languages with full audio support', '')\
                            .split(','):
            data[f'supported_language.' + language.strip().replace(' ', '_')]=True
        del data['supported_languages']

    data['release_date'] = data['release_date']['date']

    return data

<img style="width: 50%" src="images/steam_api_response.png"/>

In [228]:
def get_game_dataframe():
    soup = get_response_as_beautiful_soup(is_there_any_deal_games_response)
    df = pd.DataFrame()
    games = soup.select("#games > div.game")
    game_dict = dict()
    for game in games:
        steamId = game.attrs.get('data-steamid')
        if steamId is not None:
            steamId: str
            if 'app' in steamId:
                steamId = steamId.split('/')[1]
                title = game.select("div.title > a")[0].text
                history = game.select("div.overview.exp.tgl-hide > a:nth-child(5)")[0].attrs.get('href')
                game_details = get_steam_api_game_details(steamId)
                if game_details == {}: continue
                df = add_game_to_dataframe(df, {'steamId': steamId, 'title': title, 'history_link': is_there_any_deal_url + history, **game_details})
    return df

In [229]:
games_dataframe = get_game_dataframe()
games_dataframe

Unnamed: 0,steamId,title,history_link,required_age,is_free,release_date,num_of_game_videos,metacritic_score,windows_supported,mac_supported,...,publisher.Beamdog,developer.Frontier_Developments,publisher.Frontier_Developments,developer.UBIart_Montpellier,"developer.Mane6,_Inc.","publisher.Mane6,_Inc.",developer.Stumpy🐙Squid,developer.Fury_Studios,developer.Coatsink,developer.4A_Games
0,1174180,Red Dead Redemption 2,https://isthereanydeal.com/game/reddeadredempt...,0,False,5 дек. 2019,2.0,0.93,True,False,...,,,,,,,,,,
1,1091500,Cyberpunk 2077,https://isthereanydeal.com/game/cyberpunkii0vi...,18,False,"9 Dec, 2020",12.0,0.86,True,False,...,,,,,,,,,,
2,879850,Box: The Game,https://isthereanydeal.com/game/boxgame/history/,0,False,"16 Jan, 2019",1.0,,True,False,...,,,,,,,,,,
3,632470,Disco Elysium,https://isthereanydeal.com/game/discoelysium/h...,0,False,"15 Oct, 2019",3.0,0.97,True,True,...,,,,,,,,,,
4,1151640,Horizon Zero Dawn™ Complete Edition,https://isthereanydeal.com/game/horizonzerodaw...,16,False,"7 Aug, 2020",3.0,,True,False,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
73,207490,Rayman Origins,https://isthereanydeal.com/game/raymanorigins/...,0,False,"29 Mar, 2012",,0.86,True,False,...,,,,True,,,,,,
74,574980,Them's Fightin' Herds,https://isthereanydeal.com/game/themsfightinhe...,0,False,"30 Apr, 2020",2.0,,True,True,...,,,,,True,True,,,,
75,701160,Kingdom: Two Crowns,https://isthereanydeal.com/game/kingdomtwocrow...,0,False,"11 Dec, 2018",3.0,,True,True,...,,,,,,,True,True,True,
76,412020,Metro Exodus,https://isthereanydeal.com/game/metroexodus/hi...,0,False,"14 Feb, 2020",8.0,0.85,True,True,...,,,,,,,,,,True


In [230]:
games_dataframe2 = games_dataframe.copy()
cols = [col for col in games_dataframe2 if col.startswith('category.') or col.startswith('genre.') or col.startswith('developer.') or col.startswith('supported_language.') or col.startswith('publisher.')]
for category in cols:
    games_dataframe2[category] = games_dataframe2[category].fillna(False)

games_dataframe2

Unnamed: 0,steamId,title,history_link,required_age,is_free,release_date,num_of_game_videos,metacritic_score,windows_supported,mac_supported,...,publisher.Beamdog,developer.Frontier_Developments,publisher.Frontier_Developments,developer.UBIart_Montpellier,"developer.Mane6,_Inc.","publisher.Mane6,_Inc.",developer.Stumpy🐙Squid,developer.Fury_Studios,developer.Coatsink,developer.4A_Games
0,1174180,Red Dead Redemption 2,https://isthereanydeal.com/game/reddeadredempt...,0,False,5 дек. 2019,2.0,0.93,True,False,...,False,False,False,False,False,False,False,False,False,False
1,1091500,Cyberpunk 2077,https://isthereanydeal.com/game/cyberpunkii0vi...,18,False,"9 Dec, 2020",12.0,0.86,True,False,...,False,False,False,False,False,False,False,False,False,False
2,879850,Box: The Game,https://isthereanydeal.com/game/boxgame/history/,0,False,"16 Jan, 2019",1.0,,True,False,...,False,False,False,False,False,False,False,False,False,False
3,632470,Disco Elysium,https://isthereanydeal.com/game/discoelysium/h...,0,False,"15 Oct, 2019",3.0,0.97,True,True,...,False,False,False,False,False,False,False,False,False,False
4,1151640,Horizon Zero Dawn™ Complete Edition,https://isthereanydeal.com/game/horizonzerodaw...,16,False,"7 Aug, 2020",3.0,,True,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
73,207490,Rayman Origins,https://isthereanydeal.com/game/raymanorigins/...,0,False,"29 Mar, 2012",,0.86,True,False,...,False,False,False,True,False,False,False,False,False,False
74,574980,Them's Fightin' Herds,https://isthereanydeal.com/game/themsfightinhe...,0,False,"30 Apr, 2020",2.0,,True,True,...,False,False,False,False,True,True,False,False,False,False
75,701160,Kingdom: Two Crowns,https://isthereanydeal.com/game/kingdomtwocrow...,0,False,"11 Dec, 2018",3.0,,True,True,...,False,False,False,False,False,False,True,True,True,False
76,412020,Metro Exodus,https://isthereanydeal.com/game/metroexodus/hi...,0,False,"14 Feb, 2020",8.0,0.85,True,True,...,False,False,False,False,False,False,False,False,False,True


In [231]:
games_dataframe2.to_csv('games_details_data.csv')
games_dataframe2

Unnamed: 0,steamId,title,history_link,required_age,is_free,release_date,num_of_game_videos,metacritic_score,windows_supported,mac_supported,...,publisher.Beamdog,developer.Frontier_Developments,publisher.Frontier_Developments,developer.UBIart_Montpellier,"developer.Mane6,_Inc.","publisher.Mane6,_Inc.",developer.Stumpy🐙Squid,developer.Fury_Studios,developer.Coatsink,developer.4A_Games
0,1174180,Red Dead Redemption 2,https://isthereanydeal.com/game/reddeadredempt...,0,False,5 дек. 2019,2.0,0.93,True,False,...,False,False,False,False,False,False,False,False,False,False
1,1091500,Cyberpunk 2077,https://isthereanydeal.com/game/cyberpunkii0vi...,18,False,"9 Dec, 2020",12.0,0.86,True,False,...,False,False,False,False,False,False,False,False,False,False
2,879850,Box: The Game,https://isthereanydeal.com/game/boxgame/history/,0,False,"16 Jan, 2019",1.0,,True,False,...,False,False,False,False,False,False,False,False,False,False
3,632470,Disco Elysium,https://isthereanydeal.com/game/discoelysium/h...,0,False,"15 Oct, 2019",3.0,0.97,True,True,...,False,False,False,False,False,False,False,False,False,False
4,1151640,Horizon Zero Dawn™ Complete Edition,https://isthereanydeal.com/game/horizonzerodaw...,16,False,"7 Aug, 2020",3.0,,True,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
73,207490,Rayman Origins,https://isthereanydeal.com/game/raymanorigins/...,0,False,"29 Mar, 2012",,0.86,True,False,...,False,False,False,True,False,False,False,False,False,False
74,574980,Them's Fightin' Herds,https://isthereanydeal.com/game/themsfightinhe...,0,False,"30 Apr, 2020",2.0,,True,True,...,False,False,False,False,True,True,False,False,False,False
75,701160,Kingdom: Two Crowns,https://isthereanydeal.com/game/kingdomtwocrow...,0,False,"11 Dec, 2018",3.0,,True,True,...,False,False,False,False,False,False,True,True,True,False
76,412020,Metro Exodus,https://isthereanydeal.com/game/metroexodus/hi...,0,False,"14 Feb, 2020",8.0,0.85,True,True,...,False,False,False,False,False,False,False,False,False,True


In [36]:
def get_game_details_response():
    return get_html_response("https://isthereanydeal.com/game/deadbydaylight/history/#/chart:low")

In [None]:
game_details_response = get_game_details_response()

In [None]:
def get_game_details():
    soup = get_response_as_beautiful_soup(game_details_response)
    return soup.select("#pageContainer > script:nth-child(9)")

gamedetails = str(get_game_details()[0])
m = re.findall('JSON\.stringify\((.*?)\)', gamedetails)
# print(m)

<h3>Used Resources</h3>
<dl>
    <dt>Scraping</dt>
    <dd>
        <a href="https://isthereanydeal.com/game/reddeadredemptionii/info/">
            <b>Is-There-Any-Deal website</b> For scraping cost history and more financial details
        </a>
    </dd>
    <dd>
        <a href="https://www.geeksforgeeks.org/web-scraping-without-getting-blocked/">
            <b>Using Proxies to avoiding getting blocked</b>
        </a>
    </dd>
    <dd>
        <a href="https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI#App_info">
            <b>Steam StoreFront API</b> Limited to 100,000 requests per day, and no more than 10 per second
        </a>
    </dd>
    <dd>
        <a href="https://store.steampowered.com/api/appdetails">Steam API for Game Metadata - https://store.steampowered.com/api/appdetails?appids=1091500</a>
    </dd>
</dl>