# **<p align="center">Predicting The Success of Video Games</p>**
##### <p align="center">Kevin Rathbun</p>

### **1. Introduction**
Video game developers and companies create their games with one major goal in mind: to produce a successful game.<br>
But what makes a game successful, and how should we measure that success? In this data exploration project I will<br>
investigate these questions to provide insight into what makes a game successful, and I will build a<br>
method to predict a game's success from various metrics.<br>
<br><br>
The metrics I will be looking at are the release date, the developer, the publisher, the supported platforms,<br>
the categories and genres the game falls into, the price, and the hardware requirements to play the game.<br>
This is by no means a comprehensive list of what determines a game's success, but these metrics should be useful<br>
for predicting it. To measure success, I will look at the average playtime of users, the ratings of the<br>
game, and the revenue generated from sales.<br>
<br><br>
A dataset produced by Nik Davis on Kaggle (available at https://www.kaggle.com/datasets/nikdavis/steam-store-games)<br>
will serve very nicely for my analysis. This dataset contains the metrics I need on over 27,000 Steam games.<br>
<br><br>
What is Steam? Steam is a digital PC game distribution platform created by Valve Corporation in 2003.<br>
It is the largest and most popular platform for PC gaming worldwide with over 30,000 games. With Steam,<br>
users are able to browse the store for games, watch game trailers, interact with friends, receive tailored<br>
game recommendations, and much more. Because of its popularity and huge number of diverse games, it will<br>
be a great platform to use in my exploration.
(sources: https://pcgamesforsteam.com/what-is-steam,<br>
https://www.uktech.news/other_news/best-pc-video-game-digital-distribution-services).<br>

#### The Dataset:<br>
**18 Columns:**<br>
**0: appid:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;The unique appid associated with the game<br>
**1: name:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;The name of the game<br>
**2: release_date:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;When the game was first released on Steam in YYYY-MM-DD format<br>
**3: english:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;1 if the game is in English, 0 otherwise<br>
**4: developer:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Name(s) of developer(s), separated by semicolons if multiple<br>
**5: publisher:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Name(s) of publisher(s), separated by semicolons if multiple<br>
**6: platforms:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;The supported platform(s) separated by semicolons if multiple.<br>
&nbsp;&nbsp;&nbsp;&nbsp;Possible platforms are windows;mac;linux.<br>
**7: required_age:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Minimum age required according to PEGI UK standards.<br>
&nbsp;&nbsp;&nbsp;&nbsp;Entries of 0 are unrated or unsupplied<br>
**8: categories:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Categories associated with the game separated by semicolons e.g., Single-player;Multi-player<br>
**9: genres:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;The genres associated with the game, separated by semicolons, e.g., Action;Indie<br>
**10: steamspy_tags:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;The steamspy_tags associated with the game separated by semicolons.<br>
&nbsp;&nbsp;&nbsp;&nbsp;Similar to genres, but community-voted, e.g., Action;Indie<br>
**11: achievements:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Number of achievements in the game<br>
**12: positive_ratings:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Number of positive ratings on Steam<br>
**13: negative_ratings:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Number of negative ratings on Steam<br>
**14: average_playtime:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Average playtime of users in minutes<br>
**15: median_playtime:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Median playtime of users in minutes<br>
**16: owners:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Estimated number of owners (lower and upper bound) e.g., 20000-50000<br>
**17: price:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Price of game in GBP<br>

#### Important Notes:<br>
It is important to note that this data was gathered around May of 2019, so it is not entirely representative of<br>
the games on Steam today or the metrics associated with them.<br>
It should also be noted that Steam is for PC games *only*, so this data is not representative of console games<br>
or their playerbase.<br>

### **2. Gathering and Transforming the Data**

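```python
# Imports
import pandas as pd
import numpy as np
import re
# display() is available automatically in Jupyter; importing it keeps the code portable
from IPython.display import display
```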

```python
# Increase the max number of columns and rows that can be displayed.
# The full "display." option names are used because the short names
# ("max_columns", "max_rows") can match several options in newer pandas versions.
pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 500)
# Load the steam_games csv into a pandas dataframe
games_df = pd.read_csv('steam_games.csv')
# Load the steam_requirements_data csv into a pandas dataframe
requirements_df = pd.read_csv('steam_requirements_data.csv')
print("Steam games data:")
display(games_df.head(10))
print("Steam game requirements data:")
display(requirements_df.head(10))
print("Value counts of required_age")
print(games_df["required_age"].value_counts())
```

Steam games data:


| | appid | name | release_date | english | developer | publisher | platforms | required_age | categories | genres | steamspy_tags | achievements | positive_ratings | negative_ratings | average_playtime | median_playtime | owners | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | Counter-Strike | 2000-11-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 124534 | 3339 | 17612 | 317 | 10000000-20000000 | 7.19 |
| 1 | 20 | Team Fortress Classic | 1999-04-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 3318 | 633 | 277 | 62 | 5000000-10000000 | 3.99 |
| 2 | 30 | Day of Defeat | 2003-05-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Valve Anti-Cheat enabled | Action | FPS;World War II;Multiplayer | 0 | 3416 | 398 | 187 | 34 | 5000000-10000000 | 3.99 |
| 3 | 40 | Deathmatch Classic | 2001-06-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 0 | 1273 | 267 | 258 | 184 | 5000000-10000000 | 3.99 |
| 4 | 50 | Half-Life: Opposing Force | 1999-11-01 | 1 | Gearbox Software | Valve | windows;mac;linux | 0 | Single-player;Multi-player;Valve Anti-Cheat en... | Action | FPS;Action;Sci-fi | 0 | 5250 | 288 | 624 | 415 | 5000000-10000000 | 3.99 |
| 5 | 60 | Ricochet | 2000-11-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Multi-player;Online Multi-Player;Valve Anti-Ch... | Action | Action;FPS;Multiplayer | 0 | 2758 | 684 | 175 | 10 | 5000000-10000000 | 3.99 |
| 6 | 70 | Half-Life | 1998-11-08 | 1 | Valve | Valve | windows;mac;linux | 0 | Single-player;Multi-player;Online Multi-Player... | Action | FPS;Classic;Action | 0 | 27755 | 1100 | 1300 | 83 | 5000000-10000000 | 7.19 |
| 7 | 80 | Counter-Strike: Condition Zero | 2004-03-01 | 1 | Valve | Valve | windows;mac;linux | 0 | Single-player;Multi-player;Valve Anti-Cheat en... | Action | Action;FPS;Multiplayer | 0 | 12120 | 1439 | 427 | 43 | 10000000-20000000 | 7.19 |
| 8 | 130 | Half-Life: Blue Shift | 2001-06-01 | 1 | Gearbox Software | Valve | windows;mac;linux | 0 | Single-player | Action | FPS;Action;Sci-fi | 0 | 3822 | 420 | 361 | 205 | 5000000-10000000 | 3.99 |
| 9 | 220 | Half-Life 2 | 2004-11-16 | 1 | Valve | Valve | windows;mac;linux | 0 | Single-player;Steam Achievements;Steam Trading... | Action | FPS;Action;Sci-fi | 33 | 67902 | 2419 | 691 | 402 | 10000000-20000000 | 7.19 |


Steam game requirements data:


| | steam_appid | pc_requirements | mac_requirements | linux_requirements | minimum | recommended |
|---|---|---|---|---|---|---|
| 0 | 10 | `{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...` | `{'minimum': 'Minimum: OS X Snow Leopard 10.6....` | `{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...` | 500 mhz processor, 96mb ram, 16mb video card, ... | |
| 1 | 20 | `{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...` | `{'minimum': 'Minimum: OS X Snow Leopard 10.6....` | `{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...` | 500 mhz processor, 96mb ram, 16mb video card, ... | |
| 2 | 30 | `{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...` | `{'minimum': 'Minimum: OS X Snow Leopard 10.6....` | `{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...` | 500 mhz processor, 96mb ram, 16mb video card, ... | |
| 3 | 40 | `{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...` | `{'minimum': 'Minimum: OS X Snow Leopard 10.6....` | `{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...` | 500 mhz processor, 96mb ram, 16mb video card, ... | |
| 4 | 50 | `{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...` | `{'minimum': 'Minimum: OS X Snow Leopard 10.6....` | `{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...` | 500 mhz processor, 96mb ram, 16mb video card, ... | |
| 5 | 60 | `{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...` | `{'minimum': 'Minimum: OS X Snow Leopard 10.6....` | `{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...` | 500 mhz processor, 96mb ram, 16mb video card, ... | |
| 6 | 70 | `{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...` | `{'minimum': 'Minimum: OS X Snow Leopard 10.6....` | `{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...` | 500 mhz processor, 96mb ram, 16mb video card, ... | |
| 7 | 80 | `{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...` | `[]` | `[]` | 500 mhz processor, 96mb ram, 16mb video card, ... | |
| 8 | 130 | `{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...` | `{'minimum': 'Minimum: OS X Snow Leopard 10.6....` | `{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...` | 500 mhz processor, 96mb ram, 16mb video card, ... | |
| 9 | 220 | `{'minimum': '<strong>Minimum:</strong><br><ul ...` | `{'minimum': '<strong>Minimum:</strong><br><ul ...` | `[]` | OS: Windows 7, Vista, XP Processor: 1.7 Ghz Me... | |


```
Value counts of required_age
0     26479
18      308
16      192
12       73
7        12
3        11
Name: required_age, dtype: int64
```


Now, let's transform the dataframe to make it better suited for analysis by removing unnecessary columns and converting others into more useful forms.<br>
<br>
The columns we will be removing are:<br>
**english**: Even if games in English are associated with more or less success, it doesn't really tell us anything<br>
about the game itself.<br>
**achievements**: The number of achievements in a game is most likely not a good indicator of success.<br>
**required_age**: As shown above, nearly 98% of entries (26,479 of 27,075) are 0, meaning unrated or unsupplied, so this column will not be very useful.<br>
**positive_ratings and negative_ratings**: These will be replaced by a single **rating** column holding the fraction of ratings that are positive.<br>
<br>
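
The **rating** column described above can be sketched on a few rows. This is a minimal sketch on a hand-built DataFrame rather than the full dataset: the first two rows are copied from the output above, while "Hypothetical Game" is made up to show that a game with no ratings at all gets NaN (pandas returns NaN for 0/0 instead of raising an error).

```python
import pandas as pd

# Toy frame: the first two rows come from the dataset output above;
# "Hypothetical Game" is invented to show the no-ratings case
toy = pd.DataFrame({
    "name": ["Counter-Strike", "Team Fortress Classic", "Hypothetical Game"],
    "positive_ratings": [124534, 3318, 0],
    "negative_ratings": [3339, 633, 0],
})

total = toy["positive_ratings"] + toy["negative_ratings"]
# Fraction of ratings that are positive; 0 / 0 yields NaN in pandas
toy["rating"] = toy["positive_ratings"] / total
print(toy[["name", "rating"]])
```

Games with NaN ratings may need to be dropped or handled separately before any analysis that uses this column.
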
The columns we will be transforming are:<br>
**release_date**: Keep only the year<br>
**platforms, categories, genres, and steamspy_tags**: One-hot encode for easier analysis<br>
**owners**: Currently a range (e.g., 20000-50000); we will replace it with the midpoint of the range<br>
**minimum**: Currently holds the full text of the minimum hardware requirements. Instead, we will extract just the minimum RAM<br>
requirement (in GB), giving a metric that can easily be compared across games.
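
One way to do the one-hot encoding planned for the semicolon-separated columns is pandas' `Series.str.get_dummies`, which splits each string on a separator and returns one 0/1 indicator column per distinct value. This is a sketch on a few hand-written values in the dataset's `platforms` format; the `platform_` prefix is an illustrative naming choice, not necessarily what later cells use.

```python
import pandas as pd

# Sample values in the dataset's "platforms" format
platforms = pd.Series(["windows;mac;linux", "windows", "windows;mac"])

# Split on ";" and create one indicator column per platform (columns come out sorted)
dummies = platforms.str.get_dummies(sep=";").add_prefix("platform_")
print(dummies)
```

The same call should work for `categories`, `genres`, and `steamspy_tags`, although columns with many distinct values may produce a very wide frame.
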

We will also add a new column, **est_revenue**, the product of the estimated number of owners and the price, as a rough estimate of each game's revenue (in GBP).
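
The owners midpoint and the revenue estimate can also be computed without a row-wise `apply`, using `str.split(..., expand=True)`. This is a sketch on two rows copied from the output above; for Counter-Strike it should give 15,000,000 owners and roughly 107,850,000 GBP, matching the transformed table.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["Counter-Strike", "Team Fortress Classic"],
    "owners": ["10000000-20000000", "5000000-10000000"],
    "price": [7.19, 3.99],
})

# Split "low-high" into two integer columns, then take the midpoint
bounds = toy["owners"].str.split("-", expand=True).astype(int)
toy["owners"] = (bounds[0] + bounds[1]) // 2

# Rough revenue estimate in GBP: estimated owners x current price
toy["est_revenue"] = toy["owners"] * toy["price"]
print(toy)
```

Because it uses the midpoint of the owners range and the current list price (ignoring discounts and price changes), this estimate should be read as an order of magnitude rather than actual sales revenue.
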

```python
def transform_owners(x):
    """Convert an owners range such as "20000-50000" to its midpoint."""
    low, high = (int(part) for part in x.split("-"))
    return (low + high) // 2


def transform_reqs(reqs):
    """Extract the minimum RAM requirement (in GB) from a free-text requirements string.

    Returns np.nan if the value is missing or no RAM amount can be found.
    """
    # Missing values after the join are NaN floats, not strings
    # (an identity check like `reqs is not np.nan` is unreliable)
    if not isinstance(reqs, str):
        return np.nan
    # Remove spaces and lowercase, so "96 MB RAM" becomes "96mbram"
    reqs = reqs.replace(" ", "").lower()
    # Capture the amount (allowing decimals such as 1.5) and the unit (mb or gb);
    # without the decimal part, "1.5gbram" would be read as 5 GB
    res = re.search(r"([0-9]+(?:\.[0-9]+)?)(mb|gb)ram", reqs)
    if res is None:
        return np.nan
    amount = float(res.group(1))
    # Convert MB to GB (decimal units: 1 GB = 1000 MB)
    return amount / 1000 if res.group(2) == "mb" else amount


# Keep only the year of the release date (YYYY-MM-DD -> YYYY)
games_df["release_date"] = games_df["release_date"].str[:4].astype(int)
games_df["owners"] = games_df["owners"].apply(transform_owners)
# Fraction of ratings that are positive (NaN when a game has no ratings)
games_df["rating"] = games_df["positive_ratings"] / (games_df["positive_ratings"] + games_df["negative_ratings"])
# Join the requirements data onto the games data by appid
games_df = games_df.join(requirements_df.set_index("steam_appid"), on="appid")
# Drop unneeded columns mentioned above
games_df.drop(columns=['english', 'achievements', 'required_age', 'positive_ratings', 'negative_ratings'], inplace=True)
# Drop unneeded columns resulting from joining the 2 dataframes
games_df.drop(columns=['pc_requirements', 'mac_requirements', 'linux_requirements', 'recommended'], inplace=True)
games_df["minimum"] = games_df["minimum"].apply(transform_reqs)
# Rename "minimum" to "min_req_ram"
games_df.rename(columns={'minimum': 'min_req_ram'}, inplace=True)
# Rough revenue estimate: estimated owners x current price (GBP)
games_df["est_revenue"] = games_df["owners"] * games_df["price"]
display(games_df.head(10))
```

| | appid | name | release_date | developer | publisher | platforms | categories | genres | steamspy_tags | average_playtime | median_playtime | owners | price | rating | min_req_ram | est_revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | Counter-Strike | 2000 | Valve | Valve | windows;mac;linux | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 17612 | 317 | 15000000 | 7.19 | 0.973888 | 0.096 | 107850000.0 |
| 1 | 20 | Team Fortress Classic | 1999 | Valve | Valve | windows;mac;linux | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 277 | 62 | 7500000 | 3.99 | 0.839787 | 0.096 | 29925000.0 |
| 2 | 30 | Day of Defeat | 2003 | Valve | Valve | windows;mac;linux | Multi-player;Valve Anti-Cheat enabled | Action | FPS;World War II;Multiplayer | 187 | 34 | 7500000 | 3.99 | 0.895648 | 0.096 | 29925000.0 |
| 3 | 40 | Deathmatch Classic | 2001 | Valve | Valve | windows;mac;linux | Multi-player;Online Multi-Player;Local Multi-P... | Action | Action;FPS;Multiplayer | 258 | 184 | 7500000 | 3.99 | 0.826623 | 0.096 | 29925000.0 |
| 4 | 50 | Half-Life: Opposing Force | 1999 | Gearbox Software | Valve | windows;mac;linux | Single-player;Multi-player;Valve Anti-Cheat en... | Action | FPS;Action;Sci-fi | 624 | 415 | 7500000 | 3.99 | 0.947996 | 0.096 | 29925000.0 |
| 5 | 60 | Ricochet | 2000 | Valve | Valve | windows;mac;linux | Multi-player;Online Multi-Player;Valve Anti-Ch... | Action | Action;FPS;Multiplayer | 175 | 10 | 7500000 | 3.99 | 0.801278 | 0.096 | 29925000.0 |
| 6 | 70 | Half-Life | 1998 | Valve | Valve | windows;mac;linux | Single-player;Multi-player;Online Multi-Player... | Action | FPS;Classic;Action | 1300 | 83 | 7500000 | 7.19 | 0.961878 | 0.096 | 53925000.0 |
| 7 | 80 | Counter-Strike: Condition Zero | 2004 | Valve | Valve | windows;mac;linux | Single-player;Multi-player;Valve Anti-Cheat en... | Action | Action;FPS;Multiplayer | 427 | 43 | 15000000 | 7.19 | 0.893871 | 0.096 | 107850000.0 |
| 8 | 130 | Half-Life: Blue Shift | 2001 | Gearbox Software | Valve | windows;mac;linux | Single-player | Action | FPS;Action;Sci-fi | 361 | 205 | 7500000 | 3.99 | 0.90099 | 0.096 | 29925000.0 |
| 9 | 220 | Half-Life 2 | 2004 | Valve | Valve | windows;mac;linux | Single-player;Steam Achievements;Steam Trading... | Action | FPS;Action;Sci-fi | 691 | 402 | 15000000 | 7.19 | 0.965601 | 0.512 | 107850000.0 |
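
As a cross-check on the row-wise `transform_reqs`, the RAM extraction can also be written with the vectorized `Series.str.extract`. This is a sketch on short requirement strings modelled on the `minimum` column; the third string is hypothetical, added to show a decimal amount, and the last entry stands in for a missing requirement. It uses the same decimal convention (1 GB = 1000 MB) as the table above.

```python
import numpy as np
import pandas as pd

reqs = pd.Series([
    "500 mhz processor, 96mb ram, 16mb video card",
    "OS: Windows 7, Vista, XP Processor: 1.7 Ghz Memory: 512MB RAM",
    "Memory: 1.5 GB RAM",   # hypothetical example with a decimal amount
    np.nan,                 # missing requirements
])

# Normalize: drop spaces and lowercase, so "512MB RAM" becomes "512mbram"
normalized = reqs.str.replace(" ", "", regex=False).str.lower()
# Capture the amount and the unit; rows without a match become NaN
parts = normalized.str.extract(r"([0-9]+(?:\.[0-9]+)?)(mb|gb)ram")
amount = parts[0].astype(float)
# Keep GB amounts as-is and convert MB amounts to GB (1 GB = 1000 MB)
min_req_ram = amount.where(parts[1] != "mb", amount / 1000)
print(min_req_ram)
```

Requirement strings that phrase memory differently (for example "512 MB of RAM") will not match this pattern and end up as NaN, so the share of missing `min_req_ram` values is worth checking.
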
