# **Predicting The Success of Video Games**
##### Kevin Rathbun

### **1. Introduction**
Video game developers and companies develop their games with one major goal in mind: to produce a successful game.<br>
But what makes a game successful and how would we measure its success? In this data exploration project I will<br>
explore these questions to provide insight and analysis into what makes a game successful and will create a<br>
method to predict the success of a game based on various metrics.<br>
<br><br>
The metrics I will be looking at are the developer, the publisher, the supported platforms, the required age<br>
to play, the categories and genres the game falls into, and the hardware requirements to play the game.<br>
This is by no means a comprehensive list of what makes a game successful, but will serve as useful metrics<br>
in determining the success of a game. To measure success, I will look at playtimes of users, the ratings of the<br>
game, and the revenue generated from sales.<br>
<br><br>
A dataset produced by Nik Davis on Kaggle (available at https://www.kaggle.com/datasets/nikdavis/steam-store-games)<br>
is well suited to this analysis. It contains the metrics I need for over 27,000 Steam games.<br>
<br><br>
What is Steam? Steam is a digital PC game distribution platform created by Valve Corporation in 2003.<br>
It is the largest and most popular platform for PC gaming worldwide with over 30,000 games. With Steam,<br>
users are able to browse the store for games, watch game trailers, interact with friends, receive tailored<br>
game recommendations, and much more.<br>
(sources: https://pcgamesforsteam.com/what-is-steam,<br>
https://www.uktech.news/other_news/best-pc-video-game-digital-distribution-services)<br>

#### The Dataset:<br>
**18 Columns:**<br>
**0: appid:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;The unique appid (integer) associated with the game<br>
**1: name:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;The name of the game<br>
**2: release_date:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;When the game was first released on Steam in YYYY-MM-DD format<br>
**3: english:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;1 if the game is in English, 0 otherwise<br>
**4: developer:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Name(s) of developer(s) separated by semicolon if multiple<br>
**5: publisher:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Name(s) of publisher(s) separated by semicolon if multiple<br>
**6: platforms:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;The supported platform(s) separated by semicolons if multiple.<br>
&nbsp;&nbsp;&nbsp;&nbsp;Possible platforms are windows;mac;linux.<br>
**7: required_age:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Minimum age required according to PEGI UK standards.<br>
&nbsp;&nbsp;&nbsp;&nbsp;Entries of 0 are unrated or unsupplied<br>
**8: categories:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Categories associated with the game separated by semicolons e.g., Single-player;Multi-player<br>
**9: genres:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;The genres associated with the game separated by semicolons e.g., Action;Indie<br>
**10: steamspy_tags:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;The steamspy_tags associated with the game separated by semicolons.<br>
&nbsp;&nbsp;&nbsp;&nbsp;Similar to genres, but are community voted e.g., Action;Indie<br>
**11: achievements:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Number of achievements in the game<br>
**12: positive_ratings:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Number of positive ratings on Steam<br>
**13: negative_ratings:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Number of negative ratings on Steam<br>
**14: average_playtime:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Average playtime of users in minutes<br>
**15: median_playtime:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Median playtime of users in minutes<br>
**16: owners:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Estimated number of owners (lower and upper bound) e.g., 20000-50000<br>
**17: price:**<br>
&nbsp;&nbsp;&nbsp;&nbsp;Price of game in GBP<br>
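
Several of these columns pack multiple values into one string (semicolon-separated lists, and a lower-upper range for `owners`). The following is a minimal sketch of how such fields could be unpacked with pandas. The two sample rows are taken from the first rows of the dataset, and the names `sample`, `owners_lower`, and `owners_upper` are illustrative assumptions, not columns of the original data.

```python
import pandas as pd

# Two illustrative rows taken from the first rows of the dataset (not the full file).
sample = pd.DataFrame({
    "name": ["Counter-Strike", "Half-Life: Opposing Force"],
    "release_date": ["2000-11-01", "1999-11-01"],
    "platforms": ["windows;mac;linux", "windows;mac;linux"],
    "steamspy_tags": ["Action;FPS;Multiplayer", "FPS;Action;Sci-fi"],
    "owners": ["10000000-20000000", "5000000-10000000"],
})

# Semicolon-separated strings become Python lists.
sample["platforms"] = sample["platforms"].str.split(";")
sample["steamspy_tags"] = sample["steamspy_tags"].str.split(";")

# "lower-upper" owner estimates become two integer columns (hypothetical names).
bounds = sample["owners"].str.split("-", expand=True).astype(int)
sample["owners_lower"] = bounds[0]
sample["owners_upper"] = bounds[1]

# YYYY-MM-DD strings become datetime values.
sample["release_date"] = pd.to_datetime(sample["release_date"], format="%Y-%m-%d")

print(sample[["name", "steamspy_tags", "owners_lower", "owners_upper"]])
```

Keeping the two owner bounds as separate integers makes it possible to sort or filter by ownership later without re-parsing the string.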

#### Important Notes:<br>
It is important to note that this data was gathered around May of 2019, so it is not entirely representative of<br>
the games on Steam today or the metrics associated with them.<br>
It should also be noted that Steam is for PC games *only* so this data is not representative of console games<br>
or their playerbase.<br>

### **2. Gathering and Transforming the Data**

In [69]:
# Imports
import pandas as pd

In [70]:
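# Increase the max number of columns and rows which can be displayed.
# The full "display." option names are used because the short forms
# ("max_columns", "max_rows") match more than one option in newer pandas versions.
pd.set_option("display.max_columns", 30)
pd.set_option("display.max_rows", 100)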
# Load the csv into a pandas dataframe
games_df = pd.read_csv('steam_games.csv')
display(games_df.head())

Unnamed: 0,appid,name,release_date,english,developer,publisher,platforms,required_age,categories,genres,steamspy_tags,achievements,positive_ratings,negative_ratings,average_playtime,median_playtime,owners,price
0,10,Counter-Strike,2000-11-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,124534,3339,17612,317,10000000-20000000,7.19
1,20,Team Fortress Classic,1999-04-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,3318,633,277,62,5000000-10000000,3.99
2,30,Day of Defeat,2003-05-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Valve Anti-Cheat enabled,Action,FPS;World War II;Multiplayer,0,3416,398,187,34,5000000-10000000,3.99
3,40,Deathmatch Classic,2001-06-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,1273,267,258,184,5000000-10000000,3.99
4,50,Half-Life: Opposing Force,1999-11-01,1,Gearbox Software,Valve,windows;mac;linux,0,Single-player;Multi-player;Valve Anti-Cheat en...,Action,FPS;Action;Sci-fi,0,5250,288,624,415,5000000-10000000,3.99
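
The `Unnamed: 0` column in the output above suggests that the CSV file was saved together with a row index. As a hedged sketch, assuming that is the case, passing `index_col=0` to `pd.read_csv` would read that column as the index instead of as data. The `csv_text` excerpt below is a hypothetical stand-in for `steam_games.csv`, trimmed to three columns.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt standing in for steam_games.csv; the empty
# first header field is what pandas displays as "Unnamed: 0".
csv_text = (
    ",appid,name,price\n"
    "0,10,Counter-Strike,7.19\n"
    "1,20,Team Fortress Classic,3.99\n"
)

# Default read: the saved index shows up as an extra "Unnamed: 0" column.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the row index instead.
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(without_extra.columns))
```

Dropping the column after loading would work equally well; this variant only avoids creating it in the first place.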


Now, let's make the dataframe better suited to our analysis by removing unnecessary columns and transforming the remaining ones.