# Donkey Games -- Game Sales Report

## Business context

This report aims to help Donkey Games decide what its next game project should be. It considers several factors, including platform and game genre.

## Data

### Data source and format

The data was sourced from VGChartz, a video game industry news website with a focus on console and game sales. Four CSV files are used in this report.

### Data quality and bias

The data contains records of game sales between 1970 and 2019, across many different platforms and game genres. Many values are missing and will need to be addressed. In terms of bias, some gaming platforms are more heavily represented than others. Overall, however, there is plenty of data from which to extract potentially valuable insights.
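As a rough illustration of how missing values and platform representation might be checked, here is a minimal sketch. The small DataFrame and its column names are hypothetical stand-ins for the real files, not the actual VGChartz schema.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for one of the CSV files;
# the column names are assumptions made for illustration only.
sample = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C", "Game D"],
    "platform": ["PC", "PS4", "PS4", "XOne"],
    "critic_score": [8.5, np.nan, 7.0, np.nan],
    "global_sales": [1.2, 0.4, np.nan, 2.1],
})

# Share of missing values per column, as a fraction between 0 and 1.
missing_share = sample.isna().mean()

# Number of records per platform, to expose over-represented platforms.
platform_counts = sample["platform"].value_counts()

print(missing_share)
print(platform_counts)
```

Running the same two checks on each real file would show which columns need cleaning and how uneven the platform coverage is.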

### Ethical considerations

The data does not contain anything that could be considered PII (personally identifiable information). However, the datasets include links to box art, so copyright law might be a valid concern.

## Modelling

### Methodology

After an initial data cleaning phase, the resulting joined dataset was used to create an explanatory multiple linear regression model. The aim was to attempt to explain which factors might impact a game's number of global sales. Multiple models were created iteratively, leading to different insights. The models created are available for review in individual notebooks, and the insights extracted are summarised below.
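To make the idea of an explanatory multiple linear regression concrete, the sketch below fits one by ordinary least squares on synthetic data. It is not the model from the notebooks: the predictors, coefficients, and noise level are all made up, and NumPy is used here only to keep the example self-contained.

```python
import numpy as np

# Synthetic data only: the predictors and true coefficients are invented.
rng = np.random.default_rng(0)
n = 200
critic_score = rng.uniform(1, 10, n)
na_sales = rng.uniform(0, 5, n)
noise = rng.normal(0, 0.1, n)
global_sales = 0.5 + 0.2 * critic_score + 1.8 * na_sales + noise

# Design matrix: an intercept column followed by the two predictors.
X = np.column_stack([np.ones(n), critic_score, na_sales])

# Ordinary least squares fit; coef holds [intercept, critic_score, na_sales].
coef, *_ = np.linalg.lstsq(X, global_sales, rcond=None)

# R squared: the share of variance in global sales explained by the model.
fitted = X @ coef
ss_res = np.sum((global_sales - fitted) ** 2)
ss_tot = np.sum((global_sales - global_sales.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(dict(zip(["intercept", "critic_score", "na_sales"], coef.round(3))))
print(round(r_squared, 3))
```

In an explanatory setting, the fitted coefficients are read as the expected change in global sales for a one-unit change in each predictor, holding the others fixed.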

### Insights

The main insights gained through modelling the data available to us are as follows:

- European and North American sales are good indicators of global sales performance. If a game performs well in either of those markets, it is likely that it would perform well globally.
- Critic scores are a good indicator of how well a game will perform globally. The higher the average critic score, the more copies the game is likely to sell. User scores, on the other hand, are less strongly correlated with global sales.
- The following game genres appear more likely to sell higher numbers of copies globally: action, action-adventure, MMO, RPG, sandbox, shooter.
- Unfortunately the models did not lend themselves to determining which publishers or platforms were more likely to have high sales numbers. Perhaps more data sources could be used in the future to help with these predictors.

Although the process led to some valuable insights, there is room to build more robust models once more varied data is available.
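The comparison between critic and user scores can be illustrated with a simple correlation check. This is a minimal sketch on invented numbers; the column names `critic_score`, `user_score`, and `global_sales` are assumed here, and the values are not taken from the report's data.

```python
import pandas as pd

# Invented values for illustration; not taken from the VGChartz data.
scores = pd.DataFrame({
    "critic_score": [9.0, 8.0, 7.0, 6.0, 5.0, 4.0],
    "user_score": [6.0, 9.0, 5.0, 8.0, 7.0, 6.5],
    "global_sales": [5.1, 4.2, 2.9, 2.2, 1.0, 0.6],
})

# Pearson correlation of each score column with global sales.
corr = scores.corr()["global_sales"].drop("global_sales")

print(corr)
```

A noticeably larger coefficient for one score type than the other would be consistent with the insight above, although correlation alone does not show that better reviews cause higher sales.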

## Global overview across all platforms

In [1]:
import pandas as pd
import plotly.express as px
from pyprojroot import here

In [2]:
import plotly.io as pio
pio.renderers.default='notebook'

In [3]:
sales_jp = pd.read_csv(here("./clean_data/sales-jp.csv"))

In [22]:
fig = px.histogram(sales_jp, x="genre", y="global_sales",
             labels={
                 "genre": "Game Genre",
                 "global_sales": "Global Sales (millions)"
             },
             title="Global Sales by Game Genre")
fig.update_xaxes(tickangle=45)
fig.show()

In [5]:
fig = px.bar(sales_jp.groupby("genre")["global_sales"].mean().reset_index(name="mean global sales"),
             x="genre", y="mean global sales",
             labels={
                 "genre": "Game Genre",
                 "mean global sales": "Mean Global Sales per Game (millions)"
             },
             title="Mean Global Sales by Game Genre")
fig.update_xaxes(tickangle=45)
fig.show()

In [24]:
fig = px.scatter(sales_jp, x="critic_score", y="global_sales", 
             labels={
                 "critic_score": "Aggregate Critic Score",
                 "global_sales": "Global Sales (millions)"
             },
             title="Global Game Sales by Aggregate Critic Score")
fig.show()

In [25]:
fig = px.histogram(sales_jp, x="year", y="global_sales", color="genre", 
                   labels={
                       "year": "Year",
                       "global_sales": "Global Sales (millions)",
                       "genre": "Game Genre"
                   },
                   title="Global Game Sales Across All Platforms")
fig.update_xaxes(tickangle=45)
fig.show()

Although some aspects of this stacked bar plot are hard to interpret, it gives a sense of the scale of the video game industry. It also supports some of our modelling findings: shooters and action games appear to have been quite popular in recent years.

In [8]:
fig = px.histogram(sales_jp.loc[sales_jp["global_sales"].isin(sales_jp['global_sales'].nlargest(n=10))], 
                   x="name", y="global_sales", color="genre", 
                   labels={
                       "name": "Game Title",
                       "global_sales": "Global Sales (millions)",
                       "genre": "Game Genre"
                   },
                   title="Top 10 Best Selling Games Across All Platforms")
fig.update_xaxes(tickangle=45)
fig.show()

Here we have plotted the global sales for the 10 best performing games in our data. As we can see, there is a wide spread of different genres, and 8 of the games were released on Nintendo systems. The issue for us is that Nintendo produce most of their own games in-house, so we will mostly be focusing on other platforms.

For the following sections of the report, we will focus on PC games, as well as the two major competing console platforms, Sony's PlayStation 4 and Microsoft's Xbox One.

## PC

In [9]:
sales_pc = pd.read_csv(here("./clean_data/sales-pc.csv"))

In [30]:
fig = px.histogram(sales_pc, x="genre", y="global_sales", 
             labels={
                 "genre": "Game Genre",
                 "global_sales": "Global Sales (millions)"
             },
             title="Global Sales by Game Genre (PC)")
fig.update_xaxes(tickangle=45)
fig.show()

In [26]:
fig = px.bar(sales_pc.groupby("genre")["global_sales"].mean().reset_index(name="mean global sales"),
             x="genre", y="mean global sales",
             labels={
                 "genre": "Game Genre",
                 "mean global sales": "Mean Global Sales per Game (millions)"
             },
             title="Mean Global Sales by Game Genre (PC)")
fig.update_xaxes(tickangle=45)
fig.show()

In [32]:
# Total global sales per year and genre
sales_year_genre_pc = sales_pc.groupby(['year', 'genre']).agg(
    sales=pd.NamedAgg(column='global_sales', aggfunc='sum')
).reset_index()

# Keep only the five genres shown in the chart
sales_year_genre_pc = sales_year_genre_pc.loc[sales_year_genre_pc["genre"].isin(
    ["Shooter", "Action", "Role-Playing", "Strategy", "Simulation"])]

fig = px.line(sales_year_genre_pc, x="year", y="sales", color="genre", line_shape="spline", 
             labels={
                 "year": "Year",
                 "sales": "Global Sales (millions)",
                 "genre": "Game Genre"
             },
             title="Global Sales by Year and Genre - Top 5 PC Game Genres")
fig.update_xaxes(tickangle=45)
fig.show()
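
The genre list in the cell above is hard-coded. One possible way to derive the top five genres from the data is sketched below on a small made-up DataFrame; ranking by total global sales is an assumption about how "top" is defined, and the values are illustrative only.

```python
import pandas as pd

# Made-up sales records standing in for the platform dataset.
sales = pd.DataFrame({
    "genre": ["Shooter", "Action", "Strategy", "Puzzle", "Shooter",
              "Racing", "Simulation", "Role-Playing", "Action"],
    "global_sales": [3.0, 2.0, 1.5, 0.1, 2.5, 0.2, 1.0, 1.8, 1.1],
})

# Rank genres by total global sales and keep the five largest.
top_genres = (
    sales.groupby("genre")["global_sales"]
    .sum()
    .nlargest(5)
    .index.tolist()
)

# Filter the records down to those genres before plotting.
top_sales = sales.loc[sales["genre"].isin(top_genres)]

print(top_genres)
```

The same pattern could be reused for each platform so that the charted genres follow the data rather than a fixed list.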

## PlayStation 4

In [13]:
sales_ps4 = pd.read_csv(here("./clean_data/sales-ps4.csv"))

In [29]:
fig = px.histogram(sales_ps4, x="genre", y="global_sales", 
             labels={
                 "genre": "Game Genre",
                 "global_sales": "Global Sales (millions)"
             },
             title="Global Sales by Game Genre (PS4)")
fig.update_xaxes(tickangle=45)
fig.show()

In [27]:
fig = px.bar(sales_ps4.groupby("genre")["global_sales"].mean().reset_index(name="mean global sales"),
             x="genre", y="mean global sales",
             labels={
                 "genre": "Game Genre",
                 "mean global sales": "Mean Global Sales per Game (millions)"
             },
             title="Mean Global Sales by Game Genre (PS4)")
fig.update_xaxes(tickangle=45)
fig.show()

In [33]:
# Total global sales per year and genre
sales_year_genre_ps4 = sales_ps4.groupby(['year', 'genre']).agg(
    sales=pd.NamedAgg(column='global_sales', aggfunc='sum')
).reset_index()

# Keep only the five genres shown in the chart
sales_year_genre_ps4 = sales_year_genre_ps4.loc[sales_year_genre_ps4["genre"].isin(
    ["Action", "Action-Adventure", "Role-Playing", "Sports", "Shooter"])]

fig = px.line(sales_year_genre_ps4, x="year", y="sales", color="genre", line_shape="spline", 
             labels={
                 "year": "Year",
                 "sales": "Global Sales (millions)",
                 "genre": "Game Genre"
             },
             title="Global Sales by Year and Genre - Top 5 PS4 Game Genres")
fig.update_xaxes(tickangle=45)
fig.show()

## Xbox One

In [17]:
sales_xbox = pd.read_csv(here("./clean_data/sales-xbox.csv"))

In [31]:
fig = px.histogram(sales_xbox, x="genre", y="global_sales", 
             labels={
                 "genre": "Game Genre",
                 "global_sales": "Global Sales (millions)"
             },
             title="Global Sales by Game Genre (Xbox One)")
fig.update_xaxes(tickangle=45)
fig.show()

In [28]:
fig = px.bar(sales_xbox.groupby("genre")["global_sales"].mean().reset_index(name="mean global sales"),
             x="genre", y="mean global sales",
             labels={
                 "genre": "Game Genre",
                 "mean global sales": "Mean Global Sales per Game (millions)"
             },
             title="Mean Global Sales by Game Genre (Xbox One)")
fig.update_xaxes(tickangle=45)
fig.show()

In [34]:
# Total global sales per year and genre
sales_year_genre_xbox = sales_xbox.groupby(['year', 'genre']).agg(
    sales=pd.NamedAgg(column='global_sales', aggfunc='sum')
).reset_index()

# Keep only the five genres shown in the chart
sales_year_genre_xbox = sales_year_genre_xbox.loc[sales_year_genre_xbox["genre"].isin(
    ["Action", "Shooter", "Action-Adventure", "Sports", "Racing"])]

fig = px.line(sales_year_genre_xbox, x="year", y="sales", color="genre", line_shape="spline", 
             labels={
                 "year": "Year",
                 "sales": "Global Sales (millions)",
                 "genre": "Game Genre"
             },
             title="Global Sales by Year and Genre - Top 5 Xbox One Game Genres")
fig.update_xaxes(tickangle=45)
fig.show()

## Recommendations