# Table of Contents

* [Loading Data](#Loading-Data)
* [Exploring Sales Data](#Exploring-Sales-Data)
* [Exploring Ratings Data](#Exploring-Ratings-Data)
* [Conclusions](#Conclusions)

In [None]:
# packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Loading Data

In [None]:
# load file and show info
df = pd.read_csv('../input/videogamedata/game_sales_data.csv')
df.info()

# Exploring Sales Data

In [None]:
# create and plot year vs sales data
sales_dict = {}
sales_column = df["Year"].unique()

for year in sales_column:
    # use a descriptive name instead of shadowing the built-in sum()
    total = df.loc[df["Year"] == year, "Total_Shipped"].sum()
    sales_dict[year] = total
    
plt.bar(sales_dict.keys(), sales_dict.values())
plt.xlabel("Year")
plt.ylabel("Total Game Sales")
plt.title("Year vs Total Game Sales")
plt.show()

In [None]:
# create and plot year vs number of games
num_games = {}
games_column = df["Year"].unique()

for year in games_column:
    count = df.loc[df["Year"] == year, "Name"].count()
    num_games[year] = count
    
plt.bar(num_games.keys(), num_games.values())
plt.xlabel("Year")
plt.ylabel("Total Number of Games")
plt.title("Year vs Total Number of Games")
plt.show()

#### Although sales have gone down in the years after 2011, so too has the number of games. The drop in sales is therefore likely explained, at least in part, by the drop in the total number of games in the dataset.
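
One way to check the claim above is to compare average sales per game across years: if that figure stays roughly flat while totals fall, the decline mostly reflects fewer titles. The sketch below is only an illustration. It uses a small made-up `toy` frame (hypothetical values, not the real dataset) that borrows the notebook's column names `Year`, `Name` and `Total_Shipped`.

```python
import pandas as pd

# Hypothetical toy data with the notebook's column names; the values are made up.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Year": [2010, 2010, 2010, 2015, 2015, 2020],
    "Total_Shipped": [10.0, 6.0, 2.0, 8.0, 4.0, 5.0],
})

# Total sales and number of titles per year, computed in one pass.
per_year = toy.groupby("Year").agg(
    total_sales=("Total_Shipped", "sum"),
    num_games=("Name", "count"),
)

# Average sales per title. If this stays flat while total_sales falls,
# the decline is mostly a matter of fewer games being listed.
per_year["sales_per_game"] = per_year["total_sales"] / per_year["num_games"]
print(per_year)
```

Replacing `toy` with `df` should give the same table for the real data, assuming the columns are numeric and named as in the cells above.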

# Exploring Ratings Data

In [None]:
# create and plot the critic and user rating for the best selling game in each year
for col_name in ["Critic_Score", "User_Score"]:
    best_selling = {}
    ratings_dict = {}
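    # sort the years, dropping missing values so the comparison below is well defined
    sorted_ratings = sorted(df["Year"].dropna().unique())
    for year in sorted_ratings:
        if year >= 1985:
            # first row for this year; this is the best seller only if
            # the file is sorted by Total_Shipped in descending order
            index = df.Year.eq(year).idxmax()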
            rating = df.at[index, col_name]
            name = df.at[index, "Name"]
            ratings_dict[year] = rating
            best_selling[year] = []
            best_selling[year].append(name)
            best_selling[year].append(rating)
    
    plt.plot(ratings_dict.keys(), ratings_dict.values())
    plt.xlabel("Year")
    plt.ylim(3.0, 10.0)
    plt.ylabel(col_name)
    plt.title(col_name + " of the best selling game of each year")
    plt.show()

# create a dataframe of the best selling games, including their name and user score
# (best_selling still holds the values from the last loop pass, which was User_Score)
best_selling_df = pd.DataFrame.from_dict(best_selling, orient='index', columns=['Name', 'User_Score'])
best_selling_df

#### While critic scores have stayed mostly consistent, user scores have become quite inconsistent, with the best selling games of 2015, 2017, 2019 and 2020 receiving quite low scores.
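
The lookup above takes the first row found for each year, which gives the best seller only if the file happens to be sorted by sales. A minimal sketch of a selection that does not depend on row order is shown below. The `toy` frame and its values are hypothetical; only the column names come from the notebook.

```python
import pandas as pd

# Hypothetical toy data, deliberately NOT sorted by sales within each year.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2015, 2015, 2016, 2016],
    "Total_Shipped": [3.0, 9.0, 7.0, 1.0],
    "Critic_Score": [8.0, 9.0, 8.5, 6.0],
    "User_Score": [7.0, 4.5, 8.0, 5.0],
})

# Row label of the highest Total_Shipped within each year.
# groupby drops rows whose Year is missing.
top_idx = toy.groupby("Year")["Total_Shipped"].idxmax()

# Pull name and both scores for those rows, indexed by year.
best_per_year = toy.loc[
    top_idx, ["Year", "Name", "Critic_Score", "User_Score"]
].set_index("Year")
print(best_per_year)
```

This keeps both scores in one table, so the result does not depend on which column the plotting loop processed last.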

In [None]:
# create and plot the average critic and user rating per year for 2012 to 2018
for col_name in ["Critic_Score", "User_Score"]:
    avg_ratings_dict = {}
    # sort the years, dropping missing values
    sorted_ratings = sorted(df["Year"].dropna().unique())
    
    for year in sorted_ratings:
        if 2012 <= year <= 2018:
            # mean() skips missing ratings, matching sum() / count()
            average = df.loc[df["Year"] == year, col_name].mean()
            avg_ratings_dict[year] = average
            
    plt.plot(avg_ratings_dict.keys(), avg_ratings_dict.values())
    plt.ylim(6.0, 9.0)
    plt.xlabel("Year")
    plt.ylabel("Average " + col_name)
    plt.title("Average " + col_name + " of games in a given year")
    plt.show()

#### Even when looking at the yearly average over the seven-year period from 2012 to 2018, critic scores have stayed consistent while user scores show a downward trend.
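
To put a rough number on the trend, one option is to fit a straight line to the yearly averages and read off the slope. The sketch below is an assumption about how this could be done, not part of the original analysis. The `toy` frame holds made-up scores, and the variable names `yearly_mean` and `slopes` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using the notebook's column names; the values are made up.
toy = pd.DataFrame({
    "Year": [2012, 2012, 2013, 2013, 2014, 2014],
    "Critic_Score": [8.0, 7.0, 7.5, 7.5, 8.0, 7.0],
    "User_Score": [8.0, 7.0, 7.0, 6.0, 6.0, 5.0],
})

# Keep the same window as the plot (inclusive on both ends) and average per year.
in_range = toy[toy["Year"].between(2012, 2018)]
yearly_mean = in_range.groupby("Year")[["Critic_Score", "User_Score"]].mean()

# Fit score = slope * (year - first_year) + intercept for each column.
years = yearly_mean.index.to_numpy(dtype=float)
x = years - years.min()
slopes = {}
for col in yearly_mean.columns:
    slope, intercept = np.polyfit(x, yearly_mean[col].to_numpy(), deg=1)
    slopes[col] = float(slope)
print(slopes)
```

A slope near zero suggests a stable average, while a clearly negative slope means the average falls by about that many points per year. With only seven yearly points, the fit is a description rather than a statistical test.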

# Conclusions

## Flaws with this analysis
* The dataset is quite small compared to all of the video games that exist.
* The dataset is also still very incomplete, with thousands of games missing ratings.
* Most of the user ratings were taken from Metacritic, so they are probably somewhat biased.
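
The incompleteness mentioned above can be measured directly by counting missing values in the rating columns. The following is a minimal sketch on a made-up `toy` frame (hypothetical values). The column names `Critic_Score` and `User_Score` are taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing rating.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Critic_Score": [8.0, np.nan, 7.0, np.nan],
    "User_Score": [np.nan, np.nan, 6.5, np.nan],
})

score_cols = ["Critic_Score", "User_Score"]

# Number and share of missing ratings per column.
missing = toy[score_cols].isna().sum()
missing_share = toy[score_cols].isna().mean()

# Number of games that have both a critic and a user score.
both_present = int(toy[score_cols].notna().all(axis=1).sum())

print(missing)
print(missing_share)
print(both_present)
```

Running the same lines on `df` would show how many games actually contribute to each of the rating plots.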

## Lessons Learned
* Although the data is flawed, we can still learn something from it.
* There is a trend of newer games having lower user scores despite still selling very well.
* It seems that players are less satisfied with recent games than with older ones.
* More analysis needs to be done to figure out why this is the case.