<a href="https://www.kaggle.com/code/luhaowangsg/exploratory-data-analysis-on-nintendo-games?scriptVersionId=143955255" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import math

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Table of Contents
1. [Preview Data](#1)
2. [Data Cleaning](#2)
3. [Game distribution by platforms](#3)
4. [Trend analysis between metascore and userscore](#4)
5. [Analysis on metascore and userscore based on developers](#5)
6. [Conclusion](#6)

<a id="1"></a> <br>
> ## 1: Loading and previewing dataframe

The dataset contains 9 columns and 1094 entries.\
Below is a quick description of each column.

* **meta_score** - Average scores based on critic reviews
* **title** - Title of the game
* **platform** - Platform that the game was released on
* **date** - Date of release
* **user_score** - Average scores based on user reviews
* **link** - Link to the game
* **esrb_rating** - Age and content rating by the Entertainment Software Rating Board
* **developers** - Developers who worked on the game title
* **genres** - Genres that the game title belongs to

In [None]:
df = pd.read_csv('/kaggle/input/nintendo-games/NintendoGames.csv')
df.head()

In [None]:
df.shape

Previewing the dataset shows that it contains null values, mostly in the metascore and userscore columns.\
These null values will be treated in the next step.

In [None]:
df.info()
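
As a complement to `df.info()`, the nulls can be counted per column directly. The snippet below is a minimal sketch on a made-up toy frame (the values are illustrative, not taken from the dataset); only the column names `meta_score`, `user_score`, and `title` come from the description above.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "meta_score": [90.0, None, 75.0, None],
    "user_score": [8.5, 7.0, None, None],
    "title": ["A", "B", "C", "D"],
})

# Count missing values per column, largest count first.
null_counts = toy.isna().sum().sort_values(ascending=False)
print(null_counts)
```

On the real data, the same call would be `df.isna().sum()`.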

<a id="2"></a> <br>
> ## 2: Data Cleaning

As the missing meta scores, user scores, and ESRB ratings could not be reliably filled in, rows with a null value in any of these columns were dropped.

This leaves a total of 656 entries.

In [None]:
df_clean = df.dropna()
df_clean.info()
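
Note that `dropna()` without arguments drops a row if *any* column is null, not only the three score and rating columns. The sketch below, on a made-up toy frame, contrasts that with the `subset` argument, which restricts the check to chosen columns. Whether the two give different row counts on the real dataset is not verified here.

```python
import pandas as pd

# Toy frame (values are made up); "link" has a null in an otherwise complete row.
toy = pd.DataFrame({
    "meta_score": [90.0, None, 75.0, 80.0],
    "user_score": [8.5, 7.0, None, 8.0],
    "esrb_rating": ["E", "T", "E", "E"],
    "link": ["x", "y", "z", None],
})

# Drops every row that has a null in any column.
drop_any = toy.dropna()

# Drops only rows with a null in the three columns named in the text.
drop_subset = toy.dropna(subset=["meta_score", "user_score", "esrb_rating"])

print(len(drop_any), len(drop_subset))
```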

A quick look at the summary statistics for metascore and userscore shows that the highest scores are 99 and 9.6 respectively, and the lowest scores are 37 and 3.4 respectively.

In [None]:
df_clean.describe()

<a id="3"></a> <br>
> ## 3: Distribution by platforms

### Creating a pie chart to understand the distribution of platforms for Nintendo games

A pie chart was created to show the distribution of games by platform. Unsurprisingly, Nintendo's proprietary consoles, the Switch and 3DS, are among the Top 3.

Top 3 platforms for Nintendo games
1. 3DS
2. Switch
3. DS

In [None]:
data=df_clean['platform'].value_counts()
labels = data.index

#create pie chart
plt.figure(figsize = (8,8))
plt.pie(data, labels = labels, autopct='%.0f%%')
plt.title('Distribution of games by platforms')

plt.show()
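
The percentages shown on the pie chart can also be read off as a table. This is a minimal sketch on a made-up list of platforms (the counts are illustrative only); on the real data the series would be `df_clean['platform']`.

```python
import pandas as pd

# Toy platform column (made-up counts, for illustration only).
platforms = pd.Series(
    ["3DS", "3DS", "3DS", "Switch", "Switch", "DS"], name="platform"
)

# normalize=True returns shares instead of raw counts; convert to percent.
shares = platforms.value_counts(normalize=True).mul(100).round(1)

# value_counts sorts in descending order, so head(3) gives the top 3.
top3 = shares.head(3)
print(top3)
```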

<a id="4"></a> <br>
> ## 4: Trend analysis between metascore and userscore

### Exploring trends between metascore and userscore

Apart from several outliers, the scatterplot shows that the higher a game title's metascore, the higher its user score tends to be.\
As such, we can hypothesise that there is a positive correlation between metascore and userscore.

**H1: There is a positive correlation between metascore and userscore**

In [None]:
sns.set_style(style='whitegrid')
sns.regplot(
    data=df_clean,
    x = "meta_score",
    y = "user_score",
    scatter_kws={'alpha':0.3}
)

plt.title("Exploring relationship between metascore and userscore")
plt.show()

### Hypothesis testing

To test the hypothesis, `np.corrcoef()` from the NumPy package was used, which returned a correlation coefficient of 0.634 between metascore and user score.\
To test whether the correlation coefficient was statistically significant, the p-value was calculated using SciPy's `pearsonr()`.

The calculated p-value was 3.4361881274100374 x 10<sup>-75</sup>, which is well below the 0.05 significance threshold.

As such, it can be concluded that the correlation is statistically significant and that H1 is supported.

In [None]:
np.corrcoef(df_clean["meta_score"], df_clean["user_score"])

In [None]:
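# Import from scipy.stats; the scipy.stats.stats namespace is deprecated
from scipy.stats import pearsonr

# pearsonr returns the correlation coefficient and the two-sided p-value
value = pearsonr(df_clean["meta_score"], df_clean["user_score"])

print("Correlation Coefficient: ", value[0])
print("p-value: ", value[1])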
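
For readers who want to see what the coefficient measures, the sketch below computes Pearson's r by hand on made-up toy scores and checks it against `np.corrcoef`. Note that `np.corrcoef` returns a 2x2 matrix, and the coefficient between the two variables is the off-diagonal entry. The arrays `meta` and `user` are illustrative stand-ins, not values from the dataset.

```python
import numpy as np

# Toy scores (made up); stand-ins for the meta_score and user_score columns.
meta = np.array([60.0, 70.0, 80.0, 90.0, 95.0])
user = np.array([6.5, 6.0, 7.5, 8.0, 9.0])

# Pearson's r: sum of products of centered values, divided by the
# square root of the product of the two sums of squared deviations.
meta_c = meta - meta.mean()
user_c = user - user.mean()
r_manual = (meta_c * user_c).sum() / np.sqrt((meta_c**2).sum() * (user_c**2).sum())

# np.corrcoef returns a 2x2 matrix; [0, 1] is the meta/user coefficient.
r_numpy = np.corrcoef(meta, user)[0, 1]
print(r_manual, r_numpy)
```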

<a id="5"></a> <br>
> ## 5: Analysis on metascore and userscore based on developers 

### Analysis of metascore and userscore in relation to developers

An extended analysis was conducted to find the Top 10 developers by metascore and by userscore.\
The previous hypothesis test revealed a positive correlation between metascore and userscore.\
<br>On that premise, we would expect the Top 10 developers ranked by metascore to bear some similarities to the Top 10 developers ranked by user score.

<br>However, the result was surprising: certain game developers appeared in the Top 10 ranking by metascore but not in the Top 10 ranking by user score, and vice versa.

<br>Examples of such developers are Bandai Namco Games and Monolith Soft. These two developers are explored further in the later sections.

In [None]:
dev_grouped = df_clean
average_metascore = dev_grouped.groupby('developers')['meta_score'].mean().round(2)
average_metascore.sort_values(ascending = False).head(10)

In [None]:
dev_grouped = df_clean
average_userscore = dev_grouped.groupby('developers')['user_score'].mean().round(2)
average_userscore.sort_values(ascending = False).head(10)
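
Rather than comparing the two rankings by eye, the overlap can be computed with set operations. The following is a rough sketch on a made-up toy frame with hypothetical developer names and a top-2 cutoff; on the real data, `top_n` would be 10 and the frame would be `df_clean`.

```python
import pandas as pd

# Toy frame with hypothetical developers and made-up scores.
toy = pd.DataFrame({
    "developers": ["Dev A", "Dev A", "Dev B", "Dev C", "Dev D"],
    "meta_score": [95, 91, 88, 70, 85],
    "user_score": [7.0, 7.4, 9.1, 8.8, 6.0],
})

# Mean of both scores per developer in one groupby.
means = toy.groupby("developers")[["meta_score", "user_score"]].mean()

top_n = 2  # would be 10 in the notebook
top_meta = set(means["meta_score"].nlargest(top_n).index)
top_user = set(means["user_score"].nlargest(top_n).index)

in_both = top_meta & top_user    # developers in both rankings
only_meta = top_meta - top_user  # ranked by metascore only
only_user = top_user - top_meta  # ranked by user score only
print(in_both, only_meta, only_user)
```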

### Exploring titles by Monolith Soft

To start the exploratory analysis of Monolith Soft, the dataframe was filtered to include only titles from Monolith Soft.

In this dataset, Monolith Soft developed a total of 8 titles.

When sorted by metascore, the Top 3 titles were
1. Xenoblade Chronicles 3 Expansion Pass Wave 4
2. Xenoblade Chronicles 3
3. Xenoblade Chronicles

However, when sorted by user score, the Top 3 titles were slightly different.
1. Xenoblade Chronicles X
2. Xenoblade Chronicles 3
3. Xenoblade Chronicles 3 Expansion Pass Wave 4

In [None]:
mono_soft_df = df_clean[df_clean['developers'].str.contains('Monolith Soft')]
mono_soft_df.head()

In [None]:
mono_soft_df.info()

In [None]:
mono_soft_df.sort_values(by='meta_score', ascending = False)

In [None]:
mono_soft_df.sort_values(by='user_score', ascending = False)

#### Scatterplot for Monolith Soft titles.

A scatterplot was plotted to analyse the relationship between the metascores and user scores of titles from Monolith Soft. It shows some exceptions where the metascore is not reflected in the user score.

In [None]:
sns.set_style(style='whitegrid')
sns.regplot(
    data=mono_soft_df,
    x = "meta_score",
    y = "user_score",
    scatter_kws={'alpha':0.3}
)

plt.title("Exploring relationship between metascore and userscore (Monolith Soft)")
plt.show()
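
One way to make such exceptions concrete is to put both scores on the same scale and rank titles by the gap. The sketch below assumes metascores are on a 0 to 100 scale and user scores on a 0 to 10 scale, which is consistent with the maxima reported earlier. The titles and values are made up, and `score_gap` is a hypothetical column name introduced here.

```python
import pandas as pd

# Toy frame with hypothetical titles and made-up scores.
toy = pd.DataFrame({
    "title": ["Game X", "Game Y", "Game Z"],
    "meta_score": [90, 80, 70],
    "user_score": [7.5, 8.2, 7.0],
})

# Rescale meta_score to 0-10, then subtract user_score.
# Positive gap: critics rated the title higher than users did.
toy["score_gap"] = toy["meta_score"] / 10 - toy["user_score"]

# Rank titles by the absolute size of the disagreement.
ranked = toy.sort_values("score_gap", key=lambda s: s.abs(), ascending=False)
print(ranked[["title", "score_gap"]])
```

The same two lines could be applied to `mono_soft_df` or `bandai_namco_df` to list the titles where critics and users disagree most.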

### Exploring titles by Bandai Namco

To start the exploratory analysis of Bandai Namco, the dataframe was filtered to include only titles from Bandai Namco.

In this dataset, Bandai Namco developed a total of 8 titles.

When sorted by metascore, the Top 3 titles were
1. Super Smash Bros Ultimate
2. Super Smash Bros Ultimate for Wii U
3. Super Smash Bros Ultimate for 3DS

However, when sorted by user score, the Top 3 titles were the same but ranked differently.
1. Super Smash Bros Ultimate for Wii U
2. Super Smash Bros Ultimate
3. Super Smash Bros Ultimate for 3DS

In [None]:
bandai_namco_df = df_clean[df_clean['developers'].str.contains('Bandai Namco Games')]
bandai_namco_df.head()

In [None]:
bandai_namco_df.info()

In [None]:
bandai_namco_df.sort_values(by='meta_score', ascending = False)

In [None]:
bandai_namco_df.sort_values(by='user_score', ascending = False)

#### Scatterplot for Bandai Namco titles.

A scatterplot was plotted to analyse the relationship between the metascores and user scores of titles from Bandai Namco. It shows some exceptions where the metascore is not reflected in the user score.

In [None]:
sns.set_style(style='whitegrid')
sns.regplot(
    data=bandai_namco_df,
    x = "meta_score",
    y = "user_score",
    scatter_kws={'alpha':0.3}
)

plt.title("Exploring relationship between metascore and userscore (Bandai Namco)")
plt.show()

<a id="6"></a> <br>
> ## 6: Conclusion

To conclude, even though there is statistically significant support for a positive correlation between metascore and user score for game titles published by Nintendo, these scores remain subjective to players and game reviewers, as seen from the outliers from Monolith Soft and Bandai Namco.

This quick exploratory analysis supports the age-old saying:

> ## *Always take reviews with a pinch of salt!*