# Video game Sales - Analysis

- You should remain focused on your research question(s) - it is very easy to get lost down rabbit holes in data analysis projects.

- If you find that your research questions are not that interesting, or you find more interesting questions (especially after your EDA) you may revise them, or add more.

- Use the lab times, as well as our office hours (TAs and instructors), to get help and guidance on your analyses.

- You should experiment with “plenty of” data visualizations to try and visualize your dataset and answer your research questions.

- Give us a narrative/story of your explorations as you go along, in-line with your data - use the new Markdown skills you learned in Task 1!

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas_profiling as pdp

sns.set_style("ticks")

import sys
sys.path.insert(1, '../scripts')

# Import project_functions.py, which contains the project's helper functions
import project_functions

## Intro
This analysis looks at different aspects of videogames over the past ~40 years, using a random sample of videogames taken as representative of the entire industry, to relate real-life qualitative events to measurable changes in the sales and consumption of videogames.

The data used in this analysis was obtained from [Kaggle](https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings), and is a user-compiled data set which extends a scrape already performed by Metacritic, *"a website that aggregates reviews of films, TV shows, music albums, video games and formerly, books"* ([Wikipedia](https://en.wikipedia.org/wiki/Metacritic)). It includes over 16,000 entries, of which 6,825 entries have all descriptors complete.

Each entry in this dataset is a videogame together with various pieces of information about it. Although the dataset contains many entries, the extracted information will show that it does not contain all videogames from 1980-2016, so in this analysis it is treated as a sample of that full set.

We start by cleaning the data:
  - removing all rows with NA entries
  - removing rows whose user score is TBD
  - casting each column to its proper datatype (several are loaded as generic object columns), i.e. integers, floats, strings, or categories

Below is the head of our cleaned data:

In [2]:
Cleaned_DataFrame = project_functions.load_and_process()
Cleaned_DataFrame.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16719 entries, 0 to 16718
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             16717 non-null  object 
 1   Platform         16719 non-null  object 
 2   Year_of_Release  16450 non-null  float64
 3   Genre            16717 non-null  object 
 4   Publisher        16665 non-null  object 
 5   NA_Sales         16719 non-null  float64
 6   EU_Sales         16719 non-null  float64
 7   JP_Sales         16719 non-null  float64
 8   Other_Sales      16719 non-null  float64
 9   Global_Sales     16719 non-null  float64
 10  Critic_Score     8137 non-null   float64
 11  Critic_Count     8137 non-null   float64
 12  User_Score       10015 non-null  object 
 13  User_Count       7590 non-null   float64
 14  Developer        10096 non-null  object 
 15  Rating           9950 non-null   object 
dtypes: float64(9), object(7)
memory usage: 2.0+ MB
None


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rawdata.User_Score[rownumber] = 'NaN'


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6825 entries, 0 to 6824
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   Name             6825 non-null   string  
 1   Platform         6825 non-null   category
 2   Year_of_Release  6825 non-null   int64   
 3   Genre            6825 non-null   category
 4   Publisher        6825 non-null   string  
 5   NA_Sales         6825 non-null   float64 
 6   EU_Sales         6825 non-null   float64 
 7   JP_Sales         6825 non-null   float64 
 8   Other_Sales      6825 non-null   float64 
 9   Global_Sales     6825 non-null   float64 
 10  Critic_Score     6825 non-null   float64 
 11  Critic_Count     6825 non-null   float64 
 12  User_Score       6825 non-null   float64 
 13  User_Count       6825 non-null   float64 
 14  Developer        6825 non-null   string  
 15  Rating           6825 non-null   category
dtypes: category(3), float64(9), int64(1), stri

| | Name | Platform | Year_of_Release | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales | Critic_Score | Critic_Count | User_Score | User_Count | Developer | Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Wii Sports | Wii | 2006 | Sports | Nintendo | 41.36 | 28.96 | 3.77 | 8.45 | 82.53 | 76.0 | 51.0 | 8.0 | 322.0 | Nintendo | E |
| 1 | Mario Kart Wii | Wii | 2008 | Racing | Nintendo | 15.68 | 12.76 | 3.79 | 3.29 | 35.52 | 82.0 | 73.0 | 8.3 | 709.0 | Nintendo | E |
| 2 | Wii Sports Resort | Wii | 2009 | Sports | Nintendo | 15.61 | 10.93 | 3.28 | 2.95 | 32.77 | 80.0 | 73.0 | 8.0 | 192.0 | Nintendo | E |
| 3 | New Super Mario Bros. | DS | 2006 | Platform | Nintendo | 11.28 | 9.14 | 6.5 | 2.88 | 29.8 | 89.0 | 65.0 | 8.5 | 431.0 | Nintendo | E |
| 4 | Wii Play | Wii | 2006 | Misc | Nintendo | 13.96 | 9.18 | 2.93 | 2.84 | 28.92 | 58.0 | 41.0 | 6.6 | 129.0 | Nintendo | E |
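
The cleaning steps listed earlier could be sketched roughly as follows. This is not the actual contents of `project_functions.load_and_process()`, which is not shown in this notebook. The function name `clean_games`, the `COLUMN_DTYPES` mapping, and the toy rows are assumptions made for illustration. The sketch works on an explicit copy and uses whole-column operations, which is one common way to avoid the `SettingWithCopyWarning` seen in the output above.

```python
import numpy as np
import pandas as pd

# Target dtypes, taken from the second info() summary above.
COLUMN_DTYPES = {
    "Name": "string",
    "Platform": "category",
    "Year_of_Release": "int64",
    "Genre": "category",
    "Publisher": "string",
    "Developer": "string",
    "Rating": "category",
}


def clean_games(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper; an assumption, not the project's real code."""
    # Explicit copy, so nothing below writes into a slice of the caller's frame.
    df = raw.copy()

    # Non-numeric user scores (such as a TBD marker) become NaN.
    df["User_Score"] = pd.to_numeric(df["User_Score"], errors="coerce")

    # Drop every row that still has a missing value, then renumber the index.
    df = df.dropna().reset_index(drop=True)

    # Cast only the columns that are actually present in this frame.
    present = {col: dt for col, dt in COLUMN_DTYPES.items() if col in df.columns}
    return df.astype(present)


# Toy rows for illustration only (not taken from the dataset).
sample = pd.DataFrame(
    {
        "Name": ["Game A", "Game B", "Game C", None],
        "Platform": ["Wii", "DS", "PS2", "PS2"],
        "Year_of_Release": [2006.0, 2008.0, np.nan, 2001.0],
        "User_Score": ["8.0", "tbd", "7.5", "6.1"],
    }
)
cleaned = clean_games(sample)
print(cleaned.dtypes)
```

Because `dropna()` and `astype()` each return a new DataFrame, no assignment is made into a view of another frame.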


## Platform
Each platform can be treated as its own isolated videogame market, in which we can see how games succeed under conditions that vary from platform to platform. Competition between platforms must of course be considered, but in general we can examine the platforms and their characteristics to see how well videogames sell on each of them.

The first thing to consider is that the popularity of platforms varies across regions of the world. This can result from many different factors, a lot of which cannot be quantified: the culture and history of a region cannot be boiled down to numbers and data.

The question we are looking at is: can we measure how successful a videogame will be based on evidence retrieved by looking at different consoles?
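
One possible starting point for the regional comparison is to total the sales per platform and express each region as a share of that platform's total. The sketch below is only an illustration. The helper name `regional_share_by_platform` is an assumption, and the five rows are the head of the cleaned data shown earlier, so the resulting shares say nothing about the full dataset.

```python
import pandas as pd

REGION_COLUMNS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]


def regional_share_by_platform(df: pd.DataFrame) -> pd.DataFrame:
    """Return each region's share of a platform's summed regional sales."""
    # Sum the regional sales columns within each platform.
    totals = df.groupby("Platform", observed=True)[REGION_COLUMNS].sum()
    # Divide every row by its own total so that each row sums to 1.
    # A platform with zero total sales would produce NaN here.
    return totals.div(totals.sum(axis=1), axis=0)


# The five rows from the head of the cleaned data above.
head_rows = pd.DataFrame(
    {
        "Platform": ["Wii", "Wii", "Wii", "DS", "Wii"],
        "NA_Sales": [41.36, 15.68, 15.61, 11.28, 13.96],
        "EU_Sales": [28.96, 12.76, 10.93, 9.14, 9.18],
        "JP_Sales": [3.77, 3.79, 3.28, 6.5, 2.93],
        "Other_Sales": [8.45, 3.29, 2.95, 2.88, 2.84],
    }
)
shares = regional_share_by_platform(head_rows)
print(shares.round(3))
```

On the real data, passing `Cleaned_DataFrame` in place of `head_rows` would give one row per platform, which could then be plotted as a stacked bar chart to compare regions.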
//
insert stuff here
//

## Ratings, Sales, Genres



## more stuff


## Take-Aways/conc.
