# Video Game Sales

`Matheus Raz (mrol@cin.ufpe.br)`

`João Paulo Lins (jplo@cin.ufpe.br)`

The dataset was collected from [Kaggle](https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings) and covers worldwide video game sales, expressed in millions (10^6). Sales are broken down by the main market regions (Europe, North America, Japan, and Rest of the World), plus the sum of all of them (global sales). The dataset has 16,719 rows and 16 attributes per game. Some relevant information about the attribute types follows.

In [2]:
from IPython.display import display

import numpy as np
import pandas as pd
import sklearn as sk

In [3]:
df = pd.read_csv('VideoGamesSalesWithRatings.csv')
df.head()

Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
0,Wii Sports,Wii,2006.0,Sports,Nintendo,41.36,28.96,3.77,8.45,82.53,76.0,51.0,8.0,322.0,Nintendo,E
1,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24,,,,,,
2,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.68,12.76,3.79,3.29,35.52,82.0,73.0,8.3,709.0,Nintendo,E
3,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.61,10.93,3.28,2.95,32.77,80.0,73.0,8.0,192.0,Nintendo,E
4,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37,,,,,,
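The description above states that the global figure is the sum of the regional ones. A quick way to check this is sketched below on the five rows shown by `df.head()`. The `sample` frame, the helper column name, and the 0.02 tolerance are assumptions for illustration; the tolerance is there to absorb rounding to two decimals.

```python
import pandas as pd

# The five rows shown by df.head(), typed in by hand so the check is self-contained.
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Pokemon Red/Pokemon Blue"],
    "NA_Sales": [41.36, 29.08, 15.68, 15.61, 11.27],
    "EU_Sales": [28.96, 3.58, 12.76, 10.93, 8.89],
    "JP_Sales": [3.77, 6.81, 3.79, 3.28, 10.22],
    "Other_Sales": [8.45, 0.77, 3.29, 2.95, 1.00],
    "Global_Sales": [82.53, 40.24, 35.52, 32.77, 31.37],
})

regional_cols = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]
regional_sum = sample[regional_cols].sum(axis=1)

# Every figure is rounded to two decimals, so allow a small gap (assumed: 0.02).
gap = (sample["Global_Sales"] - regional_sum).abs()
sample["sum_matches_global"] = gap <= 0.02
print(sample[["Name", "Global_Sales", "sum_matches_global"]])
```

On the full dataset the same comparison can be run by using `df` in place of `sample`.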


## Context

The games market has been growing for a long time and keeps gaining relevance in many respects. The chosen dataset lets us observe how this growth has unfolded and extract knowledge from the numbers.

In [4]:
# df.info() prints its summary and returns None, so it is called directly
# rather than being wrapped in display().
df.info()
display(df.describe())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16719 entries, 0 to 16718
Data columns (total 16 columns):
Name               16717 non-null object
Platform           16719 non-null object
Year_of_Release    16450 non-null float64
Genre              16717 non-null object
Publisher          16665 non-null object
NA_Sales           16719 non-null float64
EU_Sales           16719 non-null float64
JP_Sales           16719 non-null float64
Other_Sales        16719 non-null float64
Global_Sales       16719 non-null float64
Critic_Score       8137 non-null float64
Critic_Count       8137 non-null float64
User_Score         10015 non-null object
User_Count         7590 non-null float64
Developer          10096 non-null object
Rating             9950 non-null object
dtypes: float64(9), object(7)
memory usage: 2.0+ MB

Unnamed: 0,Year_of_Release,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Count
count,16450.0,16719.0,16719.0,16719.0,16719.0,16719.0,8137.0,8137.0,7590.0
mean,2006.487356,0.26333,0.145025,0.077602,0.047332,0.533543,68.967679,26.360821,162.229908
std,5.878995,0.813514,0.503283,0.308818,0.18671,1.547935,13.938165,18.980495,561.282326
min,1980.0,0.0,0.0,0.0,0.0,0.01,13.0,3.0,4.0
25%,2003.0,0.0,0.0,0.0,0.0,0.06,60.0,12.0,10.0
50%,2007.0,0.08,0.02,0.0,0.01,0.17,71.0,21.0,24.0
75%,2010.0,0.24,0.11,0.04,0.03,0.47,79.0,36.0,81.0
max,2020.0,41.36,28.96,10.22,10.57,82.53,98.0,113.0,10665.0
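The `df.info()` output above shows that several columns are only partially populated. The sketch below turns the non-null counts it reports into missing-value fractions. The `non_null` dictionary is copied by hand from that output, and the variable names are illustrative.

```python
# Non-null counts copied from the df.info() output; total rows from its RangeIndex line.
total_rows = 16719
non_null = {
    "Year_of_Release": 16450,
    "Publisher": 16665,
    "Critic_Score": 8137,
    "Critic_Count": 8137,
    "User_Score": 10015,
    "User_Count": 7590,
    "Developer": 10096,
    "Rating": 9950,
}

# Fraction of rows with no value in each column.
missing_fraction = {col: 1 - count / total_rows for col, count in non_null.items()}

# Print the columns from most to least incomplete.
for col, frac in sorted(missing_fraction.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:<16} {frac:6.1%}")
```

With the DataFrame loaded, `df.isna().mean()` yields the same fractions directly.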


In [5]:
# Earliest and latest release years in the dataset (min/max skip missing years).
min_year = df['Year_of_Release'].min()
max_year = df['Year_of_Release'].max()
print("Earliest release year in the dataset: %d" % min_year)
display(df[df['Year_of_Release'] == min_year])
print("Latest release year in the dataset: %d" % max_year)
display(df[df['Year_of_Release'] == max_year])

Earliest release year in the dataset: 1980


Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
262,Asteroids,2600,1980.0,Shooter,Atari,4.0,0.26,0.0,0.05,4.31,,,,,,
546,Missile Command,2600,1980.0,Shooter,Atari,2.56,0.17,0.0,0.03,2.76,,,,,,
1764,Kaboom!,2600,1980.0,Misc,Activision,1.07,0.07,0.0,0.01,1.15,,,,,,
1968,Defender,2600,1980.0,Misc,Atari,0.99,0.05,0.0,0.01,1.05,,,,,,
2650,Boxing,2600,1980.0,Fighting,Activision,0.72,0.04,0.0,0.01,0.77,,,,,,
4019,Ice Hockey,2600,1980.0,Sports,Activision,0.46,0.03,0.0,0.01,0.49,,,,,,
5360,Freeway,2600,1980.0,Action,Activision,0.32,0.02,0.0,0.0,0.34,,,,,,
6301,Bridge,2600,1980.0,Misc,Activision,0.25,0.02,0.0,0.0,0.27,,,,,,
6876,Checkers,2600,1980.0,Misc,Atari,0.22,0.01,0.0,0.0,0.24,,,,,,


Latest release year in the dataset: 2020


Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
5936,Imagine: Makeup Artist,DS,2020.0,Simulation,Ubisoft,0.27,0.0,0.0,0.02,0.29,,,tbd,,Ubisoft,E
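`User_Score` is listed as `object` in the `df.info()` output, and the row above shows why: some entries hold the string `tbd` instead of a number. One possible way to obtain a numeric column is sketched below on values taken from the rows displayed in this notebook. Treating `tbd` as missing is an assumption made here for illustration, not a decision the notebook makes.

```python
import pandas as pd

# User_Score values as they appear in the displayed rows:
# numbers stored as text, missing entries, and the placeholder "tbd".
user_score = pd.Series(["8.0", None, "8.3", "8.0", None, "tbd"], name="User_Score")

# errors="coerce" turns anything that cannot be parsed (here "tbd") into NaN.
user_score_num = pd.to_numeric(user_score, errors="coerce")

print(user_score_num.dtype)
print(user_score_num.isna().sum(), "missing values after conversion")
```

Applied to the loaded data, the same call would be `pd.to_numeric(df['User_Score'], errors='coerce')`.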


## Possible Hypotheses

1 - Is it possible to predict a publisher's global sales based on the genre and rating of the game to be released?

2 - Given the growth of anti-violence policies around the world, can we observe that global sales of "Shooter" games have declined over the years?

3 - Is it possible to predict sales in other regions of the world from the sales in North America and Europe?

4 - Is it possible to predict whether a game of a given genre will sell more or less on one platform than on another (considering the current console generation)?

5 - Considering games of the same genre released in the same year by different publishers, is it possible to predict whether one game will sell more than another?
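As a starting point for hypothesis 3, a least-squares fit of `Other_Sales` on `NA_Sales` and `EU_Sales` could look like the sketch below. It uses only the five rows displayed by `df.head()`, which is far too little data to support any conclusion. The choice of a plain linear model and all variable names are assumptions for illustration.

```python
import numpy as np

# NA_Sales and EU_Sales (columns) for the five rows shown by df.head().
na_eu = np.array([
    [41.36, 28.96],
    [29.08, 3.58],
    [15.68, 12.76],
    [15.61, 10.93],
    [11.27, 8.89],
])
# Other_Sales for the same five rows.
other = np.array([8.45, 0.77, 3.29, 2.95, 1.00])

# Design matrix with an intercept column: other ~ b0 + b1 * NA + b2 * EU.
X = np.column_stack([np.ones(len(na_eu)), na_eu])
coef, *_ = np.linalg.lstsq(X, other, rcond=None)

# In-sample fit quality; not a measure of predictive power.
predicted = X @ coef
ss_res = np.sum((other - predicted) ** 2)
ss_tot = np.sum((other - other.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("intercept, NA and EU coefficients:", coef)
print("in-sample R^2:", r_squared)
```

In the notebook itself, the same fit would take `df[['NA_Sales', 'EU_Sales']]` and `df['Other_Sales']` as inputs, ideally with part of the data held out to evaluate the predictions.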