# Multiple Outputs Challenge - Video Game Dataset

Dataset: https://www.kaggle.com/datasets/gregorut/videogamesales

In the main lesson, the predictors are built and the targets are split into three different outputs. This challenge instead performs the regression with Global_Sales as the only target.

## Importing libraries

In [4]:
!pip install -q tensorflow==2.16.1

In [5]:
# Hide all GPUs from TensorFlow (CPU only) to avoid CUDA-related errors; set this before importing TensorFlow
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

In [6]:
import pandas as pd
import tensorflow as tf
import sklearn

In [7]:
pd.__version__,tf.__version__,sklearn.__version__

('2.2.2', '2.16.1', '1.4.2')

In [8]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, Activation, Input
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

## Loading the dataset

In [9]:
base = pd.read_csv('games.csv')
base

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002.0 | Platform | Kemco | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16594 | 16597 | Men in Black II: Alien Escape | GC | 2003.0 | Shooter | Infogrames | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008.0 | Racing | Activision | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16596 | 16599 | Know How 2 | DS | 2010.0 | Puzzle | 7G//AMES | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |


## Preprocessing the dataset

### Dropping columns that will not be used as predictors

In [10]:
# Drop Rank and the four regional sales columns in a single call
base = base.drop(columns=['Rank', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales'])

In [11]:
base.shape

(16598, 6)
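Why drop the regional columns? In the first rows shown above, Global_Sales equals the sum of NA_Sales, EU_Sales, JP_Sales and Other_Sales up to rounding, so keeping them would hand the network the answer. Rank is likewise derived from the overall sales ordering. The sketch below rebuilds those five rows by hand and checks the sum; it only covers these rows, not the whole file.

```python
import pandas as pd

# First five rows of the original table (values copied from the output above)
sample = pd.DataFrame({
    'NA_Sales':     [41.49, 29.08, 15.85, 15.75, 11.27],
    'EU_Sales':     [29.02, 3.58, 12.88, 11.01, 8.89],
    'JP_Sales':     [3.77, 6.81, 3.79, 3.28, 10.22],
    'Other_Sales':  [8.46, 0.77, 3.31, 2.96, 1.00],
    'Global_Sales': [82.74, 40.24, 35.82, 33.00, 31.37],
})

regional = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']
# Absolute gap between the reported global total and the sum of the regions
diff = (sample['Global_Sales'] - sample[regional].sum(axis=1)).abs()
print(diff.round(2).tolist())
```

The gaps stay at or below 0.01, which is consistent with each column being rounded to two decimals independently.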

### Checking for missing values

base.isnull().sum()

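Since this is a practice dataset, the rows that contain missing values can simply be dropped (`dropna(axis=0)` removes rows, not columns). In a real project, you should first understand why the values are missing and then handle them appropriately.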
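A toy DataFrame with hypothetical values makes the difference between the two `axis` options concrete: `axis=0` removes every row that has at least one missing value, while `axis=1` removes every column that has one.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-dataset: one missing Year, one missing Publisher
df = pd.DataFrame({
    'Year': [2006.0, np.nan, 2008.0],
    'Publisher': ['Nintendo', 'Nintendo', None],
})

rows_dropped = df.dropna(axis=0)  # keeps only the fully populated row 0
cols_dropped = df.dropna(axis=1)  # both columns have a gap, so none survive
print(rows_dropped.shape, cols_dropped.shape)
```

With the real data, `axis=1` would discard the Year and Publisher columns entirely, which is why the notebook drops rows instead.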

In [12]:
base = base.dropna(axis=0)

In [13]:
base.shape

(16291, 6)

In [14]:
base.isnull().sum()

Name            0
Platform        0
Year            0
Genre           0
Publisher       0
Global_Sales    0
dtype: int64

### Checking how often values repeat in the Name column

In [15]:
base['Name'].value_counts()

Name
Need for Speed: Most Wanted    12
FIFA 14                         9
Ratatouille                     9
LEGO Marvel Super Heroes        9
Cars                            8
                               ..
PGA Tour 96                     1
Game & Wario                    1
Angry Birds                     1
Shadow Hearts: Covenant         1
Know How 2                      1
Name: count, Length: 11325, dtype: int64

The Name column has 11,325 distinct values across 16,291 records, so most names occur only once. The neural network has almost nothing to learn from it, so the column can be dropped.
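One way to quantify this is the share of distinct values in a column: a ratio near 1.0 means the column behaves like an identifier. The helper `distinct_ratio` is a hypothetical name introduced for this sketch, and the toy Series just reuses a few titles from the counts above.

```python
import pandas as pd

def distinct_ratio(column: pd.Series) -> float:
    """Share of distinct values in a column; values near 1.0 act like identifiers."""
    return column.nunique() / len(column)

names = pd.Series(['FIFA 14', 'FIFA 14', 'Cars', 'Know How 2', 'Angry Birds'])
print(distinct_ratio(names))      # 4 distinct / 5 rows = 0.8
print(round(11325 / 16291, 3))    # counts reported above for Name: 0.695
```

On the real data, `distinct_ratio(base['Name'])` would give the same 0.695, i.e. roughly 70% of the rows carry a name seen nowhere else (or only a few times).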

In [16]:
base = base.drop('Name',axis=1)

In [17]:
base.shape

(16291, 5)

In [18]:
base.head(5)

| | Platform | Year | Genre | Publisher | Global_Sales |
|---|---|---|---|---|---|
| 0 | Wii | 2006.0 | Sports | Nintendo | 82.74 |
| 1 | NES | 1985.0 | Platform | Nintendo | 40.24 |
| 2 | Wii | 2008.0 | Racing | Nintendo | 35.82 |
| 3 | Wii | 2009.0 | Sports | Nintendo | 33.0 |
| 4 | GB | 1996.0 | Role-Playing | Nintendo | 31.37 |


### Splitting predictors and target

In [19]:
base.columns

Index(['Platform', 'Year', 'Genre', 'Publisher', 'Global_Sales'], dtype='object')

In [20]:
X = base.iloc[:,[0,1,2,3]].values
X

array([['Wii', 2006.0, 'Sports', 'Nintendo'],
       ['NES', 1985.0, 'Platform', 'Nintendo'],
       ['Wii', 2008.0, 'Racing', 'Nintendo'],
       ...,
       ['PS2', 2008.0, 'Racing', 'Activision'],
       ['DS', 2010.0, 'Puzzle', '7G//AMES'],
       ['GBA', 2003.0, 'Platform', 'Wanadoo']], dtype=object)

In [21]:
## Target: Global_Sales
y = base.iloc[:,4].values

In [22]:
y

array([8.274e+01, 4.024e+01, 3.582e+01, ..., 1.000e-02, 1.000e-02,
       1.000e-02])

### Encoding nominal categorical columns with OneHotEncoder

In [23]:
# One-hot encoding gives each platform its own 0/1 column, e.g.:
# PS2 1 0 0 0 0 ...
# PS3 0 1 0 0 0 ...
base['Platform'].value_counts()

Platform
DS      2131
PS2     2127
PS3     1304
Wii     1290
X360    1234
PSP     1197
PS      1189
PC       938
XB       803
GBA      786
GC       542
3DS      499
PSV      410
PS4      336
N64      316
SNES     239
XOne     213
SAT      173
WiiU     143
2600     116
NES       98
GB        97
DC        52
GEN       27
NG        12
SCD        6
WS         6
3DO        3
TG16       2
GG         1
PCFX       1
Name: count, dtype: int64

In [24]:
base.columns

Index(['Platform', 'Year', 'Genre', 'Publisher', 'Global_Sales'], dtype='object')

In [25]:
onehotencoder = ColumnTransformer(transformers=[("OneHot", OneHotEncoder(), [0,2,3])], remainder='passthrough')

In [26]:
X = onehotencoder.fit_transform(X).toarray()
X.shape

(16291, 620)
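A minimal sketch of what the `ColumnTransformer` above does, run on four toy rows taken from `X`: each category of Platform, Genre and Publisher gets its own 0/1 column (categories sorted), and the passthrough Year column is appended at the end. The 620 columns of the real `X` follow the same rule: one column per Platform value (31 in the counts above), per Genre value and per Publisher value, plus Year. `sparse_threshold=0` is an addition for this sketch so the output is always a dense array; the notebook relies on the default, which returned a sparse matrix (hence the `.toarray()` call).

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy rows with the same layout as X: [Platform, Year, Genre, Publisher]
X_toy = np.array([
    ['Wii', 2006.0, 'Sports', 'Nintendo'],
    ['NES', 1985.0, 'Platform', 'Nintendo'],
    ['Wii', 2008.0, 'Racing', 'Nintendo'],
    ['GB', 1996.0, 'Role-Playing', 'Nintendo'],
], dtype=object)

ct = ColumnTransformer(
    transformers=[("OneHot", OneHotEncoder(), [0, 2, 3])],
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array in this sketch
)
# Column order: Platform (GB, NES, Wii), Genre (Platform, Racing, Role-Playing, Sports),
# Publisher (Nintendo), then the passthrough Year
X_encoded = np.asarray(ct.fit_transform(X_toy), dtype=float)
print(X_encoded.shape)  # (4, 9)
print(X_encoded[0])
```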

In [27]:
X[0]

array([0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 1.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 1.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 

## Neural network architecture

In [28]:
## Rule of thumb for the number of hidden-layer neurons:
## (input neurons + output neurons) / 2
( 620 + 1 ) / 2

310.5
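The rule of thumb above can be written as a small helper. It is only a heuristic starting point, not a guarantee of a good architecture; rounding 310.5 up gives the `units=311` used in the hidden layers. The name `hidden_units` is hypothetical.

```python
import math

def hidden_units(n_inputs: int, n_outputs: int) -> int:
    """Heuristic: average of input and output sizes, rounded up."""
    return math.ceil((n_inputs + n_outputs) / 2)

print(hidden_units(620, 1))  # 311
```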

In [29]:
## Functional API: each layer is called on the previous layer's output, instead of being stacked in a Sequential model
camada_entrada = Input(shape=(620,))
camada_oculta1 = Dense(units = 311, activation='relu')(camada_entrada)
camada_oculta2 = Dense(units = 311, activation='relu')(camada_oculta1)
camada_saida = Dense(units = 1, activation='linear')(camada_oculta2)

In [30]:
regressor = Model(inputs = camada_entrada, outputs = [camada_saida] )
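To get a sense of the model's size, the sketch below counts the trainable parameters of the three `Dense` layers (one weight per input/output pair plus one bias per unit). If the arithmetic is right, `regressor.summary()` should report the same total; that call is not run here. The helper `dense_params` is hypothetical.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Trainable parameters of a Dense layer: weights plus one bias per unit."""
    return n_in * n_out + n_out

# (inputs, units) for: hidden layer 1, hidden layer 2, output layer
layers = [(620, 311), (311, 311), (311, 1)]
total = sum(dense_params(n_in, n_out) for n_in, n_out in layers)
print(total)  # 290475
```

Roughly 290 thousand parameters for about 16 thousand training rows is a lot of capacity, which is worth keeping in mind when reading the training-set error below.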

In [31]:
regressor.compile(optimizer='adam',loss='mse')

In [32]:
regressor.fit(X,[y],epochs=500,batch_size=100)

Epoch 1/500
163/163 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - loss: 897.6419
Epoch 2/500
163/163 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 3.8404
Epoch 3/500
163/163 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 2.0383
Epoch 4/500
163/163 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 2.5825
Epoch 5/500
163/163 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 2.7029
Epoch 6/500
163/163 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 2.3935
Epoch 7/500
163/163 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 2.7768
Epoch 8/500
163/163 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 4.4446
Epoch 9/500
163/163 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 4.6239
Epoch 10/500
163/163 ━━━━━━━━━━━━━━━━━━━━ 1s 4

<keras.src.callbacks.history.History at 0x70f1e2b75490>

In [33]:
previsao = regressor.predict(X)

510/510 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step


In [34]:
previsao, previsao.mean()

(array([[4.525791  ],
        [4.525791  ],
        [4.525791  ],
        ...,
        [1.1353184 ],
        [0.38291675],
        [0.4107095 ]], dtype=float32),
 0.71007544)

In [35]:
y, y.mean()

(array([8.274e+01, 4.024e+01, 3.582e+01, ..., 1.000e-02, 1.000e-02,
        1.000e-02]),
 0.5409103185808114)

In [36]:
from sklearn.metrics import mean_absolute_error

In [37]:
mean_absolute_error(y,previsao)

0.5936130023879992
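For reference, MAE is the average absolute difference between targets and predictions, in the same unit as Global_Sales (millions of copies in this dataset). Note that `previsao` was computed on the same rows the model was trained on, so 0.59 does not measure performance on unseen games; the cross-validation in attempt 2 addresses that. The sketch compares a hand-written `mae` (a hypothetical helper) with scikit-learn's function on three rows taken from the outputs above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

def mae(y_true, y_pred) -> float:
    """Mean absolute error: average of |y_true - y_pred| over all rows."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # predict() returns shape (n, 1)
    return float(np.mean(np.abs(y_true - y_pred)))

# First two and last rows of y / previsao shown above
y_true = np.array([82.74, 40.24, 0.01])
y_pred = np.array([[4.525791], [4.525791], [0.4107095]])
print(mae(y_true, y_pred), mean_absolute_error(y_true, y_pred))
```

Because every error counts linearly, the handful of blockbuster titles (tens of millions of copies) can move the average far more than the thousands of games selling close to zero.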

## Neural network, attempt 2

In [38]:
!pip install -q scikeras

In [39]:
import pandas as pd
import tensorflow as tf
import sklearn
import scikeras

In [40]:
pd.__version__, tf.__version__, sklearn.__version__, scikeras.__version__

('2.2.2', '2.16.1', '1.4.2', '0.13.0')

In [41]:
import time
from scikeras.wrappers import KerasRegressor
from tensorflow.keras import backend as k
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn import metrics

In [42]:
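def criar_rede():
    # Clear Keras global state (e.g. models from earlier folds) to free memory
    k.clear_session()
    # Same architecture as attempt 1: 620 inputs -> 311 -> 311 -> 1 linear output
    regressor = Sequential([
        tf.keras.layers.InputLayer(shape=(620,)),
        tf.keras.layers.Dense(units=311,activation='relu'),
        tf.keras.layers.Dense(units=311,activation='relu'),
        tf.keras.layers.Dense(units=1,activation='linear'),
    ])
    # Here the loss itself is the MAE (attempt 1 trained on MSE)
    regressor.compile(loss='mean_absolute_error',optimizer='adam',metrics=['mean_absolute_error'])
    return regressor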

In [43]:
regressor = KerasRegressor(model = criar_rede, epochs=100, batch_size=300)

In [44]:
resultados = cross_val_score(estimator=regressor,X=X,y=y, cv=5, scoring='neg_mean_absolute_error')

Epoch 1/100
44/44 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - loss: 42.2366 - mean_absolute_error: 42.2366
Epoch 2/100
44/44 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 6.4676 - mean_absolute_error: 6.4676
Epoch 3/100
44/44 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 16.9014 - mean_absolute_error: 16.9014
Epoch 4/100
44/44 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 16.1695 - mean_absolute_error: 16.1695
Epoch 5/100
44/44 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 14.3100 - mean_absolute_error: 14.3100
Epoch 6/100
44/44 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 12.4352 - mean_absolute_error: 12.4352
Epoch 7/100
44/44 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 11.3982 - mean_absolute_error: 11.3982
Epoch 8/100
44/44 ━━━━━━━━━━━━━━━━━━━━ 0s 7m

In [45]:
resultados

array([-1.78212549, -0.22415771, -0.1947349 , -0.44481629, -0.18475416])

In [46]:
abs(resultados.mean())

0.5661177077646865

In [47]:
resultados.std()

0.615416759720911
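The first fold has a much larger error (about 1.78) than the others (0.18 to 0.44), which drives the high standard deviation. A plausible explanation, not verified here: with an integer `cv`, `cross_val_score` uses `KFold` without shuffling for a continuous target, and `base` is still ordered by Rank, so the first test fold holds the best-selling games (Wii Sports at 82.74 and so on) while the model trains only on lower-selling titles. The sketch below uses 20 stand-in rows to show which indices land in the first test fold with and without shuffling; `random_state=42` is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in for the real rows, which are sorted from best- to worst-selling
n_rows = 20
X_stub = np.zeros((n_rows, 1))

default_cv = KFold(n_splits=5)                                  # what cv=5 falls back to here
shuffled_cv = KFold(n_splits=5, shuffle=True, random_state=42)  # rows mixed before splitting

first_test_default = next(default_cv.split(X_stub))[1]
first_test_shuffled = next(shuffled_cv.split(X_stub))[1]
print(first_test_default)   # [0 1 2 3]: the four "best sellers"
print(first_test_shuffled)  # four indices drawn from across the whole range
```

To try this on the real data, one would pass `cv=KFold(n_splits=5, shuffle=True, random_state=42)` to the `cross_val_score` call above instead of `cv=5`; whether that reduces the spread between folds has not been tested here.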