# Data Science Academy

https://www.datascienceacademy.com.br/start

### Project with Feedback 3

Gabriel Quaiotti - Apr 2020

Customer satisfaction is a key measure of success. Dissatisfied customers
cancel their services and rarely voice their dissatisfaction before leaving.
Satisfied customers, on the other hand, become brand advocates.

Banco Santander is asking for help identifying dissatisfied customers early
in the relationship. Doing so would allow Santander to take proactive steps to
improve a customer's happiness before it is too late.

In this machine learning project, you will work with hundreds of anonymized
features to predict whether a customer is satisfied or dissatisfied with their
banking experience.

Clearly define the business problem, collect and prepare the data, choose an
algorithm, train the model, and evaluate its accuracy, which must be at least
70%.

https://www.kaggle.com/c/santander-customer-satisfaction/overview/evaluation
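
The brief above asks for at least 70% accuracy, while the linked Kaggle evaluation page scores submissions by area under the ROC curve (AUC) between the predicted probability and the observed target. The two metrics can disagree when one class is rare. The sketch below illustrates this with small, made-up labels and scores (none of the values come from the Santander data), using plain-Python helper functions written only for this illustration.

```python
# Minimal sketch: accuracy vs. ROC AUC on a tiny, made-up example.
# Labels and scores are illustrative, not taken from the Santander data.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.
    Ties count as half a win (pairwise definition of AUC)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# 1 = dissatisfied customer, 0 = satisfied customer (hypothetical labels)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A model that always predicts "satisfied" looks accurate but ranks nothing.
always_zero = [0] * len(y_true)
print(accuracy(y_true, always_zero))  # 0.8
print(roc_auc(y_true, always_zero))   # 0.5

# Hypothetical predicted probabilities of being dissatisfied
scores = [0.1, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.4, 0.7, 0.5]
print(roc_auc(y_true, scores))        # 0.9375
```

In this toy case a constant prediction already clears the 70% accuracy bar yet has an AUC of 0.5, which is why checking both metrics may be worthwhile.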

In [1]:
# libraries
from pandas import read_csv
from pandas import concat
from pandas import DataFrame
from pandas import Series

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

import pickle

# TEST DATASET

In [2]:
# Read the test dataset
test_ds = read_csv('../data/test.csv')

In [3]:
test_ds.head()

Unnamed: 0,ID,var3,var15,imp_ent_var16_ult1,imp_op_var39_comer_ult1,imp_op_var39_comer_ult3,imp_op_var40_comer_ult1,imp_op_var40_comer_ult3,imp_op_var40_efect_ult1,imp_op_var40_efect_ult3,...,saldo_medio_var29_ult3,saldo_medio_var33_hace2,saldo_medio_var33_hace3,saldo_medio_var33_ult1,saldo_medio_var33_ult3,saldo_medio_var44_hace2,saldo_medio_var44_hace3,saldo_medio_var44_ult1,saldo_medio_var44_ult3,var38
0,2,2,32,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,40532.1
1,5,2,35,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,45486.72
2,6,2,23,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46993.95
3,7,2,24,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,187898.61
4,9,2,23,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,73649.73


In [4]:
# Prepare test dataset: keep the ID and the selected feature columns
test_ds = test_ds[[
    'ID', 'var15', 'imp_op_var41_efect_ult1', 'imp_op_var39_efect_ult1',
    'ind_var13', 'ind_var30_0', 'ind_var30', 'saldo_var30',
    'num_var22_hace2', 'saldo_medio_var5_ult1', 'var38',
]]
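
As an alternative to loading every column and then subsetting, the selection can be pushed into `read_csv` through its `usecols` parameter. This is a minimal sketch on an in-memory CSV with a hypothetical three-column subset; the names `csv_text`, `selected`, and `subset` are illustrative and not part of the notebook.

```python
from io import StringIO

from pandas import read_csv

# Hypothetical miniature file standing in for ../data/test.csv
csv_text = """ID,var3,var15,var38
2,2,32,40532.1
5,2,35,45486.72
"""

# Illustrative subset of the columns kept in this notebook
selected = ['ID', 'var15', 'var38']

# usecols limits what is parsed; it does not guarantee column order,
# so index with the list again to enforce the order we want.
subset = read_csv(StringIO(csv_text), usecols=selected)[selected]
print(subset.columns.tolist())  # ['ID', 'var15', 'var38']
```

`read_csv` raises an error if a name in `usecols` is absent from the file, which also surfaces typos in column names early.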

In [5]:
test_ds.head()

Unnamed: 0,ID,var15,imp_op_var41_efect_ult1,imp_op_var39_efect_ult1,ind_var13,ind_var30_0,ind_var30,saldo_var30,num_var22_hace2,saldo_medio_var5_ult1,var38
0,2,32,0.0,0.0,0,1,1,6.0,0,6.0,40532.1
1,5,35,0.0,0.0,0,1,1,3.0,0,3.0,45486.72
2,6,23,60.0,60.0,0,1,1,30.0,0,51.45,46993.95
3,7,24,0.0,0.0,0,1,0,0.0,0,0.0,187898.61
4,9,23,0.0,0.0,0,1,1,30.0,0,30.0,73649.73


### SCALE

In [6]:
# Scale every feature to the [0, 1] range; the ID column is left untouched.
# Note: the scaler is fitted on the test set itself in this cell.
scaler = MinMaxScaler(feature_range=(0, 1))

features = test_ds.drop(['ID'], axis=1)
scaled_df = DataFrame(scaler.fit_transform(features), index=features.index, columns=features.columns)
test_ds = concat([test_ds.ID, scaled_df], axis=1)
test_ds.head()

Unnamed: 0,ID,var15,imp_op_var41_efect_ult1,imp_op_var39_efect_ult1,ind_var13,ind_var30_0,ind_var30,saldo_var30,num_var22_hace2,saldo_medio_var5_ult1,var38
0,2,0.27,0.0,0.0,0.0,1.0,1.0,0.000943,0.0,0.002449,0.001361
1,5,0.3,0.0,0.0,0.0,1.0,1.0,0.000943,0.0,0.002445,0.001533
2,6,0.18,0.000889,0.000889,0.0,1.0,1.0,0.000949,0.0,0.002518,0.001585
3,7,0.19,0.0,0.0,0.0,1.0,0.0,0.000942,0.0,0.00244,0.006462
4,9,0.18,0.0,0.0,0.0,1.0,1.0,0.000949,0.0,0.002486,0.002507
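
The scaling cell above fits `MinMaxScaler` on the test set. If the model was trained on features scaled with a scaler fitted on the training data (an assumption, since the training notebook is not shown here), reusing that fitted scaler keeps both datasets on the same scale. The sketch below uses made-up one-column arrays and pickles the scaler in memory; in practice it would be saved to a file next to the model, at a path that is not specified in this notebook.

```python
import pickle

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training and test features (a single column)
train_x = np.array([[0.0], [50.0], [100.0]])
test_x = np.array([[25.0], [200.0]])

# Fit on the training data only, then persist the fitted scaler
# (in memory here; a file on disk in a real project).
scaler = MinMaxScaler(feature_range=(0, 1)).fit(train_x)
blob = pickle.dumps(scaler)

# At prediction time: reload and call transform, not fit_transform
loaded = pickle.loads(blob)
test_scaled = loaded.transform(test_x)
print(test_scaled.ravel())  # values 0.25 and 2.0 (training min/max applied)

# For comparison: refitting on the test set maps the same rows to 0 and 1
refit = MinMaxScaler().fit_transform(test_x)
print(refit.ravel())
```

Values outside the training range can fall outside [0, 1] (here 2.0), which is expected when the training minimum and maximum are reused.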


In [7]:
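# Load the trained model from disk; the context manager closes the file
with open('../model/model.sav', 'rb') as model_file:
    model = pickle.load(model_file)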

In [8]:
# Predict from every feature except the ID
x = test_ds.drop(['ID'], axis=1)
predict = model.predict(x)

In [9]:
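# Build the submission; reuse the test index so IDs and predictions stay aligned
submit_ds = concat([test_ds.ID, Series(predict, index=test_ds.index)], axis=1)
submit_ds.columns = ['ID', 'TARGET']
submit_ds.to_csv('../output/submit.csv', index=False)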
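
The submission above contains hard class labels from `model.predict`. Because the linked Kaggle page evaluates AUC on predicted probabilities, submitting the probability of the positive class may score better. The sketch below assumes the saved model exposes `predict_proba`, as `KNeighborsClassifier` does; the imports in the first cell suggest a KNN model, but that is an assumption. The data are toy values and the names `knn`, `ids`, and `submission` are illustrative.

```python
import numpy as np
from pandas import DataFrame
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: one scaled feature, binary target
train_x = np.array([[0.0], [0.1], [0.3], [0.6], [0.9], [1.0]])
train_y = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3).fit(train_x, train_y)

# Hypothetical test rows with their IDs
ids = [2, 5]
test_x = np.array([[0.05], [0.4]])

# predict_proba returns one column per class, ordered as in classes_;
# pick the column that corresponds to the positive class (1).
pos_col = list(knn.classes_).index(1)
proba = knn.predict_proba(test_x)[:, pos_col]

submission = DataFrame({'ID': ids, 'TARGET': proba})
print(submission)
```

With three neighbors, the second row has one positive neighbor out of three, so its `TARGET` is a fraction rather than a 0/1 label.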

# End