# **Inventory Forecasting - TTM2**
# Can you predict whether a material will run out of stock?

## Baseline

A baseline is a model used as a reference point for comparing the performance of another (usually more complex) model. For example, a logistic regression model can serve as a good baseline for a deep model.

For a specific problem, the baseline helps model developers quantify the minimum performance a new model must reach to be useful.

For this project the baseline is a Random Forest, an ensemble method that trains many decision trees on the training data and combines their predictions (by majority vote or by averaging predicted probabilities) instead of relying on a single best-fitting tree. The "random" part refers to building each tree from a random sample of the training rows and a random selection of features; the "forest" is the resulting set of decision trees.

In this case the baseline scored 0.91543 on the ROC metric, which is taken as the minimum performance: RandomForestClassifier(n_estimators=30, random_state=64, n_jobs=-1).
The ROC curve shows how well the model distinguishes between two classes (it is a tool for binary classification). The two classes can be 0 and 1, or positive and negative. The best models separate the two classes accurately.
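
As a rough illustration of how such a baseline score can be obtained, the sketch below trains a Random Forest with the hyperparameters quoted above and measures the area under the ROC curve on a held-out split. It assumes the ROC score reported above is the area under the curve, and it uses a synthetic, imbalanced dataset from `make_classification` as a stand-in for the inventory data, so the printed number will not match 0.91543.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the inventory data (an assumption,
# not the real train.csv).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

# Same hyperparameters as the baseline quoted in the text.
baseline = RandomForestClassifier(n_estimators=30, random_state=64, n_jobs=-1)
baseline.fit(X_tr, y_tr)

# ROC AUC is computed from the predicted probability of the positive class,
# not from hard 0/1 predictions.
baseline_auc = roc_auc_score(y_va, baseline.predict_proba(X_va)[:, 1])
print(f"baseline ROC AUC: {baseline_auc:.5f}")
```

Any new model would then be compared against `baseline_auc` computed on the same held-out split.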


# Model used

## MLP - Multilayer Perceptron
A multilayer perceptron (MLP) is a feedforward neural network similar to the perceptron, but with more than one layer of neurons. It is made up of layers of neurons connected to each other by weighted synapses. Learning in this type of network is usually done with the error backpropagation algorithm.

![alt text](https://miro.medium.com/max/564/1*KDiqpWOgtCnO8x3wZJHmDA.png)

## Combining neurons into layers

![alt text](https://miro.medium.com/max/658/1*H61ieko1YyHzqBNZTzz7IQ.png)
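
The idea of stacking neurons into layers can be sketched in a few lines of NumPy. This is only an illustrative forward pass, not the network trained later in this notebook: the layer sizes, the random weights, and the choice of ReLU for hidden layers with a sigmoid output are all assumptions made for the example.

```python
import numpy as np

def relu(z):
    # Hidden-layer activation: keeps positive values, zeroes out the rest.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Output activation: squashes a score into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Feed a batch through fully connected layers.

    Each layer is a (weights, bias) pair; every neuron computes a weighted
    sum of the previous layer's outputs plus its bias.
    """
    a = x
    for W, b in layers[:-1]:
        a = relu(a @ W + b)
    W_out, b_out = layers[-1]
    return sigmoid(a @ W_out + b_out)

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 1]  # 4 inputs, two hidden layers, 1 output (illustrative)
layers = [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(3, 4))   # batch of 3 samples
probs = forward(x, layers)
print(probs.shape)            # one probability per sample
```

Training would adjust the weights and biases by backpropagation; the sketch only shows how a prediction flows through the layers.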

# Modules used

In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

from sklearn.model_selection import train_test_split

# Loading the dataframe

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Read the files, filling missing values with -1
train = pd.read_csv('drive/My Drive/Machine Learning/Previsão de inventário/train.csv', index_col='sku').fillna(-1)
test = pd.read_csv('drive/My Drive/Machine Learning/Previsão de inventário/test.csv', index_col='sku').fillna(-1)



# Viewing the data

In [4]:
train.describe()

| | national_inv | lead_time | in_transit_qty | forecast_3_month | forecast_6_month | forecast_9_month | sales_1_month | sales_3_month | sales_6_month | sales_9_month | min_bank | potential_issue | pieces_past_due | perf_6_month_avg | perf_12_month_avg | local_bo_qty | deck_risk | oe_constraint | ppap_risk | stop_auto_buy | rev_stop | isBackorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 | 1350955.0 |
| mean | 490.877 | 7.347406 | 43.08673 | 179.7218 | 346.8696 | 509.5988 | 56.14105 | 175.5745 | 343.0677 | 526.4156 | 52.56003 | 0.0005277748 | 2.080692 | -6.895363 | -6.454207 | 0.6971172 | 0.225821 | 0.0001695097 | 0.1204045 | 0.9633378 | 0.0004263651 | 0.007244505 |
| std | 28665.12 | 7.153678 | 1255.806 | 5152.311 | 9924.598 | 14608.83 | 1968.606 | 5121.892 | 9561.94 | 14787.94 | 1211.626 | 0.0229673 | 252.1026 | 26.59268 | 25.86997 | 38.25771 | 0.418122 | 0.01301849 | 0.325434 | 0.1879312 | 0.02064421 | 0.08480582 |
| min | -27256.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | -1.0 | -99.0 | -99.0 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 4.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.66 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 50% | 15.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.82 | 0.81 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 75% | 80.0 | 8.0 | 0.0 | 4.0 | 12.0 | 20.0 | 4.0 | 15.0 | 31.0 | 47.0 | 3.0 | 0.0 | 0.0 | 0.96 | 0.95 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| max | 12334400.0 | 52.0 | 489408.0 | 1510592.0 | 2461360.0 | 3777304.0 | 741774.0 | 1105478.0 | 2146625.0 | 3205172.0 | 313319.0 | 1.0 | 146496.0 | 1.0 | 1.0 | 12530.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


# InterpretML - Microsoft

InterpretML is an open-source Python package for training interpretable machine learning models and explaining black-box systems. Interpretability is essential for:

* Model debugging - Why did my model make this mistake?
* Detecting bias - Does my model discriminate?
* Human-AI cooperation - How can I understand and trust the model's decisions?
* Regulatory compliance - Does my model satisfy legal requirements?
* High-risk applications - Healthcare, finance, judicial, ...

Microsoft Research developed an algorithm called EBM (Explainable Boosting Machine), which offers both high accuracy and interpretability. EBM uses modern machine learning techniques such as bagging and boosting to breathe new life into traditional GAMs (generalized additive models).
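
For reference, a generalized additive model has the general form below, where $g$ is a link function (for example the logit in binary classification), $\beta_0$ is an intercept, and each $f_j$ is a function of a single feature $x_j$. The notation is the standard textbook one and is not taken from this notebook.

$$
g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_{j} f_j(x_j)
$$

Because each term depends on one feature only, the contribution of every feature to a prediction can be plotted and inspected on its own, which is where the interpretability comes from.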

In [5]:
pip install -U interpret


## Splitting input (X) and output (y) attributes - Train and Test

In [0]:
X_train, y_train = train.drop('isBackorder', axis=1), train['isBackorder']

seed = 1
X_train, X_test, y_train, y_test = train_test_split(X_train, y_train, test_size=0.20, random_state=seed)

## Training a black-box classification system - MLPClassifier

In [7]:
pca = PCA()
rf = MLPClassifier(hidden_layer_sizes=(40,40,20,40,30,40,50), activation='relu', alpha=0.0001, learning_rate='adaptive', solver='adam', random_state=42, max_iter=10, shuffle=True, verbose=True)

blackbox_model = Pipeline([('pca', pca), ('rf', rf)])
blackbox_model.fit(X_train, y_train)

Iteration 1, loss = inf
Iteration 2, loss = inf
Iteration 3, loss = inf
Iteration 4, loss = inf
Iteration 5, loss = 0.03455246
Iteration 6, loss = 0.03402215
Iteration 7, loss = 0.03395399
Iteration 8, loss = inf
Iteration 9, loss = inf
Iteration 10, loss = 0.03392199




Pipeline(memory=None,
         steps=[('pca',
                 PCA(copy=True, iterated_power='auto', n_components=None,
                     random_state=None, svd_solver='auto', tol=0.0,
                     whiten=False)),
                ('rf',
                 MLPClassifier(activation='relu', alpha=0.0001,
                               batch_size='auto', beta_1=0.9, beta_2=0.999,
                               early_stopping=False, epsilon=1e-08,
                               hidden_layer_sizes=(40, 40, 20, 40, 30, 40, 50),
                               learning_rate='adaptive',
                               learning_rate_init=0.001, max_iter=10,
                               momentum=0.9, n_iter_no_change=10,
                               nesterovs_momentum=True, power_t=0.5,
                               random_state=42, shuffle=True, solver='adam',
                               tol=0.0001, validation_fraction=0.1,
                               verbose=True, warm_start=F
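
The `loss = inf` entries in the training log above may be a symptom of feeding unscaled features with very large ranges into the network, although the notebook does not confirm the cause. One possible variation, which is not what was run here, is to standardize the inputs before the MLP. The sketch below uses synthetic data, deliberately inflated in scale, and a smaller hypothetical network; the step names `scale` and `mlp` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, multiplied to mimic large raw feature scales
# (an assumption; the real features are not reproduced here).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X = X * 1e4

# Standardize each feature to zero mean and unit variance before the MLP.
scaled_mlp = Pipeline([
    ('scale', StandardScaler()),
    ('mlp', MLPClassifier(hidden_layer_sizes=(40, 20), max_iter=50,
                          random_state=42)),
])
scaled_mlp.fit(X, y)

# Inspect the per-iteration training loss recorded by the classifier.
losses = np.array(scaled_mlp.named_steps['mlp'].loss_curve_)
train_acc = scaled_mlp.score(X, y)
print(np.isfinite(losses).all(), round(train_acc, 3))
```

Whether scaling removes the infinite losses on the real data would have to be checked by rerunning the training cell.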

## Showing the performance of the black-box model - MLPClassifier

In [8]:
from interpret import show
from interpret.perf import ROC

blackbox_perf = ROC(blackbox_model.predict_proba).explain_perf(X_test, y_test, name='Blackbox')
show(blackbox_perf)



## Local explanations: how an individual prediction was made

In [9]:
from interpret.blackbox import LimeTabular
from interpret import show

#Blackbox explainers need a predict function, and optionally a dataset
lime = LimeTabular(predict_fn=blackbox_model.predict_proba, data=X_train, random_state=1)

#Pick the instances to explain, optionally pass in labels if you have them
lime_local = lime.explain_local(X_test[:5], y_test[:5], name='LIME')

show(lime_local)


Ill-conditioned matrix (rcond=4.45231e-27): result may not be accurate.


Ill-conditioned matrix (rcond=5.72876e-27): result may not be accurate.


Ill-conditioned matrix (rcond=5.77946e-27): result may not be accurate.


Ill-conditioned matrix (rcond=5.71372e-27): result may not be accurate.



In [10]:
from interpret.blackbox import ShapKernel
import numpy as np
feature_names = list(X_train.columns)

background_val = np.median(X_train, axis=0).reshape(1, -1)
shap = ShapKernel(predict_fn=blackbox_model.predict_proba, data=background_val, feature_names=feature_names)
shap_local = shap.explain_local(X_test[:5], y_test[:5], name='SHAP')
show(shap_local)



l1_reg="auto" is deprecated and in the next version (v0.29) the behavior will change from a conditional use of AIC to simply "num_features(10)"!








## Global explanations: how the model behaves overall

In [11]:
from interpret.blackbox import MorrisSensitivity

sensitivity = MorrisSensitivity(predict_fn=blackbox_model.predict_proba, data=X_train)
sensitivity_global = sensitivity.explain_global(name="Global Sensitivity")

show(sensitivity_global)

In [12]:
from interpret.blackbox import PartialDependence

pdp = PartialDependence(predict_fn=blackbox_model.predict_proba, data=X_train)
pdp_global = pdp.explain_global(name='Partial Dependence')

show(pdp_global)

## Comparing the models

In [13]:
show([blackbox_perf, lime_local, shap_local, sensitivity_global, pdp_global])

# Generating the predictions as a CSV file for Kaggle submission

In [14]:
# Predict the backorder probability (y) for new values
y_pred = blackbox_model.predict_proba(test)[:,1]

In [0]:
# Create and save the submission file
test['isBackorder'] = y_pred
pred = test['isBackorder'].reset_index()
pred.to_csv('submissionMLPv4.1.csv',index=False)