# Linear Models

---

## Naive Bayes
There are some interesting ways to define a data scientist. Two of them are *a statistician who knows computing* and *a computer scientist who knows statistics*. Statistics is a powerful weapon that should never be left aside when working with data.

Let's explore a bit more of it now.

## A wine break
Relax, of course we are not going to stop this wonderful class to drink wine (although it doesn't sound like a bad idea), but how about combining these two lovely things: wines and linear models?

The file *winequality-white.csv* contains a wine quality dataset: the task is to predict the quality of white wines on a scale from chemical measurements of each wine. It is a multiclass classification problem. There are 4,898 observations with 11 input attributes and 1 output attribute. The variable names are as follows:

 1. Fixed acidity
 2. Volatile acidity
 3. Citric acid
 4. Residual sugar
 5. Chlorides
 6. Free sulfur dioxide
 7. Total sulfur dioxide
 8. Density
 9. pH
 10. Sulphates
 11. Alcohol
 12. Quality (score between 0 and 10)

Let's load the data and look at the first few rows before digging into the attributes:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv('/content/winequality-white.csv', delimiter=';')
data.head()

,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6
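
The first rows alone say little about how each attribute is distributed. Below is a minimal sketch of computing descriptive statistics with pandas `describe()`. To stay self-contained, it rebuilds a small frame (`sample`, a name introduced here) from three columns of the five rows shown above; in the notebook one would simply call `data.describe()` on the full DataFrame.

```python
import pandas as pd

# Three columns of the five rows printed above (illustration only;
# in the notebook, call data.describe() on the full DataFrame).
sample = pd.DataFrame({
    "fixed acidity": [7.0, 6.3, 8.1, 7.2, 7.2],
    "alcohol": [8.8, 9.5, 10.1, 9.9, 9.9],
    "quality": [6, 6, 6, 6, 6],
})

# count, mean, std, min, quartiles and max for every numeric column
stats = sample.describe()
print(stats)
```

Each column of `stats` summarizes one attribute, which makes differences in scale between attributes easy to spot.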


Another interesting thing to look at is the **correlation** between attributes, that is, how strongly pairs of attributes vary together linearly:

In [None]:
data.corr()

,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
fixed acidity,1.0,-0.022697,0.289181,0.089021,0.023086,-0.049396,0.09107,0.265331,-0.425858,-0.017143,-0.120881,-0.113663
volatile acidity,-0.022697,1.0,-0.149472,0.064286,0.070512,-0.097012,0.089261,0.027114,-0.031915,-0.035728,0.067718,-0.194723
citric acid,0.289181,-0.149472,1.0,0.094212,0.114364,0.094077,0.121131,0.149503,-0.163748,0.062331,-0.075729,-0.009209
residual sugar,0.089021,0.064286,0.094212,1.0,0.088685,0.299098,0.401439,0.838966,-0.194133,-0.026664,-0.450631,-0.097577
chlorides,0.023086,0.070512,0.114364,0.088685,1.0,0.101392,0.19891,0.257211,-0.090439,0.016763,-0.360189,-0.209934
free sulfur dioxide,-0.049396,-0.097012,0.094077,0.299098,0.101392,1.0,0.615501,0.29421,-0.000618,0.059217,-0.250104,0.008158
total sulfur dioxide,0.09107,0.089261,0.121131,0.401439,0.19891,0.615501,1.0,0.529881,0.002321,0.134562,-0.448892,-0.174737
density,0.265331,0.027114,0.149503,0.838966,0.257211,0.29421,0.529881,1.0,-0.093591,0.074493,-0.780138,-0.307123
pH,-0.425858,-0.031915,-0.163748,-0.194133,-0.090439,-0.000618,0.002321,-0.093591,1.0,0.155951,0.121432,0.099427
sulphates,-0.017143,-0.035728,0.062331,-0.026664,0.016763,0.059217,0.134562,0.074493,0.155951,1.0,-0.017433,0.053678


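Yes, we can see that some attributes are strongly correlated (for example, density and residual sugar, at about 0.84) while others are only weakly correlated (free sulfur dioxide and pH, close to zero)...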
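
To make "strongly correlated" concrete, one option is to rank attribute pairs by absolute correlation. The sketch below copies four rows and columns of the matrix above into a small frame (`corr`) so that it runs on its own; in the notebook, `data.corr()` would take its place.

```python
import numpy as np
import pandas as pd

# Subset of the correlation matrix printed above (illustration only).
cols = ["residual sugar", "free sulfur dioxide", "total sulfur dioxide", "density"]
corr = pd.DataFrame(
    [[1.0, 0.299098, 0.401439, 0.838966],
     [0.299098, 1.0, 0.615501, 0.294210],
     [0.401439, 0.615501, 1.0, 0.529881],
     [0.838966, 0.294210, 0.529881, 1.0]],
    index=cols, columns=cols)

# Keep each pair once: only the entries strictly above the main diagonal.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Order the pairs from strongest to weakest absolute correlation.
pairs = pairs.reindex(pairs.abs().sort_values(ascending=False).index)
print(pairs)
```

Sorting by absolute value matters because a strongly negative correlation is just as informative as a strongly positive one.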

Now let's look at the covariance between them:

In [None]:
data.cov()

,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
fixed acidity,0.712114,-0.001931,0.029533,0.381022,0.000426,-0.708919,3.266013,0.00067,-0.054265,-0.001651,-0.125533,-0.084947
volatile acidity,-0.001931,0.01016,-0.001823,0.032865,0.000155,-0.1663,0.382354,8e-06,-0.000486,-0.000411,0.0084,-0.017382
citric acid,0.029533,-0.001823,0.014646,0.057829,0.000302,0.19363,0.622989,5.4e-05,-0.002992,0.000861,-0.011278,-0.000987
residual sugar,0.381022,0.032865,0.057829,25.72577,0.009828,25.800578,86.531303,0.012727,-0.148684,-0.015435,-2.81274,-0.438316
chlorides,0.000426,0.000155,0.000302,0.009828,0.000477,0.037674,0.184687,1.7e-05,-0.000298,4.2e-05,-0.009684,-0.004062
free sulfur dioxide,-0.708919,-0.1663,0.19363,25.800578,0.037674,289.24272,444.865891,0.014966,-0.001587,0.114938,-5.234509,0.122878
total sulfur dioxide,3.266013,0.382354,0.622989,86.531303,0.184687,444.865891,1806.085491,0.067352,0.014894,0.652645,-23.476605,-6.576746
density,0.00067,8e-06,5.4e-05,0.012727,1.7e-05,0.014966,0.067352,9e-06,-4.2e-05,2.5e-05,-0.002871,-0.000814
pH,-0.054265,-0.000486,-0.002992,-0.148684,-0.000298,-0.001587,0.014894,-4.2e-05,0.022801,0.002688,0.022565,0.013297
sulphates,-0.001651,-0.000411,0.000861,-0.015435,4.2e-05,0.114938,0.652645,2.5e-05,0.002688,0.013025,-0.002448,0.005425
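
Covariance and correlation carry the same information on different scales: the correlation of two attributes is their covariance divided by the product of their standard deviations. For instance, using the values printed above for free and total sulfur dioxide, 444.865891 / sqrt(289.24272 × 1806.085491) ≈ 0.6155, which matches the entry in the correlation matrix. A sketch of the same relationship on synthetic data (the frame `df` and its column names are made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 3.0 * a + rng.normal(size=200),  # built to correlate with "a"
    "c": rng.normal(size=200),            # unrelated to the others
})

cov = df.cov()
std = np.sqrt(np.diag(cov))               # standard deviations from the diagonal
corr_from_cov = cov / np.outer(std, std)  # divide each entry by std_i * std_j

print(corr_from_cov)
```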


The covariance matrix is very important when dealing with *PDFs* (probability density functions). We can use these functions to measure how likely it is that an element $x$ belongs to a class $c$ (which has covariance matrix $C$ and attribute mean $M_c$) using a Gaussian PDF, for example:


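\begin{equation}
  p_c(x) = \frac{1}{\sqrt{(2 \pi)^d |C|}} e^{-\frac{1}{2} (x - M_c)^T C^{-1} (x - M_c)}
\end{equation}

where $d$ is the number of attributes, $|C|$ is the determinant of $C$, and $C^{-1}$ is its inverse.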
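
As a rough sketch, the standard multivariate Gaussian density (normalizer $\sqrt{(2\pi)^d |C|}$, exponent $-\frac{1}{2}(x - M_c)^T C^{-1} (x - M_c)$) can be evaluated with NumPy as follows. The function name `gaussian_density` and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Multivariate Gaussian density of x for a class with the given mean and covariance."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = x - mean
    # Solve C z = diff instead of inverting C explicitly.
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * mahalanobis_sq) / norm

# Toy check: a 2-D standard normal evaluated at its mean gives 1 / (2 * pi).
density_at_mean = gaussian_density([0.0, 0.0], [0.0, 0.0], np.eye(2))
print(density_at_mean)
```

Both the determinant and the linear solve involve the full covariance matrix, and both fail when that matrix is singular.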

By this logic, the class $c$ with the highest density is the class assigned to $x$. But there is a catch in the formula above (no, it is not the formula itself). What do you think it is?

## What if we are more naive?

Working with the determinant of the covariance matrix can be quite complicated (and the density is not even defined when the matrix is singular). Do you know a kind of matrix whose determinant is easy to compute? Diagonal ones. Diagonal matrices are lovely! Just multiply the elements on the main diagonal and... done! We have the determinant!
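
A quick NumPy check of this property, using made-up variances on the diagonal:

```python
import numpy as np

variances = np.array([0.5, 2.0, 4.0])   # hypothetical per-attribute variances
diagonal_cov = np.diag(variances)       # zeros everywhere off the main diagonal

det_full = np.linalg.det(diagonal_cov)  # general-purpose determinant routine
det_shortcut = np.prod(variances)       # product of the main diagonal
print(det_full, det_shortcut)
```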

If only our covariance matrix were diagonal...

But wait, what if we force it to be?

What if we assume the attributes are **independent**, that is, they do not influence one another? Independent attributes have zero covariance, so *the elements off the main diagonal are all zero*!

That is why the technique is called Naive Bayes: it assumes that all attributes are independent and do not influence one another. Quite naive, isn't it?

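In that case the formula becomes much simpler: the determinant of the covariance matrix is just the product of the per-attribute variances, and the density factors into a product of one-dimensional Gaussians over the $d$ attributes, each with its own class mean $M_{c,j}$, standard deviation $std_{c,j}$ and variance $var_{c,j}$:

\begin{equation}
  p_c(x) = \prod_{j=1}^{d} \frac{1}{\sqrt{2\pi}\, std_{c,j}} e^{-\frac{(x_j - M_{c,j})^2}{2\, var_{c,j}}}
\end{equation}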
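
To see the pieces together, here is a minimal from-scratch sketch of a Gaussian Naive Bayes classifier. It computes per-class means and variances, then sums the per-attribute log-densities (the log of a product of one-dimensional Gaussians) and adds the log of the class prior from Bayes' rule. The function names and the toy data are hypothetical, and this is not scikit-learn's implementation.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate the prior, per-attribute mean and variance of each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # A small constant keeps variances strictly positive.
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def predict_gaussian_nb(params, X):
    """For each row, pick the class with the largest log prior + log density."""
    classes = list(params)
    scores = []
    for c in classes:
        prior, mean, var = params[c]
        # Log of the 1-D Gaussian density, one term per attribute.
        log_density = -0.5 * (np.log(2.0 * np.pi * var) + (X - mean) ** 2 / var)
        scores.append(np.log(prior) + log_density.sum(axis=1))
    return np.array(classes)[np.argmax(np.array(scores), axis=0)]

# Two well-separated toy classes in two dimensions.
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
                   rng.normal(6.0, 1.0, size=(50, 2))])
y_toy = np.array([0] * 50 + [1] * 50)

params = fit_gaussian_nb(X_toy, y_toy)
accuracy = (predict_gaussian_nb(params, X_toy) == y_toy).mean()
print(accuracy)
```

Working with logarithms turns the product into a sum, which avoids numerical underflow when there are many attributes.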

Now that we understand a bit of what Naive Bayes is, we can use it! (Yes, we can only use it because we understand it. A good data scientist never uses a technique they do not understand.)

We will use three of the Naive Bayes implementations that scikit-learn provides: Gaussian, Bernoulli and Multinomial. Can you identify the difference between them?

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import MultinomialNB

In [None]:
gaussian_model = GaussianNB()
bernoulli_model = BernoulliNB()
multinomial_model = MultinomialNB()
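
One practical difference is the kind of input each variant models: `GaussianNB` treats attributes as continuous values, `BernoulliNB` as binary indicators (by default it binarizes inputs at a threshold of 0.0), and `MultinomialNB` as non-negative counts. The sketch below uses made-up data to show what this means for `BernoulliNB` when every attribute is strictly positive: every row collapses to a vector of ones, so the model can only fall back on class frequencies.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
# Strictly positive, continuous toy features (a stand-in for chemical measurements).
X_pos = rng.uniform(0.1, 10.0, size=(100, 4))
# Imbalanced toy labels: class 6 is the majority.
y_toy = np.array([6] * 60 + [5] * 30 + [7] * 10)

model = BernoulliNB()  # default binarize=0.0 maps every value > 0 to 1
model.fit(X_pos, y_toy)
predictions = model.predict(X_pos)
print(np.unique(predictions))
```

If binary features are really wanted, a data-dependent threshold can be passed through the `binarize` parameter instead of relying on the default.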

Let's try the models on the wine dataset and see how they behave! As a reminder:

- To train a model, we use the *fit* method
- To compute its performance, we use the *score* method (for classifiers, the mean accuracy)

In [None]:
values = data.values

# All columns except the last are input attributes; the last one is the quality label.
X = values[:, :-1]
Y = values[:, -1]

gaussian_model.fit(X, Y)
bernoulli_model.fit(X, Y)
multinomial_model.fit(X, Y)

# score() returns the mean accuracy, here measured on the training data itself.
print("""
  Gaussian model score: {}
  Bernoulli model score: {}
  Multinomial model score: {}
""".format(
    gaussian_model.score(X, Y),
    bernoulli_model.score(X, Y),
    multinomial_model.score(X, Y)
))


  Gaussian model score: 0.4495712535728869
  Bernoulli model score: 0.44875459371171905
  Multinomial model score: 0.3999591670069416
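
These scores are measured on the same rows the models were trained on, which tends to overstate how well a model generalizes. Below is a minimal sketch of hold-out evaluation with `train_test_split`. It runs on synthetic stand-in data (`X_toy`, `y_toy`) so that it is self-contained; with the wine data, `X` and `Y` from the cell above would be used instead.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Two synthetic classes with three attributes each (illustration only).
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0.0, 1.0, size=(100, 3)),
                   rng.normal(3.0, 1.0, size=(100, 3))])
y_toy = np.array([0] * 100 + [1] * 100)

# Hold out 25% of the rows; stratify keeps the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy)

model = GaussianNB().fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)  # accuracy on rows never seen in training
print(train_accuracy, test_accuracy)
```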

