## Table of contents

[Required libraries](#Required-libraries)<br>
[Load the data](#Load-the-data)<br>
[Normalize the data](#Normalize-the-data)<br>
[Split the data into training and test sets](#Split-the-data-into-training-and-test-sets)<br>
[Logistic regression](#Logistic-regression)<br>
[Logistic regression with L2 regularization](#Logistic-regression-with-L2-regularization)<br>
[Changing the learning rate and lambda](#Changing-the-learning-rate-and-lambda)<br>
[Logistic regression without regularization](#Logistic-regression-without-regularization)<br>
[Scikit-learn](#Scikit-learn)<br>
[Plots](#Plots)

### Required libraries

In [None]:
import numpy as np
import pandas as pd

### Load the data

In this step the data is loaded and prepared for use in the logistic regression model.

In [None]:
df = pd.read_csv('data/wdbc.data', header=None)

In [None]:
df[1].value_counts()

In [None]:
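# Encode the diagnosis column: benign ("B") -> 0, malignant ("M") -> 1.
# Assigning the result back avoids relying on an in-place update of a column selection.
df[1] = df[1].map({"B": 0, "M": 1})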

In [None]:
y = np.array(df[1])

In [None]:
# Drop the ID (column 0) and the diagnosis (column 1), leaving the 30 feature columns
df.drop(columns=[0,1], inplace=True)

### Normalize the data
Standardization to zero mean and unit variance.

In [None]:
norm_df=(df-df.mean())/df.std()

In [None]:
X = np.array(norm_df)

In [None]:
# Append the labels as a new column (label 32) so features and target are sampled together
norm_df[32] = y
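
One detail worth checking: `DataFrame.std()` in pandas uses the sample standard deviation (`ddof=1`) by default, while scikit-learn's `StandardScaler` divides by the population standard deviation (`ddof=0`). The sketch below uses a small made-up DataFrame (`toy` is a hypothetical name, not part of this notebook) to show that the two z-scores differ only by a constant factor.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 50.0]})

# Same formula as the notebook: pandas std() defaults to ddof=1
z_sample = (toy - toy.mean()) / toy.std()

# Population version (ddof=0), the convention StandardScaler uses
z_population = (toy - toy.mean()) / toy.std(ddof=0)

# The two versions differ by the constant factor sqrt(n / (n - 1))
n = len(toy)
ratio = (z_population / z_sample).to_numpy()
expected_ratio = np.sqrt(n / (n - 1))
print(ratio, expected_ratio)
```

With the 569 rows of this dataset the factor is very close to 1, so the choice should barely affect the results.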

### Split the data into training and test sets

In [None]:
# Fraction of the dataset used for training (70%); the remaining rows form the test set.
# The RandomState is unseeded, so the split changes on every run.
train = norm_df.sample(frac=0.7, random_state=np.random.RandomState())
test = norm_df.drop(train.index)

# Columns 0-29 are the features; column 30 holds the labels
X_train = np.array(train.iloc[:, :30])
y_train = np.array(train.iloc[:, 30:31]).flatten()

X_test = np.array(test.iloc[:, :30])
y_test = np.array(test.iloc[:, 30:31]).flatten()
print("Training set:", X_train.shape, y_train.shape)
print("Test set:", X_test.shape, y_test.shape)
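
Because the `RandomState` passed to `sample` above is not seeded, the accuracies reported later will vary between runs. If reproducible numbers are wanted, one option is to pass a fixed integer seed. The sketch below shows the idea on a toy frame; `toy`, `split` and `SEED` are hypothetical names introduced only for this example.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for norm_df (hypothetical data)
toy = pd.DataFrame(np.arange(40).reshape(10, 4))

SEED = 42  # any fixed integer works; 42 is an arbitrary choice

def split(frame, frac, seed):
    """Return (train, test) using the same sample/drop pattern as above."""
    train = frame.sample(frac=frac, random_state=seed)
    test = frame.drop(train.index)
    return train, test

# Two calls with the same seed select exactly the same rows
train_a, test_a = split(toy, 0.7, SEED)
train_b, test_b = split(toy, 0.7, SEED)
print(train_a.index.equals(train_b.index))
```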

## Logistic regression
Regularization can be included in the gradient computation through the `logisticRegressionReg` function.
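
For reference, the update step in `logisticRegressionReg` (the gradient plus `lambda_ * weights / y.size`) is the gradient of a cross-entropy cost with an L2 (ridge) penalty. Written out with symbols chosen here for illustration ($m$ training examples, $h_i = \sigma(x_i^\top w)$, learning rate $\alpha$, penalty strength $\lambda$):

$$
J(w) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y_i \log h_i + (1 - y_i)\log(1 - h_i)\Big] + \frac{\lambda}{2m}\sum_{j} w_j^2
$$

$$
\nabla_w J = \frac{1}{m} X^\top (h - y) + \frac{\lambda}{m}\, w, \qquad w \leftarrow w - \alpha\, \nabla_w J
$$

This form penalizes every weight, including the bias, as the code does; many formulations leave the bias out of the penalty.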

In [None]:
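def addOnes(X):
    # Prepend a column of ones so that weights[0] acts as the bias term
    ones = np.ones((X.shape[0], 1))
    return np.concatenate((ones, X), axis=1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costFunction(h, y):
    # Mean binary cross-entropy
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

def costFunctionReg(h, y, lambda_, weights):
    # Cross-entropy plus the L2 penalty lambda/(2m) * sum(w^2), which matches
    # the lambda_ * weights / y.size term used in the gradient update
    return costFunction(h, y) + lambda_ * np.sum(weights**2) / (2 * y.size)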

def logisticRegressionReg(X, y, weights, num_iter, alfa, lambda_):
    # `weights` is updated in place; pass a fresh array (e.g. zeros) on every call
    J = []      # cost per epoch
    grad = []   # unregularized gradient per epoch

    for i in range(num_iter):
        z = np.dot(X, weights)
        h = sigmoid(z)

        J.append(costFunctionReg(h, y, lambda_, weights))
        print("Epoch: {} - Cost: {}".format(i+1, J[i]))

        gradient = np.dot(X.T, (h - y)) / y.size
        grad.append(gradient)

        # Gradient of the L2 penalty; skipped on the first epoch
        regularized = lambda_*weights / y.size
        if i == 0:
            regularized = 0
        weights -= alfa * (gradient + regularized)
    return weights, J, grad

def logisticRegression(X, y, num_iter, alfa, lambda_):
    # Note: lambda_ is accepted but never used (no regularization is applied)
    weights = np.zeros(X.shape[1])
    J = []      # cost per epoch
    grad = []   # gradient per epoch

    for i in range(num_iter):
        z = np.dot(X, weights)
        h = sigmoid(z)
        J.append(costFunction(h, y))
        print("Epoch: {} - Cost: {}".format(i+1, J[i]))
        gradient = np.dot(X.T, (h - y)) / y.size
        grad.append(gradient)
        weights -= alfa * gradient
    return weights, J, grad

def predict_prob(X, weights):  
    return sigmoid(np.dot(X, weights))
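
A quick way to gain confidence in a hand-written gradient is to compare it against central finite differences of the cost. The sketch below does this on synthetic data, assuming the penalty has the form `lambda / (2m) * sum(w**2)`; `cost_l2`, `grad_l2` and `numeric_grad` are hypothetical helpers written for this check, not functions from this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_l2(w, X, y, lam):
    """Cross-entropy plus (lam / 2m) * sum(w**2); assumed form of the penalty."""
    h = sigmoid(X @ w)
    m = y.size
    data_term = (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
    return data_term + lam * np.sum(w ** 2) / (2 * m)

def grad_l2(w, X, y, lam):
    """Analytic gradient: data gradient plus lam * w / m."""
    h = sigmoid(X @ w)
    return X.T @ (h - y) / y.size + lam * w / y.size

def numeric_grad(f, w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        g[j] = (f(w + step) - f(w - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                 # synthetic features
y = (rng.random(50) > 0.5).astype(float)     # synthetic labels
w = rng.normal(scale=0.1, size=4)

analytic = grad_l2(w, X, y, lam=0.07)
numeric = numeric_grad(lambda v: cost_l2(v, X, y, 0.07), w)
max_diff = np.max(np.abs(analytic - numeric))
print("largest difference:", max_diff)
```

A difference far above floating-point noise would suggest that the cost and the gradient do not describe the same objective.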

### Logistic regression with L2 regularization

In [None]:
learning_rate=0.005
lambda_=0.07
num_iter=100

weights = np.zeros(shape=X_train.shape[1]+1)

w, J, grad = logisticRegressionReg(addOnes(X_train), y_train, weights, num_iter, learning_rate, lambda_)
pred = np.around(predict_prob(addOnes(X_test), w).flatten())
(y_test==pred).mean()
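
Note that `np.around` rounds halves to the nearest even value, so a predicted probability of exactly 0.5 becomes class 0. If the intended rule is "class 1 when the probability is at least 0.5", an explicit threshold states it more directly. A minimal sketch with made-up probabilities (`accuracy` is a hypothetical helper):

```python
import numpy as np

probs = np.array([0.2, 0.5, 0.50001, 0.9])  # made-up probabilities

rounded = np.around(probs)                  # 0.5 rounds to 0.0 (half to even)
thresholded = (probs >= 0.5).astype(int)    # 0.5 is labeled as class 1

def accuracy(y_true, y_pred):
    """Fraction of matching labels, same as (y_test == pred).mean()."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

print(rounded, thresholded)
```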

#### Plot the chart
[Click here](#Plots)

### Changing the learning rate and lambda

In [None]:
learning_rate=0.01
lambda_=0.01
num_iter=10
weights = np.zeros(shape=X_train.shape[1]+1)

w, J, grad = logisticRegressionReg(addOnes(X_train), y_train, weights, num_iter, learning_rate, lambda_)
pred = np.around(predict_prob(addOnes(X_test), w).flatten())
(y_test==pred).mean()
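
Rather than editing the hyperparameters by hand, the comparison can be automated with a small sweep. The sketch below runs the same kind of L2-regularized gradient descent on synthetic data for several learning rates and reports the final cross-entropy; `final_cost`, the data and the list of rates are all assumptions made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def final_cost(X, y, alpha, lam, num_iter):
    """Run L2-regularized gradient descent and return the final cross-entropy."""
    w = np.zeros(X.shape[1])
    m = y.size
    for _ in range(num_iter):
        h = sigmoid(X @ w)
        w -= alpha * (X.T @ (h - y) / m + lam * w / m)
    h = sigmoid(X @ w)
    return float((-y * np.log(h) - (1 - y) * np.log(1 - h)).mean())

# Synthetic binary problem (not the WDBC data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = (sigmoid(X @ true_w) > rng.random(200)).astype(float)

results = {
    alpha: final_cost(X, y, alpha, lam=0.01, num_iter=100)
    for alpha in (0.005, 0.01, 0.1, 0.5)
}
for alpha, cost in results.items():
    print(f"alpha={alpha}: cost={cost:.4f}")
```

On a problem like this, very small learning rates leave the cost close to its starting value of ln 2 after 100 epochs, which is one reason the runs above with `num_iter=10` may look under-trained.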

#### Plot the chart
[Click here](#Plots)

### Logistic regression without regularization


In [None]:
learning_rate=0.7
lambda_=0.9  # has no effect: logisticRegression does not use lambda_
num_iter=50

w,J,grad = logisticRegression(addOnes(X_train), y_train, num_iter, learning_rate, lambda_)
pred = np.around(predict_prob(addOnes(X_test), w).flatten())
(y_test==pred).mean()
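
With a large learning rate such as 0.7 the weights can grow quickly, and the sigmoid may saturate to exactly 0 or 1 in floating point, which makes `np.log` return infinities. One common workaround is to clip the probabilities before taking the logarithm. The sketch below illustrates this; `cost_clipped` and the tolerance `eps` are assumptions for this example, not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_clipped(h, y, eps=1e-12):
    """Cross-entropy with h clipped away from 0 and 1 (eps is an assumed tolerance)."""
    h = np.clip(h, eps, 1 - eps)
    return float((-y * np.log(h) - (1 - y) * np.log(1 - h)).mean())

# A saturated prediction: sigmoid(40) is exactly 1.0 in float64
h = sigmoid(np.array([40.0, -40.0]))
y = np.array([0.0, 1.0])   # both predictions are confidently wrong

# Unclipped cost: log(1 - 1.0) is -inf, so the mean becomes inf
with np.errstate(divide="ignore", invalid="ignore"):
    naive = (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

safe = cost_clipped(h, y)
print(naive, safe)
```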

#### Plot the chart
[Click here](#Plots)

## Scikit-learn

For comparison, we can measure the accuracy obtained with the [Scikit-Learn](https://scikit-learn.org/) library, using its linear model [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).

In [None]:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

In [None]:
# X and y are the standardized features and labels built in the cells above
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, train_size=0.8)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=100))
pipe.fit(X_train, y_train)

In [None]:
pipe.score(X_test, y_test)
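
To compare this more directly with the hand-written regularized model, the penalty strength can be set explicitly. scikit-learn's `LogisticRegression` takes the inverse regularization strength `C`, so under the usual convention `C` is roughly `1 / lambda_`. This mapping is an approximation here, since scikit-learn does not penalize the intercept while the hand-written code does. The sketch uses synthetic data from `make_classification` rather than the WDBC features, and the `_demo` names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the WDBC features (not the real dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.8, random_state=42, stratify=y_demo
)

lambda_ = 0.07          # same value as the first hand-written run
C = 1.0 / lambda_       # assumed mapping between lambda_ and sklearn's C

# The default penalty of LogisticRegression is L2
pipe_demo = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=100))
pipe_demo.fit(X_tr, y_tr)
test_accuracy = pipe_demo.score(X_te, y_te)
print("test accuracy:", test_accuracy)
```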

## Another way to use it
A more step-by-step approach: standardize the data explicitly, then fit the logistic regression on the training data.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

In [None]:
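# Fit the scaler on the training data only, then apply it to both sets
scaler = preprocessing.StandardScaler().fit(X_train)
X_scaled = scaler.transform(X_train)
X_scl = scaler.transform(X_test)

clf = LogisticRegression(random_state=42).fit(X_scaled, y_train)
y_pred = clf.predict(X_scl)
# Report accuracy on the held-out test set rather than on the training data
clf.score(X_scl, y_test)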

## Plots

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

In [None]:
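# J and grad come from whichever training cell was run last.
# Row 1: cost per epoch. Rows 2 and 3: gradient components (one point per weight)
# at the first and last epoch.
fig = make_subplots(rows=3, cols=1)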
fig.add_trace(go.Scatter(x=np.arange(1,len(J)+1), y=J,
                    mode='lines+markers',
                    name='costFunction'),row=1, col=1)

fig.add_trace(go.Scatter(x=np.arange(1,len(grad[0])+1), y=grad[0],
                    mode='lines+markers',
                    name='initialGradient'),row=2, col=1)

fig.add_trace(go.Scatter(x=np.arange(1,len(grad[-1])+1), y=grad[-1],
                    mode='lines+markers',
                    name='finalGradient'),row=3, col=1)
fig.show()

### Back to the table of contents
[Click here](#Table-of-contents)

### Team members

Everson Magalhães Cavalcante - 384351 <br>
Lucas da Silva Gouveia - 384363 <br>
Ubiratan de Oliveira Junior - 397322

## References

#### Libraries
1. [Numpy](https://numpy.org/doc/1.19/)
2. [Pandas](https://pandas.pydata.org/docs/)
3. [Markdown](https://daringfireball.net/projects/markdown/basics)
4. [Plotly](https://plotly.com/python/)

    4.1. [LineChart](https://plotly.com/python/line-charts/)

5. [Scikit-Learn](https://scikit-learn.org)

    5.1. [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)