<a href="https://colab.research.google.com/github/eutiagovski/projetos-cursos/blob/main/datascience-mentorama/11_implementando_Modelos_exercicio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_friedman1, make_classification

In [3]:
# Funções dos exercícios

def getData():
  X, y = make_friedman1(n_samples=10000, n_features=5, noise=0.5, random_state=0)
  return X, y

def getData2():
  X, y = make_classification(n_classes=2, n_features=5, n_samples=10000, random_state=0)
  return X, y

# Classe regressão linear criada em aula

class regLinear():
  def __init__(self, learning_rate, num_steps):
    self.learning_rate = learning_rate
    self.num_steps = num_steps

  def fit(self, X, y):
    y = y.reshape(-1, 1)
    m = X.shape[0]
    k = X.shape[1]
    theta = np.random.randn(k + 1, 1)
    X_b = np.c_[np.ones((m, 1)), X]

    for step in range(self.num_steps):
      gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
      theta = theta - self.learning_rate * gradients

    self.final_theta = theta
    print('Model trained')

  def predict(self, X):
    m = X.shape[0]
    X_b = np.c_[np.ones((m, 1)), X]
    preds = X_b.dot(self.final_theta)
    return preds.reshape(-1,)

## Exercício 1: Regressão Linear:

###Parte 1

1- Usando a função getData(), carregue os dados disponibilizados.

2- Separe parte dos dados para o dataset de teste.

3- Usando a metodologia de validação cruzada, teste diferentes parâmetros da regLinear - diferentes learning_rates e num_steps - para escolher a melhor combinação de parâmetros.

4- Implemente a regressão linear do scikit-learn e compare os resultados obtidos.


In [None]:
# Carregando os dados do exercicio

X, y = getData()
X.shape, y.shape

((10000, 5), (10000,))

In [None]:
# Seprando os dados em treino e teste

from sklearn.model_selection import train_test_split

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

In [None]:
Xtrain.shape, Xtest.shape

((7500, 5), (2500, 5))

In [None]:
## teste de caso:

lin_reg = regLinear(num_steps=1, learning_rate=0.25)
lin_reg.fit(Xtrain, ytrain)

ypred = lin_reg.predict(Xtest)
ypred, ytest

Model trained


(array([[21.7352223 ],
        [23.76019688],
        [14.22755497],
        ...,
        [17.6200454 ],
        [12.85730272],
        [13.1970261 ]]),
 array([23.35886272, 22.42612662, 15.12363749, ..., 18.74407227,
         3.99198614, 14.87770943]))

In [None]:
from sklearn.metrics import mean_squared_error

lin_mse = mean_squared_error(ytest, ypred)
lin_mse

19.86231329318883

In [None]:
steps = [1, 3, 5, 10, 100, 200]
rates = [0.0025, 0.025, 0.5, 0.75, 1, 1.25]

for rate in rates:
 for step in steps:
    lin_reg = regLinear(num_steps=step, learning_rate=rate)
    lin_reg.fit(Xtrain, ytrain)

    ypred = lin_reg.predict(Xtrain)
    lin_mse = mean_squared_error(ytrain, ypred)

    print(f'Step: {step}')
    print(f'Learning Rate: {rate}')
    print(f'Rmse: {np.sqrt(np.mean(np.square(ytrain - ypred)))}')
    print()        

Model trained
Step: 1
Learning Rate: 0.0025
Rmse: 17.267002513132766

Model trained
Step: 3
Learning Rate: 0.0025
Rmse: 14.700402747769225

Model trained
Step: 5
Learning Rate: 0.0025
Rmse: 12.896831453597956

Model trained
Step: 10
Learning Rate: 0.0025
Rmse: 12.776868710511684

Model trained
Step: 100
Learning Rate: 0.0025
Rmse: 6.797140035879628

Model trained
Step: 200
Learning Rate: 0.0025
Rmse: 5.49992245261138

Model trained
Step: 1
Learning Rate: 0.025
Rmse: 13.901535634897423

Model trained
Step: 3
Learning Rate: 0.025
Rmse: 12.091275418341242

Model trained
Step: 5
Learning Rate: 0.025
Rmse: 8.211172147975931

Model trained
Step: 10
Learning Rate: 0.025
Rmse: 5.968832965826814

Model trained
Step: 100
Learning Rate: 0.025
Rmse: 5.497174186769912

Model trained
Step: 200
Learning Rate: 0.025
Rmse: 5.812730564386747

Model trained
Step: 1
Learning Rate: 0.5
Rmse: 19.176848468223646

Model trained
Step: 3
Learning Rate: 0.5
Rmse: 29.781058340835646

Model trained
Step: 5
Learnin

In [None]:
## Melhores parâmetros para o modelo

lin_reg = regLinear(num_steps=200, learning_rate=0.75)

lin_reg.fit(Xtrain, ytrain)
ypred = lin_reg.predict(Xtrain)

print(f'MSE: {mean_squared_error(ytrain, ypred)}')
print(f'RMSE: {np.sqrt(np.mean(np.square(ytrain - ypred)))}')


Model trained
MSE: 3.660129814671154e+156
RMSE: 1.913146574277867e+78


In [None]:
# Comparando com o modelo do sklearn

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()

lin_reg.fit(Xtrain, ytrain)

ypred = lin_reg.predict(Xtrain)

print(f'MSE: {mean_squared_error(ytrain, ypred)}')
print(f'RMSE: {np.sqrt(np.mean(np.square(ytrain - ypred)))}')

MSE: 6.049563628914155
RMSE: 2.459586068612797


In [None]:
# Validando no conjunto de testes:

lin_reg = regLinear(num_steps=200, learning_rate=0.75)

lin_reg.fit(Xtest, ytest)
ypred = lin_reg.predict(Xtest)

print(f'MSE: {mean_squared_error(ytest, ypred)}')
print(f'RMSE: {np.sqrt(np.mean(np.square(ytest - ypred)))}')

Model trained
MSE: 1.123770385255462e+157
RMSE: 3.352268463675696e+78


###Parte 2
Introdução__

Para cada variável explicativa $X_1, .., X_5$, crie outras variáveis usando o __quadrado__ de cada um delas. Desta forma, o conjunto final será de 10 variáveis, em que:

$X_6 = (X_1)^{2}$, $X_7 = (X_2)^{2}$, $X_8 = (X_3)^{2}$, $X_9 = (X_4)^{2}$, $X_{10} = (X_5)^{2}$.

Ao treinarmos uma regressão linear com essas 10 variáveis, a predição é da forma:

$y_{pred} = \theta_0 + \theta_1 \cdot X_1 + .. + \theta_5 \cdot X_5 + \theta_6 \cdot (X_1)^{2} + .. + \theta_{10} \cdot (X_5)^{2}$

Como estamos usando o quadrado das variáveis explicativas, dizemos que temos um __modelo de regressão polinomial de grau 2__. Podemos ter variações deste modelo:

-Podemos aumentar o grau: basta mudar a potência que elevamos as variáveis. Por exemplo, podemos incluir o __cubo__ das variáveis e termos um modelo polinomial de ordem 3.

-Podemos ter __interações__ entre as variáveis: multiplicações entre as variáveis.

Exemplo:

$y_{pred} = \theta_0 + \theta_1 \cdot X_1 + .. + \theta_5 \cdot X_5 + \theta_6 \cdot (X_1)^{2} + .. + \theta_{10} \cdot (X_5)^{2} + \theta_{11} \cdot (X_1)^{3} + \theta_{12} \cdot V1 + \theta_{13} \cdot V2$,

onde

$V_1 = X_1 \cdot X_2$ e $V_2 = (X_2)^{2} \cdot X_4$

Exercício__

1- Estude o link:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html

em que é discutido como criar modelos polinomiais com o scikit-learn de forma detalhada.

2- Repita os passos da primeira parte, mas agora considerando polinômios de graus 2 ou mais.

3- Inclua regularização Ridge e Lasso nas análises e teste os resultados para diferentes parâmetros $\alpha$.

<br>

In [None]:
# realizando o quadrado das variáveis X

from sklearn.preprocessing import PolynomialFeatures

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(Xtrain)
X_poly.shape


(7500, 20)

In [None]:
X_poly[:10]

array([[6.54137349e-01, 4.39491629e-01, 7.42546626e-01, 5.84881233e-02,
        2.31813099e-02, 4.27895671e-01, 2.87487889e-01, 4.85727481e-01,
        3.82592659e-02, 1.51637606e-02, 1.93152892e-01, 3.26343027e-01,
        2.57050406e-02, 1.01879916e-02, 5.51375492e-01, 4.34301587e-02,
        1.72132034e-02, 3.42086057e-03, 1.35583131e-03, 5.37373126e-04],
       [8.45075242e-01, 1.67746108e-01, 1.03945363e-01, 6.11141276e-01,
        4.95987227e-02, 7.14152164e-01, 1.41758083e-01, 8.78416527e-02,
        5.16460362e-01, 4.19146526e-02, 2.81387567e-02, 1.74364301e-02,
        1.02516570e-01, 8.31999269e-03, 1.08046385e-02, 6.35253017e-02,
        5.15555723e-03, 3.73493659e-01, 3.03118267e-02, 2.46003329e-03],
       [4.46516825e-01, 9.86511266e-01, 1.50774754e-01, 4.42882415e-01,
        3.74080187e-01, 1.99377275e-01, 4.40493879e-01, 6.73234643e-02,
        1.97754450e-01, 1.67033098e-01, 9.73204477e-01, 1.48740993e-01,
        4.36908492e-01, 3.69034319e-01, 2.27330263e-02, 6.6775

In [None]:
# testando no modelo de rgressão do sklearn

poly_fit = LinearRegression()
poly_fit.fit(X_poly, ytrain)

y_new = poly_fit.predict(X_poly)

print(f'MSE: {mean_squared_error(ytrain, y_new)}')
print(f'RMSE: {np.sqrt(np.mean(np.square(ytrain - y_new)))}')

MSE: 1.981269729714431
RMSE: 1.4075758344453173


In [None]:
# testando no modelo consolidado de regressão linear

poly_fit = regLinear(num_steps=200, learning_rate=0.75)
poly_fit.fit(X_poly, ytrain)

y_new = poly_fit.predict(X_poly)

print(f'MSE: {mean_squared_error(ytrain, y_new)}')
print(f'RMSE: {np.sqrt(np.mean(np.square(ytrain - y_new)))}')

Model trained
MSE: 1.2643995787615632e+261
RMSE: 3.555839674059513e+130


In [None]:
## procurando o melhor parâmetro

steps = [1, 3, 5, 10, 100, 200]
rates = [0.0025, 0.025, 0.5, 0.75, 1, 1.25]

for rate in rates:
  for step in steps:
      lin_reg = regLinear(num_steps=step, learning_rate=rate)
      lin_reg.fit(X_poly, ytrain)

      ypred = lin_reg.predict(X_poly)
      lin_mse = mean_squared_error(ytrain, ypred)

      print(f'Step: {step}')
      print(f'Learning Rate: {rate}')
      print(f'Mse: {mean_squared_error(ytrain, ypred)}')
      print(f'Rmse: {np.sqrt(np.mean(np.square(ytrain - ypred)))}')
      print()

Model trained
Step: 1
Learning Rate: 0.0025
Mse: 136.35282913205108
Rmse: 12.078276880261233

Model trained
Step: 3
Learning Rate: 0.0025
Mse: 213.59190962484732
Rmse: 14.582112829809118

Model trained
Step: 5
Learning Rate: 0.0025
Mse: 243.49642300276741
Rmse: 15.636452359827157

Model trained
Step: 10
Learning Rate: 0.0025
Mse: 111.7502626194361
Rmse: 10.796282283593575

Model trained
Step: 100
Learning Rate: 0.0025
Mse: 15.466497902566148
Rmse: 6.267989177161196

Model trained
Step: 200
Learning Rate: 0.0025
Mse: 10.060774860939256
Rmse: 6.112374822660034

Model trained
Step: 1
Learning Rate: 0.025
Mse: 196.70927057192327
Rmse: 14.139060569644794

Model trained
Step: 3
Learning Rate: 0.025
Mse: 69.56476913360258
Rmse: 9.177761822089979

Model trained
Step: 5
Learning Rate: 0.025
Mse: 33.07099024860825
Rmse: 7.115695187077639

Model trained
Step: 10
Learning Rate: 0.025
Mse: 14.14720238921846
Rmse: 6.2679674777810686

Model trained
Step: 100
Learning Rate: 0.025
Mse: 7.67112600027559

  output_errors = np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight)
  output_errors = np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight)
  from ipykernel import kernelapp as app


Rmse: inf

Model trained
Step: 1
Learning Rate: 1.25
Mse: 17336.499128893694
Rmse: 132.68242621209671

Model trained
Step: 3
Learning Rate: 1.25
Mse: 76694930.70490895
Rmse: 8758.476458962674

Model trained
Step: 5
Learning Rate: 1.25
Mse: 152801706868.71603
Rmse: 390899.49939528387

Model trained
Step: 10
Learning Rate: 1.25
Mse: 3.469829720859526e+20
Rmse: 18627478950.51925

Model trained
Step: 100
Learning Rate: 1.25
Mse: 2.425117191909739e+183
Rmse: 4.924547889816625e+91

Model trained
Step: 200
Learning Rate: 1.25
Mse: inf


  output_errors = np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight)
  output_errors = np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight)


Rmse: inf



  from ipykernel import kernelapp as app


In [None]:
# Consolidando os melhores parâmetros: 

lin_reg_poly_best = regLinear(learning_rate=0.5, num_steps=100)
lin_reg_poly_best.fit(X_poly, ytrain)

y_new = lin_reg_poly_best.predict(X_poly)

print(f'MSE: {mean_squared_error(ytrain, y_new)}')
print(f'RMSE: {np.sqrt(np.mean(np.square(ytrain - y_new)))}')

Model trained
MSE: 1.1997127881378082e+86
RMSE: 1.0953140134855431e+43


In [None]:
# Validando no conjunto de testes:

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly_test = poly_features.fit_transform(Xtest)

lin_reg_poly_best = regLinear(learning_rate=0.5, num_steps=100)
lin_reg_poly_best.fit(X_poly_test, ytest)

y_new = lin_reg_poly_best.predict(X_poly_test)

print(f'MSE: {mean_squared_error(ytest, y_new)}')
print(f'RMSE: {np.sqrt(np.mean(np.square(ytest - y_new)))}')

Model trained
MSE: 1.5988566509071223e+87
RMSE: 3.998570558220926e+43


In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def polyFit(X, y, grau):
  poylbig_features = PolynomialFeatures(degree=grau, include_bias=False)
  std_scaler = StandardScaler()
  lin_reg = LinearRegression()

  polynomial_regressor = Pipeline([
                                   ('poly_features', poylbig_features),
                                   ('std_scaler', std_scaler),
                                   ('lin_reg', lin_reg),
  ])

  polynomial_regressor.fit(X, y)
  return polynomial_regressor

In [None]:
for grau in [1,2,5,10]:
  print()

  polyfit = polyFit(Xtrain, ytrain, grau)

  ypoly = polyfit.predict(Xtrain)

  print(f'Grau: {grau}')
  print(f'RMSE: {np.sqrt(np.mean(np.square(ytrain - ypoly)))}')
  print('-' * 60)


Grau: 1
RMSE: 2.4595860686127966
------------------------------------------------------------

Grau: 2
RMSE: 1.4075758344453173
------------------------------------------------------------

Grau: 5
RMSE: 0.4931481456953378
------------------------------------------------------------

Grau: 10
RMSE: 0.38775558354102774
------------------------------------------------------------


In [None]:
from sklearn.linear_model import Ridge, Lasso

def polyFitReg(X, y, grau, base_model, base_model_name):
  poylbig_features = PolynomialFeatures(degree=grau, include_bias=False)
  std_scaler = StandardScaler()
  basemodel = base_model

  polynomial_regressor = Pipeline([
                                   ('poly_features', poylbig_features),
                                   ('std_scaler', std_scaler),
                                   (base_model_name, basemodel),
  ])

  polynomial_regressor.fit(X, y)
  return polynomial_regressor

grau = 10

for alpha in [0, 0.001, 0.01, 1, 10, 100, 10000]:
  model_name = 'Ridge_alpha: ' + str(alpha)
  polyfit = polyFitReg(Xtrain,
                       ytrain, 
                       grau, 
                       base_model = Ridge(alpha=alpha),
                       base_model_name = model_name)
  
  ypoly_novo = polyfit.predict(Xtest)

  print(model_name)

  train_error = np.sqrt(np.mean(np.square(ytrain - polyfit.predict(Xtrain))))
  test_error = np.sqrt(np.mean(np.square(ytest - polyfit.predict(Xtest))))

  print(f'RMSE (treino): {train_error}')
  print(f'RMSE (teste): {test_error}')

  print(f'{train_error-test_error*-1}')

  print('-' * 60)

  return linalg.solve(A, Xy, sym_pos=True, overwrite_a=True).T


Ridge_alpha: 0
RMSE (treino): 0.3877723443117173
RMSE (teste): 1.1295117927592704
1.5172841370709877
------------------------------------------------------------
Ridge_alpha: 0.001
RMSE (treino): 0.45158128325835956
RMSE (teste): 0.5608284863667962
1.0124097696251557
------------------------------------------------------------
Ridge_alpha: 0.01
RMSE (treino): 0.4634325230674249
RMSE (teste): 0.537798688828515
1.0012312118959399
------------------------------------------------------------
Ridge_alpha: 1
RMSE (treino): 0.48206007653380534
RMSE (teste): 0.5159412952316388
0.9980013717654441
------------------------------------------------------------
Ridge_alpha: 10
RMSE (treino): 0.5025495738487654
RMSE (teste): 0.5268198234202932
1.0293693972690585
------------------------------------------------------------
Ridge_alpha: 100
RMSE (treino): 0.6044552098858843
RMSE (teste): 0.6231022567574963
1.2275574666433806
------------------------------------------------------------
Ridge_alpha: 1000

## Exercício 2: Regressão Logística:


###Parte 1

Crie uma classe regLogistica para treinar o modelo de regressão logística. Essa classe deve ser usada para problemas de classificação binária, cuja variável target assume os valores: 0 (classe negativa) e 1 (classe positiva).

O método construtor dessa classe deve possuir 3 parâmetros: learning_rate, num_steps e limiar.

Os outros médotos devem ser:

    - médoto fit: para treinar o modelo - usando gradient descent
    
    - médoto predict_proba: para retornar a probabilidade da classe 1
    
    - médoto predict: retornar a classe predita: 0 ou 1 - dependente do limiar
    


In [200]:
class regLogistica:
  def __init__(self, learning_rate=0.01, num_steps=10, limiar=0.8, info=False):
    self.learning_rate = learning_rate
    self.num_steps = num_steps
    self.limiar = limiar
    self.info = info

  def fit(self, X, y):
    y = y.reshape(-1, 1)
    # treinar o modelo segundo o método gradient descent
    
    # primeiro reshape os dados para o formato adequado
    X_b = np.c_[np.ones(X.shape[0]), X]
    theta = np.random.randn(X_b.shape[1], 1)

    for step in range(self.num_steps):
      #Calculando a probabilidade
      yscores = (1 / (1 + np.exp(-X_b.dot(theta))))

      #Calculando o gradiente do logloss
      gradient = X_b.T.dot(yscores -y)

      #Atualizando os pesos
      theta = theta - self.learning_rate * gradient

      #Calculando o logloss nos passos
      self.logloss_step = ((y * np.log(yscores) + (1 - y) * np.log(1 - yscores)).mean() * -1)

      #Printa as informações
      if self.info:
        print(f'Step: {step}')
        print(f'Theta: {theta.reshape(-1,)}')
        print(f'LogLoss: {self.logloss_step}')
        print()
        print('Model Trained!')
        print('-' * 60)
        print()

      self.theta_final = theta

  def predict(self, X):
    self.ypred = np.where(X > self.limiar, 1, 0)
    return self.ypred
  
  def predict_proba(self, X):
    m = X.shape[0]
    X_b = np.c_[np.ones((m, 1)), X]
    probs = (X_b.dot(self.theta_final))
    return probs


###Parte 2

Usando a função getData2(), carregue o dataset disponibilizado.

Use a regLogistica, classe criada na parte 1 do exercício, para treinar modelos nestes dados. Use validação cruzada para seleção dos parâmetros. Considere diferentes métricas de classificação e justifique as escolhas.


In [201]:
X, y = getData2()
X, y

(array([[-0.82380715, -0.59163837,  0.13041933, -0.40345475,  1.16360785],
        [ 0.7091986 ,  0.60606127, -0.37678226,  0.39654936, -1.15961369],
        [ 1.61194498,  0.36486859,  1.91264129,  0.38601731, -0.31972146],
        ...,
        [ 1.38015938,  1.43125078, -1.42179351,  0.89985272, -0.70967569],
        [-1.63030207, -0.23544436, -2.29968645, -0.32243952, -1.49535664],
        [ 1.07627839,  1.178116  , -1.27826779,  0.73327205, -1.27906183]]),
 array([0, 1, 1, ..., 0, 0, 0]))

In [202]:
reg_log = regLogistica(learning_rate=0.0001, num_steps=10, limiar=0.9, info=True)
reg_log.fit(X, y)


Step: 0
Theta: [ 1.6368773   0.55419767  0.65963913  0.53650266 -1.96427246  2.30179449]
LogLoss: 1.4392522469901945

Model Trained!
------------------------------------------------------------

Step: 1
Theta: [ 1.43346586  0.53159113  0.52498652  0.86388505 -2.03560216  2.03639424]
LogLoss: 1.086414380811293

Model Trained!
------------------------------------------------------------

Step: 2
Theta: [ 1.25557412  0.52542086  0.4403946   1.08405542 -2.07941488  1.79777894]
LogLoss: 0.8714401593430686

Model Trained!
------------------------------------------------------------

Step: 3
Theta: [ 1.10107583  0.52890078  0.38906493  1.23069582 -2.10510221  1.58785682]
LogLoss: 0.7406615580116819

Model Trained!
------------------------------------------------------------

Step: 4
Theta: [ 0.96589911  0.53678385  0.35845947  1.32861716 -2.11969646  1.40353153]
LogLoss: 0.6563525785354732

Model Trained!
------------------------------------------------------------

Step: 5
Theta: [ 0.8466067

In [203]:
probs = reg_log.predict_proba(X)


In [204]:
ypred = reg_log.predict(probs)

In [205]:
ypred.shape, y.shape

((10000, 1), (10000,))

In [206]:
from sklearn.metrics import roc_auc_score

def auc_score(y, ypred):
  accuracy = roc_auc_score(y, ypred)
  print(f'Precisão: {(accuracy *100).round(2)}%')
  return accuracy

In [207]:
auc_score(y, ypred)

Precisão: 80.24%


0.8024330755036286

In [208]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)

In [209]:
def search_best(X_train, y_train, steps, rates):
  for rate in rates:
    for step in steps:
        reg_log = regLogistica(learning_rate=rate, num_steps=step, limiar=0.8)
        reg_log.fit(X_train, y_train)

        ypred = reg_log.predict(reg_log.predict_proba(X_train))
        score = reg_log.logloss_step

        print(f'Steps: {step}')
        print(f'Learning Rate: {rate}')
        print(f'LogLoss: {score}')
        print()  

In [210]:
steps = [2, 8, 16, 20]
rates = [0.0001, 0.0005, 0.001, 0.005]

search_best(X_train, y_train, steps=steps, rates=rates)

Steps: 2
Learning Rate: 0.0001
LogLoss: 2.075086670841956

Steps: 8
Learning Rate: 0.0001
LogLoss: 0.46352867279704374

Steps: 16
Learning Rate: 0.0001
LogLoss: 0.4087133914178898

Steps: 20
Learning Rate: 0.0001
LogLoss: 0.40324484806727995

Steps: 2
Learning Rate: 0.0005
LogLoss: 0.46608173628076827

Steps: 8
Learning Rate: 0.0005
LogLoss: 0.4025219485870364

Steps: 16
Learning Rate: 0.0005
LogLoss: 0.40251871192896804

Steps: 20
Learning Rate: 0.0005
LogLoss: 0.402518617859431

Steps: 2
Learning Rate: 0.001
LogLoss: 0.6361259037496491

Steps: 8
Learning Rate: 0.001
LogLoss: 0.4025294542887039

Steps: 16
Learning Rate: 0.001
LogLoss: 0.40251862101588926

Steps: 20
Learning Rate: 0.001
LogLoss: 0.40251863035387037

Steps: 2
Learning Rate: 0.005
LogLoss: nan

Steps: 8
Learning Rate: 0.005
LogLoss: nan

Steps: 16
Learning Rate: 0.005
LogLoss: nan

Steps: 20
Learning Rate: 0.005
LogLoss: nan





In [211]:
reg_log = regLogistica(learning_rate=0.0005, num_steps=20, limiar=0.9, info=True)
reg_log.fit(X_train, y_train)

Step: 0
Theta: [ 0.10348466 -0.37595466 -1.13612361  2.17649411  0.28787232  0.44417244]
LogLoss: 2.0982811623026487

Model Trained!
------------------------------------------------------------

Step: 1
Theta: [-0.0198917  -0.12127501 -0.90496857  2.00423798  0.43715305  0.28461909]
LogLoss: 0.475258960666341

Model Trained!
------------------------------------------------------------

Step: 2
Theta: [-0.08285961 -0.01043048 -0.77283717  1.84306239  0.51816689  0.16010748]
LogLoss: 0.42947712293708

Model Trained!
------------------------------------------------------------

Step: 3
Theta: [-0.10276518  0.01148275 -0.71376956  1.7211102   0.5509479   0.07305722]
LogLoss: 0.4123800530338971

Model Trained!
------------------------------------------------------------

Step: 4
Theta: [-1.03604113e-01  1.17184769e-03 -6.89863872e-01  1.63712553e+00
  5.61831110e-01  1.88002299e-02]
LogLoss: 0.40641306547927214

Model Trained!
------------------------------------------------------------

St

In [212]:
probs = reg_log.predict_proba(X_train)
ypred = reg_log.predict(probs)

In [213]:
auc_score(y_train, ypred)


Precisão: 80.6%


0.8060482688119472

In [214]:
# Validando o modelo

reg_log = regLogistica(learning_rate=0.0005, num_steps=20, limiar=0.9, info=False)
reg_log.fit(X_test, y_test)

In [216]:
test_probs = reg_log.predict_proba(X_test)
test_predict = reg_log.predict(test_probs)

In [217]:
auc_score(y_test, test_predict)

Precisão: 78.04%


0.7803793388424461