*Logistic regression is widely used in the bank credit approval process*

In [17]:
import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [3]:
df = pd.read_csv('iris.csv', index_col=0)
df

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target,target_name
0,5.1,3.5,1.4,0.2,0,setosa
1,4.9,3.0,1.4,0.2,0,setosa
2,4.7,3.2,1.3,0.2,0,setosa
3,4.6,3.1,1.5,0.2,0,setosa
4,5.0,3.6,1.4,0.2,0,setosa
...,...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,2,virginica
146,6.3,2.5,5.0,1.9,2,virginica
147,6.5,3.0,5.2,2.0,2,virginica
148,6.2,3.4,5.4,2.3,2,virginica


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   target             150 non-null    int64  
 5   target_name        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 8.2+ KB


## Understanding the data

In [7]:
fig = px.scatter_3d(df, x = 'sepal length (cm)', y = 'sepal width (cm)', z = 'petal length (cm)', color = 'target_name')

fig.show()

**Observations**:

- Setosas have wider, shorter sepals and also the shortest petals
- Versicolors have medium-length sepals and petals, with short-to-medium sepal width
- Virginicas are the longest in both sepals and petals, with medium sepal width, like the Versicolors
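
The observations above can be checked numerically with a per-species summary. The following is a minimal sketch, not part of the original notebook: it assumes scikit-learn's bundled copy of Iris (`load_iris`) holds the same values as `iris.csv`, and the names `iris`, `frame` and `means` are illustrative.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for iris.csv.
iris = load_iris(as_frame=True)
frame = iris.frame.copy()
# Map the integer target (0, 1, 2) to the species name.
frame['target_name'] = iris.target_names[frame['target'].to_numpy()]

# Mean of each measurement per species.
means = frame.drop(columns='target').groupby('target_name').mean()
print(means.round(2))
```

If the observations hold, setosa should show the largest mean sepal width and the smallest mean petal length, and virginica the largest mean lengths.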

### Modeling

In [11]:
X = df.drop(columns=['target', 'target_name'])  # Features: every column except the two target columns
y = df[['target']]  # By convention, X is uppercase and y is lowercase

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.3, random_state=9)

**Checking the datasets**

In [13]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(105, 4)
(45, 4)
(105, 1)
(45, 1)
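
The shapes confirm the 70/30 split of the 150 rows (105 for training, 45 for testing). A plain random split does not guarantee that the three species appear in equal numbers in the test set; `train_test_split` accepts a `stratify` argument for that. A minimal sketch on synthetic labels (the arrays `X_demo` and `y_demo` are made up for illustration and only mimic the 50/50/50 class layout of Iris):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 150 rows, 3 classes with 50 samples each.
X_demo = np.arange(150).reshape(-1, 1)
y_demo = np.repeat([0, 1, 2], 50)

# stratify=y_demo keeps the class proportions equal in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=9, stratify=y_demo
)

print(X_tr.shape, X_te.shape)   # sizes of the two subsets
print(np.bincount(y_te))        # test samples per class
```

Whether stratification is worth it here is a judgment call; it mostly matters for small or imbalanced datasets.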


**Building the model**

In [14]:
clf = LogisticRegression()
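
Calling `LogisticRegression()` with no arguments relies on the library defaults. A small sketch to inspect a few of them with `get_params()`; the values noted in the comment are the defaults of recent scikit-learn releases and may differ in older versions, and `clf_demo` is an illustrative name:

```python
from sklearn.linear_model import LogisticRegression

clf_demo = LogisticRegression()
params = clf_demo.get_params()

# C is the inverse regularization strength, solver the optimizer,
# and max_iter the iteration cap (1.0, 'lbfgs' and 100 in recent releases).
for name in ('C', 'solver', 'max_iter'):
    print(name, '=', params[name])
```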

**Fitting the model**

In [15]:
clf.fit(X_train, y_train)


A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().



LogisticRegression()
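
The warning printed above appears because `y_train` is a one-column table rather than a flat array. A minimal sketch of the fix the message suggests, using scikit-learn's bundled Iris data as a stand-in for `iris.csv` (an assumption) and a raised `max_iter` so that convergence messages do not clutter the output:

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import DataConversionWarning
from sklearn.linear_model import LogisticRegression

# Assumption: bundled Iris data instead of iris.csv.
X_all, y_all = load_iris(return_X_y=True)
y_column = y_all.reshape(-1, 1)   # shape (150, 1), like df[['target']]
y_flat = y_column.ravel()         # shape (150,), what fit() expects

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    model = LogisticRegression(max_iter=1000).fit(X_all, y_flat)

# Keep only the warnings about the shape of y.
shape_warnings = [w for w in caught if issubclass(w.category, DataConversionWarning)]
print(y_column.shape, y_flat.shape, len(shape_warnings))
```

With the DataFrame from this notebook, `y = df['target']` (single brackets) or `clf.fit(X_train, y_train.values.ravel())` should have the same effect.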

In [20]:
clf.coef_

array([[-0.40223297,  0.80962952, -2.28883408, -0.95999769],
       [ 0.46547011, -0.17327536, -0.21334307, -0.77675674],
       [-0.06323714, -0.63635416,  2.50217716,  1.73675443]])
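
`coef_` has one row per class and one column per feature, in the column order of `X`. A short sketch that labels the values printed above and picks the feature with the largest absolute weight for each class; the class order 0, 1, 2 = setosa, versicolor, virginica is the standard Iris encoding and is assumed here:

```python
import numpy as np

features = ['sepal length (cm)', 'sepal width (cm)',
            'petal length (cm)', 'petal width (cm)']
classes = ['setosa', 'versicolor', 'virginica']  # assumed order for targets 0, 1, 2

# Values copied from the clf.coef_ output above.
coef = np.array([[-0.40223297,  0.80962952, -2.28883408, -0.95999769],
                 [ 0.46547011, -0.17327536, -0.21334307, -0.77675674],
                 [-0.06323714, -0.63635416,  2.50217716,  1.73675443]])

# Feature with the largest absolute coefficient for each class.
strongest = {c: features[i] for c, i in zip(classes, np.abs(coef).argmax(axis=1))}
for name, feature in strongest.items():
    print(name, '->', feature)

# Sum of each column across the three classes.
print(coef.sum(axis=0).round(6))
```

Petal length carries the largest weight for setosa (negative) and virginica (positive), in line with the scatter plot observations. Each column sums to roughly zero, which is consistent with a symmetric multinomial (softmax) formulation.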

**Prediction**

In [16]:
y_pred = clf.predict(X_test)
y_pred

array([2, 1, 2, 2, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 2, 0, 0, 0,
       2, 0, 2, 1, 0, 2, 0, 2, 2, 2, 0, 1, 1, 1, 1, 0, 2, 0, 0, 2, 1, 0,
       2], dtype=int64)
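
The predictions are the integer codes from the `target` column. A small sketch that counts how many test samples were assigned to each species, using the array printed above; the code-to-name mapping is the standard Iris encoding and is assumed here, and `y_pred_demo` is an illustrative name:

```python
import numpy as np

species = np.array(['setosa', 'versicolor', 'virginica'])  # assumed order for 0, 1, 2

# Predictions copied from the output above.
y_pred_demo = np.array([2, 1, 2, 2, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 2, 0, 0, 0,
                        2, 0, 2, 1, 0, 2, 0, 2, 2, 2, 0, 1, 1, 1, 1, 0, 2, 0, 0, 2, 1, 0,
                        2])

# Number of test samples predicted for each class code.
counts = np.bincount(y_pred_demo, minlength=3)
for name, n in zip(species, counts):
    print(name, n)

# The same predictions as species names.
pred_names = species[y_pred_demo]
print(pred_names[:5])
```

The counts need not be equal across species, since the random split was not stratified.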

**Checking the model's accuracy**

In [19]:
accuracy_score(y_test, y_pred)*100  # The score comes back on a 0-1 scale, so we multiply by 100 for a more intuitive reading

100.0
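
`accuracy_score` is simply the fraction of predictions that match the true labels. A minimal sketch in plain Python on made-up labels (the lists `y_true_demo` and `y_hat_demo` and the helper `accuracy` are hypothetical), which also shows how much a single mistake weighs on a 45-sample test set:

```python
def accuracy(y_true, y_hat):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(t == p for t, p in zip(y_true, y_hat))
    return matches / len(y_true)

# Hypothetical labels: 45 samples, all predicted correctly.
y_true_demo = [0, 1, 2] * 15
y_hat_demo = list(y_true_demo)
all_correct = accuracy(y_true_demo, y_hat_demo) * 100

# Flip one prediction to simulate a single error.
y_hat_demo[0] = 1
one_error = accuracy(y_true_demo, y_hat_demo) * 100

print(all_correct)          # 100.0
print(round(one_error, 2))  # 97.78
```

On 45 test samples, the gap between 100% and roughly 97.8% is a single misclassified flower.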

*We got 100% accuracy o.o (Keep in mind that this probably comes down to the particular split fixed by random_state. Without it, the accuracy was 97.77%)*
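
Because a single split can be lucky or unlucky, a cross-validated score is usually a steadier estimate. A hedged sketch, using scikit-learn's bundled Iris data in place of `iris.csv` and a raised `max_iter` (both assumptions, not the notebook's settings):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumption: bundled Iris data instead of iris.csv.
X_cv, y_cv = load_iris(return_X_y=True)

# 5-fold cross-validation: each sample is used for testing exactly once.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_cv, y_cv, cv=5)

print(scores.round(3))                 # accuracy of each fold
print(round(scores.mean() * 100, 2))   # mean accuracy in percent
```

The spread between folds gives a rough idea of how much the score can move from one split to another.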