Generalized Linear Models (GLMs) extend ordinary linear models by allowing the response variable to follow a distribution other than the normal, such as the binomial or Poisson. The mean of the response is related to the linear predictor through a link function.
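
A GLM relates the mean of the response to a linear predictor through a link function; for a binomial response the usual choice is the logit. As a rough illustration, the sketch below implements the logit link and its inverse with NumPy. The helper names `logit` and `inv_logit` are illustrative and are not part of statsmodels.

```python
import numpy as np

def logit(mu):
    """Logit link: map a mean in (0, 1) to the real line."""
    return np.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse logit: map a linear predictor back to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

eta = np.array([-2.0, 0.0, 2.0])   # example linear predictor values
mu = inv_logit(eta)                # probabilities strictly between 0 and 1
print(mu)
print(logit(mu))                   # recovers eta up to rounding
```

The model fitted later in this notebook uses this link, so its predictions are inverse-logit transforms of a linear combination of the features.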

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix



# 1. Loading an example dataset (Iris)

In [2]:
data = load_iris()
X = data.data
y = (data.target == 0).astype(int)  # Convert to a binary problem (class 0 vs. the rest)

# 2. Splitting the data into training and test sets

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# 3. Adding a constant for the intercept

In [4]:
X_train_sm = sm.add_constant(X_train)
X_test_sm = sm.add_constant(X_test)
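
`sm.add_constant` prepends a column of ones by default, so the first fitted coefficient is the intercept. A NumPy-only sketch of the same operation on a small made-up array (`X_demo` is hypothetical and not part of the notebook's data):

```python
import numpy as np

# Three made-up rows with two features each.
X_demo = np.array([[5.1, 3.5],
                   [4.9, 3.0],
                   [6.2, 2.9]])

# Prepend a column of ones so the model can estimate an intercept.
X_demo_const = np.column_stack([np.ones(X_demo.shape[0]), X_demo])
print(X_demo_const)
```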

# 4. Fitting the GLM (logistic regression)

In [5]:
model = sm.GLM(y_train, X_train_sm, family=sm.families.Binomial())
results = model.fit()
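
By default, `GLM.fit` uses IRLS (iteratively reweighted least squares). The following is a minimal NumPy sketch of IRLS for a logistic model, not statsmodels' actual implementation. The function `irls_logistic` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression by IRLS. X must already contain a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                      # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))     # inverse logit
        w = mu * (1.0 - mu)                 # binomial variance, used as weights
        z = eta + (y - mu) / w              # working response
        XtW = X.T * w                       # weight each observation
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic, non-separable data (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X_demo = np.column_stack([np.ones_like(x), x])
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
y_demo = rng.binomial(1, p_true)

beta_hat = irls_logistic(X_demo, y_demo)
mu_hat = 1.0 / (1.0 + np.exp(-(X_demo @ beta_hat)))
score = X_demo.T @ (y_demo - mu_hat)        # gradient of the log-likelihood
print(beta_hat)
print(score)
```

At convergence the score vector is close to zero, which is the first-order condition for the maximum-likelihood estimate. The sketch has no safeguard for weights that underflow to zero, which can happen when the classes are perfectly separated.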



# 5. Model summary

In [7]:
print(results.summary())

                 Generalized Linear Model Regression Results                  
Dep. Variable:                      y   No. Observations:                  120
Model:                            GLM   Df Residuals:                      115
Model Family:                Binomial   Df Model:                            4
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:            -1.2898e-09
Date:                Thu, 31 Oct 2024   Deviance:                   2.5797e-09
Time:                        02:34:13   Pearson chi2:                 1.29e-09
No. Iterations:                    25   Pseudo R-squ. (CS):             0.7200
Covariance Type:            nonrobust                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        -14.4405   4.96e+05  -2.91e-05      1.0
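
A log-likelihood and deviance of essentially zero, together with a standard error on the order of 1e5, are typical signs of perfect separation: some combination of features splits the two classes exactly, so the maximum-likelihood coefficients are not finite and the reported estimates and p-values should not be interpreted. The check below is a sketch that uses the full Iris data rather than only the training split, and compares petal length across the two classes.

```python
from sklearn.datasets import load_iris

data = load_iris()
petal_length = data.data[:, 2]          # third column is petal length (cm)
is_setosa = data.target == 0

setosa_max = petal_length[is_setosa].max()
others_min = petal_length[~is_setosa].min()
print(setosa_max, others_min)

# If the two ranges do not overlap, a single threshold separates the classes.
separable = setosa_max < others_min
print(separable)
```

When the classes are separable like this, perfect classification accuracy is expected, but inference on individual coefficients is not meaningful. A penalized fit is one common alternative.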

# 6. Making predictions

In [8]:
y_pred_prob = results.predict(X_test_sm)
y_pred = (y_pred_prob > 0.5).astype(int)

# 7. Evaluating the model

In [9]:
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"\nAccuracy: {accuracy:.2f}")
print("\nConfusion matrix:")
print(conf_matrix)


Accuracy: 1.00

Confusion matrix:
[[20  0]
 [ 0 10]]
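
scikit-learn's `confusion_matrix` puts true classes in rows and predicted classes in columns, so for a binary problem the entries read `[[TN, FP], [FN, TP]]`. Under that convention, precision, recall, and specificity can be recovered from the matrix printed above. The values in this sketch are copied by hand from that output.

```python
import numpy as np

# Confusion matrix copied from the output above: rows = true, columns = predicted.
conf_matrix = np.array([[20, 0],
                        [0, 10]])

tn, fp, fn, tp = conf_matrix.ravel()
precision = tp / (tp + fp)      # share of predicted positives that are correct
recall = tp / (tp + fn)         # share of true positives that are found
specificity = tn / (tn + fp)    # share of true negatives that are found
print(precision, recall, specificity)
```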


-----