### **State University of Campinas - UNICAMP** </br>
**Course**: MC886A </br>
**Professor**: Marcelo da Silva Reis </br>
**TA (PED)**: Marcos Vinicius Souza Freire

---

### **Hands-On: Logistic Regression, Classification Methods, and Resampling Methods**
##### Notebook: 00 Logistic Regression and Classification and Resampling methods

> Dataset from Scikit Learn - [load_breast_cancer](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html), based on [Breast Cancer Wisconsin (Diagnostic)](https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic)(1993)[1]
---

**This notebook covers the following topics:**

- **Logistic Regression:** binary, multiple, and multinomial.
- **Classification Methods:** Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Naive Bayes.
- **Resampling Methods:** Leave-One-Out Cross-Validation (LOOCV), k-Fold Cross-Validation, and Bootstrap.

Throughout the notebook, we illustrate the methods with formulas, interactive Plotly plots of the decision boundaries, and well-structured code cells.

---

### **Notation and Formulas**

### **1. Binary Logistic Regression**
Used for **two-class classification** (for example, yes/no, 0/1).

#### **Formula**:
The probability $ p $ that an instance belongs to class $ y = 1 $ is modeled with the **sigmoid function**:
$
p(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-z}}, \quad \text{where } z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
$
- $ \mathbf{x} = [x_1, x_2, \dots, x_n] $: Input features.
- $ \beta_0, \beta_1, \dots, \beta_n $: Model coefficients.
- **Decision**: Classify as $ y = 1 $ if $ p \geq 0.5 $, otherwise $ y = 0 $.

#### **Logit (Log-Odds)**:
$
\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n
$

This linear equation expresses the relationship between the features and the log-odds of the positive class.
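To make the log-odds reading concrete, here is a minimal sketch in plain Python. The coefficients `beta0` and `beta1` and the input values are made up for illustration. It checks that the logit undoes the sigmoid and that raising $ x_1 $ by one unit adds $ \beta_1 $ to the log-odds, which multiplies the odds by $ e^{\beta_1} $.

```python
import math

def sigmoid(z: float) -> float:
    """Map a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Map a probability p in (0, 1) back to its log-odds."""
    return math.log(p / (1.0 - p))

# Hypothetical one-feature model: z = beta0 + beta1 * x1
beta0, beta1 = -1.0, 0.7

def prob(x1: float) -> float:
    return sigmoid(beta0 + beta1 * x1)

p_a = prob(2.0)
p_b = prob(3.0)  # same input with x1 increased by one unit

log_odds_change = logit(p_b) - logit(p_a)
odds_ratio = (p_b / (1.0 - p_b)) / (p_a / (1.0 - p_a))

print(f"p_a = {p_a:.3f}, p_b = {p_b:.3f}")
print(f"change in log-odds = {log_odds_change:.3f} (beta1 = {beta1})")
print(f"odds ratio = {odds_ratio:.3f} (exp(beta1) = {math.exp(beta1):.3f})")
```

The probability itself does not change by a fixed amount per unit of $ x_1 $; only the log-odds do, which is why coefficients are usually interpreted on that scale.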

</br>

#### **Loss Function (Cross-Entropy Loss)**:
$
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^N \left[ y_i \ln(p_i) + (1-y_i) \ln(1-p_i) \right]
$

</br>

Minimizing this loss fits the coefficients $ \beta $, typically by gradient descent.
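A minimal NumPy sketch of this idea follows. The toy data, the learning rate `lr`, and the number of steps are assumptions chosen for illustration. The gradient of the mean cross-entropy with respect to the coefficients reduces to the average of $ (p_i - y_i)\,x_i $, which is what the loop uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed setup): one feature, label 1 when the feature is positive.
X = rng.normal(size=(200, 1))
y = (X[:, 0] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y_true, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

beta0, beta = 0.0, np.zeros(1)
lr = 0.5  # hypothetical learning rate
initial_loss = cross_entropy(y, sigmoid(beta0 + X @ beta))

for _ in range(200):
    p = sigmoid(beta0 + X @ beta)
    # Gradient of the mean cross-entropy w.r.t. the intercept and the weights.
    grad_beta0 = np.mean(p - y)
    grad_beta = X.T @ (p - y) / len(y)
    beta0 -= lr * grad_beta0
    beta -= lr * grad_beta

final_loss = cross_entropy(y, sigmoid(beta0 + X @ beta))
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

With all coefficients at zero every prediction is 0.5, so the starting loss equals $ \ln 2 $; each update then moves the coefficients in the direction that lowers the loss.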

---
</br>

### **2. Multinomial Logistic Regression**
Used for **multiclass classification** (for example, classes A/B/C/D).

#### **Main Formula**:
The probability $ p(y=k \mid \mathbf{x}) $ for class $ k $ is modeled with the **softmax function**:
$
p(y=k \mid \mathbf{x}) = \frac{e^{z_k}}{\sum_{j=1}^K e^{z_j}}, \quad \text{where } z_k = \beta_{k0} + \beta_{k1} x_1 + \dots + \beta_{kn} x_n
$
- $ K $: Total number of classes.
- $ z_k $: Linear combination for class $ k $.
- **Decision**: Assign the class with the highest probability $ p(y=k \mid \mathbf{x}) $.

#### **Key Remarks**:
- One class (for example, class $ K $) is usually treated as the **reference category**, and its coefficients are set to zero (so that $ z_K = 0 $).
- The model therefore estimates $ K-1 $ sets of coefficients.
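The reason $ K-1 $ coefficient sets suffice is that softmax is unchanged when the same constant is subtracted from every score. The short NumPy sketch below, with made-up scores for $ K = 3 $, subtracts the reference class score so that $ z_K = 0 $ and checks that the probabilities stay the same.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical linear scores z_1, z_2, z_3 for K = 3 classes.
z = np.array([2.0, 0.5, -1.0])

# Re-express the scores relative to the reference class K (the last one),
# so that z_K = 0.
z_ref = z - z[-1]

p_full = softmax(z)
p_ref = softmax(z_ref)

print(z_ref)                       # last entry is 0
print(np.allclose(p_full, p_ref))  # the probabilities are unchanged
```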

#### **Loss Function (Generalized Cross-Entropy)**:
$
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^N \sum_{k=1}^K y_{ik} \ln(p_{ik})
$
- $ y_{ik} = 1 $ if observation $ i $ belongs to class $ k $, and 0 otherwise.
- $ p_{ik} $: Predicted probability that observation $ i $ belongs to class $ k $.
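As an illustration, the loss can be computed directly from a one-hot label matrix. The scores `Z` and the labels below are invented for the example. Because each row of $ y_{ik} $ contains a single 1, the inner sum keeps only the log-probability of the true class.

```python
import numpy as np

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # row-wise shift for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

# Hypothetical scores for N = 3 observations and K = 3 classes.
Z = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
labels = np.array([0, 1, 2])

P = softmax_rows(Z)    # p_ik
Y = np.eye(3)[labels]  # one-hot matrix with entries y_ik

# Double sum from the formula.
loss = -np.mean(np.sum(Y * np.log(P), axis=1))
# Equivalent shortcut: pick the true-class probability of each row.
loss_indexed = -np.mean(np.log(P[np.arange(len(labels)), labels]))

print(f"loss = {loss:.4f}")
```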

---

#### **Differences**
| **Aspect** | **Binary** | **Multinomial** |
|-------------------------|-------------------------------------|-------------------------------------|
| **Classes** | 2 classes (0/1) | $ K \geq 2 $ classes |
| **Function** | Sigmoid | Softmax |
| **Coefficients** | One set ($ \beta_0, \beta_1, \dots $) | $ K-1 $ sets (one per non-reference class) |

---

### **Application Examples**
1. **Binary**:
- Predict whether an email is spam ($ p \geq 0.5 $) or not.
- Compute $ z = 2.5 + 0.8x_1 - 1.2x_2 $, then $ p = \frac{1}{1 + e^{-z}} $.

2. **Multinomial**:
- Classify an image as "cat", "dog", or "bird".

- For the features $ \mathbf{x} $, compute the probabilities:
$
p(\text{cat}) = \frac{e^{z_{\text{cat}}}}{e^{z_{\text{cat}}} + e^{z_{\text{dog}}} + e^{z_{\text{bird}}}}
$
(And likewise for the other classes.)
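Both examples can be worked through numerically. In the sketch below, the binary coefficients come from the example above, while the feature values `x1`, `x2` and the three class scores are hypothetical numbers picked only to show the arithmetic.

```python
import math

# Binary case: coefficients from the example; x1 and x2 are made up.
x1, x2 = 1.0, 3.0
z = 2.5 + 0.8 * x1 - 1.2 * x2
p_spam = 1.0 / (1.0 + math.exp(-z))
print("spam" if p_spam >= 0.5 else "not spam", round(p_spam, 3))

# Multinomial case: hypothetical scores for the three classes.
scores = {"cat": 1.2, "dog": 0.3, "bird": -0.5}
denominator = sum(math.exp(s) for s in scores.values())
probs = {name: math.exp(s) / denominator for name, s in scores.items()}
predicted = max(probs, key=probs.get)
print(predicted, {name: round(p, 3) for name, p in probs.items()})
```

With these inputs the linear score is negative, so the email falls below the 0.5 threshold, and the image is assigned to the class with the largest score.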

Based on the lectures of Jurafsky & Martin (2025) [2]

---

In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd

# Replace Matplotlib with Plotly for interactive plotting
import plotly.graph_objects as go
import plotly.express as px

from sklearn.datasets import make_classification, load_breast_cancer
from sklearn.model_selection import train_test_split, KFold, LeaveOneOut, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset

import warnings
warnings.filterwarnings('ignore')

# Set seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)


<torch._C.Generator at 0x7dccf1b3eef0>

#### **Basic exploration of the dataset**

In [2]:
# Let's load the Breast Cancer Dataset from Scikit-Learn
cancer_dataset = load_breast_cancer()

In [3]:
# Keys in dataset
cancer_dataset.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [4]:
# Malignant or benign value
cancer_dataset['target']

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,

In [5]:
# Target value name malignant or benign tumor
cancer_dataset['target_names']

array(['malignant', 'benign'], dtype='<U9')

In [6]:
# Description of data
print(cancer_dataset['DESCR'])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:
    - radius (mean of distances from center to points on the perimeter)
    - texture (standard deviation of gray-scale values)
    - perimeter
    - area
    - smoothness (local variation in radius lengths)
    - compactness (perimeter^2 / area - 1.0)
    - concavity (severity of concave portions of the contour)
    - concave points (number of concave portions of the contour)
    - symmetry
    - fractal dimension ("coastline approximation" - 1)

    The mean, standard error, and "worst" or largest (mean of the three
    worst/largest values) of these features were computed for each image,
    resulting in 30 features.  For instance, field 0 is Mean Radius, field
    10 is Radius SE, field 20 is Worst Radius.

    - 

In [7]:
# Name of features
print(cancer_dataset['feature_names'])

['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']


In [8]:
# Create DataFrame
cancer_df = pd.DataFrame(np.c_[cancer_dataset['data'],cancer_dataset['target']],
             columns = np.append(cancer_dataset['feature_names'], ['target']))

In [9]:
# Head of cancer DataFrame
cancer_df.head(6)

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0.0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0.0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0.0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0.0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0.0
5,12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,0.2087,0.07613,...,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244,0.0


In [10]:
# Tail of cancer DataFrame
cancer_df.tail(6)

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
563,20.92,25.09,143.0,1347.0,0.1099,0.2236,0.3174,0.1474,0.2149,0.06879,...,29.41,179.1,1819.0,0.1407,0.4186,0.6599,0.2542,0.2929,0.09873,0.0
564,21.56,22.39,142.0,1479.0,0.111,0.1159,0.2439,0.1389,0.1726,0.05623,...,26.4,166.1,2027.0,0.141,0.2113,0.4107,0.2216,0.206,0.07115,0.0
565,20.13,28.25,131.2,1261.0,0.0978,0.1034,0.144,0.09791,0.1752,0.05533,...,38.25,155.0,1731.0,0.1166,0.1922,0.3215,0.1628,0.2572,0.06637,0.0
566,16.6,28.08,108.3,858.1,0.08455,0.1023,0.09251,0.05302,0.159,0.05648,...,34.12,126.7,1124.0,0.1139,0.3094,0.3403,0.1418,0.2218,0.0782,0.0
567,20.6,29.33,140.1,1265.0,0.1178,0.277,0.3514,0.152,0.2397,0.07016,...,39.42,184.6,1821.0,0.165,0.8681,0.9387,0.265,0.4087,0.124,0.0
568,7.76,24.54,47.92,181.0,0.05263,0.04362,0.0,0.0,0.1587,0.05884,...,30.37,59.16,268.6,0.08996,0.06444,0.0,0.0,0.2871,0.07039,1.0


In [11]:
# Information of cancer Dataframe
cancer_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

In [12]:
# Numerical distribution of data
cancer_df.describe()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,0.181162,0.062798,...,25.677223,107.261213,880.583128,0.132369,0.254265,0.272188,0.114606,0.290076,0.083946,0.627417
std,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,0.027414,0.00706,...,6.146258,33.602542,569.356993,0.022832,0.157336,0.208624,0.065732,0.061867,0.018061,0.483918
min,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,0.106,0.04996,...,12.02,50.41,185.2,0.07117,0.02729,0.0,0.0,0.1565,0.05504,0.0
25%,11.7,16.17,75.17,420.3,0.08637,0.06492,0.02956,0.02031,0.1619,0.0577,...,21.08,84.11,515.3,0.1166,0.1472,0.1145,0.06493,0.2504,0.07146,0.0
50%,13.37,18.84,86.24,551.1,0.09587,0.09263,0.06154,0.0335,0.1792,0.06154,...,25.41,97.66,686.5,0.1313,0.2119,0.2267,0.09993,0.2822,0.08004,1.0
75%,15.78,21.8,104.1,782.7,0.1053,0.1304,0.1307,0.074,0.1957,0.06612,...,29.72,125.4,1084.0,0.146,0.3391,0.3829,0.1614,0.3179,0.09208,1.0
max,28.11,39.28,188.5,2501.0,0.1634,0.3454,0.4268,0.2012,0.304,0.09744,...,49.54,251.2,4254.0,0.2226,1.058,1.252,0.291,0.6638,0.2075,1.0


---

### **Helper Functions**

In this section, we define helper functions to evaluate classifiers and plot decision boundaries.

**Plotting Decision Boundaries with Plotly:**

We create a function (`plot_decision_boundary_plotly`) that plots decision boundaries using Plotly. For a given model, we generate a grid over the feature space, predict the class of each grid point, and then plot a contour together with the data points.

We also implement `evaluate_classifier`, which reports:

1. **Accuracy**
- **What it measures**: Accuracy is the proportion of predictions the classifier got right out of all predictions made, that is, the proportion of correctly classified instances.

- **Formula**:
$
\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}
$
- **Output**: A single number between 0 and 1 (for example, `0.85` means that 85% of the predictions were correct).
- **Why it matters**: It gives a quick picture of overall performance, but it can be misleading when the dataset has imbalanced classes (for example, 90% of one class and 10% of another).

2. **Confusion Matrix**
- **What it shows**: A table that counts how many times the classifier predicted each class correctly or incorrectly compared with the true labels.

- **Structure** (for binary classification, with rows and columns in the order scikit-learn uses for labels 0 and 1):

| | Predicted Negative | Predicted Positive |
|----------------|--------------------|--------------------|
| **Actual Negative** | True Negatives (TN) | False Positives (FP) |
| **Actual Positive** | False Negatives (FN) | True Positives (TP) |

- **Example Output**:
```
[[50 5]
[10 35]]
```
Here there are 50 true negatives, 5 false positives, 10 false negatives, and 35 true positives.
- **Why it matters**: It reveals the specific types of error (for example, mistaking positives for negatives), which is crucial for understanding the model's behavior beyond accuracy alone.

3. **Classification Report**
- **What it provides**: A detailed per-class performance summary, including:
- **Precision**: How many of the predicted positives are actually positive.
$
\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}
$
- **Recall**: How many of the actual positives were predicted correctly.
$
\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}
$
- **F1 Score**: A balanced measure that combines precision and recall (their harmonic mean).
$
\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
$
- **Support**: The number of true instances (occurrences) of each class in the dataset.
- **Example Output**:
```
              precision    recall  f1-score   support

           0       0.83      0.91      0.87        55
           1       0.88      0.78      0.82        45

    accuracy                           0.85       100
```
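The class-1 row of the example report can be reproduced by hand from the example confusion matrix. The sketch below uses plain Python and assumes scikit-learn's layout, where rows are true labels and columns are predicted labels, both ordered as `[0, 1]`.

```python
# Example confusion matrix: rows are true labels, columns are predicted labels.
cm = [[50, 5],
      [10, 35]]

tn, fp = cm[0]
fn, tp = cm[1]
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)  # of the predicted positives, how many are right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

The support values in the report are the row sums of the matrix: 55 actual instances of class 0 and 45 of class 1.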

In [13]:
def plot_decision_boundary_plotly(model, X, y, title="Decision Boundary"):
    """
    Plot the decision boundary using Plotly. Works for both PyTorch and sklearn models.
    """
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))

    grid = np.c_[xx.ravel(), yy.ravel()]

    # Prediction logic for PyTorch and scikit-learn models
    if hasattr(model, 'forward'):
        # For PyTorch models: using forward pass
        with torch.no_grad():
            grid_tensor = torch.FloatTensor(grid)
            outputs = model(grid_tensor)
            # Multi-class case
            if outputs.ndim > 1 and outputs.shape[1] > 1:
                Z = np.argmax(outputs.numpy(), axis=1)
            else:
                Z = (outputs.numpy() > 0.5).astype(int).reshape(-1)
    else:
        # For scikit-learn models
        Z = model.predict(grid)

    Z = Z.reshape(xx.shape)

    # Create contour plot with Plotly
    fig = go.Figure()
    fig.add_trace(
        go.Contour(
            x=np.linspace(x_min, x_max, 200),
            y=np.linspace(y_min, y_max, 200),
            z=Z,
            colorscale='Viridis',
            opacity=0.3,
            showscale=False
        )
    )
    # Scatter plot for data points
    fig.add_trace(
        go.Scatter(
            x=X[:, 0],
            y=X[:, 1],
            mode="markers",
            marker=dict(
                color=y,
                colorscale='Viridis',
                line=dict(width=1, color='black')
            )
        )
    )
    fig.update_layout(
        title=title,
        xaxis_title='Feature 1',
        yaxis_title='Feature 2'
    )
    fig.show()


In [14]:
def plot_confusion_matrix(y_true, y_pred, class_names=None, title="Confusion Matrix"):
    """
    Plot confusion matrix using Plotly.

    Parameters:
    -----------
    y_true : array-like
        True labels
    y_pred : array-like
        Predicted labels
    class_names : list, optional
        List of class names
    title : str, optional
        Title for the plot
    """
    cm = confusion_matrix(y_true, y_pred)

    if class_names is None:
        class_names = [f"Class {i}" for i in range(len(np.unique(y_true)))]

    fig = px.imshow(
        cm,
        text_auto=False,
        labels=dict(x="Predicted", y="Actual", color="Count"),
        x=class_names,
        y=class_names,
        color_continuous_scale="Blues"
    )

    fig.update_layout(
        title=title,
        xaxis_title="Predicted Label",
        yaxis_title="True Label",
        width=600,
        height=500
    )

    # Add custom annotations with count and percentage
    annotations = []
    total = np.sum(cm)
    for i, row in enumerate(cm):
        for j, value in enumerate(row):
            percentage = value / total * 100
            annotations.append(
                dict(
                    x=j,
                    y=i,
                    text=f"{value}<br>({percentage:.1f}%)",
                    showarrow=False,
                    font=dict(color="white" if value > cm.max() / 2 else "black")
                )
            )

    fig.update_layout(annotations=annotations)
    fig.show()

In [15]:
def evaluate_classifier(y_true, y_pred, plot_cm=True):
    """Print evaluation metrics for a classifier and optionally plot confusion matrix."""
    print("Accuracy:", accuracy_score(y_true, y_pred))
    print("\nConfusion Matrix:")
    print(confusion_matrix(y_true, y_pred))
    print("\nClassification Report:")
    print(classification_report(y_true, y_pred))

    if plot_cm:
        # Determine class names based on number of unique classes
        unique_classes = np.unique(np.concatenate([y_true, y_pred]))
        if len(unique_classes) == 2:
            class_names = ["Negative", "Positive"]
        else:
            class_names = [f"Class {i}" for i in unique_classes]

        plot_confusion_matrix(y_true, y_pred, class_names=class_names)

---

### **Part 1: Logistic Regression**

In this section, we cover:

- **Binary Logistic Regression:** using a simple PyTorch model with a sigmoid function.
- **Multiple Logistic Regression:** applying the method to a breast cancer dataset.
- **Multinomial Logistic Regression:** extending logistic regression to handle multiclass cases using the softmax function.

#### **1.1 Binary Logistic Regression with PyTorch**

**Key Concepts:**

- **Sigmoid Function:**
$
\sigma(z) = \frac{1}{1 + e^{-z}}
$

- **Loss Function:** Binary cross-entropy loss is used.

We generate a synthetic dataset (with two features) for binary classification, standardize the features, and define a simple PyTorch model.
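Standardization rescales each feature to zero mean and unit variance using statistics estimated on the training split only, and then reuses those same statistics on the test split. The NumPy sketch below mirrors what `StandardScaler` computes; the random data is a stand-in made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical raw features on very different scales.
X_train = rng.normal(loc=[10.0, -2.0], scale=[5.0, 0.5], size=(140, 2))
X_test = rng.normal(loc=[10.0, -2.0], scale=[5.0, 0.5], size=(60, 2))

# Statistics come from the training split only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population standard deviation (ddof=0)

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std  # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6))
print(X_train_scaled.std(axis=0).round(6))
```

The test split ends up only approximately centered, which is expected: computing its own mean and standard deviation would leak information from the test data into the preprocessing step.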

In [16]:
# Generate synthetic data for binary classification
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42, n_clusters_per_class=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define PyTorch model for logistic regression
class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim):
        super(LogisticRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

# Convert data to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train_scaled)
y_train_tensor = torch.FloatTensor(y_train.reshape(-1, 1))
X_test_tensor = torch.FloatTensor(X_test_scaled)
y_test_tensor = torch.FloatTensor(y_test.reshape(-1, 1))

# Create DataLoader
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)

# Initialize model, loss, optimizer
input_dim = X_train_scaled.shape[1]
model_binary = LogisticRegressionModel(input_dim)
criterion = nn.BCELoss()
optimizer = optim.SGD(model_binary.parameters(), lr=0.01)

# Training loop
epochs = 1000
for epoch in range(epochs):
    for inputs, labels in train_loader:
        outputs = model_binary(inputs)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

# Evaluate model on test set
model_binary.eval()
with torch.no_grad():
    y_pred_probs = model_binary(X_test_tensor)
    y_pred = (y_pred_probs > 0.5).float().numpy().flatten()

print("\nBinary Logistic Regression Evaluation:")
evaluate_classifier(y_test, y_pred)

# Plot decision boundary using Plotly
plot_decision_boundary_plotly(model_binary, X_train_scaled, y_train, title="Binary Logistic Regression (Training Data)")
plot_decision_boundary_plotly(model_binary, X_test_scaled, y_test, title="Binary Logistic Regression (Test Data)")


Epoch [100/1000], Loss: 0.5736
Epoch [200/1000], Loss: 0.2773
Epoch [300/1000], Loss: 0.2397
Epoch [400/1000], Loss: 0.2986
Epoch [500/1000], Loss: 0.7064
Epoch [600/1000], Loss: 0.1375
Epoch [700/1000], Loss: 0.2499
Epoch [800/1000], Loss: 0.2229
Epoch [900/1000], Loss: 0.3619
Epoch [1000/1000], Loss: 0.1153

Binary Logistic Regression Evaluation:
Accuracy: 0.8833333333333333

Confusion Matrix:
[[28  6]
 [ 1 25]]

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.82      0.89        34
           1       0.81      0.96      0.88        26

    accuracy                           0.88        60
   macro avg       0.89      0.89      0.88        60
weighted avg       0.90      0.88      0.88        60



#### **1.2 Multiple Logistic Regression**

Here we use the Breast Cancer dataset (which has 30 features) to demonstrate logistic regression on a real-world, higher-dimensional dataset.

In [17]:
data = load_breast_cancer()
X_multi = data.data
y_multi = data.target
print(f"Dataset: Breast Cancer Dataset with {X_multi.shape[1]} features")

X_train, X_test, y_train, y_test = train_test_split(X_multi, y_multi, test_size=0.3, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train_scaled)
y_train_tensor = torch.FloatTensor(y_train.reshape(-1, 1))
X_test_tensor = torch.FloatTensor(X_test_scaled)

# Initialize and train model
model_multi = LogisticRegressionModel(X_train_scaled.shape[1])
criterion = nn.BCELoss()
optimizer = optim.SGD(model_multi.parameters(), lr=0.01)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)

epochs = 1000
for epoch in range(epochs):
    for inputs, labels in train_loader:
        outputs = model_multi(inputs)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

# Evaluate the multiple logistic regression model
model_multi.eval()
with torch.no_grad():
    y_pred_probs = model_multi(X_test_tensor)
    y_pred = (y_pred_probs > 0.5).float().numpy().flatten()

print("\nMultiple Logistic Regression Evaluation:")
evaluate_classifier(y_test, y_pred)


Dataset: Breast Cancer Dataset with 30 features
Epoch [100/1000], Loss: 0.0694
Epoch [200/1000], Loss: 0.0632
Epoch [300/1000], Loss: 0.0353
Epoch [400/1000], Loss: 0.1553
Epoch [500/1000], Loss: 0.0241
Epoch [600/1000], Loss: 0.1485
Epoch [700/1000], Loss: 0.0544
Epoch [800/1000], Loss: 0.0827
Epoch [900/1000], Loss: 0.0077
Epoch [1000/1000], Loss: 0.0388

Multiple Logistic Regression Evaluation:
Accuracy: 0.9883040935672515

Confusion Matrix:
[[ 62   1]
 [  1 107]]

Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.98      0.98        63
           1       0.99      0.99      0.99       108

    accuracy                           0.99       171
   macro avg       0.99      0.99      0.99       171
weighted avg       0.99      0.99      0.99       171



#### **1.3 Multinomial Logistic Regression**

For multiclass classification, we define a model whose output passes through softmax. The softmax function is given by:

$
\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}
$

We generate a synthetic multiclass dataset and then train a PyTorch model using cross-entropy loss.
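One detail is worth keeping in mind when pairing softmax with cross-entropy in PyTorch: `nn.CrossEntropyLoss` expects unnormalized scores (logits) and applies log-softmax internally. If the model's `forward` already returns softmax probabilities, as the cell below does, softmax is effectively applied twice. The sketch below is a NumPy re-implementation written for illustration, not PyTorch's own code. It shows that in this situation the loss for three classes cannot fall below $ \ln(1 + 2/e) \approx 0.55 $, even for a perfect prediction.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def cross_entropy_from_scores(scores, labels):
    """Mean cross-entropy, applying log-softmax to the scores first."""
    return -log_softmax(scores)[np.arange(len(labels)), labels].mean()

labels = np.array([0])

# Case 1: raw logits. A confident, correct score drives the loss toward 0.
logits = np.array([[10.0, 0.0, 0.0]])
loss_logits = cross_entropy_from_scores(logits, labels)

# Case 2: the scores are already probabilities, so softmax is applied twice.
# Even a perfect one-hot probability vector leaves a sizeable loss.
probs = np.array([[1.0, 0.0, 0.0]])
loss_probs = cross_entropy_from_scores(probs, labels)

print(f"loss from logits:        {loss_logits:.4f}")
print(f"loss from probabilities: {loss_probs:.4f}")
```

Predictions taken with `argmax` are unaffected, because softmax preserves the ordering of the scores; the effect is on the loss scale and the gradients, which is consistent with training losses that flatten out above this bound.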

In [18]:
# Generate synthetic data for multi-class classification
"""
Creates synthetic data with 500 samples, 2 features, and 3 distinct classes.
The data is then split into training and test sets, and the features are standardized.
"""
X_multiclass, y_multiclass = make_classification(n_samples=500, n_features=2, n_informative=2,
                                                 n_redundant=0, n_classes=3, n_clusters_per_class=1,
                                                 random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X_multiclass, y_multiclass, test_size=0.3, random_state=42)


# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define a multinomial logistic regression model using softmax
"""
Defines a PyTorch model class that:
- Takes input features and produces one output per class
- Uses a single linear layer mapping `input_dim` features to `num_classes` outputs
- Applies softmax to convert the raw outputs into probabilities (which sum to 1)
"""
class MultinomialLogisticRegression(nn.Module):
    def __init__(self, input_dim, num_classes):
        super(MultinomialLogisticRegression, self).__init__()
        self.linear = nn.Linear(input_dim, num_classes)

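    def forward(self, x):
        # NOTE: nn.CrossEntropyLoss (used below) applies log-softmax internally,
        # so returning probabilities here means softmax is applied twice during
        # training. Returning self.linear(x) (raw logits) is the usual pairing;
        # softmax is kept so the recorded outputs below still match this code.
        return torch.softmax(self.linear(x), dim=1)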

# Convert data to tensors
"""
Converts NumPy arrays into PyTorch tensors.
Note that the labels use `LongTensor` because they are integer class indices.
"""
X_train_tensor = torch.FloatTensor(X_train_scaled)
y_train_tensor = torch.LongTensor(y_train)
X_test_tensor = torch.FloatTensor(X_test_scaled)

# Initialize model, loss (cross-entropy) and optimizer
"""
- Initializes the model with the right dimensions (features → 3 classes)
- Uses cross-entropy loss, the standard loss function for classification
- Uses the SGD optimizer with a learning rate of 0.1 (each step below uses the full training set)
"""
model_multi_class = MultinomialLogisticRegression(X_train_scaled.shape[1], 3)
criterion_multi = nn.CrossEntropyLoss()
optimizer_multi = optim.SGD(model_multi_class.parameters(), lr=0.1)

# Training loop
"""
This is the standard PyTorch training process:
1. Get predictions from the model
2. Compute the loss by comparing the predictions with the true labels
3. Zero the previous gradients
4. Compute the gradients with the backward pass
5. Update the model weights using the optimizer
"""
epochs = 1000
for epoch in range(epochs):
    outputs = model_multi_class(X_train_tensor)
    loss = criterion_multi(outputs, y_train_tensor)

    optimizer_multi.zero_grad()
    loss.backward()
    optimizer_multi.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

# Evaluate the multinomial model
"""
- `torch.no_grad()` disables gradient tracking for efficiency during evaluation
- `torch.max(outputs, 1)` returns the maximum values and their indices; the indices are the predicted classes
- The result is converted to NumPy for the scikit-learn evaluation functions
"""
model_multi_class.eval()
with torch.no_grad():
    outputs = model_multi_class(X_test_tensor)
    _, y_pred_tensor = torch.max(outputs, 1)
    y_pred = y_pred_tensor.numpy()

print("\nMultinomial Logistic Regression Evaluation:")
evaluate_classifier(y_test, y_pred)

# Plot decision boundary using Plotly
"""
Calls our helper function, which visualizes the decision boundaries using Plotly.
"""
plot_decision_boundary_plotly(model_multi_class, X_test_scaled, y_test, title="Multinomial Logistic Regression")


Epoch [100/1000], Loss: 0.9924
Epoch [200/1000], Loss: 0.8598
Epoch [300/1000], Loss: 0.8015
Epoch [400/1000], Loss: 0.7762
Epoch [500/1000], Loss: 0.7610
Epoch [600/1000], Loss: 0.7503
Epoch [700/1000], Loss: 0.7421
Epoch [800/1000], Loss: 0.7356
Epoch [900/1000], Loss: 0.7301
Epoch [1000/1000], Loss: 0.7255

Multinomial Logistic Regression Evaluation:
Accuracy: 0.8733333333333333

Confusion Matrix:
[[36 13  1]
 [ 0 51  4]
 [ 1  0 44]]

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.72      0.83        50
           1       0.80      0.93      0.86        55
           2       0.90      0.98      0.94        45

    accuracy                           0.87       150
   macro avg       0.89      0.88      0.87       150
weighted avg       0.89      0.87      0.87       150



### **Part 2: LDA and Other Classification Methods**

In this section, we apply classical classification techniques:

- **Linear Discriminant Analysis (LDA)**
- **Quadratic Discriminant Analysis (QDA)**
- **Naive Bayes**

These methods are demonstrated on a synthetic dataset with two features.
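
As a reference for the code that follows, the standard textbook discriminant functions can be written as below. The notation is an assumption introduced here and is not defined elsewhere in this notebook: $\pi_k$ is the prior probability of class $k$, $\boldsymbol{\mu}_k$ its mean vector, $\Sigma$ the covariance matrix shared by all classes (LDA), and $\Sigma_k$ the class-specific covariance matrix (QDA). In both cases a sample is assigned to the class with the largest $\delta_k(\mathbf{x})$.

$$
\delta_k^{\text{LDA}}(\mathbf{x}) = \mathbf{x}^\top \Sigma^{-1} \boldsymbol{\mu}_k - \frac{1}{2} \boldsymbol{\mu}_k^\top \Sigma^{-1} \boldsymbol{\mu}_k + \ln \pi_k
$$

$$
\delta_k^{\text{QDA}}(\mathbf{x}) = -\frac{1}{2} \ln \lvert \Sigma_k \rvert - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^\top \Sigma_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) + \ln \pi_k
$$

The LDA function is linear in $\mathbf{x}$ because the quadratic term $\mathbf{x}^\top \Sigma^{-1} \mathbf{x}$ is the same for every class and cancels when two classes are compared. In QDA the term depends on $\Sigma_k$, so it does not cancel and the boundary is quadratic. Gaussian Naive Bayes corresponds to the QDA form with each $\Sigma_k$ restricted to be diagonal.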


In [19]:
# Generate and standardize data for LDA/QDA/Naive Bayes
"""
- Creates a synthetic binary classification dataset with 200 samples and 2 features
- Sets a random seed (42) for reproducibility
- Uses 1 cluster per class for a cleaner separation between classes
"""
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42, n_clusters_per_class=1)
# Split the data into 70% training and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()

"""
- Standardizes the features to zero mean and unit variance
- Fits the scaler on the training data only and applies the same transformation to the test data
"""

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


# Linear Discriminant Analysis (LDA)
"""
LDA assumes that:
- Each class follows a Gaussian distribution
- All classes share the same covariance matrix
- Together, these assumptions produce linear decision boundaries between classes
"""
lda = LinearDiscriminantAnalysis()
lda.fit(X_train_scaled, y_train)
y_pred_lda = lda.predict(X_test_scaled)
print("LDA Evaluation:")
evaluate_classifier(y_test, y_pred_lda)
plot_decision_boundary_plotly(lda, X_test_scaled, y_test, title="LDA Decision Boundary")

# Quadratic Discriminant Analysis (QDA)
"""
QDA is similar to LDA, but:
- It allows each class to have its own covariance matrix
- This produces quadratic (curved, i.e. non-linear) decision boundaries
- It is better suited to data whose spread differs between classes
"""
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train_scaled, y_train)
y_pred_qda = qda.predict(X_test_scaled)
print("\nQDA Evaluation:")
evaluate_classifier(y_test, y_pred_qda)
plot_decision_boundary_plotly(qda, X_test_scaled, y_test, title="QDA Decision Boundary")

# Naive Bayes
"""
Naive Bayes:
- Gaussian Naive Bayes also assumes Gaussian-distributed features within each class
- Makes the "naive" assumption that the features are conditionally independent given the class
- This simplifies the probability calculations
- Often works well in practice despite this simplifying assumption
"""

"""
For each model, the code:
1. Evaluates the classifier with the custom function `evaluate_classifier()`
2. Plots the decision boundary with our helper function `plot_decision_boundary_plotly()`
"""

nb = GaussianNB()
nb.fit(X_train_scaled, y_train)
y_pred_nb = nb.predict(X_test_scaled)
print("\nNaive Bayes Evaluation:")
evaluate_classifier(y_test, y_pred_nb)
plot_decision_boundary_plotly(nb, X_test_scaled, y_test, title="Naive Bayes Decision Boundary")



LDA Evaluation:
Accuracy: 0.8833333333333333

Confusion Matrix:
[[28  6]
 [ 1 25]]

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.82      0.89        34
           1       0.81      0.96      0.88        26

    accuracy                           0.88        60
   macro avg       0.89      0.89      0.88        60
weighted avg       0.90      0.88      0.88        60




QDA Evaluation:
Accuracy: 0.85

Confusion Matrix:
[[26  8]
 [ 1 25]]

Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.76      0.85        34
           1       0.76      0.96      0.85        26

    accuracy                           0.85        60
   macro avg       0.86      0.86      0.85        60
weighted avg       0.87      0.85      0.85        60




Naive Bayes Evaluation:
Accuracy: 0.85

Confusion Matrix:
[[27  7]
 [ 2 24]]

Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.79      0.86        34
           1       0.77      0.92      0.84        26

    accuracy                           0.85        60
   macro avg       0.85      0.86      0.85        60
weighted avg       0.86      0.85      0.85        60



In [20]:
# Generate and standardize data for LDA/QDA/Naive Bayes
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42, n_clusters_per_class=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)

# Train LDA, QDA, and Naive Bayes models
lda = LinearDiscriminantAnalysis()
lda.fit(X_train_scaled, y_train)

qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train_scaled, y_train)

nb = GaussianNB()
nb.fit(X_train_scaled, y_train)

# Create a common meshgrid over the feature space
x_min, x_max = X_test_scaled[:, 0].min() - 0.1, X_test_scaled[:, 0].max() + 0.1
y_min, y_max = X_test_scaled[:, 1].min() - 0.1, X_test_scaled[:, 1].max() + 0.1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

# For each classifier, get the probability for class 1 over the grid
# Note: For binary classification, the decision boundary is where P(class=1)=0.5
Z_lda = lda.predict_proba(grid)[:, 1].reshape(xx.shape)
Z_qda = qda.predict_proba(grid)[:, 1].reshape(xx.shape)
Z_nb  = nb.predict_proba(grid)[:, 1].reshape(xx.shape)

# Create combined Plotly figure with contours from each classifier
import plotly.graph_objects as go

fig = go.Figure()

# Draw a single contour line at P(class=1) = 0.5 for each classifier
boundaries = [
    ("LDA", Z_lda, "blue"),
    ("QDA", Z_qda, "red"),
    ("Naive Bayes", Z_nb, "green"),
]
for name, Z, color in boundaries:
    fig.add_trace(go.Contour(
        x=np.linspace(x_min, x_max, 200),
        y=np.linspace(y_min, y_max, 200),
        z=Z,
        contours=dict(
            start=0.5,
            end=0.5,
            size=0.01,
            coloring='lines'
        ),
        line=dict(color=color, width=2),
        # Force the colorscale to be a single solid color:
        colorscale=[[0, color], [1, color]],
        showscale=False,
        name=name
    ))

# Add scatter plot for the test data
fig.add_trace(go.Scatter(
    x=X_test_scaled[:, 0],
    y=X_test_scaled[:, 1],
    mode='markers',
    marker=dict(
        color=y_test,
        colorscale='Viridis',
        line=dict(width=1, color='black')
    ),
    name='Test Data'
))

fig.update_layout(
    title='Combined Decision Boundaries: LDA (blue), QDA (red), Naive Bayes (green)',
    xaxis_title='Feature 1',
    yaxis_title='Feature 2'
)

fig.show()


### **Part 3: Resampling Methods**

Here, we illustrate the following resampling techniques:

- **Leave-One-Out Cross-Validation (LOOCV)**
- **K-Fold Cross-Validation**
- **Bootstrap**

These techniques help estimate how well a model generalizes and illustrate the bias-variance trade-off.
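
To make the k-fold procedure concrete, the following sketch writes out by hand the loop that `cross_val_score` performs internally. It assumes the same dataset, model, and `KFold` settings used in the cell below; the names `fold_scores`, `train_idx`, and `test_idx` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold

X, y = load_breast_cancer(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Fit a fresh model on the k-1 training folds
    model = LinearDiscriminantAnalysis()
    model.fit(X[train_idx], y[train_idx])
    # Score it on the single held-out fold (accuracy for classifiers)
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

fold_scores = np.array(fold_scores)
print("Per-fold accuracy:", np.round(fold_scores, 4))
print(f"Mean: {fold_scores.mean():.4f}, Std: {fold_scores.std():.4f}")
```

Every sample appears in exactly one test fold, so each observation contributes to the performance estimate exactly once.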

In [21]:
# Using the Breast Cancer dataset to demonstrate resampling methods
"""
- Loads the breast cancer dataset and separates the features (X) from the target labels (y)
- Prints the dimensions of the dataset
"""
X, y = load_breast_cancer(return_X_y=True)
print("Dataset shape:", X.shape)

# LOOCV
"""
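- LOOCV trains the model on all samples except one and tests on the held-out sample.
- This process is repeated for every sample in the dataset.
- Each fold contains a single sample, so every per-fold score is either 0 or 1; the standard deviation printed here is therefore not directly comparable to the k-fold values.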
"""
loocv = LeaveOneOut()
model = LinearDiscriminantAnalysis()
loocv_scores = cross_val_score(model, X, y, cv=loocv, scoring='accuracy')
print(f"\nLOOCV - Mean Accuracy: {loocv_scores.mean():.4f}, Std: {loocv_scores.std():.4f}")

# K-Fold Cross-Validation for different k values
"""
- Splits the dataset into k folds of (nearly) equal size
- In each iteration, one fold is used as test data and the remaining folds as training data
"""
for k in [5, 10]:
    kf = KFold(n_splits=k, shuffle=True, random_state=42)
    cv_scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')
    print(f"{k}-Fold CV - Mean Accuracy: {cv_scores.mean():.4f}, Std: {cv_scores.std():.4f}")

# Bias-Variance Trade-off demonstration
"""
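- Tests different values of k (number of folds)
- Records the mean accuracy (the performance estimate) and the standard deviation across folds (its variability)
- Plots the results to show how the choice of k affects the bias-variance trade-off

The bias-variance trade-off refers to:
- Bias: how far the model's predictions are from the true values (systematic error)
- Variance: how much the model's predictions change with different training data (sensitivity)

In cross-validation, the same trade-off applies to the performance estimate itself: a small k trains each model on less data, which tends to make the estimate pessimistic, while a very large k (such as LOOCV) averages models trained on almost identical data, which tends to make the estimate more variable.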
"""
k_range = [2, 5, 10, 20, len(X)]  # last one is LOOCV
mean_scores = []
std_scores = []
for k in k_range:
    if k == len(X):
        cv = LeaveOneOut()
        label = "LOOCV"
    else:
        cv = KFold(n_splits=k, shuffle=True, random_state=42)
        label = f"{k}-fold"
    cv_scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
    mean_scores.append(cv_scores.mean())
    std_scores.append(cv_scores.std())
    print(f"{label} - Mean Accuracy: {cv_scores.mean():.4f}, Std: {cv_scores.std():.4f}")

# Plot bias-variance trade-off with Plotly
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=[str(k) if k != len(X) else "LOOCV" for k in k_range],
        y=mean_scores,
        error_y=dict(type='data', array=std_scores, visible=True),
        mode='lines+markers'
    )
)
fig.update_layout(
    title='Bias-Variance Trade-off in K-Fold Cross-Validation',
    xaxis_title='Number of Folds (k)',
    yaxis_title='Accuracy'
)
fig.show()

# Bootstrap
"""
Bootstrap method:
- Randomly samples the dataset WITH replacement (some samples may appear several times)
- The "out-of-bag" samples (those not selected) act as a validation set
- Repeats this process many times to build a distribution of model performance scores
- Useful for assessing model stability and building confidence intervals
"""
def bootstrap(X, y, model, n_bootstraps=1000):
    n_samples = len(X)
    scores = []
    for _ in range(n_bootstraps):
        indices = np.random.choice(n_samples, n_samples, replace=True)
        X_boot, y_boot = X[indices], y[indices]
        # Out-of-bag samples
        oob_indices = list(set(range(n_samples)) - set(indices))
        if not oob_indices:
            continue  # skip if no out-of-bag samples (rare)
        X_oob, y_oob = X[oob_indices], y[oob_indices]
        model.fit(X_boot, y_boot)
        scores.append(model.score(X_oob, y_oob))
    return scores

bootstrap_scores = bootstrap(X, y, LinearDiscriminantAnalysis(), n_bootstraps=100)
print(f"\nBootstrap - Mean Accuracy: {np.mean(bootstrap_scores):.4f}, Std: {np.std(bootstrap_scores):.4f}")

# Plot bootstrap distribution using Plotly
"""
The plots show:
1. The bias-variance trade-off across different values of k
2. The distribution of the bootstrap accuracy scores

These visualizations help assess the stability and reliability of the model across different subsets of the data.
"""
fig = px.histogram(bootstrap_scores, nbins=20, title='Bootstrap Distribution of LDA Accuracy')
fig.add_vline(x=np.mean(bootstrap_scores), line_dash="dash", line_color="red",
              annotation_text=f"Mean={np.mean(bootstrap_scores):.4f}")
fig.update_layout(xaxis_title="Accuracy", yaxis_title="Frequency")
fig.show()


Dataset shape: (569, 30)

LOOCV - Mean Accuracy: 0.9578, Std: 0.2010
5-Fold CV - Mean Accuracy: 0.9543, Std: 0.0116
10-Fold CV - Mean Accuracy: 0.9543, Std: 0.0179
2-fold - Mean Accuracy: 0.9473, Std: 0.0106
5-fold - Mean Accuracy: 0.9543, Std: 0.0116
10-fold - Mean Accuracy: 0.9543, Std: 0.0179
20-fold - Mean Accuracy: 0.9541, Std: 0.0390
LOOCV - Mean Accuracy: 0.9578, Std: 0.2010
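
The large LOOCV standard deviation (0.2010) reflects the 0/1 nature of the per-fold scores rather than an unstable model. The short check below reproduces it from the reported mean accuracy alone. The standard error at the end is only a rough approximation, since it treats the folds as independent even though LOOCV training sets overlap almost completely.

```python
import math

n_samples = 569          # from "Dataset shape: (569, 30)"
mean_accuracy = 0.9578   # LOOCV mean accuracy reported above

# Each LOOCV fold scores either 0 or 1, so the scores behave like Bernoulli
# outcomes whose (population) standard deviation is sqrt(p * (1 - p)).
std_of_fold_scores = math.sqrt(mean_accuracy * (1 - mean_accuracy))

# Rough standard error of the mean accuracy, assuming independent folds
# (an approximation introduced here, not something the notebook computes).
standard_error = std_of_fold_scores / math.sqrt(n_samples)

print(f"Std of 0/1 fold scores: {std_of_fold_scores:.4f}")
print(f"Approx. standard error of the mean: {standard_error:.4f}")
```

The first value matches the printed `Std: 0.2010`, which confirms that this number describes individual 0/1 outcomes and should not be read on the same scale as the k-fold standard deviations.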



Bootstrap - Mean Accuracy: 0.9527, Std: 0.0139
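
The out-of-bag scores can also be summarized with a percentile interval. The sketch below is a self-contained variant of the bootstrap loop above, with two assumptions of its own: it uses a seeded NumPy generator for reproducibility and takes the 2.5th and 97.5th percentiles as a rough 95% interval. Because of the different random draws, its numbers will not exactly match the output above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(42)  # seed chosen here only for reproducibility
n_samples = len(X)

oob_scores = []
for _ in range(100):
    # Draw n_samples row indices WITH replacement
    boot_idx = rng.integers(0, n_samples, size=n_samples)
    # Out-of-bag rows: those never drawn in this bootstrap sample
    oob_idx = np.setdiff1d(np.arange(n_samples), boot_idx)
    if oob_idx.size == 0:
        continue  # extremely unlikely for a dataset of this size
    model = LinearDiscriminantAnalysis().fit(X[boot_idx], y[boot_idx])
    oob_scores.append(model.score(X[oob_idx], y[oob_idx]))

oob_scores = np.array(oob_scores)
lower, upper = np.percentile(oob_scores, [2.5, 97.5])
print(f"Mean OOB accuracy: {oob_scores.mean():.4f}")
print(f"Approx. 95% percentile interval: [{lower:.4f}, {upper:.4f}]")
```

This interval describes the spread of out-of-bag accuracy across resamples; it is a quick stability summary rather than a formally calibrated confidence interval.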


## **REFERENCES**

[1] Wolberg, W., Mangasarian, O., Street, N., & Street, W. (1993). Breast Cancer Wisconsin (Diagnostic) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5DW2B.

[2] Jurafsky, D., & Martin, J. H. (2025). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models (3rd ed.), Ch. 5: Logistic Regression. Online manuscript released January 12, 2025. https://web.stanford.edu/~jurafsky/slp3.