## Exploratory Analysis Results

Objective: Explore the Breast Cancer Wisconsin dataset and build simple baseline models (Logistic Regression and Decision Tree) to measure recall and understand patterns in the data.

Imports of the required libraries:

In [23]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

### Loading the Dataset

In [25]:
# Load the Breast Cancer Wisconsin dataset directly from sklearn
data = load_breast_cancer(as_frame=True)

# data.frame already holds the 30 features plus the 'target' column
df = data.frame

# Show general information
print("Shape (rows, columns):", df.shape)
print("\nClass distribution (0 = malignant, 1 = benign):\n", df['target'].value_counts(normalize=True))

Shape (rows, columns): (569, 31)

Class distribution (0 = malignant, 1 = benign):
 target
1    0.627417
0    0.372583
Name: proportion, dtype: float64


This dataset has 569 samples with 30 numeric features describing characteristics of cells observed in microscope images of breast biopsies (the 31st column in the shape above is the target). The goal is to classify each sample as malignant (cancer) or benign (no cancer).

The <b>target</b> column indicates the diagnosis:
- <b>0</b>: malignant (cancerous tumor)
- <b>1</b>: benign (non-cancerous tumor)
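
Because the malignant class is encoded as 0, it is worth making the label mapping explicit before measuring recall. The sketch below is illustrative only: the `y_true` and `y_pred` lists are made-up toy labels (not model output), and the variable names are assumptions introduced here. It shows how the labels map to names and why recall for the malignant class needs `pos_label=0`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import recall_score

data = load_breast_cancer(as_frame=True)

# target_names is ordered by label: index 0 -> 'malignant', index 1 -> 'benign'
label_names = {i: str(name) for i, name in enumerate(data.target_names)}
counts = data.target.map(label_names).value_counts()
print(counts)

# Toy labels for illustration only (hypothetical, not produced by any model).
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 1, 1]

# recall_score treats label 1 as the positive class by default, so recall for
# the malignant class has to be requested explicitly with pos_label=0.
malignant_recall = recall_score(y_true, y_pred, pos_label=0)
benign_recall = recall_score(y_true, y_pred)  # default pos_label=1
print("malignant recall:", malignant_recall)
print("benign recall:", benign_recall)
```

In the toy labels, one of the three malignant cases is predicted as benign, which lowers malignant recall without affecting benign recall. That kind of miss is the one a recall-focused baseline on this dataset is meant to expose.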

```python

### Structure and descriptive statistics

In [27]:
# Check data types and possible missing values
df.info()

# Descriptive statistics (mean, standard deviation, minimum, maximum)
df.describe().T.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

| feature | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mean radius | 569.0 | 14.127292 | 3.524049 | 6.981 | 11.7 | 13.37 | 15.78 | 28.11 |
| mean texture | 569.0 | 19.289649 | 4.301036 | 9.71 | 16.17 | 18.84 | 21.8 | 39.28 |
| mean perimeter | 569.0 | 91.969033 | 24.298981 | 43.79 | 75.17 | 86.24 | 104.1 | 188.5 |
| mean area | 569.0 | 654.889104 | 351.914129 | 143.5 | 420.3 | 551.1 | 782.7 | 2501.0 |
| mean smoothness | 569.0 | 0.09636 | 0.014064 | 0.05263 | 0.08637 | 0.09587 | 0.1053 | 0.1634 |
| mean compactness | 569.0 | 0.104341 | 0.052813 | 0.01938 | 0.06492 | 0.09263 | 0.1304 | 0.3454 |
| mean concavity | 569.0 | 0.088799 | 0.07972 | 0.0 | 0.02956 | 0.06154 | 0.1307 | 0.4268 |
| mean concave points | 569.0 | 0.048919 | 0.038803 | 0.0 | 0.02031 | 0.0335 | 0.074 | 0.2012 |
| mean symmetry | 569.0 | 0.181162 | 0.027414 | 0.106 | 0.1619 | 0.1792 | 0.1957 | 0.304 |
| mean fractal dimension | 569.0 | 0.062798 | 0.00706 | 0.04996 | 0.0577 | 0.06154 | 0.06612 | 0.09744 |


<b>.info()</b> shows that all columns are numeric (<b>float64</b>) and that there are no null values, so no missing-value handling or type conversion is needed before the analysis.

<b>.describe()</b> reveals very different scales across columns: for example, mean area reaches 2501.0 while mean smoothness stays below 0.17. This matters because models that rely on distances or gradient-based optimization are sensitive to feature scale, which is why we will use <b>StandardScaler</b>.

```
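
The two observations above (clean numeric data, very different feature scales) can be checked programmatically and then acted on. What follows is a minimal sketch, not the notebook's actual baseline: the split settings (`test_size=0.2`, `random_state=42`, `stratify=y`), `max_iter=1000`, and all variable names are assumptions chosen for illustration. It places `StandardScaler` inside a `Pipeline` so the scaler is fit on the training data only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Data-quality checks mirroring the .info() output: no nulls, all float64.
n_missing = int(X.isna().sum().sum())
all_float = bool((X.dtypes == "float64").all())

# Spread of feature scales: ratio of the largest to the smallest column std.
std_ratio = X.std().max() / X.std().min()

# Assumed split settings, for illustration only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so it is fit on the training fold only
# and the test fold never influences the scaling statistics.
model = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Recall for the malignant class (label 0).
recall_malignant = recall_score(y_test, model.predict(X_test), pos_label=0)
print(f"missing={n_missing}, all_float={all_float}, std_ratio={std_ratio:.0f}")
print(f"malignant recall: {recall_malignant:.3f}")
```

Keeping the scaler in the pipeline, rather than scaling the full DataFrame up front, avoids leaking test-set statistics into training. A Decision Tree baseline would not need this step, since tree splits are unaffected by monotonic rescaling of individual features.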