# 04 - Predictive Modeling
Author: Fernanda Baptista de Siqueira  
Course: MBA em Tecnologia para Negócios – AI, Data Science e Big Data  
Topic: Traffic Accident Analysis in Porto Alegre (2020–2024)  
DataFrame source: Equipe Armazém de Dados de Mobilidade - EAMOB/CIET  
https://dadosabertos.poa.br/dataset/acidentes-de-transito-acidentes (11/05/2025)  

### 1. Import libraries and functions. Load data

In [2]:
from config import (
    pd, sns, plt, resumo_df,  
    PATH_CLEAN, COLS_VEICULOS
)

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, RocCurveDisplay
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from statsmodels.tsa.statespace.sarimax import SARIMAX
from xgboost import XGBClassifier

df = pd.read_parquet(PATH_CLEAN + "df_limpo_chuva.parquet")

### 2. Check data. Split into training and validation sets

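In [3]:
# Summarize the full dataset (resumo_df prints its own report)
resumo_df(df)

# Temporal split: train on 2020 to 2024, validate on 2025
df_treino = df[df["data"] < "2025-01-01"]
df_valid = df[df["data"] >= "2025-01-01"]
# resumo_df returns None, so it is called directly rather than wrapped in print()
resumo_df(df_treino)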


Dimensões: (68837, 35)

Tipos de dados:
predial1                   Int32
queda_arr                  Int32
data              datetime64[ns]
feridos                    Int32
feridos_gr                 Int32
fatais                     Int32
auto                       Int32
taxi                       Int32
lotacao                    Int32
onibus_urb                 Int32
onibus_met                 Int32
onibus_int                 Int32
caminhao                   Int32
moto                       Int32
carroca                    Int32
bicicleta                  Int32
outro                      Int32
cont_vit                   Int32
ups                        Int32
patinete                   Int32
idacidente                 Int32
log1              string[python]
log2              string[python]
tipo_acid               category
dia_sem                 category
hora             timedelta64[ns]
noite_dia               category
regiao                  category
hora_int                   int64
dat

Unnamed: 0,predial1,queda_arr,data,feridos,feridos_gr,fatais,auto,taxi,lotacao,onibus_urb,onibus_met,onibus_int,caminhao,moto,carroca,bicicleta,outro,cont_vit,ups,patinete,idacidente,log1,log2,tipo_acid,dia_sem,hora,noite_dia,regiao,hora_int,data_hora,total_vitimas,soma_veiculos,data_meteo,chuva,chovendo
0,2500,0,2020-01-01,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,1,0,669196,AV FARRAPOS,AV SAO PEDRO,ABALROAMENTO,Quarta,0 days 02:20:00,NOITE,NORTE,2,2020-01-01 02:20:00,0,2,2020-01-01 02:00:00,0.0,0
1,598,0,2020-01-01,1,0,0,0,1,0,0,0,0,0,1,0,0,0,1,5,0,669089,AV BENTO GONCALVES,,ABALROAMENTO,Quarta,0 days 03:00:00,NOITE,LESTE,3,2020-01-01 03:00:00,1,2,2020-01-01 03:00:00,0.0,0
2,0,0,2020-01-01,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,1,0,669206,R SANTA FLORA,AV DA CAVALHADA,COLISÃO,Quarta,0 days 17:15:00,DIA,SUL,17,2020-01-01 17:15:00,0,2,2020-01-01 17:00:00,0.4,1
3,399,0,2020-01-01,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,669195,R SAO FRANCISCO DE ASSIS,,EVENTUAL,Quarta,0 days 17:15:00,DIA,NORTE,17,2020-01-01 17:15:00,0,1,2020-01-01 17:00:00,5.7,1
4,400,0,2020-01-01,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,5,0,683303,AV SENADOR TARSO DUTRA,,ABALROAMENTO,Quarta,0 days 23:00:00,NOITE,LESTE,23,2020-01-01 23:00:00,1,2,2020-01-01 23:00:00,0.0,0


Dimensões: (65554, 35)

Tipos de dados:
predial1                   Int32
queda_arr                  Int32
data              datetime64[ns]
feridos                    Int32
feridos_gr                 Int32
fatais                     Int32
auto                       Int32
taxi                       Int32
lotacao                    Int32
onibus_urb                 Int32
onibus_met                 Int32
onibus_int                 Int32
caminhao                   Int32
moto                       Int32
carroca                    Int32
bicicleta                  Int32
outro                      Int32
cont_vit                   Int32
ups                        Int32
patinete                   Int32
idacidente                 Int32
log1              string[python]
log2              string[python]
tipo_acid               category
dia_sem                 category
hora             timedelta64[ns]
noite_dia               category
regiao                  category
hora_int                   int64
dat

Unnamed: 0,predial1,queda_arr,data,feridos,feridos_gr,fatais,auto,taxi,lotacao,onibus_urb,onibus_met,onibus_int,caminhao,moto,carroca,bicicleta,outro,cont_vit,ups,patinete,idacidente,log1,log2,tipo_acid,dia_sem,hora,noite_dia,regiao,hora_int,data_hora,total_vitimas,soma_veiculos,data_meteo,chuva,chovendo
0,2500,0,2020-01-01,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,1,0,669196,AV FARRAPOS,AV SAO PEDRO,ABALROAMENTO,Quarta,0 days 02:20:00,NOITE,NORTE,2,2020-01-01 02:20:00,0,2,2020-01-01 02:00:00,0.0,0
1,598,0,2020-01-01,1,0,0,0,1,0,0,0,0,0,1,0,0,0,1,5,0,669089,AV BENTO GONCALVES,,ABALROAMENTO,Quarta,0 days 03:00:00,NOITE,LESTE,3,2020-01-01 03:00:00,1,2,2020-01-01 03:00:00,0.0,0
2,0,0,2020-01-01,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,1,0,669206,R SANTA FLORA,AV DA CAVALHADA,COLISÃO,Quarta,0 days 17:15:00,DIA,SUL,17,2020-01-01 17:15:00,0,2,2020-01-01 17:00:00,0.4,1
3,399,0,2020-01-01,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,669195,R SAO FRANCISCO DE ASSIS,,EVENTUAL,Quarta,0 days 17:15:00,DIA,NORTE,17,2020-01-01 17:15:00,0,1,2020-01-01 17:00:00,5.7,1
4,400,0,2020-01-01,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,5,0,683303,AV SENADOR TARSO DUTRA,,ABALROAMENTO,Quarta,0 days 23:00:00,NOITE,LESTE,23,2020-01-01 23:00:00,1,2,2020-01-01 23:00:00,0.0,0




### 3. Define target variable and features

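In [None]:
# Target: accidents with victims (binary)
target = "cont_vit"

# "mes" is not a column of the loaded data, so derive it from the accident date
df_treino = df_treino.assign(mes=df_treino["data"].dt.month)
df_valid = df_valid.assign(mes=df_valid["data"].dt.month)

# Features: categorical + time + weather + vehicles
# hora_int is the integer hour; chuva is the rainfall column in the data
features = [
    "regiao", "dia_sem", "noite_dia", "tipo_acid",
    "hora_int", "mes", "chuva", "chovendo"
] + COLS_VEICULOS

X_train, y_train = df_treino[features], df_treino[target]
X_valid, y_valid = df_valid[features], df_valid[target]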


02. Define the modeling problem
* Target: accidents with victims, or severity
* Predictors: temporal, regional and meteorological data, plus the vehicle columns (COLS_VEICULOS)

### 4. Preprocessing

In [None]:
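# Split columns by type
cat_cols = ["regiao", "dia_sem", "noite_dia", "tipo_acid"]
# Every remaining feature is numeric (time, weather and vehicle counts)
num_cols = [col for col in features if col not in cat_cols]

# Preprocessing: impute and one-hot encode categoricals; impute numerics with the median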
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))
])
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median"))
])

preprocessor = ColumnTransformer(
    transformers=[
        ("categorical", categorical_transformer, cat_cols),
        ("numeric", numeric_transformer, num_cols)
    ]
)

03. Preprocessing
* feature selection
* class balancing (SMOTE, undersampling)
* normalization
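
The balancing step can be illustrated with plain random undersampling. This is a minimal sketch on toy data, not the notebook's final method: the helper name `undersample_majority` and the toy values are assumptions made for illustration. Resampling of this kind would normally be applied to the training set only, so the validation set keeps its real class proportions. `SMOTE` from `imblearn` (already imported above) is the oversampling alternative.

```python
import pandas as pd


def undersample_majority(frame: pd.DataFrame, target: str, seed: int = 42) -> pd.DataFrame:
    """Randomly drop rows so every class has as many rows as the rarest one."""
    smallest = frame[target].value_counts().min()
    parts = [
        group.sample(n=smallest, random_state=seed)
        for _, group in frame.groupby(target)
    ]
    # Shuffle so the classes are not stored in contiguous blocks.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy data (hypothetical): 7 accidents without victims, 2 with victims... plus one more without.
toy = pd.DataFrame({
    "hora_int": [2, 3, 17, 17, 23, 8, 9, 12, 14, 20],
    "cont_vit": [0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})
balanced = undersample_majority(toy, "cont_vit")
print(balanced["cont_vit"].value_counts().to_dict())
```

Undersampling discards majority-class rows, which is cheap but loses information; with roughly 65,000 training rows that trade-off may be acceptable, though it is worth comparing against `class_weight="balanced"` or SMOTE.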

4. Candidate models
* Classification: Logistic Regression, Random Forest, XGBoost
* Time series: SARIMA, Prophet
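
The classification candidates could be collected in a dictionary so they can be trained and compared in a loop. The sketch below uses synthetic data from `make_classification` instead of the accident dataset. The function name `build_candidates` and all hyperparameters are illustrative assumptions, not tuned values. XGBoost is added only if the package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_candidates(seed: int = 42) -> dict:
    """Return a name -> unfitted estimator mapping for each candidate classifier."""
    candidates = {
        # Logistic regression is scale-sensitive, so it gets its own scaler.
        "logistic_regression": Pipeline([
            ("scaler", StandardScaler()),
            ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
        ]),
        "random_forest": RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=seed
        ),
    }
    try:
        # Optional dependency: skipped silently when xgboost is not available.
        from xgboost import XGBClassifier
        candidates["xgboost"] = XGBClassifier(n_estimators=100, random_state=seed)
    except ImportError:
        pass
    return candidates


# Synthetic, imbalanced stand-in for the real features (assumption for the demo).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)
candidates = build_candidates()
for name, estimator in candidates.items():
    estimator.fit(X_demo, y_demo)
    print(name, estimator.predict(X_demo[:5]))
```

In the notebook itself, each estimator would sit behind the `preprocessor` defined in section 4, for example `Pipeline([("prep", preprocessor), ("model", estimator)])`.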

5. Training and validation
* Split into training and test sets
* Train the models and store their metrics (accuracy, recall, AUC)
* Compare the models
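
One way to store and compare the three metrics is a small scoring helper whose results are gathered in a DataFrame. This is a sketch on synthetic data with a random stratified split; the notebook itself splits by date. The helper `score_model` and the `DummyClassifier` baseline are assumptions added for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split


def score_model(model, X_test, y_test) -> dict:
    """Compute accuracy, recall and ROC AUC for a fitted binary classifier."""
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_prob),
    }


# Synthetic, imbalanced data standing in for the accident features.
X, y = make_classification(n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    # Baseline that always predicts the class prior; any real model should beat it.
    "baseline_prior": DummyClassifier(strategy="prior"),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
rows = {name: score_model(model.fit(X_tr, y_tr), X_te, y_te) for name, model in models.items()}
metrics = pd.DataFrame.from_dict(rows, orient="index").sort_values("auc", ascending=False)
print(metrics.round(3))
```

With imbalanced classes, accuracy alone can look good for a model that never predicts a victim, which is why recall and AUC are kept alongside it.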

6. Interpretation
* Feature importance (e.g. feature_importances_ from the Random Forest)
* Performance plots (ROC curve, confusion matrix)

7. Forecast for 2025
* Use the selected model to generate predictions
* Compare them with the actual 2025 data when available.
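
The comparison with observed 2025 data could be done at the monthly level by summing predicted probabilities (the expected number of accidents with victims) and setting them against the observed count. This is a sketch with invented values: the column `prob_vit` is a hypothetical name for the output of the selected model's `predict_proba`, and the dates and outcomes are not real data.

```python
import pandas as pd

# Hypothetical scored validation rows: date, observed outcome, model probability.
scored = pd.DataFrame({
    "data": pd.to_datetime(
        ["2025-01-05", "2025-01-20", "2025-02-03", "2025-02-17", "2025-02-28"]
    ),
    "cont_vit": [1, 0, 0, 1, 1],
    "prob_vit": [0.7, 0.2, 0.4, 0.6, 0.5],
})

monthly = (
    scored.assign(mes=scored["data"].dt.to_period("M"))
    .groupby("mes")
    .agg(
        accidents=("cont_vit", "size"),   # number of accidents in the month
        observed=("cont_vit", "sum"),     # accidents with victims actually recorded
        expected=("prob_vit", "sum"),     # sum of probabilities = expected count
    )
)
# Positive gap: the model over-predicts; negative gap: it under-predicts.
monthly["gap"] = monthly["expected"] - monthly["observed"]
print(monthly)
```

A gap that stays on one side of zero across months would suggest the model is miscalibrated for 2025, for instance because accident patterns shifted after the training period.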