## 3. Modeling
📒 `3.0-rc-modeling.ipynb`

**Objective:** Develop and train a regression model to estimate vehicle prices from their technical specifications and market characteristics.

⚙️ **Activities:**
- Select the most relevant predictor variables for the model.
- Handle missing data and encode categorical variables.
- Define appropriate evaluation metrics (e.g., RMSE, MAE, R²).
- Implement several regression algorithms (e.g., Linear Regression, Random Forest, Gradient Boosting).
- Tune hyperparameters with techniques such as Grid Search or Random Search.
- Validate the models with cross-validation.
- Compare performance across the models developed.
- Select the final model based on the evaluation metrics.
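
The comparison step in this list can be sketched as follows. This is a minimal illustration, not the notebook's actual pipeline: it assumes scikit-learn is available, uses synthetic data from `make_regression` as a stand-in for the prepared car features, and picks arbitrary model settings. Hyperparameter tuning is left out for brevity.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data (assumption): replace with the prepared features and price.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

# Candidate regressors named in the activity list; settings are illustrative.
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# scikit-learn scorers follow "higher is better", so error metrics are negated.
scoring = {
    "rmse": "neg_root_mean_squared_error",
    "mae": "neg_mean_absolute_error",
    "r2": "r2",
}

cv = KFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {
        "RMSE": -scores["test_rmse"].mean(),  # flip the sign back to a positive error
        "MAE": -scores["test_mae"].mean(),
        "R2": scores["test_r2"].mean(),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Using the same `KFold` object for every model keeps the folds identical, so differences in the averaged metrics reflect the models rather than the splits.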

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Classify the vehicle features
tech_map = {
    'perform': ['engine-size', 'horsepower', 'compression-ratio',
                    'bore', 'stroke', 'peak-rpm', 'num-of-cylinders', 'engine-type',
                    'fuel-system', 'engine-location'],
    'design': ['body-style', 'num-of-doors', 'drive-wheels', 'wheel-base',
               'curb-weight', 'length', 'width', 'height'],
    'mercado': ['make', 'fuel-type', 'aspiration', 'avg-mpg'],
    'risk_insurance': ['symboling'],
    'cost_losses': ['normalized-losses']
}
# In this example, the 'perform' and 'design' groups are merged into a single 'technical' group

features_map = {
    'technical': ['engine-size', 'horsepower', 'curb-weight', 'compression-ratio',
                  'bore', 'stroke', 'peak-rpm', 'num-of-cylinders', 'engine-type',
                  'fuel-system', 'engine-location', 'body-style', 'num-of-doors',
                  'drive-wheels', 'wheel-base', 'length', 'width', 'height'],

    'mercado': ['make', 'fuel-type', 'aspiration', 'avg-mpg'],
    'risk_insurance': ['symboling'],
    'cost_losses': ['normalized-losses']
}
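
Hand-written feature maps like these are easy to get subtly wrong: Python silently concatenates adjacent string literals, so a missing comma between two names yields one invalid name. A small check against the actual columns can catch this. The helper `check_feature_map` below is hypothetical (it is not part of the notebook), and the toy map only illustrates the failure mode. In the notebook it would be called with `features_map` and `df.columns` once `df` is loaded.

```python
from collections import Counter


def check_feature_map(feature_map, columns):
    """Return (unknown, duplicated): names absent from columns, and names listed twice."""
    listed = [name for names in feature_map.values() for name in names]
    unknown = sorted(set(listed) - set(columns))
    duplicated = sorted(name for name, n in Counter(listed).items() if n > 1)
    return unknown, duplicated


# Hypothetical toy example with a deliberately missing comma.
example_map = {
    "technical": ["engine-size", "engine-location" "body-style"],
    "mercado": ["make"],
}
example_columns = ["engine-size", "engine-location", "body-style", "make"]

unknown, duplicated = check_feature_map(example_map, example_columns)
print(unknown)     # ['engine-locationbody-style']
print(duplicated)  # []
```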

In [10]:
file_path = "../data/processed/car_price_prep.csv"
df = pd.read_csv(file_path)

df.head()

|  | symboling | normalized-losses | make | fuel-type | aspiration | num-of-doors | body-style | drive-wheels | wheel-base | length | ... | bore | stroke | compression-ratio | horsepower | peak-rpm | price | price-binned | risk_insurance | car-profile | avg-mpg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 115 | alfa-romero | gas | std | two | convertible | rwd | 88.6 | 168.8 | ... | 3.47 | 2.68 | 9.0 | 111 | 5000 | 13495 | medium | high | sport/premium | 24.0 |
| 1 | 3 | 115 | alfa-romero | gas | std | two | convertible | rwd | 88.6 | 168.8 | ... | 3.47 | 2.68 | 9.0 | 111 | 5000 | 16500 | medium | high | sport/premium | 24.0 |
| 2 | 1 | 115 | alfa-romero | gas | std | two | hatchback | rwd | 94.5 | 171.2 | ... | 2.68 | 3.47 | 9.0 | 154 | 5000 | 16500 | medium | moderate | utility | 22.5 |
| 3 | 2 | 164 | audi | gas | std | four | sedan | fwd | 99.8 | 176.6 | ... | 3.19 | 3.4 | 10.0 | 102 | 5500 | 13950 | medium | high | utility | 27.0 |
| 4 | 2 | 164 | audi | gas | std | four | sedan | 4wd | 99.4 | 176.6 | ... | 3.19 | 3.4 | 8.0 | 115 | 5500 | 17450 | medium | high | utility | 20.0 |


In [11]:
df.dtypes

symboling              int64
normalized-losses      int64
make                  object
fuel-type             object
aspiration            object
num-of-doors          object
body-style            object
drive-wheels          object
wheel-base           float64
length               float64
width                float64
height               float64
curb-weight            int64
engine-type           object
num-of-cylinders      object
engine-size            int64
fuel-system           object
bore                 float64
stroke               float64
compression-ratio    float64
horsepower             int64
peak-rpm               int64
price                  int64
price-binned          object
risk_insurance        object
car-profile           object
avg-mpg              float64
dtype: object
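
Note that `num-of-doors` and `num-of-cylinders` have `object` dtype even though they describe counts; the preview shows `num-of-doors` stored as words such as `two` and `four`. One option, sketched below, is to map the words to integers instead of one-hot encoding them. This is only an illustration: the `word_to_int` mapping is an assumption, the values of `num-of-cylinders` are not visible in the preview, and the small `sample` frame stands in for `df`.

```python
import pandas as pd

# Assumed number-word vocabulary; extend it if the data contains other words.
word_to_int = {"two": 2, "three": 3, "four": 4, "five": 5,
               "six": 6, "eight": 8, "twelve": 12}

# Stand-in frame (assumption); in the notebook, apply the mapping to df instead.
sample = pd.DataFrame({
    "num-of-doors": ["two", "four", "four"],
    "num-of-cylinders": ["four", "six", "five"],
})

for col in ["num-of-doors", "num-of-cylinders"]:
    # Words missing from the mapping become NaN, which makes gaps easy to spot.
    sample[col] = sample[col].map(word_to_int)

print(sample)
```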

### 3.1 Data Pre-Processing

In [9]:
# Define categorical and numerical columns

categorical_features = df.select_dtypes(include=['object']).columns.to_list()
numerical_features = df.select_dtypes(include=['int64', 'float64']).columns.to_list()
numerical_features

['symboling',
 'normalized-losses',
 'wheel-base',
 'length',
 'width',
 'height',
 'curb-weight',
 'engine-size',
 'bore',
 'stroke',
 'compression-ratio',
 'horsepower',
 'peak-rpm',
 'price',
 'avg-mpg']
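
The numerical list above still contains `price`, which is the prediction target, and the categorical columns include `price-binned`, which is presumably derived from it. Both would leak the target if passed to the model as features. A minimal sketch of one way to separate them and prepare the remaining columns, assuming scikit-learn and using a tiny made-up frame (`df_demo`) in place of `df`:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up stand-in frame (assumption); in the notebook, use the loaded df.
df_demo = pd.DataFrame({
    "make": ["audi", "bmw", "audi", "volvo"],
    "fuel-type": ["gas", "gas", "diesel", "gas"],
    "horsepower": [102, 121, 110, 114],
    "curb-weight": [2337, 2395, 2844, 2912],
    "price": [13950, 16430, 17710, 12940],
    "price-binned": ["medium", "medium", "medium", "medium"],
})

target = "price"
leaky = ["price-binned"]  # assumed to be derived from the target

X = df_demo.drop(columns=[target] + leaky)
y = df_demo[target]

# Split the remaining columns by dtype after removing target-related columns.
numerical_features = X.select_dtypes(include=["number"]).columns.to_list()
categorical_features = X.select_dtypes(exclude=["number"]).columns.to_list()

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numerical_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

X_prepared = preprocessor.fit_transform(X)
print(numerical_features)
print(categorical_features)
print(X_prepared.shape)
```

Fitting the `ColumnTransformer` inside a scikit-learn `Pipeline` with the regressor would keep scaling and encoding statistics confined to the training folds during cross-validation.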