# Regression Trees Example: Predicting Used-Car Prices

The **problem** used as an example is the following:

* We want to set the selling price of used cars.

The data used to address this problem are:

* The Kelley Blue Book **dataset** available at this [link](https://modeldata.tidymodels.org/reference/car_prices.html), containing a sample of 804 cars made by GM, model year 2005.

No preprocessing was applied to the data.

A brief exploratory analysis is shown below:

# Libraries

In [None]:
!pip install ISLP
!pip install skimpy

In [None]:
# Load libraries

import pandas as pd
from skimpy import skim
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Data

In [None]:
# Load data

data = pd.read_csv("dados.csv")
data

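| | Price | Mileage | Cylinder | Doors | Cruise | Sound | Leather | Buick | Cadillac | Chevy | Pontiac | Saab | Saturn | convertible | coupe | hatchback | sedan | wagon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22661.05 | 20105 | 6 | 4 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 21725.01 | 13457 | 6 | 2 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 29142.71 | 31655 | 4 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | 30731.94 | 22479 | 4 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | 33358.77 | 17590 | 4 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 799 | 10813.34 | 266 | 4 | 4 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 800 | 9720.98 | 20836 | 4 | 4 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 801 | 9482.22 | 24842 | 4 | 4 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 802 | 9563.79 | 19273 | 4 | 4 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |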


In [None]:
# Brief exploratory analysis

skim(data)
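If `skimpy` is not available, pandas alone can produce a similar summary. The sketch below is an assumption and not part of the original analysis. So that it runs without `dados.csv`, it uses only the five rows shown in the data preview above as a stand-in, restricted to `Price` and the four model predictors. The name `sample` is illustrative.

```python
import pandas as pd

# Stand-in for dados.csv: the first five rows of the data preview,
# restricted to Price and the four predictors used by the model.
sample = pd.DataFrame(
    {
        "Price": [22661.05, 21725.01, 29142.71, 30731.94, 33358.77],
        "Mileage": [20105, 13457, 31655, 22479, 17590],
        "Cylinder": [6, 6, 4, 4, 4],
        "Doors": [4, 2, 2, 2, 2],
        "Leather": [0, 0, 1, 0, 1],
    }
)

print(sample.shape)                       # (5, 5): rows, columns
print(sample.isna().sum())                # missing values per column
summary = sample.describe()               # count, mean, std, min, quartiles, max
print(summary)
print(sample["Cylinder"].value_counts())  # number of cars per cylinder count
```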

The **predictions** produced by the algorithm are shown below.

# Model

## Training and test samples


In [None]:
# Modeling

# Split into training and test samples

training_data, test_data = train_test_split(
    data,
    test_size = 0.3,
    random_state = 1984
)
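Here is a rough check of what `train_test_split` does with `test_size = 0.3`. It shuffles the rows, and for a float `test_size` it rounds the number of test rows up. The sketch below uses a hypothetical frame with 804 rows, the sample size cited for the dataset, and the same `random_state`. Only the row count matters here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: 804 rows (the sample size cited for the dataset);
# only the number of rows matters for this check.
frame = pd.DataFrame({"row_id": np.arange(804)})

train, test = train_test_split(frame, test_size=0.3, random_state=1984)

# For a float test_size the test count is rounded up:
# ceil(0.3 * 804) = ceil(241.2) = 242 rows, leaving 804 - 242 = 562 for training.
print(len(train), len(test))  # 562 242

# The two samples are disjoint and together cover every row.
print(set(train.row_id).isdisjoint(test.row_id))                # True
print(len(set(train.row_id) | set(test.row_id)) == len(frame))  # True
```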

print(training_data)
print(test_data)

        Price  Mileage  Cylinder  Doors  Cruise  Sound  Leather  Buick  \
471  16644.09    22383         6      4       1      1        1      0   
546  21525.34    25020         6      2       1      1        1      0   
633  16997.69    25830         6      4       1      0        1      0   
22   27610.86    22881         4      4       1      1        1      0   
399  16379.85     4188         4      4       1      1        1      0   
..        ...      ...       ...    ...     ...    ...      ...    ...   
793  20382.15    25240         6      2       1      1        1      0   
755  13436.00    20530         4      4       0      1        1      0   
767  16805.06    19498         6      4       1      0        0      0   
623  38445.90    18661         8      4       1      0        1      0   
220  35622.14    10340         4      2       1      1        0      0   

     Cadillac  Chevy  Pontiac  Saab  Saturn  convertible  coupe  hatchback  \
471         0      1        0    

## Modeling and Prediction

In [None]:
# Train the regression tree on four predictors

method = DecisionTreeRegressor()

method.fit(
    X = training_data.filter(
        items = ['Mileage', 'Cylinder', 'Doors', 'Leather'],
        axis = "columns"
    ),
    y = training_data.Price.values.ravel()
)
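A regression tree such as `DecisionTreeRegressor`, with its default `criterion="squared_error"`, grows by choosing at each node the split that minimizes the sum of squared errors (SSE) of the two child nodes around their own means. The hand-written `best_split` below is a simplified, illustrative version for a single numeric feature. It is not scikit-learn's implementation, and the toy data are hypothetical. On this data it picks the same root threshold as scikit-learn.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def best_split(x, y):
    """Return (threshold, total SSE) of the best single split on one feature.

    Candidate thresholds are midpoints between consecutive distinct sorted
    values; the score is the SSE of each side around that side's mean.
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate equal values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[1]:
            best = ((x_sorted[i - 1] + x_sorted[i]) / 2, sse)
    return best


# Toy data (hypothetical): two clearly separated groups.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])

threshold, sse = best_split(x, y)
tree = DecisionTreeRegressor(max_depth=1).fit(x.reshape(-1, 1), y)

print(threshold, sse)           # 6.5 4.0
print(tree.tree_.threshold[0])  # 6.5, the root split chosen by scikit-learn
```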

In [None]:
# Produce predictions for the test sample

predict = method.predict(
    test_data.filter(
        items = ['Mileage', 'Cylinder', 'Doors', 'Leather'],
        axis = "columns"
    )
)

predict[0:5]

array([12649.11, 11115.01, 20538.09, 22100.39, 19471.97])
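Each prediction above is the average `Price` of the training cars that fall into the same leaf as the test car. The sketch below checks this on hypothetical toy data. `DecisionTreeRegressor.apply` returns the leaf index of each row, so the code averages the training targets leaf by leaf and compares the result with `predict`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (hypothetical): one feature, two groups of targets.
X_train = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_train = np.array([100.0, 120.0, 140.0, 300.0, 320.0, 340.0])

tree = DecisionTreeRegressor(max_depth=1).fit(X_train, y_train)

X_new = np.array([[0.0], [2.5], [20.0]])
leaf_train = tree.apply(X_train)  # leaf index of each training row
leaf_new = tree.apply(X_new)      # leaf index of each new row

# Average the training targets that share a leaf with each new row.
manual = np.array([y_train[leaf_train == leaf].mean() for leaf in leaf_new])

print(tree.predict(X_new))  # [120. 120. 320.]
print(manual)               # [120. 120. 320.]
```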

Finally, we report the root mean squared error (RMSE) on the training and test samples.
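For reference, the RMSE is $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$ and is expressed in the same unit as `Price`. The sketch below computes it by hand on hypothetical toy values. The errors are 1, 0 and -2, the squares are 1, 0 and 4, and their mean is 5/3, so the RMSE is about 1.291. The sketch then compares this with the square root of scikit-learn's `mean_squared_error`. Taking the square root avoids the `squared` argument, which was deprecated in scikit-learn 1.4 and removed in 1.6.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy values (hypothetical), chosen so the arithmetic is easy to follow.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_true - y_pred                     # [1, 0, -2]
rmse_manual = np.sqrt(np.mean(errors ** 2))  # sqrt(5/3)

# Same value through scikit-learn: square root of the mean squared error.
rmse_sklearn = mean_squared_error(y_true, y_pred) ** 0.5

print(rmse_manual, rmse_sklearn)
```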

## Accuracy

In [None]:
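# Compute accuracy: RMSE on the training sample, then on the test sample.
# Taking the square root of the MSE works across scikit-learn versions
# (the `squared` argument was deprecated in 1.4 and removed in 1.6).

print(mean_squared_error(
    training_data.Price,
    method.predict(
        training_data.filter(
            items = ['Mileage', 'Cylinder', 'Doors', 'Leather'],
            axis = "columns"
        )
    )
) ** 0.5)

print(mean_squared_error(test_data.Price, predict) ** 0.5)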

158.9182223406238
10614.899729612547
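The two numbers are far apart: about 159 on the training sample versus about 10,615 on the test sample. That gap is the usual sign of overfitting. With default settings (`max_depth=None`, `min_samples_leaf=1`), `DecisionTreeRegressor` keeps splitting until its leaves are pure, so it nearly memorizes the training prices. One common remedy, not used in the original analysis, is to limit the tree's size and choose those limits by cross-validation. The sketch below does this with `GridSearchCV` on synthetic data. The data are hypothetical: price falls with mileage, plus noise. The grid values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor


def rmse(model, X, y):
    """Root mean squared error of a fitted model on (X, y)."""
    return np.sqrt(np.mean((y - model.predict(X)) ** 2))


# Synthetic data (hypothetical): 300 distinct mileages, price falls with mileage.
rng = np.random.default_rng(1984)
X = np.arange(0, 30_000, 100, dtype=float).reshape(-1, 1)
y = 30_000 - 0.2 * X[:, 0] + rng.normal(0, 2_000, size=X.shape[0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1984
)

# Fully grown tree (default limits): with distinct mileages it fits the
# training prices exactly.
full = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Depth and leaf size chosen by 5-fold cross-validation on the training set only.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 3, 4, 6, 8], "min_samples_leaf": [1, 5, 20]},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)
tuned = search.best_estimator_

print("full tree : train", rmse(full, X_train, y_train), "test", rmse(full, X_test, y_test))
print("tuned tree:", search.best_params_)
print("tuned tree: train", rmse(tuned, X_train, y_train), "test", rmse(tuned, X_test, y_test))
```

With the real data, the same search would use the four predictors and `Price` from `training_data` in place of `X_train` and `y_train`. The test sample would then be used only once, for the final RMSE.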
