![image info](https://raw.githubusercontent.com/davidzarruk/MIAD_ML_NLP_2023/main/images/banner_1.png)

# Proyecto 1 - Predicción de precios de vehículos usados

En este proyecto podrán poner en práctica sus conocimientos sobre modelos predictivos basados en árboles y ensambles, y sobre la disponibilización de modelos. Para su desasrrollo tengan en cuenta las instrucciones dadas en la "Guía del proyecto 1: Predicción de precios de vehículos usados".

**Entrega**: La entrega del proyecto deberán realizarla durante la semana 4. Sin embargo, es importante que avancen en la semana 3 en el modelado del problema y en parte del informe, tal y como se les indicó en la guía.

Para hacer la entrega, deberán adjuntar el informe autocontenido en PDF a la actividad de entrega del proyecto que encontrarán en la semana 4, y subir el archivo de predicciones a la [competencia de Kaggle](https://www.kaggle.com/t/b8be43cf89c540bfaf3831f2c8506614).

## Datos para la predicción de precios de vehículos usados

En este proyecto se usará el conjunto de datos de Car Listings de Kaggle, donde cada observación representa el precio de un automóvil teniendo en cuenta distintas variables como: año, marca, modelo, entre otras. El objetivo es predecir el precio del automóvil. Para más detalles puede visitar el siguiente enlace: [datos](https://www.kaggle.com/jpayne/852k-used-car-listings).

## Ejemplo predicción conjunto de test para envío a Kaggle

En esta sección encontrarán el formato en el que deben guardar los resultados de la predicción para que puedan subirlos a la competencia en Kaggle.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Importación librerías
import pandas as pd
import numpy as np

In [3]:
# Carga de datos de archivo .csv
dataTraining = pd.read_csv('https://raw.githubusercontent.com/davidzarruk/MIAD_ML_NLP_2023/main/datasets/dataTrain_carListings.zip')
dataTesting = pd.read_csv('https://raw.githubusercontent.com/davidzarruk/MIAD_ML_NLP_2023/main/datasets/dataTest_carListings.zip', index_col=0)

In [4]:
# Visualización datos de entrenamiento
dataTraining.head()

Unnamed: 0,Price,Year,Mileage,State,Make,Model
0,34995,2017,9913,FL,Jeep,Wrangler
1,37895,2015,20578,OH,Chevrolet,Tahoe4WD
2,18430,2012,83716,TX,BMW,X5AWD
3,24681,2014,28729,OH,Cadillac,SRXLuxury
4,26998,2013,64032,CO,Jeep,Wrangler


In [5]:
# Visualización datos de test
dataTesting.head()

Unnamed: 0_level_0,Year,Mileage,State,Make,Model
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,2014,31909,MD,Nissan,MuranoAWD
1,2017,5362,FL,Jeep,Wrangler
2,2014,50300,OH,Ford,FlexLimited
3,2004,132160,WA,BMW,5
4,2015,25226,MA,Jeep,Grand


In [6]:
# Predicción del conjunto de test - acá se genera un número aleatorio como ejemplo
np.random.seed(42)
y_pred = pd.DataFrame(np.random.rand(dataTesting.shape[0]) * 75000 + 5000, index=dataTesting.index, columns=['Price'])

In [7]:
# Guardar predicciones en formato exigido en la competencia de kaggle
y_pred.to_csv('test_submission.csv', index_label='ID')
y_pred.head()

Unnamed: 0_level_0,Price
ID,Unnamed: 1_level_1
0,33090.508914
1,76303.572981
2,59899.545636
3,49899.386315
4,16701.398033


In [8]:
### XGB Regressor

from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor


In [9]:
refs = (dataTraining["Make"] + "_" + dataTraining["Model"]).value_counts().reset_index()
refs = refs.rename(columns={'index': 'ref', 0: 'Cantidad'})

r = list(refs[refs.Cantidad < 200].ref)

dataTraining['ref'] = (dataTraining["Make"] + "_" + dataTraining["Model"])

dataTraining['ref'] = dataTraining['ref'].apply(lambda x: 'Otro' if x in r else x)

dataTraining = dataTraining[['Price', 'Year', 'Mileage', 'State', 'ref']].copy()

dataTraining["State"] = dataTraining["State"].astype("category")
dataTraining["ref"]   = dataTraining["ref"].astype("category")

dataTraining.head()

Unnamed: 0,Price,Year,Mileage,State,ref
0,34995,2017,9913,FL,Jeep_Wrangler
1,37895,2015,20578,OH,Chevrolet_Tahoe4WD
2,18430,2012,83716,TX,BMW_X5AWD
3,24681,2014,28729,OH,Cadillac_SRXLuxury
4,26998,2013,64032,CO,Jeep_Wrangler


In [10]:
data = dataTraining.copy()
y = data[['Price', 'State', 'ref']]
X = data[['Year', 'Mileage', 'State', 'ref']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

y_train = y_train['Price']
y_test  = y_test['Price']

In [11]:
np.random.seed(123)
idx_validacion = np.random.choice(
                    X_train.shape[0],
                    size= int(X_train.shape[0]*0.1),
                    replace=False
                 )

X_val = X_train.iloc[idx_validacion, :].copy()
y_val = y_train.iloc[idx_validacion].copy()

In [12]:
model = XGBRegressor(
    tree_method        = "hist", 
    enable_categorical = True, 
    n_jobs             = -1,
    random_state       = 42,
    eval_metric        = 'mae'
)

model.fit(X_train, y_train, eval_set=[(X_val, y_val)])

[0]	validation_0-mae:6319.80790
[1]	validation_0-mae:4910.12595
[2]	validation_0-mae:4012.10576
[3]	validation_0-mae:3440.47522
[4]	validation_0-mae:3078.88602
[5]	validation_0-mae:2858.72300
[6]	validation_0-mae:2714.78401
[7]	validation_0-mae:2627.51972
[8]	validation_0-mae:2565.91022
[9]	validation_0-mae:2527.79131
[10]	validation_0-mae:2497.59244
[11]	validation_0-mae:2479.03684
[12]	validation_0-mae:2463.50515
[13]	validation_0-mae:2451.29698
[14]	validation_0-mae:2440.55490
[15]	validation_0-mae:2433.16843
[16]	validation_0-mae:2425.04998
[17]	validation_0-mae:2420.79163
[18]	validation_0-mae:2417.77474
[19]	validation_0-mae:2412.44718
[20]	validation_0-mae:2409.15739
[21]	validation_0-mae:2404.81682
[22]	validation_0-mae:2401.76195
[23]	validation_0-mae:2395.95594
[24]	validation_0-mae:2394.08033
[25]	validation_0-mae:2387.95985
[26]	validation_0-mae:2382.55306
[27]	validation_0-mae:2378.02360
[28]	validation_0-mae:2375.37574
[29]	validation_0-mae:2371.25321
[30]	validation_0-ma

In [13]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test, model.predict(X_test))

2395.0134768554685

In [14]:
# Validación empleando el Out-of-Bag error

scores = []

# Valores evaluados
hiper_vals =  [i for i in range(50, 150, 5)]

# Bucle para entrenar un modelo con cada valor de n_estimators y extraer su error
for vals in hiper_vals:
    modelo = XGBRegressor(tree_method       = "hist", 
                         enable_categorical = True, 
                         n_jobs             = -1,
                         random_state       = 42,
                         eval_metric        = 'mae',
                         n_estimators       = vals, 
                         )
    
    modelo.fit(X_train, y_train, eval_set=[(X_val, y_val)])
    scores.append(mean_absolute_error(y_test, modelo.predict(X_test)))
    
    
fig, ax = plt.subplots(figsize=(6, 3.84))
ax.plot(hiper_vals, scores, label="MAE")
ax.plot(hiper_vals[np.argmin(scores)], min(scores),
        marker='o', color = "red", label="min MAE")
ax.set_ylabel("MAE")
ax.set_xlabel("n_estimators")
ax.set_title("Evolución del MAE vs n_estimators")
plt.legend();
print(f"Valor óptimo de n_estimators: {hiper_vals[np.argmin(scores)]}")

print(f"MAE: {min(scores)}")

[0]	validation_0-mae:6319.80790
[1]	validation_0-mae:4910.12595
[2]	validation_0-mae:4012.10576
[3]	validation_0-mae:3440.47522
[4]	validation_0-mae:3078.88602
[5]	validation_0-mae:2858.72300
[6]	validation_0-mae:2714.78401
[7]	validation_0-mae:2627.51972
[8]	validation_0-mae:2565.91022
[9]	validation_0-mae:2527.79131
[10]	validation_0-mae:2497.59244
[11]	validation_0-mae:2479.03684
[12]	validation_0-mae:2463.50515
[13]	validation_0-mae:2451.29698
[14]	validation_0-mae:2440.55490
[15]	validation_0-mae:2433.16843
[16]	validation_0-mae:2425.04998
[17]	validation_0-mae:2420.79163
[18]	validation_0-mae:2417.77474
[19]	validation_0-mae:2412.44718
[20]	validation_0-mae:2409.15739
[21]	validation_0-mae:2404.81682
[22]	validation_0-mae:2401.76195
[23]	validation_0-mae:2395.95594
[24]	validation_0-mae:2394.08033
[25]	validation_0-mae:2387.95985
[26]	validation_0-mae:2382.55306
[27]	validation_0-mae:2378.02360
[28]	validation_0-mae:2375.37574
[29]	validation_0-mae:2371.25321
[30]	validation_0-ma

[20]	validation_0-mae:2409.15739
[21]	validation_0-mae:2404.81682
[22]	validation_0-mae:2401.76195
[23]	validation_0-mae:2395.95594
[24]	validation_0-mae:2394.08033
[25]	validation_0-mae:2387.95985
[26]	validation_0-mae:2382.55306
[27]	validation_0-mae:2378.02360
[28]	validation_0-mae:2375.37574
[29]	validation_0-mae:2371.25321
[30]	validation_0-mae:2364.72523
[31]	validation_0-mae:2360.79187
[32]	validation_0-mae:2358.32403
[33]	validation_0-mae:2356.52977
[34]	validation_0-mae:2352.29723
[35]	validation_0-mae:2349.91269
[36]	validation_0-mae:2347.29404
[37]	validation_0-mae:2343.27902
[38]	validation_0-mae:2342.12111
[39]	validation_0-mae:2339.86006
[40]	validation_0-mae:2337.18346
[41]	validation_0-mae:2334.63724
[42]	validation_0-mae:2332.91168
[43]	validation_0-mae:2330.99668
[44]	validation_0-mae:2329.06511
[45]	validation_0-mae:2326.39893
[46]	validation_0-mae:2323.80742
[47]	validation_0-mae:2321.79337
[48]	validation_0-mae:2320.57997
[49]	validation_0-mae:2316.78676
[50]	valid

[45]	validation_0-mae:2326.39893
[46]	validation_0-mae:2323.80742
[47]	validation_0-mae:2321.79337
[48]	validation_0-mae:2320.57997
[49]	validation_0-mae:2316.78676
[50]	validation_0-mae:2314.89494
[51]	validation_0-mae:2313.95427
[52]	validation_0-mae:2313.08293
[53]	validation_0-mae:2311.10934
[54]	validation_0-mae:2310.13452
[55]	validation_0-mae:2307.88645
[56]	validation_0-mae:2306.00115
[57]	validation_0-mae:2304.54952
[58]	validation_0-mae:2303.01149
[59]	validation_0-mae:2301.74385
[60]	validation_0-mae:2299.60461
[61]	validation_0-mae:2296.67778
[62]	validation_0-mae:2295.19225
[63]	validation_0-mae:2293.63649
[64]	validation_0-mae:2291.87253
[65]	validation_0-mae:2290.03114
[66]	validation_0-mae:2288.91942
[67]	validation_0-mae:2287.13421
[68]	validation_0-mae:2284.39286
[69]	validation_0-mae:2283.44727
[70]	validation_0-mae:2282.55855
[71]	validation_0-mae:2280.83566
[72]	validation_0-mae:2279.92428
[73]	validation_0-mae:2278.73714
[74]	validation_0-mae:2277.87412
[75]	valid

[25]	validation_0-mae:2387.95985
[26]	validation_0-mae:2382.55306
[27]	validation_0-mae:2378.02360
[28]	validation_0-mae:2375.37574
[29]	validation_0-mae:2371.25321
[30]	validation_0-mae:2364.72523
[31]	validation_0-mae:2360.79187
[32]	validation_0-mae:2358.32403
[33]	validation_0-mae:2356.52977
[34]	validation_0-mae:2352.29723
[35]	validation_0-mae:2349.91269
[36]	validation_0-mae:2347.29404
[37]	validation_0-mae:2343.27902
[38]	validation_0-mae:2342.12111
[39]	validation_0-mae:2339.86006
[40]	validation_0-mae:2337.18346
[41]	validation_0-mae:2334.63724
[42]	validation_0-mae:2332.91168
[43]	validation_0-mae:2330.99668
[44]	validation_0-mae:2329.06511
[45]	validation_0-mae:2326.39893
[46]	validation_0-mae:2323.80742
[47]	validation_0-mae:2321.79337
[48]	validation_0-mae:2320.57997
[49]	validation_0-mae:2316.78676
[50]	validation_0-mae:2314.89494
[51]	validation_0-mae:2313.95427
[52]	validation_0-mae:2313.08293
[53]	validation_0-mae:2311.10934
[54]	validation_0-mae:2310.13452
[55]	valid

[69]	validation_0-mae:2283.44727
[70]	validation_0-mae:2282.55855
[71]	validation_0-mae:2280.83566
[72]	validation_0-mae:2279.92428
[73]	validation_0-mae:2278.73714
[74]	validation_0-mae:2277.87412
[75]	validation_0-mae:2277.16117
[76]	validation_0-mae:2274.79822
[77]	validation_0-mae:2273.36678
[78]	validation_0-mae:2272.43072
[79]	validation_0-mae:2270.89147
[80]	validation_0-mae:2270.10620
[81]	validation_0-mae:2269.06579
[82]	validation_0-mae:2268.01752
[83]	validation_0-mae:2265.92976
[84]	validation_0-mae:2264.19224
[85]	validation_0-mae:2262.79311
[86]	validation_0-mae:2261.07987
[87]	validation_0-mae:2260.28984
[88]	validation_0-mae:2258.46529
[89]	validation_0-mae:2257.52070
[90]	validation_0-mae:2256.49125
[91]	validation_0-mae:2254.97547
[92]	validation_0-mae:2254.31857
[93]	validation_0-mae:2253.68501
[94]	validation_0-mae:2252.92942
[95]	validation_0-mae:2251.64043
[96]	validation_0-mae:2250.97244
[97]	validation_0-mae:2249.87809
[98]	validation_0-mae:2248.60486
[99]	valid

[93]	validation_0-mae:2253.68501
[94]	validation_0-mae:2252.92942
[95]	validation_0-mae:2251.64043
[96]	validation_0-mae:2250.97244
[97]	validation_0-mae:2249.87809
[98]	validation_0-mae:2248.60486
[99]	validation_0-mae:2247.35267
[100]	validation_0-mae:2246.29409
[101]	validation_0-mae:2245.64908
[102]	validation_0-mae:2245.06453
[103]	validation_0-mae:2243.90258
[104]	validation_0-mae:2243.20346
[105]	validation_0-mae:2242.46856
[106]	validation_0-mae:2240.74619
[107]	validation_0-mae:2239.98660
[108]	validation_0-mae:2238.08930
[109]	validation_0-mae:2236.64481
[110]	validation_0-mae:2235.60429
[111]	validation_0-mae:2234.94698
[112]	validation_0-mae:2233.89916
[113]	validation_0-mae:2233.04904
[114]	validation_0-mae:2231.51641
[115]	validation_0-mae:2230.40479
[116]	validation_0-mae:2229.94698
[117]	validation_0-mae:2229.08668
[118]	validation_0-mae:2228.51983
[119]	validation_0-mae:2228.09738
[0]	validation_0-mae:6319.80790
[1]	validation_0-mae:4910.12595
[2]	validation_0-mae:4012

[96]	validation_0-mae:2250.97244
[97]	validation_0-mae:2249.87809
[98]	validation_0-mae:2248.60486
[99]	validation_0-mae:2247.35267
[100]	validation_0-mae:2246.29409
[101]	validation_0-mae:2245.64908
[102]	validation_0-mae:2245.06453
[103]	validation_0-mae:2243.90258
[104]	validation_0-mae:2243.20346
[105]	validation_0-mae:2242.46856
[106]	validation_0-mae:2240.74619
[107]	validation_0-mae:2239.98660
[108]	validation_0-mae:2238.08930
[109]	validation_0-mae:2236.64481
[110]	validation_0-mae:2235.60429
[111]	validation_0-mae:2234.94698
[112]	validation_0-mae:2233.89916
[113]	validation_0-mae:2233.04904
[114]	validation_0-mae:2231.51641
[115]	validation_0-mae:2230.40479
[116]	validation_0-mae:2229.94698
[117]	validation_0-mae:2229.08668
[118]	validation_0-mae:2228.51983
[119]	validation_0-mae:2228.09738
[120]	validation_0-mae:2227.24125
[121]	validation_0-mae:2225.82554
[122]	validation_0-mae:2224.47063
[123]	validation_0-mae:2223.53046
[124]	validation_0-mae:2222.25703
[125]	validation_0

[78]	validation_0-mae:2272.43072
[79]	validation_0-mae:2270.89147
[80]	validation_0-mae:2270.10620
[81]	validation_0-mae:2269.06579
[82]	validation_0-mae:2268.01752
[83]	validation_0-mae:2265.92976
[84]	validation_0-mae:2264.19224
[85]	validation_0-mae:2262.79311
[86]	validation_0-mae:2261.07987
[87]	validation_0-mae:2260.28984
[88]	validation_0-mae:2258.46529
[89]	validation_0-mae:2257.52070
[90]	validation_0-mae:2256.49125
[91]	validation_0-mae:2254.97547
[92]	validation_0-mae:2254.31857
[93]	validation_0-mae:2253.68501
[94]	validation_0-mae:2252.92942
[95]	validation_0-mae:2251.64043
[96]	validation_0-mae:2250.97244
[97]	validation_0-mae:2249.87809
[98]	validation_0-mae:2248.60486
[99]	validation_0-mae:2247.35267
[100]	validation_0-mae:2246.29409
[101]	validation_0-mae:2245.64908
[102]	validation_0-mae:2245.06453
[103]	validation_0-mae:2243.90258
[104]	validation_0-mae:2243.20346
[105]	validation_0-mae:2242.46856
[106]	validation_0-mae:2240.74619
[107]	validation_0-mae:2239.98660
[1

NameError: name 'plt' is not defined

In [None]:
# Validación empleando el Out-of-Bag error

scores = []

# Valores evaluados
hiper_vals =  [i for i in range(1, 20, 1)]

# Bucle para entrenar un modelo con cada valor de n_estimators y extraer su error
for vals in hiper_vals:
    modelo = XGBRegressor(tree_method       = "hist", 
                         enable_categorical = True, 
                         n_jobs             = -1,
                         n_estimators       = 110,
                         random_state       = 42,
                         eval_metric        = 'mae',
                         max_depth          = vals)
    
    modelo.fit(X_train, y_train, eval_set=[(X_val, y_val)])
    scores.append(mean_absolute_error(y_test, modelo.predict(X_test)))
    
    
fig, ax = plt.subplots(figsize=(6, 3.84))
ax.plot(hiper_vals, scores, label="MAE")
ax.plot(hiper_vals[np.argmin(scores)], min(scores),
        marker='o', color = "red", label="min MAE")
ax.set_ylabel("MAE")
ax.set_xlabel("max_depth")
ax.set_title("Evolución del MAE vs max_depth")
plt.legend();
print(f"Valor óptimo de max_depth: {hiper_vals[np.argmin(scores)]}")

print(f"MAE: {min(scores)}")

In [None]:
# Validación empleando el Out-of-Bag error

scores = []

# Valores evaluados
hiper_vals = [i  for i in range(100, 210, 10)]

# Bucle para entrenar un modelo con cada valor de n_estimators y extraer su error
for vals in hiper_vals:
    modelo = XGBRegressor(tree_method       = "hist", 
                         enable_categorical = True, 
                         n_jobs             = -1,
                         n_estimators       = 110,
                         random_state       = 42,
                         eval_metric        = 'mae',
                         max_depth          = 6,
                         max_bin            = vals)
    
    modelo.fit(X_train, y_train, eval_set=[(X_val, y_val)])
    scores.append(mean_absolute_error(y_test, modelo.predict(X_test)))
    
    
fig, ax = plt.subplots(figsize=(6, 3.84))
ax.plot(hiper_vals, scores, label="MAE")
ax.plot(hiper_vals[np.argmin(scores)], min(scores),
        marker='o', color = "red", label="min MAE")
ax.set_ylabel("MAE")
ax.set_xlabel("max_bin")
ax.set_title("Evolución del MAE vs max_bin")
plt.legend();
print(f"Valor óptimo de max_bin: {hiper_vals[np.argmin(scores)]}")
print(f"MAE: {min(scores)}")

In [15]:
scores = []

# Valores evaluados
hiper_vals = [i/10 for i in range(1, 11, 1)]

# Bucle para entrenar un modelo con cada valor de n_estimators y extraer su error
# de entrenamiento y de Out-of-Bag.
for vals in hiper_vals:
    modelo = XGBRegressor(tree_method       = "hist", 
                         enable_categorical = True, 
                         n_jobs             = -1,
                         n_estimators       = 110,
                         random_state       = 42,
                         max_depth          = 6,
                         eval_metric        = 'mae',
                         max_bin            = 190,
                         subsample          = vals)
    
    modelo.fit(X_train, y_train, eval_set=[(X_val, y_val)])
    scores.append(mean_absolute_error(y_test, modelo.predict(X_test)))
    
    
fig, ax = plt.subplots(figsize=(6, 3.84))
ax.plot(hiper_vals, scores, label="MAE")
ax.plot(hiper_vals[np.argmin(scores)], min(scores),
        marker='o', color = "red", label="min MAE")
ax.set_ylabel("MAE")
ax.set_xlabel("subsample")
ax.set_title("Evolución del MAE vs subsample")
plt.legend();
print(f"Valor óptimo de subsample: {hiper_vals[np.argmin(scores)]}")

print(f"MAE: {min(scores)}")

[0]	validation_0-mae:6334.67689
[1]	validation_0-mae:4941.22985
[2]	validation_0-mae:4039.66220
[3]	validation_0-mae:3458.20308
[4]	validation_0-mae:3101.73818
[5]	validation_0-mae:2882.17778
[6]	validation_0-mae:2759.89508
[7]	validation_0-mae:2675.46128
[8]	validation_0-mae:2624.65867
[9]	validation_0-mae:2587.38114
[10]	validation_0-mae:2565.53420
[11]	validation_0-mae:2548.17911
[12]	validation_0-mae:2539.69371
[13]	validation_0-mae:2532.39212
[14]	validation_0-mae:2528.30513
[15]	validation_0-mae:2522.10590
[16]	validation_0-mae:2521.94838
[17]	validation_0-mae:2519.37757
[18]	validation_0-mae:2518.21841
[19]	validation_0-mae:2517.21252
[20]	validation_0-mae:2514.90740
[21]	validation_0-mae:2511.91540
[22]	validation_0-mae:2513.23236
[23]	validation_0-mae:2512.66386
[24]	validation_0-mae:2514.24934
[25]	validation_0-mae:2512.63388
[26]	validation_0-mae:2513.44585
[27]	validation_0-mae:2513.33648
[28]	validation_0-mae:2512.66330
[29]	validation_0-mae:2513.44825
[30]	validation_0-ma

[29]	validation_0-mae:2423.51841
[30]	validation_0-mae:2422.61521
[31]	validation_0-mae:2418.70760
[32]	validation_0-mae:2417.25134
[33]	validation_0-mae:2415.01740
[34]	validation_0-mae:2413.01244
[35]	validation_0-mae:2413.05596
[36]	validation_0-mae:2410.34726
[37]	validation_0-mae:2408.57319
[38]	validation_0-mae:2405.32432
[39]	validation_0-mae:2404.68386
[40]	validation_0-mae:2403.17578
[41]	validation_0-mae:2402.04099
[42]	validation_0-mae:2400.92022
[43]	validation_0-mae:2401.37505
[44]	validation_0-mae:2399.68126
[45]	validation_0-mae:2399.75142
[46]	validation_0-mae:2398.04549
[47]	validation_0-mae:2396.60075
[48]	validation_0-mae:2395.62270
[49]	validation_0-mae:2394.18927
[50]	validation_0-mae:2395.58239
[51]	validation_0-mae:2394.88312
[52]	validation_0-mae:2392.47728
[53]	validation_0-mae:2390.02282
[54]	validation_0-mae:2390.66494
[55]	validation_0-mae:2390.23150
[56]	validation_0-mae:2387.17338
[57]	validation_0-mae:2386.79729
[58]	validation_0-mae:2385.66478
[59]	valid

[58]	validation_0-mae:2353.67407
[59]	validation_0-mae:2352.82374
[60]	validation_0-mae:2350.99086
[61]	validation_0-mae:2349.68257
[62]	validation_0-mae:2348.38764
[63]	validation_0-mae:2347.85690
[64]	validation_0-mae:2347.53319
[65]	validation_0-mae:2347.67918
[66]	validation_0-mae:2345.79550
[67]	validation_0-mae:2346.16667
[68]	validation_0-mae:2345.09991
[69]	validation_0-mae:2344.11286
[70]	validation_0-mae:2343.12395
[71]	validation_0-mae:2342.12434
[72]	validation_0-mae:2341.14508
[73]	validation_0-mae:2340.87761
[74]	validation_0-mae:2339.45167
[75]	validation_0-mae:2339.18391
[76]	validation_0-mae:2339.07075
[77]	validation_0-mae:2339.43747
[78]	validation_0-mae:2339.17593
[79]	validation_0-mae:2338.40755
[80]	validation_0-mae:2336.58850
[81]	validation_0-mae:2335.47843
[82]	validation_0-mae:2335.31453
[83]	validation_0-mae:2334.38190
[84]	validation_0-mae:2334.21979
[85]	validation_0-mae:2333.60727
[86]	validation_0-mae:2332.45116
[87]	validation_0-mae:2331.37859
[88]	valid

[87]	validation_0-mae:2308.48835
[88]	validation_0-mae:2308.56228
[89]	validation_0-mae:2308.11655
[90]	validation_0-mae:2307.22152
[91]	validation_0-mae:2305.71838
[92]	validation_0-mae:2304.93163
[93]	validation_0-mae:2303.24470
[94]	validation_0-mae:2302.52573
[95]	validation_0-mae:2302.68513
[96]	validation_0-mae:2300.84743
[97]	validation_0-mae:2300.63810
[98]	validation_0-mae:2299.23068
[99]	validation_0-mae:2298.58486
[100]	validation_0-mae:2298.11547
[101]	validation_0-mae:2297.59056
[102]	validation_0-mae:2297.03691
[103]	validation_0-mae:2296.30787
[104]	validation_0-mae:2295.90395
[105]	validation_0-mae:2294.87222
[106]	validation_0-mae:2295.12148
[107]	validation_0-mae:2294.94870
[108]	validation_0-mae:2294.35796
[109]	validation_0-mae:2293.47420
[0]	validation_0-mae:6317.89536
[1]	validation_0-mae:4910.85044
[2]	validation_0-mae:4008.44860
[3]	validation_0-mae:3441.28316
[4]	validation_0-mae:3089.21517
[5]	validation_0-mae:2861.12921
[6]	validation_0-mae:2721.81217
[7]	val

[6]	validation_0-mae:2717.54799
[7]	validation_0-mae:2621.50537
[8]	validation_0-mae:2562.29941
[9]	validation_0-mae:2523.79738
[10]	validation_0-mae:2495.35781
[11]	validation_0-mae:2476.20504
[12]	validation_0-mae:2458.59468
[13]	validation_0-mae:2446.84214
[14]	validation_0-mae:2438.44094
[15]	validation_0-mae:2432.33880
[16]	validation_0-mae:2423.98189
[17]	validation_0-mae:2416.50543
[18]	validation_0-mae:2411.67988
[19]	validation_0-mae:2407.97862
[20]	validation_0-mae:2403.85288
[21]	validation_0-mae:2397.48397
[22]	validation_0-mae:2395.79955
[23]	validation_0-mae:2389.16055
[24]	validation_0-mae:2385.22883
[25]	validation_0-mae:2382.15221
[26]	validation_0-mae:2379.84391
[27]	validation_0-mae:2375.41993
[28]	validation_0-mae:2373.33382
[29]	validation_0-mae:2370.98588
[30]	validation_0-mae:2367.87620
[31]	validation_0-mae:2364.84199
[32]	validation_0-mae:2361.69869
[33]	validation_0-mae:2360.07565
[34]	validation_0-mae:2357.33249
[35]	validation_0-mae:2353.84302
[36]	validatio

NameError: name 'plt' is not defined

In [16]:
scores = []

# Valores evaluados
hiper_vals =  [i/20 for i in range(1, 11, 1)]

# Bucle para entrenar un modelo con cada valor de n_estimators y extraer su error
# de entrenamiento y de Out-of-Bag.
for vals in hiper_vals:
    modelo = XGBRegressor(tree_method       = "hist", 
                         enable_categorical = True, 
                         n_jobs             = -1,
                         n_estimators       = 110,
                         random_state       = 42,
                         max_depth          = 6,
                         subsample          = 1,
                         max_bin            = 190, 
                         eval_metric        = 'mae',
                         learning_rate      = vals)
    
    modelo.fit(X_train, y_train, eval_set=[(X_val, y_val)])
    scores.append(mean_absolute_error(y_test, modelo.predict(X_test)))
    
    
fig, ax = plt.subplots(figsize=(6, 3.84))
ax.plot(hiper_vals, scores, label="MAE")
ax.plot(hiper_vals[np.argmin(scores)], min(scores),
        marker='o', color = "red", label="min MAE")
ax.set_ylabel("MAE")
ax.set_xlabel("learning_rate")
ax.set_title("Evolución del MAE vs learning_rate")
plt.legend();
print(f"Valor óptimo de learning_rate: {hiper_vals[np.argmin(scores)]}")

print(f"MAE: {min(scores)}")

[0]	validation_0-mae:8054.47820
[1]	validation_0-mae:7717.74457
[2]	validation_0-mae:7400.57821
[3]	validation_0-mae:7097.30557
[4]	validation_0-mae:6811.01639
[5]	validation_0-mae:6538.70503
[6]	validation_0-mae:6285.80194
[7]	validation_0-mae:6043.36495
[8]	validation_0-mae:5816.08683
[9]	validation_0-mae:5600.67065
[10]	validation_0-mae:5398.85684
[11]	validation_0-mae:5208.82548
[12]	validation_0-mae:5031.21385
[13]	validation_0-mae:4865.31927
[14]	validation_0-mae:4707.53533
[15]	validation_0-mae:4559.73797
[16]	validation_0-mae:4420.75255
[17]	validation_0-mae:4290.93678
[18]	validation_0-mae:4169.30792
[19]	validation_0-mae:4054.46871
[20]	validation_0-mae:3947.42842
[21]	validation_0-mae:3846.97551
[22]	validation_0-mae:3753.10749
[23]	validation_0-mae:3666.31105
[24]	validation_0-mae:3584.43387
[25]	validation_0-mae:3508.04521
[26]	validation_0-mae:3437.82332
[27]	validation_0-mae:3371.81712
[28]	validation_0-mae:3310.36529
[29]	validation_0-mae:3252.53984
[30]	validation_0-ma

[29]	validation_0-mae:2418.28214
[30]	validation_0-mae:2411.98606
[31]	validation_0-mae:2407.24208
[32]	validation_0-mae:2402.35768
[33]	validation_0-mae:2398.64905
[34]	validation_0-mae:2394.77507
[35]	validation_0-mae:2392.42379
[36]	validation_0-mae:2389.36754
[37]	validation_0-mae:2386.42143
[38]	validation_0-mae:2384.04178
[39]	validation_0-mae:2382.03938
[40]	validation_0-mae:2379.51403
[41]	validation_0-mae:2376.68880
[42]	validation_0-mae:2374.94571
[43]	validation_0-mae:2373.17832
[44]	validation_0-mae:2370.58938
[45]	validation_0-mae:2368.63865
[46]	validation_0-mae:2366.42810
[47]	validation_0-mae:2365.17139
[48]	validation_0-mae:2362.86102
[49]	validation_0-mae:2361.81217
[50]	validation_0-mae:2360.11879
[51]	validation_0-mae:2358.94381
[52]	validation_0-mae:2357.28113
[53]	validation_0-mae:2354.53985
[54]	validation_0-mae:2353.88528
[55]	validation_0-mae:2352.65784
[56]	validation_0-mae:2351.83300
[57]	validation_0-mae:2350.02070
[58]	validation_0-mae:2348.55544
[59]	valid

[58]	validation_0-mae:2316.91934
[59]	validation_0-mae:2315.36774
[60]	validation_0-mae:2313.93776
[61]	validation_0-mae:2312.76244
[62]	validation_0-mae:2311.08743
[63]	validation_0-mae:2309.42291
[64]	validation_0-mae:2308.20043
[65]	validation_0-mae:2306.20975
[66]	validation_0-mae:2303.96144
[67]	validation_0-mae:2302.65875
[68]	validation_0-mae:2301.36923
[69]	validation_0-mae:2300.26538
[70]	validation_0-mae:2299.18785
[71]	validation_0-mae:2298.26540
[72]	validation_0-mae:2297.24733
[73]	validation_0-mae:2295.15335
[74]	validation_0-mae:2292.44679
[75]	validation_0-mae:2291.40188
[76]	validation_0-mae:2290.26996
[77]	validation_0-mae:2288.99437
[78]	validation_0-mae:2287.48358
[79]	validation_0-mae:2286.39522
[80]	validation_0-mae:2285.03533
[81]	validation_0-mae:2284.03032
[82]	validation_0-mae:2282.74539
[83]	validation_0-mae:2281.59818
[84]	validation_0-mae:2280.96498
[85]	validation_0-mae:2279.68814
[86]	validation_0-mae:2278.56060
[87]	validation_0-mae:2277.06970
[88]	valid

[87]	validation_0-mae:2251.05068
[88]	validation_0-mae:2249.89691
[89]	validation_0-mae:2248.61517
[90]	validation_0-mae:2248.05179
[91]	validation_0-mae:2247.51429
[92]	validation_0-mae:2246.04781
[93]	validation_0-mae:2245.22039
[94]	validation_0-mae:2243.96374
[95]	validation_0-mae:2242.98253
[96]	validation_0-mae:2241.57001
[97]	validation_0-mae:2241.00107
[98]	validation_0-mae:2239.67790
[99]	validation_0-mae:2238.17605
[100]	validation_0-mae:2236.41890
[101]	validation_0-mae:2234.88616
[102]	validation_0-mae:2234.22003
[103]	validation_0-mae:2233.57234
[104]	validation_0-mae:2232.25070
[105]	validation_0-mae:2231.57992
[106]	validation_0-mae:2230.64596
[107]	validation_0-mae:2229.45728
[108]	validation_0-mae:2228.12800
[109]	validation_0-mae:2226.68410
[0]	validation_0-mae:5660.24721
[1]	validation_0-mae:4136.99710
[2]	validation_0-mae:3349.31494
[3]	validation_0-mae:2937.76480
[4]	validation_0-mae:2737.23592
[5]	validation_0-mae:2617.97473
[6]	validation_0-mae:2558.16389
[7]	val

[6]	validation_0-mae:2539.27966
[7]	validation_0-mae:2522.28905
[8]	validation_0-mae:2509.27306
[9]	validation_0-mae:2490.46484
[10]	validation_0-mae:2470.68036
[11]	validation_0-mae:2462.13973
[12]	validation_0-mae:2452.68731
[13]	validation_0-mae:2446.11444
[14]	validation_0-mae:2437.60887
[15]	validation_0-mae:2431.97564
[16]	validation_0-mae:2422.84181
[17]	validation_0-mae:2408.97298
[18]	validation_0-mae:2404.37755
[19]	validation_0-mae:2395.34745
[20]	validation_0-mae:2393.53302
[21]	validation_0-mae:2389.09937
[22]	validation_0-mae:2385.18346
[23]	validation_0-mae:2380.92945
[24]	validation_0-mae:2376.57055
[25]	validation_0-mae:2371.27982
[26]	validation_0-mae:2368.56414
[27]	validation_0-mae:2362.93801
[28]	validation_0-mae:2360.05761
[29]	validation_0-mae:2357.53929
[30]	validation_0-mae:2354.06226
[31]	validation_0-mae:2349.23069
[32]	validation_0-mae:2346.28400
[33]	validation_0-mae:2341.97259
[34]	validation_0-mae:2340.64211
[35]	validation_0-mae:2338.99083
[36]	validatio

NameError: name 'plt' is not defined

In [17]:
scores = []

# Valores evaluados
hiper_vals =  [i for i in range(1, 11, 1)]

# Bucle para entrenar un modelo con cada valor de n_estimators y extraer su error
# de entrenamiento y de Out-of-Bag.
for vals in hiper_vals:
    modelo = XGBRegressor(tree_method       = "hist", 
                         enable_categorical = True, 
                         n_jobs             = -1,
                         n_estimators       = 110,
                         random_state       = 42,
                         max_depth          = 6,
                         subsample          = 1,
                         learning_rate      = 0.2,
                         max_bin            = 190,  
                         eval_metric        = 'mae',
                         min_child_weight   = vals)
    
    modelo.fit(X_train, y_train, eval_set=[(X_val, y_val)])
    scores.append(mean_absolute_error(y_test, modelo.predict(X_test)))
    
    
fig, ax = plt.subplots(figsize=(6, 3.84))
ax.plot(hiper_vals, scores, label="MAE")
ax.plot(hiper_vals[np.argmin(scores)], min(scores),
        marker='o', color = "red", label="min MAE")
ax.set_ylabel("MAE")
ax.set_xlabel("min_child_weight")
ax.set_title("Evolución del MAE vs min_child_weight")
plt.legend();
print(f"Valor óptimo de min_child_weight: {hiper_vals[np.argmin(scores)]}")

print(f"MAE: {min(scores)}")

[0]	validation_0-mae:7002.46878
[1]	validation_0-mae:5889.84464
[2]	validation_0-mae:5034.82536
[3]	validation_0-mae:4389.70992
[4]	validation_0-mae:3892.37695
[5]	validation_0-mae:3522.88070
[6]	validation_0-mae:3245.87711
[7]	validation_0-mae:3043.28550
[8]	validation_0-mae:2888.48544
[9]	validation_0-mae:2780.09598
[10]	validation_0-mae:2701.21266
[11]	validation_0-mae:2638.89102
[12]	validation_0-mae:2589.87662
[13]	validation_0-mae:2550.58953
[14]	validation_0-mae:2522.52501
[15]	validation_0-mae:2498.03191
[16]	validation_0-mae:2480.88886
[17]	validation_0-mae:2463.69396
[18]	validation_0-mae:2452.56919
[19]	validation_0-mae:2442.40602
[20]	validation_0-mae:2434.82048
[21]	validation_0-mae:2424.55517
[22]	validation_0-mae:2417.02825
[23]	validation_0-mae:2412.21623
[24]	validation_0-mae:2407.35939
[25]	validation_0-mae:2402.63919
[26]	validation_0-mae:2399.26855
[27]	validation_0-mae:2395.50133
[28]	validation_0-mae:2393.68349
[29]	validation_0-mae:2388.17842
[30]	validation_0-ma

[29]	validation_0-mae:2390.73655
[30]	validation_0-mae:2388.41960
[31]	validation_0-mae:2385.58034
[32]	validation_0-mae:2382.70378
[33]	validation_0-mae:2381.16842
[34]	validation_0-mae:2379.69270
[35]	validation_0-mae:2377.17595
[36]	validation_0-mae:2375.28198
[37]	validation_0-mae:2373.04974
[38]	validation_0-mae:2370.61373
[39]	validation_0-mae:2367.79283
[40]	validation_0-mae:2366.27664
[41]	validation_0-mae:2363.43527
[42]	validation_0-mae:2361.41010
[43]	validation_0-mae:2359.94601
[44]	validation_0-mae:2357.48653
[45]	validation_0-mae:2353.27194
[46]	validation_0-mae:2350.58726
[47]	validation_0-mae:2347.43357
[48]	validation_0-mae:2346.55386
[49]	validation_0-mae:2344.98626
[50]	validation_0-mae:2343.12734
[51]	validation_0-mae:2341.62144
[52]	validation_0-mae:2340.68444
[53]	validation_0-mae:2339.23253
[54]	validation_0-mae:2338.07602
[55]	validation_0-mae:2337.70068
[56]	validation_0-mae:2336.00139
[57]	validation_0-mae:2334.53379
[58]	validation_0-mae:2331.76606
[59]	valid

[58]	validation_0-mae:2330.07498
[59]	validation_0-mae:2328.93876
[60]	validation_0-mae:2326.67665
[61]	validation_0-mae:2324.95302
[62]	validation_0-mae:2323.45770
[63]	validation_0-mae:2322.24239
[64]	validation_0-mae:2320.85977
[65]	validation_0-mae:2319.41869
[66]	validation_0-mae:2317.33872
[67]	validation_0-mae:2315.95112
[68]	validation_0-mae:2314.74034
[69]	validation_0-mae:2313.89979
[70]	validation_0-mae:2311.89075
[71]	validation_0-mae:2310.59068
[72]	validation_0-mae:2309.70456
[73]	validation_0-mae:2308.60087
[74]	validation_0-mae:2308.21181
[75]	validation_0-mae:2306.71362
[76]	validation_0-mae:2306.08766
[77]	validation_0-mae:2305.23543
[78]	validation_0-mae:2304.57671
[79]	validation_0-mae:2304.16082
[80]	validation_0-mae:2302.41653
[81]	validation_0-mae:2301.52002
[82]	validation_0-mae:2300.69548
[83]	validation_0-mae:2299.74719
[84]	validation_0-mae:2298.39293
[85]	validation_0-mae:2297.83029
[86]	validation_0-mae:2296.87506
[87]	validation_0-mae:2296.05669
[88]	valid

[87]	validation_0-mae:2301.16779
[88]	validation_0-mae:2300.24593
[89]	validation_0-mae:2299.30905
[90]	validation_0-mae:2298.57804
[91]	validation_0-mae:2297.84085
[92]	validation_0-mae:2296.93638
[93]	validation_0-mae:2296.71903
[94]	validation_0-mae:2296.24370
[95]	validation_0-mae:2295.27856
[96]	validation_0-mae:2294.14600
[97]	validation_0-mae:2293.10250
[98]	validation_0-mae:2292.24783
[99]	validation_0-mae:2291.23565
[100]	validation_0-mae:2290.50234
[101]	validation_0-mae:2289.23784
[102]	validation_0-mae:2287.39483
[103]	validation_0-mae:2286.67450
[104]	validation_0-mae:2285.71424
[105]	validation_0-mae:2285.03273
[106]	validation_0-mae:2283.49953
[107]	validation_0-mae:2282.97782
[108]	validation_0-mae:2281.99895
[109]	validation_0-mae:2280.55465
[0]	validation_0-mae:7000.36427
[1]	validation_0-mae:5887.24780
[2]	validation_0-mae:5038.95639
[3]	validation_0-mae:4384.82305
[4]	validation_0-mae:3889.57659
[5]	validation_0-mae:3523.97634
[6]	validation_0-mae:3247.20815
[7]	val

[6]	validation_0-mae:3245.47472
[7]	validation_0-mae:3041.62565
[8]	validation_0-mae:2897.26171
[9]	validation_0-mae:2785.78497
[10]	validation_0-mae:2704.30996
[11]	validation_0-mae:2642.06488
[12]	validation_0-mae:2591.73013
[13]	validation_0-mae:2556.75401
[14]	validation_0-mae:2527.77198
[15]	validation_0-mae:2505.38708
[16]	validation_0-mae:2487.55823
[17]	validation_0-mae:2475.05791
[18]	validation_0-mae:2462.00898
[19]	validation_0-mae:2452.65929
[20]	validation_0-mae:2443.82422
[21]	validation_0-mae:2435.52964
[22]	validation_0-mae:2429.56150
[23]	validation_0-mae:2424.06781
[24]	validation_0-mae:2418.47131
[25]	validation_0-mae:2414.78761
[26]	validation_0-mae:2411.63959
[27]	validation_0-mae:2406.83176
[28]	validation_0-mae:2405.06526
[29]	validation_0-mae:2402.70870
[30]	validation_0-mae:2399.44226
[31]	validation_0-mae:2398.40174
[32]	validation_0-mae:2395.55346
[33]	validation_0-mae:2393.56556
[34]	validation_0-mae:2392.29103
[35]	validation_0-mae:2390.29420
[36]	validatio

NameError: name 'plt' is not defined

In [18]:
### Modelo Preliminar

model = XGBRegressor(
            eval_metric        = 'mae',
            tree_method        = "hist", 
            enable_categorical = True, 
            n_jobs             = -1,
            n_estimators       = 110,
            random_state       = 42,
            max_depth          = 6,
            subsample          = 1,
            learning_rate      = 0.2,
            max_bin            = 190,  
            min_child_weight   = 3
)

model.fit(X_train, y_train)

In [19]:
# Media

mean_absolute_error(y_test, model.predict(X_test))

2381.832622607422

In [20]:
### Predicciones

r = ['Volkswagen_TiguanSEL',
 'Honda_CivicSi',
 'Chrysler_200LX',
 'Nissan_Versa4dr',
 'Toyota_Highlander4dr',
 'Honda_CR-VSE',
 'Mazda_CX-9Grand',
 'Ford_Focus5dr',
 'Subaru_WRXSTI',
 'Ford_RangerSuperCab',
 'Volkswagen_New',
 'Porsche_9112dr',
 'Toyota_Tundra2WD',
 'Chrysler_200Touring',
 'Lincoln_Town',
 'Chevrolet_Colorado4WD',
 'Honda_RidgelineRTL',
 'Lincoln_Navigator2WD',
 'INFINITI_FX35AWD',
 'Dodge_Dakota2WD',
 'Jeep_Patriot4WD',
 'Subaru_Legacy',
 'Volkswagen_GTI4dr',
 'Honda_FitSport',
 'Nissan_Quest4dr',
 'Jeep_CompassLimited',
 'Honda_PilotSE',
 'Ford_F-250XL',
 'Honda_OdysseyLX',
 'Pontiac_G64dr',
 'Toyota_Sequoia4WD',
 'Ford_F-350XLT',
 'Mazda_CX-9FWD',
 'Audi_A64dr',
 'Toyota_SiennaLimited',
 'Ford_F-150Limited',
 'Mercedes-Benz_SL-ClassSL500',
 'Volkswagen_GTI2dr',
 'Mercury_Grand',
 'Chevrolet_Avalanche2WD',
 'Jeep_Compass4WD',
 'Jaguar_XF4dr',
 'Cadillac_STS4dr',
 'Toyota_PriusFive',
 'Volvo_XC60AWD',
 'Tesla_Model',
 'Ford_Ranger4WD',
 'Ford_TaurusSE',
 'Mitsubishi_Lancer4dr',
 'Toyota_Matrix5dr',
 'Mazda_CX-7FWD',
 'Ford_Explorer4dr',
 'Ford_MustangShelby',
 'Buick_Regal4dr',
 'Toyota_Tundra4WD',
 'Toyota_SiennaSE',
 'Volkswagen_Passat',
 'Ford_F-150King',
 'Lincoln_Navigator',
 'Hyundai_VeracruzFWD',
 'Nissan_Frontier',
 'Audi_Q7quattro',
 'Jeep_WranglerX',
 'Chevrolet_Suburban4dr',
 'Toyota_CamryBase',
 'Chrysler_PacificaLimited',
 'Volkswagen_Eos2dr',
 'Subaru_Legacy3.6R',
 'Chevrolet_Cobalt4dr',
 'Toyota_Highlander',
 'Cadillac_CTS-V',
 'Chrysler_300Base',
 'Lexus_LX',
 'Ford_FocusST',
 'Mercury_Milan4dr',
 'Toyota_RAV44dr',
 'Acura_TLAutomatic',
 'Kia_SportageSX',
 'Ford_F-350XL',
 'Cadillac_Escalade2WD',
 'Acura_TSXAutomatic',
 'Mitsubishi_Outlander4WD',
 'Toyota_Land',
 'Nissan_Xterra4dr',
 'Hyundai_Accent4dr',
 'Ram_1500Tradesman',
 'Jaguar_XJ4dr',
 'Buick_EnclaveConvenience',
 'Toyota_SequoiaSR5',
 'Ford_FocusSEL',
 'Hyundai_Azera4dr',
 'Subaru_WRXBase',
 'Nissan_Xterra2WD',
 'Honda_AccordLX-S',
 'INFINITI_QX562WD',
 'Jeep_LibertyLimited',
 'Cadillac_Escalade4dr',
 'Lexus_SC',
 'Volkswagen_Touareg4dr',
 'Volvo_XC60FWD',
 'Toyota_4Runner2WD',
 'Toyota_SequoiaLimited',
 'Dodge_Durango4dr',
 'Toyota_Sequoia4dr',
 'Ford_FocusS',
 'Volvo_XC90FWD',
 'Mercedes-Benz_C-ClassC350',
 'Ford_Excursion137"',
 'Kia_ForteSX',
 'Scion_xD5dr',
 'Volvo_S804dr',
 'Kia_Sedona4dr',
 'Mercedes-Benz_E-ClassE320',
 'Ford_F-250King',
 'Lexus_LXLX',
 'Mitsubishi_Outlander2WD',
 'Ford_F-150FX2',
 'GMC_Canyon4WD',
 'Toyota_4RunnerTrail',
 'Kia_SedonaEX',
 'Toyota_RAV4',
 'Honda_Element4WD',
 'Chevrolet_CorvetteConvertible',
 'Ford_EscapeLImited',
 'Ford_FlexSE',
 'Mitsubishi_Eclipse3dr',
 'Ford_EscapeLimited',
 'Toyota_PriusBase',
 'Pontiac_Vibe4dr',
 'Honda_RidgelineSport',
 'Chevrolet_Cobalt2dr',
 'GMC_Yukon4dr',
 'Audi_A34dr',
 'Volvo_XC704dr',
 'Volkswagen_GLI4dr',
 'Toyota_PriusOne',
 'Hyundai_VeracruzAWD',
 'Toyota_SequoiaPlatinum',
 'GMC_Canyon2WD',
 'Toyota_RAV4Sport',
 'Chevrolet_TahoeLS',
 'Buick_LaCrosseAWD',
 'Jeep_PatriotLimited',
 'Subaru_WRXPremium',
 'Toyota_YarisLE',
 'Subaru_WRXLimited',
 'Ford_FiestaS',
 'Toyota_YarisBase',
 'Toyota_HighlanderSE',
 'Volvo_XC90T6',
 'Chrysler_PT',
 'Toyota_AvalonTouring',
 'Ford_Escape4dr',
 'Pontiac_Grand',
 'Audi_TT2dr',
 'Lincoln_Navigator4dr',
 'GMC_CanyonExtended',
 'Ram_25002WD',
 'Jaguar_XK2dr',
 'Volvo_C702dr',
 'Honda_AccordSE',
 'Audi_S44dr',
 'Ford_ExplorerEddie',
 'Honda_S2000Manual',
 'Porsche_Cayman2dr',
 'Nissan_MuranoS',
 'Honda_CR-ZEX',
 'Chrysler_300Touring',
 'Bentley_Continental',
 'Ford_MustangDeluxe',
 'Ford_F-350King',
 'Honda_Element2WD',
 'Subaru_ImprezaSport',
 'Toyota_Yaris4dr',
 'Buick_RegalGS',
 'Nissan_PathfinderSE',
 'Mitsubishi_Galant4dr',
 'Mercedes-Benz_SLK-ClassSLK350',
 'Chevrolet_Monte',
 'Mazda_RX-84dr',
 'Suzuki_Grand',
 'Ram_Dakota4WD',
 'Ram_Dakota2WD',
 'GMC_New',
 'Freightliner_Sprinter',
 'Dodge_Sprinter',
 'Chevrolet_New']

dataTesting['ref'] = (dataTesting["Make"] + "_" + dataTesting["Model"])

dataTesting['ref'] = dataTesting['ref'].apply(lambda x: 'Otro' if x in r else x)

dataTesting = dataTesting[['Year', 'Mileage', 'State', 'ref']].copy()

dataTesting["State"] = dataTesting["State"].astype("category")
dataTesting["ref"]   = dataTesting["ref"].astype("category")

In [21]:
y_pred = pd.DataFrame(model.predict(dataTesting), index=dataTesting.index, columns=['Price'])
y_pred.to_csv('test_submission.csv', index_label='ID')
y_pred.head()

Unnamed: 0_level_0,Price
ID,Unnamed: 1_level_1
0,21415.496094
1,37169.996094
2,24305.603516
3,9023.192383
4,30981.394531


In [22]:
### Grid

# Grid de hiperparámetros evaluados

param_grid = {'n_estimators'     : [i for i in range(90, 125, 5)],
              'max_depth'        : [i for i in range(2, 12, 2)],
              'learning_rate'    : [(i+3)/40 for i in range(5, 9, 1)],
              'min_child_weight' : [i for i in range(1, 12, 2)]
             }

In [24]:
from sklearn.model_selection import GridSearchCV
import multiprocessing
from sklearn.model_selection import RepeatedKFold

fit_params = {"early_stopping_rounds" : 5, 
              "eval_metric"           : "mae", 
              "eval_set"              : [(X_val, y_val)],
              "verbose"               : 0,
             }

In [None]:
### Grid search con validación cruzada

grid = GridSearchCV(
        estimator  = XGBRegressor(
                            tree_method        = "hist", 
                            enable_categorical = True, 
                            subsample          = 1,
                            n_jobs             = -1,
                            random_state       = 42,
                    ),
        param_grid = param_grid,
        scoring    = 'neg_mean_absolute_error',
        n_jobs     = - 1,
        #cv         = RepeatedKFold(n_splits=3, n_repeats=1, random_state=123), 
        cv         = 3,
        refit      = True,
        verbose    = 0,
        return_train_score = True
       )

grid.fit(X = X_train, y = y_train, **fit_params)

In [None]:
# Resultados

resultados = pd.DataFrame(grid.cv_results_)
resultados.filter(regex = '(param.*|mean_t|std_t)') \
    .drop(columns = 'params') \
    .sort_values('mean_test_score', ascending = False) \
    .head(4)