El servicio de venta de autos usados Rusty Bargain está desarrollando una aplicación para atraer nuevos clientes. Gracias a esa app, puedes averiguar rápidamente el valor de mercado de tu coche. Tienes acceso al historial: especificaciones técnicas, versiones de equipamiento y precios. Tienes que crear un modelo que determine el valor de mercado.
A Rusty Bargain le interesa:
- la calidad de la predicción;
- la velocidad de la predicción;
- el tiempo requerido para el entrenamiento

## Preparación de datos

In [116]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.metrics import accuracy_score , root_mean_squared_error
from lightgbm import LGBMRegressor

import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import pandas as pd
import numpy as np
import re


In [117]:
# %pip install -U --user scikit-learn

In [118]:
# df = pd.read_csv('/datasets/car_data.csv')

In [119]:
df = pd.read_csv('D:/Tripleten/datasets/car_data.csv')

In [120]:
# # BORRADOR , NO MANTENER

# df = df.sample(1000)

In [None]:
df.sample(4)

Unnamed: 0,DateCrawled,Price,VehicleType,RegistrationYear,Gearbox,Power,Model,Mileage,RegistrationMonth,FuelType,Brand,NotRepaired,DateCreated,NumberOfPictures,PostalCode,LastSeen
334064,27/03/2016 21:58,1799,small,2001,auto,61,fortwo,70000,6,petrol,smart,no,27/03/2016 00:00,0,47652,31/03/2016 05:47
305293,19/03/2016 14:45,100,,1995,,0,astra,150000,7,petrol,opel,,19/03/2016 00:00,0,88255,30/03/2016 07:45
176627,31/03/2016 12:55,4250,suv,2001,auto,218,m_klasse,150000,6,petrol,mercedes_benz,no,31/03/2016 00:00,0,25421,06/04/2016 05:44
22853,25/03/2016 13:58,1600,wagon,1999,manual,136,3er,150000,12,gasoline,bmw,yes,25/03/2016 00:00,0,75031,25/03/2016 13:58


Renombrado columnas para mantener formato Snake_Case en el código.

In [121]:
cols = df.columns
snake_case_cols = []

for col in cols:
    snake_case_col = '_'.join(re.findall(r'[A-Z][a-z0-9]*', col)).lower()
    snake_case_cols.append(snake_case_col)

# print(snake_case_cols)
df.columns = snake_case_cols

Obteniendo información general

In [57]:
df.sample(5)

Unnamed: 0,date_crawled,price,vehicle_type,registration_year,gearbox,power,model,mileage,registration_month,fuel_type,brand,not_repaired,date_created,number_of_pictures,postal_code,last_seen
146088,07/03/2016 12:36,850,small,1998,auto,65,corsa,150000,9,petrol,opel,no,07/03/2016 00:00,0,51580,05/04/2016 13:15
71187,09/03/2016 15:38,1400,wagon,2004,manual,136,c5,150000,4,,citroen,no,09/03/2016 00:00,0,36269,09/03/2016 15:38
22869,04/04/2016 22:39,4400,bus,2001,manual,68,transporter,80000,12,gasoline,volkswagen,,04/04/2016 00:00,0,37081,07/04/2016 01:16
25202,26/03/2016 17:25,7900,sedan,2007,manual,105,golf,90000,5,gasoline,volkswagen,no,26/03/2016 00:00,0,65719,31/03/2016 21:44
342165,06/03/2016 14:36,4111,small,2006,manual,101,fabia,125000,2,gasoline,skoda,no,06/03/2016 00:00,0,9599,26/03/2016 20:18


In [58]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 354369 entries, 0 to 354368
Data columns (total 16 columns):
 #   Column              Non-Null Count   Dtype 
---  ------              --------------   ----- 
 0   date_crawled        354369 non-null  object
 1   price               354369 non-null  int64 
 2   vehicle_type        316879 non-null  object
 3   registration_year   354369 non-null  int64 
 4   gearbox             334536 non-null  object
 5   power               354369 non-null  int64 
 6   model               334664 non-null  object
 7   mileage             354369 non-null  int64 
 8   registration_month  354369 non-null  int64 
 9   fuel_type           321474 non-null  object
 10  brand               354369 non-null  object
 11  not_repaired        283215 non-null  object
 12  date_created        354369 non-null  object
 13  number_of_pictures  354369 non-null  int64 
 14  postal_code         354369 non-null  int64 
 15  last_seen           354369 non-null  object
dtypes:

Resaltando columnas con valores nulos

In [59]:
null_values = df.isna().sum()
null_rows= df.shape[0] - df.dropna().shape[0]

null_values = null_values[null_values>0]


print(f'Null values by column:')
print(null_values, end='\n\n')
print(f'Total null rows: {null_rows}')

Null values by column:
vehicle_type    37490
gearbox         19833
model           19705
fuel_type       32895
not_repaired    71154
dtype: int64

Total null rows: 108555


Observando si tenemos valores duplicados

In [60]:
duplicated_rows = df.duplicated()
duplicated_rows.sum()

262

Examinando manualmente un caso para verificar coherencia

In [61]:
df[(df['date_crawled'] == '21/03/2016 19:06') & (df['price']== 5999)]
# df.query("date_crawled == '21/03/2016 19:06' and price == 5999")

Unnamed: 0,date_crawled,price,vehicle_type,registration_year,gearbox,power,model,mileage,registration_month,fuel_type,brand,not_repaired,date_created,number_of_pictures,postal_code,last_seen
183,21/03/2016 19:06,5999,small,2009,manual,80,polo,125000,5,petrol,volkswagen,no,21/03/2016 00:00,0,65529,05/04/2016 20:47
14266,21/03/2016 19:06,5999,small,2009,manual,80,polo,125000,5,petrol,volkswagen,no,21/03/2016 00:00,0,65529,05/04/2016 20:47


Observaciones:

- Elementos con clasificaciones Dtype erroneos (price, not_repaired, date_crawled, last_seen)
- Elementos nulos para las columnas (vehicle_type, gearbox, model, fuel_type, not_repaired)
- 262 elementos duplicados.

Se reasignaran los tipos de valor date_crawled y last_seen al formato datetime64, se evitará reasignar los tipos de valor de price y not_repaired debido a sus valores nulos

In [62]:
df['date_crawled'] = pd.to_datetime(df['date_crawled'], format='%d/%m/%Y %H:%M')
df['last_seen'] = pd.to_datetime(df['last_seen'], format='%d/%m/%Y %H:%M')

In [63]:
print(df['date_crawled'].dt.year.unique() == df['last_seen'].dt.year.unique())
# print(df['date_crawled'].dt.month.sort_values().unique() == df['last_seen'].dt.month.sort_values().unique())

[ True]


Modificamos la columna 'not_repaired' a booleano

In [64]:
pd.set_option('future.no_silent_downcasting', True)
df['not_repaired'].replace({'yes':1, 'no':0}, inplace=True)

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df['not_repaired'].replace({'yes':1, 'no':0}, inplace=True)


Los datos pertenecen al 2016 y las columnas date_crawled y last seen tiene los mismo resultados a nivel año y mes

Procedemos a eliminar los valores duplicados

In [65]:
df = df.drop_duplicates().reset_index(drop=True)

Crearé una función que me permita analizar de una manera visuallos elementos únicos que se encuentran dentro de mis columnas, para verificar si existen datos erroneos o extraños.



In [66]:
def analyze(db):
    obj = db.select_dtypes(include='object').columns
    int = db.select_dtypes(include='int').columns
    
    print('Analisis de elementos Object')
    for i in obj:

        print(f'Elementos de la columna {i}')
        print(db[i].sort_values().unique(), end='\n\n')

    for i in int:

        print(f'Elementos de la columna {i}')
        print(db[i].sort_values().unique(), end='\n\n')



In [67]:
analyze(df)

Analisis de elementos Object
Elementos de la columna vehicle_type
['bus' 'convertible' 'coupe' 'other' 'sedan' 'small' 'suv' 'wagon' nan]

Elementos de la columna gearbox
['auto' 'manual' nan]

Elementos de la columna model
['100' '145' '147' '156' '159' '1_reihe' '1er' '200' '2_reihe' '300c'
 '3_reihe' '3er' '4_reihe' '500' '5_reihe' '5er' '601' '6_reihe' '6er'
 '7er' '80' '850' '90' '900' '9000' '911' 'a1' 'a2' 'a3' 'a4' 'a5' 'a6'
 'a8' 'a_klasse' 'accord' 'agila' 'alhambra' 'almera' 'altea' 'amarok'
 'antara' 'arosa' 'astra' 'auris' 'avensis' 'aveo' 'aygo' 'b_klasse'
 'b_max' 'beetle' 'berlingo' 'bora' 'boxster' 'bravo' 'c1' 'c2' 'c3' 'c4'
 'c5' 'c_klasse' 'c_max' 'c_reihe' 'caddy' 'calibra' 'captiva' 'carisma'
 'carnival' 'cayenne' 'cc' 'ceed' 'charade' 'cherokee' 'citigo' 'civic'
 'cl' 'clio' 'clk' 'clubman' 'colt' 'combo' 'cooper' 'cordoba' 'corolla'
 'corsa' 'cr_reihe' 'croma' 'crossfire' 'cuore' 'cx_reihe' 'defender'
 'delta' 'discovery' 'doblo' 'ducato' 'duster' 'e_klasse' 'el

In [68]:
def chart_prev(df):
    fig, axes =plt.subplots(ncols=4,nrows=4,figsize=[12,12])

    for ax, column in zip(axes.flatten(), df.columns):
        ax.hist(df[column].dropna(), bins=10, color='g')
        ax.grid(True)
        ax.set_title(column)

    plt.tight_layout()
    plt.show()
    

In [69]:
# chart_prev(df)


### Observaciones


Al momento nos encontramos con las siguientes situaciones:

Elementos nulos para las columnas:
- vehicle_type
- gearbox 
- model
- fuel_type
- not_repaired

A su vez, hemos identificado incongruencias en los datos registrados para las siguientes columnas: 

- `price`: En los valores de precio puedo observar valores de muy bajos que pueden ser considerados `sesgos de información.`
- `registration year` : se pueden observar ver valores superiores al año actual y valores por debajo de 1886 (cuando se creó el primer vehículo), existen muchos `datos con sesgo`.
- `power` la medición esta en caballos de vapor. El primer vehiculo creado tenía una fuerza de CV de 0.75, por lo que es incongruente ver valores de 0 en la base de datos, en su contraparte, el vehículo con mayor velocidad de CV regitrada para 2020 fue de 1800 CV, por lo que `tenemos incongruencias` en los datos cuyos valores superan los a 10,000 CV. 

De un total de 16 columnas, 8 columnas se encuentran sesgadas

### Sesgo de 'price'

`- Existen precios de vehículos por debajo de los 600 dolares.`

Para poder ejecutar un análisis mas profundo es necesario identificar que pudo haber pasado con los datos, una suposición sería que la base se descargo de una plataforma en línea donde los usuarios pueden decidir el precio final de su vehículo, algunos usuarios podrían optar por colocar precios irreales para llamar la atención de los compradores.

Con el fin de alimentar nuestro modelo de una mejor manera, será necesario reclasificar o excluir los precios menores a 600 dolares. Este umbral se ha obtenido de la busqueda de los precios minimos y maximos para la venta de vehiculos en europa de los sitios
- www.coches.com,
- www.milanuncios.com

Si realizaramos un filtro de precios encontrariamos 46,313 filas afectadas que representan el 13.07% del total del dataset

### Sesgo de 'registration_year'
`- Se pueden observar ver valores superiores al año actual y valores por debajo de 1886 (antes del primer automovil creado).`

Siguiendo las fuentes antes mencionadas, podemos ver vehículos ofertados con fecha de registro desde 1980, mantendremos este umbral en nuestro dataset, por otra parte el registro muestra vehiculos con fecha de registro mayor a 2016, fecha en la que se descargaron estos datos. Por lo que datos posteriores no deberían de considerarse para este estudio.

Considerando la depuración del umbral mínimo y máximo, sería necesario depurar 17,978 filas representando un 5.07% del total del dataset.

### Sesgo de 'power'
`- Valores incongruentes en los CV, con valores inferiores a 60 y superiores a 500 en nuestra base de datos.`

Para la seccion de power tenemos como entrada que el primer vehiculo construido tenía una potencia de Caballos de Vapor (CV) de 0.75. 
De acuerdo con las paginas antes mencionadas, los vehiculos pueden tener una potencia entre 60 y 400 cv


Se obtuvieron muestras al azar los vehiculos de la base de datos con registro mayor a 400cv y ningúno clasifico con mas de 250 cv reales, por lo que limitaremos la búsqueda a 300 cv
Si manejamos los umbrales de 60,300 tendríamos un total de 67,268  datos por de purar representando un 18.99% del dataset.


### Sesgo de 'mileage'
`- Existen vehiculos con registro de salida del año en curso y con 150,000 km registrados (datos incongruentes).`

En promedio, los vehiculos generarn un kilometraje de 15,000 a 27,000 kms por año, para minimizar los datos afectados, se excluiran los autos con un kilometraje mayor a 150,000 para añós posteriores a 2011. (27 kms por añó).
19102 filas que corresponden a un 5.39 del dataset


### Primera conclusión

El dataset descargado presenta bastantes datos sesgados e incongruentes, si bien hemos establecido la hipótesis de que la base se descargo de una plataforma en línea donde los usuarios pueden decidir el precio final de su vehículo no podemos asegurar que sea correcta, sin embargo el enfoque principal del de este proyecto se basa en entrenar un modelo de predicción, por lo que será necesario excluir estos datos para tener una base de mejor calidad.

In [70]:
# Excluyendo valores price
recalibrate_df= df.copy()

In [71]:
recalibrate_df= recalibrate_df.query('price>=600')
recalibrate_df = recalibrate_df.query(' 1980 < registration_year < 2017')
recalibrate_df = recalibrate_df.query('60 <=power <=300')
recalibrate_df = recalibrate_df.query('~(registration_year>=2012 and mileage >= 150000)')

In [72]:
loss_data = (1 - (recalibrate_df.shape[0] / df.shape[0])) *100
print(f'Se ha depurado un {loss_data:.2f} de la data original')

Se ha depurado un 30.42 de la data original


In [73]:
recalibrate_df.isna().sum().sort_values(ascending=False)

not_repaired          30254
fuel_type              9408
model                  7475
vehicle_type           3932
gearbox                3357
date_crawled              0
price                     0
registration_year         0
power                     0
mileage                   0
registration_month        0
brand                     0
date_created              0
number_of_pictures        0
postal_code               0
last_seen                 0
dtype: int64

In [74]:
# chart_prev(recalibrate_df)

Las columnas de power y de registration year ahora tienen coherencia, el kilometraje sigue estando alto para autos con valores de recorrido mayor a 150,000, de acuerdo con la información externa, esta puede entrar en el umbral de 27,000 kms por año.

### Valores nuloss

Para la manipulación de los valores nulos, se han analizado las siguientes opciones:
- Eliminación de filas o columnas con valores nulos
- Imputación de valores nulos
`- Uso de modelos de imputación`
- Codificación de valores nulos

Optaré por el uso de modelos de imputación

In [75]:
imputated_df = recalibrate_df.dropna()


Mantenemos las columnas que realmente nos interesan

In [76]:
imputated_df = imputated_df.drop(columns=['date_crawled','date_created','number_of_pictures', 'last_seen','postal_code'])

# chart_prev(imputated_df)


In [77]:
imputated_df.sample(5)

Unnamed: 0,price,vehicle_type,registration_year,gearbox,power,model,mileage,registration_month,fuel_type,brand,not_repaired
255804,8900,wagon,2007,manual,170,passat,150000,2,gasoline,volkswagen,0
203174,8000,bus,2006,manual,116,sharan,150000,11,gasoline,volkswagen,0
6927,7300,sedan,2002,auto,150,a4,125000,6,petrol,audi,0
161753,11900,convertible,2004,auto,163,slk,70000,3,petrol,mercedes_benz,0
107968,9990,wagon,2008,manual,109,clubman,125000,1,gasoline,mini,0


## Entrenamiento del modelo 

In [78]:
def OHE(df):
    encoder = OneHotEncoder()
    encoded_data = encoder.fit_transform(df)
    output_names = encoder.get_feature_names_out(df.columns)
    df_codified = pd.DataFrame(encoded_data.toarray(), columns=output_names, index= df.index)
    return df_codified

In [79]:
def Scaler(df):
    scaler = StandardScaler()
    scaled_date = scaler.fit_transform(df)
    return scaled_date

Standarizando valores numéricos para mejorar los resultados de el modelo LinerarRegression, los modelos RandomForest y LightGBM no se verán beneficiados ni perjudicados por esta manipulación.

In [80]:
# standarized_df = imputated_df.drop(columns=['price'])
# int_colums = standarized_df.select_dtypes(include=[int]).columns
# standarized_df[int_colums] = Scaler(standarized_df[int_colums])

In [81]:
# standarized_df.head(3)

In [85]:
# obj_cols = standarized_df.select_dtypes(include=['object']).columns
# new_cols = OHE(standarized_df[obj_cols])

# standarized_df =standarized_df.drop(columns=obj_cols)
# standarized_df = pd.concat([standarized_df, new_cols], axis=1)

obj_cols = imputated_df.select_dtypes(include=['object']).columns
new_cols = OHE(imputated_df[obj_cols])

imputated_df =imputated_df.drop(columns=obj_cols)
imputated_df = pd.concat([imputated_df, new_cols], axis=1)

In [89]:
# X = standarized_df
# y = imputated_df['price']


X = imputated_df
y = imputated_df['price']

In [90]:
X

Unnamed: 0,price,registration_year,power,mileage,registration_month,vehicle_type_bus,vehicle_type_convertible,vehicle_type_coupe,vehicle_type_other,vehicle_type_sedan,...,brand_seat,brand_skoda,brand_smart,brand_subaru,brand_suzuki,brand_toyota,brand_volkswagen,brand_volvo,not_repaired_0,not_repaired_1
3,1500,2001,75,150000,6,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
4,3600,2008,69,90000,7,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
5,650,1995,102,150000,10,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
6,2200,2004,109,150000,8,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
10,2000,2004,105,150000,12,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
354093,4400,2008,105,150000,7,0.0,0.0,0.0,0.0,1.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
354097,7900,2010,140,150000,7,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
354100,3200,2004,225,150000,5,0.0,0.0,0.0,0.0,1.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
354104,1199,2000,101,125000,3,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [99]:
def model_training(X,y, model, params):
    X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.25, random_state=100) 

    grid_search = GridSearchCV(model, params, scoring='neg_mean_squared_error', cv=5, verbose=2)
    grid_search.fit(X_train, y_train)

    best_params = grid_search.best_params_
    best_model = grid_search.best_estimator_

    y_pred = best_model.predict(X_test)
    rmse = root_mean_squared_error(y_test,y_pred)


    return grid_search, rmse

In [100]:
model = LinearRegression()
params = {
    'fit_intercept': [True, False],
}
grid_search, rmse = model_training(X,y,model,params)

# print("RMSE:", rmse_lr)

Fitting 5 folds for each of 2 candidates, totalling 10 fits
[CV] END .................................fit_intercept=True; total time=   5.1s
[CV] END .................................fit_intercept=True; total time=   4.1s
[CV] END .................................fit_intercept=True; total time=   4.2s
[CV] END .................................fit_intercept=True; total time=   4.2s
[CV] END .................................fit_intercept=True; total time=   4.0s
[CV] END ................................fit_intercept=False; total time=   4.1s
[CV] END ................................fit_intercept=False; total time=   5.0s
[CV] END ................................fit_intercept=False; total time=   4.2s
[CV] END ................................fit_intercept=False; total time=   4.2s
[CV] END ................................fit_intercept=False; total time=   4.1s


In [101]:
print("RMSE:", rmse)

RMSE: 4.2718501462409623e-11


In [107]:
pd.DataFrame(grid_search.cv_results_)


Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_fit_intercept,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,4.307136,0.409045,0.110845,0.005775,True,{'fit_intercept': True},-1.432255e-21,-9.240535000000001e-23,-7.264991000000001e-22,-1.1756659999999999e-20,-1.222863e-21,-3.0461369999999998e-21,4.379667e-21,1
1,4.286282,0.340223,0.107592,0.004134,False,{'fit_intercept': False},-1.051156e-20,-8.770121e-21,-1.1358859999999999e-20,-9.583182e-21,-1.1013e-20,-1.024734e-20,9.502427000000001e-22,2


In [30]:
y= imputated_df['price']
model = LGBMRegressor()
params = {
    'boosting_type': ['gbdt','dart'],
    'num_leaves': [20,30,40],
    'learning_rate': [0.05,0.1,0.2],
    'n_estimators': [50,100,200]
}

In [116]:
rmse_lgbm = model_training(X,y,model,params)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.004554 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 982
[LightGBM] [Info] Number of data points in the train set: 122072, number of used features: 491
[LightGBM] [Info] Start training from score 5690.157899
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003668 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 990
[LightGBM] [Info] Number of data points in the train set: 122072, number of used features: 495
[LightGBM] [Info] Start training from score 5687.642719
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002921 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is n

In [117]:
print("RMSE:", rmse_lgbm)

RMSE: 1590.4910096686092


In [44]:
model = RandomForestRegressor(random_state=100)
params = {
    'n_estimators': [50,10,100],
    'max_depth': [None,10,20],
    'min_samples_split': [2,5,10],
    'min_samples_leaf': [1,2,4]
}

rmse_rf = model_training(X,y,model,params)

In [36]:
print("RMSE:", rmse)

RMSE: 18775602690846.797


In [None]:
print(f'max{y.max()}, min{y.min()}')

## Análisis del modelo

# Lista de control

Escribe 'x' para verificar. Luego presiona Shift+Enter

- [x]  Jupyter Notebook está abierto
- [ ]  El código no tiene errores- [ ]  Las celdas con el código han sido colocadas en orden de ejecución- [ ]  Los datos han sido descargados y preparados- [ ]  Los modelos han sido entrenados
- [ ]  Se realizó el análisis de velocidad y calidad de los modelos