# Car Price Prediction Exercise

This notebook trains a machine learning model to predict used-car prices in Spain from the car's brand, age, fuel type, mileage and gearbox type. Running the notebook trains the model and saves it to `model.joblib` in the current working directory. You can load this file in your final product.

## Task
Build a web interface, web application or API that predicts car prices from user inputs (make, model year, fuel type, gearbox type and mileage) using the provided pre-trained machine learning model.

## Requirements ##
1. Load the pre-trained car price prediction model created by this Python notebook.
2. Create a web interface/web application/API that accepts the following user inputs: 
    1. Car make (e.g. BMW)  
    2. Model year (e.g. 2018)  
    3. Fuel type (e.g. Diesel)  
    4. Transmission type (e.g. Manual)  
    5. Mileage (e.g. 10000)  
3. Pass the user inputs to the pre-trained model to generate a predicted car price.
4. Display the predicted price to the user.
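The requirements give the example inputs in English ("Diesel", "Manual"), while the dataset stores Spanish labels ("Diésel", "Gasolina", "Automatica"; see `enc.categories_` in the preprocessing section). Below is a minimal sketch of validating the five inputs and translating UI labels before they reach the model. The `CarInput` class, its field names and the English-to-Spanish label mapping are illustrative assumptions, not part of the provided code.

```python
from dataclasses import dataclass

# Hypothetical mapping from English UI labels to the Spanish labels stored in
# the dataset. Labels not listed here are rejected.
FUEL_LABELS = {"Diesel": "Diésel", "Electric": "Eléctrico", "LPG": "GLP",
               "Petrol": "Gasolina", "Hybrid": "Híbrido"}
GEARBOX_LABELS = {"Automatic": "Automatica", "Manual": "Manual"}


@dataclass(frozen=True)
class CarInput:
    """The five user inputs from the requirements; field names are illustrative."""
    make: str
    model_year: int
    fuel: str        # dataset label, e.g. "Diésel"
    gearbox: str     # dataset label, e.g. "Manual"
    mileage_km: int

    @classmethod
    def from_form(cls, make, model_year, fuel, gearbox, mileage):
        # Translate English UI labels and coerce numeric strings from a form.
        if fuel not in FUEL_LABELS:
            raise ValueError(f"unsupported fuel type: {fuel!r}")
        if gearbox not in GEARBOX_LABELS:
            raise ValueError(f"unsupported transmission: {gearbox!r}")
        mileage_km = int(mileage)
        if mileage_km < 0:
            raise ValueError("mileage must be non-negative")
        return cls(make.strip(), int(model_year), FUEL_LABELS[fuel],
                   GEARBOX_LABELS[gearbox], mileage_km)


print(CarInput.from_form("BMW", "2018", "Diesel", "Manual", "10000"))
```

Raising `ValueError` for bad input gives the web layer a single exception type to turn into an error message for the user.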

## Guidelines ##
1. Use any frontend (e.g. React, Vue), backend (e.g. Flask, FastAPI, Node.js) or web application (e.g. Streamlit, Django) technologies.
2. Host the app locally; there is no need to deploy it online.
3. Include the code in your GitHub repository for review.
4. Don't forget the README file and comments in your code!
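The guidelines leave the stack open. Flask or FastAPI would be the usual choice. To keep this sketch dependency-free, it uses the standard library's `wsgiref` instead. The `make_app` and `serve` helpers, the query-parameter names and the JSON response shape are all hypothetical. `predict` is a placeholder hook that you would wire to the loaded model.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

REQUIRED = ("make", "year", "fuel", "gearbox", "mileage")  # hypothetical names


def make_app(predict):
    """Build a WSGI app. `predict` takes a dict of string inputs and returns
    a price in euros; a ValueError from it becomes a 400 response."""

    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        missing = [k for k in REQUIRED if k not in query]
        if missing:
            status, body = "400 Bad Request", {"error": f"missing: {', '.join(missing)}"}
        else:
            inputs = {k: query[k][0] for k in REQUIRED}
            try:
                status, body = "200 OK", {"price_eur": round(float(predict(inputs)), 2)}
            except ValueError as exc:
                status, body = "400 Bad Request", {"error": str(exc)}
        payload = json.dumps(body).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(payload)))])
        return [payload]

    return app


def serve(app, host="127.0.0.1", port=8000):
    """Serve the app locally until interrupted with Ctrl+C."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve(make_app(my_predict))` starts a local server. A request such as `http://127.0.0.1:8000/?make=BMW&year=2018&fuel=Gasolina&gearbox=Manual&mileage=10000` then returns `{"price_eur": ...}`. Keeping `predict` as a parameter means the HTTP layer can be tested without loading the model.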


In [36]:
import pandas as pd
import datetime
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

## Data Ingestion

In [37]:
df = pd.read_csv("used_cars_data.csv")

In [38]:
df

| Unnamed: 0.1 | Unnamed: 0 | brand | model | price (eur) | engine | year | mileage (kms) | fuel | gearbox | location |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SEAT | Ibiza | 8990 | SC 1.2 TSI 90cv Style | 2016 | 67000 | Gasolina | Manual | Granollers |
| 1 | 1 | Hyundai | i30 | 9990 | 1.6 CRDi 110cv Tecno | 2014 | 104868 | Diésel | Manual | Viladecans |
| 2 | 2 | BMW | Serie 5 | 13490 | 530d Touring | 2011 | 137566 | Diésel | Automatica | Viladecans |
| 3 | 3 | Volkswagen | Golf | 24990 | GTI 2.0 TSI 169kW (230CV) | 2018 | 44495 | Gasolina | Manual | Viladecans |
| 4 | 4 | Opel | Corsa | 10460 | 1.4 Expression 90 CV | 2016 | 69800 | Gasolina | Manual | Sabadell 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 787 | 787 | Volkswagen | Golf | 13990 | Edition 1.6 TDI 110CV BMT | 2016 | 84040 | Diésel | Manual | Gavá |
| 788 | 788 | Kia | Sportage | 24990 | 1.6 GDi 97kW (132CV) Basic 4x2 | 2018 | 65872 | Gasolina | Manual | Viladecans |
| 789 | 789 | Abarth | 500 | 17990 | 1.4 16v T-Jet 595 118kW (160CV) Pista E6 | 2019 | 28830 | Gasolina | Manual | Mataró |
| 790 | 790 | Volkswagen | Tiguan | 14990 | 2.0 TDI 177cv DSG 4x4 Sport BMotion Tech | 2014 | 162895 | Diésel | Automatica | Mataró |
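The two `Unnamed` columns repeat the row number. They are most likely left over from saving a DataFrame together with its index (an assumption; the source does not say how the CSV was produced). They are not used as features, so they do not affect the model, but dropping them keeps the frame readable. Below is a short sketch on the first rows shown above; `preview` and `sample_csv` are illustrative names.

```python
import io

import pandas as pd

# First rows of used_cars_data.csv, copied from the preview above.
sample_csv = """Unnamed: 0.1,Unnamed: 0,brand,model,price (eur),engine,year,mileage (kms),fuel,gearbox,location
0,0,SEAT,Ibiza,8990,SC 1.2 TSI 90cv Style,2016,67000,Gasolina,Manual,Granollers
1,1,Hyundai,i30,9990,1.6 CRDi 110cv Tecno,2014,104868,Diésel,Manual,Viladecans
2,2,BMW,Serie 5,13490,530d Touring,2011,137566,Diésel,Automatica,Viladecans
"""

preview = pd.read_csv(io.StringIO(sample_csv))
# Keep only the columns whose names do not start with "Unnamed".
preview = preview.loc[:, ~preview.columns.str.startswith("Unnamed")]
print(preview.columns.tolist())
print(preview.isna().sum().sum())  # total count of missing values
```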


## Data Preprocessing

In [39]:
# One-hot encode the categorical features; categories not seen during fit
# are encoded as all zeros instead of raising an error

enc = OneHotEncoder(handle_unknown='ignore')

In [40]:
X = df[['brand', 'fuel', 'gearbox']]

In [41]:
enc.fit(X)

In [42]:
enc.categories_

[array(['Abarth', 'Alfa', 'Audi', 'BMW', 'Chevrolet', 'Citroen', 'Cupra',
        'DS', 'Dacia', 'Fiat', 'Ford', 'Honda', 'Hyundai', 'Jaguar',
        'Jeep', 'Kia', 'Land', 'Lexus', 'Mazda', 'Mercedes', 'Mini',
        'Mitsubishi', 'Nissan', 'Opel', 'Peugeot', 'Porsche', 'Renault',
        'SEAT', 'Skoda', 'Smart', 'Ssangyong', 'Subaru', 'Suzuki',
        'Toyota', 'Volkswagen', 'Volvo'], dtype=object),
 array(['Diésel', 'Eléctrico', 'GLP', 'Gasolina', 'Híbrido'], dtype=object),
 array(['Automatica', 'Manual'], dtype=object)]

In [43]:
enc.transform(df[['brand', 'fuel', 'gearbox']]).toarray()

array([[0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 1., 0.],
       ...,
       [1., 0., 0., ..., 0., 0., 1.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])

In [44]:
X_features = pd.DataFrame(enc.transform(df[['brand', 'fuel', 'gearbox']]).toarray())

In [45]:
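# Feature engineering: convert the model year into the car's age in years,
# measured from the year in which the notebook is run
year = datetime.datetime.now().year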

In [46]:
df['age'] = year-df['year']
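Age is measured against the year in which the notebook runs (`datetime.datetime.now().year`), so the same car gets a different `age` if the notebook is re-run in a later year. To reproduce the training features exactly, the web app should compute age with the same reference year the notebook used. Below is a minimal sketch that makes the reference year an explicit parameter. `car_age` and `reference_year` are hypothetical names, and which year to pin is a judgment call the source does not settle.

```python
import datetime
from typing import Optional


def car_age(model_year: int, reference_year: Optional[int] = None) -> int:
    """Hypothetical helper mirroring df['age'] = year - df['year'].

    reference_year defaults to the current year, as in the notebook; pass the
    year the model was trained with to reproduce the training features exactly.
    """
    if reference_year is None:
        reference_year = datetime.datetime.now().year
    if model_year > reference_year:
        raise ValueError(f"model year {model_year} is after {reference_year}")
    return reference_year - model_year


print(car_age(2018, reference_year=2023))  # 5
```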

In [47]:
X_num = df[['age', 'mileage (kms)']]
y = df['price (eur)']

In [48]:
X = pd.concat([X_num, X_features], axis=1)
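At prediction time the web app must turn one set of user inputs into a row with exactly the columns the model was trained on. These are `age` and `mileage (kms)`, then the one-hot columns, with every name converted to a string as in the training cell. Below is a minimal sketch that assumes the notebook's fitted `enc` is available to the app. Note that the saving cell at the end stores only `regr`, so the encoder would have to be saved separately. `build_features` is a hypothetical helper, and `toy_enc` only stands in for the real encoder.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder


def build_features(enc, age, mileage_km, brand, fuel, gearbox):
    """Hypothetical helper: build a one-row frame laid out like the training X
    ('age', 'mileage (kms)', then one-hot columns named '0', '1', ...)."""
    num = pd.DataFrame({"age": [age], "mileage (kms)": [mileage_km]})
    # Same column names as the frame the encoder was fitted on.
    cat = pd.DataFrame({"brand": [brand], "fuel": [fuel], "gearbox": [gearbox]})
    onehot = pd.DataFrame(enc.transform(cat).toarray())
    row = pd.concat([num, onehot], axis=1)
    row.columns = row.columns.astype(str)  # match the training cell's string names
    return row


# Toy encoder standing in for the notebook's fitted `enc`.
toy_enc = OneHotEncoder(handle_unknown="ignore").fit(pd.DataFrame({
    "brand": ["BMW", "SEAT"],
    "fuel": ["Diésel", "Gasolina"],
    "gearbox": ["Automatica", "Manual"],
}))
print(build_features(toy_enc, 5, 10000, "BMW", "Diésel", "Manual"))
```

With the real `enc`, the returned frame can be passed directly to `regr.predict(...)`, because its column names match those the model was fitted on.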

In [49]:
y[:5]

0     8990
1     9990
2    13490
3    24990
4    10460
Name: price (eur), dtype: int64

## Model Training

In [50]:

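# scikit-learn rejects mixed string/integer column names, and the one-hot
# columns are named 0, 1, 2, ..., so convert every feature name to a string
X.columns = X.columns.astype(str)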


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

regr = RandomForestRegressor()

regr.fit(X_train, y_train)

y_pred = regr.predict(X_test)
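The forest is created with `RandomForestRegressor()` and no `random_state`. Each run therefore draws different bootstrap samples, and the RMSE and MAE in the evaluation section will not reproduce exactly, even though the split is fixed with `random_state=42`. The small sketch below uses synthetic data (not the car dataset) to show that fixing `random_state` makes two fits identical. `X_demo` and `y_demo` are illustrative names, chosen so they do not overwrite the notebook's `X` and `y`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (not the car dataset): 50 rows, 3 features.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(50, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=50)

# With the same random_state, both forests draw the same bootstrap samples
# and build the same trees, so their predictions match.
first = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)
second = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)
print(np.allclose(first.predict(X_demo), second.predict(X_demo)))  # True
```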

## Model Evaluation

In [51]:
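# Root mean squared error in euros, computed as sqrt(MSE) because the
# `squared=False` argument was deprecated in scikit-learn 1.4 and later removed
rmse = mean_squared_error(y_test, y_pred) ** 0.5

rmse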

4609.006083550564

In [52]:
# Mean absolute error in euros
mae = mean_absolute_error(y_test, y_pred)
mae

3325.6667938931296
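Both metrics are in euros. MAE is the average absolute difference between predicted and actual price, so on the held-out 33% of the data the predictions are off by about €3,326 on average. RMSE squares the errors before averaging, which weights large misses more heavily, so it is always at least as large as MAE (here about €4,609). Below is a plain-Python sketch of the two formulas on made-up prices. `actual` and `predicted` are illustrative names, chosen so they do not overwrite the notebook's `y_pred`.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up prices in euros: errors of 2000, -2000 and 3000.
actual = [10_000, 20_000, 30_000]
predicted = [12_000, 18_000, 33_000]
print(round(mae(actual, predicted), 2))   # 2333.33
print(round(rmse(actual, predicted), 2))  # 2380.48
```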

## Saving the Model

In [53]:
from joblib import dump, load

dump(regr, 'model.joblib') 

['model.joblib']

In [54]:
# To load the model for your product

# regr = load('model.joblib') 