# Car price prediction & EDA

In this notebook, we're going to analyse the Ford car dataset from Kaggle. We'll do some EDA and visualization, and predict car prices (supervised learning).

We'll try to answer the following questions:

*   Top model sales
*   Distribution of car sale prices by model
*   Fuel type, transmission type & engine size distribution
*   Sales trend over the years
*   Price distribution
*   Correlation of tax, mileage and mpg with car price
*   Car price prediction







In [171]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [172]:
# import visualization libraries
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

### Dataset: https://www.kaggle.com/datasets/adhurimquku/ford-car-price-prediction

The dataset covers cars made by Ford (the car manufacturer). Each row is one car transaction, described by the car's model and other attributes. The dataset has 17966 entries and 9 columns.

The data contains the following fields:

1. model -> Ford car model
2. year -> production year
3. price -> price of the car in $
4. transmission -> Automatic, Manual, Semi-Auto
5. mileage -> number of miles driven
6. fuelType -> Petrol, Diesel, Hybrid, Electric, Other
7. tax -> annual tax
8. mpg -> miles per gallon
9. engineSize -> engine size (in litres)



In [173]:
data=pd.read_csv("/content/drive/MyDrive/dataset/portofolio/car_prediction/ford.csv")

In [174]:
data.head()

|   | model | year | price | transmission | mileage | fuelType | tax | mpg | engineSize |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Fiesta | 2017 | 12000 | Automatic | 15944 | Petrol | 150 | 57.7 | 1.0 |
| 1 | Focus | 2018 | 14000 | Manual | 9083 | Petrol | 150 | 57.7 | 1.0 |
| 2 | Focus | 2017 | 13000 | Manual | 12456 | Petrol | 150 | 57.7 | 1.0 |
| 3 | Fiesta | 2019 | 17500 | Manual | 10460 | Petrol | 145 | 40.3 | 1.5 |
| 4 | Fiesta | 2019 | 16500 | Automatic | 1482 | Petrol | 145 | 48.7 | 1.0 |


In [175]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17966 entries, 0 to 17965
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   model         17966 non-null  object 
 1   year          17966 non-null  int64  
 2   price         17966 non-null  int64  
 3   transmission  17966 non-null  object 
 4   mileage       17966 non-null  int64  
 5   fuelType      17966 non-null  object 
 6   tax           17966 non-null  int64  
 7   mpg           17966 non-null  float64
 8   engineSize    17966 non-null  float64
dtypes: float64(2), int64(4), object(3)
memory usage: 1.2+ MB


In [176]:
data.isna().sum()

model           0
year            0
price           0
transmission    0
mileage         0
fuelType        0
tax             0
mpg             0
engineSize      0
dtype: int64

The output above shows **no missing values** in any column, so our data is **complete**. It still has a few irregularities, such as an out-of-range production year that is removed below.

# Data Wrangling

In [177]:
# count the number of cars per model
freqcars= data['model'].value_counts().sort_values(ascending=False)
freqcars

 Fiesta                   6557
 Focus                    4588
 Kuga                     2225
 EcoSport                 1143
 C-MAX                     543
 Ka+                       531
 Mondeo                    526
 B-MAX                     355
 S-MAX                     296
 Grand C-MAX               247
 Galaxy                    228
 Edge                      208
 KA                        199
 Puma                       80
 Tourneo Custom             69
 Grand Tourneo Connect      59
 Mustang                    57
 Tourneo Connect            33
 Fusion                     16
 Streetka                    2
 Ranger                      1
 Escort                      1
 Transit Tourneo             1
Focus                        1
Name: model, dtype: int64
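The counts above list the Focus twice: 4588 rows as ` Focus` and 1 row as `Focus`. The model names appear to carry a leading space, and the single row without it ends up in its own group. A minimal sketch of normalising the labels with pandas' `Series.str.strip()` so both spellings are counted together; the toy frame is made up for illustration, and on the real data the same call would be `data['model'] = data['model'].str.strip()`.

```python
import pandas as pd

# toy data: the same model spelled with and without a leading space
df = pd.DataFrame({"model": [" Focus", " Focus", "Focus", " Fiesta"]})

print(df["model"].value_counts())  # ' Focus' and 'Focus' are counted separately

# strip leading/trailing whitespace so identical models share one label
df["model"] = df["model"].str.strip()
counts = df["model"].value_counts()
print(counts)
```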

In [178]:
# count the number of cars per fuel type
freqfuel= data['fuelType'].value_counts().sort_values(ascending=False)
freqfuel

Petrol      12179
Diesel       5762
Hybrid         22
Electric        2
Other           1
Name: fuelType, dtype: int64

In [179]:
# count the number of cars per transmission type
freqtransmission= data['transmission'].value_counts().sort_values(ascending=False)
freqtransmission

Manual       15518
Automatic     1361
Semi-Auto     1087
Name: transmission, dtype: int64

In [180]:
# count the number of cars per engine size
freqenginesize= data['engineSize'].value_counts().sort_values(ascending=False)
freqenginesize

1.0    7765
1.5    3418
2.0    3311
1.2    1626
1.6     923
1.1     559
1.4     112
2.3      80
0.0      51
5.0      45
1.8      35
2.2      13
2.5      13
1.3      13
3.2       1
1.7       1
Name: engineSize, dtype: int64

In [181]:
# remove rows with a production year after 2022: such years are impossible outliers (the rest of the data ranges from 1996 to 2020)
data.drop(data[(data['year'] >2022)].index, inplace=True)
# create a series of year sales ( count )
freqyear= data['year'].value_counts()
freqyear = freqyear.to_frame()
freqyear = freqyear.reset_index()
freqyear.columns=['Year','Count']
freqyear = freqyear.sort_values(by='Year',ascending=True)

In [182]:
# create a bar chart of the number of cars sold per model
fig =px.bar(data_frame=freqcars ,x=freqcars,color=freqcars,color_continuous_scale=px.colors.sequential.Bluyl )
fig.update_traces(texttemplate='%{x}', textposition='outside', hovertemplate = '<b>%{y}</b><br>No. of cars: %{x}')
fig.update_layout(showlegend=False ,plot_bgcolor='white',coloraxis_showscale =False,
  title={'text': "Top Car Sales ",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_xaxes(title='Count')
fig.update_yaxes(title='Model types')
fig.show()

From the chart above, the **Fiesta** model has the **highest** sales with **6,557 cars**, followed by the **Focus** with **4,588** and the **Kuga** with **2,225**.

In [183]:
# create a box plot of price distribution based on model
fig =px.box(data_frame=data,x='model',y='price',color='model')
fig.update_layout(showlegend=False ,plot_bgcolor='white',coloraxis_showscale =False,
  title={'text': "Car price distribution based on model",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_yaxes(title='Price')
fig.update_xaxes(title='Model types')
fig.show()

From the box plot above, the **Mustang** has the **highest** median price, about **$33,980**, and the **Streetka** has the **lowest**, with a median price of **$1,924**.
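The per-model medians read off the box plot can also be computed directly with a `groupby`. A minimal sketch on a made-up frame (the prices are illustrative, not taken from the dataset); on the real data the same expression would be `data.groupby('model')['price'].agg(['count', 'median'])`.

```python
import pandas as pd

# made-up rows, only to show the shape of the computation
df = pd.DataFrame({
    "model": ["Mustang", "Mustang", "Streetka", "Streetka", "Fiesta", "Fiesta", "Fiesta"],
    "price": [32000, 36000, 1800, 2000, 9000, 11000, 12000],
})

# number of cars and median price per model, most expensive first
summary = (
    df.groupby("model")["price"]
    .agg(["count", "median"])
    .sort_values("median", ascending=False)
)
print(summary)
```

Showing the count next to the median is a deliberate choice: a model with very few rows (such as the Streetka with 2) can have an extreme median that says little about typical prices.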

In [184]:
# create a bar chart of the number of cars per fuel type
fig =px.bar(data_frame=freqfuel ,color=freqfuel,color_continuous_scale=px.colors.sequential.Bluyl )
fig.update_traces(texttemplate='%{y}', textposition='outside', hovertemplate = '<b>%{x}</b><br>No. of cars: %{y}')
fig.update_layout(showlegend=False ,plot_bgcolor='white',coloraxis_showscale =False,
  title={'text': "Fuel Type distribution",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_yaxes(title='Count')
fig.update_xaxes(title='Fuel types')
fig.show()

The **fuel type** is dominated by **petrol**, which is used in **12,179 cars**.

In [185]:
# create a bar chart of the transmission type distribution
fig =px.bar(data_frame=freqtransmission ,color=freqtransmission,color_continuous_scale=px.colors.sequential.Bluyl )
fig.update_traces(texttemplate='%{y}', textposition='outside', hovertemplate = '<b>%{x}</b><br>No. of cars: %{y}')
fig.update_layout(showlegend=False ,plot_bgcolor='white',coloraxis_showscale =False,
  title={'text': "Transmission type distribution",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_yaxes(title='Count')
fig.update_xaxes(title='Transmission types')
fig.show()

As for the **transmission type**, **most** cars (**15,518**) have a **manual transmission**.

In [186]:
#create a line chart to show sales trendline for all years
fig =px.line(data_frame=freqyear ,x='Year' ,y ='Count' ,markers=True ,color_discrete_sequence = ['red'])
fig.update_layout(showlegend=False ,plot_bgcolor='white',coloraxis_showscale =False,
  title={'text': "Sales trend over the year",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_xaxes(title='Year',showline=True, linewidth=2, linecolor='black', mirror=True,tickmode='linear')
fig.update_yaxes(title='Count',showline=True, linewidth=2, linecolor='black', mirror=True )
fig.show()

Between **1996 and 2012**, the number of cars sold is **small**; after that it starts to rise slowly. The biggest jump in sales is between **2016 and 2017**, when sales **peak at 4,888**. From **2017 to 2019**, sales **decrease**.

Sales in 2020 can be ignored because the dataset does not cover the whole of 2020.
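The peak and the largest jump described above can also be located programmatically. A minimal sketch with made-up yearly counts (not the real figures); on the real data the series would come from `data['year'].value_counts().sort_index()`.

```python
import pandas as pd

# hypothetical number of cars sold per year, indexed by year
sales = pd.Series({2014: 900, 2015: 1500, 2016: 2300, 2017: 4000, 2018: 3200, 2019: 2800})

peak_year = sales.idxmax()           # year with the most cars sold
change = sales.diff()                # change versus the previous year (NaN for the first year)
biggest_jump_year = change.idxmax()  # year that ends the largest increase; NaN is skipped
# True if every year after the peak sells fewer cars than the year before it
declining = (change.loc[peak_year + 1:] < 0).all()
print(peak_year, biggest_jump_year, declining)
```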

In [187]:
# create 2 plots, a histogram and a box plot, to show the price distribution
fig = make_subplots(rows=1, cols=2, subplot_titles=("Histogram plot of Price Distribution ","Box plot of Price Distribution "))

fig1 =px.histogram(data_frame=data['price'] )
fig.update_xaxes(title='Price',row=1,col=1)
fig.update_yaxes(title='Count',row=1,col=1)

fig2 =px.box(data_frame=data['price'] )
fig.update_yaxes(title='Price',row=1 ,col=2)

fig.add_trace(fig1['data'][0], row=1, col=1)
fig.add_trace(fig2['data'][0], row=1, col=2)

fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)

fig.update_layout(showlegend=False, coloraxis_showscale=False , plot_bgcolor='white')

fig.show()

**The median car price is $11,291**, with the upper fence at about $24,690.
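The upper fence follows Tukey's rule, Q3 + 1.5 × IQR. A worked example using the quartiles reported by `data['price'].describe()` (Q1 = 8,999, Q3 = 15,299): the computed limit is 24,749. Plotly presumably draws the whisker at the largest observed price that does not exceed this limit, which would explain the slightly lower 24.69k shown on the chart.

```python
# quartiles of price, taken from the data['price'].describe() output
q1 = 8999.0
q3 = 15299.0

iqr = q3 - q1                 # interquartile range
lower_fence = q1 - 1.5 * iqr  # values below this count as outliers
upper_fence = q3 + 1.5 * iqr  # values above this count as outliers
print(iqr, lower_fence, upper_fence)  # 6300.0 -451.0 24749.0
```

Because the lower fence is negative, no price falls below it, so the IQR rule only flags expensive cars as price outliers.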

In [188]:
fig= px.imshow(data.corr(numeric_only=True),text_auto=True,aspect="auto")
fig.update_layout(
    
    width=1300,
    height=600,
)
fig.show()

Looking at the correlation matrix, every numeric feature is correlated with price to some degree. The **strongest correlation** with price **is with "year"** (r ≈ 0.65): **newer cars tend to sell for higher prices.**
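To rank the features by how strongly they relate to price, sort the price column of the correlation matrix by absolute value, so strong negative correlations rank alongside strong positive ones. A minimal sketch using the coefficients from the `data.corr()` output further down; on the real data the series would be `data.corr(numeric_only=True)['price'].drop('price')`.

```python
import pandas as pd

# correlation of each feature with price, copied from the data.corr() output
corr_with_price = pd.Series({
    "year": 0.645465,
    "mileage": -0.53061,
    "tax": 0.406999,
    "mpg": -0.346557,
    "engineSize": 0.411203,
})

# rank by strength (absolute value) while keeping the sign visible
ranked = corr_with_price.sort_values(key=abs, ascending=False)
print(ranked)
```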

In [189]:
# create 3 scatter plots of price against tax, mileage and mpg
fig = make_subplots(rows=1, cols=3)

fig1 =px.scatter(data_frame=data,x='tax',y='price' )
fig.update_yaxes(title='Price',row=1 ,col=1)
fig.update_xaxes(title='Tax',row=1 ,col=1)

fig2=px.scatter(data_frame=data,x='mileage',y='price' )
fig.update_yaxes(title='Price',row=1 ,col=2)
fig.update_xaxes(title='mileage',row=1 ,col=2)

fig3 =px.scatter(data_frame=data,x='mpg',y='price' )
fig.update_yaxes(title='Price',row=1,col=3)
fig.update_xaxes(title='mpg',row=1,col=3)

fig.add_trace(fig1['data'][0], row=1, col=1)
fig.add_trace(fig2['data'][0], row=1, col=2)
fig.add_trace(fig3['data'][0], row=1, col=3)

fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)

fig.update_layout(showlegend=False, coloraxis_showscale=False , plot_bgcolor='white', 
        title={'text': "Correlation with Price",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})


fig.show()


*   **Tax and price** have a **positive correlation** (r ≈ 0.41): higher-priced cars tend to have higher tax
*   **Mileage and price** have a **negative correlation** (r ≈ -0.53): cars with more mileage tend to be cheaper
*   **mpg and price** have a **negative correlation** (r ≈ -0.35): cars with higher mpg tend to be cheaper
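A correlation coefficient r is not a percentage. Squaring it gives r², the fraction of price variance that a straight-line fit on that single feature would explain. A small worked computation with the coefficients from the `data.corr()` output:

```python
# Pearson correlations with price, copied from the data.corr() output
correlations = {"tax": 0.406999, "mileage": -0.53061, "mpg": -0.346557}

# r squared: fraction of price variance explained by a simple linear fit on one feature
r_squared = {name: round(r ** 2, 3) for name, r in correlations.items()}
print(r_squared)
```

By this measure, mileage alone accounts for a little over a quarter of the variation in price, and tax and mpg for much less.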

In [190]:
# create 2 scatter plots of price against year and engine size
fig = make_subplots(rows=1, cols=2)

fig1 =px.scatter(data_frame=data,x='year',y='price' )
fig.update_yaxes(title='Price',row=1 ,col=1)
fig.update_xaxes(title='Year',row=1 ,col=1)

fig2=px.scatter(data_frame=data,x='engineSize',y='price' )
fig.update_yaxes(title='Price',row=1 ,col=2)
fig.update_xaxes(title='engineSize',row=1 ,col=2)

fig.add_trace(fig1['data'][0], row=1, col=1)
fig.add_trace(fig2['data'][0], row=1, col=2)


fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)

fig.update_layout(showlegend=False, coloraxis_showscale=False , plot_bgcolor='white', 
        title={'text': "Correlation with Price",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})


fig.show()



*   **Year and price** have the **strongest correlation** (r ≈ 0.65): newer cars tend to be the most expensive
*   **Engine size and price** have a **positive correlation** (r ≈ 0.41): cars with bigger engines tend to cost more



In [191]:
# create 2 plots to show the engine size distribution and the price per engine size

fig = make_subplots(rows=1, cols=2)

fig1 =px.bar(data_frame=freqenginesize)
fig.update_yaxes(title='Count',row=1,col=1)
fig.update_xaxes(title='Engine Size',tickmode='linear',dtick = 0.1,row=1,col=1)

fig2 =px.box(data_frame=data,x='engineSize',y='price')
fig.update_yaxes(title='Price',row=1,col=2)
fig.update_xaxes(title='Engine Size',tickmode='linear',dtick = 0.1,row=1,col=2)


fig.add_trace(fig1['data'][0], row=1, col=1)
fig.add_trace(fig2['data'][0], row=1, col=2)

fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)

fig.update_layout(showlegend=False, coloraxis_showscale=False , plot_bgcolor='white', 
        title={'text': "Engine Size distribution",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})

fig.show()

Although engine size 5.0 has relatively few sales (45 cars), it has the highest median price (about $33k).

In [192]:
# create 2 plots: a bar chart of the number of cars sold per year, and a box plot of price per year
fig = make_subplots(rows=1, cols=2)

fig1=px.bar(data_frame=freqyear ,x='Year' ,y='Count' ,color='Year')
# style the bar trace on fig1 before it is copied into the subplot figure
fig1.update_traces(texttemplate='%{y}', textposition='outside', hovertemplate = '<b>%{x}</b><br>No. of cars: %{y}')
fig.update_yaxes(title='Count',row=1,col=1)
fig.update_xaxes(title='Year',tickmode='linear',row=1,col=1)

fig2 =px.box(data_frame=data,x='year',y='price')
fig.update_yaxes(title='Price',row=1,col=2)
fig.update_xaxes(title='Year',tickmode='linear',row=1,col=2)

fig.add_trace(fig1['data'][0], row=1, col=1)
fig.add_trace(fig2['data'][0], row=1, col=2)

fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)

fig.update_layout(showlegend=False, coloraxis_showscale=False , plot_bgcolor='white', 
        title={'text': "Price distribution based on year",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})


fig.show()

Looking again at **year vs. price**, we can conclude that **newer cars have higher prices**.

In [193]:
#let's check our data correlation
data.corr(numeric_only=True)

|   | year | price | mileage | tax | mpg | engineSize |
|---|---|---|---|---|---|---|
| year | 1.0 | 0.645465 | -0.718668 | 0.300518 | -0.021488 | -0.13917 |
| price | 0.645465 | 1.0 | -0.53061 | 0.406999 | -0.346557 | 0.411203 |
| mileage | -0.718668 | -0.53061 | 1.0 | -0.260619 | 0.120225 | 0.21504 |
| tax | 0.300518 | 0.406999 | -0.260619 | 1.0 | -0.502919 | 0.184365 |
| mpg | -0.021488 | -0.346557 | 0.120225 | -0.502919 | 1.0 | -0.260528 |
| engineSize | -0.13917 | 0.411203 | 0.21504 | 0.184365 | -0.260528 | 1.0 |


In [194]:
data['price'].describe()

count    17965.000000
mean     12279.856833
std       4741.279186
min        495.000000
25%       8999.000000
50%      11291.000000
75%      15299.000000
max      54995.000000
Name: price, dtype: float64

In [195]:
# remove outliers with the IQR rule: keep only values strictly inside
# (Q1 - 1.5*IQR, Q3 + 1.5*IQR); each feature is filtered on the already-filtered data
feature = ['price','mileage','tax','mpg']

for i in feature:
  Q1 = data[i].quantile(0.25)
  Q3 = data[i].quantile(0.75)
  IQR = Q3-Q1
  Lowerbound = Q1-(1.5*IQR)
  Upperbound = Q3+(1.5*IQR)
  data= data[(data[i]>Lowerbound) & (data[i]<Upperbound)]

In [196]:
#create dummies for our data 
dummies = pd.get_dummies(data.model)
dummies2 = pd.get_dummies(data.transmission)
dummies3 = pd.get_dummies(data.fuelType)

In [197]:
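# combine the dummies with our dataframe; axis=0 appends them as extra rows
# (so the dummy columns are empty/NaN for the original rows), while axis=1 would add them as columns
data2 = pd.concat([data,dummies,dummies2,dummies3],axis=0)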
data2.head(3)

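|   | model | year | price | transmission | mileage | fuelType | tax | mpg | engineSize | B-MAX | ... | Puma | S-MAX | Tourneo Connect | Tourneo Custom | Automatic | Manual | Semi-Auto | Diesel | Hybrid | Petrol |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Fiesta | 2017.0 | 12000.0 | Automatic | 15944.0 | Petrol | 150.0 | 57.7 | 1.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Focus | 2018.0 | 14000.0 | Manual | 9083.0 | Petrol | 150.0 | 57.7 | 1.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Focus | 2017.0 | 13000.0 | Manual | 12456.0 | Petrol | 150.0 | 57.7 | 1.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |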
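In the output above, the dummy columns are empty (NaN) for the original rows: with `axis=0`, `pd.concat` appends the dummy frames as new rows instead of new columns. After `fillna(0)`, those extra rows have `year = 0` and `price = 0`, which is consistent with `year` being scaled to about 0.998 further down (2017 / 2020 ≈ 0.9985). A minimal sketch on a made-up frame contrasting the two axes, plus `pd.get_dummies(..., columns=...)`, which encodes a column and drops the original in one step:

```python
import pandas as pd

# made-up rows, only to show how concat aligns frames
df = pd.DataFrame({"model": ["Fiesta", "Focus", "Kuga"], "price": [12000, 14000, 20000]})
dummies = pd.get_dummies(df["model"])

stacked = pd.concat([df, dummies], axis=0)  # dummies appended as extra rows
joined = pd.concat([df, dummies], axis=1)   # dummies appended as extra columns

# one-step alternative: encode 'model' and drop the original column
encoded = pd.get_dummies(df, columns=["model"])

print(stacked.shape, joined.shape)
print(list(encoded.columns))
```

`joined` keeps one row per car with its 0/1 indicators, which is the layout a regression model expects; `stacked` doubles the row count and leaves NaN wherever the two frames do not share columns.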


In [198]:
#fill missing value/NaN with 0
data2 =data2.fillna(0)

In [199]:
# drop the categorical columns, which the regression models cannot use directly
data2 = data2.drop(['model','transmission','fuelType'],axis=1)
data2.head()

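|   | year | price | mileage | tax | mpg | engineSize | B-MAX | C-MAX | EcoSport | Edge | ... | Puma | S-MAX | Tourneo Connect | Tourneo Custom | Automatic | Manual | Semi-Auto | Diesel | Hybrid | Petrol |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017.0 | 12000.0 | 15944.0 | 150.0 | 57.7 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2018.0 | 14000.0 | 9083.0 | 150.0 | 57.7 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2017.0 | 13000.0 | 12456.0 | 150.0 | 57.7 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2019.0 | 17500.0 | 10460.0 | 145.0 | 40.3 | 1.5 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2019.0 | 16500.0 | 1482.0 | 145.0 | 48.7 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |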


In [200]:
#replace our dataframe with new data & reset index
data = data2
data = data.reset_index(drop=True)

In [201]:
# scale our data with MinMaxScaler
from sklearn.preprocessing import MinMaxScaler
MMS = MinMaxScaler()
features = ['year','price','mileage','tax','mpg','engineSize']
data[features] = MMS.fit_transform(data[features])
data.head()

|   | year | price | mileage | tax | mpg | engineSize | B-MAX | C-MAX | EcoSport | Edge | ... | Puma | S-MAX | Tourneo Connect | Tourneo Custom | Automatic | Manual | Semi-Auto | Diesel | Hybrid | Petrol |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.998515 | 0.486027 | 0.253079 | 0.909091 | 0.776581 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.99901 | 0.567031 | 0.144175 | 0.909091 | 0.776581 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.998515 | 0.526529 | 0.197714 | 0.909091 | 0.776581 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.999505 | 0.708789 | 0.166032 | 0.878788 | 0.542396 | 0.75 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.999505 | 0.668287 | 0.023524 | 0.878788 | 0.655451 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


# Training Model

In [202]:
# split our data into features (X) and target (y)
X= data.drop(['price'],axis=1)
y= data['price']

In [203]:
# split our data into training and testing sets; test_size=0.8 keeps 80% of the rows for testing and 20% for training
from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(X,y ,test_size=0.8,random_state=20)

In [204]:
# import the regression models
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

In [205]:
# fit each model on the training data and score it on that same training data
from sklearn.metrics import mean_squared_error,mean_absolute_error
def model_train(x_train,y_train):
  all_models = [LinearRegression(),RandomForestRegressor(),DecisionTreeRegressor()]
  scores = []
  for model in all_models:
    model.fit(x_train,y_train)
    y_predicted = model.predict(x_train)
    mse = mean_squared_error(y_train,y_predicted)
    mae = mean_absolute_error(y_train,y_predicted)
    scores.append({
        'model': model,
        'best_score':model.score(x_train,y_train),  # R² on the training data
        'mean_squared_error':mse,
        'mean_absolute_error':mae
    })
  return pd.DataFrame(scores,columns=['model','best_score','mean_squared_error','mean_absolute_error'])

# see our model result on training data
model_train(x_train,y_train)

|   | model | best_score | mean_squared_error | mean_absolute_error |
|---|---|---|---|---|
| 0 | LinearRegression() | 0.96251 | 0.002267774 | 0.018227 |
| 1 | (DecisionTreeRegressor(max_features='auto', ra... | 0.997869 | 0.0001289187 | 0.004085 |
| 2 | DecisionTreeRegressor() | 0.999984 | 9.875286e-07 | 2.5e-05 |


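On the training data, "DecisionTreeRegressor" scores highest (R² ≈ 0.99998, with near-zero error). A near-perfect training score is expected from an unpruned decision tree, which can memorize the training set, so on its own it says little about performance on unseen cars.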

In [206]:
# fit each model on the testing data and score it on that same testing data
# (the models are refitted here, so these are in-sample scores, not held-out performance)
from sklearn.metrics import mean_squared_error,mean_absolute_error
def model_test(x_test,y_test):
  all_models = [LinearRegression(),RandomForestRegressor(),DecisionTreeRegressor()]
  scores = []
  for model in all_models:
    model.fit(x_test,y_test)
    y_predicted = model.predict(x_test)
    mse = mean_squared_error(y_test,y_predicted)
    mae = mean_absolute_error(y_test,y_predicted)
    scores.append({
        'model': model,
        'best_score':model.score(x_test,y_test),  # R² on the testing data
        'mean_squared_error':mse,
        'mean_absolute_error':mae
        })
  return pd.DataFrame(scores,columns=['model','best_score','mean_squared_error','mean_absolute_error'])

# see our model result on testing data
model_test(x_test,y_test)

|   | model | best_score | mean_squared_error | mean_absolute_error |
|---|---|---|---|---|
| 0 | LinearRegression() | 0.963903 | 0.00218 | 0.017762 |
| 1 | (DecisionTreeRegressor(max_features='auto', ra... | 0.997982 | 0.000122 | 0.00396 |
| 2 | DecisionTreeRegressor() | 0.999911 | 5e-06 | 0.000129 |


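On the testing data, "DecisionTreeRegressor" again scores highest (R² ≈ 0.9999). Note that `model_test` fits each model on the test set before scoring it on that same set, so these are also in-sample scores rather than a measure of performance on unseen data.

**Decision Tree Regressor gives the best score, with R² above 0.99.**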
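For comparison, a minimal sketch of a held-out evaluation, where each model is fitted on the training split only and scored on a test split it has never seen. It uses synthetic data from scikit-learn's `make_regression` instead of the Ford data, and the helper name `evaluate_models` is hypothetical; on the notebook's data it would be called as `evaluate_models(x_train, x_test, y_train, y_test)`.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def evaluate_models(x_train, x_test, y_train, y_test, random_state=0):
    """Fit each model on the training split and report train and test scores."""
    models = [
        LinearRegression(),
        RandomForestRegressor(random_state=random_state),
        DecisionTreeRegressor(random_state=random_state),
    ]
    rows = []
    for model in models:
        model.fit(x_train, y_train)        # training split only
        test_pred = model.predict(x_test)  # predictions on unseen rows
        rows.append({
            "model": type(model).__name__,
            "train_r2": r2_score(y_train, model.predict(x_train)),
            "test_r2": r2_score(y_test, test_pred),
            "test_mae": mean_absolute_error(y_test, test_pred),
        })
    return pd.DataFrame(rows)


# synthetic regression data stands in for the Ford features and prices
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)

results = evaluate_models(x_train, x_test, y_train, y_test)
print(results)
```

Here `test_size=0.2` keeps 80% of the rows for fitting. The gap between `train_r2` and `test_r2` for the decision tree shows how much of its training score comes from memorizing the training rows.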

In [207]:
# train the best model (Decision Tree Regressor) on the training set, then predict the held-out test set
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
model.fit(x_train,y_train)
y_predicted = model.predict(x_test)

# Result

In [208]:
# plot true vs. predicted (min-max scaled) prices: the closer the points lie to the diagonal y = x, the better the model

fig =px.scatter(x = y_test,y = y_predicted) 
fig.update_layout(showlegend=False ,plot_bgcolor='white',coloraxis_showscale =False,
  title={'text': "Truth vs Predicted",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.update_xaxes(title='Truth',showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(title='Predicted',showline=True, linewidth=2, linecolor='black', mirror=True )
fig.show()  

# Summary

There are several things I can conclude about our dataset:

1.  Fiesta is the best-selling model
2.  Mustang has the highest median selling price
3.  Sales decrease from 2017 to 2019
4.  "year" has the strongest correlation with the selling price
5.  Mileage is also strongly (negatively) correlated with the selling price


Business strategy:

*  With my "Decision Tree Regressor" model, sellers could estimate a selling price for a used car.
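Because `price` was min-max scaled together with the other features, the model's predictions are values between 0 and 1, not prices. A minimal sketch of mapping them back to the original scale, assuming a `MinMaxScaler` fitted with its default feature range (0, 1) on the same column order as `features`; it uses the fitted attributes `data_min_` and `data_range_`, and the toy rows are made up. On the notebook's objects the same step would read `y_predicted * MMS.data_range_[1] + MMS.data_min_[1]`, since `price` is the second entry of `features`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

features = ["year", "price", "mileage", "tax", "mpg", "engineSize"]
# made-up rows in the same column order as `features`
raw = np.array([
    [2017, 12000, 15944, 150, 57.7, 1.0],
    [2019, 17500, 10460, 145, 40.3, 1.5],
    [2015, 6000, 60000, 30, 65.7, 1.2],
])

mms = MinMaxScaler()  # default feature_range=(0, 1)
scaled = mms.fit_transform(raw)

# stand-in for model predictions: the scaled price column
i = features.index("price")
scaled_price = scaled[:, i]

# undo the scaling for price only: x = x_scaled * (max - min) + min
price = scaled_price * mms.data_range_[i] + mms.data_min_[i]
print(price)
```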


