# Regression Miniproject
In this tutorial we will build a regression model and connect it to a Flask front end. We will also see how to host the model for free so that anyone can use it.

## Car Price Prediction
We will predict the price of a car from the following features (independent variables):<br>
1) name<br>
2) year<br>
3) km_driven<br>
4) fuel<br>
5) seller_type<br>
6) transmission<br>
7) owner<br>

The model's output is the car's selling price (`selling_price`).<br>
Let's start the project by importing the libraries and the dataset.
## Importing Libraries

```python
import numpy as np
import pandas as pd
```


```python
# Load the dataset (read from the current working directory)
dataset = pd.read_csv('car_data.csv')
dataset.head()
```

| | name | year | selling_price | km_driven | fuel | seller_type | transmission | owner |
|---|---|---|---|---|---|---|---|---|
| 0 | Maruti 800 AC | 2007 | 60000 | 70000 | Petrol | Individual | Manual | First Owner |
| 1 | Maruti Wagon R LXI Minor | 2007 | 135000 | 50000 | Petrol | Individual | Manual | First Owner |
| 2 | Hyundai Verna 1.6 SX | 2012 | 600000 | 100000 | Diesel | Individual | Manual | First Owner |
| 3 | Datsun RediGO T Option | 2017 | 250000 | 46000 | Petrol | Individual | Manual | First Owner |
| 4 | Honda Amaze VX i-DTEC | 2014 | 450000 | 141000 | Diesel | Individual | Manual | Second Owner |


```python
# Features: every column except selling_price (column index 2)
X = dataset.iloc[:, [0, 1, 3, 4, 5, 6, 7]]
# Target: selling_price
y = dataset.iloc[:, 2]
```
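Selecting columns by position only works as long as the CSV keeps exactly this column order. A minimal alternative sketch selects by name instead; the two-row frame below simply reuses the first rows of `dataset.head()` for illustration.

```python
import pandas as pd

# Two rows copied from dataset.head() above, used only to illustrate the selection
dataset = pd.DataFrame({
    "name": ["Maruti 800 AC", "Maruti Wagon R LXI Minor"],
    "year": [2007, 2007],
    "selling_price": [60000, 135000],
    "km_driven": [70000, 50000],
    "fuel": ["Petrol", "Petrol"],
    "seller_type": ["Individual", "Individual"],
    "transmission": ["Manual", "Manual"],
    "owner": ["First Owner", "First Owner"],
})

# Target by name; the features are all remaining columns, in their original order
y = dataset["selling_price"]
X = dataset.drop(columns=["selling_price"])

print(list(X.columns))
```

The resulting `X` has the same seven columns as the `iloc` version, but it no longer breaks if a column is added or moved in the CSV.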

```python
X.head()
```

| | name | year | km_driven | fuel | seller_type | transmission | owner |
|---|---|---|---|---|---|---|---|
| 0 | Maruti 800 AC | 2007 | 70000 | Petrol | Individual | Manual | First Owner |
| 1 | Maruti Wagon R LXI Minor | 2007 | 50000 | Petrol | Individual | Manual | First Owner |
| 2 | Hyundai Verna 1.6 SX | 2012 | 100000 | Diesel | Individual | Manual | First Owner |
| 3 | Datsun RediGO T Option | 2017 | 46000 | Petrol | Individual | Manual | First Owner |
| 4 | Honda Amaze VX i-DTEC | 2014 | 141000 | Diesel | Individual | Manual | Second Owner |


```python
# Keep only the object (text) columns in a new DataFrame
df_car_mod = X.select_dtypes(include=['object'])
# View the first few rows
df_car_mod.head()
```

| | name | fuel | seller_type | transmission | owner |
|---|---|---|---|---|---|
| 0 | Maruti 800 AC | Petrol | Individual | Manual | First Owner |
| 1 | Maruti Wagon R LXI Minor | Petrol | Individual | Manual | First Owner |
| 2 | Hyundai Verna 1.6 SX | Diesel | Individual | Manual | First Owner |
| 3 | Datsun RediGO T Option | Petrol | Individual | Manual | First Owner |
| 4 | Honda Amaze VX i-DTEC | Diesel | Individual | Manual | Second Owner |


```python
# Count how often each seller_type occurs
df_car_mod['seller_type'].value_counts()
```

```
Individual          3244
Dealer               994
Trustmark Dealer     102
Name: seller_type, dtype: int64
```
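The cell above counts the categories of `seller_type`; it does not look for missing values. Checking for nulls is a one-liner with `DataFrame.isnull().sum()`. The toy frame below is hypothetical, with one gap planted in each column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column
df = pd.DataFrame({
    "fuel": ["Petrol", None, "Diesel"],
    "km_driven": [70000, 50000, np.nan],
})

# Number of missing values per column
null_counts = df.isnull().sum()
print(null_counts)

# value_counts() skips missing values unless dropna=False is passed
print(df["fuel"].value_counts(dropna=False))
```

On the real data, `dataset.isnull().sum()` gives the same per-column summary.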

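```python
# One-hot encode fuel, seller_type and transmission; drop_first=True drops the first
# category of each column to avoid redundant (perfectly collinear) dummy columns
df_car_mod = pd.get_dummies(df_car_mod, columns=['fuel', 'seller_type', 'transmission'], drop_first=True)
```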

```python
df_car_mod.head(15)
```

| | name | owner | fuel_Diesel | fuel_Electric | fuel_LPG | fuel_Petrol | seller_type_Individual | seller_type_Trustmark Dealer | transmission_Manual |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Maruti 800 AC | First Owner | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 1 | Maruti Wagon R LXI Minor | First Owner | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 2 | Hyundai Verna 1.6 SX | First Owner | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 3 | Datsun RediGO T Option | First Owner | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 4 | Honda Amaze VX i-DTEC | Second Owner | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 5 | Maruti Alto LX BSIII | First Owner | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 6 | Hyundai Xcent 1.2 Kappa S | First Owner | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 7 | Tata Indigo Grand Petrol | Second Owner | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 8 | Hyundai Creta 1.6 VTVT S | First Owner | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 9 | Maruti Celerio Green VXI | First Owner | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
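With `drop_first=True`, `get_dummies` drops the alphabetically first category of each column, and that category becomes the all-zeros baseline. This is why row 9 above has 0 in every `fuel_` column: its fuel is the dropped category (presumably CNG, assuming the dataset's fuel values are CNG, Diesel, Electric, LPG and Petrol, as the remaining columns suggest). A small sketch of the behaviour; `dtype=int` is passed because pandas 2.0 and later return booleans rather than the `uint8` shown in this notebook.

```python
import pandas as pd

fuel = pd.DataFrame({"fuel": ["Petrol", "Diesel", "CNG", "Petrol"]})

# Categories are sorted (CNG, Diesel, Petrol); drop_first removes CNG,
# so a CNG car is encoded as all zeros
encoded = pd.get_dummies(fuel, columns=["fuel"], drop_first=True, dtype=int)
print(encoded)
```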


```python
df_car_mod.dtypes
```

```
name                            object
owner                           object
fuel_Diesel                      uint8
fuel_Electric                    uint8
fuel_LPG                         uint8
fuel_Petrol                      uint8
seller_type_Individual           uint8
seller_type_Trustmark Dealer     uint8
transmission_Manual              uint8
dtype: object
```

```python
# Create a dictionary to find and replace values
dic_to_replace = {"owner": {"First Owner": 1, "Second Owner": 2, "Third Owner": 3,
                            "Fourth & Above Owner": 4, "Test Drive Car": 5}}
df_car_mod.replace(dic_to_replace, inplace=True)
# View the first few rows
df_car_mod['owner'].head()
```

```
0    1
1    1
2    1
3    1
4    2
Name: owner, dtype: int64
```
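`replace` leaves any string it does not recognise untouched, so an unexpected owner label would survive as text and only cause an error later at training time. A sketch of the same ordinal mapping with `Series.map`, which turns unknown labels into NaN so they can be caught immediately; the label `"Fifth Owner"` is hypothetical and only there to trigger the check:

```python
import pandas as pd

# Same ordinal mapping as in the notebook
OWNER_ORDER = {
    "First Owner": 1,
    "Second Owner": 2,
    "Third Owner": 3,
    "Fourth & Above Owner": 4,
    "Test Drive Car": 5,
}

owner = pd.Series(["First Owner", "Second Owner", "Test Drive Car", "Fifth Owner"])
encoded = owner.map(OWNER_ORDER)

# With map(), labels missing from the dictionary become NaN; list them explicitly
unmapped = owner[encoded.isna()].unique().tolist()
print(encoded.tolist())
print("Unmapped labels:", unmapped)
```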

```python
# Encoding the name column (one indicator column per car model) using LabelBinarizer
from sklearn.preprocessing import LabelBinarizer
labelbinarizer = LabelBinarizer()
make_encoded_results = labelbinarizer.fit_transform(df_car_mod['name'])
```

```python
labelbinarizer.classes_
```

```
array(['Ambassador CLASSIC 1500 DSL AC', 'Ambassador Classic 2000 Dsz',
       'Ambassador Grand 1800 ISZ MPFI PW CL', ...,
       'Volvo XC 90 D5 Inscription BSIV', 'Volvo XC60 D3 Kinetic',
       'Volvo XC60 D5 Inscription'], dtype='<U58')
```

```python
# Converting the NumPy array into a pandas DataFrame, one column per car name
df_make_encoded = pd.DataFrame(make_encoded_results, columns=labelbinarizer.classes_)
# Viewing a random sample of rows
df_make_encoded.sample(10)
```

| | Ambassador CLASSIC 1500 DSL AC | Ambassador Classic 2000 Dsz | Ambassador Grand 1800 ISZ MPFI PW CL | Audi A4 1.8 TFSI | Audi A4 2.0 TDI | Audi A4 2.0 TDI 177 Bhp Premium Plus | Audi A4 3.0 TDI Quattro | Audi A4 30 TFSI Technology | Audi A4 35 TDI Premium | Audi A4 35 TDI Premium Plus | ... | Volkswagen Vento Diesel Trendline | Volkswagen Vento IPL II Diesel Trendline | Volkswagen Vento Magnific 1.6 Highline | Volkswagen Vento New Diesel Highline | Volkswagen Vento Petrol Highline | Volkswagen Vento Petrol Highline AT | Volvo V40 D3 R Design | Volvo XC 90 D5 Inscription BSIV | Volvo XC60 D3 Kinetic | Volvo XC60 D5 Inscription |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2562 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2666 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2273 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 482 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1212 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3013 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3022 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 566 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1223 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1458 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
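`LabelBinarizer` is designed for target labels rather than input features, and with exactly two classes it returns a single column instead of one per class. For a feature such as `name`, calling `pd.get_dummies` on the column gives the same one-column-per-value layout, with column names taken from the values (scikit-learn's `OneHotEncoder` is the feature-oriented option on the scikit-learn side). A minimal sketch on three hypothetical rows:

```python
import pandas as pd

names = pd.Series(["Maruti 800 AC", "Hyundai Verna 1.6 SX", "Maruti 800 AC"])

# One indicator column per distinct name, columns sorted alphabetically
name_dummies = pd.get_dummies(names, dtype=int)
print(name_dummies)
```

Unlike `LabelBinarizer`, this still yields two columns when only two distinct names are present.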


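```python
# Pieces of the final feature matrix: name indicators, year, km_driven and the other encoded columns
dfs = [df_make_encoded, X['year'], X['km_driven'], df_car_mod]
```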

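```python
# Drop the raw text column. Because inplace=True modifies the same object that is
# already stored in dfs, the name column will not appear in the concatenation below.
df_car_mod.drop(['name'], axis=1, inplace=True)
```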

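```python
df_car_mod.head()
```

| | owner | fuel_Diesel | fuel_Electric | fuel_LPG | fuel_Petrol | seller_type_Individual | seller_type_Trustmark Dealer | transmission_Manual |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 2 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 3 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 4 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |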
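The `name` column disappears from `res` even though `dfs` was built before the drop. That works only because `inplace=True` mutates the very object stored in the list. A small sketch of the effect, using hypothetical frames:

```python
import pandas as pd

features = pd.DataFrame({"name": ["a", "b"], "owner": [1, 2]})
years = pd.DataFrame({"year": [2007, 2012]})

frames = [years, features]                      # the list stores a reference to features, not a copy
features.drop(["name"], axis=1, inplace=True)   # mutates that same object

res = pd.concat(frames, axis=1)
print(list(res.columns))                        # name is gone from the result
```

Building the list after the drop, or putting `features.drop(columns=["name"])` directly into the list, gives the same result without depending on cell order.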


```python
# Join all pieces side by side (rows are aligned on the index)
res = pd.concat(dfs, axis=1)
```

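```python
res.head(15)
```

| | Ambassador CLASSIC 1500 DSL AC | Ambassador Classic 2000 Dsz | Ambassador Grand 1800 ISZ MPFI PW CL | Audi A4 1.8 TFSI | Audi A4 2.0 TDI | Audi A4 2.0 TDI 177 Bhp Premium Plus | Audi A4 3.0 TDI Quattro | Audi A4 30 TFSI Technology | Audi A4 35 TDI Premium | Audi A4 35 TDI Premium Plus | ... | year | km_driven | owner | fuel_Diesel | fuel_Electric | fuel_LPG | fuel_Petrol | seller_type_Individual | seller_type_Trustmark Dealer | transmission_Manual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2007 | 70000 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2007 | 50000 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2012 | 100000 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2017 | 46000 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2014 | 141000 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2007 | 125000 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2016 | 25000 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2014 | 60000 | 2 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2015 | 25000 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2017 | 78000 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |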
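`pd.concat(..., axis=1)` aligns rows by index label, not by position. It lines up here because `df_make_encoded` was created with a fresh 0..n-1 index that matches the dataset's default index. If the dataset were filtered first (for example by dropping rows), the labels would no longer match and the result would gain extra rows full of NaN. A hypothetical illustration:

```python
import pandas as pd

# Fresh RangeIndex 0..2, like df_make_encoded
encoded = pd.DataFrame({"a": [1, 2, 3]})
# Same number of rows but different labels, e.g. left over after filtering
kept = pd.DataFrame({"year": [2007, 2012, 2017]}, index=[5, 6, 7])

misaligned = pd.concat([encoded, kept], axis=1)                     # union of labels 0-2 and 5-7
aligned = pd.concat([encoded, kept.reset_index(drop=True)], axis=1)  # positions now match

print(misaligned.shape)  # (6, 2): six index labels, half the cells NaN
print(aligned.shape)     # (3, 2)
```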


```python
# Splitting the dataset into the training set and test set (80/20)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(res.values, y.values, test_size=0.2, random_state=0)

# Feature scaling is skipped: tree-based models such as random forests split on
# thresholds, so their predictions do not depend on the scale of the features.
print(X_train[1:15, :])
```

```
[[0 0 0 ... 1 0 1]
 [0 0 0 ... 1 0 1]
 [0 0 0 ... 0 1 1]
 ...
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 1 0 1]
 [0 0 0 ... 0 0 1]]
```
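As a quick check on the split sizes: assuming `seller_type` has no missing values, the counts shown earlier add up to 3244 + 994 + 102 = 4340 rows. For a fractional `test_size`, scikit-learn rounds the number of test samples up and gives the remaining rows to the training set:

```python
import math

n_samples = 3244 + 994 + 102   # seller_type counts from value_counts() above
test_size = 0.2

n_test = math.ceil(test_size * n_samples)   # the test size is rounded up
n_train = n_samples - n_test                # the remaining rows go to training

print(n_samples, n_train, n_test)  # 4340 3472 868
```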


```python
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=300, random_state=0)
regressor.fit(X_train, y_train)
```

```
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=300,
                      n_jobs=None, oob_score=False, random_state=0, verbose=0,
                      warm_start=False)
```

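```python
# score() returns the coefficient of determination R^2 on the test set,
# not a classification accuracy; it is printed here as a percentage
r2 = regressor.score(X_test, y_test)
print(r2 * 100, '%')
```

```
81.14147576233616 %
```

An R² of about 0.81 means the model explains roughly 81% of the variance in selling price on the held-out cars.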
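For a regressor, `score` is the coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. A value of 1 is a perfect fit and 0 is no better than always predicting the mean. A small NumPy sketch of the formula (scikit-learn also provides it as `sklearn.metrics.r2_score`); the four values are made up:

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up values: SS_res = 1.5 and SS_tot = 29.1875, so R^2 is about 0.9486
print(r2_manual([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))
```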


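```python
# One new car, described in the same order as the original feature columns
new_data = ["Maruti 800 AC", 2007, 70000, "Petrol", "Individual", "Manual", "First Owner"]
# A flat list becomes a single column; its mixed values give it object dtype
new_data = pd.DataFrame(new_data)
# Because that single column is object dtype, select_dtypes keeps everything, numbers included
new_data_mod = new_data.select_dtypes(include=['object'])
```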

```python
# Transpose the single column into a single row
new_data_mod = new_data_mod.T
print(new_data_mod)
```

```
               0     1      2       3           4       5            6
0  Maruti 800 AC  2007  70000  Petrol  Individual  Manual  First Owner
```


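```python
# With a single row, each of columns 3, 4 and 5 has only one category, so
# drop_first=True removes every dummy column and nothing replaces them
new_data_mod_1 = pd.get_dummies(new_data_mod, columns=[3, 4, 5], drop_first=True)
```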

```python
new_data_mod_1
```

| | 0 | 1 | 2 | 6 |
|---|---|---|---|---|
| 0 | Maruti 800 AC | 2007 | 70000 | First Owner |
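The result above cannot be passed to `regressor.predict` yet: the prediction input needs the same columns, in the same order, as `res`. A minimal sketch of one way to get there, assuming the training column names are available (in the notebook they would be `res.columns`). `encode_new_row` and the shortened `train_columns` list are hypothetical stand-ins. Dummies are created without `drop_first`, and `reindex` then removes columns the model never saw (such as the baseline category) and fills every other missing column with 0:

```python
import pandas as pd

# Hypothetical, shortened stand-in for res.columns (the real list has one
# column per car name plus all fuel/seller_type/transmission dummies)
train_columns = [
    "Hyundai Verna 1.6 SX", "Maruti 800 AC",
    "year", "km_driven", "owner",
    "fuel_Diesel", "fuel_Petrol",
    "seller_type_Individual", "transmission_Manual",
]

OWNER_ORDER = {"First Owner": 1, "Second Owner": 2, "Third Owner": 3,
               "Fourth & Above Owner": 4, "Test Drive Car": 5}

def encode_new_row(record, columns):
    """Encode one raw car record into the training column layout (hypothetical helper)."""
    row = pd.DataFrame([record])                              # one row with named columns
    row["owner"] = row["owner"].map(OWNER_ORDER)              # same ordinal encoding as training
    name_dummies = pd.get_dummies(row["name"], dtype=int)     # column named after the car
    cat_dummies = pd.get_dummies(row[["fuel", "seller_type", "transmission"]], dtype=int)
    encoded = pd.concat([name_dummies, row[["year", "km_driven", "owner"]], cat_dummies], axis=1)
    # Drop columns unknown to the model and add the missing ones as 0
    return encoded.reindex(columns=columns, fill_value=0)

new_car = {"name": "Maruti 800 AC", "year": 2007, "km_driven": 70000, "fuel": "Petrol",
           "seller_type": "Individual", "transmission": "Manual", "owner": "First Owner"}
encoded = encode_new_row(new_car, train_columns)
print(encoded)
```

In the notebook, `regressor.predict(encoded.values)` on such a row would return the predicted price. A car name that never appeared in training simply ends up as all zeros in the name columns.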
