<a href="https://colab.research.google.com/github/kiran9615/Linear-Regression-Model-Car-Dekho-/blob/main/Linear_Regression_(Car_Dekho).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Project : Prediction using Linear Regression**

**Objective: To predict the most probable car prices using a basic linear regression model**

In [1]:
#importing all the required libraries for analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math
#importing all the required libraries for predictive modeling
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error,mean_absolute_error
from sklearn.metrics import r2_score
#to ignore unnecessary warnings
import warnings
warnings.filterwarnings('ignore')

In [2]:
#mounting drive in colab
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
#reading car details from car dekho csv file and storing it in cardekho_cardetails_data dataframe
file_path = '/content/drive/MyDrive/Datascience/Datasets/'
cardekho_cardetails_data = pd.read_csv(file_path + 'CAR DETAILS FROM CAR DEKHO.csv')

##**Data Preprocessing**

In [4]:
#checking for the dimension of cardekho_cardetails_data
cardekho_cardetails_data.shape   

(4340, 8)

*So, the data has 4340 observations and 8 features*

In [5]:
#1st five observations of cardekho_cardetails_data
cardekho_cardetails_data.head()   

,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner
0,Maruti 800 AC,2007,60000,70000,Petrol,Individual,Manual,First Owner
1,Maruti Wagon R LXI Minor,2007,135000,50000,Petrol,Individual,Manual,First Owner
2,Hyundai Verna 1.6 SX,2012,600000,100000,Diesel,Individual,Manual,First Owner
3,Datsun RediGO T Option,2017,250000,46000,Petrol,Individual,Manual,First Owner
4,Honda Amaze VX i-DTEC,2014,450000,141000,Diesel,Individual,Manual,Second Owner


Here, we can see that the **name** column contains the brand followed by the model name. So, we can split this column into two separate columns: **brand** and **model**.

In [6]:
#checking for duplicate rows
print('Number of duplicate observations in the data : ',cardekho_cardetails_data.duplicated().sum()) 

Number of duplicate observations in the data :  763


In [7]:
#dropping the duplicated rows
cardekho_cardetails_data.drop_duplicates(inplace = True)   

In [8]:
#checking for the dimension of cardekho_cardetails_data after dropping duplicated observations
cardekho_cardetails_data.shape   

(3577, 8)
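
*The row counts above are consistent: 4340 rows minus 763 duplicates leaves 3577. As a minimal sketch of why, the toy frame below (hypothetical values, not rows from the Car Dekho file) shows that `duplicated()` flags only the repeat occurrences of a row, and `drop_duplicates()` removes exactly those.*

```python
import pandas as pd

# Toy frame with hypothetical values, used only for illustration
toy = pd.DataFrame({
    "name": ["Maruti 800 AC", "Maruti 800 AC", "Honda Amaze VX", "Maruti 800 AC"],
    "year": [2007, 2007, 2014, 2008],
})

# duplicated() marks a row True only when an identical row appeared earlier,
# so the first occurrence of each row is not counted
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of every distinct row
deduplicated = toy.drop_duplicates()

print('Duplicates :', n_duplicates)
print('Shape after dropping :', deduplicated.shape)
```

*Here the last row differs in `year`, so it is not treated as a duplicate even though the name repeats.*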

In [9]:
#defining a function splitnme_getbrand to get the brand name from the 'name' column
def splitnme_getbrand(x):
  y=x.split(' ')   #splitting the string on ' ' to get a list of words
  return y[0]   #returning the first word of the list, which is the brand name

In [10]:
#creating a new feature 'brand' using above defined function
cardekho_cardetails_data['brand'] = cardekho_cardetails_data['name'].apply(splitnme_getbrand)

In [11]:
#defining a function splitnme_getmodel to get the model name from the 'name' column
def splitnme_getmodel(x):
  y=x.split(' ')   #splitting the string on ' ' to get a list of words
  return y[1]   #returning only the second word as the model name, so 'Wagon R' becomes 'Wagon'

In [12]:
#creating a new feature 'model' using above defined function
cardekho_cardetails_data['model'] = cardekho_cardetails_data['name'].apply(splitnme_getmodel)

In [13]:
cardekho_cardetails_data.head()   #checking for 1st five rows after creating new features

,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner,brand,model
0,Maruti 800 AC,2007,60000,70000,Petrol,Individual,Manual,First Owner,Maruti,800
1,Maruti Wagon R LXI Minor,2007,135000,50000,Petrol,Individual,Manual,First Owner,Maruti,Wagon
2,Hyundai Verna 1.6 SX,2012,600000,100000,Diesel,Individual,Manual,First Owner,Hyundai,Verna
3,Datsun RediGO T Option,2017,250000,46000,Petrol,Individual,Manual,First Owner,Datsun,RediGO
4,Honda Amaze VX i-DTEC,2014,450000,141000,Diesel,Individual,Manual,Second Owner,Honda,Amaze
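
*The two helper functions above can also be written with pandas string methods. This is only an alternative sketch, not the approach used in this notebook; the names `names`, `parts`, `brand` and `model` are illustrative, and the three sample strings are copied from the output above.*

```python
import pandas as pd

# Three names taken from the head() output shown above
names = pd.Series(["Maruti 800 AC", "Maruti Wagon R LXI Minor", "Hyundai Verna 1.6 SX"])

# Split each name on whitespace once and reuse the resulting lists
parts = names.str.split()

brand = parts.str[0]   # first word of each name
model = parts.str[1]   # second word; NaN if a name has only one word

print(pd.DataFrame({"brand": brand, "model": model}))
```

*As with the functions above, only the second word is kept as the model, so a multi-word model such as "Wagon R" is shortened to "Wagon".*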


In [14]:
for col in ['name','brand','model']:   #iterating over list 
   print(f'Number of unique values of {col} is : ',cardekho_cardetails_data[col].nunique())   #no. of unique values for features of the list

Number of unique values of name is :  1491
Number of unique values of brand is :  29
Number of unique values of model is :  185


*After removing duplicates we have 3577 observations but 1491 unique values of the **name** feature, which is too little data to learn a pattern for each name. The **brand** and **model** features are derived from **name** and have far fewer unique values (29 and 185), so we can use them in place of **name**. Let's drop the 'name' feature.*
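
*One rough way to see this is to look at how many rows are available, on average, for each distinct value of a feature. The sketch below uses only the counts printed above; the variable names are illustrative, and the average is a simple heuristic rather than a formal test.*

```python
# Counts taken from the outputs above
n_rows = 3577
unique_counts = {"name": 1491, "brand": 29, "model": 185}

# Average number of observations available per distinct value of each feature
rows_per_level = {col: n_rows / k for col, k in unique_counts.items()}

for col, avg in rows_per_level.items():
    print(f'{col} : about {avg:.1f} rows per distinct value')
```

*The **name** feature averages fewer than three rows per value, while **brand** and **model** have many more rows per value to learn from.*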

In [15]:
cardekho_cardetails_data.drop(columns='name',inplace=True)   #dropping feature 'name'

In [16]:
cardekho_cardetails_data.info()   #to get basic information of data

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3577 entries, 0 to 4339
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   year           3577 non-null   int64 
 1   selling_price  3577 non-null   int64 
 2   km_driven      3577 non-null   int64 
 3   fuel           3577 non-null   object
 4   seller_type    3577 non-null   object
 5   transmission   3577 non-null   object
 6   owner          3577 non-null   object
 7   brand          3577 non-null   object
 8   model          3577 non-null   object
dtypes: int64(3), object(6)
memory usage: 279.5+ KB


*The non-null count equals the number of entries (3577) for every column, so there are no null values in the data.*
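
*The same check can be made directly by counting missing values per column. A minimal sketch on a toy frame with hypothetical values (on the real data one would call the same methods on `cardekho_cardetails_data`):*

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values, including two deliberate gaps
toy = pd.DataFrame({
    "year": [2007, 2012, np.nan],
    "fuel": ["Petrol", None, "Diesel"],
})

# isna() flags missing cells; sum() counts them column by column
missing_per_column = toy.isna().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print('Total missing values :', total_missing)
```

*A total of zero on the real data would agree with the `info()` output above.*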

###**Data Dictionary**

* *year - year in which the car was bought*
* *km_driven - distance travelled by the car in km since it was bought*
* *fuel - type of fuel car uses (Petrol, Diesel, CNG, LPG, Electric)*
* *seller_type - whether the seller is a dealer or an individual*
* *transmission - whether the car is manual or automatic*
* *owner - number of owners the car previously had*
* *brand - brand of the car*
* *model - model of the car*
* *selling_price - selling price of the car*
