Context
A large demand for used cars is observable in India: while the demand for new cars is dropping, the demand for used cars is rising. Using the available dataset, used_cars_data.csv, a proper analysis is required to identify the relationships among variables such as Power, Engine, Price, New_Price and Location. A good model is also required to predict the price of a used car for this growing market.

In [2]:
#Libraries to read and manipulate data
import pandas as pd
import numpy as np

#Libraries to visualize the data
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set()

In [3]:
#Removing the limit for showing number of columns
pd.set_option("display.max_columns", None)
#Setting the limit for the maximum number of rows to 200
pd.set_option("display.max_rows", 200)

In [4]:
# to divide the data into train and test sets
from sklearn.model_selection import train_test_split

#to build the linear regression model
from sklearn.linear_model import LinearRegression

#to check the model performance
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [5]:
#loading the original dataset
uc=pd.read_csv("used_cars_data.csv", index_col=0)
#copying the dataset, so that the original data remains unchanged
uca=uc.copy()

In [6]:
#checking the shape of the data using an f-string
print(f"There are {uca.shape[0]} rows and {uca.shape[1]} columns.")

There are 7253 rows and 13 columns.


In [7]:
#checking 10 random rows from the data
np.random.seed(1)
uca.sample(n=10)

S.No.,Name,Location,Year,Kilometers_Driven,Fuel_Type,Transmission,Owner_Type,Mileage,Engine,Power,Seats,New_Price,Price
2397,Ford EcoSport 1.5 Petrol Trend,Kolkata,2016,21460,Petrol,Manual,First,17.0 kmpl,1497 CC,121.36 bhp,5.0,9.47 Lakh,6.0
3777,Maruti Wagon R VXI 1.2,Kochi,2015,49818,Petrol,Manual,First,21.5 kmpl,1197 CC,81.80 bhp,5.0,5.44 Lakh,4.11
4425,Ford Endeavour 4x2 XLT,Hyderabad,2007,130000,Diesel,Manual,First,13.1 kmpl,2499 CC,141 bhp,7.0,,6.0
3661,Mercedes-Benz E-Class E250 CDI Avantgrade,Coimbatore,2016,39753,Diesel,Automatic,First,13.0 kmpl,2143 CC,201.1 bhp,5.0,,35.28
4514,Hyundai Xcent 1.2 Kappa AT SX Option,Kochi,2016,45560,Petrol,Automatic,First,16.9 kmpl,1197 CC,82 bhp,5.0,,6.34
599,Toyota Innova Crysta 2.8 ZX AT,Coimbatore,2019,40674,Diesel,Automatic,First,11.36 kmpl,2755 CC,171.5 bhp,7.0,28.05 Lakh,24.82
186,Mercedes-Benz E-Class E250 CDI Avantgrade,Bangalore,2014,37382,Diesel,Automatic,First,13.0 kmpl,2143 CC,201.1 bhp,5.0,,32.0
305,Audi A6 2011-2015 2.0 TDI Premium Plus,Kochi,2014,61726,Diesel,Automatic,First,17.68 kmpl,1968 CC,174.33 bhp,5.0,,20.77
4582,Hyundai i20 1.2 Magna,Kolkata,2011,36000,Petrol,Manual,First,18.5 kmpl,1197 CC,80 bhp,5.0,,2.5
5434,Honda WR-V Edge Edition i-VTEC S,Kochi,2019,13913,Petrol,Manual,First,17.5 kmpl,1199 CC,88.7 bhp,5.0,9.36 Lakh,8.2


In [8]:
#checking whether there are any duplicated rows in the data
uca.duplicated().sum()

1

In [27]:
#removing the duplicated rows and confirming that none remain
uca.drop_duplicates(subset=None,keep="first", inplace=True)
uca.duplicated().sum()
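A minimal sketch of what `duplicated()` and `drop_duplicates(keep="first")` do, on a small hypothetical frame rather than the real data: the first occurrence of a repeated row is kept and later copies are flagged and dropped, while `keep=False` flags every copy, which is the form used in the commented-out inspection line of the next cell.

```python
import pandas as pd

# Hypothetical toy frame in which row 2 repeats row 0 exactly
df = pd.DataFrame({
    "Name": ["Honda Jazz V", "Maruti Ertiga VDI", "Honda Jazz V"],
    "Price": [4.5, 6.0, 4.5],
})

# duplicated() flags each repeat after its first occurrence
flags = df.duplicated().tolist()
print(flags)  # [False, False, True]

# keep="first" keeps the first occurrence and drops later repeats
deduped = df.drop_duplicates(keep="first")
print(deduped.index.tolist())  # [0, 1]

# keep=False flags every copy, which is handy for inspecting them
all_copies = df[df.duplicated(keep=False)].index.tolist()
print(all_copies)  # [0, 2]
```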

0

In [29]:
#to inspect the duplicated rows
#in this case, as we have already deleted the duplicated data, nothing is shown now
#uca[uca.duplicated(keep=False)==True]

In [None]:
#uca.drop(rowno, inplace=True)

In [10]:
#checking the name of columns in the data
uca.columns

Index(['Name', 'Location', 'Year', 'Kilometers_Driven', 'Fuel_Type',
       'Transmission', 'Owner_Type', 'Mileage', 'Engine', 'Power', 'Seats',
       'New_Price', 'Price'],
      dtype='object')

In [11]:
#check for datatypes and non-null numbers
uca.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7252 entries, 0 to 7252
Data columns (total 13 columns):
Name                 7252 non-null object
Location             7252 non-null object
Year                 7252 non-null int64
Kilometers_Driven    7252 non-null int64
Fuel_Type            7252 non-null object
Transmission         7252 non-null object
Owner_Type           7252 non-null object
Mileage              7250 non-null object
Engine               7206 non-null object
Power                7077 non-null object
Seats                7199 non-null float64
New_Price            1006 non-null object
Price                6019 non-null float64
dtypes: float64(2), int64(2), object(9)
memory usage: 793.2+ KB


#From the above info, New_Price stands out: only 1006 of 7252 rows have a value, so it is not a good idea to keep this column for building a model, even though it is an important variable (the price of a used car depends on the price of the car when newly purchased)
#Some columns have the wrong data type and should be converted, for example Mileage should be float
#The Year column can be converted into the age of the car in years
#Seats could be stored as integers rather than floats, because the number of seats cannot be fractional
#Mileage and Power should be converted into floats after removing the kmpl, km/kg and bhp units
#Engine should be converted into a number after removing CC
#Price is the dependent (target) variable
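As a vectorized alternative to the per-column helper functions used later, the unit suffixes can be stripped with a regular expression. This is a sketch on hypothetical sample values; the helper name `leading_number` and the `Mileage_Unit` column are illustrative assumptions, not part of the notebook. It also keeps the mileage unit separately, since km/kg (as for the CNG car in row 0) and kmpl are not directly comparable.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values in the same format as the raw columns
raw = pd.DataFrame({
    "Mileage": ["26.6 km/kg", "19.67 kmpl", np.nan],
    "Engine": ["998 CC", "1582 CC", "1199 CC"],
    "Power": ["58.16 bhp", np.nan, "88.7 bhp"],
})

def leading_number(col):
    """Extract the leading numeric part of each string; NaN where absent (hypothetical helper)."""
    # str.extract returns the first capture group; to_numeric turns the text into numbers
    return pd.to_numeric(col.str.extract(r"^\s*([\d.]+)", expand=False), errors="coerce")

# apply the helper column by column
clean = raw.apply(leading_number)

# keep the mileage unit, because km/kg and kmpl measure different things
clean["Mileage_Unit"] = raw["Mileage"].str.extract(r"(km/kg|kmpl)$", expand=False)
print(clean)
```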

In [12]:
#checking the null values in the data
uca.isnull().sum().sort_values(ascending=False)

New_Price            6246
Price                1233
Power                 175
Seats                  53
Engine                 46
Mileage                 2
Owner_Type              0
Transmission            0
Fuel_Type               0
Kilometers_Driven       0
Year                    0
Location                0
Name                    0
dtype: int64

#New_Price has a large number of missing values (6246)
#Price has 1233 missing values. As this is our target variable, it is better to remove these rows
#Mileage, Engine, Power and Seats also have missing values, though few in number; we can impute them with the mean, median or mode depending on the data type of each feature
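A minimal sketch of the mean/median/mode idea on a small hypothetical frame (the notebook itself later drops incomplete rows instead): the continuous columns get the median, which is less affected by the outliers seen later, and the discrete Seats column gets its mode.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few gaps, mimicking Engine, Power and Seats
df = pd.DataFrame({
    "Engine": [998.0, 1582.0, np.nan, 1248.0],
    "Power": [58.16, np.nan, 88.7, 88.76],
    "Seats": [5.0, 5.0, np.nan, 7.0],
})

# continuous columns: fill with the median, which is robust to outliers
for col in ["Engine", "Power"]:
    df[col] = df[col].fillna(df[col].median())

# discrete column: fill with the most frequent value (mode)
df["Seats"] = df["Seats"].fillna(df["Seats"].mode()[0])

print(df.isnull().sum().sum())  # 0
```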

In [13]:
#dropping the New_Price column from data
uca.drop(["New_Price"], axis=1, inplace=True)

#checking the data after dropping the column
uca.head()

S.No.,Name,Location,Year,Kilometers_Driven,Fuel_Type,Transmission,Owner_Type,Mileage,Engine,Power,Seats,Price
0,Maruti Wagon R LXI CNG,Mumbai,2010,72000,CNG,Manual,First,26.6 km/kg,998 CC,58.16 bhp,5.0,1.75
1,Hyundai Creta 1.6 CRDi SX Option,Pune,2015,41000,Diesel,Manual,First,19.67 kmpl,1582 CC,126.2 bhp,5.0,12.5
2,Honda Jazz V,Chennai,2011,46000,Petrol,Manual,First,18.2 kmpl,1199 CC,88.7 bhp,5.0,4.5
3,Maruti Ertiga VDI,Chennai,2012,87000,Diesel,Manual,First,20.77 kmpl,1248 CC,88.76 bhp,7.0,6.0
4,Audi A4 New 2.0 TDI Multitronic,Coimbatore,2013,40670,Diesel,Automatic,Second,15.2 kmpl,1968 CC,140.8 bhp,5.0,17.74


In [None]:
#splitting Name so that, e.g., all Mercedes-Benz models are treated as one brand and all Maruti models as another
#uca["Brand"]=uca["Name"].apply(lambda x: x.split(" ")[0].lower())
#uca["Model"]=uca["Name"].apply(lambda x: x.split(" ")[1].lower())
#checking extreme values in Seats
#uca.sort_values(by=["Seats"], ascending=True).head(5)
#creating a new column with the log-transformed price
#uca["Price_log"]=np.log(uca["Price"])
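The commented-out Brand/Model idea can also be sketched with a vectorized split instead of a row-wise lambda. The names below are taken from the sample rows shown earlier. Taking only the second word truncates multi-word models ("Wagon R" becomes "wagon"), and two-word brand names would need extra handling, so treat this as a starting point only.

```python
import pandas as pd

# Sample names taken from the rows shown above
names = pd.Series([
    "Maruti Wagon R LXI CNG",
    "Mercedes-Benz E-Class E250 CDI Avantgrade",
    "Hyundai i20 1.2 Magna",
])

# split on spaces into at most three pieces: brand, model, and the rest
parts = names.str.split(" ", n=2, expand=True)
brand = parts[0].str.lower()
model = parts[1].str.lower()

print(brand.tolist())  # ['maruti', 'mercedes-benz', 'hyundai']
print(model.tolist())  # ['wagon', 'e-class', 'i20']
```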

In [14]:
# looping over the object columns to find those whose values end with 'CC'
cc_cols = []
for colname in uca.columns[uca.dtypes == 'object']:
    if uca[colname].str.endswith('CC').any():
        cc_cols.append(colname)
print(cc_cols)

['Engine']


In [15]:
#now removing the CC unit and converting the column to float (float rather than integer, because missing values are NaN)
def cc_to_num(engine):
    if isinstance(engine, str):
        return float(engine.replace("CC", ""))
    else:
        return np.nan
for colname in cc_cols:
    uca[colname] = uca[colname].apply(cc_to_num)
uca[cc_cols].head()

S.No.,Engine
0,998.0
1,1582.0
2,1199.0
3,1248.0
4,1968.0


In [None]:
#alternative to the helper function: split off the unit, e.g. "19.67 kmpl" -> ["19.67", "kmpl"]
#uca_Mileage=uca["Mileage"].str.split(" ", expand=True)

In [16]:
uca.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7252 entries, 0 to 7252
Data columns (total 12 columns):
Name                 7252 non-null object
Location             7252 non-null object
Year                 7252 non-null int64
Kilometers_Driven    7252 non-null int64
Fuel_Type            7252 non-null object
Transmission         7252 non-null object
Owner_Type           7252 non-null object
Mileage              7250 non-null object
Engine               7206 non-null float64
Power                7077 non-null object
Seats                7199 non-null float64
Price                6019 non-null float64
dtypes: float64(3), int64(2), object(7)
memory usage: 736.5+ KB


In [17]:
#we will convert Year into the age of the car, taking 2021 as the current year
#Year is already int64 (see info() above); astype(int) returns a converted copy and leaves uca unchanged unless assigned back
uca["Year"].astype(int)

S.No.
0       2010
1       2015
2       2011
3       2012
4       2013
        ... 
7248    2011
7249    2015
7250    2012
7251    2013
7252    2014
Name: Year, Length: 7252, dtype: int32
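Because `astype` returns a new Series, the cell above only displays a converted copy. A small sketch on a hypothetical float column shows the difference, and also that the car's age can be computed directly from the constant 2021 without an intermediate Running_Year column.

```python
import pandas as pd

# Hypothetical float column, so that the conversion is visible
df = pd.DataFrame({"Year": [2010.0, 2015.0]})

# astype returns a new Series; df itself is not modified
df["Year"].astype(int)
print(df["Year"].dtype)  # float64

# assigning the result back is what actually changes the column
df["Year"] = df["Year"].astype(int)
print(df["Year"].dtype)  # an integer dtype (int64 or int32, depending on platform)

# the car's age can be computed directly from the constant current year
df["Usage_Years"] = 2021 - df["Year"]
print(df["Usage_Years"].tolist())  # [11, 6]
```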

In [18]:
#creating a new column holding 2021 for every row (stored as a string here and converted to integer in the next cell)
uca["Running_Year"]="2021"

In [19]:
#converting the column into integer
uca["Running_Year"]=uca["Running_Year"].astype(int)

In [20]:
#creating another column, Usage_Years: the age of the car in years
uca["Usage_Years"]=uca["Running_Year"]-uca["Year"]
uca.head()

S.No.,Name,Location,Year,Kilometers_Driven,Fuel_Type,Transmission,Owner_Type,Mileage,Engine,Power,Seats,Price,Running_Year,Usage_Years
0,Maruti Wagon R LXI CNG,Mumbai,2010,72000,CNG,Manual,First,26.6 km/kg,998.0,58.16 bhp,5.0,1.75,2021,11
1,Hyundai Creta 1.6 CRDi SX Option,Pune,2015,41000,Diesel,Manual,First,19.67 kmpl,1582.0,126.2 bhp,5.0,12.5,2021,6
2,Honda Jazz V,Chennai,2011,46000,Petrol,Manual,First,18.2 kmpl,1199.0,88.7 bhp,5.0,4.5,2021,10
3,Maruti Ertiga VDI,Chennai,2012,87000,Diesel,Manual,First,20.77 kmpl,1248.0,88.76 bhp,7.0,6.0,2021,9
4,Audi A4 New 2.0 TDI Multitronic,Coimbatore,2013,40670,Diesel,Automatic,Second,15.2 kmpl,1968.0,140.8 bhp,5.0,17.74,2021,8


In [21]:
#dropping the Year and Running_Year columns, as they are redundant now
uca.drop(["Year", "Running_Year"], axis=1, inplace=True)

In [22]:
uca.head()

S.No.,Name,Location,Kilometers_Driven,Fuel_Type,Transmission,Owner_Type,Mileage,Engine,Power,Seats,Price,Usage_Years
0,Maruti Wagon R LXI CNG,Mumbai,72000,CNG,Manual,First,26.6 km/kg,998.0,58.16 bhp,5.0,1.75,11
1,Hyundai Creta 1.6 CRDi SX Option,Pune,41000,Diesel,Manual,First,19.67 kmpl,1582.0,126.2 bhp,5.0,12.5,6
2,Honda Jazz V,Chennai,46000,Petrol,Manual,First,18.2 kmpl,1199.0,88.7 bhp,5.0,4.5,10
3,Maruti Ertiga VDI,Chennai,87000,Diesel,Manual,First,20.77 kmpl,1248.0,88.76 bhp,7.0,6.0,9
4,Audi A4 New 2.0 TDI Multitronic,Coimbatore,40670,Diesel,Automatic,Second,15.2 kmpl,1968.0,140.8 bhp,5.0,17.74,8


In [23]:
#finding object columns whose values end with 'km/kg'
kg_cols = []
for colname in uca.columns[uca.dtypes == 'object']:  
    if uca[colname].str.endswith('km/kg').any():
        kg_cols.append(colname)
print(kg_cols)

['Mileage']


In [24]:
# removing the km/kg and kmpl units and converting the column into float
def kg_to_num(mileage):
    if isinstance(mileage, str):
        return float(mileage.replace("km/kg", "").replace("kmpl", ""))
    else:
        return np.nan
for colname in kg_cols:
    uca[colname] = uca[colname].apply(kg_to_num)
uca[kg_cols].head()

S.No.,Mileage
0,26.6
1,19.67
2,18.2
3,20.77
4,15.2


In [25]:
#finding object columns whose values end with 'bhp'
bhp_cols = []
for colname in uca.columns[uca.dtypes == 'object']:
    if uca[colname].str.endswith('bhp').any():
        bhp_cols.append(colname)
print(bhp_cols)

In [None]:
#removing the bhp unit and converting the column into float
def bhp_to_num(power):
    if isinstance(power, str):
        return float(power.replace("bhp", ""))
    else:
        return np.nan
for colname in bhp_cols:
    uca[colname] = uca[colname].apply(bhp_to_num)
uca[bhp_cols].head()

In [None]:
#dropping every row that still has a missing value (this also removes the rows with missing Price)
uca.dropna(axis=0, how="any", inplace=True)

In [None]:
#Checking for missing values
uca.isnull().sum()

In [None]:
# this shows there are no missing values left in any column

In [None]:
uca.describe().T

In [None]:
#The summary statistics of the numerical columns show the following
#Kilometers_Driven: the gap between the maximum and minimum values is very large
#Price and Power have many outliers
#Mileage and Engine look roughly normally distributed, with some outliers
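The outliers mentioned here are the points a boxplot draws beyond its whiskers. A minimal sketch of that rule (values more than 1.5 × IQR beyond the quartiles, the default whisker length in seaborn and matplotlib boxplots) on hypothetical kilometre readings; the helper `iqr_outlier_count` is illustrative, not part of the notebook.

```python
import pandas as pd

def iqr_outlier_count(s, whis=1.5):
    """Count values outside [Q1 - whis*IQR, Q3 + whis*IQR] (the boxplot whisker rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return int(((s < lower) | (s > upper)).sum())

# Hypothetical odometer readings with one extreme entry
km = pd.Series([30000, 35000, 40000, 45000, 50000, 55000, 60000, 65000, 6500000])
n_out = iqr_outlier_count(km)
print(n_out)  # 1

# On the notebook data this could be applied per column, e.g.
# {c: iqr_outlier_count(uca[c]) for c in ["Kilometers_Driven", "Power", "Price"]}
```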


In [None]:
#Data Visualization and EDA
#Creating a countplot for Location column
plt.figure(figsize=(15,7))
sns.countplot(x=uca["Location"])
plt.show()

In [None]:
#Creating a countplot for Fuel_Type column
plt.figure(figsize=(15,7))
sns.countplot(x=uca["Fuel_Type"])
plt.show()

In [None]:
#Creating a countplot for Owner_Type column
plt.figure(figsize=(15,7))
sns.countplot(x=uca["Owner_Type"])
plt.show()

In [None]:
#Creating a countplot for Transmission column
plt.figure(figsize=(15,7))
sns.countplot(x=uca["Transmission"])
plt.show()

In [None]:
#Creating a countplot for Seats column
plt.figure(figsize=(15,7))
sns.countplot(x=uca["Seats"])
plt.show()

In [None]:
#Creating a boxplot for the Kilometers_Driven column
plt.figure(figsize=(15,7))
sns.boxplot(x=uca["Kilometers_Driven"])
plt.show()

From the above boxplot, we see a significant number of outliers.

In [None]:
#Creating a histogram with a KDE curve for the Kilometers_Driven column
plt.figure(figsize=(15,7))
sns.histplot(x=uca["Kilometers_Driven"], kde=True)
plt.show()

#this is highly skewed. We can try a log transformation here
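To check whether a log transformation helps, one can compare the skewness before and after it. A sketch on synthetic log-normal data, an assumption standing in for a right-skewed column such as Price:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic right-skewed (log-normal) data standing in for a column like Price
s = pd.Series(rng.lognormal(mean=1.5, sigma=0.8, size=5000))

skew_raw = s.skew()          # strongly positive: long right tail
skew_log = np.log(s).skew()  # close to 0, since the log of log-normal data is normal
print(round(skew_raw, 2), round(skew_log, 2))
```

On the notebook data the same comparison would be `uca["Price"].skew()` against `np.log(uca["Price"]).skew()`.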

In [None]:
#Creating a boxplot for the Mileage column
plt.figure(figsize=(15,7))
sns.boxplot(x=uca["Mileage"]);
plt.show()

In [None]:
#Creating a histogram with a KDE curve for the Mileage column
plt.figure(figsize=(15,7))
sns.histplot(x=uca["Mileage"], kde=True);
plt.show()

From the above plot, we see the data are approximately normally distributed.

In [None]:
#Creating a boxplot for the Engine column
plt.figure(figsize=(15,7))
sns.boxplot(x=uca["Engine"]);
plt.show()

In [None]:
#Creating a histogram with a KDE curve for the Engine column
plt.figure(figsize=(15,7))
sns.histplot(x=uca["Engine"], kde=True);
plt.show()

In [None]:
#Creating a boxplot for the Power column
plt.figure(figsize=(15,7))
sns.boxplot(x=uca["Power"]);
plt.show()

# from the above boxplot, we see a significant number of outliers

In [None]:
#Creating a histogram with a KDE curve for the Power column
plt.figure(figsize=(15,7))
sns.histplot(x=uca["Power"], kde=True);
plt.show()

In [None]:
#Creating a boxplot for the Price column
plt.figure(figsize=(15,7))
sns.boxplot(x=uca["Price"]);
plt.show()

# from the above boxplot, we see a significant number of outliers

In [None]:
#Creating a histogram with a KDE curve for the Price column
plt.figure(figsize=(15,7))
sns.histplot(x=uca["Price"], kde=True);
plt.show()

#the price data is right-skewed

In [None]:
#Creating a boxplot for the Usage_Years column
plt.figure(figsize=(15,7))
sns.boxplot(x=uca["Usage_Years"]);
plt.show()

In [None]:
#Creating a histogram with a KDE curve for the Usage_Years column
plt.figure(figsize=(15,7))
sns.histplot(x=uca["Usage_Years"], kde=True);
plt.show()

In [None]:
#Kilometers_Driven and Price columns are highly skewed, so we try a log transformation
#log is only defined for positive values, so we first count the non-positive entries in each column
cols_to_log = ['Kilometers_Driven', 'Price']
for colname in cols_to_log:
    print(colname, "non-positive values:", np.sum(uca[colname] <= 0))
    plt.hist(np.log(uca[colname]), bins=50)
    plt.title(f"log({colname})")
    plt.show()

#For Price the log transformation works, but for Kilometers_Driven it has no visible impact, so we can try arcsinh

In [None]:
#applying the arcsinh
plt.hist(np.arcsinh(uca['Kilometers_Driven']), 50)
plt.title('arcsinh(Kilometers_Driven)')
plt.show()
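One likely reason arcsinh does not change the picture here: for large x, arcsinh(x) = log(x + √(x² + 1)) is approximately log(2x) = log(x) + log 2, so on odometer readings in the hundreds or more, the arcsinh histogram is essentially the log histogram shifted by a constant. A quick numerical check on hypothetical values:

```python
import numpy as np

# Hypothetical odometer readings spanning several orders of magnitude
km = np.array([171.0, 1000.0, 72000.0, 6500000.0])

# For large x, arcsinh(x) = log(x + sqrt(x**2 + 1)) is approximately log(2x) = log(x) + log(2)
diff = np.arcsinh(km) - np.log(km)
print(diff)  # every entry is within about 1e-5 of log(2), roughly 0.6931
```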

In [None]:
#For continuous variables, we can also try a z-score (standard score) transformation; since it only shifts and rescales the data, it does not change the skewness

In [None]:
#Importing standard scaler
from sklearn.preprocessing import StandardScaler

In [None]:
#creating a StandardScaler instance and standardizing Price
std_scaler = StandardScaler()
uca['Price_z_std'] = std_scaler.fit_transform(uca[['Price']])
uca['Price_z_std'].hist(bins=20)
plt.title('Z transformation')
plt.show()
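Standardization is a linear shift and rescale, so it centres Price at 0 with unit variance but leaves the shape of the distribution, including its skewness, unchanged. A sketch on synthetic skewed data (an assumption, not the real Price column):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic right-skewed column standing in for Price
df = pd.DataFrame({"Price": rng.lognormal(mean=1.5, sigma=0.8, size=2000)})

# fit_transform returns a 2-D array; ravel flattens it to 1-D
z = StandardScaler().fit_transform(df[["Price"]]).ravel()

print(z.mean(), z.std())  # approximately 0 and 1
# skewness is unchanged by a positive linear transformation
print(np.isclose(pd.Series(z).skew(), df["Price"].skew()))
```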

#Bivariate analysis

In [None]:
#checking the correlation among the numerical variables (object columns are excluded first)
ucorr=uca.select_dtypes(include="number").corr()
ucorr

#Kilometers_Driven is negatively correlated with Price, which makes sense, although the correlation is very weak. Seats shows a similarly weak correlation with Price, although one would expect the price to increase with the number of seats
#Engine and Power have a strong positive correlation with Price, which is also logical
#Usage_Years is negatively correlated with Price, which also makes sense, though the relationship is not very strong


In [None]:
#Bivariate Analysis
#plotting a heatmap of the correlations
plt.figure(figsize=(15,7))
sns.heatmap(ucorr, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral", annot=True);

In [None]:
#Scatterplot for the Mileage and Price columns
plt.figure(figsize=(15,7))
sns.scatterplot(x=uca["Mileage"], y=uca["Price"]);
plt.show()

In [None]:
#Scatterplot for the Usage_Years and Price columns
plt.figure(figsize=(15,7))
sns.scatterplot(x=uca["Usage_Years"], y=uca["Price"]);
plt.show()

In [None]:
#Scatterplot for the Kilometers_Driven and Price columns
plt.figure(figsize=(15,7))
sns.scatterplot(x=uca["Kilometers_Driven"], y=uca["Price"]);
plt.show()

In [None]:
#Scatterplot for the Engine and Price columns
plt.figure(figsize=(15,7))
sns.scatterplot(x=uca["Engine"], y=uca["Price"]);
plt.show()

In [None]:
#Scatterplot for the Power and Price columns
plt.figure(figsize=(15,7))
sns.scatterplot(x=uca["Power"], y=uca["Price"]);
plt.show()

# The scatterplots also show that Power and Engine have a positive association with the dependent variable. This is useful information for marketers: it suggests that buyers of used cars pay more for engine size and power than for seats and mileage.

In [None]:
#Before building the model, I am dropping the Name column, as it has too many unique values to handle easily
uca.drop(["Name"], axis=1, inplace=True)

In [None]:
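#Building the model, taking Price as the target (y) variable
#Price_z_std (if created above) is derived from Price, so it is dropped too to avoid target leakage
X = uca.drop(["Price", "Price_z_std"], axis=1, errors="ignore")
y = uca["Price"]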

In [None]:
#creating dummy variables for the object (categorical) features
X = pd.get_dummies(
    X,
    columns=X.select_dtypes(include=["object"]).columns.tolist(),
    drop_first=True,
)
X.head()
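With `drop_first=True`, `pd.get_dummies` drops the first level of each categorical column (levels are sorted), and that level becomes the baseline for interpreting the coefficients. A tiny hypothetical example:

```python
import pandas as pd

# Hypothetical Fuel_Type values
X = pd.DataFrame({"Fuel_Type": ["Petrol", "Diesel", "CNG", "Diesel"]})
dummies = pd.get_dummies(X, columns=["Fuel_Type"], drop_first=True)

# Levels are sorted, and drop_first removes the first one (CNG here),
# so CNG becomes the baseline that every Fuel_Type coefficient is compared against
print(dummies.columns.tolist())  # ['Fuel_Type_Diesel', 'Fuel_Type_Petrol']
```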

In [None]:
X.shape

In [None]:
#splitting the data into train and test sets (90% / 10%)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=1)

In [None]:
print("Rows in train data =", x_train.shape[0])
print("Rows in test data =", x_test.shape[0])

In [None]:
#fitting the Linear Regression Model
lin_reg_model = LinearRegression()
lin_reg_model.fit(x_train, y_train)

In [None]:
#checking the coefficient values generated by the model
coef_df = pd.DataFrame(
    np.append(lin_reg_model.coef_, lin_reg_model.intercept_),
    index=x_train.columns.tolist() + ["Intercept"],
    columns=["Coefficients"],
)
coef_df

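Explanation of coefficients (each dummy coefficient is measured relative to the dropped baseline level of its feature):
    - Owner_Type fourth and above has a negative coefficient, whereas the other owner types have positive coefficients
    - Petrol fuel type has a negative coefficient, whereas the other fuel types have positive coefficients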
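A minimal sketch, on synthetic data constructed for illustration, of why dummy coefficients read as differences from the baseline: when Diesel cars are made to cost exactly 2.0 more than CNG cars and Petrol cars 1.0 more, the fitted coefficients recover those gaps and the intercept equals the CNG price.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data: CNG costs 5.0, Diesel 2.0 more, Petrol 1.0 more
fuel = pd.Series(["CNG", "Diesel", "Petrol"] * 20)
price = 5.0 + fuel.map({"CNG": 0.0, "Diesel": 2.0, "Petrol": 1.0})

# CNG is dropped as the baseline; the remaining columns are Diesel and Petrol
X = pd.get_dummies(fuel, prefix="Fuel_Type", drop_first=True).astype(float)
model = LinearRegression().fit(X, price)

# each coefficient is the difference from CNG; the intercept is the CNG price
print(dict(zip(X.columns, model.coef_)))
print(model.intercept_)
```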
    

In [None]:
#defining the function for adjusted R-squared
def adj_r2_score(predictors, targets, predictions):
    r2 = r2_score(targets, predictions)
    n = predictors.shape[0]
    k = predictors.shape[1]
    return 1 - ((1 - r2) * (n - 1) / (n - k - 1))


#defining the function for mape score
def mape_score(targets, predictions):
    return np.mean(np.abs(targets - predictions) / targets) * 100


# defining a function that computes several metrics to check the performance of a regression model
def model_performance_regression(model, predictors, target):
    # predicting using the independent variables
    pred = model.predict(predictors)
    
    # computing the r square
    r2 = r2_score(target, pred)  
    # computing adjusted R-squared
    adjr2 = adj_r2_score(predictors, target, pred)  
    # computing RMSE
    rmse = np.sqrt(mean_squared_error(target, pred))  
    # computing MAE
    mae = mean_absolute_error(target, pred)  
    # computing MAPE
    mape = mape_score(target, pred)  

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {
            "RMSE": rmse,
            "MAE": mae,
            "R-squared": r2,
            "Adj. R-squared": adjr2,
            "MAPE": mape,
        },
        index=[0],
    )

    return df_perf
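For reference, the quantities computed by `adj_r2_score` and `mape_score` above, with n = `predictors.shape[0]` rows, k = `predictors.shape[1]` predictors, targets y_i and predictions ŷ_i:

```latex
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1},
\qquad
\text{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{\lvert y_i - \hat{y}_i \rvert}{y_i}
```

For example, with n = 100, k = 9 and R² = 0.9, the adjusted value is 1 − 0.1 × 99/90 = 0.89. The MAPE formula divides by y_i itself, which is safe here only because every price is positive.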

In [None]:
#Performance metrics on train data
print("Training Performance\n")
lin_reg_model_train_perf = model_performance_regression(lin_reg_model, x_train, y_train)
lin_reg_model_train_perf

In [None]:
# Checking model performance on test set
print("Test Performance\n")
lin_reg_model_test_perf = model_performance_regression(lin_reg_model, x_test, y_test)
lin_reg_model_test_perf

Both the train and test R² are about 1. Overfitting would show up as a high train R² with a noticeably lower test R², so near-perfect scores on both sets point more to target leakage, for example a column derived from Price (such as Price_z_std) remaining among the predictors.
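A sketch of how target leakage produces near-perfect test scores, on synthetic data (the feature `power`, the noise level and the derived `price_z` column are all assumptions chosen for illustration): a predictor computed from the target lets the linear model reconstruct the target exactly, even on the held-out split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 500
power = rng.uniform(50, 250, n)             # hypothetical genuine predictor
price = 0.05 * power + rng.normal(0, 2, n)  # noisy target
# a column derived from the target, like a standardized copy of Price
price_z = StandardScaler().fit_transform(price.reshape(-1, 1)).ravel()

candidates = {
    "power only": power.reshape(-1, 1),
    "power + price_z": np.column_stack([power, price_z]),
}
scores = {}
for name, X in candidates.items():
    x_tr, x_te, y_tr, y_te = train_test_split(X, price, test_size=0.1, random_state=1)
    pred = LinearRegression().fit(x_tr, y_tr).predict(x_te)
    scores[name] = r2_score(y_te, pred)

print(scores)  # the leaky feature set scores essentially 1.0 on the test split
```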

A MAPE of 2.05 on the test data indicates that, on average, the predictions are within about 2% of the actual price.

An MAE of 1.13 indicates that our current model predicts the price with a mean absolute error of 1.13 (Price is in lakh INR).

However, the overall performance is not so great.

Recommendations and Conclusion:
    - The analysis shows a positive relationship of Power and Engine with the price of a used car, so marketers should give more weight to these factors when setting prices
    - Usage_Years and Price have the expected relationship: the price goes down as the age of the car increases
    - For fuel type, petrol cars are preferred over diesel and LPG cars; diesel and LPG cars are negatively associated with price
    - For owner type, cars with a fourth or later owner are not preferred by customers, and their price goes down
    - For location, Kochi, Pune and Bangalore are negatively associated with price
    