# Car Price Prediction

This notebook uses the Kaggle cardekho dataset to model car prices. We have a 4-month-old child, so my wife and I recently decided to upgrade to a newer, safer vehicle. We are novices in the car-buying world. I want to explore which factors have the greatest impact on price so I can enter the market as a savvy buyer.

## Business Question

**What are the greatest factors influencing the price of a vehicle?**
- Use linear regression to fit and score a model on the cardekho dataset
- Rank the parameters by how much of the price variability each one explains
- Consider how each parameter should factor into the buying decision
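The sketch below shows one way to fit a linear regression and rank parameters by explained variability. It fits ordinary least squares with NumPy. It then uses drop-column importance to rank features: for each feature, it measures how much in-sample R² falls when that feature (all of its one-hot columns, for categoricals) is left out. This is a swapped-in technique and not necessarily the method this notebook uses later. The helper names `fit_r2` and `rank_by_r2_drop` are hypothetical, and the `demo` DataFrame is synthetic data built only to exercise the code.

```python
import numpy as np
import pandas as pd


def fit_r2(X: np.ndarray, y: np.ndarray) -> float:
    """Fit ordinary least squares with an intercept and return in-sample R^2."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def rank_by_r2_drop(df: pd.DataFrame, target: str, features: list[str]):
    """Rank features by how much R^2 drops when each one is left out of the model."""
    # One-hot encode categorical columns; numeric columns pass through unchanged
    X_full = pd.get_dummies(df[features], drop_first=True, dtype=float)
    y = df[target].to_numpy(dtype=float)
    full = fit_r2(X_full.to_numpy(dtype=float), y)
    drops = {}
    for f in features:
        rest = [c for c in features if c != f]
        X_wo = pd.get_dummies(df[rest], drop_first=True, dtype=float)
        drops[f] = full - fit_r2(X_wo.to_numpy(dtype=float), y)
    # Largest R^2 loss first = feature explaining the most variability
    return full, sorted(drops.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic, hypothetical data for illustration only (not the cardekho data)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "year": rng.integers(2005, 2021, n),
    "km_driven": rng.uniform(0, 150_000, n),
    "fuel": rng.choice(["Petrol", "Diesel"], n),
})
demo["selling_price"] = (
    20_000 * (demo["year"] - 2000)
    - 1.0 * demo["km_driven"]
    + 50_000 * (demo["fuel"] == "Diesel")
    + rng.normal(0, 10_000, n)
)

full_r2, ranking = rank_by_r2_drop(demo, "selling_price", ["year", "km_driven", "fuel"])
print(f"full R^2 = {full_r2:.3f}")
for feature, drop in ranking:
    print(f"{feature:>10}: R^2 drop {drop:.3f}")
```

The synthetic price was built with `year` as its largest driver, so `year` ranks first here. On the real data, the same call could take columns such as `year`, `km_driven`, `fuel`, `seller_type`, `transmission`, and `owner`. In-sample R² can overstate fit, so scoring on a held-out split would be a sensible refinement.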

**Is there a point at which buying a used car no longer makes sense?**

- Model pricing with various loan terms and APRs
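Loan scenarios can be compared with the standard fixed-payment amortization formula. The monthly payment is $M = P\,r / (1 - (1 + r)^{-n})$, where $P$ is the principal, $n$ the number of monthly payments, and $r$ the monthly rate. The sketch below takes $r = \text{APR}/12$, which assumes monthly compounding on a nominal APR; real loan contracts may define rates differently. The price and the rate/term grid are hypothetical values chosen for illustration.

```python
def monthly_payment(principal: float, apr: float, months: int) -> float:
    """Fixed payment for a fully amortizing loan with monthly compounding.

    apr is the nominal annual rate as a decimal (0.09 for 9%).
    """
    if months <= 0:
        raise ValueError("months must be positive")
    r = apr / 12  # monthly periodic rate (assumed convention)
    if r == 0:
        return principal / months  # zero-interest loan: straight division
    return principal * r / (1 - (1 + r) ** -months)


def total_cost(principal: float, apr: float, months: int) -> float:
    """Sum of all payments over the life of the loan."""
    return monthly_payment(principal, apr, months) * months


# Hypothetical price and rate/term grid, for illustration only
price = 500_000
for apr in (0.07, 0.09, 0.12):
    for months in (36, 60, 84):
        pay = monthly_payment(price, apr, months)
        print(f"APR {apr:.0%}, {months} months: payment {pay:,.0f}, total {pay * months:,.0f}")
```

To probe the second question, compare `total_cost` for a used car at its (often higher) APR against a newer car at its rate. The gap in total cost shows how much of the used car's sticker discount is eaten up by financing.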

```python
# Import the necessary packages
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline

# Read in the cardekho data set
cars_df = pd.read_csv('./cars.csv')
cars_df.head()
```



|   | name | year | selling_price | km_driven | fuel | seller_type | transmission | owner |
|---|------|------|---------------|-----------|------|-------------|--------------|-------|
| 0 | Maruti 800 AC | 2007 | 60000 | 70000 | Petrol | Individual | Manual | First Owner |
| 1 | Maruti Wagon R LXI Minor | 2007 | 135000 | 50000 | Petrol | Individual | Manual | First Owner |
| 2 | Hyundai Verna 1.6 SX | 2012 | 600000 | 100000 | Diesel | Individual | Manual | First Owner |
| 3 | Datsun RediGO T Option | 2017 | 250000 | 46000 | Petrol | Individual | Manual | First Owner |
| 4 | Honda Amaze VX i-DTEC | 2014 | 450000 | 141000 | Diesel | Individual | Manual | Second Owner |
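The `name` column bundles make, model, and trim into one string, so it has many distinct values. One-hot encoding it directly for a linear regression would produce many sparse columns. A common simplification, assumed here rather than taken from this notebook, is to keep only the make (the first word of the name). The helper `add_make_column` is hypothetical.

```python
import pandas as pd


def add_make_column(df: pd.DataFrame, name_col: str = "name") -> pd.DataFrame:
    """Return a copy of df with a 'make' column holding the first word of name_col."""
    out = df.copy()
    # Split each name on whitespace and keep the first token, e.g. "Maruti 800 AC" -> "Maruti"
    out["make"] = out[name_col].str.split().str[0]
    return out


# Tiny illustrative frame using names from the head() output above
demo = pd.DataFrame({"name": ["Maruti 800 AC", "Hyundai Verna 1.6 SX", "Datsun RediGO T Option"]})
makes = add_make_column(demo)["make"].tolist()
print(makes)  # ['Maruti', 'Hyundai', 'Datsun']
```

On the full data this would be `cars_df = add_make_column(cars_df)`. A make written as two words (for example "Land Rover", if present) would be split incorrectly, so the resulting `make` values are worth a quick `value_counts()` check.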


```python
# Check each column for missing values
cars_df.isnull().sum() > 0
```

```
name             False
year             False
selling_price    False
km_driven        False
fuel             False
seller_type      False
transmission     False
owner            False
dtype: bool
```

```python
# Row count
cars_df.shape[0]
```

```
4340
```

```python
# Summary statistics for the numeric columns
cars_df.describe()
```

|       | year | selling_price | km_driven |
|-------|------|---------------|-----------|
| count | 4340.0 | 4340.0 | 4340.0 |
| mean | 2013.090783 | 504127.3 | 66215.777419 |
| std | 4.215344 | 578548.7 | 46644.102194 |
| min | 1992.0 | 20000.0 | 1.0 |
| 25% | 2011.0 | 208749.8 | 35000.0 |
| 50% | 2014.0 | 350000.0 | 60000.0 |
| 75% | 2016.0 | 600000.0 | 90000.0 |
| max | 2020.0 | 8900000.0 | 806599.0 |


```python
# Count how many listings there are for each car name
car_types = dict(cars_df['name'].value_counts())

car_types
```

{'Maruti Swift Dzire VDI': 69,
 'Maruti Alto 800 LXI': 59,
 'Maruti Alto LXi': 47,
 'Maruti Alto LX': 35,
 'Hyundai EON Era Plus': 35,
 'Maruti Swift VDI BSIV': 29,
 'Maruti Wagon R VXI BS IV': 29,
 'Maruti Swift VDI': 27,
 'Hyundai EON Magna Plus': 24,
 'Maruti Wagon R LXI Minor': 24,
 'Maruti 800 AC': 23,
 'Maruti Wagon R LXI': 23,
 'Hyundai i10 Magna': 22,
 'Maruti Ritz VDi': 22,
 'Mahindra XUV500 W8 2WD': 22,
 'Hyundai Santro Xing GLS': 21,
 'Maruti Alto K10 VXI': 21,
 'Renault KWID RXT': 21,
 'Hyundai Creta 1.6 CRDi SX': 19,
 'Renault KWID 1.0 RXT Optional': 17,
 'Renault Duster 85PS Diesel RxL': 17,
 'Chevrolet Beat Diesel LT': 17,
 'Chevrolet Beat Diesel LS': 16,
 'Maruti SX4 Vxi BSIV': 16,
 'Tata Indica GLS BS IV': 15,
 'Chevrolet Spark 1.0 LS': 15,
 'Hyundai Verna 1.6 SX': 15,
 'Maruti S-Cross Zeta DDiS 200 SH': 14,
 'Hyundai Verna 1.6 SX CRDi (O)': 14,
 'Mahindra XUV500 W6 2WD': 14,
 'Maruti Alto 800 VXI': 14,
 'Maruti Alto LXi BSIII': 14,
 'Hyundai Grand i10 1.2 Kappa Magna 