## Problem Statement

### Description Context

There is huge demand for used cars in the Indian market today. As sales of new cars have slowed in the recent past, the pre-owned car market has continued to grow and is now larger than the new car market. Cars4U is a budding tech start-up that aims to find a foothold in this market.

In 2018-19, while new car sales were recorded at 3.6 million units, around 4 million second-hand cars were bought and sold. The slowdown in new car sales could mean that demand is shifting towards the pre-owned market. In fact, some car owners replace their old cars with pre-owned cars instead of buying new ones. For new cars, price and supply are fairly deterministic and managed by OEMs (Original Equipment Manufacturers), except for dealership-level discounts, which come into play only in the last stage of the customer journey. Used cars are very different beasts, with huge uncertainty in both pricing and supply. With this in mind, the pricing scheme for used cars becomes important in order to grow in the market.

As a senior data scientist at Cars4U, you have to come up with a pricing model that can effectively predict the price of used cars and can help the business in devising profitable strategies using differential pricing. For example, if the business knows the market price, it will never sell anything below it.

### Objective

To explore and visualize the dataset, build a linear regression model to predict the prices of used cars, and generate a set of insights and recommendations that will help the business.

### Data Dictionary

The data contains the different attributes of used cars sold in different locations. The detailed data dictionary is given below.

* **S.No.**: Serial number
* **Name**: Name of the car which includes brand name and model name
* **Location**: Location in which the car is being sold or is available for purchase (cities)
* **Year**: Manufacturing year of the car
* **Kilometers_Driven**: The total kilometers driven in the car by the previous owner(s) in km
* **Fuel_Type**: The type of fuel used by the car (Petrol, Diesel, Electric, CNG, LPG)
* **Transmission**: The type of transmission used by the car (Automatic/Manual)
* **Owner_Type**: Type of ownership
* **Mileage**: The standard mileage offered by the car company in kmpl or km/kg
* **Engine**: The displacement volume of the engine in CC
* **Power**: The maximum power of the engine in bhp
* **Seats**: The number of seats in the car 
* **New_Price**: The price of a new car of the same model in INR Lakhs (1 Lakh INR = 100,000 INR)
* **Price**: The price of the used car in INR Lakhs

### Best Practices for Notebook

The notebook should be well-documented, with inline comments explaining the functionality of code and markdown cells containing comments on the observations and insights. The notebook should be run from start to finish in a sequential manner before submission. It is preferable to remove all warnings and errors before submission. The notebook should be submitted as an HTML file (.html) and NOT as a notebook file (.ipynb).

# Solution

## Imports + Data Overview

**Hint**: Python imports. Load the data from the CSV file. Check data types.



In [5]:
import pandas as pd 

In [15]:
used_cars = pd.read_csv("test_datasets/used_cars_data.csv")

In [17]:
used_cars.head()

| | S.No. | Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_Price | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Maruti Wagon R LXI CNG | Mumbai | 2010 | 72000 | CNG | Manual | First | 26.6 km/kg | 998 CC | 58.16 bhp | 5.0 | 5.51 | 1.75 |
| 1 | 1 | Hyundai Creta 1.6 CRDi SX Option | Pune | 2015 | 41000 | Diesel | Manual | First | 19.67 kmpl | 1582 CC | 126.2 bhp | 5.0 | 16.06 | 12.5 |
| 2 | 2 | Honda Jazz V | Chennai | 2011 | 46000 | Petrol | Manual | First | 18.2 kmpl | 1199 CC | 88.7 bhp | 5.0 | 8.61 | 4.5 |
| 3 | 3 | Maruti Ertiga VDI | Chennai | 2012 | 87000 | Diesel | Manual | First | 20.77 kmpl | 1248 CC | 88.76 bhp | 7.0 | 11.27 | 6.0 |
| 4 | 4 | Audi A4 New 2.0 TDI Multitronic | Coimbatore | 2013 | 40670 | Diesel | Automatic | Second | 15.2 kmpl | 1968 CC | 140.8 bhp | 5.0 | 53.14 | 17.74 |


In [18]:
used_cars.shape

(7253, 14)

In [19]:
# info is a method: call it with parentheses to print column dtypes and non-null counts
used_cars.info()

In [20]:
# describe is a method: call it with parentheses to get summary statistics of the numeric columns
used_cars.describe()

In [16]:
used_cars.dtypes

S.No.                  int64
Name                  object
Location              object
Year                   int64
Kilometers_Driven      int64
Fuel_Type             object
Transmission          object
Owner_Type            object
Mileage               object
Engine                object
Power                 object
Seats                float64
New_Price            float64
Price                float64
dtype: object
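
The `dtypes` output shows `Mileage`, `Engine`, and `Power` stored as `object`, because each value carries a unit suffix (`kmpl`, `km/kg`, `CC`, `bhp`), as the `head()` rows show. A minimal sketch of one way to convert them, assuming every value starts with its number. The helper `leading_number` is a hypothetical name, not part of pandas, and the three rows are copied from the preview so the block runs without the CSV file:

```python
import pandas as pd

# Three rows copied from the used_cars.head() preview.
sample = pd.DataFrame(
    {
        "Mileage": ["26.6 km/kg", "19.67 kmpl", "18.2 kmpl"],
        "Engine": ["998 CC", "1582 CC", "1199 CC"],
        "Power": ["58.16 bhp", "126.2 bhp", "88.7 bhp"],
    }
)


def leading_number(column: pd.Series) -> pd.Series:
    """Keep the first whitespace-separated token and parse it as a number.

    Hypothetical helper: tokens that are not numeric become NaN.
    """
    token = column.astype(str).str.split().str[0]
    return pd.to_numeric(token, errors="coerce")


for name in ["Mileage", "Engine", "Power"]:
    sample[name] = leading_number(sample[name])

print(sample.dtypes)
```

Dropping the suffix treats `km/kg` and `kmpl` mileage as one scale, which is a simplification worth revisiting for CNG and LPG cars.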

## Exploratory Data Analysis

**Hint**: Univariate analysis - Bivariate analysis. Key meaningful observations on the relationship between variables.
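
As a starting point for this section, the sketch below shows one possible shape for the univariate and bivariate summaries. It uses only the five rows from the `head()` preview as a stand-in for the full `used_cars` frame, so the numbers it prints say nothing about the real data. The choice of summaries (describe, value counts, correlation with `Price`, median price per fuel type) is an assumption, not a prescribed checklist:

```python
import pandas as pd

# Five rows copied from the used_cars.head() preview; a stand-in for the full frame.
sample = pd.DataFrame(
    {
        "Year": [2010, 2015, 2011, 2012, 2013],
        "Kilometers_Driven": [72000, 41000, 46000, 87000, 40670],
        "Fuel_Type": ["CNG", "Diesel", "Petrol", "Diesel", "Diesel"],
        "New_Price": [5.51, 16.06, 8.61, 11.27, 53.14],
        "Price": [1.75, 12.5, 4.5, 6.0, 17.74],
    }
)

# Univariate: distribution summary of the target and category frequencies.
price_summary = sample["Price"].describe()
fuel_counts = sample["Fuel_Type"].value_counts()

# Bivariate: Pearson correlation of each numeric column with Price,
# and the median Price within each fuel type.
numeric = sample.select_dtypes(include="number")
corr_with_price = numeric.corr()["Price"].drop("Price").sort_values(ascending=False)
median_price_by_fuel = sample.groupby("Fuel_Type")["Price"].median()

print(price_summary, fuel_counts, corr_with_price, median_price_by_fuel, sep="\n\n")
```

On the full dataset the same calls can be paired with histograms, box plots, and scatter plots to back each observation.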

## Data pre-processing

**Hint**: Prepare the data for analysis and modeling - Missing value Treatment - Outlier Treatment - Feature Engineering
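
The three steps named in the hint could be sketched as follows. Everything here is one option among several: median imputation, IQR capping, and the derived columns `Brand`, `Car_Age`, and `Log_Price` are assumptions, as is the reference year used for the age. The names, years, and prices come from the `head()` preview, but one `Seats` value is blanked and one odometer reading is inflated on purpose so that each step has something to act on:

```python
import numpy as np
import pandas as pd

# Names, years and prices come from the head() preview. The NaN in Seats and
# the 4,000,000 km reading are MADE UP to demonstrate the treatments.
sample = pd.DataFrame(
    {
        "Name": [
            "Maruti Wagon R LXI CNG",
            "Hyundai Creta 1.6 CRDi SX Option",
            "Honda Jazz V",
            "Maruti Ertiga VDI",
            "Audi A4 New 2.0 TDI Multitronic",
        ],
        "Year": [2010, 2015, 2011, 2012, 2013],
        "Kilometers_Driven": [72000, 41000, 46000, 87000, 4_000_000],
        "Seats": [5.0, 5.0, np.nan, 7.0, 5.0],
        "Price": [1.75, 12.5, 4.5, 6.0, 17.74],
    }
)

# Missing value treatment: fill Seats with the column median.
sample["Seats"] = sample["Seats"].fillna(sample["Seats"].median())

# Outlier treatment: cap values lying more than 1.5 * IQR beyond the quartiles.
q1, q3 = sample["Kilometers_Driven"].quantile([0.25, 0.75])
iqr = q3 - q1
sample["Kilometers_Driven"] = sample["Kilometers_Driven"].clip(
    lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr
)

# Feature engineering.
REFERENCE_YEAR = 2020  # assumption: the year against which car age is measured
sample["Brand"] = sample["Name"].str.split().str[0]  # first word of Name
sample["Car_Age"] = REFERENCE_YEAR - sample["Year"]
sample["Log_Price"] = np.log(sample["Price"])  # compresses the right tail of Price

print(sample[["Brand", "Car_Age", "Kilometers_Driven", "Seats", "Log_Price"]])
```

Capping keeps the row while limiting its leverage on the regression. Dropping the row, or log-transforming `Kilometers_Driven`, are alternatives to weigh during EDA.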

## Model building - Linear Regression

**Hint**: Build the model and comment on the model statistics - Display model coefficients with column names
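
A minimal sketch of fitting ordinary least squares and displaying the coefficients next to their column names. It solves the least-squares problem directly with `numpy.linalg.lstsq`. A library estimator such as scikit-learn's `LinearRegression` or statsmodels' `OLS` would be the more usual choice when full model statistics are needed. The two features and the five preview rows are illustrative assumptions; a fit on five rows has no statistical meaning:

```python
import numpy as np
import pandas as pd

# Five rows copied from the used_cars.head() preview; far too few for a real model.
sample = pd.DataFrame(
    {
        "Kilometers_Driven": [72000, 41000, 46000, 87000, 40670],
        "New_Price": [5.51, 16.06, 8.61, 11.27, 53.14],
        "Price": [1.75, 12.5, 4.5, 6.0, 17.74],
    }
)

feature_names = ["Kilometers_Driven", "New_Price"]  # illustrative choice
X = sample[feature_names].to_numpy(dtype=float)
y = sample["Price"].to_numpy(dtype=float)

# Prepend a column of ones so the first coefficient acts as the intercept.
design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise ||design @ beta - y||^2.
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

# Label each coefficient with the column it belongs to.
coefficients = pd.Series(beta, index=["Intercept", *feature_names], name="coefficient")
fitted = design @ beta
residuals = y - fitted

print(coefficients)
```

Each slope reads as the expected change in `Price` (INR Lakhs) per unit change in that feature, holding the other fixed.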

## Model performance evaluation

**Hint**: Evaluate the model on different performance metrics
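
The metrics commonly reported for a regression model can be computed by hand as sketched below. The function name `regression_metrics` and the selection of metrics are assumptions. The `actual` values are the `Price` column from the `head()` preview, while the `predicted` values are invented purely to exercise the function:

```python
import numpy as np


def regression_metrics(y_true, y_pred, n_features):
    """Return RMSE, MAE, MAPE, R-squared and adjusted R-squared as a dict.

    Hypothetical helper; assumes y_true contains no zeros (needed for MAPE)
    and that there are more rows than n_features + 1.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    n = y_true.size
    ss_res = float(np.sum(residuals**2))  # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {
        "RMSE": float(np.sqrt(ss_res / n)),
        "MAE": float(np.mean(np.abs(residuals))),
        "MAPE (%)": float(np.mean(np.abs(residuals / y_true)) * 100),
        "R2": r2,
        "Adjusted R2": 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1),
    }


actual = [1.75, 12.5, 4.5, 6.0, 17.74]  # Price values from the head() preview
predicted = [2.0, 11.0, 5.0, 7.0, 16.0]  # MADE-UP predictions, for illustration only
metrics = regression_metrics(actual, predicted, n_features=2)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Computing the same metrics on both the training and the test split makes overfitting visible as a gap between the two.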


## Actionable Insights & Recommendations

**Hint**: Conclude with the key takeaways for the business