# __Raw Project__

## Project Content

<a id = 0></a>

### First Step: First Organization

1. [Introduction](#1)
2. [Loading libraries and packages to embark on our new journey](#2)
3. [Loading and Checking The Dataset](#3)

### Second Step: Data Preprocessing

4. [Exploratory Data Analysis](#4)
5. [Numeric Fields Analysis](#5)
6. [Categorical Fields Analysis](#6)
7. [Feature Scaling](#7)
8. [Correlation Analysis](#8)
9. [Dealing with Outliers](#9)
12. [Determining Distributions of Numeric Fields](#12)
13. [Applying One Hot Encoding to Categorical Fields](#13)
14. [Feature Scaling with The RobustScaler Method](#14)
15. [Separating Data into Two Parts of Train and Test](#15)

### Third and Final Step: Modeling

16. [Modelling (Logistic Regression)](#16)
17. [Cross Validation (Logistic Regression)](#17)
18. [AUC-ROC Curve (Logistic Regression)](#18)
19. [Hyper Parameter Optimization (Logistic Regression)](#19)
20. [Modelling (Decision Tree)](#20)
21. [Cross Validation (Decision Tree)](#21)
22. [AUC-ROC Curve (Decision Tree)](#22)
23. [Hyper Parameter Optimization (Decision Tree)](#23)
24. [Modelling (Support Vector Classifier)](#24)
25. [Cross Validation (Support Vector Classifier)](#25)
26. [AUC-ROC Curve (Support Vector Classifier)](#26)
27. [Hyper Parameter Optimization (Support Vector Classifier)](#27)
28. [Modelling (Random Forest)](#28)
29. [Cross Validation (Random Forest)](#29)
30. [AUC-ROC Curve (Random Forest)](#30)
31. [Hyper Parameter Optimization (Random Forest)](#31)

## 1. Introduction <a id = 1></a>

[Project Content](#0)

## 2. Loading libraries and packages to embark on our new journey <a id = 2></a>

In [95]:
# Basic Python Packages

import warnings
warnings.filterwarnings("ignore")

# Numpy Library

import numpy as np

# Pandas Library

import pandas as pd

# Visualization Libraries (Matplotlib, Seaborn, Missingno)

import matplotlib.pyplot as plt
import seaborn as sns
import missingno

%matplotlib inline

[Project Content](#0)

## 3. Loading and Checking The Dataset <a id = 3></a>

### Loading The Dataset

In [96]:
df = pd.read_csv("Car-Dataset.csv")

### Checking The Dataset

In [97]:
df.head(5)

|   | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | swift | 2014 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |


### Examining Missing Values

In [98]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


### Examining Number of Unique Values

In [99]:
# Count the distinct values in every column and present the result
# as a two-column DataFrame (column name, number of unique values).
unique_values_df = (
    df.nunique()
    .rename_axis("Column_Name")
    .reset_index(name="Unique_Values_Num")
)

unique_values_df

|   | Column_Name | Unique_Values_Num |
|---|---|---|
| 0 | Car_Name | 98 |
| 1 | Year | 16 |
| 2 | Selling_Price | 156 |
| 3 | Present_Price | 147 |
| 4 | Kms_Driven | 206 |
| 5 | Fuel_Type | 3 |
| 6 | Seller_Type | 2 |
| 7 | Transmission | 2 |
| 8 | Owner | 3 |


### Deleting Car Name Column

In [100]:
del df["Car_Name"]

df.head(5)

|   | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|
| 0 | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | 2014 | 4.6 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |


### Separating Fields to Numerical and Categorical

In [103]:
columns_list = list(df.columns)
cat_f = ["Fuel_Type", "Seller_Type", "Transmission", "Owner"]

# Build num_f as a new list so that columns_list is not modified as a side effect.
num_f = [column for column in columns_list if column not in cat_f]

print("Categorical fields are:      ", cat_f)
print("Numerical fields are:        ", num_f)

Categorical fields are:       ['Fuel_Type', 'Seller_Type', 'Transmission', 'Owner']
Numerical fields are:         ['Year', 'Selling_Price', 'Present_Price', 'Kms_Driven']


[Project Content](#0)

## 4. Exploratory Data Analysis <a id = 4></a>

### Examining Statistics of Numeric Fields
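
A minimal sketch of how this step could look. So that it runs on its own, it uses a small stand-in frame (`sample_df`, a hypothetical name) built from the five rows printed by `df.head(5)`; in the notebook, `df[num_f].describe()` would presumably play the same role on the full dataset.

```python
import pandas as pd

# Stand-in for df: the five rows shown by df.head(5), numeric fields only.
sample_df = pd.DataFrame({
    "Year": [2014, 2013, 2017, 2011, 2014],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60],
    "Present_Price": [5.59, 9.54, 9.85, 4.15, 6.87],
    "Kms_Driven": [27000, 43000, 6900, 5200, 42450],
})

# count, mean, std, min, quartiles and max, one row per numeric field.
numeric_stats = sample_df.describe().T
print(numeric_stats)
```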

### Examining Statistics of Categorical Fields
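
One possible way to summarise the categorical fields is to count how often each level occurs. The sketch below assumes the `cat_f` list defined earlier and, to stay self-contained, a small stand-in frame (`sample_df`, hypothetical) taken from the rows of `df.head(5)`.

```python
import pandas as pd

# Stand-in for df: categorical fields from the five rows of df.head(5).
sample_df = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Petrol", "Diesel"],
    "Seller_Type": ["Dealer"] * 5,
    "Transmission": ["Manual"] * 5,
    "Owner": [0, 0, 0, 0, 0],
})
cat_f = ["Fuel_Type", "Seller_Type", "Transmission", "Owner"]

# Frequency of every level, one field at a time.
level_counts = {column: sample_df[column].value_counts() for column in cat_f}

for column, counts in level_counts.items():
    print(counts, end="\n\n")
```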

### Examining The Missing Data
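
The `df.info()` output above reports 301 non-null entries in every column, so on this dataset the counts below would all be zero. The sketch therefore uses a hypothetical frame (`sample_df`) with two gaps injected on purpose, only to show what the check could look like.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one numeric and one categorical value removed on purpose.
sample_df = pd.DataFrame({
    "Present_Price": [5.59, np.nan, 9.85, 4.15, 6.87],
    "Fuel_Type": ["Petrol", "Diesel", None, "Petrol", "Diesel"],
})

# Number of missing values per column.
missing_counts = sample_df.isnull().sum()

# Share of missing values per column, in percent.
missing_share = sample_df.isnull().mean() * 100

print(pd.DataFrame({"missing": missing_counts, "percent": missing_share}))
```

Since `missingno` is already imported, `missingno.matrix(df)` could give a visual version of the same check.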

### Filling The Missing Values
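
Because `df.info()` reported no nulls, this step may turn out to be a no-op for this dataset. If gaps did exist, one common choice (an assumption here, not necessarily the approach intended for this project) is the median for numeric fields and the most frequent level for categorical ones. `sample_df` is again a hypothetical frame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap in each column.
sample_df = pd.DataFrame({
    "Present_Price": [5.59, np.nan, 9.85, 4.15, 6.87],
    "Fuel_Type": ["Petrol", "Diesel", None, "Petrol", "Petrol"],
})

filled_df = sample_df.copy()

# Numeric field: the median is less sensitive to extreme prices than the mean.
price_median = filled_df["Present_Price"].median()
filled_df["Present_Price"] = filled_df["Present_Price"].fillna(price_median)

# Categorical field: use the most frequent level.
fuel_mode = filled_df["Fuel_Type"].mode()[0]
filled_df["Fuel_Type"] = filled_df["Fuel_Type"].fillna(fuel_mode)

print(filled_df)
```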

[Project Content](#0)

## 5. Numeric Fields Analysis <a id = 5></a>

### Bi-Variate Analysis between Numeric Fields and Target

### Analysis between Numeric Fields Among Themselves

### Correlation Analysis between Numeric Fields and Target
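
A sketch of what this analysis could look like. The notebook does not name the target at this point, so `Selling_Price` is assumed to be the target purely for illustration; `sample_df` is a hypothetical stand-in built from the rows of `df.head(5)`.

```python
import pandas as pd

# Stand-in for df[num_f]: the five rows shown by df.head(5).
sample_df = pd.DataFrame({
    "Year": [2014, 2013, 2017, 2011, 2014],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60],
    "Present_Price": [5.59, 9.54, 9.85, 4.15, 6.87],
    "Kms_Driven": [27000, 43000, 6900, 5200, 42450],
})

target = "Selling_Price"  # assumption: not stated in the notebook

# Pearson correlation of every numeric field with the assumed target.
corr_with_target = (
    sample_df.corr()[target]
    .drop(target)
    .sort_values(ascending=False)
)
print(corr_with_target)
```

With seaborn already imported, `sns.heatmap(sample_df.corr(), annot=True)` would show the full matrix instead of a single column.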

[Project Content](#0)

## 6. Categorical Fields Analysis <a id = 6></a>

### Bi-Variate Analysis between Categorical Fields and Target

### Correlation Analysis between Categorical Fields and Target

[Project Content](#0)

## 7. Feature Scaling <a id = 7></a>

[Project Content](#0)

## 8. Correlation Analysis <a id = 8></a> 

### Numeric fields and Target Value Correlation Analysis

### Numeric and Categorical Fields Correlation Analysis

### Dropping Columns with Low Correlation

[Project Content](#0)

## 9. Dealing with Outliers <a id = 9></a> 
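
One widely used option for this step is the interquartile-range (IQR) rule; whether it is the method intended here is an assumption. The sketch uses the `Kms_Driven` values from `df.head(5)` plus one made-up extreme value (500000) so that there is something to detect.

```python
import pandas as pd

# First five Kms_Driven values from df.head(5); the last value is hypothetical.
kms = pd.Series([27000, 43000, 6900, 5200, 42450, 500000], name="Kms_Driven")

# Fences at 1.5 * IQR below the first and above the third quartile.
q1 = kms.quantile(0.25)
q3 = kms.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Flag values outside the fences, then cap them instead of dropping rows.
outlier_mask = (kms < lower) | (kms > upper)
clipped = kms.clip(lower, upper)

print(kms[outlier_mask])
print(clipped)
```

Capping keeps all rows, which matters for a dataset of only 301 entries; dropping the flagged rows is the alternative.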

[Project Content](#0)

## 12. Determining Distributions of Numeric Fields <a id = 12></a>

[Project Content](#0)

## 13. Applying One Hot Encoding to Categorical Fields <a id = 13></a>
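
A minimal sketch using `pd.get_dummies`. The frame below is a hypothetical stand-in; the `"Automatic"` level is an assumed second value of `Transmission`, since `df.head(5)` only shows `"Manual"`.

```python
import pandas as pd

# Hypothetical stand-in for df; "Automatic" is an assumed level.
sample_df = pd.DataFrame({
    "Kms_Driven": [27000, 43000, 6900, 5200, 42450],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Petrol", "Diesel"],
    "Transmission": ["Manual", "Manual", "Automatic", "Manual", "Manual"],
})

# One indicator column per level; drop_first removes one level per field
# so the indicators are not perfectly collinear.
encoded_df = pd.get_dummies(
    sample_df,
    columns=["Fuel_Type", "Transmission"],
    drop_first=True,
)
print(encoded_df)
```

`drop_first=True` matters mostly for linear models such as logistic regression; tree-based models usually work with or without it.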

[Project Content](#0)

## 14. Feature Scaling with The RobustScaler Method <a id = 14></a>
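
A sketch of this step with scikit-learn's `RobustScaler`, which by default subtracts the median of each column and divides by its interquartile range. `sample_df` is a hypothetical stand-in built from the rows of `df.head(5)`.

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Stand-in for the numeric part of df.
sample_df = pd.DataFrame({
    "Present_Price": [5.59, 9.54, 9.85, 4.15, 6.87],
    "Kms_Driven": [27000, 43000, 6900, 5200, 42450],
})

scaler = RobustScaler()  # defaults: centre on the median, scale by the IQR

# fit_transform returns a NumPy array, so wrap it to keep the column names.
scaled_df = pd.DataFrame(
    scaler.fit_transform(sample_df),
    columns=sample_df.columns,
    index=sample_df.index,
)
print(scaled_df)
```

To avoid leaking information from the test set, a common practice is to call `fit` on the training rows only and then `transform` both parts.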

[Project Content](#0)

## 15. Separating Data into Two Parts of Train and Test <a id = 15></a>
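
A minimal sketch with `train_test_split`. The arrays `X` and `y` are toy placeholders, and the 80/20 ratio, the seed, and the stratification are assumptions rather than settings taken from this project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy placeholders: 20 rows, 2 features, balanced binary labels.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,     # assumed ratio: 80 % train, 20 % test
    random_state=42,   # fixed seed so the split is reproducible
    stratify=y,        # keep the class balance identical in both parts
)

print(X_train.shape, X_test.shape)
```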

[Project Content](#0)

## 16. Modelling (Logistic Regression) <a id = 16></a>
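
The columns shown earlier do not include a binary target, so the sketch below falls back on synthetic data from `make_classification`. It is meant only to illustrate the fit/predict pattern; every setting in it is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (not the car dataset).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# max_iter is raised so the solver has room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", test_accuracy)
```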

## 17. Cross Validation (Logistic Regression) <a id = 17></a>
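
A sketch of k-fold cross validation with `cross_val_score`, again on synthetic data because no binary target is defined at this point. Five folds and accuracy as the metric are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data (not the car dataset).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

model = LogisticRegression(max_iter=1000)

# Train and evaluate on 5 different train/validation partitions.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

The spread of the fold scores gives a rough idea of how much a single train/test split could be off.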

## 18. AUC-ROC Curve (Logistic Regression) <a id = 18></a>
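
The ROC curve is built from predicted probabilities rather than hard class labels. A minimal sketch on synthetic data (an assumption, as no binary target exists in the columns shown so far):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (not the car dataset).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of the positive class (second column of predict_proba).
y_score = model.predict_proba(X_test)[:, 1]

# Points of the curve and the area under it.
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)
print("AUC:", auc)
```

Plotting is then a matter of `plt.plot(fpr, tpr)` together with the diagonal `plt.plot([0, 1], [0, 1], linestyle="--")` as the chance-level reference.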

## 19. Hyper Parameter Optimization (Logistic Regression) <a id = 19></a>
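
One way to approach this step is an exhaustive grid search with `GridSearchCV`. The grid over the regularisation strength `C`, the five folds, and the `roc_auc` metric are all illustrative assumptions, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary-classification data (not the car dataset).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Candidate values for the inverse regularisation strength (assumed grid).
param_grid = {"C": [0.01, 0.1, 1, 10]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="roc_auc",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best mean CV score:", search.best_score_)
```

The same pattern should carry over to the other models in this step by swapping the estimator and the parameter grid, for example `max_depth` for a decision tree.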

## 20. Modelling (Decision Tree) <a id = 20></a>

## 21. Cross Validation (Decision Tree) <a id = 21></a>

## 22. AUC-ROC Curve (Decision Tree) <a id = 22></a>

## 23. Hyper Parameter Optimization (Decision Tree) <a id = 23></a>

## 24. Modelling (Support Vector Classification) <a id = 24></a>

## 25. Cross Validation (Support Vector Classification) <a id = 25></a>

## 26. AUC-ROC Curve (Support Vector Classification) <a id = 26></a>

## 27. Hyper Parameter Optimization (Support Vector Classification) <a id = 27></a>

## 28. Modelling (Random Forest Classification) <a id = 28></a>

## 29. Cross Validation (Random Forest Classification) <a id = 29></a>

## 30. AUC-ROC Curve (Random Forest Classification) <a id = 30></a>

## 31. Hyper Parameter Optimization (Random Forest Classification) <a id = 31></a>