# Sports Cars Price Analysis

## About Dataset

This dataset contains information about the prices of different sports cars from various manufacturers. The dataset includes the make and model of the car, the year of production, the engine size, the horsepower, the torque, the 0-60 MPH time, and the price in USD. The dataset is useful for analyzing the prices of different sports cars and identifying trends in the market.

### Columns

1. Car Make: The make of the sports car, which represents the brand or company that produced the car. Examples of car makes in this dataset include Porsche, Lamborghini, Ferrari, Audi, and McLaren.
2. Car Model: The model of the sports car, which represents the specific version or variant of the car produced by the manufacturer. Examples of car models in this dataset include 911, Huracan, 488 GTB, R8, 720S, M8, AMG GT, Corvette, Mustang Shelby GT500, and GT-R Nismo.
3. Year: The year of production of the sports car, which indicates the model year when the car was first introduced or made available for purchase.
4. Engine Size (L): The size of the sports car's engine in liters, which represents the volume of the engine's cylinders. A larger engine size typically indicates higher power and performance. Engine sizes in this dataset range from 2.0L to 8.0L, with some cars having electric motors instead.
5. Horsepower: The horsepower of the sports car, which represents the power output of the car's engine. Higher horsepower typically indicates faster acceleration and higher top speed. Horsepower values in this dataset range from 300 to 1479.
6. Torque (lb-ft): The torque of the sports car in pound-feet, which represents the rotational force generated by the engine. Higher torque values typically indicate stronger acceleration. Torque values in this dataset range from 270 to 1180.
7. 0-60 MPH Time (seconds): The time it takes for the sports car to accelerate from 0 to 60 miles per hour, which is a common measure of acceleration and performance. Lower 0-60 MPH times typically indicate faster acceleration and better performance. 0-60 MPH times in this dataset range from 1.85 to 5.3 seconds.
8. Price (in USD): The price of the sports car in US dollars, which represents the cost of purchasing the car. Prices in this dataset range from $25,000 to $3,000,000.

Source: https://www.kaggle.com/datasets/rkiattisak/sports-car-prices-dataset

## Importing Important Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
%matplotlib inline

## Section - 1: Importing the DataSet

In [75]:
sportscar = pd.read_csv('Sport car price.csv')

## Section - 2: Exploring the Dataset

In [76]:
sportscar.head(100)

Unnamed: 0,Car Make,Car Model,Year,Engine Size (L),Horsepower,Torque (lb-ft),0-60 MPH Time (seconds),Price (in USD)
0,Porsche,911,2022,3,379,331,4,101200
1,Lamborghini,Huracan,2021,5.2,630,443,2.8,274390
2,Ferrari,488 GTB,2022,3.9,661,561,3,333750
3,Audi,R8,2022,5.2,562,406,3.2,142700
4,McLaren,720S,2021,4,710,568,2.7,298000
...,...,...,...,...,...,...,...,...
95,Pagani,Huayra,2022,6,764,738,2.8,2800000
96,Porsche,718 Cayman GT4,2022,4,414,309,4.2,100550
97,Rimac,Nevera,2022,Electric,1914,1696,1.95,2400000
98,Rolls-Royce,Wraith,2021,6.8,624,605,4.4,330000


The columns 'Engine Size (L)', 'Horsepower', 'Torque (lb-ft)', and '0-60 MPH Time (seconds)' can be used as features for predicting the price of a car.

In [77]:
type(sportscar)

pandas.core.frame.DataFrame

In [78]:
sportscar.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1007 entries, 0 to 1006
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   Car Make                 1007 non-null   object
 1   Car Model                1007 non-null   object
 2   Year                     1007 non-null   int64 
 3   Engine Size (L)          997 non-null    object
 4   Horsepower               1007 non-null   object
 5   Torque (lb-ft)           1004 non-null   object
 6   0-60 MPH Time (seconds)  1007 non-null   object
 7   Price (in USD)           1007 non-null   object
dtypes: int64(1), object(7)
memory usage: 63.1+ KB


In [79]:
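# Let's fill the null values with a placeholder string.
# Assigning the result back avoids the chained inplace call, which newer pandas versions warn about.
sportscar["Engine Size (L)"] = sportscar["Engine Size (L)"].fillna("No Value")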

In [80]:
sportscar.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1007 entries, 0 to 1006
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   Car Make                 1007 non-null   object
 1   Car Model                1007 non-null   object
 2   Year                     1007 non-null   int64 
 3   Engine Size (L)          1007 non-null   object
 4   Horsepower               1007 non-null   object
 5   Torque (lb-ft)           1004 non-null   object
 6   0-60 MPH Time (seconds)  1007 non-null   object
 7   Price (in USD)           1007 non-null   object
dtypes: int64(1), object(7)
memory usage: 63.1+ KB


In [81]:
# Same placeholder for the missing torque values.
sportscar["Torque (lb-ft)"] = sportscar["Torque (lb-ft)"].fillna("No Value")

In [82]:
sportscar.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1007 entries, 0 to 1006
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   Car Make                 1007 non-null   object
 1   Car Model                1007 non-null   object
 2   Year                     1007 non-null   int64 
 3   Engine Size (L)          1007 non-null   object
 4   Horsepower               1007 non-null   object
 5   Torque (lb-ft)           1007 non-null   object
 6   0-60 MPH Time (seconds)  1007 non-null   object
 7   Price (in USD)           1007 non-null   object
dtypes: int64(1), object(7)
memory usage: 63.1+ KB
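
As a more compact alternative to rereading the full `info()` output after each fill, the null counts per column can be read directly. The following is a minimal sketch on a toy DataFrame; the values are illustrative stand-ins, not rows from the real file, and filling both columns in one call is an assumption about how the two steps above could be combined.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the two columns that contain nulls (illustrative values).
df = pd.DataFrame({
    "Engine Size (L)": ["4", np.nan, "6.2"],
    "Torque (lb-ft)": ["331", "443", np.nan],
})

# Count missing entries per column before filling.
missing_before = df.isna().sum()
print(missing_before)

# Fill both columns with the same placeholder in one call.
df = df.fillna({"Engine Size (L)": "No Value", "Torque (lb-ft)": "No Value"})

# Count again to confirm nothing is left.
missing_after = df.isna().sum()
print(missing_after)
```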


### Let's check what kinds of values the different columns hold.

### 1. Engine Size Column

In [84]:
sportscar.value_counts('Engine Size (L)')

Engine Size (L)
4                       219
6.2                     113
3                        85
3.5                      79
5                        68
6.5                      46
3.8                      38
Electric                 36
3.7                      35
2                        34
3.9                      30
2.9                      30
5.2                      29
6                        28
2.5                      25
4.7                      23
8                        23
4.4                      11
No Value                 10
6.8                       6
1.7                       4
6.6                       3
Electric Motor            3
1.8                       3
8.4                       3
Hybrid                    2
1.5                       2
Hybrid (4.0)              1
Electric (tri-motor)      1
Electric (93 kWh)         1
Electric (100 kWh)        1
1.5 + Electric            1
7                         1
4.0 (Hybrid)              1
6.7                       1
3.6 

As we can see above, the 'Engine Size (L)' column mixes numeric values with text labels such as 'Electric' and 'Hybrid', which will make it difficult to use when predicting car prices.
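
One way to list only the problematic entries, instead of scanning the full value counts, is to try parsing every value as a number and keep the ones that fail. This is a minimal sketch on a toy Series; the values are illustrative stand-ins for the real column.

```python
import pandas as pd

# Toy stand-in for the 'Engine Size (L)' column (illustrative values).
engine = pd.Series(
    ["4", "6.2", "Electric", "Hybrid (4.0)", "No Value", "-", "3.5"],
    name="Engine Size (L)",
)

# With errors="coerce", anything that cannot be parsed as a number becomes NaN.
parsed = pd.to_numeric(engine, errors="coerce")

# Keep the original strings that failed to parse, together with their counts.
non_numeric = engine[parsed.isna()].value_counts()
print(non_numeric)
```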

Possible Solutions:
1. Most of the engines here are conventional and very few are electric or hybrid, so we can move those rows into a separate dataset for electric and hybrid cars. Apart from the missing entries, only numerical values will then be left in the Engine Size column.
2. Try to find engine sizes for each electric or hybrid vehicle (not preferred).

In [85]:
# Creating a new Dataset for Electric and Hybrid Engine Cars.
Ecars = pd.DataFrame(columns=sportscar.columns)

In [86]:
Ecars

Unnamed: 0,Car Make,Car Model,Year,Engine Size (L),Horsepower,Torque (lb-ft),0-60 MPH Time (seconds),Price (in USD)


In [87]:
# Select every row whose engine entry mentions "Electric" or "Hybrid".
# A boolean mask replaces the row-by-row loop and repeated pd.concat calls.
engine = sportscar['Engine Size (L)'].astype(str)
is_electrified = engine.str.contains('Electric|Hybrid')
Ecars = sportscar[is_electrified].copy()

In [88]:
Ecars

Unnamed: 0,Car Make,Car Model,Year,Engine Size (L),Horsepower,Torque (lb-ft),0-60 MPH Time (seconds),Price (in USD)
26,Rimac,Nevera,2022,Electric,1914,1696,1.85,2400000
37,Porsche,Taycan 4S,2022,Electric Motor,562,479,3.8,104000
42,BMW,i8,2020,1.5 + Electric,369,420,4.2,148500
97,Rimac,Nevera,2022,Electric,1914,1696,1.95,2400000
99,Tesla,Roadster,2022,Electric,1000+,737,1.9,200000
185,Porsche,Taycan,2021,Electric,750,774,2.6,185000
278,Rimac,C_Two,2022,Electric,1914,1732,1.85,2400000
280,Pininfarina,Battista,2022,Electric,1874,1696,1.9,2500000
299,Porsche,Taycan,2022,Electric,616,774,2.6,79900
300,Tesla,Model S,2022,Electric,1020,1050,1.98,119000


#### Now that we have created a 2nd dataset, we can drop those rows from the main dataset.

In [89]:
# Drop every row whose engine entry mentions "Electric" or "Hybrid".
# The original index labels are kept, so gaps appear where rows were removed.
is_electrified = sportscar['Engine Size (L)'].astype(str).str.contains('Electric|Hybrid')
sportscar = sportscar[~is_electrified].copy()
sportscar

Unnamed: 0,Car Make,Car Model,Year,Engine Size (L),Horsepower,Torque (lb-ft),0-60 MPH Time (seconds),Price (in USD)
0,Porsche,911,2022,3,379,331,4,101200
1,Lamborghini,Huracan,2021,5.2,630,443,2.8,274390
2,Ferrari,488 GTB,2022,3.9,661,561,3,333750
3,Audi,R8,2022,5.2,562,406,3.2,142700
4,McLaren,720S,2021,4,710,568,2.7,298000
...,...,...,...,...,...,...,...,...
1000,Aston Martin,Vantage,2021,4,503,505,3.6,146000
1001,Bugatti,Chiron,2021,8,1479,1180,2.4,3000000
1002,Koenigsegg,Jesko,2022,5,1280,1106,2.5,3000000
1004,McLaren,Senna,2021,4,789,590,2.7,1000000


In [90]:
sportscar.value_counts('Engine Size (L)')

Engine Size (L)
4           219
6.2         113
3            85
3.5          79
5            68
6.5          46
3.8          38
3.7          35
2            34
2.9          30
3.9          30
5.2          29
6            28
2.5          25
8            23
4.7          23
4.4          11
No Value     10
6.8           6
1.7           4
1.8           3
6.6           3
8.4           3
1.5           2
6.4           1
7             1
6.7           1
-             1
6.3           1
5.7           1
5.5           1
0             1
3.6           1
3.3           1
2.3           1
4.6           1
dtype: int64

The only remaining issue in the 'Engine Size (L)' column is the missing values that we earlier filled in as "No Value", along with a single "-" entry. Possible solutions are:
1. Replace each missing value with the mean of all valid values. This is quick but inaccurate, because it ignores the actual engine of each car.
2. Look up each missing value online using the car make and model and fill it in manually, which is accurate but very time consuming.

For now I am opting for the 1st method.
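
The mean-imputation idea can be sketched in a few lines with `pd.to_numeric` and `fillna`. This is a sketch under stated assumptions rather than the method used in the cells of this notebook: the Series holds illustrative values, and rounding the mean to one decimal is an assumption chosen to match the precision of the other entries.

```python
import pandas as pd

# Toy stand-in for the column (illustrative values, not the real data).
engine = pd.Series(["4", "6.2", "No Value", "-", "3"], name="Engine Size (L)")

# Placeholders such as "No Value" and "-" become NaN.
numeric = pd.to_numeric(engine, errors="coerce")

# Series.mean() skips NaN by default, so only the valid entries are averaged.
mean_size = numeric.mean()

# Fill the gaps with the mean, rounded to one decimal place.
imputed = numeric.fillna(round(mean_size, 1))
print(imputed)
```

Because `pd.to_numeric` already returns floats, this variant also covers the string-to-float conversion in the same pass.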

In [107]:
# Calculating the mean of the 'Engine Size (L)' column,
# skipping the placeholder entries that cannot be converted to float.
temp = []
for i in sportscar['Engine Size (L)']:
    if i not in ('No Value', '-'):
        temp.append(float(i))
temp_mean = sum(temp)/len(temp)

In [108]:
temp_mean

4.39578059071729

In [110]:
# Replacing the missing and placeholder values with the mean, rounded to 4.4.
sportscar['Engine Size (L)'] = sportscar['Engine Size (L)'].replace(['No Value', '-'], '4.4')

In [111]:
sportscar

Unnamed: 0,Car Make,Car Model,Year,Engine Size (L),Horsepower,Torque (lb-ft),0-60 MPH Time (seconds),Price (in USD)
0,Porsche,911,2022,3,379,331,4,101200
1,Lamborghini,Huracan,2021,5.2,630,443,2.8,274390
2,Ferrari,488 GTB,2022,3.9,661,561,3,333750
3,Audi,R8,2022,5.2,562,406,3.2,142700
4,McLaren,720S,2021,4,710,568,2.7,298000
...,...,...,...,...,...,...,...,...
1000,Aston Martin,Vantage,2021,4,503,505,3.6,146000
1001,Bugatti,Chiron,2021,8,1479,1180,2.4,3000000
1002,Koenigsegg,Jesko,2022,5,1280,1106,2.5,3000000
1004,McLaren,Senna,2021,4,789,590,2.7,1000000


In [112]:
sportscar.value_counts('Engine Size (L)')

Engine Size (L)
4      219
6.2    113
3       85
3.5     79
5       68
6.5     46
3.8     38
3.7     35
2       34
2.9     30
3.9     30
5.2     29
6       28
2.5     25
8       23
4.7     23
4.4     22
6.8      6
1.7      4
1.8      3
6.6      3
8.4      3
1.5      2
6.4      1
6.7      1
7        1
0        1
6.3      1
5.7      1
5.5      1
3.6      1
3.3      1
2.3      1
4.6      1
dtype: int64

Now we just need to convert all the string values to float and our 'Engine Size (L)' column will be ready.

In [113]:
# Converting str to float
sportscar['Engine Size (L)'] = sportscar['Engine Size (L)'].astype(float)
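
After the conversion it may be worth confirming the dtype and looking for implausible values; the value counts above include one entry of 0, for example. The following is a minimal sketch on a toy Series with illustrative values, and the `<= 0` threshold is an assumption about what counts as implausible.

```python
import pandas as pd

# Toy stand-in for the cleaned column, converted the same way as above.
engine = pd.Series(["4", "6.2", "0", "4.4"], name="Engine Size (L)").astype(float)

# Confirm the column is now numeric.
print(engine.dtype)

# Flag entries that cannot be a real displacement (threshold is an assumption).
suspicious = engine[engine <= 0]
print(suspicious)
```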