# Car Dataset Exploratory Data Analysis (EDA) Project

## Introduction
In this project, I will conduct an in-depth exploratory data analysis (EDA) on a car dataset. The goal is to clean, analyze, and gain insights into the dataset by leveraging various data manipulation techniques. This will involve identifying patterns, understanding the distribution of data, and performing operations that help in better data interpretation. The project will cover essential data analysis tasks such as data cleaning, filtering, and transformation.

## Commands Used in This Project:
- **`import pandas as pd`**: To import the Pandas library for data manipulation and analysis.
- **`pd.read_csv`**: Reads a CSV file into a pandas DataFrame.
- **`head()`**: Displays the first N rows of the data (default is 5).
- **`shape`**: Shows the total number of rows and columns in the DataFrame.
- **`df.isnull().sum()`**: Detects missing values in each column of the DataFrame.
- **`fillna()`**: Fills null values in a column with a specified value.
- **`value_counts`**: Shows the unique values and their counts in a single column.
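- **`isin()`**: Returns a boolean mask that is `True` where a value appears in a given list; indexing with that mask keeps the matching records.
- **`apply()`**: Applies a function to each column or row of a DataFrame (chosen with `axis`), or to each element of a Series.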
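To make the last two entries concrete, here is a minimal sketch on a small hypothetical DataFrame (the `toy` values are made up for illustration, not taken from `car.csv`). It shows that `isin()` returns a boolean Series, and that `apply()` passes each element when called on a Series but each column (default `axis=0`) or each row (`axis=1`) when called on a DataFrame.

```python
import pandas as pd

# Hypothetical toy data, not taken from car.csv
toy = pd.DataFrame({
    "Origin": ["Asia", "USA", "Europe"],
    "MPG_City": [24.0, 18.0, 20.0],
    "MPG_Highway": [31.0, 25.0, 28.0],
})

# isin() returns a boolean mask: True where the value is in the list
mask = toy["Origin"].isin(["Asia", "Europe"])
print(mask.tolist())          # [True, False, True]
print(toy[mask])              # keeps rows 0 and 2

# Series.apply: the function receives each element
plus_three = toy["MPG_City"].apply(lambda x: x + 3)
print(plus_three.tolist())    # [27.0, 21.0, 23.0]

# DataFrame.apply with the default axis=0: the function receives each column
col_max = toy[["MPG_City", "MPG_Highway"]].apply(lambda col: col.max())
print(col_max.to_dict())      # {'MPG_City': 24.0, 'MPG_Highway': 31.0}

# DataFrame.apply with axis=1: the function receives each row
row_avg = toy[["MPG_City", "MPG_Highway"]].apply(lambda row: row.mean(), axis=1)
print(row_avg.tolist())       # [27.5, 21.5, 24.0]
```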

## Questions and Instructions:
1. **Data Cleaning**: Identify all null values in the dataset. If any null values are found in a numeric column, replace them with the mean of that column; text columns, which have no mean, are filled with their most frequent value (the mode).
2. **Value Counts**: What are the different types of car makes in the dataset? Display the count of occurrences for each make.
3. **Filtering**: Show all records where the origin of the car is either Asia or Europe.
4. **Removing Unwanted Records**: Remove all rows where the car's weight exceeds 4000.
5. **Applying Functions on a Column**: Increase all values in the 'MPG_City' column by 3 to analyze the potential impact on fuel efficiency.

## Objective
By the end of this project, the aim is to have a cleaner and more insightful version of the car dataset, with key findings on car types, regional origins, and performance metrics. This will help understand trends, identify anomalies, and prepare the data for further analysis or machine learning tasks.


In [1]:
import pandas as pd

In [2]:
car = pd.read_csv("car.csv")

In [3]:
car.head()

| | Make | Model | Type | Origin | DriveTrain | MSRP | Invoice | EngineSize | Cylinders | Horsepower | MPG_City | MPG_Highway | Weight | Wheelbase | Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acura | MDX | SUV | Asia | All | $36,945 | $33,337 | 3.5 | 6.0 | 265.0 | 17.0 | 23.0 | 4451.0 | 106.0 | 189.0 |
| 1 | Acura | RSX Type S 2dr | Sedan | Asia | Front | $23,820 | $21,761 | 2.0 | 4.0 | 200.0 | 24.0 | 31.0 | 2778.0 | 101.0 | 172.0 |
| 2 | Acura | TSX 4dr | Sedan | Asia | Front | $26,990 | $24,647 | 2.4 | 4.0 | 200.0 | 22.0 | 29.0 | 3230.0 | 105.0 | 183.0 |
| 3 | Acura | TL 4dr | Sedan | Asia | Front | $33,195 | $30,299 | 3.2 | 6.0 | 270.0 | 20.0 | 28.0 | 3575.0 | 108.0 | 186.0 |
| 4 | Acura | 3.5 RL 4dr | Sedan | Asia | Front | $43,755 | $39,014 | 3.5 | 6.0 | 225.0 | 18.0 | 24.0 | 3880.0 | 115.0 | 197.0 |


In [5]:
car.shape

(432, 15)

In [7]:
car.isnull().sum()

Make           4
Model          4
Type           4
Origin         4
DriveTrain     4
MSRP           4
Invoice        4
EngineSize     4
Cylinders      6
Horsepower     4
MPG_City       4
MPG_Highway    4
Weight         4
Wheelbase      4
Length         4
dtype: int64
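Every column reports exactly 4 missing values (6 for `Cylinders`), which hints, but does not prove, that a few rows may be completely empty. Filling such rows with means and modes would invent whole cars. A minimal sketch of how to check this before imputing, on a hypothetical toy frame: `isnull().all(axis=1)` flags rows in which every value is missing, and `dropna(how="all")` removes only those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one fully empty row (index 2); values are made up
toy = pd.DataFrame({
    "Make": ["Acura", "Audi", np.nan, "BMW"],
    "Cylinders": [6.0, np.nan, np.nan, 6.0],
    "Weight": [4451.0, 3450.0, np.nan, 3219.0],
})

print(toy.isnull().sum())                 # per-column null counts

# Rows in which *every* column is null
empty_rows = toy.isnull().all(axis=1)
print(int(empty_rows.sum()))              # 1

# Drop only the fully empty rows; partially filled rows are kept
cleaned = toy.dropna(how="all")
print(cleaned.shape)                      # (3, 3)
print(cleaned.isnull().sum().to_dict())   # {'Make': 0, 'Cylinders': 1, 'Weight': 0}
```

The remaining partial gaps (here, one missing `Cylinders` value) can then be filled with the mean or mode as in the next cell.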

In [9]:
# Replace null values in numerical columns with the mean of that column
numerical_columns = car.select_dtypes(include=['float64', 'int64']).columns   # select_dtypes() keeps only the columns whose dtype is in the list (here, numeric columns)
car[numerical_columns] = car[numerical_columns].fillna(car[numerical_columns].mean()) 

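# To fill a single column, e.g. 'Cylinders': car['Cylinders'] = car['Cylinders'].fillna(car['Cylinders'].mean())
# (assigning the result back is preferred over fillna(..., inplace=True) on a column, which newer pandas versions warn about)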

# Replace null values in categorical (text) columns with the mode; mode() can return several rows when values tie, so .iloc[0] takes the first
categorical_columns = car.select_dtypes(include=['object']).columns
car[categorical_columns] = car[categorical_columns].fillna(car[categorical_columns].mode().iloc[0])

# Verify that there are no more null values
print(car.isnull().sum())


Make           0
Model          0
Type           0
Origin         0
DriveTrain     0
MSRP           0
Invoice        0
EngineSize     0
Cylinders      0
Horsepower     0
MPG_City       0
MPG_Highway    0
Weight         0
Wheelbase      0
Length         0
dtype: int64
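`select_dtypes` groups columns by their stored dtype, so `MSRP` and `Invoice`, which hold text such as `"$36,945"`, end up among the categorical columns above and are filled with the mode rather than the mean. A minimal sketch, assuming the prices always use a leading `$` and `,` as the thousands separator, of converting such a column to numbers first so that a mean fill becomes possible (the sample values are hypothetical, in the same format as the `head()` output):

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same format as the MSRP column of car.csv
prices = pd.DataFrame({"MSRP": ["$36,945", "$23,820", np.nan, "$26,990"]})

# Text column: select_dtypes would not treat it as numeric
print(prices["MSRP"].dtype)

# Strip "$" and "," then convert to numbers; the missing value stays missing
prices["MSRP"] = pd.to_numeric(
    prices["MSRP"].str.replace(r"[$,]", "", regex=True)
)

# Now the numeric mean fill works as for the other columns
prices["MSRP"] = prices["MSRP"].fillna(prices["MSRP"].mean())
print(prices["MSRP"].tolist())
```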


In [10]:
car.value_counts('Make') # car['Make'].value_counts() alternative method

Make
Toyota           32
Chevrolet        27
Mercedes-Benz    26
Ford             23
BMW              20
Audi             19
Honda            17
Nissan           17
Volkswagen       15
Chrysler         15
Dodge            13
Mitsubishi       13
Volvo            12
Jaguar           12
Hyundai          12
Subaru           11
Pontiac          11
Mazda            11
Lexus            11
Kia              11
Buick             9
Mercury           9
Lincoln           9
Saturn            8
Cadillac          8
Suzuki            8
Infiniti          8
GMC               8
Acura             7
Porsche           7
Saab              7
Land Rover        3
Oldsmobile        3
Jeep              3
Scion             2
Isuzu             2
MINI              2
Hummer            1
dtype: int64
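The comment in the cell above mentions `car['Make'].value_counts()` as an alternative. A minimal sketch on hypothetical data showing that both spellings produce the same counts, plus the `normalize=True` option, which turns counts into proportions of the total:

```python
import pandas as pd

# Hypothetical makes, chosen so every count is distinct (no ties in the sort order)
toy = pd.DataFrame({"Make": ["Toyota", "Ford", "Toyota", "BMW", "Toyota", "Ford"]})

by_frame = toy.value_counts("Make")       # DataFrame.value_counts with a subset
by_series = toy["Make"].value_counts()    # Series.value_counts

# Both are sorted by count, descending
print(by_series.to_dict())                       # {'Toyota': 3, 'Ford': 2, 'BMW': 1}
print(by_frame.tolist() == by_series.tolist())   # True

# Share of each make instead of raw counts
shares = toy["Make"].value_counts(normalize=True)
print(shares.round(3).to_dict())          # {'Toyota': 0.5, 'Ford': 0.333, 'BMW': 0.167}
```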

In [14]:
# Filter records where the car's origin is either Asia or Europe
filtered_cars = car[(car['Origin'] == 'Asia') | (car['Origin'] == 'Europe')]
print(filtered_cars.head())

# Alternative method with isin(); it selects the same rows
filtered_cars = car[car['Origin'].isin(['Asia', 'Europe'])]
print(filtered_cars.head()) 


    Make           Model   Type Origin DriveTrain      MSRP   Invoice  \
0  Acura             MDX    SUV   Asia        All  $36,945   $33,337    
1  Acura  RSX Type S 2dr  Sedan   Asia      Front  $23,820   $21,761    
2  Acura         TSX 4dr  Sedan   Asia      Front  $26,990   $24,647    
3  Acura          TL 4dr  Sedan   Asia      Front  $33,195   $30,299    
4  Acura      3.5 RL 4dr  Sedan   Asia      Front  $43,755   $39,014    

   EngineSize  Cylinders  Horsepower  MPG_City  MPG_Highway  Weight  \
0         3.5        6.0       265.0      17.0         23.0  4451.0   
1         2.0        4.0       200.0      24.0         31.0  2778.0   
2         2.4        4.0       200.0      22.0         29.0  3230.0   
3         3.2        6.0       270.0      20.0         28.0  3575.0   
4         3.5        6.0       225.0      18.0         24.0  3880.0   

   Wheelbase  Length  
0      106.0   189.0  
1      101.0   172.0  
2      105.0   183.0  
3      108.0   186.0  
4      115.0   197.0  
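A minimal sketch on hypothetical data confirming that the two filters select the same rows, and showing two details: each comparison must be wrapped in parentheses because `|` binds more tightly than `==` in Python, and `~` inverts an `isin()` mask to get the complement (here, every origin other than Asia or Europe):

```python
import pandas as pd

# Hypothetical sample; not taken from car.csv
toy = pd.DataFrame({
    "Make": ["Acura", "Audi", "Buick", "BMW"],
    "Origin": ["Asia", "Europe", "USA", "Europe"],
})

# Element-wise OR of two comparisons (parentheses are required)
with_or = toy[(toy["Origin"] == "Asia") | (toy["Origin"] == "Europe")]

# Same filter with isin()
with_isin = toy[toy["Origin"].isin(["Asia", "Europe"])]
print(with_or.equals(with_isin))          # True

# ~ inverts the mask: rows whose origin is neither Asia nor Europe
others = toy[~toy["Origin"].isin(["Asia", "Europe"])]
print(others["Make"].tolist())            # ['Buick']
```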

In [15]:
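# Keep only cars weighing 4000 or less (i.e. remove rows where Weight > 4000);
# .copy() makes the result independent, so modifying MPG_City later does not trigger SettingWithCopyWarning
car = car[car['Weight'] <= 4000].copy()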
print(car.shape)

(329, 15)
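Keeping rows with a boolean mask is one way to remove records. A minimal sketch on hypothetical data compares it with `drop()` on the index labels of the offending rows, and uses `.copy()` so that later column assignments act on an independent DataFrame rather than a slice of the original. The two approaches differ on missing values: any comparison with `NaN` is `False`, so the mask also drops a row with an unknown weight, while `drop()` keeps it. In this notebook the weights were already imputed, so the difference does not arise there.

```python
import numpy as np
import pandas as pd

# Hypothetical weights; "Unknown" is a made-up row with a missing weight
toy = pd.DataFrame({
    "Model": ["MDX", "RSX", "TSX", "Unknown"],
    "Weight": [4451.0, 2778.0, 3230.0, np.nan],
})

# Keep light cars; .copy() gives an independent frame for later edits
light = toy[toy["Weight"] <= 4000].copy()
print(light["Model"].tolist())            # ['RSX', 'TSX']  (NaN <= 4000 is False)

# Alternative: drop the index labels of the heavy cars
heavy_idx = toy.index[toy["Weight"] > 4000]
dropped = toy.drop(index=heavy_idx)
print(dropped["Model"].tolist())          # ['RSX', 'TSX', 'Unknown']  (NaN > 4000 is also False)
```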


In [19]:
# Increase all values in 'MPG_City' by 3 (vectorized arithmetic on the whole column)
car['MPG_City'] = car['MPG_City'] + 3
print(car[['MPG_City']].head())

# Alternative method with apply(). Run only one of the two methods:
# executing both adds 3 twice (+6 in total).
# car['MPG_City'] = car['MPG_City'].apply(lambda x: x + 3)


   MPG_City
1      27.0
2      25.0
3      23.0
4      21.0
5      21.0
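Because `car['MPG_City']` is overwritten in place, running the `+ 3` line and the `apply()` line one after the other adds 6, not 3, and re-running the same cell keeps adding more. A minimal sketch on hypothetical values: the vectorized and `apply()` forms give identical results, and writing the adjusted values into a new column (the name `MPG_City_plus3` is made up here) makes the step safe to re-run.

```python
import pandas as pd

# Hypothetical city-MPG values; not taken from car.csv
toy = pd.DataFrame({"MPG_City": [24.0, 22.0, 20.0]})

vectorized = toy["MPG_City"] + 3
with_apply = toy["MPG_City"].apply(lambda x: x + 3)
print(vectorized.equals(with_apply))      # True

# Writing to a new column leaves the original untouched, so re-running is safe
toy["MPG_City_plus3"] = toy["MPG_City"] + 3
toy["MPG_City_plus3"] = toy["MPG_City"] + 3   # second run: same result
print(toy["MPG_City_plus3"].tolist())     # [27.0, 25.0, 23.0]

# Adding 3 twice, as happens when both in-place methods run, gives +6
twice = toy["MPG_City"] + 3 + 3
print(twice.tolist())                     # [30.0, 28.0, 26.0]
```

The vectorized form is usually preferred for simple arithmetic; `apply()` is more useful when the per-element logic cannot be expressed with column operations.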
