<a href="https://colab.research.google.com/github/naina-dot/Automobile_Analysis_Capstone_Project/blob/main/Copy_of_Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  Enhancing Business Strategies through Automobile Data Analytics



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member**    - Naina Mehta


# **Project Summary -**

In today's competitive automobile industry, data-driven decisions are crucial to optimize pricing, improve fuel efficiency, and cater to customer preferences. The purpose of this project is to analyze a comprehensive dataset of automobile features and specifications to generate actionable insights that can drive business growth and innovation.

The dataset comprises 205 entries and 26 features detailing various car attributes such as make, engine specifications, dimensions, and price. The analysis focuses on understanding the relationships between these variables and how they impact key outcomes like pricing, fuel efficiency, and insurance risk. It pursues the following **10 objectives**, none of which involve machine learning:

### 1. **Determine the Key Factors Influencing Automobile Prices**
   - Objective: Identify the most influential features (e.g., engine size, horsepower, body style) that drive car pricing by analyzing correlations and relationships between variables.

### 2. **Investigate the Impact of Fuel Type on Automobile Performance and Pricing**
   - Objective: Compare cars with different fuel types (gas, diesel) to understand how fuel type affects both performance (e.g., horsepower, fuel efficiency) and pricing.

### 3. **Examine the Relationship Between Engine Specifications and Fuel Efficiency**
   - Objective: Analyze the impact of engine characteristics such as engine size, number of cylinders, and horsepower on fuel efficiency (city and highway MPG).

### 4. **Assess the Role of Body Style in Pricing and Consumer Preferences**
   - Objective: Explore how different body styles (sedan, hatchback, wagon, etc.) affect vehicle pricing and are associated with consumer preferences for specific features.

### 5. **Analyze Curb Weight and Its Impact on Vehicle Performance**
   - Objective: Study how the weight of a car (curb weight) influences its performance metrics like fuel efficiency, handling, and price.

### 6. **Examine the Relationship Between Horsepower and Pricing**
   - Objective: Investigate the direct correlation between horsepower and the price of vehicles to understand how performance contributes to the overall market value.

### 7. **Evaluate the Impact of Drivetrain Configuration on Vehicle Performance**
   - Objective: Compare cars with front-wheel drive (FWD), rear-wheel drive (RWD), and four-wheel drive (4WD) to see how drivetrain type affects performance metrics like fuel efficiency, handling, and price.

### 8. **Assess the Influence of Engine Size on Price and Performance**
   - Objective: Analyze the relationship between engine size and key performance metrics (e.g., horsepower, fuel efficiency) as well as its influence on pricing.

### 9. **Explore the Correlation Between Insurance Risk (Symboling) and Vehicle Features**
   - Objective: Investigate how the insurance risk rating (symboling) is related to factors like horsepower, curb weight, and price to determine if higher-risk vehicles have distinct characteristics.

### 10. **Identify Trends in Fuel Efficiency Across Vehicle Classes**
   - Objective: Compare the fuel efficiency of different car types (e.g., sedans, wagons, hatchbacks) across city and highway driving conditions to identify which vehicle classes are most fuel-efficient.

These objectives focus on gaining actionable insights and exploring relationships within the dataset, providing a deeper understanding of how various factors influence automobile performance and pricing.








# **GitHub Link -**

https://github.com/naina-dot

# **Problem Statement**


**Problem Statement:**

The automobile industry is highly competitive, with numerous factors influencing vehicle pricing, performance, and consumer preferences. Understanding how attributes such as engine size, horsepower, body style, fuel type, and drivetrain configuration affect a car's price and performance is critical for manufacturers and marketers. However, there is a lack of clear insight into how these factors interact and contribute to the final pricing of vehicles, as well as how they align with consumer preferences for performance, fuel efficiency, and safety. This project aims to analyze the relationships between key automobile features and pricing to provide actionable insights that can help manufacturers optimize pricing strategies and offer vehicles that better meet consumer demands in terms of performance, efficiency, and safety.

#### **Define Your Business Objective?**

**Business Objective:**  
To optimize vehicle pricing based on key features that align with consumer preferences.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that, for each chart, the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


### Dataset Loading

In [None]:
# Load Dataset
dataset = pd.read_csv('/content/automobile_data.csv')


### Dataset First View

In [None]:
# Dataset First Look
dataset.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
dataset.shape

### Dataset Information

In [None]:
# Dataset Info
dataset.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

In [None]:
len(dataset[dataset.duplicated()])

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

In [None]:
df= pd.read_csv('/content/automobile_data.csv')
df.replace('?', np.nan, inplace=True)
print(df.head())
print(df.isnull().sum())

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull(), cbar=False)

### What did you know about your dataset?

The dataset contains information about various automobiles, including their specifications, performance metrics, and pricing. It has 205 rows and 26 columns. The columns are a mix of types:
- Numerical: price, horsepower, engine size, etc.
- Categorical: fuel type, body style, drivetrain, etc.

Missing values are recorded as '?' rather than as nulls, so the raw file appears to have no null values. Once '?' is replaced with NaN, columns such as normalized-losses, num-of-doors, bore, stroke, horsepower, peak-rpm, and price show missing entries that need cleaning and imputation. The dataset also includes performance metrics such as fuel efficiency (MPG), which are crucial for understanding the economic and environmental aspects of the vehicles. It may require further data wrangling steps such as correcting inconsistent entries and normalizing or transforming features for analysis.

Overall, the dataset provides a rich source of information that can be analyzed to uncover insights about automobile pricing, performance, and consumer preferences, but it requires careful cleaning and preparation to ensure accurate and meaningful results.
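
As a possible alternative to replacing '?' after loading, the placeholder can be declared as a missing-value marker while the file is parsed. The sketch below uses a tiny in-memory CSV with made-up values in place of `automobile_data.csv`; the names `sample_csv`, `sample_df`, and `missing_counts` are illustrative only.

```python
import io
import pandas as pd

# Tiny stand-in for automobile_data.csv (values are made up for illustration)
sample_csv = io.StringIO(
    "make,normalized-losses,horsepower,price\n"
    "audi,164,102,13950\n"
    "bmw,?,101,16430\n"
    "mazda,?,?,?\n"
)

# na_values='?' turns the placeholder into NaN during parsing,
# so the affected columns are read as numbers instead of strings
sample_df = pd.read_csv(sample_csv, na_values='?')

# Count the missing entries per column and inspect the inferred dtypes
missing_counts = sample_df.isnull().sum()
print(missing_counts)
print(sample_df.dtypes)
```

With this approach the numeric columns would not need a separate `pd.to_numeric` conversion, although keeping the conversion is harmless.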

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
dataset.columns

In [None]:
# Dataset Describe
dataset.describe(include='all')

### Variables Description

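Symboling: Insurance risk rating of the car; higher values indicate a riskier vehicle.

Normalized-losses: Relative average insurance loss payment per insured vehicle year.

Make: Manufacturer of the car.

Fuel-type: Fuel used by the car (gas or diesel).

aspiration: Engine aspiration (standard or turbo).

num-of-doors: Number of doors (two or four).

body-style: Body style of the car (e.g., sedan, hatchback, wagon, hardtop, convertible).

drive-wheels: Driven wheels (front-wheel, rear-wheel, or four-wheel drive).

engine-location: Position of the engine (front or rear).

wheel-base: Distance between the front and rear axles.

length: Overall length of the car.

width: Overall width of the car.

height: Overall height of the car.

curb-weight: Weight of the car without passengers or cargo.

engine-type: Engine design code (e.g., ohc, dohc).

num-of-cylinders: Number of engine cylinders.

engine-size: Engine displacement.

fuel-system: Fuel delivery system (e.g., mpfi, 2bbl).

bore: Diameter of each engine cylinder.

stroke: Distance the piston travels within the cylinder.

compression-ratio: Compression ratio of the engine.

horsepower: Engine power output.

peak-rpm: Engine speed at which peak power is produced.

city-mpg: Fuel efficiency in city driving, in miles per gallon.

highway-mpg: Fuel efficiency in highway driving, in miles per gallon.

price: Price of the car, the main variable of interest in this analysis.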


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

In [None]:
for i in dataset.columns.tolist():
  print("No. of unique values in ",i,"is",dataset[i].nunique(),".")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Check for missing values
df.isnull().sum()

# Replace '?' with NaN
df.replace('?', np.nan, inplace=True)

# Convert 'normalized-losses' column to numeric, coercing errors to NaN
df['normalized-losses'] = pd.to_numeric(df['normalized-losses'], errors='coerce')

# Calculate the mean of 'normalized-losses', excluding NaN values
mean_value = df['normalized-losses'].mean()

# Replace NaN values with the mean using loc
df.loc[:, 'normalized-losses'] = df['normalized-losses'].fillna(mean_value)

print("\nUpdated dataset (first 5 rows):")
print(df.head())







In [None]:
# Check for missing values in 'num-of-doors'
print("Missing values in 'num-of-doors':")
print(df['num-of-doors'].isnull().sum())  # Verify there are 2 missing values

# Calculate the mode of 'num-of-doors'
mode_value = df['num-of-doors'].mode()[0]
print(f"\nMode of 'num-of-doors': {mode_value}")

# Replace NaN values in 'num-of-doors' with the mode
df.loc[:, 'num-of-doors'] = df['num-of-doors'].fillna(mode_value)

# Verify the replacement of missing values
print("\nMissing values in 'num-of-doors' after replacement:")
print(df['num-of-doors'].isnull().sum())

# Print the updated dataset's head to verify the changes
print("\nUpdated dataset (first 5 rows):")
print(df.head())

In [None]:
# Convert 'bore' to numeric (it may be stored as a string)
df['bore'] = pd.to_numeric(df['bore'], errors='coerce')

# Check for missing values in 'bore'
print("Missing values in 'bore':")
missing_bore_count = df['bore'].isnull().sum()
print(missing_bore_count)  # This should show 4 missing values

#Fill NaN values with the **median**
median_value_bore = df['bore'].median()
print(f"\nMedian of 'bore': {median_value_bore}")
df.loc[:, 'bore'] = df['bore'].fillna(median_value_bore)  # Replace with median

# Verify the replacement of missing values
print("\nMissing values in 'bore' after replacement:")
print(df['bore'].isnull().sum())  # This should show 0 if all NaNs were replaced

# Print the updated dataset's head to verify the changes
print("\nUpdated dataset (first 5 rows):")
print(df.head())

In [None]:
# Convert 'stroke' to numeric (it may be stored as a string)
df['stroke'] = pd.to_numeric(df['stroke'], errors='coerce')

# Check for missing values in 'stroke'
print("Missing values in 'stroke':")
print(df['stroke'].isnull().sum())  # This will show how many missing values are there

# Calculate the mean of 'stroke', excluding NaN values
mean_value_stroke = df['stroke'].mean()
print(f"\nMean of 'stroke': {mean_value_stroke}")

# Replace NaN values in 'stroke' with the mean
df.loc[:, 'stroke'] = df['stroke'].fillna(mean_value_stroke)

# Verify the replacement of missing values
print("\nMissing values in 'stroke' after replacement:")
print(df['stroke'].isnull().sum())  # This should show 0 if all NaNs were replaced

# Print the updated dataset's head to verify the changes
print("\nUpdated dataset (first 5 rows):")
print(df.head())

In [None]:
# Convert 'horsepower' to numeric (it may be stored as a string)
df['horsepower'] = pd.to_numeric(df['horsepower'], errors='coerce')

# Check for missing values in 'horsepower'
print("Missing values in 'horsepower':")
print(df['horsepower'].isnull().sum())  # This will show how many missing values there are

# Calculate the mean of 'horsepower', excluding NaN values
mean_value_horsepower = df['horsepower'].mean()
print(f"\nMean of 'horsepower': {mean_value_horsepower}")

# Replace NaN values in 'horsepower' with the mean
df.loc[:, 'horsepower'] = df['horsepower'].fillna(mean_value_horsepower)

# Verify the replacement of missing values
print("\nMissing values in 'horsepower' after replacement:")
print(df['horsepower'].isnull().sum())  # This should show 0 if all NaNs were replaced

# Print the updated dataset's head to verify the changes
print("\nUpdated dataset (first 5 rows):")
print(df.head())

In [None]:
# Convert 'peak-rpm' to numeric (it may be stored as a string)
df['peak-rpm'] = pd.to_numeric(df['peak-rpm'], errors='coerce')

# Check for missing values in 'peak-rpm'
print("Missing values in 'peak-rpm':")
print(df['peak-rpm'].isnull().sum())  # This will show how many missing values are there

# Calculate the mean of 'peak-rpm', excluding NaN values
mean_value_peak_rpm = df['peak-rpm'].mean()
print(f"\nMean of 'peak-rpm': {mean_value_peak_rpm}")

# Replace NaN values in 'peak-rpm' with the mean
df.loc[:, 'peak-rpm'] = df['peak-rpm'].fillna(mean_value_peak_rpm)

# Verify the replacement of missing values
print("\nMissing values in 'peak-rpm' after replacement:")
print(df['peak-rpm'].isnull().sum())  # This should show 0 if all NaNs were replaced

# Print the updated dataset's head to verify the changes
print("\nUpdated dataset (first 5 rows):")
print(df.head())

In [None]:
# Convert 'price' to numeric (it may be stored as a string)
df['price'] = pd.to_numeric(df['price'], errors='coerce')

# Check for missing values in 'price'
print("Missing values in 'price':")
print(df['price'].isnull().sum())  # This will show how many missing values are there

# Calculate the mean of 'price', excluding NaN values
mean_value_price = df['price'].mean()
print(f"\nMean of 'price': {mean_value_price}")

# Replace NaN values in 'price' with the mean
df.loc[:, 'price'] = df['price'].fillna(mean_value_price)

# Verify the replacement of missing values
print("\nMissing values in 'price' after replacement:")
print(df['price'].isnull().sum())  # This should show 0 if all NaNs were replaced

# Print the updated dataset's head to verify the changes
print("\nUpdated dataset (first 5 rows):")
print(df.head())

### What all manipulations have you done and insights you found?

During wrangling I found that several columns, including normalized-losses, num-of-doors, bore, stroke, horsepower, peak-rpm, and price, contained missing values (recorded as '?') that would have distorted the analysis and its visualizations. I converted the numeric columns from strings to numbers and filled the gaps as follows: the mean for normalized-losses, stroke, horsepower, peak-rpm, and price; the median for bore; and the mode for num-of-doors. With these columns cleaned, the data can support insights about normalized losses, horsepower, bore, stroke, and price. Price also appears to be related to several of these features, which the charts in the next section examine.
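
The per-column cells above repeat the same convert, fill, and verify pattern. One way to condense that pattern is sketched below. The helper `impute_columns` and the `demo` rows are hypothetical and exist only for illustration; they are not part of the notebook or of pandas.

```python
import numpy as np
import pandas as pd

def impute_columns(frame, strategies):
    """Return a copy of `frame` with NaNs filled per column.

    `strategies` maps a column name to 'mean', 'median', or 'mode'.
    """
    out = frame.copy()
    for column, strategy in strategies.items():
        if strategy == 'mode':
            # Most frequent value; suitable for categorical columns
            fill_value = out[column].mode()[0]
        else:
            # Non-numeric entries such as '?' become NaN before averaging
            out[column] = pd.to_numeric(out[column], errors='coerce')
            fill_value = out[column].mean() if strategy == 'mean' else out[column].median()
        out[column] = out[column].fillna(fill_value)
    return out

# Made-up rows, used only to exercise the helper
demo = pd.DataFrame({
    'num-of-doors': ['four', 'two', np.nan, 'four'],
    'bore': ['3.0', '?', '3.4', '3.2'],
    'horsepower': ['100', '120', '?', '110'],
})

cleaned = impute_columns(
    demo, {'num-of-doors': 'mode', 'bore': 'median', 'horsepower': 'mean'}
)
print(cleaned)
print(cleaned.isnull().sum())
```

Returning a copy rather than modifying the frame in place keeps the raw data available for comparison, which can help when checking how much imputation changed a distribution.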

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
import matplotlib.pyplot as plt

# Plot a histogram to show the distribution of the 'price' column
plt.figure(figsize=(10, 6))

# Histogram of price
plt.hist(df['price'], bins=20, color='skyblue', edgecolor='black')
plt.title('Distribution of Price after Imputation')
plt.xlabel('Price')
plt.ylabel('Frequency')

# Show the plot
plt.show()

# Boxplot to check for outliers in 'price'
plt.figure(figsize=(8, 4))

# Boxplot of price
plt.boxplot(df['price'], vert=False, patch_artist=True, boxprops=dict(facecolor='lightblue'))
plt.title('Boxplot of Price after Imputation')
plt.xlabel('Price')

# Show the boxplot
plt.show()





##### 1. Why did you pick the specific chart?

I picked the **histogram** and **boxplot** for the `price` attribute because they are both highly effective in understanding different aspects of the data distribution:

### 1. **Histogram**:
   - **Why**: A histogram is ideal for visualizing the distribution of a single continuous numerical variable, such as price. It allows you to see the frequency of prices across different ranges.
  

### 2. **Boxplot**:
   - **Why**: A boxplot is great for detecting outliers and summarizing the distribution of data through its quartiles.
  
   
### Why These Plots?
- **Combination of insights**: The histogram gives a broad view of the frequency distribution, showing how prices are spread across different ranges. The boxplot complements this by showing the statistical summary (like median and quartiles) and any potential outliers.
- **Detecting patterns**: These plots allow us to spot patterns in the `price` attribute such as skewness, clusters, and unusual values, which are important for understanding the dataset and making informed decisions in further analysis.


##### 2. What is/are the insight(s) found from the chart?

### 1. **Histogram**:
   - **What it shows**:
     - The spread of the prices: roughly 5,000 to 45,000.
     - The most common price range (peak): 5,000 to 15,000.
     - The direction of skew: the distribution is skewed to the right, with a long tail toward higher prices.
     - Gaps or unusual regions: yes, beyond 35,000 and again beyond 40,000.


### 2. **Boxplot**:
   - **What it shows**:
     - The **median** price (middle line in the box): The orange line inside the box represents the median price, which appears to be around $12,000. This is the middle value of the dataset.

     - The **interquartile range** (IQR) which shows the spread of the middle 50% of prices: The box itself represents the range between the first quartile (Q1, around $9,000) and the third quartile (Q3, around $17,000). This means that 50% of the vehicles in the dataset have prices between these two values.

     - Any **outliers** (dots outside of the whiskers): The whiskers extend from about $5,000 to about $25,000. Each whisker ends at the most extreme price that still lies within 1.5 times the IQR of the nearest quartile, and any data points beyond the whiskers are considered potential outliers.

     - Whether the distribution is symmetric or skewed : The distribution of the price data shown in the boxplot is right-skewed (positively skewed).
     This plot indicates that while most vehicles are priced within a moderate range, a small number of high-end vehicles have much higher prices.
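
The whisker and outlier rule described above can be reproduced numerically. The following is a minimal sketch on made-up prices, not values from the dataset; the names `prices`, `lower_fence`, and `upper_fence` are illustrative.

```python
import pandas as pd

# Made-up prices, chosen only to illustrate the calculation
prices = pd.Series([5000, 7000, 9000, 10000, 12000, 14000, 17000, 20000, 41000])

# Quartiles and interquartile range (the height of the box)
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Fences used by the 1.5 * IQR rule; points beyond them are flagged
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = prices[(prices < lower_fence) | (prices > upper_fence)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Fences: {lower_fence} to {upper_fence}")
print("Potential outliers:", outliers.tolist())
```

Applying the same steps to `df['price']` would give exact values for the quartiles and outliers that are only estimated by eye from the plot.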

##### 3. Will the gained insights help creating a positive business impact?
The insights from the price distribution offer practical ways for a business to tailor its pricing, marketing strategies, and inventory management. By understanding the concentration of prices and the presence of outliers, the company can develop a balanced approach that maximizes profitability and customer satisfaction across different market segments.
1. Knowing that the majority of cars are priced between $9,000 and $17,000, with some outliers at higher prices, can help the company position its products effectively.
2. The right-skewed distribution indicates that while most customers may look for moderately priced vehicles, there is still a market for high-end cars, as seen from the outliers. The company can focus on segmenting the market and offering tailored marketing strategies for different price ranges.
3. Since there are fewer high-priced vehicles (outliers), the company could decide to expand its product line to cater more to the premium segment, provided there is sufficient demand.


#### Chart - 2

In [None]:
# Chart - 2 visualization code

# Assuming the dataset has 'city-mpg' and 'highway-mpg' columns for fuel efficiency
city_mpg = df['city-mpg']
highway_mpg = df['highway-mpg']

# Plot a histogram for city and highway mpg
plt.figure(figsize=(12, 6))

# Histogram for city-mpg
plt.subplot(1, 2, 1)  # Two subplots, this is the first
plt.hist(city_mpg, bins=20, color='green', edgecolor='black')
plt.title('Distribution of City MPG')
plt.xlabel('City MPG')
plt.ylabel('Frequency')

# Histogram for highway-mpg
plt.subplot(1, 2, 2)  # This is the second subplot
plt.hist(highway_mpg, bins=20, color='blue', edgecolor='black')
plt.title('Distribution of Highway MPG')
plt.xlabel('Highway MPG')
plt.ylabel('Frequency')
# Keep the two subplots from overlapping
# (no legend is drawn because neither histogram has a label)
plt.tight_layout()

# Show the plot
plt.show()





##### 1. Why did you pick the specific chart?

I chose histograms for the City MPG and Highway MPG distributions because they are an ideal way to visualize the frequency of continuous numerical data. In this case, both MPG values are continuous variables that measure fuel efficiency, and the histograms effectively show how these values are distributed across the dataset.

##### 2. What is/are the insight(s) found from the chart?

From the **histograms** of **City MPG** and **Highway MPG**, several key insights can be derived:

### 1. **Most Common City MPG Range**:
   - The **City MPG** (left histogram) is most commonly clustered between **20 and 30 MPG**, with the largest group of vehicles around the **25 MPG** mark.
   - This indicates that most vehicles in the dataset offer moderate fuel efficiency in city driving conditions.
   
### 2. **Highway MPG Distribution**:
   - For **Highway MPG** (right histogram), the distribution shows the majority of vehicles fall between **25 and 40 MPG**. The peak is around **30-35 MPG**, meaning vehicles tend to be more fuel-efficient on highways compared to city driving, which is expected.
   - There's also a significant spike at **40 MPG**, suggesting that some vehicles offer very high efficiency on highways.

### 3. **Skewness in the Distribution**:
   - The **City MPG** distribution shows a slight **right skew**: the tail extends toward higher values, so a few vehicles reach unusually high city MPG, while only a handful sit at the very low end (below 15 MPG).
   - The **Highway MPG** distribution is less skewed, but still shows a few vehicles offering very high efficiency, close to **50 MPG**.

### 4. **Gaps in MPG Range**:
   - Both histograms show gaps, especially in the higher MPG ranges (above 35 MPG for City and above 50 MPG for Highway). This suggests that fewer vehicles in the dataset offer extremely high fuel efficiency.
   
### 5. **Highway MPG is Generally Better**:
   - As expected, vehicles perform better in terms of fuel efficiency on highways than in cities. The data shows more vehicles achieving higher MPG on highways, with a larger range between 30 and 40 MPG compared to city driving, where the common range is 20-30 MPG.

### 6. **Potential Outliers**:
   - In both charts, there are some vehicles with unusually high or low fuel efficiency (above 45 MPG in city or highway driving, or below 15 MPG). These could be outliers, indicating specialized vehicles (such as hybrids for high MPG or large trucks for low MPG).

### Business Implications:
   - **Customer Segmentation**: Vehicles in the 25-35 MPG range may appeal to the majority of consumers, particularly those who prioritize balanced fuel efficiency.
   - **Targeting Eco-Friendly Customers**: The few vehicles with high fuel efficiency (above 40 MPG) could be marketed to environmentally conscious customers or those looking for long-distance, fuel-saving options.
   - **Performance-Based Pricing**: Cars with high fuel efficiency, especially on highways, could be positioned as premium options due to the long-term savings on fuel.

Overall, these insights suggest the majority of vehicles are mid-range in fuel efficiency, with a few models designed for either high performance or greater fuel economy.
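
The skewness and the city versus highway gap discussed above can also be checked with numbers rather than by eye. This is a small sketch on made-up MPG values, not figures from the dataset; `mpg`, `skewness`, and `highway_gain` are illustrative names.

```python
import pandas as pd

# Made-up fuel-efficiency values, used only to show the calculation
mpg = pd.DataFrame({
    'city-mpg': [19, 21, 24, 24, 26, 27, 31, 38, 47],
    'highway-mpg': [25, 27, 30, 30, 32, 34, 37, 43, 53],
})

# Positive skewness means a longer tail toward high MPG values
skewness = mpg.skew()

# Average per-vehicle difference between highway and city MPG
highway_gain = (mpg['highway-mpg'] - mpg['city-mpg']).mean()

print(skewness)
print(f"Average highway gain over city: {highway_gain:.1f} MPG")
```

Running the same two calculations on `df[['city-mpg', 'highway-mpg']]` would show whether the skew read from the histograms holds up numerically.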

##### 3. Will the gained insights help creating a positive business impact?
While most of the insights are actionable for positive growth, there are a few potential areas that could lead to challenges:

- **Limited Availability of High-Efficiency Vehicles**: The analysis shows that very few vehicles achieve extremely high fuel efficiency (above 40 MPG). This limited availability of eco-friendly or fuel-efficient models may prevent the company from fully capitalizing on the growing demand for greener alternatives. If competitors offer more high-MPG vehicles, it could lead to lost market share in the eco-friendly vehicle segment.
- **Poorly Performing Vehicles in Fuel Efficiency**: Some vehicles in the dataset exhibit very low MPG (below 15 MPG), which could be a liability in markets where fuel efficiency is a top concern. Consumers may be deterred by these models, especially as fuel costs rise and environmental concerns increase. Without careful positioning (e.g., luxury or performance branding), these vehicles might be seen as outdated or costly to operate, leading to negative brand perception and reduced sales.
- **Outliers and Perception**: The presence of outliers with extremely low MPG might create a perception that the company's product line is not competitive with fuel-efficient brands. This could hinder growth if customers associate the brand with poor fuel economy.

#### Chart - 3

In [None]:
# Chart - 3 visualization code


# Create a boxplot for Body Style vs Price
plt.figure(figsize=(10, 6))

# Boxplot using seaborn
sns.boxplot(x='body-style', y='price', data=df, palette="Set3")

# Add title and labels
plt.title('Price Distribution by Body Style')
plt.xlabel('Body Style')
plt.ylabel('Price')

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

I chose a boxplot because it shows, for each body style, both the distribution of prices and the overall price range. This helps us see the typical price range for each car type. The outliers also reveal vehicles that are priced outside the usual range for their body style.

##### 2. What is/are the insight(s) found from the chart?

The box plot of price distribution by body style provides several key insights about the relationship between the type of vehicle and its price:

### 1. **Price Range Differences**:
   - Each body style (e.g., sedan, hatchback, wagon, convertible) exhibits a different price range. Some body styles have a wider price distribution, indicating a broader range of prices within that category, while others are more tightly clustered.
   - For instance, convertibles and hardtops may have a higher median price and larger spread (interquartile range), suggesting they have premium models or are more expensive on average compared to sedans or hatchbacks.

### 2. **Outliers**:
   - The presence of **outliers** (data points that fall outside the whiskers) indicates that certain vehicles within a body style are priced significantly higher or lower than the majority of models in that category. For example, luxury versions of sedans or hatchbacks could be driving these outliers.
   - Identifying these outliers could help the business analyze premium models or special editions that fetch much higher prices.

### 3. **Skewness**:
   - The distribution of prices within a body style may show skewness (either left or right). A **right-skewed** distribution indicates that while most vehicles are relatively affordable, a few models are significantly more expensive, possibly due to luxury or high-performance variants.
   - This could help in understanding whether a certain body style tends to include a wide spectrum of offerings, from economy to luxury.

### 4. **Median Price Comparison**:
   - The **median price** (marked by the horizontal line within the box) gives a central value for each body style, which helps compare the relative affordability or premium nature of different types of vehicles.
   - For example, hatchbacks might have a lower median price than convertibles, reflecting that hatchbacks are generally more affordable.

### 5. **Business Strategy Insights**:
   - The company can use these insights to target different market segments. For instance, if convertibles have a higher price range and median, they can be marketed towards high-income customers. On the other hand, more affordable body styles like hatchbacks may be targeted towards first-time buyers or budget-conscious customers.
   - Additionally, the presence of outliers suggests the possibility of niche markets for luxury or high-end models within certain body styles.

### 6. **Price Consistency**:
   - A body style with a **smaller interquartile range** (IQR) has a more consistent pricing structure, indicating that most vehicles in that category are priced similarly. This could appeal to customers seeking predictability in pricing. Conversely, a wider IQR indicates greater diversity in price points.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The box plot of price distribution by body style helps businesses understand how different types of vehicles are priced in the market, which can inform marketing, product development, and pricing strategies. It reveals variations in affordability, the existence of premium models, and opportunities for targeted segmentation.
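
The median and IQR comparisons described above can be tabulated per body style to back up what the boxplot shows. The sketch below uses made-up rows rather than the dataset; `cars`, `interquartile_range`, and `summary` are illustrative names.

```python
import pandas as pd

# Made-up rows, used only to show the calculation
cars = pd.DataFrame({
    'body-style': ['sedan', 'sedan', 'sedan',
                   'hatchback', 'hatchback', 'hatchback',
                   'convertible', 'convertible'],
    'price': [9000, 12000, 18000, 6000, 8000, 10000, 17000, 35000],
})

def interquartile_range(values):
    """Spread of the middle 50% of the values (Q3 minus Q1)."""
    return values.quantile(0.75) - values.quantile(0.25)

# One row per body style: median price, IQR, and number of cars
summary = (
    cars.groupby('body-style')['price']
        .agg(median='median', iqr=interquartile_range, count='count')
        .sort_values('median', ascending=False)
)
print(summary)
```

Including the count column is a deliberate choice: a body style with only a few cars can show a misleadingly narrow or wide box, so the group size is worth reading alongside the median and IQR.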

#### Chart - 4

In [None]:
# Chart - 4 visualization code


# Select relevant columns for correlation analysis
correlation_columns = ['price', 'horsepower', 'engine-size', 'curb-weight', 'city-mpg', 'highway-mpg']

# Calculate the correlation matrix
correlation_matrix = df[correlation_columns].corr()

# Plot the heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)

# Add title
plt.title('Correlation Heatmap of Selected Features with Price')

# Show the heatmap
plt.show()


##### 1. Why did you pick the specific chart?

The heatmap was chosen because it provides a **concise, visual representation** of multiple relationships between variables at once. It's particularly useful for identifying:

- **Patterns**: It highlights where strong positive or negative correlations exist, guiding further analysis.
- **Strength of Relationships**: The color-coded intensity helps to easily identify the strength of the correlation (e.g., dark colors for strong correlations).
- **Insights at a Glance**: A heatmap simplifies the comparison of many numeric variables, making it quicker to spot important relationships compared to reading raw correlation coefficients.

This diagram is valuable for exploring key trends that impact **pricing**, **performance**, and **efficiency** in the automotive industry, providing actionable insights that can inform business decisions such as vehicle design, pricing strategy, and consumer segmentation.

##### 2. What is/are the insight(s) found from the chart?

### Insights from the Heatmap of Price, Horsepower, Curb Weight, Engine Size, City MPG, and Highway MPG:

The heatmap visually represents the correlation between different numeric variables in the dataset. In this case, it helps identify the relationships between **price**, **horsepower**, **curb weight**, **engine size**, **city MPG**, and **highway MPG**. Here are the key insights we can draw from this heatmap:

### 1. **Positive Correlations with Price**:
   - **Horsepower**: A strong positive correlation between **horsepower** and **price** suggests that vehicles with higher horsepower tend to be more expensive. This is expected, as more powerful engines are often associated with premium or high-performance cars.
   - **Curb Weight**: A positive correlation with **curb weight** indicates that heavier vehicles tend to have higher prices. This could reflect the fact that larger, heavier vehicles (such as SUVs and trucks) are more expensive, possibly due to their size, materials, and additional features.
   - **Engine Size**: The heatmap is likely to show a strong positive correlation between **engine size** and **price**. Larger engines are often associated with more powerful vehicles, which tend to be more expensive due to increased performance and production costs.

### 2. **Negative Correlations with Price**:
   - **City MPG** and **Highway MPG**: Both **city MPG** and **highway MPG** (miles per gallon) are negatively correlated with **price**, suggesting that more fuel-efficient vehicles tend to be less expensive. Less efficient vehicles, such as high-performance or large-engined cars, may command higher prices for their performance despite their lower fuel economy.

### 3. **Correlation Between Horsepower and Other Variables**:
   - **Horsepower and Engine Size**: A strong positive correlation between **horsepower** and **engine size** is expected, as larger engines typically produce more power.
   - **Horsepower and Curb Weight**: Moderate to strong correlation between **horsepower** and **curb weight** suggests that more powerful engines are often found in heavier vehicles. This could be due to the fact that larger, more powerful engines are needed to move heavier cars efficiently.
   
### 4. **Fuel Efficiency (MPG) and Engine Variables**:
   - **City MPG and Highway MPG**: A high positive correlation between these two variables indicates that vehicles that perform well in city driving conditions tend to be fuel-efficient on highways as well.
   - **Engine Size and Fuel Efficiency**: Likely, there’s a negative correlation between **engine size** and **MPG** (both city and highway). Larger engines consume more fuel, leading to lower fuel efficiency, which can be useful for determining which types of vehicles are more eco-friendly.

### 5. **Impact of Curb Weight**:
   - **Curb Weight and Fuel Efficiency**: There’s likely a negative correlation between **curb weight** and both **city MPG** and **highway MPG**, suggesting that heavier vehicles tend to be less fuel-efficient. Heavier cars generally require more energy to move, which results in lower fuel economy.
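
Several of the points above are phrased as what the heatmap is "likely" to show. A minimal sketch of how the actual coefficients could be listed and ranked is given below. The `sample` frame and the helper name `correlations_with` are illustrative assumptions, not part of the notebook; in practice the real `df` would be passed instead.

```python
import pandas as pd

# Illustrative rows only (made-up values); use the notebook's real `df` in practice.
sample = pd.DataFrame({
    "price":       [5000, 7000, 9000, 15000, 30000],
    "horsepower":  [60, 70, 90, 120, 200],
    "curb-weight": [1800, 2000, 2300, 2800, 3500],
    "engine-size": [90, 95, 110, 150, 250],
    "city-mpg":    [40, 35, 30, 22, 15],
    "highway-mpg": [45, 40, 35, 28, 19],
})

def correlations_with(frame: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each other numeric column with `target`,
    sorted from most negative to most positive."""
    corr = frame.select_dtypes("number").corr()[target].drop(target)
    return corr.sort_values()

price_corr = correlations_with(sample, "price")
print(price_corr.round(2))
```

Reading the sorted series alongside the heatmap makes it easier to replace "likely" with the observed sign and magnitude of each relationship.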

---




##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

### Will the Gained Insights Help Create a Positive Business Impact?

Yes, the insights gained from the data analysis can significantly contribute to positive business outcomes. Here's how:

1. **Pricing Strategy**:
   - The strong correlation between **price** and variables like **horsepower**, **curb weight**, and **engine size** allows businesses to position vehicles with higher performance and larger sizes at premium prices. These insights can help manufacturers and dealerships tailor their marketing strategies for different customer segments, emphasizing power, luxury, or utility.
   
2. **Product Development**:
   - Understanding the negative correlation between **price** and **fuel efficiency** (MPG) could drive automakers to innovate in producing more fuel-efficient vehicles while maintaining competitive pricing. This can be especially important as consumer demand for eco-friendly cars grows. Businesses can identify opportunities for hybrid or electric vehicle development that balances fuel efficiency and performance.

3. **Target Market Identification**:
   - The segmentation of vehicles based on attributes like **body style**, **engine size**, and **fuel efficiency** can help businesses identify specific consumer segments (e.g., performance enthusiasts, eco-conscious buyers, or budget buyers). This allows for better-targeted marketing campaigns, potentially improving sales conversions.
   
4. **Supply Chain and Inventory Management**:
   - By knowing which factors (e.g., horsepower, curb weight) drive the most expensive vehicle models, businesses can optimize their inventory. High-value cars may be stocked differently depending on regional demand, aligning production and distribution strategies with consumer preferences.

5. **Environmental Considerations**:
   - With increasing emphasis on sustainability, the insight into how fuel efficiency correlates with pricing and vehicle attributes can guide manufacturers toward greener technology and marketing eco-friendly vehicles. This can enhance brand reputation and attract a growing segment of environmentally conscious customers.

---

### Are There Any Insights That Lead to Negative Growth?

Yes, certain insights could potentially lead to negative growth if not addressed appropriately:

1. **Fuel Inefficiency Impact on Sales**:
   - The negative correlation between **fuel efficiency** and **price** could be a warning for businesses selling high-performance, less fuel-efficient vehicles. As fuel costs rise and environmental concerns grow, demand for these vehicles might decline. If a company continues focusing solely on large, fuel-inefficient cars, it risks losing market share to more eco-friendly competitors. This trend is especially relevant in markets where governments impose higher taxes or restrictions on high-emission vehicles.
   
   **Justification**: With the global shift toward sustainability, many customers prefer fuel-efficient or electric vehicles. If a company fails to innovate in this direction, it could lose relevance and face declining sales in the long run.

2. **Higher-Priced, Heavier Vehicles May Lose Appeal**:
   - Heavier vehicles tend to be priced higher (correlation between **curb weight** and **price**), but their lower fuel efficiency may turn off a growing segment of budget-conscious or environmentally aware customers. While luxury or performance car buyers may still prefer such vehicles, general market demand may shrink over time.
   
   **Justification**: Economic downturns or rising fuel prices could shift customer preferences toward more affordable and efficient cars. Companies relying on heavier, expensive models may face revenue declines unless they diversify their offerings.

3. **Over-Emphasis on Performance**:
   - A strong focus on **horsepower** and **engine size** may appeal to a niche audience but might alienate a larger audience looking for affordable, practical, and fuel-efficient options. If a company continues to prioritize these attributes without balancing them with fuel efficiency or cost-effective alternatives, it could lead to slower growth.
   
   **Justification**: Consumer demand for performance vehicles may remain high in certain markets, but global trends indicate a shift toward practicality and efficiency. If not balanced, the business could face stagnant or negative growth in broader markets.

---

### Conclusion:

While the insights provide clear opportunities for **positive business impact** through better pricing strategies, product development, and market targeting, **negative growth** could arise if businesses don't adapt to the increasing demand for fuel efficiency and eco-friendly vehicles. Addressing these trends by diversifying offerings and focusing on innovation can help avoid stagnation and maintain long-term profitability.

#### Chart - 5

In [None]:
# Chart - 5 visualization code

# Calculate the average horsepower for each drivetrain configuration
avg_horsepower = df.groupby('drive-wheels')['horsepower'].mean().reset_index()

# Create a bar chart
plt.figure(figsize=(10, 6))

# Bar plot using seaborn
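# Assign the x variable to `hue` so that `palette` is applied without a deprecation warning (seaborn >= 0.13)
sns.barplot(x='drive-wheels', y='horsepower', data=avg_horsepower,
            hue='drive-wheels', palette="coolwarm", legend=False)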

# Add title and labels
plt.title('Average Horsepower by Drivetrain Configuration')
plt.xlabel('Drivetrain Configuration')
plt.ylabel('Average Horsepower')

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

Bar plots are excellent for comparing discrete categories. In this case, different drivetrain configurations (e.g., front-wheel, rear-wheel, all-wheel drive) can be easily compared side by side to highlight differences in average horsepower.

##### 2. What is/are the insight(s) found from the chart?

### Insights from the Average Horsepower by Drivetrain Bar Plot

1. **Performance Variations**:
   - The bar plot highlights distinct differences in average horsepower among the various drivetrain configurations. For instance, vehicles with rear-wheel drive (RWD) may show higher average horsepower compared to those with front-wheel drive (FWD), indicating a potential focus on performance-oriented vehicles in that category.

2. **Market Segmentation**:
   - The data can reveal consumer preferences for different drivetrain types. Higher average horsepower in certain configurations may appeal to performance enthusiasts, suggesting a market segment that values speed and power.

3. **Design Implications**:
   - Manufacturers can use these insights to guide their design and engineering processes. If RWD or all-wheel drive (AWD) configurations consistently offer higher horsepower, companies might prioritize these drivetrains in new models aimed at performance markets.

4. **Consumer Expectations**:
   - Understanding average horsepower by drivetrain helps set consumer expectations. For instance, buyers of vehicles with FWD may need to adjust their performance expectations compared to those purchasing RWD or AWD vehicles.

5. **Strategic Marketing**:
   - Marketers can tailor their messaging based on drivetrain performance. Highlighting the horsepower of specific configurations can enhance promotional strategies aimed at performance-driven customers.

6. **Trends Over Time**:
   - If historical data is available, analyzing how average horsepower by drivetrain has changed over time can provide insights into industry trends. A consistent increase in horsepower for specific drivetrains might indicate technological advancements and a shift in consumer demand toward more powerful vehicles.

7. **Competitive Analysis**:
   - Comparing average horsepower of similar vehicle types across different brands can help manufacturers identify competitive advantages or gaps in their product offerings. If competitors are consistently outperforming in horsepower for a particular drivetrain type, it may prompt a reevaluation of design strategies.
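
A group average is only as reliable as the number of cars behind it, so it may help to report the group size next to each mean before drawing conclusions about a drivetrain. The sketch below assumes the same column names as the chart code; the rows are made-up values for illustration only.

```python
import pandas as pd

# Illustrative rows only (made-up values); use the notebook's real `df` in practice.
sample = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "4wd"],
    "horsepower":   [70, 80, 90, 150, 170, 110],
})

# Mean horsepower and number of cars per drivetrain, highest mean first
summary = (
    sample.groupby("drive-wheels")["horsepower"]
    .agg(mean="mean", count="count")
    .sort_values("mean", ascending=False)
)
print(summary)
```

A drivetrain represented by only a handful of cars can show a high or low bar by chance, which the `count` column makes visible.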



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights drawn from this analysis can inform decision-making for manufacturers, marketers, and consumers, contributing to better vehicle performance, more targeted marketing strategies, and higher consumer satisfaction.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Calculate average horsepower and price for each fuel type
avg_fuel_performance = df.groupby('fuel-type')[['horsepower', 'price']].mean().reset_index()

# Create a bar chart
plt.figure(figsize=(12, 6))

# Bar plot for average horsepower
plt.subplot(1, 2, 1)
sns.barplot(x='fuel-type', y='horsepower', data=avg_fuel_performance,
            hue='fuel-type', palette='Set2', legend=False)
plt.title('Average Horsepower by Fuel Type')
plt.xlabel('Fuel Type')
plt.ylabel('Average Horsepower')

# Bar plot for average price
plt.subplot(1, 2, 2)
sns.barplot(x='fuel-type', y='price', data=avg_fuel_performance,
            hue='fuel-type', palette='Set2', legend=False)
plt.title('Average Price by Fuel Type')
plt.xlabel('Fuel Type')
plt.ylabel('Average Price')

plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Bar plots are ideal for comparing categorical variables like fuel type (e.g., gas, diesel). By using bars, the differences in average horsepower and average price between fuel types are visually clear, allowing for quick identification of which fuel type is associated with higher horsepower or price.

##### 2. What is/are the insight(s) found from the chart?

### Insights from the Bar Plots for Average Horsepower and Average Price by Fuel Type

1. **Performance Differences by Fuel Type**:
   - **Average Horsepower**: The bar plot compares average horsepower for gas and diesel cars. Diesel engines are generally tuned for torque and fuel economy rather than peak horsepower, so a taller diesel bar should not be assumed; the direction and size of the gap should be read from the chart itself.

2. **Price Variations by Fuel Type**:
   - **Average Price**: The bar plot comparing price across fuel types may show **diesel vehicles** priced higher than their **gasoline** counterparts. This could be due to the higher cost of manufacturing diesel engines or the perceived durability and efficiency of diesel-powered cars. A higher price may also reflect a target segment willing to pay more for torque and fuel efficiency.

3. **Target Market Segmentation**:
   - **Diesel vs. Gasoline**: If diesel vehicles carry a higher price tag, they are likely to appeal to a segment that prioritizes torque, durability, and long-term efficiency. Gasoline vehicles, typically priced lower, may appeal to cost-conscious buyers prioritizing affordability.

4. **Consumer Expectations**:
   - **Fuel Type Influence**: Consumers looking for durability or long-distance economy are more likely to choose diesel vehicles, while those prioritizing lower upfront costs and broader availability might opt for gasoline-powered vehicles.

5. **Business Strategy**:
   - **Strategic Pricing**: These insights can guide pricing strategies. If the chart confirms a diesel price premium, companies can justify it on durability and fuel economy rather than on horsepower alone, while promoting gasoline-powered cars as the more economical purchase for a wider, cost-sensitive customer base.
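
Bar heights can be hard to compare by eye, so the gap between fuel types can also be expressed as a percentage. This is a minimal sketch using made-up rows; the numbers say nothing about the real dataset, and the name `pct_diff` is an assumption introduced here.

```python
import pandas as pd

# Illustrative rows only (made-up values); use the notebook's real `df` in practice.
sample = pd.DataFrame({
    "fuel-type":  ["gas", "gas", "gas", "diesel", "diesel"],
    "horsepower": [100, 120, 80, 90, 70],
    "price":      [10000, 14000, 9000, 15000, 13000],
})

# Average horsepower and price per fuel type (the same aggregation as the chart)
means = sample.groupby("fuel-type")[["horsepower", "price"]].mean()

# Percentage difference of diesel relative to gas, for each metric
pct_diff = (means.loc["diesel"] / means.loc["gas"] - 1) * 100
print(means)
print(pct_diff.round(1))
```

A positive value means diesel is higher on that metric and a negative value means it is lower, which gives the written insights a concrete figure to cite.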


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The bar plots show how fuel type relates to both horsepower and price. Where **diesel** models sit at a higher average price, they can be positioned on durability and fuel economy, while **gasoline** models appeal to more budget-conscious consumers. This segmentation helps manufacturers target different markets effectively.

#### Chart - 7

In [None]:
# Chart - 7 visualization code

# Count the occurrences of each fuel type
fuel_counts = df['fuel-type'].value_counts()

# Create a pie chart for fuel type distribution
plt.figure(figsize=(8, 8))
plt.pie(fuel_counts, labels=fuel_counts.index, autopct='%1.1f%%', startangle=90, colors=['#ff9999','#66b3ff','#99ff99'])

# Add a title
plt.title('Distribution of Fuel Types in the Dataset')

# Show the pie chart
plt.show()


##### 1. Why did you pick the specific chart?

A pie chart suits the distribution of fuel types because it emphasizes proportions. It shows how much each fuel type (e.g., gasoline, diesel) contributes to the overall dataset, which makes the relative share of each fuel type easy to see. Note that this is the share of cars in the dataset, which is only a proxy for market share.

##### 2. What is/are the insight(s) found from the chart?

### Insights from the Pie Chart of Fuel Type Distribution

1. **Dominance of Gasoline**:
   - If the pie chart shows that the majority of the vehicles use **gasoline** as their fuel type, it indicates that gasoline-powered vehicles dominate the market. This could be due to factors like **cost-efficiency**, **availability**, and **consumer preference** for gasoline engines over alternatives like diesel.

2. **Niche Market for Diesel**:
   - A smaller slice for **diesel** fuel suggests that diesel vehicles occupy a **niche** in the market. Diesel engines are often used in **specific vehicle types** like trucks or for buyers seeking **better fuel efficiency** and **higher torque** for towing or long-distance driving, though fewer in number.

3. **Potential for Alternative Fuels**:
   - If the dataset includes **alternative fuel types** (like electric or hybrid) and these have a small or non-existent share, it suggests there is either **limited adoption** of these technologies or that the dataset primarily focuses on traditional internal combustion engines.

4. **Consumer Preferences**:
   - The chart provides insight into **consumer preferences**. A larger portion for gasoline vehicles might reflect buyers prioritizing factors like **purchase price** and **ease of maintenance**, while diesel's smaller share may reflect its appeal to a **specific subset** of buyers valuing fuel efficiency and performance.

5. **Market Segmentation**:
   - Manufacturers can use these insights for **market segmentation**. If gasoline dominates, they may focus marketing efforts on gasoline-powered models while developing **diesel-specific campaigns** for targeted segments like commercial vehicle owners.

6. **Environmental and Policy Impacts**:
   - If diesel has a very small share, it could be due to **environmental regulations** and changing policies that favor gasoline or alternative fuels over diesel, especially in regions where emissions standards are stricter.
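
The percentages drawn on the pie chart can also be computed directly, which is handy when quoting exact shares in the write-up. The sketch below uses a made-up series of ten cars purely for illustration.

```python
import pandas as pd

# Illustrative data only: nine gas cars and one diesel car (made-up counts).
fuel = pd.Series(["gas"] * 9 + ["diesel"] * 1, name="fuel-type")

# Share of each fuel type as a percentage of all rows
share = fuel.value_counts(normalize=True).mul(100).round(1)
print(share)
```

On the real data, `df['fuel-type'].value_counts(normalize=True)` would give the same figures that `autopct` prints on the chart.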



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The pie chart reveals the **distribution** of fuel types in the dataset, indicating gasoline's dominance and diesel's niche role. This gives manufacturers, marketers, and policymakers useful context for understanding consumer behavior and planning future vehicle development.

#### Chart - 8

In [None]:
# Chart - 8 visualization code

# Create a scatter plot for Curb Weight vs. Horsepower
plt.figure(figsize=(10, 6))
sns.scatterplot(x='curb-weight', y='horsepower', data=df, alpha=0.7, color='blue')

# Add a regression line to show the trend
sns.regplot(x='curb-weight', y='horsepower', data=df, scatter=False, color='orange')

# Add title and labels
plt.title('Curb Weight vs. Horsepower')
plt.xlabel('Curb Weight (lbs)')
plt.ylabel('Horsepower')
plt.grid()

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

A regression plot is ideal for showing the relationship between two continuous variables: curb weight (the weight of the vehicle without passengers or cargo) and horsepower (the engine's power output). It helps visualize how these two attributes are correlated and whether a heavier vehicle tends to have more horsepower.

##### 2. What is/are the insight(s) found from the chart?

### Insights from the Curb Weight vs. Horsepower Regression Plot

1. **Positive Correlation**:
   - The regression plot likely shows a **positive correlation** between **curb weight** and **horsepower**. This means that as the **weight** of the vehicle increases, the **horsepower** also tends to increase. Heavier vehicles typically require more powerful engines to maintain performance, explaining this trend.

2. **Performance-Oriented Vehicles**:
   - Vehicles with **higher curb weights** (e.g., SUVs, trucks) usually have **higher horsepower** to provide adequate performance, suggesting that these vehicles are designed for heavy-duty tasks, such as towing or off-road driving, where more power is required.

3. **Fuel Efficiency Considerations**:
   - While the plot shows that heavier vehicles have more horsepower, this could also indicate a **trade-off with fuel efficiency**. Vehicles with high curb weight and horsepower are generally less fuel-efficient, which could be a concern for consumers focused on economy.

4. **Design Implications**:
   - Manufacturers can use this insight to balance **design choices**. For lighter vehicles, lower horsepower may be sufficient for optimal performance, while heavier vehicles require more powerful engines, impacting **engine design** and **vehicle cost**.

5. **Outliers**:
   - If there are any **outliers** in the plot, they represent vehicles that do not follow the typical trend—e.g., lighter vehicles with high horsepower or heavier vehicles with lower horsepower. This could be due to specific engineering choices or specialized designs (e.g., sports cars with lightweight frames but high power).

6. **Product Development Strategy**:
   - The insights help manufacturers align their **product development strategies**. They can design vehicles with horsepower tailored to their weight class, ensuring performance is optimized while potentially managing costs by not over-engineering lighter models.
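
The regression line in the plot can be summarized with two numbers: its slope and the Pearson correlation coefficient. The following is a minimal sketch on made-up values; on the real data, the two arrays would be the `curb-weight` and `horsepower` columns with missing values removed.

```python
import numpy as np

# Illustrative values only (made up); replace with the real columns in practice.
curb_weight = np.array([1800, 2000, 2300, 2800, 3500], dtype=float)
horsepower = np.array([60, 70, 90, 120, 200], dtype=float)

# Least-squares straight line: horsepower ~ slope * curb_weight + intercept
slope, intercept = np.polyfit(curb_weight, horsepower, deg=1)

# Pearson correlation coefficient between the two variables
r = np.corrcoef(curb_weight, horsepower)[0, 1]

print(f"slope: {slope:.4f} hp per lb, r: {r:.3f}, r^2: {r**2:.3f}")
```

The slope states how much horsepower rises per extra pound of curb weight, and `r` indicates how tightly the points follow the line, which turns "likely positive" into a measurable statement.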



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The regression plot reveals a **strong link** between curb weight and horsepower, offering valuable insights for **vehicle design**, **performance optimization**, and understanding the trade-offs between weight, power, and efficiency.

#### Chart - 9

In [None]:
# Chart - 9 visualization code

# Create the scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x='curb-weight', y='price', data=df, color='b', marker='o')

# Add title and labels
plt.title('Price vs. Curb Weight')
plt.xlabel('Curb Weight')
plt.ylabel('Price')

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

### Reason for Choosing a Scatter Plot for Price vs. Curb Weight

1. **Visualizing Relationship Between Two Variables**:
   - A **scatter plot** is ideal for visualizing the relationship between two continuous variables, in this case, **price** and **curb weight**. It effectively shows how the **price** of a vehicle changes as its **curb weight** increases or decreases.

2. **Detecting Patterns and Trends**:
   - The scatter plot helps identify whether there is a **pattern** or **correlation** between the two variables. For example, heavier vehicles might tend to be more expensive, and this can be easily spotted in a scatter plot by observing the concentration of data points.

3. **Outlier Identification**:
   - A scatter plot also highlights **outliers**—vehicles that have an unusually high or low price for their curb weight. Identifying these outliers can be helpful in understanding **exceptions** to the general trend and investigating special cases, such as luxury or performance cars.

4. **Clustering Insight**:
   - If clusters of points appear, it might suggest **distinct categories** of vehicles, such as compact, midsize, and large vehicles, each with their own price range. This is particularly useful for **customer segmentation** and **market analysis**.

5. **Simplicity and Clarity**:
   - Scatter plots provide a **clear, simple visualization** that is easy to interpret. The relationship between price and curb weight can be assessed visually without the need for complex statistical techniques, making it accessible for quick business decision-making.

6. **Exploring Variability**:
   - Scatter plots also show the **variability** in data. Even if there is a general trend (like heavier vehicles being more expensive), the scatter plot reveals the spread of data points, helping businesses understand how **consistent** the relationship is.
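
Outliers can be spotted by eye, but they can also be flagged numerically. One possible approach, sketched below, fits a straight line and marks cars whose price is far from the fitted value. The two-standard-deviation threshold and the sample rows are assumptions chosen for illustration, not a rule taken from the notebook.

```python
import numpy as np
import pandas as pd

# Illustrative rows only (made-up values); the last car is deliberately overpriced.
sample = pd.DataFrame({
    "curb-weight": [1800, 2000, 2200, 2400, 2600, 2800, 3000, 2100],
    "price":       [6000, 7000, 8000, 9000, 10000, 11000, 12000, 30000],
})

# Fit price ~ slope * curb-weight + intercept and compute each car's residual
slope, intercept = np.polyfit(sample["curb-weight"], sample["price"], deg=1)
sample["residual"] = sample["price"] - (slope * sample["curb-weight"] + intercept)

# Flag cars whose residual exceeds two standard deviations (assumed threshold)
threshold = 2 * sample["residual"].std()
outliers = sample[sample["residual"].abs() > threshold]
print(outliers)
```

Because a single extreme point also pulls the fitted line toward itself, this is a rough screening step rather than a formal outlier test.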



##### 2. What is/are the insight(s) found from the chart?

### Insights from the Scatter Plot of Price vs. Curb Weight

1. **Positive Correlation**:
   - The scatter plot likely shows a **positive correlation** between **price** and **curb weight**. As the curb weight of vehicles increases, their price tends to rise as well. This suggests that heavier vehicles often come with additional features, capabilities, or manufacturing costs that contribute to higher pricing.

2. **Market Segmentation**:
   - The distribution of data points may reveal different **market segments**. For example, if certain clusters of points are present, it might indicate specific categories of vehicles (e.g., compact cars, sedans, SUVs, and trucks) with distinct pricing strategies based on their weight class.

3. **Outliers and Anomalies**:
   - The presence of **outliers** can be observed in the scatter plot. For instance, a lightweight vehicle priced significantly higher than the majority of heavy vehicles could represent luxury brands or performance vehicles that prioritize features or brand prestige over typical weight-based pricing.

4. **Price Distribution**:
   - If the data points are spread out widely for certain weights, it indicates that there is a **diverse pricing strategy** within specific weight categories. This variability might stem from features, performance, or brand positioning.

5. **Identifying Value for Money**:
   - By analyzing the scatter plot, consumers and businesses can identify vehicles that offer good **value for money** based on their curb weight and price. For instance, a heavier vehicle with a relatively low price might present an attractive option for buyers.

6. **Implications for Manufacturers**:
   - Insights from the scatter plot can help manufacturers adjust their pricing strategies. If heavy vehicles consistently show higher prices, it may warrant a focus on improving value through features or technology, especially for vehicles in competitive weight ranges.

7. **Fuel Efficiency Considerations**:
   - The scatter plot itself does not show fuel efficiency, but heavier vehicles need more energy to move and therefore tend to be less fuel-efficient as well as more expensive. This matters for consumers who weigh total cost of ownership, not just the initial purchase price.

8. **Future Product Development**:
   - Insights from the scatter plot can guide manufacturers in developing future products. For example, understanding the weight-to-price ratio can help in designing vehicles that are competitively priced while maintaining desired curb weight for performance and safety.
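
The weight-to-price relationship mentioned above can be made concrete as a price-per-pound figure. This is a rough sketch with hypothetical makes and made-up values; the column name `price-per-lb` is introduced here and is not part of the dataset.

```python
import pandas as pd

# Illustrative rows only; the makes "A", "B", "C" are hypothetical labels.
sample = pd.DataFrame({
    "make":        ["A", "B", "C"],
    "curb-weight": [2000, 2500, 3000],
    "price":       [8000, 15000, 9000],
})

# Price paid per pound of curb weight, lowest (most car per unit of price) first
sample["price-per-lb"] = sample["price"] / sample["curb-weight"]
ranked = sample.sort_values("price-per-lb")
print(ranked)
```

A low price per pound is only a crude proxy for value, since it ignores features, performance, and brand, but it gives a quick way to shortlist cars that sit well below the general trend.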


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

### Will the Gained Insights Help Create a Positive Business Impact?

1. **Informed Pricing Strategies**:
   - Understanding the correlation between **price** and **curb weight** allows manufacturers to develop more effective pricing strategies. By identifying how weight influences price, companies can set competitive prices that align with consumer expectations, ultimately leading to increased sales and profitability.

2. **Targeted Marketing and Segmentation**:
   - Insights from the scatter plot enable businesses to segment the market effectively. By recognizing different consumer segments based on vehicle weight and price, marketing efforts can be tailored to address specific needs and preferences, enhancing customer engagement and conversion rates.

3. **Product Development and Innovation**:
   - Insights about the relationship between curb weight and price can inform product development decisions. Manufacturers can prioritize features that resonate with consumers and optimize the weight of vehicles to improve performance and fuel efficiency, thus creating a better product that appeals to the market.

4. **Competitive Advantage**:
   - By analyzing how different body styles and weights correlate with pricing, businesses can identify gaps in the market. This knowledge allows them to position themselves more strategically against competitors, leading to increased market share and brand loyalty.

5. **Sustainability and Efficiency**:
   - If heavier vehicles generally have higher prices, understanding the dynamics can drive companies to focus on lightweight materials or technologies that maintain performance while reducing costs. This can improve overall vehicle efficiency, benefiting both the environment and the bottom line.

### Are There Any Insights That Lead to Negative Growth? Justify with Specific Reason

1. **Misalignment with Consumer Expectations**:
   - If the scatter plot indicates that higher curb weight consistently leads to significantly higher prices, it may deter budget-conscious consumers from purchasing heavier vehicles. This misalignment between consumer expectations and pricing could lead to decreased sales, negatively impacting revenue.

2. **Market Saturation in Heavy Vehicles**:
   - If the data reveals that many competitors are producing heavy vehicles with similar price points, this could result in market saturation. In such cases, it may become increasingly difficult to differentiate products, leading to price wars and reduced profit margins, which can hinder growth.

3. **Increased Production Costs**:
   - A trend of heavier vehicles commanding higher prices may encourage manufacturers to focus on heavier models to increase revenue. However, if these vehicles require significantly more expensive materials or components, the increase in production costs might offset the potential gains, leading to reduced profitability.

4. **Regulatory Compliance and Consumer Trends**:
   - Insights suggesting a consistent trend towards heavier vehicles may conflict with evolving consumer preferences for **fuel efficiency** and **environmental sustainability**. If businesses ignore this shift and continue to emphasize heavy vehicles, they risk losing relevance in a market that increasingly values eco-friendly options.

5. **Dependence on Performance Features**:
   - If the analysis shows that higher curb weight is often associated with vehicles that rely on performance features (e.g., horsepower), companies may overinvest in these features. This could lead to financial strain if consumers prioritize fuel economy and affordability over performance in the long term.





#### Chart - 10

In [None]:
# Chart - 10 visualization code
# Calculate fuel efficiency by averaging city-mpg and highway-mpg
df['fuel-efficiency'] = (df['city-mpg'] + df['highway-mpg']) / 2

# Verify the new column is added
print(df[['city-mpg', 'highway-mpg', 'fuel-efficiency']].head())

# Group by body style and calculate the average fuel efficiency
avg_fuel_efficiency = df.groupby('body-style')['fuel-efficiency'].mean().reset_index()

# Line plot for Average Fuel Efficiency by Body Style
plt.figure(figsize=(10, 6))
sns.lineplot(x='body-style', y='fuel-efficiency', data=avg_fuel_efficiency, marker='o', color='green')

# Add title and labels
plt.title('Average Fuel Efficiency by Body Style')
plt.xlabel('Body Style')
plt.ylabel('Fuel Efficiency (MPG)')

# Show the plot
plt.show()



##### 1. Why did you pick the specific chart?

A line plot was used to compare average fuel efficiency across body styles, with a marker at each category so that differences are easy to read. Because body style is an unordered category, the connecting line is only a visual guide and its slope between neighbouring categories carries no meaning; a bar chart would present the same averages. The comparison still helps stakeholders see quickly which body styles are more or less efficient.

##### 2. What is/are the insight(s) found from the chart?

### Insights Found from the Line Plot of Average Fuel Efficiency by Body Style

1. **Variation in Fuel Efficiency**:
   - The line plot shows that average fuel efficiency differs by body style. For instance, hatchbacks and sedans may show higher average fuel efficiency than styles such as convertibles and hardtops. The body styles in this dataset are passenger-car categories, so the comparison is among those rather than with SUVs or trucks. This insight is useful for consumers prioritizing fuel economy.

2. **Trends and Patterns**:
   - The x-axis of this plot is body style, not time, so rises and dips in the line reflect differences between categories rather than changes over the years. Judging whether a body style is becoming more efficient, for example in response to consumer demand, would require model-year data, which this chart does not use.

3. **Market Positioning**:
   - The insights gained from this plot can inform manufacturers about where they stand in terms of fuel efficiency compared to competitors. If a specific body style, like a compact SUV, is showing relatively low efficiency compared to others, this could signal the need for redesign or technological enhancements.

4. **Consumer Preferences**:
   - Understanding which body styles offer the best fuel efficiency can guide marketing strategies. If hatchbacks and sedans consistently lead in fuel efficiency, manufacturers could emphasize these models in their marketing campaigns targeting environmentally-conscious consumers.

5. **Impact on Pricing Strategies**:
   - The relationship between body style and fuel efficiency may impact pricing strategies. Models with higher fuel efficiency might be priced at a premium, appealing to buyers willing to invest more for long-term savings on fuel costs. Conversely, lower efficiency models might need to be competitively priced to attract cost-conscious consumers.

6. **Regulatory Compliance**:
   - Insights from fuel efficiency data can also help manufacturers ensure compliance with increasingly stringent environmental regulations. If specific body styles are lagging in fuel efficiency, it may prompt companies to invest in R&D to meet future regulatory standards.

7. **Opportunities for Innovation**:
   - The plot may reveal potential gaps in the market. If there's a noticeable absence of efficient options in a certain body style category, it might indicate an opportunity for innovation. For example, manufacturers could explore developing hybrids or electric versions of traditionally less efficient body styles.




##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



Overall, the insights derived from the line plot of average fuel efficiency by body style provide valuable information for consumers, manufacturers, and marketers. They highlight trends in fuel efficiency, inform strategic decisions, and indicate areas for improvement and innovation within the automotive industry.

#### Chart - 11

In [None]:
# Chart - 11 visualization code


# Set the figure size for the plot
plt.figure(figsize=(12, 6))

# Create a grouped bar plot for Drive-Wheel Configuration and Fuel Type against Price
sns.barplot(x='drive-wheels', y='price', hue='fuel-type', data=df, palette='Set2')

# Add title and labels
plt.title('Price by Drive-Wheel Configuration and Fuel Type')
plt.xlabel('Drive-Wheel Configuration')
plt.ylabel('Average Price')

# Show the plot
plt.show()




##### 1. Why did you pick the specific chart?

The bar plot effectively compares the average price of vehicles across different drive-wheel configurations (rwd, fwd, 4wd) and distinguishes between fuel types (gas and diesel). This allows for an immediate visual assessment of how price varies with these categories.

##### 2. What is/are the insight(s) found from the chart?

**Price Variations by Drive-Wheel Configuration:** RWD (Rear-Wheel Drive) vehicles appear to have the highest average price, followed by FWD (Front-Wheel Drive) and 4WD (Four-Wheel Drive). This suggests that RWD vehicles might be perceived as higher-end or performance-oriented, commanding a premium price.

**Impact of Fuel Type:** Diesel vehicles generally have a higher average price compared to gasoline vehicles across all drive-wheel configurations. This could be due to the added cost of diesel engines or their positioning as more premium or efficient options.

**Market Segmentation:** The data can help in identifying market segments. For instance, the higher prices of RWD vehicles suggest a target demographic willing to invest more in performance. In contrast, the lower prices of FWD and 4WD vehicles may appeal to cost-conscious consumers.

**Reliability of Data:** The presence of error bars indicates that while average prices differ, there is variability within each category. This insight could prompt further investigation into the factors influencing pricing, such as vehicle features, brand reputation, or market demand.

**Strategic Pricing Decisions:** The insights gained from this plot can guide strategic pricing decisions, marketing campaigns, and inventory management, ensuring that products are aligned with consumer expectations and market dynamics.

**Consumer Preferences:** The pricing trends might reflect consumer preferences, indicating that certain configurations and fuel types are more desirable in the current market. This information can help manufacturers focus their marketing efforts on the most profitable segments.
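The bar heights and the spread behind the error bars can be checked numerically with a grouped summary. The sketch below assumes a hypothetical `toy` frame with the notebook's `drive-wheels`, `fuel-type`, and `price` column names and invented prices. Note that seaborn's default error bars are bootstrapped confidence intervals; the standard error computed here is a simpler stand-in, not the same quantity.

```python
import pandas as pd

# Hypothetical rows; column names mirror the notebook, prices are made up.
toy = pd.DataFrame({
    "drive-wheels": ["rwd", "rwd", "rwd", "fwd", "fwd", "fwd", "4wd", "4wd"],
    "fuel-type":    ["gas", "gas", "diesel", "gas", "gas", "diesel", "gas", "gas"],
    "price":        [18000, 22000, 28000, 8000, 10000, 12000, 9000, 11000],
})

# Mean price, group size, and standard error for each drive-wheel / fuel-type pair.
# Groups with a single row get NaN for the standard error.
summary = (
    toy.groupby(["drive-wheels", "fuel-type"])["price"]
    .agg(["mean", "count", "sem"])
)
print(summary)
```

Including `count` alongside the mean helps flag groups that rest on only a handful of cars, where a difference in average price is less reliable.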

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This bar plot provides a clear visualization of how average vehicle prices vary by drive-wheel configuration and fuel type, highlighting key insights that can inform pricing strategies and market positioning for automobile manufacturers and marketers.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# Count the occurrences of each body style
body_style_counts = df['body-style'].value_counts()

# Create a pie chart for body style
plt.figure(figsize=(8, 8))
plt.pie(body_style_counts, labels=body_style_counts.index, autopct='%1.1f%%', startangle=90, colors=['#ff9999','#66b3ff','#99ff99', '#ffcc99', '#c2c2f0'])

# Equal aspect ratio ensures that pie is drawn as a circle
plt.axis('equal')

# Add a title
plt.title('Distribution of Body Styles')

# Show the plot
plt.show()





##### 1. Why did you pick the specific chart?

A pie chart is ideal for representing the proportion of different categories within a whole. In this case, it effectively shows the distribution of various body styles of vehicles as parts of the overall dataset.



##### 2. What is/are the insight(s) found from the chart?

Dominance of Sedans: Sedans make up 46.8% of the body styles in the dataset, making them the most common style among the listed models. This may reflect a market preference for sedans, likely due to their practicality, comfort, and versatility, although the dataset counts models rather than sales.

Significant Portion for Hatchbacks: Hatchbacks constitute 34.1% of the distribution, highlighting their popularity as a compact and versatile option. This insight may suggest that consumers value the balance between size and functionality that hatchbacks offer.

Minor Presence of Other Styles: Body styles like convertibles (2.9%) and hardtops (3.9%) represent a small fraction of the market, indicating they may be niche products with less broad consumer appeal. This could guide manufacturers in considering production and marketing strategies.

Wagons: Wagons (12.2%) also have a notable presence, which might appeal to families or consumers needing more cargo space. Together with the minor categories, this points to specific market segments that might require tailored marketing approaches.

Market Strategy Implications: The distribution shows where demand lies, suggesting manufacturers and marketers might focus on enhancing sedan and hatchback offerings, while considering whether niche markets for convertibles and hardtops are worth pursuing.
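The percentages on the pie chart can be reproduced with `value_counts(normalize=True)`. In the sketch below, the per-style counts are an assumption: they are back-derived from the percentages quoted above and the 205-row total, not read from the dataset itself.

```python
import pandas as pd

# Assumed counts, back-derived from the quoted percentages and 205 total rows.
counts = {"sedan": 96, "hatchback": 70, "wagon": 25, "hardtop": 8, "convertible": 6}

# Expand the counts into one label per car, as the 'body-style' column would hold.
body_style = pd.Series(
    [style for style, n in counts.items() for _ in range(n)],
    name="body-style",
)

# Share of each body style as a percentage, rounded to one decimal place.
shares = body_style.value_counts(normalize=True).mul(100).round(1)
print(shares)
```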

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The pie chart provides clear and insightful data about the distribution of body styles in the automobile market, guiding manufacturers and marketers in understanding consumer preferences and optimizing their strategies accordingly.

#### Chart - 13

In [None]:
# Chart - 13 visualization code



# Group the data to get counts of body styles for each symboling category
body_style_counts = df.groupby(['body-style', 'symboling']).size().reset_index(name='count')

# Create a pie chart for each body style's distribution across insurance risk
num_body_styles = body_style_counts['body-style'].nunique()  # Get the number of unique body styles

# Determine the number of rows and columns needed for subplots
cols = 2  # Number of columns
rows = (num_body_styles + cols - 1) // cols  # Calculate rows needed

plt.figure(figsize=(10, rows * 5))  # Adjust the figure size based on the number of rows

# Loop through each body style and create a pie chart
for i, body_style in enumerate(body_style_counts['body-style'].unique()):
    plt.subplot(rows, cols, i + 1)  # Create a subplot for each body style
    data = body_style_counts[body_style_counts['body-style'] == body_style]

    plt.pie(data['count'], labels=data['symboling'], autopct='%1.1f%%', startangle=90)
    plt.title(f'Insurance Risk Distribution for {body_style}')

plt.tight_layout()
plt.show()



##### 1. Why did you pick the specific chart?

Each chart is straightforward, allowing stakeholders to quickly grasp the proportion of each risk category (e.g., high risk, medium risk, low risk) within each body style. This visual clarity aids in decision-making processes.

##### 2. What is/are the insight(s) found from the chart?

**Convertible (High Risk):** A significant 83.3% of the insurance risks associated with convertibles fall into the high-risk category. This suggests that convertibles might be more prone to accidents or theft, which could lead insurers to increase premiums.

**Hardtop (Mixed Risk):** The hardtop shows a more balanced distribution, with 50% in the medium risk category and the remaining split between low and high risks. This indicates that hardtops may represent a safer choice compared to convertibles, appealing to a different market segment.

**Hatchback (Diverse Risk Levels):** The hatchback category displays a variety of risk levels, with 38.6% classified as medium risk and a substantial 28.6% as high risk. This variation suggests that while hatchbacks can be relatively safe, there are significant concerns about specific models.

**Sedan (Predominantly Low Risk):** Sedans have the highest proportion (44.8%) in the low-risk category, indicating they are generally perceived as safer vehicles. The diverse distribution across risk levels may suggest varying designs or features among different sedan models.

**Wagon (High Low-Risk Percentage):** With 60% categorized as low risk, wagons appear to be a favorable option for insurance, likely appealing to families or those seeking practical vehicles. However, the presence of 12% in the high-risk category could indicate specific models that are problematic.
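The per-body-style percentages shown in the pies can also be laid out as one table with a row-normalized cross-tabulation. This is a minimal sketch on a hypothetical `toy` frame with invented rows; it keeps the raw `symboling` values because the cutoffs for the low, medium, and high risk bands are not defined in this section.

```python
import pandas as pd

# Hypothetical rows; column names mirror the notebook, values are made up.
toy = pd.DataFrame({
    "body-style": ["convertible", "convertible", "sedan", "sedan",
                   "sedan", "sedan", "wagon", "wagon"],
    "symboling":  [3, 2, 0, -1, 1, 0, -1, 0],
})

# Percentage of each symboling value within each body style (rows sum to 100).
risk_share = (
    pd.crosstab(toy["body-style"], toy["symboling"], normalize="index")
    .mul(100)
    .round(1)
)
print(risk_share)
```

A single table like this makes it easier to compare body styles side by side than reading across several separate pie charts.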

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The pie charts effectively communicate the risk distributions associated with different vehicle body styles. The insights gained from these visualizations can help insurance companies adjust their pricing models, tailor coverage options, and develop marketing strategies targeted toward specific consumer segments based on the risk profile of each body style.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code


# Select a new set of numerical features for correlation analysis
new_correlation_features = df[['engine-size', 'city-mpg', 'highway-mpg', 'horsepower']]

# Calculate the correlation matrix
new_correlation_matrix = new_correlation_features.corr()

# Set up the matplotlib figure
plt.figure(figsize=(10, 8))

# Create a heatmap
sns.heatmap(new_correlation_matrix, annot=True, fmt=".2f", cmap='coolwarm', square=True, cbar_kws={"shrink": .8})

# Add title
plt.title('Correlation Heatmap of Engine Size, MPG, and Horsepower')

# Show the plot
plt.show()



##### 1. Why did you pick the specific chart?

Heatmaps quantify correlations, allowing for a more straightforward interpretation of how strongly each pair of variables is related. This is particularly useful in understanding complex interrelationships.

##### 2. What is/are the insight(s) found from the chart?

Insights from the Heatmap

**Engine Size and Horsepower:**
- Strong Positive Correlation (Close to +1): There is a strong positive correlation between engine size and horsepower. This indicates that as the engine size increases, the horsepower also tends to increase, which aligns with expectations in automotive engineering. This relationship may influence consumer preferences toward larger engines for higher performance.

**Engine Size and MPG:**
- Negative Correlation with City MPG: Engine size shows a negative correlation with city MPG. This suggests that larger engines tend to consume more fuel in city driving conditions, likely because they burn more fuel per mile and usually sit in heavier vehicles.
- Negative Correlation with Highway MPG: Similarly, there is a negative correlation with highway MPG, indicating that larger engines also tend to be less fuel-efficient on highways. This insight can inform consumers who prioritize fuel economy in their purchasing decisions.

**Horsepower and MPG:**
- Negative Correlation with City MPG: Horsepower also demonstrates a negative correlation with city MPG, suggesting that vehicles with higher horsepower consume more fuel during city driving, which is consistent with higher output coming from larger engines.
- Negative Correlation with Highway MPG: The negative correlation with highway MPG indicates that higher horsepower is associated with lower fuel efficiency on highways as well. This can be critical information for consumers looking for a balance between power and fuel economy.

**Overall Trends:**
- Performance vs. Efficiency: The insights suggest a trade-off between vehicle performance (horsepower and engine size) and fuel efficiency (MPG). Consumers who prioritize power and performance may need to consider the implications on fuel consumption and costs.
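To read the heatmap as a ranked list rather than a grid, each unique pair of variables can be pulled out of the correlation matrix and sorted. The sketch below uses a hypothetical `toy` frame with invented values that follow the same direction of relationships described above; the actual coefficients for the dataset come from `new_correlation_matrix`.

```python
from itertools import combinations

import pandas as pd

# Hypothetical rows; column names mirror the notebook, values are made up.
toy = pd.DataFrame({
    "engine-size": [90, 110, 130, 150, 180],
    "horsepower":  [70, 90, 110, 140, 180],
    "city-mpg":    [35, 30, 26, 22, 17],
    "highway-mpg": [40, 36, 31, 27, 22],
})

# Pearson correlation matrix (the pandas default).
corr = toy.corr()

# One entry per unique pair of columns, sorted from most negative to most positive.
pairs = {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
ranked = pd.Series(pairs).sort_values()
print(ranked)
```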


#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df)


# Select relevant columns for the pairplot
pairplot_columns = ['engine-size', 'horsepower', 'city-mpg', 'price', 'drive-wheels']

# Create a pairplot
sns.pairplot(df[pairplot_columns], hue='drive-wheels', palette='coolwarm', markers=['o', 's', 'D'])

# Add title
plt.suptitle('Pairplot of Engine Size, Horsepower, City MPG, and Price by Drivetrain Configuration', y=1.02)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

Pairplots help detect interactions between variables and how they might behave under different categories, in this case, based on drivetrain configurations (RWD, FWD, 4WD). By separating the data points by drivetrain, we can understand how different configurations affect key variables like price and performance.

##### 2. What is/are the insight(s) found from the chart?

### Insights from the Pairplot (Engine Size, Horsepower, City MPG, and Price by Drivetrain Configuration)

1. **Engine Size vs. Horsepower**:
   - **Strong Positive Correlation**: Across all drivetrain types (FWD, RWD, 4WD), larger engine sizes tend to produce more horsepower. This suggests that vehicles with bigger engines generally deliver higher performance, making engine size a good predictor of horsepower.

2. **Engine Size vs. City MPG**:
   - **Negative Correlation**: Vehicles with larger engines have lower city MPG (miles per gallon), indicating reduced fuel efficiency. This trend holds across all drivetrain configurations, but the impact might differ slightly (e.g., FWD may offer slightly better fuel efficiency than RWD or 4WD).

3. **Price vs. Horsepower**:
   - **Positive Correlation**: Higher horsepower vehicles are generally more expensive, regardless of drivetrain. This is likely because more powerful engines are associated with performance and luxury, which adds to the vehicle's market value.

4. **Price vs. Engine Size**:
   - **Positive Correlation**: Larger engine sizes also correlate with higher prices. Vehicles with bigger engines (especially in RWD configurations) tend to be positioned in premium segments, reflecting their higher cost.

5. **City MPG vs. Price**:
   - **Inverse Relationship**: There is a negative relationship between city MPG and price, meaning less fuel-efficient cars (lower city MPG) tend to be more expensive. This is expected as higher-performance or luxury vehicles often sacrifice fuel efficiency for power and features.

6. **Drivetrain-Specific Trends**:
   - **RWD Vehicles**: Rear-wheel-drive (RWD) vehicles generally exhibit a broader range of engine sizes and horsepower, often leading to higher prices. This may reflect their prevalence in luxury or performance cars.
   - **FWD Vehicles**: Front-wheel-drive (FWD) cars tend to have smaller engine sizes and are priced lower, likely because they are more fuel-efficient and cater to mass-market consumers.
   - **4WD Vehicles**: Four-wheel-drive (4WD) cars fall between the other two groups on most axes; with few 4WD models in the data, any price difference from FWD should be read cautiously.

### Conclusion:
The pairplot shows clear positive correlations between engine size, horsepower, and price, while revealing the trade-offs with fuel efficiency (city MPG). These relationships help manufacturers and businesses target specific market segments, such as luxury vehicles with higher horsepower or fuel-efficient models for cost-conscious consumers. Understanding these insights can guide pricing strategies, design choices, and marketing approaches tailored to different consumer preferences based on drivetrain configurations.
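The drivetrain-specific trends read off the pairplot can be summarized numerically with per-group medians. This is a minimal sketch on a hypothetical `toy` frame with invented values; the column names match those passed to `sns.pairplot` above.

```python
import pandas as pd

# Hypothetical rows; column names mirror the notebook, values are made up.
toy = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd", "4wd", "4wd"],
    "engine-size":  [92, 98, 110, 140, 180, 210, 108, 120],
    "horsepower":   [68, 70, 88, 110, 160, 200, 82, 95],
    "city-mpg":     [31, 30, 27, 22, 19, 16, 24, 23],
    "price":        [6500, 7200, 8900, 15000, 24000, 34000, 8000, 10500],
})

# Median of each numeric column within each drive-wheel configuration.
numeric_cols = ["engine-size", "horsepower", "city-mpg", "price"]
by_drive = toy.groupby("drive-wheels")[numeric_cols].median()
print(by_drive)
```

Medians are used here instead of means so that a few very expensive cars do not dominate the comparison; that choice is a suggestion, not something the notebook prescribes.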

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

### Business Recommendations Based on Data Analysis

After analyzing the entire dataset and charts (correlation heatmaps, pair plots, regression plots, etc.), the following recommendations can help the client achieve their business objectives:

---

### **1. Product Strategy:**
   - **Luxury vs. Economy Segmentation**:
     - **Luxury Vehicles**: Focus on models with larger engines and higher horsepower, as these tend to have higher prices and appeal to consumers seeking performance and status (often with **RWD** configurations). Highlight the premium aspects of such cars, including advanced features, power, and design.
     - **Economy Vehicles**: For fuel-efficient models (typically **FWD**), emphasize affordability and cost-effectiveness. These cars generally have smaller engines and lower horsepower, which could be marketed as eco-friendly and budget-conscious options.

---

### **2. Price Optimization:**
   - **Engine Size and Horsepower as Price Drivers**: Since price correlates positively with engine size and horsepower, the business can consider segmenting its pricing model based on these attributes. Premium models with higher horsepower and larger engine sizes should be priced accordingly, positioning them as high-value, performance-driven products.
   - **Fuel Type Impact**: Diesel vehicles, which show higher average prices in the bar plot, can be marketed for long-distance driving or specific high-end use cases, while gasoline models should cater to more general consumers.
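One way to turn the horsepower-based segmentation above into something concrete is to bin cars into tiers and compare prices per tier. In this sketch the tier cutoffs (100 and 150 horsepower), the tier labels, and the `toy` data are all hypothetical choices for illustration, not values derived from the dataset.

```python
import pandas as pd

# Hypothetical rows; column names mirror the notebook, values are made up.
toy = pd.DataFrame({
    "horsepower": [68, 88, 102, 120, 150, 160, 200],
    "price":      [6500, 8900, 9800, 13000, 17000, 24000, 34000],
})

# Assumed tier boundaries; intervals are right-inclusive: (0, 100], (100, 150], (150, inf).
tiers = pd.cut(
    toy["horsepower"],
    bins=[0, 100, 150, float("inf")],
    labels=["economy", "mid", "premium"],
)

# Number of cars and median price in each horsepower tier.
tier_price = toy.groupby(tiers, observed=True)["price"].agg(["count", "median"])
print(tier_price)
```

On the real data, the cutoffs would need to be chosen from the actual horsepower distribution, for example by inspecting its quantiles.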

---

### **3. Fuel Efficiency and Sustainability:**
   - **Targeting Eco-Conscious Consumers**: With the global shift towards sustainability, the business should invest in models with better fuel efficiency (higher MPG, especially city MPG), which is seen as inversely correlated with engine size and price. Hybrid and electric models should also be part of the future product lineup to appeal to this growing segment.

---

### **4. Product Design Focus:**
   - **Body Style Trends**: Sedans and hatchbacks dominate the dataset, together accounting for about 81% of the listed models, as indicated in the pie chart. Focusing on these body styles with variations (in fuel efficiency, power, and price) can capture a large market share.
   - **Wagon and Convertible Market**: Although wagons and convertibles have smaller shares, there might be niche markets where these models are highly profitable. Marketing convertibles as luxury or leisure cars and wagons for families or outdoor activities could open new revenue streams.

---

### **5. Drive-Wheel Configuration Insights:**
   - **RWD for Premium Segments**: Rear-wheel-drive (RWD) configurations generally have higher prices and cater to the luxury and performance segment. Continuing to develop premium features for these models will help differentiate the brand in this category.
   - **FWD for Mass-Market Appeal**: Front-wheel-drive (FWD) vehicles are more cost-effective and efficient, making them ideal for broad consumer bases. The company should prioritize affordability and fuel efficiency for FWD models, along with marketing campaigns that emphasize these attributes.

---

### **6. Customer Insurance Risk Insights:**
   - **Insurance-Based Offerings**: The insurance risk distribution suggests certain body styles, such as sedans and wagons, skew toward lower insurance risk ratings compared to others like convertibles. This insight could be leveraged to offer insurance packages or partner with insurance providers to give customers discounts on low-risk vehicles, improving the overall customer experience.

---

### **7. Negative Growth Concerns:**
   - **High Horsepower Models with Low MPG**: A potential challenge is that high-horsepower models (especially RWD) are less fuel-efficient. With growing concerns about fuel prices and environmental impact, these models may face lower demand if not strategically positioned. To mitigate this risk, consider introducing **hybrid** or **electric versions** of premium, high-horsepower cars.

---

### **8. Focus on Differentiation:**
   - **Differentiation by Drivetrain**: The analysis of drive-wheel configurations and price shows that **RWD** vehicles are generally the highest-priced, while **FWD** and **4WD** vehicles sit lower. Focus on luxury features and driving experiences that can justify the higher price points, creating a unique selling proposition (USP) for RWD models.
   - **Fuel Type Differentiation**: Fuel type also plays a significant role, as diesel vehicles tend to be priced higher. Diesel engines can be targeted toward markets that prioritize long-distance driving efficiency and durability, while gasoline models should focus on everyday urban driving.

---

### **9. Data-Driven Marketing Strategy:**
   - **Body Style-Based Segmentation**: Since sedans and hatchbacks dominate the market, tailor marketing strategies to emphasize the strengths of each (comfort for sedans, versatility for hatchbacks). Use price data and performance insights to build campaigns targeting budget-conscious buyers for hatchbacks and mid-tier buyers for sedans.
   - **Price Sensitivity**: Adjust pricing strategies according to vehicle features that directly impact consumer perception of value, such as engine power, fuel efficiency, and drivetrain configuration.

---



# **Conclusion**

In summary, by strategically differentiating product lines based on **engine size**, **horsepower**, **fuel efficiency**, and **drivetrain configuration**, while addressing market trends for **luxury vs. economy vehicles**, the business can maximize revenue and market share. Introducing eco-friendly options and leveraging body style preferences, alongside risk-based insurance packages, can further enhance customer satisfaction and profitability.

These insights will help achieve a **positive business impact**, driving growth by catering to diverse consumer needs, pricing strategically, and maintaining competitiveness in an evolving automotive market.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***