## <center><font color='Red'>Data Analysis On Electric Vehicles</font></center>

# Table of Contents
- 1. Introduction
- 2. Data Exploration
    - Missing Value Detection
    - Important Observations
- 3. Task A (Exploratory Data Analysis (EDA))
    - Univariate Analysis of Each Column
    - Bivariate Analysis
- 4. Task B (Choropleth)
- 5. Task C (bar_chart_race)


### Introduction
#### Electric vehicles (EVs) are automobiles that use one or more electric motors, powered by rechargeable batteries, as their primary source of propulsion. They are an eco-friendly and sustainable alternative to traditional internal combustion engine (ICE) vehicles, which rely on fossil fuels for power. EVs have gained significant popularity in recent years due to their potential to reduce greenhouse gas emissions, improve air quality, and decrease dependence on fossil fuels.

### Loading the data

In [None]:
import pandas as pd
import plotly.express as px


## Data Exploration

In [None]:
## Load The Data
df_cars = pd.read_csv(r"/content/drive/MyDrive/dataset.csv")
df_cars.head()

In [None]:
### Checking for the column name and data types
df_cars.info()

## Missing values were detected in the "Model", "Legislative District", "Vehicle Location", and "Electric Utility" features.

In [None]:
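# Count missing values per column rather than dropping rows here:
# dropping every incomplete row would leave nothing for the per-column treatment below.
print(df_cars.isna().sum())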
df_cars.info()


### Model Feature Missing value treatment

In [None]:
mode_value = df_cars['Model'].mode()[0]  # Compute the mode of the 'Model' column

# Replace missing values in the 'Model' column with the mode_value
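# (assign the result back; fillna(inplace=True) on a selected column is deprecated in recent pandas)
df_cars['Model'] = df_cars['Model'].fillna(mode_value)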

# Display DataFrame information after filling missing values
df_cars.info()

### Legislative District Missing values treatment

In [None]:
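# Note: district numbers are identifiers, so the mean can be a non-integer value
# that matches no real district; the mode is a common alternative.
mean_value = df_cars["Legislative District"].mean()

# Replace missing values in the 'Legislative District' column with the mean_value
df_cars["Legislative District"] = df_cars["Legislative District"].fillna(mean_value)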

# Display DataFrame information after filling missing values
df_cars.info()
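
The choice between mean and mode matters for an identifier-like column such as "Legislative District". The following is a minimal sketch on made-up district numbers (not taken from the dataset) that compares the two fills side by side:

```python
import pandas as pd

# Toy district numbers with one gap; the values are made up for illustration.
districts = pd.Series([43.0, 43.0, 36.0, None, 1.0], name="Legislative District")

# Mean imputation: (43 + 43 + 36 + 1) / 4 = 30.75, which is not a real district.
mean_filled = districts.fillna(districts.mean())

# Mode imputation: 43.0, the most frequent existing district.
mode_filled = districts.fillna(districts.mode()[0])

print(mean_filled.iloc[3], mode_filled.iloc[3])
```

Which fill is appropriate depends on how the column is used later; if it only serves as a grouping key, the mode (or a separate "unknown" category) keeps the values interpretable.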

### Vehicle Location Missing Values treatment

In [None]:

# Extract longitude and latitude from the "Vehicle Location" column, which stores
# points as "POINT (longitude latitude)". One extract call returns both capture groups.
coords = df_cars['Vehicle Location'].str.extract(r'POINT \(([-\d.]+) ([-\d.]+)\)')

# Convert the extracted strings to numeric (float) columns; rows with no location stay NaN
df_cars['Longitude'] = pd.to_numeric(coords[0])
df_cars['Latitude'] = pd.to_numeric(coords[1])

df_cars.info()
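
To check the parsing step in isolation, here is a small sketch on a toy column. The coordinates are made up for illustration, and the `toy` DataFrame is a hypothetical stand-in for `df_cars`. It shows that the first capture group is the longitude, the second is the latitude, and a missing location stays missing:

```python
import pandas as pd

# Toy stand-in for the 'Vehicle Location' column; values are made up.
toy = pd.DataFrame(
    {"Vehicle Location": ["POINT (-122.33 47.61)", None, "POINT (-117.42 47.66)"]}
)

# Column 0 of the result is the first capture group (longitude),
# column 1 is the second capture group (latitude).
coords = toy["Vehicle Location"].str.extract(r"POINT \(([-\d.]+) ([-\d.]+)\)")
toy["Longitude"] = pd.to_numeric(coords[0])
toy["Latitude"] = pd.to_numeric(coords[1])

print(toy[["Longitude", "Latitude"]])
```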

In [None]:
df_cars['Electric Utility'].unique()

In [None]:
df_cars.drop('Electric Utility', axis=1, inplace=True)


# Exploratory Data Analysis

## Univariate Analysis

'Model Year' Univariate Analysis

In [None]:
fig = px.histogram(df_cars, x='Model Year', title='Hist plot for Model Year')
# Show the plot
fig.show()

### The distribution is clearly left-skewed: most vehicles have a model year after 2015.


In [None]:
# Create the box plot
fig = px.box(data_frame=df_cars, y='Model Year')

# Show the plot
fig.show()

### As the previous histogram suggested, very few vehicles have a model year before 2010.

## Outlier treatment by filtering

In [None]:
cleaned_EV_df = df_cars[(df_cars["Model Year"] > 2010)]
print(cleaned_EV_df.shape)
print(df_cars.shape)
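
The cell above removes old model years with a fixed cutoff. For comparison, the sketch below applies the common 1.5 * IQR rule to made-up model years and shows the difference between filtering (dropping rows) and capping (clipping values to the fences). The data and variable names are illustrative assumptions, not part of the notebook:

```python
import pandas as pd

# Toy model years with two old outliers; values are made up for illustration.
years = pd.Series([1997, 2000, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023])

# Interquartile range and the usual 1.5 * IQR fences.
q1, q3 = years.quantile(0.25), years.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Filtering drops the rows outside the fences.
filtered = years[years.between(lower, upper)]

# Capping keeps every row but pulls outlying values to the nearest fence.
capped = years.clip(lower=lower, upper=upper)

print(lower, upper, len(filtered), len(capped))
```

Filtering changes the row count, while capping preserves it but alters values, so the two choices affect downstream counts differently.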

In [None]:

# Create the box plot on the filtered data to see the effect of the cutoff
fig = px.box(data_frame=cleaned_EV_df, y='Model Year')

# Show the plot
fig.show()

'Electric Range' Univariate Analysis

In [None]:
fig = px.histogram(df_cars, x='Electric Range', title='Count Plot for Electric Range')
# Show the plot
fig.show()

In [None]:
fig = px.box(data_frame=df_cars, y='Electric Range')

# Show the plot
fig.show()

### Categorical Univariate Analysis

In [None]:
# Count of vehicles per county
fig = px.histogram(df_cars, y='County', title='Count Plot for County')
# Show the plot
fig.show()

### King County has more electric vehicles than any other county.

In [None]:
# Count of vehicles per city
fig = px.histogram(df_cars, y='City', title='Count Plot for City')
# Show the plot
fig.show()


Seattle has more electric vehicles than any other city.

In [None]:
# Count of vehicles per make
fig = px.histogram(df_cars, y='Make', title='Count Plot for Make')
# Show the plot
fig.show()


Tesla accounts for more of these electric vehicles than any other make.

In [None]:

fig = px.histogram(df_cars, y='Electric Vehicle Type', title='Count Plot for Electric Vehicle Type')
# Show the plot
fig.show()


Battery Electric Vehicles are the most common type.

In [None]:
# Count of vehicles per model
fig = px.histogram(df_cars, y='Model', title='Count Plot for Model')
# Show the plot
fig.show()


### Model 3 and Model Y are the most common models.
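
Columns such as 'City' and 'Model' have many distinct values, so a plot of every category can be hard to read. One option is to rank the categories first and keep only the top few. This is a minimal sketch on made-up data (the `toy` frame and the cutoff of 2 are assumptions for illustration):

```python
import pandas as pd

# Toy stand-in for the 'Make' column; values are made up.
toy = pd.DataFrame(
    {"Make": ["TESLA", "TESLA", "NISSAN", "TESLA", "CHEVROLET", "NISSAN"]}
)

# value_counts sorts by frequency (descending), so head(n) gives the top n categories.
top_makes = toy["Make"].value_counts().head(2)

# Share of all rows covered by each of the top categories.
share = (top_makes / len(toy)).round(2)

print(top_makes)
print(share)
```

The resulting Series can be passed to `px.bar` (for example with `x=top_makes.values, y=top_makes.index`) to plot only the leading categories.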

In [None]:
# Restrict the correlation to numeric columns; text columns cannot be correlated
df_cars.corr(numeric_only=True)

In [None]:
# Plot the correlation matrix (not the raw DataFrame) as a heatmap
heatmap = px.imshow(df_cars.corr(numeric_only=True))

# Show the plot
heatmap.show()

## No pair of numeric columns shows a strong positive correlation.

In [None]:
# Scatter plot of Electric Range against Model Year (column names become the axis labels)
fig = px.scatter(df_cars, x='Model Year', y='Electric Range')
fig.show()

In [None]:
# Scatter plot of Base MSRP against Model Year
fig = px.scatter(df_cars, x='Model Year', y='Base MSRP')
fig.show()

## There is no strong relationship between Model Year and Base MSRP.
## Prices do rise slightly as the model year increases,
## so Model Year may still account for some of the variation in price.
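
The visual impression from the scatter plot can be backed by a number. The sketch below computes the Pearson correlation between model year and price on made-up data. It assumes that a Base MSRP of 0 is a placeholder for an unknown price and excludes those rows; that assumption should be checked against the dataset's documentation before applying it to `df_cars`:

```python
import pandas as pd

# Toy data; values are made up. Zeros stand in for unknown prices (an assumption).
toy = pd.DataFrame(
    {
        "Model Year": [2012, 2014, 2016, 2018, 2020],
        "Base MSRP": [30000, 0, 32000, 0, 36000],
    }
)

# Keep only rows with a recorded price before measuring the relationship.
priced = toy[toy["Base MSRP"] > 0]

# Series.corr defaults to the Pearson correlation coefficient.
r = priced["Model Year"].corr(priced["Base MSRP"])

print(len(priced), round(r, 2))
```

A value near 0 would support the "no strong relationship" reading, while a value near 1 or -1 would contradict it.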

## Task B

In [None]:
df_cars.columns

In [None]:
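# location_counts is not defined earlier in this notebook; as an assumption,
# it is taken here to be the number of vehicles per state.
location_counts = df_cars['State'].value_counts()
location_counts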


In [None]:
import plotly.express as px

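# Count EVs per state and model year
state_counts = df_cars.groupby(['State', 'Model Year']).size().reset_index(name='EV Count')

# Attach the postal codes seen for each state/year (one row per distinct postal code)
state_counts = state_counts.merge(df_cars[['State', 'Model Year', 'Postal Code']].drop_duplicates(), on=['State', 'Model Year'])

# Sort so the animation frames play in chronological order
state_counts = state_counts.sort_values('Model Year')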

fig = px.choropleth(
    state_counts,
    locations='State',
    color='EV Count',
    hover_name='State',
    hover_data=['Model Year', 'EV Count', 'Postal Code'],
    locationmode='USA-states',
    scope='usa',
    title='EV Counts by State (with Postal Codes)',
    animation_frame='Model Year',
    center={"lat": 37.0902, "lon": -95.7129},
)

fig.update_layout(
    height=600, width=800
)
fig.show()


## The choropleth above shows the number of electric vehicles in each US state, animated by Model Year.
## The hover text also lists the postal codes recorded for each state.
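
Merging the distinct postal codes back in produces one row per postal code for each state and year. If one row per state and year is enough, the count and the number of distinct postal codes can be computed in a single aggregation. This is a sketch on made-up data; the 'Postal Codes' column name is an assumption introduced here:

```python
import pandas as pd

# Toy registrations; values are made up for illustration.
toy = pd.DataFrame(
    {
        "State": ["WA", "WA", "WA", "OR"],
        "Model Year": [2020, 2020, 2021, 2020],
        "Postal Code": [98101, 98052, 98101, 97201],
    }
)

# One row per (State, Model Year): number of vehicles and number of distinct postal codes.
state_counts = (
    toy.groupby(["State", "Model Year"])
    .agg(**{"EV Count": ("Postal Code", "size"), "Postal Codes": ("Postal Code", "nunique")})
    .reset_index()
    .sort_values("Model Year")  # chronological order for animation frames
)

print(state_counts)
```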

## Task C: Racing Bar Plots

In [None]:
# Step 1: Copy 'Model Year' into a shorter 'Year' column
df_cars['Year'] = df_cars['Model Year']

# Step 2: Copy 'Make' into a 'Maker' column
df_cars['Maker'] = df_cars['Make']

# Step 3: Calculate the count of EV vehicles for each year and maker combination
count_data = df_cars.groupby(['Year', 'Maker']).size().reset_index(name='Count')

count_data

In [None]:
import plotly.express as px

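# Count vehicles per make and model year, sorted so the animation frames play chronologically
make_count_by_year = df_cars.groupby(['Make', 'Model Year']).size().reset_index(name='Count').sort_values('Model Year')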


fig = px.bar(make_count_by_year,
             x='Count',
             y='Make',
             animation_frame='Model Year',
             color='Make',
             labels={'Make': 'Electric Vehicle Make', 'Count': 'Count of EV Vehicles'},
             title='Racing Bar Plot of EV Vehicles by Make and Model Year')

fig.update_layout(
    xaxis=dict(title='Count of EV Vehicles'),
    yaxis=dict(title='Electric Vehicle Make'),
    showlegend=False,
    height=600, width=800
)

fig.show()


In [None]:
!pip install bar_chart_race

In [None]:
import bar_chart_race as bcr

# Group data by 'Make' and 'Model Year' and count the number of EV vehicles for each combination
px_make_count_by_year = df_cars.groupby(['Make', 'Model Year']).size().reset_index(name='Count')

# Pivot the data to have 'Make' as columns and 'Model Year' as index
# (makes with no vehicles in a given year get 0 instead of NaN)
px_make_count_pivot = px_make_count_by_year.pivot(index='Model Year', columns='Make', values='Count').fillna(0)

# Create the bar chart race using the pivoted DataFrame
bcr.bar_chart_race(df=px_make_count_pivot, filename='px_make_count_by_year_plot.mp4',title='Racing Bar Plot of EV Vehicles by Make and Model Year')
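
The pivot above yields per-year counts, so each frame ranks makes by that single model year. A race chart is often drawn on running totals instead. The sketch below builds both tables on made-up data using `pd.crosstab`, which fills missing combinations with 0. Whether per-year or cumulative counts are the right input is a design choice, not something the notebook specifies:

```python
import pandas as pd

# Toy registrations; values are made up for illustration.
toy = pd.DataFrame(
    {
        "Make": ["TESLA", "NISSAN", "TESLA", "TESLA"],
        "Model Year": [2019, 2019, 2020, 2020],
    }
)

# Wide table: one row per model year, one column per make, 0 where a make has no vehicles.
wide = pd.crosstab(toy["Model Year"], toy["Make"])

# Running total per make across model years.
cumulative = wide.cumsum()

print(wide)
print(cumulative)
```

Either `wide` or `cumulative` has the shape `bar_chart_race` expects: the index is the time axis and each column is one bar.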


In [None]:
from IPython.display import Video
video_path = '/content/px_make_count_by_year_plot.mp4'
Video(video_path, embed=True)


### The bar_chart_race above shows the count of electric vehicles by make and model year.