<a href="https://www.kaggle.com/code/rishijmanna/ev-sale?scriptVersionId=222286272" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# **Introduction**

"The automotive world is experiencing a seismic shift towards sustainability, driven by the accelerating adoption of electric vehicles (EVs). This notebook embarks on a journey to dissect the global EV sales landscape, focusing on trends at both continental and national levels. We delve into the data to uncover the leaders and laggards in EV sales, visualize the evolving patterns using a variety of charts, and even map the geographical distribution of EV adoption.

Our primary objective is to answer a set of critical questions about the EV revolution; they are listed under **Target Findings** below.

**Mounting Google Drive**

In [1]:
# Uncomment these lines if you are running the notebook in Google Colab
# from google.colab import drive
# drive.mount('/content/drive')

# Importing Necessary Libraries

In [2]:
import pandas as pd
import numpy as np
import warnings

## Importing Dataset

## Source of Dataset
The dataset used in this notebook was downloaded from Kaggle and is reused here.

Link:

[IEA Global EV Data 2024.csv](https://www.kaggle.com/code/waqi786/global-ev-sales-analysis-2010-2024/input)

In [3]:
df=pd.read_csv("/kaggle/input/ev-sales/IEA Global EV Data 2024.csv")
df.head()

| | region | category | parameter | mode | powertrain | year | unit | value |
|---|---|---|---|---|---|---|---|---|
| 0 | Australia | Historical | EV stock share | Cars | EV | 2011 | percent | 0.00039 |
| 1 | Australia | Historical | EV sales share | Cars | EV | 2011 | percent | 0.0065 |
| 2 | Australia | Historical | EV sales | Cars | BEV | 2011 | Vehicles | 49.0 |
| 3 | Australia | Historical | EV stock | Cars | BEV | 2011 | Vehicles | 49.0 |
| 4 | Australia | Historical | EV stock | Cars | BEV | 2012 | Vehicles | 220.0 |


# Analyzing the Dataset

<h2>Data Loading and Initial Exploration</h2>
<p>

In this section, we load the Global EV Sales Dataset with pandas and perform an initial inspection. The dataset contains 12,654 rows and 8 columns. It is in long format: each row holds a single measurement (<code>value</code>) identified by region, category, parameter, mode, powertrain, year, and unit. Besides EV sales and stock (as absolute numbers and as shares), it covers public charging points, electricity demand, and oil displacement, giving a broad view of the EV landscape across regions.
</p>
<h3>The Columns in the Dataset Include:</h3>
<ul>
  <li><b>region</b>: The country or aggregate region the row refers to, including aggregates such as "World", "Europe", "EU27", and "Rest of the world".</li>
  <li><b>category</b>: Distinguishes observed data ("Historical") from projections under two IEA scenarios: "Projection-STEPS" (Stated Policies Scenario) and "Projection-APS" (Announced Pledges Scenario).</li>
  <li><b>parameter</b>: Specifies the metric being measured: "EV sales", "EV stock", "EV sales share", "EV stock share", "EV charging points", "Electricity demand", "Oil displacement Mbd", or "Oil displacement, million lge".</li>
  <li><b>mode</b>: Identifies the vehicle segment: "Cars", "Buses", "Vans", or "Trucks". Charging-point rows use the value "EV" instead of a vehicle segment.</li>
  <li><b>powertrain</b>: Indicates the drivetrain technology, such as "BEV" (Battery Electric Vehicle), "PHEV" (Plug-in Hybrid Electric Vehicle), or the aggregate "EV". For charging-point rows it holds the charger type ("Publicly available fast" or "Publicly available slow").</li>
  <li><b>year</b>: The year of the data point, from 2010 to 2035; the later years are covered by the projection categories.</li>
  <li><b>unit</b>: The unit of measurement for <code>value</code>: "Vehicles", "percent", "charging points", "GWh", "Milion barrels per day" (spelled this way in the dataset), or "Oil displacement, million lge".</li>

  <li><b>value</b>: Holds the numerical value associated with the parameter and unit, representing the data point itself.</li>

</ul>
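Because every row holds one `value` identified by the other seven columns, the data is in "long" format. A minimal sketch, assuming you want one column per `parameter`, pivots the five Australian rows printed by `df.head()` above into "wide" format with pandas `pivot_table`:

```python
import pandas as pd

# The five rows printed by df.head() above (Australia, 2011-2012)
sample = pd.DataFrame({
    "region": ["Australia"] * 5,
    "category": ["Historical"] * 5,
    "parameter": ["EV stock share", "EV sales share", "EV sales", "EV stock", "EV stock"],
    "mode": ["Cars"] * 5,
    "powertrain": ["EV", "EV", "BEV", "BEV", "BEV"],
    "year": [2011, 2011, 2011, 2011, 2012],
    "unit": ["percent", "percent", "Vehicles", "Vehicles", "Vehicles"],
    "value": [0.00039, 0.0065, 49.0, 49.0, 220.0],
})

# One row per (region, powertrain, year), one column per parameter.
# Combinations with no matching row are left as NaN.
wide = sample.pivot_table(
    index=["region", "powertrain", "year"],
    columns="parameter",
    values="value",
    aggfunc="sum",
)
print(wide)
```

Note that the wide table puts percentages and vehicle counts side by side, which is one reason the analysis below treats each unit separately.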



<h3>Simplifying the Analysis:</h3>



<p>

The dataset mixes many metrics with different units, so the core of the analysis focuses on records with the unit "Vehicles", which hold absolute numbers such as sales and stock. Percentage rows (EV sales share and EV stock share) are on a different scale, and combining them with absolute counts would distort the interpretation of sales, so they are analyzed separately rather than mixed into the vehicle-count visualizations.

</p>
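A minimal sketch of such a filter, assuming we also keep only `category == "Historical"` so projected scenarios are not mixed with observed counts (the notebook's own cells filter on `unit` alone). `historical_vehicle_rows` is a hypothetical helper name, and the projection value in the toy data is made up for illustration:

```python
import pandas as pd

def historical_vehicle_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep observed absolute counts only: unit == 'Vehicles' and category == 'Historical'."""
    mask = (frame["unit"] == "Vehicles") & (frame["category"] == "Historical")
    # An explicit .copy() makes the result an independent DataFrame
    return frame.loc[mask].drop(columns="unit").copy()

# Illustrative rows; the Projection-STEPS value is hypothetical
toy = pd.DataFrame({
    "region": ["Australia", "Australia", "Australia"],
    "category": ["Historical", "Historical", "Projection-STEPS"],
    "parameter": ["EV sales share", "EV sales", "EV sales"],
    "unit": ["percent", "Vehicles", "Vehicles"],
    "year": [2011, 2011, 2030],
    "value": [0.0065, 49.0, 1000.0],
})
print(historical_vehicle_rows(toy))
```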

# Target Findings
1. Representing all unique values of the `unit` column by region

2. Representing the overall trend of different parameters (e.g., EV sales, EV stock share, EV stock) in different regions over the years

3. Representing the distribution of charging points across different regions or countries

4. Representing the distribution of a particular powertrain type (e.g., BEV, PHEV) across different regions or countries over various modes (Cars, Buses, etc.)

5. Representing the change in the total number of each unique powertrain by year

6. Representing the distribution of each unique parameter by count

7. Representing the distribution of parameters (EV sales, EV stock) by mode type

8. Representing the relationship between parameters (EV sales, EV stock) and region

9. Representing how the popularity of different modes (vehicle types) varies across regions or countries

10. Representing how the popularity of different modes (vehicle types) varies across regions or countries over the years

11. Grouping by continent and representing the total value for each continent




In [4]:
df.info()  # info() prints its summary directly and returns None, so it is not wrapped in display()
display(df.shape)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12654 entries, 0 to 12653
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   region      12654 non-null  object 
 1   category    12654 non-null  object 
 2   parameter   12654 non-null  object 
 3   mode        12654 non-null  object 
 4   powertrain  12654 non-null  object 
 5   year        12654 non-null  int64  
 6   unit        12654 non-null  object 
 7   value       12654 non-null  float64
dtypes: float64(1), int64(1), object(6)
memory usage: 791.0+ KB


(12654, 8)

#### Unique Values of Each Column

In [5]:
for column in df.columns:
    unique_values = df[column].unique()
    print(f"Unique values for column '{column}':")
    print(unique_values)
    print("\n")

Unique values for column 'region':
['Australia' 'Austria' 'Belgium' 'Brazil' 'Bulgaria' 'Canada' 'Chile'
 'China' 'Colombia' 'Costa Rica' 'Croatia' 'Cyprus' 'Czech Republic'
 'Denmark' 'Estonia' 'EU27' 'Europe' 'Finland' 'France' 'Germany' 'Greece'
 'Hungary' 'Iceland' 'India' 'Indonesia' 'Ireland' 'Israel' 'Italy'
 'Japan' 'Korea' 'Latvia' 'Lithuania' 'Luxembourg' 'Mexico' 'Netherlands'
 'New Zealand' 'Norway' 'Poland' 'Portugal' 'Rest of the world' 'Romania'
 'Seychelles' 'Slovakia' 'Slovenia' 'South Africa' 'Spain' 'Sweden'
 'Switzerland' 'Thailand' 'Turkiye' 'United Arab Emirates'
 'United Kingdom' 'USA' 'World']


Unique values for column 'category':
['Historical' 'Projection-STEPS' 'Projection-APS']


Unique values for column 'parameter':
['EV stock share' 'EV sales share' 'EV sales' 'EV stock'
 'EV charging points' 'Electricity demand' 'Oil displacement Mbd'
 'Oil displacement, million lge']


Unique values for column 'mode':
['Cars' 'EV' 'Buses' 'Vans' 'Trucks']


Unique values

#### Checking for Null Values

In [6]:
df.isnull().sum()

region        0
category      0
parameter     0
mode          0
powertrain    0
year          0
unit          0
value         0
dtype: int64

No column contains null values, so no imputation or row removal is needed.


# Analyzing Electric Vehicle Data

First, we split the dataset into one subset per unique value of the `unit` column and drop the now-redundant `unit` column from each. Working with one unit at a time keeps values on different scales (vehicle counts, percentages, GWh, and so on) from being mixed in the same summary.
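The six cells below repeat the same filter-and-drop step for each unit. A minimal sketch of a single-pass alternative, assuming a dictionary of subsets is acceptable instead of separate variables (`split_by_unit` is a hypothetical helper name; the toy rows come from outputs in this notebook):

```python
import pandas as pd

def split_by_unit(frame: pd.DataFrame) -> dict:
    """Return {unit: rows with that unit}, each without the redundant 'unit' column."""
    return {unit: group.drop(columns="unit") for unit, group in frame.groupby("unit")}

toy = pd.DataFrame({
    "region": ["Australia", "Australia", "China"],
    "parameter": ["EV sales", "EV sales share", "Electricity demand"],
    "unit": ["Vehicles", "percent", "GWh"],
    "value": [49.0, 0.0065, 150.0],
})
subsets = split_by_unit(toy)
for unit, subset in subsets.items():
    print(unit, subset.shape)
```

Applied to the full `df`, the dictionary keys are the unit strings exactly as stored, including the dataset's spelling `'Milion barrels per day'`.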




In [7]:
# For 'Vehicles': keep matching rows and drop the now-redundant 'unit' column.
# Chaining .drop() (instead of drop(..., inplace=True) on a slice) avoids SettingWithCopyWarning.
vehicles = df.loc[df['unit'] == 'Vehicles'].drop(columns='unit')
display(vehicles.head())
display(vehicles.shape)
display(vehicles.describe())


| | region | category | parameter | mode | powertrain | year | value |
|---|---|---|---|---|---|---|---|
| 2 | Australia | Historical | EV sales | Cars | BEV | 2011 | 49.0 |
| 3 | Australia | Historical | EV stock | Cars | BEV | 2011 | 49.0 |
| 4 | Australia | Historical | EV stock | Cars | BEV | 2012 | 220.0 |
| 5 | Australia | Historical | EV sales | Cars | BEV | 2012 | 170.0 |
| 8 | Australia | Historical | EV stock | Cars | PHEV | 2012 | 80.0 |


(6842, 7)

| | year | value |
|---|---|---|
| count | 6842.0 | 6842.0 |
| mean | 2019.883221 | 750380.3 |
| std | 5.352174 | 9307153.0 |
| min | 2010.0 | 0.001 |
| 25% | 2016.0 | 67.25 |
| 50% | 2020.0 | 1200.0 |
| 75% | 2022.0 | 22000.0 |
| max | 2035.0 | 440000000.0 |


In [8]:
# For 'percent'
percent = df.loc[df['unit'] == 'percent'].drop(columns='unit')
display(percent.head())
display(percent.shape)
display(percent.describe())

| | region | category | parameter | mode | powertrain | year | value |
|---|---|---|---|---|---|---|---|
| 0 | Australia | Historical | EV stock share | Cars | EV | 2011 | 0.00039 |
| 1 | Australia | Historical | EV sales share | Cars | EV | 2011 | 0.0065 |
| 6 | Australia | Historical | EV sales share | Cars | EV | 2012 | 0.03 |
| 7 | Australia | Historical | EV stock share | Cars | EV | 2012 | 0.0024 |
| 12 | Australia | Historical | EV stock share | Cars | EV | 2013 | 0.0046 |


(3171, 7)

| | year | value |
|---|---|---|
| count | 3171.0 | 3171.0 |
| mean | 2018.962157 | 4.958949 |
| std | 5.228093 | 12.213339 |
| min | 2010.0 | 1.5e-05 |
| 25% | 2015.0 | 0.069 |
| 50% | 2019.0 | 0.48 |
| 75% | 2022.0 | 3.0 |
| max | 2035.0 | 93.0 |


In [9]:
# For 'charging points'
print('charging points')
charging_points = df.loc[df['unit'] == 'charging points'].drop(columns='unit')
display(charging_points.head())
display(charging_points.shape)
display(charging_points.describe())

charging points


| | region | category | parameter | mode | powertrain | year | value |
|---|---|---|---|---|---|---|---|
| 38 | Australia | Historical | EV charging points | EV | Publicly available fast | 2017 | 40.0 |
| 39 | Australia | Historical | EV charging points | EV | Publicly available slow | 2017 | 440.0 |
| 44 | Australia | Historical | EV charging points | EV | Publicly available fast | 2018 | 61.0 |
| 45 | Australia | Historical | EV charging points | EV | Publicly available slow | 2018 | 670.0 |
| 54 | Australia | Historical | EV charging points | EV | Publicly available slow | 2019 | 1700.0 |


(918, 7)

| | year | value |
|---|---|---|
| count | 918.0 | 918.0 |
| mean | 2019.245098 | 274721.5 |
| std | 4.731833 | 1201415.0 |
| min | 2010.0 | 0.1 |
| 25% | 2016.0 | 252.5 |
| 50% | 2019.0 | 2800.0 |
| 75% | 2022.0 | 31750.0 |
| max | 2035.0 | 15000000.0 |


In [10]:
# For 'GWh'
print('GWh')
gwh = df.loc[df['unit'] == 'GWh'].drop(columns='unit')
display(gwh.head())
display(gwh.shape)
display(gwh.describe())

GWh


| | region | category | parameter | mode | powertrain | year | value |
|---|---|---|---|---|---|---|---|
| 1071 | China | Historical | Electricity demand | Buses | EV | 2010 | 150.0 |
| 1072 | China | Historical | Electricity demand | Vans | EV | 2010 | 3.0 |
| 1073 | China | Historical | Electricity demand | Cars | EV | 2010 | 46.0 |
| 1120 | China | Historical | Electricity demand | Buses | EV | 2011 | 110.0 |
| 1121 | China | Historical | Electricity demand | Vans | EV | 2011 | 2.5 |


(551, 7)

| | year | value |
|---|---|---|
| count | 551.0 | 551.0 |
| mean | 2021.593466 | 30393.99 |
| std | 6.40638 | 117491.2 |
| min | 2010.0 | 0.0039 |
| 25% | 2018.0 | 300.0 |
| 50% | 2021.0 | 2000.0 |
| 75% | 2025.0 | 15000.0 |
| max | 2035.0 | 1600000.0 |


In [11]:
# For 'Million barrels per day' (the dataset itself spells this unit 'Milion barrels per day')
print('Milion barrels per day')
million_barrels_per_day = df.loc[df['unit'] == 'Milion barrels per day'].drop(columns='unit')
display(million_barrels_per_day.head())
display(million_barrels_per_day.shape)
display(million_barrels_per_day.describe())

Milion barrels per day


| | region | category | parameter | mode | powertrain | year | value |
|---|---|---|---|---|---|---|---|
| 1090 | China | Historical | Oil displacement Mbd | Buses | EV | 2010 | 0.00023 |
| 1091 | China | Historical | Oil displacement Mbd | Vans | EV | 2010 | 1e-05 |
| 1092 | China | Historical | Oil displacement Mbd | Cars | EV | 2010 | 2.9e-05 |
| 1096 | China | Historical | Oil displacement Mbd | Buses | EV | 2011 | 0.00024 |
| 1097 | China | Historical | Oil displacement Mbd | Vans | EV | 2011 | 1.3e-05 |


(586, 7)

| | year | value |
|---|---|---|
| count | 586.0 | 586.0 |
| mean | 2021.411263 | 0.144178 |
| std | 6.297997 | 0.588919 |
| min | 2010.0 | 1e-06 |
| 25% | 2017.0 | 0.000603 |
| 50% | 2021.0 | 0.00855 |
| 75% | 2023.0 | 0.05 |
| max | 2035.0 | 7.8 |


In [12]:
# For 'Oil displacement, million lge'
print('Oil displacement, million lge')
oil_displacement = df.loc[df['unit'] == 'Oil displacement, million lge'].drop(columns='unit')
display(oil_displacement.head())
display(oil_displacement.shape)
display(oil_displacement.describe())

Oil displacement, million lge


| | region | category | parameter | mode | powertrain | year | value |
|---|---|---|---|---|---|---|---|
| 1093 | China | Historical | Oil displacement, million lge | Buses | EV | 2010 | 14.0 |
| 1094 | China | Historical | Oil displacement, million lge | Vans | EV | 2010 | 0.62 |
| 1095 | China | Historical | Oil displacement, million lge | Cars | EV | 2010 | 1.7 |
| 1099 | China | Historical | Oil displacement, million lge | Buses | EV | 2011 | 15.0 |
| 1100 | China | Historical | Oil displacement, million lge | Vans | EV | 2011 | 0.81 |


(586, 7)

| | year | value |
|---|---|---|
| count | 586.0 | 586.0 |
| mean | 2021.411263 | 8419.437285 |
| std | 6.297997 | 34255.54534 |
| min | 2010.0 | 0.069 |
| 25% | 2017.0 | 35.0 |
| 50% | 2021.0 | 500.0 |
| 75% | 2023.0 | 3000.0 |
| max | 2035.0 | 450000.0 |


In [13]:
df['mode'].unique()

array(['Cars', 'EV', 'Buses', 'Vans', 'Trucks'], dtype=object)

## Mode Frequency per Unit

The cell below counts how often each value of the `mode` column occurs within each unit-specific subset (Vehicles, percent, charging points, GWh, million barrels per day, and million lge). This shows which vehicle segments each kind of metric covers; for example, charging-point rows use only the mode `EV`.
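The six separate `value_counts()` calls can also be collapsed into one unit-by-mode table. A minimal sketch on toy rows (not taken from the dataset), using a grouped `value_counts()` followed by `unstack()`:

```python
import pandas as pd

toy = pd.DataFrame({
    "mode": ["Cars", "Cars", "Buses", "EV", "Cars"],
    "unit": ["Vehicles", "percent", "Vehicles", "charging points", "Vehicles"],
})
# Count (unit, mode) pairs, then spread modes into columns; absent pairs become 0
counts = toy.groupby("unit")["mode"].value_counts().unstack(fill_value=0)
print(counts)
```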

In [14]:
# For 'Vehicles'
print()
print('Vehicles')
mode_vehicles = vehicles['mode'].value_counts()
display(mode_vehicles)

# For 'Percent'
print()
print('percent')
percent_mode = percent['mode'].value_counts()
display(percent_mode)

# For 'charging points'
print()
print('charging points')
charging_points_mode = charging_points['mode'].value_counts()
display(charging_points_mode)

# For 'GWh'
print()
print('GWh')
gwh_mode = gwh['mode'].value_counts()
display(gwh_mode)

# For 'Million barrels per day' (reuses the subset created earlier)
print()
print('Milion barrels per day')
million_barrels_per_day_mode = million_barrels_per_day['mode'].value_counts()
display(million_barrels_per_day_mode)

# For 'Oil displacement, million lge' (reuses the subset created earlier)
print()
print('Oil displacement, million lge')
oil_displacement_mode = oil_displacement['mode'].value_counts()
display(oil_displacement_mode)


Vehicles


mode
Cars      2975
Buses     1485
Vans      1449
Trucks     933
Name: count, dtype: int64


percent


mode
Cars      1236
Buses      739
Vans       716
Trucks     480
Name: count, dtype: int64


charging points


mode
EV    918
Name: count, dtype: int64


GWh


mode
Cars      159
Buses     148
Vans      125
Trucks    119
Name: count, dtype: int64


Milion barrels per day


mode
Cars      168
Buses     162
Vans      139
Trucks    117
Name: count, dtype: int64


Oil displacement, million lge


mode
Cars      168
Buses     162
Vans      139
Trucks    117
Name: count, dtype: int64

# Two-Way Crosstab of Mode and Unit

In [15]:
pd.crosstab(index=df['mode'],columns=df['unit'],normalize=True,dropna=True)

| mode / unit | GWh | Milion barrels per day | Oil displacement, million lge | Vehicles | charging points | percent |
|---|---|---|---|---|---|---|
| Buses | 0.011696 | 0.012802 | 0.012802 | 0.117354 | 0.0 | 0.058401 |
| Cars | 0.012565 | 0.013276 | 0.013276 | 0.235104 | 0.0 | 0.097677 |
| EV | 0.0 | 0.0 | 0.0 | 0.0 | 0.072546 | 0.0 |
| Trucks | 0.009404 | 0.009246 | 0.009246 | 0.073732 | 0.0 | 0.037933 |
| Vans | 0.009878 | 0.010985 | 0.010985 | 0.114509 | 0.0 | 0.056583 |
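With `normalize=True`, each cell is divided by the grand total of 12,654 rows, so the whole table sums to 1; for example, Cars/Vehicles is 2975 / 12654 ≈ 0.235104, matching the table. To read the mode mix within each unit instead, `pd.crosstab` also accepts `normalize='columns'`. A small sketch on toy rows (not from the dataset) contrasting the two:

```python
import pandas as pd

toy = pd.DataFrame({
    "mode": ["Cars", "Cars", "Buses", "Cars"],
    "unit": ["Vehicles", "Vehicles", "Vehicles", "percent"],
})
# Share of all rows: the entire table sums to 1
share_of_total = pd.crosstab(toy["mode"], toy["unit"], normalize=True)
# Share within each unit: every column sums to 1
share_within_unit = pd.crosstab(toy["mode"], toy["unit"], normalize="columns")
print(share_of_total)
print(share_within_unit)
```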


# **VISUALISATION**


### 1. Vehicle Records by Region

This code generates an interactive Plotly bar chart of the regions with the most records in the `vehicles` subset. The user is prompted for the number of regions to display (`top_n`). `value_counts()` counts rows per region, so each bar shows how many vehicle data points a region has across parameters, modes, powertrains, years, and scenarios, not how many vehicles were sold or are in use (those figures are in the `value` column). The counts are stored in a DataFrame, trimmed to the top `top_n` regions, and plotted with a distinct color per region, hover labels showing the region and its count, rotated x-axis labels, and bars sorted in descending order.
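To rank regions by the number of vehicles rather than by record count, the `value` column has to be summed. A minimal sketch, assuming we want historical `EV sales` only and want to exclude the aggregate regions (`World`, `Europe`, `EU27`, `Rest of the world`) that would otherwise double-count countries; `top_regions_by_sales` is a hypothetical helper, and the toy numbers are illustrative, not from the dataset:

```python
import pandas as pd

AGGREGATES = {"World", "Europe", "EU27", "Rest of the world"}

def top_regions_by_sales(vehicles: pd.DataFrame, n: int) -> pd.Series:
    """Sum historical 'EV sales' per region and return the n largest totals."""
    mask = (
        (vehicles["parameter"] == "EV sales")
        & (vehicles["category"] == "Historical")
        & ~vehicles["region"].isin(AGGREGATES)
    )
    return vehicles.loc[mask].groupby("region")["value"].sum().nlargest(n)

# Illustrative numbers only
toy = pd.DataFrame({
    "region": ["Country A", "Country A", "Country B", "World", "Country B"],
    "category": ["Historical"] * 4 + ["Projection-STEPS"],
    "parameter": ["EV sales"] * 5,
    "value": [100.0, 50.0, 120.0, 270.0, 999.0],
})
print(top_regions_by_sales(toy, 2))
```

The resulting Series could replace `region_counts` in the chart cell, since it also has regions as its index.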

In [16]:
import plotly.graph_objects as go
# User input for number of top regions
top_n = int(input("Enter the number of top regions you want to display: "))
region_counts = vehicles['region'].value_counts()
bar_data = pd.DataFrame({'region': region_counts.index, 'Total Vehicles': region_counts.values})

# Filter the top N regions based on user input
bar_data = bar_data.head(top_n)
# Create a color map for regions

color_map = {region: i for i, region in enumerate(bar_data['region'].unique())}
fig = go.Figure(data=[go.Bar(
    x=bar_data['region'],
    y=bar_data['Total Vehicles'],
    text=bar_data['Total Vehicles'],
    textposition='auto',
    marker_color=[color_map[region] for region in bar_data['region']],
    hovertemplate="<b>Region: %{x}</b><br>Vehicle records: %{y}<extra></extra>"
)])
fig.update_layout(
    title='Top Regions by Number of Vehicle Records',
    xaxis_title='Region',
    yaxis_title='Number of Vehicle Records',
    xaxis_tickangle=90,
    xaxis={'categoryorder': 'total descending'},
    font=dict(size=10),
    margin=dict(l=30, r=30, t=30, b=30)
)
fig.show()

Enter the number of top regions you want to display:  8


### 2. Percent Records by Region

This code builds the same kind of bar chart for the `percent` subset. After the user enters the number of regions to display, `value_counts()` counts the percentage rows (EV sales share and EV stock share) per region, the result is trimmed to the top `top_n` regions, and each region gets its own color. Because the bars count rows, they show how much share data each region has, not the size of any percentage. Hover labels show the region and count, the x-axis labels are rotated for readability, and the bars are sorted in descending order.
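Summing percentages across rows has no meaning, so a ranking based on the shares themselves needs a single observation per region. A minimal sketch, assuming we want the most recent historical `EV sales share` for cars in each region (`latest_sales_share` is a hypothetical helper; toy values are illustrative):

```python
import pandas as pd

def latest_sales_share(percent: pd.DataFrame) -> pd.Series:
    """Latest historical EV sales share (%) for cars, one value per region."""
    mask = (
        (percent["parameter"] == "EV sales share")
        & (percent["mode"] == "Cars")
        & (percent["category"] == "Historical")
    )
    rows = percent.loc[mask].sort_values("year")
    # After sorting by year, the last row in each group is the most recent observation
    return rows.groupby("region")["value"].last().sort_values(ascending=False)

# Illustrative values only
toy = pd.DataFrame({
    "region": ["Country A", "Country A", "Country B"],
    "category": ["Historical"] * 3,
    "parameter": ["EV sales share"] * 3,
    "mode": ["Cars"] * 3,
    "year": [2022, 2023, 2023],
    "value": [10.0, 15.0, 5.0],
})
print(latest_sales_share(toy))
```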




In [17]:
import plotly.graph_objects as go
# User input for number of top regions
top_n = int(input("Enter the number of top regions you want to display: "))
region_counts = percent['region'].value_counts()
bar_data = pd.DataFrame({'region': region_counts.index, 'Total ': region_counts.values})
# Filter the top N regions based on user input
bar_data = bar_data.head(top_n)
# Create a color map for region
color_map = {region: i for i, region in enumerate(bar_data['region'].unique())}
fig = go.Figure(data=[go.Bar(
    x=bar_data['region'],
    y=bar_data['Total '],
    text=bar_data['Total '],
    textposition='auto',
    marker_color=[color_map[region] for region in bar_data['region']],
    hovertemplate="<b>Region: %{x}</b><br>Total : %{y}<extra></extra>"
)])



fig.update_layout(
    title='Top Regions by Number of Percent Records',
    xaxis_title='Region',
    yaxis_title='Number of Percent Records',
    xaxis_tickangle=90,
    xaxis={'categoryorder': 'total descending'},
    font=dict(size=10),
    margin=dict(l=30, r=30, t=30, b=30)
)
fig.show()

Enter the number of top regions you want to display:  8


### 3. Charging-Point Records by Region

This code generates a bar chart of the regions with the most records in the `charging_points` subset. The user enters the number of regions to display; `value_counts()` on `charging_points['region']` counts rows per region, and the top `top_n` regions are plotted with one color per region, hover labels showing the region and its count, rotated x-axis labels, and bars sorted in descending order. The bar height is the number of charging-point records, not the number of charging points, which is stored in the `value` column.
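The charging-point counts themselves are in `value`, and the `powertrain` column splits them into fast and slow public chargers. A minimal sketch, using the five Australian rows printed in the `charging_points.head()` output, that tabulates both types per year:

```python
import pandas as pd

# The five Australian charging-point rows shown in the charging_points.head() output
cp = pd.DataFrame({
    "region": ["Australia"] * 5,
    "powertrain": ["Publicly available fast", "Publicly available slow",
                   "Publicly available fast", "Publicly available slow",
                   "Publicly available slow"],
    "year": [2017, 2017, 2018, 2018, 2019],
    "value": [40.0, 440.0, 61.0, 670.0, 1700.0],
})
# Rows: year; columns: charger type; cells: number of charging points
by_type = cp.pivot_table(index="year", columns="powertrain", values="value", aggfunc="sum")
# sum() skips NaN, e.g. 2019 fast chargers, which are not among the rows shown above
by_type["Total"] = by_type.sum(axis=1)
print(by_type)
```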

In [18]:
import plotly.graph_objects as go
import pandas as pd

# User input for number of top regions
top_n = int(input("Enter the number of top regions you want to display: "))

region_counts = charging_points['region'].value_counts()
bar_data = pd.DataFrame({'region': region_counts.index, 'Total ': region_counts.values})

# Filter the top N regions based on user input
bar_data = bar_data.head(top_n)

# Create a color map for regions
color_map = {region: i for i, region in enumerate(bar_data['region'].unique())}

fig = go.Figure(data=[go.Bar(
    x=bar_data['region'],
    y=bar_data['Total '],
    text=bar_data['Total '],
    textposition='auto',
    marker_color=[color_map[region] for region in bar_data['region']],
    hovertemplate="<b>Region: %{x}</b><br>Total : %{y}<extra></extra>"
)])

fig.update_layout(
    title='Top Regions by Number of Charging-Point Records',
    xaxis_title='Region',
    yaxis_title='Number of Charging-Point Records',
    xaxis_tickangle=90,
    xaxis={'categoryorder': 'total descending'},
    font=dict(size=10),
    margin=dict(l=30, r=30, t=30, b=30)
)

fig.show()

Enter the number of top regions you want to display:  7


### 4. GWh Records by Region

This code generates a bar chart of the regions with the most records in the `gwh` subset, which holds EV electricity demand in GWh (gigawatt-hours). After the user enters the number of regions to display, `value_counts()` counts rows per region, and the top `top_n` regions are plotted with one color per region, hover labels showing the region and its count, rotated x-axis labels, and bars sorted in descending order. The bars therefore show how many electricity-demand records each region has, not its total demand in GWh.
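Electricity demand in the `gwh` subset is split by `mode`, so total EV demand for one region, year, and category is the sum across modes. A minimal sketch with the three China 2010 rows printed in the `gwh.head()` output, which also converts the total to TWh (1 TWh = 1,000 GWh) and computes each mode's share:

```python
import pandas as pd

# China, Historical, Electricity demand, 2010 (rows shown in the gwh.head() output)
demand = pd.DataFrame({
    "mode": ["Buses", "Vans", "Cars"],
    "value": [150.0, 3.0, 46.0],  # GWh
})
total_gwh = demand["value"].sum()
total_twh = total_gwh / 1000  # 1 TWh = 1,000 GWh
share = demand.set_index("mode")["value"] / total_gwh
print(total_gwh, total_twh)
print(share.round(3))
```

Among these three rows, buses account for roughly three quarters of the demand.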

In [19]:
import plotly.graph_objects as go
# User input for number of top regions
top_n = int(input("Enter the number of top regions you want to display: "))
region_counts = gwh['region'].value_counts()
bar_data = pd.DataFrame({'region': region_counts.index, 'Total ': region_counts.values})

# Filter the top N regions based on user input
bar_data = bar_data.head(top_n)

# Create a color map for regions
color_map = {region: i for i, region in enumerate(bar_data['region'].unique())}
fig = go.Figure(data=[go.Bar(
    x=bar_data['region'],
    y=bar_data['Total '],
    text=bar_data['Total '],
    textposition='auto',
    marker_color=[color_map[region] for region in bar_data['region']],
    hovertemplate="<b>Region: %{x}</b><br>Total : %{y}<extra></extra>"
)])
fig.update_layout(
    title='Top Regions by Number of GWh Records',
    xaxis_title='Region',
    yaxis_title='Number of GWh Records',
    xaxis_tickangle=90,
    xaxis={'categoryorder': 'total descending'},
    font=dict(size=10),
    margin=dict(l=30, r=30, t=30, b=30)
)

fig.show()

Enter the number of top regions you want to display:  8


### 5. Million-Barrels-per-Day Records by Region

This code generates a bar chart of the regions with the most records in the `million_barrels_per_day` subset (oil displacement expressed in million barrels per day). The user enters the number of regions to display, `value_counts()` counts rows per region, and the top `top_n` regions are plotted with one color per region, hover labels showing the region and its count, rotated x-axis labels, and bars sorted in descending order. As with the previous charts, the bars count records; they do not show how much oil EVs displace.
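The `Oil displacement Mbd` and `Oil displacement, million lge` parameters appear to describe the same quantity in two units, a daily rate versus an annual volume: both subsets have 586 rows with identical `year` statistics. A rough plausibility check, assuming 1 barrel = 158.987 litres, 365 days per year, and treating a litre of gasoline equivalent as roughly one litre of oil (an approximation, not how the IEA documents either figure), converts the China 2010 rows printed earlier:

```python
# Convert oil displacement from million barrels per day (Mbd) to million litres per year.
# Assumptions: 1 barrel = 158.987 litres, 365 days per year, and one litre of
# gasoline equivalent (lge) treated as roughly one litre of oil (approximation).
LITRES_PER_BARREL = 158.987
DAYS_PER_YEAR = 365

def mbd_to_million_litres_per_year(mbd: float) -> float:
    barrels_per_day = mbd * 1e6
    litres_per_year = barrels_per_day * LITRES_PER_BARREL * DAYS_PER_YEAR
    return litres_per_year / 1e6

# China 2010 rows from the two oil-displacement outputs: (Mbd, million lge)
rows = {"Buses": (0.00023, 14.0), "Vans": (1e-05, 0.62), "Cars": (2.9e-05, 1.7)}
for mode, (mbd, lge) in rows.items():
    print(mode, round(mbd_to_million_litres_per_year(mbd), 2), lge)
```

The converted values land within about 7% of the reported million-lge figures, which is consistent with the two parameters being unit variants of each other, given that the dataset values are rounded.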

In [20]:
import plotly.graph_objects as go
# Assuming million_barrels_per_day is the DataFrame containing the data
# User input for number of top regions
top_n = int(input("Enter the number of top regions you want to display: "))

# Count occurrences of each region
region_counts = million_barrels_per_day['region'].value_counts()
bar_data = pd.DataFrame({'region': region_counts.index, 'Total': region_counts.values})  # Remove space after 'Total'
# Filter the top N regions based on user input
bar_data = bar_data.head(top_n)

# Create a color map for regions
color_map = {region: i for i, region in enumerate(bar_data['region'].unique())}

# Create a bar chart
fig = go.Figure(data=[go.Bar(
    x=bar_data['region'],
    y=bar_data['Total'],
    text=bar_data['Total'],
    textposition='auto',
    marker_color=[color_map[region] for region in bar_data['region']],
    hovertemplate="<b>Region: %{x}</b><br>Total: %{y}<extra></extra>"
)])

# Update the layout of the chart
fig.update_layout(
    title='Top Regions by Number of Million-Barrels-per-Day Records',
    xaxis_title='Region',
    yaxis_title='Number of Million-Barrels-per-Day Records',
    xaxis_tickangle=90,
    xaxis={'categoryorder': 'total descending'},
    font=dict(size=10),
    margin=dict(l=30, r=30, t=30, b=30)
)

# Show the chart
fig.show()

Enter the number of top regions you want to display:  9


### 6. Oil Displacement Records by Region

This code generates a bar chart of the regions with the most records in the `oil_displacement` subset (oil displacement in million litres of gasoline equivalent, lge). The user enters the number of regions to display, `value_counts()` counts rows per region, and the top `top_n` regions are plotted with one color per region, hover labels showing the region and its count, rotated x-axis labels, and bars sorted in descending order. The bar height is a record count, not the amount of oil displaced.

In [21]:
import plotly.graph_objects as go
# Ensure the oil_displacement DataFrame is loaded (example)
# oil_displacement = pd.read_csv("path_to_data.csv")  # Uncomment and replace with actual path if necessary
# User input for number of top regions
top_n = int(input("Enter the number of top regions you want to display: "))
# Generate region counts
region_counts = oil_displacement['region'].value_counts()

# Create DataFrame for bar chart
bar_data = pd.DataFrame({'region': region_counts.index, 'Total': region_counts.values})
# Filter the top N regions based on user input
bar_data = bar_data.head(top_n)

# Create a color map for regions (using a sequential color scale for better differentiation)
color_map = {region: i for i, region in enumerate(bar_data['region'].unique())}
# Create bar chart
fig = go.Figure(data=[go.Bar(
    x=bar_data['region'],
    y=bar_data['Total'],
    text=bar_data['Total'],
    textposition='auto',
    marker_color=[color_map[region] for region in bar_data['region']],  # Color by region
    hovertemplate="<b>Region: %{x}</b><br>Oil displacement records: %{y}<extra></extra>"  # Region and record count on hover
)])



# Update layout for the figure
fig.update_layout(
    title='Top Regions by Number of Oil Displacement Records',
    xaxis_title='Region',
    yaxis_title='Number of Oil Displacement Records',
    xaxis_tickangle=90,
    xaxis={'categoryorder': 'total descending'},  # Order bars from highest to lowest
    font=dict(size=10),
    margin=dict(l=30, r=30, t=40, b=50)  # Adjusted margins for better spacing
)

# Show the plot
fig.show()

Enter the number of top regions you want to display:  12
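The six bar-chart cells above repeat the same steps and all rank regions by row count. A hypothetical helper (`region_ranking` is our own name, not part of any library) could build the `bar_data` frame once and make the count-versus-sum choice explicit; the toy data show how the two choices can reverse a ranking:

```python
import pandas as pd

def region_ranking(frame: pd.DataFrame, n: int, how: str = "count") -> pd.DataFrame:
    """Top-n regions by record count (how='count') or by summed value (how='sum')."""
    if how == "count":
        totals = frame["region"].value_counts()
    elif how == "sum":
        totals = frame.groupby("region")["value"].sum().sort_values(ascending=False)
    else:
        raise ValueError("how must be 'count' or 'sum'")
    # Columns 'region' and 'Total', matching what the chart cells plot
    return totals.head(n).rename_axis("region").reset_index(name="Total")

# Illustrative values only
toy = pd.DataFrame({
    "region": ["Country A", "Country A", "Country A", "Country B"],
    "value": [1.0, 1.0, 1.0, 10.0],
})
print(region_ranking(toy, 2, how="count"))
print(region_ranking(toy, 2, how="sum"))
```

The returned frame plugs into the existing `go.Bar(x=bar_data['region'], y=bar_data['Total'], ...)` calls unchanged.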


# GENERAL TRENDS

## <font color=red> *1. Overall trend of different parameters (e.g., EV sales, EV stock share, EV stock) in different regions over the years*</font>

### **Aim of the Code:**

The aim of this code is to visualize the trend of a specific parameter (e.g., EV sales, charging points) over the years for a chosen country using an interactive line plot. The user can dynamically select both the country and the parameter, and the code will filter the data accordingly to display the corresponding trend.
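The cell sums `value` over every matching row for each year, which includes both projection scenarios (`Projection-STEPS` and `Projection-APS`) for future years, so those years can be counted twice; for share parameters such as `EV sales share`, it also adds percentages across modes. A minimal sketch that restricts the sum to one category, assuming `Historical` as the default (`yearly_trend` is a hypothetical helper; toy values are illustrative, not from the dataset):

```python
import pandas as pd

def yearly_trend(frame: pd.DataFrame, country: str, parameter: str,
                 category: str = "Historical") -> pd.Series:
    """Sum `value` per year for one region, parameter, and scenario category."""
    mask = (
        (frame["region"] == country)
        & (frame["parameter"] == parameter)
        & (frame["category"] == category)
    )
    return frame.loc[mask].groupby("year")["value"].sum()

# Illustrative values only
toy = pd.DataFrame({
    "region": ["Country A"] * 4,
    "category": ["Historical", "Historical", "Projection-STEPS", "Projection-APS"],
    "parameter": ["EV sales"] * 4,
    "year": [2022, 2023, 2030, 2030],
    "value": [1.0, 2.0, 10.0, 20.0],
})
print(yearly_trend(toy, "Country A", "EV sales"))
print(yearly_trend(toy, "Country A", "EV sales", category="Projection-STEPS"))
```

Without the category filter, the 2030 point would be 30.0, the sum of two alternative scenarios.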

In [22]:
import plotly.graph_objects as go
# Display available countries and parameters
print("Available COUNTRIES: ", df['region'].unique())
country = input("Enter the country name: ")
print("Available PARAMETERS: ", df['parameter'].unique())
aspect = input("Enter the parameter name: ")

# Filter data based on user input
filtered_data = df[(df['region'] == country) & (df['parameter'] == aspect)]

# Group filtered data by year
sales_by_year = filtered_data.groupby('year')['value'].sum().reset_index()

# Add custom data for hovertemplate
sales_by_year['country'] = country
sales_by_year['aspect'] = aspect

# Create a line plot
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=sales_by_year['year'],
    y=sales_by_year['value'],
    mode='lines+markers',
    name=f'{country} - {aspect}',
    customdata=sales_by_year[['country', 'aspect']],  # Adding custom data
    hovertemplate="<b>%{customdata[0]}</b><br><b>Year: %{x}</b><br>%{customdata[1]}: %{y}<extra></extra>"
))
# Update layout
fig.update_layout(
    title=f'{country} - {aspect} Trend Over Years',
    xaxis_title='Year',
    yaxis_title=f'Total {aspect}',
    xaxis_tickangle=0,
    font=dict(size=10),
    margin=dict(l=30, r=30, t=30, b=30)
)
fig.show()

Available COUNTRIES:  ['Australia' 'Austria' 'Belgium' 'Brazil' 'Bulgaria' 'Canada' 'Chile'
 'China' 'Colombia' 'Costa Rica' 'Croatia' 'Cyprus' 'Czech Republic'
 'Denmark' 'Estonia' 'EU27' 'Europe' 'Finland' 'France' 'Germany' 'Greece'
 'Hungary' 'Iceland' 'India' 'Indonesia' 'Ireland' 'Israel' 'Italy'
 'Japan' 'Korea' 'Latvia' 'Lithuania' 'Luxembourg' 'Mexico' 'Netherlands'
 'New Zealand' 'Norway' 'Poland' 'Portugal' 'Rest of the world' 'Romania'
 'Seychelles' 'Slovakia' 'Slovenia' 'South Africa' 'Spain' 'Sweden'
 'Switzerland' 'Thailand' 'Turkiye' 'United Arab Emirates'
 'United Kingdom' 'USA' 'World']


Enter the country name:  India


Available PARAMETERS:  ['EV stock share' 'EV sales share' 'EV sales' 'EV stock'
 'EV charging points' 'Electricity demand' 'Oil displacement Mbd'
 'Oil displacement, million lge']


Enter the parameter name:  EV sales


# Charging Infrastructure

# <font color=red> *2. Distribution of charging points across different regions*</font>

## **Aim of the Code:**

The aim of this code is to visualize the distribution of charging points across different regions or countries for a specific year or range of years using an interactive choropleth map. The map dynamically highlights the number of charging points in each region based on user input, providing a clear geographical overview of charging infrastructure.
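Two caveats apply to this map. Charging-point counts appear to be a cumulative stock (the Australian slow-charger figures rise from 440 to 670 to 1,700 between 2017 and 2019), so summing them over a range of years, as the cell does for input such as `2016-2025`, counts the same chargers several times; and a range reaching past the last historical year also pulls in projection rows. A minimal sketch of an alternative, assuming we want each region's latest historical year in the range and want to exclude aggregate regions (`latest_charging_points` is a hypothetical helper; the `World` value below is made up):

```python
import pandas as pd

AGGREGATES = {"World", "Europe", "EU27", "Rest of the world"}

def latest_charging_points(frame: pd.DataFrame, start_year: int, end_year: int) -> pd.Series:
    """Charging points per region in its latest historical year within [start_year, end_year]."""
    mask = (
        (frame["unit"] == "charging points")
        & (frame["category"] == "Historical")
        & frame["year"].between(start_year, end_year)
        & ~frame["region"].isin(AGGREGATES)
    )
    # Sum fast + slow chargers per region and year, then keep each region's latest year
    per_year = frame.loc[mask].groupby(["region", "year"])["value"].sum().reset_index()
    latest = per_year.sort_values("year").groupby("region").tail(1)
    return latest.set_index("region")["value"]

# Australian rows from the charging_points.head() output, plus a hypothetical World row
toy = pd.DataFrame({
    "region": ["Australia"] * 4 + ["World"],
    "category": ["Historical"] * 5,
    "unit": ["charging points"] * 5,
    "year": [2017, 2017, 2018, 2018, 2018],
    "value": [40.0, 440.0, 61.0, 670.0, 5000.0],
})
print(latest_charging_points(toy, 2016, 2025))
```

The result can be passed to `px.choropleth` after `.reset_index()` in place of `region_sums`.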

In [23]:
import pandas as pd
import plotly.express as px

# Ask the user for a year or range of years
year_input = input("Enter the year or range of years (e.g., '2022' or '2020-2023'): ")

# Parse the input to determine if it's a single year or a range
if '-' in year_input:
    start_year, end_year = map(int, year_input.split('-'))
    filtered_data = df[(df['unit'] == 'charging points') & (df['year'] >= start_year) & (df['year'] <= end_year)]
else:
    year = int(year_input)
    filtered_data = df[(df['unit'] == 'charging points') & (df['year'] == year)]

# Group by region and sum the charging points
region_sums = filtered_data.groupby('region')['value'].sum().reset_index()

# Drop the 'World' entry if it exists
region_sums = region_sums[region_sums['region'] != 'World']

# Create a choropleth map
fig = px.choropleth(
    region_sums,
    locations='region',
    locationmode='country names',
    color='value',
    color_continuous_scale=px.colors.sequential.Viridis,  # Consistent Viridis color scheme
    title=f'Distribution of Charging Points ({year_input})',
    labels={'value': 'Charging Points'},
    hover_name='region',
)

# Customize map layout for clarity and user interaction
fig.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection_type='natural earth',  # More visually appealing projection
    ),
    title_font=dict(size=20, family="Arial Black"),
    margin=dict(l=0, r=0, t=50, b=0),
    coloraxis_colorbar=dict(
        title="Charging Points",
        ticks="outside",
        ticklen=5,
        len=0.7,
    )
)

# Show the map
fig.show()

Enter the year or range of years (e.g., '2022' or '2020-2023'):  2016-2025
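The range option sums `value` over every year in the range. If the charging-point figures are year-end stock counts (our reading of the IEA data, not something this notebook checks), that sum adds each year's installed base on top of the previous one and overstates the total. Below is a minimal alternative sketch, assuming the same `region`, `unit`, `year` and `value` columns, that keeps only the most recent year available for each region inside the range. `parse_year_input` and `latest_charging_points` are hypothetical helper names, and the toy data is illustrative only.

```python
import pandas as pd


def parse_year_input(text):
    """Parse '2022' or '2020-2023' into an inclusive (start, end) tuple."""
    text = text.strip()
    if "-" in text:
        start, end = (int(part) for part in text.split("-", 1))
    else:
        start = end = int(text)
    if start > end:
        start, end = end, start  # tolerate reversed ranges such as '2023-2020'
    return start, end


def latest_charging_points(df, start, end):
    """Charging points per region in the latest year available within [start, end]."""
    mask = (df["unit"] == "charging points") & df["year"].between(start, end)
    subset = df[mask & (df["region"] != "World")]
    # First add fast and slow chargers together for each region-year
    per_year = subset.groupby(["region", "year"], as_index=False)["value"].sum()
    # Then keep only the most recent year for each region
    idx = per_year.groupby("region")["year"].idxmax()
    return per_year.loc[idx, ["region", "year", "value"]].reset_index(drop=True)


# Toy data for illustration only (not IEA figures)
toy = pd.DataFrame({
    "region": ["A", "A", "A", "A", "B", "World"],
    "unit": ["charging points"] * 6,
    "year": [2020, 2020, 2021, 2021, 2020, 2021],
    "powertrain": ["Publicly available fast", "Publicly available slow"] * 2
                  + ["Publicly available slow", "Publicly available slow"],
    "value": [10.0, 30.0, 15.0, 45.0, 5.0, 100.0],
})
print(latest_charging_points(toy, *parse_year_input("2020-2021")))
```

Region A reports 60 points (its 2021 stock) rather than 100 (2020 plus 2021), which is the difference this variant is meant to expose.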


# <font color=red> *3. Distribution of a particular powertrain type (e.g., BEV, PHEV) across regions or countries over various modes (Cars, EV, etc.)*

The **aim of the code** is to create an interactive choropleth map that shows, for a chosen powertrain type (e.g., BEV or PHEV), how many rows the dataset holds for each region, with one animation frame per mode.

In [24]:
import plotly.express as px

# 1. Group data by region, powertrain, and mode, counting the occurrences of each powertrain
grouped_data = df.groupby(['region', 'powertrain', 'mode'])['powertrain'].count().reset_index(name='count')
# 2. Ask the user for the powertrain they want to visualize
print("Available Powertrains: ", grouped_data['powertrain'].unique())
powertrain_input = input("Enter the powertrain you want to visualize: ")
# 3. Filter the data based on the selected powertrain
filtered_data = grouped_data[grouped_data['powertrain'] == powertrain_input]
# 4. Remove 'World' from the region
filtered_data = filtered_data[filtered_data['region'] != 'World']
# 5. Create a choropleth map for the selected powertrain with animated modes
fig = px.choropleth(
    filtered_data,
    locations='region',
    locationmode='country names',
    color='count',
    color_continuous_scale=px.colors.sequential.Viridis,
    title=f'Distribution of powertrain - {powertrain_input} by Region and Mode (Count)',
    labels={
        'count': f'No. of {powertrain_input}',
        'region': 'Region',
        'mode': 'Mode'
    },
    animation_frame='mode'  # Animate the map by different modes
)

# 6. Customize map layout for clarity
fig.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection_type='natural earth'
    ),
    title_font=dict(size=22, family="Arial Black"),
    margin=dict(l=0, r=0, t=50, b=0),
    coloraxis_colorbar=dict(
        title=f"Count of {powertrain_input}",
        ticks="outside",
        ticklen=5,
        len=0.7,
    ),
    sliders=[{
        "currentvalue": {"prefix": "Mode: ", "font": {"size": 18}},
        "pad": {"t": 40},
    }],
)
# 7. Show the map
fig.show()


Available Powertrains:  ['BEV' 'EV' 'FCEV' 'PHEV' 'Publicly available fast'
 'Publicly available slow']


Enter the powertrain you want to visualize:  EV


### **How it Works:**

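The map shows one frame per `mode` value (use the slider to switch between them), with each region coloured by the number of rows recorded for the chosen powertrain. The "World" aggregate is excluded so that the map focuses on country-level data.

Note that `Publicly available fast` and `Publicly available slow` appear in the powertrain list because the charging-point rows store the charger category in the `powertrain` column; they are not vehicle powertrains.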
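Because `count()` tallies dataset rows, a region can score high simply because it has many records (years × parameters × modes) for that powertrain, not because it has many vehicles. A minimal sketch on toy data contrasting the row count with a sum of `value` for a single parameter; picking `'EV sales'` as the parameter to sum is an assumption made for illustration.

```python
import pandas as pd

# Toy rows mimicking the IEA layout (illustrative values only)
toy = pd.DataFrame({
    "region": ["X", "X", "X", "Y"],
    "parameter": ["EV sales", "EV stock", "EV sales", "EV sales"],
    "mode": ["Cars", "Cars", "Buses", "Cars"],
    "powertrain": ["BEV"] * 4,
    "value": [100.0, 500.0, 2.0, 900.0],
})

# What the map colours by: number of rows per region/powertrain/mode
rows = toy.groupby(["region", "powertrain", "mode"]).size().reset_index(name="count")

# A volume-based alternative: total EV sales per region/powertrain/mode
sales = (
    toy[toy["parameter"] == "EV sales"]
    .groupby(["region", "powertrain", "mode"], as_index=False)["value"]
    .sum()
)
print(rows)
print(sales)
```

Region X has more car rows than Y, yet Y sold more cars, so the two measures can rank regions differently.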


# <font color=red> 4. Year-over-year change in the count of each powertrain

This code calculates the year-over-year change in the number of rows recorded for each powertrain, prints that change as a formatted table, and plots the yearly counts as a line chart.



1. **Grouping and Counting Powertrain Data**: The code begins by grouping the data in the DataFrame `df` by 'year' and 'powertrain', counting the occurrences of each powertrain in each year using the `groupby()` method. The result is stored in `powertrain_counts`, which contains columns for the year, powertrain type, and the count.



2. **Pivoting the Data**: The `powertrain_counts` table is then pivoted so that years are represented as rows and different powertrain types are represented as columns. This creates a clearer structure for analyzing the data over multiple years.



3. **Calculating Year-over-Year Change**: The `.diff()` method is used on the pivoted table to calculate the year-over-year change in powertrain counts for each powertrain type. This shows how the count of each powertrain has increased or decreased compared to the previous year.



4. **Creating a PrettyTable**: The `PrettyTable` library is used to format the year-over-year change data into a readable table. The field names are set to include 'Year' followed by the powertrain types. Each row represents a year, with the values showing the change in counts for each powertrain type.



5. **Line Chart Visualization**: A line chart is created using Plotly's `px.line()`, plotting the count of each powertrain type over the years. The lines are color-coded by the powertrain type, and the chart includes titles and labels for better clarity.



The formatted table of changes and the line chart together provide a comprehensive view of how powertrain counts have changed over time.
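The pivot-and-diff steps can be checked on a tiny hand-made table. A minimal sketch with toy counts (not the real dataset) showing how `pivot` lays out the years and how `diff()` subtracts the previous row:

```python
import pandas as pd

# Toy (year, powertrain) counts, in the shape produced by the groupby step
counts = pd.DataFrame({
    "year": [2010, 2010, 2011, 2011, 2012],
    "powertrain": ["BEV", "PHEV", "BEV", "PHEV", "BEV"],
    "count": [5, 3, 35, 24, 40],
})

# Years as rows, powertrains as columns; missing combinations become NaN
pivot = counts.pivot(index="year", columns="powertrain", values="count")

# Each year minus the previous year; the first row has nothing to compare with
change = pivot.diff()
print(change)
```

The first year is always `NaN`, and a powertrain missing in a given year (PHEV in 2012 here) also yields `NaN`. If a missing combination really means zero records, which is an assumption about the data, `counts.pivot_table(index="year", columns="powertrain", values="count", fill_value=0)` fills those gaps before the difference is taken.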

In [25]:
!pip install prettytable
from prettytable import PrettyTable  # Importing PrettyTable for better table formatting

# 1. Group data by year and powertrain, counting occurrences
powertrain_counts = df.groupby(['year', 'powertrain'])['powertrain'].count().reset_index(name='count')

# 2. Pivot the table to have years as rows and powertrains as columns
powertrain_pivot = powertrain_counts.pivot(index='year', columns='powertrain', values='count')

# 3. Calculate the year-over-year change in powertrain counts
powertrain_change = powertrain_pivot.diff()  # Calculate the difference between consecutive years

# 4. Create a PrettyTable instance
table = PrettyTable()

# Set the field names based on the columns of the resulting DataFrame
table.field_names = ['Year'] + list(powertrain_change.columns)

# 5. Add rows to the table
for year, row in powertrain_change.iterrows():
    table.add_row([year] + row.tolist())

# 6. Print the formatted table
print("Year-over-Year Change in Powertrain Counts:")
print(table)
fig = px.line(powertrain_counts, x='year', y='count', color='powertrain',
              title='Change in Powertrain Counts Over Years',
              labels={'count': 'Total Count', 'year': 'Year', 'powertrain': 'Powertrain'})
fig.show()

Year-over-Year Change in Powertrain Counts:
+------+--------+--------+--------+--------+-------------------------+-------------------------+
| Year |  BEV   |   EV   |  FCEV  |  PHEV  | Publicly available fast | Publicly available slow |
+------+--------+--------+--------+--------+-------------------------+-------------------------+
| 2010 |  nan   |  nan   |  nan   |  nan   |           nan           |           nan           |
| 2011 |  30.0  |  35.0  |  -9.0  |  21.0  |           6.0           |           4.0           |
| 2012 |  5.0   |  8.0   |  2.0   |  22.0  |           8.0           |           7.0           |
| 2013 |  7.0   |  8.0   |  10.0  |  6.0   |           4.0           |           4.0           |
| 2014 |  2.0   |  2.0   |  14.0  |  9.0   |           4.0           |           3.0           |
| 2015 |  25.0  |  28.0  |  16.0  |  32.0  |           2.0           |           3.0           |
| 2016 |  16.0  |  17.0  |  4.0   |  5.0   |           3.0           |           2.

# <font color=red>5. Distribution of each unique parameter by count

This code creates a pie chart to visualize the distribution of the different parameters in the dataset, using a vibrant color scheme and outlined slices.



1. **Counting Parameter Occurrences**: The code first counts the occurrences of each unique parameter in the DataFrame `df` by using the `value_counts()` method on the 'parameter' column. It then resets the index and renames the columns to 'parameter' and 'count' for clarity.



2. **Creating the Pie Chart**: Using Plotly Express (`px.pie`), a pie chart is generated where:

   - The 'count' column provides the values (size of each pie slice).

   - The 'parameter' column provides the labels for each slice.

   - The chart title is set to 'Distribution of Parameters by Count'.

   - The `color_discrete_sequence=px.colors.qualitative.Bold` argument applies a vibrant color palette to the slices.



3. **Adding Outlines and Text**: The chart's traces are updated to position the text inside the slices (`textposition='inside'`), displaying both the percentage and the label (`textinfo='percent+label'`). Additionally, a black outline (`marker=dict(line=dict(color='#000000', width=1))`) is added to each slice for enhanced clarity.



4. **Displaying the Chart**: Finally, the pie chart is displayed using `fig.show()`.



This chart provides an interactive and visually appealing way to analyze the distribution of parameters in the dataset, making it easier to identify the proportion of each parameter.
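The percentage printed on each slice is simply that parameter's row count divided by the total number of rows. A minimal sketch reproducing that arithmetic on a toy column, which can be handy for checking the chart or reporting the same numbers in a table:

```python
import pandas as pd

# Toy 'parameter' column (illustrative only)
params = pd.Series(["EV sales", "EV sales", "EV stock", "EV sales share"], name="parameter")

counts = params.value_counts()                      # absolute number of rows
shares = params.value_counts(normalize=True) * 100  # percentage of all rows, as on the pie

summary = pd.DataFrame({"count": counts, "percent": shares.round(1)})
print(summary)
```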

In [26]:
import plotly.express as px
# Counting the occurrences of each parameter
parameter_counts = df['parameter'].value_counts().reset_index()
parameter_counts.columns = ['parameter', 'count']
# Creating a pie chart with a vivid color scheme
fig = px.pie(
    parameter_counts,
    values='count',
    names='parameter',
    title='Distribution of Parameters by Count',
    color_discrete_sequence=px.colors.qualitative.Bold  # Vibrant color palette
)
# Update traces to show text inside the slices and outline them
fig.update_traces(
    textposition='inside',
    textinfo='percent+label',
    marker=dict(line=dict(color='#000000', width=1))  # Adds a black outline for clarity
)
# Show the chart
fig.show()

# <font color=red> 6. Distribution of a parameter (EV sales, EV stock, ...) by mode type

This code generates an interactive pie chart based on a user-selected parameter from the DataFrame `df` and visualizes the distribution of the 'mode' within that parameter. Here's a breakdown of the process:



1. **User Input for Parameter**: The available parameters (the unique values of `df['parameter']`) are printed for reference, and the user is prompted to type the one of interest.



2. **Filtering Data**: After the user inputs a parameter, the DataFrame `df` is filtered to include only rows where the 'parameter' matches the input value. This filtered data is stored in `ev_df`.



3. **Counting Mode Occurrences**: The code counts how many times each unique 'mode' occurs in the filtered DataFrame `ev_df`. This is done using the `value_counts()` method on the 'mode' column.



4. **Renaming Columns**: For clarity, the columns of `mode_counts` are renamed to 'mode' and 'count', so they better represent the data.



5. **Creating the Pie Chart**: A pie chart is created using Plotly Express (`px.pie`). The pie slices represent the counts of each 'mode', and the colors are based on the `Viridis` color palette. The chart's title is dynamically set based on the chosen parameter.



6. **Formatting the Chart**: The chart is updated to display both the percentage and the label inside the pie slices (`textinfo='percent+label'`). The text is positioned inside the slices for better readability.



7. **Displaying the Chart**: The pie chart is displayed with `fig.show()`, providing an interactive visualization of how the modes are distributed for the selected parameter.



This approach allows the user to explore the distribution of 'modes' within different parameters in an engaging and visually appealing way.

In [27]:
import plotly.express as px
print("Available Parameter", df['parameter'].unique())
para = input("Enter the parameter")
# 1. Filter the DataFrame to the rows for the parameter entered by the user
ev_df = df[df['parameter'] == para.strip()]

# 2. Count the occurrences of each 'mode' in the filtered DataFrame
mode_counts = ev_df['mode'].value_counts().reset_index()

# 3. Name the columns explicitly (pandas 1.x returns 'index'/'mode', pandas 2.x 'mode'/'count')
mode_counts.columns = ['mode', 'count']

# 4. Create a pie chart with the 'count' of each 'mode' and set a title
fig = px.pie(
    mode_counts,
    values='count',       # Numerical values for pie chart
    names='mode',         # Labels for pie chart slices
    title=f"Distribution of Mode by Count on Basis of '{para.strip()}'",  # Uses the input, so an empty selection cannot raise IndexError
    color_discrete_sequence=px.colors.sequential.Viridis  # Catchy color scheme using Viridis
)
# 5. Update the chart to display both percentage and label inside the pie slices
fig.update_traces(textposition='inside', textinfo='percent+label')

# 6. Display the pie chart
fig.show()


Available Parameter ['EV stock share' 'EV sales share' 'EV sales' 'EV stock'
 'EV charging points' 'Electricity demand' 'Oil displacement Mbd'
 'Oil displacement, million lge']


Enter the parameter EV sales share
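The cell above is sensitive to its input: a typo in the parameter name leaves `ev_df` empty and the chart blank. It also relies on renaming the columns of `value_counts().reset_index()`, whose default names differ between pandas 1.x (`index`, `mode`) and 2.x (`mode`, `count`). A minimal, more defensive sketch; `mode_counts_for` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd


def mode_counts_for(df, parameter):
    """Return a (mode, count) table for one parameter, or raise if it is absent."""
    parameter = parameter.strip()
    subset = df[df["parameter"] == parameter]
    if subset.empty:
        raise ValueError(
            f"Unknown parameter {parameter!r}; choose one of {sorted(df['parameter'].unique())}"
        )
    # rename_axis + reset_index(name=...) yields the same column names on pandas 1.x and 2.x
    return subset["mode"].value_counts().rename_axis("mode").reset_index(name="count")


# Toy data for illustration only
toy = pd.DataFrame({
    "parameter": ["EV sales", "EV sales", "EV sales", "EV stock"],
    "mode": ["Cars", "Cars", "Buses", "Cars"],
})
print(mode_counts_for(toy, " EV sales "))
```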


# <font color=red> 7. Distribution of a parameter (EV sales, EV stock, ...) by region

This code lets the user explore how the rows for a chosen parameter are spread across regions. It prints the available parameters, filters the dataset to the one the user enters, and counts the rows per region, sorted from most to fewest. The user then picks a rank range (for example, 1 to 5 for the five regions with the most rows), and only the regions within that range are kept. A pie chart sized by count shows each slice's percentage and label, with a title that reflects the chosen ranks and parameter and the Viridis palette for color.

In [28]:
import plotly.express as px
# Display available parameters for user selection
print("Available Parameters:", df['parameter'].unique())

# 1. Take user input for the parameter to filter
para = input("Enter the parameter: ")

# 2. Filter the DataFrame to include only rows with the selected parameter
ev_df = df[df['parameter'] == para]

# 3. Count the occurrences of each 'region' in the filtered DataFrame
region_counts = ev_df['region'].value_counts().reset_index()

# 4. Name the columns explicitly so the result is the same on pandas 1.x and 2.x
region_counts.columns = ['region', 'count']

# 5. Ask the user for a range of ranks to display
start_rank = int(input("Enter the starting rank (e.g., 1 for the top region): "))
end_rank = int(input("Enter the ending rank (e.g., 5 for the 5th region): "))

# 6. Filter the DataFrame to include only the regions within the specified rank range
ranked_regions = region_counts.iloc[start_rank-1:end_rank]

# 7. Create a pie chart with the 'count' of each 'region' and set a dynamic title
title = f"Regions Ranked {start_rank} to {end_rank} by Count on Basis of '{para}'"
fig = px.pie(
    ranked_regions,
    values='count',  # Numerical values for pie chart
    names='region',  # Labels for pie chart slices
    title=title,
    color_discrete_sequence=px.colors.sequential.Viridis  # Catchy color scheme using Viridis
)

# 8. Update the chart to display both percentage and label inside the pie slices
fig.update_traces(textposition='inside', textinfo='percent+label')

# 9. Display the pie chart
fig.show()

Available Parameters: ['EV stock share' 'EV sales share' 'EV sales' 'EV stock'
 'EV charging points' 'Electricity demand' 'Oil displacement Mbd'
 'Oil displacement, million lge']


Enter the parameter:  EV sales share
Enter the starting rank (e.g., 1 for the top region):  10
Enter the ending rank (e.g., 5 for the 5th region):  25
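`value_counts()` returns regions sorted by count, so rank *k* sits at position *k − 1*, and `iloc[start_rank-1:end_rank]` covers ranks `start_rank` to `end_rank` inclusive. Regions with equal counts may appear in either order. Note also that the pie re-normalizes the slices, so its percentages are shares of the selected ranks only, not of all regions. A minimal sketch that adds input checks; `regions_by_rank` is a hypothetical name.

```python
import pandas as pd


def regions_by_rank(region_counts, start_rank, end_rank):
    """Rows ranked start_rank..end_rank (1-based, inclusive) of a table sorted by count."""
    n = len(region_counts)
    if not 1 <= start_rank <= end_rank:
        raise ValueError("ranks must satisfy 1 <= start_rank <= end_rank")
    if start_rank > n:
        raise ValueError(f"only {n} regions are available")
    # iloc is 0-based and end-exclusive, hence start_rank - 1 and end_rank unchanged;
    # an end_rank beyond n is simply truncated by iloc
    return region_counts.iloc[start_rank - 1:end_rank]


# Toy count table, already sorted descending as value_counts() returns it
toy = pd.DataFrame({"region": ["A", "B", "C", "D"], "count": [40, 30, 20, 10]})
print(regions_by_rank(toy, 2, 3))
```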


In [29]:
import plotly.express as px
# 1. Group the data by 'parameter' and 'region', and count occurrences
region_counts_by_parameter = df.groupby(['parameter', 'region']).size().reset_index(name='count')

# 2. Create a choropleth map with animation based on 'parameter'
fig = px.choropleth(
    region_counts_by_parameter,
    locations='region',
    locationmode='country names',
    color='count',
    hover_name='region',
    animation_frame='parameter',  # Add slider for 'parameter'
    color_continuous_scale=px.colors.sequential.Viridis,
    title='Distribution of Regions by Count Based on Parameter'
)
# 3. Customize the map layout
fig.update_layout(
    geo=dict(showframe=False, showcoastlines=True, projection_type='natural earth'),
    title_font=dict(size=20, family="Arial Black"),
    sliders=[{
        "currentvalue": {"prefix": "Parameter: ", "font": {"size": 18}},
        "pad": {"t": 40}
    }],
    updatemenus=[{
        "buttons": [
            {
                "args": [None, {"frame": {"duration": 1000, "redraw": True}, "fromcurrent": True}],
                "label": "Play",
                "method": "animate"
            },
            {
                "args": [[None], {"frame": {"duration": 0, "redraw": False}, "mode": "immediate"}],
                "label": "Pause",
                "method": "animate"
            }
        ],
        "direction": "left",
        "pad": {"r": 10, "t": 87},
        "showactive": False,
        "type": "buttons",
        "x": 0.1,
        "xanchor": "right",
        "y": 0,
        "yanchor": "top"
    }]
)


# 4. Display the map with slider
fig.show()


# <font color=red> 8. Popularity of different mode (vehicle) types across regions or countries

This code allows users to explore the distribution of modes within a selected region. First, it presents a list of available regions and prompts the user to select one. After filtering the dataset to include only rows for the chosen region, the code counts the occurrences of each mode within the region. The resulting data is then presented in a pie chart, where each slice represents a mode, with the size of the slice corresponding to its count. The pie chart is styled using the Viridis color scheme for vibrant visualization, with black slice outlines for better clarity. Hovering over each slice reveals the mode's label, percentage, and count for detailed information. Additionally, the layout of the chart is customized for improved readability, including font adjustments, a prominent title, and a horizontally oriented legend positioned below the chart. Finally, the pie chart is displayed, providing a clear and interactive visualization of the mode distribution for the selected region.
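The pie shows one region at a time. To compare the same row-count shares across several regions side by side, `pd.crosstab` with `normalize='index'` gives one row of mode percentages per region. A minimal sketch on toy data (the region labels and rows are illustrative):

```python
import pandas as pd

# Toy rows for two hypothetical regions
toy = pd.DataFrame({
    "region": ["R1", "R1", "R1", "R1", "R2", "R2"],
    "mode": ["Cars", "Cars", "Buses", "Trucks", "Cars", "Vans"],
})

# Each row sums to 100: the share of each mode among that region's rows
shares = pd.crosstab(toy["region"], toy["mode"], normalize="index") * 100
print(shares.round(1))
```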

In [30]:
import plotly.express as px
# Display available regions for user selection
print("Available Regions:", df['region'].unique())

# 1. Take user input for the region to filter
selected_region = input("Enter the region: ")

# 2. Filter the DataFrame to include only rows with the selected region
filtered_df = df[df['region'] == selected_region]

# 3. Count the occurrences of each 'mode' in the filtered DataFrame
mode_counts = filtered_df['mode'].value_counts().reset_index()

# 4. Rename columns for better readability
mode_counts.columns = ['mode', 'count']

# 5. Create a pie chart with Viridis color scheme
fig = px.pie(
    mode_counts,
    values='count',
    names='mode',
    title=f"Distribution of Modes in {selected_region} Region",
    color_discrete_sequence=px.colors.sequential.Viridis  # Retaining Viridis color scheme
)

# 6. Update the chart to display detailed hover info and enhanced slice appearance
fig.update_traces(
    textposition='inside',
    textinfo='percent+label',
    marker=dict(line=dict(color='black', width=1.5)),  # Enhanced slice outline for clarity
    hovertemplate='%{label}<br>%{percent}<br>Count: %{value}<extra></extra>'  # px.pie sets a hovertemplate, which overrides hoverinfo
)

# 7. Customize layout for aesthetics
fig.update_layout(
    font=dict(size=16, family="Arial"),  # Font adjustments for readability
    title_font=dict(size=20, family="Arial Black"),  # Title font for prominence
    legend_title_text='Modes',  # Legend title for clarity
    legend=dict(
        orientation="h",  # Horizontal legend placement
        yanchor="bottom", y=-0.1,  # Positioning below the chart
        xanchor="center", x=0.5
    )
)

# 8. Display the pie chart
fig.show()

Available Regions: ['Australia' 'Austria' 'Belgium' 'Brazil' 'Bulgaria' 'Canada' 'Chile'
 'China' 'Colombia' 'Costa Rica' 'Croatia' 'Cyprus' 'Czech Republic'
 'Denmark' 'Estonia' 'EU27' 'Europe' 'Finland' 'France' 'Germany' 'Greece'
 'Hungary' 'Iceland' 'India' 'Indonesia' 'Ireland' 'Israel' 'Italy'
 'Japan' 'Korea' 'Latvia' 'Lithuania' 'Luxembourg' 'Mexico' 'Netherlands'
 'New Zealand' 'Norway' 'Poland' 'Portugal' 'Rest of the world' 'Romania'
 'Seychelles' 'Slovakia' 'Slovenia' 'South Africa' 'Spain' 'Sweden'
 'Switzerland' 'Thailand' 'Turkiye' 'United Arab Emirates'
 'United Kingdom' 'USA' 'World']


Enter the region:  India


This code creates an interactive choropleth map of how the dataset's rows are distributed across regions for each mode. It first counts the rows for each region and mode with `groupby()`, then uses `plotly.express` to shade each region by that count on the Viridis scale, from low to high values. The animation steps through the mode categories (Cars, EV, Buses, Vans, Trucks) rather than through time, so each frame shows one mode.



The `geo` layout options remove the frame, show coastlines, and apply a 'natural earth' projection to enhance the geographic view. The title is styled to stand out with a large, bold font. Users can hover over regions to see the mode count for each, and the map can animate through various modes for a dynamic experience. This visualization provides an intuitive way to explore how modes are distributed across regions.
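The region list also contains aggregates such as `World`, `Europe`, `EU27` and `Rest of the world`. These are not countries, so `locationmode='country names'` is unlikely to place them on the map, and, unlike some earlier cells, the choropleth below does not filter out `World`. A minimal sketch that drops them before counting; the `AGGREGATES` set is our assumption based on the region list printed earlier, not an official list.

```python
import pandas as pd

# Assumed set of non-country aggregates, taken from the printed region list
AGGREGATES = {"World", "Europe", "EU27", "Rest of the world"}


def country_level(df):
    """Keep only rows whose region is not one of the aggregate labels."""
    return df[~df["region"].isin(AGGREGATES)]


# Toy data for illustration only
toy = pd.DataFrame({
    "region": ["India", "World", "EU27", "Norway", "Rest of the world"],
    "mode": ["Cars"] * 5,
})
counts = country_level(toy).groupby(["region", "mode"]).size().reset_index(name="count")
print(counts)
```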

In [31]:
import plotly.express as px
# 1. Aggregate the counts of each unique mode by region
mode_counts_by_region = df.groupby(['region', 'mode'])['mode'].count().reset_index(name='count')
# 2. Create a choropleth map for each unique mode, showing the counts in each region
fig = px.choropleth(
    mode_counts_by_region,
    locations='region',  # The column with region names
    locationmode='country names',  # Use 'country names' for geographic matching
    color='count',  # Column for color intensity (counts of each mode)
    hover_name='region',  # Displayed on hover
    color_continuous_scale=px.colors.sequential.Viridis,  # Viridis color scale
    animation_frame='mode',  # Animate over different modes
    title='Distribution of Modes by Region and Mode Count'
)

# 3. Update layout for better appearance
fig.update_layout(
    geo=dict(showframe=False, showcoastlines=True, projection_type='natural earth'),
    title_font=dict(size=20, family="Arial Black")
)

# 4. Display the map
fig.show()


The provided code creates an animated line chart using Plotly Express to visualize the distribution of modes across different regions. The data is first grouped by the `mode` and `region`, and the count of each mode in each region is calculated. This data is then used to create a line chart where:



- The X-axis represents different regions.

- The Y-axis represents the count of modes.

- Different lines are drawn for each mode, and the lines are colored by mode.

- The chart includes an animation slider, which allows the user to animate the data over different modes.



The layout is customized for a cleaner look, with a title and labels for the axes. Additionally, the animation speed is adjusted to 1000 milliseconds (1 second) per frame, and custom play/pause buttons are added to control the animation. This interactive chart gives users an engaging way to explore how modes are distributed across regions.

In [32]:
import plotly.express as px
# 1. Group data by region, mode, and count occurrences of each mode
mode_counts_by_region_mode = df.groupby(['mode', 'region'])['mode'].count().reset_index(name='count')

# 2. Create a line chart showing mode distribution across regions
fig = px.line(
    mode_counts_by_region_mode,
    x='region',  # X-axis will be the region
    y='count',  # Y-axis will be the count of modes
    color='mode',  # Color lines by mode
    line_group='mode',  # Group the lines by mode for differentiation
    title='Distribution of Modes by Region',
    labels={'count': 'Number of Modes', 'region': 'Region'},
    animation_frame='mode',  # Add the slider for the 'mode' column
)

# 3. Update layout for a cleaner presentation
fig.update_layout(
    title_font=dict(size=20, family="Arial Black", color="darkblue"),
    xaxis_title="Region",
    yaxis_title="Mode Count",
    margin=dict(l=0, r=0, t=40, b=60),  # Adjust margins for better fit
    legend_title="Mode",
    sliders=[{
        "currentvalue": {"prefix": "Mode: ", "font": {"size": 18}},
        "pad": {"t": 40},
    }],
)

# Adjust the animation speed (frame_duration) by setting the frame duration in milliseconds
fig.update_layout(
    updatemenus=[{
        "buttons": [
            {
                "args": [None, {"frame": {"duration": 1000, "redraw": True}, "fromcurrent": True}],  # 1000 ms (1 second) per frame
                "label": "Play",
                "method": "animate"
            },
            {
                "args": [[None], {"frame": {"duration": 0, "redraw": False}, "mode": "immediate"}],
                "label": "Pause",
                "method": "animate"
            }
        ],
        "direction": "left",
        "pad": {"r": 10, "t": 87},
        "showactive": False,
        "type": "buttons",
        "x": 0.1,
        "xanchor": "right",
        "y": 0,
        "yanchor": "top"
    }]
)
# 4. Show the line chart
fig.show()

# <font color=red> 9. Popularity of different mode (vehicle) types across regions or countries over the years

In [33]:
import plotly.express as px

# Ask the user for the available modes and choose one
print("Modes available:", df['mode'].unique())
mo = input("Enter the mode: ")
# Filter data by selected mode
d = df[df['mode'] == mo]

# 1. Group data by year, region, and mode, and count occurrences of each mode
mode_counts_by_region_year_mode = d.groupby(['year', 'region', 'mode'])['mode'].count().reset_index(name='count')

# 2. Remove 'World' from the region to focus on countries (if applicable)
mode_counts_by_region_year_mode = mode_counts_by_region_year_mode[mode_counts_by_region_year_mode['region'] != 'World']

# 3. Create a choropleth map for the selected mode, animated over the years
fig = px.choropleth(
    mode_counts_by_region_year_mode,
    locations='region',  # The column with region names
    locationmode='country names',  # Use 'country names' for geographic matching
    color='count',  # Color intensity based on mode count
    animation_frame='year',  # Animation by year
    hover_name='region',  # Region name displayed on hover
    hover_data=['mode', 'count'],  # Display mode and count on hover
    color_continuous_scale=px.colors.sequential.Viridis,  # Color scale for the map
    title=f'Distribution of {mo} Across Regions Over Years',  # Title showing selected mode
    labels={'count': 'Number of Modes', 'year': 'Year', 'region': 'Region'}
)

# 4. Update layout for better visualization
fig.update_layout(
    geo=dict(
        showframe=False,  # Hide the outer frame around the map
        showcoastlines=True,  # Show coastlines for context
        projection_type='natural earth',  # Use a global perspective
    ),
    title_font=dict(size=22, family="Arial Black", color="darkblue"),
    margin=dict(l=0, r=0, t=50, b=0),  # Adjust margins for better fit
    coloraxis_colorbar=dict(
        title="Mode Count",
        ticks="outside",
        ticklen=5,
        len=0.7
    ),
)

# 5. Adjust the animation speed
fig.update_layout(
    updatemenus=[{
        "buttons": [
            {
                "args": [None, {"frame": {"duration": 1000, "redraw": True}, "fromcurrent": True}],  # Set animation speed
                "label": "Play",
                "method": "animate"
            },
            {
                "args": [[None], {"frame": {"duration": 0, "redraw": False}, "mode": "immediate"}],  # Pause button
                "label": "Pause",
                "method": "animate"
            }
        ],
        "direction": "left",
        "pad": {"r": 10, "t": 87},
        "showactive": False,
        "type": "buttons",
        "x": 0.1,
        "xanchor": "right",
        "y": 0,
        "yanchor": "top"
    }]
)
# 6. Show the choropleth map
fig.show()


Modes available: ['Cars' 'EV' 'Buses' 'Vans' 'Trucks']


Enter the mode:  EV


In [34]:
import plotly.express as px
# 1. Group data by region, mode, and year, and count occurrences of each mode
mode_counts_by_region_year_mode = df.groupby(['year', 'region', 'mode'])['mode'].count().reset_index(name='count')
# 2. Create a line chart showing mode distribution across regions and over years
fig = px.line(
    mode_counts_by_region_year_mode,
    x='region',  # X-axis will be the region
    y='count',  # Y-axis will be the count of modes
    color='mode',  # Color lines by mode
    line_group='mode',  # Group the lines by mode for differentiation
    title='Distribution of Modes by Region Over Years',
    labels={'count': 'Number of Modes', 'region': 'Region'},
    animation_frame='year',  # Add the slider for the 'year' column
)

# 3. Update layout for a cleaner presentation
fig.update_layout(
    title_font=dict(size=20, family="Arial Black", color="darkblue"),
    xaxis_title="Region",
    yaxis_title="Mode Count",
    margin=dict(l=0, r=0, t=40, b=60),  # Adjust margins for better fit
    legend_title="Mode",
    sliders=[{
        "currentvalue": {"prefix": "Year: ", "font": {"size": 18}},
        "pad": {"t": 40},
    }],
)
# Adjust the animation speed (frame_duration) by setting the frame duration in milliseconds
fig.update_layout(
    updatemenus=[{
        "buttons": [
            {
                "args": [None, {"frame": {"duration": 1000, "redraw": True}, "fromcurrent": True}],  # 1000 ms (1 second) per frame
                "label": "Play",
                "method": "animate"
            },
            {
                "args": [[None], {"frame": {"duration": 0, "redraw": False}, "mode": "immediate"}],
                "label": "Pause",
                "method": "animate"
            }
        ],
        "direction": "left",
        "pad": {"r": 10, "t": 87},
        "showactive": False,
        "type": "buttons",
        "x": 0.1,
        "xanchor": "right",
        "y": 0,
        "yanchor": "top"
    }]
)

# 4. Show the line chart
fig.show()


In [35]:
import plotly.express as px
print("Modes available", df['mode'].unique())
mo = input("Enter the mode")
d = df[df['mode'] == mo]
# 1. Group data by region, mode, and year, and count occurrences of each mode
mode_counts_by_region_year_mode = d.groupby(['year', 'region', 'mode'])['mode'].count().reset_index(name='count')

# 2. Create a line chart showing mode distribution across regions and over years
fig = px.line(
    mode_counts_by_region_year_mode,
    x='region',  # X-axis will be the region
    y='count',  # Y-axis will be the count of modes
    color='mode',  # Color lines by mode
    line_group='mode',  # Group the lines by mode for differentiation
    title='Distribution of Modes by Region Over Years',
    labels={'count': 'Number of Modes', 'region': 'Region'},
    animation_frame='year',  # Add the slider for the 'year' column
)

# 3. Update layout for a cleaner presentation and decrease animation speed
fig.update_layout(
    title_font=dict(size=20, family="Arial Black", color="darkblue"),
    xaxis_title="Region",
    yaxis_title="Mode Count",
    margin=dict(l=0, r=0, t=40, b=60),  # Adjust margins for better fit
    legend_title="Mode",
    sliders=[{
        "currentvalue": {"prefix": "Year: ", "font": {"size": 18}},
        "pad": {"t": 40},
    }],
)

# Adjust the animation speed (frame_duration) by setting the frame duration in milliseconds
fig.update_layout(
    updatemenus=[{
        "buttons": [
            {
                "args": [None, {"frame": {"duration": 1000, "redraw": True}, "fromcurrent": True}],  # 1000 ms (1 second) per frame
                "label": "Play",
                "method": "animate"
            },
            {
                "args": [[None], {"frame": {"duration": 0, "redraw": False}, "mode": "immediate"}],
                "label": "Pause",
                "method": "animate"
            }
        ],
        "direction": "left",
        "pad": {"r": 10, "t": 87},
        "showactive": False,
        "type": "buttons",
        "x": 0.1,
        "xanchor": "right",
        "y": 0,
        "yanchor": "top"
    }]
)

# 4. Show the line chart
fig.show()


Modes availbable ['Cars' 'EV' 'Buses' 'Vans' 'Trucks']


Enter the mode Cars
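
The cell above filters on whatever text is typed, so a typo such as `car` silently produces an empty frame and a blank chart. A minimal sketch of stricter validation follows; `select_mode` is a hypothetical helper (not part of pandas or Plotly), and the small `toy_modes` frame stands in for `df` with made-up values.

```python
import pandas as pd


def select_mode(frame: pd.DataFrame, mode: str) -> pd.DataFrame:
    """Return the rows of `frame` whose 'mode' equals `mode` (hypothetical helper).

    Surrounding whitespace is stripped, and a ValueError listing the valid
    modes is raised instead of silently returning an empty frame.
    """
    available = frame['mode'].unique().tolist()
    choice = mode.strip()
    if choice not in available:
        raise ValueError(f"Unknown mode {choice!r}; choose one of {available}")
    return frame[frame['mode'] == choice]


# Toy frame standing in for df (invented values, illustration only)
toy_modes = pd.DataFrame({
    'region': ['Australia', 'Australia', 'China'],
    'mode': ['Cars', 'Buses', 'Cars'],
    'year': [2011, 2011, 2012],
})

print(len(select_mode(toy_modes, ' Cars ')))  # 2
```

In the notebook it could replace the filtering line as `d = select_mode(df, input("Enter the mode "))`, so an invalid entry stops the cell with the list of valid modes instead of drawing an empty chart.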


# **<font color=red>10. Grouping by Continent and Representing the Total Value Continent-wise</font>**

To facilitate further analysis, a continent mapping is created through a dictionary (continent_map), which manually assigns each country or region to its respective continent (e.g., Australia to Oceania, China to Asia). This map is then applied to the region column of the df dataframe, creating a new 'continent' column that associates each country with its continent. Aggregate regions are mapped as well ('EU27' and 'Europe' to Europe, 'Rest of the world' to 'Other'), and any region missing from the dictionary receives NaN.

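Next, the code groups the rows by continent with the .groupby() function and sums the value column for each continent. The result is stored in the continent_totals dataframe, whose columns are renamed to continent and total_value for clarity. Note that value mixes several parameters and units (for example EV sales and EV stock in vehicles alongside EV sales and stock shares in percent), so total_value is an aggregate magnitude rather than a count of EV sales. Rows whose continent is NaN are dropped by the groupby.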

The final result, continent_totals, shows the total value for each continent. Asia leads with about 1.31 billion, followed by Europe with about 696 million and North America with about 425 million. Oceania (about 967 thousand), South America (about 307 thousand), and Africa (about 18.7 thousand) show far smaller totals, while the 'Other' category (the dataset's 'Rest of the world' region) accounts for about 321 million. Because the aggregate 'EU27' and 'Europe' rows are summed alongside the individual European countries, the European total likely double-counts part of its data.

In [36]:
import pandas as pd

# Assuming df is already defined as your initial DataFrame with 'region' and 'value' columns
# Sample structure of df:
# df = pd.DataFrame({'region': ['USA', 'Australia', 'India'], 'value': [100, 200, 300]})

# Continent mapping
continent_map = {
    'Australia': 'Oceania', 'Austria': 'Europe', 'Belgium': 'Europe', 'Brazil': 'South America',
    'Bulgaria': 'Europe', 'Canada': 'North America', 'Chile': 'South America', 'China': 'Asia',
    'Colombia': 'South America', 'Costa Rica': 'North America', 'Croatia': 'Europe', 'Cyprus': 'Europe',
    'Czech Republic': 'Europe', 'Denmark': 'Europe', 'EU27': 'Europe', 'Estonia': 'Europe', 'Europe': 'Europe',
    'Finland': 'Europe', 'France': 'Europe', 'Germany': 'Europe', 'Greece': 'Europe', 'Hungary': 'Europe',
    'Iceland': 'Europe', 'India': 'Asia', 'Ireland': 'Europe', 'Israel': 'Asia', 'Italy': 'Europe',
    'Japan': 'Asia', 'Korea': 'Asia', 'Latvia': 'Europe', 'Lithuania': 'Europe', 'Luxembourg': 'Europe',
    'Mexico': 'North America', 'Netherlands': 'Europe', 'New Zealand': 'Oceania', 'Norway': 'Europe',
    'Poland': 'Europe', 'Portugal': 'Europe', 'Rest of the world': 'Other', 'Romania': 'Europe',
    'Seychelles': 'Africa', 'Slovakia': 'Europe', 'Slovenia': 'Europe', 'South Africa': 'Africa',
    'Spain': 'Europe', 'Sweden': 'Europe', 'Switzerland': 'Europe', 'Turkiye': 'Europe', 'USA': 'North America',
    'United Arab Emirates': 'Asia', 'United Kingdom': 'Europe'
}

# Ensure 'region' column exists in df
if 'region' in df.columns:
    # Create a new column 'continent' by mapping 'region' using continent_map
    df['continent'] = df['region'].map(continent_map)

    # Group by continent and sum the 'value' column
    continent_totals = df.groupby('continent')['value'].sum().reset_index()

    # Rename columns for clarity
    continent_totals.columns = ['continent', 'total_value']

    # Display the result
    print(continent_totals)
else:
    print("Error: 'region' column not found in DataFrame.")


       continent   total_value
0         Africa  1.869902e+04
1           Asia  1.309607e+09
2         Europe  6.959460e+08
3  North America  4.247935e+08
4        Oceania  9.669110e+05
5          Other  3.212946e+08
6  South America  3.066531e+05
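
Because value mixes parameters and units, and the aggregate 'EU27' and 'Europe' rows are counted alongside individual countries, one way to make continent totals interpretable is to keep a single parameter and unit and exclude aggregate regions before grouping. The sketch below assumes the same region, parameter, unit, and value columns as df; `toy_continent_map` is a small hypothetical subset of continent_map, the toy values are invented, 'World' is an illustrative unmapped region, and `continent_totals_for` and `AGGREGATE_REGIONS` are hypothetical names.

```python
import pandas as pd

# Hypothetical subset of continent_map, for illustration only
toy_continent_map = {
    'China': 'Asia', 'Germany': 'Europe', 'France': 'Europe',
    'EU27': 'Europe', 'Europe': 'Europe', 'USA': 'North America',
}

# Regions that aggregate other rows (assumed list; extend it for the full dataset)
AGGREGATE_REGIONS = {'EU27', 'Europe'}

# Toy rows with a single parameter and unit (invented values)
toy_ev = pd.DataFrame({
    'region': ['China', 'Germany', 'France', 'EU27', 'Europe', 'USA', 'World'],
    'parameter': ['EV sales'] * 7,
    'unit': ['Vehicles'] * 7,
    'value': [100.0, 20.0, 10.0, 30.0, 35.0, 15.0, 150.0],
})


def continent_totals_for(frame, mapping, parameter, unit, aggregates=AGGREGATE_REGIONS):
    """Sum one parameter/unit per continent, skipping aggregate regions."""
    subset = frame[(frame['parameter'] == parameter) & (frame['unit'] == unit)]
    subset = subset[~subset['region'].isin(aggregates)]
    continents = subset['region'].map(mapping)

    # Regions missing from the mapping get NaN and would be dropped by groupby
    unmapped = sorted(subset.loc[continents.isna(), 'region'].unique())
    if unmapped:
        print("Not in mapping (dropped):", unmapped)

    return (
        subset.assign(continent=continents)
        .groupby('continent')['value']
        .sum()
        .reset_index(name='total_value')
    )


print(continent_totals_for(toy_ev, toy_continent_map, 'EV sales', 'Vehicles'))
```

On the real data, a call such as `continent_totals_for(df, continent_map, 'EV sales', 'Vehicles')` would give per-continent EV sales counts (both strings appear in `df.head()` above), and the printed list shows which regions the mapping does not cover.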


Plotting the data

In [37]:
import plotly.express as px

# Bar plot of the (mixed-unit) total value per continent on a linear axis
fig_bar = px.bar(
    continent_totals,
    x='continent',
    y='total_value',
    title='Total Value by Continent',
    labels={'continent': 'Continent', 'total_value': 'Total Value'},
    color='total_value',
    color_continuous_scale='viridis'
)

fig_bar.show()


**10.a. Grouping by Continent and Representing the Total Value Continent-wise on a Log Scale**

The totals above span nearly five orders of magnitude, from about 1.9e4 for Africa to about 1.3e9 for Asia, so the smaller continents are barely visible on a linear axis. For convenience, we therefore plot the data on a log scale.

In [38]:
import plotly.express as px
import numpy as np

# Work on a copy of continent_totals ('continent', 'total_value') so the original is left untouched
df_continent_totals = continent_totals.copy()

# Log-transform 'total_value'; zeros are replaced with NaN first so log10 does not return -inf
# (negative values would also become NaN, with a RuntimeWarning)
df_continent_totals['log_total_value'] = np.log10(df_continent_totals['total_value'].replace(0, np.nan))

# Format values consistently: scientific notation with 2 decimal places
def format_value(value):
    return f'{value:.2e}'

# Bar heights use the log-transformed values, while the bar labels show the original totals
fig = px.bar(
    df_continent_totals,
    x='continent',
    y='log_total_value',
    title='Log-Transformed Total Value by Continent',
    labels={'log_total_value': 'Log(Total Value)', 'continent': 'Continent'},
    text=df_continent_totals['total_value'].apply(format_value),  # Original values as bar labels
    color='total_value',  # Continuous color scale based on the original values
    color_continuous_scale='viridis'
)

fig.update_layout(
    xaxis_title='Continent',
    yaxis_title='Log(Total Value)',
    xaxis_tickangle=-45
)
fig.show()
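
To make "nearly five orders of magnitude" concrete, the sketch below takes log10 of the continent totals printed by In [36] (copied by hand, so they carry only the displayed precision) and measures the spread between the largest and smallest continent.

```python
import numpy as np
import pandas as pd

# Continent totals copied from the printed output of In [36]
totals = pd.Series({
    'Africa': 1.869902e+04,
    'Asia': 1.309607e+09,
    'Europe': 6.959460e+08,
    'North America': 4.247935e+08,
    'Oceania': 9.669110e+05,
    'Other': 3.212946e+08,
    'South America': 3.066531e+05,
})

log_totals = np.log10(totals)

# Orders of magnitude between the largest and the smallest continent total
spread = log_totals.max() - log_totals.min()
print(log_totals.round(2))
print(f"Spread: {spread:.2f} orders of magnitude")
```

As a design alternative to adding a log10 column, `px.bar` also accepts `log_y=True`, which puts the y-axis itself on a log scale while the plotted and hovered values stay in their original units.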


In [39]:
import plotly.express as px
import pandas as pd
import numpy as np

# Log transformation, handling zeros safely
continent_totals['log_total_value'] = np.log10(continent_totals['total_value'].replace(0, np.nan))

# Manually map continents to representative countries (just for visualization);
# 'Other' has no representative country, so it is not placed on the map
continent_to_country_map = {
    'Africa': 'South Africa',
    'Asia': 'China',
    'Europe': 'Germany',
    'North America': 'USA',
    'Oceania': 'Australia',
    'South America': 'Brazil'
}

# Add a 'country' column based on the representative country for each continent
continent_totals['country'] = continent_totals['continent'].map(continent_to_country_map)

# Create a bubble map with 'scatter_geo' (one bubble per representative country)
fig_map = px.scatter_geo(
    continent_totals,
    locations='country',
    locationmode='country names',
    size='log_total_value',  # Bubble size based on log value
    color='log_total_value',  # Color based on log-transformed values
    hover_name='continent',  # Show continent on hover
    hover_data={'total_value': True},  # Display original total value on hover
    projection='natural earth',
    title='Log-Transformed Total Value by Continent (World Map)',
    color_continuous_scale='Viridis'
)

fig_map.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=True,
        coastlinecolor="Gray"
    ),
    title_x=0.5  # Center the title
)

fig_map.show()
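
The bubble map above places each continent on a single representative country. An alternative, sketched below under stated assumptions, is to give every country in the mapping its continent's log total so the data could be drawn as a true choropleth. `country_level_frame` and `NON_COUNTRY_REGIONS` are hypothetical names, and `toy_map` and `toy_totals` hold invented values rather than the notebook's results.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping and totals, for illustration only (not the notebook's real values)
toy_map = {'China': 'Asia', 'India': 'Asia', 'Germany': 'Europe',
           'EU27': 'Europe', 'Rest of the world': 'Other'}
toy_totals = pd.DataFrame({'continent': ['Asia', 'Europe', 'Other'],
                           'total_value': [1.0e9, 1.0e8, 1.0e7]})

# Entries that are not individual countries and cannot be drawn on a map (assumed list)
NON_COUNTRY_REGIONS = {'EU27', 'Europe', 'Rest of the world'}


def country_level_frame(mapping, totals, skip=NON_COUNTRY_REGIONS):
    """Return one row per mappable country, carrying its continent's log10 total."""
    rows = pd.DataFrame(
        [(region, continent) for region, continent in mapping.items() if region not in skip],
        columns=['region', 'continent'],
    )
    # A left merge keeps the mapping's order and attaches each continent's total
    merged = rows.merge(totals, on='continent', how='left')
    merged['log_total_value'] = np.log10(merged['total_value'])
    return merged


countries = country_level_frame(toy_map, toy_totals)
print(countries)
```

The resulting frame could then be passed to `px.choropleth(countries, locations='region', locationmode='country names', color='log_total_value', hover_name='continent')`, which shades every listed country with its continent's value. This has not been run here, and some names in continent_map (for example 'Korea' or 'Turkiye') may not be recognized by Plotly's country-name matching.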


<h2>Conclusion</h2>



<p>This analysis explored the global landscape of electric vehicle (EV) adoption using the IEA Global EV Data 2024 dataset. We investigated various parameters such as EV sales, charging infrastructure, and powertrain types across different regions and countries over time. By employing visualizations like bar charts, line charts, pie charts, and choropleth maps, we gained valuable insights into the distribution, trends, and potential growth areas of the EV market.</p>



<h3>Key Takeaways:</h3>



<ul>

  <li><b>Regional Trends:</b> China and Europe emerged as dominant players in EV sales, charging infrastructure, and other parameters. North America also has a substantial presence, and the remaining regions show growing activity, highlighting the global expansion of EV adoption. In the continent grouping, the 'Other' category corresponds to the dataset's 'Rest of the world' region, which aggregates countries that are not listed individually.</li>

  <li><b>Sales Growth:</b> The analysis revealed clear growth in EV sales and stock across many countries and regions. Identifying the specific factors driving this growth in each area would require further investigation into powertrain distribution and sales figures.</li>

  <li><b>Charging Infrastructure:</b> Although visualizing charging infrastructure was challenging due to data limitations, further analysis could focus on specific charging technologies and countries to better understand the key infrastructural factors driving EV adoption.</li>

  <li><b>Powertrain Types:</b> The distribution of powertrains, such as BEV (Battery Electric Vehicle) and PHEV (Plug-in Hybrid Electric Vehicle), was explored with an animated choropleth map, which helped show which powertrain type is favored in each mode (cars, buses, vans, and trucks).</li>

</ul>



<p>This analysis establishes a foundation for understanding the global EV market. With future research and updated datasets, more detailed regional analysis and predictive modeling could be conducted to inform investment strategies and policy decisions in the rapidly evolving EV landscape.</p>

<h2>Future Scope of Electric Vehicles (EVs)</h2>



<p>The EV revolution is accelerating, and its future is filled with exciting possibilities. Here are some key areas where EVs are expected to have a significant impact:</p>



<h3>1. Advanced Battery Technology</h3>

<ul>

  <li><b>Solid-State Batteries:</b> Offering higher energy density, faster charging, and improved safety compared to current lithium-ion batteries.</li>

  <li><b>Battery Recycling and Sustainability:</b> Efficient recycling programs and innovative technologies for a more sustainable EV ecosystem.</li>

  <li><b>Wireless Charging:</b> Seamless and convenient charging, potentially even while driving.</li>

</ul>



<h3>2. Autonomous Driving Integration</h3>

<ul>

  <li><b>Self-Driving EVs:</b> Enhancing safety, reducing traffic congestion, and improving accessibility.</li>

  <li><b>Robotaxis and Ride-Sharing:</b> Forming the backbone of future on-demand transportation services.</li>

</ul>



<h3>3. Smart Charging and Grid Integration</h3>

<ul>

  <li><b>Vehicle-to-Grid (V2G) Technology:</b> EVs acting as mobile energy storage units, stabilizing the grid.</li>

  <li><b>Smart Charging Optimization:</b> Coordinating EV charging with electricity prices and grid demand.</li>

</ul>



<h3>4. Infrastructure Development and Accessibility</h3>

<ul>

  <li><b>Expansion of Charging Networks:</b> Making EV charging accessible to everyone.</li>

  <li><b>Charging Innovations:</b> Faster charging technologies and solutions like battery swapping stations.</li>

</ul>



<h3>5. Sustainability and Environmental Impact</h3>

<ul>

  <li><b>Reduced Emissions:</b> Contributing to cleaner air and a healthier environment.</li>

  <li><b>Renewable Energy Integration:</b> Aligning with the transition to renewable energy sources.</li>

</ul>



<h3>6. New Business Models and Services</h3>

<ul>

  <li><b>Battery-as-a-Service (BaaS):</b> Leasing batteries separately from the vehicle, reducing upfront costs.</li>

  <li><b>EV Subscription Services:</b> Providing an alternative to traditional car buying.</li>

</ul>



<h3>7. Enhanced Connectivity and Personalization</h3>

<ul>

  <li><b>Connected Car Features:</b> Real-time traffic updates, remote vehicle control, and in-car entertainment.</li>

  <li><b>Personalized Driving Experiences:</b> Customized settings for driving modes, comfort, and entertainment.</li>

</ul>



<p>These are just some of the exciting possibilities that lie ahead for EVs. Continued advancements in technology, infrastructure, and business models will shape the future of mobility and create a more sustainable and connected transportation ecosystem.</p>