### Question
#### *List down the top 10 states that had the highest compounded annual growth rate (CAGR) from 2022 to 2024 in total vehicles sold.*

---

## Solution using Fiscal Year Analysis

### **Steps to Calculate Top 10 States by CAGR (FY 2022–2024)**

1. **Load and Merge Data Sources**
   * Import `electric_vehicle_sales_by_state.csv` for sales data
   * Import `dim_date.csv` to map dates to fiscal years
   * Merge datasets on the date column

2. **Filter for Relevant Fiscal Years**
   * Keep only rows where `fiscal_year` is 2022 or 2024
   * Clean state names if necessary

3. **Aggregate Sales by State and Fiscal Year**
   * Group by `state` and `fiscal_year`
   * Sum the `total_vehicles_sold` for each combination

4. **Reshape for CAGR Calculation**
   * Create a pivot table with states as rows and fiscal years as columns

5. **Apply the CAGR Formula**
   For each state:
   $$
   \text{CAGR} = \left( \frac{\text{Sales in FY 2024}}{\text{Sales in FY 2022}} \right)^{\frac{1}{2}} - 1
   $$
   The exponent is 1/2 because two annual periods separate FY 2022 and FY 2024 (FY 2022 to FY 2023, then FY 2023 to FY 2024).

6. **Handle Edge Cases and Sort Results**
   * Remove records with missing or zero sales
   * Sort by CAGR in descending order to find top performers

7. **Visualize Results with Plotly**
   * Create interactive charts for better insight communication
   * Analyze EV, non-EV, and total vehicle sales patterns
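
As a quick check of the formula in step 5, the sketch below computes the CAGR for a single state. The function name `cagr_percent` is hypothetical and is not part of the notebook; the input values are Meghalaya's FY 2022 and FY 2024 totals as reported in the results further down.

```python
def cagr_percent(start_value: float, end_value: float, years: float) -> float:
    """Return the compound annual growth rate as a percentage."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return ((end_value / start_value) ** (1 / years) - 1) * 100


# Meghalaya: 22,193 vehicles in FY 2022 and 36,628 in FY 2024
meghalaya_cagr = cagr_percent(22193, 36628, years=2)
print(f"{meghalaya_cagr:.2f}%")  # 28.47%
```

Rejecting a non-positive starting value up front mirrors the edge-case handling in step 6, since the ratio is undefined when the base is zero.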

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from IPython.display import display
from plotly.subplots import make_subplots
from sklearn.linear_model import LinearRegression
import os

In [2]:
# Load the state-based EV sales dataframe
state_sales = pd.read_csv("../../data/processed/electric_vehicle_sales_by_state.csv")

# Display the first 5 rows to understand the data structure
display(state_sales.head())

Unnamed: 0,date,state,vehicle_category,electric_vehicles_sold,total_vehicles_sold
0,01-Apr-21,Sikkim,2-Wheelers,0,398
1,01-Apr-21,Sikkim,4-Wheelers,0,361
2,01-May-21,Sikkim,2-Wheelers,0,113
3,01-May-21,Sikkim,4-Wheelers,0,98
4,01-Jun-21,Sikkim,2-Wheelers,0,229


In [3]:
# Load the date dimension table to get fiscal year mapping
dim_date = pd.read_csv("../../data/raw/dim_date.csv")

# Display the first few rows of the date dimension table
display(dim_date.head())

Unnamed: 0,date,fiscal_year,quarter
0,01-Apr-21,2022,Q1
1,01-May-21,2022,Q1
2,01-Jun-21,2022,Q1
3,01-Jul-21,2022,Q2
4,01-Aug-21,2022,Q2
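
The rows above map 01-Apr-21 to fiscal year 2022, Q1, which matches an April-to-March fiscal year. If the date dimension table were unavailable, the same mapping could be derived from the date itself. The sketch below assumes that convention holds for every month (the January to March rows are not shown above) and uses a few hand-written sample dates.

```python
import pandas as pd

# Sample dates in the same DD-Mon-YY format as dim_date.csv
dates = pd.Series(["01-Apr-21", "01-Mar-22", "01-Jul-21", "01-Apr-23"])
parsed = pd.to_datetime(dates, format="%d-%b-%y")

# Assumption: fiscal year N runs from April of year N-1 through March of year N
fiscal_year = parsed.dt.year + (parsed.dt.month >= 4).astype(int)

# Q1 = Apr-Jun, Q2 = Jul-Sep, Q3 = Oct-Dec, Q4 = Jan-Mar
quarter = "Q" + (((parsed.dt.month - 4) % 12) // 3 + 1).astype(str)

print(pd.DataFrame({"date": dates, "fiscal_year": fiscal_year, "quarter": quarter}))
```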


In [4]:
# Merge the sales data with the date dimension to map dates to fiscal years
merge_df = pd.merge(dim_date, state_sales, on="date", how='left')

# Display the merged dataframe
print(f"Merged dataframe shape: {merge_df.shape}")
display(merge_df.head())

# Check available fiscal years
print("\nAvailable fiscal years:")
display(merge_df['fiscal_year'].unique())

# Display sample data for fiscal year 2024
print("\nSample data for fiscal year 2024:")
display(merge_df[merge_df['fiscal_year'] == 2024].head())

Merged dataframe shape: (2445, 7)


Unnamed: 0,date,fiscal_year,quarter,state,vehicle_category,electric_vehicles_sold,total_vehicles_sold
0,01-Apr-21,2022,Q1,Sikkim,2-Wheelers,0,398
1,01-Apr-21,2022,Q1,Sikkim,4-Wheelers,0,361
2,01-Apr-21,2022,Q1,Andaman & Nicobar Island,2-Wheelers,0,515
3,01-Apr-21,2022,Q1,Arunachal Pradesh,2-Wheelers,0,1256
4,01-Apr-21,2022,Q1,Arunachal Pradesh,4-Wheelers,0,724



Available fiscal years:


array([2022, 2023, 2024])


Sample data for fiscal year 2024:


Unnamed: 0,date,fiscal_year,quarter,state,vehicle_category,electric_vehicles_sold,total_vehicles_sold
1631,01-Apr-23,2024,Q1,Sikkim,2-Wheelers,0,465
1632,01-Apr-23,2024,Q1,Sikkim,4-Wheelers,0,439
1633,01-Apr-23,2024,Q1,Andaman & Nicobar Island,2-Wheelers,0,325
1634,01-Apr-23,2024,Q1,Arunachal Pradesh,2-Wheelers,0,971
1635,01-Apr-23,2024,Q1,Ladakh,2-Wheelers,0,43
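
A merge on a string date key silently drops or blanks rows when the two files disagree on dates. The sketch below shows one way to check for that with the `validate` and `indicator` parameters of `pd.merge`. The two small DataFrames are made-up samples, not the notebook's data.

```python
import pandas as pd

# Made-up samples: one date is missing from each side
dim = pd.DataFrame({"date": ["01-Apr-21", "01-May-21"], "fiscal_year": [2022, 2022]})
sales = pd.DataFrame({
    "date": ["01-Apr-21", "01-Apr-21", "01-Jun-21"],
    "state": ["Sikkim", "Goa", "Sikkim"],
    "total_vehicles_sold": [398, 500, 229],
})

# validate raises MergeError if a date appears more than once in dim;
# indicator adds a _merge column showing where each row came from
checked = pd.merge(dim, sales, on="date", how="outer",
                   validate="one_to_many", indicator=True)

unmatched = checked[checked["_merge"] != "both"]
print(unmatched[["date", "_merge"]])
```

An empty `unmatched` frame would indicate that every date in one file has a counterpart in the other.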


In [5]:
# Filter data for fiscal years 2022 and 2024 for CAGR calculation
state_sales_filtered = merge_df.loc[merge_df['fiscal_year'].isin([2022, 2024])]

# Display basic statistics of the filtered data
print(f"Filtered data shape: {state_sales_filtered.shape}")
print("\nFiltered data summary:")
display(state_sales_filtered.describe())

# Display the first few rows of the filtered data
display(state_sales_filtered.head())

# Verify the fiscal years in the filtered data
fiscal_years = state_sales_filtered['fiscal_year'].unique()
print(f"\nFiscal years in filtered data: {fiscal_years}")

Filtered data shape: (1629, 7)

Filtered data summary:


Unnamed: 0,fiscal_year,electric_vehicles_sold,total_vehicles_sold
count,1629.0,1629.0,1629.0
mean,2022.999386,792.352977,23081.07612
std,1.000307,2190.583536,38342.787424
min,2022.0,0.0,1.0
25%,2022.0,2.0,1117.0
50%,2022.0,51.0,5873.0
75%,2024.0,493.0,28753.0
max,2024.0,26668.0,387983.0


Unnamed: 0,date,fiscal_year,quarter,state,vehicle_category,electric_vehicles_sold,total_vehicles_sold
0,01-Apr-21,2022,Q1,Sikkim,2-Wheelers,0,398
1,01-Apr-21,2022,Q1,Sikkim,4-Wheelers,0,361
2,01-Apr-21,2022,Q1,Andaman & Nicobar Island,2-Wheelers,0,515
3,01-Apr-21,2022,Q1,Arunachal Pradesh,2-Wheelers,0,1256
4,01-Apr-21,2022,Q1,Arunachal Pradesh,4-Wheelers,0,724



Fiscal years in filtered data: [2022 2024]


In [6]:
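# Clean state names for consistency; assign() returns a new DataFrame,
# which avoids pandas' SettingWithCopyWarning on the filtered slice
state_sales_filtered = state_sales_filtered.assign(
  state=state_sales_filtered['state'].replace({
    'Andaman & Nicobar Island': 'Andaman & Nicobar'
  })
)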

# Check unique states after cleaning
print(f"Number of unique states after cleaning: {state_sales_filtered['state'].nunique()}")
display(state_sales_filtered['state'].unique())

Number of unique states after cleaning: 34




array(['Sikkim', 'Andaman & Nicobar', 'Arunachal Pradesh', 'Assam',
       'Chhattisgarh', 'DNH and DD', 'Jammu and Kashmir', 'Ladakh',
       'Manipur', 'Meghalaya', 'Mizoram', 'Nagaland', 'Puducherry',
       'Tripura', 'Himachal Pradesh', 'Andhra Pradesh', 'Bihar',
       'Chandigarh', 'Delhi', 'Goa', 'Gujarat', 'Haryana', 'Jharkhand',
       'Karnataka', 'Kerala', 'Madhya Pradesh', 'Maharashtra', 'Odisha',
       'Punjab', 'Rajasthan', 'Tamil Nadu', 'Uttar Pradesh',
       'Uttarakhand', 'West Bengal'], dtype=object)
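
Assigning to a column of a filtered DataFrame can trigger pandas' `SettingWithCopyWarning`, because pandas cannot tell whether the filtered result is a view or a copy. One common way to avoid it, sketched below on made-up rows, is to take an explicit `.copy()` when filtering.

```python
import pandas as pd

# Made-up rows standing in for the merged sales data
merged = pd.DataFrame({
    "state": ["Andaman & Nicobar Island", "Goa", "Goa"],
    "fiscal_year": [2022, 2023, 2024],
})

# .copy() makes the filtered frame independent of the original
filtered = merged.loc[merged["fiscal_year"].isin([2022, 2024])].copy()
filtered["state"] = filtered["state"].replace(
    {"Andaman & Nicobar Island": "Andaman & Nicobar"}
)

print(filtered)
```

The original `merged` frame keeps its state names unchanged, so later cells that reuse it are not affected by the cleanup.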

In [7]:
# Aggregate Sales by state and fiscal year
print("Aggregating total vehicle sales by state and fiscal year...")
state_sales_grouped = (
  state_sales_filtered.groupby(['state', 'fiscal_year'], as_index=False)['total_vehicles_sold'].sum()
)

# Display the grouped data
display(state_sales_grouped.head(10))
print(f"Grouped data shape: {state_sales_grouped.shape}")

# Check if any NaN values exist in the grouped data
print(f"NaN values in grouped data: {state_sales_grouped.isna().sum()}")

Aggregating total vehicle sales by state and fiscal year...


Unnamed: 0,state,fiscal_year,total_vehicles_sold
0,Andaman & Nicobar,2022,5148
1,Andaman & Nicobar,2024,7203
2,Andhra Pradesh,2022,772748
3,Andhra Pradesh,2024,782865
4,Arunachal Pradesh,2022,19929
5,Arunachal Pradesh,2024,27892
6,Assam,2022,379450
7,Assam,2024,547626
8,Bihar,2022,892873
9,Bihar,2024,1132703


Grouped data shape: (68, 3)
NaN values in grouped data: state                  0
fiscal_year            0
total_vehicles_sold    0
dtype: int64


In [8]:
# Create pivot table for easier CAGR calculation
print("Creating pivot table with states as rows and fiscal years as columns...")
state_sales_pivot = (
    state_sales_grouped.pivot(
      index='state',
      columns='fiscal_year',
      values='total_vehicles_sold'
    )
    .rename(columns={2022: 'sales_2022', 2024: 'sales_2024'})
    .reset_index()
)

# Display the pivot table
display(state_sales_pivot.head())

# Check for missing values in the pivot table
print("\nMissing values in pivot table:")
display(state_sales_pivot.isna().sum())

Creating pivot table with states as rows and fiscal years as columns...


fiscal_year,state,sales_2022,sales_2024
0,Andaman & Nicobar,5148,7203
1,Andhra Pradesh,772748,782865
2,Arunachal Pradesh,19929,27892
3,Assam,379450,547626
4,Bihar,892873,1132703



Missing values in pivot table:


fiscal_year
state         0
sales_2022    0
sales_2024    0
dtype: int64
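
The grouping and pivoting steps can also be combined. The sketch below uses `pivot_table` with `aggfunc="sum"` on made-up rows, and `rename_axis(columns=None)` to drop the leftover `fiscal_year` label that appears in the output headers above. It is an alternative illustration, not the notebook's method.

```python
import pandas as pd

# Made-up monthly rows; state "A" has two FY 2022 entries to be summed
rows = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B"],
    "fiscal_year": [2022, 2022, 2024, 2022, 2024],
    "total_vehicles_sold": [100, 50, 300, 80, 120],
})

pivot = (
    rows.pivot_table(index="state", columns="fiscal_year",
                     values="total_vehicles_sold", aggfunc="sum")
    .rename(columns={2022: "sales_2022", 2024: "sales_2024"})
    .rename_axis(columns=None)  # remove the "fiscal_year" columns label
    .reset_index()
)

print(pivot)
```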

In [9]:
# Calculate CAGR (Compound Annual Growth Rate) for 2-year period
print("Calculating CAGR for each state...")
state_sales_pivot['CAGR'] = (
  (state_sales_pivot['sales_2024'] / state_sales_pivot['sales_2022']) ** (1/2) - 1
) * 100  # Converting to percentage

# Display the pivot table with CAGR
display(state_sales_pivot.head())

# Summary statistics for CAGR
print("\nCAGR Summary Statistics:")
display(state_sales_pivot['CAGR'].describe())

# Distribution of CAGR values
fig = px.histogram(state_sales_pivot, x='CAGR', 
                  title='Distribution of CAGR Values Across States',
                  labels={'CAGR': 'CAGR (%)', 'count': 'Number of States'},
                  nbins=20,
                  color_discrete_sequence=['skyblue'])
fig.add_vline(x=state_sales_pivot['CAGR'].median(), line_dash="dash", line_color="red", 
              annotation_text="Median", annotation_position="top right")
fig.update_layout(bargap=0.1)
fig.show()

Calculating CAGR for each state...


fiscal_year,state,sales_2022,sales_2024,CAGR
0,Andaman & Nicobar,5148,7203,18.287115
1,Andhra Pradesh,772748,782865,0.652483
2,Arunachal Pradesh,19929,27892,18.303359
3,Assam,379450,547626,20.133672
4,Bihar,892873,1132703,12.632359



CAGR Summary Statistics:


count    34.000000
mean     12.896031
std      10.322486
min     -28.593061
25%       9.282835
50%      14.381924
75%      18.299298
max      28.469075
Name: CAGR, dtype: float64

In [10]:
# Handle division by zero or missing data
print("Cleaning data by removing rows with NaN values or zero initial sales...")
clean_pivot = state_sales_pivot.dropna(subset=['sales_2022', 'sales_2024'])
clean_pivot = clean_pivot[clean_pivot['sales_2022'] > 0]

print(f"Original data shape: {state_sales_pivot.shape}")
print(f"Clean data shape: {clean_pivot.shape}")
print(f"Removed {state_sales_pivot.shape[0] - clean_pivot.shape[0]} rows")

# Sort by CAGR descending and get top 10 states
state_cagr_top10 = clean_pivot.sort_values('CAGR', ascending=False).head(10)
print("\nTop 10 states by CAGR:")
display(state_cagr_top10)

Cleaning data by removing rows with NaN values or zero initial sales...
Original data shape: (34, 4)
Clean data shape: (34, 4)
Removed 0 rows

Top 10 states by CAGR:


fiscal_year,state,sales_2022,sales_2024,CAGR
21,Meghalaya,22193,36628,28.469075
9,Goa,48372,78524,27.410196
15,Karnataka,1007894,1581988,25.283582
8,Delhi,401540,606348,22.884347
27,Rajasthan,880985,1300476,21.49738
10,Gujarat,1094872,1590987,20.545677
3,Assam,379450,547626,20.133672
22,Mizoram,19439,27422,18.771599
2,Arunachal Pradesh,19929,27892,18.303359
0,Andaman & Nicobar,5148,7203,18.287115


In [11]:
# Create an interactive Plotly horizontal bar chart for top 10 states
print("Creating interactive bar chart for top 10 states by CAGR...")

# Sort for visualization (ascending order for horizontal bar chart)
df_top10_sorted = state_cagr_top10.sort_values('CAGR')

# Create interactive bar chart with Plotly
fig = px.bar(df_top10_sorted, 
             x='CAGR', 
             y='state',
             orientation='h',
             title='Top 10 States by CAGR (FY 2022–2024) in Total Vehicle Sales',
             labels={'CAGR': 'CAGR (%)', 'state': 'State'},
             color='CAGR',
             color_continuous_scale='RdYlGn',
             text='CAGR')

# Customize layout
fig.update_traces(texttemplate='%{text:.2f}%', textposition='outside')
fig.update_layout(
    xaxis=dict(title='CAGR (%)', gridcolor='lightgray', showgrid=True),
    yaxis=dict(title='State'),
    coloraxis_showscale=False,
    height=600,
    width=900,
    plot_bgcolor='white',
    margin=dict(l=100)
)

fig.show()

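# Save the visualization to an HTML file, creating the output folder
# first in case it does not exist yet
os.makedirs("../../data/visuals", exist_ok=True)
fig.write_html("../../data/visuals/top10_states_cagr_fiscal_years.html")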

Creating interactive bar chart for top 10 states by CAGR...


## Key Findings from Total Vehicle Sales Analysis

Between fiscal years 2022 and 2024, we observe several noteworthy patterns in India's vehicle sales market:

1. **Top Performers**: Meghalaya leads with a **28.47% CAGR**, followed by Goa (27.41%) and Karnataka (25.28%). Four northeastern states (Meghalaya, Assam, Mizoram, and Arunachal Pradesh) appear in the top 10.

2. **Geographic Pattern**: The top 10 also includes Goa, Karnataka, Delhi, Rajasthan, Gujarat, and Andaman & Nicobar, so high growth is not confined to one region and its drivers are likely local.

3. **Economic Correlation**: The list includes a tourism hub (Goa) and a technology center (Karnataka), which may point to a link between local economic strength and automotive growth. This dataset contains no economic indicators, so that link is a hypothesis and not a finding.

4. **Growth Magnitude**: The top three states all recorded CAGRs above 25%, roughly double the mean state CAGR of 12.90% (median 14.38%).

5. **Strategic Implications**: These high-growth states represent prime targets for:
   - Expanded dealership networks
   - Increased inventory allocation
   - Targeted promotional strategies
   - Market development initiatives

This analysis provides crucial insights for automotive manufacturers and dealers to optimize their state-level market strategies in India.

## Let's Analyze EV Sales CAGR Specifically

Now that we've identified the top states by total vehicle sales CAGR, let's examine electric vehicle (EV) sales patterns specifically for the same fiscal years.

In [12]:
# Filter data for EV analysis
print("Preparing data for EV sales analysis...")

# Group EV sales by state and fiscal year
ev_sales_grouped = state_sales_filtered.groupby(['state', 'fiscal_year'])['electric_vehicles_sold'].sum().reset_index()

# Check for zero values in EV sales
zero_counts = (ev_sales_grouped['electric_vehicles_sold'] == 0).sum()
print(f"Number of entries with zero EV sales: {zero_counts}")

# Create pivot table for EV sales
ev_sales_pivot = (
    ev_sales_grouped.pivot(
        index='state',
        columns='fiscal_year',
        values='electric_vehicles_sold'
    )
    .rename(columns={2022: 'sales_2022', 2024: 'sales_2024'})
    .reset_index()
)

# Display the pivot table
display(ev_sales_pivot.head())

# Calculate CAGR for EV sales
ev_sales_pivot['CAGR'] = (
    ((ev_sales_pivot['sales_2024'] / ev_sales_pivot['sales_2022']) ** (1/2) - 1) * 100
).round(2)  # in percentage

# Handle cases with zero or missing initial values
ev_sales_clean = ev_sales_pivot.dropna(subset=['sales_2022', 'sales_2024'])
ev_sales_clean = ev_sales_clean[ev_sales_clean['sales_2022'] > 0]

print(f"\nStates with valid EV CAGR data: {ev_sales_clean.shape[0]}")

# Find top 10 states by EV CAGR
top10_ev_cagr = ev_sales_clean.sort_values('CAGR', ascending=False).head(10)
print("\nTop 10 states by EV CAGR:")
display(top10_ev_cagr)

Preparing data for EV sales analysis...
Number of entries with zero EV sales: 4


fiscal_year,state,sales_2022,sales_2024
0,Andaman & Nicobar,22,35
1,Andhra Pradesh,13928,33183
2,Arunachal Pradesh,0,31
3,Assam,730,3497
4,Bihar,4829,15069



States with valid EV CAGR data: 31

Top 10 states by EV CAGR:


fiscal_year,state,sales_2022,sales_2024,CAGR
21,Meghalaya,4,133,476.63
30,Tripura,28,304,229.5
23,Nagaland,1,9,200.0
5,Chandigarh,411,2877,164.58
6,Chhattisgarh,4534,28540,150.89
33,West Bengal,2685,16864,150.62
9,Goa,1778,10799,146.45
7,DNH and DD,35,198,137.85
31,Uttar Pradesh,10222,57758,137.7
18,Madhya Pradesh,7916,43223,133.67
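
The EV pivot table contains states with zero FY 2022 sales (Arunachal Pradesh, for example), where the CAGR ratio divides by zero. The notebook drops those rows after computing the ratio. The sketch below shows an alternative that masks them before dividing, so the result is `NaN` instead of infinity. It uses the first rows of the pivot table shown above.

```python
import pandas as pd

# Rows taken from the EV pivot table displayed above
ev = pd.DataFrame({
    "state": ["Arunachal Pradesh", "Assam", "Bihar"],
    "sales_2022": [0, 730, 4829],
    "sales_2024": [31, 3497, 15069],
})

# Keep values only where the FY 2022 base is positive; others become NaN
valid = ev["sales_2022"] > 0
ratio = ev["sales_2024"].where(valid) / ev["sales_2022"].where(valid)
ev["CAGR"] = ((ratio ** 0.5 - 1) * 100).round(2)

print(ev)
```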


In [13]:
# Create an interactive Plotly horizontal bar chart for top 10 states by EV CAGR
print("Creating visualization for top 10 states by EV CAGR...")

# Sort for visualization
ev_top10_sorted = top10_ev_cagr.sort_values('CAGR')

# Create interactive bar chart with Plotly
fig = px.bar(ev_top10_sorted, 
             x='CAGR', 
             y='state',
             orientation='h',
             title='Top 10 States by EV Sales CAGR (FY 2022–2024)',
             labels={'CAGR': 'CAGR (%)', 'state': 'State'},
             color='CAGR',
             color_continuous_scale='Bluyl',  # Different color scale for EV
             text='CAGR')

# Format the bar labels as percentages with two decimals
fig.update_traces(texttemplate='%{text:.2f}%', textposition='outside')

fig.update_layout(
    xaxis=dict(title='CAGR (%)', gridcolor='lightgray', showgrid=True),
    yaxis=dict(title='State'),
    coloraxis_showscale=False,
    height=600,
    width=900,
    plot_bgcolor='white',
    margin=dict(l=100)
)

fig.show()

# Save the visualization
fig.write_html("../../data/visuals/top10_states_ev_cagr_fiscal_years.html")

Creating visualization for top 10 states by EV CAGR...


## EV Sales Growth Analysis (FY 2022-2024)

The electric vehicle market in India shows remarkable variation across states:

1. **Extraordinary Growth Leaders**: 
   - Meghalaya leads with a **476.63% CAGR** (from 4 to 133 units)
   - Tripura follows at **229.50% CAGR** (from 28 to 304 units)
   - Nagaland shows **200.00% CAGR** (from 1 to 9 units)

2. **Northeast India Phenomenon**: Three northeastern states top the ranking, but each starts from fewer than 30 units in FY 2022, so their percentages reflect tiny bases more than broad adoption. Among larger markets, Chhattisgarh (4,534 to 28,540 units) and Uttar Pradesh (10,222 to 57,758 units) also exceed 135% CAGR.

3. **Growth Independence**: EV sales growth is only weakly related to total vehicle growth (the correlation computed later in this notebook is 0.204). This is consistent with EV-specific policies, incentives, or infrastructure investments driving adoption, although the dataset cannot confirm the cause.

4. **Market Transformation**: All ten states show triple-digit CAGRs, indicating rapid market transformation and significant opportunities for EV manufacturers and charging infrastructure providers, particularly where absolute volumes are already large.

5. **Strategic Implications**: These high-growth EV markets warrant priority attention for:
   - Charging infrastructure development
   - EV model distribution strategies
   - Policy support continuation and enhancement
   - Consumer education and awareness programs
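
Several of the largest EV CAGRs above come from very small FY 2022 bases (Nagaland: 1 unit, Meghalaya: 4 units). One way to test how much the ranking depends on them is to apply a minimum base volume before sorting. The sketch below does this for the ten states in the table above only. The cutoff `MIN_BASE_UNITS = 1000` is a hypothetical value chosen for illustration, not one used in the notebook.

```python
import pandas as pd

# FY 2022 and FY 2024 EV sales for the ten states in the table above
top10_ev = pd.DataFrame({
    "state": ["Meghalaya", "Tripura", "Nagaland", "Chandigarh", "Chhattisgarh",
              "West Bengal", "Goa", "DNH and DD", "Uttar Pradesh", "Madhya Pradesh"],
    "sales_2022": [4, 28, 1, 411, 4534, 2685, 1778, 35, 10222, 7916],
    "sales_2024": [133, 304, 9, 2877, 28540, 16864, 10799, 198, 57758, 43223],
})
top10_ev["CAGR"] = (
    (top10_ev["sales_2024"] / top10_ev["sales_2022"]) ** 0.5 - 1
) * 100

MIN_BASE_UNITS = 1000  # hypothetical cutoff, for illustration only

robust = (
    top10_ev[top10_ev["sales_2022"] >= MIN_BASE_UNITS]
    .sort_values("CAGR", ascending=False)
)
print(robust[["state", "sales_2022", "CAGR"]].round(2))
```

With this cutoff, only five of the ten states remain, led by Chhattisgarh and West Bengal. Applying the same filter to all 31 states with valid data could surface other states that are not in the table above.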

In [14]:
# Calculate non-EV sales by subtracting EV sales from total sales
print("Analyzing non-EV vehicle sales...")

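# Create non-EV sales column; assign() returns a new DataFrame,
# which avoids pandas' SettingWithCopyWarning
state_sales_filtered = state_sales_filtered.assign(
    non_ev_sales=state_sales_filtered['total_vehicles_sold']
                 - state_sales_filtered['electric_vehicles_sold']
)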

# Group by state and fiscal year for non-EV sales
non_ev_grouped = state_sales_filtered.groupby(['state', 'fiscal_year'], as_index=False)['non_ev_sales'].sum()

# Create pivot table for non-EV sales
non_ev_pivot = (
    non_ev_grouped.pivot(
        index='state',
        columns='fiscal_year',
        values='non_ev_sales'
    )
    .rename(columns={2022: 'sales_2022', 2024: 'sales_2024'})
    .reset_index()
)

# Calculate CAGR for non-EV sales
non_ev_pivot['CAGR'] = (
    ((non_ev_pivot['sales_2024'] / non_ev_pivot['sales_2022']) ** (1/2) - 1) * 100
).round(2)

# Handle zero or missing values
non_ev_clean = non_ev_pivot.dropna(subset=['sales_2022', 'sales_2024'])
non_ev_clean = non_ev_clean[non_ev_clean['sales_2022'] > 0]

# Get top 10 states for non-EV CAGR
top10_non_ev = non_ev_clean.sort_values('CAGR', ascending=False).head(10)
print("\nTop 10 states by non-EV CAGR:")
display(top10_non_ev)

Analyzing non-EV vehicle sales...

Top 10 states by non-EV CAGR:







fiscal_year,state,sales_2022,sales_2024,CAGR
21,Meghalaya,22189,36495,28.25
15,Karnataka,964783,1420999,21.36
9,Goa,46594,67725,20.56
8,Delhi,385005,559624,20.56
3,Assam,378720,544129,19.86
27,Rajasthan,860898,1234032,19.73
10,Gujarat,1076846,1506628,18.28
0,Andaman & Nicobar,5126,7168,18.25
2,Arunachal Pradesh,19929,27861,18.24
22,Mizoram,19439,27147,18.17
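
The total, EV, and non-EV analyses repeat the same aggregate, pivot, and CAGR steps with a different value column each time. The sketch below wraps those steps in one function. The name `cagr_by_state`, its parameters, and the output column names are hypothetical, and the input rows are made up.

```python
import pandas as pd


def cagr_by_state(df, value_col, start_fy=2022, end_fy=2024):
    """Sum value_col by state and fiscal year, then compute CAGR in percent."""
    pivot = df.pivot_table(index="state", columns="fiscal_year",
                           values=value_col, aggfunc="sum")
    out = (
        pivot[[start_fy, end_fy]]
        .rename(columns={start_fy: "sales_start", end_fy: "sales_end"})
        .rename_axis(columns=None)
        .dropna()
    )
    # Drop states with a zero base, where the ratio is undefined
    out = out[out["sales_start"] > 0].copy()
    years = end_fy - start_fy
    out["CAGR"] = ((out["sales_end"] / out["sales_start"]) ** (1 / years) - 1) * 100
    return out.sort_values("CAGR", ascending=False).reset_index()


# Made-up rows: state "B" has a zero base and is excluded
sample = pd.DataFrame({
    "state": ["A", "A", "B", "B", "C", "C"],
    "fiscal_year": [2022, 2024, 2022, 2024, 2022, 2024],
    "units": [100, 400, 0, 50, 200, 242],
})
result = cagr_by_state(sample, "units")
print(result)
```

With the notebook's data, calls such as `cagr_by_state(state_sales_filtered, "electric_vehicles_sold").head(10)` would be expected to reproduce the corresponding top-10 table, up to rounding.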


In [15]:
# Create visualization for non-EV sales CAGR
print("Creating visualization for top 10 states by non-EV CAGR...")

# Sort for visualization
non_ev_top10_sorted = top10_non_ev.sort_values('CAGR')

# Create interactive bar chart
fig = px.bar(non_ev_top10_sorted, 
             x='CAGR', 
             y='state',
             orientation='h',
             title='Top 10 States by Non-EV Sales CAGR (FY 2022–2024)',
             labels={'CAGR': 'CAGR (%)', 'state': 'State'},
             color='CAGR',
             color_continuous_scale='Viridis',  # Different color scale for non-EV
             text='CAGR')

fig.update_traces(texttemplate='%{text:.2f}%', textposition='outside')

fig.update_layout(
    xaxis=dict(title='CAGR (%)', gridcolor='lightgray', showgrid=True),
    yaxis=dict(title='State'),
    coloraxis_showscale=False,
    height=600,
    width=900,
    plot_bgcolor='white',
    margin=dict(l=100)
)

fig.show()

# Save the visualization
fig.write_html("../../data/visuals/top10_states_non_ev_cagr_fiscal_years.html")

Creating visualization for top 10 states by non-EV CAGR...


## Non-EV Sales Market Analysis

The traditional vehicle market (non-EV sales) shows interesting patterns:

1. **Growth Leaders**:
   - **Meghalaya** leads with **28.25% CAGR**
   - **Karnataka** follows at **21.36% CAGR**
   - **Goa** and **Delhi** are tied at **20.56% CAGR**

2. **Market Correlation**: Only Meghalaya and Goa appear in both the non-EV and EV top-10 lists. In those two states both segments grew quickly, with no sign that EV growth came at the expense of traditional sales.

3. **Regional Strengths**: The non-EV top 10 contains the same ten states as the total-vehicle top 10, because non-EV sales make up the bulk of total sales. Strong growth in a tourism economy (Goa) and a technology hub (Karnataka) may indicate market drivers other than EV-specific policies.

4. **Strategic Value**: These markets represent prime opportunities for:
   - Hybrid market approaches (offering both EV and non-EV options)
   - Targeted financing programs
   - Dealer network optimization
   - Consumer choice-focused marketing strategies

In [16]:
# Create a comparison between total vehicle CAGR and EV CAGR
print("Creating comparison between total vehicle CAGR and EV CAGR...")

# Merge total and EV CAGR dataframes
comparison = pd.merge(
    clean_pivot[['state', 'CAGR']].rename(columns={'CAGR': 'Total_CAGR'}),
    ev_sales_clean[['state', 'CAGR']].rename(columns={'CAGR': 'EV_CAGR'}),
    on='state',
    how='inner'
)

# Sort by EV CAGR descending
comparison_sorted = comparison.sort_values('EV_CAGR', ascending=False)

# Display the comparison table
print("Comparison between Total Vehicle CAGR and EV CAGR:")
display(comparison_sorted.head(10))

# Calculate correlation between Total CAGR and EV CAGR
correlation = comparison['Total_CAGR'].corr(comparison['EV_CAGR'])
print(f"\nCorrelation between Total Vehicle CAGR and EV CAGR: {correlation:.3f}")

Creating comparison between total vehicle CAGR and EV CAGR...
Comparison between Total Vehicle CAGR and EV CAGR:


fiscal_year,state,Total_CAGR,EV_CAGR
20,Meghalaya,28.469075,476.63
27,Tripura,10.944725,229.5
21,Nagaland,14.916173,200.0
4,Chandigarh,10.530904,164.58
5,Chhattisgarh,13.53497,150.89
30,West Bengal,5.715537,150.62
8,Goa,27.410196,146.45
6,DNH and DD,14.94327,137.85
28,Uttar Pradesh,8.36109,137.7
17,Madhya Pradesh,15.318181,133.67



Correlation between Total Vehicle CAGR and EV CAGR: 0.204
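
The value printed above is a Pearson correlation coefficient (r), not an R². For a simple linear fit the two are related by R² = r², so r = 0.204 corresponds to an R² of about 0.04. The sketch below checks that identity with NumPy on the ten pairs displayed above. Because it uses only those ten rows and not all 31 states, its r will differ from 0.204.

```python
import numpy as np

# The ten (Total_CAGR, EV_CAGR) pairs from the comparison table above
total_cagr = np.array([28.469075, 10.944725, 14.916173, 10.530904, 13.53497,
                       5.715537, 27.410196, 14.94327, 8.36109, 15.318181])
ev_cagr = np.array([476.63, 229.5, 200.0, 164.58, 150.89,
                    150.62, 146.45, 137.85, 137.7, 133.67])

# Pearson correlation coefficient
r = np.corrcoef(total_cagr, ev_cagr)[0, 1]

# R² of a least-squares line: 1 - (residual variance / total variance)
slope, intercept = np.polyfit(total_cagr, ev_cagr, deg=1)
residuals = ev_cagr - (slope * total_cagr + intercept)
r_squared = 1 - residuals.var() / ev_cagr.var()

print(f"r = {r:.3f}, r squared = {r ** 2:.3f}, R² from fit = {r_squared:.3f}")
```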


In [17]:
# Create a scatter plot with regression line to visualize the relationship
print("Creating scatter plot with regression line to analyze relationship...")

# Create interactive scatter plot
fig = px.scatter(
    comparison,
    x="Total_CAGR",
    y="EV_CAGR",
    hover_name="state",
    text="state",
    title="Relationship Between Total Vehicle CAGR and EV CAGR by State (FY 2022-2024)",
    labels={"Total_CAGR": "Total Vehicles CAGR (%)", "EV_CAGR": "EV CAGR (%)"}
)

# Update marker style
fig.update_traces(
    marker=dict(size=12, color="skyblue"),
    textposition="top center",
    textfont=dict(size=8)
)

# Add regression line
X = comparison["Total_CAGR"].values.reshape(-1, 1)
y = comparison["EV_CAGR"].values
model = LinearRegression().fit(X, y)
x_range = np.linspace(comparison["Total_CAGR"].min(), comparison["Total_CAGR"].max(), 100)
y_pred = model.predict(x_range.reshape(-1, 1))

fig.add_traces(
    px.line(x=x_range, y=y_pred)
    .update_traces(line=dict(color="darkorange", width=3))
    .data
)

# Add equation and R² as annotation
r_squared = model.score(X, y)
slope = model.coef_[0]
intercept = model.intercept_

fig.add_annotation(
    x=comparison["Total_CAGR"].min() + (comparison["Total_CAGR"].max() - comparison["Total_CAGR"].min())*0.05,
    y=comparison["EV_CAGR"].max() - (comparison["EV_CAGR"].max() - comparison["EV_CAGR"].min())*0.05,
    text=f"y = {slope:.2f}x + {intercept:.2f}<br>R² = {r_squared:.3f}",
    showarrow=False,
    bgcolor="rgba(255, 255, 255, 0.8)",
    bordercolor="darkgray",
    borderwidth=1
)

# Layout improvements
fig.update_layout(
    height=700,
    width=900,
    plot_bgcolor="white",
    xaxis=dict(showgrid=True, gridcolor="lightgray", zeroline=True, zerolinecolor="gray"),
    yaxis=dict(showgrid=True, gridcolor="lightgray", zeroline=True, zerolinecolor="gray")
)

fig.show()

# Save the visualization
fig.write_html("../../data/visuals/cagr_correlation_scatter_fiscal_years.html")

Creating scatter plot with regression line to analyze relationship...


In [18]:
# Create quadrant analysis
print("Creating quadrant analysis for CAGR comparison...")

# Calculate means for quadrant division
total_mean = comparison['Total_CAGR'].mean()
ev_mean = comparison['EV_CAGR'].mean()

# Create scatter plot with quadrants
fig = px.scatter(
    comparison,
    x="Total_CAGR",
    y="EV_CAGR",
    hover_name="state",
    color="state",
    title="Quadrant Analysis: Total Vehicle CAGR vs EV CAGR (FY 2022-2024)",
    labels={"Total_CAGR": "Total Vehicles CAGR (%)", "EV_CAGR": "EV CAGR (%)"}
)

# Add quadrant lines
fig.add_shape(
    type="line", x0=total_mean, y0=comparison['EV_CAGR'].min(), 
    x1=total_mean, y1=comparison['EV_CAGR'].max(),
    line=dict(color="gray", width=1, dash="dash")
)
fig.add_shape(
    type="line", x0=comparison['Total_CAGR'].min(), y0=ev_mean, 
    x1=comparison['Total_CAGR'].max(), y1=ev_mean,
    line=dict(color="gray", width=1, dash="dash")
)

# Add quadrant annotations
fig.add_annotation(
    x=total_mean + (comparison['Total_CAGR'].max() - total_mean)/2, 
    y=ev_mean + (comparison['EV_CAGR'].max() - ev_mean)/2,
    text="Better Than Average<br>(Both Total & EV)",
    showarrow=False,
    font=dict(size=10)
)
fig.add_annotation(
    x=total_mean - (total_mean - comparison['Total_CAGR'].min())/2, 
    y=ev_mean + (comparison['EV_CAGR'].max() - ev_mean)/2,
    text="Better EV Growth<br>Worse Total Growth",
    showarrow=False,
    font=dict(size=10)
)
fig.add_annotation(
    x=total_mean + (comparison['Total_CAGR'].max() - total_mean)/2, 
    y=ev_mean - (ev_mean - comparison['EV_CAGR'].min())/2,
    text="Better Total Growth<br>Worse EV Growth",
    showarrow=False,
    font=dict(size=10)
)
fig.add_annotation(
    x=total_mean - (total_mean - comparison['Total_CAGR'].min())/2, 
    y=ev_mean - (ev_mean - comparison['EV_CAGR'].min())/2,
    text="Worse Than Average<br>(Both Total & EV)",
    showarrow=False,
    font=dict(size=10)
)

# Improve layout
fig.update_layout(
    height=700, 
    width=900,
    plot_bgcolor="white",
    showlegend=False
)

fig.show()

# Save the visualization
fig.write_html("../../data/visuals/cagr_quadrant_analysis_fiscal_years.html")

Creating quadrant analysis for CAGR comparison...
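
The chart places each state in a quadrant visually. For tables or exports it can help to store the quadrant as a column. The sketch below labels rows relative to the two means with `np.select`. The four rows and the label wording are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows with the same columns as the comparison table
comparison = pd.DataFrame({
    "state": ["A", "B", "C", "D"],
    "Total_CAGR": [25.0, 5.0, 22.0, 8.0],
    "EV_CAGR": [150.0, 140.0, 40.0, 30.0],
})

total_mean = comparison["Total_CAGR"].mean()  # 15.0 for these rows
ev_mean = comparison["EV_CAGR"].mean()        # 90.0 for these rows

high_total = comparison["Total_CAGR"] >= total_mean
high_ev = comparison["EV_CAGR"] >= ev_mean

# Conditions are checked in order; rows matching none get the default
comparison["quadrant"] = np.select(
    [high_total & high_ev, ~high_total & high_ev, high_total & ~high_ev],
    ["High total, high EV", "Low total, high EV", "High total, low EV"],
    default="Low total, low EV",
)
print(comparison)
```

Because the means are sensitive to outliers such as a very large EV CAGR, using the medians as dividing lines is a reasonable variation.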


## CAGR Comparison Analysis: Total Vehicle Sales vs EV Sales

The relationship between total vehicle CAGR and EV CAGR reveals important insights about India's evolving automotive market:

1. **Weak Correlation (r = 0.204)**: The Pearson correlation between total vehicle CAGR and EV CAGR is 0.204, which corresponds to an R² of about 0.04 for a simple linear fit. EV market growth therefore varies largely independently of overall vehicle market growth, suggesting separate drivers for the EV segment.

2. **Growth Leaders**: Meghalaya ranks first on both metrics:
   - **28.47%** Total Vehicle CAGR
   - **476.63%** EV CAGR

   Its EV figure rests on a base of only 4 units in FY 2022 (133 in FY 2024), so it is better read as an early-adoption signal than as a model for large markets.

3. **Strategic Market Segments**:
   - **High-EV, High-Total Growth States**: Prime markets for balanced investment across vehicle types
   - **High-EV, Lower-Total Growth States**: Potential markets experiencing EV-led transformation
   - **Lower-EV, High-Total Growth States**: Opportunities for accelerated EV adoption programs

4. **Policy Implications**: The weak relationship suggests state-specific factors (policy incentives, infrastructure development, consumer preferences) are more important than overall market health in driving EV adoption.

5. **Investment Strategy**: EV manufacturers should target high-EV-CAGR states regardless of total market performance, as the data shows EV success can occur independently of traditional market metrics.

In [19]:
# Export data files for dashboard use
print("Exporting data files for dashboard use...")

# Create the output directory if it doesn't exist
os.makedirs('../../data/visuals', exist_ok=True)

# Export top 10 state CAGR for total vehicles
state_cagr_top10.to_csv('../../data/visuals/top10_total_vehicles_cagr_fiscal_year.csv', index=False)

# Export top 10 EV CAGR 
top10_ev_cagr.to_csv('../../data/visuals/top10_ev_cagr_fiscal_year.csv', index=False)

# Export top 10 non-EV CAGR
top10_non_ev.to_csv('../../data/visuals/top10_non_ev_cagr_fiscal_year.csv', index=False)

# Export comparison data
comparison.to_csv('../../data/visuals/ev_total_cagr_comparison_fiscal_year.csv', index=False)

print("Data export complete.")

Exporting data files for dashboard use...
Data export complete.


## Summary of CAGR Analysis (Fiscal Years 2022-2024)

### Key Findings

1. **Total Vehicle Market Leaders**:
   - **Meghalaya**: 28.47% CAGR
   - **Goa**: 27.41% CAGR
   - **Karnataka**: 25.28% CAGR

2. **EV Growth Champions**:
   - **Meghalaya**: 476.63% CAGR
   - **Tripura**: 229.50% CAGR
   - **Nagaland**: 200.00% CAGR

3. **Market Dynamics**:
   - Weak correlation (r = 0.204, R² ≈ 0.04) between total vehicle growth and EV growth
   - Northeastern states rank highly on both metrics, though their EV growth starts from very small bases
   - EV adoption appears driven by specific local factors rather than overall market health

### Strategic Implications

1. **Targeted Investments**:
   - Focus EV charging infrastructure in high-EV-CAGR states
   - Expand dealer networks in high-total-CAGR states
   - Develop hybrid strategies for states strong in both metrics

2. **Policy Recommendations**:
   - Study successful policy frameworks in northeastern states
   - Implement region-specific incentives based on quadrant analysis
   - Support EV infrastructure in emerging markets showing early adoption signals

3. **Business Opportunities**:
   - Prioritize dealership expansion in top CAGR states
   - Develop marketing campaigns highlighting regional EV success stories
   - Create tailored financing options for different market segments

Because this analysis uses fiscal years (April to March) instead of calendar years, its growth figures align with India's financial and policy planning cycles, which makes them directly usable by automotive stakeholders and policymakers.