## EV Adoption Forecasting
As electric vehicle (EV) adoption surges, urban planners must anticipate infrastructure demand, especially for charging stations. Inadequate planning can lead to bottlenecks that hurt user satisfaction and hinder sustainability goals.

**Problem Statement:** Using the electric vehicle dataset (monthly counts of electric and non-electric vehicle registrations, broken down by county and vehicle use), create a model to forecast future EV adoption. For example, predict the number of electric vehicles in upcoming years based on the trends in the data.

**Goal:** Build a regression model that forecasts future EV adoption demand based on historical trends in EV growth, types of vehicles, and regional data.

**Dataset:** This dataset shows the number of vehicles registered by the Washington State Department of Licensing (DOL) each month. The data is broken down by county and by vehicle use (passenger vehicles and trucks).

- Date: Registration date (end of each month) from 2017-01-31 to 2024-02-29
- County: Geographic region of the state where the owner resides
- State: State associated with the vehicle record
- Vehicle Primary Use: Main use of the vehicle (Passenger/Truck)
- Battery Electric Vehicles (BEVs): Count of vehicles that run solely on battery
- Plug-In Hybrid Electric Vehicles (PHEVs): Count of hybrid vehicles
- Electric Vehicle (EV) Total: Sum of BEVs and PHEVs
- Non-Electric Vehicle Total: Count of conventional vehicles
- Total Vehicles: All registered vehicles in the county
- Percent Electric Vehicles: Percentage of vehicles that are electric
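
The column descriptions above imply some simple arithmetic relationships that are worth checking before modeling. The sketch below uses a few made-up rows (not values from the dataset) and assumes that Total Vehicles is the sum of the EV and non-EV counts and that the percentage is computed as 100 × EV Total / Total Vehicles; only the "EV Total = BEVs + PHEVs" relationship is stated in the list itself.

```python
import pandas as pd

BEV = "Battery Electric Vehicles (BEVs)"
PHEV = "Plug-In Hybrid Electric Vehicles (PHEVs)"
EV = "Electric Vehicle (EV) Total"
NON_EV = "Non-Electric Vehicle Total"
TOTAL = "Total Vehicles"
PCT = "Percent Electric Vehicles"

# Hypothetical rows for illustration only; these are not taken from the dataset.
toy = pd.DataFrame({
    BEV: [10, 0, 3],
    PHEV: [5, 2, 1],
    NON_EV: [985, 198, 396],
})

# Stated in the column list: EV Total is the sum of BEVs and PHEVs.
toy[EV] = toy[BEV] + toy[PHEV]
# Assumption: Total Vehicles = EV Total + Non-Electric Vehicle Total.
toy[TOTAL] = toy[EV] + toy[NON_EV]
# Assumption: the percentage is 100 * EV Total / Total Vehicles.
toy[PCT] = (100 * toy[EV] / toy[TOTAL]).round(2)

print(toy[[EV, TOTAL, PCT]])
```

Running the same three computations on the real columns and comparing them with the published values is a quick way to spot parsing problems such as thousands separators.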

**Dataset Link:** https://www.kaggle.com/datasets/sahirmaharajj/electric-vehicle-population-size-2024/data

### Import Required Libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

### Load Dataset

In [2]:
# Load data
df = pd.read_csv("Electric_Vehicle_Population_By_County.csv")

### Explore and Understand the Data

In [3]:
display(df.head())
print(df.shape)
df.info()
print("\nMissing values per column:\n", df.isnull().sum())

### Check Outliers in 'Percent Electric Vehicles'

In [4]:
Q1 = df['Percent Electric Vehicles'].quantile(0.25)
Q3 = df['Percent Electric Vehicles'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['Percent Electric Vehicles'] < lower_bound) | (df['Percent Electric Vehicles'] > upper_bound)]
print(f"{outliers.shape[0]} outlier rows in 'Percent Electric Vehicles'.")

### Data Preprocessing: Date, Nulls, Outlier Capping

In [5]:
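# Parse dates; values that cannot be parsed become NaT
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
# Keep rows with a valid date and a non-null target.
# .copy() avoids SettingWithCopyWarning on the assignments below.
df = df[df['Date'].notnull() & df['Electric Vehicle (EV) Total'].notnull()].copy()
# Fill missing categorical values with a placeholder
df['County'] = df['County'].fillna('Unknown')
df['State'] = df['State'].fillna('Unknown')
# Cap percent EV outliers at the IQR bounds computed in the previous cell
df['Percent Electric Vehicles'] = np.where(
    df['Percent Electric Vehicles'] > upper_bound, upper_bound,
    np.where(df['Percent Electric Vehicles'] < lower_bound, lower_bound, df['Percent Electric Vehicles'])
)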
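
The nested `np.where` capping can also be written with `Series.clip`, which some readers may find easier to follow. The sketch below uses a small made-up series (not dataset values) to show that both forms give the same result; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up percentages for illustration; the last value is an obvious outlier.
s = pd.Series([0.1, 0.5, 0.7, 0.9, 25.0])

# IQR bounds, computed the same way as in the notebook
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Nested np.where, as used in the cell above
capped_where = np.where(s > hi, hi, np.where(s < lo, lo, s))

# Equivalent capping with Series.clip
capped_clip = s.clip(lower=lo, upper=hi)

print(capped_clip.tolist())
```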

### Convert Number Columns to Numeric and Feature Engineering

In [6]:
cols_to_numeric = [
    'Battery Electric Vehicles (BEVs)',
    'Plug-In Hybrid Electric Vehicles (PHEVs)',
    'Electric Vehicle (EV) Total',
    'Non-Electric Vehicle Total',
    'Total Vehicles'
]
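for col in cols_to_numeric:
    # Strip thousands separators, convert to numbers, and set unparseable values to 0.
    # Assign with df[col] so the column takes the numeric dtype;
    # df.loc[:, col] = ... can leave the column as object dtype in recent pandas versions.
    df[col] = pd.to_numeric(
        df[col].astype(str).str.replace(',', '', regex=False), errors='coerce').fillna(0)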
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month

### Quick EDA: Distribution and Trend Plots

In [7]:
sns.histplot(df['Percent Electric Vehicles'], bins=40)
plt.title('Distribution: Percent Electric Vehicles')
plt.show()
sample_df = df.sample(n=1000, random_state=42) if len(df) > 1000 else df
sns.lineplot(x='Date', y='Electric Vehicle (EV) Total', data=sample_df)
plt.title('EV Total Over Time (sample)')
plt.tight_layout()
plt.show()

### Encode Categorical Columns

In [8]:
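# Label-encode categorical columns as integer codes (the code order carries no meaning)
for col in ['County', 'State', 'Vehicle Primary Use']:
    le = LabelEncoder()
    # Assign with df[col] so the column takes the integer dtype
    df[col] = le.fit_transform(df[col].astype(str))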

### Prepare Features and Target

In [9]:
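# Note: per the column descriptions, the target equals BEVs + PHEVs, and both
# counts are included as features, so the model can reconstruct the target
# directly. Read the evaluation scores with that in mind.
features = [
    'Year', 'Month', 'County', 'State', 'Vehicle Primary Use',
    'Battery Electric Vehicles (BEVs)',
    'Plug-In Hybrid Electric Vehicles (PHEVs)',
    'Non-Electric Vehicle Total', 'Total Vehicles', 'Percent Electric Vehicles'
]
target = 'Electric Vehicle (EV) Total'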
X = df[features]
y = df[target]
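
Forecasting a future month means that only information available before that month can serve as input. One common way to express this is with lagged features built per group. The sketch below is an illustration on made-up data, not the notebook's method: the `ev_lag1`, `ev_lag2`, and `ev_growth1` names are hypothetical, and on the real dataset the grouping would probably need to include 'Vehicle Primary Use' as well as 'County'.

```python
import pandas as pd

TARGET = "Electric Vehicle (EV) Total"

# Made-up monthly counts for two hypothetical counties
dates = ["2023-01-31", "2023-02-28", "2023-03-31", "2023-04-30"]
toy = pd.DataFrame({
    "County": ["A"] * 4 + ["B"] * 4,
    "Date": pd.to_datetime(dates * 2),
    TARGET: [10, 12, 15, 19, 3, 3, 4, 6],
})

# Sort so that shifting moves backwards in time within each county
toy = toy.sort_values(["County", "Date"]).reset_index(drop=True)
grouped = toy.groupby("County")[TARGET]

# Lagged values: the EV total one and two months earlier (hypothetical feature names)
toy["ev_lag1"] = grouped.shift(1)
toy["ev_lag2"] = grouped.shift(2)
# Month-over-month change as of the previous month
toy["ev_growth1"] = toy["ev_lag1"] - toy["ev_lag2"]

# The first two months of each county have no complete history and are dropped
model_rows = toy.dropna(subset=["ev_lag1", "ev_lag2"])
print(model_rows)
```

Features built this way depend only on past months, so a model trained on them can be applied to a month whose EV counts are not yet known.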

### Train-Test Split

In [10]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
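
`train_test_split` shuffles rows by default, so the test set contains months that fall between training months. For a forecasting task, a chronological split is often a closer match to how the model would be used. The following is a minimal sketch on made-up data; the cutoff date is an arbitrary assumption chosen for the example.

```python
import pandas as pd

# Made-up month-end observations for illustration
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2023-01-31", "2023-02-28", "2023-03-31", "2023-04-30", "2023-05-31",
    ]),
    "y": [1, 2, 3, 4, 5],
})

# Hypothetical cutoff: train on everything before it, test on everything after
cutoff = pd.Timestamp("2023-04-01")
train = toy[toy["Date"] < cutoff]
test = toy[toy["Date"] >= cutoff]

print(len(train), "training rows;", len(test), "test rows")
```

With this kind of split, every test row is later than every training row, so the evaluation measures how well the model extrapolates forward in time.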

### Model Training with RandomForest and Hyperparameter Search (Optimized)

In [11]:
rf = RandomForestRegressor(random_state=42)
param_grid = {
    'n_estimators': [25, 50, 100],
    'max_depth': [5, 10, 20]
}
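# The grid has 3 x 3 = 9 combinations, fewer than the default n_iter=10,
# so every combination is evaluated (scikit-learn emits a warning about this).
search = RandomizedSearchCV(rf, param_grid, cv=3, n_jobs=-1, random_state=42)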
search.fit(X_train, y_train)
best_rf = search.best_estimator_

### Model Evaluation & Visualization

In [12]:
y_pred = best_rf.predict(X_test)
print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f}")
print(f"MSE: {mean_squared_error(y_test, y_pred):.2f}")
print(f"R2:  {r2_score(y_test, y_pred):.3f}")
sns.scatterplot(x=y_test, y=y_pred)
plt.xlabel('Actual EV Total')
plt.ylabel('Predicted EV Total')
plt.title('Actual vs Predicted EV Total (Test set)')
plt.show()
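
MSE is expressed in squared vehicle counts, which makes it hard to compare with MAE. Taking the square root gives RMSE, which is in the same units as the target. The sketch below works through the arithmetic on made-up numbers using NumPy only; the mean of squared errors is the same quantity that `mean_squared_error` reports.

```python
import numpy as np

# Made-up actual and predicted values for illustration
y_true = np.array([10.0, 20.0, 30.0])
y_hat = np.array([12.0, 18.0, 33.0])

# Errors are -2, 2, -3, so the squared errors are 4, 4, 9
mse = float(np.mean((y_true - y_hat) ** 2))   # (4 + 4 + 9) / 3
rmse = float(np.sqrt(mse))                    # same units as the target
mae = float(np.mean(np.abs(y_true - y_hat)))  # (2 + 2 + 3) / 3

print(f"MSE: {mse:.2f}  RMSE: {rmse:.2f}  MAE: {mae:.2f}")
```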

### Conclusion
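- Count columns were converted to numeric, outliers in 'Percent Electric Vehicles' were capped at the IQR bounds, and Year and Month features were derived from Date.
- A randomized search over n_estimators and max_depth with 3-fold cross-validation selects the RandomForest configuration.
- MAE, MSE, and R2 are reported on the held-out test set, and predictions are plotted against actual values to assess performance.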
