# Electric Vehicle Range Prediction
## End-to-End Machine Learning Workflow
This notebook estimates the driving range of electric vehicles with a random forest regression model. It cleans the data, engineers features, trains and evaluates the model, and exports the predictions to a CSV file for visualization in Power BI.

### 📂 Required Input File: 'cleanedprojectdataset.xlsx'
Place this file in the same directory as this notebook before running. The load cell reads it by exactly this name.

In [52]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.preprocessing import LabelEncoder


In [53]:

# Load the dataset
df = pd.read_excel('cleanedprojectdataset.xlsx')

# Display initial data
df.head()


Unnamed: 0,Vehicle Identification Numbers,Country,City,Postal Code,Model Year,Manufacturer,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Vehicle Range(Miles),Base Price ($),Legislative District,Utility Provider
0,1N4BZ0CP5G,King,Seattle,98125,2016,NISSAN,LEAF,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,84,56925,46,SEATTLE CITY LIGHT
1,KNDJX3AEXG,King,Renton,98058,2016,KIA,SOUL,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,93,31950,11,TACOMA POWER
2,5YJ3E1EB2J,King,Seattle,98115,2018,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,215,56925,43,SEATTLE CITY LIGHT
3,1C4RJXN64R,Kitsap,Bremerton,98312,2024,JEEP,WRANGLER,Plug-in Hybrid Electric Vehicle (PHEV),Not eligible due to low battery range,21,56925,26,PUDGET SOUND ENERGY
4,5YJ3E1EB1J,Thurston,Olympia,98512,2018,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,215,56925,35,PUDGET SOUND ENERGY


In [54]:
# Rename columns for easier access
df.rename(columns={
    'Manufacturer': 'Make',
    'Electric Vehicle Range(Miles)': 'Electric Range',
    'Base Price ($)': 'Base MSRP',
    # NOTE: 'Country' holds county names (King, Kitsap, Thurston), not states.
    # The 'State' label is kept because later cells reference it.
    'Country': 'State'
}, inplace=True)

# Create a 'location' column by combining City and State
df['location'] = df['City'].astype(str) + ', ' + df['State'].astype(str)

# Select relevant columns
df = df[['Model Year', 'Make', 'Model', 'Electric Vehicle Type',
         'Electric Range', 'Base MSRP', 'City', 'State', 'location']]

# Drop rows with missing key features
df = df.dropna(subset=['Electric Range', 'Base MSRP'])


In [47]:
# Placeholder for geocoding: returns NaN coordinates for every row.
# The raw 'Vehicle Location' column is not among the columns kept above,
# so this cell works from the combined 'location' string instead.
def extract_lat_lon(point):
    return pd.Series([np.nan, np.nan])  # Replace with real geocoding

# Assign placeholder latitude and longitude (all NaN until real geocoding is added)
df[['Latitude', 'Longitude']] = df['location'].apply(extract_lat_lon)

# Drop 'Vehicle Location' only if it exists
if 'Vehicle Location' in df.columns:
    df = df.drop(columns=['Vehicle Location'])
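
The placeholder above returns NaN for every row. If the raw data carried a 'Vehicle Location' column with WKT-style point strings, the coordinates could be parsed along the lines below. This is a minimal sketch: the `POINT (lon lat)` format, the `parse_point` helper, and the sample coordinates are assumptions for illustration and are not taken from the dataset shown here.

```python
import re

# Assumed input format: "POINT (<longitude> <latitude>)".
# This is an assumption about the raw data, not something shown in this notebook.
_POINT_RE = re.compile(
    r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)"
)

def parse_point(value):
    """Return (latitude, longitude) from a WKT point string, or (nan, nan)."""
    match = _POINT_RE.search(str(value))
    if match is None:
        return (float("nan"), float("nan"))
    lon, lat = float(match.group(1)), float(match.group(2))
    return (lat, lon)

# Hypothetical sample values
print(parse_point("POINT (-122.30839 47.610365)"))
print(parse_point("Seattle, King"))  # not a point string, so NaN is returned
```

With pandas, the helper could be applied to a column through `Series.apply` and the resulting tuples split into 'Latitude' and 'Longitude' columns.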

In [55]:
# Create vehicle age feature relative to a fixed reference year (2025)
df['Vehicle Age'] = 2025 - df['Model Year']

# Initialize label encoders
le_type = LabelEncoder()
le_make = LabelEncoder()
le_model = LabelEncoder()

# Encode categorical variables
df['EV_Type_Encoded'] = le_type.fit_transform(df['Electric Vehicle Type'].astype(str))
df['Make_Encoded']    = le_make.fit_transform(df['Make'].astype(str))
df['Model_Encoded']   = le_model.fit_transform(df['Model'].astype(str))

# Drop non-numeric columns that are no longer needed for modeling
df.drop(columns=['Electric Vehicle Type', 'Make', 'Model', 'City', 'State'], inplace=True)
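
scikit-learn documents `LabelEncoder` as an encoder for target values, and fitting it on the full frame means the categories are learned before the train/test split. One possible alternative, sketched here under assumptions, is `OrdinalEncoder`, which encodes several feature columns at once and can map categories unseen at fit time to a sentinel value. The small frames below are illustrative, built from the makes and models in the preview rows, and the -1 sentinel is an arbitrary choice.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Illustrative stand-ins for a training and a test split
train = pd.DataFrame({
    "Make": ["NISSAN", "KIA", "TESLA"],
    "Model": ["LEAF", "SOUL", "MODEL 3"],
})
test = pd.DataFrame({
    "Make": ["TESLA", "JEEP"],          # JEEP is not in the training data
    "Model": ["MODEL 3", "WRANGLER"],   # neither is WRANGLER
})

# Unseen categories are encoded as -1 instead of raising an error
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_codes = encoder.fit_transform(train)  # learn categories from train only
test_codes = encoder.transform(test)        # reuse the learned mapping

print(encoder.categories_)
print(test_codes)
```

Fitting on the training split only keeps information about the test rows out of the encoding step.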


In [49]:
X = df.drop(columns='Electric Range')
y = df['Electric Range']

# One-hot encode any remaining non-numeric columns (here, the 'location' string)
X_encoded = pd.get_dummies(X, drop_first=True)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.2, random_state=42
)

# Train Random Forest Regressor
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))

MAE: 3.9088458333333334
R² Score: 0.9768668910620069
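
To make the two reported metrics concrete, the sketch below computes them by hand with NumPy: MAE is the mean of the absolute errors, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The actual values are the ranges from the first four preview rows. The predictions are made up for illustration and are not output from the model above.

```python
import numpy as np

# Actual ranges from the first four preview rows (miles)
y_true = np.array([84.0, 93.0, 215.0, 21.0])
# Hypothetical predictions, invented for this example
y_hat = np.array([80.0, 95.0, 210.0, 25.0])

# Mean absolute error: average size of the miss, in miles
mae = np.mean(np.abs(y_true - y_hat))

# R²: share of the variance in y_true explained by the predictions
ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot

print("MAE:", mae)
print("R²:", r2)
```

These are the same definitions used by `mean_absolute_error` and `r2_score`, so an MAE near 3.9 means the predictions miss by about 3.9 miles on average.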


In [50]:

# Save predictions for Power BI
df_results = X_test.copy()
df_results['Actual Range'] = y_test
df_results['Predicted Range'] = y_pred

output_csv = "EV_Range_Predictions_For_PowerBI.csv"
df_results.to_csv(output_csv, index=False)
print(f"Results saved to {output_csv}")


Results saved to EV_Range_Predictions_For_PowerBI.csv
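
The exported file contains integer codes (for example `Make_Encoded`) rather than readable names, which can be awkward in a Power BI report. A fitted `LabelEncoder` can map the codes back with `inverse_transform`. The sketch below uses a small stand-in frame. The predicted values are hypothetical, and in the notebook the already-fitted `le_make` would be reused instead of refitting.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in for the notebook's fitted encoder
le_make = LabelEncoder()
makes = pd.Series(["NISSAN", "KIA", "TESLA", "JEEP", "TESLA"])
codes = le_make.fit_transform(makes)

# Stand-in for a few exported rows; the predictions are invented
results = pd.DataFrame({
    "Make_Encoded": codes[:3],
    "Predicted Range": [86.0, 90.5, 212.0],
})

# Map the integer codes back to manufacturer names
results["Make"] = le_make.inverse_transform(results["Make_Encoded"])
print(results)
```

The same pattern would apply to `le_model` and `le_type` before calling `to_csv`.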
