# Energy Consumption Prediction
This notebook demonstrates a machine learning approach to predict energy consumption using different regression models. We'll go through data exploration, preprocessing, model training, and evaluation.

```python
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
import warnings
warnings.filterwarnings('ignore')
```


## Data Loading and Exploration

```python
# Load the dataset: electricity consumption in India, January to April 2024
df = pd.read_csv('energy_consumption_india_2024_jan_to_apr.csv')
```
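By default `pd.read_csv` reads the four time columns as plain strings, which is what later trips up `StandardScaler`. A minimal sketch of parsing them while loading via the `parse_dates` argument; the two sample rows are hypothetical stand-ins for the real file, and their timestamp format is an assumption:

```python
import io

import pandas as pd

# Hypothetical sample in the same column layout as the real CSV; the
# timestamp format is an assumption, not taken from the actual file.
sample_csv = io.StringIO(
    "Start time UTC,End time UTC,Start time UTC+03:00,End time UTC+03:00,"
    "Electricity consumption in India\n"
    "2024-01-01 00:00:00,2024-01-01 01:00:00,2024-01-01 03:00:00,2024-01-01 04:00:00,9000\n"
    "2024-01-01 01:00:00,2024-01-01 02:00:00,2024-01-01 04:00:00,2024-01-01 05:00:00,8800\n"
)

time_columns = ['Start time UTC', 'End time UTC',
                'Start time UTC+03:00', 'End time UTC+03:00']

# parse_dates converts the listed columns to datetime64 while reading
sample_df = pd.read_csv(sample_csv, parse_dates=time_columns)
print(sample_df.dtypes)
```

With datetime columns in place, calendar features such as `sample_df['Start time UTC'].dt.hour` become available without further string handling.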


## Data Preprocessing

```python
# Check for missing values
print("Missing values:", df.isnull().sum())

# Display basic statistics
df.describe()
```


```text
Missing values: Start time UTC                      0
End time UTC                        0
Start time UTC+03:00                0
End time UTC+03:00                  0
Electricity consumption in India    0
dtype: int64
```


| Statistic | Electricity consumption in India |
|---|---|
| count | 26316.0 |
| mean | 9622.028044 |
| std | 1568.064279 |
| min | 5593.0 |
| 25% | 8459.0 |
| 50% | 9443.0 |
| 75% | 10775.0 |
| max | 15105.0 |
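`isnull()` only catches empty cells. In a time series, a missing interval shows up instead as an unusually large gap between consecutive timestamps. A minimal sketch of such a gap check on a hypothetical hourly series (the hourly spacing and the dropped 03:00 row are assumptions for illustration):

```python
import pandas as pd

# Hypothetical hourly start times with one interval (03:00) missing
start = pd.Series(pd.to_datetime([
    '2024-01-01 00:00', '2024-01-01 01:00', '2024-01-01 02:00',
    '2024-01-01 04:00', '2024-01-01 05:00',
]))

# Time elapsed since the previous row; the first difference is NaT
step = start.diff()
expected = step.mode().iloc[0]  # most common spacing, here one hour

# Rows whose spacing differs from the usual step mark gaps or duplicates
irregular = start[step.notna() & (step != expected)]
print(f"Typical step: {expected}")
print(f"Irregular rows:\n{irregular}")
```

On the real data the same check would run on `pd.to_datetime(df['Start time UTC'])`, assuming that column parses cleanly.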


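## Feature Engineering and Scaling

```python
# The time columns are strings, which StandardScaler cannot handle, so derive
# numeric calendar features from 'Start time UTC' before scaling
start = pd.to_datetime(df['Start time UTC'])
features = pd.DataFrame({'hour': start.dt.hour,
                         'dayofweek': start.dt.dayofweek,
                         'dayofyear': start.dt.dayofyear})
y = df['Electricity consumption in India']

# Feature scaling using StandardScaler
scaler = StandardScaler()
X = pd.DataFrame(scaler.fit_transform(features), columns=features.columns)
```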
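`StandardScaler` transforms each column as `z = (x - mean) / std`, using the column's mean and its population standard deviation (`ddof=0`). A small check on made-up numbers that reproduces the transform by hand:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: two columns on very different scales
features = np.array([[0.0, 100.0],
                     [6.0, 200.0],
                     [12.0, 300.0],
                     [18.0, 400.0]])

scaled = StandardScaler().fit_transform(features)

# Manual version: subtract the column mean, divide by the population std (ddof=0)
manual = (features - features.mean(axis=0)) / features.std(axis=0)

print(np.allclose(scaled, manual))  # True
print(scaled.mean(axis=0), scaled.std(axis=0))  # means close to 0, stds close to 1
```

Two side notes. Fitting the scaler on the training rows only (for example inside a scikit-learn `Pipeline`) keeps test-set statistics out of training. And the three models here are essentially unaffected by this rescaling: tree splits move along with the data, and ordinary least squares with an intercept only rescales its coefficients.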

## Train-Test Split

```python
# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
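With the default `shuffle=True`, the test rows are scattered across January to April, so the models are scored on time points that sit between training points. To measure forecasting ability instead, the split can preserve time order. A minimal sketch using `shuffle=False` on a hypothetical, already time-sorted series:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical time-ordered data: 10 hourly rows with one numeric feature
index = pd.date_range('2024-01-01', periods=10, freq='60min')
X = pd.DataFrame({'hour': index.hour}, index=index)
y = pd.Series(np.arange(10, dtype=float), index=index)

# shuffle=False keeps the original order: the first 80% trains, the last 20% tests
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)

print(X_train.index.max(), '<', X_test.index.min())
```

On the real data this assumes `df` is sorted by `Start time UTC` before `X` and `y` are built.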


## Model Training and Evaluation

```python
# Define a function to fit a model and report its test-set metrics
def evaluate_model(model):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"Model: {model.__class__.__name__}")
    print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f}")
    print(f"MSE: {mean_squared_error(y_test, y_pred):.2f}")
    print(f"R2 Score: {r2_score(y_test, y_pred):.2f}\n")

# Linear Regression
evaluate_model(LinearRegression())

# Decision Tree (fixed random_state for reproducible results)
evaluate_model(DecisionTreeRegressor(random_state=42))

# Random Forest
evaluate_model(RandomForestRegressor(random_state=42))
```
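The three metrics printed by `evaluate_model` follow standard definitions: MAE is the mean of `|y - y_pred|`, MSE is the mean of `(y - y_pred)**2`, and R2 is `1 - SS_res / SS_tot`, where `SS_tot` is the squared deviation of `y` from its mean. A small check on made-up values that the hand-computed numbers match scikit-learn's:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted consumption values
y_true = np.array([9000.0, 9500.0, 10000.0, 10500.0])
y_pred = np.array([9100.0, 9400.0, 10200.0, 10300.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)  # back in the target's own units
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE: {mae:.2f}, MSE: {mse:.2f}, RMSE: {rmse:.2f}, R2: {r2:.4f}")
# MAE: 150.00, MSE: 25000.00, RMSE: 158.11, R2: 0.9200
```

Because MSE is in squared units of the target, RMSE is often easier to compare with the MAE and with the summary statistics above.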


## Visualization of Predictions

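```python
# Plotting actual vs predicted values using the Random Forest model
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.7, color='blue')
# Diagonal reference line: points on it are perfect predictions
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, color='red', linestyle='--')
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted Energy Consumption')
plt.grid(True)
plt.show()
```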
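The scatter plot shows overall agreement but not when in the day the errors concentrate. A minimal sketch that averages absolute errors by hour of day; the `results` table and its values are hypothetical and stand in for test timestamps joined with `y_test` and `y_pred`:

```python
import pandas as pd

# Hypothetical test-set results: timestamp, actual and predicted consumption
results = pd.DataFrame({
    'start': pd.to_datetime(['2024-01-01 00:00', '2024-01-01 12:00',
                             '2024-01-02 00:00', '2024-01-02 12:00']),
    'actual': [8000.0, 11000.0, 8200.0, 11400.0],
    'predicted': [8100.0, 10500.0, 8100.0, 10900.0],
})

# Absolute error per row, then averaged within each hour of the day
results['abs_error'] = (results['actual'] - results['predicted']).abs()
error_by_hour = results.groupby(results['start'].dt.hour)['abs_error'].mean()
print(error_by_hour)
```

`error_by_hour.plot(kind='bar')` would chart the same numbers.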


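## Conclusion
In this notebook, we explored electricity consumption data for India from January to April 2024 and trained several regression models to predict consumption. We applied data preprocessing and feature scaling, then evaluated Linear Regression, Decision Tree, and Random Forest models. The Random Forest model performed the best based on R2 score. Because the train-test split is random rather than chronological, these scores reflect how well the models fill in time points within the observed period, not how well they forecast future consumption. Further model tuning and feature engineering could improve the results.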
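For the model tuning mentioned above, one option for time-series data is a hyperparameter search with `TimeSeriesSplit`, so each validation fold lies after its training fold. A minimal sketch, not the notebook's method, on synthetic data; the sine-shaped daily cycle, the grid values, and the fold count are all assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic time-ordered data: 20 days of hourly rows with a daily cycle plus noise
rng = np.random.default_rng(0)
hours = np.arange(24 * 20) % 24
X = hours.reshape(-1, 1).astype(float)
y = 9600 + 1500 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 100, hours.size)

# TimeSeriesSplit always validates on rows that come after the training rows
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={'n_estimators': [20, 50], 'max_depth': [5, None]},
    cv=TimeSeriesSplit(n_splits=3),
    scoring='r2',
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

A larger grid (for example `min_samples_leaf`) costs one extra fit per combination and fold, so keeping the grid small is a reasonable starting point.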