<a href="https://colab.research.google.com/github/nithin-grk/Energy-Consumption-Forecasting-Using-Machine-Learning/blob/main/Energy_Consumption_Forecasting_Using_Machine_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Energy Consumption Forecasting Using Machine Learning

In recent years, the demand for efficient energy management has grown significantly, driven by increasing energy costs and the global push for sustainability. This project, titled "Energy Consumption Forecasting Using Machine Learning", aims to predict household energy usage based on historical time-series data.

Using the UCI Individual Household Electric Power Consumption dataset, this offline project involves collecting and preprocessing real-world electricity usage data, extracting time-based features, and applying regression models such as a Random Forest Regressor to forecast short-term energy consumption.

This project emphasizes:
*   Time-Series Data Handling
*   Feature Engineering
*   Model Evaluation and Visualization
*   Offline Workflow using core libraries: NumPy, Pandas, Scikit-Learn and Matplotlib

By building a predictive model, this project shows how machine learning can support energy efficiency, smart grid technologies, and more informed energy usage behavior.





# Step 1: Load and Inspect the Dataset

The dataset is available from the UCI Machine Learning Repository. In this notebook, a copy saved in Google Drive is loaded after mounting the drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import pandas as pd

In [None]:
power_data = "./drive/MyDrive/Colab Notebooks/Energy Consumption/sample_power_data.txt"

In [None]:
power_data_df = pd.read_csv(power_data, sep = ";")
power_data_df

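|   | Date | Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
|---|------|------|---------------------|-----------------------|---------|------------------|----------------|----------------|----------------|
| 0 | 16/12/2006 | 17:24:00 | 4.216 | 0.418 | 234.84 | 18.4 | 0.0 | 1.0 | 17.0 |
| 1 | 16/12/2006 | 17:25:00 | 5.36 | 0.436 | 233.63 | 23.0 | 0.0 | 1.0 | 16.0 |
| 2 | 16/12/2006 | 17:26:00 | 5.374 | 0.498 | 233.29 | 23.0 | 0.0 | 2.0 | 17.0 |
| 3 | 16/12/2006 | 17:27:00 | 5.388 | 0.502 | 233.74 | 23.0 | 0.0 | 1.0 | 17.0 |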
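The full UCI file marks missing readings with `?`, which makes pandas read the affected columns as strings (`object`). The sample loaded here contains no such rows, so its measurement columns already arrive as `float64`. A minimal sketch of handling this at read time, assuming the same semicolon-separated layout: `na_values` turns `?` into `NaN` while parsing. The first two data lines reuse values from the sample above; the `?` row with its 17:28:00 timestamp is hypothetical.

```python
import io

import pandas as pd

# Same layout as the raw file; the last line is a hypothetical missing reading.
raw_text = """Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3
16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000
16/12/2006;17:25:00;5.360;0.436;233.630;23.000;0.000;1.000;16.000
16/12/2006;17:28:00;?;?;?;?;?;?;
"""

# na_values makes '?' parse as NaN, so the measurement columns stay numeric.
df = pd.read_csv(io.StringIO(raw_text), sep=";", na_values=["?"])
print(df.dtypes)
```

For the full file (about two million rows), passing `low_memory=False` to `pd.read_csv` as well avoids mixed-type warnings from chunked parsing.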


In [None]:
power_data_df.shape

(4, 9)

In [None]:
power_data_df.dtypes

| Column | dtype |
|--------|-------|
| Date | object |
| Time | object |
| Global_active_power | float64 |
| Global_reactive_power | float64 |
| Voltage | float64 |
| Global_intensity | float64 |
| Sub_metering_1 | float64 |
| Sub_metering_2 | float64 |
| Sub_metering_3 | float64 |


# Step 2: Data Cleaning & Preprocessing

Our goals in this step:

*   Combine Date and Time into a single datetime column
*   Convert numeric columns from strings to floats
*   Handle missing or invalid values
*   Filter/select a manageable time range if needed (for testing)
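The last goal, working on a smaller time window, is not shown in the cells below. A minimal sketch using a boolean mask on a `datetime` column, with a hypothetical minute-level frame standing in for `power_data_df` (the power values are placeholders, not measurements):

```python
import pandas as pd

# Hypothetical stand-in for power_data_df: one reading per minute for two days.
frame = pd.DataFrame({
    "datetime": pd.date_range("2006-12-16 17:24:00", periods=2 * 24 * 60, freq="min"),
})
frame["Global_active_power"] = 1.0  # placeholder values

# Keep a half-open window [start, end): here, the full day of 17 December 2006.
start = pd.Timestamp("2006-12-17 00:00:00")
end = pd.Timestamp("2006-12-18 00:00:00")
mask = (frame["datetime"] >= start) & (frame["datetime"] < end)
subset = frame.loc[mask].reset_index(drop=True)

print(len(subset))  # 1440
```

Once the combined `datetime` column exists, the same mask can be applied to `power_data_df`.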





In [None]:
import numpy as np

# Combine 'Date' and 'Time' into a single datetime column
power_data_df['datetime'] = pd.to_datetime(power_data_df['Date'] + ' ' + power_data_df['Time'], format='%d/%m/%Y %H:%M:%S')

# Drop the original Date and Time columns
power_data_df.drop(['Date', 'Time'], axis=1, inplace=True)

In [None]:
# Convert columns to numeric; errors='coerce' turns invalid entries such as '?' into NaN
for col in power_data_df.columns:
    if col != 'datetime':
        power_data_df[col] = pd.to_numeric(power_data_df[col], errors='coerce')
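Because `errors='coerce'` is passed, any entry that cannot be parsed as a number, such as the `?` placeholder in the raw UCI file, becomes `NaN` instead of stopping the conversion (the default, `errors='raise'`, raises a `ValueError`). A small illustration on a hypothetical Series of strings:

```python
import pandas as pd

# Raw values as strings, including the '?' placeholder for a missing reading.
raw = pd.Series(["4.216", "5.360", "?", "5.388"])

# '?' cannot be parsed as a number, so it is coerced to NaN.
numeric = pd.to_numeric(raw, errors="coerce")
print(numeric.tolist())  # [4.216, 5.36, nan, 5.388]
print(numeric.isna().sum())  # 1
```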

In [None]:
# Drop rows with missing values (if any)
power_data_df.dropna(inplace=True)
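Dropping rows is the simplest option. Before doing so it can help to count what will be lost, and for a time series, interpolating between neighbouring readings is a common alternative that keeps every timestamp. The sketch below, on a hypothetical five-row frame with placeholder values, shows both; interpolation is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing reading at 17:26 (placeholder values).
frame = pd.DataFrame({
    "datetime": pd.date_range("2006-12-16 17:24:00", periods=5, freq="min"),
    "Global_active_power": [4.0, 5.0, np.nan, 6.0, 3.0],
})

# Count missing values per column before deciding how to handle them.
print(frame.isna().sum())

# Option 1 (what the notebook does): drop incomplete rows.
dropped = frame.dropna().reset_index(drop=True)

# Option 2 (alternative): interpolate linearly in time between neighbours.
# method="time" requires a DatetimeIndex, hence set_index first.
interpolated = frame.set_index("datetime").interpolate(method="time").reset_index()

print(len(frame) - len(dropped))  # 1
print(interpolated.loc[2, "Global_active_power"])  # 5.5
```

Interpolation is only reasonable across short gaps; long outages are usually better dropped than filled with invented values.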

In [None]:
# Reset index after cleaning
power_data_df.reset_index(drop=True, inplace=True)

In [None]:
# Preview cleaned data
print(power_data_df.dtypes)
power_data_df

Global_active_power             float64
Global_reactive_power           float64
Voltage                         float64
Global_intensity                float64
Sub_metering_1                  float64
Sub_metering_2                  float64
Sub_metering_3                  float64
datetime                 datetime64[ns]
dtype: object


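|   | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 | datetime |
|---|---------------------|-----------------------|---------|------------------|----------------|----------------|----------------|----------|
| 0 | 4.216 | 0.418 | 234.84 | 18.4 | 0.0 | 1.0 | 17.0 | 2006-12-16 17:24:00 |
| 1 | 5.36 | 0.436 | 233.63 | 23.0 | 0.0 | 1.0 | 16.0 | 2006-12-16 17:25:00 |
| 2 | 5.374 | 0.498 | 233.29 | 23.0 | 0.0 | 2.0 | 17.0 | 2006-12-16 17:26:00 |
| 3 | 5.388 | 0.502 | 233.74 | 23.0 | 0.0 | 1.0 | 17.0 | 2006-12-16 17:27:00 |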
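One side effect of `dropna()` is that the remaining readings are no longer guaranteed to be one minute apart, which matters for a "last 3 readings" rolling average. A quick check, sketched here on hypothetical timestamps where the 17:26 reading has been dropped, is to look at the difference between consecutive timestamps:

```python
import pandas as pd

# Hypothetical timestamps after cleaning: the 17:26 reading is missing.
times = pd.Series(pd.to_datetime([
    "2006-12-16 17:24:00",
    "2006-12-16 17:25:00",
    "2006-12-16 17:27:00",
]))

# Time since the previous reading; anything above one minute is a gap.
steps = times.diff()
gaps = steps[steps > pd.Timedelta(minutes=1)]
print(len(gaps))  # 1
```

In the notebook the same check would start from `power_data_df['datetime'].diff()`.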


# Step 3: Feature Engineering

In this step, we will:

*   Extract time-based features from the datetime column (e.g., hour, day of the week)
*   Create rolling averages to capture short-term patterns
*   Filter or resample the data to an hourly or daily level for better modeling (optional)
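The optional resampling step can be sketched with `resample`, which needs a `DatetimeIndex` (so `datetime` would first be set as the index). The example below averages two hours of hypothetical minute-level values into hourly bins; `"60min"` is used as the frequency string because it is accepted across pandas versions, and `"D"` would give daily bins instead.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level readings covering two hours (placeholder values 0..119).
index = pd.date_range("2006-12-16 17:00:00", periods=120, freq="min")
minute_power = pd.Series(np.arange(120, dtype=float), index=index, name="Global_active_power")

# Average the one-minute readings inside each clock hour.
hourly_power = minute_power.resample("60min").mean()
print(hourly_power)
```

Averaging suits `Global_active_power`, which the UCI description gives as minute-averaged power in kilowatts; the `Sub_metering_*` columns are energy in watt-hours, so summing them per hour may be more meaningful. Resampling before extracting `hour` and `day_of_week` avoids averaging those feature columns.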






# Basic Time Features

In [None]:
# Extract hour, day of week, and month from datetime
power_data_df['hour'] = power_data_df['datetime'].dt.hour
power_data_df['day_of_week'] = power_data_df['datetime'].dt.dayofweek  # Monday=0, Sunday=6
power_data_df['month'] = power_data_df['datetime'].dt.month
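As a quick check of the `.dt` accessors, and as an optional extension not in the original notebook, the sketch below computes the same three features for a few timestamps and adds a sine/cosine encoding of the hour. The encoding places 23:00 next to 00:00, which raw integers do not; tree models such as Random Forest can often cope without it. The first timestamp is the first sample reading; the other two are hypothetical.

```python
import numpy as np
import pandas as pd

# First timestamp from the sample data; the other two are hypothetical.
stamps = pd.Series(pd.to_datetime([
    "2006-12-16 17:24:00",
    "2006-12-16 23:00:00",
    "2006-12-17 00:00:00",
]))

features = pd.DataFrame({
    "hour": stamps.dt.hour,
    "day_of_week": stamps.dt.dayofweek,  # Monday=0, Sunday=6
    "month": stamps.dt.month,
})

# Cyclical encoding: map the 24 hours onto a unit circle.
angle = 2 * np.pi * features["hour"] / 24
features["hour_sin"] = np.sin(angle)
features["hour_cos"] = np.cos(angle)

print(features[["hour", "day_of_week", "month"]].values.tolist())
# [[17, 5, 12], [23, 5, 12], [0, 6, 12]]
```

The same idea applies to `day_of_week` with a period of 7 instead of 24.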

# Add a Rolling Average Feature (e.g., Last 3 Readings)

In [None]:
# Rolling average of Global_active_power over the last 3 readings (the current reading included; the first 2 rows are NaN)
power_data_df['rolling_avg_power'] = power_data_df['Global_active_power'].rolling(window=3).mean()
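`rolling(window=3)` includes the current reading in its own average. If `Global_active_power` is later used as the prediction target, that feature would already contain part of the answer. A minimal sketch of a lagged version, assuming that target, shifts by one row first so each row only sees the three readings before it (the values are the four sample readings shown earlier):

```python
import pandas as pd

frame = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2006-12-16 17:24:00", "2006-12-16 17:25:00",
        "2006-12-16 17:26:00", "2006-12-16 17:27:00",
    ]),
    "Global_active_power": [4.216, 5.360, 5.374, 5.388],
})

# shift(1) moves every value down one row, so the current reading is
# excluded from its own feature; rolling(3) then averages the 3 prior readings.
frame["rolling_avg_power_lag"] = (
    frame["Global_active_power"].shift(1).rolling(window=3).mean()
)
print(frame["rolling_avg_power_lag"].round(3).tolist())  # [nan, nan, nan, 4.983]
```

The first three rows are NaN because they have fewer than three earlier readings. If gaps left by `dropna()` are a concern, a time-based window such as `frame.rolling("3min", on="datetime", closed="left")["Global_active_power"].mean()` averages by elapsed time rather than by row count, again excluding the current reading.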


# Step 4: Train/Test Split and Model Building