------------------------------------------------------------
# **Power Demand Forecasting for Karnataka using LSTM**
-------------------------------------------------------------

A time series analysis of power demand in Karnataka, 2013–2023

Name: Susanna Gladys S

Reg No: 2448057

------------------------------------------------------------

**Overview:**
This notebook details the preprocessing steps for analysing and forecasting power demand for the Karnataka region.

**Dataset**
1. *Daily Power Generation Dataset*

  Original Source: Central Electricity Authority (India)

  Kaggle Repository: [Exact link unknown] — Originally hosted under a dataset titled "India Power Generation Data 2013-2022"

2. *Climate (Temperature) Data*

  Source: NASA POWER API (https://power.larc.nasa.gov/)

  Parameters Extracted: Daily average 2-meter air temperature (T2M)


## **DATA IMPORT + INITIAL PRE-PROCESSING**

Tasks Covered:

1. Identified and filled missing dates and values
2. Extracted time-based features such as month, quarter, and season (Winter, Spring, Summer, Autumn)
3. Filtered the power generation dataset to Karnataka only
4. Defined a grid over Karnataka to sample temperature points
5. Retrieved temperature data using NASA POWER API
6. Merged datasets and aggregated by date


In [None]:
#-------------------------------------------------------------------------------------------------------------
# LOADING PACKAGES AND DATASET 1
#-------------------------------------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Mount Google Drive in Colab

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Loading the dataset

df = pd.read_csv("drive/MyDrive/Colab Notebooks/Neural Networks/Project/Datasets/Daily_Power_Gen_States_march_23.csv")
df.head()

Unnamed: 0,Region,States,Max.Demand Met during the day(MW),Shortage during maximum Demand(MW),Energy Met (MU),date
0,NER,Mizoram,77,1.0,1.2,2015-01-01
1,WR,DD,214,0.0,4.8,2015-01-01
2,WR,Goa,383,0.0,7.3,2015-01-01
3,WR,Maharashtra,14837,57.0,315.0,2015-01-01
4,WR,MP,5740,0.0,109.8,2015-01-01


In [None]:
# Brief information about the dataset

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 126699 entries, 0 to 126698
Data columns (total 6 columns):
 #   Column                              Non-Null Count   Dtype  
---  ------                              --------------   -----  
 0   Region                              126699 non-null  object 
 1   States                              126699 non-null  object 
 2   Max.Demand Met during the day(MW)   126699 non-null  int64  
 3   Shortage during maximum Demand(MW)  126680 non-null  float64
 4   Energy Met (MU)                     126698 non-null  float64
 5   date                                126699 non-null  object 
dtypes: float64(2), int64(1), object(3)
memory usage: 5.8+ MB


In [None]:
# Size of the dataset

df.shape

(126699, 6)

In [None]:
#-------------------------------------------------------------------------------------------------------------
# TASK 1: Checking for null values
#-------------------------------------------------------------------------------------------------------------
df.isnull().sum()

Column,Null count
Region,0
States,0
Max.Demand Met during the day(MW),0
Shortage during maximum Demand(MW),19
Energy Met (MU),1
date,0


In [None]:
# Filling the missing values

df['Shortage during maximum Demand(MW)'] = df['Shortage during maximum Demand(MW)'].fillna(df['Shortage during maximum Demand(MW)'].mean())
df['Energy Met (MU)'] = df['Energy Met (MU)'].fillna(df['Energy Met (MU)'].mean())
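
The cell above fills gaps with the mean over all states, which mixes very different demand scales (Mizoram and Maharashtra in the preview differ by two orders of magnitude). As a possible alternative, the sketch below fills each gap with the mean of the same state. The helper name `fill_with_group_mean` and the toy values are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

def fill_with_group_mean(frame, group_col, value_col):
    """Return a copy where NaNs in value_col are replaced by the mean of their group."""
    out = frame.copy()
    # transform("mean") returns one group mean per row, aligned with the original index
    group_means = out.groupby(group_col)[value_col].transform("mean")
    out[value_col] = out[value_col].fillna(group_means)
    return out

# Toy data (made up): one missing value for Karnataka
toy = pd.DataFrame({
    "States": ["Karnataka", "Karnataka", "Goa", "Goa"],
    "Energy Met (MU)": [160.0, np.nan, 7.0, 9.0],
})
filled = fill_with_group_mean(toy, "States", "Energy Met (MU)")
print(filled)
```

With a global mean the missing Karnataka value would be pulled toward the much smaller Goa figures; the per-state mean keeps it on Karnataka's own scale.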

In [None]:
# Converting date to datetime from object type, then deriving calendar features

df['date'] = pd.to_datetime(df['date'])
df['DayName'] = df['date'].dt.day_name()
df['MonthName'] = df['date'].dt.month_name()
df['Year'] = df['date'].dt.year
df['Quarter'] = df['date'].dt.quarter
df['Month'] = df['date'].dt.month


In [None]:
#-------------------------------------------------------------------------------------------------------------
# TASK 2: Extracting time based feature - Season
#-------------------------------------------------------------------------------------------------------------
df["Season"] = [ "Winter" if i < 3 or i > 11 else "Spring" if 3 <= i < 6 else "Summer" if 6 <= i < 9 else "Autumn" for i in df["Month"]]

df.head()

Unnamed: 0,Region,States,Max.Demand Met during the day(MW),Shortage during maximum Demand(MW),Energy Met (MU),date,DayName,MonthName,Year,Quarter,Month,Season
0,NER,Mizoram,77,1.0,1.2,2015-01-01,Thursday,January,2015,1,1,Winter
1,WR,DD,214,0.0,4.8,2015-01-01,Thursday,January,2015,1,1,Winter
2,WR,Goa,383,0.0,7.3,2015-01-01,Thursday,January,2015,1,1,Winter
3,WR,Maharashtra,14837,57.0,315.0,2015-01-01,Thursday,January,2015,1,1,Winter
4,WR,MP,5740,0.0,109.8,2015-01-01,Thursday,January,2015,1,1,Winter
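
The season rule used above (December to February as Winter, March to May as Spring, June to August as Summer, September to November as Autumn) can also be written as an explicit lookup table, which may be easier to audit or to swap for a different season scheme. This is a minimal sketch; the name `SEASON_BY_MONTH` is introduced here for illustration only.

```python
import pandas as pd

# Explicit month -> season lookup, mirroring the rule in the cell above
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}

months = pd.Series([1, 3, 6, 9, 12], name="Month")
seasons = months.map(SEASON_BY_MONTH)
print(seasons.tolist())

# Cross-check against the original conditional expression for every month
original_rule = [
    "Winter" if i < 3 or i > 11 else "Spring" if 3 <= i < 6 else "Summer" if 6 <= i < 9 else "Autumn"
    for i in range(1, 13)
]
lookup_rule = [SEASON_BY_MONTH[i] for i in range(1, 13)]
```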


In [None]:
#-------------------------------------------------------------------------------------------------------------
# TASK 3: Filter rows where the state is Karnataka
#-------------------------------------------------------------------------------------------------------------
df_karnataka = df[df['States'].str.strip().str.lower() == 'karnataka']

# Preview the Karnataka data
df_karnataka.head()

Unnamed: 0,Region,States,Max.Demand Met during the day(MW),Shortage during maximum Demand(MW),Energy Met (MU),date,DayName,MonthName,Year,Quarter,Month,Season
32,SR,Karnataka,7914,300.0,166.1,2015-01-01,Thursday,January,2015,1,1,Winter
67,SR,Karnataka,9011,0.0,183.0,2016-01-01,Friday,January,2016,1,1,Winter
74,SR,Karnataka,8232,600.0,174.7,2016-01-01,Friday,January,2016,1,1,Winter
138,SR,Karnataka,9800,0.0,197.1,2018-01-01,Monday,January,2018,1,1,Winter
178,SR,Karnataka,10675,0.0,199.3,2019-01-01,Tuesday,January,2019,1,1,Winter


In [None]:
df_karnataka.shape

(3539, 12)
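
The preview above shows two Karnataka rows dated 2016-01-01, so it may be worth auditing the date column before modelling. The sketch below reports dates that occur more than once and calendar days missing between the first and last date. The helper `audit_dates` and the toy dates are assumptions for illustration.

```python
import pandas as pd

def audit_dates(dates):
    """Report repeated dates and calendar days absent from a daily series."""
    dates = pd.to_datetime(pd.Series(dates))
    # Dates that appear more than once, listed once each
    duplicated = dates[dates.duplicated(keep=False)].drop_duplicates().sort_values()
    # Every calendar day between the first and last observation
    full_range = pd.date_range(dates.min(), dates.max(), freq="D")
    missing = full_range.difference(pd.DatetimeIndex(dates.unique()))
    return {"duplicated": duplicated.tolist(), "missing": missing.tolist()}

# Toy dates (made up): one repeated day and one gap
toy_dates = ["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-04"]
report = audit_dates(toy_dates)
print(report)
```

On the notebook's data the call would be `audit_dates(df_karnataka['date'])`.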

In [None]:
#-------------------------------------------------------------------------------------------------------------
# TASK 4: Defining a grid over the Karnataka region to extract temperature data from NASA POWER Data Access Viewer
#-------------------------------------------------------------------------------------------------------------
import numpy as np

# Define bounding box of Karnataka (approx.)
lat_min, lat_max = 11.5, 18.5
lon_min, lon_max = 74.0, 78.5

# Step size for grid (0.5°)
step = 0.5

lats = np.arange(lat_min, lat_max + step, step)
lons = np.arange(lon_min, lon_max + step, step)
points = [(round(lat, 3), round(lon, 3)) for lat in lats for lon in lons]

print(f"Defined {len(points)} grid points over Karnataka.")

Defined 150 grid points over Karnataka.
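
The NumPy documentation notes that `np.arange` with a non-integer step can be inconsistent about including the end point and suggests `np.linspace` for such cases. The sketch below builds the same kind of grid from an explicit point count. The helper `regular_grid` is hypothetical; the second call uses the bounds of the 72-point coordinate list in the next cell (latitude 12.0 to 15.5, longitude 74.5 to 78.5).

```python
import numpy as np

def regular_grid(lat_min, lat_max, lon_min, lon_max, step):
    """Return (lat, lon) tuples on a regular grid that includes both end points."""
    n_lat = int(round((lat_max - lat_min) / step)) + 1
    n_lon = int(round((lon_max - lon_min) / step)) + 1
    lats = np.linspace(lat_min, lat_max, n_lat)
    lons = np.linspace(lon_min, lon_max, n_lon)
    return [(round(float(lat), 3), round(float(lon), 3)) for lat in lats for lon in lons]

full_grid = regular_grid(11.5, 18.5, 74.0, 78.5, 0.5)  # bounding box from the cell above
sub_grid = regular_grid(12.0, 15.5, 74.5, 78.5, 0.5)   # bounds of the 72-point list
print(len(full_grid), len(sub_grid))
```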


In [None]:
#-------------------------------------------------------------------------------------------------------------
# TASK 5: Retrieving Temperature Data from NASA POWER Data Access Viewer
#-------------------------------------------------------------------------------------------------------------


import requests
import pandas as pd
from tqdm import tqdm # For a nice progress bar

# Regular 0.5° sub-grid used for the temperature requests (8 latitudes × 9 longitudes = 72 points)
valid_coords = [
    (12.0, 74.5), (12.0, 75.0), (12.0, 75.5), (12.0, 76.0), (12.0, 76.5), (12.0, 77.0), (12.0, 77.5), (12.0, 78.0), (12.0, 78.5),
    (12.5, 74.5), (12.5, 75.0), (12.5, 75.5), (12.5, 76.0), (12.5, 76.5), (12.5, 77.0), (12.5, 77.5), (12.5, 78.0), (12.5, 78.5),
    (13.0, 74.5), (13.0, 75.0), (13.0, 75.5), (13.0, 76.0), (13.0, 76.5), (13.0, 77.0), (13.0, 77.5), (13.0, 78.0), (13.0, 78.5),
    (13.5, 74.5), (13.5, 75.0), (13.5, 75.5), (13.5, 76.0), (13.5, 76.5), (13.5, 77.0), (13.5, 77.5), (13.5, 78.0), (13.5, 78.5),
    (14.0, 74.5), (14.0, 75.0), (14.0, 75.5), (14.0, 76.0), (14.0, 76.5), (14.0, 77.0), (14.0, 77.5), (14.0, 78.0), (14.0, 78.5),
    (14.5, 74.5), (14.5, 75.0), (14.5, 75.5), (14.5, 76.0), (14.5, 76.5), (14.5, 77.0), (14.5, 77.5), (14.5, 78.0), (14.5, 78.5),
    (15.0, 74.5), (15.0, 75.0), (15.0, 75.5), (15.0, 76.0), (15.0, 76.5), (15.0, 77.0), (15.0, 77.5), (15.0, 78.0), (15.0, 78.5),
    (15.5, 74.5), (15.5, 75.0), (15.5, 75.5), (15.5, 76.0), (15.5, 76.5), (15.5, 77.0), (15.5, 77.5), (15.5, 78.0), (15.5, 78.5)
]

start_date = "2013-01-01"
end_date = "2023-01-31" # End of the requested date range
variable = "T2M"

all_temps_df = pd.DataFrame()

print(f"Fetching temperature data for all {len(valid_coords)} locations in Karnataka...")
for lat, lon in tqdm(valid_coords):
    api_url = (
        f"https://power.larc.nasa.gov/api/temporal/daily/point"
        f"?parameters={variable}&community=AG&longitude={lon}&latitude={lat}"
        f"&start={start_date.replace('-', '')}&end={end_date.replace('-', '')}&format=JSON"
    )

    response = requests.get(api_url, timeout=60)  # timeout so a stalled request cannot hang the loop
    if response.status_code == 200:
        data = response.json()
        temps = data['properties']['parameter'][variable]
        temp_df = pd.DataFrame(temps.items(), columns=['date', f'temp_{lat}_{lon}'])
        temp_df['date'] = pd.to_datetime(temp_df['date'], format='%Y%m%d')

        if all_temps_df.empty:
            all_temps_df = temp_df
        else:
            all_temps_df = pd.merge(all_temps_df, temp_df, on='date')
    else:
        print(f"Failed to fetch data for {lat}, {lon}. Status: {response.status_code}")

# Identify all the individual temperature columns
temp_cols = [col for col in all_temps_df.columns if col.startswith('temp_')]
# Calculate the average temperature for each day
all_temps_df['temp_avg_karnataka'] = all_temps_df[temp_cols].mean(axis=1)

# Keep only the date and the state-wide average; .copy() makes df_temp an independent DataFrame
df_temp = all_temps_df[['date', 'temp_avg_karnataka']].copy()

print("\nSuccessfully created a state-wide average temperature feature:")
print(df_temp.head())

Fetching temperature data for all 72 locations in Karnataka...


100%|██████████| 72/72 [03:50<00:00,  3.20s/it]


Successfully created a state-wide average temperature feature:
        date  temp_avg_karnataka
0 2013-01-01           25.355556
1 2013-01-02           24.577500
2 2013-01-03           24.740139
3 2013-01-04           25.044167
4 2013-01-05           25.203889
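
The loop above averages whatever values the API returns. The sketch below shows one way to parse a response into a dated column, mask missing readings before averaging, and assemble the columns with a single `pd.concat` instead of repeated merges. It rests on two assumptions: that missing values are marked with the sentinel -999 (check the `fill_value` field in the response header), and that the response has the same `properties -> parameter` layout read in the cell above. The helper `parse_power_payload` and both payloads are made up for illustration and are not real API output.

```python
import numpy as np
import pandas as pd

FILL_VALUE = -999.0  # assumed missing-data sentinel; verify against the API response header

def parse_power_payload(payload, variable, column_name):
    """Turn one point response into a date-indexed Series with sentinels masked as NaN."""
    raw = payload["properties"]["parameter"][variable]
    series = pd.Series(raw, dtype="float64", name=column_name)
    series.index = pd.to_datetime(series.index, format="%Y%m%d")
    series.index.name = "date"
    return series.replace(FILL_VALUE, np.nan)

# Made-up payloads for two grid points; the first has one missing day
fake_a = {"properties": {"parameter": {"T2M": {"20130101": 25.0, "20130102": -999.0}}}}
fake_b = {"properties": {"parameter": {"T2M": {"20130101": 27.0, "20130102": 26.0}}}}

columns = [
    parse_power_payload(fake_a, "T2M", "temp_12.0_74.5"),
    parse_power_payload(fake_b, "T2M", "temp_12.0_75.0"),
]
# One concat aligns all points on the date index
wide = pd.concat(columns, axis=1)
# mean(axis=1) skips NaN, so a masked sentinel does not distort the daily average
wide["temp_avg_karnataka"] = wide[[c.name for c in columns]].mean(axis=1)
print(wide)
```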





In [None]:
#-------------------------------------------------------------------------------------------------------------
# TASK 5 (single point): Retrieving Temperature Data for one grid point from NASA POWER Data Access Viewer
#-------------------------------------------------------------------------------------------------------------
import requests
import pandas as pd

lat, lon = 13.0, 77.5
start_date = "2013-01-01"
end_date = "2023-01-31"
variable = "T2M"

url = (
    f"https://power.larc.nasa.gov/api/temporal/daily/point"
    f"?parameters={variable}&community=AG&longitude={lon}&latitude={lat}"
    f"&start={start_date.replace('-', '')}&end={end_date.replace('-', '')}&format=JSON"
)

response = requests.get(url, timeout=60)
response.raise_for_status()  # stop here on an HTTP error instead of parsing an error body
data = response.json()

# Extract temperature data into a separate frame so the power dataset in df is not overwritten
temps = data['properties']['parameter'][variable]
df_point = pd.DataFrame(temps.items(), columns=['date', f'temp_{lat}_{lon}'])

In [None]:
#-------------------------------------------------------------------------------------------------------------
# DATASET 2: TEMPERATURE VARIABLES EXTRACTED FROM NASA POWER DATA ACCESS VIEWER
#-------------------------------------------------------------------------------------------------------------
df_temp.head()

Unnamed: 0,date,temp_avg_karnataka
0,2013-01-01,25.355556
1,2013-01-02,24.5775
2,2013-01-03,24.740139
3,2013-01-04,25.044167
4,2013-01-05,25.203889


In [None]:
df_temp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3683 entries, 0 to 3682
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   date                3683 non-null   datetime64[ns]
 1   temp_avg_karnataka  3683 non-null   float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 57.7 KB


In [None]:
# date is already datetime64[ns] (see df_temp.info() above), so no conversion is needed

# MERGING TWO DATASETS ON DATE
df_merged = pd.merge(df_karnataka, df_temp, on='date', how='inner')
df_merged.head()



Unnamed: 0,Region,States,Max.Demand Met during the day(MW),Shortage during maximum Demand(MW),Energy Met (MU),date,DayName,MonthName,Year,Quarter,Month,Season,temp_avg_karnataka
0,SR,Karnataka,7914,300.0,166.1,2015-01-01,Thursday,January,2015,1,1,Winter,23.311667
1,SR,Karnataka,9011,0.0,183.0,2016-01-01,Friday,January,2016,1,1,Winter,21.605278
2,SR,Karnataka,8232,600.0,174.7,2016-01-01,Friday,January,2016,1,1,Winter,21.605278
3,SR,Karnataka,9800,0.0,197.1,2018-01-01,Monday,January,2018,1,1,Winter,22.286944
4,SR,Karnataka,10675,0.0,199.3,2019-01-01,Tuesday,January,2019,1,1,Winter,20.953056
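
An inner merge silently drops any power rows whose date has no temperature value. The sketch below uses two documented `pd.merge` options to make that visible: `validate="many_to_one"` raises if the temperature table has repeated dates, and `indicator=True` adds a `_merge` column that flags unmatched rows. The toy frames are made up for illustration.

```python
import pandas as pd

# Toy power data (made up): one repeated date and one date outside the temperature range
power = pd.DataFrame({
    "date": pd.to_datetime(["2013-03-31", "2013-04-01", "2013-04-01", "2023-02-01"]),
    "Energy Met (MU)": [158.0, 159.4, 160.0, 200.0],
})
temperature = pd.DataFrame({
    "date": pd.to_datetime(["2013-03-31", "2013-04-01"]),
    "temp_avg_karnataka": [30.17, 29.41],
})

# A left merge keeps every power row; _merge shows which ones found a temperature
checked = pd.merge(
    power, temperature, on="date", how="left",
    validate="many_to_one", indicator=True,
)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched)
```

If `unmatched` is empty on the real data, the inner merge above loses nothing.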


In [None]:
# Sort the DataFrame by date
df_merged = df_merged.sort_values(by='date').reset_index(drop=True)
df_merged.head()

Unnamed: 0,Region,States,Max.Demand Met during the day(MW),Shortage during maximum Demand(MW),Energy Met (MU),date,DayName,MonthName,Year,Quarter,Month,Season,temp_avg_karnataka
0,SR,Karnataka,7191,800.0,158.0,2013-03-31,Sunday,March,2013,1,3,Spring,30.168056
1,SR,Karnataka,7652,800.0,159.4,2013-04-01,Monday,April,2013,2,4,Spring,29.406111
2,SR,Karnataka,7620,1000.0,164.6,2013-04-02,Tuesday,April,2013,2,4,Spring,29.747639
3,SR,Karnataka,7557,1450.0,163.7,2013-04-03,Wednesday,April,2013,2,4,Spring,30.265417
4,SR,Karnataka,7540,1000.0,161.8,2013-04-04,Thursday,April,2013,2,4,Spring,30.285833
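
Sequence models such as LSTMs usually expect one row per time step at a regular interval. The sketch below shows one possible way to get there: average rows that share a date, then reindex to a daily frequency and interpolate the gaps. Averaging duplicates and time-based interpolation are assumptions made for illustration; keeping the last record or forward-filling are equally plausible choices. The toy values are made up.

```python
import pandas as pd

# Toy series (made up): 2016-01-01 appears twice and 2016-01-03 is missing
toy = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-04"]),
    "Energy Met (MU)": [183.0, 174.7, 180.0, 190.0],
})

daily = (
    toy.groupby("date")[["Energy Met (MU)"]]
    .mean()                      # collapse repeated dates to a single row
    .asfreq("D")                 # insert missing calendar days as NaN
    .interpolate(method="time")  # fill gaps linearly in time
)
print(daily)
```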


In [None]:
import os
save_dir = '/content/drive/MyDrive/Colab Notebooks/Neural Networks/Project/Datasets'
save_path = os.path.join(save_dir, 'merged.csv')
df_merged.to_csv(save_path, index=False)
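
CSV files do not store column types, so the `date` column comes back as text unless it is parsed on load. The sketch below round-trips a small frame through a temporary file to show the difference. The temporary path and toy rows are for illustration; in the notebook the file would be the `merged.csv` saved above.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({
    "date": pd.to_datetime(["2013-03-31", "2013-04-01"]),
    "temp_avg_karnataka": [30.168056, 29.406111],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "merged.csv")
    frame.to_csv(path, index=False)
    plain = pd.read_csv(path)                         # date is read back as text
    parsed = pd.read_csv(path, parse_dates=["date"])  # date is read back as datetime64

print(plain["date"].dtype, parsed["date"].dtype)
```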