# 🌞 Solar Energy Data Definition
 
### Objective:
This notebook defines and prepares a clean dataset for solar power system analysis.  
It imports raw generation and consumption data, interpolates missing values, and creates a structured DataFrame for further analysis or modeling.

---

## 🧭 Table of Contents
1. [Introduction](#introduction)
2. [Data Import](#data-import)
3. [Data Cleaning & Preparation](#data-cleaning--preparation)
4. [Interpolation](#interpolation)
5. [Data Transformation](#data-transformation)
6. [Final Dataset](#final-dataset)
7. [Conclusion](#conclusion)

---

## 🔍 Introduction
Solar energy data often contains missing or unevenly spaced timestamps due to weather fluctuations or measurement intervals.  
This notebook demonstrates how to:
- Load raw solar generation and consumption data.
- Perform interpolation to fill gaps.
- Resample the data onto a uniform one-minute timeline and format it for analysis.


In [1]:
# 📦 Importing Required Libraries
# numpy/pandas for data manipulation, seaborn/matplotlib for visualization,
# and SciPy for interpolation (interp1d) and numerical integration (cumulative_trapezoid).

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.integrate import cumulative_trapezoid

In [2]:
# 📂 Loading Raw CSV Data
# The raw datasets include:
# - raw_generation.csv : Solar generation data
# - raw_consumption.csv : Power consumption data
#extracting the data from csv files

generation = pd.read_csv('raw_generation.csv')
consumption = pd.read_csv('raw_consumption.csv')
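Before interpolating, it can help to validate the raw series. This is a minimal sketch, assuming each file has exactly two columns (as the renaming step expects). `interp1d` sorts its input by default, but spline-based kinds such as `'cubic'` reject duplicate x values, so the hypothetical helper `load_series` sorts by timestamp and keeps only the first row for each repeated timestamp. Dropping duplicates is an assumption made here, not a step the notebook performs.

```python
import io

import pandas as pd


def load_series(source, value_name):
    """Hypothetical helper: load a (timestamp, value) CSV and prepare it for interp1d.

    Assumes exactly two columns. Rows with missing values are dropped, rows are
    sorted by timestamp (stable sort), and duplicate timestamps keep their first row.
    """
    frame = pd.read_csv(source)
    frame.columns = ["timestamp", value_name]
    frame = frame.dropna()
    frame = frame.sort_values("timestamp", kind="mergesort")
    frame = frame.drop_duplicates(subset="timestamp", keep="first")
    return frame.reset_index(drop=True)


# Small in-memory CSV standing in for raw_generation.csv (unsorted, one duplicate)
sample = io.StringIO(
    "t,g\n6.341357,190.47224\n6.078775,0.207411\n6.078775,0.3\n6.210066,174.328767\n"
)
gen = load_series(sample, "generation")
print(gen)
```

In the notebook this would replace the `read_csv` plus renaming steps, e.g. `generation = load_series('raw_generation.csv', 'generation')`.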

In [3]:
# 🏷️ Renaming Columns
# Rename the columns for readability and consistency.
#changing column names for readability

generation.columns = ['timestamp', 'generation']
consumption.columns = ['timestamp', 'consumption']

In [4]:
generation.head()

|   | timestamp | generation |
|---|-----------|------------|
| 0 | 6.078775 | 0.207411 |
| 1 | 6.210066 | 174.328767 |
| 2 | 6.341357 | 190.47224 |
| 3 | 6.551422 | 570.172255 |
| 4 | 6.682713 | 649.506881 |
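The first five generation timestamps already show the uneven spacing mentioned in the introduction. A quick check with `np.diff`, using only the values printed by `generation.head()`, finds three intervals of about 0.131 and one of about 0.210:

```python
import numpy as np

# First five generation timestamps, copied from the generation.head() output
ts = np.array([6.078775, 6.210066, 6.341357, 6.551422, 6.682713])

# Differences between consecutive timestamps
spacing = np.diff(ts)
is_uniform = np.allclose(spacing, spacing[0])
print(np.round(spacing, 6), is_uniform)
```

On the full data the same check is `np.diff(generation['timestamp'].to_numpy())`.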


In [5]:
consumption.head()

|   | timestamp | consumption |
|---|-----------|-------------|
| 0 | 10.806462 | 404.775845 |
| 1 | 18.510524 | 99.970064 |
| 2 | 29.223498 | 362.860361 |
| 3 | 47.668235 | 362.995014 |
| 4 | 51.392153 | 16.110242 |


In [6]:
# ⚙️ Interpolation Functions
# Cubic interpolation gives smoother estimates between the raw data points.
# `bounds_error=False` prevents errors for query points outside the data range.
# `fill_value=0` returns 0 generation outside the range of recorded generation
# timestamps (e.g., at night); it does not affect gaps inside that range.

# Generation: cubic interpolation, 0 outside the recorded range
i1_f = interp1d(generation['timestamp'], generation['generation'],
                kind='cubic', bounds_error=False, fill_value=0)

# Consumption: cubic interpolation, extrapolated outside the recorded range
i2_f = interp1d(consumption['timestamp'], consumption['consumption'],
                kind='cubic', fill_value="extrapolate")

# Uniform grid of 1440 points, one per minute of the day (24 * 60 = 1440)
df = pd.DataFrame({'timestamp': np.arange(1, 1441, 1)})

df['generation'] = i1_f(df['timestamp'])
df['consumption'] = i2_f(df['timestamp'])

# Cubic interpolation/extrapolation can produce negative values; taking the
# absolute value keeps consumption non-negative (it flips the sign of negatives)
df['consumption'] = df['consumption'].abs()
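A minimal sketch on hypothetical toy values (not the notebook's data) shows where `fill_value=0` actually applies: only to query points outside the sampled range, while a gap inside the range is still filled by the cubic spline. It also contrasts the notebook's `abs()` step with clipping at zero, an alternative the notebook does not use.

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical toy generation samples with a gap between t=8 and t=12
x = np.array([6.0, 7.0, 8.0, 12.0, 13.0, 14.0])
y = np.array([0.0, 200.0, 500.0, 900.0, 400.0, 0.0])

gen_f = interp1d(x, y, kind="cubic", bounds_error=False, fill_value=0)

# t=2 and t=16 lie outside [6, 14], so fill_value=0 applies there;
# t=10 lies inside the gap, so the cubic spline supplies an estimate instead
outside = gen_f(np.array([2.0, 16.0]))
inside_gap = float(gen_f(10.0))
print(outside, inside_gap)

# Two ways to remove negative values produced by spline overshoot or extrapolation
raw = np.array([-50.0, 120.0, -5.0])
flipped = np.abs(raw)            # approach used in this notebook: sign is flipped
clipped = np.clip(raw, 0, None)  # alternative: negatives become 0
print(flipped, clipped)
```

Flipping keeps the magnitude of an overshoot, while clipping treats it as zero; which is more appropriate depends on how much the negative values are trusted.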

In [7]:
df.head()

|   | timestamp | generation | consumption |
|---|-----------|------------|-------------|
| 0 | 1 | 0.0 | 1854.027788 |
| 1 | 2 | 0.0 | 1637.907468 |
| 2 | 3 | 0.0 | 1438.799843 |
| 3 | 4 | 0.0 | 1256.130993 |
| 4 | 5 | 0.0 | 1089.326997 |
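The consumption values at grid points 1 to 5 (e.g. 1854.03) are far larger than any value in `consumption.head()` (at most 404.78). A likely reason: the first consumption sample is at timestamp ≈10.81, so the earliest grid points lie before it and are produced by cubic extrapolation, which can grow quickly away from the data. This sketch flags those points, assuming the raw rows are in time order as `consumption.head()` suggests:

```python
import numpy as np

# Same grid as the interpolation cell
grid = np.arange(1, 1441, 1)

# First five consumption timestamps, copied from the consumption.head() output
consumption_times = np.array([10.806462, 18.510524, 29.223498, 47.668235, 51.392153])

# Grid points before the first sample are extrapolated rather than interpolated
before_first = grid < consumption_times.min()
print(grid[before_first])
```

On the full data, `(grid < consumption['timestamp'].min()) | (grid > consumption['timestamp'].max())` marks every extrapolated grid point at either end.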


In [8]:
# 🕒 Converting Timestamps to Clock Time
# 24_hour_timeline.csv provides one "HH:MM" label per minute, starting at 00:00,
# in its '24_Hour_Time' column. The assignment aligns rows by index, so the file
# is assumed to have exactly one row per grid point (1440 rows).
time = pd.read_csv('24_hour_timeline.csv')
df['timestamp'] = time['24_Hour_Time']
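If `24_hour_timeline.csv` only holds consecutive minute labels, they can also be generated directly. This is a sketch under that assumption; `minute_labels` is a hypothetical helper, not part of the notebook. Note that, as in the outputs above, grid value 1 becomes `00:00`, so each label corresponds to the grid value minus one.

```python
def minute_labels(n_minutes=1440):
    """Hypothetical helper: 'HH:MM' labels for consecutive minutes, starting at 00:00."""
    return [f"{m // 60:02d}:{m % 60:02d}" for m in range(n_minutes)]


labels = minute_labels()
print(labels[:3], labels[-1])
```

In the notebook the equivalent assignment would be `df['timestamp'] = minute_labels(len(df))`, which also avoids depending on the timeline file having exactly 1440 rows.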

In [9]:
df.head()

|   | timestamp | generation | consumption |
|---|-----------|------------|-------------|
| 0 | 00:00 | 0.0 | 1854.027788 |
| 1 | 00:01 | 0.0 | 1637.907468 |
| 2 | 00:02 | 0.0 | 1438.799843 |
| 3 | 00:03 | 0.0 | 1256.130993 |
| 4 | 00:04 | 0.0 | 1089.326997 |


In [10]:
# ⚡ Adjusting Consumption Data
# Scale consumption by a constant factor of 3 so its relationship to generation
# is easier to see in later plots and calculations.
# Note: this changes the absolute consumption values (e.g., 1854.03 -> 5562.08 at 00:00).
df['consumption'] = df['consumption'] * 3

In [11]:
df.head()

|   | timestamp | generation | consumption |
|---|-----------|------------|-------------|
| 0 | 00:00 | 0.0 | 5562.083363 |
| 1 | 00:01 | 0.0 | 4913.722405 |
| 2 | 00:02 | 0.0 | 4316.39953 |
| 3 | 00:03 | 0.0 | 3768.392978 |
| 4 | 00:04 | 0.0 | 3267.980991 |


In [12]:
# 💾 Saving the Final Dataset
# Write the prepared DataFrame to CSV for use in later notebooks.
# By default the row index is also written, as an unnamed first column.
df.to_csv('dataset.csv')
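Because `to_csv` writes the row index by default, a later `pd.read_csv('dataset.csv')` returns an extra column named `Unnamed: 0`. This in-memory sketch uses a two-row stand-in for `df` (values copied from the last `df.head()` output) to show the effect and two ways to avoid it:

```python
import io

import pandas as pd

# Two-row stand-in for the final df
df = pd.DataFrame({
    "timestamp": ["00:00", "00:01"],
    "generation": [0.0, 0.0],
    "consumption": [5562.083363, 4913.722405],
})

csv_text = df.to_csv()  # no path: returns the CSV as a string; index=True by default

# Reading it back naively turns the saved index into an extra 'Unnamed: 0' column
naive = pd.read_csv(io.StringIO(csv_text))
print(list(naive.columns))

# index_col=0 restores the index; alternatively save with df.to_csv(..., index=False)
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(restored.columns))
```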