# Effective Precipitation - Measured as The Energy Potential of Rainfall

### Columns present across different data sources:
- **DateValueCET:** the date of the measurement
- **TimeValueCET:** the hour of the measurement

Relevant columns:
- **Value:** the actual measurement
- **Unit:** GWh. Although the series describes precipitation, it is measured as "the energy potential associated with rainfall and water flow rather than just millimeters of rainfall. This approach quantifies how much electricity (in GWh) could be generated from the available water resources."
- **PowerPriceAreaCode:** one of BRAZIL_SOUTH, BRAZIL_NORTH, BRAZIL_NORTHEAST, BRAZIL_SOUTHEAST_CENTRALWEST, plus BRAZIL_SIN

In [1]:
import pandas as pd
from get_files_only import get_file_from_xdrive

# Read the Excel file
df_rain = get_file_from_xdrive('Brazil Rain Data.xlsx')
df_rain = df_rain[['DateValueCET', 'TimeValueCET', 'PowerPriceAreaCode', 'Value', 'Unit']]

display(df_rain.head())

| | DateValueCET | TimeValueCET | PowerPriceAreaCode | Value | Unit |
|---|---|---|---|---|---|
| 0 | 2024-09-09 | 05:00:00 | BRAZIL_NORTHEAST | -178.5164 | GWh |
| 1 | 2024-09-09 | 05:00:00 | BRAZIL_SOUTH | -260.301 | GWh |
| 2 | 2024-09-09 | 05:00:00 | BRAZIL_NORTH | -129.8086 | GWh |
| 3 | 2024-09-09 | 05:00:00 | BRAZIL_SIN | -1050.1289 | GWh |
| 4 | 2024-09-09 | 05:00:00 | BRAZIL_SOUTHEAST_CENTRALWEST | -481.5028 | GWh |


> NOTE: BRAZIL_SIN is not a power price area code. SIN stands for *Sistema Interligado Nacional*, Portuguese for **"National Interconnected System"**. It is a large hydrothermal system for the production and transmission of electricity, whose operation involves complex simulation models coordinated and controlled by the National Electric System Operator (ONS), which in turn is inspected and regulated by the National Electric Energy Agency (ANEEL).

The BRAZIL_SIN area is not necessary for this project: it is the aggregate of all areas, so it is disregarded.
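The aggregate relationship can be sanity-checked before dropping the rows. The sketch below is illustrative only: it rebuilds the five rows shown in the `df_rain.head()` output above and compares the BRAZIL_SIN value with the sum of the four regional values at the same timestamp. The helper name `sin_residual` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Rows copied from the df_rain.head() output above (a single timestamp).
sample = pd.DataFrame(
    {
        "DateValueCET": ["2024-09-09"] * 5,
        "TimeValueCET": ["05:00:00"] * 5,
        "PowerPriceAreaCode": [
            "BRAZIL_NORTHEAST",
            "BRAZIL_SOUTH",
            "BRAZIL_NORTH",
            "BRAZIL_SIN",
            "BRAZIL_SOUTHEAST_CENTRALWEST",
        ],
        "Value": [-178.5164, -260.301, -129.8086, -1050.1289, -481.5028],
    }
)


def sin_residual(df, sin_code="BRAZIL_SIN"):
    """Return the SIN value minus the sum of regional values, per timestamp."""
    keys = ["DateValueCET", "TimeValueCET"]
    is_sin = df["PowerPriceAreaCode"] == sin_code
    regional_sum = df[~is_sin].groupby(keys)["Value"].sum()
    sin_value = df[is_sin].groupby(keys)["Value"].sum()
    return (sin_value - regional_sum).rename("residual")


residual = sin_residual(sample)
print(residual)
```

For these rows the residual is on the order of the last printed decimal, which is consistent with BRAZIL_SIN being the sum of the four areas. Running the same helper on the full `df_rain` (before the BRAZIL_SIN rows are removed) would show whether that holds for every timestamp.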

In [2]:
# Remove the BRAZIL_SIN aggregate; .copy() avoids SettingWithCopyWarning on the assignments below
df_rain = df_rain[df_rain['PowerPriceAreaCode'] != 'BRAZIL_SIN'].copy()
# Combine the date and time columns into a single timestamp
df_rain['DateTime'] = pd.to_datetime(
    df_rain['DateValueCET'].astype(str) + ' ' + df_rain['TimeValueCET'].astype(str),
    format='%Y-%m-%d %H:%M:%S'
)
df_rain['Month'] = df_rain['DateTime'].dt.month
# Summary statistics per area and calendar month
display(df_rain.groupby(['PowerPriceAreaCode', 'Month']).describe())

Summary statistics of `Value`:

| PowerPriceAreaCode | Month | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|
| BRAZIL_NORTH | 1 | 1364.0 | 452.438348 | 574.973371 | -326.6698 | 38.33505 | 305.07175 | 713.741775 | 4438.0876 |
| BRAZIL_NORTH | 2 | 1243.0 | 430.99955 | 554.053843 | -326.2999 | 38.1977 | 315.0624 | 696.89095 | 5618.1041 |
| BRAZIL_NORTH | 3 | 1364.0 | 437.484891 | 526.980334 | -291.4709 | 61.646425 | 323.53065 | 695.1544 | 4092.7563 |
| BRAZIL_NORTH | 4 | 1320.0 | 166.494689 | 419.585232 | -327.1041 | -142.794375 | 57.09365 | 338.819525 | 3733.5889 |
| BRAZIL_NORTH | 5 | 1364.0 | -114.456079 | 253.313873 | -321.5373 | -243.520175 | -187.7762 | -86.587975 | 2434.3091 |
| BRAZIL_NORTH | 6 | 1320.0 | -196.61094 | 106.106042 | -298.1324 | -242.25445 | -216.60045 | -182.4133 | 1266.0103 |
| BRAZIL_NORTH | 7 | 1364.0 | -184.959796 | 48.046959 | -269.0503 | -212.076075 | -193.2892 | -169.954325 | 437.8235 |
| BRAZIL_NORTH | 8 | 1364.0 | -160.378375 | 70.916609 | -252.7584 | -191.4049 | -177.02055 | -154.212725 | 601.3738 |
| BRAZIL_NORTH | 9 | 1299.0 | -57.425902 | 226.458528 | -231.1071 | -167.80985 | -141.1676 | -46.58405 | 2955.1552 |
| BRAZIL_NORTH | 10 | 1333.0 | 167.245434 | 393.923721 | -233.3666 | -113.0879 | 38.8598 | 305.2049 | 3274.5201 |


## Data Visualization

In [3]:
import plotly.express as px

# Rebuild the timestamp and parse the date column so it can be filtered by year
df_rain['DateTime'] = pd.to_datetime(df_rain['DateValueCET'].astype(str) + ' ' + df_rain['TimeValueCET'].astype(str))
df_rain['DateValueCET'] = pd.to_datetime(df_rain['DateValueCET'])

# Convert 'TimeValueCET' to hour format
df_rain['Hour'] = pd.to_datetime(df_rain['TimeValueCET'].astype(str), format='%H:%M:%S').dt.hour

# One line plot per year, newest first
for year_filter in (2024, 2023, 2022):
    df_rain_graph = df_rain[df_rain['DateValueCET'].dt.year == year_filter]

    # Create a line plot
    fig = px.line(
        df_rain_graph,
        x='DateTime',
        y='Value',
        color='PowerPriceAreaCode',
        title=f'Energy Potential of Rainfall (GWh) Over Time by Region for {year_filter}',
        # Keys must be column names for the axis labels to take effect
        labels={'Value': 'Rainfall Potential (GWh)', 'DateTime': 'Date and Time'}
    )

    # Optional layout update for better visibility (disabled, as in the original cells)
    # fig.update_layout(
    #     xaxis_title='Date and Time',
    #     yaxis_title='Rainfall Potential (GWh)',
    #     legend_title='Region',
    #     xaxis=dict(tickformat="%Y-%m-%d %H:%M")
    # )

    fig.show()

## Conclusion
- As the graphs above show, the series does not follow closely similar patterns from year to year.
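
The visual impression can be complemented with a number. One possible approach, sketched here under stated assumptions, is to reduce each year to its monthly-mean profile for one area and correlate the profiles between years. The function name `yearly_profile_correlation` and the `DEMO_AREA` data are hypothetical: the demo frame is synthetic and deliberately constructed so that the two years mirror each other, so it says nothing about the real rainfall data.

```python
import numpy as np
import pandas as pd


def yearly_profile_correlation(df, area):
    """Correlate the monthly-mean profile of one area between years.

    Expects the columns 'DateTime', 'PowerPriceAreaCode' and 'Value'.
    """
    subset = df[df["PowerPriceAreaCode"] == area]
    monthly = subset.assign(
        Year=subset["DateTime"].dt.year,
        Month=subset["DateTime"].dt.month,
    ).pivot_table(index="Month", columns="Year", values="Value", aggfunc="mean")
    # Pearson correlation between the year columns (12 monthly points each)
    return monthly.corr()


# Synthetic demo data (NOT the real series): 2022 rises with the month number,
# 2023 falls with it, so the two yearly profiles are mirror images.
dates = pd.date_range("2022-01-01", "2023-12-31", freq="D")
values = np.where(dates.year == 2022, dates.month, 13 - dates.month)
demo = pd.DataFrame(
    {
        "DateTime": dates,
        "PowerPriceAreaCode": "DEMO_AREA",
        "Value": values.astype(float),
    }
)

corr = yearly_profile_correlation(demo, "DEMO_AREA")
print(corr)
```

Applied to `df_rain` with a real area code such as `'BRAZIL_NORTH'`, off-diagonal values close to 1 would indicate similar seasonal shapes between years, while values near 0 or below would support the conclusion above. Monthly means smooth out short-lived spikes, so this compares seasonal shape only, not individual events.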