# Analysis of the 5-minute PV data, Part 3

This notebook explores how PV power generation changes throughout the day.

The following questions are addressed:
- 1: How does generated power change over the day?
- 2: How does generated power change over the day, and how does this vary by season?
- 3: How does generated power change over the day by season within a specific year?
- 4: How does generated power change over the day across the months of a specific year?

In [None]:
!pip install vaex
!pip install --upgrade vaex
!pip install dash
!pip install rtree 
!pip install pygeos

In [2]:
# The runtime has to be restarted after the installs above for these imports to work
import vaex as vx
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

from datetime import datetime

# Pre-processing Data

### The Data

The following datasets were used:

- **5min parquet**: Time series of PV solar generation. Available at https://huggingface.co/datasets/openclimatefix/uk_pv/tree/main. For more information about the data, see https://huggingface.co/datasets/openclimatefix/uk_pv
- **metadata**: Metadata for the individual PV systems. Available at https://huggingface.co/datasets/openclimatefix/uk_pv/tree/main. For more information, see https://huggingface.co/datasets/openclimatefix/uk_pv



In [3]:
# Load datasets
min5 = vx.open('/content/5min.parquet')  # 5-minute generation data
metadata_vaex = vx.open('metadata.csv')  # PV system metadata

In [4]:
# Decompose the timestamp into its date and time components
min5['day'], min5['month'], min5['year'] = min5.timestamp.dt.day, min5.timestamp.dt.month, min5.timestamp.dt.year
min5['hour'], min5['minute'], min5['second'] = min5.timestamp.dt.hour, min5.timestamp.dt.minute, min5.timestamp.dt.second
# Convert the components to strings for easier grouping
min5['date'] = min5.year.astype(str) + '-' + min5.month.astype(str) + '-' + min5.day.astype(str)
min5['time'] = min5.hour.astype(str) + ':' + min5.minute.astype(str)
min5['year_month_day'] = min5.year.astype(str) + '-' + min5.month.astype(str) + '-' + min5.day.astype(str)
min5['year'] = min5.year.astype(str)

# Functions

1. A function that converts the `time` strings to `datetime.time` objects. This conversion is needed several times, each time a dataframe is converted from vaex to pandas, so wrapping it in a function avoids repeating the same code.

In [5]:
def time_to_datetime(data):
  'Convert the time strings to datetime.time objects after the data has been converted to a pandas dataframe'
  timestamp = [datetime.strptime(x, '%H:%M').time() for x in data.time]
  return timestamp
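
The conversion matters because the `time` strings built during pre-processing are not zero-padded, so sorting them as text does not give chronological order. The sketch below illustrates this on a small made-up dataframe (`toy` and its values are hypothetical, not taken from the dataset), using the same `'%H:%M'` parsing as `time_to_datetime`.

```python
from datetime import datetime

import pandas as pd

# Hypothetical toy frame with unpadded "H:M" strings, like the 'time' column above
toy = pd.DataFrame({"time": ["10:0", "5:0", "12:30", "9:5"]})

# Plain string sorting is lexicographic, so "10:0" lands before "5:0"
string_order = sorted(toy["time"])

# Parsing to datetime.time objects restores chronological order
toy["timestamp"] = [datetime.strptime(t, "%H:%M").time() for t in toy["time"]]
time_order = toy.sort_values("timestamp")["time"].tolist()

print(string_order)
print(time_order)
```

Sorting on the parsed column puts 5:00 first and 12:30 last, which is the order the line plots need.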

# 1) Time vs. Power Generated 

A plot of the average power generated throughout the day. Power generated is averaged across all years in the dataframe.

Average power generated begins to increase at 5:00, peaks at 12:00 noon, and declines until 20:00.


In [6]:
# -- Average generation for each time of day
time_power_year = min5.groupby(['time']).agg({'generation_wh': 'mean'}).to_pandas_df()
# --- Convert the time strings to datetime.time objects
time_power_year['timestamp'] = time_to_datetime(time_power_year)
# --- Sort by timestamp so the line is drawn in chronological order
time_power_year = time_power_year.sort_values(['timestamp'], ascending=[True])

In [19]:
# Defining figure
fig = px.line(time_power_year, x="timestamp", y="generation_wh", 
              title="Average Power Generated by Time (2018-2021)",
              width=1100,height=600)

# Style
fig.update_layout(autosize=False, showlegend=False)
fig.update_layout({'xaxis': {'tickangle': 45, 'nticks': 24}})

fig.show()

# 2) Time vs. Power Generated by Season

A plot of the average power generated throughout the day across seasons. Power generated is averaged across all years in the dataframe.

In general, energy production over time follows the same trend of steadily increasing during the day, peaking at around noon, then decreasing until nightfall. However, there is some seasonal variation.

- In summer, more energy is produced. The window of energy production is approximately from 4:00 to 21:00, peaking at about 12:20.
- The energy production window shrinks steadily through spring and autumn until winter, when it is at its narrowest, starting at approximately 7:55 and lasting until approximately 17:00.
- Interestingly, in spring and autumn energy generation peaks slightly earlier than in summer, at about 11:45 and 12:10 respectively.
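
The window and peak times quoted above can also be read off programmatically rather than from the plot. The sketch below is one possible way to do it, run on synthetic bell-shaped profiles rather than the real data. The `describe_window` helper, the 1 Wh threshold for "producing", and the profile parameters are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute grid covering one day (minutes since midnight)
minutes = np.arange(0, 24 * 60, 5)

def synthetic_profile(peak_minute, width, height):
    """Bell-shaped stand-in for an average daily generation curve (not real data)."""
    return height * np.exp(-0.5 * ((minutes - peak_minute) / width) ** 2)

# Made-up parameters: (peak minute, width in minutes, peak height in Wh)
params = {"Summer": (740, 180, 100.0), "Winter": (745, 90, 30.0)}
profile = pd.concat(
    [pd.DataFrame({"Season": s, "minute": minutes,
                   "generation_wh": synthetic_profile(*p)})
     for s, p in params.items()],
    ignore_index=True,
)

def to_hhmm(minute):
    """Format minutes since midnight as a zero-padded HH:MM string."""
    return f"{int(minute) // 60:02d}:{int(minute) % 60:02d}"

def describe_window(group, threshold=1.0):
    """First and last time above the (assumed) threshold, plus the time of the maximum."""
    active = group[group["generation_wh"] > threshold]
    peak_minute = group.loc[group["generation_wh"].idxmax(), "minute"]
    return {"start": to_hhmm(active["minute"].min()),
            "peak": to_hhmm(peak_minute),
            "end": to_hhmm(active["minute"].max())}

# One row per season with its start, peak, and end time
summary = pd.DataFrame(
    {season: describe_window(group) for season, group in profile.groupby("Season")}
).T
print(summary)
```

Applied to a real seasonal dataframe, the threshold would need to be chosen with care, since the start and end times are sensitive to it.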


In [8]:
# Preparing data
time_power_year_season = min5.groupby(['time','month']).agg({'generation_wh': 'mean'}).to_pandas_df() 

In [9]:
# Creating a column for season
season_dict = {1: 'Winter', 2: 'Winter', 3: 'Spring',  4: 'Spring',
               5: 'Spring', 6: 'Summer', 7: 'Summer', 8: 'Summer',
               9: 'Fall', 10: 'Fall', 11: 'Fall', 12: 'Winter'}

time_power_year_season['Season'] = time_power_year_season['month'].apply(lambda x: season_dict[x])

# Average power generated by season and time 
time_power_year_season = time_power_year_season.groupby(['Season','time']).agg({'generation_wh': 'mean'}).reset_index()

# Convert the time strings to datetime.time objects
time_power_year_season['timestamp'] = time_to_datetime(time_power_year_season)

# Sorting timestamp to graph in the correct sequence
time_power_year_season = time_power_year_season.sort_values(['timestamp'], ascending=[True])
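
Note that the grouping above is done in two steps: a mean per month and time, then a mean of those monthly means per season. This weights each month equally, which is not necessarily the same as averaging every reading in the season together. The toy numbers below are made up purely to show the difference.

```python
import pandas as pd

# Made-up readings: March has four samples, April only one
toy = pd.DataFrame({
    "month": [3, 3, 3, 3, 4],
    "generation_wh": [10.0, 10.0, 10.0, 10.0, 60.0],
})

# Two-step approach: mean per month first, then the mean of those means
monthly_mean = toy.groupby("month")["generation_wh"].mean()
mean_of_means = monthly_mean.mean()

# One-step alternative: pool every reading in the season
pooled_mean = toy["generation_wh"].mean()

print(mean_of_means, pooled_mean)
```

When every month contributes a similar number of readings, the two values are close. They diverge when some months have many more readings than others.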

In [20]:
fig = px.line(time_power_year_season, x="timestamp", y="generation_wh", color='Season',
              title="Average Power Generated by Time and Season (2018-2021)",
              width=1100,height=600)

# Style
fig.update_layout(autosize=False, showlegend=True)
fig.update_layout({'xaxis': {'tickangle': 45, 'nticks': 24}})

fig.show()

# 3) Time vs. Power generated for a single year across seasons

A function is defined below to plot 'time' against 'generation_wh' for a single year across seasons. This is done to allow the user to select the year of interest.


In [11]:
# Preparing data
# -- Creating a column for season
season_dict = {1: 'Winter', 2: 'Winter', 3: 'Spring',  4: 'Spring',
               5: 'Spring', 6: 'Summer', 7: 'Summer', 8: 'Summer',
               9: 'Fall', 10: 'Fall', 11: 'Fall', 12: 'Winter'}

min5_selected = min5[['generation_wh','month','year','time']]
# -- Calculate the average power generated by month and time
per_hour_season = min5_selected.groupby(['year','month','time']).agg({'generation_wh': 'mean'}).to_pandas_df()
# -- Create a column for season
per_hour_season['Season'] = per_hour_season['month'].apply(lambda x: season_dict[x])
# -- Re-group by season
per_hour_season = per_hour_season.groupby(['year','Season','time']).agg({'generation_wh': 'mean'}).reset_index()
# -- Convert the time strings to datetime.time objects
per_hour_season['timestamp'] = time_to_datetime(per_hour_season)
# -- Sorting timestamp to graph in the correct sequence
per_hour_season = per_hour_season.sort_values(['timestamp','Season'], ascending=[True,True]).reset_index()

In [12]:
def test(data,year):
  'Generates a line plot of timestamp vs. power generated, by season, for a selected year'
  # Keep only the rows for the selected year
  min5_year = data[data['year']==year]
  # plotting
  fig = px.line(min5_year, x='timestamp', y='generation_wh', color='Season',
              title= ("Average Power Generated by Time and Season in "+ year),
              width=1100,height=600)
  # update style
  fig.update_layout(autosize=False, showlegend=True)
  fig.update_layout({'xaxis': {'tickangle': 45, 'nticks': 24}})
  fig.update_xaxes(title='time')
  fig.update_layout(yaxis_range=[0,120])
  return fig.show()

In [21]:
# Define which year to plot
test(per_hour_season,'2019')

# 4) Time vs. Power generated for a single year across months

A function is defined below to plot 'time' against 'generation_wh' for a single year across months and days.

This is done to allow the user to select the year of interest.


In [14]:
# Function for plotting a single year
def year_plot(dataframe,year):
  'Plots time vs. energy generated for a selected year, animated by date'
  year_selected_df = dataframe[dataframe.year==year].sort_values(['Date','timestamp'], ascending=[True,True])
  
  fig = px.line(year_selected_df, x="timestamp", y="generation_wh", 
                animation_frame="date_str",
                title= ("Average Power Generated by Time across Months in " + year),
                width=1100,height=600)
 
  # update style
  fig.update_layout(autosize=False, showlegend=True)
  fig.update_layout({'xaxis': {'tickangle': 45, 'nticks': 24}})
  fig.update_layout(yaxis_range=[0,200])
  fig['layout']['sliders'][0]['pad']['t'] = 150
  fig['layout']['updatemenus'][0]['pad']['t'] = 150
  return fig.show()

In [15]:
# Preparing data for plotting
# --- Grouping dataframe to convert to pandas for plotting
ymd_time = min5.groupby(['year','year_month_day','time']).agg({'generation_wh': 'mean'}).to_pandas_df() 

In [16]:
# Convert to datetime for sorting the data
# --- Convert the time strings to datetime.time objects
ymd_time['timestamp']= [datetime.strptime(x, '%H:%M').time() for x in ymd_time.time]

# --- Convert the year-month-day strings to datetime.date objects
ymd_time['Date']= [datetime.strptime(a,'%Y-%m-%d').date() for a in ymd_time.year_month_day]

# Convert back to string to use as the Plotly animation frame
ymd_time['date_str'] = ymd_time['Date'].apply(lambda x: str(x))

In [18]:
# Call the function to plot the data for 2021
year_plot(ymd_time, '2021')