## Imports

In [2]:
import pathlib

import pandas as pd
import plotly.express as px

## Pandas options

In [3]:
pd.options.plotting.backend = "plotly"
pd.options.display.float_format = '{:.3f}'.format

## Constants definition

In [4]:
ELECTRICITY_PATH: pathlib.Path = pathlib.Path("../data/predict-energy-behavior-of-prosumers/electricity_prices.csv")

## Data load

- **origin_date** - The date when the day-ahead prices became available.
- **forecast_date** - The start of the 1-hour period during which the price is valid.
- **euros_per_mwh** - The price of electricity on the day-ahead market, in euros per megawatt-hour.
- **data_block_id** - All rows sharing the same `data_block_id` become available at the same forecast time. The value reflects what information is available when forecasts are actually made, at 11 AM each morning. For example, if the forecast weather `data_block_id` for predictions made on October 31st is 100, then the historic weather `data_block_id` for October 31st is 101, because the historic weather data only becomes available the next day.
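
The relationship between `origin_date`, `forecast_date`, and `data_block_id` can be checked directly. The following is a minimal sketch, not part of the original notebook: it inlines the first five rows of the file (the same rows displayed by `electricity.head()`) so that it runs on its own. The names `SAMPLE_CSV`, `sample`, `lead`, and `block_summary` are illustrative assumptions, and the result only describes these five rows, not the full dataset.

```python
import io

import pandas as pd

# First five rows of electricity_prices.csv, inlined so the sketch is self-contained.
SAMPLE_CSV = """forecast_date,euros_per_mwh,origin_date,data_block_id
2021-09-01 00:00:00,92.51,2021-08-31 00:00:00,1
2021-09-01 01:00:00,88.9,2021-08-31 01:00:00,1
2021-09-01 02:00:00,87.35,2021-08-31 02:00:00,1
2021-09-01 03:00:00,86.88,2021-08-31 03:00:00,1
2021-09-01 04:00:00,88.43,2021-08-31 04:00:00,1
"""

sample = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    parse_dates=["forecast_date", "origin_date"],
)

# Time between the origin timestamp and the hour the price applies to.
sample["lead"] = sample["forecast_date"] - sample["origin_date"]

# Per data block: number of rows, covered period, and number of distinct lead times.
block_summary = sample.groupby("data_block_id").agg(
    rows=("forecast_date", "size"),
    first_hour=("forecast_date", "min"),
    last_hour=("forecast_date", "max"),
    distinct_leads=("lead", "nunique"),
)
print(block_summary)
```

Running the same aggregation on the full `electricity` DataFrame (after parsing the date columns) would show whether every block covers the same number of hours.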

In [5]:
electricity: pd.DataFrame = pd.read_csv(ELECTRICITY_PATH)
electricity.head()

| | forecast_date | euros_per_mwh | origin_date | data_block_id |
|---|---|---|---|---|
| 0 | 2021-09-01 00:00:00 | 92.51 | 2021-08-31 00:00:00 | 1 |
| 1 | 2021-09-01 01:00:00 | 88.9 | 2021-08-31 01:00:00 | 1 |
| 2 | 2021-09-01 02:00:00 | 87.35 | 2021-08-31 02:00:00 | 1 |
| 3 | 2021-09-01 03:00:00 | 86.88 | 2021-08-31 03:00:00 | 1 |
| 4 | 2021-09-01 04:00:00 | 88.43 | 2021-08-31 04:00:00 | 1 |


In [10]:
electricity.columns

Index(['forecast_date', 'euros_per_mwh', 'origin_date', 'data_block_id'], dtype='object')

In [11]:
electricity.dtypes

forecast_date     object
euros_per_mwh    float64
origin_date       object
data_block_id      int64
dtype: object

## EDA

### Check DataFrame info

In [6]:
electricity.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15286 entries, 0 to 15285
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   forecast_date  15286 non-null  object 
 1   euros_per_mwh  15286 non-null  float64
 2   origin_date    15286 non-null  object 
 3   data_block_id  15286 non-null  int64  
dtypes: float64(1), int64(1), object(2)
memory usage: 477.8+ KB


In [7]:
electricity.describe()

| | euros_per_mwh | data_block_id |
|---|---|---|
| count | 15286.0 | 15286.0 |
| mean | 157.064176 | 318.99071 |
| std | 121.148625 | 183.890301 |
| min | -10.06 | 1.0 |
| 25% | 85.29 | 160.0 |
| 50% | 128.28 | 319.0 |
| 75% | 199.7975 | 478.0 |
| max | 4000.0 | 637.0 |


### Electricity price investigation

In [9]:
px.line(electricity, x="forecast_date", y="euros_per_mwh")

## Conclusion

1. Very simple: the columns just need to be converted (the two date columns are currently loaded as `object`).
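
The conversion mentioned in the conclusion could look like the sketch below. It is an assumption about what the conversion involves, based on the `dtypes` output where `forecast_date` and `origin_date` are `object`. The frame `electricity_sample` and the constant `DATE_COLUMNS` are hypothetical names introduced for illustration; the sample rows are copied from the `head()` output so the block runs without the CSV file.

```python
import pandas as pd

# Illustrative rows copied from the head() output; in the notebook the same
# conversion would be applied to the full `electricity` DataFrame.
electricity_sample = pd.DataFrame(
    {
        "forecast_date": ["2021-09-01 00:00:00", "2021-09-01 01:00:00"],
        "euros_per_mwh": [92.51, 88.9],
        "origin_date": ["2021-08-31 00:00:00", "2021-08-31 01:00:00"],
        "data_block_id": [1, 1],
    }
)

# Hypothetical constant: the columns assumed to hold timestamps.
DATE_COLUMNS = ["forecast_date", "origin_date"]

converted = electricity_sample.copy()
for column in DATE_COLUMNS:
    # The explicit format matches the "YYYY-MM-DD HH:MM:SS" strings in the sample rows.
    converted[column] = pd.to_datetime(converted[column], format="%Y-%m-%d %H:%M:%S")

print(converted.dtypes)
```

An alternative is to pass `parse_dates=["forecast_date", "origin_date"]` to `pd.read_csv`, which parses the columns at load time.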