## PV production forecasting using machine learning

### Data exploration notebook

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
try:
    import seaborn as sns
    # Seaborn style (figure aesthetics only)
    sns.set(context='paper', style='whitegrid', font_scale=1.2)
    sns.set_style('ticks', {'xtick.direction':'in', 'ytick.direction':'in'})
except ImportError:
    print('Seaborn not installed. Going without it.')

## PV Data

MiRIS PV production data at 5-second resolution, from 13/05/2019 to 21/06/2019.

In [None]:
pv = pd.read_csv('miris_pv.csv', index_col=0, parse_dates=True)
pv.head()

In [None]:
# Checking for NaN values
pv.isnull().values.any()
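
The check above returns a single boolean, so it says nothing about where values are missing, nor about timestamps that are absent from the index altogether. A minimal sketch of both checks follows, run on a small synthetic series rather than on `miris_pv.csv`; the column name `PV` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic 5-second series standing in for the PV data (hypothetical values)
idx = pd.date_range('2019-05-13 12:00:00', periods=12, freq='5s')
demo = pd.DataFrame({'PV': np.arange(12, dtype=float)}, index=idx)

# Introduce one NaN value and remove one timestamp entirely
demo.iloc[3, 0] = np.nan
demo = demo.drop(idx[7])

# Number of NaN values per column
nan_counts = demo.isnull().sum()

# Timestamps missing from the expected regular 5-second grid
expected = pd.date_range(demo.index.min(), demo.index.max(), freq='5s')
missing_stamps = expected.difference(demo.index)

print(nan_counts)
print(missing_stamps)
```

A missing timestamp is not reported by `isnull()`, which is why the index is compared against a regular grid separately.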

In [None]:
pv.loc['2019-05-13'].plot()
plt.show()

In [None]:
# Resampling the dataset from 5-second to 15-minute resolution (using the mean)
pv = pv.resample('15min').mean()
pv.head()
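
To illustrate what this resampling does, here is a small sketch on synthetic data (the column name `PV` and the values are assumptions, not taken from the dataset). Each 15-minute bin averages up to 180 five-second samples and is labelled with the start of its interval.

```python
import numpy as np
import pandas as pd

# Synthetic 5-second ramp covering 30 minutes (360 samples, hypothetical values)
idx = pd.date_range('2019-05-13 12:00:00', periods=360, freq='5s')
demo = pd.DataFrame({'PV': np.arange(360, dtype=float)}, index=idx)

# Same call as in the notebook: mean over each 15-minute bin
demo_15 = demo.resample('15min').mean()

# Number of raw samples that went into each bin
samples_per_bin = demo.resample('15min').count()

print(demo_15)          # bins at 12:00 and 12:15 with means 89.5 and 269.5
print(samples_per_bin)  # 180 samples in each bin
```

Counting the samples per bin is a quick way to spot intervals whose mean rests on only a few measurements.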

In [None]:
pv.loc['2019-05-13'].plot()
plt.show()

## Weather Data

Weather data at 15-minute resolution.

The file contains forecasts of the following weather variables:

    CD = low clouds (0 to 1)
    CM = medium clouds (0 to 1)
    CU = high clouds (0 to 1)
    PREC = precipitation (mm / 15 min)
    RH2m = relative humidity (%)
    SNOW = snow height (mm)
    ST = surface temperature (°C)
    SWD = global horizontal irradiance (W/m2)
    SWDtop = total solar irradiance at the top of the atmosphere (W/m2)
    TT2M = temperature 2 m above the ground (°C)
    WS100m = wind speed 100 m above the ground (m/s)
    WS10m = wind speed 10 m above the ground (m/s)
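
The ranges given in this list can serve as a quick sanity check on the data. The sketch below counts out-of-range values for the cloud cover fractions and the relative humidity, using a synthetic frame with hypothetical values; it is an illustration, not part of the original analysis.

```python
import pandas as pd

# Ranges taken from the variable list: cloud cover in [0, 1], relative humidity in [0, 100] %
expected_ranges = {'CD': (0, 1), 'CM': (0, 1), 'CU': (0, 1), 'RH2m': (0, 100)}

# Synthetic weather frame (hypothetical values) with one out-of-range cloud value
idx = pd.date_range('2019-05-13 12:00', periods=3, freq='15min')
demo = pd.DataFrame(
    {
        'CD': [0.0, 0.5, 1.2],
        'CM': [0.1, 0.2, 0.3],
        'CU': [1.0, 0.0, 0.4],
        'RH2m': [45.0, 60.0, 80.0],
    },
    index=idx,
)

# Count values outside the documented range (bounds inclusive; NaN also counts as out of range)
out_of_range = {
    col: int((~demo[col].between(lo, hi)).sum())
    for col, (lo, hi) in expected_ranges.items()
}

print(out_of_range)
```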

In [None]:
we = pd.read_csv('weather_data.csv', index_col=0, parse_dates=True)
we.head()

In [None]:
# Checking for NaN values
we.isnull().values.any()

In [None]:
we[['ST', 'TT2M']].loc['2019-05-13'].plot()
plt.show()
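
Once the PV data has been resampled to 15 minutes, both datasets share the same resolution and can be aligned on their timestamps. The following sketch shows one possible way to do this with an inner join on synthetic frames; the column name `PV`, the values, and the assumption that both indexes use the same time zone are hypothetical.

```python
import pandas as pd

# Synthetic 15-minute frames with partially overlapping time ranges (hypothetical values)
pv_idx = pd.date_range('2019-05-13 12:00', periods=4, freq='15min')
we_idx = pd.date_range('2019-05-13 12:15', periods=4, freq='15min')
pv_demo = pd.DataFrame({'PV': [1.0, 2.0, 3.0, 4.0]}, index=pv_idx)
we_demo = pd.DataFrame({'SWD': [500.0, 600.0, 700.0, 800.0]}, index=we_idx)

# Inner join on the index keeps only timestamps present in both frames
data = pv_demo.join(we_demo, how='inner')

print(data)
```

An inner join silently drops timestamps that appear in only one frame, so comparing `len(data)` with the lengths of the inputs shows how much data was lost in the alignment.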