# Distribution of Regional Wind Power

In this notebook we explore the distribution of wind power production in Sweden: how production has been distributed historically, and whether it shows any trends or seasonal patterns.

## Import Python libraries

In [7]:
# Import libraries
import numpy as np
import pandas as pd
import plotly.plotly as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly import tools

## Import data

In [8]:
# Import Nord Pool wind power data
df_SE1 = pd.read_csv('data/wp_S_SE1.csv', sep=';', header=0, names=['Time', 'SE1'], index_col=0, usecols=[0,2], decimal=',')
df_SE2 = pd.read_csv('data/wp_S_SE2.csv', sep=';', header=0, names=['Time', 'SE2'], index_col=0, usecols=[0,2], decimal=',')
df_SE3 = pd.read_csv('data/wp_S_SE3.csv', sep=';', header=0, names=['Time', 'SE3'], index_col=0, usecols=[0,2], decimal=',')
df_SE4 = pd.read_csv('data/wp_S_SE4.csv', sep=';', header=0, names=['Time', 'SE4'], index_col=0, usecols=[0,2], decimal=',')
df_SE = pd.read_csv('data/wp_S_SE.csv', sep=';', header=0, names=['Time', 'SE'], index_col=0, usecols=[0, 2], decimal=',')

In [9]:
# Concatenate dataframes
df = pd.concat([df_SE1, df_SE2, df_SE3, df_SE4, df_SE], axis=1)

In [10]:
# Convert to MWh and remove last row
df = df/10**3
df = df[:-1] 
df.index = pd.to_datetime(df.index)
df.head()

| Time | SE1 | SE2 | SE3 | SE4 | SE |
|---|---|---|---|---|---|
| 2014-12-01 00:00:00 | 35.631305 | 411.486402 | 335.119003 | NaN | NaN |
| 2014-12-01 01:00:00 | 32.907812 | 392.75638 | 348.271535 | NaN | NaN |
| 2014-12-01 02:00:00 | 49.587869 | 417.99198 | 323.013154 | NaN | NaN |
| 2014-12-01 03:00:00 | 72.84576 | 440.684582 | 315.589557 | NaN | NaN |
| 2014-12-01 04:00:00 | 92.836764 | 453.357844 | 323.035225 | NaN | NaN |


## Histogram

The wind power production histograms look much as expected. They appear to follow a Weibull-like distribution, which is plausible since wind speed is commonly modelled with a Weibull distribution. The SE1 area has a much shorter tail and a higher probability near zero than the other areas. I am not sure about the reason for this.

In [11]:
trace_SE1 = go.Histogram(x=df.SE1, xbins=dict(start=0, end=2000, size=50), histnorm='probability', name = 'SE1')
trace_SE2 = go.Histogram(x=df.SE2, xbins=dict(start=0, end=2000, size=50), histnorm='probability', name = 'SE2')
trace_SE3 = go.Histogram(x=df.SE3, xbins=dict(start=0, end=2000, size=50), histnorm='probability', name = 'SE3')
trace_SE4 = go.Histogram(x=df.SE4, xbins=dict(start=0, end=2000, size=50), histnorm='probability', name = 'SE4')

data = [trace_SE1, trace_SE2, trace_SE3, trace_SE4]
layout = go.Layout(title='Wind Power Distribution',
                   xaxis=dict(title='Production [MW]'),
                   yaxis=dict(title='Probability'))
fig = dict(data=data, layout=layout)
py.iplot(fig)
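
One way to check the Weibull impression numerically is to fit the two Weibull parameters to the data. The sketch below is not part of the original analysis: `fit_weibull` is a hypothetical helper that estimates shape and scale by least squares on the linearized CDF, and it is demonstrated on synthetic data rather than on the Nord Pool series.

```python
import numpy as np

def fit_weibull(values):
    """Estimate Weibull shape k and scale lam by least squares on the linearized CDF."""
    x = np.sort(np.asarray(values, dtype=float))
    # Drop NaNs and non-positive values, since the logarithm is undefined there
    x = x[np.isfinite(x) & (x > 0)]
    n = x.size
    # Median-rank estimate of the empirical CDF at each sorted observation
    F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
    # Weibull CDF linearized: ln(-ln(1 - F)) = k * ln(x) - k * ln(lam)
    slope, intercept = np.polyfit(np.log(x), np.log(-np.log(1.0 - F)), 1)
    k = slope
    lam = np.exp(-intercept / k)
    return k, lam

# Synthetic sample (illustrative only): shape 1.5, scale 300
rng = np.random.default_rng(0)
sample = 300.0 * rng.weibull(1.5, size=5000)
k, lam = fit_weibull(sample)
print(f"shape k = {k:.2f}, scale lam = {lam:.1f}")
```

For a real area one could call, for example, `fit_weibull(df['SE1'].dropna())` and overlay the fitted curve on the histogram. A poor fit for one area would be a hint that production there is shaped by more than the wind speed distribution alone.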

## Kernel density estimation

In [12]:
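columns = ['SE1','SE2','SE3','SE4']
# Normalize each area by its maximum over the selected window so the curves are comparable
hist_data = [df[column][5000:10000].dropna()/df[column][5000:10000].max() for column in columns]
fig = ff.create_distplot(hist_data, columns, bin_size=100, curve_type='kde', show_hist=False)
# The data are normalized, so the x-axis is a fraction of the maximum and the y-axis a density
fig['layout'].update(title='Wind Power Distribution',
                     xaxis=dict(title='Normalized production'),
                     yaxis=dict(title='Density'))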

py.iplot(fig)
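
To make explicit what a kernel density estimate computes, here is a minimal NumPy sketch of a one-dimensional Gaussian KDE. The helper name `gaussian_kde_1d` and the default bandwidth (Scott's rule of thumb) are my assumptions for illustration; the bandwidth plotly chooses internally may differ, and the sample is synthetic.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    x = np.asarray(samples, dtype=float)
    x = x[np.isfinite(x)]
    n = x.size
    if bandwidth is None:
        # Scott's rule of thumb for one-dimensional data (an assumption of this sketch)
        bandwidth = x.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, averaged over all samples
    z = (np.asarray(grid)[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# Synthetic production values normalized by their maximum, as in the cell above
rng = np.random.default_rng(2)
normalized = rng.weibull(1.5, size=2000)
normalized = normalized / normalized.max()

grid = np.linspace(-0.5, 1.5, 401)
density = gaussian_kde_1d(normalized, grid)
```

The result is a density, not a probability: it integrates to one over the grid, so individual values can exceed one when the data are concentrated in a narrow range.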

## Box plot

In order to spot seasonalities, we look at box plots of the data grouped on different time scales. Each plot shows the distribution of production within each group (year, month, or hour of day). We expect an increasing trend on a yearly basis because of increased installed capacity, and seasonal variation on a monthly and hourly basis due to the seasons and the day-night cycle.

It is hard to discern any trend on a yearly basis. This is probably due to the small number of years available. From year to year the strength of the wind can vary, so a year with lower installed capacity can still show higher production. However, in the long run, the increase in installed capacity should be visible. From the monthly box plots, it can clearly be seen that production is higher in the winter than in the summer. Furthermore, within the day, there seems to be a diurnal pattern.

In [13]:
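def box_plot(df, area, freq='year'):
    """Build one box per calendar group; `freq` is a DatetimeIndex attribute such as 'year', 'month' or 'hour'."""
    groups = df[area].groupby(getattr(df[area].index, freq))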
    
    data = []
    for name, group in groups:
        
        date = str(name)
        values = group.values

        trace = go.Box(y=values,
                       name=date,
                       marker=dict(color='rgb(107,174,214)'),
                       boxpoints='outliers')
        data.append(trace)

    layout = go.Layout(title='Grouped Wind Power Production in ' + area,
                       xaxis=dict(title=freq.title()),
                       yaxis=dict(title='Production [MW]'),
                       showlegend=False)
    fig = go.Figure(data=data, layout=layout)
    
    return fig
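
The box plots can be complemented with the numbers behind them. The sketch below uses the same grouping idea as `box_plot` to tabulate the quartiles per group. `grouped_quartiles` is a hypothetical helper, and the series is synthetic (built with a higher level in winter than in summer), so the values say nothing about the real data.

```python
import numpy as np
import pandas as pd

def grouped_quartiles(series, freq='month'):
    """Return the 25th, 50th and 75th percentiles of `series` per calendar group."""
    groups = series.groupby(getattr(series.index, freq))
    # quantile() with a list gives a (group, quantile) MultiIndex; unstack to a table
    return groups.quantile([0.25, 0.5, 0.75]).unstack()

# Synthetic hourly series, for illustration only
index = pd.date_range('2015-01-01', periods=2 * 365 * 24, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(1)
day_of_year = np.asarray(index.dayofyear)
seasonal = 300 + 150 * np.cos(2 * np.pi * (day_of_year - 1) / 365.25)
demo = pd.Series(seasonal + rng.normal(0, 30, len(index)), index=index)

quartiles = grouped_quartiles(demo, freq='month')
print(quartiles.round(1))
```

On the real data, `grouped_quartiles(df['SE1'].dropna(), freq='month')` would give the box edges and medians of the monthly SE1 plot as a table.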

### Yearly

In [14]:
fig = box_plot(df, area='SE1', freq='year')
py.iplot(fig)

In [15]:
fig = box_plot(df, area='SE2', freq='year')
py.iplot(fig)

In [16]:
fig = box_plot(df, area='SE3', freq='year')
py.iplot(fig)

In [None]:
fig = box_plot(df, area='SE4', freq='year')
py.iplot(fig)

### Monthly

In [None]:
fig = box_plot(df, area='SE1', freq='month')
py.iplot(fig)

In [None]:
fig = box_plot(df, area='SE2', freq='month')
py.iplot(fig)

In [None]:
fig = box_plot(df, area='SE3', freq='month')
py.iplot(fig)

In [None]:
fig = box_plot(df, area='SE4', freq='month')
py.iplot(fig)

### Hourly

In [None]:
fig = box_plot(df, area='SE1', freq='hour')
py.iplot(fig)

In [None]:
import plotly
# Export the SE1 hourly box plot as an HTML <div>; plotly.js must be loaded by the host page
div = plotly.offline.plot(fig, include_plotlyjs=False, output_type='div')
with open("windpower-box-hourly-se1.html", "w") as text_file:
    text_file.write(div)

In [None]:
fig = box_plot(df, area='SE2', freq='hour')
py.iplot(fig)

In [None]:
fig = box_plot(df, area='SE3', freq='hour')
py.iplot(fig)

In [None]:
fig = box_plot(df, area='SE4', freq='hour')
py.iplot(fig)

## Conclusion

It can be concluded that the regional wind power seems to follow a Weibull distribution in all areas. Wind power production seems to have decreased slightly over the past years. However, at this point it is hard to conclude anything, since only about three years of data are available. The expectation is an increasing trend over the years once the meteorological variability is averaged out. It is clear that there are both monthly and hourly patterns in the data. For example, there is more wind power production in the winter than in the summer. Also, there seems to be slightly less production during daytime hours than during night hours. However, this variation is much smaller than the monthly variability.
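
The comparison between monthly and hourly variability can be put into a single number per time scale. The sketch below is only an illustration of the idea: `relative_range` is a hypothetical helper, and the series is synthetic, constructed with a strong seasonal cycle and a weak diurnal cycle.

```python
import numpy as np
import pandas as pd

def relative_range(series, freq):
    """Range of the group means (max - min) divided by the overall mean."""
    means = series.groupby(getattr(series.index, freq)).mean()
    return (means.max() - means.min()) / series.mean()

# Synthetic hourly series, for illustration only
index = pd.date_range('2015-01-01', periods=2 * 365 * 24, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(3)
seasonal = 150 * np.cos(2 * np.pi * (np.asarray(index.dayofyear) - 1) / 365.25)
diurnal = -15 * np.cos(2 * np.pi * (np.asarray(index.hour) - 12) / 24)  # slightly lower around midday
demo = pd.Series(300 + seasonal + diurnal + rng.normal(0, 30, len(index)), index=index)

monthly_spread = relative_range(demo, 'month')
hourly_spread = relative_range(demo, 'hour')
print(f"monthly: {monthly_spread:.2f}, hourly: {hourly_spread:.2f}")
```

Applied to a real column, for example `relative_range(df['SE3'].dropna(), 'month')` against `relative_range(df['SE3'].dropna(), 'hour')`, the two numbers would quantify how much smaller the diurnal variation is than the seasonal one.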