# Correlation of Regional Wind Power Data

In this notebook we explore the interdependencies between wind power production in different Swedish price areas. The aim is to determine whether any dependencies exist and, if so, whether they can be characterized qualitatively or quantitatively.

In [48]:
# Import libraries
import numpy as np
import pandas as pd
import scipy.stats as st
import itertools
from plotly import tools
import plotly.plotly as py
import plotly.graph_objs as go

In [49]:
# Import Nord Pool wind power data
df_SE1 = pd.read_csv('data/wp_S_SE1.csv', sep=';', header=0, names=['Time', 'SE1'], index_col=0, usecols=[0,2], decimal=',')
df_SE2 = pd.read_csv('data/wp_S_SE2.csv', sep=';', header=0, names=['Time', 'SE2'], index_col=0, usecols=[0,2], decimal=',')
df_SE3 = pd.read_csv('data/wp_S_SE3.csv', sep=';', header=0, names=['Time', 'SE3'], index_col=0, usecols=[0,2], decimal=',')
df_SE4 = pd.read_csv('data/wp_S_SE4.csv', sep=';', header=0, names=['Time', 'SE4'], index_col=0, usecols=[0,2], decimal=',')
df_SE = pd.read_csv('data/wp_S_SE.csv', sep=';', header=0, names=['Time', 'SE'], index_col=0, usecols=[0, 2], decimal=',')

In [50]:
# Concatenate dataframes
df = pd.concat([df_SE1, df_SE2, df_SE3, df_SE4, df_SE], axis=1)

In [52]:
# Scale by 1/1000 (MWh to GWh, assuming the Nord Pool files are in MWh) and remove last row
df = df/10**3
df = df[:-1] 
df.index = pd.to_datetime(df.index)
df.head()

| Time | SE1 | SE2 | SE3 | SE4 | SE |
|---|---|---|---|---|---|
| 2014-12-01 00:00:00 | 0.035631 | 0.411486 | 0.335119 | NaN | NaN |
| 2014-12-01 01:00:00 | 0.032908 | 0.392756 | 0.348272 | NaN | NaN |
| 2014-12-01 02:00:00 | 0.049588 | 0.417992 | 0.323013 | NaN | NaN |
| 2014-12-01 03:00:00 | 0.072846 | 0.440685 | 0.31559 | NaN | NaN |
| 2014-12-01 04:00:00 | 0.092837 | 0.453358 | 0.323035 | NaN | NaN |
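
The first rows show no values for SE4 and SE, so it may be worth checking how many hourly observations each column actually has before comparing areas. The following is a minimal sketch on synthetic stand-in data; the frame `demo` and all of its values are invented for illustration and are not Nord Pool data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the hourly frame; values are invented for illustration.
idx = pd.date_range('2014-12-01', periods=6, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({
    'SE1': [0.04, 0.03, 0.05, 0.07, 0.09, 0.08],
    'SE4': [np.nan, np.nan, np.nan, 0.20, 0.25, 0.22],
}, index=idx)

# Number of missing hourly values per column
n_missing = demo.isna().sum()

# First timestamp with a valid observation in each column
first_valid = demo.apply(lambda s: s.first_valid_index())

# Rows where both areas have data (what a pairwise comparison would use)
overlap = demo.dropna()

print(n_missing)
print(first_valid)
print(len(overlap))
```

Applied to the real frame, the same three calls would show where each series starts and how much data each pair of areas has in common.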


## Scatter matrix

Using a scatter matrix, we can map out dependencies and correlations between wind power production in different price areas. Since we have a lot of data points, a plain scatter plot might not be the best way to visualize the data because many points overlap. Instead, one can draw a density plot using kernel density estimation to spot regions of the scatter plot where data points cluster.

In [9]:
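def kde_scipy(x, y, N):
    """Evaluate a 2D Gaussian KDE of (x, y) on an N x N grid over the data range."""
    x_min = x.min()
    x_max = x.max()
    y_min = y.min()
    y_max = y.max()

    # Regular grid spanning the observed range of both variables
    x_space = np.linspace(x_min, x_max, N)
    y_space = np.linspace(y_min, y_max, N)
    X, Y = np.meshgrid(x_space, y_space)

    # gaussian_kde expects arrays of shape (n_dims, n_points)
    positions = np.vstack([X.ravel(), Y.ravel()])
    values = np.vstack([x, y])

    # Fit the KDE to the samples and evaluate it at every grid point
    kernel = st.gaussian_kde(values)
    Z = np.reshape(kernel(positions).T, X.shape)

    return [x_space, y_space, Z]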
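
To see what such a density estimate looks like on data with known structure, the sketch below evaluates `scipy.stats.gaussian_kde` on a grid for synthetic, positively correlated samples. The sample data, the seed, and the grid size `N = 50` are assumptions chosen for illustration only.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)

# Synthetic, positively correlated samples (not wind power data)
x = rng.normal(size=500)
y = 0.8 * x + 0.6 * rng.normal(size=500)

# Evaluate the Gaussian KDE on an N x N grid spanning the data range
N = 50
x_space = np.linspace(x.min(), x.max(), N)
y_space = np.linspace(y.min(), y.max(), N)
X, Y = np.meshgrid(x_space, y_space)
kernel = st.gaussian_kde(np.vstack([x, y]))
Z = kernel(np.vstack([X.ravel(), Y.ravel()])).reshape(X.shape)

# Grid location of the highest estimated density
# (rows of Z follow y_space, columns follow x_space)
iy, ix = np.unravel_index(Z.argmax(), Z.shape)
print('density peak near x=%.2f, y=%.2f' % (x_space[ix], y_space[iy]))
```

Because the cost of evaluating the KDE grows with both the number of samples and the number of grid points, a coarser grid can be a reasonable first choice when the data set is large.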

In [63]:
def scatter_matrix(df, N=200):
    n = len(df.columns)
    corr_matrix = df.corr(method='pearson').values.tolist()
    corrs = ['{:.2f}'.format(corr) for sublist in corr_matrix for corr in sublist]
    
    fig = tools.make_subplots(rows=n, 
                              cols=n,
                              shared_xaxes=True, 
                              shared_yaxes=True,
                              subplot_titles=corrs)
        
    per_ind = itertools.product(range(1,n+1), repeat=2)
    per_col = itertools.product(df.columns, repeat=2)
    
    for ind, col in zip(per_ind,per_col):
        
        if ind[0] == ind[1]:
            x = df[col[0]].dropna()
            x_norm = (x-min(x))/(max(x)-min(x))

            trace = go.Histogram(x=x_norm, 
                                 histnorm='probability',
                                 marker=go.Marker(color='rgb(0,0,128)'),
                                 xbins=dict(start=0,
                                            end=1,
                                            size=0.16),
                                 showlegend=False)
            
            fig.append_trace(trace, ind[1], ind[0])
            
        else:
            df_temp = df[list(col)].dropna()
            x = df_temp[col[0]]
            y = df_temp[col[1]]
            x_norm = (x-min(x))/(max(x)-min(x))
            y_norm = (y-min(y))/(max(y)-min(y))

            x_space, y_space, Z = kde_scipy(x=x_norm, y=y_norm, N=N)

            trace1 = go.Contour(x=x_space,
                                y=y_space,
                                z=Z,
                                colorscale='Viridis',
                                showscale=False,
                                name='Contour')

            trace2 = go.Scatter(x=x_norm,
                                y=y_norm, 
                                mode='markers',
                                marker=go.Marker(size=1.5,
                                                 color='white',
                                                 opacity=0.4),
                                name=col[0]+'-'+col[1],
                                visible='legendonly')

            fig.append_trace(trace1, ind[1], ind[0])
            fig.append_trace(trace2, ind[1], ind[0])

        if ind[1] == n:
            fig['layout']['xaxis'+str(ind[0])].update(title=col[0])
        if ind[0] == 1:
            fig['layout']['yaxis'+str(ind[1])].update(title=col[1])

    return fig

In [65]:
fig = scatter_matrix(df[['SE1','SE2','SE3','SE4']], N=200)
fig['layout'].update(height=900, width=900, title='Wind Power Scatter Matrix')
py.iplot(fig)

[Figure: Wind Power Scatter Matrix, a 4×4 subplot grid for SE1 to SE4 with histograms on the diagonal and density contours elsewhere. The interactive plot was not rendered in this export because the estimated draw time was too long.]



## Conclusion

From the scatter matrix presented in this notebook it can be seen that wind power production in different price areas is indeed correlated. The strongest correlations are within the pairs SE1/SE2 and SE3/SE4, while SE1 and SE4 seem to be only weakly correlated. From a physical perspective it is reasonable that areas closer to each other in space are more correlated.
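
A qualitative reading like this can be backed by listing the pairwise Pearson coefficients in sorted order. The sketch below does so on synthetic data in which each hypothetical area `A1` to `A4` equals its neighbour plus independent noise, so correlation decays along the chain by construction. The column names, noise level, and seed are assumptions, and the numbers say nothing about the actual price areas.

```python
import itertools
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 2000

# Synthetic chain: each "area" is its neighbour plus independent noise,
# so correlation decays with distance along the chain (illustration only).
a1 = rng.normal(size=n)
a2 = a1 + 0.5 * rng.normal(size=n)
a3 = a2 + 0.5 * rng.normal(size=n)
a4 = a3 + 0.5 * rng.normal(size=n)
demo = pd.DataFrame({'A1': a1, 'A2': a2, 'A3': a3, 'A4': a4})

# Pearson correlation matrix, then one entry per unordered pair of columns
corr = demo.corr(method='pearson')
pairs = pd.Series({(a, b): corr.loc[a, b]
                   for a, b in itertools.combinations(demo.columns, 2)})

print(pairs.sort_values(ascending=False))
```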

In the future, it would be interesting to see how these correlations change over time. This could be done by letting the user interactively change the investigated time period.
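
One possible starting point for such a time-resolved analysis is a rolling correlation combined with label-based slicing on the datetime index. The sketch below uses synthetic hourly data; the column names `A` and `B`, the one-week window, and the selected period are assumptions for illustration, not choices made in this notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 24 * 60  # 60 days of hourly samples
idx = pd.date_range('2015-01-01', periods=n, freq=pd.Timedelta(hours=1))

# Two synthetic series sharing a common component (illustration only)
common = rng.normal(size=n)
demo = pd.DataFrame({
    'A': common + 0.5 * rng.normal(size=n),
    'B': common + 0.5 * rng.normal(size=n),
}, index=idx)

# Pearson correlation over a sliding window of one week of hourly values
window = 24 * 7
rolling_corr = demo['A'].rolling(window).corr(demo['B'])

# Correlation restricted to a chosen period via a slice on the DatetimeIndex
period_corr = demo.loc['2015-01-15':'2015-01-31'].corr().loc['A', 'B']

print(rolling_corr.dropna().describe())
print(period_corr)
```

An interactive version could expose the window length and the period boundaries as user controls and recompute these two quantities on change.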