### Load up the libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statistics as stats

### Load up the data sets

In [None]:
windy = pd.read_csv("../input/wind-power-generation/50Hertz.csv")
amprion = pd.read_csv("../input/wind-power-generation/Amprion.csv")
tennet = pd.read_csv("../input/wind-power-generation/TenneTTSO.csv")
transnet = pd.read_csv("../input/wind-power-generation/TransnetBW.csv")

### Check to see if they're all okay, don't have any missing values, and that the datatypes are good to work with.

In [None]:
windy.head()

In [None]:
windy.describe()

In [None]:
windy.dtypes

In [None]:
windy.isnull().sum()


In [None]:
amprion.head()

In [None]:
amprion.describe()

In [None]:
amprion.dtypes

In [None]:
amprion.isnull().sum()

In [None]:
tennet.head()

In [None]:
tennet.describe()

In [None]:
tennet.dtypes

In [None]:
tennet.isnull().sum()

In [None]:
transnet.head()

In [None]:
transnet.describe()

In [None]:
transnet.dtypes

In [None]:
transnet.isnull().sum()
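
The same four checks repeat for every operator, so one option is to loop over a dictionary of DataFrames and collect the results in a single table. This is only a sketch on small made-up frames: the `summarise` helper, the toy values and the `Date` column name are assumptions for illustration, not part of the real data sets.

```python
import pandas as pd

def summarise(frames):
    """Return one row per data set: shape, missing-value count, non-numeric columns."""
    rows = []
    for name, df in frames.items():
        rows.append({
            "dataset": name,
            "rows": df.shape[0],
            "columns": df.shape[1],
            "missing": int(df.isnull().sum().sum()),
            "non_numeric": list(df.select_dtypes(exclude="number").columns),
        })
    return pd.DataFrame(rows).set_index("dataset")

# Toy stand-ins for the real CSVs (hypothetical values).
toy_a = pd.DataFrame({
    "Date": ["01/01/2020", "02/01/2020"],
    "00:00:00": [1.0, 2.0],
    "00:15:00": [1.5, None],   # one deliberately missing reading
})
toy_b = pd.DataFrame({
    "Date": ["01/01/2020", "02/01/2020"],
    "00:00:00": [0.5, 0.7],
    "00:15:00": [0.6, 0.8],
})

summary = summarise({"toy_a": toy_a, "toy_b": toy_b})
print(summary)
```

With the real data, the dictionary would hold `windy`, `amprion`, `tennet` and `transnet` instead of the toy frames.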

### Google Colab lets you collapse loads of cells under headings, sorry if scrolling through all that was painful. Show your workings and all that.

Quick graph to see the layout I'll need for the other graphs. Each plot shows the power generated across entire years at one specific time of day. Here, the times are midnight, midday and nine in the evening.

In [None]:
plt.figure(figsize=(20,10))
# Each column holds the readings for one time of day across all dates
midnight = windy["00:00:00"]
midnight.plot(label="Midnight")
midday = windy["12:00:00"]
midday.plot(label="Midday")
nineatnight = windy["21:00:00"]
nineatnight.plot(label="Nine O'Clock (PM)")
plt.legend(loc="best");

### A quick take from this: the midday line seems to be quite prevalent, suggesting the best power generation comes from this part of the day. But that's just from these three selections.
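
To look beyond three hand-picked times, every time-of-day column could be averaged and ranked. A minimal sketch on a made-up table (the values and the `Date` column name are assumptions, and the ranking here says nothing about the real data):

```python
import pandas as pd

# Toy stand-in for one operator's table: one row per day, one column per time slot.
toy = pd.DataFrame({
    "Date": ["01/01/2020", "02/01/2020", "03/01/2020"],
    "00:00:00": [3.0, 4.0, 5.0],
    "12:00:00": [1.0, 2.0, 3.0],
    "21:00:00": [4.0, 6.0, 8.0],
})

# Average each time-slot column over all days, skipping the first (date) column.
slot_means = toy.iloc[:, 1:].mean()

# Sort so the time slots with the highest average output come first.
ranked = slot_means.sort_values(ascending=False)
best_slot = ranked.index[0]

print(ranked)
print("Highest average output at:", best_slot)
```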

Setting lists to hold the mean power generation for each company at each time of day, averaged across all dates.

In [None]:
# Mean of every time-of-day column; the first column is skipped, as before
legion = windy.iloc[:, 1:].mean().tolist()
amprionaverages = amprion.iloc[:, 1:].mean().tolist()
tennetaverages = tennet.iloc[:, 1:].mean().tolist()
transnetaverages = transnet.iloc[:, 1:].mean().tolist()

In [None]:
plt.figure(figsize=(10,8))
plt.plot(legion, label="50Hertz")
plt.plot(amprionaverages, label="Amprion")
plt.plot(tennetaverages, label="TenneTTSO")
plt.plot(transnetaverages, label="TransnetBW")
plt.legend(loc="center right");

### The Transnet line looks a bit strange with its sharp rises and falls, but on average 50Hertz and Tennet seem to consistently produce a higher output, and at similar times of day.

### Giving each company its own subplot gives a better look at the shape of each line.

In [None]:
plt.subplot(4,1,1)
plt.plot(legion,label="50 Hertz")
plt.legend(loc="best")
plt.subplot(4,1,2)
plt.plot(amprionaverages,label="Amprion")
plt.legend(loc="best")
plt.subplot(4,1,3)
plt.plot(tennetaverages,label="Tennet")
plt.legend(loc="best")
plt.subplot(4,1,4)
plt.plot(transnetaverages,label="TransnetBW")
plt.legend(loc="best");

As the curves are very similar for 50Hertz, Amprion and Tennet, I'm fairly confident that they are generating power under similar conditions. The total output will depend on the scale of their operations though.
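
One way to compare the shape of two curves independently of their scale is to correlate them, or to rescale each one to the same range first. A rough sketch with made-up profiles (the numbers and the `normalise` helper are assumptions for illustration only):

```python
import numpy as np

# Hypothetical average daily profiles for two operators of different size.
small = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
large = small * 10          # same shape, ten times the scale

def normalise(profile):
    """Rescale a profile to the 0-1 range so only its shape remains."""
    profile = np.asarray(profile, dtype=float)
    return (profile - profile.min()) / (profile.max() - profile.min())

# Pearson correlation ignores scale: a value near 1 means a very similar shape.
shape_corr = np.corrcoef(small, large)[0, 1]
same_shape = np.allclose(normalise(small), normalise(large))

print(round(shape_corr, 3))
print(same_shape)
```

On the real data, the lists `legion`, `amprionaverages` and `tennetaverages` could be passed in place of the toy arrays.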

Setting lists to plot the medians on the same graph.

In [None]:
# Median of every time-of-day column; the first column is skipped, as before
hertzmedians = windy.iloc[:, 1:].median().tolist()
amprionmedians = amprion.iloc[:, 1:].median().tolist()
tennetmedians = tennet.iloc[:, 1:].median().tolist()
transnetmedians = transnet.iloc[:, 1:].median().tolist()

In [None]:
plt.figure(figsize=(15,10))
plt.plot(hertzmedians, label="50Hertz")
plt.plot(amprionmedians, label="Amprion")
plt.plot(tennetmedians, label="TenneTTSO")
plt.plot(transnetmedians, label="TransnetBW")
plt.legend(loc="best");

Compared to the graph of the means, this one has rougher-looking lines. The median is a resistant measure, so it is less affected by extreme values than the mean, which may account for some of the difference between the two graphs.
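
For reference, here is a small made-up example of how the two measures react to a single extreme value: the mean shifts a lot, while the median barely moves. The readings are invented purely for illustration.

```python
import statistics as stats

readings = [10, 11, 12, 13, 14]
with_spike = readings + [200]   # one extreme value added

# Without the spike, mean and median agree.
print(stats.mean(readings), stats.median(readings))

# With the spike, the mean is dragged upwards but the median hardly changes.
print(stats.mean(with_spike), stats.median(with_spike))
```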

## Plotting the raw data, adding a bit of style now so it's not as boring looking at line graphs so much.

In [None]:
plt.style.use("fivethirtyeight")
plt.figure(figsize=(22,10))
# Row 0 only: the first date in each file, skipping the first column
plt.plot(windy.iloc[0,1:],label="50Hertz")
plt.plot(amprion.iloc[0,1:],label="Amprion")
plt.plot(tennet.iloc[0,1:],label="TenneTTSO")
plt.plot(transnet.iloc[0,1:],label="TransnetBW")
plt.legend(loc="center")
plt.xticks(rotation=90)
plt.ylabel("TWh Generated")
plt.xlabel("Time Of Day");


I probably should have led with this one, as it's much clearer what's going on. For this first row of data there are clear surges in generation between 18:00 and about 04:00, which suggests night time is a better time of day for producing energy, at least on this date.

In [None]:
# Largest value in each time-of-day column, skipping the first column.
# (Indexing with iloc[i, i+1:] only looked at a shrinking slice of the first few rows.)
maxhertz = windy.iloc[:, 1:].max().tolist()
maxamprion = amprion.iloc[:, 1:].max().tolist()
maxtennet = tennet.iloc[:, 1:].max().tolist()
maxtransnet = transnet.iloc[:, 1:].max().tolist()

In [None]:
# One row per statistic, one column per company
nextdataframe = [
    ["Max Value", max(maxhertz), max(maxamprion), max(maxtennet), max(maxtransnet)],
    ["Median Value", stats.median(hertzmedians), stats.median(amprionmedians),
     stats.median(tennetmedians), stats.median(transnetmedians)],
    ["Mean Value", stats.mean(legion), stats.mean(amprionaverages),
     stats.mean(tennetaverages), stats.mean(transnetaverages)],
]
tempmax = pd.DataFrame(nextdataframe, columns=["Value Type", "50 Hertz", "Amprion","Tennet", "Transnet"])

### Plotting Max, Median and Mean values for an idea of the range of power generation.

In [None]:
fig, axes = plt.subplots(1, 4, figsize=(15, 10), sharey=True)
fig.suptitle('Values For Each Company')
# Title each panel with the company it shows
sns.barplot(ax=axes[0], x=tempmax["Value Type"], y=tempmax["50 Hertz"])
axes[0].set_title("50 Hertz")
sns.barplot(ax=axes[1], x=tempmax["Value Type"], y=tempmax["Amprion"])
axes[1].set_title("Amprion")
sns.barplot(ax=axes[2], x=tempmax["Value Type"], y=tempmax["Tennet"])
axes[2].set_title("Tennet")
sns.barplot(ax=axes[3], x=tempmax["Value Type"], y=tempmax["Transnet"])
axes[3].set_title("Transnet");
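
An alternative layout, sketched here under the assumption that a single grouped chart would be acceptable, is to reshape the summary table from wide to long form so that one seaborn call can draw all companies side by side. The `wide` table below is a made-up stand-in for `tempmax`, with invented numbers.

```python
import pandas as pd

# Hypothetical summary table in the same wide layout as tempmax.
wide = pd.DataFrame(
    [["Max Value", 4.0, 2.0], ["Mean Value", 1.5, 0.5]],
    columns=["Value Type", "50 Hertz", "Amprion"],
)

# Reshape to long form: one row per (value type, company) pair.
long = wide.melt(id_vars="Value Type", var_name="Company", value_name="Output")
print(long)

# With seaborn imported as sns, this long table could feed one grouped chart:
# sns.barplot(data=long, x="Company", y="Output", hue="Value Type")
```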

### There are two clear leaders in production. 50 Hertz and Tennet have both produced similar maximum values, and their means and medians are also very similar.

This is my first ever submission of an EDA and one of the first ones I've ever really attempted. Thank you for viewing; any feedback on my approach or coding techniques is greatly appreciated.