**Solar generation and power demand in Italy in 2016**

In this kernel I explore the data on solar power generation and total power consumption in Italy during 2015 and 2016. Information about the datasets can be found in the Data tab.
I start with the 2016 data. First I load the data into a pandas DataFrame and check its structure and format.

In [None]:
import numpy as np
import pandas as pd
import matplotlib, matplotlib.pyplot as plt

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

data16 = pd.read_csv("../input/TimeSeries_TotalSolarGen_and_Load_IT_2016.csv")    
print(data16.shape)
data16.head(10)

In [None]:
data16.tail(10)

The dataset is a table with three columns: time, electricity demand and solar power generation, the last two expressed in MW. Time is given in Coordinated Universal Time (UTC), in the format "%Y-%m-%dT%H%M%SZ". I will split this timestamp into separate date and time columns.

Year 2016 was a leap year. As data were recorded every 60 minutes, the total number of rows is 8784 (= 366 x 24). My idea is to reshape the data into a 24 x 366 array (hours by days), so that the quantities of interest can be plotted as a function of the day of the year.
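The row count can be double-checked with a minimal sketch like the one below. It builds a hypothetical complete hourly UTC index for 2016 (the names `expected_rows` and `hourly_index` are illustrative and not part of the kernel) and compares its length with 366 x 24.

```python
import calendar
import pandas as pd

year = 2016
days_in_year = 366 if calendar.isleap(year) else 365
expected_rows = days_in_year * 24  # one row per hour

# Hypothetical complete hourly UTC index covering the whole year
hourly_index = pd.date_range(
    start=f"{year}-01-01 00:00",
    end=f"{year}-12-31 23:00",
    freq=pd.Timedelta(hours=1),
    tz="UTC",
)

print(expected_rows, len(hourly_index))
```

If the file has no missing rows, `len(data16)` before pivoting should match `expected_rows`.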

In [None]:
data16.columns = ['date_time', 'load', 'solar_gen'] # rename the columns
data16['date_time'] = pd.to_datetime(data16['date_time']) # new timestamp format
data16['date'] = data16['date_time'].dt.date # return date from timestamp
data16['time'] = data16['date_time'].dt.time # return time from timestamp
data16.head()

In [None]:
data16.drop(['date_time'], axis = 1, inplace = True) # remove column labeled date_time in the same dataframe (inplace)
data16 = data16[['date', 'time', 'load', 'solar_gen']] # reorder the columns
data16.head()

Now I pivot the DataFrame using 'time' as the index and 'date' as the columns. Since no values argument is given, pivot reshapes both remaining columns, load and solar_gen, and places them under a two-level column index, as shown below.

In [None]:
data16 = data16.pivot(index = 'time', columns = 'date')
print(data16.shape)

In [None]:
data16.head()
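To make the effect of the pivot easier to follow, here is a small sketch on invented data: two days, three hours each. The frame `toy` and its values are hypothetical; only the `pivot` call mirrors the one used above.

```python
import datetime as dt
import pandas as pd

# Hypothetical long-format data: 2 dates x 3 times, values invented
toy = pd.DataFrame({
    "date": [dt.date(2016, 1, 1)] * 3 + [dt.date(2016, 1, 2)] * 3,
    "time": [dt.time(0), dt.time(1), dt.time(2)] * 2,
    "load": [10, 11, 12, 20, 21, 22],
    "solar_gen": [0, 1, 2, 0, 3, 4],
})

# Without a values argument, both remaining columns are pivoted
wide = toy.pivot(index="time", columns="date")

print(wide.shape)          # 3 times x (2 variables * 2 dates)
print(wide["load"].shape)  # selecting one variable gives times x dates
```

By the same logic, the real 2016 data should give 24 rows and 2 x 366 = 732 columns, and `data16['load']` or `data16['solar_gen']` selects one 24 x 366 block.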

Some figures:

In [None]:
plt.figure() # colormap plot of solar power generation for each day, 1-hour period
plt.imshow(data16['solar_gen'], aspect = 'auto', interpolation = 'gaussian')
plt.yticks(np.arange(23, -1, -1))
plt.colorbar()
plt.xlabel('Day of year')
plt.ylabel('Time of day')
plt.title('Solar Generation [MW]')
plt.show()

The daily generation window is visibly longer during the summer season and the few weeks before and after it (approx. the central 150 days of the year). Some outliers also appear around day 300; they are easier to see later, in the plot of peak solar power versus date.
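One way to put a number on the length of the daily generation window is to count, for each day, the hours with non-zero output. The sketch below uses two invented days (the column names and generation hours are assumptions, not measured values) laid out like `data16['solar_gen']`, with hours as rows and days as columns.

```python
import numpy as np
import pandas as pd

hours = np.arange(24)

# Hypothetical profiles: 100 MW while the sun is up, 0 MW otherwise
winter = np.where((hours >= 8) & (hours <= 15), 100.0, 0.0)
summer = np.where((hours >= 5) & (hours <= 18), 100.0, 0.0)
toy_solar = pd.DataFrame({"winter_day": winter, "summer_day": summer}, index=hours)

# Number of hourly records with generation above zero, per day
generating_hours = (toy_solar > 0).sum()
print(generating_hours)
```

Applied to the real frame, `(data16['solar_gen'] > 0).sum()` would give one count per date.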

In [None]:
plt.figure() # colormap plot of power demand for each day, 1-hour period
plt.imshow(data16['load'], aspect = 'auto', interpolation = 'gaussian')
plt.yticks(np.arange(23, -1, -1))
plt.colorbar()
plt.xlabel('Day of year')
plt.ylabel('Time of day')
plt.title('Load [MW]')
plt.show()

The blank areas indicate missing data from 24 to 27 November. The vertical stripes highlight the decrease in power demand during weekends, very likely due to lower activity in the industrial sector. This will be easier to see below, by looking at peak demand as a function of date. A clear increase in consumption can also be seen towards the hottest and coldest months.
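The days with missing values can also be listed directly instead of being read off the plot. This is a minimal sketch on an invented frame (`toy_load`, a constant 30000 MW with four days blanked out); it assumes the missing records are stored as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: 24 hours x 7 days (22 to 28 November)
dates = pd.date_range("2016-11-22", periods=7, freq="D")
toy_load = pd.DataFrame(np.full((24, 7), 30000.0), index=range(24), columns=dates)
toy_load.iloc[:, 2:6] = np.nan  # blank out 24 to 27 November

# A day is flagged if any of its hourly values is NaN
has_gap = toy_load.isna().any()
missing_days = has_gap[has_gap].index

print(list(missing_days.strftime("%Y-%m-%d")))
```

On the real data the same two lines would be applied to `data16['load']`.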
Now I create a new DataFrame with peak values.

In [None]:
max_data16 = pd.DataFrame(data16['solar_gen'].max(), columns=['max_solar_gen']) # create a dataframe, 1 column
max_data16['max_load'] = data16['load'].max() # add a new column to the dataframe
max_data16.head()

The max_data16 DataFrame is indexed by date. The following plots summarize the peak values:

In [None]:
plt.figure()
max_data16['max_solar_gen'].plot(style=['or-'])
plt.ylabel('Maximum daily Solar Generation [MW]')
plt.xlabel('Date')
plt.show()

In the period 27-31 October the peak solar generation shows excessively large values, well above those usually measured.
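Such days could also be flagged numerically. The sketch below is not a method used in this kernel: it simply marks daily peaks that exceed 1.5 times the median, where both the factor and the toy values (6000 MW and 15000 MW) are arbitrary assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical daily peaks from 20 October to 3 November
dates = pd.date_range("2016-10-20", periods=15, freq="D")
toy_peaks = pd.Series(6000.0, index=dates)
toy_peaks.iloc[7:12] = 15000.0  # inflate 27 to 31 October

# Flag peaks well above the typical level (factor 1.5 is arbitrary)
threshold = 1.5 * toy_peaks.median()
suspect_days = toy_peaks[toy_peaks > threshold]

print(suspect_days.index.min().date(), suspect_days.index.max().date(), len(suspect_days))
```

For a full year a rolling median would be a more sensible baseline, since the typical peak changes with the season.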

In [None]:
fig, axes = plt.subplots(ncols=2, figsize=(15, 5))
# to_datetime makes .weekday available even if the index holds plain date objects
max_data16['weekday'] = pd.to_datetime(max_data16.index).weekday # Monday = 0, Sunday = 6
january = max_data16.iloc[:31] # first 31 rows = January 2016
axes[0].plot(max_data16['max_load'], 'og-')
axes[1].plot(january['max_load'], 'og-')
axes[1].plot(january.loc[january['weekday'] >= 5, 'max_load'], 'sb') # Saturdays and Sundays in blue
axes[0].set(title='Maximum daily load 2016', xlabel='Date', ylabel='Peak load [MW]')
axes[1].set(title='Maximum daily load January 2016', xlabel='Date', ylabel='Peak load [MW]')
plt.show()

The maximum power demand oscillates, with daily peaks that are higher in July-August and December-January. The right-hand panel shows the consumption peaks during January. Demand on Saturdays and Sundays (blue symbols) is clearly lower than on workdays.
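The weekday/weekend difference can be summarized with a group-wise mean. The following sketch uses two invented weeks of daily peaks (45000 MW on workdays, 35000 MW on weekends, both values assumed purely for illustration) and groups them by day type.

```python
import numpy as np
import pandas as pd

# Two hypothetical weeks starting on Monday 4 January 2016
dates = pd.date_range("2016-01-04", periods=14, freq="D")
is_weekend = dates.weekday >= 5  # Saturday = 5, Sunday = 6

toy_peak_load = pd.Series(np.where(is_weekend, 35000.0, 45000.0), index=dates)

# Label each day and average the peaks within each label
day_type = np.where(is_weekend, "weekend", "workday")
mean_by_day_type = toy_peak_load.groupby(day_type).mean()

print(mean_by_day_type)
```

With the real data, the same grouping on `max_data16['max_load']` would give the average peak demand for workdays and weekends.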