# Forecasting Time Series of Fotovoltaic energy generation in Spain

Code Created by Luis Enrique Acevedo Galicia

Date: 2019-02-03

In this notebook, the fotovoltaic energy generation is explored.
Data was taken from https://www.ree.es/es/estadisticas-del-sistema-electrico-espanol/series-estadisticas/series-estadisticas-nacionales and it refers to rom the years 2007-2018, with a monthly frequency.

This data measures the real fotovoltaic generation of energy in Spain.

# Modules

Be sure you have all this modules installed
Pandas, plotly, cufflinks, pyramid-arima

In [1]:
import pandas as pd
import plotly

# Getting and processing the Data

In [2]:
data = pd.read_csv("FEnergy_ren_gen.csv", index_col = 0)
data.head()

Unnamed: 0_level_0,Energy_Gwh
DATE,Unnamed: 1_level_1
2007-01-01,10
2007-02-01,13
2007-03-01,22
2007-04-01,26
2007-05-01,36


Adding the timestamps to our data set

In [3]:
data.index = pd.to_datetime(data.index)
data.head()

Unnamed: 0_level_0,Energy_Gwh
DATE,Unnamed: 1_level_1
2007-01-01,10
2007-02-01,13
2007-03-01,22
2007-04-01,26
2007-05-01,36


Verifying that data is completed

In [4]:
data[pd.isnull(data["Energy_Gwh"])]

Unnamed: 0_level_0,Energy_Gwh
DATE,Unnamed: 1_level_1


A study of seasonal and trend behavior is presented.

In [5]:
from plotly.plotly import plot_mpl
from statsmodels.tsa.seasonal import seasonal_decompose
Rsl = seasonal_decompose(data, model='muliplicative')
fig = Rsl.plot()
plot_mpl(fig)

'https://plot.ly/~enrrique1/132'

In [6]:
import plotly.plotly as ply
import cufflinks as cf


In [7]:
data.iplot(title="Fotovoltaic Energy generation Jan 2007--Dec 2018", theme='solar')


Consider using IPython.display.IFrame instead



Here it can be seen that from 2007 the fotooltaic generation is an upward trend

In [8]:
from pyramid.arima import auto_arima



    The 'pyramid' package will be migrating to a new namespace beginning in 
    version 1.0.0: 'pmdarima'. This is due to a package name collision with the
    Pyramid web framework. For more information, see Issue #34:
    
        https://github.com/tgsmith61591/pyramid/issues/34
        
    The package will subsequently be installable via the name 'pmdarima'; the
    only functional change to the user will be the import name. All imports
    from 'pyramid' will change to 'pmdarima'.
    



Let's find the model with the lowest Akaike information criterion (AIC)

In [9]:
model_STW = auto_arima(data, start_p=1, start_q=2,
                           max_p=8, max_q=8, m=12,
                           start_P=1, seasonal=True,
                           d=1, D=1, trace=True,
                           error_action='ignore',  
                           suppress_warnings=True, 
                           stepwise=True) 

Fit ARIMA: order=(1, 1, 2) seasonal_order=(1, 1, 1, 12); AIC=1512.206, BIC=1532.332, Fit time=1.388 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 1, 0, 12); AIC=1559.207, BIC=1564.958, Fit time=0.017 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 0, 12); AIC=1512.772, BIC=1524.273, Fit time=0.378 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=1507.989, BIC=1519.489, Fit time=0.350 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(1, 1, 1, 12); AIC=1509.592, BIC=1523.968, Fit time=0.498 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 0, 12); AIC=1543.009, BIC=1551.634, Fit time=0.102 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 2, 12); AIC=1509.729, BIC=1524.105, Fit time=0.958 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(1, 1, 2, 12); AIC=1511.965, BIC=1529.216, Fit time=1.160 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=1509.977, BIC=1524.353, Fit time=0.604 seconds
Fit ARIMA: order=(0, 1, 0) s

In [10]:
model_STW.aic()

1507.9885381250278


# Definition of train and test data

review data set

In [11]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 144 entries, 2007-01-01 to 2018-12-01
Data columns (total 1 columns):
Energy_Gwh    144 non-null int64
dtypes: int64(1)
memory usage: 2.2 KB


We have almost 10 years of data. Lets train the model with 8 years and use the last two years to compre with the real data set.

In [12]:
Train_data = data.loc["2007-01-01":"2017-12-01"]
Test_data = data.loc["2017-01-01":]

# Training the model and forecasting

In [76]:
model_STW.fit(Train_data)

ARIMA(callback=None, disp=0, maxiter=50, method=None, order=(0, 1, 1),
   out_of_sample_size=0, scoring='mse', scoring_args={},
   seasonal_order=(0, 1, 1, 12), solver='lbfgs', start_params=None,

In [90]:
Forecasting_data = model_STW.predict(len(Test_data))

In [91]:
Forecasting_data

array([484.65064478, 509.7299975 , 743.66920917, 816.20242043,
       897.73734733, 925.67451415, 953.79029295, 879.55944587,
       781.08364258, 654.83178768, 514.55562581, 437.29691093,
       482.74672976, 507.02922548, 740.17158013, 811.90793438,
       892.64600428, 919.7863141 , 947.10523589, 872.0775318 ,
       772.8048715 , 645.75615959, 504.68314072, 426.62756883])

In [92]:
Forecasting_data = pd.DataFrame(Forecasting_data, index = Test_data.index, columns = ["Prediction"] )

In [80]:
Forecasting_data.head()

Unnamed: 0_level_0,Prediction
DATE,Unnamed: 1_level_1
2017-01-01,484.650645
2017-02-01,509.729998
2017-03-01,743.669209
2017-04-01,816.20242
2017-05-01,897.737347


In [81]:
Test_data.head()

Unnamed: 0_level_0,Energy_Gwh
DATE,Unnamed: 1_level_1
2017-01-01,476
2017-02-01,445
2017-03-01,717
2017-04-01,837
2017-05-01,873


In [82]:
pd.concat([Test_data, Forecasting_data], axis = 1).iplot()


Consider using IPython.display.IFrame instead



In [83]:
pd.concat([data, Forecasting_data], axis = 1).iplot()


Consider using IPython.display.IFrame instead



# Forecasting 2019 fotovoltaic production

In [106]:
Forecasting_data2 = model_STW.predict(len(Test_data)+12)
Forecasting_data2

array([484.65064478, 509.7299975 , 743.66920917, 816.20242043,
       897.73734733, 925.67451415, 953.79029295, 879.55944587,
       781.08364258, 654.83178768, 514.55562581, 437.29691093,
       482.74672976, 507.02922548, 740.17158013, 811.90793438,
       892.64600428, 919.7863141 , 947.10523589, 872.0775318 ,
       772.8048715 , 645.75615959, 504.68314072, 426.62756883,
       471.28053066, 494.76616936, 727.11166701, 798.05116426,
       877.99237715, 904.33582996, 930.85789474, 855.03333365,
       754.96381634, 627.11824742, 485.24837155, 406.39594265])

In [104]:
Dates = pd.date_range(start='1/1/2017', end='12/1/2019', freq='MS')


In [109]:
Forecasting_data2 = pd.DataFrame(Forecasting_data2, index = Dates, columns = ["Prediction"] )

In [108]:
Forecasting_data2.iplot(title="Fotovoltaic Energy generation Jan 2017--Dec 2019", theme='solar')


Consider using IPython.display.IFrame instead



Fotovoltaic energy will remain in the Spanish market with same trend. It is possible that in years coming, the use of renewable energies will be increased and it could be an opportunity to invert in this kind of technology.

Please take into account the seasonality of this technology, fotovoltaic can be utilized to decreased the use of fossil fuels, as main energy source during summer days and possible backup in winter days.