## Frequency Deviations 2015-2018: $\delta$
Raw data are available from [RTE](http://clients.rte-france.com/lang/fr/clients_producteurs/vie/vie_frequence.jsp). Missing samples are marked 'INVA' and are replaced by the nominal frequency (50 Hz), which corresponds to a deviation of 0. March 2016 only has frequency measurements for the first three days.
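
For reference, the normalized deviation computed in the loading cell can be written compactly as follows. The symbol $f_t$ (the measured frequency in Hz at time $t$) is introduced here for illustration and does not appear in the original; the constants 50 Hz and 0.2 Hz are taken from the code.

$$
\delta_t = \min\!\left(1,\ \max\!\left(-1,\ \frac{f_t - 50}{0.2}\right)\right)
$$

Under this definition a reading of 50.1 Hz gives $\delta_t = 0.5$, and any reading at or beyond 50 ± 0.2 Hz saturates at ±1.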

In [78]:
import pandas as pd
import numpy as np
from datetime import datetime

In [79]:
df = pd.DataFrame()
for y in ['2015','2016','2017','2018']:
    for m in ['01','02','03','04','05','06','07','08','09','10','11','12']:
        # declare name of text file for frequency data
        fname = 'RTE_Frequence_'+y+'_'+m+'.txt'
        # load data
        df_new = pd.read_csv(fname, sep=';', skiprows=1, skipfooter=1, encoding="ISO-8859-1", engine='python', decimal=',')
        df_new.drop('Unnamed: 2',axis=1,inplace=True)
        df_new.columns = ['Time','delta']
        df_new['Time'] = pd.to_datetime(df_new['Time'], format='%d/%m/%Y %H:%M:%S')
        # Data imputation: missing samples are marked 'INVA'; replace them by the nominal frequency (50 Hz)
        df_new['delta'] = df_new['delta'].replace('INVA', np.nan)
        df_new['delta'] = df_new['delta'].fillna(50)
        # Normalized frequency deviation: (f - 50 Hz) / 0.2 Hz, clipped to [-1, 1].
        # Clip only the 'delta' column; assigning through a row mask would also overwrite 'Time'.
        df_new['delta'] = ((df_new['delta'].astype(float) - 50) / 0.2).clip(-1, 1)
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        df = pd.concat([df, df_new], ignore_index=True)
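
The imputation, normalization and clipping steps in the loop above can be checked on a small in-memory sample. This is a minimal sketch, not the notebook's own code: the helper name `normalized_deviation` and the constants `NOMINAL_HZ` and `FULL_SCALE_HZ` are hypothetical, and `pd.to_numeric(..., errors='coerce')` is swapped in for the explicit `'INVA'` replacement, on the assumption that any non-numeric marker should be treated as missing.

```python
import pandas as pd

NOMINAL_HZ = 50.0     # nominal grid frequency (from the notebook)
FULL_SCALE_HZ = 0.2   # deviation in Hz that maps to |delta| = 1 (from the notebook)


def normalized_deviation(freq: pd.Series) -> pd.Series:
    """Hypothetical helper: map raw frequency readings to delta in [-1, 1]."""
    # Non-numeric markers such as 'INVA' become NaN, then the nominal frequency.
    f = pd.to_numeric(freq, errors='coerce').fillna(NOMINAL_HZ)
    # Normalize and saturate; only this one column is touched.
    return ((f - NOMINAL_HZ) / FULL_SCALE_HZ).clip(-1, 1)


# Toy readings: in range, missing, above range, below range.
raw = pd.Series(['50.02', 'INVA', '50.30', '49.70'])
delta = normalized_deviation(raw)
print(delta)
```

The missing reading maps to a deviation of zero, and the two out-of-range readings saturate at the bounds, while the timestamps (not shown here) would be left unchanged.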

In [80]:
# Use time as index
df.set_index('Time', inplace=True)
# Remove duplicate values from daylight saving time 
df = df[~df.index.duplicated(keep='first')]
# Data Completion: complete missing data by the mean deviation (zero)
df = df.reindex(pd.date_range('01-01-2015 00:00:00', '12-31-2018 23:59:50', freq='10s'), fill_value=0)
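
The de-duplication and completion steps above can be illustrated on a toy index. This is a sketch under the assumption that the index holds naive local timestamps on a 10 s grid; the names `toy`, `grid` and `expected_rows` are introduced here for illustration only.

```python
import pandas as pd

# Toy data with one duplicated timestamp and one missing 10 s step.
idx = pd.to_datetime([
    '2015-01-01 00:00:00',
    '2015-01-01 00:00:10',
    '2015-01-01 00:00:10',  # duplicate, as at a daylight-saving changeover
    '2015-01-01 00:00:30',  # 00:00:20 is missing
])
toy = pd.DataFrame({'delta': [0.1, 0.2, 0.9, -0.3]}, index=idx)

# Keep the first of each duplicated timestamp.
toy = toy[~toy.index.duplicated(keep='first')]

# Reindex onto the full 10 s grid; gaps are filled with zero deviation.
grid = pd.date_range('2015-01-01 00:00:00', '2015-01-01 00:00:30', freq='10s')
toy = toy.reindex(grid, fill_value=0)
print(toy)

# Number of rows the full 2015-2018 grid should contain, computed arithmetically.
span = pd.Timestamp('2018-12-31 23:59:50') - pd.Timestamp('2015-01-01 00:00:00')
expected_rows = int(span / pd.Timedelta('10s')) + 1
```

Comparing `len(df)` with `expected_rows` after the reindex is a cheap sanity check that the completed dataset covers the whole period without gaps or duplicates.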

In [83]:
# Store the dataset
df.to_hdf('delta_10s.h5', key='df')