# Time Series Sampling

## Introduction

Open Data API (abbreviated to `odapi`) provides convenient interfaces, such as the `TimeSerieAPI` interface, that make time series management easier for the end user.

Connectors such as the `Irceline` connector are derived from those interfaces. As a first example, we will download a trial dataset from the [Irceline][1] [API][2] using the eponymous connector. The dataset created here will be used in subsequent notebooks to show the time series capabilities of the package.

[1]: https://www.irceline.be/en
[2]: https://github.com/irceline/open_data

### Import package

All we need is to import the `Irceline` connector from the `odapi` package:

In [1]:
from odapi.connectors import Irceline

Additionally, we may silence the logs, as the `odapi` package is a bit verbose:

In [2]:
from odapi.settings import settings
settings.logger.setLevel(40)
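
The literal `40` is the numeric value of `logging.ERROR` in the Python standard library. The sketch below shows its effect on a stand-in logger; it assumes `settings.logger` behaves like a regular `logging.Logger`, and the logger name `odapi-demo` is hypothetical:

```python
import logging

# Stand-in logger, assumed to behave like odapi's settings.logger
logger = logging.getLogger("odapi-demo")
logger.setLevel(logging.ERROR)  # equivalent to setLevel(40)

error_level = logging.ERROR
warnings_shown = logger.isEnabledFor(logging.WARNING)  # below the threshold
errors_shown = logger.isEnabledFor(logging.ERROR)      # at the threshold

print(error_level, warnings_shown, errors_shown)
```

With this threshold, warnings and informational messages are dropped while errors and critical messages still get through.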

### Create connector

Create an instance of the `Irceline` connector to control the underlying Irceline API:

In [3]:
client = Irceline()

Now, we are ready to use the Open Data API.

## Create Dataset

We will now create a typical air quality dataset to feed subsequent examples. We will sample air quality data for Brussels City (Central Europe) over the calendar years 2012 through 2015.

### Metadata

All interfaces from `odapi` expose metadata. We fetch the metadata from the Irceline API to see what it holds. Because we are dealing with air quality time series, each metadata row describes a measurement channel. We sample 10 channels at random and show the first 8 columns:

In [4]:
meta = client.meta
meta.sample(10).iloc[:,:8]

| | serieid | siteid | measureid | serieunits | measurekey | measurename | sitekey | sitename |
|---|---|---|---|---|---|---|---|---|
| 42 | 100027 | 1740 | 391 | µg/m³ | BC | Black Carbon | 43R202 | Liège |
| 119 | 10928 | 1713 | 8 | µg/m³ | NO2 | Nitrogen dioxide | 42R805 | Antwerpen (Belgiëlei) |
| 165 | 7067 | 1206 | 8 | µg/m³ | NO2 | Nitrogen dioxide | 44N052 | Zwevegem |
| 536 | 6215 | 1048 | 1 | µg/m³ | SO2 | Sulphur dioxide | 40HB23 | Hoboken |
| 321 | 10857 | 1132 | 6002 | µg/m³ | PM-1.0 | Particulate Matter < 1 µm | 42N016 | Dessel |
| 491 | 7152 | 1220 | 6001 | µg/m³ | PM-2.5 | Particulate Matter < 2.5 µm | 45R511 | Marcinelle |
| 418 | 7135 | 1217 | 5 | µg/m³ | PM-10.0 | Particulate Matter < 10 µm | 45R501 | Charleroi |
| 211 | 6614 | 1119 | 38 | µg/m³ | NO | Nitrogen monoxide | 41R002 | Ixelles |
| 471 | 6901 | 1174 | 6001 | µg/m³ | PM-2.5 | Particulate Matter < 2.5 µm | 43N063 | Corroy-Le-Grand |
| 159 | 7006 | 1192 | 8 | µg/m³ | NO2 | Nitrogen dioxide | 43R223 | Jemeppe |


The complete list of available metadata is:

In [5]:
meta.dtypes

serieid                      object
siteid                        int64
measureid                    object
serieunits                   object
measurekey                   object
measurename                  object
sitekey                      object
sitename                     object
seriekey                     object
molarmass                   float64
factor                      float64
sitelocation                 object
sitetype                     object
lat                         float64
lon                         float64
nuts1id                      object
nuts2id                      object
nuts3id                      object
nuts1name                    object
nuts2name                    object
nuts3name                    object
lauid                        object
launame                      object
started         datetime64[ns, UTC]
stopped         datetime64[ns, UTC]
dtype: object

The main columns we are concerned about are:

In [6]:
keys = ['serieid', 'seriekey', 'measurekey', 'sitekey', 'measurename', 'sitename']

In [7]:
client.meta.measurekey.unique()

array(['1,2-XYLENE O-XYLENE', 'Ammonia', 'p', 'BZN', 'BC', 'CO2', 'CO',
       'Hg', 'EBZ', 'MPX', 'NO2', 'NO', 'O3', 'PM-1.0', 'PM-10.0',
       'PM-2.5', 'Relative Humidity', 'SO2', 'T', 'TOL', 'WD', 'WS'],
      dtype=object)

#### Selection 

From this, we can make a precise selection of measurement channels using the `select` method (sites whose `sitekey` starts with `41` are located in Brussels):

In [8]:
sel = client.select(sitekey='41....',
                    measurekey=['NO', 'O3', 'CO', 'SO2',
                                'PM-', 'BC', 'RH', 'T$', 'W', 'p'])[keys]
sel

| | serieid | seriekey | measurekey | sitekey | measurename | sitename |
|---|---|---|---|---|---|---|
| 30 | 11009 | p/41R001 (hPa) | p | 41R001 | Atmospheric Pressure | Molenbeek-Saint-Jean |
| 48 | 10607 | BC/41R012 (µg/m³) | BC | 41R012 | Black Carbon | Uccle |
| 51 | 10693 | BC/41N043 (µg/m³) | BC | 41N043 | Black Carbon | Haren |
| 64 | 6569 | BC/41R001 (µg/m³) | BC | 41R001 | Black Carbon | Molenbeek-Saint-Jean |
| 65 | 6609 | BC/41R002 (µg/m³) | BC | 41R002 | Black Carbon | Ixelles |
| ... | ... | ... | ... | ... | ... | ... |
| 620 | 99941 | T/41R012 (°C) | T | 41R012 | Temperature | Uccle |
| 626 | 99915 | WD/41R001 (°G) | WD | 41R001 | Wind Direction | Molenbeek-Saint-Jean |
| 627 | 99939 | WD/41R012 (°G) | WD | 41R012 | Wind Direction | Uccle |
| 632 | 99916 | WS/41R001 (m/s) | WS | 41R001 | Wind Speed (scalar) | Molenbeek-Saint-Jean |
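
The arguments passed to `select` read like regular expressions (`41....`, `T$`). How `select` matches them internally is not shown in this notebook, so the following is only a sketch under the assumption that each pattern is matched from the start of the key, as `re.match` does. The helper `matches_any` is hypothetical and not part of `odapi`:

```python
import re

# Measure keys as listed by client.meta.measurekey.unique() above
measure_keys = ['1,2-XYLENE O-XYLENE', 'Ammonia', 'p', 'BZN', 'BC', 'CO2', 'CO',
                'Hg', 'EBZ', 'MPX', 'NO2', 'NO', 'O3', 'PM-1.0', 'PM-10.0',
                'PM-2.5', 'Relative Humidity', 'SO2', 'T', 'TOL', 'WD', 'WS']
patterns = ['NO', 'O3', 'CO', 'SO2', 'PM-', 'BC', 'RH', 'T$', 'W', 'p']


def matches_any(key, patterns):
    """Hypothetical helper: True if any pattern matches at the start of key."""
    return any(re.match(pattern, key) for pattern in patterns)


selected = [key for key in measure_keys if matches_any(key, patterns)]
print(selected)

# Site keys taken from the metadata sample above; only the first is in Brussels
site_keys = ['41R001', '43R202', '42R805']
brussels = [key for key in site_keys if re.match('41....', key)]
print(brussels)
```

Under this assumption, `T$` keeps `T` but rejects `TOL`, `NO` also picks up `NO2`, and `RH` matches nothing because the corresponding key is spelled `Relative Humidity`.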


The following pivot table gives a complete overview of the selection (cells hold series identifiers, `-` marks a missing channel):

In [10]:
sel.pivot_table(index='sitekey', columns='measurekey',
                values='serieid', aggfunc='first')\
   .style.format('{}', na_rep='-')

| sitekey | BC | CO | CO2 | NO | NO2 | O3 | PM-10.0 | PM-2.5 | SO2 | T | WD | WS | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41B001 | - | 6500 | - | 6503 | 6504 | - | - | - | 6502 | - | - | - | - |
| 41B004 | - | 6506 | - | 6507 | 6508 | 6509 | - | - | - | - | - | - | - |
| 41B006 | - | 6514 | - | 6515 | 6516 | 6517 | - | - | - | - | - | - | - |
| 41B008 | - | 10616 | - | 10613 | 10614 | - | - | - | 10615 | - | - | - | - |
| 41B011 | - | - | - | 6527 | 6528 | 6530 | 6531 | 6532 | - | 99914 | - | - | - |
| 41CHA1 | - | - | - | 100035 | 100036 | - | - | - | - | - | - | - | - |
| 41MEU1 | - | - | - | 6550 | 6551 | 10766 | 6552 | 6553 | 6549 | - | - | - | - |
| 41N043 | 10693 | 6558 | - | 6560 | 6561 | 6562 | 6563 | 6564 | 6559 | - | - | - | - |
| 41R001 | 6569 | 6571 | - | 6573 | 6574 | 6577 | 6578 | 6579 | 6572 | 99917 | 99915 | 99916 | 11009 |
| 41R002 | 6609 | 6611 | 6612 | 6614 | 6615 | - | - | - | 6613 | - | - | - | - |
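
The overview relies on `aggfunc='first'` because the default aggregation of `pivot_table` is the mean, which is not meaningful for identifiers stored as strings. A minimal sketch on a toy frame (identifiers copied from the tables above, all other names are illustrative):

```python
import pandas as pd

# Toy metadata: one row per measurement channel
toy = pd.DataFrame({
    "sitekey": ["41R001", "41R001", "41R002", "41R002", "41B004"],
    "measurekey": ["NO", "NO2", "NO", "NO2", "O3"],
    "serieid": ["6573", "6574", "6614", "6615", "6509"],
})

# One row per site, one column per measure, first identifier found in each cell
overview = toy.pivot_table(index="sitekey", columns="measurekey",
                           values="serieid", aggfunc="first")
missing = int(overview.isna().sum().sum())

print(overview)
print(missing)
```

Site and measure combinations without a channel come out as missing values, which the styled table above renders as `-` through `na_rep`.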


### Records

Using the selection done above, we can fetch records on a defined time range using the `get_records` method:

In [11]:
recs = client.get_records(sel, start='2012-01-01 00:00:00+0100',
                          stop='2016-01-01 00:00:00+0100')

We convert the timestamps to the local time zone (`odapi` stores timestamps in UTC internally):

In [12]:
recs['start'] = recs['start'].dt.tz_convert('CET')
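
Note that `tz_convert` changes only how a timestamp is displayed, not the instant it refers to. A small sketch on two hand-written UTC timestamps (toy values, not fetched from the API):

```python
import pandas as pd

# Two timezone-aware UTC timestamps around the end of 2015
utc = pd.Series(pd.to_datetime(["2015-12-31 22:00:00",
                                "2015-12-31 23:00:00"], utc=True))

# Same instants, expressed in Central European Time
local = utc.dt.tz_convert("CET")

last_local = local.iloc[-1].isoformat()
same_instant = bool((local == utc).all())

print(last_local, same_instant)
```

The last UTC hour of 2015 becomes midnight of 2016 in local time, which is worth keeping in mind when slicing the dataset by calendar year.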

We pivot the records to align the time series on a common index, then resample to ensure the time axis is regular:

In [13]:
data = recs.merge(sel[keys])
data = data.pivot_table(index='start',
                        columns=['seriekey', 'sitekey', 'measurekey', 'serieid'],
                        values='value')
data = data.resample('1H').first()
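
To see why the resampling step matters, here is a sketch on toy records where the 20:00 hour is deliberately left out. It uses the `60min` alias rather than `1H`, since recent pandas releases deprecate the uppercase `H` alias; the column names mirror those of `recs`, but the frame itself is made up for illustration:

```python
import pandas as pd

# Long-format toy records; no record exists for 20:00
records = pd.DataFrame({
    "start": pd.to_datetime(["2015-12-31 19:00", "2015-12-31 19:00",
                             "2015-12-31 21:00", "2015-12-31 21:00"], utc=True),
    "seriekey": ["NO/41R001", "NO2/41R001", "NO/41R001", "NO2/41R001"],
    "value": [7.5, 48.5, 2.0, 18.5],
})

# Wide format: one column per series, one row per timestamp present in the data
wide = records.pivot_table(index="start", columns="seriekey", values="value")

# Regular hourly axis: the missing hour shows up as a row of NaN
regular = wide.resample("60min").first()

print(len(wide), len(regular))
print(regular)
```

After pivoting alone, the index simply skips the missing hour; resampling inserts it explicitly, so every series shares the same evenly spaced time axis.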

A selection of the final dataframe looks like:

In [14]:
data.filter(regex='NO.*/41R00(1|2)').tail()

| seriekey | NO/41R001 (µg/m³) | NO/41R002 (µg/m³) | NO2/41R001 (µg/m³) | NO2/41R002 (µg/m³) |
|---|---|---|---|---|
| sitekey | 41R001 | 41R002 | 41R001 | 41R002 |
| measurekey | NO | NO | NO2 | NO2 |
| serieid | 6573 | 6614 | 6574 | 6615 |
| start | | | | |
| 2015-12-31 19:00:00+01:00 | 7.5 | 36.0 | 48.5 | 61.5 |
| 2015-12-31 20:00:00+01:00 | 3.5 | 17.5 | 36.0 | 46.5 |
| 2015-12-31 21:00:00+01:00 | 2.0 | 14.5 | 18.5 | 33.5 |
| 2015-12-31 22:00:00+01:00 | | 8.0 | | 23.0 |
| 2015-12-31 23:00:00+01:00 | 2.0 | 8.5 | 9.0 | 22.5 |


And finally, we draw some Time Series:

We store the final dataframe for subsequent examples (see the next notebooks):

In [15]:
data.to_pickle("brussels_2012-2016.pickle")
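
Pickle is a convenient format here because it keeps the timezone-aware index and the column structure intact, so later notebooks can reload the frame with `pd.read_pickle`. A round-trip sketch on a toy frame written to a temporary directory (the file name `toy.pickle` is illustrative):

```python
import os
import tempfile

import pandas as pd

# Toy frame with a timezone-aware index, similar in spirit to `data`
index = pd.to_datetime(["2015-12-31 19:00", "2015-12-31 20:00"],
                       utc=True).tz_convert("CET")
frame = pd.DataFrame({"NO/41R001": [7.5, 3.5]}, index=index)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy.pickle")
    frame.to_pickle(path)
    restored = pd.read_pickle(path)

same = restored.equals(frame)
keeps_timezone = restored.index.tz is not None

print(same, keeps_timezone)
```

As with any pickle file, only load files that come from a trusted source.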