# Usage and Analysis Examples

This notebook shows some basic manipulations that can be done with this module. 

First, let's import the plotting library (`matplotlib`) and pick a nice style:

In [5]:
%matplotlib nbagg
import matplotlib.pyplot as plt
plt.style.use('ggplot')

Then import the module and create a `data_api` object:

In [1]:
import data_api
api = data_api.configure()

You can start by searching for channels, e.g.:

In [2]:
api.search_channel('FOR-PHASE-AVG')

[{'backend': 'sf-databuffer',
  'channels': ['S10CB01-RBOC-DCP10:FOR-PHASE-AVG',
   'S10CB01-RIQM-DCP10:FOR-PHASE-AVG',
   'S10CB01-RKLY-DCP10:FOR-PHASE-AVG',
   'S10CB01-RPRE-DCP10:FOR-PHASE-AVG',
   'S10CB01-RWVG100-DCP10:FOR-PHASE-AVG',
   'S10CB01-RWVG200-DCP10:FOR-PHASE-AVG',
   'S10CB01-RWVG300-DCP10:FOR-PHASE-AVG',
   'S10CB01-RWVG400-DCP10:FOR-PHASE-AVG',
   'SINDI01-RIQM-DCP10:FOR-PHASE-AVG',
   'SINDI01-RKLY-DCP10:FOR-PHASE-AVG',
   'SINDI01-RPRE-DCP10:FOR-PHASE-AVG',
   'SINDI01-RWVG100-DCP10:FOR-PHASE-AVG',
   'SINEG01-RIQM-DCP10:FOR-PHASE-AVG',
   'SINEG01-RKLY-DCP10:FOR-PHASE-AVG',
   'SINEG01-RPRE-DCP10:FOR-PHASE-AVG',
   'SINEG01-RWVG100-DCP10:FOR-PHASE-AVG',
   'SINSB01-RIQM-DCP10:FOR-PHASE-AVG',
   'SINSB01-RKLY-DCP10:FOR-PHASE-AVG',
   'SINSB01-RPRE-DCP10:FOR-PHASE-AVG',
   'SINSB01-RWVG100-DCP10:FOR-PHASE-AVG',
   'SINSB02-RIQM-DCP10:FOR-PHASE-AVG',
   'SINSB02-RKLY-DCP10:FOR-PHASE-AVG',
   'SINSB02-RPRE-DCP10:FOR-PHASE-AVG',
   'SINSB02-RWVG100-DCP10:FOR-PHASE-AVG'

Then, you can get data from some channels, within a certain range. Ranges can be of type:
* `date`: e.g. "2016-07-26 16:00"
* `globalSeconds`
* `pulseId`

You define a range by giving two of its three values: start, end, and delta. You can also choose whether the resulting table is indexed by `date`, `globalSeconds` or `pulseId`:

In [3]:
data = api.get_data(channels=[
        'SINSB02-RIQM-DCP10:FOR-PHASE-AVG', 
        'SINDI01-RIQM-DCP10:FOR-PHASE-AVG', 
        'S10CB01-RIQM-DCP10:FOR-PHASE-AVG',
       ], 
                  start="2016-07-27 16:11", end="2016-07-27 16:12", 
                  index_field="date")
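The "two of the three" rule for ranges can be sketched in plain Python. The helper below, `resolve_range`, is hypothetical and written only for illustration; it is not part of `data_api`, and the module may handle ranges differently internally.

```python
from datetime import datetime, timedelta


def resolve_range(start=None, end=None, delta=None):
    """Return (start, end) given exactly two of start, end, delta.

    Hypothetical helper for illustration only; not part of data_api.
    """
    given = [value is not None for value in (start, end, delta)]
    if sum(given) != 2:
        raise ValueError("specify exactly two of start, end, delta")
    if start is None:
        start = end - delta
    elif end is None:
        end = start + delta
    return start, end


# Date range: start plus a one-minute delta
start = datetime.strptime("2016-07-27 16:11", "%Y-%m-%d %H:%M")
window = resolve_range(start=start, delta=timedelta(minutes=1))
print(window)

# The same logic applies to numeric ranges such as pulse IDs
pulse_window = resolve_range(end=200, delta=50)
print(pulse_window)
```

The same arithmetic works for any range type whose values support `+` and `-`, which is why one helper covers dates and integer counters alike.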

Then, let's look at the first 5 rows:

In [4]:
data.head()

| date | SINSB02-RIQM-DCP10:FOR-PHASE-AVG | SINDI01-RIQM-DCP10:FOR-PHASE-AVG | S10CB01-RIQM-DCP10:FOR-PHASE-AVG |
|---|---|---|---|
| 2016-07-27 14:11:00.003908 | -48.03106 | 58.474915 | 144.4767 |
| 2016-07-27 14:11:00.013908 | -48.02903 | 58.4745 | 144.46846 |
| 2016-07-27 14:11:00.023908 | -48.028538 | 58.47647 | 144.47505 |
| 2016-07-27 14:11:00.033908 | -48.024765 | 58.47751 | 144.47835 |
| 2016-07-27 14:11:00.043908 | -48.028206 | 58.47304 | 144.49373 |
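If you do not have access to the data backend, a synthetic table with the same layout can stand in for `data` when trying out the cells that follow. This is a minimal sketch: the channel names, the 10 ms sample spacing, and the approximate means and standard deviations are taken from this notebook's outputs, while the Gaussian noise model and the name `fake_data` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# (mean, standard deviation) per channel, rounded from this notebook's outputs.
# Drawing the samples from a normal distribution is an assumption.
channels = {
    'SINSB02-RIQM-DCP10:FOR-PHASE-AVG': (-48.03, 0.004),
    'SINDI01-RIQM-DCP10:FOR-PHASE-AVG': (58.47, 0.006),
    'S10CB01-RIQM-DCP10:FOR-PHASE-AVG': (144.48, 0.017),
}

rng = np.random.RandomState(0)

# One minute of samples spaced 10 ms apart, like the rows shown above
index = pd.date_range("2016-07-27 14:11:00.003908", periods=6000,
                      freq=pd.Timedelta(milliseconds=10), name="date")

fake_data = pd.DataFrame(
    {name: rng.normal(mean, std, size=len(index))
     for name, (mean, std) in channels.items()},
    index=index,
)
print(fake_data.head())
```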


## Plotting

Pandas comes with some nice plotting utilities based on `matplotlib`:

In [6]:
data.plot(style='.-')

<matplotlib.axes._subplots.AxesSubplot at 0x1156b16a0>

Also, doing a *box* plot is extremely easy:

In [7]:
data.plot(kind='box')

<matplotlib.axes._subplots.AxesSubplot at 0x117ce5b38>

There are also some tools for more complex plotting, like a *scatter matrix* with kernel density estimates (KDE) on the diagonal:

In [8]:
# scatter_matrix moved from pandas.tools.plotting to pandas.plotting in pandas 0.20
from pandas.plotting import scatter_matrix

scatter_matrix(data, alpha=0.2, figsize=(10, 10), diagonal='kde')

array([[<matplotlib.axes._subplots.AxesSubplot object at 0x117c36278>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11834f1d0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x1117aa160>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x1181cb278>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x118390668>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11875b5f8>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x1187ab1d0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11910c6d8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11915a3c8>]], dtype=object)

More examples can be found here: http://pandas.pydata.org/pandas-docs/stable/visualization.html

## Data reduction

You can reduce the data using the API, in two steps. (NB: at the moment the aggregation is always performed on the client side; this will change in the future.)
* first, configure the aggregation
* then, retrieve the data

In [9]:
api.set_aggregation(aggregations=["min", "mean", "max"], pulses_per_bin=100)
reduced_data = api.get_data(channels=[
        'SINSB02-RIQM-DCP10:FOR-PHASE-AVG', 
        'SINDI01-RIQM-DCP10:FOR-PHASE-AVG', 
        'S10CB01-RIQM-DCP10:FOR-PHASE-AVG',
       ], 
                  start="2016-07-27 16:11", end="2016-07-27 16:12", 
                  index_field="pulseId")

min
mean
max


In [10]:
reduced_data.columns

Index(['SINSB02-RIQM-DCP10:FOR-PHASE-AVG:min',
       'SINDI01-RIQM-DCP10:FOR-PHASE-AVG:min',
       'S10CB01-RIQM-DCP10:FOR-PHASE-AVG:min',
       'SINSB02-RIQM-DCP10:FOR-PHASE-AVG:mean',
       'SINDI01-RIQM-DCP10:FOR-PHASE-AVG:mean',
       'S10CB01-RIQM-DCP10:FOR-PHASE-AVG:mean',
       'SINSB02-RIQM-DCP10:FOR-PHASE-AVG:max',
       'SINDI01-RIQM-DCP10:FOR-PHASE-AVG:max',
       'S10CB01-RIQM-DCP10:FOR-PHASE-AVG:max'],
      dtype='object')
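To make the idea of binning by pulses concrete, here is a minimal pandas sketch that groups every 100 consecutive rows and computes `min`, `mean` and `max` for each bin. It is an illustration under assumptions, not the module's actual implementation: the channel name `CH-A` and the synthetic values are made up, and the column order differs from the output above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 300 consecutive pulses of one hypothetical channel
pulse_ids = np.arange(1000, 1300)
df = pd.DataFrame({"CH-A": np.arange(300, dtype=float)},
                  index=pd.Index(pulse_ids, name="pulseId"))

pulses_per_bin = 100
# Integer-divide the row position to assign each pulse to a bin
bin_id = np.arange(len(df)) // pulses_per_bin

binned = df.groupby(bin_id).agg(["min", "mean", "max"])
# Flatten (channel, aggregation) pairs into 'channel:aggregation' labels
binned.columns = ["{}:{}".format(ch, agg) for ch, agg in binned.columns]
print(binned)
```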

In [11]:
quantities = ['SINSB02-RIQM-DCP10:FOR-PHASE-AVG:max',
       'SINDI01-RIQM-DCP10:FOR-PHASE-AVG:max',
       'S10CB01-RIQM-DCP10:FOR-PHASE-AVG:max']
reduced_data[quantities].plot()

<matplotlib.axes._subplots.AxesSubplot at 0x119120e80>

You can also do a lot of statistics directly on the client! In case you are wondering about mean and standard deviation of the sample:

In [12]:
data.mean(), data.std()

(SINSB02-RIQM-DCP10:FOR-PHASE-AVG    -48.029730
 SINDI01-RIQM-DCP10:FOR-PHASE-AVG     58.468576
 S10CB01-RIQM-DCP10:FOR-PHASE-AVG    144.484548
 dtype: float64, SINSB02-RIQM-DCP10:FOR-PHASE-AVG    0.004088
 SINDI01-RIQM-DCP10:FOR-PHASE-AVG    0.005572
 S10CB01-RIQM-DCP10:FOR-PHASE-AVG    0.017284
 dtype: float64)
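If you prefer a single table over a tuple of two series, `DataFrame.agg` accepts a list of statistics. A small sketch on a toy frame (the column names `a` and `b` are placeholders, not channels from this notebook):

```python
import pandas as pd

toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 10.0, 10.0]})

# One row per statistic, one column per channel
summary = toy.agg(["mean", "std"])
print(summary)
```

Applied to the table retrieved earlier, this would read `data.agg(["mean", "std"])`.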

You can also rebin the data, and compute various statistical quantities on each bin:

In [13]:
import numpy as np

# Define pulses per bin
pulses_per_bin = 100

# Bin number of each row: rows 0-99 go to bin 0, rows 100-199 to bin 1, ...
bin_mask = np.arange(len(data.index)) // pulses_per_bin
# Index label at which each bin starts
bins = data.index[::pulses_per_bin]
# Group the dataframe by bin number
groups = data.groupby(bin_mask)
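When the table is indexed by `date`, binning by time rather than by row count is another option. The sketch below uses `DataFrame.resample` on a toy frame; the one-second bin width and the column name `value` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy series: 300 samples spaced 10 ms apart, i.e. three seconds of data
index = pd.date_range("2016-07-27 14:11:00", periods=300,
                      freq=pd.Timedelta(milliseconds=10))
toy = pd.DataFrame({"value": np.arange(300, dtype=float)}, index=index)

# Group samples into one-second bins and compute statistics per bin
per_second = toy.resample(pd.Timedelta(seconds=1)).agg(["mean", "std"])
print(per_second)
```

Unlike row-count binning, time-based bins keep a fixed duration even if some samples are missing.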

In [14]:
# see some statistical properties
groups.describe()

| | | SINSB02-RIQM-DCP10:FOR-PHASE-AVG | SINDI01-RIQM-DCP10:FOR-PHASE-AVG | S10CB01-RIQM-DCP10:FOR-PHASE-AVG |
|---|---|---|---|---|
| 0 | count | 100.000000 | 100.000000 | 100.000000 |
| 0 | mean | -48.029138 | 58.476526 | 144.479204 |
| 0 | std | 0.004016 | 0.002495 | 0.016330 |
| 0 | min | -48.039455 | 58.467888 | 144.432200 |
| 0 | 25% | -48.031642 | 58.475171 | 144.467772 |
| 0 | 50% | -48.028609 | 58.476551 | 144.476700 |
| 0 | 75% | -48.026139 | 58.477882 | 144.491530 |
| 0 | max | -48.018803 | 58.483948 | 144.514600 |
| 1 | count | 100.000000 | 100.000000 | 100.000000 |
| 1 | mean | -48.030343 | 58.476973 | 144.478760 |


And then plot, for example, the 75th percentile of each bin:

In [15]:
groups.describe().xs('75%', level=1).plot()


<matplotlib.axes._subplots.AxesSubplot at 0x1141d5240>
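Note that the layout of `groupby(...).describe()` depends on the pandas version: from pandas 0.20 on, the statistics are returned as columns rather than as a second index level, so `xs('75%', level=1)` may fail there. A sketch that should not depend on that layout computes the quantile directly; the toy frame and its column name `value` are placeholders.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"value": np.arange(200, dtype=float)})
bin_mask = np.arange(len(toy)) // 100

# 75th percentile of each bin, one row per bin
q75 = toy.groupby(bin_mask).quantile(0.75)
print(q75)
```

With the grouped table from this notebook, the equivalent call would be `groups.quantile(0.75).plot()`.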