# Getting started with TimeSeries

In [92]:
# relevant imports
import numpy as np
from matplotlib import pyplot as plt
%matplotlib notebook
from ptsa.data.TimeSeriesX import TimeSeriesX as timeseries

## 1. Start with creating some data

<p>In real applications, you will most likely have your own timeseries data for analysis.  
To illustrate the functionality of the timeseries object, we will construct sinusoids as our timeseries data.  
Our timeseries data will consist of 5000 data points, or samples. If the sampling rate is 10Hz, our timeseries is 5000/10=500 seconds long.
</p>

In [111]:
num_points = 5000
sample_rate = 10.
t = np.linspace(1, num_points, num_points) / sample_rate

# Let's create two noisy sinusoids with different frequencies.
frequency1 = .5 # 1 cycle every 2 seconds
frequency2 = .1 # 1 cycle every 10 seconds
data1 = np.sin(2*np.pi*frequency1*t) + np.random.uniform(-0.5, 0.5, num_points)
data2 = np.sin(2*np.pi*frequency2*t) + np.random.uniform(-0.5, 0.5, num_points)

<p>We can specify the timestamps for each data point, from 0.1s to 500s.</p>

In [112]:
print 'First 5 timestamps: ', t[:5]
print 'Last 5 timestamps: ', t[-5:]

First 5 timestamps:  [0.1 0.2 0.3 0.4 0.5]
Last 5 timestamps:  [499.6 499.7 499.8 499.9 500. ]
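
The relationship between the number of samples, the sampling rate, and the duration can be checked directly. The sketch below rebuilds the same timestamps with `np.arange`, as an alternative to the `np.linspace` call above. The names `duration` and `t_alt` are introduced here for illustration only.

```python
import numpy as np

num_points = 5000
sample_rate = 10.0  # samples per second (Hz)

# Duration in seconds = number of samples / samples per second
duration = num_points / sample_rate

# The k-th sample (k = 1..num_points) has timestamp k / sample_rate,
# so the first sample falls at 0.1 s and the last at 500 s.
t_alt = np.arange(1, num_points + 1) / sample_rate

print('duration: %.1f s' % duration)
print('first: %.1f s, last: %.1f s, count: %d' % (t_alt[0], t_alt[-1], len(t_alt)))
```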


<p>We can visualize the timeseries using matplotlib.</p>

In [114]:
plt.figure(figsize=[10,2])
plt.plot(t, data1, label='.5Hz')
plt.plot(t, data2, label='.1Hz')

plt.legend()

<matplotlib.legend.Legend at 0x2aca5c73d450>

<p>As we zoom in, the random noise we added to the sinusoids becomes clear.</p>

In [115]:
plt.figure(figsize=[10, 2])
plt.plot(t[500:1000], data1[500:1000], label='.5Hz')
plt.plot(t[500:1000], data2[500:1000], label='.1Hz')
plt.legend()

<matplotlib.legend.Legend at 0x2aca5cdb6b90>

## 2. Create a TimeSeries object

<p>The TimeSeries class is a convenient wrapper around xarray that offers functionality for timeseries analysis. Although we focus here on timeseries data, many of the following examples also apply to non-timeseries, multidimensional datasets.  
To create a TimeSeries object, we just need to specify the dimensions of a multidimensional array and the coordinates along those dimensions.</p>

In [116]:
# Let's stack the two time-series data arrays.
data = np.vstack((data1, data2))

# Constructing the timeseries object
ts = timeseries(data,
                dims=('data', 'time'),
                coords={'data':['data1', 'data2'],
                        'time':t,
                        'samplerate':sample_rate})
print ts

<xarray.TimeSeriesX (data: 2, time: 5000)>
array([[-0.088652,  0.553644,  1.278523, ..., -1.021133, -0.710079, -0.469294],
       [ 0.416845,  0.351558,  0.215291, ...,  0.347364, -0.049885,  0.424543]])
Coordinates:
    samplerate  float64 10.0
  * data        (data) |S5 'data1' 'data2'
  * time        (time) float64 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 ...


<p>timeseries also has a convenient plotting function.</p>

In [117]:
plt.figure()
ts.sel(data='data1').plot()

[<matplotlib.lines.Line2D at 0x2aca5bddbb10>]

## 3. Saving and loading your data
<p>timeseries objects can be easily saved and loaded in HDF5 format. </p>

In [99]:
# timeseries object can be easily saved
fname = 'my_ts_data.h5'
ts.to_hdf(fname)

In [100]:
ts = timeseries.from_hdf(fname)
print ts

<xarray.TimeSeriesX (data: 2, time: 5000)>
array([[ 0.346414,  0.583474,  0.754813, ..., -0.634816, -0.270143,  0.055121],
       [ 0.001064,  0.218458,  0.157267, ..., -0.217666, -0.048204, -0.003803]])
Coordinates:
    samplerate  float64 10.0
  * data        (data) |S5 'data1' 'data2'
  * time        (time) float64 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 ...


## 4. Indexing your data

<p>We can use the .sel() function to select part of the data by coordinate values, and we can pass a dimension name instead of a dimension index to functions such as .mean().</p>

In [118]:
# Select the data between 100s and 200s (both endpoints excluded)
ts.sel(time=(ts.time>100)&(ts.time<200))

<xarray.TimeSeriesX (data: 2, time: 999)>
array([[ 0.748759,  0.605719,  0.63898 , ..., -1.214076, -0.85809 ,  0.035541],
       [ 0.543828,  0.475781, -0.164471, ..., -0.518676,  0.299043, -0.161555]])
Coordinates:
    samplerate  float64 10.0
  * data        (data) |S5 'data1' 'data2'
  * time        (time) float64 100.1 100.2 100.3 100.4 100.5 100.6 100.7 ...
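
The selection above returns 999 samples rather than 1000 because both comparisons are strict, so the timestamps at exactly 100s and 200s are dropped. A minimal NumPy-only sketch of the same boolean mask, assuming the timestamps are k/10 for k = 1..5000 as constructed earlier (the names `exclusive` and `inclusive` are introduced here):

```python
import numpy as np

sample_rate = 10.0
t = np.arange(1, 5001) / sample_rate  # 0.1, 0.2, ..., 500.0

# Strict inequalities exclude the endpoints at 100.0 s and 200.0 s
exclusive = (t > 100) & (t < 200)

# Non-strict inequalities keep both endpoints
inclusive = (t >= 100) & (t <= 200)

n_exclusive = int(exclusive.sum())
n_inclusive = int(inclusive.sum())
print('exclusive: %d samples, inclusive: %d samples' % (n_exclusive, n_inclusive))
```

A mask built with `>=` and `<=` can be passed to `.sel(time=...)` in the same way when the endpoints should be kept.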

In [119]:
# mean over the time dimension
ts.mean('time')

<xarray.TimeSeriesX (data: 2)>
array([-0.007483,  0.002175])
Coordinates:
    samplerate  float64 10.0
  * data        (data) |S5 'data1' 'data2'

## 5. Matching coordinates

<p>We can also place the time dimension as the first dimension. Because TimeSeries uses the coordinates to keep track of the dimensions of the data, the exact shape of the data and the order of the dimensions can vary.</p>

In [120]:
# Let's stack the two time-series data arrays and transpose them so that time comes first.
data_transpose = np.vstack((data1, data2)).T

# Constructing the timeseries object
ts_transpose = timeseries(data_transpose,
                          dims=('time', 'data'),
                          coords={'time':t,
                                  'data':['data1', 'data2'],
                                  'samplerate':sample_rate})
print ts_transpose

<xarray.TimeSeriesX (time: 5000, data: 2)>
array([[-0.088652,  0.416845],
       [ 0.553644,  0.351558],
       [ 1.278523,  0.215291],
       ...,
       [-1.021133,  0.347364],
       [-0.710079, -0.049885],
       [-0.469294,  0.424543]])
Coordinates:
    samplerate  float64 10.0
  * data        (data) |S5 'data1' 'data2'
  * time        (time) float64 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 ...


<p>To show that TimeSeries keeps track of the coordinates, we can add the original timeseries to its "transposed" version. Because their coordinates match, operations can be performed between them even though the underlying arrays have different shapes.  
Note that because the transposed version holds the same data as the original, adding them together simply doubles the values.</p>

In [122]:
results = ts + ts_transpose

plt.figure(figsize=[10, 3])

# we'll only plot the first 250 samples to see things clearly
plt.plot(ts.time[:250], ts.sel(data='data1')[:250], label='original')
plt.plot(results.time[:250], results.sel(data='data1')[:250], label='original+transposed')
plt.ylim([-4,4])
plt.legend(ncol=2)

<matplotlib.legend.Legend at 0x2aca5d136590>

<p>Without the coordinates, the data cannot be added because their shapes do not match.</p>

In [108]:
try:
    print (data+data_transpose)
except Exception as e:
    print 'Error: ' + str(e)

Error: operands could not be broadcast together with shapes (2,5000) (5000,2) 
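
The idea behind name-based alignment can be illustrated with plain NumPy. The helper below, `add_by_name`, is hypothetical and written only for this illustration; it is a toy sketch, not how xarray or TimeSeries is implemented. It reorders the axes of the second array to match the dimension names of the first before adding.

```python
import numpy as np

def add_by_name(a, a_dims, b, b_dims):
    """Add two arrays after reordering b's axes to match a's dimension names."""
    if sorted(a_dims) != sorted(b_dims):
        raise ValueError('dimension names do not match')
    # For each dimension of a, find which axis of b carries the same name
    order = [b_dims.index(name) for name in a_dims]
    return a + np.transpose(b, order)

rng = np.random.RandomState(0)
data = rng.uniform(-0.5, 0.5, size=(2, 5000))  # dims: ('data', 'time')
data_transpose = data.T                        # dims: ('time', 'data')

result = add_by_name(data, ('data', 'time'),
                     data_transpose, ('time', 'data'))
print(result.shape)
```

Because the axes are matched by name, the caller does not need to know which array was stored time-first.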


## 6. Resampling your data
<p>We can resample the data to a specific samplerate.  
Notice that by downsampling, we eliminate a certain degree of noise.</p>

In [136]:
original = ts.sel(data='data1').sel(time=ts.time<50.0)
downsampled = original.resampled(resampled_rate=2.0)

In [137]:
plt.figure(figsize=[10, 4])
plt.plot(original.time, original, label='10.0Hz (original)')
plt.plot(downsampled.time, downsampled, label='2.0Hz (downsampled)')
plt.legend()

<matplotlib.legend.Legend at 0x2aca6514ff90>
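
One way to see why a lower samplerate can reduce noise: if each output sample is an average over several input samples, independent noise partly cancels. The sketch below uses simple block averaging, which is an assumption made for illustration; it is not necessarily the method that `resampled` uses internally.

```python
import numpy as np

rng = np.random.RandomState(0)
noise = rng.uniform(-0.5, 0.5, 5000)  # same noise model as the data above

factor = 5  # 10Hz -> 2Hz means one output sample per 5 input samples
averaged = noise.reshape(-1, factor).mean(axis=1)

noise_before = noise.std()
noise_after = averaged.std()
print('noise std before: %.3f, after: %.3f' % (noise_before, noise_after))
```

For independent noise, averaging 5 samples is expected to shrink the standard deviation by roughly a factor of the square root of 5.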

<p>However, if we downsample to a samplerate that is too low to resolve the frequency in the data, we lose much of the signal. Here the signal is at 0.5Hz, so a samplerate of 1.0Hz provides only two samples per cycle.</p>

In [139]:
original = ts.sel(data='data1').sel(time=ts.time<50.0)
downsampled = original.resampled(resampled_rate=1.0)

plt.figure(figsize=[10, 4])
plt.plot(original.time, original, label='10.0Hz (original)')
plt.plot(downsampled.time, downsampled, label='1.0Hz (downsampled)')
plt.legend()

<matplotlib.legend.Legend at 0x2aca6516a910>
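
The loss can be made concrete with the sampling theorem: a component at frequency f can only be resolved when the samplerate exceeds 2f. For the 0.5Hz sinusoid this limit is 1.0Hz. The sketch below takes a clean 0.5Hz sine and reads it once per second at whole-second timestamps. Those timestamps are an assumption chosen to make the effect obvious, and this is plain sampling, not the `resampled` method.

```python
import numpy as np

frequency = 0.5               # Hz, as for data1
nyquist_rate = 2 * frequency  # the samplerate must exceed this value

t_coarse = np.arange(1, 51)   # one sample per second for 50 seconds
samples = np.sin(2 * np.pi * frequency * t_coarse)

# sin(pi * n) is zero for every integer n, so the oscillation disappears
peak = np.abs(samples).max()
print('Nyquist rate: %.1fHz, largest sampled value: %.1e' % (nyquist_rate, peak))
```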

In [141]:
# timeseries updates the samplerate for you
print original.samplerate
print downsampled.samplerate

<xarray.TimeSeriesX 'samplerate' ()>
array(10.)
Coordinates:
    samplerate  float64 10.0
    data        |S5 'data1'
<xarray.TimeSeriesX 'samplerate' ()>
array(1.)
Coordinates:
    samplerate  float64 1.0


## 7. Filtering your data
<p>Let's create a new timeseries composed of sinusoids at three different frequencies. We'll show how to manipulate the data using different filtering methods.</p>

In [144]:
freq1 = 3.2
freq2 = 1.6
freq3 = 0.2

data1 = np.sin(2*np.pi*freq1*t)
data2 = np.sin(2*np.pi*freq2*t)
data3 = np.sin(2*np.pi*freq3*t)

# our data are simply the addition of the three sinusoids
data = data1 + data2 + data3

In [153]:
ts = timeseries(data, dims=('time',), coords={'time':t, 'samplerate':sample_rate})
ts

<xarray.TimeSeriesX (time: 5000)>
array([ 1.874488e+00,  3.830037e-01,  2.447679e-01, ..., -3.830037e-01,
       -1.874488e+00,  9.821934e-14])
Coordinates:
    samplerate  float64 10.0
  * time        (time) float64 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 ...

In [156]:
# Let's plot the first 200 samples of the data
fig, ax = plt.subplots(4, figsize=[10, 4], sharex=True, sharey=True)
ax[0].plot(t[:200], data1[:200])
ax[1].plot(t[:200], data2[:200])
ax[2].plot(t[:200], data3[:200])
ax[3].plot(t[:200], ts[:200])

ax[0].set_ylabel('3.2Hz')
ax[1].set_ylabel('1.6Hz')
ax[2].set_ylabel('0.2Hz')
ax[3].set_ylabel('sum')
ax[3].set_xlabel('Time(s)')

Text(0.5,0,u'Time(s)')
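
The three components can also be recovered from the summed signal with a Fourier transform. A minimal NumPy sketch, assuming the same timestamps and frequencies as above (the names `spectrum` and `peak_freqs` are introduced here):

```python
import numpy as np

sample_rate = 10.0
t = np.arange(1, 5001) / sample_rate
data = (np.sin(2 * np.pi * 3.2 * t)
        + np.sin(2 * np.pi * 1.6 * t)
        + np.sin(2 * np.pi * 0.2 * t))

# Magnitude spectrum of the real-valued signal and the matching frequencies
spectrum = np.abs(np.fft.rfft(data))
freqs = np.fft.rfftfreq(len(data), d=1.0 / sample_rate)

# The three largest peaks correspond to the three sinusoids
peak_freqs = np.sort(freqs[np.argsort(spectrum)[-3:]])
print(peak_freqs)
```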

### We will use three different filters, one to remove each component.
<p> 1. To filter out the component with the highest frequency (3.2Hz), we'll use a lowpass filter. A lowpass filter preserves frequencies that are lower than the given frequency.  
2. To filter out the component with the lowest frequency (0.2Hz), we'll use a highpass filter. A highpass filter preserves frequencies that are higher than the given frequency.  
3. To filter out the component with the middle frequency (1.6Hz), we'll use a bandstop filter. A bandstop filter preserves frequencies that are outside the given frequency range.
</p>
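
To make the three filter types concrete, the sketch below implements them as an idealized "brick-wall" filter: transform to the frequency domain, zero the unwanted bins, and transform back. This is a swapped-in technique for illustration and is not the filter used by `filtered`, whose `order` argument suggests a design with a gradual roll-off. The helper `fft_filter` and its parameters are hypothetical.

```python
import numpy as np

def fft_filter(x, sample_rate, keep):
    """Zero every frequency bin for which keep(freq) is False."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    spectrum[~keep(freqs)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

sample_rate = 10.0
t = np.arange(1, 5001) / sample_rate
high = np.sin(2 * np.pi * 3.2 * t)
mid = np.sin(2 * np.pi * 1.6 * t)
low = np.sin(2 * np.pi * 0.2 * t)
total = high + mid + low

lowpassed = fft_filter(total, sample_rate, lambda f: f < 3.0)   # drops 3.2Hz
highpassed = fft_filter(total, sample_rate, lambda f: f > 0.5)  # drops 0.2Hz
bandstopped = fft_filter(total, sample_rate,
                         lambda f: (f < 1.4) | (f > 1.8))       # drops 1.6Hz

print(np.allclose(lowpassed, mid + low))
print(np.allclose(highpassed, high + mid))
print(np.allclose(bandstopped, high + low))
```

The recovery is essentially exact here because each sinusoid completes a whole number of cycles in the 500-second window; real filters and real data will show a transition band and edge effects.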

In [178]:
# lowpass filter
filtered_data = ts.filtered(3.0, filt_type='lowpass', order=4)

fig, ax = plt.subplots(3, figsize=[10, 4], sharex=True, sharey=True)

ax[0].plot(t[:200], ts[:200]) # original timeseries
ax[1].plot(t[:200], filtered_data[:200]) # lowpass filtered
ax[2].plot(t[:200], (data2+data3)[:200]) # what we should get (mid + low frequencies)

ax[0].set_ylabel('unfiltered')
ax[1].set_ylabel('filtered')
ax[2].set_ylabel('mid + low')
ax[2].set_xlabel('Time(s)')

Text(0.5,0,u'Time(s)')

In [179]:
# highpass filter
filtered_data = ts.filtered(0.5, filt_type='highpass', order=4)

fig, ax = plt.subplots(3, figsize=[10, 4], sharex=True, sharey=True)

ax[0].plot(t[:200], ts[:200]) # original timeseries
ax[1].plot(t[:200], filtered_data[:200]) # highpass filtered
ax[2].plot(t[:200], (data2+data1)[:200]) # what we should get (mid + high frequencies)

ax[0].set_ylabel('unfiltered')
ax[1].set_ylabel('filtered')
ax[2].set_ylabel('mid + high')
ax[2].set_xlabel('Time(s)')

Text(0.5,0,u'Time(s)')

In [180]:
# bandstop filter
filtered_data = ts.filtered([1.4, 1.8], filt_type='stop', order=4)

fig, ax = plt.subplots(3, figsize=[10, 4], sharex=True, sharey=True)

ax[0].plot(t[:200], ts[:200]) # original timeseries
ax[1].plot(t[:200], filtered_data[:200]) # bandstop filtered
ax[2].plot(t[:200], (data1+data3)[:200]) # what we should get (high + low frequencies)

ax[0].set_ylabel('unfiltered')
ax[1].set_ylabel('filtered')
ax[2].set_ylabel('high + low')
ax[2].set_xlabel('Time(s)')

Text(0.5,0,u'Time(s)')