# Effect of under-sampling on a frequently sampled time series

How much does under-sampling limit our ability to see what happens in a rapidly changing time series? This is particularly important for how much emphasis we place on time series such as:
- sampling of Ruapehu crater lake
- gas flux flights

Two approaches are used:
- a synthetic data example, where the data are a series of spikes on a zero 'background'
- White Island SO2 flux data, estimating what gas flights might observe

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

In [None]:
import pandas as pd
import numpy as np
%matplotlib inline

**1. Synthetic data, a few short-term 'spike' signals.**

Number of signals, signal value, and width

In [None]:
numsig = 5
sigval = 100
sigwid = 20

Synthetic data, zero everywhere except where there is a signal. Each signal is triangular and spread over several 'days'.

In [None]:
zeros = np.zeros(shape=(1000,1))
data = pd.DataFrame(zeros, columns=['obs'])

signal = np.random.randint(20, len(data)-20, numsig) # signal locations
data.loc[signal, 'obs'] = sigval # use .loc, chained indexing may assign to a copy

d = data.rolling(sigwid, win_type='triang', center=True).sum()
d.fillna(0, inplace=True)

In [None]:
interval = 40 # mean sample interval
sample = np.arange(20, len(d)-20, interval) #fixed sample times
weather = np.random.randint(-10, 11, len(sample)) #random variation between -10 -> +10
sample += weather #sample points are fixed + random (weather) component
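
The sampling scheme above (a regular grid plus a random 'weather' offset) can be wrapped in a small helper so that it is repeatable. This is only a sketch: the function name `jittered_sample_times` and its parameters are hypothetical, not part of the notebook, and it uses NumPy's `default_rng` in place of `np.random.randint` so that a seed can be passed in.

```python
import numpy as np

def jittered_sample_times(n, interval=40, jitter=10, margin=20, rng=None):
    """Return sample indices: a regular grid plus a uniform random offset.

    n        : length of the series being sampled
    interval : mean sample interval
    jitter   : maximum offset either side of the fixed time
    margin   : keep fixed times this far from both ends (must be >= jitter)
    rng      : seed or numpy Generator (hypothetical convenience argument)
    """
    rng = np.random.default_rng(rng)
    fixed = np.arange(margin, n - margin, interval)           # fixed sample times
    offsets = rng.integers(-jitter, jitter + 1, len(fixed))   # -jitter .. +jitter inclusive
    return fixed + offsets

times = jittered_sample_times(1000, rng=0)
```

With the defaults the offsets never exceed half the interval, so the sample times stay in increasing order and inside the series.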

In [None]:
#sample daily observations at the random sample points
dsamp = d.iloc[sample]

In [None]:
temp = d.plot(figsize=(15,5))
dsamp.plot(ax=temp, marker='o', linestyle='--')
temp.legend(['observation', 'sample'])
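
The plot shows visually which spikes were caught by a sample. A rough way to put a number on this is to count the events that have at least one sample close to their centre. The helper `events_sampled` and the event and sample times below are made up for illustration.

```python
import numpy as np

def events_sampled(signal_centres, sample_times, half_width):
    """Count events with at least one sample within half_width of their centre."""
    centres = np.asarray(signal_centres)[:, None]   # column vector of event centres
    times = np.asarray(sample_times)[None, :]       # row vector of sample times
    hit = (np.abs(centres - times) <= half_width).any(axis=1)
    return int(hit.sum())

centres = [100, 250, 610]          # hypothetical event locations
times = [95, 140, 180, 400, 600]   # hypothetical sample times
n_hit = events_sampled(centres, times, half_width=10)
```

In the notebook's terms this would be called with `signal`, `sample`, and roughly half of `sigwid`. A 'hit' near the edge of a triangular signal still sees only a small fraction of the peak value, so this count is an optimistic measure.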

**2. Approximately two years of data from the MDOAS at North East Point on White Island.**

In [None]:
names = ['date', 'obs']
df = pd.read_csv('example_data.csv', skiprows=1, parse_dates=True, names=names, usecols=[0,1], index_col=0)

The DataFrame contains a varying number of observations each day.

In [None]:
df.head()

Downsample to a daily mean value. For days with no observations, use linear interpolation to fill the gaps.

In [None]:
day = df.resample('D').mean()
day.interpolate(inplace=True)
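
To make the daily-mean and gap-filling step concrete, here is a tiny example on made-up values (not the White Island data): two observations on the first day, none on the second, one on the third.

```python
import pandas as pd

# Made-up irregular observations: two on day 1, none on day 2, one on day 3
idx = pd.to_datetime(['2020-01-01 09:00', '2020-01-01 15:00', '2020-01-03 12:00'])
toy = pd.DataFrame({'obs': [2.0, 4.0, 7.0]}, index=idx)

daily = toy.resample('D').mean()   # day 1 -> mean of 2 and 4, day 2 -> NaN
filled = daily.interpolate()       # day 2 filled linearly between its neighbours
```

The interpolated value for the empty day is halfway between the daily means either side of it, so long gaps in the real data become straight lines in the daily series.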

In [None]:
day.head()

Set the row number as the index, as this makes random sampling easier.

In [None]:
day.reset_index(inplace=True)

In [None]:
day.head()

In [None]:
interval = 40 # mean sample interval
sample = np.arange(20, len(day)-20, interval) #fixed sample times
weather = np.random.randint(-10, 11, len(sample)) #random variation between -10 -> +10
sample += weather #sample points are fixed + random (weather) component

In [None]:
#sample daily observations at the random sample points
dsamp = day.iloc[sample].copy() # copy so the later in-place set_index is safe

In [None]:
day.set_index('date', inplace=True)
dsamp.set_index('date', inplace=True)

In [None]:
day.head()

In [None]:
temp = day.plot(figsize=(15,5))
dsamp.plot(ax=temp, marker='o', linestyle='--')
temp.legend(['observation', 'sample'])
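
Each run of the cells above gives only one random realisation of the sample times. One possible extension, sketched here on a made-up stand-in series rather than the real data, is to repeat the sampling many times and look at how much of the true peak the samples recover. The function name, the spike position, and the number of repeats are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in daily series (made up): zero background with one triangular spike
# centred on day 475, rising to 100 and back to zero over 20 days
series = np.zeros(1000)
series[465:486] = 100 * (1 - np.abs(np.arange(-10, 11)) / 10)

def sampled_peak_fraction(series, interval=40, jitter=10, margin=20):
    """One realisation: largest sampled value as a fraction of the true maximum."""
    fixed = np.arange(margin, len(series) - margin, interval)
    times = fixed + rng.integers(-jitter, jitter + 1, len(fixed))
    return series[times].max() / series.max()

fractions = np.array([sampled_peak_fraction(series) for _ in range(500)])
print(f"median fraction of peak seen: {np.median(fractions):.2f}")
print(f"runs that missed the spike entirely: {(fractions == 0).mean():.0%}")
```

The same loop could be pointed at the `obs` column of `day`, in which case the fractions would describe how well infrequent gas flights might capture the largest daily flux.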