# TriScale - Seasonal Components

> This notebook is intended for **self-study** of _TriScale._  
Here is the [version for live sessions](live_seasonal-comp.ipynb).

This notebook contains tutorial materials for _TriScale_. 

More specifically, this notebook presents the importance of accounting for seasonal components  
in the data analysis.

> If you don't know about Jupyter Notebooks and how to interact with them,  
fear not! We compiled everything that you need to know here: [Notebook Basics](tutorial_notebook-basics.ipynb) :-) 


For more details about _TriScale,_ you may refer to [the paper](https://doi.org/10.5281/zenodo.3464273).

---
- [Scenario](#Scenario)
    - [Scenario details](#Scenario-details)
    - [The dataset](#The-dataset)
- [Data analysis](#Data-analysis)  
    - [Your turn: time to practice (part 1)](#Your-turn:-time-to-practice-(part-1))  
    - [What about seasonality?](#What-about-seasonality?)  
    - [Your turn: time to practice (part 2)](#Your-turn:-time-to-practice-(part-2))  

---

To get started, we need to import a few Python modules.  
All the _TriScale_-specific functions are part of one module called `triscale`.

In [None]:
import os
from pathlib import Path
import datetime

import pandas as pd
import numpy as np

import triscale

Alright, we are ready to analyse some data!

## Scenario

In this tutorial, we consider performance data from [Glossy](https://ieeexplore.ieee.org/document/5779066) collected on  
[the FlockLab testbed](http://flocklab.ethz.ch/).

[Glossy](https://ieeexplore.ieee.org/document/5779066) is a low-power wireless protocol based on synchronous transmissions and a  
flooding strategy. One important tuning parameter of Glossy is the number of times  
$N$ that each node transmits each packet.

The literature reports that larger $N$ yields better reliability; that is, a larger packet  
reception ratio (PRR). We performed a short experimental study to validate this observation.  
More specifically, we test two values:
- $N=1$
- $N=2$

### Scenario details

> **Note.** These details are irrelevant for the present tutorial and are  
only provided for completeness. Feel free to skip that section...

<details>
  <summary>Click here to show the details</summary>
  
The test scenario is very simple. During one communication round, each node in the network initiates, in turn, a Glossy flood (using $N=1$ retransmission). All the other nodes log whether they successfully received the packet. The same round is then repeated with $N=2$ retransmissions.

- The evaluation runs on the [TelosB motes](https://www.advanticsys.com/shop/mtmcm5000msp-p-14.html)
- The motes use radio frequency channel 22 (2.46 GHz, which largely overlaps with Wi-Fi traffic).
- The payload size is set to 64 bytes.
- The scenario is run 24 times per day, scheduled randomly throughout the day. 
- Data has been collected over three weeks, from 2019-08-22 to 2019-09-11.

We define the PRR metric as the median packet reception ratio across all the nodes. In other words, our metric is the median fraction of floods that are successfully received by one node in the network.
</details>


### The dataset

The collected data is available in the [TriScale artifacts repository](#Download-Source-Files-and-Data). The results have been  
collected, processed, and made directly available for analysis. Let's first load the dataset. 

In [None]:
# Load the PRR results from the test
df = pd.read_csv('ExampleData/metrics_glossy.csv', index_col=0, parse_dates=True)

# Display a random sample
df.sample(5)

To limit bias, the dataset has been "anonymized"; that is, we randomly replaced the  
value of $N$ with a letter ($A$ or $B$).
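
The analysis cells in this tutorial rely on two pandas features: a datetime index, which lets us select a whole day with a date string, and `dropna()`, which discards missing entries. The sketch below illustrates both on a small synthetic frame. The column names `PRR_A` and `PRR_B` match those used later in this notebook, but the values and the hourly timestamps are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Synthetic stand-in for the real dataset (all values are invented):
# one timestamp per hour over three days.
index = pd.date_range('2019-08-24', periods=72, freq=pd.Timedelta(hours=1))
nan = float('nan')
toy = pd.DataFrame(
    {
        # Every 6th entry is missing, to show the effect of dropna()
        'PRR_A': [nan if i % 6 == 0 else 80.0 + i % 5 for i in range(72)],
        'PRR_B': [85.0 + i % 3 for i in range(72)],
    },
    index=index,
)

# With a DatetimeIndex, a date string selects all the rows of that day
one_day = toy.loc['2019-08-25']

# dropna() removes the missing entries; .values returns a NumPy array
prr_a = one_day.PRR_A.dropna().values

print(len(one_day), len(prr_a))
```

Selecting by date string keeps the filtering code short, but it also makes it easy to pick a time window without thinking about what that window represents.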

## Data analysis

### Your turn: time to practice (part 1)

The FlockLab testbed is located in an office building, where we expect more wireless  
interference during the day than during the night. Thus, for a fair comparison, the   
time span of a series of runs should be at least one day (24 hours).

Let us select two days and compare the PRR of $A$ and $B$ on those days.  
The KPI definition is given below.  

In [None]:
# Days considered for the data analysis
day_A = '2019-08-24'
day_B = '2019-08-26'

# Filtering the dataset for the data of interest
data_A = df.loc[day_A].PRR_A.dropna().values
data_B = df.loc[day_B].PRR_B.dropna().values

# KPI definition
KPI = {'name': 'PRR',
       'unit': r'\%',  # raw string: same value, no invalid-escape warning
       'percentile': 50,
       'confidence': 95,
       'class': 'one-sided',
       'bounds': [0,100],
       'bound': 'lower'}

Use the `triscale.analysis_kpi()` function to compute the KPI value for each group. 

- Which group seems to perform best?
- What confidence do you have in this result?
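
The KPI above asks for a lower bound on the median (50th percentile) that holds with 95% confidence. One standard way to obtain such a bound without assuming a particular distribution relies on order statistics and the binomial distribution. The sketch below illustrates that idea only: it is not a copy of _TriScale_'s implementation, `lower_bound_rank` and `percentile_lower_bound` are hypothetical helper names, the sample values are invented, and the reasoning assumes independent samples from a continuous distribution.

```python
from math import comb

def lower_bound_rank(n, percentile, confidence):
    """Return the largest 1-based rank m such that the m-th smallest of n
    samples lies below the true percentile with probability >= confidence.
    Return None when n is too small for that confidence level."""
    p = percentile / 100
    c = confidence / 100
    best = None
    tail = 1.0  # P(at least 0 samples fall below the true percentile)
    for m in range(1, n + 1):
        # Subtract P(exactly m-1 samples below) to get P(at least m below)
        tail -= comb(n, m - 1) * p ** (m - 1) * (1 - p) ** (n - m + 1)
        if tail < c:
            break
        best = m
    return best

def percentile_lower_bound(data, percentile, confidence):
    """Return the sample value at the rank computed above (or None)."""
    rank = lower_bound_rank(len(data), percentile, confidence)
    if rank is None:
        return None
    return sorted(data)[rank - 1]

# Hypothetical PRR values (in %), not taken from the tutorial dataset
samples = [88, 92, 85, 90, 79, 95, 87, 91, 83, 89]
print(percentile_lower_bound(samples, 50, 95))
```

With 24 runs, `lower_bound_rank(24, 50, 95)` returns 8: under the stated assumptions, the 8th smallest value bounds the median from below with at least 95% confidence. Note that the bound says nothing about whether the conditions under which the samples were collected are comparable between groups.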

In [None]:
########## YOUR CODE HERE ###########
# ...
#####################################

#### Solutions

<details>
  <summary><br/>Click here to show the solutions</summary>
  
```python
>>> triscale.analysis_kpi(data_A, KPI)
(True, 88.0)
>>> triscale.analysis_kpi(data_B, KPI)
(True, 84.0)
```
$A$ seems to perform better than $B$. However, even though the KPI has been defined  
with a high level of confidence, this does not mean that the experimental conditions  
during the two days were actually comparable...
    
And as a matter of fact, $A$ corresponds to $N=1$, which is highly unlikely to perform  
better than $N=2$.
    
</details>

### What about seasonality? 

In the previous analysis, we (randomly?) picked some days for each group. But what   
do we know about the possible correlation between those two days? 
- Maybe we got unlucky on the day $B$ was tested?
- Or maybe we omitted some hidden factor?

To investigate that, we can look at the [wireless link quality data for FlockLab](https://doi.org/10.5281/zenodo.3354717), which is   
collected by the FlockLab maintainers and made publicly available. They ran the link  
quality tests every two hours, resulting in 12 measurement points per day.

In this tutorial, we look at the data from August 2019, which has a large overlap with  
our data collection period. Let's load this dataset and have a look...

In [None]:
link_quality = pd.read_csv('ExampleData/flocklab_link_quality.csv', index_col=0, parse_dates=True)
link_quality.head()

The dataset is simple: every two hours, we have one value representing the "average  
link quality" on the testbed (the computation that led to this average is irrelevant here).

_TriScale_'s `network_profiling()` generates an autocorrelation plot based on  
such data, as illustrated below.

In [None]:
link_quality_bounds = [0,100]
link_quality_name = 'PRR [%]'
fig_theil, fig_autocorr = triscale.network_profiling(
    link_quality, 
    link_quality_bounds, 
    link_quality_name,
)
fig_autocorr.show()

One can clearly see from the autocorrelation plot that the average link quality on  
FlockLab has strong seasonal components. The **first peak at lag 12 (i.e., 24h)**  
reveals the daily seasonal component. 

But there is also **a second main peak at lag 84**, which corresponds to one week.  
Indeed, there is less interference on weekends than on weekdays, which creates  
a weekly seasonal component.
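
To build some intuition for why a daily and a weekly pattern show up at lags 12 and 84, here is a small sketch on synthetic data. It is not the FlockLab dataset and not what `network_profiling()` does internally: `synthetic_link_quality` is a made-up model (quality drops during office hours on weekdays only), and `autocorrelation` is a plain sample autocorrelation coefficient written for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var

def synthetic_link_quality(t):
    """Made-up link quality (in %) for the t-th two-hour slot.
    Quality drops during office hours on weekdays only."""
    hour = (t % 12) * 2          # 12 slots per day
    weekday = (t // 12) % 7 < 5  # slot 0 is assumed to be a Monday
    office_hours = 8 <= hour < 18
    return 75.0 if (weekday and office_hours) else 90.0

# Six weeks of synthetic data: 12 slots/day * 7 days * 6 weeks
series = [synthetic_link_quality(t) for t in range(12 * 7 * 6)]

# Half-period lags (6, 42) versus full-period lags (12, 84)
for lag in (6, 12, 42, 84):
    print(f"lag {lag:3d}: {autocorrelation(series, lag):+.2f}")
```

In this toy model, the coefficients at lags 12 and 84 are strongly positive while those at the half-period lags are negative, and the weekly lag scores higher than the daily one because weekends break the day-to-day repetition.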

Due to this weekly component, it becomes problematic (i.e., potentially wrong) to  
compare results from different time periods which span less than a week.  
In other words, the time span of a series of runs must be at least one week long  
for the series to be fairly comparable.
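
A simple sanity check before comparing two series is to verify that each one covers at least one week. The sketch below shows one way to do that with the standard library; `spans_at_least` is a hypothetical helper (not part of _TriScale_) and the run epochs are invented.

```python
from datetime import datetime, timedelta

def spans_at_least(timestamps, minimum=timedelta(days=7)):
    """Return True if the runs cover a time span of at least `minimum`."""
    return max(timestamps) - min(timestamps) >= minimum

# Hypothetical run epochs: 24 runs within a single day ...
one_day = [datetime(2019, 8, 24, hour) for hour in range(24)]
# ... versus one run per day over three weeks
three_weeks = [datetime(2019, 8, 22) + timedelta(days=d) for d in range(21)]

print(spans_at_least(one_day))      # False
print(spans_at_least(three_weeks))  # True
```

Such a check only covers the length of the span; it does not tell us whether the runs were spread evenly over that span.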

Let us quickly check which days of the week we picked for our first analysis...

In [None]:
def print_weekday(date_str):
    '''Simple function printing the weekday
    from a date given as a 'YYYY-MM-DD' string
    '''
    # The parameter is named date_str to avoid shadowing the built-in str
    year, month, day = (int(x) for x in date_str.split('-'))
    ans = datetime.date(year, month, day)
    print(ans.strftime("%A"))
    
print_weekday(day_A)
print_weekday(day_B)

Bingo! $B$ was tested on a weekday, while $A$ was tested on a weekend...

> The day of the week was a "hidden" factor in our first analysis.  
Neglecting it led to wrong conclusions. 

### Your turn: time to practice (part 2)

Let us now use the entire Glossy dataset and analyse it as one series  
(with a span of three weeks).

In [None]:
data_A = df.PRR_A.dropna().values
data_B = df.PRR_B.dropna().values

Use again the `triscale.analysis_kpi()` function to compute  
the KPI value for each group.
- Which group seems to perform best now?
- What about independence? Do you think the results are trustworthy?

In [None]:
########## YOUR CODE HERE ###########
# ...
#####################################

#### Solutions

<details>
  <summary><br/>Click here to show the solutions</summary>
  
```python
>>> triscale.analysis_kpi(data_A, KPI)
(False, 80.0)
>>> triscale.analysis_kpi(data_B, KPI)
(False, 88.0)
```
Now, we do obtain the expected result: $N=2$ (group $B$) performs better than $N=1$.  
Note however that the independence test fails. This is due to the ordering of the tests:  
we scheduled tests randomly within each day individually, not over the three-week time span.  
Therefore, the data are affected by the (strong) weekly correlation in the environment.
    
We can observe this correlation by plotting the data and/or its autocorrelation function:
```python
>>> plots=['series','autocorr']
>>> triscale.analysis_kpi(data_A, KPI, plots)
```
    
We can try to emulate properly randomized run epochs by shuffling  
the data.
    
```python
>>> import random
>>> random.shuffle(data_A)
>>> to_plot=['autocorr']
>>> triscale.analysis_kpi(data_A, KPI, to_plot)
```  
    
As you can see, the correlation structure significantly flattens. In some cases, the independence  
test might even pass... But keep in mind it is only an artifact! To make a strong statement,  
    the run epochs should have been truly randomized.
</details>

--- 
[Back to main repository](.)