# Lab 5.04 - Two-sample t-test

```python
# Package imports
import numpy as np                                  # Scientific computing
import scipy.stats as stats                         # Statistical tests

import pandas as pd                                 # Dataframes
import matplotlib.pyplot as plt                     # Basic visualisation
from statsmodels.graphics.mosaicplot import mosaic  # Mosaic plot
import seaborn as sns                               # Advanced dataviz
```

## Exercise 4 - Android persistence libraries: performance comparison

We analyzed the results of performance measurements for Android persistence libraries (Akin, 2016). Experiments were performed for different combinations of *DataSize* (Small, Medium, Large) and *PersistenceType* (GreenDAO, Realm, SharedPreferences, SQLite); SharedPreferences was only measured for the Small data size. For each data size, we determined which persistence type yielded the best result, i.e. the lowest measured *Time*.

We will now verify whether the persistence type that looks best at first glance is also *significantly* better than the others.

Specifically, for each data size, use a two-sample t-test to verify that the mean Time of the best persistence type is significantly lower than the mean Time of the second-best and of the worst-scoring type.

Can we maintain the conclusion that for a given data size, one persistence type is best, i.e. is significantly better than any other persistence type?
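Formally, each comparison in this exercise is a one-sided test. Writing $\mu_\text{best}$ for the population mean Time of the type that looks best and $\mu_\text{other}$ for that of the type it is compared with, the hypotheses are:

$$
H_0: \mu_\text{best} = \mu_\text{other} \qquad H_1: \mu_\text{best} < \mu_\text{other}
$$

This corresponds to `alternative='less'` in the `stats.ttest_ind` calls in this notebook, with the best type passed as `a`.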

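```python
# Read the (semicolon-separated) measurements and show the total Time per
# combination, sorted by data size and then by Time
df = pd.read_csv('../data/android_persistence_cpu.csv', delimiter=';')
df.groupby(['PersistenceType', 'DataSize']).sum().sort_values(['DataSize', 'Time'])
```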

| PersistenceType   | DataSize | Time (sum) |
| :---------------- | :------- | :--------- |
| Realm             | Large    | 319.55     |
| SQLLite           | Large    | 345.45     |
| GreenDAO          | Large    | 363.31     |
| Realm             | Medium   | 174.54     |
| GreenDAO          | Medium   | 223.62     |
| SQLLite           | Medium   | 233.82     |
| Realm             | Small    | 47.97      |
| Sharedpreferences | Small    | 50.21      |
| SQLLite           | Small    | 53.97      |
| GreenDAO          | Small    | 56.81      |
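The table above ranks combinations by total Time, whereas the answers below rank by sample mean. The two rankings agree only when every combination has the same number of measurements. A quick check is to report mean and count side by side. The sketch below does this on a small DataFrame with made-up values (hypothetical, not the lab data) and deliberately unequal group sizes, so that the two rankings differ; with the lab data, the same `groupby` would be applied to `df`.

```python
import pandas as pd

# Hypothetical measurements with made-up values (not the lab data), in the same
# long format: one row per measurement. Note the unequal group sizes.
data = pd.DataFrame({
    'PersistenceType': ['Realm'] * 2 + ['SQLLite'] * 3 + ['GreenDAO'] * 2,
    'DataSize': ['Large'] * 7,
    'Time': [10.0, 12.0, 9.0, 10.0, 11.0, 13.0, 15.0],
})

# Mean, number of measurements and total Time per combination
summary = data.groupby(['DataSize', 'PersistenceType'])['Time'].agg(['mean', 'count', 'sum'])

# Ranking by total Time (as in the cell above) vs. ranking by mean Time
by_sum = summary.sort_values(['DataSize', 'sum'])
by_mean = summary.sort_values(['DataSize', 'mean'])
print(by_sum)
print(by_mean)
```

Here Realm comes first when ranked by sum (it simply has fewer measurements), while SQLLite has the lowest mean.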


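```python
# Time samples of the best (Realm) and second-best (SQLLite) type, Large data size
df1 = df.query('PersistenceType == "Realm" and DataSize == "Large"')['Time']
df2 = df.query('PersistenceType == "SQLLite" and DataSize == "Large"')['Time']

# One-sided Welch t-test (no equal-variance assumption), H1: mean(df1) < mean(df2)
stats.ttest_ind(a=df1, b=df2, alternative='less', equal_var=False)
```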

TtestResult(statistic=-3.1251713022860717, pvalue=0.0016999220614984435, df=37.949081548450195)
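With `equal_var=False`, `ttest_ind` performs Welch's t-test, which does not assume equal population variances; the `df` in the output is the Welch-Satterthwaite approximation of the degrees of freedom. The sketch below computes the statistic, degrees of freedom and one-sided p-value by hand for two made-up samples (hypothetical values, not the lab data) and compares them with SciPy's result.

```python
import numpy as np
import scipy.stats as stats

# Two made-up samples (hypothetical values, not the lab data)
a = np.array([10.1, 11.3, 9.8, 10.7, 10.2, 11.0])
b = np.array([11.9, 12.4, 10.8, 13.1, 12.0, 11.5, 12.7])

# Squared standard errors of both sample means (ddof=1: sample variance)
se2_a = a.var(ddof=1) / len(a)
se2_b = b.var(ddof=1) / len(b)

# Welch's t statistic
t = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)

# Welch-Satterthwaite degrees of freedom (the `df` field in the TtestResult output)
dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))

# alternative='less' (H1: mean of a < mean of b): the p-value is the lower tail
p = stats.t.cdf(t, dof)

result = stats.ttest_ind(a, b, alternative='less', equal_var=False)
print(f't = {t:.4f}, df = {dof:.2f}, p = {p:.4g}')
print(result)
```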

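```python
# Time samples of the best (Realm) and second-best (GreenDAO) type, Medium data size
df1 = df.query('PersistenceType == "Realm" and DataSize == "Medium"')['Time']
df2 = df.query('PersistenceType == "GreenDAO" and DataSize == "Medium"')['Time']

# One-sided Welch t-test, H1: mean(df1) < mean(df2)
stats.ttest_ind(a=df1, b=df2, alternative='less', equal_var=False)
```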

TtestResult(statistic=-3.720451024030081, pvalue=0.0002506300568234833, df=50.368112409979226)

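```python
# Time samples of the best (Realm) and second-best (Sharedpreferences) type, Small data size
df1 = df.query('PersistenceType == "Realm" and DataSize == "Small"')['Time']
df2 = df.query('PersistenceType == "Sharedpreferences" and DataSize == "Small"')['Time']

# One-sided Welch t-test, H1: mean(df1) < mean(df2)
stats.ttest_ind(a=df1, b=df2, alternative='less', equal_var=False)
```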

TtestResult(statistic=-0.9624716662718156, pvalue=0.16992370571901444, df=57.43660193307136)
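The three cells above repeat the same pattern by hand and only compare against the second-best type, although the exercise also asks for the worst-scoring one. A sketch of a helper that loops over data sizes and runs both one-sided Welch tests is shown below. The function name `compare_best` and the demo data are hypothetical: the data are generated with made-up means, not taken from the lab measurements. With the lab data, you would call `compare_best(df)`, assuming `df` has the columns PersistenceType, DataSize and Time used above.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats


def compare_best(data):
    """Hypothetical helper: for each DataSize, run one-sided Welch t-tests of the
    type with the lowest mean Time against the second-best and the worst type."""
    rows = []
    for size, group in data.groupby('DataSize'):
        # Time samples per persistence type, and the types ranked by mean (lowest first)
        samples = {name: g['Time'] for name, g in group.groupby('PersistenceType')}
        ranking = group.groupby('PersistenceType')['Time'].mean().sort_values().index
        best, second, worst = ranking[0], ranking[1], ranking[-1]
        rows.append({
            'DataSize': size,
            'Best': best,
            '2nd best': second,
            'p vs 2nd': stats.ttest_ind(samples[best], samples[second],
                                        alternative='less', equal_var=False).pvalue,
            'Worst': worst,
            'p vs worst': stats.ttest_ind(samples[best], samples[worst],
                                          alternative='less', equal_var=False).pvalue,
        })
    return pd.DataFrame(rows).set_index('DataSize')


# Made-up data for illustration only: 30 normally distributed measurements per
# combination, with clearly separated (hypothetical) true means
rng = np.random.default_rng(0)
true_means = {
    'Large': {'Realm': 10.0, 'SQLLite': 12.0, 'GreenDAO': 14.0},
    'Small': {'Realm': 5.0, 'Sharedpreferences': 7.0, 'GreenDAO': 9.0},
}
demo = pd.DataFrame([
    {'PersistenceType': ptype, 'DataSize': size, 'Time': rng.normal(mu, 1.0)}
    for size, means in true_means.items()
    for ptype, mu in means.items()
    for _ in range(30)
])

print(compare_best(demo))
```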

### Answers

The table below lists the best and second-best persistence type for each data size (ranked by sample mean), together with the p-value of the one-sided Welch t-test comparing them.

| Data Size | Best  | 2nd Best          | p-value  |
| :-------- | :---- | :---------------- | :------- |
| Small     | Realm | SharedPreferences | 0.170    |
| Medium    | Realm | GreenDAO          | 0.000251 |
| Large     | Realm | SQLite            | 0.00170  |

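For the medium and large data sizes, Realm is significantly faster than the second-best type (GreenDAO and SQLite, respectively), so the conclusion of Akin (2016) that Realm is the most efficient persistence type holds there. For the small data size, however, the difference between Realm and SharedPreferences is not significant (p ≈ 0.17). We therefore cannot maintain that, for every data size, one persistence type is significantly better than all others.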

Note that we did not fix a significance level in advance. However, the conclusion is the same for $\alpha$ = 0.1, 0.05 or even 0.01: the p-values for Medium and Large are below 0.01, while the p-value for Small is above 0.1.
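The claim about the three significance levels can be checked mechanically. The sketch below uses the p-values from the three test outputs above and lists, for each $\alpha$, the data sizes for which $H_0$ is rejected.

```python
# One-sided Welch t-test p-values reported above (best vs. second-best type)
p_values = {
    'Small': 0.16992370571901444,
    'Medium': 0.0002506300568234833,
    'Large': 0.0016999220614984435,
}

# For each significance level, the data sizes where H0 is rejected (p < alpha)
decisions = {alpha: [size for size, p in p_values.items() if p < alpha]
             for alpha in (0.1, 0.05, 0.01)}

for alpha, sizes in decisions.items():
    print(f'alpha = {alpha}: significant for {sizes}')
```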