# Test analysis

Statistical analysis of 10 repetitions of a 5-minute test and 10 repetitions of a 1-minute test of the **Lakeside-Mutual** application.

### Workloads

- **5-minute test**:
    - simulated day length: 5 minutes = 300 seconds
    - 20 requests per second
    - about 6000 total requests

- **1-minute test** (compressed test):
    - simulated day length: 1 minute = 60 seconds
    - 100 requests per second
    - about 6000 total requests
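The request totals follow from multiplying the simulated day length by the request rate. Here is a quick check in Python, using only the parameters listed above:

```python
# Workload parameters as listed above: (day length in seconds, requests per second)
workloads = {
    "5 minute test": (300, 20),
    "1 minute test": (60, 100),
}

# Expected total requests = day length x request rate
totals = {name: seconds * rate for name, (seconds, rate) in workloads.items()}
print(totals)  # {'5 minute test': 6000, '1 minute test': 6000}
```

Both workloads therefore issue the same number of requests. The 1-minute test compresses them into a window five times shorter, at five times the rate.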

### Imports and functions

```python
import os
import sys

import numpy as np
import pandas as pd
from IPython.display import display

# Make the local alyslib module importable from this notebook
mod_path = os.path.abspath("../../src/alyslib")
if mod_path not in sys.path:
    sys.path.append(mod_path)

import alyslib
```

```python
# Return a list with the mean TimeDelta of every DataFrame in `dfs`
def get_means(dfs):
    return [d.TimeDelta.mean() for d in dfs]


# Return a normal-approximation confidence interval for the mean of `data`
# (with the default `z_score`=1.96, this is a 95% confidence interval)
def conf_interval(data, z_score=1.96):
    mean = np.mean(data)
    std = np.std(data)
    size = len(data)
    err = z_score * (std / np.sqrt(size))
    return (mean - err, mean + err)
```
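In formula form, `conf_interval` computes the normal-approximation interval below for $n$ values $x_1, \dots, x_n$ with mean $\bar{x}$. Here $\sigma$ is the population standard deviation, because `np.std` uses `ddof=0` by default:

$$
\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad
z = 1.96
$$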

### Datasets - import

```python
l = alyslib.import_data("./data", "net.gen")
```

### DataFrames - building

```python
d0, d1 = alyslib.build_dfs(l)
```

```python
# Concatenate both lists of DataFrames so the following steps process every run
dfmerge = d0 + d1
```
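`get_means` iterates over `d0` and `d1` one DataFrame at a time, so they appear to be lists of DataFrames. In that case `d0 + d1` builds a new list that holds the same DataFrame objects rather than copies. The later cells discard the return values of the `alyslib` helpers, which suggests (an assumption, since their code is not shown) that those helpers modify each DataFrame in place. Changes made through `dfmerge` would then also be visible through `d0` and `d1`. The following plain-Python illustration of this reference sharing uses dictionaries as stand-ins for DataFrames:

```python
# Dictionaries stand in for DataFrames: both are mutable objects held by reference
d0 = [{"test": "1m", "cleaned": False}]
d1 = [{"test": "5m", "cleaned": False}]

# `+` on lists builds a new list that holds the same objects, not copies
dfmerge = d0 + d1

# Mutating an element through the merged list...
for d in dfmerge:
    d["cleaned"] = True

# ...is visible through the original lists as well
print(d0[0]["cleaned"], d1[0]["cleaned"])  # True True
print(dfmerge[0] is d0[0])  # True
```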

### DataFrames - cleaning network noise

The analysis must not be affected by differences in the **SendIP** and **RecvIP** values between tests, so we clean the network noise for every pair of tests.

```python
alyslib.clean_network_noise(dfmerge)
```

### DataFrames - sorting by Timestamp

```python
alyslib.sort_by_key(dfmerge, "Timestamp")
```

### DataFrames - generating the elapsed-time column

```python
alyslib.cmp_elapsed(dfmerge)
```
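The implementation of `alyslib.sort_by_key` and `alyslib.cmp_elapsed` is not shown here. The pandas sketch below shows what such steps commonly look like. It is not the library's code. It assumes numeric `Timestamp` values in seconds and uses a hypothetical `Elapsed` column measured from the first request of the run:

```python
import pandas as pd

# Toy data: one run with unordered timestamps (seconds) and per-request TimeDelta
df = pd.DataFrame({
    "Timestamp": [12.5, 10.0, 11.0],
    "TimeDelta": [0.05, 0.04, 0.03],
})

# Sort the rows in place by timestamp and renumber the index
df.sort_values("Timestamp", inplace=True, ignore_index=True)

# Hypothetical elapsed-time column: seconds since the first request of the run
df["Elapsed"] = df["Timestamp"] - df["Timestamp"].iloc[0]

print(df["Elapsed"].tolist())  # [0.0, 1.0, 2.5]
```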

### Analysis - 1-minute test

```python
d0m = get_means(d0)
```

```python
display(pd.DataFrame(d0m, columns=["means"]))
display(
    pd.DataFrame(
        [[np.mean(d0m), np.std(d0m), conf_interval(d0m)]],
        columns=["mean of means", "std of means", "95% conf interval"],
    )
)
```

| run | means |
|---:|---:|
| 0 | 0.040251 |
| 1 | 0.039671 |
| 2 | 0.042893 |
| 3 | 0.037291 |
| 4 | 0.039666 |
| 5 | 0.040487 |
| 6 | 0.039763 |
| 7 | 0.039808 |
| 8 | 0.039885 |
| 9 | 0.039849 |

| mean of means | std of means | 95% conf interval |
|---:|---:|---|
| 0.039956 | 0.001278 | (0.039164105436788145, 0.04074853432831304) |


1. We calculated the **mean TimeDelta** of every DataFrame generated for this test (10 repetitions, hence 10 DataFrames).
2. We calculated the **mean of these means**.
3. We calculated the **standard deviation of these means**.
4. We calculated the **95% confidence interval** for the mean of these means.
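The steps above can be reproduced without numpy. This sketch applies the same formulas as `get_means`/`conf_interval` (population standard deviation, z = 1.96) to the ten per-run means copied from the 1-minute table. Those inputs are rounded to six decimals, so the results agree with the summary table only up to rounding:

```python
import math
import statistics

# Per-run TimeDelta means of the 1-minute test, copied (rounded) from the table above
d0m = [0.040251, 0.039671, 0.042893, 0.037291, 0.039666,
       0.040487, 0.039763, 0.039808, 0.039885, 0.039849]

mean = statistics.mean(d0m)
std = statistics.pstdev(d0m)  # population std, like np.std with its default ddof=0
err = 1.96 * std / math.sqrt(len(d0m))  # half-width of the 95% interval

print(f"{mean:.6f} {std:.6f} ({mean - err:.6f}, {mean + err:.6f})")
```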

### Analysis - 5-minute test

```python
d1m = get_means(d1)
```

```python
display(pd.DataFrame(d1m, columns=["means"]))
display(
    pd.DataFrame(
        [[np.mean(d1m), np.std(d1m), conf_interval(d1m)]],
        columns=["mean of means", "std of means", "95% conf interval"],
    )
)
```

| run | means |
|---:|---:|
| 0 | 0.179407 |
| 1 | 0.177271 |
| 2 | 0.181167 |
| 3 | 0.180987 |
| 4 | 0.184761 |
| 5 | 0.160731 |
| 6 | 0.174199 |
| 7 | 0.183377 |
| 8 | 0.174418 |
| 9 | 0.14816 |

| mean of means | std of means | 95% conf interval |
|---:|---:|---|
| 0.174448 | 0.01089 | (0.16769825473057587, 0.1811971233366496) |


We performed the same calculations for this test (again 10 repetitions, hence 10 DataFrames).
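With only 10 repetitions, a common and slightly more conservative alternative to the z-based `conf_interval` combines the Student t distribution with the sample standard deviation (`ddof=1`). For 9 degrees of freedom the critical value is $t_{0.975} \approx 2.262$. This notebook does not compute that interval. The sketch below applies both variants to the rounded 5-minute means copied from the table above:

```python
import math
import statistics

# Per-run TimeDelta means of the 5-minute test (rounded values from the table above)
d1m = [0.179407, 0.177271, 0.181167, 0.180987, 0.184761,
       0.160731, 0.174199, 0.183377, 0.174418, 0.14816]

n = len(d1m)
mean = statistics.mean(d1m)

# z interval as in conf_interval: z = 1.96, population std (ddof=0)
z_err = 1.96 * statistics.pstdev(d1m) / math.sqrt(n)

# t interval: 97.5th percentile of Student t with n - 1 = 9 degrees of freedom,
# combined with the sample std (ddof=1)
T_CRIT_DF9 = 2.262
t_err = T_CRIT_DF9 * statistics.stdev(d1m) / math.sqrt(n)

print("z:", (round(mean - z_err, 6), round(mean + z_err, 6)))
print("t:", (round(mean - t_err, 6), round(mean + t_err, 6)))
```

On these values the t interval is wider (a half-width of roughly 0.0082 against roughly 0.0068), which reflects the extra uncertainty of estimating the spread from only ten runs.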

### Analysis - comparison between the tests

```python
display(
    pd.DataFrame(
        [
            [np.mean(d0m), np.std(d0m), conf_interval(d0m)],
            [np.mean(d1m), np.std(d1m), conf_interval(d1m)],
        ],
        columns=["mean of means", "std of means", "95% conf interval"],
        index=["1m test", "5m test"],
    )
)
```

| test | mean of means | std of means | 95% conf interval |
|---|---:|---:|---|
| 1m test | 0.039956 | 0.001278 | (0.039164105436788145, 0.04074853432831304) |
| 5m test | 0.174448 | 0.01089 | (0.16769825473057587, 0.1811971233366496) |


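In conclusion, the table above places the results of both tests side by side. The two 95% confidence intervals do not overlap: the mean TimeDelta of the 1-minute test (about 0.040) is clearly lower than that of the 5-minute test (about 0.174).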
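This small sketch uses the rounded values from the comparison table. It checks whether the two 95% intervals overlap and how many times larger the 5-minute mean is than the 1-minute mean:

```python
# Summary values copied from the comparison table (rounded to six decimals)
summary = {
    "1m test": {"mean": 0.039956, "ci": (0.039164, 0.040749)},
    "5m test": {"mean": 0.174448, "ci": (0.167698, 0.181197)},
}

lo_1m, hi_1m = summary["1m test"]["ci"]
lo_5m, hi_5m = summary["5m test"]["ci"]

# Two closed intervals overlap when each one starts before the other ends
overlap = lo_1m <= hi_5m and lo_5m <= hi_1m
ratio = summary["5m test"]["mean"] / summary["1m test"]["mean"]

print(overlap)          # False
print(round(ratio, 2))  # 4.37
```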