# Statistics Helpers

Functions for computing numeric properties of simulations.

In [170]:
import numpy as np

In [193]:
%run Simulations.ipynb

999
2000
1000000
QUIC with loss sample
C:\Users\Jamie\Documents\UVic\4th Year\Fall 2021\CSC 499 - QUIC\jupyter-csc499\499-Visualization\data\benchmarking-loss\TCP\festive\2Mbps\0.05\no-pacing\
TCP | festive | 2Mbps (2000000bps) | 5.0% loss | no pacing
444
222
QUIC | festive | 1Mbps (1000000bps) | 0.0% loss | pacing
QUIC | festive | 1Mbps (1000000bps) | 1.0% loss | no pacing
QUIC | festive | 2Mbps (2000000bps) | 5.0% loss | no pacing
QUIC | tobasco | 500Kbps (500000bps) | 1.0% loss | pacing
QUIC | tobasco | 2Mbps (2000000bps) | 2.0% loss | pacing
QUIC | tobasco | 3Mbps (3000000bps) | 2.0% loss | no pacing
QUIC | tobasco | 3Mbps (3000000bps) | 5.0% loss | pacing
QUIC | tobasco | 5Mbps (5000000bps) | 5.0% loss | no pacing
festive over QUIC @ 2Mbps with 0% loss (with pacing)
festive over QUIC @ 2Mbps with 5.0% loss (no pacing)


## Single-parameter Queries

In [172]:
# Get quality levels requested for a given simulation
def getQualityLevels(parentDir):
    _, rows = readAdaptationLog(parentDir)
    return np.array(rows)[:,1]


# Computes the mean quality level over all segments for a simulation
def meanQualityLevel(parentDir):
    qualities = getQualityLevels(parentDir)
    return np.mean(qualities)

In [173]:
print ("Mean Quality Level - QUIC 500Kbps: ", meanQualityLevel(sampleQuicDirSlow))
print ("Mean Quality Level - TCP  500Kbps: ", meanQualityLevel(sampleTcpDirSlow))
print ("Mean Quality Level - QUIC 1Mbps: ", meanQualityLevel(sampleQuicDir))
print ("Mean Quality Level - TCP  1Mbps: ", meanQualityLevel(sampleTcpDir))

Mean Quality Level - QUIC 500Kbps:  0.7278911564625851
Mean Quality Level - TCP  500Kbps:  0.9099099099099099
Mean Quality Level - QUIC 1Mbps:  1.5
Mean Quality Level - TCP  1Mbps:  1.90990990990991


In [174]:
# Computes the number of times a buffer underflow occurred in a simulation
def underflowCount(parentDir):
    _, rows = readUnderflowLog(parentDir)
    return len(rows)

In [175]:
print ("Number of buffer under-runs - QUIC 500Kbps: ", underflowCount(sampleQuicDirSlow))
print ("Number of buffer under-runs - TCP  500Kbps: ", underflowCount(sampleTcpDirSlow))
print ("Number of buffer under-runs - QUIC 1Mbps: ", underflowCount(sampleQuicDir))
print ("Number of buffer under-runs - TCP  1Mbps: ", underflowCount(sampleTcpDir))

Malformed buffer underflow log. Likely that QUIC failed near time 330.524
Number of buffer under-runs - QUIC 500Kbps:  1
Number of buffer under-runs - TCP  500Kbps:  0
Malformed buffer underflow log. Likely that QUIC failed near time 184.045
Number of buffer under-runs - QUIC 1Mbps:  0
Number of buffer under-runs - TCP  1Mbps:  0


In [176]:
# Counts the number of times the requested quality level changes during a simulation.
# Note that TCP simulations may run for longer than corresponding QUIC simulations,
# so comparing these counts directly can be misleading.
def qualityChangeCount(parentDir):
    qualities = getQualityLevels(parentDir)
    changeCount = 0
    current = qualities[0]
    for quality in qualities[1:]:
        if current != quality:
            current = quality
            changeCount += 1
    
    return changeCount

In [177]:
print ("Number of quality changes - QUIC 500Kbps: ", qualityChangeCount(sampleQuicDirSlow))
print ("Number of quality changes - TCP  500Kbps: ", qualityChangeCount(sampleTcpDirSlow))
print ("Number of quality changes - QUIC 1Mbps:   ", qualityChangeCount(sampleQuicDir))
print ("Number of quality changes - TCP  1Mbps:   ", qualityChangeCount(sampleTcpDir)) 

Number of quality changes - QUIC 500Kbps:  3
Number of quality changes - TCP  500Kbps:  1
Number of quality changes - QUIC 1Mbps:    2
Number of quality changes - TCP  1Mbps:    54
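
Since runs of different lengths make raw counts hard to compare, one option is to normalize by the number of transitions. The following is a minimal sketch, not part of the original helpers: `quality_change_stats` and the sample sequences are hypothetical, and it assumes the quality levels are available as a flat sequence.

```python
import numpy as np

def quality_change_stats(qualities):
    """Return (change count, changes per transition) for a sequence of quality levels."""
    q = np.asarray(qualities)
    if len(q) < 2:
        return 0, 0.0
    # A change happens wherever two consecutive entries differ
    changes = int(np.count_nonzero(q[1:] != q[:-1]))
    return changes, changes / (len(q) - 1)

# Hypothetical quality sequences of different lengths
short_run = [0, 0, 1, 1, 2]
long_run = [0, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1]
print(quality_change_stats(short_run))
print(quality_change_stats(long_run))
```

The longer run has more changes in total but a lower rate of change per transition, which is the effect the note in `qualityChangeCount` warns about.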


## Input data stats

In [178]:
# Return the average bitrates for each quality level
def getAverageBitrates(segmentSizesFile):
    segmentSizes = readSegmentSizesFile(segmentSizesFile)
    return segmentSizes.mean(axis=1)/2.0 # Divide by 2 because each segment is 2 seconds long


# Return the average bitrate for the given quality level
def getAverageBitrate(segmentSizesFile, qualityLevel):
    segmentSizes = readSegmentSizesFile(segmentSizesFile)
    return segmentSizes[qualityLevel,:].mean() / 2.0 # Divide by 2 to account for segment duration
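
To illustrate the size-to-bitrate conversion, here is a small sketch on made-up data. It assumes the layout implied by `mean(axis=1)` above (one row per quality level, one column per segment) and sizes in bits; the array values and the `SEGMENT_DURATION_S` name are hypothetical.

```python
import numpy as np

SEGMENT_DURATION_S = 2.0  # segment length stated in the comments above

# Hypothetical sizes in bits: 2 quality levels (rows) x 3 segments (columns)
segment_sizes = np.array([
    [40_000, 44_000, 48_000],
    [80_000, 90_000, 100_000],
])

# Mean size per quality level, converted to bits per second
per_level = segment_sizes.mean(axis=1) / SEGMENT_DURATION_S
print(per_level)
```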

In [179]:
# Show the average bitrates for each quality level (option 1).
allQualityLevels = getAverageBitrates(defaultSegmentSizesFile)
for i in range(len(allQualityLevels)):
    # Print at 2 decimal precision with right alignment
    bitrate = allQualityLevels[i]
    rounded = "{:.2f}".format(bitrate)
    print ("Quality {} : ".format(i), f"{rounded:>10} bits/sec")

Quality 0 :    22049.02 bits/sec
Quality 1 :    43490.10 bits/sec
Quality 2 :   112718.11 bits/sec
Quality 3 :   234482.03 bits/sec
Quality 4 :   492342.43 bits/sec
Quality 5 :   945602.08 bits/sec
Quality 6 :  2295160.36 bits/sec
Quality 7 :  5125286.18 bits/sec


In [180]:
# Show the average bitrates for each quality level (option 2).
for i in range(len(allQualityLevels)):
    # Print at 2 decimal precision with right alignment
    bitrate = getAverageBitrate(defaultSegmentSizesFile, i)
    rounded = "{:.2f}".format(bitrate)
    print ("Quality {} : ".format(i), f"{rounded:>10} bits/sec")

Quality 0 :    22049.02 bits/sec
Quality 1 :    43490.10 bits/sec
Quality 2 :   112718.11 bits/sec
Quality 3 :   234482.03 bits/sec
Quality 4 :   492342.43 bits/sec
Quality 5 :   945602.08 bits/sec
Quality 6 :  2295160.36 bits/sec
Quality 7 :  5125286.18 bits/sec


## FESTIVE Metrics

FESTIVE defines metrics for efficiency, stability, and fairness. Since we're not looking at competition between players, we will focus on stability and efficiency for individual players.

### Efficiency
For `N` players and bandwidth `W`, where player `x` plays bitrate `b_x_t` at time `t`, we define inefficiency as a function of time:
`|sum_x(b_x_t) - W|/W`

For us, since we only have one player, this becomes
`|b_t - W|/W`

i.e. at time `t`, the inefficiency is the absolute difference between the current bitrate and the actual bandwidth, divided by the bandwidth.
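
As a worked example of the single-player formula, the sketch below plugs in two of the average bitrates printed earlier (quality 5 and quality 6) against an assumed 1 Mbps link. The `inefficiency` function name is hypothetical and the pairing of these bitrates with this bandwidth is only for illustration.

```python
def inefficiency(bitrate_bps, bandwidth_bps):
    """Single-player inefficiency: |b_t - W| / W."""
    return abs(bitrate_bps - bandwidth_bps) / bandwidth_bps

# Average bitrate of quality 5 on a 1 Mbps link: slightly under-used
print(inefficiency(945_602.08, 1_000_000))

# Average bitrate of quality 6 on a 1 Mbps link: heavily over-subscribed
print(inefficiency(2_295_160.36, 1_000_000))
```

The first value is roughly 0.054 and the second roughly 1.295. The score is not capped at 1: it exceeds 1 whenever the bitrate is more than twice the bandwidth.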

For this, we have a few options for computing the bitrate to use at time `t`:
* Take the mean over the entire video to get mean bitrate
* Take the mean over a sliding window of segments around the current one at the current quality 
* Look only at the segment being downloaded

I like the third option best because the adaptation algorithm knows the size of the segment when it makes its decision.
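
The three options can be sketched as follows, assuming a 1-D array of segment sizes (in bits) at a single quality level and the two-second segment duration used elsewhere in this notebook. The function names, the `half_width` parameter, and the sample sizes are hypothetical and only meant to show how the options differ.

```python
import numpy as np

SEGMENT_DURATION_S = 2.0

def bitrate_whole_video(sizes):
    """Option 1: mean bitrate over the entire video."""
    return np.mean(sizes) / SEGMENT_DURATION_S

def bitrate_sliding_window(sizes, index, half_width=1):
    """Option 2: mean bitrate over a window of segments around `index`."""
    lo = max(0, index - half_width)
    hi = min(len(sizes), index + half_width + 1)
    return np.mean(sizes[lo:hi]) / SEGMENT_DURATION_S

def bitrate_single_segment(sizes, index):
    """Option 3: bitrate of just the segment being downloaded."""
    return sizes[index] / SEGMENT_DURATION_S

sizes = np.array([40_000, 44_000, 60_000, 48_000])  # hypothetical, in bits
print(bitrate_whole_video(sizes))
print(bitrate_sliding_window(sizes, 2))
print(bitrate_single_segment(sizes, 2))
```

For the unusually large third segment, option 3 reports the highest bitrate, while the other two smooth the spike away to different degrees.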

In [181]:
def bitrateForSegment(sim, segmentNumber):
    segSize = getSegmentSizeChoices(sim.path)[segmentNumber]
    return segSize / 2.0 # Divide by 2 because duration is two seconds

def bitrateForAllSegments(sim):
    segSizes = getSegmentSizeChoices(sim.path)
    return segSizes / 2.0

In [182]:
def inefficiencyScore(sim, segmentNumber):
    bitrate = bitrateForSegment(sim, segmentNumber)   
    bandwidthBps = sim.rateVal
    return abs (bandwidthBps - bitrate) / bandwidthBps


def inefficiencyScores(sim):
    bitrates = bitrateForAllSegments(sim)
    bandwidthBps = sim.rateVal

    return abs (bitrates - bandwidthBps) / bandwidthBps


def meanInefficiency(sim):
    return inefficiencyScores(sim).mean()

In [183]:
sampleQuicSim = Simulation(sampleQuicDir)
print(inefficiencyScore(sampleQuicSim, 0))
print(inefficiencyScore(sampleQuicSim, 2))
print(inefficiencyScore(sampleQuicSim, 3))
print(inefficiencyScore(sampleQuicSim, 4))
print(inefficiencyScore(sampleQuicSim, 5))

inefficiencyScores(sampleQuicSim)

0.999218
0.979147
0.9769315
0.9859475
0.9858765


array([0.999218 , 0.988803 , 0.979147 , 0.9769315, 0.9859475, 0.9858765,
       0.9890855, 0.991851 , 0.9895895, 0.991705 , 0.990389 , 0.991656 ,
       0.9917645, 0.994451 , 0.993565 , 0.988925 , 0.989494 , 0.9879025,
       0.985992 , 0.986492 , 0.971861 , 0.948986 , 0.957303 , 0.9644935,
       0.971294 , 0.9737635, 0.927984 , 0.929926 , 0.9350025, 0.934797 ,
       0.9547595, 0.945214 , 0.9349135, 0.9709125, 0.950485 , 0.9749865,
       0.9686135, 0.968624 , 0.9672955, 0.968813 , 0.9634375, 0.965162 ,
       0.9600805, 0.963889 , 0.9794295, 0.9731   , 0.967376 , 0.971432 ,
       0.965893 , 0.977587 , 0.981885 , 0.994425 , 0.957565 , 0.9154615,
       0.9005715, 0.883133 , 0.8832015, 0.9009615, 0.913289 , 0.965199 ,
       0.9310595, 0.922618 , 0.9546665, 0.942962 , 0.9272035, 0.9512365,
       0.971827 , 0.9719025, 0.975762 , 0.930174 , 0.9307615, 0.9590985,
       0.9518955, 0.973785 , 0.9983015, 0.9410445, 0.882098 , 0.8739415,
       0.9014315, 0.8945845, 0.8913515, 0.902468 , 

In [207]:
sampleTcpSim = Simulation(sampleTcpDir)
print(inefficiencyScore(sampleTcpSim, 0))
print(inefficiencyScore(sampleTcpSim, 2))
print(inefficiencyScore(sampleTcpSim, 3))
print(inefficiencyScore(sampleTcpSim, 4))
print(inefficiencyScore(sampleTcpSim, 5))

inefficiencyScores(sampleTcpSim)

0.999218
0.979147
0.9769315
0.9859475
0.9858765


array([0.999218 , 0.988803 , 0.979147 , 0.9769315, 0.9859475, 0.9858765,
       0.9890855, 0.991851 , 0.9895895, 0.991705 , 0.990389 , 0.991656 ,
       0.9917645, 0.994451 , 0.993565 , 0.988925 , 0.989494 , 0.9879025,
       0.985992 , 0.986492 , 0.971861 , 0.948986 , 0.957303 , 0.9644935,
       0.971294 , 0.9737635, 0.927984 , 0.929926 , 0.9350025, 0.934797 ,
       0.9547595, 0.945214 , 0.9349135, 0.9709125, 0.950485 , 0.9749865,
       0.9686135, 0.968624 , 0.9672955, 0.968813 , 0.9634375, 0.965162 ,
       0.9600805, 0.922233 , 0.9794295, 0.9731   , 0.967376 , 0.971432 ,
       0.965893 , 0.977587 , 0.9592525, 0.994425 , 0.957565 , 0.9154615,
       0.9005715, 0.883133 , 0.8832015, 0.788591 , 0.913289 , 0.965199 ,
       0.9310595, 0.922618 , 0.9546665, 0.942962 , 0.835357 , 0.9512365,
       0.971827 , 0.9719025, 0.975762 , 0.930174 , 0.9307615, 0.9153015,
       0.9518955, 0.973785 , 0.9983015, 0.9410445, 0.882098 , 0.8739415,
       0.792141 , 0.8945845, 0.8913515, 0.902468 , 

In [185]:
# Computes the difference inefficiency(sim2) - inefficiency(sim1)
def compareInefficiency(sim1, sim2):
    scores1 = inefficiencyScores(sim1)
    scores2 = inefficiencyScores(sim2)
    
    # Trim off the end of the longer simulation if one quit earlier
    if len(scores1) != len(scores2):
        minLen = min(len(scores1), len(scores2))
        scores1 = scores1[:minLen]
        scores2 = scores2[:minLen]
    
    mean1 = scores1.mean()
    mean2 = scores2.mean()
    
    return mean2 - mean1
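
To see why the trimming step matters, here is a small sketch on made-up score arrays. `compare_mean_scores` is a hypothetical stand-in that takes the scores directly instead of simulations, and the values are invented.

```python
import numpy as np

def compare_mean_scores(scores1, scores2):
    """Mean of scores2 minus mean of scores1, over their common prefix."""
    n = min(len(scores1), len(scores2))
    return np.mean(scores2[:n]) - np.mean(scores1[:n])

quic_scores = np.array([0.9, 0.8, 0.7])            # hypothetical run that ended early
tcp_scores = np.array([0.8, 0.8, 0.6, 0.1, 0.1])   # hypothetical longer run

trimmed = compare_mean_scores(quic_scores, tcp_scores)
untrimmed = tcp_scores.mean() - quic_scores.mean()
print(trimmed, untrimmed)
```

Without trimming, the late low-inefficiency segments that only the longer run reached would inflate the apparent gap between the two runs.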

In [186]:
# Demo: Let's compare the inefficiencies between all pairs where there is loss
for (quicSim, tcpSim) in findSimulationPairs(quicLossDir, lambda sim : sim.countSegments() > 60):
    print ('comparing {} with TCP counterpart'.format(getDescription(quicSim)))
    delta = compareInefficiency(quicSim, tcpSim)
    if delta == 0:
        print ('tie')
    elif delta < 0:
        print ('TCP more efficient by {}'.format(abs(delta)))
    else:
        print ('QUIC more efficient by {}'.format(delta))
    print()
    
# TODO May want to try graphing this? The trend appears to be that TCP is more efficient

comparing festive over QUIC @ 500Kbps with 0.0% loss (with pacing) with TCP counterpart
TCP more efficient by 0.03349548979591821

comparing festive over QUIC @ 500Kbps with 0.0% loss (no pacing) with TCP counterpart
TCP more efficient by 0.0057276027397259455

comparing festive over QUIC @ 500Kbps with 1.0% loss (with pacing) with TCP counterpart
TCP more efficient by 0.012860738255033421

comparing festive over QUIC @ 500Kbps with 1.0% loss (no pacing) with TCP counterpart
TCP more efficient by 0.0030497288135593603

comparing festive over QUIC @ 1Mbps with 0.0% loss (with pacing) with TCP counterpart
TCP more efficient by 0.005827577639751613

comparing festive over QUIC @ 1Mbps with 1.0% loss (with pacing) with TCP counterpart
TCP more efficient by 0.000831677165354372

comparing festive over QUIC @ 1Mbps with 1.0% loss (no pacing) with TCP counterpart
TCP more efficient by 0.005305012345678928

comparing festive over QUIC @ 1Mbps with 2.0% loss (with pacing) with TCP counterpart
T

In [187]:
# Demo: Look at the average difference between QUIC and TCP efficiency
total = 0
count = 0
for (quicSim, tcpSim) in findSimulationPairs(quicLossDir, lambda sim : sim.countSegments() > 60):
    total += compareInefficiency(quicSim, tcpSim)
    count += 1

print (total/count)

-0.003927904646335097


__Efficiency take-away__ : From this measurement, it looks like neither one is significantly more efficient over the whole set of test runs. Is there a relevant subset where QUIC or TCP performs significantly better than the other?
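
One way to start looking for such a subset is to group the pairwise deltas by a simulation parameter. The sketch below uses the seven deltas visible in the pairwise output above, rounded to four decimals and written as inefficiency(TCP) minus inefficiency(QUIC), so negative means TCP was more efficient. The tuple layout and the `mean_delta_by` helper are hypothetical, and seven points are far too few to draw conclusions from.

```python
from collections import defaultdict

# (bandwidth, loss %, pacing, delta), rounded from the printed output
results = [
    ("500Kbps", 0.0, True, -0.0335),
    ("500Kbps", 0.0, False, -0.0057),
    ("500Kbps", 1.0, True, -0.0129),
    ("500Kbps", 1.0, False, -0.0030),
    ("1Mbps", 0.0, True, -0.0058),
    ("1Mbps", 1.0, True, -0.0008),
    ("1Mbps", 1.0, False, -0.0053),
]

def mean_delta_by(rows, key_index):
    """Average the delta (last field) of rows grouped by the field at key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[3])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

print(mean_delta_by(results, 1))  # grouped by loss rate
print(mean_delta_by(results, 2))  # grouped by pacing
```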

### Stability
FESTIVE defines instability as a weighted sum of bitrate switches over the last `k` seconds, divided by a weighted sum of the bitrates over the same window.
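
Written out for a single player, the ratio that `instabilityAtTime` below computes is the following, where `b_t` is the bitrate at second `t`, `k` is the window length (20 in the code), and the weight is `w(d) = k - d`. This is transcribed from the code rather than from the paper, so treat it as a description of this notebook's implementation.

$$
\text{instability}(t) = \frac{\sum_{d=0}^{k-1} w(d)\,\lvert b_{t-d} - b_{t-d-1} \rvert}{\sum_{d=1}^{k} w(d)\, b_{t-d}}, \qquad w(d) = k - d
$$

For `t < k` the code replaces `k` in both upper limits with `t` so that the sums never reach before the start of the timeline, and it returns 0 at `t = 0`. Note that `w(k) = 0`, so the last term of the denominator contributes nothing.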

In [188]:
def bitrateAtTime(sim, time):
    # Find the segment that corresponds to the given time and use its size
    segment = int(np.floor (time / 2))
    return bitrateForSegment(sim, segment)


def bitrateEachSecond(sim):
    segmentBitrates = bitrateForAllSegments(sim)
    
    # Each segment represents two seconds, so we need to duplicate all values in
    # the list.
    return np.repeat(segmentBitrates, 2)


# Compute the instability of simulation sim as of time t
def instabilityAtTime(sim, t):
    if (t == 0):
        return 0
    
    k = 20 # Length of weighted sum
    windowSize = min(k, t) # Don't go back the full window at the beginning of the timeline
    
    def weight(d):
        return k - d  # Linear penalty function: switches further in the past contribute less
    
    bitSwitchSum = 0
    for d in range(windowSize):
        # Could be done more efficiently
        prevBitrate = bitrateAtTime(sim, t-d-1)
        curBitrate = bitrateAtTime(sim, t - d)
        
        bitSwitchSum += weight(d) * abs (prevBitrate - curBitrate)    
    
    bitrateSum = 0
    for d in range(1, windowSize + 1):
        bitrate = bitrateAtTime(sim, t - d)
        bitrateSum += weight(d) * bitrate
        
    return bitSwitchSum / bitrateSum



In [208]:
def bitrateAtTime(sim, time):
    # Find the segment that corresponds to the given time and use its size
    segment = int(np.floor (time / 2))
    return bitrateForSegment(sim, segment)

# Compute the instability of simulation sim as of time t
def instabilityAtTime(sim, t):
    if (t == 0):
        return 0
    
    bitrates = bitrateEachSecond(sim)
    
    k = 20 # Length of weighted sum
    windowSize = min(k, t) # Don't go back the full window at the beginning of the timeline
    
    def weight(d):
        return k - d  # Linear penalty function: switches further in the past contribute less
    
    bitSwitchSum = 0
    for d in range(windowSize):
        prev = bitrates[t-d-1]
        cur = bitrates[t-d]
        bitSwitchSum += weight(d) * abs (cur - prev)    
    
    bitrateSum = 0
    for d in range(1, windowSize + 1):
        bitrateSum += weight(d) * bitrates[t-d]
    
    return bitSwitchSum / bitrateSum


def instabilityEachSecond(sim):
    bitrates = bitrateEachSecond(sim)
    seconds = sim.duration()

    results = np.zeros(seconds)
    
    # TODO this is insanely inefficient. Should do a sliding window approach...
    for sec in range(seconds):
        results[sec] = instabilityAtTime(sim, sec)
        
    return results
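
The sliding-window idea from the TODO could look roughly like the sketch below, which computes every per-second value with two convolutions instead of re-running the loops for each second. It works on a plain array of per-second bitrates rather than a `Simulation`, assumes all bitrates are positive and `k >= 2`, and both function names are hypothetical. `instability_at` mirrors the loops above and is included only to check the result.

```python
import numpy as np

def instability_at(bitrates, t, k=20):
    """Direct per-second computation, mirroring the loops above."""
    if t == 0:
        return 0.0
    window = min(k, t)
    switch_sum = sum((k - d) * abs(bitrates[t - d] - bitrates[t - d - 1])
                     for d in range(window))
    bitrate_sum = sum((k - d) * bitrates[t - d] for d in range(1, window + 1))
    return switch_sum / bitrate_sum

def instability_all(bitrates, k=20):
    """All per-second instabilities at once, using two convolutions."""
    b = np.asarray(bitrates, dtype=float)
    n = len(b)
    result = np.zeros(n)
    if n < 2:
        return result
    # |b[t] - b[t-1]|, with 0 at t = 0 where there is no previous second
    switches = np.abs(np.diff(b, prepend=b[0]))
    # Weights indexed by d: k, k-1, ..., 1 for d = 0..k-1
    w_switch = np.arange(k, 0, -1, dtype=float)
    # Weights indexed by d: 0 for d = 0, then k-1, ..., 0 for d = 1..k
    w_rate = np.concatenate(([0.0], np.arange(k - 1, -1, -1, dtype=float)))
    # Full convolution zero-pads before t = 0, which shortens the window there
    switch_sum = np.convolve(switches, w_switch)[:n]
    bitrate_sum = np.convolve(b, w_rate)[:n]
    result[1:] = switch_sum[1:] / bitrate_sum[1:]
    return result

# Hypothetical per-second bitrates: each segment value repeated for two seconds
bitrates = np.repeat([100.0, 200.0, 200.0, 400.0], 2)
direct = [instability_at(bitrates, t) for t in range(len(bitrates))]
print(np.allclose(instability_all(bitrates), direct))
```

The per-second array only needs to be built once (for example with `bitrateEachSecond`), after which the cost is two convolutions of length `n` with a kernel of length about `k`.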
    