# Data Streaming Algorithms

## HW2

See preprocessed notebook at https://github.com/salasin/DataStreamingAlgorithms

## Dry Part

$M$ is a data stream and $|M| = n$.

$A$ is $M$'s heavy hitters sketch and $|A| = k$.

(a) For every $j \in A$, $\tilde{f_j} \ge f_j$

The proof is by induction on the number of appearances of an element in the stream.

**Base case:**

After the 1st appearance of $j$ we have $f_j = 1$.

If $|A| < k$ before $j$'s appearance then it's added to the sketch and we get $\tilde{f_j} = 1 \ge f_j$.

Otherwise $|A| = k$, so $j$ replaces the element with the minimal counter and we get $\tilde{f_j} = \tilde{f}_{min} + 1 \ge 1 = f_j$, because the counters are non-negative and hence $\tilde{f}_{min} \ge 0$.

**Induction step:**

We denote by $\tilde{f_j}^{(i)}$ the value of $j$'s counter after $i$ appearances of $j$ and by $f_j^{(i)}$ the true count of $j$'s appearances (which is $i$).

We assume that $\tilde{f_j}^{(i)} \ge f_j^{(i)}$ holds for a given $i \ge 1$.

Then for $i + 1$ we get:

If $j \in A$ then we increment $\tilde{f_j}^{(i)}$ and we get $\tilde{f_j}^{(i + 1)} = \tilde{f_j}^{(i)} + 1 \ge f_j^{(i)} + 1 = f_j^{(i + 1)}$.

If $j \notin A$, then since $i \ge 1$ (this is not $j$'s first appearance) we can infer that $j$ was in $A$ earlier and was removed. Therefore it must hold that $|A| = k$. At the time of $j$'s removal from $A$ it held the minimal counter, i.e. $\tilde{f_j}^{(i)} = \tilde{f}_{min}$. Now we re-insert $j$ into $A$, and the current minimum $\tilde{f}_{min}^{(new)}$ satisfies $\tilde{f}_{min}^{(new)} \ge \tilde{f}_{min}$ because the minimal counter value never decreases. Then we get $\tilde{f_j}^{(i + 1)} = \tilde{f}_{min}^{(new)} + 1 \ge \tilde{f}_{min} + 1 = \tilde{f_j}^{(i)} + 1 \ge f_j^{(i)} + 1 = f_j^{(i + 1)}$.

(b) For every $j \in [M]$, $\tilde{f_j} \le f_j + \tilde{f}_{min}$

If $j \notin A$ we have $\tilde{f_j} = 0$ and the inequality holds since the right side is non-negative.

If $j \in A$, then $j$ was inserted into $A$ at some point and additional instances of $j$ may have been observed since. We denote by $\tilde{f_j}^{(curr)}$ the current value of $j$'s counter, by $\tilde{f}_{min}^{(curr)}$ the current minimal counter value, and by $\tilde{f}_{min}^{(old)}$ the minimal counter value at the time $j$ was last inserted (taking $\tilde{f}_{min}^{(old)} = 0$ if the sketch was not yet full). We denote by $f_{j}^{(before)}$ the number of instances of $j$ before that insertion and by $f_{j}^{(after)}$ the number of instances from the insertion onward, including the one that triggered it.

We observe that $f_j = f_{j}^{(before)} + f_{j}^{(after)}$ and therefore $f_j \ge f_{j}^{(after)}$. We also observe that $\tilde{f}_{min}^{(old)} \le \tilde{f}_{min}^{(curr)}$ because the minimal counter value never decreases.

Therefore:

$\tilde{f_j} = \tilde{f_j}^{(curr)} = f_{j}^{(after)} + \tilde{f}_{min}^{(old)} \le f_j + \tilde{f}_{min}^{(curr)} = f_j + \tilde{f}_{min}$

(c) $\sum_{j \in A} \tilde{f_j} = n$

Proof by induction on the size of the stream.

**Base case:**

When $n = 1$, i.e. when we process the first element $i$ in the stream, we add a new counter $\tilde{f_i} = 1$ to the sketch. Therefore:

$\sum_{j \in A} \tilde{f_j} = \tilde{f_i} = 1 = n$

**Induction step:**

We denote by $\sum_{j \in A}^{(n)} \tilde{f_j}$ the sum of all counters and by $\tilde{f_{j}}^{(n)}$ the value of $j$'s counter after $n$ elements from the stream were processed.

We assume that $\sum_{j \in A}^{(n)} \tilde{f_j} = n$ holds for a given $n$.

Then for $n + 1$ we get:

We process the $(n + 1)$-th element $i$. If $i \in A$, we increment $\tilde{f_i}$. If $i \notin A$ and $|A| < k$, we add a new counter with value 1, i.e. we treat $\tilde{f_i}^{(n)} = 0$. If $i \notin A$ and $|A| = k$, then $i$ takes over the minimal counter and increments it, i.e. we treat $\tilde{f_i}^{(n)} = \tilde{f}_{min}$. In every case exactly one counter grows by 1 and all other counters are unchanged, so we get:

$\sum_{j \in A}^{(n + 1)} \tilde{f_j} = \sum_{j \in A, j \neq i}^{(n + 1)} \tilde{f_j} + \tilde{f_i}^{(n + 1)} = \sum_{j \in A, j \neq i}^{(n)} \tilde{f_j} + \tilde{f_i}^{(n)} + 1 = \sum_{j \in A}^{(n)} \tilde{f_j} + 1 = n + 1$

(d) $\tilde{f}_{min} \le \lfloor n/k \rfloor$ and hence $f_j \le \tilde{f_j} \le f_j + \lfloor n/k \rfloor$ for every $j \in A$.

From (c) we know that $\sum_{j \in A} \tilde{f_j} = n$. Also, the number of counters is $k$ (counting unused counters as 0). Therefore, the average value of a counter is $n / k$. Since $\tilde{f}_{min}$ is the smallest of all counters, it cannot exceed the average, so $\tilde{f}_{min} \le n/k$. Because the counters are integers, it follows that $\tilde{f}_{min} \le \lfloor n/k \rfloor$.

Combining this conclusion with the conclusions from (a) and (b) we can deduce that $f_j \le \tilde{f_j} \le f_j + \lfloor n/k \rfloor$ for every $j \in A$.
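Claims (a) to (d) can also be checked empirically. The following is a minimal sketch, assuming the heavy hitters sketch is the Space-Saving procedure that the proofs describe (increment if present, add with value 1 if there is room, otherwise replace the minimal counter and increment it). The function name `space_saving` and the toy stream are illustrative assumptions, not part of the homework code.

```python
from collections import Counter
import random


def space_saving(stream, k):
    """Return the counters kept by a k-counter Space-Saving sketch."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Evict the item with the minimal counter; the newcomer inherits min + 1.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


random.seed(0)
stream = [random.choice("aaaaabbbcdefghij") for _ in range(1000)]
k = 4
sketch = space_saving(stream, k)
true_counts = Counter(stream)
n = len(stream)
# Unused counters count as 0, so the minimum is 0 until the sketch is full.
f_min = min(sketch.values()) if len(sketch) == k else 0

assert sum(sketch.values()) == n                   # (c)
assert f_min <= n // k                             # (d)
for item, estimate in sketch.items():
    assert true_counts[item] <= estimate           # (a)
    assert estimate <= true_counts[item] + f_min   # (b)
```

The assertions pass silently when the four properties hold for the given stream.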

## Wet Part

In [146]:
import os.path
import pickle
import pandas as pd
import numpy as np
import sys
import time
import math

In [147]:
data = pd.read_csv('2017-07-03.csv')
source_ips = data.loc[:, 'ipv4Src']
unique_source_ips = len(source_ips.unique())
print(f'number of unique source ips: {unique_source_ips}')

number of unique source ips: 9640


In [148]:
true_frequencies = {}
for source_ip in source_ips:
    true_frequencies[source_ip] = true_frequencies.get(source_ip, 0) + 1
print(f'true frequency of 35.160.100.86 is: {true_frequencies["35.160.100.86"]}')

true frequency of 35.160.100.86 is: 753


In [149]:
top_elephant_flows = sorted(true_frequencies, key=true_frequencies.get, reverse=True)[:20]
top_rare_flows = sorted(true_frequencies, key=true_frequencies.get, reverse=False)[:20]

In [150]:
def init_hash_functions(stream, d, w):
    # Builds d random lookup tables that map every distinct source ip to a bucket between 0 and w - 1.
    unique_items = stream.unique()
    hash_functions = []
    for _ in range(d):
        hash_functions.append(dict(zip(unique_items, np.random.randint(0, w, len(unique_items)))))
    return hash_functions
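The lookup tables built by `init_hash_functions` store one entry per distinct item for every row, so their size grows with the number of distinct keys. As a possible alternative, the sketch below derives each row's bucket from a salted digest, so no per-item table is kept. The helper name `make_hash_function` is hypothetical, and using `hashlib.blake2b` with a per-row salt is an assumption made for illustration, not the method used in this notebook.

```python
import hashlib


def make_hash_function(row, w):
    """Return a function that maps a string key to a bucket in [0, w) for the given row."""
    salt = row.to_bytes(8, "little")  # a distinct salt per row yields a distinct function

    def hash_function(key):
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % w

    return hash_function


hash_functions = [make_hash_function(row, w=512) for row in range(2)]
buckets = [h("35.160.100.86") for h in hash_functions]
print(buckets)
```

Such functions would be called as `hash_function(item)` rather than indexed as `hash_function[item]`, so the sketch classes would need a matching change.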

In [151]:
def count_morris_counters(morris_sigma, morris_delta):
    return int(1 / (morris_delta * (morris_sigma ** 2)))
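To illustrate what these counters do, here is a minimal standalone sketch of averaged Morris counters, following the same update rule the sketch classes in this notebook use (increment exponent $X$ with probability $2^{-X}$, estimate $2^X - 1$). The function name `morris_estimate` and the use of `np.random.default_rng` are assumptions made for this example.

```python
import numpy as np


def morris_estimate(n, num_estimators, rng):
    """Feed n increments to independent Morris counters and return their averaged estimate."""
    exponents = np.zeros(num_estimators)
    for _ in range(n):
        # Each counter X is incremented with probability 2^-X.
        update_mask = rng.uniform(0, 1, size=num_estimators) <= 1 / (2 ** exponents)
        exponents[update_mask] += 1
    # 2^X - 1 is an unbiased estimate of n; averaging reduces the variance.
    return float(np.mean(2 ** exponents - 1))


rng = np.random.default_rng(0)
print(morris_estimate(1000, 4, rng))     # noisy with only 4 counters
print(morris_estimate(1000, 2000, rng))  # averaging many counters tightens the estimate
```

With only 4 counters per cell, as computed by `count_morris_counters(0.5, 0.9)`, individual estimates can be far from the true count, which is worth keeping in mind when reading the accuracy results.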

In [152]:
def evaluate_avg_bias(items, sketch, true_frequencies):
    unique_items = np.unique(items)
    bias_per_item = []
    for item in unique_items:
        true_frequency = true_frequencies[item]
        estimated_frequency = sketch.query(item)
        bias = float(estimated_frequency / true_frequency)
        normalized_bias = 1 - bias if bias <= 1 else bias - 1
        bias_per_item.append(normalized_bias)
    return round(sum(bias_per_item) / len(bias_per_item), 2)

### Count Min Sketch

In [153]:
class CountMinSketch:
    def __init__(self, stream, d, w, morris_sigma=0.5, morris_delta=0.9, use_morris_counters=True):
        self.d = d
        self.w = w
        self.morris_estimators_num = count_morris_counters(morris_sigma, morris_delta)
        self.use_morris_counters = use_morris_counters
        self.hash_functions = init_hash_functions(stream, d, w)
        self.sketch = self.create_sketch(stream)

    def create_sketch(self, stream):
        sketch = {}
        for i, item in enumerate(stream):
            for j, hash_function in enumerate(self.hash_functions):
                if self.use_morris_counters:
                    morris_estimators = sketch.get((j, hash_function[item]), np.zeros(self.morris_estimators_num))
                    p = 1 / (2 ** morris_estimators)
                    update_mask = np.random.uniform(0, 1, size=morris_estimators.shape) <= p
                    morris_estimators[update_mask] += 1
                    sketch[(j, hash_function[item])] = morris_estimators
                else:
                    sketch[(j, hash_function[item])] = sketch.get((j, hash_function[item]), 0) + 1
            sys.stdout.write(f'\rprocessing element {i} of {len(stream)}: {round(float(i / len(stream)) * 100, 2)}% completed.')
        print()
        return sketch

    def query(self, item):
        if self.use_morris_counters:
            hash_values = []
            for i, hash_function in enumerate(self.hash_functions):
                morris_estimators = self.sketch[(i, hash_function[item])]
                processed_morris_estimators = (2 ** morris_estimators) - 1
                hash_values.append(np.mean(processed_morris_estimators))
            return min(hash_values)
        else:
            hash_values = [self.sketch[(i, hash_function[item])] for i, hash_function in enumerate(self.hash_functions)]
            return min(hash_values)

The following function checks empirically whether the theoretical guarantee for count min sketch holds: it counts the distinct items whose estimate falls within the guaranteed range and divides by the total number of distinct items. It also returns the guaranteed probability $1 - \delta$.

If $w = \dfrac{2}{\epsilon}$ and $d = \log_2\dfrac{1}{\delta}$ then $Pr[f_x \le \tilde{f}_x \le f_x + \epsilon F_1] \ge 1 - \delta$, where $F_1$ is the length of the stream.

In [154]:
def calc_count_min_sketch_theoretical_guarantees_accuracy(items, sketch, true_frequencies):
    unique_items = items.unique()
    items_that_hold_range_guarantee = 0
    epsilon = float(2 / sketch.w)
    delta = float(1 / (2 ** sketch.d))
    for item in unique_items:
        true_frequency = true_frequencies[item]
        estimated_frequency = sketch.query(item)
        if true_frequency <= estimated_frequency <= true_frequency + epsilon * len(items):
            items_that_hold_range_guarantee += 1
    return round(float(items_that_hold_range_guarantee / len(unique_items)), 2), 1 - delta
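To get a feel for what the guarantee promises, the following sketch inverts the formulas ($\epsilon = 2/w$, $\delta = 2^{-d}$) for the (d, w) pairs used in this notebook. The helper name `count_min_guarantee` is hypothetical; the stream length is the one reported in the processing output of this notebook.

```python
def count_min_guarantee(d, w, stream_length):
    """Return (epsilon, additive error bound epsilon * F_1, success probability 1 - delta)."""
    epsilon = 2 / w
    delta = 1 / (2 ** d)
    return epsilon, epsilon * stream_length, 1 - delta


for d, w in [(2, 512), (4, 256), (8, 128)]:
    epsilon, max_error, confidence = count_min_guarantee(d, w, stream_length=11625503)
    print(f'd={d}, w={w}: epsilon={epsilon:.5f}, max error={max_error:.0f}, confidence={confidence}')
```

The additive bound is proportional to the stream length, so on a stream of millions of elements it is loose compared with the true frequency of rare items.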

To make the problem concrete, we assume a business requirement to reduce memory consumption to 10% of what a solution with exact counters needs.

Since we have 9640 unique elements in the stream and assuming a single counter takes 8 bytes, we need 77,120 bytes to store all the counters. We will try to reduce this to 8 KB = 8192 bytes, which approximately meets the business requirement.

We will try to optimize the sketch accuracy and overall runtime using only 8 KB.

With regular 8-byte counters, 8 KB holds up to $2^{10}$ counters, which allows the following configurations:
1. d = 2, w = 512
2. d = 4, w = 256
3. d = 8, w = 128
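The configurations above follow from simple budget arithmetic, sketched here. The constant names are illustrative; the 8 KB budget and 8-byte counter size come from the text.

```python
BUDGET_BYTES = 8 * 1024   # 8 KB target
COUNTER_BYTES = 8         # one regular 64-bit counter

total_counters = BUDGET_BYTES // COUNTER_BYTES
configs = [(d, total_counters // d) for d in (2, 4, 8)]
print(total_counters, configs)  # 1024 [(2, 512), (4, 256), (8, 128)]
```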

In [155]:
if os.path.isfile('count_min_regular_counters_results.pkl'):
    with open('count_min_regular_counters_results.pkl', 'rb') as f:
        results_dict = pickle.load(f)
else:
    results_dict = {}

In [156]:
if not results_dict:
    d_w_pairs = [(2, 512), (4, 256), (8, 128)]
    elapsed_time = []
    total_avg_bias = []
    elephant_flows_avg_bias = []
    rare_flows_avg_bias = []
    theoretical_guarantee_empiric_accuracies = []
    theoretical_guarantee_calculated_accuracies = []
    for d, w in d_w_pairs:
        print(f'calculating count min sketch with regular counters with d={d} and w={w}')
        start = time.time()
        count_min_sketch = CountMinSketch(source_ips, d, w, use_morris_counters=False)
        end = time.time()
        elapsed_time.append(int(end - start))
        total_avg_bias.append(evaluate_avg_bias(source_ips, count_min_sketch, true_frequencies))
        elephant_flows_avg_bias.append(evaluate_avg_bias(top_elephant_flows, count_min_sketch, true_frequencies))
        rare_flows_avg_bias.append(evaluate_avg_bias(top_rare_flows, count_min_sketch, true_frequencies))
        empiric_accuracy, calculated_accuracy = calc_count_min_sketch_theoretical_guarantees_accuracy(source_ips, count_min_sketch, true_frequencies)
        theoretical_guarantee_empiric_accuracies.append(empiric_accuracy)
        theoretical_guarantee_calculated_accuracies.append(calculated_accuracy)
    results_dict = {'d_w_pairs': d_w_pairs,
                    'elapsed_time': elapsed_time,
                    'total_avg_bias': total_avg_bias,
                    'elephant_flows_avg_bias': elephant_flows_avg_bias,
                    'rare_flows_avg_bias': rare_flows_avg_bias,
                    'theoretical_guarantee_empiric_accuracies': theoretical_guarantee_empiric_accuracies,
                    'theoretical_guarantee_calculated_accuracies': theoretical_guarantee_calculated_accuracies}
    with open('count_min_regular_counters_results.pkl', 'wb') as f:
        pickle.dump(results_dict, f)

calculating count min sketch with regular counters with d=2 and w=512
processing element 11625502 of 11625503: 100.0% completed.
calculating count min sketch with regular counters with d=4 and w=256
processing element 11625502 of 11625503: 100.0% completed.
calculating count min sketch with regular counters with d=8 and w=128
processing element 11625502 of 11625503: 100.0% completed.


In [157]:
pd.DataFrame.from_dict(results_dict).head()

| | d_w_pairs | elapsed_time (s) | total_avg_bias | elephant_flows_avg_bias | rare_flows_avg_bias | theoretical_guarantee_empiric_accuracies | theoretical_guarantee_calculated_accuracies |
|---|---|---|---|---|---|---|---|
| 0 | (2, 512) | 845 | 120.15 | 0.03 | 2170.47 | 1.0 | 0.75 |
| 1 | (4, 256) | 852 | 192.91 | 0.06 | 3076.3 | 1.0 | 0.9375 |
| 2 | (8, 128) | 893 | 404.83 | 0.11 | 6348.1 | 1.0 | 0.996094 |


In [158]:
print(count_morris_counters(morris_sigma=0.5, morris_delta=0.9))

4


With Morris counters, there is a tradeoff between the number of matrix counters and the number of Morris counters used to maintain each matrix counter. A Morris counter needs roughly $\log_2 64 = 6$ bits, so a regular 64-bit counter takes roughly as much storage as 10 Morris counters. Using 10 Morris counters per matrix counter would be pointless, since a regular counter of the same size is more accurate. Our product managers told us that they are willing to tolerate morris_sigma = 0.5 and morris_delta = 0.9, and therefore we can use 4 Morris counters per matrix counter. 4 Morris counters take 3 bytes, so within 8 KB we can use 2730 matrix counters:
1. d = 2, w = 1365
2. d = 4, w = 682

Note that for d = 2 we get w = 1365 counters per row, which is more than twice the w = 512 we could afford with regular counters and the same storage.
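The storage arithmetic for the Morris configuration can be reproduced as follows. This is a small sketch with illustrative variable names; the 6-bit Morris counter size, 4 counters per cell, and the 8 KB budget are the figures stated in the text.

```python
import math

BUDGET_BYTES = 8 * 1024
bits_per_morris = int(math.log2(64))                      # exponent of a 64-bit count fits in 6 bits
morris_per_cell = 4                                       # count_morris_counters(0.5, 0.9)
bytes_per_cell = bits_per_morris * morris_per_cell // 8   # 24 bits = 3 bytes

cells = BUDGET_BYTES // bytes_per_cell
configs = [(d, cells // d) for d in (2, 4)]
print(bytes_per_cell, cells, configs)  # 3 2730 [(2, 1365), (4, 682)]
```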

In [159]:
if os.path.isfile('count_min_morris_counters_results.pkl'):
    with open('count_min_morris_counters_results.pkl', 'rb') as f:
        results_dict = pickle.load(f)
else:
    results_dict = {}

In [160]:
if not results_dict:
    d_w_pairs = [(2, 1365), (4, 682)]
    elapsed_time = []
    total_avg_bias = []
    elephant_flows_avg_bias = []
    rare_flows_avg_bias = []
    theoretical_guarantee_empiric_accuracies = []
    theoretical_guarantee_calculated_accuracies = []
    morris_sigma = 0.5
    morris_delta = 0.9
    start = time.time()
    for d, w in d_w_pairs:
        print(f'calculating count min sketch with regular counters with d={d} and w={w} and {count_morris_counters(morris_sigma, morris_delta)} morris counters per matrix counter')
        start = time.time()
        count_min_sketch = CountMinSketch(source_ips, d, w, morris_sigma, morris_delta)
        end = time.time()
        elapsed_time.append(int(end - start))
        total_avg_bias.append(evaluate_avg_bias(source_ips, count_min_sketch, true_frequencies))
        elephant_flows_avg_bias.append(evaluate_avg_bias(top_elephant_flows, count_min_sketch, true_frequencies))
        rare_flows_avg_bias.append(evaluate_avg_bias(top_rare_flows, count_min_sketch, true_frequencies))
        empiric_accuracy, calculated_accuracy = calc_count_min_sketch_theoretical_guarantees_accuracy(source_ips, count_min_sketch, true_frequencies)
        theoretical_guarantee_empiric_accuracies.append(empiric_accuracy)
        theoretical_guarantee_calculated_accuracies.append(calculated_accuracy)
    results_dict = {'d_w_pairs': d_w_pairs,
                    'elapsed_time': elapsed_time,
                    'total_avg_bias': total_avg_bias,
                    'elephant_flows_avg_bias': elephant_flows_avg_bias,
                    'rare_flows_avg_bias': rare_flows_avg_bias,
                    'theoretical_guarantee_empiric_accuracies': theoretical_guarantee_empiric_accuracies,
                    'theoretical_guarantee_calculated_accuracies': theoretical_guarantee_calculated_accuracies}
    with open('count_min_morris_counters_results.pkl', 'wb') as f:
        pickle.dump(results_dict, f)

calculating count min sketch with regular counters with d=2 and w=1365 and 4 morris counters per matrix counter
processing element 11625502 of 11625503: 100.0% completed.
calculating count min sketch with regular counters with d=4 and w=682 and 4 morris counters per matrix counter
processing element 11625502 of 11625503: 100.0% completed.


In [161]:
pd.DataFrame.from_dict(results_dict).head()

| | d_w_pairs | elapsed_time (s) | total_avg_bias | elephant_flows_avg_bias | rare_flows_avg_bias | theoretical_guarantee_empiric_accuracies | theoretical_guarantee_calculated_accuracies |
|---|---|---|---|---|---|---|---|
| 0 | (2, 1365) | 1255 | 36.04 | 0.19 | 513.23 | 0.97 | 0.75 |
| 1 | (4, 682) | 1563 | 45.79 | 0.31 | 1025.62 | 0.98 | 0.9375 |


In [162]:
def init_sign_hash_functions(stream, d):
    # Builds d random lookup tables that map every distinct source ip to a sign in {1, -1}.
    unique_items = stream.unique()
    sign_hash_functions = []
    for _ in range(d):
        sign_hash_functions.append(dict(zip(unique_items, np.random.choice([1, -1], len(unique_items)))))
    return sign_hash_functions

### Count Sketch

In [174]:
class CountSketch:
    def __init__(self, stream, d, w, morris_sigma=0.5, morris_delta=0.9, use_morris_counters=True):
        self.d = d
        self.w = w
        self.morris_estimators_num = count_morris_counters(morris_sigma, morris_delta)
        self.use_morris_counters = use_morris_counters
        self.hash_functions = init_hash_functions(stream, d, w)
        self.sign_hash_functions = init_sign_hash_functions(stream, d)
        self.sketch = self.create_sketch(stream)

    def create_sketch(self, stream):
        sketch = {}
        for i, item in enumerate(stream):
            for j, hash_function in enumerate(self.hash_functions):
                sign_hash_function = self.sign_hash_functions[j]
                if self.use_morris_counters:
                    morris_estimators = sketch.get((j, hash_function[item]), np.zeros(self.morris_estimators_num))
                    p = 1 / (2 ** morris_estimators)
                    update_mask = np.random.uniform(0, 1, size=morris_estimators.shape) <= p
                    morris_estimators[update_mask] += sign_hash_function[item]
                    sketch[(j, hash_function[item])] = morris_estimators
                else:
                    sketch[(j, hash_function[item])] = sketch.get((j, hash_function[item]), 0) + sign_hash_function[item]
            sys.stdout.write(f'\rprocessing element {i} of {len(stream)}: {round(float(i / len(stream)) * 100, 2)}% completed.')
        print()
        return sketch

    def query(self, item):
        if self.use_morris_counters:
            hash_values = []
            for i, hash_function in enumerate(self.hash_functions):
                morris_estimators = self.sketch[(i, hash_function[item])]
                processed_morris_estimators = (2 ** morris_estimators) - 1
                hash_values.append(np.mean(processed_morris_estimators) * self.sign_hash_functions[i][item])
            return np.median(hash_values)
        else:
            hash_values = [self.sketch[(i, hash_function[item])] * self.sign_hash_functions[i][item] for i, hash_function in enumerate(self.hash_functions)]
            return np.median(hash_values)
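One caveat about the Morris branch above: a Morris counter stores an exponent that is only meant to grow, but adding a sign of -1 to it can drive the exponent negative, at which point `p = 1 / (2 ** morris_estimators)` exceeds 1 and the estimate $2^X - 1$ no longer corresponds to a count. A possible workaround, shown here only as a sketch under that assumption, is to keep two Morris arrays per cell (one for +1 updates, one for -1 updates) and subtract their estimates. The class name `SignedMorrisCell` is hypothetical, and this doubles the storage per cell relative to the budget computed earlier.

```python
import numpy as np


class SignedMorrisCell:
    """One sketch cell that tracks +1 and -1 updates in separate Morris counter arrays."""

    def __init__(self, num_estimators, rng):
        self.positive = np.zeros(num_estimators)
        self.negative = np.zeros(num_estimators)
        self.rng = rng

    def update(self, sign):
        # Pick the array matching the sign; it is modified in place below.
        exponents = self.positive if sign > 0 else self.negative
        p = 1 / (2 ** exponents)
        update_mask = self.rng.uniform(0, 1, size=exponents.shape) <= p
        exponents[update_mask] += 1  # exponents only grow, so p stays in (0, 1]

    def estimate(self):
        # Estimated (number of +1 updates) minus estimated (number of -1 updates).
        return float(np.mean(2 ** self.positive - 1) - np.mean(2 ** self.negative - 1))


cell = SignedMorrisCell(num_estimators=4, rng=np.random.default_rng(0))
for sign in [1] * 500 + [-1] * 200:
    cell.update(sign)
print(cell.estimate())  # a noisy estimate of 500 - 200
```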

The following function checks empirically whether the theoretical guarantee for count sketch holds: it counts the distinct items whose estimate falls within the guaranteed range and divides by the total number of distinct items. It also returns the guaranteed probability $1 - \delta$.

If $w = \dfrac{3}{\epsilon^2}$ and $d = \log_2\dfrac{1}{\delta}$ then $Pr[|\tilde{f}_x - f_x| \le \epsilon \|f\|_2] \ge 1 - \delta$

In [175]:
def calc_count_sketch_theoretical_guarantees_accuracy(items, sketch, true_frequencies):
    unique_items = items.unique()
    items_that_hold_range_guarantee = 0
    epsilon = math.sqrt(float(3 / sketch.w))
    delta = float(1 / (2 ** sketch.d))
    frequencies_norm = np.linalg.norm(list(true_frequencies.values()))
    for item in unique_items:
        true_frequency = true_frequencies[item]
        estimated_frequency = sketch.query(item)
        if abs(estimated_frequency - true_frequency) <= epsilon * frequencies_norm:
            items_that_hold_range_guarantee += 1
    return round(float(items_that_hold_range_guarantee / len(unique_items)), 2), 1 - delta
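As a small worked example of the bound used above, the following sketch computes $\epsilon \|f\|_2$ and $1 - \delta$ for a toy frequency vector. The helper name `count_sketch_guarantee` and the toy values are assumptions chosen so the numbers are easy to verify by hand ($\|(3, 4)\|_2 = 5$ and $\sqrt{3/300} = 0.1$).

```python
import math
import numpy as np


def count_sketch_guarantee(d, w, frequencies):
    """Return (additive error bound epsilon * ||f||_2, success probability 1 - delta)."""
    epsilon = math.sqrt(3 / w)
    delta = 1 / (2 ** d)
    return epsilon * float(np.linalg.norm(frequencies)), 1 - delta


toy_frequencies = [3, 4]
bound, confidence = count_sketch_guarantee(d=4, w=300, frequencies=toy_frequencies)
print(bound, confidence)
```

Because the bound scales with $\|f\|_2$ rather than the stream length $F_1$, it can be considerably tighter than the count min bound when the frequency mass is spread over many items.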

The analysis we will perform on the count sketch is similar to the analysis we did for count min sketch.

In [176]:
if os.path.isfile('count_regular_counters_results.pkl'):
    with open('count_regular_counters_results.pkl', 'rb') as f:
        results_dict = pickle.load(f)
else:
    results_dict = {}

In [177]:
if not results_dict:
    d_w_pairs = [(2, 512), (4, 256), (8, 128)]
    elapsed_time = []
    total_avg_bias = []
    elephant_flows_avg_bias = []
    rare_flows_avg_bias = []
    theoretical_guarantee_empiric_accuracies = []
    theoretical_guarantee_calculated_accuracies = []
    for d, w in d_w_pairs:
        print(f'calculating count sketch with regular counters with d={d} and w={w}')
        start = time.time()
        count_sketch = CountSketch(source_ips, d, w, use_morris_counters=False)
        end = time.time()
        elapsed_time.append(int(end - start))
        total_avg_bias.append(evaluate_avg_bias(source_ips, count_sketch, true_frequencies))
        elephant_flows_avg_bias.append(evaluate_avg_bias(top_elephant_flows, count_sketch, true_frequencies))
        rare_flows_avg_bias.append(evaluate_avg_bias(top_rare_flows, count_sketch, true_frequencies))
        empiric_accuracy, calculated_accuracy = calc_count_sketch_theoretical_guarantees_accuracy(source_ips, count_sketch, true_frequencies)
        theoretical_guarantee_empiric_accuracies.append(empiric_accuracy)
        theoretical_guarantee_calculated_accuracies.append(calculated_accuracy)
    results_dict = {'d_w_pairs': d_w_pairs,
                    'elapsed_time': elapsed_time,
                    'total_avg_bias': total_avg_bias,
                    'elephant_flows_avg_bias': elephant_flows_avg_bias,
                    'rare_flows_avg_bias': rare_flows_avg_bias,
                    'theoretical_guarantee_empiric_accuracies': theoretical_guarantee_empiric_accuracies,
                    'theoretical_guarantee_calculated_accuracies': theoretical_guarantee_calculated_accuracies}
    with open('count_regular_counters_results.pkl', 'wb') as f:
        pickle.dump(results_dict, f)

calculating count sketch with regular counters with d=2 and w=512
processing element 11625502 of 11625503: 100.0% completed.
calculating count sketch with regular counters with d=4 and w=256
processing element 11625502 of 11625503: 100.0% completed.
calculating count sketch with regular counters with d=8 and w=128
processing element 11625502 of 11625503: 100.0% completed.


In [178]:
pd.DataFrame.from_dict(results_dict).head()

| | d_w_pairs | elapsed_time (s) | total_avg_bias | elephant_flows_avg_bias | rare_flows_avg_bias | theoretical_guarantee_empiric_accuracies | theoretical_guarantee_calculated_accuracies |
|---|---|---|---|---|---|---|---|
| 0 | (2, 512) | 855 | 597.11 | 0.01 | 3980.01 | 0.98 | 0.75 |
| 1 | (4, 256) | 876 | 123.03 | 0.03 | 1269.15 | 1.0 | 0.9375 |
| 2 | (8, 128) | 912 | 96.38 | 0.02 | 2086.49 | 1.0 | 0.996094 |


In [179]:
if os.path.isfile('count_morris_counters_results.pkl'):
    with open('count_morris_counters_results.pkl', 'rb') as f:
        results_dict = pickle.load(f)
else:
    results_dict = {}

In [180]:
if not results_dict:
    d_w_pairs = [(2, 1365), (4, 682)]
    elapsed_time = []
    total_avg_bias = []
    elephant_flows_avg_bias = []
    rare_flows_avg_bias = []
    theoretical_guarantee_empiric_accuracies = []
    theoretical_guarantee_calculated_accuracies = []
    morris_sigma = 0.5
    morris_delta = 0.9
    start = time.time()
    for d, w in d_w_pairs:
        print(f'calculating count sketch with regular counters with d={d} and w={w} and {count_morris_counters(morris_sigma, morris_delta)} morris counters per matrix counter')
        start = time.time()
        count_sketch = CountSketch(source_ips, d, w, morris_sigma, morris_delta)
        end = time.time()
        elapsed_time.append(int(end - start))
        total_avg_bias.append(evaluate_avg_bias(source_ips, count_sketch, true_frequencies))
        elephant_flows_avg_bias.append(evaluate_avg_bias(top_elephant_flows, count_sketch, true_frequencies))
        rare_flows_avg_bias.append(evaluate_avg_bias(top_rare_flows, count_sketch, true_frequencies))
        empiric_accuracy, calculated_accuracy = calc_count_sketch_theoretical_guarantees_accuracy(source_ips, count_sketch, true_frequencies)
        theoretical_guarantee_empiric_accuracies.append(empiric_accuracy)
        theoretical_guarantee_calculated_accuracies.append(calculated_accuracy)
    results_dict = {'d_w_pairs': d_w_pairs,
                    'elapsed_time': elapsed_time,
                    'total_avg_bias': total_avg_bias,
                    'elephant_flows_avg_bias': elephant_flows_avg_bias,
                    'rare_flows_avg_bias': rare_flows_avg_bias,
                    'theoretical_guarantee_empiric_accuracies': theoretical_guarantee_empiric_accuracies,
                    'theoretical_guarantee_calculated_accuracies': theoretical_guarantee_calculated_accuracies}
    with open('count_morris_counters_results.pkl', 'wb') as f:
        pickle.dump(results_dict, f)

calculating count sketch with regular counters with d=2 and w=1365 and 4 morris counters per matrix counter
processing element 4547 of 11625503: 0.04% completed.

  p = 1 / (2 ** morris_estimators)
  p = 1 / (2 ** morris_estimators)


processing element 11625502 of 11625503: 100.0% completed.


TypeError: unsupported operand type(s) for *: 'float' and 'dict'

In [None]:
pd.DataFrame.from_dict(results_dict).head()