
# True Positive (True alarm) test for SEED

This experiment is based on the paper "Detecting Volatility Shift in Data Streams".

## General idea for this experiment

The SEED drift detection algorithm is fed a stationary data stream followed by a gradually changing concept. Every alarm raised after the change point is recorded as a true alarm and counted toward the true positive rate. The experiment also measures the detection delay and the execution time.

## Details for this experiment

This experiment uses exactly the same set of slopes for the parameter $\mu$ as the paper, and the other parameters use the best settings reported in the paper. However, this implementation uses only 100 000 bits per stream instead of 1 000 000 bits, because the full-length streams take significantly longer to run (well over half an hour) on my PC.
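
To make the size of the drift concrete, the sketch below computes the Bernoulli parameter reached at the end of each ramp. It assumes, as in the generator cells of this notebook, a starting value of 0.2 and 1000 increments of the slope. The names `START_P`, `RAMP_STEPS` and `final_probability` are illustrative and do not come from the paper.

```python
# Illustrative arithmetic only; all names here are hypothetical helpers.
START_P = 0.2        # Bernoulli parameter of the stationary part
RAMP_STEPS = 1000    # number of instances in the gradually changing part
SLOPES = [0.0001, 0.0002, 0.0003, 0.0004]


def final_probability(slope, start_p=START_P, steps=RAMP_STEPS):
    """Bernoulli parameter after `steps` increments of `slope`."""
    # Rounded to 4 decimals to hide floating-point noise.
    return round(start_p + steps * slope, 4)


endpoints = {slope: final_probability(slope) for slope in SLOPES}
for slope, p_end in endpoints.items():
    print(f"slope={slope}: p rises from {START_P} to {p_end}")
```

The smallest slope therefore moves the parameter only from 0.2 to 0.3 over the ramp, which is the hardest case for the detector.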

## Setup and data stream generation for each slope

In [1]:
from scipy.stats import bernoulli
from skmultiflow.drift_detection import SEED
import numpy as np
import time

# We use confidence parameter delta = 0.05 for both SEED and ADWIN



In [2]:
# data stream generated for slope = 0.0001
def stream1():
    param = 0.2
    data_stream = bernoulli.rvs(size=99000, p=param)
    data_stream = data_stream.tolist()
    for k in range(1000):
        param += 0.0001
        ele = bernoulli.rvs(size=1, p=param).tolist()
        data_stream = data_stream + ele
    return data_stream

In [4]:
# data stream generated for slope = 0.0002
def stream2():
    param = 0.2
    data_stream = bernoulli.rvs(size=99000, p=param)
    data_stream = data_stream.tolist()
    for k in range(1000):
        param += 0.0002
        ele = bernoulli.rvs(size=1, p=param).tolist()
        data_stream = data_stream + ele
    return data_stream

In [5]:
# data stream generated for slope = 0.0003
def stream3():
    param = 0.2
    data_stream = bernoulli.rvs(size=99000, p=param)
    data_stream = data_stream.tolist()
    for k in range(1000):
        param += 0.0003
        ele = bernoulli.rvs(size=1, p=param).tolist()
        data_stream = data_stream + ele
    return data_stream

In [6]:
# data stream generated for slope = 0.0004
def stream4():
    param = 0.2
    data_stream = bernoulli.rvs(size=99000, p=param)
    data_stream = data_stream.tolist()
    for k in range(1000):
        param += 0.0004
        ele = bernoulli.rvs(size=1, p=param).tolist()
        data_stream = data_stream + ele
    return data_stream
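
The four generators above differ only in the slope, so they could be folded into one parameterized helper. The following is a minimal sketch under that assumption, not the code used to produce the results in this notebook. It draws Bernoulli samples with the standard library `random` module instead of `scipy.stats.bernoulli`, and the name `make_stream` and its parameters are hypothetical.

```python
import random


def make_stream(slope, n_stationary=99000, n_drift=1000, start_p=0.2, rng=None):
    """Return a list of 0/1 values: a stationary part, then a linear ramp.

    Hypothetical helper mirroring stream1() to stream4(); `rng` is an
    optional random.Random instance used for reproducibility.
    """
    if rng is None:
        rng = random.Random()
    # Stationary part: every instance is Bernoulli(start_p).
    stream = [1 if rng.random() < start_p else 0 for _ in range(n_stationary)]
    # Gradual change: raise p by `slope` before drawing each instance,
    # in the same order as the original cells.
    p = start_p
    for _ in range(n_drift):
        p += slope
        stream.append(1 if rng.random() < p else 0)
    return stream


demo = make_stream(0.0004, rng=random.Random(0))
print(len(demo), sum(demo[:99000]) / 99000)
```

Appending to the list in place also avoids copying the whole stream on every step, which is what `data_stream = data_stream + ele` does.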

## Implementation details

For each trial:

    create a new drift detector instance
    create a new data stream

    for each data instance:

        feed the data instance to the drift detector
        if a change is detected after the change point, increment the true alarm count

Calculate and report the number of true alarms, the average delay and the execution time.
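
The loop described above can be sketched independently of any particular detector. The version below assumes only that the detector exposes `add_element` and `detected_change`, the two calls used in the cells of this notebook. `run_trials` and `ThresholdDetector` are hypothetical names, and the toy detector is a stand-in for demonstration, not SEED.

```python
def run_trials(make_detector, make_stream, change_point, n_trials=100):
    """Count true alarms and collect detection delays over repeated trials."""
    true_alarms = 0
    delays = []
    for _ in range(n_trials):
        detector = make_detector()   # fresh detector for every trial
        stream = make_stream()       # fresh stream for every trial
        for index, value in enumerate(stream):
            detector.add_element(value)
            # Only alarms at or after the change point count as true alarms.
            if detector.detected_change() and index >= change_point:
                true_alarms += 1
                delays.append(index - change_point)
                break
    return true_alarms, delays


class ThresholdDetector:
    """Toy stand-in (NOT SEED): signals once it has seen `limit` ones."""

    def __init__(self, limit=3):
        self.limit = limit
        self.ones = 0

    def add_element(self, value):
        self.ones += value

    def detected_change(self):
        return self.ones >= self.limit


alarms, delays = run_trials(
    ThresholdDetector, lambda: [0] * 5 + [1] * 5, change_point=5, n_trials=4
)
print(alarms, delays)
```

Returning the raw list of delays leaves the choice of averaging (per trial or per detection) to the caller.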

In [7]:
# slope = 0.0001
true_alarm = 0
delay_time = []
start_time = time.time()

for i in range(100):
    seed = SEED()
    data = stream1()
    for j in range(100000):
        seed.add_element(data[j])
        if seed.detected_change() and j >= 99000:
            true_alarm += 1
            delay_time.append(j-99000)
            break
# Note: the delay sum is divided by the 100 trials, not by the number of
# detections, so trials without a detection lower this average.
print("The true alarm number among 100 trials is", true_alarm)
print("The average delay time among 100 trials is", sum(delay_time)/100.0)
print("The time for the whole tests in seconds is", time.time() - start_time)

The true alarm number among 100 trials is 80
The average delay time among 100 trials is 623.84
The time for the whole tests in seconds is 110.25690793991089


In [8]:
# slope = 0.0002
true_alarm = 0
delay_time = []
start_time = time.time()

for i in range(100):
    seed = SEED()
    data = stream2()
    for j in range(100000):
        seed.add_element(data[j])
        if seed.detected_change() and j >= 99000:
            true_alarm += 1
            delay_time.append(j-99000)
            break
print("The true alarm number among 100 trials is", true_alarm)
print("The average delay time among 100 trials is", sum(delay_time)/100.0)
print("The time for the whole tests in seconds is", time.time() - start_time)

The true alarm number among 100 trials is 100
The average delay time among 100 trials is 536.28
The time for the whole tests in seconds is 110.76663517951965


In [9]:
# slope = 0.0003
true_alarm = 0
delay_time = []
start_time = time.time()

for i in range(100):
    seed = SEED()
    data = stream3()
    for j in range(100000):
        seed.add_element(data[j])
        if seed.detected_change() and j >= 99000:
            true_alarm += 1
            delay_time.append(j-99000)
            break
print("The true alarm number among 100 trials is", true_alarm)
print("The average delay time among 100 trials is", sum(delay_time)/100.0)
print("The time for the whole tests in seconds is", time.time() - start_time)

The true alarm number among 100 trials is 100
The average delay time among 100 trials is 434.52
The time for the whole tests in seconds is 109.37062907218933


In [10]:
# slope = 0.0004
true_alarm = 0
delay_time = []
start_time = time.time()

for i in range(100):
    seed = SEED()
    data = stream4()
    for j in range(100000):
        seed.add_element(data[j])
        if seed.detected_change() and j >= 99000:
            true_alarm += 1
            delay_time.append(j-99000)
            break
print("The true alarm number among 100 trials is", true_alarm)
print("The average delay time among 100 trials is", sum(delay_time)/100.0)
print("The time for the whole tests in seconds is", time.time() - start_time)

The true alarm number among 100 trials is 100
The average delay time among 100 trials is 368.28
The time for the whole tests in seconds is 111.56591391563416


## Result table of this experiment

| Slope | True alarms (out of 100) | Average delay (instances) | Execution time (s) |
| --- | --- | --- | --- |
| 0.0001 | 80 | 623.84 | 110.26 |
| 0.0002 | 100 | 536.28 | 110.77 |
| 0.0003 | 100 | 434.52 | 109.37 |
| 0.0004 | 100 | 368.28 | 111.57 |
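
The cells above compute the average delay as `sum(delay_time)/100.0`, that is, per trial rather than per detection. For slope 0.0001 only 80 of the 100 trials raised a true alarm, so the per-detection average is higher than the tabulated value. The sketch below recovers it from the table figures alone. `delay_per_detection` is a hypothetical helper, and the paper may define delay differently.

```python
N_TRIALS = 100

# (slope, true alarms, reported average delay) copied from the table above.
results = [
    (0.0001, 80, 623.84),
    (0.0002, 100, 536.28),
    (0.0003, 100, 434.52),
    (0.0004, 100, 368.28),
]


def delay_per_detection(reported_avg, true_alarms, n_trials=N_TRIALS):
    """Convert a per-trial average delay into a per-detection average."""
    total_delay = reported_avg * n_trials   # undo the division by n_trials
    return round(total_delay / true_alarms, 2)


per_detection = {slope: delay_per_detection(avg, tp) for slope, tp, avg in results}
for slope, delay in per_detection.items():
    print(f"slope={slope}: {delay} instances per detected change")
```

For the three slopes with 100 true alarms the two averages coincide. For slope 0.0001 the per-detection value is 779.8 instances.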
