Library for measuring performance of time series classification


Motivation

For temporal and sequential data (e.g. in biomedical applications), standard performance evaluation metrics, such as sensitivity and specificity, may not always be the most appropriate and can even be misleading. Evaluation metrics must ultimately reflect the needs of users and also be sufficiently sensitive to guide algorithm development.

For example, in epilepsy monitoring, neurologists ask for assessments at the level of seizure episodes (events) rather than duration- or sample-by-sample-based metrics. Similarly, another performance measure with strong practical impact in epilepsy monitoring is the false alarm rate (FAR): the number of false positives per hour or day. Clinicians and patients see this measure as more meaningful than some metrics established in the ML community, and they are very demanding in terms of performance, requiring it to be as low as possible for potential wearable applications (e.g., fewer than 1 FP/day). This also imposes exceptionally high constraints on the required precision (usually well above 99%).

For this reason, we provide code here that measures performance both at the level of events and on a sample-by-sample basis.

Metrics

In more detail, we measure performance on the level of:

  • Sample: A performance metric that treats every label sample independently.
  • Events (e.g., epileptic seizures): Classifies each event in the reference and the hypothesis based on their overlap.

Both methods are illustrated in the following figures:

(Figures: illustration of sample-based scoring; illustration of event-based scoring.)
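
To make the difference concrete, here is a minimal sketch using the classes documented below. It assumes the default event-scoring parameters and that the param argument of EventScoring can be omitted (see Example of usage for the full API):

from timescoring.annotations import Annotation
from timescoring import scoring

# One 10-second reference event; the hypothesis detects only its first half.
fs = 1
numSamples = 60
ref = Annotation([(10, 20)], fs, numSamples)
hyp = Annotation([(10, 15)], fs, numSamples)

# Sample-based scoring counts every sample: 5 of 10 event samples are found.
print(scoring.SampleScoring(ref, hyp).sensitivity)  # 0.50
# Event-based scoring counts whole events: the single event counts as detected.
print(scoring.EventScoring(ref, hyp).sensitivity)   # 1.00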

Installation

The timescoring package is released for macOS, Windows, and Linux on PyPI. It can be installed using pip:

python -m pip install -U pip
python -m pip install -U timescoring

The package is also available on conda-forge. It can be installed using conda:

conda install -c conda-forge timescoring

It can also be installed from source with a modern build of pip:

python -m pip install -U pip
git clone https://github.com/esl-epfl/epilepsy_performance_metrics.git
cd epilepsy_performance_metrics
python -m pip install -e .

Code

The timescoring package provides three classes:

  • annotations.Annotation: stores annotations
  • scoring.SampleScoring(ref, hyp): computes sample-based scoring
  • scoring.EventScoring(ref, hyp): computes event-based scoring

In addition, the package provides functions to visualize the output of the scoring algorithms (see visualization.py).

Parameters

Sample-based scoring allows setting the sampling frequency of the labels; it defaults to 1 Hz.
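
A minimal sketch, assuming the sampling frequency is exposed as an fs keyword argument of scoring.SampleScoring (names as documented in this README):

from timescoring.annotations import Annotation
from timescoring import scoring

# Labels sampled at 256 Hz instead of the 1 Hz default.
fs = 256
numSamples = 10 * fs  # a 10-second recording
ref = Annotation([(2.0, 5.0)], fs, numSamples)
hyp = Annotation([(2.5, 5.5)], fs, numSamples)
scores = scoring.SampleScoring(ref, hyp, fs=fs)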

Event-based scoring accepts several parameters, provided as an instance of scoring.EventScoring.Parameters:

  • toleranceStart (float): Tolerance allowed on the start of an event without counting a false detection. Defaults to 30 seconds.
  • toleranceEnd (float): Tolerance allowed on the end of an event without counting a false detection. Defaults to 60 seconds.
  • minOverlap (float): Minimum relative overlap between ref and hyp for a detection. Defaults to 0, which corresponds to any overlap.
  • maxEventDuration (float): Automatically split events longer than the given duration. Defaults to 5 * 60 seconds (5 minutes).
  • minDurationBetweenEvents (float): Automatically merge events that are separated by less than the given duration. Defaults to 90 seconds.
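
Since every parameter has a default, a configuration can be built by overriding only the fields of interest. A short sketch, assuming the constructor accepts the parameter names above as keyword arguments:

from timescoring import scoring

# Require at least 50 % relative overlap; keep the other documented defaults.
param = scoring.EventScoring.Parameters(minOverlap=0.5)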

Scores

Scores are provided as attributes of the scoring classes. The following metrics can be accessed:

  • sensitivity
  • precision
  • f1: F1-score
  • fpRate: false alarm rate per 24h

Example of usage

# Loading Annotations #


from timescoring.annotations import Annotation

# Annotation objects can be instantiated from a binary mask

fs = 1
mask = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0]

labels = Annotation(mask, fs)

print('Annotation objects contain a representation as a mask and as a list of events:')
print(labels.mask)
print(labels.events)


# Annotation objects can also be instantiated from a list of events
fs = 1
numSamples = 10  # In this case the duration of the recording in samples should be provided
events = [(1, 3), (6, 9)]

labels = Annotation(events, fs, numSamples)


# Computing performance score #

from timescoring import scoring
from timescoring import visualization

fs = 1
duration = 66 * 60  # 66 minutes; at fs = 1 Hz the sample count equals the duration in seconds
ref = Annotation([(8 * 60, 12 * 60), (30 * 60, 35 * 60), (48 * 60, 50 * 60)], fs, duration)
hyp = Annotation([(8 * 60, 12 * 60), (28 * 60, 32 * 60), (50.5 * 60, 51 * 60), (60 * 60, 62 * 60)], fs, duration)
scores = scoring.SampleScoring(ref, hyp)
figSamples = visualization.plotSampleScoring(ref, hyp)

# Scores can also be computed per event
param = scoring.EventScoring.Parameters(
    toleranceStart=30,
    toleranceEnd=60,
    minOverlap=0,
    maxEventDuration=5 * 60,
    minDurationBetweenEvents=90)
scores = scoring.EventScoring(ref, hyp, param)
figEvents = visualization.plotEventScoring(ref, hyp, param)

print("# Event scoring\n" +
      "- Sensitivity : {:.2f} \n".format(scores.sensitivity) +
      "- Precision   : {:.2f} \n".format(scores.precision) +
      "- F1-score    : {:.2f} \n".format(scores.f1) +
      "- FP/24h      : {:.2f} \n".format(scores.fpRate))

A presentation explaining these metrics is available here.