---
title: "Report"
author: "Francisco Bischoff"
date: "on Jan 31, 2022"
output:
  workflowr::wflow_html:
    toc: true
    toc_float: true
    number_sections: false
    theme: lumen
    highlight: textmate
    css: style.css
bibliography: ../papers/references.bib
link-citations: true
# Download your specific csl file and refer to it in the line below.
csl: ../thesis/csl/ama.csl
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = FALSE, fig.align = "center", autodep = TRUE,
  fig.height = 5, fig.width = 10,
  tidy = "styler",
  tidy.opts = list(strict = TRUE)
)
if (knitr::is_latex_output()) {
  knitr::opts_chunk$set(dev = "pdf")
} else {
  knitr::opts_chunk$set(dev = "svg")
}
library(here)
library(glue)
library(visNetwork)
library(tibble)
library(kableExtra)
library(gridExtra)
# library(targets)
library(ggplot2)
knitr::opts_knit$set(root.dir = here("docs"), base.dir = here("docs"), verbose = TRUE)
.rmdenvir <- environment()
.refctr <- c(`_` = 0)
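# Returns a stable running number for a cross-reference label such as "fig:targets".
# Labels are numbered per type (the text before ":"), in order of first use.
ref <- function(use_name) {
  require(stringr)
  if (!exists(".refctr")) .refctr <- c(`_` = 0)
  # Label already registered: reuse its number
  if (any(names(.refctr) == use_name)) {
    return(.refctr[use_name])
  }
  # New label: next number among the labels of the same type
  type <- str_split(use_name, ":")[[1]][1]
  n_obj <- sum(str_detect(names(.refctr), type))
  use_num <- n_obj + 1
  newrefctr <- c(.refctr, use_num)
  names(newrefctr)[length(.refctr) + 1] <- use_name
  assign(".refctr", newrefctr, envir = .rmdenvir)
  return(use_num)
}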
```
# Objectives and the research question
While this research was inspired by the CinC/Physionet Challenge 2015, its purpose is not to beat
the state of the art on that challenge, but to identify, on streaming data, abnormal heart electrical
patterns, specifically those that are life-threatening, using little CPU and memory, so that such
information can be used on lower-end devices outside the ICU, such as ward devices, home devices,
and wearable devices.

The main question is: can we accomplish this objective using a minimalist approach (low CPU, low
memory) while maintaining robustness?
# Principles
This research is being conducted using the Research Compendium principles [@compendium2019]:
1. Stick with the convention of your peers;
2. Keep data, methods, and output separated;
3. Specify your computational environment as clearly as you can.
Data management follows the FAIR principles (findable, accessible, interoperable, reusable)
[@wilkinson2016]. In line with these principles, the dataset was converted from Matlab's format to
CSV format, allowing more interoperability. Additionally, the whole project, including the dataset,
conforms to the Codemeta Project [@CodeMeta2017].
# Materials and methods
## Software
### Pipeline management
All steps of the process, from data extraction to the final report, are managed using the R package
`targets` [@landau2021]. An example of a pipeline visualization created with `targets` is shown in
Fig. `r ref("fig:targets")`. This package keeps a record of the random seeds (allowing
reproducibility), detects changes in any part of the code (or its dependencies) and reruns only the
branches that need to be updated, and offers several other features that keep the workflow
reproducible while avoiding unnecessary repetitions.
```{r targets, echo=FALSE, out.width="100%"}
#| fig.cap=paste("Figure", ref("fig:targets"), "- Example of pipeline visualization using `targets`.
#| From left to right we see 'Stems' (steps that do not create branches) and 'Patterns'
#| (that contains two or more branches) and the flow of the information.
#| The green color means that the step is up to date to the current code and dependencies.")
if (knitr::is_latex_output()) {
  knitr::include_graphics("figure/targets.pdf")
} else {
  knitr::include_graphics("figure/targets.svg")
}
```
### Reports management
The report is available on the main webpage [@franz_website], allowing inspection of previous
versions managed by the R package `workflowr` [@workflowr2021]. This package complements the
`targets` package by taking care of the versioning of every report. It works like a log book that
keeps track of every important milestone of the project, while summarizing the computational
environment where it was run. Fig. `r ref("fig:workflowr")` shows only a fraction of the generated
website, where we can see that this version passed the required checks (system is up-to-date, no
caches, session information was recorded, and others) and a table of previous versions.
```{r workflowr, echo=FALSE, out.width="100%"}
#| fig.cap=paste("Figure", ref("fig:workflowr"), "- Fraction of the website generated by `workflowr`.
#| On top we see that this version passed all checks, and in the middle we see a table
#| referring to the previous versions of the report.")
knitr::include_graphics("figure/workflowr_print.png")
```
### Modeling and parameter tuning
A well-known package for data science in R is `caret` (short for **C**lassification
**A**nd **RE**gression **T**raining) [@JSSv028i05]. Nevertheless, the author of `caret` recognizes
several limitations of his (great) package, and is now in charge of the development of the
`tidymodels` [@tidymodels2020] collection. There are, of course, other available frameworks and
opinions [@Thompson2020]. Notwithstanding, this project will follow the `tidymodels` road, for three
significant reasons: 1) it is constantly being improved and re-checked for bugs, with a large
community contribution; 2) it allows plugging in a custom modeling algorithm, which in this case
will be the one developed in this work; 3) `caret` is not in active development.
### Continuous integration
Meanwhile, the project pipeline has been set up on GitHub [@bischoffrepo2021], leveraging
GitHub Actions [@gitactions2021] for the Continuous Integration lifecycle. The repository is
available at [@bischoffrepo2021], and the resulting report is available at [@franz_website].
The roadmap and task status of this thesis are also publicly available on Zenhub [@zenhub2021].
## Developed software
### Matrix Profile {#matrixprofile}
Matrix Profile (MP) [@Yeh2017a] is a state-of-the-art [@DePaepe2020; @Feremans2020] time series
analysis technique that, once computed, allows us to derive frameworks for all sorts of tasks, such
as motif discovery, anomaly detection, regime change detection, and others [@Yeh2017a].
Before MP, time series analysis relied on what is called a *distance matrix* (DM), a matrix that
stores all the distances between the subsequences of two time series (or of a time series and
itself, in the case of a Self-Join). This was computationally very expensive, and several methods of
pruning and dimensionality reduction were researched [@Lin2007].
For brevity, let's just understand that the MP and the companion Profile Index (PI) are two vectors
that hold one floating-point value and one integer value, respectively, for each point of the
original time series: (1) the similarity distance between that point in time (let's call these
points "indexes") and its first nearest neighbor (1-NN), and (2) the index where this 1-NN is
located. The original paper has more detailed information [@Yeh2017a]. The MP is computed using a
rolling window, but instead of creating a whole DM, only the minimum values and the indexes of these
minima are stored (in the MP and PI, respectively). Fig. `r ref("fig:thematrix")` gives an idea of
the relationship between the DM and the MP.
```{r thematrix, echo=FALSE}
#| fig.cap=paste("Figure", ref("fig:thematrix"), "- A distance matrix (top), and a matrix profile (bottom). The matrix profile stores only
#| the minimum values of the distance matrix.")
knitr::include_graphics("figure/mp_1.png")
```
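
The relationship between the DM, the MP, and the PI can be illustrated with a deliberately naive sketch in base R. It builds the full distance matrix and then keeps only the row-wise minima, which is exactly the storage that MP algorithms avoid. The function name `naive_matrix_profile`, the z-normalization of each window, and the exclusion zone of half a window (used to ignore trivial matches of a window with itself) are assumptions made for illustration; this is not the implementation used in `tsmp` or `matrixprofiler`.

```r
# Naive self-join: build the whole distance matrix, then reduce it to MP and PI.
# Assumes no window is constant (a constant window has sd = 0 and cannot be z-normalized).
naive_matrix_profile <- function(ts, w) {
  n_sub <- length(ts) - w + 1
  znorm <- function(x) (x - mean(x)) / sd(x)
  # One z-normalized subsequence per column (w rows, n_sub columns)
  subs <- sapply(seq_len(n_sub), function(i) znorm(ts[i:(i + w - 1)]))
  # Full distance matrix between all pairs of subsequences
  dm <- as.matrix(dist(t(subs)))
  # Exclusion zone (assumption): ignore matches closer than half a window
  dm[abs(row(dm) - col(dm)) <= ceiling(w / 2)] <- Inf
  list(
    mp = unname(apply(dm, 1, min)), # distance to the 1-NN
    pi = unname(apply(dm, 1, which.min)) # index of the 1-NN
  )
}

# A signal that repeats every 4 points: every window has an exact copy 4 points away
toy <- rep(c(0, 1, 3, 1), 5)
res <- naive_matrix_profile(toy, w = 4)
res$mp
res$pi
```

For a series of length $n$ this sketch stores on the order of $n^2$ distances, while the MP and PI together need only two vectors of length $n - w + 1$.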
This research has already yielded two R packages concerning the MP algorithms from UCR [@mpucr]. The
first package is called `tsmp`, and a paper has also been published in the R Journal [@RJ-2020-021]
(Journal Impact Factor™, 2020 of 3.984). The second package is called `matrixprofiler` and enhances
the first one, using a low-level language to improve computational speed. The author has also joined
the Matrix Profile Foundation as a co-founder, together with contributors from the Python and Go
languages [@mpf2020; @VanBenschoten2020].
This implementation in R is being used for computing the MP and MP-based algorithms of this thesis.
## The data
The current dataset used is the CinC/Physionet Challenge 2015 public dataset, modified to include
only the actual data and the header files so that it can be read by the pipeline. It is hosted on
Zenodo [@bischoff2021] under the same license as Physionet.

The dataset is composed of 750 patients, each with a record of at least five minutes. All signals
have been resampled (using anti-alias filters) to 12 bit, 250 Hz and have had FIR band-pass (0.05 to
40 Hz) and mains notch filters applied to remove noise. Pacemaker and other artifacts are still
present on the ECG [@Clifford2015]. Furthermore, this dataset contains at least two ECG leads and
one or more variables like arterial blood pressure, photoplethysmograph readings, and respiration
movements.
The _events_ we seek to identify are the life-threatening arrhythmias as defined by Physionet in
Table `r ref("tab:alarms")`.
```{r alarms, echo=FALSE}
alarms <- tribble(
~Alarm, ~Definition,
"Asystole", "No QRS for at least 4 seconds",
"Extreme Bradycardia", "Heart rate lower than 40 bpm for 5 consecutive beats",
"Extreme Tachycardia", "Heart rate higher than 140 bpm for 17 consecutive beats",
"Ventricular Tachycardia", "5 or more ventricular beats with heart rate higher than 100 bpm",
"Ventricular Flutter/Fibrillation", "Fibrillatory, flutter, or oscillatory waveform for at least 4 seconds"
)
kbl(alarms,
booktabs = TRUE,
caption = paste("Table", ref("tab:alarms"), "- Definition of the five alarm types used in CinC/Physionet Challenge 2015."),
align = "ll",
position = "ht",
linesep = "\\addlinespace"
) %>%
row_spec(0, bold = TRUE) %>%
kable_styling(full_width = TRUE)
```
The fifth minute is precisely where the alarm was triggered in the original recording. To
meet the ANSI/AAMI EC13 Cardiac Monitor Standards [@AAMI2002], the onset of the event is within 10
seconds of the alarm (i.e., between 4:50 and 5:00 of the record). That does not mean that there are
no other arrhythmias earlier in the record.

For comparison, Table `r ref("tab:challenge")` lists the scores of the five best participants
of the challenge [@plesinger2015; @kalidas2015; @couto2015; @fallet2015; @hoogantink2015].
```{r challenge, echo=FALSE}
challenge <- tribble(
~Score, ~Authors,
"81.39", "Filip Plesinger, Petr Klimes, Josef Halamek, Pavel Jurak",
"79.44", "Vignesh Kalidas",
"79.02", "Paula Couto, Ruben Ramalho, Rui Rodrigues",
"76.11", "Sibylle Fallet, Sasan Yazdani, Jean-Marc Vesin",
"75.55", "Christoph Hoog Antink, Steffen Leonhardt"
)
kbl(challenge,
booktabs = TRUE,
caption = paste("Table", ref("tab:challenge"), "- Challenge Results on real-time data. The scores were multiplied by 100."),
align = "cl",
position = "ht"
) %>%
row_spec(0, bold = TRUE) %>%
# column_spec(1, width = "5em") %>%
# column_spec(2, width = "30em") %>%
kable_styling(full_width = TRUE)
```
The equation used in this challenge to compute the score of the algorithms is given in Equation
$\eqref{score}$. This equation is the accuracy formula with a penalty on false negatives.
The reasoning pointed out by the authors [@Clifford2015] is the clinical impact of a genuine
life-threatening event being treated as unimportant. Accuracy is known to be misleading
when there is a high class imbalance [@Akosa2017].
\
$$
Score = \frac{TP+TN}{TP+TN+FP+5*FN} \tag{1} \label{score}
$$
\
Since this is a finite dataset, the pathological cases (1) $\lim_{TP \to \infty}$ (whenever
there is an event, it is a true alarm) and (2) $\lim_{TN \to \infty}$ (whenever there is an event,
it is a false alarm) cannot happen. This dataset has 292 True alarms and 458 False alarms.
Experimentally, this equation yields:
- 0.24 if all guesses are on False class
- 0.28 if random guesses
- 0.39 if all guesses are on True class
- 0.45 if no false positives plus random on True class
- 0.69 if no false negatives plus random on False class
This small experiment (knowing the data in advance) shows that a "single line of code and a few
minutes of effort" [@Wu2020] algorithm could achieve at most a score of 0.39 in this challenge (to
reach the scores in the last two lines, the algorithm must be very good on one class).
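
The five values listed above can be reproduced with a short calculation. The sketch below is not the original experiment: instead of simulating guesses, it assumes that a "random" guess is a fair coin, so that on average half of the affected class is classified correctly. The names `challenge_score`, `n_true`, and `n_false` are chosen here for illustration.

```r
# Score from Equation (1): accuracy with false negatives weighted five times
challenge_score <- function(tp, tn, fp, fn) {
  (tp + tn) / (tp + tn + fp + 5 * fn)
}

n_true <- 292 # True alarms in the dataset
n_false <- 458 # False alarms in the dataset

# Expected confusion-matrix counts for each trivial strategy
scores <- c(
  all_false = challenge_score(tp = 0, tn = n_false, fp = 0, fn = n_true),
  random = challenge_score(tp = n_true / 2, tn = n_false / 2, fp = n_false / 2, fn = n_true / 2),
  all_true = challenge_score(tp = n_true, tn = 0, fp = n_false, fn = 0),
  no_fp_random_true = challenge_score(tp = n_true / 2, tn = n_false, fp = 0, fn = n_true / 2),
  no_fn_random_false = challenge_score(tp = n_true, tn = n_false / 2, fp = n_false / 2, fn = 0)
)
round(scores, 2)
```

Under these assumptions the rounded values are 0.24, 0.28, 0.39, 0.45, and 0.69, in the same order as the list above.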
Nevertheless, this equation will only be useful to allow us to compare the results of this thesis
with other algorithms.
## Work structure
### Project start
The project started with a literature survey on the databases Scopus, PubMed, Web of Science, and
Google Scholar with the following query (the syntax was adapted for each database):
\
TITLE-ABS-KEY ( algorithm OR 'point of care' OR 'signal processing' OR 'computer
assisted' OR 'support vector machine' OR 'decision support system*' OR 'neural
network*' OR 'automatic interpretation' OR 'machine learning') AND TITLE-ABS-KEY
( electrocardiography OR cardiography OR 'electrocardiographic tracing' OR ecg
OR electrocardiogram OR cardiogram ) AND TITLE-ABS-KEY ( 'Intensive care unit' OR
'cardiologic care unit' OR 'intensive care center' OR 'cardiologic care center' )
\
The inclusion and exclusion criteria were defined as in Table `r ref("tab:criteria")`.
```{r criteria, echo=FALSE}
criteria <- tribble(
~"Inclusion criteria", ~"Exclusion criteria",
"ECG automatic interpretation", "Manual interpretation",
"ECG anomaly detection", "Publication older than ten years",
"ECG context change detection", "Do not attempt to identify life-threatening arrhythmias, namely asystole, extreme bradycardia, extreme tachycardia, ventricular tachycardia, and ventricular flutter/fibrillation",
"Online Stream ECG analysis", "No performance measurements reported",
"Specific diagnosis (like a flutter, hyperkalemia, etc.)", ""
)
kbl(criteria,
booktabs = TRUE,
caption = paste("Table", ref("tab:criteria"), "- Literature review criteria."),
align = "ll",
position = "ht",
linesep = "\\addlinespace"
) %>%
row_spec(0, bold = TRUE) %>%
kable_styling(full_width = TRUE)
```
The survey is being conducted with peer review. All articles in the full-text phase were obtained
and assessed for the extraction phase, with the exception of 5 articles that were not available. The
survey is currently stalled on the Data Extraction phase due to external factors.
Fig. `r ref("fig:prisma")` shows the flow diagram of the resulting screening using PRISMA format.
```{r prisma, echo=FALSE, out.width="70%", fig.cap=paste("Figure", ref("fig:prisma"), "- Flowchart of the literature survey.")}
knitr::include_graphics("figure/PRISMA.png")
```
The peer review is being conducted by the author of this thesis together with a colleague, Dr.
Andrew Van Benschoten from the Matrix Profile Foundation [@mpf2020].
Table `r ref("tab:kappa")` shows the Inter-rater Reliability (IRR) of the screening phases, using Cohen's $\kappa$ statistic.
The bottom line shows the estimated accuracy after correcting for possible confounders [@Bakeman2011].
```{r kappa, echo=FALSE, eval=!knitr::is_latex_output(), results="asis"}
cat(
r"(<br><table class="tg"><caption>)",
"Table ", ref("tab:kappa"), " - Inter-rater Reliability on the literature survey process.",
r"(</caption>
<thead> <tr> <th class="tg-top" colspan="2"> </th> <th class="tg-top"
colspan="2"> Title-Abstract<br>(2388 articles) </th> <th class="tg-top"> </th> <th class="tg-top"
colspan="2"> Full-Review<br>(303 articles) </th> </tr></thead> <tbody> <tr> <td class="tg-73oq"
colspan="2"> </td><td class="tg-wp8o" colspan="2"> Reviewer #2 </td><td class="tg-73oq"> </td><td
class="tg-wp8o" colspan="2"> Reviewer #2 </td></tr><tr> <td class="tg-73oq" colspan="2"> </td><td
class="tg-73oq"> Include </td><td class="tg-73oq"> Exclude </td><td class="tg-73oq"> </td><td
class="tg-73oq"> Include </td><td class="tg-73oq"> Exclude </td></tr><tr> <td class="tg-wp8o"
rowspan="2"> Reviewer #1 </td><td class="tg-3z1b"> Include </td><td class="tg-cross"> 185 </td><td
class="tg-cross"> 381 </td><td class="tg-cross"> </td><td class="tg-cross"> 63 </td><td
class="tg-cross"> 58 </td></tr><tr> <td class="tg-3z1b"> Exclude </td><td class="tg-crosslow"> 129
</td><td class="tg-crosslow"> 1693 </td><td class="tg-crosslow"> </td><td class="tg-crosslow"> 13
</td><td class="tg-crosslow"> 169 </td></tr><tr> <td class="tg-73oq" colspan="2"> Cohen’s omnibus
<span class="math inline">\(\kappa\)</span> </td><td class="tg-body" colspan="2"> 0.30 </td><td
class="tg-body"> </td><td class="tg-body" colspan="2"> 0.48 </td></tr><tr> <td class="tg-73oq"
colspan="2"> Maximum possible <span class="math inline">\(\kappa\)</span> </td><td class="tg-body"
colspan="2"> 0.66 </td><td class="tg-body"> </td><td class="tg-body" colspan="2"> 0.67
</td></tr><tr> <td class="tg-73oq" colspan="2"> Std Err for <span class="math
inline">\(\kappa\)</span> </td><td class="tg-body" colspan="2"> 0.02 </td><td class="tg-body">
</td><td class="tg-body" colspan="2"> 0.05 </td></tr><tr> <td class="tg-73oq" colspan="2"> Observed
Agreement </td><td class="tg-body" colspan="2"> 79% </td><td class="tg-body"> </td><td
class="tg-body" colspan="2"> 77% </td></tr><tr> <td class="tg-73oq" colspan="2"> Random Agreement
</td><td class="tg-body" colspan="2"> 69% </td><td class="tg-body"> </td><td class="tg-body"
colspan="2"> 55% </td></tr><tr> <td class="tg-mcqj" colspan="2"> Agreement corrected with KappaAcc
</td><td class="tg-mqa1" colspan="2"> 82% </td><td class="tg-mcqj"> </td><td class="tg-mqa1"
colspan="2"> 85% </td></tr></tbody></table>)"
)
```
\
```{r kappaa, echo = FALSE, eval=knitr::is_latex_output(), results="asis"}
cat(r"(
\begin{table}[ht]
\centering
\caption{\label{tab:kappa}Inter-rater Reliability on the literature survey process.}
\begin{tabular}{llcclcc}
\toprule
& & \multicolumn{2}{c}{\textbf{\begin{tabular}[c]{@{}c@{}}Title-Abstract\\ (2388 articles)\end{tabular}}} & \textbf{} & \multicolumn{2}{c}{\textbf{\begin{tabular}[c]{@{}c@{}}Full-Review\\ (303 articles)\end{tabular}}} \\ \cline{3-4} \cline{6-7}
& & \multicolumn{2}{c}{Reviewer \#2} & & \multicolumn{2}{c}{Reviewer \#2} \\ \cline{3-4} \cline{6-7}
& & \multicolumn{1}{l}{Include} & \multicolumn{1}{l}{Exclude} & & \multicolumn{1}{l}{Include} & \multicolumn{1}{l}{Exclude} \\ \hline
\multicolumn{1}{r}{\multirow{2}{*}{Reviewer \#1}} & \multicolumn{1}{r}{Include} & 185 & 381 & & 63 & 58 \\
\multicolumn{1}{r}{} & \multicolumn{1}{r}{Exclude} & 129 & 1693 & & 13 & 169 \\ \hline
Cohen's omnibus $\kappa$ & & \multicolumn{2}{c}{0.30} & & \multicolumn{2}{c}{0.48} \\
Maximum possible $\kappa$ & & \multicolumn{2}{c}{0.66} & & \multicolumn{2}{c}{0.67} \\
Std Err for $\kappa$ & & \multicolumn{2}{c}{0.02} & & \multicolumn{2}{c}{0.05} \\
Observed Agreement & & \multicolumn{2}{c}{79\%} & & \multicolumn{2}{c}{77\%} \\
Random Agreement & & \multicolumn{2}{c}{69\%} & & \multicolumn{2}{c}{55\%} \\ \hline\addlinespace
\multicolumn{2}{l}{\textbf{Agreement corrected with KappaAcc}} & \multicolumn{2}{c}{\textbf{82\%}} & \textbf{} & \multicolumn{2}{c}{\textbf{85\%}} \\ \bottomrule
\end{tabular}
\end{table}
)")
```
The purpose of using Cohen's $\kappa$ in such a review is to gauge the agreement of both
reviewers on the task of selecting the articles according to the goal of the survey. The most naive
way to verify this would be simply to measure the overall agreement (the number of articles included
and excluded by both, divided by the total number of articles). Nevertheless, this would not take
into account the agreement we could expect purely by chance.
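
As a worked illustration of the difference between raw agreement and chance-corrected agreement, the sketch below recomputes the observed agreement, the agreement expected by chance, Cohen's $\kappa$, and the maximum possible $\kappa$ from the counts in the table above. The function name `cohen_kappa` and the object names are hypothetical, and the formulas are the textbook ones; the KappaAcc correction is not reproduced here.

```r
# Cohen's kappa for a square table of counts (rows: reviewer 1, columns: reviewer 2)
cohen_kappa <- function(tab) {
  p <- tab / sum(tab)
  p_obs <- sum(diag(p)) # observed agreement
  p_exp <- sum(rowSums(p) * colSums(p)) # agreement expected by chance
  p_max <- sum(pmin(rowSums(p), colSums(p))) # best agreement the margins allow
  c(
    observed = p_obs,
    chance = p_exp,
    kappa = (p_obs - p_exp) / (1 - p_exp),
    kappa_max = (p_max - p_exp) / (1 - p_exp)
  )
}

codes <- c("include", "exclude")
title_abstract <- matrix(c(185, 381, 129, 1693),
  nrow = 2, byrow = TRUE,
  dimnames = list(reviewer1 = codes, reviewer2 = codes)
)
full_review <- matrix(c(63, 58, 13, 169),
  nrow = 2, byrow = TRUE,
  dimnames = list(reviewer1 = codes, reviewer2 = codes)
)

round(cohen_kappa(title_abstract), 2)
round(cohen_kappa(full_review), 2)
```

For the Title-Abstract phase, the observed agreement of 0.79 shrinks to $\kappa = 0.30$ once the chance agreement of 0.69 is removed, which matches the values reported in the table.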
However, the $\kappa$ statistic must be assessed carefully. This topic is beyond the scope of this
work; therefore, it will be explained only briefly.
While it is widely used, the $\kappa$ statistic is also widely criticized. The direct interpretation
of its value depends on several assumptions that are often violated: (1) both reviewers have the
same level of experience; (2) the "codes" (include, exclude) are identified with the same accuracy;
(3) the prevalence of the "codes" is the same; (4) there is no reviewer bias towards one of the
choices [@Sim2005; @Bakeman1997].
In addition, the number of "codes" affects the relation between the value of $\kappa$ and the actual
agreement between the reviewers. For example, given equiprobable "codes" and reviewers who are 85%
accurate, the values of $\kappa$ are 0.49, 0.60, 0.66, and 0.69 when the number of codes is 2, 3, 5,
and 10, respectively [@Bakeman1997; @Morgan2019].
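
These four values can be reproduced with a small calculation, under assumptions that are stated here and not taken from the cited papers: the two reviewers are independent, each is correct with the same probability, and a wrong answer falls uniformly on the remaining codes. The function name `expected_kappa` is hypothetical.

```r
# Expected kappa for two independent, equally accurate reviewers and equiprobable codes
expected_kappa <- function(n_codes, accuracy) {
  # They agree when both are right, or when both are wrong and pick the same wrong code
  p_obs <- accuracy^2 + (1 - accuracy)^2 / (n_codes - 1)
  # With uniform margins, chance agreement is 1 / n_codes
  p_exp <- 1 / n_codes
  (p_obs - p_exp) / (1 - p_exp)
}

kappas <- sapply(c(2, 3, 5, 10), expected_kappa, accuracy = 0.85)
round(kappas, 2)
```

With only two codes, as in this survey (include, exclude), reviewers who are 85% accurate are expected to reach a $\kappa$ of only about 0.49.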
In order to take these limitations into account, the agreement between reviewers was calculated
using KappaAcc [@Bakeman2011] from Professor Emeritus Roger Bakeman, Georgia State University, which
computes the estimated accuracy of simulated reviewers.
### RAW data
In order to better understand the data acquisition, a Single Lead Heart Rate Monitor breakout from
Sparkfun™ [@sparkfun2021], using the AD8232 [@AnalogDevices2020] microchip from Analog Devices Inc.
and compatible with Arduino^®^ [@arduino2021], was acquired for an in-house experiment
(Fig. `r ref("fig:ad8232")`).
```{r ad8232, echo=FALSE, out.width="40%", fig.show="hold", fig.cap=paste("Figure", ref("fig:ad8232"), "- Single Lead Heart Rate Monitor")}
knitr::include_graphics(c("figure/sparkfun.jpg", "figure/FullSetup.jpg"))
```
The output gives us a RAW signal, as shown in Fig. `r ref("fig:rawsignal")`.
```{r rawsignal, echo=FALSE, out.width="50%", fig.cap=paste("Figure", ref("fig:rawsignal"), "- RAW output from Arduino at ~300 Hz")}
knitr::include_graphics("figure/arduino_plot.jpg")
```
After applying the same settings as the Physionet database (collecting the data at 500 Hz,
resampling to 250 Hz, and applying the band-pass and notch filters), the signal is much better, as
shown in Fig. `r ref("fig:filtersignal")`.
```{r filtersignal, echo=FALSE, out.width="90%", fig.cap=paste("Figure", ref("fig:filtersignal"), "- Gray is RAW, Red is filtered")}
knitr::include_graphics("figure/filtered_ecg.png")
```
### Preparing the data
Usually, data obtained by sensors needs to be "cleaned" for proper evaluation. That is different
from the initial filtering process where the purpose is to enhance the signal. Here we are dealing
with artifacts, disconnected cables, wandering baselines and others.
Several SQIs (Signal Quality Indexes) are used in the literature [@eerikainen2015]. Some are simple
measures such as _kurtosis_, _skewness_, and median local noise level; others are more complex, such
as pcaSQI (the ratio of the sum of the five largest eigenvalues associated with the principal
components over the sum of all eigenvalues obtained by principal component analysis applied to the
time-aligned ECG segments in the window). By experimentation (yet to be validated), a simple formula
that gives us the "complexity" of the signal and correlates well with the noisy data is shown in
Equation $\eqref{complex}$.
\
$$
\sqrt{\sum_{i=1}^w((x_{i+1}-x_i)^2)}, \quad \text{where}\; w \; \text{is the window size} \tag{2} \label{complex}
$$
\
Fig. `r ref("fig:sqi")` shows some SQIs and their relation with the data.
```{r sqi, echo=FALSE, out.width="100%", fig.cap=paste("Figure", ref("fig:sqi"), "- Green line is the \"complexity\" of the signal")}
knitr::include_graphics("figure/noise.png")
```
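
To make Equation $\eqref{complex}$ concrete, the sketch below computes it for every window position of a signal. It is an illustration only: the name `window_complexity` is hypothetical, it is not the `win_complex()` helper sourced later in this report, and it follows the equation literally by summing $w$ squared successive differences per window.

```r
# Rolling "complexity": square root of the sum of squared successive differences
# over each window of w differences (Equation 2), computed with prefix sums.
window_complexity <- function(x, w) {
  stopifnot(is.numeric(x), w >= 1, length(x) > w)
  sq_diff <- diff(x)^2 # (x[i + 1] - x[i])^2
  csum <- c(0, cumsum(sq_diff)) # prefix sums of the squared differences
  n_win <- length(sq_diff) - w + 1
  roll <- csum[(w + 1):(w + n_win)] - csum[1:n_win] # sum over each window
  sqrt(pmax(roll, 0)) # guard against tiny negative values from rounding
}

flat <- rep(3, 10) # a flat segment has zero complexity
jagged <- c(0, 1, 0, 1, 0) # a jagged segment has high complexity
window_complexity(flat, w = 4)
window_complexity(jagged, w = 2)
```

A threshold on this value, like the `limit` used in the chunk below, can then flag the windows that are treated as noise.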
```{r createfilter, include=FALSE}
source(here("scripts", "common", "read_ecg.R"))
source(here("scripts", "common", "win_complex.R"))
filter_w <- 200
limit <- 8
file <- "a104s"
size_w <- 16
size_h <- 5
data <- read_ecg_csv(here(glue("inst/extdata/physionet/{file}.hea")))
data <- data[[file]]$II
norm_data <- tsmp:::znorm(data)
filter <- win_complex(norm_data, filter_w)
filter <- filter > limit
if (knitr::is_latex_output()) {
  grDevices::pdf(here("protocol/figure/regime_filter.pdf"),
    width = size_w, height = size_h
  )
} else {
  svglite::svglite(here("protocol/figure/regime_filter.svg"),
    width = size_w, height = size_h
  )
}
plot(norm_data, main = "", type = "l", ylab = "", xlab = "index", lwd = 0.2)
points(cbind(which(filter), 0), col = "blue", pch = 19)
dev.off()
```
Fig. `r ref("fig:datafilter")` shows that noisy data (probably patient muscle movements) are marked
with a blue point and thus are ignored by the algorithm.
```{r datafilter, echo=FALSE, out.width="100%", fig.cap=paste("Figure", ref("fig:datafilter"), "- Noisy data marked by the \"complexity\" filter")}
if (knitr::is_latex_output()) {
  knitr::include_graphics("figure/regime_filter.pdf")
} else {
  knitr::include_graphics("figure/regime_filter.svg")
}
```
Although this step of "cleaning" the data is often used, whether it is really necessary will also be
tested, and the performance with and without "cleaning" will be reported.
### Detecting regime changes
The regime change approach will use the _Arc Counts_ concept from the FLUSS (Fast Low-cost
Unipotent Semantic Segmentation) algorithm, as explained by Gharghabi _et al._ [@gharghabi2018].
The FLUSS (and FLOSS, the on-line version) algorithm is built on top of the Matrix Profile
(MP) [@Yeh2017a], described in section `r ref("matrixprofile")`. Recall that the MP and the companion
Profile Index (PI) are two vectors holding information about the 1-NN. One can imagine several
"arcs", each starting from one "index" and ending at its nearest neighbor. This algorithm is based
on the assumption that, between two regimes, the most similar shape (the nearest neighbor) is
located on "the same side", so the number of "arcs" decreases when there is a change of regime, and
increases again afterwards, as shown in Fig. `r ref("fig:arcsoriginal")`. This drop in the
_Arc Counts_ is a signal that a change in the shape of the signal has happened.
```{r arcsoriginal, echo=FALSE, out.width="100%", fig.cap=paste("Figure", ref("fig:arcsoriginal"), "- FLUSS algorithm, using arc counts.")}
if (knitr::is_latex_output()) {
  knitr::include_graphics("figure/fluss_arcs.pdf")
} else {
  knitr::include_graphics("figure/fluss_arcs.svg")
}
```
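
The arc counting idea can be sketched in a few lines of base R. This is a simplified illustration, not the FLUSS/FLOSS implementation: the function name `arc_counts` and the toy Profile Index are hypothetical, and the correction that the original paper applies to the raw counts is left out.

```r
# Number of arcs (index -> its nearest neighbor) that pass over each position
arc_counts <- function(profile_index) {
  n <- length(profile_index)
  marks <- numeric(n)
  for (i in seq_len(n)) {
    j <- profile_index[i]
    lo <- min(i, j)
    hi <- max(i, j)
    marks[lo] <- marks[lo] + 1 # an arc opens here
    marks[hi] <- marks[hi] - 1 # and closes here
  }
  cumsum(marks) # arcs open at each position
}

# Toy Profile Index with two regimes: indexes 1-4 only point to each other,
# and so do indexes 5-8, so no arc crosses the boundary after position 4.
toy_pi <- c(3, 4, 1, 2, 7, 8, 5, 6)
arc_counts(toy_pi)
```

In this toy case the count drops to zero at position 4, the last index of the first regime, which is the kind of valley the algorithm looks for.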
The choice of the FLOSS algorithm (on-line version of FLUSS) is founded on the following arguments:
- **Domain Agnosticism:** the algorithm makes no assumptions about the data as opposed to most
available algorithms to date.
- **Streaming:** the algorithm can provide real-time information.
- **Real-World Data Suitability:** the objective is not to _explain_ all the data. Therefore, areas
marked as "don't know" areas are acceptable.
- **FLOSS is not:** a change point detection algorithm [@aminikhanghahi2016]. The interest here is
changes in the shapes of a sequence of measurements.
Other algorithms we can cite are based on Hidden Markov Models (HMM) that require at least two
parameters to be set by domain experts: cardinality and dimensionality reduction. The most
attractive alternative could be the Autoplait [@Matsubara2014], which is also domain agnostic and
parameter-free. It segments the time series using Minimum Description Length (MDL) and recursively
tests if the region is best modeled by one or two HMM. However, Autoplait is designed for batch
operation, not streaming, and also requires discrete data. FLOSS was demonstrated to be superior in
several datasets in its original paper. In addition, FLOSS is robust to several changes in data like
downsampling, bit depth reduction, baseline wandering, noise, smoothing, and even deleting 3% of the
data and filling with simple interpolation. Finally, and most importantly, the algorithm is light
and suitable for low-power devices.
In the MP domain, it is also worth mentioning another possible algorithm: the Time Series Snippets
[@Imani2018], based on MPdist [@gharghabi2018b]. The latter measures the distance between two
sequences considering how many similar sub-sequences they share, no matter the order of matching. It
proved to be a useful measure (not a metric) for meaningfully clustering similar sequences. Time
Series Snippets exploits MPdist properties to summarize a dataset extracting the $k$ sequences that
represent most of the data. The final result seems to be an alternative for detecting regime
changes, but it is not. The purpose of this algorithm is to find which pattern(s) explains most of
the dataset. Also, it is not suitable for streaming data. Lastly, MPdist is quite expensive compared
to the trivial Euclidean distance.
The regime change detection will be evaluated following the criteria explained in section
`r ref("evaluation")`.
### Classification of the new regime {#classregime}
The next step towards the objective of this work is to verify whether the new regime detected by the
previous step is indeed a life-threatening pattern for which we should trigger the alarm.
First let's dismiss some apparent solutions: (1) Clustering. It is well understood that we cannot
cluster time series subsequences meaningfully with any distance measure, or with any algorithm
[@Keogh2005]. The main argument is that in a meaningful algorithm, the output depends on the input,
and this has been proven not to happen in time series subsequence clustering [@Keogh2005]. (2)
Anomaly detection. In this work we are not looking for surprises, but for patterns that are known to
be life-threatening. (3) Forecasting. We may be tempted to make predictions, but clearly this is not
the idea here.
The method of choice is classification. The simplest algorithm could be a `TRUE`/`FALSE` binary
classification. Nevertheless, the five life-threatening patterns have well-defined characteristics,
so it may be more plausible to classify the new regime using some kind of ensemble of binary
classifiers or a "six-class" classifier (the sixth class being the `FALSE` class).
Since the model doesn't know which life-threatening pattern will be present in the regime (or if it
will be a `FALSE` case), the model will need to check for all five `TRUE` cases and, if none of these
cases is identified, it will classify the regime as `FALSE`.
In order to avoid exceeding processor capacity, an initial set of shapelets [@Rakthanmanon2013] can
be sufficient to build the `TRUE`/`FALSE` classifier. To build such a set of shapelets while
leveraging the MP, we will use the Contrast Profile [@Mercer2021].
The Contrast Profile (CP) looks for patterns that are at the same time very *similar* to their
neighbors in class *A* and very *different* from their nearest neighbor in class *B*. In other
words, such a pattern represents class *A* well and may be taken as a "signature" of
that class.
In this case we need to compute two MPs: one self-join MP using the *positive* class, $MP^{(++)}$ (the
class that has the signature we want to find), and one AB-join MP using the *positive* and *negative*
classes, $MP^{(+-)}$. Then we subtract the former, $MP^{(++)}$, from the latter, $MP^{(+-)}$, resulting in
the $CP$. The high values in the $CP$ are the locations of the signature candidates we look for (the
author of CP calls these segments *Platos*).
Due to the nature of this approach, the MPs (containing values in Euclidean Distance) are clipped
at $\sqrt{2w}$, where $w$ is the window size. This is because values above this threshold
are negatively correlated in the Pearson Correlation space. Finally, we normalize the values by
$\sqrt{2w}$. Formula $\eqref{contrast}$ summarizes this computation.
\
$$
CP_w = \frac{MP_{w}^{(+-)} - MP_{w}^{(++)}}{\sqrt{2w}} \quad \text{where}\; w \; \text{is the window size} \tag{3} \label{contrast}
$$
\
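As a minimal sketch of the computation in $\eqref{contrast}$, assuming the two Matrix Profiles are
already available as numeric vectors of Euclidean distances with the same length, the clipping and
normalization could look like the code below. The function name `contrast_profile()` and its
arguments are hypothetical and only illustrate the formula; they are not part of any package.

```r
# Sketch only: `mp_self` is the self-join MP (++), `mp_ab` the AB-join MP (+-),
# both in Euclidean distance, and `w` is the window size.
contrast_profile <- function(mp_self, mp_ab, w) {
  stopifnot(length(mp_self) == length(mp_ab), w > 0)
  limit <- sqrt(2 * w)
  # Clip distances above sqrt(2w) (negative Pearson correlation)
  mp_self <- pmin(mp_self, limit)
  mp_ab <- pmin(mp_ab, limit)
  # Subtract and normalize, so the result lies in [-1, 1]
  (mp_ab - mp_self) / limit
}

# Toy values, window size 4 (so the clipping limit is sqrt(8))
cp <- contrast_profile(mp_self = c(0, 2), mp_ab = c(sqrt(8), 100), w = 4)
which.max(cp) # index of the best signature candidate
```

In this toy case the first position has a perfect match inside the positive class and no
correlated match in the negative class, so it receives the maximum contrast of 1.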
For a more complete understanding of the process, Fig. `r ref("fig:contrast")` shows a practical example
from the original article [@Mercer2021].
\
```{r contrast, echo=FALSE, out.width="100%"}
#| fig.cap = paste("Figure", ref("fig:contrast"), "- Top to bottom: two weakly-labeled snippets of a larger time series. T(-) contains
#| only normal beats. T(+) also contains PVC (premature ventricular contractions).
#| Next, two Matrix Profiles with window size 91; AB-join is in red and self-join in blue.
#| Bottom, the Contrast Profile showing the highest location.")
if (knitr::is_latex_output()) {
knitr::include_graphics("figure/contrast.pdf")
} else {
knitr::include_graphics("figure/contrast.svg")
}
```
After extracting candidates for each class signature, a classification algorithm will be fitted and
evaluated using the criteria explained in section `r ref("evaluation")`.
### Summary of the methodology
In order to summarize the steps taken in this thesis to accomplish the main objective, Figs.
`r ref("fig:regimedetection")`, `r ref("fig:shapelets")` and `r ref("fig:fullmodel")` show an overview of the
processes involved.
First let us introduce the concept of Nested Resampling [@Bischl2012]. It is known that when
increasing model complexity, overfitting on the training set becomes more likely to happen
[@Hastie2009]. This is an issue that this work has to counteract, as there are many steps that
require parameter tuning, even for algorithms that are almost parameter-free like the MP.
The rule that must be followed is simple: *do not* evaluate a model on the same resampling split used
to perform its own parameter tuning. Using simple cross-validation, the information about the test
set "leaks" into the evaluation, which leads to overfitting/overtuning and gives us an optimistically
biased estimate of the performance. Bischl *et al.* [@Bischl2012] describe these
factors in more depth and also give us a countermeasure: (1) from preprocessing the data to model
selection, use only the training set; (2) the test set should be touched once, in the evaluation step; (3)
repeat. This guarantees that "new", separate data is only used *after* the model is trained/tuned.
Fig. `r ref("fig:nestedresampling")` shows us this principle. The steps (1) and (2) described above are
part of the **Outer resampling**, which in each loop splits the data into two sets: the training set
and the test set. The training set is then used in the **Inner resampling** where, for example, the
usual cross-validation may be used (creating an *Analysis set* and an *Assessment set*, to avoid
conflict of terminology), and the best model/parameters are selected. Then, this best model is
evaluated against the unseen test set that was created for this resampling.
The resulting (aggregated) performance of all outer samples gives us a more honest estimate
of the expected performance on new data.
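
The principle can be sketched in a few lines of base R. This is only an illustration under stated
assumptions: the data are synthetic, the "model" is a toy one-dimensional k-nearest-neighbor
classifier, the tuned parameter is `k`, and accuracy is used as the score for brevity. The function
names `knn_accuracy()` and `nested_resampling()` are hypothetical.

```r
set.seed(2021)
n <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- (toy$x + rnorm(n, sd = 0.5)) > 0 # noisy binary label

# Fit on `analysis` (kNN just stores it) and score on `assessment`
knn_accuracy <- function(analysis, assessment, k) {
  predicted <- vapply(assessment$x, function(x0) {
    nearest <- order(abs(analysis$x - x0))[seq_len(k)]
    mean(analysis$y[nearest]) > 0.5
  }, logical(1))
  mean(predicted == assessment$y)
}

nested_resampling <- function(data, grid, outer_reps = 5, inner_folds = 5,
                              train_frac = 0.8) {
  vapply(seq_len(outer_reps), function(r) {
    # Outer resampling: subsampling without replacement
    train_idx <- sample.int(nrow(data), floor(train_frac * nrow(data)))
    training <- data[train_idx, ]
    testing <- data[-train_idx, ]
    # Inner resampling: k-fold cross-validation inside the training set only
    fold <- sample(rep_len(seq_len(inner_folds), nrow(training)))
    inner_score <- vapply(grid, function(k) {
      mean(vapply(seq_len(inner_folds), function(f) {
        knn_accuracy(training[fold != f, ], training[fold == f, ], k)
      }, numeric(1)))
    }, numeric(1))
    best_k <- grid[which.max(inner_score)]
    # The test set is touched once, with the already tuned parameter
    knn_accuracy(training, testing, best_k)
  }, numeric(1))
}

outer_scores <- nested_resampling(toy, grid = c(1, 5, 15))
median(outer_scores) # aggregated estimate over the outer samples
```

The important design choice is that `testing` never enters the inner loop, so the choice of `k`
cannot be influenced by the data used for the final score.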
```{r nestedresampling, echo=FALSE, out.width="70%"}
#| fig.cap = paste("Figure", ref("fig:nestedresampling"), "- Nested resampling.
#| The full dataset is resampled several times (outer resampling), so each branch has its
#| own Test set (yellow). On each branch, the Training set is used as if it were a full dataset,
#| being resampled again (inner resampling); here the Assessment set (blue) is used to test the
#| learning model and tune parameters. The best model then, is finally evaluated on its own
#| Test set.")
if (knitr::is_latex_output()) {
knitr::include_graphics("figure/draw-nested-resampling.pdf")
} else {
knitr::include_graphics("figure/draw-nested-resampling.svg")
}
```
\
With Nested Resampling [@Bischl2012] in mind, the following flowcharts can be
better interpreted. Fig. `r ref("fig:regimedetection")` starts with the "Full Dataset" that contains all
time series from the dataset described in section `r ref("the-data")`. Each time series corresponds to one
file from the database and represents one patient.
The regime change detection will use subsampling (bootstrapping can lead to substantial bias toward
more complex models) in the Outer resampling and cross-validation in the Inner resampling. How
the evaluation will be performed, and why cross-validation is used, will be explained in section
`r ref("evaluation")`.
```{r regimedetection, echo=FALSE, out.width="90%"}
#| fig.cap = paste("Figure", ref("fig:regimedetection"), "- Pipeline for regime change detection.
#| The full dataset (containing several patients) is divided into a Training set and a Test set.
#| The Training set is then resampled in an Analysis set and an Assessment set. The former is
#| used for training/parameter tuning and the latter for assessing the result. The best parameters
#| are then used for evaluation on the Test set. This may be repeated several times.")
if (knitr::is_latex_output()) {
knitr::include_graphics("figure/draw-regime-model.pdf")
} else {
knitr::include_graphics("figure/draw-regime-model.svg")
}
```
Fig. `r ref("fig:shapelets")` shows the processes for training the classification model. First, the last
ten seconds of each time series will be identified (the event occurs in this segment). Then the
dataset will be grouped by class (type of event) and `TRUE`/`FALSE` (alarm), so the Outer/Inner
resampling will produce a Training/Analysis set and a Test/Assessment set with class frequencies
similar to those of the full dataset.
The next step will be to extract shapelet candidates using the Contrast Profile and train the
classifier.
This pipeline will use subsampling (for the same reason as above) in the Outer resampling and
cross-validation in the Inner resampling. How the evaluation will be performed, and why
cross-validation is used, will be explained in section `r ref("evaluation")`.
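
The grouping by class and alarm amounts to a stratified split. A minimal sketch in base R is shown
below, assuming one row per record; the helper `stratified_split()` and the toy `records` table are
hypothetical and only illustrate how the class frequencies are preserved.

```r
# Sample the same fraction inside each stratum, without replacement
stratified_split <- function(strata, train_frac = 0.8) {
  idx <- seq_along(strata)
  train <- unlist(lapply(split(idx, strata), function(i) {
    i[sample.int(length(i), floor(train_frac * length(i)))]
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(idx, train))
}

set.seed(1)
# Toy table: two event types, 40% TRUE alarms in each (made-up proportions)
records <- data.frame(
  class = rep(c("asystole", "vtachy"), each = 50),
  alarm = rep(rep(c(TRUE, FALSE), times = c(20, 30)), 2)
)
strata <- interaction(records$class, records$alarm, drop = TRUE)
split_idx <- stratified_split(strata)

mean(records$alarm)                  # proportion of TRUE in the full dataset
mean(records$alarm[split_idx$train]) # same proportion in the Training set
```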
```{r shapelets, echo=FALSE, out.width="60%"}
#| fig.cap = paste("Figure", ref("fig:shapelets"), "- Pipeline for alarm classification.
#| The full dataset (containing several patients) is grouped by class and by TRUE/FALSE alarm.
#| This grouping allows resampling to keep a similar frequency of classes and TRUE/FALSE of the full dataset.
#| Then the full dataset is divided into a Training set and a Test set.
#| The Training set is then resampled in an Analysis set and an Assessment set. The former is
#| used for extracting shapelets, training the model and parameter tuning; the latter for assessing
#| the performance of the model. Finally, the best model is evaluated on the Test set.
#| This may be repeated several times.")
if (knitr::is_latex_output()) {
knitr::include_graphics("figure/draw-classif-model.pdf")
} else {
knitr::include_graphics("figure/draw-classif-model.svg")
}
```
Finally, Fig. `r ref("fig:fullmodel")` shows how the final model will be used in the field. In a
streaming scenario, the data will be collected and processed in real-time to maintain an up-to-date
Matrix Profile. The FLOSS algorithm will be looking for a regime change. When a regime change is
detected, a sample of this new regime will be presented to the trained classifier, which will evaluate
whether this new regime is a life-threatening condition or not.
```{r fullmodel, echo=FALSE, out.width="60%"}
#| fig.cap = "Pipeline of the final process.
#| The streaming data, coming from one patient, is processed to create its Matrix Profile.
#| Then, the FLOSS algorithm is computed for detecting a regime change. When a new regime is
#| detected, a sample of this new regime is analysed by the model and a decision is made. If
#| the new regime is life-threatening, the alarm will be fired."
if (knitr::is_latex_output()) {
knitr::include_graphics("figure/draw-global-model.pdf")
} else {
knitr::include_graphics("figure/draw-global-model.svg")
}
```
## Evaluation of the algorithms {#evaluation}
The resampling method used for both algorithms, regime change and classification, will be
cross-validation, as the learning task will be in batches.
Other options dismissed [@Bischl2012]:
* Leave-One-Out Cross Validation: has better properties for regression than for classification. It
has a high variance as an estimator of the mean loss. It is also asymptotically inconsistent and
tends to select overly complex models. It has been demonstrated empirically that 10-fold CV is often
superior.
* Bootstrapping: while it has low variance, it may be optimistically biased on more complex models.
Also, its resampling method with replacement can leak information into the assessment set.
* Subsampling: is like bootstrapping, but without replacement. The only argument for not choosing
it is that with Cross Validation we make sure all the data is used for analysis and assessment.
### Regime change
A detailed discussion about the evaluation of segmentation algorithms is given by the
FLUSS/FLOSS author [@gharghabi2018]. Previous research has used precision/recall or derived
measures of performance. The main issue is how to decide whether the algorithm was correct. If the
ground truth says the change occurred at location 10,000, and the algorithm detects a change at
location 10,001, is this a miss?
As pointed out by the author, several independent researchers have suggested a temporal tolerance,
which solves one issue but still harshly penalizes any tiny miss beyond this tolerance.
The second issue is the over-penalization of an algorithm in which most of the detections are
good, but just one (or a few) is poor.
The author proposes the solution depicted in Fig. `r ref("fig:flosseval")`. It gives 0 as the best score
and 1 as the worst. The function sums the distances between the ground truth locations and the
locations suggested by the algorithm. The sum is then divided by
<!--the product of the number of segments, and then -->
the length of the time series to normalize the range to [0, 1].
The goal is to minimize this score.
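
A minimal sketch of this score is given below. It assumes that each location reported by the
algorithm is mapped to its nearest ground truth location, which is one possible reading of the
description above; the function name `floss_score()` is hypothetical.

```r
# predicted: locations reported by the algorithm
# truth:     ground truth locations
# n:         length of the time series
floss_score <- function(predicted, truth, n) {
  stopifnot(length(predicted) > 0, length(truth) > 0, n > 0)
  # Distance from each reported location to its nearest ground truth location
  nearest <- vapply(predicted, function(p) min(abs(truth - p)), numeric(1))
  sum(nearest) / n
}

# The example from the text: an error of one sample is barely penalized
floss_score(predicted = 10001, truth = 10000, n = 20000)

# Two reported locations mapped to the same ground truth point (500)
floss_score(predicted = c(100, 480, 520), truth = c(100, 500), n = 1000)
```

Under this reading, a large number of reported locations increases the sum without any additional
normalization, which is a point that still needs to be reviewed.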
<!--
TODO: review the problem when there are too many detections
-->
```{r flosseval, echo=FALSE, out.width="100%"}
#| fig.cap=paste("Figure", ref("fig:flosseval"), "- Regime change evaluation. The top line illustrates the ground truth, and the
#| bottom line the locations reported by the algorithm. Note that multiple proposed locations
#| can be mapped to a single ground truth point.")
if (knitr::is_latex_output()) {
knitr::include_graphics("figure/floss_eval.pdf")
} else {
knitr::include_graphics("figure/floss_eval.svg")
}
```
### Classification
As described in section `r ref("classregime")`, the model for classification will use a set of shapelets
to identify if we have a `TRUE` (life-threatening) regime or a `FALSE` (non-life-threatening) regime.
Although the implementation of the final process will be using streaming data, the classification
algorithm will work in batches, because it will not be applied on every single data point, but on
samples that are extracted when a regime change is detected. During the training phase, the data is
also analyzed in batches.
One important factor we must consider is that, in the real world, the majority of regime changes will be
`FALSE` (i.e., not life-threatening). Thus, a performance measure that is robust to class imbalance
is needed if we want to be able to assess the model in the field, after it has been trained.
It is well known that the *Accuracy* measure is not reliable for unbalanced data
[@Akosa2017; @Bekkar2013], as it returns optimistic results for a classifier that favors the majority class. A
description of common measures used in classification is available in [@Akosa2017; @Chicco2020]. Here
we will focus on three candidate measures that can be used: F-score (well discussed in
[@Chicco2020]), Matthews Correlation Coefficient (MCC) [@Matthews1975] and the $\kappa_m$ statistic
[@Bifet2015].
The F-score (abbreviated here to F~1~, as this is the most common setting) is widely used in
*information retrieval*, where the classes are usually labeled "relevant" and "irrelevant".
It combines *recall* (also known as sensitivity) and *precision* (the positive predictive
value). *Recall* assesses how well the algorithm retrieves relevant examples among the (usually few)
relevant items in the dataset, while *precision* assesses the proportion of truly relevant items
among the retrieved examples. The score ranges over [0, 1]. It completely ignores the irrelevant
items that were not retrieved (usually this set contains lots of items). In classification tasks, its
main weakness is that it does not evaluate the True Negatives: if a random classifier shifts its predictions
towards the `TRUE` class (increasing the False Positives significantly), this score actually gets
better, so it is not suitable for our case. The F~1~ score is defined in equation $\eqref{fscore}$.
$$
F_1\ \text{score} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{4} \label{fscore}
$$
The MCC is a good alternative to the F~1~ when we do care about the True Negatives (both were
considered to "provide more realistic estimates of real-world model performance" [@Dubey2018]). It
is a method to compute the *Pearson product-moment correlation coefficient* [@Delgado2019] between
the actual and predicted values. It ranges over [-1, 1]. The MCC is the only binary classification
rate that only gives a high score if the binary classifier was able to correctly classify the
majority of the positive and negative instances [@Chicco2020]. One may argue that Cohen's $\kappa$
has the same behavior, but there are two main differences: (1) MCC is *undefined* in the case of a
*majority voter*, while Cohen's $\kappa$ does not discriminate this case from the random classifier
($\kappa$ is zero for both cases); (2) it is proven that in a special case in which the classifier is
increasing the False Negatives, Cohen's $\kappa$ does not get worse as expected, while MCC
does not have this issue [@Delgado2019]. MCC is defined in equation $\eqref{mccval}$.
$$
MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP) \cdot (TP + FN) \cdot (TN + FP) \cdot (TN + FN)}} \tag{5} \label{mccval}
$$
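
To make the contrast between equations $\eqref{fscore}$ and $\eqref{mccval}$ concrete, the sketch
below computes both measures from confusion-matrix counts. The function names and the counts are
made up for illustration; the second case is a classifier that answers `TRUE` for every example.

```r
f1_score <- function(tp, fp, fn) {
  2 * tp / (2 * tp + fp + fn)
}

mcc <- function(tp, tn, fp, fn) {
  # as.numeric() avoids integer overflow in the product of the margins
  denom <- sqrt(as.numeric(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) {
    return(NA_real_) # undefined when a whole row or column is empty
  }
  (as.numeric(tp) * tn - as.numeric(fp) * fn) / denom
}

# A reasonable classifier on 100 examples (40 positives, 60 negatives)
f1_score(tp = 35, fp = 10, fn = 5)
mcc(tp = 35, tn = 50, fp = 10, fn = 5)

# A classifier that always answers TRUE on the same data
f1_score(tp = 40, fp = 60, fn = 0) # still looks moderately good
mcc(tp = 40, tn = 0, fp = 60, fn = 0) # NA: MCC is undefined here
```

The always-`TRUE` classifier keeps an F~1~ above 0.5 because the True Negatives never enter the
formula, whereas the MCC flags the case as undefined.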
The $\kappa_m$ statistic [@Bifet2015] is a measure that takes into account not the *random classifier*
but the *majority voter* (a classifier that only votes for the larger class). It was introduced by
Bifet *et al.* [@Bifet2015] for use in online settings, where the class balance may change
over time. It is defined in equation $\eqref{kappam}$, where $p_0$ is the observed accuracy and
$p_m$ is the accuracy of the majority voter. Theoretically, the score ranges over ($-\infty$, 1];
in practice, negative values indicate that the classifier performs worse than the majority voter
and positive values that it performs better, up to the maximum of 1, when the
classifier is optimal.
$$
\kappa_m = \frac{p_0 - p_m}{1 - p_m} \tag{6} \label{kappam}
$$
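
A minimal batch sketch of equation $\eqref{kappam}$ follows, assuming that $p_m$ is estimated as
the proportion of the larger class in the evaluated labels (in an online setting it would be tracked
over the stream instead). The function name `kappa_m()` and the labels are hypothetical.

```r
kappa_m <- function(truth, predicted) {
  stopifnot(length(truth) == length(predicted), length(truth) > 0)
  p0 <- mean(truth == predicted)           # observed accuracy
  pm <- max(table(truth)) / length(truth)  # accuracy of the majority voter
  if (pm == 1) {
    return(NA_real_) # only one class present: the statistic is undefined
  }
  (p0 - pm) / (1 - pm)
}

truth <- c(rep(FALSE, 60), rep(TRUE, 40)) # 60% false alarms

kappa_m(truth, rep(FALSE, 100)) # the majority voter itself scores 0
kappa_m(truth, truth)           # a perfect classifier scores 1
kappa_m(truth, !truth)          # always wrong: well below 0
```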
In the inner resampling (model training/tuning), the classification will be binary, and in our case
we know that the data is slightly unbalanced (60% false alarms). For this step, the metric for model
selection will be the MCC. Nevertheless, during the optimization process, the algorithm will also seek to
minimize the False Negative Rate ($FNR = \frac{FN}{TP+FN}$): in case of ties, the smaller FNR wins.
In the outer resampling, the MCC and $\kappa_m$ of all winning models will be aggregated and reported
using the median and interquartile range.
For different classifiers, we will use the Wilcoxon signed-rank test to compare their performances,
as this method is known to have low Type I and Type II errors in this kind of comparison [@Bifet2015].
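
As an illustration of the reporting and comparison steps, the sketch below uses the base R
functions `median()`, `IQR()` and `wilcox.test()`. The MCC values are invented for the example (one
value per outer resample, paired by resample) and are not results of this work; the helper
`summarize_scores()` is hypothetical.

```r
# Hypothetical MCC values of two classifiers on the same 8 outer resamples
mcc_a <- c(0.61, 0.66, 0.58, 0.70, 0.64, 0.69, 0.62, 0.67)
mcc_b <- c(0.57, 0.60, 0.59, 0.62, 0.61, 0.60, 0.55, 0.65)

# Aggregation reported for each classifier
summarize_scores <- function(x) c(median = median(x), iqr = IQR(x))
summarize_scores(mcc_a)
summarize_scores(mcc_b)

# Paired comparison: Wilcoxon signed-rank test on the per-resample differences
comparison <- wilcox.test(mcc_a, mcc_b, paired = TRUE)
comparison$statistic # sum of the ranks of the positive differences
comparison$p.value
```

The test is paired because both classifiers are evaluated on the same outer resamples, so only the
differences within each resample are ranked.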
### Full model (streaming setting)
For the final assessment, the best and the average model of the previous pipelines will be assembled
and tested using the whole original dataset.
The algorithm will be tested on each of the five life-threatening event types individually, in order
to evaluate its strengths and weaknesses.
For more transparency, the whole confusion matrix will be reported, as well as the MCC, $\kappa_m$, and
the FLOSS evaluation.
# Current results
## Regime change detection
The regime change detection pipeline currently has the resampling strategies and the evaluation implemented, in
order to start the parameter tuning. An example of the current implementation is shown in Fig. `r ref("fig:flossregime")`.
```{r flossregime, echo=FALSE, out.height="80%", out.width="80%"}
#| fig.cap=paste("Figure", ref("fig:flossregime"), "- Regime change detection example.
#| The graph on top shows the ECG streaming; the blue line marks the ten seconds
#| before the original alarm was fired; the red line marks the time constraint of 1250;
#| the dark red line marks the limit for taking a decision in this case of Asystole;
#| the blue horizontal line represents the size of the sliding window.
#| The graph on the middle shows the Arc counts as seen by the algorithm (with the corrected
#| distribution); the red line marks the current minimum value and its index; the blue
#| horizontal line shows the minimum value seen until then.
#| The graph on the bottom shows the computed Arc counts (raw) and the red line is the
#| theoretical distribution used for correction.")
knitr::include_graphics("figure/floss_regime.png")
```
## Classification
The classification pipeline currently has the shapelet extraction
using the Contrast Profile implemented.
An example of candidates for ventricular tachycardia is presented in Fig. `r ref("fig:vtachy")`.
```{r vtachy, echo=FALSE, out.width="90%", fig.cap=paste("Figure", ref("fig:vtachy"), "- Shapelet candidates for Ventricular Tachycardia."), fig.height=9, fig.width=14}
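# Save the current graphical parameters so they can be restored afterwards
def_par <- graphics::par(no.readonly = TRUE)
# Arrange the six candidates in a 3 x 2 grid, filled row by row
graphics::layout(matrix(1:6, ncol = 2, byrow = TRUE))
graphics::par(mai = c(0.8, 0.5, 0.6, 0.5), cex = 1)
data <- readRDS(here("presentations/Report/contrast.rds"))
for (i in seq_len(6)) {
  # Shapelet candidate (plato), z-normalized
  plot(tsmp:::znorm(data[[i]]$plato), type = "l", ylab = "", xlab = "samples (250 Hz)")
  # Overlay its neighbors, one color per neighbor
  for (j in seq_along(data[[i]]$neighbors)) {
    lines(tsmp:::znorm(data[[i]]$neighbors[[j]]$data), col = j + 1)
  }
}
graphics::par(def_par)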
```
## Feasibility trial
A side-project called "false.alarm.io" has been derived from this work (an unfortunate mix of
"false.alarm" and "PlatformIO" [@PlatformIO], the IDE chosen to interface the panoply of embedded
systems we can experiment with). The current results of this side-project are very enlightening and
show that the final algorithm can indeed be used in small hardware. Further data will be available
in the future.
Briefly, and linking back to the objectives of this work, an initial trial was done using an
ESP32 MCU (Fig. `r ref("fig:esp32")`) in order to verify whether such a small device can handle the task.
```{r esp32, echo=FALSE, out.width="50%", fig.cap=paste("Figure", ref("fig:esp32"), "- ESP32 MCU")}
knitr::include_graphics("figure/esp32.jpg")
```
Current results show that such a device has enough computation power to handle the task in real-time
using just one of its two cores. The main limitation foreseen is the on-chip SRAM,
which must be well managed.
# Scientific contributions
## Matrix Profile
Since the first paper presenting this new concept [@Yeh2017a], many investigations have been made to
speed up its computation. It is notable that none of these computations depends on the _rolling
window size_, unlike previous works that do not use the Matrix Profile. Aside from this, we can see that the first
algorithm, STAMP [@Yeh2017a], has a time complexity of $O(n^2 \log{n})$ while STOMP [@zhu2016] has
$O(n^2)$ (a significant improvement), but STOMP lacks the "any-time" property. Later, SCRIMP
[@zhu2018] solved this problem while keeping the same time complexity of $O(n^2)$. Here we are in the
"exact" algorithms domain and we will not extend the scope, for conciseness.
The main issue with the algorithms above is the dependency on a fast Fourier transform (FFT)
library. FFT implementations have been extensively optimized and tuned to specific architectures/CPUs to extract the most
speed. Also, padding the data to some power of 2 happens to increase the efficiency of the
algorithm. We can argue that a lower time complexity does not mean "faster" when we can exploit low-level
instructions. In our case, using FFT in a low-power device is overkill. For example, a quick
search over the internet suggests that computing an FFT on 4096 data points in an ESP32 takes
about 21 ms (~47 computations per second). This means ~79 seconds for computing all the FFTs
(~3797) required for STAMP using a window of 300. Currently, we can compute a full matrix of 5k
data points in about 9 seconds in an ESP32 MCU (Fig. `r ref("fig:esp32")`), and keep updating it as fast as
1 min of data (at 250 Hz) in just 6 seconds.
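
The arithmetic behind the FFT estimate can be reproduced as below. The assumption here is that
STAMP needs one FFT per subsequence, so the number of FFTs is $n - w + 1$; the 21 ms per FFT is
the rough figure quoted above, not a measurement made in this work.

```r
n <- 4096    # number of data points
w <- 300     # window size
fft_ms <- 21 # approximate time of one FFT on an ESP32, in milliseconds

n_fft <- n - w + 1                      # one FFT per subsequence
ffts_per_second <- 1000 / fft_ms        # about 47
total_seconds <- n_fft * fft_ms / 1000  # about 80

c(n_fft = n_fft, ffts_per_second = ffts_per_second, total_seconds = total_seconds)
```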
Recent works using _exact_ algorithms are using an unpublished algorithm called **MPX**, which
computes the Matrix Profile using cross-correlation methods, ends up being faster, and is easily
portable.
**On computing the Matrix Profile:** the contribution of this work on this area is adding the
*Online* capability to MPX, which means we can update the Matrix Profile as new data comes in.
**On extending the Matrix Profile:** the contribution of this work in this area is the use of an
unexplored constraint that we can apply when building the Matrix Profile, which we are calling _Similarity
Threshold_ (ST). The original work outputs the similarity values as Euclidean Distances (ED),
while MPX naturally outputs the values as Pearson's correlation coefficients (CC), which also saves
some computation time. ED and CC are interchangeable using equation $\eqref{edcc}$. However, we may argue that it is easier to
compare values that do not depend on the window size during an exploratory phase. The ST is an interesting factor
that we can use, especially when detecting pattern changes over time. The FLOSS algorithm relies
on counting references between indexes in the time series. ST can help remove "noise" from these
references, since only patterns whose similarity is above a certain threshold are referenced, and changes have
more impact on these counts. The best ST value is still to be determined.
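
As a sketch of how these pieces fit together, the code below converts between ED and CC using the
usual relation for z-normalized subsequences, $ED = \sqrt{2w(1 - CC)}$, and then applies a
threshold to a profile index. All function names are hypothetical, and discarding a reference by
setting it to `NA` is only one possible way to implement the ST.

```r
# Conversions for z-normalized subsequences of length w
cc_to_ed <- function(cc, w) sqrt(2 * w * (1 - cc))
ed_to_cc <- function(ed, w) 1 - ed^2 / (2 * w)

# Keep only the references whose correlation reaches the Similarity Threshold
apply_similarity_threshold <- function(cc, index, st) {
  stopifnot(length(cc) == length(index))
  index[cc < st] <- NA_integer_
  index
}

w <- 50
cc_to_ed(0, w)    # CC = 0 corresponds to ED = sqrt(2w)
ed_to_cc(0, w)    # ED = 0 corresponds to CC = 1

cc <- c(0.9, 0.2, 0.6)  # toy correlation profile
index <- c(5L, 9L, 1L)  # toy profile index (nearest-neighbor references)
apply_similarity_threshold(cc, index, st = 0.5)
```

With the toy values, the second reference is dropped because its correlation is below the
threshold, so it would no longer contribute to the counts used by FLOSS.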