# Evaluation of FSRS online and SM-15

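The evaluation finds no statistically significant difference in performance between FSRS online and SM-15: on the per-user metrics compared below, neither the paired t-test nor the Wilcoxon signed-rank test separates the two. By this measure, FSRS online performs on par with SM-15.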
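Significance here is judged with paired tests, since each user contributes one value per algorithm. As a rough sketch of what the paired t-test measures (not the notebook's own code, which calls SciPy), the statistic is the mean per-user difference divided by its standard error. The names `paired_t_statistic` and `significant_at_5pct` and the sample values are hypothetical; 2.306 is the standard two-sided 5% critical value of Student's t for 8 degrees of freedom, which corresponds to 9 users.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, df) for paired samples a and b of equal length."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Standard error of the mean difference, using the sample std (n - 1).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1


# Hypothetical per-user metric values, for illustration only.
algo_a = [2.0, 4.0, 6.0]
algo_b = [1.0, 2.0, 3.0]
t, df = paired_t_statistic(algo_a, algo_b)
print(f"t = {t:.4f}, df = {df}")

# Two-sided 5% critical value of Student's t with 8 degrees of freedom.
T_CRIT_DF8 = 2.306


def significant_at_5pct(t_stat, t_crit=T_CRIT_DF8):
    """True if |t| exceeds the critical value (valid for df = 8 by default)."""
    return abs(t_stat) > t_crit
```

Applied to the RMSE statistics reported in the output, |0.533| is below 2.306 for FSRS online vs SM-15 (not significant), while |-7.338| is above it for FSRS offline vs SM-15.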

In [1]:
import scipy.stats  # a bare "import scipy" does not load scipy.stats on older SciPy versions
import json
import numpy as np

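# evaluation.json holds one record per user: a repetition count ('size')
# and per-algorithm metric dicts ('fsrs_online', 'sm15', 'fsrs_offline').
with open("./evaluation.json", 'r') as f:
    evaluation = json.load(f)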

print(f"Number of users: {len(evaluation)}")

size = [item['size'] for item in evaluation]

print(f"Number of repetitions: {sum(size)}")

metrics = ["RMSE", "MAE", "log_loss", "universal_metric"]

print()

for m in metrics:
    # Collect this metric's per-user values; the three arrays are paired by user.
    fsrs_online = np.array([item['fsrs_online'][m] for item in evaluation])
    sm15 = np.array([item['sm15'][m] for item in evaluation])
    fsrs_offline = np.array([item['fsrs_offline'][m] for item in evaluation])

    print(f"Metric: {m}")
    print(f"FSRS Online\tmean: {fsrs_online.mean():.4f}\tstd: {fsrs_online.std():.4f}")
    print(f"SM15\t\tmean: {sm15.mean():.4f}\tstd: {sm15.std():.4f}")
    print(f"FSRS Offline\tmean: {fsrs_offline.mean():.4f}\tstd: {fsrs_offline.std():.4f}")
    print()
    print("FSRS Online vs SM15")
    print(scipy.stats.ttest_rel(fsrs_online, sm15))
    print(scipy.stats.wilcoxon(fsrs_online, sm15))
    print("FSRS Offline vs SM15")
    print(scipy.stats.ttest_rel(fsrs_offline, sm15))
    print(scipy.stats.wilcoxon(fsrs_offline, sm15))
    print()

Number of users: 9
Number of repetitions: 180714

Metric: RMSE
FSRS Online	mean: 0.1189	std: 0.0511
SM15		mean: 0.1107	std: 0.0384
FSRS Offline	mean: 0.0459	std: 0.0191

FSRS Online vs SM15
TtestResult(statistic=0.5327511678090001, pvalue=0.6086816363117653, df=8)
WilcoxonResult(statistic=22.0, pvalue=1.0)
FSRS Offline vs SM15
TtestResult(statistic=-7.33750414938164, pvalue=8.09061158826112e-05, df=8)
WilcoxonResult(statistic=0.0, pvalue=0.00390625)

Metric: MAE
FSRS Online	mean: 0.0729	std: 0.0376
SM15		mean: 0.0720	std: 0.0309
FSRS Offline	mean: 0.0319	std: 0.0168

FSRS Online vs SM15
TtestResult(statistic=0.07159397577351928, pvalue=0.9446825180865761, df=8)
WilcoxonResult(statistic=17.0, pvalue=0.5703125)
FSRS Offline vs SM15
TtestResult(statistic=-6.004901233147683, pvalue=0.00032163217452764307, df=8)
WilcoxonResult(statistic=0.0, pvalue=0.00390625)

Metric: log_loss
FSRS Online	mean: 0.3739	std: 0.1211
SM15		mean: 0.3858	std: 0.1517
FSRS Offline	mean: 0.3388	std: 0.1250

FSRS On
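
The Wilcoxon p-values above fall on multiples of 1/256 because, with only 9 users, the test's null distribution is discrete. The following sketch reproduces them by enumerating all 2^9 sign patterns. It assumes no ties or zero differences, and that SciPy used its exact method here (the matching values suggest so, but the notebook does not state it). The function name `exact_wilcoxon_p` is hypothetical.

```python
from itertools import combinations


def exact_wilcoxon_p(statistic, n):
    """Exact two-sided p-value for the signed-rank statistic min(W+, W-).

    Assumes n nonzero, untied differences, so the ranks are 1..n and every
    sign pattern is equally likely under the null hypothesis.
    """
    ranks = range(1, n + 1)
    # Count sign patterns whose positive-rank sum is at most `statistic`.
    count = sum(
        1
        for k in range(n + 1)
        for subset in combinations(ranks, k)
        if sum(subset) <= statistic
    )
    return min(1.0, 2 * count / 2 ** n)


# Statistics reported in the output above, with n = 9 users.
for w in (0, 17, 22):
    print(w, exact_wilcoxon_p(w, 9))
```

A statistic of 0 means every user's difference has the same sign, and 2/512 = 0.00390625 is the smallest two-sided p-value attainable with 9 pairs. A statistic of 22 is the most balanced split of the rank sum 45, which gives a p-value of 1.0.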

# References

- fsrs: https://github.com/open-spaced-repetition/fsrs4anki/wiki/The-Algorithm
- sm15: https://supermemo.guru/wiki/Algorithm_SM-15
- rmse: https://en.wikipedia.org/wiki/Root-mean-square_deviation
- mae: https://en.wikipedia.org/wiki/Mean_absolute_error
- log_loss: https://en.wikipedia.org/wiki/Cross-entropy
- universal_metric: https://supermemo.guru/wiki/Universal_metric_for_cross-comparison_of_spaced_repetition_algorithms
- ttest_rel: https://en.wikipedia.org/wiki/Student%27s_t-test#Dependent_t-test_for_paired_samples
- wilcoxon: https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test