Evaluation Measures

janvanrijn edited this page Aug 15, 2018 · 2 revisions

When uploading a run, evaluation measures can be specified. There are two forms of evaluation measures:

  • Measures that are also calculated by the Evaluation Engine. See 'Evaluation Engine' for a list. These measures can still be submitted, but the result of the Evaluation Engine takes precedence. The submitted and calculated values are compared, and if they differ by more than 0.00001 (10^{-5}), a warning is recorded in the database.
  • Measures that are not calculated by the Evaluation Engine, for example information about the operating system or the run time. See 'User Measures'. These measures are stored in the database as submitted.
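
The precedence rule for the first form can be sketched as follows. This is only an illustration of the comparison described above, not the Evaluation Engine's actual code; the function name `reconcile_measure` and its return shape are hypothetical.

```python
TOLERANCE = 1e-5  # threshold quoted on this page (0.00001)


def reconcile_measure(name, submitted, engine_value, tolerance=TOLERANCE):
    """Return (value_to_store, warning_or_None) for one measure.

    Hypothetical helper: the engine's value always takes precedence;
    the submitted value is only used to decide whether to warn.
    """
    warning = None
    if submitted is not None and abs(submitted - engine_value) > tolerance:
        warning = (
            f"{name}: submitted value {submitted!r} differs from "
            f"calculated value {engine_value!r} by more than {tolerance}"
        )
    return engine_value, warning
```

For example, a submitted `predictive_accuracy` of 0.91 against a calculated 0.90 would yield the calculated value plus a warning, while 0.900001 against 0.90 would pass silently.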

User Measures

These measures are not calculated by the Evaluation Engine and can therefore be uploaded freely by the workbenches. The following are of interest:

  • usercpu_time_millis, usercpu_time_millis_training, usercpu_time_testing: the number of milliseconds the CPU was busy on both phases combined, on training, and on testing, respectively. Note that CPU time is hard to measure (it requires low-level libraries) and is not widely supported across platforms.
  • wall_clock_time_millis, wall_clock_time_millis_training, wall_clock_time_testing: the number of milliseconds that elapsed between the start and the end of both phases combined, of training, and of testing, respectively. This does not take the number of cores into account.
  • os_information: Used in Weka-based runs, records information about the OS that the JVM ran on.
  • scimark_benchmark: Used in Weka-based runs, benchmarks the JVM using 5 different measures.
  • run_cpu_time: legacy, old expdb measure
  • run_memory: legacy, old expdb measure
  • run_virtual_memory: legacy, old expdb measure
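
As a rough illustration of the difference between the CPU-time and wall-clock measures, a workbench written in Python might time a phase as sketched below. The helper `timed_millis` is hypothetical, and `time.process_time()` reports user plus system CPU time of the whole process, so it only approximates a pure user-CPU measure.

```python
import time


def timed_millis(fn, *args, **kwargs):
    """Run fn and return (result, cpu_millis, wall_millis).

    Hypothetical helper. CPU time is summed over all threads of the
    process; wall-clock time ignores how many cores were used.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu_millis = (time.process_time() - cpu_start) * 1000.0
    wall_millis = (time.perf_counter() - wall_start) * 1000.0
    return result, cpu_millis, wall_millis


def train():
    # Stand-in for a real training routine.
    return sum(i * i for i in range(100_000))


model, cpu_ms, wall_ms = timed_millis(train)
user_measures = {
    "usercpu_time_millis_training": cpu_ms,
    "wall_clock_time_millis_training": wall_ms,
}
```

On a multi-threaded learner the CPU figure can exceed the wall-clock figure, which is why the two are reported separately.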

Evaluation Engine

The following evaluation measures are currently calculated by the Evaluation Engine:

Regression:

  • mean_absolute_error
  • mean_prior_absolute_error
  • number_of_instances
  • root_mean_squared_error
  • root_mean_prior_squared_error
  • relative_absolute_error
  • root_relative_squared_error
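
To make the relationship between these regression measures concrete, here is a minimal sketch. It assumes the common definitions in which the "prior" errors come from a constant baseline prediction (for instance the mean of the training targets) and the relative errors are ratios to those prior errors; the Evaluation Engine's exact conventions (such as whether ratios are scaled to percentages) are not stated on this page.

```python
import math


def regression_measures(y_true, y_pred, prior_prediction):
    """Compute the listed regression measures under assumed definitions.

    prior_prediction is the constant baseline prediction (an assumption:
    typically the mean target value of the training data).
    """
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mpae = sum(abs(t - prior_prediction) for t in y_true) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    rmpse = math.sqrt(sum((t - prior_prediction) ** 2 for t in y_true) / n)
    return {
        "number_of_instances": n,
        "mean_absolute_error": mae,
        "mean_prior_absolute_error": mpae,
        "root_mean_squared_error": rmse,
        "root_mean_prior_squared_error": rmpse,
        # Assumed to be plain ratios; undefined if the prior error is 0.
        "relative_absolute_error": mae / mpae,
        "root_relative_squared_error": rmse / rmpse,
    }
```

A relative error below 1 under these definitions means the model beats the constant baseline.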

Classification:

  • average_cost (based on cost matrix)
  • total_cost (based on cost matrix)
  • mean_absolute_error
  • mean_prior_absolute_error
  • root_mean_squared_error
  • root_mean_prior_squared_error
  • relative_absolute_error
  • root_relative_squared_error
  • prior_entropy
  • kb_relative_information_score
  • predictive_accuracy
  • kappa
  • number_of_instances
  • precision (per class)
  • recall (per class)
  • f_measure (per class)
  • area_under_roc_curve (per class)
  • confusion_matrix
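
Several of the classification measures can be derived from the confusion matrix alone. The sketch below uses the standard textbook definitions of accuracy, Cohen's kappa, and per-class precision, recall and F-measure; it assumes rows are actual classes and columns are predicted classes, which may not match the Evaluation Engine's own layout.

```python
def classification_measures(confusion):
    """Derive measures from a square confusion matrix (list of lists).

    Assumption: confusion[i][j] counts instances of actual class i
    that were predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    accuracy = sum(confusion[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0

    precision, recall, f_measure = [], [], []
    for i in range(k):
        tp = confusion[i][i]
        p = tp / col_totals[i] if col_totals[i] else 0.0
        r = tp / row_totals[i] if row_totals[i] else 0.0
        precision.append(p)
        recall.append(r)
        f_measure.append(2 * p * r / (p + r) if (p + r) else 0.0)

    return {
        "number_of_instances": n,
        "predictive_accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

Returning zero when a denominator is zero is a choice made for this sketch; other tools report such cases as undefined.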