# Scoring Case Count Warnings Using ExpressScore

The **ExpressScore** package provides a lightweight method for Mercury Challenge participants to score their warnings against ground truth.  Unlike the official Challenge scoring system, **ExpressScore** does not require a Docker installation.
In this notebook we will show you how to use **ExpressScore**'s *CaseCountScorer* class to score warnings for the Disease and Non-Violent Civil Unrest event types.  The *CaseCountScorer* matches input warnings to ground truth events and computes three metrics.  The primary metric used in ranking Challenge participants is the Quality Score, which measures the difference between the predicted and the actual Case Counts and converts it to a scale from 0 to 1.  Quality Score is computed as:

$QS = 1 - \frac{abs(predicted - actual)}{max(predicted, actual, 4)}$

Additionally, the Precision metric reports what proportion of the germane input warnings were matched to events, and the Recall metric reports what proportion of the germane events were matched to warnings:

$Precision = \frac{matches}{warnings}$

$Recall = \frac{matches}{events}$

A warning or an event is "germane" if it matches the location and event type used in constructing the *CaseCountScorer* instance.  Non-germane warnings or events are not included in the computation of any of the metrics.  More on this later.
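
The germane filter and the two ratio metrics can be illustrated with a small standalone sketch. This is not the library's code: the helper names `is_germane` and `precision_recall` are hypothetical, comparing the location against the record's `Country` field is an assumption based on the sample data in this notebook, and returning 0.0 for an empty denominator is a choice made only for this sketch.

```python
def is_germane(record, event_type, location):
    """Return True if the record matches the given event type and location.

    Assumption: the location is compared against the record's "Country" field.
    """
    return (record.get("Event_Type") == event_type
            and record.get("Country") == location)


def precision_recall(n_matches, n_warnings, n_events):
    """Precision = matches / warnings, Recall = matches / events.

    All counts refer to germane records only. Returning 0.0 when a
    denominator is zero is a choice made for this sketch.
    """
    precision = n_matches / n_warnings if n_warnings else 0.0
    recall = n_matches / n_events if n_events else 0.0
    return precision, recall


records = [
    {"Event_Type": "Disease", "Country": "Saudi Arabia"},
    {"Event_Type": "Civil Unrest", "Country": "Egypt"},
]
# Only the first record is germane to a Disease / Saudi Arabia context.
germane = [r for r in records if is_germane(r, "Disease", "Saudi Arabia")]
print(len(germane))  # 1

# 3 matches out of 4 germane warnings and 5 germane events.
print(precision_recall(n_matches=3, n_warnings=4, n_events=5))  # (0.75, 0.6)
```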

*CaseCountScorer* has 3 methods that Challenge participants may find useful:

- *quality_score(predicted, actual)*:  A static method that computes the Quality Score for the input predicted and actual case counts.
- *score\_one(warn\_, event\_)*: A static method that parses the JSON inputs for the warning and the event and computes the Quality Score.  Provides diagnostics if there are errors in the input.
- *score(warn_data, gsr_data)*: Matches and scores the lists of JSON-formatted warning and GSR (event) data.  Computes all 3 metrics and provides other output details.

We will walk through the use of each of these with some examples.

In [1]:
import sys
sys.path.append("..")
import os
import json
import pprint
from main.express_score import CaseCountScorer

In [2]:
EXPRESS_SCORE_HOME = os.path.abspath("..")
RESOURCE_PATH = os.path.join(EXPRESS_SCORE_HOME, "resources")
TEST_RESOURCE_PATH = os.path.join(RESOURCE_PATH, "test")

## *quality_score* method


The arguments for *CaseCountScorer.quality_score* are a predicted value, an actual value, and an optional accuracy denominator.  The accuracy denominator, which defaults to 4 if not specified, is the constant that appears inside the max() in the Quality Score formula.  It sets a floor on the denominator so that errors at very small case counts are scaled sensibly.  Here are some examples using this method.

In [3]:
predicted = 1
actual = 0
print("Expect QS = 0.75")
print(CaseCountScorer.quality_score(predicted, actual))
predicted = 2
print("Expect QS = 0.5")
print(CaseCountScorer.quality_score(predicted, actual))
predicted = 100
print("Expect QS = 0")
print(CaseCountScorer.quality_score(predicted, actual))
actual = 99
print("Expect QS = 0.99")
print(CaseCountScorer.quality_score(predicted, actual))
predicted = 99
actual = 100
print("Expect QS = 0.99")
print(CaseCountScorer.quality_score(predicted, actual))

Expect QS = 0.75
0.75
Expect QS = 0.5
0.5
Expect QS = 0
0.0
Expect QS = 0.99
0.99
Expect QS = 0.99
0.99


Note that a small difference between predicted and actual has little effect when both are large but a big effect when both are small.  Also note that QS is symmetric in predicted and actual: swapping the two values gives the same score.
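
Both observations can be checked without the package by writing the formula out directly. The function below is a minimal sketch of the Quality Score formula as stated at the top of this notebook; the name `quality_score_sketch` is hypothetical, and it is not the implementation inside *CaseCountScorer* (it does no input validation).

```python
def quality_score_sketch(predicted, actual, accuracy_denominator=4):
    """QS = 1 - |predicted - actual| / max(predicted, actual, accuracy_denominator)."""
    denominator = max(predicted, actual, accuracy_denominator)
    return 1 - abs(predicted - actual) / denominator


# Symmetry: swapping the arguments gives the same score.
print(quality_score_sketch(99, 100) == quality_score_sketch(100, 99))  # True

# The same absolute error of 1 costs more at small counts than at large ones.
print(quality_score_sketch(1, 0))     # 0.75
print(quality_score_sketch(100, 99))  # 0.99

# A larger accuracy denominator softens the penalty at small counts.
print(quality_score_sketch(1, 0, accuracy_denominator=10))  # 0.9
```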

## *score_one* method

The arguments for *CaseCountScorer.score_one* are a JSON-formatted warning and a JSON-formatted event, along with an optional accuracy denominator, which is passed to the *quality_score* method.  Here is an example:

In [4]:
warn_ = {"Event_Type": "Civil Unrest",
         "Country": "Egypt",
         "Event_Date": "2018-06-24",
         "Case_Count": 8,
         "Warning_ID": "test_1"}
event_ = {"Event_Type": "Civil Unrest",
          "Country": "Egypt",
          "Event_Date": "2018-06-24",
          "Case_Count": 6,
          "Event_ID": "CU_Count_Egypt_2018-06-24",
          "Earliest_Reported_Date": "2018-07-15"}
print("Expect QS = 0.75")
result = CaseCountScorer.score_one(warn_,event_)
print(result)

Expect QS = 0.75
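
As a hand check on the expected value, substituting the warning's Case_Count of 8 and the event's Case_Count of 6 into the Quality Score formula, with the default accuracy denominator of 4, gives:

$QS = 1 - \frac{abs(8 - 6)}{max(8, 6, 4)} = 1 - \frac{2}{8} = 0.75$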


## *score* method

The arguments for *CaseCountScorer.score* are a list of JSON-formatted warnings, a list of JSON-formatted events, and an optional accuracy denominator.  The *score* method is not static; it scores only the warnings and events that match the event type and location attributes of the *CaseCountScorer* instance:

In [5]:
mers_scorer = CaseCountScorer(event_type="Disease", location="Saudi Arabia")

We have some example warning sets and GSR event data in the *resources/test* directory.  We will use these to illustrate scoring.  First, some sample warnings for MERS case counts in Saudi Arabia.

In [6]:
warn_filename = "dis_test_warnings.json"
warn_path = os.path.join(TEST_RESOURCE_PATH, warn_filename)
with open(warn_path, "r") as f:
    mers_warn = json.load(f)
pprint.pprint(mers_warn)

[{'Case_Count': 2.0,
  'Country': 'Saudi Arabia',
  'Disease': 'MERS',
  'Event_Date': '2018-04-29',
  'Event_Type': 'Disease',
 {'Case_Count': 3.0,
  'Country': 'Saudi Arabia',
  'Disease': 'MERS',
  'Earliest_Reported_Date': '2018-05-06',
  'Event_Date': '2018-05-06',
  'Event_Type': 'Disease',
 {'Case_Count': 1.0,
  'Country': 'Saudi Arabia',
  'Disease': 'MERS',
  'Earliest_Reported_Date': '2018-05-13',
  'Event_Date': '2018-05-13',
  'Event_Type': 'Disease',
 {'Case_Count': 0.0,
  'Country': 'Saudi Arabia',
  'Disease': 'MERS',
  'Event_Date': '2018-05-20',
  'Event_Type': 'Disease',
 {'Case_Count': 1.0,
  'Country': 'Saudi Arabia',
  'Disease': 'MERS',
  'Event_Date': '2018-05-27',
  'Event_Type': 'Disease',
 {'Case_Count': 0.0,
  'Country': 'Saudi Arabia',
  'Disease': 'MERS',
  'Event_Date': '2018-06-03',
  'Event_Type': 'Disease',


And a subset of the GSR:

In [7]:
gsr_filename = "dis_test_gsr.json"
gsr_path = os.path.join(TEST_RESOURCE_PATH, gsr_filename)
with open(gsr_path, "r") as f:
    mers_gsr = json.load(f)
pprint.pprint(mers_gsr)

[{'Case_Count': 3.0,
  'Country': 'Saudi Arabia',
  'Disease': 'MERS',
  'Earliest_Reported_Date': '2018-04-22',
  'Encoding_Comment': 'Qunfundah(72yo),Najran(32yo),Aluhuohn(60yo)',
  'Event_Date': '2018-04-22',
  'Event_ID': 'Disease_Saudi_Arabia_MERS_2018-04-22',
  'Event_Type': 'Disease',
  'First_Reported_Link': 'empres-i.fao.org/eipws3g/',
  'GSS_Link': 'empres-i.fao.org/eipws3g/',
  'News_Source': 'empres-i.fao.org/eipws3g/',
  'Other_Links': 'promedmail.org/direct.php?id=20180508.5788906',
  'Revision_Date': '2018-05-12'},
 {'Case_Count': 2.0,
  'Country': 'Saudi Arabia',
  'Disease': 'MERS',
  'Earliest_Reported_Date': '2018-05-02',
  'Encoding_Comment': 'Hofuf(66), Sakaka(53)',
  'Event_Date': '2018-04-29',
  'Event_ID': 'Disease_Saudi_Arabia_MERS_2018-04-29',
  'Event_Type': 'Disease',
  'First_Reported_Link': 'empres-i.fao.org/eipws3g/',
  'GSS_Link': 'empres-i.fao.org/eipws3g/',
  'News_Source': 'empres-i.fao.org/eipws3g/',
  'Other_Links': 'promedmail.org/direct.php?id=201

Note that the two sets contain event dates that overlap but are not identical.  The GSR has an event on 2018-04-22 for which there is no warning.  Conversely, there are warnings for 2018-05-27 and 2018-06-03 for which there are no matching events.  We will therefore expect to see Precision and Recall values less than 1.0.

In [8]:
mers_scoring = mers_scorer.score(mers_warn, mers_gsr)
pprint.pprint(mers_scoring)

{'Details': {'QS Values': [1.0, 0.25, 1.0, 1.0]},
 'Matches': [('test_2018-04-29', 'Disease_Saudi_Arabia_MERS_2018-04-29'),
             ('test_2018-05-06', 'Disease_Saudi_Arabia_MERS_2018-05-06'),
             ('test_2018-05-13', 'Disease_Saudi_Arabia_MERS_2018-05-13'),
             ('test_2018-05-20', 'Disease_Saudi_Arabia_MERS_2018-05-20')],
 'Results': {'Precision': 0.6666666666666666,
             'Quality Score': 0.8125,
             'Recall': 0.8},
 'Unmatched GSR': ['Disease_Saudi_Arabia_MERS_2018-04-22'],


If there are no errors, the scoring result is provided in JSON format with these keys:
- Details:  The list of Quality Scores for each match.
- Matches: A list of tuples with warning ID and event ID as matched.
- Results:  Values for the 3 metrics.
- Unmatched GSR:  Germane events that were not matched to warnings.
- Unmatched Warnings:  Germane warnings that were not matched to events.
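
The values under Results can be cross-checked against the other keys. The sketch below uses the numbers from the MERS example above. It assumes that the reported Quality Score is the mean of the per-match QS values; that is consistent with the outputs shown in this notebook but is not stated explicitly here, so treat it as an observation rather than a specification.

```python
from statistics import mean

# Values copied from the MERS scoring output above.
qs_values = [1.0, 0.25, 1.0, 1.0]
n_matches = 4
n_unmatched_warnings = 2  # warnings dated 2018-05-27 and 2018-06-03
n_unmatched_gsr = 1       # event dated 2018-04-22

# Assumption: the overall Quality Score is the mean over matched pairs.
quality_score = mean(qs_values)
precision = n_matches / (n_matches + n_unmatched_warnings)
recall = n_matches / (n_matches + n_unmatched_gsr)

print(quality_score)  # 0.8125
print(precision)      # 0.6666666666666666
print(recall)         # 0.8
```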

Here is another example, using daily Civil Unrest counts in Egypt.

In [9]:
warn_filename = "test_egypt_daily_cu_warnings.json"
warn_path = os.path.join(TEST_RESOURCE_PATH, warn_filename)
with open(warn_path, "r") as f:
    eg_daily_warn = json.load(f)
gsr_filename = "test_egypt_daily_cu_gsr.json"
gsr_path = os.path.join(TEST_RESOURCE_PATH, gsr_filename)
with open(gsr_path, "r") as f:
    eg_daily_gsr = json.load(f)
eg_scorer = CaseCountScorer(location="Egypt", event_type="Civil Unrest")
eg_warn = eg_daily_warn[-4:]
eg_gsr = eg_daily_gsr[-5:]
eg_scoring = eg_scorer.score(eg_warn, eg_gsr)
pprint.pprint(eg_scoring)

{'Details': {'QS Values': [0.0, 0.25, 0.5]},
 'Matches': [('test_Egypt-05-29', 'CU_Count_Egypt_2018-05-29'),
             ('test_Egypt-05-30', 'CU_Count_Egypt_2018-05-30'),
             ('test_Egypt-05-31', 'CU_Count_Egypt_2018-05-31')],
 'Results': {'Precision': 0.75, 'Quality Score': 0.25, 'Recall': 0.6},
 'Unmatched GSR': ['CU_Count_Egypt_2018-05-27', 'CU_Count_Egypt_2018-05-28'],


Suppose we mix the MERS Disease warnings and the Egypt Daily Civil Unrest warnings together, and do the same for the GSR events.  Depending on which scorer we use, different warnings and events will be considered germane.  The rest will be ignored.

In [10]:
mixed_warn = mers_warn + eg_daily_warn[-4:]
mixed_gsr = mers_gsr + eg_daily_gsr[-5:]
print("We should get the same results as the last scoring.")
as_eg_scoring = eg_scorer.score(mixed_warn, mixed_gsr)
pprint.pprint(as_eg_scoring)

We should get the same results as the last scoring.
{'Details': {'QS Values': [0.0, 0.25, 0.5]},
 'Matches': [('test_Egypt-05-29', 'CU_Count_Egypt_2018-05-29'),
             ('test_Egypt-05-30', 'CU_Count_Egypt_2018-05-30'),
             ('test_Egypt-05-31', 'CU_Count_Egypt_2018-05-31')],
 'Results': {'Precision': 0.75, 'Quality Score': 0.25, 'Recall': 0.6},
 'Unmatched GSR': ['CU_Count_Egypt_2018-05-27', 'CU_Count_Egypt_2018-05-28'],


On the other hand, if we use the *CaseCountScorer* instance with the Disease/Saudi Arabia context, we get the same results as in the first (MERS) scoring.

In [11]:
as_mers_scoring = mers_scorer.score(mixed_warn, mixed_gsr)
pprint.pprint(as_mers_scoring)

{'Details': {'QS Values': [1.0, 0.25, 1.0, 1.0]},
 'Matches': [('test_2018-04-29', 'Disease_Saudi_Arabia_MERS_2018-04-29'),
             ('test_2018-05-06', 'Disease_Saudi_Arabia_MERS_2018-05-06'),
             ('test_2018-05-13', 'Disease_Saudi_Arabia_MERS_2018-05-13'),
             ('test_2018-05-20', 'Disease_Saudi_Arabia_MERS_2018-05-20')],
 'Results': {'Precision': 0.6666666666666666,
             'Quality Score': 0.8125,
             'Recall': 0.8},
 'Unmatched GSR': ['Disease_Saudi_Arabia_MERS_2018-04-22'],


## Errors and Warnings

The *CaseCountScorer* also catches some input errors, such as negative case counts or a non-positive accuracy denominator, and reports an error message for each.

In [13]:
predicted = -1
actual = 1
CaseCountScorer.quality_score(predicted, actual)

Negative case counts are not allowed


In [14]:
predicted = 1
actual = 1
CaseCountScorer.quality_score(predicted, actual, accuracy_denominator=0)

The accuracy denominator must be positive.
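
If you want to screen inputs before handing them to the scorer, the same two checks can be sketched in a few lines. The helper `check_quality_score_inputs` below is hypothetical and is not part of **ExpressScore**; it reuses the message text shown above and collects problems in a list rather than assuming how the library itself reports them.

```python
def check_quality_score_inputs(predicted, actual, accuracy_denominator=4):
    """Return a list of problems found; an empty list means the inputs look valid."""
    problems = []
    if predicted < 0 or actual < 0:
        problems.append("Negative case counts are not allowed")
    if accuracy_denominator <= 0:
        problems.append("The accuracy denominator must be positive.")
    return problems


print(check_quality_score_inputs(-1, 1))
# ['Negative case counts are not allowed']
print(check_quality_score_inputs(1, 1, accuracy_denominator=0))
# ['The accuracy denominator must be positive.']
print(check_quality_score_inputs(1, 1))
# []
```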
