# Scoring Military Activity Warnings Using ExpressScore

The **ExpressScore** package provides a lightweight method for Mercury Challenge participants to score their warnings against ground truth.  Unlike the official Challenge scoring system, **ExpressScore** does not require a Docker installation.
In this notebook we will show you how to use **ExpressScore**'s *MaScorer* class to score warnings for the Military Activity event type.  The *MaScorer* matches input warnings and ground truth and computes four metrics.  The primary metrics used in ranking Challenge participants are the Quality Score and F1.  

## Metrics

### F1

F1 is an information retrieval metric that balances the ability to match real-world events (Recall) with the ability to issue warnings that can be matched (Precision).  Precision reports the proportion of germane input warnings that were matched to events, and Recall reports the proportion of germane events that were matched to warnings:

$Precision = \frac{matches}{warnings}$

$Recall = \frac{matches}{events}$

F1 is the harmonic mean of Precision and Recall and is computed as:

$F1 = \frac{2*Precision*Recall}{Precision + Recall}$
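
The three formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the package's code: the function name `precision_recall_f1` and the counts are hypothetical, and returning 0 when a denominator is zero is an assumption.

```python
def precision_recall_f1(matches, warnings, events):
    """Return (precision, recall, f1) from counts of matches,
    germane warnings and germane events."""
    # Assumption: a metric with a zero denominator is reported as 0.
    precision = matches / warnings if warnings else 0.0
    recall = matches / events if events else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 4 matches among 17 warnings and 4 events.
p, r, f1 = precision_recall_f1(matches=4, warnings=17, events=4)
print("Precision = {:.3f}, Recall = {:.3f}, F1 = {:.3f}".format(p, r, f1))
# Precision = 0.235, Recall = 1.000, F1 = 0.381
```

Because F1 is a harmonic mean, issuing many unmatched warnings lowers it even when every event is matched.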

### Quality Score

Quality Score (QS) measures the closeness of the warning to the ground truth event according to 4 features:
- Location Score (LS) is based on the distance between the predicted location in the warning and the actual location in the event.
- Date Score (DS) is based on the difference between the predicted event date in the warning and the actual event date.
- Event Subtype Score (ESS) is based on the match between the predicted event subtype and the actual event subtype.
- Actor Score (AS) is based on the match between the predicted actor and the actual actor.

Each of these features has an associated Component Score in the range 0.0 to 1.0.  Quality Score is the sum of the component scores, $QS = LS + DS + ESS + AS$, and takes values in the range 0.0 to 4.0.  The Component Scores are defined below.

#### Location Score

Location Score (LS) is defined as:

$LS = 1 - \frac{\text{distance in km}}{100}$ (when the event *Approximate_Location* field is "False")
or $LS = 1 - \frac{\text{distance in km} - 16.67}{83.33}$ (when the event *Approximate_Location* field is "True")

The distance of 16.67 km represents a circle of uncertainty around the location encoded in the GSR.
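
A rough sketch of the two Location Score formulas follows. It is not the package's implementation: the function below is a stand-in written for this example, it takes a Python boolean rather than the "True"/"False" strings used in the GSR, and clamping the result to the 0.0 to 1.0 range is an assumption based on the stated range of the Component Scores.

```python
def location_score(dist_km, approximate_location=False):
    """Illustrative Location Score for a distance in kilometres."""
    if approximate_location:
        # The first 16.67 km fall inside the circle of uncertainty.
        raw = 1.0 - (dist_km - 16.67) / 83.33
    else:
        raw = 1.0 - dist_km / 100.0
    # Assumption: scores are clamped to the range [0, 1].
    return max(0.0, min(1.0, raw))


print("{:.2f}".format(location_score(13.65)))        # exact location: 0.86
print("{:.2f}".format(location_score(13.65, True)))  # inside the circle: 1.00
print("{:.2f}".format(location_score(120.0)))        # beyond 100 km: 0.00
```

Under this reading, any warning within 16.67 km of an approximately located event receives the full Location Score.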

#### Date Score

Date Score (DS) is defined as:
$DS = 1 - \frac{\mid PredictedEventDate - ActualEventDate\mid}{4.0}$ when the absolute difference is less than 4 days, and 0 otherwise.
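
As a minimal sketch of the Date Score, assuming dates are ISO strings such as `2018-05-24`: the helper names below are illustrative stand-ins, not the package's functions, and returning 0 once the difference reaches 4 days is an assumption based on the 0.0 to 1.0 range of the Component Scores.

```python
from datetime import date


def date_diff(predicted, actual):
    """Absolute difference in days between two YYYY-MM-DD date strings."""
    delta = date.fromisoformat(predicted) - date.fromisoformat(actual)
    return abs(delta.days)


def date_score(diff_days):
    """Illustrative Date Score for a difference in days."""
    diff_days = abs(diff_days)
    # Assumption: differences of 4 days or more score 0.
    return 1.0 - diff_days / 4.0 if diff_days < 4 else 0.0


dd = date_diff("2018-05-24", "2018-05-22")
print("Date difference = {} days, DS = {:.2f}".format(dd, date_score(dd)))
# Date difference = 2 days, DS = 0.50
```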

#### Event Subtype Score

Event Subtype Score (ESS) is 1 if the predicted and actual event subtypes match, 0 otherwise.

#### Actor Score

Actor Score (AS) is 1 if either the actual Actor list includes a wildcard value such as "Unspecified" or it contains the predicted Actor value.  Otherwise it is 0.
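
The two categorical scores and the Quality Score sum can be sketched as below. This is an illustration under stated assumptions rather than the package's code: the function names are stand-ins, `WILDCARD_ACTORS` contains only the one wildcard named in the text (the full list of wildcard values is not given here), and the actual actor is accepted either as a single string or as a list.

```python
# Assumption: "Unspecified" is the only wildcard value we know of.
WILDCARD_ACTORS = {"Unspecified"}


def event_subtype_score(predicted, actual):
    """1 if the subtypes match exactly, 0 otherwise."""
    return 1 if predicted == actual else 0


def actor_score(predicted, actual):
    """1 if the actual actors include a wildcard or the predicted actor."""
    actual_actors = [actual] if isinstance(actual, str) else list(actual)
    if any(a in WILDCARD_ACTORS for a in actual_actors):
        return 1
    return 1 if predicted in actual_actors else 0


def quality_score(ls, ds, ess, acs):
    """Sum of the four component scores (range 0.0 to 4.0)."""
    return ls + ds + ess + acs


ess = event_subtype_score("Force Posture", "Force Posture")
acs = actor_score("Lebanese Military", ["Lebanese Military"])
print("ESS = {}, AS = {}, QS = {:.2f}".format(
    ess, acs, quality_score(0.86, 0.5, ess, acs)))
# ESS = 1, AS = 1, QS = 3.36
```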

### Germane Warnings/Events

A warning or an event is "germane" if it matches the country and event type used in constructing the *MaScorer* instance.  Non-germane warnings or events are not included in the computation of any of the metrics.  More on this later.
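
To make the idea concrete, here is a small sketch of a germaneness filter. The helper `is_germane` and the three records are made up for this example; only the `Country` and `Event_Type` field names come from the warning and event format, and the scorer's own filtering may differ in detail.

```python
def is_germane(record, country, event_type="Military Activity"):
    """True if a warning or event matches the scored country and event type."""
    return (record.get("Country") == country
            and record.get("Event_Type") == event_type)


# Made-up records for illustration only.
warnings = [
    {"Country": "Lebanon", "Event_Type": "Military Activity"},
    {"Country": "Iraq", "Event_Type": "Military Activity"},
    {"Country": "Lebanon", "Event_Type": "Case Count"},
]
germane = [w for w in warnings if is_germane(w, "Lebanon")]
print(len(germane))  # 1
```

Only the first record would count toward Precision, Recall, and Quality Score when scoring Lebanon.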

## MaScorer methods

*MaScorer* has 3 methods that Challenge participants may find useful:

- *quality_score(predicted, actual)*:  A static method that computes the Quality Score from the predicted (warning) and actual (event) values.
- *score\_one(warn\_, event\_)*: A static method that parses the JSON inputs for the warning and the event and computes the Quality Score.  Provides diagnostics if there are errors in the input.
- *score(warn_data, gsr_data)*: Matches and scores the lists of JSON-formatted warning and GSR (event) data.  Computes all 4 metrics and provides other output details.

We will walk through the use of each of these with some examples.

In [1]:
import sys
sys.path.append("..")
import os
import json
import pprint
from geopy.distance import distance
from main.express_score import Scorer, MaScorer

In [2]:
EXPRESS_SCORE_HOME = os.path.abspath("..")
RESOURCE_PATH = os.path.join(EXPRESS_SCORE_HOME, "resources")
TEST_RESOURCE_PATH = os.path.join(RESOURCE_PATH, "test")
# Path to Lebanon test files
LB_TEST_PATH = os.path.join(TEST_RESOURCE_PATH, "lb_ma_may_2018")
IQ_TEST_PATH = os.path.join(TEST_RESOURCE_PATH, "iq_ma_may_2018")
SY_TEST_PATH = os.path.join(TEST_RESOURCE_PATH, "sy_ma_may_2018")

Let's load a set of warnings from the Baserate model and the May GSR for Lebanon.

In [3]:
test_gsr_path = os.path.join(LB_TEST_PATH, "test_lb_gsr.json")
with open(test_gsr_path, "r", encoding="utf8") as f:
    test_gsr = json.load(f)
test_warn_path = os.path.join(LB_TEST_PATH,"test_lb_warnings.json")
with open(test_warn_path, "r", encoding="utf8") as f:
    test_warn = json.load(f)
print("Lebanon has {0} warnings and {1} events".format(len(test_warn), len(test_gsr)))



An example warning:

In [4]:
w = test_warn[2]
pprint.pprint(w)

{'Actor': 'Lebanese Military',
 'City': 'Ouâdi el Kâf',
 'Country': 'Lebanon',
 'Event_Date': '2018-05-24',
 'Event_Subtype': 'Force Posture',
 'Event_Type': 'Military Activity',
 'Latitude': 34.3444,
 'Longitude': 35.9478,
 'Probability': 0.5811039169,
 'State': 'Liban-Nord',
 'timestamp': '2018-05-20T6:17:28.0'}


An example event:

In [5]:
e = test_gsr[2]
pprint.pprint(e)

{'Actor': 'Lebanese Military',
 'Approximate_Location': 'False',
 'City': 'Tripoli',
 'Country': 'Lebanon',
 'Earliest_Reported_Date': '2018-05-22',
 'Event_Date': '2018-05-22',
 'Event_ID': 'MN2',
 'Event_Subtype': 'Force Posture',
 'Event_Type': 'Military Activity',
 'First_Reported_Link': 'http://nna-leb.gov.lb/en/show-news/91451/',
 'GSS_Link': 'http://nna-leb.gov.lb/en/show-news/91451/',
 'Latitude': 34.4367,
 'Longitude': 35.8497,
 'News_Source': 'NNA',
 'Other_Links': None,
 'Revision_Date': '2018-06-06',
 'State': 'Liban-Nord'}


## *quality_score* and component score methods


### Location Score

In [6]:
dist = distance((w["Latitude"], w["Longitude"]), (e["Latitude"], e["Longitude"])).km
print("The Distance between the warning and event is {:.2f} km".format(dist))
ls = MaScorer.location_score(dist)
print("LS = {:.2f}".format(ls))

LS = 0.86


### Date Score

In [7]:
dd = Scorer.date_diff(w["Event_Date"], e["Event_Date"])
print("The Date Difference between the warning and event is {} days".format(dd))
ds = MaScorer.date_score(dd)
print("DS = {:.2f}".format(ds))

DS = 0.50


### Event Subtype Score

In [8]:
w_subtype = w["Event_Subtype"]
e_subtype = e["Event_Subtype"]
ess = MaScorer.event_subtype_score(w_subtype, e_subtype)
print("Warning Event Subtype: {0}, Actual Event Subtype: {1}, ESS={2}".format(w_subtype, e_subtype, ess))



### Actor Score

In [9]:
w_actor = w["Actor"]
e_actor = e["Actor"]
acs = MaScorer.actor_score(w_actor, e_actor)
print("Warning Actor: {0}, Event Actor: {1}, AS={2}".format(w_actor, e_actor, acs))



### Quality Score

In [10]:
qs = ls + ds + ess + acs
print("Warning-Event QS = {:.2f}".format(qs))



## *score_one*

The *score_one* method for **MaScorer** takes a single warning and a single event and scores them against each other.

In [11]:
scoring_ = MaScorer.score_one(w, e)
pprint.pprint(scoring_)

{'Actor Score': 1,
 'Approximate_Location': 'False',
 'Date Difference': 2,
 'Date Score': 0.5,
 'Distance': 13.646094360548306,
 'Event Subtype Score': 1,
 'Event_ID': 'MN2',
 'Location Score': 0.8635390563945169,
 'Quality Score': 3.363539056394517,


## *score*

The *score* method for **MaScorer** compares a list of JSON formatted warnings to a list of JSON formatted events.  Using the Munkres or Hungarian algorithm <https://en.wikipedia.org/wiki/Hungarian_algorithm> the warnings are matched to events so as to optimize the aggregate Quality Score metric, subject to the constraint that neither LS nor DS for the match can be 0.
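
The matching step can be illustrated on a tiny example. The sketch below is deliberately not the Hungarian algorithm: it is a brute-force search that tries every assignment, which is only practical for a handful of warnings and events, and it is not the package's code. The function name `best_matching` and the score table are hypothetical; `None` marks a pair that may not be matched because its LS or DS is 0. For realistic sizes, a library routine such as `scipy.optimize.linear_sum_assignment` is the usual choice.

```python
def best_matching(qs):
    """Return (total, pairs) maximizing the summed Quality Score.

    qs[i][j] is the Quality Score of warning i against event j,
    or None when the pair is not allowed. Each warning and each
    event is used at most once.
    """
    n_events = len(qs[0]) if qs else 0

    def solve(i, used):
        if i == len(qs):
            return 0.0, []
        # Option 1: leave warning i unmatched.
        best_total, best_pairs = solve(i + 1, used)
        # Option 2: match warning i to any free, allowed event.
        for j in range(n_events):
            if j in used or qs[i][j] is None:
                continue
            total, pairs = solve(i + 1, used | {j})
            total += qs[i][j]
            if total > best_total:
                best_total, best_pairs = total, [(i, j)] + pairs
        return best_total, best_pairs

    return solve(0, frozenset())


# Hypothetical scores for 3 warnings (rows) and 2 events (columns).
table = [
    [3.5, 3.0],
    [3.25, None],
    [None, 1.0],
]
total, pairs = best_matching(table)
print(total, pairs)  # 6.25 [(0, 1), (1, 0)]
```

Note that greedily taking the single best pair first (warning 0 with event 0, score 3.5) would leave only a total of 4.5, which is why an optimal assignment method is used instead.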

The output of the *score* method includes a list of warning/event pairs, a vector of quality score values, and mean metrics.

In [12]:
scoring_ = MaScorer.score(test_warn, test_gsr)
pprint.pprint(scoring_)

{'Details': {'Quality Scores': [3.820949051145861,
                                3.823585527754669,
                                3.5,
                                3.3259779435050736]},
 'F1': 0.38095238095238093,
 'Matches': [('BR_8', 'MN0'),
             ('BR_12', 'MN3'),
             ('BR_13', 'MN2'),
             ('BR_16', 'MN1')],
 'Mercury Score': 1.285359413602731,
 'Precision': 0.23529411764705882,
 'Quality Score': 3.6176281306014006,
 'Recall': 1.0}
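
The summary values can be cross-checked from the details with a few lines of arithmetic. The numbers below are copied from the output above. The relationship used for the last line, mean Quality Score divided by 4 plus F1, is an observation inferred from the printed numbers, not a definition taken from the package.

```python
# Values copied from the Lebanon scoring output.
quality_scores = [3.820949051145861, 3.823585527754669,
                  3.5, 3.3259779435050736]
precision, recall = 0.23529411764705882, 1.0

# The reported Quality Score is the mean over matched pairs.
mean_qs = sum(quality_scores) / len(quality_scores)
f1 = 2 * precision * recall / (precision + recall)
# Observation only: this combination reproduces the reported Mercury Score.
combined = mean_qs / 4.0 + f1

print("Quality Score = {:.3f}".format(mean_qs))  # 3.618
print("F1 = {:.3f}".format(f1))                  # 0.381
print("QS / 4 + F1 = {:.3f}".format(combined))   # 1.285
```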


The matching algorithm takes $O(N^3)$ time, where N is approximately the number of warnings or events.  For the more eventful countries, Syria and Iraq, scoring can take a few minutes.  The Iraq scoring below illustrates this.

In [13]:
test_gsr_path = os.path.join(IQ_TEST_PATH, "test_cc_gsr.json")
with open(test_gsr_path, "r", encoding="utf8") as f:
    test_gsr = json.load(f)
test_warn_path = os.path.join(IQ_TEST_PATH,"test_cc_warnings.json")
with open(test_warn_path, "r", encoding="utf8") as f:
    test_warn = json.load(f)
print("Iraq has {0} warnings and {1} events".format(len(test_warn), len(test_gsr)))



In [14]:
scoring_ = MaScorer.score(test_warn, test_gsr)
for m in ["Quality Score", "F1", "Precision", "Recall", "Mercury Score"]:
    print("{0} = {1:.3f}".format(m, scoring_[m]))

Quality Score = 2.735
F1 = 0.866
Precision = 0.792
Recall = 0.954
Mercury Score = 1.549


Syria scoring:

In [15]:
test_gsr_path = os.path.join(SY_TEST_PATH, "test_cc_gsr.json")
with open(test_gsr_path, "r", encoding="utf8") as f:
    test_gsr = json.load(f)
test_warn_path = os.path.join(SY_TEST_PATH,"test_cc_warnings.json")
with open(test_warn_path, "r", encoding="utf8") as f:
    test_warn = json.load(f)
print("Syria has {0} warnings and {1} events".format(len(test_warn), len(test_gsr)))



In [16]:
scoring_ = MaScorer.score(test_warn, test_gsr)
for m in ["Quality Score", "F1", "Precision", "Recall", "Mercury Score"]:
    print("{0} = {1:.3f}".format(m, scoring_[m]))

Quality Score = 3.510
F1 = 0.649
Precision = 0.494
Recall = 0.942
Mercury Score = 1.526
