# Statistical Evaluation Tests for One-Day Forecasts

This demo provides step-by-step instructions on how to invoke the statistical **T** (the classical paired t-test) and **W** (the Wilcoxon signed-rank test) evaluation tests for three one-day California forecast models over a 10-day time period that includes the Napa event of August 24, 2014. It also explains which data products and images the evaluation tests generate, and lets the user view the results. This tutorial uses the *EvaluationTest.py* CSEP Python module in standalone mode to invoke the tests. Python code that is a simplified implementation of the standalone functionality of the *EvaluationTest.py* module is also provided in case users want to integrate this CSEP functionality into their custom Python routines.
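
For orientation, the two statistics named above can be sketched in plain Python. This is a generic illustration of a paired t statistic and of Wilcoxon signed-rank sums for two equally long samples. It is not the CSEP implementation, and the function names are made up for this example.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for the paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n < 2:
        raise ValueError('need at least two pairs')
    # Mean difference divided by its standard error
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    # Zero differences carry no sign information and are dropped
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

Both functions work on the paired differences only: the t statistic uses their mean and spread, while the signed-rank sums use only their ordering and sign.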

## Test Case

   A forecast group within CSEP is defined as a collection of comparable forecasts for the same testing region with the same target earthquakes. Comparable one-day forecast files for our test case are stored in the *OneDayEvaluation/forecasts* directory, where the *OneDayEvaluation* directory represents the group.
   
   Daily forecasts are organized ("archived") within CSEP by year and month of the start date of each forecast testing period. The forecast group's directory contains an *archive* subdirectory where all previously generated daily forecasts are stored in *archive/YYYY_M[M]* subdirectories. This allows the CSEP testing framework to reuse already generated forecasts for any evaluation that requires all daily forecasts up to and including the day of evaluation. For example, statistical evaluation of daily forecasts over the time period of 2014/08/20 - 2014/08/31 requires a daily forecast for each day of that testing period. Archived forecast files for our test case are stored in the *OneDayEvaluation/forecasts/archive/2014_8* directory.

   Please note that the *OneDayEvaluation/forecasts/archive/2014_8* subdirectory corresponds to the start date of all forecasts involved in this test case evaluation. If the testing period spanned 3 months of data (for example, 2014/7/15 through 2014/9/15), the *archive* directory would need one subdirectory for each month of existing forecast data (*2014_7*, *2014_8*, *2014_9*). Each of these subdirectories would store the forecasts with the corresponding start date.
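
   The month-by-month layout of the *archive* directory can be illustrated with a short sketch. It assumes the subdirectory names follow the *2014_8* pattern used in this test case (year, underscore, month without zero padding); the function name is hypothetical and not part of CSEP.

```python
import datetime


def archive_subdirs(start, end):
    """List archive subdirectory names (YYYY_M) for every month from start to end."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append('%d_%d' % (year, month))
        month += 1
        if month > 12:
            # Roll over into January of the next year
            year, month = year + 1, 1
    return names


print(archive_subdirs(datetime.date(2014, 7, 15), datetime.date(2014, 9, 15)))
```

   For the three-month example above, the sketch lists *2014_7*, *2014_8* and *2014_9*.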

   This test case uses three one-day forecast models whose forecast files are stored in the *OneDayEvaluation/forecasts/archive/2014_8* directory:
   * *ETASV1.1*
   * *KJSSOneDayCalifornia*
   * *STEPJAVA*
   
   Forecast files for the evaluation test day of 2014/08/31, such as *ETASV1.1_8_31_2014-fromXML.dat*, *STEPJAVA_8_31_2014-fromXML.dat* and *KJSSOneDayCalifornia_8_31_2014-fromXML.dat*, should be stored in the *OneDayEvaluation/forecasts* directory. CSEP examines the *OneDayEvaluation/forecasts* directory to learn which forecast models participate in the evaluation.
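
   As a rough illustration of how model names can be derived from such file names, the sketch below parses names of the form *ModelName_M_D_YYYY[-fromXML].dat*. The regular expression and the function name are assumptions made for this example; the actual discovery logic inside CSEP may differ.

```python
import datetime
import re

# Assumed file name layout: <model>_<month>_<day>_<year>[-fromXML].dat
FORECAST_NAME = re.compile(
    r'^(?P<model>.+)_(?P<month>\d{1,2})_(?P<day>\d{1,2})_(?P<year>\d{4})'
    r'(?:-fromXML)?\.dat$')


def models_from_filenames(filenames, test_date):
    """Return sorted model names that have a forecast file for test_date."""
    models = set()
    for name in filenames:
        match = FORECAST_NAME.match(name)
        if match is None:
            continue  # subdirectories, metadata files, other formats
        file_date = (int(match.group('year')),
                     int(match.group('month')),
                     int(match.group('day')))
        if file_date == (test_date.year, test_date.month, test_date.day):
            models.add(match.group('model'))
    return sorted(models)
```

   Passing `os.listdir('OneDayEvaluation/forecasts')` and `datetime.date(2014, 8, 31)` to this function would list the models that have a forecast file for the test date.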

In [None]:
!ls OneDayEvaluation/forecasts

In [None]:
!ls OneDayEvaluation/forecasts/archive

In [None]:
!ls OneDayEvaluation/forecasts/archive/2014_8

  There are two catalog files in the *observations* directory. *catalog.nodecl.dat* is a daily observation catalog, while *cumulative.catalog.nodecl.dat* contains all events observed since the beginning of the testing period, which is set to 2014/08/20 for this test case. The T and W evaluation tests use only the cumulative observation catalog, since evaluation is performed over the cumulative testing period of the model's existence within CSEP. The cumulative catalog for this test case consists of two events and conforms to the ASCII [ZMAP](https://northridge.usc.edu/trac/csep/wiki/catalogZMAPformat) format:

In [None]:
!ls OneDayEvaluation/observations

In [None]:
!cat OneDayEvaluation/observations/cumulative.catalog.nodecl.dat
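
To work with such a catalog from Python, a minimal reader could look like the sketch below. It assumes the first ten whitespace-separated columns follow the usual ZMAP order (longitude, latitude, decimal year, month, day, magnitude, depth, hour, minute, second) and ignores any further columns. Please check the format page linked above before relying on this order; the function names are hypothetical.

```python
# Assumed order of the first ten ZMAP columns
ZMAP_COLUMNS = ('longitude', 'latitude', 'decimal_year', 'month', 'day',
                'magnitude', 'depth', 'hour', 'minute', 'second')


def parse_zmap_line(line):
    """Return a dict with the first ten ZMAP columns of one catalog row."""
    values = [float(token) for token in line.split()[:len(ZMAP_COLUMNS)]]
    if len(values) < len(ZMAP_COLUMNS):
        raise ValueError('expected at least %d columns, got %d'
                         % (len(ZMAP_COLUMNS), len(values)))
    return dict(zip(ZMAP_COLUMNS, values))


def read_zmap(path):
    """Parse every non-empty line of a ZMAP catalog file."""
    with open(path) as handle:
        return [parse_zmap_line(line) for line in handle if line.strip()]
```

Calling `read_zmap('OneDayEvaluation/observations/cumulative.catalog.nodecl.dat')` would then return one dictionary per observed event.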

### Forecast Group Configuration File

This test case relies on the forecast group's configuration file *OneDayEvaluation/forecast.init.xml*. CSEP automatically detects any *forecast.init.xml* configuration file that exists for a forecast group and "learns" from it the necessary information about the group's models and evaluation tests. The *forecast.init.xml* file is in XML format; the file for our test case is shown here:

In [None]:
!cat OneDayEvaluation/forecast.init.xml

The XML elements of such a configuration file store the necessary information about the forecast group: which models make up the group and which evaluation tests are applied to the forecasts. The configuration file for our test case's group contains the following elements:

   * forecastDir - Directory in which forecast files are stored.
   * catalogDir - Directory in which observation catalog files are stored.
   * postProcessing - Keyword that identifies the catalog filtering Python module within CSEP for a specific forecast group type. It is set to 'OneDayModel' for our test case and determines how the observation catalog is constructed for the evaluation of one-day forecasts.
   * entryDate - Entry date of the participating forecast models into the testing center. This date determines the start date of the cumulative catalog used by the evaluation methods of the models.
   * models - Space-separated list of forecast models for the group. This list is not used when the test case does not generate forecast files (as in this test case).
   * evaluationTests - Space-separated list of evaluation tests to invoke.
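
A minimal sketch of reading these elements with the standard library is shown below. It makes no assumption about the root element or about XML namespaces: it simply searches the whole document for the element names listed above. The function name is hypothetical, and this is not how CSEP itself parses the file.

```python
import xml.etree.ElementTree as ET

FIELDS = ('forecastDir', 'catalogDir', 'postProcessing',
          'entryDate', 'models', 'evaluationTests')


def read_group_config(xml_text):
    """Collect the text of the known configuration elements from XML text."""
    config = {}
    for element in ET.fromstring(xml_text).iter():
        name = element.tag.rsplit('}', 1)[-1]  # drop a '{namespace}' prefix if present
        if name in FIELDS and element.text:
            config[name] = element.text.strip()
    # Space-separated lists become Python lists
    for key in ('models', 'evaluationTests'):
        if key in config:
            config[key] = config[key].split()
    return config
```

For example, `read_group_config(open('OneDayEvaluation/forecast.init.xml').read())` would return a dictionary keyed by the element names that are present in the file.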

### Command-line Options

Since we use the *EvaluationTest.py* Python module in standalone mode to invoke our test case, we use command-line options to provide the module with the necessary information about the forecasts and observations that are being evaluated.

   The testing period is defined by the start date of 2014/08/20 and the test date of 2014/08/31, inclusive.

The following command-line options should be provided to the *EvaluationTest.py* module to invoke the evaluation tests:

* *--year=2014* - Year of the test date
* *--month=8* - Month of the test date
* *--day=31* - Day of the test date
* *--forecasts=OneDayEvaluation* - Directory that represents forecast group for evaluation
* *--testDir=OneDayEvaluation/TWScriptResults* - Directory to store results to

In [None]:
!python3 $CENTERCODE/src/generic/EvaluationTest.py --year=2014 --month=8 --day=31 --forecasts=OneDayEvaluation --testDir=OneDayEvaluation/TWScriptResults
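
The same invocation can also be assembled from Python, which is convenient when the test date changes. The sketch below only builds the argument list from the options described above; the helper name is hypothetical, and it assumes the *CENTERCODE* environment variable points to the CSEP installation as in the shell command.

```python
import datetime
import os


def evaluation_command(test_date, group_dir, results_dir):
    """Build the argument list for EvaluationTest.py in standalone mode."""
    script = os.path.join(os.environ.get('CENTERCODE', ''),
                          'src', 'generic', 'EvaluationTest.py')
    return ['python3', script,
            '--year=%d' % test_date.year,
            '--month=%d' % test_date.month,
            '--day=%d' % test_date.day,
            '--forecasts=%s' % group_dir,
            '--testDir=%s' % results_dir]


command = evaluation_command(datetime.date(2014, 8, 31),
                             'OneDayEvaluation',
                             'OneDayEvaluation/TWScriptResults')
print(' '.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the evaluation.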

### Tests Results

   This section examines the data products that the **T** and **W** evaluation tests generated when the **python3** command above was run.
   Please note that each data product generated by CSEP has a corresponding metadata file with an identical filename plus an additional *.meta* extension. The metadata file captures information on how the data product was generated and is used only for reproducibility of the results. You can ignore all generated *.meta* files for now.
   
   For example, the metadata file for the T-test result file has the following content:

In [None]:
!cat OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_T-Test.xml.*[1-9].meta

#### Forecast Scale Factor

   The forecast scale factor that corresponds to the test date of 2014/08/31 within the testing period is captured in the *OneDayEvaluation/TWScriptResults/ForecastScaleFactor.dat* file:

In [None]:
!cat OneDayEvaluation/TWScriptResults/ForecastScaleFactor.dat

#### T-Test Results Files

The result file with the **scec.csep.StatisticalTest.sTest_T-Test.xml.** prefix represents the T-test evaluation results for all three models.


In [None]:
!cat OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_T-Test.xml.*[1-9]

   The information gain plot for the T-test evaluation results is stored per model as an SVG image file with the **InformationGain** keyword in its name. The model name appears in the SVG file name and in the plot title, and that model is the reference model for the results shown in the plot.

In [None]:
import glob
from IPython.display import SVG

# Locate T-test information gain plot file for ETAS forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_T-Test_ETASV1.1_8_31_2014_InformationGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

In [None]:
# Locate T-test information gain plot file for STEPJava forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_T-Test_STEPJAVA_8_31_2014_InformationGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

In [None]:
# Locate T-test information gain plot file for KJSS forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_T-Test_KJSSOneDayCalifornia_8_31_2014_InformationGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

   The probability gain plot for the T-test evaluation results is stored per model as an SVG image file with the **ProbabilityGain** keyword in its name. The model name appears in the SVG file name and in the plot title, and that model is the reference model for the results shown in the plot.

In [None]:
# Locate T-test probability gain plot file for ETAS forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_T-Test_ETASV1.1_8_31_2014_ProbabilityGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

In [None]:
# Locate T-test probability gain plot file for STEPJava forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_T-Test_STEPJAVA_8_31_2014_ProbabilityGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

In [None]:
# Locate T-test probability gain plot file for KJSS forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_T-Test_KJSSOneDayCalifornia_8_31_2014_ProbabilityGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

#### W-Test Results Files

The result file with the **scec.csep.StatisticalTest.sTest_W-Test.xml.** prefix represents the W-test evaluation results for all three models.


In [None]:
!cat OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_W-Test.xml.*[1-9]

   The information gain plot for the W-test evaluation results is stored per model as an SVG image file with the **InformationGain** keyword in its name. The model name appears in the SVG file name and in the plot title, and that model is the reference model for the results shown in the plot.

In [None]:
# Locate W-test information gain plot file for ETAS forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_W-Test_ETASV1.1_8_31_2014_InformationGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

In [None]:
# Locate W-test information gain plot file for STEPJava forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_W-Test_STEPJAVA_8_31_2014_InformationGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

In [None]:
# Locate W-test information gain plot file for KJSS forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_W-Test_KJSSOneDayCalifornia_8_31_2014_InformationGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

   The probability gain plot for the W-test evaluation results is stored per model as an SVG image file with the **ProbabilityGain** keyword in its name. The model name appears in the SVG file name and in the plot title, and that model is the reference model for the results shown in the plot.

In [None]:
# Locate W-test probability gain plot file for ETAS forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_W-Test_ETASV1.1_8_31_2014_ProbabilityGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

In [None]:
# Locate W-test probability gain plot file for STEPJava forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_W-Test_STEPJAVA_8_31_2014_ProbabilityGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)

In [None]:
# Locate W-test probability gain plot file for KJSS forecast
image_file = glob.glob('OneDayEvaluation/TWScriptResults/scec.csep.StatisticalTest.sTest_W-Test_KJSSOneDayCalifornia_8_31_2014_ProbabilityGain.svg.*[0-9]')[0]
print(image_file)
SVG(image_file)
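
The cells above repeat the same lookup for every test, model and plot type, and indexing the glob result with `[0]` raises an `IndexError` when a plot is missing. A small helper can generate the patterns and report missing plots instead. This is a sketch that reuses the file name pattern from the cells above; the function names are hypothetical.

```python
import glob
import os

MODELS = ('ETASV1.1', 'KJSSOneDayCalifornia', 'STEPJAVA')


def plot_pattern(results_dir, test, model, kind, date_tag='8_31_2014'):
    """Build the glob pattern for one plot, e.g. test='T', kind='InformationGain'."""
    name = 'scec.csep.StatisticalTest.sTest_%s-Test_%s_%s_%s.svg.*[0-9]' % (
        test, glob.escape(model), date_tag, kind)
    return os.path.join(results_dir, name)


def find_plot(results_dir, test, model, kind, date_tag='8_31_2014'):
    """Return the first matching plot file, or None when no file matches."""
    matches = sorted(glob.glob(plot_pattern(results_dir, test, model, kind, date_tag)))
    return matches[0] if matches else None


def find_all_plots(results_dir, models=MODELS):
    """Map (test, model, kind) to a plot path or None for every combination."""
    return {(test, model, kind): find_plot(results_dir, test, model, kind)
            for test in ('T', 'W')
            for model in models
            for kind in ('InformationGain', 'ProbabilityGain')}
```

For example, `find_all_plots('OneDayEvaluation/TWScriptResults')` returns twelve entries, and any entry whose value is `None` points to a plot that was not generated.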

### Adding Your Forecast to the Test Case

   To add your own forecast to the test case, place your daily forecast file for the test date of 2014/08/31, in ASCII [CSEPForecast](https://northridge.usc.edu/trac/csep/wiki/ForecastFormat) format, under the *OneDayEvaluation/forecasts* directory. The forecast file name should follow the same naming convention as the existing forecasts: *ModelName_M_D_YYYY.dat*. The other daily forecast files, which cover the rest of the testing period, should be placed under the *OneDayEvaluation/forecasts/archive/2014_8* directory. Once the forecast files have been added to the forecast group's directory structure, you can simply re-run the test case.
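
   Before re-running the test case, it can help to verify that a forecast file exists for every day of the testing period. The sketch below lists the expected paths under the layout described above (test date file in *forecasts*, earlier days in *archive/YYYY_M*). The function names and the *MyModel* name are placeholders for this example.

```python
import datetime
import os


def expected_forecast_files(group_dir, model, start, test_date):
    """Return the expected forecast path for every day from start to test_date."""
    paths = []
    day = start
    while day <= test_date:
        name = '%s_%d_%d_%d.dat' % (model, day.month, day.day, day.year)
        if day == test_date:
            folder = os.path.join(group_dir, 'forecasts')
        else:
            # Earlier days live in the archive, grouped by year and month
            folder = os.path.join(group_dir, 'forecasts', 'archive',
                                  '%d_%d' % (day.year, day.month))
        paths.append(os.path.join(folder, name))
        day += datetime.timedelta(days=1)
    return paths


def missing_forecast_files(group_dir, model, start, test_date):
    """Return the expected forecast paths that do not exist on disk."""
    return [path
            for path in expected_forecast_files(group_dir, model, start, test_date)
            if not os.path.isfile(path)]
```

   For this test case, `missing_forecast_files('OneDayEvaluation', 'MyModel', datetime.date(2014, 8, 20), datetime.date(2014, 8, 31))` should return an empty list once all files are in place.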

### Python Code to Run Evaluation Test

   The Python code below gives a simplified view of what the **python3** command above does behind the scenes when the *EvaluationTest.py* module is invoked in standalone mode.

   Please note that the code below stores its results in a different directory, *OneDayEvaluation/PythonResults*.

In [None]:
import datetime

# Import CSEP modules
import CSEPUtils
from ForecastGroup import ForecastGroup
from EvaluationTest import EvaluationTest
from PostProcess import PostProcess
from TStatisticalTest import TStatisticalTest
from WStatisticalTest import WStatisticalTest

# Path to the forecast group directory
forecast_dir = 'OneDayEvaluation'
# Path to the evaluation test results (please note it's different from above 'OneDayEvaluation/TWScriptResults')
results_dir = 'OneDayEvaluation/PythonResults'

# Test date for evaluation
test_date = datetime.datetime(2014, 8, 31)

# Instantiate forecast group for the tests
forecast_group = ForecastGroup(forecast_dir)

# Observation catalog directory as provided in OneDayEvaluation/forecast.init.xml file
catalog_dir = forecast_group.catalogDir()

# Run evaluation tests        
for each_set in forecast_group.tests:
    for each_test in each_set:
        # Read catalog data from catalog_dir and write test results to results_dir
        print('Running %s evaluation test' %each_test.Type)
        each_test.run(test_date,
                      catalog_dir,
                      results_dir)
         
        # Update cumulative summaries if any
        each_test.resultData()
       
del forecast_group
forecast_group = None

print('Done with evaluation tests for %s group.' %forecast_dir)

The *OneDayEvaluation/PythonResults* directory contains the same results as those generated earlier by the **python3** command, just with different filenames:

In [None]:
!ls OneDayEvaluation/PythonResults