# Application of the Evaluator

The evaluator class makes it easy to compare the prediction quality of different classifiers.
By automatically generating a training set and an evaluation set from a given data folder, the influence of different classifier parameters can be tested and compared.
Plotting methods that generate different plots for the whole evaluation set allow a first assessment of the effect of certain parameters.


## Imports

First we need to point Python to the project folder. The path can be assigned as a relative path or as an absolute system path.
Then the module can be imported via the `from ctypytool import evaluator` command.

In [1]:
import sys
from ctypytool import evaluator

import warnings
warnings.filterwarnings("ignore")
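
Pointing Python to the project folder can be sketched as follows. This is only a minimal sketch: the helper name `add_project_to_path` and the folder `..` are assumptions made for illustration and are not part of ctypytool.

```python
import sys
from pathlib import Path


def add_project_to_path(project_dir):
    """Make the modules inside `project_dir` importable (hypothetical helper)."""
    # Resolve relative paths such as ".." against the current working directory.
    resolved = str(Path(project_dir).resolve())
    # Avoid duplicate entries when the cell is executed more than once.
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumed location: the project folder is the parent of the notebook folder.
project_root = add_project_to_path("..")
```

An absolute system path can be passed in the same way; the helper returns the resolved path so it can be checked before importing.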

## Initialization
First we create two new instances of the evaluator object. They represent two different classifiers whose prediction quality we will compare later on.

In [2]:
ev1 = evaluator.evaluator()
ev2 = evaluator.evaluator()

In [3]:
ev1

CTYPYTOOL CLOUD PROJECT class

... project_path              : None

CTYPYTOOL PARAMETER HANDLER class

=== Parameters ===

... ccp_alpha                 : 0
... classifier_type           : Forest
... cloudtype_channel         : ct
... data_source_folder        : ../data/full_dataset
... difference_vectors        : True
... feature_preselection      : False
... georef_file               : ../data/auxilary_files/msevi-medi-georef.nc
... hours                     : [0]
... input_channels            : ['bt062', 'bt073', 'bt087', 'bt097', 'bt108', 'bt120', 'bt134']
... input_source_folder       : ../data/example_data
... label_file_structure      : nwcsaf_msevi-medi-TIMESTAMP.nc
... mask_file                 : ../data/auxilary_files/lsm_mask_medi.nc
... mask_key                  : land_sea_mask
... mask_sea_coding           : 0
... max_depth                 : 35
... max_features              : None
... merge_list                : []
... min_samples_split         : 2
... n_estimators       

Then we need to create new cloud classifier projects for running the evaluation. As with the cloud_classifier class, the `create_new_project()` method creates a new classifier project. If a folder with the given name already exists, the existing project is loaded instead.

In [4]:
path_1 = "../classifiers/evaluations/classifier_1"
ev1.create_new_project(path_1)

path_2 = "../classifiers/evaluations/classifier_2"
ev2.create_new_project(path_2)

Folder with given name already exits! Loading existing project!
Folder with given name already exits! Loading existing project!


Then we can alter the project parameters for both classifier projects individually. Here we only change the type of classifier we are going to train, but we could also alter any other parameter, such as the sampling size of the training data, whether to create difference vectors from our data, the input channels we are going to use, and many more. See the notebook **Changing_Project_Parameters** for more details.

In [5]:
ev1.set_project_parameters(classifier_type="Tree")
ev2.set_project_parameters(classifier_type="Forest")

## Evaluating 

### Splitting Dataset

The method `create_split_trainingset` splits the training data specified under the project parameter `data_source_folder` into a training set and a disjoint set used for evaluation.
The parameter `eval_size` determines the number of files used for evaluation, while the parameter `timesensitive` makes sure an equal number of files is selected for each hour of the day.

In [6]:
ev1.create_split_trainingset(eval_size=24, timesensitive=True)
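
The idea behind a time-sensitive split can be illustrated with a small standalone sketch. It is not the evaluator's actual implementation: the function `split_by_hour`, the `seed` argument, and the synthetic file names are assumptions, and the timestamp layout `YYYYMMDD_HHMM` is inferred from the file names that appear in this notebook.

```python
import random
import re
from collections import defaultdict

# Assumed timestamp layout YYYYMMDD_HHMM, as in "nwcsaf_msevi-medi-20190504_0000.nc".
TIMESTAMP = re.compile(r"\d{8}_(\d{2})\d{2}")


def split_by_hour(filenames, eval_size, seed=0):
    """Return (training, evaluation) with equal evaluation counts per hour."""
    by_hour = defaultdict(list)
    for name in filenames:
        match = TIMESTAMP.search(name)
        if match is None:
            raise ValueError(f"no timestamp found in {name!r}")
        by_hour[int(match.group(1))].append(name)
    if not by_hour:
        raise ValueError("no files given")

    # Same number of evaluation files for every hour that occurs in the data.
    per_hour = eval_size // len(by_hour)
    rng = random.Random(seed)
    evaluation = []
    for hour in sorted(by_hour):
        # random.sample raises ValueError if an hour has fewer than per_hour files.
        evaluation.extend(rng.sample(sorted(by_hour[hour]), per_hour))

    # Everything that was not drawn for evaluation stays in the training set.
    chosen = set(evaluation)
    training = [name for name in filenames if name not in chosen]
    return training, evaluation


# Synthetic file names: three days with one file per hour.
files = [
    f"nwcsaf_msevi-medi-201905{day:02d}_{hour:02d}00.nc"
    for day in (1, 2, 3)
    for hour in range(24)
]
training, evaluation = split_by_hour(files, eval_size=24)
```

With 24 evaluation files and 24 hours, exactly one file per hour ends up in the evaluation set, and the two sets do not overlap.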

Comparing different classifiers requires similar preconditions in all cases. To make sure we use the same distribution of files into training data and evaluation data, we can use the `copy_evaluation_split` method. The parameter `source_project` determines the project from which the split is copied.

In [7]:
ev2.copy_evaluation_split(source_project=path_1)

Filelist copied from ../classifiers/evaluations/classifier_1


### Creating Evaluation Data
The method `create_evaluation_data` trains the classifier with the training data specified in the previous step and then uses this classifier to predict labels for all data specified as evaluation data.

In [8]:
ev1.create_evaluation_data()
ev2.create_evaluation_data()

Trainig evaluation classifier
Masked indices set!
Sampling dataset 1/668

sampling took 83.90552854537964 seconds
Removed 680 vectors for containig 'Nan' values
Training data created!
Training Tree Classifier
Classifier trained!
Classifier saved!
Prediciting evaluation labels
Classifier loaded!
Masked indices set!
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20190504_0000_predicted.nc
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20191113_0100_predicted.nc
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20191014_0200_predicted.nc
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20190219_0300_predicted.nc
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20201114_0400_predicted.nc
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20200331_0500_predicted.nc
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20200411_0600_predicted.nc
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20200405_0700_predicted.nc
Input vectors created!
Labels saved as nwcsaf_msevi-medi-20200129_0800_predicted.nc
In

### Evaluation Plots
Once the evaluation data has been created, we can automatically create different plots to visualize and compare the classification quality. The method `create_evaluation_plots` takes the boolean arguments `correlation`, `probabilities`, `comparison`, and `overallCorrelation` to determine which plots to create. Plots are created for all predicted labels in comparison to the original evaluation data and are saved in the project folder. The parameter `show` determines whether the plots are additionally displayed directly.

#### Correlation Matrices
The `correlation` argument determines whether correlation matrices are calculated and plotted individually for every file in the evaluation data.

In [9]:
ev1.create_evaluation_plots(correlation=True, show=False)
ev2.create_evaluation_plots(correlation=True, show=False)

Correlation Matrix saved as 20190504_0000_CoocurrenceMatrix.png
Correlation Matrix saved as 20191113_0100_CoocurrenceMatrix.png
Correlation Matrix saved as 20191014_0200_CoocurrenceMatrix.png
Correlation Matrix saved as 20190219_0300_CoocurrenceMatrix.png
Correlation Matrix saved as 20201114_0400_CoocurrenceMatrix.png
Correlation Matrix saved as 20200331_0500_CoocurrenceMatrix.png
Correlation Matrix saved as 20200411_0600_CoocurrenceMatrix.png
Correlation Matrix saved as 20200405_0700_CoocurrenceMatrix.png
Correlation Matrix saved as 20200129_0800_CoocurrenceMatrix.png
Correlation Matrix saved as 20190419_0900_CoocurrenceMatrix.png
Correlation Matrix saved as 20201106_1000_CoocurrenceMatrix.png
Correlation Matrix saved as 20190507_1100_CoocurrenceMatrix.png
Correlation Matrix saved as 20190101_1200_CoocurrenceMatrix.png
Correlation Matrix saved as 20191202_1300_CoocurrenceMatrix.png
Correlation Matrix saved as 20191020_1400_CoocurrenceMatrix.png
Correlation Matrix saved as 20200128_150
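
What such a matrix contains can be sketched without the plotting part. The sketch assumes that the matrix counts how often each original label co-occurs with each predicted label, as the file name `CoocurrenceMatrix` suggests; the function names and the toy labels are hypothetical and not taken from ctypytool.

```python
def cooccurrence_matrix(original, predicted, classes=None):
    """Count how often each original label (row) meets each predicted label (column)."""
    if len(original) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if classes is None:
        classes = sorted(set(original) | set(predicted))
    index = {label: position for position, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true_label, predicted_label in zip(original, predicted):
        matrix[index[true_label]][index[predicted_label]] += 1
    return classes, matrix


def normalize_rows(matrix):
    """Turn counts into per-row fractions; rows without samples stay all zero."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Toy labels standing in for the flattened pixels of one evaluation file.
original = [1, 1, 2, 2, 3]
predicted = [1, 2, 2, 2, 3]
classes, counts = cooccurrence_matrix(original, predicted)
fractions = normalize_rows(counts)
```

Entries on the diagonal are pixels where prediction and original label agree; off-diagonal entries show which cloud types get confused with each other.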

#### Overall Correlation Matrix
Alternatively, it is also possible to create a single correlation matrix from all the data files in the evaluation set. This gives a more general overview of the classifier's performance.

In [10]:
ev1.create_evaluation_plots(overallCorrelation=True, show=False)
ev2.create_evaluation_plots(overallCorrelation=True, show=False)

Correlation Matrix saved as Overall_CoocurrenceMatrix.png
Correlation Matrix saved as Overall_CoocurrenceMatrix.png
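
One way to think about the overall matrix is as the element-wise sum of the per-file count matrices. The following sketch rests on that assumption; how the evaluator actually pools the data is not stated here, and the helper names and the numbers are made up for illustration.

```python
def sum_matrices(matrices):
    """Add equally shaped count matrices element by element."""
    matrices = list(matrices)
    if not matrices:
        raise ValueError("at least one matrix is required")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    total = [[0] * cols for _ in range(rows)]
    for matrix in matrices:
        if len(matrix) != rows or any(len(row) != cols for row in matrix):
            raise ValueError("all matrices must have the same shape")
        for i, row in enumerate(matrix):
            for j, count in enumerate(row):
                total[i][j] += count
    return total


def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. where both labels agree."""
    agreeing = sum(matrix[i][i] for i in range(len(matrix)))
    samples = sum(sum(row) for row in matrix)
    return agreeing / samples if samples else 0.0


# Made-up count matrices for two evaluation files with two classes each.
per_file = [
    [[1, 1], [0, 2]],
    [[2, 0], [1, 1]],
]
overall = sum_matrices(per_file)
accuracy = overall_accuracy(overall)
```

A single number such as the overall accuracy is convenient for comparing two classifiers, but the full matrix still shows which classes cause the errors.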


#### Comparison Plots
With the `comparison` parameter we can plot the resulting maps next to each other to get an overview of spatial deviations from the original data. When we leave the parameter `cmp_targets` blank, as in the first example, only the predicted labels are plotted next to the original data.

In [11]:
ev1.create_evaluation_plots(comparison=True, show=False)

Comparison Plot saved as 20190504_0000_ComparisonPlot.png
Comparison Plot saved as 20191113_0100_ComparisonPlot.png
Comparison Plot saved as 20191014_0200_ComparisonPlot.png
Comparison Plot saved as 20190219_0300_ComparisonPlot.png
Comparison Plot saved as 20201114_0400_ComparisonPlot.png
Comparison Plot saved as 20200331_0500_ComparisonPlot.png
Comparison Plot saved as 20200411_0600_ComparisonPlot.png
Comparison Plot saved as 20200405_0700_ComparisonPlot.png
Comparison Plot saved as 20200129_0800_ComparisonPlot.png
Comparison Plot saved as 20190419_0900_ComparisonPlot.png
Comparison Plot saved as 20201106_1000_ComparisonPlot.png
Comparison Plot saved as 20190507_1100_ComparisonPlot.png
Comparison Plot saved as 20190101_1200_ComparisonPlot.png
Comparison Plot saved as 20191202_1300_ComparisonPlot.png
Comparison Plot saved as 20191020_1400_ComparisonPlot.png
Comparison Plot saved as 20200128_1500_ComparisonPlot.png
Comparison Plot saved as 20191220_1600_ComparisonPlot.png
Comparison Plo

Alternatively, we can specify a list of other projects using their project paths. In that case the data from all those projects is plotted next to the data from the evaluator project from which the method is called, as well as the original data.

In [12]:
other_projects = [path_1]
titles = ["Rand. Forest.", "Dec. Tree"]

ev2.create_evaluation_plots(comparison=True, cmp_targets=other_projects, plot_titles=titles, show=False)

Comparison Plot saved as 20190504_0000_ComparisonPlot.png
Comparison Plot saved as 20191113_0100_ComparisonPlot.png
Comparison Plot saved as 20191014_0200_ComparisonPlot.png
Comparison Plot saved as 20190219_0300_ComparisonPlot.png
Comparison Plot saved as 20201114_0400_ComparisonPlot.png
Comparison Plot saved as 20200331_0500_ComparisonPlot.png
Comparison Plot saved as 20200411_0600_ComparisonPlot.png
Comparison Plot saved as 20200405_0700_ComparisonPlot.png
Comparison Plot saved as 20200129_0800_ComparisonPlot.png
Comparison Plot saved as 20190419_0900_ComparisonPlot.png
Comparison Plot saved as 20201106_1000_ComparisonPlot.png
Comparison Plot saved as 20190507_1100_ComparisonPlot.png
Comparison Plot saved as 20190101_1200_ComparisonPlot.png
Comparison Plot saved as 20191202_1300_ComparisonPlot.png
Comparison Plot saved as 20191020_1400_ComparisonPlot.png
Comparison Plot saved as 20200128_1500_ComparisonPlot.png
Comparison Plot saved as 20191220_1600_ComparisonPlot.png
Comparison Plo

Finally, using the `probabilities` flag, we can also plot the probability scores of Forest classifiers next to their prediction maps and the original data.

In [13]:
ev2.create_evaluation_plots(probabilities=True, show=False)

Probability Plot saved as 20190504_0000_ProbabilityPlot.png
Probability Plot saved as 20191113_0100_ProbabilityPlot.png
Probability Plot saved as 20191014_0200_ProbabilityPlot.png
Probability Plot saved as 20190219_0300_ProbabilityPlot.png
Probability Plot saved as 20201114_0400_ProbabilityPlot.png
Probability Plot saved as 20200331_0500_ProbabilityPlot.png
Probability Plot saved as 20200411_0600_ProbabilityPlot.png
Probability Plot saved as 20200405_0700_ProbabilityPlot.png
Probability Plot saved as 20200129_0800_ProbabilityPlot.png
Probability Plot saved as 20190419_0900_ProbabilityPlot.png
Probability Plot saved as 20201106_1000_ProbabilityPlot.png
Probability Plot saved as 20190507_1100_ProbabilityPlot.png
Probability Plot saved as 20190101_1200_ProbabilityPlot.png
Probability Plot saved as 20191202_1300_ProbabilityPlot.png
Probability Plot saved as 20191020_1400_ProbabilityPlot.png
Probability Plot saved as 20200128_1500_ProbabilityPlot.png
Probability Plot saved as 20191220_1600_
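
How a forest arrives at a probability score can be sketched as follows. The sketch assumes the common random-forest convention in which the forest's class probabilities are the mean of the class probabilities of its individual trees; whether the evaluator computes its scores exactly this way is an assumption, and the function name and the numbers are hypothetical.

```python
def forest_probabilities(tree_probabilities):
    """Average per-tree class probabilities and pick the most probable class."""
    n_trees = len(tree_probabilities)
    if n_trees == 0:
        raise ValueError("at least one tree is required")
    n_classes = len(tree_probabilities[0])
    mean = [
        sum(tree[c] for tree in tree_probabilities) / n_trees
        for c in range(n_classes)
    ]
    # Index of the class with the highest averaged probability.
    best = max(range(n_classes), key=mean.__getitem__)
    return mean, best, mean[best]


# Four made-up trees voting on three classes for a single pixel.
votes = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
mean, predicted_class, score = forest_probabilities(votes)
```

A score close to 1 means the trees largely agree on the label of a pixel, while a low score marks regions where the prediction is less certain.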

The plots are saved in correspondingly named subfolders in the newly created `plots` folder inside the project folder.

In [14]:
%%bash

ls -l ../classifiers/evaluations/*/plots


../classifiers/evaluations/classifier_1/plots:
total 8
drwxr-xr-x 2 b380352 bb1174 4096 Aug 18 13:25 Comparisons
drwxr-xr-x 2 b380352 bb1174 4096 Aug 18 13:24 Coocurrence

../classifiers/evaluations/classifier_2/plots:
total 12
drwxr-xr-x 2 b380352 bb1174 4096 Aug 18 13:26 Comparisons
drwxr-xr-x 2 b380352 bb1174 4096 Aug 18 13:25 Coocurrence
drwxr-xr-x 2 b380352 bb1174 4096 Aug 18 13:27 Probabilities


It is also possible to perform all of the above steps in a single function call by setting all parameters simultaneously.
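
A combined call could look like the sketch below. It only uses the keyword arguments named in this notebook (`correlation`, `overallCorrelation`, `comparison`, `probabilities`, `show`, `cmp_targets`); the wrapper `create_all_plots` and the stand-in class `RecordingEvaluator` are hypothetical and exist only so the example runs without a trained project.

```python
def create_all_plots(ev, probabilities=True, show=False, **extra):
    """Request every plot type in one call (hypothetical convenience wrapper)."""
    return ev.create_evaluation_plots(
        correlation=True,
        overallCorrelation=True,
        comparison=True,
        probabilities=probabilities,
        show=show,
        **extra,
    )


class RecordingEvaluator:
    """Stand-in that only records the keyword arguments it receives."""

    def __init__(self):
        self.calls = []

    def create_evaluation_plots(self, **kwargs):
        self.calls.append(kwargs)


stand_in = RecordingEvaluator()
create_all_plots(stand_in, cmp_targets=["../classifiers/evaluations/classifier_1"])
```

With the objects from this notebook the call would be `create_all_plots(ev2, cmp_targets=[path_1])`. For the Tree project `ev1`, passing `probabilities=False` is probably the safer choice, since probability plots are only described for Forest classifiers.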