# Tracking sensAI experiments

In this notebook we demonstrate how to use sensAI's tracking utilities with evaluators
and parameter sweeps. Several backends are supported, and writing a custom adapter
for a different tracking framework is straightforward. Here we use [trains](https://github.com/allegroai/trains)
(since renamed ClearML, hence the `ClearMLExperiment` class used below) as the tracking backend.
After running the notebook, you can access the results on the trains
[demoserver](https://demoapp.trains.allegro.ai/) (if you have not provided your own trains config).
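
To give a rough idea of what a custom adapter involves, here is a minimal, self-contained sketch. It does not use sensAI's actual base class: `TrackingAdapter`, `InMemoryAdapter`, `track_values` and `extra_values` are hypothetical names chosen for illustration, and the tracked metric value is made up. The assumed pattern is that a backend only needs to implement one method that receives a dictionary of values.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class TrackingAdapter(ABC):
    """Hypothetical stand-in for a tracking base class (NOT sensAI's actual interface)."""

    def __init__(self, extra_values: Optional[Dict[str, Any]] = None):
        # constant values attached to every tracked record, e.g. a run name
        self.extra_values = dict(extra_values or {})

    def track_values(self, values: Dict[str, Any]) -> None:
        # merge the constant values into the record and hand it to the backend
        self._track_values({**values, **self.extra_values})

    @abstractmethod
    def _track_values(self, values: Dict[str, Any]) -> None:
        """Backend-specific: send one record of values to the tracking framework."""


class InMemoryAdapter(TrackingAdapter):
    """Toy backend that simply collects the records in a list."""

    def __init__(self, extra_values: Optional[Dict[str, Any]] = None):
        super().__init__(extra_values)
        self.records: List[Dict[str, Any]] = []

    def _track_values(self, values: Dict[str, Any]) -> None:
        self.records.append(values)


adapter = InMemoryAdapter(extra_values={"run": "demo"})
adapter.track_values({"numClusters": 3})  # illustrative value, not a real result
print(adapter.records)
```

A real adapter would replace the list with calls to the tracking framework's client. Consult the classes in `sensai.tracking` for the interface that sensAI actually expects.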

In [None]:
# Note - this cell should be executed only once per session
%load_ext autoreload
%autoreload 2

import sys, os

# move to the parent directory so that the config module can be imported (it is not part of the library)
os.chdir("..")
sys.path.append(os.path.abspath("."))

In [None]:
import geopandas as gp

from sensai.hyperopt import GridSearch
from sklearn.cluster import DBSCAN
import logging

from sensai.clustering.coordinate_clustering import SKLearnCoordinateClustering
from sensai.evaluation.evaluator_clustering import ClusteringModelSupervisedEvaluator
from sensai.evaluation.clustering_ground_truth import PolygonAnnotatedCoordinates
from sensai.tracking.clearml_tracking import ClearMLExperiment

import matplotlib.pyplot as plt

from config import get_config

logging.basicConfig(level=logging.INFO)
c = get_config(reload=True)

### Evaluators

The main entry point to reproducible experiments is the evaluator API. We use clustering evaluation for
demonstration purposes. We load the data and create a `ClusteringModelSupervisedEvaluator`; see
[intro to evaluation](Clustering%20Evaluation.ipynb) for more details.

[comment]: <> (TODO - use some VectorModel with an sklearn dataset instead, move the notebook to sensAI repo)

In [None]:
# loading the data and ground truth labels
sampleFile = c.datafile_path("sample", stage=c.RAW) # this can point to a directory or a shp/geojson file
sampleGeoDF = gp.read_file(sampleFile)
groundTruthClusters = PolygonAnnotatedCoordinates(sampleGeoDF, c.datafile_path("sample", stage=c.GROUND_TRUTH))

In [None]:
# creating the evaluator
groundTruthCoordinates, groundTruthLabels = groundTruthClusters.getCoordinatesLabels()
supervisedEvaluator = ClusteringModelSupervisedEvaluator(groundTruthCoordinates, trueLabels=groundTruthLabels)

### Setup tracking

Now comes the new part: we create a tracking experiment and set it in the evaluator.

In [None]:
experiment = ClearMLExperiment(projectName="Demos", taskName="notebook_experiment")
supervisedEvaluator.setTrackedExperiment(experiment)

As simple as that! Whenever we perform an evaluation, the results will be tracked. Depending on
the backend and the particular implementation of the experiment, the code and other information
such as images will be tracked as well. We will now demonstrate this by tracking the evaluation of a DBSCAN model.

In [None]:
boundedDbscan = SKLearnCoordinateClustering(DBSCAN(eps=150, min_samples=20), minClusterSize=100)
supervisedEvaluator.computeMetrics(boundedDbscan)

In [None]:
# Plots are tracked automatically on creation.
# Note that one should use fig.show() instead of plt.show()

fig, ax = plt.subplots(figsize=[6, 8])
ax.set_title("Sample Ground Truth clusters")
groundTruthClusters.plot(includeNoise=False, markersize=0.2, cmap="plasma", ax=ax)
fig.show()

In [None]:
fig, ax = plt.subplots(figsize=[6, 8])
ax.set_title("Predicted clusters")
boundedDbscan.plot(includeNoise=False, markersize=0.2, cmap="plasma", ax=ax, figsize=10)
fig.show()


In [None]:
# We can also add the summaries df to the experiment through explicit tracking

logger = supervisedEvaluator.trackedExperiment.logger

logger.report_table(title="Clusters Summaries", series="pandas DataFrame", iteration=0,
                    table_plot=boundedDbscan.summaryDF().sort_values("numMembers"))

The same mechanism works in the `hyperopt` module. The experiment can be set for `GridSearch`
or simulated annealing. Alternatively, one can set the experiment in the evaluator that is passed to
the hyperopt objects and use that one for tracking instead. Here is an example:


In [None]:
# Because of how trains works and because we are using it in Jupyter, we need to manually close the existing task,
# even though the documentation says that a new task would be created with reuse_last_task_id=False.
# This step is unnecessary if there is one experiment per script execution.
# We also unset the tracked experiment in the evaluator and prepare a new one for the grid search.

supervisedEvaluator.trackedExperiment.task.close()
supervisedEvaluator.unsetTrackedExperiment()


def dbscanFactory(**kwargs):
    return SKLearnCoordinateClustering(DBSCAN(**kwargs), minClusterSize=100)

parameterOptions = {
    "min_samples": [10, 20],
    "eps": [50, 150]
}

dbscanGridSearch = GridSearch(dbscanFactory, parameterOptions,
                              csvResultsPath=os.path.join(c.temp, "dbscanGridSearchCsv"))
gridExperiment = ClearMLExperiment(projectName="Demos", taskName="notebook_grid_search")
dbscanGridSearch.setTrackedExperiment(gridExperiment)

In [None]:
searchResults = dbscanGridSearch.run(supervisedEvaluator, sortColumnName="numClusters")
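
The parameter options above define a grid of 2 x 2 = 4 parameter combinations. The following sketch shows how such a grid can be enumerated, assuming that the search evaluates every combination exhaustively (the usual meaning of a grid search). The helper `iter_param_combinations` is a hypothetical name and not part of sensAI, and the order in which sensAI visits the combinations may differ.

```python
from itertools import product

# same options as in the grid search above
parameter_options = {
    "min_samples": [10, 20],
    "eps": [50, 150],
}


def iter_param_combinations(options):
    """Yield one keyword-argument dict per point of the parameter grid."""
    names = list(options)
    # Cartesian product of the value lists, in the order of the dict's keys
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


combinations = list(iter_param_combinations(parameter_options))
for params in combinations:
    print(params)
```

Each resulting dict can be passed as keyword arguments to a factory such as `dbscanFactory`, so every tracked row corresponds to one model configuration.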

In [None]:
# Unfortunately, the trains experiment interface conflicts with the grid search.
# The most pragmatic solution is to attach the results dataframe to the experiment and use it for further evaluation.

dbscanGridSearch.trackedExperiment.logger.report_table(title="Results", series="pandas DataFrame", iteration=0,
                    table_plot=searchResults)
