Linajea Tracking Example
=====================


This example shows all steps necessary to generate the final tracks: training the network, finding the optimal ILP weights on the validation data, and computing the tracks on the test data.

- train network
- predict on validation data
- grid search weights for ILP
  - solve once per set of weights
  - evaluate once per set of weights
  - select set with fewest errors
- predict on test data
- solve on test data with optimal weights
- evaluate on test data

In [None]:
%load_ext autoreload
%autoreload 2
import logging
import os
import sys
import time
import types

import numpy as np
import pandas as pd

from linajea.config import TrackingConfig
import linajea.evaluation
from linajea.process_blockwise import (extract_edges_blockwise,
                                       predict_blockwise,
                                       solve_blockwise)
from linajea.training import train
import linajea.config
import linajea.process_blockwise
import linajea

In [None]:
logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s %(name)s %(levelname)-8s %(message)s')

Configuration
--------------------

All parameters that control the pipeline (e.g. model architecture, data augmentation, training parameters, ILP weights) are contained in a configuration file in the TOML format (https://toml.io).

You can use a single monolithic configuration file or separate configuration files for a subset of the steps of the pipeline, as long as the parameters required for the respective steps are there.

Familiarize yourself with the example configuration files and have a look at the documentation for the configuration to see what is needed. Most parameters have sensible defaults; usually setting the correct paths and the data configuration is all that is needed to start. See `run_multiple_samples.ipynb` for an example setup that can (optionally) handle multiple samples and automates the process of selecting the correct data for each step as much as possible. For training `train_data` has to be set, and for validation and testing `inference_data`.

You can modify the `train_config_file` variable (and likewise `validation_config_file` and `test_config_file` further down) to point to the config file you would like to use. Make sure that the file paths contained in it are adapted to your directory structure.

In [None]:
train_config_file = "config_example_single_sample_train.toml"
train_config = TrackingConfig.from_file(train_config_file)
os.makedirs(train_config.general.setup_dir, exist_ok=True)

Training
------------

To start training, simply pass the configuration object to the `train` function. Make sure that the training data and parameters such as the number of iterations/steps are set correctly.

Training until convergence can take from several hours to multiple days.

In [None]:
train(train_config)

Validation
--------------

After training has completed, we first have to determine the optimal ILP weights.
This is achieved by first creating the prediction on the validation data and then performing a grid search by solving the ILP and evaluating the results repeatedly.

MongoDB is used to store the computed results. A `mongod` server has to be running before executing the remaining cells.
See https://www.mongodb.com/docs/manual/administration/install-community/ for a guide on how to install it (Linux/Windows/macOS).
Alternatively, you can create a Singularity image (https://github.com/singularityhub/mongo). This can be used locally, and it is necessary if you want to run the code on an HPC cluster that has no server installed already.
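
Before running the remaining cells it can help to confirm that something is listening where the server is expected. The sketch below uses only the standard library. The host `localhost` is an assumption about your setup, `27017` is the default `mongod` port, and `mongod_reachable` is a hypothetical helper, not a linajea function.

```python
import socket


def mongod_reachable(host: str = "localhost", port: int = 27017,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if not mongod_reachable():
    print("No server reachable on localhost:27017; start mongod first.")
```

This only verifies that a TCP port accepts connections, not that the process behind it is actually MongoDB.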

In [None]:
validation_config_file = "config_example_single_sample_val.toml"
val_config = TrackingConfig.from_file(validation_config_file)
os.makedirs(val_config.general.setup_dir, exist_ok=True)

### Predict Validation Data

To predict the `cell_indicator` and `movement_vectors` on the validation data make sure that `args.validation` is set to `True`, then execute the next cell.

Depending on the number of workers used (see config file) and the size of the data this can take a while.

In [None]:
predict_blockwise(val_config)

### Extract Edges Validation Data

For each detected cell, look for neighboring cells in the next time frame and insert an edge candidate for each into the database.
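
The idea can be sketched on toy data as follows. This is a brute-force illustration and not the linajea implementation, which works blockwise and writes its candidates to the database. The cell tuples, the `candidate_edges` helper and the `max_distance` threshold are made up for this example.

```python
import math
from collections import defaultdict

# Toy detections: (cell_id, frame, (z, y, x)); all values are made up.
cells = [
    (1, 0, (0.0, 0.0, 0.0)),
    (2, 0, (0.0, 10.0, 0.0)),
    (3, 1, (0.0, 1.0, 0.0)),
    (4, 1, (0.0, 9.0, 1.0)),
    (5, 1, (0.0, 50.0, 0.0)),
]


def candidate_edges(cells, max_distance):
    """Pair every cell with all cells in the next frame within max_distance."""
    by_frame = defaultdict(list)
    for cell_id, frame, position in cells:
        by_frame[frame].append((cell_id, position))

    edges = []
    for cell_id, frame, position in cells:
        for other_id, other_position in by_frame.get(frame + 1, []):
            distance = math.dist(position, other_position)
            if distance <= max_distance:
                edges.append((cell_id, other_id, distance))
    return edges


for source, target, distance in candidate_edges(cells, max_distance=5.0):
    print(f"{source} -> {target} (distance {distance:.2f})")
```

A larger threshold yields more candidates per cell, which gives the ILP more options to choose from but also makes it bigger.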

In [None]:
extract_edges_blockwise(val_config)

### ILP Weights Grid Search

#### Solve on Validation Data

Make sure that `solve.grid_search` is set to `True`. The parameter sets to try are generated automatically.
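
Conceptually, a grid search enumerates every combination of candidate values. The sketch below shows that enumeration with `itertools.product`. It is not how linajea generates its parameter sets internally. The weight names are a subset of the solve parameters used later in this notebook, and the candidate values are placeholders.

```python
from itertools import product

# Placeholder candidate values for a subset of the ILP weights.
search_space = {
    "track_cost": [7, 11],
    "weight_node_score": [-13, -17],
    "weight_division": [-8, -11, -14],
    "weight_child": [1.0],
}


def parameter_grid(search_space):
    """Return one dict per combination of the candidate values."""
    names = list(search_space)
    return [dict(zip(names, values))
            for values in product(*(search_space[name] for name in names))]


grid = parameter_grid(search_space)
print(len(grid), "parameter sets")
print(grid[0])
```

The number of sets is the product of the list lengths, so each additional weight with several candidates multiplies the number of solve and evaluate runs.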

In [None]:
linajea.process_blockwise.solve_blockwise(val_config)

#### Evaluate on Validation Data

In [None]:
validation_config_file = "config_example_single_sample_val.toml"
val_config = TrackingConfig.from_file(validation_config_file)
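# Evaluate each parameter set separately: restrict the config to a single
# set, then score the tracks that were solved with it.
parameters = val_config.solve.parameters
for params in parameters:
    val_config.solve.parameters = [params]
    linajea.evaluation.evaluate_setup(val_config)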

### Predict Test Data

Now that we know which ILP weights to use we can predict the `cell_indicator` and `movement_vectors` on the test data and compute the tracks. Make sure that `args.validation` is set to `False` and `solve.grid_search` and `solve.random_search` are set to `False`.
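
A small guard can catch a search flag that was left enabled. The sketch below is hypothetical: `check_test_flags` is not a linajea function, and a `SimpleNamespace` stands in for the loaded config so that the example runs on its own. It assumes the flags are exposed as `config.solve.grid_search` and `config.solve.random_search`, mirroring the key names above.

```python
from types import SimpleNamespace


def check_test_flags(config):
    """Raise if a search mode is still enabled in the solve section."""
    enabled = [name for name in ("grid_search", "random_search")
               if getattr(config.solve, name, False)]
    if enabled:
        raise ValueError(
            "disable before solving on test data: " + ", ".join(enabled))


# Stand-in for a loaded config object, used only to make the sketch runnable.
example_config = SimpleNamespace(
    solve=SimpleNamespace(grid_search=False, random_search=False))
check_test_flags(example_config)  # passes silently
```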

In [None]:
test_config_file = "config_example_single_sample_test.toml"
test_config = TrackingConfig.from_file(test_config_file)
print(test_config.inference_data.data_source)

In [None]:
predict_blockwise(test_config)

### Solve on Test Data

Then we can solve the ILP on the test data. We select the ILP weights that resulted in the lowest overall number of errors on the validation data.
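
The selection rule itself is simple: add up the error counts per parameter set and keep the set with the smallest sum. The sketch below shows this on made-up numbers with plain Python. The column names match the score columns used in the next cell (where `fp_edges` is only counted when `general.sparse` is not set), while the result rows and the `sum_errors` helper are purely illustrative.

```python
# Toy evaluation results, one dict per parameter set; all numbers are made up.
results = [
    {"track_cost": 7, "weight_division": -8,
     "fp_edges": 12, "fn_edges": 30, "identity_switches": 4,
     "fp_divisions": 6, "fn_divisions": 9},
    {"track_cost": 11, "weight_division": -8,
     "fp_edges": 20, "fn_edges": 15, "identity_switches": 3,
     "fp_divisions": 5, "fn_divisions": 7},
    {"track_cost": 11, "weight_division": -11,
     "fp_edges": 25, "fn_edges": 14, "identity_switches": 2,
     "fp_divisions": 9, "fn_divisions": 8},
]

score_columns = ["fp_edges", "fn_edges", "identity_switches",
                 "fp_divisions", "fn_divisions"]


def sum_errors(row):
    """Total number of errors over the selected score columns."""
    return sum(row[column] for column in score_columns)


best = min(results, key=sum_errors)
print(best["track_cost"], best["weight_division"], sum_errors(best))
```

Summing treats every error type as equally costly; if one type matters more for your analysis, a weighted sum would be a possible variation.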

In [None]:
score_columns = ['fn_edges', 'identity_switches',
                 'fp_divisions', 'fn_divisions']
if not val_config.general.sparse:
    score_columns = ['fp_edges'] + score_columns

results = linajea.evaluation.get_results_sorted(
    val_config,
    filter_params={"val": True},
    score_columns=score_columns,
    sort_by="sum_errors")

# Start from the first validation parameter set and overwrite its weights
# with the values from the first row of the sorted results.
best = results.iloc[0]
test_config.solve.parameters = [val_config.solve.parameters[0]]
for name in ("weight_node_score", "selection_constant", "track_cost",
             "weight_edge_score", "weight_division", "weight_child",
             "weight_continuation"):
    setattr(test_config.solve.parameters[0], name, float(best[name]))
print(test_config.solve.parameters[0], type(test_config.solve.parameters[0].weight_continuation))

In [None]:
solve_blockwise(test_config)

### Evaluate on Test Data

In [None]:
report = linajea.evaluation.evaluate_setup(test_config)
print(report.get_short_report())