### **Cell tracking with Trackastra**
Kannan Annamalai, Xiu Yit Foo, Viththegan Sresutharsan, Muhamed Tafech, Reira Ikenaga, Rudy Harricks, Sayel Elwan

## **1. Introduction**
This Google Colab notebook provides a simple, user-friendly pipeline to train a deep neural network for cell tracking with Trackastra, run inference, and obtain tracking statistics. Trackastra is a transformer-based tracking method developed by Benjamin Gallusser and Martin Weigert [(GitHub)](https://github.com/weigertlab/trackastra).
The training part of this notebook uses a Cell Tracking Challenge dataset. The model can either be trained on top of a previously trained Trackastra model or trained from scratch.

This framework is designed for 2D+Time datasets sourced from the [Cell Tracking Challenge website](https://celltrackingchallenge.net/2d-datasets/).

Before use, ensure that the Google Colab runtime type is set to a T4 GPU. GPU acceleration is needed to train the transformer-based tracking model in a reasonable time.

The notebook also requires the user to mount their Google Drive, from which inputs are read and to which outputs are written:

Input directories:


*   ```PROJECT/data/``` directory containing cell training data

Key output directories:


*  ```PROJECT/trackastra_out/trained_models/``` saves trained model output


```
PROJECT/
├── (INPUT) data/
│   └── dataset_name
├── (INPUT) trackastra_trained_models (if using an existing model)/
│   └── model_name
└── (OUTPUT) trackastra_out/
    └── trained_models/
        └── trackastraTrained_<dataset_name>.zip
```
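Before running the notebook, it can help to confirm that the input folders exist on Drive. The sketch below is a hypothetical helper (`missing_project_dirs` is not part of Trackastra) that checks only the input folders from the tree above, using `pathlib`:

```python
from pathlib import Path


def missing_project_dirs(root, dataset_name, need_pretrained=False):
    """Return the expected PROJECT input folders that are missing (hypothetical helper)."""
    root = Path(root)
    required = [root / "data" / dataset_name]
    if need_pretrained:
        # Only needed when training on top of an existing model
        required.append(root / "trackastra_trained_models")
    return [str(p) for p in required if not p.is_dir()]


print(missing_project_dirs("/content/drive/MyDrive/PROJECT", "Fluo_C2DL_Huh7"))
```

An empty list means the expected input folders were found.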

If you only want to run inference or benchmarking with a pre-trained model, without training a new one, skip ahead to Section 4.


## **2. Setup Environment**

To begin, update the ```DATASET_NAME``` variable to the name of the dataset used for training. This is the name of the folder downloaded from the [Cell Tracking Challenge website](https://celltrackingchallenge.net/2d-datasets/).
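Cell Tracking Challenge datasets contain numbered image sequences (`01`, `02`, ...) next to `<sequence>_GT` ground-truth folders. In this notebook, the install cell points the training config at `02` for training and `01` for validation. As an illustration only, the hypothetical helper below pairs each sequence with its ground-truth folder based purely on folder names:

```python
import re


def pair_ctc_sequences(entries):
    """Map each numbered sequence folder (e.g. '01') to its '<seq>_GT' folder, or None.

    Hypothetical helper for illustration; it only inspects folder names.
    """
    names = set(entries)
    sequences = sorted(n for n in names if re.fullmatch(r"\d+", n))
    return {seq: (f"{seq}_GT" if f"{seq}_GT" in names else None) for seq in sequences}


listing = ['01_GT', '02', '02_GT', '01', '02_Masks']  # listing printed by the mount cell below
print(pair_ctc_sequences(listing))  # {'01': '01_GT', '02': '02_GT'}
```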

In [None]:
import os
ROOT = "/content/drive/MyDrive/PROJECT"
DATASET_NAME = "Fluo_C2DL_Huh7" # Update to correspond to data being used
DATA_FOLDER = os.path.join(ROOT, "data", DATASET_NAME)

Next, mount Google Drive. Authenticate with the Google account whose Drive contains the data and directories outlined above. If successful, the contents of the dataset folder are printed.

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

os.listdir(DATA_FOLDER)

Mounted at /content/drive


['01_GT', '02', '02_GT', '01', '02_Masks']

Next, install the necessary files and dependencies. The user does not need to modify any of the following code. The installation can take 5-10 minutes.
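The install cell reads the driver's CUDA version from `nvidia-smi` and picks a matching PyTorch wheel index. In the recorded output, CUDA 12.4 is mapped to the `cu121` wheels; this works because NVIDIA drivers are backward compatible with binaries built for older CUDA versions. The same selection logic, isolated as a pure function so it can be checked without a GPU (the name `pick_torch_index_url` is ours, not Trackastra's):

```python
import re


def pick_torch_index_url(nvidia_smi_text):
    """Choose a PyTorch wheel index from `nvidia-smi` output, mirroring the cell below."""
    m = re.search(r"CUDA Version:\s*([0-9]+)\.([0-9]+)", nvidia_smi_text)
    # Fall back to CUDA 12.1 if the version string cannot be found
    cuda = (int(m.group(1)), int(m.group(2))) if m else (12, 1)
    if cuda >= (12, 1):
        return "https://download.pytorch.org/whl/cu121"
    if cuda >= (11, 8):
        return "https://download.pytorch.org/whl/cu118"
    return "https://download.pytorch.org/whl/cpu"


print(pick_torch_index_url("Driver Version: 550.54.15   CUDA Version: 12.4"))
```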

In [None]:
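import os
%cd /content
# Clone the Trackastra repository only if it is not already present
![ -d trackastra ] || git clone https://github.com/weigertlab/trackastra.git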

!nvidia-smi -L
!python -V
# Quote the specifier so the shell does not treat ">" as output redirection
!pip install -q "jedi>=0.16"
!pip -q install -U pip setuptools wheel

import re, subprocess, sys, textwrap

# Read CUDA version from nvidia-smi
out = subprocess.check_output(["nvidia-smi"]).decode()
m = re.search(r"CUDA Version:\s*([0-9]+)\.([0-9]+)", out)
cuda = (int(m.group(1)), int(m.group(2))) if m else (12, 1)

# Map to PyTorch index URL
if cuda >= (12, 1):
    index_url = "https://download.pytorch.org/whl/cu121"
elif cuda >= (11, 8):
    index_url = "https://download.pytorch.org/whl/cu118"
else:
    index_url = "https://download.pytorch.org/whl/cpu"

print(f"Detected CUDA {cuda[0]}.{cuda[1]} → using {index_url}")
!pip -q install --upgrade torch torchvision --index-url $index_url

%cd /content/trackastra
!pip install -e ".[train]"

import yaml

config_path = "/content/trackastra/scripts/example_config.yaml"

with open(config_path, "r") as f:
    cfg = yaml.safe_load(f)

cfg["input_train"] = [DATA_FOLDER + "/02"]
cfg["input_val"]   = [DATA_FOLDER + "/01"]


with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)

from pathlib import Path
p = Path("/content/trackastra/scripts/train.py")
src = p.read_text()

block = """
import pathlib

def _path_to_str(dumper, data):
    return dumper.represent_str(str(data))

for _name in ("SafeDumper", "CSafeDumper", "Dumper", "CDumper"):
    _d = getattr(yaml, _name, None)
    if _d:
        yaml.add_representer(pathlib.Path, _path_to_str, Dumper=_d)
        yaml.add_multi_representer(pathlib.Path, _path_to_str, Dumper=_d)

def _to_yamlable(obj):
    if isinstance(obj, pathlib.Path):
        return str(obj)
    if isinstance(obj, dict):
        return {k: _to_yamlable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        T = type(obj)
        return T(_to_yamlable(v) for v in obj)
    return obj
"""
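if "_path_to_str" not in src:  # skip if train.py was already patched by an earlier run
    insert_at = src.find("import yaml")
    if insert_at == -1:
        raise RuntimeError("'import yaml' not found in train.py; patch not applied.")
    end = src.find("\n", insert_at)
    src = src[:end] + "\n" + block + "\n" + src[end:]
    p.write_text(src)
    print("✅ Added _path_to_str and _to_yamlable helpers to train.py.")
else:
    print("train.py already contains the helpers; skipping patch.")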

fatal: destination path 'trackastra' already exists and is not an empty directory.
GPU 0: Tesla T4 (UUID: GPU-32e77cd1-a764-0683-9940-b21164c092a0)
Python 3.12.12
Detected CUDA 12.4 → using https://download.pytorch.org/whl/cu121
/content/trackastra
Obtaining file:///content/trackastra
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: trackastra
  Building editable for trackastra (pyproject.toml) ... done
  Created wheel for trackastra: filename=trackastra-0.4.3.dev1+g5e103d9c7-0.editable-py3-none-any.whl size=8658 sha256=27e118c0ec9f7cf30215360829e1f947386a290883f4d46aeac6f714eaa80671
  Stored in directory: /tmp/pip-ephem-wheel-cache-0ri8jdqx/wheels/5f/71/d8/f456f41cb2d610d9ac089083cf43637bac74da444ddcf7b0d3
Successfully built track

## **3. Training the Trackastra model**

Now we are ready to train the model. There are two options. To train a model from scratch, run the code in 3.1. To train on top of an existing Trackastra model, go to 3.2 instead and provide the location of the existing model. After training, proceed to 3.3 to save the best model to your Google Drive.
\
\
\
**NOTE**: To stop training early, click the stop button on the running cell in Google Colab (or interrupt the kernel). The best model trained so far will have been saved, and you can then run 3.3 to copy it to Google Drive.
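The two training options below differ only in whether `train.py` receives a `--model` folder to start from. As a sketch, a hypothetical helper that assembles the same command from the only flags used in this notebook (`--config`, `--model`, `--epochs`):

```python
import shlex


def build_train_command(config="example_config.yaml", epochs=50, model_folder=None):
    """Assemble the train.py call used in Sections 3.1 and 3.2 (hypothetical helper)."""
    cmd = ["python", "train.py", "--config", config]
    if model_folder is not None:
        # Section 3.2: start from an existing model folder instead of random weights
        cmd += ["--model", str(model_folder)]
    cmd += ["--epochs", str(epochs)]
    return cmd


print(shlex.join(build_train_command(epochs=10)))
```

The resulting list could be passed to `subprocess.run(cmd, cwd="/content/trackastra/scripts", check=True)` instead of the `!python` line; the notebook cells themselves are unchanged by this sketch.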

## **3.1 Training a model from scratch**
This code trains a new Trackastra model and keeps the best model according to validation loss. Note that training from scratch is extremely slow and can take hours. To change the number of training epochs, modify the `EPOCHS` variable; otherwise, the default of 50 is used.
\
\
NOTE: Proceed to 3.3 to save the trained model to your Google Drive. If you fail to do so, the trained model will be lost when the runtime is disconnected.

In [None]:
EPOCHS = 50

%cd /content/trackastra/scripts

!python train.py --config example_config.yaml --epochs {EPOCHS}

/content/trackastra/scripts

INFO:root:Model has 5.8 million parameters
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
INFO:__main__:Resuming from None
INFO:__main__:Using lightning datamodule
INFO:trackastra.data.distributed:Loading TRAIN data
INFO:trackastra.data.data:ROOT (config): 	/content/drive/MyDrive/PROJECT/data/Fluo_C2DL_Huh7/02
INFO:trackastra.data.data:ROOT (guessed): 	/content/drive/MyDrive/PROJECT/data/Fluo_C2DL_Huh7/02
INFO:trackastra.data.data:GT TRA (guessed):	/content/drive/MyDrive/PROJECT/data/Fluo_C2DL_Huh7/02_GT/TRA
INFO:trackastra.data.data:GT MASK (guessed):	/content/drive/MyDrive/PROJECT/data/Fluo_C2DL_Huh7/02_GT/TRA
INFO:trackastra.data.data:IMG (guessed):	/content/drive/MyDrive/PROJECT/data/Fluo_C2DL_Huh7/02
INFO:trackastra.data.data:Loading ground truth
INFO:trackastra.data.data:Loading images
INFO:trackastra.data.data:Loading detections
INFO:track

## **3.2 Training a model using a pre-trained model**
This code trains a Trackastra model upon a previously trained model and obtains the best model using validation loss as the metric.
\
\
Set `MODEL_NAME` to the name of the folder under `PROJECT/trackastra_trained_models/` that contains your trained model. If the model was trained with this notebook, unzip the saved archive and move the model folder into that directory; it can then be trained further on another dataset.
\
\
To change the number of training epochs, modify the `EPOCHS` variable; otherwise, the default of 50 is used.
\
\
NOTE: Proceed to 3.3 to save the trained model to your Google Drive. If you fail to do so, the trained model will be lost when the runtime is disconnected.
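The log printed by the next cell shows that the `--model` folder is expected to contain a `model.pt` file. After unzipping an archive from 3.3, run folders sit several levels deep (`content/trackastra/scripts/runs/<timestamp>_example/`), so a small search can help locate a usable folder. A sketch with a hypothetical helper name, assuming that a usable model folder is simply one holding `model.pt`:

```python
from pathlib import Path


def find_model_folders(search_root):
    """Return folders under `search_root` containing model.pt, most recently modified first.

    Hypothetical helper; it assumes a model folder is one that holds model.pt.
    """
    hits = [p.parent for p in Path(search_root).rglob("model.pt")]
    # Sort by modification time so the most recent training run comes first
    return sorted(hits, key=lambda p: p.stat().st_mtime, reverse=True)


print(find_model_folders("/content/drive/MyDrive/PROJECT/trackastra_trained_models"))
```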

In [None]:
%cd /content/trackastra/scripts
MODEL_NAME = "trackastraTrained_DIC_C2DH_HeLa"
MODEL_FOLDER = os.path.join(ROOT, "trackastra_trained_models", MODEL_NAME)
EPOCHS = 50
!python train.py --config example_config.yaml --model {MODEL_FOLDER} --epochs {EPOCHS}




/content/trackastra/scripts
INFO:trackastra.model.model:Loading model state from /content/drive/MyDrive/PROJECT/trackastra_trained_models/trackastraTrained_DIC_C2DH_HeLa/model.pt
INFO:root:Model has 5.8 million parameters
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
INFO:__main__:Resuming from None
INFO:__main__:Using lightning datamodule
INFO:trackastra.data.distributed:Loading TRAIN data
INFO:trackastra.data.distributed:Loading cached dataset from /content/trackastra/scripts/.cache/73bb16e765c3ecf30993ad8d241bf70a77bdecc5f2b16db9ea84518e5848a06b.pkl
INFO:trackastra.data.distributed:Loaded 25 TRAIN samples (in 0.0 s)


INFO:trackastra.data.distributed:Loading VAL data
INFO:trackastra.data.distributed:Loading cached dataset from /content/trackastra/scripts/.cache/ed9652d0eb21a39d66b1fe64cd7d316df753ffc07e65ebbcc566146d2cd8fc26.pkl
INFO:trackastra.data.distributed:Loaded 25

## **3.3 Moving trained model to Google Drive**
After training is complete, run this cell to save the trained model to Google Drive. Make sure this step succeeds, otherwise the model is lost when the runtime disconnects. The archive contains every run folder under `/content/trackastra/scripts/runs`, including earlier runs from the same session. The last line of the output lists the contents of `trackastra_out/trained_models`; if successful, it includes a zip file named after the training dataset.
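If the `zip` command is unavailable, or to avoid `zip -r` updating an old archive in place, the same step can be done in pure Python. A sketch using `shutil.make_archive`; the helper name `archive_runs` is hypothetical:

```python
import shutil
from pathlib import Path


def archive_runs(runs_dir, out_dir, name):
    """Zip `runs_dir` into `<out_dir>/<name>.zip` and return the archive path.

    Creates `out_dir` if missing and overwrites an existing archive of the same name.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" itself, so pass the base name without extension
    archive = shutil.make_archive(str(out_dir / name), "zip", root_dir=runs_dir)
    return Path(archive)
```

For example, `archive_runs("/content/trackastra/scripts/runs", os.path.join(ROOT, "trackastra_out", "trained_models"), "trackastraTrained_" + DATASET_NAME)`. Because `root_dir` is the runs folder, the run folders sit at the top level of the archive rather than under `content/trackastra/scripts/`.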

In [None]:
OUTPUT_NAME = "trackastraTrained_" + DATASET_NAME
!zip -r /content/{OUTPUT_NAME}.zip /content/trackastra/scripts/runs
!cp /content/{OUTPUT_NAME}.zip /content/drive/MyDrive/PROJECT/trackastra_out/trained_models
os.listdir("/content/drive/MyDrive/PROJECT/trackastra_out/trained_models")

updating: content/trackastra/scripts/runs/ (stored 0%)
updating: content/trackastra/scripts/runs/2025-10-24_09-32-37_example/ (stored 0%)
updating: content/trackastra/scripts/runs/2025-10-24_09-32-37_example/tb/ (stored 0%)
updating: content/trackastra/scripts/runs/2025-10-24_09-32-37_example/tb/version_0/ (stored 0%)
updating: content/trackastra/scripts/runs/2025-10-24_09-32-37_example/tb/version_0/events.out.tfevents.1761298365.a3e0370e5e7a.14992.0 (deflated 70%)
updating: content/trackastra/scripts/runs/2025-10-24_09-32-37_example/tb/version_0/hparams.yaml (stored 0%)
updating: content/trackastra/scripts/runs/2025-10-24_09-32-37_example/train_config.yaml (deflated 48%)
updating: content/trackastra/scripts/runs/2025-10-24_09-04-50_example/ (stored 0%)
updating: content/trackastra/scripts/runs/2025-10-24_09-04-50_example/tb/ (stored 0%)
updating: content/trackastra/scripts/runs/2025-10-24_09-04-50_example/tb/version_0/ (stored 0%)
updating: content/trackastra/scripts/runs/2025-10-24_0

['trackastraTrained_Fluo_C2DL_Huh7.zip']

## **4. Inference**
This section provides a pipeline to run cell tracking on a dataset with a trained Trackastra model. If ground truth is available, it also computes evaluation metrics for the tracking output.
Ensure that your Google Drive follows the structure needed for inference:

Input directories:


*   ```PROJECT/data/``` directory containing dataset to run cell tracking and ground truth for evaluation (if available)

Key output directories:


*  ```PROJECT/trackastra_out/trackastra_inference/tracking_results/``` saves cell tracking output


```
PROJECT/
├── (INPUT) data/
│   └── dataset_name (for cell tracking and evaluation)
├── (INPUT) trackastra_trained_models/
│   └── model_name (Trackastra model to be used for inference)
└── (OUTPUT) trackastra_out/
    └── trackastra_inference/
        └── tracking_results/
```

## **4.1 Setup Environment**

To begin, update the ```DATASET_NAME``` variable to the name of the dataset folder under `PROJECT/data`, and ```IMAGE_DATA``` to the subfolder containing the images (in .tif format) on which to run cell tracking.\
Next, set ```MASK_DATA``` to the folder containing the segmentation masks for these images. Because Trackastra tracks pre-segmented objects, a segmentation mask is required for every input image. Ensure that there are as many masks as input images and that they are also in .tif format.\
If segmentation masks are not available for the input data, any cell segmentation algorithm can be used to generate them. Instructions for generating segmentation masks with Cellpose are provided in the appendix.\
Finally, set ```TRAINED_MODEL``` to the location of the trained Trackastra model, relative to the PROJECT folder. If the model was trained with the script above, unzip the saved archive and provide the location of the folder that contains the model.
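Since every frame needs a mask, a quick count check before copying the data can catch mistakes early. A minimal sketch (the helper name is hypothetical); it compares counts only and does not verify that individual file names correspond:

```python
from pathlib import Path


def check_image_mask_counts(image_dir, mask_dir):
    """Raise ValueError unless both folders hold the same number of .tif files."""
    images = sorted(Path(image_dir).glob("*.tif"))
    masks = sorted(Path(mask_dir).glob("*.tif"))
    if len(images) != len(masks):
        raise ValueError(f"{len(images)} images but {len(masks)} masks; every frame needs a mask")
    return len(images)
```

After the next cell has run, `check_image_mask_counts(DATA_FOLDER, MASK_FOLDER)` returns the number of frames or raises an error describing the mismatch.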

In [None]:
import os
ROOT = "/content/drive/MyDrive/PROJECT"
DATASET_NAME = "DIC_C2DH_HeLa"  # Update to the dataset folder name under PROJECT/data
IMAGE_DATA = "02"  # Update to the folder that contains the input images (.tif)
MASK_DATA = "02_ST/SEG"  # Update to the folder that contains the segmentation masks (.tif)
TRAINED_MODEL = "trackastra_trained_models/trackastraTrained_DIC_C2DH_HeLa"  # Update to the folder that contains the trained model
DATA_FOLDER = os.path.join(ROOT, "data", DATASET_NAME, IMAGE_DATA)
MASK_FOLDER = os.path.join(ROOT, "data", DATASET_NAME, MASK_DATA)
MODEL_FOLDER = os.path.join(ROOT, TRAINED_MODEL)


Next, mount Google Drive. Authenticate with the Google account whose Drive contains the data and directories outlined above. If successful, the contents of the image folder are printed. If Google Drive was already mounted in the training section, this step can be skipped.

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

os.listdir(DATA_FOLDER)

Mounted at /content/drive


['t035.tif',
 't058.tif',
 't040.tif',
 't056.tif',
 't012.tif',
 't005.tif',
 't017.tif',
 't041.tif',
 't061.tif',
 't055.tif',
 't014.tif',
 't083.tif',
 't057.tif',
 't002.tif',
 't018.tif',
 't038.tif',
 't027.tif',
 't037.tif',
 't009.tif',
 't044.tif',
 't069.tif',
 't072.tif',
 't063.tif',
 't016.tif',
 't004.tif',
 't015.tif',
 't013.tif',
 't074.tif',
 't078.tif',
 't053.tif',
 't082.tif',
 't079.tif',
 't026.tif',
 't054.tif',
 't028.tif',
 't033.tif',
 't059.tif',
 't068.tif',
 't036.tif',
 't020.tif',
 't034.tif',
 't076.tif',
 't081.tif',
 't030.tif',
 't001.tif',
 't024.tif',
 't029.tif',
 't060.tif',
 't043.tif',
 't049.tif',
 't051.tif',
 't052.tif',
 't042.tif',
 't066.tif',
 't025.tif',
 't022.tif',
 't064.tif',
 't039.tif',
 't046.tif',
 't047.tif',
 't075.tif',
 't071.tif',
 't050.tif',
 't003.tif',
 't062.tif',
 't008.tif',
 't048.tif',
 't021.tif',
 't065.tif',
 't010.tif',
 't032.tif',
 't080.tif',
 't067.tif',
 't031.tif',
 't006.tif',
 't023.tif',
 't019.tif',

Next, set up a conda environment with konda and install the necessary dependencies. The user does not need to modify any of the following code. The installation takes around 10 minutes.

In [None]:
!pip install konda
import konda
konda.install()

!konda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
!konda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
!konda create --name trackastra python=3.10 --no-default-packages -y
!konda activate trackastra
!konda install -c conda-forge -c gurobi -c funkelab ilpy -y
!konda run "pip install \"trackastra[ilp,dev]\""
!konda run "pip install py-ctcmetrics"

Collecting konda
  Downloading konda-0.1.0-py3-none-any.whl.metadata (3.7 kB)
Downloading konda-0.1.0-py3-none-any.whl (7.3 kB)
Installing collected packages: konda
Successfully installed konda-0.1.0
Downloading Miniconda installer...
Installing Miniconda to /usr/local...
✅ Miniconda installed successfully!
Run '!conda --version' to check if conda is working.

📋 Usage examples:
  konda create -n my_env python=3.11 -y
  konda activate my_env
accepted Terms of Service for https://repo.anaconda.com/pkgs/main
accepted Terms of Service for https://repo.anaconda.com/pkgs/r
Jupyter detected...
2 channel Terms of Service accepted
Retrieving notices: done
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done


    current version: 25.7.0
    latest version: 25.9.1

Please update conda 

## **4.2 Inference**
The environment is now set up and we are ready to run inference. First, copy the trained model, input images and segmentation masks into the working directory.

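In [None]:
%cd /content
# Remove copies from earlier runs so cp does not nest folders (e.g. ./MODEL/<model_name>)
!rm -rf ./MODEL ./IMAGES ./MASKS
!cp -r {MODEL_FOLDER} ./MODEL
!cp -r {DATA_FOLDER} ./IMAGES
!cp -r {MASK_FOLDER} ./MASKS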

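Finally, run the inference script with the trained model, input images and segmentation masks. The output is a set of .tif label images containing the tracked cells, together with a track file. Trackastra writes this file as man_track.txt; the cell below renames it to res_track.txt, the file name used for results in the Cell Tracking Challenge format. Each line of the file describes one track: its ID, first frame, last frame and parent track. The results are then copied to the user's Google Drive.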
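In the Cell Tracking Challenge format, each line of the track file holds four integers `L B E P`: the track label, the first and last frame of the track, and the parent label (0 if the track has no parent). A minimal parser sketch (function names are hypothetical) that summarises tracks and divisions; the example string is made-up toy data, not output from this notebook:

```python
from collections import Counter


def read_ctc_tracks(text):
    """Parse CTC track-file text into a list of (label, begin, end, parent) tuples."""
    tracks = []
    for line in text.splitlines():
        if line.strip():
            label, begin, end, parent = map(int, line.split())
            tracks.append((label, begin, end, parent))
    return tracks


def count_divisions(tracks):
    """Count parent tracks with at least two daughter tracks (one division each)."""
    children = Counter(parent for _, _, _, parent in tracks if parent != 0)
    return sum(1 for n in children.values() if n >= 2)


example = "1 0 10 0\n2 11 20 1\n3 11 18 1\n4 0 20 0\n"  # toy tracks: track 1 divides into 2 and 3
tracks = read_ctc_tracks(example)
print(len(tracks), "tracks,", count_divisions(tracks), "division(s)")  # 4 tracks, 1 division(s)
```

On real output, the input would be `Path("/content/results/res_track.txt").read_text()` after the cell below has run.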

In [None]:
#run the trackastra model with DIC-C2DH-HeLa dataset as test dataset
import os
os.environ["MPLBACKEND"] = "Agg"

!konda run 'trackastra track --model-custom "./MODEL" -i "./IMAGES" -m "./MASKS" --output-ctc "/content/results"'

#in the results folder generated, rename man_track.txt to res_track.txt
!mv /content/results/man_track.txt /content/results/res_track.txt

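# Copy results to Google Drive
!mkdir -p /content/drive/MyDrive/PROJECT/trackastra_out/trackastra_inference/tracking_results
# Copy the contents of /content/results (not the folder itself) so res_track.txt sits directly in tracking_results
!cp -r /content/results/. /content/drive/MyDrive/PROJECT/trackastra_out/trackastra_inference/tracking_results/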

INFO:trackastra.model.model:Loading model state from /content/MODEL/model.pt
INFO:trackastra.model.model_api:Using device cuda
INFO:trackastra.model.model_api:Default batch size = 4 for model on cuda.
INFO:trackastra.model.model_api:Predicting weights for candidate graph
INFO:trackastra.data.wrfeat:Extracting features from 84 detections
INFO:trackastra.data.wrfeat:Using single process for feature extraction
Extracting features: 100% 84/84 [00:01<00:00, 72.09it/s]
INFO:trackastra.model.model_api:Building windows
Building windows: 100% 79/79 [00:00<00:00, 6329.15it/s]
INFO:trackastra.model.model_api:Predicting windows
Computing associations: 100% 20/20 [00:01<00:00, 18.86it/s]
INFO:trackastra.model.model_api:Running greedy tracker
INFO:trackastra.tracking.tracking:Build candidate graph with delta_t=1
INFO:trackastra.tracking.tracking:Added 988 vertices, 959 edges
INFO:trackastra.tracking.tracking:Running greedy tracker
Greedily matched edges: 100% 956/959 [00:00<00:00, 149083.68it/s]
Con

## **4.3 Benchmarking**
For the final step, we evaluate the tracking results produced above. Several benchmarking metrics are reported; the most useful are TRA (tracking accuracy) and DET (detection accuracy). Set ```GROUND_TRUTH``` to the location of the dataset's ground truth to evaluate the accuracy of the tracking results.
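TRA is derived from the Acyclic Oriented Graph Matching (AOGM) measure: the weighted number of graph edits needed to turn the predicted tracking graph into the ground truth, normalised by `AOGM_0`, the cost of building the ground-truth graph from scratch, so that TRA = 1 - min(AOGM, AOGM_0) / AOGM_0. The sketch below uses the standard Cell Tracking Challenge weights (split 5, false negative 10, false positive 1, edge deletion 1, edge addition 1.5, edge semantics change 1). Fed with the error counts from the evaluation output of this section, it reproduces the reported AOGM of 593.5 and TRA of about 0.9498:

```python
# Standard Cell Tracking Challenge weights for the AOGM error types
AOGM_WEIGHTS = {"NS": 5.0, "FN": 10.0, "FP": 1.0, "ED": 1.0, "EA": 1.5, "EC": 1.0}


def aogm(errors, weights=AOGM_WEIGHTS):
    """Weighted sum of graph-edit counts (splits, FN/FP nodes, deleted/added/changed edges)."""
    return sum(weights[k] * errors.get(k, 0) for k in weights)


def tra_score(aogm_value, aogm_0):
    """TRA = 1 - min(AOGM, AOGM_0) / AOGM_0, so TRA lies in [0, 1]."""
    return 1.0 - min(aogm_value, aogm_0) / aogm_0


# Error counts copied from the evaluation output in this section (AOGM_NS, AOGM_FN, ...)
errors = {"NS": 0, "FN": 50, "FP": 0, "ED": 4, "EA": 59, "EC": 1}
value = aogm(errors)
print(value, round(tra_score(value, 11816.5), 4))  # 593.5 0.9498
```

Here the 50 missed detections (weight 10) dominate the error budget, which suggests that improving segmentation recall would raise TRA more than fixing the few linking errors.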

In [None]:
GROUND_TRUTH = "02_GT" #Update to location of ground truth
gt_folder = os.path.join(ROOT, "data", DATASET_NAME, GROUND_TRUTH)
!rm -rf ./ground_truth && cp -r {gt_folder} ./ground_truth
!konda run 'ctc_evaluate --gt "./ground_truth" --res "/content/drive/MyDrive/PROJECT/trackastra_out/trackastra_inference/tracking_results"'

Evaluate sequence:  /content/drive/MyDrive/PROJECT/trackastra_out/files_used_in_ctc_evaluation/tracking_results  with ground truth:  ./ground_truthwith results:  {'Valid': 1, 'CHOTA': np.float64(0.9260586709067992), 'BC': None, 'CT': 0.7142857142857143, 'CCA': None, 'TF': np.float64(0.9005944252899934), 'SEG': 0.954924697420571, 'TRA': 0.9497736216307705, 'DET': 0.9514563106796117, 'MOTA': np.float64(0.941747572815534), 'HOTA': np.float64(0.9491777632565629), 'IDF1': np.float64(0.9472636815920398), 'MTML': None, 'FAF': 0.0, 'LNK': 0.938344873062974, 'OP_CTB': 0.9523491595256708, 'OP_CSB': 0.9531905040500913, 'BIO': None, 'OP_CLB': None, 'AOGM': np.float64(593.5), 'AOGM_0': np.float64(11816.5), 'AOGM_NS': np.int64(0), 'AOGM_FN': np.int64(50), 'AOGM_FP': np.int64(0), 'AOGM_ED': 4, 'AOGM_EA': np.int64(59), 'AOGM_EC': 1, 'gt_divisions': 5, 'tp_div(0)': 5, 'fp_div(0)': 2, 'fn_div(0)': 0, 'BC(0)': 0.8333333333333333, 'tp_div(1)': 5, 'fp_div(1)': 2, 'fn_div(1)': 0, 'BC(1)': 0.8333333333333333