# DaCapo

DaCapo is a framework that allows for easy configuration and execution of established machine learning techniques on arbitrarily large volumes of multi-dimensional images.

DaCapo has 4 major configurable components:
1. **dacapo.datasplits.DataSplit**

2. **dacapo.tasks.Task**

3. **dacapo.architectures.Architecture**

4. **dacapo.trainers.Trainer**

These are then combined in a single **dacapo.experiments.Run**, which also specifies your starting point (whether to train from scratch or continue from a previously trained model) and your stopping criterion (the number of iterations to train for).
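To make this composition concrete, here is a minimal plain-Python sketch of what a run bundles together: the four components, an optional starting point, and a stopping criterion. `RunSketch` and `StartSketch` are hypothetical classes written for illustration only. They are not DaCapo's `Run` API, and their field names are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class StartSketch:
    """Hypothetical starting point: a previous run and which of its checkpoints to load."""
    run_name: str
    criterion: str  # e.g. "best"


@dataclass(frozen=True)
class RunSketch:
    """Hypothetical container mirroring the description above (not DaCapo's class)."""
    datasplit: Any
    task: Any
    architecture: Any
    trainer: Any
    num_iterations: int                  # stopping criterion
    start: Optional[StartSketch] = None  # None means "train from scratch"

    def __post_init__(self):
        if self.num_iterations <= 0:
            raise ValueError("num_iterations must be positive")


# Train from scratch vs. continue from a previously trained model.
scratch = RunSketch("datasplit", "task", "architecture", "trainer", num_iterations=200000)
finetune = RunSketch(
    "datasplit", "task", "architecture", "trainer",
    num_iterations=200000,
    start=StartSketch("setup04", "best"),
)
```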

## Environment setup
If you have not already done so, you will need to install DaCapo. You can do this by first creating a new environment and then installing DaCapo using pip.

```bash
conda create -n dacapo python=3.10
conda activate dacapo
```

Then install DaCapo with pip, either directly from GitHub:

```bash
pip install git+https://github.com/janelia-cellmap/dacapo.git
```

Or you can clone the repository and install it locally:

```bash
git clone https://github.com/janelia-cellmap/dacapo.git
cd dacapo
pip install -e .
```

Be sure to select this environment as the kernel in your Jupyter notebook or JupyterLab.

## Datasplit
Where can you find your data? What format is it in? Does it need to be normalized? What data do you want to use for validation?

```python
from dacapo import DataConfig

data_config = DataConfig(
    # an optional name can also be given here
    datasets=[
        {"role": "train", "raw": "dsb_nuclei.zarr/train/raw", "gt": "dsb_nuclei.zarr/train/gt"},
        {"role": "validate", "raw": "dsb_nuclei.zarr/validate/raw", "gt": "dsb_nuclei.zarr/validate/gt"},
    ],
    # the config is stored when it is initialized
)
```
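Whether raw data needs normalizing depends on its dtype and intensity range. The NumPy sketch below does a linear rescale to [0, 1] with clipping. It assumes you normalize the data yourself, because no DaCapo normalization option is shown here. `normalize_raw` is a hypothetical helper.

```python
import numpy as np


def normalize_raw(raw, low=None, high=None):
    """Linearly rescale raw intensities to [0, 1], clipping values outside [low, high].

    If low/high are not given, the array's own min/max are used.
    """
    raw = np.asarray(raw, dtype=np.float32)
    low = raw.min() if low is None else low
    high = raw.max() if high is None else high
    if high <= low:
        raise ValueError("high must be greater than low")
    return np.clip((raw - low) / (high - low), 0.0, 1.0)


# e.g. 8-bit raw data, normalized with its full dtype range
normalized = normalize_raw(np.array([0, 128, 255], dtype=np.uint8), low=0, high=255)
```

Using a fixed range (such as the dtype's range or dataset-wide percentiles) instead of per-array min/max keeps intensities comparable between blocks of a large volume.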


## Task
What do you want to learn? An instance segmentation? If so, how: via affinities,
a distance transform, foreground/background, or something else? Each of these tasks is commonly learned
and evaluated with specific loss functions and evaluation metrics. Some tasks may
also require specific non-linearities or output formats from your model.

```python
from dacapo import TaskConfig

task_config = TaskConfig(
    target="semantic",
    classes=2,  # overrides num_classes from the zarr metadata; if omitted, it is inferred from the metadata
    # optionally, a dict mapping label ids to class names
)
```
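To show what two of the targets mentioned above look like, here is a minimal NumPy sketch. It derives a foreground/background mask and nearest-neighbour affinities from a 2D instance label image. The helper names are hypothetical. The affinity convention (each pixel is compared with its neighbour at `p - offset`, and background never has affinity) is one common choice assumed here, and DaCapo's own task implementations may differ.

```python
import numpy as np


def foreground_mask(labels):
    """Foreground/background target: 1 wherever any object label is present."""
    return (labels > 0).astype(np.uint8)


def affinities(labels, offsets=((1, 0), (0, 1))):
    """Nearest-neighbour affinities for a 2D instance label image.

    Channel k is 1 at pixel p when p and p - offsets[k] belong to the same
    non-background object, and 0 otherwise (including at the image border).
    Offsets are assumed to be non-negative.
    """
    affs = np.zeros((len(offsets),) + labels.shape, dtype=np.uint8)
    for k, (dy, dx) in enumerate(offsets):
        current = labels[dy:, dx:]
        neighbour = labels[: labels.shape[0] - dy, : labels.shape[1] - dx]
        affs[k, dy:, dx:] = (current == neighbour) & (current > 0)
    return affs


labels = np.array([
    [1, 1, 0],
    [1, 2, 2],
    [0, 2, 2],
])
fg = foreground_mask(labels)
affs = affinities(labels)
```

Note that the foreground mask cannot separate touching objects 1 and 2, while the affinities are 0 along their shared boundary. This is why affinities are a common choice for instance segmentation.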

## Architecture

This config defines the network you will train. Biomedical image-to-image translation often uses a UNet, but even after choosing a UNet you still need to set some additional parameters: how much do you want to downsample, and how many convolutional layers do you want?

```python
# optional
# from dacapo import ArchitectureConfig

# architecture_config = ArchitectureConfig()
```
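The downsampling and convolution choices determine which input sizes are valid and how much the output shrinks. The sketch below computes the output size along one axis. It assumes unpadded ("valid") convolutions, a fixed number of convolutions per level, and upsampling by the same factors used for downsampling. `unet_output_size` is a hypothetical helper, not part of DaCapo.

```python
def unet_output_size(input_size, downsample_factors, convs_per_level=2, kernel_size=3):
    """Output size (one spatial axis) of a UNet with unpadded ('valid') convolutions."""
    shrink = convs_per_level * (kernel_size - 1)  # pixels lost per level
    size = input_size
    # Downsampling path: convolve, then the size must divide evenly by the factor.
    for factor in downsample_factors:
        size -= shrink
        if size <= 0 or size % factor != 0:
            raise ValueError(f"size {size} cannot be downsampled by {factor}")
        size //= factor
    size -= shrink  # bottleneck convolutions
    # Upsampling path: upsample, then convolve (skip connections are cropped to match).
    for factor in reversed(downsample_factors):
        size = size * factor - shrink
    if size <= 0:
        raise ValueError("input too small for this architecture")
    return size
```

With four 2x downsampling steps and two 3x3 convolutions per level, `unet_output_size(572, [2, 2, 2, 2])` returns 388, which matches the tile sizes of the original U-Net paper. An input of 571 is rejected because 567 cannot be halved evenly.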

## Trainer

How do you want to train? This config defines the training loop and how the other three components work together: which augmentations to apply during training, which learning rate and optimizer to use, and what batch size to train with.

```python
# optional
# from dacapo import TrainerConfig

# trainer_config = TrainerConfig()
```
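Geometric augmentations need care: any flip or rotation must be applied identically to the raw data and the ground truth, or the labels will no longer line up with the image. The NumPy sketch below shows this. It assumes arrays whose axes are all spatial (at least 2D). `augment_pair` is a hypothetical helper, not DaCapo's augmentation API.

```python
import numpy as np


def augment_pair(raw, gt, rng):
    """Apply the same random flips and 90-degree rotation to a raw/gt pair."""
    # Random flip along each spatial axis, shared by raw and gt.
    for axis in range(raw.ndim):
        if rng.random() < 0.5:
            raw = np.flip(raw, axis=axis)
            gt = np.flip(gt, axis=axis)
    # Random rotation by k * 90 degrees in the plane of the first two axes.
    k = int(rng.integers(4))
    raw = np.rot90(raw, k=k, axes=(0, 1))
    gt = np.rot90(gt, k=k, axes=(0, 1))
    return raw, gt


rng = np.random.default_rng(0)
raw = np.arange(12).reshape(3, 4)
gt = raw * 10  # toy ground truth that stays aligned with raw
raw_aug, gt_aug = augment_pair(raw, gt, rng)
```

Intensity augmentations (noise, contrast changes), by contrast, should be applied to the raw data only.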

## Run
Now that we have our components configured, we just need to combine them into a run and start training. We can have multiple repetitions of a single set of configs in order to increase our chances of finding an optimum.

```python
# optional: start from a previously trained model instead of from scratch
# from dacapo import StartConfig

# start_config = StartConfig(
#     "setup04",
#     "best",
# )

from dacapo import Experiment

# positional arguments must come before keyword arguments
experiment = Experiment(
    data_config,
    task_config,
    # architecture_config,
    # trainer_config,
    # start_config,
    name="dsb_nuclei",  # intended: must be unique unless overwriting; can also be used to load the experiment
    num_iterations=200000,
    # num_repetitions=1,
    # validation_interval=5000,
    # overwrite=False,  # set to True to overwrite any previous experiment with the same name (data and models included)
)
# experiment = Experiment.load("dsb_nuclei")
```
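Repetitions help because training is stochastic: runs with identical configs can end up in different local optima. The toy sketch below (plain Python, not DaCapo) runs gradient descent on a one-dimensional loss with several local minima, starting from three seed-dependent points, and keeps the best result. `toy_loss` and `toy_train` are hypothetical stand-ins for a real training run.

```python
import math
import random


def toy_loss(x):
    """A 1D loss with several local minima (stand-in for a network's loss surface)."""
    return math.sin(3 * x) + 0.1 * x * x


def toy_train(seed, steps=500, lr=0.01):
    """Gradient descent from a seed-dependent starting point; returns (x, loss)."""
    x = random.Random(seed).uniform(-5.0, 5.0)
    for _ in range(steps):
        grad = 3 * math.cos(3 * x) + 0.2 * x  # derivative of toy_loss
        x -= lr * grad
    return x, toy_loss(x)


# Three "repetitions" of the same config, differing only in their random seed.
results = {seed: toy_train(seed) for seed in range(3)}
best_seed = min(results, key=lambda s: results[s][1])
```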



## Train

To train the runs in the experiment, call `run()` on it directly from Python:

```python
# intended behavior: train in the current compute context,
# resume any unfinished runs, and report on runs that are already done
experiment.run()

# or: experiment.report()
```

If you want to run training on a compute cluster, you may prefer the command-line interface: `dacapo train -r {run_config.name}`. This is convenient on compute nodes, where you can request specific compute resources for the job.

## Apply

Once you have trained your model, you can use it to make predictions on new data. 

```python
import dacapo

model = experiment.get_model(
    # criterion=task.default_criterion(),
    # validation_dataset=data.get_datasets("validate"),
)

input_path = "dsb_nuclei.zarr/test/raw"
output_path = "dsb_nuclei.zarr/test/pred"

# dacapo.predict(model, input_path, output_path)  # predicts only; doesn't post-process
dacapo.apply(model, input_path, output_path)
```
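DaCapo targets arbitrarily large volumes, so prediction is typically done block by block rather than on the whole array at once. The NumPy sketch below illustrates the idea under stated assumptions. `predict_blockwise` and `mean3x3_valid` are hypothetical. The "model" is any function that shrinks its input by `context` pixels per side, like an unpadded network. The volume fits in memory here only to keep the example short.

```python
import itertools

import numpy as np


def predict_blockwise(volume, predict_fn, block_shape, context):
    """Apply predict_fn block by block and stitch the results.

    predict_fn must map an input of size (s + 2 * context) per axis to an
    output of size s per axis, i.e. a 'valid' operation.
    """
    padded = np.pad(volume, context, mode="reflect")
    output = np.empty(volume.shape, dtype=np.float64)
    starts = [range(0, n, b) for n, b in zip(volume.shape, block_shape)]
    for start in itertools.product(*starts):
        # Read the block plus `context` pixels on every side (slices clip at the edge).
        read = tuple(slice(s, s + b + 2 * context) for s, b in zip(start, block_shape))
        write = tuple(slice(s, s + b) for s, b in zip(start, block_shape))
        output[write] = predict_fn(padded[read])
    return output


def mean3x3_valid(x):
    """Toy 2D 'model': 3x3 mean filter without padding (shrinks by 1 pixel per side)."""
    h, w = x.shape[0] - 2, x.shape[1] - 2
    return sum(x[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


rng = np.random.default_rng(0)
volume = rng.random((100, 70))
prediction = predict_blockwise(volume, mean3x3_valid, block_shape=(32, 32), context=1)
```

Reading each block with `context` extra pixels on every side makes the stitched result identical to processing the whole (padded) volume in one piece, with no seams at block boundaries.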