Training Dataset

Train TaLiSSman

Input data

Transmitted Light Stack

For prediction we recommend stacks of 5 slices with a 0.2µm step, in the range [-0.6µm, -1.4µm] (relative to the focal plane). For the training procedure, however, we advise acquiring stacks of about 25 slices with a 0.1µm step in the range [-0.1µm, -2µm]. This improves robustness to loss of focus and to defects in the flatness of the agar pad, thanks to a data augmentation step that randomly selects 5 slices with a 0.2µm step from within the 25 slices (see the sketch below).
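
For intuition, here is a minimal sketch of this augmentation in Python/NumPy (an illustration, not BACMMAN's actual implementation): from a 25-slice stack acquired with a 0.1µm step, it randomly picks 5 slices spaced 2 slices (i.e. 0.2µm) apart.

```python
import numpy as np

def sample_substack(stack, n_out=5, step=2, rng=None):
    """Randomly pick n_out slices spaced `step` slices apart.

    With a 0.1 µm acquisition step, step=2 yields the 0.2 µm
    spacing used by the 5-slice stacks at prediction time.
    """
    rng = np.random.default_rng() if rng is None else rng
    span = (n_out - 1) * step                       # z-extent of the sub-stack
    start = rng.integers(0, stack.shape[0] - span)  # random focus offset
    return stack[start : start + span + 1 : step]

# example: a 25-slice stack of 512x512 images
stack = np.zeros((25, 512, 512), dtype=np.float32)
sub = sample_substack(stack)  # shape: (5, 512, 512)
```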

Labels

The TaLiSSman network is trained in a supervised way, which means labeled images (i.e. images with one distinct integer value per bacterium) are required. We recommend using a constitutive cytoplasmic fluorescent marker that can be segmented with a rule-based algorithm (e.g. as in bacmman's example dataset2 configuration) and manually curated.

In this tutorial we will assume that this fluorescence channel is available, as well as a method to segment it.
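
As a quick sanity check, a label image can be inspected in Python; this sketch (assuming a 2D integer array where 0 is background) simply verifies that labels are distinct positive integers and counts the bacteria.

```python
import numpy as np

def check_labels(label_img):
    """Check a label image: 0 = background, one distinct
    positive integer per bacterium."""
    assert np.issubdtype(label_img.dtype, np.integer), "labels must be integers"
    labels = np.unique(label_img)
    labels = labels[labels > 0]  # drop the background value
    print(f"{labels.size} bacteria, labels: {labels}")
    return labels

# example: two bacteria on a tiny viewfield
img = np.zeros((8, 8), dtype=np.int32)
img[1:3, 1:4] = 1
img[5:7, 2:6] = 2
check_labels(img)  # -> 2 bacteria, labels: [1 2]
```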

Generate Training set with BACMMAN

Note that all these steps are also detailed in the article Ollion et al., Nature Protocols, 2019. We recommend reading it, in particular to get familiar with the notions used in bacmman (channel image, object class, pre-processing, processing pipeline, selections, etc.).

Install bacmman with tensorflow 2 wrapper

  • Follow instructions in this page to install BACMMAN.
  • In the Deep Learning section, follow the instructions for Tensorflow 2.x with CPU support (GPU support is for advanced users)
  • Start BACMMAN from the menu Plugins > BACMMAN > BACteria in Mother Machine ANalyzer

Configuration

  • If not already done, from the Home tab: right-click on Working Directory and choose a folder that will contain the data.
  • Create a new dataset using the provided configuration template:
  • From the menu Dataset select New dataset from Github
  • In the github credentials panel, enter jeanollion as username (first text area) and click Connect
  • Choose the configuration: TaLiSSman > training
  • Click Ok.
  • This configuration includes a deep learning denoising step. To download the model weights:
  • From the Import menu choose DL Model Library
  • If necessary, enter jeanollion as username in the github credentials and press enter
  • Unfold the item TaLiSSman > bacteria denoising and double-click on the link. If a web browser does not open automatically, the link is copied to the clipboard; paste it into a web browser.
  • Download and unzip the weights into a folder named DLModels located in the Working Directory previously set.
  • Close the DL Model Library

Note: in this example, the configuration template is stored on a github account, which allows management of configuration templates; for further details see the documentation on the github library.

Import data and run

  • Download the example dataset and unzip it.
  • Import images using the command Run > Import/re-link Images and select the folder containing the images. The 4 imported positions will be displayed in the Position panel of the Home tab.
  • Run Pre-Processing and Segmentation: in the Home tab, select Pre-Processing and Segmentation and Tracking in the Tasks panel, and use the command Run > Run Selected Tasks

Notes:

  • If the image format differs, refer to the documentation to configure the import method.
  • The configuration template includes a pre-processing step that selects a few slices, that should be configured according to your data.
  • This step allows to select 20 slices on one side of the focus. The neural network will use only 5 slices with 0.2µm step in the end but providing more slices for training step allows to include a data augmentation step that improves robustness to imprecision on focus or lack of flatness of the sample.
  • Note that in this dataset the first slice is always brighter and should be removed.
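
In NumPy terms, the intent of this pre-processing amounts to something like the following sketch (assuming z is the first axis of the stack; the actual step is configured in the BACMMAN interface):

```python
# drop the first (over-bright) slice, then keep the next
# 20 slices on one side of the focus (axis 0 = z)
stack = stack[1:21]
```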

Display Results

From the Data Browsing tab:

  • Right-click on a position, select Open Kymograph > BacteriaFluo.
  • The position will be displayed and segmented objects can be selected interactively.
  • Press ctrl + A to display all segmented objects.

Manual Curation

In this step, we will correct segmentation errors manually:

  • False positives are erased
  • False negatives are created
  • Merged bacteria are split
  • Over-segmented bacteria are merged

Export Training Set

Create Selections

Selections are sub-populations of segmented objects. To export the training dataset, we will create two distinct selections, one for training and one for validation. The validation set should represent around 25% of the total dataset, and the two selections must be mutually exclusive (a sketch after the steps below illustrates this split). These selections will contain the parent objects of the bacteria (i.e. the segmented objects that contain the bacteria), in this case the whole viewfield.

To do so, from the Segmentation & Tracking Results panel:

  1. Select all the positions that will be included in the training selection.
  2. From the right-click menu, choose Create Selection > Viewfield. A selection named Viewfield will be created (if a selection with the same name already exists, it will be overwritten) and displayed in the Selections panel.
  3. Right-click on the selection, choose duplicate and enter train as the selection name (this name must match the training_selection_name parameter of the training notebook).
  4. Repeat steps 1, 2 and 3 for the validation selection (name it eval; this name must match the validation_selection_name parameter of the training notebook).
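
For illustration, the same split can be expressed programmatically; this sketch (with hypothetical position names) partitions positions into mutually exclusive train/eval groups with roughly 25% held out for validation.

```python
import random

positions = ["Position0", "Position1", "Position2", "Position3"]  # hypothetical names
random.seed(0)  # reproducible split
random.shuffle(positions)
n_eval = max(1, round(0.25 * len(positions)))  # ~25% for validation
eval_positions = positions[:n_eval]
train_positions = positions[n_eval:]
assert not set(eval_positions) & set(train_positions)  # mutually exclusive
```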

Export Dataset

This step generates a single HDF5 file containing both the transmitted-light stacks and the labeled images for all the positions included in the train and eval selections defined above.

From the Home panel:

  • Right-click in the Tasks to execute panel and choose Add new dataset extraction Task to List
  • Configure the extraction task as follows:
    • Right-click on the different items of the right panel to modify them.
    • The first extracted feature corresponds to the transmitted-light stack, so select the object class associated with the corresponding channel.
    • The second one corresponds to the labeled object class, so select the object class with the segmented bacteria.
    • Choose an output file that does not exist yet. Note that on macOS, one may need to create an empty file first and select it.
  • Click OK
  • Right-click in the Tasks to execute panel and choose Run all Tasks
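
Once the task has run, the exported file can be sanity-checked with h5py. The internal layout depends on the extraction settings, so this sketch (with a hypothetical file name) simply lists every dataset with its shape and type:

```python
import h5py

with h5py.File("training_set.h5", "r") as f:  # the output file chosen above
    f.visititems(lambda name, obj: print(name, obj.shape, obj.dtype)
                 if isinstance(obj, h5py.Dataset) else None)
```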

Train TaLiSSman

  • In this tutorial, training is performed on Google Colab, a service that provides free GPU access for a limited time; a Google account is therefore required.
  • The previously generated dataset should be uploaded to Google Drive and shared publicly (see the download sketch after this list).
  • Follow this link to open the notebook.
  • From the File menu choose Save a copy in Drive
  • Follow instructions on the notebook to train a TaLiSSman network
  • In the Export section, the commands export the trained weights to a zip file; download the zip file and extract it.
  • See this tutorial to use TaLiSSman in BACMMAN.
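
As an example of retrieving the shared dataset from inside the notebook (assuming the file ID from the public Google Drive sharing link; gdown is available on Colab):

```python
import gdown

# FILE_ID is the id part of the public Google Drive sharing link (hypothetical placeholder)
file_id = "FILE_ID"
gdown.download(f"https://drive.google.com/uc?id={file_id}", "dataset.h5", quiet=False)
```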