Required parameters for COMnet

Timothy Dunn edited this page Nov 5, 2020 · 3 revisions

If using the COMnet, there are a few key parameters that must be specified in the COMnet base config (see ./configs/com_mouse_config.yaml for an example).

io_config

string. This is the path, relative to where you are calling com-train or com-predict, to the project-specific io.yaml file that defines paths to required files, such as output directories and labeled data.

💡 This file is often the same one used by DANNCE. In most cases you will be launching com-train or com-predict from within a project directory that contains an io.yaml file, so simply leaving this as io_config: io.yaml will point DANNCE to the correct file every time.
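As a sketch, the corresponding line in a COMnet base config might look like the following (the path is illustrative):

```yaml
# Hypothetical excerpt from a COMnet base config (e.g. ./configs/com_mouse_config.yaml).
# The path is resolved relative to where com-train or com-predict is launched.
io_config: io.yaml
```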

batch_size

int. This is the number of timepoints that will be included in a batch of samples during training and prediction. This value is constrained by the amount of GPU memory you can access. Because this is in units of timepoints, not frames, the total number of images in the batch is equal to batch_size*(no. of cameras).

💡For prediction, increasing batch_size may increase prediction speed. For training, batch_size can be tuned as a hyperparameter. The effect of batch size on training is an open area of research in deep learning, although there is agreement that the optimal batch size is problem- and data-specific. For the COMnet, the number of images you can fit into memory depends on more parameters than for DANNCE: increasing the number of cameras decreases the achievable batch size, while increasing downfac increases it by shrinking the image dimensions. In the same way, increases and decreases in native image resolution also affect the achievable batch size.
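The relationship between batch_size (in timepoints) and the number of images actually loaded per batch can be sketched as follows; the camera count and batch size here are made-up example values, not recommendations:

```python
# Total images per batch = batch_size (timepoints) x number of cameras.
# The values below are illustrative only.
batch_size = 4   # timepoints per batch
n_cameras = 6    # cameras in the rig

images_per_batch = batch_size * n_cameras
print(images_per_batch)  # 24
```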

epochs

int. This is the number of times your full dataset will be looped over during training.

💡You should need fewer epochs as you increase the number of images in your training dataset. With ~100 timepoints in the training set, 100-200 epochs is probably sufficient. Inspect your training and validation loss in your TensorBoard logs. If using a validation set, you should continue to train if the validation loss has not yet plateaued.

downfac

int (even). This is the factor by which your images will be downsampled before getting passed through the network.

💡 The more you downsample your images (the higher downfac), the faster training and prediction will be. Because you only need a coarse estimate of the animal's position from the COMnet, downsampling doesn't typically degrade performance noticeably. We typically use downfac: 4.
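To get a feel for how downfac shrinks the input, here is a small sketch; the raw resolution used is a made-up example:

```python
# downfac divides each image dimension before the image is passed to the network.
# The 1024 x 1280 raw resolution below is illustrative.
raw_height, raw_width = 1024, 1280
downfac = 4

net_height = raw_height // downfac
net_width = raw_width // downfac
pixel_reduction = downfac ** 2  # memory usage scales roughly with pixel count

print(net_height, net_width, pixel_reduction)  # 256 320 16
```

With downfac: 4, each batch therefore needs roughly 16x less image memory than at native resolution, which is why higher downfac values allow larger batch sizes.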

num_validation_per_exp

int. Number of samples (i.e. timepoints; per animal) used for assessing validation metrics during training.

💡The more validation samples you use, the more reliably you can assess the accuracy of your models during training, thus allowing you to better determine at which epoch you should stop training, or which of several alternative models will track better. However, the more validation samples you use, the fewer samples remain for the actual training. We recommend using num_validation_per_exp: 0 unless you are using over 100 samples per animal.

max_num_samples

int. During prediction, this sets the total number of timepoints evaluated.

com_finetune_weights

string. If finetuning, this must be the path to the directory containing the pretrained weights file.
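Putting several of the parameters above together, a hypothetical COMnet base config excerpt might look like the following; all values and paths are illustrative examples, not recommendations:

```yaml
# Hypothetical COMnet base config excerpt. Values and paths are examples only.
io_config: io.yaml
batch_size: 4
epochs: 150
downfac: 4
num_validation_per_exp: 0
max_num_samples: 1000                      # used during prediction only
com_finetune_weights: ./weights/COM_pretrained/   # directory containing the weights file
```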

crop_height

[top_pixel, bottom_pixel] (int). An integer array setting the bounds of the input images (at raw resolution) in terms of pixel position. See crop_width.

crop_width

[left_pixel, right_pixel] (int). An integer array setting the bounds of the input images (at raw resolution) in terms of pixel position.

💡 The COMnet expects the size of each image dimension to be a multiple of 32, so raw images can be cropped using these parameters if necessary.
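A quick way to sanity-check crop values against the multiple-of-32 constraint is sketched below; the crop bounds are made-up example values:

```python
# Check that the cropped image dimensions are multiples of 32, as the COMnet
# expects. The crop bounds below are illustrative only.
crop_height = [0, 1024]   # [top_pixel, bottom_pixel]
crop_width = [64, 1344]   # [left_pixel, right_pixel]

height = crop_height[1] - crop_height[0]
width = crop_width[1] - crop_width[0]

assert height % 32 == 0 and width % 32 == 0, "cropped dims must be multiples of 32"
print(height, width)  # 1024 1280
```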