<font size = 6>*GMOdetector notebook* </font><br>
**Template to analyze a new batch of images** (v.0.7.2)

In this workflow, images taken with the macroPhor Array dual RGB/hyperspectral imaging platform are analyzed by a workflow in which regression quantifies fluorescent signals in hyperspectral images, deep learning segments RGB images into different tissues, and these datasets are cross-referenced to produce statistics on growth of transgenic callus and shoot.

# Experiment ID and description

<div class="alert alert-block alert-success">
Provide a short description of the experiment in the below box. This should include unique identifier codes for the experiment, along with a short description of genotypes and treatments studied. The timepoint should also be included. </div>

# Parameters for analysis

<div class="alert alert-block alert-success">
The below variables must be modified appropriately every time this workflow is run over new images.
</div>

## Data location
The `data` variable below provides the **complete** path to the folder containing data to be analyzed. This should include all folders and subfolders in which the data of interest is organized by. For the organizational system used for our lab's data, this should follow the format "/Experiment/Subexperiment/Timepoint/"

In [3]:
data="/mnt/Elements_22/GTNEC_GWAS_poplar_transformation_necrotic_test/day1/"

## Sample information
Every experiment has a spreadsheet of metadata to organize treatment and genotype information for each plate, prepare labels, and randomize plates. [For details, see tutorial on preparing this spreadsheet](https://github.com/naglemi/GMOnotebook/blob/master/1_Decide_parameters/1_Metadata_and_randomization/1-Generate_randomization_scheme.ipynb).

In [4]:
randomization_datasheet="/mnt/Elements_22/GTNEC_GWAS_poplar_transformation_necrotic_test/GTNEC_labels.xlsx"

In [5]:
segmentation_mode="hyperspectral"

In [6]:
unregenerated_tissues="Background Stem Necrotic"

In [None]:
grid=12 # 12 or 20

## Detection of missing or contaminated explants

Set the `missing_explants` variable to `"Automatic"` or to the path of manually prepared data file. [For details, see this tutorial and example file](https://github.com/naglemi/GMOnotebook/tree/master/1_Decide_parameters/3_Other_parameters).

Note: Our automatic missing explant detection model is only trained for poplar.

In [7]:
missing_explants="None"

## Computing weights for fluorescent proteins

[See this notebook for details on all below fluorescent protein settings.](https://github.com/naglemi/GMOnotebook/blob/master/1_Decide_parameters/3_Other_parameters/3_Hyperspectral_settings.ipynb)

In [8]:
# ALL known fluorescent components in the sample should be included.
# Library has DsRed, ZsYellow, GFP, Chl, ChlA, ChlB, Noise
fluorophores=(GFP Chl Noise) # Order doesn't matter here. Names must match library.
desired_wavelength_range=(500 900) # (first last), e.g. (500 900)

### Producing false-color plots for fluorescent proteins

In [9]:
FalseColor_channels=(Chl GFP Noise) # (Red Green Blue), e.g. (Chl GFP Noise)
FalseColor_caps=(200 200 200) # (Red Green Blue); recommend 400 for reporters, 200 for others; e.g. (200 400 200)

### Producing summary statistics for fluorescent proteins

In [10]:
reporters=(GFP Chl) # Will compute summary stats for these proteins, e.g. (GFP) or (GFP Chl)
pixel_threshold=3 # If this many pixels... (recommended: 3)
reporter_threshold=38 # ...have this much signal (recommended: 38), then the tissue is "Positive"

### Cropping and alignment settings

If you wish to do alignment manually, follow [these tutorials](https://github.com/naglemi/GMOnotebook/tree/master/1_Decide_parameters/2_Align_and_crop_parameters) and change the settings below. However, the recommended workflow is now fully-automated.

In [11]:
pre_aligned_resized_grid_borders=("Automatic") # "(left,top,right,bottom)"
aligned_grid_borders="Automatic" # "top bottom right left"
mode="scikit" # "scikit" (recommended) or "opencv"
homography="ENTER_HOMOGRAPHY_NPY" # file path
hypercube_csv="ENTER_HYPERCUBE_TO_CSV" # file path
aligned_grid="ENTER_ALIGNED_GRID" # file path

## Plot settings

In [12]:
composite=0 # 1 to make composite images with side-by-side RGB, segmentation outputs and blended images (slow), 0 to skip
test_align_each_img=0 # 1 to make blended images of aligned RGB and hyperspectral layers for inspection, 0 to skip
width=9 # GGplot box/violin plot output (inches)
height=5 # GGplot box/violin plot output (inches)

## Parallelization

In [13]:
parallel=0 # 1 if parallelizing CubeGLM with GNU Parallel, 0 if not

## Paths to workflow modules

These three below settings need to be changed if you wish to use a different model for hyperspectral segmentation.

In [None]:
segmentation_model_key="/home/models/poplar_training_a2_v7.key.csv"
segmentation_model_path="/home/models/poplar_model_a2_v7_GBC.pkl"
segmentation_model_type="GBC"

These only need to be modified if you are setting up a `GMOnotebook` template on a new computer.

In [15]:
gmodetector_wd="/home/cubeglm/"
spectral_library_path="${gmodetector_wd}spectral_library/"
deeplab_path="/home/gmobot/poplar_model_2_w_contam/"
cubeml_path="/home/cubeml/"
alignment_path="/home/ImageAlignment/"
gmolabeler_path="/home/GMOlabeler/"
contamination_path="/home/DenseNet"
data_prefix="/mnt/output/"
output_directory_prefix="${data_prefix}gmodetector_out/"

In [14]:
cwd="/home/GMOnotebook"

<div class="alert alert-block alert-info">
With all above variables set, please "Save as..." with a filename referencing this specific dataset. <br>Finally, deploy the workflow (Step 4 in above instructions).
</div>

# Check if inputs are OK

This script will print warnings for any common problems that are detected with input variables we set above.

In [None]:
Rscript ${cwd}/intermediates/are_inputs_ok.R \
  --data "$data" \
  --randomization_datasheet "$randomization_datasheet" \
  --segmentation_mode "$segmentation_mode" \
  --unregenerated_tissues "$unregenerated_tissues" \
  --grid "$grid" \
  --missing_explants "$missing_explants" \
  --fluorophores "$fluorophores" \
  --desired_wavelength_range "$desired_wavelength_range" \
  --FalseColor_channels "$FalseColor_channels" \
  --FalseColor_caps "$FalseColor_caps" \
  --reporters "$reporters" \
  --pixel_threshold "$pixel_threshold" \
  --reporter_threshold "$reporter_threshold" \
  --segmentation_model_key "$segmentation_model_key" \
  --segmentation_model_path "$segmentation_model_path" \
  --gmodetector_wd "$gmodetector_wd" \
  --spectral_library_path "$spectral_library_path" \
  --deeplab_path "$deeplab_path" \
  --cubeml_path "$cubeml_path" \
  --alignment_path "$alignment_path" \
  --gmolabeler_path "$gmolabeler_path" \
  --contamination_path "$contamination_path" \
  --data_prefix "$data_prefix" \
  --output_directory_prefix "$output_directory_prefix" \
  --cwd "$cwd"


# Automated workflow to be deployed

See the below code for a walkthrough of how GMOnotebook works, or view the outputs after running the workflow for help troubleshooting errors in specific steps of analysis.

<div class="alert alert-block alert-danger"> <b>Danger:</b> Do not modify any below code without creating a new version of the template notebook. During routine usage, this workflow should be customized only by modifying variables above, while leaving the below code unmodified. </div>

These internal variables are set automatically.

In [16]:
datestamp=$(date +”%Y-%m-%d”)
data_folder=$(echo $data | cut -d/ -f5-)
timepoint="$(basename -- $data_folder)"
output_directory_full="$output_directory_prefix$data_folder"
dataset_name=$(echo $data_folder | sed -e 's/\///g')

Time analysis begins:

In [17]:
echo $(date)

Tue Dec 12 02:22:15 PM PST 2023


## Quantification of fluorescent proteins by regression

The Python package `CubeGLM` is used to quantify fluorescent proteins in each pixel of hyperspectral images via linear regression. Hyperspectral images are regressed over spectra of known components, and pixelwise maps of test-statistics are constructed for each component in the sample. This approach to quantifying components of hyperspectral images is described in-depth in the Methods section from <a href="https://link.springer.com/article/10.1007/s40789-019-0252-7" target="_blank">Böhme, et al. 2019</a>. Code and documentation for `CubeGLM` is on <a href="https://github.com/naglemi/GMOdetector_py" target="_blank">Github</a>.

In [16]:
cd $data_prefix

In [17]:
job_list_name="$dataset_name.jobs"

In [18]:
rm -rf $job_list_name

In [19]:
for file in $data/*.hdr
do
 if [[ "$file" != *'hroma'* ]] && [[ "$file" != *'roadband'* ]]; then
  echo "python -W ignore ${gmodetector_wd}/wrappers/analyze_sample.py \
--file_path $file \
--fluorophores ${fluorophores[*]} \
--min_desired_wavelength ${desired_wavelength_range[0]} \
--max_desired_wavelength ${desired_wavelength_range[1]} \
--red_channel ${FalseColor_channels[0]} \
--green_channel ${FalseColor_channels[1]} \
--blue_channel ${FalseColor_channels[2]} \
--red_cap ${FalseColor_caps[0]} \
--green_cap ${FalseColor_caps[1]} \
--blue_cap ${FalseColor_caps[2]} \
--plot 1 \
--spectral_library_path "$spectral_library_path" \
--output_dir $output_directory_full \
--threshold 38" >> $job_list_name
 fi
done

In [22]:
if [ $parallel -eq 1 ]
then
    parallel --jobs 20 -a $job_list_name
fi

if [ $parallel -eq 0 ]
then
    bash $job_list_name
fi

Running GMOdetector version 0.0.875
load mode isload_full_then_crop
Saving to /mnt/output/gmodetector_out/day1/GTNEC1_15.0_F1.9_L100LEB30_184042_0_0_0_Fluorescence_weights.hdf with key of weights
Saving summary stats w/ threshold38.0 to /mnt/output/gmodetector_out/day1/GTNEC1_15.0_F1.9_L100LEB30_184042_0_0_0_Fluorescence_summary.csv
Producing image channel for GFP with cap 200 in color green
Producing image channel for Chl with cap 200 in color red
Producing image channel for Noise with cap 200 in color blue
Saving image to /mnt/output/gmodetector_out/day1/GTNEC1_15.0_F1.9_L100LEB30_184042_0_0_0_Fluorescence.png
Saving image metadata to /mnt/output/gmodetector_out/day1/GTNEC1_15.0_F1.9_L100LEB30_184042_0_0_0_Fluorescence.csv

Finished running sample /mnt/Elements_22/GTNEC_GWAS_poplar_transformation_necrotic_test/day1//GTNEC1_15.0_F1.9_L100LEB30_184042_0_0_0_Fluorescence.hdr in 11.68317340966314s


R[write to console]: 1: 
R[write to console]: In (function (package, help, pos = 2, lib.l

Time regression completes:

In [23]:
echo $(date)

Mon Dec 11 11:33:07 PM PST 2023


## Semantic segmentation of tissues

Images are segmented into specific plant tissues by a deep neural network of the state-of-the-art Deeplab v3 architecture <a href="https://arxiv.org/abs/1706.05587" target="_blank">Liang-Chieh et al., 2017</a>. The model has been trained using training sets generated with our annotation GUI Intelligent DEep Annotator for Segmentation (IDEAS, available on <a href="https://bitbucket.org/JialinYuan/image-annotator/src/master/" target="_blank">Bitbucket</a>, publication pending). Our branch of the Deeplab v3 repo, including a Jupyter walkthrough for training, can be found on Github.

Training is completed upstream of this notebook, which only entails analysis of test data using the latest model.

<img src="Figures/downsized/segmentation_composite2.png">

Figure: This example image was taken from an experiment on the effects of different CIMs on cottonwood regeneration. This composite image illustrates that for every sample, tissues are segmented into stem (red), callus (blue) and shoot (green). These composite images, useful for manual inspection of results, are produced when the 'composite' option is on.

### Pre-processing

#### Normalize orientation

We desire for images to all be in the same orientation. At one point, the camera on the *macroPhor Array* was set to automatically detect orientation, which led to images randomly being in portrait or landscape. Here we will standardize the orientation.

In [24]:
for filename in $data/*.jpg; do
    exiftool -Orientation=8 -n $filename > ${data}log_exiftool.txt
    done

In [25]:
rm -f $data/*original*

#### Crop and resize

This script resizes images to 900x900 and then crops away top and bottom 150 pixels for a final image size of 900x600.

The purpose for cropping is to remove labels, which has been standard practice for all training and testing. Otherwise, we could run into problems such as the neural network "learning" plants labeled as control have more or less regeneration.<br>The purpose for resizing is to reduce computational expense.

In [21]:
if [ "$segmentation_mode" = "rgb" ]; then
    cd ${cwd}/intermediates/
    python crop.py $data
fi

Corrupt chroma_15.0_F1.9_L100LEB30_213543_0_0_1_rgb_processed.png.csv
Corrupt chroma_15.0_F1.9_L100LEB30_213618_1_1_4_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_204742_0_0_0_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_204919_3_0_3_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_204953_4_0_4_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_205026_5_0_5_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_205059_6_0_6_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_205229_9_1_4_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_205303_10_1_3_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_205336_11_1_2_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_205517_14_2_0_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_205541_15_2_1_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_205615_16_2_2_rgb_processed.png.csv
Corrupt GTNEC1_15.0_F1.9_L100LEB30_205648_17_2_3_rgb_processed.png.csv
Corrupt GTNEC1

#### Prepare input list

The script `inference.py` requires a list of all files to be analyzed. We will create this file as `test.csv`. This will be a list of all our (pre-processed) image files.

In [28]:
if [ "$segmentation_mode" = "rgb" ]; then
    cd $data
    ls -d $PWD/* $data | grep -i "rgb_cropped.jpg" > test.csv
    sed -i '/hroma/d' "${data}/test.csv"
fi

### Inference

The trained model is deployed to perform semantic segmentation of experimental images. A list of RGB images to be segmented by the trained model is passed through the --image-list option. For each of these images, we will obtain an output mask (.png) of labeled tissues

Dependencies include `opencv`, `scipy`, `yaml` and `tensorflow` (version 1.14)

In [None]:
if [ "$segmentation_mode" = "rgb" ]; then
    export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
    python -W ignore deeplab/inference.py \
        --image_lists "${data}/test.csv" \
        --crop_size 900 --crop_size 600 \
        --seg_results segmentation_results \
        --model_dir "${deeplab_path}/deeplab/model/" \
        >> $data/log_inference.txt
    mv "${deeplab_path}/segmentation_results/raw/"* $data/
fi

In [20]:
if [ "$segmentation_mode" = "hyperspectral" ]; then
    cd $cubeml_path
    #cd /mnt/cubeml/
    python scripts/batch_inference.py \
    --dir $data \
    --pickle $segmentation_model_path \
    --method $segmentation_model_type \
    --false_color
    >> $data/log_inference.txt
fi

Loading img /mnt/Elements_22/GTNEC_GWAS_poplar_transformation_necrotic_test/day1/chroma_15.0_F1.9_L100LEB30_213543_0_0_1_Broadband.hdr
load mode isload_full_then_crop
Current version sanity test1
Saving out false color RGB in addition to segmentation mask
Producing image channel for 533.7419 with cap 563 in color green
Producing image channel for 563.8288 with cap 904 in color red
Producing image channel for 500.0404 with cap 406 in color blue
Saving image to /mnt/Elements_22/GTNEC_GWAS_poplar_transformation_necrotic_test/day1/chroma_15.0_F1.9_L100LEB30_213543_0_0_1_rgb_processed.png.png
Saving image metadata to /mnt/Elements_22/GTNEC_GWAS_poplar_transformation_necrotic_test/day1/chroma_15.0_F1.9_L100LEB30_213543_0_0_1_rgb_processed.png.csv
Inference completed for /mnt/Elements_22/GTNEC_GWAS_poplar_transformation_necrotic_test/day1/chroma_15.0_F1.9_L100LEB30_213543_0_0_1_Broadband.hdr. Results saved as /mnt/Elements_22/GTNEC_GWAS_poplar_transformation_necrotic_test/day1/chroma_15.0_F1.

### Post-processing

Name outputs to reflect that they are segmentation results

In [35]:
if [ "$segmentation_mode" = "rgb" ]; then
    cd $data
    for file in *_rgb_cropped.png; do mv -f "$file" "${file%_rgb_cropped.png}_segment_cropped.png"; done
fi

Re-expand segment outputs to same size as original RGB files

In [36]:
if [ "$segmentation_mode" = "rgb" ]; then
    cd $alignment_path
    python expand.py $data >> $data/log_expand.txt
fi

Make composite images with side-by-side RGB, segmentation outputs and blended images

In [37]:
if [ $composite -eq 1 ]
then
    echo "making composites"
    cd $gmolabeler_path
    python image_blender.py $data 0.75 'both' 1 0
fi

## Classification of contaminated/missing explants

Plates are cropped into sub-images for each explant and each is analyzed to determine if the explant position should be excluded from analysis due to being missing or contamination. Missing and contaminated explants are recognized using a trained Densenet model (<a href="https://github.com/Contamination-Classification/DenseNet" target="_blank">Huang, et al. 2018</a>). Our fork of the Densenet repository is available on <a href="https://arxiv.org/abs/1608.06993" target="_blank">GitHub</a>.

<img src="Figures/Densenet.png">
Figure: These are four examples of contaminated explants used in the training set for this pre-trained model

To check the grid cropping dimensions, we can run the following script. Note that these are the dimensions to crop the image to after resizing to 2000x2000 (from 4000x4000 in the case of the *macroPhor Array*).

### Determine grid cropping parameters

In [42]:
if [ $pre_aligned_resized_grid_borders = "Automatic" ]; then
    cd "${cwd}/intermediates/"
    python find_grid_position.py --mode RGB --cap 255 --index 0 --data $data --plot
    
    # Navigate to the data directory where coordinates.csv is saved
    cd "$data"
    
    cat coordinates.csv

    # Read the CSV file and extract x1, x2, y1, y2
    firstline=1
    while IFS=',' read -r mode x1 x2 y1 y2; do
        if [ "$firstline" -eq "0" ]; then
            # echo "Debug x1: $x1"
            # echo "Debug x2: $x2"
            # echo "Debug y1: $y1"
            # echo "Debug y2: $y2"

            # Divide by 2 and round to integer
            x1=$(printf "%.0f" $(echo "$x1 / 2" | bc -l))
            x2=$(printf "%.0f" $(echo "$x2 / 2" | bc -l))
            y1=$(printf "%.0f" $(echo "$y1 / 2" | bc -l))
            y2=$(printf "%.0f" $(echo "$y2 / 2" | bc -l))

            pre_aligned_resized_grid_borders="$x1,$y1,$x2,$y2"
        fi
        firstline=0
    done < coordinates.csv

fi

Selected file: chromagrid_15.0_F1.9_L100LEB30_111309_1_1_4_rgb.jpg
RGB data loaded, elapsed time: 0.12 seconds
Calculations done, elapsed time: 0.14 seconds
Coordinates saved, elapsed time: 0.14 seconds
  plt.show()
  plt.show()
Plotting done, elapsed time: 19.60 seconds
mode,x1,x2,y1,y2
RGB,1016,3262,470,3431


### Prepare list of images

In [None]:
if [ $missing_explants = "Automatic" ]; then
   echo "Missing explants will be inferred."
   cd $data
   ls -d $PWD/* $data | grep -i "rgb.jpg" > rgb_list.txt
   sed -i '/hroma/d' rgb_list.txt
   img_list_path="${data}/rgb_list.txt"
else
   echo "Missing explants input manually by user, in file: "
   echo $missing_explants
fi

If the mode for missing explant data is automatic, prepare input file for script to detect missing explants and run this script.

### Infer contaminated/missing explants

In [None]:
if [ $missing_explants = "Automatic" ]; then
    cd $contamination_path
    conda activate DenseNet
    python -W ignore inference.py \
    --img-list=$img_list_path \
    --crop_dims $pre_aligned_resized_grid_borders \
    --output_file=output.csv >> $data/log_contam.txt
    mv -f output.csv "${data}/output.csv"
    conda deactivate
fi

In [None]:
if [ $missing_explants = "Automatic" ]; then
    missing_explants="${data}/output.csv"
    echo "Missing explants inferred by model and written to file:"
    echo $missing_explants
else
    echo "Missing explants input manually by user, in file: "
    echo $missing_explants
fi

## Alignment of RGB and hyperspectral layers

To match the frame and angle of RGB and hyperspectral image layers, we perform a homography transformation using a method described [in these notebooks](https://github.com/naglemi/GMOnotebook/tree/master/1_Decide_parameters/2_Align_and_crop_parameters/2_find_alignment_parameters). Using a pair of standard images, a homography matrix is calculated for the necessary transformation of RGB images to align with hyperspectral images. The transformation can then be applied to large batches of images rapidly, as long as the RGB and hyperspectral cameras remain in the same positions relative to one another (as they do in the macroPhor Array platform)

<img src="Figures/Alignment.png">
Figure: To enable precise calculation of a homography matrix for transformation of RGB images to match hyperspectral images, we used images of a piece of paper with grid marks.

### `opencv` method

#### Prepare file lists for alignment

<div class="alert alert-block alert-info"><b>Tip:</b> We will produce two lists: one for hyperspectral channels (chlorophyll peak channel) for each sample and another for the complementary RGB images. We will superimpose them and produce images for inspection, allowing the user to make sure the alignment works reliably for all images. However, it is possible to replace the hyperspectral channels for each image with a single file; this would run more quickly but be less useful for allowing the user to validate alignment results.</div>

We need to produce a csv with two columns with headers `hyper_img` and `rgb_images`. For each RGB image being transformed in batch alignment (mode 2), we can test the alignment by producing superimposed images of the transformed RGB images together with a hyperspectral layer. The hyperspectral layer can either be for a grid (fast) or can be for a layer extracted from the hyperspectral image of each channel (slow, but useful for making sure a certain homography matrix works reliably to transform a batch of images).

In [None]:
if [ "$mode" = "opencv" ] && [ "$segmentation_mode" = "rgb" ]; then
    cd $data
    ls | grep -i 'rgb\.jpg' > file_list_part1.csv
    ls | grep -i 'segment_uncropped\.png' > file_list_part2.csv
    cat file_list_part* > file_list.csv
    # sed -i '/hroma/d' file_list.csv
    cwd2=$(pwd)/
    awk -v prefix="$cwd2" '{print prefix $0}' file_list.csv > temp
    mv -f temp file_list.csv
    echo 'rgb_images' | cat - file_list.csv > temp && mv -f temp file_list.csv
    cp file_list.csv file_list_hyper_channel.csv
    sed -i 's/_rgb.jpg/_hyperchannel.csv/g' file_list_hyper_channel.csv
    sed -i 's/rgb_images/hyper_img/g' file_list_hyper_channel.csv
    sed -i 's/_segment_uncropped.png/_hyperchannel.csv/g' file_list_hyper_channel.csv
    paste --delimiters=',' file_list_hyper_channel.csv file_list.csv > rgb_and_hyper_channel_lists.csv
fi

#### Run batch alignment to apply homography matrix to all images

In [None]:
if [ "$mode" = "opencv" ] && [ "$segmentation_mode" = "rgb" ]; then
    conda activate alignment
    cd $alignment_path
    file_list_input="${data}/rgb_and_hyper_channel_lists.csv"
    python main.py \
    --hyper-img $hypercube_csv \
    --img-csv $file_list_input \
    --mode 2 \
    --h_matrix_path $homography >> $data/log_alignment_mode2.txt
    conda deactivate
    conda deactivate
fi

### `scikit` method

With this similar, but simpler approach, a homography matrix is manually generated by the user as shown in the `manual_alignment` notebooks in the GMOnotebook repository. This is notably different from how these matrices are automatically generated with the `opencv` method (which is not always perfectly robust). This homography matrix is then used to batch-transform all images for this dataset, similarly to the `opencv` method.

In [None]:
if [ "$mode" = "scikit" ] && [ "$segmentation_mode" = "rgb" ]; then
    conda activate alignment
    python $cwd/manual_alignment/batch_align_manual.py \
    --img_dir $data \
    --hyp_path $hypercube_jpg \
    --matrix $homography
fi

In [None]:
if [ "$segmentation_mode" = "rgb" ]; then
    hypercube_jpg=$(echo $hypercube_csv | sed -e 's/\.csv/.jpg/g')
fi

## Cross-analyze deep segmentation and regression results

Scripts in the <a href="https://github.com/naglemi/GMOlabeler" target="_blank">GMOlabeler repository</a> are used to cross-reference results from deep segmentation of RGB images and regression of hyperspectral imaging, apply thresholding parameters to classify tissues as transgenic or escapes, and produce plots.

<img src="Figures/GMOlabeler.png">

Figure: The various steps of data processing in GMOlabeler are illustrated for an example explant from an experiment on CIM optimization for cottonwood. Images of plates are cropped to a sub-image for each explant. RGB segmentation results and hyperspectral regression results are cross-referenced to calculate fluorecent proteins in specific tissues and infer whether these tissues are transgenic.

### Prepare sample datasheet input

Prepare input file we will use for making plots. This file contains paths to CLS results, RGB images, and hyperspectral images.

In [None]:
echo $datestamp

In [None]:
cd "${cwd}/intermediates/"

# Check if segmentation_mode is set to "hyperspectral"
if [ "$segmentation_mode" = "hyperspectral" ]; then
  Rscript pre_label.R \
  -r "${data}/" \
  -R "${output_directory_prefix}" \
  -i 1 \
  -d $datestamp \
  --segmentation_model_key $segmentation_model_key # Only included if segmentation_mode is hyperspectral
else
  Rscript pre_label.R \
  -r "${data}/" \
  -R "${output_directory_prefix}" \
  -i 1 \
  -d $datestamp
fi

### Cross-reference RGB and hyperspectral data

In [None]:
if [ "$aligned_grid_borders" = "Automatic" ]; then
    cd "${cwd}/intermediates/"
    python find_grid_position.py --mode hyperspectral --cap 500 --index 130 --data $data \
    --rotation 270 --flip_horizontal --plot
    
    # Navigate to the data directory where coordinates.csv is saved
    cd "$data"
    
    cat coordinates.csv

    # Read the CSV file and extract x1, x2, y1, y2
    firstline=1
    while IFS=',' read -r mode x1 x2 y1 y2; do
        if [ "$firstline" -eq "0" ]; then
            # echo "Debug x1: $x1"
            # echo "Debug x2: $x2"
            # echo "Debug y1: $y1"
            # echo "Debug y2: $y2"

            aligned_grid_borders="$x1 $x2 $y2 $y1"
        fi
        firstline=0
    done < coordinates.csv

fi

In [None]:
# Check if $aligned_grid is unset or the file doesn't exist
if [[ -z "$aligned_grid" || ! -f "$aligned_grid" ]]; then
  echo "No grid found at a path given by user, searching for one..."

  # Find files, extract filenames and timestamps, sort by timestamp, and get the filename with the latest timestamp
  aligned_grid=$(find "$data" -maxdepth 1 -type f -name "*hroma*rgb_processed.png*" ! -name "*csv*" \
                 | while read -r file; do
                     timestamp=$(echo "$file" | grep -o '[0-9]\{6\}')
                     echo "$timestamp $file"
                   done \
                 | sort -k1,1nr \
                 | head -n 1 \
                 | cut -d' ' -f2-)

  # Check if a file was found
  if [[ -z "$aligned_grid" ]]; then
    echo "No suitable grid file found."
  else
    # Output the found path
    echo "Grid found: $aligned_grid"
  fi
fi


In [None]:
cd $gmolabeler_path
for reporter in ${reporters[@]}; do
    # Start the command with the basic arguments
    cmd="python main.py \
    \"${data}/samples_pre_labeling.csv\" \
    $aligned_grid \
    $reporter_threshold \
    $reporter \
    $grid \
    \"hdf\" \
    \"${output_directory_prefix}/gmolabeler_logic_outputs/\""

    # Append the grid borders to the command
    if [[ -n "$aligned_grid_borders" ]]; then
        cmd+=" \"$aligned_grid_borders\""
    fi

    # Check if segmentation_model_key is set and points to a file
    if [[ -n $segmentation_model_key && -f $segmentation_model_key ]]; then
        # Append the segmentation model key to the command as a named argument
        cmd+=" --segmentation_model_key \"$segmentation_model_key\""
    fi

    # Run the command and redirect stdout to the log file
    eval $cmd > "$data/log_gmolabeler_$reporter.txt"

done

In [None]:
tail "$data/log_gmolabeler_$reporter.txt"

### Calculate sums of statistics over combined segments

We are interested in all regenerated tissue (callus + shoot) as well as all tissue (including stem as well). We will calculate aggregate statistics over these groups.

In [None]:
echo $segmentation_model_key

In [None]:
cd $gmolabeler_path
for reporter in "${reporters[@]}"; do
    # Start the command with the base part
    cmd="Rscript calculate_sum_stats_over_combined_segments.R \
    --output_dir \"${output_directory_prefix}/gmolabeler_logic_outputs/\" \
    --datapath \"${data_folder}/${reporter}/\""

    # Append the model key path if it's set and not None
    if [ -n "${segmentation_model_key}" ] && [ "${segmentation_model_key}" != "None" ]; then
        cmd+=" --keypath \"${segmentation_model_key}\""
    fi

    # Append the exclude tissues string if it's set and not None
    if [ -n "${unregenerated_tissues}" ] && [ "${unregenerated_tissues}" != "None" ]; then
        echo $exclude_tissues
        cmd+=" --exclude_tissues \"${unregenerated_tissues}\""
    fi
    
    echo $cmd

    # Execute the command
    eval $cmd
done

### Make plots of results

In [None]:
cd $gmolabeler_path
for reporter in ${reporters[@]}; do
    # Start the command with the base part
    cmd="Rscript grid_item_plots.R \
    -d \"${data_folder}/\" \
    -r \"$randomization_datasheet\" \
    -p $pixel_threshold \
    -v categorical \
    -m 1 \
    -M $missing_explants \
    -g $grid \
    --samples-pre-labeling ${data}/samples_pre_labeling.csv \
    --sort 1 \
    --height $height \
    --width $width \
    --Reporter $reporter \
    --outdir \"${output_directory_prefix}\""

    # Append the model key path if it's set and not None
    if [ -n "$segmentation_model_key" ] && [ "$segmentation_model_key" != "None" ]; then
        cmd+=" --keypath \"$segmentation_model_key\""
    fi

    echo $cmd

    # Execute the command
    eval $cmd
done

In [None]:
echo -e "Complete \u2705"