# *GMOnotebook*
#### **Notebook template for applying routine hyperspectral/segmentation cross-analysis phenomics workflow over new datasets**

###### Notebook template v0.2.01 (Oct 14, 2020)
###### v0.2.0 onward applies the Python `GMOdetector` package instead of the deprecated R `GMOdetectoR` package, for much faster runtime
###### v0.2.01 has calls to `conda` cleaned up with `conda activate` and `conda deactivate` in the same code blocks to avoid any problems with warnings/errors preventing automatic execution of subsequent blocks

<img src="WorkflowFlowchart.png">

In this workflow, images taken with the macroPhor Array dual RGB/hyperspectral imaging platform are analyzed by a workflow in which regression quantifies fluorescent signals in hyperspectral images, deep learning segments RGB images into different tissues, and these datasets are cross-referenced to produce statistics on growth of transgenic callus and shoot.

#### Instructions for use
1.  Enter information for the experiment below
2. Set <font color=blue>variables</font> for data paths and parameters, as instructed by colored boxes.
3. "Save as" with filename describing experiment and anything special about this analysis (e.g. T18_OD_TAO_wk7_automation_test_attempt2.ipynb)
4. Run notebook from console, using the below command with the notebook filename inserted<br>
```jupyter nbconvert --to HTML --ExecutePreprocessor.timeout=-1 --allow-errors --execute insert_filename_here```
5. Wait for email

#### Experiment ID and quick description:

<div class="alert alert-block alert-success">
Provide a short description of the experiment in the below box. This should include unique identifier codes for the experiment, along with a short description of genotypes and treatments studied. The timepoint should also be included. </div>

#### Parameters for analysis:

<div class="alert alert-block alert-success">
The below variables must be modified appropriately every time this workflow is run over new images.
</div>

##### Data location
The `data` variable below provides the **complete** path to the folder containing data to be analyzed. This should include all folders and subfolders in which the data of interest is organized by. For the organizational system used for our lab's data, this should follow the format "/Experiment/Subexperiment/Timepoint/"

In [1]:
data="/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent/"

##### Sample information
Every experiment has a randomization datasheet, which was used to organize treatment and genotype information for each plate, prepare labels, and randomize plates. The path to this file is provided through the `randomization_datasheets` variable below. This workflow requires this datasheet in order to know which plates have which genotype/treatment. At a later date, we will integrate an ability to read this data directly from labels.

In [2]:
randomization_datasheet="/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/GFPA_randomized list_MNrevised.xlsx"

##### Exclusion of missing/contaminated explants from analysis

Set the `missing_explants` variable to `"Automatic"` if using model to automatically detect missing and contaminated explants. Note that this model is only supported for plates with 12 explants. Otherwise, provide an appropriately formatted `.csv` spreadsheet (see example)  of manually scored contamination / missing explant data.

In [3]:
missing_explants="Automatic"

Enter your email where results will be sent

In [4]:
email=michael.nagle@oregonstate.edu

<div class="alert alert-block alert-warning">
All variables below should be modified only as needed to indicate the fluorescent proteins in samples and the grid layout of explants. </div>

##### Provide information pertinent to the fluorophores in the sample

Below should be all known fluorescent components contained in the sample. This includes each fluorescent protein, as well as a "noise" or "diffraction" term if applicable. All of these components must exist in the user's spectral library. `GMOdetector` currently comes with a spectral library that includes, by default:
- DsRed
- ZsYellow
- GFP
- Chl
- ChlA
- ChlB
- Diffraction

In [5]:
fluorophores=(GFP Chl Diffraction) # An explanation of array variables in bash is here: https://tldp.org/LDP/Bash-Beginners-Guide/html/sect_10_02.html

In [6]:
spectral_library_path="/scratch2/NSF_GWAS/notebooks/gmodetector_py/spectral_library/"

The user has the option of limiting loading of hyperspectral data and subsequent regression to a specific range of wavelengths, using the `desired_wavelength_range` array variable. This range should cover all fluorophores provided in the `fluorophores` array variable able. Aside from runtime, there is no disadvantage to including a wider range than is needed.

In [7]:
desired_wavelength_range=(500 900)

##### To assist user inspection of regression results, false color plots will be produced by `GMOdetector` to show results of regression over whole samples. 
###### Note: In v0.2.0 of the workflow, these parameters are independent of those used later by `GMOlabeler` to produce by-explant plots including false color plots. These will be made the same in a later update.
The `FalseColor_channels` array variable indicates the components to be plotted as red, green and blue, in that order.

In [8]:
FalseColor_channels=(Chl GFP Diffraction)

The `FalseColor_caps` array variable indicates an upper limit of signal for each of these component. Any signal at or above these values will appear with maximum brightness; thus, these variables are comparable to exposure on a camera. If caps are too high, not much signal at lower ranges will be seen. If cap for a given component is too low, the false color images will appear overexposed with respect to the component.

In [9]:
FalseColor_caps=(200 400 50)

##### `GMOlabeler` will use the below parameters to classify individual explants on plates as transgenic or not.

In [10]:
reporter=GFP
#reporter=DsRed

Parameters for reporter signal threshold and pixel threshold must be provided by the user. Our grid search yielded several noted below. These were most recently calculated from statistics produced with Python `GMOdetector`.

<img src="GMOlabeler_parameters.png">

In [11]:
reporter_threshold=38
pixel_threshold=3

##### Grid information

Currently, `GMOlabeler` distinguishes explants by their expected position on a plate. This is supported for plates with 12 or 20 explants, arranged in specific positions on a grid. False positives may result in cases in which one explant grows to such an extent that it intrudes into the grid square meant for an adjacent explant. <br>In the future, an additional neural network will be used to segment individual explants on a per-pixel basis, avoiding the need for crude cropping, and the outputs from the resulting model will be integrated with `GMOlabeler`.

Cropping to grid squares is supported for standard grids with 12 or 20 explants, as indicated by the `grid` variable. An image of each grid square (with a file path provided by the `grid_file` variable) will be superimposed over each image during `GMOlabeler` analysis to allow the user to easily inspect and verify grid positions.

In [12]:
#grid=20
grid=12

#grid_file="/scratch2/NSF_GWAS/macroPhor_Array/grids/grid20_post_processed.png"
grid_file="/scratch2/NSF_GWAS/macroPhor_Array/grids/grids_left_facing_125208_1_0_1_rgb_processed.jpg"

<div class="alert alert-block alert-danger">
The below variables do not need to be modified during any routine use of the workflow.
</div>

For the following variables, use 0 (False) or 1 (True)

In [13]:
composite=1

Set dimensions for plot outputs

In [14]:
#width=15
width=9
height=5

<div class="alert alert-block alert-info">
With all above variables set, please "Save as..." with a filename referencing this specific dataset. <br>Finally, execute the workflow by running the below command in a console, replacing insert_filename_here with the filename for the saved notebook.
</div>

# Automated workflow to be deployed

See the below code for a walkthrough of how GMOnotebook works, or view the outputs after running the workflow for help troubleshooting errors in specific steps of analysis.

<div class="alert alert-block alert-danger"> <b>Danger:</b> Do not modify any below code without creating a new version of the template notebook. During routine usage, this workflow should be customized only by modifying variables above, while leaving the below code unmodified. </div>

##### Time analysis begins:

In [15]:
echo $(date)

Tue Oct 27 20:08:52 PDT 2020


These internal variables are set automatically.

In [16]:
datestamp=$(date +”%Y-%m-%d”)

In [17]:
echo $data_folder




In [18]:
data_folder=$(echo $data | cut -d/ -f5-)

In [19]:
timepoint="$(basename -- $data_folder)"

In [20]:
#data="/scratch2/NSF_GWAS/macroPhor_Array${data_folder}"

In [21]:
gmodetector_wd="/scratch2/NSF_GWAS/notebooks/gmodetector_py/"

In [22]:
output_directory_prefix="/scratch2/NSF_GWAS/gmodetector_out/"

In [23]:
output_directory_full="$output_directory_prefix$data_folder"

In [24]:
dataset_name=$(echo $subdirectory | sed -e 's/\//-/g')

## 1. Quantification of fluorescent proteins by regression

The Python package `GMOdetector` is used to quantify fluorescent proteins in each pixel of hyperspectral images via linear regression. Hyperspectral images are regressed over spectra of known components, and pixelwise maps of test-statistics are constructed for each component in the sample. This approach to quantifying components of hyperspectral images is described in-depth in the Methods section from <a href="https://link.springer.com/article/10.1007/s40789-019-0252-7" target="_blank">Böhme, et al. 2019</a>. Code and documentation for `GMOdetector` is on <a href="https://github.com/naglemi/GMOdetector_py" target="_blank">Github</a>.

In [25]:
cd $gmodetector_wd

In [26]:
job_list_name="$dataset_name.jobs"

In [27]:
rm -rf $job_list_name

In [28]:
for file in $data/*.hdr
do
 if [[ "$file" != *'hroma'* ]]; then
  echo "python wrappers/analyze_sample.py \
--file_path $file \
--fluorophores ${fluorophores[*]} \
--min_desired_wavelength ${desired_wavelength_range[0]} \
--max_desired_wavelength ${desired_wavelength_range[1]} \
--red_channel ${FalseColor_channels[0]} \
--green_channel ${FalseColor_channels[1]} \
--blue_channel ${FalseColor_channels[2]} \
--red_cap ${FalseColor_caps[0]} \
--green_cap ${FalseColor_caps[1]} \
--blue_cap ${FalseColor_caps[2]} \
--spectral_library_path /scratch2/NSF_GWAS/notebooks/gmodetector_py/spectral_library/ \
--output_dir $output_directory_full" >> $job_list_name
 fi
done

In [29]:
conda activate test-environment
parallel -a $job_list_name
conda deactivate
conda deactivate

(test-environment) Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  Tange, O. (2020, September 22). GNU Parallel 20200922 ('Ginsburg').
  Zenodo. https://doi.org/10.5281/zenodo.4045386

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice

To silence this citation notice: run 'parallel --citation' once.

Come on: You have run parallel 25 times. Isn't it about time 
you run 'parallel --citation' once to silence the citation notice?

Running GMOdetector version 0.0.855
load mode isload_full_then_crop
Saving to /scratch2/NSF_GWAS/gmodetector_out/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent/GFPA1_I5.0_F1.9_L100_162853_13_2_0_we

##### Time regression completes:

In [30]:
echo $(date)

Tue Oct 27 20:12:58 PDT 2020


## 2. Neural networks to segment tissues, classify missing/contaminated explants

### 2.1. Semantic segmentation of tissues

Images are segmented into specific plant tissues by a deep neural network of the state-of-the-art Deeplab v3 architecture <a href="https://arxiv.org/abs/1706.05587" target="_blank">Liang-Chieh et al., 2017</a>. The model has been trained using training sets generated with our annotation GUI Intelligent DEep Annotator for Segmentation (IDEAS, available on <a href="https://bitbucket.org/JialinYuan/image-annotator/src/master/" target="_blank">Bitbucket</a>, publication pending). Our branch of the Deeplab v3 repo, including a Jupyter walkthrough for training, can be found on Github.

Training is completed upstream of this notebook, which only entails analysis of test data using the latest model.

<img src="Figures/downsized/segmentation_composite2.png">

Figure: This example image was taken from an experiment on the effects of different CIMs on cottonwood regeneration. This composite image illustrates that for every sample, tissues are segmented into stem (red), callus (blue) and shoot (green). These composite images, useful for manual inspection of results, are produced when the 'composite' option is on.

#### 2.1.1. Pre-processing
##### 2.1.1.1. Crop to remove labels, and resize images to 900x600

This script resizes images to 900x900 and then crops away top and bottom 150 pixels for a final image size of 900x600.

In [31]:
cd /scratch2/NSF_GWAS/GMOdetectoR/

In [32]:
conda activate base
python crop.py $data
conda deactivate

(base) (base) 

In [33]:
echo $data

/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent/


##### 2.1.1.2. prepare test.csv file for inference

Make a list of all our image files

In [34]:
cd $data
ls -d $PWD/* $data | grep -i "rgb_cropped" > test.csv

Remove the chroma standard from list of RGB image data to be segmented

In [35]:
sed -i '/hroma/d' "${data}/test.csv"

#### 2.1.2. Inference

The trained model is deployed to perform semantic segmentation of experimental images. A list of RGB images to be segmented by the trained model is passed through the --image-list option. For each of these images, we will obtain an output mask (.png) of labeled tissues

Dependencies include `opencv`, `scipy`

In [36]:
cd /scratch2/NSF_GWAS/deeplab/
conda activate deeplab
python /scratch2/NSF_GWAS/deeplab/inference.py --image-list "${data}/test.csv"
conda deactivate
conda deactivate

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])



2020-10-27 20:14:00.882293: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-10-27 20:14:06.293944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:05:00.0
2020-10-27 20:14:06.295426: I tens

#### 2.3. Post-processing

Name outputs to reflect that they are segmentation results

In [37]:
cd $data
for file in *_rgb_cropped.png; do mv -f "$file" "${file%_rgb_cropped.png}_segment_cropped.png"; done

Re-expand segment outputs to same size as original RGB files

Dependencies include `scikit-image`

In [38]:
conda activate base
cd /scratch2/NSF_GWAS/ImageAlignment/
python expand.py $data
conda deactivate

(base) (base) Working in directory/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent/
Number of files: 90
Reading file GFPA4_I5.0_F1.9_L100_170854_1_0_1_segment_cropped.png
Writing file GFPA4_I5.0_F1.9_L100_170854_1_0_1_segment_uncropped.png
Reading file GFPA4_I5.0_F1.9_L100_170950_2_0_2_segment_cropped.png
Writing file GFPA4_I5.0_F1.9_L100_170950_2_0_2_segment_uncropped.png
Reading file GFPA4_I5.0_F1.9_L100_175704_3_2_4_segment_cropped.png
Writing file GFPA4_I5.0_F1.9_L100_175704_3_2_4_segment_uncropped.png
Reading file GFPA2_I5.0_F1.9_L100_164338_1_0_4_segment_cropped.png
Writing file GFPA2_I5.0_F1.9_L100_164338_1_0_4_segment_uncropped.png
Reading file GFPA3_I5.0_F1.9_L100_162402_3_0_3_segment_cropped.png
Writing file GFPA3_I5.0_F1.9_L100_162402_3_0_3_segment_uncropped.png
Reading file GFPA3_I5.0_F1.9_L100_162553_5_0_5_segment_cropped.png
Writing file GFPA3_I5.0_F1.9_L100_162553_5_0_5_segment_uncropped.png
Reading file GFPA2_I5.0_F1.9_L100_164433_2_0_5_se

Make composite images with side-by-side RGB, segmentation outputs and blended images

In [39]:
conda activate base
if [ $composite -eq 1 ]
then
    cd /scratch2/NSF_GWAS/GMOlabeler/
    python image_blender.py $data 0.75 'both' 1 180
fi
conda deactivate

(base) ['image_blender.py', '/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent/', '0.75', 'both', '1', '180']
Building composite for image GFPA1_I5.0_F1.9_L100_161732_1_0_1_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_161827_2_0_2_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_161923_3_0_3_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_162020_4_0_4_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_162117_5_0_5_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_162212_6_0_6_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_162308_7_1_6_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_162354_8_1_5_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_162450_9_1_4_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_162555_10_1_3_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_162649_11_1_2_rgb.jpg
Building composite for image GFPA1_I5.0_F1.9_L100_162757_12_1_0_rgb.jpg
Build

#### 2.4. Classification of contaminated/missing explants

Plates are cropped into sub-images for each explant and each is analyzed to determine if the explant position should be excluded from analysis due to being missing or contamination. Missing and contaminated explants are recognized using a trained Densenet model (<a href="https://github.com/Contamination-Classification/DenseNet" target="_blank">Huang, et al. 2018</a>). Our fork of the Densenet repository is available on <a href="https://arxiv.org/abs/1608.06993" target="_blank">GitHub</a>.

<img src="Figures/Densenet.png">
Figure: These are four examples of contaminated explants used in the training set for this pre-trained model

If the mode for missing explant data is automatic, prepare input file for script to detect missing explants and run this script.

In [40]:
img_list_path="${data}/rgb_list.txt"

Dependencies include `keras-preprocessing`, `termcolor`,  `protobuf` and `absl-py`

In [41]:
echo $img_list_path

/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent//rgb_list.txt


In [42]:
if [ $missing_explants = "Automatic" ]; then
    conda activate densenet
    echo "Missing explants will be inferred."
    cd $data
    ls -d $PWD/* $data | grep -i "rgb.jpg" > rgb_list.txt
    sed -i '/hroma/d' rgb_list.txt
    #sed -i 's/.jpg//g' rgb_list.txt
    #cat rgb_list.txt
    cd /scratch2/NSF_GWAS/Contamination
    img_list_path="${data}/rgb_list.txt"
    #echo $img_list_path
    python inference.py --img-list=$img_list_path --output_file=output.csv
    mv -f output.csv "${data}/output.csv"
    missing_explants="${data}/output.csv"
    conda deactivate
else
    echo "Missing explants input manually by user, in file: "
    echo $missing_explants
fi

Missing explants will be inferred.
Using TensorFlow backend.
Reading Arguments: 
Processing 90 images.
Intializing the pretrained model
2020-10-27 20:49:08.725479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-10-27 20:49:14.384983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:05:00.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-10-27 20:49:14.385421: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/apps/cuda/cuda-8.0.61/lib64:/usr/local/apps/cuda/cuda-9.0/lib64/
2020-10-27 20:49:14.385727: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library

## 3. Alignment of RGB and hyperspectral layers

To match the frame and angle of RGB and hyperspectral image layers, we apply a scale-invariant feature transformation (<a href="https://github.com/NSF-Image-alignment/ImageAlignment" target="_blank">GitHub</a>). Using a pair of standard images, a homography matrix is calculated for the necessary transformation of RGB images to align with hyperspectral images. The transformation can then be applied to large batches of images rapidly, as long as the RGB and hyperspectral cameras remain in the same positions relative to one another (as they do in the macroPhor Array platform)

<img src="Figures/Alignment.png">
Figure: To enable precise calculation of a homography matrix for transformation of RGB images to match hyperspectral images, we used images of a piece of paper with grid marks. These images are provided by the user inputs to --hyper-img and --rgb-img in the below call to the alignment script. If using a phenotyping platform other than the macroPhor Array, or using updated camera settings, these variables will need to be replaced.

##### 3.1. Prepare input file for alignment

In [43]:
cd $data
ls | grep -i 'rgb\.jpg' > file_list_part1.csv
ls | grep -i 'segment_uncropped\.png' > file_list_part2.csv
cat file_list_part* > file_list.csv
sed -i '/hroma/d' file_list.csv
cwd=$(pwd)/
awk -v prefix="$cwd" '{print prefix $0}' file_list.csv > temp
mv -f temp file_list.csv
echo 'rgb_images' | cat - file_list.csv > temp && mv -f temp file_list.csv

##### 3.2 Run alignment

In [44]:
conda activate alignment
cd /scratch2/NSF_GWAS/ImageAlignment/
file_list_input="${data}/file_list.csv"
python main.py \
--hyper-img Grids_to_align-selected/20itemgrid_F1.9_I3.0_L50_cyan_114229_0_0_0_index130.csv \
--rgb-img Grids_to_align-selected/20itemgrid_F1.9_I3.0_L50_cyan_114229_0_0_0_rgb.jpg \
--img-csv $file_list_input \
--mode 2
conda deactivate
conda deactivate

(alignment) (alignment) (alignment) [  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53
  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71
  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107
 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161
 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197
 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215
 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233
 234 235 236 23

## 4. Cross-analyze deep segmentation and regression results

Scripts in the <a href="https://github.com/naglemi/GMOlabeler" target="_blank">GMOlabeler repository</a> are used to cross-reference results from deep segmentation of RGB images and regression of hyperspectral imaging, apply thresholding parameters to classify tissues as transgenic or escapes, and produce plots.

<img src="Figures/GMOlabeler.png">

Figure: The various steps of data processing in GMOlabeler are illustrated for an example explant from an experiment on CIM optimization for cottonwood. Images of plates are cropped to a sub-image for each explant. RGB segmentation results and hyperspectral regression results are cross-referenced to calculate fluorecent proteins in specific tissues and infer whether these tissues are transgenic.

##### 4.1. Prepare sample datasheet input

Prepare input file we will use for making plots. This file contains paths to CLS results, RGB images, and hyperspectral images.

In [45]:
cd /scratch2/NSF_GWAS/GMOdetectoR/
Rscript wrappers/pre_label_a3.R \
-r "${data}/" \
-R "/scratch2/NSF_GWAS/gmodetector_out/" \
-i 1 \
-d $datestamp

[1] "”2020-10-27”"
[1] "Looking for CLS data in: /scratch2/NSF_GWAS/gmodetector_out/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent//"
[1] "2020-10-27"
[1] "Looking for CLS files in directory /scratch2/NSF_GWAS/gmodetector_out/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent//"
[1] "How many CLS files? 90"
[1] "Writing 90 rows to /scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent///samples_pre_labeling.csv"


##### 4.2. Cross-reference RGB and hyperspectral data

In [46]:
#cat "${data}/samples_pre_labeling.csv"

In [47]:
echo $grid

12


In [48]:
identify $grid_file

/scratch2/NSF_GWAS/macroPhor_Array/grids/grids_left_facing_125208_1_0_1_rgb_processed.jpg JPEG 1419x1566 1419x1566+0+0 8-bit sRGB 346442B 0.000u 0:00.000


In [49]:
conda activate
cd /scratch2/NSF_GWAS/GMOlabeler/
python main.py \
"${data}/samples_pre_labeling.csv" \
$grid_file \
$reporter_threshold \
$reporter \
$grid \
"hdf"
conda deactivate

(base) (base) Grid type 12
Loading plate 0 of 90
/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent///GFPA4_I5.0_F1.9_L100_175854_5_2_6_segment_uncropped_processed.png
Plate segment loaded
Plate RGB loaded
Loading CLS data from path/scratch2/NSF_GWAS/gmodetector_out/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent///GFPA4_I5.0_F1.9_L100_175854_5_2_6_weights.hdf
Loading plate 1 of 90
/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent///GFPA4_I5.0_F1.9_L100_175759_4_2_5_segment_uncropped_processed.png
Plate segment loaded
Plate RGB loaded
Loading CLS data from path/scratch2/NSF_GWAS/gmodetector_out/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent///GFPA4_I5.0_F1.9_L100_175759_4_2_5_weights.hdf
Loading plate 2 of 90
/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent///GFPA4_I5.0_F1.9_L100_175704_3_2_4_segment_uncropped_processed.png
Plate segment loaded
Plate RGB loaded
Loading CLS data from path/scratch2/NSF_G

##### 4.3. Calculate sums of statistics over combined segments

We are interested in all regenerated tissue (callus + shoot) as well as all tissue (including stem as well). We will calculate aggregate statistics over these groups.

In [50]:
Rscript calculate_sum_stats_over_combined_segments.R \
--datapath "${data_folder}/"

[1] "Writing output with sums statistics calculated over combined tissue segments to: /scratch2/NSF_GWAS/GMOlabeler/output/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent//stats_with_sums_over_tissues.csv"


##### 4.4. Make plots of results

In [51]:
echo "${data_folder}/"
echo $randomization_datasheet
echo $pixel_threshold
echo $variable_type
echo 1
echo $missing_explants
echo $grid
echo 1
echo $height
echo $width

GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent//
/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/GFPA_randomized list_MNrevised.xlsx
3

1
/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent//output.csv
12
1
5
9


In [52]:
echo $missing_explants

/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent//output.csv


In [53]:
echo $randomization_datasheet

/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/GFPA_randomized list_MNrevised.xlsx


In [54]:
cd /scratch2/NSF_GWAS/GMOlabeler/
Rscript grid_item_plots_a13.R \
-d "${data_folder}/" \
-r "$randomization_datasheet" \
-p $pixel_threshold \
-v categorical \
-m 1 \
-M $missing_explants \
-g $grid \
--sort 1 \
--height $height \
--width $width

1: replacing previous import ‘data.table::melt’ by ‘reshape2::melt’ when loading ‘GMOdetectoR’ 
2: replacing previous import ‘data.table::dcast’ by ‘reshape2::dcast’ when loading ‘GMOdetectoR’ 
In storage.mode(default) <- type : NAs introduced by coercion
[1] "Saving list of input arguments to : /scratch2/NSF_GWAS/GMOlabeler/plots/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent//gridplot_args.rds"
[1] "Reading in output from GMOlabeler at path: /scratch2/NSF_GWAS/GMOlabeler/output/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent//stats_with_sums_over_tissues.csv"

[1] "Rows in output from GMOlabeler: 4359"

[1] "Max n_pixels_passing_threshold in output from GMOlabeler: 2626"

[1] "Max total_signal in output from GMOlabeler: 471922.685884808"

[1] "Look at the top of output from GMOlabeler"
[1] ""                                                                                                                       
[2] "/scratch2/NSF_GWAS/macroPhor_Array/GREGSCORNER/GFPA_cottonwoods/wk3/Fluores

## 5. Email plots to user

##### 5.1. ZIP results

In [55]:
echo "/scratch2/NSF_GWAS/GMOlabeler/plots${data_folder}"

/scratch2/NSF_GWAS/GMOlabeler/plotsGREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent/


In [56]:
cd "/scratch2/NSF_GWAS/GMOlabeler/plots/${data_folder}"

In [57]:
rm -f ./plants_over_plates.csv

In [58]:
cp "/scratch2/NSF_GWAS/GMOlabeler/output/${data_folder}/plants_over_plates.csv" ./

In [59]:
rm -f Rplots.pdf

In [60]:
cd ../

This messy substitution is explained here: https://superuser.com/questions/1068031/replace-backslash-with-forward-slash-in-a-variable-in-bash

In [61]:
data_folder_Compress="${data_folder////-}.zip"
data_folder_Compress=${data_folder_Compress#?};

In [62]:
ls

[0m[01;34mFluorescent[0m


In [63]:
cd $timepoint
zip -r $data_folder_Compress ./*

updating: 2A1c_portion_Callus_total.png (deflated 11%)
updating: 2A1_portion_Callus_total.png (deflated 13%)
updating: 2A2c_portion_Callus_transgenic.png (deflated 12%)
updating: 2A2_portion_Callus_transgenic.png (deflated 29%)
updating: 2B1c_portion_Shoot_total.png (deflated 32%)
updating: 2B1_portion_Shoot_total.png (deflated 19%)
updating: 2B2c_portion_Shoot_transgenic.png (deflated 32%)
updating: 2B2_portion_Shoot_transgenic.png (deflated 20%)
updating: 2C1c_portion_All_regenerated_tissue_total.png (deflated 11%)
updating: 2C1_portion_All_regenerated_tissue_total.png (deflated 14%)
updating: 2C2c_portion_All_regenerated_tissue_transgenic.png (deflated 12%)
updating: 2C2_portion_All_regenerated_tissue_transgenic.png (deflated 27%)
updating: 2D1c_portion_All_tissue_total.png (deflated 11%)
updating: 2D1_portion_All_tissue_total.png (deflated 15%)
updating: 2D2c_portion_All_tissue_transgenic.png (deflated 8%)
updating: 2D2_portion_All_tissue_transgenic.png (deflated 12%)
updating: 3A1

##### 5.2. Write email

In [64]:
duration=$(( SECONDS - start ))

https://unix.stackexchange.com/questions/53841/how-to-use-a-timer-in-bash

In [65]:
rm -f /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt
cp /scratch2/NSF_GWAS/GMOlabeler/email_to_send_template.txt /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt

In [66]:
echo "" >> /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt
echo "Number of samples run: " >> /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt

In [67]:
cat "${data}/test.csv" | wc -l >> /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt
echo "" >> /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt

In [68]:
if (( $SECONDS > 3600 )) ; then
    let "hours=SECONDS/3600"
    let "minutes=(SECONDS%3600)/60"
    let "seconds=(SECONDS%3600)%60"
    echo "Completed in $hours hour(s), $minutes minute(s) and $seconds second(s)" >> /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt
elif (( $SECONDS > 60 )) ; then
    let "minutes=(SECONDS%3600)/60"
    let "seconds=(SECONDS%3600)%60"
    echo "Completed in $minutes minute(s) and $seconds second(s)" >> /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt
else
    echo "Completed in $SECONDS seconds" >> /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt
fi

In [69]:
echo "" >> /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt

##### 5.3. Send email with results to user

In [70]:
pwd

/scratch2/NSF_GWAS/GMOlabeler/plots/GREGSCORNER/GFPA_cottonwoods/wk3/Fluorescent


In [71]:
mail -a $data_folder_Compress -s $data_folder $email < /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt

In [72]:
du -sh $data_folder_Compress

3.5M	REGSCORNER-GFPA_cottonwoods-wk3-Fluorescent-.zip


In [73]:
cat /scratch2/NSF_GWAS/GMOlabeler/email_to_send.txt


Your job is complete!

Number of samples run: 
90

Completed in 57 minute(s) and 37 second(s)



##### Time analysis ends

In [74]:
echo $(date)

Tue Oct 27 21:06:27 PDT 2020
