# Prepare Jupyter notebooks for each batch

This workflow runs batches of datasets through GMOdetector.

We define a single dataset as a set of images stored in the same folder, typically collected on the same day. An experiment may include a batch of data for each of several timepoints.

How to perform batch analysis:
1. Fill in a spreadsheet in `csv` format like the example provided in this folder, `1-IN_Batch_parameters.csv`. The first column, `Parameter`, holds the parameter names, and each additional column holds the values for one dataset. Fill in every path and parameter following the criteria and formatting described in the [workflow notebook](https://github.com/naglemi/GMOnotebook/blob/master/2a_Deploy_workflow/GMOdetector_template_v0.62.ipynb) and the [tutorials on setting parameters](https://github.com/naglemi/GMOnotebook/tree/master/1_Decide_parameters).
2. Below, enter the path to this `csv`, a descriptive identifier for the batch (`batch_ID`), and the directory where the generated notebooks should be saved (`dir_to_save_notebooks`).
3. Run this notebook, which will create a new notebook for each dataset inside a folder named after your `batch_ID`.
4. Run the next notebook, `2_Deploy_batch_of_notebooks.ipynb`, to launch all of the notebooks created here.
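As a sketch of the expected sheet layout, the snippet below writes a two-dataset sheet to a temporary file and reads it back the same way this notebook does. The dataset notes, paths and values are hypothetical placeholders; only the `Parameter` column and the parameter names come from this notebook.

```r
library(data.table)

# Hypothetical two-dataset sheet: a Parameter column plus one column per dataset.
# All values are made up for illustration.
sheet <- data.table(
  Parameter = c("user_note", "data", "pixel_threshold"),
  Dataset_1 = c("exp1_day5", "/path/to/exp1/day5/", "3"),
  Dataset_2 = c("exp1_day10", "/path/to/exp1/day10/", "3")
)

csv_path <- tempfile(fileext = ".csv")
fwrite(sheet, csv_path)

# Read every column as character, as in the cell that loads the batch sheet
df_example <- fread(csv_path, colClasses = "character")

# One notebook is generated per column after Parameter
n_datasets <- ncol(df_example) - 1
print(n_datasets)
```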

In [1]:
library(data.table)

## Load dataframe of batches and parameters

In [2]:
df <- fread("/mnt/archived_notebooks/1-IN_Batch_parameters_GTNECd5-rgb.csv", colClasses = 'character')
batch_ID <- "GTNECd5-rgb" # No spaces
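The "no spaces" rule for `batch_ID` can be enforced before any files are written, since `batch_ID` becomes a directory name. A minimal sketch; `validate_batch_id` is a hypothetical helper, not part of GMOnotebook.

```r
# Hypothetical helper: stop early if batch_ID is empty or contains whitespace
validate_batch_id <- function(id) {
  if (length(id) != 1 || is.na(id) || !nzchar(id)) {
    stop("batch_ID must be a single non-empty string")
  }
  if (grepl("[[:space:]]", id)) {
    stop("batch_ID must not contain spaces: '", id, "'")
  }
  invisible(TRUE)
}

validate_batch_id("GTNECd5-rgb")  # passes silently
tryCatch(validate_batch_id("GTNEC d5"),  # contains a space, so it errors
         error = function(e) message(conditionMessage(e)))
```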

In [3]:
dir_to_save_notebooks <- "/mnt/archived_notebooks/"

## Create notebook files from the template

In [4]:
options(stringsAsFactors=FALSE)

In [5]:
if(!dir.exists(dir_to_save_notebooks)){
    dir.create(dir_to_save_notebooks)
}

In [6]:
df

| Parameter | Dataset_1 |
|---|---|
| user_note | GTNEC_d5_test |
| data | /mnt/drives/Elements/Elements_25/GTNEC_GWAS_poplar_transformation_necrotic_test/day5/ |
| randomization_datasheet | /mnt/drives/Elements/Elements_25/GTNEC_GWAS_poplar_transformation_necrotic_test/GTNEC_labels.xlsx |
| missing_explant | Automatic |
| fluorophores | (GFP Chl Noise) |
| desired_wavelength_range | (500 900) |
| FalseColor_channels | (Chl GFP Noise) |
| FalseColor_caps | (200 400 200) |
| reporters | (GFP Chl) |
| pixel_threshold | 3 |


In [7]:
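# Row 1 of the sheet holds user_note; each dataset's notebook is named after it.
# unlist() turns the one-row data.table into a plain character vector.
notebook_names <- unlist(df[1, -1, with = FALSE], use.names = FALSE)
filenames <- file.path(dir_to_save_notebooks, batch_ID,
                       paste0(notebook_names, "_GMOdet-v0.9.ipynb"))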

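Because each notebook is named after its dataset's `user_note`, two datasets with the same note map to the same file: the copy step writes it twice, the first dataset's values fill every placeholder, and the second dataset gets no notebook of its own. A minimal sketch of a duplicate check, using hypothetical note values and a temporary output directory.

```r
# Hypothetical user_note values for three datasets; two collide
notes <- c("exp1_day5", "exp1_day10", "exp1_day5")

out_dir <- file.path(tempdir(), "example_batch")
notebook_paths <- file.path(out_dir, paste0(notes, "_GMOdet-v0.9.ipynb"))

# Duplicated notes would map several datasets onto one notebook file
dups <- unique(notes[duplicated(notes)])
if (length(dups) > 0) {
  message("Duplicate user_note values: ", paste(dups, collapse = ", "))
}
```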
In [8]:
if(!dir.exists(unique(dirname(filenames)))){
    dir.create(unique(dirname(filenames)),
               recursive = TRUE)
}

In [9]:
for(filename in filenames){
    file.copy("/home/gmobot/GMOnotebook/2a_Deploy_workflow/GMOdetector_template_v0.9.ipynb",
              filename,
              overwrite = TRUE)
}
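`file.copy()` is vectorised over `to` and returns one logical per destination, so the copy loop can also be written as a single call whose result is checked. A minimal sketch; the template here is a stand-in file created in a temporary directory, not the real GMOdetector template.

```r
# Stand-in template; the real template path is set in the cell above
template <- tempfile(fileext = ".ipynb")
writeLines('{"cells": [], "note": "ENTER_NOTE"}', template)

destinations <- file.path(tempdir(), c("a_GMOdet.ipynb", "b_GMOdet.ipynb"))

# `from` is recycled across all destinations; one TRUE/FALSE per copy
copied <- file.copy(template, destinations, overwrite = TRUE)
if (!all(copied)) {
  stop("Failed to copy template to: ",
       paste(destinations[!copied], collapse = ", "))
}
```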

## Replace dummy strings with desired paths/parameters

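A good way to do this is to run `sed` from `bash`, via R's `system` function. For each notebook and each parameter, the loop below runs a find-and-replace command of this form:

`sed -i -e 's#ENTER_DATA_PATH#/path/to/data/#g' notebook.ipynb`

The `-i` flag edits the file in place, `#` serves as the delimiter so that `/` in file paths needs no escaping, and the trailing `g` replaces every occurrence on a line.

To use `system` in R, we simply pass the desired command to `system` as a string.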
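The command string can be assembled and checked without running it. A minimal sketch; `escape_sed_replacement` and `build_sed_command` are hypothetical helpers, and both the quoting (`shQuote(type = "sh")`) and `sed -i` assume a POSIX shell with GNU sed. In a `sed` replacement, backslash, `&` (which inserts the matched text) and the delimiter `#` are special, so they are escaped.

```r
# Escape backslash, '&' and '#' for a sed replacement that uses '#' as delimiter
escape_sed_replacement <- function(x) {
  gsub("([\\\\&#])", "\\\\\\1", x, perl = TRUE)
}

# Build the full shell command; shQuote() wraps each argument in single quotes
build_sed_command <- function(find, replace, file) {
  expr <- paste0("s#", find, "#", escape_sed_replacement(replace), "#g")
  paste("sed -i -e", shQuote(expr, type = "sh"), shQuote(file, type = "sh"))
}

# Hypothetical path containing a '#', which would otherwise end the expression early
cmd <- build_sed_command("ENTER_DATA_PATH", "/data/run #2/", "nb.ipynb")
cat(cmd, "\n")
```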

In [10]:
strings <- data.frame(Parameter = c(
    "user_note", "data", "randomization_datasheet",
    "missing_explant", "fluorophores", "desired_wavelength_range",
    "FalseColor_channels", "FalseColor_caps", "reporters",
    "pixel_threshold", "reporter_threshold", "grid",
    "composite", "width", "height", "parallel",
    "segmentation_mode", "segmentation_model_key", "segmentation_model_path",
    "segmentation_model_type"),
                      String_to_Replace = c(
                          "ENTER_NOTE", "ENTER_DATA_PATH",
                          "ENTER_RANDOMIZATION_DATASHEET_PATH",
                          "ENTER_DENSENET_OPTION_OR_SHEET", "ENTER_FLUOROPHORES",
                          "ENTER_WAVELENGTHS", "ENTER_CHANNELS", "ENTER_CAPS",
                          "ENTER_REPORTERS", "ENTER_PIXEL_THRESHOLD",
                          "ENTER_REPORTER_THRESHOLD", "ENTER_GRID",
                          "ENTER_COMPOSITE_OPTION",
                          "ENTER_PLOT_WIDTH", "ENTER_PLOT_HEIGHT", "ENTER_PARALLEL_OPTION",
                          "ENTER_SEGMENTATION_MODE", "ENTER_HYP-SEGMENTATION_MODEL_KEY",
                          "ENTER_HYP-SEGMENTATION_MODEL_PATH", "ENTER_HYP-SEGMENTATION_MODEL_TYPE"
                ),
                      stringsAsFactors=FALSE
)
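Because `sed` runs once per placeholder, in the order the rows come out of `merge()` (sorted by `Parameter`), a placeholder that is a substring of another would corrupt the longer one if it were replaced first. A minimal sketch of a check over the placeholders defined above; `find_overlaps` is a hypothetical helper.

```r
# Placeholder strings copied from the `strings` table above
placeholders <- c(
  "ENTER_NOTE", "ENTER_DATA_PATH", "ENTER_RANDOMIZATION_DATASHEET_PATH",
  "ENTER_DENSENET_OPTION_OR_SHEET", "ENTER_FLUOROPHORES",
  "ENTER_WAVELENGTHS", "ENTER_CHANNELS", "ENTER_CAPS",
  "ENTER_REPORTERS", "ENTER_PIXEL_THRESHOLD",
  "ENTER_REPORTER_THRESHOLD", "ENTER_GRID", "ENTER_COMPOSITE_OPTION",
  "ENTER_PLOT_WIDTH", "ENTER_PLOT_HEIGHT", "ENTER_PARALLEL_OPTION",
  "ENTER_SEGMENTATION_MODE", "ENTER_HYP-SEGMENTATION_MODEL_KEY",
  "ENTER_HYP-SEGMENTATION_MODEL_PATH", "ENTER_HYP-SEGMENTATION_MODEL_TYPE"
)

# Return every (inner, outer) pair where `inner` occurs literally inside `outer`
find_overlaps <- function(x) {
  pairs <- expand.grid(inner = x, outer = x, stringsAsFactors = FALSE)
  pairs <- pairs[pairs$inner != pairs$outer, , drop = FALSE]
  hit <- mapply(grepl, pairs$inner, pairs$outer,
                MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  pairs[hit, , drop = FALSE]
}

stopifnot(!anyDuplicated(placeholders))
print(nrow(find_overlaps(placeholders)))  # 0: no placeholder contains another
print(find_overlaps(c("ENTER_REPORTER", "ENTER_REPORTER_THRESHOLD")))
```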

In [11]:
merged <- merge(strings, df, all.x=TRUE)
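With `all.x = TRUE`, every template parameter is kept, and any parameter absent from the sheet gets `NA` in the dataset columns; the replacement loop then writes the literal string `NA` over that placeholder. A minimal sketch that lists those parameters, using small hypothetical stand-ins for `strings` and `df` in which the sheet lacks `grid`.

```r
# Toy stand-ins for `strings` and `df` (values hypothetical)
strings_example <- data.frame(
  Parameter = c("user_note", "pixel_threshold", "grid"),
  String_to_Replace = c("ENTER_NOTE", "ENTER_PIXEL_THRESHOLD", "ENTER_GRID"),
  stringsAsFactors = FALSE
)
df_example <- data.frame(
  Parameter = c("user_note", "pixel_threshold"),
  Dataset_1 = c("exp1_day5", "3"),
  stringsAsFactors = FALSE
)

merged_example <- merge(strings_example, df_example, all.x = TRUE)

# Parameters with NA in any dataset column were not set in the sheet
dataset_cols <- setdiff(names(merged_example), c("Parameter", "String_to_Replace"))
incomplete <- !complete.cases(merged_example[, dataset_cols, drop = FALSE])
missing_params <- merged_example$Parameter[incomplete]
print(missing_params)
```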

In [12]:
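merged <- as.data.table(merged)  # Convert to data.table if it's not already

for(j in seq_along(filenames)){
    filename <- filenames[j]
    # Columns of merged: 1 = Parameter, 2 = String_to_Replace, 3+ = one per dataset
    dataset_column <- j + 2
    for(i in seq_len(nrow(merged))){
        # [[ ]] extracts a plain string rather than a one-cell data.table
        find <- merged[[2]][i]
        replace <- merged[[dataset_column]][i]
        # Parameters missing from the csv are NA after merge() and are
        # written into the notebook as the string "NA"

        # With '#' as the sed delimiter, '/' needs no escaping; backslash,
        # '&' (inserts the matched text) and '#' itself do. The placeholders
        # contain only letters, '_' and '-', so `find` is used as-is.
        replace <- gsub("([\\\\&#])", "\\\\\\1", replace, perl = TRUE)

        # shQuote() protects the sed expression and the file path from the shell
        command <- paste("sed -i -e",
                         shQuote(paste0("s#", find, "#", replace, "#g"), type = "sh"),
                         shQuote(filename, type = "sh"))
        system(command)  # system() waits for sed to finish before returning
    }
}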
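After the loop, each generated notebook can be scanned for placeholders that were never filled. A minimal sketch, assuming every placeholder in the template starts with `ENTER_` followed by capitals, `_` or `-` (true of the `strings` table above, but not verified against the template itself); `remaining_placeholders` is a hypothetical helper and the notebook below is a stand-in file.

```r
# Return the unique ENTER_* placeholders still present in a notebook file
remaining_placeholders <- function(path) {
  txt <- readLines(path, warn = FALSE)
  hits <- regmatches(txt, gregexpr("ENTER_[[:upper:]_-]+", txt))
  unique(unlist(hits))
}

# Stand-in notebook where one placeholder was left unfilled
nb <- tempfile(fileext = ".ipynb")
writeLines(c('"user_note = \\"exp1_day5\\"",',
             '"grid = \\"ENTER_GRID\\""'), nb)
print(remaining_placeholders(nb))
```

Applied to this batch, `lapply(filenames, remaining_placeholders)` would return one character vector per notebook, empty when every placeholder was replaced.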

In [13]:
print(paste0("Finished generating notebooks at: ", Sys.time()))

[1] "Finished generating notebooks at: 2023-12-25 14:53:38"
