# Prepare Jupyter notebooks for each batch

This workflow runs batches of datasets through GMOdetector.

We define a single dataset as a set of images that are typically collected on the same day, in the same folder. An experiment may include a batch of data for each of several timepoints.

How to perform batch analysis:
1. Fill in a spreadsheet in `csv` format just like the example provided in this folder, `1-IN_Batch_parameters.csv`. This should have one column for each dataset. Fill out each path or parameter according to the criteria and formatting described in the [workflow notebook](https://github.com/naglemi/GMOnotebook/blob/master/2a_Deploy_workflow/GMOdetector_template_v0.62.ipynb) and [tutorials on setting parameters](https://github.com/naglemi/GMOnotebook/tree/master/1_Decide_parameters).
2. Below, enter the path to this `csv`, the directory in which to save the notebooks, and a descriptive identifier for the batch.
3. Run this notebook, which will create a new notebook for each dataset inside a folder named by your `batch_ID`.
4. Run the next notebook, `2_Deploy_batch_of_notebooks.ipynb`, to launch all of the notebooks created here.

In [1]:
library(data.table)

## Load dataframe of batches and parameters

In [2]:
df <- fread("1-IN_Batch_parameters_GSG_first_run_PTC_PTDwk6.csv", colClasses = 'character')
batch_ID <- "GSG_PTC_PTD_wk6" # No spaces

In [1]:
dir_to_save_notebooks <- "/mnt/output/notebooks/"

## Initiate files from template

In [3]:
options(stringsAsFactors=FALSE)

In [4]:
df <- t(df)
colnames(df) <- df[1, ]
df <- as.data.frame(df[-1, ], stringsAsFactors = FALSE)
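
To see what the transposition above does, here is a minimal sketch on a made-up two-dataset sheet. The names `toy`, `dataset_1`, `dataset_2` and all values are hypothetical, and a plain `data.frame` stands in for the `data.table` returned by `fread`.

```r
# Hypothetical sheet: one row per parameter, one column per dataset
toy <- data.frame(Parameter = c("user_note", "grid"),
                  dataset_1 = c("wk6_PTC", "12"),
                  dataset_2 = c("wk6_PTD", "20"),
                  stringsAsFactors = FALSE)

# Transpose so that each dataset becomes a row
toy_t <- t(toy)

# The first row now holds the parameter names; promote it to column names
colnames(toy_t) <- toy_t[1, ]

# drop = FALSE keeps a matrix even if the sheet has only one dataset column
toy_t <- as.data.frame(toy_t[-1, , drop = FALSE], stringsAsFactors = FALSE)

# Each parameter can now be accessed by name, one value per dataset
toy_t$user_note
```

After this step, a column such as `user_note` holds one value per dataset, which is what the filename cell further down relies on.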

In [3]:
# recursive = TRUE also creates any missing parent directories
if(!dir.exists(dir_to_save_notebooks)){
    dir.create(dir_to_save_notebooks, recursive = TRUE)
}


In [5]:
filenames <- paste0(dir_to_save_notebooks, "/",
                    batch_ID, "/",
                    df$user_note,
                    "_GMOdetv0.62.ipynb")

In [6]:
# Create the batch folder that `filenames` points into
batch_dir <- paste0(dir_to_save_notebooks, "/", batch_ID)
if(!dir.exists(batch_dir)){
    dir.create(batch_dir, recursive = TRUE)
}

In [7]:
filenames
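
Because each filename is built from `user_note`, two datasets with the same note would map to the same notebook and one would overwrite the other. The following is a minimal sketch of a check one might run before copying; the vector `user_notes` and its values are hypothetical stand-ins for `df$user_note`.

```r
# Hypothetical notes; the third entry duplicates the first
user_notes <- c("wk6_PTC", "wk6_PTD", "wk6_PTC")

# Notes that occur more than once would produce colliding filenames
dup <- unique(user_notes[duplicated(user_notes)])

# Whitespace in a note ends up in the filename, which complicates shell commands
has_space <- grepl("\\s", user_notes)

if (length(dup) > 0) {
    message("Duplicated user_note values: ", paste(dup, collapse = ", "))
}
if (any(has_space)) {
    message("user_note values containing whitespace: ",
            paste(user_notes[has_space], collapse = ", "))
}
```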

In [8]:
for(filename in filenames){
    file.copy("../2a_Deploy_workflow/GMOdetector_template_v0.62.ipynb",
              filename,
              overwrite = TRUE)
}
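
`file.copy` returns `FALSE` rather than raising an error when a copy fails, for example when the target folder does not exist. Below is a small sketch of how the return values could be checked. It uses throwaway files under `tempdir()`; the names `template`, `out_dir` and `targets` are made up for illustration.

```r
# Stand-in for the template notebook
template <- tempfile(fileext = ".ipynb")
writeLines("{}", template)

# Stand-in for the batch folder and the per-dataset notebook paths
out_dir <- file.path(tempdir(), "batch_demo")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
targets <- file.path(out_dir, c("a.ipynb", "b.ipynb"))

# Copy the template to every target and record whether each copy succeeded
copied <- vapply(targets,
                 function(f) file.copy(template, f, overwrite = TRUE),
                 logical(1))

# Stop early if any notebook could not be created
if (!all(copied)) {
    stop("Could not create: ", paste(targets[!copied], collapse = ", "))
}
```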

## Replace dummy strings with desired paths/parameters

A good way to do this is using `sed` in `bash`, via R's `system` function.

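Example `sed` find and replace command, with `#` as the delimiter so that slashes in paths need no escaping: `sed -i -e 's#STRING_TO_REPLACE#NEW_VALUE#g' notebook.ipynb` (the uppercase parts and the filename are placeholders).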

To use `system` in R, we simply pass the desired command to `system` as a string.
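
As a minimal sketch of this idea, the snippet below builds one `sed` command as a string and runs it on a throwaway file. It assumes GNU `sed` is on the path; the file contents and the value `wk6_PTC` are made up for illustration.

```r
# Throwaway file standing in for a notebook copied from the template
tmp <- tempfile(fileext = ".ipynb")
writeLines('{"source": "note = ENTER_NOTE"}', tmp)

# Build the command; shQuote protects the path from the shell
command <- paste0("sed -i -e 's#", "ENTER_NOTE", "#", "wk6_PTC", "#g' ",
                  shQuote(tmp))

# system() returns the exit status of the command (0 means success)
status <- system(command)

# Read the file back to confirm that the placeholder was replaced
result <- readLines(tmp)
```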

In [9]:
strings <- data.frame(Parameter = c(
    "user_note", "data", "randomization_datasheet",
    "missing_explant", "fluorophores", "desired_wavelength_range",
    "FalseColor_channels", "FalseColor_caps", "reporters",
    "pixel_threshold", "reporter_threshold", "grid",
    "pre_aligned_resized_grid_borders", "aligned_grid_borders",
    "mode", "homography", "hypercube_csv", "aligned_grid",
    "composite", "test_align_each_img", "width", "height", "parallel",
    "segmentation_mode", "segmentation_model_key", "segmentation_model_path",
    "segmentation_model_type"),
                      String_to_Replace = c(
                          "ENTER_NOTE", "ENTER_DATA_PATH",
                          "ENTER_RANDOMIZATION_DATASHEET_PATH",
                          "ENTER_DENSENET_OPTION_OR_SHEET", "ENTER_FLUOROPHORES",
                          "ENTER_WAVELENGTHS", "ENTER_CHANNELS", "ENTER_CAPS",
                          "ENTER_REPORTERS", "ENTER_PIXEL_THRESHOLD",
                          "ENTER_REPORTER_THRESHOLD", "ENTER_GRID",
                          "ENTER_PRE_ALIGNED_GRID_BORDERS", "ENTER_ALIGNED_GRID_BORDERS",
                          "ENTER_ALIGNMENT_MODE", "ENTER_HOMOGRAPHY_NPY", 
                          "ENTER_HYPERCUBE_TO_CSV", "ENTER_ALIGNED_GRID",
                          "ENTER_COMPOSITE_OPTION", "ENTER_TEST_ALIGN_OPTION",
                          "ENTER_PLOT_WIDTH", "ENTER_PLOT_HEIGHT", "ENTER_PARALLEL_OPTION",
                          "ENTER_SEGMENTATION_MODE", "ENTER_HYP-SEGMENTATION_MODEL_KEY",
                          "ENTER HYP-SEGMENTATION_MODEL_PATH", "ENTER_HYP-SEGMENTATION_MODEL_TYPE"
                ),
                      stringsAsFactors=FALSE
)

In [10]:
t_df <- t(df)

In [11]:
merged <- as.data.table(cbind(strings, t_df))
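
Note that `cbind` pairs rows by position, not by name, so the rows of the spreadsheet need to be in the same order as `strings$Parameter`. The sketch below shows one way this could be checked and corrected before merging. It assumes the row names of the transposed sheet are the parameter names; all `*_toy` objects and their values are hypothetical.

```r
strings_toy <- data.frame(
    Parameter = c("user_note", "data", "grid"),
    String_to_Replace = c("ENTER_NOTE", "ENTER_DATA_PATH", "ENTER_GRID"),
    stringsAsFactors = FALSE)

# Hypothetical transposed sheet whose rows are in a different order
t_df_toy <- matrix(c("/mnt/data/a", "wk6_PTC", "12",
                     "/mnt/data/b", "wk6_PTD", "20"),
                   ncol = 2,
                   dimnames = list(c("data", "user_note", "grid"),
                                   c("dataset_1", "dataset_2")))

# Every expected parameter should be present in the sheet
missing_params <- setdiff(strings_toy$Parameter, rownames(t_df_toy))
stopifnot(length(missing_params) == 0)

# Reorder the sheet's rows to match the order of the placeholder table
t_df_toy <- t_df_toy[match(strings_toy$Parameter, rownames(t_df_toy)), ,
                     drop = FALSE]

merged_toy <- cbind(strings_toy, t_df_toy)
```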

In [12]:
# Written for GNU sed. Values containing '#' or '&' would be
# interpreted by sed rather than inserted literally.
for(j in seq_along(filenames)){
    filename <- filenames[j]
    # Columns 1-2 of `merged` hold Parameter and String_to_Replace,
    # so dataset j is in column j + 2
    batchs_column_in_merged <- j + 2
    for(i in seq_len(nrow(merged))){
        find <- merged[[2]][i]
        replace <- merged[[batchs_column_in_merged]][i]

        # '#' is the sed delimiter, so slashes in paths need no escaping
        command <- paste0("sed -i -e 's#",
                          find,
                          "#",
                          replace,
                          "#g' ",
                          shQuote(filename))
        #cat(command, "\n")
        system(command)
        Sys.sleep(0.5)
    }
}
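
If `sed` is unavailable, or if values may contain characters that `sed` treats specially, the same replacement could be done in R alone. The sketch below is an alternative to the loop above, not the method this notebook uses. The helper `replace_placeholders` is a hypothetical name and the file contents are made up.

```r
# Replace each placeholder in a file with its value, matching literally
replace_placeholders <- function(path, find, replace) {
    lines <- readLines(path, warn = FALSE)
    for (k in seq_along(find)) {
        # fixed = TRUE matches the placeholder as a plain string, not a regex
        lines <- gsub(find[k], replace[k], lines, fixed = TRUE)
    }
    writeLines(lines, path)
    invisible(path)
}

# Usage on a throwaway file
tmp2 <- tempfile(fileext = ".ipynb")
writeLines(c('data = "ENTER_DATA_PATH"', 'note = "ENTER_NOTE"'), tmp2)
replace_placeholders(tmp2,
                     find = c("ENTER_DATA_PATH", "ENTER_NOTE"),
                     replace = c("/mnt/data/run#1", "A&B"))
out <- readLines(tmp2)
```

As with the `sed` loop, placeholders are applied in order, so a placeholder that is a prefix of another (such as `ENTER_ALIGNED_GRID` and `ENTER_ALIGNED_GRID_BORDERS`) should come after the longer one.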

In [13]:
print(paste0("Finished generating notebooks at: ", Sys.time()))

[1] "Finished generating notebooks at: 2022-10-11 14:20:32"
