## Stepwise guide for running mcmicro on O2
### Step: 0
Open a new terminal window and log in to a transfer node with `ssh [ECOMMON ID]@transfer.rc.hms.harvard.edu`. From there, copy the raw data to your working directory in the O2 scratch folder: `rsync -avP [path to source data] /n/scratch/users/*/*/[working dir]`.
### Step: 1
In a new terminal window, log in to a login node with `ssh [ECOMMON ID]@o2.hms.harvard.edu`. Go to the working directory where you transferred your raw data: `cd /n/scratch/users/*/*/[working dir]`. Make sure the data is organized in the following format. The raw data from each slide/image needs to be in its own folder named `raw`, inside a folder for that image. All the image folders need to be placed under one higher-level folder (Dataset).

-- Main_Folder/ Working directory (Dataset)  
&emsp;&emsp;-- Image1/  
&emsp;&emsp;&emsp;&emsp;-- raw/  
&emsp;&emsp;-- Image2/  
&emsp;&emsp;&emsp;&emsp;-- raw/   
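
Before moving on, it can help to confirm the layout. The following is a minimal sketch, not part of mcmicro: `check_layout` is a hypothetical helper name, and it assumes only the folder structure shown above.

```bash
# check_layout DATASET_DIR
# Print every image folder that lacks a raw/ subfolder or whose raw/ is empty.
# The function returns non-zero if any problem is found.
check_layout() (
  status=0
  for sample in "$1"/*/; do
    [ -d "$sample" ] || continue            # skip when there are no subfolders
    if [ ! -d "${sample}raw" ]; then
      echo "missing raw/: $sample"
      status=1
    elif [ -z "$(ls -A "${sample}raw")" ]; then
      echo "empty raw/: $sample"
      status=1
    fi
  done
  exit "$status"                            # exits the subshell, not your terminal
)
```

After pasting the function into your terminal, run `check_layout .` from the working directory. No output means every image folder has a non-empty `raw/`.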

#### Optional: 
Use this script to sort rcpnl or ome.tif files into the correct mcmicro structure automatically. Type `module load java`, then create a new file called `sortfiles.nf`.

A new file can be created with any text editor. I use `vim`: type `vim sortfiles.nf`, press `i` to insert, paste the following code block, and save by pressing `esc` and then `:wq` + `enter`. If you want to use this script to sort ome.tiff files into a registration dir instead, replace `*.rcpnl` with the matching file pattern (e.g. `*.ome.tif`) and `$dir/raw` with the registration dir (e.g. `$dir/registration`). Depending on the age of your raw files or the machine the data was acquired on, the filename format may require the `@` in `.split('@').head()` to be replaced by a different character. Use the character that comes directly after the `LSPID` in the filename, so that the split isolates the `LSPID`, which then becomes the name of the folder the file is sorted into.

Type `nextflow sortfiles.nf` to run.



```groovy
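// Collect every .rcpnl file in the current directory
Channel.fromPath("*.rcpnl")
    .map{ it ->
      // Folder name = the part of the filename before the first '@'
      def dir = it.getBaseName().split('@').head();
      // Create <folder>/raw if needed, then move the file into it
      file("$dir/raw").mkdirs();
      it.moveTo("$dir/raw")
}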
```
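
Because the sort script moves files, it may be worth previewing which folder each file would land in before running it. This is a rough sketch in plain shell, not the Nextflow script itself: `preview_sort` is a hypothetical helper, and it assumes the same rule as above (folder name = everything before the first delimiter, `@` by default).

```bash
# preview_sort DIR [DELIMITER]
# For each .rcpnl file in DIR, print "file -> folder/raw/" without moving anything.
preview_sort() (
  delim="${2:-@}"                           # default delimiter is '@'
  for f in "$1"/*.rcpnl; do
    [ -e "$f" ] || continue                 # skip when nothing matches
    base=$(basename "$f" .rcpnl)            # filename without the extension
    # Strip everything from the first delimiter onwards
    echo "$(basename "$f") -> ${base%%"$delim"*}/raw/"
  done
)
```

For example, `preview_sort . _` shows what would happen if your filenames use `_` after the `LSPID` instead of `@`.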

### Step: 2
Again, within your working directory, create a `markers.csv` file in the following format. Make sure the number of markers matches the number of channels in the image, or the mcmicro quantification module will fail.

You can again use vim: type `vim markers.csv`, press `i` to insert, and paste directly from a CSV open in Excel if you want. Save by pressing `esc` and then `:wq` + `enter`.

```
cycle,marker_name
1,DNA_1
1,secondary_488
1,secondary_555
1,secondary_647
2,DNA_2
2,CCR4/7
2,background_555
2,background_647
3,DNA_3
3,CD163
3,CD19
3,CD3d
```
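
Since a marker/channel mismatch only shows up late in the pipeline, a quick row count can catch it early. The sketch below is a hypothetical helper (`count_markers` is not part of mcmicro) and assumes the two-column format shown above. It only counts rows; comparing the totals against the actual number of channels in your images is still up to you.

```bash
# count_markers FILE
# Print the number of marker rows per cycle, then the total.
count_markers() {
  # Skip the header (NR > 1) and blank lines (NF >= 2), count rows per cycle.
  # awk does not guarantee the order of "for (c in n)", so the output is sorted.
  awk -F, 'NR > 1 && NF >= 2 { n[$1]++; total++ }
           END { for (c in n) print "cycle " c ": " n[c]
                 print "total: " (total + 0) }' "$1" | sort
}
```

Run it as `count_markers markers.csv`. For the example file above, each of the three cycles should report 4 rows.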

### Step: 3
Within your working directory, create a `params.yml` file in the following format. Note: parameters may need to change if the image was collected using pixel binning or on a different microscope. Here the image was collected on a RareCyte Orion or HT14 without pixel binning. Change the `start-at` or `stop-at` parameters as needed for your situation.

You can again use vim: type `vim params.yml`, press `i` to insert, and paste the block below. Save by pressing `esc` and then `:wq` + `enter`.

```yaml
workflow:
  start-at: segmentation
  stop-at: quantification
  background: false
  segmentation-channel: 1 1
options:
  unmicst: --channel 1 --scalingFactor 0.5 --tool unmicst-duo --outlier 99.99
  s3seg: --maxima-footprint-size 15 --area-max 50000 --expand-size 6 --pixelSize 0.325 --mean-intensity-min 80
  mcquant: --masks nucleiRing.ome.tif cellRing.ome.tif cytoRing.ome.tif
modules:
  watershed:
    name: s3seg
    container: labsyspharm/s3segmenter
    version: 1.5.5-large
```

### Step: 4 
Save a copy of `markers.csv` and `params.yml` within each `image` folder.

#### Optional: 
Use the following bash script to automatically copy these files into each `image` folder.

Again, use vim within your working directory: type `vim cpmarkers.sh`, press `i` to insert, and paste the script below. Save by pressing `esc` and then `:wq` + `enter`.

To run, type `bash cpmarkers.sh`.

```bash
#!/bin/bash
for dir in LSP*
  do
  cp markers.csv "$dir"
  cp params.yml "$dir"
done
```

### Step: 5 
Move the `image` directories into a `Main Folder`. Every directory inside it will be included in the mcmicro submission.

I like to call this Main Folder something along the lines of `to_run`.

First type `mkdir to_run`, then `mv LSP* to_run`. This folder is now referred to as your `dataset` directory, while the parent directory remains the `working directory`.

### Step: 6
Create a new file called `submission.sh` and paste the following code block into it. (This can be anywhere you like; I just keep it in my working directory.)


A new file can be created with any text editor. I use `vim`: type `vim submission.sh`, press `i` to insert, paste the following code block, and save by pressing `esc` and then `:wq` + `enter`.


```bash
#!/bin/bash

# Use getopts to read the arguments: -p dataset folder, -t template script
while getopts ":p:t:" opt; do
  case $opt in
    p) p="$OPTARG"
    ;;
    t) t="$OPTARG"
    ;;
  esac
done

# store sub dirs (bash array, hence the bash shebang)
SLIDEDIRS=("$p"/*/)

# loop and submit all jobs
for dir in "${SLIDEDIRS[@]}"; do
    cd "$dir"
    name=$(basename "$dir")
    echo "$name" "$dir"
    sbatch --job-name "$name" "$t" "$dir"
    sleep 2
done
```
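
If you want to see what would be submitted before anything reaches the scheduler, a dry run along these lines may help. This is only a sketch: `preview_submissions` is a hypothetical helper that mirrors the loop above but prints each `sbatch` command instead of running it.

```bash
# preview_submissions DATASET_DIR TEMPLATE
# Print the sbatch command that would be issued for each subfolder,
# without submitting anything.
preview_submissions() (
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue               # skip when there are no subfolders
    echo "sbatch --job-name $(basename "$dir") $2 $dir"
  done
)
```

Call it with the same two paths you plan to pass to `submission.sh` as `-p` and `-t`. Using absolute paths for both is the safer choice, since the submission loop changes directory for every sample.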

### Step: 7
Create a template to run mcmicro on all images. This contains all the information of the various settings to be used when running mcmicro.

Some fields need to be replaced based on your situation.  
`#SBATCH -t`: this depends on the size of your dataset. Note that if you change it, you might need to change `#SBATCH -p` as well.  
Maximum time in each partition:  
short: 12 hours  
medium: 5 days  
long: 30 days  
For more on partitions, see `https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1586793641/How+to+choose+a+partition+in+O2`
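
The time limits listed above can be turned into a small lookup. This is a minimal sketch using only those three limits (12 hours, 5 days = 120 hours, 30 days = 720 hours); `pick_partition` is a hypothetical helper name, and O2 has other partitions that it does not cover.

```bash
# pick_partition HOURS
# Print the shortest partition whose time limit covers HOURS (a whole number).
pick_partition() {
  if   [ "$1" -le 12 ];  then echo short    # up to 12 hours
  elif [ "$1" -le 120 ]; then echo medium   # up to 5 days
  elif [ "$1" -le 720 ]; then echo long     # up to 30 days
  else
    echo "no listed partition allows $1 hours" >&2
    return 1
  fi
}
```

For example, `pick_partition 36` suggests which value to put after `#SBATCH -p` for a job that needs a day and a half.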

As before, create a new file called `submit_mcmicro.sh` and paste the following code block into it.

Type `vim submit_mcmicro.sh`, press `i` to insert, paste the following code block, and save by pressing `esc` and then `:wq` + `enter`.

```bash
#!/bin/bash
#SBATCH -p medium
#SBATCH -J nextflow_O2
#SBATCH -t 5-0
#SBATCH --mem=1G
#SBATCH --mail-type=END


# get the values
SAMPLEDIR="$1"
SAMPLEID="$(basename "$SAMPLEDIR")"

# load packages
module purge
module load java

# Run JM's custom script to generate the memory config. Toggle between the pre- and post-registration versions below.
# Comment out both config lines if using -resume or re-running after memory.config already exists
# Usage: /n/groups/lsp/mcmicro/tools/o2/config_pre_reg.sh -u -s "$SAMPLEDIR"
# -u  Set this when using unmicst --scalingFactor 0.5
# -s  Set this when using s3segmenter-large version
# Switch between pre and post registration memory config generation!
#
/n/groups/lsp/mcmicro/tools/o2/config_pre_reg.sh -s -u "$SAMPLEDIR" > "$SAMPLEDIR"/memory.config
#/n/groups/lsp/mcmicro/tools/o2/config_post_reg.sh -s -u "$SAMPLEDIR" > "$SAMPLEDIR"/memory.config

# Run
nextflow run labsyspharm/mcmicro \
    -profile        O2,WSI,GPU \
    --in            "$SAMPLEDIR" \
    -w              /n/scratch/users/"${USER:0:1}/$USER"/work \
    -c              /n/scratch/users/"${USER:0:1}/$USER"/base.config \
    -c              "$SAMPLEDIR"/memory.config \
    --publish_dir_mode link
```
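
The `-w` and `-c` paths in the template are built from your username: in bash, `${USER:0:1}` expands to its first character, so the path becomes `/n/scratch/users/<first letter>/<username>`. The sketch below reproduces that for any username so you can check where the template will look for `work` and `base.config`. `scratch_dir` is a hypothetical helper, and it uses `cut` instead of the bash-only substring syntax.

```bash
# scratch_dir USERNAME
# Print the per-user scratch folder: /n/scratch/users/<first letter>/<username>
scratch_dir() {
  first=$(printf '%s' "$1" | cut -c1)       # first character of the username
  echo "/n/scratch/users/$first/$1"
}
```

Running `scratch_dir "$USER"` on O2 prints the folder for your own account.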

### Step: 8
Create the `base.config` file. This can be located anywhere you like. I keep mine in my user scratch folder (see the `-c` path in the script above, and change it there if you choose a different location).

Move to the chosen directory and type `vim base.config`, press `i` to insert, paste the following code block, and save by pressing `esc` and then `:wq` + `enter`.

```groovy
process {
  withName:illumination   {
    cpus   = 4
    time   = '12h'
  }
  withName:ashlar         {
    cpus   = 1
    time   = '36h'
    queue  = 'medium'
  }
  withName: 'segmentation:worker'         {
    time   = '2h'
    queue  = 'gpu_quad'
  }
  withName:s3seg          {
    cpus   = 4
    time   = '2h'
  }
  withName:mcquant        {
    cpus   = 1
    queue  = 'medium'
    time   = '96h'
  }
}

report {
  enabled = false
}
```

### Step: 9
Submit the job!

Type the following from the directory where you saved the `submission.sh` script.

`bash submission.sh -p '/n/scratch/users/r/rjp21/project/to_run' -t '/n/scratch/users/r/rjp21/project/submit_mcmicro.sh'`  
where  
`-p` is the path to the main/dataset folder.  
`-t` is the path to the template file that you just created.

Type `sacct` into the terminal to see the jobs submitted.
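
With many slides, the `sacct` listing gets long, and a per-state count can be easier to read. The following is a sketch, not an O2-provided tool: `count_states` is a hypothetical helper that expects `JobName|State` lines on standard input, which is the shape `sacct` produces with its `-n` (no header), `-X` (one line per job) and `-P` (pipe-delimited) options.

```bash
# count_states
# Read "JobName|State" lines on stdin and print how many jobs are in each state.
count_states() {
  # Count by the second pipe-delimited field, then sort the lines by state name
  awk -F'|' 'NF >= 2 { n[$2]++ } END { for (s in n) print n[s], s }' | sort -k2
}
```

On O2 this would be used as `sacct -n -X -P --format=JobName,State | count_states`.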