<a href="https://colab.research.google.com/github/mengqinqqq/pepita/blob/master/modified_interactive_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hello!

Use the following procedure to run the zebrafish pipeline. Action items are **bolded**; the rest of the text provides rationale for what we're doing.

## Nuts and bolts

First, we'll need to establish a connection to Google Drive, from which we'll read in the image data, and to which we'll write some logging data and charts.

**Run the next code block to start the connection to Google Drive, and follow whatever instructions pop up to complete the connection.**

In [None]:
# Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')

Next, we'll need to get the latest zebrafish pipeline code for this script to use. **Run the next code block to download it from GitHub and print out the version we'll be using.**

In [None]:
# Get GitHub repository

import json
from urllib.request import urlopen
from zipfile import ZipFile

repo_url = 'https://github.com/mengqinqqq/pepita/archive/refs/heads/master.zip'
# Download the repository archive, closing the HTTP response once it has been read
with urlopen(repo_url) as response, open('/tmp/repo.zip', 'wb') as zip_file:
  zip_file.write(response.read())

with ZipFile('/tmp/repo.zip') as zip_file:
  zip_file.extractall(path = '/tmp/')

commit_url = 'https://api.github.com/repos/mengqinqqq/pepita/commits?per_page=1'
response = json.loads(urlopen(commit_url).read())
print('Got analysis pipeline at commit hash', response[0]['sha'][:12],
      f'({response[0]["author"]["login"]}: "{response[0]["commit"]["message"]}")')

The last piece of getting our infrastructure set up will be to ensure that all the pipeline's dependencies are installed -- by default some won't be. (Once they're installed, Python will need to be restarted to gain access to the new packages -- we'll do that next.) **Run the following command:**

In [None]:
!pip install -r /tmp/pepita-master/requirements.txt

Ok, assuming the previous command completed successfully, all dependencies are now installed; **restart the runtime (in the top left menu, Runtime > Restart runtime)**.

Once that's complete, the check marks in the code blocks above should be gone -- that's fine, those blocks have done their jobs and don't need to be rerun.

**Run the next code block to start importing what's needed to run the pipeline.** If it completes successfully, all dependencies are now properly in place.

In [None]:
import glob
import os
import sys

sys.path.append('/tmp/pepita-master')

import pipeline

## Getting relevant data

1. If the zebrafish images aren't stored in Google Drive yet, **upload a directory containing all the images for the experiment**, by going to Google Drive, hitting `+ New`, and uploading the *folder* (not a zip file).
  - You can obtain these from the lab share drive, at Active(Helens)/ma_lab/Microscopy/Zebrafish.
  - You can also obtain these from Dropbox, at Dropbox/Ethan/Project_INDIGO-Tox/Experiments/[Date]_[Experiment].
  - If you're on a computer without access to either of these locations:
    1. log in to Dropbox using the lab Avira credentials,
    1. navigate to the relevant folder at the location listed above,
    1. download the whole experiment folder,
    1. and unzip the zip file that gets saved in your Downloads folder.
  - Whichever way you get it, upload the whole experiment folder, with subfolders for each of the (1-3) plates inside it.

1. We will also need to **upload a CSV plate template indicating what condition each well represents** -- *this is a very important piece*, providing important information on what the image data means; without it, we just have unlabeled images.
  - An example template can be found in the GitHub repo [here](https://raw.githubusercontent.com/ma-lab-cgidr/PEPITA-tools/master/examples/plate-layout.csv).
  - The location and name of the CSV file on Google Drive will be configured lower down, so put it anywhere you like.

**Copy/paste a link to the relevant Benchling protocol below** for good traceability (in Benchling, go to the experiment, click Share in the top right, and copy the given link; double click here and replace the dummy link with the real one):

https://example.com

## Making Configuration Changes (CHANGE CODE HERE)

Next, we'll need to configure some settings to match the experiment being analyzed. **Edit the following code block so that all variables are correct**; see the comments for guidance.

Once everything looks good, **run the code block**.

In [None]:
#
# CHANGE STUFF HERE
#

# Common Parameters: Change the following for every experiment

CHECKERBOARD = True                                     # set to False if doing simple dose-response curve(s)
CONVERSIONS = {                                         # fill in drug dose conversions used in this experiment
    'AZM50': 'AZM 150μM',
    'GEN50': 'GEN 2μM',
    'NEO99': 'NEO 20μM'
}
EXPERIMENT_DATE = '2023-04-11'                          # enter the date of the experiment, so files created have meaningful names
FOLDER = '/content/drive/MyDrive/2023-04-11_AzmGenCombo'# fill in which directory contains the images for this experiment; use the Copy Path function in the kebab menu \o/

# Uncommon Parameters: We tend to leave the following the same, but modify them if needed

PLATE_CONTROL = 'Untreated 0μM'
PLATE_POSITIVE_CONTROl = 'NEO99'
PLATEFILE = '/content/drive/MyDrive/2023-04-11_AzmGenCombo/plate-template-both.csv' # use Copy Path again

# Generated Parameters: You probably don't need to change the following, but you can

chartfile = f'{FOLDER}/chart_{EXPERIMENT_DATE}.png'
imagefiles = sorted(glob.glob(f'{FOLDER}/**/*_CH1.tif', recursive=True))

#
# OK NOW WE'RE DONE CHANGING STUFF
#
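
Before moving on, it can help to confirm that the paths set above point at real files. The following is a minimal sketch, not part of the pipeline: `check_inputs` is a hypothetical helper that reports a missing image folder, a folder with no `*_CH1.tif` images, or a missing plate template.

```python
import glob
import os


def check_inputs(folder, platefile, pattern='**/*_CH1.tif'):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if not os.path.isdir(folder):
        problems.append(f'FOLDER does not exist: {folder}')
    elif not glob.glob(os.path.join(folder, pattern), recursive=True):
        # Same search pattern as the `imagefiles` line in the configuration block
        problems.append(f'No files matching {pattern} found under {folder}')
    if not os.path.isfile(platefile):
        problems.append(f'PLATEFILE does not exist: {platefile}')
    return problems
```

Calling `check_inputs(FOLDER, PLATEFILE)` after running the configuration block and printing whatever it returns should catch most copy/paste mistakes in the paths before the longer pipeline run starts.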

## Running the pipeline

One more code block needs to run before we can execute the pipeline. This just finishes setup, taking into account the variables you may have modified above. **Run the next code block to continue.**

In [None]:
# Configure output to go in a helpful location

config_file = '/tmp/pepita-master/config-ext.ini'

# Create the log subdirectories (named `subdir` to avoid shadowing the built-in `dir`)
for subdir in ('.cache', 'dose_response', 'imageops'):
  os.makedirs(f'{FOLDER}/log/{subdir}', exist_ok=True)

with open(config_file, 'w') as file:
    file.write('[Main]\n')
    file.write(f'log_dir = {FOLDER}/log\n')

# Look the other way... this is super janky. But it's needed atm for files to
# end up in the proper location
sys.argv[0] = '/tmp/pepita-master/pipeline.py'
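
To confirm the settings file came out as intended, it can be read back with Python's standard `configparser` module. This is a sketch under the assumption that the file is plain INI, as written by the block above; `read_log_dir` is a hypothetical helper, and how the pipeline itself parses the file is not shown here.

```python
import configparser


def read_log_dir(config_path):
    """Return the log_dir value from the [Main] section of an INI file."""
    # interpolation=None keeps any '%' characters in the path literal
    parser = configparser.ConfigParser(interpolation=None)
    # ConfigParser.read skips unreadable files and returns the list it parsed
    if not parser.read(config_path):
        raise FileNotFoundError(config_path)
    return parser['Main']['log_dir']
```

With the variables from the earlier blocks, `read_log_dir(config_file)` would be expected to return the same string as `f'{FOLDER}/log'`.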

Now that setup is complete, we should be able to run the analysis pipeline. **Run the following code block** -- this one may take a few minutes to complete.

In [None]:
# Go ahead and run the pipeline!

debug=1 # This slows things down a bit, but also makes for easier investigation
#         of problematic datapoints. Decrease to 0 for faster execution.
#         Increase to 2 or more at your own risk.

pipeline.main(imagefiles, chartfile=chartfile, checkerboard=CHECKERBOARD,
              conversions=CONVERSIONS, debug=debug, platefile=PLATEFILE,
              plate_control=[PLATE_CONTROL], plate_info=EXPERIMENT_DATE,
              plate_positive_control=[PLATE_POSITIVE_CONTROl],
              absolute_chart=True)

## Analysis

Deciding whether the results are high quality, or whether a rerun is warranted, takes some judgment. Some things to look at as you **evaluate the quality of the run**:
- How many missing squares are there in the `{EXPERIMENT_DATE}_96-well_schematic_heatmap_absolute_###########.png` chart?<sup>1</sup>
- What values do the control wells have in that same chart?<sup>2</sup>
- How tight are the groups clustering in the `chart_{EXPERIMENT_DATE}.png` chart?<sup>3</sup>
- How many major outliers are there in that same chart?<sup>4</sup>
- Looking at the axes in the `*checkerboard_###########.png` chart: Are the dose responses under single drug conditions decreasing monotonically (or something close)? Back-and-forthing responses, or flat ones for known ototoxins, usually make results hard to interpret.<sup>5</sup>
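
The last check in the list can also be done numerically if you copy the single-drug values out of the chart by hand. The sketch below is illustrative only: `is_roughly_decreasing` and its `tolerance` parameter are hypothetical, and the pipeline is not assumed to expose these values directly.

```python
def is_roughly_decreasing(responses, tolerance=0.0):
    """True if no response exceeds the previous one by more than `tolerance`.

    `responses` must be ordered from the lowest dose to the highest.
    """
    return all(later <= earlier + tolerance
               for earlier, later in zip(responses, responses[1:]))
```

For example, `is_roughly_decreasing([100, 60, 63, 20], tolerance=5)` accepts the small bump from 60 to 63, while the default `tolerance=0.0` rejects it. Note that a completely flat response also passes this check, so flat curves for known ototoxins still need to be spotted by eye.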

Debug images (if `debug` > 0 in the previous code block) can be found in the `log/` folder, inside the folder you uploaded.

**Rerun the pipeline as needed after making tweaks** -- either by changing variables above, or by adding or changing mask images (put them next to their corresponding zebrafish image, replacing `_CH#.tif` in the filename with `_mask.tif`).
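
The mask naming rule can be expressed as a small helper, which is handy for checking that a mask file has the name the pipeline will look for. This is a sketch: `mask_path_for` is a hypothetical name, and it assumes the `#` in `_CH#.tif` stands for one or more digits.

```python
import re


def mask_path_for(image_path):
    """Return the mask filename expected next to a `_CH#.tif` zebrafish image."""
    # Swap the trailing channel suffix (e.g. `_CH1.tif`) for `_mask.tif`
    mask_path, replaced = re.subn(r'_CH\d+\.tif$', '_mask.tif', image_path)
    if not replaced:
        raise ValueError(f'not a channel image: {image_path}')
    return mask_path
```

For an image at `plate1/A1_CH1.tif`, the helper returns `plate1/A1_mask.tif`; any path that does not end in a channel suffix raises a `ValueError` rather than silently returning the input unchanged.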

----

1: Missing squares indicate fish excluded from the analysis -- more than, say, 1-3 missing indicates a systematic problem with the plate: either several dead or missing fish, or some difference with the images making it hard for the pipeline to properly locate the fish. It's often worth looking into each missing square, and potentially adding a mask to help the pipeline know where the fish is.

2: Untreated controls should ideally have among the highest scores in the plate and cluster fairly closely with each other (a range of 0-5 is ideal, 5-15 is acceptable, 15-25 is a bit disappointing, >25 is a problem). Positive controls should have a value near 0. Wells with values outside these expectations should be looked at individually.

3: Wide clustering is generally just an indicator of poorer quality results, but doesn't necessarily mean the results should be discarded.

4: Major outliers are often a sign of poor image segmenting, requiring a manual mask, similar to (1).

5: This is generally a result of something gone wrong in the course of the experiment, and can't really be fixed on the analysis end.
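
The bands in note 2 can be turned into a quick check once the untreated control values have been read off the heatmap. The sketch below makes two assumptions that the notes do not spell out: "range" is taken to mean the highest control score minus the lowest, and a value sitting exactly on a boundary (5, 15, or 25) is placed in the better band. `grade_control_range` is a hypothetical helper, not part of the pipeline.

```python
def grade_control_range(scores):
    """Grade the spread (max - min) of untreated control scores."""
    spread = max(scores) - min(scores)
    if spread <= 5:
        return 'ideal'
    if spread <= 15:
        return 'acceptable'
    if spread <= 25:
        return 'a bit disappointing'
    return 'a problem'
```

For instance, controls scoring 80 and 92 have a spread of 12 and are graded `'acceptable'`.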