# Tutorial 3: The phenopype workflow

Analysis of scientific images can be an iterative process that may require users to go back and forth, trying different processing steps and checking how to improve results. Later, once the best functions and appropriate settings have been found and efficient data collection has priority, image analysis should run efficiently and with minimal user input to increase throughput and reproducibility. In phenopype, users can choose between two workflows that suit different stages of scientific image analysis:

| Workflow | Use case | Operation principle | Code explicitness | Data reproducibility |
|:---|:---------|:------------|:---|:---|
| **Low throughput** | "prototyping" - self-education and evaluation on single images and small datasets | image analysis functions are written and stored in Python | high | low |
| **High throughput**  | "production" - default workflow for larger image datasets | image analysis functions are written and stored in YAML format | low | high |

In the **low throughput** workflow, users write a function stack directly in Python code. This is recommended for users who wish to familiarize themselves with the basic principles of computer vision, or who are working with only a handful of images. In contrast, the **high throughput** workflow is meant for the production stage, when image analysis should be efficient and data collection reproducible for other users and scientists. In this tutorial you will learn about the differences between the two workflows.

<div class="alert alert-block alert-success">

**Further resources related to the high throughput workflow**

*   [Window control covered in Tutorial 2](tutorial_2_phenopype_images.ipynb#Window-control)
*   [pype section in the API](https://mluerig.github.io/phenopype/api.html#pype-high-throughput-function)
*   [YAML indentation and separation](https://www.tutorialspoint.com/yaml/yaml_indentation_and_separation.htm)
*   [The structure of digital images](https://mluerig.github.io/phenopype/resources.html#computer-vision)
    
</div>

## Example task
    
Here the goal is to measure lateral armor plating in threespine stickleback (*Gasterosteus aculeatus*). First we need to draw a mask around the posterior region that contains the plates. Then we perform a thresholding operation inside the mask and retrieve the contours of the plates. The procedure to extract bone-plate area is the same in both workflows, but the workflows differ in the amount of explicit Python code and in reproducibility.

<center>
<br>
<div style="text-align: left" >
<img  src="_assets/images/luerig_2021_figure2.jpg">
    
**Fig. 1:** Workflow demonstration using a threespine stickleback (*Gasterosteus aculeatus*) stained with alizarin red. The traits to be extracted are the area and shape of the bone plates and, within the detected plates, the pixel intensities that denote bone density. Shown are the visual feedback (A) from the low throughput (B) and high throughput (C) workflows, which use the same functions but differ in how the user calls them: in the low throughput workflow all functions have to be coded explicitly in Python, whereas the high throughput routine parses the functions from human-readable YAML files to facilitate rapid user interaction and increase reproducibility (Figure from <a href="https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.13771">Lürig 2021</a>).
</div>
</center>

## Low throughput workflow

In the low throughput workflow, the output of every function needs to be passed on explicitly to the next step. First we use `load_image`, which imports the file as a three-channel numpy array (*ndarray*).

```python
import phenopype as pp

filepath = r"images/stickle1.jpg"

## load image as a three-channel numpy array
image = pp.load_image(filepath)
```
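For orientation, the sketch below builds a small synthetic array with the layout described above (height x width x 3 channels), so you can see how rows, columns and channels are indexed. It does not read a file. It assumes 8-bit values (0-255 per channel) and OpenCV's blue-green-red (BGR) channel order, as is typical for OpenCV-based loaders; check both for your own images before relying on them.

```python
import numpy as np

# Synthetic stand-in for a loaded image (NOT read from disk):
# shape = (height, width, channels), assumed 8-bit unsigned integers
height, width = 4, 6
image = np.zeros((height, width, 3), dtype=np.uint8)

# Assuming BGR channel order (OpenCV convention), index 2 is the red channel
image[1, 2] = (0, 0, 255)   # one pure-red pixel at row 1, column 2

print(image.shape)          # (4, 6, 3)
print(image.dtype)          # uint8
print(image[1, 2, 2])       # 255 -> red intensity of that pixel

# Selecting a single channel drops the last axis: a 2D array of the same width and height
red_channel = image[:, :, 2]
print(red_channel.shape)    # (4, 6)
```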

Next, run `create_mask`. Use left mouse clicks to trace an outline around some of the fish plates, right mouse clicks to remove erroneous points, and finish masking the region with `Enter`. Note that the segment connecting the last point to the starting point is drawn automatically, so there is no need to click on the starting point again. Then run all remaining code cells one by one. We will visualize the result at the end.



<center>
<br>
<div style="text-align: left; width:400px" >
<img src="_assets/images/masks2.gif">
    
<b>Fig. 2:</b> Draw a mask around the armor plates. Finish with `Enter`.

</div>
</center>

```python
## draw mask
mask = pp.preprocessing.create_mask(image, tool="polygon", window_max_dim=1200)
```

After drawing a mask around the bone plates, we pass the mask on to the `threshold` function. Thresholding converts the three-channel image array into a single-channel binary array of the same width and height (white denoting foreground, black denoting background).

```python
## thresholding converts multichannel to binary image
image_bin = pp.segmentation.threshold(image, method="adaptive",
                                      channel="red", blocksize=199,
                                      constant=5, mask=mask)
```

```
- decompose image: using red channel
- including pixels from 1 drawn masks
```
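An adaptive threshold compares each pixel with its own neighborhood rather than with one global cutoff. `blocksize` sets the neighborhood width in pixels, and `constant` is an offset subtracted from the local mean. The sketch below is a simplified NumPy re-implementation in the spirit of OpenCV's mean-based adaptive thresholding, not phenopype's code. It rests on several assumptions:

- BGR channel order, so red is index 2;
- a plain mean rather than a Gaussian-weighted one;
- foreground pixels are those darker than their surroundings.

The function names `local_mean` and `adaptive_threshold` are made up for illustration. The demo also shows the shape change from `(height, width, 3)` to `(height, width)` and how a mask restricts the result.

```python
import numpy as np

def local_mean(channel: np.ndarray, blocksize: int) -> np.ndarray:
    """Mean of the blocksize x blocksize neighborhood around every pixel.

    Borders are handled by repeating edge pixels. Window sums come from an
    integral image (2D cumulative sum), so the cost does not grow with blocksize.
    """
    if blocksize < 3 or blocksize % 2 == 0:
        raise ValueError("blocksize must be an odd number >= 3")
    r = blocksize // 2
    padded = np.pad(channel.astype(np.float64), r, mode="edge")
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = channel.shape
    b = blocksize
    window_sum = (integral[b:b + h, b:b + w] - integral[:h, b:b + w]
                  - integral[b:b + h, :w] + integral[:h, :w])
    return window_sum / (b * b)

def adaptive_threshold(image: np.ndarray, blocksize: int, constant: float,
                       mask=None, channel_index: int = 2) -> np.ndarray:
    """Simplified mean-based adaptive threshold (a sketch, not phenopype's code).

    A pixel becomes foreground (255) when it is at least `constant` darker
    than the mean of its neighborhood. channel_index=2 assumes BGR order (red).
    """
    channel = image[:, :, channel_index].astype(np.float64)
    is_fg = channel <= local_mean(channel, blocksize) - constant
    binary = np.where(is_fg, 255, 0).astype(np.uint8)
    if mask is not None:
        binary[~np.asarray(mask, dtype=bool)] = 0  # keep only pixels inside the mask
    return binary

# Demo on a synthetic 5 x 5 "image": uniform gray with one dark pixel in the center
demo = np.full((5, 5, 3), 100, dtype=np.uint8)
demo[2, 2, 2] = 0                       # dark pixel in the (assumed) red channel
binary = adaptive_threshold(demo, blocksize=3, constant=5)
print(demo.shape, "->", binary.shape)   # (5, 5, 3) -> (5, 5)
print(int(binary.sum() // 255))         # 1 foreground pixel
```

A larger `blocksize` averages over a wider neighborhood, so the same pixel can fall on either side of the threshold depending on this setting.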


```python
## detect contours on binary image
contours = pp.segmentation.detect_contour(image_bin, retrieval="ext", min_area=150)
```

```
- found 7 contours that match criteria
```


```python
## export contours
pp.export.save_annotation(contours, annotation_id="a", dirpath=r"_temp/output")
```

```
- creating new annotation file
- writing annotation of type "contour" with id "a" to "annotations.json"
```


```python
contours
```

```
{'info': {'annotation_type': 'contour', 'pp_function': 'detect_contour'},
 'settings': {'approximation': 'simple',
  'retrieval': 'ext',
  'offset_coords': [0, 0],
  'min_nodes': 3,
  'max_nodes': inf,
  'min_area': 150,
  'max_area': inf,
  'min_diameter': 0,
  'max_diameter': inf},
 'data': {'n_contours': 7,
  'coord_list': [array([[[1747,  381]],
          [[1747,  382]],
          [[1746,  383]],
          [[1745,  382]],
          [[1743,  382]],
          [[1742,  383]],
          [[1741,  383]],
          [[1739,  385]],
          [[1739,  388]],
          [[1738,  389]],
          [[1738,  391]],
          [[1737,  392]],
          [[1737,  394]],
          [[1734,  397]],
          [[1735,  398]],
          [[1735,  399]],
          [[1736,  400]],
          [[1736,  402]],
          [[1737,  403]],
          [[1737,  407]],
          [[1738,  408]],
          [[1740,  408]],
          ...
```
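The `contours` object is a plain dictionary, so the coordinate arrays in `contours["data"]["coord_list"]` (each of shape `(n_points, 1, 2)`, as in the output above) can be processed with NumPy directly. As an illustration, the sketch below computes the area enclosed by each contour with the shoelace formula and mimics the `min_area` setting shown above. The helper names and the two square contours are hypothetical. It is also only an assumption that `detect_contour` measures area the same way (OpenCV's `cv2.contourArea`, for instance, computes polygon area from the vertices).

```python
import numpy as np

def polygon_area(coords: np.ndarray) -> float:
    """Area enclosed by a contour, computed with the shoelace formula.

    `coords` uses OpenCV's contour layout: shape (n_points, 1, 2) with (x, y) pairs.
    """
    pts = coords.reshape(-1, 2).astype(np.float64)
    x, y = pts[:, 0], pts[:, 1]
    # cross products of consecutive vertices; np.roll wraps the last vertex to the first
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def filter_by_area(coord_list, min_area=0.0, max_area=np.inf):
    """Keep only contours whose area lies within [min_area, max_area]."""
    return [c for c in coord_list if min_area <= polygon_area(c) <= max_area]

# Hypothetical contours in the same layout as contours["data"]["coord_list"]
big_square = np.array([[[0, 0]], [[20, 0]], [[20, 20]], [[0, 20]]])   # area 400
small_square = np.array([[[0, 0]], [[5, 0]], [[5, 5]], [[0, 5]]])      # area 25

kept = filter_by_area([big_square, small_square], min_area=150)
print(len(kept))               # 1
print(polygon_area(kept[0]))   # 400.0
```

Note that a polygon area computed from boundary vertices is not identical to a count of foreground pixels, so small differences between the two measures are expected.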

Next we can visualize the contours found by `detect_contour`. Note that we first have to draw them explicitly on a "canvas", i.e. a background image for visualization. We could draw them directly on the original image, but that would alter it and make it unusable for further work. It is better practice to draw on a copy made with the `copy` module:

```python
import copy

## copy the image
canvas = copy.deepcopy(image)

## draw detected contours onto canvas
image_drawn = pp.visualization.draw_contour(canvas, contours)

## show canvas
pp.show_image(image_drawn)
```
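The copy matters because a NumPy array is a mutable object: assigning it to a new name does not duplicate the pixel data. The standalone sketch below uses a tiny synthetic array in place of the loaded image. It writes a pixel directly as a stand-in for a drawing function that modifies the array it receives, which is how OpenCV's drawing routines behave.

```python
import copy

import numpy as np

image = np.zeros((3, 3, 3), dtype=np.uint8)  # synthetic stand-in for a loaded image

# Plain assignment creates a second name for the SAME array, not a copy
alias = image
alias[0, 0] = (0, 255, 0)             # "draw" a green pixel (BGR order assumed)
print(image[0, 0].tolist())           # [0, 255, 0] -> the original changed too

# A real copy owns its own memory, so drawing on it leaves the original intact
image[:] = 0                          # reset the original
canvas = copy.deepcopy(image)         # np.copy(image) or image.copy() work as well
canvas[0, 0] = (0, 255, 0)
print(image[0, 0].tolist())           # [0, 0, 0]
print(np.shares_memory(image, canvas))  # False
```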

While analyzing the image, you can explore output from the different steps to see what is going on. For example, the binary image resulting from the thresholding: 

```python
pp.show_image(image_bin)
```

## High throughput workflow

This is the default workflow to analyse medium and large image datasets in phenopype. Instead of writing the analysis as a sequence of Python code, as in the low throughput workflow, we supply the same functions through a configuration file in human-readable `YAML` format. This file is then loaded by phenopype's `pype` class, which initiates the analysis by triggering three actions:

1. open the YAML configuration file in the default OS text editor
2. parse the contained functions and execute them in sequence
3. open a HighGUI window showing the processed image, which updates with every step

After one iteration of all steps, users can evaluate the results. They can then either modify the opened configuration file (e.g. change function parameters or add new functions), which makes `pype` run again, or terminate the `pype` run and save all results. The processed image, any extracted phenotypic information and the modified configuration file are stored inside a phenopype `container` and in the directory specified with `dirpath`.
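Conceptually, the edit-and-rerun cycle only requires noticing that the configuration file has been saved again. The sketch below illustrates that idea with the Python standard library by polling the file's modification time. The names `config_changed`, `watch`, `on_change` and `should_stop` are hypothetical, and this is not how `pype` itself is implemented; it may, for example, react to file-system events instead of polling.

```python
import os
import time
from typing import Callable, Tuple

def config_changed(path: str, last_mtime: float) -> Tuple[bool, float]:
    """Return (has the file changed?, its current modification time)."""
    mtime = os.path.getmtime(path)
    return mtime != last_mtime, mtime

def watch(path: str, on_change: Callable[[str], None],
          should_stop: Callable[[], bool], interval: float = 0.5) -> int:
    """Run `on_change` once, then again after every save, until `should_stop()`.

    Returns how many times `on_change` was called (hypothetical helper).
    """
    last_mtime = os.path.getmtime(path)
    on_change(path)          # first run: parse and execute the whole stack
    runs = 1
    while not should_stop():
        time.sleep(interval)
        changed, last_mtime = config_changed(path, last_mtime)
        if changed:
            on_change(path)  # the file was saved again: re-run the whole stack
            runs += 1
    return runs
```

Polling is the simplest portable choice. An event-based file watcher avoids the sleep interval but needs an extra dependency.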



<center>
<div style="width:600px; text-align: left" >
    
<center><img src="_assets/images/luerig_2021_figure_3C.jpg" width=400></center>    
    
**Fig. 3:** Running the Pype class in a for-loop triggers a series of events for each image directory provided by the loop generator:

1. open the contained YAML configuration file with the default OS text editor;
2. parse and execute the contained functions from top to bottom;
3. open a GUI window and show the processed image.

Once the Pype class has finished executing all functions from the configuration file, users can do one of two things. They can modify the opened configuration file (e.g. change function parameters or add new functions), which makes the Pype class run again. Or they can close the GUI window, which terminates the Pype class instance and saves all results to the folder. By placing the Pype class in a simple Python for-loop over all project folders, users can continue this procedure throughout the entire project dataset without having to manually open, close or save any of the images.
    
    
</div>
</center>

<div class="alert alert-block alert-danger">

**IMPORTANT - read before continuing:**
    
1. Window control works as covered in [Tutorial 2](tutorial_2_phenopype_images.ipynb#Window-control). Don't use the close button, and make sure that the window is selected / highlighted when you use the key combinations to close or interact with it:
    
    - `Enter` - finish an interactive step / function in `pype`-mode (e.g. creating a mask)
    - `Ctrl+Enter` - close and finish a window in `pype`-mode
    - `Esc` - close a window and quit the phenopype process that invoked it (e.g. a `for` loop, [see Tutorial 4](tutorial_4_managing_projects.ipynb#Using-pype-with-project-folders)). This may also work when the process is frozen.
    
2. At the current stage of development, phenopype cannot handle errors resulting from incorrect YAML syntax (e.g. missing spaces or wrong indentation). Consult the section [YAML syntax (below)](#YAML-syntax) to learn how to modify the configuration files correctly.

3. `pype` attempts to facilitate rapid processing by calling some functions automatically (e.g. to visualize and export the results). Consult the [pype section in the API](https://mluerig.github.io/phenopype/api.html#pype-high-throughput-function) to learn about the most important aspects of the `pype` function.
    
</div>  

```python
import phenopype as pp

filepath = r"images/stickle1.jpg"

pype_demo = pp.pype(
    image=filepath,           # input: can also be an array or a phenopype directory
    dirpath=r"_temp/output",  # directory where output is stored (folder needs to exist)
    name="demo",              # name of the pype routine, appended to all result files
    template="tut3",          # template for the analysis - you can create your own!
)
```

All results and intermediate data are now saved under `"_temp/output"`. They are also held in a container that is part of the `pype` object, which we called `pype_demo`. You can use it to access, e.g., the binary image, as we did in the low throughput workflow above, and visualize it manually:

```python
pp.show_image(pype_demo.container.image_bin)
```

Of course you can modify the configuration file to change the outcome. For instance, try changing the `blocksize` argument of `threshold` to `49` or `499` and see what happens. If you save the file while the window is open, phenopype will update the image window and show the new results.

### YAML syntax

The configuration files needed to run `pype` are written in YAML (a recursive acronym for "YAML Ain't Markup Language"). In principle, these are just text files that follow a specific set of rules for indentation and separation. Let's look at the configuration template we used above for this tutorial. We can list *all* templates using `pype_config_templates`, and inspect a specific template using `show_config_template`:

```python
import phenopype as pp

pp.pype_config_templates
```

```python
pp.show_config_template("tut3")
```

The text inside the YAML configuration files is parsed from top to bottom and translated in the background into calls to phenopype modules and functions. The indentation hierarchy is as follows:

1. The first level, without any indentation (e.g. `- preprocessing` or `- segmentation`), denotes the module that a function belongs to.
2. The second level, with two-space indentation before the hyphen (e.g. `- threshold` or `- detect_contour`), contains the functions that are loaded from that module, here `segmentation`.
3. The third level, without hyphens (e.g. `method: adaptive` and `blocksize: 199`), contains the arguments passed on to the function.

Following this notation, the yaml parser in Python interprets the first item in `segmentation` as follows:

`pp.segmentation.threshold(image, method="adaptive", blocksize=199, constant=5, channel="red")`

When running the pype routine, `image` is loaded automatically and passed to all following functions. You can add or remove functions as you like. Note that in the hyphenated first two levels you can specify modules and functions as many times as you want (`- ` is the YAML list notation). When adding or modifying modules and functions, keep in mind that the **function stack is executed sequentially**. So, if you want to perform a `morphology` operation on a binary image, it should come *after*, not before, the main segmentation function (in this case `threshold`).
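To make the top-to-bottom execution concrete, here is a minimal sketch of how a parsed configuration could be dispatched to functions. It assumes the YAML has already been loaded into nested lists and dicts, as a YAML parser would produce. A function given without arguments arrives as a plain string, and one with arguments arrives as a one-key dict. It uses two made-up toy functions (`threshold`, `invert`) on nested lists instead of phenopype's functions, and `run_stack` is a hypothetical name; this is not phenopype's parser.

```python
from typing import Any, Callable, Dict, List

# Toy stand-ins for a phenopype module (hypothetical, NOT phenopype's functions).
# "Images" are nested lists of gray values to keep the example dependency-free.
def threshold(image, value=128):
    """Pixels brighter than `value` become 255, all others 0."""
    return [[255 if px > value else 0 for px in row] for row in image]

def invert(image):
    """Swap foreground and background."""
    return [[255 - px for px in row] for row in image]

MODULES: Dict[str, Dict[str, Callable]] = {
    "segmentation": {"threshold": threshold, "invert": invert},
}

def run_stack(image, config: List[Dict[str, Any]], modules=MODULES):
    """Execute a parsed config top to bottom, feeding each output into the next call."""
    for module_entry in config:                   # e.g. {"segmentation": [...]}
        for module_name, functions in module_entry.items():
            for fn_entry in functions or []:      # an empty module parses as None
                if isinstance(fn_entry, str):     # function without arguments
                    fn_name, kwargs = fn_entry, {}
                else:                             # e.g. {"threshold": {"value": 100}}
                    (fn_name, kwargs), = fn_entry.items()
                    kwargs = kwargs or {}         # function with an empty argument block
                image = modules[module_name][fn_name](image, **kwargs)
    return image

# Parsed form of a hypothetical config with one module and two functions
config = [{"segmentation": [{"threshold": {"value": 100}}, "invert"]}]
print(run_stack([[50, 150]], config))      # [[255, 0]]

# Same functions in the opposite order give a different result
reordered = [{"segmentation": ["invert", {"threshold": {"value": 100}}]}]
print(run_stack([[50, 150]], reordered))   # [[255, 255]]
```

Because each function receives the previous function's output, swapping two entries changes what every later step sees. This is why a `morphology` step belongs after `threshold`.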


<div class="alert alert-block alert-info">
    
**Here are the most important rules for YAML syntax:**

- **indentation rules:**  
    - 0 spaces for modules
    - 2 spaces + hyphen+space in front of functions 
    - 4 spaces in front of arguments
- **separation rules:** 
    - modules and functions with arguments are followed by a colon (`:`) and a new line
    - functions without specified arguments don't need a colon 
    - arguments are followed by a colon, a space and then the value
- modules and functions can be empty (see `- draw_masks` above), but function arguments *cannot* be empty (e.g. `overwrite:` needs to be `true` or `false`)
- as per Python syntax, optional function arguments can, but don't have to be specified and the functions will just run on default values
- functions can be added multiple times, but sometimes their output may be overwritten (e.g. `- threshold` makes sense only once, but `- blur` may be used in multiple locations)
    
</div>



To learn how to analyze entire datasets with the high throughput workflow, move on to [Tutorial 4](tutorial_4_managing_projects.ipynb).