# Tutorial 3: The three phenopype workflows

Analysis of scientific images can be an iterative process that requires users to go back and forth, trying different processing steps and checking how they improve the results. Later, when the best functions and appropriate settings have been found and data collection has priority, image analysis should be efficient and require minimal user input to increase throughput and reproducibility. In phenopype, users can choose from three different workflows that are useful at different stages of scientific image analysis.

In all three workflows, users assemble a stack (or sequence) of computer vision functions to analyze an image (Fig. 1). The functions are available from phenopype's five core modules: **preprocessing, segmentation, measurement, export and visualization** ([see API reference](api.html#image-analysis-core-modules)). The major difference between the three workflows lies in the degree of code-explicitness, i.e. the need to write code for every step of the analysis, and in the degree of reproducibility, i.e. documentation of code and the results it produces.

| Workflow | Use case | Operation principle | Code explicitness | Data reproducibility |
|:---|:---------|:------------|:---|:---|
| **Prototyping** | analysis prototyping, self-education and evaluation | images are loaded as arrays and functions are applied one by one | High | Low |
| **Low throughput** | single pictures and very small datasets | images are loaded into phenopype containers | Medium | Low |
| **High throughput**  | medium and large datasets - default analysis workflow | images are loaded from a phenopype project directory tree, and analyzed with the [pype](api.html#pype-high-throughput-function) method | Low | High |

In the **prototyping** and **low throughput** workflows, users write a phenopype function stack directly in Python code. This is recommended for users who wish to familiarize themselves with the basic principles of computer vision and to explore the phenopype function library. The **high throughput** workflow is for the production stage, when image analysis should be efficient and data collection reproducible for other users and scientists. In this tutorial, you will learn the differences between the three workflows.

<center>
<div style="width:600px; text-align: left" >
    
![Phenopype workflow example](_assets/workflow_example_case.png)
    
**Fig. 1:** Workflow demonstration using a stickleback (*Gasterosteus aculeatus*) stained with alizarin red. Traits of interest are bone-plate area and shape, and, within the detected plates, pixel intensities that denote bone density. The computer vision functions used to extract the traits of interest (bone-plate area, shape and pixel intensity) are the same in all workflows, but the workflows differ in the amount of code necessary and in reproducibility.
    
</div>
</center>

## Prototyping workflow

In the prototyping workflow, the output of every function needs to be explicitly passed on to the next step (as seen in [Tutorial 2](tutorial_2_phenopype_images.ipynb)). Every step can be run separately, or all in one go (if you merge all the code cells). This is useful for phenopype beginners who want to explore what the different functions do, or for troubleshooting, because all intermediate steps can be inspected.

First we need to provide the path to an image on the hard drive to `load_image`, which imports the file as a three-channel [1] numpy array (*ndarray*), together with image metadata (file name, exposure, dimensions, etc.) as a pandas *DataFrame*. The array is passed on to the `threshold` function, which returns a binary array of the same dimensions. This array is then passed on to the `find_contours` function, which returns the detected contours as a DataFrame. Finally, the contours can be exported as a csv file with `save_contours`. Because the initially created metadata is passed on, the exported csv will also include the file name and image dimensions.

[1] To learn more about the basics of computer vision, check the [resources section](resources.html) of the phenopype documentation.

![Phenopype prototyping workflow](_assets/workflow_proto.png)
<strong>Fig. 2:</strong> Schematic of Phenopype's prototyping workflow

In [1]:
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

In [2]:
## load image as array, supply image_data (DataFrame containing meta data)
image, image_data = pp.load_image(filepath, df = True, meta=True)

<center>
<div style="width:500px; text-align: left" >
    
![Create masks](_assets/masks2.gif)
    
</div>
</center>

In [3]:
## draw mask
mask = pp.preprocessing.create_mask(image, tool="polygon") 

- creating mask


In [4]:
## thresholding converts multichannel to binary image
image_bin = pp.segmentation.threshold(image, method="adaptive", 
                                      channel="red", blocksize=199, 
                                      constant=5, df_masks=mask) 

- including pixels from 1 drawn masks 


In [5]:
## perform morphology operations on binarized image
image_morph = pp.segmentation.morphology(image_bin, operation="close", 
                                         shape="ellipse", kernel_size=3, 
                                         iterations=3) 

In [6]:
## detect contours on binary image
contours = pp.segmentation.find_contours(image_morph, df_image_data=image_data, 
                                         retrieval="ext", min_area=150) 

- found 2 contours that match criteria


In [7]:
## draw detected contours onto canvas
image_drawn = pp.visualization.draw_contours(image, df_contours=contours)  

In [8]:
## export contours to csv
pp.export.save_contours(contours, dirpath = r"../_temp/output")

- contours saved under ../_temp/output\contours.csv (overwritten).


In [9]:
contours

| | filename | width | height | contour | center | diameter | area | order | idx_child | idx_parent | coords |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0 | stickleback_side.jpg | 3000 | 2000 | 1 | (1563, 747) | 118 | 1923 | parent | -1 | -1 | [[[1516, 722]], [[1514, 724]], [[1513, 724]], ... |
| 1 | stickleback_side.jpg | 3000 | 2000 | 2 | (1299, 671) | 425 | 19439 | parent | -1 | -1 | [[[1101, 600]], [[1100, 601]], [[1099, 601]], ... |


In [10]:
## show canvas
pp.show_image(image_drawn)

While analyzing the image, you can explore output from the different steps to see what is going on. For example, the binary image resulting from the thresholding: 

In [11]:
pp.show_image(image_bin)

## Low throughput workflow

The low throughput workflow is similar to the prototyping workflow, and is not intended for work on larger projects. It introduces the phenopype "container", which is a Python class that holds loaded images, DataFrames, detected contours, intermediate output, etc., so that they are available for inspection or storage at the end of the analysis. The advantage of containers is that they don't litter the global namespace, while still containing all intermediate steps (e.g. binary masks or contour DataFrames). Containers can be used manually to analyze images, but typically they are used automatically within the `pype` routine that is part of phenopype's high throughput workflow.

![Phenopype low throughput workflow](_assets/workflow_low.png)

In [12]:
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

In [13]:
## load image as a phenopype container which will include all images, dataframes, 
## detected contours and intermediate output
container = pp.load_image(filepath, cont=True, 
                          dirpath=r"../_temp/output", # specifies where the output is stored
                         ) 

Directory to save phenopype-container output set at - E:\git_repos\phenopype\_temp\output


In [14]:
## afterwards, same as in the prototyping workflow, functions are applied 
## directly to the container
pp.preprocessing.create_mask(container, tool="polygon") # draw mask
pp.segmentation.threshold(container, method="adaptive", channel="red", 
                          blocksize=199, constant=5) # binarize within the mask
pp.segmentation.morphology(container, operation="close", shape="ellipse", 
                           kernel_size=3, iterations=3) # clean up binary image
pp.segmentation.find_contours(container, retrieval="ext", min_area=150) # detect contours
pp.visualization.select_canvas(container, canvas="raw") # pick image to draw on
pp.visualization.draw_contours(container) # draw detected contours onto canvas
pp.export.save_contours(container, dirpath=r"../_temp/output") # export contours to csv
pp.show_image(container.canvas) # show canvas

- creating mask
- including pixels from 1 drawn masks 
- found 2 contours that match criteria
- raw image
- contours saved under ../_temp/output\contours.csv (overwritten).


Although the intermediate output of the functions is not present as objects in the namespace, you can access and evaluate it from the container. Again, we will look at the binary image:

In [15]:
pp.show_image(container.image_bin)

Use `dir` to inspect all the components of the container:

In [16]:
print(dir(container))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'canvas', 'df_contours', 'df_image_data', 'df_image_data_copy', 'df_masks', 'dirpath', 'image', 'image_bin', 'image_copy', 'image_data', 'image_gray', 'load', 'reference_manual_mode', 'reset', 'save', 'save_suffix']
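Most entries in this list are generic Python attributes with double underscores. The following is a minimal sketch for listing only the container-specific components. `DemoContainer` is a hypothetical stand-in (not phenopype's container class) so that the snippet runs without an image, and `public_attributes` is a helper made up for this illustration; with a real container you would pass `container` instead.

```python
class DemoContainer:
    """Hypothetical stand-in for a phenopype container (illustration only)."""

    def __init__(self):
        self.image = None        # would hold the loaded image array
        self.image_bin = None    # would hold the binary image after thresholding
        self.df_contours = None  # would hold the detected contours


def public_attributes(obj):
    """Return attribute names that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


print(public_attributes(DemoContainer()))
```

Because `dir` returns names in alphabetical order, the filtered list keeps that order as well.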


## High throughput workflow

This is the default workflow for analysing medium and large image datasets in phenopype. Here, instead of writing a sequence of functions in Python, as we did in the prototyping and the low throughput workflow, we open a text file in human-readable `YAML` format. The high throughput workflow is started with the `pype` function, which triggers three actions: 

1. opening the YAML configuration file in the default OS text editor
2. parsing the contained functions and executing them in sequence
3. opening a HighGUI window showing the processed image

After one iteration of these steps, users can evaluate the results and decide either to modify the opened configuration file (e.g. change function parameters or add new functions) and run `pype` again, or to terminate the `pype` run and save all results. The processed image, any extracted phenotypic information, and the modified configuration file are stored inside the image directory. Together with the raw images, which may be stored either separately or within the directory tree, users can thereby provide the full image analysis pipeline to anyone who wishes to reproduce the obtained results.



<center>
<div style="width:600px; text-align: left" >
    
![Phenopype high throughput workflow](_assets/workflow_high.png)
    
</div>
</center>

<p style="color:red;font-weight: bold">IMPORTANT - Read this before continuing</p>

When running the `pype` method, hitting `Enter` will finish an interactive step (e.g. creating a mask), and `Ctrl+Enter` will complete a pype-run and close the window if executed after all functions have run (hitting `Enter` at that point will start another `pype` iteration without overwriting your work). Closing a window using the built-in close button on the top right will also trigger another `pype` iteration. `Esc` will close windows and end all running Python tasks (useful if you want to interrupt batch analysis with `for` loops, as shown in [Tutorial 4](tutorial_4_managing_projects.ipynb)). The `pype` method uses HighGUI windows, which can sometimes be unstable, so be sure to check [Tutorial 2](tutorial_2_phenopype_images.ipynb#Window-control).

At the current stage of development, the `pype` method is prone to errors resulting from incorrect YAML syntax, e.g. missing spaces or wrong indentation. The pype will still try to run from top to bottom and pass exceptions, but this may result in errors that cascade through the function stack. Consult the section [YAML-syntax (below)](#yaml-syntax) to learn how to correctly modify YAML files.

A `pype` will attempt to facilitate user experience when working with large data sets, for example, by calling some functions automatically (e.g. from the `visualization` and `export` modules). Also, some functions may not necessarily show their default behavior (e.g. `visualization.save_canvas` will always have `overwrite=True` to save output canvas). Consult the section [pype-behavior](#pype-behavior) to learn about the most important aspects of the `pype` function.

In [17]:
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

pp.pype(image=filepath, # input - can be also an array or a phenopype directory 
        dirpath = r"../_temp/output", ## directory where output is stored (folder needs to exist)
        name="demo", # name of the  pype routine, appended to all results-files 
        template="tut3" # template for the analysis - you can create your own!
        )

Directory to save phenopype-container output set to parent folder of image:
E:\git_repos\phenopype\tutorials\images
pype_config_demo.yaml already exists - overwrite?
y: yes, file will be overwritten and loaded
n: no, existing file will be loaded instead
To load an existing file, use "config" instead of "template".y
New pype configuration created (tut3.yaml) from phenopype template:
e:\git_repos\phenopype\phenopype\templates\tut3.yaml


------------+++ new pype iteration 2021:03:07 19:37:33 +++--------------


=== AUTOLOAD ===
- masks_demo.csv
PREPROCESSING
create_mask
- mask with label mask1 already created (edit/overwrite=False)
SEGMENTATION
threshold
- including pixels from 1 drawn masks 
morphology
find_contours
- found 2 contours that match criteria
VISUALIZATION
select_canvas
- invalid selection - defaulting to raw image
draw_contours
draw_masks
drawing mask: mask1
EXPORT
save_contours
- contours saved under ../_temp/output\contours_demo.csv (overwritten).
save_canvas
- canvas sav

<phenopype.main.pype at 0x213fa038d88>

To learn how to analyze a lot of images or whole projects with the high-throughput method, move on to the next [Tutorial 4](tutorial_4_managing_projects.ipynb). Also, check the examples (e.g. [Example 2](example_2_landmarks_stickleback.ipynb)), which typically include code for both low and high throughput.

---


### YAML syntax

The configuration files needed to run the pype are written in YAML (a recursive acronym for "YAML Ain't Markup Language"). In principle, these are just text files that follow specific rules for [indentation and separation](https://www.tutorialspoint.com/yaml/yaml_indentation_and_separation.htm).

In [18]:
import phenopype as pp

pp.pype_config_templates

{'ex1.yaml': 'e:\\git_repos\\phenopype\\phenopype\\templates\\ex1.yaml',
 'ex2.yaml': 'e:\\git_repos\\phenopype\\phenopype\\templates\\ex2.yaml',
 'ex3.yaml': 'e:\\git_repos\\phenopype\\phenopype\\templates\\ex3.yaml',
 'ex5_1.yaml': 'e:\\git_repos\\phenopype\\phenopype\\templates\\ex5_1.yaml',
 'ex5_2.yaml': 'e:\\git_repos\\phenopype\\phenopype\\templates\\ex5_2.yaml',
 'ex6.yaml': 'e:\\git_repos\\phenopype\\phenopype\\templates\\ex6.yaml',
 'ex7.yaml': 'e:\\git_repos\\phenopype\\phenopype\\templates\\ex7.yaml',
 'ex8_1.yaml': 'e:\\git_repos\\phenopype\\phenopype\\templates\\ex8_1.yaml',
 'ex8_2.yaml': 'e:\\git_repos\\phenopype\\phenopype\\templates\\ex8_2.yaml',
 'tut3.yaml': 'e:\\git_repos\\phenopype\\phenopype\\templates\\tut3.yaml'}

In [19]:
pp.show_config_template("tut3")

SHOWING BUILTIN PHENOPYPE TEMPLATE tut3.yaml


- preprocessing:
  - create_mask:
      tool: polygon
- segmentation:
  - threshold:
      method: adaptive
      blocksize: 199
      constant: 5
      channel: red
  - morphology:
      operation: close
      shape: ellipse
      kernel_size: 3
      iterations: 3
  - find_contours:
      retrieval: ext
      min_diameter: 0
      min_area: 150
- visualization:
  - select_canvas:
      canvas: image
  - draw_contours:
      line_width: 2
      label_width: 1
      label_size: 1
      fill: 0.3
  - draw_masks
- export:
  - save_contours:
      overwrite: true
  - save_canvas:
      resize: 0.5
      overwrite: true


The text inside the YAML configuration files is parsed by Python from top to bottom and converted to Python code in the background, i.e. to calls to phenopype modules and functions. The entries on the first level without any indentation, e.g. `- preprocessing`, `- segmentation`, `- visualization` and `- export` in the above example, denote the core module that a function is part of. The entries on the second level, e.g. `- threshold` and `- find_contours`, are functions inside the `segmentation` module. The entries on the third level without hyphens, e.g. `method: adaptive` and `blocksize: 199`, are arguments passed on to the function. Following this notation, the first item in `segmentation` would be interpreted as follows:

`pp.segmentation.threshold(image, method="adaptive", blocksize=199, constant=5, channel="red")`

When running the pype routine, `image` is automatically loaded and passed to all functions. Apart from that, you can add or remove functions as you like. Note the hyphen followed by a space (`- `) in front of a function: this notation indicates that during parsing the items are interpreted as part of a list. This is important, as it allows you to specify the same function as many times as you want.

When adding or modifying modules and functions, it is important to keep in mind that the **function stack is executed sequentially**. So, if you want to perform a `morphology` operation on a binary image, it should come *after* and not before the main segmentation function (in this case `threshold`).
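To make the mapping from configuration to function calls concrete, here is a minimal sketch. It assumes the nested structure that a YAML parser would return for parts of the template above (written out as a Python literal, so no YAML library is needed), and it only builds each call as a string instead of executing it. The helper `config_to_calls` is hypothetical and is not how phenopype implements its parser.

```python
# Nested structure a YAML parser would produce for parts of tut3.yaml:
# modules and functions are list items, arguments are key-value pairs.
config = [
    {"segmentation": [
        {"threshold": {"method": "adaptive", "blocksize": 199,
                       "constant": 5, "channel": "red"}},
        {"find_contours": {"retrieval": "ext", "min_area": 150}},
    ]},
    {"visualization": [
        "draw_masks",  # a function listed without arguments is a plain string
    ]},
]


def config_to_calls(config):
    """Translate the nested config into call strings, in file order."""
    calls = []
    for module_entry in config:
        for module, functions in module_entry.items():
            for function_entry in functions or []:
                if isinstance(function_entry, dict):
                    # single-key dict: function name -> arguments
                    (function, arguments), = function_entry.items()
                else:
                    function, arguments = function_entry, None
                arguments = arguments or {}
                args = "".join(f", {k}={v!r}" for k, v in arguments.items())
                calls.append(f"pp.{module}.{function}(image{args})")
    return calls


for call in config_to_calls(config):
    print(call)
```

Because modules and functions are list items, their order in the file is preserved, and the same function name can appear more than once without one entry replacing the other.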

Here are the most important rules for YAML syntax:

- **indentation rules:** modules start at 0 spaces, functions are indented by 2 spaces and prefixed with hyphen+space, and arguments are indented 4 spaces relative to the hyphen of their function (see the template above)
- **separation rules:** modules and functions with arguments are followed by a colon (`:`) and a new line; functions without specified arguments don't need a colon; arguments are followed by a colon, a space and then the value
- modules and functions can be empty (e.g. a `measurement:` module with no functions, or `- draw_masks` above), but function arguments *cannot* (e.g. `overwrite:` needs to be `true` or `false`)
- as per Python syntax, optional function arguments can, but don't have to, be specified; without them the functions will just run on default values
- if you need to add modules (not all presets contain all modules), stick to this order: preprocessing > segmentation > measurement > visualization > export
- functions can be added multiple times, but sometimes their output may be overwritten (e.g. `- threshold` only works once, `- create_mask` multiple times [1])
 

[1] Note that a single `create_mask` operation can already create multiple masks - see [Tutorial 5](tutorial_5_gui_interactions.ipynb).
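The recommended module order can also be checked mechanically. The following sketch is only an illustration: `MODULE_ORDER` is copied from the rule above, while `modules_in_order` is a hypothetical helper that is not part of phenopype.

```python
MODULE_ORDER = ["preprocessing", "segmentation", "measurement",
                "visualization", "export"]


def modules_in_order(modules):
    """Return True if the module names follow the recommended order.

    Modules may be left out, but the ones present must not be swapped.
    Unknown module names raise a ValueError.
    """
    positions = [MODULE_ORDER.index(name) for name in modules]
    return positions == sorted(positions)


# the modules of the tut3 template (no measurement module)
print(modules_in_order(["preprocessing", "segmentation", "visualization", "export"]))
# segmentation listed before preprocessing
print(modules_in_order(["segmentation", "preprocessing"]))
```

The first call prints `True` and the second prints `False`.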

### `pype` behavior

The `pype` function has specific implicit behavior that aims at supporting speed and robustness when working in "production" (i.e. when performing the actual analysis of large image datasets, as opposed to the prototyping and low throughput workflows). Here I list some important aspects of that behavior.

#### Window control

The `pype` method uses HighGUI windows, which can sometimes be unstable - be sure to check [Tutorial 2](tutorial_2_phenopype_images.ipynb#Window-control). If your text window didn't show up when using the `pype` function, make sure you selected a default text editor to handle YAML files ([see the Installation Instructions](installation.html#choose-a-text-editor)). Here is another summary of the window behavior when working with `pype` in the high throughput workflow:

- Editing and saving the opened configuration file in the text editor will trigger another `pype` iteration, i.e. close the image window, run the functions in the configuration file, and display the updated results.
- Closing the image window manually (with the X button in the upper right) also runs the functions in the configuration file and shows the updated results.
- `Esc` will close all windows and interrupt the pype routine as well as any loops ([see Tutorial 4](tutorial_4_managing_projects.ipynb#Using-pype-with-project-folders)); `Esc` triggers `sys.exit()`, which will also end a Python session if run from the command line [1,2].
- Each step that requires user interaction (e.g. `create_mask` or `landmarks`) needs to be confirmed with `Return` before the next function in the sequence is executed [1,2].
- At the end of the analysis, when the final steps (visualization and export functions) have run, you can end the pype routine with another `Return` keystroke [1,2].  


[1] Sometimes keystrokes will not be recognized, so they need to be executed multiple times - see [Tutorial 2](tutorial_2_phenopype_images.ipynb).<br>
[2] The image window needs to be highlighted to detect keystrokes, not the text editor or the console - see [Tutorial 2](tutorial_2_phenopype_images.ipynb). <br>


#### Function execution

Most important things to keep in mind during a `pype` iteration:

- The `pype` function will execute all functions in sequence, but it will not overwrite data from past iterations on disk unless specified.
- To overwrite interactive user input, set the argument `overwrite: true` at the specific function in the configuration file. **Remember to remove it after the next run. [1]**
- If a `pype` is initialized on a directory (either from a Phenopype project or a directory specified with the argument `dirpath`), it will attempt to load input data (e.g. masks) that contain the provided `name` argument [2].

[1] If you forget to remove an overwrite argument and are prompted to overwrite previous input, simply remove the `overwrite: true` argument and save the file to run the `pype` again. It will then fall back onto the input from the last iteration.<br>
[2] For example, `pp.pype(image, name="run1", dirpath="path\to\directory")` will attempt to load any saved files in `directory` that contain the suffix `"run1"` (e.g. `"masks_run1.csv"`).<br>
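The name-based lookup described in [2] can be pictured as a simple file-name match. This is a minimal sketch using only the standard library; the helper `find_saved_files` and the matching pattern are assumptions made for illustration, and phenopype's own loading logic may differ.

```python
import tempfile
from pathlib import Path


def find_saved_files(dirpath, name):
    """Return names of CSV files in dirpath that end with the suffix _<name>."""
    return sorted(p.name for p in Path(dirpath).glob(f"*_{name}.csv"))


with tempfile.TemporaryDirectory() as tmp:
    # create empty placeholder files, as they might exist after two pype runs
    for filename in ["masks_run1.csv", "contours_run1.csv", "masks_run2.csv"]:
        (Path(tmp) / filename).touch()
    found = find_saved_files(tmp, "run1")

print(found)
```

Only the two files carrying the `run1` suffix are returned, which is why results from runs with different `name` arguments can sit side by side in the same directory.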

#### Visualizing the results

Aspects of visual feedback during a `pype` run (can be completely suppressed by setting `feedback=False` in the `pype` arguments):

- Visual feedback is always generated automatically by an internal function that shows results (i.e. output from `landmarks`, `find_contours` or `create_mask`) on top of a "canvas". 
- The canvas can be the image at any step of the analytic process (i.e. raw image, binary image, or a colour channel [gray, red, green or blue]) and is selected with `- select_canvas` as part of the `visualization` module. 
- If `- select_canvas` is not explicitly specified, it is called automatically and defaults to the raw image as canvas. 
- Drawing the output of a function has to be specified manually. For example, after using `- landmarks`, `- draw_landmarks` should be called in the `visualization` module. [1]
- Visual parameters of interactive tools (e.g. `point_size` or `line_thickness`) are specified separately in the respective function, *and* in the `visualization` module. 

[1] Experimental: use the flag `autoshow=True` in the `pype` arguments to automatically show results.  

#### Exporting the results

Saving results and canvas for quality control:

- All results are saved automatically, even if the respective functions in `export` are not specified, with the `name` argument in `pype` as suffix [1,2]. 
- If a file already exists in the directory, and the respective function is *not* listed under `export:`, then it *will not* be overwritten. If an export function *is* specified under `export:`, it *will overwrite* any existing file [3].
- The canvas is an exception: it will always be saved and always be overwritten to show the output from the last iteration. However, users can modify the canvas name with `name` in the arguments to save different output side by side [4].

[1] Experimental: use the flag `autosave=False` in the `pype` arguments to deactivate this behavior.<br>
[2] For example, `pp.pype(image, name="run1")` will save `"masks_run1.csv"` or `"contours_run1.csv"`. <br>
[3] For example, listing `- save_landmarks` under `export:` will overwrite `landmarks_run1.csv`<br>
[4] For example, `name: binary` under `- save_canvas:` saves the canvas as `canvas_binary.jpg`.<br>
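The overwrite rules above can be summarized as a small decision function. This sketch only restates the rules from this section under the default settings (i.e. without the experimental `autosave=False` flag); `will_write` is a hypothetical helper, not a phenopype function.

```python
def will_write(file_exists, listed_in_export, is_canvas=False):
    """Decide whether a results file is written at the end of a pype run.

    file_exists      -- a file with the same name is already in the directory
    listed_in_export -- the matching function is listed under `export:`
    is_canvas        -- the canvas is always saved and always overwritten
    """
    if is_canvas:
        return True
    if not file_exists:
        return True  # results are saved automatically
    return listed_in_export  # existing files are only overwritten when listed


print(will_write(file_exists=True, listed_in_export=False))
print(will_write(file_exists=True, listed_in_export=True))
print(will_write(file_exists=True, listed_in_export=False, is_canvas=True))
```

The three calls print `False`, `True` and `True`: an existing file is kept unless its export function is listed, while the canvas is written in every case.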