# Tutorial 2: Working with phenopype

Analysis of scientific images can be an iterative process that may require frequent user input to preprocess images, adjust settings and evaluate the obtained results. In phenopype, users can start this process by identifying the appropriate functions and settings to analyse a series of images (e.g. which segmentation algorithm to use). For the actual analysis, users can then switch to a workflow that has higher throughput and is more reproducible. Phenopype offers workflows that are appropriate for all stages of the scientific process:

---

| Workflow | Use case | Principle of operation | Explicitness | Reproducibility |
|:---|:---|:---|:---|:---|
| [**Prototyping**](#proto) | analysis prototyping, self education and evaluation | images are loaded as arrays and functions are applied one by one | High | Low |
| [**Low throughput**](#low) | single pictures and very small datasets | images are loaded into phenopype containers | Medium | Low |
| [**High throughput**](#high) | medium and large datasets - default analysis workflow | images are loaded from a phenopype project directory tree, and analyzed with the [pype](api.html#pype-method) method | Low | High |

---

For all three workflows, users assemble a stack of computer vision functions from phenopype's five core modules (preprocessing, segmentation, measurement, export, visualization; for an overview check the [API reference](api)). However, the workflows differ in the degree of user interaction, the visual feedback, the mode by which these functions are applied to images, and reproducibility.

In the prototyping and low throughput workflows, users write a phenopype function stack directly in Python code. This is recommended for users who wish to familiarize themselves with the basic principles of computer vision and to explore the phenopype function library. Output from all intermediate steps is returned from the functions and can be evaluated, which makes these routines also appropriate for prototyping and testing.

To process image datasets with high throughput and reproducibility, users should work from a phenopype directory structure in conjunction with the pype-method. To get started with Phenopype's high throughput workflow, [see below](#high), check [Tutorial 3](tutorial_3_managing_projects_1.ipynb) and consult the [pype section in the API reference](api.html#pype-method).

In the following, all three workflows are demonstrated by analyzing an image of a threespine stickleback (*Gasterosteus aculeatus*) stained with alizarin red. Traits of interest are bone-plate area and shape, and, within the detected plates, pixel intensities that denote bone-density.

<center>
<div style="width:600px; text-align: left" >
    
![Phenopype workflow example](_assets/workflow_example_case.png)
    
**Fig. 1:** Workflow demonstration using a stained stickleback. The computer vision functions used to extract the traits of interest (bone-plate area, shape and pixel intensity) are the same in all cases, but workflows differ in the amount of code necessary and in reproducibility.
    
</div>
</center>

## Prototyping workflow <a name="proto"></a>

The prototyping workflow starts with the path to an image that is stored on the hard drive. `load_image` imports the file as a three-channel [1] numpy array (*ndarray*), together with image meta data (file name, exposure, dimensions, etc.) as a pandas *DataFrame*. The array gets passed on to the `threshold` function, which returns a binary array of the same dimensions. This array is then passed on to the `find_contours` function, which returns a dictionary with the detected contours. This dictionary, together with the original array, can then be passed to the `colour_intensity` function, which collects the average colour value from within the perimeter coordinates of each contour and returns a pandas DataFrame containing those values. Finally, the DataFrame can be exported as a csv file with `save_colour`. If the initially created meta data is passed on as well, the function automatically adds the provided columns of meta information to the exported csv. The code cell below follows the same pattern, but adds a mask and a morphology step, and draws and exports the contours (`show_contours`, `save_contours`) rather than measuring colour intensity.

[1] To learn more about the basics of computer vision, check the resources section of the phenopype documentation.
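
To make the thresholding step more tangible, here is a minimal pure-Python sketch of mean-based adaptive thresholding on a tiny grayscale grid. It is not phenopype's implementation (`pp.segmentation.threshold` operates on numpy arrays); the function name `adaptive_threshold` and its border handling are assumptions made for illustration, and only the parameter names `blocksize` and `constant` mirror those used in this tutorial. The point to take away is that the result is a binary image with the same dimensions as the input.

```python
def adaptive_threshold(gray, blocksize=3, constant=0):
    """Mean-based adaptive threshold on a 2D list of grayscale values.

    Illustrative sketch only (an assumption, not phenopype code): a pixel
    becomes 255 when it is brighter than the mean of its
    blocksize x blocksize neighbourhood minus `constant`, otherwise 0.
    Neighbourhoods are clipped at the image border.
    """
    if blocksize % 2 == 0:
        raise ValueError("blocksize must be odd")
    rows, cols = len(gray), len(gray[0])
    half = blocksize // 2
    binary = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # collect all pixel values in the (clipped) neighbourhood
            window = [
                gray[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            local_mean = sum(window) / len(window)
            if gray[r][c] > local_mean - constant:
                binary[r][c] = 255
    return binary


# a bright 2 x 2 "plate" on a dark background
gray = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]
binary = adaptive_threshold(gray, blocksize=3, constant=0)
for row in binary:
    print(row)
```

Because each pixel is compared against its local surroundings rather than one global cutoff, this kind of thresholding tends to cope better with uneven illumination, which is one reason to try `method="adaptive"` first on stained specimens.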

![Phenopype prototyping workflow](_assets/workflow_proto.png)

<strong>Fig. 2:</strong> Schematic of Phenopype's prototyping workflow

In [2]:
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

## load image as array, supply image_data (DataFrame containing meta data)
image, image_data = pp.load_image(filepath, df = True, meta=True)
## draw mask
image_masked, mask = pp.preprocessing.create_mask(image, tool="polygon") 
## thresholding converts multichannel to binary image
image_bin = pp.segmentation.threshold(image, method="adaptive", 
                                      channel="red", blocksize=199, 
                                      constant=5, masks=mask) 
## perform morphology operations on binarized image
image_morph = pp.segmentation.morphology(image_bin, operation="close", 
                                         shape="ellipse", kernel_size=3, 
                                         iterations=3) 
## detect contours on binary image
contours = pp.segmentation.find_contours(image_morph, df=image_data, 
                                         retrieval="ext", min_area=150) 
## draw detected contours onto canvas
image_drawn = pp.visualization.show_contours(image, contours=contours,
                                             df=image_data, df_contours=contours)  
## export contours to csv
pp.export.save_contours(contours, dirpath = r"../_temp/output")
## show canvas
pp.show_image(image_drawn)

- create mask
- applying mask: mask1
- contours saved under ../_temp/output\contours.csv (overwritten).


While analyzing the image, you can explore output from the different steps to see what is going on. For example, the binary image resulting from the thresholding: 

In [3]:
pp.show_image(image_bin)

## Low throughput workflow <a name="low"></a>

The `load_image` function can also load an image into a phenopype container, which is a Python class that incorporates loaded images, DataFrames, detected contours, intermediate output, etc. so that they are available for inspection or storage at the end of the analysis. The advantage of using containers is that they don’t litter the global environment and namespace, while still containing all intermediate steps (e.g. binary masks or contour DataFrames). Containers can be used manually to analyze images, but typically they are used automatically within the pype-routine that is part of phenopype's high throughput workflow (see below).
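
The container idea can be sketched in a few lines of plain Python. The class `SketchContainer` and the two helper functions below are hypothetical stand-ins invented for this illustration and are not part of phenopype; they only show the pattern in which functions write their results onto one object instead of returning separate variables.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchContainer:
    """Hypothetical stand-in for a phenopype container (not the real class)."""
    image: List[List[int]]
    image_bin: Optional[List[List[int]]] = None
    results: Dict[str, Any] = field(default_factory=dict)


def threshold(container: SketchContainer, cutoff: int) -> None:
    """Store a binary image on the container instead of returning it."""
    container.image_bin = [
        [255 if pixel > cutoff else 0 for pixel in row]
        for row in container.image
    ]


def count_foreground(container: SketchContainer) -> None:
    """Read the stored binary image and store a measurement next to it."""
    container.results["n_foreground"] = sum(
        pixel == 255 for row in container.image_bin for pixel in row
    )


container = SketchContainer(image=[[10, 200], [220, 30]])
threshold(container, cutoff=128)
count_foreground(container)
print(container.image_bin, container.results)
```

Only `container` lives in the namespace, yet every intermediate result stays reachable as an attribute.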

![Phenopype low throughput workflow](_assets/workflow_low.png)

In [5]:
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

## load image as a phenopype container which will include all images, dataframes, 
## detected contours and intermediate output
container = pp.load_image(filepath, cont=True, meta=True) 

## afterwards, same as in the prototyping workflow, functions are applied 
## directly to the container
pp.preprocessing.create_mask(container, tool="polygon") 
pp.segmentation.threshold(container, method="adaptive", channel="red", 
                          blocksize=199, constant=5) # 3/4
pp.segmentation.morphology(container, operation="close", shape="ellipse", 
                           kernel_size=3, iterations=3) # 5
pp.segmentation.find_contours(container, retrieval="ext", min_area=150) # 6
pp.visualization.show_contours(container) # 6
pp.export.save_contours(container, dirpath = r"../_temp/output")
pp.show_image(container.canvas) 

- create mask
- applying mask: mask1
- contours saved under images\contours.csv.


Although the intermediate results of the functions are not present as objects in the namespace, you can access and evaluate them from the container. Again, we will look at the binary image:

In [7]:
pp.show_image(container.image_bin)

Use `dir` to inspect all the components of the container:

In [10]:
print(dir(container))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'canvas', 'df_contours', 'df_image_data', 'df_image_data_copy', 'df_masks', 'dirpath', 'image', 'image_bin', 'image_copy', 'image_data', 'image_gray', 'image_mod', 'load', 'reset', 'save', 'save_suffix', 'show']
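
Most entries in that list are Python's built-in double-underscore attributes. A short list comprehension hides them so that only the container's own components remain. The class below is a hypothetical stand-in so that the snippet runs on its own; with a real container you would apply the same comprehension to the object returned by `pp.load_image`.

```python
class SketchContainer:
    """Hypothetical stand-in for a phenopype container (not the real class)."""

    def __init__(self):
        self.image = None
        self.image_bin = None
        self.canvas = None


container = SketchContainer()

# keep only names that do not start with a double underscore
public_names = [name for name in dir(container) if not name.startswith("__")]
print(public_names)  # ['canvas', 'image', 'image_bin']
```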


## High throughput workflow <a name="high"></a>

The pype routine is phenopype’s standard method to analyse medium and large image datasets, where a function stack is constructed with the human readable `yaml` syntax (a data serialization language; for more information see the [resources section](resources) of the phenopype documentation). Users can execute the pype method on a filepath, an array, or a phenopype directory, which will always trigger three actions:

1. open the contained yaml configuration with the default OS text editor
2. parse the contained functions and execute them in sequence
3. open a Python window showing the processed image

After one iteration of these steps, users can evaluate the results and decide either to modify the opened configuration file (e.g. change function parameters or add new functions) and run the pype again, or to terminate the pype and save all results. The processed image, any extracted phenotypic information, and the modified config-file are stored inside the image directory. Together with the raw images, which may be stored either separately or within the directory tree, users can thereby provide the full image analysis pipeline to anyone who wishes to reproduce the obtained results.
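
The "parse and execute in sequence" step can be pictured with a small pure-Python sketch. It is not how `pp.pype` is implemented: the configuration is written here as a dictionary, roughly as it might look after a YAML parser has read the file, and its layout, the `registry`, and the two toy functions are all assumptions made for illustration.

```python
def threshold(state, cutoff=128):
    """Toy segmentation step: store a binary image in the shared state."""
    state["binary"] = [
        [255 if pixel > cutoff else 0 for pixel in row] for row in state["image"]
    ]


def count_pixels(state):
    """Toy measurement step: count foreground pixels of the binary image."""
    state["n_foreground"] = sum(
        pixel == 255 for row in state["binary"] for pixel in row
    )


def run_stack(config, registry, state):
    """Run every configured step in order and log whether it succeeded."""
    log = []
    for module, steps in config.items():
        for step in steps:
            # a step is either a bare name or a {name: {parameter: value}} mapping
            if isinstance(step, dict):
                name, params = next(iter(step.items()))
            else:
                name, params = step, {}
            try:
                registry[name](state, **(params or {}))
                log.append((module, name, "ok"))
            except Exception as error:
                # keep going, but record the failure for later inspection
                log.append((module, name, f"failed: {error}"))
    return log


registry = {"threshold": threshold, "count_pixels": count_pixels}
config = {
    "segmentation": [{"threshold": {"cutoff": 100}}],
    "measurement": ["count_pixels", "not_a_function"],  # second name is a typo
}
state = {"image": [[10, 200], [220, 30]]}

for entry in run_stack(config, registry, state):
    print(entry)
```

Editing the configuration and calling `run_stack` again corresponds to one more iteration of the loop described above; the misspelled step is logged as failed without stopping the remaining steps.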

**Further information** 

For more detailed information on `pype` and its default behavior, see below.

For more information on how to use the `pype` in conjunction with phenopype projects please refer to the [Phenopype project tutorial](tutorial_3_managing_projects_1.ipynb).

Check the examples, which include both low and high throughput code.

![Phenopype high throughput workflow](_assets/workflow_high.png)

In [2]:
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

pp.pype(image=filepath, # input - can be also an array or a phenopype directory 
        dirpath = r"../_temp/output", ## where output is stored
        name="demo", # name of the  pype routine, appended to all results-files 
        preset="demo1" # template for the analysis - you can create your own!
        )



------------+++ new pype iteration 2020:03:25 15:03:29 +++--------------


AUTOLOAD
- masks_demo.csv
PREPROCESSING
create_mask
- mask with label mask1 already created (overwrite=False)
SEGMENTATION
blur
threshold
- applying mask: mask1
morphology
find_contours
VISUALIZATION
select_canvas
show_contours
show_masks
 - show mask: mask1.
EXPORT
save_contours
- contours saved under ../_temp/output\contours_demo.csv (overwritten).
save_canvas
- canvas saved under ../_temp/output\canvas_demo.jpg (overwritten).
AUTOSAVE
save_masks
- masks not saved - file already exists (overwrite=False).


TERMINATE


<phenopype.main.pype at 0x18b78889d48>

At the current stage of development, the pype method is prone to errors resulting from incorrect yaml syntax, e.g. missing spaces or wrong indentation. The pype will still try to run from bottom to top and pass exceptions, but this may result in errors that cascade through the function stack. Therefore, I go over basic yaml syntax and the specific structure of the pype_config files below.

Moreover, the pype will trigger specific behavior of some functions to facilitate user experience when working with large data sets. For example, some functions get called automatically (e.g. from the `visualization` and `export` modules), but they don't necessarily show the default behavior documented in the API (e.g. `visualization.save_canvas` will always have `overwrite=True` to save the output canvas). To clarify, I explain the most important aspects of `pype`-behavior here.

### 3.1 yaml-syntax
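
Two slips account for many yaml problems of the kind mentioned above: tabs used for indentation (yaml only allows spaces there) and a missing space after the colon, which turns `key:value` into a single plain string instead of a key with a value. The following pre-check is a rough heuristic written for this tutorial, not a phenopype feature and not a replacement for a real yaml parser; the function name `find_yaml_slips` and the layout of the example configuration are assumptions.

```python
import re

# a key (optionally preceded by "- ") followed by a colon and no space
MISSING_SPACE = re.compile(r"^\s*(-\s+)?[A-Za-z_]\w*:\S")


def find_yaml_slips(text):
    """Return (line number, message) pairs for two common yaml slips."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((number, "tab used for indentation"))
        if MISSING_SPACE.match(line):
            problems.append((number, "missing space after colon"))
    return problems


# hypothetical config fragment containing one slip of each kind
config_text = (
    "segmentation:\n"
    "- threshold:\n"
    "    method:adaptive\n"
    "\tblocksize: 199\n"
)

for number, message in find_yaml_slips(config_text):
    print(f"line {number}: {message}")
```

Running such a check before starting the pype can save an iteration, but a clean result does not guarantee that the file is valid yaml.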

### 3.2 `pype`-behavior
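
One way to picture why a function can behave differently inside the pype than when called directly is to think of its parameters as coming from three layers, where later layers win. This is a conceptual sketch under that assumption, not phenopype's actual mechanism: `resolve_params` and the `resize` parameter are hypothetical, and only the enforced `overwrite=True` for `save_canvas` is taken from the text above.

```python
def resolve_params(defaults, user_params, enforced):
    """Merge parameter layers: defaults < user config < values enforced by the routine."""
    return {**defaults, **user_params, **enforced}


defaults = {"overwrite": False, "resize": 1.0}       # documented defaults (hypothetical)
user_params = {"overwrite": False, "resize": 0.5}    # what the user wrote in the config
enforced = {"overwrite": True}                       # what the routine always sets

params = resolve_params(defaults, user_params, enforced)
print(params)  # {'overwrite': True, 'resize': 0.5}
```

The user's `resize` value survives, while `overwrite` ends up as `True` regardless of what was configured.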