Merge 0250969 into 6dd9881

histolab · Sep 11, 2020 · a348d6f
2 parents 6dd9881 + 0250969
commit a348d6f
Showing 3 changed files with 465 additions and 194 deletions.
diff --git a/README.md b/README.md
@@ -50,154 +50,298 @@ Histolab has only one system-wide dependency: OpenSlide.
 
 You can download and install it from [OpenSlide](https://openslide.org/download/) according to your operating system.
 
+### Documentation
+
+Read the full documentation at <https://histolab.readthedocs.io/en/latest/>.
+
+# Quickstart 
+Here we present a step-by-step tutorial on using `histolab` to
+extract a tile dataset from example whole-slide images (WSIs). The
+corresponding Jupyter Notebook is available at <https://github.com/histolab/histolab-box>:
+this repository contains a complete `histolab` environment that can be
+used through [Vagrant](http://www.vagrantup.com) or
+[Docker](http://www.docker.com) on all platforms.
 
-### Installation 
+Thus, users can either run `histolab` through `histolab-box` or
+install it in their own Python virtual environment (using conda,
+pipenv, pyenv, virtualenv, etc.). In the latter case, since the
+`histolab` package is published on [PyPI](http://www.pypi.org),
+it can be installed with the command:
 
 ```
 pip install histolab
 ```
 
-### Documentation
-
-Read the full documentation here https://histolab.readthedocs.io/en/latest/.
+## TCGA data
 
-### Quickstart 
+First things first, let’s import some data to work with, for example the
+prostate tissue slide and the ovarian tissue slide available in the
+`data` module:
 
 ```python
-from histolab.data import breast_tissue, heart_tissue
+from histolab.data import prostate_tissue, ovarian_tissue
 ```
 
-**NB** To use the data module, you need to install ```pooch```.
-
-Each data function outputs the corresponding slide as an OpenSlide object, and the path where the slide has been saved:
+**Note:** To use the `data` module, you need to install `pooch`, also
+available on PyPI (<https://pypi.org/project/pooch/>). This step is
+unnecessary when using the Vagrant/Docker virtual environment.
 
+Calling a `data` function automatically downloads the WSI
+from the corresponding repository and saves the slide in a cached
+directory:
 
 ```python
-breast_svs, breast_path = breast_tissue()
-heart_svs, heart_path = heart_tissue()
+prostate_svs, prostate_path = prostate_tissue()
+ovarian_svs, ovarian_path = ovarian_tissue()
 ```
 
-### Slide
+Notice that each `data` function returns the corresponding slide as an
+OpenSlide object, together with the path where the slide has been saved.
 
+## Slide initialization
+
+`histolab` maps a WSI file into a `Slide` object. Each WSI we work with
+requires a one-to-one association with a `Slide` object, defined in the
+`slide` module:
 
 ```python
 from histolab.slide import Slide
 ```
 
-Convert the slide into a ```Slide``` object. ```Slide``` takes as input the path where the slide is stored and the ```processed_path``` where the thumbnail and the tiles will be saved.
-
+To initialize a `Slide`, we need to specify the WSI path and the
+`processed_path` where the thumbnail and the tiles will be saved. In our
+example, we want the `processed_path` of each slide to be a subfolder of
+the current working directory:
 
 ```python
-breast_slide = Slide(breast_path, processed_path='processed')
-heart_slide = Slide(heart_path, processed_path='processed')
-```
+import os
 
-As a ```Slide``` object, you can now easily retrieve information about the slide, such as the slide name, the dimensions at native magnification, the dimensions at a specified level, save and show the slide thumbnail, or get a scaled version of the slide.
+BASE_PATH_PROSTATE = os.getcwd()
+BASE_PATH_OVARIAN = os.getcwd()
 
+PROCESS_PATH_PROSTATE = os.path.join(BASE_PATH_PROSTATE, 'processed')
+PROCESS_PATH_OVARIAN = os.path.join(BASE_PATH_OVARIAN, 'processed')
 
-```python
-print(f"Slide name: {breast_slide.name}")
-print(f"Dimensions at level 0: {breast_slide.dimensions}")
-print(f"Dimensions at level 1: {breast_slide.level_dimensions(level=1)}")
-print(f"Dimensions at level 2: {breast_slide.level_dimensions(level=2)}")
+prostate_slide = Slide(prostate_path, processed_path=PROCESS_PATH_PROSTATE)
+ovarian_slide = Slide(ovarian_path, processed_path=PROCESS_PATH_OVARIAN)
 ```
 
-    Slide name: 9c960533-2e58-4e54-97b2-8454dfb4b8c8
-    Dimensions at level 0: (96972, 30681)
-    Dimensions at level 1: (24243, 7670)
-    Dimensions at level 2: (6060, 1917)
-
+**Note:** If the slides are stored in the same folder, this can be done
+directly on the whole dataset by using the `SlideSet` object of the
+`slide` module.
 
+With a `Slide` object we can easily retrieve information about the
+slide, such as the slide name, the number of available levels, the
+dimensions at native magnification or at a specified level:
 
 ```python
-print(f"Slide name: {heart_slide.name}")
-print(f"Dimensions at level 0: {heart_slide.dimensions}")
-print(f"Dimensions at level 1: {heart_slide.level_dimensions(level=1)}")
-print(f"Dimensions at level 2: {heart_slide.level_dimensions(level=2)}")
+print(f"Slide name: {prostate_slide.name}")
+print(f"Levels: {prostate_slide.levels}")
+print(f"Dimensions at level 0: {prostate_slide.dimensions}")
+print(f"Dimensions at level 1: {prostate_slide.level_dimensions(level=1)}")
+print(f"Dimensions at level 2: {prostate_slide.level_dimensions(level=2)}")
 ```
 
-    Slide name: JP2K-33003-2
-    Dimensions at level 0: (32671, 47076)
-    Dimensions at level 1: (8167, 11769)
-    Dimensions at level 2: (2041, 2942)
-
-
+```
+Slide name: 6b725022-f1d5-4672-8c6c-de8140345210
+Levels: [0, 1, 2]
+Dimensions at level 0: (16000, 15316)
+Dimensions at level 1: (4000, 3829)
+Dimensions at level 2: (2000, 1914)
+```
 
 ```python
-breast_slide.save_thumbnail()
-print(f"Thumbnails saved at: {breast_slide.thumbnail_path}") 
-heart_slide.save_thumbnail()
+print(f"Slide name: {ovarian_slide.name}")
+print(f"Levels: {ovarian_slide.levels}")
+print(f"Dimensions at level 0: {ovarian_slide.dimensions}")
+print(f"Dimensions at level 1: {ovarian_slide.level_dimensions(level=1)}")
+print(f"Dimensions at level 2: {ovarian_slide.level_dimensions(level=2)}")
+```
 
-print(f"Thumbnails saved at: {heart_slide.thumbnail_path}") 
+```
+Slide name: b777ec99-2811-4aa4-9568-13f68e380c86
+Levels: [0, 1, 2]
+Dimensions at level 0: (30001, 33987)
+Dimensions at level 1: (7500, 8496)
+Dimensions at level 2: (1875, 2124)
 ```
 
-    Thumbnails saved at: processed/thumbnails/9c960533-2e58-4e54-97b2-8454dfb4b8c8.png
-    Thumbnails saved at: processed/thumbnails/JP2K-33003-2.png
+Moreover, we can save the slide thumbnail and show it in a separate window.
+In particular, the thumbnail image will be automatically saved in a
+subdirectory of the `processed_path`:
 
+```python
+prostate_slide.save_thumbnail()
+prostate_slide.show()
+```
 
+![](https://user-images.githubusercontent.com/4196091/92748324-5033e680-f385-11ea-812b-6a9a225ceca4.png)
 
 ```python
-breast_slide.show() 
-heart_slide.show()
+ovarian_slide.save_thumbnail()
+ovarian_slide.show()
 ```
 
-![thumbnails](https://user-images.githubusercontent.com/31658006/84955475-a4695a80-b0f7-11ea-83d5-db7668801219.png)
+![](https://user-images.githubusercontent.com/4196091/92748248-3db9ad00-f385-11ea-846b-a5ce8cf3ca09.png)
+
+## Tile extraction
 
-### Tiles extraction
+Once the `Slide` objects are defined, we can proceed to extract the
+tiles. To speed up the extraction process, `histolab` automatically
+detects the tissue region with the largest connected area and crops the
+tiles within this region. The `tiler` module implements different
+strategies for tile extraction and provides an intuitive interface
+to easily retrieve a tile dataset suitable for our task. In particular,
+each extraction method is customizable with several common parameters:
 
-Now that your ```Slide``` object is defined, you can automatically extract the tiles. A ```RandomTiler``` object crops random tiles from the slide.
-You need to specify the size you want your tiles, the number of tiles to crop, and the level of magnification. If ```check_tissue``` is True, the exracted tiles are taken by default from the **biggest tissue region detected** in the slide, and the tiles are saved only if they have at least 80% of tissue inside.
+-   `tile_size`: the tile size;
+-   `level`: the extraction level (from 0 to the number of available
+    levels minus one);
+-   `check_tissue`: whether a minimum percentage of tissue is required to
+    save the tiles (enabled by default, with an 80% threshold);
+-   `prefix`: a prefix to be added at the beginning of the tiles’
+    filename (default is the empty string);
+-   `suffix`: a suffix to be added to the end of the tiles’ filename
+    (default is `.png`).
 
+### Random Extraction
+
+The simplest approach we may adopt is to randomly crop a fixed number of
+tiles from our slides; in this case, we need the `RandomTiler`
+extractor:
 
 ```python
 from histolab.tiler import RandomTiler
+```
+
+Let us suppose that we want to randomly extract 6 square tiles of size
+512 at level 2 from our prostate slide, and that we want to save them only
+if they have at least 80% of tissue inside. We then initialize our
+`RandomTiler` extractor as follows:
+
+```python
+PROSTATE_RANDOM_TILES_PATH = os.path.join(PROCESS_PATH_PROSTATE, 'random')  # save tiles in the 'random' subdirectory
 
 random_tiles_extractor = RandomTiler(
     tile_size=(512, 512),
     n_tiles=6,
     level=2,
     seed=42,
-    check_tissue=True,
-    prefix="processed/breast_slide/",
+    check_tissue=True, # default
+    prefix=PROSTATE_RANDOM_TILES_PATH,
+    suffix=".png" # default
+)
+```
+
+Notice that we also specify the random seed to ensure the
+reproducibility of the extraction process. Starting the extraction is as
+simple as calling the `extract` method on the extractor, passing the
+slide as parameter:
+
+```python
+random_tiles_extractor.extract(prostate_slide)
+```
+
+![](https://user-images.githubusercontent.com/4196091/92750145-1663df80-f387-11ea-8d98-7794eef2fd47.png)
+
+Random tiles extracted from the prostate slide at level 2.
+
+### Grid Extraction
+
+Instead of picking tiles at random, we may want to retrieve all the
+tiles available. The `GridTiler` extractor crops the tiles following a grid
+structure on the largest tissue region detected in the WSI:
+
+```python
+from histolab.tiler import GridTiler
+```
+
+In our example, we want to extract square tiles of size 512 at level 0
+from our ovarian slide, independently of the amount of tissue detected.
+By default, tiles do not overlap: the parameter defining the
+number of overlapping pixels between two adjacent tiles,
+`pixel_overlap`, is set to zero:
+
+```python
+# save tiles in the 'grid' subdirectory
+OVARIAN_GRID_TILES_PATH = os.path.join(PROCESS_PATH_OVARIAN, 'grid')
+
+grid_tiles_extractor = GridTiler(
+    tile_size=(512, 512),
+    level=0,
+    check_tissue=False,
+    pixel_overlap=0, # default
+    prefix=OVARIAN_GRID_TILES_PATH,
+    suffix=".png" # default
 )
+```
 
-random_tiles_extractor.extract(breast_slide)
+Again, the extraction process starts when the `extract` method is called
+on our extractor:
+
+```python
+grid_tiles_extractor.extract(ovarian_slide)
 ```
 
-    	 Tile 0 saved: processed/breast_slide/tile_0_level2_70536-7186-78729-15380.png
-    	 Tile 1 saved: processed/breast_slide/tile_1_level2_74393-3441-82586-11635.png
-    	 Tile 2 saved: processed/breast_slide/tile_2_level2_82218-6225-90411-14420.png
-    	 Tile 3 saved: processed/breast_slide/tile_3_level2_84026-8146-92219-16340.png
-    	 Tile 4 saved: processed/breast_slide/tile_4_level2_78969-3953-87162-12147.png
-    	 Tile 5 saved: processed/breast_slide/tile_5_level2_78649-3569-86842-11763.png
-    	 Tile 6 saved: processed/breast_slide/tile_6_level2_81994-6753-90187-14948.png
-    6 Random Tiles have been saved.
+![](https://user-images.githubusercontent.com/4196091/92751173-0993bb80-f388-11ea-9d30-a6cd17769d76.png)
 
+Examples of non-overlapping grid tiles extracted from the ovarian slide
+at level 0.
 
-![breast 001](https://user-images.githubusercontent.com/31658006/84955724-0f1a9600-b0f8-11ea-92c9-3236dd16bca8.png)
+### Score-based Extraction
+
+Depending on the task we will use our tile dataset for, the extracted
+tiles may not be equally informative. The `ScoreTiler` allows us to save
+only the "best" tiles, among all the ones extracted with a grid
+structure, based on a specific scoring function. For example, let us
+suppose that our goal is the detection of mitotic activity on our
+ovarian slide. In this case, tiles with a higher presence of nuclei are
+preferable over tiles with few or no nuclei. We can leverage the
+`NucleiScorer` function of the `scorer` module to order the extracted
+tiles based on the proportion of the tissue and of the hematoxylin
+staining. In particular, the score is computed as ![formula](https://render.githubusercontent.com/render/math?math=N_t\cdot\mathrm{tanh}(T_t)), where ![formula](https://render.githubusercontent.com/render/math?math=N_t) is the percentage of nuclei and ![formula](https://render.githubusercontent.com/render/math?math=T_t) the percentage of tissue in the tile *t*.
+
+First, we need the extractor and the scorer:
 
 ```python
-random_tiles_extractor = RandomTiler(
+from histolab.tiler import ScoreTiler
+from histolab.scorer import NucleiScorer
+```
+
+As the `ScoreTiler` extends the `GridTiler` extractor, we also set
+`pixel_overlap` as an additional parameter. Moreover, we can specify the
+number of the top tiles we want to save with the `n_tiles` parameter:
+
+```python
+# save tiles in the 'scored' subdirectory
+OVARIAN_SCORED_TILES_PATH = os.path.join(PROCESS_PATH_OVARIAN, 'scored')
+
+scored_tiles_extractor = ScoreTiler(
+    scorer=NucleiScorer(),
     tile_size=(512, 512),
-    n_tiles=6,
+    n_tiles=100,
     level=0,
-    seed=42,
     check_tissue=True,
-    prefix="processed/heart_slide/",
+    pixel_overlap=0, # default
+    prefix=OVARIAN_SCORED_TILES_PATH,
+    suffix=".png" # default
 )
-random_tiles_extractor.extract(heart_slide)
 ```
 
-    	 Tile 0 saved: processed/heart_slide/tile_0_level0_4299-35755-4811-36267.png
-    	 Tile 1 saved: processed/heart_slide/tile_1_level0_7051-39146-7563-39658.png
-    	 Tile 2 saved: processed/heart_slide/tile_2_level0_10920-26934-11432-27446.png
-    	 Tile 3 saved: processed/heart_slide/tile_3_level0_7151-30986-7663-31498.png
-    	 Tile 4 saved: processed/heart_slide/tile_4_level0_11472-26400-11984-26912.png
-    	 Tile 5 saved: processed/heart_slide/tile_5_level0_13489-42680-14001-43192.png
-    	 Tile 6 saved: processed/heart_slide/tile_6_level0_13281-33895-13793-34407.png
-    6 Random Tiles have been saved.
+Finally, when we extract our cropped images, we can also write a report
+of the saved tiles and their scores in a CSV file:
+
+```python
+summary_filename = 'summary_ovarian_tiles.csv'
+SUMMARY_PATH = os.path.join(OVARIAN_SCORED_TILES_PATH, summary_filename)
+
+scored_tiles_extractor.extract(ovarian_slide, report_path=SUMMARY_PATH)
+```
+
+<img src="https://user-images.githubusercontent.com/4196091/92751801-9d658780-f388-11ea-8132-5d0c82bb112b.png" width=500>
 
-![heart](https://user-images.githubusercontent.com/31658006/84955793-2c4f6480-b0f8-11ea-8970-592dc992d56d.png)
+Representation of the scores assigned to each extracted tile by the
+`NucleiScorer`, based on the amount of nuclei detected.
 
 ## Versioning 
 

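The score-based extraction added to the README in this diff ranks tiles by N_t · tanh(T_t) and keeps the top `n_tiles`. The sketch below is only an illustration of that formula and of the top-n selection, not histolab's `NucleiScorer` or `ScoreTiler`. The helper names `nuclei_score` and `top_tiles`, the sample tiles, and the choice to express both quantities as fractions in [0, 1] are assumptions made for the example.

```python
import heapq
import math


def nuclei_score(nuclei_ratio: float, tissue_ratio: float) -> float:
    """Score a tile as N_t * tanh(T_t), following the README formula.

    Both arguments are assumed to be fractions in [0, 1]. This is a
    hypothetical stand-in, not the library's NucleiScorer.
    """
    return nuclei_ratio * math.tanh(tissue_ratio)


def top_tiles(tiles, n_tiles):
    """Return the n_tiles highest-scoring (score, name) pairs, best first."""
    scored = [(nuclei_score(n, t), name) for name, n, t in tiles]
    return heapq.nlargest(n_tiles, scored)


# Hypothetical tiles: (name, nuclei fraction, tissue fraction).
tiles = [
    ("tile_a", 0.30, 0.90),
    ("tile_b", 0.05, 0.95),
    ("tile_c", 0.30, 0.40),
    ("tile_d", 0.00, 1.00),
]

best = top_tiles(tiles, n_tiles=2)
for score, name in best:
    print(f"{name}: {score:.3f}")
```

Because tanh grows with the tissue fraction and the nuclei fraction multiplies it, a tile with no nuclei scores zero regardless of how much tissue it contains, and among tiles with equal nuclei content the one with more tissue ranks higher.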
diff --git a/docs/api/utils.rst b/docs/api/utils.rst
@@ -9,4 +9,4 @@ Utils
    :hidden:
 
 .. automodule:: src.histolab.util
-    :members: np_to_pil, threshold_to_mask, polygon_to_mask_array, apply_mask_image, resize_mask
+    :members: np_to_pil, threshold_to_mask, polygon_to_mask_array, apply_mask_image