Merge 0250969 into 6dd9881
ernestoarbitrio committed Sep 11, 2020
2 parents 6dd9881 + 0250969 commit a348d6f
Showing 3 changed files with 465 additions and 194 deletions.
296 changes: 220 additions & 76 deletions README.md
@@ -50,154 +50,298 @@ Histolab has only one system-wide dependency: OpenSlide.

You can download and install it from [OpenSlide](https://openslide.org/download/) according to your operating system.

### Documentation

Read the full documentation here https://histolab.readthedocs.io/en/latest/.

# Quickstart
Here we present a step-by-step tutorial on the use of `histolab` to
extract a tile dataset from example WSIs. The corresponding Jupyter
Notebook is available at <https://github.com/histolab/histolab-box>:
this repository contains a complete `histolab` environment that can be
used through [Vagrant](http://www.vagrantup.com) or
[Docker](http://www.docker.com) on all platforms.

### Installation
You can either use `histolab` through `histolab-box` or install it in
your own Python virtual environment (using conda, pipenv, pyenv,
virtualenv, etc.). In the latter case, since the `histolab` package is
published on [PyPI](http://www.pypi.org), it can be installed with the
command:

```
pip install histolab
```

### Documentation

Read the full documentation here https://histolab.readthedocs.io/en/latest/.
## TCGA data

### Quickstart
First things first, let’s import some data to work with, for example the
prostate tissue slide and the ovarian tissue slide available in the
`data` module:

```python
from histolab.data import breast_tissue, heart_tissue
from histolab.data import prostate_tissue, ovarian_tissue
```

**NB** To use the data module, you need to install ```pooch```.

Each data function outputs the corresponding slide as an OpenSlide object, and the path where the slide has been saved:
**Note:** To use the `data` module, you need to install `pooch`, also
available on PyPI (<https://pypi.org/project/pooch/>). This step is
unnecessary if we are using the Vagrant/Docker virtual environment.

Calling a `data` function automatically downloads the WSI from the
corresponding repository and saves the slide in a cache directory:

```python
breast_svs, breast_path = breast_tissue()
heart_svs, heart_path = heart_tissue()
prostate_svs, prostate_path = prostate_tissue()
ovarian_svs, ovarian_path = ovarian_tissue()
```

### Slide
Notice that each `data` function outputs the corresponding slide, as an
OpenSlide object, and the path where the slide has been saved.

## Slide initialization

`histolab` maps a WSI file into a `Slide` object. Each usage of a WSI
requires a one-to-one association with a `Slide` object, which is
defined in the `slide` module:

```python
from histolab.slide import Slide
```

Convert the slide into a ```Slide``` object. ```Slide``` takes as input the path where the slide is stored and the ```processed_path``` where the thumbnail and the tiles will be saved.

To initialize a `Slide`, we need to specify the WSI path and the
`processed_path` where the thumbnail and the tiles will be saved. In our
example, we want the `processed_path` of each slide to be a subfolder of
the current working directory:

```python
breast_slide = Slide(breast_path, processed_path='processed')
heart_slide = Slide(heart_path, processed_path='processed')
```
import os

As a ```Slide``` object, you can now easily retrieve information about the slide, such as the slide name, the dimensions at native magnification, the dimensions at a specified level, save and show the slide thumbnail, or get a scaled version of the slide.
BASE_PATH_PROSTATE = os.getcwd()
BASE_PATH_OVARIAN = os.getcwd()

PROCESS_PATH_PROSTATE = os.path.join(BASE_PATH_PROSTATE, 'processed')
PROCESS_PATH_OVARIAN = os.path.join(BASE_PATH_OVARIAN, 'processed')

```python
print(f"Slide name: {breast_slide.name}")
print(f"Dimensions at level 0: {breast_slide.dimensions}")
print(f"Dimensions at level 1: {breast_slide.level_dimensions(level=1)}")
print(f"Dimensions at level 2: {breast_slide.level_dimensions(level=2)}")
prostate_slide = Slide(prostate_path, processed_path=PROCESS_PATH_PROSTATE)
ovarian_slide = Slide(ovarian_path, processed_path=PROCESS_PATH_OVARIAN)
```

Slide name: 9c960533-2e58-4e54-97b2-8454dfb4b8c8
Dimensions at level 0: (96972, 30681)
Dimensions at level 1: (24243, 7670)
Dimensions at level 2: (6060, 1917)

**Note:** If the slides were stored in the same folder, this can be done
directly on the whole dataset by using the `SlideSet` object of the
`slide` module.

With a `Slide` object we can easily retrieve information about the
slide, such as the slide name, the number of available levels, the
dimensions at native magnification or at a specified level:

```python
print(f"Slide name: {heart_slide.name}")
print(f"Dimensions at level 0: {heart_slide.dimensions}")
print(f"Dimensions at level 1: {heart_slide.level_dimensions(level=1)}")
print(f"Dimensions at level 2: {heart_slide.level_dimensions(level=2)}")
print(f"Slide name: {prostate_slide.name}")
print(f"Levels: {prostate_slide.levels}")
print(f"Dimensions at level 0: {prostate_slide.dimensions}")
print(f"Dimensions at level 1: {prostate_slide.level_dimensions(level=1)}")
print(f"Dimensions at level 2: {prostate_slide.level_dimensions(level=2)}")
```

Slide name: JP2K-33003-2
Dimensions at level 0: (32671, 47076)
Dimensions at level 1: (8167, 11769)
Dimensions at level 2: (2041, 2942)


```
Slide name: 6b725022-f1d5-4672-8c6c-de8140345210
Levels: [0, 1, 2]
Dimensions at level 0: (16000, 15316)
Dimensions at level 1: (4000, 3829)
Dimensions at level 2: (2000, 1914)
```

```python
breast_slide.save_thumbnail()
print(f"Thumbnails saved at: {breast_slide.thumbnail_path}")
heart_slide.save_thumbnail()
print(f"Slide name: {ovarian_slide.name}")
print(f"Levels: {ovarian_slide.levels}")
print(f"Dimensions at level 0: {ovarian_slide.dimensions}")
print(f"Dimensions at level 1: {ovarian_slide.level_dimensions(level=1)}")
print(f"Dimensions at level 2: {ovarian_slide.level_dimensions(level=2)}")
```

print(f"Thumbnails saved at: {heart_slide.thumbnail_path}")
```
Slide name: b777ec99-2811-4aa4-9568-13f68e380c86
Levels: [0, 1, 2]
Dimensions at level 0: (30001, 33987)
Dimensions at level 1: (7500, 8496)
Dimensions at level 2: (1875, 2124)
```
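The dimensions printed above are consistent with dividing the level-0 size by a per-level downsample factor and discarding the remainder. The sketch below illustrates that relationship only. The helper name `scaled_dimensions` is hypothetical, and the factors (4 and 8 for the prostate slide, 4 and 16 for the ovarian slide) are inferred from the printed numbers rather than read from the `histolab` API:

```python
def scaled_dimensions(level0_dims, downsample):
    """Return (width, height) at a level, assuming floor division by the factor."""
    width, height = level0_dims
    return width // downsample, height // downsample


# Prostate slide: the printed levels 1 and 2 match factors of 4 and 8.
prostate_level1 = scaled_dimensions((16000, 15316), 4)
prostate_level2 = scaled_dimensions((16000, 15316), 8)

# Ovarian slide: the printed levels 1 and 2 match factors of 4 and 16.
ovarian_level1 = scaled_dimensions((30001, 33987), 4)
ovarian_level2 = scaled_dimensions((30001, 33987), 16)

print(prostate_level1, prostate_level2, ovarian_level1, ovarian_level2)
```

Note that the two slides do not share the same factors, so the level index alone does not tell us the resolution of the extracted tiles.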

Thumbnails saved at: processed/thumbnails/9c960533-2e58-4e54-97b2-8454dfb4b8c8.png
Thumbnails saved at: processed/thumbnails/JP2K-33003-2.png
Moreover, we can save and show the slide thumbnail in a separate window.
In particular, the thumbnail image will be automatically saved in a
subdirectory of the `processed_path`:

```python
prostate_slide.save_thumbnail()
prostate_slide.show()
```

![](https://user-images.githubusercontent.com/4196091/92748324-5033e680-f385-11ea-812b-6a9a225ceca4.png)

```python
breast_slide.show()
heart_slide.show()
ovarian_slide.save_thumbnail()
ovarian_slide.show()
```

![thumbnails](https://user-images.githubusercontent.com/31658006/84955475-a4695a80-b0f7-11ea-83d5-db7668801219.png)
![](https://user-images.githubusercontent.com/4196091/92748248-3db9ad00-f385-11ea-846b-a5ce8cf3ca09.png)

## Tile extraction

### Tiles extraction
Once the `Slide` objects are defined, we can proceed to extract the
tiles. To speed up the extraction process, `histolab` automatically
detects the tissue region with the largest connected area and crops the
tiles within this field. The `tiler` module implements different
strategies for tile extraction and provides an intuitive interface
to easily retrieve a tile dataset suitable for our task. In particular,
each extraction method is customizable with several common parameters:

Now that your ```Slide``` object is defined, you can automatically extract the tiles. A ```RandomTiler``` object crops random tiles from the slide.
You need to specify the tile size, the number of tiles to crop, and the level of magnification. If ```check_tissue``` is True, the extracted tiles are taken by default from the **biggest tissue region detected** in the slide, and the tiles are saved only if they contain at least 80% tissue.
- `tile_size`: the tile size;
- `level`: the extraction level (from 0 to the number of available
  levels minus one);
- `check_tissue`: whether a minimum percentage of tissue (80%) is
  required to save a tile (enabled by default);
- `prefix`: a prefix to be added at the beginning of the tiles’
  filename (default is the empty string);
- `suffix`: a suffix to be added to the end of the tiles’ filename
  (default is `.png`).
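To make the role of `prefix` and `suffix` concrete, here is a minimal sketch of how a tile filename could be assembled. The function `tile_filename` is hypothetical and is not part of `histolab`. The naming pattern is only patterned on a saved-tile name such as `tile_0_level2_70536-7186-78729-15380.png`, and reading the four numbers as upper-left and bottom-right coordinates is an assumption:

```python
def tile_filename(index, level, coords, prefix="", suffix=".png"):
    """Build a tile filename from its index, level, and corner coordinates.

    `coords` is assumed to be (x_upper_left, y_upper_left,
    x_bottom_right, y_bottom_right).
    """
    x_ul, y_ul, x_br, y_br = coords
    return f"{prefix}tile_{index}_level{level}_{x_ul}-{y_ul}-{x_br}-{y_br}{suffix}"


name = tile_filename(
    0, 2, (70536, 7186, 78729, 15380), prefix="processed/breast_slide/"
)
print(name)
```

Because the prefix is simply prepended to the name, a prefix that should act as a folder needs its own trailing separator in this sketch.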

### Random Extraction

The simplest approach we may adopt is to randomly crop a fixed number of
tiles from our slides; in this case, we need the `RandomTiler`
extractor:

```python
from histolab.tiler import RandomTiler
```

Let us suppose that we want to randomly extract 6 square tiles of size
512 at level 2 from our prostate slide, and that we want to save them
only if they have at least 80% of tissue inside. We then initialize our
`RandomTiler` extractor as follows:

```python
# save tiles in the 'random' subdirectory
PROSTATE_RANDOM_TILES_PATH = os.path.join(PROCESS_PATH_PROSTATE, 'random')

random_tiles_extractor = RandomTiler(
tile_size=(512, 512),
n_tiles=6,
level=2,
seed=42,
check_tissue=True,
prefix="processed/breast_slide/",
check_tissue=True, # default
prefix=PROSTATE_RANDOM_TILES_PATH,
suffix=".png" # default
)
```

Notice that we also specify the random seed to ensure the
reproducibility of the extraction process. Starting the extraction is as
simple as calling the `extract` method on the extractor, passing the
slide as parameter:

```python
random_tiles_extractor.extract(prostate_slide)
```
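The idea behind seeded random extraction with a tissue check can be sketched in plain Python. This is an illustration under stated assumptions, not the `RandomTiler` implementation: the functions `tissue_ratio` and `random_tiles`, the binary toy mask, and the `max_attempts` guard are all hypothetical.

```python
import random


def tissue_ratio(mask, x, y, size):
    """Fraction of pixels flagged as tissue in the size x size window at (x, y)."""
    window = [row[x:x + size] for row in mask[y:y + size]]
    return sum(map(sum, window)) / (size * size)


def random_tiles(mask, size, n_tiles, seed, min_tissue=0.8, max_attempts=10_000):
    """Sample up to n_tiles upper-left corners whose window holds enough tissue."""
    rng = random.Random(seed)  # a fixed seed makes the sampling reproducible
    height, width = len(mask), len(mask[0])
    tiles = []
    for _ in range(max_attempts):
        if len(tiles) == n_tiles:
            break
        x = rng.randint(0, width - size)
        y = rng.randint(0, height - size)
        if tissue_ratio(mask, x, y, size) >= min_tissue:
            tiles.append((x, y))
    return tiles


# Toy 8x8 mask: the left half is tissue (1), the right half is background (0).
mask = [[1] * 4 + [0] * 4 for _ in range(8)]
tiles = random_tiles(mask, size=2, n_tiles=6, seed=42)
print(tiles)
```

Candidates that straddle the tissue border or fall on background are rejected, and running the function twice with the same seed returns the same coordinates.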

![](https://user-images.githubusercontent.com/4196091/92750145-1663df80-f387-11ea-8d98-7794eef2fd47.png)

Random tiles extracted from the prostate slide at level 2.

### Grid Extraction

Instead of picking tiles at random, we may want to retrieve all the
tiles available. The `GridTiler` extractor crops the tiles following a
grid structure on the largest tissue region detected in the WSI:

```python
from histolab.tiler import GridTiler
```

In our example, we want to extract square tiles of size 512 at level 0
from our ovarian slide, independently of the amount of tissue detected.
By default, tiles will not overlap: the parameter defining the number
of overlapping pixels between two adjacent tiles, `pixel_overlap`, is
set to zero:

```python
# save tiles in the 'grid' subdirectory
OVARIAN_GRID_TILES_PATH = os.path.join(PROCESS_PATH_OVARIAN, 'grid')

grid_tiles_extractor = GridTiler(
    tile_size=(512, 512),
    level=0,
    check_tissue=False,
    pixel_overlap=0,  # default
    prefix=OVARIAN_GRID_TILES_PATH,
    suffix=".png"  # default
)
```
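The effect of `pixel_overlap` on a grid can be sketched as follows. This is a simplified illustration, not the `GridTiler` code: the function `grid_coordinates` is hypothetical, and it assumes a rectangular region whose partial tiles at the border are dropped.

```python
def grid_coordinates(region_width, region_height, tile_size, pixel_overlap=0):
    """Upper-left corners of the tiles covering a region, row by row."""
    tile_w, tile_h = tile_size
    # Adjacent tiles share `pixel_overlap` pixels, so the step shrinks accordingly.
    step_x = tile_w - pixel_overlap
    step_y = tile_h - pixel_overlap
    if step_x <= 0 or step_y <= 0:
        raise ValueError("pixel_overlap must be smaller than the tile size")
    return [
        (x, y)
        for y in range(0, region_height - tile_h + 1, step_y)
        for x in range(0, region_width - tile_w + 1, step_x)
    ]


no_overlap = grid_coordinates(1024, 1024, (512, 512))
half_overlap = grid_coordinates(1024, 1024, (512, 512), pixel_overlap=256)
print(len(no_overlap), len(half_overlap))
```

In this sketch, a 1024 x 1024 region yields 4 tiles without overlap and 9 tiles with a 256-pixel overlap, which shows how quickly the dataset grows as the overlap increases.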

random_tiles_extractor.extract(breast_slide)
Again, the extraction process starts when the `extract` method is called
on our extractor:

```python
grid_tiles_extractor.extract(ovarian_slide)
```

Tile 0 saved: processed/breast_slide/tile_0_level2_70536-7186-78729-15380.png
Tile 1 saved: processed/breast_slide/tile_1_level2_74393-3441-82586-11635.png
Tile 2 saved: processed/breast_slide/tile_2_level2_82218-6225-90411-14420.png
Tile 3 saved: processed/breast_slide/tile_3_level2_84026-8146-92219-16340.png
Tile 4 saved: processed/breast_slide/tile_4_level2_78969-3953-87162-12147.png
Tile 5 saved: processed/breast_slide/tile_5_level2_78649-3569-86842-11763.png
Tile 6 saved: processed/breast_slide/tile_6_level2_81994-6753-90187-14948.png
6 Random Tiles have been saved.
![](https://user-images.githubusercontent.com/4196091/92751173-0993bb80-f388-11ea-9d30-a6cd17769d76.png)

Examples of non-overlapping grid tiles extracted from the ovarian slide
at level 0.

![breast 001](https://user-images.githubusercontent.com/31658006/84955724-0f1a9600-b0f8-11ea-92c9-3236dd16bca8.png)
### Score-based extraction

Depending on the task we will use our tile dataset for, the extracted
tiles may not be equally informative. The `ScoreTiler` allows us to save
only the "best" tiles, among all the ones extracted with a grid
structure, based on a specific scoring function. For example, let us
suppose that our goal is the detection of mitotic activity on our
ovarian slide. In this case, tiles with a higher presence of nuclei are
preferable over tiles with few or no nuclei. We can leverage the
`NucleiScorer` function of the `scorer` module to order the extracted
tiles based on the proportion of the tissue and of the hematoxylin
staining. In particular, the score is computed as ![formula](https://render.githubusercontent.com/render/math?math=N_t\cdot\mathrm{tanh}(T_t)) where ![formula](https://render.githubusercontent.com/render/math?math=N_t) is the percentage of nuclei and ![formula](https://render.githubusercontent.com/render/math?math=T_t) the percentage of tissue in the tile *t*.
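The scoring formula and the selection of the best tiles can be sketched in a few lines. This is an illustration, not the `NucleiScorer` implementation: the functions `nuclei_score` and `top_tiles` and the candidate values are hypothetical, and the sketch assumes both quantities are expressed as fractions between 0 and 1.

```python
import heapq
import math


def nuclei_score(nuclei_ratio, tissue_ratio):
    """Score a tile as N_t * tanh(T_t), following the formula in the text."""
    return nuclei_ratio * math.tanh(tissue_ratio)


def top_tiles(scored_tiles, n_tiles):
    """Return the n_tiles highest-scoring (score, name) pairs, best first."""
    return heapq.nlargest(n_tiles, scored_tiles)


# Made-up (nuclei ratio, tissue ratio) pairs for three candidate tiles.
candidates = {
    "tile_a": (0.10, 0.90),
    "tile_b": (0.40, 0.95),
    "tile_c": (0.40, 0.20),
}
scored = [(nuclei_score(n, t), name) for name, (n, t) in candidates.items()]
best = top_tiles(scored, 2)
print(best)
```

With these values, `tile_b` ranks first because it is rich in both nuclei and tissue, while `tile_c` is penalized by its low tissue content despite having the same nuclei ratio.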

First, we need the extractor and the scorer:

```python
random_tiles_extractor = RandomTiler(
from histolab.tiler import ScoreTiler
from histolab.scorer import NucleiScorer
```

As the `ScoreTiler` extends the `GridTiler` extractor, we also set the
`pixel_overlap` as an additional parameter. Moreover, we can specify the
number of top tiles we want to save with the `n_tiles` parameter:

```python
# save tiles in the 'scored' subdirectory
OVARIAN_SCORED_TILES_PATH = os.path.join(PROCESS_PATH_OVARIAN, 'scored')

scored_tiles_extractor = ScoreTiler(
scorer = NucleiScorer(),
tile_size=(512, 512),
n_tiles=6,
n_tiles=100,
level=0,
seed=42,
check_tissue=True,
prefix="processed/heart_slide/",
pixel_overlap=0, # default
prefix=OVARIAN_SCORED_TILES_PATH,
suffix=".png" # default
)
random_tiles_extractor.extract(heart_slide)
```

Tile 0 saved: processed/heart_slide/tile_0_level0_4299-35755-4811-36267.png
Tile 1 saved: processed/heart_slide/tile_1_level0_7051-39146-7563-39658.png
Tile 2 saved: processed/heart_slide/tile_2_level0_10920-26934-11432-27446.png
Tile 3 saved: processed/heart_slide/tile_3_level0_7151-30986-7663-31498.png
Tile 4 saved: processed/heart_slide/tile_4_level0_11472-26400-11984-26912.png
Tile 5 saved: processed/heart_slide/tile_5_level0_13489-42680-14001-43192.png
Tile 6 saved: processed/heart_slide/tile_6_level0_13281-33895-13793-34407.png
6 Random Tiles have been saved.
Finally, when we extract our cropped images, we can also write a report
of the saved tiles and their scores in a CSV file:

```python
summary_filename = 'summary_ovarian_tiles.csv'
SUMMARY_PATH = os.path.join(OVARIAN_SCORED_TILES_PATH, summary_filename)

scored_tiles_extractor.extract(ovarian_slide, report_path=SUMMARY_PATH)
```
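As a rough idea of what such a report could look like, the sketch below writes tile names and scores to CSV with the standard library. The column names `filename` and `score`, the function `write_report`, and the sample rows are assumptions made for illustration; the actual layout of the file produced by `histolab` may differ.

```python
import csv
import io


def write_report(rows, handle):
    """Write (filename, score) rows to a CSV handle, highest score first."""
    writer = csv.writer(handle)
    writer.writerow(["filename", "score"])  # hypothetical column names
    for filename, score in sorted(rows, key=lambda row: row[1], reverse=True):
        writer.writerow([filename, f"{score:.4f}"])


# An in-memory buffer stands in for the file at the report path.
buffer = io.StringIO()
write_report([("tile_1.png", 0.0790), ("tile_0.png", 0.2959)], buffer)
print(buffer.getvalue())
```

Sorting by score keeps the most informative tiles at the top of the file, which makes the report easy to inspect by hand.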

<img src="https://user-images.githubusercontent.com/4196091/92751801-9d658780-f388-11ea-8132-5d0c82bb112b.png" width=500>

![heart](https://user-images.githubusercontent.com/31658006/84955793-2c4f6480-b0f8-11ea-8970-592dc992d56d.png)
Representation of the scores assigned to each extracted tile by the
`NucleiScorer`, based on the amount of nuclei detected.

## Versioning

2 changes: 1 addition & 1 deletion docs/api/utils.rst
@@ -9,4 +9,4 @@ Utils
:hidden:

.. automodule:: src.histolab.util
:members: np_to_pil, threshold_to_mask, polygon_to_mask_array, apply_mask_image, resize_mask
:members: np_to_pil, threshold_to_mask, polygon_to_mask_array, apply_mask_image
