Here we present a step-by-step tutorial on the use of `histolab` to extract a tile dataset from example whole-slide images (WSIs) retrieved from the [TCGA repository](https://portal.gdc.cancer.gov/).

## TCGA data
First things first, let’s import some data to work with, for example the prostate tissue slide and the ovarian tissue slide available in the `data` module:

In [None]:
from histolab.data import prostate_tissue, ovarian_tissue

<div class="alert alert-block alert-info">
<b>Note:</b> To use the <mark>data</mark> module, you need to install <mark><a href="https://pypi.org/project/pooch/">pooch</a></mark>. This step is unnecessary if you are using the Vagrant/Docker virtual environment.</div>

Calling a `data` function automatically downloads the WSI from the corresponding repository and saves the slide in a cache directory:

In [None]:
prostate_svs, prostate_path = prostate_tissue()
ovarian_svs, ovarian_path = ovarian_tissue()

Notice that each `data` function outputs the corresponding slide, as an *OpenSlide* object, and the path where the slide has been saved.

## Slide initialization

`histolab` maps a WSI file into a `Slide` object. Each usage of a WSI requires a one-to-one association with a `Slide` object, which is defined in the `slide` module:

In [None]:
from histolab.slide import Slide

To initialize a `Slide` it is necessary to specify the WSI path, and the `processed_path` where the thumbnail and the tiles will be saved. In our example, we want the `processed_path` of each slide to be a subfolder of the current working directory:

In [None]:
import os

BASE_PATH = os.getcwd()

PROCESS_PATH_PROSTATE = os.path.join(BASE_PATH, 'prostate', 'processed')
PROCESS_PATH_OVARIAN = os.path.join(BASE_PATH, 'ovarian', 'processed')


prostate_slide = Slide(prostate_path, processed_path=PROCESS_PATH_PROSTATE)
ovarian_slide = Slide(ovarian_path, processed_path=PROCESS_PATH_OVARIAN)

<div class="alert alert-block alert-info">
<b>Note:</b> If our slides were stored in the same folder, this could be done directly on the whole dataset by using the <mark>SlideSet</mark> object of the <mark>slide</mark> module.</div>

With a `Slide` object we can easily retrieve information about the slide, such as the slide name, the number of available levels, the dimensions at native magnification or at a specified level:

In [None]:
print(f"Slide name: {prostate_slide.name}")
print(f"Levels: {prostate_slide.levels}")
print(f"Dimensions at level 0: {prostate_slide.dimensions}")
print(f"Dimensions at level 1: {prostate_slide.level_dimensions(level=1)}")
print(f"Dimensions at level 2: {prostate_slide.level_dimensions(level=2)}")

In [None]:
print(f"Slide name: {ovarian_slide.name}")
print(f"Levels: {ovarian_slide.levels}")
print(f"Dimensions at level 0: {ovarian_slide.dimensions}")
print(f"Dimensions at level 1: {ovarian_slide.level_dimensions(level=1)}")
print(f"Dimensions at level 2: {ovarian_slide.level_dimensions(level=2)}")
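
To make the relationship between levels and dimensions concrete, here is a minimal sketch in plain Python. It assumes each level is a uniform downsampling of level 0; the function name `downsampled_dimensions`, the base size, and the downsample factors are hypothetical values chosen for illustration, and the dimensions actually reported for a slide come from the file itself.

```python
def downsampled_dimensions(base_dimensions, downsample):
    """Return (width, height) after downsampling the level-0 size.

    Assumption: the level is a uniform downsampling of level 0 by `downsample`.
    """
    width, height = base_dimensions
    return (width // downsample, height // downsample)


# Hypothetical level-0 size and downsample factors, for illustration only.
base = (16000, 15316)
for level, factor in enumerate([1, 4, 16]):
    print(f"Level {level}: {downsampled_dimensions(base, factor)}")
```

Higher levels are smaller, which is why extracting tiles of a fixed size at a higher level covers a larger area of tissue per tile.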

Moreover, we can save and show the slide thumbnail in a separate window. In particular, the thumbnail image will be automatically saved in a subdirectory of the `processed_path`:


In [None]:
prostate_slide.save_thumbnail()
print(f"Thumbnails saved at: {prostate_slide.thumbnail_path}")
prostate_slide.show()

In [None]:
ovarian_slide.save_thumbnail()
print(f"Thumbnails saved at: {ovarian_slide.thumbnail_path}")
ovarian_slide.show()

## Tiles extraction

Once our `Slide` objects are defined, we can proceed to extract the tiles. To speed up the extraction process, `histolab` automatically detects the tissue region with the largest connected area and crops the tiles within this region. The `tiler` module implements different strategies for tile extraction and provides an intuitive interface to easily retrieve a tile dataset suitable for our task. In particular, each extraction method is customizable with several common parameters:

P1) `tile_size`: the tile size;

P2) `level`: the extraction level (from 0 to the number of available levels minus one);

P3) `check_tissue`: whether a minimum percentage of tissue is required to save a tile (the default threshold is 80\%);

P4) `prefix`: a prefix to be added at the beginning of the tiles' filename (default is the empty string);

P5) `suffix`: a suffix to be added to the end of the tiles' filename (default is `.png`).
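
As an illustration of what the tissue check in P3 amounts to, here is a minimal sketch in plain Python. It is not `histolab`'s implementation: the helper names `tissue_fraction` and `has_enough_tissue`, the `min_percent` argument, and the toy mask are all assumptions made for this example.

```python
def tissue_fraction(mask):
    """Fraction of pixels flagged as tissue in a binary mask (list of rows)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    tissue = sum(1 for row in mask for pixel in row if pixel)
    return tissue / total


def has_enough_tissue(mask, min_percent=80.0):
    """True when at least `min_percent` of the tile is tissue."""
    return tissue_fraction(mask) * 100 >= min_percent


# Hypothetical 4x5 tile mask: 1 = tissue, 0 = background.
tile_mask = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
]
print(f"Tissue fraction: {tissue_fraction(tile_mask):.2f}")
print(f"Tile would be saved: {has_enough_tissue(tile_mask)}")
```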

### Random extraction

The simplest approach we may adopt is to randomly crop a fixed number of tiles from our slides; in this case, we need the `RandomTiler` extractor:

In [None]:
from histolab.tiler import RandomTiler

Let's suppose that we want to randomly extract 6 square tiles of size 512×512 at level 2 from our prostate slide, and that we want to save them only if they contain at least 80\% tissue. We then initialize our `RandomTiler` extractor as follows:

In [None]:
random_tiles_extractor = RandomTiler(
    tile_size=(512, 512),
    n_tiles=6,
    level=2,
    seed=42,
    check_tissue=True,  # default
    prefix="random/",  # save tiles in the "random" subdirectory of slide's processed_path
    suffix=".png",  # default
)

Notice that we also specify the random seed to ensure the reproducibility of the derived dataset.
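
The effect of fixing the seed can be sketched with the standard library alone. The function `sample_tile_origins` and the region size below are hypothetical and do not reflect how `RandomTiler` picks coordinates internally; the point is only that a seeded generator yields the same sequence on every run.

```python
import random


def sample_tile_origins(region_size, tile_size, n_tiles, seed):
    """Draw `n_tiles` top-left corners so each tile fits inside the region."""
    rng = random.Random(seed)  # local generator: does not touch global state
    region_w, region_h = region_size
    tile_w, tile_h = tile_size
    return [
        (rng.randint(0, region_w - tile_w), rng.randint(0, region_h - tile_h))
        for _ in range(n_tiles)
    ]


first = sample_tile_origins((4000, 3000), (512, 512), n_tiles=6, seed=42)
second = sample_tile_origins((4000, 3000), (512, 512), n_tiles=6, seed=42)
print(f"Same coordinates with the same seed: {first == second}")
```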

Starting the extraction is as simple as calling the `extract` method on our slide:

In [None]:
random_tiles_extractor.extract(prostate_slide)


### Grid extraction

Instead of picking tiles at random, we may want to retrieve all the tiles available. The `GridTiler` extractor crops the tiles following a grid structure on the largest tissue region detected in the WSI:

In [None]:
from histolab.tiler import GridTiler

In our example, we want to extract square tiles of size 512×512 at level 0 from our ovarian slide, regardless of the amount of tissue detected. By default, tiles do not overlap: the `pixel_overlap` parameter, which defines the number of overlapping pixels between two adjacent tiles, is set to zero:

In [None]:
grid_tiles_extractor = GridTiler(
    tile_size=(512, 512),
    level=0,
    check_tissue=False,
    pixel_overlap=0,  # default
    prefix="grid/",  # save tiles in the "grid" subdirectory of slide's processed_path
    suffix=".png",  # default
)
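
To see how `pixel_overlap` changes the grid, here is a minimal sketch of grid coordinate generation in plain Python. The function `grid_origins` and the region size are hypothetical, and the sketch assumes that tiles extending past the region border are dropped; how `GridTiler` treats the borders is not covered here.

```python
def grid_origins(region_size, tile_size, pixel_overlap=0):
    """Top-left corners of all tiles that fit entirely inside the region.

    Adjacent tiles share `pixel_overlap` pixels, so the step between
    consecutive origins is the tile size minus the overlap.
    """
    region_w, region_h = region_size
    tile_w, tile_h = tile_size
    step_x, step_y = tile_w - pixel_overlap, tile_h - pixel_overlap
    if step_x <= 0 or step_y <= 0:
        raise ValueError("pixel_overlap must be smaller than the tile size")
    return [
        (x, y)
        for y in range(0, region_h - tile_h + 1, step_y)
        for x in range(0, region_w - tile_w + 1, step_x)
    ]


no_overlap = grid_origins((2048, 1024), (512, 512))
half_overlap = grid_origins((2048, 1024), (512, 512), pixel_overlap=256)
print(f"Tiles without overlap: {len(no_overlap)}")
print(f"Tiles with 256 px overlap: {len(half_overlap)}")
```

A larger overlap shortens the step between tiles, so the same region yields more tiles.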

Again, the extraction process starts when the `extract` method is called on our slide:

In [None]:
grid_tiles_extractor.extract(ovarian_slide)

### Score-based extraction

Depending on the task at hand, the extracted tiles may not be equally informative.

The `ScoreTiler` allows us to save only the "best" tiles, among all the ones extracted with a grid structure, based on a specific scoring function.

For example, let's suppose that our goal is the detection of mitotic activity on our ovarian slide. In this case, tiles with a higher presence of nuclei are preferable over tiles with little or no nuclei. We can leverage the `NucleiScorer` class of the `scorer` module to order the extracted tiles based on the proportion of tissue and of hematoxylin staining.

In particular, the score is computed as $N_t\cdot\mathrm{tanh}(T_t)$, where $N_t$ is the percentage of nuclei and $T_t$ the percentage of tissue in the tile $t$.
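
The formula can be evaluated directly to see how it ranks tiles. The sketch below assumes that $N_t$ and $T_t$ are expressed as fractions in $[0, 1]$; the function `nuclei_score` and the three example tiles are hypothetical and only illustrate the arithmetic.

```python
import math


def nuclei_score(nuclei_ratio, tissue_ratio):
    """Score N_t * tanh(T_t), with both inputs as fractions in [0, 1]."""
    return nuclei_ratio * math.tanh(tissue_ratio)


# Hypothetical tiles: (nuclei fraction, tissue fraction).
tiles = {
    "dense": (0.6, 0.9),
    "sparse": (0.1, 0.9),
    "mostly_background": (0.6, 0.2),
}
for name, (n_t, t_t) in tiles.items():
    print(f"{name}: {nuclei_score(n_t, t_t):.3f}")
```

Under this formula, a tile ranks highly only when it has both many nuclei and enough tissue: a low value of either factor pulls the score down.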

First, we need the extractor and the scorer:

In [None]:
from histolab.tiler import ScoreTiler
from histolab.scorer import NucleiScorer

As the `ScoreTiler` extends the `GridTiler` extractor, we can also set `pixel_overlap` as an additional parameter. Moreover, we can specify the number of top-scoring tiles we want to save with the `n_tiles` parameter:

In [None]:
scored_tiles_extractor = ScoreTiler(
    scorer=NucleiScorer(),
    tile_size=(512, 512),
    n_tiles=0,  # 0 saves every scored tile; use a positive value to keep only the top-scoring ones
    level=0,
    check_tissue=True,
    pixel_overlap=0,  # default
    prefix="scored/",  # save tiles in the "scored" subdirectory of slide's processed_path
    suffix=".png",  # default
)

Finally, when we extract our cropped images, we can also write a report of the saved tiles and their scores in a CSV file:

In [None]:
summary_filename = "summary_ovarian_tiles.csv"
SUMMARY_PATH = os.path.join(ovarian_slide.processed_path, summary_filename)

scored_tiles_extractor.extract(ovarian_slide, report_path=SUMMARY_PATH)
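
The idea of keeping the top-scoring tiles and writing a report can be sketched with the standard library. This is not the report format produced by `ScoreTiler`: the helpers `top_tiles` and `write_report`, the column names, and the scores are hypothetical, and the sketch assumes that `n_tiles=0` means "keep everything".

```python
import csv
import io


def top_tiles(scores, n_tiles=0):
    """Return (filename, score) pairs sorted by descending score.

    Assumption: n_tiles=0 means "keep every tile".
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if n_tiles == 0 else ranked[:n_tiles]


def write_report(rows, handle):
    """Write the ranked tiles as CSV rows to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["filename", "score"])  # hypothetical column names
    writer.writerows(rows)


# Hypothetical scores for three tiles.
scores = {"tile_a.png": 0.12, "tile_b.png": 0.43, "tile_c.png": 0.05}
buffer = io.StringIO()  # stands in for a file opened at the report path
write_report(top_tiles(scores, n_tiles=2), buffer)
print(buffer.getvalue())
```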