# Tile Generation Tutorial (File Edition)

Welcome to the tile generation tutorial!

Because a whole slide image is too large to feed directly into a deep learning model, a slide is often divided into a set of small tiles that are used for training instead. For tile-based whole slide image analysis, generating tiles and labels is an important and laborious step. With the LUNA tiling CLIs and tutorials, you can generate tile labels and get your data ready for downstream analysis. In this notebook, we will see how to generate tiles and labels using the LUNA tiling CLIs. Here are the main steps we will review:

1. Load slides
2. Generate tiles, labels
3. Collect tiles for model training

Throughout this notebook, we will use different method parameter files. Please refer to the example parameter files in the `configs` directory to follow these steps.


In [1]:
import os
HOME = os.environ['HOME']
LUNA_HOME = f"{HOME}/vmount"
PROJECT = "PRO-12-123"
SLIDE_ID = "01OV002-bd8cdc70-3d46-40ae-99c4-90ef77"

DATASET_DIR = f"{LUNA_HOME}/{PROJECT}/data/toy_data_set"
ANNOTATION_DIR = f"{DATASET_DIR}/table/ANNOTATIONS"
TILING_DIR = f"{LUNA_HOME}/{PROJECT}/tiling"
SLIDE = f"{DATASET_DIR}/{SLIDE_ID}.svs"

Initially, we'll walk through each CLI step manually, then run them in parallel using the Luna CLI client.

First, we generate tiles of size 128 at 10x magnification from a slide image, and save them.
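
To get a feel for how many tiles this step produces, the grid can be sketched with plain arithmetic. The sketch below is not LUNA's implementation. `tile_grid` is a hypothetical helper, the 40x native magnification is an assumption (it is consistent with the thumbnail scale factor of 20 that `detect_tissue` reports at a requested magnification of 2), and dropping partial tiles at the slide edge is also an assumption. The slide dimensions come from the `detect_tissue` log, and the address format mirrors the `x26_y56_z10` style seen in the labeling output.

```python
def tile_grid(slide_width, slide_height, tile_size, requested_mag, native_mag):
    """Return tile addresses for a regular grid (illustrative only)."""
    # One tile at the requested magnification covers this many
    # full-resolution pixels along each side.
    full_res_tile = int(tile_size * native_mag / requested_mag)
    n_cols = slide_width // full_res_tile   # partial edge tiles are dropped (assumption)
    n_rows = slide_height // full_res_tile
    return [
        f"x{x}_y{y}_z{requested_mag}"
        for x in range(n_cols)
        for y in range(n_rows)
    ]

addresses = tile_grid(53760, 54840, tile_size=128, requested_mag=10, native_mag=40)
print(len(addresses), addresses[0], addresses[-1])
```

Under these assumptions each tile spans 512 full-resolution pixels per side, giving a 105 by 107 grid before any tissue filtering.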

In [2]:
!generate_tiles {SLIDE} \
--tile_size 128 \
--requested_magnification 10 \
--output-urlpath {TILING_DIR}/test/tiles

saving to /home/pollardw/vmount/PRO-12-123/tiling/test/tiles/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.tiles.parquet
2023-08-03 02:29:40.526 | DEBUG    | luna.common.utils:wrapper:146 - cli ran in 9.95s


In [3]:
!detect_tissue {SLIDE} \
{TILING_DIR}/test/tiles/{SLIDE_ID}.tiles.parquet \
--requested_magnification 2 \
--filter_query "otsu_score > 0.1" \
--output-urlpath {TILING_DIR}/test/detect

2023-08-03 02:29:44.980 | INFO     | luna.pathology.cli.run_tissue_detection:detect_tissue:282 - Slide dimensions (53760, 54840)
2023-08-03 02:29:44.980 | INFO     | luna.pathology.cli.run_tissue_detection:detect_tissue:286 - Thumbnail scale factor: 20
2023-08-03 02:29:47.116 | DEBUG    | luna.common.utils:wrapper:146 - get_downscaled_thumbnail ran in 2.14s
2023-08-03 02:29:47.117 | INFO     | luna.pathology.cli.run_tissue_detection:detect_tissue:289 - Sample array size: (2742, 2688, 3)
2023-08-03 02:29:47.123 | INFO     | luna.pathology.cli.run_tissue_detection:detect_tissue:292 - Slide dimensions (53760, 54840)
2023-08-03 02:29:47.124 | INFO     | luna.pathology.cli.run_tissue_detection:d

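The numbers in the `detect_tissue` log above can be reproduced with a little arithmetic, which helps when choosing `--requested_magnification` for tissue detection. This is a sketch, not LUNA code: `thumbnail_shape` is a hypothetical helper, and the 40x native magnification is an assumption that is consistent with the reported scale factor of 20.

```python
def thumbnail_shape(slide_dimensions, native_mag, requested_mag):
    """Return (scale_factor, (rows, cols, channels)) for a downscaled RGB thumbnail."""
    width, height = slide_dimensions
    # How many full-resolution pixels map onto one thumbnail pixel per side.
    scale_factor = native_mag // requested_mag
    return scale_factor, (height // scale_factor, width // scale_factor, 3)

scale_factor, shape = thumbnail_shape((53760, 54840), native_mag=40, requested_mag=2)
print(scale_factor, shape)  # 20 (2742, 2688, 3)
```

A lower requested magnification gives a smaller thumbnail, so tissue detection runs faster but at coarser detail.
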
In [4]:
!label_tiles \
"{DATASET_DIR}/table/ANNOTATIONS/slide_annotation_dataset_TCGA collection_ov_regional.parquet" \
"{TILING_DIR}/test/detect/{SLIDE_ID}.tiles.parquet" \
{SLIDE_ID} \
--output-urlpath "{TILING_DIR}/test/label"

2023-08-03 02:31:25.464 | INFO     | luna.pathology.cli.generate_tile_labels:generate_tile_labels:88 - slide_id=01OV002-bd8cdc70-3d46-40ae-99c4-90ef77
/home/pollardw/vmount/PRO-12-123/data/toy_data_set/table/ANNOTATIONS/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.annotation.geojson TCGA collection ov_regional
100%|█████████████████████████████████████| 5088/5088 [00:00<00:00, 6020.64it/s]
2023-08-03 02:31:26.365 | INFO     | luna.pathology.cli.generate_tile_labels:generate_tile_labels:157 -                                             level_0  ...  intersection_area
address                                              ...                   
x26_y56_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...           0.053094
x26_y57_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...           0.341454
x27_y56_z10  01OV002-bd8cdc70-3d46-40ae-99c4-90ef77  ...           0.655530
x27_y57_z10  01OV002-bd8cdc70
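
The `intersection_area` column appears to hold the fraction of each tile covered by an annotation (the values shown above fall between 0 and 1). As a rough illustration of that idea, the sketch below computes the overlap fraction for axis-aligned rectangles. It is a simplification under stated assumptions: the real annotations are GeoJSON polygons, and `overlap_fraction` is a hypothetical helper, not part of LUNA.

```python
def overlap_fraction(tile, region):
    """Fraction of the tile's area covered by the region.

    Both arguments are (x_min, y_min, x_max, y_max) rectangles.
    """
    tx0, ty0, tx1, ty1 = tile
    rx0, ry0, rx1, ry1 = region
    # Width and height of the overlapping rectangle, clamped at zero
    # so that disjoint rectangles yield no overlap.
    overlap_w = max(0, min(tx1, rx1) - max(tx0, rx0))
    overlap_h = max(0, min(ty1, ry1) - max(ty0, ry0))
    tile_area = (tx1 - tx0) * (ty1 - ty0)
    return (overlap_w * overlap_h) / tile_area

tile = (0, 0, 128, 128)
half_covered = overlap_fraction(tile, (64, 0, 256, 128))
not_covered = overlap_fraction(tile, (200, 200, 300, 300))
print(half_covered, not_covered)  # 0.5 0.0
```

Filtering with `intersection_area > 0`, as the final cell of this notebook does, keeps only tiles that touch an annotation.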

In [5]:
!save_tiles {SLIDE} \
{TILING_DIR}/test/label/{SLIDE_ID}.regional_label.tiles.parquet \
--num_cores 4 \
--batch_size 200 \
--dataset-id PRO_TILES \
--output-urlpath {TILING_DIR}/test/saved_tiles

2023-08-03 02:31:30.548 | INFO     | luna.pathology.cli.save_tiles:save_tiles:127 - Now generating tiles with batch_size=200!
Traceback (most recent call last):
  File "/opt/conda/bin/save_tiles", line 8, in <module>
    sys.exit(fire_cli())
  File "/opt/conda/lib/python3.9/site-packages/luna/pathology/cli/save_tiles.py", line 176, in fire_cli
    fire.Fire(cli)
  File "/opt/conda/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/luna/common/utils.py", line 144, in wrapper
    result = func(*args, **kwar

In [6]:
from luna.common.utils import LunaCliClient

def pipeline(slide_id, input_slide, input_annotations):
    client = LunaCliClient("~/vmount/PRO-12-123/2_tiling-file", slide_id)
    
    client.bootstrap("slide", input_slide)
    client.bootstrap("annotations", input_annotations)
    
    client.configure("generate_tiles", "slide", 
        tile_size=128, 
        requested_magnification=10
    ).run("source_tiles")

    client.configure("detect_tissue", "slide", "source_tiles",
        filter_query="otsu_score > 0.1", 
        requested_magnification=2
    ).run("detected_tiles")

    client.configure("label_tiles", "annotations", "detected_tiles").run("labeled_tiles")

    # Same save step as the manual walkthrough, reading the labeled tiles
    client.configure("save_tiles", "slide", "labeled_tiles",
        num_cores=4, batch_size=200, dataset_id='PRO_TILES_LABELED'
    ).run("saved_tiles")

In [7]:
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

df_slides = pd.read_parquet("../PRO-12-123/data/toy_data_set/table/SLIDES/slide_ingest_PRO-12-123.parquet")
        
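with ThreadPoolExecutor(5) as pool:
    for index, row in df_slides.iterrows():
        print(index)
        # NOTE: `slide_image` must be a column of df_slides; check
        # df_slides.columns if this line raises AttributeError.
        pool.submit(pipeline, index, row.slide_image, "../PRO-12-123/data/toy_data_set/table/ANNOTATIONS")
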

0


AttributeError: 'Series' object has no attribute 'slide_image'
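
The `AttributeError` above was raised in the main thread, while the arguments for `pool.submit` were being built. Errors raised inside a submitted task behave differently: they are stored on the returned future and stay silent unless `result()` is called. The sketch below uses a hypothetical `fake_pipeline` stand-in (not a LUNA function) to show one way to surface such failures per slide.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_pipeline(slide_id):
    """Hypothetical stand-in for a per-slide pipeline; fails for one slide."""
    if slide_id == "slide_b":
        raise ValueError(f"could not process {slide_id}")
    return f"{slide_id}: done"

def run_all(slide_ids, max_workers=5):
    """Run the pipeline for every slide and collect successes and failures."""
    succeeded, failed = {}, {}
    with ThreadPoolExecutor(max_workers) as pool:
        # Keep each future so its outcome can be inspected afterwards.
        futures = {pool.submit(fake_pipeline, sid): sid for sid in slide_ids}
        for future in as_completed(futures):
            sid = futures[future]
            try:
                succeeded[sid] = future.result()  # re-raises the task's exception
            except Exception as exc:
                failed[sid] = exc
    return succeeded, failed

succeeded, failed = run_all(["slide_a", "slide_b", "slide_c"])
print(sorted(succeeded), sorted(failed))
```

Collecting failures in a dictionary lets the remaining slides finish even when one slide cannot be processed.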

In [None]:
import pandas as pd
df_tiles = pd.read_parquet("~/vmount/PRO-12-123/datasets/PRO_TILES_LABELED/").query("intersection_area > 0")
print(df_tiles['regional_label'].value_counts())
df_tiles

Congratulations! You now have 2120 tumor, 860 stroma, and 751 fat tile images and labels ready to train your model.