# Tile Generation Tutorial (File Edition)

Welcome to the tile generation tutorial!

Because a whole slide image is too large to feed directly into a deep learning model, a slide is often divided into a set of small tiles that are used for training instead. For tile-based whole slide image analysis, generating tiles and labels is an important and laborious step. With the LUNA tiling CLIs and tutorials, you can generate tile labels and get your data ready for downstream analysis. In this notebook, we will see how to generate tiles and labels using the LUNA tiling CLIs. Here are the main steps we will review:

1. Load slides
2. Generate tiles, labels
3. Collect tiles for model training

Throughout this notebook, we will use different method parameter files. Please refer to the example parameter files in the `configs` directory to follow these steps.


In [5]:
import os
HOME = os.environ['HOME']
LUNA_HOME = f"{HOME}/vmount"
PROJECT = "PRO-12-123"
SLIDE_ID = "01OV002-bd8cdc70-3d46-40ae-99c4-90ef77"

DATASET_DIR = f"{LUNA_HOME}/{PROJECT}/data/toy_data_set"
ANNOTATION_DIR = f"{DATASET_DIR}/table/ANNOTATIONS"
TILING_DIR = f"{LUNA_HOME}/{PROJECT}/tiling"
SLIDE = f"{DATASET_DIR}/{SLIDE_ID}.svs"

In [6]:
# env DATASET_URL=file:///$LUNA_HOME/PRO-12-123/

Initially, we'll walk through each CLI step manually, then run them in parallel using the Luna CLI client.

First, we generate tiles of size 128 at 10x magnification from a slide image and save them.

In [7]:
!generate_tiles {SLIDE} \
--tile_size 128 \
--requested_magnification 10 \
--output-urlpath {TILING_DIR}/test/tiles

Perhaps you already have a cluster running?
Hosting the HTTP server on port 42858 instead
saving to /home/pollardw/vmount/PRO-12-123/tiling/test/tiles/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.tiles.parquet
2023-07-31 13:50:20.002 | DEBUG    | luna.common.utils:wrapper:146 - cli ran in 14.88s
2023-07-31 13:50:20,018 - distributed.worker - ERROR - Failed to communicate with scheduler during heartbeat.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/distributed/comm/tcp.py", line 225, in read
    frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/distributed/worker.py", line 1215, in heartbeat
    response = await retry_operation(
  File "/opt/conda/lib/python3.9/site-packages/distributed/utils_

# ! ls -1 {TILING_DIR}/test/tiles
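
Conceptually, tiling divides the slide into a regular grid of squares. The sketch below is only an illustration of that idea, not LUNA's implementation: the function name `tile_grid`, the native magnification of 20, the toy slide dimensions, and the choice to keep partial tiles at the right and bottom edges are all assumptions.

```python
import math

def tile_grid(slide_width, slide_height, tile_size, native_mag, requested_mag):
    """Yield (x, y, extent) for each tile, in native-resolution pixel coordinates."""
    # A tile of `tile_size` pixels at the requested magnification covers
    # a larger square when measured at the scanner's native magnification.
    scale = native_mag / requested_mag
    extent = int(round(tile_size * scale))
    # Assumption: partial tiles at the slide edges are kept, hence ceil().
    n_cols = math.ceil(slide_width / extent)
    n_rows = math.ceil(slide_height / extent)
    for row in range(n_rows):
        for col in range(n_cols):
            yield col * extent, row * extent, extent

# Hypothetical 2000 x 1000 pixel slide scanned at 20x, tiled at 10x.
tiles = list(tile_grid(slide_width=2000, slide_height=1000,
                       tile_size=128, native_mag=20, requested_mag=10))
print(len(tiles), tiles[0], tiles[-1])
```

With these toy numbers, each 128-pixel tile at 10x spans 256 pixels at the native 20x, which is why lowering the requested magnification reduces the number of tiles per slide.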

In [None]:
!detect_tissue {SLIDE} \
{TILING_DIR}/test/tiles/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.tiles.parquet \
--requested_magnification 2 \
--filter_query "otsu_score > 0.1" \
--output-urlpath {TILING_DIR}/test/detect
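
The `--filter_query` string has the form of a pandas `DataFrame.query` expression. Assuming it is applied that way to the tile table (not confirmed here), the following sketch shows its effect on a toy table. The `address` column and all values are made up for illustration.

```python
import pandas as pd

# Toy tile table; only the otsu_score column name comes from the command above.
tiles = pd.DataFrame(
    {
        "address": ["x0_y0", "x1_y0", "x2_y0", "x3_y0"],
        "otsu_score": [0.00, 0.05, 0.35, 0.90],
    }
)

# Keep only the rows for which the expression evaluates to True.
filtered = tiles.query("otsu_score > 0.1")
print(filtered)
```

Raising the threshold keeps fewer, more tissue-rich tiles; lowering it keeps more background.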

In [None]:
!ls {TILING_DIR}/test/detect

In [None]:
!ls {DATASET_DIR}/table/ANNOTATIONS

In [None]:
!label_tiles --help

In [None]:
!label_tiles \
"{DATASET_DIR}/table/ANNOTATIONS/slide_annotation_dataset_TCGA collection_ov_regional.parquet" \
"{TILING_DIR}/test/detect/{SLIDE_ID}-filtered.tiles.parquet" \
{SLIDE_ID} \
--output-urlpath "{TILING_DIR}/test/label"
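
The final cell of this notebook filters on `intersection_area > 0` and reads `regional_label`, which suggests that a tile's label comes from its overlap with an annotated region. The sketch below illustrates that idea under strong simplifications: annotations are axis-aligned boxes rather than polygons, and a tile takes the label of the region it overlaps most. The helper names and the selection rule are hypothetical, not LUNA's actual logic.

```python
def intersection_area(tile, region):
    """Overlap area of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    width = min(tile[2], region[2]) - max(tile[0], region[0])
    height = min(tile[3], region[3]) - max(tile[1], region[1])
    # Negative width or height means the boxes do not overlap.
    return max(width, 0) * max(height, 0)

def label_tile(tile, regions):
    """Return (label, area) of the region overlapping the tile most, or (None, 0)."""
    best_label, best_area = None, 0
    for label, box in regions:
        area = intersection_area(tile, box)
        if area > best_area:
            best_label, best_area = label, area
    return best_label, best_area

# Two hypothetical annotated regions side by side.
regions = [("tumor", (0, 0, 300, 300)), ("stroma", (300, 0, 600, 300))]
print(label_tile((256, 0, 512, 256), regions))        # tile straddling both regions
print(label_tile((1000, 1000, 1256, 1256), regions))  # tile outside every region
```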

In [None]:
!save_tiles {SLIDE} \
{TILING_DIR}/test/label \
--num_cores 4 \
--batch_size 200 \
--dataset-id PRO_TILES \
--output-urlpath {TILING_DIR}/test/saved_tiles

In [None]:
from luna.common.utils import LunaCliClient

def pipeline(slide_id, input_slide, input_annotations):
    # Run the four tiling steps for one slide, feeding each step's output into the next.
    client = LunaCliClient("~/vmount/PRO-12-123/2_tiling-file", slide_id)
    
    client.bootstrap("slide", input_slide)
    client.bootstrap("annotations", input_annotations)
    
    client.configure("generate_tiles", "slide", 
        tile_size=128, 
        requested_magnification=10
    ).run("source_tiles")

    client.configure("detect_tissue", "slide", "source_tiles",
        filter_query="otsu_score > 0.1", 
        requested_magnification=2
    ).run("detected_tiles")

    client.configure("label_tiles", "annotations", "detected_tiles").run("labeled_tiles")

    client.configure("save_tiles", "slide", "labeled_tiles",
        num_cores=4, batch_size=200, dataset_id='PRO_TILES_LABELED'
    ).run("saved_tiles")

In [None]:
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

df_slides = pd.read_parquet("../PRO-12-123/data/toy_data_set/table/SLIDES/slide_ingest_PRO-12-123.parquet")
        
with ThreadPoolExecutor(5) as pool:
    
    for index, row in df_slides.iterrows():
        print(index)

        pool.submit(pipeline, index, row.slide_image, "../PRO-12-123/data/toy_data_set/table/ANNOTATIONS")
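
`pool.submit` returns a `Future`, and an exception raised inside the submitted function is stored on that future rather than printed, so a failed slide can go unnoticed in a loop like the one above. The sketch below shows one way to collect results and errors per item. `run_all` and `toy_task` are hypothetical helpers; `toy_task` stands in for `pipeline`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_all(task, items, max_workers=5):
    """Run task(item) for every item; return (results, errors) keyed by item."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers) as pool:
        # Map each future back to the item it was submitted for.
        futures = {pool.submit(task, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # re-raised from the worker thread
                errors[item] = exc
    return results, errors

def toy_task(slide_id):
    # Stand-in for pipeline(); fails for one slide to show error capture.
    if slide_id == "bad":
        raise ValueError(f"could not process {slide_id}")
    return f"{slide_id} done"

results, errors = run_all(toy_task, ["a", "b", "bad"])
print(results)
print(errors)
```

Checking `errors` after the pool finishes makes it clear which slides need to be rerun before collecting tiles for training.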
        

In [None]:
import pandas as pd
df_tiles = pd.read_parquet("~/vmount/PRO-12-123/datasets/PRO_TILES_LABELED/").query("intersection_area > 0")
print(df_tiles['regional_label'].value_counts())
df_tiles

Congratulations! You now have 2120 tumor, 860 stroma, and 751 fat tile images, with labels, ready to train your model.