# Tile Generation Tutorial (Waystation edition)

In [None]:
env DATASET_URL=http://waystation:6077/

We can run the same save_tiles command as before, since we already ran the preceding steps (if you haven't worked through 2_tiling-file, run it first!).

In [None]:
import pandas as pd

In [None]:
!save_tiles \
~/vmount/PRO-12-123/data/toy_data_set/01OV002-bd8cdc70-3d46-40ae-99c4-90ef77.svs \
~/vmount/PRO-12-123/tiling/test/label \
--num_cores 4 --batch_size 200 --dataset_id PRO_TILES_LABELED_S3 \
-o ~/vmount/PRO-12-123/tiling/test/saved_tiles

We can see that we got a REST response <Response [200]>: {"dsid":"PRO_TILES_S3","rows_written":1303,"sid":"626c0a1dbfa0f49e3e026f6a-01OV002-bd8cdc70-3d46-40ae-99c4-90ef77","status":"success"}, so our 1303 tiles for this slide ID were successfully written!
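
A script can check the same response programmatically rather than by eye. The sketch below parses the body shown above with the standard library. `summarize_write` is a hypothetical helper, not part of luna, and it assumes the response always carries `status`, `sid`, and `rows_written` keys as in this example.

```python
import json

# Response body copied verbatim from the output above.
body = (
    '{"dsid":"PRO_TILES_S3","rows_written":1303,'
    '"sid":"626c0a1dbfa0f49e3e026f6a-01OV002-bd8cdc70-3d46-40ae-99c4-90ef77",'
    '"status":"success"}'
)


def summarize_write(raw):
    """Hypothetical helper: return (sid, rows_written) or raise on failure."""
    payload = json.loads(raw)
    if payload.get("status") != "success":
        raise RuntimeError(f"tile write failed: {payload}")
    return payload["sid"], payload["rows_written"]


sid, rows_written = summarize_write(body)
print(f"{rows_written} tiles written for {sid}")
```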

Since waystation was configured with a local MinIO instance, we can read our dataset directly over S3:

In [None]:
df_tiles = pd.read_parquet(
    "s3://datasets/PRO_TILES_LABELED_S3/",
    storage_options={
        "key": 'admin',
        "secret": 'password',
        "client_kwargs": {"endpoint_url": "http://minio:9000"}
    }
)
df_tiles
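
The same `storage_options` dictionary is needed every time the dataset is read, so it can help to build it in one place and keep the credentials out of the cell itself. This is a minimal sketch: `minio_storage_options` and the `MINIO_KEY` / `MINIO_SECRET` environment variable names are hypothetical, and the defaults are just the tutorial values used above.

```python
import os


def minio_storage_options(endpoint_url="http://minio:9000"):
    """Build a storage_options dict for pd.read_parquet against MinIO."""
    return {
        # MINIO_KEY / MINIO_SECRET are hypothetical variable names; the
        # fallbacks are the tutorial credentials used in this notebook.
        "key": os.environ.get("MINIO_KEY", "admin"),
        "secret": os.environ.get("MINIO_SECRET", "password"),
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


storage_options = minio_storage_options()
```

The result can be passed straight through, for example `pd.read_parquet("s3://datasets/PRO_TILES_LABELED_S3/", storage_options=storage_options)`.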

In [None]:
from luna.common.utils import LunaCliClient
import os


def pipeline(slide_id, input_slide, input_annotations):
    # Debug aid: print the environment, e.g. to confirm DATASET_URL is set
    print(os.environ)
    client = LunaCliClient("~/vmount/PRO-12-123/2_tiling-waystation", slide_id)

    # Register the inputs under the names the steps below refer to
    client.bootstrap("slide", input_slide)
    client.bootstrap("annotations", input_annotations)

    # Step 1: generate tiles from the slide
    client.configure("generate_tiles", "slide",
        tile_size=128,
        requested_magnification=10
    ).run("source_tiles")

    # Step 2: run tissue detection and filter tiles on the Otsu score
    client.configure("detect_tissue", "slide", "source_tiles",
        filter_query="otsu_score > 0.1",
        requested_magnification=2
    ).run("detected_tiles")

    # Step 3: label the detected tiles using the annotations
    client.configure("label_tiles", "annotations", "detected_tiles").run("labeled_tiles")

    # Step 4: save the labeled tiles to the PRO_TILES_LABELED_S3 dataset
    client.configure("save_tiles", "slide", "labeled_tiles",
        num_cores=4, batch_size=200, dataset_id='PRO_TILES_LABELED_S3'
    ).run("saved_tiles")

In [None]:
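from concurrent.futures import ThreadPoolExecutor
import pandas as pd

df_slides = pd.read_parquet("../PRO-12-123/data/toy_data_set/table/SLIDES/slide_ingest_PRO-12-123.parquet")

# Run up to 5 slide pipelines concurrently
with ThreadPoolExecutor(5) as pool:
    futures = {}
    for index, row in df_slides.iterrows():
        print(index)
        futures[index] = pool.submit(pipeline, index, row.slide_image, "../PRO-12-123/data/toy_data_set/table/ANNOTATIONS")

# submit() stores any exception on the future; result() re-raises it,
# so a failed slide is reported instead of passing silently
for index, future in futures.items():
    future.result()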

Now, when we read the dataset again, we see our tiles from all slides:

In [None]:
df_tiles = pd.read_parquet(
    "s3://datasets/PRO_TILES_LABELED_S3/",
    storage_options={
        "key": 'admin',
        "secret": 'password',
        "client_kwargs": {"endpoint_url": "http://minio:9000"}
    }
)
print(df_tiles['regional_label'].value_counts())
df_tiles


We still have 2120 tumor, 860 stroma, and 751 fat tile images and labels ready to train your model, this time aggregated at an S3 endpoint!
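
Since these tiles feed a classifier, it can be useful to see how balanced the labels are. Here is a minimal sketch using only the counts quoted above. With the real data, `df_tiles['regional_label'].value_counts(normalize=True)` should give the same fractions directly.

```python
# Label counts quoted in the text above.
label_counts = {"tumor": 2120, "stroma": 860, "fat": 751}

total = sum(label_counts.values())
fractions = {label: count / total for label, count in label_counts.items()}

for label, fraction in fractions.items():
    print(f"{label}: {label_counts[label]} tiles ({fraction:.1%})")
```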