# Tile extraction

### Import libraries

Always the first step.

In [None]:
# Set environment variables with os package
import os
os.environ['SF_BACKEND'] = 'torch' # Alternative is 'tensorflow'
os.environ['SF_SLIDE_BACKEND'] = 'cucim' # Alternative is 'libvips'
os.environ['CUDA_VISIBLE_DEVICES'] = '0' # Set which GPU(s) to use
os.environ['CUDA_LAUNCH_BLOCKING'] = '1' # Run CUDA kernels synchronously so errors surface at the failing call
os.environ['TORCH_USE_CUDA_DSA'] = '1' # Request device-side assertions for extra CUDA error detail


# Check if GPU is available
if os.environ['SF_BACKEND'] == 'torch':
    import torch
    print('GPU available: ', torch.cuda.is_available())
    print('GPU count: ', torch.cuda.device_count())
    if torch.cuda.is_available():
        # These calls raise an error when no CUDA device is visible
        print('GPU current: ', torch.cuda.current_device())
        print('GPU name: ', torch.cuda.get_device_name(torch.cuda.current_device()))
elif os.environ['SF_BACKEND'] == 'tensorflow':
    import tensorflow as tf
    print('GPU count: ', len(tf.config.list_physical_devices('GPU')))

# import slideflow
import slideflow as sf

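# Set verbose logging for Slideflow
import logging
logging.getLogger('slideflow').setLevel(logging.INFO)
# Raise TensorFlow's C++ log threshold to quiet its messages
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '10'
# Send stderr to the process's original stdout
import sys
sys.stderr = sys.__stdout__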

# Check if slideflow was properly installed
sf.about()

<a id='import'></a>
### Getting Started with a Slideflow Project
This tutorial assumes that you have already initialized a Slideflow project. Once the project has been created and you have specified the paths to datasets, annotation files, etc., we begin by initializing a Slideflow `Project` object.

In [None]:
# Set root paths
username = "skochanny" # change me
root_path = f'/scratch/{username}/PROJECTS'
labshare_path = '/gpfs/data/pearson-lab'
project_name = "TEST_PROJECT"
project_root_path = f"{root_path}/{project_name}"

Create the `Project` object.

In [None]:
# Be sure to check that the project path is correct
P = sf.Project(project_root_path)
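
Because `sf.Project` is pointed at an existing project folder, it can help to confirm that the folder exists before loading it. The following is a minimal sketch using only the standard library; the helper name `check_project_path` is hypothetical and is not part of Slideflow.

```python
from pathlib import Path

def check_project_path(path):
    """Return the project path as a Path, or raise if it is not an existing directory.

    Hypothetical helper for illustration; not part of Slideflow.
    """
    project_dir = Path(path).expanduser()
    if not project_dir.is_dir():
        raise FileNotFoundError(f"Project directory not found: {project_dir}")
    return project_dir
```

For example, `check_project_path(project_root_path)` could be called just before `sf.Project(project_root_path)` so that a mistyped username or project name fails early with a clear message.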

## Generate ROIs

Normally, a pathologist annotates the regions of interest (ROIs) using QuPath or Slideflow Studio. Alternatively, you can generate ROIs automatically by segmenting each slide for tumor areas with a pan-cancer tumor vs. normal model that we trained on all of TCGA.

The pancancer tumor vs. normal model is on the labshare at: `/MODELS/pancan_segmentation`

ROIs will be saved in the ROIs directory as configured in the dataset settings. Alternatively, ROIs can be exported to a user-defined directory using the `dest` argument.

By default, ROIs will be generated for all slides in the dataset, skipping slides with existing ROIs. To overwrite any existing ROIs, use the `overwrite=True` argument.

In [None]:
# Set model path
model_path = f'{labshare_path}/MODELS/pancan_segmentation'

# create a dataset object
dataset = P.dataset()

# Generate ROIs for all slides in the dataset.
dataset.generate_rois(model_path)
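
The `dest` and `overwrite` arguments described above can be combined in a single call. The sketch below wraps that call in a small function so the optional arguments are only passed when requested; the wrapper name `regenerate_rois` is hypothetical and is not part of Slideflow, and `dataset` is assumed to be the object returned by `P.dataset()`.

```python
def regenerate_rois(dataset, model_path, dest=None, overwrite=False):
    """Call dataset.generate_rois with the optional arguments described above.

    Hypothetical wrapper for illustration; not part of Slideflow.
    """
    kwargs = {}
    if dest is not None:
        kwargs["dest"] = dest        # export ROIs to a user-defined directory
    if overwrite:
        kwargs["overwrite"] = True   # replace any existing ROIs
    return dataset.generate_rois(model_path, **kwargs)
```

For example, `regenerate_rois(dataset, model_path, overwrite=True)` would regenerate ROIs for every slide, including those that already have them.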

You can verify which slides have ROIs by using `dataset.summary()`.

In [None]:
dataset.summary()

## Tile extraction

Once the project and dataset have been specified, we can extract tiles from the whole-slide images. If this step has already been completed, you can skip this section.<br><br>
Notes regarding tile extraction:<br>
- If you do not have manually annotated ROIs, the entire image will be tiled. In that case, be sure to set the `whitespace_fraction` parameter so that tiles that are mostly white space are excluded from the analysis. You can check the extraction report to confirm that such tiles were removed.
- If you do have manually annotated ROIs, you can restrict tile extraction to the ROIs by setting `roi_method='inside'`.
- `qc` is a quality control parameter that removes tiles based on Gaussian blur detection and/or Otsu's method.<br>

Additional parameters for tile extraction can be found in the [Slideflow documentation here](https://slideflow.dev/dataset/#slideflow.Dataset.extract_tiles). 
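
To make the effect of `whitespace_fraction` concrete, here is a minimal pure-Python sketch of whitespace filtering. It is not Slideflow's implementation: the brightness rule (mean of R, G, B) and the `whitespace_threshold=230` default are assumptions made for illustration, and both function names are hypothetical.

```python
def whitespace_fraction_of(tile, whitespace_threshold=230):
    """Fraction of pixels whose mean RGB brightness exceeds the threshold.

    `tile` is a list of rows, each row a list of (R, G, B) tuples (0-255).
    The threshold value is an assumption for this sketch.
    """
    total = 0
    white = 0
    for row in tile:
        for r, g, b in row:
            total += 1
            if (r + g + b) / 3 > whitespace_threshold:
                white += 1
    return white / total if total else 0.0

def keep_tile(tile, whitespace_fraction=0.95, whitespace_threshold=230):
    """Keep a tile only if its white-pixel fraction is below the cutoff."""
    return whitespace_fraction_of(tile, whitespace_threshold) < whitespace_fraction
```

Under these assumptions, a cutoff of `0.95` discards only tiles that are at least 95% white, so lowering the value filters more aggressively.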


In [None]:
from slideflow.slide import qc
P.extract_tiles(tile_px=299, # can resize to 224x224 pixel tiles later, which is default for most feature extractors
                tile_um=302,
                whitespace_fraction=0.95,
                roi_method='inside',
                source=['LUADvsLUSC'],
                #skip_existing=False,
                qc=[qc.GaussianV2(), qc.Otsu()]) # also can set qc='both'
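
The `tile_px` and `tile_um` values together fix the effective resolution of each tile. As a quick worked example (the helper name `microns_per_pixel` is hypothetical), the settings above correspond to roughly 1 micron per pixel, and resizing the same tile to 224 pixels later makes each pixel cover a larger area:

```python
def microns_per_pixel(tile_um, tile_px):
    """Effective resolution: physical tile width divided by pixel width."""
    return tile_um / tile_px

extracted = microns_per_pixel(302, 299)  # tiles as extracted above
resized = microns_per_pixel(302, 224)    # same tile after resizing to 224 px

print(f"{extracted:.2f} um/px")  # 1.01 um/px
print(f"{resized:.2f} um/px")    # 1.35 um/px
```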

Make sure to examine the Extraction Report PDF to ensure that your extracted tiles look good. It is saved in the `tfrecords/{tile_px}px_{tile_um}um/` directory. 
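
To locate the report from a notebook, the directory name can be built from the same `tile_px` and `tile_um` values used for extraction. This is a minimal sketch: the helper name `find_extraction_reports` is hypothetical, and `tfrecords_root` is assumed to be the `tfrecords` directory configured for your dataset.

```python
from pathlib import Path

def find_extraction_reports(tfrecords_root, tile_px, tile_um):
    """List PDF files in tfrecords/{tile_px}px_{tile_um}um/.

    Hypothetical helper for illustration; not part of Slideflow.
    Returns an empty list if the directory does not exist or holds no PDFs.
    """
    tile_dir = Path(tfrecords_root) / f"{tile_px}px_{tile_um}um"
    return sorted(tile_dir.glob("*.pdf"))
```

For the extraction above, `find_extraction_reports(tfrecords_root, 299, 302)` would look in the `299px_302um` subdirectory.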