# HistoMIL Preprocessing Notebook

This Jupyter notebook walks through preprocessing histopathology whole-slide images (WSIs) with HistoMIL: tissue segmentation, patching (tiling), and feature extraction. The main workflow runs these steps in batch over a cohort. Predefined preprocessing parameters ship with the HistoMIL package and can be modified in this notebook.

Additionally, this notebook will demonstrate how to perform preprocessing steps on a single slide file.

## Getting Started

Before proceeding with this notebook, please make sure that you have followed the setup instructions provided in the project's README file. This includes creating a conda environment and installing the required dependencies.

## Batch Preprocessing

The batch preprocessing pipeline in HistoMIL consists of the following steps:

1. Tissue segmentation
2. Patching (tiling)
3. Feature extraction

The default preprocessing parameters can be found in the HistoMIL/EXP/paras/slides.py file. You can modify these parameters to customize the preprocessing pipeline for your specific needs.
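
One cautious way to customize parameters is to change a copy and leave the packaged defaults untouched. The sketch below illustrates that pattern with stand-in classes: TissueParas and ConceptParas are hypothetical placeholders, not HistoMIL classes, and only the tissue.min_seg_level attribute name is taken from the single-slide code later in this notebook. With HistoMIL installed, the same pattern would presumably apply to DEFAULT_CONCEPT_PARAS, assuming it behaves like a plain Python object.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TissueParas:
    """Hypothetical stand-in for a tissue-segmentation parameter object."""
    min_seg_level: int = 2  # illustrative value, not a HistoMIL default


@dataclass
class ConceptParas:
    """Hypothetical stand-in for a container of per-step parameters."""
    tissue: TissueParas = field(default_factory=TissueParas)


# Stand-in for a module-level default such as DEFAULT_CONCEPT_PARAS.
DEFAULT_PARAS = ConceptParas()

# Deep-copy first so nested objects (e.g. .tissue) are not shared with the default.
my_paras = copy.deepcopy(DEFAULT_PARAS)
my_paras.tissue.min_seg_level = 1

print(DEFAULT_PARAS.tissue.min_seg_level)  # the default is unchanged
print(my_paras.tissue.min_seg_level)       # the copy carries the custom value
```

A shallow copy would not be enough here, because the nested tissue object would still be shared between the copy and the default.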

To perform batch preprocessing, call the cohort_slide_preprocessing method of the Experiment class (HistoMIL.EXP.workspace.experiment.Experiment). Here's an example of how to run batch preprocessing:

In [None]:
# avoid pandas warning
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
# avoid multiprocessing problem
import torch
torch.multiprocessing.set_sharing_strategy('file_system')

#------>stop skimage warning
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
import imageio.core.util
import skimage 
def ignore_warnings(*args, **kwargs):
    pass
imageio.core.util._precision_warn = ignore_warnings

#set logger as INFO
from HistoMIL import logger
import logging
logger.setLevel(logging.INFO)

In [None]:
#--------------------------> parameters
from HistoMIL.EXP.paras.env import EnvParas
preprocess_env = EnvParas()
preprocess_env.exp_name = "wandb exp name"
preprocess_env.project = "wandb project name" 
preprocess_env.entity = "wandb entity name"
#----------------> cohort
preprocess_env.cohort_para.localcohort_name = "BRCA"
preprocess_env.cohort_para.task_name = "BRCA"
preprocess_env.cohort_para.cohort_file = "/BRCA_all_files.csv"
preprocess_env.cohort_para.pid_name = "PatientID"

#--------------------------> init machine and person
import pickle
machine_cohort_loc = "Path/to/BRCA_machine_config.pkl"
with open(machine_cohort_loc, "rb") as f:   # Unpickling
    [data_locs,exp_locs,machine,user] = pickle.load(f)
preprocess_env.data_locs = data_locs
preprocess_env.exp_locs = exp_locs
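
The cell above expects the pickle file to hold a four-element list in the order data_locs, exp_locs, machine, user. As a rough illustration of that layout only, the sketch below writes and reads back such a list using plain dictionaries as stand-ins. The real entries are HistoMIL objects created during setup, and the dictionary keys, values, and file name here are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-ins; the real file stores HistoMIL objects.
example_data_locs = {"root": "/path/to/data"}
example_exp_locs = {"root": "/path/to/experiments"}
example_machine = {"name": "workstation"}
example_user = {"name": "someone"}

with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "example_machine_config.pkl"

    # Write the four entries as one list, in the order the notebook unpacks them.
    with open(config_file, "wb") as f:
        pickle.dump([example_data_locs, example_exp_locs,
                     example_machine, example_user], f)

    # Read them back the same way the notebook cell does.
    with open(config_file, "rb") as f:
        loaded_data_locs, loaded_exp_locs, loaded_machine, loaded_user = pickle.load(f)

print(loaded_data_locs, loaded_exp_locs, loaded_machine, loaded_user)
```

Because pickle.load can execute arbitrary code while unpickling, only load config files that you created yourself or that come from a trusted source.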

In [None]:

#--------------------------> setup experiment
logger.info("setup experiment")
from HistoMIL.EXP.workspace.experiment import Experiment
exp = Experiment(env_paras=preprocess_env)
exp.setup_machine(machine=machine,user=user)
logger.info("setup data")
exp.init_cohort()
logger.info("pre-processing..")
exp.cohort_slide_preprocessing(concepts=["slide","tissue","patch","feature"],
                                is_fast=True, force_calc=False)

## Single Slide Preprocessing

To preprocess a single slide file, you can use the pre_process_wsi_collector function from HistoMIL.DATA.Slide.collector. The cell below shows how this function is defined, followed by an example of how to use it:

In [None]:
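from pathlib import Path
from HistoMIL import logger  # repeated so the function definition does not depend on the first cell
from HistoMIL.DATA.Slide.collector import WSICollector,CollectorParas
from HistoMIL.EXP.paras.slides import DEFAULT_CONCEPT_PARAS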
def pre_process_wsi_collector(data_locs,
                            wsi_loc:Path,
                            collector_paras:CollectorParas,
                            concepts:list=["slide","tissue","patch"],
                            fast_process:bool=True,force_calc:bool=False):

    C = WSICollector(db_loc=data_locs,wsi_loc=wsi_loc,paras=collector_paras)
    try:

        for name in concepts:
            if name == "tissue":
                if fast_process:
                    from HistoMIL.EXP.paras.slides import set_min_seg_level
                    C.paras.tissue = set_min_seg_level(C.paras.tissue, C.slide,C.paras.tissue.min_seg_level)
                    logger.debug(f"Collector:: set seg level to {C.paras.tissue.seg_level}")
            C.create(name)
            C.get(name, force_calc) # for tissue, req_idx_0 is always default slide
    except Exception as e:
        logger.exception(e)
    else:
        logger.info(f"Collector:: {wsi_loc} is done")
    finally:
        del C

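# Placeholders: replace with the folder and file name of your slide
folder = "folder of wsi"
fname = "name of wsi"
wsi_loc = Path("/") / folder / fname

# data_locs is the object loaded from the machine config pickle earlier in this notebook
pre_process_wsi_collector(data_locs,
                            wsi_loc,
                            collector_paras=DEFAULT_CONCEPT_PARAS,
                            )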
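
To apply the same function to several slides in one folder without setting up a cohort, the slide paths can be collected first and then passed to pre_process_wsi_collector one at a time. The helper below is a sketch and not part of HistoMIL: find_slides and its default suffix list are assumptions, so adjust the suffixes to the formats your slide reader supports.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_slides(folder: Path,
                suffixes: Iterable[str] = (".svs", ".tif", ".tiff")) -> List[Path]:
    """Return slide files directly inside `folder`, sorted by path.

    Hypothetical helper; the default suffixes are an assumption.
    """
    wanted = {s.lower() for s in suffixes}
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in wanted)


# Demo on a temporary folder that contains empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.svs", "b.TIFF", "notes.txt"):
        (Path(tmp) / name).touch()

    for wsi_loc in find_slides(Path(tmp)):
        # With HistoMIL set up, this is where one would call
        # pre_process_wsi_collector(data_locs, wsi_loc,
        #                           collector_paras=DEFAULT_CONCEPT_PARAS)
        print(wsi_loc.name)
```

Suffix matching is case-insensitive, and files with other extensions are skipped. Subfolders are not searched.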