## Walkthrough of MADELEINE slide embedding extraction and downstream evaluation (molecular subtyping on BCNB)

## Step 0a: Download WSIs

1. Download the slides from the source (https://bcnb.grand-challenge.org/Dataset/) [1]. You will need to create an account at grand-challenge.org.

2. Move the WSIs to ./data/BCNB/wsi

## Step 0b: Segment tissue and extract patches

Use CLAM (https://github.com/mahmoodlab/CLAM) to segment the tissue and extract patches. MADELEINE requires 256x256 patches at 10x magnification. Run the following command to segment the tissue and extract the patches:

```bash
python create_patches_fp.py --source ./data/BCNB/wsi --save_dir ./data/BCNB --patch_size 256 --patch_level 1 --preset tcga.csv --seg --patch --stitch
```

Explanation of arguments:
- **source**: Path to the directory containing the WSIs
- **save_dir**: Directory where the patches will be stored
- **patch_size**: Side length of each patch in pixels
- **patch_level**: Pyramid level of the WSI to extract patches from. Since BCNB slides are originally at 20x, patch_level=1 corresponds to 10x magnification
- **preset**: CSV file with the segmentation parameters
- **seg**: Segment tissue
- **patch**: Extract patches from the segmented tissue
- **stitch**: Save a visualization of the stitched patches
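The relationship between `patch_level` and magnification can be checked with a little arithmetic. The sketch below is only an illustration: the helper names `level_magnification` and `find_level_for_magnification` are hypothetical (they are not part of CLAM or MADELEINE), and the downsample factors in the usage note are assumed values for a slide whose levels halve the resolution each time.

```python
from typing import Sequence


def level_magnification(base_magnification: float,
                        level_downsamples: Sequence[float],
                        level: int) -> float:
    """Effective magnification at a pyramid level.

    base_magnification: magnification at level 0 (20x for BCNB per this walkthrough).
    level_downsamples: downsample factor of each level relative to level 0.
    """
    if not 0 <= level < len(level_downsamples):
        raise ValueError(f"level {level} is not available in this slide")
    return base_magnification / level_downsamples[level]


def find_level_for_magnification(base_magnification: float,
                                 level_downsamples: Sequence[float],
                                 target_magnification: float = 10.0,
                                 rel_tol: float = 0.05) -> int:
    """Return the first level whose magnification is within rel_tol of the target."""
    for level in range(len(level_downsamples)):
        mag = level_magnification(base_magnification, level_downsamples, level)
        if abs(mag - target_magnification) <= rel_tol * target_magnification:
            return level
    raise ValueError(f"no level close to {target_magnification}x")


if __name__ == "__main__":
    # Assumed pyramid: each level halves the resolution of the previous one.
    downsamples = [1.0, 2.0, 4.0]
    print(find_level_for_magnification(20.0, downsamples, 10.0))
```

If you read slides with OpenSlide, the downsample factors are available as `slide.level_downsamples`, so the same check can be run on a real slide before patching. If no level matches 10x, `--patch_level 1` would not give the magnification MADELEINE expects.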

## Step 0c: Extract patch features

Continuing with CLAM, extract the patch features using the CONCH patch encoder [2]. Run the following command:

```bash
CUDA_VISIBLE_DEVICES=0 python extract_features_fp.py --data_h5_dir ./data/BCNB/patches --data_slide_dir ./data/BCNB/wsi --csv_path ./data/BCNB/process_list_autogen.csv --feat_dir ./data/BCNB/ --batch_size 512 --slide_ext .tiff
```

Explanation of arguments:
- **data_h5_dir**: Path to the patch coordinates
- **data_slide_dir**: Path to the directory containing the WSIs
- **csv_path**: Path to a CSV file listing the slide filenames without extension. The easiest option is to use the CSV file generated by Step 0b
- **feat_dir**: Directory where the features will be saved
- **batch_size**: Batch size for the forward pass of the patch feature encoder. Reduce this if you get out-of-memory (OOM) errors
- **slide_ext**: File extension of the WSIs
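If you would rather build the slide list yourself than reuse the CSV from Step 0b, a minimal sketch follows. It assumes the list is read from a column named `slide_id` (check this against your CLAM version), and the function name `write_slide_list` is hypothetical.

```python
import csv
from pathlib import Path


def write_slide_list(wsi_dir, csv_path, slide_ext=".tiff"):
    """Write a one-column CSV of slide filenames with the extension removed.

    Assumption: the feature extraction script reads a `slide_id` column.
    Returns the sorted list of slide IDs that were written.
    """
    slide_ids = sorted(
        p.name[: -len(slide_ext)]
        for p in Path(wsi_dir).iterdir()
        if p.is_file() and p.name.endswith(slide_ext)
    )
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["slide_id"])  # header row
        writer.writerows([slide_id] for slide_id in slide_ids)
    return slide_ids


if __name__ == "__main__":
    ids = write_slide_list("./data/BCNB/wsi", "./data/BCNB/slide_list.csv")
    print(f"Listed {len(ids)} slides")
```

The output path `./data/BCNB/slide_list.csv` is only an example; pass whichever file you wrote to `--csv_path`.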

Using the above command, we extracted the BCNB features on a single NVIDIA RTX 3090 Ti.

Please do not request access to the CONCH weights through the MADELEINE repository. Refer instead to the instructions at [CONCH model weights](http://github.com/mahmoodlab/)

## Pre-extracted patch features
If you prefer to skip the preprocessing steps, you can directly download the patch features we provide from HERE and place them in ./data/BCNB/h5_files
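Whether you extract the features yourself or download them, it can help to confirm that every slide has a feature file before moving on. The sketch below assumes one `.h5` file per slide, named after the slide ID, inside ./data/BCNB/h5_files; the helper name `missing_features` and the example slide IDs are hypothetical.

```python
from pathlib import Path


def missing_features(slide_ids, feat_dir, ext=".h5"):
    """Return the slide IDs that have no `<slide_id><ext>` file in feat_dir."""
    available = {p.stem for p in Path(feat_dir).glob(f"*{ext}")}
    return sorted(set(slide_ids) - available)


if __name__ == "__main__":
    # Placeholder IDs; replace with the slide IDs of your cohort.
    expected = ["slide_a", "slide_b"]
    missing = missing_features(expected, "./data/BCNB/h5_files")
    if missing:
        print(f"{len(missing)} slides have no features:", missing)
    else:
        print("All slides have feature files.")
```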

## References

[1] Xu, Feng, et al. "Predicting axillary lymph node metastasis in early breast cancer using deep learning on primary tumor biopsy slides." Frontiers in Oncology 11 (2021): 759007.

[2] Lu, Ming Y., et al. "A visual-language foundation model for computational pathology." Nature Medicine 30.3 (2024): 863-874.