## Step-by-step instructions to download HEST-1k 

This tutorial will guide you to:

- Download HEST-1k in its entirety (scanpy objects, whole-slide images, patches, nuclear segmentation, alignment preview)
- Download selected samples of HEST-1k by sample ID
- Download samples that match given metadata attributes (e.g., all breast cancer cases)
- Inspect freshly downloaded samples

For each sample, we provide:

- **wsis/**: H&E-stained whole slide images in pyramidal Generic TIFF (or pyramidal Generic BigTIFF if >4.1GB)
- **st/**: Spatial transcriptomics expressions in a scanpy .h5ad object
- **metadata/**: Metadata
- **spatial_plots/**: Overlay of the WSI with the ST spots
- **thumbnails/**: Downscaled version of the WSI
- **tissue_seg/**: Tissue segmentation masks:
    - `{id}_mask.jpg`: Downscaled or full resolution greyscale tissue mask
    - `{id}_mask.pkl`: Tissue/holes contours in a pickle file
    - `{id}_vis.jpg`: Visualization of the tissue mask on the downscaled WSI
- **cellvit_seg/**: CellViT nuclei segmentation
- **pixel_size_vis/**: Visualization of the pixel size
- **patches/**: 256x256 H&E patches (0.5µm/px) extracted around ST spots in a .h5 object optimized for deep-learning. Each patch is matched to the corresponding ST profile (see **st/**) with a barcode.
- **patches_vis/**: Visualization of the mask and patches on a downscaled WSI.


### Download HEST-1k 

In [None]:
from huggingface_hub import snapshot_download

local_dir='../hest_data' # hest will be downloaded to this folder
snapshot_download(repo_id="MahmoodLab/hest", repo_type='dataset', local_dir=local_dir)


### Download HEST-1k based on sample IDs

In [None]:
from huggingface_hub import snapshot_download

local_dir='../hest_data' # hest will be downloaded to this folder
ids_to_query = ['TENX96', 'TENX99'] # list of ids to query

# one glob pattern per sample; [_.] requires '_' or '.' right after the ID
list_patterns = [f"*{sample_id}[_.]**" for sample_id in ids_to_query]
snapshot_download(repo_id="MahmoodLab/hest", repo_type='dataset', local_dir=local_dir, allow_patterns=list_patterns)
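The `[_.]` in the pattern above is what keeps one sample ID from pulling in files of another ID that merely starts with the same characters. The sketch below illustrates this offline with Python's `fnmatch`, which, to the best of our knowledge, is the globbing style `huggingface_hub` applies to `allow_patterns`. The file paths are hypothetical examples modeled on the folder layout listed earlier, not an actual listing of the repository, and `build_patterns` and `matches_any` are helper names introduced here for illustration.

```python
from fnmatch import fnmatch


def build_patterns(sample_ids):
    """Build one glob pattern per sample ID, as in the download cell above."""
    return [f"*{sample_id}[_.]**" for sample_id in sample_ids]


def matches_any(path, patterns):
    """Return True if the path matches at least one glob pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


patterns = build_patterns(["TENX96", "TENX99"])

# Hypothetical repository paths, for illustration only.
candidate_paths = [
    "st/TENX96.h5ad",              # ID followed by '.'
    "wsis/TENX99.tif",             # ID followed by '.'
    "tissue_seg/TENX96_mask.jpg",  # ID followed by '_'
    "st/TENX961.h5ad",             # a different, longer ID
    "st/TENX9.h5ad",               # a different, shorter ID
]

selected = [path for path in candidate_paths if matches_any(path, patterns)]
for path in selected:
    print(path)
```

Only the first three paths are selected: the ID must be followed directly by `_` or `.`, so `TENX961` and `TENX9` are left out.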


### Download HEST-1k based on metadata keys (e.g., organ, technology, oncotree code)

In [None]:
from huggingface_hub import snapshot_download
import pandas as pd

local_dir='../hest_data' # hest will be downloaded to this folder

meta_df = pd.read_csv("../metadata/HEST_v1_0_0.csv")

# Download all invasive ductal carcinoma (IDC) breast cancer cases.
meta_df = meta_df[meta_df['oncotree_code'] == 'IDC']
meta_df = meta_df[meta_df['organ'] == 'Breast']

ids_to_query = meta_df['id'].values

# one glob pattern per sample; [_.] requires '_' or '.' right after the ID
list_patterns = [f"*{sample_id}[_.]**" for sample_id in ids_to_query]
snapshot_download(repo_id="MahmoodLab/hest", repo_type='dataset', local_dir=local_dir, allow_patterns=list_patterns)
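Before starting a download, it can help to confirm that the metadata filter selected at least one sample, since a typo in a value such as `'Breast'` leaves `ids_to_query` empty. The following is a dependency-free sketch of the same filtering logic using the standard `csv` module instead of pandas. The column names `id`, `organ` and `oncotree_code` come from the cell above; the rows and the helper name `select_sample_ids` are hypothetical and only serve the illustration.

```python
import csv
import io


def select_sample_ids(rows, **criteria):
    """Return the 'id' of every row whose columns equal all given criteria."""
    return [
        row["id"]
        for row in rows
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Hypothetical metadata rows, for illustration only (not real HEST-1k entries).
demo_csv = io.StringIO(
    "id,organ,oncotree_code\n"
    "SAMPLE_A,Breast,IDC\n"
    "SAMPLE_B,Breast,ILC\n"
    "SAMPLE_C,Kidney,IDC\n"
)
rows = list(csv.DictReader(demo_csv))

ids_to_query = select_sample_ids(rows, organ="Breast", oncotree_code="IDC")

# Fail early instead of launching a download with no patterns.
if not ids_to_query:
    raise ValueError("No sample matches the requested metadata criteria")

print(ids_to_query)
```

To apply this to the real metadata, the `io.StringIO` object would be replaced by an open handle on the CSV file used in the cell above.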

### Inspect freshly downloaded samples

In [None]:
from hest import load_hest

print('load hest...')
hest_d = load_hest('../hest_data') # location of the data
print('loaded hest')
for d in hest_d:
    print(d)
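Independently of `load_hest`, a quick way to check what actually landed on disk is to count the files in each top-level folder of the download directory (`st`, `wsis`, `patches`, and so on). This is a minimal sketch using only the standard library; `count_files_per_folder` is a helper name introduced here, not part of the `hest` package, and the path assumes the same `local_dir` as in the cells above.

```python
from collections import Counter
from pathlib import Path


def count_files_per_folder(root):
    """Count files under each top-level folder of root ('.' for root-level files)."""
    root = Path(root)
    counts = Counter()
    if not root.is_dir():
        return {}
    for path in root.rglob("*"):
        if path.is_file():
            relative = path.relative_to(root)
            top = relative.parts[0] if len(relative.parts) > 1 else "."
            counts[top] += 1
    return dict(counts)


local_dir = "../hest_data"  # assumed to be the folder used for the download
for folder, n_files in sorted(count_files_per_folder(local_dir).items()):
    print(f"{folder}: {n_files} file(s)")
```

When a download was restricted to a few sample IDs, every data folder should report a similar small number of files; a folder that is missing or much smaller than the others may point to an interrupted download.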