## Step-by-step instructions for interacting with HEST-1k

This tutorial will show you how to:

- Read HEST data
- Visualize the spots over a downscaled version of the WSI
- Save a `HESTData` object to a pyramidal TIFF and an AnnData file
- Segment the tissue
- Extract patches around the spots


This tutorial assumes that the user has already downloaded HEST-1k (in its entirety or partially). 

### Read HEST

```python
from hest import read_HESTData
from hest import load_hest

# 1. Read the whole HEST dataset
# hest_data = load_hest('../hest_data')

# 2. Read a subset of HEST
hest_data = load_hest('../hest_data', id_list=['TENX96'])

st = hest_data[0]

# 3. Access objects

# ST (adata):
adata = st.adata
print('\n* Scanpy adata:')
print(adata)

# WSI:
wsi = st.wsi
print('\n* WSI:')
print(wsi)
```

```text
* Scanpy adata:
AnnData object with n_obs × n_vars = 7233 × 541
    obs: 'in_tissue', 'pxl_col_in_fullres', 'pxl_row_in_fullres', 'array_col', 'array_row', 'n_counts', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mito', 'log1p_total_counts_mito', 'pct_counts_mito'
    var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'mito'
    uns: 'spatial'
    obsm: 'spatial'

* WSI:
<hest.wsi.CuImageWSI object at 0x7fdf68239f10>
```


`st.adata` is a spatial AnnData object (the data structure used by Scanpy) with the following fields:

#### Observations (`st.adata.obs`)
- `in_tissue`: Whether the spot lies within the tissue. This flag comes from the original Visium/Xenium run and may be inaccurate; prefer the segmentation obtained with `st.segment_tissue()` instead.
- `pxl_col_in_fullres`: Pixel column position of the patch/spot centroid in the full resolution image.
- `pxl_row_in_fullres`: Pixel row position of the patch/spot centroid in the full resolution image.
- `array_col`: Patch/spot column position in the array.
- `array_row`: Patch/spot row position in the array.
- `n_counts`: Number of counts for each observation.
- `n_genes_by_counts`: Number of genes with at least one count in each observation.
- `log1p_n_genes_by_counts`: Log-transformed (`log1p`) `n_genes_by_counts`.
- `total_counts`: Total counts per observation.
- `log1p_total_counts`: Log-transformed total counts.
- `pct_counts_in_top_50_genes`: Percentage of counts in the top 50 genes.
- `pct_counts_in_top_100_genes`: Percentage of counts in the top 100 genes.
- `pct_counts_in_top_200_genes`: Percentage of counts in the top 200 genes.
- `pct_counts_in_top_500_genes`: Percentage of counts in the top 500 genes.
- `total_counts_mito`: Total mitochondrial counts per observation (may be inaccurate).
- `log1p_total_counts_mito`: Log-transformed total mitochondrial counts (may be inaccurate).
- `pct_counts_mito`: Percentage of counts that are mitochondrial (may be inaccurate).
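For reference, the per-observation QC columns above can be reproduced from a raw count matrix. The sketch below is a minimal NumPy illustration, assuming a dense spots × genes count matrix; we believe it follows the definitions used by `scanpy.pp.calculate_qc_metrics`, but it is not HEST's code, and the function name `obs_qc_metrics` is hypothetical.

```python
import numpy as np

def obs_qc_metrics(counts, top_n=(50, 100, 200, 500)):
    """Per-observation (per-spot) QC metrics for a dense spots x genes matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=1)                 # total_counts
    n_genes = (counts > 0).sum(axis=1)         # genes with at least one count
    metrics = {
        "n_genes_by_counts": n_genes,
        "log1p_n_genes_by_counts": np.log1p(n_genes),
        "total_counts": total,
        "log1p_total_counts": np.log1p(total),
    }
    # Sort each row from highest to lowest count
    sorted_desc = -np.sort(-counts, axis=1)
    for n in top_n:
        top = sorted_desc[:, :n].sum(axis=1)
        # Share of the spot's counts captured by its n most expressed genes
        with np.errstate(divide="ignore", invalid="ignore"):
            pct = np.where(total > 0, 100.0 * top / total, 0.0)
        metrics[f"pct_counts_in_top_{n}_genes"] = pct
    return metrics

# Toy example: 3 spots x 4 genes
counts = np.array([[5, 0, 3, 2],
                   [0, 0, 0, 1],
                   [1, 1, 1, 1]])
m = obs_qc_metrics(counts, top_n=(2,))
print(m["total_counts"])
print(m["pct_counts_in_top_2_genes"])
```

In practice you would pass `adata.X` (densified if sparse) and keep the default `top_n`, which matches the 50/100/200/500 columns listed above.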

#### Variables (`st.adata.var`)
- `n_cells_by_counts`: Number of observations (spots/cells) in which the gene has at least one count.
- `mean_counts`: Mean counts per variable.
- `log1p_mean_counts`: Log-transformed mean counts.
- `pct_dropout_by_counts`: Percentage of observations in which the gene has zero counts.
- `total_counts`: Total counts per variable.
- `log1p_total_counts`: Log-transformed total counts.
- `mito`: Whether the gene is mitochondrial (may be inaccurate).
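The per-gene columns follow the same pattern, computed along the other axis. This is a hedged NumPy sketch under the same assumptions as for the observation metrics (dense spots × genes matrix, Scanpy-style definitions); `var_qc_metrics` is a hypothetical name. The `mito` flag here assumes human-style gene symbols in which mitochondrial genes start with `MT-`; HEST's own flag may be derived differently.

```python
import numpy as np

def var_qc_metrics(counts, gene_names):
    """Per-gene QC metrics for a dense spots x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    n_obs = counts.shape[0]
    n_cells = (counts > 0).sum(axis=0)     # spots where the gene is detected
    mean = counts.mean(axis=0)
    total = counts.sum(axis=0)
    return {
        "n_cells_by_counts": n_cells,
        "mean_counts": mean,
        "log1p_mean_counts": np.log1p(mean),
        # Percentage of spots in which the gene has zero counts
        "pct_dropout_by_counts": 100.0 * (1.0 - n_cells / n_obs),
        "total_counts": total,
        "log1p_total_counts": np.log1p(total),
        # Assumption: human-style symbols, mitochondrial genes start with "MT-"
        "mito": np.array([g.upper().startswith("MT-") for g in gene_names]),
    }

counts = np.array([[5, 0, 3, 2],
                   [0, 0, 0, 1],
                   [1, 1, 1, 1]])
v = var_qc_metrics(counts, ["ACTB", "MT-CO1", "GAPDH", "MT-ND1"])
print(v["pct_dropout_by_counts"])
```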

#### Unstructured (`st.adata.uns`)
- `spatial`: Contains a downscaled version of the full-resolution image, stored in `st.adata.uns['spatial']['ST']['images']['downscaled_fullres']`.

#### Observation-wise multidimensional (`st.adata.obsm`)
- `spatial`: Pixel coordinates of the spot/patch centroids in the full-resolution image. The first column is the x coordinate and the second column is the y coordinate.
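Because `obsm['spatial']` is expressed in full-resolution pixels while the image in `uns['spatial']` is downscaled, the coordinates must be rescaled before overlaying them on that image. The sketch below is a minimal illustration, assuming the downscaled image is a plain resize of the full image (no cropping or padding) and is a NumPy array, so its `.shape[:2]` gives (height, width). The full-resolution size would come from the WSI; we do not show the exact attribute, and `fullres_to_downscaled` is a hypothetical helper.

```python
import numpy as np

def fullres_to_downscaled(coords_xy, fullres_hw, downscaled_hw):
    """Map (x, y) full-resolution pixel coordinates onto a downscaled image.

    Shapes are given as (height, width), like NumPy image arrays.
    Assumes the downscaled image is a plain resize of the full image.
    """
    coords_xy = np.asarray(coords_xy, dtype=float)
    scale_x = downscaled_hw[1] / fullres_hw[1]   # width ratio applies to x
    scale_y = downscaled_hw[0] / fullres_hw[0]   # height ratio applies to y
    return coords_xy * np.array([scale_x, scale_y])

# Hypothetical sizes: a 20000 x 30000 (H x W) slide downscaled to 1000 x 1500
spots = np.array([[15000.0, 10000.0],   # slide centre
                  [0.0, 0.0]])
print(fullres_to_downscaled(spots, (20000, 30000), (1000, 1500)))
```

Keeping separate x and y ratios costs nothing and stays correct even if the downscaled image was not resized with an exactly uniform factor.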

### Visualizing the spots over a downscaled version of the WSI

```python
# Visualize the spots over a downscaled version of the full-resolution image
save_dir = '.'
st.save_spatial_plot(save_dir)
```


### Saving to a pyramidal TIFF and `.h5ad`
Save the `HESTData` object as a pyramidal `.tif` image, an expression `.h5ad` file, and a metadata file.

```python
# Warning: saving a large image (>1 GB) as a pyramidal TIFF can be slow on a hard drive.
st.save(save_dir, pyramidal=True)
```
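A pyramidal TIFF stores the image at several successively downsampled resolutions so that viewers can read only the level they need. With a downsampling factor of 2, each level holds a quarter of the pixels of the previous one, so the whole pyramid is at most about 4/3 the size of the base image (1 + 1/4 + 1/16 + ... = 4/3). The sketch below lists the level dimensions, assuming a factor of 2 and a hypothetical stopping size of 512 px; the factors and tiling HEST's writer actually uses may differ, and `pyramid_levels` is a hypothetical helper.

```python
def pyramid_levels(width, height, min_size=512, factor=2):
    """List (width, height) of each pyramid level, from full resolution down.

    Assumption: each level is `factor` times smaller than the previous one,
    and we stop once the longest side is at most `min_size` pixels.
    """
    if factor < 2:
        raise ValueError("factor must be >= 2")
    levels = [(width, height)]
    while max(width, height) > min_size:
        width = max(1, width // factor)
        height = max(1, height // factor)
        levels.append((width, height))
    return levels

levels = pyramid_levels(4096, 2048)
# Uncompressed 8-bit RGB size across all levels, in bytes
total_bytes = sum(w * h * 3 for w, h in levels)
print(levels)
print(total_bytes)
```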

### Tissue segmentation

We integrated two tissue segmentation methods:

- Image-processing-based, using Otsu thresholding
- Deep-learning-based, using a fine-tuned DeepLabV3 ResNet50
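Otsu thresholding picks the intensity threshold that maximizes the between-class variance of the image histogram, separating it into two classes (here, tissue and background). The sketch below is a plain NumPy implementation of the textbook algorithm on an 8-bit grayscale image, assuming tissue is darker than the bright slide background. It is not HEST's implementation, which may operate on a different color channel or add smoothing and morphological cleanup; `otsu_threshold` is a hypothetical name.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold t for an 8-bit grayscale image.

    Class 0 holds pixels <= t, class 1 holds pixels > t.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.int64)
    levels = np.arange(256, dtype=np.int64)
    w0 = np.cumsum(hist)                 # number of pixels <= t
    w1 = w0[-1] - w0                     # number of pixels > t
    s0 = np.cumsum(hist * levels)        # intensity sum of class 0
    s1 = s0[-1] - s0                     # intensity sum of class 1
    valid = (w0 > 0) & (w1 > 0)          # both classes must be non-empty
    sigma_b = np.zeros(256)
    m0 = s0[valid] / w0[valid]
    m1 = s1[valid] / w1[valid]
    # Between-class variance, up to a constant factor of 1 / N^2
    sigma_b[valid] = w0[valid].astype(float) * w1[valid] * (m0 - m1) ** 2
    return int(np.argmax(sigma_b))

# Toy image: dark "tissue" pixels (50) and bright background (200)
gray = np.array([50] * 100 + [200] * 100, dtype=np.uint8).reshape(10, 20)
t = otsu_threshold(gray)
tissue_mask = gray <= t   # assumption: tissue is darker than background
print(t, tissue_mask.sum())
```

Working with integer pixel counts rather than normalized probabilities lets the empty-class cases be detected exactly, instead of relying on floating-point cumulative sums reaching 1.0.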


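```python
save_dir = '.'

# Image-processing-based segmentation (Otsu thresholding)
name = 'tissue_seg_otsu'
st.segment_tissue(method='otsu')
st.save_tissue_seg_pkl(save_dir, name)

# Deep-learning-based segmentation (fine-tuned DeepLabV3 ResNet50)
name = 'tissue_seg_deep'
st.segment_tissue(method='deep')
st.save_tissue_seg_pkl(save_dir, name)
```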
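Since the `in_tissue` flag from the original run may be inaccurate, one option is to recompute it from a segmentation. The sketch below assumes the segmentation has already been rasterized into a boolean mask on a downscaled image with a known, uniform scale factor; the format of HEST's own segmentation output is not shown here, and `spots_in_mask` is a hypothetical helper. Spot centroids come from `adata.obsm['spatial']` in full-resolution (x, y) pixels.

```python
import numpy as np

def spots_in_mask(mask, coords_xy, scale):
    """Flag spots whose centroid falls inside a boolean tissue mask.

    mask: (H, W) boolean array on a downscaled image.
    coords_xy: (N, 2) full-resolution (x, y) centroids.
    scale: downscaled size / full-resolution size (assumed equal for x and y).
    """
    xy = np.floor(np.asarray(coords_xy, dtype=float) * scale).astype(int)
    h, w = mask.shape
    inside = (xy[:, 0] >= 0) & (xy[:, 0] < w) & (xy[:, 1] >= 0) & (xy[:, 1] < h)
    flags = np.zeros(len(xy), dtype=bool)
    # Index the mask as [row, col] = [y, x]; out-of-bounds spots stay False
    flags[inside] = mask[xy[inside, 1], xy[inside, 0]]
    return flags

mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                      # a 2x2 "tissue" region
coords = np.array([[2, 2], [6, 6], [-2, 0], [5, 3]])
print(spots_in_mask(mask, coords, scale=0.5))
```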


### Patching

```python
patch_save_dir = './processed'

# Extract 224x224 patches around the spots at a target pixel size of 0.5 µm/px
st.dump_patches(
    patch_save_dir,
    'demo',
    target_patch_size=224,
    target_pixel_size=0.5
)
```
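Together, `target_patch_size` and `target_pixel_size` fix the physical area each patch covers: 224 px × 0.5 µm/px = 112 µm per side. The sketch below shows that geometry for a single spot, assuming patches are centred on the spot centroid, that pixel sizes are in µm per pixel, and that the slide's own pixel size is known. HEST's exact rounding and resampling may differ, and `patch_bbox` is a hypothetical helper.

```python
def patch_bbox(center_xy, src_pixel_size, target_patch_size=224, target_pixel_size=0.5):
    """Return (x0, y0, width, height) in full-resolution pixels for one patch.

    src_pixel_size and target_pixel_size are assumed to be in µm per pixel.
    """
    # Physical field of view covered by the output patch, in µm
    fov_um = target_patch_size * target_pixel_size
    # The same field of view expressed in full-resolution pixels
    src_size = int(round(fov_um / src_pixel_size))
    cx, cy = center_xy
    x0 = int(round(cx - src_size / 2))
    y0 = int(round(cy - src_size / 2))
    return x0, y0, src_size, src_size

# Hypothetical slide scanned at 0.25 µm/px: read 448x448 px, then resize to 224x224
print(patch_bbox((1000, 2000), src_pixel_size=0.25))
```

Fixing the physical pixel size rather than the raw pixel count keeps patches from slides scanned at different magnifications comparable.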