## Xenium sensitivity analysis

### Relative per-gene sensitivity between two samples

This task compares the sensitivity of two samples profiled with different panels (Xenium Prime 5K relative to Xenium v1).

| Sample  | Panel | Organ | Disease |
|---------|-------|-------|---------|
| TENX143 | Xenium Human 5K Pan Tissue and Pathways Panel | Lymphoid | Diseased (Reactive Lymph Node) |
| TENX125 | Xenium Human Multi-Tissue and Cancer Panel (V1) | Lymphoid | Diseased (Reactive follicular hyperplasia) |
| TENX158 | Xenium Human 5K Pan Tissue and Pathways Panel | Skin | Cancer (Primary Dermal Melanoma) |
| TENX115 | Xenium Human Skin Gene Expression Panel (V1) | Skin | Cancer (SKCM) |

Compare samples from the same organ so that the two panels have enough genes in common.

In [None]:
from hest import iter_hest
from hest.sensitivity import find_common_genes, filter_adata_by_gene_names, paired_sensitivity

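# Pick one same-organ pair; the first ID is compared relative to the second.
# Lymphoid pair:
# ids_to_query = ['TENX143', 'TENX125']
# Skin pair:
ids_to_query = ['TENX158', 'TENX115']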

adata_list = []
for st in iter_hest('../hest_data', id_list=ids_to_query):
    # TODO: unify gene names across panels before comparing
    adata_list.append(st.adata)

common_genes = find_common_genes(adata_list)
if len(common_genes) > 0:
    print('Number of Common Genes: ', len(common_genes))
else:
    raise ValueError('No common genes found between the datasets')

adata_filtered_1 = filter_adata_by_gene_names(adata_list[0], common_genes)
adata_filtered_2 = filter_adata_by_gene_names(adata_list[1], common_genes)

per_gene_ratio, per_sample_ratio = paired_sensitivity(adata_filtered_1, adata_filtered_2)

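import matplotlib.pyplot as plt

# Express both ratios as percentages for plotting
subtitle = f'Ratio of average sensitivity across all genes = {per_sample_ratio*100:.2f}%'
plt.figure(figsize=(4, 6), dpi=400)
# Note: `tick_labels` requires Matplotlib >= 3.9 (older versions use `labels`)
plt.boxplot(per_gene_ratio * 100, tick_labels=[subtitle])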
plt.title(f'Per-Gene Sensitivity {ids_to_query[0]} Relative to {ids_to_query[1]}')
plt.ylabel('Per-Gene Relative Sensitivity (%)')
plt.show()
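
To make the notion of a per-gene ratio concrete, here is a minimal sketch on toy data. It assumes that sensitivity is approximated by the mean count per cell for each gene, which is only one possible definition; `paired_sensitivity` may compute it differently. The helper `per_gene_ratio_sketch` and the toy matrices are hypothetical and not part of `hest`.

```python
import numpy as np

def per_gene_ratio_sketch(counts_a, counts_b):
    """Hypothetical helper: ratio of mean per-cell counts, gene by gene.

    Both inputs are cells x genes matrices with genes in the same column order.
    """
    mean_a = np.asarray(counts_a, dtype=float).mean(axis=0)
    mean_b = np.asarray(counts_b, dtype=float).mean(axis=0)
    # A gene with zero mean in the reference sample has an undefined ratio (NaN)
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(mean_b > 0, mean_a / mean_b, np.nan)

# Toy data: 2 cells x 3 genes per sample
toy_a = np.array([[4, 0, 2], [2, 2, 0]])
toy_b = np.array([[1, 1, 0], [1, 3, 0]])
toy_ratio = per_gene_ratio_sketch(toy_a, toy_b)
print(toy_ratio)
```

A ratio above 1 for a gene means the first sample detects more of that gene per cell than the second, under the stated assumption.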

### False Positive Rate

Two metrics are proposed:
- **Negative Control Codeword Rate (NCCR)**: false positive rate of the decoding algorithm, adjusted for the proportion of negative control codewords in the panel.
- **Negative Control Probe Rate (NCPR)**: false positive rate of the transcript signal, adjusted for the proportion of negative control probes in the panel.
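
One possible reading of "adjusted for the proportion" is sketched below: the fraction of detections assigned to control features, divided by the fraction of panel features that are controls. This is an illustration only and not necessarily what the `hest.sensitivity` functions do. The helper `control_rate_sketch` is hypothetical, the prefix `NegControlProbe_` follows the usual Xenium feature naming, and the panel composition is estimated from the observed feature names, which is itself an assumption (undetected features would be missed).

```python
import pandas as pd

def control_rate_sketch(feature_names, control_prefix):
    """Hypothetical helper: control detections relative to their share of the panel."""
    names = pd.Series(list(feature_names), dtype="object")
    # Fraction of all detections that were assigned to a control feature
    observed_fraction = names.str.startswith(control_prefix).mean()
    # Fraction of distinct features that are controls (assumption: every
    # panel feature appears at least once in the data)
    unique_names = pd.Series(names.unique(), dtype="object")
    panel_fraction = unique_names.str.startswith(control_prefix).mean()
    if panel_fraction == 0:
        return float("nan")
    return observed_fraction / panel_fraction

# Toy data: 98 gene detections and 2 negative control probe detections
toy_features = ['GENE_A'] * 90 + ['GENE_B'] * 8 + ['NegControlProbe_0001'] * 2
toy_rate = control_rate_sketch(toy_features, 'NegControlProbe_')
print(f'{toy_rate * 100:.2f}%')
```

The same sketch applies to codewords by passing a codeword prefix such as `NegControlCodeword_`.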

In [None]:
from hest.sensitivity import compute_negative_control_codeword_rate, compute_negative_control_probe_rate

for i, adata in enumerate(adata_list):
    nccr = compute_negative_control_codeword_rate(adata)
    ncpr = compute_negative_control_probe_rate(adata)
    print(f'False Positive Rate Metrics for {ids_to_query[i]}')
    print(f'Negative Control Codeword Rate: {nccr*100:.2f}%')
    print(f'Negative Control Probe Rate: {ncpr*100:.2f}%')
    print()

### Using raw transcript data (for Xenium only)

With the raw transcript data, one can keep only high-quality transcripts (Phred quality score >= 20) before computing the metrics.
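
For reference, a Phred score Q corresponds to an error probability of 10^(-Q/10), so a threshold of 20 keeps transcripts with an estimated error probability of about 1% or less. The short sketch below shows the conversion; the function name is hypothetical.

```python
def phred_to_error_probability(qv):
    """Convert a Phred-scaled quality score into an error probability."""
    return 10 ** (-qv / 10)

# Error probability at the threshold used in this section
toy_error_q20 = phred_to_error_probability(20)
print(f'Q20 error probability: {toy_error_q20:.4f}')
```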

In [None]:
import pandas as pd
from hest.sensitivity import compute_negative_control_codeword_rate, compute_negative_control_probe_rate

for sample_id in ids_to_query:  # avoid shadowing the built-in `id`
    parquet_file = f'../hest_data/transcripts/{sample_id}_transcripts.parquet'
    # Load only the columns needed for filtering and for the metrics
    transcripts_df = pd.read_parquet(parquet_file, columns=['feature_name', 'qv'])
    # Keep transcripts with a Phred quality score of at least 20
    transcripts_df = transcripts_df[transcripts_df['qv'] >= 20][['feature_name']]
    nccr = compute_negative_control_codeword_rate(transcripts_df)
    ncpr = compute_negative_control_probe_rate(transcripts_df)
    # Free memory before loading the next sample
    del transcripts_df
    print(f'False Positive Rate Metrics for {sample_id}')
    print(f'Negative Control Codeword Rate: {nccr*100:.2f}%')
    print(f'Negative Control Probe Rate: {ncpr*100:.2f}%')
    print()