<center> Advanced Integration and Annotation of scRNA-seq Data Using scVI: Hyperparameter Tuning, Label Transfer, and Custom Reference Creation - Part 2 </center>

# Label Transfer and Hyperparameter Tuning

Following successful integration and label transfer, the scVI model is tuned to improve its performance. The `ModelTuner` class from `scvi.autotune` is used to search over the neural network hyperparameters, with the aim of more accurate label transfer and more robust integration.

This step focuses on:

- Improving classification accuracy of transferred labels.
- Reducing batch effect noise while preserving biological signal.
- Enhancing model generalizability for downstream analyses.

Hyperparameter tuning is performed with the `fit()` method of `ModelTuner`, which trains one model per sampled configuration and records a predefined metric for each trial (the validation loss in this notebook), so that the best-scoring configuration can be selected. Once tuned, the model will be retrained on the full dataset. [GitHub reference](https://github.com/mousepixels/sanbomics_scripts/blob/main/sc2024/annotation_integration.ipynb)
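
To make the idea of metric-based selection concrete, here is a minimal, library-free sketch of a random search. It is not how `ModelTuner` is implemented: the search space, the `toy_validation_loss` function, and all helper names are hypothetical stand-ins for training a model and reading its validation loss.

```python
import math
import random

# Hypothetical search space for illustration: discrete choices plus
# (low, high) bounds for a log-uniformly sampled learning rate.
SEARCH_SPACE = {
    "n_latent": [10, 20, 30],
    "gene_likelihood": ["nb", "zinb"],
    "lr": (1e-4, 1e-2),
}


def sample_config(rng):
    """Draw one configuration from SEARCH_SPACE."""
    low, high = SEARCH_SPACE["lr"]
    return {
        "n_latent": rng.choice(SEARCH_SPACE["n_latent"]),
        "gene_likelihood": rng.choice(SEARCH_SPACE["gene_likelihood"]),
        # Uniform in log space, so every decade is equally likely.
        "lr": math.exp(rng.uniform(math.log(low), math.log(high))),
    }


def toy_validation_loss(config):
    """Made-up stand-in for a trained model's validation loss.

    Lowest at n_latent=20 and lr=1e-3; this says nothing about real scVI behavior.
    """
    return abs(config["n_latent"] - 20) / 10 + abs(math.log10(config["lr"]) + 3)


def random_search(num_samples, seed=0):
    """Score num_samples random configurations and keep the lowest loss."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(num_samples)]
    scored = [(toy_validation_loss(cfg), cfg) for cfg in trials]
    best_loss, best_cfg = min(scored, key=lambda pair: pair[0])
    return best_loss, best_cfg, scored


best_loss, best_cfg, scored = random_search(num_samples=10)
print(f"best loss {best_loss:.3f} with {best_cfg}")
```

The learning rate is sampled in log space because plausible values span several orders of magnitude; sampling it uniformly on a linear scale would rarely try the small values.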

Next, we proceed with the implementation.


In [None]:
import scanpy as sc
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scvi
import torch
#import celltypist
#from celltypist import models
from scvi.autotune import ModelTuner
from ray import tune
import ray

In [None]:
torch.set_float32_matmul_precision("high")

In [None]:
print("scvi-tools version:", scvi.__version__)

In [None]:
print("CUDA available:", torch.cuda.is_available())

# Integration

In this step, we perform batch integration. Fine-tuning the network matters here because every downstream prediction depends on it. The goal is to obtain the model's latent representation (the low-dimensional hidden layer of the network) and then build a K-Nearest Neighbors (KNN) graph and run Leiden clustering on that representation.
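
As a rough illustration of what a KNN step does with a latent matrix, the sketch below finds each cell's nearest neighbours by brute force. The latent matrix is random toy data rather than scVI output, and `knn_indices` is a hypothetical helper written for this example.

```python
import numpy as np


def knn_indices(latent, k):
    """Return the indices of the k nearest neighbours of every cell (brute force)."""
    # Squared Euclidean distance between every pair of cells.
    sq_norms = (latent ** 2).sum(axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * latent @ latent.T
    np.fill_diagonal(dist2, np.inf)  # a cell is not its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]


# Toy latent matrix: two well-separated groups of 50 "cells" in 10 dimensions.
rng = np.random.default_rng(0)
latent = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 10)),
    rng.normal(loc=8.0, scale=1.0, size=(50, 10)),
])

neighbors = knn_indices(latent, k=15)

# For each cell, check whether its neighbours come from the same group.
same_group = (neighbors < 50) == (np.arange(100)[:, None] < 50)
print("fraction of neighbours in the same group:", same_group.mean())
```

With the real objects, the usual route would be to store `model.get_latent_representation()` in `adata.obsm` (for example under the key `"X_scVI"`, a naming assumption here), then call `sc.pp.neighbors(adata, use_rep="X_scVI")` followed by `sc.tl.leiden(adata)`, which cluster on the neighbour graph rather than on raw distances.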


In [None]:
# Load the combined object and keep only the 'ST' sample, removing the reference data
pathout = "/data/kanferg/Sptial_Omics/SpatialOmicsToolkit/out_4"
andata_combined = sc.read_h5ad(os.path.join(pathout, "adata_concat_BreastCancer_harmony_scVI_scANVI_unintigrated.h5ad"))
andata_bc = andata_combined[andata_combined.obs['sample']=='ST'].copy()

In [None]:
# Register the AnnData fields with scVI, then wrap the model class in a tuner
model_cls = scvi.model.SCVI
model_cls.setup_anndata(
    andata_bc,
    categorical_covariate_keys=["batch"],
    continuous_covariate_keys=["percent_mito"],
)
tuner = ModelTuner(model_cls)

In [None]:
results = tuner.fit(andata_bc, metric="validation_loss")

In [None]:
model = model_cls(andata_bc)
print(model.module)

In [None]:
search_space = {
    "n_hidden": tune.choice([92, 128]),
    "n_latent": tune.choice([10, 20, 30, 40, 50, 60]),
    #"n_layers": tune.choice([1, 2, 3]),
    "lr": tune.loguniform(1e-4, 1e-2),
    "gene_likelihood": tune.choice(["nb", "zinb"])
}

# Specify a storage path (e.g., a local directory for Ray's outputs)
#run_config = RunConfig(storage_path="./ray_results")

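# Run the tuner over the custom search space
results = tuner.fit(
    andata_bc,
    metric="validation_loss",
    resources={"gpu": 3},  # GPU resources for the tuning run
    search_space=search_space,
    num_samples=10,  # number of configurations sampled from the search space
    max_epochs=2,  # very short trials for a quick comparison; increase for more reliable rankings
)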



In [None]:
# Find the trial with the lowest validation loss
best_vl = float("inf")  # no arbitrary cap, so any finite loss can win
best_i = 0
for i, res in enumerate(results.results):
    vl = res.metrics["validation_loss"]
    if vl < best_vl:
        best_vl = vl
        best_i = i

print(results.results[best_i])
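
The introduction notes that the model is retrained once tuning is done. A minimal sketch of preparing for that step follows, assuming the chosen trial exposes its sampled values as a flat dictionary (Ray Tune results provide a `config` attribute). The `split_config` helper and the example values are hypothetical, not an actual tuning result; the split assumes the architecture and likelihood settings are `scvi.model.SCVI` constructor arguments, while `lr` belongs to the training plan.

```python
# Assumption: these keys are constructor arguments of scvi.model.SCVI;
# anything else (here only "lr") is forwarded to the training plan.
MODEL_KEYS = {"n_hidden", "n_latent", "n_layers", "gene_likelihood"}


def split_config(config):
    """Split a flat trial configuration into model and training-plan kwargs."""
    model_kwargs = {k: v for k, v in config.items() if k in MODEL_KEYS}
    plan_kwargs = {k: v for k, v in config.items() if k not in MODEL_KEYS}
    return model_kwargs, plan_kwargs


# Hypothetical best configuration, made up for illustration only.
best_config = {"n_hidden": 128, "n_latent": 30, "lr": 1e-3, "gene_likelihood": "nb"}

model_kwargs, plan_kwargs = split_config(best_config)
print("model kwargs:", model_kwargs)
print("plan kwargs:", plan_kwargs)

# With scvi-tools installed and the data registered, retraining might look like:
# model = scvi.model.SCVI(andata_bc, **model_kwargs)
# model.train(plan_kwargs=plan_kwargs)
```

Keeping the two groups separate avoids passing a training option such as `lr` to the model constructor, which would raise an unexpected-keyword error.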