# scVAE-Annotator: 10x Genomics Integration Demo

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/or4k2l/scVAE-Annotator/blob/main/examples/colab_10x_demo.ipynb)

This notebook demonstrates how to use scVAE-Annotator with 10x Genomics data.

## Features Demonstrated
- ✅ Loading 10x MTX format (Cell Ranger output)
- ✅ Complete annotation pipeline
- ✅ Visualization generation
- ✅ Result interpretation

**Expected Runtime**: ~8-10 minutes on Colab

**Dataset**: PBMC 3k from 10x Genomics

In [None]:
# Install scVAE-Annotator
!git clone https://github.com/or4k2l/scVAE-Annotator.git
%cd scVAE-Annotator
!pip install -q -e .

import os
os.environ['SCIPY_ARRAY_API'] = '0'

print("✅ Installation complete!")

## Step 1: Download 10x Genomics PBMC 3k Dataset

In [None]:
!mkdir -p data/pbmc3k
!wget -q -O data/pbmc3k/pbmc3k.tar.gz \
    http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
!cd data/pbmc3k && tar -xzf pbmc3k.tar.gz

print("✅ Dataset downloaded and extracted")
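
Before loading, it can help to confirm that the archive extracted into the layout a 10x MTX reader expects: a count matrix, a barcode list, and a gene (or feature) list. The sketch below is not part of scVAE-Annotator; `find_10x_files` and `EXPECTED_FILES` are hypothetical names introduced here for illustration. It assumes the usual Cell Ranger file names (`genes.tsv` in older outputs such as PBMC 3k, `features.tsv` in newer ones, either optionally gzipped).

```python
from pathlib import Path

# Hypothetical helper, not part of scVAE-Annotator.
# Each role maps to the base file names Cell Ranger typically writes.
EXPECTED_FILES = {
    "matrix": ("matrix.mtx",),
    "barcodes": ("barcodes.tsv",),
    "features": ("genes.tsv", "features.tsv"),
}


def find_10x_files(directory):
    """Return {role: path} for a 10x MTX directory, or raise if a role is missing."""
    directory = Path(directory)
    found = {}
    missing = []
    for role, names in EXPECTED_FILES.items():
        # Accept both plain and gzipped variants of each file name.
        candidates = [directory / (name + suffix)
                      for name in names
                      for suffix in ("", ".gz")]
        match = next((path for path in candidates if path.is_file()), None)
        if match is None:
            missing.append(role)
        else:
            found[role] = match
    if missing:
        raise FileNotFoundError(
            f"{directory} is missing 10x files for: {', '.join(missing)}"
        )
    return found
```

In this notebook it could be called as `find_10x_files('data/pbmc3k/filtered_gene_bc_matrices/hg19/')` right after extraction, so that a failed download surfaces here rather than inside the loader.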

## Step 2: Load 10x MTX Format

In [None]:
import scanpy as sc
from scvae_annotator import create_optimized_config
from scvae_annotator.tenx_loader import load_10x_data, get_10x_metadata_summary

# Load 10x data
adata = load_10x_data('data/pbmc3k/filtered_gene_bc_matrices/hg19/')

# Display metadata
summary = get_10x_metadata_summary(adata)
print("\n10x Metadata Summary:")
for key, value in summary.items():
    print(f"  {key}: {value}")

## Step 3: Run Annotation Pipeline

In [None]:
from scvae_annotator import (
    enhanced_preprocessing,
    optimized_leiden_clustering,
    train_improved_vae,
    EnhancedAutoencoderAnnotator,
    create_visualizations
)

# Configure for Colab (faster settings)
config = create_optimized_config()
config.autoencoder_epochs = 10
config.optuna_trials = 5
config.output_dir = './results'

# Run pipeline
print("\nüî¨ Preprocessing...")
adata = enhanced_preprocessing(adata, config)

print("\nüîó Clustering...")
adata, n_clusters = optimized_leiden_clustering(adata, config)

print("\nüß† Training VAE...")
adata = train_improved_vae(adata, config)

print("\nüè∑Ô∏è  Annotating...")
# Use leiden clusters as pseudo-labels for demo
adata.obs['celltype'] = adata.obs['leiden']
adata.obs['cell_type_ground_truth'] = adata.obs['leiden']

annotator = EnhancedAutoencoderAnnotator(config)
annotator.train(adata)
annotator.predict(adata)

print("\nüìä Creating visualizations...")
create_visualizations(adata, config)

print("\n✅ Pipeline complete!")
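
Because this demo uses the Leiden clusters as pseudo-labels, any agreement between the predictions and the "ground truth" column measures how well the annotator reproduces the clustering, not how accurate the cell-type calls are. A minimal sketch of that check follows; `label_agreement` is a hypothetical helper written for this example and is not part of scVAE-Annotator.

```python
def label_agreement(predicted, reference):
    """Fraction of positions where two equally long label sequences match."""
    # Compare as strings so categorical and plain-string labels line up.
    predicted = [str(label) for label in predicted]
    reference = [str(label) for label in reference]
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not predicted:
        raise ValueError("label sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)
```

Assuming the prediction column is named `autoencoder_predictions`, as in Step 4, a call such as `label_agreement(adata.obs['autoencoder_predictions'], adata.obs['celltype'])` gives a self-consistency score for the run. With real reference annotations in `celltype`, the same call would report actual accuracy.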

## Step 4: View Results

In [None]:
from IPython.display import Image, display
from pathlib import Path

# Display statistics
pred_col = 'autoencoder_predictions'
conf_col = 'autoencoder_confidence'

print("üìä Results Summary:")
print(f"  Cells: {adata.n_obs}")
print(f"  Genes: {adata.n_vars}")
print(f"  Cell types: {adata.obs[pred_col].nunique()}")
print(f"  Mean confidence: {adata.obs[conf_col].mean():.3f}")

# Display plots
plots = ['umap_comparison.png', 'confusion_matrix.png', 'confidence_analysis.png']
for plot in plots:
    path = Path(config.output_dir) / plot
    if path.exists():
        print(f"\n{plot}:")
        display(Image(filename=str(path), width=700))
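
The mean confidence printed above can hide individual cell types that the annotator is unsure about. The sketch below breaks confidence down per predicted label using pandas. `summarize_confidence` is a hypothetical helper, the `0.7` threshold is an arbitrary illustrative value, and the column names are the ones used in the cell above.

```python
import pandas as pd


def summarize_confidence(obs,
                         pred_col="autoencoder_predictions",
                         conf_col="autoencoder_confidence",
                         threshold=0.7):
    """Per-label cell count, mean confidence, and share of low-confidence cells."""
    # Flag cells whose confidence falls below the (illustrative) threshold.
    flagged = obs.assign(_low=obs[conf_col] < threshold)
    summary = flagged.groupby(pred_col, observed=True).agg(
        n_cells=(conf_col, "count"),
        mean_confidence=(conf_col, "mean"),
        frac_low=("_low", "mean"),
    )
    # Least confident labels first, since those deserve a closer look.
    return summary.sort_values("mean_confidence")


# Toy stand-in for adata.obs so the example runs on its own.
toy_obs = pd.DataFrame({
    "autoencoder_predictions": ["0", "0", "1", "1"],
    "autoencoder_confidence": [0.9, 0.5, 0.8, 0.95],
})
toy_summary = summarize_confidence(toy_obs)
```

In the notebook, `summarize_confidence(adata.obs)` would produce the same table for the real predictions. Labels with a high `frac_low` are candidates for manual inspection on the UMAP plot.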