## Colab - Install citepro

In [None]:
!uv pip install --system "git+https://github.com/ptggenomics/citepro.git"

## Import citepro

In [8]:
import citepro


## Downloading example dataset

In [None]:
!curl -LO https://ptgngsdata.s3.us-west-2.amazonaws.com/counts/E44_1_restPBMC_DCpos_filtered_feature_bc_matrix.h5

In [None]:
h5_path = "E44_1_restPBMC_DCpos_filtered_feature_bc_matrix.h5"
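
Before running the recipe, it can help to confirm that the download produced a real HDF5 file rather than, say, an error page saved under the same name. The sketch below is not part of citepro: `looks_like_hdf5` is a hypothetical helper that only checks for the 8-byte HDF5 signature at the very start of the file, which is where it normally sits.

```python
from pathlib import Path

# 8-byte signature that opens an HDF5 file when the superblock is at offset 0.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` exists and starts with the HDF5 signature.

    Hypothetical helper for illustration; it does not validate anything
    beyond the first 8 bytes of the file.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

Calling `looks_like_hdf5(h5_path)` after the download should return `True`; a `False` result suggests the file is missing or the download did not complete as expected.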

## Basic usage - use Proteintech Genomics recipe

`create_mudata()` takes the following arguments:
  - path_count

## Explore the celltypist models

Run the following code to list all the built-in models available from [CellTypist](https://www.celltypist.org/models):

In [None]:
import celltypist as ct
ct.models.models_description()

If any of the models above fits your needs, pass its name as the `celltypist_model` argument.

### Run create_mudata()

Now let's read in the data and create a MuData object. This usually takes 5 to 10 minutes on recent-generation HPC hardware.

In [None]:
mudat = citepro.recipe.create_mudata(path_count=h5_path, samp_id="1_rest", celltypist_model='Immune_All_Low.pkl')

### Save the resulting MuData in .h5mu format

In [11]:
!mkdir -p data

In [None]:

mudat.write_h5mu('data/test.h5mu')

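The MuData object can be read back with the `mu.read_h5mu()` function, where `mu` is taken here to be the `mudata` package.


```python
import mudata as mu  # provides read_h5mu()

# Load the MuData object saved above
mudat = mu.read_h5mu('data/test.h5mu')
```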

## Protein descriptive metadata
Several useful per-antibody summary statistics are already incorporated in the data.
  * sum - Sum of all UMIs for this antibody across all cells.
  * percent - Percentage of the total UMI count taken up by this antibody. Useful for deciding whether an antibody needs further titration. A general rule of thumb is that no antibody should take up more than 10% of the UMI count space, provided the cocktail is decently sized (>20 antibodies).
  * median, 75th and 95th - Median, 75th percentile, and 95th percentile of this antibody's UMI count across all cells.
  

In [None]:
mudat['prot'].var[['gene_ids', 'sum', 'percent','median', '75th', '95th']]
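
The 10% rule of thumb above can be checked programmatically. The following is a minimal sketch, not part of citepro: `flag_dominant_antibodies` is a hypothetical helper that recomputes each antibody's share from its total UMI count, and the toy numbers are invented for illustration (a real cocktail would have many more antibodies).

```python
def flag_dominant_antibodies(umi_sums, threshold_percent=10.0):
    """Return {antibody: percent} for antibodies above the threshold.

    umi_sums: mapping of antibody name -> total UMI count over all cells.
    threshold_percent: cutoff on a 0-100 scale (10 follows the rule of thumb).
    """
    total = sum(umi_sums.values())
    if total == 0:
        return {}
    percents = {name: 100.0 * count / total for name, count in umi_sums.items()}
    return {name: pct for name, pct in percents.items() if pct > threshold_percent}


# Toy counts for illustration only; not taken from the example dataset.
toy_sums = {"CD3": 1200, "CD4": 800, "CD8A": 7000, "CD19": 1000}
print(flag_dominant_antibodies(toy_sums))
# {'CD3': 12.0, 'CD8A': 70.0}
```

With the real data, something like `mudat['prot'].var['sum'].to_dict()` could be passed in, assuming the `sum` column holds raw UMI totals as described above.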

## Convert MuData to AnnData
Let's convert to AnnData for ease of downstream analysis.

In [14]:
adata = citepro.recipe.mu_to_ann(mudat)

The converted AnnData object has the following structure:

In [None]:
adata

## Basic plotting using scanpy

In [16]:
import scanpy as sc

Feature plot on the RNA-derived UMAP, colored by CD8A protein level:

In [None]:
sc.pl.embedding(adata, basis='X_umap_rna', color=['prot:CD8A.65146.1'])

Feature plot on the RNA-derived UMAP, colored by predicted cell type:

In [None]:
sc.pl.embedding(adata, basis='X_umap_rna', color=['celltype_ct_majvote'])

Show a protein and the expression level of its encoding transcript side by side:

In [None]:
sc.pl.embedding(adata, basis='X_umap_rna', color=['prot:CD8A.65146.1', 'rna:CD8A'])
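
To make the same comparison for other markers, protein and RNA feature names can be matched up by gene symbol. The sketch below is a hypothetical helper, not a citepro function. It assumes the naming pattern seen in this notebook (`prot:<SYMBOL>.<suffix>` and `rna:<SYMBOL>`), and it skips antibodies whose label is not identical to a gene symbol.

```python
def pair_protein_rna(feature_names):
    """Pair 'prot:<SYMBOL>.<suffix>' features with 'rna:<SYMBOL>' features.

    Assumes the naming pattern seen in this notebook, e.g.
    'prot:CD8A.65146.1' and 'rna:CD8A'. Proteins without an exactly
    matching RNA feature are skipped.
    """
    rna = {name for name in feature_names if name.startswith("rna:")}
    pairs = []
    for name in feature_names:
        if not name.startswith("prot:"):
            continue
        # Keep the text between 'prot:' and the first '.' as the symbol.
        symbol = name[len("prot:"):].split(".", 1)[0]
        rna_name = "rna:" + symbol
        if rna_name in rna:
            pairs.append((name, rna_name))
    return pairs


# Apart from the CD8A pair, these names are made up for illustration.
names = ["prot:CD8A.65146.1", "rna:CD8A", "prot:CD4.12345.1", "rna:MS4A1"]
print(pair_protein_rna(names))
# [('prot:CD8A.65146.1', 'rna:CD8A')]
```

Each returned pair can then be passed to `sc.pl.embedding` as `color=list(pair)`, provided the feature names in `adata` follow the same prefixes.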

Scanpy also supports 3D UMAP plots:

In [None]:
sc.pl.embedding(adata, basis='X_umap_rna_3d', color=['prot:CD8A.65146.1', 'rna:CD8A'], projection='3d')

## Save the processed AnnData object

In [None]:
adata.write_h5ad('data/test.h5ad')