## Colab - Install CITE-pro

In [None]:
!uv pip install --system "git+https://github.com/ptggenomics/citepro"

Alternatively, CITE-pro can be installed from our S3 bucket:
```
!uv pip install --system https://ptgngsdata.s3.us-west-2.amazonaws.com/citepro_pre-release2.zip
```

## Import CITE-pro

In [None]:
import citepro

## Downloading example dataset

For demonstration purposes, you’ll be downloading single-cell multiomic data generated by Proteintech Genomics directly into the Google Colab virtual machine. The data is from human resting PBMCs stained with the MultiPro® Human Discovery Panel (HDP) followed by processing using 10x Genomics Flex chemistry with Feature Barcoding Technology.   

You can also use your own count matrix generated from HDP-stained cells. To do this, mount the Google Drive that contains your pre-processed count matrix.
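
The following is a minimal sketch of mounting Google Drive and resolving a path to your own count matrix. It assumes a hypothetical file location (`citepro/my_sample_filtered_feature_bc_matrix.h5` inside "My Drive") and the helper name `resolve_count_path` is illustrative, not part of CITE-pro. `google.colab.drive.mount` exists only inside Colab, so outside Colab the helper returns the relative path unchanged.

```python
import os

def resolve_count_path(relative_path, mount_point="/content/drive"):
    """Return a path to a count matrix stored on Google Drive when running in Colab.

    Outside Colab (google.colab is not importable), the relative path is
    returned unchanged so the same cell also works locally or on an HPC node.
    """
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        return relative_path
    drive.mount(mount_point)  # prompts for Google account authorization
    return os.path.join(mount_point, "MyDrive", relative_path)

# Hypothetical location of your own HDP count matrix inside "My Drive"
path_count = resolve_count_path("citepro/my_sample_filtered_feature_bc_matrix.h5")
print(path_count)
```

The resulting path can then be used as `path_count` when setting up `create_mudata`.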


In [None]:
!curl -LO https://ptgngsdata.s3.us-west-2.amazonaws.com/counts/E44_1_restPBMC_DCpos_filtered_feature_bc_matrix.h5

An additional sample can also be downloaded:
  *  LPS 4 hrs, TI - https://ptgngsdata.s3.us-west-2.amazonaws.com/counts/E44_3_LPS_4h_filtered_feature_bc_matrix.h5


## Parameters for create_mudata

`create_mudata` accepts the following parameters:
  * path_count - file path to the count matrix
  * allow_file - file listing the barcodes that passed filtering (a.k.a. allow list)
  * sample_id - a sample ID string (optional)
  * prot_norm - protein normalization method (accepted values: asinh, clr, none)
  * celltypist_model - name of a predefined CellTypist model used to predict cell types
  * add_3d_umap - whether to also compute a 3D UMAP embedding
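
To illustrate what the two `prot_norm` options do, here is a minimal sketch of the common formulations applied to one cell's antibody counts. This is not necessarily CITE-pro's exact implementation: the CLR variant shown (divide by the geometric mean of count + 1, then `log1p`, as in Seurat/muon) and the asinh cofactor of 5 (a typical cytometry choice) are assumptions.

```python
import math

def clr_normalize(counts):
    """CLR-style normalization of one cell's antibody UMI counts (assumed Seurat/muon variant).

    Each count is divided by the geometric mean of (count + 1) across antibodies,
    then log1p is applied.
    """
    geo_mean = math.exp(sum(math.log1p(c) for c in counts) / len(counts))
    return [math.log1p(c / geo_mean) for c in counts]

def asinh_normalize(counts, cofactor=5.0):
    """Inverse hyperbolic sine transform; the cofactor of 5 is an assumed default."""
    return [math.asinh(c / cofactor) for c in counts]

cell = [0, 3, 40, 250]  # toy UMI counts for four antibodies in one cell
print([round(v, 3) for v in clr_normalize(cell)])
print([round(v, 3) for v in asinh_normalize(cell)])
```

Both transforms compress the long right tail of antibody counts while keeping zero at zero. CLR additionally rescales each cell relative to its own overall antibody signal.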

### Explore the CellTypist models

Run the following cell to list all built-in models available from [CellTypist](https://www.celltypist.org/models):

In [None]:
import celltypist as ct
ct.models.models_description()


If one of the models above fits your data, select it as the `celltypist_model` parameter; otherwise, choose none to skip cell-type prediction.
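
With a long model table, a keyword search can help narrow the choice. This is a minimal sketch: `pick_celltypist_model` is a hypothetical helper, and the descriptions below are placeholders. In practice you would build the `(model, description)` pairs from the table returned by `ct.models.models_description()`, which is assumed here to have `model` and `description` columns.

```python
def pick_celltypist_model(models, keyword):
    """Return the first model whose name or description contains keyword (case-insensitive).

    `models` is a list of (model_name, description) pairs. None mirrors the
    "none" choice that skips cell-type prediction.
    """
    keyword = keyword.lower()
    for name, description in models:
        if keyword in name.lower() or keyword in description.lower():
            return name
    return None

# Placeholder pairs; with CellTypist installed you could instead use, e.g.:
#   desc = ct.models.models_description()
#   models = list(zip(desc["model"], desc["description"]))
models = [
    ("Immune_All_Low.pkl", "placeholder description for low-level immune labels"),
    ("Immune_All_High.pkl", "placeholder description for high-level immune labels"),
]
print(pick_celltypist_model(models, "immune_all_low"))
print(pick_celltypist_model(models, "retina"))  # no match, so None
```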

In [None]:
from google.colab import output
output.enable_custom_widget_manager()

In [None]:

create_mu_params  = citepro.nbgui.generate_create_mu_setting()

In [None]:
from google.colab import output
output.disable_custom_widget_manager()
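
If the widget does not render, or you want a scripted, reproducible run, the settings can also be passed as a plain dictionary whose keys mirror the `create_mudata` parameters listed above. This is a hedged sketch: the helper `build_create_mu_params` is hypothetical, and it is an assumption that `create_mudata` accepts `None` for the optional `allow_file`, `sample_id` and `celltypist_model` values and a model name such as `Immune_All_Low.pkl`. Printing the widget's own `create_mu_params` is a good way to check the exact value formats it produces.

```python
ALLOWED_PROT_NORM = {"asinh", "clr", "none"}

def build_create_mu_params(path_count, allow_file=None, sample_id=None,
                           prot_norm="clr", celltypist_model=None, add_3d_umap=True):
    """Hypothetical helper that assembles keyword arguments for create_mudata."""
    if prot_norm not in ALLOWED_PROT_NORM:
        raise ValueError(
            f"prot_norm must be one of {sorted(ALLOWED_PROT_NORM)}, got {prot_norm!r}")
    return {
        "path_count": path_count,
        "allow_file": allow_file,
        "sample_id": sample_id,
        "prot_norm": prot_norm,
        "celltypist_model": celltypist_model,
        "add_3d_umap": add_3d_umap,
    }

create_mu_params = build_create_mu_params(
    "E44_1_restPBMC_DCpos_filtered_feature_bc_matrix.h5",
    sample_id="restPBMC",                  # illustrative sample ID
    prot_norm="clr",
    celltypist_model="Immune_All_Low.pkl",  # assumed name format
)
print(create_mu_params)
# mudat = citepro.recipe.create_mudata(**create_mu_params)
```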

### Run create_mudata

Now let's read in the data and create a MuData object. This usually takes more than 10 minutes on Colab, but is faster (5 to 10 minutes) on a recent-generation HPC system.

In [None]:
mudat = citepro.recipe.create_mudata(**create_mu_params)

### Save the resulting MuData in .h5mu format

In [11]:
!mkdir -p data

In [None]:

mudat.write_h5mu('data/demo.h5mu')

The MuData object can be read back with the `mu.read_h5mu()` function:


```python
import mudata as mu

mudat = mu.read_h5mu('data/demo.h5mu')
```

## Protein descriptive metadata 
Several useful descriptive statistics for each antibody are already included in the data:
  * sum - sum of all UMIs for this antibody across all cells.
  * percent - percentage of the total protein UMI count taken up by this antibody. This is useful for deciding whether an antibody needs further titration. As a general rule of thumb, no antibody should take up more than 10% of the UMI count space, provided the cocktail is reasonably large (>20 antibodies).
  * median, 75th and 95th - the median, 75th-percentile and 95th-percentile UMI counts across all cells.
  

In [None]:
mudat['prot'].var[['gene_ids', 'sum', 'percent','median', '75th', '95th']].sort_values(by='percent', ascending=False)
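
To make the columns in this table concrete, here is a minimal sketch that computes the same kinds of statistics from a toy antibody-by-cell count table and flags antibodies above the 10% rule of thumb. The percentile rule used is an assumption (linear interpolation, matching NumPy's default), and CITE-pro's exact computation may differ. With only three toy antibodies, the 10% threshold is illustrative only.

```python
import statistics

def percentile(values, q):
    """Linear-interpolation percentile (same rule as NumPy's default), q in 0..100."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def antibody_summary(counts_by_antibody, max_percent=10.0):
    """Per-antibody sum, percent of total UMIs, median/75th/95th, and a titration flag."""
    total = sum(sum(umis) for umis in counts_by_antibody.values())
    summary = {}
    for antibody, umis in counts_by_antibody.items():
        s = sum(umis)
        pct = 100 * s / total
        summary[antibody] = {
            "sum": s,
            "percent": pct,
            "median": statistics.median(umis),
            "75th": percentile(umis, 75),
            "95th": percentile(umis, 95),
            "titrate?": pct > max_percent,  # rule of thumb for cocktails of >20 antibodies
        }
    return summary

# Toy UMI counts for three antibodies across four cells
toy = {"CD3": [10, 20, 30, 40], "CD19": [0, 1, 2, 3], "CD14": [100, 0, 5, 0]}
for name, stats in antibody_summary(toy).items():
    print(name, stats)
```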

## Convert MuData to AnnData
Let's convert the MuData object to AnnData for easier downstream analysis.

In [14]:
adata = citepro.recipe.mu_to_ann(mudat)

The converted AnnData object has the following structure:

In [None]:
adata
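
The feature names used in the 3D UMAP plot later in this notebook (`prot:CD8A.65146.1`, `rna:CD8A`) suggest that the converted AnnData marks each feature's modality with a `prot:` or `rna:` prefix. Assuming that convention holds for all features, a minimal sketch for grouping feature names by modality is shown below. `split_by_modality` is a hypothetical helper, not a CITE-pro function.

```python
def split_by_modality(var_names):
    """Group feature names by their 'modality:' prefix (e.g. 'prot', 'rna').

    Names without a colon are collected under 'unprefixed'.
    """
    groups = {}
    for name in var_names:
        modality, sep, _ = name.partition(":")
        key = modality if sep else "unprefixed"
        groups.setdefault(key, []).append(name)
    return groups

groups = split_by_modality(["prot:CD8A.65146.1", "rna:CD8A", "rna:CD4"])
print(groups)
```

With the real object, `groups = split_by_modality(adata.var_names)` followed by `adata[:, groups["prot"]]` would give a protein-only view, provided the prefix assumption holds.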

## Basic plotting using scanpy

Feature plot on the RNA-derived UMAP, colored by predicted cell type:

In [None]:
import scanpy as sc
sc.pl.embedding(adata, basis='X_umap_rna', color=['celltype_ct_majvote'])

Display each protein's expression level side by side with that of its encoding transcript.

Note that the displayed dynamic range for proteins is set as follows:
  * Minimum - the 0.5th percentile of the protein level; levels below 2 are not displayed.
  * Maximum - the 99.5th percentile of the protein level, but the top of the displayed range never goes below 6.
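
One reading of these two rules, written as a minimal sketch: the lower bound is the 0.5th percentile but at least 2, and the upper bound is the 99.5th percentile but at least 6. The linear-interpolation percentile and the helper name `protein_color_range` are assumptions; this is not CITE-pro's internal code.

```python
def percentile(values, q):
    """Linear-interpolation percentile (same rule as NumPy's default), q in 0..100."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def protein_color_range(levels, low_q=0.5, high_q=99.5, floor=2.0, min_top=6.0):
    """Return (vmin, vmax) following the display rules described above."""
    vmin = max(percentile(levels, low_q), floor)     # nothing below 2 is shown
    vmax = max(percentile(levels, high_q), min_top)  # top of range is at least 6
    return vmin, vmax

print(protein_color_range([0, 1, 2]))         # low-expressed protein: bounds clamped
print(protein_color_range([0, 1, 1, 3, 10]))  # upper bound set by the 99.5th percentile
```

To apply the same limits to a single scanpy plot, you could pass `levels = adata.obs_vector('prot:CD8A.65146.1')` to this function and forward the result as the `vmin`/`vmax` arguments of `sc.pl.embedding`.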

In [None]:
citepro.nbgui.display_multiomics_feature_plot(adata)

Figures can be downloaded by right-clicking them and choosing "Save as...".

Scanpy can also plot the 3D UMAP embedding:

In [None]:
sc.pl.embedding(adata, basis='X_umap_rna_3d', color=['prot:CD8A.65146.1', 'rna:CD8A'], projection='3d')

## Save the processed AnnData object

In [None]:
adata.write_h5ad('data/demo.h5ad')

Don't forget to download the saved .h5ad file from the Files tab in the Colab sidebar, because all data on the Colab virtual machine are lost when the runtime disconnects.
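
As an alternative to the Files tab, the download can be triggered from a cell. The following minimal sketch uses `google.colab.files.download`, which exists only inside Colab. The helper name `download_from_colab` is hypothetical, and outside Colab it simply reports that no download was requested.

```python
import os

def download_from_colab(path):
    """Trigger a browser download of `path` when running in Colab.

    Returns True if a download was requested, False when not running in Colab.
    Raises FileNotFoundError if the file does not exist.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return False
    files.download(path)
    return True

# Example (inside Colab): download_from_colab('data/demo.h5ad')
```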