## Colab - Install citepro

In [None]:
!uv pip install --system "git+https://github.com/ptggenomics/citepro.git" ipywidgets ipyfilechooser

## Import citepro

In [1]:
import citepro

👉 Detailed model information can be found at `https://www.celltypist.org/models`


## Downloading example dataset

In this tutorial, we download a demo dataset directly to the Colab virtual machine. Alternatively, you can upload the count matrix to Google Drive and mount the drive to access the data.

In [None]:
!curl -LO https://ptgngsdata.s3.us-west-2.amazonaws.com/counts/E44_1_restPBMC_DCpos_filtered_feature_bc_matrix.h5
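
The Google Drive alternative mentioned above can be sketched as follows. `google.colab.drive.mount` is only available inside Colab, so the hypothetical helper `mount_drive_if_colab` returns `None` everywhere else. The folder and file names under Drive are placeholders for illustration, not paths from this tutorial.

```python
from pathlib import Path


def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; return None elsewhere.

    Hypothetical helper, not part of citepro.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return None
    drive.mount(mount_point)
    # In Colab, personal files appear under "MyDrive" inside the mount point.
    return Path(mount_point) / "MyDrive"


drive_root = mount_drive_if_colab()
if drive_root is None:
    print("Not running in Colab; skipping the Drive mount.")
else:
    # Placeholder folder and file name: adjust to wherever the matrix was uploaded.
    count_path = drive_root / "citepro_demo" / "filtered_feature_bc_matrix.h5"
    print("Count matrix expected at:", count_path)
```

The resulting path can then be selected as the count matrix in the same way as the downloaded file.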

## Parameters for create_mudata

These are parameters for create_mudata:
  * path_count - file path to the count matrix
  * allow_file - file containing the barcodes that passed filtering \(a.k.a. allow list\)
  * sample_id - a sample ID string \(optional\)
  * prot_norm - method of protein normalization \(acceptable values: asinh, clr, none\)
  * celltypist_model - predefined model to predict cell types
  * add_3d_umap - whether to generate a 3D UMAP
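
For non-interactive use, the parameters above could be collected in a plain dictionary and unpacked into `create_mudata`, in the same way this notebook later unpacks `create_mu_params`. The sketch below is an illustration only: `build_create_mudata_params` is a hypothetical helper, the keyword names are taken from the list above and should be checked against the installed citepro version, and passing `None` for the optional entries is an assumption.

```python
# Accepted values for prot_norm, as listed above.
ALLOWED_PROT_NORM = {"asinh", "clr", "none"}


def build_create_mudata_params(path_count, prot_norm, allow_file=None,
                               sample_id=None, celltypist_model=None,
                               add_3d_umap=True):
    """Collect and sanity-check keyword arguments for create_mudata.

    Hypothetical helper, not part of citepro.
    """
    if prot_norm not in ALLOWED_PROT_NORM:
        raise ValueError(
            f"prot_norm must be one of {sorted(ALLOWED_PROT_NORM)}, got {prot_norm!r}"
        )
    return {
        "path_count": path_count,
        "allow_file": allow_file,
        "sample_id": sample_id,
        "prot_norm": prot_norm,
        "celltypist_model": celltypist_model,
        "add_3d_umap": add_3d_umap,
    }


params = build_create_mudata_params(
    "E44_1_restPBMC_DCpos_filtered_feature_bc_matrix.h5",
    prot_norm="asinh",
    sample_id="E44_1",  # illustrative sample ID
    celltypist_model="Immune_All_Low.pkl",
)
print(params)

# With citepro installed, the dictionary could then be unpacked:
# mudat = citepro.recipe.create_mudata(**params)
```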

#### Explore the celltypist models

Run the following code to list all the built-in models available from [Celltypist](https://www.celltypist.org/models).

In [6]:
import celltypist as ct
ct.models.models_description()




| | model | description |
|---|---|---|
| 0 | Immune_All_Low.pkl | immune sub-populations combined from 20 tissue... |
| 1 | Immune_All_High.pkl | immune populations combined from 20 tissues of... |
| 2 | Adult_COVID19_PBMC.pkl | peripheral blood mononuclear cell types from C... |
| 3 | Adult_CynomolgusMacaque_Hippocampus.pkl | cell types from the hippocampus of adult cynom... |
| 4 | Adult_Human_MTG.pkl | cell types and subtypes (10x-based) from the a... |
| 5 | Adult_Human_PancreaticIslet.pkl | cell types from pancreatic islets of healthy a... |
| 6 | Adult_Human_PrefrontalCortex.pkl | cell types and subtypes from the adult human d... |
| 7 | Adult_Human_Skin.pkl | cell types from human healthy adult skin |
| 8 | Adult_Human_Vascular.pkl | vascular populations combined from multiple ad... |
| 9 | Adult_Mouse_Gut.pkl | cell types in the adult mouse gut combined fro... |


If any of the models above fits your needs, use it as the `celltypist_model` parameter; otherwise choose none to skip cell type prediction.

In [3]:

create_mu_params  = citepro.nbgui.generate_create_mu_setting()

VBox(children=(HBox(children=(Label(value='Count_matrix'), FileChooser(path='/home/jasonhuang/gitplaza/citepro…

Check parameters...
Setting:
    count_matrix: /home/jasonhuang/gitplaza/citepro_notebook/uv.lock 
    barcode_block_list: None
    sample_id: None
    celltypist_model: Immune_All_Low.pkl
    add_3d_umap: True
Good to go!


### Execute create_mudata

Now let's read in the data and create a MuData object. This usually takes more than 10 minutes on Colab, but is faster \(5-10 minutes\) on a recent-generation HPC system.

In [None]:
mudat = citepro.recipe.create_mudata(**create_mu_params)

### Save the resulting MuData in .h5mu format

In [11]:
!mkdir -p data

In [None]:

mudat.write_h5mu('data/test.h5mu')

The MuData object can be read back with the `mu.read_h5mu()` function.


```python
import mudata as mu  # provides read_h5mu
mudat = mu.read_h5mu('data/test.h5mu')
```

## Protein descriptive metadata
A few useful descriptive statistics for each antibody are already incorporated in the data.
  * sum - sum of all UMIs of this antibody across all cells.
  * percent - percentage of the total UMI count taken up by this antibody. Useful for deciding whether an antibody needs further titration. As a general rule of thumb, no single antibody should account for more than 10% of the total UMI count, provided the cocktail is reasonably large \(>20 antibodies\).
  * median, 75th and 95th - median, 75th-percentile and 95th-percentile UMI counts of this antibody across all cells.
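
To make the arithmetic behind these columns concrete, here is a minimal NumPy sketch on a toy count matrix. It illustrates the definitions above and is not citepro's own implementation. The counts are made up, the antibody names are only labels, and expressing `percent` on a 0-100 scale is an assumption.

```python
import numpy as np

# Toy antibody UMI counts: 4 cells (rows) x 3 antibodies (columns).
counts = np.array([
    [10, 0, 90],
    [20, 5, 75],
    [30, 5, 65],
    [40, 10, 50],
])
antibodies = ["CD3", "CD4", "CD8A"]  # illustrative labels only

# sum: total UMIs per antibody across all cells.
total = counts.sum(axis=0)

# percent: share of the overall UMI count taken by each antibody (0-100 scale assumed).
percent = 100 * total / total.sum()

# median, 75th and 95th percentiles of the per-cell counts for each antibody.
median, p75, p95 = np.percentile(counts, [50, 75, 95], axis=0)

# Antibodies above the 10% rule of thumb. With only 3 antibodies the rule does
# not really apply (it targets cocktails of more than 20); this just shows the check.
flagged = [name for name, pct in zip(antibodies, percent) if pct > 10]

for name, s, pct, med in zip(antibodies, total, percent, median):
    print(f"{name}: sum={s}, percent={pct:.1f}, median={med:.1f}")
print("Above 10%:", flagged)
```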
  

In [None]:
mudat['prot'].var[['gene_ids', 'sum', 'percent','median', '75th', '95th']].sort_values(by='percent', ascending=False)

## Convert MuData to AnnData
Let's convert the MuData object to AnnData for ease of downstream analysis.

In [14]:
adata = citepro.recipe.mu_to_ann(mudat)

In [1]:
import celltypist as ct
modname = [mn for mn in ct.models.models_description()['model']]
modname.insert(0, "--- None ---")



In [2]:
mdata = None

In [8]:
from ipyfilechooser import FileChooser
import ipywidgets as widgets
from IPython.display import display


# Create and display a FileChooser widget
fc_count = FileChooser()
fc_count_reset = widgets.Button(description='Reset selection', layout=widgets.Layout(width='150px'))
fc_count_reset.on_click(lambda x: fc_count.reset())

#fc_count = 'Count_matrix'

ta_samp_id = widgets.Text(
    description='Sample_ID'
)
# Create another FileChooser widget for barcode block list
fc_barcode = FileChooser()
fc_barcode_reset = widgets.Button(description='Reset selection', layout=widgets.Layout(width='150px'))
fc_barcode_reset.on_click(lambda x: fc_barcode.reset())
#fc_barcode.title = 'Barcode_block_list'

celltypist_model_dropdown = widgets.Dropdown(
    options=modname,
    value='Immune_All_Low.pkl'
)

cb_3d_umap = widgets.Checkbox(value=True, description='Generate 3D umap')

def on_run(b):
    print('Check parameters...')
    if not fc_count.selected:
        print('Please select a count matrix.')
        return
    ct_model = celltypist_model_dropdown.value if celltypist_model_dropdown.value != "--- None ---" else None
    sampid = ta_samp_id.value if ta_samp_id.value else None
    global mudat
    print("Setting:\n    count_matrix: {} \n    barcode_block_list: {}\n    sample_id: {}\n    celltypist_model: {}\n    add_3d_umap: {}".format(
        fc_count.selected, fc_barcode.selected, sampid, ct_model, cb_3d_umap.value))

    #mudat = citepro.recipe.create_mudata(path_count=fc_count.selected, samp_id=ta_samp_id.value, celltypist_model=ct_model, add_3d_umap = cb_3d_umap.value)


run_button = widgets.Button(description='Run', layout=widgets.Layout(width='150px'))
run_button.on_click(on_run)

lay1 = widgets.Layout(border = '1px solid #777777', padding = '2px', margin = '5px')
hboxes = []
hboxes.append(widgets.HBox([widgets.Label(value = 'Count_matrix'), fc_count, fc_count_reset], layout = lay1))
hboxes.append(widgets.HBox([widgets.Label(value = 'Barcode_block_list'), fc_barcode, fc_barcode_reset], layout = lay1))
hboxes.append(widgets.HBox([ta_samp_id], layout = lay1))
hboxes.append(widgets.HBox([widgets.Label(value = 'Celltypist'), celltypist_model_dropdown], layout = lay1))
hboxes.append(widgets.HBox([cb_3d_umap], layout = lay1))
hboxes.append(widgets.HBox([run_button], layout = lay1))
#hboxes.append(widgets.HBox(output, layout = lay1))

#hbox2 = widgets.HBox([ta_samp_id, celltypist_model_dropdown])                     
# Wrap the file choosers with VBox
vbox = widgets.VBox(hboxes)
display(vbox)


VBox(children=(HBox(children=(Label(value='Count_matrix'), FileChooser(path='/home/jasonhuang/gitplaza/citepro…

Check parameters...
Setting:
    count_matrix: /home/jasonhuang/gitplaza/citepro_notebook/Basic_usage_GUI.ipynb 
    barcode_block_list: None
    sample_id: None
    celltypist_model: Immune_All_Low.pkl
    add_3d_umap: True
Check parameters...
Setting:
    count_matrix: /home/jasonhuang/gitplaza/citepro_notebook/Basic_usage_GUI.ipynb 
    barcode_block_list: None
    sample_id: 3
    celltypist_model: Immune_All_Low.pkl
    add_3d_umap: True



The converted AnnData object has the following structure:

In [None]:
adata

## Basic plotting using scanpy

In [16]:
import scanpy as sc

Feature plot on the RNA-derived UMAP, colored by CD8A protein level

In [None]:
sc.pl.embedding(adata, basis= 'X_umap_rna', color = ['prot:CD8A.65146.1'])

Feature plot on the RNA-derived UMAP, colored by predicted cell type

In [None]:
sc.pl.embedding(adata, basis= 'X_umap_rna', color = ['celltype_ct_majvote'])

Plot the protein level and the expression level of its encoding transcript side by side

In [None]:
sc.pl.embedding(adata, basis= 'X_umap_rna', color = ['prot:CD8A.65146.1', 'rna:CD8A'])
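
Protein feature names in this tutorial carry a suffix after the gene symbol (for example `prot:CD8A.65146.1`), so finding the right pair of names for a side-by-side plot can take some searching. The sketch below shows one way to look them up by gene symbol. `find_features` is a hypothetical helper, the naming pattern is inferred from the two names used above, and the toy list stands in for `adata.var_names` on the assumption that the feature names are stored there.

```python
def find_features(names, gene, modality=None):
    """Return names of the form '<modality>:<gene>' or '<modality>:<gene>.<suffix>'.

    Hypothetical helper, not part of citepro or scanpy.
    """
    hits = []
    for name in names:
        prefix, _, feature = name.partition(":")
        if modality is not None and prefix != modality:
            continue
        # Exact gene symbol, or gene symbol followed by a dotted suffix.
        if feature == gene or feature.startswith(gene + "."):
            hits.append(name)
    return hits


# Toy stand-in for adata.var_names; the CD8B and CD4 entries are made up.
toy_names = ["rna:CD8A", "rna:CD8B", "prot:CD8A.65146.1", "prot:CD4.12345.1"]

print(find_features(toy_names, "CD8A"))
print(find_features(toy_names, "CD8A", modality="prot"))
```

The returned list could be passed as the `color` argument of `sc.pl.embedding`.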

Scanpy also supports 3D UMAP plots!

In [None]:
sc.pl.embedding(adata, basis= 'X_umap_rna_3d', color = ['prot:CD8A.65146.1', 'rna:CD8A'], projection='3d')

## Save the processed AnnData object

In [None]:
adata.write_h5ad('data/test.h5ad')