# Introduction

This is a short Python tutorial for using the panhumanpy package for hierarchical cell type annotation. To get started, we recommend that you install conda or an equivalent, and create a fresh conda environment in which to install panhumanpy, as follows:

In [None]:
# From the command line, to create and activate conda env:
# conda create -n panhumanpy_env python=3.9
# conda activate panhumanpy_env

# To install panhumanpy,
# For CPU based use:
# pip install git+https://github.com/satijalab/panhumanpy.git
# For GPU based use:
# pip install git+https://github.com/satijalab/panhumanpy.git#egg=panhumanpy[gpu]

panhumanpy offers two interfaces for building a cell type annotation pipeline and for obtaining a low-dimensional representation of the dataset for visualisation. The high-level interface provides quick and easy access to inference using the Azimuth Neural Network, while the low-level interface is more modular and allows for finer control and customisation of the workflow. <br>
<br>
To begin, import the panhumanpy package (and anndata, optional).

In [None]:
import panhumanpy as ph
import anndata as ad       #optional

# High-level interface

The high-level interface of the panhumanpy package is the AzimuthNN class. We can pass either the path to an h5ad file, or an anndata object read from that file, to AzimuthNN for annotation.

In [None]:
# file path to query h5ad
query_path = "path/to/h5ad"

# optional
query = ad.read_h5ad(query_path)

In [None]:
# For documentation
help(ph.AzimuthNN)

#### Simplified workflow

In [None]:
#azimuth = ph.AzimuthNN(query_path)     # if passing the filepath
azimuth = ph.AzimuthNN(query)
embeddings = azimuth.azimuth_embed()
umap = azimuth.azimuth_umap()
cell_metadata = azimuth.cells_meta

In the above, cell_metadata is a pandas dataframe with the annotations from the Azimuth Neural Network. The important columns are as follows:<br>
-'full_hierarchical_labels': The complete cell type label with all hierarchical levels.<br>
-'level_zero_labels': Cell type labels at the lowest resolution.<br>
-'final_level_labels': Cell type labels at the highest resolution. Note that this label comes from different levels for different cells, as the maximum hierarchical depth is not uniform across all cells.<br>
-'final_level_softmax_prob': The confidence values for the predicted final level labels. <br>
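
As a minimal sketch of how these columns might be used, the example below keeps only the cells whose final-level prediction clears a confidence threshold and counts the remaining labels. The dataframe here is a toy stand-in for azimuth.cells_meta: the label values and the 0.6 threshold are invented for illustration and are not taken from panhumanpy.

```python
import pandas as pd

# Toy stand-in for azimuth.cells_meta; label values are made up for illustration.
cell_metadata = pd.DataFrame(
    {
        "level_zero_labels": ["Immune", "Immune", "Epithelial", "Immune"],
        "final_level_labels": ["T cell", "B cell", "Basal cell", "T cell"],
        "final_level_softmax_prob": [0.97, 0.42, 0.88, 0.65],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)

# Arbitrary threshold chosen for this example; pick one that suits your data.
min_confidence = 0.6

# Keep cells whose final-level prediction is at least as confident as the threshold.
confident = cell_metadata[cell_metadata["final_level_softmax_prob"] >= min_confidence]

# Count how many confident cells carry each final-level label.
label_counts = confident["final_level_labels"].value_counts()
print(label_counts)
```

With a real query, the same two lines of filtering and counting can be applied directly to the dataframe returned by azimuth.cells_meta.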

#### Digging a little deeper

Note that the above assumes that the index of the gene metadata in the anndata object, i.e. query.var, consists of gene names. However, that is not always the case. If it is not, inspect query.var, find the column in which the gene names are stored, and specify this column name when you instantiate the AzimuthNN object. Quite often, gene names are stored in a column called "gene_symbol" or in a column called "feature_name".
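
One way to make this inspection repeatable is a small helper that checks the column names of query.var against a list of likely candidates. The function below is a hypothetical convenience, not part of panhumanpy; it works on a plain list of column names, so it can be tried without loading any data.

```python
def find_gene_name_column(var_columns, candidates=("gene_symbol", "feature_name")):
    """Return the first candidate column name present in var_columns, or None.

    Hypothetical helper for illustration; the default candidates are the two
    column names mentioned in this tutorial.
    """
    available = set(var_columns)
    for name in candidates:
        if name in available:
            return name
    return None


# With a real object you would pass list(query.var.columns) instead.
example_columns = ["feature_id", "feature_name", "highly_variable"]
print(find_gene_name_column(example_columns))  # feature_name
```

If the helper returns None, the gene names may already be in the index of query.var, or stored under a name that is not in the candidate list, in which case a manual look at query.var is still needed.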

For example, if the gene names are stored in a column called "gene_symbol", you would instantiate the AzimuthNN object as follows:

In [None]:
azimuth = ph.AzimuthNN(query, feature_names_col = "gene_symbol")

#### Label refinement

A built-in post-processing step on the hierarchical annotations from the Azimuth Neural Network yields annotations with consistent granularity across tissues at three levels for each cell. These refined annotations, 'azimuth_broad', 'azimuth_medium' and 'azimuth_fine', are added to the metadata dataframe and lend themselves to easy interpretation. <br>
<br>
This step is performed as follows:

In [None]:
azimuth.azimuth_refine()
cell_metadata = azimuth.cells_meta

#### Saving the annotated object

Finally, you can pack the annotations, together with any embeddings and UMAPs created, into an annotated object. You can optionally also save the annotated object to a specified filepath. If a file with that filename already exists, a datetime stamp is added to the filename.
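
To illustrate the kind of behaviour described for existing files, here is a sketch of how a datetime stamp can be appended to a filename when the target already exists. This is not panhumanpy's implementation: the function name and the stamp format are assumptions made for this example, and the names the package actually produces may look different.

```python
from datetime import datetime
from pathlib import Path


def stamped_path_if_exists(save_path, now=None):
    """Return save_path unchanged if it is free, else a datetime-stamped variant.

    Hypothetical helper; the "_YYYYMMDD_HHMMSS" suffix is an assumed format.
    """
    path = Path(save_path)
    if not path.exists():
        return path
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # Insert the stamp between the file stem and its extension.
    return path.with_name(f"{path.stem}_{stamp}{path.suffix}")
```

Checking the save path yourself before calling pack_adata is a simple way to know in advance whether the saved file will end up under a stamped name.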

In [None]:
annotated_query = azimuth.pack_adata(save_path="path/to/save/h5ad")

# Low-level flexible interface

The low-level interface, the AzimuthNN_base class, is intended for interactive usage of the Azimuth Neural Network annotation pipeline. It provides a comprehensive framework for single-cell RNA-seq annotation using neural network models, and handles the complete workflow from data loading and preprocessing to inference, post-processing, and result visualization. It can be used to create memory-efficient and scalable pipelines for atlas-scale annotation, and also for more exploratory analysis of the annotation process.<br>
<br>
We do not detail all the available options here, and only provide a minimal set of steps that reproduces the results of the previous section. You are encouraged to read the documentation and the source code, and to explore all the attributes and methods of the class.

In [None]:
# For documentation
help(ph.AzimuthNN_base)

In [None]:
azimuth = ph.AzimuthNN_base()  
# Note that this class is not instantiated with a query; the query is only passed to it later.

In [None]:
# Pass a query in the form of an anndata object, an h5ad file, or the components thereof separately
# You can optionally specify the column in query.var with the gene names just like in the previous section
azimuth.query_adata(query)

In [None]:
# Inference
azimuth.process_query()
azimuth.run_inference_model()
_ = azimuth.process_outputs()

In [None]:
# (Optional) refinement of labels for consistent granularity and ease of interpretation
_ = azimuth.refine_labels(refine_level = 'broad')
_ = azimuth.refine_labels(refine_level = 'medium')
_ = azimuth.refine_labels(refine_level = 'fine')

In [None]:
# Update metadata with annotations and read the updated cell metadata
azimuth.update_cells_meta()
cell_metadata = azimuth.cells_meta

In [None]:
# For Azimuth NN embeddings and the corresponding umap
embeddings = azimuth.inference_model_embeddings(embedding_layer_name = 'dense_3')
umap = azimuth.inference_model_umaps(embedding_layer_name='dense_3')

In [None]:
# To pack metadata updated with annotations, and any low dimensional representations computed into an anndata obj
annotated_query = azimuth.pack_adata()
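
For repeated use, the low-level steps above can be collected into a single function. The sketch below only calls the methods shown in this section, in the same order, with the three refinement calls written as a loop. The wrapper name and its parameters are assumptions made for this example and are not part of panhumanpy.

```python
def run_low_level_pipeline(
    azimuth,
    query,
    refine_levels=("broad", "medium", "fine"),
    embedding_layer_name="dense_3",
):
    """Run the minimal low-level steps on an AzimuthNN_base-like object.

    Hypothetical wrapper: it expects `azimuth` to expose the methods used in
    the cells above and returns whatever pack_adata() returns.
    """
    # Load the query, then preprocess, run the model and post-process outputs.
    azimuth.query_adata(query)
    azimuth.process_query()
    azimuth.run_inference_model()
    azimuth.process_outputs()

    # Optional label refinement, one call per requested level.
    for level in refine_levels:
        azimuth.refine_labels(refine_level=level)

    # Write annotations into the cell metadata, then compute embeddings and UMAP.
    azimuth.update_cells_meta()
    azimuth.inference_model_embeddings(embedding_layer_name=embedding_layer_name)
    azimuth.inference_model_umaps(embedding_layer_name=embedding_layer_name)

    return azimuth.pack_adata()


# Usage (requires panhumanpy and a real query):
# import panhumanpy as ph
# annotated_query = run_low_level_pipeline(ph.AzimuthNN_base(), query)
```

Passing an empty tuple as refine_levels skips the refinement step, which mirrors its optional status in the cells above.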