# Tutorial scoring with ANS and other gene signature scoring methods
This package contains the Python implementation of the Adjusted Neighborhood Scoring (ANS) method, as well as of UCell [1], JASMINE [2] (with likelihood or odds-ratio sub-computation), the scoring approach by Tirosh et al. [3], and two modifications of it, Tirosh_AG and Tirosh_LVG. We refer to the main article of this work for details on the scoring methods.

**Tutorial contents**:
- In the first part of this tutorial, we show the *basic usage* of the available gene signature scoring methods. 
- In the second part, we show the application of *GMM postprocessing* for comparable score ranges. 

We use our preprocessed version of the PBMC dataset and the list of differentially expressed genes (DGEX list) published by Hao et al. 2021 [4]. The preprocessed dataset contains B-cells, monocytes, and natural killer (NK) cells. It can be downloaded [here](https://drive.google.com/file/d/15DiWGfSoqtt6Fl2tK_0ik-w50rn30LQA/view?usp=drive_link), and the DGEX list [here](https://drive.google.com/file/d/1a3Uqky2VZxCxLvGI-soCTUp3lijrfrx7/view?usp=drive_link). The raw data can be downloaded [here](https://atlas.fredhutch.org/nygc/multimodal-pbmc/).

*Place the downloaded data into the `tut_data` folder*.


[1] [https://github.com/carmonalab/UCell](https://github.com/carmonalab/UCell) by [Andreatta and Carmona 2021](https://doi.org/10.1016/j.csbj.2021.06.043)

[2] [https://github.com/NNoureen/JASMINE](https://github.com/NNoureen/JASMINE) by [Noureen et al. 2022](https://doi.org/10.7554/eLife.71994)

[3] Tirosh, Itay, Benjamin Izar, Sanjay M. Prakadan, Marc H. Wadsworth 2nd, Daniel Treacy, John J. Trombetta, Asaf Rotem, et al. 2016. “Dissecting the Multicellular Ecosystem of Metastatic Melanoma by Single-Cell RNA-Seq.” Science 352 (6282): 189–96. https://doi.org/10.1126/science.aad0501

[4] Hao, Yuhan, Stephanie Hao, Erica Andersen-Nissen, William M. Mauck 3rd, Shiwei Zheng, Andrew Butler, Maddie J. Lee, et al. 2021. “Integrated Analysis of Multimodal Single-Cell Data.” Cell 184 (13): 3573–87.e29.

In [1]:
import scanpy as sc
import pandas as pd

from signaturescoring import score_signature

from tut_helper import get_sigs_from_DGEX_list
sc.settings.verbosity = 2

## Load preprocessed data

In [2]:
adata = sc.read_h5ad('tut_data/pp_pbmc_b_mono_nk.h5ad')

## Make sure the 'base' key exists in adata.uns['log1p']; a missing key can raise a KeyError in some scanpy functions
if 'log1p' in adata.uns_keys():
    adata.uns['log1p']['base'] = None
else:
    adata.uns['log1p'] = {'base': None}

The preprocessed dataset contains B-cells, Monocytes and NK-cells. 

In [3]:
adata.obs['celltype.l1'].value_counts()

Mono    43553
NK      14408
B       10613
Name: celltype.l1, dtype: int64

## Load cell state specific signatures
We create cell type signatures from the published list of differentially expressed genes per cell type. Because the DGEX list is annotated at a finer granularity (cell subtypes), we simply take the union of the DGEX genes of all cell subtypes belonging to our types of interest, i.e., B-cells, monocytes, and NK-cells. The extraction is implemented in the function `get_sigs_from_DGEX_list` in `tut_helper.py`.
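The union step can be illustrated with a minimal sketch on toy data. This is not the code of `get_sigs_from_DGEX_list`: the function `build_signatures`, the toy gene lists, and the interpretation of `remove_overlapping` as "drop genes that occur in more than one type signature" are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy input: subtype -> differentially expressed genes.
de_genes_by_subtype = {
    "B naive kappa": ["MS4A1", "CD79A", "IGKC"],
    "B memory kappa": ["MS4A1", "CD79A", "LYZ"],
    "CD14 Mono": ["LYZ", "S100A9", "CD14"],
    "CD16 Mono": ["LYZ", "FCGR3A"],
    "NK_1": ["NKG7", "GNLY", "FCGR3A"],
}

# Hypothetical mapping from each type of interest to its subtypes.
subtypes_of_type = {
    "B": ["B naive kappa", "B memory kappa"],
    "Mono": ["CD14 Mono", "CD16 Mono"],
    "NK": ["NK_1"],
}


def build_signatures(de_genes, type_to_subtypes, remove_overlapping=True):
    """Union the DE genes of all subtypes per type (illustrative helper)."""
    signatures = {
        cell_type: set().union(*(de_genes[s] for s in subtypes))
        for cell_type, subtypes in type_to_subtypes.items()
    }
    if remove_overlapping:
        # Assumption: a gene is "overlapping" if it appears in more than
        # one type-level signature; such genes are dropped everywhere.
        counts = Counter(g for genes in signatures.values() for g in genes)
        signatures = {
            cell_type: {g for g in genes if counts[g] == 1}
            for cell_type, genes in signatures.items()
        }
    # Sort for a deterministic, readable result.
    return {cell_type: sorted(genes) for cell_type, genes in signatures.items()}


toy_signatures = build_signatures(de_genes_by_subtype, subtypes_of_type)
for cell_type, genes in toy_signatures.items():
    print(cell_type, genes)
```

In this toy example, `LYZ` and `FCGR3A` appear in two type-level unions and are therefore removed from all signatures. The actual helper additionally takes the `adata` object; see `tut_helper.py` for what it does with it.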

In [4]:
DE_of_celltypes = pd.read_csv('tut_data/DE_by_celltype.csv')

In [5]:
SG_subtypes = get_sigs_from_DGEX_list(adata, DE_of_celltypes, remove_overlapping=True)

Types and their subtypes:
{
    "B": [
        "B intermediate kappa",
        "B intermediate lambda",
        "B memory kappa",
        "B memory lambda",
        "B naive kappa",
        "B naive lambda",
        "Plasma",
        "Plasmablast"
    ],
    "Mono": [
        "CD14 Mono",
        "CD16 Mono"
    ],
    "NK": [
        "NK_1",
        "NK_2",
        "NK_3",
        "NK_4",
        "NK Proliferating",
        "NK_CD56bright"
    ]
}


In [6]:
for k,v in SG_subtypes.items():
    print(f'Signature for subtype {k} contains {len(v)} genes.')

Signature for subtype B contains 488 genes.
Signature for subtype Mono contains 382 genes.
Signature for subtype NK contains 243 genes.


## Score cell state specific signatures
Next, we show how to score cells with each method and its available parameters. We score all signatures in `SG_subtypes`. The function `score_signature` of our package takes an AnnData object, a list of genes, and method-specific parameters, and scores the cells. The cell scores are stored in place in the `.obs` DataFrame of the AnnData object.

### Adjusted Neighborhood Scoring (ANS)

In [None]:
for gene_type, gene_list in SG_subtypes.items():
    sc.logging.info(f'Scoring for gene type {gene_type}')
    score_signature(method='adjusted_neighborhood_scoring',
                    adata=adata,
                    gene_list=gene_list, 
                    ctrl_size=100, 
                    score_name=f'ANS_{gene_type}_scores'
                    )
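
Since the scores are written to `.obs` under the given `score_name`, they can be inspected like any other pandas column. The following sketch uses a small, made-up DataFrame as a stand-in for `adata.obs` (the cell names and score values are invented) and labels each cell with the signature that scores highest. With the real data, `obs` would be replaced by `adata.obs`.

```python
import pandas as pd

# Toy stand-in for adata.obs after the scoring loop; all values are made up.
obs = pd.DataFrame(
    {
        "ANS_B_scores": [0.42, -0.05, 0.01],
        "ANS_Mono_scores": [-0.10, 0.37, 0.02],
        "ANS_NK_scores": [-0.08, -0.02, 0.29],
    },
    index=["cell_1", "cell_2", "cell_3"],
)

# Collect the score columns by the naming pattern used in the loop above.
score_cols = [
    col for col in obs.columns
    if col.startswith("ANS_") and col.endswith("_scores")
]

# For each cell, take the column with the highest score and strip the
# 'ANS_' prefix and '_scores' suffix to recover the signature name.
predicted = (
    obs[score_cols]
    .idxmax(axis=1)
    .str.replace(r"^ANS_|_scores$", "", regex=True)
)
print(predicted.to_dict())
```

Taking the highest-scoring signature as a label is only meaningful when the score ranges of the signatures are comparable, which is the motivation for the GMM postprocessing covered in the second part of the tutorial.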

### Scanpy scoring

### Tirosh-based scoring methods

### JASMINE scoring

### UCell scoring

## GMM postprocessing