# Score Consistency

Here, we briefly demonstrate the similarities and discrepancies between LIANA's output in Python and R.

In [5]:
import os

import pandas as pd
import scipy.stats  # explicitly load the stats submodule used for spearmanr below

import scanpy as sc

import cell2cell as c2c
from liana.method.sc._rank_aggregate import AggregateClass, _rank_aggregate_meta
from liana.method.sc import cellphonedb, natmi, singlecellsignalr

data_path = '../../data/'
output_folder = os.path.join(data_path, 'liana-outputs/')
c2c.io.directories.create_directory(output_folder)

../../data/liana-outputs/ already exists.


#### Comparison with R output:

There are minor differences from the LIANA implementation in R, so the outputs are not identical:

- SingleCellSignalR Magnitude (lrscore): numerical precision, values differ slightly after the 3rd decimal place
- LogFC Specificity (lr_logfc): similar relative differences but different exact values
- CellPhoneDB Specificity (cellphone_pvals): similar relative differences but different exact values
- CellChat: not run by default in R
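
As a rough illustration of why such differences matter little for a rank-based comparison, the sketch below uses made-up toy scores (not real LIANA output) that differ only after the third decimal place. The variable names and the perturbation values are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

# Toy scores, NOT real LIANA output (values are illustrative only)
py_scores = np.array([0.4059, 0.0819, 0.3340, 0.4035, 1.0000])

# Hypothetical R-side values: the same scores perturbed after the 3rd decimal
r_scores = py_scores + np.array([0.0003, -0.0002, 0.0004, -0.0001, 0.0])

# Largest absolute discrepancy between the two toy vectors
max_abs_diff = np.max(np.abs(py_scores - r_scores))

# True if the two vectors agree within an absolute tolerance of 1e-3
same_to_3dp = np.allclose(py_scores, r_scores, atol=1e-3)

# Spearman correlation depends only on the ordering of the values
rho = spearmanr(py_scores, r_scores)[0]

print(max_abs_diff, same_to_3dp, rho)
```

Because Spearman correlation only looks at ranks, scores that keep the same ordering correlate perfectly even when their exact values differ.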

Let's check the consistency in the magnitude aggregate rank score when running the different methods that report magnitude (excluding CellChat, which is not present by default in R). 

In [6]:
adata = sc.read_h5ad(os.path.join(data_path, 'processed.h5ad'))
sadata = adata[adata.obs['sample']=='C100']

<font color='red'>Note for Daniel: I believe lines 198-202 of _liana_pipe.py make it impossible to pass just "Magnitude" as the consensus_opts argument, so we can't pass, for example, only cellphonedb and natmi and then specify consensus_opts = ['Magnitude']. We have to include singlecellsignalr for this to work (which is fine, since we still get a high correlation).</font>

In [None]:
# # make rank_aggregate function that only runs on methods of choice and only for Magnitude
# rank_aggregate_partial = AggregateClass(_rank_aggregate_meta, methods=[cellphonedb, natmi, singlecellsignalr])
# # have to add singlecellsignalr to make the below code work
# rank_aggregate_partial(adata = sadata, 
#                        groupby='celltype', 
#                        use_raw = False, # run on log- and library-normalized counts
#                        verbose = True, 
#                        inplace = True
#                        #consensus_opts = ['Magnitude'] # rank by magnitude only - CURRENTLY not passed to _aggregate
#                       )

In [7]:
# make a rank_aggregate function that only runs on the methods of choice (consensus_opts is not passed; see the note above)
rank_aggregate_partial = AggregateClass(_rank_aggregate_meta, methods=[cellphonedb, natmi, singlecellsignalr])
rank_aggregate_partial(adata = sadata, 
                       groupby='celltype', 
                       use_raw = False, # run on log- and library-normalized counts
                       verbose = True, 
                       inplace = True
                      )

Using `.X`!
5580 features of mat are empty, they will be removed.
The following cell identities were excluded: Plasma




0.33 of entities in the resource are missing from the data.
Generating ligand-receptor stats for 2548 samples and 19218 features
Running CellPhoneDB


100%|███████████████████████████████████████| 1000/1000 [00:21<00:00, 46.61it/s]


Running NATMI
Running SingleCellSignalR




In [8]:
rel_cols = ['source', 'target', 'ligand_complex', 'receptor_complex', 'magnitude_rank']
liana_aggregate_partial = sadata.uns['liana_res'].loc[:,rel_cols]
liana_aggregate_partial.sort_values(by = ['source', 'target', 'ligand_complex', 'receptor_complex'], inplace = True)
liana_aggregate_partial.to_csv(os.path.join(output_folder, 'magnitude_ranks_python.csv'))
liana_aggregate_partial.head()

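|    | source | target | ligand_complex | receptor_complex | magnitude_rank |
|---:|:-------|:-------|:---------------|:-----------------|---------------:|
| 67 | B      | B      | ACTR2          | ADRB2            | 0.405892       |
| 17 | B      | B      | ADAM17         | ITGB1            | 0.081878       |
| 54 | B      | B      | ADAM17         | RHBDF2           | 0.334024       |
| 23 | B      | B      | ADAM28         | ITGA4            | 0.403503       |
| 8  | B      | B      | APOC2          | LRP1             | 1.0            |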


Note: to run the correlation, make sure you have run the [companion R tutorial](../ccc_R/S3_Score_Consistency.ipynb) up to the point where you save the CSV named "magnitude_ranks_R.csv".

In [9]:
# read and format R aggregate rank
lap_R = pd.read_csv(os.path.join(output_folder, 'magnitude_ranks_R.csv'), index_col = 0)
lap_R.columns = ['source', 'target', 'ligand_complex', 'receptor_complex', 'aggregate_rank']

# merge the two scores
la = pd.merge(liana_aggregate_partial, lap_R, on = ['source', 'target', 'ligand_complex', 'receptor_complex'], 
                                                how = 'inner')
sr = scipy.stats.spearmanr(la.magnitude_rank, la.aggregate_rank).statistic
print('The Spearman correlation between R and Python aggregate magnitude scores is: {:.2f}'.format(sr))

The Spearman correlation between R and Python aggregate magnitude scores is: 0.98
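
The inner merge above silently drops interactions that appear in only one of the two tables, so it can be useful to check how many rows were shared. The following is a minimal sketch on made-up toy tables (not real LIANA output); `py_df` and `r_df` are hypothetical stand-ins for `liana_aggregate_partial` and `lap_R`.

```python
import pandas as pd

keys = ['source', 'target', 'ligand_complex', 'receptor_complex']

# Toy tables, NOT real LIANA output (values are illustrative only)
py_df = pd.DataFrame({
    'source': ['B', 'B', 'B'],
    'target': ['B', 'B', 'B'],
    'ligand_complex': ['ACTR2', 'ADAM17', 'APOC2'],
    'receptor_complex': ['ADRB2', 'ITGB1', 'LRP1'],
    'magnitude_rank': [0.41, 0.08, 1.0],
})
r_df = pd.DataFrame({
    'source': ['B', 'B', 'B'],
    'target': ['B', 'B', 'B'],
    'ligand_complex': ['ACTR2', 'ADAM17', 'ADAM28'],
    'receptor_complex': ['ADRB2', 'ITGB1', 'ITGA4'],
    'aggregate_rank': [0.40, 0.09, 0.39],
})

# An outer merge with indicator=True labels each row as
# 'both', 'left_only' (Python only) or 'right_only' (R only)
outer = pd.merge(py_df, r_df, on=keys, how='outer', indicator=True)
counts = outer['_merge'].value_counts()

n_both = int(counts['both'])
n_py_only = int(counts['left_only'])
n_r_only = int(counts['right_only'])

print(n_both, n_py_only, n_r_only)
```

If the number of unmatched rows is large, the correlation describes only the shared subset of interactions and should be interpreted with that in mind.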
