For an identified ligand-receptor-target gene pathway, expression of the ligand and receptors is very low #17

Closed
hoyu310 opened this issue Mar 25, 2020 · 6 comments
Labels
good first issue Good for newcomers

Comments

@hoyu310

hoyu310 commented Mar 25, 2020

Let me preface this by saying that I found NicheNet to work very well overall. However, I don't understand why, in my dataset, a specific ligand and some receptors have extremely low expression despite being confidently identified. I would like to know whether this is an expected result given how NicheNet works, and whether it points to any potential problems.

My dataset (processed with Seurat) has 7 clusters, numbered 0-6, and I followed the seurat_wrapper vignette to run NicheNet. The dataset has two conditions, A and B, and every cell belongs to exactly one of them. Clusters 1 and 2 have a relatively high proportion of condition A cells (the other clusters also lean slightly toward condition A, but much less so). I set condition_oi to condition A and condition_reference to condition B. I then ran nichenet_seuratobj_aggregate once for every sender/receiver pair of individual clusters (0/0, 0/1, ..., 6/5, 6/6), i.e. 7 × 7 = 49 runs, and compiled the $ligand_target_matrix, $ligand_activities, and $ligand_receptor_matrix outputs of all runs into a single table.

To pick out the more important ligand -> receptor(s) -> target gene(s) pathways, I looked at the histograms of Pearson_correlation_coefficient_target_gene_prediction_ability, Regulatory_potential, and Prior_interaction_potential across all the data and chose a cutoff representing a "high" value for each of the three criteria (0.10, 0.003, and 0.5, respectively; the latter two keep the top 5-10% of results). I then removed every result with any of the three criteria below its cutoff. Only two pathways remained, one of them particularly interesting.
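The cutoff filter above can be sketched in a few lines of Python. This is an illustration only, not nichenetr code: the `Link` record, its field names (which mirror the three column names mentioned in this thread), and all the example rows are made-up placeholders, not real NicheNet output. A row is kept only if none of the three criteria falls below its cutoff.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One ligand -> receptor -> target row of a compiled table (hypothetical structure)."""
    sender: int
    receiver: int
    ligand: str
    receptor: str
    target: str
    pearson: float                      # Pearson_correlation_coefficient_target_gene_prediction_ability
    regulatory_potential: float         # score from the ligand-target matrix
    prior_interaction_potential: float  # weight from the ligand-signaling network


# Cutoffs quoted in the issue.
CUTOFFS = {"pearson": 0.10, "regulatory_potential": 0.003, "prior_interaction_potential": 0.5}


def passes_all(link: Link, cutoffs=CUTOFFS) -> bool:
    """Keep a row only if none of the three criteria is below its cutoff."""
    return all(getattr(link, name) >= cut for name, cut in cutoffs.items())


def filter_links(links):
    return [link for link in links if passes_all(link)]


# Placeholder rows (not real NicheNet output).
links = [
    Link(1, 4, "X", "Y1", "Z", 0.12, 0.004, 0.7),
    Link(1, 4, "X", "Y2", "Z", 0.12, 0.004, 0.6),
    Link(0, 3, "A", "B", "C", 0.15, 0.001, 0.9),  # fails Regulatory_potential
    Link(2, 5, "D", "E", "F", 0.08, 0.010, 0.8),  # fails the Pearson cutoff
]
kept = filter_links(links)
print([(l.ligand, l.receptor, l.target) for l in kept])
# [('X', 'Y1', 'Z'), ('X', 'Y2', 'Z')]
```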

The particularly interesting pathway consists of ligand X, receptors Y1 and Y2, and target gene Z. It is interesting because it is very specific: it only appears with cluster 1 as sender and cluster 4 as receiver, even when I re-checked with all cutoffs completely relaxed. Moreover, cluster 1 as sender, cluster 4 as receiver, and the actual identities of the ligand, receptors, and target gene are all very much in line with our understanding of the dataset.

Our only concern is the Seurat violin plots (VlnPlot) of the ligand, receptor, and target genes in this pathway (attached). The only plot that seems to make sense is the one for target gene Z (low in cluster 1 and relatively high in cluster 4). For ligand X, for which NicheNet identifies cluster 1 as the sole sender, expression of gene X in cluster 1 is very low. For receptors Y1 and Y2, expression of genes Y1 and Y2 is very low in all clusters.

Based purely on the NicheNet algorithm, does this result make sense? And for the tool's general application to real data, can ligand X still be considered a strong ligand?

The NicheNet output indicates that ligand X is a bona fide ligand; could this be why it was identified as a ligand despite its very low expression here?

[Attached image: VlnPlots_of_pathway]

@browaeysrobin
Member

Hi @hoyu310,

The prioritization of ligands by NicheNet (the ligand activity analysis) is based only on the enrichment of their target genes among the genes that are differentially expressed in the receiver cells. There is no prioritization based on how strongly the ligand is expressed in the sender cells or how strongly the receptor(s) are expressed in the receiver cells. Expression in sender cells is only used to determine which ligands are expressed by a sender cell type, and expression in receiver cells is only used to determine which receptors are expressed by the receiver cell type. By default, a gene counts as 'expressed' if it is expressed in 10% of the cells in the cluster of interest. This threshold is not very high (you can set a more stringent cutoff if you want), so a ligand that is top-ranked according to the enrichment of its target genes may in fact not be highly expressed. What you observe is therefore expected given how NicheNet prioritizes ligands.
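To make the two separate roles of expression concrete, here is a simplified Python sketch, not the nichenetr implementation. It assumes cells are dicts of gene -> count and that a gene is "expressed" when detected in at least 10% of a cluster's cells (the `>=` comparison is an assumption). It scores ligand activity as the Pearson correlation between a ligand's regulatory-potential scores and a 0/1 "differentially expressed" label over a background gene set, in the spirit of the Pearson_correlation_coefficient_target_gene_prediction_ability metric. All function names and data are hypothetical. Note that the count level of ligand X never enters the ranking; it only has to pass the detection-fraction gate.

```python
import math


def fraction_expressing(cells, gene):
    """Share of cells (each a dict gene -> count) with a nonzero count for `gene`."""
    return sum(1 for cell in cells if cell.get(gene, 0) > 0) / len(cells)


def expressed_genes(cells, genes, pct=0.10):
    """Genes detected in at least `pct` of cells; the expression level itself plays no role."""
    return {g for g in genes if fraction_expressing(cells, g) >= pct}


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ligand_activity(potentials, background, de_genes):
    """Correlate a ligand's target scores with a 0/1 'is differentially expressed' label."""
    scores = [potentials.get(g, 0.0) for g in background]
    labels = [1.0 if g in de_genes else 0.0 for g in background]
    return pearson(scores, labels)


# Hypothetical sender cluster of 20 cells: X seen at a low count in 3 cells (15%),
# W seen at a higher count but in only 1 cell (5%).
sender = [{} for _ in range(20)]
for i in range(3):
    sender[i]["X"] = 1
sender[0]["W"] = 5
expressed = expressed_genes(sender, ["X", "W"])
print(expressed)  # {'X'}

# Hypothetical receiver background genes; only Z is differentially expressed.
background = ["Z", "G1", "G2", "G3"]
x_potentials = {"Z": 0.9, "G1": 0.1, "G3": 0.05}
activity_x = ligand_activity(x_potentials, background, de_genes={"Z"})
print(round(activity_x, 3))
```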

However, one thing in the results you show here is odd: ligand X was only found to be important when cluster 1 was the sender cell type. This is strange because ligand X seems to be expressed in the other clusters as well, so it does not appear to be specific. Can you check what might have gone wrong here?

@hoyu310
Author

hoyu310 commented Mar 25, 2020

@browaeysrobin Thanks for the response; now it makes sense why the ligand and receptor genes are not always highly expressed.

Now that you point it out, it is indeed strange that ligand X only has cluster 1 as sender and cluster 4 as receiver. After carefully reviewing all the outputs, I realized why. Among the outputs of nichenet_seuratobj_aggregate, $ligand_activities includes the full set of ligands ("test_ligand"), typically several hundred; however, $ligand_target_matrix and $ligand_receptor_matrix only include the target gene and receptor associations, respectively, of the top 20 ligands listed in $ligand_activities.

In my 49 runs, ligand X happened to be a top 20 ligand only in the run with cluster 1 as sender and cluster 4 as receiver. Looking back at the outputs, however, ligand X actually appears in $ligand_activities in most of the 49 runs, and in a number of them its Pearson_correlation_coefficient_target_gene_prediction_ability is comparable to (sometimes even higher than) that in the cluster 1 -> cluster 4 run; in those runs it just happened to fall outside the top 20. The way I compiled the "one table for all the runs" is that every ligand in $ligand_activities that has a target gene in $ligand_target_matrix and also a receptor in $ligand_receptor_matrix becomes an entry in the compiled table.
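The interaction between the top-20 truncation and this compilation rule can be reproduced with a small Python sketch. It is a simplified model, not nichenetr code: `build_matrices` is a hypothetical stand-in for the per-run target/receptor outputs, its `filter_top` flag only loosely mirrors the behaviour described in this thread, and `n=2` replaces 20 to keep the toy data small. All scores and gene names are placeholders. In the second run ligand X has a higher score than in the first but ranks third, so it disappears from the compiled rows unless the truncation is switched off.

```python
def top_ligands(activities, n):
    """Ligand names sorted by activity score (highest first), truncated to n."""
    return sorted(activities, key=activities.get, reverse=True)[:n]


def build_matrices(activities, targets, receptors, filter_top=True, n=20):
    """Hypothetical per-run outputs: only the top n ligands unless filter_top is False."""
    keep = set(top_ligands(activities, n)) if filter_top else set(activities)
    target_matrix = {lig: t for lig, t in targets.items() if lig in keep}
    receptor_matrix = {lig: r for lig, r in receptors.items() if lig in keep}
    return target_matrix, receptor_matrix


def compile_run(activities, target_matrix, receptor_matrix):
    """One row per ligand-target-receptor combination for ligands present in all three outputs."""
    rows = []
    for lig in activities:
        if lig in target_matrix and lig in receptor_matrix:
            for t in target_matrix[lig]:
                for r in receptor_matrix[lig]:
                    rows.append((lig, t, r))
    return rows


targets = {"X": ["Z"], "P": ["T1"], "Q": ["T2"]}
receptors = {"X": ["Y1", "Y2"], "P": ["R1"], "Q": ["R2"]}
runs = {
    (1, 4): {"X": 0.12, "P": 0.10, "Q": 0.05},
    (2, 4): {"P": 0.20, "Q": 0.15, "X": 0.13},  # X scores higher here but ranks third
}

for filter_top in (True, False):
    for pair, acts in runs.items():
        tm, rm = build_matrices(acts, targets, receptors, filter_top=filter_top, n=2)
        ligands = sorted({row[0] for row in compile_run(acts, tm, rm)})
        print(filter_top, pair, ligands)
# True (1, 4) ['P', 'X']
# True (2, 4) ['P', 'Q']
# False (1, 4) ['P', 'Q', 'X']
# False (2, 4) ['P', 'Q', 'X']
```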

Therefore, I think a parameter in nichenet_seuratobj_aggregate that computes $ligand_target_matrix and $ligand_receptor_matrix for all ligands in $ligand_activities would solve this problem. I am not sure, though, how much computation this would add to each run, and therefore whether it is feasible.

@browaeysrobin
Member

Hi @hoyu310 ,

I added a new parameter that lets you get the output for all ligands (the heatmaps won't look nice, but you can still use the matrices as you do now). Reinstall nichenetr and pass filter_top_ligands = FALSE (the default is TRUE) as an extra parameter; you should then get the full output.

Please let me know whether it works for you now.

An extra note on your analysis: with this approach, it will be hard to find specific sender-receiver interactions if the cutoff for expressed genes is low. To get more specific links, you could make this threshold more stringent. What I prefer, though, is to consider all possible sender cells at once in a single NicheNet analysis and then trace back which cell types express the top-ranked ligands most strongly.
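The "trace back" step of the suggested workflow can be sketched as follows. This is an assumption-laden illustration, not nichenetr functionality: it presumes you already have a ranked list of top ligands from a single run that pools all sender clusters, plus per-cluster average expression of those ligands computed elsewhere. The function names and numbers are hypothetical placeholders.

```python
def rank_senders(avg_expr, ligand):
    """Sender clusters ordered by average expression of `ligand`, strongest first."""
    return sorted(avg_expr, key=lambda cl: avg_expr[cl].get(ligand, 0.0), reverse=True)


def strongest_sender(avg_expr, ligands):
    """For each top-ranked ligand, the cluster that expresses it most strongly."""
    return {lig: rank_senders(avg_expr, lig)[0] for lig in ligands}


# Placeholder per-cluster average expression (cluster -> ligand -> mean expression).
avg_expr = {
    0: {"X": 0.2, "P": 1.5},
    1: {"X": 0.4, "P": 0.3},
    2: {"X": 0.1, "P": 0.9},
}
top_ranked = ["X", "P"]  # e.g. top ligands from one run with all clusters as senders

print(strongest_sender(avg_expr, top_ranked))  # {'X': 1, 'P': 0}
print(rank_senders(avg_expr, "P"))             # [0, 2, 1]
```

Ranking clusters, rather than returning only the best one, keeps the information needed to judge whether a ligand is specific to one sender or broadly expressed.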

@hoyu310
Author

hoyu310 commented Mar 26, 2020

@browaeysrobin Thanks a lot, I re-installed and ran nichenet_seuratobj_aggregate with filter_top_ligands = FALSE, and it worked! I will also try out your other suggestions. Thanks again.

@hoyu310 hoyu310 closed this as completed Mar 26, 2020
@browaeysrobin browaeysrobin added the good first issue Good for newcomers label Apr 14, 2020
@linzhangTuesday

Hi @hoyu310, thanks for the detailed description of your issue.

Regarding the three filtering metrics you mentioned (Pearson_correlation_coefficient_target_gene_prediction_ability, Regulatory_potential, and Prior_interaction_potential): what is the difference between the second and the third?

@browaeysrobin
Member

Hi @linzhangTuesday

Regulatory_potential: scores in the ligand-target matrix, used for ligand-target gene regulatory potential
Prior_interaction_potential: weights of the weighted ligand-signaling network, used for protein-protein interaction evidence, e.g. between ligands and receptors

See Supplementary Figure 1 of the paper, or the figure in https://github.com/saeyslab/nichenetr/blob/master/vignettes/model_construction.md, to see what I mean by the ligand-signaling network, etc.
