Determining the appropriate background #16
Comments
I was thinking when we discussed this that it might be worth checking whether all HPO genes are pLI extreme genes. Probably not worth putting time into checking though.
I think it should be against all genes.
From: Brian M. Schilder ***@***.***>
Sent: Tuesday, November 7, 2023 5:10:15 PM
To: neurogenomics/MultiEWCE ***@***.***>
Cc: Skene, Nathan G ***@***.***>; Assign ***@***.***>
Subject: Re: [neurogenomics/MultiEWCE] Determining the appropriate background (Issue #16)
Assigned #16 to @NathanSkene.
I have some code and data for this from when I looked at genes under selective pressure and pLI, so I can have a look back!
Agreed re: against all genes.
This would be excellent! I'll file a separate issue for this.
Cool! Seems we're all in consensus, I'll proceed accordingly.
method="gprofiler" vs. method="homologene"
Human CTD + Human hits
The gprofiler background is a MUCH more comprehensive set of human genes than homologene (the default method orthogene uses within EWCE).
Difference: 62663 vs. 16482 genes
> bg1=get_bg(method="gprofiler")
Useing cached bg.
+ Version: 2023-11-08
> length(bg1)
[1] 62663
>
> bg2=get_bg(method="homologene")
Retrieving all genes using: homologene.
Retrieving all organisms available in homologene.
Mapping species name: human
Common name mapping found for human
1 organism identified from search: 9606
Gene table with 19,129 rows retrieved.
Returning all 19,129 genes from human.
Returning 19,129 unique genes from entire human genome.
+ Version: 2023-11-08
> length(bg2)
[1] 19129
Mouse CTD + Human hits
For mouse-human analyses, the choice of orthologous gene mapping method makes a difference (though not a massive one).
Difference: 17024 vs. 19129 genes
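To see where two backgrounds differ, rather than just how many genes each has, base R set operations are enough. This is only a rough sketch: the gene names are made-up placeholders and `compare_backgrounds()` is a hypothetical helper, not part of EWCE or orthogene. In practice the two inputs would be character vectors of gene symbols such as the ones compared above.

```r
# Hypothetical toy backgrounds (placeholder gene names, not real data).
bg_large <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E")
bg_small <- c("GENE_A", "GENE_B", "GENE_C")

# Hypothetical helper: summarise the overlap between two gene backgrounds.
compare_backgrounds <- function(bg_a, bg_b) {
  bg_a <- unique(bg_a)
  bg_b <- unique(bg_b)
  list(
    n_a    = length(bg_a),           # size of the first background
    n_b    = length(bg_b),           # size of the second background
    shared = intersect(bg_a, bg_b),  # genes present in both
    only_a = setdiff(bg_a, bg_b),    # genes unique to the first
    only_b = setdiff(bg_b, bg_a)     # genes unique to the second
  )
}

cmp <- compare_backgrounds(bg_large, bg_small)
str(cmp)
```

Looking at `only_a` would show which kinds of genes the larger background adds, which is probably more informative than the raw counts.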
Conclusion
For
For Tagging @Al-Murphy here as this is relevant to
An important question for any enrichment-based analysis is: how do we determine the appropriate set of background genes?
In the case of the rare disease celltyping project, I think there are two potential backgrounds that could be used.
Options
1. All HPO genes
All genes included in the HPO gene annotations.
This is essentially asking "are this phenotype's genes enriched in a celltype relative to rare disease genes in general?"
2. All human genes
All genes in the human genome, or at least those that appear in the CTD. When the CTD is from mouse, this background would be further reduced to those that have human 1:1 orthologs.
This is essentially asking "are this phenotype's genes enriched in a celltype relative to all genes that are expressed and informative in some cell type?"
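To make the difference between the two options concrete, here is a toy bootstrap sketch in base R. It is not EWCE's implementation: the scores, gene names and the `bootstrap_p()` helper are all hypothetical, and the assumption that "HPO-like" genes have higher specificity in this cell type is baked in purely for illustration.

```r
# Toy illustration only; all names and numbers below are assumptions.
set.seed(1)

all_genes <- paste0("GENE", seq_len(1000))
hpo_genes <- all_genes[1:200]  # pretend these are the rare disease genes

# Deterministic toy specificity scores for a single cell type:
# "HPO" genes are shifted upwards relative to the remaining genes.
specificity <- setNames(
  c(seq(0.3, 1.3, length.out = 200), seq(0, 1, length.out = 800)),
  all_genes
)

# Hit list for one phenotype: 20 genes spread evenly across the HPO genes.
hits <- hpo_genes[seq(5, 200, by = 10)]

# Hypothetical helper: one-sided bootstrap p-value for the mean specificity
# of the hits, relative to random gene sets of equal size drawn from bg.
bootstrap_p <- function(hits, bg, scores, reps = 2000) {
  observed <- mean(scores[hits])
  null <- replicate(reps, mean(scores[sample(bg, length(hits))]))
  (sum(null >= observed) + 1) / (reps + 1)
}

p_hpo_bg <- bootstrap_p(hits, hpo_genes, specificity)  # option 1
p_all_bg <- bootstrap_p(hits, all_genes, specificity)  # option 2
c(hpo_background = p_hpo_bg, all_genes_background = p_all_bg)
```

In this setup the hits are typical of the HPO genes, so they look unremarkable against background 1 but enriched against background 2. The same hit list can therefore answer two different questions depending on the background.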
Internally, EWCE will use the CTD genes to subset this list further, as it can only sample specificity scores from genes that are included in the CTD.
Conclusions
@NathanSkene @KittyMurphy I think we concluded the latter is the way to go, especially as this is concordant with how EWCE behaves by default (unless the user supplies their own custom bg= arg).
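The CTD-based subsetting mentioned above can be sketched roughly as follows. This is a base R illustration under stated assumptions, not EWCE's internal code: `ctd_specificity`, the gene names and the cell type names are hypothetical stand-ins.

```r
# Toy specificity matrix (genes x cell types), standing in for one CTD level.
ctd_specificity <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    0.5, 0.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                  c("celltype1", "celltype2"))
)

# Hypothetical "all genes" background and a phenotype hit list.
bg   <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
hits <- c("GENE_A", "GENE_D")

# Only genes with specificity scores can be sampled, so both the background
# and the hits are restricted to genes present in the CTD.
bg_usable   <- intersect(bg, rownames(ctd_specificity))
hits_usable <- intersect(hits, bg_usable)

bg_usable
hits_usable
```

So even with an "all genes" background, the effective background ends up being whichever of those genes the CTD actually covers.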