Interpretation of normalized XP-EHH #99
Hi Catalina,
So, I've only really seen this sort of pattern when comparing two samples
from the same population. If the raw xpehh statistic has very low variance,
the normalization step could pop out very extreme scores. You might check
how much variance there was in the raw statistic (this would be reported by
norm in the normalization log file) just to see if these extreme scores are
the result of dividing by a small variance.
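Why can a low raw variance produce extreme normalized scores? The minimal Python sketch below assumes the normalization step is a single genome-wide standardization, (x - mean) / SD. The toy data and function names are hypothetical and are not taken from selscan's norm source.

```python
import statistics

def normalize_genome_wide(raw_scores):
    """Return (raw variance, z-scores) using one genome-wide mean and SD.

    Assumption: a plain genome-wide standardization, (x - mean) / sd,
    stands in for the normalization step; it is not copied from norm.
    """
    mean = statistics.fmean(raw_scores)
    var = statistics.pvariance(raw_scores, mu=mean)
    return var, [(x - mean) / var ** 0.5 for x in raw_scores]

def toy_scores(spread, n_pairs=100, test_value=1.0):
    """Hypothetical data: a symmetric bulk of +/-spread plus one test SNP."""
    bulk = [spread, -spread] * n_pairs
    return bulk + [test_value]

for spread in (0.3, 1.0):
    var, z = normalize_genome_wide(toy_scores(spread))
    print(f"raw variance {var:.3f}: raw 1.0 -> normalized {z[-1]:.2f}")
```

In both runs the raw value of the test SNP is 1.0. It normalizes to roughly 3.2 when the raw variance is about 0.095, but to roughly 1.0 when the raw variance is about 1. A score that looks unremarkable on the raw scale can therefore look extreme after normalization.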
-Zachary
(In reply to Catalina I. Fernández H., Tue, Jul 11, 2023 at 11:23 AM.)
Hi Zachary, thank you so much for your reply! I would never have guessed this was the problem, and nobody else I talked to had noticed it. The comparison I was (and still am!) trying to make is between individuals from two distinct indigenous populations in South America. According to the log file, the variance is indeed quite low (0.094). I ran XP-EHH again, this time comparing these samples against Chinese individuals (CHB; 1000 Genomes). The variance is now higher (0.157), but I still get "too many" and "too extreme" values, so the results are hard to interpret.

In your experience, would you interpret the first result (very low variance between the two populations) as a sign that this test may not be suitable for detecting selection in populations that probably split not long ago? If so, is there another tool you would recommend?

My second question: what additional or different data filters (MAF or others) could be added to or removed from the XP-EHH input to capture the "true" value of this statistic? Or would it make sense to change some of the default parameters of the normalization step? I would really appreciate any insight. Thanks a lot!
Hi Catalina,
Hmm, well, if these are two populations that actually cluster separately
(e.g., on PCA or STRUCTURE analysis), then I would not necessarily expect
these strange results. On the other hand, I had someone report similar
patterns (somewhat more extreme, actually) when comparing two sets of data
that were quite far diverged, which, given your description, also doesn't
sound like your situation.
So, given these apparently inflated scores, I think you may want to adjust
your critical value for the windowing analysis. You could look at the
empirical distribution of normalized scores and pick the +/-Z that contains
95% of the mass; this might be a better choice. You can also analyze each
population with respect to CHB and examine the overlap and differences. I
wonder whether these populations have fairly small effective population
sizes, and whether that might affect the statistic at all. If you have a
guess at their joint demographic history, you could try simulating it and
testing the statistic.
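The data-driven cutoff suggested above could be sketched roughly as follows. This is a hypothetical illustration in plain Python, not selscan's own windowing code. Several details are choices made for the example: the function names, the nearest-rank quantile of |Z|, the 200 kb non-overlapping windows, and the single-chromosome assumption. The last helper shows one way to tabulate shared and population-specific outlier windows when each population is scanned against CHB.

```python
import math
import random
from collections import defaultdict

def empirical_critical_value(z_scores, mass=0.95):
    """Return the cutoff c such that about `mass` of the scores satisfy |Z| <= c.

    Nearest-rank quantile of |Z|, i.e. a symmetric +/-Z band fitted to the data.
    """
    abs_sorted = sorted(abs(z) for z in z_scores)
    rank = math.ceil(mass * len(abs_sorted))  # 1-based nearest rank
    return abs_sorted[max(rank, 1) - 1]

def extreme_fraction_by_window(positions, z_scores, crit, win_size=200_000):
    """Fraction of SNPs with |Z| > crit in non-overlapping windows of win_size bp.

    Assumes all positions lie on one chromosome; windows are keyed by start bp.
    """
    counts = defaultdict(lambda: [0, 0])  # window start -> [n_extreme, n_total]
    for pos, z in zip(positions, z_scores):
        start = (pos // win_size) * win_size
        counts[start][1] += 1
        if abs(z) > crit:
            counts[start][0] += 1
    return {start: n_ext / n for start, (n_ext, n) in sorted(counts.items())}

def compare_outliers(windows_a, windows_b):
    """Split outlier windows from two scans (e.g. pop A vs CHB, pop B vs CHB)."""
    a, b = set(windows_a), set(windows_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}

if __name__ == "__main__":
    # Hypothetical, deliberately inflated normalized scores for one chromosome.
    rng = random.Random(1)
    positions = sorted(rng.sample(range(1, 5_000_000), 2_000))
    z = [rng.gauss(0.0, 1.8) for _ in positions]
    crit = empirical_critical_value(z)
    frac = extreme_fraction_by_window(positions, z, crit)
    print(f"empirical cutoff {crit:.2f} vs. fixed cutoff 2; {len(frac)} windows")
```

How an "outlier window" is defined (for example, the windows with the highest fraction of extreme scores) is left to the analyst. `compare_outliers` only assumes you pass in two sets of window start positions.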
I assume you've filtered close relatives?
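Filtering close relatives is commonly done in two steps: estimate pairwise kinship (for example with KING), then drop individuals until no close pair remains. The minimal sketch below covers only the second step and assumes the kinship coefficients are already computed. The function name and the greedy rule (remove the sample with the most close relatives first) are illustrative choices, not something prescribed in this thread. The greedy rule is a heuristic and does not guarantee the largest possible unrelated set.

```python
from collections import Counter

def unrelated_subset(samples, kinship, threshold=0.0884):
    """Greedily drop samples until no remaining pair has kinship > threshold.

    kinship maps frozenset({id1, id2}) -> kinship coefficient (e.g. from KING).
    0.0884 is the usual KING boundary between second- and third-degree
    relatives, so the default removes second-degree or closer pairs.
    """
    keep = set(samples)
    related = [pair for pair, phi in kinship.items() if phi > threshold]
    while True:
        # Close-relative pairs in which both members are still kept.
        active = [pair for pair in related if pair <= keep]
        if not active:
            return keep
        # Count how many close relatives each remaining sample has.
        n_relatives = Counter(s for pair in active for s in pair)
        # Drop the most-connected sample; ties go to the smallest ID.
        worst = max(sorted(n_relatives), key=n_relatives.__getitem__)
        keep.discard(worst)

if __name__ == "__main__":
    # Hypothetical kinship estimates for four samples.
    phi = {
        frozenset({"A", "B"}): 0.25,  # e.g. a parent-offspring pair
        frozenset({"A", "C"}): 0.12,  # e.g. a second-degree pair
        frozenset({"C", "D"}): 0.01,  # effectively unrelated
    }
    print(sorted(unrelated_subset(["A", "B", "C", "D"], phi)))
```

With these toy values, sample A is removed because it is the only one with two close relatives, leaving B, C and D.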
Zachary
(In reply to Catalina I. Fernández H., Tue, Aug 8, 2023 at 8:46 AM.)
Dear @szpiech, thank you for developing this tool and making our lives easier.
I have a question about what seem to be extremely high XP-EHH scores. I ran this analysis using array data of approximately 500,000 SNPs from two human populations. As the plots (links below) show, there are "too many" peaks and the values seem "too extreme". I have never seen anything like this, at least not in other articles comparing human populations. I have checked all the steps of this analysis several times, but I wonder whether I am missing something, for instance while cleaning the data. Do you have any insight into what process or error could be generating this pattern in XP-EHH? This is the first time I am running this analysis.
By position: [https://www.dropbox.com/s/atdvmshpc558efn/xp-ehhh%20norm.png?dl=0]
In 200 kb windows: [https://www.dropbox.com/s/yw4qrf2w728lvhf/xp-ehh_window.png?dl=0]
Thank you very much!
Catalina.