Interpretation of normalized XP-EHH #99

Open
catferna opened this issue Jul 11, 2023 · 3 comments

@catferna

Dear @szpiech, thank you for developing this tool and making our lives easier.
I have a question about what seem to be extremely high XP-EHH scores. I ran this analysis using array data of approximately 500,000 SNPs from two human populations. As can be seen in the plots (links below), there are "too many" peaks and the values seem "too extreme". I have never seen anything like this, at least not in other articles comparing human populations. I have already checked all the steps of the analysis several times, but I wonder whether I am missing something, for instance while cleaning the data. Do you have any insight into what process or error could be generating this pattern in the XP-EHH scores? This is the first time I have run this analysis.
by position: [https://www.dropbox.com/s/atdvmshpc558efn/xp-ehhh%20norm.png?dl=0]
by 200 kb windows: [https://www.dropbox.com/s/yw4qrf2w728lvhf/xp-ehh_window.png?dl=0]
Thank you very much!
Catalina.

@szpiech
Owner

szpiech commented Jul 15, 2023 via email

@catferna
Author

catferna commented Aug 8, 2023

Hi Zachary, thank you so much for your reply! I would never have guessed this was the problem, and nobody else I talked to had noticed it. The comparison I was (am!) trying to make is between individuals from two distinct indigenous populations in South America, and according to the logfile the variance is indeed quite low (0.094). I ran XP-EHH again, this time comparing these samples against Chinese individuals (CHB; 1000 Genomes). The variance is now higher (0.157), but I still get "too many" and "too extreme" values, so the results are hard to interpret. In your experience, would you interpret the first observation (very low variance between the two populations) as an indication that this test may not be suitable for detecting selection in populations that probably split not too long ago? If so, is there another tool that you would recommend? My second question is which data filters (MAF or others) could be added or removed for the XP-EHH analysis itself in order to capture the 'true' estimate of this statistic. Or would it make sense to change some of the default parameters for the normalization step? I would really appreciate any insight in this regard. Thanks a lot!
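
For readers following the numbers above, here is a minimal sketch of why a low variance can produce "extreme" normalized values. It assumes that the variance reported in the logfile is the variance of the unnormalized scores and that normalization is a plain genome-wide z-score (subtract the mean, divide by the standard deviation). Both are assumptions, not something confirmed in this thread. The function names `normalize` and `scale_factor` are hypothetical and are not part of any tool's API.

```python
import math
import statistics


def normalize(scores):
    """Plain z-score: subtract the mean, divide by the population std. dev.

    Assumption: this mirrors a genome-wide normalization; it is an
    illustration only, not the tool's actual implementation.
    """
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]


def scale_factor(variance):
    """How many normalized units one raw unit of deviation from the mean becomes."""
    return 1.0 / math.sqrt(variance)


# Variances quoted in the comment above (taken from the logfile).
for label, var in [("two South American populations", 0.094),
                   ("versus CHB", 0.157)]:
    print(f"{label}: variance={var}, 1 raw unit -> "
          f"{scale_factor(var):.2f} normalized units")
```

Under these assumptions, a raw score one unit from the mean maps to roughly 3.26 normalized units at a variance of 0.094 and roughly 2.52 at 0.157, so even modest raw differences can cross common significance thresholds when the genome-wide variance is small.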

@szpiech
Owner

szpiech commented Aug 11, 2023 via email
