-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Variants missed in iHS output #64
Comments
Hi there, Since selscan filters low frequency variants based on the sample allele frequencies of the data provided, you'll inevitably find some scores "missing" from one dataset to another. You could set |
Hi, Dr. @szpiech , thanks for the advices. However I feel what I encountered might not a low-frequency variant issue. I should make it clear that the two datasets were used as discovery and replication sets, and the variants with iHS scores in only one dataset are not rare variants (AF ~ 0.40), I am wondering if all variants in a given haplotype are reported? May I extract raw input/output data and send it to you by email for a closer lookup? Another question is will it be recommended to use HG38 build instead of HG19 as the latter contains regions with alternate loci or with fix patches? will such regions a problem for iHS analysis? |
selscan will only report a score at site above the MAF threshold, but scores at sites above the MAF threshold can also be dropped if they are close to a large gap or near the chromosome boundaries. |
It seems that even with --trunc-ok some of the variants above the MAF threshold were dropped in the output file. I have reproduced this issue with the 1KGP data, and sent the script/results to you in email. Could you kindly have a check (just in case it went to the spam folder). Thank you very much! |
Hi @psytky03! Did you manage to know what is going on with the missing variants? I think something simmilar is happening to me. |
I’m still looking into this, but it’ll be a few weeks as I’m on vacation starting today. |
Enjoy your vacation!! :)
On Jul 15, 2021, at 10:20 AM, Zachary A Szpiech ***@***.******@***.***>> wrote:
This Message Is From an External Sender
This message came from outside your organization.
I’m still looking into this, but it’ll be a few weeks as I’m on vacation starting today.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/szpiech/selscan/issues/64*issuecomment-880876923__;Iw!!LQC6Cpwp!4rRBcst-oPN9f9KvE77c7rdLzIIIqERbCU71DIhO2HC2mayYuDHQjF6OFucx6ALtxQo8$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABV5EEJU3PIBSDMSGSAHKILTX4KFFANCNFSM47TPMT5A__;!!LQC6Cpwp!4rRBcst-oPN9f9KvE77c7rdLzIIIqERbCU71DIhO2HC2mayYuDHQjF6OFucx6NDGsiex$>.
|
Just wondering if there is any update on this issue @szpiech, thanks! |
Okay, sorry for the delay in figuring this out. This one was tricky. The sample example files that were sent to me, which contained a segment of human chr7 from TGP, showed that a large number of sites with MAF > 0.05 were not being reported in the genomic interval chr7:142038782-142257749. After a lot of head-scratching, I believe the problem traces to the genetic map file that is being used. In particular, all sites in chr7:142038782-142257749 are assigned identical genetic map positions (154.2832 cM), and I believe this is causing the computation to hit a condition where they get dropped, perhaps interacting with the 50kb gap that also exists in the region (although I'm unsure exactly which condition). A solution would be to either assign a finer genetic map position to these variants, use nSL, or use the --pmap flag when computing iHS (this will use the physical map as the distance for integrating). Using the example files I was provided, I was able to get scores reported with the following invocation. Hope this helps! |
Dear @szpiech,
I attempted to compare the iHS results of two datasets (which I processed separated) but found not all variants were included in the iHS outputs, which made the direct comparison difficult. Is there a way to report all variants?
Thank you very much!
The text was updated successfully, but these errors were encountered: