A question about genetic map #90

Open
Captain-Pam opened this issue Jan 9, 2023 · 16 comments

Comments

@Captain-Pam

Hi szpiech,
Thank you for your tool. I have a question about it.
Calculating iHS requires a map file with a genetic position for every rsID. However, the genetic map distributed with SHAPEIT for the 1000 Genomes Project covers only a small fraction of these loci, whereas selscan needs one genetic position per rsID across the whole region. Can I use physical position/1e6 in place of the genetic position? Would this cause a large estimation error?

I look forward to hearing from you!

@szpiech
Owner

szpiech commented Jan 9, 2023 via email

@Captain-Pam
Author

Hello,
Thank you for your reply. Should I use linear interpolation by distance, or predictGMAP? Perhaps using --pmap is the most straightforward option. Do these choices have a large impact on the results?
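
For reference, linear interpolation by distance can be sketched as below. This is a minimal illustration and not what predictGMAP necessarily does internally; the function name `interpolate_cm`, the toy reference map, and the choice to clamp positions outside the reference range to the nearest endpoint are all assumptions made for the example.

```python
from bisect import bisect_right

def interpolate_cm(ref_bp, ref_cm, query_bp):
    """Linearly interpolate genetic positions (cM) at the given physical positions.

    ref_bp and ref_cm describe a sparse reference map, sorted by physical position.
    Queries outside the reference range are clamped to the nearest endpoint
    (an assumption of this sketch, not a recommendation).
    """
    out = []
    for pos in query_bp:
        if pos <= ref_bp[0]:
            out.append(ref_cm[0])
        elif pos >= ref_bp[-1]:
            out.append(ref_cm[-1])
        else:
            i = bisect_right(ref_bp, pos)  # ref_bp[i-1] <= pos < ref_bp[i]
            x0, x1 = ref_bp[i - 1], ref_bp[i]
            y0, y1 = ref_cm[i - 1], ref_cm[i]
            out.append(y0 + (y1 - y0) * (pos - x0) / (x1 - x0))
    return out

# Toy reference map: three loci with known genetic positions.
ref_bp = [1_000_000, 2_000_000, 4_000_000]
ref_cm = [0.0, 1.5, 2.5]

# SNPs that are missing from the reference map.
snps = [500_000, 1_500_000, 3_000_000, 5_000_000]
cm = interpolate_cm(ref_bp, ref_cm, snps)
```

Compared with physical position/1e6, this keeps the local recombination rate of each reference interval instead of assuming a uniform 1 cM/Mb everywhere.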

@szpiech
Owner

szpiech commented Jan 9, 2023 via email

@Captain-Pam
Author

Hi,
Thank you for your reply.
1) Which of these do you recommend?
2) I think I have figured out how to run it. selscan needs genotype data, and I currently have PLINK data for 2000 individuals. So I need to convert the PLINK files to .hap format using SHAPEIT, then make the map file (using either genetic or physical positions), and finally run selscan. Great!

@szpiech
Owner

szpiech commented Jan 9, 2023 via email

@Captain-Pam
Author

Hi,
Thank you for your advice. I have two further questions.
1) After obtaining significant loci from a GWAS, can I use 1000 Genomes data as the input for --hap and --map to calculate iHS, without using my own genotypes?
2) Similarly, for comparing XP-EHH between two groups, I am still unsure whether to use my data vs. 1000G (EUR) or 1000G (EAS) vs. 1000G (EUR). This may not be related to the phenotype. Rather, the genotypes of a reference population seem to be needed in order to calculate iHS and XP-EHH.

@szpiech
Owner

szpiech commented Jan 10, 2023 via email

@Captain-Pam
Author

Hi,
Thank you for your time. Sorry, maybe I didn't express myself clearly. Let me summarize my questions.
I have genotype data for more than 2000 individuals, and through GWAS I found some significant loci. My aim is to test whether these regions are under positive selection using selscan, or to perform a genome-wide scan to find loci under positive selection.
Following your suggestion, I downloaded the 1000 Genomes genetic map to use as the reference map for predictGMAP to build my map file.

  1. The max gap parameter causes some loci to be lost. Do we need to increase the gap?
  2. I am confused about selscan's --hap parameter. Do I need to extract haplotypes from my own data using SHAPEIT, or can I just use the haplotype data provided by 1000 Genomes? The same question applies when comparing populations (EAS vs. EUR) using XP-EHH.

Have a nice day.

@szpiech
Owner

szpiech commented Jan 12, 2023 via email

@Captain-Pam
Author

Hi,
Thank you for your answer. I will follow your advice and use the reference population (matched to my population) to calculate iHS.
A genome-wide scan uses the top 1% of iHS scores as evidence of positive selection. A GWAS, however, yields only several significant regions. I can calculate iHS for each region, but how can I describe its significance? Can iHS comparisons be made only among the SNPs within a region, using the top 1% there as evidence of positive selection on certain SNPs?
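
To make the question concrete, here is a minimal sketch of one option: comparing the |iHS| of SNPs in a GWAS region against a genome-wide background rather than only against each other. The helper names, the toy scores, and the use of a simple empirical rank are assumptions for illustration, not a method confirmed in this thread.

```python
def top_fraction_cutoff(abs_scores, fraction=0.01):
    """Smallest |iHS| that still falls in the top `fraction` of the background."""
    ranked = sorted(abs_scores, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]

def empirical_p(abs_score, abs_scores):
    """Fraction of background |iHS| values at least as large as abs_score."""
    return sum(s >= abs_score for s in abs_scores) / len(abs_scores)

# Toy genome-wide background of normalized |iHS| values: 0.01, 0.02, ..., 4.00.
genome_wide = [i / 100 for i in range(1, 401)]
cutoff = top_fraction_cutoff(genome_wide)

# Hypothetical SNPs inside one GWAS region.
region = {"snp_a": 3.99, "snp_b": 1.20}
hits = {name: s for name, s in region.items() if s >= cutoff}
p_snp_a = empirical_p(region["snp_a"], genome_wide)
```

Ranking only within a region would always produce a "top 1%" regardless of whether any selection signal exists, which is why this sketch uses the genome-wide distribution as the reference.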

@Captain-Pam
Author

Hi,
Thank you for your tool. I am going to calculate the background distribution of iHS for 1000G, but how can I determine whether an allele is ancestral or derived? I haven't found a good way.
I look forward to hearing from you!

@szpiech
Owner

szpiech commented Feb 14, 2023 via email

@Captain-Pam
Author

Hi,
Thank you for your reply. This solved my problem.
I found the AA (ancestral allele) annotation in these files. Since the iHS calculation does not support INDELs, I removed them. Some SNPs have an AA of ".", null, or unknown. Should these SNPs be removed and excluded from the iHS calculation, since they would make it unreliable? I also found that for many SNPs the REF allele could be the AA, and I'm torn again about whether I should delete those.
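
The filtering I have in mind can be sketched as follows. It assumes the AA value is the first `|`-separated token of the INFO field (for example `G|||`), that lowercase letters mark lower-confidence calls, and that `.`, `N`, `-`, or an empty token mean unknown. The function name `polarize` and the decision to fold lowercase calls into uppercase are my own choices for this example.

```python
def polarize(ref, alt, aa_field):
    """Classify a biallelic SNP by its ancestral-allele annotation.

    Returns "ref" if REF is ancestral, "alt" if ALT is ancestral,
    or None if the site cannot be polarized and would be dropped.
    """
    if len(ref) != 1 or len(alt) != 1:
        return None  # indel or multi-allelic record, not a simple SNP
    aa = aa_field.split("|")[0].strip().upper()  # folds low-confidence calls
    if aa == ref.upper():
        return "ref"  # 0 = ancestral, 1 = derived already holds; keep as is
    if aa == alt.upper():
        return "alt"  # ALT is ancestral; the 0/1 coding would need flipping
    return None  # ".", "N", "-", empty, or an allele matching neither

examples = [
    polarize("A", "G", "A|||"),   # REF is ancestral
    polarize("A", "G", "g|||"),   # ALT is ancestral (low-confidence call)
    polarize("A", "G", ".|||"),   # unknown ancestral allele
    polarize("AT", "A", "A|||"),  # indel
]
```

Under this scheme a SNP whose REF allele is the AA is kept unchanged rather than deleted, and only sites returning None are removed.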

@szpiech
Owner

szpiech commented Feb 22, 2023 via email

@Captain-Pam
Author

Hi,
Thank you for your reply. You are right. I checked the SNPs for which AA was not provided, and most of them are low frequency. Therefore, this may not be an issue for most GWAS, which require a MAF threshold of 0.01.

  1. Regarding AA, the 1000 GP information may be out of date, and some researchers have started to re-derive AA from comparisons with closely related species, for example sequence comparisons across more than 10 kinds of chimpanzees. However, the sequence alignment takes a lot of time. If I have time, I would like to try comparing more species to obtain AA.

  2. The new version of selscan, which can use unphased data, solves a big problem. Overall, phased haplotypes are still a bit more accurate.

  3. I found the norm program, so I don't need to standardize iHS myself. Great! Genome-wide iHS normalization (in frequency bins) should take all chromosomes into account. Do I need to pass the iHS results for all chromosomes after --files for joint normalization?
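
To illustrate what I mean by joint normalization in frequency bins, here is a minimal sketch: scores from all chromosomes are pooled, grouped by allele frequency, and standardized within each bin. This is not norm's actual implementation. The function name, the record layout, and the equal-width binning are assumptions for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev

def normalize_in_bins(records, n_bins=100):
    """Standardize unstandardized iHS within allele-frequency bins.

    records: (snp_id, allele_freq, unstandardized_ihs) tuples pooled
    across all chromosomes, so each bin's mean and standard deviation
    are estimated genome-wide rather than per chromosome.
    """
    def bin_of(freq):
        return min(int(freq * n_bins), n_bins - 1)

    bins = defaultdict(list)
    for _, freq, score in records:
        bins[bin_of(freq)].append(score)
    stats = {b: (mean(v), pstdev(v)) for b, v in bins.items()}

    out = {}
    for snp_id, freq, score in records:
        mu, sd = stats[bin_of(freq)]
        out[snp_id] = (score - mu) / sd if sd > 0 else float("nan")
    return out

# Toy records from two chromosomes, using two frequency bins for brevity.
records = [
    ("chr1:a", 0.1, 1.0),
    ("chr2:b", 0.2, 3.0),
    ("chr1:c", 0.7, -1.0),
    ("chr2:d", 0.9, 1.0),
]
normed = normalize_in_bins(records, n_bins=2)
```

Normalizing each chromosome separately would estimate the bin statistics from far fewer SNPs, which is the reason for pooling here.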

@szpiech
Owner

szpiech commented Feb 24, 2023 via email
