Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No sel_coef #59

Closed
PhDMattyB opened this issue Jul 23, 2020 · 4 comments
Closed

No sel_coef #59

PhDMattyB opened this issue Jul 23, 2020 · 4 comments

Comments

@PhDMattyB
Copy link

Hi @hardingnj, I just have a quick question about the output of xpclr. Whenever I run xpclr I'm not getting a lot of the columns in the output file, mainly the selection coefficient. Xpclr seems to be outputting everything up until the nullL column and then dropping everything else. I just wanted to check with you to see if this is normal or not.

input code: xpclr --input popn_chr1_phased.vcf --samplesA popn_benthic.txt --samplesB popn_pelagic.txt --chr 1 --minsnps 2 --maxsnps 100 --size 200000 --step 1000 --phased --ld 0.95 --rrate 0.00000008 --out ./popn_sweep_chr1

Any advice would be greatly appreciated!

Matt

@hardingnj
Copy link
Owner

Hmm- seems like SNPs are being skipped- but your min SNPs seems very low. Can you share the std out, and maybe a couple of lines from your output file?

@PhDMattyB
Copy link
Author

Here is the first couple of the lines from the output file:

id chrom start stop pos_start pos_stop modelL nullL sel_coef nSNPs nSNPs_avail xpclr xpclr_norm 1_00000001_00200000 1 1 200000 0 0 0.0 0.0 1_00000101_00200100 1 101 200100 0 0 0.0 0.0 1_00000201_00200200 1 201 200200 0 0 0.0 0.0 1_00000301_00200300 1 301 200300 0 0 0.0 0.0

Here is what xpclr outputs when its running:
2020-07-27 09:35:04: INFO: sunning xpclr v 1.1.2 2020-07-27 09:35:04: INFO: loading VCF 2020-07-27 09:35:04: INFO: VCF loading complete 2020-07-27 09:35:04: INFO: 393 SNPs in total are in the provided input files 2020-07-27 09:35:04: INFO: 0 SNPs excluded as multiallelic 2020-07-27 09:35:04: INFO: 0 SNPs excluded as missing in all samples in a population 2020-07-27 09:35:04: INFO: 378 SNPs excluded as invariant or singleton in population 2 2020-07-27 09:35:04: INFO: 15/393 SNPs included in the analysis (3.82%) 2020-07-27 09:35:04: INFO: Done dropping SNPs from the analysis. XP-CLR algorithm starting 2020-07-27 09:35:04: INFO: No genetic distance provided; using rrate of 1e-08/bp 2020-07-27 09:35:04: INFO: Omega estimated as: 0.834940 2020-07-27 09:36:18: INFO: Analysis complete. Output file ./GSBPI_sweep_chr1_test

This is from the analysis for the first chromosome. The amount of SNPs being excluded is a bit worrying. The vcf files for the phased data is straight from SHAPEIT and I'm wondering if something funky happened when it tried to impute the phased SNPs.

@hardingnj
Copy link
Owner

hardingnj commented Jul 27, 2020

Yes- that's the issue I think, lack of SNPs.

What you need from population 2 is a good estimate of true ancestral allele frequencies. This allows the model to identify large deviations in pop 1 from the "original" frequency. ie fast drift. When this happens several times in the same window- that's good evidence for selection.

If you have few samples in pop 2, you don't get a good estimate of ancestral AFs, and your deviations are very stochastic.

Here it seems like the majority of your SNPs are fixed in population 2, which gives us no (useful) information about the true ancestral allele frequencies. The only way you can get from 0.0 or 1.0 to an AF between is via a mutation, not via drift. Therefore they are of no value to XPCLR.

You probably need larger samples, or alternatively to infer selection perhaps look at something like PBS, which requires an outgroup.

@PhDMattyB
Copy link
Author

That makes total sense! Thanks a ton for all of your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants