Dxy values interpretation? #92

Closed
MarinaSci opened this issue Nov 17, 2023 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@MarinaSci

Dear Kieran,

Thank you for pixy!
I work with a mix of pooled and individual data: the individual data come from worms, and the pooled data come from a large population of eggs found in poo samples. I am interested in analysing mitochondrial genomes and nuclear repeat data from both. I use grenedalf to calculate Dxy/Pi from the pooled data, and I have started using pixy (Dxy/Pi) to process the individuals. Some of my populations have n=1 (I know, not ideal), so unfortunately I could not calculate pi for them.

For sample pairs, I am trying to better understand the Dxy output from pixy. In terms of interpretation, I assumed that Dxy follows the same principle as Fst (low Fst = low genetic differentiation). I ran a PCA (based on allele frequencies), Dxy and Fst on some of the individual data, and I am not sure how to interpret the Dxy values because they disagree with Fst and the PCA: I get low Dxy values but higher Fst values between populations, and the PCA plots show distinct clustering. For populations that should be very diverse, I get very low Dxy (output attached), which would indicate that they are 'mixing'.

I tried filtering for DP (> 10) and GQ (> 30), as you suggest in the paper, and I get the same results. How do I interpret a value of Dxy = 0.0647012529439746 between China and Honduras, for example, when the corresponding Fst is 0.7393705489876253?
Could it be because I am analysing mitochondrial genomes? Are there any other assumptions behind Dxy as calculated by pixy? Should I filter the VCF further before running pixy?
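
One thing that may help when comparing the two numbers: Dxy is an absolute measure (the average per-site number of differences between populations), whereas Fst is relative to within-population diversity, so the two are not on the same scale. The sketch below is not pixy's code. It assumes biallelic sites, no missing data, and the ratio-of-averages form of Hudson's estimator described by Bhatia et al. (2013), under which Fst = 1 - (mean within-population pi) / Dxy. All function names and the toy allele frequencies are made up for illustration.

```python
def dxy_site(p1, p2):
    """Per-site dxy at a biallelic site, from alt-allele frequencies p1 and p2."""
    return p1 * (1 - p2) + p2 * (1 - p1)


def pi_site(p, n):
    """Unbiased per-site pi from alt-allele frequency p and n sampled alleles (n >= 2)."""
    return 2 * p * (1 - p) * n / (n - 1)


def hudson_fst(mean_pi_within, dxy):
    """Hudson-style Fst written as 1 - within / between (assumed form, see lead-in)."""
    return 1 - mean_pi_within / dxy


def implied_within_pi(dxy, fst):
    """Invert the relation above: the mean within-population pi implied by dxy and Fst."""
    return dxy * (1 - fst)


# Toy data (invented): 100 sites, 5 fixed differences,
# 1 site polymorphic in population 1 only, 94 invariant sites.
n_sites = 100
pop1 = [1.0] * 5 + [0.5] + [0.0] * 94
pop2 = [0.0] * n_sites
n1 = n2 = 10  # sampled alleles per population (assumption)

dxy = sum(dxy_site(a, b) for a, b in zip(pop1, pop2)) / n_sites
pi1 = sum(pi_site(a, n1) for a in pop1) / n_sites
pi2 = sum(pi_site(b, n2) for b in pop2) / n_sites
fst = hudson_fst((pi1 + pi2) / 2, dxy)

print(f"dxy = {dxy:.4f}, mean within-pop pi = {(pi1 + pi2) / 2:.4f}, Fst = {fst:.3f}")

# The pair of values quoted in this post, plugged into the same relation.
print(f"implied within-pop pi = {implied_within_pi(0.0647012529439746, 0.7393705489876253):.4f}")
```

Under these assumptions, and only if both statistics were computed over the same sites, the values quoted above would correspond to a mean within-population pi of roughly 0.017, about a quarter of Dxy. A Dxy that looks small in absolute terms can therefore sit alongside a high Fst whenever within-population diversity is smaller still.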

I am attaching a subset of the VCF, the populations file and sharing the command I use.
Any help is greatly appreciated, thank you!!

All the best,
Marina

Command for pixy:
pixy --stats pi fst dxy --vcf TT_bcftools_for_pixy_NOMINIMUMALLELEFREQ_mtDNA_genes_invds.recode.vcf.gz --chromosomes 'NC_017750_Trichuris_trichiura_mitochondrion_complete_genome' --populations TT_INDVs.pops --window_size 20000 --n_cores 4 --bypass_invariant_check 'yes' --fst_type 'hudson' --output_prefix TT_indvs_pixy_output_hudson

Files to reproduce result:
TT_bcftools_for_pixy_NOMINIMUMALLELEFREQ_mtDNA_genes_invds_n5000.recode.vcf.gz

TT_INDVs.pops.txt

TT_indvs_pixy_output_hudson_fst.txt
TT_indvs_pixy_output_hudson_dxy.txt

@MarinaSci added the "help wanted" (Extra attention is needed) label Nov 17, 2023
@ksamuk
Owner

ksamuk commented Nov 27, 2023

Hi Marina,

I'm unfortunately not able to help with the biological interpretation of your data. I will say that mitochondria tend to have different evolutionary dynamics/diversity than the nuclear genome, and low sample sizes could result in high variance in all estimates of pi/dxy/FST.
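
To make the sample-size point concrete, here is a minimal simulation sketch. It is not taken from pixy; it assumes a single biallelic site with a known true allele frequency and uses made-up function names. It draws n alleles repeatedly, computes the unbiased per-site pi each time, and reports how much the estimate varies between draws.

```python
import random
import statistics


def pi_estimate(p, n, rng):
    """Sample n alleles at a site with true alt frequency p; return unbiased per-site pi."""
    k = sum(rng.random() < p for _ in range(n))  # number of alt alleles drawn
    p_hat = k / n
    return 2 * p_hat * (1 - p_hat) * n / (n - 1)


def sampling_sd(p, n, reps=5000, seed=1):
    """Standard deviation of the pi estimate across repeated samples of n alleles."""
    rng = random.Random(seed)
    estimates = [pi_estimate(p, n, rng) for _ in range(reps)]
    return statistics.pstdev(estimates)


for n in (2, 4, 10, 20):
    print(f"n = {n:2d} alleles: sd of per-site pi estimate = {sampling_sd(0.3, n):.3f}")
```

The spread shrinks as n grows, which is the sense in which estimates from very small samples are noisy. With n=1 alleles the estimator is undefined (division by n - 1), so the function above should only be called with n >= 2.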

If you have any issues running or installing pixy, or believe you've discovered an error, please reopen the issue.

All the best,

Kieran

@ksamuk closed this as completed Nov 27, 2023