issue about mergeGeno.py and parseVCFs.py #42

willright28 · 2020-10-26T06:33:29Z

Hi dear @simonhmartin
Thanks for these useful scripts! Recently when I used the mergeGeno.py to combine two geno files(they were prduced by your parseVCF.py and worked well), but somehow the result contained no information, just like this:
#CHROM POS A1 A2 A3 chr1 1 N N N
And the result was same to the outcome of parseVCFs.py(I tried to produce a geno file including all samples but failed).
Here is the code I used:python mergeGeno.py -i A_pop.geno.gz -i B_pop.geno.gz -f my_reference.fai --method all | bgzip > out.geno
So I wonder if there are anything wrong with my code or samples?
Thanks in advanced!!!

The text was updated successfully, but these errors were encountered:

simonhmartin · 2020-10-27T12:51:16Z

Hi,
I'm not sure what the issue might be. Could you share the top 3 lines from each of your geno files that you want to merge?

willright28 · 2020-10-28T08:59:11Z

Hi,
I'm not sure what the issue might be. Could you share the top 3 lines from each of your geno files that you want to merge?

Hi,
I have deleted them... but they just looked like the standard geno file demonstrated by you.
BTW, I put my sample together in one vcf and ran the popgenWindows.py with parameter --popsFile and -p for separating the population. It worked finally, but the result of pi, fst and dxy are quite different from vcftools's. I set the same windows and steps for the two methods, it really confuses me.

Please let me show you one example:
For pi by vcftools:
CHROM BIN_START BIN_END N_VARIANTS PI
scaffold_36 1 50000 579 0.00303697

For pi by popgenWindows.py:
scaffold,start,end,mid,sites,pi_sc
scaffold_36,1,50000,26960,582,0.2749

You could see the two result have huge difference, but actually the CHROM and window are same and the number of variants alike. I have no ideas what happen to this...

It would be really appreciate if you could share your valuable suggestions, looking forward to your reply.

simonhmartin · 2020-10-28T09:26:24Z

Regarding the pi value, the difference is probably because you have only variant sites. One of the major differences between my approach and vcftools, is that my scripts don't assume that missing sites are invariant. In other words, you need to have both the variant and invariant sites in the geno file to get a meaningful pi value. There is some discussion of these issues in this preprint: https://doi.org/10.1101/2020.06.27.175091

For Fst it should have less effect, but note that the specific details of how Fst and pi is calculated can also differ between approaches, so you will see some variation. For example, the approaches can differ in how they handle sites that have incomplete data.

willright28 · 2020-10-28T10:49:41Z

Regarding the pi value, the difference is probably because you have only variant sites. One of the major differences between my approach and vcftools, is that my scripts don't assume that missing sites are invariant. In other words, you need to have both the variant and invariant sites in the geno file to get a meaningful pi value. There is some discussion of these issues in this preprint: https://doi.org/10.1101/2020.06.27.175091

For Fst it should have less effect, but note that the specific details of how Fst and pi is calculated can also differ between approaches, so you will see some variation. For example, the approaches can differ in how they handle sites that have incomplete data.

Thanks so much for the help!
BTW, when I wanna take a look of the help information of some scripts, like ABBABABAwindows.py, it said:
line 64 print("Sorter received result", resNumber, file=sys.stderr) SyntaxError: invalid syntax
-------------------------------------------------------^
the‘^’ is point to the equal mark
Could you take a look of it? Or maybe I used it in a wrony way...

simonhmartin · 2020-10-28T11:48:00Z

You get this error with the command python ABBABABAwindows.py -h?
That is strange. I can't reproduce the error.

simonhmartin · 2020-10-28T11:48:40Z

Oh I see it's a python version error. You need to use Python 3

emgiles · 2024-01-17T13:05:51Z

Hi willright and Simon,

I am having the same issue as the one stated at the top of this thread. Willright, did you ever resolve this? I get the errors "Could not retrieve index file" and "Could not load .tbi/.csi index" when I try to convert my vcf to geno using the script parseVCFs.py. Below you can see my command and top several lines of the vcf. advice?? running on python 3

nohup python VCF_processing/parseVCFs.py -i scurra_viridula_zebrina_pileup_05jan2024.vcf.gz --skipIndels --minQual 30 --gtf flag=DP min=5 --threads 30 --fai jasmine-uni1728-mb-hirise-3bs35_08-29-2020__hic_output.fasta.fai > scurra_viridula_zebrina_pileup_17jan2024.geno &

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.9+htslib-1.9
##bcftoolsCommand=mpileup --threads 20 --skip-indels -q 30 -Q 20 -f /path/to/genome/jasmine-uni1728-mb-hirise-3bs35_08-29-2020__hic_output.fasta -Ov -o scurra_viridula_zebrina_pileup_05jan2024.vcf
names_of_124.bam
##reference=file:///path/to/genome/jasmine-uni1728-mb-hirise-3bs35_08-29-2020__hic_output.fasta
##contig=<ID=Sc1nRTI_1;HRSCAF=9,length=9724429>
##contig=<ID=Sc1nRTI_2;HRSCAF=53,length=43341626>
##contig=<ID=Sc1nRTI_3;HRSCAF=77,length=26245687>
##contig=<ID=Sc1nRTI_4;HRSCAF=306,length=762119>
##contig=<ID=Sc1nRTI_5;HRSCAF=352,length=47867>

simonhmartin · 2024-01-17T13:32:53Z

Do you have bgzip and tabix installed. parseVCFs.py only runs on a bgzipped vcf. Otherwise you can use the single-thread parseVCF.py.

emgiles · 2024-01-17T13:42:48Z

tabix is from samtools, correct? samtools is installed.
I definitely used something else to zip the vcf. Perhaps that is the problem. Will try again with bgzip.
Thank you for the quick reply!!

emgiles · 2024-01-19T14:17:33Z

I discovered that parseVCFs.py only works if you have previously indexed the vcf with tabix. This requirement is not clearly written in the readme. Thanks!

hrluo93 mentioned this issue Apr 16, 2024

--minSites set #114

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

issue about mergeGeno.py and parseVCFs.py #42

issue about mergeGeno.py and parseVCFs.py #42

willright28 commented Oct 26, 2020

simonhmartin commented Oct 27, 2020

willright28 commented Oct 28, 2020 •

edited

Loading

simonhmartin commented Oct 28, 2020

willright28 commented Oct 28, 2020 •

edited

Loading

simonhmartin commented Oct 28, 2020

simonhmartin commented Oct 28, 2020

emgiles commented Jan 17, 2024

simonhmartin commented Jan 17, 2024

emgiles commented Jan 17, 2024

emgiles commented Jan 19, 2024

issue about mergeGeno.py and parseVCFs.py #42

issue about mergeGeno.py and parseVCFs.py #42

Comments

willright28 commented Oct 26, 2020

simonhmartin commented Oct 27, 2020

willright28 commented Oct 28, 2020 • edited Loading

simonhmartin commented Oct 28, 2020

willright28 commented Oct 28, 2020 • edited Loading

simonhmartin commented Oct 28, 2020

simonhmartin commented Oct 28, 2020

emgiles commented Jan 17, 2024

simonhmartin commented Jan 17, 2024

emgiles commented Jan 17, 2024

emgiles commented Jan 19, 2024

willright28 commented Oct 28, 2020 •

edited

Loading

willright28 commented Oct 28, 2020 •

edited

Loading