import vcf

vcf_reader = vcf.Reader(open(path, "r"), strict_whitespace=True)
vcf_writer = vcf.Writer(open("new.vcf", "w"), vcf_reader, lineterminator="\n")
for record in vcf_reader:
    vcf_writer.write_record(record)
When I run this code, the last 11 lines are missing from the output. I understand that some of the meta lines get cleaned up, but I have no idea what is causing this. Is there a fix, or any information about what I am doing wrong?
Hi @omergerdan, thanks for your report. Could you provide some additional information, i.e., the file you used for testing and which lines you are missing?
I used a VCF file with 200k+ lines. The meta lines and most records are written correctly, but the last 11 lines are missing from the new file.
I wanted to check whether I could rewrite my VCF correctly with PyVCF, but couldn't.
There is nothing special in the VCF file at the point where it fails, just regular lines: chrom ID, pos, db ID, etc.
I'm afraid we really need a concrete case of this to analyse the problem further.
Actually, our unit tests include some simple cases where it is asserted that the writer outputs exactly the records it was given. So there is probably something specific in either your setup or your input file (but of course it could still be a bug in PyVCF).
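A round-trip check of the kind described above can also be run on the two files directly, without PyVCF. The following is a minimal sketch, not PyVCF's own test code: it compares the data lines (those not starting with `#`) of the original and rewritten files and lists the records absent from the rewritten one. The function names `data_lines` and `missing_records` are hypothetical, and matching on the first five columns is an assumption made because a writer may legitimately reformat the remaining fields.

```python
from collections import Counter


def data_lines(path):
    """Return the non-empty, non-header lines of a VCF file without newlines."""
    with open(path, "r") as handle:
        return [
            line.rstrip("\r\n")
            for line in handle
            if line.strip() and not line.startswith("#")
        ]


def missing_records(original_path, rewritten_path):
    """Return data lines of the original that have no match in the rewritten file.

    Assumption: two lines match when their first five tab-separated columns
    (CHROM, POS, ID, REF, ALT) are identical.
    """
    def key(line):
        return tuple(line.split("\t")[:5])

    # Count how often each key occurs in the rewritten file.
    remaining = Counter(key(line) for line in data_lines(rewritten_path))
    missing = []
    for line in data_lines(original_path):
        k = key(line)
        if remaining[k] > 0:
            remaining[k] -= 1  # consume one matching record
        else:
            missing.append(line)
    return missing
```

Calling `missing_records(path, "new.vcf")` after the rewrite would show exactly which records were dropped, which is the information asked for earlier in this thread.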
There is nothing really specific at the position. I will run some more tests and post them here hopefully next week if we can't come up with a solution.
Is it consistent? It would be nice if you could come up with an example of a small file that gives the problem.
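One way to build such a small file, sketched with the standard library only, is to keep the full header plus the last few records of the large file, since the missing lines are at the end. The function name `make_small_vcf` and the default of 20 records are hypothetical choices for illustration.

```python
from collections import deque


def make_small_vcf(source_path, target_path, n_records=20):
    """Copy all header lines and the last n_records data lines to target_path.

    Returns the number of data lines written.
    """
    header = []
    tail = deque(maxlen=n_records)  # retains only the most recent n_records lines
    with open(source_path, "r") as source:
        for line in source:
            if line.startswith("#"):
                header.append(line)
            else:
                tail.append(line)
    # The with-block closes the file, so everything is flushed to disk.
    with open(target_path, "w") as target:
        target.writelines(header)
        target.writelines(tail)
    return len(tail)
```

If rewriting this small file still loses records, it would make a compact test case. If it does not, the problem may depend on the file size or on the setup rather than on specific records.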
I ran it twice and got the same result both times, so I would say it is consistent. However, I was busy at the time, so to be sure I would rather rerun it before posting samples and describing the problem more clearly.
VCF writing kind of snuck into this project without proper development; in a way it was too easy. @omergerdan, could we please have a test case? It would greatly help us do this properly.