bcftools can't parse CONTIG ID containing a comma #266
* allow spaces between keys and values when parsing header lines
* these spaces will be dropped when writing out the header, e.g. `##reference=<ID=hs37d5, Source=blah>` and `##reference=<ID=hs37d5, Source = blah>` will become `##reference=<ID=hs37d5,Source=blah>`

Fixes samtools/bcftools#266

* allow spaces between keys and values when parsing header lines
* these spaces will be dropped when writing out the header, e.g. `##reference=<ID=hs37d5 , Source=blah>` and `##reference=<ID=hs37d5, Source = blah >` will become `##reference=<ID=hs37d5,Source=blah>`

Fixes samtools/bcftools#266
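The whitespace handling described in these commits can be sketched in Python. This is a minimal illustration under assumptions, not htslib's C implementation: `parse_structured_header` and `format_structured_header` are hypothetical names. The plain comma split ignores quoting on purpose, so an unquoted contig ID containing a comma (the subject of this issue) would still be cut into a fragment with no `=`.

```python
import re

def parse_structured_header(line):
    """Parse '##key=<k1=v1,k2=v2,...>' into (key, dict), tolerating spaces
    around keys and values (hypothetical helper, not the htslib API)."""
    m = re.fullmatch(r"##([^=]+)=<(.*)>", line.strip())
    if not m:
        raise ValueError(f"not a structured header line: {line!r}")
    key, body = m.group(1), m.group(2)
    fields = {}
    for item in body.split(","):  # naive split: no handling of quoted values
        k, sep, v = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        fields[k.strip()] = v.strip()  # drop the surrounding spaces
    return key, fields

def format_structured_header(key, fields):
    """Write the line back out without the extra spaces."""
    body = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"##{key}=<{body}>"

if __name__ == "__main__":
    for raw in ("##reference=<ID=hs37d5, Source=blah>",
                "##reference=<ID=hs37d5 , Source = blah >"):
        # both print ##reference=<ID=hs37d5,Source=blah>
        print(format_structured_header(*parse_structured_header(raw)))
```

Splitting each item on its first `=` only keeps any later `=` inside the value intact.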
This is a similar bug that I just found with @andrewdhuang. We tried
We believe that it is attempting to put the
Please forgive me if this is the wrong place to report this bug.
Hi, this is an issue in htslib, but the report here is OK. It is now fixed by pd3/htslib@45379c2 and pd3/htslib@d04b77a. (This test file in bcftools will need to be updated after this is merged, because the test fills in sequence names on the fly: https://github.com/samtools/bcftools/blob/develop/test/isec.tab.out) Cheers
Original bug auto-closed now that samtools/htslib#214 has been merged; re-opening this issue to track the new comma/quoting issue.
…e,1">

* fixes a parsing problem when there is a comma in the contig name (closes samtools/bcftools#266)
* when injecting contigs into the header from the index, use quoted contig IDs
* some bcftools test output files will need to be updated if this is pulled in
* there are places in at least bcftools/vcfconvert.c and samtools/bam_plcmd.c where contig header lines are created; these should also be changed to use quoted IDs
* still an open issue: what happens if there is a `"` in the contig name
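The quoting approach in this commit can be sketched in Python. This is an illustration under assumptions, not the htslib code: every function name is hypothetical, and escaping `"` and `\` with a backslash inside the quotes is an assumed answer to the open question about a `"` in the contig name, borrowed from how the VCF specification treats quoted strings such as Description.

```python
import re

def split_fields(body):
    """Split the text inside <...> on commas that are outside double quotes.
    Inside quotes, a backslash escapes the next character (assumed convention)."""
    fields, buf = [], []
    in_quotes = escaped = False
    for ch in body:
        if escaped:
            buf.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == '"':
            buf.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    if in_quotes:
        raise ValueError("unterminated quoted value")
    fields.append("".join(buf))
    return fields

def unquote(value):
    """Strip surrounding double quotes and undo backslash escapes."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        return value
    out, chars = [], iter(value[1:-1])
    for ch in chars:
        # a backslash takes the following character literally
        out.append(next(chars, ch) if ch == "\\" else ch)
    return "".join(out)

def quote_id(name):
    """Quote a contig name, escaping backslashes first, then double quotes."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'

def contig_line(name, length):
    """Build a ##contig header line with a quoted ID."""
    return f"##contig=<ID={quote_id(name)},length={length}>"

def parse_contig(line):
    """Parse a ##contig header line into a dict, honouring quoted values."""
    m = re.fullmatch(r"##contig=<(.*)>", line.strip())
    if not m:
        raise ValueError(f"not a contig line: {line!r}")
    fields = {}
    for item in split_fields(m.group(1)):
        key, sep, value = item.partition("=")
        if not sep:
            # e.g. the fragment after the comma in an unquoted ID
            raise ValueError(f"could not parse field {item!r}")
        fields[key.strip()] = unquote(value.strip())
    return fields

if __name__ == "__main__":
    line = contig_line("BCAT1,_pyridoxal", 1650)  # hypothetical short name
    print(line)                # ##contig=<ID="BCAT1,_pyridoxal",length=1650>
    print(parse_contig(line))  # {'ID': 'BCAT1,_pyridoxal', 'length': '1650'}
```

With a quoted ID, the comma in a name like `..._BCAT1,_pyridoxal_...` survives the round trip, while the unquoted form is rejected instead of being silently split into an extra field.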
I believe this is now solved; please reopen if not.
I ran bcftools on VCF files generated by freebayes, but it had problems parsing the header lines (see the commands and error messages below). I used bcftools version 1.8, but sorted the files with samtools version 1.3. Could that be a problem?

bcftools consensus -f ../../Euc_RefSeq.fas -I test.vcf.gz -o test.unmask.fa
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=Egrandis_33521_Branched_chain_aminotransferase_BCAT1,_pyridoxal_phosphate_enzymes_type_IV_superfamily,length=1650>"
[W::bcf_hdr_parse] Could not parse header line: ##contig=<ID=Egrandis_33521_Branched_chain_aminotransferase_BCAT1,_pyridoxal_phosphate_enzymes_type_IV_superfamily,length=1650>
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=Egrandis_contig_40711_Uncharacterized_membrane_protein,_predicted_efflux_pump,length=1488>"
[W::bcf_hdr_parse] Could not parse header line: ##contig=<ID=Egrandis_contig_40711_Uncharacterized_membrane_protein,_predicted_efflux_pump,length=1488>
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=Egrandis_contig_40796__Protein_of_unknown_function_(DUF3675)_Zinc_finger,C3HC4_type(RING_finger),length=810>"
[W::bcf_hdr_parse] Could not parse header line: ##contig=<ID=Egrandis_contig_40796__Protein_of_unknown_function_(DUF3675)_Zinc_finger,C3HC4_type(RING_finger),length=810>
....
Note: the --sample option not given, applying all records The fasta sequence does not match the REF allele at cpl_Euc_Mauve_Alignment_extraction_ndhF:196: .vcf: [N] .vcf: [N] <- (ALT) .fa: [M]GAGTTCGGTCACTTAATAGATCCACTTACTTCTATTATGTTAATATTAATTACTACTGTTGGAATTTTGGTTCTTTTTTATAGTGACAATTATATGTCTCATGATCAAGGATATTTGAGATTTTTTGCTTATATGAGTTTTTTCAATACTTCCATGTTGGGATTAGTTACTAGTTCGAATTTGATACAAATTTATATTTTTTGGGAATTAGTTGGAATGTGTTCTTATCTATTAATAGGTTTTTGGTTCACACGACCTAGTGCGGCGACTGCTTGTCAAAAAGCGTTTGTAACGAATCGTGTAGGCGATTTTGGTTTATTATTAGGAATTTTAGGTCTTTATTGGATAACCGGTAGTTTTGAATTTCGGGATTTGTTCCAAATATTGAATAACTTGATTTATAATAATGAGGTTCCTTTTTTATTTCTTACTTTGTGTGCCTTTCTTTTATTTGCAGGTGCAGTTGCGAAATCGGCACAATTCCCCCTTCATGTATGGTTACCTGATGCCATGGAAGGCCCTACTCCCATTTCGGCTCTTATACATGCCGCTACTATGGTAGCAGCGGGCATTTTTCTTGTAGCTCGACTTCTTCCTCTTTTTATAATCATACCTTACATAATGAATTTCATATCTTTAATAGGTATAATAACAGTATTATTAGGAGCTACTTTAGCTCTTGCTCAAAAAGATATTAAAAGAGGTTTAGCTTATTCTACAATGTCTCAATTGGGTTATATGATGTTAGCTCTAGGTATGGGGTCTTATCGAGCCGCTTTATTTCATTTGATTACTCATGCTTATTCAAAAGCATTGTTGTTTTTAGGATCCGGATCAATTATTCATTCAATGGAAGCTATTGTTGGATATTCTCCAGATAAAAGTCAGAATATGGTTCTTATGGGAGGTTTAAAAAAGCATGTACCAATTACAAAAACTGCTTTTTTAGTAGGTACACTTTCTCTTTGTGGTATTCCCCCCCTTGCTTGTTTTTGGTCCAAAGATGAAATTCTTAATGATAGTTGGTTGTATTCACCTATTTTCGCAATAATAGCTTGTTCCACAGCAGGATTAACCGCATTTTATATGTTTCGAATCTATTTMCTTACTTTTGAGGGACATTTCAATGTTCATTTTCAAAATTACAATGGTCAAAAAAGTAGTTCCTGCTATTCAATATCTCTATGGGGAAAAGAAGTGCCAAAAAYGATTAAAAATCATTTTTGTTTATTAAGTTTATTRACAATGAATAATAATGAAAGGRCTTCTTTTTTTTCGAATAARACATATCAAATTGATGGTAATGGAAAAAACAGGATACGYCCTTTTATTACTATTACTMATTTTGTCACTAAAAAWACTTTCTCTTATCCTCATGAATCGGACAATACCATGTTRTTTTCTATGGTTATATTAGTGYYATTTACTTTGTTTGTTGGGGTCGTAGGAATTCCCTTTGCTTTTAATCAAGAAGAAATTCATTTGGATATATTATCTAAATTGTTAAATCCGTCTATAAACCTTTTACATCCGAATTCAAATAATTCGGTGGATTGGTATGAATTTGTGACAAATGCAAGTTTTTCTGTCAGWATAGCTTTTTTCGGAATATTTATAGSGTCTTTTTTATATAASCCTATTTATTCATCTTTACAAAATTTGAACTTACTRAATTCGTTTTCTAAAAGAGGTYCTAATMGAATTTTAGGGGACAGAATAAGAAATGGGATATATGATTGGTCATATAATCGTGGTTACATAGATGCTTTTTATACAATAYCCTTAACTCAGGGTATAAGAGGACTAGC
TGAACTAATTCATTTTTTGGATAGACGASTAATTGATGGAATTACGAATGGTYTCG

bcftools consensus -f ../../Euc_RefSeq.fas -I -m test-region.bed test.vcf.gz -o test.mask.fa
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=Egrandis_33521_Branched_chain_aminotransferase_BCAT1,_pyridoxal_phosphate_enzymes_type_IV_superfamily,length=1650>"
[W::bcf_hdr_parse] Could not parse header line: ##contig=<ID=Egrandis_33521_Branched_chain_aminotransferase_BCAT1,_pyridoxal_phosphate_enzymes_type_IV_superfamily,length=1650>
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=Egrandis_contig_40711_Uncharacterized_membrane_protein,_predicted_efflux_pump,length=1488>"
[W::bcf_hdr_parse] Could not parse header line: ##contig=<ID=Egrandis_contig_40711_Uncharacterized_membrane_protein,_predicted_efflux_pump,length=1488>
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=Egrandis_contig_40796__Protein_of_unknown_function_(DUF3675)_Zinc_finger,C3HC4_type(RING_finger),length=810>"
Could not parse bed line: cpl_Egrandis_20012500 480
Failed to initialize mask regions
Still having issues in 2021:
[E::bcf_hdr_parse_line] Could not parse the header line: "##contig=<ID=JF781502 Human rhinovirus B strain HRV-B84_p1098_sR861_2008 polyprotein gene, complete cds,length=6941>"
Any solutions?
I have VCF files with this line in the header:
bcftools fails to parse the file because of the space after the comma (before `Source=`). It works when I delete the space.
I get this error: