[W::vcf_parse] Contig '2' is not defined in the header. (Quick workaround: index the file with tabix.) #262

Closed
LiaOb21 opened this issue Dec 1, 2022 · 4 comments

@LiaOb21

LiaOb21 commented Dec 1, 2022

Dear developers,

I used pggb to obtain a pangenome of Arabidopsis from 17 assemblies.
I renamed the scaffolds following the PanSN-spec naming pattern; here are some examples of what the scaffold names look like in my input file (pggb_input.fasta):

>An-1#001#CABPTH030000098
>An-1#001#CABPTH030000099
...
>Cvi#001#LR699763
>Cvi#001#LR699764
...
>Eri-1#001#CABPTL010000072
>Eri-1#001#CABPTL010000073
...
>MyArabidopsis#001#scaffold_5
>MyArabidopsis#001#scaffold_6
...
>TAIR10#001#1
>TAIR10#001#2

I then followed the sequence partitioning tutorial without any errors.
Now, when I run pggb on a single community, most of it works (I get the odgi images, etc.), but it stops at this point:

pggb -i pggb_input.community.1.fa.gz -o output_c1 -n 17 -t 16 -V 'TAIR10:#:1000'

...
[vg::deconstruct] decompose VCF
vcfbub -l 0 -a 1000 --input output_c1/pggb_input.community.1.fa.gz.7659a9d.417fcdf.171f02d.smooth.final.TAIR10.vcf.gz
6.41s user 0.18s system 9% cpu 72.58s total 11868Kb max memory
vcfwave -I 1000 -t 16
678.04s user 16.98s system 957% cpu 72.61s total 428340Kb max memory
[W::vcf_parse] Contig '2' is not defined in the header. (Quick workaround: index the file with tabix.)
Encountered an error, cannot proceed. Please check the error output above.
If feeling adventurous, use the --force option. (At your own risk!)

Of course, I tried indexing the file pggb_input.community.1.fa.gz.7659a9d.417fcdf.171f02d.smooth.final.TAIR10.vcf.gz with tabix, but this doesn't solve the issue.

I should also mention that I used my own Perl script to rename the scaffolds: could this be a problem, or do the names look okay?
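
One quick way to sanity-check renamed headers is to split each one into its three '#'-separated fields. The following is a minimal sketch, assuming the PanSN pattern is sample, haplotype and contig joined by '#' with a numeric haplotype field; the function name `split_pansn` and the regular expression are illustrative and not part of pggb.

```python
import re

# Assumption: sample#haplotype#contig, numeric haplotype, no '#' or whitespace
# inside a field. Adjust the pattern if your naming convention differs.
PANSN = re.compile(r"^(?P<sample>[^#\s]+)#(?P<haplotype>\d+)#(?P<contig>[^#\s]+)$")


def split_pansn(header):
    """Return (sample, haplotype, contig) for a FASTA header, or None if it does not fit."""
    fields = header.lstrip(">").split()
    if not fields:
        return None
    match = PANSN.match(fields[0])
    return match.group("sample", "haplotype", "contig") if match else None


if __name__ == "__main__":
    for header in (">An-1#001#CABPTH030000098", ">TAIR10#001#1", ">scaffold_5"):
        print(header, "->", split_pansn(header))
```

Under these assumptions the example names listed above all split cleanly into three fields, while a bare name such as `>scaffold_5` is reported as not matching.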

In addition, I have seen the partition-before-pggb.sh script, but for some reason it is not working.
I copied it from GitHub into nano and then ran chmod +x to make it executable. When I try to use it, it doesn't give any error, but I also don't get any output. Any advice on how to get it working?

Thank you so much in advance!

Best regards,

Lia

@AndreaGuarracino
Member

Hi @LiaOb21, could you please share a small piece of data to reproduce the problem? Or could you share the output_c1/pggb_input.community.1.fa.gz.7659a9d.417fcdf.171f02d.smooth.final.TAIR10.vcf.gz file?

As for partition-before-pggb, it should be equivalent to pggb, so both should work. I don't know what is going wrong at the moment. Have you run any other tests?

@LiaOb21
Author

LiaOb21 commented Dec 2, 2022

Hi @AndreaGuarracino,

I downloaded the S. cerevisiae data to test the pipeline, as explained in your Sequence partitioning tutorial.

1. partition-before-pggb

I fixed the partition-before-pggb problem.
I'll explain how, in case this is helpful for someone else:

I installed pggb through guix-genomics (about 2 months ago), and some packages were not installed automatically: wfmash and python-igraph (I installed both through conda).

Another important point is that, because I installed pggb through guix-genomics, I don't have the directories partition-before-pggb looks for. So I had to clone the repository from GitHub and make all the scripts in the ~/Software/pggb/scripts directory executable (with chmod +x). Then I ran the partition-before-pggb script using its full path:

~/Software/pggb/partition-before-pggb -i scerevisiae7.fasta.gz -o output -n 7 -t 16 -p 90 -s 5k -V 'S288C:#:1000'
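
Since the silent failure came down to missing programs, a small pre-flight check can make that visible before a run. This is only a sketch: the tool names are the ones mentioned in this thread, not an official pggb dependency list, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumption: names collected from this thread, for illustration only.
TOOLS = ["pggb", "wfmash", "seqwish", "smoothxg", "gfaffix",
         "odgi", "vg", "vcfbub", "vcfwave", "bcftools"]


def missing_tools(names, which=shutil.which):
    """Return the names for which `which` finds no executable on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for name in missing_tools(TOOLS):
        print("not on PATH:", name)
```

The `which` parameter is injectable only so the function can be exercised without the real tools installed. Python modules such as python-igraph are not executables and would need a separate import check.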

2. [W::vcf_parse] Contig '[name]' is not defined in the header. (Quick workaround: index the file with tabix.)

I encountered the same error using the S. cerevisiae data. I tried to run pggb on community.0. The command I used is this:

pggb -i scerevisiae7.fasta.gz.e363e5b.community.0.fa -o output -n 7 -t 16 -V 'S288C:#:1000'

And here is the error:

[vg::deconstruct] decompose VCF
vcfbub -l 0 -a 1000 --input output/scerevisiae7.fasta.gz.e363e5b.community.0.fa.f5e9856.417fcdf.7493449.smooth.final.S288C.vcf.gz
0.04s user 0.00s system 15% cpu 0.28s total 3232Kb max memory
vcfwave -I 1000 -t 16
3.92s user 0.07s system 967% cpu 0.41s total 72388Kb max memory
[W::vcf_parse] Contig 'chrI' is not defined in the header. (Quick workaround: index the file with tabix.)
Encountered an error, cannot proceed. Please check the error output above.
If feeling adventurous, use the --force option. (At your own risk!)

Is there anything wrong with my command? Or do I perhaps need some other package?

I installed separately:

  • vg, by downloading the repository from GitHub (version v1.44.0 "Solara").
  • vcfbub through cargo (vcfbub 0.1.0).
  • bcftools (I previously had an old version) through miniconda (bcftools 1.16).

To try to solve the issue (even though I think these were already installed with the guix installation of pggb, since I didn't get errors), I also installed:

  • seqwish through miniconda (0.7.7)
  • smoothxg through miniconda (0.6.7)
  • gfaffix through miniconda (0.1.4)

I attach the .vcf.gz file.

Thank you so much in advance!

scerevisiae7.fasta.gz.e363e5b.community.0.fa.f5e9856.417fcdf.7493449.smooth.final.S288C.vcf.gz

@AndreaGuarracino
Member

@LiaOb21, I think I have understood your problem!

In the file you've shared, you have this line:

##contig=<ID=S288C#1#chrI#0,length=219929>

where #0 was added at the end of the reference name (which is S288C#1#chrI). Moreover, the CHROM column in the VCF contains chrI and not that contig ID (which is S288C#1#chrI#0).
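
The mismatch described here can be detected mechanically by comparing the IDs declared in `##contig` header lines with the values in the CHROM column. The sketch below assumes a plain or gzip-compressed text VCF; the function names are hypothetical and this is not how pggb or bcftools performs the check.

```python
import gzip
import re

# Captures the ID field of a '##contig=<...>' header line.
CONTIG_ID = re.compile(r"^##contig=<(?:.*,)?ID=([^,>]+)")


def undefined_contigs(lines):
    """Return CHROM values that have no matching ##contig ID in the header."""
    declared, used = set(), set()
    for line in lines:
        if line.startswith("##contig="):
            match = CONTIG_ID.match(line)
            if match:
                declared.add(match.group(1))
        elif line.strip() and not line.startswith("#"):
            # First tab-separated column of a record is CHROM.
            used.add(line.rstrip("\n").split("\t", 1)[0])
    return used - declared


def undefined_contigs_in_file(path):
    """Same check for a .vcf or .vcf.gz file on disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return undefined_contigs(handle)
```

For a file with the header line quoted above and records whose CHROM is chrI, this would report chrI as undefined, which matches the warning text.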

This happened because you've used a version of vg later than 1.40.0. In the Docker/Singularity/Bioconda versions of pggb, we are sticking with vg 1.40.0. At the moment, we do not recommend using vg versions later than 1.40.0 to avoid these problems. Please use vg 1.40.0 and let me know if the issue is resolved!
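
To guard against this in a script, the installed version string can be compared with the pinned one. This is a minimal sketch: it assumes the version appears as three dot-separated numbers somewhere in the string you pass in (for example v1.44.0 "Solara", as reported earlier in the thread), and both function names are hypothetical.

```python
import re

PINNED = (1, 40, 0)  # vg version the maintainers say they stick with in this thread


def parse_version(text):
    """Extract (major, minor, patch) from a string such as 'v1.44.0 "Solara"'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in: " + repr(text))
    return tuple(int(part) for part in match.groups())


def is_newer_than_pinned(text, pinned=PINNED):
    """True if the version in `text` is later than the pinned version."""
    return parse_version(text) > pinned
```

With this, `is_newer_than_pinned('v1.44.0 "Solara"')` is true and `is_newer_than_pinned("1.40.0")` is false, so a wrapper could warn before running the VCF steps.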

@LiaOb21
Author

LiaOb21 commented Dec 13, 2022

Hi @AndreaGuarracino,

Sorry for the late reply!
Apparently changing the version of vg resolved the problem. I am using 1.40.0 now and I don't get that error anymore.

Thank you so much for your help.

Best regards,

Lia

LiaOb21 closed this as completed Dec 13, 2022