Problem processing non-NCBI generated GBK files #233

Closed
HilbigA opened this issue Jun 1, 2023 · 5 comments

HilbigA commented Jun 1, 2023

Hi,

I am trying to run Panaroo on a collection of .gbk files generated by Prokka (without the --compliant flag). I am running Panaroo 1.3.3 with Python 3.9.12 and using the --remove-invalid-genes flag.

I have run the same command on .gff files from the Prokka output without a problem. Unfortunately, half of my dataset was provided to me as .gbk files (from Prokka, but not processed by me, so I don't have the corresponding gff files).

Is there something I can do to make use of my gbk files? Otherwise, should I pre-process them with a script like convert-refseq-to-gff (although they don't originate from RefSeq)?

I am posting the error below:
Thanks a lot for any help,
Antonia

pre-processing gff3 files...
0%| | 0/2 [00:00<?, ?it/s]Problem reading GFF3 file: SRR6327902.scf.gbk
0%| | 0/2 [00:00<?, ?it/s]
Error reading prokka input!
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/panaroo/prokka.py", line 278, in process_prokka_input
    gene_sequence_list = Parallel(n_jobs=n_cpu)(
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/joblib/parallel.py", line 1085, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/joblib/parallel.py", line 901, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/joblib/parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 597, in __init__
    self.results = batch()
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/panaroo/prokka.py", line 127, in get_gene_sequences
    raise RuntimeError("Error reading prokka input!")
RuntimeError: Error reading prokka input!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/bin/panaroo", line 10, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/panaroo/main.py", line 312, in main
    process_prokka_input(args.input_files, args.output_dir,
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/panaroo/prokka.py", line 290, in process_prokka_input
    raise RuntimeError("Error reading prokka input!")
RuntimeError: Error reading prokka input!

HilbigA commented Jun 1, 2023

My apologies, I had not realized that files of any type other than GFF3 need to be provided as a list. Providing a list of the gbk files worked well on a test subset, but I am now running into this error:

.gbk
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/bin/panaroo", line 10, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/panaroo/main.py", line 298, in main
    files.append(create_temp_gff3(line[0], None, temp_dir))
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/panaroo/prokka.py", line 84, in create_temp_gff3
    convert_gbk_gff3(gff_file, temp_dir + "temp_gffs/" + prefix + '.gff', True)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/panaroo/biocode_convert.py", line 150, in convert_gbk_gff3
    gene.add_mRNA(mRNA)
UnboundLocalError: local variable 'gene' referenced before assignment

Any ideas what's causing that? Thanks a lot.
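
Note for readers who hit the same traceback: an UnboundLocalError of this kind typically means a variable is assigned in only one branch of a loop and then used in another branch that ran first. The sketch below is a toy illustration of that pattern, not Panaroo's actual code; the function name link_features and the dictionary layout are hypothetical. It assumes a converter that binds `gene` only when it sees a gene feature and then attaches each mRNA or CDS to it.

```python
def link_features(feature_keys):
    """Toy converter loop: attach each mRNA/CDS to the most recent gene.

    Hypothetical illustration only; this is not taken from Panaroo.
    """
    linked = []
    for key in feature_keys:
        if key == "gene":
            # `gene` is bound only once a gene feature has been seen.
            gene = {"children": []}
            linked.append(gene)
        elif key in ("mRNA", "CDS"):
            # Raises UnboundLocalError if no gene feature came first.
            gene["children"].append(key)
    return linked
```

Calling link_features(["gene", "CDS"]) works, while link_features(["CDS"]) raises UnboundLocalError, which is consistent with a GBK file whose CDS features are not preceded by gene features.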

HilbigA commented Jun 1, 2023

I solved my issue by combining the gff and gbk files and using a list of those as input, which worked.
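
For anyone wanting to build such a list without shell globbing, here is a minimal sketch. It assumes the list file holds one annotation file path per line; the helper name write_input_list and its parameters are hypothetical and not part of Panaroo.

```python
from pathlib import Path


def write_input_list(directory, list_path, suffixes=(".gff", ".gbk")):
    """Write one annotation file path per line and return the paths written.

    Assumption: the list file format is one path per line.
    """
    paths = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

The resulting file can then be passed to panaroo with -i, as in the command shown later in this thread.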

However, I am leaving the issue open because it is not clear to me why one subset of the .gbk files was rejected (see my second post). Is this connected to the --compliant flag in Prokka?

gtonkinhill (Owner) commented

Hi,

Sorry for the slow response and I'm glad you got it running.

If it's possible, it would be great if you could share one of the gbk files that caused this error along with the exact command you ran. This would help me a lot in working out what went wrong.

gtonkinhill self-assigned this Jun 7, 2023
gtonkinhill reopened this Jun 23, 2023

HilbigA commented Jun 23, 2023

Hi, my apologies in turn for the late reply; I must have missed the notification that you had responded. Here is the command I used:
ls *.g* > list.txt
panaroo -i list.txt -o panaroo_out --clean-mode strict --remove-invalid-genes ->

GitHub says that I can't upload files in gbk format. How should I provide one?

The gbk files that worked fine were produced with the fq2dna pipeline (https://gitlab.pasteur.fr/GIPhy/fq2dna); its output routinely includes only gbk files, not gff files. The gbk files that did not work were generated by me by running Prokka on pre-assembled contigs. That Prokka run also gave me gff files, which worked with Panaroo. The command was as follows (where var is my contig.fasta file and pref the basename):
prokka $var --prefix $pref --usegenus --outdir ../ProkkaOut/$pref --kingdom Bacteria --genus Streptococcus --species agalactiae --strain $pref

Thanks a lot for looking into this!

gtonkinhill (Owner) commented

Hi,

Apologies, I lost track of this issue. I have now added extra error messages in v1.3.4 indicating that Panaroo requires GBK files that are compliant with GenBank/ENA/DDBJ.
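
A rough way to screen GBK files before running Panaroo is sketched below. It rests on an assumption that this thread does not confirm: that the rejected files lack gene features, which Prokka adds only with --addgenes (implied by --compliant). The function names are hypothetical, and the check is a plain-text heuristic on the feature table, not a full GenBank parser and not Panaroo's own validation.

```python
import re
from collections import Counter

# In a GenBank flat file, a feature line starts with exactly five spaces,
# then the feature key, padding, and the location. Qualifier lines are
# indented further and therefore do not match.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+\S")


def count_feature_keys(gbk_text):
    """Count feature keys (gene, CDS, tRNA, ...) across all records."""
    counts = Counter()
    in_features = False
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith(("ORIGIN", "CONTIG", "//")):
            in_features = False
        elif in_features:
            match = FEATURE_LINE.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts


def has_gene_for_every_cds(gbk_text):
    """Heuristic: True if there are at least as many gene as CDS features."""
    counts = count_feature_keys(gbk_text)
    return counts["gene"] >= counts["CDS"]
```

If has_gene_for_every_cds returns False for a file, re-annotating with Prokka's --compliant flag or using the matching gff file are the options discussed in this thread.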
