GraftM graft on big metagenome error #277

steff1088 · 2022-04-26T16:25:41Z

Hi all,

I ran into issues running my mcrA package on a big 45 GB metagenome in fastq format. I can't really interpret the error message so I was wondering if you had any ideas. The package runs fine on other metagenomes in fasta and fastq format. @wwood @geronimp

GraftM 0.13.1

                            GRAFT

                   Joel Boyd, Ben Woodcroft

                                                     __/__
                                              ______|
      _- - _                         ________|      |_____/
       - -            -             |        |____/_
       - _     >>>>  -   >>>>   ____|
      - _-  -         -             |      ______
         - _                        |_____|
       -                                  |______

04/23/2022 01:38:19 PM INFO: Working on 11774.2.218915.CGAACTG-ACAGTTC.filter-METAGENOME
Traceback (most recent call last):
File "/home/users/sbuessec/.local/bin/graftM", line 415, in
Run(args).main()
File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/run.py", line 613, in main
self.graft()
File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/run.py", line 388, in graft
diamond_db
File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/timeit.py", line 10, in timed
result = method(*args, **kw)
File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/sequence_searcher.py", line 851, in aa_db_search
hit_reads_orfs_fasta)
File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/sequence_searcher.py", line 943, in search_and_extract_orfs_matching_protein_database
hits
File "/home/users/sbuessec/.local/lib/python3.6/site-packages/graftm/sequence_searcher.py", line 534, in _extract_from_raw_reads
extern.run(extract_cmd, stdin='\n'.join(input_reads))
File "/home/users/sbuessec/.local/lib/python3.6/site-packages/extern/init.py", line 41, in run
raise ExternCalledProcessError(process, command)
extern.ExternCalledProcessError: Command mfqe --output-uncompressed --fasta-read-name-lists /dev/stdin --input-fasta <(awk '{print ">" substr($0,2);getline;print;getline;getline}' '11774.2.218915.CGAACTG-ACAGTTC.filter-METAGENOME.fastq') --output-fasta-files '/tmp/_raw_extracted_reads.famb1zbzrb' returned non-zero exit status 101.
STDERR was: b"[2022-04-23T20:45:46Z INFO mfqe] Read in 223 read names from /dev/stdin\n[2022-04-23T20:45:46Z INFO mfqe] Iterating input FASTQ file\n[2022-04-23T20:47:38Z INFO mfqe] Extracted 446 reads from 120829412 total\nthread 'main' panicked at 'Mismatching numbers of read names were observed. Expected:\n[223]\nbut found\n[446]', src/main.rs:333:9\nnote: run with RUST_BACKTRACE=1 environment variable to display a backtrace\n"STDOUT was: b''

wwood · 2022-04-27T00:02:43Z

Hi,

I can't tell exactly since I don't have the command you used or the data, but the error message (found 446 reads when expected 223) suggests to me that the read sets are interleaved, since 223*2=446.

Does that help?
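
To illustrate the arithmetic above, here is a minimal sketch of extracting reads by name. It is not mfqe's actual code. It assumes a read's name is the header text after `@` up to the first space, and the helper names and toy records are hypothetical. In an interleaved file each name appears on two consecutive records, so every requested name matches twice:

```python
from collections import Counter


def read_name(header):
    """Return the read name: text after '@' up to the first whitespace (assumed convention)."""
    return header[1:].split()[0]


def count_matches(wanted, fastq_lines):
    """Count how many 4-line FASTQ records carry each wanted read name."""
    hits = Counter()
    for i in range(0, len(fastq_lines), 4):
        name = read_name(fastq_lines[i])
        if name in wanted:
            hits[name] += 1
    return hits


# Toy interleaved input (hypothetical data): forward and reverse mates share a name.
interleaved = [
    "@read1 1:N:0", "ACGT", "+", "IIII",
    "@read1 2:N:0", "TTGA", "+", "IIII",
    "@read2 1:N:0", "GGCC", "+", "IIII",
    "@read2 2:N:0", "CCGG", "+", "IIII",
]

hits = count_matches({"read1", "read2"}, interleaved)
print(len(hits), "names requested,", sum(hits.values()), "records extracted")
```

With two names requested, four records come back. The log shows the same doubling: 223 names and 446 reads.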

steff1088 · 2022-04-27T01:26:05Z

Thank you very much for the quick response.

The command I used was:
graftM graft --threads 8 --evalue 0.000000001 --forward 11774.2.218915.CGAACTG-ACAGTTC.filter-METAGENOME.fastq --graftm_package 500PSI_mcrAs_refined.gpkg --output_directory GraftM_output_11774.2.218915.CGAACTG-ACAGTTC_500PSI_mcrAs_refined_package --force

If the reads are interleaved, what can I do to make them compatible with the graft command?

wwood · 2022-04-27T01:35:34Z

Hi,

You can either use the --interleaved flag instead of --forward, or split the file into separate forward and reverse files; there are plenty of tools out there for doing that. You can easily tell whether the reads are interleaved by looking at the head of the file: there will be 2 reads with the same name.

ben
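
For anyone who wants to script the check and the split described above, here is a rough sketch. It is not part of GraftM. It assumes an uncompressed FASTQ with exactly four lines per record, and that mates share a name once an optional trailing `/1` or `/2` is removed. The function names `looks_interleaved` and `split_interleaved` are hypothetical:

```python
def fastq_records(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # end of file (or a blank line where a header should be)
            return
        yield tuple(record)


def base_name(header):
    """Read name without '@', anything after the first space, or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def looks_interleaved(path, pairs_to_check=1000):
    """Return True if the first records come in consecutive same-name pairs."""
    checked = 0
    with open(path) as handle:
        records = fastq_records(handle)
        while checked < pairs_to_check:
            first = next(records, None)
            second = next(records, None)
            if first is None:
                break  # ran out of records
            if second is None or base_name(first[0]) != base_name(second[0]):
                return False
            checked += 1
    return checked > 0


def split_interleaved(path, forward_path, reverse_path):
    """Write alternating records to separate forward and reverse FASTQ files."""
    with open(path) as src, open(forward_path, "w") as fwd, open(reverse_path, "w") as rev:
        for index, record in enumerate(fastq_records(src)):
            out = fwd if index % 2 == 0 else rev
            out.write("\n".join(record) + "\n")
```

If `looks_interleaved` returns True for the input, the simplest fix is the one suggested in this thread: rerun the same command with `--interleaved` in place of `--forward`.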

steff1088 · 2022-04-27T02:15:24Z

Thanks Ben, that did the trick!

-steffen
