Sequence ids don't overlap at metagenome pipeline step #16

gavinmdouglas · 2018-05-28T15:01:11Z

Posted here from another issue from @itiago:

Hi Gavin,
hsp.py worked fine (thanks for that). Now the problem is the next step: metagenome prediction.
I don't understand this step, since the BIOM file from mothur only provides OTUs and relative abundances, or how it relates to the results obtained later from PICRUSt2.
I ran make.biom in mothur and have a BIOM file (please remember that I only have one sample, so this BIOM file covers just one sample).
Then I ran the command and got this error:
(picrust2-dev) igor@ubuntu:~/Desktop/Alfaguara/Picrust2$ metagenome_pipeline.py -i shared.0.03.biom -m 16S_predicted.tsv -f EC_predicted.tsv -p 4 -o metagenome_prediction
Traceback (most recent call last):
  File "/home/igor/miniconda2/envs/picrust2-dev/bin/metagenome_pipeline.py", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/home/igor/picrust2/scripts/metagenome_pipeline.py", line 76, in <module>
    main()
  File "/home/igor/picrust2/scripts/metagenome_pipeline.py", line 64, in main
    output_normfile=True)
  File "/home/igor/picrust2/picrust2/metagenome_pipeline.py", line 46, in run_metagenome_pipeline
    pred_marker)
  File "/home/igor/picrust2/picrust2/util.py", line 246, in three_df_index_overlap_sort
    "input files.")
ValueError: No sequence ids overlap between all three of the input files.

Thanks for any help
Best
Igor

gavinmdouglas · 2018-05-28T15:10:46Z

@itiago - This step takes in the relative abundances of the OTUs or ASVs and multiplies the predicted abundance of each gene family by these relative abundances. It then outputs a table of function abundances for each sample (both stratified and unstratified by which sequence contributed each function). The predicted marker gene abundances are also used to normalize the abundances of the input OTUs/ASVs.
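To make the arithmetic concrete, here is a minimal sketch of that calculation in plain Python for a single sample. It is not PICRUSt2's actual implementation: the function name `predict_metagenome`, the toy ids, and all numbers are made up for illustration, and only the unstratified output is shown.

```python
def predict_metagenome(abundances, marker_copies, function_copies):
    """Toy version of the metagenome step for one sample (hypothetical helper).

    abundances:      {seq_id: abundance of that OTU/ASV in the sample}
    marker_copies:   {seq_id: predicted marker gene (16S) copy number}
    function_copies: {seq_id: {function_id: predicted copy number}}
    """
    # Only sequences present in all three inputs can be used.
    shared = set(abundances) & set(marker_copies) & set(function_copies)
    if not shared:
        raise ValueError("No sequence ids overlap between all three inputs.")

    totals = {}
    for seq_id in sorted(shared):
        # Normalize the sequence abundance by its predicted marker copy number.
        normalized = abundances[seq_id] / marker_copies[seq_id]
        # Each function contributes (normalized abundance x predicted copies).
        for func, copies in function_copies[seq_id].items():
            totals[func] = totals.get(func, 0.0) + normalized * copies
    return totals


# Made-up example data.
result = predict_metagenome(
    abundances={"ASV1": 10, "ASV2": 4},
    marker_copies={"ASV1": 2, "ASV2": 1},
    function_copies={
        "ASV1": {"EC:1.1.1.1": 1, "EC:2.7.7.7": 2},
        "ASV2": {"EC:1.1.1.1": 3},
    },
)
print(result)
```

The key point for this issue is the first step: the ids in the abundance table, the marker predictions, and the function predictions must be the same strings, otherwise nothing is left to multiply.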

I believe the problem you're running into here is that the cluster ids in the mothur output file don't match the ids in the fasta file you placed into the tree. Is this correct? For instance, is the sequence "M01028_125_000000000-AN36D_1_1101_9843_5463" the name of an OTU in the mothur output file? If so, I'm not sure why this error is coming up, and it would be great if you could send me the input files you're trying to use privately.

Thanks,

Gavin
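One way to check for this kind of mismatch before running the step is to compare the ids directly. The sketch below is a rough, stdlib-only helper, not part of PICRUSt2; the function names and the column headers in the toy data are hypothetical. It assumes the abundance and prediction tables are tab-delimited text with exactly one header row and the ids in the first column (so a binary BIOM file would need converting to text first).

```python
def fasta_ids(lines):
    """Ids from FASTA header lines: text after '>' up to the first whitespace."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def table_ids(lines):
    """Ids from the first column of a tab-delimited table with one header row."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    return {row.split("\t")[0] for row in rows[1:]}


def report_overlap(fasta_lines, abundance_lines, prediction_lines):
    """Summarize which ids are shared and which appear on only one side."""
    seqs = fasta_ids(fasta_lines)
    abund = table_ids(abundance_lines)
    pred = table_ids(prediction_lines)
    return {
        "shared": sorted(seqs & abund & pred),
        "only_in_table": sorted(abund - seqs),
        "only_in_fasta": sorted(seqs - abund),
    }


# Toy inputs mimicking the situation described above: the FASTA holds read
# names, while the abundance table is indexed by OTU names.
study_fasta = [">M01028_125_000000000-AN36D_1_1101_9843_5463\n", "ACGTACGT\n"]
abundance_table = ["OTU\tsample1\n", "Otu001\t12\n"]
marker_predictions = [
    "sequence\t16S_rRNA_Count\n",
    "M01028_125_000000000-AN36D_1_1101_9843_5463\t2\n",
]

report = report_overlap(study_fasta, abundance_table, marker_predictions)
print(report)
```

With real files, passing open file handles (for example `open("study_seqs.fasta")`, a made-up filename) in place of the lists works the same way, since both yield lines.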

gavinmdouglas · 2018-05-29T16:26:28Z

I believe this problem is due to confusion about which sequences should be added into the tree (i.e. that it should be OTU representative sequences in this case) so I'm closing it for now.

itiago · 2018-05-29T17:03:35Z

Agreed. I will rerun the workflow with representative sequences for each OTU and rename the OTU designations accordingly. Thanks for the help.

pedres · 2018-12-21T08:22:36Z

Hi Gavin,

I am having the same problem when running PICRUSt2. I installed the latest version and ran the full pipeline command on the tutorial files without problems. However, when I tried to run the pipeline on my own data set, the process failed. I used dada2 to process my samples (6456 ASVs in 24 samples) and ran the pipeline with the following command:
picrust2_pipeline.py -s ASV_raref.fa -i ASV_raref.txt -o picrust2_out_MENCIA --threads 24 -n

The error is:
Traceback (most recent call last):
  File "/home/fulgencio/miniconda3/envs/picrust2/bin/picrust2_pipeline.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/fulgencio/picrust2/scripts/picrust2_pipeline.py", line 227, in <module>
    main()
  File "/home/fulgencio/picrust2/scripts/picrust2_pipeline.py", line 220, in main
    verbose=args.verbose)
  File "/home/fulgencio/picrust2/picrust2/pipeline.py", line 195, in full_pipeline
    verbose=verbose)
  File "/home/fulgencio/picrust2/picrust2/pipeline.py", line 357, in metagenome_pipeline_steps
    output_normfile=True)
  File "/home/fulgencio/picrust2/picrust2/metagenome_pipeline.py", line 64, in run_metagenome_pipeline
    pred_marker)
  File "/home/fulgencio/picrust2/picrust2/util.py", line 300, in three_df_index_overlap_sort
    "input files.")
ValueError: No sequence ids overlap between all three of the input files.

I think representative sequences are not the problem in my case because I used dada2 (not de novo OTUs). I have checked that the sequence IDs in the FASTA file and the table are the same. I have attached the files I used.

Thank you very much in advance.

Manuel

fasta_and_table.zip

gavinmdouglas · 2018-12-21T15:56:27Z

This is a cross-post of https://groups.google.com/forum/#!topic/picrust-users/HdZjZtYHRbQ and was resolved.

JCSzamosi · 2019-08-06T16:49:54Z

I am getting this same error, but when I look at my three input files, the sequence IDs are, in fact, identical across all three. I've attached the three files here (subset to their first 9 sequences; the error still persists):

16S_head.txt
KO_head.txt
asvtab_head.txt

gavinmdouglas · 2019-08-06T17:47:49Z

Hey @JCSzamosi,

What version of PICRUSt2 are you using? I think the issue is that the ids are being interpreted as strings in one file and as integers in the others. This problem should be fixed in the latest release. In the meantime, a quick fix is to add a string to the beginning of each sequence id (like "seq1" rather than "1").
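The effect of that type mismatch is easy to reproduce in plain Python. The snippet below is only an illustration of the general behaviour, not PICRUSt2 code; the `as_prefixed_id` helper and the "seq" prefix are hypothetical choices.

```python
# The same three ids, read as text in one file and as integers in another.
table_ids = ["1", "2", "3"]
prediction_ids = [1, 2, 3]

# "1" and 1 are different values, so the intersection is empty.
overlap_before = set(table_ids) & set(prediction_ids)


def as_prefixed_id(value, prefix="seq"):
    """Turn an id into unambiguous text, e.g. 1 or "1" -> "seq1"."""
    return f"{prefix}{value}"


# After prefixing, every id is a string that cannot be parsed as a number.
overlap_after = {as_prefixed_id(i) for i in table_ids} & {
    as_prefixed_id(i) for i in prediction_ids
}

print(overlap_before)
print(sorted(overlap_after))
```

The renaming has to be applied consistently, using the same prefix for the sequence file and the abundance table, so that every downstream file carries matching ids.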

JCSzamosi · 2019-08-06T18:30:58Z

Thanks so much for the quick response!

I'm using version 2.2.0_b, which I assume is the latest since it's what I got from following the instructions here.

Prepending the sequence IDs with a string has created a new error. Log and input files attached below:

16S_head.txt
asvtab_head.txt
KO_head.txt
metagenome_pipe.log

gavinmdouglas · 2019-08-06T18:41:44Z

Hmm that's annoying that the problem resurfaced in v2.2.0-b - I thought it was resolved.

Anyway I think the new issue is because of the "Consensus Lineage" column in your BIOM table - the script is expecting only numeric columns.

Gavin
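For a table that has been exported to tab-delimited text, a non-numeric column such as "Consensus Lineage" can be stripped before running the step. The following is a minimal stdlib-only sketch, not a PICRUSt2 utility; the helper names and the toy table are made up, and it assumes the first row is the header and the first column holds the sequence ids.

```python
def is_number(text):
    """True if the cell can be parsed as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def drop_non_numeric_columns(rows):
    """Keep the id column plus every column whose data cells are all numeric.

    rows: list of lists of strings; rows[0] is the header.
    """
    header, data = rows[0], rows[1:]
    keep = [0] + [
        col
        for col in range(1, len(header))
        if all(is_number(row[col]) for row in data)
    ]
    return [[row[col] for col in keep] for row in rows]


# Toy table with a taxonomy column at the end.
table = [
    ["#OTU ID", "sampleA", "sampleB", "Consensus Lineage"],
    ["seq1", "5", "0", "k__Bacteria; p__Firmicutes"],
    ["seq2", "3", "7", "k__Bacteria; p__Proteobacteria"],
]

cleaned = drop_non_numeric_columns(table)
for row in cleaned:
    print("\t".join(row))
```

Reading and writing the rows with `csv.reader` and `csv.writer` using `delimiter="\t"` would connect this to actual files.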

JCSzamosi · 2019-08-06T18:49:11Z

Oh, hah, good catch! Thanks.

Seems to be working when I remove the consensus lineage column and use pre-pended strings. Sorry about the bug recurring :(

manterd · 2022-02-07T03:23:24Z

I too had this problem when using a mothur shared file, but converting it to a BIOM file with mothur's 'make.biom(shared=<shared_file>)' fixed the problem. It seems the extra columns in the shared file are the problem.

gavinmdouglas mentioned this issue May 28, 2018

error when trying to run hsp.py #11

Closed

gavinmdouglas closed this as completed May 29, 2018
