Sequence ids don't overlap at metagenome pipeline step #16

Closed
gavinmdouglas opened this issue May 28, 2018 · 11 comments

@gavinmdouglas
Member

Moved here from another issue, originally posted by @itiago:


Hi Gavin,
hsp.py worked fine (thanks for that). Now the problem is the next step: metagenome prediction.
I don't understand this step, since the biom file from mothur only provides OTUs and relative abundances, nor how it will relate to the later results obtained from picrust2.
I ran make.biom in mothur and have a biom file (please remember that I only have one sample, so this biom file contains just one sample).
Then I ran the command and got this error:
(picrust2-dev) igor@ubuntu:~/Desktop/Alfaguara/Picrust2$ metagenome_pipeline.py -i shared.0.03.biom -m 16S_predicted.tsv -f EC_predicted.tsv -p 4 -o metagenome_prediction
Traceback (most recent call last):
  File "/home/igor/miniconda2/envs/picrust2-dev/bin/metagenome_pipeline.py", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/home/igor/picrust2/scripts/metagenome_pipeline.py", line 76, in <module>
    main()
  File "/home/igor/picrust2/scripts/metagenome_pipeline.py", line 64, in main
    output_normfile=True)
  File "/home/igor/picrust2/picrust2/metagenome_pipeline.py", line 46, in run_metagenome_pipeline
    pred_marker)
  File "/home/igor/picrust2/picrust2/util.py", line 246, in three_df_index_overlap_sort
    "input files.")
ValueError: No sequence ids overlap between all three of the input files.

Thanks for any help
Best
Igor

@gavinmdouglas
Member Author

@itiago - This step takes the relative abundances of OTUs or ASVs and multiplies the predicted abundance of every gene family by them. It then outputs a table of function abundances for each sample, both stratified and unstratified by which sequence contributed each function. The predicted marker gene abundances are also used to normalize the abundances of the input OTUs/ASVs.
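
As a rough sketch of the arithmetic described above (toy numbers and made-up ids, not the actual PICRUSt2 code), the calculation for one sample might look like the following. It assumes that "normalize" means dividing each sequence's abundance by its predicted marker gene copy number; the function name `predict_sample` is hypothetical.

```python
# Toy illustration only: every id and number below is made up.
# Sequence id -> abundance in a single sample.
abundance = {"seq1": 10.0, "seq2": 30.0}

# Sequence id -> predicted marker gene (16S) copy number.
marker_copies = {"seq1": 2.0, "seq2": 5.0}

# Sequence id -> {function id -> predicted gene family copy number}.
function_copies = {
    "seq1": {"EC:1.1.1.1": 1.0, "EC:2.7.7.7": 2.0},
    "seq2": {"EC:1.1.1.1": 3.0},
}


def predict_sample(abundance, marker_copies, function_copies):
    """Return (stratified, unstratified) function abundances for one sample."""
    stratified = {}    # (function, sequence) -> contribution of that sequence
    unstratified = {}  # function -> total over all sequences
    for seq_id, count in abundance.items():
        # Assumption: normalization divides by the marker gene copy number.
        normalized = count / marker_copies[seq_id]
        for func, copies in function_copies[seq_id].items():
            contribution = normalized * copies
            stratified[(func, seq_id)] = contribution
            unstratified[func] = unstratified.get(func, 0.0) + contribution
    return stratified, unstratified


stratified, unstratified = predict_sample(abundance, marker_copies, function_copies)
print(unstratified)
```

Note that every lookup is keyed by sequence id, which is why the three inputs must share ids for the step to work at all.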

I believe the problem you're running into here is that the cluster ids in the mothur output file don't match the ids in the FASTA file you placed into the tree. Is this correct? For instance, is the sequence "M01028_125_000000000-AN36D_1_1101_9843_5463" the name of an OTU in the mothur output file? If so, I'm not sure why this error is coming up, and it would be great if you could send me the input files you're trying to use privately.
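
One way to check for this kind of mismatch before running the step is to compare the id columns of the three inputs directly. The sketch below is only an illustration: the helper names, the header names, and the "Otu001"-style ids are made up, and it assumes the ids sit in the first tab-delimited column after a single header row.

```python
def first_column_ids(lines):
    """Collect ids from the first tab-delimited column, skipping the header row."""
    return [line.split("\t")[0].strip() for line in lines[1:] if line.strip()]


def shared_ids(abundance_ids, marker_ids, function_ids):
    """Return the sorted ids that are present in all three inputs."""
    return sorted(set(abundance_ids) & set(marker_ids) & set(function_ids))


# Hypothetical contents: the abundance table is keyed by OTU names, while
# both prediction tables are keyed by read names taken from the FASTA file.
abundance_lines = ["id\tsample1", "Otu001\t120", "Otu002\t30"]
marker_lines = ["id\tcopies", "M01028_125_000000000-AN36D_1_1101_9843_5463\t2"]
function_lines = ["id\tEC:1.1.1.1", "M01028_125_000000000-AN36D_1_1101_9843_5463\t1"]

common = shared_ids(
    first_column_ids(abundance_lines),
    first_column_ids(marker_lines),
    first_column_ids(function_lines),
)
if not common:
    print("No ids are shared by all three inputs; compare the id columns.")
```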

Thanks,

Gavin

@gavinmdouglas
Member Author

I believe this problem is due to confusion about which sequences should be added into the tree (i.e. that it should be OTU representative sequences in this case) so I'm closing it for now.

@itiago

itiago commented May 29, 2018 via email

@pedres

pedres commented Dec 21, 2018

Hi Gavin,

I am having the same problem when running picrust2. I installed the latest version and ran the full pipeline command with the tutorial files without problems. However, when I tried to run the pipeline with my own data set, the process failed. I used dada2 to process my samples (6456 ASVs in 24 samples) and ran the pipeline with the following command: picrust2_pipeline.py -s ASV_raref.fa -i ASV_raref.txt -o picrust2_out_MENCIA --threads 24 -n

The error is:
Traceback (most recent call last):
  File "/home/fulgencio/miniconda3/envs/picrust2/bin/picrust2_pipeline.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/fulgencio/picrust2/scripts/picrust2_pipeline.py", line 227, in <module>
    main()
  File "/home/fulgencio/picrust2/scripts/picrust2_pipeline.py", line 220, in main
    verbose=args.verbose)
  File "/home/fulgencio/picrust2/picrust2/pipeline.py", line 195, in full_pipeline
    verbose=verbose)
  File "/home/fulgencio/picrust2/picrust2/pipeline.py", line 357, in metagenome_pipeline_steps
    output_normfile=True)
  File "/home/fulgencio/picrust2/picrust2/metagenome_pipeline.py", line 64, in run_metagenome_pipeline
    pred_marker)
  File "/home/fulgencio/picrust2/picrust2/util.py", line 300, in three_df_index_overlap_sort
    "input files.")
ValueError: No sequence ids overlap between all three of the input files.

I think that in my case representative sequences are not the problem, because I used dada2 (not de novo OTUs). I have checked that the sequence IDs in the FASTA file and the table are the same. I have attached the files I used.

Thank you very much in advance.

Manuel

fasta_and_table.zip

@gavinmdouglas
Member Author

This is a cross-post of https://groups.google.com/forum/#!topic/picrust-users/HdZjZtYHRbQ and was resolved.

@JCSzamosi

I am getting this same error, but when I look at my three input files, the sequence IDs are, in fact, identical across all three. I've attached the three files here (subsetted down to their first 9 sequences; the error still persists):

16S_head.txt
KO_head.txt
asvtab_head.txt

@gavinmdouglas
Member Author

Hey @JCSzamosi,

What version of PICRUSt2 are you using? I think the issue is that the ids are being interpreted as strings in one file and as integers in the others. This problem should be fixed in the latest release. In the meantime, a quick fix is to add a string to the beginning of each sequence id (like "seq1" rather than "1").
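
To illustrate why purely numeric ids can fail to match (a minimal sketch with made-up ids, not the PICRUSt2 code itself): an id read as the string "1" is not equal to the integer 1, so an intersection between the two is empty even though the files look identical. The helper name `with_prefix` is hypothetical.

```python
# Made-up ids: one input parsed them as strings, another as integers.
table_ids = ["1", "2", "3"]
marker_ids = [1, 2, 3]

# "1" != 1 in Python, so nothing overlaps despite the files looking the same.
overlap_before = set(table_ids) & set(marker_ids)


def with_prefix(ids, prefix="seq"):
    """Prepend a string so every id is unambiguously text, e.g. 1 -> 'seq1'."""
    return [f"{prefix}{value}" for value in ids]


# After prefixing, both sides are strings and the ids line up again.
overlap_after = set(with_prefix(table_ids)) & set(with_prefix(marker_ids))

print(len(overlap_before), len(overlap_after))
```

The prefix would need to be applied consistently everywhere the ids appear, including the FASTA headers and the abundance table.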

@JCSzamosi

JCSzamosi commented Aug 6, 2019

Thanks so much for the quick response!

I'm using version 2.2.0_b, which I assume is the latest since it's what I got from following the instructions here.

Prepending the sequence IDs with a string has created a new error. Log and input files attached below:

16S_head.txt
asvtab_head.txt
KO_head.txt
metagenome_pipe.log

@gavinmdouglas
Member Author

Hmm that's annoying that the problem resurfaced in v2.2.0-b - I thought it was resolved.

Anyway I think the new issue is because of the "Consensus Lineage" column in your BIOM table - the script is expecting only numeric columns.
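
If the table is available as tab-delimited text, one way to strip such a column is sketched below. This is only an illustration under assumptions: the "#OTU ID" header, the sample names, and the lineage strings are made up, the first column is assumed to hold the sequence ids, and `drop_non_numeric_columns` is a hypothetical helper rather than anything shipped with PICRUSt2.

```python
import csv
import io


def is_number(text):
    """Return True if the cell parses as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def drop_non_numeric_columns(tsv_text):
    """Keep the id column plus every column whose values are all numeric."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Column 0 holds the sequence ids, so it is always kept.
    keep = [0] + [
        col for col in range(1, len(header))
        if all(is_number(row[col]) for row in body)
    ]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[col] for col in keep])
    return out.getvalue()


# Made-up table with a trailing text column.
example = (
    "#OTU ID\tsample1\tsample2\tConsensus Lineage\n"
    "seq1\t10\t0\tk__Bacteria; p__Firmicutes\n"
    "seq2\t3\t7\tk__Bacteria; p__Proteobacteria\n"
)
cleaned = drop_non_numeric_columns(example)
print(cleaned)
```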

Gavin

@JCSzamosi

Oh, hah, good catch! Thanks.

Seems to be working when I remove the consensus lineage column and use pre-pended strings. Sorry about the bug recurring :(

@manterd

manterd commented Feb 7, 2022

I too had this problem when using a mothur shared file, but converting it to a biom file with mothur's 'make.biom(shared=<shared_file>)' fixed the problem. It seems the extra columns in the shared file are the problem...
