Re-write metagenome_contributions.py #1

Closed
gavinmdouglas opened this issue Mar 2, 2018 · 5 comments

Comments

@gavinmdouglas
Member

metagenome_contributions.py is no longer in this repository. There could be a better way to output the contributions, one that leverages the likelihoods output in R by the discrete hidden-state prediction methods.
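To illustrate what "leveraging the likelihoods" could mean, here is a minimal sketch that contrasts a single discrete call with an expected value computed from per-state likelihoods. The function names and the likelihood values are hypothetical, chosen only for illustration; this is not the output format of any particular R package, nor the method any PICRUSt2 script implements.

```python
def discrete_prediction(likelihoods):
    """Return the copy-number state with the highest likelihood."""
    return max(likelihoods, key=likelihoods.get)


def expected_copy_number(likelihoods):
    """Return the likelihood-weighted mean copy number.

    The values are normalised first, so they do not need to sum to 1.
    """
    total = sum(likelihoods.values())
    return sum(state * p for state, p in likelihoods.items()) / total


# Hypothetical likelihoods for one gene family in one genome:
# keys are copy-number states, values are their likelihoods.
example = {0: 0.1, 1: 0.5, 2: 0.4}

print(discrete_prediction(example))   # the single most likely state
print(expected_copy_number(example))  # weighted mean over all states
```

The discrete call discards the fact that two copies is almost as likely as one, whereas the weighted mean keeps that uncertainty in the downstream abundance.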

@gavinmdouglas
Member Author

metagenome_pipeline.py now outputs functions stratified by functional contributions by default. per_sample_functions.py is still an experimental script that uses probability distributions rather than discrete predictions.
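For readers unfamiliar with the distinction, the sketch below shows how a stratified table relates to an unstratified one: summing each function's rows over all contributing sequences gives the per-sample totals. The column layout (function, sequence, then one column per sample), the sample names, and the function name `collapse_stratified` are assumptions made for this example, not a confirmed description of the files metagenome_pipeline.py writes.

```python
import csv
import io
from collections import defaultdict


def collapse_stratified(handle):
    """Sum a stratified table over sequences to get per-function totals.

    Assumed layout: column 1 = function, column 2 = contributing sequence,
    remaining columns = one abundance value per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[2:]
    totals = defaultdict(lambda: [0.0] * len(samples))
    for row in reader:
        function = row[0]
        for i, value in enumerate(row[2:]):
            totals[function][i] += float(value)
    return samples, dict(totals)


# Small made-up table standing in for a real stratified output file.
example = io.StringIO(
    "function\tsequence\tsampleA\tsampleB\n"
    "EC:1.1.1.1\tseq_2\t3\t0\n"
    "EC:1.1.1.1\tseq_4\t1\t2\n"
    "EC:2.7.7.7\tseq_2\t5\t5\n"
)
samples, totals = collapse_stratified(example)
print(samples)
for function, values in sorted(totals.items()):
    print(function, values)
```

Under these assumptions the unstratified table has one row per function, so it will always have fewer rows than the stratified table it was derived from.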

@jjmmii

jjmmii commented May 7, 2018

Hi @gavinmdouglas, I updated my local clone today and noticed that metagenome_pipeline.py is taking a long time to run (the job has been running for 8 hours and is still going). I guess that's because it's calculating the stratified output? Is there a way to turn this option off? Thank you so much.

Best,
-Jamie

@gavinmdouglas
Member Author

Hey Jamie,

This is definitely a problem, thanks for pointing this out. I have re-written how the stratified data is output and it is much faster now. I haven't added an option yet for non-stratified output only.

Thanks,

Gavin

@jjmmii

jjmmii commented May 8, 2018

Thanks so much Gavin! It is blazing fast now, but I noticed that pred_metagenome_unstrat.tsv has far fewer lines than the OUT_PREFIX.genefamilies.biom.tsv produced by a previous version on the same data (853 lines vs. 3333 lines, respectively). Also, strangely, when I check which sequences are mapped to the ECs in pred_metagenome_strat.tsv, only a few (9 out of 485 sequences) are used/output. For example:

$ cut -f2 pred_metagenome_strat.tsv | sort-uniq-count-rank
595     seq_16
549     seq_13
504     seq_11
455     seq_9
454     seq_4
454     seq_6
443     seq_2
442     seq_5
160     seq_7
1       sequence

Coincidentally, these sequences are the very first ones in my data. Could this be a bug (i.e. not all output was written), or did PICRUSt2 only map a few of my sequences to genes?

Best,
Jamie
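
The tally above can also be reproduced without the `sort-uniq-count-rank` helper, which is presumably a local script along the lines of `sort | uniq -c | sort -rn`. The following is a minimal Python sketch under that assumption; the function name `count_sequences` and the in-memory example rows are made up for illustration, and only the position of the sequence column (the second field) is taken from the command in the comment.

```python
import csv
import io
from collections import Counter


def count_sequences(handle, column=1, skip_header=True):
    """Count how often each value appears in one column of a TSV.

    column is zero-based, so column=1 matches `cut -f2`.
    Returns (value, count) pairs ordered from most to least frequent.
    """
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)
    counts = Counter(row[column] for row in reader if len(row) > column)
    return counts.most_common()


# Made-up rows standing in for pred_metagenome_strat.tsv.
example = io.StringIO(
    "function\tsequence\tsampleA\n"
    "EC:1.1.1.1\tseq_16\t2\n"
    "EC:1.1.1.2\tseq_16\t1\n"
    "EC:1.1.1.3\tseq_13\t4\n"
    "EC:1.1.1.4\tseq_16\t3\n"
    "EC:1.1.1.5\tseq_13\t1\n"
    "EC:1.1.1.6\tseq_2\t7\n"
)
counts = count_sequences(example)
for sequence, n in counts:
    print(n, sequence)
```

To run it on a real file, pass `open("pred_metagenome_strat.tsv", newline="")` instead of the in-memory example. Unlike the shell pipeline, this version skips the header row, so the `sequence` entry does not appear in the tally; comparing `len(counts)` with the number of input sequences shows directly how many of them contribute.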

@jjmmii

jjmmii commented Jun 5, 2018

Just to follow up: the problem is gone as of the latest clone of PICRUSt2, which I pulled yesterday.
