bsub: command not found #1

Closed
frederic-mahe opened this issue Sep 9, 2016 · 0 comments

To speed up taxonomic assignment, the STAMPA pipeline described in this repository splits the input dataset into small chunks and spreads the computational load using the LSF scheduler (with the bsub command). If you don't have access to a computer cluster with LSF installed, you can run the analysis sequentially on a single machine (i.e. multithreaded, but not distributed across cluster nodes), using the commands below:

# variables
QUERY="representatives.fas"
DATABASE="V4_references.fas"
THREADS=8

# search for best hits
vsearch \
    --usearch_global "${QUERY}" \
    --threads ${THREADS} \
    --dbmask none \
    --qmask none \
    --rowlen 0 \
    --notrunclabels \
    --userfields query+id1+target \
    --maxaccepts 0 \
    --maxrejects 32 \
    --top_hits_only \
    --output_no_hits \
    --db "${DATABASE}" \
    --id 0.5 \
    --iddef 1 \
    --userout - | sed 's/;size=/_/ ; s/;//' > hits.representatives

# in case of multi-best hit, find the last-common ancestor
python stampa_merge.py "$(pwd)"

# sort by decreasing abundance
sort -k2,2nr -k1,1d results.representatives > representatives.results
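
To make the sed step in the search command easier to follow, here is a minimal sketch of what it does to a single output line. The sample line is hypothetical: it assumes query headers carry a vsearch-style abundance annotation of the form `;size=N;`, and the amplicon name, identity value and reference label are made up for illustration.

```shell
# Hypothetical line as vsearch would write it with
# --userfields query+id1+target (tab-separated columns).
line=$(printf 'a1b2c3;size=1500;\t98.7\tAB123456 Eukaryota|Alveolata')

# Same sed expression as in the search step:
#   s/;size=/_/  replaces ";size=" with "_"
#   s/;//        removes the first remaining ";" on the line
#                (here, the one trailing the abundance value)
cleaned=$(printf '%s\n' "$line" | sed 's/;size=/_/ ; s/;//')

printf '%s\n' "$cleaned"
```

With that assumed input, the first column becomes `a1b2c3_1500`, i.e. amplicon name and abundance joined by an underscore, and the other columns are left unchanged. Since neither substitution uses the `g` flag, only the first match of each pattern is touched.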