An update to anvi'o pangenomic workflow lingo: maxbit -> minbit #581

Closed
meren opened this Issue Sep 1, 2017 · 2 comments

meren commented Sep 1, 2017

Alban Mathieu pointed out a typo in the anvi'o pangenomic workflow yesterday. What we were calling maxbit (as ITEP defines it) should in fact have been called minbit all this time (again, according to how we implemented ITEP's definition).

It is now fixed in our codebase (7ea066f) and in the online tutorials, and the help menu has been updated:

$ anvi-pan-genome -h

usage: anvi-pan-genome [-h] -g GENOMES_STORAGE [-G GENOME_NAMES]
                       [--skip-alignments] [--exclude-partial-gene-calls]
                       [--use-ncbi-blast] [--minbit MINBIT]
                       [--mcl-inflation INFLATION]
                       [--min-occurrence NUM_OCCURRENCE]
                       [--min-percent-identity PERCENT] [--sensitive]
                       [-n PROJECT_NAME] [--description TEXT_FILE]
                       [-o DIR_PATH] [-W] [-T NUM_CPUS] [--debug]
                       [--skip-hierarchical-clustering]
                       [--enforce-hierarchical-clustering]
                       [--distance DISTANCE_METRIC] [--linkage LINKAGE_METHOD]

(...)

  --minbit MINBIT       The minimum minbit value. The minbit heuristic
                        provides a means to eliminate weak matches
                        between two protein sequences. We learned it from ITEP
                        (Benedict MN et al, doi:10.1186/1471-2164-15-8), which
                        is a comprehensive analysis workflow for pangenomes,
                        and decided to use it in the anvi'o pangenomic
                        workflow, as well. Briefly, if you have two protein
                        sequences, 'A' and 'B', the minbit is defined as
                        'BITSCORE(A, B) / MIN(BITSCORE(A, A), BITSCORE(B,
                        B))'. So the minbit score between two sequences goes
                        to 1 if they are very similar over the entire length
                        of the 'shorter' protein sequence, and goes to 0 if
                        (1) they match over a very short stretch compared even
                        to the length of the shorter protein sequence or (2)
                        the sequence identity of the match is low. The default
                        is 0.5.

(...)
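To make the definition in the help text concrete, here is a minimal Python sketch (not anvi'o's actual implementation) that computes the minbit score from bitscores produced by a search such as BLAST, then drops pairs that fall below the default threshold of 0.5. It assumes the self-hit bitscores BITSCORE(A, A) are already known; the function names `minbit` and `filter_weak_matches`, the input layout, and the choice to keep pairs whose score exactly equals the threshold are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def minbit(bitscore_ab: float, self_a: float, self_b: float) -> float:
    """Minbit: BITSCORE(A, B) / MIN(BITSCORE(A, A), BITSCORE(B, B))."""
    # The smaller self-hit bitscore corresponds to the shorter protein.
    denominator = min(self_a, self_b)
    if denominator <= 0:
        raise ValueError("self-hit bitscores must be positive")
    return bitscore_ab / denominator


def filter_weak_matches(
    hits: Iterable[Tuple[str, str, float]],
    self_scores: Dict[str, float],
    min_minbit: float = 0.5,  # assumed to mirror the --minbit default
) -> List[Tuple[str, str, float]]:
    """Return (query, subject, minbit) for hits with minbit >= min_minbit.

    Keeping scores equal to the threshold is an assumption of this sketch.
    """
    kept = []
    for query, subject, bitscore in hits:
        score = minbit(bitscore, self_scores[query], self_scores[subject])
        if score >= min_minbit:
            kept.append((query, subject, score))
    return kept


if __name__ == "__main__":
    # Hypothetical self-hit bitscores; B is the shorter protein.
    self_scores = {"A": 400.0, "B": 200.0, "C": 380.0}
    hits = [
        ("A", "B", 190.0),  # strong match over nearly all of B
        ("A", "C", 60.0),   # weak match relative to the shorter sequence
    ]
    for query, subject, score in filter_weak_matches(hits, self_scores):
        print(query, subject, round(score, 2))
```

With these made-up numbers, the A-B pair scores 190 / 200 = 0.95 because the match covers most of the shorter sequence B, so it is kept; the A-C pair scores 60 / 380, roughly 0.16, and is discarded.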

This does not affect previous analysis results, and the definition of the parameter has not changed. It is still frustrating and embarrassing, though: if you are currently working on a manuscript that uses anvi'o v2.4.0 or earlier, and your Methods section mentions the maxbit parameter, you may want to add a note reminding readers that this parameter was renamed to minbit after anvi'o v2.4.0.

I am very sorry about this, and I promise to buy you a beer the first time we meet if this causes you any inconvenience :(

meren closed this Sep 1, 2017

AstrobioMike commented Sep 1, 2017

More importantly, do we still get a beer from you if we've met before?

meren commented Sep 1, 2017

You're asking the tough questions. FINE. YES.
