add DNA/protein check on first 100 bp of FASTA files #1037
One of the most dangerous flaws in our protein hash calculations is that sourmash doesn't do any sequence type checks: you need to explicitly specify `--input-is-protein` for amino acid inputs, and there is no error checking on that, e.g. see #999 (comment).

En route to bigger changes in the way we do things per #999, this adds checks for `compute` to verify proper `--input-is-protein` behavior.

This does NOT deal with API-level issues like `add_sequence` and `add_protein`; this is just about command-line `signature compute`.

Example output
- `make test` Did it pass the tests?
- `make coverage` Is the new code covered?
- … without a major version increment. Changing file formats also requires a major version number increment.
- … changes were made?
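The heuristic named in the title (look at the first 100 bp of a FASTA file, guess DNA vs. protein, and compare that guess against `--input-is-protein`) can be sketched in Python as below. This is a minimal, hypothetical illustration, not the code in this PR: the function names `first_sequence_prefix`, `looks_like_dna` and `check_input_type`, the `ACGTN` alphabet, and the error messages are all assumptions, and it uses a tiny hand-rolled FASTA reader rather than sourmash's own parsing.

```python
import io
from typing import Iterable, List, Optional

# Assumed nucleotide alphabet: A, C, G, T plus N for ambiguous bases.
# Other IUPAC codes (R, Y, ...) and RNA's U are deliberately not included.
DNA_CHARS = frozenset("ACGTN")


def first_sequence_prefix(lines: Iterable[str], window: int = 100) -> Optional[str]:
    """Return up to `window` characters of the first FASTA record's sequence.

    Returns None if no '>' header is found at all.
    """
    parts: List[str] = []
    length = 0
    in_record = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if in_record:
                break  # second record reached; only the first one is inspected
            in_record = True
            continue
        if in_record:
            parts.append(line)
            length += len(line)
            if length >= window:
                break  # enough sequence collected; stop reading early
    if not in_record:
        return None
    return "".join(parts)[:window]


def looks_like_dna(seq: str) -> bool:
    """Heuristic: a non-empty prefix made only of ACGTN (any case) is DNA."""
    return bool(seq) and set(seq.upper()) <= DNA_CHARS


def check_input_type(lines: Iterable[str], input_is_protein: bool, window: int = 100) -> None:
    """Raise ValueError if the first `window` bp disagree with --input-is-protein."""
    prefix = first_sequence_prefix(lines, window)
    if not prefix:
        return  # no sequence to inspect
    is_dna = looks_like_dna(prefix)
    if input_is_protein and is_dna:
        raise ValueError("--input-is-protein was given, but the input looks like DNA")
    if not input_is_protein and not is_dna:
        raise ValueError("input looks like protein; did you forget --input-is-protein?")


if __name__ == "__main__":
    # DNA input without --input-is-protein: passes silently.
    check_input_type(io.StringIO(">read1\nACGTACGTNN\nacgt\n"), input_is_protein=False)
    # Protein input without --input-is-protein: rejected.
    try:
        check_input_type(io.StringIO(">prot1\nMKVLAAGIVGLLW\n"), input_is_protein=False)
    except ValueError as err:
        print(err)
```

Because only the first 100 characters are inspected, a protein whose leading residues happen to use only A, C, G, T and N would be misread as DNA, while RNA (`U`) or IUPAC ambiguity codes other than `N` would be read as protein; a fuller implementation might use a fraction-of-nucleotides threshold instead of this all-or-nothing test.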