Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index:: do not write output if no signatures to write? #207

Closed
bluegenes opened this issue Feb 10, 2024 · 4 comments · Fixed by #282
Closed

index:: do not write output if no signatures to write? #207

bluegenes opened this issue Feb 10, 2024 · 4 comments · Fixed by #282

Comments

@bluegenes
Copy link
Contributor

bluegenes commented Feb 10, 2024

As of #134, we write empty index files in cases where there were no signatures to load. This matches sourmash underlying sourmash index functionality, do we want to keep it, or write no file?

As of #197, I raise an error for this situation. Other than indexing, I standardized most code to exit gracefully when no signatures are loaded from either query or search files.

@bluegenes
Copy link
Contributor Author

bluegenes commented Mar 7, 2024

In #197, I introduced a bool param for each function, allow_failed_sigpaths, which is used within utils::report_on_collection_loading to decide whether or not to allow loading failures. I set that to false for index, so that all signatures must load properly in order to index.

Where this fix does not solve indexing woes, is when all signatures load correctly, but do not match desired parameters. This yields an empty collection (still a valid collection), which is then passed to Revindex for indexing.

in utils::report_on_collection_loading, we do check for an empty collection, but currently, we do the following:

if collection.is_empty() {
    eprintln!("No {} signatures loaded, exiting.", report_type);
    return Ok(());
}

This means we actually would need to check for empty collection within each function, as I've proposed in #267 for index. Also, it's actually a bit confusing, because we write exiting, but don't initiate exit.

I'm thinking I should perhaps return an error from report_on_collection_loading instead, which would cause all of the functions to exit based on an empty collection. This would mean they error out when unable to load any signatures for either 'query', 'against', or general 'siglist'.

We would need to handle this differently if we allow reading multiple collections in -- we would only want to exit if we have 0 signatures to use.

@ctb
Copy link
Collaborator

ctb commented Mar 7, 2024

hah! we confronted a similar challenge a few years back in sourmash - sourmash-bio/sourmash#1426. At least this reminded me of that.

@bluegenes
Copy link
Contributor Author

hah! we confronted a similar challenge a few years back in sourmash - sourmash-bio/sourmash#1426. At least this reminded me of that.
...so we did!

so do you have a vote on erroring out when we have 0 sigs to search?

I'm thinking I should perhaps return an error from report_on_collection_loading instead, which would cause all of the functions to exit based on an empty collection. This would mean they error out when unable to load any signatures for either 'query', 'against', or general 'siglist'.

@ctb
Copy link
Collaborator

ctb commented Mar 8, 2024

I'm a fan of erroring out if there are 0 sketches to use in any situation.

bluegenes added a commit that referenced this issue Mar 20, 2024
Error out when 0 signatures are loaded from a collection. 

Since we currently only load a single collection as our query or database, 0 sigs in either means that no useful results would be obtained.

- Fixes #207
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants