StringStore-related optimizations
#10938
…types to `hash_t`
Mostly nitpicking
…`id` with `hash`
Looks good to me, but it could use a review from someone who is more familiar with `StringStore` internals.
I have some minor comments, but this basically looks good to me. You might want to rename the variables called `hash`, since `hash` is a builtin. (Also, sorry it took me so long to get to this!)
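The concern about `hash` can be illustrated in plain Python (the function names below are hypothetical and this is not code from the PR): once a function assigns to `hash`, Python treats the name as local for the whole function body, so a later call to the builtin fails.

```python
def fingerprint_shadowed(key: str, stored: dict) -> bool:
    # `hash` is assigned in this scope, so it is a local variable throughout
    # the function and the builtin hash() is no longer reachable here.
    hash = stored[key]
    return hash == hash(key)  # raises TypeError: 'int' object is not callable


def fingerprint_renamed(key: str, stored: dict) -> bool:
    # A descriptive name leaves the builtin hash() untouched.
    key_hash = stored[key]
    return key_hash == hash(key)
```

With `stored = {"apple": hash("apple")}`, `fingerprint_renamed("apple", stored)` returns `True`, while `fingerprint_shadowed("apple", stored)` raises `TypeError`.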
Add comment on early return
Looks good!
@explosion-bot please test_slow
URL: https://buildkite.com/explosion-ai/spacy-slow-tests/builds/124
…ing-store-optimizations
@explosion-bot please test_slow_gpu
URL: https://buildkite.com/explosion-ai/spacy-slow-gpu-tests/builds/49
I don't understand why the slow GPU tests succeed while the slow tests fail during the build. Is there some difference in the setup that causes this? (Accidentally not testing the right branches, or something like that?)
Yeah, I find that rather curious too. The error does point to a branch mismatch between the spaCy and Thinc codebases, but the version numbers seem like they ought to line up. I'll take a closer look at it.
…ing-store-optimizations
@explosion-bot please test_slow
URL: https://buildkite.com/explosion-ai/spacy-slow-tests/builds/164
@explosion-bot please test_slow_gpu
URL: https://buildkite.com/explosion-ai/spacy-slow-gpu-tests/builds/61
* `strings`: More robust type checking of keys/IDs, coerce `int`-like types to `hash_t`
* Preserve existing public API behaviour
* Fix return type
* Replace `bool` with `bint`, rename to `_try_coerce_to_hash`, replace `id` with `hash`
* Avoid unnecessary re-encoding and re-calculation of strings and hashes, respectively
* Rename variables named `hash`
* Add comment on early return
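A minimal pure-Python sketch of the coercion idea in the first commit, assuming hashes are unsigned 64-bit integers. The PR does this in Cython via `_try_coerce_to_hash`; the function below is a hypothetical analogue, not that implementation. `operator.index()` is used because it accepts `int` and any object implementing `__index__` (which includes `numpy` integer scalars) while rejecting `str` and `float`.

```python
import operator
from typing import Optional

# Assumption for this sketch: hashes are unsigned 64-bit integers.
MAX_HASH = 2**64 - 1


def try_coerce_to_hash(key: object) -> Optional[int]:
    """Return `key` as a Python int if it is int-like and fits in 64 bits.

    Hypothetical helper: returns None for anything that cannot be a hash.
    """
    try:
        # Accepts int and any type implementing __index__ (e.g. numpy
        # integer scalars) and returns a Python int.
        value = operator.index(key)
    except TypeError:
        # str, float, None, ... are not int-like.
        return None
    if 0 <= value <= MAX_HASH:
        return value
    # Out of range for an unsigned 64-bit hash.
    return None
```

Once a key has been coerced to a native `int`, later dictionary lookups and comparisons no longer go through the custom comparison methods of the original object. Returning `None` instead of raising is a convention chosen for this sketch so that a caller can fall back to treating the key as a string.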
Description
* Coerce `numpy` integer types to the native hash/integer type to avoid comparison overhead.
* Restructure `StringStore.__getitem__`/`__contains__` for readability.
* Fix a bug in `StringStore.__contains__` that called `str.encode` on a non-`str` object.
The error handling changes have been deferred to a future PR that will target the `v4` branch, allowing us to break backward compatibility.
Types of change
Performance enhancement
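The `__contains__` fix listed under Description comes down to checking the key's type before calling `str.encode`. The toy store below is a hypothetical, heavily simplified sketch of that shape, not spaCy's `StringStore`. It uses `hashlib.blake2b` as a stand-in 64-bit hash (spaCy itself uses MurmurHash), and it encodes each string only once per operation.

```python
import hashlib
import operator
from typing import Dict


def hash_bytes(data: bytes) -> int:
    """Stand-in 64-bit hash for this sketch; not the hash spaCy uses."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class ToyStringStore:
    """Hypothetical, heavily simplified store mapping 64-bit hashes to strings."""

    def __init__(self) -> None:
        self._by_hash: Dict[int, str] = {}

    def add(self, text: str) -> int:
        # Encode and hash the string exactly once, then reuse the result.
        key_hash = hash_bytes(text.encode("utf8"))
        self._by_hash[key_hash] = text
        return key_hash

    def __contains__(self, key: object) -> bool:
        if isinstance(key, str):
            # Only str keys are encoded before hashing.
            return hash_bytes(key.encode("utf8")) in self._by_hash
        try:
            # Int-like keys (including numpy integers) are looked up as
            # native ints; nothing is encoded.
            return operator.index(key) in self._by_hash
        except TypeError:
            # Any other type cannot be a key: report absence instead of
            # calling .encode() on an object that does not support it.
            return False
```

For example, after `h = store.add("apple")`, both `"apple" in store` and `h in store` are `True`, while `3.5 in store` and `None in store` return `False` rather than raising.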
Checklist