Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringStore-related optimizations #10938

Merged

Conversation

shadeMe
Copy link
Contributor

@shadeMe shadeMe commented Jun 9, 2022

Description

  • Coerce numpy integer types to native hash/integer type to avoid comparison overhead.
  • Refactor StringStore.__getitem__/__contains__ for readability.
  • Remove broken (and presumably unused) code path from StringStore.__contains__ that called str.encode on a non-str object.

The error handling changes have been deferred to a future PR that will target the v4 branch, allowing us to break backward-compatibility.

Types of change

Performance enhancement

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@shadeMe shadeMe added the perf / speed Performance: speed label Jun 9, 2022
@shadeMe shadeMe changed the base branch from master to v4 June 9, 2022 10:19
@shadeMe shadeMe changed the base branch from v4 to master June 9, 2022 10:20
@shadeMe shadeMe marked this pull request as ready for review June 9, 2022 11:59
@shadeMe shadeMe requested review from polm and danieldk June 9, 2022 11:59
Copy link
Contributor

@danieldk danieldk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nitpicking

spacy/strings.pyx Outdated Show resolved Hide resolved
spacy/strings.pyx Outdated Show resolved Hide resolved
spacy/strings.pyx Show resolved Hide resolved
spacy/strings.pyx Outdated Show resolved Hide resolved
spacy/strings.pyx Show resolved Hide resolved
@danieldk danieldk self-requested a review June 10, 2022 09:23
Copy link
Contributor

@danieldk danieldk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but could use a review from someone who is more familiar with StringStore internals.

spacy/strings.pyx Outdated Show resolved Hide resolved
Copy link
Contributor

@polm polm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some minor comments but this basically looks good to me. You might want to change the variable name of hash since it's a builtin. (Also sorry it took me so long to get to this!)

Add comment on early return
@shadeMe shadeMe requested a review from polm June 24, 2022 10:15
Copy link
Contributor

@polm polm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@shadeMe
Copy link
Contributor Author

shadeMe commented Jun 24, 2022

@explosion-bot please test_slow

@explosion-bot
Copy link
Collaborator

explosion-bot commented Jun 24, 2022

🪁 Successfully triggered build on Buildkite

URL: https://buildkite.com/explosion-ai/spacy-slow-tests/builds/124

@shadeMe shadeMe added the enhancement Feature requests and improvements label Jun 24, 2022
@shadeMe
Copy link
Contributor Author

shadeMe commented Jun 24, 2022

@explosion-bot please test_slow_gpu

@explosion-bot
Copy link
Collaborator

explosion-bot commented Jun 24, 2022

🪁 Successfully triggered build on Buildkite

URL: https://buildkite.com/explosion-ai/spacy-slow-gpu-tests/builds/49

@adrianeboyd
Copy link
Contributor

I don't understand why the slow GPU tests succeed and the slow tests fail while building? Is there some difference in the setup that leads to this? (Accidentally not testing the right branches or something like that?)

@shadeMe
Copy link
Contributor Author

shadeMe commented Jun 27, 2022

I don't understand why the slow GPU tests succeed and the slow tests fail while building? Is there some difference in the setup that leads to this? (Accidentally not testing the right branches or something like that?)

Yeah, I find that rather curious too. The error does point to some branch mismatch between the spacy and thinc codebases, but the version numbers ostensibly seem like they ought to line up? I'll take a closer look at it.

@shadeMe
Copy link
Contributor Author

shadeMe commented Jul 4, 2022

@explosion-bot please test_slow

@explosion-bot
Copy link
Collaborator

explosion-bot commented Jul 4, 2022

🪁 Successfully triggered build on Buildkite

URL: https://buildkite.com/explosion-ai/spacy-slow-tests/builds/164

@shadeMe
Copy link
Contributor Author

shadeMe commented Jul 4, 2022

@explosion-bot please test_slow_gpu

@explosion-bot
Copy link
Collaborator

explosion-bot commented Jul 4, 2022

🪁 Successfully triggered build on Buildkite

URL: https://buildkite.com/explosion-ai/spacy-slow-gpu-tests/builds/61

@svlandeg svlandeg merged commit 59c763e into explosion:master Jul 4, 2022
@shadeMe shadeMe deleted the refactor/string-store-optimizations branch July 4, 2022 13:05
polm pushed a commit that referenced this pull request Jul 11, 2022
* `strings`: More roubust type checking of keys/IDs, coerce `int`-like types to `hash_t`

* Preserve existing public API behaviour

* Fix return type

* Replace `bool` with `bint`, rename to `_try_coerce_to_hash`, replace `id` with `hash`

* Avoid unnecessary re-encoding and re-calculation of strings and hashs respectively

* Rename variables named `hash`
Add comment on early return
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests and improvements perf / speed Performance: speed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants