[DiskBBQ] Add bulk scoring for int7 centroid scoring #138204
Conversation
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Hi @benwtrent, I've created a changelog YAML for you.
    if (dims > DOT7U_STRIDE_BYTES_LEN) {
        for (size_t c = 0; c < count; c++) {
            int i = 0;
            i += dims & ~(DOT7U_STRIDE_BYTES_LEN - 1);
Nit: this could be extracted out of the loop. I'd expect a decent optimizer to do that, but you never know :)
ldematte
left a comment
Looks good to me; of course we'd need to build and publish the native lib and update it in libs/native/libraries/build.gradle.
I think what we did in the past was to raise a separate PR with the native code changes (basically, the libs/simdec/native files), get it approved and merged, publish the library, and then in a separate PR make the Java changes and bump the Gradle version.
An alternative is to do everything within the same PR:
The pro of this approach is that the changes to the native lib are immediately tested in the Java layer. Let me know if you prefer this approach, and I can take care of steps 1-3 for you (or give you more detailed instructions).
I think a separate PR is fine for now (I can open that). But this does show that we need a nicer process (possibly a tag or CI job?) that will publish and use the binaries without impacting main until things are merged.
native PR: #138239
I love this; it should leverage the bulk-scoring capacity in DiskBBQ. I am +1 here, but I'll let Lorenzo review the native stuff; otherwise LGTM.
Very nice! This type of bulk scoring (against every vector in a contiguous block) is good for this particular use case, so no issue with this. I mention it because I thought prefetching could help, but I imagine it would help more for the Lucene bulkScore, where we score against a given set of ordinals in the dataset. A quick check on my Mac laptop shows just a little improvement!?
I can add |
Prefetching is a strange beast :) I think prefetching could help in cases where we don't scan the two "matrices" (arrays of vectors) linearly, but instead use an array of offsets. Even in that case I would benchmark it, as I'm not sure it would help much if the access is really sparse/random. It may also be better to "hint" the compiler/processor to prefetch, e.g. by manually unrolling part of the loop (so that the processor "knows" to go and fetch some data while it's still processing the current vector).
To be clear: this should be exactly what the lines suggested by @benwtrent would do, but sometimes the internal prefetch logic "knows better" and makes better decisions if left alone (e.g. it may choose not to prefetch at all, to avoid cache thrashing). The only way to know is to benchmark, IMO.
I will happily leave it for now :D. It's fun that we already get such great results with just this simple change.
ChrisHegarty
left a comment
LGTM
This provides a minor but measurable speed improvement.
JMH shows a nicer story:
this PR
baseline:
I will need help rebuilding and publishing the binaries :)