New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/speed up binary vector decoding #96716
Feature/speed up binary vector decoding #96716
Conversation
Hi @benwtrent, I've created a changelog YAML for you. |
Pinging @elastic/es-search (Team:Search) |
Baseline is current main (with Panama Vector Module)
|
…benwtrent/elasticsearch into feature/speed-up-binary-vector-decoding
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool change! I had a few questions, but otherwise LGTM.
@@ -64,6 +65,8 @@ | |||
* A {@link FieldMapper} for indexing a dense vector of floats. | |||
*/ | |||
public class DenseVectorFieldMapper extends FieldMapper { | |||
public static final Version MAGNITUDE_STORED_INDEX_VERSION = Version.V_7_5_0; | |||
public static final Version LITTLE_ENDIAN_FLOAT_STORED_INDEX_VERSION = Version.V_8_9_0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be the last version prior to your change's version using the new TransportVersion
constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jdconrad this is a transport/wire serialization thing. its an index version thing. From my understanding index versioning is different. I will see what I can find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked @thecoop and me just using Version here is OK. We may need to update it to IndexVersion
depending on which commits make it in first :)
@@ -890,18 +907,18 @@ private Field parseKnnVector(DocumentParserContext context) throws IOException { | |||
private Field parseBinaryDocValuesVector(DocumentParserContext context) throws IOException { | |||
// encode array of floats as array of integers and store into buf | |||
// this code is here and not int the VectorEncoderDecoder so not to create extra arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to your change, but would you mind fixing the not int
-> not in
?
FloatBuffer fb = ByteBuffer.wrap(vectorBR.bytes, vectorBR.offset, vectorBR.length) | ||
.order(ByteOrder.LITTLE_ENDIAN) | ||
.asFloatBuffer(); | ||
fb.get(vector); | ||
} else { | ||
ByteBuffer byteBuffer = ByteBuffer.wrap(vectorBR.bytes, vectorBR.offset, vectorBR.length); | ||
for (int dim = 0; dim < vector.length; dim++) { | ||
vector[dim] = byteBuffer.getFloat((dim * Float.BYTES) + vectorBR.offset); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could .asFloatBuffer()
not be used for both little and big endian?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.asFloatBuffer()
is marginally slower for BE
. These implementations are the fastest I could get them.
…nary-vector-decoding
…nary-vector-decoding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
return indexVersion.onOrAfter(LITTLE_ENDIAN_FLOAT_STORED_INDEX_VERSION) | ||
? ByteBuffer.wrap(new byte[numBytes]).order(ByteOrder.LITTLE_ENDIAN) | ||
: ByteBuffer.wrap(new byte[numBytes]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Encoding floats in little endian format provides much faster decoding. This commit takes all indices created in 8.9.0+ and stores binary vectors as little endian. closes: elastic#96710
Encoding floats in little endian format provides much faster decoding.
This commit takes all indices created in 8.9.0+ and stores binary vectors as little endian.
closes: #96710