DiskBBQ always block encode centroids #138296

@benwtrent

Description

We block encode centroids into blocks of 16, which is fine when they don't also have their own parents. However, we frequently end up with clusters of centroids that contain fewer than 16 centroids, which harms query throughput significantly.

Consequently, we should always block encode centroids, even a "tail" of fewer than 16 vectors.
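
To illustrate the idea, here is a minimal Python sketch of treating a short tail as a block of its own, so that every centroid goes through the same bulk path. It is not the DiskBBQ implementation: the names `split_into_blocks`, `score_block`, and `bulk_score` are hypothetical, the dot product stands in for whatever quantized scoring is actually used, and the on-disk layout is not modeled at all.

```python
from typing import List, Sequence, Tuple

BLOCK_SIZE = 16  # block size mentioned in this issue


def split_into_blocks(num_centroids: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (start, length) for every block; the last block may be a short tail."""
    blocks = []
    for start in range(0, num_centroids, block_size):
        blocks.append((start, min(block_size, num_centroids - start)))
    return blocks


def score_block(query: Sequence[float], block: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical bulk scorer: one call scores a whole block (plain dot product here)."""
    return [sum(q * c for q, c in zip(query, centroid)) for centroid in block]


def bulk_score(query: Sequence[float], centroids: Sequence[Sequence[float]],
               block_size: int = BLOCK_SIZE) -> List[float]:
    """Score all centroids block by block, sending the tail through the same bulk path."""
    scores: List[float] = []
    for start, length in split_into_blocks(len(centroids), block_size):
        scores.extend(score_block(query, centroids[start:start + length]))
    return scores
```

With this shape, a cluster of 5 centroids yields a single block `(0, 5)` instead of five individually scored vectors, and a cluster of 20 yields `(0, 16)` followed by a tail block `(16, 4)`.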

In parallel, we should likely increase the block size. Bulk scoring off heap provides a significant speedup for centroid scoring (see #138204).
