Feature: support scalar indices on nested columns #929

Open
westonpace opened this issue Feb 5, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@westonpace
Contributor

SDK

Python

Description

LanceDB has strong support for nested fields, which are quite common in ML/AI workloads. However, it is not currently possible to create a scalar index on a nested field. We should add support for that.

@westonpace westonpace added the enhancement New feature or request label Feb 5, 2024
@westonpace
Contributor Author

Inspired by #899 (comment)

@BTheunissen

This would be great. Would PyArrow list operations such as array_has_any be able to take advantage of the scalar indices in this case?

@westonpace
Contributor Author

By "nested fields" I was mostly thinking of struct fields. A nested list field would require some more work. The btree implementation expects each node to be a (value, row_address) tuple.

Once you have a list field, you have more than one value per row, and that assumption breaks.
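
To make the difference concrete, here is a minimal pure-Python sketch. It is not the actual btree code: a sorted list of (value, row_address) tuples stands in for the index entries, and the rows, the `meta.year` struct field, the `tags` list field, and both helper functions are hypothetical names made up for illustration.

```python
import bisect

# Hypothetical rows: "meta" is a struct column, "tags" is a list column.
rows = [
    {"meta": {"year": 2021}, "tags": ["cat", "dog"]},
    {"meta": {"year": 2019}, "tags": ["dog"]},
    {"meta": {"year": 2021}, "tags": []},
    {"meta": {"year": 2023}, "tags": ["bird", "cat"]},
]


def build_sorted_index(pairs):
    """Sort (value, row_address) pairs; a stand-in for ordered index entries."""
    return sorted(pairs)


def lookup(index, value):
    """Return the row addresses whose indexed value equals `value`."""
    lo = bisect.bisect_left(index, (value, -1))
    hi = bisect.bisect_right(index, (value, float("inf")))
    return [addr for _, addr in index[lo:hi]]


# Struct field: exactly one (value, row_address) entry per row.
struct_index = build_sorted_index(
    (row["meta"]["year"], addr) for addr, row in enumerate(rows)
)

# List field: a row contributes zero, one, or many entries, so the
# "one value per row" assumption no longer holds.
list_index = build_sorted_index(
    (tag, addr) for addr, row in enumerate(rows) for tag in row["tags"]
)

print(len(rows), len(struct_index), len(list_index))
print(lookup(struct_index, 2021))
print(lookup(list_index, "cat"))
```

The struct index has as many entries as there are rows, while the list index has a different count, and a row with an empty list does not appear in it at all. Exploding the list like this is only one possible workaround, shown here as an assumption rather than a proposed design.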

I had thought about this in the past. If your value set is somewhat bounded (e.g. fewer than tens of thousands of possible values), then a bitmap index could be used for array_has_any.
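
Roughly, the bitmap idea could look like the sketch below. It is an illustration under stated assumptions, not an existing implementation: plain Python integers stand in for the bitmaps (a real index would likely use a compressed bitmap format), and `build_bitmap_index`, `array_has_any`, and the `tags` data are hypothetical names.

```python
from collections import defaultdict


def build_bitmap_index(list_column):
    """Map each distinct value to an int bitmap; bit i is set if row i contains it."""
    bitmaps = defaultdict(int)
    for row_id, values in enumerate(list_column):
        for value in values:
            bitmaps[value] |= 1 << row_id
    return dict(bitmaps)


def array_has_any(bitmaps, needles):
    """Return sorted row ids whose list contains at least one of `needles`."""
    combined = 0
    for needle in needles:
        # OR together the bitmaps of every requested value.
        combined |= bitmaps.get(needle, 0)
    return [i for i in range(combined.bit_length()) if (combined >> i) & 1]


# Hypothetical list column with a small, bounded set of distinct values.
tags = [["cat", "dog"], ["dog"], [], ["bird", "cat"]]
tag_bitmaps = build_bitmap_index(tags)

print(array_has_any(tag_bitmaps, ["cat", "bird"]))
print(array_has_any(tag_bitmaps, ["fish"]))
```

There is one bitmap per distinct value, which is why this only makes sense when the value set is bounded: the number of bitmaps grows with cardinality, while a query costs one OR per requested value.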
