feat: allow fixed size tensors to be used as query vectors #1736
QueryVectorLike = Union[
    pd.Series,
    pa.Array,
    pa.Scalar,
Why can we use scalar here?
It's to handle FixedSizeListScalar (and fixed shape tensor scalars). Pretty often I find myself doing something like:
queries = dataset.take(random_indices)
for query in queries:
    # Here `query` is a FixedSizeListScalar
    ...
column_field = self.ds.schema.field(column)
column_type = column_field.type
if hasattr(column_type, "storage_type"):
Do we still need to check storage type for pyarrow > 12?
Not for fixed shape tensor array. Once we are using pyarrow 13 we can directly check for FixedShapeTensorArray. However, another advantage of using storage_type is that we are future-proofing ourselves against other possible extension types.