Build Scanner with nested column projection and limit / offset push down. #61

Merged: 17 commits, Jul 31, 2022

@eddyxu eddyxu commented Jul 29, 2022

No description provided.

@eddyxu eddyxu self-assigned this Jul 29, 2022
@eddyxu eddyxu added the c++ C++ issues label Jul 29, 2022
@eddyxu eddyxu requested a review from changhiskhan July 31, 2022 04:50
@eddyxu eddyxu marked this pull request as ready for review July 31, 2022 04:51

eddyxu commented Jul 31, 2022

A code snippet to try this out:

import time

import duckdb
import lance

ds = lance.dataset("s3://eto-ops-testing/coco.lance")

start = time.time()
scan = lance.scanner(ds, columns=["annotations.label"], limit=10)
print(duckdb.query(
    "SELECT annotations.label, count(1) FROM (SELECT UNNEST(annotations) as annotations FROM scan) GROUP BY 1"))
end = time.time()
print(f"Query time: {end - start}")

builder.get().Project([tobytes(c) for c in columns])
if filter is not None:
    builder.get().Filter(_bind(filter, dataset.schema()))
if limit is not None:

Silly edge case, but offset is ignored if limit isn't specified. May want to document that.

IIRC, the SQL standard does not support OFFSET without LIMIT.
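
For readers following the thread, the edge case can be illustrated with a minimal pure-Python sketch. This is not the Lance scanner code: `apply_limit_offset` is a hypothetical helper, and it only assumes the behavior described in the review comment, namely that the offset takes effect only when a limit is also given.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def apply_limit_offset(
    rows: Iterable[T], limit: Optional[int] = None, offset: int = 0
) -> Iterator[T]:
    """Hypothetical helper mirroring the behavior described in the review.

    The offset is honored only when a limit is given; without a limit,
    every row is returned and the offset is silently ignored.
    """
    if limit is None:
        # Edge case from the review: no limit means the offset has no effect.
        return iter(rows)
    # Skip `offset` rows, then yield at most `limit` rows.
    return islice(rows, offset, offset + limit)


rows = list(range(10))
with_limit = list(apply_limit_offset(rows, limit=3, offset=2))
without_limit = list(apply_limit_offset(rows, offset=2))

print(with_limit)     # rows 2, 3, 4
print(without_limit)  # all ten rows; the offset was ignored
```

A caller who passes only an offset gets the full result back, which is why the reviewer suggests documenting the behavior.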

@eddyxu eddyxu merged commit 2586532 into main Jul 31, 2022
@eddyxu eddyxu deleted the lei/py_filter branch July 31, 2022 07:07