Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predicate push down for timestamp with mongo connector #13239

Open
atulaltizon opened this issue Aug 16, 2019 · 1 comment
Open

Predicate push down for timestamp with mongo connector #13239

atulaltizon opened this issue Aug 16, 2019 · 1 comment

Comments

@atulaltizon
Copy link

I am using the mongo connector. I have a collection of documents containing a timestamp field.
Presto's schema discovery correctly describes the field's datatype as timestamp.
There is a mongo index on this field.
When I write an SQL query with a
WHERE clause that field e.g.
SELECT * from foo where ts > TIMESTAMP '2019-08-01 00:00:00.000'

It tends to do a collscan instead of an index scan.
I know that presto will do a collscan if functions are used in the where clause because the underlying datbase tech might not support that SQL function implicitly.
However, filtering based on timestamps is a basic need and typically supported by all databases.
I google for similar issues but couldn't find any answers.
If this is already supported, any idea or pointers on how to cause it to do an index scan when using timestamps in where clauses is highly appreciated.

If not can you kindly take this as an issue since this might be relevant to a lot of presto users.

Thanks in advance!
Atul

@ankoehn
Copy link

ankoehn commented May 26, 2020

I have the same issue: collection contains 40Mio data records. Query "SELECT * from foo where ts > TIMESTAMP '2020-05-12 00:00:00.000" need just to load ~40k data records form mongo. But Presto loads all the 40Mio data records and does the filter in its engine. It takes just too long (~10 minutes).

Is there a way to push down the filtering based on timestamps? @atulaltizon did you solve this issue somehow?

Update (2020-06-02): to solve this issue we changed the formats. We store a date as a string e.g. '2020-05-31' in and a timestamp in epoch time format e.g. 1591110705.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants