Multicorn aggregation/grouping pushdown support #1
pg_es_fdw/_es_query.py
Outdated
"""Convert a list of Multicorn quals to an ElasticSearch query"""
ignore_columns = ignore_columns or []

# Aggreagtion/grouping queries
typo: aggregation
}
}

if aggs is not None:
Can we be in a situation where aggs is None and group_clauses isn't? Is it basically something like SELECT a, b, c FROM T GROUP BY a, b, c, which is the same as SELECT DISTINCT a, b, c FROM T?
Yes, we can; that is a good example. Here's a concrete one:
sgr@localhost:splitgraph> explain select column5, column4 from es.iris group by column4, column5
+--------------------------------------------------------------------------------------------+
| QUERY PLAN |
|--------------------------------------------------------------------------------------------|
| Foreign Scan (cost=1.00..1.00 rows=1 width=1) |
| Multicorn: Elasticsearch query to <Elasticsearch([{'host': 'es01-test', 'port': 9200}])> |
| Multicorn: Query: { |
| "aggs": { |
| "group_buckets": { |
| "composite": { |
| "sources": [ |
| { |
| "column5": { |
| "terms": { |
| "field": "column5" |
| } |
| } |
| }, |
| { |
| "column4": { |
| "terms": { |
| "field": "column4" |
| } |
| } |
| } |
| ], |
| "size": 1000 |
| } |
| } |
| } |
| } |
+--------------------------------------------------------------------------------------------+
EXPLAIN
Time: 0.012s
sgr@localhost:splitgraph> select column5, column4 from es.iris group by column4, column5
+-----------------+---------+
| column5 | column4 |
|-----------------+---------|
| Iris-setosa | 0.1 |
| Iris-setosa | 0.2 |
| Iris-setosa | 0.3 |
| Iris-setosa | 0.4 |
| Iris-setosa | 0.5 |
| Iris-setosa | 0.6 |
| Iris-versicolor | 1.0 |
| Iris-versicolor | 1.1 |
| Iris-versicolor | 1.2 |
| Iris-versicolor | 1.3 |
| Iris-versicolor | 1.4 |
| Iris-versicolor | 1.5 |
| Iris-versicolor | 1.6 |
| Iris-versicolor | 1.7 |
| Iris-versicolor | 1.8 |
| Iris-virginica | 1.4 |
| Iris-virginica | 1.5 |
| Iris-virginica | 1.6 |
| Iris-virginica | 1.7 |
| Iris-virginica | 1.8 |
| Iris-virginica | 1.9 |
| Iris-virginica | 2.0 |
| Iris-virginica | 2.1 |
| Iris-virginica | 2.2 |
| Iris-virginica | 2.3 |
| Iris-virginica | 2.4 |
| Iris-virginica | 2.5 |
+-----------------+---------+
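For reference, the query shown in the plan above can be reproduced with a small sketch like the one below. The function name `build_group_query` and its `size` parameter are hypothetical and are not taken from `_es_query.py`; the sketch only assumes that `group_clauses` arrives as a list of column names and that no aggregates were requested.

```python
def build_group_query(group_clauses, size=1000):
    """Hypothetical helper: build the ES body for a GROUP BY without aggregates.

    Each grouping column becomes one "terms" source of a composite
    aggregation, in the order the columns were given.
    """
    sources = [{column: {"terms": {"field": column}}} for column in group_clauses]
    return {
        "aggs": {
            "group_buckets": {
                "composite": {"sources": sources, "size": size}
            }
        }
    }


# Same column order as in the EXPLAIN output above.
query = build_group_query(["column5", "column4"])
```

With these inputs the resulting dictionary has the same structure as the query printed by EXPLAIN: two `terms` sources under a single `composite` aggregation named `group_buckets`, with `size` set to 1000.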
It would be nice to have some unit tests for this ES query handling (converting aggs/group_clauses into ES queries and back).
UPD Dec 21: the tests will live in the splitgraph repo (including checking the ES queries), so this is fine.
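As an illustration of the "and back" direction, a test could look roughly like the sketch below. It assumes the standard shape of an Elasticsearch composite aggregation response (a `buckets` list whose entries carry a `key` mapping; real responses also include fields such as `doc_count`, omitted here). The helper `composite_buckets_to_rows` and the test name are hypothetical and do not refer to code in this PR or in the splitgraph repo.

```python
def composite_buckets_to_rows(response, agg_name="group_buckets"):
    """Hypothetical helper: turn composite aggregation buckets into row dicts."""
    buckets = response["aggregations"][agg_name]["buckets"]
    # Each bucket key maps a grouping column to its value for that group.
    return [dict(bucket["key"]) for bucket in buckets]


def test_composite_buckets_to_rows():
    # Trimmed-down response using two groups from the iris example.
    response = {
        "aggregations": {
            "group_buckets": {
                "buckets": [
                    {"key": {"column5": "Iris-setosa", "column4": 0.1}},
                    {"key": {"column5": "Iris-setosa", "column4": 0.2}},
                ]
            }
        }
    }
    rows = composite_buckets_to_rows(response)
    assert rows == [
        {"column5": "Iris-setosa", "column4": 0.1},
        {"column5": "Iris-setosa", "column4": 0.2},
    ]


test_composite_buckets_to_rows()
```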
Enable aggregation/grouping support offered in Multicorn through the accompanying PR splitgraph/Multicorn#1.
can_pushdown_upperrel
with relevant details so that Multicorn can decide whether and what to push to the Python side.
Here are two instructive examples of the translated aggregation queries:
GROUP BY
GROUP BY
CU-1t1wycg