Multicorn aggregation/grouping pushdown support #1

Merged: 8 commits, Dec 27, 2021

gruuya commented Dec 13, 2021

Enable aggregation/grouping support offered in Multicorn through the accompanying PR splitgraph/Multicorn#1.

  • Implements can_pushdown_upperrel, returning the details Multicorn needs to decide whether to push an operation down and what to push to the Python side.
  • Handles the differences in query construction and response handling between queries with and without aggregations.
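
As a rough illustration of the first point, the response could look something like the sketch below. The class name, the dictionary keys and the function list are assumptions made for illustration only; the actual contract is defined by the accompanying Multicorn PR.

```python
# Hypothetical sketch: the key names and the list of functions below are
# illustrative assumptions, not the confirmed Multicorn contract.
SUPPORTED_AGG_FUNCS = ["avg", "max", "min", "sum"]


class ElasticsearchFDWSketch:
    """Stand-in for the FDW class; only the capability hook is shown."""

    def can_pushdown_upperrel(self):
        # Report what this FDW is able to evaluate remotely, so that the
        # caller can decide whether to push the upper relation down.
        return {
            "groupby_supported": True,
            "agg_functions": list(SUPPORTED_AGG_FUNCS),
        }
```

Returning a plain dictionary keeps the decision on the Multicorn side: the FDW only advertises capabilities and does not inspect the query plan itself.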

Here are two instructive examples of the translated aggregation queries:

  1. With GROUP BY
sgr@localhost:splitgraph> explain select column5, column4, min(column2) from es.iris group by column4, column5
+--------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                 |
|--------------------------------------------------------------------------------------------|
| Foreign Scan  (cost=1.00..1.00 rows=1 width=1)                                             |
|   Multicorn: Elasticsearch query to <Elasticsearch([{'host': 'es01-test', 'port': 9200}])> |
|   Multicorn: Query: {                                                                      |
|     "aggs": {                                                                              |
|         "group_buckets": {                                                                 |
|             "composite": {                                                                 |
|                 "sources": [                                                               |
|                     {                                                                      |
|                         "column5": {                                                       |
|                             "terms": {                                                     |
|                                 "field": "column5"                                         |
|                             }                                                              |
|                         }                                                                  |
|                     },                                                                     |
|                     {                                                                      |
|                         "column4": {                                                       |
|                             "terms": {                                                     |
|                                 "field": "column4"                                         |
|                             }                                                              |
|                         }                                                                  |
|                     }                                                                      |
|                 ],                                                                         |
|                 "size": 1000                                                               |
|             },                                                                             |
|             "aggregations": {                                                              |
|                 "min.column2": {                                                           |
|                     "min": {                                                               |
|                         "field": "column2"                                                 |
|                     }                                                                      |
|                 }                                                                          |
|             }                                                                              |
|         }                                                                                  |
|     }                                                                                      |
| }                                                                                          |
+--------------------------------------------------------------------------------------------+
  2. Without GROUP BY
sgr@localhost:splitgraph> explain analyze select sum(column4), max(column4), sum(column4), min(column2), avg(column3)  from es.iris
+--------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                 |
|--------------------------------------------------------------------------------------------|
| Foreign Scan  (cost=1.00..1.00 rows=1 width=1) (actual time=2.344..2.350 rows=1 loops=1)   |
|   Multicorn: Elasticsearch query to <Elasticsearch([{'host': 'es01-test', 'port': 9200}])> |
|   Multicorn: Query: {                                                                      |
|     "aggs": {                                                                              |
|         "sum.column4": {                                                                   |
|             "sum": {                                                                       |
|                 "field": "column4"                                                         |
|             }                                                                              |
|         },                                                                                 |
|         "max.column4": {                                                                   |
|             "max": {                                                                       |
|                 "field": "column4"                                                         |
|             }                                                                              |
|         },                                                                                 |
|         "min.column2": {                                                                   |
|             "min": {                                                                       |
|                 "field": "column2"                                                         |
|             }                                                                              |
|         },                                                                                 |
|         "avg.column3": {                                                                   |
|             "avg": {                                                                       |
|                 "field": "column3"                                                         |
|             }                                                                              |
|         }                                                                                  |
|     }                                                                                      |
| }                                                                                          |
| Planning Time: 3.123 ms                                                                    |
| Execution Time: 2.467 ms                                                                   |
+--------------------------------------------------------------------------------------------+
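
A minimal sketch of how queries shaped like the ones above could be assembled. The function name `build_agg_query` and the shape of its `aggs` argument (output name mapped to a function/column pair) are assumptions for illustration; only the resulting JSON structure is taken from the plans shown here.

```python
import json


def build_agg_query(aggs=None, group_clauses=None, page_size=1000):
    """Build an Elasticsearch aggregation body (illustrative sketch).

    aggs: dict like {"min.column2": {"function": "min", "column": "column2"}}
          (this input shape is an assumption, not the PR's actual format)
    group_clauses: list of column names to group by, or None
    """
    # One metric aggregation per requested aggregate, keyed by output name.
    metrics = {
        name: {agg["function"]: {"field": agg["column"]}}
        for name, agg in (aggs or {}).items()
    }

    if not group_clauses:
        # No GROUP BY: metrics sit directly under the top-level "aggs".
        return {"aggs": metrics}

    # GROUP BY: a composite aggregation with one "terms" source per column.
    buckets = {
        "composite": {
            "sources": [{col: {"terms": {"field": col}}} for col in group_clauses],
            "size": page_size,
        }
    }
    if metrics:
        # Metrics are nested inside each bucket as sub-aggregations.
        buckets["aggregations"] = metrics
    return {"aggs": {"group_buckets": buckets}}


if __name__ == "__main__":
    query = build_agg_query(
        aggs={"min.column2": {"function": "min", "column": "column2"}},
        group_clauses=["column5", "column4"],
    )
    print(json.dumps(query, indent=4))
```

Because `aggs` is a dictionary keyed by output name, a repeated aggregate such as the second `sum(column4)` in the query above collapses into a single `sum.column4` entry, which matches what the plan shows.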

CU-1t1wycg

"""Convert a list of Multicorn quals to an ElasticSearch query"""
ignore_columns = ignore_columns or []

# Aggreagtion/grouping queries

typo: aggregation

}
}

if aggs is not None:

Can we be in a situation where aggs is None and group_clauses isn't? Is it basically something like SELECT a, b, c FROM T GROUP BY a, b, c which is the same as SELECT DISTINCT a, b, c FROM T?

Author

Yes we can, that is a good example. Here's a concrete one:

sgr@localhost:splitgraph> explain select column5, column4 from es.iris group by column4, column5
+--------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                 |
|--------------------------------------------------------------------------------------------|
| Foreign Scan  (cost=1.00..1.00 rows=1 width=1)                                             |
|   Multicorn: Elasticsearch query to <Elasticsearch([{'host': 'es01-test', 'port': 9200}])> |
|   Multicorn: Query: {                                                                      |
|     "aggs": {                                                                              |
|         "group_buckets": {                                                                 |
|             "composite": {                                                                 |
|                 "sources": [                                                               |
|                     {                                                                      |
|                         "column5": {                                                       |
|                             "terms": {                                                     |
|                                 "field": "column5"                                         |
|                             }                                                              |
|                         }                                                                  |
|                     },                                                                     |
|                     {                                                                      |
|                         "column4": {                                                       |
|                             "terms": {                                                     |
|                                 "field": "column4"                                         |
|                             }                                                              |
|                         }                                                                  |
|                     }                                                                      |
|                 ],                                                                         |
|                 "size": 1000                                                               |
|             }                                                                              |
|         }                                                                                  |
|     }                                                                                      |
| }                                                                                          |
+--------------------------------------------------------------------------------------------+
EXPLAIN
Time: 0.012s
sgr@localhost:splitgraph> select column5, column4 from es.iris group by column4, column5
+-----------------+---------+
| column5         | column4 |
|-----------------+---------|
| Iris-setosa     | 0.1     |
| Iris-setosa     | 0.2     |
| Iris-setosa     | 0.3     |
| Iris-setosa     | 0.4     |
| Iris-setosa     | 0.5     |
| Iris-setosa     | 0.6     |
| Iris-versicolor | 1.0     |
| Iris-versicolor | 1.1     |
| Iris-versicolor | 1.2     |
| Iris-versicolor | 1.3     |
| Iris-versicolor | 1.4     |
| Iris-versicolor | 1.5     |
| Iris-versicolor | 1.6     |
| Iris-versicolor | 1.7     |
| Iris-versicolor | 1.8     |
| Iris-virginica  | 1.4     |
| Iris-virginica  | 1.5     |
| Iris-virginica  | 1.6     |
| Iris-virginica  | 1.7     |
| Iris-virginica  | 1.8     |
| Iris-virginica  | 1.9     |
| Iris-virginica  | 2.0     |
| Iris-virginica  | 2.1     |
| Iris-virginica  | 2.2     |
| Iris-virginica  | 2.3     |
| Iris-virginica  | 2.4     |
| Iris-virginica  | 2.5     |
+-----------------+---------+
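
For the reverse direction, here is a hedged sketch of turning an Elasticsearch aggregation response into rows. The function name `rows_from_response` and its arguments are assumptions for illustration. The response layout follows the standard Elasticsearch format: composite buckets carry their grouping values under `key`, and metric aggregations expose a `value`. The sample response uses two rows from the output above, with made-up `doc_count` values.

```python
def rows_from_response(response, agg_names=None, group_clauses=None):
    """Yield one dict per result row (illustrative sketch).

    agg_names: names of the metric aggregations, e.g. ["min.column2"]
    group_clauses: grouping columns, or None for a single-row aggregate
    """
    agg_names = agg_names or []
    aggregations = response["aggregations"]

    if not group_clauses:
        # No GROUP BY: a single row built from the top-level metrics.
        yield {name: aggregations[name]["value"] for name in agg_names}
        return

    # GROUP BY: one row per composite bucket. With no metrics requested,
    # the row holds only the grouping columns (the SELECT DISTINCT-like case).
    for bucket in aggregations["group_buckets"]["buckets"]:
        row = {col: bucket["key"][col] for col in group_clauses}
        row.update({name: bucket[name]["value"] for name in agg_names})
        yield row


if __name__ == "__main__":
    # Hand-written sample response; doc_count values are illustrative only.
    sample = {
        "aggregations": {
            "group_buckets": {
                "buckets": [
                    {"key": {"column5": "Iris-setosa", "column4": 0.1}, "doc_count": 1},
                    {"key": {"column5": "Iris-setosa", "column4": 0.2}, "doc_count": 1},
                ]
            }
        }
    }
    for row in rows_from_response(sample, group_clauses=["column5", "column4"]):
        print(row)
```

This sketch reads a single page of buckets. A composite aggregation returns at most `size` buckets per request, so a complete implementation would need to keep requesting pages using the `after_key` from each response; that loop is omitted here.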

mildbyte commented Dec 14, 2021

Would be nice to have some unit tests for this ES query handling (converting aggs/group_clauses into ES queries and back) + handling the EXPLAIN/get_rel_size in Multicorn as you mentioned (I see EXPLAIN works fine here?). LGTM otherwise.

UPD Dec 21 -- the tests will live in the splitgraph repo (including checking the ES queries) so this is fine

gruuya merged commit c47afb3 into master on Dec 27, 2021
gruuya deleted the es-multicorn-agg-pushdown-poc-cu-1t1wycg branch on December 27, 2021 15:29