Enable aggregation and grouping pushdown in the engine #581

gruuya · 2021-12-15T11:52:19Z

Bump Multicorn to a new version that supports optional pushdown of aggregation/grouping, as per "Implement aggregation and grouping pushdown" (Multicorn#1).
Bump the Elasticsearch FDW fork to a version that uses this mechanism ("Multicorn aggregation/grouping pushdown support", postgres-elasticsearch-fdw#1).
Add a couple of helper libs/configs to the engine debug image.
Add a full end-to-end setup for aggregation pushdown, including an Elasticsearch server image seeded with fixture data to act as a foreign data source in integration tests.

mildbyte · 2021-12-16T12:00:17Z

All existing tests pass with the new Multicorn code, which is great news. Let's hold off on merging this until we have test coverage of the new groupby features, though.

mildbyte · 2021-12-21T10:02:34Z

test/architecture/src/esorigin/Dockerfile

+    chown -R elasticsearch:elasticsearch /data && \
+    echo 'path.data: /data' >> config/elasticsearch.yml && \
+    echo 'discovery.type: "single-node"' >> config/elasticsearch.yml && \
+    echo "xpack.security.enabled: false" >> config/elasticsearch.yml


I'd add some of these here too so that ES doesn't refuse to start with high disk usage

cluster.routing.allocation.disk.watermark.flood_stage: "99%"
cluster.routing.allocation.disk.watermark.high: "99%"

mildbyte · 2021-12-21T10:20:45Z

test/splitgraph/commands/test_aggregation_pushdown.py

+    for row in result:
+        assert row == (gender, age)
+
+        age += 1


This assertion is kind of difficult to mentally unroll. I recently started using pytest-snapshot (https://pypi.org/project/pytest-snapshot/, usage example: https://github.com/splitgraph/splitgraph/blob/master/test/splitgraph/cloud/project/test_merging.py#L21) that can store expected outputs as files and then easily regenerate them, so that you can essentially approve a change to a test through version control and see what changed. I think it's worth changing some of these (especially the bigger query outputs where you're asserting the whole output) to use pytest-snapshot.

Fair point, checking out pytest-snapshot

mildbyte · 2021-12-21T10:22:58Z

test/splitgraph/commands/test_aggregation_pushdown.py

+    # DISTINCT queries are not going to be pushed down
+    result = get_engine().run_sql("EXPLAIN SELECT COUNT(DISTINCT city) FROM es.account")
+    assert len(result) > 2
+    assert _extract_query_from_explain(result) == _bare_sequential_scan


Idea for a test: some kind of a JOIN here, e.g. a JOIN between two aggregation results on es.account (I think it won't be pushed down, but useful to test anyway)

Another idea: a nested expression, e.g. SELECT avg(age * balance) FROM es.account GROUP BY state to make sure this doesn't crash

Yet another: a window query that I think might be implemented as a sort/groupby, e.g. the balance of the oldest person in each state (haven't tested that the syntax is correct):

SELECT age, balance, state FROM (SELECT age, balance, state, RANK() OVER (PARTITION BY state ORDER BY age DESC) AS rank_number FROM es.account) ranked WHERE rank_number = 1

Thanks, good ideas!

I'll add them to the test suites.
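
For reference, the window-query idea can be checked for syntax and expected semantics locally before pointing it at the foreign table. The sketch below uses an in-memory SQLite database (3.25 or newer, for window function support) as a stand-in for Postgres, with made-up rows rather than the real fixture data. Because the alias of a window function cannot be referenced in the `WHERE` clause of the same `SELECT`, the ranking is wrapped in a subquery. This says nothing about whether the query is pushed down; it only shows what a correct result should look like.

```python
import sqlite3

# Hypothetical rows (age, balance, state); not the real Elasticsearch seed data.
ROWS = [
    (30, 1000, "CA"),
    (45, 2500, "CA"),
    (28, 700, "NY"),
    (52, 4100, "NY"),
    (52, 300, "NY"),
]

# Rank accounts by age within each state, then keep only the top rank.
OLDEST_PER_STATE = """
SELECT age, balance, state
FROM (
    SELECT age, balance, state,
           RANK() OVER (PARTITION BY state ORDER BY age DESC) AS rank_number
    FROM account
) ranked
WHERE rank_number = 1
ORDER BY state, balance
"""


def oldest_per_state(rows):
    """Load rows into an in-memory table and return the oldest account(s) per state."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE account (age INTEGER, balance INTEGER, state TEXT)")
        conn.executemany("INSERT INTO account VALUES (?, ?, ?)", rows)
        return conn.execute(OLDEST_PER_STATE).fetchall()
    finally:
        conn.close()
```

Note that `RANK()` keeps ties, so a state with two equally old account holders yields two rows; `ROW_NUMBER()` would be the choice if exactly one row per state is wanted.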

Include new version of Multicorn with aggregation and grouping pushdown

df6dcbc

gruuya requested a review from mildbyte December 15, 2021 11:52

gruuya self-assigned this Dec 15, 2021

gruuya added 6 commits December 16, 2021 12:07

Add ElasticSearch origin server with seed data as test fixture

46d3a95

Add basic test on actual Elasticsearch foreign server

4a1cdc2

Fix the elasticsearch test

2900047

Add aggregation test and try to see if EXPLAIN works in CI

135d20e

Get the first aggregation pushdown test working

fd388e5

Extend aggregation pushdown tests

5a82745

mildbyte force-pushed the master branch from 35f44e1 to fb8bbc6 December 17, 2021 19:31

gruuya added 5 commits December 17, 2021 20:40

Add test for HAVING and ORDER BY (not) pushdown

f4038e1

Test subquery aggregation pushdown

99624af

Test aggregation of a sub-aggregation pushdown and correctness

f79eb36

Extend tests for things that will be pushed down in v2

9d1d0a8

Merge branch 'master' into engine-aggregation-pushdown-cu-1x57q56

e864e76

mildbyte reviewed Dec 21, 2021

gruuya added 6 commits December 22, 2021 09:50

Extend tests and use snapshots for storing large output assertions

f7047e5

Try to fix the build job syntax complaint

ecad180

Bump Multicorn to ref with fixed memory leak

33387be

Bump PostGIS version in the engine image

972bb66

Add Valgrind as a build option for the engine debug image

227e1ad

Extend the Dockerfile for the PG debug image so that it optionally compiles Postgres with Valgrind support and changes the container entrypoint so that PG runs under Valgrind on startup (`use_valgrind=1`).

Bump Multicorn and ES FDW submodule refs

bf30141

gruuya merged commit f41f924 into master Dec 27, 2021

gruuya deleted the engine-aggregation-pushdown-cu-1x57q56 branch December 27, 2021 15:59

gruuya mentioned this pull request Jan 17, 2022

Multicorn agg pushdown v2 #613

Merged
