Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable aggregation and grouping pushdown in the engine #581

Merged
merged 18 commits into from
Dec 27, 2021

Conversation

gruuya
Copy link
Contributor

@gruuya gruuya commented Dec 15, 2021

@gruuya gruuya requested a review from mildbyte December 15, 2021 11:52
@gruuya gruuya self-assigned this Dec 15, 2021
@mildbyte
Copy link
Contributor

All existing tests pass with the new Multicorn code which is great news. Let's hold off on merging this until we have test coverage of the new groupby features though.

chown -R elasticsearch:elasticsearch /data && \
echo 'path.data: /data' >> config/elasticsearch.yml && \
echo 'discovery.type: "single-node"' >> config/elasticsearch.yml && \
echo "xpack.security.enabled: false" >> config/elasticsearch.yml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add some of these here too so that ES doesn't refuse to start with high disk usage

cluster.routing.allocation.disk.watermark.flood_stage: "99%"
cluster.routing.allocation.disk.watermark.high: "99%"

for row in result:
assert row == (gender, age)

age += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is kind of difficult to mentally unroll. I recently started using pytest-snapshot (https://pypi.org/project/pytest-snapshot/, usage example: https://github.com/splitgraph/splitgraph/blob/master/test/splitgraph/cloud/project/test_merging.py#L21) that can store expected outputs as files and then easily regenerate them, so that you can essentially approve a change to a test through version control and see what changed. I think it's worth changing some of these (especially the bigger query outputs where you're asserting the whole output) to use pytest-snapshot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, checking out pytest-snapshot

# DISTINCT queries are not going to be pushed down
result = get_engine().run_sql("EXPLAIN SELECT COUNT(DISTINCT city) FROM es.account")
assert len(result) > 2
assert _extract_query_from_explain(result) == _bare_sequential_scan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea for a test: some kind of a JOIN here, e.g. a JOIN between two aggregation results on es.account (I think it won't be pushed down, but useful to test anyway)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea: a nested expression, e.g. SELECT avg(age * balance) FROM es.account GROUP BY state to make sure this doesn't crash

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another: a window query that I think might be implemented as a sort/groupby, e.g. the balance of the oldest person in each state (haven't tested that the syntax is correct):

SELECT
	age, balance, state
	RANK () OVER (
        PARTITION BY state
		ORDER BY age DESC
	) rank_number 
FROM
	es.accounts
WHERE rank_number = 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good ideas!

I'll add them to the test suites.

@gruuya gruuya merged commit f41f924 into master Dec 27, 2021
@gruuya gruuya deleted the engine-aggregation-pushdown-cu-1x57q56 branch December 27, 2021 15:59
@gruuya gruuya mentioned this pull request Jan 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants