Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(array): implement min, max, any, all, sum, mean #9704

Merged
merged 31 commits into from
Jul 30, 2024

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented Jul 27, 2024

fixes #7073

Supported backends:

  • duckdb
  • polars
  • clickhouse
  • bigquery
  • snowflake
  • postgres
  • risingwave
  • trino
  • pyspark

ibis/expr/types/arrays.py Outdated Show resolved Hide resolved
ibis/expr/types/arrays.py Outdated Show resolved Hide resolved
ibis/expr/types/arrays.py Outdated Show resolved Hide resolved
ibis/expr/types/arrays.py Outdated Show resolved Hide resolved
ibis/expr/types/arrays.py Outdated Show resolved Hide resolved
ibis/expr/types/arrays.py Outdated Show resolved Hide resolved
ibis/expr/types/arrays.py Outdated Show resolved Hide resolved
ibis/expr/types/arrays.py Outdated Show resolved Hide resolved
ibis/expr/types/arrays.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@NickCrews NickCrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing the other back ends! I can't get trini running on my dev machine so debugging the failing tests is gonna be hard, if you are able to fix that it would be awesome!

ibis/backends/sql/compilers/pyspark.py Outdated Show resolved Hide resolved
ibis/expr/types/arrays.py Outdated Show resolved Hide resolved
@cpcloud
Copy link
Member

cpcloud commented Jul 27, 2024

Thanks for implementing the other back ends! I can't get trini running on my dev machine so debugging the failing tests is gonna be hard, if you are able to fix that it would be awesome!

If you view a scalar as a one row, one column table, does that clarify why it makes sense as an operation on a "scalar"?

@NickCrews
Copy link
Contributor Author

That wasn't my mental model for a scalar, but with that model then yes this does make sense. This makes me want to write/edit something on the concepts section of the docs that explains this. Since I've been using ibis quite a lot but still never grokked that, it makes me think this isn't advertised enough

Ok I think mins() is probably the best we can do! I'll push this change.

@NickCrews NickCrews added the docs-preview Add this label to trigger a docs preview label Jul 27, 2024
@NickCrews
Copy link
Contributor Author

How do I get codespell to ignore the looks-like-a-misspelling of alls?

@ibis-docs-bot ibis-docs-bot bot removed the docs-preview Add this label to trigger a docs preview label Jul 27, 2024
@ibis-docs-bot
Copy link

ibis-docs-bot bot commented Jul 27, 2024

@NickCrews
Copy link
Contributor Author

I need to fix the see also links, the urls are wrong

@cpcloud
Copy link
Member

cpcloud commented Jul 28, 2024

Regarding Polars, there's some incorrect behavior of any and all, so I'll leave those out for now. I've created a ticket upstream (pola-rs/polars#17917).

@cpcloud
Copy link
Member

cpcloud commented Jul 28, 2024

Actually, we can workaround the polars issue (which isn't an upstream bug, it turns out).

@cpcloud cpcloud added the feature Features or general enhancements label Jul 28, 2024
@cpcloud
Copy link
Member

cpcloud commented Jul 28, 2024

Ok, I was able to get Trino passing, as well as Snowflake and BigQuery.

To ensure that BigQuery works I parameterized the data argument to include two cases: one for nulls and one for non-nulls. These tests also run for every backend, so we can have some confidence that the no-nulls cases are behaving as expected.

@cpcloud
Copy link
Member

cpcloud commented Jul 28, 2024

To be crystal clear here, the current implementation ensures that the semantics of the array aggregations match the behavior of the corresponding columnar aggregations. IMO this the right choice to ensure that reasoning about the two cases' aggregation requires only one conceptual understanding for both "plural" aggregates as well as columnar ones.

@cpcloud cpcloud added the ci-run-cloud Add this label to trigger a run of Bigquery and Snowflake in CI label Jul 28, 2024
@cpcloud cpcloud marked this pull request as ready for review July 28, 2024 12:19
@ibis-docs-bot ibis-docs-bot bot removed the ci-run-cloud Add this label to trigger a run of Bigquery and Snowflake in CI label Jul 28, 2024
@cpcloud
Copy link
Member

cpcloud commented Jul 28, 2024

Also got postgres and risingwave working, using the same strategy as bigquery, which is this:

SELECT (SELECT AGG(el) FROM UNNEST(array_column))
FROM t

@cpcloud cpcloud added sql Backends that generate SQL ci-run-cloud Add this label to trigger a run of Bigquery and Snowflake in CI labels Jul 28, 2024
@ibis-docs-bot ibis-docs-bot bot removed the ci-run-cloud Add this label to trigger a run of Bigquery and Snowflake in CI label Jul 28, 2024
@cpcloud cpcloud merged commit 793efbc into ibis-project:main Jul 30, 2024
95 checks passed
@cpcloud cpcloud added this to the 9.3 milestone Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements sql Backends that generate SQL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Array.sum(), .min(), .max(), mean(), etc.
2 participants