.unnest_all()
#12353
Comments
A high performance port of There should be specific logic here that does heuristic inspection on
Perhaps this behaviour could be customizable with a parameter to the eventual
Oh yeah, and above all: This function should follow the standard |
This seems really useful! How about just |
Just had to use this on my project, I hope to see it merged soon |
Yeah, I just took the name to use as a placeholder. DuckDB seems to have a
duckdb.sql("""
from df
select unnest(x), unnest(y)
""")
# ┌────────────────────────────┬────────────────────────────┐
# │ foo │ bar │
# │ struct(a bigint, b bigint) │ struct(a bigint, b bigint) │
# ├────────────────────────────┼────────────────────────────┤
# │ {'a': 1, 'b': 2} │ {'a': 5, 'b': 6} │
# │ {'a': 3, 'b': 4} │ {'a': 7, 'b': 8} │
# └────────────────────────────┴────────────────────────────┘
duckdb.sql("""
from df
select unnest(x, recursive := true), unnest(y, recursive := true)
""")
# ┌───────┬───────┬───────┬───────┐
# │ a │ b │ a │ b │
# │ int64 │ int64 │ int64 │ int64 │
# ├───────┼───────┼───────┼───────┤
# │ 1 │ 2 │ 5 │ 6 │
# │ 3 │ 4 │ 7 │ 8 │
# └───────┴───────┴───────┴───────┘
(although it doesn't seem possible to keep the "path") |
Worth noting that DuckDB's
-- unnesting a list of lists recursively, generating 5 rows (1, 2, 3, 4, 5)
SELECT unnest([[1, 2, 3], [4, 5]], recursive := true);
-- unnesting a list of structs recursively, generating two rows of two columns (a, b)
SELECT unnest([{'a': 42, 'b': 84}, {'a': 100, 'b': NULL}], recursive := true);
-- unnesting a struct, generating two columns (a, b)
SELECT unnest({'a': [1, 2, 3], 'b': 88}, recursive := true);
It would be nice to unnest a single level with plain |
Looking forward to seeing it merged! P.S. I spent a minute slightly modifying it to avoid some linter warnings and add type info; pasted here in case someone needs it:
|
Thanks @cmdlineluser and @fzyzcjy for sharing, that code snippet was useful to me! Very slick. The inverse operation of "unnest_all" would also be very useful - re-nesting normalized columns based on a separator. |
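To make the "inverse operation" concrete, here is a minimal pure-Python sketch of re-nesting flat keys on a separator. It is not Polars code and not taken from this thread: `renest` is a hypothetical name, the "." separator is an assumption, and it assumes no key is both a leaf and a prefix of another key (e.g. "x" alongside "x.a").

```python
from typing import Any


def renest(flat: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Rebuild nested dicts from flat keys such as "x.a" (hypothetical helper)."""
    nested: dict[str, Any] = {}
    for key, value in flat.items():
        # Everything before the last separator is the path of parent structs.
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


row = {"x.a": 1, "x.b": 2, "y.a": 5, "y.b": 6, "id": 0}
print(renest(row))
# {'x': {'a': 1, 'b': 2}, 'y': {'a': 5, 'b': 6}, 'id': 0}
```

A DataFrame-level version would presumably group columns by prefix and pack each group back into a struct column, but the key-splitting logic would be the same.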
from discussion:
|
It would be nice to be able to recursively unnest both lists and structs with an automatic prefix based on the column name. I've commented some functions to do this in another issue: #7078 (comment). |
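As an illustration of what "recursively unnest both lists and structs with an automatic prefix" could mean, here is a small pure-Python sketch over plain dicts and lists. It is not the code from #7078 and not a Polars API: `flatten` is a hypothetical helper, the "." separator is an assumption, and treating an empty list as a single null row is a design choice made here for the example.

```python
from typing import Any


def flatten(value: Any, prefix: str = "", sep: str = ".") -> list[dict[str, Any]]:
    """Return flat rows: dicts add prefixed columns, lists add rows."""
    if isinstance(value, dict):
        rows: list[dict[str, Any]] = [{}]
        for key, child in value.items():
            name = f"{prefix}{sep}{key}" if prefix else key
            child_rows = flatten(child, name, sep)
            # Combine every row built so far with every row from this field.
            rows = [{**left, **right} for left in rows for right in child_rows]
        return rows
    if isinstance(value, list):
        if not value:
            # Assumption: an empty list yields one row holding None.
            return [{prefix: None}]
        # Each list element becomes its own row, similar to an explode.
        return [row for item in value for row in flatten(item, prefix, sep)]
    return [{prefix: value}]


record = {"id": 1, "x": {"a": [1, 2], "b": {"c": "u"}}}
for row in flatten(record):
    print(row)
# {'id': 1, 'x.a': 1, 'x.b.c': 'u'}
# {'id': 1, 'x.a': 2, 'x.b.c': 'u'}
```

Note that two sibling lists produce the cross product of their elements, which matches what exploding one column after the other would give.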
Description
Requests for this functionality (or a subset of) exist across quite a few issues (and several Stack Overflow questions):
json_normalize() from Pandas #12219
DataFrame.unnest #9790
pd.json_normalize() / automatic flattening of nested data #7374
unnest_all has cropped up a few times, so I've just chosen that name as a placeholder.
The basic use case is to allow:
My latest attempt at a Python helper for this is to walk the schema to build the expressions:
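The general shape of a schema walk can be sketched without Polars at all. The example below is only an illustration of the idea, not the helper referred to above: it assumes the schema is modelled as nested dicts (a dict stands in for a struct, a string for a leaf dtype), and `walk_schema` is a hypothetical name.

```python
from typing import Any, Iterator


def walk_schema(
    schema: dict[str, Any], path: tuple[str, ...] = ()
) -> Iterator[tuple[tuple[str, ...], str]]:
    """Yield (path, dtype) for every leaf field of a nested schema."""
    for name, dtype in schema.items():
        if isinstance(dtype, dict):
            # A dict stands in for a struct: recurse with the extended path.
            yield from walk_schema(dtype, path + (name,))
        else:
            yield path + (name,), dtype


# Toy schema; the dtype names are plain strings chosen for the example.
schema = {"id": "Int64", "x": {"a": "Int64", "b": {"c": "String"}}}
flat = {".".join(p): dtype for p, dtype in walk_schema(schema)}
print(flat)
# {'id': 'Int64', 'x.a': 'Int64', 'x.b.c': 'String'}
```

In a Polars version each path would presumably become one expression, for example a chain of `.struct.field()` calls with an alias built from the joined path, so that a single `select` performs the whole flattening.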
However, I think the real benefit of this functionality (and the reason for this issue) is that it allows Polars to be used for interactively exploring nested data.
an interesting example, polars expressions:
Using Polars to load "JSON" data in the REPL and interactively explore it with .unnest_all() and .explode() is rather nice.
A proper implementation of this would be super useful.