Nested lists of lists can panic #857
Comments
This issue is a blocker for the property test PR. Should I fix it there or do it separately? |
Lists in Polars are tricky:
For [[], [[]], [[[[]]]], [[2]]] => dtype will be list[list[s64]]
[[[[]]]] => dtype will be list[list[list[list[null]]]]
What is the rationale behind this? Any clue? |
@lkarthee Interesting, I didn't realize. Is this the behavior of Rust Polars or Python Polars? |
I tested it in Python Polars. From your Rust example, it looks like Rust is no different from Python. |
I ask because I'm not sure you could make it in Rust because the compiler might prevent you. Also, what's resulting series from
Otherwise the dtypes wouldn't match, right? I'll try and test these in Rust later today. |
Yeah true.
Interesting to note that the first element is reduced to [null], not [[]]. |
Hm yeah I would've expected |
Could that be a Polars bug? |
Yeah it might be a bug. Although it's technically legal since every element can potentially be (But I think bug is most likely) |
This makes me believe there is a bug: a list with 2 levels of nesting, [[2]], is converted to 3 levels of nesting, [[[2]]]. Better to log an issue with Polars before attempting a fix? It only considers the dtype of the first non-empty element.
|
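To make the "first non-empty element wins" idea concrete, here is a minimal pure-Python sketch of such an inference rule. It is only an illustration of the rule as described in this thread, not Polars' actual implementation; the function name `infer_series_dtype` and the dtype strings it returns are assumptions made for the example, and it does not model how non-matching elements are later cast or nulled.

```python
def infer_series_dtype(values):
    """Hypothetical sketch: take the dtype from the first element that
    carries type information, ignoring everything after it."""
    fallback = "null"
    for value in values:
        if value is None:
            continue
        if isinstance(value, list):
            dtype = f"list[{infer_series_dtype(value)}]"
        elif isinstance(value, bool):
            dtype = "bool"
        elif isinstance(value, int):
            dtype = "i64"
        elif isinstance(value, float):
            dtype = "f64"
        elif isinstance(value, str):
            dtype = "str"
        else:
            raise TypeError(f"unsupported value: {value!r}")
        if "null" not in dtype:
            # First fully resolved element wins; later elements are not checked.
            return dtype
        if fallback == "null":
            # Only empty lists so far: remember the nesting and keep looking.
            fallback = dtype
    return fallback


print(infer_series_dtype([[], [[]], [[[[]]]], [[2]]]))
print(infer_series_dtype([["a"], [1]]))
print(infer_series_dtype([[1], ["a"]]))
```

Under this rule the order of the elements alone decides the result, which is why swapping two elements can change the inferred dtype.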
I started to make an issue, but now I think Polars is more correct than we initially thought. Consider this example:
pl.Series([["a"], [1]])
# shape: (2,)
# Series: '' [list[str]]
# [
# ["a"]
# ["1"]
# ]
pl.Series([[1], ["a"]])
# shape: (2,)
# Series: '' [list[i64]]
# [
# [1]
# [null]
# ]
The dtypes of these series are either list[str] or list[i64], depending on which element comes first. However, I still can't account for some other stuff we're seeing. These cases still seem wrong even if we use the first-non-empty-element-wins rule:
# Wrapping elements in additional layers of nesting to make the dtypes compatible
pl.Series([[[2]], [1, 1]])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
# [[2]]
# [[1], [1]]
# ]
# An empty list as the errant element pushes the null up one layer of nesting
pl.Series([ [[2, 2]], [["b"]] ])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
# [[2, 2]]
# [[null]]
# ]
pl.Series([ [[2, 2]], [[[]]] ])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
# [[2, 2]]
# [null]
# ]
Then there's this:
pl.Series([ [[2, 2]], [[2, 2]] ])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
# [[2, 2]]
# [[2, 2]]
# ]
pl.Series([ [[2, 2]], [[2, []]] ])
# shape: (2,)
# Series: '' [o][object]
# [
# [[2, 2]]
# [[2, []]]
# ]
Not sure what to make of that. Does anyone know if the intended behavior is documented anywhere? I didn't see it in the Python docs, but I might've just missed it. |
You are right, that makes sense. This means we should raise on these cases, including in several of the examples below:
Should raise.
Should raise.
Should raise. You can think that
Should raise. The example you originally reported is still wrong and has to be fixed though. |
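For comparison, a stricter rule that raises instead of coercing could look roughly like the sketch below. This is a hypothetical pure-Python illustration, not code from Explorer or Polars: the names `merge_dtypes`, `dtype_of`, and `strict_series_dtype` are made up for the example, and treating empty lists and nulls as compatible with any dtype at the same nesting level is an assumption.

```python
def merge_dtypes(a, b):
    """Combine two dtype strings, raising if they cannot be reconciled."""
    if a == b or b == "null":
        return a
    if a == "null":
        return b
    if a.startswith("list[") and b.startswith("list["):
        # Both are lists: reconcile their inner dtypes recursively.
        return f"list[{merge_dtypes(a[5:-1], b[5:-1])}]"
    raise TypeError(f"incompatible dtypes: {a} and {b}")


def dtype_of(value):
    """Dtype of a single value; every item of a list must agree."""
    if value is None:
        return "null"
    if isinstance(value, list):
        inner = "null"
        for item in value:
            inner = merge_dtypes(inner, dtype_of(item))
        return f"list[{inner}]"
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "i64"
    if isinstance(value, float):
        return "f64"
    if isinstance(value, str):
        return "str"
    raise TypeError(f"unsupported value: {value!r}")


def strict_series_dtype(values):
    """Dtype shared by all elements of the series, or TypeError."""
    dtype = "null"
    for value in values:
        dtype = merge_dtypes(dtype, dtype_of(value))
    return dtype


print(strict_series_dtype([[], [[]], [[2]]]))
try:
    strict_series_dtype([[[2]], [1, 1]])
except TypeError as error:
    print("raised:", error)
```

Unlike a first-element-wins rule, every element is checked, so the result does not depend on element order and mismatched nesting is reported rather than silently rewritten.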
@josevalim I think Polars is making non-matching elements
I think it should yield That said, we can also choose to differ from how Python Polars is resolving these situations by raising. |
I think we should be stricter, unless we have a valid use case. Returning |
Found via:
This panics:
with:
It appears that this is a legal series:
https://www.rustexplorer.com/b/0ga2cs