Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested lists of lists can panic #857

Closed
billylanchantin opened this issue Feb 12, 2024 · 14 comments
Closed

Nested lists of lists can panic #857

billylanchantin opened this issue Feb 12, 2024 · 14 comments
Labels
kind:bug Something isn't working

Comments

@billylanchantin
Copy link
Contributor

Found via:

This panics:

test "list of list of lists of integers with two levels of empty lists" do
  values = [[], [[]], [[1]]]
  series = Series.from_list(values)

  assert series.dtype == {:list, {:list, {:s, 64}}}
  assert Series.to_list(series) == values
end

with:

  1) test from_list/2 list of list of lists of integers with two levels of empty lists (Explorer.Series.ListTest)
     test/explorer/series/list_test.exs:122
     ** (ErlangError) Erlang error: :nif_panicked
     code: series = Series.from_list(values)
     stacktrace:
       (explorer 0.9.0-dev) Explorer.PolarsBackend.Native.s_from_list_of_series("", [#Explorer.PolarsBackend.Series<
    shape: (0,)
    Series: '' [list[null]]
    [
    ]
  >, #Explorer.PolarsBackend.Series<
    shape: (1,)
    Series: '' [list[i64]]
    [
        []
    ]
  >, #Explorer.PolarsBackend.Series<
    shape: (1,)
    Series: '' [list[i64]]
    [
        [1]
    ]
  >])
       (explorer 0.9.0-dev) lib/explorer/polars_backend/series.ex:24: Explorer.PolarsBackend.Series.from_list/2
       (explorer 0.9.0-dev) lib/explorer/series.ex:418: Explorer.Series.from_list/2
       test/explorer/series/list_test.exs:124: (test)

It appears that this is a legal series:

https://www.rustexplorer.com/b/0ga2cs

shape: (3,)
Series: 's' [list[list[i32]]]
[
	[]
	[[]]
	[[1]]
]
@billylanchantin billylanchantin added the kind:bug Something isn't working label Feb 12, 2024
@billylanchantin
Copy link
Contributor Author

This issue is a blocker for the property test PR. Should I fix it there or do it separately?

@lkarthee
Copy link
Member

lkarthee commented Feb 12, 2024

Lists in polars are tricky:

  • If it is empty it does not matter how many levels of nested list. It takes the dtype of non empty list element

For [[] , [[]] , [[[[]]]], [[2]] ] => dtype will be list[list[s64]]

  • if all elements are empty in list it results in a different dtype - dtype of nested list having highest levels.

[[[[]]]] => dtype will be list[list[list[list[null]]]]

What is the rationale behind this, any clue ?

@billylanchantin
Copy link
Contributor Author

@lkarthee Interesting I didn't realize. Is this the behavior of Rust Polars or Python Polars?

@lkarthee
Copy link
Member

I have tested it in py polars - from your rust example it looks like rust is no different from py.

@billylanchantin
Copy link
Contributor Author

I ask because I'm not sure you could make it in Rust because the compiler might prevent you.

Also, what's resulting series from [[] , [[]] , [[[[]]]], [[2]]]? I'd assume this:

[[] , [[]] , [[]], [[2]]] 
             ^^^^
             Inner nesting removed

Otherwise the dtypes wouldn't match, right?

I'll try and test these in Rust later today.

@lkarthee
Copy link
Member

lkarthee commented Feb 12, 2024

Yeah true.

s = pl.Series([[[[[[[]]]]]], [], [[]]])
shape: (3,)
Series: '' [list[list[list[list[list[list[null]]]]]]]
[
	[[[[[[]]]]]]
	[]
	[[]]
]
s = pl.Series([[[[[[[]]]]]], [], [[]], [[]], [[0], [], []]])
shape: (5,)
Series: '' [list[list[i64]]]
[
	[null]
	[]
	[[]]
	[[]]
	[[0], [], []]
]

Interesting to note that first element reduced to [null] not [[]].

@billylanchantin
Copy link
Contributor Author

Interesting to note that first element reduced to [null] not [[]].

Hm yeah I would've expected [[]] or [[null]], not that.

@josevalim
Copy link
Member

Hm yeah I would've expected [[]] or [[null]], not that.

That can potentially be a polars bug?

@billylanchantin
Copy link
Contributor Author

Yeah it might be a bug. Although it's technically legal since every element can potentially be null. Who knows, maybe it's actually the correct way to do it for some non-obvious reason 🤷

(But I think bug is most likely)

@lkarthee
Copy link
Member

lkarthee commented Feb 13, 2024

s = pl.Series([[], [[[3]]], [[2], [2]]])
shape: (3,)
Series: '' [list[list[list[i64]]]]
[
	[]
	[[[3]]]
	[[[2]], [[2]]]
]

This makes me believe there is a bug - a list 2 level nesting [[2]] is converted to 3 level nesting [[[2]]]. Better to log an issue with polars before attempting a fix ? It only considers dtype of first non empty element.

s = pl.Series([[], [3], [[2], [2]]])
shape: (3,)
Series: '' [list[i64]]
[
	[]
	[3]
	[null, null]
]

@billylanchantin
Copy link
Contributor Author

I started to make an issue, but now I think Polars is more correct than we initially thought. Consider this example:

pl.Series([["a"], [1]])
# shape: (2,)
# Series: '' [list[str]]
# [
#         ["a"]
#         ["1"]
# ]

pl.Series([[1], ["a"]])
# shape: (2,)
# Series: '' [list[i64]]
# [
#         [1]
#         [null]
# ]

The dtypes of these series are either list[str] or list[i64]. Polars needs a tie-breaker, so letting the first element decide seems like a sensible choice. I think that's what's happening with some of our examples.

However, I still can't account for some other stuff we're seeing. These cases still seem wrong even if we use the first-non-empty-element-wins rule:

# Wrapping elements in additional layers of nesting to make the dtypes compatible
pl.Series([[[2]], [1, 1]])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
#         [[2]]
#         [[1], [1]]
# ]

# An empty list as the errant element pushes the null up one layer of nesting
pl.Series([ [[2, 2]], [["b"]] ])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
#         [[2, 2]]
#         [[null]]
# ]

pl.Series([ [[2, 2]], [[[]]] ])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
#         [[2, 2]]
#         [null]
# ]

Then there's this:

pl.Series([ [[2, 2]], [[2, 2]] ])
# shape: (2,)
# Series: '' [list[list[i64]]]
# [
#         [[2, 2]]
#         [[2, 2]]
# ]

pl.Series([ [[2, 2]], [[2, []]] ])
# shape: (2,)
# Series: '' [o][object]
# [
#         [[2, 2]]
#         [[2, []]]
# ]

Not sure what to make of that. Does anyone know if the intended behavior is documented anywhere? I didn't see it in the Python docs, but I might've just missed it.

@josevalim
Copy link
Member

The dtypes of these series are either list[str] or list[i64]. Polars needs a tie-breaker, so letting the first element decide seems like a sensible choice. I think that's what's happening with some of our examples.

You are right, that makes sense. This means we should raise on these cases, including in several of the examples below:

pl.Series([[[2]], [1, 1]])

Should raise.

pl.Series([ [[2, 2]], [["b"]] ])

Should raise.

pl.Series([ [[2, 2]], [[[]]] ])

Should raise. You can think that [[[]]] means it is at least a three element deep list, which violates the first value.

pl.Series([ [[2, 2]], [[2, []]] ])

Should raise.


The example you originally reported is still wrong and has to be fixed though. [[], [[]], [[1]]] means that the 1+ level list, then 2+ level list, and officially confirmed with a {:list, {:list, :s64}} list. I can try to work on a PR, since I wrote the current version of the code.

@billylanchantin
Copy link
Contributor Author

@josevalim I think Polars is making non-matching elements null instead of raising. So for:

pl.Series([[[2]], [1, 1]])

Should raise.

I think it should yield [[[2]], [null, null]] instead since each 1 it encounters ought to be a list. I have no idea why they decided to wrap each 1 to yield [[[2]], [[1], [1]]].

That said, we can also chose to differ from how Python Polars is resolving these situations by raising.

@josevalim
Copy link
Member

I think we should be stricter, unless we have a valid use case. Returning nil or wrapping are both undesired IMO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants