Support Series.map/2 with lists #835

Open
cigrainger opened this issue Jan 22, 2024 · 6 comments

Comments

@cigrainger
Member

It's intuitive to me to be able to do things like:

require Explorer.Series, as: S
s = S.from_list([[1, 2], [3, 4], [1, 5, 7]])
S.map(s, max(_))

and get

#Explorer.Series<
  Polars[3]
  s64 [2, 4, 7]
>
@josevalim
Member

This may be tricky because it makes max ambiguous: it is both an aggregator and an element-wise operator. We may be able to implement it, but we need to handle the differences properly inside LazySeries.
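To make the ambiguity concrete, here is a plain-Elixir analogy. It uses ordinary lists standing in for a list-typed series, not Explorer's API. The same "max" gives one result per row when applied within each list, or a single aggregate when applied across the rows themselves.

```elixir
# Plain Elixir lists standing in for a list-typed series (an analogy only,
# not Explorer code).
rows = [[1, 2], [3, 4], [1, 5, 7]]

# Element-wise reading: max applied within each row, one result per row.
row_max = Enum.map(rows, &Enum.max/1)
IO.inspect(row_max)
# => [2, 4, 7]

# Aggregate reading: max applied across the rows, one result in total.
# Elixir compares lists element by element, so [3, 4] is the largest row.
overall_max = Enum.max(rows)
IO.inspect(overall_max)
# => [3, 4]
```

The first reading is what the issue asks for. The second is what an aggregator normally means, which is why both need distinct handling in LazySeries.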

@cigrainger
Member Author

Another option would be to let Series.max/1 and similar functions accept a list series directly. In that case:

iex> s = S.from_list([[1, 2], [3, 4], [1, 5, 7]])
iex> S.max(s)
#Explorer.Series<
  Polars[3]
  s64 [2, 4, 7]
>

It would mean that we'd need to support the inner dtype for that series function. I know this can be tricky because it's a recursive dtype.

@josevalim
Member

Yes, that's the same feature, because we need to keep everything consistent since max inside DF or Series both need to follow the same backend callbacks. :)

@billylanchantin
Contributor

Not that we're looking for solutions to this particular problem, but here's a workaround in the meantime:

require Explorer.DataFrame, as: DF
require Explorer.Series, as: S

s = S.from_list([[1, 2], [3, 4], [1, 5, 7]])

DF.new(series: s)
# This part will be easier once we have `with_row_index` (or similar)
|> DF.put(:index, S.from_list([0, 1, 2], dtype: :u8))
|> DF.explode(:series)
|> DF.group_by(:index)
|> DF.summarise(agg: max(series))
|> DF.pull(:agg)
# #Explorer.Series<
#   Polars[3]
#   s64 [2, 4, 7]
# >

This explode, group_by, summarise strategy should work for most aggregations, not just max.

@cigrainger
Member Author

cigrainger commented Jan 22, 2024

Well that would be a pretty practical thing for row_number() as discussed in #833.

require Explorer.DataFrame, as: DF
require Explorer.Series, as: S

s = S.from_list([[1, 2], [3, 4], [1, 5, 7]])

DF.new(series: s)
# This part will be easier once we have `with_row_index` (or similar)
|> DF.mutate(index: row_number())
|> DF.explode(:series)
|> DF.group_by(:index)
|> DF.summarise(agg: max(series))
|> DF.pull(:agg)
# #Explorer.Series<
#   Polars[3]
#   s64 [2, 4, 7]
# >

Edit: I literally just read the comment that I copied and pasted. 🤦

@josevalim
Member

Given #864, we should probably start considering adding some sort of prefix for list functions that match a series function of the same name.
