Series.filter should work inside DataFrame.summarise #927
Comments
I am not sure I agree. Wouldn't that be the same as a DF.filter beforehand? In any case, we should at least improve the error message. :)
In this case it'd be the same, but mine is just a minimal example. The original example from elixirforum isn't equivalent. I don't see why we shouldn't support it. But if we can't for some reason, then definitely an improved error message is the way to go.
The group_by makes DF.filter not entirely viable without backfilling some column values after the fact. For example, currently our approach looks like this. In the future, we will also have 3 more of these aggregations.
I have to get the distinct values of the
I believe that filtering a series inside summarise would make that really
What I want to do for each column of interest inside the group is "give me the first non-nil value or, if the series only has nil, then 'none'."
sim_idx = data_frame |> DataFrame.distinct([:sim_idx])

# First person_id per sim_idx among the listed results; the right join
# restores any sim_idx that drop_nil removed entirely.
any_data_frame =
  data_frame
  |> DataFrame.mutate(
    any_id:
      if result in ["one", "two", "three", "four"] do
        person_id
      else
        nil
      end
  )
  |> DataFrame.drop_nil([:any_id])
  |> DataFrame.group_by(["sim_idx"])
  |> DataFrame.summarise(any: first(any_id))
  |> DataFrame.join(sim_idx, on: [:sim_idx], how: :right)

# Same pattern, restricted to the "two" result.
two_data_frame =
  data_frame
  |> DataFrame.mutate(
    two_id:
      if result == "two" do
        person_id
      else
        nil
      end
  )
  |> DataFrame.drop_nil([:two_id])
  |> DataFrame.group_by(["sim_idx"])
  |> DataFrame.summarise(two: first(two_id))
  |> DataFrame.join(sim_idx, on: [:sim_idx], how: :right)

# Combine both summaries and backfill the groups that had no match.
DataFrame.join(any_data_frame, two_data_frame, on: [:sim_idx])
|> DataFrame.mutate(
  any: fill_missing(any, "none"),
  two: fill_missing(two, "none")
)

I might be misunderstanding, but the dplyr docs seem to imply that their API can do grouped filtering: https://dplyr.tidyverse.org/articles/grouping.html?q=summ#filter
But, as I send that, I see that DF.filter works with groups... which is what I think Jose was saying. Let me try that out 🤦
Yeah, so that method can work, but it seems like my previous workaround, just rearranged. I think the key thing is that the call to DF.summarise after the call to DF.filter will not summarise any grouped values if they were filtered out.
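
To make that last point concrete, here is a minimal sketch of the grouped filter followed by summarise, and of the join needed to bring the dropped groups back. The sample data, the variable names `summary` and `backfilled`, and the Explorer version constraint are assumptions made for illustration, not taken from the original report.

```elixir
# Assumption: a recent Explorer release; the version constraint is a guess.
Mix.install([{:explorer, "~> 0.8"}])

require Explorer.DataFrame, as: DF

# Hypothetical sample data: sim_idx 2 has no row with result == "two".
df =
  DF.new(
    sim_idx: [1, 1, 2, 2],
    result: ["one", "two", "one", "three"],
    person_id: ["a", "b", "c", "d"]
  )

# Grouped filter, then summarise. A group with no surviving rows
# produces no row in the summary, so sim_idx 2 is absent here.
summary =
  df
  |> DF.group_by("sim_idx")
  |> DF.filter(result == "two")
  |> DF.summarise(two: first(person_id))

# Join back onto the distinct group keys and backfill the missing group.
backfilled =
  df
  |> DF.distinct(["sim_idx"])
  |> DF.join(summary, on: ["sim_idx"], how: :left)
  |> DF.mutate(two: fill_missing(two, "none"))

IO.inspect(DF.to_columns(summary, atom_keys: true), label: "summary")
IO.inspect(DF.to_columns(backfilled, atom_keys: true), label: "backfilled")
```

The join and fill_missing steps are the same backfilling the earlier workaround needed, which is why filtering before summarise only rearranges it rather than removing it.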
Originally noted here:
Example:
yields: