Access subdf in @by and @combine #360

Closed
HenriDeh opened this issue Apr 14, 2023 · 7 comments

Comments

@HenriDeh

HenriDeh commented Apr 14, 2023

Hello,

I'd like to discuss a possible new feature. Currently, when using @by or @combine to group and combine, one can only work with column names, not with the subdataframe itself. With the original DataFrames.jl API, however, one can do

combine(groupby(df, :col1)) do sdf
    nrow(sdf)
end

Correct me if I'm wrong, but there's no equivalent to this with

@by df :col1 begin
    nrow(a_reserved_name_for_subdataframe)
end

I don't know what would be the best way to implement this.

@pdeffebach
Collaborator

I don't think it would be possible to have something like the following:

@by df :col1 begin
    :x1 # Column in df
    _subdf # reserved name for subdf
end

The src => fun => dest mini-language has no way to express this.

I think what you want is essentially

by(fun, df, groupcol) = combine(fun, groupby(df, groupcol))

I'm not sure this is worth it, but if so, it would belong in DataFrames.jl, not DataFramesMeta.jl, which is reserved for metaprogramming.

@HenriDeh
Author

What about an inner macro then?

@by df :col1 begin
    @withsubdf nrow(:x1)
end

to tell the outer macro that we want the :x1 column of the subdf and not that of the main df.

@pdeffebach
Collaborator

That's currently pretty close to what's implemented, right?

@by df :col1 begin
    :y = f(:col2)
end

@HenriDeh
Author

This will apply f to the entire column of df, but not to the subdataframes created by groupby.
It is not equivalent to

combine(groupby(df, :col1)) do sdf
    f(sdf.col2)
end

which does apply f to the subdataframes.

@HenriDeh
Author

Never mind, one can actually use length(:col2) to get the size of the subdf. We still cannot work with the subdf itself, unlike with the DataFrames do syntax, but the applications are not that numerous.
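
For reference, a minimal sketch of that behaviour, assuming a made-up toy table (the data and the result column names n and total are purely illustrative). Inside @by, a column reference such as :col2 stands for the current group's slice, so length(:col2) gives the group size:

```julia
using DataFrames, DataFramesMeta

# Toy data (hypothetical): two groups keyed by col1
df = DataFrame(col1 = [1, 1, 2], col2 = [10, 20, 30])

# Each expression is evaluated once per group, so :col2 is the
# group's slice of the column, not the full column of df
res = @by df :col1 begin
    :n = length(:col2)
    :total = sum(:col2)
end
```

Here res should have one row per distinct value of col1.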

@pdeffebach
Collaborator

What about

combine(gd) do sdf
    @with sdf begin
        :x1
    end
end

You have to write sdf twice, but you get both the sdf itself and convenient access to its columns.
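
A self-contained sketch of this pattern, assuming a hypothetical toy table and a made-up per-group computation (the group mean of col2). It uses both the subdataframe, through nrow(sdf), and a column reference, through :col2, in the same block:

```julia
using DataFrames, DataFramesMeta

# Toy data (hypothetical)
df = DataFrame(col1 = [1, 1, 2], col2 = [10, 20, 30])
gd = groupby(df, :col1)

# sdf is the subdataframe for the current group; inside @with,
# :col2 refers to that subdataframe's column
res = combine(gd) do sdf
    @with sdf begin
        sum(:col2) / nrow(sdf)
    end
end
```

Because the do block returns a scalar, combine stores it in an automatically named column (x1) next to the grouping column.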

@HenriDeh
Author

Yes, that's fairly clean. In my opinion, there is no need for a dedicated macro with this syntax. Thank you.
