GroupBy with multiple variables #145
Comments
This repo doesn't get too many issues, so it's not a problem to post it here for now. You may get quicker responses (more eyes on the question) if you post on Julia's Slack. Using your example data, I believe this does what you're trying to do:
julia> stat = Group(Mean(), Variance(), Extrema(), Extrema(), Extrema());
julia> o = GroupBy(Int, stat);
julia> fit!(o, zip(x, OnlineStats.eachrow(y)))
Thank you. In the end I might need to write a customised groupby function myself, since my y contains invalid or missing values. I need to run fit! on each row of y, and if a row is not valid, skip it but continue with fit! on the next row. Also, this runs the zip function as well as an eachrow function. Will these create unnecessary memory allocations? I am running this online algorithm on data with 10 million rows, so I need to make sure that memory usage is O(1) in each loop iteration. Thanks
Also, I am not able to use Julia's Slack. My company network blocks that website, but the company firewall allows GitHub.
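The row-skipping described above can be sketched with an explicit loop that passes one (group, row) tuple to fit! at a time, which is the same kind of element the zip iterator yields. This is only a sketch under assumptions: `fit_valid_rows!` is a hypothetical helper (not part of OnlineStats), and invalid values are assumed to be stored as NaN or Inf in a Float64 matrix.

```julia
using OnlineStats

# Hypothetical helper (not part of OnlineStats): fit only the rows of `y`
# whose values are all finite, and return how many rows were skipped.
function fit_valid_rows!(o, x, y)
    skipped = 0
    for i in eachindex(x)
        row = view(y, i, :)          # a view, so the row is not copied
        if all(isfinite, row)
            fit!(o, (x[i], row))     # a single (group, row) observation
        else
            skipped += 1             # invalid row: skip it and move on
        end
    end
    return skipped
end

x = rand(1:10, 100); y = x .+ randn(100, 5)
y[3, 2] = NaN                        # assumption: NaN marks an invalid value

stat = Group(Mean(), Variance(), Extrema(), Extrema(), Extrema())
o = GroupBy(Int, stat)
skipped = fit_valid_rows!(o, x, y)
```

The loop only ever holds one row view at a time, so it never builds a filtered copy of y. If the data uses `missing` instead of NaN, the validity check and the element type of y would need to be adapted.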
julia> x = rand(1:10, 10^7); y = x .+ randn(10^7, 5);
julia> stat = Group(Mean(), Variance(), Extrema(), Extrema(), Extrema());
julia> o = GroupBy(Int, stat);
julia> @time fit!(o, zip(x, OnlineStats.eachrow(y)));
  0.624278 seconds (10.00 M allocations: 305.176 MiB, 3.40% gc time)
I don't completely follow what you're doing, but you can take a look at
This is not an issue, just a question on how to do things. (Sorry, I don't know where else I should post these questions. Please let me know if there is a better place to ask.)
I saw that "GroupBy" can group by only 1 stat.
"Group" is a collection that holds multiple operations applied to a vector.
My question: how can I combine the two?
Simple example: I want to group by x, and at the same time y is a matrix with 5 columns.
I want to group by x and show the average (1st column of y), variance (2nd column of y), extrema (3rd column), etc.
x = rand(1:10, 100); y = x .+ randn(100, 5)
How can I do that? If it is not possible now, do I have to write new code?
Thank you.