Implement DataFrameGroupBy.count() #292

Merged
sethmlarson merged 3 commits into elastic:master from issue/289 on Oct 23, 2020

@V1NAY8 V1NAY8 commented Oct 19, 2020

Closes #289

Implemented

  • ed_df.groupby([...]).count()
  • ed_df.groupby([...]).agg(['count'])
  • ed_df.groupby([...]).agg(['max', 'min', 'count', 'mean']) maintains order
  • Fixed a bug where ed_df.agg(['count']) raised an exception
  • Added tests for all the above functionality
  • Add hacktoberfest-accepted label
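
For readers unfamiliar with the target behaviour, here is a minimal sketch of the pandas semantics that the calls above are meant to mirror. It uses plain pandas rather than eland (eland needs a live Elasticsearch cluster), and the column names and values are made up for illustration, not taken from the PR's tests.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame(
    {
        "DestCountry": ["US", "US", "IT", "IT", "IT"],
        "AvgTicketPrice": [100.0, None, 50.0, 70.0, 90.0],
        "dayOfWeek": [0, 1, 1, 2, 3],
    }
)

grouped = df.groupby("DestCountry")

# count() reports the number of non-null values per column in each group.
counts = grouped.count()

# agg() with a list keeps the aggregations in the order they were requested.
multi = grouped.agg(["max", "min", "count", "mean"])

print(counts)
print(list(multi["AvgTicketPrice"].columns))
```

Note that the missing AvgTicketPrice in the "US" group is not counted, which is why count() can differ from the group size.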

@sethmlarson Please Review :)

@elasticmachine

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@sethmlarson

Jenkins test this please


@sethmlarson sethmlarson left a comment

Woo, this is great :) Some comments for you!

Review threads (all resolved):

  • eland/dataframe.py
  • eland/field_mappings.py (2 threads, 1 outdated)
  • eland/operations.py (3 threads, 2 outdated)
  • eland/query.py (outdated)
  • eland/groupby.py (outdated)
  • eland/tests/dataframe/test_groupby_pytest.py (outdated)

@sethmlarson sethmlarson left a comment

Everything except the core functionality looks good, I'm going to pull locally and try things out to better review.

Review thread (resolved): eland/query.py
@sethmlarson

Pulled it locally and I think I found a way to work around the count issue by adding a fake result to each bucket that gets unpacked properly in _unpack_metric_aggs(). I've attached it, you can apply it via: git apply eland-groupby-count.txt

eland-groupby-count.txt

(It's not a .patch file because apparently GitHub doesn't allow attaching those to comments?)
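
The attached patch is not reproduced in this conversation, so the following is only a rough sketch of what "adding a fake result to each bucket" could look like. Everything in it is hypothetical: the helper names, the `<agg>_<field>` key naming, and in particular the assumption that the count is taken from the bucket's `doc_count`. It is not eland's actual `_unpack_metric_aggs()` code.

```python
from typing import Any, Dict, List

Bucket = Dict[str, Any]


def add_fake_count(bucket: Bucket, field: str) -> Bucket:
    """Return a copy of the bucket with a synthetic 'count' metric entry.

    Assumption: the count is read from the bucket's doc_count. The real
    patch may derive it differently.
    """
    patched = dict(bucket)
    patched[f"count_{field}"] = {"value": bucket["doc_count"]}
    return patched


def unpack_metrics(bucket: Bucket, field: str, aggs: List[str]) -> Dict[str, Any]:
    """Hypothetical generic unpacker: every agg is read the same way.

    Because 'count' now looks like any other metric, no special-case
    branch is needed and the requested order is preserved.
    """
    return {agg: bucket[f"{agg}_{field}"]["value"] for agg in aggs}


# Hypothetical bucket, shaped loosely like one bucket of a grouped response.
bucket = {
    "key": {"DestCountry": "IT"},
    "doc_count": 3,
    "max_AvgTicketPrice": {"value": 90.0},
    "min_AvgTicketPrice": {"value": 50.0},
    "mean_AvgTicketPrice": {"value": 70.0},
}

row = unpack_metrics(
    add_fake_count(bucket, "AvgTicketPrice"),
    "AvgTicketPrice",
    ["max", "min", "count", "mean"],
)
print(row)
```

The point of the design is that the unpacking loop stays uniform: the synthetic entry is added before unpacking, so ordering falls out of the requested list rather than from extra bookkeeping.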

@V1NAY8

V1NAY8 commented Oct 23, 2020

Wait... what?? We could do it that simply, without additional logic for maintaining order. That was brilliant. 🤯
I guess now everything is optimized..


@sethmlarson sethmlarson left a comment

This looks good to me, thanks for prettying it up! :)

@sethmlarson

jenkins test this please

@V1NAY8

V1NAY8 commented Oct 23, 2020

Builds looking good :)

@sethmlarson sethmlarson merged commit 475e0f4 into elastic:master Oct 23, 2020
@sethmlarson

Yeah went to grab coffee! ☕ Thank you for this PR :)

@V1NAY8 V1NAY8 deleted the issue/289 branch October 23, 2020 13:43