Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mode to dataframe and series #323

Merged
merged 6 commits into from Jan 7, 2021
Merged

Conversation

V1NAY8
Copy link
Contributor

@V1NAY8 V1NAY8 commented Nov 12, 2020

Closes #215

  • Added mode to dataframe and series

  • Added tests, documentation

  • Currently mode isn't supported by pandas groupby too, Hence added a NotImplementedError

  • To satisfy mypy I had to add type hints to some other methods too.

  • Implemented es_size parameter instead of dumping the entire cluster if we have multiple mode values.

  • Currently dropna=True is only supported because I have a query on missing parameter of terms aggregation
    Reference

@sethmlarson Please review 馃槂

@elasticmachine
Copy link

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great, here are some comments for you!

eland/field_mappings.py Outdated Show resolved Hide resolved
eland/operations.py Show resolved Hide resolved
eland/operations.py Outdated Show resolved Hide resolved
eland/operations.py Show resolved Hide resolved
eland/operations.py Show resolved Hide resolved
eland/tests/dataframe/test_metrics_pytest.py Outdated Show resolved Hide resolved
@V1NAY8
Copy link
Contributor Author

V1NAY8 commented Nov 17, 2020

@sethmlarson If you're happy with the change, I will rebase my branch and resolve merge conflicts :)

@sethmlarson
Copy link
Contributor

jenkins test this please

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for letting this go so long without a final review, I'm pretty sure this is ready but want to run it against the new pandas v1.2.0, can you rebase?

eland/field_mappings.py Outdated Show resolved Hide resolved
@V1NAY8
Copy link
Contributor Author

V1NAY8 commented Dec 31, 2020

I forgot to include recursive-include eland in manifest.in. So, I included it. Otherwise the files won't come to release packages.

@V1NAY8
Copy link
Contributor Author

V1NAY8 commented Dec 31, 2020

@sethmlarson Ask jenkins to test this.

@sethmlarson
Copy link
Contributor

jenkins test this please

@V1NAY8
Copy link
Contributor Author

V1NAY8 commented Jan 4, 2021

CI is good 馃敟

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost done! Here's a couple more comments:

MANIFEST.in Outdated Show resolved Hide resolved
eland/dataframe.py Outdated Show resolved Hide resolved
eland/operations.py Show resolved Hide resolved
eland/operations.py Outdated Show resolved Hide resolved
tests/dataframe/test_metrics_pytest.py Outdated Show resolved Hide resolved
tests/series/test_metrics_pytest.py Outdated Show resolved Hide resolved
@V1NAY8
Copy link
Contributor Author

V1NAY8 commented Jan 5, 2021

I had to make one small change to tests because
Even though the values are same, dtypes are being differed. I am hoping this shouldn't be a thing.
But dtype checks for other es_size are working.

>>> pd_testing.mode()[:1] 
  Cancelled  dayOfWeek           timestamp DestCountry FlightNum
0     False          0 2018-01-09 07:54:19          IT   00882F6
>>> ed_testing.mode(es_size=1) 
   Cancelled  dayOfWeek           timestamp DestCountry FlightNum
0      False          0 2018-01-09 07:54:19          IT   00882F6
assert_frame_equal(ed_testing.mode(es_size=1),pd_testing.mode()[:1])
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="Cancelled") are different

[left]:  bool
[right]: object

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more comment, otherwise looks good to me!

@sethmlarson
Copy link
Contributor

jenkins test this please

@V1NAY8
Copy link
Contributor Author

V1NAY8 commented Jan 6, 2021

Added those files.

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@sethmlarson sethmlarson merged commit 421d84f into elastic:master Jan 7, 2021
@V1NAY8 V1NAY8 deleted the add-mode branch January 7, 2021 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement [DataFrame, Series].mode aggregation via top terms
3 participants