Add functions var, std #175

Merged

Conversation

mesejo
Contributor

@mesejo mesejo commented Apr 10, 2020

This adds the functions var and std, mirroring the implementation of the other metric aggregations.
The changes were made in _metric_aggs, for two reasons:

  1. To keep all logic related to the aggregating functions in the same place.
  2. extended_stats is a metric aggregation in the Elasticsearch API.
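
As a rough illustration of the second point, the sketch below builds one extended_stats aggregation per field and reads the `variance` and `std_deviation` keys from the response. The helper names, the `extended_stats_<field>` naming scheme, and the sample response are assumptions made for illustration and are not taken from eland's code.

```python
def extended_stats_body(fields):
    """Build a search body with one extended_stats aggregation per field.

    The "extended_stats_<field>" naming scheme is an assumption for this sketch.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            f"extended_stats_{field}": {"extended_stats": {"field": field}}
            for field in fields
        },
    }


def extract_metric(response, fields, key):
    """Pick one key (e.g. "variance") out of each extended_stats result."""
    return {
        field: response["aggregations"][f"extended_stats_{field}"][key]
        for field in fields
    }


# Hand-written sample response (made-up numbers, not real query output).
sample_response = {
    "aggregations": {
        "extended_stats_AvgTicketPrice": {
            "count": 4,
            "avg": 2.5,
            "variance": 1.25,
            "std_deviation": 1.118033988749895,
        }
    }
}

var = extract_metric(sample_response, ["AvgTicketPrice"], "variance")
std = extract_metric(sample_response, ["AvgTicketPrice"], "std_deviation")
```

Because both values come from the same extended_stats result, var and std can share a single aggregation request per field.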

Closes #171

@elasticmachine

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@sethmlarson
Contributor

jenkins test this please

Contributor

@sethmlarson sethmlarson left a comment

From a code perspective this looks good. We still need to add mad and other aggregations as well.

I'm also seeing an opportunity to refactor the .mean(), .var(), etc. functions to use .aggregate() so we don't have to maintain two routes for applying these aggregations.
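
One way to picture that consolidation is the sketch below, where every convenience method delegates to a single aggregate() path. `TinyFrame` is a hypothetical in-memory stand-in, not eland's DataFrame, and it uses Python's statistics module (population variance) in place of Elasticsearch.

```python
import statistics


class TinyFrame:
    """Hypothetical stand-in for a frame with one aggregation code path."""

    # Single registry of supported aggregations (assumed names).
    _AGGS = {
        "mean": statistics.mean,
        "var": statistics.pvariance,
        "std": statistics.pstdev,
    }

    def __init__(self, columns):
        self._columns = columns  # mapping of column name -> list of numbers

    def aggregate(self, funcs):
        """Apply each named aggregation to every column."""
        unknown = [func for func in funcs if func not in self._AGGS]
        if unknown:
            raise ValueError(f"unsupported aggregations: {unknown}")
        return {
            func: {
                name: self._AGGS[func](values)
                for name, values in self._columns.items()
            }
            for func in funcs
        }

    def _single(self, func):
        # Every helper goes through aggregate(), so there is one route to maintain.
        return self.aggregate([func])[func]

    def mean(self):
        return self._single("mean")

    def var(self):
        return self._single("var")

    def std(self):
        return self._single("std")


frame = TinyFrame({"x": [1, 2, 3, 4]})
```

With this layout, adding a new helper such as mad would mean registering it once and adding a one-line method.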

@mesejo
Contributor Author

mesejo commented Apr 12, 2020

@sethmlarson Thanks for the comments, I agree with your perspective. Also I'm pretty positive there is a bug with median:

>>> df = ed.DataFrame(es, 'flights')
>>> df[['AvgTicketPrice']].agg(['median'])

throws this,

Traceback (most recent call last):
  File "/home/daniel/PycharmProjects/eland/venv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    df[['AvgTicketPrice']].agg(['median'])
  File "/home/daniel/PycharmProjects/eland/venv/lib/python3.6/site-packages/eland-7.6.0a4-py3.6.egg/eland/dataframe.py", line 1316, in aggregate
    return self._query_compiler.aggs(func)
  File "/home/daniel/PycharmProjects/eland/venv/lib/python3.6/site-packages/eland-7.6.0a4-py3.6.egg/eland/query_compiler.py", line 458, in aggs
    return self.operations.aggs(self, func)
  File "/home/daniel/PycharmProjects/eland/venv/lib/python3.6/site-packages/eland-7.6.0a4-py3.6.egg/eland/operations.py", line 513, in aggs
    response["aggregations"][es_agg[0] + "
" + field][es_agg[1]]
KeyError: '50.0'

The result is looked up in the same way as for extended_stats (using a tuple), but median is a call to percentiles, so the lookup does not match the response format of the Elasticsearch API.

@sethmlarson
Contributor

.median() probably needs the same fix I made to aggregate("mean")! You have to unpack an additional JSON object for the values within a quantile aggregation. Can you fix that too?

Even more reason to consolidate the implementations. :) That can be done in a separate PR though. Thanks for finding all these issues.
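
The extra level of nesting mentioned above can be sketched as follows. A percentiles aggregation returns its results under a `values` object, so a lookup that works for extended_stats raises the KeyError seen in the traceback. The helper name, the aggregation names, and the sample numbers are assumptions for illustration, not eland's actual code.

```python
def extract_agg_value(response, agg_name, es_agg):
    """Read one value from an aggregation result.

    es_agg is a tuple such as ("extended_stats", "variance") or
    ("percentiles", "50.0"), following the tuple convention described above.
    """
    agg = response["aggregations"][agg_name]
    if es_agg[0] == "percentiles":
        # Percentile results are wrapped in an additional "values" object.
        return agg["values"][es_agg[1]]
    return agg[es_agg[1]]


# Hand-written sample response (made-up numbers, not real query output).
sample = {
    "aggregations": {
        "percentiles_AvgTicketPrice": {"values": {"50.0": 3.0}},
        "extended_stats_AvgTicketPrice": {"variance": 1.25},
    }
}

median = extract_agg_value(sample, "percentiles_AvgTicketPrice", ("percentiles", "50.0"))
variance = extract_agg_value(sample, "extended_stats_AvgTicketPrice", ("extended_stats", "variance"))

# The unwrapped lookup fails in the same way as the reported traceback.
try:
    sample["aggregations"]["percentiles_AvgTicketPrice"]["50.0"]
    naive_lookup_failed = False
except KeyError:
    naive_lookup_failed = True
```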

@mesejo
Contributor Author

mesejo commented Apr 12, 2020

I think it is better to do it in a separate PR.

@sethmlarson
Contributor

sethmlarson commented Apr 12, 2020

Okay, the v8.0.0 failures are unrelated. Merging this now! :) Thanks much!

Successfully merging this pull request may close these issues.

Create all aggregation helpers like .var(), .std()
3 participants