Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Scripted aggregations and improve Field Mappings #383

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

V1NAY8
Copy link
Contributor

@V1NAY8 V1NAY8 commented Aug 24, 2021

This PR Closes #267

  • Makes field_mappings.py strictly typed.
  • Changed capability matrix to hold display_names as a column rather than index.
  • Optimized the way we query and use the mapping_capabilities to make it elegant, understandable, and easy.
  • Added Scripted aggregations on series objects along with tests.
  • There is no current change to existing tests because existing logic is just refactored, nothing is removed or left. Except for es_info where we print the mapping_capabilities.

@sethmlarson please review it. :)

Ask Jenkins to run this once

@elasticmachine
Copy link

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@sethmlarson
Copy link
Contributor

jenkins test this please

@V1NAY8
Copy link
Contributor Author

V1NAY8 commented Aug 24, 2021

Hopefully now tests should pass

@sethmlarson
Copy link
Contributor

jenkins test this please

@V1NAY8
Copy link
Contributor Author

V1NAY8 commented Sep 30, 2021

@sethmlarson Can we please work together and get this in ? Type hinting in other files is depending on this.
Sorry for the very big PR. I know its tough to review. I'm afraid for the extra work that will take to split this. If required I will split it

… and refactor/improve capability_matrix usage
Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! I have some questions for you:

@@ -122,7 +123,12 @@ def terms_aggs(
}
}
"""
agg = {func: {"field": field}}
if field in self._script_fields:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this construction is repeated 4 times below can we create a private method (_get_field_definition(field)?) that contains this logic?

@@ -67,6 +68,7 @@ class Field(NamedTuple):
"""Holds all information on a particular field in the mapping"""

column: str
display_name: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is column different from display_name? We should probably document that somewhere in comments if we need to have both.

try:
# display_name can be None
index = self._mappings_capabilities[
(self._mappings_capabilities.display_name == display_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What circumstances can have a display_name of None? Do we need this if we remove display_name in favor of column?

"""
Returns
-------
dtypes: pd.Series
Index: Display name
Values: pd_dtype as np.dtype
"""
pd_dtypes = self._mappings_capabilities["pd_dtype"]
pd_dtypes = self._mappings_capabilities[["display_name", "pd_dtype"]].set_index(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help me understand why this change is necessary, I'm sure it is but I don't see it immediately.



class TestSeriesArithmetics(TestData):
def test_ecommerce_datetime_comparisons(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we losing this test case, I don't see it covered below?

by querying directly
"""
renames_df: pd.DataFrame = self._mappings_capabilities[
self._mappings_capabilities.apply(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an .apply() call here instead of a pandas boolean filter here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support aggregations on scripted fields
3 participants