Support Scripted aggregations and improve Field Mappings
#383
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
jenkins test this please
Hopefully the tests should pass now.
jenkins test this please
c9885af to ff49b55
@sethmlarson Can we please work together and get this in? Type hinting in other files depends on this.
… and refactor/improve capability_matrix usage
Thanks for this! I have some questions for you:
@@ -122,7 +123,12 @@ def terms_aggs(
}
}
"""
agg = {func: {"field": field}}
if field in self._script_fields:
Since this construction is repeated 4 times below, can we create a private method (`_get_field_definition(field)`?) that contains this logic?
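A rough sketch of what such a helper might look like. The class name `AggBuilderSketch`, the `metric_agg` method, and the assumption that `_script_fields` is a dict mapping a scripted field name to its script definition are all hypothetical and not taken from this PR. The sketch relies only on the fact that Elasticsearch aggregations accept a `script` entry in place of `field`.

```python
from typing import Any, Dict


class AggBuilderSketch:
    """Hypothetical stand-in for the class that owns terms_aggs()."""

    def __init__(self, script_fields: Dict[str, Dict[str, Any]]) -> None:
        # Assumption: maps a scripted field name to its script definition.
        self._script_fields = script_fields

    def _get_field_definition(self, field: str) -> Dict[str, Any]:
        # Scripted fields aggregate over a script; mapped fields use "field".
        if field in self._script_fields:
            return {"script": self._script_fields[field]}
        return {"field": field}

    def metric_agg(self, func: str, field: str) -> Dict[str, Any]:
        # Every aggregation builder reuses the helper instead of repeating the branch.
        return {func: self._get_field_definition(field)}


builder = AggBuilderSketch({"price_x2": {"source": "doc['price'].value * 2"}})
plain_agg = builder.metric_agg("avg", "price")
scripted_agg = builder.metric_agg("avg", "price_x2")
```

With the branch in one place, each of the four call sites would shrink to a single call to the helper.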
@@ -67,6 +68,7 @@ class Field(NamedTuple):
    """Holds all information on a particular field in the mapping"""

    column: str
    display_name: str
How is `column` different from `display_name`? We should probably document that somewhere in comments if we need to have both.
try:
    # display_name can be None
    index = self._mappings_capabilities[
        (self._mappings_capabilities.display_name == display_name)
What circumstances can have a `display_name` of `None`? Do we need this if we remove `display_name` in favor of `column`?
"""
Returns
-------
dtypes: pd.Series
Index: Display name
Values: pd_dtype as np.dtype
"""
pd_dtypes = self._mappings_capabilities["pd_dtype"]
pd_dtypes = self._mappings_capabilities[["display_name", "pd_dtype"]].set_index(
Help me understand why this change is necessary. I'm sure it is, but I don't see it immediately.
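A small pandas sketch of one possible reason, assuming (as the PR description suggests) that `display_name` is now a column of the capability matrix rather than its index. The frame contents and the index values are made up for illustration, and the truncated `set_index(` call from the diff is completed here in an assumed way.

```python
import pandas as pd

# Hypothetical capability matrix with display_name stored as a column.
capabilities = pd.DataFrame(
    {
        "display_name": ["price", "timestamp"],
        "pd_dtype": ["float64", "datetime64[ns]"],
    },
    index=["taxful_total_price", "order_date"],  # assumed Elasticsearch field names
)

# Selecting the column alone keeps the frame's own index.
by_es_field = capabilities["pd_dtype"]

# Re-indexing by display_name yields a Series keyed by display name,
# which is what the docstring ("Index: Display name") describes.
by_display_name = capabilities[["display_name", "pd_dtype"]].set_index(
    "display_name"
)["pd_dtype"]
```

Under that assumption, the plain column selection would no longer be indexed by display name, so callers looking up dtypes by display name would need the `set_index` step.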
class TestSeriesArithmetics(TestData):
    def test_ecommerce_datetime_comparisons(self):
Are we losing this test case? I don't see it covered below.
by querying directly
"""
renames_df: pd.DataFrame = self._mappings_capabilities[
    self._mappings_capabilities.apply(
Do we need an `.apply()` call here instead of a pandas boolean filter?
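For comparison, a minimal sketch of the two approaches on a made-up frame. The column names `es_field_name` and `display_name`, and the idea that the filter selects rows whose two names differ, are assumptions; the diff above does not show what the `.apply()` callback actually checks.

```python
import pandas as pd

# Hypothetical capability matrix; only the two name columns matter here.
capabilities = pd.DataFrame(
    {
        "es_field_name": ["order_date", "taxful_total_price", "user"],
        "display_name": ["order_date", "price", "customer"],
    }
)

# Row-wise apply: calls a Python function once per row.
renamed_apply = capabilities[
    capabilities.apply(
        lambda row: row["es_field_name"] != row["display_name"], axis=1
    )
]

# Vectorized boolean filter: compares the two columns in one operation.
renamed_mask = capabilities[
    capabilities["es_field_name"] != capabilities["display_name"]
]
```

When the condition only compares columns, both forms select the same rows, and the boolean filter avoids the per-row Python call.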
This PR Closes #267
- `field_mappings.py` strictly typed.
- `capability matrix` to hold `display_names` as a column rather than index.
- `mapping_capabilities` to make it elegant, understandable, and easy.
- `series` objects along with tests.
- `es_info` where we print the `mapping_capabilities`.

@sethmlarson please review it. :)
Ask Jenkins to run this once