String Arithmetics: __add__ ops #68

blaklaybul · 2019-11-26T13:48:54Z

Addresses #65 by adding support for __add__ operation between:

a str and an eland.Series
an eland.Series and a str
2 eland.Series
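
The tests in this PR compare eland results against pandas, so the three operand combinations can be sketched with plain pandas as the reference behaviour. This is only an illustration: the sample values are made up, and it uses pandas.Series rather than eland.Series so that it runs without an Elasticsearch cluster.

```python
import pandas as pd

# Made-up sample values standing in for the ecommerce fields.
first = pd.Series(["Eddie", "Mary"], name="customer_first_name")
last = pd.Series(["Underwood", "Bailey"], name="customer_last_name")

prefixed = "Customer: " + first   # a str and a Series (uses __radd__)
suffixed = first + " (customer)"  # a Series and a str (uses __add__)
combined = first + last           # two Series, concatenated element-wise

print(prefixed.tolist())
print(suffixed.tolist())
print(combined.tolist())
```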

__mul__ was intentionally not implemented, because it could produce unwanted behavior when the numeric operand is large, e.g. df['customer_first_name'] * 10000. We could include a value check here, but any limit I chose seemed arbitrary. Happy to discuss further.
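
For discussion, a value check of the kind mentioned above might look roughly like this. It is a sketch only: checked_repeat and MAX_REPEAT are hypothetical names, the limit of 100 is exactly the sort of arbitrary number the description is wary of, and nothing like this is part of the PR.

```python
MAX_REPEAT = 100  # hypothetical limit; the PR does not propose a value


def checked_repeat(value: str, times: int, max_repeat: int = MAX_REPEAT) -> str:
    """Repeat a string, refusing repeat counts above max_repeat."""
    if times > max_repeat:
        raise ValueError(
            "repeat count {0} exceeds limit {1}".format(times, max_repeat)
        )
    return value * times


print(checked_repeat("ab", 3))
```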

stevedodson · 2019-11-26T14:40:17Z

👀

stevedodson

Looks pretty good!

However, it's a bit messy to do this at the moment, so I'd advise addressing some of the comments first; then we can refactor to clean up.

stevedodson · 2019-11-26T14:50:53Z

eland/operations.py

-                  "script": {
-                    "source": "doc[left_field].value / doc[right_field].value"
-                   }
+        if not field_name.endswith("||str") and not field_name.startswith("str||"):


Instead of the check on field_name, it would be neater to add an 'op_type' to the task, i.e. change
# task = ('arithmetic_op_fields', (field_name, (op_name, (left_field, right_field))))
to
# task = ('arithmetic_op_fields', (field_name, (op_name, op_type)(left_field, right_field)))
or similar. Then op_type could be compared instead of string-matching on field_name (a magic match).

Soon we should also refactor operations.py so that tasks are subclasses of a task class (using the Bridge pattern or similar). For now, adding op_type to the arithmetic_op_fields task object will move this forward.

Agree that this could use a refactor. The task items will quickly become unmanageable as we add more ops. We should consider using dicts or named tuples as items.
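
One possible shape for the named-tuple idea, combined with the op_type suggestion above, is sketched here. Everything in it is an assumption for illustration: ArithmeticOpTask, script_source, the "string"/"numeric" values, and the operator tables are hypothetical and are not taken from eland's operations.py.

```python
from typing import NamedTuple


class ArithmeticOpTask(NamedTuple):
    """Hypothetical stand-in for the nested 'arithmetic_op_fields' tuple."""
    field_name: str   # name of the derived field
    op_name: str      # e.g. "__add__"
    op_type: str      # "string" or "numeric" (illustrative values)
    left_field: str
    right_field: str


# Illustrative operator tables; only __add__ is allowed for strings here.
_OPERATORS = {"__add__": "+", "__sub__": "-", "__mul__": "*", "__truediv__": "/"}
_STRING_OPS = {"__add__"}


def script_source(task: ArithmeticOpTask) -> str:
    """Build a script source string, branching on op_type, not field_name."""
    if task.op_type == "string" and task.op_name not in _STRING_OPS:
        raise NotImplementedError(
            "{0} is not supported for string fields".format(task.op_name)
        )
    symbol = _OPERATORS[task.op_name]
    return "doc['{0}'].value {1} doc['{2}'].value".format(
        task.left_field, symbol, task.right_field
    )


task = ArithmeticOpTask(
    field_name="full_name",
    op_name="__add__",
    op_type="string",
    left_field="customer_first_name.keyword",
    right_field="customer_last_name.keyword",
)
print(script_source(task))
```

Named fields make each task self-describing, so adding another attribute later does not require changing the nesting that every consumer unpacks.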

implemented in 5307850 for string types

stevedodson · 2019-11-26T19:03:54Z

eland/series.py

+                new_field_name, op_method_name, left, self.name))
+
+            # name of Series remains original name
+            series.name = self.name.replace('.keyword', '')


Removing '.keyword' from the name may not always give expected behaviour. For instance in the pathological case where 'keyword' is embedded in the field name:

"C" : { "properties" : { "keyword" : { "properties" : { "str" : { "type" : "keyword" } } } } },

A better way to do this is to call `Mappings.aggregatable_field_names` to map the requested field name to the aggregatable field name (remember to catch the possible ValueError). In the case of non-aggregatable fields we can still do this using `_source` in the script. This is inefficient and I'd leave it to a separate issue.

5307850: for now, this removes only the last occurrence of .keyword from the aggregatable field name used for scripting. This should ensure that embedded occurrences are not removed for fields ending in .keyword. Will update to check the mappings for aggregatable fields.
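
The difference between the original replace call and a suffix-only removal can be seen with the pathological field from the review comment. This is a minimal sketch: strip_keyword_suffix is a hypothetical helper written for illustration, not the code in 5307850, and it does not consult the mappings.

```python
def strip_keyword_suffix(name: str, suffix: str = ".keyword") -> str:
    """Remove a trailing '.keyword' only; leave embedded occurrences alone."""
    if name.endswith(suffix):
        return name[: -len(suffix)]
    return name


# Field from the mapping above: 'keyword' is an embedded path component.
embedded = "C.keyword.str"

print(embedded.replace(".keyword", ""))    # str.replace drops the component
print(strip_keyword_suffix(embedded))      # suffix-only removal keeps it
print(strip_keyword_suffix("customer_first_name.keyword"))
```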

stevedodson · 2019-11-26T19:05:20Z

eland/tests/series/test_str_arithmetics_pytest.py

+        edadd = self.ed_ecommerce()['customer_first_name'] + self.ed_ecommerce()['customer_last_name']
+        pdadd = self.pd_ecommerce()['customer_first_name'] + self.pd_ecommerce()['customer_last_name']
+
+        assert_pandas_eland_series_equal(pdadd, edadd, check_less_precise=True)


Remove check_less_precise=True from all of these assertions.

done 5307850

stevedodson · 2019-11-26T19:06:23Z

eland/series.py

@@ -503,6 +503,20 @@ def __add__(self, right):
        3    176.979996
        4     82.980003
        dtype: float64
+        >>> df.customer_first_name + df.customer_last_name


Can we capture the behaviour of df.customer_first_name + " " + df.customer_last_name, and raise a separate issue if this fails?

raised in #69

stevedodson · 2019-11-26T19:07:30Z

eland/series.py

+                return series
+
+            elif self._dtype == 'object' and right._dtype == 'object':
+                new_field_name = "str||{0}_{1}_{2}||str".format(self.name, method_name, right.name)


If we don't rely on the field name to determine op_type, then we don't need to name it like this.

No longer using variants of str||; implemented op_type checking in 5307850.

stevedodson · 2019-11-27T12:49:04Z

Looks good. We need some refactoring as the complexity increases, but best to merge this first.

stevedodson

LGTM!

blaklaybul added 2 commits November 26, 2019 08:40

adds support for __add__ ops for string objects and literals

c937531

adds tests for string arithmetic

94f8183

blaklaybul added the enhancement and topic:series labels Nov 26, 2019

blaklaybul requested a review from stevedodson November 26, 2019 13:48

blaklaybul self-assigned this Nov 26, 2019

updates comment in numeric field resolution

7f24e39

stevedodson requested changes Nov 26, 2019


adds op_type parameter for numeric_ops

5307850

stevedodson approved these changes Nov 27, 2019


Merge branch 'master' into string-arithmetics

e12e1ac

blaklaybul merged commit a3dd860 into elastic:master Nov 27, 2019

This was referenced Nov 27, 2019

Better Handling of Non-Aggregatable Fields in eland/mappings #71

Closed

Add support for arithmetic operations on string Series #65

Closed
