Pandas 0.25.2 causes regressions in eland testsuite #72

Closed
Winterflower opened this issue Nov 27, 2019 · 1 comment
Labels
bug Something isn't working

Comments

@Winterflower
Contributor

To support Python 3.8, we need to upgrade eland's pandas dependency from the current 0.25.1 to 0.25.2.
Keeping the dependency at 0.25.1 and running eland on Python 3.8 causes the following test failure:

======================================================================================= FAILURES ========================================================================================
_________________________________________________________________________ TestDataFrameQuery.test_simple_query __________________________________________________________________________
self = <eland.tests.dataframe.test_query_pytest.TestDataFrameQuery object at 0x7febc580af70>
    def test_simple_query(self):
        ed_flights = self.ed_flights()
        pd_flights = self.pd_flights()
>       assert pd_flights.query('FlightDelayMin > 60').shape == \
               ed_flights.query('FlightDelayMin > 60').shape
eland/tests/dataframe/test_query_pytest.py:50: 


self = <pandas.core.computation.expr.PandasExprVisitor object at 0x7febc582c130>, node = <_ast.Constant object at 0x7febc582cd90>, kwargs = {'side': 'right'}, method = 'visit_Constant'
visitor = <bound method NodeVisitor.visit_Constant of <pandas.core.computation.expr.PandasExprVisitor object at 0x7febc582c130>>
    def visit(self, node, **kwargs):
        if isinstance(node, str):
            clean = self.preparser(node)
            try:
                node = ast.fix_missing_locations(ast.parse(clean))
            except SyntaxError as e:
                from keyword import iskeyword
                if any(iskeyword(x) for x in clean.split()):
                    e.msg = "Python keyword not valid identifier" " in numexpr query"
                raise e
        method = "visit_" + node.__class__.__name__
        visitor = getattr(self, method)
>       return visitor(node, **kwargs)
E       TypeError: visit_Constant() got an unexpected keyword argument 'side'
/usr/local/lib/python3.8/site-packages/pandas/core/computation/expr.py:441: TypeError

@blaklaybul found these issues, which track the corresponding pandas bug, fixed in pandas 0.25.2:
pandas-dev/pandas#27261
pandas-dev/pandas#28101
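
For context on the traceback above: on Python 3.8 and later, `ast.parse` produces `ast.Constant` nodes for literals such as `60`, so a visitor that dispatches on the node's class name ends up calling `visit_Constant`. The sketch below is a toy stand-in (the `ToyVisitor` and `PatchedVisitor` classes are hypothetical, not pandas code) showing how forwarding `**kwargs` to a handler that does not accept them raises the same kind of `TypeError`. It is not meant to represent the actual change made in pandas 0.25.2.

```python
import ast


class ToyVisitor:
    """Hypothetical visitor that dispatches on the node class name,
    mirroring the visit() method shown in the traceback."""

    def visit(self, node, **kwargs):
        method = "visit_" + node.__class__.__name__
        visitor = getattr(self, method)
        return visitor(node, **kwargs)

    # No **kwargs here: stands in for a handler that cannot accept 'side'.
    def visit_Constant(self, node):
        return node.value


class PatchedVisitor(ToyVisitor):
    """Same dispatch, but the handler tolerates extra keyword arguments."""

    def visit_Constant(self, node, **kwargs):
        return node.value


# Assumes Python 3.8+, where the literal 60 parses to an ast.Constant node.
node = ast.parse("60", mode="eval").body

try:
    ToyVisitor().visit(node, side="right")
except TypeError as exc:
    print("fails:", exc)

print("works:", PatchedVisitor().visit(node, side="right"))
```

On Python 3.7 and earlier the same literal parses to `ast.Num`, which is why the failure only shows up after moving to 3.8.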

After upgrading the pandas dependency to 0.25.2, the following tests in the eland testsuite started failing:

=================================== FAILURES ===================================
__________________ TestDataFrameRepr.test_num_rows_repr_html ___________________
self = <eland.tests.dataframe.test_repr_pytest.TestDataFrameRepr object at 0x7f9da80ff3a0>
    def test_num_rows_repr_html(self):
        # check setup works
        assert pd.get_option('display.max_rows') == 60
        show_dimensions = pd.get_option('display.show_dimensions')
        # TODO - there is a bug in 'show_dimensions' as it gets added after the last </div>
        # For now test without this
        pd.set_option('display.show_dimensions', False)
        # Test eland.DataFrame.to_string vs pandas.DataFrame.to_string
        # In pandas calling 'to_string' without max_rows set, will dump ALL rows
        # Test n-1, n, n+1 for edge cases
>       self.num_rows_repr_html(pd.get_option('display.max_rows')-1)
eland/tests/dataframe/test_repr_pytest.py:166: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <eland.tests.dataframe.test_repr_pytest.TestDataFrameRepr object at 0x7f9da80ff3a0>
rows = 59, max_rows = None
    def num_rows_repr_html(self, rows, max_rows=None):
        ed_flights = self.ed_flights()
        pd_flights = self.pd_flights()
        ed_head = ed_flights.head(rows)
        pd_head = pd_flights.head(rows)
        ed_head_str = ed_head._repr_html_()
        pd_head_str = pd_head._repr_html_()
        #print(ed_head_str)
        #print(pd_head_str)
>       assert pd_head_str == ed_head_str
E       AssertionError: assert '<div>\n<styl...able>\n</div>' == '<div>\n<styl...able>\n</div>'
E         Skipping 496 identical leading characters in diff, use -v to show
E         Skipping 146 identical trailing characters in diff, use -v to show
E           >
E         -       <th>0</th>
E         ?         ^     ^
E         +       <td>0</td>
E         ?         ^     ^...
E         
E         ...Full output truncated (640 lines hidden), use '-vv' to show
eland/tests/dataframe/test_repr_pytest.py:186: AssertionError
___________________________ TestSeriesRepr.test_repr ___________________________
self = <eland.tests.series.test_repr_pytest.TestSeriesRepr object at 0x7f9da80fe880>
    def test_repr(self):
        pd_s = self.pd_flights()['Carrier']
        ed_s = ed.Series(ELASTICSEARCH_HOST, FLIGHTS_INDEX_NAME, 'Carrier')
        pd_repr = repr(pd_s)
        ed_repr = repr(ed_s)
>       assert pd_repr == ed_repr
E       AssertionError: assert '0         Ki...dtype: object' == '0         Ki...dtype: object'
E         Skipping 291 identical leading characters in diff, use -v to show
E         -  Carrier, dtype: object
E         +  Carrier, Length: 13059, dtype: object
E         ?           +++++++++++++++
eland/tests/series/test_repr_pytest.py:17: AssertionError
======================== 2 failed, 74 passed in 22.68s =========================
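
Since pytest truncates most of both diffs above, it may help to compare the two repr strings directly while debugging. Below is a minimal sketch using the standard library's `difflib`; the `repr_diff` helper and the sample fragments are hypothetical, modeled only on the `<th>`/`<td>` mismatch visible in the output, and the `left`/`right` labels are placeholders for the two strings being compared.

```python
import difflib


def repr_diff(left: str, right: str) -> str:
    """Return a unified diff of two multi-line repr strings.

    An empty string means the two inputs are identical line by line.
    """
    lines = difflib.unified_diff(
        left.splitlines(),
        right.splitlines(),
        fromfile="left",
        tofile="right",
        lineterm="",
    )
    return "\n".join(lines)


# Hypothetical fragments, not the real _repr_html_() output.
left_fragment = "<tr>\n  <th>0</th>\n</tr>"
right_fragment = "<tr>\n  <td>0</td>\n</tr>"

print(repr_diff(left_fragment, right_fragment))
```

In the failing tests, the two arguments would be something like `pd_head._repr_html_()` and `ed_head._repr_html_()`, or the two `repr()` strings from `test_repr`.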
@Winterflower added the bug label on Nov 27, 2019
@stevedodson
Contributor

Hopefully resolved by #91
