CI: Test failures on numpydev #23172

Closed
TomAugspurger opened this issue Oct 15, 2018 · 12 comments
@TomAugspurger
Contributor

e.g. https://travis-ci.org/pandas-dev/pandas/jobs/441869826

I can reproduce these locally.

I just restarted the latest build on master to ensure that they're present there.

@TomAugspurger
Contributor Author

From https://api.travis-ci.org/v3/job/441716261/log.txt to https://api.travis-ci.org/v3/job/441766904/log.txt, the relevant env changes are

  • Cython bumped to 0.29
  • numpy from 1.16.0.dev0+54985e3 to 1.16.0.dev0+86ebcff

pytest-xdist, certifi, and py were also upgraded, but those are less likely culprits.

@TomAugspurger
Contributor Author

TomAugspurger commented Oct 16, 2018

I can reproduce locally with Cython 0.28.5 and NumPy@86ebcffb482afb67c2f6ec4f396d9017ea610bf1

@TomAugspurger
Contributor Author

TomAugspurger commented Oct 16, 2018

git bisect points to numpy/numpy@607842a (cc @shoyer)

I'm just running pytest pandas/tests/test_multilevel.py::TestMultiLevel::test_frame_group_ops[True-True-1-0-skew]

@TomAugspurger
Contributor Author

TomAugspurger commented Oct 16, 2018

Looking at the diff, nothing really jumps out. If it helps debug at all, that test is parametrized across a bunch of things. Just skew and mad fail, and I notice those are not dispatched.

@TomAugspurger
Contributor Author

A not-really-minimal example of what changed is

In [22]: df = pd.DataFrame([[1, 2], [3, 4]], columns=pd.MultiIndex.from_tuples([('A', 1), ('A', 2)]))

In [23]: df.skew(level=1, axis=1)
Out[23]:
1    0   NaN
1   NaN
dtype: float64
2    0   NaN
1   NaN
dtype: float64
dtype: object

Previously that was

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1, 2], [3, 4]], columns=pd.MultiIndex.from_tuples([('A', 1), ('A', 2)]))

In [3]: df.skew(level=1, axis=1)
Out[3]:
    1   2
0 NaN NaN
1 NaN NaN

@TomAugspurger TomAugspurger added the CI Continuous Integration label Oct 16, 2018
@TomAugspurger
Contributor Author

@shoyer do you have any guesses about why this would have been broken by numpy/numpy@607842a?

If not, I'll revisit this tomorrow.

@shoyer
Member

shoyer commented Oct 18, 2018

The main thing that commit changes is that NumPy functions like np.sum() are now defined with decorators, so they point to wrappers generated with functools.wraps.
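As a rough illustration of that pattern (not NumPy's actual code), the sketch below wraps a function with a hypothetical `with_dispatcher` decorator. The names `with_dispatcher`, `_total_dispatcher`, and `total` are invented for this example. It shows that the public name now refers to a wrapper that only borrows the metadata of the original function.

```python
import functools


def with_dispatcher(dispatcher):
    """Hypothetical decorator: call a dispatcher first, then the implementation."""
    def decorator(implementation):
        @functools.wraps(implementation)
        def public_api(*args, **kwargs):
            # The dispatcher gets to look at the arguments before the real call.
            dispatcher(*args, **kwargs)
            return implementation(*args, **kwargs)
        return public_api
    return decorator


def _total_dispatcher(values, start=0):
    # Returns the arguments that would be inspected for dispatch.
    return (values,)


@with_dispatcher(_total_dispatcher)
def total(values, start=0):
    """Sum the values."""
    return sum(values, start)


print(total.__name__)                  # metadata copied from the implementation
print(total.__wrapped__ is not total)  # the original is kept on __wrapped__
print(total.__code__.co_varnames)      # the wrapper itself only takes *args, **kwargs
```

Anything that identifies a function by its code object or by object identity therefore sees the wrapper rather than the original implementation.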

Since pandas isn't using __array_function__ yet, I have two guesses about what could be going wrong here (but neither feels especially likely):

  1. pandas does some sort of introspection on NumPy function objects that apparently no longer works.
  2. the dispatcher for some NumPy function is broken, at least when called by pandas (e.g., it's trying to iterate over some object that isn't iterable), and then pandas catches that error and creates an object with the wrong type.

Probably the best place to start would be with the definition of the skew implementation in terms of NumPy functions.

@shoyer
Member

shoyer commented Oct 20, 2018

I think the most likely culprit for this is that these changes to NumPy break inspect.getargspec and inspect.getfullargspec. This gets called internally in pandas here, which I'm pretty sure is in the code path for these method calls.
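A small stdlib-only sketch of the introspection problem suspected here, using a made-up function `reduce_rows` rather than any real NumPy or pandas function: `inspect.getfullargspec` ignores `__wrapped__`, so a `functools.wraps` wrapper reports only `*args, **kwargs`, while `inspect.signature` follows the wrapper chain. (`inspect.getargspec` is omitted because it no longer exists in recent Python versions.)

```python
import functools
import inspect


def reduce_rows(values, axis=0, skipna=True):
    # Hypothetical stand-in for a reduction with named parameters.
    return values


@functools.wraps(reduce_rows)
def wrapped_reduce_rows(*args, **kwargs):
    return reduce_rows(*args, **kwargs)


plain_spec = inspect.getfullargspec(reduce_rows)
wrapped_spec = inspect.getfullargspec(wrapped_reduce_rows)
followed = list(inspect.signature(wrapped_reduce_rows).parameters)

print(plain_spec.args)    # named parameters of the original
print(wrapped_spec.args)  # empty: only *args/**kwargs are visible
print(followed)           # signature() follows __wrapped__
```

Code that checks for a keyword such as `axis` via `getfullargspec` would get a different answer for the wrapped function than for the original.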

@shoyer
Member

shoyer commented Oct 20, 2018

Never mind that; my proposed NumPy patch didn't fix this failure in pandas, so probably something else is going on.

@shoyer
Member

shoyer commented Oct 27, 2018

I figured it out! The cause is numpy/numpy#12263

After applying the NumPy fix (numpy/numpy#12280), the pandas test suite outputs the following warning:

$ pytest "pandas/tests/test_multilevel.py::TestMultiLevel::test_frame_group_ops[True-True-1-0-skew]"
================================================ test session starts =================================================
platform darwin -- Python 3.6.4, pytest-3.9.1, py-1.5.3, pluggy-0.8.0
hypothesis profile 'ci' -> timeout=5000, suppress_health_check=[HealthCheck.too_slow], database=DirectoryBasedExampleDatabase('/Users/shoyer/dev/pandas/.hypothesis/examples')
rootdir: /Users/shoyer/dev/pandas, inifile: setup.cfg
plugins: hypothesis-3.79.0
collected 1 item

pandas/tests/test_multilevel.py .                                                                              [100%]

============================================= slowest 10 test durations ==============================================
0.05s setup    pandas/tests/test_multilevel.py::TestMultiLevel::test_frame_group_ops[True-True-1-0-skew]
0.03s call     pandas/tests/test_multilevel.py::TestMultiLevel::test_frame_group_ops[True-True-1-0-skew]

(0.00 durations hidden.  Use -vv to show these durations.)
================================================== warnings summary ==================================================
/Users/shoyer/dev/pandas/pandas/core/groupby/generic.py:432: FutureWarning: arrays to stack must be passed as a sequence. Support for non-sequence iterables is deprecated as of NumPy 1.16 and will raise an error in the future. Note also that dispatch with __array_function__ is not supported when arrays are not provided as a sequence.
  stacked_values = np.vstack(map(np.asarray, values))

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======================================== 1 passed, 1 warnings in 0.17 seconds ========================================
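The FutureWarning in the log above asks for a sequence rather than a one-shot iterable. One way to satisfy it, sketched here with stand-in data (the `values` list is an assumption, not the data from the failing test), is to build a list before calling `np.vstack`. This is only an illustration of the idea, not necessarily the change that went into pandas.

```python
import numpy as np

# Stand-in rows; in pandas these would be the per-group results.
values = [(1, 2), (3, 4)]

# A list can be iterated any number of times, so it does not matter
# whether something else walks over it before vstack does.
stacked_values = np.vstack([np.asarray(v) for v in values])

print(stacked_values.shape)
```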

It looks like my second diagnosis above was mostly correct: the dispatcher for vstack() exhausted the generator from map, which then results in a very indirect error because pandas catches the resulting ValueError:

except (ValueError, AttributeError):
    # GH1738: values is list of arrays of unequal lengths fall
    # through to the outer else clause
    return Series(values, index=key_index,
                  name=self._selection_name)
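The chain of events described in this comment can be reproduced in plain Python. The sketch below uses invented names (`dispatcher`, `stack`, `combine`, `combine_fixed`) and a string in place of the Series fallback. It is a simplified model of the failure, not the NumPy or pandas code.

```python
def dispatcher(arrays):
    # Walks the input once to collect the arguments relevant for dispatch.
    return tuple(arrays)


def stack(arrays):
    dispatcher(arrays)                # consumes a map object or generator
    rows = [list(a) for a in arrays]  # a second pass over it yields nothing
    if not rows:
        raise ValueError("need at least one array to stack")
    return rows


def combine(values):
    try:
        # map() returns a one-shot iterator, so stack() sees it empty.
        return stack(map(list, values))
    except (ValueError, AttributeError):
        # The broad except hides the real cause and changes the result type.
        return "fallback"


def combine_fixed(values):
    try:
        # A list survives being iterated twice.
        return stack([list(v) for v in values])
    except (ValueError, AttributeError):
        return "fallback"


print(combine([(1, 2), (3, 4)]))
print(combine_fixed([(1, 2), (3, 4)]))
```

With the iterator input the caller silently ends up on the fallback path, which matches the symptom of getting an object of the wrong type instead of an exception.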

@TomAugspurger
Contributor Author

Wow, thanks for tracking that down :)

I'll make time next week to get us running on NumPy dev again (our numpydev build fails on warnings from NumPy).

@jreback jreback added this to the 0.24.0 milestone Nov 6, 2018