Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: New Numpydev version broke ci #44343

Closed
3 tasks done
phofl opened this issue Nov 7, 2021 · 9 comments · Fixed by numpy/numpy#20331 or #44452
Closed
3 tasks done

CI: New Numpydev version broke ci #44343

phofl opened this issue Nov 7, 2021 · 9 comments · Fixed by numpy/numpy#20331 or #44452
Labels
Bug CI Continuous Integration Dependencies Required and optional dependencies

Comments

@phofl
Copy link
Member

phofl commented Nov 7, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

Two tests are affected

Issue Description


    @pytest.mark.parametrize("quantile", [0.0, 0.1, 0.45, 0.5, 1])
    @pytest.mark.parametrize(
        "interpolation", ["linear", "lower", "higher", "nearest", "midpoint"]
    )
    @pytest.mark.parametrize(
        "data",
        [
            [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
            [8.0, 1.0, 3.0, 4.0, 5.0, 2.0, 6.0, 7.0],
            [0.0, np.nan, 0.2, np.nan, 0.4],
            [np.nan, np.nan, np.nan, np.nan],
            [np.nan, 0.1, np.nan, 0.3, 0.4, 0.5],
            [0.5],
            [np.nan, 0.7, 0.6],
        ],
    )
    def test_rolling_quantile_interpolation_options(quantile, interpolation, data):
        # Tests that rolling window's quantile behavior is analogous to
        # Series' quantile for each interpolation option
        s = Series(data)
    
        q1 = s.quantile(quantile, interpolation)
        q2 = s.expanding(min_periods=1).quantile(quantile, interpolation).iloc[-1]
    
        if np.isnan(q1):
            assert np.isnan(q2)
        else:
>           assert q1 == q2
E           assert 0.040000000000000036 == 0.04000000000000001

pandas/tests/window/test_rolling.py:1664: AssertionError
________ test_rolling_quantile_interpolation_options[data2-linear-0.45] ________
[gw0] linux -- Python 3.9.7 /usr/share/miniconda/envs/pandas-dev/bin/python

quantile = 0.45, interpolation = 'linear', data = [0.0, nan, 0.2, nan, 0.4]

    @pytest.mark.parametrize("quantile", [0.0, 0.1, 0.45, 0.5, 1])
    @pytest.mark.parametrize(
        "interpolation", ["linear", "lower", "higher", "nearest", "midpoint"]
    )
    @pytest.mark.parametrize(
        "data",
        [
            [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
            [8.0, 1.0, 3.0, 4.0, 5.0, 2.0, 6.0, 7.0],
            [0.0, np.nan, 0.2, np.nan, 0.4],
            [np.nan, np.nan, np.nan, np.nan],
            [np.nan, 0.1, np.nan, 0.3, 0.4, 0.5],
            [0.5],
            [np.nan, 0.7, 0.6],
        ],
    )
    def test_rolling_quantile_interpolation_options(quantile, interpolation, data):
        # Tests that rolling window's quantile behavior is analogous to
        # Series' quantile for each interpolation option
        s = Series(data)
    
        q1 = s.quantile(quantile, interpolation)
        q2 = s.expanding(min_periods=1).quantile(quantile, interpolation).iloc[-1]
    
        if np.isnan(q1):
            assert np.isnan(q2)
        else:
>           assert q1 == q2
E           assert 0.18000000000000005 == 0.18000000000000002

Expected Behavior

Passing

Installed Versions

INSTALLED VERSIONS

commit : 0861da6
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.11.0-38-generic
Version : #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.0.dev0+1056.g0861da680b
numpy : 1.22.0.dev0+1693.gf52acb53a
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.23.1
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.28.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.11.0
fastparquet : 0.7.1
gcsfs : 2021.05.0
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.1
sqlalchemy : 1.4.25
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1
None

Process finished with exit code 0

@phofl phofl added Bug Needs Triage Issue that has not been reviewed by a pandas team member CI Continuous Integration Dependencies Required and optional dependencies and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 7, 2021
@jreback
Copy link
Contributor

jreback commented Nov 7, 2021

cc @seberg

@seberg
Copy link
Contributor

seberg commented Nov 7, 2021

Ping @bzah in case you have an idea. Percentile/quantile was fixed with respect to the interpolation (any input welcome, as mentioned on the mailing list I would like to rename interpolation to `method).

Not quite sure how much we should worry about it, but my guess is that this is due to rounding differences in:

    return n * quantiles + (
             alpha + quantiles * (1 - alpha - beta)
     ) - 1

behaving different for different n in the case where alpha=1 and beta=1 (the case for the default "linear" method).

I was wondering before if we should rewrite it, maybe even using _lerp, but not sure. E.g.: quantiles * (n + 1 - alpha - beta) + (alpha - 1) (not thought through.)

@bzah
Copy link

bzah commented Nov 8, 2021

Yes it really seems to come from the rounding of _compute_virtual_index which uses the formula @seberg has given.

You can see that with:

data = [0.0, np.nan, 0.2, np.nan, 0.4]

# the new formula with values replaced 
len(data) * 0.45 + (
            1 + 0.45 * (1 - 1 - 1)
    ) - 1
# Out[20]: 1.7999999999999998

0.45 * (len(data) - 1)
# Out[19]: 1.8

To avoid this issue we could use (n -1) * quantile when the method 7 (a.k.a linear) is used and the full formula for others methods. But it would make the code less readable.

@alimcmaster1
Copy link
Member

Unless a fix for this is coming soon on the numpy side? shall we add this build against numpy dev to our allowed failures - since there is quite a CI backlog.

numpy/numpy#19857

@jreback
Copy link
Contributor

jreback commented Nov 8, 2021

yeah let's either xfail that test or allow failures for now

@seberg
Copy link
Contributor

seberg commented Nov 8, 2021

Mid-term, the right thing seems to be either relax the test to allow a few ULPs of error, or modify NumPy. But I guess you are talking about short-term CI-unbreaking?

@jreback
Copy link
Contributor

jreback commented Nov 8, 2021

sure we should fix the test too
but this is making our ci red so let's fix that first

@seberg
Copy link
Contributor

seberg commented Nov 9, 2021

I merged the fixup that ensures we use the simpler (and thus more precise) formula again for the default. That should fix this issue.

A heads up though: I am hoping to rename interpolation= to method= with a DeprecationWarning. This would be very soon and should be in 1.22. If that is problematic, let us know and we could reconsider or take it slower.

@QuLogic
Copy link
Contributor

QuLogic commented Jan 16, 2022

A heads up though: I am hoping to rename interpolation= to method= with a DeprecationWarning. This would be very soon and should be in 1.22. If that is problematic, let us know and we could reconsider or take it slower.

Is this bit fixed in Pandas somewhere?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug CI Continuous Integration Dependencies Required and optional dependencies
Projects
None yet
6 participants