BUG: metadata does not always propagate in binary operations #49916

MarcoGorelli · 2022-11-26T14:14:20Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

pytest pandas/tests/generic/test_finalize.py  # works fine

pytest pandas/tests/generic/test_finalize.py -n 4  # fails

Issue Description

In the second case, I get:

======================================================= FAILURES =======================================================_____________________________________________ test_binops[pow-args7-right] _____________________________________________[gw1] linux -- Python 3.8.15 /home/marcogorelli/mambaforge/envs/pandas-dev/bin/python3.8

request = <FixtureRequest for <Function test_binops[pow-args7-right]>>, args = (   A
0  1, 0    1
dtype: int64)
annotate = 'right', all_binary_operators = <built-in function pow>

    @pytest.mark.filterwarnings(
        "ignore:Automatic reindexing on DataFrame vs Series:FutureWarning"
    )
    @pytest.mark.parametrize("annotate", ["left", "right", "both"])
    @pytest.mark.parametrize(
        "args",
        [
            (1, pd.Series([1])),
            (1, pd.DataFrame({"A": [1]})),
            (pd.Series([1]), 1),
            (pd.DataFrame({"A": [1]}), 1),
            (pd.Series([1]), pd.Series([1])),
            (pd.DataFrame({"A": [1]}), pd.DataFrame({"A": [1]})),
            (pd.Series([1]), pd.DataFrame({"A": [1]})),
            (pd.DataFrame({"A": [1]}), pd.Series([1])),
        ],
    )
    def test_binops(request, args, annotate, all_binary_operators):
        # This generates 624 tests... Is that needed?
        left, right = args
        if annotate == "both" and isinstance(left, int) or isinstance(right, int):
            return

        if annotate in {"left", "both"} and not isinstance(left, int):
            left.attrs = {"a": 1}
        if annotate in {"left", "both"} and not isinstance(right, int):
            right.attrs = {"a": 1}

        result = all_binary_operators(left, right)
>       assert result.attrs == {"a": 1}
E       AssertionError: assert {} == {'a': 1}
E         Right contains 1 more item:
E         {'a': 1}
E         Use -v to get more diff

pandas/tests/generic/test_finalize.py:508: AssertionError

Has anyone else come across this?

Expected Behavior

Tests should pass whether run in parallel or not

Has anyone else come across this / can reproduce?

Installed Versions

INSTALLED VERSIONS

commit : 3c72d6f
python : 3.8.15.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.0.0.dev0+762.g3c72d6f1c0
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.58.1
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.0.3
IPython : 8.6.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 2022.11.0
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.2
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.9
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2021.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.44
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.11.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : None
qtpy : None
pyqt5 : None
None

The text was updated successfully, but these errors were encountered:

MarcoGorelli · 2022-11-26T14:21:33Z

Actually, just

pytest pandas/tests/generic/test_finalize.py::test_binops[ne-args1-right]

fails every time

MarcoGorelli · 2022-11-26T14:34:53Z

This line looks a bit odd to me:

pandas/pandas/tests/generic/test_finalize.py

Lines 504 to 505 in dbb2adc

    
           if annotate in {"left", "both"} and not isinstance(right, int): 
        
               right.attrs = {"a": 1}

if annotate is 'left', then why would right.attrs be set?

I'll take a closer look later

Looks like

pandas/pandas/core/ops/__init__.py

Lines 311 to 314 in 20172d4

    
           left, right = left.align( 
        
               right, join="outer", axis=axis, level=level, copy=False 
        
           ) 
        
           right = _maybe_align_series_as_frame(left, right, axis)

might be the issue - prior to that line, right.attrs is set, but after it, it's unset

Looks like a real issue:

$ cat f.py
import pandas as pd
from pandas import *

left = Series([1])
right = DataFrame({'a': [1]})
left.attrs = {'a': 1}
result = left + right
print(result.attrs)
$ python f.py
{}

phofl · 2022-11-26T19:53:52Z

same on my machine.

mroeschke · 2022-11-27T18:46:24Z

There might be vague connection to #34373

TomAugspurger · 2023-04-01T13:28:09Z

Just to confirm, the original test failures are fixed, and the issue is now described in #49916 (comment), about propagating attrs in binary operations?

I'd recommend we follow xarray here (docs). By default, attrs aren't propagated through binary operations:

In [1]: import xarray as xr

In [2]: a = xr.DataArray([1, 2], name='a', attrs={"foo": "bar"})

In [3]: b = xr.DataArray([1, 2], name='a')

In [4]: (a + a).attrs
Out[4]: {}

In [5]: (a + b).attrs
Out[5]: {}

But a flag

In [8]: xr.set_options(keep_attrs=True)
Out[8]: <xarray.core.options.set_options at 0x10fc70ad0>

In [9]: (a + b).attrs
Out[9]: {'foo': 'bar'}

In [10]: (a + b).attrs
Out[10]: {'foo': 'bar'}

And I believe there are other values to control how merging / conflicts are handled

In [11]: b = xr.DataArray([1, 2], name='a', attrs={"foo": "baz", "other": "other"})

In [12]: (a + b).attrs
Out[12]: {'foo': 'bar'}

An implication of this choice is that we do not propagate attrs through most operations unless explicitly flagged (some methods have a keep_attrs option, and there is a global flag, accessible with xarray.set_options(), for setting this to be always True or False). Similarly, xarray does not check for conflicts between attrs when combining arrays and datasets, unless explicitly requested with the option compat='identical'. The guiding principle is that metadata should not be allowed to get in the way.

MarcoGorelli · 2023-04-05T14:28:21Z

the original test failures are fixed

that's right! several runs of pytest pandas/tests/generic/test_finalize.py -n 4 all pass for me locally

I'd recommend we follow xarray here

Do you mean to follow them in the sense of propagating through binary operations, or adding keep_attrs?

MarcoGorelli added Bug Needs Triage Issue that has not been reviewed by a pandas team member Unreliable Test Unit tests that occasionally fail and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 26, 2022

MarcoGorelli changed the title ~~BUG: pandas/tests/generic/test_finalize.py fails if run in parallel?~~ BUG: metadata does not always propagate in binary operations Nov 27, 2022

MarcoGorelli mentioned this issue Nov 27, 2022

BUG: metadata does not always propagate in binary operations #49931

Closed

5 tasks

MarcoGorelli mentioned this issue Feb 10, 2023

DEPR: __finalize__ and _metadata #51280

Open

MarcoGorelli mentioned this issue Feb 20, 2023

CI: Use pytest-randomly #51496

Closed

5 tasks

TomAugspurger mentioned this issue Apr 1, 2023

Various methods don't call call __finalize__ #28283

Open

38 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: metadata does not always propagate in binary operations #49916

BUG: metadata does not always propagate in binary operations #49916

MarcoGorelli commented Nov 26, 2022 •

edited

Loading

INSTALLED VERSIONS

MarcoGorelli commented Nov 26, 2022

MarcoGorelli commented Nov 26, 2022 •

edited

Loading

phofl commented Nov 26, 2022

mroeschke commented Nov 27, 2022

TomAugspurger commented Apr 1, 2023

MarcoGorelli commented Apr 5, 2023

BUG: metadata does not always propagate in binary operations #49916

BUG: metadata does not always propagate in binary operations #49916

Comments

MarcoGorelli commented Nov 26, 2022 • edited Loading

Pandas version checks

Reproducible Example

Issue Description

Expected Behavior

Installed Versions

INSTALLED VERSIONS

MarcoGorelli commented Nov 26, 2022

MarcoGorelli commented Nov 26, 2022 • edited Loading

phofl commented Nov 26, 2022

mroeschke commented Nov 27, 2022

TomAugspurger commented Apr 1, 2023

MarcoGorelli commented Apr 5, 2023

MarcoGorelli commented Nov 26, 2022 •

edited

Loading

MarcoGorelli commented Nov 26, 2022 •

edited

Loading