BUG: Series index names not assigned to None when mismatching during addition #20507

Open
Dr-Irv opened this issue Mar 27, 2018 · 7 comments
Labels: Bug, Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@Dr-Irv
Contributor

Dr-Irv commented Mar 27, 2018

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: s1 = pd.Series([1,2,3], index=pd.Index([1,2,3], name="T"))

In [3]: s1
Out[3]:
T
1    1
2    2
3    3
dtype: int64

In [4]: s2 = pd.Series([10,20,30], index=pd.Index([1,2,3], name="Time"))

In [5]: s2
Out[5]:
Time
1    10
2    20
3    30
dtype: int64

In [6]: s1.align(s2)
Out[6]:
(T
 1    1
 2    2
 3    3
 dtype: int64, Time
 1    10
 2    20
 3    30
 dtype: int64)

In [7]: s3 = pd.Series([1,2,4], index=pd.Index([1,2,4], name="T"))

In [8]: s3
Out[8]:
T
1    1
2    2
4    4
dtype: int64

In [9]: s3.align(s2)
Out[9]:
(1    1.0
 2    2.0
 3    NaN
 4    4.0
 dtype: float64, 1    10.0
 2    20.0
 3    30.0
 4     NaN
 dtype: float64)

In [10]: s4 = pd.Series([10,20,30], index=pd.Index([1,2,3], name="T"))

In [11]: s4
Out[11]:
T
1    10
2    20
3    30
dtype: int64

In [12]: s3.align(s4)
Out[12]:
(T
 1    1.0
 2    2.0
 3    NaN
 4    4.0
 dtype: float64, T
 1    10.0
 2    20.0
 3    30.0
 4     NaN
 dtype: float64)

In [15]: s1 + s2
Out[15]:
T
1    11
2    22
3    33
dtype: int64

In [16]: s2 + s1
Out[16]:
Time
1    11
2    22
3    33
dtype: int64

Problem description

It's not clear if this is a bug or a feature, and if the latter, then the behavior should be documented.

In the first example, s1.align(s2), the two series have the same index values but different index names. In this case, s1.align(s2) returns the two series with their respective index names preserved.

In the second example, s3.align(s2), the two series have different index values and different index names. In this case, s3.align(s2) returns the two series, but the index names have disappeared.

In the third example, s3.align(s4), the two series have different index values but the same index name. In this case, s3.align(s4) returns the two series, and both indexes keep that name.

So in the first two cases, the names are different, but in the first case the names are preserved, while in the second case they are lost.

In the last two cases, the index values are different; the names are preserved if they are the same, and lost otherwise.
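
For anyone who wants to re-check the three align cases on their own pandas version, here is a minimal script form of the session above. The helper `aligned_index_names` is a hypothetical name introduced only for this sketch, not a pandas function, and the printed names may differ between pandas versions, so no expected output is given.

```python
import pandas as pd


def aligned_index_names(left, right):
    """Return the index names of the two Series produced by left.align(right).

    Hypothetical helper for illustration only; not part of pandas.
    """
    aligned_left, aligned_right = left.align(right)
    return aligned_left.index.name, aligned_right.index.name


s1 = pd.Series([1, 2, 3], index=pd.Index([1, 2, 3], name="T"))
s2 = pd.Series([10, 20, 30], index=pd.Index([1, 2, 3], name="Time"))
s3 = pd.Series([1, 2, 4], index=pd.Index([1, 2, 4], name="T"))
s4 = pd.Series([10, 20, 30], index=pd.Index([1, 2, 3], name="T"))

# Case 1: same index values, different index names
print(aligned_index_names(s1, s2))
# Case 2: different index values, different index names
print(aligned_index_names(s3, s2))
# Case 3: different index values, same index name
print(aligned_index_names(s3, s4))
```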

Finally, the last two examples, which involve addition, show an asymmetry that stems from this name handling in Series.align(): when adding two series whose index names differ, s1 + s2 and s2 + s1 produce indexes with different names, which seems a bit odd.

Expected Output

This is not clear to me. Either

  1. If the names are different, then return no names on the corresponding indexes
  2. If the names are different, then raise an Exception
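
To make the two options concrete, here is a rough user-level sketch of what either behavior could look like. `add_with_reconciled_index_name` and its `on_mismatch` parameter are hypothetical names made up for this illustration; this is a wrapper around `+`, not a proposal for how pandas should implement it internally.

```python
import pandas as pd


def add_with_reconciled_index_name(left, right, on_mismatch="drop"):
    """Add two Series, handling mismatched index names explicitly.

    Hypothetical helper, not a pandas API.
    on_mismatch="drop"  -> option 1: the result index has no name
    on_mismatch="raise" -> option 2: raise an exception
    """
    if left.index.name != right.index.name and on_mismatch == "raise":
        raise ValueError(
            f"index names differ: {left.index.name!r} vs {right.index.name!r}"
        )
    result = left + right
    if left.index.name != right.index.name:
        # Index.rename returns a new Index, so the operands' indexes
        # are left untouched even if the result shares one of them.
        result.index = result.index.rename(None)
    return result


s1 = pd.Series([1, 2, 3], index=pd.Index([1, 2, 3], name="T"))
s2 = pd.Series([10, 20, 30], index=pd.Index([1, 2, 3], name="Time"))

print(add_with_reconciled_index_name(s1, s2).index.name)
print(add_with_reconciled_index_name(s2, s1).index.name)
```

With option 1 the result no longer depends on operand order, since both calls drop the name.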

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 6c1ab7f
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+685.g6c1ab7f2c
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Mar 30, 2018

Actually, what .align is doing is just fine here; it's just doing a joint reindex. The problem is actually in the arithmetic ops, which are not using our pattern of preserving Index names (keeping them when they match, or turning them to None when they don't).
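
As an illustration of the name pattern mentioned above, the sketch below uses `Index.union` to build a joint index and then reindexes both series onto it. This is only one way to read "joint reindex" and is an assumption about the idea, not a description of how `.align` is implemented; the behavior shown is what I would expect on recent pandas versions.

```python
import pandas as pd

idx_t = pd.Index([1, 2, 4], name="T")
idx_time = pd.Index([1, 2, 3], name="Time")
idx_t_other = pd.Index([1, 2, 3], name="T")

# Set operations reconcile names: kept when they match, None otherwise.
mismatched = idx_t.union(idx_time)
matched = idx_t.union(idx_t_other)
print(mismatched.name, matched.name)

# "Joint reindex": put both series on the shared index.
s3 = pd.Series([1, 2, 4], index=idx_t)
s2 = pd.Series([10, 20, 30], index=idx_time)
left = s3.reindex(mismatched)
right = s2.reindex(mismatched)
print(left.index.name, right.index.name)
```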

cc @jbrockmendel

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design Numeric Operations Arithmetic, Comparison, and Logical operations Difficulty Intermediate labels Mar 30, 2018
@jreback jreback added this to the Next Major Release milestone Mar 30, 2018
@Dr-Irv
Contributor Author

Dr-Irv commented Mar 30, 2018

@jreback So why not have .align have the same behavior for the names as the arithmetic ops?

@jreback
Contributor

jreback commented Mar 30, 2018

because you are returning both objects

@mroeschke mroeschke added Bug and removed API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 19, 2021
@mroeschke mroeschke changed the title from "API: Inconsistent treatment of index names in Series.align()" to "BUG: Series names not assigned to None when mismatching during addition" Jun 19, 2021
@jbrockmendel
Member

is there anything left to do here?

@Dr-Irv
Contributor Author

Dr-Irv commented Nov 22, 2021

> is there anything left to do here?

I think we need to decide what behavior we want when doing an arithmetic operation on two series with different names, in the case where the indices are aligned, or not aligned. See [15] and [16] in the original example.

@jbrockmendel
Member

IIUC the issue isn't result.name but result.index.name not being commutative?

@Dr-Irv Dr-Irv changed the title from "BUG: Series names not assigned to None when mismatching during addition" to "BUG: Series index names not assigned to None when mismatching during addition" Nov 22, 2021
@Dr-Irv
Contributor Author

Dr-Irv commented Nov 22, 2021

> IIUC the issue isn't result.name but result.index.name not being commutative?

Yes, that is correct.
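
A small check along these lines could serve as a starting point for a regression test. `index_name_commutes` is a hypothetical helper name, and whether it returns True for s1 and s2 depends on the pandas version, so no result is claimed here.

```python
import pandas as pd


def index_name_commutes(left, right):
    """True if left + right and right + left agree on result.index.name.

    Hypothetical helper for illustration only; not part of pandas.
    """
    return (left + right).index.name == (right + left).index.name


s1 = pd.Series([1, 2, 3], index=pd.Index([1, 2, 3], name="T"))
s2 = pd.Series([10, 20, 30], index=pd.Index([1, 2, 3], name="Time"))

# Corresponds to [15] and [16] in the original example.
print(index_name_commutes(s1, s2))
```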

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022