BUG: join with list does not behave like singleton #57676

Open
3 tasks done
ilan-gold opened this issue Feb 29, 2024 · 6 comments · May be fixed by #57890
@ilan-gold

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

cat_df = pd.DataFrame({'cat': pd.Categorical(['a', 'v', 'd'])}, index=pd.Index(['a', 'b', 'c'], name='y'))
join_df = pd.DataFrame({'foo': np.arange(6)}, index=pd.MultiIndex.from_tuples([(0, 'a'), (0, 'b'), (0, 'c'), (1, 'a'), (1, 'b'), (1, 'c')], names=('x', 'y')))
join_df.join([cat_df])  # NaNs in the `cat` column
join_df.join(cat_df)  # correct

Issue Description

I would expect the two results to be identical. My real goal is to left join multiple cat_dfs, each of which shares one index level with join_df.
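Until the list form behaves the same way, one possible workaround is to chain single-frame joins, since the single-frame call is the one reported as correct above. This is only a sketch: `left_join_all` and `other_df` are hypothetical names introduced for illustration, and it assumes the frames have no overlapping column names.

```python
from functools import reduce

import numpy as np
import pandas as pd

join_df = pd.DataFrame(
    {"foo": np.arange(6)},
    index=pd.MultiIndex.from_product([[0, 1], ["a", "b", "c"]], names=("x", "y")),
)
cat_df = pd.DataFrame(
    {"cat": pd.Categorical(["a", "v", "d"])},
    index=pd.Index(["a", "b", "c"], name="y"),
)
# Hypothetical second frame that shares the "y" level with join_df.
other_df = pd.DataFrame(
    {"bar": [10, 20, 30]},
    index=pd.Index(["a", "b", "c"], name="y"),
)


def left_join_all(left, frames):
    """Left join each frame onto `left` one at a time (hypothetical helper)."""
    # Each step is a plain DataFrame.join with a single frame, so the list
    # code path of DataFrame.join is never used.
    return reduce(lambda acc, frame: acc.join(frame, how="left"), frames, left)


result = left_join_all(join_df, [cat_df, other_df])
print(result)
```

Because every step goes through the single-frame path, a list that happens to contain only one frame is handled the same way as a longer one.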

Expected Behavior

I would expect identical behavior. I did notice that with the on kwarg, I get an error that might indicate this is not allowed:

join_df.join([cat_df], on='y')

gives

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ilangold/Projects/Theis/pandas/pandas/core/frame.py", line 10463, in join
    raise ValueError(
ValueError: Joining multiple DataFrames only supported for joining on index

Installed Versions

Not sure what's up with the installed version commit since my git log looks like

commit e14a9bd41d8cd8ac52c5c958b735623fe0eae064 (HEAD -> main, origin/main, origin/HEAD)
Author: Eric Larson <larson.eric.d@gmail.com>
Date:   Wed Feb 28 17:21:32 2024 -0500

    ENH: Report how far off the formatting is (#57667)

INSTALLED VERSIONS

commit : 52cb549
python : 3.11.6.final.0
python-bits : 64
OS : Darwin
OS-release : 22.6.0
Version : Darwin Kernel Version 22.6.0: Wed Oct 4 21:26:23 PDT 2023; root:xnu-8796.141.3.701.17~4/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 3.0.0.dev0+87.g52cb549f44.dirty
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : 3.0.8
pytest : 8.0.2
hypothesis : 6.98.13
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : 5.1.0
html5lib : 1.1
pymysql : 1.4.6
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 8.22.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.8
fastparquet : 2024.2.0
fsspec : 2024.2.0
gcsfs : 2024.2.0
matplotlib : 3.8.3
numba : 0.59.0
numexpr : 2.9.0
odfpy : None
openpyxl : 3.1.2
pyarrow : 15.0.0
pyreadstat : 1.2.6
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2024.2.0
scipy : 1.12.0
sqlalchemy : 2.0.27
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.2.0
xlrd : 2.0.1
zstandard : 0.22.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

ilan-gold added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Feb 29, 2024
@Dacops

Dacops commented Mar 17, 2024

take

@Dacops

Dacops commented Mar 19, 2024

I believe the problem is that whenever join receives a list, the list is evaluated and its frames are then either concatenated or merged. With the current evaluation, [cat_df] gets concatenated with join_df, while a bare cat_df gets merged with join_df. In my pull request I naively changed this evaluation to fix the issue, but it now fails several other join tests. I'll look into concatenation vs. merging in more detail and update the pull request.
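To make the concat-or-merge decision concrete, here is a rough sketch of how the choice could be reproduced from the outside. It assumes the evaluation boils down to "every index involved is unique", which is my reading of the check and not something confirmed in this thread. `join_strategy` and `dup_df` are hypothetical names, and nothing here calls pandas internals.

```python
import numpy as np
import pandas as pd


def join_strategy(left, others):
    """Report which path a list join would take under the assumed rule."""
    frames = [left, *others]
    # Assumption: concatenation is chosen when every frame has a unique index.
    can_concat = all(frame.index.is_unique for frame in frames)
    return "concat" if can_concat else "merge"


join_df = pd.DataFrame(
    {"foo": np.arange(6)},
    index=pd.MultiIndex.from_product([[0, 1], ["a", "b", "c"]], names=("x", "y")),
)
cat_df = pd.DataFrame(
    {"cat": pd.Categorical(["a", "v", "d"])},
    index=pd.Index(["a", "b", "c"], name="y"),
)
# Hypothetical frame with a repeated index label.
dup_df = pd.DataFrame({"dup": [1, 2]}, index=pd.Index(["a", "a"], name="y"))

print(join_strategy(join_df, [cat_df]))  # concat
print(join_strategy(join_df, [dup_df]))  # merge
```

Under that assumption the frames from the report take the concat path, because both indexes are unique even though they do not have the same number of levels. Concatenation aligns on whole index labels, whereas a merge can match on the shared `y` level, which would be consistent with the NaNs in the `cat` column.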

Dacops added a commit to Dacops/pandas that referenced this issue Mar 20, 2024
@Dacops

Dacops commented Mar 20, 2024

Figured out that the simplest way to deal with this is to unwrap a single-element list and join with that element directly. The operation that evaluates the boolean "can_concat" has been there for 12 years (so I doubt it's wrong); however, there may have been an oversight for some specific cases of this uncommon practice (passing a list with a single element).
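As a user-side illustration of that idea (not the change in the pull request, which is not shown in this thread), a small wrapper could unwrap a one-element list before delegating to `DataFrame.join`. The name `join_unwrapping_singletons` is hypothetical.

```python
import numpy as np
import pandas as pd


def join_unwrapping_singletons(left, other):
    """Join `other` onto `left`, treating a one-element list as a single frame."""
    if isinstance(other, (list, tuple)) and len(other) == 1:
        # Unwrap so the call goes through the single-frame join path.
        other = other[0]
    return left.join(other)


join_df = pd.DataFrame(
    {"foo": np.arange(6)},
    index=pd.MultiIndex.from_product([[0, 1], ["a", "b", "c"]], names=("x", "y")),
)
cat_df = pd.DataFrame(
    {"cat": pd.Categorical(["a", "v", "d"])},
    index=pd.Index(["a", "b", "c"], name="y"),
)

wrapped = join_unwrapping_singletons(join_df, [cat_df])
print(wrapped)
```

Lists with two or more frames are passed through unchanged, so this only covers the singleton case discussed here.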

@ilan-gold
Author

@Dacops I will say that it isn't my intention to pass a single item list, but when you are creating the lists from other things, it can happen.

@Dacops

Dacops commented Mar 24, 2024

Yeah, that makes sense. Meanwhile I've sent the pull request and it passed everything in the pipeline, so now it's just waiting for a developer review.

@ilan-gold
Author

Thanks so much @Dacops :)

Dacops added a commit to Dacops/pandas that referenced this issue Apr 4, 2024
Dacops added a commit to Dacops/pandas that referenced this issue Apr 21, 2024
Dacops added a commit to Dacops/pandas that referenced this issue May 1, 2024
Dacops added a commit to Dacops/pandas that referenced this issue May 2, 2024
Dacops added a commit to Dacops/pandas that referenced this issue May 7, 2024
Dacops added a commit to Dacops/pandas that referenced this issue May 8, 2024
Dacops added a commit to Dacops/pandas that referenced this issue May 26, 2024