Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pd.merge for outer with empty dataframe on the left side results in merged dataframe with the key column not being first #9937

Closed
kundapur opened this issue Apr 19, 2015 · 1 comment · Fixed by #55028
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@kundapur
Copy link

Hi,

I am new to Pandas and like it a lot. I noticed that while doing an outer merge if the dataframe on the left side is empty, then the key column in the merged dataframe is not the first.

First the summary and then my code, print output, and pd.show_versions is included below. Consider 2 dataframes left and right. The left data frame has 2 columns - key and lval. The right data frame has 2 columns - key and rval. Then consider 4 situations below.

  1. Both left and right dataframes are non-empty
  2. Left is non-empty and right is empty
  3. Left is empty and right is non-empty
  4. Both left and right dataframes are empty

In each of the situations, do an outer merge on the key column. The merged data frame has the columns in the following order:
In (1) and (2), the merged columns are key, lval, rval -> Good and as I expected
In (3) and (4), the merged columns are lval, key, rval -> Not as I expected. Is this expected? I'd expect the merged columns to be key, lval, rval. Is there a way to force this? I could do a workaround by checking the shape of the left data frame but I am wondering if I am doing something wrong or there's something wrong.

Code

import pandas as pd
import numpy as np

def print_matrices():
    print 'left'
    print left
    print 'right'
    print right
    print 'merged'
    print merged


print 'pandas show_versions'
pd.show_versions(as_json=False)

print '\n\nAn outer join'
left = pd.DataFrame({'key': ['foo1', 'foo2'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo1', 'foo3'], 'rval': [4, 5]})
merged = pd.merge(left, right, on='key', how='outer')
print_matrices()

print '\n\nAn outer join with right empty dataframe'
left = pd.DataFrame({'key': ['foo1', 'foo2'], 'lval': [1, 2]})
right = pd.DataFrame({'key': [], 'rval': []})
merged = pd.merge(left, right, on='key', how='outer')
print_matrices()

print '\n\nAn outer join with left empty dataframe'
left = pd.DataFrame({'key': [], 'lval': []})
right = pd.DataFrame({'key': ['foo1', 'foo3'], 'rval': [4, 5]})
merged = pd.merge(left, right, on='key', how='outer')
print_matrices()

print '\n\nAn outer join with both left and right empty dataframe'
left = pd.DataFrame({'key': [], 'lval': []})
right = pd.DataFrame({'key': [], 'rval': []})
merged = pd.merge(left, right, on='key', how='outer')
print_matrices()

Output:

An outer join

left
    key  lval
0  foo1     1
1  foo2     2
right
    key  rval
0  foo1     4
1  foo3     5
merged
    key  lval  rval
0  foo1     1     4
1  foo2     2   NaN
2  foo3   NaN     5

An outer join with right empty dataframe

left
    key  lval
0  foo1     1
1  foo2     2
right
Empty DataFrame
Columns: [key, rval]
Index: []
merged
    key  lval  rval
0  foo1     1   NaN
1  foo2     2   NaN

An outer join with left empty dataframe

left
Empty DataFrame
Columns: [key, lval]
Index: []
right
    key  rval
0  foo1     4
1  foo3     5
merged
   lval   key  rval
0   NaN  foo1     4
1   NaN  foo3     5

An outer join with two empty dataframe

left
Empty DataFrame
Columns: [key, lval]
Index: []
right
Empty DataFrame
Columns: [key, rval]
Index: []
merged
Empty DataFrame
Columns: [lval, key, rval]
Index: []

pandas show_versions

INSTALLED VERSIONS

commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Darwin
OS-release: 13.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.0
nose: None
Cython: None
numpy: 1.9.2
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.1
pytz: 2014.10
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.9
apiclient: 1.4.0
sqlalchemy: None
pymysql: None
psycopg2: None

@mroeschke
Copy link
Member

We do documents

outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically

which is not preserved when the left frame contains an empty column

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants