You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am new to Pandas and like it a lot. I noticed that while doing an outer merge if the dataframe on the left side is empty, then the key column in the merged dataframe is not the first.
First the summary and then my code, print output, and pd.show_versions is included below. Consider 2 dataframes left and right. The left data frame has 2 columns - key and lval. The right data frame has 2 columns - key and rval. Then consider 4 situations below.
Both left and right dataframes are non-empty
Left is non-empty and right is empty
Left is empty and right is non-empty
Both left and right dataframes are empty
In each of the situations, do an outer merge on the key column. The merged data frame has the columns in the following order:
In (1) and (2), the merged columns are key, lval, rval -> Good and as I expected
In (3) and (4), the merged columns are lval, key, rval -> Not as I expected. Is this expected? I'd expect the merged columns to be key, lval, rval. Is there a way to force this? I could do a workaround by checking the shape of the left data frame but I am wondering if I am doing something wrong or there's something wrong.
Code
import pandas as pd
import numpy as np
def print_matrices():
print 'left'
print left
print 'right'
print right
print 'merged'
print merged
print 'pandas show_versions'
pd.show_versions(as_json=False)
print '\n\nAn outer join'
left = pd.DataFrame({'key': ['foo1', 'foo2'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo1', 'foo3'], 'rval': [4, 5]})
merged = pd.merge(left, right, on='key', how='outer')
print_matrices()
print '\n\nAn outer join with right empty dataframe'
left = pd.DataFrame({'key': ['foo1', 'foo2'], 'lval': [1, 2]})
right = pd.DataFrame({'key': [], 'rval': []})
merged = pd.merge(left, right, on='key', how='outer')
print_matrices()
print '\n\nAn outer join with left empty dataframe'
left = pd.DataFrame({'key': [], 'lval': []})
right = pd.DataFrame({'key': ['foo1', 'foo3'], 'rval': [4, 5]})
merged = pd.merge(left, right, on='key', how='outer')
print_matrices()
print '\n\nAn outer join with both left and right empty dataframe'
left = pd.DataFrame({'key': [], 'lval': []})
right = pd.DataFrame({'key': [], 'rval': []})
merged = pd.merge(left, right, on='key', how='outer')
print_matrices()
Output:
An outer join
left
key lval
0 foo1 1
1 foo2 2
right
key rval
0 foo1 4
1 foo3 5
merged
key lval rval
0 foo1 1 4
1 foo2 2 NaN
2 foo3 NaN 5
An outer join with right empty dataframe
left
key lval
0 foo1 1
1 foo2 2
right
Empty DataFrame
Columns: [key, rval]
Index: []
merged
key lval rval
0 foo1 1 NaN
1 foo2 2 NaN
An outer join with left empty dataframe
left
Empty DataFrame
Columns: [key, lval]
Index: []
right
key rval
0 foo1 4
1 foo3 5
merged
lval key rval
0 NaN foo1 4
1 NaN foo3 5
Hi,
I am new to Pandas and like it a lot. I noticed that while doing an outer merge if the dataframe on the left side is empty, then the key column in the merged dataframe is not the first.
First the summary and then my code, print output, and pd.show_versions is included below. Consider 2 dataframes left and right. The left data frame has 2 columns - key and lval. The right data frame has 2 columns - key and rval. Then consider 4 situations below.
In each of the situations, do an outer merge on the key column. The merged data frame has the columns in the following order:
In (1) and (2), the merged columns are key, lval, rval -> Good and as I expected
In (3) and (4), the merged columns are lval, key, rval -> Not as I expected. Is this expected? I'd expect the merged columns to be key, lval, rval. Is there a way to force this? I could do a workaround by checking the shape of the left data frame but I am wondering if I am doing something wrong or there's something wrong.
Code
Output:
An outer join
An outer join with right empty dataframe
An outer join with left empty dataframe
An outer join with two empty dataframe
pandas show_versions
INSTALLED VERSIONS
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Darwin
OS-release: 13.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.16.0
nose: None
Cython: None
numpy: 1.9.2
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.1
pytz: 2014.10
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.9
apiclient: 1.4.0
sqlalchemy: None
pymysql: None
psycopg2: None
The text was updated successfully, but these errors were encountered: