Incorrect join of DataFrames with non-unique datetime indices #1306

Closed
leonbaum opened this issue May 24, 2012 · 3 comments

@leonbaum
commented May 24, 2012

I'm not sure whether joining DataFrames with non-unique indices is now supported, but it doesn't raise an error, and the result of this simple example doesn't make sense:

In [11]: df1 = pandas.DataFrame({'x': ['a']}, index=[np.datetime64('2012')])

In [12]: df2 = pandas.DataFrame({'y': ['b', 'c']}, index=[np.datetime64('2012')] * 2)

In [13]: df1
Out[13]: 
                     x
1970-01-16 08:09:36  a

In [14]: df2
Out[14]: 
                     y
1970-01-16 08:09:36  b
1970-01-16 08:09:36  c

In [15]: df1.join(df2, how='inner')
Out[15]: 
                     x  y
1970-01-16 08:09:36  a  b

Shouldn't the 1st row of df1 join to both rows of df2?
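
To make the expected behaviour concrete, here is a minimal pure-Python sketch of an inner join on a key that may repeat on either side. It is not pandas code; `inner_join_on_index` and the `(key, value)` row layout are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


def inner_join_on_index(left, right):
    """Inner-join two lists of (key, value) rows; keys may repeat on either side.

    Hypothetical helper for illustration, not part of pandas.
    """
    # Group the right-hand values by key so duplicates are all retained.
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)

    # Emit one output row per matching (left row, right row) pair.
    return [
        (key, left_value, right_value)
        for key, left_value in left
        for right_value in right_by_key.get(key, [])
    ]


left = [("2012-01-01", "a")]
right = [("2012-01-01", "b"), ("2012-01-01", "c")]
rows = inner_join_on_index(left, right)
```

Under these semantics the single left row pairs with both right rows, so `rows` has two entries rather than one.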

@leonbaum

Author

commented May 24, 2012

I just noticed the timestamp is also screwed up, but I'm guessing that's a separate issue.

I'm using the latest master branch, btw.

@wesm

Member

commented May 24, 2012

It looks to me like an edge case; I'll look into it. I'll fix the timestamp issue too. Unfortunately, the NumPy datetime API is a disaster in NumPy 1.6.1 and I'm doing my best to work around it. Affairs will be much improved in NumPy 1.7 and later.
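
As a side note on the displayed timestamp, the value `1970-01-16 08:09:36` is what you get if the epoch offset for 2012-01-01 is read with a unit that is off by a factor of 1000. The sketch below only checks that arithmetic with the standard library; it is an observation about the numbers, not a confirmed diagnosis of what pandas or NumPy 1.6.1 did internally.

```python
from datetime import datetime, timedelta, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
intended = datetime(2012, 1, 1, tzinfo=timezone.utc)

# Seconds from the Unix epoch to the intended date.
seconds_since_epoch = int((intended - epoch).total_seconds())

# Assumption for illustration: the stored integer is interpreted in a unit
# 1000x finer than the one it was written in (e.g. us read as ns).
misread = epoch + timedelta(seconds=seconds_since_epoch // 1000)

print(misread.strftime("%Y-%m-%d %H:%M:%S"))
```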

@wesm

Member

commented May 25, 2012

I worked through this and built the many-to-one and many-to-many join machinery today for indexes. Was not easy:

In [6]: df1.join(df2)
Out[6]: 
                     x  y
1970-01-16 08:09:36  a  b
1970-01-16 08:09:36  a  c

The timestamp handling is a separate matter, so I'm closing this issue.
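
For anyone re-checking this on a recent pandas release, a minimal reproduction might look like the following. It assumes a current pandas install and uses `pd.Timestamp` for the index instead of the original `np.datetime64('2012')`, which sidesteps the separate timestamp problem.

```python
import pandas as pd

# One key on the left, the same key duplicated on the right.
key = pd.Timestamp("2012-01-01")
df1 = pd.DataFrame({"x": ["a"]}, index=[key])
df2 = pd.DataFrame({"y": ["b", "c"]}, index=[key, key])

# The left row should be paired with every matching right row.
joined = df1.join(df2, how="inner")
print(joined)
```

The joined frame should contain two rows, one for each row of `df2`, both carrying `x == 'a'`.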
