DataFrame.join can not have datetime as column names #811

Closed
turkeytest opened this issue Feb 22, 2012 · 4 comments
@turkeytest

Issue:
DataFrameA.join(DataFrameB, on='somecol') fails when DataFrameB has datetime objects as column labels.

Example:

from datetime import datetime
from pandas import DataFrame

str_dates = [ '20120209' , '20120222' ]
dt_dates = [ datetime(2012,2,9) , datetime(2012,2,22)]

A = DataFrame(str_dates , index = range(2) , columns = ['aa'] )

B = DataFrame([[1,2],[3,4]] , index = str_dates , columns = str_dates)
C = DataFrame([[1,2],[3,4]] , index = str_dates , columns = dt_dates )

works = A.join( B , on = 'aa' ) # works -- extra column labels are string
fails = A.join( C , on = 'aa' ) # fails -- extra column labels are datetime

@adamklein
Contributor

The problem is not that the columns cannot be datetimes, but that strings and datetimes cannot be compared, so the resulting columns cannot be ordered. If, on the other hand, A's column label is also a datetime:

A = DataFrame(str_dates , index = range(2) , columns = [datetime(2012,1,1)] )

it should all work, e.g.:

In [16]: A = DataFrame(str_dates , index = range(2) , columns = [datetime(2012,1,1)] )

In [17]: A
Out[17]: 
  2012-01-01
0   20120209
1   20120222

In [18]: A.join(C, on=datetime(2012,1,1))
Out[18]: 
  2012-01-01  2012-02-09  2012-02-22
0   20120209           1           2
1   20120222           3           4

Not sure what can be done about this one.
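
The incomparability described in this comment can be reproduced without pandas. The following is a minimal sketch, not code from the pandas source; `can_order` is a hypothetical helper introduced only for illustration.

```python
from datetime import datetime


def can_order(a, b):
    """Return True if `a < b` can be evaluated, False if it raises TypeError.

    Hypothetical helper, used only to illustrate the comparison problem.
    """
    try:
        a < b
    except TypeError:
        return False
    return True


# A string label against a datetime label: no ordering is defined.
mixed = can_order("aa", datetime(2012, 2, 9))

# Two datetime labels: ordering works as usual.
same = can_order(datetime(2012, 2, 9), datetime(2012, 2, 22))
```

Any routine that needs an ordering over the combined column labels (sorting, or a merge over sorted inputs) hits the same TypeError when the labels mix the two types.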

@wesm
Member

wesm commented Feb 24, 2012

This is a bug in Index.union. If two indexes are monotonic but their elements are incomparable, self._inner_indexer will fail. Write a unit test with a union of these two indexes:

ipdb> self
Index([aa], dtype=object)
ipdb> other
Index([2012-02-09 00:00:00, 2012-02-22 00:00:00], dtype=object)

Then add a workaround for the case where the Cython method raises TypeError.
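
A rough pure-Python sketch of that kind of workaround follows. It is not the pandas implementation: `fast_sorted_intersection` stands in for the Cython fast path and `intersection` for the calling method, and both names are hypothetical. The idea is to try the comparison-based routine first and fall back to a slower hash-based one when the elements turn out to be incomparable.

```python
from datetime import datetime


def fast_sorted_intersection(left, right):
    """Two-pointer intersection of two sorted sequences.

    Stand-in for the fast path; it relies on `<` between elements,
    so it raises TypeError when the elements are incomparable.
    """
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif right[j] < left[i]:
            j += 1
        else:
            out.append(left[i])
            i += 1
            j += 1
    return out


def intersection(left, right):
    """Try the fast path; on TypeError use a slower hash-based lookup."""
    try:
        return fast_sorted_intersection(left, right)
    except TypeError:
        # The fallback needs only hashing and equality, not ordering.
        right_set = set(right)
        return [x for x in left if x in right_set]


labels = ["aa"]
dates = [datetime(2012, 2, 9), datetime(2012, 2, 22)]
result = intersection(labels, dates)  # fast path raises, fallback is used
```

The fallback preserves the order of `left` and never compares a string with a datetime, which is why it succeeds where the sorted merge cannot.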

@adamklein
Contributor

Wes, please have a look. The fix falls back on the slower non-monotonic code path in intersection when a TypeError is raised (union works fine; I think you meant intersection?).

@wesm
Member

wesm commented Feb 24, 2012

Yeah, intersection, sorry. Let me look at the PR.
