DataFrame.join can not have datetime as column names #811

Closed
turkeytest opened this Issue Feb 22, 2012 · 4 comments

@turkeytest

Issue:
With DataFrameA.join(DataFrameB, on='somecol'), DataFrameB cannot have datetime objects as column labels.

Example:

from datetime import datetime
from pandas import DataFrame

str_dates = [ '20120209' , '20120222' ]
dt_dates = [ datetime(2012,2,9) , datetime(2012,2,22)]

A = DataFrame(str_dates , index = range(2) , columns = ['aa'] )

B = DataFrame([[1,2],[3,4]] , index = str_dates , columns = str_dates)
C = DataFrame([[1,2],[3,4]] , index = str_dates , columns = dt_dates )

works = A.join( B , on = 'aa' ) # works -- extra column labels are string
fails = A.join( C , on = 'aa' ) # fails -- extra column labels are datetime

@adamklein

The problem is not that the columns cannot be datetimes, but that strings and datetimes cannot be compared, and thus the resulting columns cannot be ordered. If, on the other hand, you have

A = DataFrame(str_dates , index = range(2) , columns = [datetime(2012,1,1)] )

it should all work, e.g.:

In [16]: A = DataFrame(str_dates , index = range(2) , columns = [datetime(2012,1,1)] )

In [17]: A
Out[17]: 
  2012-01-01
0   20120209
1   20120222

In [18]: A.join(C, on=datetime(2012,1,1))
Out[18]: 
  2012-01-01  2012-02-09  2012-02-22
0   20120209           1           2
1   20120222           3           4

Not sure what can be done about this one.
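
To illustrate the point about incomparable labels, here is a minimal sketch in plain Python 3 (no pandas involved). The helper `can_order` is hypothetical and only exists for this illustration; it simply tries to sort a list of labels and reports whether that succeeded.

```python
from datetime import datetime

string_label = "aa"
datetime_labels = [datetime(2012, 2, 9), datetime(2012, 2, 22)]


def can_order(labels):
    """Hypothetical helper: return True if the labels can be sorted together."""
    try:
        sorted(labels)
        return True
    except TypeError:
        # Raised when two labels have no defined ordering, e.g. str vs datetime.
        return False


# Mixing a string label with datetime labels cannot be ordered.
mixed_ok = can_order([string_label] + datetime_labels)

# Labels that are all datetimes can be ordered without trouble.
same_type_ok = can_order([datetime(2012, 1, 1)] + datetime_labels)

print(mixed_ok, same_type_ok)
```

This mirrors the two cases above: a string column label next to datetime labels, versus datetime labels on both sides.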

@wesm
Member
wesm commented Feb 24, 2012

This is a bug in Index.union. If two indexes are monotonic but their elements are incomparable, self._inner_indexer will fail. Write a unit test with a union of these two indexes:

ipdb> self
Index([aa], dtype=object)
ipdb> other
Index([2012-02-09 00:00:00, 2012-02-22 00:00:00], dtype=object)

then add a workaround in case of TypeError from the Cython method

@adamklein

Wes, please have a look. It falls back on the slower non-monotonic code path in intersect when a TypeError is raised (union works fine; I think you meant intersection?).
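
The fallback idea can be sketched in plain Python as follows. This is not the actual pandas code: `intersect_sorted` and `intersect` are hypothetical names, the fast path is a simple merge over sorted lists standing in for the Cython method, and the slow path is a hash-based lookup that needs equality but no ordering.

```python
from datetime import datetime


def intersect_sorted(left, right):
    """Merge-style intersection; assumes both inputs are sorted and mutually comparable."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:  # raises TypeError for str vs datetime
            i += 1
        else:
            j += 1
    return out


def intersect(left, right):
    """Try the fast sorted path first; fall back to a hash lookup on TypeError."""
    try:
        return intersect_sorted(left, right)
    except TypeError:
        # Slower path: only hashing and equality are needed, not ordering.
        right_set = set(right)
        return [x for x in left if x in right_set]


labels_a = ["aa"]
labels_b = [datetime(2012, 2, 9), datetime(2012, 2, 22)]

# The two label sets share nothing, so the result is empty rather than an error.
print(intersect(labels_a, labels_b))
```

The fast path is still used whenever the labels are comparable, so the fallback only costs anything in the mixed-type case.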

@wesm
Member
wesm commented Feb 24, 2012

Yeah intersection, sorry. Let me look at the PR

@wesm wesm closed this in 75bf87b Feb 24, 2012
@adamklein adamklein added a commit to adamklein/pandas that referenced this issue Feb 24, 2012
@adamklein adamklein Merge branch 'master' into mergemaster
* master: (27 commits)
  BUG: close #811, fix index.intersection where indices are incomparable
  ENH: rename fill_method to method, close #827
  BUG: closes #822, cast non-string columns in to_records
  BUG: use Cython take_2d instead of ndarray.take due to Fortran order performance problem, GH #817
  BUG: clean up Series wrapper that is not needed, per #819 comments
  BUG: close #816, defer tuple-unboxing until later per perivous commit comments
  BUG: close #812, fix index name dropping in _ensure_index
  ENH: close #818, per Wes comments on izip df tuple iterator
  TST: script testing groupby iteration performance GH close #817
  BUG: add test case for integer index failure re: #819
  BUG: close #816, fix exception thrown on np.diff
  BUG: close #812, reindex keeps original name along both axes
  ENH: add new izip-based row-iterator, update release and docs (close #818)
  BUG: fix issues resulting from unclean merge in PR #807
  BUG: handle grouping aggregations consistently whether as_index is True/False, close #819
  BUG: remove lingering set_trace related to previous commit
  BUG: dtype sometimes not converted to bool, closes issue #820
  BUG: malformed BlockManager in groupby, regression from 0.7.0, GH #814
  ENH: cache indexers when conforming list of series
  ENH: minor tweaks to grouped_hist
  ...
869cf0f