Stacking MultiIndex DataFrame columns with Timestamps levels fails #8039
TomAugspurger added the Bug and Reshaping labels (Aug 15, 2014)
TomAugspurger added this to the 0.15.0 milestone (Aug 15, 2014)
|
when you pass Use this to create a frame
And simply set the columns.
|
jreback closed this (Aug 15, 2014)
jreback added the Usage Question label and removed the Bug and Reshaping labels (Aug 15, 2014)
|
@TomAugspurger not a bug, but a usage issue. |
|
yeah just read your comment. |
|
@ldkge's problem was with
In [67]: result.stack(0)
Out[67]:
SomeColumnName
AnotherOne
1 2014-08-01 34204
2 2014-08-01 43580
3 2014-08-01 84329
5 2014-08-01 23485
In [68]: result.stack(1)
Out[68]:
Empty DataFrame
Columns: [(2014-08-01 00:00:00, AnotherOne)]
Index: []
would be different. |
|
This works in master (recently added feature).
I am not sure what stack(1) would/should actually do. What would you expect? |
|
I thought it should shift the
>>> df.stack(1)
2014-08-01
AnotherOne
1 SomeColumnName 34204
2 SomeColumnName 43580
3 SomeColumnName 84329
5 SomeColumnName 23485 |
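The expectation described above can be sketched as follows. The frame is a reconstruction inferred from the printed output in this thread (a three-level column index of timestamp, `SomeColumnName`, `AnotherOne`); it is an assumption, not the reporter's original code. On a pandas version that includes the fix, `stack(1)` should move the middle column level into the row index rather than return an empty frame.

```python
import pandas as pd

# Assumed shape of the reporter's frame: one column whose MultiIndex key is
# (Timestamp, "SomeColumnName", "AnotherOne"). Reconstructed for illustration.
columns = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2014-08-01"), "SomeColumnName", "AnotherOne")]
)
df = pd.DataFrame(
    [[34204], [43580], [84329], [23485]],
    index=[1, 2, 3, 5],
    columns=columns,
)

# Level 0 (the timestamp) moves into the row index.
stacked0 = df.stack(0)

# Level 1 ("SomeColumnName") moves into the row index; the timestamp stays
# in the columns. This is the call that came back empty in the report.
stacked1 = df.stack(1)

print(stacked0)
print(stacked1)
```

Recent pandas releases may emit a `FutureWarning` about the stack implementation here; that is unrelated to the behaviour discussed in this issue.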
|
cc @onesandzeroes what do you think? |
jreback reopened this (Aug 15, 2014)
|
ok, I think I agree; this could be a bug. |
jreback added the Bug and Reshaping labels (Aug 15, 2014)
|
I'll submit a PR once I figure out what's wrong. |
|
@jreback it has to do with how the MultiIndex is storing the timestamp. Any idea offhand why with
In [6]: idx = pd.MultiIndex.from_tuples([(pd.datetime(2014, 1, 1), 'A', 'B')])
these two aren't equal?
In [10]: idx.values[0][0]
Out[10]: Timestamp('2014-01-01 00:00:00')
In [8]: idx.levels[0].values
Out[8]: array(['2013-12-31T18:00:00.000000000-0600'], dtype='datetime64[ns]')
edit: or even clearer, why isn't
equal to
In [33]: idx.levels[0][0]
Out[33]: Timestamp('2014-01-01 00:00:00')
I'm going to go digging in index.py |
TomAugspurger added the MultiIndex label and removed the Reshaping label (Aug 15, 2014)
|
where is this type of comparison? |
|
(I think) they're compared when constructing the new dataframe in
ipdb> new_data
{(numpy.datetime64('2013-12-31T18:00:00.000000000-0600'), 'B'): array([1, 2, 3, 4])}
ipdb> new_columns
MultiIndex(levels=[[2014-01-01 00:00:00], ['B']],
labels=[[0], [0]])
ipdb> result = DataFrame(new_data, index=new_index, columns=new_columns)
ipdb> result
2014-01-01
B
0 C NaN
1 C NaN
2 C NaN
3 C NaN
I'll see why |
|
I can't see exactly where you are pointing to... levels should be using |
|
@jreback I agree with TomAugspurger about what the expected behaviour of |
ldkge commented Aug 15, 2014
You can see the bug in the following code:
We would expect the data to be unchanged, however the returned DataFrame is empty.
The Pandas version used was 0.11.0