MultiIndex DataFrame to_csv() ignores date_format #7791

Poquaruse · 2014-07-18T16:42:38Z

Hi all,

I'm running pandas 0.14.1:

INSTALLED VERSIONS

commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 8
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: DE

pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: None
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.1
pytz: 2014.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

I'm experiencing a bug which might be related to #7622

How to reproduce the bug:
At first, a new DataFrame is created

dt_rng = pd.date_range(start='2014-01-01 00:00', periods = 1000, freq='1s')
df = pd.DataFrame({'a':np.random.randn(1000), 'b': np.random.randn(1000)},index = dt_rng)
df['b'] = df['b'].round()

Then, this DataFrame is exported as csv with a custom date_format:

df.to_csv(date_format='%Y demo')

This does work: ',a,b\n2014 demo,-0.5582228932333034,1.0\n2014 demo,[...]'

Now, the other scenario. Same DataFrame but a groupby('b') and a resample:

df.groupby(df['b']).resample('1min').to_csv(date_format='%y demo')

Here, the custom date_format is not used:
'b,,a\n-4.0,2014-01-01 00:12:00,1.571130069273494[...]'

Thanks and best regards

The text was updated successfully, but these errors were encountered:

jreback · 2014-07-18T17:33:49Z

use the format below for grouping (this is new in 0.14.0), as its more flexible (e.g. the resample could be on a column).
this is a bug, but I think for another reason, the index is not being constructured 100% correctly, xref BUG: multi-index with one level with a tz buggy #7793

In [5]: df.groupby(['b',pd.Grouper(freq='1h')]).mean()
Out[5]: 
                      a
b                      
-4 2014-01-01  0.236842
-3 2014-01-01 -0.196441
-2 2014-01-01  0.084048
-1 2014-01-01 -0.079776
 0 2014-01-01  0.059750
 1 2014-01-01 -0.010162
 2 2014-01-01 -0.124559
 3 2014-01-01 -0.009600

In [6]: df.groupby(['b',pd.Grouper(freq='1h')]).mean().to_csv('test.csv',mode='w',date_format='%Y')

In [7]: !cat test.csv
b,,a
-4.0,2014-01-01 00:00:00,0.23684162290873984
-3.0,2014-01-01 00:00:00,-0.1964410632489068
-2.0,2014-01-01 00:00:00,0.08404773044520912
-1.0,2014-01-01 00:00:00,-0.07977596435616328
0.0,2014-01-01 00:00:00,0.05975025407287002
1.0,2014-01-01 00:00:00,-0.010162014106808491
2.0,2014-01-01 00:00:00,-0.1245593183893703
3.0,2014-01-01 00:00:00,-0.00959968236480397

Poquaruse · 2014-07-19T07:34:35Z

Thanks! :-)

jreback · 2014-07-19T13:43:35Z

ok, since #7746 fix was just merged, this is a separate issue.

I think that the multi-index to_native_types is not passing thru the date_format (and maybe other formatters) to the sub-index formatting.

nbonnotte · 2015-11-08T14:44:41Z

@jreback i've been looking at that issue, and i would not say the difficulty is "novice"...

The issue I'm struggling with is that (if i understand things correctly) a MultiIndex is a MultiIndex whatever the type of the content, while a 1D index with datetimes is a DatetimeIndex.

So when we need to format a DatetimeIndex, DatetimeIndex._format_native_types is there and does what it has to do, knowing its values are Timestamps https://github.com/pydata/pandas/blob/d0a21032be1e92d4fcb92b8a1c16dee514cb61dc/pandas/tseries/index.py#L737

On the other hand, a MultiIndex has only a generic _format_native_types method, which basically does nothing but returning its content https://github.com/pydata/pandas/blob/d0a21032be1e92d4fcb92b8a1c16dee514cb61dc/pandas/core/index.py#L4329

I could try to go through the individual values of the MultiIndex and format all Timestamp that I encounter, but I'm not sure that's such a good idea. Probably finding a way to reuse DatetimeIndex._format_native_types would be better.

Is it possible to transform a MultiIndex into a list of regular indexes (e.g. DatetimeIndex) back and forth?

jreback · 2015-11-08T16:23:50Z

I'll bump the difficulty up 1 :)

here's what you have to do

redefine _format_native_types for a MulitIndex

for each level, call _format_native_types, passing thru the kwargs. these are the new levels,
then reconstruct a MultiIndexIndex and call .values on it.

In [1]: mi = pd.MultiIndex.from_product([range(3),list('abc'),date_range('20130101 09:00:00',periods=3)])
In [4]: mi.levels

Out[4]: FrozenList([[0, 1, 2], [u'a', u'b', u'c'], [2013-01-01 09:00:00, 2013-01-02 09:00:00, 2013-01-03 09:00:00]])

In [5]: mi.levels[0]
Out[5]: Int64Index([0, 1, 2], dtype='int64')

In [6]: mi.levels[1]
Out[6]: Index([u'a', u'b', u'c'], dtype='object')

In [7]: mi.levels[2]
Out[7]: DatetimeIndex(['2013-01-01 09:00:00', '2013-01-02 09:00:00', '2013-01-03 09:00:00'], dtype='datetime64[ns]', freq='D')

In [8]: mi.levels[2].to_native_types()
Out[8]: array(['2013-01-01 09:00:00', '2013-01-02 09:00:00', '2013-01-03 09:00:00'], dtype=object)

In [9]: mi.levels[2].to_native_types(date_format='%Y%m%d')
Out[9]: array(['20130101', '20130102', '20130103'], dtype=object)

something like this:

In [10]: MultiIndex(levels=[ l._format_native_types() for l in mi.levels],labels=mi.labels,sortorder=mi.sortorder,verify_integrity=False)
Out[10]: 
MultiIndex(levels=[[u'0', u'1', u'2'], [u'a', u'b', u'c'], [u'2013-01-01 09:00:00', u'2013-01-02 09:00:00', u'2013-01-03 09:00:00']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2], [0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])

In [11]: MultiIndex(levels=[ l._format_native_types() for l in mi.levels],labels=mi.labels,sortorder=mi.sortorder,verify_integrity=False).values
Out[11]: 
array([('0', 'a', '2013-01-01 09:00:00'),
       ('0', 'a', '2013-01-02 09:00:00'),
       ('0', 'a', '2013-01-03 09:00:00'),
       ('0', 'b', '2013-01-01 09:00:00'),
       ('0', 'b', '2013-01-02 09:00:00'),
       ('0', 'b', '2013-01-03 09:00:00'),
       ('0', 'c', '2013-01-01 09:00:00'),
       ('0', 'c', '2013-01-02 09:00:00'),
       ('0', 'c', '2013-01-03 09:00:00'),
       ('1', 'a', '2013-01-01 09:00:00'),
       ('1', 'a', '2013-01-02 09:00:00'),
       ('1', 'a', '2013-01-03 09:00:00'),
       ('1', 'b', '2013-01-01 09:00:00'),
       ('1', 'b', '2013-01-02 09:00:00'),
       ('1', 'b', '2013-01-03 09:00:00'),
       ('1', 'c', '2013-01-01 09:00:00'),
       ('1', 'c', '2013-01-02 09:00:00'),
       ('1', 'c', '2013-01-03 09:00:00'),
       ('2', 'a', '2013-01-01 09:00:00'),
       ('2', 'a', '2013-01-02 09:00:00'),
       ('2', 'a', '2013-01-03 09:00:00'),
       ('2', 'b', '2013-01-01 09:00:00'),
       ('2', 'b', '2013-01-02 09:00:00'),
       ('2', 'b', '2013-01-03 09:00:00'),
       ('2', 'c', '2013-01-01 09:00:00'),
       ('2', 'c', '2013-01-02 09:00:00'), ('2', 'c', '2013-01-03 09:00:00')], dtype=object)

nbonnotte · 2015-11-08T17:36:49Z

Nice! I didn't know the concept of "level", and this is exactly what i was missing, thx ;)

Here we go

jreback · 2015-11-11T03:20:28Z

closed by #11151

jreback added Bug labels Jul 18, 2014

jreback added this to the 0.15.0 milestone Jul 18, 2014

This was referenced Jul 18, 2014

BUG: multi-index with one level with a tz buggy #7793

Closed

BUG: reset_index with a multi-index PeriodIndex #7746

Closed

jreback added the Output-Formatting label Jul 19, 2014

jreback modified the milestones: 0.15.1, 0.15.0 Sep 10, 2014

jreback mentioned this issue Sep 19, 2014

Bloomberg Hackathon #8323

Closed

jreback added the Good as first PR label Sep 21, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jreback mentioned this issue May 26, 2015

to_csv date_format including hh:mm:ss #10209

Closed

jreback added Difficulty Intermediate and removed Difficulty Novice labels Nov 8, 2015

nbonnotte mentioned this issue Nov 8, 2015

BUG: multi-index to_native_types is not passing thru parameters #11551

Closed

jreback modified the milestones: 0.17.1, Next Major Release Nov 8, 2015

jreback mentioned this issue Nov 8, 2015

to_csv date format flag not behaving as expected with multiindex #6797

Closed

jreback pushed a commit that referenced this issue Nov 11, 2015

BUG: multi-index to_native_types is not passing thru parameters, #7791

9cbe8b9

jreback closed this as completed Nov 11, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MultiIndex DataFrame to_csv() ignores date_format #7791

MultiIndex DataFrame to_csv() ignores date_format #7791

Poquaruse commented Jul 18, 2014

jreback commented Jul 18, 2014

Poquaruse commented Jul 19, 2014

jreback commented Jul 19, 2014

nbonnotte commented Nov 8, 2015

jreback commented Nov 8, 2015

nbonnotte commented Nov 8, 2015

jreback commented Nov 11, 2015

MultiIndex DataFrame to_csv() ignores date_format #7791

MultiIndex DataFrame to_csv() ignores date_format #7791

Comments

Poquaruse commented Jul 18, 2014

INSTALLED VERSIONS

jreback commented Jul 18, 2014

Poquaruse commented Jul 19, 2014

jreback commented Jul 19, 2014

nbonnotte commented Nov 8, 2015

jreback commented Nov 8, 2015

nbonnotte commented Nov 8, 2015

jreback commented Nov 11, 2015