Rethink when HTML repr of DataFrame is displayed #4886

takluyver · 2013-09-19T17:20:01Z

Helping out with a moderately beginner class recently, I noticed several people having problems, because they could easily display a table view of a small DataFrame, but the representation looked completely different when it exceeded a certain size. People thought that they had a different type of object, or that the detailed information was some kind of an error message. There's no obvious way to get the HTML repr for larger DataFrames.

Suggestions:

Increase the size limit for displaying the full HTML repr. When we forced larger DataFrames to display as HTML tables, the IPython notebook easily handled substantially larger tables than the current cutoff.
When the DataFrame is too large to display whole, produce a truncated HTML table rather than switching to a completely different kind of repr.

I'll try to work on this soon if no-one objects or beats me to it.
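
A minimal sketch of what the second suggestion amounts to, assuming a pandas release whose `DataFrame.to_html` accepts `max_rows` and `max_cols` (the frame shape and the limits below are arbitrary illustration values, not the defaults discussed here):

```python
import numpy as np
import pandas as pd

# A frame that is too large to show whole: 100 rows x 30 columns.
df = pd.DataFrame(np.arange(100 * 30).reshape(100, 30))

# Ask for a truncated HTML table instead of a different kind of repr.
html = df.to_html(max_rows=10, max_cols=8)

# The result is still an HTML table, just with "..." marker cells
# standing in for the rows and columns that were left out.
print(html.count("<tr"), "table rows emitted for", len(df), "data rows")
print("contains truncation markers:", "..." in html)
```

The point is only that the output stays a table whatever the size of the frame, so the reader never sees a switch to a different kind of object.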

jtratner · 2013-09-20T00:19:30Z

Instead of specializing just on HTML, why don't we just change the default max row config option if you detect you're in a notebook?

jtratner · 2013-09-20T00:21:43Z

Looks like you can't easily detect whether you're in a notebook. I guess we could just up the max_rows in _repr_html_. The good part is that info() will still produce the other view.
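
For reference, a sketch of how a user can raise the row limit themselves through the pandas options API. The value 200 is an arbitrary example, not a proposed default:

```python
import pandas as pd

# Whatever the row limit currently is in this session.
default_rows = pd.get_option("display.max_rows")

# Temporarily raise the limit; it is restored when the block exits.
with pd.option_context("display.max_rows", 200):
    inside = pd.get_option("display.max_rows")

restored = pd.get_option("display.max_rows")
print(default_rows, inside, restored)

# Or change it for the rest of the session, then undo it again.
pd.set_option("display.max_rows", 200)
pd.reset_option("display.max_rows")
```

`option_context` is the safer of the two for experimenting, since it cannot leak a changed setting into later cells.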

jtratner · 2013-09-20T00:32:19Z

This isn't free though; it gets slow when you get up to 50,000 cells, for example:

from pandas import DataFrame
df = DataFrame([range(1000) for _ in range(50)])  # 50 rows x 1000 columns = 50,000 cells
In [21]: %timeit df.to_string() # method used to print object
1 loops, best of 3: 3.26 s per loop
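
The same measurement can be repeated outside IPython with the standard `timeit` module. This is a sketch; `time_to_string` is a helper name made up for this example, and the absolute numbers will depend on the machine and the pandas version:

```python
import timeit

import numpy as np
import pandas as pd


def time_to_string(n_rows, n_cols, repeat=2):
    """Return the best wall-clock time (seconds) of one df.to_string() call."""
    df = pd.DataFrame(np.arange(n_rows * n_cols).reshape(n_rows, n_cols))
    return min(timeit.repeat(df.to_string, number=1, repeat=repeat))


# Grow the number of cells while keeping 50 rows, as in the example above.
timings = {n_cols: time_to_string(50, n_cols) for n_cols in (10, 100, 1000)}

for n_cols, seconds in timings.items():
    print(f"{50 * n_cols:>6} cells: {seconds:.4f} s")
```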

takluyver · 2013-09-20T03:01:36Z

Yep, by design, the kernel (where code is executed) doesn't know about the frontend.

Looking at the code, it mentions:

# ipnb in html repr mode allows scrolling
# users strongly prefer to h-scroll a wide HTML table in the browser
# then to get a summary view. GH3541, GH3573

So perhaps this is already improved, and I saw it in an older version. Linking those issues: #3541, #3573, and PR #3663 claiming to fix them.

Oh, and there's code which attempts to detect whether it's running in the Qt console or the notebook... sorry, that won't work all the time (the process which started a kernel isn't necessarily the same as the process making this execution request). I'll bring that up to try to work out a better way to handle the difference.

jtratner · 2013-09-20T03:05:24Z

yeah, I had a sense. Does Qt console also use _repr_html_? If so, that's unfortunate.

takluyver · 2013-09-20T03:14:32Z

It does. I'm proposing that we (IPython) define a rich HTML repr and a separate 'poor HTML', suitable for use in the Qt console.

I'd still like to leave this issue open, because it looks like when you hit 60 rows (or whatever max_rows is configured to), it still switches abruptly to the short 'info' view, whereas I think it should show a truncated table.

jtratner · 2013-09-20T03:19:29Z

that'd be helpful :) - but yes, it seems like it would make sense to change html's repr, instead of just defaulting to info()

jtratner · 2013-09-21T20:19:03Z

@takluyver if you can set up how you'd like the repr to look, we can add a config option that can be set either in a .pandasrc or in an ipython startup script/in a notebook.

takluyver · 2013-11-19T01:33:08Z

I've found time to take a look at this. I reused the max_rows and max_columns display settings, propagating the values down to the HTMLFormatter, and truncating rows and columns. My changes are on this branch - they don't yet handle all the odd cases, so I haven't made a pull request yet. Does this look like the right approach? Or would we like to make this more general than the HTML formatting code? I considered slicing the dataframe and then taking the HTML repr of that, but I think getting the ... truncation markers in place would be tricky in that case.
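
To illustrate the idea of truncating inside the formatter, here is a plain-Python sketch that works on a grid of already-formatted cells. It is not the code on the branch; `truncate_axis` and `truncate_grid` are hypothetical names, and the real formatter also has to deal with headers and index levels:

```python
ELLIPSIS = "..."


def truncate_axis(items, limit):
    """Keep the first `limit` items and append a marker if any were dropped."""
    items = list(items)
    if limit is None or len(items) <= limit:
        return items
    return items[:limit] + [ELLIPSIS]


def truncate_grid(grid, max_rows, max_cols):
    """Truncate columns in every row, then truncate the rows themselves."""
    rows = [truncate_axis(row, max_cols) for row in grid]
    if max_rows is not None and len(rows) > max_rows:
        width = len(rows[0]) if rows else 0
        # The marker row needs one cell per remaining column, including
        # the column marker, so the table stays rectangular.
        rows = rows[:max_rows] + [[ELLIPSIS] * width]
    return rows


grid = [[r * 10 + c for c in range(5)] for r in range(4)]
small = truncate_grid(grid, max_rows=2, max_cols=3)
for row in small:
    print(row)
```

Because the markers are added at the cell level, they never have to pass for real data values, which is the part that gets awkward when the frame is sliced first and formatted afterwards.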

Here's the current display when you go beyond 60 rows/20 columns:

And here's the new:

jtratner · 2013-11-19T01:48:40Z

I personally like how your proposed version looks.

takluyver · 2013-11-19T22:24:18Z

Thanks Jeff. I've now covered the cases with MultiIndex-es, added tests, and made PR #5550.

ghost · 2013-11-20T10:00:48Z

I don't object to making this controllable via an option, but I'm -1 on making it the default.
Obviously, changes like this can be traumatic to existing users, but I think the current
behavior actually makes sense from a usability point of view.

The way I see it, the default view of a dataframe is the info view. It always
provides schema information and "query info" such as number of rows.
The reason a small frame is displayed in its entirety when it's "small enough" is that a view of
all the data is a superset of the data in the info view (schema+number of records).
But yes, this can have a jarring "jump-cut" effect on new users.

An alternative solution in 2 parts is:

add a caption to the repr output describing type and display mode: "DataFrame [data view]:",
"DataFrame [info view]:", "Series [data view]:".
That also distinguishes series from single column frames, another newbie gotcha.
~~The PR doesn't address the problem of~~ re: getting a glimpse of larger frames,
that's df.head()'s role, altering it to emit ellipsis for wider frames makes a lot of sense.
(for html repr, and for text repr when expand_frame_repr is off)

I strongly urge conducting a small usability study (have a few users adopt it for a week
and report) before making a potentially disruptive change like this to the UX.
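
A rough sketch of the caption idea from the first part, to make the Series versus single-column frame distinction concrete. `captioned` is a hypothetical helper written for this example; pandas has no such function:

```python
import pandas as pd


def captioned(obj, view="data view"):
    """Hypothetical helper: prefix an object's repr with its type and view mode."""
    return f"{type(obj).__name__} [{view}]:\n{obj!r}"


df = pd.DataFrame({"a": [1, 2, 3]})

# df[["a"]] is a one-column DataFrame, while df["a"] is a Series.
frame_caption = captioned(df[["a"]]).splitlines()[0]
series_caption = captioned(df["a"]).splitlines()[0]

print(frame_caption)
print(series_caption)
```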

jorisvandenbossche · 2013-11-20T12:59:52Z

The Series representation gives the first and last elements. That could maybe also be an interesting approach to something similar for DataFrames, instead of first rows/cols in the proposal.

Example of Series (there is also, apart from the data, some extra information on the total length):

In [27]: s = pd.Series(np.arange(61))

In [28]: s
Out[28]: 
0      0
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
11    11
12    12
13    13
14    14
...
46    46
47    47
48    48
49    49
50    50
51    51
52    52
53    53
54    54
55    55
56    56
57    57
58    58
59    59
60    60
Length: 61, dtype: int32
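
A sketch of how the same first-and-last view could be put together for a DataFrame by hand. `head_and_tail` is a made-up helper for illustration, not a pandas function, and it does not draw the `...` marker row:

```python
import numpy as np
import pandas as pd


def head_and_tail(df, n=5):
    """Return the first and last `n` rows, or the whole frame if it is short."""
    if len(df) <= 2 * n:
        return df
    return pd.concat([df.head(n), df.tail(n)])


df = pd.DataFrame({"a": np.arange(61), "b": np.arange(61) ** 2})
preview = head_and_tail(df, n=3)

print(preview)
print(f"Length: {len(df)}")
```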

takluyver · 2013-11-20T20:24:53Z

Conversely, there's the .info() method if you want to see a summary of the columns. Showing the data with truncation is in line with numpy reprs and Series reprs, as well as regular Python reprs of collections (although they don't truncate).

Not making it the default would defeat the entire point. New users are not going to hunt around in config settings to set this to behave intuitively. I don't even know what configuration file pandas uses. And I don't think another config setting is necessary: if you want to see the info view, use the info() method.
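
For completeness, a small sketch of asking for the info view explicitly. It writes the summary into a string buffer through the `buf` parameter of `DataFrame.info`, so the text can be inspected as well as printed:

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(100 * 3).reshape(100, 3), columns=["a", "b", "c"])

# info() prints to stdout by default; passing a buffer captures the text.
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()

print(summary)
```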

I would love some people to do user testing - any volunteers? However, in uncontrolled user testing of the current behaviour, I have observed the sudden switch to a completely different repr confusing new users and annoying more experienced users.

ghost · 2013-11-20T21:09:26Z

after playing with this some more I think it is an improvement - objections withdrawn.

takluyver · 2013-11-20T22:17:25Z

Cheers @y-p. :-)

ghost · 2013-11-26T11:07:01Z

merged #5550.

takluyver mentioned this issue Nov 19, 2013

HTML (and text) reprs for large dataframes. #5550

Merged

ghost closed this as completed Nov 26, 2013

jseabold mentioned this issue Mar 6, 2014

New DataFrame display information? #6547

Closed

This was referenced Apr 17, 2014

ENH (GH6568) Add option info_verbose #6890

Closed

Introduce 'tidy_repr' for DataFrames #6938

Closed

This issue was closed.
