Add _repr_html method to make DataFrames nice in IPython notebook. #772

Merged
merged 1 commit into from Feb 13, 2012

4 participants

@ellisonbg

This makes DataFrames show up as nice HTML tables in the IPython notebook. This initial implementation is very plain, but it is better than the plain-text output. We need to think more about how we want to handle these things, but this is a start.
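For readers unfamiliar with the hook: IPython's rich display checks whether an object has a `_repr_html_` method and, if so, renders the returned string as HTML instead of the plain-text `__repr__`. A minimal sketch of the idea, using a hypothetical wrapper class `HTMLFrame` (not part of pandas) around the current pandas `to_html()`, might look like:

```python
import pandas as pd


class HTMLFrame:
    """Hypothetical wrapper illustrating IPython's rich-display hook."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def __repr__(self) -> str:
        # Plain-text form, used by terminals and non-HTML frontends.
        return repr(self.df)

    def _repr_html_(self) -> str:
        # IPython calls this when it can display HTML; delegate to
        # DataFrame.to_html(), which renders a plain <table>.
        return self.df.to_html()


frame = HTMLFrame(pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]}))
html = frame._repr_html_()
```

Evaluating `frame` as the last expression of a notebook cell would then display the table, while a terminal session keeps using the plain-text `__repr__`.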

@takluyver
Python for Data member

Good idea, I'd been thinking about this recently.

If the DataFrame is longer than some limit, the standard __repr__ switches to a brief form that just shows info about each column, rather than showing a massive table. I don't think to_html() currently does that, but something similar should probably be implemented here.

@ellisonbg

I think it would be nice to put the table in a scrollable div of fixed size. I haven't looked at to_html yet, but do you think that approach makes sense?

@takluyver
Python for Data member

That mitigates the problem, but in a quick test I can easily make a DataFrame with 10M rows. Generating HTML for that will probably take some time and use a lot of memory (first on the server, then in the browser). So I think there needs to be some cut-off.

Another option would be a head view: show the first, say, 50 rows, with something at the bottom indicating that there's more.
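A head view along those lines could be sketched as follows. `head_view_html` is a hypothetical helper, the 50-row default just mirrors the number suggested above, and the wording of the note is arbitrary:

```python
import pandas as pd


def head_view_html(df: pd.DataFrame, n: int = 50) -> str:
    """Render only the first ``n`` rows, plus a note when rows were cut.

    Hypothetical helper, not a pandas function.
    """
    html = df.head(n).to_html()
    hidden = len(df) - n
    if hidden > 0:
        # Tell the reader the table is incomplete instead of truncating silently.
        html += f"<p>... {hidden} more rows ({len(df)} rows total)</p>"
    return html


big = pd.DataFrame({"x": range(1_000)})
snippet = head_view_html(big, n=50)
```

Only the head is ever converted to HTML, so the cost of rendering stays bounded no matter how many rows the frame has.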

@lodagro
Python for Data member

to_html() does not have the __repr__() cleverness to switch between a brief form and a full dump. I don't think it is really needed; there is plenty one can do in the HTML world to display tables in whatever form. FYI, I'm using to_html() combined with Mako.

For the IPython notebook, it sounds like a good idea to do something extra on top of to_html() (e.g. a scrollable div) to handle large DataFrames. This can be done without changing to_html() itself.
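Wrapping the output of `to_html()` in a container is enough to get scrolling without touching `to_html()` itself. A sketch, assuming a hypothetical helper `scrollable_html` and an arbitrary 300px height:

```python
import pandas as pd


def scrollable_html(df: pd.DataFrame, max_height: str = "300px") -> str:
    """Wrap DataFrame.to_html() in a fixed-height, scrollable <div>.

    Hypothetical helper: to_html() is left unchanged; only the
    surrounding container adds the scrolling behaviour.
    """
    table = df.to_html()
    # overflow: auto adds scroll bars only when the table exceeds the box.
    style = f"max-height: {max_height}; overflow: auto;"
    return f'<div style="{style}">\n{table}\n</div>'


out = scrollable_html(pd.DataFrame({"a": [1, 2, 3]}))
```

Because scroll bars appear only when needed, small frames render exactly as they would without the wrapper.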

@ellisonbg
lodagro merged commit 4103ce9 into pydata:master on Feb 13, 2012
@lodagro
Python for Data member

I pulled your code and added a scrollable div and a fallback to the info representation for large DataFrames (b570153). I also added a small unit test.
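The commit itself is not shown in this thread, so the following is only a sketch of the same idea, not the code in b570153: return the HTML table when the frame is small and fall back to the `info()` summary in a `<pre>` block otherwise. `html_or_summary` and its thresholds are hypothetical. The summary is HTML-escaped because its first line, `<class 'pandas.core.frame.DataFrame'>`, could otherwise be parsed by the browser as a tag (an assumption on our part about how a header line can vanish in this kind of fallback):

```python
import html
from io import StringIO

import pandas as pd


def html_or_summary(df: pd.DataFrame, max_rows: int = 200, max_cols: int = 20) -> str:
    """Return an HTML table for small frames, an escaped info() summary otherwise.

    Hypothetical helper; the thresholds are illustrative only.
    """
    rows, cols = df.shape
    if rows <= max_rows and cols <= max_cols:
        return df.to_html()
    buf = StringIO()
    df.info(buf=buf)
    # info() starts with "<class 'pandas.core.frame.DataFrame'>"; escaping
    # stops the browser from treating that line as markup.
    return "<pre>" + html.escape(buf.getvalue()) + "</pre>"


summary = html_or_summary(pd.DataFrame({"x": range(1_000)}))
table = html_or_summary(pd.DataFrame({"x": [1, 2]}))
```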

@takluyver
Python for Data member

I see that the fallback for large DataFrames returns text in a <pre> tag. I think there's a neater way to fall back to the text repr, though I forget whether it's returning None or raising an error. Brian will know, I'm sure.

@wesm
Python for Data member

@lodagro the summary repr from _repr_html_ is missing the class header line:

<class 'pandas.core.frame.DataFrame'>
@wesm
Python for Data member

Also, maybe it should use print_config.max_columns instead of 20?

@lodagro
Python for Data member

The reason I did not use the print_config variables is that the table is in a scrollable div. With the default setting of print_config.max_columns, there would never be a horizontal scroll bar, since the repr switches over to the summary view whenever the DataFrame is wider than the terminal.

OK, I will use the same switch-over between the full and summary views for _repr_html_ and __repr__, and have a look at the missing class header.

@wesm
Python for Data member

Ah, that's a good point. Maybe just have no limit then on the number of columns since you can scroll right?

@lodagro
Python for Data member

I did put a limit on rows/columns to avoid sending massive amounts of HTML to the notebook for very large DataFrames. So which do we choose?

Now that I think of it, what about adding some CSS styling? I don't know if the notebook can handle it, but I can give it a try.
Any preferences on style?

@wesm
Python for Data member

No preferences there, but go right ahead. I have never been the best with those kinds of aesthetics.

@ellisonbg

A few points:

  • I think we do want to limit the number of rows that are sent to the browser, probably at around 100-500.
  • I like the idea of having to_html obey the print_config.max_columns option for this.
  • The generated HTML should be the same regardless of whether the full frame or a subset of rows is displayed. We should not fall back to the other representation if it is big. This will require modifying the logic in to_html.
  • For this particular view, I would not do any additional styling; the notebook already has CSS to style these tables, and leaving it alone will keep them consistent with the rest of the notebook.
  • The scrollable div should be pretty small - small enough that it easily fits in browser window at one time. Otherwise we will run into weird double scrolling issues when both the div and window are trying to scroll.
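In current pandas the limits that `print_config` held in 2012 are exposed as options such as `display.max_rows` and `display.max_columns`, and `DataFrame.to_html()` itself accepts `max_rows`, `max_cols` and `show_dimensions` arguments. A sketch of the "same HTML whether or not rows are cut" approach using those arguments (the limits of 10 rows and 4 columns are arbitrary, chosen so both kinds of truncation are visible):

```python
import pandas as pd

wide = pd.DataFrame({f"c{i}": range(100) for i in range(5)})

# to_html() truncates by itself: hidden rows and columns are replaced by
# "..." cells, so the markup is still one ordinary <table> either way.
# show_dimensions=True appends the full shape below the table.
html = wide.to_html(max_rows=10, max_cols=4, show_dimensions=True)
```

Since the result is still a plain `<table>`, the notebook's own CSS applies to it unchanged.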
@wesm
Python for Data member

Any plans to add an option to the notebook to disable the enriched repr? I actually rather like the plain-text output for demos.

@ellisonbg
@lodagro
Python for Data member

I was patiently waiting for feedback, but apparently my last comments did not reach GitHub, so I'm repeating them:

  • If print_config max_rows and max_columns are to be used, I would remove the scrollable div. Also, I think there is no way to avoid double scrolling: even if you make the div small, the browser window can always be smaller, and a very small div looks rather impractical in a full-screen browser window.
  • Keeping the summary fallback (even though it is plain text) makes the HTML representation consistent with the plain-text display. I would keep this.
  • No CSS styling.
  • I had a look at why the class header (the first line) is not printed in the summary view. Apparently '<pre>' + multi_line_string + '</pre>' (currently used) does not behave the same as '<pre>\n' + multi_line_string + '\n</pre>', although from an HTML point of view both should be fine.
  • print_config could have an extra option, notebook_html, to be used in _repr_html_(): when it is False, _repr_html_() falls back to __repr__() wrapped in a <pre> tag. This way the HTML representation can be (more or less) disabled.
  • We probably need to do this for Series as well.
@ellisonbg

I don't really have time to work on this right now but a few comments:

  • I do think a scrollable div should be used, even if max_rows/max_columns are used.
  • I agree about keeping the summary view.
  • I agree that there should be a notebook_html print config option. When it is False, _repr_html_ should just return None and the standard __repr__ will be used.
  • Yep on Series as well.
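The agreed design (return None when the option is off) can be sketched with a hypothetical module-level flag standing in for the proposed `notebook_html` setting; `ToggleFrame` is likewise a hypothetical wrapper. IPython skips an HTML formatter that returns None and shows the plain-text repr instead. Later pandas releases provide a comparable switch as the `display.notebook_repr_html` option, but the flag below is not that option:

```python
from typing import Optional

import pandas as pd

# Hypothetical stand-in for the proposed print_config.notebook_html option.
NOTEBOOK_HTML = {"enabled": True}


class ToggleFrame:
    """Hypothetical wrapper whose HTML repr can be switched off."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def __repr__(self) -> str:
        return repr(self.df)

    def _repr_html_(self) -> Optional[str]:
        # Returning None tells IPython there is no HTML form, so the
        # plain-text __repr__ is displayed instead.
        if not NOTEBOOK_HTML["enabled"]:
            return None
        return self.df.to_html()


tf = ToggleFrame(pd.DataFrame({"a": [1, 2]}))
```

Returning None, rather than wrapping `__repr__()` in a `<pre>` tag, leaves the choice of text rendering to the frontend.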
lodagro added a commit to lodagro/pandas that referenced this pull request on Feb 24, 2012:
"DataFrame_repr_html changes according to #772 discussion." (ca357c7)