Set pd.options.display.max_columns=0 by default #17023

Merged
jorisvandenbossche merged 21 commits into pandas-dev:master from cbrnr:nicer_display_defaults on Mar 28, 2018

7 participants
@cbrnr
Contributor

cbrnr commented Jul 19, 2017

Update: Remove everything related to max_rows and only deal with max_columns in this PR.

Changed max_columns to 0 (automatically adapt the number of displayed columns to the actual terminal width) when run in a terminal and max_rows to 20 (because I'd like to see the "whole" data frame at a glance like in R's tibble).

  • closes #16579
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@TomAugspurger

Contributor

TomAugspurger commented Jul 19, 2017

Could you provide some before / after screenshots? This will need some feedback from the wider community, since the visual display of DataFrames is an API grey-zone; prepare for bike-shedding 😄

We still have some situations where we can't detect the terminal width reliably. We need to make sure the output is handled as well as possible there.

I'm +1 for reducing the number of rows displayed. Typically I use 10 rows.
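
For reference, a minimal sketch of defensive width detection using only the standard library. The helper name `terminal_columns` and the 80-character fallback are assumptions for illustration, not what pandas itself does:

```python
import shutil


def terminal_columns(fallback_columns=80):
    """Return the detected terminal width, or the fallback if detection fails."""
    # shutil.get_terminal_size consults the COLUMNS environment variable and
    # the attached terminal, and returns the fallback when neither is usable
    # (for example when output is piped or sent to a Jupyter frontend).
    size = shutil.get_terminal_size(fallback=(fallback_columns, 24))
    # Guard against environments that report a width of 0.
    return size.columns or fallback_columns


width = terminal_columns()
print(f"Assuming an output width of {width} characters")
```
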

@cbrnr

Contributor

cbrnr commented Jul 19, 2017

Here's the current output when printing a data frame with shape (5, 10) in a terminal with 100 characters width:

[screenshot: before]

And here is the same data frame after the proposed change:

[screenshot: after]

@chris-b1

Contributor

chris-b1 commented Jul 19, 2017

I'm OK with this. I do think it needs a big note in the whatsnew, with instructions on how to change back (maybe a ref to IPython config too). It also looks like some tests will need to be adjusted.
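
One way such "change back" instructions could look, as a sketch that assumes the previous default of 20 shown in the diff below:

```python
import pandas as pd

# Restore the previous fixed limit of 20 columns for the current session.
pd.set_option("display.max_columns", 20)

current_limit = pd.get_option("display.max_columns")
print(current_limit)
```

`pd.reset_option("display.max_columns")` goes the other way and returns to whatever default the installed pandas version ships with.
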

                    validator=is_instance_factory([type(None), int]))
 cf.register_option('max_categories', 8, pc_max_categories_doc,
                    validator=is_int)
 cf.register_option('max_colwidth', 50, max_colwidth_doc, validator=is_int)
-cf.register_option('max_columns', 20, pc_max_cols_doc,
+cf.register_option('max_columns', 0, pc_max_cols_doc,

@jreback

jreback Jul 20, 2017

Contributor

hmm, this should be None to auto-detect (0 might do the same though)

@cbrnr

cbrnr Jul 20, 2017

Contributor

So should I change 0 to None or leave it as is?

@cbrnr

cbrnr Jul 20, 2017

Contributor

A quick test shows that None means there is no limit (i.e. display all columns). So I guess this should remain 0.

@jreback

jreback Jul 20, 2017

Contributor

hmm, so 0 is NOT the same as None here ? that is very odd. can you show an example

@cbrnr

cbrnr Jul 20, 2017

Contributor

Well, I tried setting this value to None after importing pandas, i.e.

import pandas as pd
pd.options.display.max_columns = None

And this results in all columns printed out (so no ellipsis to mark skipped columns in the output). But I will try setting the value in config_init.py as well.

@cbrnr

cbrnr Jul 20, 2017

Contributor

I can confirm that None and 0 have different meanings. None prints all columns, whereas 0 prints columns that fit within the terminal width.
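
A small sketch of the three settings side by side. The 30-column frame and its column names are made up for illustration, and the output for `0` depends on the width pandas detects, so nothing is asserted about it here:

```python
import pandas as pd

# Hypothetical frame with 30 columns named c0 ... c29
df = pd.DataFrame({f"c{i}": [i, i, i] for i in range(30)})

with pd.option_context("display.max_columns", None):
    # None: no limit, every column is printed (wrapped over several blocks)
    all_columns = repr(df)

with pd.option_context("display.max_columns", 6):
    # Fixed limit: the middle columns are replaced by "..."
    limited = repr(df)

with pd.option_context("display.max_columns", 0):
    # 0: per the comment above, only as many columns as fit the detected width
    fitted = repr(df)

print(limited)
```
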

@jreback

Contributor

jreback commented Jul 20, 2017

yeah changing the column default to auto-detect is fine. I personally use an even smaller default for max_rows, but 20 looks fine. Pls update docs, a before/after screen shot that we can include in the what's new would be good (IOW your above ones). You will have to fix some tests.

@cbrnr

Contributor

cbrnr commented Jul 20, 2017

Do you really want screenshots or could we mimic the old and new behavior with markdown? If you want screenshots let me modify them so that my username doesn't show up. Regarding the tests, I'll see what I can do (I certainly didn't expect to break so many tests just by changing one value 😄).

@jreback

Contributor

jreback commented Jul 20, 2017

@cbrnr

the issue is that we are now auto-detecting and so the actual terminal width matters. yes certainly we can 'set' it so it works and show in mark down. I think screenshots might be more clear here though.

@cbrnr

Contributor

cbrnr commented Jul 20, 2017

I see, I'll provide new screenshots then. Thanks for pointing out the issue, of course this makes a huge difference!

@cbrnr

Contributor

cbrnr commented Jul 20, 2017

Also, the IPython QtConsole doesn't play nicely with pd.options.display.max_columns=0:

[screenshot: screen shot 2017-07-20 at 15 23 22]

@chris-b1

Contributor

chris-b1 commented Jul 20, 2017

@takluyver - assuming this hasn't changed, but do you know offhand if it's still not possible to detect terminal size running in the qtconsole? Found an older SO answer from you, thanks!
https://stackoverflow.com/questions/27813132/determining-terminal-width-in-ipython-qtconsole

@takluyver

Contributor

takluyver commented Jul 20, 2017

No, sorry. It's a conceptual mismatch, not just a technical one. The kernel is producing output for (potentially) several frontends which may be receiving it at the moment, and for applications which may later display saved copies of that output. So questions about the shape of 'the' output area don't really make sense in the Jupyter protocol.

As I see it, the real issue is that the Qt console doesn't understand any structured way of representing a table. We turned off its HTML support because it's just too limited and tends to break richer HTML written for the notebook frontend. I have occasionally advocated for a 'simple HTML' repr option which the Qt console would display, but it's never been high priority.

In the long run, I think our plan is to make an HTML console and use QtWebkit to embed it in Qt applications. Then it should be able to display HTML tables.

@cbrnr

Contributor

cbrnr commented Jul 21, 2017

Thanks @takluyver - so this isn't going to work until HTML tables are rendered (which would be awesome BTW). Is it possible to determine if Pandas is running in a real terminal or not? Could someone point me to the relevant code parts?

@takluyver

Contributor

takluyver commented Jul 25, 2017

It's possible to distinguish terminal IPython from IPython as a kernel for a Jupyter frontend, something like this:

try:
    ip = get_ipython()
except NameError:
    ... # Not IPython
else:
    if hasattr(ip, 'kernel'):
        ... # IPython as a Jupyter kernel
    else:
        ... # IPython terminal interface
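
The same check wrapped into a runnable helper, as a sketch. The function name, the returned labels, and the policy on the last line are hypothetical and only illustrate how the result might feed a default:

```python
def detect_frontend():
    """Classify the running interpreter using the check sketched above."""
    try:
        # get_ipython is injected into builtins by IPython; in a plain
        # Python interpreter the name does not exist.
        ip = get_ipython()  # noqa: F821
    except NameError:
        return "python"
    if hasattr(ip, "kernel"):
        return "jupyter-kernel"
    return "ipython-terminal"


FRONTEND = detect_frontend()

# Hypothetical policy: keep a fixed limit where no terminal width exists.
max_columns_default = 20 if FRONTEND == "jupyter-kernel" else 0
print(FRONTEND, max_columns_default)
```
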
@jreback

Contributor

jreback commented Sep 23, 2017

@cbrnr can you rebase this and compose a note for the what's new?

@cbrnr

Contributor

cbrnr commented Sep 24, 2017

Sure, but many tests need to be fixed and I don't know if I have the time to do that. I guess this change should be added to the API changes section?

@jreback

Contributor

jreback commented Sep 24, 2017

@cbrnr this would need its own sub-section in API

yes would need to fix any tests.

@cbrnr

Contributor

cbrnr commented Oct 18, 2017

@cbrnr

Contributor

cbrnr commented Nov 7, 2017

Many tests rely on calling str on a data frame with the current default max number of columns. I'm not sure this will be easy to fix. This would be easy to fix if Pandas supported a pandasrc config file as proposed in #4907.

@jreback

Contributor

jreback commented Nov 7, 2017

> Many tests rely on calling str on a data frame with the current default max number of columns. I'm not sure this will be easy to fix. This would be easy to fix if Pandas supported a pandasrc config file as proposed in #4907.

nothing to do with that issue
all options already have defaults
for testing you need to setup the specific conditions for tests
generally using pd.option_context
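
A sketch of what pinning the display conditions inside a test could look like. The test name and the 25-column frame are made up; 20 and 60 are the old defaults mentioned later in this thread:

```python
import pandas as pd


def test_repr_with_pinned_display_options():
    # Hypothetical frame that is wider than the pinned column limit
    df = pd.DataFrame({f"c{i}": [i] for i in range(25)})
    before = pd.get_option("display.max_columns")

    # Options set here apply only inside the with-block
    with pd.option_context("display.max_columns", 20, "display.max_rows", 60):
        text = repr(df)
        assert pd.get_option("display.max_columns") == 20

    # 25 columns exceed the limit of 20, so the repr is truncated
    assert "..." in text
    # The previous value is restored on exit, whatever it was
    assert pd.get_option("display.max_columns") == before


test_repr_with_pinned_display_options()
```
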

@cbrnr

Contributor

cbrnr commented Nov 9, 2017

OK, I've fixed almost all tests. Only 2 tests still fail, but I'm not sure if these failures are related to my changes:

  • pandas/tests/tseries/test_timezones.py:1290: AssertionError
  • pandas/tests/scalar/test_timestamp.py:1110: AssertionError

Here's the complete test output:

___________________________________ TestTimestamp.test_timestamp ___________________________________
[gw1] darwin -- Python 3.6.3 /Users/clemens/anaconda/envs/pandas_dev/bin/python
self = <pandas.tests.scalar.test_timestamp.TestTimestamp object at 0x10ac29940>

    def test_timestamp(self):
        # GH#17329
        # tz-naive --> treat it as if it were UTC for purposes of timestamp()
        ts = Timestamp.now()
        uts = ts.replace(tzinfo=utc)
        assert ts.timestamp() == uts.timestamp()
    
        tsc = Timestamp('2014-10-11 11:00:01.12345678', tz='US/Central')
        utsc = tsc.tz_convert('UTC')
        # utsc is a different representation of the same time
        assert tsc.timestamp() == utsc.timestamp()
    
        if PY3:
            # should agree with datetime.timestamp method
            dt = ts.to_pydatetime()
>           assert dt.timestamp() == ts.timestamp()
E           AssertionError: assert 1510231568.085538 == 1510235168.085538
E            +  where 1510231568.085538 = <built-in method timestamp of datetime.datetime object at 0x10c197d50>()
E            +    where <built-in method timestamp of datetime.datetime object at 0x10c197d50> = datetime.datetime(2017, 11, 9, 13, 46, 8, 85538).timestamp
E            +  and   1510235168.085538 = <built-in method timestamp of Timestamp object at 0x10c16fb10>()
E            +    where <built-in method timestamp of Timestamp object at 0x10c16fb10> = Timestamp('2017-11-09 13:46:08.085538').timestamp

pandas/tests/scalar/test_timestamp.py:1110: AssertionError
________________________________ TestTimeZones.test_replace_tzinfo _________________________________
[gw1] darwin -- Python 3.6.3 /Users/clemens/anaconda/envs/pandas_dev/bin/python
self = <pandas.tests.tseries.test_timezones.TestTimeZones object at 0x10e8f2f98>

    def test_replace_tzinfo(self):
        # GH 15683
        dt = datetime(2016, 3, 27, 1)
        tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo
    
        result_dt = dt.replace(tzinfo=tzinfo)
        result_pd = Timestamp(dt).replace(tzinfo=tzinfo)
    
        if hasattr(result_dt, 'timestamp'):  # New method in Py 3.3
            assert result_dt.timestamp() == result_pd.timestamp()
        assert result_dt == result_pd
        assert result_dt == result_pd.to_pydatetime()
    
        result_dt = dt.replace(tzinfo=tzinfo).replace(tzinfo=None)
        result_pd = Timestamp(dt).replace(tzinfo=tzinfo).replace(tzinfo=None)
    
        if hasattr(result_dt, 'timestamp'):  # New method in Py 3.3
>           assert result_dt.timestamp() == result_pd.timestamp()
E           AssertionError: assert 1459036800.0 == 1459040400.0
E            +  where 1459036800.0 = <built-in method timestamp of datetime.datetime object at 0x10e8effd0>()
E            +    where <built-in method timestamp of datetime.datetime object at 0x10e8effd0> = datetime.datetime(2016, 3, 27, 1, 0).timestamp
E            +  and   1459040400.0 = <built-in method timestamp of Timestamp object at 0x10e8fcf48>()
E            +    where <built-in method timestamp of Timestamp object at 0x10e8fcf48> = Timestamp('2016-03-27 01:00:00').timestamp

pandas/tests/tseries/test_timezones.py:1290: AssertionError

Any ideas?

@jreback

Contributor

jreback commented Nov 9, 2017

@cbrnr ignore those, see #18037

Python's datetime.timestamp() uses the local timezone to convert naive datetimes; the tests need to be put into a consistent tz so they work for everyone.
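
A standard-library sketch of that behaviour, reusing the date from the failing test_replace_tzinfo above. On a machine set to CET the naive value comes out 3600 seconds smaller than the UTC one, which matches the two numbers in that assertion error:

```python
from datetime import datetime, timezone

naive = datetime(2016, 3, 27, 1)  # no tzinfo attached

# Naive datetimes are interpreted in the machine's local timezone, so this
# value differs between, say, a CET laptop and a UTC CI worker.
local_seconds = naive.timestamp()

# Attaching an explicit timezone removes the dependence on the machine.
utc_seconds = naive.replace(tzinfo=timezone.utc).timestamp()

offset_hours = (utc_seconds - local_seconds) / 3600
print(f"local UTC offset at that moment: {offset_hours:+.1f} h")
```
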

@cbrnr

Contributor

cbrnr commented Nov 9, 2017

OK cool, so let's wait and see whether the CIs come back happy (except for these 2 timezone-related ones). Could you help me with the whats_new entry (because we've agreed that this should be prominently visible)?

Also, I hope that my changes to the tests are OK, I mostly set the values for max_columns and max_rows to their old defaults 20 and 60, respectively.

@jreback

Contributor

jreback commented Nov 9, 2017

whatsnew, make a new subsection. then put a screen shot of the before and one of the after. then it should read as if you are a user wanting to know whether this change will affect you (e.g. if you use ipython, the interpreter, etc).

@jreback

Contributor

jreback commented Nov 9, 2017

for 0.22

@cbrnr

Contributor

cbrnr commented Nov 9, 2017

A new subsection under "New features"? It's not really a new feature, but it doesn't fit into the other categories either.

@jreback

Contributor

jreback commented Nov 9, 2017

under api breaking changes

@codecov

codecov bot commented Nov 9, 2017

Codecov Report

Merging #17023 into master will decrease coverage by 0.04%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master   #17023      +/-   ##
==========================================
- Coverage   91.42%   91.38%   -0.05%     
==========================================
  Files         163      163              
  Lines       50068    50071       +3     
==========================================
- Hits        45776    45755      -21     
- Misses       4292     4316      +24

| Flag | Coverage Δ |
|---|---|
| #multiple | 89.18% <66.66%> (-0.03%) ⬇️ |
| #single | 40.39% <66.66%> (-0.04%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/config_init.py | 96.09% <66.66%> (-2.26%) ⬇️ |
| pandas/io/gbq.py | 25% <0%> (-58.34%) ⬇️ |
| pandas/plotting/_converter.py | 63.38% <0%> (-1.82%) ⬇️ |
| pandas/core/frame.py | 97.8% <0%> (-0.1%) ⬇️ |
| pandas/core/groupby.py | 92.02% <0%> (-0.02%) ⬇️ |
| pandas/io/formats/format.py | 96.01% <0%> (ø) ⬆️ |
| pandas/core/generic.py | 95.72% <0%> (ø) ⬆️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8dac633...85d6225. Read the comment docs.

@codecov

codecov bot commented Nov 9, 2017

Codecov Report

Merging #17023 into master will increase coverage by 0.01%.
The diff coverage is 73.33%.

@@            Coverage Diff             @@
##           master   #17023      +/-   ##
==========================================
+ Coverage   91.82%   91.84%   +0.01%     
==========================================
  Files         152      152              
  Lines       49235    49245      +10     
==========================================
+ Hits        45212    45230      +18     
+ Misses       4023     4015       -8

| Flag | Coverage Δ |
|---|---|
| #multiple | 90.23% <73.33%> (+0.01%) ⬆️ |
| #single | 41.89% <66.66%> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/formats/format.py | 98.24% <100%> (ø) ⬆️ |
| pandas/io/formats/terminal.py | 20.98% <66.66%> (+4.54%) ⬆️ |
| pandas/core/config_init.py | 99.24% <80%> (-0.76%) ⬇️ |
| pandas/core/arrays/categorical.py | 96.19% <0%> (-0.02%) ⬇️ |
| pandas/core/indexes/datetimes.py | 95.73% <0%> (-0.01%) ⬇️ |
| pandas/core/indexes/period.py | 92.61% <0%> (ø) ⬆️ |
| pandas/core/strings.py | 98.32% <0%> (ø) ⬆️ |
| pandas/core/frame.py | 97.18% <0%> (ø) ⬆️ |
| pandas/core/dtypes/missing.py | 91.07% <0%> (ø) ⬆️ |
| pandas/core/generic.py | 95.85% <0%> (ø) ⬆️ |

... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c0c277...f795914. Read the comment docs.

@jorisvandenbossche

Member

jorisvandenbossche commented Mar 27, 2018

> I moved the images to _static. Do I need to change anything when I refer to them, or does
>
> .. image:: print_df_old.png
>
> keep working?

I think you need to add _static/ to the path (at least, that's how we do it for all other images -> yes, the path is relative to the source file, or absolute to the main source directory)

@cbrnr

Contributor

cbrnr commented Mar 27, 2018

Regarding the deleted images (that I locally moved to doc/source/_static), this path is in .gitignore, which is why they are gone. How should I proceed?

@jorisvandenbossche

Member

jorisvandenbossche commented Mar 27, 2018

@cbrnr you need to add them by force (git add --force) to overwrite this ignore file (we ignore it because sphinx adds more images there that should be ignored)

@jorisvandenbossche

Member

jorisvandenbossche commented Mar 27, 2018

> we ignore it because sphinx adds more images there that should be ignored

We could also decide to move our actual images somewhere else, to not have this confusion, but that's for another PR.

@cbrnr

Contributor

cbrnr commented Mar 27, 2018

Got it, the images are back.

@cbrnr

Contributor

cbrnr commented Mar 27, 2018

What do you mean by adding _static/ to the path? Do I need to modify the link I use in the .rst file?

@jorisvandenbossche

Member

jorisvandenbossche commented Mar 27, 2018

> What do you mean by adding _static/ to the path? Do I need to modify the link I use in the .rst file?

Yes. You can always check if the images are included in the output with python doc/make.py --single whatsnew

@cbrnr

Contributor

cbrnr commented Mar 27, 2018

Nice, thanks! Images are now correctly embedded.

@cbrnr

Contributor

cbrnr commented Mar 27, 2018

I think I already asked that, but is it possible to see the HTML docs built by a CI service? I know that other projects use CircleCI for this purpose (so that it is not necessary to set up everything locally).

@jorisvandenbossche

Member

jorisvandenbossche commented Mar 27, 2018

> I think I already asked that, but is it possible to see the HTML docs built by a CI service? I know that other projects use CircleCI for this purpose (so that it is not necessary to set up everything locally).

No, it is currently not possible. Open issue about this: #17921

  pd.options.display.max_columns = 20
+ .. _whatsnew_0230.api:

@jorisvandenbossche

jorisvandenbossche Mar 27, 2018

Member

I think this one sneaked in due to merge conflict? (anyhow it can be removed)

@cbrnr

cbrnr Mar 27, 2018

Contributor

I'm sorry, what do you mean?

@jorisvandenbossche

jorisvandenbossche Mar 27, 2018

Member

According to the diff, you added this line. But it should not be added (so I assumed you added it by accident while updating against master with rebasing/merging). So you can just remove this line.

@cbrnr

cbrnr Mar 27, 2018

Contributor

You mean line 690 (.. _whatsnew_0230.api:)?

@jorisvandenbossche

jorisvandenbossche Mar 27, 2018

Member

yes, that's the line on which I commented. There is no header following for which that link would make sense (there is actually already another link label on line 692)

@cbrnr

cbrnr Mar 27, 2018

Contributor

I thought there should be a section header before introducing the subsections. At least that's how it is done with .. _whatsnew_0230.api_breaking: in line 350 (but there is a heading after that). In any case, I'm happy to delete it.

@jorisvandenbossche

jorisvandenbossche Mar 27, 2018

Member

Yes, that is correct. But I don't understand the relation with this line? This link is just floating with no section or subsection header following it. You already have a header with link at line 661 - 664 ?

@cbrnr

cbrnr Mar 27, 2018

Contributor

It's for the section below:

.. _whatsnew_0230.api:

.. _whatsnew_0230.api.datetimelike:

Datetimelike API Changes
^^^^^^^^^^^^^^^^^^^^^^^^

@jorisvandenbossche

jorisvandenbossche Mar 27, 2018

Member

But that header already has the "whatsnew_0230.api.datetimelike" label, it does not need two labels.

@cbrnr

cbrnr Mar 27, 2018

Contributor

OK, done.

@jorisvandenbossche

Apart from my last two comments, looks good!

@pep8speaks

pep8speaks commented Mar 27, 2018

Hello @cbrnr! Thanks for updating the PR.

Line 648:80: E501 line too long (81 > 79 characters)

@@ -625,7 +625,7 @@ def to_string(self):
  max_len += size_tr_col  # Need to make space for largest row
  # plus truncate dot col
  dif = max_len - self.w
- adj_dif = dif
+ adj_dif = dif + 1  # see GH PR #17023

@jorisvandenbossche

jorisvandenbossche Mar 27, 2018

Member

can you put it on the line above?

@cbrnr

cbrnr Mar 27, 2018

Contributor

You mean

dif = max_len - self.w  # see GH PR #17023
adj_dif = dif

?

dif is never used so we might as well skip it completely.

@jorisvandenbossche

jorisvandenbossche Mar 27, 2018

Member

No, I just meant to put the comment on its own line, not on the same line after the code, like

# '+ 1' to avoid too wide repr (GH PR #17023)
adj_dif = dif + 1

@cbrnr

cbrnr Mar 27, 2018

Contributor

I'm sorry, of course, I'll change that.

cbrnr added some commits Mar 27, 2018

@jorisvandenbossche jorisvandenbossche merged commit c9e8f59 into pandas-dev:master Mar 28, 2018

2 of 3 checks passed

continuous-integration/travis-ci/pr: The Travis CI build is in progress
ci/circleci: Your tests passed on CircleCI!
continuous-integration/appveyor/pr: AppVeyor build succeeded
@jorisvandenbossche

Member

jorisvandenbossche commented Mar 28, 2018

@cbrnr Thanks a lot for this (and for your patience getting this merged :))

@cbrnr cbrnr deleted the cbrnr:nicer_display_defaults branch Mar 28, 2018

@cbrnr cbrnr referenced this pull request Mar 28, 2018

Open

Set pd.options.display.max_rows = 20 by default #20514

3 of 4 tasks complete

javadnoorb added a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018

Set pd.options.display.max_columns=0 by default (pandas-dev#17023)
Change `max_columns` to `0` (automatically adapt the number of displayed columns to the actual terminal width)

dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018

Set pd.options.display.max_columns=0 by default (pandas-dev#17023)
Change `max_columns` to `0` (automatically adapt the number of displayed columns to the actual terminal width)

kornilova-l added a commit to kornilova-l/pandas that referenced this pull request Apr 23, 2018

Set pd.options.display.max_columns=0 by default (pandas-dev#17023)
Change `max_columns` to `0` (automatically adapt the number of displayed columns to the actual terminal width)