Incorrect truncating of Series according to max_rows #7508

Closed
jorisvandenbossche opened this issue Jun 19, 2014 · 11 comments · Fixed by #9182
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string

@jorisvandenbossche
Member

Series is not correctly truncated according to the max_rows (number of rows is halved).

In [146]: s = pd.Series(np.random.randn(20))

In [147]: pd.options.display.max_rows = 10

In [148]: s
Out[148]:
0    1.545202
1   -1.427565
2   -0.961094
...
17    0.125228
18    2.153724
19   -0.384024
Length: 20, dtype: float64

In [149]: s.to_frame()
Out[149]:
           0
0   1.545202
1  -1.427565
2  -0.961094
3   0.387392
4   1.036472
..       ...
15  0.918433
16 -0.874253
17  0.125228
18  2.153724
19 -0.384024

[20 rows x 1 columns]
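
To put a number on the discrepancy, one can count the data rows in the Series repr above. This is a minimal sketch working on the pasted text only; `count_data_rows` is a hypothetical helper for illustration, not part of pandas:

```python
def count_data_rows(repr_text: str) -> int:
    """Count the data rows in a Series text repr (hypothetical helper).

    Skips blank lines, the "..." truncation marker and the
    "Length: ..." footer line.
    """
    count = 0
    for line in repr_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # The truncation marker consists only of dots.
        if set(stripped) <= {"."}:
            continue
        if stripped.startswith("Length:"):
            continue
        count += 1
    return count


# The Series output from Out[148], pasted verbatim.
series_repr = """\
0    1.545202
1   -1.427565
2   -0.961094
...
17    0.125228
18    2.153724
19   -0.384024
Length: 20, dtype: float64"""

print(count_data_rows(series_repr))  # 6
```

So with `max_rows = 10` the Series shows 6 rows (3 head, 3 tail), while the DataFrame output above shows 10 (5 head, 5 tail).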
@jorisvandenbossche
Member Author

And something else, I noticed the explanation of max_rows/columns is not correct (not updated with recent changes):

In [22]: pd.__version__
Out[22]: '0.14.0-155-gc83ee14'

In [23]: pd.describe_option('display.max_columns')
display.max_columns : [default: 20] [currently: 20]: int
        max_rows and max_columns are used in __repr__() methods to decide if
        to_string() or info() is used to render an object to a string.  In case
        python/IPython is running in a terminal this can be set to 0 and pandas
        will correctly auto-detect the width the terminal and swap to a smaller
        format in case all columns would not fit vertically. The IPython notebook,
        IPython qtconsole, or IDLE do not run in a terminal and hence it is not
        possible to do correct auto-detection.
        'None' value means unlimited.

In [24]: pd.describe_option('display.max_rows')
display.max_rows : [default: 60] [currently: 10]: int
        This sets the maximum number of rows pandas should output when printing
        out various output. For example, this value determines whether the repr()
        for a dataframe prints out fully or just a summary repr.
        'None' value means unlimited.

It still says it will decide whether or not to show an info view, while this is now controlled by display.large_repr.

@bjonen
Contributor

bjonen commented Jun 19, 2014

The part about auto detection is also out of date (see #7180 (comment))

@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jun 26, 2014
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.15.0, 0.15.1 Jul 8, 2014
@jreback
Contributor

jreback commented Sep 17, 2014

does #7180 fix this?

@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 17, 2014
@bjonen
Contributor

bjonen commented Sep 17, 2014

No. Series truncation is currently done in series.py instead of format.py.

SeriesFormatter needs to be updated to what is done in 'DataFrameFormatter'.

That includes both 1) fixed truncation according to max_rows and 2) inferred truncation from console size.
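
For point 1, a rough sketch of what truncating strictly to max_rows could look like, operating on already formatted row strings. This is an illustration only, not the actual `DataFrameFormatter` code; `truncate_rows` is a hypothetical name, and point 2 (console size) is not covered:

```python
from typing import List, Optional


def truncate_rows(rows: List[str], max_rows: Optional[int]) -> List[str]:
    """Return rows unchanged if they fit, else head + '...' + tail.

    Hypothetical helper: at most max_rows data rows are kept, split as
    evenly as possible between the head and the tail.
    """
    # None (or 0) is treated as "no limit" in this sketch.
    if not max_rows or len(rows) <= max_rows:
        return list(rows)
    head = max_rows // 2
    tail = max_rows - head
    return rows[:head] + ["..."] + rows[-tail:]


rows = [str(i) for i in range(20)]
for line in truncate_rows(rows, 10):
    print(line)
```

With 20 rows and `max_rows = 10` this keeps 5 head rows and 5 tail rows, which matches the row count of the `s.to_frame()` output at the top of this issue.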

@jreback
Contributor

jreback commented Sep 17, 2014

OK, if you would like to do this, great! Getting it in for 0.15.0 would be great as well (we need it within the next 2 weeks).

@bjonen
Contributor

bjonen commented Sep 17, 2014

Sure, just want to make sure #7691 is taken care of before moving on.

@jreback
Contributor

jreback commented Oct 10, 2014

@jorisvandenbossche let's put these on tap for 0.15.1. (and #8532)

@jorisvandenbossche
Member Author

Additional issue I just noticed: setting max_rows to larger values also has no effect on Series output:

In [52]: s = pd.Series(range(200))

In [53]: s
Out[53]:
0      0
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
11    11
12    12
13    13
14    14
...
185    185
186    186
187    187
188    188
189    189
190    190
191    191
192    192
193    193
194    194
195    195
196    196
197    197
198    198
199    199
Length: 200, dtype: int64

In [54]: pd.options.display.max_rows = 10

In [55]: s
Out[55]:
0    0
1    1
2    2
...
197    197
198    198
199    199
Length: 200, dtype: int64

In [56]: pd.options.display.max_rows = 100

In [57]: s
Out[57]:
0      0
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
11    11
12    12
13    13
14    14
...
185    185
186    186
187    187
188    188
189    189
190    190
191    191
192    192
193    193
194    194
195    195
196    196
197    197
198    198
199    199
Length: 200, dtype: int64

@bjonen
Contributor

bjonen commented Oct 10, 2014

This is what is done right now:

if max_rows and len(self.index) > max_rows:
    result = self._tidy_repr(min(30, max_rows - 4))
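
The arithmetic of that snippet can be checked in isolation. A minimal sketch, assuming that `_tidy_repr(n)` prints `n` rows in total, split between head and tail (an assumption inferred from the outputs in this thread, not checked against the source); `rows_shown` is a hypothetical helper:

```python
def rows_shown(n_rows: int, max_rows: int) -> int:
    """Rows displayed for a Series of n_rows under the quoted logic.

    Assumes the value passed to _tidy_repr is the total number of
    rows printed (head plus tail).
    """
    if max_rows and n_rows > max_rows:
        # Formula from the snippet above: capped at 30 rows.
        return min(30, max_rows - 4)
    return n_rows


# 60 is the default reported by describe_option earlier in the thread.
for max_rows in (10, 60, 100):
    print(max_rows, rows_shown(200, max_rows))
# 10 6
# 60 30
# 100 30
```

Under that assumption the numbers agree with the outputs above: 6 rows for `max_rows = 10`, and 30 rows for both 60 and 100, so raising max_rows above 34 changes nothing once truncation kicks in.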

@jorisvandenbossche
Member Author

What is the logic behind that?

@jreback
Contributor

jreback commented Oct 10, 2014

@jorisvandenbossche I think that is original code from who knows when. I think we should do what @bjonen did for frames; that makes a lot of sense (and possibly, though it may be difficult, share the code a bit).

@jreback jreback modified the milestones: 0.16.0, 0.15.2 Nov 24, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015