Make head(), tail() and limit() return the same number of rows for Expr, Series, DataFrame and LazyFrame #13445

Open
Wainberg opened this issue Jan 4, 2024 · 9 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@Wainberg
Contributor

Wainberg commented Jan 4, 2024

Description

Expr.head() and Series.head() return 10 rows by default, while DataFrame.head() and LazyFrame.head() return 5 rows. We should pick one ;)

We should do the same for tail() and limit().

@Wainberg added the enhancement label (New feature or an improvement of an existing feature) on Jan 4, 2024
@Wainberg changed the title from "Make Expr.head(), Series.head(), DataFrame.head() and LazyFrame.head() return the same number of rows by default" to "Make head(), tail() and limit() return the same number of rows for Expr, Series, DataFrame and LazyFrame" on Jan 4, 2024
@mcrumiller
Contributor

I like 5. The point is to see what the table looks like (columns, dtypes, reasonable samples of data), and adding more rows than that might make you scroll up in a smallish terminal.

@stinodego
Member

I don't think we need to pick one at all. One is a single column while the other is a 2D table. 10 rows of a dataframe is a lot of information, 10 elements of a column not so much.

There is also something to be said for having the same value across data types, but it's not so black and white as you make it out to be.

@Wainberg
Contributor Author

Wainberg commented Jan 4, 2024

I would argue for either using 5 for all, or changing the default number of printed rows. Currently you can't see the intermediate rows with Expr.head():

>>> pl.DataFrame(range(100)).select(pl.all().head())
shape: (10, 1)
┌──────────┐
│ column_0 │
│ ---      │
│ i64      │
╞══════════╡
│ 0        │
│ 1        │
│ 2        │
│ 3        │
│ …        │
│ 6        │
│ 7        │
│ 8        │
│ 9        │
└──────────┘

@Wainberg
Contributor Author

Wainberg commented Jan 4, 2024

Would you be open to changing the default number of printed rows from 8 to 10?

@alexander-beedie
Collaborator

alexander-beedie commented Jan 13, 2024

> Would you be open to changing the default number of printed rows from 8 to 10?

Apparently so 😎 #13699

@Wainberg
Contributor Author

Love it!!

Ok so at this point I would definitely advocate for head(), tail() and limit() to return 10 rows instead of 5. I'm sympathetic to Stijn's argument that 10 rows of a DataFrame can be more of an 'information overload' than 5 rows, but if we're now printing 10 rows of a DataFrame by default, may as well do the same for head() etc.

@alexander-beedie
Collaborator

alexander-beedie commented Jan 19, 2024

> if we're now printing 10 rows of a DataFrame by default, may as well do the same for head() etc.

Counter-argument: in those 10 rows we're displaying the equivalent of head(5) and tail(5) once the total number of rows >= 10, so it can also be argued that it is consistent to keep that correspondence (I'm relatively neutral, though lean towards keeping frame head/tail at 5 for that reason as well as @stinodego's 😉).
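The correspondence between a truncated display and head/tail can be sketched in plain Python. This is an illustration only, not Polars's actual formatting code; `truncated_rows` and `max_rows` are hypothetical names.

```python
def truncated_rows(values, max_rows=10):
    """Return the rows a truncated display would show, with "…" marking the gap."""
    if max_rows < 2:
        raise ValueError("max_rows must be at least 2")
    values = list(values)
    if len(values) <= max_rows:
        return values  # everything fits, nothing is elided
    n = max_rows // 2  # rows kept from each end
    return values[:n] + ["…"] + values[-n:]


data = list(range(100))
shown = truncated_rows(data, max_rows=10)
print(shown)

# The visible rows are exactly head(5) followed by tail(5).
assert shown[:5] == data[:5]
assert shown[-5:] == data[-5:]
```

With `max_rows=8` and ten values, the same function keeps four rows from each end, which matches the elided `Expr.head()` output shown earlier in the thread.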

@mcrumiller
Contributor

mcrumiller commented Jan 19, 2024

@alexander-beedie very good point. No matter what n we pick for head and tail, we must show 2n+1 or 2n-1 rows in df.__str__() without an asymmetric head/tail (if we want to show the same information).

@mcrumiller
Contributor

mcrumiller commented Jan 19, 2024

My vote here is:

  • 10 rows for head and tail
  • 10 rows for limit
  • first 5 and last 5 for __str__


4 participants