SNOW-650150: Add some indication to DataFrame.show() that it isn't all the results #447

Open
mbkupfer opened this issue Aug 22, 2022 · 3 comments
Labels
feature New feature or request

Comments

@mbkupfer

What is the current behavior?

There is no indication that show() prints only the top rows (10 by default), which could mislead users who aren't aware of the limit.

>>> df = session.range(1,12).to_df('col1') # create 11 element sequence
>>> df.show()
----------
|"COL1"  |
----------
|1       |
|2       |
|3       |
|4       |
|5       |
|6       |
|7       |
|8       |
|9       |
|10      |
----------
>>> df.show(n=11)
----------
|"COL1"  |
----------
|1       |
|2       |
|3       |
|4       |
|5       |
|6       |
|7       |
|8       |
|9       |
|10      |
|11      |
----------

What is the desired behavior?

Option 1:
Some indication that this is a paged result, such as 1/n or "15 out of # rows" (if that is efficient to compute).

Option 2:
Create a DataFrame.head() function that replicates pandas behavior, making it more obvious that this is a preview.

How would this improve snowflake-snowpark-python?

Users will not accidentally overlook that results are being omitted when result sets are right around the threshold of show().

References, Other Background

@mbkupfer mbkupfer added the feature New feature or request label Aug 22, 2022
@github-actions github-actions bot changed the title Add some indication to DataFrame.show() that it isn't all the results SNOW-650150: Add some indication to DataFrame.show() that it isn't all the results Aug 22, 2022
@sfc-gh-jdu
Collaborator

Hey @mbkupfer, thanks for your suggestion. I think option 1 is better: we will add one line saying how many rows are displayed, but probably will not count the total rows in the dataframe, since that needs another call to count(), which might not be efficient. Option 2 might confuse users because pandas' head returns n rows instead of printing n rows.

@cpcloud

cpcloud commented Mar 15, 2023

We had to solve this in https://github.com/ibis-project/ibis and it's not particularly bad. You don't need two calls: you can execute a limit with N+1 rows, where N is the number of rows to show, and then check whether the number of returned rows is greater than N.
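
For illustration, here is a minimal sketch of the N+1 idea described above. It is not the Ibis or Snowpark implementation. The names `fetch_preview`, `format_preview`, and `fake_query` are hypothetical, and `fake_query` is an in-memory stand-in for a single query that applies a row limit.

```python
from typing import Callable, List, Sequence, Tuple


def fetch_preview(
    fetch_limited: Callable[[int], Sequence], n: int = 10
) -> Tuple[List, bool]:
    """Return up to n rows plus a flag saying whether more rows exist.

    fetch_limited(k) is assumed to run ONE query limited to k rows.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Ask for one extra row; its presence means the result was truncated.
    rows = list(fetch_limited(n + 1))
    has_more = len(rows) > n
    return rows[:n], has_more


def format_preview(rows: List, has_more: bool) -> str:
    """Render the rows followed by a one-line indicator."""
    lines = [str(r) for r in rows]
    if has_more:
        lines.append(f"(showing first {len(rows)} rows; more rows available)")
    else:
        lines.append(f"(showing all {len(rows)} rows)")
    return "\n".join(lines)


# Hypothetical stand-in for a real query: an 11-element sequence, as in the issue.
data = list(range(1, 12))


def fake_query(limit: int) -> List[int]:
    return data[:limit]


rows, has_more = fetch_preview(fake_query, n=10)
print(format_preview(rows, has_more))
```

With a Snowpark DataFrame, the stand-in could presumably be replaced by something like `lambda k: df.limit(k).collect()`, though that wiring is an assumption and untested here. The approach avoids a separate count() call, at the cost of not reporting the total number of rows.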

Ibis has great snowflake support, so if anyone is interested in a DataFrame API with a nice indicator of "more rows" check it out!

@mbkupfer
Author

@cpcloud I'll take a look at the project. Thanks for sharing.
