
how to limit the display width in polars so that wide dataframes are printed in a legible way? #7665

Open
randomgambit opened this issue Mar 21, 2023 · 9 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@randomgambit

Problem description

Hello,

Coming from this Stack Overflow question: https://stackoverflow.com/questions/75786523/how-to-limit-the-display-width-in-polars-so-that-wide-dataframes-are-printed-in?noredirect=1#comment133694560_75786523

Consider the following example:

import numpy as np
import pandas as pd
import polars as pl

pd.set_option('display.width', 50)

pl.DataFrame(data=np.random.randint(0, 20, size=(10, 42)),
             schema=list('abcdefghijklmnopqrstuvwxyz123456789ABCDEFG')).to_pandas()

[Screenshot: the pandas output, wrapped into column chunks]

You can see how nicely the columns are formatted: the output breaks after column k, so the full dataframe is printed in chunks in the console. This is controlled by the pandas display.width option set above. I was not able to reproduce this behavior in polars with any of its formatting options.

I have tried tweaking all possible settings:

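import polars as pl

pl.Config.set_tbl_cols(10)                        # max number of columns shown
pl.Config.set_fmt_str_lengths(10)                 # max characters shown per string value
pl.Config.set_tbl_width_chars(70)                 # max table width in characters
pl.Config.set_tbl_rows(2)                         # max number of rows shown
pl.Config.set_tbl_formatting('NOTHING')           # no table borders or lines
pl.Config.set_tbl_column_data_type_inline(True)   # dtype shown next to the column name
pl.Config.set_tbl_dataframe_shape_below(True)     # shape printed below the table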

See below:

[Screenshot: the resulting polars output]

Any ideas? Thanks!

@randomgambit randomgambit added the enhancement New feature or an improvement of an existing feature label Mar 21, 2023
@alexander-beedie
Collaborator

I'll experiment with this one later; got a few ideas (unless anyone else wants to jump in...? ;)

@randomgambit
Author

randomgambit commented Apr 19, 2023

Hello @alexander-beedie! Is there something I can try? Happy to help! But just to clarify: this is really about displaying many columns (more than can fit on a single line) at the same time. Pandas seems to do it by breaking the columns into chunks that fit nicely on each line (see my example above). This is extremely useful, as it allows the user to see ALL columns of a dataframe (even hundreds) at once after a simple .head() call. The only effort needed is to scroll up/down in the console. Does that make sense? Thanks!
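To make the requested behaviour concrete, here is a minimal, library-independent sketch of the wrap-to-fit idea, assuming plain right-aligned text columns separated by two spaces and no row index. chunk_columns and render_wrapped are hypothetical helpers written for illustration only; they are not part of pandas or polars, and pandas' own formatter is more involved.

```python
import random


def chunk_columns(widths: list[int], max_width: int, sep: int = 2) -> list[list[int]]:
    """Greedily group column indices so each group fits within max_width characters."""
    chunks: list[list[int]] = []
    current: list[int] = []
    used = 0
    for i, w in enumerate(widths):
        # a column costs its own width, plus the separator if it is not first on the line
        cost = w + sep if current else w
        if current and used + cost > max_width:
            chunks.append(current)  # line is full: start a new chunk
            current, used, cost = [], 0, w
        current.append(i)
        used += cost
    if current:
        chunks.append(current)
    return chunks


def render_wrapped(columns: dict[str, list], max_width: int = 50, sep: int = 2) -> str:
    """Render a dict of equal-length columns as stacked text blocks, pandas-style."""
    names = list(columns)
    # each column is as wide as its longest header or value
    widths = [max([len(n)] + [len(str(v)) for v in columns[n]]) for n in names]
    n_rows = len(columns[names[0]]) if names else 0
    gap = " " * sep
    blocks = []
    for chunk in chunk_columns(widths, max_width, sep):
        lines = [gap.join(names[i].rjust(widths[i]) for i in chunk)]
        for r in range(n_rows):
            lines.append(gap.join(str(columns[names[i]][r]).rjust(widths[i]) for i in chunk))
        blocks.append("\n".join(lines))
    # a blank line between blocks, like the wrapped pandas repr
    return "\n\n".join(blocks)


if __name__ == "__main__":
    data = {name: [random.randint(0, 19) for _ in range(3)]
            for name in "abcdefghijklmnopqrstuvwxyz123456789ABCDEFG"}
    print(render_wrapped(data, max_width=50))
```

The greedy grouping keeps adding columns to the current line until the next one would exceed max_width; a single column wider than max_width simply gets a line of its own and overflows.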

@arturdaraujo

I upvote this feature request. This visual enhancement would be very welcome.

@arturdaraujo

This feature would be nice

@randomgambit
Author

This would be amazing indeed.

@cmdlineluser
Contributor

I'm not sure whether anything changed in pandas 2.x, but I had to set display.max_columns to None to enable this behaviour.

display.expand_frame_repr (default=True) then comes into play:

import numpy as np
import pandas as pd
import polars as pl

pd.set_option('display.width', 100)
pd.set_option('display.max_columns', None)

(pl.DataFrame(data=np.random.randint(0, 20, size=(10, 42)),
              schema=list('abcdefghijklmnopqrstuvwxyz123456789ABCDEFG')).to_pandas())

#     a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   u   v   w   x  \
# 0  13   3  19   3  13   5  12  17   7  12   3   6  16   8  11   2  18  14   2  12  12   1   4  14
# 1   2   9   0  14  17   8  14   9   5  15   2  10  16  12  18  10   7   4  15  17  13   3  18   3
# 2   3   2  17  13  19   6   4  17  19   7   0  13  17   0  13   6   9   9   4   9  17  12   7   3
# 3   9  15  17   9   5  16   2  15  13  11   1  13  18   2  12  18  18   2  18   3  11  12  13   5
# 4  17  19   8   3   5   7   9  11  13   8  12  13   4  11   6  13  16   0   6   7   6  14  17  12
# 5  13   4  12  12   6   7  15  19   4  15  13   8   2   7   5   2   3   3   3   6   6   7   5  10
# 6  12   1   4  18  16   0   6  16   7  10   2  16   5   1  15  10  10  12  16   7  10  13   9  19
# 7   6  12   5   0  13   8  14   8   6  11   4  10   2   0   3  18  12   5  16  18  18   1  14   7
# 8   8  15   9   4  14  13  19  14  11   0  19   9  19   4   7  10   9  19  13  18  18  11  15   7
# 9   5   9   5   2   0  16  14  19   8  15   8   7   8   3  16   7  15   0   3   2   0   9   3   4
# 
#     y   z   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F   G
# 0   7  11  12  18  12   3   8  15  19   8   6  11   1   5  18  17   6  11
# 1  12   5  14  11   6   3   0   9  19   4  18  19   0  14   8  19   9  13
# 2  19  13  14  12  19   6   5  14  12  12   2  17  15   8  10  12   0  13
# 3  11  17  19  16   1  10  17  10   2   4  12  19  16   7   9   6   7   9
# 4  14   4  15  16   8  16   9  14   2   6   9   9   2   3   8  13   9   9
# 5   0  10  15  11  12  10   6   0  18   0   7  14  13   1   3   7   9  14
# 6   3   5  16   8   4  18   2   7  13  11   8  16   8  16   0   4   8  10
# 7  17   6  18  14  16   5   5  15   5   0   9  10  15  18  11  18   4   0
# 8  18  11   0   6   2   1   4   2  15   2   2  14   5   4  13  11  17   6
# 9  19  17   9  13  13  17  16  13   8  19   7  15   8   2   0   8  15   2

Polars uses comfy-table to display frames:

.set_content_arrangement(ContentArrangement::Dynamic);

It looks like comfy-table has three layout modes: Disabled, Dynamic and DynamicFullWidth.

https://github.com/Nukesor/comfy-table/blob/aebf4ef66d16ae356fdbac5d2ff5f2d2025fb48a/src/style/table.rs#L12-L29

From what I can tell, none of these provides a way to do this, so it would need:

  • a custom implementation?
  • adding a new layout to comfy-table (is this even feasible/possible?)
  • ...?

There is a recent issue regarding potential new column handling behaviour: Nukesor/comfy-table#124

But I can't make out whether this would also make it possible to implement a display.expand_frame_repr equivalent.

@mikemdr1

I hope this feature will be included in an upcoming release.

@mikemdr1

I'm working in Spyder, and for me this is a frequent operation (because I want to check intermediate results step by step in my code).
So, since I didn't want to call 'df.to_pandas()' each time, I overrode the default str method of the Polars DataFrame to get the same output as pandas.
[Screenshot from Spyder]
I hope this is useful to someone.

@mikemdr1

I forgot to mention the purpose of 'use_pyarrow_extension_array=True':

When you convert a Polars DataFrame to a pandas DataFrame, Python takes xx milliseconds to process it.
This gets slower for larger datasets, because Python has to convert the Polars data types to NumPy/pandas data types.
However, if you use this option, the conversion is almost instant (thanks to pandas' compatibility with pyarrow data types in pandas>=2.1).

Another alternative is to combine it with the 'head' method if you only want a glimpse of the data.
As there are few records, the time needed to convert the data types is very small.
[Screenshot from Spyder]


5 participants