Array-like output #18

Closed
bkamins opened this issue Jul 28, 2019 · 34 comments

@bkamins

bkamins commented Jul 28, 2019

A common case is when a table is too wide and too long to be printed in the REPL.
Currently only the first rows/columns that fit the display size are printed.

It would be great if there were an option for a printout like the one for matrices in Base (i.e. printing the first and last rows/columns and dropping the ones in the middle).
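
A minimal sketch of the row half of that idea, assuming a fixed budget of display lines. The function name `head_tail_rows` and the even split between top and bottom are illustrative assumptions, not what Base or PrettyTables.jl actually do:

```julia
# Hypothetical helper: pick which rows to print when only `max_lines`
# lines are available, keeping rows from both ends of the table.
function head_tail_rows(nrows::Int, max_lines::Int)
    # Everything fits: print all rows, nothing at the bottom.
    nrows <= max_lines && return (collect(1:nrows), Int[])
    keep = max_lines - 1          # one line is reserved for the "⋮" marker
    top = cld(keep, 2)            # the top gets the extra row when `keep` is odd
    bottom = keep - top
    return (collect(1:top), collect(nrows - bottom + 1:nrows))
end

top, bottom = head_tail_rows(100, 11)
println("print rows ", top, ", then ⋮, then rows ", bottom)
```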

Also, when some columns are omitted from the printout, it would be nice to optionally print a note under the table saying what was omitted (probably also truncated somehow, since with 10,000 columns even listing the omitted column names would swamp the REPL).
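
One way such a truncated note could be built is sketched below. The name `omitted_note`, the `max_names` keyword, and the wording of the message are all assumptions made for illustration:

```julia
# Hypothetical helper: build a one-line note listing omitted column names,
# truncated after `max_names` entries so the note itself stays short.
function omitted_note(names::Vector{String}; max_names::Int = 5)
    isempty(names) && return ""
    shown = join(names[1:min(end, max_names)], ", ")
    extra = length(names) - max_names
    tail = extra > 0 ? ", and $extra more" : ""
    return "Omitted columns: " * shown * tail
end

println(omitted_note(["x$i" for i in 1:10_000]))
```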

@ronisbr
Owner

ronisbr commented Jul 28, 2019

Hi @bkamins!

I thought about this when I added the option to crop so that the print fits the display. I got stuck in cases when the combined size of the first and last row is bigger than the display size. This can happen when the table has big strings. In this case, I do not know what to do. Any tips?

@bkamins
Author

bkamins commented Jul 28, 2019

I was thinking about it also. Base is not ideal (check out this x = vcat(fill(["a"^100 "b" "c" "d"^100],100)...)).

What I would do is the following set of rules:

  1. if the first and last columns fit, then print them (then recursively apply the algorithm to the remaining width, stripping the first and the last column from the table)
  2. if they do not both fit and the first column fits, then print it and as many columns as possible from the start (current behavior)
  3. if the first column does not fit but the last one does, then do the same as in 2) but from the right side (fill as much as possible from the back)

(in steps 2 and 3 there is no recursion as they can work only from one side - only step 1 is recursive)

EDIT: of course you need to reserve space in the middle for ... in case it is needed
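
The three rules above can be sketched as follows. This is only an illustration under stated assumptions: `widths` already includes any padding and borders, the `…` marker costs a fixed `ellipsis` width, and the name `select_columns` is hypothetical rather than part of PrettyTables.jl:

```julia
# Hypothetical sketch of the proposed column selection.
# Returns (columns printed on the left, columns printed on the right).
function select_columns(widths::Vector{Int}, avail::Int; ellipsis::Int = 3)
    n = length(widths)
    # Nothing to crop if the whole table fits.
    sum(widths) <= avail && return (collect(1:n), Int[])
    budget = avail - ellipsis     # reserve space for the "…" in the middle
    left, right = Int[], Int[]
    lo, hi = 1, n
    # Rule 1 (the recursive one, written as a loop): take one column from
    # each end for as long as both of them fit.
    while lo < hi && widths[lo] + widths[hi] <= budget
        push!(left, lo)
        pushfirst!(right, hi)
        budget -= widths[lo] + widths[hi]
        lo += 1
        hi -= 1
    end
    if lo <= hi && widths[lo] <= budget
        # Rule 2: fill with as many columns as possible from the start.
        while lo <= hi && widths[lo] <= budget
            push!(left, lo)
            budget -= widths[lo]
            lo += 1
        end
    elseif lo <= hi && widths[hi] <= budget
        # Rule 3: fill with as many columns as possible from the back.
        while hi >= lo && widths[hi] <= budget
            pushfirst!(right, hi)
            budget -= widths[hi]
            hi -= 1
        end
    end
    return (left, right)
end

# Column widths similar to the wide-string example, on a 166-character terminal.
println(select_columns([100, 1, 1, 100], 166))
```

If neither the first nor the last column fits, the sketch returns two empty lists; how to handle that case is left open here, as it is in the rules.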

Also note that currently (as is) you do not handle this case in the best way possible (I am on a terminal with 166 characters width-wise):

pretty_table(DataFrame(vcat(fill(["a"^100 "b" "c" "d"^100],100)...)))

nor this one

pretty_table(DataFrame(vcat(fill(["a"^1000 "b" "c" "d"^100],100)...)))

(both these have to be fixed before the algorithm described above could work)

Is my idea clear?

@bkamins
Author

bkamins commented Jul 28, 2019

If this was done and (sorry for the laundry list of wishes 😄 - I hope it is OK):

  • the HTML and LaTeX displays are added
  • the package load time is reduced (have you thought if this is doable? - now it is 0.4s on my laptop which is a bit large if it were to be a dependency - but maybe it is not possible; I have not checked what is heavy in your dependency list)

then I would love to switch the default display of DataFrames.jl to your package and avoid duplicating effort (the show-related code in DataFrames.jl is actually quite large, and there are still lots of things to be done to make it ideal).

@ronisbr
Owner

ronisbr commented Jul 29, 2019

> Is my idea clear?

Nice! I think I understood what you want. This would be a major change in the way I tried to handle cropping, but it is possible! Let's see what I can do :)

> If this was done and (sorry for the laundry list of wishes 😄 - I hope it is OK):

That's fine! Thanks for all those comments and suggestions :)

> • the HTML and LaTeX displays are added

I always wanted it. This will require some major changes in the way I print things, to add the capability to switch the backend between plain text, HTML, and LaTeX. Maybe I will need some help (I am not an HTML expert), but I think I can do this.

> • the package load time is reduced

In this case, I have absolutely no experience. Can you give me some help on how I can start to debug what is making the load time high?

> then I would love to switch DataFrames.jl default display to your package and avoid duplicating the efforts

Perfect! I will do my best to add those features :)

@bkamins
Author

bkamins commented Jul 29, 2019

> This would be a major change in the way I tried to handle cropping, but it is possible! Let's see what I can do :)

It is only a suggestion, but simply the way Base works here is almost perfect IMO (except for the corner case I have mentioned)

> HTML and LaTeX

Actually this is even simpler than plain text (you simply have to add markup). You can have a look at these test files for examples: https://github.com/JuliaData/DataFrames.jl/blob/master/test/io.jl.
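
To illustrate "simply add markup", here is a minimal sketch that writes a matrix as an HTML table. The function name `html_table` is hypothetical, and the sketch deliberately skips HTML escaping, styling, and cropping, which a real backend would need:

```julia
# Hypothetical sketch: emit a matrix with a header row as a plain HTML table.
# Cell contents are interpolated as-is (no HTML escaping is done here).
function html_table(io::IO, data::AbstractMatrix, header::AbstractVector)
    println(io, "<table>")
    println(io, "  <tr>", join("<th>$h</th>" for h in header), "</tr>")
    for i in axes(data, 1)
        println(io, "  <tr>", join("<td>$(data[i, j])</td>" for j in axes(data, 2)), "</tr>")
    end
    println(io, "</table>")
end

html_table(stdout, [1 2; 3 4], ["a", "b"])
```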

The only tricky part is getting device dimensions, see the note here about environment variables.
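
For reference, a small sketch of how Base reports display dimensions. The assumption here is that the environment variables in question are `LINES` and `COLUMNS`, which is what the zero-argument `displaysize()` falls back to:

```julia
# Size of the terminal attached to stdout as (rows, columns); when stdout is
# not a terminal, Base falls back to the LINES/COLUMNS environment variables.
rows, cols = displaysize(stdout)

# The fallback can be observed by overriding the variables temporarily.
fallback = withenv("LINES" => "30", "COLUMNS" => "100") do
    displaysize()
end

# An IOContext can carry its own size, overriding both of the above.
ctx = IOContext(IOBuffer(), :displaysize => (10, 40))
println((rows, cols), " ", fallback, " ", displaysize(ctx))
```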

> Can you give me some help about how can I start to debug what is making the load time high?

This is pretty tricky, as ideally it should be debugged on Windows, Linux, and macOS, since there are some differences in load times (Windows tends to be the slowest). The general idea is to drop a dependency if it adds significant load time while not being used much in the package.

For example Parameters.jl is an awesome package, but it seems you use it in only three places. I have not benchmarked how much would be saved by removing it, but its "stand alone" load time is 0.127939 seconds, which is 25% of the total load time of your package.
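
A rough way to take such a "stand alone" measurement is sketched below. The helper `load_time` is hypothetical; it assumes the package is installed in the environment the child process starts with, and it measures wall-clock time in a fresh Julia process so that modules already loaded in the current session do not hide the cost:

```julia
# Hypothetical helper: time `using pkg` in a fresh Julia process.
function load_time(pkg::AbstractString)
    # Three top-level statements, so `using` stays at top level.
    code = "t0 = time_ns(); using $pkg; print((time_ns() - t0) / 1e9)"
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    return parse(Float64, read(cmd, String))
end

# `Dates` is a standard library, used here only so the example runs anywhere;
# replace it with "Parameters" or "PrettyTables" where those are installed.
println(load_time("Dates"), " seconds")
```

Numbers from a single run are noisy, so repeating the measurement a few times on each operating system gives a fairer picture.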

But this requirement was rather a wish - it is likely that package load times improve in the future.

Thank you for taking the time to look into it.


You might also consider having a look at the allrows, allcols, and splitcols kwargs of show for a DataFrame. I do not think they are super useful (you can get a similar effect by setting a proper IOContext), but maybe you will like some of the ideas there.
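
As an illustration of the IOContext route, the sketch below uses only Base: the `:limit` and `:displaysize` properties make `show` crop a large matrix to a pretend 10×40 display. It does not use the DataFrames.jl keyword arguments mentioned above:

```julia
# Print a 100×100 matrix as if the display were 10 lines by 40 columns.
io = IOBuffer()
ctx = IOContext(io, :limit => true, :displaysize => (10, 40))
show(ctx, MIME("text/plain"), reshape(1:10_000, 100, 100))
out = String(take!(io))
print(out)   # the middle rows and columns are replaced by ⋮ and …
```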

@ronisbr
Owner

ronisbr commented Sep 14, 2019

Just an update! I need to finish other projects due to some demands at work, but I reviewed the code of PrettyTables and I have at least a plan to implement everything asked here. I hope I will manage to find time to do this soon :) The idea is to make the printing function not print per se, but call printing functions from backends instead. Then we can have HTML, LaTeX, etc.
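
A minimal sketch of that kind of decoupling, using multiple dispatch on a backend type. All names here (`Backend`, `TextBackend`, `LatexBackend`, `render`, `print_table`) are hypothetical and are not the PrettyTables.jl API:

```julia
# Hypothetical sketch: the front-end function does no printing itself and
# only forwards to a method chosen by the backend type.
abstract type Backend end
struct TextBackend <: Backend end
struct LatexBackend <: Backend end

function render(io::IO, ::TextBackend, data::AbstractMatrix)
    for i in axes(data, 1)
        println(io, join(data[i, :], " | "))
    end
end

function render(io::IO, ::LatexBackend, data::AbstractMatrix)
    println(io, "\\begin{tabular}{", "c"^size(data, 2), "}")
    for i in axes(data, 1)
        println(io, join(data[i, :], " & "), " \\\\")
    end
    println(io, "\\end{tabular}")
end

print_table(io::IO, data; backend::Backend = TextBackend()) = render(io, backend, data)

print_table(stdout, [1 2; 3 4])
print_table(stdout, [1 2; 3 4]; backend = LatexBackend())
```

Backend-specific options could then live as fields of each backend struct, so that a LaTeX page size or a text display size never has to appear in the shared front-end signature.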

@bkamins
Author

bkamins commented Sep 14, 2019

Great - this "decoupling" of a backend is exactly what I think is needed, just like for plotting.

The only thing to take into consideration is that different backends might accept different configuration options (e.g. LaTeX has a finite page size, HTML can be scrolled, and in the REPL you can do paging). This might be a bit tricky, but I think it should be doable.

Thank you for your efforts.

@dylanjm

dylanjm commented Oct 12, 2019

Just jumping in on this convo, but it might be cool to do something along the lines of what the R package tibble does to output tables with too many columns.

[Screenshot: tibble output for a table with too many columns]

@bkamins
Author

bkamins commented Oct 12, 2019

We already do it in DataFrames.jl. The only difference is that we do not print names of omitted columns. If you think it would be useful you can consider also opening a separate issue there.

@ronisbr
Owner

ronisbr commented Oct 12, 2019

Hi @dylanjm !

Nice, I just need to think how to adapt this to PrettyTables, since it can print columns with different element types.

ronisbr added a commit that referenced this issue Nov 10, 2019
This is still a work in progress to address the issues #2 and #18.
@Tokazama
Contributor

Just wanted to chime in with my gratitude for this excellent package and interest in support for pretty printing arrays. I would really like to use it for AxisIndices.jl where there can be all sorts of labels for both rows and columns.

@ronisbr
Owner

ronisbr commented Feb 21, 2020

Hi @Tokazama ,

Thank you very much for the kind words. If you need any specific feature, please, let me know!

@Tokazama
Contributor

I've almost got something working. The one problem I can't find a good work around for is the row names. So given the following:

x = reshape(1:4, (2,2))
column_names = [:c1, :c2]
row_names = [:r1, :r2]

I can convert everything to the same type as the row names.

pretty_table(
    hcat(row_names, Symbol.(x)),
    vcat(Symbol(""), column_names)
)

But that's problematic if the array is very large because then the entire array needs to be converted.

Besides this I think I can just do what is done in base where I repeatedly print matrix views of multidimensional arrays.
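
A rough sketch of that slice-by-slice approach, using only Base. The name `print_slices` is hypothetical, and `Base.print_matrix` is a non-exported Base function that stands in here for whatever table printer would be used in practice:

```julia
# Hypothetical sketch: print an N-dimensional array as a sequence of
# matrix views, one per index of the trailing dimensions.
function print_slices(io::IO, A::AbstractArray)
    if ndims(A) <= 2
        show(io, MIME("text/plain"), A)
        println(io)
        return
    end
    for idx in CartesianIndices(axes(A)[3:end])
        println(io, "[:, :, ", join(Tuple(idx), ", "), "] =")
        Base.print_matrix(io, view(A, :, :, idx))   # a view, so nothing is copied
        println(io)
        println(io)
    end
end

print_slices(stdout, reshape(1:8, 2, 2, 2))
```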

@ronisbr
Owner

ronisbr commented Feb 25, 2020

@Tokazama I wouldn't be worried about this performance when printing tables, because if the table is really too big, it can't even fit on screen. However, you gave me an idea: we could add an option to pass a vector with row names, which will be displayed as the first column. Does it work for you?

@Tokazama
Contributor

> because if the table is really too big, it can't even fit on screen

The problem in my example above is that the entire array would need to be converted even if it's not all printed. Of course I could just convert the portion that I know will be printed, but that would mean not taking advantage of a lot of the great work you've already done for detecting screen size and printing the right number of characters.
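
For comparison, converting only the visible portion could look like the sketch below. The name `visible_block` and the caller-supplied `max_rows` are assumptions, and 1-based indexing is assumed; it shows exactly the manual cropping that a built-in row-name option would make unnecessary:

```julia
# Hypothetical sketch: attach row names only to the rows that will be shown,
# so the full array is never converted or copied.
function visible_block(x::AbstractMatrix, row_names::AbstractVector, max_rows::Int)
    n = min(size(x, 1), max_rows)
    # hcat of mixed element types yields a Matrix{Any} of just n rows.
    return hcat(row_names[1:n], x[1:n, :])
end

x = reshape(1:4, (2, 2))
row_names = [:r1, :r2]
println(visible_block(x, row_names, 1))
```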

> we could add an option to pass a vector with row names

That would be perfect!

@ronisbr
Owner

ronisbr commented Feb 25, 2020

> That would be perfect!

Nice! Let's see what I can do.

@Tokazama
Contributor

Tokazama commented Mar 2, 2020

I've got a very primitive solution in place here. You can see the output here. I'm sure you could do a lot better than me with this so feel free to steal any of that code if it's helpful.

BTW. The first definition of pretty_array is mostly stolen from the base function Base.show_nd.

@Tokazama
Contributor

Tokazama commented Mar 3, 2020

I've got an inefficient hack for displaying rows.
[Screenshot: array output with row names]

The code is in the same place as in the previous links.

@ronisbr
Owner

ronisbr commented Mar 3, 2020

Nice! Now I see what you want. I think that having a row_names option is the way to go! Then you will be able to combine it with the header to achieve the desired effect.

@Tokazama
Contributor

Tokazama commented Mar 3, 2020

My approach is definitely inefficient because it takes a bit to print in the REPL, but I'm really impressed with how much this package has allowed me to do. You might be interested in the example I put together that essentially replaces CoefTable in the JuliaStats ecosystem. I have a screenshot here and more stuff in my documentation.

ronisbr added a commit that referenced this issue Mar 8, 2020
In this commit, only the text backend is supported.

This addresses a request mentioned in the issue #18.
@ronisbr
Owner

ronisbr commented Mar 8, 2020

Hi @Tokazama

I have just pushed a commit in which I added the option row_names (together with row_name_alignment and row_name_column_title). This will provide the functionality you requested:

[Screenshot: table printed with the row_names option]

@ronisbr
Owner

ronisbr commented Mar 8, 2020

By the way, I only implemented this for the text backend. If the API I selected is fine, then I will do the same for all the other backends.

@Tokazama
Contributor

Tokazama commented Mar 8, 2020

This looks awesome! Does it make sense to have a column_names keyword?

@ronisbr
Owner

ronisbr commented Mar 9, 2020

A column name will be the same thing as the header, won’t it?

@Tokazama
Contributor

Tokazama commented Mar 9, 2020

I guess the difference in terminology between "names" and "header" is pretty meaningful here so it probably won't be important to disambiguate. I was just thinking out loud and won't have any problems with the implementation you now have.

@Tokazama
Contributor

Tokazama commented Apr 8, 2020

Is master currently ready for a new release or is there more that needs to be done before making this feature available?

@ronisbr
Owner

ronisbr commented Apr 8, 2020

Hi @Tokazama

Sorry for the delay. The next release will be breaking. Thus, I want to use it to implement a few features I have always wanted but that require breaking changes. If everything goes right, then I think I can release master this weekend :)

@Tokazama
Contributor

I have a bare bones implementation of pretty printing for arrays with no dependencies but PrettyTables here. If you're interested I could make a PR to PrettyTables.jl with some minimal implementation but I'm not sure I know the coding style of PrettyTables internally well enough to do so effectively.

@ronisbr
Owner

ronisbr commented Apr 21, 2020

> I have a bare bones implementation of pretty printing for arrays with no dependencies but PrettyTables here. If you're interested I could make a PR to PrettyTables.jl with some minimal implementation but I'm not sure I know the coding style of PrettyTables internally well enough to do so effectively.

Interesting, but can you please explain to me what the gains are of defining a new backend for an array? It seems that the same effect can be achieved just by selecting some options of the available text backend. Am I wrong?

@Tokazama
Contributor

It wouldn't necessarily need all the code for printing a matrix or a vector, but this could add support for arrays with more than 2 dimensions.

@ronisbr
Owner

ronisbr commented Apr 21, 2020

Hum, can you please show an example?

@Tokazama
Contributor

This previously shared image uses similar code to display a multidimensional array using PrettyTables as a backend.

[Screenshot: multidimensional array printed with PrettyTables]

If it's outside the scope of this package, that's fine. I was just trying to help address the issue's title, "Array-like output".

@ronisbr
Owner

ronisbr commented Apr 23, 2020

Hum! I see. I think we can add a back-end to print multi-dimensional arrays. However, I need to think a little more about it, because it deviates from the initial purpose of this package. But your idea is very nice! Can you please open a new issue related to arrays with more than 2 dimensions?

@ronisbr
Owner

ronisbr commented Apr 17, 2021

I will close this since PrettyTables.jl is already the text backend of DataFrames.jl.

@ronisbr ronisbr closed this as completed Apr 17, 2021