Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement proposal: reduce code complexity, improve legends, allow non-incremental plots (for grouped bar) #136

Closed
greimel opened this issue Jan 13, 2021 · 16 comments

Comments

@greimel
Copy link
Collaborator

greimel commented Jan 13, 2021

Hi @piever,

When working on #91, #84 I found that the pipeline currently is too complex for me to fully grasp and also too complex to make progress on these two PRs. Since I want to use the Makie ecosystem for all my plots, I wrote a lighter version of this package (it's probably closer to StatsMakie) that fits in one Pluto notebook.

It improves on this package in the following ways

What it doesn't handle

  1. only supports data frames (no slicing context)
  2. doesn't deal with analyses
  3. it doesn't handle labelling of x and y axes and categorical ticks

(While 1. and 2. need some thought, 3. will be really easy to address)

EDIT: the updated code is here: https://github.com/greimel/TabularMakie.jl

Here is the notebook https://gist.github.com/greimel/7bef42c2b41dec22650f8c0262bd8b5f
here is a static preview http://htmlpreview.github.io/?https://gist.githubusercontent.com/greimel/7bef42c2b41dec22650f8c0262bd8b5f/raw/15fa97618e34a991379a47e810f3a9b39366f9b4/illustration.jl.html

and at the bottom there are some plots from the notebook.

Let me know if you like this approach then we can discuss how it can be possible to integrate this either in StatsMakie or AlgebraOfGraphics.

Please also let me know if you immediately see why this approach cannot work with the slicing approach or the composability of your AlgebraicDict.

lplot(Lines, ts_df, :t, :v; color = :g_co, layout_x = :g_la, linestyle = :g_ls, linewidth = 2)

image

lplot(BarPlot, bar_df, :grp_x, :y; dodge = :grp_dodge, stack = 
		:grp_stack, color = :grp_stack, layout_x = :g_la)

image

lplot(Scatter, cs_df, :xxx, :yyy; color = :s_c, marker = :g_m,  markersize = :s_m, layout_y = :g_lx)

image

@piever
Copy link
Collaborator

piever commented Jan 15, 2021

Really nice work! I think it's great to simplify the underlying mechanisms of AlgebraOfGraphics. It also could use some decoupling between syntax, data manipulations, and rendering (IMO, rendering is the one that should be factored out).

I would propose the following way to move forward:

  • your rewrite could live in its own experimental package (let's call it AoGBackend for now)
  • AlgebraOfGraphics could switch to using that as a backend (but users who don't like the syntax can still use AoGBackend directly)

I believe that things like grouping should not really be in AoGBackend, which should receive the data already grouped (that is, as a list of tables, each with their own key), so that other grouping strategies become possible (multidimensional slices for instance).

Analysis are not really a problem, because that's done before plotting by AlgebraOfGraphics, so you shouldn't worry about those.

The other thing is the + to combine different plots. For example, to do something like

geom = visual(Scatter) + linear
data(df) * style(x, y; kwargs...) * geom

using AoGBackend, we would need to have some sort of "incremental mode" for geometries (not within a single geometry). In particular, it needs to be able to plot! the second series on top of the first one, which may not be completely trivial in the presence of a layout. Do you think that's feasible in your implementation?

@greimel
Copy link
Collaborator Author

greimel commented Jan 17, 2021

Thanks for your feedback!

I'll think about the +. I think what it'll boil down to is to add the layout positions to group_dict and figure out a way to merge group_dict and style_dict for multiple plots.

I'll comment here once I've created the package. Are there any other ideas for such a package? If we want to make it usable without AlgebraOfGraphics syntax then we probably don't want to have its abbreviation in the name, do we?

Probably TabularMakie.jl or DataMakie.jl, or so?

Any ideas @SimonDanisch, @jkrumbiegel, @mkborregaard (tagging you as my perceived spokesperson for JuliaPlots)?

@greimel
Copy link
Collaborator Author

greimel commented Jan 20, 2021

@piever
Copy link
Collaborator

piever commented Jan 21, 2021

Cool! My plan is as follow. I will try to work on a simplified AlgebraOfGraphicsLight that only does the conversion from the AoG language to the TracesScalesLabels format, which consists of three things:

  • A list of traces (in "data coordinates")
  • A set of scales (discrete and continuous, I assume a unique scale across all traces for now, already "fitted" with the unique values and ranges)
  • A set of labels

Then there should be somewhere an implementation on plot for the TracesScalesLabels that does something sensible, and an easy way to get the legend from it (the "fitted scales" and labels should be enough to generate it).

I suspect the TabularMakie package could come very handy to render the TracesScalesLabels object and could potentially have its own way of generating that object.

@greimel
Copy link
Collaborator Author

greimel commented Jan 21, 2021

Sounds good!

@SimonDanisch
Copy link
Member

Is it possible to use the Tables.jl interface instead of full blown DataFrames? If yes, I wouldn't oppose to have this directly in AbstractPlotting ;)

@greimel
Copy link
Collaborator Author

greimel commented Jan 21, 2021

It might be possible yes.

I think I only need DataFrames-specific stuff twice (operation on grouped data). I could try to make it work with SplitApplyCombine.jl instead.

What's your objection to DataFrames.jl? It's almost 1.0 and it makes things really convenient.

@mkborregaard
Copy link

The whole purpose of the Tables interface is not having to restrict people to Dataframes, though, right?

@greimel
Copy link
Collaborator Author

greimel commented Jan 21, 2021

Is it about the restriction of inputs (I could just transform it internally) or taking the dependency?

If we use it internally we could use the DataFrames mini language for free (I think). That is we can just allow :var or :var => :new_name or :var => ByRow(log) => :log_var as arguments and DataFrames would handle it for us.

@SimonDanisch
Copy link
Member

It's about the dependency, and making the same things work with any arbitrary other datatype that satisfies the table interface.

Especially since I usually avoid Dataframes in my projects... Not because I dislike it, but just because it's quite a lot to learn and my data wrangling is usually super simple :D
I just dislike learning & loading the Dataframes package, just to do a simple stacked barplot :D
But I can understand, that this may make the implementation harder, especially since it becomes much harder to dispatch on a type. But I'd definitely try to make all recipes inside AbstractPlotting Dataframes free...And if there are strong reasons to use it, it should likely go into a package for now ;)

@mkborregaard
Copy link

Wouldn't transforming internally also lead to copying input data in many cases?

@jkrumbiegel
Copy link
Member

I think this should have its own function and not sit in the same recipe pipeline as everything else. First because table-like is not a type and Simon is right that dispatching would be hard. Second because certain recipes might want to define methods for table-like types and that would conflict. Third because the default plotting commands all return FigureAxisPlot, AxisPlot or Plot and returning a Figure with many plots and legend and colorbar breaks that default interface.

@greimel
Copy link
Collaborator Author

greimel commented Jan 21, 2021

@SimonDanisch I would probably leave it outside AbstractPlotting for now, because developing is simpler using DataFrames. When we've made some progress we can still get rid of the dependency and port it to AbstractPlotting.

@mkborregaard that's probably true.

@jkrumbiegel Concerning return type: I agree. I think there should be two interfaces. One which looks like standard Makie and returns something that looks like standard Makie. And a second that returns legends, colorbars, etc for easy customization. (In fact there are already two different plotting functions in TabularMakie for this reason.)

Concerning input type: What about the StatsMakie approach of having a custom type Data that wraps a table object that we can dispatch on?

Thanks a lot for this discussion.

@SimonDanisch
Copy link
Member

Having a TableLike type in AbstractPlotting, that wraps any datatype that conforms the Table interface in AbstractPlotting would be great, and a good dispatch target for any other library (e.g. TabularMakie could define converts for Dataframes to such a type).

@piever
Copy link
Collaborator

piever commented Jan 21, 2021

It's about the dependency, and making the same things work with any arbitrary other datatype that satisfies the table interface.

I have secretly hidden inside StructArrays enough table manipulations machinery as to not need DataFrames. In particular, grouping can be done using just StructArrays, which is a dependency already. There is a StructArrays.finduniquesorted that iterates unique values and their indices of a StructArray, and it can be used for groupby.

If yes, I wouldn't oppose to have this directly in AbstractPlotting ;)

IMO, there are four distinct things:

  1. A descriptive format (kind of like the TracesScalesLabels discussed above) for these types of plot, where the data is already grouped
  2. A sensible way to render this format (allowing to tweaks the layout and what not)
  3. A way to generate this format from a table using StatsMakie-like or TabularMakie-like syntax
  4. A way to generate this format from a table using AoG syntax

I guess 1 and 2 can happily live in AbstractPlotting. 3. is probably simple enough to implement using the "data wrapper" (I'm happy to help switch from DataFrames.groupby to StructArrays.finduniquesorted if we go in that direction). 4. could probably stay in AlgebraOfGraphics, which would become a purely "syntactical" package (which would help me a lot with maintenance).

@piever
Copy link
Collaborator

piever commented May 9, 2021

Closed by #155

@piever piever closed this as completed May 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants