What should multi-variable grouping look like in the general case? #25

Closed
jhofman opened this issue Apr 12, 2021 · 20 comments
Labels
priority next action visualizing-verbs has to do with how a data analysis verb is represented visually

Comments

@jhofman
Contributor

jhofman commented Apr 12, 2021

We currently have something that looks good for degree and work in the salary example.

  • Should we put limits on the number of grouping variables and number of levels for each grouping variable? Can we handle 3 binary grouping variables, for instance?
  • Do we want the behavior that we currently have (spatial break vs. underlining)?
  • Should the order in which the grouping variables are specified in the code be reflected in the visualization?
@jhofman
Contributor Author

jhofman commented Apr 12, 2021

Maybe we can use ggplot2's faceting + other encodings to prototype this cheaply and figure out the limits of what we would(n't) want to show people.

For instance, would the following mapping work for the grouped grid frame: grouping variable 1 = row, grouping variable 2 = column, grouping variable 3 = color + symbol?
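To make the proposed mapping concrete, here is a minimal sketch in plain Python (not ggplot2) that assigns every combination of levels to a row, a column, and a colour index. The function name `assign_panels` and the level names are hypothetical, chosen only for illustration; it is not how the project implements grouping.

```python
from itertools import product


def assign_panels(row_levels, column_levels, third_levels):
    """Map each combination of levels to a (row, column, colour index) cell.

    Hypothetical helper: variable 1 picks the row, variable 2 the column,
    and variable 3 the colour/symbol used inside that panel.
    """
    layout = {}
    for (r, row), (c, col), (k, third) in product(
        enumerate(row_levels), enumerate(column_levels), enumerate(third_levels)
    ):
        layout[(row, col, third)] = {"row": r, "column": c, "color_index": k}
    return layout


# Three binary grouping variables (made-up level names).
layout = assign_panels(["a1", "a2"], ["b1", "b2"], ["c1", "c2"])
n_panels = len({(cell["row"], cell["column"]) for cell in layout.values()})
print(len(layout), "groups in", n_panels, "panels")  # 8 groups in 4 panels
```

Under this mapping, three binary variables give a 2 x 2 grid with two colours per panel, which suggests the case raised in the first comment is at least representable.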

@jhofman
Contributor Author

jhofman commented Apr 12, 2021

Alternatively, as per @dggoldst's suggestion, when there are more grouping variables than we can handle, maybe animate only a subset of the resulting plot as an exemplar of the overall results (e.g., animate only one panel of a huge faceted plot)?

@jhofman jhofman added priority next action visualizing-verbs has to do with how a data analysis verb is represented visually labels Apr 13, 2021
@sharlagelfand
Collaborator

@jhofman @dggoldst I have a brain dump of notes here, exploring how many groups it's possible to handle in ggplot2 versus how many we might actually be able to handle differentiating and how to do that, the hierarchy of grouping, etc

@jhofman
Contributor Author

jhofman commented Apr 15, 2021

this is great, looking forward to discussing tomorrow.

@dggoldst

dggoldst commented Apr 15, 2021 via email

@jhofman
Contributor Author

jhofman commented Apr 16, 2021

we decided we'd try to adjust the group-by representation to be in "clumps" that mirror facets (first variable is row, second is column, third is possibly "nested" as in this figure).

[Image: IMG_2871]

@sharlagelfand
Collaborator

It would seem that the "faceted and grouped dot plot" infrastructure is pretty underdeveloped in R! Took a bit, but here are a couple of proofs of concept of the grouping, with much better spacing and clarity I think! We can discuss tomorrow :)

  1. Grouping in facets and subplots (like the example Jake posted above)
  2. Grouping in facets and colours within

cc @jhofman @dggoldst

@jhofman
Contributor Author

jhofman commented Apr 20, 2021

Next step is to see how creation and exporting of these new group by plots translate to vegalite specs.

As mentioned in #28, vegalite supports faceting, so hopefully vegawidget does as well.

Let's try two different ways to export?

  1. Just facets: the equivalent of facet_grid(island ~ species) for the penguins data
  2. Patchwork subplots: island1 + island2 + island3

Does the latter have to be an array of vegalite json specs?
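The two export options can be sketched as Vega-Lite specs, written here as Python dicts. This is only an illustration under assumptions: the field names `x` and `y` and the two data rows are made up, and it is not the spec the project exports. On the last question, one possibility is Vega-Lite's `hconcat` operator, which holds the subplots as an array of view specs inside a single top-level spec.

```python
import json

# Hypothetical rows; the real export would carry the penguins data.
rows = [
    {"island": "Biscoe", "species": "Adelie", "x": 1, "y": 1},
    {"island": "Dream", "species": "Chinstrap", "x": 1, "y": 1},
]

point_encoding = {
    "x": {"field": "x", "type": "quantitative"},
    "y": {"field": "y", "type": "quantitative"},
}

# Option 1: just facets, the analogue of facet_grid(island ~ species).
facets = {
    "data": {"values": rows},
    "mark": "point",
    "encoding": {
        **point_encoding,
        "row": {"field": "island", "type": "nominal"},
        "column": {"field": "species", "type": "nominal"},
    },
}

# Option 2: one subplot per island, placed side by side with hconcat.
islands = ["Biscoe", "Dream", "Torgersen"]
subplots = {
    "data": {"values": rows},
    "hconcat": [
        {
            "title": island,
            "transform": [{"filter": f"datum.island === '{island}'"}],
            "mark": "point",
            "encoding": point_encoding,
        }
        for island in islands
    ],
}

print(json.dumps(subplots, indent=2))
```

Both variants stay a single JSON document, so either could in principle be handed to a renderer as one spec.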

@sharlagelfand
Collaborator

Here are the specs for 1 (just facets): https://github.com/jhofman/datamations/tree/groups/sandbox/grouping/facet_color_specs

Working on the subplots specs - looks like repeating views is a good start for this, but I'm not sure whether we can combine repeated views with facets - will continue to dig into it.

@sharlagelfand
Collaborator

Good idea on using the underlying facet data to "fake" the facets, @jhofman! Much easier than actually trying to offset ourselves, I think. There's a rendered example of what grouping looks like with real facets, "fake" facets, and the fake facets translated into vegalite here

I haven't quite figured out the custom axes in vegalite (e.g. having the islands on the x-axis and the species on y), but I'd say the grouping is pretty convincing otherwise! I've made the size of each "fake facet" equal (like it is in ggplot2), but we can definitely remove that if it seems weird

vegalite specs are here cc @giorgi-ghviniashvili

@jhofman
Contributor Author

jhofman commented Apr 21, 2021

wow, that looks great @sharlagelfand, and glad it's easier than the do-it-yourself solution.

how hard do you think it would be to add facet labels in the faked version on rows and columns, to make it easier to see what the groups are?

@sharlagelfand
Collaborator

thanks @jhofman!

Do you mean direct labelling on each facet, like this?

[Image: Datamations-19]

Or just adding to the vegalite version like this (i.e., copying over what's in the ggplot2 faked version)?

[Image: Datamations-20]

@jhofman
Contributor Author

jhofman commented Apr 21, 2021

the second: copying over what's on the ggplot2 version to vegalite.

@sharlagelfand
Collaborator

oooh yes, that's what I meant by "i haven't quite figured out the custom axes in vegalite" - let me dig into it! just wanted to make sure this was a good direction to head first.

for context, what's happening in the ggplot2 case is that e.g. for the x-axis, the values are still e.g. 1, 2, ..., 59 (however many points there are), but the labels are "Biscoe", "Dream", and "Torgersen", strategically placed at the correct breaks (the midpoint of each fake facet). So I'll look into doing the same with vegalite
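The midpoint placement described above can be sketched roughly as follows, in Python rather than R. The helper name `fake_facet_breaks`, the facet width of 5, and the one-unit gap are all assumptions for illustration; equal facet widths follow the earlier comment about making each "fake facet" the same size.

```python
def fake_facet_breaks(labels, facet_width, gap=1):
    """Return {label: midpoint} for equally sized fake facets on one axis.

    Positions start at 1; each facet spans `facet_width` positions and is
    followed by `gap` empty positions before the next facet begins.
    """
    breaks = {}
    start = 1
    for label in labels:
        end = start + facet_width - 1
        breaks[label] = (start + end) / 2  # label sits at the facet midpoint
        start = end + 1 + gap
    return breaks


breaks = fake_facet_breaks(["Biscoe", "Dream", "Torgersen"], facet_width=5)
print(breaks)  # {'Biscoe': 3.0, 'Dream': 9.0, 'Torgersen': 15.0}
```

The underlying axis stays numeric; only the tick positions and their text change.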

@jhofman
Contributor Author

jhofman commented Apr 21, 2021 via email

@sharlagelfand
Collaborator

Figured out the axes in vegalite, rendered here now, and an example:

[Image: visualization]

One thing is that the axes are now occupied... so when we want to render axes of the actual values (i.e., once we move on to the scatterplot / summarised view) we might have to use annotations for those? Or maybe move these facet labels to annotations, if they can exist outside of the actual plotting area.

@jhofman
Contributor Author

jhofman commented Apr 23, 2021

great! related to #32, so ccing @giorgi-ghviniashvili

@giorgi-ghviniashvili
Collaborator

@sharlagelfand could you please point me to the vegalite docs of annotations?

@sharlagelfand
Collaborator

@giorgi-ghviniashvili I haven't seen much in terms of actual documentation, but maybe these examples of layered plots with labels/annotations will be a good place to start?

If you're curious about the custom axis labels, I did something like this - each label is positioned at the midpoint of its "facet"
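The linked example is not reproduced in this thread, so the following is only a guess at one way to get such labels, using the Vega-Lite axis properties `values` (tick positions) and `labelExpr` (label text). The helper `label_axis` and the midpoint numbers are hypothetical.

```python
import json


def label_axis(breaks):
    """Build a Vega-Lite axis definition that shows text labels at given positions.

    `breaks` maps each label to the numeric position of its tick.
    """
    # Chain of conditionals: datum.value === pos ? "label" : ... : ''
    cases = [
        f"datum.value === {pos} ? {json.dumps(label)}"
        for label, pos in breaks.items()
    ]
    return {
        "values": list(breaks.values()),
        "labelExpr": " : ".join(cases) + " : ''",
    }


# Assumed midpoints of three fake facets.
axis = label_axis({"Biscoe": 3, "Dream": 9, "Torgersen": 15})
x_encoding = {"field": "x", "type": "quantitative", "axis": axis}
print(json.dumps(x_encoding, indent=2))
```

The same construction would apply to the y-axis for the second grouping variable.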

@sharlagelfand
Collaborator

Going to close this! We have a general case figured out and #40 covers the idea of IDing customized multi-grouping

4 participants