Continuous / gradient legend #122

Merged
merged 43 commits into from
Mar 3, 2024

Conversation

Owner

@grantmcdermott grantmcdermott commented Feb 12, 2024

Closes #84.
Closes #124.
Closes #130.

Some notes:

  • On the actual implementation side, I ended up going with a bespoke raster-based legend rather than trying to do some y-intersp based trickery. The latter ended up being more trouble than it was worth and this way we also get things like alpha transparency and "top!" and "bottom!" legend placements.
  • I added a threshold for the number of unique groups (default = 5) before the continuous legend kicks in. This can be overridden by the user (tpar("legend.ugc")). It's my way of avoiding something I personally find annoying about ggplot2's default behaviour, which automatically converts any numeric grouping variable into a gradient swatch, even if there are only (say) two categories.
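
As a rough illustration of the threshold rule in the second note, here is a minimal base R sketch. The helper name `legend_kind()` and its `ugc` argument are hypothetical and are not tinyplot's internal code; the strict `>` comparison at the boundary is also an assumption.

```r
# Hypothetical helper (not part of tinyplot): decide which legend style a
# grouping variable would get under a "unique group count" threshold.
legend_kind <- function(by, ugc = 5) {
  # Non-numeric groupings always get a discrete legend in this sketch.
  if (!is.numeric(by)) {
    return("discrete")
  }
  # Assumption: the gradient legend applies only above the threshold.
  if (length(unique(by)) > ugc) "continuous" else "discrete"
}

legend_kind(quakes$depth)  # many distinct depths
legend_kind(mtcars$gear)   # only three distinct values
```

Under these assumptions, a numeric variable with just a handful of values keeps a discrete legend instead of being turned into a gradient swatch.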

Quick examples. [UPDATED based on bug catches and feedback in thread below.]

pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot

par(pch = 19, las = 1)

# default
plt(lat ~ long | depth, quakes, grid = TRUE)

# legend switch
plt(lat ~ long | depth, quakes, grid = TRUE, legend = "bottom!")

# color interpolation
plt(lat ~ long | depth, data = quakes, col = hcl.colors(20, palette = "rocket"))

# transparency
plt(
  lat ~ long | depth, quakes, grid = TRUE,
  palette = hcl.colors(palette = "rocket", alpha = 0.5)
)

# separate col and bg control
plt(
  lat ~ long | depth, quakes,
  grid = TRUE,
  palette = hcl.colors(palette = "rocket", alpha = 0.7),
  pch = 21, col = "white", bg = "by", cex = 2
)

Created on 2024-02-22 with reprex v2.1.0

@vincentarelbundock
Collaborator

Looks amazing!

I was only able to try the examples above (crazy week), but it looks excellent on my setup.

Unrelated, but could tpar() pass extra arguments to par() via ... so we can always call the same function regardless and don't have to mix and match?

@grantmcdermott
Owner Author

Unrelated, but could tpar() pass extra arguments to par() via ... so we can always call the same function regardless and don't have to mix and match?

Ah thanks for the reminder. That's definitely a goal. Can you please file an issue so I remember to implement?

@zeileis
Collaborator

zeileis commented Feb 13, 2024

This looks very cool. I'll play around some more with it on Friday.

@grantmcdermott
Owner Author

grantmcdermott commented Feb 16, 2024

Okay, this is ready to go for full review from my side. I updated the tests and documentation, and also fixed a few corner cases. One more example (gradient for point interiors, but white borders):

pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot
plt(
  lat ~ long | depth, quakes,
  grid = TRUE,
  palette = hcl.colors(palette = "rocket", alpha = 0.7),
  pch = 21, col = "white", bg = "by"
)

Created on 2024-02-15 with reprex v2.1.0

@grantmcdermott grantmcdermott changed the title from "[WIP] Continuous / gradient legend" to "Continuous / gradient legend" Feb 16, 2024
Contributor

@etiennebacher etiennebacher left a comment


Hi @grantmcdermott, I was intrigued by the hash set method for counting unique values so I made a small benchmark (+ fixed a couple of typos), I hope it's useful
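
The benchmark itself is not reproduced in this thread. For readers who want to try something similar, the following is a minimal sketch that compares base R's `length(unique(x))` with a hand-rolled hash set built on an environment. The helper `count_unique_hash()` is hypothetical and may differ from the method in the PR; it assumes `x` has no missing values or empty strings, and that distinct values stay distinct after `as.character()`.

```r
# Hypothetical helper: count distinct values using an environment as a hash set.
# Assumes no NA or "" entries, since those are not valid environment keys here.
count_unique_hash <- function(x) {
  seen <- new.env(hash = TRUE, parent = emptyenv())
  for (key in as.character(x)) {
    seen[[key]] <- TRUE  # re-inserting an existing key is a no-op for the count
  }
  length(ls(seen, all.names = TRUE))
}

x <- quakes$depth

# Both approaches should agree on the number of distinct values.
n_base <- length(unique(x))
n_hash <- count_unique_hash(x)

# Crude timing comparison; results vary by machine and input size.
t_base <- system.time(for (i in 1:200) length(unique(x)))
t_hash <- system.time(for (i in 1:200) count_unique_hash(x))
```

For anything more rigorous than `system.time()`, a dedicated benchmarking package would give more reliable numbers.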

Review threads: NEWS.md (outdated, resolved), R/utils.R (resolved)
@zeileis
Collaborator

zeileis commented Feb 16, 2024

Grant @grantmcdermott, thanks again for implementing this really nice feature. The examples you posted all look great. I started playing around with these examples (without looking at the internals, yet). I noticed a bug in handling the bg argument for the fill color of pch in 21:25 and I think the handling of by variables with few unique values is too ad hoc. I'll post these below and continue to play around some more...

@zeileis
Collaborator

zeileis commented Feb 16, 2024

The bug: The bg argument for setting the fill color is only handled correctly if bg = "by", e.g.,

tinyplot(lat ~ long | depth, data = quakes, pch = 21, bg = "by") ## ok

However, when set to a scalar color, it appears to be only used for the first level(s):

tinyplot(lat ~ long | depth, data = quakes, pch = 21, bg = "red") ## only the lowest level(s)

For discrete variables this works:

tinyplot(mpg ~ wt | gear, data = mtcars, pch = 21, bg = "red") ## ok

However, even for discrete variables it is not possible to set a vector of background colors (e.g., semi-transparent versions of the border color):

tinyplot(mpg ~ wt | gear, data = mtcars, pch = 21, col = p, bg = adjustcolor(p, 0.3)) ## Error in !is.null(bg) && bg == "by"

Additional idea: Should we support some sort of shortcut for the latter application? Rather than just having the same color via bg = "by" we could allow bg = 0.3 as a shortcut for adjustcolor(..., 0.3) applied to the by color.
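
To make the proposed shortcut concrete, here is a minimal base R sketch of how a `bg` value could be resolved against the per-group colors. The helper `resolve_bg()` is hypothetical and is not tinyplot's implementation. It uses `identical()` rather than `bg == "by"`, which avoids the length error reported above when `bg` is a vector.

```r
# Hypothetical helper (not tinyplot's code): turn a user-supplied `bg` into
# one fill color per group, given the per-group border colors `by_col`.
resolve_bg <- function(bg, by_col) {
  if (is.null(bg)) {
    return(NULL)
  }
  # identical() is safe for vectors, unlike `bg == "by"` inside `&&`.
  if (identical(bg, "by")) {
    return(by_col)
  }
  # Proposed shortcut: a single number in [0, 1] is read as an alpha factor.
  if (is.numeric(bg) && length(bg) == 1L && bg >= 0 && bg <= 1) {
    return(adjustcolor(by_col, alpha.f = bg))
  }
  # Otherwise treat `bg` as colors and recycle them across the groups.
  rep_len(bg, length(by_col))
}

cols <- hcl.colors(3)
resolve_bg("by", cols)                    # same as the border colors
resolve_bg(0.3, cols)                     # semi-transparent versions
resolve_bg("red", cols)                   # scalar color recycled to all groups
resolve_bg(adjustcolor(cols, 0.3), cols)  # explicit vector, no error
```

One caveat with this sketch: a numeric `bg` of 0 or 1 is ambiguous, because base graphics also accepts integers as palette indices, so a real implementation would need to settle that case.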

@zeileis
Collaborator

zeileis commented Feb 16, 2024

Palette/legend handling for numeric by variables:

I think that the current implementation is ad hoc and confusing. For example when you happen to draw different subsets of the same type of data:

set.seed(403)
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20)) ## categorical
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20)) ## continuous

Even more seriously, the palette may yield an error in one but not the other case:

set.seed(403)
v <- hcl.colors(100)
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20), col = v) ## error
tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, subset = sample(1:32, 20), col = v) ## ok

And finally, the killer argument in my opinion is that we get a completely different handling when the discrete values are not equidistant. The categorical version just ignores the distances completely:

mtcars$score <- rep(c(0, 1, 99, 100), each = 8) ## discrete score with values 0, 1, 99, and 100
mtcars$score2 <- mtcars$score + sin(1:32)/100 ## same but with some "fuzz"
tinyplot(mpg ~ wt | score, data = mtcars, pch = 19, col = hcl.colors(4)) ## four equidistant categories
tinyplot(mpg ~ wt | score2, data = mtcars, pch = 19) ## continuous with two extreme ends of the scale
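
The contrast in the last example can be sketched without any plotting. The snippet below compares the palette position each score would get under a categorical mapping (rank only) with a distance-preserving mapping (linear rescaling). The helper `palette_index()` is hypothetical and only illustrates the idea; it is not how tinyplot assigns colors.

```r
# Hypothetical helper: map numeric values to positions in a palette of
# length n so that distances between values are preserved.
palette_index <- function(z, n) {
  rng <- range(z, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(1, length(z)))  # constant input: everything gets the first color
  }
  1 + round((z - rng[1]) / (rng[2] - rng[1]) * (n - 1))
}

score <- c(0, 1, 99, 100)

# Categorical treatment: only the rank matters, so the levels look equidistant.
idx_categorical <- as.integer(factor(score))

# Distance-preserving treatment: 0 and 1 sit at one end, 99 and 100 at the other.
idx_continuous <- palette_index(score, n = 100)

pal <- hcl.colors(100)
cols_continuous <- pal[idx_continuous]
```

Here the categorical mapping spreads the four levels evenly over the palette, while the rescaled mapping keeps the two clusters at opposite ends of the scale.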

Conceptual considerations:

So I thought a bit more about what style of palette and corresponding legend I would expect for y ~ x | z with different types of z when both x and y are numeric.

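| Type of z                        | Palette     | Legend     | Status |
|----------------------------------|-------------|------------|--------|
| factor (unordered)               | Qualitative | Discrete   | ✔️     |
| ordered (inheriting from factor) | Sequential  | Discrete   |        |
| numeric (with many levels)       | Sequential  | Continuous | ✔️     |
| numeric (with few levels)        | Sequential  |            |        |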

So at the very least we should change the default palette for ordered factors.

And then the question remains whether we should have a special handling of numeric variables with few distinct levels. I would argue: No for the reasons listed above. It is simple enough to say y ~ x | ordered(z).
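
The dispatch proposed here, with no special case for numeric variables that have few levels, could be sketched as follows. The function `scheme_for()` is hypothetical and only encodes the table above; treating character vectors like unordered factors is an added assumption.

```r
# Hypothetical helper: choose palette type and legend style from the class
# of z alone, with no special handling for "few levels" numerics.
scheme_for <- function(z) {
  if (is.ordered(z)) {
    # Must be checked before is.factor(), since ordered inherits from factor.
    list(palette = "sequential", legend = "discrete")
  } else if (is.factor(z) || is.character(z)) {
    list(palette = "qualitative", legend = "discrete")
  } else if (is.numeric(z)) {
    list(palette = "sequential", legend = "continuous")
  } else {
    stop("unsupported type for z: ", class(z)[1])
  }
}

scheme_for(factor(mtcars$gear))   # qualitative, discrete
scheme_for(ordered(mtcars$gear))  # sequential, discrete
scheme_for(mtcars$gear)           # sequential, continuous
```

Under this scheme, a user who wants a discrete legend for a numeric variable opts in explicitly with `ordered(z)` in the formula.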

If you disagree with me here, then at least this case should be handled like an ordered factor (i.e., with a sequential palette) and not like an unordered factor. Also I would prefer to employ a palette that preserves distances between the values. This would also imply that for numeric z it is always ok to supply a col of length greater than the number of unique levels.

If all of this is incorporated then I think I could live with the discrete palette but would still think it's confusing. Also, there is no simple/intuitive argument to change the behavior because an extra call to tpar() is needed.

@grantmcdermott
Owner Author

Wow, this is great feedback. Thanks @zeileis! I think that I agree with all of your major points. I won't be able to action anything immediately, since I'm heading out for a weekend at the coast. But I'll mull on some of your high level design decision ideas while I'm looking at the waves rolling in :-)

@grantmcdermott
Owner Author

@zeileis @vincentarelbundock Okay... I believe that all of the outstanding issues should now be addressed. I ended up going with the adjusted viridis palette as the default after all, and any other changes should be in line with your suggestions. (See the updated examples right at the top of the thread for some illustrations.)

Please kick the tires once more to check that you're happy. Assuming that everything looks good to you, please feel free to squash and merge.

@zeileis
Collaborator

zeileis commented Feb 25, 2024

Thanks, Grant, for the thorough update! I played around with it and noticed that my recommendation of reversing the scale has led to some inconsistencies. Sorry about that!

First, I noticed that the case of 100 colors is handled differently from other settings:

tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(99, "ag_Sunset")) ## low = light
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(100, "ag_Sunset")) ## low = dark
tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(101, "ag_Sunset")) ## low = light

I guess that this might be due to reversing the order in two different places in the code?

Also, the default ordering is different for ordered factors vs. numeric variables, e.g.,

tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, cex = 1.5) ## low = light, bottom to top
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5) ## low = dark, top to bottom

Maybe we also want to use the same restricted viridis palette for the ordered factors?

P.S.: Given that you were already kind enough to list me with an "aut" role for the package, you never need to thank me in the NEWS. :-)

@grantmcdermott
Owner Author

grantmcdermott commented Feb 27, 2024

Also, the default ordering is different for ordered factors vs. numeric variables, e.g.,

tinyplot(mpg ~ wt | carb, data = mtcars, pch = 19, cex = 1.5) ## low = light, bottom to top
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5) ## low = dark, top to bottom

Maybe we also want to use the same restricted viridis palette for the ordered factors?

Quick clarification/confirmation on this: We can certainly match the restricted colors for ordered factors and ensure that low values = dark. But do we want the legend to be reversed and run from bottom to top too?

I understand that it will be better for internal consistency, but we are deviating from established norms in other packages. Both ggplot2 and lattice run ordered factors from top to bottom, e.g. lattice::xyplot(mpg ~ wt, group = ordered(carb), data = mtcars, auto.key = TRUE)

@zeileis
Collaborator

zeileis commented Feb 27, 2024

Good point. So we can either be consistent within tinyplot for numeric and ordered - or we can be consistent across packages with ggplot2 and lattice. Then let's go with the consistency with ggplot2 and lattice - and let's see how users like it. Given that ordered factors are typically under-used anyway, there are probably not many users affected by this.

@grantmcdermott
Owner Author

@zeileis Thanks for confirming (and for catching these cases). Both should be fixed now:

pkgload::load_all("~/Documents/Projects/tinyplot")
#> ℹ Loading tinyplot

tinyplot(lat ~ long | depth, data = quakes, pch = 19, col = hcl.colors(100, "ag_Sunset"))

tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5)

Created on 2024-02-27 with reprex v2.1.0

@grantmcdermott
Owner Author

Is there anything we still need to do/check before merging?

@zeileis
Collaborator

zeileis commented Mar 1, 2024

Thanks for the reminder, Grant. I'm just traveling home from a conference and didn't have time, yet, to play with the code. I just noticed one last inconsistency that I wanted to mention. But as I explain below, I think that this is the best solution we can do. So I wouldn't change anything.

Compare:

tinyplot(mpg ~ wt | carb,          data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))
tinyplot(mpg ~ wt | ordered(carb), data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))
tinyplot(mpg ~ wt | factor(carb),  data = mtcars, pch = 19, cex = 1.5, col = hcl.colors(6, "ag_Sunset"))

In the numeric case we reverse the order to obtain "dark = high". But in the ordered and factor case we don't reverse the order so that dark = low. While this is somewhat inconsistent, I think this is the best we can do. I just wanted to point out why - so that you can check whether you agree with these considerations or whether you would prefer a different solution.

  • In the unordered factor case, we clearly don't want to re-order. The order of the colors should simply match the order of the categories.
  • In the numeric case, we might disable the reversing and just do it for the default palette. However, that would mean that users would very often have to say something like hcl.colors(..., rev = TRUE) which would be rather inconvenient. So I would also leave this as it is.
  • So then we have to decide whether the ordered case should reorder (like numeric) or not reorder (like factor). I think the latter is probably less confusing.

@grantmcdermott
Owner Author

So then we have to decide whether the ordered case should reorder (like numeric) or not reorder (like factor). I think the latter is probably less confusing.

Hmmm. Yes, I think you're right that this is the "least bad" tradeoff that we can make here. And users can always use rev = TRUE if they want to switch the ordering. Let's leave it as-is for now and we can potentially adjust if we get strong feedback about it.
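
For reference, the `rev = TRUE` escape hatch works at the palette level in base R, independently of tinyplot. A minimal sketch, where the commented-out call assumes tinyplot is installed and behaves as described in this thread:

```r
# rev = TRUE in hcl.colors() returns the same colors in reverse order.
pal     <- hcl.colors(6, "ag_Sunset")
pal_rev <- hcl.colors(6, "ag_Sunset", rev = TRUE)

same_as_rev <- identical(pal_rev, rev(pal))

# Assumed usage, based on the calls shown earlier in this thread:
# tinyplot::tinyplot(mpg ~ wt | ordered(carb), data = mtcars,
#                    pch = 19, cex = 1.5, col = pal_rev)
```

So a user who wants the ordered-factor colors to run the other way can reverse the palette they pass in, without any change to the defaults.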

Thanks for the reminder, Grant. I'm just traveling home from a conference and didn't have time, yet, to play with the code.

Sorry, I don't mean to be a rash, I was mostly checking in, since I realised that my last message was probably a bit ambiguous. I just pushed another small commit now, but that should be it from me unless you pick up any more issues in testing. Catching these edge cases is important, so take your time... although it would be great if we could merge this PR fairly soon, since that will clear the way for the last few things before CRAN submission ;-) I'm hoping to submit before I head out for an extended vacation around spring break.

Let me know!

@zeileis
Collaborator

zeileis commented Mar 2, 2024

  1. I agree, good plan.
  2. No worries!
  3. I think we can squash and merge now. Should I press the button?

@grantmcdermott
Owner Author

  1. If you're happy then I'll go ahead and do it. Thanks again for all the super helpful comments on this one!

@grantmcdermott grantmcdermott merged commit ac242d8 into main Mar 3, 2024
3 checks passed
@grantmcdermott grantmcdermott deleted the continuous-legend branch March 3, 2024 02:22
@zeileis
Collaborator

zeileis commented Mar 3, 2024

Thank you for doing all the actual hard work!!

5 participants