Affixing colours for R classes in `vis_dat` #26

Open
njtierney opened this Issue May 29, 2016 · 8 comments

@njtierney
Collaborator

njtierney commented May 29, 2016

So vis_dat uses the default ggplot colours, which are great! But, I wonder if perhaps it might be more informative to use specific colours for specific classes. For example, if the colour red was always associated with characters, and blue was integers, etc.

I'm not sure if there's an established palette for this sort of thing, but I guess I could look into using a nice text editor scheme for a starting point.

Thoughts are very welcome!

@njtierney

Collaborator

njtierney commented May 29, 2016

For example:

  • Red = Character
  • Blue = Integer
  • Purple = Double
  • Green = Factor
  • Orange = Logical
  • Yellow = Date
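
A minimal sketch of how a fixed class-to-colour mapping like the one above could be stored and looked up in R. The name `class_colours` is hypothetical and is not part of `vis_dat`. Note that R reports doubles as class `"numeric"` and dates as `"Date"`.

```r
# Hypothetical fixed mapping from R class to colour, following the list above.
class_colours <- c(
  character = "red",
  integer   = "blue",
  numeric   = "purple",  # doubles report class "numeric"
  factor    = "green",
  logical   = "orange",
  Date      = "yellow"
)

# First class of each column (some objects carry more than one class).
col_classes <- vapply(airquality, function(x) class(x)[1], character(1))

# Look up the colour for each column; classes not in the mapping come back as NA.
col_fill <- unname(class_colours[col_classes])
```

A named vector of this shape could also be passed to `ggplot2::scale_fill_manual(values = class_colours)`, provided the fill variable holds these class names.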

@samclifford

samclifford commented May 30, 2016

Make an ontology of data types, map it to a spectrum and line up the colours. Character and factor are similar, so colour them similarly. Dates are written as text but closer to doubles than anything, surely.

@njtierney

Collaborator

njtierney commented May 30, 2016

This is a really cool idea!

@jennybc

Member

jennybc commented May 30, 2016

Grey = missing

That is my most fervently held belief.

Otherwise, I wonder if there is either something you should copy (Trifacta?) or some principles you should obey, relating to common forms of colourblindness, prevalence of different variables types, or what people are trying to distinguish. Re: the last thing, that actually means you would want character and factor to be really different because having a factor that you think is character is a huge source of data analysis headaches.

Whatever you do, it seems like you'd want to make it fairly easy for the user to change this palette or look at "this" vs. everything else.

@njtierney

Collaborator

njtierney commented May 30, 2016

Thanks @jennybc!

  1. Agreed that Grey = Missing!
  2. I agree with you re colourblindness - the default palette should be colourblind friendly.
  3. Re your point on making it easy to look at "this" vs. everything else, do you feel that this sort of ties into the idea of giving vis_dat expectations, as in #15? Or are you more thinking along the lines of this?
     `vis_dat(data, compare = "Factor")`
  4. In terms of looking at similarity of R data types, I could give more colour "distance" to those that are most similar to each other? I wonder if it's worthwhile spinning up a little survey to send out to people regarding R data type similarity to help sorta crowd source the idea. For example, they could be asked to rank types by their similarity, e.g.:

| Rank | Type      |
|------|-----------|
| 1    | Character |
| 2    | Date      |
| 3    | Factor    |
| 4    | Integer   |
| 5    | Logical   |
| 6    | Double    |

And then ask people to describe which two (or more) are often mistaken for another?

Perhaps this data-driven approach is a bit too meta, though.
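
One way the `compare` idea from point 3 could work is to keep the class of interest and lump every other class into a single "other" group before plotting. This is only a sketch: `highlight_class()` is a hypothetical helper, and `compare` is a proposed argument, not an existing part of `vis_dat`.

```r
# Hypothetical helper: keep the class of interest, lump everything else together.
highlight_class <- function(data, compare) {
  # First class of each column.
  classes <- vapply(data, function(x) class(x)[1], character(1))
  # Case-insensitive match, so "Factor" matches R's "factor".
  ifelse(tolower(classes) == tolower(compare), classes, "other")
}

highlight_class(iris, compare = "Factor")
```

The result has one entry per column, so it could stand in for the class labels that drive the fill colour.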

@jennybc

Member

jennybc commented May 30, 2016

> Re your point on making it easy to look at "this" vs. everything else

I have a hard time telling the difference between the issue and the example above. But basically agree someone might want to look at only one issue at a time, i.e. just missing data or data that meet some other criteria.

> In terms of looking at similarity of R data types, I could give more colour "distance" to those that are most similar to each other?

I think your own common sense and thought are enough (vs. survey). My point: the initial proposal has red for character and green for factor, which would be tough on colourblind people trying to find unexpected factors. Another important distinction to help people notice is probably integer vs double.
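
As a rough illustration of colour "distance", the pairwise distance between candidate colours can be computed in CIE Lab space with base R. This is a sketch under assumptions: the colours are the ones from the initial proposal, and plain Lab distance does not model colourblindness, so a pair that scores as far apart here (such as red and green) can still be hard to tell apart for some readers.

```r
# Candidate colours from the initial proposal (illustrative only).
cols <- c(character = "red", factor = "green", integer = "blue", double = "purple")

# Convert to sRGB values in [0, 1], one row per colour, then to CIE Lab.
rgb01 <- t(grDevices::col2rgb(cols)) / 255
lab <- grDevices::convertColor(rgb01, from = "sRGB", to = "Lab")
rownames(lab) <- names(cols)

# Euclidean distance in Lab space between every pair of colours.
colour_dist <- as.matrix(dist(lab))
colour_dist["character", "factor"]
```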

njtierney added a commit that referenced this issue May 31, 2016

added a palette argument to visdat
I’ve created a draft template for the colours, as described in issue #26.

@njtierney

Collaborator

njtierney commented May 31, 2016

Thanks @jennybc !

I'm really keen to implement #15 when I get the chance. For now, here is where I'm at with fixing the colours: commit b24cecc has added a palette argument to vis_dat. It takes three options: "default", "qual", and "cb_safe".

"default" is just as-is:

`vis_dat(airquality, palette = "default")`

[image]

"qual":

`vis_dat(airquality, palette = "qual")`

[image]

This is nice, but not super colourblind friendly.

"cb_safe" provides a better solution:

`vis_dat(airquality, palette = "cb_safe")`

[image]

One issue from here is that I have only provided colours for the 6 classes ("character", "date", "factor", "integer", "logical", "numeric"), and I would like to provide a different set of colours for any extra classes that fall outside of these. Perhaps I can create a palette builder function that takes the classes in the plot; this might be able to link to scale_fill_brewer.
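
A palette builder along those lines might look like the sketch below. Everything here is an assumption for illustration: `build_palette()` and `known_colours` are hypothetical names, and the hex codes are Okabe-Ito colours chosen as a colourblind-friendly stand-in, not the colours used in the commit. Extra classes get evenly spaced hues from `grDevices::hcl()` rather than a brewer scale.

```r
# Illustrative colours only (Okabe-Ito hex codes), not the ones used in the commit.
known_colours <- c(
  character = "#E69F00",
  date      = "#56B4E9",
  factor    = "#009E73",
  integer   = "#F0E442",
  logical   = "#0072B2",
  numeric   = "#D55E00"
)

# Hypothetical palette builder: known classes keep their fixed colour,
# any other class found in the data gets a generated colour.
build_palette <- function(classes, known = known_colours) {
  # Class names are lower-cased so that "Date" matches "date".
  classes <- unique(tolower(classes))
  extra <- setdiff(classes, names(known))
  pal <- known[intersect(classes, names(known))]
  if (length(extra) > 0) {
    # Evenly spaced hues, one per extra class.
    hues <- seq(0, 360, length.out = length(extra) + 1)[seq_along(extra)]
    extra_cols <- grDevices::hcl(h = hues, c = 50, l = 70)
    names(extra_cols) <- extra
    pal <- c(pal, extra_cols)
  }
  pal
}

pal <- build_palette(c("numeric", "factor", "POSIXct"))
```

The returned named vector could be passed to `ggplot2::scale_fill_manual(values = pal, na.value = "grey")`, which would also keep grey for missing values, as long as the class labels in the plot are lower-cased the same way.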

These colours are just working ideas at the moment.

I couldn't find colours from Trifacta; at this stage I am using info from colorbrewer2.org.

@njtierney

Collaborator

njtierney commented Jan 3, 2017

Just adding another palette that might be cool: http://chriskempson.com/projects/base16/

@njtierney njtierney added this to To Do in CRAN V0.6.0 release Jun 4, 2018

@njtierney njtierney added this to the V0.6.0 milestone Jun 6, 2018

@njtierney njtierney removed the V0.6.0 label Jun 6, 2018

@njtierney njtierney modified the milestones: V0.6.0, V0.7.0 Jun 6, 2018
