colors in the aggregate #18

Closed
timelyportfolio opened this issue Jul 20, 2015 · 9 comments

Comments

@timelyportfolio
Collaborator

I'm sure this is my own ignorance, but it appears that colors in the aggregate work differently than I would expect. For instance, using only one index level with GNI2010,

library(treemap)
library(dplyr)

treemap(GNI2010, index=c("continent"), vSize="GNI", vColor="GNI", type="value") %>%
  { .$tm } %>%
  select( continent, vColor, color )

gives me

      continent  vColor   color
1        Africa  106410 #EFF8AA
2          Asia  285410 #CEEA84
3        Europe 1056360 #0F8445
4 North America  240850 #D9EF8B
5       Oceania   80770 #F3FAAF
6 South America   71410 #F3FAAF

and

image

while setting index = c("continent","iso3") gets me different colors.

treemap(GNI2010, index=c("continent","iso3"), vSize="GNI", vColor="GNI", type="value") %>%
  { .$tm } %>%
  filter( is.na(iso3) ) %>%
  select( continent, vColor, color )

gives me

      continent  vColor   color
1        Africa  106410 #006837
2          Asia  285410 #006837
3        Europe 1056360 #006837
4 North America  240850 #006837
5       Oceania   80770 #0C7F43
6 South America   71410 #1A9850

and an ugly plot to show the colors

image

Naively, I would expect the colors assigned to the aggregate continents to be the same in both cases. This is important when trying to match the colors assigned by treemap in #17.

@timelyportfolio
Collaborator Author

Perhaps, the answer lies in the type. Maybe I should expect the colors to match at the aggregate level only when the vColor is normalized with a type="dens" or type="comp" or if I manually set a range based on the aggregate totals.

@mtennekes
Owner

What happens is that the range of values of the lowest nodes, i.e. the rectangles that will be coloured, is mapped onto the color palette. So for the index=c("continent") treemap, the lowest nodes are the continents, with aggregated values from approximately 0 to 1200000. For the index=c("continent","iso3") treemap, the lowest nodes are the countries, with values only up to 90000. Hence the darkest color is assigned to 90000 in this case (instead of 1200000), and the colors of the aggregated continent values are 'truncated' to dark green. (Note that the South America bubble has a lighter colour, since its aggregated value, 71410, falls within the range 0-90000.)
This behaviour also occurs for other treemap types.
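
To make the truncation concrete, here is a minimal base-R sketch of mapping values onto a palette over a fixed range. It only illustrates the idea described above and is not treemap's internal code; the helper `map_to_palette`, the two-color palette, and the 101 steps are assumptions made for the example, so the resulting hex codes will not match treemap's.

```r
# Illustrative only: NOT treemap's internal code.
# Map values onto a palette over a fixed range, truncating values outside it.
map_to_palette <- function(values, range, n = 101) {
  # Assumed light-to-dark green palette with n steps
  palette <- colorRampPalette(c("#FFFFE5", "#006837"))(n)
  # Clamp values to the range, so anything above range[2] gets the darkest color
  clamped <- pmin(pmax(values, range[1]), range[2])
  idx <- round((clamped - range[1]) / diff(range) * (n - 1)) + 1
  setNames(palette[idx], names(values))
}

# Continent totals as reported in the first comment
continent_totals <- c(Africa = 106410, Asia = 285410, Europe = 1056360,
                      `North America` = 240850, Oceania = 80770,
                      `South America` = 71410)

# Range taken from the country level: totals above 90000 collapse to one color
leaf_range_colors <- map_to_palette(continent_totals, c(0, 90000))

# Range wide enough for the aggregates: the totals are spread over the palette
wide_range_colors <- map_to_palette(continent_totals, c(0, 1200000))
```

With the narrow range, Africa, Asia, Europe and North America all receive the same (darkest) color, while Oceania and South America stay inside the range and keep lighter shades.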

The safest way is indeed to set the range argument:

treemap(GNI2010, index=c("continent","iso3"), vSize="GNI", vColor="GNI", type="value",
        range=c(0,1200000)) %>%
  { .$tm } %>%
  filter( is.na(iso3) ) %>%
  select( continent, vColor, color )

      continent  vColor   color
1        Africa  106410 #EFF8AA
2          Asia  285410 #CEEA84
3        Europe 1056360 #0F8445
4 North America  240850 #D9EF8B
5       Oceania   80770 #F3FAAF
6 South America   71410 #F3FAAF
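
Rather than hard-coding the upper bound, the range could be derived from the top-level totals. A small sketch with a made-up data frame (the `toy` name and all numbers are invented for illustration); the resulting vector is what would be passed as `range` in the call above.

```r
# Made-up data: two continents with a few countries each (values are invented)
toy <- data.frame(
  continent = c("A", "A", "A", "B", "B"),
  iso3      = c("AAA", "AAB", "AAC", "BBA", "BBB"),
  GNI       = c(10, 20, 30, 5, 15)
)

# Totals at the highest aggregation level, using sum as the aggregation
totals <- tapply(toy$GNI, toy$continent, sum)

# A range that covers both the leaves and the aggregated values
color_range <- c(0, max(totals))
color_range
```

Because every leaf value is at most its continent total when summing non-negative values, a range built this way covers all levels of the hierarchy.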

@timelyportfolio
Collaborator Author

Ok, thanks for clearing this up. It appears sum is the only aggregate function available if I read these lines correctly. What happens with things like averages that shouldn't sum? Is there any way to hack/override the aggregation?

All this is important to me as I try to make d3treeR. I understand that treemap was not developed with this use case in mind. However, since only two levels show at a time in the experimental iteration of d3treeR, aggregate colors will likely need more than just sum. I could easily write a function to calculate the different aggregates on the return value from treemap, but then I would not have all the information I need to apply the original color scale. I'll hack away a little more to explore other options. Maybe I should just wrap treemap with d3tree, and then I would have all the original arguments with which I could possibly make it work.

@mtennekes
Owner

Exactly. Originally, I only had two treemap types, comp and dens, which I thought were sufficient from a statistical point of view. However, for optimal functionality, the value type was born, which turned out to be, I think, the most used one.

It's not hard to generalize the aggregation function (with sum as default). However, for aggregation of averages or ratios, we probably need a weighted average, since we cannot simply average, say, the percentages of smokers per country to continents. For this, we need population numbers per country as weights.
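
A short base-R illustration of why a plain mean is not enough here. The smoker percentages and populations are invented numbers, used only to show the difference between `mean` and `weighted.mean`.

```r
# Invented numbers: percentage of smokers and population for three countries
smokers_pct <- c(10, 30, 20)
population  <- c(1e6, 9e6, 10e6)

# Plain mean treats every country equally, regardless of size
simple_avg <- mean(smokers_pct)

# Weighted mean uses population as weights
weighted_avg <- weighted.mean(smokers_pct, w = population)

# Cross-check: total smokers divided by total population, as a percentage
direct <- sum(smokers_pct / 100 * population) / sum(population) * 100
```

Here the plain mean is 20, while the population-weighted mean is 24, which agrees with the direct computation from total smokers and total population.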

Is this reasoning what you had in mind, or do you have other kinds of aggregates in mind?
For the GNI2010 example, I think summing is probably the correct aggregation function. I also used a fixed range when zooming in and out in my (very primitive) interactive shiny tool itreemap.

@timelyportfolio
Collaborator Author

You understand correctly, and yes, GNI2010 does not really apply here, but it is the first example in treemap :) Averages are the most likely use case. I tried to make a count column = 1 and then use type='dens', but that is the reverse average if I understand correctly, and it also does not make sense for the leaf level.

@mtennekes
Owner

Now there is an argument fun.aggregate! If weighted.mean is used, the weights, argument w, are by default the vSize variable. See https://github.com/mtennekes/treemap/blob/master/test/test_aggregation_functions.R
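
A hedged sketch of how the new argument might be used. The `toy` data frame and its numbers are invented, and passing the function name as a string is an assumption about the interface; the weighting by `vSize` follows the description above. The expected values are computed separately in base R so they do not depend on treemap's output, and the treemap call is skipped when the package is not installed.

```r
# Invented data: a per-country average (avg) and a size variable (pop)
toy <- data.frame(
  continent = c("A", "A", "B", "B"),
  iso3      = c("AAA", "AAB", "BBA", "BBB"),
  pop       = c(1, 3, 2, 2),
  avg       = c(10, 20, 30, 50)
)

# What a pop-weighted mean per continent should give, computed in base R
expected <- sapply(split(toy, toy$continent),
                   function(d) weighted.mean(d$avg, w = d$pop))

# Only run the treemap call when the package is available
if (requireNamespace("treemap", quietly = TRUE)) {
  tm <- treemap::treemap(toy, index = c("continent", "iso3"),
                         vSize = "pop", vColor = "avg", type = "value",
                         fun.aggregate = "weighted.mean")
  # Continent-level rows are those where iso3 is NA
  print(tm$tm[is.na(tm$tm$iso3), c("continent", "vColor")])
}
```

If the aggregation works as described, the continent-level vColor values should be comparable to `expected` (17.5 for A and 40 for B in this toy data).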

If you have a useful typical dataset that contains averages, we could include it in the package.

@timelyportfolio
Collaborator Author

Beautiful, thanks so much for the very quick response! I'll play with it throughout the day and report back. So far it looks great.

@timelyportfolio
Collaborator Author

@ignacio82 says this solved his problem. I played with it more today with max, min, median, and they all worked great. I don't have a dataset, but I'll put some thought into examples with one of the built-in base datasets and demonstrate with d3treeR. It would probably also play nicely with old-fashioned tables and xtabs.

Thanks again!

@timelyportfolio
Collaborator Author

Happy to close this. Thanks again for such a quick response!
