Consider supporting easy visualization of terminal node sizes #16

Open
henningsway opened this issue Mar 20, 2019 · 12 comments

Comments

@henningsway

I hope you don't mind my little feature suggestions. :)

I really like the approach taken by https://github.com/parrt/dtreeviz which visualizes Decision trees in a very clear and pleasing way. I like the visualization of cutpoints, but also the possibility to easily glimpse the size of the terminal nodes.

Good luck with the project!

@martin-borkovec
Owner

No, I don't mind it at all. Thanks for your interest in this project!

Yes, that's an interesting idea. I am going to keep it in mind for further development.

@henningsway
Author

Would there already be a way to turn the terminal nodes into pie charts (angles mapped to the proportions, size mapped to terminal node size)?

I tried to pass

geom_nodeplot(gglist = list(geom_bar(aes(x = "", fill = sex),
                                     position = position_dodge()) +
                                     coord_polar("y")
                            ))

but this doesn't seem to be the right approach. ;-/

@martin-borkovec
Owner

martin-borkovec commented Mar 21, 2019

Do you mean like this? I had to add a new setting, "nodesize", for width and height.
Well, actually it's mapped to the log of the node size, since the actual proportions are way too extreme.

Regarding your suggested code: be careful not to use + instead of a comma between the elements of the gglist argument; it has to be a normal list. I know this may be a pitfall for new users.

library(MASS)
library("partykit")
#> Loading required package: grid
#> Loading required package: libcoin
#> Loading required package: mvtnorm
SexTest <- ctree(sex ~ ., data=Aids2)
library(ggparty)
#> Loading required package: ggplot2
ggparty(SexTest) +
  geom_edge() + 
  geom_edge_label() +
  geom_node_splitvar() +
  geom_nodeplot(gglist = list(geom_bar(aes(x = "", fill = sex),
                                       position = position_fill()),
                              coord_polar("y"),
                              theme_void()),
                width = "nodesize",
                height = "nodesize"
  )

Created on 2019-03-21 by the reprex package (v0.2.1)
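To illustrate why mapping to the log of the node size helps with imbalanced trees, here is a minimal base-R sketch. The node sizes are made up, and scaling each node plot relative to the largest node is an assumption made for illustration, not necessarily how ggparty computes it internally.

```r
# Hypothetical terminal node sizes (observations per node), strongly imbalanced
nodesize <- c(25, 150, 900, 2500)

# Assumed normalization: each node plot is scaled relative to the largest node
raw_scale <- nodesize / max(nodesize)
log_scale <- log(nodesize) / max(log(nodesize))

round(rbind(raw = raw_scale, log = log_scale), 2)
```

Under the raw mapping the smallest node is drawn at 1% of the largest one and would be practically invisible; under the log mapping it stays at roughly 41%. Note that a node with a single observation would get a log scale of zero, so something like `log1p()` may be preferable in general.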

@henningsway
Author

This looks very good! I will try it very soon. :)

I think the dataset (and the resulting node sizes) is quite imbalanced, which is why the choice of the log for the node size seems appropriate.

Leaving this transformation to the user (e.g. geom_nodeplot(width = log(nodesize)) or something similar) would probably be too verbose or too difficult to implement, I would think?

Maybe, instead of setting both width and height, a single option (area or size) is what's needed in most use cases.

@martin-borkovec
Owner

> Leaving this transformation to the user (e.g. geom_nodeplot(width = log(nodesize)) or something similar) would probably be too verbose or too difficult to implement, I would think?

No, it shouldn't be too troublesome to implement; I plan on doing this.

> Maybe, instead of setting both width and height, a single option (area or size) is what's needed in most use cases.

Not sure about that; separate width and height may also be very handy in many cases. But yes, adding another option that takes care of both at once is a good idea!
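Letting the user pass a transformation such as `log(nodesize)` directly can be sketched in base R by capturing the unevaluated argument and evaluating it with `nodesize` bound to the terminal node sizes. `node_scale()` is a hypothetical helper, not part of ggparty, and dividing by the maximum is an assumed normalization.

```r
# Hypothetical helper (not a ggparty function): evaluates a user-supplied
# expression such as log(nodesize) or sqrt(nodesize) against node sizes.
node_scale <- function(nodesize, size = nodesize) {
  size_expr <- substitute(size)   # capture the expression, unevaluated
  values <- eval(size_expr,
                 list(nodesize = nodesize),  # 'nodesize' resolves to the sizes
                 parent.frame())             # other names from the caller
  values / max(values)            # assumed: largest node gets multiplier 1
}

sizes <- c(2000, 3500, 5000)      # hypothetical terminal node sizes
node_scale(sizes)                 # default: proportional to node size
node_scale(sizes, log(nodesize))  # log mapping
node_scale(sizes, sqrt(nodesize)) # square-root mapping
```

With rlang, the same idea would use `enquo()` and `eval_tidy()`; base R's `substitute()` is used here only to keep the sketch dependency-free.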

@henningsway
Author

henningsway commented Mar 21, 2019

I just took this for a test drive.

For a dataset with, say, 50k rows and about a dozen terminal nodes, the differences in node size (ranging from about 2000 to 5000) are currently barely visible. So a choice of transformation would be very useful in this case.

PS (unrelated): Is it possible to map the color of the edge labels to the variable selected for the split?
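A quick calculation shows why node sizes between about 2000 and 5000 look almost identical under the log mapping. This sketch assumes each node plot is scaled relative to the largest node; it compares the smallest node's scale under three candidate transformations.

```r
smallest <- 2000
largest  <- 5000

# Scale of the smallest node plot relative to the largest one
ratios <- c(identity = smallest / largest,
            sqrt     = sqrt(smallest / largest),
            log      = log(smallest) / log(largest))

print(round(ratios, 3))
```

Under the log mapping the smallest node is still drawn at about 89% of the largest, which matches the observation that the differences are barely visible; the identity mapping would shrink it to 40%.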

@martin-borkovec
Owner

> For a dataset with, say, 50k rows and about a dozen terminal nodes, the differences in node size (ranging from about 2000 to 5000) are currently barely visible. So a choice of transformation would be very useful in this case.

Yes, I'd imagine.

> PS (unrelated): Is it possible to map the color of the edge labels to the variable selected for the split?

What exactly do you mean? Like this?

library(MASS)
library("partykit")
#> Loading required package: grid
#> Loading required package: libcoin
#> Loading required package: mvtnorm
SexTest <- ctree(sex ~ ., data=Aids2)
library(ggparty)
#> Loading required package: ggplot2
ggparty(SexTest) +
  geom_edge(aes(col = splitvar), size = 1.5) + 
  scale_color_discrete(h.start = 100) +
  geom_edge_label() +
  geom_node_splitvar() +
  geom_nodeplot(gglist = list(geom_bar(aes(x = "", fill = sex),
                                       position = position_fill()),
                              coord_polar("y"),
                              theme_void()),
                width = "nodesize",
                height = "nodesize"
  )

Created on 2019-03-21 by the reprex package (v0.2.1)

@henningsway
Author

Awesome, I'll try this for the labels soon. Thank you!

@martin-borkovec
Owner

Oh, sorry... I misread it.
Here you go:

library(MASS)
library("partykit")
#> Loading required package: grid
#> Loading required package: libcoin
#> Loading required package: mvtnorm
SexTest <- ctree(sex ~ ., data=Aids2)
library(ggparty)
#> Loading required package: ggplot2
ggparty(SexTest) +
  geom_edge() + 
  scale_color_discrete(h.start = 100) +
  geom_edge_label(aes(col = splitvar)) +
  geom_node_splitvar() +
  geom_nodeplot(gglist = list(geom_bar(aes(x = "", fill = sex),
                                       position = position_fill()),
                              coord_polar("y"),
                              theme_void()),
                width = "nodesize",
                height = "nodesize"
  )

Created on 2019-03-21 by the reprex package (v0.2.1)

@martin-borkovec
Owner

Update regarding node size:
I removed the option of mapping width and height to node size and introduced the argument size instead, which modifies both values at once by the provided multiplier. It can be set to "nodesize" or "log(nodesize)".

General update:
changed the name of geom_nodeplot to geom_node_plot.

library(MASS)
library("partykit")
#> Loading required package: grid
#> Loading required package: libcoin
#> Loading required package: mvtnorm
library(ggparty)
#> Loading required package: ggplot2
SexTest <- ctree(sex ~ ., data=Aids2)
ggparty(SexTest) +
  geom_edge() + 
  geom_edge_label() +
  geom_node_splitvar() +
  geom_node_plot(gglist = list(geom_bar(aes(x = "", fill = sex),
                                        position = position_fill()),
                               coord_polar("y"),
                               theme_void()),
                 size = "log(nodesize)"
  )

Created on 2019-03-26 by the reprex package (v0.2.1)
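One way a string such as "log(nodesize)" could be turned into a per-node multiplier is to parse it and evaluate it with `nodesize` bound to the terminal node sizes. This is a hypothetical sketch: `size_multiplier()` is not a ggparty function, normalizing by the largest value is an assumption, and it is not how ggparty actually implements the size argument.

```r
# Hypothetical helper (not part of ggparty): converts a size specification
# such as "nodesize" or "log(nodesize)" into a relative multiplier per node.
size_multiplier <- function(size, nodesize) {
  size_expr <- parse(text = size)[[1]]   # string -> unevaluated expression
  # Bind 'nodesize'; functions such as log() are looked up in base only
  values <- eval(size_expr,
                 envir = list(nodesize = nodesize),
                 enclos = baseenv())
  values / max(values)                   # assumed: largest node gets 1
}

nodesize <- c(30, 300, 3000)             # hypothetical terminal node sizes
m <- size_multiplier("log(nodesize)", nodesize)

# The same multiplier scales both dimensions of each node plot
base_size <- 1                           # hypothetical base panel size
width  <- base_size * m
height <- base_size * m
```

Parsing strings works, but a typo in the string only surfaces when it is evaluated, which is one argument for capturing an unquoted expression instead.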

@henningsway
Author

Let me test this soon and get back to you. :)

@henningsway
Author

henningsway commented Mar 27, 2019

Well, it works, and for me this issue would be solved! :)

Two additional thoughts:

  • I feel it would be cleaner to use tidy evaluation or similar for the size argument instead of passing a string.
  • When I use the default size = "nodesize" option, does the area of the terminal node plot correctly reflect the node size? (Or is it too large, because width and height are both scaled at the same time, essentially squaring the node sizes?)
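The concern in the second point can be checked with a little arithmetic: if width and height are both multiplied by the same factor m, the panel area grows with m^2, so a node twice as large gets a plot with four times the area (and, since a pie's radius follows the panel side, four times the pie area as well). Scaling each side by sqrt(m) instead keeps the area proportional to node size. A minimal sketch with hypothetical node sizes; whether ggparty's size argument behaves exactly like this is not confirmed here.

```r
nodesize <- c(1000, 2000, 4000)          # hypothetical terminal node sizes
m <- nodesize / max(nodesize)            # assumed relative multiplier

# Same multiplier on both sides: area grows with the square of the multiplier
area_linear <- m * m
# Square-root multiplier on both sides: area grows linearly with node size
area_sqrt <- sqrt(m) * sqrt(m)

rbind(nodesize_ratio = m, area_linear = area_linear, area_sqrt = area_sqrt)
```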
