Changing plot labels within the function plotPartialPrediction #993

Closed
ghost opened this issue Jul 8, 2016 · 19 comments

Comments

@ghost

ghost commented Jul 8, 2016

This might be an entirely stupid question, but I am having trouble with a graphical parameter in the function plotPartialPrediction and couldn't find out how to solve the problem.

Let us say I want to run a random forest analysis with 9 features, labeled a_A, k_A, p_A, a_B, k_B, p_B, a_C, k_C, p_C, in the object called "result.final", which I select from the object "result" as follows:

result.final <- dplyr::select(result,outcome, a_A, k_A, p_A, a_B, k_B, p_B, a_C, k_C, p_C)

After successfully running the random forest analysis, I generated the partial prediction plot as follows:

plotPartialPrediction(pd.regr) + theme_bw() + ylab("Bias")

Now, here comes the actual problem:
The resulting 9 partial prediction plots use the original feature names a_A, k_A, p_A, a_B, k_B, p_B, a_C, k_C, p_C. However, these names are obviously awkward and not suitable for publications. I fiddled around with dplyr to replace these names with more meaningful ones but failed. One possibility is to use the "rename" function from dplyr and change the variable names before running the random forest analysis. However, I wonder if there is an easier way to change these names within the plotPartialPrediction function. Help is much appreciated!

Best, Benny

@ja-thomas
Contributor

Hm, I am not sure if this is possible in an easier way right now. The thing is that to change the labels in facet_grid or facet_wrap we need to supply a labeller, but we cannot pass additional arguments to the plotPartialDependence function. I don't think you can change the labels after the plot with a + awesome_ggplot_function() call.

@berndbischl
Sponsor Member

This might be an entirely stupid question, b

it really isn't; i guess what you describe is a pretty normal scenario.

now regarding the answer:
this is (at least) one of the reasons that, before you go to plotting, mlr forces you to jump thru the hoop of creating the container object first with that generate* call. all data for the plot comes from there and the data structure should be easily understandable. so simply change the labels there, if you need to do this post hoc.

let me try to cook up a quick example

@zmjones paging you to the thread to maybe share some thoughts

@berndbischl
Sponsor Member

berndbischl commented Jul 8, 2016

ok, it is a TINY bit more complex than i thought: you need 2 lines of code, i would have wished for 1 :)

lrn = makeLearner("classif.rpart", predict.type = "prob")
m = train(lrn, iris.task)
pd = generatePartialDependenceData(m, iris.task)
newnames = paste0("f", 1:4)
pd$features = newnames
colnames(pd$data)[3:6] = newnames
p = plotPartialDependence(pd)
print(p)
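
The positional index 3:6 above only fits the iris example. A minimal sketch of a name-based alternative follows, assuming (as in the example above) that the object carries a character vector `features` and a data.frame `data` whose feature columns use the original names. The helper `renameFeatures` is hypothetical and not part of mlr, and the list used here is only a stand-in so the sketch runs without mlr.

```r
# Hypothetical helper (NOT part of mlr): rename features by name instead of
# by column position. `mapping` is a named character vector: old = "new".
renameFeatures = function(pd, mapping) {
  stopifnot(is.character(mapping), !is.null(names(mapping)))
  stopifnot(all(names(mapping) %in% pd$features))
  stopifnot(!anyDuplicated(mapping))  # new labels must be unique

  # Update the feature vector
  idx = match(names(mapping), pd$features)
  pd$features[idx] = unname(mapping)

  # Update the matching columns of the data
  cols = match(names(mapping), colnames(pd$data))
  stopifnot(!anyNA(cols))
  colnames(pd$data)[cols] = unname(mapping)
  pd
}

# Stand-in object (assumption: mimics the structure used in the example above)
pd = list(
  features = c("a_A", "k_A"),
  data = data.frame(Class = "x", Probability = 0.5, a_A = 1, k_A = 2)
)
pd = renameFeatures(pd, c(a_A = "H effect (spot A)", k_A = "D parameter (spot A)"))
print(colnames(pd$data))
```

Matching by name means the call fails loudly when a label is missing or duplicated, rather than silently renaming the wrong columns.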

@berndbischl
Sponsor Member

@zmjones
would you be so kind to add this as an example in the docs?

@ghost
Author

ghost commented Jul 8, 2016

Speaking of adding the example to the docs, here is the code adapted to the latest mlr package version. Some function names (generatePartialDependenceData and plotPartialDependence) no longer exist in the latest version.

lrn = makeLearner("classif.rpart", predict.type = "prob")
m = train(lrn, iris.task)
pd = generatePartialPredictionData(m, iris.task)
newnames = paste0("f", 1:4)
pd$features = newnames
colnames(pd$data)[3:6] = newnames
p = plotPartialPrediction(pd)
print(p)

Moreover, although I managed to rerun the example successfully, something strange happened after adapting your example to my analysis: R produced only a single plot with the new labels below the x-axis. I am somewhat lost here… My code is as follows:

pd.regr = generatePartialPredictionData(fit.regr, regr.task)
newnames = paste0("Bias","H effect (spot A)", "D parameter (spot A)", "Trait frequencies (spot A)","H effect (spot B)", "D parameter (spot B)", "Trait frequencies (spot B)","H effect (spot C)", "D parameter (spot C)", "Trait frequencies (spot C)" ,"G-E corr.","D-E corr.", "E-E corr.", "Population values")

pd.regr$features = newnames
colnames(pd.regr$data)[3:14] = newnames

plots.partial.ae <- plotPartialPrediction(pd.regr) + theme_bw() + ylab("Bias")
print(plots.partial.ae)

[Attached image: rplot]

@zmjones
Contributor

zmjones commented Jul 8, 2016

@Denominator1

i followed this bit of code and didn't see any problems other than the name change from partial prediction to partial dependence.

lrn = makeLearner("classif.rpart", predict.type = "prob")
m = train(lrn, iris.task)
pd = generatePartialDependenceData(m, iris.task)
newnames = paste0("f", 1:4)
pd$features = newnames
colnames(pd$data)[3:6] = newnames
p = plotPartialDependence(pd)
print(p)

Your issue with the second block is with newnames. paste0 concatenates all of its arguments, so you get a character vector of length 1 and end up assigning the same name to every feature. The plotting function doesn't work with that. Just change paste0 to c and it will work.
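
A minimal sketch of the difference in base R, using two of the labels from the code above (the variable names here are only for illustration):

```r
# paste0() concatenates its arguments into ONE string
labels_pasted = paste0("H effect (spot A)", "D parameter (spot A)")
print(length(labels_pasted))  # 1

# c() keeps each argument as its own element
labels_vec = c("H effect (spot A)", "D parameter (spot A)")
print(length(labels_vec))     # 2

# A cheap guard before assigning new names: one label per feature
n_features = 2
stopifnot(length(labels_vec) == n_features)
```

Checking the length of the label vector against the number of features before assigning catches this kind of mistake early.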

@zmjones
Contributor

zmjones commented Jul 8, 2016

And @berndbischl I can document this if you want but I think this is outside our scope. People can just use better names when they are creating the task. Or do you think this is a common enough task that this merits documentation?

@ghost
Author

ghost commented Jul 8, 2016

@zmjones Obviously a very silly mistake I made. Thanks for the help!

@zmjones
Contributor

zmjones commented Jul 8, 2016

@Denominator1 sure thing. happy to help

@larskotthoff
Sponsor Member

@zmjones I agree with @berndbischl on this one -- it's a common task to rename things and it would be great to have an example of this in the docs or tutorial.

@zmjones
Contributor

zmjones commented Jul 8, 2016

ok will add

@berndbischl
Sponsor Member

Lemme add 2cents here:

a)

it's a common task to rename things and it would be great to have an example of this in the docs or tutorial.

very much so, and it is especially annoying if the computation was very time-consuming and is already done. i basically had the same problem here: #899

b)
in an ideal world we would have a function like relabel(object, task.ids=, features=, class.names=, ....) that would be callable on at least the plot-data objects and the BMRs. But that can quickly escalate and become complicated and a burden.

c) So I would suggest: I also close #899 and create a general issue for this. Even if a smaller helper for the most common cases exists later, this will be helpful for people.

@zmjones when you have added a tiny-mini example in the docs, tell me and we can close here

@zmjones
Contributor

zmjones commented Jul 8, 2016

my objection to this is that i didn't think that the plot functions were supposed to generate publication quality plots. my understanding was that we intended for people to use the generation functions only in these instances.

@berndbischl
Sponsor Member

berndbischl commented Jul 8, 2016

my objection to this is that i didn't think that the plot functions were supposed to generate publication quality plots. my understanding was that we intended for people to use the generation functions only in these instances.

i don't completely get this.

first of all, note there are different forms of "publications". journal papers, conf papers, websites, blog posts, conf talks, lectures, ....

and i never saw this so clear-cut. i always saw the gen-functions as a cool potential way out / alternative if somebody did not like the provided plots. i could see myself putting some of our plots in some articles? i have put much uglier plots into papers (from other packages) for sure.

also imagine this: you basically like the plot we provide. you deem it fit for your form of "publication". the only thing you don't like are some object labels. now reprogram the whole crap?

but also please note that i do not want to go completely overboard with this and said

but that can quickly escalate and become complicated and a burden.

IMHO we should only add stuff here which is simple AND commonly useful.

@zmjones
Contributor

zmjones commented Jul 8, 2016

yea ok that makes sense

@schiffner
Contributor

I can go through the tutorial and add some hints about changing labels/names/annotations.

@berndbischl
Sponsor Member

I can go through the tutorial and add some hints about changing labels/names/annotations.

that would be great. only add the most important stuff you can explain quickly

@jakobbossek
Contributor

Actually you can manipulate the ggplot object returned by plotPartialDependence. Just write your own labeller and "overwrite" the faceting:

lrn = makeLearner("classif.rpart", predict.type = "prob")
m = train(lrn, iris.task)
pd = generatePartialDependenceData(m, iris.task)
p = plotPartialDependence(pd)
# as_labeller() turns a named character vector (old name = new label) into a labeller
p + facet_wrap(~ Feature, labeller = as_labeller(
  c(Sepal.Width = "SW", Sepal.Length = "SL", Petal.Width = "PW", Petal.Length = "PL")
))

@pat-s
Member

pat-s commented Aug 13, 2019

Julia added info to the tutorial, and Jakob explained how to change plots post-creation. Seems like this issue can be closed.

@pat-s closed this as completed Aug 13, 2019