Extract "dependent.variable.name" from a ranger object #697

Closed
talegari opened this issue Oct 24, 2023 · 6 comments



@talegari
Contributor

talegari commented Oct 24, 2023

Hey Martin,

What is the simplest way to extract "dependent.variable.name" from a ranger object?
IMHO, it currently has to be extracted from output$call, where output is an object of class ranger. This gets tricky because the call looks different depending on whether the user used the formula interface, the "dependent.variable.name" argument, or the x/y interface. Further, any hacky extraction might break due to future changes in the package.
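To make the fragility concrete, here is a minimal base-R sketch of such a hacky extraction. The helper `extract_dep_var_from_call()` is hypothetical and not part of ranger. It only inspects an unevaluated call like the one kept in output$call. It also assumes that the formula is written literally in the call and that dependent.variable.name, if used, is a literal string. If either assumption fails, or if the x/y interface was used, it returns NULL.

```r
# Hypothetical helper, NOT part of ranger: recover the response name from an
# unevaluated call such as the one stored in a ranger object's $call element.
# Assumes the formula is written literally in the call and that
# dependent.variable.name, if given, is a literal string.
extract_dep_var_from_call <- function(cl) {
  args <- as.list(cl)[-1]  # drop the function name, keep the arguments
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))

  # Interface 1: dependent.variable.name = "..."
  if ("dependent.variable.name" %in% nms) {
    return(as.character(args[["dependent.variable.name"]]))
  }

  # Interface 2: formula, either named or as the first unnamed argument
  f <- NULL
  if ("formula" %in% nms) {
    f <- args[["formula"]]
  } else if (length(args) > 0 && nms[1] == "") {
    f <- args[[1]]
  }
  if (is.call(f) && identical(f[[1]], as.name("~")) && length(f) == 3) {
    return(deparse(f[[2]]))  # left-hand side of the formula
  }

  # Interface 3: x/y, or a formula passed via a variable: nothing to recover
  NULL
}

extract_dep_var_from_call(quote(ranger(Species ~ ., data = iris)))  # "Species"
extract_dep_var_from_call(quote(ranger(x = X, y = Y)))              # NULL
```

The sketch breaks as soon as the formula is stored in a variable (`ranger(fml, data = d)`), which illustrates why storing the name at fit time is more robust.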

Although "dependent.variable.name" is not required for predict, would it be helpful to store it somewhere, just as "independent.variable.names" is stored in output$forest?

Here is a quick fix for when the formula interface is used (which I can submit as a PR if you prefer):

Around line 278, we have:

```r
data.selected <- parse.formula(formula, data, env = parent.frame())
y <- data.selected[, 1]
x <- data.selected[, -1, drop = FALSE]
```

which can be extended to:

```r
data.selected <- parse.formula(formula, data, env = parent.frame())
y <- data.selected[, 1]
x <- data.selected[, -1, drop = FALSE]
# The response is the first column of data.selected, so keep its name
dependent.variable.name <- names(data.selected)[1]
```

and later

```r
result$dependent.variable.name <- dependent.variable.name
# or: result$forest$dependent.variable.name <- dependent.variable.name
```
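ranger's parse.formula() is internal, so the sketch below uses base R's stats::model.frame() as a stand-in. Like parse.formula(), it places the response in the first column, as the `y <- data.selected[, 1]` line above relies on. The sketch only shows why names(data.selected)[1] yields the response name. The data frame and its column names are made up for illustration.

```r
# Stand-in for ranger's internal parse.formula(): stats::model.frame() also
# returns the response as the first column of the resulting data frame.
df <- data.frame(outcome = c(1.2, 3.4, 5.6), a = 1:3, b = c(0.5, 0.1, 0.9))

data.selected <- stats::model.frame(outcome ~ ., data = df)
y <- data.selected[, 1]
x <- data.selected[, -1, drop = FALSE]
dependent.variable.name <- names(data.selected)[1]

print(dependent.variable.name)  # "outcome"
print(names(x))                 # "a" "b"
```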
@mnwright
Member

You are right, it is currently not saved. I think we can add it as you suggest; we just need to handle the x/y interface (where there might be no name).

@talegari
Contributor Author

IMHO, for the x/y interface, dependent.variable.name should be NULL, as we get it neither from a named argument nor from a named data structure (such as a data frame).

@jemus42
Contributor

jemus42 commented Nov 3, 2023

For the x/y interface we could default to deparse(substitute(y)), no? The name of the object might be more informative than NULL.
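Here is a minimal sketch of the deparse(substitute(y)) idea. It uses a hypothetical `response_label()` function rather than ranger itself, and only shows how an x/y-style function can capture the expression the caller passed as y. When y is a standalone object, the label is the object name. When y is extracted from a data frame, the label is the whole expression.

```r
# Hypothetical helper, NOT part of ranger: label the response with the
# expression the caller passed as y (y itself is never evaluated here).
response_label <- function(x, y) {
  deparse(substitute(y))
}

target <- c(0, 1, 1)
features <- data.frame(a = 1:3)

response_label(features, target)           # "target"
response_label(iris[, -5], iris$Species)   # "iris$Species"
```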

@mnwright
Member

mnwright commented Nov 7, 2023

But then it would be an object name and not a variable name, right? I think people want to use dependent.variable.name to subset data later, which wouldn't make sense with x/y.
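The subsetting use case can be illustrated with a hypothetical `get_response()` helper, which is not part of ranger. A stored variable name such as "Species" indexes a data frame directly. An object expression such as "iris$Species", which is what deparse(substitute(y)) would give, does not. NULL signals that no name is available.

```r
# Hypothetical helper, NOT part of ranger: pull the response column out of new
# data using a stored response name (NULL assumed for the x/y interface).
get_response <- function(data, dependent.variable.name) {
  if (is.null(dependent.variable.name)) {
    stop("No response name stored (e.g. model fitted via the x/y interface)")
  }
  if (!dependent.variable.name %in% names(data)) {
    stop("'", dependent.variable.name, "' is not a column of data")
  }
  data[[dependent.variable.name]]
}

head(get_response(iris, "Species"))   # works: a real column name
try(get_response(iris, "iris$Species"))  # error: an object expression
try(get_response(iris, NULL))            # error: nothing stored
```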

@jemus42
Contributor

jemus42 commented Nov 7, 2023

Ah, sure. I was primarily concerned with having a default behavior that doesn't yield NULL. I assumed that the x/y interface would be used where target and features are in separate objects anyway, so falling back to object names seemed the next best thing to having variable names. Obviously, if y = somedata$sometarget, then that point is kind of moot.

@mnwright
Member

mnwright commented Nov 8, 2023

Thanks, merged #698.

@mnwright mnwright closed this as completed Nov 8, 2023

3 participants