Extract "dependent.variable.name" from a ranger object #697

talegari · 2023-10-24T18:25:10Z

Hey Martin,

What is the simplest way to extract "dependent.variable.name" from a ranger object?
IMHO, this needs to be extracted from output$call, where output inherits the ranger class. The extraction gets tricky because it depends on whether the user called ranger with the formula interface, with "dependent.variable.name", or with the x/y interface. Further, any hacky extraction might break due to future changes in the package.

Although having "dependent.variable.name" is not required for predict, would it be helpful to store it somewhere like "independent.variable.names" is stored in output$forest?

Here is a quick fix when formula interface is used (which I can submit as a PR if you prefer):

Around line 278, we have:

data.selected <- parse.formula(formula, data, env = parent.frame())
y <- data.selected[, 1]
x <- data.selected[, -1, drop = FALSE]

which can be enriched

data.selected <- parse.formula(formula, data, env = parent.frame())
y <- data.selected[, 1]
x <- data.selected[, -1, drop = FALSE]
dependent.variable.name = names(data.selected)[1]

and later

result$dependent.variable.name = dependent.variable.name
# or: result$forest$dependent.variable.name = dependent.variable.name
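To illustrate what the proposed change would store, here is a minimal sketch in base R. It does not use ranger itself: toy_fit is a hypothetical stand-in for the formula branch, and model.frame() is assumed here as a substitute for ranger's internal parse.formula(), so the details may differ from the real code.

```r
# Hypothetical stand-in for the formula branch of a fitting function.
# model.frame() replaces ranger's internal parse.formula() for this sketch.
toy_fit <- function(formula, data) {
  data.selected <- model.frame(formula, data)
  x <- data.selected[, -1, drop = FALSE]

  # The response is the first column of the selected data,
  # so its name is the dependent variable name.
  result <- list(
    dependent.variable.name = names(data.selected)[1],
    independent.variable.names = names(x)
  )
  class(result) <- "toy_fit"
  result
}

fit <- toy_fit(Species ~ ., data = iris)
fit$dependent.variable.name
fit$independent.variable.names
```

With the name stored on the result, nothing has to be recovered from the call afterwards.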


mnwright · 2023-10-25T17:45:32Z

You are right, it is currently not saved. I think we can add it as you suggest, we just need to add something for the x/y interface (where there might be no name).

talegari · 2023-10-26T17:50:08Z

IMHO, for the x/y interface, dependent.variable.name should be NULL, as we are not getting it from either a named argument or a named data structure (such as a data frame).

jemus42 · 2023-11-03T16:40:07Z

For the x/y interface we could default to using deparse(substitute(y)), no? Taking the name of the object might be more informative than a NULL.
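As a rough sketch of what deparse(substitute(y)) would return, the hypothetical toy_xy function below records the expression passed as y. It is not ranger code, only an illustration of the base R mechanism.

```r
# Hypothetical x/y-style function that records the expression supplied as y.
toy_xy <- function(x, y) {
  list(dependent.variable.name = deparse(substitute(y)))
}

features <- iris[, 1:4]
target <- iris$Species

# y passed as a standalone object: the object name is captured.
from_object <- toy_xy(features, target)$dependent.variable.name

# y passed as an expression: the whole expression text is captured.
from_expr <- toy_xy(features, iris$Species)$dependent.variable.name

from_object
from_expr
```

The first call yields the object name "target", the second the expression text "iris$Species". Neither is guaranteed to be a column name.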

mnwright · 2023-11-07T21:35:29Z

But then it would be an object name and not a variable name, right? I think people want to use dependent.variable.name to subset data later, which wouldn't make sense with x/y.
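The subsetting use case can be sketched in base R as follows. The names depvar and object_name are assumptions made for illustration: depvar plays the role of a stored column name, object_name the role of a name obtained from the calling expression.

```r
# A stored column name can be used to split the data again.
depvar <- "Species"
features <- iris[, setdiff(names(iris), depvar), drop = FALSE]
response <- iris[[depvar]]

# An object name from the caller's environment is not a column of the data,
# so the same subsetting has nothing to match.
object_name <- "target"
object_name %in% names(iris)
```

Here the last expression is FALSE, which is why an object name cannot be used the way a variable name can.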

jemus42 · 2023-11-07T23:20:55Z

Ah, sure - I was primarily worried about having default behavior that doesn't yield NULL, and assumed that x/y interface would be used in a context where target and features are in distinct objects anyway, hence falling back to object names is the next best thing to having variable names. Obviously, if y = somedata$sometarget then that point is moot, kind of.

mnwright · 2023-11-08T05:01:21Z

Thanks, merged #698.

talegari mentioned this issue Oct 26, 2023

added depvar to result #698

Merged

mnwright closed this as completed Nov 8, 2023
