
Best practices on size of random training subset for policytree #141

Open
njawadekar opened this issue Sep 28, 2022 · 1 comment
Labels
question Further information is requested

Comments

@njawadekar

njawadekar commented Sep 28, 2022

I notice substantial variation in the variables selected by the policy tree depending on the size of the random training subset specified in this step:

# Fit a depth-2 tree on a random training subset.
n <- 250                    # total number of observations
train <- sample(1:n, 200)   # indices of the training subset (80% of the sample)

opt.tree <- policy_tree(X[train, ], Gamma.matrix[train, ], depth = 2)
opt.tree

Are there recommended best practices regarding the size (or percentage of the original sample) of the random training subset when fitting a policy_tree?

@erikcs
Member

erikcs commented Sep 29, 2022

There's no universal guideline; a 50/50 train/test split is just a reasonable default. As for getting different trees from different splits, that's expected, since trees are discontinuous. Ideally, if there is signal, the estimated rewards will be similar even though the splits differ: there may be many trees that give a reasonable policy, so the optimal tree is not necessarily unique. Here is a tutorial if that's helpful.
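
As an illustration of this point, here is a minimal sketch (assuming the X and Gamma.matrix objects from the snippet above; the choice of 5 splits is arbitrary) that refits the tree on several random 50/50 splits and compares the average doubly robust reward each fitted policy attains on its held-out half. If there is signal, these reward estimates should be close even when the trees themselves differ.

library(policytree)

n <- nrow(X)
reward.by.split <- replicate(5, {
  train <- sample(1:n, floor(n / 2))   # a random 50/50 train/test split
  tree <- policy_tree(X[train, ], Gamma.matrix[train, ], depth = 2)
  pred <- predict(tree, X[-train, ])   # actions the policy assigns on the test half
  # Average doubly robust reward of the assigned actions on held-out data.
  mean(Gamma.matrix[-train, ][cbind(seq_along(pred), pred)])
})
reward.by.split   # similar values across splits suggest a stable policy value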

@erikcs erikcs added the question Further information is requested label Sep 29, 2022