[tune] Cross Validation (simply parallelization) patterns? #7744
Something similar was asked before (#6560), but this is different.

What pattern is imagined for run DAGs where one config instance leads to a number of (model, data_train, data_test) runs? You would typically collapse these into a single score, something like the mean score minus the standard deviation of the scores.

I feel like I am fighting the framework, so I am probably missing something. The issue is that the Trainable class handles the serialization/persistence, but you really want to pass everything down to the remotes that get parallelized.

Or is there some pattern at the tune.run level that allows you to sample across the train/test pairs (not optimize over them), i.e. treat the data like config params?
Comments
Hey David, great question. Would you be open to a quick call on this subject? It's something we've been discussing internally (the Ray/Anyscale team), and it'd be good to understand your perspective to make sure we're thinking about it in the right way, given your well-formulated question :). Can you ping me at bill @ anyscale, or if you're on the Ray Slack, I'm on there too.
Hey @cottrell, would this work for you: https://ray.readthedocs.io/en/latest/tune-searchalg.html#repeated-evaluations? Documentation here: https://ray.readthedocs.io/en/latest/tune/api_docs/suggestion.html#ray.tune.suggest.Repeater

What you could do is have the Trainable execute something different depending on the trial index:

```python
from ray import tune
from ray.tune import Trainable
from ray.tune.suggest import Repeater
from ray.tune.suggest.hyperopt import HyperOptSearch

class TestMe(Trainable):
    def _setup(self, config):
        # The Repeater injects the repeat index into the config.
        index = config[tune.suggest.repeater.TRIAL_INDEX]
        # create_from_index is your own helper for selecting a split.
        data_train, data_test = create_from_index(index, config)

    def _train(self):
        ...

tune.run(
    TestMe,
    search_alg=Repeater(HyperOptSearch(search_space), repeat=5, set_index=True))
```

Does this make sense? Feel free to follow up with any questions (or any suggestions for how we can improve the docs).
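For concreteness, one possible shape for the `create_from_index` helper used above (it is hypothetical, not a Tune API) is a k-fold split keyed by the repeat index; a minimal sketch assuming scikit-learn and an in-memory dataset, where `load_dataset` is also a hypothetical loader you would supply:

```python
from sklearn.model_selection import KFold

def create_from_index(index, config, n_splits=5):
    # Hypothetical: load the full dataset; the loader is yours to supply.
    X, y = load_dataset(config)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    # Pick the fold corresponding to this repeat's index.
    train_idx, test_idx = list(kf.split(X))[index % n_splits]
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])
```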
@richardliaw I think a modified Repeater would handle the case I'm thinking of... but I'm kind of reluctant to fit a framework around it. You could get really fancy, treat all individual runs as independent, and just update the stats/scoring mechanism in the scheduler, I guess. Will try to jump on Slack.
Can do a call to chat. @anabranch, will msg you @ anyscale... I've requested to join Slack, but it might take a few days.
Sent an invite; happy to chat online.
I think we resolved this offline (feel free to reopen if not resolved).
Curious what the specific resolution of this was. @cottrell, did you settle on the Repeater solution Richard showed above, or did you find a different way around it?
Ah, I think the resolution was to just not use this type of computation pattern. |
I'm interested in this issue. Is there an example of doing cross-validation with Tune? I'm interested in doing cross-validation with TensorFlow or Torch frameworks.
I'm also interested! @richardliaw, is the …
@richardliaw Hi, do you have any advice on using the Repeater for cross-validation? If not, do you have another technique in mind?
Hey @FarzanT, sorry for the slow reply! Repeater would allow you to run the same model/hyperparameters but use an index to do a different split. In general, I would suggest just doing the cross-validation manually within the training run. This is what we do in …
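To make the "do it manually" suggestion concrete, here is a minimal sketch using Tune's function API and scikit-learn; the dataset, model, and the mean-minus-std objective are stand-ins borrowed from the original question, not code from this thread:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold
from sklearn.svm import SVC
from ray import tune

def train_cv(config):
    # Run all folds sequentially inside one trial and report the aggregate.
    X, y = load_digits(return_X_y=True)
    scores = []
    for train_idx, test_idx in KFold(
            n_splits=5, shuffle=True, random_state=0).split(X):
        model = SVC(C=config["C"]).fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    # The aggregate from the original question: mean score minus its std.
    tune.report(mean_score=float(np.mean(scores)),
                objective=float(np.mean(scores) - np.std(scores)))

tune.run(train_cv, config={"C": tune.loguniform(1e-2, 1e2)}, num_samples=8)
```

The trade-off is that the folds run sequentially within the trial, but each trial's result is already the aggregated CV score, so any scheduler or searcher can consume it directly.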
Hi, I'm a bot from the Ray team :) To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity in the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public Slack channel.
Hi again! The issue will be closed because there has been no further activity in the 14 days since the last message. Please feel free to reopen or open a new issue if you'd still like it to be addressed. Again, you can always ask for help on our discussion forum or Ray's public Slack channel. Thanks again for opening the issue!
Here's a concept that might work: https://discuss.ray.io/t/ray-tune-confidence-interval/2967 Let me pull the code from the example:
Basically, what you could do is to use the … This was enabled by the … I'll close this issue for now, but if there are any more questions or suggestions around this topic, please feel free to re-open or create a new discuss thread/issue.
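The example referenced above did not survive extraction; assuming it resembled the linked discuss thread, a sketch of the Repeater-based pattern might look like the following. `evaluate_fold` is a hypothetical per-fold evaluation, and the Repeater averages the reported metric across repeats before passing it to the wrapped searcher:

```python
import numpy as np
from collections import defaultdict
from ray import tune
from ray.tune.suggest import Repeater
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray.tune.suggest.repeater import TRIAL_INDEX

def train_one_split(config):
    # Hypothetical trainable: evaluate only the fold selected by the
    # repeat index that the Repeater injects into the config.
    fold = config[TRIAL_INDEX]
    tune.report(score=evaluate_fold(config, fold))  # evaluate_fold: hypothetical

search_space = {"C": tune.loguniform(1e-2, 1e2)}
analysis = tune.run(
    train_one_split,
    search_alg=Repeater(
        HyperOptSearch(search_space, metric="score", mode="max"),
        repeat=5),   # averages "score" over the 5 repeats for the searcher
    num_samples=40)  # 40 trials total = 8 sampled configs x 5 repeats

# Aggregate the per-fold scores of each config into mean/std yourself.
scores = defaultdict(list)
for trial in analysis.trials:
    scores[trial.config["C"]].append(trial.last_result["score"])
summary = {c: (np.mean(v), np.std(v)) for c, v in scores.items()}
```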
Hi, if we do it manually, then there isn't any parallelization across CV folds, right? Is there any way to run different folds in parallel within a trial? Or, I guess a more general question: can Tune allow us to have a concept of trial dependencies? I'd appreciate it if you could share your thoughts on this!
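One conceivable way to get fold-level parallelism within a single trial (a sketch, not an officially recommended pattern) is to launch plain Ray tasks from inside the trainable. `evaluate_fold` is again hypothetical, and the fold tasks need cluster resources beyond what Tune reserves for the trial itself:

```python
import numpy as np
import ray
from ray import tune

@ray.remote
def run_fold(config, fold):
    # Hypothetical per-fold training/evaluation.
    return evaluate_fold(config, fold)

def train_cv_parallel(config):
    # Launch one Ray task per fold; they run concurrently if the cluster
    # has free resources beyond those reserved for this trial.
    futures = [run_fold.remote(config, fold) for fold in range(5)]
    scores = ray.get(futures)
    tune.report(objective=float(np.mean(scores) - np.std(scores)))

tune.run(train_cv_parallel, config={"C": tune.loguniform(1e-2, 1e2)})
```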
@krfricke, thank you for the example! How would you recommend this be done if the …