
Implementation of async one step q-learning #1064

Merged
merged 35 commits into mlpack:master on Aug 8, 2017

Conversation

3 participants
@ShangtongZhang
Member

ShangtongZhang commented Jul 19, 2017

Although the test time is still a bummer, I think in general it's ready for review.

namespace mlpack {
namespace rl {
class TrainingConfig

@zoq

zoq Jul 20, 2017

Member

I like the approach, but it doesn't conform with what's proposed here: #1044 (comment). I think for now this is fine. I'm not sure adding each parameter to the constructor is helpful here. It would be great to get @rcurtin's view on this.

@ShangtongZhang

ShangtongZhang Jul 20, 2017

Member

CV shouldn't be a problem, as I don't think there is any cross-validation in RL. Maybe in the future I'll need to refactor the API to make it consistent with other parts of mlpack, but it won't block the CV project for now anyway. As for the hyper-parameter tuner, I'm not very clear on how it works, and it seems there hasn't been a clear standard in that discussion.

In the hyperparameter tuner it only makes sense for the user to pass all of the hyperparameters to the Optimize() function

RL does need a hyper-parameter tuner, but if we need to pass all the hyper-parameters to Train(), it may be a disaster for the user.

@zoq

zoq Jul 20, 2017

Member

I should have mentioned that I was referring to the standard mlpack interface, especially "Hyperparameters are passed to the constructor, but not to Train()."; if we do that, we should probably do it for all methods to avoid any confusion. But I think we can do this later, and I agree we don't have to wait for the change.

@zoq

zoq Jul 20, 2017

Member

I should have mentioned that, I was referring to the standard mlpack interface: especially "Hyperparameters are passed to the constructor, but not to Train().", if we do that we should probably do this for all methods to avoid any confusion. But I I think we can do this later and I agree we don't have to wait for the change.

This comment has been minimized.

@rcurtin

rcurtin Jul 25, 2017

Member

Sorry for the slow response here---I was unavailable over the weekend so I could only get to this now. I think for the hyperparameter tuner, we would only need to add a constructor to TrainingConfig that also takes all of the parameter values. I believe the HP tuner will be able to pass hyperparameters to "nested" constructors, so even if not all hyperparameters are directly set in the top-level RL class constructors, it's ok if they are set in constructors of objects that are passed to RL classes.
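
(For illustration: a minimal sketch of a TrainingConfig along the lines rcurtin describes, keeping the default-constructor-plus-setters style while adding a full constructor that a hyperparameter tuner could call. The members used here, numWorkers and discount, are placeholders rather than the actual parameters in this PR.)

    namespace mlpack {
    namespace rl {

    class TrainingConfig
    {
     public:
      // Default constructor with reasonable defaults; individual values can be
      // adjusted through the setters below.
      TrainingConfig() : numWorkers(1), discount(0.99) { }

      // Full constructor so that all hyperparameter values can be set at once,
      // e.g. by the hyperparameter tuner.
      TrainingConfig(const size_t numWorkers, const double discount) :
          numWorkers(numWorkers), discount(discount) { }

      //! Get or modify the number of workers.
      size_t NumWorkers() const { return numWorkers; }
      size_t& NumWorkers() { return numWorkers; }

      //! Get or modify the discount factor.
      double Discount() const { return discount; }
      double& Discount() { return discount; }

     private:
      size_t numWorkers;
      double discount;
    };

    } // namespace rl
    } // namespace mlpack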

src/mlpack/tests/async_learning_test.cpp (outdated review thread)
@zoq

zoq Jul 20, 2017

Member

I'm not sure there is anything we can do about the test right now, but at the same time I'm not sure it's feasible to include the test; what do you think? I don't like removing it, since it could be helpful for everyone looking for a simple example, but running it on a daily basis is probably difficult, especially if the Python bindings are merged.

@ShangtongZhang

ShangtongZhang Jul 20, 2017

Member

Yeah, it's important to keep this test, because without this simple example users may never know the correct way to use it. In my understanding, the most important role of this test case is to teach users how to use the API rather than to make sure it converges (after all, we can test convergence offline whenever we modify RL-related code). Based on that, here are my suggestions (a rough sketch of the first option follows below):

  • Load a pre-trained network
  • Simply comment out the agent.Train(...) and leave some comments there

Which one do you prefer?
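
(For illustration only: a rough sketch of the first option, loading a pre-trained network in the test instead of training it on Travis. The model type, file name, reward threshold, and the EvaluateAgent() helper are all placeholders, not code from this PR.)

    // Option 1 (sketch): load a network that was trained and serialized offline,
    // keep the training call visible but disabled, and only check performance.
    FFN<MeanSquaredError<>, GaussianInitialization> network;
    data::Load("one_step_q_pretrained.bin", "network", network);

    // agent.Train(...);  // Done offline; rerun manually when the RL code changes.

    // EvaluateAgent() is a hypothetical helper that runs a few deterministic
    // episodes with the loaded network and returns the average reward.
    const double requiredReward = 35.0;  // Placeholder threshold.
    BOOST_REQUIRE_GE(EvaluateAgent(network), requiredReward);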

@zoq

zoq Jul 20, 2017

Member

I like the idea of using a pre-trained network, but I'm not sure what the speedup would be in this case.

@ShangtongZhang

ShangtongZhang Jul 20, 2017

Member

I can try to load a fully converged network to see what happens.

@zoq

zoq Jul 20, 2017

Member

I guess the interesting part is the Travis runtime here, so yeah, let's see.

@ShangtongZhang

ShangtongZhang Jul 20, 2017

Member
    std::ifstream ifs(fileName, std::ios::binary);
    boost::archive::binary_iarchive i(ifs);
    i >> data::CreateNVP(model, "network");
    ifs.close();

    std::ofstream ofs(fileName, std::ios::binary);
    boost::archive::binary_oarchive o(ofs);
    o << data::CreateNVP(agent.Network(), "network");
    ofs.close();

Is my serialization wrong? It works well in my local environment, but I get

unknown location(0): fatal error in "OneStepQLearningTest": std::exception: unsupported version

on the server.

@zoq

zoq Jul 20, 2017

Member

hm, I'll have to take a deeper look into the issue; have you tested:

data::Save("output_model_file.bin", "model", model, false);
data::Load("input_model_file.bin", "model", model);

I think it's easier to use.

@zoq

zoq Jul 20, 2017

Member

Looks like it works with data::Save and data::Load.
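
(As a follow-up illustration, a minimal sketch of the earlier boost::archive snippet rewritten with data::Save() and data::Load(), assuming the same fileName, model, and agent.Network() as in that snippet.)

    // Write the converged network with mlpack's serialization helper instead of
    // a raw boost::archive; the last argument controls whether errors are fatal.
    data::Save(fileName, "network", agent.Network(), false);

    // Read it back in the test; the archive format is inferred from the file
    // extension.
    data::Load(fileName, "network", model);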

ShangtongZhang added some commits Jul 26, 2017

@ShangtongZhang

ShangtongZhang Jul 27, 2017

Member

Finally made it. Now the # of threads is totally independent of the # of workers.

ShangtongZhang added some commits Jul 27, 2017

@ShangtongZhang

ShangtongZhang Jul 28, 2017

Member

Here are some stats with a random seed:
With 8 threads, it fails twice out of 20 trials; with 1 thread, it fails once out of 20 trials. In general it's stable.

@zoq

zoq Jul 28, 2017

Member

Here are some stats with a random seed:
With 8 threads, it fails twice out of 20 trials; with 1 thread, it fails once out of 20 trials. In general it's stable.

I think that's fine. I guess a quick solution would be to run multiple trials like we did here: https://github.com/mlpack/mlpack/blob/master/src/mlpack/tests/recurrent_network_test.cpp#L571. But that might be unnecessary here; what do you think?
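
(For reference, a rough sketch of that multi-trial pattern: run a handful of independent attempts and only fail the test if none of them reaches the target. RunOneTrial() and requiredReward are placeholders, not names from the actual test.)

    // Allow a few independent attempts so that a single unlucky random seed does
    // not fail the whole test; RunOneTrial() would run the training/evaluation
    // loop once and return the achieved average reward.
    bool converged = false;
    for (size_t trial = 0; trial < 4 && !converged; ++trial)
    {
      const double averageReward = RunOneTrial();
      converged = (averageReward >= requiredReward);
    }
    BOOST_REQUIRE(converged);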

@ShangtongZhang

ShangtongZhang Jul 28, 2017

Member

Maybe we don't need that for now. The probability of failure is small enough, I think.

@zoq

zoq Jul 28, 2017

Member

Agreed, we can adjust the test case if needed later.

@zoq

zoq Jul 30, 2017

Member

Thanks for looking into the issue. @rcurtin, any more comments on this one? If not, I think we can merge it.

@ShangtongZhang

ShangtongZhang Aug 2, 2017

Member

@rcurtin Do you have any more feedback on the new parallel style?

@zoq

zoq Aug 2, 2017

Member

Let's wait one more day before we merge the code.

@rcurtin

rcurtin Aug 4, 2017

Member

Sorry for the slow response. I think the new parallel implementation is a lot better; thank you for taking the time to do that! I left a couple of comments on it, but they are minor. If you can handle those, I think this is ready for merge.

ShangtongZhang added some commits Aug 4, 2017

@ShangtongZhang

ShangtongZhang Aug 8, 2017

Member

Does anyone have any more comments?

@rcurtin

rcurtin approved these changes Aug 8, 2017

Nothing from my end, I guess I should have hit approve a few days ago---sorry about that. Thanks for the hard work! :)

@zoq zoq merged commit 4f313da into mlpack:master Aug 8, 2017

3 checks passed

Style Checks: Build finished.
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@zoq

zoq Aug 8, 2017

Member

Thanks for the awesome work!
