
Async n-step q-learning and one step sarsa #1084

Merged: 4 commits into mlpack:master on Aug 23, 2017
Conversation

@ShangtongZhang (Member)

No description provided.

@zoq (Member) left a comment

I see what you meant by "share many common code snippets", and I think it's fine to duplicate some code here to improve readability.

@@ -137,6 +139,22 @@ template <
>
class OneStepQLearningWorker;

template <
typename EnvironmentType,
zoq: Can you comment on the template parameters?
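For reference, a sketch of what the requested documentation could look like, reusing the @tparam descriptions the author added below; the class name NStepQLearningWorker is inferred from the PR title, not quoted from the diff:

```cpp
/**
 * Forward declaration of the n-step Q-Learning worker.
 *
 * @tparam EnvironmentType The type of the reinforcement learning task.
 * @tparam NetworkType The type of the network model.
 * @tparam UpdaterType The type of the optimizer.
 * @tparam PolicyType The type of the behavior policy.
 */
template <
  typename EnvironmentType,
  typename NetworkType,
  typename UpdaterType,
  typename PolicyType
>
class NStepQLearningWorker;
```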

@ShangtongZhang (Member, Author)
Would there be more comments on this PR?

* @tparam EnvironmentType The type of the reinforcement learning task.
* @tparam NetworkType The type of the network model.
* @tparam UpdaterType The type of the optimizer.
* @tparam PolicyType The type of the behavior policy. *
zoq: Can you remove the * at the end?

using ActionType = typename EnvironmentType::Action;
using TransitionType = std::tuple<StateType, ActionType, double, StateType>;

/**
zoq: Do you mind adding a method description here? Something like this should do:

Construct N step Q-Learning worker with the given parameters and environment.
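A sketch of how that description could sit on the constructor. The parameter list is pieced together from fragments quoted elsewhere in this review (updater, config, deterministic); the environment parameter and the TrainingConfig type name are assumptions:

```cpp
/**
 * Construct an n-step Q-Learning worker with the given parameters and
 * environment.
 *
 * @param updater The optimizer.
 * @param environment The reinforcement learning task.
 * @param config Hyper-parameters for training.
 * @param deterministic Whether to run the worker deterministically.
 */
NStepQLearningWorker(const UpdaterType& updater,
                     const EnvironmentType& environment,
                     const TrainingConfig& config,
                     bool deterministic);
```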

network.Backward(actionValue, gradients);

// Accumulate gradients.
totalGradients += gradients;
zoq: We should initialize totalGradients with zeros.
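A minimal sketch of the fix using Armadillo's zeros initializer; the matrix dimensions are assumed to follow the later hunk, which sizes totalGradients from learningNetwork.Parameters():

```cpp
// Zero-initialize the accumulator so the += below starts from a
// defined state instead of uninitialized memory.
arma::mat totalGradients = arma::zeros<arma::mat>(
    learningNetwork.Parameters().n_rows,
    learningNetwork.Parameters().n_cols);

// Each step's backward pass then accumulates into it.
network.Backward(actionValue, gradients);
totalGradients += gradients;
```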

* @tparam EnvironmentType The type of the reinforcement learning task.
* @tparam NetworkType The type of the network model.
* @tparam UpdaterType The type of the optimizer.
* @tparam PolicyType The type of the behavior policy. *
zoq: Can you remove the * at the end?

ActionType>;

/**
* @param updater The optimizer.
zoq: Can you add a method description here?

if (terminal || pendingIndex >= config.UpdateInterval())
{
// Initialize the gradient storage.
arma::mat totalGradients(learningNetwork.Parameters().n_rows,
zoq: We should initialize totalGradients with zeros here.
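Concretely, the quoted declaration with an explicit fill argument (a sketch; the column expression mirrors the row expression):

```cpp
// Initialize the gradient storage with zeros.
arma::mat totalGradients(learningNetwork.Parameters().n_rows,
    learningNetwork.Parameters().n_cols, arma::fill::zeros);
```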

double targetActionValue = actionValue[std::get<4>(transition)];
if (terminal && i == pending.size() - 1)
targetActionValue = 0;
targetActionValue = std::get<2>(transition) +
zoq: Should we use an else case here? That way we can slightly simplify the expression when (terminal && i == pending.size() - 1) is true.
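A sketch of the suggested if/else. It assumes the truncated line continues with config.Discount() times the bootstrapped value, and reads the tuple fields as in the diff (std::get<2> is the reward, std::get<4> indexes the next action value):

```cpp
double targetActionValue;
if (terminal && i == pending.size() - 1)
{
  // Terminal tail step: no bootstrapping, the target is just the reward.
  targetActionValue = std::get<2>(transition);
}
else
{
  // Bootstrap from the estimated value of the recorded action.
  targetActionValue = std::get<2>(transition) +
      config.Discount() * actionValue[std::get<4>(transition)];
}
```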

config(config),
deterministic(deterministic),
pending(config.UpdateInterval())
{ reset(); }
zoq: Can you use Upper camel casing for all method names?
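Applied to the quoted constructor body, the rename would look like this (a sketch; it assumes reset() is the worker's own member):

```cpp
config(config),
deterministic(deterministic),
pending(config.UpdateInterval())
{ Reset(); }
```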

@ShangtongZhang (Member, Author)
Thanks for your feedback. Hope it's ready to merge now.

@zoq (Member) left a comment

I think this is ready to go; let's go ahead and wait two more days before merging it in, in case anyone else has comments.

@zoq merged commit c8110ab into mlpack:master on Aug 23, 2017
@zoq (Member) commented on Aug 23, 2017:

Thanks for another great contribution.
