Merge pull request #4 from Kaixhin/master
Reduce clamping
floringogianu committed Apr 7, 2018
2 parents e2425d1 + ab257ca commit eb93978
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion policy_improvement/categorical_update.py
@@ -50,7 +50,7 @@ def accumulate_gradient(self, batch_sz, states, actions, rewards,
target_qa_probs = self._get_categorical(next_states, rewards, mask)

# Compute the cross-entropy of phi(TZ(x_,a)) || Z(x,a)
- qa_probs.data.clamp_(0.01, 0.99) # Tudor's trick for avoiding nans
+ qa_probs = qa_probs.clamp(min=1e-3) # Tudor's trick for avoiding nans
loss = - torch.sum(target_qa_probs * torch.log(qa_probs))

# Accumulate gradients
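
To illustrate what "Reduce clamping" changes numerically, here is a minimal plain-Python sketch of the loss arithmetic. It is not the repository's code: the helper `cross_entropy` and the toy distributions are hypothetical, and it uses `math` instead of tensors. It compares the old bounds (0.01, 0.99) with the new lower bound of 1e-3 on a prediction that already matches the target exactly.

```python
import math


def cross_entropy(target, probs, lo=None, hi=None):
    """Return -sum(t * log(p)), optionally clamping each p to [lo, hi].

    Hypothetical helper for illustration; mirrors the shape of
    `- torch.sum(target_qa_probs * torch.log(qa_probs))` from the diff.
    """
    total = 0.0
    for t, p in zip(target, probs):
        if lo is not None:
            p = max(p, lo)  # lower clamp keeps log(p) finite
        if hi is not None:
            p = min(p, hi)  # upper clamp (only used by the old code)
        total += t * math.log(p)
    return -total


# Toy case: the predicted distribution equals the target exactly.
target = [0.0, 1.0, 0.0]
probs = [0.0, 1.0, 0.0]

# Old behaviour: clamp to [0.01, 0.99]. The upper bound turns log(1.0)
# into log(0.99), so a perfect prediction still gets a non-zero loss.
old_loss = cross_entropy(target, probs, lo=0.01, hi=0.99)

# New behaviour: clamp only from below at 1e-3. log(1.0) stays 0.
new_loss = cross_entropy(target, probs, lo=1e-3)

# Without any clamp, log(0.0) is undefined. Plain Python raises
# ValueError here; with tensors, log(0) is -inf and 0 * -inf is nan.
try:
    cross_entropy(target, probs)
    unclamped_failed = False
except ValueError:
    unclamped_failed = True

print(old_loss, new_loss, unclamped_failed)
```

In this sketch the upper bound is unnecessary for avoiding nans, since log is well defined at 1, and only the lower bound guards against log(0). A separate difference, not modelled here, is that the new line uses the out-of-place `clamp` on the tensor itself rather than an in-place `clamp_` on `.data`, so the clamp becomes part of the autograd graph.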