This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Truncation for exploration? #13

Closed
AaronLiu1997 opened this issue Nov 13, 2021 · 2 comments

Comments

@AaronLiu1997

I'm reading the paper and code and can't follow the truncation process. Table 2 sets the exploration stddev. clip to 0.3, so I assumed the exploration noise is clipped. However, the action seems to be selected with `action = dist.sample(clip=None)`, which does no clipping. Instead, clipping seems to be applied only during training, with `dist.sample(clip=self.stddev_clip)`. Am I misunderstanding something here? Thanks!!

@denisyarats
Contributor

That is correct: the exploration noise is clipped only when the sampled action is used to query the Q function, not when the action is selected for acting in the environment.
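To make the distinction concrete, here is a minimal sketch of the two sampling modes. It is not the repository's code: `sample_action`, its parameters, and the use of Python's `random` module are assumptions for illustration. The only values taken from the thread are the `clip=None` call for acting, the clipped call for training, and the clip value 0.3 from Table 2. The stddev of 1.0 is deliberately large so the effect is visible.

```python
import random


def sample_action(mean, stddev, clip=None, low=-1.0, high=1.0, rng=random):
    """Return mean + Gaussian noise, optionally clipping the noise itself.

    Hypothetical helper: `clip` bounds the noise term only, while
    `low`/`high` bound the final action in both modes.
    """
    noise = rng.gauss(0.0, 1.0) * stddev
    if clip is not None:
        # Training-time behaviour: truncate the noise before adding it.
        noise = max(-clip, min(clip, noise))
    action = mean + noise
    # The action is always kept inside the valid action range.
    return max(low, min(high, action))


rng = random.Random(0)

# Acting in the environment: noise is not clipped (clip=None).
acting = [sample_action(0.0, 1.0, clip=None, rng=rng) for _ in range(1000)]

# Querying the Q function during training: noise is clipped to +/- 0.3.
training = [sample_action(0.0, 1.0, clip=0.3, rng=rng) for _ in range(1000)]

print("largest |action| when acting:  ", max(abs(a) for a in acting))
print("largest |action| when training:", max(abs(a) for a in training))
```

With a mean of 0, the training-time samples never leave the interval [-0.3, 0.3], while the acting-time samples are limited only by the action bounds.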

@AaronLiu1997
Author

Got it, thanks! I guess I was just confused about the terminology then!
