Truncation for exploration? #13

AaronLiu1997 · 2021-11-13T23:53:20Z

I'm reading the paper and code and can't follow the truncation process. Table 2 sets "exploration stddev. clip" to 0.3, so I assumed the exploration noise is clipped. However, the action seems to be selected with `action = dist.sample(clip=None)`, which does no clipping. Instead, clipping appears to be applied during training with `dist.sample(clip=self.stddev_clip)`. Am I misunderstanding something here? Thanks!!

denisyarats · 2021-11-15T00:24:48Z

This is correct: the exploration noise is clipped only when the sampled action is used to query the Q-function.
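For readers following along, here is a minimal sketch of the distinction discussed in this thread. It is not the repository's implementation: `sample_action` is a hypothetical helper, it uses Python's standard `random` module instead of PyTorch, and the standard deviation of 1.0 is chosen only to make the effect visible. The 0.3 clip is the Table 2 value quoted in the question. The assumption is that `clip` bounds the noise term before it is added to the mean, and that the resulting action is always clamped to the valid action range.

```python
import random


def sample_action(mean, stddev, clip=None, low=-1.0, high=1.0, rng=random):
    """Hypothetical helper: return mean + Gaussian noise, clamped to [low, high].

    If clip is given, the noise itself is first limited to [-clip, clip].
    """
    noise = rng.gauss(0.0, stddev)
    if clip is not None:
        # Clip only the noise term, not the final action.
        noise = max(-clip, min(clip, noise))
    # The action is always kept inside the valid action range.
    return max(low, min(high, mean + noise))


rng = random.Random(0)
mean, stddev = 0.0, 1.0  # illustrative values, not taken from the paper

# Acting in the environment: noise is not clipped (clip=None).
explore = [sample_action(mean, stddev, clip=None, rng=rng) for _ in range(1000)]

# Querying the Q-function during an update: noise is clipped to +/- 0.3.
update = [sample_action(mean, stddev, clip=0.3, rng=rng) for _ in range(1000)]

print("max |action| when exploring:", max(abs(a) for a in explore))
print("max |action| when updating: ", max(abs(a) for a in update))
```

With a mean of 0, the actions sampled for the update never stray more than 0.3 from the mean, while the exploration actions are limited only by the action bounds.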

AaronLiu1997 · 2021-11-15T15:57:03Z

Got it, thanks! I guess I was just confused about the terminology then!

AaronLiu1997 closed this as completed Nov 15, 2021
