SAC custom env #43

hn2 · 2022-03-05T07:51:29Z

I get this error:

ValueError Traceback (most recent call last)
in
26 score = 0
27 while not done:
---> 28 action = agent.choose_action(observation)
29 observation_, reward, done, info = env.step(action)
30 score += reward

in choose_action(self, observation)
23 def choose_action(self, observation):
24 state = T.Tensor([observation]).to(self.actor.device)
---> 25 actions, _ = self.actor.sample_normal(state, reparameterize=False)
26
27 return actions.cpu().detach().numpy()[0]

in sample_normal(self, state, reparameterize)
38 def sample_normal(self, state, reparameterize=True):
39 mu, sigma = self.forward(state)
---> 40 probabilities = Normal(mu, sigma)
41
42 if reparameterize:

~\Anaconda3\lib\site-packages\torch\distributions\normal.py in __init__(self, loc, scale, validate_args)
48 else:
49 batch_shape = self.loc.size()
---> 50 super(Normal, self).__init__(batch_shape, validate_args=validate_args)
51
52 def expand(self, batch_shape, _instance=None):

~\Anaconda3\lib\site-packages\torch\distributions\distribution.py in __init__(self, batch_shape, event_shape, validate_args)
54 if not valid.all():
55 raise ValueError(
---> 56 f"Expected parameter {param} "
57 f"({type(value).__name__} of shape {tuple(value.shape)}) "
58 f"of distribution {repr(self)} "

ValueError: Expected parameter loc (Tensor of shape (1, 1, 1)) of distribution Normal(loc: tensor([[[nan]]], device='cuda:0', grad_fn=), scale: tensor([[[nan]]], device='cuda:0', grad_fn=)) to satisfy the constraint Real(), but found invalid values:
tensor([[[nan]]], device='cuda:0', grad_fn=)

Any idea how to fix this?
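The ValueError only reports the symptom: `Normal` validates that `loc` is real-valued, and here both `mu` and `sigma` coming out of the actor are already NaN. With a custom environment, one possible source is a NaN or inf in the observation or reward the environment returns. The sketch below is a hypothetical helper (`assert_finite` is not part of the SAC code in this repo) that fails fast at the environment boundary, so the offending value is reported before it reaches `agent.choose_action` or the replay buffer. It assumes observations are flat (1-D) sequences of numbers.

```python
import math
from typing import Iterable


def assert_finite(name: str, values: Iterable[float]) -> None:
    """Raise ValueError naming the first NaN/inf entry in `values`.

    Hypothetical helper: assumes `values` is a flat sequence of numbers
    (e.g. a 1-D list or array returned by the custom env).
    """
    for i, v in enumerate(values):
        if not math.isfinite(float(v)):
            raise ValueError(f"{name}[{i}] is not finite: {v!r}")


# Possible usage inside the training loop from the traceback above:
# observation_, reward, done, info = env.step(action)
# assert_finite("observation_", observation_)
# assert_finite("reward", [reward])
```

If these checks never fire, the NaN is more likely produced inside the networks themselves (for example by exploding gradients), and the environment can be ruled out.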
