I think there are some small bugs in your code. #1

ZYunfeii · 2021-07-27T03:31:19Z

In train.py, your code is as follows:

# PART II hindsight replay

for i, transition in enumerate(episode_cache):
    new_goals = generate_goals(i, episode_cache, args.HER_sample_num)
    for new_goal in new_goals:
        reward = calcu_reward(new_goal, state, action) 
        state, action, new_state = gene_new_sas(new_goal, transition)
        ram.add(state, action, reward, new_state)

But I think it should be like this:

# PART II hindsight replay

for i, transition in enumerate(episode_cache):
    new_goals = generate_goals(i, episode_cache, args.HER_sample_num)
    for new_goal in new_goals:
        state = transition[0]
        action = transition[1]
        reward = calcu_reward(new_goal, state, action) 
        state, action, new_state = gene_new_sas(new_goal, transition) # one transition is relabeled with each of the sampled goals
        ram.add(state, action, reward, new_state)

Otherwise, the algorithm does not converge; I have tried training it.
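
To make the ordering issue concrete, here is a minimal, self-contained sketch of the relabeling loop. It is not the repository's code: the helper bodies, the `(state, action, new_state)` transition layout, the toy 1-D environment, and the `hindsight_replay` wrapper are all assumptions made for illustration. The point it shows is that the reward is computed from the current transition's own state and action, before those names are overwritten by the goal-augmented versions.

```python
import random

# Hypothetical stand-ins for the helpers in train.py. Their real signatures,
# the transition layout, and the reward definition may differ.
# Assumed transition layout: (state, action, new_state), with integer
# positions on a line and actions of +1 or -1.

def generate_goals(i, episode_cache, sample_num):
    """Sample goals from states reached at or after step i ("future" strategy)."""
    future = episode_cache[i:]
    k = min(sample_num, len(future))
    return [t[2] for t in random.sample(future, k)]

def calcu_reward(new_goal, state, action):
    """Sparse reward: 0 if this step lands on the goal, otherwise -1."""
    return 0.0 if state + action == new_goal else -1.0

def gene_new_sas(new_goal, transition):
    """Attach the substituted goal to the state and to the next state."""
    state, action, new_state = transition
    return (state, new_goal), action, (new_state, new_goal)

def hindsight_replay(episode_cache, sample_num, buffer):
    """Relabel every transition with sampled goals and store the results."""
    for i, transition in enumerate(episode_cache):
        for new_goal in generate_goals(i, episode_cache, sample_num):
            # Read state and action from the current transition first, so the
            # reward is not computed from values left over from an earlier pass.
            state, action = transition[0], transition[1]
            reward = calcu_reward(new_goal, state, action)
            g_state, g_action, g_new_state = gene_new_sas(new_goal, transition)
            buffer.append((g_state, g_action, reward, g_new_state))
    return buffer

if __name__ == "__main__":
    episode = [(0, 1, 1), (1, 1, 2), (2, 1, 3)]
    for item in hindsight_replay(episode, 2, []):
        print(item)
```

In the original ordering, `calcu_reward` reads `state` and `action` before they are assigned for the current transition, so it sees whatever the previous iteration left in those names (already goal-augmented values), and the stored reward no longer matches the stored transition.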
