[S-02-2] UOF + DDPG Implementation #11

Closed
3 of 4 tasks
Tracked by #8
CUN-bjy opened this issue Aug 25, 2021 · 1 comment
Comments


CUN-bjy commented Aug 25, 2021

to-do-list

Results

@CUN-bjy CUN-bjy changed the title -> UOF + DDPG Implementation UOF + DDPG Implementation Aug 28, 2021
@CUN-bjy CUN-bjy changed the title UOF + DDPG Implementation [S-02-2] UOF + DDPG Implementation Aug 28, 2021
@CUN-bjy CUN-bjy added this to Stage 2 in project-sandwich-man Aug 28, 2021
@CUN-bjy CUN-bjy added this to the Stage 2 milestone Aug 28, 2021
@CUN-bjy CUN-bjy self-assigned this Sep 1, 2021
@CUN-bjy CUN-bjy added the enhancement New feature or request label Sep 1, 2021

CUN-bjy commented Oct 4, 2021

Discussion

  1. MountainCarContinuous has dependencies between consecutive actions and between states and actions.
  • With HAC (Hierarchical Actor-Critic), this gap can be handled because any state (position and velocity) can serve as a subgoal.
  • With UOF, on the other hand, the gap cannot be handled because we cannot choose optimal subgoals (we do not know enough about the environment's domain).
  2. MountainCarContinuous has just one final goal and a narrow range of start positions.
  • This makes it harder for UOF to converge. Since UOF uses only a few subgoals (and we do not even know whether they are good ones), we cannot collect diverse states and achieved goals for the hindsight framework.
  3. MountainCarContinuous requires harder, continuous hierarchical reasoning.
  • In the UOF paper, the algorithm was tested only on a block-stacking manipulation problem with various start and goal positions. In that environment, the agent's state does not affect its behavior.
  • We cannot handle MountainCarContinuous using only discrete (and unknown) subgoals.
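
The dependency described in point 1 can be seen in a small simulation. The sketch below re-implements the MountainCarContinuous dynamics in plain Python so that it runs without Gym. The constants and the update rule are assumed to match Gym's implementation and should be checked against the installed version. `step`, `rollout`, `always_right`, and `follow_momentum` are hypothetical names introduced only for this illustration and are not part of the UOF or HAC code.

```python
import math

# Assumed to match Gym's MountainCarContinuous-v0; verify against your version.
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POS = 0.45
POWER = 0.0015


def step(position, velocity, action):
    """Advance the car by one time step; the action is clipped to [-1, 1]."""
    force = max(-1.0, min(1.0, action))
    velocity += force * POWER - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the left wall stops the car
    return position, velocity


def rollout(policy, start=-0.5, max_steps=999):
    """Run one episode; return (goal_reached, steps_taken, rightmost_position)."""
    position, velocity = start, 0.0
    rightmost = position
    for t in range(1, max_steps + 1):
        position, velocity = step(position, velocity, policy(position, velocity))
        rightmost = max(rightmost, position)
        if position >= GOAL_POS:
            return True, t, rightmost
    return False, max_steps, rightmost


def always_right(position, velocity):
    """Ignore the state and push toward the goal at full power."""
    return 1.0


def follow_momentum(position, velocity):
    """Push in the direction the car is already moving to build up speed."""
    return 1.0 if velocity >= 0 else -1.0


if __name__ == "__main__":
    for name, policy in [("always_right", always_right),
                         ("follow_momentum", follow_momentum)]:
        reached, steps, rightmost = rollout(policy)
        print(f"{name}: reached={reached} steps={steps} rightmost={rightmost:.3f}")
```

Under these assumed dynamics the engine is too weak to climb the hill directly, so the state-independent policy never reaches the goal, while the policy that conditions on velocity does. A good action therefore depends on the current state and on the momentum built up by earlier actions.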

Closing this issue:

  • We need to test on a block-stacking manipulation problem.
  • UOF is not good enough as a general hierarchical framework.

@CUN-bjy CUN-bjy closed this as completed Oct 4, 2021