8.2.3 Reinforcement learning

🌳 Tip 🌳
To refresh your knowledge of deep RL, check out Spinning Up in Deep RL (OpenAI).

  1. [E] Explain the explore vs exploit tradeoff with examples.

  2. [E] How would a finite or infinite horizon affect our algorithms?

  3. [E] Why do we need the discount term for objective functions?

  4. [E] Fill in the empty circles using the minimax algorithm.

    [Figure: Minimax algorithm]
  5. [M] Fill in the alpha and beta values as you traverse the minimax tree from left to right.

    [Figure: Alpha-beta pruning]
  6. [E] Given a policy, derive the reward function.

  7. [M] What are the pros and cons of on-policy vs. off-policy learning?

  8. [M] What’s the difference between model-based and model-free reinforcement learning? Which one is more data-efficient?