🌳 Tip 🌳
To refresh your knowledge of deep RL, check out Spinning Up in Deep RL (OpenAI).

[E] Explain the explore vs exploit tradeoff with examples.
[E] How would a finite or infinite horizon affect our algorithms?
[E] Why do we need the discount term for objective functions?
[E] Fill in the empty circles using the minimax algorithm.
[M] Fill in the alpha and beta values as you traverse the minimax tree from left to right.
[E] Given a policy, derive the reward function.
[M] What are the pros and cons of on-policy vs. off-policy learning?
[M] What’s the difference between model-based and model-free? Which one is more data-efficient?
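
For practice with the two minimax questions above, here is a minimal sketch of minimax with alpha-beta pruning. It is not the tree from the questions: the nested-list tree encoding, the function name `alphabeta`, and the leaf values are all assumptions made for illustration. A node is either a number (a leaf payoff) or a list of child nodes, and the root is assumed to be a maximizing node.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of `node`, pruning with alpha-beta bounds.

    Hypothetical encoding: a leaf is a number, an internal node is a list
    of children. Children are visited left to right.
    """
    if not isinstance(node, list):
        return node  # leaf: its payoff is its value

    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)  # best value the maximizer can guarantee so far
            if alpha >= beta:
                break  # the minimizer above will never allow this branch
        return best

    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)  # best value the minimizer can guarantee so far
        if alpha >= beta:
            break  # the maximizer above already has a better option
    return best


# Hypothetical example: a max root with three min children.
tree = [[3, 5], [2, 9], [0, 7]]
root_value = alphabeta(tree, maximizing=True)
print(root_value)
```

In this example tree, the first min node evaluates to 3, so alpha becomes 3 at the root. In each of the remaining min nodes the first leaf (2, then 0) already drops beta to at most alpha, so the second leaf (9, then 7) is never visited. Tracing alpha and beta by hand on a small tree like this one is a reasonable warm-up before filling in the values for a larger tree.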
