Second Quiz [[quiz2]]

The best way to learn, and to avoid the illusion of competence, is to test yourself. This will help you find where you need to reinforce your knowledge.

Q1: What is Q-Learning?

Q2: What is a Q-table?

Q3: Why does having an optimal Q-function Q* give us an optimal policy?

Solution

Because an optimal Q-function tells us, for each state, which action has the highest value. Always taking that best action in every state is an optimal policy.
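
As a minimal sketch of this idea, assuming the Q-function is stored as a table with one row per state and one column per action, the policy can be read off by taking the best action in each row. The `greedy_policy` helper and the `q_star` values below are hypothetical and only serve as an illustration:

```python
def greedy_policy(q_table):
    """Return, for each state, the index of the action with the highest Q-value."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_table]


# Hypothetical Q-table: 3 states (rows) x 2 actions (columns).
q_star = [
    [0.1, 0.9],
    [0.7, 0.2],
    [0.5, 0.5],  # tie: max() keeps the first maximal index
]

policy = greedy_policy(q_star)
print(policy)
```

Note that this only yields an optimal policy if the table really holds the optimal values Q*; with a partially trained table, the same code gives the greedy policy for the current estimates.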

Q4: Can you explain what the Epsilon-Greedy Strategy is?

Solution

The Epsilon-Greedy Strategy is a policy that handles the exploration/exploitation trade-off.

The idea is that we start with an initial value of epsilon, ɛ = 1.0:

With probability 1 - ɛ: we do exploitation (that is, our agent selects the action with the highest state-action pair value).
With probability ɛ: we do exploration (trying a random action).
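
The two cases above can be sketched as a single action-selection function. This is an illustrative sketch rather than a reference implementation: the function name `epsilon_greedy_action`, the `rng` parameter, and the example Q-values are assumptions, and the Q-values of the current state are assumed to be given as a plain list.

```python
import random


def epsilon_greedy_action(q_row, epsilon, rng=random):
    """Pick an action index for one state, given that state's row of Q-values."""
    if rng.random() < epsilon:
        # Exploration: choose uniformly among all actions.
        return rng.randrange(len(q_row))
    # Exploitation: choose the action with the highest Q-value.
    return max(range(len(q_row)), key=q_row.__getitem__)


# Hypothetical Q-values for one state with three actions.
q_row = [0.1, 0.9, 0.3]

# With epsilon = 0.0 the agent always exploits.
greedy_action = epsilon_greedy_action(q_row, epsilon=0.0)

# With epsilon = 1.0 the agent always explores (any valid action index).
random_action = epsilon_greedy_action(q_row, epsilon=1.0, rng=random.Random(0))
```

With ɛ = 1.0 every action is random, so in practice ɛ is usually reduced over the course of training so that the agent exploits more as its Q-value estimates improve.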

Q5: How do we update the Q-value of a state-action pair?

Solution
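
A minimal sketch of the standard Q-Learning update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') - Q(s, a)], assuming a tabular Q-function stored as a list of rows. The function name `q_update`, the parameter names `alpha` (learning rate) and `gamma` (discount factor), and the example values are assumptions chosen for illustration:

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply one Q-Learning update in place and return the new Q(state, action)."""
    # TD target: immediate reward plus the discounted best value of the next state.
    td_target = reward + gamma * max(q_table[next_state])
    # TD error: gap between the target and the current estimate.
    td_error = td_target - q_table[state][action]
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


# Hypothetical 2-state, 2-action table.
q = [[0.0, 0.0], [0.0, 1.0]]
new_value = q_update(q, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)
```

The `max` over the next state's row is what makes this the Q-Learning form of the update: the target uses the best next action, regardless of which action the agent actually takes next.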

Q6: What's the difference between on-policy and off-policy?

Solution

Congrats on finishing this Quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge.
