units/en/unit2/q-learning.mdx (1 addition, 1 deletion)
@@ -19,7 +19,7 @@ The **Q comes from "the Quality" (the value) of that action at that state.**
 Let's recap the difference between value and reward:

 - The *value of a state*, or a *state-action pair*, is the expected cumulative reward our agent gets if it starts at this state (or state-action pair) and then acts according to its policy.
-- The *reward* is the **feedback I get from the environment** after performing an action at a state.
+- The *reward* is the **feedback it gets from the environment** after performing an action at a state.

Internally, our Q-function is encoded by **a Q-table, a table where each cell corresponds to a state-action pair value.** Think of this Q-table as **the memory or cheat sheet of our Q-function.**
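The Q-table described above can be sketched as a small array: one row per state, one column per action, each cell holding the estimated value of that state-action pair. This is a hypothetical example (4 states, 2 actions, made-up values, not from the course file):

```python
import numpy as np

# Hypothetical tiny environment: 4 states, 2 actions (0 = "left", 1 = "right").
n_states, n_actions = 4, 2

# Each cell Q[s, a] is the estimated value of taking action a in state s.
# Initialized to zero: before training, the agent knows nothing.
q_table = np.zeros((n_states, n_actions))

# After some (made-up) training updates, the table acts as a cheat sheet:
q_table[0, 1] = 0.5   # in state 0, "right" looks promising
q_table[3, 0] = 1.0   # in state 3, "left" leads toward the goal

# Greedy policy: look up the best-valued action for a given state.
best_action = int(np.argmax(q_table[0]))
print(best_action)  # 1
```

The lookup at the end is what makes the Q-table a "memory": acting greedily is just reading the row for the current state and picking the column with the highest value.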