Commit f8ed470

Merge pull request #624 from kuduwa-keshavram/patch-1
Simple grammatical change in q-learning.mdx LGTM
2 parents ab308e9 + b41a9c0 commit f8ed470

1 file changed: +1 -1 lines changed

units/en/unit2/q-learning.mdx

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ The **Q comes from "the Quality" (the value) of that action at that state.**
 Let's recap the difference between value and reward:

 - The *value of a state*, or a *state-action pair* is the expected cumulative reward our agent gets if it starts at this state (or state-action pair) and then acts accordingly to its policy.
-- The *reward* is the **feedback I get from the environment** after performing an action at a state.
+- The *reward* is the **feedback it gets from the environment** after performing an action at a state.

 Internally, our Q-function is encoded by **a Q-table, a table where each cell corresponds to a state-action pair value.** Think of this Q-table as **the memory or cheat sheet of our Q-function.**

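For readers reviewing this hunk, the Q-table described in the context lines can be pictured with a minimal sketch. It is not taken from `q-learning.mdx`: the table sizes, the stored values, and the helper names `q_value` and `greedy_action` are all hypothetical, chosen only to show that each cell holds the value of one state-action pair.

```python
# Hypothetical sizes, chosen only for illustration (not from the course file).
N_STATES = 4
N_ACTIONS = 2

# The Q-table: one row per state, one column per action.
# Each cell stores the current value estimate for that state-action pair.
q_table = [[0.0 for _ in range(N_ACTIONS)] for _ in range(N_STATES)]


def q_value(state, action):
    """Look up the stored value of a state-action pair (the 'cheat sheet' read)."""
    return q_table[state][action]


def greedy_action(state):
    """Return the action with the highest stored value in the given state."""
    row = q_table[state]
    return max(range(N_ACTIONS), key=lambda action: row[action])


# Fill in two cells by hand to make the lookup visible.
q_table[1][0] = 0.5
q_table[1][1] = 2.0
```

With these hand-set values, `q_value(1, 1)` reads back the cell for state 1 and action 1, and `greedy_action(1)` picks the action whose cell is largest in that row. A reward, by contrast, would be the single number the environment returns after one action, not a table entry.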