Commit
Merge pull request #10 from Ynjxsjmh/patch-1
Fix equation 3.18
qiwihui committed Oct 25, 2020
2 parents d6635b4 + 8e01938 commit bbf8e8d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion source/partI/chapter3/finite_markov_decision_process.rst
@@ -624,7 +624,7 @@ The MDP framework is abstracted from the problem of goal-directed learning from interaction.
\begin{align*}
q_*(s,a) &= \mathbb{E}\left[R_{t+1}+\gamma\sum_{a^\prime}q_*(S_{t+1,a^\prime})|S_t=s,A_t=a\right] \\
-&=\sum_{s^\prime,r}p(s^\prime,r|s,a)[r+\gamma \sum_{a^\prime}q_*(s^\prime,a^\prime)]
+&=\sum_{s^\prime,r}p(s^\prime,r|s,a)[r+\gamma \max_{a^\prime}q_*(s^\prime,a^\prime)]
\end{align*}
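The backup diagrams in the figure below show graphically the spans of future states and actions considered in the Bellman optimality equations for :math:`v_*` and :math:`q_*`.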
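To illustrate why the corrected line takes a maximum over the next action rather than a sum, here is a minimal sketch that applies the backup :math:`q(s,a) \leftarrow \sum_{s^\prime,r}p(s^\prime,r|s,a)[r+\gamma \max_{a^\prime}q(s^\prime,a^\prime)]` repeatedly until it settles. The two-state MDP, its rewards, the discount factor, and the names `p`, `backup`, and `solve` are all hypothetical, chosen only for illustration; they are not part of the repository.

```python
# Hypothetical two-state, two-action MDP; every number here is made up.
# p[(s, a)] lists (probability, next_state, reward) triples, i.e. p(s', r | s, a).
p = {
    (0, 0): [(1.0, 0, 0.0)],
    (0, 1): [(1.0, 1, 1.0)],
    (1, 0): [(1.0, 0, 0.0)],
    (1, 1): [(1.0, 1, 2.0)],
}
states = (0, 1)
actions = (0, 1)
gamma = 0.5  # assumed discount factor


def backup(q, s, a):
    """One Bellman optimality backup for the pair (s, a)."""
    return sum(
        prob * (r + gamma * max(q[(s2, a2)] for a2 in actions))
        for prob, s2, r in p[(s, a)]
    )


def solve(sweeps=200):
    """Apply the backup to every (s, a) pair for a fixed number of sweeps."""
    q = {sa: 0.0 for sa in p}
    for _ in range(sweeps):
        q = {(s, a): backup(q, s, a) for s in states for a in actions}
    return q


q_star = solve()
```

In this toy problem the result is a fixed point of the backup: applying `backup` once more to any pair in `q_star` leaves its value unchanged, which is the property the corrected equation expresses.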