<div align="center">
<img src="./img/concept_of_rl.png" height="500" width="60%"/>
</div>
<div align="center">Figure 10.1.1. Basic concepts in reinforcement learning</div>

Because the real environment is highly complex and contains much redundant information irrelevant to the problem at hand, a simulator is usually built to model the real environment. The agent generally refers to the model that makes decisions, i.e., the reinforcement learning algorithm itself. Based on the state $s_t$ at the current time step, the agent executes an action $a_t$ to interact with the environment; in turn, the environment returns to the agent the next state $s_{t+1}$ reached after the action is executed, together with the reward $r_t$ obtained for that action. The agent can collect the states, actions, and rewards over a period of time as a history $(a_1, s_1, r_1, \ldots, a_t, s_t, r_t)$ and use it to train its reinforcement learning model.
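The interaction loop described above can be sketched in a few lines of Python. The `ToyEnv` environment, the `RandomAgent` policy, and `collect_trajectory` below are all hypothetical stand-ins invented for illustration; a real system would replace them with an actual simulator and a learned policy.

```python
import random

class ToyEnv:
    """A minimal stand-in for a simulator: the state is an integer in [0, 10];
    actions move it by -1 or +1, and the reward is 1 when the state increases."""
    def __init__(self):
        self.state = 5

    def step(self, action):
        next_state = max(0, min(10, self.state + action))
        reward = 1.0 if next_state > self.state else 0.0
        self.state = next_state
        return next_state, reward  # s_{t+1}, r_t returned to the agent

class RandomAgent:
    """A placeholder policy that picks actions uniformly at random."""
    def act(self, state):
        return random.choice([-1, 1])

def collect_trajectory(env, agent, horizon):
    """Run the agent-environment loop, recording (s_t, a_t, r_t) tuples."""
    trajectory = []
    state = env.state
    for _ in range(horizon):
        action = agent.act(state)              # choose a_t from s_t
        next_state, reward = env.step(action)  # environment yields s_{t+1}, r_t
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory

history = collect_trajectory(ToyEnv(), RandomAgent(), horizon=20)
```

The recorded `history` plays the role of the trajectory $(a_1, s_1, r_1, \ldots, a_t, s_t, r_t)$ that a reinforcement learning algorithm would train on.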

$$V^{\pi}\left(s_{t}=s\right)=\mathbb{E}_{\pi}\left[r_{t}+\gamma r_{t+1}+\gamma^{2} r_{t+2}+\cdots \mid s_{t}=s\right]$$
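The discounted sum $r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots$ inside the value function can be computed with a simple backward recursion. The sketch below is illustrative; `discounted_return` is a hypothetical helper, not part of any particular library.

```python
def discounted_return(rewards, gamma=0.9):
    """Compute r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    by folding the reward sequence from the back: G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# With gamma = 0.5 and rewards [1, 1, 1]:
# 1 + 0.5 * 1 + 0.25 * 1 = 1.75
print(discounted_return([1, 1, 1], gamma=0.5))
```

Averaging such returns over many trajectories that start from state $s$ and follow policy $\pi$ gives a Monte Carlo estimate of $V^{\pi}(s)$.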