Introduction to Reinforcement Learning: A Short Course

scott-moura/rl

Introduction to Reinforcement Learning

Welcome! This course is jointly taught by UC Berkeley and the Tsinghua-Berkeley Shenzhen Institute (TBSI).

Instructors

  • Prof. Scott Moura (UC Berkeley) <smoura [at] berkeley.edu>
  • Co-Instructor Saehong Park (UC Berkeley) <sspark [at] berkeley.edu>
  • TA Xinyi Zhou (TBSI) <zxyyx48 [at] 163.com>

Course Schedule

| China Time | California Time |
| --- | --- |
| July 7, 8, 9, 10 (Tu-F) | July 6, 7, 8, 9 (M-Th) |
| July 14, 15, 16, 17 (Tu-F) | July 13, 14, 15, 16 (M-Th) |
| all at 08:30-10:05 China Time | all at 5:30pm-7:05pm PT |
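
The China-to-Pacific conversion in the schedule above can be sanity-checked with a short script. This is a minimal sketch, not part of the course materials. It assumes the course ran in 2020 (the schedule gives no year; in 2020, July 7 falls on a Tuesday, which matches the "Tu-F" label) and that "PT" means Pacific Daylight Time (UTC-7) in July. The helper name `china_to_pacific` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the year is not stated in the schedule; 2020 is used here
# because July 7, 2020 is a Tuesday, consistent with the "Tu-F" label.
ASSUMED_YEAR = 2020

# Fixed UTC offsets. China Standard Time is UTC+8 all year.
# Assumption: "PT" in July means Pacific Daylight Time, UTC-7.
CHINA = timezone(timedelta(hours=8), "CST")
PACIFIC = timezone(timedelta(hours=-7), "PDT")


def china_to_pacific(month, day, hour, minute):
    """Convert a China-time lecture timestamp to Pacific Daylight Time."""
    china_time = datetime(ASSUMED_YEAR, month, day, hour, minute, tzinfo=CHINA)
    return china_time.astimezone(PACIFIC)


# First lecture: July 7, 08:30-10:05 China Time.
start = china_to_pacific(7, 7, 8, 30)
end = china_to_pacific(7, 7, 10, 5)

print(start.isoformat())  # 2020-07-06T17:30:00-07:00
print(end.isoformat())    # 2020-07-06T19:05:00-07:00
```

The 15-hour offset is why each China-time lecture date maps to the previous calendar day in California.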

Day-by-Day Schedule

| Day | Topic | Speaker | Pre-recorded Lecture | Slides / Notes | Real-time Lecture Recordings |
| --- | --- | --- | --- | --- | --- |
| 1 | 1a. Introduction - Course Org | Scott Moura | Zoom Recording PW: 1e*OV@Re | LEC1a Slides | Recording Link PW: 9L%JePa= |
| | 1b. Introduction - History of RL | Scott Moura | Zoom Recording PW: 1k.E69^o | LEC1a Slides | |
| | 1c. Optimal Control Intro | Scott Moura | Zoom Recording PW: 2B&=2@*@ | | |
| 2 | 2a. Dynamic Programming | Scott Moura | Zoom Recording PW: 3F*1rg%? | LEC2a Notes | Recording Link PW: 8Q?#51=J |
| | 2b. Case Study: Linear Quadratic Regulator (LQR) | Scott Moura | Zoom Recording PW: 5Y#4=58& | LEC2b Notes | |
| 3 | 3a. Policy Evaluation & Policy Improvement | Scott Moura | Zoom Recording PW: 9N@%H4&@ | LEC3a Notes | Recording Link PW: 1A@@0G63 |
| | 3b. Policy Iteration Algo | Scott Moura | Zoom Recording PW: 6y+!+6#9 | LEC3b Notes | |
| | 3c. Case Study: LQR | Scott Moura | Zoom Recording PW: 6D@YkC&= | LEC3c Notes | |
| 4 | 4a. Approximate DP: TD Error & Value Function Approx. | Scott Moura | Zoom Recording PW: 6v&78$We | LEC4a Notes | Recording Link PW: 4t=#ye7T |
| | 4b. Case Study: LQR | Scott Moura | Zoom Recording PW: 1O^fh.8+ | LEC4b Notes | Installation Recording PW: 2s+83!eQ |
| | 4c. Online RL with ADP | Scott Moura | Zoom Recording PW: 0q=.4378 | LEC4c Notes | |
| 5 | 5a. Actor-Critic Method | Scott Moura | Zoom Recording PW: 2y!@@#$7 | LEC5a Notes | Recording Link PW: 1Z^6B28+ |
| | 5b. Case Study: Offshore Wind | Scott Moura | | LEC5b Notes | |
| 6 | 6a. Markov Decision Process | Saehong Park | Zoom Recording PW: 5L=*%&2i | LEC6 Notes | Recording Link PW: 4L*=91?@ |
| | 6b. Q-Learning | Saehong Park | Zoom Recording PW: 3K!+fj^V | | |
| 7 | 7a. Policy Optimization | Saehong Park | Zoom Recording PW: 0W$fa0$M | LEC7a Notes | Recording Link PW: 9j++=3$5 |
| | 7b. Policy Gradient | Saehong Park | Zoom Recording PW: 2N++5&I3 | LEC7b Notes | |
| | 7c. Policy Gradient | Saehong Park | Zoom Recording PW: 3j%n80** | LEC7c Notes | |
| 8 | 8a. Actor Critic | Saehong Park | Zoom Recording PW: 2F!WI9$8 | LEC8a Notes | Recording Link PW: 0W$+=9P* |
| | 8b. Actor Critic | Saehong Park | Zoom Recording PW: 9r$HH%59 | LEC8b Notes | |
| | 8c. RL for Energy Systems: Battery Fast-charging | Saehong Park | Zoom Recording PW: 9r$HH%59 | Slides | |

Topic Outline

  1. Optimal Control
  2. Dynamic Programming
    1. Principle of Optimality & Value Functions
      • Case Study: Linear Quadratic Regulator (LQR)
  3. Policy Evaluation & Policy Improvement
    1. Policy Iteration Algorithm & Variants
      • Case Study: LQR
  4. Approximate Dynamic Programming (ADP)
    1. Temporal Difference (TD) Error
    2. Value Function Approximation
      • Case Study: LQR
    3. Online RL with ADP
    4. Actor-Critic Method
      • Case Study: Offshore Wind
  5. Q-Learning
    1. Q-learning algorithm
    2. Advanced Q-learning algorithm, e.g., DQN
  6. Policy Gradient
    1. Policy Optimization
    2. Vanilla policy gradient (REINFORCE)
  7. Actor-Critic using Policy Gradient
    1. Actor-Critic using Policy Gradient
    2. Advanced Actor-Critic algorithm, e.g., DDPG
  8. RL for energy systems
    1. Case Study: Battery Fast-charging

Lecture Notes

Jupyter Notebook
