Practical_RL

A course on reinforcement learning in the wild. Taught on campus at HSE and Yandex SDA (in Russian) and maintained to be friendly to online students (in both English and Russian).

Manifesto:

Optimize for the curious. For all the materials that aren’t covered in detail there are links to more information and related materials (D.Silver/Sutton/blogs/whatever). Assignments will have bonus sections if you want to dig deeper.
Practicality first. Everything essential to solving reinforcement learning problems is worth mentioning. We won't shy away from covering tricks and heuristics. For every major idea there should be a lab that lets you “feel” it on a practical problem.
Git-course. Know a way to make the course better? Noticed a typo in a formula? Made the code more readable? Made a version for an alternative framework? You're awesome! Pull-request it!

Coordinates and useful links

HSE classes are on Mondays at 18:10 in Room 505
YSDA classes are on Thursdays at 18:00 in the "Princeton" classroom
Lecture slides are here.
Online student survival guide
Installing the libraries - guide and issues thread
Magical button that creates a VM (may be down from time to time; if it doesn't load within 2-3 minutes, it's down)
Telegram chat room (Russian)
Gitter chat room (English)
How to submit homework [HSE and YSDA only]: anytask instructions and grading rules
E-mail for everything else: practicalrl17@gmail.com (please don't submit homework via e-mail)
Anonymous feedback form for everything that didn't go through e-mail.
About the course

Announcements

01.03.17 - YSDA deadline on week2 homework moved to 08.03.17
28.02.17 - (HSE) homework 4 published
24.02.17 - Dependencies updated (same url). Please install theano/lasagne/agentnet by week4 or make sure you're familiar enough with your deep learning framework of choice.
23.02.17 - YSDA homework 2 can be found here. If you're from HSE, you can submit either the old or the new version, whichever you prefer.
17.02.17 - warning! we force-pushed into the repository. Please back-up your github files before you pull!
16.02.17 - Lecture slides are now available through urls in README files for each week like this. You can also find full archive here.
16.02.17 - HSE homework 3 added
14.02.17 - HSE deadlines for weeks 1-2 extended!
14.02.17 - anytask invites moved here
14.02.17 - if you're from HSE track and we didn't reply to your week0 homework submission, raise panic!
11.02.17 - week2 success thresholds are now easier: get >+50 for LunarLander or >-180 for MountainCar. Solving env will yield bonus points.
13.02.17 - Added invites for anytask.org
10.02.17 - from now on, we'll formally describe homework and add useful links via ./week*/README.md files. Example.
9.02.17 - YSDA track started
7.02.17 - HWs checked up
6.02.17 - week2 uploaded
27.01.17 - merged fix by omtcyfz, thanks!
27.01.17 - added course mail for homework submission: practicalrl17@gmail.com
23.01.17 - first class happened
23.01.17 - created repo

Syllabus

week0 Welcome to the MDP
Lecture: RL problems around us. Markov decision process. Simple solutions through combinatorial optimization.
Seminar: Frozenlake with genetic algorithms
- Homework description - ./week0/README.md
- HSE Homework deadline: 23.59 1.02.17
- YSDA Homework deadline: 23.59 19.02.17
week1 Crossentropy method and monte-carlo algorithms
Lecture: Crossentropy method in general and for RL. Extension to continuous state & action space. Limitations.
Seminar: Tabular CEM for Taxi-v0, deep CEM for box2d environments.
- HSE homework deadline: 23.59 15.02.17
- YSDA homework deadline: 23.59 26.02.17
week2 Temporal Difference
Lecture: Discounted reward MDP. Value iteration. Q-learning. Temporal difference Vs Monte-Carlo.
Seminar: Tabular q-learning
- Homework description - see ./week2/README.md
- HSE homework deadline: 23.59 15.02.17
- YSDA homework deadline: 23.59 8.03.17
week3 Value-based algorithms
Lecture: SARSA. Off-policy Vs on-policy algorithms. N-step algorithms. Eligibility traces.
Seminar: Q-learning vs SARSA vs Expected Value SARSA in the wild
Homework description
- HSE homework deadline 23.59 22.02.17
week3.5 Deep learning recap
Lecture: deep learning, convolutional nets, batchnorm, dropout, data augmentation and all that stuff.
Seminar: Theano/Lasagne on mnist, simple deep q-learning with CartPole (TF version contrib is welcome)
Homework - convnets on MNIST or simple deep q-learning
- HSE homework deadline 23.59 1.03.17
week4 Approximate reinforcement learning
Lecture: Infinite/continuous state space. Value function approximation. Convergence conditions. Multiple agents trick.
Seminar: Approximate Q-learning with experience replay. (CartPole, Acrobot, Doom)
Homework - convnets on MNIST or simple deep q-learning
- HSE homework deadline 23.59 8.03.17
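
The tabular updates from week2 and week3 can be sketched in a few lines. This is a minimal illustration, not the course's seminar code: the function names, the dictionary-based Q-table, and the default `alpha`/`gamma` values are all assumptions made for this example.

```python
from collections import defaultdict


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy TD update: bootstrap from the best action in s_next."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually taken next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


# Hypothetical two-action example: Q-values default to 0.0.
actions = ["left", "right"]
q_ql = defaultdict(float)
q_ql[(1, "right")] = 1.0
q_sarsa = defaultdict(float)
q_sarsa[(1, "right")] = 1.0

# Same transition (state 0, action "right", reward 0, next state 1) for both.
q_learning_update(q_ql, 0, "right", 0.0, 1, actions)
# SARSA assumes the agent explored and picked "left" in state 1.
sarsa_update(q_sarsa, 0, "right", 0.0, 1, "left")
```

The only difference between the two updates is the bootstrap term: Q-learning takes the maximum over next actions, while SARSA uses the value of the action the behaviour policy actually chose.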

Future lectures:

week5 Deep reinforcement learning (coming 6.03.2017)
Lecture: Deep Q-learning/sarsa/whatever. Heuristics & motivation behind them: experience replay, target networks, double/dueling/bootstrap DQN, etc.
Seminar: Double DQN, Dueling DQN, experience replay on atari
week6 Policy gradient methods (coming 13.03.2017)
Lecture: Motivation for policy-based methods, policy gradient, log-derivative trick, REINFORCE/crossentropy method, variance theorem (advantage), advantage actor-critic (incl. n-step advantage), off-policy actor-critic (off-PAC), natural gradients (briefly), continuous action space (teaser).
Seminar: a2c Vs qlearning for MountainCar/Doom, entropy regularization & tricks, simple demo with continuous action spaces
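
The log-derivative trick and REINFORCE update from week6 can be illustrated with a softmax policy over a handful of discrete actions. This is a sketch under stated assumptions, not the lecture's implementation: the function names, the plain-list logits, and the `lr` and `baseline` parameters are hypothetical and chosen only for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_softmax(logits, action):
    """Gradient of log pi(action) w.r.t. the logits: one_hot(action) - probs."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def reinforce_step(logits, action, reward, baseline=0.0, lr=0.1):
    """One REINFORCE ascent step: logits += lr * (reward - baseline) * grad log pi."""
    grad = grad_log_softmax(logits, action)
    return [x + lr * (reward - baseline) * g for x, g in zip(logits, grad)]


# Hypothetical usage: action 0 was sampled and received reward 1.
new_logits = reinforce_step([0.0, 0.0], action=0, reward=1.0)
```

Subtracting a baseline from the reward leaves the expected gradient unchanged but can reduce its variance, which is the motivation for the advantage term in actor-critic methods.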

somewhere here comes RNN crash-course

week7 Partially observable MDPs (coming 20.03.2017)
Lecture: POMDP intro. Model-based solvers. RNN solvers. RNN tricks: attention, problems with normalization methods, pre-training.
Seminar: Deep kung-fu & doom with recurrent A2C vs feedforward A2C
week i+1 Trust Region Policy Optimization.
Lecture: Trust region policy optimization in detail.
Seminar: approximate TRPO vs approximate Q-learning for gym box2d envs (robotics-themed)
week i+1 RL in Large/Continuous action spaces.
Lecture: Continuous action space MDPs. Model-based approach (NAF). Actor-critic approach (dpg, svg). Trust Region Policy Optimization. Large discrete action space problem. Action embedding.
Seminar: Classic Control and BipedalWalker with ddpg Vs qNAF. https://gym.openai.com/envs/BipedalWalker-v2 .
week i+1 Advanced exploration methods: intrinsic motivation
Lecture: Augmented rewards. Heuristics (UNREAL, density-based models), formal approach: information maximizing exploration. Model-based tricks (see also MCTS).
Seminar: Vime vs epsilon-greedy for Go9x9 (bonus 19x19)
week i+1 Advanced exploration methods: probabilistic approach.
Lecture: Improved exploration methods (quantile-based, etc.). Bayesian approach. Case study: Contextual bandits for RTB.
Seminar: Bandits
week i+1 Case studies I
Lecture: Reinforcement Learning as a general way to optimize non-differentiable loss. KL(p||q) vs KL(q||p). Case study: machine translation, speech synthesis, conversation models.
Seminar: Optimizing Levenshtein distance with seq2seq for g2p
week i+1 Hierarchical MDP
Lecture: MDP Vs real world. Sparse and delayed rewards. When Q-learning fails. Hierarchical MDP. Hierarchy as temporal abstraction. MDP with symbolic reasoning.
Seminar: Hierarchical RL for atari games with rare rewards (starting from pre-trained DQN)
week i+1 Case studies II
Lecture: Direct policy optimization: finance. Inverse Reinforcement Learning: personalized medical treatment, robotics.
Seminar: Portfolio optimization as POMDP.
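
The KL(p||q) vs KL(q||p) contrast listed under Case studies I comes down to the fact that KL divergence is not symmetric. The sketch below shows this for two small discrete distributions; the function name and the example distributions are assumptions made for illustration and are not taken from the course materials.

```python
import math


def kl_divergence(p, q):
    """KL(p||q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Terms with p_i == 0 contribute nothing; if q_i == 0 while p_i > 0,
    the divergence is infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical distributions: p is uniform, q is peaked on the first outcome.
p = [0.5, 0.5]
q = [0.9, 0.1]
forward = kl_divergence(p, q)  # KL(p||q)
reverse = kl_divergence(q, p)  # KL(q||p)
```

Here `forward` and `reverse` differ, so the choice of direction changes which mismatches between the two distributions are penalized most.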

Course staff

Course materials and teaching by

Fedor Ratnikov - lectures, seminars, hw checkups
Alexander Fritsler - lectures, seminars, hw checkups
Oleg Vasilev - seminars, hw checkups, technical support
Pavel Shvechikov - lectures, seminars, HW checkups

Contributors

Using pictures from http://ai.berkeley.edu/home.html
Tensorflow assignments by Scitator
Other contributions: omtcyfz dmittov arogozhnikov
