Deep Reinforcement Learning With Python

Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math

About the book

With RL algorithms having improved significantly in both quality and quantity in recent years, this second edition of Hands-On Reinforcement Learning with Python has been completely revamped into an example-rich guide to state-of-the-art reinforcement learning (RL) and deep RL algorithms with TensorFlow and the OpenAI Gym toolkit.

In addition to exploring RL basics and foundational concepts such as the Bellman equation, Markov decision processes, and dynamic programming, this second edition dives deep into the full spectrum of value-based, policy-based, and actor-critic RL methods with detailed math. It explores state-of-the-art algorithms such as DQN, TRPO, PPO, ACKTR, DDPG, TD3, and SAC in depth, demystifying the underlying math and demonstrating implementations through simple code examples.
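To make the link between the Bellman equation and dynamic programming concrete, here is a minimal value-iteration sketch. It is not the book's code: the 4-state corridor MDP, the helper names `make_corridor`, `q_value`, and `value_iteration`, and the `(probability, next_state, reward, done)` transition layout (modeled loosely on the `env.P` tables of Gym's toy-text environments such as Frozen Lake) are all illustrative assumptions. Each sweep applies the Bellman optimality backup V(s) ← max_a Σ p(s'|s,a) [r + γ V(s')] until the values stop changing.

```python
# Minimal value-iteration sketch on a hypothetical 4-state corridor MDP.
# P[s][a] is a list of (probability, next_state, reward, done) tuples
# (layout assumed, loosely mirroring Gym toy-text `env.P` tables).

def make_corridor(n_states=4):
    """Action 0 = left, action 1 = right. Reaching the last state yields
    reward 1 and ends the episode; the last state is absorbing."""
    P = {}
    last = n_states - 1
    for s in range(n_states):
        P[s] = {}
        for a, step in ((0, -1), (1, +1)):
            if s == last:  # terminal state: stays put, no reward
                P[s][a] = [(1.0, s, 0.0, True)]
                continue
            s_next = min(max(s + step, 0), last)  # walls at both ends
            reward = 1.0 if s_next == last else 0.0
            P[s][a] = [(1.0, s_next, reward, s_next == last)]
    return P


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then acting per V."""
    return sum(p * (r + (0.0 if done else gamma * V[s2]))
               for p, s2, r, done in P[s][a])


def value_iteration(P, gamma=0.9, theta=1e-8):
    """Sweep the Bellman optimality backup until the largest change < theta,
    then read off the greedy policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < theta:
            break
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


P = make_corridor()
V, policy = value_iteration(P)
print({s: round(v, 3) for s, v in V.items()})  # {0: 0.81, 1: 0.9, 2: 1.0, 3: 0.0}
```

The values decay by a factor of γ = 0.9 per step away from the goal, and the greedy policy moves right in every non-terminal state. The in-place update is one design choice among several; a version that computes each sweep from a copy of the previous values converges to the same fixed point.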

The book has several new chapters dedicated to new RL techniques, including distributional RL, imitation learning, inverse RL, and meta RL. You will learn to leverage Stable Baselines, an improved fork of OpenAI's Baselines library, to implement popular RL algorithms with little effort. The book concludes with an overview of promising research directions such as meta-learning and imagination-augmented agents.

Get the book

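Table of Contents

1. Fundamentals of Reinforcement Learning (chapter 1 is available as a free download)

2. A Guide to the Gym Toolkit

2.1. Setting Up our Machine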
2.2. Creating our First Gym Environment
2.3. Generating an Episode
2.4. Classic Control Environments
2.5. Cart Pole Balancing with Random Policy
2.6. Atari Game Environments
2.7. Agent Playing the Tennis Game
2.8. Recording the Game
2.9. Other Environments
2.10. Environment Synopsis

3. Bellman Equation and Dynamic Programming

3.1. The Bellman Equation
3.2. Bellman Optimality Equation
3.3. Relation Between Value and Q Function
3.4. Dynamic Programming
3.5. Value Iteration
3.6. Solving the Frozen Lake Problem with Value Iteration
3.7. Policy Iteration
3.8. Solving the Frozen Lake Problem with Policy Iteration
3.9. Is DP Applicable to all Environments?

4. Monte Carlo Methods

4.1. Understanding the Monte Carlo Method
4.2. Prediction and Control Tasks
4.3. Monte Carlo Prediction
4.4. Understanding the Blackjack Game
4.5. Every-visit MC Prediction with Blackjack Game
4.6. First-visit MC Prediction with Blackjack Game
4.7. Incremental Mean Updates
4.8. MC Prediction (Q Function)
4.9. Monte Carlo Control
4.10. On-Policy Monte Carlo Control
4.11. Monte Carlo Exploring Starts
4.12. Monte Carlo with Epsilon-Greedy Policy
4.13. Implementing On-Policy MC Control
4.14. Off-Policy Monte Carlo Control
4.15. Is MC Method Applicable to all Tasks?

5. Understanding Temporal Difference Learning

5.1. TD Learning
5.2. TD Prediction
5.3. Predicting the Value of States in a Frozen Lake Environment
5.4. TD Control
5.5. On-Policy TD Control - SARSA
5.6. Computing Optimal Policy using SARSA
5.7. Off-Policy TD Control - Q Learning
5.8. Computing the Optimal Policy using Q Learning
5.9. The Difference Between Q Learning and SARSA
5.10. Comparing DP, MC, and TD Methods

6. Case Study: The MAB Problem

6.1. The MAB Problem
6.2. Creating a Bandit in Gym
6.3. Epsilon-Greedy
6.4. Implementing Epsilon-Greedy
6.5. Softmax Exploration
6.6. Implementing Softmax Exploration
6.7. Upper Confidence Bound
6.8. Implementing UCB
6.9. Thompson Sampling
6.10. Implementing Thompson Sampling
6.11. Applications of MAB
6.12. Finding the Best Advertisement Banner using Bandits
6.13. Contextual Bandits

7. Deep Learning Foundations

7.1. Biological and artificial neurons
7.2. ANN and its layers
7.3. Exploring activation functions
7.4. Forward and backward propagation in ANNs
7.5. Building a neural network from scratch
7.6. Recurrent neural networks
7.7. LSTM-RNN
7.8. Convolutional neural networks
7.9. Generative adversarial networks

8. Getting to Know TensorFlow

8.1. What is TensorFlow?
8.2. Understanding Computational Graphs and Sessions
8.3. Variables, Constants, and Placeholders
8.4. Introducing TensorBoard
8.5. Handwritten Digits Classification using TensorFlow
8.6. Visualizing the Computational Graph in TensorBoard
8.7. Introducing Eager Execution
8.8. Math Operations in TensorFlow
8.9. TensorFlow 2.0 and Keras
8.10. MNIST Digits Classification in TensorFlow 2.0

9. Deep Q Network and its Variants

9.1. What is Deep Q Network?
9.2. Understanding DQN
9.3. Playing Atari Games using DQN
9.4. Double DQN
9.5. DQN with Prioritized Experience Replay
9.6. Dueling DQN
9.7. Deep Recurrent Q Network

10. Policy Gradient Method

10.1. Why Policy Based Methods?
10.2. Policy Gradient Intuition
10.3. Understanding the Policy Gradient
10.4. Deriving Policy Gradient
10.5. Variance Reduction Methods
10.6. Policy Gradient with Reward-to-go
10.7. Cart Pole Balancing with Policy Gradient
10.8. Policy Gradient with Baseline

11. Actor Critic Methods - A2C and A3C

11.1. Overview of Actor Critic Method
11.2. Understanding the Actor Critic Method
11.3. Advantage Actor Critic
11.4. Asynchronous Advantage Actor Critic
11.5. Mountain Car Climbing using A3C
11.6. A2C Revisited

12. Learning DDPG, TD3 and SAC

12.1. Deep Deterministic Policy Gradient
12.2. Components of DDPG
12.3. Putting it all together
12.4. Algorithm - DDPG
12.5. Swinging Up the Pendulum using DDPG
12.6. Twin Delayed DDPG
12.7. Components of TD3
12.8. Putting it all together
12.9. Algorithm - TD3
12.10. Soft Actor Critic
12.11. Components of SAC
12.12. Putting it all together
12.13. Algorithm - SAC

13. TRPO, PPO and ACKTR Methods

13.1. Trust Region Policy Optimization
13.2. Math Essentials
13.3. Designing the TRPO Objective Function
13.4. Solving the TRPO Objective Function
13.5. Algorithm - TRPO
13.6. Proximal Policy Optimization
13.7. PPO with Clipped Objective
13.9. Implementing PPO-Clipped Method
13.10. PPO with Penalized Objective
13.11. Actor Critic using Kronecker-Factored Trust Region
13.12. Math Essentials
13.13. Kronecker-Factored Approximate Curvature (K-FAC)
13.14. K-FAC in Actor Critic

14. Distributional Reinforcement Learning

14.1. Why Distributional Reinforcement Learning?
14.2. Categorical DQN
14.3. Playing Atari games using Categorical DQN
14.4. Quantile Regression DQN
14.5. Math Essentials
14.6. Understanding QR-DQN
14.7. Distributed Distributional DDPG

15. Imitation Learning and Inverse RL

15.1. Supervised Imitation Learning
15.2. DAgger
15.3. Deep Q learning from Demonstrations
15.4. Inverse Reinforcement Learning
15.5. Maximum Entropy IRL
15.6. Generative Adversarial Imitation Learning

16. Deep Reinforcement Learning with Stable Baselines

16.1. Creating our First Agent with Stable Baselines
16.2. Multiprocessing with Vectorized Environments
16.3. Integrating the Custom Environments
16.4. Playing Atari Games with DQN
16.5. Implementing DQN Variants
16.6. Lunar Lander using A2C
16.7. Creating a Custom Network
16.8. Swinging up a Pendulum using DDPG
16.9. Training an Agent to Walk using TRPO
16.10. Training Cheetah Bot to Run using PPO

17. Reinforcement Learning Frontiers

17.1. Meta Reinforcement Learning
17.2. Model Agnostic Meta Learning
17.3. Understanding MAML
17.4. MAML in the Supervised Learning Setting
17.5. Algorithm - MAML in Supervised Learning
17.6. MAML in the Reinforcement Learning Setting
17.7. Algorithm - MAML in Reinforcement Learning
17.8. Hierarchical Reinforcement Learning
17.9. MAXQ Value Function Decomposition
17.10. Imagination Augmented Agents


juliuskittler/Deep-Reinforcement-Learning-With-Python
