Hands on RL: SOC

Note:

To solve the questions for each week (if you are interested ☺️), check out the Questions folder inside that week's folder. Check out the Course Info file for more details!

Prerequisites

For the first week, we'll start with a bit of light work. Your task will be to read the first two chapters of Grokking and explore the new terms that you come across.

Also this week, we will learn about Python, in particular Python libraries such as NumPy, TensorFlow, PyTorch, Matplotlib and scikit-learn.

We will implement some common ML algorithms from scratch. I will provide reading material for these and an assignment by the end of the day or by tomorrow. Till then, you can get a basic idea of Machine Learning from any online source or YouTube video.
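To give a flavour of what implementing an algorithm "from scratch" looks like, here is a minimal sketch of linear regression trained with gradient descent using only NumPy. The toy data, learning rate and iteration count are invented purely for illustration.

```python
import numpy as np

# Toy data: y = 3x + 2 plus noise (made up for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=100)

# Parameters of the model y_hat = w * x + b.
w, b = 0.0, 0.0
lr = 0.1  # learning rate

for _ in range(500):
    y_hat = w * X[:, 0] + b
    error = y_hat - y
    # Gradients of the mean squared error loss.
    grad_w = 2.0 * np.mean(error * X[:, 0])
    grad_b = 2.0 * np.mean(error)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should approach w=3, b=2
```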

Hope everyone had fun learning the basics of ML in the first week (and finished the assignment in time). Now it's time to start with Reinforcement Learning, a paradigm in Machine Learning. We will be learning the basics of RL this week.

Our learning objectives for this week are:

  1. N-armed bandits:

    Perhaps the simplest RL challenge. You are given n arms and can pull one of them at a time. Each arm gives a reward drawn from its own probability distribution, independent of which arms you pulled beforehand (though it may depend on the time step at which you pull). Your task is to maximize the sum of these rewards, which essentially means finding the arm with the maximum expected reward. We will study a few algorithms to solve this problem (a small sketch of one appears after this list).

  2. Formalism of Reinforcement Learning in terms of Markov Decision Processes:

    A way to generalize and represent a given problem as a Reinforcement Learning problem.

  3. Dynamic Programming:

    The most basic set of algorithms for the prediction and control problems, that is, evaluating a given policy and finding an optimal policy when the complete Markov Decision Process is known (a value-iteration sketch appears after the reading suggestions below).
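To make item 1 concrete, here is a minimal sketch of the ε-greedy algorithm on a stationary n-armed bandit with Gaussian rewards. The number of arms, the arm means and the value of ε are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms = 5
true_means = rng.normal(size=n_arms)  # hidden expected reward of each arm
eps = 0.1                             # exploration probability
Q = np.zeros(n_arms)                  # estimated value of each arm
counts = np.zeros(n_arms)             # number of pulls per arm

for t in range(10_000):
    # Explore with probability eps, otherwise pull the greedy arm.
    if rng.random() < eps:
        arm = int(rng.integers(n_arms))
    else:
        arm = int(np.argmax(Q))
    reward = rng.normal(loc=true_means[arm])
    counts[arm] += 1
    # Incremental sample-average update: Q <- Q + (r - Q) / n,
    # which avoids storing every past reward.
    Q[arm] += (reward - Q[arm]) / counts[arm]

print("best arm (true):", int(np.argmax(true_means)))
print("best arm (estimated):", int(np.argmax(Q)))
```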

  • Grokking:

    Skim through chapter 1. Read chapters 2, 3 and 4 and try to implement the algorithms from them.

  • If you are more interested in theory, you may also read these topics from Sutton and Barto.
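And to make items 2 and 3 concrete, here is a minimal value-iteration sketch on a tiny hand-made MDP. The states, actions, transition probabilities, rewards and discount factor below are invented purely for illustration.

```python
import numpy as np

# A made-up 2-state, 2-action MDP:
# P[s][a] is a list of (prob, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)],                  # stay in state 0, no reward
        1: [(0.8, 1, 5.0), (0.2, 0, 0.0)]},  # try to move to state 1
    1: {0: [(1.0, 0, 1.0)],                  # go back to state 0
        1: [(1.0, 1, 2.0)]},                 # stay in state 1
}
gamma = 0.9  # discount factor
V = np.zeros(len(P))

# Value iteration: repeatedly apply the Bellman optimality backup
# V(s) <- max_a sum_{s'} p(s'|s,a) * (r + gamma * V(s')).
for _ in range(1000):
    V_new = np.array([
        max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s])
        for s in P
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once converged
        V = V_new
        break
    V = V_new

# Greedy policy with respect to the converged values.
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
print("V* =", V, "policy =", policy)
```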

The file snake.py contains some starter code for creating a window and moving a square around in it. Modify this code to implement a game of Snake. (This project is intended to be highly collaborative, so get to know your co-mentees and work with them on it! Also, this is a well-known game and implementations can be found anywhere, but try building it yourself; adding your own customisations is encouraged.)
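If you haven't written code like this before, here is a rough sketch of the kind of loop such starter code sets up: a window with a square you can move using the arrow keys. It assumes the pygame library is installed and is not the actual contents of snake.py.

```python
import pygame

pygame.init()
screen = pygame.display.set_mode((400, 400))
pygame.display.set_caption("Move the square")
clock = pygame.time.Clock()

x, y, size, speed = 200, 200, 20, 4
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:  # window close button
            running = False

    # Move the square with the arrow keys.
    keys = pygame.key.get_pressed()
    x += (keys[pygame.K_RIGHT] - keys[pygame.K_LEFT]) * speed
    y += (keys[pygame.K_DOWN] - keys[pygame.K_UP]) * speed

    screen.fill((0, 0, 0))
    pygame.draw.rect(screen, (0, 200, 0), (x, y, size, size))
    pygame.display.flip()
    clock.tick(60)  # cap the frame rate at 60 FPS

pygame.quit()
```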

It's time to start proper Reinforcement Learning. For Week 3, we will look at two classes of algorithms that deal with prediction and control problems when we don't know the model of the environment:

  1. Monte Carlo Methods (a small prediction sketch follows this list)
  2. Temporal Difference Learning (a TD(0) sketch appears at the end of this section)
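Here is a minimal first-visit Monte Carlo prediction sketch on the classic 5-state random walk from chapter 6 of Sutton and Barto. The episode count is arbitrary and the setup is simplified for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random walk: states 1..5 are non-terminal, 0 and 6 are terminal.
# The policy moves left or right with equal probability; stepping off
# the right end gives reward +1, everything else gives 0.
N, gamma = 5, 1.0
V = np.zeros(N + 2)
visits = np.zeros(N + 2)

for episode in range(5000):
    s, trajectory = 3, []  # every episode starts in the middle state
    while 0 < s < N + 1:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        reward = 1.0 if s_next == N + 1 else 0.0
        trajectory.append((s, reward))
        s = s_next

    # Walk backwards to accumulate returns; overwriting means the value
    # kept for each state is the return from its FIRST visit.
    G, first_return = 0.0, {}
    for s, r in reversed(trajectory):
        G = gamma * G + r
        first_return[s] = G
    for s, G in first_return.items():
        visits[s] += 1
        V[s] += (G - V[s]) / visits[s]  # incremental sample average

print(V[1:N + 1])  # true values are 1/6, 2/6, 3/6, 4/6, 5/6
```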
  • The primary reading material for this week will be Chapters 5 and 6 of Grokking RL.

  • Also, if you are interested in a more theoretical approach, you may refer to chapters 5 and 6 of Sutton and Barto, and additionally chapter 7 on Eligibility Traces, which act as a bridge between these two classes of algorithms. The topics may seem more difficult than what you have been reading, but the basic ideas remain the same, and by the end you will definitely get a sense of how these algorithms are constructed.
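For comparison, here is TD(0) prediction on the same random walk as above. Unlike Monte Carlo, it updates after every step using a bootstrapped target instead of waiting for the episode's return; the step size α and episode count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

N, gamma, alpha = 5, 1.0, 0.05
V = np.zeros(N + 2)  # states 0 and N+1 are terminal; their value stays 0

for episode in range(5000):
    s = 3  # start in the middle state
    while 0 < s < N + 1:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        reward = 1.0 if s_next == N + 1 else 0.0
        # TD(0) update: move V(s) toward the bootstrapped target
        # r + gamma * V(s').
        V[s] += alpha * (reward + gamma * V[s_next] - V[s])
        s = s_next

print(V[1:N + 1])  # true values are 1/6, 2/6, 3/6, 4/6, 5/6
```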