User Tutorial 3. Actor critic on the rain car environment

BorjaFG edited this page Mar 11, 2019 · 5 revisions

Prerequisites

  • We strongly recommend doing Tutorial #2 before this one.

Objectives

Using the same environment as in Tutorial #2, we will set up an Actor-Critic agent and train it to control the car. The main difference between Actor-Critic agents and Q-Learning agents is that an Actor-Critic agent outputs a continuous action, whereas Q-Learning and similar agents choose from a discretization of the continuous action space.

Tutorial

Actor-Critic agents consist of two elements:

  • The actor: learns a policy pi(s) based on the feedback of the critic. In this tutorial, we will use Cacla (Continuous Actor-Critic Learning Automaton).
  • The critic: estimates the value of the current policy V(s) as a function of the state. At every time step, the critic sends the actor a feedback value that assesses the quality of the last action selected. In this tutorial, we will use TD(lambda), one of the most widely used value-function learning algorithms.
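
The interaction between the two elements can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by the software in this tutorial: the class names TDLambdaCritic and CaclaActor, the linear function approximators over a feature vector phi(s), and all parameter values are hypothetical. The critic returns the TD error as its feedback value, and the actor follows the usual Cacla rule of moving its output toward the explored action only when that feedback is positive.

```python
import random


class TDLambdaCritic:
    """Hypothetical linear critic: V(s) = w . phi(s), learned with TD(lambda)."""

    def __init__(self, n_features, alpha=0.1, gamma=0.9, lam=0.8):
        self.w = [0.0] * n_features   # value-function weights
        self.e = [0.0] * n_features   # accumulating eligibility traces
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def value(self, phi):
        return sum(w * x for w, x in zip(self.w, phi))

    def update(self, phi, reward, phi_next):
        # TD error: this is the feedback value sent to the actor.
        delta = reward + self.gamma * self.value(phi_next) - self.value(phi)
        # Decay the traces, then add the features of the current state.
        self.e = [self.gamma * self.lam * e + x for e, x in zip(self.e, phi)]
        self.w = [w + self.alpha * delta * e for w, e in zip(self.w, self.e)]
        return delta


class CaclaActor:
    """Hypothetical linear actor: pi(s) = theta . phi(s), updated Cacla-style."""

    def __init__(self, n_features, beta=0.1, sigma=0.2):
        self.theta = [0.0] * n_features
        self.beta, self.sigma = beta, sigma

    def policy(self, phi):
        return sum(t * x for t, x in zip(self.theta, phi))

    def explore(self, phi, rng=random):
        # Continuous action: policy output plus Gaussian exploration noise.
        return self.policy(phi) + rng.gauss(0.0, self.sigma)

    def update(self, phi, action, delta):
        # Cacla: only learn from actions the critic judged better than expected.
        if delta > 0:
            error = action - self.policy(phi)
            self.theta = [t + self.beta * error * x
                          for t, x in zip(self.theta, phi)]
```

At every time step, an agent built this way would call actor.explore(phi) to pick an action, apply it to the environment, pass the observed reward and next-state features to critic.update(...), and finally hand the returned TD error to actor.update(...).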
