User Tutorial 3. Actor critic on the rain car environment

Prerequisites

We strongly recommend doing Tutorial #2 before this one.

Objectives

Using the same environment, we will set up an Actor-Critic agent and try to make it learn to control the car. The main difference between Actor-Critic agents and Q-Learning agents is that an Actor-Critic agent outputs a continuous action, whereas Q-Learning and similar agents select their output from a discretization of the continuous action space.
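To make the difference concrete, here is a minimal sketch in Python. It is not SimionZoo code: the action bounds, the function names and the linear actor are all assumptions chosen for illustration.

```python
# Hypothetical action bounds, chosen only for illustration.
ACTION_MIN, ACTION_MAX = -1.0, 1.0


def discretize_actions(n_actions):
    """Finite, evenly spaced action set a Q-Learning-style agent chooses from."""
    step = (ACTION_MAX - ACTION_MIN) / (n_actions - 1)
    return [ACTION_MIN + i * step for i in range(n_actions)]


def greedy_discrete_action(q_values, actions):
    """Pick the discrete action with the highest estimated Q-value."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return actions[best]


def continuous_action(weights, features):
    """Linear actor (an assumption): can output any value within the bounds."""
    raw = sum(w * x for w, x in zip(weights, features))
    return min(ACTION_MAX, max(ACTION_MIN, raw))


actions = discretize_actions(5)  # five evenly spaced actions in [-1, 1]
a_discrete = greedy_discrete_action([0.1, 0.4, 0.2, 0.9, 0.3], actions)
a_continuous = continuous_action([0.3, -0.2], [1.0, 0.35])
```

The discrete agent can only return one of the five predefined values, while the actor can return any value between the bounds.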

Tutorial

Actor-Critic agents consist of two elements:

The actor: learns a policy pi(s) based on the feedback of the critic. In this tutorial, we will use CACLA (Continuous Actor-Critic Learning Automaton).
The critic: estimates the value of the current policy V(s) as a function of the state. At every time step, the critic sends the actor a feedback value assessing the quality of the last action selected. In this tutorial, we will use TD(lambda), a widely used value-function learning algorithm.
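The interaction between the two elements can be sketched as follows. This is a simplified illustration in Python, not the implementation used in this tutorial. It assumes linear function approximation, accumulating eligibility traces for the critic, and the commonly described CACLA rule in which the actor moves toward the explored action only when the critic's feedback (the TD error) is positive. The class names, feature vectors and learning rates are all hypothetical.

```python
class LinearTDLambdaCritic:
    """TD(lambda) critic: V(s) = w . phi(s), with accumulating traces."""

    def __init__(self, n_features, alpha=0.1, gamma=0.9, lam=0.8):
        self.w = [0.0] * n_features  # value-function weights
        self.e = [0.0] * n_features  # eligibility traces
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def value(self, phi):
        return sum(w * x for w, x in zip(self.w, phi))

    def update(self, phi, reward, phi_next):
        # TD error: the feedback value sent to the actor.
        td_error = reward + self.gamma * self.value(phi_next) - self.value(phi)
        # Decay the traces, then add the current state features.
        self.e = [self.gamma * self.lam * e + x for e, x in zip(self.e, phi)]
        self.w = [w + self.alpha * td_error * e for w, e in zip(self.w, self.e)]
        return td_error


class CaclaActor:
    """CACLA-style actor: pi(s) = w . phi(s), a continuous action."""

    def __init__(self, n_features, alpha=0.1):
        self.w = [0.0] * n_features
        self.alpha = alpha

    def policy(self, phi):
        return sum(w * x for w, x in zip(self.w, phi))

    def update(self, phi, action_taken, td_error):
        # Only learn from actions that turned out better than expected.
        if td_error > 0.0:
            error = action_taken - self.policy(phi)
            self.w = [w + self.alpha * error * x for w, x in zip(self.w, phi)]


# One time step with made-up features and reward.
critic = LinearTDLambdaCritic(n_features=2)
actor = CaclaActor(n_features=2)
phi, phi_next = [1.0, 0.0], [0.0, 1.0]
explored_action = actor.policy(phi) + 0.5  # policy output plus exploration noise
td = critic.update(phi, reward=1.0, phi_next=phi_next)
actor.update(phi, explored_action, td)
```

In this step the TD error is positive, so the actor's output for the visited state moves toward the explored action. With a negative TD error the actor would be left unchanged.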

SimionSoft - Group of Computational Intelligence ( University of the Basque Country (UPV/EHU) )
