Implementations of actor-critic reinforcement learning algorithms on the CartPole-v1 environment.
Four algorithms are implemented across multiple backends:
| Algorithm | Description |
|---|---|
| `td` | TD Actor-Critic — online per-step updates using TD error δ |
| `reinforce` | Vanilla REINFORCE — no critic; actor updated with raw discounted returns G_t |
| `advantage` | A2C (single worker) — actor updated with normalised advantage (G_t − V(s_t)) |
| `a2c` | A2C Parallel Workers — synchronous multi-worker batch updates with entropy bonus |
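For orientation, the `td` update is a single bootstrapped step: the TD error δ = r + γ·V(s′) − V(s) trains the critic and weights the actor's log-probability gradient. Below is a minimal PyTorch sketch of that step; the `actor`/`critic` modules, optimisers, and argument layout are illustrative assumptions, not the repo's actual code.

```python
# Illustrative one-step TD actor-critic update (PyTorch). A sketch, not the
# repo's exact code: `actor` maps a state to action logits, `critic` maps a
# state to a scalar V(s); both are assumed to be nn.Modules with optimisers.
import torch

def td_step(actor, critic, actor_opt, critic_opt, s, a, r, s_next, done, gamma=0.99):
    s = torch.as_tensor(s, dtype=torch.float32)
    s_next = torch.as_tensor(s_next, dtype=torch.float32)

    # TD error: delta = r + gamma * V(s') - V(s), with V(s') = 0 at terminal states
    with torch.no_grad():
        target = r + gamma * critic(s_next) * (1.0 - float(done))
    delta = target - critic(s)

    critic_loss = delta.pow(2).sum()  # move V(s) toward the one-step target
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Policy-gradient step weighted by the (detached) TD error
    log_prob = torch.distributions.Categorical(logits=actor(s)).log_prob(torch.as_tensor(a))
    actor_loss = -(delta.detach() * log_prob).sum()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```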
Three Python backends share the same algorithm set and command-line interface.
```
pip install gymnasium torch tensorflow pygame rich
```

| Script | Backend | Notes |
|---|---|---|
| `actor_critic.py` | PyTorch | Neural network actor/critic (128-unit hidden layer) |
| `actor_critic_tf.py` | TensorFlow | Same architecture in Keras |
| `actor_critic_lfa.py` | NumPy | Linear function approximation — no autograd |
| `actor_critic_cont_tf.py` | TensorFlow | Continuous action space variant (Gaussian policy) |
```
python actor_critic.py          [--algo ALGO] [--viz VIZ] [--workers N]
python actor_critic_tf.py       [--algo ALGO] [--viz VIZ] [--workers N]
python actor_critic_lfa.py      [--algo ALGO] [--viz VIZ] [--workers N]
python actor_critic_cont_tf.py  [--algo ALGO] [--viz VIZ] [--workers N]
```

Choices for `--algo`: `td` | `reinforce` | `advantage` | `a2c`. Default: `a2c`.
| Value | Algorithm |
|---|---|
| `td` | TD Actor-Critic — step-level TD error updates |
| `reinforce` | Vanilla REINFORCE — episode-level policy gradient, no critic |
| `advantage` | Actor-Critic with Advantage — episode-level with normalised advantage |
| `a2c` | A2C Parallel Workers — batched multi-worker with entropy bonus |
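The `advantage` variant's episode-level update hinges on computing discounted returns and normalising the advantage. A minimal NumPy sketch of that step, assuming per-episode `rewards` and critic `values` arrays (illustrative, not the repo's code):

```python
import numpy as np

def normalised_advantages(rewards, values, gamma=0.99):
    """Discounted returns G_t, then (G_t - V(s_t)) normalised to zero mean, unit std."""
    G, returns = 0.0, np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G  # accumulate the return backwards in time
        returns[t] = G
    adv = returns - np.asarray(values)
    return (adv - adv.mean()) / (adv.std() + 1e-8)
```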
Choices for `--viz`: `text` | `gui` | `interactive` | `none`. Default: `text`.
| Value | Description |
|---|---|
| `text` | Colour heatmaps printed to the terminal every 50 episodes |
| `gui` | Live pygame window showing policy and value function heatmaps |
| `interactive` | Pygame window with axis/slice controls to explore the 4D state space |
| `none` | No visualisation — fastest training |
`--workers` (type: int, default: 8): number of synchronous environments used when `--algo a2c`; has no effect for other algorithms.
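For intuition, synchronous multi-worker batching can look like the following sketch built on gymnasium's `SyncVectorEnv`; the random actions stand in for the actor's sampled actions, and the scripts may organise this differently:

```python
import gymnasium as gym
import numpy as np

n_workers = 8
envs = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(n_workers)])
obs, _ = envs.reset(seed=0)
for _ in range(5):                                       # one short batch of synchronous steps
    actions = np.random.randint(0, 2, size=n_workers)    # placeholder for sampled actor actions
    obs, rewards, terms, truncs, _ = envs.step(actions)  # all workers advance in lockstep
envs.close()
```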
| Parameter | Value |
|---|---|
| Discount factor γ | 0.99 |
| Actor learning rate α | 0.001 |
| Critic learning rate α | 0.005 |
| Max episodes | 5 000 |
| Solved threshold (avg 100 ep.) | 295.0 |
The linear FA script defaults to tile coding. Edit the `FEATURES` variable at the top of `actor_critic_lfa.py` to switch:
| Value | Feature type | Reference |
|---|---|---|
| `tile` (default) | Tile coding (8 tilings × 8 tiles) | S&B §9.5.4 |
| `polynomial` | Polynomial basis (degree 3) | S&B §9.5.1 |
| `fourier` | Fourier basis (order 3) | S&B §9.5.2 |
| `rbf` | Radial Basis Functions (5 centres/dim) | S&B §9.5.5 |
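As a reference point for what such features look like, here is a minimal NumPy sketch of the Fourier basis (S&B §9.5.2), with cos(π·c·s) over a state scaled to [0, 1] per dimension. It is illustrative only, not the script's implementation:

```python
import itertools
import numpy as np

def fourier_features(s, low, high, order=3):
    """Fourier basis: phi_i(s) = cos(pi * c_i . s_scaled), c_i in {0..order}^d."""
    s_scaled = (np.asarray(s) - low) / (high - low)  # scale each dim to [0, 1]
    coeffs = np.array(list(itertools.product(range(order + 1), repeat=len(s_scaled))))
    return np.cos(np.pi * coeffs @ s_scaled)         # (order+1)^d features
```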
```
# A2C with 16 workers, no visualisation
python actor_critic.py --algo a2c --workers 16 --viz none

# TD actor-critic with live pygame heatmap
python actor_critic.py --algo td --viz gui

# Vanilla REINFORCE on TensorFlow with terminal heatmaps
python actor_critic_tf.py --algo reinforce --viz text

# Linear FA, advantage algorithm, interactive heatmap explorer
python actor_critic_lfa.py --algo advantage --viz interactive
```

After training, a rendered episode is displayed automatically using the trained actor.
The PyTorch advantage variant also saves the trained actor to `actor.pt`.
An in-browser training engine that mirrors the Python algorithms. No server required — everything runs in Web Workers.
Try it online: https://mochan.dev/actorcritic/
```
cd js
npx serve .   # or any static file server
```

Open http://localhost:3000 in your browser.
| Control | Options |
|---|---|
| Approximator | Linear FA · Neural Network |
| Algorithm | TD Actor-Critic · REINFORCE · Advantage · A2C |
| Feature extractor (Linear FA only) | Tile Coding · Polynomial · Fourier · RBF |
| Actor α / Critic α | Learning rates (pre-filled with sensible defaults per approximator) |
| Skip episodes | Fast-forward N episodes without rendering |
Three live canvases update as training runs:
- CartPole — animated environment rendering
- Value heatmap — V(s) over a 2D slice of the 4D state space
- Policy heatmap — P(push right | s) over the same slice
Use the axis selector controls to choose which two state dimensions to display on the axes, and set fixed values for the remaining two dimensions. A snapshot history slider lets you scrub back through earlier checkpoints.
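Conceptually, each heatmap sweeps the two chosen state dimensions over a grid while holding the other two fixed. A Python sketch of that idea (the `policy_prob_right` callable, axis ranges, and grid size are placeholders, not the app's code):

```python
import numpy as np

def policy_heatmap(policy_prob_right, x_dim=0, y_dim=2, fixed=(0.0, 0.0), size=40):
    """P(push right | s) over a 2D slice; the remaining two dims stay at `fixed`."""
    xs = np.linspace(-2.4, 2.4, size)    # assumed range for cart position
    ys = np.linspace(-0.21, 0.21, size)  # assumed range for pole angle (radians)
    grid = np.zeros((size, size))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            s = np.zeros(4)
            s[x_dim], s[y_dim] = x, y
            other = [d for d in range(4) if d not in (x_dim, y_dim)]
            s[other[0]], s[other[1]] = fixed
            grid[i, j] = policy_prob_right(s)
    return grid
```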
Click Replay at any time to run a greedy episode with the current policy.
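A greedy replay simply rolls the environment forward taking the most probable action at each step. Sketched in Python below; `actor_probs` is a hypothetical stand-in for the trained policy:

```python
import gymnasium as gym
import numpy as np

def greedy_episode(actor_probs, render=True):
    env = gym.make("CartPole-v1", render_mode="human" if render else None)
    s, _ = env.reset()
    total, done = 0.0, False
    while not done:
        a = int(np.argmax(actor_probs(s)))  # greedy: pick the most probable action
        s, r, terminated, truncated, _ = env.step(a)
        total += r
        done = terminated or truncated
    env.close()
    return total
```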
| File | Purpose |
|---|---|
| `index.html` / `style.css` | UI layout and styling |
| `main.js` | UI logic, rendering, and worker coordination |
| `worker.js` | Training loop running off the main thread |
| `algorithms.js` | Trainer classes (TD, REINFORCE, Advantage, A2C) |
| `linear_fa.js` | Linear function approximation (features, actor, critic) |
| `nn_fa.js` | Neural network actor and critic |
| `cartpole.js` | CartPole environment simulation |
| `heatmap_worker.js` | Off-thread heatmap computation |
| `benchmark_heatmap.js` | Heatmap benchmark utility |
| `test_linear_fa.js` | Unit tests for linear FA |