Reinforcement Learning (RL)

This project applies basic RL concepts to various games using PyTorch.
More games (e.g., a baseball game) and RL methods (e.g., Double DQN, Dueling DQN) will be added soon.

Environment Settings

Python 3.x, PyTorch, and NumPy are required.
Depending on the game, gym and other libraries are also needed.

Additional libraries required by each game:

Dots - none
CartPole - gym
LunarLander - gym, box2d

Install them as follows:

# 1) gym and Box2D
pip install gym Box2D

Games

Dots
CartPole
LunarLander
Baseball game (Coming Soon)

For the OpenAI Gym games, the goal of each game is described here, and you can find more OpenAI Gym games here.

1) Dots

The agent (blue) moves around the map to maximize its score.
There are 4 kinds of blocks: blue, red, green, and white.

Blue - Agent. This is the only block we can move.
Red - Obstacle. The score decreases when the agent meets an obstacle.
Green - Item. The score increases when the agent meets an item.
White - Edge. It marks the border of the frame; the score also decreases when the agent tries to move into the edge.

States are given as a colored map, and there are 4 actions (Up: 0, Down: 1, Left: 2, Right: 3).
You can control the size of the map; the default is 5x5 (7x7 in total including the frame).

The original code of the game is at https://github.com/awjuliani/DeepRL-Agents/blob/master/gridworld.py
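The action encoding and the edge rule described above can be sketched as follows. This is only an illustration, not the code from the linked gridworld: the function name `step_agent`, the penalty value, and the choice to keep the agent in place when it hits the edge are all assumptions.

```python
# Action encoding taken from the text: Up:0, Down:1, Left:2, Right:3.
# Offsets are (row, col); row 0 is the top of the map.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

EDGE_PENALTY = -1.0  # assumed value, for illustration only


def step_agent(pos, action, size=5):
    """Move the agent on a size x size map (hypothetical helper).

    Returns (new_pos, reward). If the move would leave the map, i.e. run
    into the white edge, the agent stays in place and gets EDGE_PENALTY.
    """
    d_row, d_col = MOVES[action]
    row, col = pos[0] + d_row, pos[1] + d_col
    if 0 <= row < size and 0 <= col < size:
        return (row, col), 0.0
    return pos, EDGE_PENALTY
```

For example, `step_agent((0, 0), 0)` tries to move up from the top-left cell, so the position is unchanged and the penalty is returned.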

Rewards Graph

Agent Behaviors

2) CartPole

We move the cart to keep the pole upright and within the frame.

States are given as 4 values, and there are 2 actions (left, right).

You can easily set up this game with the gym library:

env = gym.make('CartPole-v1')
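A minimal sketch of an episode loop for such an environment is shown below. The names `run_episode` and `random_policy` are hypothetical helpers, not part of this repository. Because the return values of `reset()` and `step()` differ between gym versions, the sketch accepts both conventions.

```python
import random


def run_episode(env, policy, max_steps=500):
    """Run one episode and return the total reward.

    Accepts both gym conventions:
    - older gym: reset() -> obs, step() -> (obs, reward, done, info)
    - gym >= 0.26: reset() -> (obs, info),
      step() -> (obs, reward, terminated, truncated, info)
    """
    out = env.reset()
    # Newer gym returns (obs, info); older gym returns obs only.
    obs = out[0] if isinstance(out, tuple) else out
    total = 0.0
    for _ in range(max_steps):
        result = env.step(policy(obs))
        if len(result) == 5:
            obs, reward, terminated, truncated, _ = result
            done = terminated or truncated
        else:
            obs, reward, done, _ = result
        total += reward
        if done:
            break
    return total


def random_policy(obs):
    """Ignore the state and pick left (0) or right (1) at random."""
    return random.randrange(2)
```

With gym installed, passing the `env` created by `gym.make` together with `random_policy` gives a baseline score to compare a trained agent against.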

Agent Behaviors

Ver.1 1000 episodes

Ver.2 2000 episodes

Ver.3 4000 episodes

3) LunarLander

We aim to land the agent between the two flags on the moon's surface.

env = gym.make('LunarLander-v2')    # discrete
or
env = gym.make('LunarLanderContinuous-v2')  # continuous

Discrete version: the state is 8 float values and the action is one of 4 integers (0 to 3).
Continuous version: the state is 8 float values and the action is 2 float values (each in -1 ~ +1).
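The difference between the two action formats can be illustrated with a small sketch. Both helper names are hypothetical. Clipping is shown because the raw outputs of a policy can fall outside the -1 ~ +1 range.

```python
import random


def sample_discrete_action(n_actions=4):
    """Discrete version: a single integer in {0, 1, 2, 3}."""
    return random.randrange(n_actions)


def clip_continuous_action(action, low=-1.0, high=1.0):
    """Continuous version: 2 floats, each clipped to [low, high]."""
    if len(action) != 2:
        raise ValueError("expected 2 action values")
    return [max(low, min(high, float(a))) for a in action]
```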

Agent Behaviors

Ver.1 Discrete Actions

Ver.2 Continuous Actions

RL Methods

1) Policy gradient method

Based on this game, the agent is trained with vanilla policy gradient and deep Q-learning.
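As a rough illustration of vanilla policy gradient (REINFORCE), the sketch below computes discounted returns for one episode and the surrogate loss built from them. It uses plain Python numbers to stay self-contained; the function names and the default `gamma` are assumptions, not this repository's code.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = []
    running = 0.0
    # Accumulate from the last step backwards.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def policy_gradient_loss(log_probs, returns):
    """REINFORCE surrogate loss: -sum_t log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

In a PyTorch version, `log_probs` would be tensors produced by the policy network, and calling `backward()` on the loss would yield the policy gradient.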

2) Q learning

Basic Q-learning (without a memory buffer)
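The update rule can be sketched in tabular form, where each transition is used once as it arrives and no buffer is kept. This is a simplification for illustration: a deep Q-learning agent replaces the table with a network, and the name `q_update` and the default `alpha` and `gamma` are assumptions.

```python
def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """Apply one Q-learning update in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]
```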

3) A2C

Actor-Critic with Advantage function
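One common way to estimate the advantage is the one-step TD error, sketched below with plain numbers. Whether this repository uses the one-step form or another estimator is not stated, so treat the choice, the function names, and the default `gamma` as assumptions.

```python
def td_advantage(reward, value, next_value, done, gamma=0.99):
    """One-step advantage estimate: A = r + gamma * V(s') - V(s)."""
    target = reward + (0.0 if done else gamma * next_value)
    return target - value


def a2c_losses(log_prob, advantage):
    """Return (actor_loss, critic_loss) for a single transition.

    The actor is pushed towards actions with positive advantage; the
    critic minimizes the squared TD error.
    """
    actor_loss = -log_prob * advantage
    critic_loss = advantage ** 2
    return actor_loss, critic_loss
```

With an autograd framework, the advantage inside the actor loss is usually treated as a constant (detached), so that only the critic loss trains the value estimate.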

4) SAC

Soft Actor-Critic
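What makes the method "soft" is the entropy term in the critic target. The sketch below shows a commonly used form with two critics and a fixed temperature `alpha`; whether this repository uses two critics or a learned temperature is an assumption, as are the function name and the default values.

```python
def soft_q_target(reward, done, next_q1, next_q2, next_log_prob,
                  gamma=0.99, alpha=0.2):
    """Soft Bellman target for the critics.

    y = r + gamma * (min(Q1(s', a'), Q2(s', a')) - alpha * log pi(a' | s'))
    where a' is sampled from the current policy at the next state s'.
    """
    # Subtracting alpha * log pi adds an entropy bonus to the value.
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + (0.0 if done else gamma * soft_value)
```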


hanseul-jeong/RL
