PPO_super_mario

Play Super Mario using the Proximal Policy Optimization (PPO) method.
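For readers new to PPO, the core idea is the clipped surrogate objective: the probability ratio between the new and old policy is clipped so that a single update cannot move the policy too far. The snippet below is a minimal pure-Python sketch of that objective, not the code in src. The function name `clipped_surrogate` and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled step.
    advantages: advantage estimate for each sampled step.
    clip_eps:   clip range; 0.2 is an illustrative value, not this repo's setting.
    """
    if len(ratios) == 0 or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal in length")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum gives a pessimistic bound: the policy earns no
        # extra credit for pushing the ratio outside the clip range.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)
```

With a positive advantage, a ratio above `1 + clip_eps` is capped. With a negative advantage, the unclipped term is kept whenever it is the smaller one, so bad moves are still penalized in full.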

Setup

Tested on Windows 8.1, Windows 10, and Ubuntu 16.04.

Python 3.6, PyTorch >= 0.4.0.

Install the other required packages:

pip install -r requirements.txt

Saving video requires ffmpeg to be installed.
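One way to check for ffmpeg before starting a run is to look it up on the PATH with the standard library. This is a small sketch and is not part of run.py. The helper name `ffmpeg_available` is hypothetical.

```python
import shutil


def ffmpeg_available():
    """Return True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found; install it before saving video")
```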

Usage

# Train an agent from scratch
python run.py train

Download the pre-trained model from here.

# Play game with a trained model
python run.py play ./pre_trained_model/mario_10000-best.dat

Training takes about 5 hours on a single NVIDIA V100 GPU with 16 parallel game environments. By then the reward reaches about 200.0 and the episode length is about 275 steps. Once the model converges, gameplay looks like the example below.

Reference

Proximal Policy Optimization
http://blog.varunajayasiri.com/ml/ppo.html This great post helped me a lot. It showed me how to wrap a game the way DeepMind did with Atari games.
Game environment
PPO tutorial code This is a very clean project, friendly to newcomers learning RL algorithms. I borrowed part of the code from it.
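To give a feel for what "wrapping a game" in the Atari style means, here is a minimal sketch of two common pieces: repeating each action for several frames and stacking the most recent observations. It is not the wrapper used in src. It assumes a classic Gym-style interface (`reset()` returns an observation, `step(action)` returns `(obs, reward, done, info)`), and the names `SkipAndStack` and `CountingEnv` are hypothetical. The usual Atari preprocessing also converts frames to grayscale, resizes them, and takes a pixel-wise max over consecutive frames; those steps are left out to keep the example dependency-free.

```python
from collections import deque


class SkipAndStack:
    """Repeat each action for `skip` frames and stack the last `stack` observations."""

    def __init__(self, env, skip=4, stack=4):
        self.env = env
        self.skip = skip
        self.frames = deque(maxlen=stack)

    def reset(self):
        obs = self.env.reset()
        # Fill the stack with the first frame so the stacked shape is fixed.
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)
        return tuple(self.frames)

    def step(self, action):
        total_reward = 0.0
        done = False
        info = {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward  # sum rewards over the skipped frames
            if done:
                break
        self.frames.append(obs)
        return tuple(self.frames), total_reward, done, info


class CountingEnv:
    """Toy stand-in for a game: the observation is just a frame counter."""

    def __init__(self, length=10):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}


if __name__ == "__main__":
    env = SkipAndStack(CountingEnv(length=10), skip=4, stack=4)
    print(env.reset())
    print(env.step(0))
```

Stacking gives the policy short-term motion information (for example, which way Mario is moving), and frame skipping reduces how often the network has to be evaluated.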
