
Lightweight, Efficient and Stable DRL Implementation Using PyTorch




ElegantRL is designed to be lightweight, efficient, and stable, for researchers and practitioners.

  • Lightweight: the core code is fewer than 1,000 lines (see elegantrl/tutorial) and uses PyTorch (training), OpenAI Gym (environments), NumPy, and Matplotlib (plotting).

  • Efficient: its performance is comparable to that of Ray RLlib.

  • Stable: as stable as Stable Baselines3.

Currently, the following model-free deep reinforcement learning (DRL) algorithms are implemented:

  • DDPG, TD3, SAC, A2C, PPO, PPO(GAE) for continuous actions
  • DQN, DoubleDQN, D3QN for discrete actions

For background on these DRL algorithms, please check out the educational webpage OpenAI Spinning Up.

Check out the ElegantRL documentation.

Table of Contents

  • File Structure
  • Training Pipeline
  • Experimental Results
  • Requirements
  • Citation

File Structure

An agent in agent.py uses networks in net.py and is trained in run.py by interacting with an environment in env.py.

----- kernel files -----

  • elegantrl/net.py # Neural networks (a minimal illustrative sketch follows this list).
    • Q-Net
    • Actor network
    • Critic network
  • elegantrl/agent.py # RL algorithms.
    • AgentBase
  • elegantrl/run.py # Runs Demos 1-4.
    • Parameter initialization
    • Training loop
    • Evaluator
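For intuition, here is a minimal sketch of the kind of actor and critic networks that net.py defines. The class names, layer widths, and activations below are illustrative assumptions, not the library's exact code.

import torch
import torch.nn as nn

class Actor(nn.Module):
    """Illustrative policy network: maps a state to a continuous action in [-1, 1]."""
    def __init__(self, state_dim, action_dim, mid_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, mid_dim), nn.ReLU(),
            nn.Linear(mid_dim, mid_dim), nn.ReLU(),
            nn.Linear(mid_dim, action_dim), nn.Tanh(),  # squash into the action range
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Illustrative Q-network: maps a (state, action) pair to a scalar value."""
    def __init__(self, state_dim, action_dim, mid_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, mid_dim), nn.ReLU(),
            nn.Linear(mid_dim, mid_dim), nn.ReLU(),
            nn.Linear(mid_dim, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat((state, action), dim=1))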

----- utility files -----

  • elegantrl/envs/ # Gym envs or custom envs, including FinanceStockEnv.
    • gym_utils.py: a PreprocessEnv class for gym-environment modification (a minimal wrapper sketch follows this list).
    • Stock_Trading_Env: a self-created stock trading environment, as an example for user customization.
  • eRL_demo_BipedalWalker.ipynb # BipedalWalker-v2 in a Jupyter notebook.
  • eRL_demos.ipynb # Demos 1-4 in a Jupyter notebook; shows how to use the tutorial version and the advanced version.
  • eRL_demo_SingleFilePPO.py # Trains PPO from a single file; simpler than the tutorial version.
  • eRL_demo_StockTrading.py # A stock trading application demo.
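To illustrate the kind of gym-environment modification mentioned above, here is a hypothetical wrapper sketch (not the library's PreprocessEnv class). It records basic environment information and casts observations to float32, assuming the gym 0.17 API listed under Requirements.

import gym
import numpy as np

class SimplePreprocessEnv(gym.Wrapper):
    """Hypothetical preprocessing wrapper: stores env info and casts observations to float32."""
    def __init__(self, env):
        super().__init__(env)
        self.state_dim = env.observation_space.shape[0]
        self.if_discrete = isinstance(env.action_space, gym.spaces.Discrete)
        self.action_dim = (env.action_space.n if self.if_discrete
                           else env.action_space.shape[0])

    def reset(self):
        return np.asarray(self.env.reset(), dtype=np.float32)

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        return np.asarray(state, dtype=np.float32), float(reward), done, info

# Example usage: env = SimplePreprocessEnv(gym.make('LunarLanderContinuous-v2'))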

As a high-level overview, the relations among the files are as follows. An environment is initialized in env.py and an agent in agent.py. The agent is constructed with the actor and critic networks from net.py. In each training step in run.py, the agent interacts with the environment, generating transitions that are stored in a replay buffer. The agent then fetches a batch of transitions from the replay buffer to train its networks. After each update, an evaluator measures the agent's performance and saves the agent if the performance is good.
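Conceptually, the replay buffer mentioned above is a fixed-size array of transitions with uniform random sampling. Below is a minimal sketch of that idea; it is not the library's ReplayBuffer class, and the field layout is an assumption made for illustration.

import numpy as np

class SimpleReplayBuffer:
    """Minimal replay-buffer sketch: circular storage of transitions with uniform sampling."""
    def __init__(self, max_size, state_dim, action_dim):
        self.max_size, self.ptr, self.size = max_size, 0, 0
        self.state = np.zeros((max_size, state_dim), dtype=np.float32)
        self.action = np.zeros((max_size, action_dim), dtype=np.float32)
        self.reward = np.zeros((max_size, 1), dtype=np.float32)
        self.done = np.zeros((max_size, 1), dtype=np.float32)
        self.next_state = np.zeros((max_size, state_dim), dtype=np.float32)

    def append(self, state, action, reward, done, next_state):
        i = self.ptr
        self.state[i], self.action[i] = state, action
        self.reward[i], self.done[i] = reward, float(done)
        self.next_state[i] = next_state
        self.ptr = (self.ptr + 1) % self.max_size      # overwrite the oldest entry when full
        self.size = min(self.size + 1, self.max_size)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        return (self.state[idx], self.action[idx], self.reward[idx],
                self.done[idx], self.next_state[idx])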

Training Pipeline

Initialization:

  • args : the hyper-parameters.
  • env = PreprocessEnv() : creates an environment (in the OpenAI gym format).
  • agent = agent.XXX() : creates an agent for a DRL algorithm.
  • evaluator = Evaluator() : evaluates and stores the trained model.
  • buffer = ReplayBuffer() : stores the transitions.

Then, the training process is controlled by a while-loop:

  • agent.explore_env(…): the agent explores the environment within target steps, generates transitions, and stores them into the ReplayBuffer.
  • agent.update_net(…): the agent uses a batch from the ReplayBuffer to update the network parameters.
  • evaluator.evaluate_save(…): evaluates the agent's performance and keeps the trained model with the highest score.

The while-loop terminates when a stopping condition is met, e.g., reaching a target score or a maximum number of steps, or when it is stopped manually.
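To make the control flow above concrete, here is a small self-contained loop with the same shape (explore, store, update, evaluate). A random policy stands in for a real agent and a plain Python list stands in for the ReplayBuffer, so this is a structural illustration only, not ElegantRL's run.py.

import gym
import numpy as np

def explore_env(env, policy, target_steps):
    """Collect transitions with the current policy (here: random actions)."""
    trajectory, state = [], env.reset()
    for _ in range(target_steps):
        action = policy(state)
        next_state, reward, done, _ = env.step(action)
        trajectory.append((state, action, reward, done, next_state))
        state = env.reset() if done else next_state
    return trajectory

def evaluate(env, policy, episodes=4):
    """Average episode return of the current policy."""
    returns = []
    for _ in range(episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            state, reward, done, _ = env.step(policy(state))
            total += reward
        returns.append(total)
    return float(np.mean(returns))

env = gym.make('LunarLanderContinuous-v2')
policy = lambda state: env.action_space.sample()   # stand-in for agent.select_action
buffer, best_score = [], -np.inf

for step in range(4):                              # stand-in for the while-loop in run.py
    buffer.extend(explore_env(env, policy, target_steps=256))   # agent.explore_env(...)
    # agent.update_net(...) would sample batches from `buffer` and take gradient steps here
    score = evaluate(env, policy)                  # evaluator.evaluate_save(...)
    best_score = max(best_score, score)
    print(f'step {step}: score {score:.1f}, best {best_score:.1f}')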

Experimental Results

Results using ElegantRL:

  • LunarLanderContinuous-v2
  • LunarLanderTwinDelay3
  • BipedalWalkerHardcore-v2

BipedalWalkerHardcore is a difficult task in a continuous action space, and only a few RL implementations can reach the target reward.

Check out a video on bilibili: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.

Requirements

Necessary:
| Python 3.6+     |                                                               |
| PyTorch 1.6+    |                                                               |

Not necessary:
| NumPy 1.18+     | For the ReplayBuffer. NumPy is installed along with PyTorch.  |
| gym 0.17.0      | For envs. Gym provides tutorial envs for DRL training. (env.render() has a bug with gym==0.18 and pyglet==1.6; use gym==0.17.0 and pyglet==1.5.) |
| pybullet 2.7+   | For envs. We use PyBullet (free) as an alternative to MuJoCo (not free). |
| box2d-py 2.3.8  | For gym. Use pip install Box2D (instead of box2d-py).         |
| matplotlib 3.2  | For plots used to evaluate agent performance.                 |

pip3 install gym==0.17.0 pybullet Box2D matplotlib
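After installing, a quick sanity check is to import the main packages and print their versions (this snippet assumes only the packages listed above):

import torch, gym, numpy, matplotlib
print('torch     ', torch.__version__)
print('gym       ', gym.__version__)
print('numpy     ', numpy.__version__)
print('matplotlib', matplotlib.__version__)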

Citation:

To cite this repository:

@misc{rlalgorithms,
  author = {Xiao-Yang Liu and Zechu Li and Zhaoran Wang and Jiahao Zheng},
  title = {ElegantRL: A Lightweight and Stable Deep Reinforcement Learning Library},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/AI4Finance-LLC/ElegantRL}},
}
