This project is a close implementation of the research paper "Mobile Energy Transmitter Scheduling in Energy Harvesting IoT Networks using Deep Reinforcement Learning".
We have implemented the environment and reward function as proposed in the above-mentioned paper.
python --version - 3.6.9 (minimum)
# If Python 2 is the system default, use pip3 instead of pip
All required libraries and their versions are listed in setup.py.
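For reference, a setup.py with pinned dependencies typically looks like the sketch below; the package names and versions are illustrative placeholders, not the exact contents of this repo's setup.py (tensorboard==2.10.0 is the only version mentioned elsewhere in this README).

```python
# Illustrative sketch only: how pinned dependencies are usually declared in setup.py.
# Package names/versions below are placeholders, not the repo's actual requirements.
from setuptools import setup, find_packages

setup(
    name="rl_iot",
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "tensorboard==2.10.0",  # version mentioned later in this README
        "numpy",                # placeholder entry
    ],
)
```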
git clone git@github.com:rrustagi20/rl_iot.git
cd rl_iot
python3 -m venv .env
source .env/bin/activate
pip3 install -e .
If the installation fails with an error like:
Failed building wheel for grpcio
Running setup.py clean for grpcio
Failed to build grpcio
then run:
pip3 install tensorboard==2.10.0
Similarly, if any other package fails to build on your system, try installing it separately as shown above, using the version number given in the setup.py file.
There are two agents implemented in this module:
- PPO (Proximal Policy Optimisation) Algorithm
- DDPG (Deep Deterministic Policy Gradient) Algorithm
Source the virtual environment first (venv)
$ cd rl_iot
$ source .env/bin/activate
To train the PPO algorithm, two arguments can be passed while running the agent:
- --logname (name under which the training logs are saved)
- --model (name under which the trained model is saved)
Both arguments have default values, so they can be omitted, but make sure that file names do not get overwritten.
$ python3 ppo_main.py --logname=PPO_TEST1 --model=1
To train the DDPG algorithm:
$ python3 ddpg_main.py --logname=DDPG_TEST1 --model=1
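For orientation, the --logname and --model flags above are typically handled with argparse as in the sketch below; the default values shown are placeholders and may not match the ones used in ppo_main.py / ddpg_main.py.

```python
# Sketch (assumed, not the repo's exact code) of parsing --logname / --model.
import argparse

parser = argparse.ArgumentParser(description="Train an RL agent (PPO or DDPG)")
parser.add_argument("--logname", type=str, default="PPO_DEFAULT",
                    help="directory name under which training logs are written")
parser.add_argument("--model", type=str, default="0",
                    help="name (or index) under which the trained model is saved")
args = parser.parse_args()

print(f"Logging to: {args.logname}, saving model as: {args.model}")
```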
Two reward formulations are implemented:
a. The above-mentioned paper's MDP approach: a coupled reward optimisation problem
b. A novel approach: a decoupled reward system with external target guidance
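Purely as an illustration of the structural difference between the two formulations (the actual reward terms in this repo follow the paper and may differ): a coupled reward folds competing objectives into one scalar, while a decoupled reward keeps separate terms and guides one of them toward an external target.

```python
# Illustrative only: structural difference between coupled and decoupled rewards.
# The reward terms used in this repository may be different.
def coupled_reward(energy_delivered, travel_cost, alpha=0.5):
    # single scalar that mixes the competing objectives
    return alpha * energy_delivered - (1 - alpha) * travel_cost

def decoupled_reward(energy_delivered, travel_cost, target_energy):
    # separate terms; an external target guides the energy term
    energy_term = -abs(target_energy - energy_delivered)
    movement_term = -travel_cost
    return energy_term, movement_term
```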
To visualise the rewards of already trained models:
We use TensorBoard to visualise the training logs. Run the command below in another terminal, in parallel with the training agent. The command prints a localhost URL; open the link and refresh it as the model trains to see the different training metrics.
$ tensorboard --logdir=ppo_coupled_approach_reward # Visualising the paper-implementation results of the already trained PPO algorithm
$ tensorboard --logdir=ddpg_coupled_approach_reward # Visualising the paper-implementation results of the already trained DDPG algorithm
$ tensorboard --logdir=ppo_decoupled_approach_reward # Visualising the novel-approach results of the already trained PPO algorithm
$ tensorboard --logdir=${LOG_NAME} # Visualising the training logs of your own trained run
# Here replace ${LOG_NAME} with the actual value passed to the --logname argument
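For context, the log directories above are simply TensorBoard event-file folders. A minimal sketch of how scalar metrics end up there is shown below, assuming TensorFlow 2.x-style tf.summary logging; the repo's actual logging code may differ.

```python
# Minimal sketch, assuming tf.summary (TF 2.x) logging; not the repo's exact code.
import tensorflow as tf

log_name = "PPO_TEST1"                 # same value passed as --logname
writer = tf.summary.create_file_writer(log_name)

with writer.as_default():
    for step in range(100):
        episode_reward = float(step)   # placeholder metric
        tf.summary.scalar("episode_reward", episode_reward, step=step)
writer.flush()
# Then run: tensorboard --logdir=PPO_TEST1
```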
- Rahul Rustagi (rustagirahul24@gmail.com)
- Chinmay Joshi