# Alpha Worm

**About the project**

This is a student project by a group of four LMU students from the computer science department. The objective of the project is to implement one (or more) RL approaches to solve a Unity ML-Agents ([https://github.com/Unity-Technologies/ml-agents](https://github.com/Unity-Technologies/ml-agents)) domain. Here we consider the **Worm Domain** from Unity.

The following notebook shows how to interact with the DDPG algorithm.

---

**Some notes/issues we faced:**
* **You can't use the Windows Subsystem for Linux!**
* The executable/environment has to be built for the platform on which you run it.
* You have to install `tensorflow==1.15.3` (the latest TensorFlow version doesn't work).
* Make sure that the environments are closed in Python (`env.close()`) after execution! If they are not closed, the Unity window will probably freeze.
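
One way to make sure `env.close()` is always called, even when training raises an exception, is to wrap the environment in `contextlib.closing`. The following is a minimal sketch: `DummyEnv` and `run_and_close` are hypothetical names used only for illustration, and the sketch assumes nothing about the environment except that it has a `close()` method.

```python
from contextlib import closing


class DummyEnv:
    """Hypothetical stand-in for a Unity or gym environment (only `close` matters here)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_and_close(env, work):
    """Run `work(env)` and always close the environment, even if `work` raises."""
    with closing(env):
        return work(env)


env = DummyEnv()
result = run_and_close(env, lambda e: "done")
```

In the notebook, `work` would be a function that calls `trainer.train(env, name=name)`.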

**About the Python-Unity connection:**
* You can use a standalone environment build (i.e. an .exe for Windows or an .x86_64).
* Alternatively, you can also interact through Python with a Unity environment that is open in the Unity Editor.
    * Just pass `None` as the file name: `env = UnityEnvironment(file_name=None)`
    * You have to press play in the editor to interact with the environment.
    * Benefit: Here you have console outputs!
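
The choice between a standalone build and the Unity Editor can be sketched as a small helper that returns the `file_name` argument. The helper name `pick_file_name` is hypothetical, and the build paths are the ones used further down in this notebook; adjust them to your own builds.

```python
import platform

# Build paths as used in this notebook; adjust them to your own builds.
BUILD_PATHS = {
    "Windows": "envs/worm_dynamic_one_agent/win/UnityEnvironment",
    "Linux": "./envs/worm_dynamic_one_agent/linux/worm_dynamic",
}


def pick_file_name(use_editor=False, system=None):
    """Return the `file_name` to pass on: None for the editor, else a build path."""
    if use_editor:
        # None tells the Python side to connect to the open Unity Editor.
        return None
    system = system or platform.system()
    if system not in BUILD_PATHS:
        raise ValueError(f"No build available for platform {system!r}")
    return BUILD_PATHS[system]


editor_file_name = pick_file_name(use_editor=True)
windows_file_name = pick_file_name(system="Windows")
```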



## Imports

In [None]:
import torch

torch.__version__

In [None]:
# It is important to add the parent directory to the Python path to import modules from our package.
import sys
sys.path.insert(0, "../")
# To reference directories correctly (e.g. when saving files), change the working directory of this notebook.
# Important: If you re-run this cell, restart the kernel first. Otherwise you move up one more directory each time.
import os
os.chdir("../")

####################
# Default Packages #
####################
import pickle
from pathlib import Path
from datetime import datetime

##################
# ML/RL Packages #
##################
import gym
from gym import wrappers

################
# Our Packages #
################
from trainer import DDPGTrainer, TD3Trainer
from utils.mlagent_utils import get_env
from config.config import log, logFormatter

---

## Train: Pendulum Domain (or any other gym environment)

The tracked results and all other tracked files are stored in `models/CURRENT_DATE/NAME/` (relative to the project root).
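
The directory layout can be sketched with `pathlib` alone. The helper name `make_run_dir` is hypothetical and not part of the project; the demo writes into a temporary directory instead of `models/` so that it leaves no files behind.

```python
import tempfile
from datetime import date
from pathlib import Path


def make_run_dir(name, root="models", day=None):
    """Create and return `<root>/<YYYY-MM-DD>/<name>/`."""
    day = day or date.today()
    folder = Path(root) / str(day) / name
    folder.mkdir(parents=True, exist_ok=True)
    return folder


with tempfile.TemporaryDirectory() as root:
    folder = make_run_dir("DDPG-Pendulum-v0", root=root, day=date(2020, 7, 3))
    relative = folder.relative_to(root).as_posix()
    exists = folder.is_dir()
```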

In [None]:
######################
# Set Training Infos #
######################
env_name = "Pendulum-v0"
name = f"DDPG-{env_name}"

###########################################
# Ensure that the path exists for logging #
###########################################
folder = Path(f'models/{datetime.now().date()}/{name}/')
folder.mkdir(parents=True, exist_ok=True)

# Store the logs right next to the results.
fh = log.FileHandler(f'models/{datetime.now().date()}/{name}/{datetime.now().date()}.log')
fh.setFormatter(logFormatter)
log.getLogger().addHandler(fh)

######################
# Create Environment #
######################
env = gym.make(env_name)

trainer = DDPGTrainer()

log.info(f"Start DDPG training ({env_name})...")

# If you want to customize the training.
# trainer.config["episodes"] = 5
# trainer.config["training_steps"] = 5
# trainer.config["evaluation_steps"] = 5 # To disable evaluation set to 0
# trainer.config["evaluation_lim"] = 10

trainer.train(env, name=name, render=False)

log.info("Training done!")

---

## Train: Worm Domain

The tracked results and all other tracked files are stored in `models/CURRENT_DATE/NAME/` (relative to the project root).

To plot the results after training, you can use the internally stored results:
* E.g. `trainer.training_rewards_df.plot()`

In [None]:
######################
# Set Training Infos #
######################
name = "DDPG-AlphaWorm"

###########################################
# Ensure that the path exists for logging #
###########################################
folder = Path(f'models/{datetime.now().date()}/{name}/')
folder.mkdir(parents=True, exist_ok=True)

# Store the logs right next to the results.
fh = log.FileHandler(f'models/{datetime.now().date()}/{name}/{datetime.now().date()}.log')
fh.setFormatter(logFormatter)
log.getLogger().addHandler(fh)

env = "envs/worm_dynamic_one_agent/win/UnityEnvironment"  # For Windows
# env = "./envs/worm_dynamic_one_agent/linux/worm_dynamic"  # For Linux
env = get_env(env, True)

trainer = DDPGTrainer()

log.info("Start DDPG training (WormDomain)...")

# If you want to customize the training.
# trainer.config["episodes"] = 5
# trainer.config["training_steps"] = 10
# trainer.config["evaluation_steps"] = 1 # To disable evaluation set to 0
# trainer.config["evaluation_lim"] = 10
# trainer.config["explore_threshold"] = 0.1

trainer.train(env, name=name)

log.info("Training done!")

In [None]:
env.close()

---

## HPO Training: Worm Domain

Optuna stores the HPO results and states in a so-called `study`. Such a study can be used to evaluate and view the results during HPO. For example:

* `study.best_params` for getting the parameters of the best-performing model

For the full API, see [https://optuna.readthedocs.io/en/stable/](https://optuna.readthedocs.io/en/stable/)


In [None]:
######################
# Set Training Infos #
######################
name = "DDPG-AlphaWorm-HPO"

###########################################
# Ensure that the path exists for logging #
###########################################
folder = Path(f'models/{datetime.now().date()}/{name}/')
folder.mkdir(parents=True, exist_ok=True)

# Store the logs right next to the results.
fh = log.FileHandler(f'models/{datetime.now().date()}/{name}/{datetime.now().date()}.log')
fh.setFormatter(logFormatter)
log.getLogger().addHandler(fh)

env = "envs/worm_dynamic_one_agent/win/UnityEnvironment"  # For Windows
# env = "./envs/worm_dynamic_one_agent/linux/worm_dynamic"  # For Linux
env = get_env(env, False)

trainer = DDPGTrainer()

# Important: `start_training` is only needed for HPO.
# Important: With default=True and trials > 1, you train multiple times on the same parameters!
# To use the search space, set default=False.
study = trainer.start_training(env, trials=2, render=False, name=name, default=True)

In [None]:
##########################
# Plotting study results #
##########################
import optuna
# optuna.visualization.plot_optimization_history(study)
# optuna.visualization.plot_intermediate_values(study)
# optuna.visualization.plot_parallel_coordinate(study)

---

## Load already stored agents

By default, agents are dumped using pickle in the model directory.

**Important**: Pickle dumps the object in a specific context, and we have to make sure that the context matches when loading the dumped object! See the comments below.

By the way: this procedure also works if you want to review a specific study again.
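
The "context" matters because pickle stores only a reference to the class (module and class name), not its code. The following sketch demonstrates this with a throwaway module; `tiny_agent_demo` and `TinyAgent` are hypothetical names that stand in for the project's agent code.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module that stands in for the project's agent code.
MODULE_SOURCE = '''
class TinyAgent:
    def __init__(self, gain):
        self.gain = gain

    def act(self, observation):
        return self.gain * observation
'''

with tempfile.TemporaryDirectory() as package_dir:
    Path(package_dir, "tiny_agent_demo.py").write_text(MODULE_SOURCE)

    # Make the module importable, like sys.path.insert(0, "../") in the notebook.
    sys.path.insert(0, package_dir)
    importlib.invalidate_caches()
    module = importlib.import_module("tiny_agent_demo")
    payload = pickle.dumps(module.TinyAgent(gain=2.0))

    # Simulate a fresh kernel without the path entry: the class cannot be resolved.
    sys.path.remove(package_dir)
    del sys.modules["tiny_agent_demo"]
    importlib.invalidate_caches()
    try:
        pickle.loads(payload)
        load_without_path_failed = False
    except ModuleNotFoundError:
        load_without_path_failed = True

    # With the path entry restored, loading works again.
    sys.path.insert(0, package_dir)
    importlib.invalidate_caches()
    restored = pickle.loads(payload)

    # Clean up the demo module and path entry.
    sys.path.remove(package_dir)
    sys.modules.pop("tiny_agent_demo", None)
```

This is why the cell below adds the parent directory to the Python path before calling `pickle.load`.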

In [None]:
# Important: Pickle dumps the object in a specific context. Therefore it is very important to reconstruct this context.
# This simply means: make sure that you add the parent directory to the Python path.
import sys
sys.path.insert(0, "../")
import pickle

# Example path! Replace it with your own path.
with open("../models/2020-07-03/DDPG-Pendulum-2/ddpg_training.pickle", "rb") as f:
    ddpg_agent = pickle.load(f)

# Use the trained agent! E.g. by running it on a specific environment.
# ddpg_agent.run(env, steps=1000)