# Single Goal Environment

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mhtb32/tl-env/blob/master/scripts/single_goal.ipynb)


In this notebook, we train an agent to reach a goal in the `SingleGoal-v0` environment of `tl_env` using reinforcement learning.

## Warm-up
First, we import the required packages and create the environment.

```python
import gym
from stable_baselines.sac.policies import MlpPolicy
from stable_baselines import SAC

# noinspection PyUnresolvedReferences
import tl_env

env = gym.make('tl_env:SingleGoal-v0')
```

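`gym.make` returns an object with the classic Gym interface: `reset()` returns the initial observation and `step(action)` returns an `(obs, reward, done, info)` tuple, which is the contract the test loop at the end of this notebook relies on. As a dependency-free illustration of that contract, the sketch below defines a hypothetical `ToyGoalEnv` (a point on a line that must reach a goal) and a `run_episode` rollout helper. The dynamics, the reward shaping, and the `reached` info key are assumptions and do not describe `SingleGoal-v0`.

```python
import random
from typing import Callable, Dict, Tuple


class ToyGoalEnv:
    """Hypothetical stand-in for a goal-reaching env; NOT tl_env's SingleGoal-v0.

    The agent is a point on a line that moves by `action` (clipped to [-1, 1])
    and the episode ends when it gets close to a fixed goal or runs out of steps.
    """

    def __init__(self, goal: float = 5.0, max_steps: int = 50, seed: int = 0):
        self.goal = goal
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.pos = 0.0
        self.steps = 0

    def reset(self) -> float:
        # Start at a random position in [-1, 1]; the observation is the position.
        self.pos = self.rng.uniform(-1.0, 1.0)
        self.steps = 0
        return self.pos

    def step(self, action: float) -> Tuple[float, float, bool, Dict]:
        action = max(-1.0, min(1.0, action))  # clip like a Box(-1, 1) action space
        self.pos += action
        self.steps += 1
        distance = abs(self.goal - self.pos)
        reached = distance < 0.5
        # Assumed reward: bonus on success, small distance penalty otherwise.
        reward = 1.0 if reached else -0.1 * distance
        done = reached or self.steps >= self.max_steps
        return self.pos, reward, done, {"reached": reached}


def run_episode(env: ToyGoalEnv, policy: Callable[[float], float]) -> Tuple[float, bool]:
    """Roll out one episode with the same loop shape as the notebook's test cell."""
    obs, done = env.reset(), False
    total_reward, info = 0.0, {}
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total_reward += reward
    return total_reward, info.get("reached", False)


if __name__ == "__main__":
    env = ToyGoalEnv()

    def greedy(obs: float) -> float:
        # Hand-written policy: always move toward the goal at full speed.
        return 1.0 if obs < env.goal else -1.0

    episode_return, reached = run_episode(env, greedy)
    print(f"return={episode_return:.3f} reached={reached}")
```

A trained SAC model plays the role of `greedy` here: the loop shape stays the same, only the source of the action changes.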


Next, we specify where the trained model will be saved.

```python
from pathlib import Path

# Save models in an `out` directory that sits next to the current working directory.
out_dir = Path.cwd().parent / 'out'
out_dir.mkdir(exist_ok=True)
save_path = out_dir / 'sac_single_goal'
```

## Training
Now we train the agent using the Soft Actor-Critic (SAC) algorithm.
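For reference, SAC is an off-policy actor-critic method that trains a stochastic policy $\pi$ to maximize expected return plus an entropy bonus weighted by a temperature $\alpha$. This is the maximum-entropy objective from the original SAC paper, where $\rho_\pi$ denotes the state-action distribution induced by $\pi$:

$$
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big]
$$

A larger $\alpha$ favors more random, exploratory behavior. In stable_baselines, $\alpha$ is set through the `ent_coef` argument of `SAC`, whose default `'auto'` learns it during training; the training cell keeps that default.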

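```python
# SAC with an MLP policy; the replay buffer holds at most 5000 transitions.
model = SAC(MlpPolicy, env, verbose=1, buffer_size=5000)
# Train for 30000 environment steps, logging every 200 episodes.
model.learn(total_timesteps=30000, log_interval=200)
model.save(str(save_path))
```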
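`buffer_size=5000` caps SAC's replay buffer, the store of past transitions from which training minibatches are sampled; once it is full, the oldest transitions are overwritten. The sketch below illustrates that behavior with a hypothetical `FifoReplayBuffer` ring buffer over scalar transitions. Its name and interface are assumptions, not stable_baselines' own `ReplayBuffer` API.

```python
import random
from typing import List, Tuple

# One transition: (obs, action, reward, next_obs, done). Scalars keep the sketch simple.
Transition = Tuple[float, float, float, float, bool]


class FifoReplayBuffer:
    """Hypothetical ring buffer illustrating what SAC's `buffer_size` bounds."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.storage: List[Transition] = []
        self.next_idx = 0  # slot to write next; once full, this is the oldest entry
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition  # overwrite the oldest transition
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling with replacement over the stored transitions.
        return [self.rng.choice(self.storage) for _ in range(batch_size)]

    def __len__(self) -> int:
        return len(self.storage)


if __name__ == "__main__":
    buf = FifoReplayBuffer(capacity=5000)
    for t in range(30000):  # same scale as total_timesteps in the training cell
        buf.add((float(t), 0.0, 0.0, float(t + 1), False))
    print(len(buf), min(tr[0] for tr in buf.storage))
```

With `total_timesteps=30000`, a 5000-transition buffer means that, once it fills up, updates only ever see the most recent sixth of the agent's experience.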

## Testing
Now we run the trained agent for a few episodes to see how it performs. We first define a helper function that displays the recorded episode videos inline:

```python
import base64

from IPython import display as ipythondisplay
from pyvirtualdisplay import Display
from gym.wrappers import Monitor
from tqdm.notebook import trange

# Start a virtual (headless) display so the environment can render, e.g. on Colab.
display = Display(visible=0, size=(1400, 900))
display.start()


def show_video():
    """Embed every MP4 recorded in ./video into the notebook as inline HTML."""
    html = []
    for mp4 in sorted(Path("video").glob("*.mp4")):
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append('''<video alt="{}" autoplay
                      loop controls style="height: 400px;">
                      <source src="data:video/mp4;base64,{}" type="video/mp4" />
                 </video>'''.format(mp4, video_b64.decode('ascii')))
    # noinspection PyTypeChecker
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))
```
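`show_video` works by base64-encoding each MP4 file and inlining it as a `data:` URI inside an HTML `<video>` element, so the notebook needs no separate file server to play it. The hypothetical `video_tag` helper below isolates that step as a pure, standard-library-only function, which makes it easy to check without a display or any recorded files.

```python
import base64
from pathlib import Path
from typing import Union


def video_tag(name: Union[str, Path], mp4_bytes: bytes, height_px: int = 400) -> str:
    """Build an autoplaying, looping <video> element with the MP4 inlined as base64."""
    b64 = base64.b64encode(mp4_bytes).decode("ascii")
    return (
        f'<video alt="{name}" autoplay loop controls style="height: {height_px}px;">'
        f'<source src="data:video/mp4;base64,{b64}" type="video/mp4" />'
        "</video>"
    )


if __name__ == "__main__":
    # Fake bytes stand in for a real recording here.
    print(video_tag("demo.mp4", b"\x00\x01fake-mp4-bytes"))
```

Base64 turns every 3 bytes into 4 characters, so inlined videos are about a third larger than the files on disk; long recordings can make the notebook heavy.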

Finally, we run the trained policy for three test episodes and record a video of each:

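```python
# Record every episode to ./video (force=True clears recordings from earlier runs).
env = Monitor(env, './video', force=True, video_callable=lambda episode: True)
for _ in trange(3, desc="Test episodes"):
    obs, done = env.reset(), False
    # Let the environment push its own rendered frames to the video recorder.
    env.unwrapped.automatic_rendering_callback = env.video_recorder.capture_frame
    while not done:
        # Use the mean action (no exploration noise) for evaluation.
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
env.close()
show_video()
```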
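`model.predict` samples an action from the learned policy unless `deterministic=True` is passed. In SAC the policy is a Gaussian whose samples are squashed by `tanh` into a bounded range, and the deterministic action is `tanh` of the mean; stable_baselines then maps it to the environment's action bounds. The scalar sketch below illustrates both modes, with hypothetical `mu` and `log_std` values standing in for the policy network's outputs for one observation.

```python
import math
import random


def sac_action(mu: float, log_std: float, deterministic: bool, rng: random.Random) -> float:
    """Scalar sketch of SAC's tanh-squashed Gaussian action selection."""
    if deterministic:
        return math.tanh(mu)  # squashed mean action, in (-1, 1)
    std = math.exp(log_std)
    pre_tanh = mu + std * rng.gauss(0.0, 1.0)  # reparameterized Gaussian sample
    return math.tanh(pre_tanh)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_std = 0.5, -1.0  # hypothetical network outputs for one observation
    print("deterministic:", sac_action(mu, log_std, True, rng))
    print("stochastic:", [round(sac_action(mu, log_std, False, rng), 3) for _ in range(3)])
```

Stochastic sampling drives exploration during training; for evaluation, the deterministic (mean) action is the usual choice because it removes that noise.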

