# Capture the Flag (RL - Policy Gradient)

- Seung Hyun Kim
- skim449@illinois.edu

## Implementation Details

- Actor-critic
- On-Policy

### Sampling
- [ ] Mini-batch to update 'average' gradient
- [ ] Experience Replay for Random Sampling
- [ ] Importance Sampling
    
### Deterministic Policy Gradient
- [ ] DDPG
- [x] MADDPG

### Stability and Reducing Variance
- [x] Gradient clipping
- [ ] Normalized Reward/Advantage
- [x] Target Network
- [ ] TRPO
- [ ] PPO
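
The unchecked "Normalized Reward/Advantage" item usually means standardizing the advantages within a batch before the policy update. A minimal NumPy sketch, assuming per-batch standardization; the function name `normalize_advantages` and the `eps` value are illustrative and are not part of this notebook:

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-8):
    """Shift a batch of advantages to zero mean and scale it to unit variance.

    `eps` (an assumed value) guards against division by zero when every
    advantage in the batch is identical.
    """
    adv = np.asarray(advantages, dtype=np.float64)
    return (adv - adv.mean()) / (adv.std() + eps)
```

If adopted, a call like this would sit just before the advantages are handed to the policy update.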

### Multiprocessing
- [ ] Synchronous Training (A2C)
- [x] Asynchronous Training (A3C)

### Applied Training Methods:
- [ ] Self-play
- [ ] Batch Policy

## Notes

- This notebook includes:
    - Building the structure of the policy-driven network.
    - Training with or without rendering.
    - A saver that writes the model and weights to the ./model directory.
    - A writer that records the necessary data to ./logs.

- This notebook does not include:
    - Simulation with the RL policy
        - The simulation can be run using policy_RL.py
    - cap_test.py is changed appropriately.
    
## References:
- https://github.com/awjuliani/DeepRL-Agents/blob/master/Vanilla-Policy.ipynb (source)
- https://www.youtube.com/watch?v=PDbXPBwOavc
- https://github.com/lilianweng/deep-reinforcement-learning-gym/blob/master/playground/policies/actor_critic.py (source)
- https://github.com/spro/practical-pytorch/blob/master/reinforce-gridworld/reinforce-gridworld.ipynb
- https://arxiv.org/pdf/1706.02275.pdf

## TODO:

- Research on '_bootstrap_' instead of end-reward
- Record method in network

In [1]:
!rm -rf logs/B4R4_nonzero_MAA3C/ model/B4R4_nonzero_MAA3C

In [2]:
TRAIN_NAME='B4R4_nonzero_MAA3C'
LOG_PATH='./logs/'+TRAIN_NAME
MODEL_PATH='./model/' + TRAIN_NAME
GPU_CAPACITY=0.25 # fraction of GPU memory this process may use (0.25 = 25%)

In [3]:
import os

import signal
import multiprocessing
import threading

import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.layers as layers
from tensorflow.python.client import device_lib
import matplotlib.pyplot as plt
%matplotlib inline

import time
import gym
import gym_cap
import gym_cap.envs.const as CONST
import numpy as np
import random
import math

# Modules that can be used to generate a policy
import policy.random
import policy.roomba
import policy.policy_RL
import policy.zeros

# Data Processing Module
from utility.dataModule import one_hot_encoder
from utility.utils import MovingAverage as MA
from utility.utils import Experience_buffer, discount_rewards
from utility.vec_env import SubprocVecEnv

from network.MAActorCritic import MAActorCritic as AC

%load_ext autoreload
%autoreload 2

## Hyperparameters

In [4]:
# Replay Variables
total_episodes= 120000
max_ep = 150
update_frequency = 64
entropy_beta = 0.01

# Saving Related
save_network_frequency = 1200
save_stat_frequency = 128
moving_average_step = 128

# Training Variables
decay_lr = False
lr_a = 1e-4
lr_c = 1e-4
lr_a_gamma = 0.9995
lr_c_gamma = 0.9995
lr_a_final = 1e-5
lr_c_final = 1e-4
lr_a_decay_step = int(math.log(lr_a_final/lr_a) / math.log(lr_a_gamma))
lr_c_decay_step = int(math.log(lr_c_final/lr_c) / math.log(lr_c_gamma))

gamma = 0.99 # discount_factor

# Env Settings
MAP_SIZE = 20
VISION_RANGE = 9 # Determines the network input size (2*VISION_RANGE+1 per side)
VISION_dX, VISION_dY = 2*VISION_RANGE+1, 2*VISION_RANGE+1
in_size = [None,VISION_dX,VISION_dY,6]
NENV = multiprocessing.cpu_count() # one worker per CPU core

# Asynch Settings
global_scope = 'global'
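
The decay-step values above come from solving `lr * gamma**n = lr_final` for `n`. A small standalone sketch of that arithmetic follows; the helper name `decay_steps` is illustrative, and how the network class consumes the step count is not shown in this notebook:

```python
import math

def decay_steps(lr_start, lr_final, decay_rate):
    """Number of multiplicative decay steps n such that
    lr_start * decay_rate**n reaches lr_final (truncated to an integer)."""
    return int(math.log(lr_final / lr_start) / math.log(decay_rate))

# Actor settings from the cell above: 1e-4 decayed to 1e-5 at 0.9995 per step.
actor_steps = decay_steps(1e-4, 1e-5, 0.9995)
# Critic settings: the initial and final rates are equal.
critic_steps = decay_steps(1e-4, 1e-4, 0.9995)
print(actor_steps, critic_steps)
```

With these hyperparameters the actor needs 4604 steps, while the critic gets 0 because `lr_c_final` equals `lr_c`. Both values matter only when `decay_lr` is enabled.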

## Environment Setting

In [5]:
if not os.path.exists(MODEL_PATH):
    os.makedirs(MODEL_PATH)
    
# Create a directory for the training logs
if not os.path.exists(LOG_PATH):
    os.makedirs(LOG_PATH)

In [6]:
action_space = 5
n_agent = 4

## A3C Network Structure

<img src="https://cdn-images-1.medium.com/max/1600/1*YtnGhtSAMnnHSL8PvS7t_w.png" width="450">

- The network is defined in network.MAActorCritic

## Environments

<img src="https://cdn-images-1.medium.com/max/1600/1*Hzql_1t0-wwDxiz0C97AcQ.png" width="450">
<img src="https://lilianweng.github.io/lil-log/assets/images/MADDPG.png" width="450">

In [7]:
global_rewards = MA(moving_average_step)
global_ep_rewards = MA(moving_average_step)
global_length = MA(moving_average_step)
global_succeed = MA(moving_average_step)
global_episodes = 0

# Launch the session
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=GPU_CAPACITY, allow_growth=True)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
#sess = tf.Session()
progbar = tf.keras.utils.Progbar(total_episodes,interval=1)
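
The `MA` trackers above are used in two ways in this notebook: values are pushed with `append(value)`, and the current mean is read by calling the object. `utility.utils.MovingAverage` itself is not shown here, so the following is only a sketch of a class with that interface, assuming a fixed-size window:

```python
from collections import deque

class MovingAverage:
    """Fixed-window running mean with the append()/call interface used above.

    This is an assumed stand-in, not the project's actual implementation.
    """
    def __init__(self, size):
        # deque(maxlen=...) silently drops the oldest value once full
        self.window = deque(maxlen=size)

    def append(self, value):
        self.window.append(value)

    def __call__(self):
        # Return 0.0 for an empty window rather than dividing by zero
        return sum(self.window) / len(self.window) if self.window else 0.0
```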

## Worker

In [8]:
class Worker(object):
    def __init__(self, name, globalAC, sess, global_step=0):
        # Initialize Environment worker
        self.env = gym.make("cap-v0").unwrapped
        self.name = name
        
        # Create AC Network for Worker
        self.AC = AC(in_size=in_size,
                     action_size=action_space,
                     num_agent=n_agent,
                     decay_lr=decay_lr,
                     lr_actor=lr_a,
                     lr_critic=lr_c,
                     grad_clip_norm=0,
                     scope=self.name,
                     global_step=global_step,
                     initial_step=0,
                     entropy_beta=entropy_beta,
                     sess=sess,
                     globalAC=globalAC)
        
        self.sess=sess
        
    def work(self, saver, writer):
        global global_rewards, global_ep_rewards, global_episodes, global_length, global_succeed
        total_step = 1
                
        # loop
        with self.sess.as_default(), self.sess.graph.as_default():
            while not coord.should_stop() and global_episodes < total_episodes:
                s0 = self.env.reset(map_size=MAP_SIZE, policy_red=policy.zeros.PolicyGen(self.env.get_map, self.env.get_team_red))
                #s0 = one_hot_encoder(s0, self.env.get_team_blue, VISION_RANGE)
                s0 = one_hot_encoder(self.env._env, self.env.get_team_blue, VISION_RANGE)
                v0 = self.AC.get_critic(s0, np.arange(n_agent))
                
                # parameters 
                ep_r = 0 # Episodic Reward
                prev_r = 0
                was_alive = [ag.isAlive for ag in self.env.get_team_blue]

                indv_history = [ [] for _ in range(len(self.env.get_team_blue)) ]
                for step in range(max_ep+1):
                    
                    a = self.AC.get_action(s0, np.arange(n_agent))
                    
                    s1, rc, d, _ = self.env.step(a)
                    #s1 = one_hot_encoder(s1, self.env.get_team_blue, VISION_RANGE)
                    s1 = one_hot_encoder(self.env._env, self.env.get_team_blue, VISION_RANGE)

                    r = (rc - prev_r-1)
                    #d = (True if d or step == max_ep else False)
                    if step == max_ep and not d:
                        r = -100
                        rc = -100
                        d = True

                    r /= 100.0
                    ep_r += r

                    if d:
                        v1 = [0.0 for _ in range(len(self.env.get_team_blue))]
                    else:
                        v1 = self.AC.get_critic(s1, np.arange(n_agent))

                    # push to buffer
                    for idx, agent in enumerate(self.env.get_team_blue):
                        if was_alive[idx]:
                            indv_history[idx].append([s0[idx],a[idx],r,v0[idx]])

                    if total_step % update_frequency == 0 or d:
                        #aloss, closs, etrpy = self.train(indv_history, sess, v1)
                        summary_str = self.train(indv_history, sess, v1)
                        #summary_str = self.train(indv_history, sess, v1)
                        #if d and global_episodes % save_stat_frequency == 0 and global_episodes != 0:
                            # record loss
                            #writer.add_summary(summary_str, global_episodes)
                        indv_history = [ [] for _ in range(len(self.env.get_team_blue)) ]

                    # Iteration
                    prev_r = rc
                    was_alive = [ag.isAlive for ag in self.env.get_team_blue]
                    s0=s1
                    total_step += 1
                    v0 = v1

                    if d:
                        global_ep_rewards.append(ep_r)
                        global_rewards.append(rc)
                        global_length.append(step)
                        global_succeed.append(self.env.blue_win)
                        global_episodes += 1
                        self.sess.run(global_step_next)
                        progbar.update(global_episodes)
                        if global_episodes % save_stat_frequency == 0 and global_episodes != 0:
                            summary = tf.Summary()
                            summary.value.add(tag='Records/mean_reward', simple_value=global_rewards())
                            summary.value.add(tag='Records/mean_length', simple_value=global_length())
                            summary.value.add(tag='Records/mean_succeed', simple_value=global_succeed())
                            summary.value.add(tag='Records/mean_episode_reward', simple_value=global_ep_rewards())
                            '''summary.value.add(tag='summary/Entropy', simple_value=etrpy)
                            summary.value.add(tag='summary/actor_loss', simple_value=aloss)
                            summary.value.add(tag='summary/critic_loss', simple_value=closs)'''
                            writer.add_summary(summary,global_episodes)
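                            # train() currently returns None, so only log a summary when one exists
                            if summary_str is not None:
                                writer.add_summary(summary_str,global_episodes)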

                            writer.flush()
                        if global_episodes % save_network_frequency == 0:
                            saver.save(self.sess, MODEL_PATH+'/ctf_policy.ckpt', global_step=global_episodes)
                        break
                        
    def train(self, indv_buffer, sess, bootstrap=0.0):
        for idx, buffer in enumerate(indv_buffer):
            if len(buffer) == 0:
                continue
            _history = np.array(buffer)
            observations = _history[:,0]
            actions = _history[:,1].astype(np.int32)
            rewards = _history[:,2]
            values = _history[:,3]
            
            rewards_ext = np.append(rewards, [bootstrap[idx]])
            #discounted_rewards = discount_rewards(rewards_ext,gamma)[:-1]
            value_ext = np.append(values, [bootstrap[idx]])
            td_targets = rewards + gamma * value_ext[1:]
            advantages = rewards + gamma * value_ext[1:] - value_ext[:-1]
            advantages = discount_rewards(advantages,gamma)
            
            self.AC.update_unitpolicy_global(idx, observations, actions, advantages, td_targets)
            
        # get global parameters to local ActorCritic 
        self.AC.pull_global()
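
The per-agent computation in `train` can be checked in isolation. The sketch below reproduces the TD targets and the discounted sum of TD errors with plain NumPy. It assumes that `discount_rewards` is a reverse cumulative discounted sum, since `utility.utils` is not shown in this notebook; the names `discount` and `td_targets_and_advantages` are illustrative.

```python
import numpy as np

def discount(x, gamma):
    """Reverse cumulative discounted sum: y[t] = x[t] + gamma * y[t+1].

    Assumed to match what utility.utils.discount_rewards computes.
    """
    out = np.zeros(len(x), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(x))):
        running = x[t] + gamma * running
        out[t] = running
    return out

def td_targets_and_advantages(rewards, values, bootstrap, gamma):
    """Mirror the arithmetic in Worker.train for one agent's trajectory."""
    rewards = np.asarray(rewards, dtype=np.float64)
    # Append the bootstrap value V(s_T); it is 0 when the episode has ended
    value_ext = np.append(np.asarray(values, dtype=np.float64), bootstrap)
    td_targets = rewards + gamma * value_ext[1:]
    td_errors = td_targets - value_ext[:-1]
    # Discounting the one-step TD errors gives a multi-step advantage estimate
    advantages = discount(td_errors, gamma)
    return td_targets, advantages
```

For example, with rewards `[1, 0]`, values `[0.5, 0.5]`, a bootstrap of `0` and `gamma = 0.5`, the TD targets are `[1.25, 0.0]` and the advantages are `[0.5, -0.5]`. These advantages equal the discounted return minus the value estimate at each step.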
    

## Run
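
Training starts one thread per worker. The standalone sketch below shows that launch pattern with the bound method and its arguments passed straight to `threading.Thread`, so each thread stays tied to its own worker even after the loop variable is rebound. `DemoWorker` and `run_workers` are illustrative stand-ins, and the sketch joins the threads directly rather than through a `tf.train.Coordinator`:

```python
import threading

class DemoWorker:
    """Stand-in for the notebook's Worker; it only records that it ran."""
    def __init__(self, name):
        self.name = name

    def work(self, results, lock):
        with lock:
            results.append(self.name)

def run_workers(n):
    workers = [DemoWorker('W_%i' % i) for i in range(n)]
    results, lock = [], threading.Lock()
    threads = []
    for worker in workers:
        # target is the bound method, so the worker is fixed at creation time
        t = threading.Thread(target=worker.work, args=(results, lock))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return sorted(results)
```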

In [9]:
with tf.device("/cpu:0"):
    # Global Network
    global_step = tf.Variable(0, trainable=False, name='global_step')
    global_step_next = tf.assign_add(global_step, 1)
    global_ac = AC(in_size=in_size, action_size=action_space, num_agent=n_agent, scope=global_scope, sess=sess, global_step=global_step)

    # Local workers
    workers = []
    # loop for each workers
    for idx in range(NENV):
        name = 'W_%i' % idx
        workers.append(Worker(name, global_ac, sess, global_step=global_step))
        print(f'worker: {name} initiated')
    saver = tf.train.Saver(max_to_keep=3)
    writer = tf.summary.FileWriter(LOG_PATH, sess.graph)
    
ckpt = tf.train.get_checkpoint_state(MODEL_PATH)
if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path):
    saver.restore(sess, ckpt.model_checkpoint_path)
    print("Load Model : ", ckpt.model_checkpoint_path)
else:
    sess.run(tf.global_variables_initializer())
    print("Initialized Variables")
    
coord = tf.train.Coordinator()
worker_threads = []
global_episodes = sess.run(global_step)

for worker in workers:
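    # Pass the bound method directly: a lambda would look up `worker` only when
    # the thread runs, by which time the loop may have moved on to another worker.
    t = threading.Thread(target=worker.work, args=(saver, writer))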
    t.start()
    worker_threads.append(t)
coord.join(worker_threads)

worker: W_0 initiated
worker: W_1 initiated
worker: W_2 initiated
worker: W_3 initiated
worker: W_4 initiated
worker: W_5 initiated
worker: W_6 initiated
worker: W_7 initiated
worker: W_8 initiated
worker: W_9 initiated
worker: W_10 initiated
worker: W_11 initiated
worker: W_12 initiated
worker: W_13 initiated
worker: W_14 initiated
worker: W_15 initiated
worker: W_16 initiated
worker: W_17 initiated
worker: W_18 initiated
worker: W_19 initiated
worker: W_20 initiated
worker: W_21 initiated
worker: W_22 initiated
worker: W_23 initiated
Initialized Variables


Exception in thread Thread-15:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-19:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/adv_hold' with dtype float and shape [?]
	 [[{{node W_0/actor_loss/agent0/adv_hold}} = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call l

Exception in thread Thread-9:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call l

Exception in thread Thread-23:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-21:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-25:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/adv_hold' with dtype float and shape [?]
	 [[{{node W_0/actor_loss/agent0/adv_hold}} = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call l

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call l

Exception in thread Thread-10:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_1/critic_loss/Placeholder' with dtype float and shape [?]
	 [[{{node W_1/critic_loss/Placeholder}} = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  Fil

Exception in thread Thread-28:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-11:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_5/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_5/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-26:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-22:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/adv_hold' with dtype float and shape [?]
	 [[{{node W_0/actor_loss/agent0/adv_hold}} = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

Exception in thread Thread-20:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-17:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/agent0/state' with dtype float and shape [?,19,19,6]
	 [[{{node W_0/agent0/state}} = Placeholder[dtype=DT_FLOAT, shape=[?,19,19,6], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/us

Exception in thread Thread-13:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-16:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call l

Exception in thread Thread-24:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/adv_hold' with dtype float and shape [?]
	 [[{{node W_0/actor_loss/agent0/adv_hold}} = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

Exception in thread Thread-12:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-27:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/action_hold' with dtype int32 and shape [?]
	 [[{{node W_0/actor_loss/agent0/action_hold}} = Placeholder[dtype=DT_INT32, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call 

Exception in thread Thread-18:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/adv_hold' with dtype float and shape [?]
	 [[{{node W_0/actor_loss/agent0/adv_hold}} = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

Exception in thread Thread-14:
Traceback (most recent call last):
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/namsong/github/ctf_public/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'W_0/actor_loss/agent0/adv_hold' with dtype float and shape [?]
	 [[{{node W_0/actor_loss/agent0/adv_hold}} = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):