# <font color="blue"> Cross-entropy on FrozenLake </font>
 
### grid world
<img src="./image/4/gridworld.png">


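#### - 4x4 grid world
#### - Four possible actions
- up, down, left, right


#### - The agent always starts in the top-left cell
#### - The goal is the bottom-right cell
- Reaching the goal ends the episode with a reward of 1


#### - Holes at fixed positions
- Falling into a hole ends the episode with a reward of 0

#### - The world is slippery
- The agent's actions do not always have the intended effect
    - With 33% probability each, the agent slips to the left or to the right of the intended direction
    - e.g. when the agent tries to move left, it moves left with 33% probability, up with 33%, and down with 33%
- This added complexity makes the learning process harder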
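
The slip rule described above can be sketched in a few lines of plain Python. This is an illustration only, not the environment's source code: it assumes the action encoding 0 = left, 1 = down, 2 = right, 3 = up, and the helper names `slippery_outcomes` and `sample_actual_action` are made up for this example.

```python
import random
from collections import Counter

# Assumed action encoding: 0 = left, 1 = down, 2 = right, 3 = up.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
ACTION_NAMES = {LEFT: "left", DOWN: "down", RIGHT: "right", UP: "up"}


def slippery_outcomes(intended):
    """Return the three equally likely directions for an intended action."""
    # With this encoding, the two cyclic neighbours of an action are the
    # directions perpendicular to it.
    return [(intended - 1) % 4, intended, (intended + 1) % 4]


def sample_actual_action(intended, rng=random):
    """Sample the direction the agent really moves in (1/3 chance each)."""
    return rng.choice(slippery_outcomes(intended))


if __name__ == "__main__":
    rng = random.Random(0)
    n_samples = 30000
    counts = Counter(sample_actual_action(LEFT, rng) for _ in range(n_samples))
    for action, n in sorted(counts.items()):
        print(f"{ACTION_NAMES[action]:>5}: {n / n_samples:.3f}")
```

For an intended move to the left, the empirical frequencies of left, up, and down should each come out close to one third, matching the example in the list above.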

In [1]:
import gym, gym.spaces
from collections import namedtuple
import numpy as np
from tensorboardX import SummaryWriter

import torch
import torch.nn as nn
import torch.optim as optim

In [2]:
HIDDEN_SIZE = 128
BATCH_SIZE = 16
PERCENTILE = 70

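## One-hot encoding

#### - Discrete observations
- An observation is just an integer from 0 to 15
- It is simply the agent's position on the grid

#### - Discrete actions
- An action is an integer from 0 to 3

#### - One-hot encoding is needed to represent the inputs and outputs as vectors
- 0 --> 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (16 elements)

#### - DiscreteOneHotWrapper is implemented with the Gym package's ObservationWrapper

#### - The rest of the code is identical to the CartPole code seen earlier
- Only the environment changes, to Frozen Lake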
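
As a standalone illustration of the encoding itself, independent of Gym, the mapping from a position index to a 16-element vector might look like the sketch below. The helper name `one_hot` is hypothetical and is not part of the notebook.

```python
import numpy as np


def one_hot(index, size):
    """Return a float32 vector of length `size` with a 1.0 at `index`."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec


if __name__ == "__main__":
    # The 4x4 grid has 16 positions, numbered 0..15.
    print(one_hot(0, 16))   # 1.0 in the first slot, zeros elsewhere
    print(one_hot(15, 16))  # 1.0 in the last slot, zeros elsewhere
```

The wrapper in the next cell does the same thing, but starts from a copy of the observation space's all-zero lower bound rather than a fresh `np.zeros` array.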

In [3]:
# DiscreteOneHotWrapper subclasses gym.ObservationWrapper
class DiscreteOneHotWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super(DiscreteOneHotWrapper, self).__init__(env)
        assert isinstance(env.observation_space, gym.spaces.Discrete)
        
        #gym.spaces.Box(low, high, shape, dtype)
        self.observation_space = gym.spaces.Box(0.0, 1.0, (env.observation_space.n, ), dtype=np.float32)
        
    
    def observation(self, observation):
        res = np.copy(self.observation_space.low) 
        res[observation] = 1.0
        
        return res

In [5]:
class Net(nn.Module):
    def __init__(self, obs_size, hidden_size, n_actions):
        super(Net, self).__init__()
        
        self.net = nn.Sequential(nn.Linear(obs_size, hidden_size), 
                                 nn.ReLU(),
                                 nn.Linear(hidden_size, n_actions))
    
    def forward(self, x):
        return self.net(x)

In [6]:
Episode = namedtuple('Episode', field_names=['reward', 'steps'])
EpisodeStep = namedtuple('EpisodeStep', field_names=['observation', 'action'])

In [7]:
def iterate_batches(env, net, batch_size):
    batch = []
    episode_reward = 0.0
    episode_steps = []
    obs = env.reset()
    sm = nn.Softmax(dim=1)
    
    while True:
        # Choose an action by sampling from the policy's softmax output.
        obs_v = torch.FloatTensor([obs])
        act_probs_v = sm(net(obs_v))
        act_probs = act_probs_v.data.numpy()[0]
        action = np.random.choice(len(act_probs), p=act_probs)

        # Take one step in the environment.
        next_obs, reward, is_done, _ = env.step(action)

        # env.render()

        # Record the reward and the (observation, action) pair.
        episode_reward += reward
        episode_steps.append(EpisodeStep(observation=obs, action=action))
        # The episode ends when the agent reaches the goal or falls into a hole.
        if is_done :
            batch.append(Episode(reward=episode_reward, steps=episode_steps))
           
            episode_reward = 0.0
            episode_steps = []
            next_obs = env.reset()
            
            if len(batch) == batch_size :
                yield batch
                
                batch = []
        
        obs = next_obs
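
The action-selection step in `iterate_batches` (softmax over the network's scores, then a random draw weighted by those probabilities) can be illustrated without PyTorch. The following is a minimal NumPy sketch; `softmax` and `sample_action` are illustrative helpers and the scores are made-up values, not outputs of the network above.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of action scores."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def sample_action(logits, rng):
    """Sample an action index with probability softmax(logits)."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = [0.0, 0.0, 2.0, 0.0]  # hypothetical scores for the 4 actions
    print(softmax(np.array(logits)))
    print([sample_action(logits, rng) for _ in range(10)])
```

Sampling rather than taking the arg-max keeps the policy exploratory: actions with low scores are still tried occasionally.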

In [8]:
def filter_batch(batch, percentile):
    rewards = list(map(lambda s : s.reward, batch))
    reward_bound = np.percentile(rewards, percentile)
    reward_mean = float(np.mean(rewards))
    
    train_obs = []
    train_act = []
    
    for example in batch:
        # Keep only episodes whose total reward reaches the percentile bound.
        if example.reward < reward_bound:
            continue
            
        train_obs.extend(map(lambda step : step.observation, example.steps))
        train_act.extend(map(lambda step : step.action, example.steps))
        
    train_obs_v = torch.FloatTensor(train_obs)
    train_act_v = torch.LongTensor(train_act)
    
    return train_obs_v, train_act_v, reward_bound, reward_mean 
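
When rewards are sparse, the percentile bound computed in `filter_batch` can collapse to zero. The sketch below reproduces only the filtering rule on made-up reward lists. `elite_indices` is a hypothetical helper, and the batches are illustrative, not taken from an actual run.

```python
import numpy as np

PERCENTILE = 70


def elite_indices(rewards, percentile):
    """Return the bound and the indices kept by the `reward >= bound` rule."""
    bound = float(np.percentile(rewards, percentile))
    kept = [i for i, r in enumerate(rewards) if r >= bound]
    return bound, kept


if __name__ == "__main__":
    # Hypothetical batch of 16 episodes in which only one reached the goal.
    sparse = [0.0] * 15 + [1.0]
    bound, kept = elite_indices(sparse, PERCENTILE)
    print("sparse batch: bound =", bound, "kept", len(kept), "of", len(sparse))

    # Hypothetical batch in which 6 of 16 episodes reached the goal.
    richer = [0.0] * 10 + [1.0] * 6
    bound, kept = elite_indices(richer, PERCENTILE)
    print("richer batch: bound =", bound, "kept", len(kept), "of", len(richer))
```

In the sparse batch the bound is 0.0, so every episode passes the filter, including the failed ones. Only when enough episodes succeed does the bound rise above zero and the filter start to select.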

In [9]:
if __name__ == "__main__":
    env = DiscreteOneHotWrapper(gym.make("FrozenLake-v0"))
    #env = gym.wrappers.Monitor(env, directory="mon", force=True)
    
    obs_size = env.observation_space.shape[0]
    n_actions = env.action_space.n 
    
    net = Net(obs_size, HIDDEN_SIZE, n_actions)
    
    objective = nn.CrossEntropyLoss()
    
    optimizer = optim.Adam(params=net.parameters(), lr=0.01)
    
    writer = SummaryWriter(comment=".frozenlake-naive")
    
    
    for iter_no, batch in enumerate(iterate_batches(env, net, BATCH_SIZE)):
        obs_v, acts_v, reward_b, reward_m = filter_batch(batch, PERCENTILE)
        
        optimizer.zero_grad()

        # Cross-entropy between the policy's action scores and the actions
        # taken in the retained episodes.
        action_scores_v = net(obs_v)
        loss_v = objective(action_scores_v, acts_v)
        loss_v.backward()
        
        optimizer.step()
        
        print("%d: loss=%.3f, reward_mean=%.1f, reward_bound=%.1f" % 
              (iter_no, loss_v.item(), reward_m, reward_b))
        
        
        writer.add_scalar("loss", loss_v.item(), iter_no)
        writer.add_scalar("reward_bound", reward_b, iter_no)
        writer.add_scalar("reward_mean", reward_m, iter_no)
        
        if reward_m > 0.8 :
            print("Solved!")
            break
            
    writer.close()
    

0: loss=1.381, reward_mean=0.0, reward_bound=0.0
1: loss=1.374, reward_mean=0.0, reward_bound=0.0
2: loss=1.335, reward_mean=0.0, reward_bound=0.0
3: loss=1.351, reward_mean=0.0, reward_bound=0.0
4: loss=1.342, reward_mean=0.0, reward_bound=0.0
5: loss=1.329, reward_mean=0.0, reward_bound=0.0
6: loss=1.343, reward_mean=0.0, reward_bound=0.0
7: loss=1.307, reward_mean=0.1, reward_bound=0.0
8: loss=1.257, reward_mean=0.0, reward_bound=0.0
9: loss=1.314, reward_mean=0.0, reward_bound=0.0
10: loss=1.308, reward_mean=0.0, reward_bound=0.0
11: loss=1.276, reward_mean=0.0, reward_bound=0.0
12: loss=1.295, reward_mean=0.0, reward_bound=0.0
13: loss=1.307, reward_mean=0.1, reward_bound=0.0
14: loss=1.318, reward_mean=0.0, reward_bound=0.0
15: loss=1.259, reward_mean=0.0, reward_bound=0.0
16: loss=1.357, reward_mean=0.0, reward_bound=0.0
17: loss=1.242, reward_mean=0.0, reward_bound=0.0
18: loss=1.259, reward_mean=0.1, reward_bound=0.0
19: loss=1.281, reward_mean=0.0, reward_bound=0.0
20: loss=1

166: loss=0.830, reward_mean=0.1, reward_bound=0.0
167: loss=0.896, reward_mean=0.1, reward_bound=0.0
168: loss=0.804, reward_mean=0.0, reward_bound=0.0
169: loss=0.770, reward_mean=0.0, reward_bound=0.0
170: loss=0.840, reward_mean=0.1, reward_bound=0.0
171: loss=0.719, reward_mean=0.0, reward_bound=0.0
172: loss=0.798, reward_mean=0.1, reward_bound=0.0
173: loss=0.876, reward_mean=0.0, reward_bound=0.0
174: loss=0.712, reward_mean=0.0, reward_bound=0.0
175: loss=0.820, reward_mean=0.0, reward_bound=0.0
176: loss=0.754, reward_mean=0.0, reward_bound=0.0
177: loss=0.807, reward_mean=0.0, reward_bound=0.0
178: loss=0.690, reward_mean=0.0, reward_bound=0.0
179: loss=0.812, reward_mean=0.0, reward_bound=0.0
180: loss=0.680, reward_mean=0.0, reward_bound=0.0
181: loss=1.012, reward_mean=0.0, reward_bound=0.0
182: loss=0.776, reward_mean=0.0, reward_bound=0.0
183: loss=1.043, reward_mean=0.0, reward_bound=0.0
184: loss=0.865, reward_mean=0.0, reward_bound=0.0
185: loss=0.911, reward_mean=0.

329: loss=0.538, reward_mean=0.0, reward_bound=0.0
330: loss=0.403, reward_mean=0.0, reward_bound=0.0
331: loss=0.459, reward_mean=0.0, reward_bound=0.0
332: loss=0.440, reward_mean=0.0, reward_bound=0.0
333: loss=0.522, reward_mean=0.0, reward_bound=0.0
334: loss=0.472, reward_mean=0.0, reward_bound=0.0
335: loss=0.420, reward_mean=0.0, reward_bound=0.0
336: loss=0.428, reward_mean=0.0, reward_bound=0.0
337: loss=0.459, reward_mean=0.0, reward_bound=0.0
338: loss=0.512, reward_mean=0.0, reward_bound=0.0
339: loss=0.382, reward_mean=0.0, reward_bound=0.0
340: loss=0.401, reward_mean=0.0, reward_bound=0.0
341: loss=0.375, reward_mean=0.0, reward_bound=0.0
342: loss=0.374, reward_mean=0.1, reward_bound=0.0
343: loss=0.395, reward_mean=0.0, reward_bound=0.0
344: loss=0.338, reward_mean=0.0, reward_bound=0.0
345: loss=0.394, reward_mean=0.0, reward_bound=0.0
346: loss=0.397, reward_mean=0.0, reward_bound=0.0
347: loss=0.397, reward_mean=0.0, reward_bound=0.0
348: loss=0.439, reward_mean=0.

490: loss=0.553, reward_mean=0.0, reward_bound=0.0
491: loss=0.585, reward_mean=0.0, reward_bound=0.0
492: loss=0.590, reward_mean=0.0, reward_bound=0.0
493: loss=0.600, reward_mean=0.0, reward_bound=0.0
494: loss=0.605, reward_mean=0.0, reward_bound=0.0
495: loss=0.599, reward_mean=0.0, reward_bound=0.0
496: loss=0.554, reward_mean=0.0, reward_bound=0.0
497: loss=0.619, reward_mean=0.0, reward_bound=0.0
498: loss=0.592, reward_mean=0.0, reward_bound=0.0
499: loss=0.667, reward_mean=0.0, reward_bound=0.0
500: loss=0.727, reward_mean=0.0, reward_bound=0.0
501: loss=0.602, reward_mean=0.0, reward_bound=0.0
502: loss=0.609, reward_mean=0.0, reward_bound=0.0
503: loss=0.643, reward_mean=0.0, reward_bound=0.0
504: loss=0.683, reward_mean=0.0, reward_bound=0.0
505: loss=0.634, reward_mean=0.0, reward_bound=0.0
506: loss=0.638, reward_mean=0.0, reward_bound=0.0
507: loss=0.597, reward_mean=0.0, reward_bound=0.0
508: loss=0.613, reward_mean=0.0, reward_bound=0.0
509: loss=0.617, reward_mean=0.

653: loss=0.162, reward_mean=0.0, reward_bound=0.0
654: loss=0.121, reward_mean=0.0, reward_bound=0.0
655: loss=0.112, reward_mean=0.0, reward_bound=0.0
656: loss=0.089, reward_mean=0.0, reward_bound=0.0
657: loss=0.155, reward_mean=0.1, reward_bound=0.0
658: loss=0.215, reward_mean=0.1, reward_bound=0.0
659: loss=0.239, reward_mean=0.1, reward_bound=0.0
660: loss=0.058, reward_mean=0.0, reward_bound=0.0
661: loss=0.221, reward_mean=0.0, reward_bound=0.0
662: loss=0.119, reward_mean=0.1, reward_bound=0.0
663: loss=0.169, reward_mean=0.0, reward_bound=0.0
664: loss=0.024, reward_mean=0.0, reward_bound=0.0
665: loss=0.222, reward_mean=0.0, reward_bound=0.0
666: loss=0.136, reward_mean=0.0, reward_bound=0.0
667: loss=0.117, reward_mean=0.0, reward_bound=0.0
668: loss=0.230, reward_mean=0.1, reward_bound=0.0
669: loss=0.199, reward_mean=0.1, reward_bound=0.0
670: loss=0.028, reward_mean=0.1, reward_bound=0.0
671: loss=0.066, reward_mean=0.0, reward_bound=0.0
672: loss=0.175, reward_mean=0.

815: loss=0.197, reward_mean=0.0, reward_bound=0.0
816: loss=0.129, reward_mean=0.1, reward_bound=0.0
817: loss=0.077, reward_mean=0.1, reward_bound=0.0
818: loss=0.141, reward_mean=0.0, reward_bound=0.0
819: loss=0.163, reward_mean=0.1, reward_bound=0.0
820: loss=0.176, reward_mean=0.1, reward_bound=0.0
821: loss=0.041, reward_mean=0.1, reward_bound=0.0
822: loss=0.035, reward_mean=0.0, reward_bound=0.0
823: loss=0.269, reward_mean=0.1, reward_bound=0.0
824: loss=0.252, reward_mean=0.0, reward_bound=0.0
825: loss=0.220, reward_mean=0.0, reward_bound=0.0
826: loss=0.102, reward_mean=0.2, reward_bound=0.0
827: loss=0.153, reward_mean=0.1, reward_bound=0.0
828: loss=0.120, reward_mean=0.0, reward_bound=0.0
829: loss=0.184, reward_mean=0.0, reward_bound=0.0
830: loss=0.134, reward_mean=0.0, reward_bound=0.0
831: loss=0.152, reward_mean=0.1, reward_bound=0.0
832: loss=0.152, reward_mean=0.1, reward_bound=0.0
833: loss=0.145, reward_mean=0.1, reward_bound=0.0
834: loss=0.151, reward_mean=0.

980: loss=0.095, reward_mean=0.1, reward_bound=0.0
981: loss=0.072, reward_mean=0.0, reward_bound=0.0
982: loss=0.191, reward_mean=0.2, reward_bound=0.0
983: loss=0.165, reward_mean=0.1, reward_bound=0.0
984: loss=0.056, reward_mean=0.1, reward_bound=0.0
985: loss=0.040, reward_mean=0.0, reward_bound=0.0
986: loss=0.100, reward_mean=0.1, reward_bound=0.0
987: loss=0.019, reward_mean=0.0, reward_bound=0.0
988: loss=0.059, reward_mean=0.0, reward_bound=0.0
989: loss=0.051, reward_mean=0.0, reward_bound=0.0
990: loss=0.097, reward_mean=0.1, reward_bound=0.0
991: loss=0.011, reward_mean=0.0, reward_bound=0.0
992: loss=0.189, reward_mean=0.1, reward_bound=0.0
993: loss=0.082, reward_mean=0.1, reward_bound=0.0
994: loss=0.028, reward_mean=0.0, reward_bound=0.0
995: loss=0.123, reward_mean=0.1, reward_bound=0.0
996: loss=0.148, reward_mean=0.0, reward_bound=0.0
997: loss=0.078, reward_mean=0.1, reward_bound=0.0
998: loss=0.028, reward_mean=0.1, reward_bound=0.0
999: loss=0.085, reward_mean=0.

1141: loss=0.046, reward_mean=0.0, reward_bound=0.0
1142: loss=0.154, reward_mean=0.1, reward_bound=0.0
1143: loss=0.013, reward_mean=0.1, reward_bound=0.0
1144: loss=0.210, reward_mean=0.0, reward_bound=0.0
1145: loss=0.103, reward_mean=0.0, reward_bound=0.0
1146: loss=0.087, reward_mean=0.1, reward_bound=0.0
1147: loss=0.081, reward_mean=0.1, reward_bound=0.0
1148: loss=0.077, reward_mean=0.1, reward_bound=0.0
1149: loss=0.146, reward_mean=0.1, reward_bound=0.0
1150: loss=0.070, reward_mean=0.1, reward_bound=0.0
1151: loss=0.041, reward_mean=0.1, reward_bound=0.0
1152: loss=0.087, reward_mean=0.1, reward_bound=0.0
1153: loss=0.063, reward_mean=0.1, reward_bound=0.0
1154: loss=0.076, reward_mean=0.1, reward_bound=0.0
1155: loss=0.031, reward_mean=0.0, reward_bound=0.0
1156: loss=0.062, reward_mean=0.0, reward_bound=0.0
1157: loss=0.073, reward_mean=0.1, reward_bound=0.0
1158: loss=0.074, reward_mean=0.0, reward_bound=0.0
1159: loss=0.113, reward_mean=0.0, reward_bound=0.0
1160: loss=0

1301: loss=0.031, reward_mean=0.0, reward_bound=0.0
1302: loss=0.045, reward_mean=0.1, reward_bound=0.0
1303: loss=0.004, reward_mean=0.0, reward_bound=0.0
1304: loss=0.018, reward_mean=0.1, reward_bound=0.0
1305: loss=0.015, reward_mean=0.0, reward_bound=0.0
1306: loss=0.012, reward_mean=0.1, reward_bound=0.0
1307: loss=0.013, reward_mean=0.2, reward_bound=0.0
1308: loss=0.034, reward_mean=0.0, reward_bound=0.0
1309: loss=0.137, reward_mean=0.1, reward_bound=0.0
1310: loss=0.035, reward_mean=0.0, reward_bound=0.0
1311: loss=0.033, reward_mean=0.1, reward_bound=0.0
1312: loss=0.003, reward_mean=0.0, reward_bound=0.0
1313: loss=0.039, reward_mean=0.0, reward_bound=0.0
1314: loss=0.042, reward_mean=0.2, reward_bound=0.0
1315: loss=0.003, reward_mean=0.0, reward_bound=0.0
1316: loss=0.063, reward_mean=0.0, reward_bound=0.0
1317: loss=0.056, reward_mean=0.1, reward_bound=0.0
1318: loss=0.012, reward_mean=0.1, reward_bound=0.0
1319: loss=0.014, reward_mean=0.0, reward_bound=0.0
1320: loss=0

1460: loss=0.129, reward_mean=0.0, reward_bound=0.0
1461: loss=0.168, reward_mean=0.0, reward_bound=0.0
1462: loss=0.235, reward_mean=0.1, reward_bound=0.0
1463: loss=0.096, reward_mean=0.0, reward_bound=0.0
1464: loss=0.095, reward_mean=0.0, reward_bound=0.0
1465: loss=0.188, reward_mean=0.0, reward_bound=0.0
1466: loss=0.065, reward_mean=0.0, reward_bound=0.0
1467: loss=0.123, reward_mean=0.1, reward_bound=0.0
1468: loss=0.064, reward_mean=0.0, reward_bound=0.0
1469: loss=0.253, reward_mean=0.1, reward_bound=0.0
1470: loss=0.245, reward_mean=0.1, reward_bound=0.0
1471: loss=0.059, reward_mean=0.0, reward_bound=0.0
1472: loss=0.100, reward_mean=0.0, reward_bound=0.0
1473: loss=0.204, reward_mean=0.0, reward_bound=0.0
1474: loss=0.208, reward_mean=0.0, reward_bound=0.0
1475: loss=0.091, reward_mean=0.0, reward_bound=0.0
1476: loss=0.163, reward_mean=0.1, reward_bound=0.0
1477: loss=0.268, reward_mean=0.1, reward_bound=0.0
1478: loss=0.225, reward_mean=0.0, reward_bound=0.0
1479: loss=0

1620: loss=0.076, reward_mean=0.0, reward_bound=0.0
1621: loss=0.066, reward_mean=0.0, reward_bound=0.0
1622: loss=0.104, reward_mean=0.1, reward_bound=0.0
1623: loss=0.054, reward_mean=0.1, reward_bound=0.0
1624: loss=0.081, reward_mean=0.1, reward_bound=0.0
1625: loss=0.178, reward_mean=0.0, reward_bound=0.0
1626: loss=0.037, reward_mean=0.0, reward_bound=0.0
1627: loss=0.051, reward_mean=0.0, reward_bound=0.0
1628: loss=0.123, reward_mean=0.1, reward_bound=0.0
1629: loss=0.095, reward_mean=0.0, reward_bound=0.0
1630: loss=0.045, reward_mean=0.0, reward_bound=0.0
1631: loss=0.059, reward_mean=0.1, reward_bound=0.0
1632: loss=0.112, reward_mean=0.1, reward_bound=0.0
1633: loss=0.133, reward_mean=0.1, reward_bound=0.0
1634: loss=0.062, reward_mean=0.0, reward_bound=0.0
1635: loss=0.029, reward_mean=0.1, reward_bound=0.0
1636: loss=0.111, reward_mean=0.0, reward_bound=0.0
1637: loss=0.131, reward_mean=0.1, reward_bound=0.0
1638: loss=0.040, reward_mean=0.1, reward_bound=0.0
1639: loss=0

1782: loss=0.048, reward_mean=0.2, reward_bound=0.0
1783: loss=0.034, reward_mean=0.1, reward_bound=0.0
1784: loss=0.027, reward_mean=0.1, reward_bound=0.0
1785: loss=0.072, reward_mean=0.1, reward_bound=0.0
1786: loss=0.017, reward_mean=0.1, reward_bound=0.0
1787: loss=0.118, reward_mean=0.1, reward_bound=0.0
1788: loss=0.039, reward_mean=0.0, reward_bound=0.0
1789: loss=0.015, reward_mean=0.0, reward_bound=0.0
1790: loss=0.076, reward_mean=0.2, reward_bound=0.0
1791: loss=0.128, reward_mean=0.0, reward_bound=0.0
1792: loss=0.142, reward_mean=0.0, reward_bound=0.0
1793: loss=0.060, reward_mean=0.0, reward_bound=0.0
1794: loss=0.086, reward_mean=0.1, reward_bound=0.0
1795: loss=0.036, reward_mean=0.1, reward_bound=0.0
1796: loss=0.040, reward_mean=0.0, reward_bound=0.0
1797: loss=0.061, reward_mean=0.1, reward_bound=0.0
1798: loss=0.041, reward_mean=0.0, reward_bound=0.0
1799: loss=0.017, reward_mean=0.1, reward_bound=0.0
1800: loss=0.022, reward_mean=0.1, reward_bound=0.0
1801: loss=0

1945: loss=0.007, reward_mean=0.1, reward_bound=0.0
1946: loss=0.016, reward_mean=0.0, reward_bound=0.0
1947: loss=0.092, reward_mean=0.1, reward_bound=0.0
1948: loss=0.014, reward_mean=0.1, reward_bound=0.0
1949: loss=0.022, reward_mean=0.0, reward_bound=0.0
1950: loss=0.011, reward_mean=0.1, reward_bound=0.0
1951: loss=0.005, reward_mean=0.0, reward_bound=0.0
1952: loss=0.004, reward_mean=0.0, reward_bound=0.0
1953: loss=0.065, reward_mean=0.2, reward_bound=0.0
1954: loss=0.005, reward_mean=0.1, reward_bound=0.0
1955: loss=0.000, reward_mean=0.0, reward_bound=0.0
1956: loss=0.009, reward_mean=0.0, reward_bound=0.0
1957: loss=0.053, reward_mean=0.0, reward_bound=0.0
1958: loss=0.028, reward_mean=0.1, reward_bound=0.0
1959: loss=0.001, reward_mean=0.0, reward_bound=0.0
1960: loss=0.009, reward_mean=0.0, reward_bound=0.0
1961: loss=0.029, reward_mean=0.0, reward_bound=0.0
1962: loss=0.025, reward_mean=0.0, reward_bound=0.0
1963: loss=0.032, reward_mean=0.1, reward_bound=0.0
1964: loss=0

2106: loss=0.004, reward_mean=0.1, reward_bound=0.0
2107: loss=0.048, reward_mean=0.1, reward_bound=0.0
2108: loss=0.025, reward_mean=0.0, reward_bound=0.0
2109: loss=0.014, reward_mean=0.1, reward_bound=0.0
2110: loss=0.027, reward_mean=0.0, reward_bound=0.0
2111: loss=0.070, reward_mean=0.0, reward_bound=0.0
2112: loss=0.008, reward_mean=0.1, reward_bound=0.0
2113: loss=0.066, reward_mean=0.1, reward_bound=0.0
2114: loss=0.042, reward_mean=0.1, reward_bound=0.0
2115: loss=0.026, reward_mean=0.1, reward_bound=0.0
2116: loss=0.055, reward_mean=0.0, reward_bound=0.0
2117: loss=0.021, reward_mean=0.0, reward_bound=0.0
2118: loss=0.011, reward_mean=0.0, reward_bound=0.0
2119: loss=0.024, reward_mean=0.1, reward_bound=0.0
2120: loss=0.115, reward_mean=0.1, reward_bound=0.0
2121: loss=0.061, reward_mean=0.1, reward_bound=0.0
2122: loss=0.017, reward_mean=0.0, reward_bound=0.0
2123: loss=0.018, reward_mean=0.0, reward_bound=0.0
2124: loss=0.050, reward_mean=0.1, reward_bound=0.0
2125: loss=0

2265: loss=0.005, reward_mean=0.1, reward_bound=0.0
2266: loss=0.002, reward_mean=0.0, reward_bound=0.0
2267: loss=0.003, reward_mean=0.0, reward_bound=0.0
2268: loss=0.040, reward_mean=0.1, reward_bound=0.0
2269: loss=0.044, reward_mean=0.0, reward_bound=0.0
2270: loss=0.039, reward_mean=0.0, reward_bound=0.0
2271: loss=0.008, reward_mean=0.0, reward_bound=0.0
2272: loss=0.166, reward_mean=0.0, reward_bound=0.0
2273: loss=0.008, reward_mean=0.0, reward_bound=0.0
2274: loss=0.078, reward_mean=0.0, reward_bound=0.0
2275: loss=0.048, reward_mean=0.0, reward_bound=0.0
2276: loss=0.006, reward_mean=0.1, reward_bound=0.0
2277: loss=0.004, reward_mean=0.0, reward_bound=0.0
2278: loss=0.007, reward_mean=0.1, reward_bound=0.0
2279: loss=0.041, reward_mean=0.1, reward_bound=0.0
2280: loss=0.008, reward_mean=0.1, reward_bound=0.0
2281: loss=0.045, reward_mean=0.1, reward_bound=0.0
2282: loss=0.077, reward_mean=0.0, reward_bound=0.0
2283: loss=0.081, reward_mean=0.0, reward_bound=0.0
2284: loss=0

2424: loss=0.003, reward_mean=0.1, reward_bound=0.0
2425: loss=0.006, reward_mean=0.0, reward_bound=0.0
2426: loss=0.003, reward_mean=0.0, reward_bound=0.0
2427: loss=0.100, reward_mean=0.1, reward_bound=0.0
2428: loss=0.003, reward_mean=0.0, reward_bound=0.0
2429: loss=0.201, reward_mean=0.1, reward_bound=0.0
2430: loss=0.005, reward_mean=0.0, reward_bound=0.0
2431: loss=0.002, reward_mean=0.0, reward_bound=0.0
2432: loss=0.007, reward_mean=0.1, reward_bound=0.0
2433: loss=0.025, reward_mean=0.1, reward_bound=0.0
2434: loss=0.068, reward_mean=0.1, reward_bound=0.0
2435: loss=0.047, reward_mean=0.0, reward_bound=0.0
2436: loss=0.002, reward_mean=0.1, reward_bound=0.0
2437: loss=0.057, reward_mean=0.1, reward_bound=0.0
2438: loss=0.005, reward_mean=0.1, reward_bound=0.0
2439: loss=0.004, reward_mean=0.0, reward_bound=0.0
2440: loss=0.036, reward_mean=0.1, reward_bound=0.0
2441: loss=0.007, reward_mean=0.0, reward_bound=0.0
2442: loss=0.005, reward_mean=0.0, reward_bound=0.0
2443: loss=0

2588: loss=0.039, reward_mean=0.0, reward_bound=0.0
2589: loss=0.032, reward_mean=0.0, reward_bound=0.0
2590: loss=0.042, reward_mean=0.0, reward_bound=0.0
2591: loss=0.022, reward_mean=0.1, reward_bound=0.0
2592: loss=0.159, reward_mean=0.0, reward_bound=0.0
2593: loss=0.045, reward_mean=0.0, reward_bound=0.0
2594: loss=0.013, reward_mean=0.1, reward_bound=0.0
2595: loss=0.055, reward_mean=0.1, reward_bound=0.0
2596: loss=0.090, reward_mean=0.1, reward_bound=0.0
2597: loss=0.022, reward_mean=0.0, reward_bound=0.0
2598: loss=0.123, reward_mean=0.0, reward_bound=0.0
2599: loss=0.021, reward_mean=0.1, reward_bound=0.0
2600: loss=0.050, reward_mean=0.0, reward_bound=0.0
2601: loss=0.048, reward_mean=0.0, reward_bound=0.0
2602: loss=0.150, reward_mean=0.1, reward_bound=0.0
2603: loss=0.022, reward_mean=0.0, reward_bound=0.0
2604: loss=0.120, reward_mean=0.1, reward_bound=0.0
2605: loss=0.050, reward_mean=0.0, reward_bound=0.0
2606: loss=0.027, reward_mean=0.0, reward_bound=0.0
2607: loss=0

2747: loss=0.074, reward_mean=0.0, reward_bound=0.0
2748: loss=0.148, reward_mean=0.1, reward_bound=0.0
2749: loss=0.053, reward_mean=0.1, reward_bound=0.0
2750: loss=0.111, reward_mean=0.0, reward_bound=0.0
2751: loss=0.078, reward_mean=0.0, reward_bound=0.0
2752: loss=0.071, reward_mean=0.0, reward_bound=0.0
2753: loss=0.054, reward_mean=0.0, reward_bound=0.0
2754: loss=0.138, reward_mean=0.2, reward_bound=0.0
2755: loss=0.045, reward_mean=0.0, reward_bound=0.0
2756: loss=0.064, reward_mean=0.0, reward_bound=0.0
2757: loss=0.079, reward_mean=0.1, reward_bound=0.0
2758: loss=0.095, reward_mean=0.0, reward_bound=0.0
2759: loss=0.034, reward_mean=0.0, reward_bound=0.0
2760: loss=0.040, reward_mean=0.1, reward_bound=0.0
2761: loss=0.088, reward_mean=0.0, reward_bound=0.0
2762: loss=0.072, reward_mean=0.0, reward_bound=0.0
2763: loss=0.073, reward_mean=0.1, reward_bound=0.0
2764: loss=0.056, reward_mean=0.1, reward_bound=0.0
2765: loss=0.058, reward_mean=0.1, reward_bound=0.0
2766: loss=0

2906: loss=0.096, reward_mean=0.1, reward_bound=0.0
2907: loss=0.054, reward_mean=0.0, reward_bound=0.0
2908: loss=0.112, reward_mean=0.2, reward_bound=0.0
2909: loss=0.126, reward_mean=0.0, reward_bound=0.0
2910: loss=0.076, reward_mean=0.1, reward_bound=0.0
2911: loss=0.153, reward_mean=0.0, reward_bound=0.0
2912: loss=0.036, reward_mean=0.0, reward_bound=0.0
2913: loss=0.080, reward_mean=0.0, reward_bound=0.0
2914: loss=0.095, reward_mean=0.1, reward_bound=0.0
2915: loss=0.098, reward_mean=0.1, reward_bound=0.0
2916: loss=0.142, reward_mean=0.1, reward_bound=0.0
2917: loss=0.134, reward_mean=0.0, reward_bound=0.0
2918: loss=0.070, reward_mean=0.1, reward_bound=0.0
2919: loss=0.067, reward_mean=0.1, reward_bound=0.0
2920: loss=0.057, reward_mean=0.0, reward_bound=0.0
2921: loss=0.083, reward_mean=0.0, reward_bound=0.0
2922: loss=0.048, reward_mean=0.0, reward_bound=0.0
2923: loss=0.027, reward_mean=0.1, reward_bound=0.0
2924: loss=0.072, reward_mean=0.1, reward_bound=0.0
2925: loss=0

3067: loss=0.047, reward_mean=0.1, reward_bound=0.0
3068: loss=0.069, reward_mean=0.1, reward_bound=0.0
3069: loss=0.074, reward_mean=0.1, reward_bound=0.0
3070: loss=0.058, reward_mean=0.1, reward_bound=0.0
3071: loss=0.030, reward_mean=0.1, reward_bound=0.0
3072: loss=0.067, reward_mean=0.1, reward_bound=0.0
3073: loss=0.047, reward_mean=0.1, reward_bound=0.0
3074: loss=0.069, reward_mean=0.1, reward_bound=0.0
3075: loss=0.065, reward_mean=0.1, reward_bound=0.0
3076: loss=0.091, reward_mean=0.1, reward_bound=0.0
3077: loss=0.032, reward_mean=0.0, reward_bound=0.0
3078: loss=0.069, reward_mean=0.1, reward_bound=0.0
3079: loss=0.064, reward_mean=0.0, reward_bound=0.0
3080: loss=0.025, reward_mean=0.0, reward_bound=0.0
3081: loss=0.033, reward_mean=0.1, reward_bound=0.0
3082: loss=0.019, reward_mean=0.0, reward_bound=0.0
3083: loss=0.180, reward_mean=0.1, reward_bound=0.0
3084: loss=0.019, reward_mean=0.0, reward_bound=0.0
3085: loss=0.102, reward_mean=0.2, reward_bound=0.0
3086: loss=0

3225: loss=0.021, reward_mean=0.0, reward_bound=0.0
3226: loss=0.051, reward_mean=0.0, reward_bound=0.0
3227: loss=0.036, reward_mean=0.0, reward_bound=0.0
3228: loss=0.034, reward_mean=0.1, reward_bound=0.0
3229: loss=0.073, reward_mean=0.0, reward_bound=0.0
3230: loss=0.045, reward_mean=0.1, reward_bound=0.0
3231: loss=0.051, reward_mean=0.0, reward_bound=0.0
3232: loss=0.105, reward_mean=0.1, reward_bound=0.0
3233: loss=0.115, reward_mean=0.0, reward_bound=0.0
3234: loss=0.060, reward_mean=0.2, reward_bound=0.0
3235: loss=0.050, reward_mean=0.0, reward_bound=0.0
3236: loss=0.049, reward_mean=0.1, reward_bound=0.0
3237: loss=0.085, reward_mean=0.0, reward_bound=0.0
3238: loss=0.079, reward_mean=0.1, reward_bound=0.0
3239: loss=0.019, reward_mean=0.1, reward_bound=0.0
3240: loss=0.113, reward_mean=0.0, reward_bound=0.0
3241: loss=0.059, reward_mean=0.0, reward_bound=0.0
3242: loss=0.066, reward_mean=0.1, reward_bound=0.0
3243: loss=0.168, reward_mean=0.1, reward_bound=0.0
3244: loss=0

3383: loss=0.010, reward_mean=0.1, reward_bound=0.0
3384: loss=0.089, reward_mean=0.1, reward_bound=0.0
3385: loss=0.005, reward_mean=0.1, reward_bound=0.0
3386: loss=0.050, reward_mean=0.0, reward_bound=0.0
3387: loss=0.037, reward_mean=0.1, reward_bound=0.0
3388: loss=0.007, reward_mean=0.0, reward_bound=0.0
3389: loss=0.000, reward_mean=0.0, reward_bound=0.0
3390: loss=0.064, reward_mean=0.1, reward_bound=0.0
3391: loss=0.006, reward_mean=0.0, reward_bound=0.0
3392: loss=0.005, reward_mean=0.1, reward_bound=0.0
3393: loss=0.032, reward_mean=0.1, reward_bound=0.0
3394: loss=0.003, reward_mean=0.1, reward_bound=0.0
3395: loss=0.006, reward_mean=0.0, reward_bound=0.0
3396: loss=0.005, reward_mean=0.1, reward_bound=0.0
3397: loss=0.051, reward_mean=0.1, reward_bound=0.0
3398: loss=0.005, reward_mean=0.2, reward_bound=0.0
3399: loss=0.007, reward_mean=0.1, reward_bound=0.0
3400: loss=0.003, reward_mean=0.0, reward_bound=0.0
3401: loss=0.071, reward_mean=0.1, reward_bound=0.0
3402: loss=0

3543: loss=0.060, reward_mean=0.1, reward_bound=0.0
3544: loss=0.001, reward_mean=0.0, reward_bound=0.0
3545: loss=0.001, reward_mean=0.1, reward_bound=0.0
3546: loss=0.003, reward_mean=0.1, reward_bound=0.0
3547: loss=0.032, reward_mean=0.0, reward_bound=0.0
3548: loss=0.007, reward_mean=0.1, reward_bound=0.0
3549: loss=0.006, reward_mean=0.1, reward_bound=0.0
3550: loss=0.004, reward_mean=0.1, reward_bound=0.0
3551: loss=0.004, reward_mean=0.0, reward_bound=0.0
3552: loss=0.005, reward_mean=0.0, reward_bound=0.0
3553: loss=0.059, reward_mean=0.0, reward_bound=0.0
3554: loss=0.007, reward_mean=0.2, reward_bound=0.0
3555: loss=0.064, reward_mean=0.1, reward_bound=0.0
3556: loss=0.009, reward_mean=0.1, reward_bound=0.0
3557: loss=0.004, reward_mean=0.1, reward_bound=0.0
3558: loss=0.039, reward_mean=0.0, reward_bound=0.0
3559: loss=0.005, reward_mean=0.1, reward_bound=0.0
3560: loss=0.004, reward_mean=0.1, reward_bound=0.0
3561: loss=0.003, reward_mean=0.0, reward_bound=0.0
3562: loss=0

3702: loss=0.000, reward_mean=0.1, reward_bound=0.0
3703: loss=0.040, reward_mean=0.1, reward_bound=0.0
3704: loss=0.043, reward_mean=0.0, reward_bound=0.0
3705: loss=0.001, reward_mean=0.1, reward_bound=0.0
3706: loss=0.070, reward_mean=0.2, reward_bound=0.0
3707: loss=0.002, reward_mean=0.0, reward_bound=0.0
3708: loss=0.004, reward_mean=0.1, reward_bound=0.0
3709: loss=0.005, reward_mean=0.2, reward_bound=0.0
3710: loss=0.032, reward_mean=0.1, reward_bound=0.0
3711: loss=0.065, reward_mean=0.0, reward_bound=0.0
3712: loss=0.003, reward_mean=0.1, reward_bound=0.0
3713: loss=0.001, reward_mean=0.0, reward_bound=0.0
3714: loss=0.037, reward_mean=0.0, reward_bound=0.0
3715: loss=0.004, reward_mean=0.1, reward_bound=0.0
3716: loss=0.002, reward_mean=0.1, reward_bound=0.0
3717: loss=0.072, reward_mean=0.1, reward_bound=0.0
3718: loss=0.037, reward_mean=0.1, reward_bound=0.0
3719: loss=0.008, reward_mean=0.1, reward_bound=0.0
3720: loss=0.043, reward_mean=0.0, reward_bound=0.0
3721: loss=0

3865: loss=0.001, reward_mean=0.0, reward_bound=0.0
3866: loss=0.079, reward_mean=0.0, reward_bound=0.0
3867: loss=0.002, reward_mean=0.1, reward_bound=0.0
3868: loss=0.041, reward_mean=0.1, reward_bound=0.0
3869: loss=0.058, reward_mean=0.1, reward_bound=0.0
3870: loss=0.004, reward_mean=0.0, reward_bound=0.0
3871: loss=0.003, reward_mean=0.1, reward_bound=0.0
3872: loss=0.004, reward_mean=0.0, reward_bound=0.0
3873: loss=0.040, reward_mean=0.1, reward_bound=0.0
3874: loss=0.049, reward_mean=0.0, reward_bound=0.0
3875: loss=0.002, reward_mean=0.0, reward_bound=0.0
3876: loss=0.003, reward_mean=0.1, reward_bound=0.0
3877: loss=0.057, reward_mean=0.0, reward_bound=0.0
3878: loss=0.002, reward_mean=0.0, reward_bound=0.0
3879: loss=0.047, reward_mean=0.0, reward_bound=0.0
3880: loss=0.003, reward_mean=0.0, reward_bound=0.0
3881: loss=0.005, reward_mean=0.0, reward_bound=0.0
3882: loss=0.005, reward_mean=0.1, reward_bound=0.0
3883: loss=0.002, reward_mean=0.1, reward_bound=0.0
3884: loss=0

4027: loss=0.039, reward_mean=0.0, reward_bound=0.0
4028: loss=0.012, reward_mean=0.1, reward_bound=0.0
4029: loss=0.050, reward_mean=0.1, reward_bound=0.0
4030: loss=0.112, reward_mean=0.0, reward_bound=0.0
4031: loss=0.017, reward_mean=0.0, reward_bound=0.0
4032: loss=0.027, reward_mean=0.1, reward_bound=0.0
4033: loss=0.065, reward_mean=0.1, reward_bound=0.0
4034: loss=0.009, reward_mean=0.0, reward_bound=0.0
4035: loss=0.012, reward_mean=0.0, reward_bound=0.0
4036: loss=0.031, reward_mean=0.1, reward_bound=0.0
4037: loss=0.022, reward_mean=0.2, reward_bound=0.0
4038: loss=0.005, reward_mean=0.0, reward_bound=0.0
4039: loss=0.043, reward_mean=0.1, reward_bound=0.0
4040: loss=0.002, reward_mean=0.2, reward_bound=0.0
4041: loss=0.022, reward_mean=0.0, reward_bound=0.0
4042: loss=0.005, reward_mean=0.1, reward_bound=0.0
4043: loss=0.036, reward_mean=0.1, reward_bound=0.0
4044: loss=0.028, reward_mean=0.2, reward_bound=0.0
4045: loss=0.006, reward_mean=0.0, reward_bound=0.0
4046: loss=0

4191: loss=0.012, reward_mean=0.0, reward_bound=0.0
4192: loss=0.008, reward_mean=0.2, reward_bound=0.0
4193: loss=0.055, reward_mean=0.0, reward_bound=0.0
4194: loss=0.055, reward_mean=0.0, reward_bound=0.0
4195: loss=0.023, reward_mean=0.1, reward_bound=0.0
4196: loss=0.021, reward_mean=0.1, reward_bound=0.0
4197: loss=0.008, reward_mean=0.0, reward_bound=0.0
4198: loss=0.034, reward_mean=0.0, reward_bound=0.0
4199: loss=0.078, reward_mean=0.0, reward_bound=0.0
4200: loss=0.065, reward_mean=0.0, reward_bound=0.0
4201: loss=0.023, reward_mean=0.0, reward_bound=0.0
4202: loss=0.009, reward_mean=0.1, reward_bound=0.0
4203: loss=0.008, reward_mean=0.1, reward_bound=0.0
4204: loss=0.007, reward_mean=0.0, reward_bound=0.0
4205: loss=0.020, reward_mean=0.0, reward_bound=0.0
4206: loss=0.000, reward_mean=0.2, reward_bound=0.0
4207: loss=0.015, reward_mean=0.0, reward_bound=0.0
4208: loss=0.073, reward_mean=0.1, reward_bound=0.0
4209: loss=0.117, reward_mean=0.0, reward_bound=0.0
4210: loss=0

4350: loss=0.181, reward_mean=0.1, reward_bound=0.0
4351: loss=0.004, reward_mean=0.0, reward_bound=0.0
4352: loss=0.079, reward_mean=0.0, reward_bound=0.0
4353: loss=0.044, reward_mean=0.1, reward_bound=0.0
4354: loss=0.104, reward_mean=0.1, reward_bound=0.0
4355: loss=0.010, reward_mean=0.0, reward_bound=0.0
4356: loss=0.064, reward_mean=0.2, reward_bound=0.0
4357: loss=0.007, reward_mean=0.1, reward_bound=0.0
4358: loss=0.003, reward_mean=0.1, reward_bound=0.0
4359: loss=0.010, reward_mean=0.1, reward_bound=0.0
4360: loss=0.006, reward_mean=0.1, reward_bound=0.0
4361: loss=0.001, reward_mean=0.1, reward_bound=0.0
4362: loss=0.048, reward_mean=0.0, reward_bound=0.0
4363: loss=0.040, reward_mean=0.1, reward_bound=0.0
4364: loss=0.050, reward_mean=0.1, reward_bound=0.0
4365: loss=0.056, reward_mean=0.0, reward_bound=0.0
4366: loss=0.001, reward_mean=0.0, reward_bound=0.0
4367: loss=0.001, reward_mean=0.2, reward_bound=0.0
4368: loss=0.158, reward_mean=0.1, reward_bound=0.0
4369: loss=0

4512: loss=0.053, reward_mean=0.0, reward_bound=0.0
4513: loss=0.004, reward_mean=0.1, reward_bound=0.0
4514: loss=0.052, reward_mean=0.0, reward_bound=0.0
4515: loss=0.034, reward_mean=0.1, reward_bound=0.0
4516: loss=0.043, reward_mean=0.1, reward_bound=0.0
4517: loss=0.009, reward_mean=0.1, reward_bound=0.0
4518: loss=0.003, reward_mean=0.1, reward_bound=0.0
4519: loss=0.021, reward_mean=0.0, reward_bound=0.0
4520: loss=0.121, reward_mean=0.1, reward_bound=0.0
4521: loss=0.047, reward_mean=0.0, reward_bound=0.0
4522: loss=0.018, reward_mean=0.0, reward_bound=0.0
4523: loss=0.030, reward_mean=0.0, reward_bound=0.0
4524: loss=0.024, reward_mean=0.0, reward_bound=0.0
4525: loss=0.004, reward_mean=0.0, reward_bound=0.0
4526: loss=0.063, reward_mean=0.0, reward_bound=0.0
4527: loss=0.111, reward_mean=0.0, reward_bound=0.0
4528: loss=0.003, reward_mean=0.0, reward_bound=0.0
4529: loss=0.037, reward_mean=0.0, reward_bound=0.0
4530: loss=0.039, reward_mean=0.1, reward_bound=0.0
4531: loss=0

4672: loss=0.007, reward_mean=0.1, reward_bound=0.0
4673: loss=0.073, reward_mean=0.0, reward_bound=0.0
4674: loss=0.048, reward_mean=0.0, reward_bound=0.0
4675: loss=0.068, reward_mean=0.1, reward_bound=0.0
4676: loss=0.028, reward_mean=0.0, reward_bound=0.0
4677: loss=0.058, reward_mean=0.1, reward_bound=0.0
4678: loss=0.018, reward_mean=0.1, reward_bound=0.0
4679: loss=0.007, reward_mean=0.0, reward_bound=0.0
4680: loss=0.050, reward_mean=0.0, reward_bound=0.0
4681: loss=0.052, reward_mean=0.0, reward_bound=0.0
4682: loss=0.004, reward_mean=0.1, reward_bound=0.0
4683: loss=0.029, reward_mean=0.0, reward_bound=0.0
4684: loss=0.049, reward_mean=0.1, reward_bound=0.0
4685: loss=0.026, reward_mean=0.1, reward_bound=0.0
4686: loss=0.032, reward_mean=0.2, reward_bound=0.0
4687: loss=0.041, reward_mean=0.0, reward_bound=0.0
4688: loss=0.042, reward_mean=0.0, reward_bound=0.0
4689: loss=0.029, reward_mean=0.0, reward_bound=0.0
4690: loss=0.172, reward_mean=0.1, reward_bound=0.0
4691: loss=0

4833: loss=0.016, reward_mean=0.1, reward_bound=0.0
4834: loss=0.033, reward_mean=0.1, reward_bound=0.0
4835: loss=0.010, reward_mean=0.0, reward_bound=0.0
4836: loss=0.031, reward_mean=0.0, reward_bound=0.0
4837: loss=0.004, reward_mean=0.0, reward_bound=0.0
4838: loss=0.075, reward_mean=0.1, reward_bound=0.0
4839: loss=0.076, reward_mean=0.1, reward_bound=0.0
4840: loss=0.062, reward_mean=0.0, reward_bound=0.0
4841: loss=0.032, reward_mean=0.1, reward_bound=0.0
4842: loss=0.003, reward_mean=0.0, reward_bound=0.0
4843: loss=0.010, reward_mean=0.0, reward_bound=0.0
4844: loss=0.068, reward_mean=0.0, reward_bound=0.0
4845: loss=0.000, reward_mean=0.1, reward_bound=0.0
4846: loss=0.000, reward_mean=0.0, reward_bound=0.0
4847: loss=0.050, reward_mean=0.1, reward_bound=0.0
4848: loss=0.008, reward_mean=0.1, reward_bound=0.0
4849: loss=0.012, reward_mean=0.0, reward_bound=0.0
4850: loss=0.006, reward_mean=0.1, reward_bound=0.0
4851: loss=0.005, reward_mean=0.1, reward_bound=0.0
4852: loss=0

4995: loss=0.030, reward_mean=0.1, reward_bound=0.0
4996: loss=0.034, reward_mean=0.1, reward_bound=0.0
4997: loss=0.041, reward_mean=0.1, reward_bound=0.0
4998: loss=0.037, reward_mean=0.0, reward_bound=0.0
4999: loss=0.099, reward_mean=0.0, reward_bound=0.0
5000: loss=0.001, reward_mean=0.0, reward_bound=0.0
5001: loss=0.012, reward_mean=0.0, reward_bound=0.0
5002: loss=0.005, reward_mean=0.2, reward_bound=0.0
5003: loss=0.003, reward_mean=0.0, reward_bound=0.0
5004: loss=0.060, reward_mean=0.0, reward_bound=0.0
5005: loss=0.032, reward_mean=0.1, reward_bound=0.0
5006: loss=0.007, reward_mean=0.1, reward_bound=0.0
5007: loss=0.035, reward_mean=0.2, reward_bound=0.0
5008: loss=0.041, reward_mean=0.1, reward_bound=0.0
5009: loss=0.004, reward_mean=0.1, reward_bound=0.0
5010: loss=0.004, reward_mean=0.0, reward_bound=0.0
5011: loss=0.061, reward_mean=0.0, reward_bound=0.0
5012: loss=0.080, reward_mean=0.0, reward_bound=0.0
5013: loss=0.042, reward_mean=0.1, reward_bound=0.0
5014: loss=0

5154: loss=0.049, reward_mean=0.0, reward_bound=0.0
5155: loss=0.042, reward_mean=0.0, reward_bound=0.0
5156: loss=0.029, reward_mean=0.0, reward_bound=0.0
5157: loss=0.012, reward_mean=0.1, reward_bound=0.0
5158: loss=0.104, reward_mean=0.1, reward_bound=0.0
5159: loss=0.069, reward_mean=0.0, reward_bound=0.0
5160: loss=0.002, reward_mean=0.0, reward_bound=0.0
5161: loss=0.004, reward_mean=0.1, reward_bound=0.0
5162: loss=0.035, reward_mean=0.1, reward_bound=0.0
5163: loss=0.046, reward_mean=0.1, reward_bound=0.0
5164: loss=0.064, reward_mean=0.1, reward_bound=0.0
5165: loss=0.104, reward_mean=0.0, reward_bound=0.0
5166: loss=0.049, reward_mean=0.1, reward_bound=0.0
5167: loss=0.033, reward_mean=0.1, reward_bound=0.0
5168: loss=0.007, reward_mean=0.1, reward_bound=0.0
5169: loss=0.013, reward_mean=0.1, reward_bound=0.0
5170: loss=0.066, reward_mean=0.0, reward_bound=0.0
5171: loss=0.002, reward_mean=0.1, reward_bound=0.0
5172: loss=0.021, reward_mean=0.1, reward_bound=0.0
5173: loss=0

5317: loss=0.004, reward_mean=0.0, reward_bound=0.0
5318: loss=0.003, reward_mean=0.0, reward_bound=0.0
5319: loss=0.000, reward_mean=0.1, reward_bound=0.0
5320: loss=0.069, reward_mean=0.1, reward_bound=0.0
5321: loss=0.003, reward_mean=0.0, reward_bound=0.0
5322: loss=0.002, reward_mean=0.0, reward_bound=0.0
5323: loss=0.001, reward_mean=0.0, reward_bound=0.0
5324: loss=0.043, reward_mean=0.0, reward_bound=0.0
5325: loss=0.042, reward_mean=0.1, reward_bound=0.0
5326: loss=0.004, reward_mean=0.1, reward_bound=0.0
5327: loss=0.095, reward_mean=0.0, reward_bound=0.0
5328: loss=0.002, reward_mean=0.1, reward_bound=0.0
5329: loss=0.002, reward_mean=0.0, reward_bound=0.0
5330: loss=0.047, reward_mean=0.0, reward_bound=0.0
5331: loss=0.008, reward_mean=0.0, reward_bound=0.0
5332: loss=0.005, reward_mean=0.0, reward_bound=0.0
5333: loss=0.039, reward_mean=0.1, reward_bound=0.0
5334: loss=0.059, reward_mean=0.1, reward_bound=0.0
5335: loss=0.006, reward_mean=0.1, reward_bound=0.0
5336: loss=0

5476: loss=0.002, reward_mean=0.1, reward_bound=0.0
5477: loss=0.001, reward_mean=0.1, reward_bound=0.0
5478: loss=0.002, reward_mean=0.0, reward_bound=0.0
5479: loss=0.047, reward_mean=0.1, reward_bound=0.0
5480: loss=0.002, reward_mean=0.0, reward_bound=0.0
5481: loss=0.004, reward_mean=0.1, reward_bound=0.0
5482: loss=0.002, reward_mean=0.2, reward_bound=0.0
5483: loss=0.002, reward_mean=0.1, reward_bound=0.0
5484: loss=0.004, reward_mean=0.0, reward_bound=0.0
5485: loss=0.004, reward_mean=0.0, reward_bound=0.0
5486: loss=0.003, reward_mean=0.0, reward_bound=0.0
5487: loss=0.003, reward_mean=0.1, reward_bound=0.0
5488: loss=0.033, reward_mean=0.1, reward_bound=0.0
5489: loss=0.001, reward_mean=0.0, reward_bound=0.0
5490: loss=0.004, reward_mean=0.1, reward_bound=0.0
5491: loss=0.002, reward_mean=0.1, reward_bound=0.0
5492: loss=0.039, reward_mean=0.1, reward_bound=0.0
5493: loss=0.002, reward_mean=0.1, reward_bound=0.0
5494: loss=0.005, reward_mean=0.0, reward_bound=0.0
5495: loss=0

5636: loss=0.001, reward_mean=0.0, reward_bound=0.0
5637: loss=0.003, reward_mean=0.1, reward_bound=0.0
5638: loss=0.001, reward_mean=0.0, reward_bound=0.0
5639: loss=0.001, reward_mean=0.1, reward_bound=0.0
5640: loss=0.005, reward_mean=0.1, reward_bound=0.0
5641: loss=0.001, reward_mean=0.1, reward_bound=0.0
5642: loss=0.002, reward_mean=0.1, reward_bound=0.0
5643: loss=0.053, reward_mean=0.0, reward_bound=0.0
5644: loss=0.004, reward_mean=0.1, reward_bound=0.0
5645: loss=0.002, reward_mean=0.0, reward_bound=0.0
5646: loss=0.004, reward_mean=0.1, reward_bound=0.0
5647: loss=0.002, reward_mean=0.0, reward_bound=0.0
5648: loss=0.040, reward_mean=0.0, reward_bound=0.0
5649: loss=0.003, reward_mean=0.0, reward_bound=0.0
5650: loss=0.002, reward_mean=0.0, reward_bound=0.0
5651: loss=0.003, reward_mean=0.1, reward_bound=0.0
5652: loss=0.002, reward_mean=0.0, reward_bound=0.0
5653: loss=0.003, reward_mean=0.0, reward_bound=0.0
5654: loss=0.003, reward_mean=0.1, reward_bound=0.0
5655: loss=0

5795: loss=0.039, reward_mean=0.1, reward_bound=0.0
5796: loss=0.005, reward_mean=0.1, reward_bound=0.0
5797: loss=0.041, reward_mean=0.1, reward_bound=0.0
5798: loss=0.001, reward_mean=0.0, reward_bound=0.0
5799: loss=0.049, reward_mean=0.1, reward_bound=0.0
5800: loss=0.051, reward_mean=0.1, reward_bound=0.0
5801: loss=0.003, reward_mean=0.0, reward_bound=0.0
5802: loss=0.036, reward_mean=0.1, reward_bound=0.0
5803: loss=0.036, reward_mean=0.0, reward_bound=0.0
5804: loss=0.004, reward_mean=0.2, reward_bound=0.0
5805: loss=0.039, reward_mean=0.0, reward_bound=0.0
5806: loss=0.006, reward_mean=0.0, reward_bound=0.0
5807: loss=0.037, reward_mean=0.1, reward_bound=0.0
5808: loss=0.040, reward_mean=0.0, reward_bound=0.0
5809: loss=0.013, reward_mean=0.0, reward_bound=0.0
5810: loss=0.001, reward_mean=0.0, reward_bound=0.0
5811: loss=0.000, reward_mean=0.1, reward_bound=0.0
5812: loss=0.009, reward_mean=0.0, reward_bound=0.0
5813: loss=0.058, reward_mean=0.1, reward_bound=0.0
5814: loss=0

5953: loss=0.003, reward_mean=0.0, reward_bound=0.0
5954: loss=0.003, reward_mean=0.1, reward_bound=0.0
5955: loss=0.002, reward_mean=0.1, reward_bound=0.0
5956: loss=0.002, reward_mean=0.2, reward_bound=0.0
5957: loss=0.001, reward_mean=0.0, reward_bound=0.0
5958: loss=0.003, reward_mean=0.1, reward_bound=0.0
5959: loss=0.040, reward_mean=0.0, reward_bound=0.0
5960: loss=0.003, reward_mean=0.1, reward_bound=0.0
5961: loss=0.002, reward_mean=0.1, reward_bound=0.0
5962: loss=0.002, reward_mean=0.0, reward_bound=0.0
5963: loss=0.000, reward_mean=0.1, reward_bound=0.0
5964: loss=0.001, reward_mean=0.0, reward_bound=0.0
5965: loss=0.001, reward_mean=0.1, reward_bound=0.0
5966: loss=0.001, reward_mean=0.1, reward_bound=0.0
5967: loss=0.001, reward_mean=0.0, reward_bound=0.0
5968: loss=0.001, reward_mean=0.1, reward_bound=0.0
5969: loss=0.000, reward_mean=0.1, reward_bound=0.0
5970: loss=0.003, reward_mean=0.0, reward_bound=0.0
5971: loss=0.003, reward_mean=0.0, reward_bound=0.0
5972: loss=0

6114: loss=0.000, reward_mean=0.1, reward_bound=0.0
6115: loss=0.000, reward_mean=0.1, reward_bound=0.0
6116: loss=0.000, reward_mean=0.1, reward_bound=0.0
6117: loss=0.000, reward_mean=0.1, reward_bound=0.0
6118: loss=0.000, reward_mean=0.1, reward_bound=0.0
6119: loss=0.067, reward_mean=0.0, reward_bound=0.0
6120: loss=0.000, reward_mean=0.1, reward_bound=0.0
6121: loss=0.000, reward_mean=0.1, reward_bound=0.0
6122: loss=0.000, reward_mean=0.1, reward_bound=0.0
6123: loss=0.000, reward_mean=0.0, reward_bound=0.0
6124: loss=0.000, reward_mean=0.1, reward_bound=0.0
6125: loss=0.000, reward_mean=0.1, reward_bound=0.0
6126: loss=0.000, reward_mean=0.2, reward_bound=0.0
6127: loss=0.001, reward_mean=0.1, reward_bound=0.0
6128: loss=0.000, reward_mean=0.1, reward_bound=0.0
6129: loss=0.000, reward_mean=0.1, reward_bound=0.0
6130: loss=0.000, reward_mean=0.1, reward_bound=0.0
6131: loss=0.000, reward_mean=0.0, reward_bound=0.0
6132: loss=0.069, reward_mean=0.1, reward_bound=0.0
6133: loss=0

6277: loss=0.074, reward_mean=0.0, reward_bound=0.0
6278: loss=0.000, reward_mean=0.0, reward_bound=0.0
6279: loss=0.000, reward_mean=0.1, reward_bound=0.0
6280: loss=0.000, reward_mean=0.1, reward_bound=0.0
6281: loss=0.001, reward_mean=0.0, reward_bound=0.0
6282: loss=0.001, reward_mean=0.0, reward_bound=0.0
6283: loss=0.001, reward_mean=0.2, reward_bound=0.0
6284: loss=0.059, reward_mean=0.1, reward_bound=0.0
6285: loss=0.001, reward_mean=0.0, reward_bound=0.0
6286: loss=0.001, reward_mean=0.1, reward_bound=0.0
6287: loss=0.002, reward_mean=0.0, reward_bound=0.0
6288: loss=0.000, reward_mean=0.1, reward_bound=0.0
6289: loss=0.047, reward_mean=0.0, reward_bound=0.0
6290: loss=0.001, reward_mean=0.0, reward_bound=0.0
6291: loss=0.003, reward_mean=0.0, reward_bound=0.0
6292: loss=0.002, reward_mean=0.0, reward_bound=0.0
6293: loss=0.001, reward_mean=0.0, reward_bound=0.0
6294: loss=0.049, reward_mean=0.0, reward_bound=0.0
6295: loss=0.043, reward_mean=0.0, reward_bound=0.0
6296: loss=0

6435: loss=0.000, reward_mean=0.1, reward_bound=0.0
6436: loss=0.001, reward_mean=0.0, reward_bound=0.0
6437: loss=0.001, reward_mean=0.1, reward_bound=0.0
6438: loss=0.001, reward_mean=0.0, reward_bound=0.0
6439: loss=0.001, reward_mean=0.0, reward_bound=0.0
6440: loss=0.001, reward_mean=0.1, reward_bound=0.0
6441: loss=0.000, reward_mean=0.1, reward_bound=0.0
6442: loss=0.001, reward_mean=0.1, reward_bound=0.0
6443: loss=0.001, reward_mean=0.1, reward_bound=0.0
6444: loss=0.001, reward_mean=0.1, reward_bound=0.0
6445: loss=0.000, reward_mean=0.0, reward_bound=0.0
6446: loss=0.001, reward_mean=0.0, reward_bound=0.0
6447: loss=0.001, reward_mean=0.0, reward_bound=0.0
6448: loss=0.000, reward_mean=0.1, reward_bound=0.0
6449: loss=0.001, reward_mean=0.1, reward_bound=0.0
6450: loss=0.000, reward_mean=0.0, reward_bound=0.0
6451: loss=0.000, reward_mean=0.0, reward_bound=0.0
6452: loss=0.000, reward_mean=0.1, reward_bound=0.0
6453: loss=0.000, reward_mean=0.1, reward_bound=0.0
6454: loss=0

6595: loss=0.000, reward_mean=0.0, reward_bound=0.0
6596: loss=0.000, reward_mean=0.1, reward_bound=0.0
6597: loss=0.000, reward_mean=0.0, reward_bound=0.0
6598: loss=0.000, reward_mean=0.0, reward_bound=0.0
6599: loss=0.000, reward_mean=0.0, reward_bound=0.0
6600: loss=0.000, reward_mean=0.0, reward_bound=0.0
6601: loss=0.000, reward_mean=0.0, reward_bound=0.0
6602: loss=0.000, reward_mean=0.1, reward_bound=0.0
6603: loss=0.000, reward_mean=0.1, reward_bound=0.0
6604: loss=0.000, reward_mean=0.1, reward_bound=0.0
6605: loss=0.000, reward_mean=0.1, reward_bound=0.0
6606: loss=0.000, reward_mean=0.1, reward_bound=0.0
6607: loss=0.000, reward_mean=0.1, reward_bound=0.0
6608: loss=0.000, reward_mean=0.1, reward_bound=0.0
6609: loss=0.000, reward_mean=0.1, reward_bound=0.0
6610: loss=0.000, reward_mean=0.0, reward_bound=0.0
6611: loss=0.000, reward_mean=0.1, reward_bound=0.0
6612: loss=0.000, reward_mean=0.0, reward_bound=0.0
6613: loss=0.000, reward_mean=0.0, reward_bound=0.0
6614: loss=0

6757: loss=0.000, reward_mean=0.1, reward_bound=0.0
6758: loss=0.000, reward_mean=0.1, reward_bound=0.0
6759: loss=0.000, reward_mean=0.1, reward_bound=0.0
6760: loss=0.000, reward_mean=0.1, reward_bound=0.0
6761: loss=0.000, reward_mean=0.0, reward_bound=0.0
6762: loss=0.000, reward_mean=0.1, reward_bound=0.0
6763: loss=0.000, reward_mean=0.0, reward_bound=0.0
6764: loss=0.000, reward_mean=0.1, reward_bound=0.0
6765: loss=0.000, reward_mean=0.1, reward_bound=0.0
6766: loss=0.000, reward_mean=0.0, reward_bound=0.0
6767: loss=0.000, reward_mean=0.1, reward_bound=0.0
6768: loss=0.000, reward_mean=0.2, reward_bound=0.0
6769: loss=0.000, reward_mean=0.0, reward_bound=0.0
6770: loss=0.000, reward_mean=0.0, reward_bound=0.0
6771: loss=0.000, reward_mean=0.1, reward_bound=0.0
6772: loss=0.000, reward_mean=0.1, reward_bound=0.0
6773: loss=0.000, reward_mean=0.0, reward_bound=0.0
6774: loss=0.000, reward_mean=0.1, reward_bound=0.0
6775: loss=0.000, reward_mean=0.0, reward_bound=0.0
6776: loss=0

6915: loss=0.000, reward_mean=0.1, reward_bound=0.0
6916: loss=0.000, reward_mean=0.1, reward_bound=0.0
6917: loss=0.000, reward_mean=0.0, reward_bound=0.0
6918: loss=0.000, reward_mean=0.1, reward_bound=0.0
6919: loss=0.000, reward_mean=0.0, reward_bound=0.0
6920: loss=0.000, reward_mean=0.1, reward_bound=0.0
6921: loss=0.000, reward_mean=0.1, reward_bound=0.0
6922: loss=0.000, reward_mean=0.2, reward_bound=0.0
6923: loss=0.000, reward_mean=0.1, reward_bound=0.0
6924: loss=0.000, reward_mean=0.1, reward_bound=0.0
6925: loss=0.000, reward_mean=0.0, reward_bound=0.0
6926: loss=0.000, reward_mean=0.1, reward_bound=0.0
6927: loss=0.000, reward_mean=0.1, reward_bound=0.0
6928: loss=0.000, reward_mean=0.1, reward_bound=0.0
6929: loss=0.000, reward_mean=0.0, reward_bound=0.0
6930: loss=0.000, reward_mean=0.1, reward_bound=0.0
6931: loss=0.000, reward_mean=0.1, reward_bound=0.0
6932: loss=0.000, reward_mean=0.0, reward_bound=0.0
6933: loss=0.000, reward_mean=0.1, reward_bound=0.0
6934: loss=0

7073: loss=0.000, reward_mean=0.0, reward_bound=0.0
7074: loss=0.000, reward_mean=0.0, reward_bound=0.0
7075: loss=0.000, reward_mean=0.0, reward_bound=0.0
7076: loss=0.000, reward_mean=0.1, reward_bound=0.0
7077: loss=0.000, reward_mean=0.0, reward_bound=0.0
7078: loss=0.000, reward_mean=0.1, reward_bound=0.0
7079: loss=0.000, reward_mean=0.1, reward_bound=0.0
7080: loss=0.000, reward_mean=0.0, reward_bound=0.0
7081: loss=0.000, reward_mean=0.1, reward_bound=0.0
7082: loss=0.000, reward_mean=0.1, reward_bound=0.0
7083: loss=0.000, reward_mean=0.0, reward_bound=0.0
7084: loss=0.000, reward_mean=0.1, reward_bound=0.0
7085: loss=0.000, reward_mean=0.0, reward_bound=0.0
7086: loss=0.000, reward_mean=0.2, reward_bound=0.0
7087: loss=0.000, reward_mean=0.1, reward_bound=0.0
7088: loss=0.000, reward_mean=0.0, reward_bound=0.0
7089: loss=0.000, reward_mean=0.0, reward_bound=0.0
7090: loss=0.000, reward_mean=0.0, reward_bound=0.0
7091: loss=0.000, reward_mean=0.0, reward_bound=0.0
7092: loss=0

7230: loss=0.000, reward_mean=0.1, reward_bound=0.0
7231: loss=0.000, reward_mean=0.0, reward_bound=0.0
7232: loss=0.000, reward_mean=0.0, reward_bound=0.0
7233: loss=0.000, reward_mean=0.0, reward_bound=0.0
7234: loss=0.000, reward_mean=0.0, reward_bound=0.0
7235: loss=0.000, reward_mean=0.1, reward_bound=0.0
7236: loss=0.000, reward_mean=0.1, reward_bound=0.0
7237: loss=0.000, reward_mean=0.1, reward_bound=0.0
7238: loss=0.000, reward_mean=0.2, reward_bound=0.0
7239: loss=0.000, reward_mean=0.1, reward_bound=0.0
7240: loss=0.000, reward_mean=0.1, reward_bound=0.0
7241: loss=0.000, reward_mean=0.0, reward_bound=0.0
7242: loss=0.000, reward_mean=0.1, reward_bound=0.0
7243: loss=0.000, reward_mean=0.0, reward_bound=0.0
7244: loss=0.000, reward_mean=0.0, reward_bound=0.0
7245: loss=0.000, reward_mean=0.1, reward_bound=0.0
7246: loss=0.000, reward_mean=0.1, reward_bound=0.0
7247: loss=0.000, reward_mean=0.0, reward_bound=0.0
7248: loss=0.000, reward_mean=0.0, reward_bound=0.0
7249: loss=0

7390: loss=0.000, reward_mean=0.0, reward_bound=0.0
7391: loss=0.000, reward_mean=0.0, reward_bound=0.0
7392: loss=0.000, reward_mean=0.2, reward_bound=0.0
7393: loss=0.000, reward_mean=0.1, reward_bound=0.0
7394: loss=0.000, reward_mean=0.0, reward_bound=0.0
7395: loss=0.000, reward_mean=0.1, reward_bound=0.0
7396: loss=0.000, reward_mean=0.1, reward_bound=0.0
7397: loss=0.000, reward_mean=0.0, reward_bound=0.0
7398: loss=0.000, reward_mean=0.0, reward_bound=0.0
7399: loss=0.000, reward_mean=0.0, reward_bound=0.0
7400: loss=0.000, reward_mean=0.1, reward_bound=0.0
7401: loss=0.000, reward_mean=0.0, reward_bound=0.0
7402: loss=0.000, reward_mean=0.1, reward_bound=0.0
7403: loss=0.000, reward_mean=0.1, reward_bound=0.0
7404: loss=0.000, reward_mean=0.1, reward_bound=0.0
7405: loss=0.000, reward_mean=0.1, reward_bound=0.0
7406: loss=0.000, reward_mean=0.1, reward_bound=0.0
7407: loss=0.000, reward_mean=0.1, reward_bound=0.0
7408: loss=0.000, reward_mean=0.0, reward_bound=0.0
7409: loss=0

7549: loss=0.000, reward_mean=0.1, reward_bound=0.0
7550: loss=0.000, reward_mean=0.0, reward_bound=0.0
7551: loss=0.000, reward_mean=0.0, reward_bound=0.0
7552: loss=0.000, reward_mean=0.1, reward_bound=0.0
7553: loss=0.000, reward_mean=0.0, reward_bound=0.0
7554: loss=0.000, reward_mean=0.1, reward_bound=0.0
7555: loss=0.000, reward_mean=0.1, reward_bound=0.0
7556: loss=0.000, reward_mean=0.0, reward_bound=0.0
7557: loss=0.000, reward_mean=0.1, reward_bound=0.0
7558: loss=0.000, reward_mean=0.0, reward_bound=0.0
7559: loss=0.000, reward_mean=0.1, reward_bound=0.0
7560: loss=0.000, reward_mean=0.1, reward_bound=0.0
7561: loss=0.000, reward_mean=0.1, reward_bound=0.0
7562: loss=0.000, reward_mean=0.1, reward_bound=0.0
7563: loss=0.000, reward_mean=0.0, reward_bound=0.0
7564: loss=0.000, reward_mean=0.0, reward_bound=0.0
7565: loss=0.000, reward_mean=0.0, reward_bound=0.0
7566: loss=0.000, reward_mean=0.0, reward_bound=0.0
7567: loss=0.000, reward_mean=0.1, reward_bound=0.0
7568: loss=0

7708: loss=0.000, reward_mean=0.1, reward_bound=0.0
7709: loss=0.000, reward_mean=0.1, reward_bound=0.0
7710: loss=0.000, reward_mean=0.1, reward_bound=0.0
7711: loss=0.000, reward_mean=0.0, reward_bound=0.0
7712: loss=0.000, reward_mean=0.0, reward_bound=0.0
7713: loss=0.000, reward_mean=0.1, reward_bound=0.0
7714: loss=0.000, reward_mean=0.1, reward_bound=0.0
7715: loss=0.000, reward_mean=0.1, reward_bound=0.0
7716: loss=0.000, reward_mean=0.0, reward_bound=0.0
7717: loss=0.000, reward_mean=0.1, reward_bound=0.0
7718: loss=0.000, reward_mean=0.1, reward_bound=0.0
7719: loss=0.000, reward_mean=0.2, reward_bound=0.0
7720: loss=0.000, reward_mean=0.1, reward_bound=0.0
7721: loss=0.000, reward_mean=0.0, reward_bound=0.0
7722: loss=0.000, reward_mean=0.0, reward_bound=0.0
7723: loss=0.000, reward_mean=0.0, reward_bound=0.0
7724: loss=0.000, reward_mean=0.0, reward_bound=0.0
7725: loss=0.000, reward_mean=0.0, reward_bound=0.0
7726: loss=0.000, reward_mean=0.0, reward_bound=0.0
7727: loss=0

7866: loss=0.000, reward_mean=0.1, reward_bound=0.0
7867: loss=0.000, reward_mean=0.1, reward_bound=0.0
7868: loss=0.000, reward_mean=0.0, reward_bound=0.0
7869: loss=0.000, reward_mean=0.0, reward_bound=0.0
7870: loss=0.000, reward_mean=0.1, reward_bound=0.0
7871: loss=0.000, reward_mean=0.0, reward_bound=0.0
7872: loss=0.000, reward_mean=0.1, reward_bound=0.0
7873: loss=0.000, reward_mean=0.0, reward_bound=0.0
7874: loss=0.000, reward_mean=0.1, reward_bound=0.0
7875: loss=0.000, reward_mean=0.2, reward_bound=0.0
7876: loss=0.000, reward_mean=0.1, reward_bound=0.0
7877: loss=0.000, reward_mean=0.1, reward_bound=0.0
7878: loss=0.000, reward_mean=0.0, reward_bound=0.0
7879: loss=0.000, reward_mean=0.0, reward_bound=0.0
7880: loss=0.000, reward_mean=0.0, reward_bound=0.0
7881: loss=0.000, reward_mean=0.0, reward_bound=0.0
7882: loss=0.000, reward_mean=0.0, reward_bound=0.0
7883: loss=0.000, reward_mean=0.1, reward_bound=0.0
7884: loss=0.000, reward_mean=0.1, reward_bound=0.0
7885: loss=0

8028: loss=0.000, reward_mean=0.0, reward_bound=0.0
8029: loss=0.000, reward_mean=0.0, reward_bound=0.0
8030: loss=0.000, reward_mean=0.0, reward_bound=0.0
8031: loss=0.000, reward_mean=0.1, reward_bound=0.0
8032: loss=0.000, reward_mean=0.1, reward_bound=0.0
8033: loss=0.000, reward_mean=0.1, reward_bound=0.0
8034: loss=0.000, reward_mean=0.1, reward_bound=0.0
8035: loss=0.000, reward_mean=0.0, reward_bound=0.0
8036: loss=0.000, reward_mean=0.1, reward_bound=0.0
8037: loss=0.000, reward_mean=0.1, reward_bound=0.0
8038: loss=0.000, reward_mean=0.1, reward_bound=0.0
8039: loss=0.000, reward_mean=0.1, reward_bound=0.0
8040: loss=0.000, reward_mean=0.0, reward_bound=0.0
8041: loss=0.000, reward_mean=0.0, reward_bound=0.0
8042: loss=0.000, reward_mean=0.2, reward_bound=0.0
8043: loss=0.000, reward_mean=0.2, reward_bound=0.0
8044: loss=0.000, reward_mean=0.0, reward_bound=0.0
8045: loss=0.000, reward_mean=0.1, reward_bound=0.0
8046: loss=0.000, reward_mean=0.0, reward_bound=0.0
8047: loss=0

8190: loss=0.000, reward_mean=0.1, reward_bound=0.0
8191: loss=0.000, reward_mean=0.1, reward_bound=0.0
8192: loss=0.000, reward_mean=0.0, reward_bound=0.0
8193: loss=0.000, reward_mean=0.0, reward_bound=0.0
8194: loss=0.000, reward_mean=0.1, reward_bound=0.0
8195: loss=0.000, reward_mean=0.1, reward_bound=0.0
8196: loss=0.000, reward_mean=0.0, reward_bound=0.0
8197: loss=0.000, reward_mean=0.0, reward_bound=0.0
8198: loss=0.000, reward_mean=0.1, reward_bound=0.0
8199: loss=0.000, reward_mean=0.0, reward_bound=0.0
8200: loss=0.000, reward_mean=0.1, reward_bound=0.0
8201: loss=0.000, reward_mean=0.0, reward_bound=0.0
8202: loss=0.000, reward_mean=0.0, reward_bound=0.0
8203: loss=0.000, reward_mean=0.1, reward_bound=0.0
8204: loss=0.000, reward_mean=0.0, reward_bound=0.0
8205: loss=0.000, reward_mean=0.0, reward_bound=0.0
8206: loss=0.000, reward_mean=0.0, reward_bound=0.0
8207: loss=0.000, reward_mean=0.1, reward_bound=0.0
8208: loss=0.000, reward_mean=0.1, reward_bound=0.0
8209: loss=0

8350: loss=0.000, reward_mean=0.1, reward_bound=0.0
8351: loss=0.000, reward_mean=0.0, reward_bound=0.0
8352: loss=0.000, reward_mean=0.0, reward_bound=0.0
8353: loss=0.000, reward_mean=0.0, reward_bound=0.0
8354: loss=0.000, reward_mean=0.0, reward_bound=0.0
8355: loss=0.000, reward_mean=0.0, reward_bound=0.0
8356: loss=0.000, reward_mean=0.1, reward_bound=0.0
8357: loss=0.000, reward_mean=0.1, reward_bound=0.0
8358: loss=0.000, reward_mean=0.1, reward_bound=0.0
8359: loss=0.000, reward_mean=0.0, reward_bound=0.0
8360: loss=0.000, reward_mean=0.1, reward_bound=0.0
8361: loss=0.000, reward_mean=0.0, reward_bound=0.0
8362: loss=0.000, reward_mean=0.1, reward_bound=0.0
8363: loss=0.000, reward_mean=0.1, reward_bound=0.0
8364: loss=0.000, reward_mean=0.0, reward_bound=0.0
8365: loss=0.000, reward_mean=0.0, reward_bound=0.0
8366: loss=0.000, reward_mean=0.1, reward_bound=0.0
8367: loss=0.000, reward_mean=0.1, reward_bound=0.0
8368: loss=0.000, reward_mean=0.1, reward_bound=0.0
8369: loss=0

8512: loss=0.000, reward_mean=0.1, reward_bound=0.0
8513: loss=0.000, reward_mean=0.1, reward_bound=0.0
8514: loss=0.000, reward_mean=0.1, reward_bound=0.0
8515: loss=0.000, reward_mean=0.0, reward_bound=0.0
8516: loss=0.000, reward_mean=0.0, reward_bound=0.0
8517: loss=0.000, reward_mean=0.2, reward_bound=0.0
8518: loss=0.000, reward_mean=0.1, reward_bound=0.0
8519: loss=0.000, reward_mean=0.1, reward_bound=0.0
8520: loss=0.000, reward_mean=0.1, reward_bound=0.0
8521: loss=0.000, reward_mean=0.0, reward_bound=0.0
8522: loss=0.000, reward_mean=0.0, reward_bound=0.0
8523: loss=0.000, reward_mean=0.0, reward_bound=0.0
8524: loss=0.000, reward_mean=0.0, reward_bound=0.0
8525: loss=0.000, reward_mean=0.1, reward_bound=0.0
8526: loss=0.000, reward_mean=0.1, reward_bound=0.0
8527: loss=0.000, reward_mean=0.0, reward_bound=0.0
8528: loss=0.000, reward_mean=0.0, reward_bound=0.0
8529: loss=0.000, reward_mean=0.0, reward_bound=0.0
8530: loss=0.000, reward_mean=0.1, reward_bound=0.0
8531: loss=0

8674: loss=0.000, reward_mean=0.0, reward_bound=0.0
8675: loss=0.000, reward_mean=0.0, reward_bound=0.0
8676: loss=0.000, reward_mean=0.1, reward_bound=0.0
8677: loss=0.000, reward_mean=0.1, reward_bound=0.0
8678: loss=0.000, reward_mean=0.0, reward_bound=0.0
8679: loss=0.000, reward_mean=0.0, reward_bound=0.0
8680: loss=0.000, reward_mean=0.1, reward_bound=0.0
8681: loss=0.000, reward_mean=0.0, reward_bound=0.0
8682: loss=0.000, reward_mean=0.0, reward_bound=0.0
8683: loss=0.000, reward_mean=0.1, reward_bound=0.0
8684: loss=0.000, reward_mean=0.1, reward_bound=0.0
8685: loss=0.000, reward_mean=0.0, reward_bound=0.0
8686: loss=0.000, reward_mean=0.1, reward_bound=0.0
8687: loss=0.000, reward_mean=0.0, reward_bound=0.0
8688: loss=0.000, reward_mean=0.0, reward_bound=0.0
8689: loss=0.000, reward_mean=0.1, reward_bound=0.0
8690: loss=0.000, reward_mean=0.1, reward_bound=0.0
8691: loss=0.000, reward_mean=0.1, reward_bound=0.0
8692: loss=0.000, reward_mean=0.0, reward_bound=0.0
8693: loss=0

8836: loss=0.000, reward_mean=0.0, reward_bound=0.0
8837: loss=0.000, reward_mean=0.1, reward_bound=0.0
8838: loss=0.000, reward_mean=0.1, reward_bound=0.0
8839: loss=0.000, reward_mean=0.0, reward_bound=0.0
8840: loss=0.000, reward_mean=0.0, reward_bound=0.0
8841: loss=0.000, reward_mean=0.1, reward_bound=0.0
8842: loss=0.000, reward_mean=0.0, reward_bound=0.0
8843: loss=0.000, reward_mean=0.2, reward_bound=0.0
8844: loss=0.000, reward_mean=0.0, reward_bound=0.0
8845: loss=0.000, reward_mean=0.0, reward_bound=0.0
8846: loss=0.000, reward_mean=0.1, reward_bound=0.0
8847: loss=0.000, reward_mean=0.1, reward_bound=0.0
8848: loss=0.000, reward_mean=0.1, reward_bound=0.0
8849: loss=0.000, reward_mean=0.1, reward_bound=0.0
8850: loss=0.000, reward_mean=0.1, reward_bound=0.0
8851: loss=0.000, reward_mean=0.0, reward_bound=0.0
8852: loss=0.000, reward_mean=0.0, reward_bound=0.0
8853: loss=0.000, reward_mean=0.1, reward_bound=0.0
8854: loss=0.000, reward_mean=0.1, reward_bound=0.0
8855: loss=0

8996: loss=0.000, reward_mean=0.1, reward_bound=0.0
8997: loss=0.000, reward_mean=0.0, reward_bound=0.0
8998: loss=0.000, reward_mean=0.1, reward_bound=0.0
8999: loss=0.000, reward_mean=0.0, reward_bound=0.0
9000: loss=0.000, reward_mean=0.1, reward_bound=0.0
9001: loss=0.000, reward_mean=0.1, reward_bound=0.0
9002: loss=0.000, reward_mean=0.0, reward_bound=0.0
9003: loss=0.000, reward_mean=0.1, reward_bound=0.0
9004: loss=0.000, reward_mean=0.0, reward_bound=0.0
9005: loss=0.000, reward_mean=0.1, reward_bound=0.0
9006: loss=0.000, reward_mean=0.0, reward_bound=0.0
9007: loss=0.000, reward_mean=0.1, reward_bound=0.0
9008: loss=0.000, reward_mean=0.0, reward_bound=0.0
9009: loss=0.000, reward_mean=0.1, reward_bound=0.0
9010: loss=0.000, reward_mean=0.0, reward_bound=0.0
9011: loss=0.000, reward_mean=0.0, reward_bound=0.0
9012: loss=0.000, reward_mean=0.1, reward_bound=0.0
9013: loss=0.000, reward_mean=0.1, reward_bound=0.0
9014: loss=0.000, reward_mean=0.1, reward_bound=0.0
9015: loss=0

9156: loss=0.087, reward_mean=0.1, reward_bound=0.0
9157: loss=0.000, reward_mean=0.2, reward_bound=0.0
9158: loss=0.000, reward_mean=0.0, reward_bound=0.0
9159: loss=0.000, reward_mean=0.0, reward_bound=0.0
9160: loss=0.000, reward_mean=0.0, reward_bound=0.0
9161: loss=0.000, reward_mean=0.0, reward_bound=0.0
9162: loss=0.000, reward_mean=0.1, reward_bound=0.0
9163: loss=0.000, reward_mean=0.0, reward_bound=0.0
9164: loss=0.000, reward_mean=0.0, reward_bound=0.0
9165: loss=0.000, reward_mean=0.0, reward_bound=0.0
9166: loss=0.000, reward_mean=0.0, reward_bound=0.0
9167: loss=0.000, reward_mean=0.0, reward_bound=0.0
9168: loss=0.000, reward_mean=0.1, reward_bound=0.0
9169: loss=0.000, reward_mean=0.1, reward_bound=0.0
9170: loss=0.000, reward_mean=0.0, reward_bound=0.0
9171: loss=0.000, reward_mean=0.1, reward_bound=0.0
9172: loss=0.000, reward_mean=0.0, reward_bound=0.0
9173: loss=0.000, reward_mean=0.0, reward_bound=0.0
9174: loss=0.000, reward_mean=0.0, reward_bound=0.0
9175: loss=0

9314: loss=0.000, reward_mean=0.0, reward_bound=0.0
9315: loss=0.000, reward_mean=0.0, reward_bound=0.0
9316: loss=0.000, reward_mean=0.0, reward_bound=0.0
9317: loss=0.000, reward_mean=0.1, reward_bound=0.0
9318: loss=0.000, reward_mean=0.0, reward_bound=0.0
9319: loss=0.000, reward_mean=0.0, reward_bound=0.0
9320: loss=0.000, reward_mean=0.1, reward_bound=0.0
9321: loss=0.000, reward_mean=0.0, reward_bound=0.0
9322: loss=0.000, reward_mean=0.0, reward_bound=0.0
9323: loss=0.000, reward_mean=0.1, reward_bound=0.0
9324: loss=0.000, reward_mean=0.1, reward_bound=0.0
9325: loss=0.000, reward_mean=0.1, reward_bound=0.0
9326: loss=0.000, reward_mean=0.0, reward_bound=0.0
9327: loss=0.000, reward_mean=0.1, reward_bound=0.0
9328: loss=0.000, reward_mean=0.0, reward_bound=0.0
9329: loss=0.075, reward_mean=0.0, reward_bound=0.0
9330: loss=0.000, reward_mean=0.1, reward_bound=0.0
9331: loss=0.000, reward_mean=0.1, reward_bound=0.0
9332: loss=0.000, reward_mean=0.1, reward_bound=0.0
9333: loss=0

9476: loss=0.000, reward_mean=0.0, reward_bound=0.0
9477: loss=0.000, reward_mean=0.1, reward_bound=0.0
9478: loss=0.000, reward_mean=0.0, reward_bound=0.0
9479: loss=0.000, reward_mean=0.1, reward_bound=0.0
9480: loss=0.000, reward_mean=0.0, reward_bound=0.0
9481: loss=0.000, reward_mean=0.1, reward_bound=0.0
9482: loss=0.000, reward_mean=0.0, reward_bound=0.0
9483: loss=0.000, reward_mean=0.0, reward_bound=0.0
9484: loss=0.000, reward_mean=0.1, reward_bound=0.0
9485: loss=0.000, reward_mean=0.0, reward_bound=0.0
9486: loss=0.000, reward_mean=0.1, reward_bound=0.0
9487: loss=0.000, reward_mean=0.0, reward_bound=0.0
9488: loss=0.000, reward_mean=0.0, reward_bound=0.0
9489: loss=0.000, reward_mean=0.0, reward_bound=0.0
9490: loss=0.000, reward_mean=0.2, reward_bound=0.0
9491: loss=0.000, reward_mean=0.0, reward_bound=0.0
9492: loss=0.000, reward_mean=0.0, reward_bound=0.0
9493: loss=0.000, reward_mean=0.1, reward_bound=0.0
9494: loss=0.000, reward_mean=0.0, reward_bound=0.0
9495: loss=0

9636: loss=0.046, reward_mean=0.0, reward_bound=0.0
9637: loss=0.052, reward_mean=0.1, reward_bound=0.0
9638: loss=0.001, reward_mean=0.1, reward_bound=0.0
9639: loss=0.003, reward_mean=0.0, reward_bound=0.0
9640: loss=0.000, reward_mean=0.1, reward_bound=0.0
9641: loss=0.043, reward_mean=0.1, reward_bound=0.0
9642: loss=0.002, reward_mean=0.1, reward_bound=0.0
9643: loss=0.074, reward_mean=0.1, reward_bound=0.0
9644: loss=0.002, reward_mean=0.1, reward_bound=0.0
9645: loss=0.003, reward_mean=0.1, reward_bound=0.0
9646: loss=0.002, reward_mean=0.1, reward_bound=0.0
9647: loss=0.039, reward_mean=0.1, reward_bound=0.0
9648: loss=0.001, reward_mean=0.1, reward_bound=0.0
9649: loss=0.001, reward_mean=0.0, reward_bound=0.0
9650: loss=0.001, reward_mean=0.1, reward_bound=0.0
9651: loss=0.037, reward_mean=0.0, reward_bound=0.0
9652: loss=0.014, reward_mean=0.0, reward_bound=0.0
9653: loss=0.030, reward_mean=0.1, reward_bound=0.0
9654: loss=0.024, reward_mean=0.0, reward_bound=0.0
9655: loss=0

9794: loss=0.077, reward_mean=0.1, reward_bound=0.0
9795: loss=0.000, reward_mean=0.1, reward_bound=0.0
9796: loss=0.000, reward_mean=0.1, reward_bound=0.0
9797: loss=0.073, reward_mean=0.1, reward_bound=0.0
9798: loss=0.000, reward_mean=0.1, reward_bound=0.0
9799: loss=0.000, reward_mean=0.1, reward_bound=0.0
9800: loss=0.000, reward_mean=0.2, reward_bound=0.0
9801: loss=0.000, reward_mean=0.1, reward_bound=0.0
9802: loss=0.000, reward_mean=0.2, reward_bound=0.0
9803: loss=0.000, reward_mean=0.0, reward_bound=0.0
9804: loss=0.000, reward_mean=0.1, reward_bound=0.0
9805: loss=0.000, reward_mean=0.1, reward_bound=0.0
9806: loss=0.000, reward_mean=0.1, reward_bound=0.0
9807: loss=0.000, reward_mean=0.1, reward_bound=0.0
9808: loss=0.000, reward_mean=0.1, reward_bound=0.0
9809: loss=0.000, reward_mean=0.0, reward_bound=0.0
9810: loss=0.000, reward_mean=0.1, reward_bound=0.0
9811: loss=0.000, reward_mean=0.0, reward_bound=0.0
9812: loss=0.000, reward_mean=0.0, reward_bound=0.0
9813: loss=0

9955: loss=0.000, reward_mean=0.0, reward_bound=0.0
9956: loss=0.000, reward_mean=0.1, reward_bound=0.0
9957: loss=0.045, reward_mean=0.1, reward_bound=0.0
9958: loss=0.000, reward_mean=0.0, reward_bound=0.0
9959: loss=0.000, reward_mean=0.2, reward_bound=0.0
9960: loss=0.001, reward_mean=0.0, reward_bound=0.0
9961: loss=0.045, reward_mean=0.1, reward_bound=0.0
9962: loss=0.000, reward_mean=0.0, reward_bound=0.0
9963: loss=0.003, reward_mean=0.0, reward_bound=0.0
9964: loss=0.000, reward_mean=0.2, reward_bound=0.0
9965: loss=0.000, reward_mean=0.2, reward_bound=0.0
9966: loss=0.001, reward_mean=0.1, reward_bound=0.0
9967: loss=0.001, reward_mean=0.2, reward_bound=0.0
9968: loss=0.057, reward_mean=0.0, reward_bound=0.0
9969: loss=0.001, reward_mean=0.0, reward_bound=0.0
9970: loss=0.000, reward_mean=0.1, reward_bound=0.0
9971: loss=0.059, reward_mean=0.0, reward_bound=0.0
9972: loss=0.001, reward_mean=0.0, reward_bound=0.0
9973: loss=0.001, reward_mean=0.1, reward_bound=0.0
9974: loss=0

10111: loss=0.058, reward_mean=0.1, reward_bound=0.0
10112: loss=0.005, reward_mean=0.0, reward_bound=0.0
10113: loss=0.003, reward_mean=0.2, reward_bound=0.0
10114: loss=0.003, reward_mean=0.1, reward_bound=0.0
10115: loss=0.056, reward_mean=0.1, reward_bound=0.0
10116: loss=0.004, reward_mean=0.0, reward_bound=0.0
10117: loss=0.001, reward_mean=0.0, reward_bound=0.0
10118: loss=0.040, reward_mean=0.1, reward_bound=0.0
10119: loss=0.003, reward_mean=0.2, reward_bound=0.0
10120: loss=0.005, reward_mean=0.0, reward_bound=0.0
10121: loss=0.002, reward_mean=0.0, reward_bound=0.0
10122: loss=0.004, reward_mean=0.1, reward_bound=0.0
10123: loss=0.038, reward_mean=0.0, reward_bound=0.0
10124: loss=0.036, reward_mean=0.1, reward_bound=0.0
10125: loss=0.002, reward_mean=0.0, reward_bound=0.0
10126: loss=0.001, reward_mean=0.0, reward_bound=0.0
10127: loss=0.005, reward_mean=0.1, reward_bound=0.0
10128: loss=0.000, reward_mean=0.1, reward_bound=0.0
10129: loss=0.036, reward_mean=0.1, reward_bou

10268: loss=0.033, reward_mean=0.1, reward_bound=0.0
10269: loss=0.033, reward_mean=0.0, reward_bound=0.0
10270: loss=0.006, reward_mean=0.1, reward_bound=0.0
10271: loss=0.038, reward_mean=0.1, reward_bound=0.0
10272: loss=0.006, reward_mean=0.1, reward_bound=0.0
10273: loss=0.002, reward_mean=0.0, reward_bound=0.0
10274: loss=0.036, reward_mean=0.1, reward_bound=0.0
10275: loss=0.088, reward_mean=0.0, reward_bound=0.0
10276: loss=0.034, reward_mean=0.1, reward_bound=0.0
10277: loss=0.002, reward_mean=0.0, reward_bound=0.0
10278: loss=0.001, reward_mean=0.1, reward_bound=0.0
10279: loss=0.004, reward_mean=0.0, reward_bound=0.0
10280: loss=0.002, reward_mean=0.0, reward_bound=0.0
10281: loss=0.008, reward_mean=0.0, reward_bound=0.0
10282: loss=0.001, reward_mean=0.1, reward_bound=0.0
10283: loss=0.008, reward_mean=0.1, reward_bound=0.0
10284: loss=0.007, reward_mean=0.1, reward_bound=0.0
10285: loss=0.040, reward_mean=0.0, reward_bound=0.0
10286: loss=0.007, reward_mean=0.1, reward_bou

10429: loss=0.001, reward_mean=0.1, reward_bound=0.0
10430: loss=0.001, reward_mean=0.0, reward_bound=0.0
10431: loss=0.002, reward_mean=0.1, reward_bound=0.0
10432: loss=0.000, reward_mean=0.1, reward_bound=0.0
10433: loss=0.001, reward_mean=0.0, reward_bound=0.0
10434: loss=0.000, reward_mean=0.0, reward_bound=0.0
10435: loss=0.001, reward_mean=0.1, reward_bound=0.0
10436: loss=0.000, reward_mean=0.0, reward_bound=0.0
10437: loss=0.000, reward_mean=0.1, reward_bound=0.0
10438: loss=0.001, reward_mean=0.0, reward_bound=0.0
10439: loss=0.002, reward_mean=0.1, reward_bound=0.0
10440: loss=0.001, reward_mean=0.0, reward_bound=0.0
10441: loss=0.063, reward_mean=0.1, reward_bound=0.0
10442: loss=0.001, reward_mean=0.0, reward_bound=0.0
10443: loss=0.001, reward_mean=0.1, reward_bound=0.0
10444: loss=0.001, reward_mean=0.1, reward_bound=0.0
10445: loss=0.000, reward_mean=0.1, reward_bound=0.0
10446: loss=0.001, reward_mean=0.1, reward_bound=0.0
10447: loss=0.002, reward_mean=0.1, reward_bou

10586: loss=0.001, reward_mean=0.1, reward_bound=0.0
10587: loss=0.001, reward_mean=0.0, reward_bound=0.0
10588: loss=0.001, reward_mean=0.1, reward_bound=0.0
10589: loss=0.001, reward_mean=0.1, reward_bound=0.0
10590: loss=0.000, reward_mean=0.1, reward_bound=0.0
10591: loss=0.002, reward_mean=0.1, reward_bound=0.0
10592: loss=0.002, reward_mean=0.0, reward_bound=0.0
10593: loss=0.002, reward_mean=0.1, reward_bound=0.0
10594: loss=0.003, reward_mean=0.1, reward_bound=0.0
10595: loss=0.000, reward_mean=0.0, reward_bound=0.0
10596: loss=0.001, reward_mean=0.0, reward_bound=0.0
10597: loss=0.001, reward_mean=0.1, reward_bound=0.0
10598: loss=0.001, reward_mean=0.1, reward_bound=0.0
10599: loss=0.001, reward_mean=0.0, reward_bound=0.0
10600: loss=0.000, reward_mean=0.0, reward_bound=0.0
10601: loss=0.002, reward_mean=0.1, reward_bound=0.0
10602: loss=0.001, reward_mean=0.1, reward_bound=0.0
10603: loss=0.000, reward_mean=0.0, reward_bound=0.0
10604: loss=0.000, reward_mean=0.0, reward_bou

10741: loss=0.000, reward_mean=0.0, reward_bound=0.0
10742: loss=0.001, reward_mean=0.0, reward_bound=0.0
10743: loss=0.000, reward_mean=0.1, reward_bound=0.0
10744: loss=0.000, reward_mean=0.1, reward_bound=0.0
10745: loss=0.000, reward_mean=0.1, reward_bound=0.0
10746: loss=0.000, reward_mean=0.1, reward_bound=0.0
10747: loss=0.001, reward_mean=0.0, reward_bound=0.0
10748: loss=0.000, reward_mean=0.1, reward_bound=0.0
10749: loss=0.001, reward_mean=0.2, reward_bound=0.0
10750: loss=0.001, reward_mean=0.0, reward_bound=0.0
10751: loss=0.000, reward_mean=0.1, reward_bound=0.0
10752: loss=0.000, reward_mean=0.0, reward_bound=0.0
10753: loss=0.000, reward_mean=0.1, reward_bound=0.0
10754: loss=0.001, reward_mean=0.1, reward_bound=0.0
10755: loss=0.000, reward_mean=0.1, reward_bound=0.0
10756: loss=0.000, reward_mean=0.0, reward_bound=0.0
10757: loss=0.001, reward_mean=0.1, reward_bound=0.0
10758: loss=0.080, reward_mean=0.1, reward_bound=0.0
10759: loss=0.001, reward_mean=0.0, reward_bou

10899: loss=0.000, reward_mean=0.1, reward_bound=0.0
10900: loss=0.000, reward_mean=0.0, reward_bound=0.0
10901: loss=0.000, reward_mean=0.1, reward_bound=0.0
10902: loss=0.000, reward_mean=0.0, reward_bound=0.0
10903: loss=0.000, reward_mean=0.1, reward_bound=0.0
10904: loss=0.000, reward_mean=0.1, reward_bound=0.0
10905: loss=0.000, reward_mean=0.1, reward_bound=0.0
10906: loss=0.000, reward_mean=0.1, reward_bound=0.0
10907: loss=0.000, reward_mean=0.1, reward_bound=0.0
10908: loss=0.000, reward_mean=0.1, reward_bound=0.0
10909: loss=0.000, reward_mean=0.2, reward_bound=0.0
10910: loss=0.000, reward_mean=0.0, reward_bound=0.0
10911: loss=0.000, reward_mean=0.1, reward_bound=0.0
10912: loss=0.000, reward_mean=0.1, reward_bound=0.0
10913: loss=0.000, reward_mean=0.1, reward_bound=0.0
10914: loss=0.000, reward_mean=0.0, reward_bound=0.0
10915: loss=0.000, reward_mean=0.0, reward_bound=0.0
10916: loss=0.000, reward_mean=0.1, reward_bound=0.0
10917: loss=0.000, reward_mean=0.0, reward_bou

11058: loss=0.000, reward_mean=0.1, reward_bound=0.0
11059: loss=0.000, reward_mean=0.0, reward_bound=0.0
11060: loss=0.000, reward_mean=0.1, reward_bound=0.0
11061: loss=0.000, reward_mean=0.1, reward_bound=0.0
11062: loss=0.000, reward_mean=0.1, reward_bound=0.0
11063: loss=0.000, reward_mean=0.1, reward_bound=0.0
11064: loss=0.000, reward_mean=0.1, reward_bound=0.0
11065: loss=0.000, reward_mean=0.0, reward_bound=0.0
11066: loss=0.000, reward_mean=0.1, reward_bound=0.0
11067: loss=0.000, reward_mean=0.1, reward_bound=0.0
11068: loss=0.000, reward_mean=0.1, reward_bound=0.0
11069: loss=0.000, reward_mean=0.1, reward_bound=0.0
11070: loss=0.000, reward_mean=0.0, reward_bound=0.0
11071: loss=0.000, reward_mean=0.2, reward_bound=0.0
11072: loss=0.000, reward_mean=0.1, reward_bound=0.0
11073: loss=0.000, reward_mean=0.1, reward_bound=0.0
11074: loss=0.000, reward_mean=0.1, reward_bound=0.0
11075: loss=0.000, reward_mean=0.1, reward_bound=0.0
11076: loss=0.000, reward_mean=0.0, reward_bou

11216: loss=0.000, reward_mean=0.0, reward_bound=0.0
11217: loss=0.000, reward_mean=0.0, reward_bound=0.0
11218: loss=0.000, reward_mean=0.1, reward_bound=0.0
11219: loss=0.000, reward_mean=0.1, reward_bound=0.0
11220: loss=0.000, reward_mean=0.0, reward_bound=0.0
11221: loss=0.000, reward_mean=0.0, reward_bound=0.0
11222: loss=0.000, reward_mean=0.0, reward_bound=0.0
11223: loss=0.000, reward_mean=0.1, reward_bound=0.0
11224: loss=0.000, reward_mean=0.1, reward_bound=0.0
11225: loss=0.000, reward_mean=0.0, reward_bound=0.0
11226: loss=0.000, reward_mean=0.1, reward_bound=0.0
11227: loss=0.000, reward_mean=0.0, reward_bound=0.0
11228: loss=0.000, reward_mean=0.1, reward_bound=0.0
11229: loss=0.000, reward_mean=0.1, reward_bound=0.0
11230: loss=0.000, reward_mean=0.1, reward_bound=0.0
11231: loss=0.000, reward_mean=0.1, reward_bound=0.0
11232: loss=0.000, reward_mean=0.1, reward_bound=0.0
11233: loss=0.000, reward_mean=0.1, reward_bound=0.0
11234: loss=0.000, reward_mean=0.0, reward_bou

11374: loss=0.000, reward_mean=0.1, reward_bound=0.0
11375: loss=0.000, reward_mean=0.0, reward_bound=0.0
11376: loss=0.000, reward_mean=0.0, reward_bound=0.0
11377: loss=0.000, reward_mean=0.1, reward_bound=0.0
11378: loss=0.000, reward_mean=0.1, reward_bound=0.0
11379: loss=0.000, reward_mean=0.1, reward_bound=0.0
11380: loss=0.000, reward_mean=0.1, reward_bound=0.0
11381: loss=0.000, reward_mean=0.1, reward_bound=0.0
11382: loss=0.000, reward_mean=0.1, reward_bound=0.0
11383: loss=0.000, reward_mean=0.1, reward_bound=0.0
11384: loss=0.000, reward_mean=0.1, reward_bound=0.0
11385: loss=0.000, reward_mean=0.1, reward_bound=0.0
11386: loss=0.000, reward_mean=0.1, reward_bound=0.0
11387: loss=0.000, reward_mean=0.1, reward_bound=0.0
11388: loss=0.000, reward_mean=0.1, reward_bound=0.0
11389: loss=0.000, reward_mean=0.0, reward_bound=0.0
11390: loss=0.000, reward_mean=0.0, reward_bound=0.0
11391: loss=0.000, reward_mean=0.0, reward_bound=0.0
11392: loss=0.000, reward_mean=0.0, reward_bou

11531: loss=0.000, reward_mean=0.0, reward_bound=0.0
11532: loss=0.000, reward_mean=0.0, reward_bound=0.0
11533: loss=0.000, reward_mean=0.0, reward_bound=0.0
11534: loss=0.000, reward_mean=0.1, reward_bound=0.0
11535: loss=0.000, reward_mean=0.0, reward_bound=0.0
11536: loss=0.000, reward_mean=0.1, reward_bound=0.0
11537: loss=0.000, reward_mean=0.1, reward_bound=0.0
11538: loss=0.000, reward_mean=0.0, reward_bound=0.0
11539: loss=0.000, reward_mean=0.0, reward_bound=0.0
11540: loss=0.000, reward_mean=0.0, reward_bound=0.0
11541: loss=0.000, reward_mean=0.1, reward_bound=0.0
11542: loss=0.000, reward_mean=0.1, reward_bound=0.0
11543: loss=0.000, reward_mean=0.2, reward_bound=0.0
11544: loss=0.000, reward_mean=0.1, reward_bound=0.0
11545: loss=0.000, reward_mean=0.1, reward_bound=0.0
11546: loss=0.000, reward_mean=0.1, reward_bound=0.0
11547: loss=0.000, reward_mean=0.0, reward_bound=0.0
11548: loss=0.000, reward_mean=0.1, reward_bound=0.0
11549: loss=0.000, reward_mean=0.1, reward_bou

11688: loss=0.000, reward_mean=0.1, reward_bound=0.0
11689: loss=0.000, reward_mean=0.0, reward_bound=0.0
11690: loss=0.000, reward_mean=0.1, reward_bound=0.0
11691: loss=0.000, reward_mean=0.1, reward_bound=0.0
11692: loss=0.000, reward_mean=0.1, reward_bound=0.0
11693: loss=0.000, reward_mean=0.0, reward_bound=0.0
11694: loss=0.000, reward_mean=0.0, reward_bound=0.0
11695: loss=0.000, reward_mean=0.1, reward_bound=0.0
11696: loss=0.000, reward_mean=0.1, reward_bound=0.0
11697: loss=0.000, reward_mean=0.1, reward_bound=0.0
11698: loss=0.000, reward_mean=0.1, reward_bound=0.0
11699: loss=0.000, reward_mean=0.1, reward_bound=0.0
11700: loss=0.000, reward_mean=0.1, reward_bound=0.0
11701: loss=0.000, reward_mean=0.1, reward_bound=0.0
11702: loss=0.000, reward_mean=0.1, reward_bound=0.0
11703: loss=0.000, reward_mean=0.1, reward_bound=0.0
11704: loss=0.000, reward_mean=0.1, reward_bound=0.0
11705: loss=0.000, reward_mean=0.1, reward_bound=0.0
11706: loss=0.000, reward_mean=0.1, reward_bou

11849: loss=0.000, reward_mean=0.1, reward_bound=0.0
11850: loss=0.000, reward_mean=0.1, reward_bound=0.0
11851: loss=0.000, reward_mean=0.1, reward_bound=0.0
11852: loss=0.000, reward_mean=0.0, reward_bound=0.0
11853: loss=0.000, reward_mean=0.1, reward_bound=0.0
11854: loss=0.000, reward_mean=0.0, reward_bound=0.0
11855: loss=0.000, reward_mean=0.1, reward_bound=0.0
11856: loss=0.000, reward_mean=0.1, reward_bound=0.0
11857: loss=0.000, reward_mean=0.0, reward_bound=0.0
11858: loss=0.000, reward_mean=0.1, reward_bound=0.0
11859: loss=0.000, reward_mean=0.0, reward_bound=0.0
11860: loss=0.000, reward_mean=0.1, reward_bound=0.0
11861: loss=0.000, reward_mean=0.1, reward_bound=0.0
11862: loss=0.000, reward_mean=0.1, reward_bound=0.0
11863: loss=0.000, reward_mean=0.1, reward_bound=0.0
11864: loss=0.000, reward_mean=0.0, reward_bound=0.0
11865: loss=0.000, reward_mean=0.0, reward_bound=0.0
11866: loss=0.000, reward_mean=0.0, reward_bound=0.0
11867: loss=0.000, reward_mean=0.1, reward_bou

12008: loss=0.000, reward_mean=0.1, reward_bound=0.0
12009: loss=0.000, reward_mean=0.0, reward_bound=0.0
12010: loss=0.000, reward_mean=0.0, reward_bound=0.0
12011: loss=0.000, reward_mean=0.0, reward_bound=0.0
12012: loss=0.000, reward_mean=0.0, reward_bound=0.0
12013: loss=0.000, reward_mean=0.1, reward_bound=0.0
12014: loss=0.000, reward_mean=0.1, reward_bound=0.0
12015: loss=0.000, reward_mean=0.1, reward_bound=0.0
12016: loss=0.000, reward_mean=0.0, reward_bound=0.0
12017: loss=0.000, reward_mean=0.1, reward_bound=0.0
12018: loss=0.000, reward_mean=0.0, reward_bound=0.0
12019: loss=0.000, reward_mean=0.0, reward_bound=0.0
12020: loss=0.000, reward_mean=0.2, reward_bound=0.0
12021: loss=0.000, reward_mean=0.1, reward_bound=0.0
12022: loss=0.000, reward_mean=0.1, reward_bound=0.0
12023: loss=0.000, reward_mean=0.0, reward_bound=0.0
12024: loss=0.000, reward_mean=0.1, reward_bound=0.0
12025: loss=0.000, reward_mean=0.0, reward_bound=0.0
12026: loss=0.000, reward_mean=0.1, reward_bou

12164: loss=0.000, reward_mean=0.1, reward_bound=0.0
12165: loss=0.000, reward_mean=0.1, reward_bound=0.0
12166: loss=0.000, reward_mean=0.0, reward_bound=0.0
12167: loss=0.000, reward_mean=0.0, reward_bound=0.0
12168: loss=0.000, reward_mean=0.0, reward_bound=0.0
12169: loss=0.000, reward_mean=0.0, reward_bound=0.0
12170: loss=0.000, reward_mean=0.0, reward_bound=0.0
12171: loss=0.000, reward_mean=0.0, reward_bound=0.0
12172: loss=0.000, reward_mean=0.0, reward_bound=0.0
12173: loss=0.000, reward_mean=0.0, reward_bound=0.0
12174: loss=0.000, reward_mean=0.1, reward_bound=0.0
12175: loss=0.000, reward_mean=0.0, reward_bound=0.0
12176: loss=0.000, reward_mean=0.1, reward_bound=0.0
12177: loss=0.000, reward_mean=0.1, reward_bound=0.0
12178: loss=0.000, reward_mean=0.0, reward_bound=0.0
12179: loss=0.000, reward_mean=0.1, reward_bound=0.0
12180: loss=0.000, reward_mean=0.1, reward_bound=0.0
12181: loss=0.000, reward_mean=0.0, reward_bound=0.0
12182: loss=0.000, reward_mean=0.0, reward_bou

12319: loss=0.000, reward_mean=0.1, reward_bound=0.0
12320: loss=0.000, reward_mean=0.1, reward_bound=0.0
12321: loss=0.000, reward_mean=0.1, reward_bound=0.0
12322: loss=0.000, reward_mean=0.0, reward_bound=0.0
12323: loss=0.000, reward_mean=0.1, reward_bound=0.0
12324: loss=0.000, reward_mean=0.0, reward_bound=0.0
12325: loss=0.000, reward_mean=0.1, reward_bound=0.0
12326: loss=0.000, reward_mean=0.0, reward_bound=0.0
12327: loss=0.000, reward_mean=0.1, reward_bound=0.0
12328: loss=0.000, reward_mean=0.1, reward_bound=0.0
12329: loss=0.000, reward_mean=0.2, reward_bound=0.0
12330: loss=0.000, reward_mean=0.0, reward_bound=0.0
12331: loss=0.000, reward_mean=0.1, reward_bound=0.0
12332: loss=0.000, reward_mean=0.1, reward_bound=0.0
12333: loss=0.000, reward_mean=0.1, reward_bound=0.0
12334: loss=0.000, reward_mean=0.0, reward_bound=0.0
12335: loss=0.000, reward_mean=0.1, reward_bound=0.0
12336: loss=0.000, reward_mean=0.0, reward_bound=0.0
12337: loss=0.000, reward_mean=0.0, reward_bou

12480: loss=0.000, reward_mean=0.1, reward_bound=0.0
12481: loss=0.000, reward_mean=0.1, reward_bound=0.0
12482: loss=0.000, reward_mean=0.1, reward_bound=0.0
12483: loss=0.000, reward_mean=0.0, reward_bound=0.0
12484: loss=0.000, reward_mean=0.0, reward_bound=0.0
12485: loss=0.000, reward_mean=0.1, reward_bound=0.0
12486: loss=0.000, reward_mean=0.0, reward_bound=0.0
12487: loss=0.000, reward_mean=0.0, reward_bound=0.0
12488: loss=0.000, reward_mean=0.1, reward_bound=0.0
12489: loss=0.000, reward_mean=0.1, reward_bound=0.0
12490: loss=0.000, reward_mean=0.1, reward_bound=0.0
12491: loss=0.000, reward_mean=0.0, reward_bound=0.0
12492: loss=0.000, reward_mean=0.1, reward_bound=0.0
12493: loss=0.000, reward_mean=0.0, reward_bound=0.0
12494: loss=0.000, reward_mean=0.0, reward_bound=0.0
12495: loss=0.000, reward_mean=0.1, reward_bound=0.0
12496: loss=0.000, reward_mean=0.0, reward_bound=0.0
12497: loss=0.000, reward_mean=0.0, reward_bound=0.0
12498: loss=0.000, reward_mean=0.1, reward_bou

12636: loss=0.000, reward_mean=0.0, reward_bound=0.0
12637: loss=0.000, reward_mean=0.1, reward_bound=0.0
12638: loss=0.000, reward_mean=0.0, reward_bound=0.0
12639: loss=0.000, reward_mean=0.0, reward_bound=0.0
12640: loss=0.000, reward_mean=0.0, reward_bound=0.0
12641: loss=0.000, reward_mean=0.1, reward_bound=0.0
12642: loss=0.000, reward_mean=0.1, reward_bound=0.0
12643: loss=0.000, reward_mean=0.0, reward_bound=0.0
12644: loss=0.000, reward_mean=0.0, reward_bound=0.0
12645: loss=0.000, reward_mean=0.0, reward_bound=0.0
12646: loss=0.000, reward_mean=0.1, reward_bound=0.0
12647: loss=0.000, reward_mean=0.0, reward_bound=0.0
12648: loss=0.000, reward_mean=0.1, reward_bound=0.0
12649: loss=0.000, reward_mean=0.1, reward_bound=0.0
12650: loss=0.000, reward_mean=0.1, reward_bound=0.0
12651: loss=0.000, reward_mean=0.0, reward_bound=0.0
12652: loss=0.000, reward_mean=0.0, reward_bound=0.0
12653: loss=0.000, reward_mean=0.0, reward_bound=0.0
12654: loss=0.000, reward_mean=0.1, reward_bou

12796: loss=0.000, reward_mean=0.1, reward_bound=0.0
12797: loss=0.000, reward_mean=0.2, reward_bound=0.0
12798: loss=0.000, reward_mean=0.0, reward_bound=0.0
12799: loss=0.000, reward_mean=0.1, reward_bound=0.0
12800: loss=0.000, reward_mean=0.1, reward_bound=0.0
12801: loss=0.000, reward_mean=0.0, reward_bound=0.0
12802: loss=0.000, reward_mean=0.1, reward_bound=0.0
12803: loss=0.000, reward_mean=0.1, reward_bound=0.0
12804: loss=0.000, reward_mean=0.0, reward_bound=0.0
12805: loss=0.000, reward_mean=0.1, reward_bound=0.0
12806: loss=0.000, reward_mean=0.0, reward_bound=0.0
12807: loss=0.000, reward_mean=0.1, reward_bound=0.0
12808: loss=0.000, reward_mean=0.1, reward_bound=0.0
12809: loss=0.000, reward_mean=0.2, reward_bound=0.0
12810: loss=0.000, reward_mean=0.1, reward_bound=0.0
12811: loss=0.000, reward_mean=0.0, reward_bound=0.0
12812: loss=0.000, reward_mean=0.0, reward_bound=0.0
12813: loss=0.000, reward_mean=0.1, reward_bound=0.0
12814: loss=0.000, reward_mean=0.0, reward_bou

12953: loss=0.000, reward_mean=0.1, reward_bound=0.0
12954: loss=0.000, reward_mean=0.0, reward_bound=0.0
12955: loss=0.000, reward_mean=0.1, reward_bound=0.0
12956: loss=0.000, reward_mean=0.1, reward_bound=0.0
12957: loss=0.000, reward_mean=0.1, reward_bound=0.0
12958: loss=0.000, reward_mean=0.0, reward_bound=0.0
12959: loss=0.000, reward_mean=0.1, reward_bound=0.0
12960: loss=0.000, reward_mean=0.1, reward_bound=0.0
12961: loss=0.000, reward_mean=0.0, reward_bound=0.0
12962: loss=0.000, reward_mean=0.1, reward_bound=0.0
12963: loss=0.000, reward_mean=0.2, reward_bound=0.0
12964: loss=0.000, reward_mean=0.1, reward_bound=0.0
12965: loss=0.000, reward_mean=0.0, reward_bound=0.0
12966: loss=0.000, reward_mean=0.1, reward_bound=0.0
12967: loss=0.000, reward_mean=0.0, reward_bound=0.0
12968: loss=0.000, reward_mean=0.1, reward_bound=0.0
12969: loss=0.000, reward_mean=0.1, reward_bound=0.0
12970: loss=0.000, reward_mean=0.1, reward_bound=0.0
12971: loss=0.000, reward_mean=0.1, reward_bou

13111: loss=0.000, reward_mean=0.0, reward_bound=0.0
13112: loss=0.000, reward_mean=0.0, reward_bound=0.0
13113: loss=0.000, reward_mean=0.1, reward_bound=0.0
13114: loss=0.000, reward_mean=0.1, reward_bound=0.0
13115: loss=0.000, reward_mean=0.0, reward_bound=0.0
13116: loss=0.000, reward_mean=0.1, reward_bound=0.0
13117: loss=0.000, reward_mean=0.1, reward_bound=0.0
13118: loss=0.000, reward_mean=0.0, reward_bound=0.0
13119: loss=0.000, reward_mean=0.1, reward_bound=0.0
13120: loss=0.000, reward_mean=0.1, reward_bound=0.0
13121: loss=0.000, reward_mean=0.0, reward_bound=0.0
13122: loss=0.000, reward_mean=0.1, reward_bound=0.0
13123: loss=0.000, reward_mean=0.0, reward_bound=0.0
13124: loss=0.000, reward_mean=0.1, reward_bound=0.0
13125: loss=0.000, reward_mean=0.0, reward_bound=0.0
13126: loss=0.000, reward_mean=0.0, reward_bound=0.0
13127: loss=0.000, reward_mean=0.1, reward_bound=0.0
13128: loss=0.000, reward_mean=0.1, reward_bound=0.0
13129: loss=0.000, reward_mean=0.0, reward_bou

13266: loss=0.000, reward_mean=0.0, reward_bound=0.0
13267: loss=0.000, reward_mean=0.1, reward_bound=0.0
13268: loss=0.000, reward_mean=0.0, reward_bound=0.0
13269: loss=0.000, reward_mean=0.1, reward_bound=0.0
13270: loss=0.000, reward_mean=0.0, reward_bound=0.0
13271: loss=0.000, reward_mean=0.0, reward_bound=0.0
13272: loss=0.000, reward_mean=0.1, reward_bound=0.0
13273: loss=0.000, reward_mean=0.1, reward_bound=0.0
13274: loss=0.000, reward_mean=0.0, reward_bound=0.0
13275: loss=0.000, reward_mean=0.1, reward_bound=0.0
13276: loss=0.000, reward_mean=0.1, reward_bound=0.0
13277: loss=0.000, reward_mean=0.1, reward_bound=0.0
13278: loss=0.000, reward_mean=0.1, reward_bound=0.0
13279: loss=0.000, reward_mean=0.1, reward_bound=0.0
13280: loss=0.000, reward_mean=0.1, reward_bound=0.0
13281: loss=0.000, reward_mean=0.1, reward_bound=0.0
13282: loss=0.000, reward_mean=0.0, reward_bound=0.0
13283: loss=0.000, reward_mean=0.1, reward_bound=0.0
13284: loss=0.000, reward_mean=0.0, reward_bou

13422: loss=0.000, reward_mean=0.0, reward_bound=0.0
13423: loss=0.000, reward_mean=0.1, reward_bound=0.0
13424: loss=0.000, reward_mean=0.0, reward_bound=0.0
13425: loss=0.000, reward_mean=0.1, reward_bound=0.0
13426: loss=0.000, reward_mean=0.1, reward_bound=0.0
13427: loss=0.000, reward_mean=0.0, reward_bound=0.0
13428: loss=0.000, reward_mean=0.1, reward_bound=0.0
13429: loss=0.000, reward_mean=0.0, reward_bound=0.0
13430: loss=0.000, reward_mean=0.2, reward_bound=0.0
13431: loss=0.000, reward_mean=0.1, reward_bound=0.0
13432: loss=0.000, reward_mean=0.1, reward_bound=0.0
13433: loss=0.000, reward_mean=0.1, reward_bound=0.0
13434: loss=0.000, reward_mean=0.1, reward_bound=0.0
13435: loss=0.000, reward_mean=0.1, reward_bound=0.0
13436: loss=0.000, reward_mean=0.1, reward_bound=0.0
13437: loss=0.000, reward_mean=0.1, reward_bound=0.0
13438: loss=0.000, reward_mean=0.1, reward_bound=0.0
13439: loss=0.000, reward_mean=0.0, reward_bound=0.0
13440: loss=0.000, reward_mean=0.1, reward_bou

13580: loss=0.000, reward_mean=0.1, reward_bound=0.0
13581: loss=0.000, reward_mean=0.1, reward_bound=0.0
13582: loss=0.000, reward_mean=0.1, reward_bound=0.0
13583: loss=0.000, reward_mean=0.1, reward_bound=0.0
13584: loss=0.000, reward_mean=0.0, reward_bound=0.0
13585: loss=0.000, reward_mean=0.1, reward_bound=0.0
13586: loss=0.000, reward_mean=0.0, reward_bound=0.0
13587: loss=0.000, reward_mean=0.1, reward_bound=0.0
13588: loss=0.000, reward_mean=0.0, reward_bound=0.0
13589: loss=0.000, reward_mean=0.1, reward_bound=0.0
13590: loss=0.000, reward_mean=0.1, reward_bound=0.0
13591: loss=0.000, reward_mean=0.1, reward_bound=0.0
13592: loss=0.000, reward_mean=0.0, reward_bound=0.0
13593: loss=0.000, reward_mean=0.0, reward_bound=0.0
13594: loss=0.000, reward_mean=0.0, reward_bound=0.0
13595: loss=0.000, reward_mean=0.0, reward_bound=0.0
13596: loss=0.000, reward_mean=0.0, reward_bound=0.0
13597: loss=0.000, reward_mean=0.0, reward_bound=0.0
13598: loss=0.000, reward_mean=0.0, reward_bou

13736: loss=0.000, reward_mean=0.0, reward_bound=0.0
13737: loss=0.000, reward_mean=0.1, reward_bound=0.0
13738: loss=0.000, reward_mean=0.1, reward_bound=0.0
13739: loss=0.000, reward_mean=0.0, reward_bound=0.0
13740: loss=0.000, reward_mean=0.1, reward_bound=0.0
13741: loss=0.000, reward_mean=0.0, reward_bound=0.0
13742: loss=0.000, reward_mean=0.1, reward_bound=0.0
13743: loss=0.000, reward_mean=0.1, reward_bound=0.0
13744: loss=0.000, reward_mean=0.0, reward_bound=0.0
13745: loss=0.000, reward_mean=0.0, reward_bound=0.0
13746: loss=0.000, reward_mean=0.0, reward_bound=0.0
13747: loss=0.000, reward_mean=0.2, reward_bound=0.0
13748: loss=0.000, reward_mean=0.1, reward_bound=0.0
13749: loss=0.000, reward_mean=0.1, reward_bound=0.0
13750: loss=0.000, reward_mean=0.1, reward_bound=0.0
13751: loss=0.000, reward_mean=0.1, reward_bound=0.0
13752: loss=0.000, reward_mean=0.0, reward_bound=0.0
13753: loss=0.000, reward_mean=0.0, reward_bound=0.0
13754: loss=0.000, reward_mean=0.1, reward_bou

13893: loss=0.000, reward_mean=0.2, reward_bound=0.0
13894: loss=0.000, reward_mean=0.1, reward_bound=0.0
13895: loss=0.000, reward_mean=0.1, reward_bound=0.0
13896: loss=0.000, reward_mean=0.0, reward_bound=0.0
13897: loss=0.000, reward_mean=0.0, reward_bound=0.0
13898: loss=0.000, reward_mean=0.1, reward_bound=0.0
13899: loss=0.000, reward_mean=0.1, reward_bound=0.0
13900: loss=0.000, reward_mean=0.1, reward_bound=0.0
13901: loss=0.000, reward_mean=0.1, reward_bound=0.0
13902: loss=0.000, reward_mean=0.0, reward_bound=0.0
13903: loss=0.000, reward_mean=0.1, reward_bound=0.0
13904: loss=0.000, reward_mean=0.1, reward_bound=0.0
13905: loss=0.000, reward_mean=0.0, reward_bound=0.0
13906: loss=0.000, reward_mean=0.0, reward_bound=0.0
13907: loss=0.000, reward_mean=0.0, reward_bound=0.0
13908: loss=0.000, reward_mean=0.1, reward_bound=0.0
13909: loss=0.000, reward_mean=0.0, reward_bound=0.0
13910: loss=0.000, reward_mean=0.1, reward_bound=0.0
13911: loss=0.000, reward_mean=0.1, reward_bou

14047: loss=0.000, reward_mean=0.0, reward_bound=0.0
14048: loss=0.000, reward_mean=0.1, reward_bound=0.0
14049: loss=0.000, reward_mean=0.0, reward_bound=0.0
14050: loss=0.000, reward_mean=0.1, reward_bound=0.0
14051: loss=0.000, reward_mean=0.0, reward_bound=0.0
14052: loss=0.000, reward_mean=0.1, reward_bound=0.0
14053: loss=0.000, reward_mean=0.1, reward_bound=0.0
14054: loss=0.000, reward_mean=0.1, reward_bound=0.0
14055: loss=0.000, reward_mean=0.1, reward_bound=0.0
14056: loss=0.000, reward_mean=0.1, reward_bound=0.0
14057: loss=0.000, reward_mean=0.1, reward_bound=0.0
14058: loss=0.000, reward_mean=0.1, reward_bound=0.0
14059: loss=0.000, reward_mean=0.0, reward_bound=0.0
14060: loss=0.000, reward_mean=0.0, reward_bound=0.0
14061: loss=0.000, reward_mean=0.1, reward_bound=0.0
14062: loss=0.000, reward_mean=0.0, reward_bound=0.0
14063: loss=0.000, reward_mean=0.1, reward_bound=0.0
14064: loss=0.000, reward_mean=0.1, reward_bound=0.0
14065: loss=0.000, reward_mean=0.0, reward_bou

14204: loss=0.000, reward_mean=0.0, reward_bound=0.0
14205: loss=0.000, reward_mean=0.1, reward_bound=0.0
14206: loss=0.000, reward_mean=0.0, reward_bound=0.0
14207: loss=0.000, reward_mean=0.0, reward_bound=0.0
14208: loss=0.000, reward_mean=0.0, reward_bound=0.0
14209: loss=0.000, reward_mean=0.0, reward_bound=0.0
14210: loss=0.000, reward_mean=0.0, reward_bound=0.0
14211: loss=0.000, reward_mean=0.2, reward_bound=0.0
14212: loss=0.000, reward_mean=0.1, reward_bound=0.0
14213: loss=0.000, reward_mean=0.1, reward_bound=0.0
14214: loss=0.000, reward_mean=0.1, reward_bound=0.0
14215: loss=0.000, reward_mean=0.1, reward_bound=0.0
14216: loss=0.000, reward_mean=0.1, reward_bound=0.0
14217: loss=0.000, reward_mean=0.0, reward_bound=0.0
14218: loss=0.000, reward_mean=0.0, reward_bound=0.0
14219: loss=0.000, reward_mean=0.1, reward_bound=0.0
14220: loss=0.000, reward_mean=0.1, reward_bound=0.0
14221: loss=0.000, reward_mean=0.1, reward_bound=0.0
14222: loss=0.000, reward_mean=0.1, reward_bou

14360: loss=0.000, reward_mean=0.0, reward_bound=0.0
14361: loss=0.000, reward_mean=0.1, reward_bound=0.0
14362: loss=0.000, reward_mean=0.1, reward_bound=0.0
14363: loss=0.000, reward_mean=0.0, reward_bound=0.0
14364: loss=0.000, reward_mean=0.0, reward_bound=0.0
14365: loss=0.000, reward_mean=0.0, reward_bound=0.0
14366: loss=0.000, reward_mean=0.1, reward_bound=0.0
14367: loss=0.000, reward_mean=0.0, reward_bound=0.0
14368: loss=0.000, reward_mean=0.0, reward_bound=0.0
14369: loss=0.000, reward_mean=0.0, reward_bound=0.0
14370: loss=0.000, reward_mean=0.1, reward_bound=0.0
14371: loss=0.000, reward_mean=0.0, reward_bound=0.0
14372: loss=0.000, reward_mean=0.1, reward_bound=0.0
14373: loss=0.000, reward_mean=0.2, reward_bound=0.0
14374: loss=0.000, reward_mean=0.1, reward_bound=0.0
14375: loss=0.000, reward_mean=0.1, reward_bound=0.0
14376: loss=0.000, reward_mean=0.0, reward_bound=0.0
14377: loss=0.000, reward_mean=0.1, reward_bound=0.0
14378: loss=0.000, reward_mean=0.2, reward_bou

14516: loss=0.000, reward_mean=0.0, reward_bound=0.0
14517: loss=0.000, reward_mean=0.0, reward_bound=0.0
14518: loss=0.000, reward_mean=0.0, reward_bound=0.0
14519: loss=0.000, reward_mean=0.0, reward_bound=0.0
14520: loss=0.000, reward_mean=0.1, reward_bound=0.0
14521: loss=0.000, reward_mean=0.2, reward_bound=0.0
14522: loss=0.000, reward_mean=0.0, reward_bound=0.0
14523: loss=0.000, reward_mean=0.1, reward_bound=0.0
14524: loss=0.000, reward_mean=0.0, reward_bound=0.0
14525: loss=0.000, reward_mean=0.1, reward_bound=0.0
14526: loss=0.000, reward_mean=0.0, reward_bound=0.0
14527: loss=0.000, reward_mean=0.0, reward_bound=0.0
14528: loss=0.000, reward_mean=0.0, reward_bound=0.0
14529: loss=0.000, reward_mean=0.1, reward_bound=0.0
14530: loss=0.000, reward_mean=0.0, reward_bound=0.0
14531: loss=0.000, reward_mean=0.1, reward_bound=0.0
14532: loss=0.000, reward_mean=0.0, reward_bound=0.0
14533: loss=0.000, reward_mean=0.0, reward_bound=0.0
14534: loss=0.000, reward_mean=0.0, reward_bou

14674: loss=0.000, reward_mean=0.1, reward_bound=0.0
14675: loss=0.000, reward_mean=0.0, reward_bound=0.0
14676: loss=0.000, reward_mean=0.0, reward_bound=0.0
14677: loss=0.000, reward_mean=0.1, reward_bound=0.0
14678: loss=0.000, reward_mean=0.1, reward_bound=0.0
14679: loss=0.000, reward_mean=0.0, reward_bound=0.0
14680: loss=0.000, reward_mean=0.1, reward_bound=0.0
14681: loss=0.000, reward_mean=0.0, reward_bound=0.0
14682: loss=0.000, reward_mean=0.1, reward_bound=0.0
14683: loss=0.000, reward_mean=0.0, reward_bound=0.0
14684: loss=0.000, reward_mean=0.1, reward_bound=0.0
14685: loss=0.000, reward_mean=0.0, reward_bound=0.0
14686: loss=0.000, reward_mean=0.0, reward_bound=0.0
14687: loss=0.000, reward_mean=0.1, reward_bound=0.0
14688: loss=0.000, reward_mean=0.1, reward_bound=0.0
14689: loss=0.000, reward_mean=0.0, reward_bound=0.0
14690: loss=0.000, reward_mean=0.1, reward_bound=0.0
14691: loss=0.000, reward_mean=0.0, reward_bound=0.0
14692: loss=0.000, reward_mean=0.0, reward_bou

14830: loss=0.000, reward_mean=0.1, reward_bound=0.0
14831: loss=0.000, reward_mean=0.1, reward_bound=0.0
14832: loss=0.000, reward_mean=0.0, reward_bound=0.0
14833: loss=0.000, reward_mean=0.0, reward_bound=0.0
14834: loss=0.000, reward_mean=0.1, reward_bound=0.0
14835: loss=0.000, reward_mean=0.1, reward_bound=0.0
14836: loss=0.000, reward_mean=0.0, reward_bound=0.0
14837: loss=0.000, reward_mean=0.1, reward_bound=0.0
14838: loss=0.000, reward_mean=0.0, reward_bound=0.0
14839: loss=0.000, reward_mean=0.0, reward_bound=0.0
14840: loss=0.000, reward_mean=0.0, reward_bound=0.0
14841: loss=0.000, reward_mean=0.0, reward_bound=0.0
14842: loss=0.000, reward_mean=0.0, reward_bound=0.0
14843: loss=0.000, reward_mean=0.1, reward_bound=0.0
14844: loss=0.000, reward_mean=0.1, reward_bound=0.0
14845: loss=0.000, reward_mean=0.0, reward_bound=0.0
14846: loss=0.000, reward_mean=0.1, reward_bound=0.0
14847: loss=0.000, reward_mean=0.0, reward_bound=0.0
14848: loss=0.000, reward_mean=0.1, reward_bou

14988: loss=0.000, reward_mean=0.0, reward_bound=0.0
14989: loss=0.000, reward_mean=0.0, reward_bound=0.0
14990: loss=0.000, reward_mean=0.1, reward_bound=0.0
14991: loss=0.000, reward_mean=0.1, reward_bound=0.0
14992: loss=0.000, reward_mean=0.1, reward_bound=0.0
14993: loss=0.000, reward_mean=0.0, reward_bound=0.0
14994: loss=0.000, reward_mean=0.2, reward_bound=0.0
14995: loss=0.000, reward_mean=0.1, reward_bound=0.0
14996: loss=0.000, reward_mean=0.1, reward_bound=0.0
14997: loss=0.000, reward_mean=0.0, reward_bound=0.0
14998: loss=0.000, reward_mean=0.1, reward_bound=0.0
14999: loss=0.000, reward_mean=0.1, reward_bound=0.0
15000: loss=0.000, reward_mean=0.1, reward_bound=0.0
15001: loss=0.000, reward_mean=0.0, reward_bound=0.0
15002: loss=0.000, reward_mean=0.0, reward_bound=0.0
15003: loss=0.000, reward_mean=0.1, reward_bound=0.0
15004: loss=0.000, reward_mean=0.0, reward_bound=0.0
15005: loss=0.000, reward_mean=0.1, reward_bound=0.0
15006: loss=0.000, reward_mean=0.1, reward_bou

15145: loss=0.000, reward_mean=0.0, reward_bound=0.0
15146: loss=0.000, reward_mean=0.0, reward_bound=0.0
15147: loss=0.000, reward_mean=0.1, reward_bound=0.0
15148: loss=0.000, reward_mean=0.1, reward_bound=0.0
15149: loss=0.000, reward_mean=0.1, reward_bound=0.0
15150: loss=0.000, reward_mean=0.0, reward_bound=0.0
15151: loss=0.000, reward_mean=0.0, reward_bound=0.0
15152: loss=0.000, reward_mean=0.0, reward_bound=0.0
15153: loss=0.000, reward_mean=0.0, reward_bound=0.0
15154: loss=0.000, reward_mean=0.1, reward_bound=0.0
15155: loss=0.000, reward_mean=0.2, reward_bound=0.0
15156: loss=0.000, reward_mean=0.1, reward_bound=0.0
15157: loss=0.000, reward_mean=0.1, reward_bound=0.0
15158: loss=0.000, reward_mean=0.1, reward_bound=0.0
15159: loss=0.000, reward_mean=0.1, reward_bound=0.0
15160: loss=0.000, reward_mean=0.0, reward_bound=0.0
15161: loss=0.000, reward_mean=0.1, reward_bound=0.0
15162: loss=0.000, reward_mean=0.2, reward_bound=0.0
15163: loss=0.000, reward_mean=0.0, reward_bou

15301: loss=0.000, reward_mean=0.1, reward_bound=0.0
15302: loss=0.000, reward_mean=0.0, reward_bound=0.0
15303: loss=0.000, reward_mean=0.0, reward_bound=0.0
15304: loss=0.000, reward_mean=0.1, reward_bound=0.0
15305: loss=0.000, reward_mean=0.1, reward_bound=0.0
15306: loss=0.000, reward_mean=0.0, reward_bound=0.0
15307: loss=0.000, reward_mean=0.1, reward_bound=0.0
15308: loss=0.000, reward_mean=0.1, reward_bound=0.0
15309: loss=0.000, reward_mean=0.1, reward_bound=0.0
15310: loss=0.000, reward_mean=0.2, reward_bound=0.0
15311: loss=0.000, reward_mean=0.0, reward_bound=0.0
15312: loss=0.000, reward_mean=0.1, reward_bound=0.0
15313: loss=0.000, reward_mean=0.1, reward_bound=0.0
15314: loss=0.000, reward_mean=0.1, reward_bound=0.0
15315: loss=0.000, reward_mean=0.2, reward_bound=0.0
15316: loss=0.000, reward_mean=0.1, reward_bound=0.0
15317: loss=0.000, reward_mean=0.0, reward_bound=0.0
15318: loss=0.000, reward_mean=0.0, reward_bound=0.0
15319: loss=0.000, reward_mean=0.0, reward_bou

15457: loss=0.000, reward_mean=0.1, reward_bound=0.0
15458: loss=0.000, reward_mean=0.1, reward_bound=0.0
15459: loss=0.000, reward_mean=0.0, reward_bound=0.0
15460: loss=0.000, reward_mean=0.0, reward_bound=0.0
15461: loss=0.000, reward_mean=0.0, reward_bound=0.0
15462: loss=0.000, reward_mean=0.0, reward_bound=0.0
15463: loss=0.000, reward_mean=0.0, reward_bound=0.0
15464: loss=0.000, reward_mean=0.0, reward_bound=0.0
15465: loss=0.000, reward_mean=0.1, reward_bound=0.0
15466: loss=0.000, reward_mean=0.1, reward_bound=0.0
15467: loss=0.000, reward_mean=0.1, reward_bound=0.0
15468: loss=0.000, reward_mean=0.0, reward_bound=0.0
15469: loss=0.000, reward_mean=0.0, reward_bound=0.0
15470: loss=0.000, reward_mean=0.1, reward_bound=0.0
15471: loss=0.000, reward_mean=0.1, reward_bound=0.0
15472: loss=0.000, reward_mean=0.1, reward_bound=0.0
15473: loss=0.000, reward_mean=0.1, reward_bound=0.0
15474: loss=0.000, reward_mean=0.0, reward_bound=0.0
15475: loss=0.000, reward_mean=0.1, reward_bou

15612: loss=0.000, reward_mean=0.1, reward_bound=0.0
15613: loss=0.000, reward_mean=0.1, reward_bound=0.0
15614: loss=0.000, reward_mean=0.1, reward_bound=0.0
15615: loss=0.000, reward_mean=0.1, reward_bound=0.0
15616: loss=0.000, reward_mean=0.1, reward_bound=0.0
15617: loss=0.000, reward_mean=0.2, reward_bound=0.0
15618: loss=0.000, reward_mean=0.0, reward_bound=0.0
15619: loss=0.000, reward_mean=0.1, reward_bound=0.0
15620: loss=0.000, reward_mean=0.1, reward_bound=0.0
15621: loss=0.000, reward_mean=0.0, reward_bound=0.0
15622: loss=0.000, reward_mean=0.0, reward_bound=0.0
15623: loss=0.000, reward_mean=0.0, reward_bound=0.0
15624: loss=0.000, reward_mean=0.0, reward_bound=0.0
15625: loss=0.000, reward_mean=0.0, reward_bound=0.0
15626: loss=0.000, reward_mean=0.0, reward_bound=0.0
15627: loss=0.000, reward_mean=0.0, reward_bound=0.0
15628: loss=0.000, reward_mean=0.0, reward_bound=0.0
15629: loss=0.000, reward_mean=0.0, reward_bound=0.0
15630: loss=0.000, reward_mean=0.0, reward_bou

15769: loss=0.000, reward_mean=0.0, reward_bound=0.0
15770: loss=0.000, reward_mean=0.1, reward_bound=0.0
15771: loss=0.000, reward_mean=0.1, reward_bound=0.0
15772: loss=0.000, reward_mean=0.1, reward_bound=0.0
15773: loss=0.000, reward_mean=0.1, reward_bound=0.0
15774: loss=0.000, reward_mean=0.1, reward_bound=0.0
15775: loss=0.000, reward_mean=0.0, reward_bound=0.0
15776: loss=0.000, reward_mean=0.0, reward_bound=0.0
15777: loss=0.000, reward_mean=0.1, reward_bound=0.0
15778: loss=0.000, reward_mean=0.1, reward_bound=0.0
15779: loss=0.000, reward_mean=0.1, reward_bound=0.0
15780: loss=0.000, reward_mean=0.0, reward_bound=0.0
15781: loss=0.000, reward_mean=0.1, reward_bound=0.0
15782: loss=0.000, reward_mean=0.1, reward_bound=0.0
15783: loss=0.000, reward_mean=0.0, reward_bound=0.0
15784: loss=0.000, reward_mean=0.0, reward_bound=0.0
15785: loss=0.000, reward_mean=0.1, reward_bound=0.0
15786: loss=0.000, reward_mean=0.0, reward_bound=0.0
15787: loss=0.000, reward_mean=0.0, reward_bou

15925: loss=0.000, reward_mean=0.0, reward_bound=0.0
15926: loss=0.000, reward_mean=0.1, reward_bound=0.0
15927: loss=0.000, reward_mean=0.2, reward_bound=0.0
15928: loss=0.000, reward_mean=0.2, reward_bound=0.0
15929: loss=0.000, reward_mean=0.1, reward_bound=0.0
15930: loss=0.000, reward_mean=0.0, reward_bound=0.0
15931: loss=0.000, reward_mean=0.1, reward_bound=0.0
15932: loss=0.000, reward_mean=0.0, reward_bound=0.0
15933: loss=0.000, reward_mean=0.1, reward_bound=0.0
15934: loss=0.000, reward_mean=0.0, reward_bound=0.0
15935: loss=0.000, reward_mean=0.0, reward_bound=0.0
15936: loss=0.000, reward_mean=0.1, reward_bound=0.0
15937: loss=0.000, reward_mean=0.1, reward_bound=0.0
15938: loss=0.000, reward_mean=0.0, reward_bound=0.0
15939: loss=0.000, reward_mean=0.1, reward_bound=0.0
15940: loss=0.000, reward_mean=0.1, reward_bound=0.0
15941: loss=0.000, reward_mean=0.0, reward_bound=0.0
15942: loss=0.000, reward_mean=0.0, reward_bound=0.0
15943: loss=0.000, reward_mean=0.1, reward_bou

16083: loss=0.000, reward_mean=0.0, reward_bound=0.0
16084: loss=0.000, reward_mean=0.0, reward_bound=0.0
16085: loss=0.000, reward_mean=0.1, reward_bound=0.0
16086: loss=0.000, reward_mean=0.1, reward_bound=0.0
16087: loss=0.000, reward_mean=0.1, reward_bound=0.0
16088: loss=0.000, reward_mean=0.2, reward_bound=0.0
16089: loss=0.000, reward_mean=0.1, reward_bound=0.0
16090: loss=0.000, reward_mean=0.2, reward_bound=0.0
16091: loss=0.000, reward_mean=0.0, reward_bound=0.0
16092: loss=0.000, reward_mean=0.1, reward_bound=0.0
16093: loss=0.000, reward_mean=0.0, reward_bound=0.0
16094: loss=0.000, reward_mean=0.0, reward_bound=0.0
16095: loss=0.000, reward_mean=0.0, reward_bound=0.0
16096: loss=0.000, reward_mean=0.1, reward_bound=0.0
16097: loss=0.000, reward_mean=0.0, reward_bound=0.0
16098: loss=0.000, reward_mean=0.2, reward_bound=0.0
16099: loss=0.000, reward_mean=0.0, reward_bound=0.0
16100: loss=0.000, reward_mean=0.0, reward_bound=0.0
16101: loss=0.000, reward_mean=0.1, reward_bou

16242: loss=0.000, reward_mean=0.0, reward_bound=0.0
16243: loss=0.000, reward_mean=0.1, reward_bound=0.0
16244: loss=0.000, reward_mean=0.1, reward_bound=0.0
16245: loss=0.000, reward_mean=0.1, reward_bound=0.0
16246: loss=0.000, reward_mean=0.1, reward_bound=0.0
16247: loss=0.000, reward_mean=0.0, reward_bound=0.0
16248: loss=0.000, reward_mean=0.0, reward_bound=0.0
16249: loss=0.000, reward_mean=0.0, reward_bound=0.0
16250: loss=0.000, reward_mean=0.1, reward_bound=0.0
16251: loss=0.000, reward_mean=0.0, reward_bound=0.0
16252: loss=0.000, reward_mean=0.0, reward_bound=0.0
16253: loss=0.000, reward_mean=0.0, reward_bound=0.0
16254: loss=0.000, reward_mean=0.1, reward_bound=0.0
16255: loss=0.000, reward_mean=0.0, reward_bound=0.0
16256: loss=0.000, reward_mean=0.1, reward_bound=0.0
16257: loss=0.000, reward_mean=0.1, reward_bound=0.0
16258: loss=0.000, reward_mean=0.1, reward_bound=0.0
16259: loss=0.000, reward_mean=0.0, reward_bound=0.0
16260: loss=0.000, reward_mean=0.1, reward_bou

16398: loss=0.000, reward_mean=0.0, reward_bound=0.0
16399: loss=0.000, reward_mean=0.0, reward_bound=0.0
16400: loss=0.000, reward_mean=0.0, reward_bound=0.0
16401: loss=0.000, reward_mean=0.1, reward_bound=0.0
16402: loss=0.000, reward_mean=0.0, reward_bound=0.0
16403: loss=0.000, reward_mean=0.0, reward_bound=0.0
16404: loss=0.000, reward_mean=0.0, reward_bound=0.0
16405: loss=0.000, reward_mean=0.1, reward_bound=0.0
16406: loss=0.000, reward_mean=0.0, reward_bound=0.0
16407: loss=0.000, reward_mean=0.0, reward_bound=0.0
16408: loss=0.000, reward_mean=0.0, reward_bound=0.0
16409: loss=0.000, reward_mean=0.0, reward_bound=0.0
16410: loss=0.000, reward_mean=0.1, reward_bound=0.0
16411: loss=0.000, reward_mean=0.1, reward_bound=0.0
16412: loss=0.000, reward_mean=0.1, reward_bound=0.0
16413: loss=0.000, reward_mean=0.1, reward_bound=0.0
16414: loss=0.000, reward_mean=0.0, reward_bound=0.0
16415: loss=0.000, reward_mean=0.1, reward_bound=0.0
16416: loss=0.000, reward_mean=0.1, reward_bou

16554: loss=0.000, reward_mean=0.1, reward_bound=0.0
16555: loss=0.000, reward_mean=0.1, reward_bound=0.0
16556: loss=0.000, reward_mean=0.1, reward_bound=0.0
16557: loss=0.000, reward_mean=0.1, reward_bound=0.0
16558: loss=0.000, reward_mean=0.0, reward_bound=0.0
16559: loss=0.000, reward_mean=0.1, reward_bound=0.0
16560: loss=0.000, reward_mean=0.0, reward_bound=0.0
16561: loss=0.000, reward_mean=0.0, reward_bound=0.0
16562: loss=0.000, reward_mean=0.1, reward_bound=0.0
16563: loss=0.000, reward_mean=0.1, reward_bound=0.0
16564: loss=0.000, reward_mean=0.0, reward_bound=0.0
16565: loss=0.000, reward_mean=0.1, reward_bound=0.0
16566: loss=0.000, reward_mean=0.1, reward_bound=0.0
16567: loss=0.000, reward_mean=0.0, reward_bound=0.0
16568: loss=0.000, reward_mean=0.0, reward_bound=0.0
16569: loss=0.000, reward_mean=0.0, reward_bound=0.0
16570: loss=0.000, reward_mean=0.0, reward_bound=0.0
16571: loss=0.000, reward_mean=0.0, reward_bound=0.0
16572: loss=0.000, reward_mean=0.0, reward_bou

16711: loss=0.000, reward_mean=0.0, reward_bound=0.0
16712: loss=0.000, reward_mean=0.0, reward_bound=0.0
16713: loss=0.000, reward_mean=0.0, reward_bound=0.0
16714: loss=0.000, reward_mean=0.0, reward_bound=0.0
16715: loss=0.000, reward_mean=0.0, reward_bound=0.0
16716: loss=0.000, reward_mean=0.1, reward_bound=0.0
16717: loss=0.000, reward_mean=0.1, reward_bound=0.0
16718: loss=0.000, reward_mean=0.1, reward_bound=0.0
16719: loss=0.000, reward_mean=0.0, reward_bound=0.0
16720: loss=0.000, reward_mean=0.0, reward_bound=0.0
16721: loss=0.000, reward_mean=0.1, reward_bound=0.0
16722: loss=0.000, reward_mean=0.2, reward_bound=0.0
16723: loss=0.000, reward_mean=0.0, reward_bound=0.0
16724: loss=0.000, reward_mean=0.1, reward_bound=0.0
16725: loss=0.000, reward_mean=0.1, reward_bound=0.0
16726: loss=0.000, reward_mean=0.0, reward_bound=0.0
16727: loss=0.000, reward_mean=0.1, reward_bound=0.0
16728: loss=0.000, reward_mean=0.1, reward_bound=0.0
16729: loss=0.000, reward_mean=0.1, reward_bou

16866: loss=0.000, reward_mean=0.0, reward_bound=0.0
16867: loss=0.000, reward_mean=0.1, reward_bound=0.0
16868: loss=0.000, reward_mean=0.0, reward_bound=0.0
16869: loss=0.000, reward_mean=0.1, reward_bound=0.0
16870: loss=0.000, reward_mean=0.0, reward_bound=0.0
16871: loss=0.000, reward_mean=0.0, reward_bound=0.0
16872: loss=0.000, reward_mean=0.1, reward_bound=0.0
16873: loss=0.000, reward_mean=0.1, reward_bound=0.0
16874: loss=0.000, reward_mean=0.1, reward_bound=0.0
16875: loss=0.000, reward_mean=0.1, reward_bound=0.0
16876: loss=0.000, reward_mean=0.2, reward_bound=0.0
16877: loss=0.000, reward_mean=0.1, reward_bound=0.0
16878: loss=0.000, reward_mean=0.0, reward_bound=0.0
16879: loss=0.000, reward_mean=0.1, reward_bound=0.0
16880: loss=0.000, reward_mean=0.0, reward_bound=0.0
16881: loss=0.000, reward_mean=0.1, reward_bound=0.0
16882: loss=0.000, reward_mean=0.1, reward_bound=0.0
16883: loss=0.000, reward_mean=0.0, reward_bound=0.0
16884: loss=0.000, reward_mean=0.0, reward_bou

17022: loss=0.000, reward_mean=0.1, reward_bound=0.0
17023: loss=0.000, reward_mean=0.0, reward_bound=0.0
17024: loss=0.000, reward_mean=0.0, reward_bound=0.0
17025: loss=0.000, reward_mean=0.0, reward_bound=0.0
17026: loss=0.000, reward_mean=0.1, reward_bound=0.0
17027: loss=0.000, reward_mean=0.0, reward_bound=0.0
17028: loss=0.000, reward_mean=0.1, reward_bound=0.0
17029: loss=0.000, reward_mean=0.0, reward_bound=0.0
17030: loss=0.000, reward_mean=0.1, reward_bound=0.0
17031: loss=0.000, reward_mean=0.0, reward_bound=0.0
17032: loss=0.000, reward_mean=0.0, reward_bound=0.0
17033: loss=0.000, reward_mean=0.1, reward_bound=0.0
17034: loss=0.000, reward_mean=0.0, reward_bound=0.0
17035: loss=0.000, reward_mean=0.0, reward_bound=0.0
17036: loss=0.000, reward_mean=0.1, reward_bound=0.0
17037: loss=0.000, reward_mean=0.0, reward_bound=0.0
17038: loss=0.000, reward_mean=0.1, reward_bound=0.0
17039: loss=0.000, reward_mean=0.0, reward_bound=0.0
17040: loss=0.000, reward_mean=0.0, reward_bou

17178: loss=0.000, reward_mean=0.0, reward_bound=0.0
17179: loss=0.000, reward_mean=0.1, reward_bound=0.0
17180: loss=0.000, reward_mean=0.0, reward_bound=0.0
17181: loss=0.000, reward_mean=0.0, reward_bound=0.0
17182: loss=0.000, reward_mean=0.0, reward_bound=0.0
17183: loss=0.000, reward_mean=0.1, reward_bound=0.0
17184: loss=0.000, reward_mean=0.1, reward_bound=0.0
17185: loss=0.000, reward_mean=0.1, reward_bound=0.0
17186: loss=0.000, reward_mean=0.0, reward_bound=0.0
17187: loss=0.000, reward_mean=0.1, reward_bound=0.0
17188: loss=0.000, reward_mean=0.0, reward_bound=0.0
17189: loss=0.000, reward_mean=0.1, reward_bound=0.0
17190: loss=0.000, reward_mean=0.0, reward_bound=0.0
17191: loss=0.000, reward_mean=0.1, reward_bound=0.0
17192: loss=0.000, reward_mean=0.1, reward_bound=0.0
17193: loss=0.000, reward_mean=0.1, reward_bound=0.0
17194: loss=0.000, reward_mean=0.2, reward_bound=0.0
17195: loss=0.000, reward_mean=0.0, reward_bound=0.0
17196: loss=0.000, reward_mean=0.0, reward_bou

17337: loss=0.000, reward_mean=0.0, reward_bound=0.0
17338: loss=0.000, reward_mean=0.0, reward_bound=0.0
17339: loss=0.000, reward_mean=0.1, reward_bound=0.0
17340: loss=0.000, reward_mean=0.1, reward_bound=0.0
17341: loss=0.000, reward_mean=0.1, reward_bound=0.0
17342: loss=0.000, reward_mean=0.0, reward_bound=0.0
17343: loss=0.000, reward_mean=0.1, reward_bound=0.0
17344: loss=0.000, reward_mean=0.0, reward_bound=0.0
17345: loss=0.000, reward_mean=0.0, reward_bound=0.0
17346: loss=0.000, reward_mean=0.0, reward_bound=0.0
17347: loss=0.000, reward_mean=0.2, reward_bound=0.0
17348: loss=0.000, reward_mean=0.1, reward_bound=0.0
17349: loss=0.000, reward_mean=0.1, reward_bound=0.0
17350: loss=0.000, reward_mean=0.0, reward_bound=0.0
17351: loss=0.000, reward_mean=0.1, reward_bound=0.0
17352: loss=0.000, reward_mean=0.1, reward_bound=0.0
17353: loss=0.000, reward_mean=0.1, reward_bound=0.0
17354: loss=0.000, reward_mean=0.1, reward_bound=0.0
17355: loss=0.000, reward_mean=0.1, reward_bou

17492: loss=0.000, reward_mean=0.1, reward_bound=0.0
17493: loss=0.000, reward_mean=0.1, reward_bound=0.0
17494: loss=0.000, reward_mean=0.1, reward_bound=0.0
17495: loss=0.000, reward_mean=0.0, reward_bound=0.0
17496: loss=0.000, reward_mean=0.1, reward_bound=0.0
17497: loss=0.000, reward_mean=0.0, reward_bound=0.0
17498: loss=0.000, reward_mean=0.1, reward_bound=0.0
17499: loss=0.000, reward_mean=0.0, reward_bound=0.0
17500: loss=0.000, reward_mean=0.0, reward_bound=0.0
17501: loss=0.000, reward_mean=0.1, reward_bound=0.0
17502: loss=0.000, reward_mean=0.1, reward_bound=0.0
17503: loss=0.000, reward_mean=0.0, reward_bound=0.0
17504: loss=0.000, reward_mean=0.0, reward_bound=0.0
17505: loss=0.000, reward_mean=0.1, reward_bound=0.0
17506: loss=0.000, reward_mean=0.1, reward_bound=0.0
17507: loss=0.000, reward_mean=0.0, reward_bound=0.0
17508: loss=0.000, reward_mean=0.0, reward_bound=0.0
17509: loss=0.000, reward_mean=0.1, reward_bound=0.0
17510: loss=0.000, reward_mean=0.2, reward_bou

17650: loss=0.000, reward_mean=0.0, reward_bound=0.0
17651: loss=0.000, reward_mean=0.1, reward_bound=0.0
17652: loss=0.000, reward_mean=0.1, reward_bound=0.0
17653: loss=0.000, reward_mean=0.1, reward_bound=0.0
17654: loss=0.000, reward_mean=0.1, reward_bound=0.0
17655: loss=0.000, reward_mean=0.1, reward_bound=0.0
17656: loss=0.000, reward_mean=0.0, reward_bound=0.0
17657: loss=0.000, reward_mean=0.1, reward_bound=0.0
17658: loss=0.000, reward_mean=0.0, reward_bound=0.0
17659: loss=0.000, reward_mean=0.1, reward_bound=0.0
17660: loss=0.000, reward_mean=0.1, reward_bound=0.0
17661: loss=0.000, reward_mean=0.0, reward_bound=0.0
17662: loss=0.000, reward_mean=0.1, reward_bound=0.0
17663: loss=0.000, reward_mean=0.0, reward_bound=0.0
17664: loss=0.000, reward_mean=0.1, reward_bound=0.0
17665: loss=0.000, reward_mean=0.1, reward_bound=0.0
17666: loss=0.000, reward_mean=0.1, reward_bound=0.0
17667: loss=0.000, reward_mean=0.1, reward_bound=0.0
17668: loss=0.000, reward_mean=0.1, reward_bou

17810: loss=0.000, reward_mean=0.1, reward_bound=0.0
17811: loss=0.000, reward_mean=0.0, reward_bound=0.0
17812: loss=0.000, reward_mean=0.1, reward_bound=0.0
17813: loss=0.000, reward_mean=0.1, reward_bound=0.0
17814: loss=0.000, reward_mean=0.0, reward_bound=0.0
17815: loss=0.000, reward_mean=0.1, reward_bound=0.0
17816: loss=0.000, reward_mean=0.0, reward_bound=0.0
17817: loss=0.000, reward_mean=0.1, reward_bound=0.0
17818: loss=0.000, reward_mean=0.1, reward_bound=0.0
17819: loss=0.000, reward_mean=0.0, reward_bound=0.0
17820: loss=0.000, reward_mean=0.0, reward_bound=0.0
17821: loss=0.000, reward_mean=0.0, reward_bound=0.0
17822: loss=0.000, reward_mean=0.0, reward_bound=0.0
17823: loss=0.000, reward_mean=0.1, reward_bound=0.0
17824: loss=0.000, reward_mean=0.0, reward_bound=0.0
17825: loss=0.000, reward_mean=0.0, reward_bound=0.0
17826: loss=0.000, reward_mean=0.0, reward_bound=0.0
17827: loss=0.000, reward_mean=0.0, reward_bound=0.0
17828: loss=0.000, reward_mean=0.1, reward_bou

17969: loss=0.000, reward_mean=0.0, reward_bound=0.0
17970: loss=0.000, reward_mean=0.0, reward_bound=0.0
17971: loss=0.000, reward_mean=0.1, reward_bound=0.0
17972: loss=0.000, reward_mean=0.1, reward_bound=0.0
17973: loss=0.000, reward_mean=0.0, reward_bound=0.0
17974: loss=0.000, reward_mean=0.1, reward_bound=0.0
17975: loss=0.000, reward_mean=0.0, reward_bound=0.0
17976: loss=0.000, reward_mean=0.0, reward_bound=0.0
17977: loss=0.000, reward_mean=0.0, reward_bound=0.0
17978: loss=0.000, reward_mean=0.1, reward_bound=0.0
17979: loss=0.000, reward_mean=0.0, reward_bound=0.0
17980: loss=0.000, reward_mean=0.0, reward_bound=0.0
17981: loss=0.000, reward_mean=0.1, reward_bound=0.0
17982: loss=0.000, reward_mean=0.1, reward_bound=0.0
17983: loss=0.000, reward_mean=0.1, reward_bound=0.0
17984: loss=0.000, reward_mean=0.0, reward_bound=0.0
17985: loss=0.000, reward_mean=0.1, reward_bound=0.0
17986: loss=0.000, reward_mean=0.0, reward_bound=0.0
17987: loss=0.000, reward_mean=0.0, reward_bou

18130: loss=0.000, reward_mean=0.1, reward_bound=0.0
18131: loss=0.000, reward_mean=0.1, reward_bound=0.0
18132: loss=0.000, reward_mean=0.1, reward_bound=0.0
18133: loss=0.000, reward_mean=0.1, reward_bound=0.0
18134: loss=0.000, reward_mean=0.0, reward_bound=0.0
18135: loss=0.000, reward_mean=0.1, reward_bound=0.0
18136: loss=0.000, reward_mean=0.1, reward_bound=0.0
18137: loss=0.000, reward_mean=0.1, reward_bound=0.0
18138: loss=0.000, reward_mean=0.1, reward_bound=0.0
18139: loss=0.000, reward_mean=0.1, reward_bound=0.0
18140: loss=0.000, reward_mean=0.1, reward_bound=0.0
18141: loss=0.000, reward_mean=0.2, reward_bound=0.0
18142: loss=0.000, reward_mean=0.1, reward_bound=0.0
18143: loss=0.000, reward_mean=0.1, reward_bound=0.0
18144: loss=0.000, reward_mean=0.1, reward_bound=0.0
18145: loss=0.000, reward_mean=0.1, reward_bound=0.0
18146: loss=0.000, reward_mean=0.1, reward_bound=0.0
18147: loss=0.000, reward_mean=0.0, reward_bound=0.0
18148: loss=0.000, reward_mean=0.0, reward_bou

18287: loss=0.000, reward_mean=0.0, reward_bound=0.0
18288: loss=0.000, reward_mean=0.1, reward_bound=0.0
18289: loss=0.000, reward_mean=0.1, reward_bound=0.0
18290: loss=0.000, reward_mean=0.1, reward_bound=0.0
18291: loss=0.000, reward_mean=0.0, reward_bound=0.0
18292: loss=0.000, reward_mean=0.1, reward_bound=0.0
18293: loss=0.000, reward_mean=0.0, reward_bound=0.0
18294: loss=0.000, reward_mean=0.2, reward_bound=0.0
18295: loss=0.000, reward_mean=0.0, reward_bound=0.0
18296: loss=0.000, reward_mean=0.1, reward_bound=0.0
18297: loss=0.000, reward_mean=0.0, reward_bound=0.0
18298: loss=0.000, reward_mean=0.1, reward_bound=0.0
18299: loss=0.000, reward_mean=0.1, reward_bound=0.0
18300: loss=0.000, reward_mean=0.1, reward_bound=0.0
18301: loss=0.000, reward_mean=0.1, reward_bound=0.0
18302: loss=0.000, reward_mean=0.1, reward_bound=0.0
18303: loss=0.000, reward_mean=0.0, reward_bound=0.0
18304: loss=0.000, reward_mean=0.1, reward_bound=0.0
18305: loss=0.000, reward_mean=0.0, reward_bou

18444: loss=0.000, reward_mean=0.1, reward_bound=0.0
18445: loss=0.000, reward_mean=0.1, reward_bound=0.0
18446: loss=0.000, reward_mean=0.0, reward_bound=0.0
18447: loss=0.000, reward_mean=0.1, reward_bound=0.0
18448: loss=0.000, reward_mean=0.1, reward_bound=0.0
18449: loss=0.000, reward_mean=0.1, reward_bound=0.0
18450: loss=0.000, reward_mean=0.0, reward_bound=0.0
18451: loss=0.000, reward_mean=0.0, reward_bound=0.0
18452: loss=0.000, reward_mean=0.1, reward_bound=0.0
18453: loss=0.000, reward_mean=0.1, reward_bound=0.0
18454: loss=0.000, reward_mean=0.1, reward_bound=0.0
18455: loss=0.000, reward_mean=0.0, reward_bound=0.0
18456: loss=0.000, reward_mean=0.0, reward_bound=0.0
18457: loss=0.000, reward_mean=0.0, reward_bound=0.0
18458: loss=0.000, reward_mean=0.1, reward_bound=0.0
18459: loss=0.000, reward_mean=0.1, reward_bound=0.0
18460: loss=0.000, reward_mean=0.0, reward_bound=0.0
18461: loss=0.000, reward_mean=0.1, reward_bound=0.0
18462: loss=0.000, reward_mean=0.1, reward_bou

18599: loss=0.000, reward_mean=0.1, reward_bound=0.0
18600: loss=0.000, reward_mean=0.1, reward_bound=0.0
18601: loss=0.000, reward_mean=0.2, reward_bound=0.0
18602: loss=0.000, reward_mean=0.1, reward_bound=0.0
18603: loss=0.000, reward_mean=0.1, reward_bound=0.0
18604: loss=0.000, reward_mean=0.0, reward_bound=0.0
18605: loss=0.000, reward_mean=0.1, reward_bound=0.0
18606: loss=0.000, reward_mean=0.0, reward_bound=0.0
18607: loss=0.000, reward_mean=0.0, reward_bound=0.0
18608: loss=0.000, reward_mean=0.1, reward_bound=0.0
18609: loss=0.000, reward_mean=0.1, reward_bound=0.0
18610: loss=0.000, reward_mean=0.0, reward_bound=0.0
18611: loss=0.000, reward_mean=0.1, reward_bound=0.0
18612: loss=0.000, reward_mean=0.2, reward_bound=0.0
18613: loss=0.000, reward_mean=0.1, reward_bound=0.0
18614: loss=0.000, reward_mean=0.0, reward_bound=0.0
18615: loss=0.000, reward_mean=0.0, reward_bound=0.0
18616: loss=0.000, reward_mean=0.0, reward_bound=0.0
18617: loss=0.000, reward_mean=0.0, reward_bou

18754: loss=0.000, reward_mean=0.1, reward_bound=0.0
18755: loss=0.000, reward_mean=0.1, reward_bound=0.0
18756: loss=0.000, reward_mean=0.0, reward_bound=0.0
18757: loss=0.000, reward_mean=0.1, reward_bound=0.0
18758: loss=0.000, reward_mean=0.1, reward_bound=0.0
18759: loss=0.000, reward_mean=0.1, reward_bound=0.0
18760: loss=0.000, reward_mean=0.0, reward_bound=0.0
18761: loss=0.000, reward_mean=0.0, reward_bound=0.0
18762: loss=0.000, reward_mean=0.1, reward_bound=0.0
18763: loss=0.000, reward_mean=0.1, reward_bound=0.0
18764: loss=0.000, reward_mean=0.1, reward_bound=0.0
18765: loss=0.000, reward_mean=0.0, reward_bound=0.0
18766: loss=0.000, reward_mean=0.0, reward_bound=0.0
18767: loss=0.000, reward_mean=0.1, reward_bound=0.0
18768: loss=0.000, reward_mean=0.1, reward_bound=0.0
18769: loss=0.000, reward_mean=0.0, reward_bound=0.0
18770: loss=0.000, reward_mean=0.1, reward_bound=0.0
18771: loss=0.000, reward_mean=0.1, reward_bound=0.0
18772: loss=0.000, reward_mean=0.0, reward_bou

18912: loss=0.000, reward_mean=0.1, reward_bound=0.0
18913: loss=0.000, reward_mean=0.1, reward_bound=0.0
18914: loss=0.000, reward_mean=0.0, reward_bound=0.0
18915: loss=0.000, reward_mean=0.1, reward_bound=0.0
18916: loss=0.000, reward_mean=0.1, reward_bound=0.0
18917: loss=0.000, reward_mean=0.1, reward_bound=0.0
18918: loss=0.000, reward_mean=0.1, reward_bound=0.0
18919: loss=0.000, reward_mean=0.1, reward_bound=0.0
18920: loss=0.000, reward_mean=0.1, reward_bound=0.0
18921: loss=0.000, reward_mean=0.0, reward_bound=0.0
18922: loss=0.000, reward_mean=0.1, reward_bound=0.0
18923: loss=0.000, reward_mean=0.1, reward_bound=0.0
18924: loss=0.000, reward_mean=0.0, reward_bound=0.0
18925: loss=0.000, reward_mean=0.1, reward_bound=0.0
18926: loss=0.000, reward_mean=0.1, reward_bound=0.0
18927: loss=0.000, reward_mean=0.0, reward_bound=0.0
18928: loss=0.000, reward_mean=0.0, reward_bound=0.0
18929: loss=0.000, reward_mean=0.0, reward_bound=0.0
18930: loss=0.000, reward_mean=0.1, reward_bou

19072: loss=0.000, reward_mean=0.1, reward_bound=0.0
19073: loss=0.000, reward_mean=0.1, reward_bound=0.0
19074: loss=0.000, reward_mean=0.1, reward_bound=0.0
19075: loss=0.000, reward_mean=0.0, reward_bound=0.0
19076: loss=0.000, reward_mean=0.1, reward_bound=0.0
19077: loss=0.000, reward_mean=0.0, reward_bound=0.0
19078: loss=0.000, reward_mean=0.0, reward_bound=0.0
19079: loss=0.000, reward_mean=0.1, reward_bound=0.0
19080: loss=0.000, reward_mean=0.0, reward_bound=0.0
19081: loss=0.000, reward_mean=0.0, reward_bound=0.0
19082: loss=0.000, reward_mean=0.1, reward_bound=0.0
19083: loss=0.000, reward_mean=0.1, reward_bound=0.0
19084: loss=0.000, reward_mean=0.3, reward_bound=0.5
19085: loss=0.000, reward_mean=0.0, reward_bound=0.0
19086: loss=0.000, reward_mean=0.1, reward_bound=0.0
19087: loss=0.000, reward_mean=0.0, reward_bound=0.0
19088: loss=0.000, reward_mean=0.1, reward_bound=0.0
19089: loss=0.000, reward_mean=0.1, reward_bound=0.0
19090: loss=0.000, reward_mean=0.1, reward_bou

19228: loss=0.000, reward_mean=0.0, reward_bound=0.0
19229: loss=0.000, reward_mean=0.1, reward_bound=0.0
19230: loss=0.000, reward_mean=0.1, reward_bound=0.0
19231: loss=0.000, reward_mean=0.1, reward_bound=0.0
19232: loss=0.000, reward_mean=0.1, reward_bound=0.0
19233: loss=0.000, reward_mean=0.0, reward_bound=0.0
19234: loss=0.000, reward_mean=0.0, reward_bound=0.0
19235: loss=0.000, reward_mean=0.0, reward_bound=0.0
19236: loss=0.000, reward_mean=0.1, reward_bound=0.0
19237: loss=0.000, reward_mean=0.1, reward_bound=0.0
19238: loss=0.000, reward_mean=0.0, reward_bound=0.0
19239: loss=0.000, reward_mean=0.1, reward_bound=0.0
19240: loss=0.000, reward_mean=0.0, reward_bound=0.0
19241: loss=0.000, reward_mean=0.1, reward_bound=0.0
19242: loss=0.000, reward_mean=0.1, reward_bound=0.0
19243: loss=0.000, reward_mean=0.0, reward_bound=0.0
19244: loss=0.000, reward_mean=0.1, reward_bound=0.0
19245: loss=0.000, reward_mean=0.1, reward_bound=0.0
19246: loss=0.000, reward_mean=0.0, reward_bou

19383: loss=0.000, reward_mean=0.1, reward_bound=0.0
19384: loss=0.000, reward_mean=0.1, reward_bound=0.0
19385: loss=0.000, reward_mean=0.1, reward_bound=0.0
19386: loss=0.000, reward_mean=0.0, reward_bound=0.0
19387: loss=0.000, reward_mean=0.1, reward_bound=0.0
19388: loss=0.000, reward_mean=0.0, reward_bound=0.0
19389: loss=0.000, reward_mean=0.1, reward_bound=0.0
19390: loss=0.000, reward_mean=0.1, reward_bound=0.0
19391: loss=0.000, reward_mean=0.1, reward_bound=0.0
19392: loss=0.000, reward_mean=0.0, reward_bound=0.0
19393: loss=0.000, reward_mean=0.0, reward_bound=0.0
19394: loss=0.000, reward_mean=0.1, reward_bound=0.0
19395: loss=0.000, reward_mean=0.0, reward_bound=0.0
19396: loss=0.000, reward_mean=0.1, reward_bound=0.0
19397: loss=0.000, reward_mean=0.1, reward_bound=0.0
19398: loss=0.000, reward_mean=0.1, reward_bound=0.0
19399: loss=0.000, reward_mean=0.0, reward_bound=0.0
19400: loss=0.000, reward_mean=0.0, reward_bound=0.0
19401: loss=0.000, reward_mean=0.0, reward_bou

19543: loss=0.000, reward_mean=0.0, reward_bound=0.0
19544: loss=0.000, reward_mean=0.0, reward_bound=0.0
19545: loss=0.000, reward_mean=0.0, reward_bound=0.0
19546: loss=0.000, reward_mean=0.1, reward_bound=0.0
19547: loss=0.000, reward_mean=0.0, reward_bound=0.0
19548: loss=0.000, reward_mean=0.1, reward_bound=0.0
19549: loss=0.000, reward_mean=0.1, reward_bound=0.0
19550: loss=0.000, reward_mean=0.1, reward_bound=0.0
19551: loss=0.000, reward_mean=0.1, reward_bound=0.0
19552: loss=0.000, reward_mean=0.0, reward_bound=0.0
19553: loss=0.000, reward_mean=0.1, reward_bound=0.0
19554: loss=0.000, reward_mean=0.1, reward_bound=0.0
19555: loss=0.000, reward_mean=0.1, reward_bound=0.0
19556: loss=0.000, reward_mean=0.1, reward_bound=0.0
19557: loss=0.000, reward_mean=0.0, reward_bound=0.0
19558: loss=0.000, reward_mean=0.0, reward_bound=0.0
19559: loss=0.000, reward_mean=0.1, reward_bound=0.0
19560: loss=0.000, reward_mean=0.0, reward_bound=0.0
19561: loss=0.000, reward_mean=0.0, reward_bou

19701: loss=0.000, reward_mean=0.1, reward_bound=0.0
19702: loss=0.000, reward_mean=0.0, reward_bound=0.0
19703: loss=0.000, reward_mean=0.0, reward_bound=0.0
19704: loss=0.000, reward_mean=0.1, reward_bound=0.0
19705: loss=0.000, reward_mean=0.1, reward_bound=0.0
19706: loss=0.000, reward_mean=0.1, reward_bound=0.0
19707: loss=0.000, reward_mean=0.0, reward_bound=0.0
19708: loss=0.000, reward_mean=0.1, reward_bound=0.0
19709: loss=0.000, reward_mean=0.0, reward_bound=0.0
19710: loss=0.000, reward_mean=0.0, reward_bound=0.0
19711: loss=0.000, reward_mean=0.0, reward_bound=0.0
19712: loss=0.000, reward_mean=0.1, reward_bound=0.0
19713: loss=0.000, reward_mean=0.1, reward_bound=0.0
19714: loss=0.000, reward_mean=0.1, reward_bound=0.0
19715: loss=0.000, reward_mean=0.0, reward_bound=0.0
19716: loss=0.000, reward_mean=0.1, reward_bound=0.0
19717: loss=0.000, reward_mean=0.1, reward_bound=0.0
19718: loss=0.000, reward_mean=0.0, reward_bound=0.0
19719: loss=0.000, reward_mean=0.0, reward_bou

19856: loss=0.000, reward_mean=0.1, reward_bound=0.0
19857: loss=0.000, reward_mean=0.1, reward_bound=0.0
19858: loss=0.000, reward_mean=0.1, reward_bound=0.0
19859: loss=0.000, reward_mean=0.2, reward_bound=0.0
19860: loss=0.000, reward_mean=0.0, reward_bound=0.0
19861: loss=0.000, reward_mean=0.1, reward_bound=0.0
19862: loss=0.000, reward_mean=0.0, reward_bound=0.0
19863: loss=0.000, reward_mean=0.0, reward_bound=0.0
19864: loss=0.000, reward_mean=0.0, reward_bound=0.0
19865: loss=0.000, reward_mean=0.0, reward_bound=0.0
19866: loss=0.000, reward_mean=0.1, reward_bound=0.0
19867: loss=0.000, reward_mean=0.0, reward_bound=0.0
19868: loss=0.000, reward_mean=0.1, reward_bound=0.0
19869: loss=0.000, reward_mean=0.0, reward_bound=0.0
19870: loss=0.000, reward_mean=0.1, reward_bound=0.0
19871: loss=0.000, reward_mean=0.0, reward_bound=0.0
19872: loss=0.000, reward_mean=0.0, reward_bound=0.0
19873: loss=0.000, reward_mean=0.2, reward_bound=0.0
19874: loss=0.000, reward_mean=0.0, reward_bou

20015: loss=0.000, reward_mean=0.1, reward_bound=0.0
20016: loss=0.000, reward_mean=0.0, reward_bound=0.0
20017: loss=0.000, reward_mean=0.1, reward_bound=0.0
20018: loss=0.000, reward_mean=0.1, reward_bound=0.0
20019: loss=0.000, reward_mean=0.0, reward_bound=0.0
20020: loss=0.000, reward_mean=0.0, reward_bound=0.0
20021: loss=0.000, reward_mean=0.1, reward_bound=0.0
20022: loss=0.000, reward_mean=0.1, reward_bound=0.0
20023: loss=0.000, reward_mean=0.1, reward_bound=0.0
20024: loss=0.000, reward_mean=0.1, reward_bound=0.0
20025: loss=0.000, reward_mean=0.1, reward_bound=0.0
20026: loss=0.000, reward_mean=0.0, reward_bound=0.0
20027: loss=0.000, reward_mean=0.1, reward_bound=0.0
20028: loss=0.000, reward_mean=0.1, reward_bound=0.0
20029: loss=0.000, reward_mean=0.1, reward_bound=0.0
20030: loss=0.000, reward_mean=0.1, reward_bound=0.0
20031: loss=0.000, reward_mean=0.0, reward_bound=0.0
20032: loss=0.000, reward_mean=0.0, reward_bound=0.0
20033: loss=0.000, reward_mean=0.0, reward_bou

20175: loss=0.000, reward_mean=0.1, reward_bound=0.0
20176: loss=0.000, reward_mean=0.1, reward_bound=0.0
20177: loss=0.000, reward_mean=0.0, reward_bound=0.0
20178: loss=0.000, reward_mean=0.0, reward_bound=0.0
20179: loss=0.000, reward_mean=0.0, reward_bound=0.0
20180: loss=0.000, reward_mean=0.0, reward_bound=0.0
20181: loss=0.000, reward_mean=0.1, reward_bound=0.0
20182: loss=0.000, reward_mean=0.0, reward_bound=0.0
20183: loss=0.000, reward_mean=0.1, reward_bound=0.0
20184: loss=0.000, reward_mean=0.1, reward_bound=0.0
20185: loss=0.000, reward_mean=0.0, reward_bound=0.0
20186: loss=0.000, reward_mean=0.0, reward_bound=0.0
20187: loss=0.000, reward_mean=0.0, reward_bound=0.0
20188: loss=0.000, reward_mean=0.0, reward_bound=0.0
20189: loss=0.000, reward_mean=0.1, reward_bound=0.0
20190: loss=0.000, reward_mean=0.0, reward_bound=0.0
20191: loss=0.000, reward_mean=0.1, reward_bound=0.0
20192: loss=0.000, reward_mean=0.1, reward_bound=0.0
20193: loss=0.000, reward_mean=0.1, reward_bou

20332: loss=0.000, reward_mean=0.1, reward_bound=0.0
20333: loss=0.000, reward_mean=0.1, reward_bound=0.0
20334: loss=0.000, reward_mean=0.1, reward_bound=0.0
20335: loss=0.000, reward_mean=0.1, reward_bound=0.0
20336: loss=0.000, reward_mean=0.1, reward_bound=0.0
20337: loss=0.000, reward_mean=0.1, reward_bound=0.0
20338: loss=0.000, reward_mean=0.1, reward_bound=0.0
20339: loss=0.000, reward_mean=0.0, reward_bound=0.0
20340: loss=0.000, reward_mean=0.1, reward_bound=0.0
20341: loss=0.000, reward_mean=0.1, reward_bound=0.0
20342: loss=0.000, reward_mean=0.1, reward_bound=0.0
20343: loss=0.000, reward_mean=0.0, reward_bound=0.0
20344: loss=0.000, reward_mean=0.1, reward_bound=0.0
20345: loss=0.000, reward_mean=0.1, reward_bound=0.0
20346: loss=0.000, reward_mean=0.0, reward_bound=0.0
20347: loss=0.000, reward_mean=0.2, reward_bound=0.0
20348: loss=0.000, reward_mean=0.1, reward_bound=0.0
20349: loss=0.000, reward_mean=0.0, reward_bound=0.0
20350: loss=0.000, reward_mean=0.1, reward_bou

20489: loss=0.000, reward_mean=0.1, reward_bound=0.0
20490: loss=0.000, reward_mean=0.1, reward_bound=0.0
20491: loss=0.000, reward_mean=0.0, reward_bound=0.0
20492: loss=0.000, reward_mean=0.0, reward_bound=0.0
20493: loss=0.000, reward_mean=0.1, reward_bound=0.0
20494: loss=0.000, reward_mean=0.1, reward_bound=0.0
20495: loss=0.000, reward_mean=0.2, reward_bound=0.0
20496: loss=0.000, reward_mean=0.0, reward_bound=0.0
20497: loss=0.000, reward_mean=0.0, reward_bound=0.0
20498: loss=0.000, reward_mean=0.1, reward_bound=0.0
20499: loss=0.000, reward_mean=0.2, reward_bound=0.0
20500: loss=0.000, reward_mean=0.0, reward_bound=0.0
20501: loss=0.000, reward_mean=0.1, reward_bound=0.0
20502: loss=0.000, reward_mean=0.1, reward_bound=0.0
20503: loss=0.000, reward_mean=0.1, reward_bound=0.0
20504: loss=0.000, reward_mean=0.1, reward_bound=0.0
20505: loss=0.000, reward_mean=0.1, reward_bound=0.0
20506: loss=0.000, reward_mean=0.1, reward_bound=0.0
20507: loss=0.000, reward_mean=0.1, reward_bou

20646: loss=0.000, reward_mean=0.1, reward_bound=0.0
20647: loss=0.000, reward_mean=0.0, reward_bound=0.0
20648: loss=0.000, reward_mean=0.2, reward_bound=0.0
20649: loss=0.000, reward_mean=0.1, reward_bound=0.0
20650: loss=0.000, reward_mean=0.0, reward_bound=0.0
20651: loss=0.000, reward_mean=0.0, reward_bound=0.0
20652: loss=0.000, reward_mean=0.1, reward_bound=0.0
20653: loss=0.000, reward_mean=0.1, reward_bound=0.0
20654: loss=0.000, reward_mean=0.0, reward_bound=0.0
20655: loss=0.000, reward_mean=0.0, reward_bound=0.0
20656: loss=0.000, reward_mean=0.0, reward_bound=0.0
20657: loss=0.000, reward_mean=0.1, reward_bound=0.0
20658: loss=0.000, reward_mean=0.0, reward_bound=0.0
20659: loss=0.000, reward_mean=0.0, reward_bound=0.0
20660: loss=0.000, reward_mean=0.0, reward_bound=0.0
20661: loss=0.000, reward_mean=0.1, reward_bound=0.0
20662: loss=0.000, reward_mean=0.1, reward_bound=0.0
20663: loss=0.000, reward_mean=0.0, reward_bound=0.0
20664: loss=0.000, reward_mean=0.1, reward_bou

20804: loss=0.000, reward_mean=0.1, reward_bound=0.0
20805: loss=0.000, reward_mean=0.1, reward_bound=0.0
20806: loss=0.000, reward_mean=0.1, reward_bound=0.0
20807: loss=0.000, reward_mean=0.1, reward_bound=0.0
20808: loss=0.000, reward_mean=0.1, reward_bound=0.0
20809: loss=0.000, reward_mean=0.0, reward_bound=0.0
20810: loss=0.000, reward_mean=0.0, reward_bound=0.0
20811: loss=0.000, reward_mean=0.1, reward_bound=0.0
20812: loss=0.000, reward_mean=0.0, reward_bound=0.0
20813: loss=0.000, reward_mean=0.1, reward_bound=0.0
20814: loss=0.000, reward_mean=0.1, reward_bound=0.0
20815: loss=0.000, reward_mean=0.1, reward_bound=0.0
20816: loss=0.000, reward_mean=0.0, reward_bound=0.0
20817: loss=0.000, reward_mean=0.2, reward_bound=0.0
20818: loss=0.000, reward_mean=0.2, reward_bound=0.0
20819: loss=0.000, reward_mean=0.1, reward_bound=0.0
20820: loss=0.000, reward_mean=0.1, reward_bound=0.0
20821: loss=0.000, reward_mean=0.0, reward_bound=0.0
20822: loss=0.000, reward_mean=0.1, reward_bou

20961: loss=0.000, reward_mean=0.0, reward_bound=0.0
20962: loss=0.000, reward_mean=0.0, reward_bound=0.0
20963: loss=0.000, reward_mean=0.0, reward_bound=0.0
20964: loss=0.000, reward_mean=0.1, reward_bound=0.0
20965: loss=0.000, reward_mean=0.0, reward_bound=0.0
20966: loss=0.000, reward_mean=0.1, reward_bound=0.0
20967: loss=0.000, reward_mean=0.0, reward_bound=0.0
20968: loss=0.000, reward_mean=0.1, reward_bound=0.0
20969: loss=0.000, reward_mean=0.0, reward_bound=0.0
20970: loss=0.000, reward_mean=0.1, reward_bound=0.0
20971: loss=0.000, reward_mean=0.1, reward_bound=0.0
20972: loss=0.000, reward_mean=0.0, reward_bound=0.0
20973: loss=0.000, reward_mean=0.1, reward_bound=0.0
20974: loss=0.000, reward_mean=0.1, reward_bound=0.0
20975: loss=0.000, reward_mean=0.0, reward_bound=0.0
20976: loss=0.000, reward_mean=0.0, reward_bound=0.0
20977: loss=0.000, reward_mean=0.1, reward_bound=0.0
20978: loss=0.000, reward_mean=0.1, reward_bound=0.0
20979: loss=0.000, reward_mean=0.1, reward_bou

21121: loss=0.000, reward_mean=0.1, reward_bound=0.0
21122: loss=0.000, reward_mean=0.1, reward_bound=0.0
21123: loss=0.000, reward_mean=0.0, reward_bound=0.0
21124: loss=0.000, reward_mean=0.1, reward_bound=0.0
21125: loss=0.000, reward_mean=0.1, reward_bound=0.0
21126: loss=0.000, reward_mean=0.1, reward_bound=0.0
21127: loss=0.000, reward_mean=0.2, reward_bound=0.0
21128: loss=0.000, reward_mean=0.1, reward_bound=0.0
21129: loss=0.000, reward_mean=0.0, reward_bound=0.0
21130: loss=0.000, reward_mean=0.1, reward_bound=0.0
21131: loss=0.000, reward_mean=0.0, reward_bound=0.0
21132: loss=0.000, reward_mean=0.0, reward_bound=0.0
21133: loss=0.000, reward_mean=0.1, reward_bound=0.0
21134: loss=0.000, reward_mean=0.0, reward_bound=0.0
21135: loss=0.000, reward_mean=0.0, reward_bound=0.0
21136: loss=0.000, reward_mean=0.1, reward_bound=0.0
21137: loss=0.000, reward_mean=0.1, reward_bound=0.0
21138: loss=0.000, reward_mean=0.1, reward_bound=0.0
21139: loss=0.000, reward_mean=0.0, reward_bou

21276: loss=0.000, reward_mean=0.1, reward_bound=0.0
21277: loss=0.000, reward_mean=0.1, reward_bound=0.0
21278: loss=0.000, reward_mean=0.1, reward_bound=0.0
21279: loss=0.000, reward_mean=0.1, reward_bound=0.0
21280: loss=0.000, reward_mean=0.2, reward_bound=0.0
21281: loss=0.000, reward_mean=0.1, reward_bound=0.0
21282: loss=0.000, reward_mean=0.0, reward_bound=0.0
21283: loss=0.000, reward_mean=0.0, reward_bound=0.0
21284: loss=0.000, reward_mean=0.0, reward_bound=0.0
21285: loss=0.000, reward_mean=0.1, reward_bound=0.0
21286: loss=0.000, reward_mean=0.1, reward_bound=0.0
21287: loss=0.000, reward_mean=0.0, reward_bound=0.0
21288: loss=0.000, reward_mean=0.1, reward_bound=0.0
21289: loss=0.000, reward_mean=0.1, reward_bound=0.0
21290: loss=0.000, reward_mean=0.1, reward_bound=0.0
21291: loss=0.000, reward_mean=0.1, reward_bound=0.0
21292: loss=0.000, reward_mean=0.1, reward_bound=0.0
21293: loss=0.000, reward_mean=0.1, reward_bound=0.0
21294: loss=0.000, reward_mean=0.0, reward_bou

21432: loss=0.000, reward_mean=0.0, reward_bound=0.0
21433: loss=0.000, reward_mean=0.0, reward_bound=0.0
21434: loss=0.000, reward_mean=0.0, reward_bound=0.0
21435: loss=0.000, reward_mean=0.0, reward_bound=0.0
21436: loss=0.000, reward_mean=0.1, reward_bound=0.0
21437: loss=0.000, reward_mean=0.1, reward_bound=0.0
21438: loss=0.000, reward_mean=0.0, reward_bound=0.0
21439: loss=0.000, reward_mean=0.1, reward_bound=0.0
21440: loss=0.000, reward_mean=0.1, reward_bound=0.0
21441: loss=0.000, reward_mean=0.1, reward_bound=0.0
21442: loss=0.000, reward_mean=0.1, reward_bound=0.0
21443: loss=0.000, reward_mean=0.1, reward_bound=0.0
21444: loss=0.000, reward_mean=0.1, reward_bound=0.0
21445: loss=0.000, reward_mean=0.1, reward_bound=0.0
21446: loss=0.000, reward_mean=0.1, reward_bound=0.0
21447: loss=0.000, reward_mean=0.0, reward_bound=0.0
21448: loss=0.000, reward_mean=0.0, reward_bound=0.0
21449: loss=0.000, reward_mean=0.0, reward_bound=0.0
21450: loss=0.000, reward_mean=0.1, reward_bou

21591: loss=0.000, reward_mean=0.1, reward_bound=0.0
21592: loss=0.000, reward_mean=0.1, reward_bound=0.0
21593: loss=0.000, reward_mean=0.0, reward_bound=0.0
21594: loss=0.000, reward_mean=0.1, reward_bound=0.0
21595: loss=0.000, reward_mean=0.1, reward_bound=0.0
21596: loss=0.000, reward_mean=0.1, reward_bound=0.0
21597: loss=0.000, reward_mean=0.1, reward_bound=0.0
21598: loss=0.000, reward_mean=0.1, reward_bound=0.0
21599: loss=0.000, reward_mean=0.1, reward_bound=0.0
21600: loss=0.000, reward_mean=0.0, reward_bound=0.0
21601: loss=0.000, reward_mean=0.0, reward_bound=0.0
21602: loss=0.000, reward_mean=0.1, reward_bound=0.0
21603: loss=0.000, reward_mean=0.0, reward_bound=0.0
21604: loss=0.000, reward_mean=0.1, reward_bound=0.0
21605: loss=0.000, reward_mean=0.1, reward_bound=0.0
21606: loss=0.000, reward_mean=0.0, reward_bound=0.0
21607: loss=0.000, reward_mean=0.1, reward_bound=0.0
21608: loss=0.000, reward_mean=0.0, reward_bound=0.0
21609: loss=0.000, reward_mean=0.0, reward_bou

21750: loss=0.000, reward_mean=0.2, reward_bound=0.0
21751: loss=0.000, reward_mean=0.1, reward_bound=0.0
21752: loss=0.000, reward_mean=0.0, reward_bound=0.0
21753: loss=0.000, reward_mean=0.0, reward_bound=0.0
21754: loss=0.000, reward_mean=0.2, reward_bound=0.0
21755: loss=0.000, reward_mean=0.0, reward_bound=0.0
21756: loss=0.000, reward_mean=0.0, reward_bound=0.0
21757: loss=0.000, reward_mean=0.1, reward_bound=0.0
21758: loss=0.000, reward_mean=0.1, reward_bound=0.0
21759: loss=0.000, reward_mean=0.1, reward_bound=0.0
21760: loss=0.000, reward_mean=0.0, reward_bound=0.0
21761: loss=0.000, reward_mean=0.1, reward_bound=0.0
21762: loss=0.000, reward_mean=0.1, reward_bound=0.0
21763: loss=0.000, reward_mean=0.2, reward_bound=0.0
21764: loss=0.000, reward_mean=0.1, reward_bound=0.0
21765: loss=0.000, reward_mean=0.1, reward_bound=0.0
21766: loss=0.000, reward_mean=0.1, reward_bound=0.0
21767: loss=0.000, reward_mean=0.2, reward_bound=0.0
21768: loss=0.000, reward_mean=0.0, reward_bou

21906: loss=0.000, reward_mean=0.1, reward_bound=0.0
21907: loss=0.000, reward_mean=0.0, reward_bound=0.0
21908: loss=0.000, reward_mean=0.0, reward_bound=0.0
21909: loss=0.000, reward_mean=0.0, reward_bound=0.0
21910: loss=0.000, reward_mean=0.0, reward_bound=0.0
21911: loss=0.000, reward_mean=0.1, reward_bound=0.0
21912: loss=0.000, reward_mean=0.1, reward_bound=0.0
21913: loss=0.000, reward_mean=0.1, reward_bound=0.0
21914: loss=0.000, reward_mean=0.1, reward_bound=0.0
21915: loss=0.000, reward_mean=0.1, reward_bound=0.0
21916: loss=0.000, reward_mean=0.0, reward_bound=0.0
21917: loss=0.000, reward_mean=0.1, reward_bound=0.0
21918: loss=0.000, reward_mean=0.0, reward_bound=0.0
21919: loss=0.000, reward_mean=0.0, reward_bound=0.0
21920: loss=0.000, reward_mean=0.0, reward_bound=0.0
21921: loss=0.000, reward_mean=0.0, reward_bound=0.0
21922: loss=0.000, reward_mean=0.0, reward_bound=0.0
21923: loss=0.000, reward_mean=0.1, reward_bound=0.0
21924: loss=0.000, reward_mean=0.0, reward_bou

22063: loss=0.000, reward_mean=0.1, reward_bound=0.0
22064: loss=0.000, reward_mean=0.0, reward_bound=0.0
22065: loss=0.000, reward_mean=0.1, reward_bound=0.0
22066: loss=0.000, reward_mean=0.0, reward_bound=0.0
22067: loss=0.000, reward_mean=0.0, reward_bound=0.0
22068: loss=0.000, reward_mean=0.0, reward_bound=0.0
22069: loss=0.000, reward_mean=0.1, reward_bound=0.0
22070: loss=0.000, reward_mean=0.1, reward_bound=0.0
22071: loss=0.000, reward_mean=0.1, reward_bound=0.0
22072: loss=0.000, reward_mean=0.0, reward_bound=0.0
22073: loss=0.000, reward_mean=0.1, reward_bound=0.0
22074: loss=0.000, reward_mean=0.0, reward_bound=0.0
22075: loss=0.000, reward_mean=0.1, reward_bound=0.0
22076: loss=0.000, reward_mean=0.0, reward_bound=0.0
22077: loss=0.000, reward_mean=0.1, reward_bound=0.0
22078: loss=0.000, reward_mean=0.1, reward_bound=0.0
22079: loss=0.000, reward_mean=0.1, reward_bound=0.0
22080: loss=0.000, reward_mean=0.0, reward_bound=0.0
22081: loss=0.000, reward_mean=0.1, reward_bou

22221: loss=0.000, reward_mean=0.1, reward_bound=0.0
22222: loss=0.000, reward_mean=0.0, reward_bound=0.0
22223: loss=0.000, reward_mean=0.1, reward_bound=0.0
22224: loss=0.000, reward_mean=0.0, reward_bound=0.0
22225: loss=0.000, reward_mean=0.0, reward_bound=0.0
22226: loss=0.000, reward_mean=0.0, reward_bound=0.0
22227: loss=0.000, reward_mean=0.1, reward_bound=0.0
22228: loss=0.000, reward_mean=0.1, reward_bound=0.0
22229: loss=0.000, reward_mean=0.1, reward_bound=0.0
22230: loss=0.000, reward_mean=0.2, reward_bound=0.0
22231: loss=0.000, reward_mean=0.0, reward_bound=0.0
22232: loss=0.000, reward_mean=0.1, reward_bound=0.0
22233: loss=0.000, reward_mean=0.1, reward_bound=0.0
22234: loss=0.000, reward_mean=0.0, reward_bound=0.0
22235: loss=0.000, reward_mean=0.1, reward_bound=0.0
22236: loss=0.000, reward_mean=0.1, reward_bound=0.0
22237: loss=0.000, reward_mean=0.1, reward_bound=0.0
22238: loss=0.000, reward_mean=0.1, reward_bound=0.0
22239: loss=0.000, reward_mean=0.1, reward_bou

22378: loss=0.000, reward_mean=0.0, reward_bound=0.0
22379: loss=0.000, reward_mean=0.0, reward_bound=0.0
22380: loss=0.000, reward_mean=0.0, reward_bound=0.0
22381: loss=0.000, reward_mean=0.1, reward_bound=0.0
22382: loss=0.000, reward_mean=0.0, reward_bound=0.0
22383: loss=0.000, reward_mean=0.1, reward_bound=0.0
22384: loss=0.000, reward_mean=0.1, reward_bound=0.0
22385: loss=0.000, reward_mean=0.0, reward_bound=0.0
22386: loss=0.000, reward_mean=0.1, reward_bound=0.0
22387: loss=0.000, reward_mean=0.1, reward_bound=0.0
22388: loss=0.000, reward_mean=0.0, reward_bound=0.0
22389: loss=0.000, reward_mean=0.1, reward_bound=0.0
22390: loss=0.000, reward_mean=0.1, reward_bound=0.0
22391: loss=0.000, reward_mean=0.0, reward_bound=0.0
22392: loss=0.000, reward_mean=0.1, reward_bound=0.0
22393: loss=0.000, reward_mean=0.0, reward_bound=0.0
22394: loss=0.000, reward_mean=0.0, reward_bound=0.0
22395: loss=0.000, reward_mean=0.1, reward_bound=0.0
22396: loss=0.000, reward_mean=0.0, reward_bou

22539: loss=0.000, reward_mean=0.0, reward_bound=0.0
22540: loss=0.000, reward_mean=0.0, reward_bound=0.0
22541: loss=0.000, reward_mean=0.0, reward_bound=0.0
22542: loss=0.000, reward_mean=0.0, reward_bound=0.0
22543: loss=0.000, reward_mean=0.1, reward_bound=0.0
22544: loss=0.000, reward_mean=0.1, reward_bound=0.0
22545: loss=0.000, reward_mean=0.0, reward_bound=0.0
22546: loss=0.000, reward_mean=0.1, reward_bound=0.0
22547: loss=0.000, reward_mean=0.1, reward_bound=0.0
22548: loss=0.000, reward_mean=0.1, reward_bound=0.0
22549: loss=0.000, reward_mean=0.1, reward_bound=0.0
22550: loss=0.000, reward_mean=0.1, reward_bound=0.0
22551: loss=0.000, reward_mean=0.1, reward_bound=0.0
22552: loss=0.000, reward_mean=0.0, reward_bound=0.0
22553: loss=0.000, reward_mean=0.0, reward_bound=0.0
22554: loss=0.000, reward_mean=0.1, reward_bound=0.0
22555: loss=0.000, reward_mean=0.1, reward_bound=0.0
22556: loss=0.000, reward_mean=0.0, reward_bound=0.0
22557: loss=0.000, reward_mean=0.1, reward_bou

22697: loss=0.000, reward_mean=0.0, reward_bound=0.0
22698: loss=0.000, reward_mean=0.1, reward_bound=0.0
22699: loss=0.000, reward_mean=0.0, reward_bound=0.0
22700: loss=0.000, reward_mean=0.1, reward_bound=0.0
22701: loss=0.000, reward_mean=0.1, reward_bound=0.0
22702: loss=0.000, reward_mean=0.1, reward_bound=0.0
22703: loss=0.000, reward_mean=0.0, reward_bound=0.0
22704: loss=0.000, reward_mean=0.1, reward_bound=0.0
22705: loss=0.000, reward_mean=0.2, reward_bound=0.0
22706: loss=0.000, reward_mean=0.1, reward_bound=0.0
22707: loss=0.000, reward_mean=0.0, reward_bound=0.0
22708: loss=0.000, reward_mean=0.1, reward_bound=0.0
22709: loss=0.000, reward_mean=0.0, reward_bound=0.0
22710: loss=0.000, reward_mean=0.0, reward_bound=0.0
22711: loss=0.000, reward_mean=0.1, reward_bound=0.0
22712: loss=0.000, reward_mean=0.0, reward_bound=0.0
22713: loss=0.000, reward_mean=0.1, reward_bound=0.0
22714: loss=0.000, reward_mean=0.1, reward_bound=0.0
22715: loss=0.000, reward_mean=0.0, reward_bou

22852: loss=0.000, reward_mean=0.1, reward_bound=0.0
22853: loss=0.000, reward_mean=0.1, reward_bound=0.0
22854: loss=0.000, reward_mean=0.1, reward_bound=0.0
22855: loss=0.000, reward_mean=0.1, reward_bound=0.0
22856: loss=0.000, reward_mean=0.0, reward_bound=0.0
22857: loss=0.000, reward_mean=0.1, reward_bound=0.0
22858: loss=0.000, reward_mean=0.0, reward_bound=0.0
22859: loss=0.000, reward_mean=0.1, reward_bound=0.0
22860: loss=0.000, reward_mean=0.0, reward_bound=0.0
22861: loss=0.000, reward_mean=0.0, reward_bound=0.0
22862: loss=0.000, reward_mean=0.1, reward_bound=0.0
22863: loss=0.000, reward_mean=0.1, reward_bound=0.0
22864: loss=0.000, reward_mean=0.1, reward_bound=0.0
22865: loss=0.000, reward_mean=0.0, reward_bound=0.0
22866: loss=0.000, reward_mean=0.2, reward_bound=0.0
22867: loss=0.000, reward_mean=0.0, reward_bound=0.0
22868: loss=0.000, reward_mean=0.1, reward_bound=0.0
22869: loss=0.000, reward_mean=0.0, reward_bound=0.0
22870: loss=0.000, reward_mean=0.0, reward_bou

23009: loss=0.000, reward_mean=0.0, reward_bound=0.0
23010: loss=0.000, reward_mean=0.1, reward_bound=0.0
23011: loss=0.000, reward_mean=0.1, reward_bound=0.0
23012: loss=0.000, reward_mean=0.1, reward_bound=0.0
23013: loss=0.000, reward_mean=0.1, reward_bound=0.0
23014: loss=0.000, reward_mean=0.0, reward_bound=0.0
23015: loss=0.000, reward_mean=0.0, reward_bound=0.0
23016: loss=0.000, reward_mean=0.0, reward_bound=0.0
23017: loss=0.000, reward_mean=0.0, reward_bound=0.0
23018: loss=0.000, reward_mean=0.0, reward_bound=0.0
23019: loss=0.000, reward_mean=0.0, reward_bound=0.0
23020: loss=0.000, reward_mean=0.0, reward_bound=0.0
23021: loss=0.000, reward_mean=0.1, reward_bound=0.0
23022: loss=0.000, reward_mean=0.1, reward_bound=0.0
23023: loss=0.000, reward_mean=0.1, reward_bound=0.0
23024: loss=0.000, reward_mean=0.0, reward_bound=0.0
23025: loss=0.000, reward_mean=0.1, reward_bound=0.0
23026: loss=0.000, reward_mean=0.0, reward_bound=0.0
23027: loss=0.000, reward_mean=0.0, reward_bou

23167: loss=0.000, reward_mean=0.0, reward_bound=0.0
23168: loss=0.000, reward_mean=0.0, reward_bound=0.0
23169: loss=0.000, reward_mean=0.0, reward_bound=0.0
23170: loss=0.000, reward_mean=0.0, reward_bound=0.0
23171: loss=0.000, reward_mean=0.1, reward_bound=0.0
23172: loss=0.000, reward_mean=0.0, reward_bound=0.0
23173: loss=0.000, reward_mean=0.0, reward_bound=0.0
23174: loss=0.000, reward_mean=0.0, reward_bound=0.0
23175: loss=0.000, reward_mean=0.1, reward_bound=0.0
23176: loss=0.000, reward_mean=0.1, reward_bound=0.0
23177: loss=0.000, reward_mean=0.0, reward_bound=0.0
23178: loss=0.000, reward_mean=0.0, reward_bound=0.0
23179: loss=0.000, reward_mean=0.0, reward_bound=0.0
23180: loss=0.000, reward_mean=0.0, reward_bound=0.0
23181: loss=0.000, reward_mean=0.0, reward_bound=0.0
23182: loss=0.000, reward_mean=0.1, reward_bound=0.0
23183: loss=0.000, reward_mean=0.1, reward_bound=0.0
23184: loss=0.000, reward_mean=0.0, reward_bound=0.0
23185: loss=0.000, reward_mean=0.0, reward_bou

23322: loss=0.000, reward_mean=0.1, reward_bound=0.0
23323: loss=0.000, reward_mean=0.1, reward_bound=0.0
23324: loss=0.000, reward_mean=0.1, reward_bound=0.0
23325: loss=0.000, reward_mean=0.0, reward_bound=0.0
23326: loss=0.000, reward_mean=0.0, reward_bound=0.0
23327: loss=0.000, reward_mean=0.1, reward_bound=0.0
23328: loss=0.000, reward_mean=0.0, reward_bound=0.0
23329: loss=0.000, reward_mean=0.1, reward_bound=0.0
23330: loss=0.000, reward_mean=0.1, reward_bound=0.0
23331: loss=0.000, reward_mean=0.1, reward_bound=0.0
23332: loss=0.000, reward_mean=0.1, reward_bound=0.0
23333: loss=0.000, reward_mean=0.0, reward_bound=0.0
23334: loss=0.000, reward_mean=0.1, reward_bound=0.0
23335: loss=0.000, reward_mean=0.0, reward_bound=0.0
23336: loss=0.000, reward_mean=0.0, reward_bound=0.0
23337: loss=0.000, reward_mean=0.0, reward_bound=0.0
23338: loss=0.000, reward_mean=0.0, reward_bound=0.0
23339: loss=0.000, reward_mean=0.1, reward_bound=0.0
23340: loss=0.000, reward_mean=0.1, reward_bou

23481: loss=0.000, reward_mean=0.0, reward_bound=0.0
23482: loss=0.000, reward_mean=0.0, reward_bound=0.0
23483: loss=0.000, reward_mean=0.1, reward_bound=0.0
23484: loss=0.000, reward_mean=0.0, reward_bound=0.0
23485: loss=0.000, reward_mean=0.1, reward_bound=0.0
23486: loss=0.000, reward_mean=0.0, reward_bound=0.0
23487: loss=0.000, reward_mean=0.0, reward_bound=0.0
23488: loss=0.000, reward_mean=0.1, reward_bound=0.0
23489: loss=0.000, reward_mean=0.2, reward_bound=0.0
23490: loss=0.000, reward_mean=0.1, reward_bound=0.0
23491: loss=0.000, reward_mean=0.1, reward_bound=0.0
23492: loss=0.000, reward_mean=0.1, reward_bound=0.0
23493: loss=0.000, reward_mean=0.0, reward_bound=0.0
23494: loss=0.000, reward_mean=0.1, reward_bound=0.0
23495: loss=0.000, reward_mean=0.0, reward_bound=0.0
23496: loss=0.000, reward_mean=0.0, reward_bound=0.0
23497: loss=0.000, reward_mean=0.1, reward_bound=0.0
23498: loss=0.000, reward_mean=0.0, reward_bound=0.0
23499: loss=0.000, reward_mean=0.0, reward_bou

23641: loss=0.000, reward_mean=0.1, reward_bound=0.0
23642: loss=0.000, reward_mean=0.1, reward_bound=0.0
23643: loss=0.000, reward_mean=0.0, reward_bound=0.0
23644: loss=0.000, reward_mean=0.0, reward_bound=0.0
23645: loss=0.000, reward_mean=0.0, reward_bound=0.0
23646: loss=0.000, reward_mean=0.0, reward_bound=0.0
23647: loss=0.000, reward_mean=0.0, reward_bound=0.0
23648: loss=0.000, reward_mean=0.1, reward_bound=0.0
23649: loss=0.000, reward_mean=0.2, reward_bound=0.0
23650: loss=0.000, reward_mean=0.1, reward_bound=0.0
23651: loss=0.000, reward_mean=0.2, reward_bound=0.0
23652: loss=0.000, reward_mean=0.1, reward_bound=0.0
23653: loss=0.000, reward_mean=0.1, reward_bound=0.0
23654: loss=0.000, reward_mean=0.1, reward_bound=0.0
23655: loss=0.000, reward_mean=0.1, reward_bound=0.0
23656: loss=0.000, reward_mean=0.1, reward_bound=0.0
23657: loss=0.000, reward_mean=0.1, reward_bound=0.0
23658: loss=0.000, reward_mean=0.1, reward_bound=0.0
23659: loss=0.000, reward_mean=0.0, reward_bou

23798: loss=0.000, reward_mean=0.2, reward_bound=0.0
23799: loss=0.000, reward_mean=0.0, reward_bound=0.0
23800: loss=0.000, reward_mean=0.1, reward_bound=0.0
23801: loss=0.000, reward_mean=0.1, reward_bound=0.0
23802: loss=0.000, reward_mean=0.0, reward_bound=0.0
23803: loss=0.000, reward_mean=0.1, reward_bound=0.0
23804: loss=0.000, reward_mean=0.0, reward_bound=0.0
23805: loss=0.000, reward_mean=0.1, reward_bound=0.0
23806: loss=0.000, reward_mean=0.0, reward_bound=0.0
23807: loss=0.000, reward_mean=0.1, reward_bound=0.0
23808: loss=0.000, reward_mean=0.0, reward_bound=0.0
23809: loss=0.000, reward_mean=0.0, reward_bound=0.0
23810: loss=0.000, reward_mean=0.1, reward_bound=0.0
23811: loss=0.000, reward_mean=0.1, reward_bound=0.0
23812: loss=0.000, reward_mean=0.1, reward_bound=0.0
23813: loss=0.000, reward_mean=0.1, reward_bound=0.0
23814: loss=0.000, reward_mean=0.1, reward_bound=0.0
23815: loss=0.000, reward_mean=0.0, reward_bound=0.0
23816: loss=0.000, reward_mean=0.0, reward_bou

23956: loss=0.000, reward_mean=0.1, reward_bound=0.0
23957: loss=0.000, reward_mean=0.0, reward_bound=0.0
23958: loss=0.000, reward_mean=0.0, reward_bound=0.0
23959: loss=0.000, reward_mean=0.1, reward_bound=0.0
23960: loss=0.000, reward_mean=0.1, reward_bound=0.0
23961: loss=0.000, reward_mean=0.1, reward_bound=0.0
23962: loss=0.000, reward_mean=0.1, reward_bound=0.0
23963: loss=0.000, reward_mean=0.0, reward_bound=0.0
23964: loss=0.000, reward_mean=0.2, reward_bound=0.0
23965: loss=0.000, reward_mean=0.1, reward_bound=0.0
23966: loss=0.000, reward_mean=0.2, reward_bound=0.0
23967: loss=0.000, reward_mean=0.1, reward_bound=0.0
23968: loss=0.000, reward_mean=0.0, reward_bound=0.0
23969: loss=0.000, reward_mean=0.0, reward_bound=0.0
23970: loss=0.000, reward_mean=0.0, reward_bound=0.0
23971: loss=0.000, reward_mean=0.0, reward_bound=0.0
23972: loss=0.000, reward_mean=0.0, reward_bound=0.0
23973: loss=0.000, reward_mean=0.1, reward_bound=0.0
23974: loss=0.000, reward_mean=0.1, reward_bou

24115: loss=0.000, reward_mean=0.1, reward_bound=0.0
24116: loss=0.000, reward_mean=0.1, reward_bound=0.0
24117: loss=0.000, reward_mean=0.1, reward_bound=0.0
24118: loss=0.000, reward_mean=0.0, reward_bound=0.0
24119: loss=0.000, reward_mean=0.0, reward_bound=0.0
24120: loss=0.000, reward_mean=0.1, reward_bound=0.0
24121: loss=0.000, reward_mean=0.1, reward_bound=0.0
24122: loss=0.000, reward_mean=0.0, reward_bound=0.0
24123: loss=0.000, reward_mean=0.1, reward_bound=0.0
24124: loss=0.000, reward_mean=0.1, reward_bound=0.0
24125: loss=0.000, reward_mean=0.1, reward_bound=0.0
24126: loss=0.000, reward_mean=0.1, reward_bound=0.0
24127: loss=0.000, reward_mean=0.1, reward_bound=0.0
24128: loss=0.000, reward_mean=0.1, reward_bound=0.0
24129: loss=0.000, reward_mean=0.1, reward_bound=0.0
24130: loss=0.000, reward_mean=0.1, reward_bound=0.0
24131: loss=0.000, reward_mean=0.1, reward_bound=0.0
24132: loss=0.000, reward_mean=0.0, reward_bound=0.0
24133: loss=0.000, reward_mean=0.1, reward_bou

24274: loss=0.000, reward_mean=0.1, reward_bound=0.0
24275: loss=0.000, reward_mean=0.0, reward_bound=0.0
24276: loss=0.000, reward_mean=0.1, reward_bound=0.0
24277: loss=0.000, reward_mean=0.1, reward_bound=0.0
24278: loss=0.000, reward_mean=0.1, reward_bound=0.0
24279: loss=0.000, reward_mean=0.1, reward_bound=0.0
24280: loss=0.000, reward_mean=0.1, reward_bound=0.0
24281: loss=0.000, reward_mean=0.0, reward_bound=0.0
24282: loss=0.000, reward_mean=0.1, reward_bound=0.0
24283: loss=0.000, reward_mean=0.1, reward_bound=0.0
24284: loss=0.000, reward_mean=0.2, reward_bound=0.0
24285: loss=0.000, reward_mean=0.1, reward_bound=0.0
24286: loss=0.000, reward_mean=0.1, reward_bound=0.0
24287: loss=0.000, reward_mean=0.0, reward_bound=0.0
24288: loss=0.000, reward_mean=0.1, reward_bound=0.0
24289: loss=0.000, reward_mean=0.1, reward_bound=0.0
24290: loss=0.000, reward_mean=0.1, reward_bound=0.0
24291: loss=0.000, reward_mean=0.1, reward_bound=0.0
24292: loss=0.000, reward_mean=0.0, reward_bou

24430: loss=0.000, reward_mean=0.1, reward_bound=0.0
24431: loss=0.000, reward_mean=0.0, reward_bound=0.0
24432: loss=0.000, reward_mean=0.0, reward_bound=0.0
24433: loss=0.000, reward_mean=0.1, reward_bound=0.0
24434: loss=0.000, reward_mean=0.1, reward_bound=0.0
24435: loss=0.000, reward_mean=0.1, reward_bound=0.0
24436: loss=0.000, reward_mean=0.0, reward_bound=0.0
24437: loss=0.000, reward_mean=0.1, reward_bound=0.0
24438: loss=0.000, reward_mean=0.1, reward_bound=0.0
24439: loss=0.000, reward_mean=0.0, reward_bound=0.0
24440: loss=0.000, reward_mean=0.1, reward_bound=0.0
24441: loss=0.000, reward_mean=0.1, reward_bound=0.0
24442: loss=0.000, reward_mean=0.1, reward_bound=0.0
24443: loss=0.000, reward_mean=0.0, reward_bound=0.0
24444: loss=0.000, reward_mean=0.1, reward_bound=0.0
24445: loss=0.000, reward_mean=0.0, reward_bound=0.0
24446: loss=0.000, reward_mean=0.0, reward_bound=0.0
24447: loss=0.000, reward_mean=0.0, reward_bound=0.0
24448: loss=0.000, reward_mean=0.0, reward_bou

24585: loss=0.000, reward_mean=0.0, reward_bound=0.0
24586: loss=0.000, reward_mean=0.0, reward_bound=0.0
24587: loss=0.000, reward_mean=0.0, reward_bound=0.0
24588: loss=0.000, reward_mean=0.0, reward_bound=0.0
24589: loss=0.000, reward_mean=0.0, reward_bound=0.0
24590: loss=0.000, reward_mean=0.1, reward_bound=0.0
24591: loss=0.000, reward_mean=0.1, reward_bound=0.0
24592: loss=0.000, reward_mean=0.1, reward_bound=0.0
24593: loss=0.000, reward_mean=0.1, reward_bound=0.0
24594: loss=0.000, reward_mean=0.1, reward_bound=0.0
24595: loss=0.000, reward_mean=0.1, reward_bound=0.0
24596: loss=0.000, reward_mean=0.1, reward_bound=0.0
24597: loss=0.000, reward_mean=0.0, reward_bound=0.0
24598: loss=0.000, reward_mean=0.0, reward_bound=0.0
24599: loss=0.000, reward_mean=0.0, reward_bound=0.0
24600: loss=0.000, reward_mean=0.0, reward_bound=0.0
24601: loss=0.000, reward_mean=0.0, reward_bound=0.0
24602: loss=0.000, reward_mean=0.0, reward_bound=0.0
24603: loss=0.000, reward_mean=0.0, reward_bou

24741: loss=0.000, reward_mean=0.0, reward_bound=0.0
24742: loss=0.000, reward_mean=0.1, reward_bound=0.0
24743: loss=0.000, reward_mean=0.0, reward_bound=0.0
24744: loss=0.000, reward_mean=0.1, reward_bound=0.0
24745: loss=0.000, reward_mean=0.1, reward_bound=0.0
24746: loss=0.000, reward_mean=0.0, reward_bound=0.0
24747: loss=0.000, reward_mean=0.1, reward_bound=0.0
24748: loss=0.000, reward_mean=0.1, reward_bound=0.0
24749: loss=0.000, reward_mean=0.0, reward_bound=0.0
24750: loss=0.000, reward_mean=0.0, reward_bound=0.0
24751: loss=0.000, reward_mean=0.0, reward_bound=0.0
24752: loss=0.000, reward_mean=0.0, reward_bound=0.0
24753: loss=0.000, reward_mean=0.1, reward_bound=0.0
24754: loss=0.000, reward_mean=0.1, reward_bound=0.0
24755: loss=0.000, reward_mean=0.1, reward_bound=0.0
24756: loss=0.000, reward_mean=0.1, reward_bound=0.0
24757: loss=0.000, reward_mean=0.0, reward_bound=0.0
24758: loss=0.000, reward_mean=0.1, reward_bound=0.0
24759: loss=0.000, reward_mean=0.1, reward_bou

24899: loss=0.000, reward_mean=0.0, reward_bound=0.0
24900: loss=0.000, reward_mean=0.1, reward_bound=0.0
24901: loss=0.000, reward_mean=0.2, reward_bound=0.0
24902: loss=0.000, reward_mean=0.1, reward_bound=0.0
24903: loss=0.000, reward_mean=0.1, reward_bound=0.0
24904: loss=0.000, reward_mean=0.1, reward_bound=0.0
24905: loss=0.000, reward_mean=0.1, reward_bound=0.0
24906: loss=0.000, reward_mean=0.0, reward_bound=0.0
24907: loss=0.000, reward_mean=0.0, reward_bound=0.0
24908: loss=0.000, reward_mean=0.1, reward_bound=0.0
24909: loss=0.000, reward_mean=0.1, reward_bound=0.0
24910: loss=0.000, reward_mean=0.1, reward_bound=0.0
24911: loss=0.000, reward_mean=0.0, reward_bound=0.0
24912: loss=0.000, reward_mean=0.0, reward_bound=0.0
24913: loss=0.000, reward_mean=0.1, reward_bound=0.0
24914: loss=0.000, reward_mean=0.1, reward_bound=0.0
24915: loss=0.000, reward_mean=0.1, reward_bound=0.0
24916: loss=0.000, reward_mean=0.1, reward_bound=0.0
24917: loss=0.000, reward_mean=0.0, reward_bou

25055: loss=0.000, reward_mean=0.0, reward_bound=0.0
25056: loss=0.000, reward_mean=0.1, reward_bound=0.0
25057: loss=0.000, reward_mean=0.1, reward_bound=0.0
25058: loss=0.000, reward_mean=0.0, reward_bound=0.0
25059: loss=0.000, reward_mean=0.1, reward_bound=0.0
25060: loss=0.000, reward_mean=0.0, reward_bound=0.0
25061: loss=0.000, reward_mean=0.1, reward_bound=0.0
25062: loss=0.000, reward_mean=0.0, reward_bound=0.0
25063: loss=0.000, reward_mean=0.1, reward_bound=0.0
25064: loss=0.000, reward_mean=0.0, reward_bound=0.0
25065: loss=0.000, reward_mean=0.1, reward_bound=0.0
25066: loss=0.000, reward_mean=0.2, reward_bound=0.0
25067: loss=0.000, reward_mean=0.0, reward_bound=0.0
25068: loss=0.000, reward_mean=0.0, reward_bound=0.0
25069: loss=0.000, reward_mean=0.2, reward_bound=0.0
25070: loss=0.000, reward_mean=0.1, reward_bound=0.0
25071: loss=0.000, reward_mean=0.0, reward_bound=0.0
25072: loss=0.000, reward_mean=0.1, reward_bound=0.0
25073: loss=0.000, reward_mean=0.1, reward_bou

25214: loss=0.000, reward_mean=0.1, reward_bound=0.0
25215: loss=0.000, reward_mean=0.1, reward_bound=0.0
25216: loss=0.000, reward_mean=0.1, reward_bound=0.0
25217: loss=0.000, reward_mean=0.0, reward_bound=0.0
25218: loss=0.000, reward_mean=0.1, reward_bound=0.0
25219: loss=0.000, reward_mean=0.1, reward_bound=0.0
25220: loss=0.000, reward_mean=0.0, reward_bound=0.0
25221: loss=0.000, reward_mean=0.2, reward_bound=0.0
25222: loss=0.000, reward_mean=0.0, reward_bound=0.0
25223: loss=0.000, reward_mean=0.1, reward_bound=0.0
25224: loss=0.000, reward_mean=0.1, reward_bound=0.0
25225: loss=0.000, reward_mean=0.1, reward_bound=0.0
25226: loss=0.000, reward_mean=0.1, reward_bound=0.0
25227: loss=0.000, reward_mean=0.2, reward_bound=0.0
25228: loss=0.000, reward_mean=0.0, reward_bound=0.0
25229: loss=0.000, reward_mean=0.0, reward_bound=0.0
25230: loss=0.000, reward_mean=0.2, reward_bound=0.0
25231: loss=0.000, reward_mean=0.1, reward_bound=0.0
25232: loss=0.000, reward_mean=0.1, reward_bou

25371: loss=0.000, reward_mean=0.0, reward_bound=0.0
25372: loss=0.000, reward_mean=0.1, reward_bound=0.0
25373: loss=0.000, reward_mean=0.0, reward_bound=0.0
25374: loss=0.000, reward_mean=0.1, reward_bound=0.0
25375: loss=0.000, reward_mean=0.1, reward_bound=0.0
25376: loss=0.000, reward_mean=0.2, reward_bound=0.0
25377: loss=0.000, reward_mean=0.0, reward_bound=0.0
25378: loss=0.000, reward_mean=0.0, reward_bound=0.0
25379: loss=0.000, reward_mean=0.0, reward_bound=0.0
25380: loss=0.000, reward_mean=0.0, reward_bound=0.0
25381: loss=0.000, reward_mean=0.0, reward_bound=0.0
25382: loss=0.000, reward_mean=0.1, reward_bound=0.0
25383: loss=0.000, reward_mean=0.1, reward_bound=0.0
25384: loss=0.000, reward_mean=0.1, reward_bound=0.0
25385: loss=0.000, reward_mean=0.1, reward_bound=0.0
25386: loss=0.000, reward_mean=0.1, reward_bound=0.0
25387: loss=0.000, reward_mean=0.1, reward_bound=0.0
25388: loss=0.000, reward_mean=0.1, reward_bound=0.0
25389: loss=0.000, reward_mean=0.2, reward_bou

25528: loss=0.000, reward_mean=0.0, reward_bound=0.0
25529: loss=0.000, reward_mean=0.0, reward_bound=0.0
25530: loss=0.000, reward_mean=0.1, reward_bound=0.0
25531: loss=0.000, reward_mean=0.1, reward_bound=0.0
25532: loss=0.000, reward_mean=0.1, reward_bound=0.0
25533: loss=0.000, reward_mean=0.1, reward_bound=0.0
25534: loss=0.000, reward_mean=0.1, reward_bound=0.0
25535: loss=0.000, reward_mean=0.0, reward_bound=0.0
25536: loss=0.000, reward_mean=0.1, reward_bound=0.0
25537: loss=0.000, reward_mean=0.0, reward_bound=0.0
25538: loss=0.000, reward_mean=0.1, reward_bound=0.0
25539: loss=0.000, reward_mean=0.1, reward_bound=0.0
25540: loss=0.000, reward_mean=0.1, reward_bound=0.0
25541: loss=0.000, reward_mean=0.1, reward_bound=0.0
25542: loss=0.000, reward_mean=0.0, reward_bound=0.0
25543: loss=0.000, reward_mean=0.1, reward_bound=0.0
25544: loss=0.000, reward_mean=0.0, reward_bound=0.0
25545: loss=0.000, reward_mean=0.0, reward_bound=0.0
25546: loss=0.000, reward_mean=0.1, reward_bou

25688: loss=0.000, reward_mean=0.1, reward_bound=0.0
25689: loss=0.000, reward_mean=0.0, reward_bound=0.0
25690: loss=0.000, reward_mean=0.0, reward_bound=0.0
25691: loss=0.000, reward_mean=0.1, reward_bound=0.0
25692: loss=0.000, reward_mean=0.1, reward_bound=0.0
25693: loss=0.000, reward_mean=0.0, reward_bound=0.0
25694: loss=0.000, reward_mean=0.3, reward_bound=0.5
25695: loss=0.000, reward_mean=0.0, reward_bound=0.0
25696: loss=0.000, reward_mean=0.0, reward_bound=0.0
25697: loss=0.000, reward_mean=0.1, reward_bound=0.0
25698: loss=0.000, reward_mean=0.2, reward_bound=0.0
25699: loss=0.000, reward_mean=0.1, reward_bound=0.0
25700: loss=0.000, reward_mean=0.0, reward_bound=0.0
25701: loss=0.000, reward_mean=0.0, reward_bound=0.0
25702: loss=0.000, reward_mean=0.0, reward_bound=0.0
25703: loss=0.000, reward_mean=0.0, reward_bound=0.0
25704: loss=0.000, reward_mean=0.1, reward_bound=0.0
25705: loss=0.000, reward_mean=0.2, reward_bound=0.0
25706: loss=0.000, reward_mean=0.2, reward_bou

25843: loss=0.000, reward_mean=0.1, reward_bound=0.0
25844: loss=0.000, reward_mean=0.1, reward_bound=0.0
25845: loss=0.000, reward_mean=0.1, reward_bound=0.0
25846: loss=0.000, reward_mean=0.1, reward_bound=0.0
25847: loss=0.000, reward_mean=0.1, reward_bound=0.0
25848: loss=0.000, reward_mean=0.0, reward_bound=0.0
25849: loss=0.000, reward_mean=0.1, reward_bound=0.0
25850: loss=0.000, reward_mean=0.0, reward_bound=0.0
25851: loss=0.000, reward_mean=0.0, reward_bound=0.0
25852: loss=0.000, reward_mean=0.0, reward_bound=0.0
25853: loss=0.000, reward_mean=0.0, reward_bound=0.0
25854: loss=0.000, reward_mean=0.0, reward_bound=0.0
25855: loss=0.000, reward_mean=0.0, reward_bound=0.0
25856: loss=0.000, reward_mean=0.0, reward_bound=0.0
25857: loss=0.000, reward_mean=0.0, reward_bound=0.0
25858: loss=0.000, reward_mean=0.1, reward_bound=0.0
25859: loss=0.000, reward_mean=0.1, reward_bound=0.0
25860: loss=0.000, reward_mean=0.1, reward_bound=0.0
25861: loss=0.000, reward_mean=0.0, reward_bou

26002: loss=0.000, reward_mean=0.1, reward_bound=0.0
26003: loss=0.000, reward_mean=0.1, reward_bound=0.0
26004: loss=0.000, reward_mean=0.2, reward_bound=0.0
26005: loss=0.000, reward_mean=0.0, reward_bound=0.0
26006: loss=0.000, reward_mean=0.0, reward_bound=0.0
26007: loss=0.000, reward_mean=0.0, reward_bound=0.0
26008: loss=0.000, reward_mean=0.1, reward_bound=0.0
26009: loss=0.000, reward_mean=0.1, reward_bound=0.0
26010: loss=0.000, reward_mean=0.1, reward_bound=0.0
26011: loss=0.000, reward_mean=0.0, reward_bound=0.0
26012: loss=0.000, reward_mean=0.0, reward_bound=0.0
26013: loss=0.000, reward_mean=0.0, reward_bound=0.0
26014: loss=0.000, reward_mean=0.0, reward_bound=0.0
26015: loss=0.000, reward_mean=0.1, reward_bound=0.0
26016: loss=0.000, reward_mean=0.0, reward_bound=0.0
26017: loss=0.000, reward_mean=0.0, reward_bound=0.0
26018: loss=0.000, reward_mean=0.1, reward_bound=0.0
26019: loss=0.000, reward_mean=0.1, reward_bound=0.0
26020: loss=0.000, reward_mean=0.1, reward_bou

26157: loss=0.000, reward_mean=0.2, reward_bound=0.0
26158: loss=0.000, reward_mean=0.1, reward_bound=0.0
26159: loss=0.000, reward_mean=0.2, reward_bound=0.0
26160: loss=0.000, reward_mean=0.1, reward_bound=0.0
26161: loss=0.000, reward_mean=0.0, reward_bound=0.0
26162: loss=0.000, reward_mean=0.1, reward_bound=0.0
26163: loss=0.000, reward_mean=0.2, reward_bound=0.0
26164: loss=0.000, reward_mean=0.1, reward_bound=0.0
26165: loss=0.000, reward_mean=0.0, reward_bound=0.0
26166: loss=0.000, reward_mean=0.0, reward_bound=0.0
26167: loss=0.000, reward_mean=0.0, reward_bound=0.0
26168: loss=0.000, reward_mean=0.0, reward_bound=0.0
26169: loss=0.000, reward_mean=0.0, reward_bound=0.0
26170: loss=0.000, reward_mean=0.0, reward_bound=0.0
26171: loss=0.000, reward_mean=0.1, reward_bound=0.0
26172: loss=0.000, reward_mean=0.2, reward_bound=0.0
26173: loss=0.000, reward_mean=0.0, reward_bound=0.0
26174: loss=0.000, reward_mean=0.1, reward_bound=0.0
26175: loss=0.000, reward_mean=0.0, reward_bou

26312: loss=0.000, reward_mean=0.0, reward_bound=0.0
26313: loss=0.000, reward_mean=0.1, reward_bound=0.0
26314: loss=0.000, reward_mean=0.0, reward_bound=0.0
26315: loss=0.000, reward_mean=0.1, reward_bound=0.0
26316: loss=0.000, reward_mean=0.1, reward_bound=0.0
26317: loss=0.000, reward_mean=0.1, reward_bound=0.0
26318: loss=0.000, reward_mean=0.0, reward_bound=0.0
26319: loss=0.000, reward_mean=0.1, reward_bound=0.0
26320: loss=0.000, reward_mean=0.1, reward_bound=0.0
26321: loss=0.000, reward_mean=0.0, reward_bound=0.0
26322: loss=0.000, reward_mean=0.2, reward_bound=0.0
26323: loss=0.000, reward_mean=0.0, reward_bound=0.0
26324: loss=0.000, reward_mean=0.1, reward_bound=0.0
26325: loss=0.000, reward_mean=0.1, reward_bound=0.0
26326: loss=0.000, reward_mean=0.0, reward_bound=0.0
26327: loss=0.000, reward_mean=0.1, reward_bound=0.0
26328: loss=0.000, reward_mean=0.1, reward_bound=0.0
26329: loss=0.000, reward_mean=0.1, reward_bound=0.0
26330: loss=0.000, reward_mean=0.0, reward_bou

26473: loss=0.000, reward_mean=0.1, reward_bound=0.0
26474: loss=0.000, reward_mean=0.0, reward_bound=0.0
26475: loss=0.000, reward_mean=0.1, reward_bound=0.0
26476: loss=0.000, reward_mean=0.1, reward_bound=0.0
26477: loss=0.000, reward_mean=0.1, reward_bound=0.0
26478: loss=0.000, reward_mean=0.0, reward_bound=0.0
26479: loss=0.000, reward_mean=0.0, reward_bound=0.0
26480: loss=0.000, reward_mean=0.0, reward_bound=0.0
26481: loss=0.000, reward_mean=0.1, reward_bound=0.0
26482: loss=0.000, reward_mean=0.2, reward_bound=0.0
26483: loss=0.000, reward_mean=0.0, reward_bound=0.0
26484: loss=0.000, reward_mean=0.0, reward_bound=0.0
26485: loss=0.000, reward_mean=0.1, reward_bound=0.0
26486: loss=0.000, reward_mean=0.0, reward_bound=0.0
26487: loss=0.000, reward_mean=0.1, reward_bound=0.0
26488: loss=0.000, reward_mean=0.1, reward_bound=0.0
26489: loss=0.000, reward_mean=0.0, reward_bound=0.0
26490: loss=0.000, reward_mean=0.1, reward_bound=0.0
26491: loss=0.000, reward_mean=0.1, reward_bou

26632: loss=0.000, reward_mean=0.1, reward_bound=0.0
26633: loss=0.000, reward_mean=0.1, reward_bound=0.0
26634: loss=0.000, reward_mean=0.0, reward_bound=0.0
26635: loss=0.000, reward_mean=0.1, reward_bound=0.0
26636: loss=0.000, reward_mean=0.1, reward_bound=0.0
26637: loss=0.000, reward_mean=0.1, reward_bound=0.0
26638: loss=0.000, reward_mean=0.1, reward_bound=0.0
26639: loss=0.000, reward_mean=0.0, reward_bound=0.0
26640: loss=0.000, reward_mean=0.1, reward_bound=0.0
26641: loss=0.000, reward_mean=0.0, reward_bound=0.0
26642: loss=0.000, reward_mean=0.0, reward_bound=0.0
26643: loss=0.000, reward_mean=0.1, reward_bound=0.0
26644: loss=0.000, reward_mean=0.0, reward_bound=0.0
26645: loss=0.000, reward_mean=0.2, reward_bound=0.0
26646: loss=0.000, reward_mean=0.1, reward_bound=0.0
26647: loss=0.000, reward_mean=0.1, reward_bound=0.0
26648: loss=0.000, reward_mean=0.1, reward_bound=0.0
26649: loss=0.000, reward_mean=0.0, reward_bound=0.0
26650: loss=0.000, reward_mean=0.1, reward_bou

26789: loss=0.000, reward_mean=0.1, reward_bound=0.0
26790: loss=0.000, reward_mean=0.1, reward_bound=0.0
26791: loss=0.000, reward_mean=0.1, reward_bound=0.0
26792: loss=0.000, reward_mean=0.0, reward_bound=0.0
26793: loss=0.000, reward_mean=0.1, reward_bound=0.0
26794: loss=0.000, reward_mean=0.1, reward_bound=0.0
26795: loss=0.000, reward_mean=0.1, reward_bound=0.0
26796: loss=0.000, reward_mean=0.1, reward_bound=0.0
26797: loss=0.000, reward_mean=0.2, reward_bound=0.0
26798: loss=0.000, reward_mean=0.1, reward_bound=0.0
26799: loss=0.000, reward_mean=0.1, reward_bound=0.0
26800: loss=0.000, reward_mean=0.0, reward_bound=0.0
26801: loss=0.000, reward_mean=0.0, reward_bound=0.0
26802: loss=0.000, reward_mean=0.1, reward_bound=0.0
26803: loss=0.000, reward_mean=0.1, reward_bound=0.0
26804: loss=0.000, reward_mean=0.1, reward_bound=0.0
26805: loss=0.000, reward_mean=0.0, reward_bound=0.0
26806: loss=0.000, reward_mean=0.1, reward_bound=0.0
26807: loss=0.000, reward_mean=0.1, reward_bou

26945: loss=0.000, reward_mean=0.0, reward_bound=0.0
26946: loss=0.000, reward_mean=0.0, reward_bound=0.0
26947: loss=0.000, reward_mean=0.1, reward_bound=0.0
26948: loss=0.000, reward_mean=0.1, reward_bound=0.0
26949: loss=0.000, reward_mean=0.1, reward_bound=0.0
26950: loss=0.000, reward_mean=0.0, reward_bound=0.0
26951: loss=0.000, reward_mean=0.1, reward_bound=0.0
26952: loss=0.000, reward_mean=0.0, reward_bound=0.0
26953: loss=0.000, reward_mean=0.0, reward_bound=0.0
26954: loss=0.000, reward_mean=0.1, reward_bound=0.0
26955: loss=0.000, reward_mean=0.0, reward_bound=0.0
26956: loss=0.000, reward_mean=0.1, reward_bound=0.0
26957: loss=0.000, reward_mean=0.1, reward_bound=0.0
26958: loss=0.000, reward_mean=0.1, reward_bound=0.0
26959: loss=0.000, reward_mean=0.2, reward_bound=0.0
26960: loss=0.000, reward_mean=0.0, reward_bound=0.0
26961: loss=0.000, reward_mean=0.0, reward_bound=0.0
26962: loss=0.000, reward_mean=0.0, reward_bound=0.0
26963: loss=0.000, reward_mean=0.1, reward_bou

27101: loss=0.000, reward_mean=0.1, reward_bound=0.0
27102: loss=0.000, reward_mean=0.0, reward_bound=0.0
27103: loss=0.000, reward_mean=0.1, reward_bound=0.0
27104: loss=0.000, reward_mean=0.1, reward_bound=0.0
27105: loss=0.000, reward_mean=0.1, reward_bound=0.0
27106: loss=0.000, reward_mean=0.1, reward_bound=0.0
27107: loss=0.000, reward_mean=0.0, reward_bound=0.0
27108: loss=0.000, reward_mean=0.0, reward_bound=0.0
27109: loss=0.000, reward_mean=0.1, reward_bound=0.0
27110: loss=0.000, reward_mean=0.2, reward_bound=0.0
27111: loss=0.000, reward_mean=0.1, reward_bound=0.0
27112: loss=0.000, reward_mean=0.1, reward_bound=0.0
27113: loss=0.000, reward_mean=0.1, reward_bound=0.0
27114: loss=0.000, reward_mean=0.1, reward_bound=0.0
27115: loss=0.000, reward_mean=0.1, reward_bound=0.0
27116: loss=0.000, reward_mean=0.0, reward_bound=0.0
27117: loss=0.000, reward_mean=0.1, reward_bound=0.0
27118: loss=0.000, reward_mean=0.0, reward_bound=0.0
27119: loss=0.000, reward_mean=0.0, reward_bou

27262: loss=0.000, reward_mean=0.1, reward_bound=0.0
27263: loss=0.000, reward_mean=0.1, reward_bound=0.0
27264: loss=0.000, reward_mean=0.1, reward_bound=0.0
27265: loss=0.000, reward_mean=0.1, reward_bound=0.0
27266: loss=0.000, reward_mean=0.2, reward_bound=0.0
27267: loss=0.000, reward_mean=0.1, reward_bound=0.0
27268: loss=0.000, reward_mean=0.0, reward_bound=0.0
27269: loss=0.000, reward_mean=0.0, reward_bound=0.0
27270: loss=0.000, reward_mean=0.1, reward_bound=0.0
27271: loss=0.000, reward_mean=0.0, reward_bound=0.0
27272: loss=0.000, reward_mean=0.0, reward_bound=0.0
27273: loss=0.000, reward_mean=0.1, reward_bound=0.0
27274: loss=0.000, reward_mean=0.1, reward_bound=0.0
27275: loss=0.000, reward_mean=0.0, reward_bound=0.0
27276: loss=0.000, reward_mean=0.1, reward_bound=0.0
27277: loss=0.000, reward_mean=0.1, reward_bound=0.0
27278: loss=0.000, reward_mean=0.1, reward_bound=0.0
27279: loss=0.000, reward_mean=0.1, reward_bound=0.0
27280: loss=0.000, reward_mean=0.0, reward_bou

27422: loss=0.000, reward_mean=0.0, reward_bound=0.0
27423: loss=0.000, reward_mean=0.1, reward_bound=0.0
27424: loss=0.000, reward_mean=0.1, reward_bound=0.0
27425: loss=0.000, reward_mean=0.0, reward_bound=0.0
27426: loss=0.000, reward_mean=0.1, reward_bound=0.0
27427: loss=0.000, reward_mean=0.1, reward_bound=0.0
27428: loss=0.000, reward_mean=0.0, reward_bound=0.0
27429: loss=0.000, reward_mean=0.1, reward_bound=0.0
27430: loss=0.000, reward_mean=0.1, reward_bound=0.0
27431: loss=0.000, reward_mean=0.1, reward_bound=0.0
27432: loss=0.000, reward_mean=0.1, reward_bound=0.0
27433: loss=0.000, reward_mean=0.0, reward_bound=0.0
27434: loss=0.000, reward_mean=0.1, reward_bound=0.0
27435: loss=0.000, reward_mean=0.1, reward_bound=0.0
27436: loss=0.000, reward_mean=0.1, reward_bound=0.0
27437: loss=0.000, reward_mean=0.1, reward_bound=0.0
27438: loss=0.000, reward_mean=0.1, reward_bound=0.0
27439: loss=0.000, reward_mean=0.1, reward_bound=0.0
27440: loss=0.000, reward_mean=0.1, reward_bou

27578: loss=0.000, reward_mean=0.0, reward_bound=0.0
27579: loss=0.000, reward_mean=0.1, reward_bound=0.0
27580: loss=0.000, reward_mean=0.0, reward_bound=0.0
27581: loss=0.000, reward_mean=0.0, reward_bound=0.0
27582: loss=0.000, reward_mean=0.0, reward_bound=0.0
27583: loss=0.000, reward_mean=0.0, reward_bound=0.0
27584: loss=0.000, reward_mean=0.0, reward_bound=0.0
27585: loss=0.000, reward_mean=0.2, reward_bound=0.0
27586: loss=0.000, reward_mean=0.0, reward_bound=0.0
27587: loss=0.000, reward_mean=0.1, reward_bound=0.0
27588: loss=0.000, reward_mean=0.0, reward_bound=0.0
27589: loss=0.000, reward_mean=0.0, reward_bound=0.0
27590: loss=0.000, reward_mean=0.0, reward_bound=0.0
27591: loss=0.000, reward_mean=0.1, reward_bound=0.0
27592: loss=0.000, reward_mean=0.1, reward_bound=0.0
27593: loss=0.000, reward_mean=0.1, reward_bound=0.0
27594: loss=0.000, reward_mean=0.1, reward_bound=0.0
27595: loss=0.000, reward_mean=0.1, reward_bound=0.0
27596: loss=0.000, reward_mean=0.0, reward_bou

27735: loss=0.000, reward_mean=0.0, reward_bound=0.0
27736: loss=0.000, reward_mean=0.0, reward_bound=0.0
27737: loss=0.000, reward_mean=0.1, reward_bound=0.0
27738: loss=0.000, reward_mean=0.1, reward_bound=0.0
27739: loss=0.000, reward_mean=0.0, reward_bound=0.0
27740: loss=0.000, reward_mean=0.1, reward_bound=0.0
27741: loss=0.000, reward_mean=0.1, reward_bound=0.0
27742: loss=0.000, reward_mean=0.0, reward_bound=0.0
27743: loss=0.000, reward_mean=0.1, reward_bound=0.0
27744: loss=0.000, reward_mean=0.1, reward_bound=0.0
27745: loss=0.000, reward_mean=0.0, reward_bound=0.0
27746: loss=0.000, reward_mean=0.1, reward_bound=0.0
27747: loss=0.000, reward_mean=0.1, reward_bound=0.0
27748: loss=0.000, reward_mean=0.1, reward_bound=0.0
27749: loss=0.000, reward_mean=0.1, reward_bound=0.0
27750: loss=0.000, reward_mean=0.1, reward_bound=0.0
27751: loss=0.000, reward_mean=0.0, reward_bound=0.0
27752: loss=0.000, reward_mean=0.0, reward_bound=0.0
27753: loss=0.000, reward_mean=0.1, reward_bou

27892: loss=0.000, reward_mean=0.1, reward_bound=0.0
27893: loss=0.000, reward_mean=0.1, reward_bound=0.0
27894: loss=0.000, reward_mean=0.1, reward_bound=0.0
27895: loss=0.000, reward_mean=0.1, reward_bound=0.0
27896: loss=0.000, reward_mean=0.0, reward_bound=0.0
27897: loss=0.000, reward_mean=0.0, reward_bound=0.0
27898: loss=0.000, reward_mean=0.1, reward_bound=0.0
27899: loss=0.000, reward_mean=0.2, reward_bound=0.0
27900: loss=0.000, reward_mean=0.1, reward_bound=0.0
27901: loss=0.000, reward_mean=0.1, reward_bound=0.0
27902: loss=0.000, reward_mean=0.0, reward_bound=0.0
27903: loss=0.000, reward_mean=0.2, reward_bound=0.0
27904: loss=0.000, reward_mean=0.1, reward_bound=0.0
27905: loss=0.000, reward_mean=0.1, reward_bound=0.0
27906: loss=0.000, reward_mean=0.1, reward_bound=0.0
27907: loss=0.000, reward_mean=0.1, reward_bound=0.0
27908: loss=0.000, reward_mean=0.1, reward_bound=0.0
27909: loss=0.000, reward_mean=0.0, reward_bound=0.0
27910: loss=0.000, reward_mean=0.1, reward_bou

28049: loss=0.000, reward_mean=0.1, reward_bound=0.0
28050: loss=0.000, reward_mean=0.0, reward_bound=0.0
28051: loss=0.000, reward_mean=0.1, reward_bound=0.0
28052: loss=0.000, reward_mean=0.0, reward_bound=0.0
28053: loss=0.000, reward_mean=0.1, reward_bound=0.0
28054: loss=0.000, reward_mean=0.0, reward_bound=0.0
28055: loss=0.000, reward_mean=0.1, reward_bound=0.0
28056: loss=0.000, reward_mean=0.1, reward_bound=0.0
28057: loss=0.000, reward_mean=0.0, reward_bound=0.0
28058: loss=0.000, reward_mean=0.0, reward_bound=0.0
28059: loss=0.000, reward_mean=0.1, reward_bound=0.0
28060: loss=0.000, reward_mean=0.1, reward_bound=0.0
28061: loss=0.000, reward_mean=0.0, reward_bound=0.0
28062: loss=0.000, reward_mean=0.1, reward_bound=0.0
28063: loss=0.000, reward_mean=0.0, reward_bound=0.0
28064: loss=0.000, reward_mean=0.1, reward_bound=0.0
28065: loss=0.000, reward_mean=0.1, reward_bound=0.0
28066: loss=0.000, reward_mean=0.1, reward_bound=0.0
28067: loss=0.000, reward_mean=0.1, reward_bou

28204: loss=0.000, reward_mean=0.0, reward_bound=0.0
28205: loss=0.000, reward_mean=0.1, reward_bound=0.0
28206: loss=0.000, reward_mean=0.0, reward_bound=0.0
28207: loss=0.000, reward_mean=0.1, reward_bound=0.0
28208: loss=0.000, reward_mean=0.1, reward_bound=0.0
28209: loss=0.000, reward_mean=0.0, reward_bound=0.0
28210: loss=0.000, reward_mean=0.1, reward_bound=0.0
28211: loss=0.000, reward_mean=0.1, reward_bound=0.0
28212: loss=0.000, reward_mean=0.0, reward_bound=0.0
28213: loss=0.000, reward_mean=0.0, reward_bound=0.0
28214: loss=0.000, reward_mean=0.0, reward_bound=0.0
28215: loss=0.000, reward_mean=0.0, reward_bound=0.0
28216: loss=0.000, reward_mean=0.0, reward_bound=0.0
28217: loss=0.000, reward_mean=0.0, reward_bound=0.0
28218: loss=0.000, reward_mean=0.0, reward_bound=0.0
28219: loss=0.000, reward_mean=0.1, reward_bound=0.0
28220: loss=0.000, reward_mean=0.1, reward_bound=0.0
28221: loss=0.000, reward_mean=0.0, reward_bound=0.0
28222: loss=0.000, reward_mean=0.0, reward_bou

28359: loss=0.000, reward_mean=0.1, reward_bound=0.0
28360: loss=0.000, reward_mean=0.0, reward_bound=0.0
28361: loss=0.000, reward_mean=0.2, reward_bound=0.0
28362: loss=0.000, reward_mean=0.1, reward_bound=0.0
28363: loss=0.000, reward_mean=0.1, reward_bound=0.0
28364: loss=0.000, reward_mean=0.0, reward_bound=0.0
28365: loss=0.000, reward_mean=0.0, reward_bound=0.0
28366: loss=0.000, reward_mean=0.0, reward_bound=0.0
28367: loss=0.000, reward_mean=0.1, reward_bound=0.0
28368: loss=0.000, reward_mean=0.1, reward_bound=0.0
28369: loss=0.000, reward_mean=0.0, reward_bound=0.0
28370: loss=0.000, reward_mean=0.1, reward_bound=0.0
28371: loss=0.000, reward_mean=0.1, reward_bound=0.0
28372: loss=0.000, reward_mean=0.0, reward_bound=0.0
28373: loss=0.000, reward_mean=0.1, reward_bound=0.0
28374: loss=0.000, reward_mean=0.1, reward_bound=0.0
28375: loss=0.000, reward_mean=0.2, reward_bound=0.0
28376: loss=0.000, reward_mean=0.0, reward_bound=0.0
28377: loss=0.000, reward_mean=0.0, reward_bou

28518: loss=0.000, reward_mean=0.1, reward_bound=0.0
28519: loss=0.000, reward_mean=0.0, reward_bound=0.0
28520: loss=0.000, reward_mean=0.1, reward_bound=0.0
28521: loss=0.000, reward_mean=0.0, reward_bound=0.0
28522: loss=0.000, reward_mean=0.1, reward_bound=0.0
28523: loss=0.000, reward_mean=0.0, reward_bound=0.0
28524: loss=0.000, reward_mean=0.1, reward_bound=0.0
28525: loss=0.000, reward_mean=0.0, reward_bound=0.0
28526: loss=0.000, reward_mean=0.0, reward_bound=0.0
28527: loss=0.000, reward_mean=0.1, reward_bound=0.0
28528: loss=0.000, reward_mean=0.1, reward_bound=0.0
28529: loss=0.000, reward_mean=0.0, reward_bound=0.0
28530: loss=0.000, reward_mean=0.0, reward_bound=0.0
28531: loss=0.000, reward_mean=0.1, reward_bound=0.0
28532: loss=0.000, reward_mean=0.0, reward_bound=0.0
28533: loss=0.000, reward_mean=0.1, reward_bound=0.0
28534: loss=0.000, reward_mean=0.0, reward_bound=0.0
28535: loss=0.000, reward_mean=0.1, reward_bound=0.0
28536: loss=0.000, reward_mean=0.1, reward_bou

28674: loss=0.000, reward_mean=0.2, reward_bound=0.0
28675: loss=0.000, reward_mean=0.0, reward_bound=0.0
28676: loss=0.000, reward_mean=0.0, reward_bound=0.0
28677: loss=0.000, reward_mean=0.1, reward_bound=0.0
28678: loss=0.000, reward_mean=0.1, reward_bound=0.0
28679: loss=0.000, reward_mean=0.1, reward_bound=0.0
28680: loss=0.000, reward_mean=0.1, reward_bound=0.0
28681: loss=0.000, reward_mean=0.1, reward_bound=0.0
28682: loss=0.000, reward_mean=0.1, reward_bound=0.0
28683: loss=0.000, reward_mean=0.1, reward_bound=0.0
28684: loss=0.000, reward_mean=0.1, reward_bound=0.0
28685: loss=0.000, reward_mean=0.0, reward_bound=0.0
28686: loss=0.000, reward_mean=0.1, reward_bound=0.0
28687: loss=0.000, reward_mean=0.2, reward_bound=0.0
28688: loss=0.000, reward_mean=0.1, reward_bound=0.0
28689: loss=0.000, reward_mean=0.0, reward_bound=0.0
28690: loss=0.000, reward_mean=0.0, reward_bound=0.0
28691: loss=0.000, reward_mean=0.0, reward_bound=0.0
28692: loss=0.000, reward_mean=0.1, reward_bou

28830: loss=0.000, reward_mean=0.0, reward_bound=0.0
28831: loss=0.000, reward_mean=0.1, reward_bound=0.0
28832: loss=0.000, reward_mean=0.1, reward_bound=0.0
28833: loss=0.000, reward_mean=0.1, reward_bound=0.0
28834: loss=0.000, reward_mean=0.0, reward_bound=0.0
28835: loss=0.000, reward_mean=0.1, reward_bound=0.0
28836: loss=0.000, reward_mean=0.0, reward_bound=0.0
28837: loss=0.000, reward_mean=0.0, reward_bound=0.0
28838: loss=0.000, reward_mean=0.1, reward_bound=0.0
28839: loss=0.000, reward_mean=0.1, reward_bound=0.0
28840: loss=0.000, reward_mean=0.0, reward_bound=0.0
28841: loss=0.000, reward_mean=0.0, reward_bound=0.0
28842: loss=0.000, reward_mean=0.1, reward_bound=0.0
28843: loss=0.000, reward_mean=0.1, reward_bound=0.0
28844: loss=0.000, reward_mean=0.0, reward_bound=0.0
28845: loss=0.000, reward_mean=0.0, reward_bound=0.0
28846: loss=0.000, reward_mean=0.0, reward_bound=0.0
28847: loss=0.000, reward_mean=0.1, reward_bound=0.0
28848: loss=0.000, reward_mean=0.0, reward_bou

28987: loss=0.000, reward_mean=0.0, reward_bound=0.0
28988: loss=0.000, reward_mean=0.0, reward_bound=0.0
28989: loss=0.000, reward_mean=0.0, reward_bound=0.0
28990: loss=0.000, reward_mean=0.1, reward_bound=0.0
28991: loss=0.000, reward_mean=0.1, reward_bound=0.0
28992: loss=0.000, reward_mean=0.1, reward_bound=0.0
28993: loss=0.000, reward_mean=0.0, reward_bound=0.0
28994: loss=0.000, reward_mean=0.1, reward_bound=0.0
28995: loss=0.000, reward_mean=0.1, reward_bound=0.0
28996: loss=0.000, reward_mean=0.1, reward_bound=0.0
28997: loss=0.000, reward_mean=0.1, reward_bound=0.0
28998: loss=0.000, reward_mean=0.1, reward_bound=0.0
28999: loss=0.000, reward_mean=0.1, reward_bound=0.0
29000: loss=0.000, reward_mean=0.1, reward_bound=0.0
29001: loss=0.000, reward_mean=0.1, reward_bound=0.0
29002: loss=0.000, reward_mean=0.1, reward_bound=0.0
29003: loss=0.000, reward_mean=0.0, reward_bound=0.0
29004: loss=0.000, reward_mean=0.0, reward_bound=0.0
29005: loss=0.000, reward_mean=0.1, reward_bou

29142: loss=0.000, reward_mean=0.1, reward_bound=0.0
29143: loss=0.000, reward_mean=0.0, reward_bound=0.0
29144: loss=0.000, reward_mean=0.0, reward_bound=0.0
29145: loss=0.000, reward_mean=0.1, reward_bound=0.0
29146: loss=0.000, reward_mean=0.1, reward_bound=0.0
29147: loss=0.000, reward_mean=0.0, reward_bound=0.0
29148: loss=0.000, reward_mean=0.1, reward_bound=0.0
29149: loss=0.000, reward_mean=0.1, reward_bound=0.0
29150: loss=0.000, reward_mean=0.2, reward_bound=0.0
29151: loss=0.000, reward_mean=0.1, reward_bound=0.0
29152: loss=0.000, reward_mean=0.0, reward_bound=0.0
29153: loss=0.000, reward_mean=0.2, reward_bound=0.0
29154: loss=0.000, reward_mean=0.1, reward_bound=0.0
29155: loss=0.000, reward_mean=0.1, reward_bound=0.0
29156: loss=0.000, reward_mean=0.0, reward_bound=0.0
29157: loss=0.000, reward_mean=0.1, reward_bound=0.0
29158: loss=0.000, reward_mean=0.0, reward_bound=0.0
29159: loss=0.000, reward_mean=0.0, reward_bound=0.0
29160: loss=0.000, reward_mean=0.1, reward_bou

29297: loss=0.000, reward_mean=0.0, reward_bound=0.0
29298: loss=0.000, reward_mean=0.2, reward_bound=0.0
29299: loss=0.000, reward_mean=0.0, reward_bound=0.0
29300: loss=0.000, reward_mean=0.0, reward_bound=0.0
29301: loss=0.000, reward_mean=0.0, reward_bound=0.0
29302: loss=0.000, reward_mean=0.0, reward_bound=0.0
29303: loss=0.000, reward_mean=0.0, reward_bound=0.0
29304: loss=0.000, reward_mean=0.1, reward_bound=0.0
29305: loss=0.000, reward_mean=0.2, reward_bound=0.0
29306: loss=0.000, reward_mean=0.1, reward_bound=0.0
29307: loss=0.000, reward_mean=0.2, reward_bound=0.0
29308: loss=0.000, reward_mean=0.0, reward_bound=0.0
29309: loss=0.000, reward_mean=0.1, reward_bound=0.0
29310: loss=0.000, reward_mean=0.1, reward_bound=0.0
29311: loss=0.000, reward_mean=0.1, reward_bound=0.0
29312: loss=0.000, reward_mean=0.0, reward_bound=0.0
29313: loss=0.000, reward_mean=0.0, reward_bound=0.0
29314: loss=0.000, reward_mean=0.0, reward_bound=0.0
29315: loss=0.000, reward_mean=0.1, reward_bou

29453: loss=0.000, reward_mean=0.0, reward_bound=0.0
29454: loss=0.000, reward_mean=0.0, reward_bound=0.0
29455: loss=0.000, reward_mean=0.0, reward_bound=0.0
29456: loss=0.000, reward_mean=0.0, reward_bound=0.0
29457: loss=0.000, reward_mean=0.1, reward_bound=0.0
29458: loss=0.000, reward_mean=0.0, reward_bound=0.0
29459: loss=0.000, reward_mean=0.0, reward_bound=0.0
29460: loss=0.000, reward_mean=0.0, reward_bound=0.0
29461: loss=0.000, reward_mean=0.1, reward_bound=0.0
29462: loss=0.000, reward_mean=0.0, reward_bound=0.0
29463: loss=0.000, reward_mean=0.0, reward_bound=0.0
29464: loss=0.000, reward_mean=0.0, reward_bound=0.0
29465: loss=0.000, reward_mean=0.0, reward_bound=0.0
29466: loss=0.000, reward_mean=0.1, reward_bound=0.0
29467: loss=0.000, reward_mean=0.1, reward_bound=0.0
29468: loss=0.000, reward_mean=0.1, reward_bound=0.0
29469: loss=0.000, reward_mean=0.1, reward_bound=0.0
29470: loss=0.000, reward_mean=0.0, reward_bound=0.0
29471: loss=0.000, reward_mean=0.1, reward_bou

29615: loss=0.000, reward_mean=0.1, reward_bound=0.0
29616: loss=0.000, reward_mean=0.0, reward_bound=0.0
29617: loss=0.000, reward_mean=0.1, reward_bound=0.0
29618: loss=0.000, reward_mean=0.0, reward_bound=0.0
29619: loss=0.000, reward_mean=0.1, reward_bound=0.0
29620: loss=0.000, reward_mean=0.1, reward_bound=0.0
29621: loss=0.000, reward_mean=0.2, reward_bound=0.0
29622: loss=0.000, reward_mean=0.1, reward_bound=0.0
29623: loss=0.000, reward_mean=0.0, reward_bound=0.0
29624: loss=0.000, reward_mean=0.1, reward_bound=0.0
29625: loss=0.000, reward_mean=0.0, reward_bound=0.0
29626: loss=0.000, reward_mean=0.1, reward_bound=0.0
29627: loss=0.000, reward_mean=0.1, reward_bound=0.0
29628: loss=0.000, reward_mean=0.0, reward_bound=0.0
29629: loss=0.000, reward_mean=0.1, reward_bound=0.0
29630: loss=0.000, reward_mean=0.0, reward_bound=0.0
29631: loss=0.000, reward_mean=0.1, reward_bound=0.0
29632: loss=0.000, reward_mean=0.2, reward_bound=0.0
29633: loss=0.000, reward_mean=0.0, reward_bou

29770: loss=0.000, reward_mean=0.0, reward_bound=0.0
29771: loss=0.000, reward_mean=0.1, reward_bound=0.0
29772: loss=0.000, reward_mean=0.1, reward_bound=0.0
29773: loss=0.000, reward_mean=0.1, reward_bound=0.0
29774: loss=0.000, reward_mean=0.0, reward_bound=0.0
29775: loss=0.000, reward_mean=0.1, reward_bound=0.0
29776: loss=0.000, reward_mean=0.1, reward_bound=0.0
29777: loss=0.000, reward_mean=0.1, reward_bound=0.0
29778: loss=0.000, reward_mean=0.0, reward_bound=0.0
29779: loss=0.000, reward_mean=0.0, reward_bound=0.0
29780: loss=0.000, reward_mean=0.0, reward_bound=0.0
29781: loss=0.000, reward_mean=0.2, reward_bound=0.0
29782: loss=0.000, reward_mean=0.0, reward_bound=0.0
29783: loss=0.000, reward_mean=0.1, reward_bound=0.0
29784: loss=0.000, reward_mean=0.0, reward_bound=0.0
29785: loss=0.000, reward_mean=0.0, reward_bound=0.0
29786: loss=0.000, reward_mean=0.1, reward_bound=0.0
29787: loss=0.000, reward_mean=0.0, reward_bound=0.0
29788: loss=0.000, reward_mean=0.1, reward_bou

29927: loss=0.000, reward_mean=0.1, reward_bound=0.0
29928: loss=0.000, reward_mean=0.1, reward_bound=0.0
29929: loss=0.000, reward_mean=0.1, reward_bound=0.0
29930: loss=0.000, reward_mean=0.0, reward_bound=0.0
29931: loss=0.000, reward_mean=0.0, reward_bound=0.0
29932: loss=0.000, reward_mean=0.2, reward_bound=0.0
29933: loss=0.000, reward_mean=0.1, reward_bound=0.0
29934: loss=0.000, reward_mean=0.1, reward_bound=0.0
29935: loss=0.000, reward_mean=0.2, reward_bound=0.0
29936: loss=0.000, reward_mean=0.2, reward_bound=0.0
29937: loss=0.000, reward_mean=0.1, reward_bound=0.0
29938: loss=0.000, reward_mean=0.1, reward_bound=0.0
29939: loss=0.000, reward_mean=0.1, reward_bound=0.0
29940: loss=0.000, reward_mean=0.1, reward_bound=0.0
29941: loss=0.000, reward_mean=0.0, reward_bound=0.0
29942: loss=0.000, reward_mean=0.0, reward_bound=0.0
29943: loss=0.000, reward_mean=0.1, reward_bound=0.0
29944: loss=0.000, reward_mean=0.0, reward_bound=0.0
29945: loss=0.000, reward_mean=0.1, reward_bou

30083: loss=0.000, reward_mean=0.1, reward_bound=0.0
30084: loss=0.000, reward_mean=0.2, reward_bound=0.0
30085: loss=0.000, reward_mean=0.1, reward_bound=0.0
30086: loss=0.000, reward_mean=0.1, reward_bound=0.0
30087: loss=0.000, reward_mean=0.0, reward_bound=0.0
30088: loss=0.000, reward_mean=0.1, reward_bound=0.0
30089: loss=0.000, reward_mean=0.1, reward_bound=0.0
30090: loss=0.000, reward_mean=0.1, reward_bound=0.0
30091: loss=0.000, reward_mean=0.1, reward_bound=0.0
30092: loss=0.000, reward_mean=0.1, reward_bound=0.0
30093: loss=0.000, reward_mean=0.0, reward_bound=0.0
30094: loss=0.000, reward_mean=0.0, reward_bound=0.0
30095: loss=0.000, reward_mean=0.1, reward_bound=0.0
30096: loss=0.000, reward_mean=0.0, reward_bound=0.0
30097: loss=0.000, reward_mean=0.1, reward_bound=0.0
30098: loss=0.000, reward_mean=0.0, reward_bound=0.0
30099: loss=0.000, reward_mean=0.1, reward_bound=0.0
30100: loss=0.000, reward_mean=0.1, reward_bound=0.0
30101: loss=0.000, reward_mean=0.1, reward_bou

30238: loss=0.000, reward_mean=0.1, reward_bound=0.0
30239: loss=0.000, reward_mean=0.0, reward_bound=0.0
30240: loss=0.000, reward_mean=0.1, reward_bound=0.0
30241: loss=0.000, reward_mean=0.1, reward_bound=0.0
30242: loss=0.000, reward_mean=0.0, reward_bound=0.0
30243: loss=0.000, reward_mean=0.1, reward_bound=0.0
30244: loss=0.000, reward_mean=0.1, reward_bound=0.0
30245: loss=0.000, reward_mean=0.0, reward_bound=0.0
30246: loss=0.000, reward_mean=0.1, reward_bound=0.0
30247: loss=0.000, reward_mean=0.1, reward_bound=0.0
30248: loss=0.000, reward_mean=0.1, reward_bound=0.0
30249: loss=0.000, reward_mean=0.1, reward_bound=0.0
30250: loss=0.000, reward_mean=0.0, reward_bound=0.0
30251: loss=0.000, reward_mean=0.1, reward_bound=0.0
30252: loss=0.000, reward_mean=0.0, reward_bound=0.0
30253: loss=0.000, reward_mean=0.0, reward_bound=0.0
30254: loss=0.000, reward_mean=0.0, reward_bound=0.0
30255: loss=0.000, reward_mean=0.0, reward_bound=0.0
30256: loss=0.000, reward_mean=0.1, reward_bou

30396: loss=0.000, reward_mean=0.1, reward_bound=0.0
30397: loss=0.000, reward_mean=0.0, reward_bound=0.0
30398: loss=0.000, reward_mean=0.0, reward_bound=0.0
30399: loss=0.000, reward_mean=0.1, reward_bound=0.0
30400: loss=0.000, reward_mean=0.1, reward_bound=0.0
30401: loss=0.000, reward_mean=0.1, reward_bound=0.0
30402: loss=0.000, reward_mean=0.1, reward_bound=0.0
30403: loss=0.000, reward_mean=0.0, reward_bound=0.0
30404: loss=0.000, reward_mean=0.1, reward_bound=0.0
30405: loss=0.000, reward_mean=0.2, reward_bound=0.0
30406: loss=0.000, reward_mean=0.1, reward_bound=0.0
30407: loss=0.000, reward_mean=0.0, reward_bound=0.0
30408: loss=0.000, reward_mean=0.1, reward_bound=0.0
30409: loss=0.000, reward_mean=0.0, reward_bound=0.0
30410: loss=0.000, reward_mean=0.2, reward_bound=0.0
30411: loss=0.000, reward_mean=0.1, reward_bound=0.0
30412: loss=0.000, reward_mean=0.0, reward_bound=0.0
30413: loss=0.000, reward_mean=0.1, reward_bound=0.0
30414: loss=0.000, reward_mean=0.1, reward_bou

30552: loss=0.000, reward_mean=0.0, reward_bound=0.0
30553: loss=0.000, reward_mean=0.2, reward_bound=0.0
30554: loss=0.000, reward_mean=0.0, reward_bound=0.0
30555: loss=0.000, reward_mean=0.1, reward_bound=0.0
30556: loss=0.000, reward_mean=0.0, reward_bound=0.0
30557: loss=0.000, reward_mean=0.1, reward_bound=0.0
30558: loss=0.000, reward_mean=0.0, reward_bound=0.0
30559: loss=0.000, reward_mean=0.1, reward_bound=0.0
30560: loss=0.000, reward_mean=0.1, reward_bound=0.0
30561: loss=0.000, reward_mean=0.0, reward_bound=0.0
30562: loss=0.000, reward_mean=0.0, reward_bound=0.0
30563: loss=0.000, reward_mean=0.2, reward_bound=0.0
30564: loss=0.000, reward_mean=0.0, reward_bound=0.0
30565: loss=0.000, reward_mean=0.1, reward_bound=0.0
30566: loss=0.000, reward_mean=0.0, reward_bound=0.0
30567: loss=0.000, reward_mean=0.2, reward_bound=0.0
30568: loss=0.000, reward_mean=0.0, reward_bound=0.0
30569: loss=0.000, reward_mean=0.1, reward_bound=0.0
30570: loss=0.000, reward_mean=0.1, reward_bou

30710: loss=0.000, reward_mean=0.0, reward_bound=0.0
30711: loss=0.000, reward_mean=0.1, reward_bound=0.0
30712: loss=0.000, reward_mean=0.2, reward_bound=0.0
30713: loss=0.000, reward_mean=0.1, reward_bound=0.0
30714: loss=0.000, reward_mean=0.0, reward_bound=0.0
30715: loss=0.000, reward_mean=0.1, reward_bound=0.0
30716: loss=0.000, reward_mean=0.1, reward_bound=0.0
30717: loss=0.000, reward_mean=0.0, reward_bound=0.0
30718: loss=0.000, reward_mean=0.0, reward_bound=0.0
30719: loss=0.000, reward_mean=0.1, reward_bound=0.0
30720: loss=0.000, reward_mean=0.0, reward_bound=0.0
30721: loss=0.000, reward_mean=0.1, reward_bound=0.0
30722: loss=0.000, reward_mean=0.1, reward_bound=0.0
30723: loss=0.000, reward_mean=0.0, reward_bound=0.0
30724: loss=0.000, reward_mean=0.0, reward_bound=0.0
30725: loss=0.000, reward_mean=0.0, reward_bound=0.0
30726: loss=0.000, reward_mean=0.1, reward_bound=0.0
30727: loss=0.000, reward_mean=0.0, reward_bound=0.0
30728: loss=0.000, reward_mean=0.1, reward_bou

30867: loss=0.000, reward_mean=0.1, reward_bound=0.0
30868: loss=0.000, reward_mean=0.0, reward_bound=0.0
30869: loss=0.000, reward_mean=0.0, reward_bound=0.0
30870: loss=0.000, reward_mean=0.1, reward_bound=0.0
30871: loss=0.000, reward_mean=0.1, reward_bound=0.0
30872: loss=0.000, reward_mean=0.0, reward_bound=0.0
30873: loss=0.000, reward_mean=0.1, reward_bound=0.0
30874: loss=0.000, reward_mean=0.0, reward_bound=0.0
30875: loss=0.000, reward_mean=0.1, reward_bound=0.0
30876: loss=0.000, reward_mean=0.0, reward_bound=0.0
30877: loss=0.000, reward_mean=0.0, reward_bound=0.0
30878: loss=0.000, reward_mean=0.0, reward_bound=0.0
30879: loss=0.000, reward_mean=0.0, reward_bound=0.0
30880: loss=0.000, reward_mean=0.2, reward_bound=0.0
30881: loss=0.000, reward_mean=0.0, reward_bound=0.0
30882: loss=0.000, reward_mean=0.0, reward_bound=0.0
30883: loss=0.000, reward_mean=0.1, reward_bound=0.0
30884: loss=0.000, reward_mean=0.0, reward_bound=0.0
30885: loss=0.000, reward_mean=0.1, reward_bou

31024: loss=0.000, reward_mean=0.0, reward_bound=0.0
31025: loss=0.000, reward_mean=0.0, reward_bound=0.0
31026: loss=0.000, reward_mean=0.1, reward_bound=0.0
31027: loss=0.000, reward_mean=0.1, reward_bound=0.0
31028: loss=0.000, reward_mean=0.1, reward_bound=0.0
31029: loss=0.000, reward_mean=0.0, reward_bound=0.0
31030: loss=0.000, reward_mean=0.0, reward_bound=0.0
31031: loss=0.000, reward_mean=0.1, reward_bound=0.0
31032: loss=0.000, reward_mean=0.0, reward_bound=0.0
31033: loss=0.000, reward_mean=0.1, reward_bound=0.0
31034: loss=0.000, reward_mean=0.2, reward_bound=0.0
31035: loss=0.000, reward_mean=0.0, reward_bound=0.0
31036: loss=0.000, reward_mean=0.1, reward_bound=0.0
31037: loss=0.000, reward_mean=0.1, reward_bound=0.0
31038: loss=0.000, reward_mean=0.1, reward_bound=0.0
31039: loss=0.000, reward_mean=0.0, reward_bound=0.0
31040: loss=0.000, reward_mean=0.1, reward_bound=0.0
31041: loss=0.000, reward_mean=0.0, reward_bound=0.0
31042: loss=0.000, reward_mean=0.1, reward_bou

31181: loss=0.000, reward_mean=0.0, reward_bound=0.0
31182: loss=0.000, reward_mean=0.1, reward_bound=0.0
31183: loss=0.000, reward_mean=0.1, reward_bound=0.0
31184: loss=0.000, reward_mean=0.0, reward_bound=0.0
31185: loss=0.000, reward_mean=0.1, reward_bound=0.0
31186: loss=0.000, reward_mean=0.1, reward_bound=0.0
31187: loss=0.000, reward_mean=0.0, reward_bound=0.0
31188: loss=0.000, reward_mean=0.0, reward_bound=0.0
31189: loss=0.000, reward_mean=0.0, reward_bound=0.0
31190: loss=0.000, reward_mean=0.0, reward_bound=0.0
31191: loss=0.000, reward_mean=0.1, reward_bound=0.0
31192: loss=0.000, reward_mean=0.1, reward_bound=0.0
31193: loss=0.000, reward_mean=0.1, reward_bound=0.0
31194: loss=0.000, reward_mean=0.1, reward_bound=0.0
31195: loss=0.000, reward_mean=0.0, reward_bound=0.0
31196: loss=0.000, reward_mean=0.1, reward_bound=0.0
31197: loss=0.000, reward_mean=0.1, reward_bound=0.0
31198: loss=0.000, reward_mean=0.1, reward_bound=0.0
31199: loss=0.000, reward_mean=0.1, reward_bou

31340: loss=0.000, reward_mean=0.1, reward_bound=0.0
31341: loss=0.000, reward_mean=0.1, reward_bound=0.0
31342: loss=0.000, reward_mean=0.0, reward_bound=0.0
31343: loss=0.000, reward_mean=0.1, reward_bound=0.0
31344: loss=0.000, reward_mean=0.1, reward_bound=0.0
31345: loss=0.000, reward_mean=0.0, reward_bound=0.0
31346: loss=0.000, reward_mean=0.0, reward_bound=0.0
31347: loss=0.000, reward_mean=0.1, reward_bound=0.0
31348: loss=0.000, reward_mean=0.1, reward_bound=0.0
31349: loss=0.000, reward_mean=0.1, reward_bound=0.0
31350: loss=0.000, reward_mean=0.0, reward_bound=0.0
31351: loss=0.000, reward_mean=0.1, reward_bound=0.0
31352: loss=0.000, reward_mean=0.1, reward_bound=0.0
31353: loss=0.000, reward_mean=0.0, reward_bound=0.0
31354: loss=0.000, reward_mean=0.1, reward_bound=0.0
31355: loss=0.000, reward_mean=0.0, reward_bound=0.0
31356: loss=0.000, reward_mean=0.0, reward_bound=0.0
31357: loss=0.000, reward_mean=0.0, reward_bound=0.0
31358: loss=0.000, reward_mean=0.0, reward_bou

31499: loss=0.000, reward_mean=0.0, reward_bound=0.0
31500: loss=0.000, reward_mean=0.1, reward_bound=0.0
31501: loss=0.000, reward_mean=0.1, reward_bound=0.0
31502: loss=0.000, reward_mean=0.2, reward_bound=0.0
31503: loss=0.000, reward_mean=0.1, reward_bound=0.0
31504: loss=0.000, reward_mean=0.1, reward_bound=0.0
31505: loss=0.000, reward_mean=0.0, reward_bound=0.0
31506: loss=0.000, reward_mean=0.0, reward_bound=0.0
31507: loss=0.000, reward_mean=0.1, reward_bound=0.0
31508: loss=0.000, reward_mean=0.0, reward_bound=0.0
31509: loss=0.000, reward_mean=0.1, reward_bound=0.0
31510: loss=0.000, reward_mean=0.1, reward_bound=0.0
31511: loss=0.000, reward_mean=0.2, reward_bound=0.0
31512: loss=0.000, reward_mean=0.0, reward_bound=0.0
31513: loss=0.000, reward_mean=0.0, reward_bound=0.0
31514: loss=0.000, reward_mean=0.0, reward_bound=0.0
31515: loss=0.000, reward_mean=0.0, reward_bound=0.0
31516: loss=0.000, reward_mean=0.1, reward_bound=0.0
31517: loss=0.000, reward_mean=0.0, reward_bou

31658: loss=0.000, reward_mean=0.0, reward_bound=0.0
31659: loss=0.000, reward_mean=0.0, reward_bound=0.0
31660: loss=0.000, reward_mean=0.1, reward_bound=0.0
31661: loss=0.000, reward_mean=0.0, reward_bound=0.0
31662: loss=0.000, reward_mean=0.0, reward_bound=0.0
31663: loss=0.000, reward_mean=0.0, reward_bound=0.0
31664: loss=0.000, reward_mean=0.1, reward_bound=0.0
31665: loss=0.000, reward_mean=0.0, reward_bound=0.0
31666: loss=0.000, reward_mean=0.2, reward_bound=0.0
31667: loss=0.000, reward_mean=0.1, reward_bound=0.0
31668: loss=0.000, reward_mean=0.1, reward_bound=0.0
31669: loss=0.000, reward_mean=0.0, reward_bound=0.0
31670: loss=0.000, reward_mean=0.1, reward_bound=0.0
31671: loss=0.000, reward_mean=0.1, reward_bound=0.0
31672: loss=0.000, reward_mean=0.0, reward_bound=0.0
31673: loss=0.000, reward_mean=0.1, reward_bound=0.0
31674: loss=0.000, reward_mean=0.0, reward_bound=0.0
31675: loss=0.000, reward_mean=0.1, reward_bound=0.0
31676: loss=0.000, reward_mean=0.1, reward_bou

31813: loss=0.000, reward_mean=0.0, reward_bound=0.0
31814: loss=0.000, reward_mean=0.2, reward_bound=0.0
31815: loss=0.000, reward_mean=0.0, reward_bound=0.0
31816: loss=0.000, reward_mean=0.0, reward_bound=0.0
31817: loss=0.000, reward_mean=0.1, reward_bound=0.0
31818: loss=0.000, reward_mean=0.1, reward_bound=0.0
31819: loss=0.000, reward_mean=0.0, reward_bound=0.0
31820: loss=0.000, reward_mean=0.0, reward_bound=0.0
31821: loss=0.000, reward_mean=0.1, reward_bound=0.0
31822: loss=0.000, reward_mean=0.1, reward_bound=0.0
31823: loss=0.000, reward_mean=0.1, reward_bound=0.0
31824: loss=0.000, reward_mean=0.0, reward_bound=0.0
31825: loss=0.000, reward_mean=0.1, reward_bound=0.0
31826: loss=0.000, reward_mean=0.0, reward_bound=0.0
31827: loss=0.000, reward_mean=0.2, reward_bound=0.0
31828: loss=0.000, reward_mean=0.1, reward_bound=0.0
31829: loss=0.000, reward_mean=0.0, reward_bound=0.0
31830: loss=0.000, reward_mean=0.0, reward_bound=0.0
31831: loss=0.000, reward_mean=0.1, reward_bou

31971: loss=0.000, reward_mean=0.0, reward_bound=0.0
31972: loss=0.000, reward_mean=0.2, reward_bound=0.0
31973: loss=0.000, reward_mean=0.0, reward_bound=0.0
31974: loss=0.000, reward_mean=0.0, reward_bound=0.0
31975: loss=0.000, reward_mean=0.1, reward_bound=0.0
31976: loss=0.000, reward_mean=0.0, reward_bound=0.0
31977: loss=0.000, reward_mean=0.0, reward_bound=0.0
31978: loss=0.000, reward_mean=0.0, reward_bound=0.0
31979: loss=0.000, reward_mean=0.1, reward_bound=0.0
31980: loss=0.000, reward_mean=0.1, reward_bound=0.0
31981: loss=0.000, reward_mean=0.1, reward_bound=0.0
31982: loss=0.000, reward_mean=0.0, reward_bound=0.0
31983: loss=0.000, reward_mean=0.1, reward_bound=0.0
31984: loss=0.000, reward_mean=0.1, reward_bound=0.0
31985: loss=0.000, reward_mean=0.0, reward_bound=0.0
31986: loss=0.000, reward_mean=0.1, reward_bound=0.0
31987: loss=0.000, reward_mean=0.0, reward_bound=0.0
31988: loss=0.000, reward_mean=0.1, reward_bound=0.0
31989: loss=0.000, reward_mean=0.1, reward_bou

32132: loss=0.000, reward_mean=0.1, reward_bound=0.0
32133: loss=0.000, reward_mean=0.0, reward_bound=0.0
32134: loss=0.000, reward_mean=0.0, reward_bound=0.0
32135: loss=0.000, reward_mean=0.1, reward_bound=0.0
32136: loss=0.000, reward_mean=0.0, reward_bound=0.0
32137: loss=0.000, reward_mean=0.1, reward_bound=0.0
32138: loss=0.000, reward_mean=0.2, reward_bound=0.0
32139: loss=0.000, reward_mean=0.0, reward_bound=0.0
32140: loss=0.000, reward_mean=0.1, reward_bound=0.0
32141: loss=0.000, reward_mean=0.1, reward_bound=0.0
32142: loss=0.000, reward_mean=0.0, reward_bound=0.0
32143: loss=0.000, reward_mean=0.1, reward_bound=0.0
32144: loss=0.000, reward_mean=0.1, reward_bound=0.0
32145: loss=0.000, reward_mean=0.0, reward_bound=0.0
32146: loss=0.000, reward_mean=0.1, reward_bound=0.0
32147: loss=0.000, reward_mean=0.0, reward_bound=0.0
32148: loss=0.000, reward_mean=0.1, reward_bound=0.0
32149: loss=0.000, reward_mean=0.0, reward_bound=0.0
32150: loss=0.000, reward_mean=0.0, reward_bou

32291: loss=0.000, reward_mean=0.1, reward_bound=0.0
32292: loss=0.000, reward_mean=0.0, reward_bound=0.0
32293: loss=0.000, reward_mean=0.0, reward_bound=0.0
32294: loss=0.000, reward_mean=0.1, reward_bound=0.0
32295: loss=0.000, reward_mean=0.1, reward_bound=0.0
32296: loss=0.000, reward_mean=0.1, reward_bound=0.0
32297: loss=0.000, reward_mean=0.0, reward_bound=0.0
32298: loss=0.000, reward_mean=0.2, reward_bound=0.0
32299: loss=0.000, reward_mean=0.1, reward_bound=0.0
32300: loss=0.000, reward_mean=0.1, reward_bound=0.0
32301: loss=0.000, reward_mean=0.0, reward_bound=0.0
32302: loss=0.000, reward_mean=0.1, reward_bound=0.0
32303: loss=0.000, reward_mean=0.1, reward_bound=0.0
32304: loss=0.000, reward_mean=0.0, reward_bound=0.0
32305: loss=0.000, reward_mean=0.1, reward_bound=0.0
32306: loss=0.000, reward_mean=0.1, reward_bound=0.0
32307: loss=0.000, reward_mean=0.1, reward_bound=0.0
32308: loss=0.000, reward_mean=0.0, reward_bound=0.0
32309: loss=0.000, reward_mean=0.1, reward_bou

32446: loss=0.000, reward_mean=0.1, reward_bound=0.0
32447: loss=0.000, reward_mean=0.1, reward_bound=0.0
32448: loss=0.000, reward_mean=0.0, reward_bound=0.0
32449: loss=0.000, reward_mean=0.0, reward_bound=0.0
32450: loss=0.000, reward_mean=0.1, reward_bound=0.0
32451: loss=0.000, reward_mean=0.0, reward_bound=0.0
32452: loss=0.000, reward_mean=0.0, reward_bound=0.0
32453: loss=0.000, reward_mean=0.1, reward_bound=0.0
32454: loss=0.000, reward_mean=0.0, reward_bound=0.0
32455: loss=0.000, reward_mean=0.1, reward_bound=0.0
32456: loss=0.000, reward_mean=0.1, reward_bound=0.0
32457: loss=0.000, reward_mean=0.1, reward_bound=0.0
32458: loss=0.000, reward_mean=0.0, reward_bound=0.0
32459: loss=0.000, reward_mean=0.0, reward_bound=0.0
32460: loss=0.000, reward_mean=0.1, reward_bound=0.0
32461: loss=0.000, reward_mean=0.0, reward_bound=0.0
32462: loss=0.000, reward_mean=0.0, reward_bound=0.0
32463: loss=0.000, reward_mean=0.0, reward_bound=0.0
32464: loss=0.000, reward_mean=0.0, reward_bou

32604: loss=0.000, reward_mean=0.0, reward_bound=0.0
32605: loss=0.000, reward_mean=0.0, reward_bound=0.0
32606: loss=0.000, reward_mean=0.0, reward_bound=0.0
32607: loss=0.000, reward_mean=0.0, reward_bound=0.0
32608: loss=0.000, reward_mean=0.0, reward_bound=0.0
32609: loss=0.000, reward_mean=0.1, reward_bound=0.0
32610: loss=0.000, reward_mean=0.0, reward_bound=0.0
32611: loss=0.000, reward_mean=0.1, reward_bound=0.0
32612: loss=0.000, reward_mean=0.0, reward_bound=0.0
32613: loss=0.000, reward_mean=0.0, reward_bound=0.0
32614: loss=0.000, reward_mean=0.1, reward_bound=0.0
32615: loss=0.000, reward_mean=0.0, reward_bound=0.0
32616: loss=0.000, reward_mean=0.1, reward_bound=0.0
32617: loss=0.000, reward_mean=0.1, reward_bound=0.0
32618: loss=0.000, reward_mean=0.0, reward_bound=0.0
32619: loss=0.000, reward_mean=0.1, reward_bound=0.0
32620: loss=0.000, reward_mean=0.1, reward_bound=0.0
32621: loss=0.000, reward_mean=0.1, reward_bound=0.0
32622: loss=0.000, reward_mean=0.1, reward_bou

32762: loss=0.000, reward_mean=0.1, reward_bound=0.0
32763: loss=0.000, reward_mean=0.0, reward_bound=0.0
32764: loss=0.000, reward_mean=0.0, reward_bound=0.0
32765: loss=0.000, reward_mean=0.2, reward_bound=0.0
32766: loss=0.000, reward_mean=0.1, reward_bound=0.0
32767: loss=0.000, reward_mean=0.1, reward_bound=0.0
32768: loss=0.000, reward_mean=0.0, reward_bound=0.0
32769: loss=0.000, reward_mean=0.1, reward_bound=0.0
32770: loss=0.000, reward_mean=0.0, reward_bound=0.0
32771: loss=0.000, reward_mean=0.2, reward_bound=0.0
32772: loss=0.000, reward_mean=0.0, reward_bound=0.0
32773: loss=0.000, reward_mean=0.0, reward_bound=0.0
32774: loss=0.000, reward_mean=0.1, reward_bound=0.0
32775: loss=0.000, reward_mean=0.1, reward_bound=0.0
32776: loss=0.000, reward_mean=0.1, reward_bound=0.0
32777: loss=0.000, reward_mean=0.1, reward_bound=0.0
32778: loss=0.000, reward_mean=0.1, reward_bound=0.0
32779: loss=0.000, reward_mean=0.1, reward_bound=0.0
32780: loss=0.000, reward_mean=0.1, reward_bou

32923: loss=0.000, reward_mean=0.0, reward_bound=0.0
32924: loss=0.000, reward_mean=0.0, reward_bound=0.0
32925: loss=0.000, reward_mean=0.0, reward_bound=0.0
32926: loss=0.000, reward_mean=0.1, reward_bound=0.0
32927: loss=0.000, reward_mean=0.0, reward_bound=0.0
32928: loss=0.000, reward_mean=0.0, reward_bound=0.0
32929: loss=0.000, reward_mean=0.0, reward_bound=0.0
32930: loss=0.000, reward_mean=0.1, reward_bound=0.0
32931: loss=0.000, reward_mean=0.1, reward_bound=0.0
32932: loss=0.000, reward_mean=0.0, reward_bound=0.0
32933: loss=0.000, reward_mean=0.0, reward_bound=0.0
32934: loss=0.000, reward_mean=0.2, reward_bound=0.0
32935: loss=0.000, reward_mean=0.1, reward_bound=0.0
32936: loss=0.000, reward_mean=0.2, reward_bound=0.0
32937: loss=0.000, reward_mean=0.1, reward_bound=0.0
32938: loss=0.000, reward_mean=0.0, reward_bound=0.0
32939: loss=0.000, reward_mean=0.0, reward_bound=0.0
32940: loss=0.000, reward_mean=0.1, reward_bound=0.0
32941: loss=0.000, reward_mean=0.0, reward_bou

33085: loss=0.000, reward_mean=0.1, reward_bound=0.0
33086: loss=0.000, reward_mean=0.1, reward_bound=0.0
33087: loss=0.000, reward_mean=0.1, reward_bound=0.0
33088: loss=0.000, reward_mean=0.0, reward_bound=0.0
33089: loss=0.000, reward_mean=0.1, reward_bound=0.0
33090: loss=0.000, reward_mean=0.1, reward_bound=0.0
33091: loss=0.000, reward_mean=0.1, reward_bound=0.0
33092: loss=0.000, reward_mean=0.1, reward_bound=0.0
33093: loss=0.000, reward_mean=0.1, reward_bound=0.0
33094: loss=0.000, reward_mean=0.1, reward_bound=0.0
33095: loss=0.000, reward_mean=0.1, reward_bound=0.0
33096: loss=0.000, reward_mean=0.0, reward_bound=0.0
33097: loss=0.000, reward_mean=0.0, reward_bound=0.0
33098: loss=0.000, reward_mean=0.1, reward_bound=0.0
33099: loss=0.000, reward_mean=0.0, reward_bound=0.0
33100: loss=0.000, reward_mean=0.1, reward_bound=0.0
33101: loss=0.000, reward_mean=0.1, reward_bound=0.0
33102: loss=0.000, reward_mean=0.1, reward_bound=0.0
33103: loss=0.000, reward_mean=0.0, reward_bou

33245: loss=0.000, reward_mean=0.0, reward_bound=0.0
33246: loss=0.000, reward_mean=0.0, reward_bound=0.0
33247: loss=0.000, reward_mean=0.0, reward_bound=0.0
33248: loss=0.000, reward_mean=0.0, reward_bound=0.0
33249: loss=0.000, reward_mean=0.0, reward_bound=0.0
33250: loss=0.000, reward_mean=0.1, reward_bound=0.0
33251: loss=0.000, reward_mean=0.1, reward_bound=0.0
33252: loss=0.000, reward_mean=0.1, reward_bound=0.0
33253: loss=0.000, reward_mean=0.1, reward_bound=0.0
33254: loss=0.000, reward_mean=0.0, reward_bound=0.0
33255: loss=0.000, reward_mean=0.0, reward_bound=0.0
33256: loss=0.000, reward_mean=0.0, reward_bound=0.0
33257: loss=0.000, reward_mean=0.1, reward_bound=0.0
33258: loss=0.000, reward_mean=0.0, reward_bound=0.0
33259: loss=0.000, reward_mean=0.1, reward_bound=0.0
33260: loss=0.000, reward_mean=0.0, reward_bound=0.0
33261: loss=0.000, reward_mean=0.0, reward_bound=0.0
33262: loss=0.000, reward_mean=0.1, reward_bound=0.0
33263: loss=0.000, reward_mean=0.1, reward_bou

33402: loss=0.000, reward_mean=0.1, reward_bound=0.0
33403: loss=0.000, reward_mean=0.0, reward_bound=0.0
33404: loss=0.000, reward_mean=0.0, reward_bound=0.0
33405: loss=0.000, reward_mean=0.0, reward_bound=0.0
33406: loss=0.000, reward_mean=0.1, reward_bound=0.0
33407: loss=0.000, reward_mean=0.0, reward_bound=0.0
33408: loss=0.000, reward_mean=0.1, reward_bound=0.0
33409: loss=0.000, reward_mean=0.0, reward_bound=0.0
33410: loss=0.000, reward_mean=0.1, reward_bound=0.0
33411: loss=0.000, reward_mean=0.0, reward_bound=0.0
33412: loss=0.000, reward_mean=0.0, reward_bound=0.0
33413: loss=0.000, reward_mean=0.0, reward_bound=0.0
33414: loss=0.000, reward_mean=0.1, reward_bound=0.0
33415: loss=0.000, reward_mean=0.1, reward_bound=0.0
33416: loss=0.000, reward_mean=0.0, reward_bound=0.0
33417: loss=0.000, reward_mean=0.1, reward_bound=0.0
33418: loss=0.000, reward_mean=0.0, reward_bound=0.0
33419: loss=0.000, reward_mean=0.1, reward_bound=0.0
33420: loss=0.000, reward_mean=0.1, reward_bou

33557: loss=0.000, reward_mean=0.1, reward_bound=0.0
33558: loss=0.000, reward_mean=0.0, reward_bound=0.0
33559: loss=0.000, reward_mean=0.1, reward_bound=0.0
33560: loss=0.000, reward_mean=0.1, reward_bound=0.0
33561: loss=0.000, reward_mean=0.1, reward_bound=0.0
33562: loss=0.000, reward_mean=0.0, reward_bound=0.0
33563: loss=0.000, reward_mean=0.1, reward_bound=0.0
33564: loss=0.000, reward_mean=0.0, reward_bound=0.0
33565: loss=0.000, reward_mean=0.0, reward_bound=0.0
33566: loss=0.000, reward_mean=0.0, reward_bound=0.0
33567: loss=0.000, reward_mean=0.1, reward_bound=0.0
33568: loss=0.000, reward_mean=0.0, reward_bound=0.0
33569: loss=0.000, reward_mean=0.0, reward_bound=0.0
33570: loss=0.000, reward_mean=0.1, reward_bound=0.0
33571: loss=0.000, reward_mean=0.0, reward_bound=0.0
33572: loss=0.000, reward_mean=0.0, reward_bound=0.0
33573: loss=0.000, reward_mean=0.1, reward_bound=0.0
33574: loss=0.000, reward_mean=0.0, reward_bound=0.0
33575: loss=0.000, reward_mean=0.0, reward_bou

33712: loss=0.000, reward_mean=0.0, reward_bound=0.0
33713: loss=0.000, reward_mean=0.0, reward_bound=0.0
33714: loss=0.000, reward_mean=0.1, reward_bound=0.0
33715: loss=0.000, reward_mean=0.1, reward_bound=0.0
33716: loss=0.000, reward_mean=0.1, reward_bound=0.0
33717: loss=0.000, reward_mean=0.0, reward_bound=0.0
33718: loss=0.000, reward_mean=0.0, reward_bound=0.0
33719: loss=0.000, reward_mean=0.0, reward_bound=0.0
33720: loss=0.000, reward_mean=0.1, reward_bound=0.0
33721: loss=0.000, reward_mean=0.1, reward_bound=0.0
33722: loss=0.000, reward_mean=0.1, reward_bound=0.0
33723: loss=0.000, reward_mean=0.1, reward_bound=0.0
33724: loss=0.000, reward_mean=0.0, reward_bound=0.0
33725: loss=0.000, reward_mean=0.1, reward_bound=0.0
33726: loss=0.000, reward_mean=0.1, reward_bound=0.0
33727: loss=0.000, reward_mean=0.0, reward_bound=0.0
33728: loss=0.000, reward_mean=0.0, reward_bound=0.0
33729: loss=0.000, reward_mean=0.1, reward_bound=0.0
33730: loss=0.000, reward_mean=0.1, reward_bou

33869: loss=0.000, reward_mean=0.0, reward_bound=0.0
33870: loss=0.000, reward_mean=0.1, reward_bound=0.0
33871: loss=0.000, reward_mean=0.0, reward_bound=0.0
33872: loss=0.000, reward_mean=0.1, reward_bound=0.0
33873: loss=0.000, reward_mean=0.1, reward_bound=0.0
33874: loss=0.000, reward_mean=0.1, reward_bound=0.0
33875: loss=0.000, reward_mean=0.0, reward_bound=0.0
33876: loss=0.000, reward_mean=0.1, reward_bound=0.0
33877: loss=0.000, reward_mean=0.1, reward_bound=0.0
33878: loss=0.000, reward_mean=0.1, reward_bound=0.0
33879: loss=0.000, reward_mean=0.0, reward_bound=0.0
33880: loss=0.000, reward_mean=0.1, reward_bound=0.0
33881: loss=0.000, reward_mean=0.1, reward_bound=0.0
33882: loss=0.000, reward_mean=0.0, reward_bound=0.0
33883: loss=0.000, reward_mean=0.0, reward_bound=0.0
33884: loss=0.000, reward_mean=0.1, reward_bound=0.0
33885: loss=0.000, reward_mean=0.0, reward_bound=0.0
33886: loss=0.000, reward_mean=0.1, reward_bound=0.0
33887: loss=0.000, reward_mean=0.1, reward_bou

34028: loss=0.000, reward_mean=0.1, reward_bound=0.0
34029: loss=0.000, reward_mean=0.0, reward_bound=0.0
34030: loss=0.000, reward_mean=0.1, reward_bound=0.0
34031: loss=0.000, reward_mean=0.0, reward_bound=0.0
34032: loss=0.000, reward_mean=0.0, reward_bound=0.0
34033: loss=0.000, reward_mean=0.1, reward_bound=0.0
34034: loss=0.000, reward_mean=0.1, reward_bound=0.0
34035: loss=0.000, reward_mean=0.1, reward_bound=0.0
34036: loss=0.000, reward_mean=0.0, reward_bound=0.0
34037: loss=0.000, reward_mean=0.0, reward_bound=0.0
34038: loss=0.000, reward_mean=0.0, reward_bound=0.0
34039: loss=0.000, reward_mean=0.1, reward_bound=0.0
34040: loss=0.000, reward_mean=0.2, reward_bound=0.0
34041: loss=0.000, reward_mean=0.0, reward_bound=0.0
34042: loss=0.000, reward_mean=0.1, reward_bound=0.0
34043: loss=0.000, reward_mean=0.0, reward_bound=0.0
34044: loss=0.000, reward_mean=0.0, reward_bound=0.0
34045: loss=0.000, reward_mean=0.1, reward_bound=0.0
34046: loss=0.000, reward_mean=0.0, reward_bou

34185: loss=0.000, reward_mean=0.1, reward_bound=0.0
34186: loss=0.000, reward_mean=0.1, reward_bound=0.0
34187: loss=0.000, reward_mean=0.1, reward_bound=0.0
34188: loss=0.000, reward_mean=0.0, reward_bound=0.0
34189: loss=0.000, reward_mean=0.1, reward_bound=0.0
34190: loss=0.000, reward_mean=0.0, reward_bound=0.0
34191: loss=0.000, reward_mean=0.0, reward_bound=0.0
34192: loss=0.000, reward_mean=0.1, reward_bound=0.0
34193: loss=0.000, reward_mean=0.1, reward_bound=0.0
34194: loss=0.000, reward_mean=0.0, reward_bound=0.0
34195: loss=0.000, reward_mean=0.0, reward_bound=0.0
34196: loss=0.000, reward_mean=0.1, reward_bound=0.0
34197: loss=0.000, reward_mean=0.1, reward_bound=0.0
34198: loss=0.000, reward_mean=0.1, reward_bound=0.0
34199: loss=0.000, reward_mean=0.1, reward_bound=0.0
34200: loss=0.000, reward_mean=0.0, reward_bound=0.0
34201: loss=0.000, reward_mean=0.1, reward_bound=0.0
34202: loss=0.000, reward_mean=0.1, reward_bound=0.0
34203: loss=0.000, reward_mean=0.0, reward_bou

34340: loss=0.000, reward_mean=0.0, reward_bound=0.0
34341: loss=0.000, reward_mean=0.0, reward_bound=0.0
34342: loss=0.000, reward_mean=0.1, reward_bound=0.0
34343: loss=0.000, reward_mean=0.0, reward_bound=0.0
34344: loss=0.000, reward_mean=0.1, reward_bound=0.0
34345: loss=0.000, reward_mean=0.1, reward_bound=0.0
34346: loss=0.000, reward_mean=0.1, reward_bound=0.0
34347: loss=0.000, reward_mean=0.1, reward_bound=0.0
34348: loss=0.000, reward_mean=0.1, reward_bound=0.0
34349: loss=0.000, reward_mean=0.1, reward_bound=0.0
34350: loss=0.000, reward_mean=0.1, reward_bound=0.0
34351: loss=0.000, reward_mean=0.2, reward_bound=0.0
34352: loss=0.000, reward_mean=0.1, reward_bound=0.0
34353: loss=0.000, reward_mean=0.0, reward_bound=0.0
34354: loss=0.000, reward_mean=0.1, reward_bound=0.0
34355: loss=0.000, reward_mean=0.2, reward_bound=0.0
34356: loss=0.000, reward_mean=0.1, reward_bound=0.0
34357: loss=0.000, reward_mean=0.1, reward_bound=0.0
34358: loss=0.000, reward_mean=0.0, reward_bou

34497: loss=0.000, reward_mean=0.0, reward_bound=0.0
34498: loss=0.000, reward_mean=0.1, reward_bound=0.0
34499: loss=0.000, reward_mean=0.1, reward_bound=0.0
34500: loss=0.000, reward_mean=0.1, reward_bound=0.0
34501: loss=0.000, reward_mean=0.1, reward_bound=0.0
34502: loss=0.000, reward_mean=0.1, reward_bound=0.0
34503: loss=0.000, reward_mean=0.0, reward_bound=0.0
34504: loss=0.000, reward_mean=0.1, reward_bound=0.0
34505: loss=0.000, reward_mean=0.1, reward_bound=0.0
34506: loss=0.000, reward_mean=0.2, reward_bound=0.0
34507: loss=0.000, reward_mean=0.1, reward_bound=0.0
34508: loss=0.000, reward_mean=0.1, reward_bound=0.0
34509: loss=0.000, reward_mean=0.0, reward_bound=0.0
34510: loss=0.000, reward_mean=0.1, reward_bound=0.0
34511: loss=0.000, reward_mean=0.1, reward_bound=0.0
34512: loss=0.000, reward_mean=0.1, reward_bound=0.0
34513: loss=0.000, reward_mean=0.2, reward_bound=0.0
34514: loss=0.000, reward_mean=0.0, reward_bound=0.0
34515: loss=0.000, reward_mean=0.0, reward_bou

34654: loss=0.000, reward_mean=0.1, reward_bound=0.0
34655: loss=0.000, reward_mean=0.1, reward_bound=0.0
34656: loss=0.000, reward_mean=0.0, reward_bound=0.0
34657: loss=0.000, reward_mean=0.1, reward_bound=0.0
34658: loss=0.000, reward_mean=0.0, reward_bound=0.0
34659: loss=0.000, reward_mean=0.1, reward_bound=0.0
34660: loss=0.000, reward_mean=0.1, reward_bound=0.0
34661: loss=0.000, reward_mean=0.0, reward_bound=0.0
34662: loss=0.000, reward_mean=0.1, reward_bound=0.0
34663: loss=0.000, reward_mean=0.1, reward_bound=0.0
34664: loss=0.000, reward_mean=0.0, reward_bound=0.0
34665: loss=0.000, reward_mean=0.0, reward_bound=0.0
34666: loss=0.000, reward_mean=0.0, reward_bound=0.0
34667: loss=0.000, reward_mean=0.1, reward_bound=0.0
34668: loss=0.000, reward_mean=0.1, reward_bound=0.0
34669: loss=0.000, reward_mean=0.2, reward_bound=0.0
34670: loss=0.000, reward_mean=0.0, reward_bound=0.0
34671: loss=0.000, reward_mean=0.1, reward_bound=0.0
34672: loss=0.000, reward_mean=0.1, reward_bou

34813: loss=0.000, reward_mean=0.1, reward_bound=0.0
34814: loss=0.000, reward_mean=0.0, reward_bound=0.0
34815: loss=0.000, reward_mean=0.1, reward_bound=0.0
34816: loss=0.000, reward_mean=0.1, reward_bound=0.0
34817: loss=0.000, reward_mean=0.0, reward_bound=0.0
34818: loss=0.000, reward_mean=0.0, reward_bound=0.0
34819: loss=0.000, reward_mean=0.0, reward_bound=0.0
34820: loss=0.000, reward_mean=0.0, reward_bound=0.0
34821: loss=0.000, reward_mean=0.0, reward_bound=0.0
34822: loss=0.000, reward_mean=0.0, reward_bound=0.0
34823: loss=0.000, reward_mean=0.1, reward_bound=0.0
34824: loss=0.000, reward_mean=0.0, reward_bound=0.0
34825: loss=0.000, reward_mean=0.2, reward_bound=0.0
34826: loss=0.000, reward_mean=0.1, reward_bound=0.0
34827: loss=0.000, reward_mean=0.1, reward_bound=0.0
34828: loss=0.000, reward_mean=0.0, reward_bound=0.0
34829: loss=0.000, reward_mean=0.1, reward_bound=0.0
34830: loss=0.000, reward_mean=0.1, reward_bound=0.0
34831: loss=0.000, reward_mean=0.1, reward_bou

34968: loss=0.000, reward_mean=0.2, reward_bound=0.0
34969: loss=0.000, reward_mean=0.1, reward_bound=0.0
34970: loss=0.000, reward_mean=0.1, reward_bound=0.0
34971: loss=0.000, reward_mean=0.0, reward_bound=0.0
34972: loss=0.000, reward_mean=0.0, reward_bound=0.0
34973: loss=0.000, reward_mean=0.1, reward_bound=0.0
34974: loss=0.000, reward_mean=0.1, reward_bound=0.0
34975: loss=0.000, reward_mean=0.1, reward_bound=0.0
34976: loss=0.000, reward_mean=0.1, reward_bound=0.0
34977: loss=0.000, reward_mean=0.0, reward_bound=0.0
34978: loss=0.000, reward_mean=0.1, reward_bound=0.0
34979: loss=0.000, reward_mean=0.1, reward_bound=0.0
34980: loss=0.000, reward_mean=0.2, reward_bound=0.0
34981: loss=0.000, reward_mean=0.1, reward_bound=0.0
34982: loss=0.000, reward_mean=0.1, reward_bound=0.0
34983: loss=0.000, reward_mean=0.0, reward_bound=0.0
34984: loss=0.000, reward_mean=0.1, reward_bound=0.0
34985: loss=0.000, reward_mean=0.1, reward_bound=0.0
34986: loss=0.000, reward_mean=0.1, reward_bou

35129: loss=0.000, reward_mean=0.0, reward_bound=0.0
35130: loss=0.000, reward_mean=0.1, reward_bound=0.0
35131: loss=0.000, reward_mean=0.0, reward_bound=0.0
35132: loss=0.000, reward_mean=0.1, reward_bound=0.0
35133: loss=0.000, reward_mean=0.1, reward_bound=0.0
35134: loss=0.000, reward_mean=0.1, reward_bound=0.0
35135: loss=0.000, reward_mean=0.0, reward_bound=0.0
35136: loss=0.000, reward_mean=0.0, reward_bound=0.0
35137: loss=0.000, reward_mean=0.1, reward_bound=0.0
35138: loss=0.000, reward_mean=0.1, reward_bound=0.0
35139: loss=0.000, reward_mean=0.1, reward_bound=0.0
35140: loss=0.000, reward_mean=0.1, reward_bound=0.0
35141: loss=0.000, reward_mean=0.1, reward_bound=0.0
35142: loss=0.000, reward_mean=0.1, reward_bound=0.0
35143: loss=0.000, reward_mean=0.1, reward_bound=0.0
35144: loss=0.000, reward_mean=0.1, reward_bound=0.0
35145: loss=0.000, reward_mean=0.1, reward_bound=0.0
35146: loss=0.000, reward_mean=0.2, reward_bound=0.0
35147: loss=0.000, reward_mean=0.1, reward_bou

35287: loss=0.000, reward_mean=0.0, reward_bound=0.0
35288: loss=0.000, reward_mean=0.0, reward_bound=0.0
35289: loss=0.000, reward_mean=0.0, reward_bound=0.0
35290: loss=0.000, reward_mean=0.1, reward_bound=0.0
35291: loss=0.000, reward_mean=0.1, reward_bound=0.0
35292: loss=0.000, reward_mean=0.1, reward_bound=0.0
35293: loss=0.000, reward_mean=0.0, reward_bound=0.0
35294: loss=0.000, reward_mean=0.1, reward_bound=0.0
35295: loss=0.000, reward_mean=0.1, reward_bound=0.0
35296: loss=0.000, reward_mean=0.0, reward_bound=0.0
35297: loss=0.000, reward_mean=0.0, reward_bound=0.0
35298: loss=0.000, reward_mean=0.0, reward_bound=0.0
35299: loss=0.000, reward_mean=0.1, reward_bound=0.0
35300: loss=0.000, reward_mean=0.0, reward_bound=0.0
35301: loss=0.000, reward_mean=0.0, reward_bound=0.0
35302: loss=0.000, reward_mean=0.1, reward_bound=0.0
35303: loss=0.000, reward_mean=0.0, reward_bound=0.0
35304: loss=0.000, reward_mean=0.0, reward_bound=0.0
35305: loss=0.000, reward_mean=0.1, reward_bou

35445: loss=0.000, reward_mean=0.0, reward_bound=0.0
35446: loss=0.000, reward_mean=0.1, reward_bound=0.0
35447: loss=0.000, reward_mean=0.1, reward_bound=0.0
35448: loss=0.000, reward_mean=0.0, reward_bound=0.0
35449: loss=0.000, reward_mean=0.1, reward_bound=0.0
35450: loss=0.000, reward_mean=0.0, reward_bound=0.0
35451: loss=0.000, reward_mean=0.0, reward_bound=0.0
35452: loss=0.000, reward_mean=0.1, reward_bound=0.0
35453: loss=0.000, reward_mean=0.1, reward_bound=0.0
35454: loss=0.000, reward_mean=0.1, reward_bound=0.0
35455: loss=0.000, reward_mean=0.1, reward_bound=0.0
35456: loss=0.000, reward_mean=0.0, reward_bound=0.0
35457: loss=0.000, reward_mean=0.0, reward_bound=0.0
35458: loss=0.000, reward_mean=0.2, reward_bound=0.0
35459: loss=0.000, reward_mean=0.0, reward_bound=0.0
35460: loss=0.000, reward_mean=0.1, reward_bound=0.0
35461: loss=0.000, reward_mean=0.0, reward_bound=0.0
35462: loss=0.000, reward_mean=0.1, reward_bound=0.0
35463: loss=0.000, reward_mean=0.1, reward_bou

35600: loss=0.000, reward_mean=0.0, reward_bound=0.0
35601: loss=0.000, reward_mean=0.1, reward_bound=0.0
35602: loss=0.000, reward_mean=0.0, reward_bound=0.0
35603: loss=0.000, reward_mean=0.0, reward_bound=0.0
35604: loss=0.000, reward_mean=0.1, reward_bound=0.0
35605: loss=0.000, reward_mean=0.0, reward_bound=0.0
35606: loss=0.000, reward_mean=0.1, reward_bound=0.0
35607: loss=0.000, reward_mean=0.1, reward_bound=0.0
35608: loss=0.000, reward_mean=0.0, reward_bound=0.0
35609: loss=0.000, reward_mean=0.1, reward_bound=0.0
35610: loss=0.000, reward_mean=0.1, reward_bound=0.0
35611: loss=0.000, reward_mean=0.0, reward_bound=0.0
35612: loss=0.000, reward_mean=0.0, reward_bound=0.0
35613: loss=0.000, reward_mean=0.1, reward_bound=0.0
35614: loss=0.000, reward_mean=0.0, reward_bound=0.0
35615: loss=0.000, reward_mean=0.1, reward_bound=0.0
35616: loss=0.000, reward_mean=0.0, reward_bound=0.0
35617: loss=0.000, reward_mean=0.0, reward_bound=0.0
35618: loss=0.000, reward_mean=0.0, reward_bou

35756: loss=0.000, reward_mean=0.0, reward_bound=0.0
35757: loss=0.000, reward_mean=0.0, reward_bound=0.0
35758: loss=0.000, reward_mean=0.0, reward_bound=0.0
35759: loss=0.000, reward_mean=0.0, reward_bound=0.0
35760: loss=0.000, reward_mean=0.0, reward_bound=0.0
35761: loss=0.000, reward_mean=0.0, reward_bound=0.0
35762: loss=0.000, reward_mean=0.1, reward_bound=0.0
35763: loss=0.000, reward_mean=0.0, reward_bound=0.0
35764: loss=0.000, reward_mean=0.1, reward_bound=0.0
35765: loss=0.000, reward_mean=0.0, reward_bound=0.0
35766: loss=0.000, reward_mean=0.1, reward_bound=0.0
35767: loss=0.000, reward_mean=0.0, reward_bound=0.0
35768: loss=0.000, reward_mean=0.2, reward_bound=0.0
35769: loss=0.000, reward_mean=0.1, reward_bound=0.0
35770: loss=0.000, reward_mean=0.1, reward_bound=0.0
35771: loss=0.000, reward_mean=0.0, reward_bound=0.0
35772: loss=0.000, reward_mean=0.1, reward_bound=0.0
35773: loss=0.000, reward_mean=0.2, reward_bound=0.0
35774: loss=0.000, reward_mean=0.0, reward_bou

35912: loss=0.000, reward_mean=0.1, reward_bound=0.0
35913: loss=0.000, reward_mean=0.1, reward_bound=0.0
35914: loss=0.000, reward_mean=0.1, reward_bound=0.0
35915: loss=0.000, reward_mean=0.0, reward_bound=0.0
35916: loss=0.000, reward_mean=0.1, reward_bound=0.0
35917: loss=0.000, reward_mean=0.1, reward_bound=0.0
35918: loss=0.000, reward_mean=0.1, reward_bound=0.0
35919: loss=0.000, reward_mean=0.0, reward_bound=0.0
35920: loss=0.000, reward_mean=0.1, reward_bound=0.0
35921: loss=0.000, reward_mean=0.1, reward_bound=0.0
35922: loss=0.000, reward_mean=0.0, reward_bound=0.0
35923: loss=0.000, reward_mean=0.1, reward_bound=0.0
35924: loss=0.000, reward_mean=0.0, reward_bound=0.0
35925: loss=0.000, reward_mean=0.0, reward_bound=0.0
35926: loss=0.000, reward_mean=0.0, reward_bound=0.0
35927: loss=0.000, reward_mean=0.1, reward_bound=0.0
35928: loss=0.000, reward_mean=0.0, reward_bound=0.0
35929: loss=0.000, reward_mean=0.0, reward_bound=0.0
35930: loss=0.000, reward_mean=0.1, reward_bou

36068: loss=0.000, reward_mean=0.1, reward_bound=0.0
36069: loss=0.000, reward_mean=0.0, reward_bound=0.0
36070: loss=0.000, reward_mean=0.0, reward_bound=0.0
36071: loss=0.000, reward_mean=0.0, reward_bound=0.0
36072: loss=0.000, reward_mean=0.1, reward_bound=0.0
36073: loss=0.000, reward_mean=0.0, reward_bound=0.0
36074: loss=0.000, reward_mean=0.0, reward_bound=0.0
36075: loss=0.000, reward_mean=0.1, reward_bound=0.0
36076: loss=0.000, reward_mean=0.1, reward_bound=0.0
36077: loss=0.000, reward_mean=0.0, reward_bound=0.0
36078: loss=0.000, reward_mean=0.1, reward_bound=0.0
36079: loss=0.000, reward_mean=0.1, reward_bound=0.0
36080: loss=0.000, reward_mean=0.0, reward_bound=0.0
36081: loss=0.000, reward_mean=0.0, reward_bound=0.0
36082: loss=0.000, reward_mean=0.1, reward_bound=0.0
36083: loss=0.000, reward_mean=0.1, reward_bound=0.0
36084: loss=0.000, reward_mean=0.1, reward_bound=0.0
36085: loss=0.000, reward_mean=0.1, reward_bound=0.0
36086: loss=0.000, reward_mean=0.0, reward_bou

36223: loss=0.000, reward_mean=0.1, reward_bound=0.0
36224: loss=0.000, reward_mean=0.0, reward_bound=0.0
36225: loss=0.000, reward_mean=0.0, reward_bound=0.0
36226: loss=0.000, reward_mean=0.2, reward_bound=0.0
36227: loss=0.000, reward_mean=0.1, reward_bound=0.0
36228: loss=0.000, reward_mean=0.0, reward_bound=0.0
36229: loss=0.000, reward_mean=0.2, reward_bound=0.0
36230: loss=0.000, reward_mean=0.1, reward_bound=0.0
36231: loss=0.000, reward_mean=0.1, reward_bound=0.0
36232: loss=0.000, reward_mean=0.0, reward_bound=0.0
36233: loss=0.000, reward_mean=0.1, reward_bound=0.0
36234: loss=0.000, reward_mean=0.0, reward_bound=0.0
36235: loss=0.000, reward_mean=0.1, reward_bound=0.0
36236: loss=0.000, reward_mean=0.1, reward_bound=0.0
36237: loss=0.000, reward_mean=0.1, reward_bound=0.0
36238: loss=0.000, reward_mean=0.1, reward_bound=0.0
36239: loss=0.000, reward_mean=0.1, reward_bound=0.0
36240: loss=0.000, reward_mean=0.1, reward_bound=0.0
36241: loss=0.000, reward_mean=0.0, reward_bou

36382: loss=0.000, reward_mean=0.1, reward_bound=0.0
36383: loss=0.000, reward_mean=0.0, reward_bound=0.0
36384: loss=0.000, reward_mean=0.1, reward_bound=0.0
36385: loss=0.000, reward_mean=0.0, reward_bound=0.0
36386: loss=0.000, reward_mean=0.0, reward_bound=0.0
36387: loss=0.000, reward_mean=0.1, reward_bound=0.0
36388: loss=0.000, reward_mean=0.1, reward_bound=0.0
36389: loss=0.000, reward_mean=0.1, reward_bound=0.0
36390: loss=0.000, reward_mean=0.1, reward_bound=0.0
36391: loss=0.000, reward_mean=0.2, reward_bound=0.0
36392: loss=0.000, reward_mean=0.1, reward_bound=0.0
36393: loss=0.000, reward_mean=0.1, reward_bound=0.0
36394: loss=0.000, reward_mean=0.0, reward_bound=0.0
36395: loss=0.000, reward_mean=0.0, reward_bound=0.0
36396: loss=0.000, reward_mean=0.1, reward_bound=0.0
36397: loss=0.000, reward_mean=0.0, reward_bound=0.0
36398: loss=0.000, reward_mean=0.1, reward_bound=0.0
36399: loss=0.000, reward_mean=0.2, reward_bound=0.0
36400: loss=0.000, reward_mean=0.0, reward_bou

36541: loss=0.000, reward_mean=0.1, reward_bound=0.0
36542: loss=0.000, reward_mean=0.1, reward_bound=0.0
36543: loss=0.000, reward_mean=0.0, reward_bound=0.0
36544: loss=0.000, reward_mean=0.0, reward_bound=0.0
36545: loss=0.000, reward_mean=0.0, reward_bound=0.0
36546: loss=0.000, reward_mean=0.1, reward_bound=0.0
36547: loss=0.000, reward_mean=0.1, reward_bound=0.0
36548: loss=0.000, reward_mean=0.1, reward_bound=0.0
36549: loss=0.000, reward_mean=0.0, reward_bound=0.0
36550: loss=0.000, reward_mean=0.1, reward_bound=0.0
36551: loss=0.000, reward_mean=0.0, reward_bound=0.0
36552: loss=0.000, reward_mean=0.1, reward_bound=0.0
36553: loss=0.000, reward_mean=0.0, reward_bound=0.0
36554: loss=0.000, reward_mean=0.1, reward_bound=0.0
36555: loss=0.000, reward_mean=0.0, reward_bound=0.0
36556: loss=0.000, reward_mean=0.1, reward_bound=0.0
36557: loss=0.000, reward_mean=0.2, reward_bound=0.0
36558: loss=0.000, reward_mean=0.0, reward_bound=0.0
36559: loss=0.000, reward_mean=0.0, reward_bou

36697: loss=0.000, reward_mean=0.1, reward_bound=0.0
36698: loss=0.000, reward_mean=0.1, reward_bound=0.0
36699: loss=0.000, reward_mean=0.1, reward_bound=0.0
36700: loss=0.000, reward_mean=0.0, reward_bound=0.0
36701: loss=0.000, reward_mean=0.1, reward_bound=0.0
36702: loss=0.000, reward_mean=0.0, reward_bound=0.0
36703: loss=0.000, reward_mean=0.0, reward_bound=0.0
36704: loss=0.000, reward_mean=0.0, reward_bound=0.0
36705: loss=0.000, reward_mean=0.0, reward_bound=0.0
36706: loss=0.000, reward_mean=0.0, reward_bound=0.0
36707: loss=0.000, reward_mean=0.1, reward_bound=0.0
36708: loss=0.000, reward_mean=0.1, reward_bound=0.0
36709: loss=0.000, reward_mean=0.0, reward_bound=0.0
36710: loss=0.000, reward_mean=0.1, reward_bound=0.0
36711: loss=0.000, reward_mean=0.2, reward_bound=0.0
36712: loss=0.000, reward_mean=0.0, reward_bound=0.0
36713: loss=0.000, reward_mean=0.0, reward_bound=0.0
36714: loss=0.000, reward_mean=0.1, reward_bound=0.0
36715: loss=0.000, reward_mean=0.0, reward_bou

36854: loss=0.000, reward_mean=0.1, reward_bound=0.0
36855: loss=0.000, reward_mean=0.1, reward_bound=0.0
36856: loss=0.000, reward_mean=0.0, reward_bound=0.0
36857: loss=0.000, reward_mean=0.0, reward_bound=0.0
36858: loss=0.000, reward_mean=0.1, reward_bound=0.0
36859: loss=0.000, reward_mean=0.0, reward_bound=0.0
36860: loss=0.000, reward_mean=0.1, reward_bound=0.0
36861: loss=0.000, reward_mean=0.1, reward_bound=0.0
36862: loss=0.000, reward_mean=0.0, reward_bound=0.0
36863: loss=0.000, reward_mean=0.0, reward_bound=0.0
36864: loss=0.000, reward_mean=0.0, reward_bound=0.0
36865: loss=0.000, reward_mean=0.1, reward_bound=0.0
36866: loss=0.000, reward_mean=0.1, reward_bound=0.0
36867: loss=0.000, reward_mean=0.1, reward_bound=0.0
36868: loss=0.000, reward_mean=0.1, reward_bound=0.0
36869: loss=0.000, reward_mean=0.0, reward_bound=0.0
36870: loss=0.000, reward_mean=0.2, reward_bound=0.0
36871: loss=0.000, reward_mean=0.0, reward_bound=0.0
36872: loss=0.000, reward_mean=0.1, reward_bou

37009: loss=0.000, reward_mean=0.1, reward_bound=0.0
37010: loss=0.000, reward_mean=0.1, reward_bound=0.0
37011: loss=0.000, reward_mean=0.1, reward_bound=0.0
37012: loss=0.000, reward_mean=0.0, reward_bound=0.0
37013: loss=0.000, reward_mean=0.1, reward_bound=0.0
37014: loss=0.000, reward_mean=0.1, reward_bound=0.0
37015: loss=0.000, reward_mean=0.1, reward_bound=0.0
37016: loss=0.000, reward_mean=0.1, reward_bound=0.0
37017: loss=0.000, reward_mean=0.0, reward_bound=0.0
37018: loss=0.000, reward_mean=0.1, reward_bound=0.0
37019: loss=0.000, reward_mean=0.0, reward_bound=0.0
37020: loss=0.000, reward_mean=0.0, reward_bound=0.0
37021: loss=0.000, reward_mean=0.0, reward_bound=0.0
37022: loss=0.000, reward_mean=0.1, reward_bound=0.0
37023: loss=0.000, reward_mean=0.1, reward_bound=0.0
37024: loss=0.000, reward_mean=0.1, reward_bound=0.0
37025: loss=0.000, reward_mean=0.1, reward_bound=0.0
37026: loss=0.000, reward_mean=0.0, reward_bound=0.0
37027: loss=0.000, reward_mean=0.1, reward_bou

37168: loss=0.000, reward_mean=0.0, reward_bound=0.0
37169: loss=0.000, reward_mean=0.1, reward_bound=0.0
37170: loss=0.000, reward_mean=0.1, reward_bound=0.0
37171: loss=0.000, reward_mean=0.0, reward_bound=0.0
37172: loss=0.000, reward_mean=0.1, reward_bound=0.0
37173: loss=0.000, reward_mean=0.0, reward_bound=0.0
37174: loss=0.000, reward_mean=0.1, reward_bound=0.0
37175: loss=0.000, reward_mean=0.0, reward_bound=0.0
37176: loss=0.000, reward_mean=0.1, reward_bound=0.0
37177: loss=0.000, reward_mean=0.0, reward_bound=0.0
37178: loss=0.000, reward_mean=0.1, reward_bound=0.0
37179: loss=0.000, reward_mean=0.1, reward_bound=0.0
37180: loss=0.000, reward_mean=0.0, reward_bound=0.0
37181: loss=0.000, reward_mean=0.1, reward_bound=0.0
37182: loss=0.000, reward_mean=0.0, reward_bound=0.0
37183: loss=0.000, reward_mean=0.1, reward_bound=0.0
37184: loss=0.000, reward_mean=0.1, reward_bound=0.0
37185: loss=0.000, reward_mean=0.1, reward_bound=0.0
37186: loss=0.000, reward_mean=0.0, reward_bou

37329: loss=0.000, reward_mean=0.1, reward_bound=0.0
37330: loss=0.000, reward_mean=0.1, reward_bound=0.0
37331: loss=0.000, reward_mean=0.1, reward_bound=0.0
37332: loss=0.000, reward_mean=0.0, reward_bound=0.0
37333: loss=0.000, reward_mean=0.0, reward_bound=0.0
37334: loss=0.000, reward_mean=0.1, reward_bound=0.0
37335: loss=0.000, reward_mean=0.0, reward_bound=0.0
37336: loss=0.000, reward_mean=0.1, reward_bound=0.0
37337: loss=0.000, reward_mean=0.1, reward_bound=0.0
37338: loss=0.000, reward_mean=0.1, reward_bound=0.0
37339: loss=0.000, reward_mean=0.0, reward_bound=0.0
37340: loss=0.000, reward_mean=0.2, reward_bound=0.0
37341: loss=0.000, reward_mean=0.0, reward_bound=0.0
37342: loss=0.000, reward_mean=0.0, reward_bound=0.0
37343: loss=0.000, reward_mean=0.1, reward_bound=0.0
37344: loss=0.000, reward_mean=0.1, reward_bound=0.0
37345: loss=0.000, reward_mean=0.0, reward_bound=0.0
37346: loss=0.000, reward_mean=0.1, reward_bound=0.0
37347: loss=0.000, reward_mean=0.0, reward_bou

37488: loss=0.000, reward_mean=0.1, reward_bound=0.0
37489: loss=0.000, reward_mean=0.1, reward_bound=0.0
37490: loss=0.000, reward_mean=0.0, reward_bound=0.0
37491: loss=0.000, reward_mean=0.1, reward_bound=0.0
37492: loss=0.000, reward_mean=0.1, reward_bound=0.0
37493: loss=0.000, reward_mean=0.0, reward_bound=0.0
37494: loss=0.000, reward_mean=0.1, reward_bound=0.0
37495: loss=0.000, reward_mean=0.1, reward_bound=0.0
37496: loss=0.000, reward_mean=0.0, reward_bound=0.0
37497: loss=0.000, reward_mean=0.2, reward_bound=0.0
37498: loss=0.000, reward_mean=0.0, reward_bound=0.0
37499: loss=0.000, reward_mean=0.0, reward_bound=0.0
37500: loss=0.000, reward_mean=0.1, reward_bound=0.0
37501: loss=0.000, reward_mean=0.1, reward_bound=0.0
37502: loss=0.000, reward_mean=0.1, reward_bound=0.0
37503: loss=0.000, reward_mean=0.1, reward_bound=0.0
37504: loss=0.000, reward_mean=0.1, reward_bound=0.0
37505: loss=0.000, reward_mean=0.1, reward_bound=0.0
37506: loss=0.000, reward_mean=0.1, reward_bou

37647: loss=0.000, reward_mean=0.0, reward_bound=0.0
37648: loss=0.000, reward_mean=0.0, reward_bound=0.0
37649: loss=0.000, reward_mean=0.1, reward_bound=0.0
37650: loss=0.000, reward_mean=0.0, reward_bound=0.0
37651: loss=0.000, reward_mean=0.0, reward_bound=0.0
37652: loss=0.000, reward_mean=0.1, reward_bound=0.0
37653: loss=0.000, reward_mean=0.0, reward_bound=0.0
37654: loss=0.000, reward_mean=0.1, reward_bound=0.0
37655: loss=0.000, reward_mean=0.2, reward_bound=0.0
37656: loss=0.000, reward_mean=0.0, reward_bound=0.0
37657: loss=0.000, reward_mean=0.0, reward_bound=0.0
37658: loss=0.000, reward_mean=0.1, reward_bound=0.0
37659: loss=0.000, reward_mean=0.1, reward_bound=0.0
37660: loss=0.000, reward_mean=0.1, reward_bound=0.0
37661: loss=0.000, reward_mean=0.1, reward_bound=0.0
37662: loss=0.000, reward_mean=0.0, reward_bound=0.0
37663: loss=0.000, reward_mean=0.1, reward_bound=0.0
37664: loss=0.000, reward_mean=0.0, reward_bound=0.0
37665: loss=0.000, reward_mean=0.1, reward_bou

37803: loss=0.000, reward_mean=0.1, reward_bound=0.0
37804: loss=0.000, reward_mean=0.1, reward_bound=0.0
37805: loss=0.000, reward_mean=0.1, reward_bound=0.0
37806: loss=0.000, reward_mean=0.0, reward_bound=0.0
37807: loss=0.000, reward_mean=0.0, reward_bound=0.0
37808: loss=0.000, reward_mean=0.0, reward_bound=0.0
37809: loss=0.000, reward_mean=0.0, reward_bound=0.0
37810: loss=0.000, reward_mean=0.1, reward_bound=0.0
37811: loss=0.000, reward_mean=0.0, reward_bound=0.0
37812: loss=0.000, reward_mean=0.1, reward_bound=0.0
37813: loss=0.000, reward_mean=0.1, reward_bound=0.0
37814: loss=0.000, reward_mean=0.0, reward_bound=0.0
37815: loss=0.000, reward_mean=0.1, reward_bound=0.0
37816: loss=0.000, reward_mean=0.1, reward_bound=0.0
37817: loss=0.000, reward_mean=0.0, reward_bound=0.0
37818: loss=0.000, reward_mean=0.1, reward_bound=0.0
37819: loss=0.000, reward_mean=0.1, reward_bound=0.0
37820: loss=0.000, reward_mean=0.1, reward_bound=0.0
37821: loss=0.000, reward_mean=0.1, reward_bou

37961: loss=0.000, reward_mean=0.1, reward_bound=0.0
37962: loss=0.000, reward_mean=0.1, reward_bound=0.0
37963: loss=0.000, reward_mean=0.0, reward_bound=0.0
37964: loss=0.000, reward_mean=0.0, reward_bound=0.0
37965: loss=0.000, reward_mean=0.0, reward_bound=0.0
37966: loss=0.000, reward_mean=0.1, reward_bound=0.0
37967: loss=0.000, reward_mean=0.0, reward_bound=0.0
37968: loss=0.000, reward_mean=0.0, reward_bound=0.0
37969: loss=0.000, reward_mean=0.2, reward_bound=0.0
37970: loss=0.000, reward_mean=0.2, reward_bound=0.0
37971: loss=0.000, reward_mean=0.1, reward_bound=0.0
37972: loss=0.000, reward_mean=0.2, reward_bound=0.0
37973: loss=0.000, reward_mean=0.1, reward_bound=0.0
37974: loss=0.000, reward_mean=0.0, reward_bound=0.0
37975: loss=0.000, reward_mean=0.1, reward_bound=0.0
37976: loss=0.000, reward_mean=0.1, reward_bound=0.0
37977: loss=0.000, reward_mean=0.0, reward_bound=0.0
37978: loss=0.000, reward_mean=0.2, reward_bound=0.0
37979: loss=0.000, reward_mean=0.1, reward_bou

38119: loss=0.000, reward_mean=0.0, reward_bound=0.0
38120: loss=0.000, reward_mean=0.0, reward_bound=0.0
38121: loss=0.000, reward_mean=0.1, reward_bound=0.0
38122: loss=0.000, reward_mean=0.0, reward_bound=0.0
38123: loss=0.000, reward_mean=0.0, reward_bound=0.0
38124: loss=0.000, reward_mean=0.0, reward_bound=0.0
38125: loss=0.000, reward_mean=0.1, reward_bound=0.0
38126: loss=0.000, reward_mean=0.0, reward_bound=0.0
38127: loss=0.000, reward_mean=0.0, reward_bound=0.0
38128: loss=0.000, reward_mean=0.1, reward_bound=0.0
38129: loss=0.000, reward_mean=0.1, reward_bound=0.0
38130: loss=0.000, reward_mean=0.0, reward_bound=0.0
38131: loss=0.000, reward_mean=0.1, reward_bound=0.0
38132: loss=0.000, reward_mean=0.1, reward_bound=0.0
38133: loss=0.000, reward_mean=0.1, reward_bound=0.0
38134: loss=0.000, reward_mean=0.0, reward_bound=0.0
38135: loss=0.000, reward_mean=0.1, reward_bound=0.0
38136: loss=0.000, reward_mean=0.1, reward_bound=0.0
38137: loss=0.000, reward_mean=0.1, reward_bou

38275: loss=0.000, reward_mean=0.0, reward_bound=0.0
38276: loss=0.000, reward_mean=0.0, reward_bound=0.0
38277: loss=0.000, reward_mean=0.1, reward_bound=0.0
38278: loss=0.000, reward_mean=0.1, reward_bound=0.0
38279: loss=0.000, reward_mean=0.0, reward_bound=0.0
38280: loss=0.000, reward_mean=0.0, reward_bound=0.0
38281: loss=0.000, reward_mean=0.1, reward_bound=0.0
38282: loss=0.000, reward_mean=0.0, reward_bound=0.0
38283: loss=0.000, reward_mean=0.0, reward_bound=0.0
38284: loss=0.000, reward_mean=0.0, reward_bound=0.0
38285: loss=0.000, reward_mean=0.1, reward_bound=0.0
38286: loss=0.000, reward_mean=0.1, reward_bound=0.0
38287: loss=0.000, reward_mean=0.2, reward_bound=0.0
38288: loss=0.000, reward_mean=0.0, reward_bound=0.0
38289: loss=0.000, reward_mean=0.0, reward_bound=0.0
38290: loss=0.000, reward_mean=0.0, reward_bound=0.0
38291: loss=0.000, reward_mean=0.1, reward_bound=0.0
38292: loss=0.000, reward_mean=0.0, reward_bound=0.0
38293: loss=0.000, reward_mean=0.0, reward_bou

38432: loss=0.000, reward_mean=0.0, reward_bound=0.0
38433: loss=0.000, reward_mean=0.1, reward_bound=0.0
38434: loss=0.000, reward_mean=0.0, reward_bound=0.0
38435: loss=0.000, reward_mean=0.1, reward_bound=0.0
38436: loss=0.000, reward_mean=0.1, reward_bound=0.0
38437: loss=0.000, reward_mean=0.1, reward_bound=0.0
38438: loss=0.000, reward_mean=0.1, reward_bound=0.0
38439: loss=0.000, reward_mean=0.1, reward_bound=0.0
38440: loss=0.000, reward_mean=0.0, reward_bound=0.0
38441: loss=0.000, reward_mean=0.0, reward_bound=0.0
38442: loss=0.000, reward_mean=0.0, reward_bound=0.0
38443: loss=0.000, reward_mean=0.1, reward_bound=0.0
38444: loss=0.000, reward_mean=0.2, reward_bound=0.0
38445: loss=0.000, reward_mean=0.1, reward_bound=0.0
38446: loss=0.000, reward_mean=0.1, reward_bound=0.0
38447: loss=0.000, reward_mean=0.1, reward_bound=0.0
38448: loss=0.000, reward_mean=0.0, reward_bound=0.0
38449: loss=0.000, reward_mean=0.1, reward_bound=0.0
38450: loss=0.000, reward_mean=0.1, reward_bou

38593: loss=0.000, reward_mean=0.1, reward_bound=0.0
38594: loss=0.000, reward_mean=0.0, reward_bound=0.0
38595: loss=0.000, reward_mean=0.1, reward_bound=0.0
38596: loss=0.000, reward_mean=0.1, reward_bound=0.0
38597: loss=0.000, reward_mean=0.1, reward_bound=0.0
38598: loss=0.000, reward_mean=0.1, reward_bound=0.0
38599: loss=0.000, reward_mean=0.0, reward_bound=0.0
38600: loss=0.000, reward_mean=0.2, reward_bound=0.0
38601: loss=0.000, reward_mean=0.1, reward_bound=0.0
38602: loss=0.000, reward_mean=0.1, reward_bound=0.0
38603: loss=0.000, reward_mean=0.1, reward_bound=0.0
38604: loss=0.000, reward_mean=0.1, reward_bound=0.0
38605: loss=0.000, reward_mean=0.0, reward_bound=0.0
38606: loss=0.000, reward_mean=0.1, reward_bound=0.0
38607: loss=0.000, reward_mean=0.1, reward_bound=0.0
38608: loss=0.000, reward_mean=0.1, reward_bound=0.0
38609: loss=0.000, reward_mean=0.1, reward_bound=0.0
38610: loss=0.000, reward_mean=0.0, reward_bound=0.0
38611: loss=0.000, reward_mean=0.0, reward_bou

38754: loss=0.000, reward_mean=0.0, reward_bound=0.0
38755: loss=0.000, reward_mean=0.1, reward_bound=0.0
38756: loss=0.000, reward_mean=0.0, reward_bound=0.0
38757: loss=0.000, reward_mean=0.1, reward_bound=0.0
38758: loss=0.000, reward_mean=0.0, reward_bound=0.0
38759: loss=0.000, reward_mean=0.1, reward_bound=0.0
38760: loss=0.000, reward_mean=0.1, reward_bound=0.0
38761: loss=0.000, reward_mean=0.1, reward_bound=0.0
38762: loss=0.000, reward_mean=0.0, reward_bound=0.0
38763: loss=0.000, reward_mean=0.0, reward_bound=0.0
38764: loss=0.000, reward_mean=0.1, reward_bound=0.0
38765: loss=0.000, reward_mean=0.1, reward_bound=0.0
38766: loss=0.000, reward_mean=0.0, reward_bound=0.0
38767: loss=0.000, reward_mean=0.0, reward_bound=0.0
38768: loss=0.000, reward_mean=0.1, reward_bound=0.0
38769: loss=0.000, reward_mean=0.1, reward_bound=0.0
38770: loss=0.000, reward_mean=0.1, reward_bound=0.0
38771: loss=0.000, reward_mean=0.1, reward_bound=0.0
38772: loss=0.000, reward_mean=0.1, reward_bou

38912: loss=0.000, reward_mean=0.0, reward_bound=0.0
38913: loss=0.000, reward_mean=0.1, reward_bound=0.0
38914: loss=0.000, reward_mean=0.1, reward_bound=0.0
38915: loss=0.000, reward_mean=0.1, reward_bound=0.0
38916: loss=0.000, reward_mean=0.0, reward_bound=0.0
38917: loss=0.000, reward_mean=0.1, reward_bound=0.0
38918: loss=0.000, reward_mean=0.1, reward_bound=0.0
38919: loss=0.000, reward_mean=0.0, reward_bound=0.0
38920: loss=0.000, reward_mean=0.0, reward_bound=0.0
38921: loss=0.000, reward_mean=0.1, reward_bound=0.0
38922: loss=0.000, reward_mean=0.0, reward_bound=0.0
38923: loss=0.000, reward_mean=0.0, reward_bound=0.0
38924: loss=0.000, reward_mean=0.0, reward_bound=0.0
38925: loss=0.000, reward_mean=0.1, reward_bound=0.0
38926: loss=0.000, reward_mean=0.1, reward_bound=0.0
38927: loss=0.000, reward_mean=0.1, reward_bound=0.0
38928: loss=0.000, reward_mean=0.0, reward_bound=0.0
38929: loss=0.000, reward_mean=0.1, reward_bound=0.0
38930: loss=0.000, reward_mean=0.1, reward_bou

39072: loss=0.000, reward_mean=0.1, reward_bound=0.0
39073: loss=0.000, reward_mean=0.0, reward_bound=0.0
39074: loss=0.000, reward_mean=0.1, reward_bound=0.0
39075: loss=0.000, reward_mean=0.1, reward_bound=0.0
39076: loss=0.000, reward_mean=0.1, reward_bound=0.0
39077: loss=0.000, reward_mean=0.0, reward_bound=0.0
39078: loss=0.000, reward_mean=0.0, reward_bound=0.0
39079: loss=0.000, reward_mean=0.1, reward_bound=0.0
39080: loss=0.000, reward_mean=0.0, reward_bound=0.0
39081: loss=0.000, reward_mean=0.1, reward_bound=0.0
39082: loss=0.000, reward_mean=0.1, reward_bound=0.0
39083: loss=0.000, reward_mean=0.1, reward_bound=0.0
39084: loss=0.000, reward_mean=0.0, reward_bound=0.0
39085: loss=0.000, reward_mean=0.1, reward_bound=0.0
39086: loss=0.000, reward_mean=0.1, reward_bound=0.0
39087: loss=0.000, reward_mean=0.0, reward_bound=0.0
39088: loss=0.000, reward_mean=0.0, reward_bound=0.0
39089: loss=0.000, reward_mean=0.0, reward_bound=0.0
39090: loss=0.000, reward_mean=0.1, reward_bou

39228: loss=0.000, reward_mean=0.0, reward_bound=0.0
39229: loss=0.000, reward_mean=0.1, reward_bound=0.0
39230: loss=0.000, reward_mean=0.1, reward_bound=0.0
39231: loss=0.000, reward_mean=0.0, reward_bound=0.0
39232: loss=0.000, reward_mean=0.1, reward_bound=0.0
39233: loss=0.000, reward_mean=0.0, reward_bound=0.0
39234: loss=0.000, reward_mean=0.1, reward_bound=0.0
39235: loss=0.000, reward_mean=0.0, reward_bound=0.0
39236: loss=0.000, reward_mean=0.0, reward_bound=0.0
39237: loss=0.000, reward_mean=0.0, reward_bound=0.0
39238: loss=0.000, reward_mean=0.1, reward_bound=0.0
39239: loss=0.000, reward_mean=0.1, reward_bound=0.0
39240: loss=0.000, reward_mean=0.1, reward_bound=0.0
39241: loss=0.000, reward_mean=0.0, reward_bound=0.0
39242: loss=0.000, reward_mean=0.1, reward_bound=0.0
39243: loss=0.000, reward_mean=0.0, reward_bound=0.0
39244: loss=0.000, reward_mean=0.1, reward_bound=0.0
39245: loss=0.000, reward_mean=0.1, reward_bound=0.0
39246: loss=0.000, reward_mean=0.2, reward_bou

39383: loss=0.000, reward_mean=0.0, reward_bound=0.0
39384: loss=0.000, reward_mean=0.1, reward_bound=0.0
39385: loss=0.000, reward_mean=0.1, reward_bound=0.0
39386: loss=0.000, reward_mean=0.0, reward_bound=0.0
39387: loss=0.000, reward_mean=0.1, reward_bound=0.0
39388: loss=0.000, reward_mean=0.1, reward_bound=0.0
39389: loss=0.000, reward_mean=0.1, reward_bound=0.0
39390: loss=0.000, reward_mean=0.1, reward_bound=0.0
39391: loss=0.000, reward_mean=0.0, reward_bound=0.0
39392: loss=0.000, reward_mean=0.1, reward_bound=0.0
39393: loss=0.000, reward_mean=0.2, reward_bound=0.0
39394: loss=0.000, reward_mean=0.1, reward_bound=0.0
39395: loss=0.000, reward_mean=0.1, reward_bound=0.0
39396: loss=0.000, reward_mean=0.1, reward_bound=0.0
39397: loss=0.000, reward_mean=0.0, reward_bound=0.0
39398: loss=0.000, reward_mean=0.0, reward_bound=0.0
39399: loss=0.000, reward_mean=0.1, reward_bound=0.0
39400: loss=0.000, reward_mean=0.1, reward_bound=0.0
39401: loss=0.000, reward_mean=0.0, reward_bou

39545: loss=0.000, reward_mean=0.1, reward_bound=0.0
39546: loss=0.000, reward_mean=0.0, reward_bound=0.0
39547: loss=0.000, reward_mean=0.2, reward_bound=0.0
39548: loss=0.000, reward_mean=0.0, reward_bound=0.0
39549: loss=0.000, reward_mean=0.0, reward_bound=0.0
39550: loss=0.000, reward_mean=0.0, reward_bound=0.0
39551: loss=0.000, reward_mean=0.0, reward_bound=0.0
39552: loss=0.000, reward_mean=0.1, reward_bound=0.0
39553: loss=0.000, reward_mean=0.0, reward_bound=0.0
39554: loss=0.000, reward_mean=0.1, reward_bound=0.0
39555: loss=0.000, reward_mean=0.2, reward_bound=0.0
39556: loss=0.000, reward_mean=0.1, reward_bound=0.0
39557: loss=0.000, reward_mean=0.0, reward_bound=0.0
39558: loss=0.000, reward_mean=0.0, reward_bound=0.0
39559: loss=0.000, reward_mean=0.0, reward_bound=0.0
39560: loss=0.000, reward_mean=0.0, reward_bound=0.0
39561: loss=0.000, reward_mean=0.0, reward_bound=0.0
39562: loss=0.000, reward_mean=0.0, reward_bound=0.0
39563: loss=0.000, reward_mean=0.0, reward_bou

39707: loss=0.000, reward_mean=0.0, reward_bound=0.0
39708: loss=0.000, reward_mean=0.1, reward_bound=0.0
39709: loss=0.000, reward_mean=0.1, reward_bound=0.0
39710: loss=0.000, reward_mean=0.0, reward_bound=0.0
39711: loss=0.000, reward_mean=0.1, reward_bound=0.0
39712: loss=0.000, reward_mean=0.0, reward_bound=0.0
39713: loss=0.000, reward_mean=0.1, reward_bound=0.0
39714: loss=0.000, reward_mean=0.0, reward_bound=0.0
39715: loss=0.000, reward_mean=0.1, reward_bound=0.0
39716: loss=0.000, reward_mean=0.1, reward_bound=0.0
39717: loss=0.000, reward_mean=0.0, reward_bound=0.0
39718: loss=0.000, reward_mean=0.1, reward_bound=0.0
39719: loss=0.000, reward_mean=0.0, reward_bound=0.0
39720: loss=0.000, reward_mean=0.1, reward_bound=0.0
39721: loss=0.000, reward_mean=0.1, reward_bound=0.0
39722: loss=0.000, reward_mean=0.1, reward_bound=0.0
39723: loss=0.000, reward_mean=0.0, reward_bound=0.0
39724: loss=0.000, reward_mean=0.0, reward_bound=0.0
39725: loss=0.000, reward_mean=0.1, reward_bou

39867: loss=0.000, reward_mean=0.1, reward_bound=0.0
39868: loss=0.000, reward_mean=0.1, reward_bound=0.0
39869: loss=0.000, reward_mean=0.1, reward_bound=0.0
39870: loss=0.000, reward_mean=0.0, reward_bound=0.0
39871: loss=0.000, reward_mean=0.1, reward_bound=0.0
39872: loss=0.000, reward_mean=0.1, reward_bound=0.0
39873: loss=0.000, reward_mean=0.1, reward_bound=0.0
39874: loss=0.000, reward_mean=0.1, reward_bound=0.0
39875: loss=0.000, reward_mean=0.0, reward_bound=0.0
39876: loss=0.000, reward_mean=0.1, reward_bound=0.0
39877: loss=0.000, reward_mean=0.1, reward_bound=0.0
39878: loss=0.000, reward_mean=0.1, reward_bound=0.0
39879: loss=0.000, reward_mean=0.0, reward_bound=0.0
39880: loss=0.000, reward_mean=0.1, reward_bound=0.0
39881: loss=0.000, reward_mean=0.1, reward_bound=0.0
39882: loss=0.000, reward_mean=0.0, reward_bound=0.0
39883: loss=0.000, reward_mean=0.1, reward_bound=0.0
39884: loss=0.000, reward_mean=0.1, reward_bound=0.0
39885: loss=0.000, reward_mean=0.0, reward_bou

40022: loss=0.000, reward_mean=0.0, reward_bound=0.0
40023: loss=0.000, reward_mean=0.1, reward_bound=0.0
40024: loss=0.000, reward_mean=0.1, reward_bound=0.0
40025: loss=0.000, reward_mean=0.0, reward_bound=0.0
40026: loss=0.000, reward_mean=0.2, reward_bound=0.0
40027: loss=0.000, reward_mean=0.0, reward_bound=0.0
40028: loss=0.000, reward_mean=0.0, reward_bound=0.0
40029: loss=0.000, reward_mean=0.1, reward_bound=0.0
40030: loss=0.000, reward_mean=0.0, reward_bound=0.0
40031: loss=0.000, reward_mean=0.0, reward_bound=0.0
40032: loss=0.000, reward_mean=0.0, reward_bound=0.0
40033: loss=0.000, reward_mean=0.1, reward_bound=0.0
40034: loss=0.000, reward_mean=0.1, reward_bound=0.0
40035: loss=0.000, reward_mean=0.0, reward_bound=0.0
40036: loss=0.000, reward_mean=0.1, reward_bound=0.0
40037: loss=0.000, reward_mean=0.1, reward_bound=0.0
40038: loss=0.000, reward_mean=0.0, reward_bound=0.0
40039: loss=0.000, reward_mean=0.1, reward_bound=0.0
40040: loss=0.000, reward_mean=0.1, reward_bou

40178: loss=0.000, reward_mean=0.1, reward_bound=0.0
40179: loss=0.000, reward_mean=0.1, reward_bound=0.0
40180: loss=0.000, reward_mean=0.1, reward_bound=0.0
40181: loss=0.000, reward_mean=0.1, reward_bound=0.0
40182: loss=0.000, reward_mean=0.0, reward_bound=0.0
40183: loss=0.000, reward_mean=0.0, reward_bound=0.0
40184: loss=0.000, reward_mean=0.1, reward_bound=0.0
40185: loss=0.000, reward_mean=0.1, reward_bound=0.0
40186: loss=0.000, reward_mean=0.1, reward_bound=0.0
40187: loss=0.000, reward_mean=0.1, reward_bound=0.0
40188: loss=0.000, reward_mean=0.0, reward_bound=0.0
40189: loss=0.000, reward_mean=0.0, reward_bound=0.0
40190: loss=0.000, reward_mean=0.1, reward_bound=0.0
40191: loss=0.000, reward_mean=0.2, reward_bound=0.0
40192: loss=0.000, reward_mean=0.1, reward_bound=0.0
40193: loss=0.000, reward_mean=0.1, reward_bound=0.0
40194: loss=0.000, reward_mean=0.2, reward_bound=0.0
40195: loss=0.000, reward_mean=0.0, reward_bound=0.0
40196: loss=0.000, reward_mean=0.0, reward_bou

40334: loss=0.000, reward_mean=0.1, reward_bound=0.0
40335: loss=0.000, reward_mean=0.1, reward_bound=0.0
40336: loss=0.000, reward_mean=0.0, reward_bound=0.0
40337: loss=0.000, reward_mean=0.0, reward_bound=0.0
40338: loss=0.000, reward_mean=0.1, reward_bound=0.0
40339: loss=0.000, reward_mean=0.1, reward_bound=0.0
40340: loss=0.000, reward_mean=0.1, reward_bound=0.0
40341: loss=0.000, reward_mean=0.1, reward_bound=0.0
40342: loss=0.000, reward_mean=0.0, reward_bound=0.0
40343: loss=0.000, reward_mean=0.2, reward_bound=0.0
40344: loss=0.000, reward_mean=0.0, reward_bound=0.0
40345: loss=0.000, reward_mean=0.1, reward_bound=0.0
40346: loss=0.000, reward_mean=0.1, reward_bound=0.0
40347: loss=0.000, reward_mean=0.0, reward_bound=0.0
40348: loss=0.000, reward_mean=0.0, reward_bound=0.0
40349: loss=0.000, reward_mean=0.1, reward_bound=0.0
40350: loss=0.000, reward_mean=0.1, reward_bound=0.0
40351: loss=0.000, reward_mean=0.1, reward_bound=0.0
40352: loss=0.000, reward_mean=0.0, reward_bou

40491: loss=0.000, reward_mean=0.1, reward_bound=0.0
40492: loss=0.000, reward_mean=0.1, reward_bound=0.0
40493: loss=0.000, reward_mean=0.1, reward_bound=0.0
40494: loss=0.000, reward_mean=0.0, reward_bound=0.0
40495: loss=0.000, reward_mean=0.0, reward_bound=0.0
40496: loss=0.000, reward_mean=0.0, reward_bound=0.0
40497: loss=0.000, reward_mean=0.1, reward_bound=0.0
40498: loss=0.000, reward_mean=0.0, reward_bound=0.0
40499: loss=0.000, reward_mean=0.0, reward_bound=0.0
40500: loss=0.000, reward_mean=0.1, reward_bound=0.0
40501: loss=0.000, reward_mean=0.1, reward_bound=0.0
40502: loss=0.000, reward_mean=0.0, reward_bound=0.0
40503: loss=0.000, reward_mean=0.0, reward_bound=0.0
40504: loss=0.000, reward_mean=0.1, reward_bound=0.0
40505: loss=0.000, reward_mean=0.0, reward_bound=0.0
40506: loss=0.000, reward_mean=0.1, reward_bound=0.0
40507: loss=0.000, reward_mean=0.0, reward_bound=0.0
40508: loss=0.000, reward_mean=0.0, reward_bound=0.0
40509: loss=0.000, reward_mean=0.2, reward_bou

40652: loss=0.000, reward_mean=0.1, reward_bound=0.0
40653: loss=0.000, reward_mean=0.0, reward_bound=0.0
40654: loss=0.000, reward_mean=0.1, reward_bound=0.0
40655: loss=0.000, reward_mean=0.2, reward_bound=0.0
40656: loss=0.000, reward_mean=0.1, reward_bound=0.0
40657: loss=0.000, reward_mean=0.1, reward_bound=0.0
40658: loss=0.000, reward_mean=0.0, reward_bound=0.0
40659: loss=0.000, reward_mean=0.1, reward_bound=0.0
40660: loss=0.000, reward_mean=0.1, reward_bound=0.0
40661: loss=0.000, reward_mean=0.1, reward_bound=0.0
40662: loss=0.000, reward_mean=0.0, reward_bound=0.0
40663: loss=0.000, reward_mean=0.1, reward_bound=0.0
40664: loss=0.000, reward_mean=0.1, reward_bound=0.0
40665: loss=0.000, reward_mean=0.0, reward_bound=0.0
40666: loss=0.000, reward_mean=0.0, reward_bound=0.0
40667: loss=0.000, reward_mean=0.0, reward_bound=0.0
40668: loss=0.000, reward_mean=0.1, reward_bound=0.0
40669: loss=0.000, reward_mean=0.0, reward_bound=0.0
40670: loss=0.000, reward_mean=0.0, reward_bou

40810: loss=0.000, reward_mean=0.1, reward_bound=0.0
40811: loss=0.000, reward_mean=0.0, reward_bound=0.0
40812: loss=0.000, reward_mean=0.0, reward_bound=0.0
40813: loss=0.000, reward_mean=0.1, reward_bound=0.0
40814: loss=0.000, reward_mean=0.1, reward_bound=0.0
40815: loss=0.000, reward_mean=0.1, reward_bound=0.0
40816: loss=0.000, reward_mean=0.0, reward_bound=0.0
40817: loss=0.000, reward_mean=0.1, reward_bound=0.0
40818: loss=0.000, reward_mean=0.1, reward_bound=0.0
40819: loss=0.000, reward_mean=0.2, reward_bound=0.0
40820: loss=0.000, reward_mean=0.0, reward_bound=0.0
40821: loss=0.000, reward_mean=0.0, reward_bound=0.0
40822: loss=0.000, reward_mean=0.0, reward_bound=0.0
40823: loss=0.000, reward_mean=0.0, reward_bound=0.0
40824: loss=0.000, reward_mean=0.2, reward_bound=0.0
40825: loss=0.000, reward_mean=0.0, reward_bound=0.0
40826: loss=0.000, reward_mean=0.1, reward_bound=0.0
40827: loss=0.000, reward_mean=0.1, reward_bound=0.0
40828: loss=0.000, reward_mean=0.1, reward_bou

40965: loss=0.000, reward_mean=0.0, reward_bound=0.0
40966: loss=0.000, reward_mean=0.1, reward_bound=0.0
40967: loss=0.000, reward_mean=0.0, reward_bound=0.0
40968: loss=0.000, reward_mean=0.0, reward_bound=0.0
40969: loss=0.000, reward_mean=0.0, reward_bound=0.0
40970: loss=0.000, reward_mean=0.0, reward_bound=0.0
40971: loss=0.000, reward_mean=0.1, reward_bound=0.0
40972: loss=0.000, reward_mean=0.0, reward_bound=0.0
40973: loss=0.000, reward_mean=0.0, reward_bound=0.0
40974: loss=0.000, reward_mean=0.0, reward_bound=0.0
40975: loss=0.000, reward_mean=0.1, reward_bound=0.0
40976: loss=0.000, reward_mean=0.0, reward_bound=0.0
40977: loss=0.000, reward_mean=0.1, reward_bound=0.0
40978: loss=0.000, reward_mean=0.0, reward_bound=0.0
40979: loss=0.000, reward_mean=0.1, reward_bound=0.0
40980: loss=0.000, reward_mean=0.1, reward_bound=0.0
40981: loss=0.000, reward_mean=0.1, reward_bound=0.0
40982: loss=0.000, reward_mean=0.0, reward_bound=0.0
40983: loss=0.000, reward_mean=0.0, reward_bou

41121: loss=0.000, reward_mean=0.0, reward_bound=0.0
41122: loss=0.000, reward_mean=0.1, reward_bound=0.0
41123: loss=0.000, reward_mean=0.0, reward_bound=0.0
41124: loss=0.000, reward_mean=0.2, reward_bound=0.0
41125: loss=0.000, reward_mean=0.2, reward_bound=0.0
41126: loss=0.000, reward_mean=0.1, reward_bound=0.0
41127: loss=0.000, reward_mean=0.1, reward_bound=0.0
41128: loss=0.000, reward_mean=0.0, reward_bound=0.0
41129: loss=0.000, reward_mean=0.0, reward_bound=0.0
41130: loss=0.000, reward_mean=0.1, reward_bound=0.0
41131: loss=0.000, reward_mean=0.0, reward_bound=0.0
41132: loss=0.000, reward_mean=0.1, reward_bound=0.0
41133: loss=0.000, reward_mean=0.0, reward_bound=0.0
41134: loss=0.000, reward_mean=0.1, reward_bound=0.0
41135: loss=0.000, reward_mean=0.1, reward_bound=0.0
41136: loss=0.000, reward_mean=0.0, reward_bound=0.0
41137: loss=0.000, reward_mean=0.0, reward_bound=0.0
41138: loss=0.000, reward_mean=0.1, reward_bound=0.0
41139: loss=0.000, reward_mean=0.1, reward_bou

41281: loss=0.000, reward_mean=0.3, reward_bound=0.5
41282: loss=0.000, reward_mean=0.0, reward_bound=0.0
41283: loss=0.000, reward_mean=0.0, reward_bound=0.0
41284: loss=0.000, reward_mean=0.1, reward_bound=0.0
41285: loss=0.000, reward_mean=0.0, reward_bound=0.0
41286: loss=0.000, reward_mean=0.1, reward_bound=0.0
41287: loss=0.000, reward_mean=0.0, reward_bound=0.0
41288: loss=0.000, reward_mean=0.1, reward_bound=0.0
41289: loss=0.000, reward_mean=0.0, reward_bound=0.0
41290: loss=0.000, reward_mean=0.1, reward_bound=0.0
41291: loss=0.000, reward_mean=0.0, reward_bound=0.0
41292: loss=0.000, reward_mean=0.1, reward_bound=0.0
41293: loss=0.000, reward_mean=0.1, reward_bound=0.0
41294: loss=0.000, reward_mean=0.1, reward_bound=0.0
41295: loss=0.000, reward_mean=0.0, reward_bound=0.0
41296: loss=0.000, reward_mean=0.0, reward_bound=0.0
41297: loss=0.000, reward_mean=0.2, reward_bound=0.0
41298: loss=0.000, reward_mean=0.1, reward_bound=0.0
41299: loss=0.000, reward_mean=0.1, reward_bou

41436: loss=0.000, reward_mean=0.0, reward_bound=0.0
41437: loss=0.000, reward_mean=0.1, reward_bound=0.0
41438: loss=0.000, reward_mean=0.1, reward_bound=0.0
41439: loss=0.000, reward_mean=0.1, reward_bound=0.0
41440: loss=0.000, reward_mean=0.0, reward_bound=0.0
41441: loss=0.000, reward_mean=0.1, reward_bound=0.0
41442: loss=0.000, reward_mean=0.0, reward_bound=0.0
41443: loss=0.000, reward_mean=0.1, reward_bound=0.0
41444: loss=0.000, reward_mean=0.0, reward_bound=0.0
41445: loss=0.000, reward_mean=0.0, reward_bound=0.0
41446: loss=0.000, reward_mean=0.0, reward_bound=0.0
41447: loss=0.000, reward_mean=0.1, reward_bound=0.0
41448: loss=0.000, reward_mean=0.1, reward_bound=0.0
41449: loss=0.000, reward_mean=0.0, reward_bound=0.0
41450: loss=0.000, reward_mean=0.0, reward_bound=0.0
41451: loss=0.000, reward_mean=0.0, reward_bound=0.0
41452: loss=0.000, reward_mean=0.0, reward_bound=0.0
41453: loss=0.000, reward_mean=0.0, reward_bound=0.0
41454: loss=0.000, reward_mean=0.1, reward_bou

41592: loss=0.000, reward_mean=0.2, reward_bound=0.0
41593: loss=0.000, reward_mean=0.1, reward_bound=0.0
41594: loss=0.000, reward_mean=0.1, reward_bound=0.0
41595: loss=0.000, reward_mean=0.1, reward_bound=0.0
41596: loss=0.000, reward_mean=0.1, reward_bound=0.0
41597: loss=0.000, reward_mean=0.0, reward_bound=0.0
41598: loss=0.000, reward_mean=0.1, reward_bound=0.0
41599: loss=0.000, reward_mean=0.1, reward_bound=0.0
41600: loss=0.000, reward_mean=0.0, reward_bound=0.0
41601: loss=0.000, reward_mean=0.0, reward_bound=0.0
41602: loss=0.000, reward_mean=0.1, reward_bound=0.0
41603: loss=0.000, reward_mean=0.0, reward_bound=0.0
41604: loss=0.000, reward_mean=0.1, reward_bound=0.0
41605: loss=0.000, reward_mean=0.1, reward_bound=0.0
41606: loss=0.000, reward_mean=0.1, reward_bound=0.0
41607: loss=0.000, reward_mean=0.0, reward_bound=0.0
41608: loss=0.000, reward_mean=0.1, reward_bound=0.0
41609: loss=0.000, reward_mean=0.1, reward_bound=0.0
41610: loss=0.000, reward_mean=0.1, reward_bou

41753: loss=0.000, reward_mean=0.0, reward_bound=0.0
41754: loss=0.000, reward_mean=0.1, reward_bound=0.0
41755: loss=0.000, reward_mean=0.1, reward_bound=0.0
41756: loss=0.000, reward_mean=0.1, reward_bound=0.0
41757: loss=0.000, reward_mean=0.0, reward_bound=0.0
41758: loss=0.000, reward_mean=0.1, reward_bound=0.0
41759: loss=0.000, reward_mean=0.1, reward_bound=0.0
41760: loss=0.000, reward_mean=0.0, reward_bound=0.0
41761: loss=0.000, reward_mean=0.0, reward_bound=0.0
41762: loss=0.000, reward_mean=0.0, reward_bound=0.0
41763: loss=0.000, reward_mean=0.0, reward_bound=0.0
41764: loss=0.000, reward_mean=0.1, reward_bound=0.0
41765: loss=0.000, reward_mean=0.0, reward_bound=0.0
41766: loss=0.000, reward_mean=0.1, reward_bound=0.0
41767: loss=0.000, reward_mean=0.0, reward_bound=0.0
41768: loss=0.000, reward_mean=0.0, reward_bound=0.0
41769: loss=0.000, reward_mean=0.0, reward_bound=0.0
41770: loss=0.000, reward_mean=0.0, reward_bound=0.0
41771: loss=0.000, reward_mean=0.1, reward_bou

41912: loss=0.000, reward_mean=0.1, reward_bound=0.0
41913: loss=0.000, reward_mean=0.0, reward_bound=0.0
41914: loss=0.000, reward_mean=0.0, reward_bound=0.0
41915: loss=0.000, reward_mean=0.1, reward_bound=0.0
41916: loss=0.000, reward_mean=0.1, reward_bound=0.0
41917: loss=0.000, reward_mean=0.0, reward_bound=0.0
41918: loss=0.000, reward_mean=0.1, reward_bound=0.0
41919: loss=0.000, reward_mean=0.0, reward_bound=0.0
41920: loss=0.000, reward_mean=0.0, reward_bound=0.0
41921: loss=0.000, reward_mean=0.0, reward_bound=0.0
41922: loss=0.000, reward_mean=0.0, reward_bound=0.0
41923: loss=0.000, reward_mean=0.0, reward_bound=0.0
41924: loss=0.000, reward_mean=0.1, reward_bound=0.0
41925: loss=0.000, reward_mean=0.0, reward_bound=0.0
41926: loss=0.000, reward_mean=0.0, reward_bound=0.0
41927: loss=0.000, reward_mean=0.0, reward_bound=0.0
41928: loss=0.000, reward_mean=0.1, reward_bound=0.0
41929: loss=0.000, reward_mean=0.1, reward_bound=0.0
41930: loss=0.000, reward_mean=0.0, reward_bou

42067: loss=0.000, reward_mean=0.1, reward_bound=0.0
42068: loss=0.000, reward_mean=0.0, reward_bound=0.0
42069: loss=0.000, reward_mean=0.0, reward_bound=0.0
42070: loss=0.000, reward_mean=0.1, reward_bound=0.0
42071: loss=0.000, reward_mean=0.1, reward_bound=0.0
42072: loss=0.000, reward_mean=0.1, reward_bound=0.0
42073: loss=0.000, reward_mean=0.0, reward_bound=0.0
42074: loss=0.000, reward_mean=0.1, reward_bound=0.0
42075: loss=0.000, reward_mean=0.1, reward_bound=0.0
42076: loss=0.000, reward_mean=0.1, reward_bound=0.0
42077: loss=0.000, reward_mean=0.1, reward_bound=0.0
42078: loss=0.000, reward_mean=0.0, reward_bound=0.0
42079: loss=0.000, reward_mean=0.0, reward_bound=0.0
42080: loss=0.000, reward_mean=0.0, reward_bound=0.0
42081: loss=0.000, reward_mean=0.1, reward_bound=0.0
42082: loss=0.000, reward_mean=0.1, reward_bound=0.0
42083: loss=0.000, reward_mean=0.0, reward_bound=0.0
42084: loss=0.000, reward_mean=0.0, reward_bound=0.0
42085: loss=0.000, reward_mean=0.1, reward_bou

42222: loss=0.000, reward_mean=0.2, reward_bound=0.0
42223: loss=0.000, reward_mean=0.1, reward_bound=0.0
42224: loss=0.000, reward_mean=0.0, reward_bound=0.0
42225: loss=0.000, reward_mean=0.1, reward_bound=0.0
42226: loss=0.000, reward_mean=0.2, reward_bound=0.0
42227: loss=0.000, reward_mean=0.1, reward_bound=0.0
42228: loss=0.000, reward_mean=0.0, reward_bound=0.0
42229: loss=0.000, reward_mean=0.0, reward_bound=0.0
42230: loss=0.000, reward_mean=0.1, reward_bound=0.0
42231: loss=0.000, reward_mean=0.1, reward_bound=0.0
42232: loss=0.000, reward_mean=0.1, reward_bound=0.0
42233: loss=0.000, reward_mean=0.1, reward_bound=0.0
42234: loss=0.000, reward_mean=0.0, reward_bound=0.0
42235: loss=0.000, reward_mean=0.1, reward_bound=0.0
42236: loss=0.000, reward_mean=0.2, reward_bound=0.0
42237: loss=0.000, reward_mean=0.1, reward_bound=0.0
42238: loss=0.000, reward_mean=0.0, reward_bound=0.0
42239: loss=0.000, reward_mean=0.1, reward_bound=0.0
42240: loss=0.000, reward_mean=0.0, reward_bou

42380: loss=0.000, reward_mean=0.1, reward_bound=0.0
42381: loss=0.000, reward_mean=0.0, reward_bound=0.0
42382: loss=0.000, reward_mean=0.1, reward_bound=0.0
42383: loss=0.000, reward_mean=0.1, reward_bound=0.0
42384: loss=0.000, reward_mean=0.0, reward_bound=0.0
42385: loss=0.000, reward_mean=0.1, reward_bound=0.0
42386: loss=0.000, reward_mean=0.0, reward_bound=0.0
42387: loss=0.000, reward_mean=0.1, reward_bound=0.0
42388: loss=0.000, reward_mean=0.1, reward_bound=0.0
42389: loss=0.000, reward_mean=0.0, reward_bound=0.0
42390: loss=0.000, reward_mean=0.1, reward_bound=0.0
42391: loss=0.000, reward_mean=0.1, reward_bound=0.0
42392: loss=0.000, reward_mean=0.0, reward_bound=0.0
42393: loss=0.000, reward_mean=0.1, reward_bound=0.0
42394: loss=0.000, reward_mean=0.1, reward_bound=0.0
42395: loss=0.000, reward_mean=0.0, reward_bound=0.0
42396: loss=0.000, reward_mean=0.0, reward_bound=0.0
42397: loss=0.000, reward_mean=0.1, reward_bound=0.0
42398: loss=0.000, reward_mean=0.1, reward_bou

42536: loss=0.000, reward_mean=0.1, reward_bound=0.0
42537: loss=0.000, reward_mean=0.2, reward_bound=0.0
42538: loss=0.000, reward_mean=0.1, reward_bound=0.0
42539: loss=0.000, reward_mean=0.0, reward_bound=0.0
42540: loss=0.000, reward_mean=0.0, reward_bound=0.0
42541: loss=0.000, reward_mean=0.0, reward_bound=0.0
42542: loss=0.000, reward_mean=0.0, reward_bound=0.0
42543: loss=0.000, reward_mean=0.0, reward_bound=0.0
42544: loss=0.000, reward_mean=0.1, reward_bound=0.0
42545: loss=0.000, reward_mean=0.1, reward_bound=0.0
42546: loss=0.000, reward_mean=0.0, reward_bound=0.0
42547: loss=0.000, reward_mean=0.1, reward_bound=0.0
42548: loss=0.000, reward_mean=0.2, reward_bound=0.0
42549: loss=0.000, reward_mean=0.1, reward_bound=0.0
42550: loss=0.000, reward_mean=0.0, reward_bound=0.0
42551: loss=0.000, reward_mean=0.1, reward_bound=0.0
42552: loss=0.000, reward_mean=0.1, reward_bound=0.0
42553: loss=0.000, reward_mean=0.0, reward_bound=0.0
42554: loss=0.000, reward_mean=0.0, reward_bou

42698: loss=0.000, reward_mean=0.0, reward_bound=0.0
42699: loss=0.000, reward_mean=0.1, reward_bound=0.0
42700: loss=0.000, reward_mean=0.1, reward_bound=0.0
42701: loss=0.000, reward_mean=0.1, reward_bound=0.0
42702: loss=0.000, reward_mean=0.1, reward_bound=0.0
42703: loss=0.000, reward_mean=0.1, reward_bound=0.0
42704: loss=0.000, reward_mean=0.0, reward_bound=0.0
42705: loss=0.000, reward_mean=0.1, reward_bound=0.0
42706: loss=0.000, reward_mean=0.1, reward_bound=0.0
42707: loss=0.000, reward_mean=0.1, reward_bound=0.0
42708: loss=0.000, reward_mean=0.1, reward_bound=0.0
42709: loss=0.000, reward_mean=0.1, reward_bound=0.0
42710: loss=0.000, reward_mean=0.1, reward_bound=0.0
42711: loss=0.000, reward_mean=0.1, reward_bound=0.0
42712: loss=0.000, reward_mean=0.1, reward_bound=0.0
42713: loss=0.000, reward_mean=0.1, reward_bound=0.0
42714: loss=0.000, reward_mean=0.0, reward_bound=0.0
42715: loss=0.000, reward_mean=0.0, reward_bound=0.0
42716: loss=0.000, reward_mean=0.1, reward_bou

42854: loss=0.000, reward_mean=0.1, reward_bound=0.0
42855: loss=0.000, reward_mean=0.0, reward_bound=0.0
42856: loss=0.000, reward_mean=0.1, reward_bound=0.0
42857: loss=0.000, reward_mean=0.0, reward_bound=0.0
42858: loss=0.000, reward_mean=0.1, reward_bound=0.0
42859: loss=0.000, reward_mean=0.0, reward_bound=0.0
42860: loss=0.000, reward_mean=0.0, reward_bound=0.0
42861: loss=0.000, reward_mean=0.1, reward_bound=0.0
42862: loss=0.000, reward_mean=0.0, reward_bound=0.0
42863: loss=0.000, reward_mean=0.1, reward_bound=0.0
42864: loss=0.000, reward_mean=0.1, reward_bound=0.0
42865: loss=0.000, reward_mean=0.0, reward_bound=0.0
42866: loss=0.000, reward_mean=0.0, reward_bound=0.0
42867: loss=0.000, reward_mean=0.1, reward_bound=0.0
42868: loss=0.000, reward_mean=0.0, reward_bound=0.0
42869: loss=0.000, reward_mean=0.1, reward_bound=0.0
42870: loss=0.000, reward_mean=0.0, reward_bound=0.0
42871: loss=0.000, reward_mean=0.1, reward_bound=0.0
42872: loss=0.000, reward_mean=0.0, reward_bou

43014: loss=0.000, reward_mean=0.1, reward_bound=0.0
43015: loss=0.000, reward_mean=0.0, reward_bound=0.0
43016: loss=0.000, reward_mean=0.1, reward_bound=0.0
43017: loss=0.000, reward_mean=0.1, reward_bound=0.0
43018: loss=0.000, reward_mean=0.0, reward_bound=0.0
43019: loss=0.000, reward_mean=0.0, reward_bound=0.0
43020: loss=0.000, reward_mean=0.1, reward_bound=0.0
43021: loss=0.000, reward_mean=0.0, reward_bound=0.0
43022: loss=0.000, reward_mean=0.1, reward_bound=0.0
43023: loss=0.000, reward_mean=0.0, reward_bound=0.0
43024: loss=0.000, reward_mean=0.0, reward_bound=0.0
43025: loss=0.000, reward_mean=0.0, reward_bound=0.0
43026: loss=0.000, reward_mean=0.0, reward_bound=0.0
43027: loss=0.000, reward_mean=0.1, reward_bound=0.0
43028: loss=0.000, reward_mean=0.0, reward_bound=0.0
43029: loss=0.000, reward_mean=0.0, reward_bound=0.0
43030: loss=0.000, reward_mean=0.1, reward_bound=0.0
43031: loss=0.000, reward_mean=0.1, reward_bound=0.0
43032: loss=0.000, reward_mean=0.1, reward_bou

43171: loss=0.000, reward_mean=0.2, reward_bound=0.0
43172: loss=0.000, reward_mean=0.1, reward_bound=0.0
43173: loss=0.000, reward_mean=0.0, reward_bound=0.0
43174: loss=0.000, reward_mean=0.1, reward_bound=0.0
43175: loss=0.000, reward_mean=0.0, reward_bound=0.0
43176: loss=0.000, reward_mean=0.2, reward_bound=0.0
43177: loss=0.000, reward_mean=0.0, reward_bound=0.0
43178: loss=0.000, reward_mean=0.2, reward_bound=0.0
43179: loss=0.000, reward_mean=0.0, reward_bound=0.0
43180: loss=0.000, reward_mean=0.1, reward_bound=0.0
43181: loss=0.000, reward_mean=0.0, reward_bound=0.0
43182: loss=0.000, reward_mean=0.1, reward_bound=0.0
43183: loss=0.000, reward_mean=0.0, reward_bound=0.0
43184: loss=0.000, reward_mean=0.0, reward_bound=0.0
43185: loss=0.000, reward_mean=0.1, reward_bound=0.0
43186: loss=0.000, reward_mean=0.0, reward_bound=0.0
43187: loss=0.000, reward_mean=0.1, reward_bound=0.0
43188: loss=0.000, reward_mean=0.1, reward_bound=0.0
43189: loss=0.000, reward_mean=0.0, reward_bou

43327: loss=0.000, reward_mean=0.1, reward_bound=0.0
43328: loss=0.000, reward_mean=0.0, reward_bound=0.0
43329: loss=0.000, reward_mean=0.0, reward_bound=0.0
43330: loss=0.000, reward_mean=0.1, reward_bound=0.0
43331: loss=0.000, reward_mean=0.1, reward_bound=0.0
43332: loss=0.000, reward_mean=0.0, reward_bound=0.0
43333: loss=0.000, reward_mean=0.1, reward_bound=0.0
43334: loss=0.000, reward_mean=0.0, reward_bound=0.0
43335: loss=0.000, reward_mean=0.1, reward_bound=0.0
43336: loss=0.000, reward_mean=0.0, reward_bound=0.0
43337: loss=0.000, reward_mean=0.0, reward_bound=0.0
43338: loss=0.000, reward_mean=0.1, reward_bound=0.0
43339: loss=0.000, reward_mean=0.2, reward_bound=0.0
43340: loss=0.000, reward_mean=0.1, reward_bound=0.0
43341: loss=0.000, reward_mean=0.1, reward_bound=0.0
43342: loss=0.000, reward_mean=0.1, reward_bound=0.0
43343: loss=0.000, reward_mean=0.0, reward_bound=0.0
43344: loss=0.000, reward_mean=0.1, reward_bound=0.0
43345: loss=0.000, reward_mean=0.2, reward_bou

43489: loss=0.000, reward_mean=0.0, reward_bound=0.0
43490: loss=0.000, reward_mean=0.0, reward_bound=0.0
43491: loss=0.000, reward_mean=0.1, reward_bound=0.0
43492: loss=0.000, reward_mean=0.1, reward_bound=0.0
43493: loss=0.000, reward_mean=0.0, reward_bound=0.0
43494: loss=0.000, reward_mean=0.1, reward_bound=0.0
43495: loss=0.000, reward_mean=0.0, reward_bound=0.0
43496: loss=0.000, reward_mean=0.0, reward_bound=0.0
43497: loss=0.000, reward_mean=0.0, reward_bound=0.0
43498: loss=0.000, reward_mean=0.1, reward_bound=0.0
43499: loss=0.000, reward_mean=0.0, reward_bound=0.0
43500: loss=0.000, reward_mean=0.0, reward_bound=0.0
43501: loss=0.000, reward_mean=0.0, reward_bound=0.0
43502: loss=0.000, reward_mean=0.0, reward_bound=0.0
43503: loss=0.000, reward_mean=0.1, reward_bound=0.0
43504: loss=0.000, reward_mean=0.1, reward_bound=0.0
43505: loss=0.000, reward_mean=0.0, reward_bound=0.0
43506: loss=0.000, reward_mean=0.1, reward_bound=0.0
43507: loss=0.000, reward_mean=0.0, reward_bou

43647: loss=0.000, reward_mean=0.1, reward_bound=0.0
43648: loss=0.000, reward_mean=0.0, reward_bound=0.0
43649: loss=0.000, reward_mean=0.1, reward_bound=0.0
43650: loss=0.000, reward_mean=0.0, reward_bound=0.0
43651: loss=0.000, reward_mean=0.0, reward_bound=0.0
43652: loss=0.000, reward_mean=0.1, reward_bound=0.0
43653: loss=0.000, reward_mean=0.0, reward_bound=0.0
43654: loss=0.000, reward_mean=0.0, reward_bound=0.0
43655: loss=0.000, reward_mean=0.0, reward_bound=0.0
43656: loss=0.000, reward_mean=0.1, reward_bound=0.0
43657: loss=0.000, reward_mean=0.1, reward_bound=0.0
43658: loss=0.000, reward_mean=0.1, reward_bound=0.0
43659: loss=0.000, reward_mean=0.1, reward_bound=0.0
43660: loss=0.000, reward_mean=0.1, reward_bound=0.0
43661: loss=0.000, reward_mean=0.0, reward_bound=0.0
43662: loss=0.000, reward_mean=0.0, reward_bound=0.0
43663: loss=0.000, reward_mean=0.1, reward_bound=0.0
43664: loss=0.000, reward_mean=0.1, reward_bound=0.0
43665: loss=0.000, reward_mean=0.0, reward_bou

43806: loss=0.000, reward_mean=0.1, reward_bound=0.0
43807: loss=0.000, reward_mean=0.0, reward_bound=0.0
43808: loss=0.000, reward_mean=0.1, reward_bound=0.0
43809: loss=0.000, reward_mean=0.1, reward_bound=0.0
43810: loss=0.000, reward_mean=0.1, reward_bound=0.0
43811: loss=0.000, reward_mean=0.0, reward_bound=0.0
43812: loss=0.000, reward_mean=0.1, reward_bound=0.0
43813: loss=0.000, reward_mean=0.1, reward_bound=0.0
43814: loss=0.000, reward_mean=0.0, reward_bound=0.0
43815: loss=0.000, reward_mean=0.0, reward_bound=0.0
43816: loss=0.000, reward_mean=0.1, reward_bound=0.0
43817: loss=0.000, reward_mean=0.0, reward_bound=0.0
43818: loss=0.000, reward_mean=0.1, reward_bound=0.0
43819: loss=0.000, reward_mean=0.0, reward_bound=0.0
43820: loss=0.000, reward_mean=0.1, reward_bound=0.0
43821: loss=0.000, reward_mean=0.0, reward_bound=0.0
43822: loss=0.000, reward_mean=0.1, reward_bound=0.0
43823: loss=0.000, reward_mean=0.0, reward_bound=0.0
43824: loss=0.000, reward_mean=0.0, reward_bou

43965: loss=0.000, reward_mean=0.1, reward_bound=0.0
43966: loss=0.000, reward_mean=0.0, reward_bound=0.0
43967: loss=0.000, reward_mean=0.1, reward_bound=0.0
43968: loss=0.000, reward_mean=0.0, reward_bound=0.0
43969: loss=0.000, reward_mean=0.0, reward_bound=0.0
43970: loss=0.000, reward_mean=0.2, reward_bound=0.0
43971: loss=0.000, reward_mean=0.0, reward_bound=0.0
43972: loss=0.000, reward_mean=0.1, reward_bound=0.0
43973: loss=0.000, reward_mean=0.2, reward_bound=0.0
43974: loss=0.000, reward_mean=0.0, reward_bound=0.0
43975: loss=0.000, reward_mean=0.2, reward_bound=0.0
43976: loss=0.000, reward_mean=0.1, reward_bound=0.0
43977: loss=0.000, reward_mean=0.1, reward_bound=0.0
43978: loss=0.000, reward_mean=0.1, reward_bound=0.0
43979: loss=0.000, reward_mean=0.1, reward_bound=0.0
43980: loss=0.000, reward_mean=0.0, reward_bound=0.0
43981: loss=0.000, reward_mean=0.0, reward_bound=0.0
43982: loss=0.000, reward_mean=0.1, reward_bound=0.0
43983: loss=0.000, reward_mean=0.0, reward_bou

44123: loss=0.000, reward_mean=0.1, reward_bound=0.0
44124: loss=0.000, reward_mean=0.1, reward_bound=0.0
44125: loss=0.000, reward_mean=0.0, reward_bound=0.0
44126: loss=0.000, reward_mean=0.1, reward_bound=0.0
44127: loss=0.000, reward_mean=0.0, reward_bound=0.0
44128: loss=0.000, reward_mean=0.1, reward_bound=0.0
44129: loss=0.000, reward_mean=0.1, reward_bound=0.0
44130: loss=0.000, reward_mean=0.0, reward_bound=0.0
44131: loss=0.000, reward_mean=0.0, reward_bound=0.0
44132: loss=0.000, reward_mean=0.1, reward_bound=0.0
44133: loss=0.000, reward_mean=0.1, reward_bound=0.0
44134: loss=0.000, reward_mean=0.0, reward_bound=0.0
44135: loss=0.000, reward_mean=0.0, reward_bound=0.0
44136: loss=0.000, reward_mean=0.1, reward_bound=0.0
44137: loss=0.000, reward_mean=0.1, reward_bound=0.0
44138: loss=0.000, reward_mean=0.1, reward_bound=0.0
44139: loss=0.000, reward_mean=0.1, reward_bound=0.0
44140: loss=0.000, reward_mean=0.1, reward_bound=0.0
44141: loss=0.000, reward_mean=0.0, reward_bou

44278: loss=0.000, reward_mean=0.1, reward_bound=0.0
44279: loss=0.000, reward_mean=0.0, reward_bound=0.0
44280: loss=0.000, reward_mean=0.0, reward_bound=0.0
44281: loss=0.000, reward_mean=0.1, reward_bound=0.0
44282: loss=0.000, reward_mean=0.1, reward_bound=0.0
44283: loss=0.000, reward_mean=0.1, reward_bound=0.0
44284: loss=0.000, reward_mean=0.1, reward_bound=0.0
44285: loss=0.000, reward_mean=0.0, reward_bound=0.0
44286: loss=0.000, reward_mean=0.0, reward_bound=0.0
44287: loss=0.000, reward_mean=0.1, reward_bound=0.0
44288: loss=0.000, reward_mean=0.0, reward_bound=0.0
44289: loss=0.000, reward_mean=0.1, reward_bound=0.0
44290: loss=0.000, reward_mean=0.1, reward_bound=0.0
44291: loss=0.000, reward_mean=0.0, reward_bound=0.0
44292: loss=0.000, reward_mean=0.1, reward_bound=0.0
44293: loss=0.000, reward_mean=0.0, reward_bound=0.0
44294: loss=0.000, reward_mean=0.2, reward_bound=0.0
44295: loss=0.000, reward_mean=0.0, reward_bound=0.0
44296: loss=0.000, reward_mean=0.0, reward_bou

44433: loss=0.000, reward_mean=0.0, reward_bound=0.0
44434: loss=0.000, reward_mean=0.1, reward_bound=0.0
44435: loss=0.000, reward_mean=0.0, reward_bound=0.0
44436: loss=0.000, reward_mean=0.1, reward_bound=0.0
44437: loss=0.000, reward_mean=0.0, reward_bound=0.0
44438: loss=0.000, reward_mean=0.0, reward_bound=0.0
44439: loss=0.000, reward_mean=0.1, reward_bound=0.0
44440: loss=0.000, reward_mean=0.1, reward_bound=0.0
44441: loss=0.000, reward_mean=0.1, reward_bound=0.0
44442: loss=0.000, reward_mean=0.0, reward_bound=0.0
44443: loss=0.000, reward_mean=0.1, reward_bound=0.0
44444: loss=0.000, reward_mean=0.0, reward_bound=0.0
44445: loss=0.000, reward_mean=0.1, reward_bound=0.0
44446: loss=0.000, reward_mean=0.1, reward_bound=0.0
44447: loss=0.000, reward_mean=0.1, reward_bound=0.0
44448: loss=0.000, reward_mean=0.1, reward_bound=0.0
44449: loss=0.000, reward_mean=0.0, reward_bound=0.0
44450: loss=0.000, reward_mean=0.1, reward_bound=0.0
44451: loss=0.000, reward_mean=0.1, reward_bou

44588: loss=0.000, reward_mean=0.1, reward_bound=0.0
44589: loss=0.000, reward_mean=0.1, reward_bound=0.0
44590: loss=0.000, reward_mean=0.0, reward_bound=0.0
44591: loss=0.000, reward_mean=0.1, reward_bound=0.0
44592: loss=0.000, reward_mean=0.0, reward_bound=0.0
44593: loss=0.000, reward_mean=0.1, reward_bound=0.0
44594: loss=0.000, reward_mean=0.0, reward_bound=0.0
44595: loss=0.000, reward_mean=0.0, reward_bound=0.0
44596: loss=0.000, reward_mean=0.2, reward_bound=0.0
44597: loss=0.000, reward_mean=0.0, reward_bound=0.0
44598: loss=0.000, reward_mean=0.1, reward_bound=0.0
44599: loss=0.000, reward_mean=0.1, reward_bound=0.0
44600: loss=0.000, reward_mean=0.0, reward_bound=0.0
44601: loss=0.000, reward_mean=0.1, reward_bound=0.0
44602: loss=0.000, reward_mean=0.0, reward_bound=0.0
44603: loss=0.000, reward_mean=0.1, reward_bound=0.0
44604: loss=0.000, reward_mean=0.1, reward_bound=0.0
44605: loss=0.000, reward_mean=0.0, reward_bound=0.0
44606: loss=0.000, reward_mean=0.1, reward_bou

44749: loss=0.000, reward_mean=0.1, reward_bound=0.0
44750: loss=0.000, reward_mean=0.0, reward_bound=0.0
44751: loss=0.000, reward_mean=0.1, reward_bound=0.0
44752: loss=0.000, reward_mean=0.1, reward_bound=0.0
44753: loss=0.000, reward_mean=0.1, reward_bound=0.0
44754: loss=0.000, reward_mean=0.1, reward_bound=0.0
44755: loss=0.000, reward_mean=0.1, reward_bound=0.0
44756: loss=0.000, reward_mean=0.1, reward_bound=0.0
44757: loss=0.000, reward_mean=0.1, reward_bound=0.0
44758: loss=0.000, reward_mean=0.0, reward_bound=0.0
44759: loss=0.000, reward_mean=0.0, reward_bound=0.0
44760: loss=0.000, reward_mean=0.0, reward_bound=0.0
44761: loss=0.000, reward_mean=0.0, reward_bound=0.0
44762: loss=0.000, reward_mean=0.1, reward_bound=0.0
44763: loss=0.000, reward_mean=0.3, reward_bound=0.5
44764: loss=0.000, reward_mean=0.2, reward_bound=0.0
44765: loss=0.000, reward_mean=0.0, reward_bound=0.0
44766: loss=0.000, reward_mean=0.0, reward_bound=0.0
44767: loss=0.000, reward_mean=0.1, reward_bou

44905: loss=0.000, reward_mean=0.1, reward_bound=0.0
44906: loss=0.000, reward_mean=0.0, reward_bound=0.0
44907: loss=0.000, reward_mean=0.2, reward_bound=0.0
44908: loss=0.000, reward_mean=0.2, reward_bound=0.0
44909: loss=0.000, reward_mean=0.2, reward_bound=0.0
44910: loss=0.000, reward_mean=0.1, reward_bound=0.0
44911: loss=0.000, reward_mean=0.0, reward_bound=0.0
44912: loss=0.000, reward_mean=0.1, reward_bound=0.0
44913: loss=0.000, reward_mean=0.0, reward_bound=0.0
44914: loss=0.000, reward_mean=0.2, reward_bound=0.0
44915: loss=0.000, reward_mean=0.1, reward_bound=0.0
44916: loss=0.000, reward_mean=0.0, reward_bound=0.0
44917: loss=0.000, reward_mean=0.0, reward_bound=0.0
44918: loss=0.000, reward_mean=0.0, reward_bound=0.0
44919: loss=0.000, reward_mean=0.0, reward_bound=0.0
44920: loss=0.000, reward_mean=0.1, reward_bound=0.0
44921: loss=0.000, reward_mean=0.1, reward_bound=0.0
44922: loss=0.000, reward_mean=0.0, reward_bound=0.0
44923: loss=0.000, reward_mean=0.1, reward_bou

45062: loss=0.000, reward_mean=0.1, reward_bound=0.0
45063: loss=0.000, reward_mean=0.0, reward_bound=0.0
45064: loss=0.000, reward_mean=0.0, reward_bound=0.0
45065: loss=0.000, reward_mean=0.0, reward_bound=0.0
45066: loss=0.000, reward_mean=0.1, reward_bound=0.0
45067: loss=0.000, reward_mean=0.1, reward_bound=0.0
45068: loss=0.000, reward_mean=0.1, reward_bound=0.0
45069: loss=0.000, reward_mean=0.0, reward_bound=0.0
45070: loss=0.000, reward_mean=0.0, reward_bound=0.0
45071: loss=0.000, reward_mean=0.0, reward_bound=0.0
45072: loss=0.000, reward_mean=0.1, reward_bound=0.0
45073: loss=0.000, reward_mean=0.1, reward_bound=0.0
45074: loss=0.000, reward_mean=0.0, reward_bound=0.0
45075: loss=0.000, reward_mean=0.0, reward_bound=0.0
45076: loss=0.000, reward_mean=0.1, reward_bound=0.0
45077: loss=0.000, reward_mean=0.1, reward_bound=0.0
45078: loss=0.000, reward_mean=0.1, reward_bound=0.0
45079: loss=0.000, reward_mean=0.1, reward_bound=0.0
45080: loss=0.000, reward_mean=0.1, reward_bou

45223: loss=0.000, reward_mean=0.0, reward_bound=0.0
45224: loss=0.000, reward_mean=0.1, reward_bound=0.0
45225: loss=0.000, reward_mean=0.1, reward_bound=0.0
45226: loss=0.000, reward_mean=0.0, reward_bound=0.0
45227: loss=0.000, reward_mean=0.1, reward_bound=0.0
45228: loss=0.000, reward_mean=0.1, reward_bound=0.0
45229: loss=0.000, reward_mean=0.0, reward_bound=0.0
45230: loss=0.000, reward_mean=0.1, reward_bound=0.0
45231: loss=0.000, reward_mean=0.1, reward_bound=0.0
45232: loss=0.000, reward_mean=0.1, reward_bound=0.0
45233: loss=0.000, reward_mean=0.1, reward_bound=0.0
45234: loss=0.000, reward_mean=0.1, reward_bound=0.0
45235: loss=0.000, reward_mean=0.1, reward_bound=0.0
45236: loss=0.000, reward_mean=0.1, reward_bound=0.0
45237: loss=0.000, reward_mean=0.1, reward_bound=0.0
45238: loss=0.000, reward_mean=0.0, reward_bound=0.0
45239: loss=0.000, reward_mean=0.1, reward_bound=0.0
45240: loss=0.000, reward_mean=0.0, reward_bound=0.0
45241: loss=0.000, reward_mean=0.0, reward_bou

45383: loss=0.000, reward_mean=0.2, reward_bound=0.0
45384: loss=0.000, reward_mean=0.1, reward_bound=0.0
45385: loss=0.000, reward_mean=0.1, reward_bound=0.0
45386: loss=0.000, reward_mean=0.1, reward_bound=0.0
45387: loss=0.000, reward_mean=0.1, reward_bound=0.0
45388: loss=0.000, reward_mean=0.0, reward_bound=0.0
45389: loss=0.000, reward_mean=0.1, reward_bound=0.0
45390: loss=0.000, reward_mean=0.1, reward_bound=0.0
45391: loss=0.000, reward_mean=0.1, reward_bound=0.0
45392: loss=0.000, reward_mean=0.1, reward_bound=0.0
45393: loss=0.000, reward_mean=0.1, reward_bound=0.0
45394: loss=0.000, reward_mean=0.1, reward_bound=0.0
45395: loss=0.000, reward_mean=0.1, reward_bound=0.0
45396: loss=0.000, reward_mean=0.1, reward_bound=0.0
45397: loss=0.000, reward_mean=0.1, reward_bound=0.0
45398: loss=0.000, reward_mean=0.0, reward_bound=0.0
45399: loss=0.000, reward_mean=0.0, reward_bound=0.0
45400: loss=0.000, reward_mean=0.1, reward_bound=0.0
45401: loss=0.000, reward_mean=0.0, reward_bou

45538: loss=0.000, reward_mean=0.0, reward_bound=0.0
45539: loss=0.000, reward_mean=0.1, reward_bound=0.0
45540: loss=0.000, reward_mean=0.0, reward_bound=0.0
45541: loss=0.000, reward_mean=0.0, reward_bound=0.0
45542: loss=0.000, reward_mean=0.1, reward_bound=0.0
45543: loss=0.000, reward_mean=0.0, reward_bound=0.0
45544: loss=0.000, reward_mean=0.1, reward_bound=0.0
45545: loss=0.000, reward_mean=0.1, reward_bound=0.0
45546: loss=0.000, reward_mean=0.1, reward_bound=0.0
45547: loss=0.000, reward_mean=0.0, reward_bound=0.0
45548: loss=0.000, reward_mean=0.2, reward_bound=0.0
45549: loss=0.000, reward_mean=0.0, reward_bound=0.0
45550: loss=0.000, reward_mean=0.1, reward_bound=0.0
45551: loss=0.000, reward_mean=0.1, reward_bound=0.0
45552: loss=0.000, reward_mean=0.0, reward_bound=0.0
45553: loss=0.000, reward_mean=0.1, reward_bound=0.0
45554: loss=0.000, reward_mean=0.1, reward_bound=0.0
45555: loss=0.000, reward_mean=0.1, reward_bound=0.0
45556: loss=0.000, reward_mean=0.1, reward_bou

45696: loss=0.000, reward_mean=0.0, reward_bound=0.0
45697: loss=0.000, reward_mean=0.0, reward_bound=0.0
45698: loss=0.000, reward_mean=0.0, reward_bound=0.0
45699: loss=0.000, reward_mean=0.2, reward_bound=0.0
45700: loss=0.000, reward_mean=0.1, reward_bound=0.0
45701: loss=0.000, reward_mean=0.1, reward_bound=0.0
45702: loss=0.000, reward_mean=0.1, reward_bound=0.0
45703: loss=0.000, reward_mean=0.0, reward_bound=0.0
45704: loss=0.000, reward_mean=0.0, reward_bound=0.0
45705: loss=0.000, reward_mean=0.0, reward_bound=0.0
45706: loss=0.000, reward_mean=0.1, reward_bound=0.0
45707: loss=0.000, reward_mean=0.0, reward_bound=0.0
45708: loss=0.000, reward_mean=0.1, reward_bound=0.0
45709: loss=0.000, reward_mean=0.1, reward_bound=0.0
45710: loss=0.000, reward_mean=0.0, reward_bound=0.0
45711: loss=0.000, reward_mean=0.1, reward_bound=0.0
45712: loss=0.000, reward_mean=0.1, reward_bound=0.0
45713: loss=0.000, reward_mean=0.1, reward_bound=0.0
45714: loss=0.000, reward_mean=0.0, reward_bou

45853: loss=0.000, reward_mean=0.0, reward_bound=0.0
45854: loss=0.000, reward_mean=0.1, reward_bound=0.0
45855: loss=0.000, reward_mean=0.1, reward_bound=0.0
45856: loss=0.000, reward_mean=0.0, reward_bound=0.0
45857: loss=0.000, reward_mean=0.0, reward_bound=0.0
45858: loss=0.000, reward_mean=0.1, reward_bound=0.0
45859: loss=0.000, reward_mean=0.1, reward_bound=0.0
45860: loss=0.000, reward_mean=0.0, reward_bound=0.0
45861: loss=0.000, reward_mean=0.2, reward_bound=0.0
45862: loss=0.000, reward_mean=0.0, reward_bound=0.0
45863: loss=0.000, reward_mean=0.0, reward_bound=0.0
45864: loss=0.000, reward_mean=0.1, reward_bound=0.0
45865: loss=0.000, reward_mean=0.1, reward_bound=0.0
45866: loss=0.000, reward_mean=0.1, reward_bound=0.0
45867: loss=0.000, reward_mean=0.1, reward_bound=0.0
45868: loss=0.000, reward_mean=0.1, reward_bound=0.0
45869: loss=0.000, reward_mean=0.1, reward_bound=0.0
45870: loss=0.000, reward_mean=0.1, reward_bound=0.0
45871: loss=0.000, reward_mean=0.1, reward_bou

46008: loss=0.000, reward_mean=0.0, reward_bound=0.0
46009: loss=0.000, reward_mean=0.1, reward_bound=0.0
46010: loss=0.000, reward_mean=0.1, reward_bound=0.0
46011: loss=0.000, reward_mean=0.1, reward_bound=0.0
46012: loss=0.000, reward_mean=0.0, reward_bound=0.0
46013: loss=0.000, reward_mean=0.1, reward_bound=0.0
46014: loss=0.000, reward_mean=0.0, reward_bound=0.0
46015: loss=0.000, reward_mean=0.1, reward_bound=0.0
46016: loss=0.000, reward_mean=0.1, reward_bound=0.0
46017: loss=0.000, reward_mean=0.0, reward_bound=0.0
46018: loss=0.000, reward_mean=0.0, reward_bound=0.0
46019: loss=0.000, reward_mean=0.1, reward_bound=0.0
46020: loss=0.000, reward_mean=0.0, reward_bound=0.0
46021: loss=0.000, reward_mean=0.0, reward_bound=0.0
46022: loss=0.000, reward_mean=0.1, reward_bound=0.0
46023: loss=0.000, reward_mean=0.0, reward_bound=0.0
46024: loss=0.000, reward_mean=0.1, reward_bound=0.0
46025: loss=0.000, reward_mean=0.0, reward_bound=0.0
46026: loss=0.000, reward_mean=0.0, reward_bou

46164: loss=0.000, reward_mean=0.1, reward_bound=0.0
46165: loss=0.000, reward_mean=0.1, reward_bound=0.0
46166: loss=0.000, reward_mean=0.1, reward_bound=0.0
46167: loss=0.000, reward_mean=0.1, reward_bound=0.0
46168: loss=0.000, reward_mean=0.1, reward_bound=0.0
46169: loss=0.000, reward_mean=0.0, reward_bound=0.0
46170: loss=0.000, reward_mean=0.0, reward_bound=0.0
46171: loss=0.000, reward_mean=0.0, reward_bound=0.0
46172: loss=0.000, reward_mean=0.1, reward_bound=0.0
46173: loss=0.000, reward_mean=0.1, reward_bound=0.0
46174: loss=0.000, reward_mean=0.1, reward_bound=0.0
46175: loss=0.000, reward_mean=0.1, reward_bound=0.0
46176: loss=0.000, reward_mean=0.1, reward_bound=0.0
46177: loss=0.000, reward_mean=0.1, reward_bound=0.0
46178: loss=0.000, reward_mean=0.1, reward_bound=0.0
46179: loss=0.000, reward_mean=0.1, reward_bound=0.0
46180: loss=0.000, reward_mean=0.0, reward_bound=0.0
46181: loss=0.000, reward_mean=0.1, reward_bound=0.0
46182: loss=0.000, reward_mean=0.1, reward_bou

46319: loss=0.000, reward_mean=0.2, reward_bound=0.0
46320: loss=0.000, reward_mean=0.0, reward_bound=0.0
46321: loss=0.000, reward_mean=0.0, reward_bound=0.0
46322: loss=0.000, reward_mean=0.1, reward_bound=0.0
46323: loss=0.000, reward_mean=0.0, reward_bound=0.0
46324: loss=0.000, reward_mean=0.0, reward_bound=0.0
46325: loss=0.000, reward_mean=0.0, reward_bound=0.0
46326: loss=0.000, reward_mean=0.0, reward_bound=0.0
46327: loss=0.000, reward_mean=0.1, reward_bound=0.0
46328: loss=0.000, reward_mean=0.0, reward_bound=0.0
46329: loss=0.000, reward_mean=0.1, reward_bound=0.0
46330: loss=0.000, reward_mean=0.1, reward_bound=0.0
46331: loss=0.000, reward_mean=0.0, reward_bound=0.0
46332: loss=0.000, reward_mean=0.0, reward_bound=0.0
46333: loss=0.000, reward_mean=0.1, reward_bound=0.0
46334: loss=0.000, reward_mean=0.1, reward_bound=0.0
46335: loss=0.000, reward_mean=0.1, reward_bound=0.0
46336: loss=0.000, reward_mean=0.0, reward_bound=0.0
46337: loss=0.000, reward_mean=0.1, reward_bou

46477: loss=0.000, reward_mean=0.1, reward_bound=0.0
46478: loss=0.000, reward_mean=0.1, reward_bound=0.0
46479: loss=0.000, reward_mean=0.0, reward_bound=0.0
46480: loss=0.000, reward_mean=0.1, reward_bound=0.0
46481: loss=0.000, reward_mean=0.1, reward_bound=0.0
46482: loss=0.000, reward_mean=0.1, reward_bound=0.0
46483: loss=0.000, reward_mean=0.2, reward_bound=0.0
46484: loss=0.000, reward_mean=0.0, reward_bound=0.0
46485: loss=0.000, reward_mean=0.1, reward_bound=0.0
46486: loss=0.000, reward_mean=0.0, reward_bound=0.0
46487: loss=0.000, reward_mean=0.0, reward_bound=0.0
46488: loss=0.000, reward_mean=0.1, reward_bound=0.0
46489: loss=0.000, reward_mean=0.0, reward_bound=0.0
46490: loss=0.000, reward_mean=0.1, reward_bound=0.0
46491: loss=0.000, reward_mean=0.1, reward_bound=0.0
46492: loss=0.000, reward_mean=0.0, reward_bound=0.0
46493: loss=0.000, reward_mean=0.1, reward_bound=0.0
46494: loss=0.000, reward_mean=0.0, reward_bound=0.0
46495: loss=0.000, reward_mean=0.2, reward_bou

46633: loss=0.000, reward_mean=0.1, reward_bound=0.0
46634: loss=0.000, reward_mean=0.0, reward_bound=0.0
46635: loss=0.000, reward_mean=0.1, reward_bound=0.0
46636: loss=0.000, reward_mean=0.0, reward_bound=0.0
46637: loss=0.000, reward_mean=0.1, reward_bound=0.0
46638: loss=0.000, reward_mean=0.1, reward_bound=0.0
46639: loss=0.000, reward_mean=0.1, reward_bound=0.0
46640: loss=0.000, reward_mean=0.0, reward_bound=0.0
46641: loss=0.000, reward_mean=0.0, reward_bound=0.0
46642: loss=0.000, reward_mean=0.1, reward_bound=0.0
46643: loss=0.000, reward_mean=0.1, reward_bound=0.0
46644: loss=0.000, reward_mean=0.0, reward_bound=0.0
46645: loss=0.000, reward_mean=0.2, reward_bound=0.0
46646: loss=0.000, reward_mean=0.2, reward_bound=0.0
46647: loss=0.000, reward_mean=0.1, reward_bound=0.0
46648: loss=0.000, reward_mean=0.1, reward_bound=0.0
46649: loss=0.000, reward_mean=0.0, reward_bound=0.0
46650: loss=0.000, reward_mean=0.0, reward_bound=0.0
46651: loss=0.000, reward_mean=0.1, reward_bou

46792: loss=0.000, reward_mean=0.0, reward_bound=0.0
46793: loss=0.000, reward_mean=0.0, reward_bound=0.0
46794: loss=0.000, reward_mean=0.1, reward_bound=0.0
46795: loss=0.000, reward_mean=0.0, reward_bound=0.0
46796: loss=0.000, reward_mean=0.1, reward_bound=0.0
46797: loss=0.000, reward_mean=0.0, reward_bound=0.0
46798: loss=0.000, reward_mean=0.1, reward_bound=0.0
46799: loss=0.000, reward_mean=0.1, reward_bound=0.0
46800: loss=0.000, reward_mean=0.1, reward_bound=0.0
46801: loss=0.000, reward_mean=0.2, reward_bound=0.0
46802: loss=0.000, reward_mean=0.1, reward_bound=0.0
46803: loss=0.000, reward_mean=0.1, reward_bound=0.0
46804: loss=0.000, reward_mean=0.0, reward_bound=0.0
46805: loss=0.000, reward_mean=0.1, reward_bound=0.0
46806: loss=0.000, reward_mean=0.0, reward_bound=0.0
46807: loss=0.000, reward_mean=0.1, reward_bound=0.0
46808: loss=0.000, reward_mean=0.1, reward_bound=0.0
46809: loss=0.000, reward_mean=0.0, reward_bound=0.0
46810: loss=0.000, reward_mean=0.1, reward_bou

46954: loss=0.000, reward_mean=0.0, reward_bound=0.0
46955: loss=0.000, reward_mean=0.1, reward_bound=0.0
46956: loss=0.000, reward_mean=0.0, reward_bound=0.0
46957: loss=0.000, reward_mean=0.1, reward_bound=0.0
46958: loss=0.000, reward_mean=0.0, reward_bound=0.0
46959: loss=0.000, reward_mean=0.1, reward_bound=0.0
46960: loss=0.000, reward_mean=0.1, reward_bound=0.0
46961: loss=0.000, reward_mean=0.1, reward_bound=0.0
46962: loss=0.000, reward_mean=0.1, reward_bound=0.0
46963: loss=0.000, reward_mean=0.0, reward_bound=0.0
46964: loss=0.000, reward_mean=0.0, reward_bound=0.0
46965: loss=0.000, reward_mean=0.1, reward_bound=0.0
46966: loss=0.000, reward_mean=0.1, reward_bound=0.0
46967: loss=0.000, reward_mean=0.0, reward_bound=0.0
46968: loss=0.000, reward_mean=0.0, reward_bound=0.0
46969: loss=0.000, reward_mean=0.1, reward_bound=0.0
46970: loss=0.000, reward_mean=0.0, reward_bound=0.0
46971: loss=0.000, reward_mean=0.1, reward_bound=0.0
46972: loss=0.000, reward_mean=0.1, reward_bou

47113: loss=0.000, reward_mean=0.2, reward_bound=0.0
47114: loss=0.000, reward_mean=0.1, reward_bound=0.0
47115: loss=0.000, reward_mean=0.1, reward_bound=0.0
47116: loss=0.000, reward_mean=0.0, reward_bound=0.0
47117: loss=0.000, reward_mean=0.1, reward_bound=0.0
47118: loss=0.000, reward_mean=0.0, reward_bound=0.0
47119: loss=0.000, reward_mean=0.0, reward_bound=0.0
47120: loss=0.000, reward_mean=0.1, reward_bound=0.0
47121: loss=0.000, reward_mean=0.1, reward_bound=0.0
47122: loss=0.000, reward_mean=0.2, reward_bound=0.0
47123: loss=0.000, reward_mean=0.1, reward_bound=0.0
47124: loss=0.000, reward_mean=0.0, reward_bound=0.0
47125: loss=0.000, reward_mean=0.1, reward_bound=0.0
47126: loss=0.000, reward_mean=0.1, reward_bound=0.0
47127: loss=0.000, reward_mean=0.0, reward_bound=0.0
47128: loss=0.000, reward_mean=0.0, reward_bound=0.0
47129: loss=0.000, reward_mean=0.1, reward_bound=0.0
47130: loss=0.000, reward_mean=0.1, reward_bound=0.0
47131: loss=0.000, reward_mean=0.1, reward_bou

47268: loss=0.000, reward_mean=0.1, reward_bound=0.0
47269: loss=0.000, reward_mean=0.0, reward_bound=0.0
47270: loss=0.000, reward_mean=0.0, reward_bound=0.0
47271: loss=0.000, reward_mean=0.1, reward_bound=0.0
47272: loss=0.000, reward_mean=0.1, reward_bound=0.0
47273: loss=0.000, reward_mean=0.2, reward_bound=0.0
47274: loss=0.000, reward_mean=0.2, reward_bound=0.0
47275: loss=0.000, reward_mean=0.0, reward_bound=0.0
47276: loss=0.000, reward_mean=0.0, reward_bound=0.0
47277: loss=0.000, reward_mean=0.1, reward_bound=0.0
47278: loss=0.000, reward_mean=0.1, reward_bound=0.0
47279: loss=0.000, reward_mean=0.0, reward_bound=0.0
47280: loss=0.000, reward_mean=0.0, reward_bound=0.0
47281: loss=0.000, reward_mean=0.1, reward_bound=0.0
47282: loss=0.000, reward_mean=0.0, reward_bound=0.0
47283: loss=0.000, reward_mean=0.0, reward_bound=0.0
47284: loss=0.000, reward_mean=0.0, reward_bound=0.0
47285: loss=0.000, reward_mean=0.1, reward_bound=0.0
47286: loss=0.000, reward_mean=0.0, reward_bou

47425: loss=0.000, reward_mean=0.1, reward_bound=0.0
47426: loss=0.000, reward_mean=0.1, reward_bound=0.0
47427: loss=0.000, reward_mean=0.1, reward_bound=0.0
47428: loss=0.000, reward_mean=0.1, reward_bound=0.0
47429: loss=0.000, reward_mean=0.1, reward_bound=0.0
47430: loss=0.000, reward_mean=0.1, reward_bound=0.0
47431: loss=0.000, reward_mean=0.2, reward_bound=0.0
47432: loss=0.000, reward_mean=0.0, reward_bound=0.0
47433: loss=0.000, reward_mean=0.0, reward_bound=0.0
47434: loss=0.000, reward_mean=0.0, reward_bound=0.0
47435: loss=0.000, reward_mean=0.2, reward_bound=0.0
47436: loss=0.000, reward_mean=0.0, reward_bound=0.0
47437: loss=0.000, reward_mean=0.0, reward_bound=0.0
47438: loss=0.000, reward_mean=0.1, reward_bound=0.0
47439: loss=0.000, reward_mean=0.0, reward_bound=0.0
47440: loss=0.000, reward_mean=0.1, reward_bound=0.0
47441: loss=0.000, reward_mean=0.1, reward_bound=0.0
47442: loss=0.000, reward_mean=0.1, reward_bound=0.0
47443: loss=0.000, reward_mean=0.1, reward_bou

47581: loss=0.000, reward_mean=0.0, reward_bound=0.0
47582: loss=0.000, reward_mean=0.0, reward_bound=0.0
47583: loss=0.000, reward_mean=0.0, reward_bound=0.0
47584: loss=0.000, reward_mean=0.1, reward_bound=0.0
47585: loss=0.000, reward_mean=0.0, reward_bound=0.0
47586: loss=0.000, reward_mean=0.0, reward_bound=0.0
47587: loss=0.000, reward_mean=0.0, reward_bound=0.0
47588: loss=0.000, reward_mean=0.1, reward_bound=0.0
47589: loss=0.000, reward_mean=0.1, reward_bound=0.0
47590: loss=0.000, reward_mean=0.1, reward_bound=0.0
47591: loss=0.000, reward_mean=0.0, reward_bound=0.0
47592: loss=0.000, reward_mean=0.0, reward_bound=0.0
47593: loss=0.000, reward_mean=0.1, reward_bound=0.0
47594: loss=0.000, reward_mean=0.2, reward_bound=0.0
47595: loss=0.000, reward_mean=0.1, reward_bound=0.0
47596: loss=0.000, reward_mean=0.1, reward_bound=0.0
47597: loss=0.000, reward_mean=0.0, reward_bound=0.0
47598: loss=0.000, reward_mean=0.0, reward_bound=0.0
47599: loss=0.000, reward_mean=0.1, reward_bou

47740: loss=0.000, reward_mean=0.0, reward_bound=0.0
47741: loss=0.000, reward_mean=0.1, reward_bound=0.0
47742: loss=0.000, reward_mean=0.0, reward_bound=0.0
47743: loss=0.000, reward_mean=0.0, reward_bound=0.0
47744: loss=0.000, reward_mean=0.0, reward_bound=0.0
47745: loss=0.000, reward_mean=0.1, reward_bound=0.0
47746: loss=0.000, reward_mean=0.0, reward_bound=0.0
47747: loss=0.000, reward_mean=0.1, reward_bound=0.0
47748: loss=0.000, reward_mean=0.0, reward_bound=0.0
47749: loss=0.000, reward_mean=0.1, reward_bound=0.0
47750: loss=0.000, reward_mean=0.1, reward_bound=0.0
47751: loss=0.000, reward_mean=0.1, reward_bound=0.0
47752: loss=0.000, reward_mean=0.1, reward_bound=0.0
47753: loss=0.000, reward_mean=0.0, reward_bound=0.0
47754: loss=0.000, reward_mean=0.1, reward_bound=0.0
47755: loss=0.000, reward_mean=0.1, reward_bound=0.0
47756: loss=0.000, reward_mean=0.0, reward_bound=0.0
47757: loss=0.000, reward_mean=0.2, reward_bound=0.0
47758: loss=0.000, reward_mean=0.0, reward_bou

47895: loss=0.000, reward_mean=0.1, reward_bound=0.0
47896: loss=0.000, reward_mean=0.1, reward_bound=0.0
47897: loss=0.000, reward_mean=0.1, reward_bound=0.0
47898: loss=0.000, reward_mean=0.1, reward_bound=0.0
47899: loss=0.000, reward_mean=0.0, reward_bound=0.0
47900: loss=0.000, reward_mean=0.1, reward_bound=0.0
47901: loss=0.000, reward_mean=0.1, reward_bound=0.0
47902: loss=0.000, reward_mean=0.1, reward_bound=0.0
47903: loss=0.000, reward_mean=0.0, reward_bound=0.0
47904: loss=0.000, reward_mean=0.0, reward_bound=0.0
47905: loss=0.000, reward_mean=0.1, reward_bound=0.0
47906: loss=0.000, reward_mean=0.0, reward_bound=0.0
47907: loss=0.000, reward_mean=0.1, reward_bound=0.0
47908: loss=0.000, reward_mean=0.0, reward_bound=0.0
47909: loss=0.000, reward_mean=0.0, reward_bound=0.0
47910: loss=0.000, reward_mean=0.1, reward_bound=0.0
47911: loss=0.000, reward_mean=0.1, reward_bound=0.0
47912: loss=0.000, reward_mean=0.1, reward_bound=0.0
47913: loss=0.000, reward_mean=0.1, reward_bou

48051: loss=0.000, reward_mean=0.0, reward_bound=0.0
48052: loss=0.000, reward_mean=0.1, reward_bound=0.0
48053: loss=0.000, reward_mean=0.0, reward_bound=0.0
48054: loss=0.000, reward_mean=0.1, reward_bound=0.0
48055: loss=0.000, reward_mean=0.1, reward_bound=0.0
48056: loss=0.000, reward_mean=0.1, reward_bound=0.0
48057: loss=0.000, reward_mean=0.0, reward_bound=0.0
48058: loss=0.000, reward_mean=0.0, reward_bound=0.0
48059: loss=0.000, reward_mean=0.0, reward_bound=0.0
48060: loss=0.000, reward_mean=0.2, reward_bound=0.0
48061: loss=0.000, reward_mean=0.1, reward_bound=0.0
48062: loss=0.000, reward_mean=0.1, reward_bound=0.0
48063: loss=0.000, reward_mean=0.1, reward_bound=0.0
48064: loss=0.000, reward_mean=0.0, reward_bound=0.0
48065: loss=0.000, reward_mean=0.2, reward_bound=0.0
48066: loss=0.000, reward_mean=0.1, reward_bound=0.0
48067: loss=0.000, reward_mean=0.1, reward_bound=0.0
48068: loss=0.000, reward_mean=0.1, reward_bound=0.0
48069: loss=0.000, reward_mean=0.0, reward_bou

48207: loss=0.000, reward_mean=0.0, reward_bound=0.0
48208: loss=0.000, reward_mean=0.1, reward_bound=0.0
48209: loss=0.000, reward_mean=0.1, reward_bound=0.0
48210: loss=0.000, reward_mean=0.1, reward_bound=0.0
48211: loss=0.000, reward_mean=0.0, reward_bound=0.0
48212: loss=0.000, reward_mean=0.1, reward_bound=0.0
48213: loss=0.000, reward_mean=0.1, reward_bound=0.0
48214: loss=0.000, reward_mean=0.0, reward_bound=0.0
48215: loss=0.000, reward_mean=0.0, reward_bound=0.0
48216: loss=0.000, reward_mean=0.1, reward_bound=0.0
48217: loss=0.000, reward_mean=0.1, reward_bound=0.0
48218: loss=0.000, reward_mean=0.0, reward_bound=0.0
48219: loss=0.000, reward_mean=0.1, reward_bound=0.0
48220: loss=0.000, reward_mean=0.0, reward_bound=0.0
48221: loss=0.000, reward_mean=0.1, reward_bound=0.0
48222: loss=0.000, reward_mean=0.1, reward_bound=0.0
48223: loss=0.000, reward_mean=0.0, reward_bound=0.0
48224: loss=0.000, reward_mean=0.0, reward_bound=0.0
48225: loss=0.000, reward_mean=0.0, reward_bou

48365: loss=0.000, reward_mean=0.0, reward_bound=0.0
48366: loss=0.000, reward_mean=0.1, reward_bound=0.0
48367: loss=0.000, reward_mean=0.1, reward_bound=0.0
48368: loss=0.000, reward_mean=0.1, reward_bound=0.0
48369: loss=0.000, reward_mean=0.0, reward_bound=0.0
48370: loss=0.000, reward_mean=0.1, reward_bound=0.0
48371: loss=0.000, reward_mean=0.0, reward_bound=0.0
48372: loss=0.000, reward_mean=0.0, reward_bound=0.0
48373: loss=0.000, reward_mean=0.1, reward_bound=0.0
48374: loss=0.000, reward_mean=0.0, reward_bound=0.0
48375: loss=0.000, reward_mean=0.0, reward_bound=0.0
48376: loss=0.000, reward_mean=0.1, reward_bound=0.0
48377: loss=0.000, reward_mean=0.1, reward_bound=0.0
48378: loss=0.000, reward_mean=0.0, reward_bound=0.0
48379: loss=0.000, reward_mean=0.1, reward_bound=0.0
48380: loss=0.000, reward_mean=0.0, reward_bound=0.0
48381: loss=0.000, reward_mean=0.0, reward_bound=0.0
48382: loss=0.000, reward_mean=0.1, reward_bound=0.0
48383: loss=0.000, reward_mean=0.2, reward_bou

48524: loss=0.000, reward_mean=0.1, reward_bound=0.0
48525: loss=0.000, reward_mean=0.0, reward_bound=0.0
48526: loss=0.000, reward_mean=0.1, reward_bound=0.0
48527: loss=0.000, reward_mean=0.1, reward_bound=0.0
48528: loss=0.000, reward_mean=0.0, reward_bound=0.0
48529: loss=0.000, reward_mean=0.0, reward_bound=0.0
48530: loss=0.000, reward_mean=0.1, reward_bound=0.0
48531: loss=0.000, reward_mean=0.1, reward_bound=0.0
48532: loss=0.000, reward_mean=0.0, reward_bound=0.0
48533: loss=0.000, reward_mean=0.0, reward_bound=0.0
48534: loss=0.000, reward_mean=0.1, reward_bound=0.0
48535: loss=0.000, reward_mean=0.1, reward_bound=0.0
48536: loss=0.000, reward_mean=0.1, reward_bound=0.0
48537: loss=0.000, reward_mean=0.1, reward_bound=0.0
48538: loss=0.000, reward_mean=0.1, reward_bound=0.0
48539: loss=0.000, reward_mean=0.0, reward_bound=0.0
48540: loss=0.000, reward_mean=0.0, reward_bound=0.0
48541: loss=0.000, reward_mean=0.1, reward_bound=0.0
48542: loss=0.000, reward_mean=0.1, reward_bou

48682: loss=0.000, reward_mean=0.1, reward_bound=0.0
48683: loss=0.000, reward_mean=0.2, reward_bound=0.0
48684: loss=0.000, reward_mean=0.0, reward_bound=0.0
48685: loss=0.000, reward_mean=0.0, reward_bound=0.0
48686: loss=0.000, reward_mean=0.0, reward_bound=0.0
48687: loss=0.000, reward_mean=0.1, reward_bound=0.0
48688: loss=0.000, reward_mean=0.0, reward_bound=0.0
48689: loss=0.000, reward_mean=0.2, reward_bound=0.0
48690: loss=0.000, reward_mean=0.1, reward_bound=0.0
48691: loss=0.000, reward_mean=0.1, reward_bound=0.0
48692: loss=0.000, reward_mean=0.1, reward_bound=0.0
48693: loss=0.000, reward_mean=0.0, reward_bound=0.0
48694: loss=0.000, reward_mean=0.1, reward_bound=0.0
48695: loss=0.000, reward_mean=0.0, reward_bound=0.0
48696: loss=0.000, reward_mean=0.0, reward_bound=0.0
48697: loss=0.000, reward_mean=0.2, reward_bound=0.0
48698: loss=0.000, reward_mean=0.1, reward_bound=0.0
48699: loss=0.000, reward_mean=0.0, reward_bound=0.0
48700: loss=0.000, reward_mean=0.1, reward_bou

48837: loss=0.000, reward_mean=0.1, reward_bound=0.0
48838: loss=0.000, reward_mean=0.0, reward_bound=0.0
48839: loss=0.000, reward_mean=0.2, reward_bound=0.0
48840: loss=0.000, reward_mean=0.0, reward_bound=0.0
48841: loss=0.000, reward_mean=0.1, reward_bound=0.0
48842: loss=0.000, reward_mean=0.2, reward_bound=0.0
48843: loss=0.000, reward_mean=0.1, reward_bound=0.0
48844: loss=0.000, reward_mean=0.0, reward_bound=0.0
48845: loss=0.000, reward_mean=0.1, reward_bound=0.0
48846: loss=0.000, reward_mean=0.1, reward_bound=0.0
48847: loss=0.000, reward_mean=0.0, reward_bound=0.0
48848: loss=0.000, reward_mean=0.1, reward_bound=0.0
48849: loss=0.000, reward_mean=0.0, reward_bound=0.0
48850: loss=0.000, reward_mean=0.1, reward_bound=0.0
48851: loss=0.000, reward_mean=0.0, reward_bound=0.0
48852: loss=0.000, reward_mean=0.0, reward_bound=0.0
48853: loss=0.000, reward_mean=0.0, reward_bound=0.0
48854: loss=0.000, reward_mean=0.1, reward_bound=0.0
48855: loss=0.000, reward_mean=0.1, reward_bou

48995: loss=0.000, reward_mean=0.1, reward_bound=0.0
48996: loss=0.000, reward_mean=0.1, reward_bound=0.0
48997: loss=0.000, reward_mean=0.0, reward_bound=0.0
48998: loss=0.000, reward_mean=0.0, reward_bound=0.0
48999: loss=0.000, reward_mean=0.0, reward_bound=0.0
49000: loss=0.000, reward_mean=0.0, reward_bound=0.0
49001: loss=0.000, reward_mean=0.1, reward_bound=0.0
49002: loss=0.000, reward_mean=0.0, reward_bound=0.0
49003: loss=0.000, reward_mean=0.0, reward_bound=0.0
49004: loss=0.000, reward_mean=0.0, reward_bound=0.0
49005: loss=0.000, reward_mean=0.0, reward_bound=0.0
49006: loss=0.000, reward_mean=0.0, reward_bound=0.0
49007: loss=0.000, reward_mean=0.0, reward_bound=0.0
49008: loss=0.000, reward_mean=0.0, reward_bound=0.0
49009: loss=0.000, reward_mean=0.1, reward_bound=0.0
49010: loss=0.000, reward_mean=0.2, reward_bound=0.0
49011: loss=0.000, reward_mean=0.0, reward_bound=0.0
49012: loss=0.000, reward_mean=0.1, reward_bound=0.0
49013: loss=0.000, reward_mean=0.0, reward_bou

49150: loss=0.000, reward_mean=0.2, reward_bound=0.0
49151: loss=0.000, reward_mean=0.1, reward_bound=0.0
49152: loss=0.000, reward_mean=0.0, reward_bound=0.0
49153: loss=0.000, reward_mean=0.0, reward_bound=0.0
49154: loss=0.000, reward_mean=0.1, reward_bound=0.0
49155: loss=0.000, reward_mean=0.0, reward_bound=0.0
49156: loss=0.000, reward_mean=0.0, reward_bound=0.0
49157: loss=0.000, reward_mean=0.1, reward_bound=0.0
49158: loss=0.000, reward_mean=0.1, reward_bound=0.0
49159: loss=0.000, reward_mean=0.1, reward_bound=0.0
49160: loss=0.000, reward_mean=0.0, reward_bound=0.0
49161: loss=0.000, reward_mean=0.1, reward_bound=0.0
49162: loss=0.000, reward_mean=0.1, reward_bound=0.0
49163: loss=0.000, reward_mean=0.1, reward_bound=0.0
49164: loss=0.000, reward_mean=0.1, reward_bound=0.0
49165: loss=0.000, reward_mean=0.0, reward_bound=0.0
49166: loss=0.000, reward_mean=0.2, reward_bound=0.0
49167: loss=0.000, reward_mean=0.1, reward_bound=0.0
49168: loss=0.000, reward_mean=0.1, reward_bou

49310: loss=0.000, reward_mean=0.1, reward_bound=0.0
49311: loss=0.000, reward_mean=0.1, reward_bound=0.0
49312: loss=0.000, reward_mean=0.2, reward_bound=0.0
49313: loss=0.000, reward_mean=0.0, reward_bound=0.0
49314: loss=0.000, reward_mean=0.0, reward_bound=0.0
49315: loss=0.000, reward_mean=0.1, reward_bound=0.0
49316: loss=0.000, reward_mean=0.3, reward_bound=0.5
49317: loss=0.000, reward_mean=0.1, reward_bound=0.0
49318: loss=0.000, reward_mean=0.1, reward_bound=0.0
49319: loss=0.000, reward_mean=0.0, reward_bound=0.0
49320: loss=0.000, reward_mean=0.0, reward_bound=0.0
49321: loss=0.000, reward_mean=0.1, reward_bound=0.0
49322: loss=0.000, reward_mean=0.1, reward_bound=0.0
49323: loss=0.000, reward_mean=0.0, reward_bound=0.0
49324: loss=0.000, reward_mean=0.1, reward_bound=0.0
49325: loss=0.000, reward_mean=0.1, reward_bound=0.0
49326: loss=0.000, reward_mean=0.0, reward_bound=0.0
49327: loss=0.000, reward_mean=0.1, reward_bound=0.0
49328: loss=0.000, reward_mean=0.1, reward_bou

49468: loss=0.000, reward_mean=0.1, reward_bound=0.0
49469: loss=0.000, reward_mean=0.0, reward_bound=0.0
49470: loss=0.000, reward_mean=0.1, reward_bound=0.0
49471: loss=0.000, reward_mean=0.1, reward_bound=0.0
49472: loss=0.000, reward_mean=0.0, reward_bound=0.0
49473: loss=0.000, reward_mean=0.1, reward_bound=0.0
49474: loss=0.000, reward_mean=0.1, reward_bound=0.0
49475: loss=0.000, reward_mean=0.1, reward_bound=0.0
49476: loss=0.000, reward_mean=0.2, reward_bound=0.0
49477: loss=0.000, reward_mean=0.0, reward_bound=0.0
49478: loss=0.000, reward_mean=0.1, reward_bound=0.0
49479: loss=0.000, reward_mean=0.0, reward_bound=0.0
49480: loss=0.000, reward_mean=0.0, reward_bound=0.0
49481: loss=0.000, reward_mean=0.1, reward_bound=0.0
49482: loss=0.000, reward_mean=0.2, reward_bound=0.0
49483: loss=0.000, reward_mean=0.1, reward_bound=0.0
49484: loss=0.000, reward_mean=0.0, reward_bound=0.0
49485: loss=0.000, reward_mean=0.1, reward_bound=0.0
49486: loss=0.000, reward_mean=0.1, reward_bou

49625: loss=0.000, reward_mean=0.1, reward_bound=0.0
49626: loss=0.000, reward_mean=0.0, reward_bound=0.0
49627: loss=0.000, reward_mean=0.0, reward_bound=0.0
49628: loss=0.000, reward_mean=0.0, reward_bound=0.0
49629: loss=0.000, reward_mean=0.1, reward_bound=0.0
49630: loss=0.000, reward_mean=0.0, reward_bound=0.0
49631: loss=0.000, reward_mean=0.1, reward_bound=0.0
49632: loss=0.000, reward_mean=0.1, reward_bound=0.0
49633: loss=0.000, reward_mean=0.0, reward_bound=0.0
49634: loss=0.000, reward_mean=0.0, reward_bound=0.0
49635: loss=0.000, reward_mean=0.0, reward_bound=0.0
49636: loss=0.000, reward_mean=0.1, reward_bound=0.0
49637: loss=0.000, reward_mean=0.1, reward_bound=0.0
49638: loss=0.000, reward_mean=0.0, reward_bound=0.0
49639: loss=0.000, reward_mean=0.1, reward_bound=0.0
49640: loss=0.000, reward_mean=0.0, reward_bound=0.0
49641: loss=0.000, reward_mean=0.0, reward_bound=0.0
49642: loss=0.000, reward_mean=0.1, reward_bound=0.0
49643: loss=0.000, reward_mean=0.1, reward_bou

49780: loss=0.000, reward_mean=0.0, reward_bound=0.0
49781: loss=0.000, reward_mean=0.0, reward_bound=0.0
49782: loss=0.000, reward_mean=0.0, reward_bound=0.0
49783: loss=0.000, reward_mean=0.1, reward_bound=0.0
49784: loss=0.000, reward_mean=0.0, reward_bound=0.0
49785: loss=0.000, reward_mean=0.0, reward_bound=0.0
49786: loss=0.000, reward_mean=0.0, reward_bound=0.0
49787: loss=0.000, reward_mean=0.0, reward_bound=0.0
49788: loss=0.000, reward_mean=0.0, reward_bound=0.0
49789: loss=0.000, reward_mean=0.0, reward_bound=0.0
49790: loss=0.000, reward_mean=0.0, reward_bound=0.0
49791: loss=0.000, reward_mean=0.1, reward_bound=0.0
49792: loss=0.000, reward_mean=0.1, reward_bound=0.0
49793: loss=0.000, reward_mean=0.2, reward_bound=0.0
49794: loss=0.000, reward_mean=0.0, reward_bound=0.0
49795: loss=0.000, reward_mean=0.2, reward_bound=0.0
49796: loss=0.000, reward_mean=0.1, reward_bound=0.0
49797: loss=0.000, reward_mean=0.1, reward_bound=0.0
49798: loss=0.000, reward_mean=0.0, reward_bou

49935: loss=0.000, reward_mean=0.0, reward_bound=0.0
49936: loss=0.000, reward_mean=0.0, reward_bound=0.0
49937: loss=0.000, reward_mean=0.0, reward_bound=0.0
49938: loss=0.000, reward_mean=0.0, reward_bound=0.0
49939: loss=0.000, reward_mean=0.1, reward_bound=0.0
49940: loss=0.000, reward_mean=0.1, reward_bound=0.0
49941: loss=0.000, reward_mean=0.0, reward_bound=0.0
49942: loss=0.000, reward_mean=0.1, reward_bound=0.0
49943: loss=0.000, reward_mean=0.1, reward_bound=0.0
49944: loss=0.000, reward_mean=0.1, reward_bound=0.0
49945: loss=0.000, reward_mean=0.0, reward_bound=0.0
49946: loss=0.000, reward_mean=0.0, reward_bound=0.0
49947: loss=0.000, reward_mean=0.1, reward_bound=0.0
49948: loss=0.000, reward_mean=0.0, reward_bound=0.0
49949: loss=0.000, reward_mean=0.1, reward_bound=0.0
49950: loss=0.000, reward_mean=0.1, reward_bound=0.0
49951: loss=0.000, reward_mean=0.0, reward_bound=0.0
49952: loss=0.000, reward_mean=0.1, reward_bound=0.0
49953: loss=0.000, reward_mean=0.1, reward_bou

50091: loss=0.000, reward_mean=0.0, reward_bound=0.0
50092: loss=0.000, reward_mean=0.1, reward_bound=0.0
50093: loss=0.000, reward_mean=0.0, reward_bound=0.0
50094: loss=0.000, reward_mean=0.2, reward_bound=0.0
50095: loss=0.000, reward_mean=0.1, reward_bound=0.0
50096: loss=0.000, reward_mean=0.1, reward_bound=0.0
50097: loss=0.000, reward_mean=0.0, reward_bound=0.0
50098: loss=0.000, reward_mean=0.1, reward_bound=0.0
50099: loss=0.000, reward_mean=0.1, reward_bound=0.0
50100: loss=0.000, reward_mean=0.1, reward_bound=0.0
50101: loss=0.000, reward_mean=0.1, reward_bound=0.0
50102: loss=0.000, reward_mean=0.1, reward_bound=0.0
50103: loss=0.000, reward_mean=0.1, reward_bound=0.0
50104: loss=0.000, reward_mean=0.0, reward_bound=0.0
50105: loss=0.000, reward_mean=0.1, reward_bound=0.0
50106: loss=0.000, reward_mean=0.0, reward_bound=0.0
50107: loss=0.000, reward_mean=0.1, reward_bound=0.0
50108: loss=0.000, reward_mean=0.0, reward_bound=0.0
50109: loss=0.000, reward_mean=0.1, reward_bou

50247: loss=0.000, reward_mean=0.1, reward_bound=0.0
50248: loss=0.000, reward_mean=0.0, reward_bound=0.0
50249: loss=0.000, reward_mean=0.1, reward_bound=0.0
50250: loss=0.000, reward_mean=0.1, reward_bound=0.0
50251: loss=0.000, reward_mean=0.1, reward_bound=0.0
50252: loss=0.000, reward_mean=0.2, reward_bound=0.0
50253: loss=0.000, reward_mean=0.1, reward_bound=0.0
50254: loss=0.000, reward_mean=0.0, reward_bound=0.0
50255: loss=0.000, reward_mean=0.0, reward_bound=0.0
50256: loss=0.000, reward_mean=0.0, reward_bound=0.0
50257: loss=0.000, reward_mean=0.0, reward_bound=0.0
50258: loss=0.000, reward_mean=0.0, reward_bound=0.0
50259: loss=0.000, reward_mean=0.0, reward_bound=0.0
50260: loss=0.000, reward_mean=0.1, reward_bound=0.0
50261: loss=0.000, reward_mean=0.0, reward_bound=0.0
50262: loss=0.000, reward_mean=0.0, reward_bound=0.0
50263: loss=0.000, reward_mean=0.0, reward_bound=0.0
50264: loss=0.000, reward_mean=0.1, reward_bound=0.0
50265: loss=0.000, reward_mean=0.1, reward_bou

50404: loss=0.000, reward_mean=0.1, reward_bound=0.0
50405: loss=0.000, reward_mean=0.1, reward_bound=0.0
50406: loss=0.000, reward_mean=0.1, reward_bound=0.0
50407: loss=0.000, reward_mean=0.1, reward_bound=0.0
50408: loss=0.000, reward_mean=0.1, reward_bound=0.0
50409: loss=0.000, reward_mean=0.1, reward_bound=0.0
50410: loss=0.000, reward_mean=0.1, reward_bound=0.0
50411: loss=0.000, reward_mean=0.1, reward_bound=0.0
50412: loss=0.000, reward_mean=0.1, reward_bound=0.0
50413: loss=0.000, reward_mean=0.1, reward_bound=0.0
50414: loss=0.000, reward_mean=0.1, reward_bound=0.0
50415: loss=0.000, reward_mean=0.1, reward_bound=0.0
50416: loss=0.000, reward_mean=0.0, reward_bound=0.0
50417: loss=0.000, reward_mean=0.1, reward_bound=0.0
50418: loss=0.000, reward_mean=0.1, reward_bound=0.0
50419: loss=0.000, reward_mean=0.1, reward_bound=0.0
50420: loss=0.000, reward_mean=0.1, reward_bound=0.0
50421: loss=0.000, reward_mean=0.1, reward_bound=0.0
50422: loss=0.000, reward_mean=0.0, reward_bou

50562: loss=0.000, reward_mean=0.1, reward_bound=0.0
50563: loss=0.000, reward_mean=0.1, reward_bound=0.0
50564: loss=0.000, reward_mean=0.1, reward_bound=0.0
50565: loss=0.000, reward_mean=0.0, reward_bound=0.0
50566: loss=0.000, reward_mean=0.1, reward_bound=0.0
50567: loss=0.000, reward_mean=0.1, reward_bound=0.0
50568: loss=0.000, reward_mean=0.1, reward_bound=0.0
50569: loss=0.000, reward_mean=0.1, reward_bound=0.0
50570: loss=0.000, reward_mean=0.0, reward_bound=0.0
50571: loss=0.000, reward_mean=0.0, reward_bound=0.0
50572: loss=0.000, reward_mean=0.1, reward_bound=0.0
50573: loss=0.000, reward_mean=0.1, reward_bound=0.0
50574: loss=0.000, reward_mean=0.1, reward_bound=0.0
50575: loss=0.000, reward_mean=0.1, reward_bound=0.0
50576: loss=0.000, reward_mean=0.1, reward_bound=0.0
50577: loss=0.000, reward_mean=0.1, reward_bound=0.0
50578: loss=0.000, reward_mean=0.1, reward_bound=0.0
50579: loss=0.000, reward_mean=0.0, reward_bound=0.0
50580: loss=0.000, reward_mean=0.1, reward_bou

50718: loss=0.000, reward_mean=0.1, reward_bound=0.0
50719: loss=0.000, reward_mean=0.0, reward_bound=0.0
50720: loss=0.000, reward_mean=0.0, reward_bound=0.0
50721: loss=0.000, reward_mean=0.1, reward_bound=0.0
50722: loss=0.000, reward_mean=0.0, reward_bound=0.0
50723: loss=0.000, reward_mean=0.0, reward_bound=0.0
50724: loss=0.000, reward_mean=0.1, reward_bound=0.0
50725: loss=0.000, reward_mean=0.1, reward_bound=0.0
50726: loss=0.000, reward_mean=0.0, reward_bound=0.0
50727: loss=0.000, reward_mean=0.0, reward_bound=0.0
50728: loss=0.000, reward_mean=0.1, reward_bound=0.0
50729: loss=0.000, reward_mean=0.2, reward_bound=0.0
50730: loss=0.000, reward_mean=0.1, reward_bound=0.0
50731: loss=0.000, reward_mean=0.1, reward_bound=0.0
50732: loss=0.000, reward_mean=0.0, reward_bound=0.0
50733: loss=0.000, reward_mean=0.1, reward_bound=0.0
50734: loss=0.000, reward_mean=0.1, reward_bound=0.0
50735: loss=0.000, reward_mean=0.1, reward_bound=0.0
50736: loss=0.000, reward_mean=0.2, reward_bou

50873: loss=0.000, reward_mean=0.1, reward_bound=0.0
50874: loss=0.000, reward_mean=0.0, reward_bound=0.0
50875: loss=0.000, reward_mean=0.0, reward_bound=0.0
50876: loss=0.000, reward_mean=0.1, reward_bound=0.0
50877: loss=0.000, reward_mean=0.0, reward_bound=0.0
50878: loss=0.000, reward_mean=0.1, reward_bound=0.0
50879: loss=0.000, reward_mean=0.0, reward_bound=0.0
50880: loss=0.000, reward_mean=0.0, reward_bound=0.0
50881: loss=0.000, reward_mean=0.1, reward_bound=0.0
50882: loss=0.000, reward_mean=0.0, reward_bound=0.0
50883: loss=0.000, reward_mean=0.1, reward_bound=0.0
50884: loss=0.000, reward_mean=0.1, reward_bound=0.0
50885: loss=0.000, reward_mean=0.2, reward_bound=0.0
50886: loss=0.000, reward_mean=0.0, reward_bound=0.0
50887: loss=0.000, reward_mean=0.0, reward_bound=0.0
50888: loss=0.000, reward_mean=0.1, reward_bound=0.0
50889: loss=0.000, reward_mean=0.1, reward_bound=0.0
50890: loss=0.000, reward_mean=0.1, reward_bound=0.0
50891: loss=0.000, reward_mean=0.1, reward_bou

51033: loss=0.000, reward_mean=0.0, reward_bound=0.0
51034: loss=0.000, reward_mean=0.0, reward_bound=0.0
51035: loss=0.000, reward_mean=0.1, reward_bound=0.0
51036: loss=0.000, reward_mean=0.1, reward_bound=0.0
51037: loss=0.000, reward_mean=0.2, reward_bound=0.0
51038: loss=0.000, reward_mean=0.0, reward_bound=0.0
51039: loss=0.000, reward_mean=0.1, reward_bound=0.0
51040: loss=0.000, reward_mean=0.0, reward_bound=0.0
51041: loss=0.000, reward_mean=0.0, reward_bound=0.0
51042: loss=0.000, reward_mean=0.1, reward_bound=0.0
51043: loss=0.000, reward_mean=0.1, reward_bound=0.0
51044: loss=0.000, reward_mean=0.0, reward_bound=0.0
51045: loss=0.000, reward_mean=0.1, reward_bound=0.0
51046: loss=0.000, reward_mean=0.0, reward_bound=0.0
51047: loss=0.000, reward_mean=0.2, reward_bound=0.0
51048: loss=0.000, reward_mean=0.0, reward_bound=0.0
51049: loss=0.000, reward_mean=0.0, reward_bound=0.0
51050: loss=0.000, reward_mean=0.0, reward_bound=0.0
51051: loss=0.000, reward_mean=0.1, reward_bou

51190: loss=0.000, reward_mean=0.1, reward_bound=0.0
51191: loss=0.000, reward_mean=0.1, reward_bound=0.0
51192: loss=0.000, reward_mean=0.0, reward_bound=0.0
51193: loss=0.000, reward_mean=0.1, reward_bound=0.0
51194: loss=0.000, reward_mean=0.1, reward_bound=0.0
51195: loss=0.000, reward_mean=0.1, reward_bound=0.0
51196: loss=0.000, reward_mean=0.1, reward_bound=0.0
51197: loss=0.000, reward_mean=0.0, reward_bound=0.0
51198: loss=0.000, reward_mean=0.0, reward_bound=0.0
51199: loss=0.000, reward_mean=0.2, reward_bound=0.0
51200: loss=0.000, reward_mean=0.1, reward_bound=0.0
51201: loss=0.000, reward_mean=0.1, reward_bound=0.0
51202: loss=0.000, reward_mean=0.0, reward_bound=0.0
51203: loss=0.000, reward_mean=0.1, reward_bound=0.0
51204: loss=0.000, reward_mean=0.1, reward_bound=0.0
51205: loss=0.000, reward_mean=0.0, reward_bound=0.0
51206: loss=0.000, reward_mean=0.1, reward_bound=0.0
51207: loss=0.000, reward_mean=0.0, reward_bound=0.0
51208: loss=0.000, reward_mean=0.0, reward_bou

51347: loss=0.000, reward_mean=0.1, reward_bound=0.0
51348: loss=0.000, reward_mean=0.1, reward_bound=0.0
51349: loss=0.000, reward_mean=0.1, reward_bound=0.0
51350: loss=0.000, reward_mean=0.1, reward_bound=0.0
51351: loss=0.000, reward_mean=0.0, reward_bound=0.0
51352: loss=0.000, reward_mean=0.1, reward_bound=0.0
51353: loss=0.000, reward_mean=0.2, reward_bound=0.0
51354: loss=0.000, reward_mean=0.0, reward_bound=0.0
51355: loss=0.000, reward_mean=0.0, reward_bound=0.0
51356: loss=0.000, reward_mean=0.1, reward_bound=0.0
51357: loss=0.000, reward_mean=0.0, reward_bound=0.0
51358: loss=0.000, reward_mean=0.0, reward_bound=0.0
51359: loss=0.000, reward_mean=0.1, reward_bound=0.0
51360: loss=0.000, reward_mean=0.1, reward_bound=0.0
51361: loss=0.000, reward_mean=0.0, reward_bound=0.0
51362: loss=0.000, reward_mean=0.0, reward_bound=0.0
51363: loss=0.000, reward_mean=0.1, reward_bound=0.0
51364: loss=0.000, reward_mean=0.0, reward_bound=0.0
51365: loss=0.000, reward_mean=0.0, reward_bou

51503: loss=0.000, reward_mean=0.0, reward_bound=0.0
51504: loss=0.000, reward_mean=0.0, reward_bound=0.0
51505: loss=0.000, reward_mean=0.1, reward_bound=0.0
51506: loss=0.000, reward_mean=0.0, reward_bound=0.0
51507: loss=0.000, reward_mean=0.1, reward_bound=0.0
51508: loss=0.000, reward_mean=0.0, reward_bound=0.0
51509: loss=0.000, reward_mean=0.1, reward_bound=0.0
51510: loss=0.000, reward_mean=0.1, reward_bound=0.0
51511: loss=0.000, reward_mean=0.1, reward_bound=0.0
51512: loss=0.000, reward_mean=0.1, reward_bound=0.0
51513: loss=0.000, reward_mean=0.0, reward_bound=0.0
51514: loss=0.000, reward_mean=0.1, reward_bound=0.0
51515: loss=0.000, reward_mean=0.1, reward_bound=0.0
51516: loss=0.000, reward_mean=0.0, reward_bound=0.0
51517: loss=0.000, reward_mean=0.1, reward_bound=0.0
51518: loss=0.000, reward_mean=0.0, reward_bound=0.0
51519: loss=0.000, reward_mean=0.0, reward_bound=0.0
51520: loss=0.000, reward_mean=0.0, reward_bound=0.0
51521: loss=0.000, reward_mean=0.0, reward_bou

51662: loss=0.000, reward_mean=0.0, reward_bound=0.0
51663: loss=0.000, reward_mean=0.1, reward_bound=0.0
51664: loss=0.000, reward_mean=0.0, reward_bound=0.0
51665: loss=0.000, reward_mean=0.0, reward_bound=0.0
51666: loss=0.000, reward_mean=0.1, reward_bound=0.0
51667: loss=0.000, reward_mean=0.1, reward_bound=0.0
51668: loss=0.000, reward_mean=0.0, reward_bound=0.0
51669: loss=0.000, reward_mean=0.2, reward_bound=0.0
51670: loss=0.000, reward_mean=0.0, reward_bound=0.0
51671: loss=0.000, reward_mean=0.2, reward_bound=0.0
51672: loss=0.000, reward_mean=0.1, reward_bound=0.0
51673: loss=0.000, reward_mean=0.2, reward_bound=0.0
51674: loss=0.000, reward_mean=0.0, reward_bound=0.0
51675: loss=0.000, reward_mean=0.1, reward_bound=0.0
51676: loss=0.000, reward_mean=0.1, reward_bound=0.0
51677: loss=0.000, reward_mean=0.1, reward_bound=0.0
51678: loss=0.000, reward_mean=0.1, reward_bound=0.0
51679: loss=0.000, reward_mean=0.0, reward_bound=0.0
51680: loss=0.000, reward_mean=0.1, reward_bou

51824: loss=0.000, reward_mean=0.1, reward_bound=0.0
51825: loss=0.000, reward_mean=0.0, reward_bound=0.0
51826: loss=0.000, reward_mean=0.0, reward_bound=0.0
51827: loss=0.000, reward_mean=0.1, reward_bound=0.0
51828: loss=0.000, reward_mean=0.2, reward_bound=0.0
51829: loss=0.000, reward_mean=0.1, reward_bound=0.0
51830: loss=0.000, reward_mean=0.0, reward_bound=0.0
51831: loss=0.000, reward_mean=0.1, reward_bound=0.0
51832: loss=0.000, reward_mean=0.1, reward_bound=0.0
51833: loss=0.000, reward_mean=0.0, reward_bound=0.0
51834: loss=0.000, reward_mean=0.1, reward_bound=0.0
51835: loss=0.000, reward_mean=0.1, reward_bound=0.0
51836: loss=0.000, reward_mean=0.2, reward_bound=0.0
51837: loss=0.000, reward_mean=0.1, reward_bound=0.0
51838: loss=0.000, reward_mean=0.0, reward_bound=0.0
51839: loss=0.000, reward_mean=0.1, reward_bound=0.0
51840: loss=0.000, reward_mean=0.1, reward_bound=0.0
51841: loss=0.000, reward_mean=0.1, reward_bound=0.0
51842: loss=0.000, reward_mean=0.0, reward_bou

51980: loss=0.000, reward_mean=0.0, reward_bound=0.0
51981: loss=0.000, reward_mean=0.0, reward_bound=0.0
51982: loss=0.000, reward_mean=0.0, reward_bound=0.0
51983: loss=0.000, reward_mean=0.1, reward_bound=0.0
51984: loss=0.000, reward_mean=0.1, reward_bound=0.0
51985: loss=0.000, reward_mean=0.1, reward_bound=0.0
51986: loss=0.000, reward_mean=0.0, reward_bound=0.0
51987: loss=0.000, reward_mean=0.1, reward_bound=0.0
51988: loss=0.000, reward_mean=0.0, reward_bound=0.0
51989: loss=0.000, reward_mean=0.1, reward_bound=0.0
51990: loss=0.000, reward_mean=0.1, reward_bound=0.0
51991: loss=0.000, reward_mean=0.0, reward_bound=0.0
51992: loss=0.000, reward_mean=0.0, reward_bound=0.0
51993: loss=0.000, reward_mean=0.0, reward_bound=0.0
51994: loss=0.000, reward_mean=0.2, reward_bound=0.0
51995: loss=0.000, reward_mean=0.0, reward_bound=0.0
51996: loss=0.000, reward_mean=0.0, reward_bound=0.0
51997: loss=0.000, reward_mean=0.1, reward_bound=0.0
51998: loss=0.000, reward_mean=0.0, reward_bou

52135: loss=0.000, reward_mean=0.1, reward_bound=0.0
52136: loss=0.000, reward_mean=0.0, reward_bound=0.0
52137: loss=0.000, reward_mean=0.1, reward_bound=0.0
52138: loss=0.000, reward_mean=0.0, reward_bound=0.0
52139: loss=0.000, reward_mean=0.1, reward_bound=0.0
52140: loss=0.000, reward_mean=0.1, reward_bound=0.0
52141: loss=0.000, reward_mean=0.0, reward_bound=0.0
52142: loss=0.000, reward_mean=0.2, reward_bound=0.0
52143: loss=0.000, reward_mean=0.0, reward_bound=0.0
52144: loss=0.000, reward_mean=0.0, reward_bound=0.0
52145: loss=0.000, reward_mean=0.1, reward_bound=0.0
52146: loss=0.000, reward_mean=0.0, reward_bound=0.0
52147: loss=0.000, reward_mean=0.0, reward_bound=0.0
52148: loss=0.000, reward_mean=0.0, reward_bound=0.0
52149: loss=0.000, reward_mean=0.1, reward_bound=0.0
52150: loss=0.000, reward_mean=0.3, reward_bound=0.5
52151: loss=0.000, reward_mean=0.1, reward_bound=0.0
52152: loss=0.000, reward_mean=0.0, reward_bound=0.0
52153: loss=0.000, reward_mean=0.1, reward_bou

52291: loss=0.000, reward_mean=0.1, reward_bound=0.0
52292: loss=0.000, reward_mean=0.0, reward_bound=0.0
52293: loss=0.000, reward_mean=0.1, reward_bound=0.0
52294: loss=0.000, reward_mean=0.1, reward_bound=0.0
52295: loss=0.000, reward_mean=0.0, reward_bound=0.0
52296: loss=0.000, reward_mean=0.0, reward_bound=0.0
52297: loss=0.000, reward_mean=0.0, reward_bound=0.0
52298: loss=0.000, reward_mean=0.1, reward_bound=0.0
52299: loss=0.000, reward_mean=0.1, reward_bound=0.0
52300: loss=0.000, reward_mean=0.1, reward_bound=0.0
52301: loss=0.000, reward_mean=0.1, reward_bound=0.0
52302: loss=0.000, reward_mean=0.0, reward_bound=0.0
52303: loss=0.000, reward_mean=0.0, reward_bound=0.0
52304: loss=0.000, reward_mean=0.0, reward_bound=0.0
52305: loss=0.000, reward_mean=0.3, reward_bound=0.5
52306: loss=0.000, reward_mean=0.1, reward_bound=0.0
52307: loss=0.000, reward_mean=0.1, reward_bound=0.0
52308: loss=0.000, reward_mean=0.1, reward_bound=0.0
52309: loss=0.000, reward_mean=0.1, reward_bou

52447: loss=0.000, reward_mean=0.1, reward_bound=0.0
52448: loss=0.000, reward_mean=0.0, reward_bound=0.0
52449: loss=0.000, reward_mean=0.0, reward_bound=0.0
52450: loss=0.000, reward_mean=0.1, reward_bound=0.0
52451: loss=0.000, reward_mean=0.1, reward_bound=0.0
52452: loss=0.000, reward_mean=0.1, reward_bound=0.0
52453: loss=0.000, reward_mean=0.1, reward_bound=0.0
52454: loss=0.000, reward_mean=0.0, reward_bound=0.0
52455: loss=0.000, reward_mean=0.0, reward_bound=0.0
52456: loss=0.000, reward_mean=0.0, reward_bound=0.0
52457: loss=0.000, reward_mean=0.0, reward_bound=0.0
52458: loss=0.000, reward_mean=0.1, reward_bound=0.0
52459: loss=0.000, reward_mean=0.0, reward_bound=0.0
52460: loss=0.000, reward_mean=0.1, reward_bound=0.0
52461: loss=0.000, reward_mean=0.1, reward_bound=0.0
52462: loss=0.000, reward_mean=0.0, reward_bound=0.0
52463: loss=0.000, reward_mean=0.0, reward_bound=0.0
52464: loss=0.000, reward_mean=0.0, reward_bound=0.0
52465: loss=0.000, reward_mean=0.0, reward_bou

52606: loss=0.000, reward_mean=0.1, reward_bound=0.0
52607: loss=0.000, reward_mean=0.0, reward_bound=0.0
52608: loss=0.000, reward_mean=0.1, reward_bound=0.0
52609: loss=0.000, reward_mean=0.2, reward_bound=0.0
52610: loss=0.000, reward_mean=0.1, reward_bound=0.0
52611: loss=0.000, reward_mean=0.1, reward_bound=0.0
52612: loss=0.000, reward_mean=0.0, reward_bound=0.0
52613: loss=0.000, reward_mean=0.1, reward_bound=0.0
52614: loss=0.000, reward_mean=0.0, reward_bound=0.0
52615: loss=0.000, reward_mean=0.0, reward_bound=0.0
52616: loss=0.000, reward_mean=0.2, reward_bound=0.0
52617: loss=0.000, reward_mean=0.0, reward_bound=0.0
52618: loss=0.000, reward_mean=0.1, reward_bound=0.0
52619: loss=0.000, reward_mean=0.0, reward_bound=0.0
52620: loss=0.000, reward_mean=0.3, reward_bound=0.5
52621: loss=0.000, reward_mean=0.0, reward_bound=0.0
52622: loss=0.000, reward_mean=0.0, reward_bound=0.0
52623: loss=0.000, reward_mean=0.0, reward_bound=0.0
52624: loss=0.000, reward_mean=0.1, reward_bou

52764: loss=0.000, reward_mean=0.1, reward_bound=0.0
52765: loss=0.000, reward_mean=0.1, reward_bound=0.0
52766: loss=0.000, reward_mean=0.1, reward_bound=0.0
52767: loss=0.000, reward_mean=0.0, reward_bound=0.0
52768: loss=0.000, reward_mean=0.0, reward_bound=0.0
52769: loss=0.000, reward_mean=0.1, reward_bound=0.0
52770: loss=0.000, reward_mean=0.1, reward_bound=0.0
52771: loss=0.000, reward_mean=0.1, reward_bound=0.0
52772: loss=0.000, reward_mean=0.1, reward_bound=0.0
52773: loss=0.000, reward_mean=0.0, reward_bound=0.0
52774: loss=0.000, reward_mean=0.0, reward_bound=0.0
52775: loss=0.000, reward_mean=0.0, reward_bound=0.0
52776: loss=0.000, reward_mean=0.1, reward_bound=0.0
52777: loss=0.000, reward_mean=0.1, reward_bound=0.0
52778: loss=0.000, reward_mean=0.1, reward_bound=0.0
52779: loss=0.000, reward_mean=0.1, reward_bound=0.0
52780: loss=0.000, reward_mean=0.0, reward_bound=0.0
52781: loss=0.000, reward_mean=0.1, reward_bound=0.0
52782: loss=0.000, reward_mean=0.0, reward_bou

52922: loss=0.000, reward_mean=0.0, reward_bound=0.0
52923: loss=0.000, reward_mean=0.1, reward_bound=0.0
52924: loss=0.000, reward_mean=0.1, reward_bound=0.0
52925: loss=0.000, reward_mean=0.0, reward_bound=0.0
52926: loss=0.000, reward_mean=0.0, reward_bound=0.0
52927: loss=0.000, reward_mean=0.2, reward_bound=0.0
52928: loss=0.000, reward_mean=0.1, reward_bound=0.0
52929: loss=0.000, reward_mean=0.1, reward_bound=0.0
52930: loss=0.000, reward_mean=0.0, reward_bound=0.0
52931: loss=0.000, reward_mean=0.1, reward_bound=0.0
52932: loss=0.000, reward_mean=0.0, reward_bound=0.0
52933: loss=0.000, reward_mean=0.1, reward_bound=0.0
52934: loss=0.000, reward_mean=0.2, reward_bound=0.0
52935: loss=0.000, reward_mean=0.0, reward_bound=0.0
52936: loss=0.000, reward_mean=0.2, reward_bound=0.0
52937: loss=0.000, reward_mean=0.2, reward_bound=0.0
52938: loss=0.000, reward_mean=0.0, reward_bound=0.0
52939: loss=0.000, reward_mean=0.1, reward_bound=0.0
52940: loss=0.000, reward_mean=0.1, reward_bou

53081: loss=0.000, reward_mean=0.0, reward_bound=0.0
53082: loss=0.000, reward_mean=0.1, reward_bound=0.0
53083: loss=0.000, reward_mean=0.0, reward_bound=0.0
53084: loss=0.000, reward_mean=0.1, reward_bound=0.0
53085: loss=0.000, reward_mean=0.1, reward_bound=0.0
53086: loss=0.000, reward_mean=0.2, reward_bound=0.0
53087: loss=0.000, reward_mean=0.1, reward_bound=0.0
53088: loss=0.000, reward_mean=0.1, reward_bound=0.0
53089: loss=0.000, reward_mean=0.1, reward_bound=0.0
53090: loss=0.000, reward_mean=0.0, reward_bound=0.0
53091: loss=0.000, reward_mean=0.0, reward_bound=0.0
53092: loss=0.000, reward_mean=0.1, reward_bound=0.0
53093: loss=0.000, reward_mean=0.1, reward_bound=0.0
53094: loss=0.000, reward_mean=0.1, reward_bound=0.0
53095: loss=0.000, reward_mean=0.0, reward_bound=0.0
53096: loss=0.000, reward_mean=0.0, reward_bound=0.0
53097: loss=0.000, reward_mean=0.0, reward_bound=0.0
53098: loss=0.000, reward_mean=0.1, reward_bound=0.0
53099: loss=0.000, reward_mean=0.0, reward_bou

53240: loss=0.000, reward_mean=0.1, reward_bound=0.0
53241: loss=0.000, reward_mean=0.0, reward_bound=0.0
53242: loss=0.000, reward_mean=0.1, reward_bound=0.0
53243: loss=0.000, reward_mean=0.0, reward_bound=0.0
53244: loss=0.000, reward_mean=0.1, reward_bound=0.0
53245: loss=0.000, reward_mean=0.2, reward_bound=0.0
53246: loss=0.000, reward_mean=0.1, reward_bound=0.0
53247: loss=0.000, reward_mean=0.0, reward_bound=0.0
53248: loss=0.000, reward_mean=0.0, reward_bound=0.0
53249: loss=0.000, reward_mean=0.0, reward_bound=0.0
53250: loss=0.000, reward_mean=0.1, reward_bound=0.0
53251: loss=0.000, reward_mean=0.1, reward_bound=0.0
53252: loss=0.000, reward_mean=0.0, reward_bound=0.0
53253: loss=0.000, reward_mean=0.1, reward_bound=0.0
53254: loss=0.000, reward_mean=0.1, reward_bound=0.0
53255: loss=0.000, reward_mean=0.0, reward_bound=0.0
53256: loss=0.000, reward_mean=0.1, reward_bound=0.0
53257: loss=0.000, reward_mean=0.0, reward_bound=0.0
53258: loss=0.000, reward_mean=0.0, reward_bou

53400: loss=0.000, reward_mean=0.1, reward_bound=0.0
53401: loss=0.000, reward_mean=0.1, reward_bound=0.0
53402: loss=0.000, reward_mean=0.1, reward_bound=0.0
53403: loss=0.000, reward_mean=0.0, reward_bound=0.0
53404: loss=0.000, reward_mean=0.1, reward_bound=0.0
53405: loss=0.000, reward_mean=0.1, reward_bound=0.0
53406: loss=0.000, reward_mean=0.0, reward_bound=0.0
53407: loss=0.000, reward_mean=0.1, reward_bound=0.0
53408: loss=0.000, reward_mean=0.1, reward_bound=0.0
53409: loss=0.000, reward_mean=0.1, reward_bound=0.0
53410: loss=0.000, reward_mean=0.0, reward_bound=0.0
53411: loss=0.000, reward_mean=0.0, reward_bound=0.0
53412: loss=0.000, reward_mean=0.0, reward_bound=0.0
53413: loss=0.000, reward_mean=0.0, reward_bound=0.0
53414: loss=0.000, reward_mean=0.1, reward_bound=0.0
53415: loss=0.000, reward_mean=0.1, reward_bound=0.0
53416: loss=0.000, reward_mean=0.1, reward_bound=0.0
53417: loss=0.000, reward_mean=0.1, reward_bound=0.0
53418: loss=0.000, reward_mean=0.0, reward_bou

53559: loss=0.000, reward_mean=0.3, reward_bound=0.5
53560: loss=0.000, reward_mean=0.0, reward_bound=0.0
53561: loss=0.000, reward_mean=0.1, reward_bound=0.0
53562: loss=0.000, reward_mean=0.0, reward_bound=0.0
53563: loss=0.000, reward_mean=0.1, reward_bound=0.0
53564: loss=0.000, reward_mean=0.0, reward_bound=0.0
53565: loss=0.000, reward_mean=0.0, reward_bound=0.0
53566: loss=0.000, reward_mean=0.2, reward_bound=0.0
53567: loss=0.000, reward_mean=0.0, reward_bound=0.0
53568: loss=0.000, reward_mean=0.1, reward_bound=0.0
53569: loss=0.000, reward_mean=0.0, reward_bound=0.0
53570: loss=0.000, reward_mean=0.0, reward_bound=0.0
53571: loss=0.000, reward_mean=0.2, reward_bound=0.0
53572: loss=0.000, reward_mean=0.1, reward_bound=0.0
53573: loss=0.000, reward_mean=0.1, reward_bound=0.0
53574: loss=0.000, reward_mean=0.1, reward_bound=0.0
53575: loss=0.000, reward_mean=0.1, reward_bound=0.0
53576: loss=0.000, reward_mean=0.1, reward_bound=0.0
53577: loss=0.000, reward_mean=0.0, reward_bou

53719: loss=0.000, reward_mean=0.1, reward_bound=0.0
53720: loss=0.000, reward_mean=0.0, reward_bound=0.0
53721: loss=0.000, reward_mean=0.1, reward_bound=0.0
53722: loss=0.000, reward_mean=0.1, reward_bound=0.0
53723: loss=0.000, reward_mean=0.2, reward_bound=0.0
53724: loss=0.000, reward_mean=0.0, reward_bound=0.0
53725: loss=0.000, reward_mean=0.0, reward_bound=0.0
53726: loss=0.000, reward_mean=0.1, reward_bound=0.0
53727: loss=0.000, reward_mean=0.1, reward_bound=0.0
53728: loss=0.000, reward_mean=0.1, reward_bound=0.0
53729: loss=0.000, reward_mean=0.1, reward_bound=0.0
53730: loss=0.000, reward_mean=0.0, reward_bound=0.0
53731: loss=0.000, reward_mean=0.0, reward_bound=0.0
53732: loss=0.000, reward_mean=0.2, reward_bound=0.0
53733: loss=0.000, reward_mean=0.1, reward_bound=0.0
53734: loss=0.000, reward_mean=0.0, reward_bound=0.0
53735: loss=0.000, reward_mean=0.0, reward_bound=0.0
53736: loss=0.000, reward_mean=0.2, reward_bound=0.0
53737: loss=0.000, reward_mean=0.1, reward_bou

53880: loss=0.000, reward_mean=0.1, reward_bound=0.0
53881: loss=0.000, reward_mean=0.1, reward_bound=0.0
53882: loss=0.000, reward_mean=0.1, reward_bound=0.0
53883: loss=0.000, reward_mean=0.0, reward_bound=0.0
53884: loss=0.000, reward_mean=0.0, reward_bound=0.0
53885: loss=0.000, reward_mean=0.1, reward_bound=0.0
53886: loss=0.000, reward_mean=0.1, reward_bound=0.0
53887: loss=0.000, reward_mean=0.1, reward_bound=0.0
53888: loss=0.000, reward_mean=0.1, reward_bound=0.0
53889: loss=0.000, reward_mean=0.1, reward_bound=0.0
53890: loss=0.000, reward_mean=0.1, reward_bound=0.0
53891: loss=0.000, reward_mean=0.1, reward_bound=0.0
53892: loss=0.000, reward_mean=0.1, reward_bound=0.0
53893: loss=0.000, reward_mean=0.0, reward_bound=0.0
53894: loss=0.000, reward_mean=0.0, reward_bound=0.0
53895: loss=0.000, reward_mean=0.1, reward_bound=0.0
53896: loss=0.000, reward_mean=0.0, reward_bound=0.0
53897: loss=0.000, reward_mean=0.0, reward_bound=0.0
53898: loss=0.000, reward_mean=0.1, reward_bou

54035: loss=0.000, reward_mean=0.1, reward_bound=0.0
54036: loss=0.000, reward_mean=0.2, reward_bound=0.0
54037: loss=0.000, reward_mean=0.0, reward_bound=0.0
54038: loss=0.000, reward_mean=0.0, reward_bound=0.0
54039: loss=0.000, reward_mean=0.1, reward_bound=0.0
54040: loss=0.000, reward_mean=0.0, reward_bound=0.0
54041: loss=0.000, reward_mean=0.0, reward_bound=0.0
54042: loss=0.000, reward_mean=0.1, reward_bound=0.0
54043: loss=0.000, reward_mean=0.1, reward_bound=0.0
54044: loss=0.000, reward_mean=0.1, reward_bound=0.0
54045: loss=0.000, reward_mean=0.0, reward_bound=0.0
54046: loss=0.000, reward_mean=0.1, reward_bound=0.0
54047: loss=0.000, reward_mean=0.1, reward_bound=0.0
54048: loss=0.000, reward_mean=0.0, reward_bound=0.0
54049: loss=0.000, reward_mean=0.1, reward_bound=0.0
54050: loss=0.000, reward_mean=0.0, reward_bound=0.0
54051: loss=0.000, reward_mean=0.0, reward_bound=0.0
54052: loss=0.000, reward_mean=0.0, reward_bound=0.0
54053: loss=0.000, reward_mean=0.1, reward_bou

54194: loss=0.000, reward_mean=0.0, reward_bound=0.0
54195: loss=0.000, reward_mean=0.0, reward_bound=0.0
54196: loss=0.000, reward_mean=0.0, reward_bound=0.0
54197: loss=0.000, reward_mean=0.0, reward_bound=0.0
54198: loss=0.000, reward_mean=0.0, reward_bound=0.0
54199: loss=0.000, reward_mean=0.1, reward_bound=0.0
54200: loss=0.000, reward_mean=0.0, reward_bound=0.0
54201: loss=0.000, reward_mean=0.1, reward_bound=0.0
54202: loss=0.000, reward_mean=0.1, reward_bound=0.0
54203: loss=0.000, reward_mean=0.1, reward_bound=0.0
54204: loss=0.000, reward_mean=0.1, reward_bound=0.0
54205: loss=0.000, reward_mean=0.0, reward_bound=0.0
54206: loss=0.000, reward_mean=0.2, reward_bound=0.0
54207: loss=0.000, reward_mean=0.1, reward_bound=0.0
54208: loss=0.000, reward_mean=0.1, reward_bound=0.0
54209: loss=0.000, reward_mean=0.0, reward_bound=0.0
54210: loss=0.000, reward_mean=0.1, reward_bound=0.0
54211: loss=0.000, reward_mean=0.1, reward_bound=0.0
54212: loss=0.000, reward_mean=0.0, reward_bou

54349: loss=0.000, reward_mean=0.0, reward_bound=0.0
54350: loss=0.000, reward_mean=0.0, reward_bound=0.0
54351: loss=0.000, reward_mean=0.1, reward_bound=0.0
54352: loss=0.000, reward_mean=0.0, reward_bound=0.0
54353: loss=0.000, reward_mean=0.0, reward_bound=0.0
54354: loss=0.000, reward_mean=0.1, reward_bound=0.0
54355: loss=0.000, reward_mean=0.1, reward_bound=0.0
54356: loss=0.000, reward_mean=0.0, reward_bound=0.0
54357: loss=0.000, reward_mean=0.1, reward_bound=0.0
54358: loss=0.000, reward_mean=0.1, reward_bound=0.0
54359: loss=0.000, reward_mean=0.1, reward_bound=0.0
54360: loss=0.000, reward_mean=0.1, reward_bound=0.0
54361: loss=0.000, reward_mean=0.0, reward_bound=0.0
54362: loss=0.000, reward_mean=0.0, reward_bound=0.0
54363: loss=0.000, reward_mean=0.1, reward_bound=0.0
54364: loss=0.000, reward_mean=0.0, reward_bound=0.0
54365: loss=0.000, reward_mean=0.0, reward_bound=0.0
54366: loss=0.000, reward_mean=0.1, reward_bound=0.0
54367: loss=0.000, reward_mean=0.1, reward_bou

54507: loss=0.000, reward_mean=0.1, reward_bound=0.0
54508: loss=0.000, reward_mean=0.0, reward_bound=0.0
54509: loss=0.000, reward_mean=0.1, reward_bound=0.0
54510: loss=0.000, reward_mean=0.1, reward_bound=0.0
54511: loss=0.000, reward_mean=0.1, reward_bound=0.0
54512: loss=0.000, reward_mean=0.0, reward_bound=0.0
54513: loss=0.000, reward_mean=0.1, reward_bound=0.0
54514: loss=0.000, reward_mean=0.0, reward_bound=0.0
54515: loss=0.000, reward_mean=0.0, reward_bound=0.0
54516: loss=0.000, reward_mean=0.1, reward_bound=0.0
54517: loss=0.000, reward_mean=0.1, reward_bound=0.0
54518: loss=0.000, reward_mean=0.1, reward_bound=0.0
54519: loss=0.000, reward_mean=0.1, reward_bound=0.0
54520: loss=0.000, reward_mean=0.1, reward_bound=0.0
54521: loss=0.000, reward_mean=0.1, reward_bound=0.0
54522: loss=0.000, reward_mean=0.1, reward_bound=0.0
54523: loss=0.000, reward_mean=0.1, reward_bound=0.0
54524: loss=0.000, reward_mean=0.1, reward_bound=0.0
54525: loss=0.000, reward_mean=0.1, reward_bou

54662: loss=0.000, reward_mean=0.0, reward_bound=0.0
54663: loss=0.000, reward_mean=0.0, reward_bound=0.0
54664: loss=0.000, reward_mean=0.1, reward_bound=0.0
54665: loss=0.000, reward_mean=0.1, reward_bound=0.0
54666: loss=0.000, reward_mean=0.0, reward_bound=0.0
54667: loss=0.000, reward_mean=0.1, reward_bound=0.0
54668: loss=0.000, reward_mean=0.1, reward_bound=0.0
54669: loss=0.000, reward_mean=0.1, reward_bound=0.0
54670: loss=0.000, reward_mean=0.1, reward_bound=0.0
54671: loss=0.000, reward_mean=0.0, reward_bound=0.0
54672: loss=0.000, reward_mean=0.1, reward_bound=0.0
54673: loss=0.000, reward_mean=0.0, reward_bound=0.0
54674: loss=0.000, reward_mean=0.1, reward_bound=0.0
54675: loss=0.000, reward_mean=0.0, reward_bound=0.0
54676: loss=0.000, reward_mean=0.1, reward_bound=0.0
54677: loss=0.000, reward_mean=0.0, reward_bound=0.0
54678: loss=0.000, reward_mean=0.1, reward_bound=0.0
54679: loss=0.000, reward_mean=0.0, reward_bound=0.0
54680: loss=0.000, reward_mean=0.0, reward_bou

54823: loss=0.000, reward_mean=0.0, reward_bound=0.0
54824: loss=0.000, reward_mean=0.1, reward_bound=0.0
54825: loss=0.000, reward_mean=0.1, reward_bound=0.0
54826: loss=0.000, reward_mean=0.1, reward_bound=0.0
54827: loss=0.000, reward_mean=0.1, reward_bound=0.0
54828: loss=0.000, reward_mean=0.1, reward_bound=0.0
54829: loss=0.000, reward_mean=0.1, reward_bound=0.0
54830: loss=0.000, reward_mean=0.1, reward_bound=0.0
54831: loss=0.000, reward_mean=0.0, reward_bound=0.0
54832: loss=0.000, reward_mean=0.1, reward_bound=0.0
54833: loss=0.000, reward_mean=0.0, reward_bound=0.0
54834: loss=0.000, reward_mean=0.0, reward_bound=0.0
54835: loss=0.000, reward_mean=0.0, reward_bound=0.0
54836: loss=0.000, reward_mean=0.1, reward_bound=0.0
54837: loss=0.000, reward_mean=0.0, reward_bound=0.0
54838: loss=0.000, reward_mean=0.0, reward_bound=0.0
54839: loss=0.000, reward_mean=0.1, reward_bound=0.0
54840: loss=0.000, reward_mean=0.1, reward_bound=0.0
54841: loss=0.000, reward_mean=0.1, reward_bou

54980: loss=0.000, reward_mean=0.1, reward_bound=0.0
54981: loss=0.000, reward_mean=0.0, reward_bound=0.0
54982: loss=0.000, reward_mean=0.1, reward_bound=0.0
54983: loss=0.000, reward_mean=0.0, reward_bound=0.0
54984: loss=0.000, reward_mean=0.2, reward_bound=0.0
54985: loss=0.000, reward_mean=0.2, reward_bound=0.0
54986: loss=0.000, reward_mean=0.0, reward_bound=0.0
54987: loss=0.000, reward_mean=0.1, reward_bound=0.0
54988: loss=0.000, reward_mean=0.0, reward_bound=0.0
54989: loss=0.000, reward_mean=0.0, reward_bound=0.0
54990: loss=0.000, reward_mean=0.0, reward_bound=0.0
54991: loss=0.000, reward_mean=0.0, reward_bound=0.0
54992: loss=0.000, reward_mean=0.1, reward_bound=0.0
54993: loss=0.000, reward_mean=0.0, reward_bound=0.0
54994: loss=0.000, reward_mean=0.1, reward_bound=0.0
54995: loss=0.000, reward_mean=0.1, reward_bound=0.0
54996: loss=0.000, reward_mean=0.1, reward_bound=0.0
54997: loss=0.000, reward_mean=0.1, reward_bound=0.0
54998: loss=0.000, reward_mean=0.0, reward_bou

55142: loss=0.000, reward_mean=0.1, reward_bound=0.0
55143: loss=0.000, reward_mean=0.0, reward_bound=0.0
55144: loss=0.000, reward_mean=0.0, reward_bound=0.0
55145: loss=0.000, reward_mean=0.0, reward_bound=0.0
55146: loss=0.000, reward_mean=0.1, reward_bound=0.0
55147: loss=0.000, reward_mean=0.0, reward_bound=0.0
55148: loss=0.000, reward_mean=0.0, reward_bound=0.0
55149: loss=0.000, reward_mean=0.2, reward_bound=0.0
55150: loss=0.000, reward_mean=0.1, reward_bound=0.0
55151: loss=0.000, reward_mean=0.1, reward_bound=0.0
55152: loss=0.000, reward_mean=0.1, reward_bound=0.0
55153: loss=0.000, reward_mean=0.1, reward_bound=0.0
55154: loss=0.000, reward_mean=0.0, reward_bound=0.0
55155: loss=0.000, reward_mean=0.1, reward_bound=0.0
55156: loss=0.000, reward_mean=0.0, reward_bound=0.0
55157: loss=0.000, reward_mean=0.0, reward_bound=0.0
55158: loss=0.000, reward_mean=0.1, reward_bound=0.0
55159: loss=0.000, reward_mean=0.1, reward_bound=0.0
55160: loss=0.000, reward_mean=0.1, reward_bou

55303: loss=0.000, reward_mean=0.1, reward_bound=0.0
55304: loss=0.000, reward_mean=0.1, reward_bound=0.0
55305: loss=0.000, reward_mean=0.0, reward_bound=0.0
55306: loss=0.000, reward_mean=0.0, reward_bound=0.0
55307: loss=0.000, reward_mean=0.1, reward_bound=0.0
55308: loss=0.000, reward_mean=0.1, reward_bound=0.0
55309: loss=0.000, reward_mean=0.0, reward_bound=0.0
55310: loss=0.000, reward_mean=0.0, reward_bound=0.0
55311: loss=0.000, reward_mean=0.1, reward_bound=0.0
55312: loss=0.000, reward_mean=0.1, reward_bound=0.0
55313: loss=0.000, reward_mean=0.1, reward_bound=0.0
55314: loss=0.000, reward_mean=0.0, reward_bound=0.0
55315: loss=0.000, reward_mean=0.1, reward_bound=0.0
55316: loss=0.000, reward_mean=0.1, reward_bound=0.0
55317: loss=0.000, reward_mean=0.1, reward_bound=0.0
55318: loss=0.000, reward_mean=0.1, reward_bound=0.0
55319: loss=0.000, reward_mean=0.1, reward_bound=0.0
55320: loss=0.000, reward_mean=0.1, reward_bound=0.0
55321: loss=0.000, reward_mean=0.0, reward_bou

55464: loss=0.000, reward_mean=0.0, reward_bound=0.0
55465: loss=0.000, reward_mean=0.0, reward_bound=0.0
55466: loss=0.000, reward_mean=0.1, reward_bound=0.0
55467: loss=0.000, reward_mean=0.1, reward_bound=0.0
55468: loss=0.000, reward_mean=0.1, reward_bound=0.0
55469: loss=0.000, reward_mean=0.1, reward_bound=0.0
55470: loss=0.000, reward_mean=0.0, reward_bound=0.0
55471: loss=0.000, reward_mean=0.2, reward_bound=0.0
55472: loss=0.000, reward_mean=0.1, reward_bound=0.0
55473: loss=0.000, reward_mean=0.0, reward_bound=0.0
55474: loss=0.000, reward_mean=0.0, reward_bound=0.0
55475: loss=0.000, reward_mean=0.1, reward_bound=0.0
55476: loss=0.000, reward_mean=0.1, reward_bound=0.0
55477: loss=0.000, reward_mean=0.1, reward_bound=0.0
55478: loss=0.000, reward_mean=0.0, reward_bound=0.0
55479: loss=0.000, reward_mean=0.0, reward_bound=0.0
55480: loss=0.000, reward_mean=0.1, reward_bound=0.0
55481: loss=0.000, reward_mean=0.1, reward_bound=0.0
55482: loss=0.000, reward_mean=0.0, reward_bou

55620: loss=0.000, reward_mean=0.3, reward_bound=0.5
55621: loss=0.000, reward_mean=0.0, reward_bound=0.0
55622: loss=0.000, reward_mean=0.1, reward_bound=0.0
55623: loss=0.000, reward_mean=0.0, reward_bound=0.0
55624: loss=0.000, reward_mean=0.1, reward_bound=0.0
55625: loss=0.000, reward_mean=0.0, reward_bound=0.0
55626: loss=0.000, reward_mean=0.0, reward_bound=0.0
55627: loss=0.000, reward_mean=0.0, reward_bound=0.0
55628: loss=0.000, reward_mean=0.1, reward_bound=0.0
55629: loss=0.000, reward_mean=0.0, reward_bound=0.0
55630: loss=0.000, reward_mean=0.1, reward_bound=0.0
55631: loss=0.000, reward_mean=0.2, reward_bound=0.0
55632: loss=0.000, reward_mean=0.0, reward_bound=0.0
55633: loss=0.000, reward_mean=0.0, reward_bound=0.0
55634: loss=0.000, reward_mean=0.1, reward_bound=0.0
55635: loss=0.000, reward_mean=0.0, reward_bound=0.0
55636: loss=0.000, reward_mean=0.0, reward_bound=0.0
55637: loss=0.000, reward_mean=0.1, reward_bound=0.0
55638: loss=0.000, reward_mean=0.1, reward_bou

55778: loss=0.000, reward_mean=0.0, reward_bound=0.0
55779: loss=0.000, reward_mean=0.1, reward_bound=0.0
55780: loss=0.000, reward_mean=0.1, reward_bound=0.0
55781: loss=0.000, reward_mean=0.1, reward_bound=0.0
55782: loss=0.000, reward_mean=0.0, reward_bound=0.0
55783: loss=0.000, reward_mean=0.0, reward_bound=0.0
55784: loss=0.000, reward_mean=0.1, reward_bound=0.0
55785: loss=0.000, reward_mean=0.0, reward_bound=0.0
55786: loss=0.000, reward_mean=0.1, reward_bound=0.0
55787: loss=0.000, reward_mean=0.1, reward_bound=0.0
55788: loss=0.000, reward_mean=0.1, reward_bound=0.0
55789: loss=0.000, reward_mean=0.2, reward_bound=0.0
55790: loss=0.000, reward_mean=0.1, reward_bound=0.0
55791: loss=0.000, reward_mean=0.0, reward_bound=0.0
55792: loss=0.000, reward_mean=0.0, reward_bound=0.0
55793: loss=0.000, reward_mean=0.1, reward_bound=0.0
55794: loss=0.000, reward_mean=0.1, reward_bound=0.0
55795: loss=0.000, reward_mean=0.0, reward_bound=0.0
55796: loss=0.000, reward_mean=0.0, reward_bou

55933: loss=0.000, reward_mean=0.1, reward_bound=0.0
55934: loss=0.000, reward_mean=0.0, reward_bound=0.0
55935: loss=0.000, reward_mean=0.0, reward_bound=0.0
55936: loss=0.000, reward_mean=0.0, reward_bound=0.0
55937: loss=0.000, reward_mean=0.0, reward_bound=0.0
55938: loss=0.000, reward_mean=0.0, reward_bound=0.0
55939: loss=0.000, reward_mean=0.0, reward_bound=0.0
55940: loss=0.000, reward_mean=0.1, reward_bound=0.0
55941: loss=0.000, reward_mean=0.1, reward_bound=0.0
55942: loss=0.000, reward_mean=0.0, reward_bound=0.0
55943: loss=0.000, reward_mean=0.1, reward_bound=0.0
55944: loss=0.000, reward_mean=0.0, reward_bound=0.0
55945: loss=0.000, reward_mean=0.1, reward_bound=0.0
55946: loss=0.000, reward_mean=0.1, reward_bound=0.0
55947: loss=0.000, reward_mean=0.1, reward_bound=0.0
55948: loss=0.000, reward_mean=0.1, reward_bound=0.0
55949: loss=0.000, reward_mean=0.1, reward_bound=0.0
55950: loss=0.000, reward_mean=0.1, reward_bound=0.0
55951: loss=0.000, reward_mean=0.1, reward_bou

56093: loss=0.000, reward_mean=0.1, reward_bound=0.0
56094: loss=0.000, reward_mean=0.1, reward_bound=0.0
56095: loss=0.000, reward_mean=0.0, reward_bound=0.0
56096: loss=0.000, reward_mean=0.1, reward_bound=0.0
56097: loss=0.000, reward_mean=0.0, reward_bound=0.0
56098: loss=0.000, reward_mean=0.1, reward_bound=0.0
56099: loss=0.000, reward_mean=0.0, reward_bound=0.0
56100: loss=0.000, reward_mean=0.0, reward_bound=0.0
56101: loss=0.000, reward_mean=0.1, reward_bound=0.0
56102: loss=0.000, reward_mean=0.0, reward_bound=0.0
56103: loss=0.000, reward_mean=0.0, reward_bound=0.0
56104: loss=0.000, reward_mean=0.0, reward_bound=0.0
56105: loss=0.000, reward_mean=0.0, reward_bound=0.0
56106: loss=0.000, reward_mean=0.1, reward_bound=0.0
56107: loss=0.000, reward_mean=0.1, reward_bound=0.0
56108: loss=0.000, reward_mean=0.1, reward_bound=0.0
56109: loss=0.000, reward_mean=0.0, reward_bound=0.0
56110: loss=0.000, reward_mean=0.0, reward_bound=0.0
56111: loss=0.000, reward_mean=0.1, reward_bou

56250: loss=0.000, reward_mean=0.1, reward_bound=0.0
56251: loss=0.000, reward_mean=0.0, reward_bound=0.0
56252: loss=0.000, reward_mean=0.0, reward_bound=0.0
56253: loss=0.000, reward_mean=0.0, reward_bound=0.0
56254: loss=0.000, reward_mean=0.0, reward_bound=0.0
56255: loss=0.000, reward_mean=0.1, reward_bound=0.0
56256: loss=0.000, reward_mean=0.1, reward_bound=0.0
56257: loss=0.000, reward_mean=0.1, reward_bound=0.0
56258: loss=0.000, reward_mean=0.1, reward_bound=0.0
56259: loss=0.000, reward_mean=0.0, reward_bound=0.0
56260: loss=0.000, reward_mean=0.1, reward_bound=0.0
56261: loss=0.000, reward_mean=0.0, reward_bound=0.0
56262: loss=0.000, reward_mean=0.0, reward_bound=0.0
56263: loss=0.000, reward_mean=0.2, reward_bound=0.0
56264: loss=0.000, reward_mean=0.0, reward_bound=0.0
56265: loss=0.000, reward_mean=0.1, reward_bound=0.0
56266: loss=0.000, reward_mean=0.1, reward_bound=0.0
56267: loss=0.000, reward_mean=0.0, reward_bound=0.0
56268: loss=0.000, reward_mean=0.1, reward_bou

56412: loss=0.000, reward_mean=0.0, reward_bound=0.0
56413: loss=0.000, reward_mean=0.0, reward_bound=0.0
56414: loss=0.000, reward_mean=0.2, reward_bound=0.0
56415: loss=0.000, reward_mean=0.2, reward_bound=0.0
56416: loss=0.000, reward_mean=0.0, reward_bound=0.0
56417: loss=0.000, reward_mean=0.1, reward_bound=0.0
56418: loss=0.000, reward_mean=0.1, reward_bound=0.0
56419: loss=0.000, reward_mean=0.1, reward_bound=0.0
56420: loss=0.000, reward_mean=0.0, reward_bound=0.0
56421: loss=0.000, reward_mean=0.1, reward_bound=0.0
56422: loss=0.000, reward_mean=0.1, reward_bound=0.0
56423: loss=0.000, reward_mean=0.0, reward_bound=0.0
56424: loss=0.000, reward_mean=0.0, reward_bound=0.0
56425: loss=0.000, reward_mean=0.1, reward_bound=0.0
56426: loss=0.000, reward_mean=0.1, reward_bound=0.0
56427: loss=0.000, reward_mean=0.1, reward_bound=0.0
56428: loss=0.000, reward_mean=0.0, reward_bound=0.0
56429: loss=0.000, reward_mean=0.1, reward_bound=0.0
56430: loss=0.000, reward_mean=0.1, reward_bou

56571: loss=0.000, reward_mean=0.1, reward_bound=0.0
56572: loss=0.000, reward_mean=0.0, reward_bound=0.0
56573: loss=0.000, reward_mean=0.0, reward_bound=0.0
56574: loss=0.000, reward_mean=0.0, reward_bound=0.0
56575: loss=0.000, reward_mean=0.1, reward_bound=0.0
56576: loss=0.000, reward_mean=0.0, reward_bound=0.0
56577: loss=0.000, reward_mean=0.1, reward_bound=0.0
56578: loss=0.000, reward_mean=0.1, reward_bound=0.0
56579: loss=0.000, reward_mean=0.1, reward_bound=0.0
56580: loss=0.000, reward_mean=0.0, reward_bound=0.0
56581: loss=0.000, reward_mean=0.0, reward_bound=0.0
56582: loss=0.000, reward_mean=0.1, reward_bound=0.0
56583: loss=0.000, reward_mean=0.1, reward_bound=0.0
56584: loss=0.000, reward_mean=0.1, reward_bound=0.0
56585: loss=0.000, reward_mean=0.0, reward_bound=0.0
56586: loss=0.000, reward_mean=0.0, reward_bound=0.0
56587: loss=0.000, reward_mean=0.1, reward_bound=0.0
56588: loss=0.000, reward_mean=0.1, reward_bound=0.0
56589: loss=0.000, reward_mean=0.1, reward_bou

56727: loss=0.000, reward_mean=0.0, reward_bound=0.0
56728: loss=0.000, reward_mean=0.1, reward_bound=0.0
56729: loss=0.000, reward_mean=0.1, reward_bound=0.0
56730: loss=0.000, reward_mean=0.0, reward_bound=0.0
56731: loss=0.000, reward_mean=0.1, reward_bound=0.0
56732: loss=0.000, reward_mean=0.1, reward_bound=0.0
56733: loss=0.000, reward_mean=0.1, reward_bound=0.0
56734: loss=0.000, reward_mean=0.0, reward_bound=0.0
56735: loss=0.000, reward_mean=0.1, reward_bound=0.0
56736: loss=0.000, reward_mean=0.1, reward_bound=0.0
56737: loss=0.000, reward_mean=0.1, reward_bound=0.0
56738: loss=0.000, reward_mean=0.1, reward_bound=0.0
56739: loss=0.000, reward_mean=0.1, reward_bound=0.0
56740: loss=0.000, reward_mean=0.0, reward_bound=0.0
56741: loss=0.000, reward_mean=0.1, reward_bound=0.0
56742: loss=0.000, reward_mean=0.1, reward_bound=0.0
56743: loss=0.000, reward_mean=0.1, reward_bound=0.0
56744: loss=0.000, reward_mean=0.1, reward_bound=0.0
56745: loss=0.000, reward_mean=0.0, reward_bou

56888: loss=0.000, reward_mean=0.0, reward_bound=0.0
56889: loss=0.000, reward_mean=0.1, reward_bound=0.0
56890: loss=0.000, reward_mean=0.0, reward_bound=0.0
56891: loss=0.000, reward_mean=0.0, reward_bound=0.0
56892: loss=0.000, reward_mean=0.1, reward_bound=0.0
56893: loss=0.000, reward_mean=0.1, reward_bound=0.0
56894: loss=0.000, reward_mean=0.1, reward_bound=0.0
56895: loss=0.000, reward_mean=0.1, reward_bound=0.0
56896: loss=0.000, reward_mean=0.0, reward_bound=0.0
56897: loss=0.000, reward_mean=0.2, reward_bound=0.0
56898: loss=0.000, reward_mean=0.1, reward_bound=0.0
56899: loss=0.000, reward_mean=0.0, reward_bound=0.0
56900: loss=0.000, reward_mean=0.0, reward_bound=0.0
56901: loss=0.000, reward_mean=0.0, reward_bound=0.0
56902: loss=0.000, reward_mean=0.1, reward_bound=0.0
56903: loss=0.000, reward_mean=0.0, reward_bound=0.0
56904: loss=0.000, reward_mean=0.2, reward_bound=0.0
56905: loss=0.000, reward_mean=0.1, reward_bound=0.0
56906: loss=0.000, reward_mean=0.0, reward_bou

57044: loss=0.000, reward_mean=0.0, reward_bound=0.0
57045: loss=0.000, reward_mean=0.0, reward_bound=0.0
57046: loss=0.000, reward_mean=0.0, reward_bound=0.0
57047: loss=0.000, reward_mean=0.0, reward_bound=0.0
57048: loss=0.000, reward_mean=0.0, reward_bound=0.0
57049: loss=0.000, reward_mean=0.0, reward_bound=0.0
57050: loss=0.000, reward_mean=0.0, reward_bound=0.0
57051: loss=0.000, reward_mean=0.2, reward_bound=0.0
57052: loss=0.000, reward_mean=0.1, reward_bound=0.0
57053: loss=0.000, reward_mean=0.1, reward_bound=0.0
57054: loss=0.000, reward_mean=0.0, reward_bound=0.0
57055: loss=0.000, reward_mean=0.0, reward_bound=0.0
57056: loss=0.000, reward_mean=0.1, reward_bound=0.0
57057: loss=0.000, reward_mean=0.2, reward_bound=0.0
57058: loss=0.000, reward_mean=0.1, reward_bound=0.0
57059: loss=0.000, reward_mean=0.0, reward_bound=0.0
57060: loss=0.000, reward_mean=0.1, reward_bound=0.0
57061: loss=0.000, reward_mean=0.0, reward_bound=0.0
57062: loss=0.000, reward_mean=0.0, reward_bou

57202: loss=0.000, reward_mean=0.1, reward_bound=0.0
57203: loss=0.000, reward_mean=0.1, reward_bound=0.0
57204: loss=0.000, reward_mean=0.1, reward_bound=0.0
57205: loss=0.000, reward_mean=0.1, reward_bound=0.0
57206: loss=0.000, reward_mean=0.0, reward_bound=0.0
57207: loss=0.000, reward_mean=0.1, reward_bound=0.0
57208: loss=0.000, reward_mean=0.1, reward_bound=0.0
57209: loss=0.000, reward_mean=0.0, reward_bound=0.0
57210: loss=0.000, reward_mean=0.1, reward_bound=0.0
57211: loss=0.000, reward_mean=0.0, reward_bound=0.0
57212: loss=0.000, reward_mean=0.1, reward_bound=0.0
57213: loss=0.000, reward_mean=0.1, reward_bound=0.0
57214: loss=0.000, reward_mean=0.0, reward_bound=0.0
57215: loss=0.000, reward_mean=0.0, reward_bound=0.0
57216: loss=0.000, reward_mean=0.0, reward_bound=0.0
57217: loss=0.000, reward_mean=0.1, reward_bound=0.0
57218: loss=0.000, reward_mean=0.2, reward_bound=0.0
57219: loss=0.000, reward_mean=0.1, reward_bound=0.0
57220: loss=0.000, reward_mean=0.1, reward_bou

57359: loss=0.000, reward_mean=0.1, reward_bound=0.0
57360: loss=0.000, reward_mean=0.1, reward_bound=0.0
57361: loss=0.000, reward_mean=0.1, reward_bound=0.0
57362: loss=0.000, reward_mean=0.1, reward_bound=0.0
57363: loss=0.000, reward_mean=0.1, reward_bound=0.0
57364: loss=0.000, reward_mean=0.0, reward_bound=0.0
57365: loss=0.000, reward_mean=0.1, reward_bound=0.0
57366: loss=0.000, reward_mean=0.1, reward_bound=0.0
57367: loss=0.000, reward_mean=0.1, reward_bound=0.0
57368: loss=0.000, reward_mean=0.1, reward_bound=0.0
57369: loss=0.000, reward_mean=0.0, reward_bound=0.0
57370: loss=0.000, reward_mean=0.1, reward_bound=0.0
57371: loss=0.000, reward_mean=0.1, reward_bound=0.0
57372: loss=0.000, reward_mean=0.1, reward_bound=0.0
57373: loss=0.000, reward_mean=0.0, reward_bound=0.0
57374: loss=0.000, reward_mean=0.0, reward_bound=0.0
57375: loss=0.000, reward_mean=0.1, reward_bound=0.0
57376: loss=0.000, reward_mean=0.1, reward_bound=0.0
57377: loss=0.000, reward_mean=0.0, reward_bou

57516: loss=0.000, reward_mean=0.1, reward_bound=0.0
57517: loss=0.000, reward_mean=0.1, reward_bound=0.0
57518: loss=0.000, reward_mean=0.0, reward_bound=0.0
57519: loss=0.000, reward_mean=0.1, reward_bound=0.0
57520: loss=0.000, reward_mean=0.0, reward_bound=0.0
57521: loss=0.000, reward_mean=0.0, reward_bound=0.0
57522: loss=0.000, reward_mean=0.2, reward_bound=0.0
57523: loss=0.000, reward_mean=0.1, reward_bound=0.0
57524: loss=0.000, reward_mean=0.1, reward_bound=0.0
57525: loss=0.000, reward_mean=0.0, reward_bound=0.0
57526: loss=0.000, reward_mean=0.0, reward_bound=0.0
57527: loss=0.000, reward_mean=0.0, reward_bound=0.0
57528: loss=0.000, reward_mean=0.2, reward_bound=0.0
57529: loss=0.000, reward_mean=0.0, reward_bound=0.0
57530: loss=0.000, reward_mean=0.1, reward_bound=0.0
57531: loss=0.000, reward_mean=0.1, reward_bound=0.0
57532: loss=0.000, reward_mean=0.1, reward_bound=0.0
57533: loss=0.000, reward_mean=0.1, reward_bound=0.0
57534: loss=0.000, reward_mean=0.0, reward_bou

57677: loss=0.000, reward_mean=0.1, reward_bound=0.0
57678: loss=0.000, reward_mean=0.0, reward_bound=0.0
57679: loss=0.000, reward_mean=0.0, reward_bound=0.0
57680: loss=0.000, reward_mean=0.0, reward_bound=0.0
57681: loss=0.000, reward_mean=0.2, reward_bound=0.0
57682: loss=0.000, reward_mean=0.1, reward_bound=0.0
57683: loss=0.000, reward_mean=0.1, reward_bound=0.0
57684: loss=0.000, reward_mean=0.2, reward_bound=0.0
57685: loss=0.000, reward_mean=0.0, reward_bound=0.0
57686: loss=0.000, reward_mean=0.0, reward_bound=0.0
57687: loss=0.000, reward_mean=0.2, reward_bound=0.0
57688: loss=0.000, reward_mean=0.1, reward_bound=0.0
57689: loss=0.000, reward_mean=0.1, reward_bound=0.0
57690: loss=0.000, reward_mean=0.0, reward_bound=0.0
57691: loss=0.000, reward_mean=0.2, reward_bound=0.0
57692: loss=0.000, reward_mean=0.1, reward_bound=0.0
57693: loss=0.000, reward_mean=0.0, reward_bound=0.0
57694: loss=0.000, reward_mean=0.0, reward_bound=0.0
57695: loss=0.000, reward_mean=0.0, reward_bou

57834: loss=0.000, reward_mean=0.1, reward_bound=0.0
57835: loss=0.000, reward_mean=0.1, reward_bound=0.0
57836: loss=0.000, reward_mean=0.0, reward_bound=0.0
57837: loss=0.000, reward_mean=0.1, reward_bound=0.0
57838: loss=0.000, reward_mean=0.1, reward_bound=0.0
57839: loss=0.000, reward_mean=0.2, reward_bound=0.0
57840: loss=0.000, reward_mean=0.0, reward_bound=0.0
57841: loss=0.000, reward_mean=0.0, reward_bound=0.0
57842: loss=0.000, reward_mean=0.1, reward_bound=0.0
57843: loss=0.000, reward_mean=0.0, reward_bound=0.0
57844: loss=0.000, reward_mean=0.1, reward_bound=0.0
57845: loss=0.000, reward_mean=0.1, reward_bound=0.0
57846: loss=0.000, reward_mean=0.1, reward_bound=0.0
57847: loss=0.000, reward_mean=0.1, reward_bound=0.0
57848: loss=0.000, reward_mean=0.1, reward_bound=0.0
57849: loss=0.000, reward_mean=0.1, reward_bound=0.0
57850: loss=0.000, reward_mean=0.1, reward_bound=0.0
57851: loss=0.000, reward_mean=0.1, reward_bound=0.0
57852: loss=0.000, reward_mean=0.1, reward_bou

57989: loss=0.000, reward_mean=0.1, reward_bound=0.0
57990: loss=0.000, reward_mean=0.0, reward_bound=0.0
57991: loss=0.000, reward_mean=0.1, reward_bound=0.0
57992: loss=0.000, reward_mean=0.1, reward_bound=0.0
57993: loss=0.000, reward_mean=0.0, reward_bound=0.0
57994: loss=0.000, reward_mean=0.1, reward_bound=0.0
57995: loss=0.000, reward_mean=0.0, reward_bound=0.0
57996: loss=0.000, reward_mean=0.0, reward_bound=0.0
57997: loss=0.000, reward_mean=0.1, reward_bound=0.0
57998: loss=0.000, reward_mean=0.1, reward_bound=0.0
57999: loss=0.000, reward_mean=0.1, reward_bound=0.0
58000: loss=0.000, reward_mean=0.2, reward_bound=0.0
58001: loss=0.000, reward_mean=0.1, reward_bound=0.0
58002: loss=0.000, reward_mean=0.1, reward_bound=0.0
58003: loss=0.000, reward_mean=0.1, reward_bound=0.0
58004: loss=0.000, reward_mean=0.1, reward_bound=0.0
58005: loss=0.000, reward_mean=0.0, reward_bound=0.0
58006: loss=0.000, reward_mean=0.1, reward_bound=0.0
58007: loss=0.000, reward_mean=0.0, reward_bou

58148: loss=0.000, reward_mean=0.0, reward_bound=0.0
58149: loss=0.000, reward_mean=0.0, reward_bound=0.0
58150: loss=0.000, reward_mean=0.0, reward_bound=0.0
58151: loss=0.000, reward_mean=0.1, reward_bound=0.0
58152: loss=0.000, reward_mean=0.1, reward_bound=0.0
58153: loss=0.000, reward_mean=0.1, reward_bound=0.0
58154: loss=0.000, reward_mean=0.0, reward_bound=0.0
58155: loss=0.000, reward_mean=0.0, reward_bound=0.0
58156: loss=0.000, reward_mean=0.1, reward_bound=0.0
58157: loss=0.000, reward_mean=0.1, reward_bound=0.0
58158: loss=0.000, reward_mean=0.1, reward_bound=0.0
58159: loss=0.000, reward_mean=0.1, reward_bound=0.0
58160: loss=0.000, reward_mean=0.0, reward_bound=0.0
58161: loss=0.000, reward_mean=0.0, reward_bound=0.0
58162: loss=0.000, reward_mean=0.0, reward_bound=0.0
58163: loss=0.000, reward_mean=0.0, reward_bound=0.0
58164: loss=0.000, reward_mean=0.1, reward_bound=0.0
58165: loss=0.000, reward_mean=0.1, reward_bound=0.0
58166: loss=0.000, reward_mean=0.0, reward_bou

58305: loss=0.000, reward_mean=0.0, reward_bound=0.0
58306: loss=0.000, reward_mean=0.1, reward_bound=0.0
58307: loss=0.000, reward_mean=0.1, reward_bound=0.0
58308: loss=0.000, reward_mean=0.1, reward_bound=0.0
58309: loss=0.000, reward_mean=0.1, reward_bound=0.0
58310: loss=0.000, reward_mean=0.1, reward_bound=0.0
58311: loss=0.000, reward_mean=0.2, reward_bound=0.0
58312: loss=0.000, reward_mean=0.0, reward_bound=0.0
58313: loss=0.000, reward_mean=0.1, reward_bound=0.0
58314: loss=0.000, reward_mean=0.1, reward_bound=0.0
58315: loss=0.000, reward_mean=0.1, reward_bound=0.0
58316: loss=0.000, reward_mean=0.0, reward_bound=0.0
58317: loss=0.000, reward_mean=0.1, reward_bound=0.0
58318: loss=0.000, reward_mean=0.1, reward_bound=0.0
58319: loss=0.000, reward_mean=0.1, reward_bound=0.0
58320: loss=0.000, reward_mean=0.0, reward_bound=0.0
58321: loss=0.000, reward_mean=0.0, reward_bound=0.0
58322: loss=0.000, reward_mean=0.1, reward_bound=0.0
58323: loss=0.000, reward_mean=0.0, reward_bou

58460: loss=0.000, reward_mean=0.0, reward_bound=0.0
58461: loss=0.000, reward_mean=0.0, reward_bound=0.0
58462: loss=0.000, reward_mean=0.1, reward_bound=0.0
58463: loss=0.000, reward_mean=0.1, reward_bound=0.0
58464: loss=0.000, reward_mean=0.0, reward_bound=0.0
58465: loss=0.000, reward_mean=0.0, reward_bound=0.0
58466: loss=0.000, reward_mean=0.0, reward_bound=0.0
58467: loss=0.000, reward_mean=0.0, reward_bound=0.0
58468: loss=0.000, reward_mean=0.0, reward_bound=0.0
58469: loss=0.000, reward_mean=0.1, reward_bound=0.0
58470: loss=0.000, reward_mean=0.1, reward_bound=0.0
58471: loss=0.000, reward_mean=0.0, reward_bound=0.0
58472: loss=0.000, reward_mean=0.0, reward_bound=0.0
58473: loss=0.000, reward_mean=0.1, reward_bound=0.0
58474: loss=0.000, reward_mean=0.0, reward_bound=0.0
58475: loss=0.000, reward_mean=0.0, reward_bound=0.0
58476: loss=0.000, reward_mean=0.0, reward_bound=0.0
58477: loss=0.000, reward_mean=0.0, reward_bound=0.0
58478: loss=0.000, reward_mean=0.1, reward_bou

58615: loss=0.000, reward_mean=0.0, reward_bound=0.0
58616: loss=0.000, reward_mean=0.1, reward_bound=0.0
58617: loss=0.000, reward_mean=0.1, reward_bound=0.0
58618: loss=0.000, reward_mean=0.2, reward_bound=0.0
58619: loss=0.000, reward_mean=0.1, reward_bound=0.0
58620: loss=0.000, reward_mean=0.1, reward_bound=0.0
58621: loss=0.000, reward_mean=0.2, reward_bound=0.0
58622: loss=0.000, reward_mean=0.1, reward_bound=0.0
58623: loss=0.000, reward_mean=0.1, reward_bound=0.0
58624: loss=0.000, reward_mean=0.0, reward_bound=0.0
58625: loss=0.000, reward_mean=0.0, reward_bound=0.0
58626: loss=0.000, reward_mean=0.0, reward_bound=0.0
58627: loss=0.000, reward_mean=0.1, reward_bound=0.0
58628: loss=0.000, reward_mean=0.1, reward_bound=0.0
58629: loss=0.000, reward_mean=0.0, reward_bound=0.0
58630: loss=0.000, reward_mean=0.1, reward_bound=0.0
58631: loss=0.000, reward_mean=0.2, reward_bound=0.0
58632: loss=0.000, reward_mean=0.1, reward_bound=0.0
58633: loss=0.000, reward_mean=0.1, reward_bou

58777: loss=0.000, reward_mean=0.1, reward_bound=0.0
58778: loss=0.000, reward_mean=0.1, reward_bound=0.0
58779: loss=0.000, reward_mean=0.0, reward_bound=0.0
58780: loss=0.000, reward_mean=0.1, reward_bound=0.0
58781: loss=0.000, reward_mean=0.1, reward_bound=0.0
58782: loss=0.000, reward_mean=0.1, reward_bound=0.0
58783: loss=0.000, reward_mean=0.1, reward_bound=0.0
58784: loss=0.000, reward_mean=0.0, reward_bound=0.0
58785: loss=0.000, reward_mean=0.1, reward_bound=0.0
58786: loss=0.000, reward_mean=0.0, reward_bound=0.0
58787: loss=0.000, reward_mean=0.0, reward_bound=0.0
58788: loss=0.000, reward_mean=0.1, reward_bound=0.0
58789: loss=0.000, reward_mean=0.1, reward_bound=0.0
58790: loss=0.000, reward_mean=0.1, reward_bound=0.0
58791: loss=0.000, reward_mean=0.0, reward_bound=0.0
58792: loss=0.000, reward_mean=0.1, reward_bound=0.0
58793: loss=0.000, reward_mean=0.1, reward_bound=0.0
58794: loss=0.000, reward_mean=0.1, reward_bound=0.0
58795: loss=0.000, reward_mean=0.1, reward_bou

58933: loss=0.000, reward_mean=0.1, reward_bound=0.0
58934: loss=0.000, reward_mean=0.1, reward_bound=0.0
58935: loss=0.000, reward_mean=0.1, reward_bound=0.0
58936: loss=0.000, reward_mean=0.0, reward_bound=0.0
58937: loss=0.000, reward_mean=0.1, reward_bound=0.0
58938: loss=0.000, reward_mean=0.0, reward_bound=0.0
58939: loss=0.000, reward_mean=0.1, reward_bound=0.0
58940: loss=0.000, reward_mean=0.1, reward_bound=0.0
58941: loss=0.000, reward_mean=0.0, reward_bound=0.0
58942: loss=0.000, reward_mean=0.1, reward_bound=0.0
58943: loss=0.000, reward_mean=0.2, reward_bound=0.0
58944: loss=0.000, reward_mean=0.0, reward_bound=0.0
58945: loss=0.000, reward_mean=0.1, reward_bound=0.0
58946: loss=0.000, reward_mean=0.1, reward_bound=0.0
58947: loss=0.000, reward_mean=0.1, reward_bound=0.0
58948: loss=0.000, reward_mean=0.0, reward_bound=0.0
58949: loss=0.000, reward_mean=0.0, reward_bound=0.0
58950: loss=0.000, reward_mean=0.1, reward_bound=0.0
58951: loss=0.000, reward_mean=0.0, reward_bou

59088: loss=0.000, reward_mean=0.0, reward_bound=0.0
59089: loss=0.000, reward_mean=0.1, reward_bound=0.0
59090: loss=0.000, reward_mean=0.0, reward_bound=0.0
59091: loss=0.000, reward_mean=0.1, reward_bound=0.0
59092: loss=0.000, reward_mean=0.0, reward_bound=0.0
59093: loss=0.000, reward_mean=0.1, reward_bound=0.0
59094: loss=0.000, reward_mean=0.1, reward_bound=0.0
59095: loss=0.000, reward_mean=0.1, reward_bound=0.0
59096: loss=0.000, reward_mean=0.1, reward_bound=0.0
59097: loss=0.000, reward_mean=0.0, reward_bound=0.0
59098: loss=0.000, reward_mean=0.1, reward_bound=0.0
59099: loss=0.000, reward_mean=0.1, reward_bound=0.0
59100: loss=0.000, reward_mean=0.0, reward_bound=0.0
59101: loss=0.000, reward_mean=0.0, reward_bound=0.0
59102: loss=0.000, reward_mean=0.0, reward_bound=0.0
59103: loss=0.000, reward_mean=0.0, reward_bound=0.0
59104: loss=0.000, reward_mean=0.1, reward_bound=0.0
59105: loss=0.000, reward_mean=0.0, reward_bound=0.0
59106: loss=0.000, reward_mean=0.1, reward_bou

59245: loss=0.000, reward_mean=0.1, reward_bound=0.0
59246: loss=0.000, reward_mean=0.0, reward_bound=0.0
59247: loss=0.000, reward_mean=0.1, reward_bound=0.0
59248: loss=0.000, reward_mean=0.0, reward_bound=0.0
59249: loss=0.000, reward_mean=0.0, reward_bound=0.0
59250: loss=0.000, reward_mean=0.1, reward_bound=0.0
59251: loss=0.000, reward_mean=0.1, reward_bound=0.0
59252: loss=0.000, reward_mean=0.1, reward_bound=0.0
59253: loss=0.000, reward_mean=0.1, reward_bound=0.0
59254: loss=0.000, reward_mean=0.1, reward_bound=0.0
59255: loss=0.000, reward_mean=0.0, reward_bound=0.0
59256: loss=0.000, reward_mean=0.1, reward_bound=0.0
59257: loss=0.000, reward_mean=0.1, reward_bound=0.0
59258: loss=0.000, reward_mean=0.2, reward_bound=0.0
59259: loss=0.000, reward_mean=0.0, reward_bound=0.0
59260: loss=0.000, reward_mean=0.0, reward_bound=0.0
59261: loss=0.000, reward_mean=0.0, reward_bound=0.0
59262: loss=0.000, reward_mean=0.0, reward_bound=0.0
59263: loss=0.000, reward_mean=0.1, reward_bou

59401: loss=0.000, reward_mean=0.1, reward_bound=0.0
59402: loss=0.000, reward_mean=0.1, reward_bound=0.0
59403: loss=0.000, reward_mean=0.1, reward_bound=0.0
59404: loss=0.000, reward_mean=0.0, reward_bound=0.0
59405: loss=0.000, reward_mean=0.1, reward_bound=0.0
59406: loss=0.000, reward_mean=0.1, reward_bound=0.0
59407: loss=0.000, reward_mean=0.1, reward_bound=0.0
59408: loss=0.000, reward_mean=0.0, reward_bound=0.0
59409: loss=0.000, reward_mean=0.0, reward_bound=0.0
59410: loss=0.000, reward_mean=0.0, reward_bound=0.0
59411: loss=0.000, reward_mean=0.1, reward_bound=0.0
59412: loss=0.000, reward_mean=0.0, reward_bound=0.0
59413: loss=0.000, reward_mean=0.0, reward_bound=0.0
59414: loss=0.000, reward_mean=0.1, reward_bound=0.0
59415: loss=0.000, reward_mean=0.1, reward_bound=0.0
59416: loss=0.000, reward_mean=0.0, reward_bound=0.0
59417: loss=0.000, reward_mean=0.1, reward_bound=0.0
59418: loss=0.000, reward_mean=0.1, reward_bound=0.0
59419: loss=0.000, reward_mean=0.1, reward_bou

59556: loss=0.000, reward_mean=0.0, reward_bound=0.0
59557: loss=0.000, reward_mean=0.1, reward_bound=0.0
59558: loss=0.000, reward_mean=0.1, reward_bound=0.0
59559: loss=0.000, reward_mean=0.1, reward_bound=0.0
59560: loss=0.000, reward_mean=0.1, reward_bound=0.0
59561: loss=0.000, reward_mean=0.0, reward_bound=0.0
59562: loss=0.000, reward_mean=0.0, reward_bound=0.0
59563: loss=0.000, reward_mean=0.1, reward_bound=0.0
59564: loss=0.000, reward_mean=0.0, reward_bound=0.0
59565: loss=0.000, reward_mean=0.0, reward_bound=0.0
59566: loss=0.000, reward_mean=0.1, reward_bound=0.0
59567: loss=0.000, reward_mean=0.0, reward_bound=0.0
59568: loss=0.000, reward_mean=0.1, reward_bound=0.0
59569: loss=0.000, reward_mean=0.0, reward_bound=0.0
59570: loss=0.000, reward_mean=0.1, reward_bound=0.0
59571: loss=0.000, reward_mean=0.1, reward_bound=0.0
59572: loss=0.000, reward_mean=0.1, reward_bound=0.0
59573: loss=0.000, reward_mean=0.1, reward_bound=0.0
59574: loss=0.000, reward_mean=0.1, reward_bou

59716: loss=0.000, reward_mean=0.0, reward_bound=0.0
59717: loss=0.000, reward_mean=0.0, reward_bound=0.0
59718: loss=0.000, reward_mean=0.1, reward_bound=0.0
59719: loss=0.000, reward_mean=0.1, reward_bound=0.0
59720: loss=0.000, reward_mean=0.1, reward_bound=0.0
59721: loss=0.000, reward_mean=0.1, reward_bound=0.0
59722: loss=0.000, reward_mean=0.0, reward_bound=0.0
59723: loss=0.000, reward_mean=0.1, reward_bound=0.0
59724: loss=0.000, reward_mean=0.0, reward_bound=0.0
59725: loss=0.000, reward_mean=0.0, reward_bound=0.0
59726: loss=0.000, reward_mean=0.0, reward_bound=0.0
59727: loss=0.000, reward_mean=0.1, reward_bound=0.0
59728: loss=0.000, reward_mean=0.1, reward_bound=0.0
59729: loss=0.000, reward_mean=0.0, reward_bound=0.0
59730: loss=0.000, reward_mean=0.0, reward_bound=0.0
59731: loss=0.000, reward_mean=0.1, reward_bound=0.0
59732: loss=0.000, reward_mean=0.0, reward_bound=0.0
59733: loss=0.000, reward_mean=0.1, reward_bound=0.0
59734: loss=0.000, reward_mean=0.1, reward_bou

59871: loss=0.000, reward_mean=0.0, reward_bound=0.0
59872: loss=0.000, reward_mean=0.0, reward_bound=0.0
59873: loss=0.000, reward_mean=0.0, reward_bound=0.0
59874: loss=0.000, reward_mean=0.2, reward_bound=0.0
59875: loss=0.000, reward_mean=0.1, reward_bound=0.0
59876: loss=0.000, reward_mean=0.0, reward_bound=0.0
59877: loss=0.000, reward_mean=0.1, reward_bound=0.0
59878: loss=0.000, reward_mean=0.1, reward_bound=0.0
59879: loss=0.000, reward_mean=0.1, reward_bound=0.0
59880: loss=0.000, reward_mean=0.1, reward_bound=0.0
59881: loss=0.000, reward_mean=0.1, reward_bound=0.0
59882: loss=0.000, reward_mean=0.0, reward_bound=0.0
59883: loss=0.000, reward_mean=0.0, reward_bound=0.0
59884: loss=0.000, reward_mean=0.1, reward_bound=0.0
59885: loss=0.000, reward_mean=0.1, reward_bound=0.0
59886: loss=0.000, reward_mean=0.2, reward_bound=0.0
59887: loss=0.000, reward_mean=0.1, reward_bound=0.0
59888: loss=0.000, reward_mean=0.0, reward_bound=0.0
59889: loss=0.000, reward_mean=0.1, reward_bou

60027: loss=0.000, reward_mean=0.1, reward_bound=0.0
60028: loss=0.000, reward_mean=0.1, reward_bound=0.0
60029: loss=0.000, reward_mean=0.1, reward_bound=0.0
60030: loss=0.000, reward_mean=0.0, reward_bound=0.0
60031: loss=0.000, reward_mean=0.1, reward_bound=0.0
60032: loss=0.000, reward_mean=0.1, reward_bound=0.0
60033: loss=0.000, reward_mean=0.0, reward_bound=0.0
60034: loss=0.000, reward_mean=0.0, reward_bound=0.0
60035: loss=0.000, reward_mean=0.0, reward_bound=0.0
60036: loss=0.000, reward_mean=0.0, reward_bound=0.0
60037: loss=0.000, reward_mean=0.0, reward_bound=0.0
60038: loss=0.000, reward_mean=0.1, reward_bound=0.0
60039: loss=0.000, reward_mean=0.1, reward_bound=0.0
60040: loss=0.000, reward_mean=0.2, reward_bound=0.0
60041: loss=0.000, reward_mean=0.1, reward_bound=0.0
60042: loss=0.000, reward_mean=0.1, reward_bound=0.0
60043: loss=0.000, reward_mean=0.0, reward_bound=0.0
60044: loss=0.000, reward_mean=0.1, reward_bound=0.0
60045: loss=0.000, reward_mean=0.1, reward_bou

60189: loss=0.000, reward_mean=0.1, reward_bound=0.0
60190: loss=0.000, reward_mean=0.1, reward_bound=0.0
60191: loss=0.000, reward_mean=0.0, reward_bound=0.0
60192: loss=0.000, reward_mean=0.1, reward_bound=0.0
60193: loss=0.000, reward_mean=0.2, reward_bound=0.0
60194: loss=0.000, reward_mean=0.1, reward_bound=0.0
60195: loss=0.000, reward_mean=0.0, reward_bound=0.0
60196: loss=0.000, reward_mean=0.0, reward_bound=0.0
60197: loss=0.000, reward_mean=0.1, reward_bound=0.0
60198: loss=0.000, reward_mean=0.1, reward_bound=0.0
60199: loss=0.000, reward_mean=0.0, reward_bound=0.0
60200: loss=0.000, reward_mean=0.0, reward_bound=0.0
60201: loss=0.000, reward_mean=0.1, reward_bound=0.0
60202: loss=0.000, reward_mean=0.0, reward_bound=0.0
60203: loss=0.000, reward_mean=0.1, reward_bound=0.0
60204: loss=0.000, reward_mean=0.1, reward_bound=0.0
60205: loss=0.000, reward_mean=0.0, reward_bound=0.0
60206: loss=0.000, reward_mean=0.1, reward_bound=0.0
60207: loss=0.000, reward_mean=0.1, reward_bou

60345: loss=0.000, reward_mean=0.1, reward_bound=0.0
60346: loss=0.000, reward_mean=0.0, reward_bound=0.0
60347: loss=0.000, reward_mean=0.1, reward_bound=0.0
60348: loss=0.000, reward_mean=0.0, reward_bound=0.0
60349: loss=0.000, reward_mean=0.0, reward_bound=0.0
60350: loss=0.000, reward_mean=0.0, reward_bound=0.0
60351: loss=0.000, reward_mean=0.0, reward_bound=0.0
60352: loss=0.000, reward_mean=0.1, reward_bound=0.0
60353: loss=0.000, reward_mean=0.1, reward_bound=0.0
60354: loss=0.000, reward_mean=0.1, reward_bound=0.0
60355: loss=0.000, reward_mean=0.1, reward_bound=0.0
60356: loss=0.000, reward_mean=0.1, reward_bound=0.0
60357: loss=0.000, reward_mean=0.1, reward_bound=0.0
60358: loss=0.000, reward_mean=0.1, reward_bound=0.0
60359: loss=0.000, reward_mean=0.0, reward_bound=0.0
60360: loss=0.000, reward_mean=0.1, reward_bound=0.0
60361: loss=0.000, reward_mean=0.0, reward_bound=0.0
60362: loss=0.000, reward_mean=0.0, reward_bound=0.0
60363: loss=0.000, reward_mean=0.0, reward_bou

60502: loss=0.000, reward_mean=0.0, reward_bound=0.0
60503: loss=0.000, reward_mean=0.0, reward_bound=0.0
60504: loss=0.000, reward_mean=0.1, reward_bound=0.0
60505: loss=0.000, reward_mean=0.1, reward_bound=0.0
60506: loss=0.000, reward_mean=0.0, reward_bound=0.0
60507: loss=0.000, reward_mean=0.0, reward_bound=0.0
60508: loss=0.000, reward_mean=0.0, reward_bound=0.0
60509: loss=0.000, reward_mean=0.1, reward_bound=0.0
60510: loss=0.000, reward_mean=0.0, reward_bound=0.0
60511: loss=0.000, reward_mean=0.1, reward_bound=0.0
60512: loss=0.000, reward_mean=0.0, reward_bound=0.0
60513: loss=0.000, reward_mean=0.1, reward_bound=0.0
60514: loss=0.000, reward_mean=0.1, reward_bound=0.0
60515: loss=0.000, reward_mean=0.1, reward_bound=0.0
60516: loss=0.000, reward_mean=0.0, reward_bound=0.0
60517: loss=0.000, reward_mean=0.1, reward_bound=0.0
60518: loss=0.000, reward_mean=0.1, reward_bound=0.0
60519: loss=0.000, reward_mean=0.0, reward_bound=0.0
60520: loss=0.000, reward_mean=0.0, reward_bou

60661: loss=0.000, reward_mean=0.1, reward_bound=0.0
60662: loss=0.000, reward_mean=0.0, reward_bound=0.0
60663: loss=0.000, reward_mean=0.0, reward_bound=0.0
60664: loss=0.000, reward_mean=0.1, reward_bound=0.0
60665: loss=0.000, reward_mean=0.1, reward_bound=0.0
60666: loss=0.000, reward_mean=0.0, reward_bound=0.0
60667: loss=0.000, reward_mean=0.1, reward_bound=0.0
60668: loss=0.000, reward_mean=0.1, reward_bound=0.0
60669: loss=0.000, reward_mean=0.1, reward_bound=0.0
60670: loss=0.000, reward_mean=0.1, reward_bound=0.0
60671: loss=0.000, reward_mean=0.1, reward_bound=0.0
60672: loss=0.000, reward_mean=0.1, reward_bound=0.0
60673: loss=0.000, reward_mean=0.1, reward_bound=0.0
60674: loss=0.000, reward_mean=0.1, reward_bound=0.0
60675: loss=0.000, reward_mean=0.1, reward_bound=0.0
60676: loss=0.000, reward_mean=0.0, reward_bound=0.0
60677: loss=0.000, reward_mean=0.0, reward_bound=0.0
60678: loss=0.000, reward_mean=0.0, reward_bound=0.0
60679: loss=0.000, reward_mean=0.1, reward_bou

60822: loss=0.000, reward_mean=0.0, reward_bound=0.0
60823: loss=0.000, reward_mean=0.0, reward_bound=0.0
60824: loss=0.000, reward_mean=0.1, reward_bound=0.0
60825: loss=0.000, reward_mean=0.0, reward_bound=0.0
60826: loss=0.000, reward_mean=0.1, reward_bound=0.0
60827: loss=0.000, reward_mean=0.0, reward_bound=0.0
60828: loss=0.000, reward_mean=0.1, reward_bound=0.0
60829: loss=0.000, reward_mean=0.0, reward_bound=0.0
60830: loss=0.000, reward_mean=0.1, reward_bound=0.0
60831: loss=0.000, reward_mean=0.1, reward_bound=0.0
60832: loss=0.000, reward_mean=0.1, reward_bound=0.0
60833: loss=0.000, reward_mean=0.1, reward_bound=0.0
60834: loss=0.000, reward_mean=0.0, reward_bound=0.0
60835: loss=0.000, reward_mean=0.1, reward_bound=0.0
60836: loss=0.000, reward_mean=0.1, reward_bound=0.0
60837: loss=0.000, reward_mean=0.0, reward_bound=0.0
60838: loss=0.000, reward_mean=0.0, reward_bound=0.0
60839: loss=0.000, reward_mean=0.0, reward_bound=0.0
60840: loss=0.000, reward_mean=0.1, reward_bou

60982: loss=0.000, reward_mean=0.0, reward_bound=0.0
60983: loss=0.000, reward_mean=0.0, reward_bound=0.0
60984: loss=0.000, reward_mean=0.0, reward_bound=0.0
60985: loss=0.000, reward_mean=0.0, reward_bound=0.0
60986: loss=0.000, reward_mean=0.0, reward_bound=0.0
60987: loss=0.000, reward_mean=0.0, reward_bound=0.0
60988: loss=0.000, reward_mean=0.0, reward_bound=0.0
60989: loss=0.000, reward_mean=0.1, reward_bound=0.0
60990: loss=0.000, reward_mean=0.0, reward_bound=0.0
60991: loss=0.000, reward_mean=0.1, reward_bound=0.0
60992: loss=0.000, reward_mean=0.2, reward_bound=0.0
60993: loss=0.000, reward_mean=0.0, reward_bound=0.0
60994: loss=0.000, reward_mean=0.2, reward_bound=0.0
60995: loss=0.000, reward_mean=0.0, reward_bound=0.0
60996: loss=0.000, reward_mean=0.1, reward_bound=0.0
60997: loss=0.000, reward_mean=0.1, reward_bound=0.0
60998: loss=0.000, reward_mean=0.1, reward_bound=0.0
60999: loss=0.000, reward_mean=0.1, reward_bound=0.0
61000: loss=0.000, reward_mean=0.0, reward_bou

61139: loss=0.000, reward_mean=0.0, reward_bound=0.0
61140: loss=0.000, reward_mean=0.0, reward_bound=0.0
61141: loss=0.000, reward_mean=0.1, reward_bound=0.0
61142: loss=0.000, reward_mean=0.0, reward_bound=0.0
61143: loss=0.000, reward_mean=0.0, reward_bound=0.0
61144: loss=0.000, reward_mean=0.2, reward_bound=0.0
61145: loss=0.000, reward_mean=0.1, reward_bound=0.0
61146: loss=0.000, reward_mean=0.1, reward_bound=0.0
61147: loss=0.000, reward_mean=0.1, reward_bound=0.0
61148: loss=0.000, reward_mean=0.1, reward_bound=0.0
61149: loss=0.000, reward_mean=0.0, reward_bound=0.0
61150: loss=0.000, reward_mean=0.1, reward_bound=0.0
61151: loss=0.000, reward_mean=0.1, reward_bound=0.0
61152: loss=0.000, reward_mean=0.1, reward_bound=0.0
61153: loss=0.000, reward_mean=0.1, reward_bound=0.0
61154: loss=0.000, reward_mean=0.0, reward_bound=0.0
61155: loss=0.000, reward_mean=0.0, reward_bound=0.0
61156: loss=0.000, reward_mean=0.0, reward_bound=0.0
61157: loss=0.000, reward_mean=0.0, reward_bou

61294: loss=0.000, reward_mean=0.2, reward_bound=0.0
61295: loss=0.000, reward_mean=0.0, reward_bound=0.0
61296: loss=0.000, reward_mean=0.0, reward_bound=0.0
61297: loss=0.000, reward_mean=0.1, reward_bound=0.0
61298: loss=0.000, reward_mean=0.0, reward_bound=0.0
61299: loss=0.000, reward_mean=0.0, reward_bound=0.0
61300: loss=0.000, reward_mean=0.1, reward_bound=0.0
61301: loss=0.000, reward_mean=0.0, reward_bound=0.0
61302: loss=0.000, reward_mean=0.1, reward_bound=0.0
61303: loss=0.000, reward_mean=0.0, reward_bound=0.0
61304: loss=0.000, reward_mean=0.0, reward_bound=0.0
61305: loss=0.000, reward_mean=0.0, reward_bound=0.0
61306: loss=0.000, reward_mean=0.0, reward_bound=0.0
61307: loss=0.000, reward_mean=0.0, reward_bound=0.0
61308: loss=0.000, reward_mean=0.0, reward_bound=0.0
61309: loss=0.000, reward_mean=0.1, reward_bound=0.0
61310: loss=0.000, reward_mean=0.0, reward_bound=0.0
61311: loss=0.000, reward_mean=0.1, reward_bound=0.0
61312: loss=0.000, reward_mean=0.1, reward_bou

61450: loss=0.000, reward_mean=0.0, reward_bound=0.0
61451: loss=0.000, reward_mean=0.0, reward_bound=0.0
61452: loss=0.000, reward_mean=0.1, reward_bound=0.0
61453: loss=0.000, reward_mean=0.0, reward_bound=0.0
61454: loss=0.000, reward_mean=0.1, reward_bound=0.0
61455: loss=0.000, reward_mean=0.1, reward_bound=0.0
61456: loss=0.000, reward_mean=0.2, reward_bound=0.0
61457: loss=0.000, reward_mean=0.0, reward_bound=0.0
61458: loss=0.000, reward_mean=0.0, reward_bound=0.0
61459: loss=0.000, reward_mean=0.1, reward_bound=0.0
61460: loss=0.000, reward_mean=0.1, reward_bound=0.0
61461: loss=0.000, reward_mean=0.2, reward_bound=0.0
61462: loss=0.000, reward_mean=0.1, reward_bound=0.0
61463: loss=0.000, reward_mean=0.1, reward_bound=0.0
61464: loss=0.000, reward_mean=0.1, reward_bound=0.0
61465: loss=0.000, reward_mean=0.1, reward_bound=0.0
61466: loss=0.000, reward_mean=0.2, reward_bound=0.0
61467: loss=0.000, reward_mean=0.1, reward_bound=0.0
61468: loss=0.000, reward_mean=0.0, reward_bou

61609: loss=0.000, reward_mean=0.1, reward_bound=0.0
61610: loss=0.000, reward_mean=0.0, reward_bound=0.0
61611: loss=0.000, reward_mean=0.1, reward_bound=0.0
61612: loss=0.000, reward_mean=0.1, reward_bound=0.0
61613: loss=0.000, reward_mean=0.0, reward_bound=0.0
61614: loss=0.000, reward_mean=0.0, reward_bound=0.0
61615: loss=0.000, reward_mean=0.2, reward_bound=0.0
61616: loss=0.000, reward_mean=0.0, reward_bound=0.0
61617: loss=0.000, reward_mean=0.0, reward_bound=0.0
61618: loss=0.000, reward_mean=0.1, reward_bound=0.0
61619: loss=0.000, reward_mean=0.0, reward_bound=0.0
61620: loss=0.000, reward_mean=0.1, reward_bound=0.0
61621: loss=0.000, reward_mean=0.1, reward_bound=0.0
61622: loss=0.000, reward_mean=0.1, reward_bound=0.0
61623: loss=0.000, reward_mean=0.1, reward_bound=0.0
61624: loss=0.000, reward_mean=0.0, reward_bound=0.0
61625: loss=0.000, reward_mean=0.1, reward_bound=0.0
61626: loss=0.000, reward_mean=0.1, reward_bound=0.0
61627: loss=0.000, reward_mean=0.1, reward_bou

61766: loss=0.000, reward_mean=0.0, reward_bound=0.0
61767: loss=0.000, reward_mean=0.1, reward_bound=0.0
61768: loss=0.000, reward_mean=0.0, reward_bound=0.0
61769: loss=0.000, reward_mean=0.0, reward_bound=0.0
61770: loss=0.000, reward_mean=0.1, reward_bound=0.0
61771: loss=0.000, reward_mean=0.1, reward_bound=0.0
61772: loss=0.000, reward_mean=0.1, reward_bound=0.0
61773: loss=0.000, reward_mean=0.0, reward_bound=0.0
61774: loss=0.000, reward_mean=0.1, reward_bound=0.0
61775: loss=0.000, reward_mean=0.0, reward_bound=0.0
61776: loss=0.000, reward_mean=0.1, reward_bound=0.0
61777: loss=0.000, reward_mean=0.1, reward_bound=0.0
61778: loss=0.000, reward_mean=0.1, reward_bound=0.0
61779: loss=0.000, reward_mean=0.0, reward_bound=0.0
61780: loss=0.000, reward_mean=0.1, reward_bound=0.0
61781: loss=0.000, reward_mean=0.2, reward_bound=0.0
61782: loss=0.000, reward_mean=0.1, reward_bound=0.0
61783: loss=0.000, reward_mean=0.0, reward_bound=0.0
61784: loss=0.000, reward_mean=0.1, reward_bou

61927: loss=0.000, reward_mean=0.0, reward_bound=0.0
61928: loss=0.000, reward_mean=0.1, reward_bound=0.0
61929: loss=0.000, reward_mean=0.1, reward_bound=0.0
61930: loss=0.000, reward_mean=0.0, reward_bound=0.0
61931: loss=0.000, reward_mean=0.1, reward_bound=0.0
61932: loss=0.000, reward_mean=0.1, reward_bound=0.0
61933: loss=0.000, reward_mean=0.2, reward_bound=0.0
61934: loss=0.000, reward_mean=0.1, reward_bound=0.0
61935: loss=0.000, reward_mean=0.0, reward_bound=0.0
61936: loss=0.000, reward_mean=0.1, reward_bound=0.0
61937: loss=0.000, reward_mean=0.1, reward_bound=0.0
61938: loss=0.000, reward_mean=0.0, reward_bound=0.0
61939: loss=0.000, reward_mean=0.1, reward_bound=0.0
61940: loss=0.000, reward_mean=0.0, reward_bound=0.0
61941: loss=0.000, reward_mean=0.1, reward_bound=0.0
61942: loss=0.000, reward_mean=0.0, reward_bound=0.0
61943: loss=0.000, reward_mean=0.0, reward_bound=0.0
61944: loss=0.000, reward_mean=0.1, reward_bound=0.0
61945: loss=0.000, reward_mean=0.1, reward_bou

62086: loss=0.000, reward_mean=0.1, reward_bound=0.0
62087: loss=0.000, reward_mean=0.1, reward_bound=0.0
62088: loss=0.000, reward_mean=0.0, reward_bound=0.0
62089: loss=0.000, reward_mean=0.1, reward_bound=0.0
62090: loss=0.000, reward_mean=0.0, reward_bound=0.0
62091: loss=0.000, reward_mean=0.1, reward_bound=0.0
62092: loss=0.000, reward_mean=0.1, reward_bound=0.0
62093: loss=0.000, reward_mean=0.0, reward_bound=0.0
62094: loss=0.000, reward_mean=0.0, reward_bound=0.0
62095: loss=0.000, reward_mean=0.1, reward_bound=0.0
62096: loss=0.000, reward_mean=0.0, reward_bound=0.0
62097: loss=0.000, reward_mean=0.0, reward_bound=0.0
62098: loss=0.000, reward_mean=0.1, reward_bound=0.0
62099: loss=0.000, reward_mean=0.1, reward_bound=0.0
62100: loss=0.000, reward_mean=0.0, reward_bound=0.0
62101: loss=0.000, reward_mean=0.1, reward_bound=0.0
62102: loss=0.000, reward_mean=0.0, reward_bound=0.0
62103: loss=0.000, reward_mean=0.1, reward_bound=0.0
62104: loss=0.000, reward_mean=0.0, reward_bou

62241: loss=0.000, reward_mean=0.0, reward_bound=0.0
62242: loss=0.000, reward_mean=0.0, reward_bound=0.0
62243: loss=0.000, reward_mean=0.0, reward_bound=0.0
62244: loss=0.000, reward_mean=0.1, reward_bound=0.0
62245: loss=0.000, reward_mean=0.1, reward_bound=0.0
62246: loss=0.000, reward_mean=0.1, reward_bound=0.0
62247: loss=0.000, reward_mean=0.0, reward_bound=0.0
62248: loss=0.000, reward_mean=0.1, reward_bound=0.0
62249: loss=0.000, reward_mean=0.0, reward_bound=0.0
62250: loss=0.000, reward_mean=0.0, reward_bound=0.0
62251: loss=0.000, reward_mean=0.0, reward_bound=0.0
62252: loss=0.000, reward_mean=0.0, reward_bound=0.0
62253: loss=0.000, reward_mean=0.0, reward_bound=0.0
62254: loss=0.000, reward_mean=0.0, reward_bound=0.0
62255: loss=0.000, reward_mean=0.0, reward_bound=0.0
62256: loss=0.000, reward_mean=0.0, reward_bound=0.0
62257: loss=0.000, reward_mean=0.0, reward_bound=0.0
62258: loss=0.000, reward_mean=0.1, reward_bound=0.0
62259: loss=0.000, reward_mean=0.1, reward_bou

62402: loss=0.000, reward_mean=0.1, reward_bound=0.0
62403: loss=0.000, reward_mean=0.1, reward_bound=0.0
62404: loss=0.000, reward_mean=0.0, reward_bound=0.0
62405: loss=0.000, reward_mean=0.1, reward_bound=0.0
62406: loss=0.000, reward_mean=0.0, reward_bound=0.0
62407: loss=0.000, reward_mean=0.1, reward_bound=0.0
62408: loss=0.000, reward_mean=0.0, reward_bound=0.0
62409: loss=0.000, reward_mean=0.1, reward_bound=0.0
62410: loss=0.000, reward_mean=0.0, reward_bound=0.0
62411: loss=0.000, reward_mean=0.0, reward_bound=0.0
62412: loss=0.000, reward_mean=0.1, reward_bound=0.0
62413: loss=0.000, reward_mean=0.0, reward_bound=0.0
62414: loss=0.000, reward_mean=0.0, reward_bound=0.0
62415: loss=0.000, reward_mean=0.1, reward_bound=0.0
62416: loss=0.000, reward_mean=0.1, reward_bound=0.0
62417: loss=0.000, reward_mean=0.1, reward_bound=0.0
62418: loss=0.000, reward_mean=0.0, reward_bound=0.0
62419: loss=0.000, reward_mean=0.0, reward_bound=0.0
62420: loss=0.000, reward_mean=0.1, reward_bou

62564: loss=0.000, reward_mean=0.1, reward_bound=0.0
62565: loss=0.000, reward_mean=0.1, reward_bound=0.0
62566: loss=0.000, reward_mean=0.0, reward_bound=0.0
62567: loss=0.000, reward_mean=0.1, reward_bound=0.0
62568: loss=0.000, reward_mean=0.0, reward_bound=0.0
62569: loss=0.000, reward_mean=0.1, reward_bound=0.0
62570: loss=0.000, reward_mean=0.1, reward_bound=0.0
62571: loss=0.000, reward_mean=0.0, reward_bound=0.0
62572: loss=0.000, reward_mean=0.1, reward_bound=0.0
62573: loss=0.000, reward_mean=0.1, reward_bound=0.0
62574: loss=0.000, reward_mean=0.1, reward_bound=0.0
62575: loss=0.000, reward_mean=0.1, reward_bound=0.0
62576: loss=0.000, reward_mean=0.1, reward_bound=0.0
62577: loss=0.000, reward_mean=0.1, reward_bound=0.0
62578: loss=0.000, reward_mean=0.1, reward_bound=0.0
62579: loss=0.000, reward_mean=0.1, reward_bound=0.0
62580: loss=0.000, reward_mean=0.0, reward_bound=0.0
62581: loss=0.000, reward_mean=0.2, reward_bound=0.0
62582: loss=0.000, reward_mean=0.1, reward_bou

62725: loss=0.000, reward_mean=0.1, reward_bound=0.0
62726: loss=0.000, reward_mean=0.2, reward_bound=0.0
62727: loss=0.000, reward_mean=0.1, reward_bound=0.0
62728: loss=0.000, reward_mean=0.1, reward_bound=0.0
62729: loss=0.000, reward_mean=0.0, reward_bound=0.0
62730: loss=0.000, reward_mean=0.1, reward_bound=0.0
62731: loss=0.000, reward_mean=0.1, reward_bound=0.0
62732: loss=0.000, reward_mean=0.0, reward_bound=0.0
62733: loss=0.000, reward_mean=0.1, reward_bound=0.0
62734: loss=0.000, reward_mean=0.1, reward_bound=0.0
62735: loss=0.000, reward_mean=0.1, reward_bound=0.0
62736: loss=0.000, reward_mean=0.0, reward_bound=0.0
62737: loss=0.000, reward_mean=0.1, reward_bound=0.0
62738: loss=0.000, reward_mean=0.0, reward_bound=0.0
62739: loss=0.000, reward_mean=0.1, reward_bound=0.0
62740: loss=0.000, reward_mean=0.1, reward_bound=0.0
62741: loss=0.000, reward_mean=0.1, reward_bound=0.0
62742: loss=0.000, reward_mean=0.0, reward_bound=0.0
62743: loss=0.000, reward_mean=0.0, reward_bou

62882: loss=0.000, reward_mean=0.1, reward_bound=0.0
62883: loss=0.000, reward_mean=0.0, reward_bound=0.0
62884: loss=0.000, reward_mean=0.0, reward_bound=0.0
62885: loss=0.000, reward_mean=0.2, reward_bound=0.0
62886: loss=0.000, reward_mean=0.0, reward_bound=0.0
62887: loss=0.000, reward_mean=0.1, reward_bound=0.0
62888: loss=0.000, reward_mean=0.0, reward_bound=0.0
62889: loss=0.000, reward_mean=0.0, reward_bound=0.0
62890: loss=0.000, reward_mean=0.2, reward_bound=0.0
62891: loss=0.000, reward_mean=0.2, reward_bound=0.0
62892: loss=0.000, reward_mean=0.1, reward_bound=0.0
62893: loss=0.000, reward_mean=0.1, reward_bound=0.0
62894: loss=0.000, reward_mean=0.0, reward_bound=0.0
62895: loss=0.000, reward_mean=0.1, reward_bound=0.0
62896: loss=0.000, reward_mean=0.0, reward_bound=0.0
62897: loss=0.000, reward_mean=0.0, reward_bound=0.0
62898: loss=0.000, reward_mean=0.1, reward_bound=0.0
62899: loss=0.000, reward_mean=0.1, reward_bound=0.0
62900: loss=0.000, reward_mean=0.1, reward_bou

63044: loss=0.000, reward_mean=0.0, reward_bound=0.0
63045: loss=0.000, reward_mean=0.1, reward_bound=0.0
63046: loss=0.000, reward_mean=0.1, reward_bound=0.0
63047: loss=0.000, reward_mean=0.1, reward_bound=0.0
63048: loss=0.000, reward_mean=0.2, reward_bound=0.0
63049: loss=0.000, reward_mean=0.1, reward_bound=0.0
63050: loss=0.000, reward_mean=0.2, reward_bound=0.0
63051: loss=0.000, reward_mean=0.0, reward_bound=0.0
63052: loss=0.000, reward_mean=0.1, reward_bound=0.0
63053: loss=0.000, reward_mean=0.1, reward_bound=0.0
63054: loss=0.000, reward_mean=0.1, reward_bound=0.0
63055: loss=0.000, reward_mean=0.0, reward_bound=0.0
63056: loss=0.000, reward_mean=0.0, reward_bound=0.0
63057: loss=0.000, reward_mean=0.1, reward_bound=0.0
63058: loss=0.000, reward_mean=0.1, reward_bound=0.0
63059: loss=0.000, reward_mean=0.0, reward_bound=0.0
63060: loss=0.000, reward_mean=0.1, reward_bound=0.0
63061: loss=0.000, reward_mean=0.1, reward_bound=0.0
63062: loss=0.000, reward_mean=0.1, reward_bou

63203: loss=0.000, reward_mean=0.1, reward_bound=0.0
63204: loss=0.000, reward_mean=0.1, reward_bound=0.0
63205: loss=0.000, reward_mean=0.1, reward_bound=0.0
63206: loss=0.000, reward_mean=0.1, reward_bound=0.0
63207: loss=0.000, reward_mean=0.1, reward_bound=0.0
63208: loss=0.000, reward_mean=0.0, reward_bound=0.0
63209: loss=0.000, reward_mean=0.1, reward_bound=0.0
63210: loss=0.000, reward_mean=0.1, reward_bound=0.0
63211: loss=0.000, reward_mean=0.1, reward_bound=0.0
63212: loss=0.000, reward_mean=0.0, reward_bound=0.0
63213: loss=0.000, reward_mean=0.0, reward_bound=0.0
63214: loss=0.000, reward_mean=0.1, reward_bound=0.0
63215: loss=0.000, reward_mean=0.0, reward_bound=0.0
63216: loss=0.000, reward_mean=0.0, reward_bound=0.0
63217: loss=0.000, reward_mean=0.1, reward_bound=0.0
63218: loss=0.000, reward_mean=0.0, reward_bound=0.0
63219: loss=0.000, reward_mean=0.0, reward_bound=0.0
63220: loss=0.000, reward_mean=0.1, reward_bound=0.0
63221: loss=0.000, reward_mean=0.0, reward_bou

63358: loss=0.000, reward_mean=0.1, reward_bound=0.0
63359: loss=0.000, reward_mean=0.2, reward_bound=0.0
63360: loss=0.000, reward_mean=0.1, reward_bound=0.0
63361: loss=0.000, reward_mean=0.0, reward_bound=0.0
63362: loss=0.000, reward_mean=0.2, reward_bound=0.0
63363: loss=0.000, reward_mean=0.0, reward_bound=0.0
63364: loss=0.000, reward_mean=0.1, reward_bound=0.0
63365: loss=0.000, reward_mean=0.1, reward_bound=0.0
63366: loss=0.000, reward_mean=0.0, reward_bound=0.0
63367: loss=0.000, reward_mean=0.1, reward_bound=0.0
63368: loss=0.000, reward_mean=0.0, reward_bound=0.0
63369: loss=0.000, reward_mean=0.0, reward_bound=0.0
63370: loss=0.000, reward_mean=0.0, reward_bound=0.0
63371: loss=0.000, reward_mean=0.1, reward_bound=0.0
63372: loss=0.000, reward_mean=0.1, reward_bound=0.0
63373: loss=0.000, reward_mean=0.0, reward_bound=0.0
63374: loss=0.000, reward_mean=0.1, reward_bound=0.0
63375: loss=0.000, reward_mean=0.1, reward_bound=0.0
63376: loss=0.000, reward_mean=0.1, reward_bou

63519: loss=0.000, reward_mean=0.1, reward_bound=0.0
63520: loss=0.000, reward_mean=0.0, reward_bound=0.0
63521: loss=0.000, reward_mean=0.1, reward_bound=0.0
63522: loss=0.000, reward_mean=0.1, reward_bound=0.0
63523: loss=0.000, reward_mean=0.0, reward_bound=0.0
63524: loss=0.000, reward_mean=0.1, reward_bound=0.0
63525: loss=0.000, reward_mean=0.0, reward_bound=0.0
63526: loss=0.000, reward_mean=0.1, reward_bound=0.0
63527: loss=0.000, reward_mean=0.1, reward_bound=0.0
63528: loss=0.000, reward_mean=0.1, reward_bound=0.0
63529: loss=0.000, reward_mean=0.1, reward_bound=0.0
63530: loss=0.000, reward_mean=0.1, reward_bound=0.0
63531: loss=0.000, reward_mean=0.1, reward_bound=0.0
63532: loss=0.000, reward_mean=0.1, reward_bound=0.0
63533: loss=0.000, reward_mean=0.0, reward_bound=0.0
63534: loss=0.000, reward_mean=0.1, reward_bound=0.0
63535: loss=0.000, reward_mean=0.1, reward_bound=0.0
63536: loss=0.000, reward_mean=0.0, reward_bound=0.0
63537: loss=0.000, reward_mean=0.1, reward_bou

63679: loss=0.000, reward_mean=0.1, reward_bound=0.0
63680: loss=0.000, reward_mean=0.1, reward_bound=0.0
63681: loss=0.000, reward_mean=0.1, reward_bound=0.0
63682: loss=0.000, reward_mean=0.1, reward_bound=0.0
63683: loss=0.000, reward_mean=0.2, reward_bound=0.0
63684: loss=0.000, reward_mean=0.2, reward_bound=0.0
63685: loss=0.000, reward_mean=0.1, reward_bound=0.0
63686: loss=0.000, reward_mean=0.1, reward_bound=0.0
63687: loss=0.000, reward_mean=0.1, reward_bound=0.0
63688: loss=0.000, reward_mean=0.0, reward_bound=0.0
63689: loss=0.000, reward_mean=0.0, reward_bound=0.0
63690: loss=0.000, reward_mean=0.0, reward_bound=0.0
63691: loss=0.000, reward_mean=0.0, reward_bound=0.0
63692: loss=0.000, reward_mean=0.1, reward_bound=0.0
63693: loss=0.000, reward_mean=0.0, reward_bound=0.0
63694: loss=0.000, reward_mean=0.0, reward_bound=0.0
63695: loss=0.000, reward_mean=0.0, reward_bound=0.0
63696: loss=0.000, reward_mean=0.0, reward_bound=0.0
63697: loss=0.000, reward_mean=0.1, reward_bou

63841: loss=0.000, reward_mean=0.1, reward_bound=0.0
63842: loss=0.000, reward_mean=0.1, reward_bound=0.0
63843: loss=0.000, reward_mean=0.0, reward_bound=0.0
63844: loss=0.000, reward_mean=0.0, reward_bound=0.0
63845: loss=0.000, reward_mean=0.0, reward_bound=0.0
63846: loss=0.000, reward_mean=0.0, reward_bound=0.0
63847: loss=0.000, reward_mean=0.0, reward_bound=0.0
63848: loss=0.000, reward_mean=0.0, reward_bound=0.0
63849: loss=0.000, reward_mean=0.0, reward_bound=0.0
63850: loss=0.000, reward_mean=0.1, reward_bound=0.0
63851: loss=0.000, reward_mean=0.0, reward_bound=0.0
63852: loss=0.000, reward_mean=0.1, reward_bound=0.0
63853: loss=0.000, reward_mean=0.1, reward_bound=0.0
63854: loss=0.000, reward_mean=0.1, reward_bound=0.0
63855: loss=0.000, reward_mean=0.1, reward_bound=0.0
63856: loss=0.000, reward_mean=0.0, reward_bound=0.0
63857: loss=0.000, reward_mean=0.1, reward_bound=0.0
63858: loss=0.000, reward_mean=0.0, reward_bound=0.0
63859: loss=0.000, reward_mean=0.0, reward_bou

63998: loss=0.000, reward_mean=0.1, reward_bound=0.0
63999: loss=0.000, reward_mean=0.1, reward_bound=0.0
64000: loss=0.000, reward_mean=0.0, reward_bound=0.0
64001: loss=0.000, reward_mean=0.1, reward_bound=0.0
64002: loss=0.000, reward_mean=0.0, reward_bound=0.0
64003: loss=0.000, reward_mean=0.1, reward_bound=0.0
64004: loss=0.000, reward_mean=0.1, reward_bound=0.0
64005: loss=0.000, reward_mean=0.2, reward_bound=0.0
64006: loss=0.000, reward_mean=0.1, reward_bound=0.0
64007: loss=0.000, reward_mean=0.1, reward_bound=0.0
64008: loss=0.000, reward_mean=0.1, reward_bound=0.0
64009: loss=0.000, reward_mean=0.0, reward_bound=0.0
64010: loss=0.000, reward_mean=0.0, reward_bound=0.0
64011: loss=0.000, reward_mean=0.0, reward_bound=0.0
64012: loss=0.000, reward_mean=0.0, reward_bound=0.0
64013: loss=0.000, reward_mean=0.1, reward_bound=0.0
64014: loss=0.000, reward_mean=0.0, reward_bound=0.0
64015: loss=0.000, reward_mean=0.0, reward_bound=0.0
64016: loss=0.000, reward_mean=0.1, reward_bou

64153: loss=0.000, reward_mean=0.2, reward_bound=0.0
64154: loss=0.000, reward_mean=0.0, reward_bound=0.0
64155: loss=0.000, reward_mean=0.1, reward_bound=0.0
64156: loss=0.000, reward_mean=0.0, reward_bound=0.0
64157: loss=0.000, reward_mean=0.0, reward_bound=0.0
64158: loss=0.000, reward_mean=0.0, reward_bound=0.0
64159: loss=0.000, reward_mean=0.1, reward_bound=0.0
64160: loss=0.000, reward_mean=0.1, reward_bound=0.0
64161: loss=0.000, reward_mean=0.0, reward_bound=0.0
64162: loss=0.000, reward_mean=0.1, reward_bound=0.0
64163: loss=0.000, reward_mean=0.1, reward_bound=0.0
64164: loss=0.000, reward_mean=0.0, reward_bound=0.0
64165: loss=0.000, reward_mean=0.1, reward_bound=0.0
64166: loss=0.000, reward_mean=0.1, reward_bound=0.0
64167: loss=0.000, reward_mean=0.1, reward_bound=0.0
64168: loss=0.000, reward_mean=0.1, reward_bound=0.0
64169: loss=0.000, reward_mean=0.2, reward_bound=0.0
64170: loss=0.000, reward_mean=0.1, reward_bound=0.0
64171: loss=0.000, reward_mean=0.0, reward_bou

64308: loss=0.000, reward_mean=0.0, reward_bound=0.0
64309: loss=0.000, reward_mean=0.1, reward_bound=0.0
64310: loss=0.000, reward_mean=0.1, reward_bound=0.0
64311: loss=0.000, reward_mean=0.1, reward_bound=0.0
64312: loss=0.000, reward_mean=0.1, reward_bound=0.0
64313: loss=0.000, reward_mean=0.1, reward_bound=0.0
64314: loss=0.000, reward_mean=0.0, reward_bound=0.0
64315: loss=0.000, reward_mean=0.2, reward_bound=0.0
64316: loss=0.000, reward_mean=0.0, reward_bound=0.0
64317: loss=0.000, reward_mean=0.0, reward_bound=0.0
64318: loss=0.000, reward_mean=0.0, reward_bound=0.0
64319: loss=0.000, reward_mean=0.2, reward_bound=0.0
64320: loss=0.000, reward_mean=0.1, reward_bound=0.0
64321: loss=0.000, reward_mean=0.1, reward_bound=0.0
64322: loss=0.000, reward_mean=0.1, reward_bound=0.0
64323: loss=0.000, reward_mean=0.0, reward_bound=0.0
64324: loss=0.000, reward_mean=0.0, reward_bound=0.0
64325: loss=0.000, reward_mean=0.2, reward_bound=0.0
64326: loss=0.000, reward_mean=0.0, reward_bou

64469: loss=0.000, reward_mean=0.0, reward_bound=0.0
64470: loss=0.000, reward_mean=0.0, reward_bound=0.0
64471: loss=0.000, reward_mean=0.1, reward_bound=0.0
64472: loss=0.000, reward_mean=0.0, reward_bound=0.0
64473: loss=0.000, reward_mean=0.1, reward_bound=0.0
64474: loss=0.000, reward_mean=0.1, reward_bound=0.0
64475: loss=0.000, reward_mean=0.0, reward_bound=0.0
64476: loss=0.000, reward_mean=0.0, reward_bound=0.0
64477: loss=0.000, reward_mean=0.0, reward_bound=0.0
64478: loss=0.000, reward_mean=0.1, reward_bound=0.0
64479: loss=0.000, reward_mean=0.1, reward_bound=0.0
64480: loss=0.000, reward_mean=0.0, reward_bound=0.0
64481: loss=0.000, reward_mean=0.1, reward_bound=0.0
64482: loss=0.000, reward_mean=0.0, reward_bound=0.0
64483: loss=0.000, reward_mean=0.1, reward_bound=0.0
64484: loss=0.000, reward_mean=0.1, reward_bound=0.0
64485: loss=0.000, reward_mean=0.2, reward_bound=0.0
64486: loss=0.000, reward_mean=0.1, reward_bound=0.0
64487: loss=0.000, reward_mean=0.0, reward_bou

64624: loss=0.000, reward_mean=0.0, reward_bound=0.0
64625: loss=0.000, reward_mean=0.0, reward_bound=0.0
64626: loss=0.000, reward_mean=0.1, reward_bound=0.0
64627: loss=0.000, reward_mean=0.0, reward_bound=0.0
64628: loss=0.000, reward_mean=0.1, reward_bound=0.0
64629: loss=0.000, reward_mean=0.1, reward_bound=0.0
64630: loss=0.000, reward_mean=0.1, reward_bound=0.0
64631: loss=0.000, reward_mean=0.0, reward_bound=0.0
64632: loss=0.000, reward_mean=0.1, reward_bound=0.0
64633: loss=0.000, reward_mean=0.1, reward_bound=0.0
64634: loss=0.000, reward_mean=0.0, reward_bound=0.0
64635: loss=0.000, reward_mean=0.1, reward_bound=0.0
64636: loss=0.000, reward_mean=0.1, reward_bound=0.0
64637: loss=0.000, reward_mean=0.1, reward_bound=0.0
64638: loss=0.000, reward_mean=0.0, reward_bound=0.0
64639: loss=0.000, reward_mean=0.1, reward_bound=0.0
64640: loss=0.000, reward_mean=0.1, reward_bound=0.0
64641: loss=0.000, reward_mean=0.0, reward_bound=0.0
64642: loss=0.000, reward_mean=0.0, reward_bou

64784: loss=0.000, reward_mean=0.0, reward_bound=0.0
64785: loss=0.000, reward_mean=0.1, reward_bound=0.0
64786: loss=0.000, reward_mean=0.0, reward_bound=0.0
64787: loss=0.000, reward_mean=0.0, reward_bound=0.0
64788: loss=0.000, reward_mean=0.1, reward_bound=0.0
64789: loss=0.000, reward_mean=0.0, reward_bound=0.0
64790: loss=0.000, reward_mean=0.0, reward_bound=0.0
64791: loss=0.000, reward_mean=0.1, reward_bound=0.0
64792: loss=0.000, reward_mean=0.1, reward_bound=0.0
64793: loss=0.000, reward_mean=0.1, reward_bound=0.0
64794: loss=0.000, reward_mean=0.0, reward_bound=0.0
64795: loss=0.000, reward_mean=0.1, reward_bound=0.0
64796: loss=0.000, reward_mean=0.0, reward_bound=0.0
64797: loss=0.000, reward_mean=0.0, reward_bound=0.0
64798: loss=0.000, reward_mean=0.1, reward_bound=0.0
64799: loss=0.000, reward_mean=0.0, reward_bound=0.0
64800: loss=0.000, reward_mean=0.0, reward_bound=0.0
64801: loss=0.000, reward_mean=0.1, reward_bound=0.0
64802: loss=0.000, reward_mean=0.0, reward_bou

64939: loss=0.000, reward_mean=0.2, reward_bound=0.0
64940: loss=0.000, reward_mean=0.0, reward_bound=0.0
64941: loss=0.000, reward_mean=0.0, reward_bound=0.0
64942: loss=0.000, reward_mean=0.1, reward_bound=0.0
64943: loss=0.000, reward_mean=0.2, reward_bound=0.0
64944: loss=0.000, reward_mean=0.1, reward_bound=0.0
64945: loss=0.000, reward_mean=0.0, reward_bound=0.0
64946: loss=0.000, reward_mean=0.1, reward_bound=0.0
64947: loss=0.000, reward_mean=0.1, reward_bound=0.0
64948: loss=0.000, reward_mean=0.0, reward_bound=0.0
64949: loss=0.000, reward_mean=0.0, reward_bound=0.0
64950: loss=0.000, reward_mean=0.1, reward_bound=0.0
64951: loss=0.000, reward_mean=0.1, reward_bound=0.0
64952: loss=0.000, reward_mean=0.0, reward_bound=0.0
64953: loss=0.000, reward_mean=0.0, reward_bound=0.0
64954: loss=0.000, reward_mean=0.1, reward_bound=0.0
64955: loss=0.000, reward_mean=0.1, reward_bound=0.0
64956: loss=0.000, reward_mean=0.1, reward_bound=0.0
64957: loss=0.000, reward_mean=0.0, reward_bou

65094: loss=0.000, reward_mean=0.1, reward_bound=0.0
65095: loss=0.000, reward_mean=0.0, reward_bound=0.0
65096: loss=0.000, reward_mean=0.1, reward_bound=0.0
65097: loss=0.000, reward_mean=0.0, reward_bound=0.0
65098: loss=0.000, reward_mean=0.1, reward_bound=0.0
65099: loss=0.000, reward_mean=0.1, reward_bound=0.0
65100: loss=0.000, reward_mean=0.0, reward_bound=0.0
65101: loss=0.000, reward_mean=0.0, reward_bound=0.0
65102: loss=0.000, reward_mean=0.1, reward_bound=0.0
65103: loss=0.000, reward_mean=0.0, reward_bound=0.0
65104: loss=0.000, reward_mean=0.1, reward_bound=0.0
65105: loss=0.000, reward_mean=0.1, reward_bound=0.0
65106: loss=0.000, reward_mean=0.0, reward_bound=0.0
65107: loss=0.000, reward_mean=0.0, reward_bound=0.0
65108: loss=0.000, reward_mean=0.0, reward_bound=0.0
65109: loss=0.000, reward_mean=0.2, reward_bound=0.0
65110: loss=0.000, reward_mean=0.0, reward_bound=0.0
65111: loss=0.000, reward_mean=0.1, reward_bound=0.0
65112: loss=0.000, reward_mean=0.0, reward_bou

65252: loss=0.000, reward_mean=0.1, reward_bound=0.0
65253: loss=0.000, reward_mean=0.0, reward_bound=0.0
65254: loss=0.000, reward_mean=0.1, reward_bound=0.0
65255: loss=0.000, reward_mean=0.0, reward_bound=0.0
65256: loss=0.000, reward_mean=0.0, reward_bound=0.0
65257: loss=0.000, reward_mean=0.1, reward_bound=0.0
65258: loss=0.000, reward_mean=0.1, reward_bound=0.0
65259: loss=0.000, reward_mean=0.1, reward_bound=0.0
65260: loss=0.000, reward_mean=0.1, reward_bound=0.0
65261: loss=0.000, reward_mean=0.0, reward_bound=0.0
65262: loss=0.000, reward_mean=0.1, reward_bound=0.0
65263: loss=0.000, reward_mean=0.1, reward_bound=0.0
65264: loss=0.000, reward_mean=0.0, reward_bound=0.0
65265: loss=0.000, reward_mean=0.1, reward_bound=0.0
65266: loss=0.000, reward_mean=0.0, reward_bound=0.0
65267: loss=0.000, reward_mean=0.0, reward_bound=0.0
65268: loss=0.000, reward_mean=0.1, reward_bound=0.0
65269: loss=0.000, reward_mean=0.1, reward_bound=0.0
65270: loss=0.000, reward_mean=0.0, reward_bou

65409: loss=0.000, reward_mean=0.0, reward_bound=0.0
65410: loss=0.000, reward_mean=0.0, reward_bound=0.0
65411: loss=0.000, reward_mean=0.0, reward_bound=0.0
65412: loss=0.000, reward_mean=0.1, reward_bound=0.0
65413: loss=0.000, reward_mean=0.0, reward_bound=0.0
65414: loss=0.000, reward_mean=0.1, reward_bound=0.0
65415: loss=0.000, reward_mean=0.1, reward_bound=0.0
65416: loss=0.000, reward_mean=0.1, reward_bound=0.0
65417: loss=0.000, reward_mean=0.1, reward_bound=0.0
65418: loss=0.000, reward_mean=0.0, reward_bound=0.0
65419: loss=0.000, reward_mean=0.1, reward_bound=0.0
65420: loss=0.000, reward_mean=0.2, reward_bound=0.0
65421: loss=0.000, reward_mean=0.1, reward_bound=0.0
65422: loss=0.000, reward_mean=0.0, reward_bound=0.0
65423: loss=0.000, reward_mean=0.1, reward_bound=0.0
65424: loss=0.000, reward_mean=0.0, reward_bound=0.0
65425: loss=0.000, reward_mean=0.0, reward_bound=0.0
65426: loss=0.000, reward_mean=0.0, reward_bound=0.0
65427: loss=0.000, reward_mean=0.1, reward_bou

65571: loss=0.000, reward_mean=0.1, reward_bound=0.0
65572: loss=0.000, reward_mean=0.1, reward_bound=0.0
65573: loss=0.000, reward_mean=0.0, reward_bound=0.0
65574: loss=0.000, reward_mean=0.2, reward_bound=0.0
65575: loss=0.000, reward_mean=0.1, reward_bound=0.0
65576: loss=0.000, reward_mean=0.1, reward_bound=0.0
65577: loss=0.000, reward_mean=0.1, reward_bound=0.0
65578: loss=0.000, reward_mean=0.1, reward_bound=0.0
65579: loss=0.000, reward_mean=0.1, reward_bound=0.0
65580: loss=0.000, reward_mean=0.1, reward_bound=0.0
65581: loss=0.000, reward_mean=0.0, reward_bound=0.0
65582: loss=0.000, reward_mean=0.1, reward_bound=0.0
65583: loss=0.000, reward_mean=0.0, reward_bound=0.0
65584: loss=0.000, reward_mean=0.1, reward_bound=0.0
65585: loss=0.000, reward_mean=0.1, reward_bound=0.0
65586: loss=0.000, reward_mean=0.0, reward_bound=0.0
65587: loss=0.000, reward_mean=0.1, reward_bound=0.0
65588: loss=0.000, reward_mean=0.1, reward_bound=0.0
65589: loss=0.000, reward_mean=0.1, reward_bou

65726: loss=0.000, reward_mean=0.0, reward_bound=0.0
65727: loss=0.000, reward_mean=0.2, reward_bound=0.0
65728: loss=0.000, reward_mean=0.0, reward_bound=0.0
65729: loss=0.000, reward_mean=0.1, reward_bound=0.0
65730: loss=0.000, reward_mean=0.1, reward_bound=0.0
65731: loss=0.000, reward_mean=0.0, reward_bound=0.0
65732: loss=0.000, reward_mean=0.0, reward_bound=0.0
65733: loss=0.000, reward_mean=0.1, reward_bound=0.0
65734: loss=0.000, reward_mean=0.1, reward_bound=0.0
65735: loss=0.000, reward_mean=0.1, reward_bound=0.0
65736: loss=0.000, reward_mean=0.1, reward_bound=0.0
65737: loss=0.000, reward_mean=0.1, reward_bound=0.0
65738: loss=0.000, reward_mean=0.0, reward_bound=0.0
65739: loss=0.000, reward_mean=0.1, reward_bound=0.0
65740: loss=0.000, reward_mean=0.1, reward_bound=0.0
65741: loss=0.000, reward_mean=0.1, reward_bound=0.0
65742: loss=0.000, reward_mean=0.0, reward_bound=0.0
65743: loss=0.000, reward_mean=0.0, reward_bound=0.0
65744: loss=0.000, reward_mean=0.0, reward_bou

65886: loss=0.000, reward_mean=0.1, reward_bound=0.0
65887: loss=0.000, reward_mean=0.0, reward_bound=0.0
65888: loss=0.000, reward_mean=0.0, reward_bound=0.0
65889: loss=0.000, reward_mean=0.1, reward_bound=0.0
65890: loss=0.000, reward_mean=0.1, reward_bound=0.0
65891: loss=0.000, reward_mean=0.0, reward_bound=0.0
65892: loss=0.000, reward_mean=0.1, reward_bound=0.0
65893: loss=0.000, reward_mean=0.0, reward_bound=0.0
65894: loss=0.000, reward_mean=0.1, reward_bound=0.0
65895: loss=0.000, reward_mean=0.1, reward_bound=0.0
65896: loss=0.000, reward_mean=0.0, reward_bound=0.0
65897: loss=0.000, reward_mean=0.0, reward_bound=0.0
65898: loss=0.000, reward_mean=0.1, reward_bound=0.0
65899: loss=0.000, reward_mean=0.0, reward_bound=0.0
65900: loss=0.000, reward_mean=0.1, reward_bound=0.0
65901: loss=0.000, reward_mean=0.0, reward_bound=0.0
65902: loss=0.000, reward_mean=0.0, reward_bound=0.0
65903: loss=0.000, reward_mean=0.1, reward_bound=0.0
65904: loss=0.000, reward_mean=0.1, reward_bou

66044: loss=0.000, reward_mean=0.1, reward_bound=0.0
66045: loss=0.000, reward_mean=0.0, reward_bound=0.0
66046: loss=0.000, reward_mean=0.1, reward_bound=0.0
66047: loss=0.000, reward_mean=0.0, reward_bound=0.0
66048: loss=0.000, reward_mean=0.0, reward_bound=0.0
66049: loss=0.000, reward_mean=0.0, reward_bound=0.0
66050: loss=0.000, reward_mean=0.0, reward_bound=0.0
66051: loss=0.000, reward_mean=0.1, reward_bound=0.0
66052: loss=0.000, reward_mean=0.1, reward_bound=0.0
66053: loss=0.000, reward_mean=0.0, reward_bound=0.0
66054: loss=0.000, reward_mean=0.0, reward_bound=0.0
66055: loss=0.000, reward_mean=0.0, reward_bound=0.0
66056: loss=0.000, reward_mean=0.1, reward_bound=0.0
66057: loss=0.000, reward_mean=0.2, reward_bound=0.0
66058: loss=0.000, reward_mean=0.1, reward_bound=0.0
66059: loss=0.000, reward_mean=0.1, reward_bound=0.0
66060: loss=0.000, reward_mean=0.0, reward_bound=0.0
66061: loss=0.000, reward_mean=0.1, reward_bound=0.0
66062: loss=0.000, reward_mean=0.1, reward_bou

66199: loss=0.000, reward_mean=0.1, reward_bound=0.0
66200: loss=0.000, reward_mean=0.1, reward_bound=0.0
66201: loss=0.000, reward_mean=0.1, reward_bound=0.0
66202: loss=0.000, reward_mean=0.1, reward_bound=0.0
66203: loss=0.000, reward_mean=0.1, reward_bound=0.0
66204: loss=0.000, reward_mean=0.1, reward_bound=0.0
66205: loss=0.000, reward_mean=0.1, reward_bound=0.0
66206: loss=0.000, reward_mean=0.0, reward_bound=0.0
66207: loss=0.000, reward_mean=0.1, reward_bound=0.0
66208: loss=0.000, reward_mean=0.0, reward_bound=0.0
66209: loss=0.000, reward_mean=0.0, reward_bound=0.0
66210: loss=0.000, reward_mean=0.0, reward_bound=0.0
66211: loss=0.000, reward_mean=0.1, reward_bound=0.0
66212: loss=0.000, reward_mean=0.1, reward_bound=0.0
66213: loss=0.000, reward_mean=0.0, reward_bound=0.0
66214: loss=0.000, reward_mean=0.0, reward_bound=0.0
66215: loss=0.000, reward_mean=0.0, reward_bound=0.0
66216: loss=0.000, reward_mean=0.2, reward_bound=0.0
66217: loss=0.000, reward_mean=0.0, reward_bou

66354: loss=0.000, reward_mean=0.1, reward_bound=0.0
66355: loss=0.000, reward_mean=0.0, reward_bound=0.0
66356: loss=0.000, reward_mean=0.0, reward_bound=0.0
66357: loss=0.000, reward_mean=0.0, reward_bound=0.0
66358: loss=0.000, reward_mean=0.1, reward_bound=0.0
66359: loss=0.000, reward_mean=0.1, reward_bound=0.0
66360: loss=0.000, reward_mean=0.1, reward_bound=0.0
66361: loss=0.000, reward_mean=0.0, reward_bound=0.0
66362: loss=0.000, reward_mean=0.1, reward_bound=0.0
66363: loss=0.000, reward_mean=0.2, reward_bound=0.0
66364: loss=0.000, reward_mean=0.1, reward_bound=0.0
66365: loss=0.000, reward_mean=0.1, reward_bound=0.0
66366: loss=0.000, reward_mean=0.1, reward_bound=0.0
66367: loss=0.000, reward_mean=0.1, reward_bound=0.0
66368: loss=0.000, reward_mean=0.0, reward_bound=0.0
66369: loss=0.000, reward_mean=0.1, reward_bound=0.0
66370: loss=0.000, reward_mean=0.1, reward_bound=0.0
66371: loss=0.000, reward_mean=0.1, reward_bound=0.0
66372: loss=0.000, reward_mean=0.0, reward_bou

66511: loss=0.000, reward_mean=0.0, reward_bound=0.0
66512: loss=0.000, reward_mean=0.0, reward_bound=0.0
66513: loss=0.000, reward_mean=0.0, reward_bound=0.0
66514: loss=0.000, reward_mean=0.1, reward_bound=0.0
66515: loss=0.000, reward_mean=0.0, reward_bound=0.0
66516: loss=0.000, reward_mean=0.1, reward_bound=0.0
66517: loss=0.000, reward_mean=0.1, reward_bound=0.0
66518: loss=0.000, reward_mean=0.1, reward_bound=0.0
66519: loss=0.000, reward_mean=0.0, reward_bound=0.0
66520: loss=0.000, reward_mean=0.0, reward_bound=0.0
66521: loss=0.000, reward_mean=0.0, reward_bound=0.0
66522: loss=0.000, reward_mean=0.0, reward_bound=0.0
66523: loss=0.000, reward_mean=0.1, reward_bound=0.0
66524: loss=0.000, reward_mean=0.1, reward_bound=0.0
66525: loss=0.000, reward_mean=0.1, reward_bound=0.0
66526: loss=0.000, reward_mean=0.1, reward_bound=0.0
66527: loss=0.000, reward_mean=0.2, reward_bound=0.0
66528: loss=0.000, reward_mean=0.0, reward_bound=0.0
66529: loss=0.000, reward_mean=0.1, reward_bou

66666: loss=0.000, reward_mean=0.1, reward_bound=0.0
66667: loss=0.000, reward_mean=0.1, reward_bound=0.0
66668: loss=0.000, reward_mean=0.1, reward_bound=0.0
66669: loss=0.000, reward_mean=0.0, reward_bound=0.0
66670: loss=0.000, reward_mean=0.0, reward_bound=0.0
66671: loss=0.000, reward_mean=0.1, reward_bound=0.0
66672: loss=0.000, reward_mean=0.0, reward_bound=0.0
66673: loss=0.000, reward_mean=0.0, reward_bound=0.0
66674: loss=0.000, reward_mean=0.0, reward_bound=0.0
66675: loss=0.000, reward_mean=0.1, reward_bound=0.0
66676: loss=0.000, reward_mean=0.0, reward_bound=0.0
66677: loss=0.000, reward_mean=0.0, reward_bound=0.0
66678: loss=0.000, reward_mean=0.0, reward_bound=0.0
66679: loss=0.000, reward_mean=0.1, reward_bound=0.0
66680: loss=0.000, reward_mean=0.0, reward_bound=0.0
66681: loss=0.000, reward_mean=0.0, reward_bound=0.0
66682: loss=0.000, reward_mean=0.2, reward_bound=0.0
66683: loss=0.000, reward_mean=0.1, reward_bound=0.0
66684: loss=0.000, reward_mean=0.0, reward_bou

66821: loss=0.000, reward_mean=0.1, reward_bound=0.0
66822: loss=0.000, reward_mean=0.1, reward_bound=0.0
66823: loss=0.000, reward_mean=0.1, reward_bound=0.0
66824: loss=0.000, reward_mean=0.0, reward_bound=0.0
66825: loss=0.000, reward_mean=0.1, reward_bound=0.0
66826: loss=0.000, reward_mean=0.1, reward_bound=0.0
66827: loss=0.000, reward_mean=0.0, reward_bound=0.0
66828: loss=0.000, reward_mean=0.1, reward_bound=0.0
66829: loss=0.000, reward_mean=0.0, reward_bound=0.0
66830: loss=0.000, reward_mean=0.1, reward_bound=0.0
66831: loss=0.000, reward_mean=0.1, reward_bound=0.0
66832: loss=0.000, reward_mean=0.1, reward_bound=0.0
66833: loss=0.000, reward_mean=0.0, reward_bound=0.0
66834: loss=0.000, reward_mean=0.1, reward_bound=0.0
66835: loss=0.000, reward_mean=0.1, reward_bound=0.0
66836: loss=0.000, reward_mean=0.1, reward_bound=0.0
66837: loss=0.000, reward_mean=0.0, reward_bound=0.0
66838: loss=0.000, reward_mean=0.1, reward_bound=0.0
66839: loss=0.000, reward_mean=0.0, reward_bou

66977: loss=0.000, reward_mean=0.0, reward_bound=0.0
66978: loss=0.000, reward_mean=0.0, reward_bound=0.0
66979: loss=0.000, reward_mean=0.0, reward_bound=0.0
66980: loss=0.000, reward_mean=0.2, reward_bound=0.0
66981: loss=0.000, reward_mean=0.1, reward_bound=0.0
66982: loss=0.000, reward_mean=0.1, reward_bound=0.0
66983: loss=0.000, reward_mean=0.1, reward_bound=0.0
66984: loss=0.000, reward_mean=0.1, reward_bound=0.0
66985: loss=0.000, reward_mean=0.1, reward_bound=0.0
66986: loss=0.000, reward_mean=0.1, reward_bound=0.0
66987: loss=0.000, reward_mean=0.0, reward_bound=0.0
66988: loss=0.000, reward_mean=0.1, reward_bound=0.0
66989: loss=0.000, reward_mean=0.1, reward_bound=0.0
66990: loss=0.000, reward_mean=0.1, reward_bound=0.0
66991: loss=0.000, reward_mean=0.1, reward_bound=0.0
66992: loss=0.000, reward_mean=0.1, reward_bound=0.0
66993: loss=0.000, reward_mean=0.1, reward_bound=0.0
66994: loss=0.000, reward_mean=0.1, reward_bound=0.0
66995: loss=0.000, reward_mean=0.0, reward_bou

67132: loss=0.000, reward_mean=0.0, reward_bound=0.0
67133: loss=0.000, reward_mean=0.0, reward_bound=0.0
67134: loss=0.000, reward_mean=0.2, reward_bound=0.0
67135: loss=0.000, reward_mean=0.1, reward_bound=0.0
67136: loss=0.000, reward_mean=0.1, reward_bound=0.0
67137: loss=0.000, reward_mean=0.0, reward_bound=0.0
67138: loss=0.000, reward_mean=0.1, reward_bound=0.0
67139: loss=0.000, reward_mean=0.0, reward_bound=0.0
67140: loss=0.000, reward_mean=0.0, reward_bound=0.0
67141: loss=0.000, reward_mean=0.0, reward_bound=0.0
67142: loss=0.000, reward_mean=0.1, reward_bound=0.0
67143: loss=0.000, reward_mean=0.0, reward_bound=0.0
67144: loss=0.000, reward_mean=0.2, reward_bound=0.0
67145: loss=0.000, reward_mean=0.1, reward_bound=0.0
67146: loss=0.000, reward_mean=0.0, reward_bound=0.0
67147: loss=0.000, reward_mean=0.0, reward_bound=0.0
67148: loss=0.000, reward_mean=0.0, reward_bound=0.0
67149: loss=0.000, reward_mean=0.0, reward_bound=0.0
67150: loss=0.000, reward_mean=0.1, reward_bou

67287: loss=0.000, reward_mean=0.0, reward_bound=0.0
67288: loss=0.000, reward_mean=0.0, reward_bound=0.0
67289: loss=0.000, reward_mean=0.1, reward_bound=0.0
67290: loss=0.000, reward_mean=0.0, reward_bound=0.0
67291: loss=0.000, reward_mean=0.1, reward_bound=0.0
67292: loss=0.000, reward_mean=0.0, reward_bound=0.0
67293: loss=0.000, reward_mean=0.1, reward_bound=0.0
67294: loss=0.000, reward_mean=0.1, reward_bound=0.0
67295: loss=0.000, reward_mean=0.1, reward_bound=0.0
67296: loss=0.000, reward_mean=0.1, reward_bound=0.0
67297: loss=0.000, reward_mean=0.2, reward_bound=0.0
67298: loss=0.000, reward_mean=0.0, reward_bound=0.0
67299: loss=0.000, reward_mean=0.1, reward_bound=0.0
67300: loss=0.000, reward_mean=0.1, reward_bound=0.0
67301: loss=0.000, reward_mean=0.0, reward_bound=0.0
67302: loss=0.000, reward_mean=0.1, reward_bound=0.0
67303: loss=0.000, reward_mean=0.1, reward_bound=0.0
67304: loss=0.000, reward_mean=0.1, reward_bound=0.0
67305: loss=0.000, reward_mean=0.0, reward_bou

67444: loss=0.000, reward_mean=0.1, reward_bound=0.0
67445: loss=0.000, reward_mean=0.1, reward_bound=0.0
67446: loss=0.000, reward_mean=0.1, reward_bound=0.0
67447: loss=0.000, reward_mean=0.0, reward_bound=0.0
67448: loss=0.000, reward_mean=0.1, reward_bound=0.0
67449: loss=0.000, reward_mean=0.1, reward_bound=0.0
67450: loss=0.000, reward_mean=0.1, reward_bound=0.0
67451: loss=0.000, reward_mean=0.0, reward_bound=0.0
67452: loss=0.000, reward_mean=0.0, reward_bound=0.0
67453: loss=0.000, reward_mean=0.0, reward_bound=0.0
67454: loss=0.000, reward_mean=0.1, reward_bound=0.0
67455: loss=0.000, reward_mean=0.1, reward_bound=0.0
67456: loss=0.000, reward_mean=0.1, reward_bound=0.0
67457: loss=0.000, reward_mean=0.0, reward_bound=0.0
67458: loss=0.000, reward_mean=0.1, reward_bound=0.0
67459: loss=0.000, reward_mean=0.1, reward_bound=0.0
67460: loss=0.000, reward_mean=0.1, reward_bound=0.0
67461: loss=0.000, reward_mean=0.1, reward_bound=0.0
67462: loss=0.000, reward_mean=0.1, reward_bou

67602: loss=0.000, reward_mean=0.1, reward_bound=0.0
67603: loss=0.000, reward_mean=0.1, reward_bound=0.0
67604: loss=0.000, reward_mean=0.0, reward_bound=0.0
67605: loss=0.000, reward_mean=0.1, reward_bound=0.0
67606: loss=0.000, reward_mean=0.0, reward_bound=0.0
67607: loss=0.000, reward_mean=0.0, reward_bound=0.0
67608: loss=0.000, reward_mean=0.1, reward_bound=0.0
67609: loss=0.000, reward_mean=0.0, reward_bound=0.0
67610: loss=0.000, reward_mean=0.0, reward_bound=0.0
67611: loss=0.000, reward_mean=0.0, reward_bound=0.0
67612: loss=0.000, reward_mean=0.0, reward_bound=0.0
67613: loss=0.000, reward_mean=0.1, reward_bound=0.0
67614: loss=0.000, reward_mean=0.2, reward_bound=0.0
67615: loss=0.000, reward_mean=0.0, reward_bound=0.0
67616: loss=0.000, reward_mean=0.0, reward_bound=0.0
67617: loss=0.000, reward_mean=0.0, reward_bound=0.0
67618: loss=0.000, reward_mean=0.0, reward_bound=0.0
67619: loss=0.000, reward_mean=0.1, reward_bound=0.0
67620: loss=0.000, reward_mean=0.0, reward_bou

67757: loss=0.000, reward_mean=0.0, reward_bound=0.0
67758: loss=0.000, reward_mean=0.0, reward_bound=0.0
67759: loss=0.000, reward_mean=0.0, reward_bound=0.0
67760: loss=0.000, reward_mean=0.0, reward_bound=0.0
67761: loss=0.000, reward_mean=0.1, reward_bound=0.0
67762: loss=0.000, reward_mean=0.0, reward_bound=0.0
67763: loss=0.000, reward_mean=0.0, reward_bound=0.0
67764: loss=0.000, reward_mean=0.0, reward_bound=0.0
67765: loss=0.000, reward_mean=0.1, reward_bound=0.0
67766: loss=0.000, reward_mean=0.1, reward_bound=0.0
67767: loss=0.000, reward_mean=0.0, reward_bound=0.0
67768: loss=0.000, reward_mean=0.1, reward_bound=0.0
67769: loss=0.000, reward_mean=0.1, reward_bound=0.0
67770: loss=0.000, reward_mean=0.1, reward_bound=0.0
67771: loss=0.000, reward_mean=0.0, reward_bound=0.0
67772: loss=0.000, reward_mean=0.0, reward_bound=0.0
67773: loss=0.000, reward_mean=0.0, reward_bound=0.0
67774: loss=0.000, reward_mean=0.0, reward_bound=0.0
67775: loss=0.000, reward_mean=0.1, reward_bou

67912: loss=0.000, reward_mean=0.0, reward_bound=0.0
67913: loss=0.000, reward_mean=0.1, reward_bound=0.0
67914: loss=0.000, reward_mean=0.1, reward_bound=0.0
67915: loss=0.000, reward_mean=0.1, reward_bound=0.0
67916: loss=0.000, reward_mean=0.1, reward_bound=0.0
67917: loss=0.000, reward_mean=0.1, reward_bound=0.0
67918: loss=0.000, reward_mean=0.0, reward_bound=0.0
67919: loss=0.000, reward_mean=0.2, reward_bound=0.0
67920: loss=0.000, reward_mean=0.1, reward_bound=0.0
67921: loss=0.000, reward_mean=0.0, reward_bound=0.0
67922: loss=0.000, reward_mean=0.0, reward_bound=0.0
67923: loss=0.000, reward_mean=0.0, reward_bound=0.0
67924: loss=0.000, reward_mean=0.0, reward_bound=0.0
67925: loss=0.000, reward_mean=0.0, reward_bound=0.0
67926: loss=0.000, reward_mean=0.1, reward_bound=0.0
67927: loss=0.000, reward_mean=0.1, reward_bound=0.0
67928: loss=0.000, reward_mean=0.1, reward_bound=0.0
67929: loss=0.000, reward_mean=0.1, reward_bound=0.0
67930: loss=0.000, reward_mean=0.1, reward_bou

68067: loss=0.000, reward_mean=0.1, reward_bound=0.0
68068: loss=0.000, reward_mean=0.1, reward_bound=0.0
68069: loss=0.000, reward_mean=0.1, reward_bound=0.0
68070: loss=0.000, reward_mean=0.0, reward_bound=0.0
68071: loss=0.000, reward_mean=0.1, reward_bound=0.0
68072: loss=0.000, reward_mean=0.0, reward_bound=0.0
68073: loss=0.000, reward_mean=0.1, reward_bound=0.0
68074: loss=0.000, reward_mean=0.1, reward_bound=0.0
68075: loss=0.000, reward_mean=0.1, reward_bound=0.0
68076: loss=0.000, reward_mean=0.1, reward_bound=0.0
68077: loss=0.000, reward_mean=0.0, reward_bound=0.0
68078: loss=0.000, reward_mean=0.0, reward_bound=0.0
68079: loss=0.000, reward_mean=0.1, reward_bound=0.0
68080: loss=0.000, reward_mean=0.0, reward_bound=0.0
68081: loss=0.000, reward_mean=0.2, reward_bound=0.0
68082: loss=0.000, reward_mean=0.2, reward_bound=0.0
68083: loss=0.000, reward_mean=0.1, reward_bound=0.0
68084: loss=0.000, reward_mean=0.0, reward_bound=0.0
68085: loss=0.000, reward_mean=0.1, reward_bou

68222: loss=0.000, reward_mean=0.1, reward_bound=0.0
68223: loss=0.000, reward_mean=0.0, reward_bound=0.0
68224: loss=0.000, reward_mean=0.0, reward_bound=0.0
68225: loss=0.000, reward_mean=0.0, reward_bound=0.0
68226: loss=0.000, reward_mean=0.1, reward_bound=0.0
68227: loss=0.000, reward_mean=0.1, reward_bound=0.0
68228: loss=0.000, reward_mean=0.0, reward_bound=0.0
68229: loss=0.000, reward_mean=0.1, reward_bound=0.0
68230: loss=0.000, reward_mean=0.0, reward_bound=0.0
68231: loss=0.000, reward_mean=0.0, reward_bound=0.0
68232: loss=0.000, reward_mean=0.1, reward_bound=0.0
68233: loss=0.000, reward_mean=0.0, reward_bound=0.0
68234: loss=0.000, reward_mean=0.0, reward_bound=0.0
68235: loss=0.000, reward_mean=0.1, reward_bound=0.0
68236: loss=0.000, reward_mean=0.1, reward_bound=0.0
68237: loss=0.000, reward_mean=0.1, reward_bound=0.0
68238: loss=0.000, reward_mean=0.0, reward_bound=0.0
68239: loss=0.000, reward_mean=0.1, reward_bound=0.0
68240: loss=0.000, reward_mean=0.0, reward_bou

68379: loss=0.000, reward_mean=0.1, reward_bound=0.0
68380: loss=0.000, reward_mean=0.0, reward_bound=0.0
68381: loss=0.000, reward_mean=0.2, reward_bound=0.0
68382: loss=0.000, reward_mean=0.0, reward_bound=0.0
68383: loss=0.000, reward_mean=0.1, reward_bound=0.0
68384: loss=0.000, reward_mean=0.0, reward_bound=0.0
68385: loss=0.000, reward_mean=0.1, reward_bound=0.0
68386: loss=0.000, reward_mean=0.1, reward_bound=0.0
68387: loss=0.000, reward_mean=0.2, reward_bound=0.0
68388: loss=0.000, reward_mean=0.1, reward_bound=0.0
68389: loss=0.000, reward_mean=0.1, reward_bound=0.0
68390: loss=0.000, reward_mean=0.1, reward_bound=0.0
68391: loss=0.000, reward_mean=0.1, reward_bound=0.0
68392: loss=0.000, reward_mean=0.2, reward_bound=0.0
68393: loss=0.000, reward_mean=0.1, reward_bound=0.0
68394: loss=0.000, reward_mean=0.1, reward_bound=0.0
68395: loss=0.000, reward_mean=0.1, reward_bound=0.0
68396: loss=0.000, reward_mean=0.0, reward_bound=0.0
68397: loss=0.000, reward_mean=0.0, reward_bou

68537: loss=0.000, reward_mean=0.0, reward_bound=0.0
68538: loss=0.000, reward_mean=0.1, reward_bound=0.0
68539: loss=0.000, reward_mean=0.1, reward_bound=0.0
68540: loss=0.000, reward_mean=0.1, reward_bound=0.0
68541: loss=0.000, reward_mean=0.0, reward_bound=0.0
68542: loss=0.000, reward_mean=0.0, reward_bound=0.0
68543: loss=0.000, reward_mean=0.0, reward_bound=0.0
68544: loss=0.000, reward_mean=0.0, reward_bound=0.0
68545: loss=0.000, reward_mean=0.0, reward_bound=0.0
68546: loss=0.000, reward_mean=0.1, reward_bound=0.0
68547: loss=0.000, reward_mean=0.1, reward_bound=0.0
68548: loss=0.000, reward_mean=0.0, reward_bound=0.0
68549: loss=0.000, reward_mean=0.1, reward_bound=0.0
68550: loss=0.000, reward_mean=0.1, reward_bound=0.0
68551: loss=0.000, reward_mean=0.1, reward_bound=0.0
68552: loss=0.000, reward_mean=0.1, reward_bound=0.0
68553: loss=0.000, reward_mean=0.1, reward_bound=0.0
68554: loss=0.000, reward_mean=0.1, reward_bound=0.0
68555: loss=0.000, reward_mean=0.0, reward_bou

68693: loss=0.000, reward_mean=0.0, reward_bound=0.0
68694: loss=0.000, reward_mean=0.0, reward_bound=0.0
68695: loss=0.000, reward_mean=0.1, reward_bound=0.0
68696: loss=0.000, reward_mean=0.1, reward_bound=0.0
68697: loss=0.000, reward_mean=0.0, reward_bound=0.0
68698: loss=0.000, reward_mean=0.1, reward_bound=0.0
68699: loss=0.000, reward_mean=0.1, reward_bound=0.0
68700: loss=0.000, reward_mean=0.0, reward_bound=0.0
68701: loss=0.000, reward_mean=0.0, reward_bound=0.0
68702: loss=0.000, reward_mean=0.0, reward_bound=0.0
68703: loss=0.000, reward_mean=0.1, reward_bound=0.0
68704: loss=0.000, reward_mean=0.1, reward_bound=0.0
68705: loss=0.000, reward_mean=0.1, reward_bound=0.0
68706: loss=0.000, reward_mean=0.1, reward_bound=0.0
68707: loss=0.000, reward_mean=0.1, reward_bound=0.0
68708: loss=0.000, reward_mean=0.1, reward_bound=0.0
68709: loss=0.000, reward_mean=0.0, reward_bound=0.0
68710: loss=0.000, reward_mean=0.1, reward_bound=0.0
68711: loss=0.000, reward_mean=0.0, reward_bou

68849: loss=0.000, reward_mean=0.0, reward_bound=0.0
68850: loss=0.000, reward_mean=0.1, reward_bound=0.0
68851: loss=0.000, reward_mean=0.1, reward_bound=0.0
68852: loss=0.000, reward_mean=0.0, reward_bound=0.0
68853: loss=0.000, reward_mean=0.0, reward_bound=0.0
68854: loss=0.000, reward_mean=0.1, reward_bound=0.0
68855: loss=0.000, reward_mean=0.0, reward_bound=0.0
68856: loss=0.000, reward_mean=0.0, reward_bound=0.0
68857: loss=0.000, reward_mean=0.0, reward_bound=0.0
68858: loss=0.000, reward_mean=0.0, reward_bound=0.0
68859: loss=0.000, reward_mean=0.0, reward_bound=0.0
68860: loss=0.000, reward_mean=0.1, reward_bound=0.0
68861: loss=0.000, reward_mean=0.1, reward_bound=0.0
68862: loss=0.000, reward_mean=0.1, reward_bound=0.0
68863: loss=0.000, reward_mean=0.0, reward_bound=0.0
68864: loss=0.000, reward_mean=0.0, reward_bound=0.0
68865: loss=0.000, reward_mean=0.1, reward_bound=0.0
68866: loss=0.000, reward_mean=0.1, reward_bound=0.0
68867: loss=0.000, reward_mean=0.1, reward_bou

69010: loss=0.000, reward_mean=0.1, reward_bound=0.0
69011: loss=0.000, reward_mean=0.1, reward_bound=0.0
69012: loss=0.000, reward_mean=0.1, reward_bound=0.0
69013: loss=0.000, reward_mean=0.0, reward_bound=0.0
69014: loss=0.000, reward_mean=0.1, reward_bound=0.0
69015: loss=0.000, reward_mean=0.1, reward_bound=0.0
69016: loss=0.000, reward_mean=0.0, reward_bound=0.0
69017: loss=0.000, reward_mean=0.1, reward_bound=0.0
69018: loss=0.000, reward_mean=0.0, reward_bound=0.0
69019: loss=0.000, reward_mean=0.1, reward_bound=0.0
69020: loss=0.000, reward_mean=0.1, reward_bound=0.0
69021: loss=0.000, reward_mean=0.1, reward_bound=0.0
69022: loss=0.000, reward_mean=0.1, reward_bound=0.0
69023: loss=0.000, reward_mean=0.2, reward_bound=0.0
69024: loss=0.000, reward_mean=0.0, reward_bound=0.0
69025: loss=0.000, reward_mean=0.0, reward_bound=0.0
69026: loss=0.000, reward_mean=0.1, reward_bound=0.0
69027: loss=0.000, reward_mean=0.1, reward_bound=0.0
69028: loss=0.000, reward_mean=0.0, reward_bou

69166: loss=0.000, reward_mean=0.0, reward_bound=0.0
69167: loss=0.000, reward_mean=0.0, reward_bound=0.0
69168: loss=0.000, reward_mean=0.1, reward_bound=0.0
69169: loss=0.000, reward_mean=0.0, reward_bound=0.0
69170: loss=0.000, reward_mean=0.0, reward_bound=0.0
69171: loss=0.000, reward_mean=0.1, reward_bound=0.0
69172: loss=0.000, reward_mean=0.1, reward_bound=0.0
69173: loss=0.000, reward_mean=0.1, reward_bound=0.0
69174: loss=0.000, reward_mean=0.0, reward_bound=0.0
69175: loss=0.000, reward_mean=0.1, reward_bound=0.0
69176: loss=0.000, reward_mean=0.0, reward_bound=0.0
69177: loss=0.000, reward_mean=0.0, reward_bound=0.0
69178: loss=0.000, reward_mean=0.1, reward_bound=0.0
69179: loss=0.000, reward_mean=0.0, reward_bound=0.0
69180: loss=0.000, reward_mean=0.1, reward_bound=0.0
69181: loss=0.000, reward_mean=0.1, reward_bound=0.0
69182: loss=0.000, reward_mean=0.1, reward_bound=0.0
69183: loss=0.000, reward_mean=0.0, reward_bound=0.0
69184: loss=0.000, reward_mean=0.1, reward_bou

69324: loss=0.000, reward_mean=0.1, reward_bound=0.0
69325: loss=0.000, reward_mean=0.0, reward_bound=0.0
69326: loss=0.000, reward_mean=0.1, reward_bound=0.0
69327: loss=0.000, reward_mean=0.1, reward_bound=0.0
69328: loss=0.000, reward_mean=0.1, reward_bound=0.0
69329: loss=0.000, reward_mean=0.0, reward_bound=0.0
69330: loss=0.000, reward_mean=0.1, reward_bound=0.0
69331: loss=0.000, reward_mean=0.1, reward_bound=0.0
69332: loss=0.000, reward_mean=0.0, reward_bound=0.0
69333: loss=0.000, reward_mean=0.0, reward_bound=0.0
69334: loss=0.000, reward_mean=0.1, reward_bound=0.0
69335: loss=0.000, reward_mean=0.1, reward_bound=0.0
69336: loss=0.000, reward_mean=0.0, reward_bound=0.0
69337: loss=0.000, reward_mean=0.1, reward_bound=0.0
69338: loss=0.000, reward_mean=0.0, reward_bound=0.0
69339: loss=0.000, reward_mean=0.0, reward_bound=0.0
69340: loss=0.000, reward_mean=0.0, reward_bound=0.0
69341: loss=0.000, reward_mean=0.0, reward_bound=0.0
69342: loss=0.000, reward_mean=0.1, reward_bou

69482: loss=0.000, reward_mean=0.2, reward_bound=0.0
69483: loss=0.000, reward_mean=0.0, reward_bound=0.0
69484: loss=0.000, reward_mean=0.1, reward_bound=0.0
69485: loss=0.000, reward_mean=0.0, reward_bound=0.0
69486: loss=0.000, reward_mean=0.1, reward_bound=0.0
69487: loss=0.000, reward_mean=0.0, reward_bound=0.0
69488: loss=0.000, reward_mean=0.1, reward_bound=0.0
69489: loss=0.000, reward_mean=0.0, reward_bound=0.0
69490: loss=0.000, reward_mean=0.2, reward_bound=0.0
69491: loss=0.000, reward_mean=0.1, reward_bound=0.0
69492: loss=0.000, reward_mean=0.1, reward_bound=0.0
69493: loss=0.000, reward_mean=0.0, reward_bound=0.0
69494: loss=0.000, reward_mean=0.1, reward_bound=0.0
69495: loss=0.000, reward_mean=0.0, reward_bound=0.0
69496: loss=0.000, reward_mean=0.1, reward_bound=0.0
69497: loss=0.000, reward_mean=0.0, reward_bound=0.0
69498: loss=0.000, reward_mean=0.1, reward_bound=0.0
69499: loss=0.000, reward_mean=0.2, reward_bound=0.0
69500: loss=0.000, reward_mean=0.1, reward_bou

69638: loss=0.000, reward_mean=0.0, reward_bound=0.0
69639: loss=0.000, reward_mean=0.1, reward_bound=0.0
69640: loss=0.000, reward_mean=0.0, reward_bound=0.0
69641: loss=0.000, reward_mean=0.1, reward_bound=0.0
69642: loss=0.000, reward_mean=0.1, reward_bound=0.0
69643: loss=0.000, reward_mean=0.0, reward_bound=0.0
69644: loss=0.000, reward_mean=0.1, reward_bound=0.0
69645: loss=0.000, reward_mean=0.1, reward_bound=0.0
69646: loss=0.000, reward_mean=0.1, reward_bound=0.0
69647: loss=0.000, reward_mean=0.1, reward_bound=0.0
69648: loss=0.000, reward_mean=0.2, reward_bound=0.0
69649: loss=0.000, reward_mean=0.1, reward_bound=0.0
69650: loss=0.000, reward_mean=0.1, reward_bound=0.0
69651: loss=0.000, reward_mean=0.1, reward_bound=0.0
69652: loss=0.000, reward_mean=0.0, reward_bound=0.0
69653: loss=0.000, reward_mean=0.0, reward_bound=0.0
69654: loss=0.000, reward_mean=0.1, reward_bound=0.0
69655: loss=0.000, reward_mean=0.0, reward_bound=0.0
69656: loss=0.000, reward_mean=0.0, reward_bou

69795: loss=0.000, reward_mean=0.0, reward_bound=0.0
69796: loss=0.000, reward_mean=0.0, reward_bound=0.0
69797: loss=0.000, reward_mean=0.0, reward_bound=0.0
69798: loss=0.000, reward_mean=0.0, reward_bound=0.0
69799: loss=0.000, reward_mean=0.1, reward_bound=0.0
69800: loss=0.000, reward_mean=0.1, reward_bound=0.0
69801: loss=0.000, reward_mean=0.1, reward_bound=0.0
69802: loss=0.000, reward_mean=0.0, reward_bound=0.0
69803: loss=0.000, reward_mean=0.0, reward_bound=0.0
69804: loss=0.000, reward_mean=0.1, reward_bound=0.0
69805: loss=0.000, reward_mean=0.0, reward_bound=0.0
69806: loss=0.000, reward_mean=0.0, reward_bound=0.0
69807: loss=0.000, reward_mean=0.0, reward_bound=0.0
69808: loss=0.000, reward_mean=0.0, reward_bound=0.0
69809: loss=0.000, reward_mean=0.3, reward_bound=0.5
69810: loss=0.000, reward_mean=0.1, reward_bound=0.0
69811: loss=0.000, reward_mean=0.1, reward_bound=0.0
69812: loss=0.000, reward_mean=0.0, reward_bound=0.0
69813: loss=0.000, reward_mean=0.0, reward_bou

69951: loss=0.000, reward_mean=0.0, reward_bound=0.0
69952: loss=0.000, reward_mean=0.0, reward_bound=0.0
69953: loss=0.000, reward_mean=0.1, reward_bound=0.0
69954: loss=0.000, reward_mean=0.0, reward_bound=0.0
69955: loss=0.000, reward_mean=0.1, reward_bound=0.0
69956: loss=0.000, reward_mean=0.1, reward_bound=0.0
69957: loss=0.000, reward_mean=0.1, reward_bound=0.0
69958: loss=0.000, reward_mean=0.0, reward_bound=0.0
69959: loss=0.000, reward_mean=0.0, reward_bound=0.0
69960: loss=0.000, reward_mean=0.1, reward_bound=0.0
69961: loss=0.000, reward_mean=0.0, reward_bound=0.0
69962: loss=0.000, reward_mean=0.1, reward_bound=0.0
69963: loss=0.000, reward_mean=0.0, reward_bound=0.0
69964: loss=0.000, reward_mean=0.1, reward_bound=0.0
69965: loss=0.000, reward_mean=0.1, reward_bound=0.0
69966: loss=0.000, reward_mean=0.0, reward_bound=0.0
69967: loss=0.000, reward_mean=0.2, reward_bound=0.0
69968: loss=0.000, reward_mean=0.0, reward_bound=0.0
69969: loss=0.000, reward_mean=0.1, reward_bou

70107: loss=0.000, reward_mean=0.1, reward_bound=0.0
70108: loss=0.000, reward_mean=0.2, reward_bound=0.0
70109: loss=0.000, reward_mean=0.0, reward_bound=0.0
70110: loss=0.000, reward_mean=0.2, reward_bound=0.0
70111: loss=0.000, reward_mean=0.0, reward_bound=0.0
70112: loss=0.000, reward_mean=0.1, reward_bound=0.0
70113: loss=0.000, reward_mean=0.1, reward_bound=0.0
70114: loss=0.000, reward_mean=0.1, reward_bound=0.0
70115: loss=0.000, reward_mean=0.0, reward_bound=0.0
70116: loss=0.000, reward_mean=0.1, reward_bound=0.0
70117: loss=0.000, reward_mean=0.1, reward_bound=0.0
70118: loss=0.000, reward_mean=0.0, reward_bound=0.0
70119: loss=0.000, reward_mean=0.1, reward_bound=0.0
70120: loss=0.000, reward_mean=0.1, reward_bound=0.0
70121: loss=0.000, reward_mean=0.0, reward_bound=0.0
70122: loss=0.000, reward_mean=0.1, reward_bound=0.0
70123: loss=0.000, reward_mean=0.1, reward_bound=0.0
70124: loss=0.000, reward_mean=0.1, reward_bound=0.0
70125: loss=0.000, reward_mean=0.1, reward_bou

70263: loss=0.000, reward_mean=0.1, reward_bound=0.0
70264: loss=0.000, reward_mean=0.0, reward_bound=0.0
70265: loss=0.000, reward_mean=0.1, reward_bound=0.0
70266: loss=0.000, reward_mean=0.1, reward_bound=0.0
70267: loss=0.000, reward_mean=0.1, reward_bound=0.0
70268: loss=0.000, reward_mean=0.1, reward_bound=0.0
70269: loss=0.000, reward_mean=0.0, reward_bound=0.0
70270: loss=0.000, reward_mean=0.0, reward_bound=0.0
70271: loss=0.000, reward_mean=0.0, reward_bound=0.0
70272: loss=0.000, reward_mean=0.1, reward_bound=0.0
70273: loss=0.000, reward_mean=0.1, reward_bound=0.0
70274: loss=0.000, reward_mean=0.1, reward_bound=0.0
70275: loss=0.000, reward_mean=0.0, reward_bound=0.0
70276: loss=0.000, reward_mean=0.1, reward_bound=0.0
70277: loss=0.000, reward_mean=0.0, reward_bound=0.0
70278: loss=0.000, reward_mean=0.1, reward_bound=0.0
70279: loss=0.000, reward_mean=0.0, reward_bound=0.0
70280: loss=0.000, reward_mean=0.0, reward_bound=0.0
70281: loss=0.000, reward_mean=0.0, reward_bou

70420: loss=0.000, reward_mean=0.1, reward_bound=0.0
70421: loss=0.000, reward_mean=0.0, reward_bound=0.0
70422: loss=0.000, reward_mean=0.0, reward_bound=0.0
70423: loss=0.000, reward_mean=0.1, reward_bound=0.0
70424: loss=0.000, reward_mean=0.1, reward_bound=0.0
70425: loss=0.000, reward_mean=0.0, reward_bound=0.0
70426: loss=0.000, reward_mean=0.1, reward_bound=0.0
70427: loss=0.000, reward_mean=0.0, reward_bound=0.0
70428: loss=0.000, reward_mean=0.2, reward_bound=0.0
70429: loss=0.000, reward_mean=0.2, reward_bound=0.0
70430: loss=0.000, reward_mean=0.0, reward_bound=0.0
70431: loss=0.000, reward_mean=0.1, reward_bound=0.0
70432: loss=0.000, reward_mean=0.0, reward_bound=0.0
70433: loss=0.000, reward_mean=0.0, reward_bound=0.0
70434: loss=0.000, reward_mean=0.0, reward_bound=0.0
70435: loss=0.000, reward_mean=0.0, reward_bound=0.0
70436: loss=0.000, reward_mean=0.0, reward_bound=0.0
70437: loss=0.000, reward_mean=0.1, reward_bound=0.0
70438: loss=0.000, reward_mean=0.1, reward_bou

70578: loss=0.000, reward_mean=0.1, reward_bound=0.0
70579: loss=0.000, reward_mean=0.0, reward_bound=0.0
70580: loss=0.000, reward_mean=0.1, reward_bound=0.0
70581: loss=0.000, reward_mean=0.0, reward_bound=0.0
70582: loss=0.000, reward_mean=0.1, reward_bound=0.0
70583: loss=0.000, reward_mean=0.0, reward_bound=0.0
70584: loss=0.000, reward_mean=0.0, reward_bound=0.0
70585: loss=0.000, reward_mean=0.0, reward_bound=0.0
70586: loss=0.000, reward_mean=0.0, reward_bound=0.0
70587: loss=0.000, reward_mean=0.1, reward_bound=0.0
70588: loss=0.000, reward_mean=0.0, reward_bound=0.0
70589: loss=0.000, reward_mean=0.1, reward_bound=0.0
70590: loss=0.000, reward_mean=0.2, reward_bound=0.0
70591: loss=0.000, reward_mean=0.0, reward_bound=0.0
70592: loss=0.000, reward_mean=0.0, reward_bound=0.0
70593: loss=0.000, reward_mean=0.0, reward_bound=0.0
70594: loss=0.000, reward_mean=0.0, reward_bound=0.0
70595: loss=0.000, reward_mean=0.0, reward_bound=0.0
70596: loss=0.000, reward_mean=0.1, reward_bou

70736: loss=0.000, reward_mean=0.1, reward_bound=0.0
70737: loss=0.000, reward_mean=0.1, reward_bound=0.0
70738: loss=0.000, reward_mean=0.1, reward_bound=0.0
70739: loss=0.000, reward_mean=0.1, reward_bound=0.0
70740: loss=0.000, reward_mean=0.2, reward_bound=0.0
70741: loss=0.000, reward_mean=0.1, reward_bound=0.0
70742: loss=0.000, reward_mean=0.1, reward_bound=0.0
70743: loss=0.000, reward_mean=0.1, reward_bound=0.0
70744: loss=0.000, reward_mean=0.1, reward_bound=0.0
70745: loss=0.000, reward_mean=0.2, reward_bound=0.0
70746: loss=0.000, reward_mean=0.0, reward_bound=0.0
70747: loss=0.000, reward_mean=0.0, reward_bound=0.0
70748: loss=0.000, reward_mean=0.1, reward_bound=0.0
70749: loss=0.000, reward_mean=0.1, reward_bound=0.0
70750: loss=0.000, reward_mean=0.1, reward_bound=0.0
70751: loss=0.000, reward_mean=0.1, reward_bound=0.0
70752: loss=0.000, reward_mean=0.1, reward_bound=0.0
70753: loss=0.000, reward_mean=0.0, reward_bound=0.0
70754: loss=0.000, reward_mean=0.1, reward_bou

70891: loss=0.000, reward_mean=0.1, reward_bound=0.0
70892: loss=0.000, reward_mean=0.0, reward_bound=0.0
70893: loss=0.000, reward_mean=0.1, reward_bound=0.0
70894: loss=0.000, reward_mean=0.1, reward_bound=0.0
70895: loss=0.000, reward_mean=0.1, reward_bound=0.0
70896: loss=0.000, reward_mean=0.0, reward_bound=0.0
70897: loss=0.000, reward_mean=0.1, reward_bound=0.0
70898: loss=0.000, reward_mean=0.1, reward_bound=0.0
70899: loss=0.000, reward_mean=0.0, reward_bound=0.0
70900: loss=0.000, reward_mean=0.1, reward_bound=0.0
70901: loss=0.000, reward_mean=0.0, reward_bound=0.0
70902: loss=0.000, reward_mean=0.0, reward_bound=0.0
70903: loss=0.000, reward_mean=0.0, reward_bound=0.0
70904: loss=0.000, reward_mean=0.1, reward_bound=0.0
70905: loss=0.000, reward_mean=0.0, reward_bound=0.0
70906: loss=0.000, reward_mean=0.1, reward_bound=0.0
70907: loss=0.000, reward_mean=0.0, reward_bound=0.0
70908: loss=0.000, reward_mean=0.1, reward_bound=0.0
70909: loss=0.000, reward_mean=0.0, reward_bou

71048: loss=0.000, reward_mean=0.2, reward_bound=0.0
71049: loss=0.000, reward_mean=0.2, reward_bound=0.0
71050: loss=0.000, reward_mean=0.3, reward_bound=0.5
71051: loss=0.000, reward_mean=0.0, reward_bound=0.0
71052: loss=0.000, reward_mean=0.2, reward_bound=0.0
71053: loss=0.000, reward_mean=0.1, reward_bound=0.0
71054: loss=0.000, reward_mean=0.1, reward_bound=0.0
71055: loss=0.000, reward_mean=0.1, reward_bound=0.0
71056: loss=0.000, reward_mean=0.0, reward_bound=0.0
71057: loss=0.000, reward_mean=0.0, reward_bound=0.0
71058: loss=0.000, reward_mean=0.0, reward_bound=0.0
71059: loss=0.000, reward_mean=0.0, reward_bound=0.0
71060: loss=0.000, reward_mean=0.1, reward_bound=0.0
71061: loss=0.000, reward_mean=0.0, reward_bound=0.0
71062: loss=0.000, reward_mean=0.0, reward_bound=0.0
71063: loss=0.000, reward_mean=0.1, reward_bound=0.0
71064: loss=0.000, reward_mean=0.1, reward_bound=0.0
71065: loss=0.000, reward_mean=0.2, reward_bound=0.0
71066: loss=0.000, reward_mean=0.1, reward_bou

71204: loss=0.000, reward_mean=0.1, reward_bound=0.0
71205: loss=0.000, reward_mean=0.1, reward_bound=0.0
71206: loss=0.000, reward_mean=0.1, reward_bound=0.0
71207: loss=0.000, reward_mean=0.1, reward_bound=0.0
71208: loss=0.000, reward_mean=0.0, reward_bound=0.0
71209: loss=0.000, reward_mean=0.1, reward_bound=0.0
71210: loss=0.000, reward_mean=0.0, reward_bound=0.0
71211: loss=0.000, reward_mean=0.1, reward_bound=0.0
71212: loss=0.000, reward_mean=0.0, reward_bound=0.0
71213: loss=0.000, reward_mean=0.0, reward_bound=0.0
71214: loss=0.000, reward_mean=0.1, reward_bound=0.0
71215: loss=0.000, reward_mean=0.1, reward_bound=0.0
71216: loss=0.000, reward_mean=0.0, reward_bound=0.0
71217: loss=0.000, reward_mean=0.1, reward_bound=0.0
71218: loss=0.000, reward_mean=0.1, reward_bound=0.0
71219: loss=0.000, reward_mean=0.1, reward_bound=0.0
71220: loss=0.000, reward_mean=0.1, reward_bound=0.0
71221: loss=0.000, reward_mean=0.0, reward_bound=0.0
71222: loss=0.000, reward_mean=0.1, reward_bou

71360: loss=0.000, reward_mean=0.0, reward_bound=0.0
71361: loss=0.000, reward_mean=0.1, reward_bound=0.0
71362: loss=0.000, reward_mean=0.0, reward_bound=0.0
71363: loss=0.000, reward_mean=0.1, reward_bound=0.0
71364: loss=0.000, reward_mean=0.1, reward_bound=0.0
71365: loss=0.000, reward_mean=0.0, reward_bound=0.0
71366: loss=0.000, reward_mean=0.1, reward_bound=0.0
71367: loss=0.000, reward_mean=0.1, reward_bound=0.0
71368: loss=0.000, reward_mean=0.1, reward_bound=0.0
71369: loss=0.000, reward_mean=0.0, reward_bound=0.0
71370: loss=0.000, reward_mean=0.1, reward_bound=0.0
71371: loss=0.000, reward_mean=0.0, reward_bound=0.0
71372: loss=0.000, reward_mean=0.1, reward_bound=0.0
71373: loss=0.000, reward_mean=0.1, reward_bound=0.0
71374: loss=0.000, reward_mean=0.0, reward_bound=0.0
71375: loss=0.000, reward_mean=0.0, reward_bound=0.0
71376: loss=0.000, reward_mean=0.1, reward_bound=0.0
71377: loss=0.000, reward_mean=0.0, reward_bound=0.0
71378: loss=0.000, reward_mean=0.0, reward_bou

71514: loss=0.000, reward_mean=0.0, reward_bound=0.0
71515: loss=0.000, reward_mean=0.1, reward_bound=0.0
71516: loss=0.000, reward_mean=0.0, reward_bound=0.0
71517: loss=0.000, reward_mean=0.0, reward_bound=0.0
71518: loss=0.000, reward_mean=0.1, reward_bound=0.0
71519: loss=0.000, reward_mean=0.0, reward_bound=0.0
71520: loss=0.000, reward_mean=0.0, reward_bound=0.0
71521: loss=0.000, reward_mean=0.0, reward_bound=0.0
71522: loss=0.000, reward_mean=0.1, reward_bound=0.0
71523: loss=0.000, reward_mean=0.1, reward_bound=0.0
71524: loss=0.000, reward_mean=0.1, reward_bound=0.0
71525: loss=0.000, reward_mean=0.1, reward_bound=0.0
71526: loss=0.000, reward_mean=0.1, reward_bound=0.0
71527: loss=0.000, reward_mean=0.0, reward_bound=0.0
71528: loss=0.000, reward_mean=0.1, reward_bound=0.0
71529: loss=0.000, reward_mean=0.0, reward_bound=0.0
71530: loss=0.000, reward_mean=0.1, reward_bound=0.0
71531: loss=0.000, reward_mean=0.2, reward_bound=0.0
71532: loss=0.000, reward_mean=0.1, reward_bou

71670: loss=0.000, reward_mean=0.1, reward_bound=0.0
71671: loss=0.000, reward_mean=0.1, reward_bound=0.0
71672: loss=0.000, reward_mean=0.1, reward_bound=0.0
71673: loss=0.000, reward_mean=0.0, reward_bound=0.0
71674: loss=0.000, reward_mean=0.1, reward_bound=0.0
71675: loss=0.000, reward_mean=0.1, reward_bound=0.0
71676: loss=0.000, reward_mean=0.0, reward_bound=0.0
71677: loss=0.000, reward_mean=0.0, reward_bound=0.0
71678: loss=0.000, reward_mean=0.2, reward_bound=0.0
71679: loss=0.000, reward_mean=0.0, reward_bound=0.0
71680: loss=0.000, reward_mean=0.0, reward_bound=0.0
71681: loss=0.000, reward_mean=0.0, reward_bound=0.0
71682: loss=0.000, reward_mean=0.0, reward_bound=0.0
71683: loss=0.000, reward_mean=0.1, reward_bound=0.0
71684: loss=0.000, reward_mean=0.0, reward_bound=0.0
71685: loss=0.000, reward_mean=0.1, reward_bound=0.0
71686: loss=0.000, reward_mean=0.0, reward_bound=0.0
71687: loss=0.000, reward_mean=0.1, reward_bound=0.0
71688: loss=0.000, reward_mean=0.0, reward_bou

71826: loss=0.000, reward_mean=0.0, reward_bound=0.0
71827: loss=0.000, reward_mean=0.1, reward_bound=0.0
71828: loss=0.000, reward_mean=0.0, reward_bound=0.0
71829: loss=0.000, reward_mean=0.1, reward_bound=0.0
71830: loss=0.000, reward_mean=0.0, reward_bound=0.0
71831: loss=0.000, reward_mean=0.1, reward_bound=0.0
71832: loss=0.000, reward_mean=0.1, reward_bound=0.0
71833: loss=0.000, reward_mean=0.0, reward_bound=0.0
71834: loss=0.000, reward_mean=0.1, reward_bound=0.0
71835: loss=0.000, reward_mean=0.0, reward_bound=0.0
71836: loss=0.000, reward_mean=0.0, reward_bound=0.0
71837: loss=0.000, reward_mean=0.0, reward_bound=0.0
71838: loss=0.000, reward_mean=0.1, reward_bound=0.0
71839: loss=0.000, reward_mean=0.0, reward_bound=0.0
71840: loss=0.000, reward_mean=0.1, reward_bound=0.0
71841: loss=0.000, reward_mean=0.1, reward_bound=0.0
71842: loss=0.000, reward_mean=0.0, reward_bound=0.0
71843: loss=0.000, reward_mean=0.0, reward_bound=0.0
71844: loss=0.000, reward_mean=0.1, reward_bou

71984: loss=0.000, reward_mean=0.2, reward_bound=0.0
71985: loss=0.000, reward_mean=0.0, reward_bound=0.0
71986: loss=0.000, reward_mean=0.0, reward_bound=0.0
71987: loss=0.000, reward_mean=0.1, reward_bound=0.0
71988: loss=0.000, reward_mean=0.1, reward_bound=0.0
71989: loss=0.000, reward_mean=0.0, reward_bound=0.0
71990: loss=0.000, reward_mean=0.0, reward_bound=0.0
71991: loss=0.000, reward_mean=0.0, reward_bound=0.0
71992: loss=0.000, reward_mean=0.1, reward_bound=0.0
71993: loss=0.000, reward_mean=0.1, reward_bound=0.0
71994: loss=0.000, reward_mean=0.1, reward_bound=0.0
71995: loss=0.000, reward_mean=0.1, reward_bound=0.0
71996: loss=0.000, reward_mean=0.1, reward_bound=0.0
71997: loss=0.000, reward_mean=0.1, reward_bound=0.0
71998: loss=0.000, reward_mean=0.0, reward_bound=0.0
71999: loss=0.000, reward_mean=0.0, reward_bound=0.0
72000: loss=0.000, reward_mean=0.0, reward_bound=0.0
72001: loss=0.000, reward_mean=0.0, reward_bound=0.0
72002: loss=0.000, reward_mean=0.1, reward_bou

72139: loss=0.000, reward_mean=0.1, reward_bound=0.0
72140: loss=0.000, reward_mean=0.0, reward_bound=0.0
72141: loss=0.000, reward_mean=0.0, reward_bound=0.0
72142: loss=0.000, reward_mean=0.0, reward_bound=0.0
72143: loss=0.000, reward_mean=0.0, reward_bound=0.0
72144: loss=0.000, reward_mean=0.0, reward_bound=0.0
72145: loss=0.000, reward_mean=0.0, reward_bound=0.0
72146: loss=0.000, reward_mean=0.1, reward_bound=0.0
72147: loss=0.000, reward_mean=0.1, reward_bound=0.0
72148: loss=0.000, reward_mean=0.0, reward_bound=0.0
72149: loss=0.000, reward_mean=0.1, reward_bound=0.0
72150: loss=0.000, reward_mean=0.1, reward_bound=0.0
72151: loss=0.000, reward_mean=0.0, reward_bound=0.0
72152: loss=0.000, reward_mean=0.0, reward_bound=0.0
72153: loss=0.000, reward_mean=0.1, reward_bound=0.0
72154: loss=0.000, reward_mean=0.1, reward_bound=0.0
72155: loss=0.000, reward_mean=0.2, reward_bound=0.0
72156: loss=0.000, reward_mean=0.1, reward_bound=0.0
72157: loss=0.000, reward_mean=0.1, reward_bou

72298: loss=0.000, reward_mean=0.1, reward_bound=0.0
72299: loss=0.000, reward_mean=0.1, reward_bound=0.0
72300: loss=0.000, reward_mean=0.1, reward_bound=0.0
72301: loss=0.000, reward_mean=0.1, reward_bound=0.0
72302: loss=0.000, reward_mean=0.0, reward_bound=0.0
72303: loss=0.000, reward_mean=0.0, reward_bound=0.0
72304: loss=0.000, reward_mean=0.0, reward_bound=0.0
72305: loss=0.000, reward_mean=0.1, reward_bound=0.0
72306: loss=0.000, reward_mean=0.1, reward_bound=0.0
72307: loss=0.000, reward_mean=0.0, reward_bound=0.0
72308: loss=0.000, reward_mean=0.0, reward_bound=0.0
72309: loss=0.000, reward_mean=0.1, reward_bound=0.0
72310: loss=0.000, reward_mean=0.1, reward_bound=0.0
72311: loss=0.000, reward_mean=0.0, reward_bound=0.0
72312: loss=0.000, reward_mean=0.0, reward_bound=0.0
72313: loss=0.000, reward_mean=0.0, reward_bound=0.0
72314: loss=0.000, reward_mean=0.0, reward_bound=0.0
72315: loss=0.000, reward_mean=0.0, reward_bound=0.0
72316: loss=0.000, reward_mean=0.1, reward_bou

72456: loss=0.000, reward_mean=0.1, reward_bound=0.0
72457: loss=0.000, reward_mean=0.0, reward_bound=0.0
72458: loss=0.000, reward_mean=0.1, reward_bound=0.0
72459: loss=0.000, reward_mean=0.1, reward_bound=0.0
72460: loss=0.000, reward_mean=0.0, reward_bound=0.0
72461: loss=0.000, reward_mean=0.1, reward_bound=0.0
72462: loss=0.000, reward_mean=0.0, reward_bound=0.0
72463: loss=0.000, reward_mean=0.0, reward_bound=0.0
72464: loss=0.000, reward_mean=0.0, reward_bound=0.0
72465: loss=0.000, reward_mean=0.0, reward_bound=0.0
72466: loss=0.000, reward_mean=0.1, reward_bound=0.0
72467: loss=0.000, reward_mean=0.0, reward_bound=0.0
72468: loss=0.000, reward_mean=0.1, reward_bound=0.0
72469: loss=0.000, reward_mean=0.1, reward_bound=0.0
72470: loss=0.000, reward_mean=0.1, reward_bound=0.0
72471: loss=0.000, reward_mean=0.0, reward_bound=0.0
72472: loss=0.000, reward_mean=0.0, reward_bound=0.0
72473: loss=0.000, reward_mean=0.1, reward_bound=0.0
72474: loss=0.000, reward_mean=0.1, reward_bou

72612: loss=0.000, reward_mean=0.0, reward_bound=0.0
72613: loss=0.000, reward_mean=0.0, reward_bound=0.0
72614: loss=0.000, reward_mean=0.0, reward_bound=0.0
72615: loss=0.000, reward_mean=0.1, reward_bound=0.0
72616: loss=0.000, reward_mean=0.1, reward_bound=0.0
72617: loss=0.000, reward_mean=0.1, reward_bound=0.0
72618: loss=0.000, reward_mean=0.1, reward_bound=0.0
72619: loss=0.000, reward_mean=0.0, reward_bound=0.0
72620: loss=0.000, reward_mean=0.1, reward_bound=0.0
72621: loss=0.000, reward_mean=0.2, reward_bound=0.0
72622: loss=0.000, reward_mean=0.2, reward_bound=0.0
72623: loss=0.000, reward_mean=0.1, reward_bound=0.0
72624: loss=0.000, reward_mean=0.1, reward_bound=0.0
72625: loss=0.000, reward_mean=0.0, reward_bound=0.0
72626: loss=0.000, reward_mean=0.1, reward_bound=0.0
72627: loss=0.000, reward_mean=0.0, reward_bound=0.0
72628: loss=0.000, reward_mean=0.0, reward_bound=0.0
72629: loss=0.000, reward_mean=0.1, reward_bound=0.0
72630: loss=0.000, reward_mean=0.1, reward_bou

72770: loss=0.000, reward_mean=0.0, reward_bound=0.0
72771: loss=0.000, reward_mean=0.0, reward_bound=0.0
72772: loss=0.000, reward_mean=0.2, reward_bound=0.0
72773: loss=0.000, reward_mean=0.1, reward_bound=0.0
72774: loss=0.000, reward_mean=0.0, reward_bound=0.0
72775: loss=0.000, reward_mean=0.0, reward_bound=0.0
72776: loss=0.000, reward_mean=0.0, reward_bound=0.0
72777: loss=0.000, reward_mean=0.1, reward_bound=0.0
72778: loss=0.000, reward_mean=0.1, reward_bound=0.0
72779: loss=0.000, reward_mean=0.1, reward_bound=0.0
72780: loss=0.000, reward_mean=0.1, reward_bound=0.0
72781: loss=0.000, reward_mean=0.0, reward_bound=0.0
72782: loss=0.000, reward_mean=0.0, reward_bound=0.0
72783: loss=0.000, reward_mean=0.0, reward_bound=0.0
72784: loss=0.000, reward_mean=0.0, reward_bound=0.0
72785: loss=0.000, reward_mean=0.0, reward_bound=0.0
72786: loss=0.000, reward_mean=0.0, reward_bound=0.0
72787: loss=0.000, reward_mean=0.1, reward_bound=0.0
72788: loss=0.000, reward_mean=0.1, reward_bou

72926: loss=0.000, reward_mean=0.0, reward_bound=0.0
72927: loss=0.000, reward_mean=0.1, reward_bound=0.0
72928: loss=0.000, reward_mean=0.0, reward_bound=0.0
72929: loss=0.000, reward_mean=0.2, reward_bound=0.0
72930: loss=0.000, reward_mean=0.1, reward_bound=0.0
72931: loss=0.000, reward_mean=0.0, reward_bound=0.0
72932: loss=0.000, reward_mean=0.0, reward_bound=0.0
72933: loss=0.000, reward_mean=0.1, reward_bound=0.0
72934: loss=0.000, reward_mean=0.1, reward_bound=0.0
72935: loss=0.000, reward_mean=0.2, reward_bound=0.0
72936: loss=0.000, reward_mean=0.1, reward_bound=0.0
72937: loss=0.000, reward_mean=0.1, reward_bound=0.0
72938: loss=0.000, reward_mean=0.1, reward_bound=0.0
72939: loss=0.000, reward_mean=0.0, reward_bound=0.0
72940: loss=0.000, reward_mean=0.0, reward_bound=0.0
72941: loss=0.000, reward_mean=0.0, reward_bound=0.0
72942: loss=0.000, reward_mean=0.1, reward_bound=0.0
72943: loss=0.000, reward_mean=0.0, reward_bound=0.0
72944: loss=0.000, reward_mean=0.0, reward_bou

73082: loss=0.000, reward_mean=0.0, reward_bound=0.0
73083: loss=0.000, reward_mean=0.1, reward_bound=0.0
73084: loss=0.000, reward_mean=0.1, reward_bound=0.0
73085: loss=0.000, reward_mean=0.0, reward_bound=0.0
73086: loss=0.000, reward_mean=0.1, reward_bound=0.0
73087: loss=0.000, reward_mean=0.1, reward_bound=0.0
73088: loss=0.000, reward_mean=0.1, reward_bound=0.0
73089: loss=0.000, reward_mean=0.0, reward_bound=0.0
73090: loss=0.000, reward_mean=0.0, reward_bound=0.0
73091: loss=0.000, reward_mean=0.0, reward_bound=0.0
73092: loss=0.000, reward_mean=0.0, reward_bound=0.0
73093: loss=0.000, reward_mean=0.1, reward_bound=0.0
73094: loss=0.000, reward_mean=0.0, reward_bound=0.0
73095: loss=0.000, reward_mean=0.1, reward_bound=0.0
73096: loss=0.000, reward_mean=0.0, reward_bound=0.0
73097: loss=0.000, reward_mean=0.1, reward_bound=0.0
73098: loss=0.000, reward_mean=0.1, reward_bound=0.0
73099: loss=0.000, reward_mean=0.1, reward_bound=0.0
73100: loss=0.000, reward_mean=0.0, reward_bou

73237: loss=0.000, reward_mean=0.1, reward_bound=0.0
73238: loss=0.000, reward_mean=0.1, reward_bound=0.0
73239: loss=0.000, reward_mean=0.0, reward_bound=0.0
73240: loss=0.000, reward_mean=0.1, reward_bound=0.0
73241: loss=0.000, reward_mean=0.2, reward_bound=0.0
73242: loss=0.000, reward_mean=0.0, reward_bound=0.0
73243: loss=0.000, reward_mean=0.1, reward_bound=0.0
73244: loss=0.000, reward_mean=0.0, reward_bound=0.0
73245: loss=0.000, reward_mean=0.0, reward_bound=0.0
73246: loss=0.000, reward_mean=0.1, reward_bound=0.0
73247: loss=0.000, reward_mean=0.1, reward_bound=0.0
73248: loss=0.000, reward_mean=0.0, reward_bound=0.0
73249: loss=0.000, reward_mean=0.1, reward_bound=0.0
73250: loss=0.000, reward_mean=0.0, reward_bound=0.0
73251: loss=0.000, reward_mean=0.1, reward_bound=0.0
73252: loss=0.000, reward_mean=0.1, reward_bound=0.0
73253: loss=0.000, reward_mean=0.1, reward_bound=0.0
73254: loss=0.000, reward_mean=0.0, reward_bound=0.0
73255: loss=0.000, reward_mean=0.1, reward_bou

73397: loss=0.000, reward_mean=0.1, reward_bound=0.0
73398: loss=0.000, reward_mean=0.1, reward_bound=0.0
73399: loss=0.000, reward_mean=0.1, reward_bound=0.0
73400: loss=0.000, reward_mean=0.0, reward_bound=0.0
73401: loss=0.000, reward_mean=0.0, reward_bound=0.0
73402: loss=0.000, reward_mean=0.1, reward_bound=0.0
73403: loss=0.000, reward_mean=0.1, reward_bound=0.0
73404: loss=0.000, reward_mean=0.1, reward_bound=0.0
73405: loss=0.000, reward_mean=0.1, reward_bound=0.0
73406: loss=0.000, reward_mean=0.2, reward_bound=0.0
73407: loss=0.000, reward_mean=0.0, reward_bound=0.0
73408: loss=0.000, reward_mean=0.1, reward_bound=0.0
73409: loss=0.000, reward_mean=0.0, reward_bound=0.0
73410: loss=0.000, reward_mean=0.1, reward_bound=0.0
73411: loss=0.000, reward_mean=0.0, reward_bound=0.0
73412: loss=0.000, reward_mean=0.0, reward_bound=0.0
73413: loss=0.000, reward_mean=0.1, reward_bound=0.0
73414: loss=0.000, reward_mean=0.1, reward_bound=0.0
73415: loss=0.000, reward_mean=0.0, reward_bou

73557: loss=0.000, reward_mean=0.1, reward_bound=0.0
73558: loss=0.000, reward_mean=0.1, reward_bound=0.0
73559: loss=0.000, reward_mean=0.2, reward_bound=0.0
73560: loss=0.000, reward_mean=0.1, reward_bound=0.0
73561: loss=0.000, reward_mean=0.0, reward_bound=0.0
73562: loss=0.000, reward_mean=0.0, reward_bound=0.0
73563: loss=0.000, reward_mean=0.1, reward_bound=0.0
73564: loss=0.000, reward_mean=0.1, reward_bound=0.0
73565: loss=0.000, reward_mean=0.1, reward_bound=0.0
73566: loss=0.000, reward_mean=0.1, reward_bound=0.0
73567: loss=0.000, reward_mean=0.0, reward_bound=0.0
73568: loss=0.000, reward_mean=0.1, reward_bound=0.0
73569: loss=0.000, reward_mean=0.0, reward_bound=0.0
73570: loss=0.000, reward_mean=0.0, reward_bound=0.0
73571: loss=0.000, reward_mean=0.0, reward_bound=0.0
73572: loss=0.000, reward_mean=0.0, reward_bound=0.0
73573: loss=0.000, reward_mean=0.1, reward_bound=0.0
73574: loss=0.000, reward_mean=0.1, reward_bound=0.0
73575: loss=0.000, reward_mean=0.1, reward_bou

73713: loss=0.000, reward_mean=0.1, reward_bound=0.0
73714: loss=0.000, reward_mean=0.0, reward_bound=0.0
73715: loss=0.000, reward_mean=0.1, reward_bound=0.0
73716: loss=0.000, reward_mean=0.0, reward_bound=0.0
73717: loss=0.000, reward_mean=0.0, reward_bound=0.0
73718: loss=0.000, reward_mean=0.0, reward_bound=0.0
73719: loss=0.000, reward_mean=0.1, reward_bound=0.0
73720: loss=0.000, reward_mean=0.1, reward_bound=0.0
73721: loss=0.000, reward_mean=0.1, reward_bound=0.0
73722: loss=0.000, reward_mean=0.1, reward_bound=0.0
73723: loss=0.000, reward_mean=0.0, reward_bound=0.0
73724: loss=0.000, reward_mean=0.0, reward_bound=0.0
73725: loss=0.000, reward_mean=0.1, reward_bound=0.0
73726: loss=0.000, reward_mean=0.0, reward_bound=0.0
73727: loss=0.000, reward_mean=0.0, reward_bound=0.0
73728: loss=0.000, reward_mean=0.1, reward_bound=0.0
73729: loss=0.000, reward_mean=0.1, reward_bound=0.0
73730: loss=0.000, reward_mean=0.2, reward_bound=0.0
73731: loss=0.000, reward_mean=0.1, reward_bou

73870: loss=0.000, reward_mean=0.0, reward_bound=0.0
73871: loss=0.000, reward_mean=0.1, reward_bound=0.0
73872: loss=0.000, reward_mean=0.1, reward_bound=0.0
73873: loss=0.000, reward_mean=0.0, reward_bound=0.0
73874: loss=0.000, reward_mean=0.0, reward_bound=0.0
73875: loss=0.000, reward_mean=0.1, reward_bound=0.0
73876: loss=0.000, reward_mean=0.0, reward_bound=0.0
73877: loss=0.000, reward_mean=0.1, reward_bound=0.0
73878: loss=0.000, reward_mean=0.1, reward_bound=0.0
73879: loss=0.000, reward_mean=0.1, reward_bound=0.0
73880: loss=0.000, reward_mean=0.1, reward_bound=0.0
73881: loss=0.000, reward_mean=0.0, reward_bound=0.0
73882: loss=0.000, reward_mean=0.2, reward_bound=0.0
73883: loss=0.000, reward_mean=0.0, reward_bound=0.0
73884: loss=0.000, reward_mean=0.1, reward_bound=0.0
73885: loss=0.000, reward_mean=0.1, reward_bound=0.0
73886: loss=0.000, reward_mean=0.0, reward_bound=0.0
73887: loss=0.000, reward_mean=0.0, reward_bound=0.0
73888: loss=0.000, reward_mean=0.1, reward_bou

74027: loss=0.000, reward_mean=0.0, reward_bound=0.0
74028: loss=0.000, reward_mean=0.2, reward_bound=0.0
74029: loss=0.000, reward_mean=0.1, reward_bound=0.0
74030: loss=0.000, reward_mean=0.1, reward_bound=0.0
74031: loss=0.000, reward_mean=0.0, reward_bound=0.0
74032: loss=0.000, reward_mean=0.1, reward_bound=0.0
74033: loss=0.000, reward_mean=0.0, reward_bound=0.0
74034: loss=0.000, reward_mean=0.0, reward_bound=0.0
74035: loss=0.000, reward_mean=0.1, reward_bound=0.0
74036: loss=0.000, reward_mean=0.0, reward_bound=0.0
74037: loss=0.000, reward_mean=0.1, reward_bound=0.0
74038: loss=0.000, reward_mean=0.1, reward_bound=0.0
74039: loss=0.000, reward_mean=0.2, reward_bound=0.0
74040: loss=0.000, reward_mean=0.2, reward_bound=0.0
74041: loss=0.000, reward_mean=0.0, reward_bound=0.0
74042: loss=0.000, reward_mean=0.1, reward_bound=0.0
74043: loss=0.000, reward_mean=0.1, reward_bound=0.0
74044: loss=0.000, reward_mean=0.1, reward_bound=0.0
74045: loss=0.000, reward_mean=0.1, reward_bou

74188: loss=0.000, reward_mean=0.0, reward_bound=0.0
74189: loss=0.000, reward_mean=0.0, reward_bound=0.0
74190: loss=0.000, reward_mean=0.1, reward_bound=0.0
74191: loss=0.000, reward_mean=0.0, reward_bound=0.0
74192: loss=0.000, reward_mean=0.1, reward_bound=0.0
74193: loss=0.000, reward_mean=0.1, reward_bound=0.0
74194: loss=0.000, reward_mean=0.1, reward_bound=0.0
74195: loss=0.000, reward_mean=0.0, reward_bound=0.0
74196: loss=0.000, reward_mean=0.1, reward_bound=0.0
74197: loss=0.000, reward_mean=0.0, reward_bound=0.0
74198: loss=0.000, reward_mean=0.1, reward_bound=0.0
74199: loss=0.000, reward_mean=0.1, reward_bound=0.0
74200: loss=0.000, reward_mean=0.0, reward_bound=0.0
74201: loss=0.000, reward_mean=0.1, reward_bound=0.0
74202: loss=0.000, reward_mean=0.1, reward_bound=0.0
74203: loss=0.000, reward_mean=0.0, reward_bound=0.0
74204: loss=0.000, reward_mean=0.0, reward_bound=0.0
74205: loss=0.000, reward_mean=0.0, reward_bound=0.0
74206: loss=0.000, reward_mean=0.0, reward_bou

74344: loss=0.000, reward_mean=0.0, reward_bound=0.0
74345: loss=0.000, reward_mean=0.1, reward_bound=0.0
74346: loss=0.000, reward_mean=0.0, reward_bound=0.0
74347: loss=0.000, reward_mean=0.0, reward_bound=0.0
74348: loss=0.000, reward_mean=0.0, reward_bound=0.0
74349: loss=0.000, reward_mean=0.1, reward_bound=0.0
74350: loss=0.000, reward_mean=0.1, reward_bound=0.0
74351: loss=0.000, reward_mean=0.1, reward_bound=0.0
74352: loss=0.000, reward_mean=0.0, reward_bound=0.0
74353: loss=0.000, reward_mean=0.0, reward_bound=0.0
74354: loss=0.000, reward_mean=0.1, reward_bound=0.0
74355: loss=0.000, reward_mean=0.0, reward_bound=0.0
74356: loss=0.000, reward_mean=0.1, reward_bound=0.0
74357: loss=0.000, reward_mean=0.1, reward_bound=0.0
74358: loss=0.000, reward_mean=0.0, reward_bound=0.0
74359: loss=0.000, reward_mean=0.0, reward_bound=0.0
74360: loss=0.000, reward_mean=0.2, reward_bound=0.0
74361: loss=0.000, reward_mean=0.0, reward_bound=0.0
74362: loss=0.000, reward_mean=0.1, reward_bou

74500: loss=0.000, reward_mean=0.2, reward_bound=0.0
74501: loss=0.000, reward_mean=0.0, reward_bound=0.0
74502: loss=0.000, reward_mean=0.1, reward_bound=0.0
74503: loss=0.000, reward_mean=0.1, reward_bound=0.0
74504: loss=0.000, reward_mean=0.1, reward_bound=0.0
74505: loss=0.000, reward_mean=0.0, reward_bound=0.0
74506: loss=0.000, reward_mean=0.1, reward_bound=0.0
74507: loss=0.000, reward_mean=0.1, reward_bound=0.0
74508: loss=0.000, reward_mean=0.1, reward_bound=0.0
74509: loss=0.000, reward_mean=0.1, reward_bound=0.0
74510: loss=0.000, reward_mean=0.0, reward_bound=0.0
74511: loss=0.000, reward_mean=0.0, reward_bound=0.0
74512: loss=0.000, reward_mean=0.0, reward_bound=0.0
74513: loss=0.000, reward_mean=0.1, reward_bound=0.0
74514: loss=0.000, reward_mean=0.1, reward_bound=0.0
74515: loss=0.000, reward_mean=0.1, reward_bound=0.0
74516: loss=0.000, reward_mean=0.0, reward_bound=0.0
74517: loss=0.000, reward_mean=0.1, reward_bound=0.0
74518: loss=0.000, reward_mean=0.1, reward_bou

74657: loss=0.000, reward_mean=0.1, reward_bound=0.0
74658: loss=0.000, reward_mean=0.1, reward_bound=0.0
74659: loss=0.000, reward_mean=0.1, reward_bound=0.0
74660: loss=0.000, reward_mean=0.2, reward_bound=0.0
74661: loss=0.000, reward_mean=0.0, reward_bound=0.0
74662: loss=0.000, reward_mean=0.1, reward_bound=0.0
74663: loss=0.000, reward_mean=0.1, reward_bound=0.0
74664: loss=0.000, reward_mean=0.0, reward_bound=0.0
74665: loss=0.000, reward_mean=0.2, reward_bound=0.0
74666: loss=0.000, reward_mean=0.1, reward_bound=0.0
74667: loss=0.000, reward_mean=0.1, reward_bound=0.0
74668: loss=0.000, reward_mean=0.0, reward_bound=0.0
74669: loss=0.000, reward_mean=0.1, reward_bound=0.0
74670: loss=0.000, reward_mean=0.0, reward_bound=0.0
74671: loss=0.000, reward_mean=0.0, reward_bound=0.0
74672: loss=0.000, reward_mean=0.0, reward_bound=0.0
74673: loss=0.000, reward_mean=0.1, reward_bound=0.0
74674: loss=0.000, reward_mean=0.1, reward_bound=0.0
74675: loss=0.000, reward_mean=0.0, reward_bou

74818: loss=0.000, reward_mean=0.0, reward_bound=0.0
74819: loss=0.000, reward_mean=0.1, reward_bound=0.0
74820: loss=0.000, reward_mean=0.1, reward_bound=0.0
74821: loss=0.000, reward_mean=0.1, reward_bound=0.0
74822: loss=0.000, reward_mean=0.0, reward_bound=0.0
74823: loss=0.000, reward_mean=0.0, reward_bound=0.0
74824: loss=0.000, reward_mean=0.0, reward_bound=0.0
74825: loss=0.000, reward_mean=0.1, reward_bound=0.0
74826: loss=0.000, reward_mean=0.1, reward_bound=0.0
74827: loss=0.000, reward_mean=0.1, reward_bound=0.0
74828: loss=0.000, reward_mean=0.1, reward_bound=0.0
74829: loss=0.000, reward_mean=0.1, reward_bound=0.0
74830: loss=0.000, reward_mean=0.0, reward_bound=0.0
74831: loss=0.000, reward_mean=0.0, reward_bound=0.0
74832: loss=0.000, reward_mean=0.2, reward_bound=0.0
74833: loss=0.000, reward_mean=0.1, reward_bound=0.0
74834: loss=0.000, reward_mean=0.0, reward_bound=0.0
74835: loss=0.000, reward_mean=0.1, reward_bound=0.0
74836: loss=0.000, reward_mean=0.0, reward_bou

74979: loss=0.000, reward_mean=0.1, reward_bound=0.0
74980: loss=0.000, reward_mean=0.1, reward_bound=0.0
74981: loss=0.000, reward_mean=0.1, reward_bound=0.0
74982: loss=0.000, reward_mean=0.1, reward_bound=0.0
74983: loss=0.000, reward_mean=0.0, reward_bound=0.0
74984: loss=0.000, reward_mean=0.0, reward_bound=0.0
74985: loss=0.000, reward_mean=0.1, reward_bound=0.0
74986: loss=0.000, reward_mean=0.1, reward_bound=0.0
74987: loss=0.000, reward_mean=0.0, reward_bound=0.0
74988: loss=0.000, reward_mean=0.1, reward_bound=0.0
74989: loss=0.000, reward_mean=0.0, reward_bound=0.0
74990: loss=0.000, reward_mean=0.1, reward_bound=0.0
74991: loss=0.000, reward_mean=0.1, reward_bound=0.0
74992: loss=0.000, reward_mean=0.1, reward_bound=0.0
74993: loss=0.000, reward_mean=0.0, reward_bound=0.0
74994: loss=0.000, reward_mean=0.1, reward_bound=0.0
74995: loss=0.000, reward_mean=0.1, reward_bound=0.0
74996: loss=0.000, reward_mean=0.0, reward_bound=0.0
74997: loss=0.000, reward_mean=0.0, reward_bou

75134: loss=0.000, reward_mean=0.0, reward_bound=0.0
75135: loss=0.000, reward_mean=0.0, reward_bound=0.0
75136: loss=0.000, reward_mean=0.1, reward_bound=0.0
75137: loss=0.000, reward_mean=0.1, reward_bound=0.0
75138: loss=0.000, reward_mean=0.1, reward_bound=0.0
75139: loss=0.000, reward_mean=0.0, reward_bound=0.0
75140: loss=0.000, reward_mean=0.1, reward_bound=0.0
75141: loss=0.000, reward_mean=0.1, reward_bound=0.0
75142: loss=0.000, reward_mean=0.1, reward_bound=0.0
75143: loss=0.000, reward_mean=0.0, reward_bound=0.0
75144: loss=0.000, reward_mean=0.0, reward_bound=0.0
75145: loss=0.000, reward_mean=0.0, reward_bound=0.0
75146: loss=0.000, reward_mean=0.1, reward_bound=0.0
75147: loss=0.000, reward_mean=0.0, reward_bound=0.0
75148: loss=0.000, reward_mean=0.1, reward_bound=0.0
75149: loss=0.000, reward_mean=0.1, reward_bound=0.0
75150: loss=0.000, reward_mean=0.1, reward_bound=0.0
75151: loss=0.000, reward_mean=0.1, reward_bound=0.0
75152: loss=0.000, reward_mean=0.0, reward_bou

75290: loss=0.000, reward_mean=0.1, reward_bound=0.0
75291: loss=0.000, reward_mean=0.1, reward_bound=0.0
75292: loss=0.000, reward_mean=0.0, reward_bound=0.0
75293: loss=0.000, reward_mean=0.1, reward_bound=0.0
75294: loss=0.000, reward_mean=0.1, reward_bound=0.0
75295: loss=0.000, reward_mean=0.0, reward_bound=0.0
75296: loss=0.000, reward_mean=0.1, reward_bound=0.0
75297: loss=0.000, reward_mean=0.1, reward_bound=0.0
75298: loss=0.000, reward_mean=0.1, reward_bound=0.0
75299: loss=0.000, reward_mean=0.1, reward_bound=0.0
75300: loss=0.000, reward_mean=0.0, reward_bound=0.0
75301: loss=0.000, reward_mean=0.1, reward_bound=0.0
75302: loss=0.000, reward_mean=0.1, reward_bound=0.0
75303: loss=0.000, reward_mean=0.1, reward_bound=0.0
75304: loss=0.000, reward_mean=0.0, reward_bound=0.0
75305: loss=0.000, reward_mean=0.1, reward_bound=0.0
75306: loss=0.000, reward_mean=0.1, reward_bound=0.0
75307: loss=0.000, reward_mean=0.0, reward_bound=0.0
75308: loss=0.000, reward_mean=0.1, reward_bou

75450: loss=0.000, reward_mean=0.1, reward_bound=0.0
75451: loss=0.000, reward_mean=0.1, reward_bound=0.0
75452: loss=0.000, reward_mean=0.0, reward_bound=0.0
75453: loss=0.000, reward_mean=0.1, reward_bound=0.0
75454: loss=0.000, reward_mean=0.1, reward_bound=0.0
75455: loss=0.000, reward_mean=0.1, reward_bound=0.0
75456: loss=0.000, reward_mean=0.1, reward_bound=0.0
75457: loss=0.000, reward_mean=0.0, reward_bound=0.0
75458: loss=0.000, reward_mean=0.1, reward_bound=0.0
75459: loss=0.000, reward_mean=0.1, reward_bound=0.0
75460: loss=0.000, reward_mean=0.1, reward_bound=0.0
75461: loss=0.000, reward_mean=0.0, reward_bound=0.0
75462: loss=0.000, reward_mean=0.0, reward_bound=0.0
75463: loss=0.000, reward_mean=0.0, reward_bound=0.0
75464: loss=0.000, reward_mean=0.0, reward_bound=0.0
75465: loss=0.000, reward_mean=0.0, reward_bound=0.0
75466: loss=0.000, reward_mean=0.1, reward_bound=0.0
75467: loss=0.000, reward_mean=0.0, reward_bound=0.0
75468: loss=0.000, reward_mean=0.1, reward_bou

75611: loss=0.000, reward_mean=0.1, reward_bound=0.0
75612: loss=0.000, reward_mean=0.1, reward_bound=0.0
75613: loss=0.000, reward_mean=0.0, reward_bound=0.0
75614: loss=0.000, reward_mean=0.0, reward_bound=0.0
75615: loss=0.000, reward_mean=0.1, reward_bound=0.0
75616: loss=0.000, reward_mean=0.0, reward_bound=0.0
75617: loss=0.000, reward_mean=0.1, reward_bound=0.0
75618: loss=0.000, reward_mean=0.0, reward_bound=0.0
75619: loss=0.000, reward_mean=0.1, reward_bound=0.0
75620: loss=0.000, reward_mean=0.1, reward_bound=0.0
75621: loss=0.000, reward_mean=0.0, reward_bound=0.0
75622: loss=0.000, reward_mean=0.1, reward_bound=0.0
75623: loss=0.000, reward_mean=0.1, reward_bound=0.0
75624: loss=0.000, reward_mean=0.0, reward_bound=0.0
75625: loss=0.000, reward_mean=0.2, reward_bound=0.0
75626: loss=0.000, reward_mean=0.1, reward_bound=0.0
75627: loss=0.000, reward_mean=0.1, reward_bound=0.0
75628: loss=0.000, reward_mean=0.1, reward_bound=0.0
75629: loss=0.000, reward_mean=0.1, reward_bou

75771: loss=0.000, reward_mean=0.1, reward_bound=0.0
75772: loss=0.000, reward_mean=0.1, reward_bound=0.0
75773: loss=0.000, reward_mean=0.0, reward_bound=0.0
75774: loss=0.000, reward_mean=0.0, reward_bound=0.0
75775: loss=0.000, reward_mean=0.0, reward_bound=0.0
75776: loss=0.000, reward_mean=0.0, reward_bound=0.0
75777: loss=0.000, reward_mean=0.1, reward_bound=0.0
75778: loss=0.000, reward_mean=0.1, reward_bound=0.0
75779: loss=0.000, reward_mean=0.1, reward_bound=0.0
75780: loss=0.000, reward_mean=0.1, reward_bound=0.0
75781: loss=0.000, reward_mean=0.1, reward_bound=0.0
75782: loss=0.000, reward_mean=0.1, reward_bound=0.0
75783: loss=0.000, reward_mean=0.1, reward_bound=0.0
75784: loss=0.000, reward_mean=0.1, reward_bound=0.0
75785: loss=0.000, reward_mean=0.0, reward_bound=0.0
75786: loss=0.000, reward_mean=0.1, reward_bound=0.0
75787: loss=0.000, reward_mean=0.1, reward_bound=0.0
75788: loss=0.000, reward_mean=0.1, reward_bound=0.0
75789: loss=0.000, reward_mean=0.0, reward_bou

75927: loss=0.000, reward_mean=0.1, reward_bound=0.0
75928: loss=0.000, reward_mean=0.2, reward_bound=0.0
75929: loss=0.000, reward_mean=0.0, reward_bound=0.0
75930: loss=0.000, reward_mean=0.1, reward_bound=0.0
75931: loss=0.000, reward_mean=0.1, reward_bound=0.0
75932: loss=0.000, reward_mean=0.1, reward_bound=0.0
75933: loss=0.000, reward_mean=0.1, reward_bound=0.0
75934: loss=0.000, reward_mean=0.1, reward_bound=0.0
75935: loss=0.000, reward_mean=0.1, reward_bound=0.0
75936: loss=0.000, reward_mean=0.0, reward_bound=0.0
75937: loss=0.000, reward_mean=0.0, reward_bound=0.0
75938: loss=0.000, reward_mean=0.1, reward_bound=0.0
75939: loss=0.000, reward_mean=0.1, reward_bound=0.0
75940: loss=0.000, reward_mean=0.1, reward_bound=0.0
75941: loss=0.000, reward_mean=0.1, reward_bound=0.0
75942: loss=0.000, reward_mean=0.1, reward_bound=0.0
75943: loss=0.000, reward_mean=0.1, reward_bound=0.0
75944: loss=0.000, reward_mean=0.0, reward_bound=0.0
75945: loss=0.000, reward_mean=0.1, reward_bou

76081: loss=0.000, reward_mean=0.1, reward_bound=0.0
76082: loss=0.000, reward_mean=0.0, reward_bound=0.0
76083: loss=0.000, reward_mean=0.1, reward_bound=0.0
76084: loss=0.000, reward_mean=0.0, reward_bound=0.0
76085: loss=0.000, reward_mean=0.1, reward_bound=0.0
76086: loss=0.000, reward_mean=0.1, reward_bound=0.0
76087: loss=0.000, reward_mean=0.0, reward_bound=0.0
76088: loss=0.000, reward_mean=0.0, reward_bound=0.0
76089: loss=0.000, reward_mean=0.0, reward_bound=0.0
76090: loss=0.000, reward_mean=0.1, reward_bound=0.0
76091: loss=0.000, reward_mean=0.1, reward_bound=0.0
76092: loss=0.000, reward_mean=0.1, reward_bound=0.0
76093: loss=0.000, reward_mean=0.1, reward_bound=0.0
76094: loss=0.000, reward_mean=0.0, reward_bound=0.0
76095: loss=0.000, reward_mean=0.1, reward_bound=0.0
76096: loss=0.000, reward_mean=0.1, reward_bound=0.0
76097: loss=0.000, reward_mean=0.1, reward_bound=0.0
76098: loss=0.000, reward_mean=0.1, reward_bound=0.0
76099: loss=0.000, reward_mean=0.1, reward_bou

76240: loss=0.000, reward_mean=0.0, reward_bound=0.0
76241: loss=0.000, reward_mean=0.0, reward_bound=0.0
76242: loss=0.000, reward_mean=0.2, reward_bound=0.0
76243: loss=0.000, reward_mean=0.0, reward_bound=0.0
76244: loss=0.000, reward_mean=0.1, reward_bound=0.0
76245: loss=0.000, reward_mean=0.0, reward_bound=0.0
76246: loss=0.000, reward_mean=0.0, reward_bound=0.0
76247: loss=0.000, reward_mean=0.0, reward_bound=0.0
76248: loss=0.000, reward_mean=0.1, reward_bound=0.0
76249: loss=0.000, reward_mean=0.1, reward_bound=0.0
76250: loss=0.000, reward_mean=0.1, reward_bound=0.0
76251: loss=0.000, reward_mean=0.1, reward_bound=0.0
76252: loss=0.000, reward_mean=0.1, reward_bound=0.0
76253: loss=0.000, reward_mean=0.0, reward_bound=0.0
76254: loss=0.000, reward_mean=0.1, reward_bound=0.0
76255: loss=0.000, reward_mean=0.1, reward_bound=0.0
76256: loss=0.000, reward_mean=0.1, reward_bound=0.0
76257: loss=0.000, reward_mean=0.1, reward_bound=0.0
76258: loss=0.000, reward_mean=0.1, reward_bou

76398: loss=0.000, reward_mean=0.1, reward_bound=0.0
76399: loss=0.000, reward_mean=0.1, reward_bound=0.0
76400: loss=0.000, reward_mean=0.0, reward_bound=0.0
76401: loss=0.000, reward_mean=0.1, reward_bound=0.0
76402: loss=0.000, reward_mean=0.1, reward_bound=0.0
76403: loss=0.000, reward_mean=0.1, reward_bound=0.0
76404: loss=0.000, reward_mean=0.1, reward_bound=0.0
76405: loss=0.000, reward_mean=0.1, reward_bound=0.0
76406: loss=0.000, reward_mean=0.2, reward_bound=0.0
76407: loss=0.000, reward_mean=0.0, reward_bound=0.0
76408: loss=0.000, reward_mean=0.1, reward_bound=0.0
76409: loss=0.000, reward_mean=0.0, reward_bound=0.0
76410: loss=0.000, reward_mean=0.0, reward_bound=0.0
76411: loss=0.000, reward_mean=0.1, reward_bound=0.0
76412: loss=0.000, reward_mean=0.1, reward_bound=0.0
76413: loss=0.000, reward_mean=0.1, reward_bound=0.0
76414: loss=0.000, reward_mean=0.0, reward_bound=0.0
76415: loss=0.000, reward_mean=0.1, reward_bound=0.0
76416: loss=0.000, reward_mean=0.1, reward_bou

76558: loss=0.000, reward_mean=0.1, reward_bound=0.0
76559: loss=0.000, reward_mean=0.1, reward_bound=0.0
76560: loss=0.000, reward_mean=0.0, reward_bound=0.0
76561: loss=0.000, reward_mean=0.0, reward_bound=0.0
76562: loss=0.000, reward_mean=0.1, reward_bound=0.0
76563: loss=0.000, reward_mean=0.1, reward_bound=0.0
76564: loss=0.000, reward_mean=0.1, reward_bound=0.0
76565: loss=0.000, reward_mean=0.0, reward_bound=0.0
76566: loss=0.000, reward_mean=0.1, reward_bound=0.0
76567: loss=0.000, reward_mean=0.1, reward_bound=0.0
76568: loss=0.000, reward_mean=0.0, reward_bound=0.0
76569: loss=0.000, reward_mean=0.0, reward_bound=0.0
76570: loss=0.000, reward_mean=0.0, reward_bound=0.0
76571: loss=0.000, reward_mean=0.0, reward_bound=0.0
76572: loss=0.000, reward_mean=0.1, reward_bound=0.0
76573: loss=0.000, reward_mean=0.1, reward_bound=0.0
76574: loss=0.000, reward_mean=0.1, reward_bound=0.0
76575: loss=0.000, reward_mean=0.1, reward_bound=0.0
76576: loss=0.000, reward_mean=0.2, reward_bou

76717: loss=0.000, reward_mean=0.0, reward_bound=0.0
76718: loss=0.000, reward_mean=0.0, reward_bound=0.0
76719: loss=0.000, reward_mean=0.1, reward_bound=0.0
76720: loss=0.000, reward_mean=0.1, reward_bound=0.0
76721: loss=0.000, reward_mean=0.0, reward_bound=0.0
76722: loss=0.000, reward_mean=0.1, reward_bound=0.0
76723: loss=0.000, reward_mean=0.0, reward_bound=0.0
76724: loss=0.000, reward_mean=0.0, reward_bound=0.0
76725: loss=0.000, reward_mean=0.0, reward_bound=0.0
76726: loss=0.000, reward_mean=0.1, reward_bound=0.0
76727: loss=0.000, reward_mean=0.1, reward_bound=0.0
76728: loss=0.000, reward_mean=0.0, reward_bound=0.0
76729: loss=0.000, reward_mean=0.1, reward_bound=0.0
76730: loss=0.000, reward_mean=0.0, reward_bound=0.0
76731: loss=0.000, reward_mean=0.1, reward_bound=0.0
76732: loss=0.000, reward_mean=0.1, reward_bound=0.0
76733: loss=0.000, reward_mean=0.1, reward_bound=0.0
76734: loss=0.000, reward_mean=0.1, reward_bound=0.0
76735: loss=0.000, reward_mean=0.0, reward_bou

76877: loss=0.000, reward_mean=0.0, reward_bound=0.0
76878: loss=0.000, reward_mean=0.1, reward_bound=0.0
76879: loss=0.000, reward_mean=0.0, reward_bound=0.0
76880: loss=0.000, reward_mean=0.2, reward_bound=0.0
76881: loss=0.000, reward_mean=0.1, reward_bound=0.0
76882: loss=0.000, reward_mean=0.0, reward_bound=0.0
76883: loss=0.000, reward_mean=0.1, reward_bound=0.0
76884: loss=0.000, reward_mean=0.0, reward_bound=0.0
76885: loss=0.000, reward_mean=0.0, reward_bound=0.0
76886: loss=0.000, reward_mean=0.0, reward_bound=0.0
76887: loss=0.000, reward_mean=0.1, reward_bound=0.0
76888: loss=0.000, reward_mean=0.1, reward_bound=0.0
76889: loss=0.000, reward_mean=0.0, reward_bound=0.0
76890: loss=0.000, reward_mean=0.1, reward_bound=0.0
76891: loss=0.000, reward_mean=0.0, reward_bound=0.0
76892: loss=0.000, reward_mean=0.0, reward_bound=0.0
76893: loss=0.000, reward_mean=0.2, reward_bound=0.0
76894: loss=0.000, reward_mean=0.2, reward_bound=0.0
76895: loss=0.000, reward_mean=0.1, reward_bou

77032: loss=0.000, reward_mean=0.0, reward_bound=0.0
77033: loss=0.000, reward_mean=0.0, reward_bound=0.0
77034: loss=0.000, reward_mean=0.1, reward_bound=0.0
77035: loss=0.000, reward_mean=0.0, reward_bound=0.0
77036: loss=0.000, reward_mean=0.1, reward_bound=0.0
77037: loss=0.000, reward_mean=0.0, reward_bound=0.0
77038: loss=0.000, reward_mean=0.1, reward_bound=0.0
77039: loss=0.000, reward_mean=0.1, reward_bound=0.0
77040: loss=0.000, reward_mean=0.0, reward_bound=0.0
77041: loss=0.000, reward_mean=0.0, reward_bound=0.0
77042: loss=0.000, reward_mean=0.1, reward_bound=0.0
77043: loss=0.000, reward_mean=0.1, reward_bound=0.0
77044: loss=0.000, reward_mean=0.0, reward_bound=0.0
77045: loss=0.000, reward_mean=0.1, reward_bound=0.0
77046: loss=0.000, reward_mean=0.1, reward_bound=0.0
77047: loss=0.000, reward_mean=0.0, reward_bound=0.0
77048: loss=0.000, reward_mean=0.1, reward_bound=0.0
77049: loss=0.000, reward_mean=0.1, reward_bound=0.0
77050: loss=0.000, reward_mean=0.1, reward_bou

77193: loss=0.000, reward_mean=0.1, reward_bound=0.0
77194: loss=0.000, reward_mean=0.0, reward_bound=0.0
77195: loss=0.000, reward_mean=0.0, reward_bound=0.0
77196: loss=0.000, reward_mean=0.0, reward_bound=0.0
77197: loss=0.000, reward_mean=0.1, reward_bound=0.0
77198: loss=0.000, reward_mean=0.1, reward_bound=0.0
77199: loss=0.000, reward_mean=0.1, reward_bound=0.0
77200: loss=0.000, reward_mean=0.0, reward_bound=0.0
77201: loss=0.000, reward_mean=0.1, reward_bound=0.0
77202: loss=0.000, reward_mean=0.1, reward_bound=0.0
77203: loss=0.000, reward_mean=0.1, reward_bound=0.0
77204: loss=0.000, reward_mean=0.1, reward_bound=0.0
77205: loss=0.000, reward_mean=0.1, reward_bound=0.0
77206: loss=0.000, reward_mean=0.1, reward_bound=0.0
77207: loss=0.000, reward_mean=0.1, reward_bound=0.0
77208: loss=0.000, reward_mean=0.1, reward_bound=0.0
77209: loss=0.000, reward_mean=0.1, reward_bound=0.0
77210: loss=0.000, reward_mean=0.1, reward_bound=0.0
77211: loss=0.000, reward_mean=0.1, reward_bou

77353: loss=0.000, reward_mean=0.0, reward_bound=0.0
77354: loss=0.000, reward_mean=0.1, reward_bound=0.0
77355: loss=0.000, reward_mean=0.1, reward_bound=0.0
77356: loss=0.000, reward_mean=0.1, reward_bound=0.0
77357: loss=0.000, reward_mean=0.0, reward_bound=0.0
77358: loss=0.000, reward_mean=0.1, reward_bound=0.0
77359: loss=0.000, reward_mean=0.0, reward_bound=0.0
77360: loss=0.000, reward_mean=0.1, reward_bound=0.0
77361: loss=0.000, reward_mean=0.0, reward_bound=0.0
77362: loss=0.000, reward_mean=0.1, reward_bound=0.0
77363: loss=0.000, reward_mean=0.1, reward_bound=0.0
77364: loss=0.000, reward_mean=0.1, reward_bound=0.0
77365: loss=0.000, reward_mean=0.2, reward_bound=0.0
77366: loss=0.000, reward_mean=0.0, reward_bound=0.0
77367: loss=0.000, reward_mean=0.1, reward_bound=0.0
77368: loss=0.000, reward_mean=0.0, reward_bound=0.0
77369: loss=0.000, reward_mean=0.0, reward_bound=0.0
77370: loss=0.000, reward_mean=0.1, reward_bound=0.0
77371: loss=0.000, reward_mean=0.1, reward_bou

77511: loss=0.000, reward_mean=0.1, reward_bound=0.0
77512: loss=0.000, reward_mean=0.2, reward_bound=0.0
77513: loss=0.000, reward_mean=0.1, reward_bound=0.0
77514: loss=0.000, reward_mean=0.2, reward_bound=0.0
77515: loss=0.000, reward_mean=0.2, reward_bound=0.0
77516: loss=0.000, reward_mean=0.1, reward_bound=0.0
77517: loss=0.000, reward_mean=0.1, reward_bound=0.0
77518: loss=0.000, reward_mean=0.1, reward_bound=0.0
77519: loss=0.000, reward_mean=0.1, reward_bound=0.0
77520: loss=0.000, reward_mean=0.1, reward_bound=0.0
77521: loss=0.000, reward_mean=0.1, reward_bound=0.0
77522: loss=0.000, reward_mean=0.0, reward_bound=0.0
77523: loss=0.000, reward_mean=0.2, reward_bound=0.0
77524: loss=0.000, reward_mean=0.0, reward_bound=0.0
77525: loss=0.000, reward_mean=0.2, reward_bound=0.0
77526: loss=0.000, reward_mean=0.0, reward_bound=0.0
77527: loss=0.000, reward_mean=0.0, reward_bound=0.0
77528: loss=0.000, reward_mean=0.1, reward_bound=0.0
77529: loss=0.000, reward_mean=0.1, reward_bou

77667: loss=0.000, reward_mean=0.1, reward_bound=0.0
77668: loss=0.000, reward_mean=0.0, reward_bound=0.0
77669: loss=0.000, reward_mean=0.1, reward_bound=0.0
77670: loss=0.000, reward_mean=0.0, reward_bound=0.0
77671: loss=0.000, reward_mean=0.1, reward_bound=0.0
77672: loss=0.000, reward_mean=0.1, reward_bound=0.0
77673: loss=0.000, reward_mean=0.0, reward_bound=0.0
77674: loss=0.000, reward_mean=0.0, reward_bound=0.0
77675: loss=0.000, reward_mean=0.1, reward_bound=0.0
77676: loss=0.000, reward_mean=0.1, reward_bound=0.0
77677: loss=0.000, reward_mean=0.1, reward_bound=0.0
77678: loss=0.000, reward_mean=0.0, reward_bound=0.0
77679: loss=0.000, reward_mean=0.1, reward_bound=0.0
77680: loss=0.000, reward_mean=0.1, reward_bound=0.0
77681: loss=0.000, reward_mean=0.1, reward_bound=0.0
77682: loss=0.000, reward_mean=0.1, reward_bound=0.0
77683: loss=0.000, reward_mean=0.0, reward_bound=0.0
77684: loss=0.000, reward_mean=0.1, reward_bound=0.0
77685: loss=0.000, reward_mean=0.1, reward_bou

77822: loss=0.000, reward_mean=0.1, reward_bound=0.0
77823: loss=0.000, reward_mean=0.1, reward_bound=0.0
77824: loss=0.000, reward_mean=0.1, reward_bound=0.0
77825: loss=0.000, reward_mean=0.0, reward_bound=0.0
77826: loss=0.000, reward_mean=0.2, reward_bound=0.0
77827: loss=0.000, reward_mean=0.1, reward_bound=0.0
77828: loss=0.000, reward_mean=0.0, reward_bound=0.0
77829: loss=0.000, reward_mean=0.0, reward_bound=0.0
77830: loss=0.000, reward_mean=0.0, reward_bound=0.0
77831: loss=0.000, reward_mean=0.0, reward_bound=0.0
77832: loss=0.000, reward_mean=0.1, reward_bound=0.0
77833: loss=0.000, reward_mean=0.1, reward_bound=0.0
77834: loss=0.000, reward_mean=0.0, reward_bound=0.0
77835: loss=0.000, reward_mean=0.1, reward_bound=0.0
77836: loss=0.000, reward_mean=0.1, reward_bound=0.0
77837: loss=0.000, reward_mean=0.0, reward_bound=0.0
77838: loss=0.000, reward_mean=0.1, reward_bound=0.0
77839: loss=0.000, reward_mean=0.1, reward_bound=0.0
77840: loss=0.000, reward_mean=0.1, reward_bou

77979: loss=0.000, reward_mean=0.1, reward_bound=0.0
77980: loss=0.000, reward_mean=0.0, reward_bound=0.0
77981: loss=0.000, reward_mean=0.1, reward_bound=0.0
77982: loss=0.000, reward_mean=0.0, reward_bound=0.0
77983: loss=0.000, reward_mean=0.1, reward_bound=0.0
77984: loss=0.000, reward_mean=0.0, reward_bound=0.0
77985: loss=0.000, reward_mean=0.0, reward_bound=0.0
77986: loss=0.000, reward_mean=0.0, reward_bound=0.0
77987: loss=0.000, reward_mean=0.0, reward_bound=0.0
77988: loss=0.000, reward_mean=0.0, reward_bound=0.0
77989: loss=0.000, reward_mean=0.1, reward_bound=0.0
77990: loss=0.000, reward_mean=0.1, reward_bound=0.0
77991: loss=0.000, reward_mean=0.1, reward_bound=0.0
77992: loss=0.000, reward_mean=0.1, reward_bound=0.0
77993: loss=0.000, reward_mean=0.1, reward_bound=0.0
77994: loss=0.000, reward_mean=0.1, reward_bound=0.0
77995: loss=0.000, reward_mean=0.1, reward_bound=0.0
77996: loss=0.000, reward_mean=0.1, reward_bound=0.0
77997: loss=0.000, reward_mean=0.0, reward_bou

78135: loss=0.000, reward_mean=0.1, reward_bound=0.0
78136: loss=0.000, reward_mean=0.0, reward_bound=0.0
78137: loss=0.000, reward_mean=0.1, reward_bound=0.0
78138: loss=0.000, reward_mean=0.1, reward_bound=0.0
78139: loss=0.000, reward_mean=0.1, reward_bound=0.0
78140: loss=0.000, reward_mean=0.0, reward_bound=0.0
78141: loss=0.000, reward_mean=0.1, reward_bound=0.0
78142: loss=0.000, reward_mean=0.0, reward_bound=0.0
78143: loss=0.000, reward_mean=0.0, reward_bound=0.0
78144: loss=0.000, reward_mean=0.0, reward_bound=0.0
78145: loss=0.000, reward_mean=0.0, reward_bound=0.0
78146: loss=0.000, reward_mean=0.0, reward_bound=0.0
78147: loss=0.000, reward_mean=0.1, reward_bound=0.0
78148: loss=0.000, reward_mean=0.1, reward_bound=0.0
78149: loss=0.000, reward_mean=0.1, reward_bound=0.0
78150: loss=0.000, reward_mean=0.0, reward_bound=0.0
78151: loss=0.000, reward_mean=0.1, reward_bound=0.0
78152: loss=0.000, reward_mean=0.1, reward_bound=0.0
78153: loss=0.000, reward_mean=0.1, reward_bou

78290: loss=0.000, reward_mean=0.1, reward_bound=0.0
78291: loss=0.000, reward_mean=0.1, reward_bound=0.0
78292: loss=0.000, reward_mean=0.1, reward_bound=0.0
78293: loss=0.000, reward_mean=0.1, reward_bound=0.0
78294: loss=0.000, reward_mean=0.1, reward_bound=0.0
78295: loss=0.000, reward_mean=0.1, reward_bound=0.0
78296: loss=0.000, reward_mean=0.0, reward_bound=0.0
78297: loss=0.000, reward_mean=0.1, reward_bound=0.0
78298: loss=0.000, reward_mean=0.1, reward_bound=0.0
78299: loss=0.000, reward_mean=0.0, reward_bound=0.0
78300: loss=0.000, reward_mean=0.0, reward_bound=0.0
78301: loss=0.000, reward_mean=0.1, reward_bound=0.0
78302: loss=0.000, reward_mean=0.1, reward_bound=0.0
78303: loss=0.000, reward_mean=0.1, reward_bound=0.0
78304: loss=0.000, reward_mean=0.1, reward_bound=0.0
78305: loss=0.000, reward_mean=0.0, reward_bound=0.0
78306: loss=0.000, reward_mean=0.1, reward_bound=0.0
78307: loss=0.000, reward_mean=0.0, reward_bound=0.0
78308: loss=0.000, reward_mean=0.1, reward_bou

78445: loss=0.000, reward_mean=0.0, reward_bound=0.0
78446: loss=0.000, reward_mean=0.0, reward_bound=0.0
78447: loss=0.000, reward_mean=0.1, reward_bound=0.0
78448: loss=0.000, reward_mean=0.2, reward_bound=0.0
78449: loss=0.000, reward_mean=0.0, reward_bound=0.0
78450: loss=0.000, reward_mean=0.0, reward_bound=0.0
78451: loss=0.000, reward_mean=0.1, reward_bound=0.0
78452: loss=0.000, reward_mean=0.1, reward_bound=0.0
78453: loss=0.000, reward_mean=0.0, reward_bound=0.0
78454: loss=0.000, reward_mean=0.1, reward_bound=0.0
78455: loss=0.000, reward_mean=0.1, reward_bound=0.0
78456: loss=0.000, reward_mean=0.0, reward_bound=0.0
78457: loss=0.000, reward_mean=0.1, reward_bound=0.0
78458: loss=0.000, reward_mean=0.0, reward_bound=0.0
78459: loss=0.000, reward_mean=0.2, reward_bound=0.0
78460: loss=0.000, reward_mean=0.2, reward_bound=0.0
78461: loss=0.000, reward_mean=0.1, reward_bound=0.0
78462: loss=0.000, reward_mean=0.1, reward_bound=0.0
78463: loss=0.000, reward_mean=0.0, reward_bou

78605: loss=0.000, reward_mean=0.1, reward_bound=0.0
78606: loss=0.000, reward_mean=0.0, reward_bound=0.0
78607: loss=0.000, reward_mean=0.1, reward_bound=0.0
78608: loss=0.000, reward_mean=0.0, reward_bound=0.0
78609: loss=0.000, reward_mean=0.0, reward_bound=0.0
78610: loss=0.000, reward_mean=0.0, reward_bound=0.0
78611: loss=0.000, reward_mean=0.1, reward_bound=0.0
78612: loss=0.000, reward_mean=0.1, reward_bound=0.0
78613: loss=0.000, reward_mean=0.1, reward_bound=0.0
78614: loss=0.000, reward_mean=0.1, reward_bound=0.0
78615: loss=0.000, reward_mean=0.2, reward_bound=0.0
78616: loss=0.000, reward_mean=0.1, reward_bound=0.0
78617: loss=0.000, reward_mean=0.0, reward_bound=0.0
78618: loss=0.000, reward_mean=0.0, reward_bound=0.0
78619: loss=0.000, reward_mean=0.0, reward_bound=0.0
78620: loss=0.000, reward_mean=0.0, reward_bound=0.0
78621: loss=0.000, reward_mean=0.2, reward_bound=0.0
78622: loss=0.000, reward_mean=0.0, reward_bound=0.0
78623: loss=0.000, reward_mean=0.0, reward_bou

78761: loss=0.000, reward_mean=0.1, reward_bound=0.0
78762: loss=0.000, reward_mean=0.0, reward_bound=0.0
78763: loss=0.000, reward_mean=0.1, reward_bound=0.0
78764: loss=0.000, reward_mean=0.1, reward_bound=0.0
78765: loss=0.000, reward_mean=0.1, reward_bound=0.0
78766: loss=0.000, reward_mean=0.0, reward_bound=0.0
78767: loss=0.000, reward_mean=0.0, reward_bound=0.0
78768: loss=0.000, reward_mean=0.1, reward_bound=0.0
78769: loss=0.000, reward_mean=0.1, reward_bound=0.0
78770: loss=0.000, reward_mean=0.1, reward_bound=0.0
78771: loss=0.000, reward_mean=0.1, reward_bound=0.0
78772: loss=0.000, reward_mean=0.0, reward_bound=0.0
78773: loss=0.000, reward_mean=0.1, reward_bound=0.0
78774: loss=0.000, reward_mean=0.0, reward_bound=0.0
78775: loss=0.000, reward_mean=0.1, reward_bound=0.0
78776: loss=0.000, reward_mean=0.0, reward_bound=0.0
78777: loss=0.000, reward_mean=0.0, reward_bound=0.0
78778: loss=0.000, reward_mean=0.0, reward_bound=0.0
78779: loss=0.000, reward_mean=0.1, reward_bou

78917: loss=0.000, reward_mean=0.2, reward_bound=0.0
78918: loss=0.000, reward_mean=0.1, reward_bound=0.0
78919: loss=0.000, reward_mean=0.1, reward_bound=0.0
78920: loss=0.000, reward_mean=0.0, reward_bound=0.0
78921: loss=0.000, reward_mean=0.1, reward_bound=0.0
78922: loss=0.000, reward_mean=0.1, reward_bound=0.0
78923: loss=0.000, reward_mean=0.1, reward_bound=0.0
78924: loss=0.000, reward_mean=0.2, reward_bound=0.0
78925: loss=0.000, reward_mean=0.1, reward_bound=0.0
78926: loss=0.000, reward_mean=0.1, reward_bound=0.0
78927: loss=0.000, reward_mean=0.1, reward_bound=0.0
78928: loss=0.000, reward_mean=0.1, reward_bound=0.0
78929: loss=0.000, reward_mean=0.1, reward_bound=0.0
78930: loss=0.000, reward_mean=0.0, reward_bound=0.0
78931: loss=0.000, reward_mean=0.1, reward_bound=0.0
78932: loss=0.000, reward_mean=0.0, reward_bound=0.0
78933: loss=0.000, reward_mean=0.1, reward_bound=0.0
78934: loss=0.000, reward_mean=0.0, reward_bound=0.0
78935: loss=0.000, reward_mean=0.1, reward_bou

79075: loss=0.000, reward_mean=0.0, reward_bound=0.0
79076: loss=0.000, reward_mean=0.0, reward_bound=0.0
79077: loss=0.000, reward_mean=0.1, reward_bound=0.0
79078: loss=0.000, reward_mean=0.0, reward_bound=0.0
79079: loss=0.000, reward_mean=0.0, reward_bound=0.0
79080: loss=0.000, reward_mean=0.0, reward_bound=0.0
79081: loss=0.000, reward_mean=0.1, reward_bound=0.0
79082: loss=0.000, reward_mean=0.0, reward_bound=0.0
79083: loss=0.000, reward_mean=0.0, reward_bound=0.0
79084: loss=0.000, reward_mean=0.1, reward_bound=0.0
79085: loss=0.000, reward_mean=0.0, reward_bound=0.0
79086: loss=0.000, reward_mean=0.1, reward_bound=0.0
79087: loss=0.000, reward_mean=0.0, reward_bound=0.0
79088: loss=0.000, reward_mean=0.1, reward_bound=0.0
79089: loss=0.000, reward_mean=0.0, reward_bound=0.0
79090: loss=0.000, reward_mean=0.0, reward_bound=0.0
79091: loss=0.000, reward_mean=0.1, reward_bound=0.0
79092: loss=0.000, reward_mean=0.1, reward_bound=0.0
79093: loss=0.000, reward_mean=0.1, reward_bou

79233: loss=0.000, reward_mean=0.2, reward_bound=0.0
79234: loss=0.000, reward_mean=0.0, reward_bound=0.0
79235: loss=0.000, reward_mean=0.0, reward_bound=0.0
79236: loss=0.000, reward_mean=0.1, reward_bound=0.0
79237: loss=0.000, reward_mean=0.0, reward_bound=0.0
79238: loss=0.000, reward_mean=0.0, reward_bound=0.0
79239: loss=0.000, reward_mean=0.1, reward_bound=0.0
79240: loss=0.000, reward_mean=0.0, reward_bound=0.0
79241: loss=0.000, reward_mean=0.1, reward_bound=0.0
79242: loss=0.000, reward_mean=0.1, reward_bound=0.0
79243: loss=0.000, reward_mean=0.1, reward_bound=0.0
79244: loss=0.000, reward_mean=0.1, reward_bound=0.0
79245: loss=0.000, reward_mean=0.1, reward_bound=0.0
79246: loss=0.000, reward_mean=0.1, reward_bound=0.0
79247: loss=0.000, reward_mean=0.0, reward_bound=0.0
79248: loss=0.000, reward_mean=0.1, reward_bound=0.0
79249: loss=0.000, reward_mean=0.1, reward_bound=0.0
79250: loss=0.000, reward_mean=0.1, reward_bound=0.0
79251: loss=0.000, reward_mean=0.1, reward_bou

79392: loss=0.000, reward_mean=0.0, reward_bound=0.0
79393: loss=0.000, reward_mean=0.1, reward_bound=0.0
79394: loss=0.000, reward_mean=0.1, reward_bound=0.0
79395: loss=0.000, reward_mean=0.1, reward_bound=0.0
79396: loss=0.000, reward_mean=0.0, reward_bound=0.0
79397: loss=0.000, reward_mean=0.1, reward_bound=0.0
79398: loss=0.000, reward_mean=0.1, reward_bound=0.0
79399: loss=0.000, reward_mean=0.1, reward_bound=0.0
79400: loss=0.000, reward_mean=0.0, reward_bound=0.0
79401: loss=0.000, reward_mean=0.0, reward_bound=0.0
79402: loss=0.000, reward_mean=0.1, reward_bound=0.0
79403: loss=0.000, reward_mean=0.1, reward_bound=0.0
79404: loss=0.000, reward_mean=0.1, reward_bound=0.0
79405: loss=0.000, reward_mean=0.1, reward_bound=0.0
79406: loss=0.000, reward_mean=0.1, reward_bound=0.0
79407: loss=0.000, reward_mean=0.0, reward_bound=0.0
79408: loss=0.000, reward_mean=0.1, reward_bound=0.0
79409: loss=0.000, reward_mean=0.0, reward_bound=0.0
79410: loss=0.000, reward_mean=0.0, reward_bou

79547: loss=0.000, reward_mean=0.1, reward_bound=0.0
79548: loss=0.000, reward_mean=0.1, reward_bound=0.0
79549: loss=0.000, reward_mean=0.0, reward_bound=0.0
79550: loss=0.000, reward_mean=0.0, reward_bound=0.0
79551: loss=0.000, reward_mean=0.1, reward_bound=0.0
79552: loss=0.000, reward_mean=0.1, reward_bound=0.0
79553: loss=0.000, reward_mean=0.1, reward_bound=0.0
79554: loss=0.000, reward_mean=0.0, reward_bound=0.0
79555: loss=0.000, reward_mean=0.1, reward_bound=0.0
79556: loss=0.000, reward_mean=0.1, reward_bound=0.0
79557: loss=0.000, reward_mean=0.1, reward_bound=0.0
79558: loss=0.000, reward_mean=0.1, reward_bound=0.0
79559: loss=0.000, reward_mean=0.0, reward_bound=0.0
79560: loss=0.000, reward_mean=0.1, reward_bound=0.0
79561: loss=0.000, reward_mean=0.0, reward_bound=0.0
79562: loss=0.000, reward_mean=0.1, reward_bound=0.0
79563: loss=0.000, reward_mean=0.1, reward_bound=0.0
79564: loss=0.000, reward_mean=0.1, reward_bound=0.0
79565: loss=0.000, reward_mean=0.0, reward_bou

79704: loss=0.000, reward_mean=0.0, reward_bound=0.0
79705: loss=0.000, reward_mean=0.1, reward_bound=0.0
79706: loss=0.000, reward_mean=0.1, reward_bound=0.0
79707: loss=0.000, reward_mean=0.1, reward_bound=0.0
79708: loss=0.000, reward_mean=0.0, reward_bound=0.0
79709: loss=0.000, reward_mean=0.0, reward_bound=0.0
79710: loss=0.000, reward_mean=0.1, reward_bound=0.0
79711: loss=0.000, reward_mean=0.1, reward_bound=0.0
79712: loss=0.000, reward_mean=0.1, reward_bound=0.0
79713: loss=0.000, reward_mean=0.0, reward_bound=0.0
79714: loss=0.000, reward_mean=0.1, reward_bound=0.0
79715: loss=0.000, reward_mean=0.1, reward_bound=0.0
79716: loss=0.000, reward_mean=0.0, reward_bound=0.0
79717: loss=0.000, reward_mean=0.0, reward_bound=0.0
79718: loss=0.000, reward_mean=0.0, reward_bound=0.0
79719: loss=0.000, reward_mean=0.0, reward_bound=0.0
79720: loss=0.000, reward_mean=0.0, reward_bound=0.0
79721: loss=0.000, reward_mean=0.1, reward_bound=0.0
79722: loss=0.000, reward_mean=0.1, reward_bou

79860: loss=0.000, reward_mean=0.2, reward_bound=0.0
79861: loss=0.000, reward_mean=0.0, reward_bound=0.0
79862: loss=0.000, reward_mean=0.0, reward_bound=0.0
79863: loss=0.000, reward_mean=0.1, reward_bound=0.0
79864: loss=0.000, reward_mean=0.0, reward_bound=0.0
79865: loss=0.000, reward_mean=0.1, reward_bound=0.0
79866: loss=0.000, reward_mean=0.1, reward_bound=0.0
79867: loss=0.000, reward_mean=0.1, reward_bound=0.0
79868: loss=0.000, reward_mean=0.1, reward_bound=0.0
79869: loss=0.000, reward_mean=0.0, reward_bound=0.0
79870: loss=0.000, reward_mean=0.0, reward_bound=0.0
79871: loss=0.000, reward_mean=0.1, reward_bound=0.0
79872: loss=0.000, reward_mean=0.1, reward_bound=0.0
79873: loss=0.000, reward_mean=0.0, reward_bound=0.0
79874: loss=0.000, reward_mean=0.0, reward_bound=0.0
79875: loss=0.000, reward_mean=0.1, reward_bound=0.0
79876: loss=0.000, reward_mean=0.1, reward_bound=0.0
79877: loss=0.000, reward_mean=0.1, reward_bound=0.0
79878: loss=0.000, reward_mean=0.1, reward_bou

80015: loss=0.000, reward_mean=0.0, reward_bound=0.0
80016: loss=0.000, reward_mean=0.0, reward_bound=0.0
80017: loss=0.000, reward_mean=0.1, reward_bound=0.0
80018: loss=0.000, reward_mean=0.0, reward_bound=0.0
80019: loss=0.000, reward_mean=0.0, reward_bound=0.0
80020: loss=0.000, reward_mean=0.0, reward_bound=0.0
80021: loss=0.000, reward_mean=0.0, reward_bound=0.0
80022: loss=0.000, reward_mean=0.0, reward_bound=0.0
80023: loss=0.000, reward_mean=0.1, reward_bound=0.0
80024: loss=0.000, reward_mean=0.0, reward_bound=0.0
80025: loss=0.000, reward_mean=0.0, reward_bound=0.0
80026: loss=0.000, reward_mean=0.0, reward_bound=0.0
80027: loss=0.000, reward_mean=0.1, reward_bound=0.0
80028: loss=0.000, reward_mean=0.1, reward_bound=0.0
80029: loss=0.000, reward_mean=0.2, reward_bound=0.0
80030: loss=0.000, reward_mean=0.0, reward_bound=0.0
80031: loss=0.000, reward_mean=0.1, reward_bound=0.0
80032: loss=0.000, reward_mean=0.0, reward_bound=0.0
80033: loss=0.000, reward_mean=0.1, reward_bou

80175: loss=0.000, reward_mean=0.1, reward_bound=0.0
80176: loss=0.000, reward_mean=0.2, reward_bound=0.0
80177: loss=0.000, reward_mean=0.0, reward_bound=0.0
80178: loss=0.000, reward_mean=0.1, reward_bound=0.0
80179: loss=0.000, reward_mean=0.1, reward_bound=0.0
80180: loss=0.000, reward_mean=0.0, reward_bound=0.0
80181: loss=0.000, reward_mean=0.1, reward_bound=0.0
80182: loss=0.000, reward_mean=0.1, reward_bound=0.0
80183: loss=0.000, reward_mean=0.1, reward_bound=0.0
80184: loss=0.000, reward_mean=0.1, reward_bound=0.0
80185: loss=0.000, reward_mean=0.2, reward_bound=0.0
80186: loss=0.000, reward_mean=0.0, reward_bound=0.0
80187: loss=0.000, reward_mean=0.2, reward_bound=0.0
80188: loss=0.000, reward_mean=0.0, reward_bound=0.0
80189: loss=0.000, reward_mean=0.1, reward_bound=0.0
80190: loss=0.000, reward_mean=0.1, reward_bound=0.0
80191: loss=0.000, reward_mean=0.0, reward_bound=0.0
80192: loss=0.000, reward_mean=0.2, reward_bound=0.0
80193: loss=0.000, reward_mean=0.1, reward_bou

80337: loss=0.000, reward_mean=0.0, reward_bound=0.0
80338: loss=0.000, reward_mean=0.0, reward_bound=0.0
80339: loss=0.000, reward_mean=0.0, reward_bound=0.0
80340: loss=0.000, reward_mean=0.1, reward_bound=0.0
80341: loss=0.000, reward_mean=0.1, reward_bound=0.0
80342: loss=0.000, reward_mean=0.1, reward_bound=0.0
80343: loss=0.000, reward_mean=0.0, reward_bound=0.0
80344: loss=0.000, reward_mean=0.0, reward_bound=0.0
80345: loss=0.000, reward_mean=0.1, reward_bound=0.0
80346: loss=0.000, reward_mean=0.0, reward_bound=0.0
80347: loss=0.000, reward_mean=0.1, reward_bound=0.0
80348: loss=0.000, reward_mean=0.1, reward_bound=0.0
80349: loss=0.000, reward_mean=0.0, reward_bound=0.0
80350: loss=0.000, reward_mean=0.2, reward_bound=0.0
80351: loss=0.000, reward_mean=0.1, reward_bound=0.0
80352: loss=0.000, reward_mean=0.0, reward_bound=0.0
80353: loss=0.000, reward_mean=0.1, reward_bound=0.0
80354: loss=0.000, reward_mean=0.0, reward_bound=0.0
80355: loss=0.000, reward_mean=0.0, reward_bou

80494: loss=0.000, reward_mean=0.1, reward_bound=0.0
80495: loss=0.000, reward_mean=0.1, reward_bound=0.0
80496: loss=0.000, reward_mean=0.1, reward_bound=0.0
80497: loss=0.000, reward_mean=0.1, reward_bound=0.0
80498: loss=0.000, reward_mean=0.1, reward_bound=0.0
80499: loss=0.000, reward_mean=0.2, reward_bound=0.0
80500: loss=0.000, reward_mean=0.1, reward_bound=0.0
80501: loss=0.000, reward_mean=0.0, reward_bound=0.0
80502: loss=0.000, reward_mean=0.0, reward_bound=0.0
80503: loss=0.000, reward_mean=0.1, reward_bound=0.0
80504: loss=0.000, reward_mean=0.0, reward_bound=0.0
80505: loss=0.000, reward_mean=0.1, reward_bound=0.0
80506: loss=0.000, reward_mean=0.0, reward_bound=0.0
80507: loss=0.000, reward_mean=0.1, reward_bound=0.0
80508: loss=0.000, reward_mean=0.0, reward_bound=0.0
80509: loss=0.000, reward_mean=0.1, reward_bound=0.0
80510: loss=0.000, reward_mean=0.0, reward_bound=0.0
80511: loss=0.000, reward_mean=0.2, reward_bound=0.0
80512: loss=0.000, reward_mean=0.0, reward_bou

80653: loss=0.000, reward_mean=0.1, reward_bound=0.0
80654: loss=0.000, reward_mean=0.0, reward_bound=0.0
80655: loss=0.000, reward_mean=0.0, reward_bound=0.0
80656: loss=0.000, reward_mean=0.0, reward_bound=0.0
80657: loss=0.000, reward_mean=0.0, reward_bound=0.0
80658: loss=0.000, reward_mean=0.0, reward_bound=0.0
80659: loss=0.000, reward_mean=0.1, reward_bound=0.0
80660: loss=0.000, reward_mean=0.0, reward_bound=0.0
80661: loss=0.000, reward_mean=0.1, reward_bound=0.0
80662: loss=0.000, reward_mean=0.0, reward_bound=0.0
80663: loss=0.000, reward_mean=0.1, reward_bound=0.0
80664: loss=0.000, reward_mean=0.1, reward_bound=0.0
80665: loss=0.000, reward_mean=0.0, reward_bound=0.0
80666: loss=0.000, reward_mean=0.1, reward_bound=0.0
80667: loss=0.000, reward_mean=0.1, reward_bound=0.0
80668: loss=0.000, reward_mean=0.0, reward_bound=0.0
80669: loss=0.000, reward_mean=0.0, reward_bound=0.0
80670: loss=0.000, reward_mean=0.1, reward_bound=0.0
80671: loss=0.000, reward_mean=0.1, reward_bou

80812: loss=0.000, reward_mean=0.1, reward_bound=0.0
80813: loss=0.000, reward_mean=0.1, reward_bound=0.0
80814: loss=0.000, reward_mean=0.0, reward_bound=0.0
80815: loss=0.000, reward_mean=0.0, reward_bound=0.0
80816: loss=0.000, reward_mean=0.1, reward_bound=0.0
80817: loss=0.000, reward_mean=0.1, reward_bound=0.0
80818: loss=0.000, reward_mean=0.0, reward_bound=0.0
80819: loss=0.000, reward_mean=0.1, reward_bound=0.0
80820: loss=0.000, reward_mean=0.0, reward_bound=0.0
80821: loss=0.000, reward_mean=0.0, reward_bound=0.0
80822: loss=0.000, reward_mean=0.1, reward_bound=0.0
80823: loss=0.000, reward_mean=0.1, reward_bound=0.0
80824: loss=0.000, reward_mean=0.1, reward_bound=0.0
80825: loss=0.000, reward_mean=0.0, reward_bound=0.0
80826: loss=0.000, reward_mean=0.1, reward_bound=0.0
80827: loss=0.000, reward_mean=0.0, reward_bound=0.0
80828: loss=0.000, reward_mean=0.0, reward_bound=0.0
80829: loss=0.000, reward_mean=0.0, reward_bound=0.0
80830: loss=0.000, reward_mean=0.1, reward_bou

80967: loss=0.000, reward_mean=0.1, reward_bound=0.0
80968: loss=0.000, reward_mean=0.0, reward_bound=0.0
80969: loss=0.000, reward_mean=0.1, reward_bound=0.0
80970: loss=0.000, reward_mean=0.0, reward_bound=0.0
80971: loss=0.000, reward_mean=0.0, reward_bound=0.0
80972: loss=0.000, reward_mean=0.1, reward_bound=0.0
80973: loss=0.000, reward_mean=0.0, reward_bound=0.0
80974: loss=0.000, reward_mean=0.1, reward_bound=0.0
80975: loss=0.000, reward_mean=0.1, reward_bound=0.0
80976: loss=0.000, reward_mean=0.0, reward_bound=0.0
80977: loss=0.000, reward_mean=0.1, reward_bound=0.0
80978: loss=0.000, reward_mean=0.1, reward_bound=0.0
80979: loss=0.000, reward_mean=0.1, reward_bound=0.0
80980: loss=0.000, reward_mean=0.1, reward_bound=0.0
80981: loss=0.000, reward_mean=0.1, reward_bound=0.0
80982: loss=0.000, reward_mean=0.1, reward_bound=0.0
80983: loss=0.000, reward_mean=0.1, reward_bound=0.0
80984: loss=0.000, reward_mean=0.0, reward_bound=0.0
80985: loss=0.000, reward_mean=0.0, reward_bou

81123: loss=0.000, reward_mean=0.0, reward_bound=0.0
81124: loss=0.000, reward_mean=0.1, reward_bound=0.0
81125: loss=0.000, reward_mean=0.1, reward_bound=0.0
81126: loss=0.000, reward_mean=0.1, reward_bound=0.0
81127: loss=0.000, reward_mean=0.0, reward_bound=0.0
81128: loss=0.000, reward_mean=0.0, reward_bound=0.0
81129: loss=0.000, reward_mean=0.0, reward_bound=0.0
81130: loss=0.000, reward_mean=0.1, reward_bound=0.0
81131: loss=0.000, reward_mean=0.1, reward_bound=0.0
81132: loss=0.000, reward_mean=0.0, reward_bound=0.0
81133: loss=0.000, reward_mean=0.0, reward_bound=0.0
81134: loss=0.000, reward_mean=0.0, reward_bound=0.0
81135: loss=0.000, reward_mean=0.2, reward_bound=0.0
81136: loss=0.000, reward_mean=0.0, reward_bound=0.0
81137: loss=0.000, reward_mean=0.1, reward_bound=0.0
81138: loss=0.000, reward_mean=0.0, reward_bound=0.0
81139: loss=0.000, reward_mean=0.0, reward_bound=0.0
81140: loss=0.000, reward_mean=0.0, reward_bound=0.0
81141: loss=0.000, reward_mean=0.1, reward_bou

81278: loss=0.000, reward_mean=0.1, reward_bound=0.0
81279: loss=0.000, reward_mean=0.1, reward_bound=0.0
81280: loss=0.000, reward_mean=0.0, reward_bound=0.0
81281: loss=0.000, reward_mean=0.1, reward_bound=0.0
81282: loss=0.000, reward_mean=0.0, reward_bound=0.0
81283: loss=0.000, reward_mean=0.1, reward_bound=0.0
81284: loss=0.000, reward_mean=0.1, reward_bound=0.0
81285: loss=0.000, reward_mean=0.1, reward_bound=0.0
81286: loss=0.000, reward_mean=0.1, reward_bound=0.0
81287: loss=0.000, reward_mean=0.1, reward_bound=0.0
81288: loss=0.000, reward_mean=0.1, reward_bound=0.0
81289: loss=0.000, reward_mean=0.1, reward_bound=0.0
81290: loss=0.000, reward_mean=0.1, reward_bound=0.0
81291: loss=0.000, reward_mean=0.1, reward_bound=0.0
81292: loss=0.000, reward_mean=0.1, reward_bound=0.0
81293: loss=0.000, reward_mean=0.1, reward_bound=0.0
81294: loss=0.000, reward_mean=0.1, reward_bound=0.0
81295: loss=0.000, reward_mean=0.1, reward_bound=0.0
81296: loss=0.000, reward_mean=0.1, reward_bou

81438: loss=0.000, reward_mean=0.1, reward_bound=0.0
81439: loss=0.000, reward_mean=0.0, reward_bound=0.0
81440: loss=0.000, reward_mean=0.0, reward_bound=0.0
81441: loss=0.000, reward_mean=0.1, reward_bound=0.0
81442: loss=0.000, reward_mean=0.1, reward_bound=0.0
81443: loss=0.000, reward_mean=0.1, reward_bound=0.0
81444: loss=0.000, reward_mean=0.1, reward_bound=0.0
81445: loss=0.000, reward_mean=0.1, reward_bound=0.0
81446: loss=0.000, reward_mean=0.1, reward_bound=0.0
81447: loss=0.000, reward_mean=0.1, reward_bound=0.0
81448: loss=0.000, reward_mean=0.0, reward_bound=0.0
81449: loss=0.000, reward_mean=0.1, reward_bound=0.0
81450: loss=0.000, reward_mean=0.2, reward_bound=0.0
81451: loss=0.000, reward_mean=0.1, reward_bound=0.0
81452: loss=0.000, reward_mean=0.1, reward_bound=0.0
81453: loss=0.000, reward_mean=0.1, reward_bound=0.0
81454: loss=0.000, reward_mean=0.2, reward_bound=0.0
81455: loss=0.000, reward_mean=0.1, reward_bound=0.0
81456: loss=0.000, reward_mean=0.0, reward_bou

81595: loss=0.000, reward_mean=0.1, reward_bound=0.0
81596: loss=0.000, reward_mean=0.1, reward_bound=0.0
81597: loss=0.000, reward_mean=0.1, reward_bound=0.0
81598: loss=0.000, reward_mean=0.0, reward_bound=0.0
81599: loss=0.000, reward_mean=0.0, reward_bound=0.0
81600: loss=0.000, reward_mean=0.0, reward_bound=0.0
81601: loss=0.000, reward_mean=0.0, reward_bound=0.0
81602: loss=0.000, reward_mean=0.0, reward_bound=0.0
81603: loss=0.000, reward_mean=0.1, reward_bound=0.0
81604: loss=0.000, reward_mean=0.1, reward_bound=0.0
81605: loss=0.000, reward_mean=0.1, reward_bound=0.0
81606: loss=0.000, reward_mean=0.0, reward_bound=0.0
81607: loss=0.000, reward_mean=0.1, reward_bound=0.0
81608: loss=0.000, reward_mean=0.0, reward_bound=0.0
81609: loss=0.000, reward_mean=0.0, reward_bound=0.0
81610: loss=0.000, reward_mean=0.0, reward_bound=0.0
81611: loss=0.000, reward_mean=0.1, reward_bound=0.0
81612: loss=0.000, reward_mean=0.1, reward_bound=0.0
81613: loss=0.000, reward_mean=0.1, reward_bou

81750: loss=0.000, reward_mean=0.0, reward_bound=0.0
81751: loss=0.000, reward_mean=0.0, reward_bound=0.0
81752: loss=0.000, reward_mean=0.0, reward_bound=0.0
81753: loss=0.000, reward_mean=0.1, reward_bound=0.0
81754: loss=0.000, reward_mean=0.0, reward_bound=0.0
81755: loss=0.000, reward_mean=0.0, reward_bound=0.0
81756: loss=0.000, reward_mean=0.0, reward_bound=0.0
81757: loss=0.000, reward_mean=0.0, reward_bound=0.0
81758: loss=0.000, reward_mean=0.1, reward_bound=0.0
81759: loss=0.000, reward_mean=0.0, reward_bound=0.0
81760: loss=0.000, reward_mean=0.1, reward_bound=0.0
81761: loss=0.000, reward_mean=0.0, reward_bound=0.0
81762: loss=0.000, reward_mean=0.0, reward_bound=0.0
81763: loss=0.000, reward_mean=0.2, reward_bound=0.0
81764: loss=0.000, reward_mean=0.0, reward_bound=0.0
81765: loss=0.000, reward_mean=0.0, reward_bound=0.0
81766: loss=0.000, reward_mean=0.1, reward_bound=0.0
81767: loss=0.000, reward_mean=0.1, reward_bound=0.0
81768: loss=0.000, reward_mean=0.1, reward_bou

81910: loss=0.000, reward_mean=0.0, reward_bound=0.0
81911: loss=0.000, reward_mean=0.0, reward_bound=0.0
81912: loss=0.000, reward_mean=0.1, reward_bound=0.0
81913: loss=0.000, reward_mean=0.1, reward_bound=0.0
81914: loss=0.000, reward_mean=0.1, reward_bound=0.0
81915: loss=0.000, reward_mean=0.1, reward_bound=0.0
81916: loss=0.000, reward_mean=0.0, reward_bound=0.0
81917: loss=0.000, reward_mean=0.1, reward_bound=0.0
81918: loss=0.000, reward_mean=0.1, reward_bound=0.0
81919: loss=0.000, reward_mean=0.0, reward_bound=0.0
81920: loss=0.000, reward_mean=0.0, reward_bound=0.0
81921: loss=0.000, reward_mean=0.1, reward_bound=0.0
81922: loss=0.000, reward_mean=0.1, reward_bound=0.0
81923: loss=0.000, reward_mean=0.0, reward_bound=0.0
81924: loss=0.000, reward_mean=0.1, reward_bound=0.0
81925: loss=0.000, reward_mean=0.1, reward_bound=0.0
81926: loss=0.000, reward_mean=0.1, reward_bound=0.0
81927: loss=0.000, reward_mean=0.1, reward_bound=0.0
81928: loss=0.000, reward_mean=0.1, reward_bou

82067: loss=0.000, reward_mean=0.0, reward_bound=0.0
82068: loss=0.000, reward_mean=0.0, reward_bound=0.0
82069: loss=0.000, reward_mean=0.1, reward_bound=0.0
82070: loss=0.000, reward_mean=0.0, reward_bound=0.0
82071: loss=0.000, reward_mean=0.2, reward_bound=0.0
82072: loss=0.000, reward_mean=0.0, reward_bound=0.0
82073: loss=0.000, reward_mean=0.1, reward_bound=0.0
82074: loss=0.000, reward_mean=0.0, reward_bound=0.0
82075: loss=0.000, reward_mean=0.1, reward_bound=0.0
82076: loss=0.000, reward_mean=0.1, reward_bound=0.0
82077: loss=0.000, reward_mean=0.1, reward_bound=0.0
82078: loss=0.000, reward_mean=0.0, reward_bound=0.0
82079: loss=0.000, reward_mean=0.0, reward_bound=0.0
82080: loss=0.000, reward_mean=0.1, reward_bound=0.0
82081: loss=0.000, reward_mean=0.0, reward_bound=0.0
82082: loss=0.000, reward_mean=0.1, reward_bound=0.0
82083: loss=0.000, reward_mean=0.1, reward_bound=0.0
82084: loss=0.000, reward_mean=0.1, reward_bound=0.0
82085: loss=0.000, reward_mean=0.1, reward_bou

82228: loss=0.000, reward_mean=0.0, reward_bound=0.0
82229: loss=0.000, reward_mean=0.0, reward_bound=0.0
82230: loss=0.000, reward_mean=0.0, reward_bound=0.0
82231: loss=0.000, reward_mean=0.1, reward_bound=0.0
82232: loss=0.000, reward_mean=0.0, reward_bound=0.0
82233: loss=0.000, reward_mean=0.0, reward_bound=0.0
82234: loss=0.000, reward_mean=0.1, reward_bound=0.0
82235: loss=0.000, reward_mean=0.0, reward_bound=0.0
82236: loss=0.000, reward_mean=0.1, reward_bound=0.0
82237: loss=0.000, reward_mean=0.2, reward_bound=0.0
82238: loss=0.000, reward_mean=0.0, reward_bound=0.0
82239: loss=0.000, reward_mean=0.1, reward_bound=0.0
82240: loss=0.000, reward_mean=0.1, reward_bound=0.0
82241: loss=0.000, reward_mean=0.0, reward_bound=0.0
82242: loss=0.000, reward_mean=0.0, reward_bound=0.0
82243: loss=0.000, reward_mean=0.0, reward_bound=0.0
82244: loss=0.000, reward_mean=0.1, reward_bound=0.0
82245: loss=0.000, reward_mean=0.1, reward_bound=0.0
82246: loss=0.000, reward_mean=0.1, reward_bou

82386: loss=0.000, reward_mean=0.1, reward_bound=0.0
82387: loss=0.000, reward_mean=0.1, reward_bound=0.0
82388: loss=0.000, reward_mean=0.1, reward_bound=0.0
82389: loss=0.000, reward_mean=0.0, reward_bound=0.0
82390: loss=0.000, reward_mean=0.1, reward_bound=0.0
82391: loss=0.000, reward_mean=0.0, reward_bound=0.0
82392: loss=0.000, reward_mean=0.1, reward_bound=0.0
82393: loss=0.000, reward_mean=0.2, reward_bound=0.0
82394: loss=0.000, reward_mean=0.1, reward_bound=0.0
82395: loss=0.000, reward_mean=0.1, reward_bound=0.0
82396: loss=0.000, reward_mean=0.1, reward_bound=0.0
82397: loss=0.000, reward_mean=0.1, reward_bound=0.0
82398: loss=0.000, reward_mean=0.1, reward_bound=0.0
82399: loss=0.000, reward_mean=0.0, reward_bound=0.0
82400: loss=0.000, reward_mean=0.1, reward_bound=0.0
82401: loss=0.000, reward_mean=0.0, reward_bound=0.0
82402: loss=0.000, reward_mean=0.1, reward_bound=0.0
82403: loss=0.000, reward_mean=0.1, reward_bound=0.0
82404: loss=0.000, reward_mean=0.1, reward_bou

82541: loss=0.000, reward_mean=0.1, reward_bound=0.0
82542: loss=0.000, reward_mean=0.0, reward_bound=0.0
82543: loss=0.000, reward_mean=0.0, reward_bound=0.0
82544: loss=0.000, reward_mean=0.1, reward_bound=0.0
82545: loss=0.000, reward_mean=0.0, reward_bound=0.0
82546: loss=0.000, reward_mean=0.0, reward_bound=0.0
82547: loss=0.000, reward_mean=0.1, reward_bound=0.0
82548: loss=0.000, reward_mean=0.0, reward_bound=0.0
82549: loss=0.000, reward_mean=0.0, reward_bound=0.0
82550: loss=0.000, reward_mean=0.0, reward_bound=0.0
82551: loss=0.000, reward_mean=0.1, reward_bound=0.0
82552: loss=0.000, reward_mean=0.0, reward_bound=0.0
82553: loss=0.000, reward_mean=0.0, reward_bound=0.0
82554: loss=0.000, reward_mean=0.0, reward_bound=0.0
82555: loss=0.000, reward_mean=0.1, reward_bound=0.0
82556: loss=0.000, reward_mean=0.1, reward_bound=0.0
82557: loss=0.000, reward_mean=0.1, reward_bound=0.0
82558: loss=0.000, reward_mean=0.0, reward_bound=0.0
82559: loss=0.000, reward_mean=0.0, reward_bou

82701: loss=0.000, reward_mean=0.0, reward_bound=0.0
82702: loss=0.000, reward_mean=0.0, reward_bound=0.0
82703: loss=0.000, reward_mean=0.0, reward_bound=0.0
82704: loss=0.000, reward_mean=0.1, reward_bound=0.0
82705: loss=0.000, reward_mean=0.1, reward_bound=0.0
82706: loss=0.000, reward_mean=0.0, reward_bound=0.0
82707: loss=0.000, reward_mean=0.1, reward_bound=0.0
82708: loss=0.000, reward_mean=0.1, reward_bound=0.0
82709: loss=0.000, reward_mean=0.1, reward_bound=0.0
82710: loss=0.000, reward_mean=0.1, reward_bound=0.0
82711: loss=0.000, reward_mean=0.1, reward_bound=0.0
82712: loss=0.000, reward_mean=0.0, reward_bound=0.0
82713: loss=0.000, reward_mean=0.0, reward_bound=0.0
82714: loss=0.000, reward_mean=0.1, reward_bound=0.0
82715: loss=0.000, reward_mean=0.1, reward_bound=0.0
82716: loss=0.000, reward_mean=0.1, reward_bound=0.0
82717: loss=0.000, reward_mean=0.0, reward_bound=0.0
82718: loss=0.000, reward_mean=0.1, reward_bound=0.0
82719: loss=0.000, reward_mean=0.0, reward_bou

82860: loss=0.000, reward_mean=0.1, reward_bound=0.0
82861: loss=0.000, reward_mean=0.0, reward_bound=0.0
82862: loss=0.000, reward_mean=0.0, reward_bound=0.0
82863: loss=0.000, reward_mean=0.1, reward_bound=0.0
82864: loss=0.000, reward_mean=0.1, reward_bound=0.0
82865: loss=0.000, reward_mean=0.0, reward_bound=0.0
82866: loss=0.000, reward_mean=0.1, reward_bound=0.0
82867: loss=0.000, reward_mean=0.1, reward_bound=0.0
82868: loss=0.000, reward_mean=0.0, reward_bound=0.0
82869: loss=0.000, reward_mean=0.1, reward_bound=0.0
82870: loss=0.000, reward_mean=0.1, reward_bound=0.0
82871: loss=0.000, reward_mean=0.2, reward_bound=0.0
82872: loss=0.000, reward_mean=0.1, reward_bound=0.0
82873: loss=0.000, reward_mean=0.0, reward_bound=0.0
82874: loss=0.000, reward_mean=0.1, reward_bound=0.0
82875: loss=0.000, reward_mean=0.0, reward_bound=0.0
82876: loss=0.000, reward_mean=0.0, reward_bound=0.0
82877: loss=0.000, reward_mean=0.1, reward_bound=0.0
82878: loss=0.000, reward_mean=0.1, reward_bou

83018: loss=0.000, reward_mean=0.0, reward_bound=0.0
83019: loss=0.000, reward_mean=0.0, reward_bound=0.0
83020: loss=0.000, reward_mean=0.1, reward_bound=0.0
83021: loss=0.000, reward_mean=0.0, reward_bound=0.0
83022: loss=0.000, reward_mean=0.0, reward_bound=0.0
83023: loss=0.000, reward_mean=0.1, reward_bound=0.0
83024: loss=0.000, reward_mean=0.0, reward_bound=0.0
83025: loss=0.000, reward_mean=0.1, reward_bound=0.0
83026: loss=0.000, reward_mean=0.0, reward_bound=0.0
83027: loss=0.000, reward_mean=0.1, reward_bound=0.0
83028: loss=0.000, reward_mean=0.2, reward_bound=0.0
83029: loss=0.000, reward_mean=0.0, reward_bound=0.0
83030: loss=0.000, reward_mean=0.1, reward_bound=0.0
83031: loss=0.000, reward_mean=0.1, reward_bound=0.0
83032: loss=0.000, reward_mean=0.1, reward_bound=0.0
83033: loss=0.000, reward_mean=0.1, reward_bound=0.0
83034: loss=0.000, reward_mean=0.0, reward_bound=0.0
83035: loss=0.000, reward_mean=0.0, reward_bound=0.0
83036: loss=0.000, reward_mean=0.0, reward_bou

83178: loss=0.000, reward_mean=0.0, reward_bound=0.0
83179: loss=0.000, reward_mean=0.1, reward_bound=0.0
83180: loss=0.000, reward_mean=0.0, reward_bound=0.0
83181: loss=0.000, reward_mean=0.1, reward_bound=0.0
83182: loss=0.000, reward_mean=0.0, reward_bound=0.0
83183: loss=0.000, reward_mean=0.1, reward_bound=0.0
83184: loss=0.000, reward_mean=0.2, reward_bound=0.0
83185: loss=0.000, reward_mean=0.0, reward_bound=0.0
83186: loss=0.000, reward_mean=0.0, reward_bound=0.0
83187: loss=0.000, reward_mean=0.1, reward_bound=0.0
83188: loss=0.000, reward_mean=0.0, reward_bound=0.0
83189: loss=0.000, reward_mean=0.0, reward_bound=0.0
83190: loss=0.000, reward_mean=0.0, reward_bound=0.0
83191: loss=0.000, reward_mean=0.1, reward_bound=0.0
83192: loss=0.000, reward_mean=0.0, reward_bound=0.0
83193: loss=0.000, reward_mean=0.0, reward_bound=0.0
83194: loss=0.000, reward_mean=0.0, reward_bound=0.0
83195: loss=0.000, reward_mean=0.0, reward_bound=0.0
83196: loss=0.000, reward_mean=0.0, reward_bou

83338: loss=0.000, reward_mean=0.0, reward_bound=0.0
83339: loss=0.000, reward_mean=0.1, reward_bound=0.0
83340: loss=0.000, reward_mean=0.0, reward_bound=0.0
83341: loss=0.000, reward_mean=0.1, reward_bound=0.0
83342: loss=0.000, reward_mean=0.1, reward_bound=0.0
83343: loss=0.000, reward_mean=0.1, reward_bound=0.0
83344: loss=0.000, reward_mean=0.0, reward_bound=0.0
83345: loss=0.000, reward_mean=0.0, reward_bound=0.0
83346: loss=0.000, reward_mean=0.2, reward_bound=0.0
83347: loss=0.000, reward_mean=0.1, reward_bound=0.0
83348: loss=0.000, reward_mean=0.1, reward_bound=0.0
83349: loss=0.000, reward_mean=0.0, reward_bound=0.0
83350: loss=0.000, reward_mean=0.0, reward_bound=0.0
83351: loss=0.000, reward_mean=0.1, reward_bound=0.0
83352: loss=0.000, reward_mean=0.0, reward_bound=0.0
83353: loss=0.000, reward_mean=0.2, reward_bound=0.0
83354: loss=0.000, reward_mean=0.1, reward_bound=0.0
83355: loss=0.000, reward_mean=0.0, reward_bound=0.0
83356: loss=0.000, reward_mean=0.0, reward_bou

83498: loss=0.000, reward_mean=0.0, reward_bound=0.0
83499: loss=0.000, reward_mean=0.0, reward_bound=0.0
83500: loss=0.000, reward_mean=0.0, reward_bound=0.0
83501: loss=0.000, reward_mean=0.2, reward_bound=0.0
83502: loss=0.000, reward_mean=0.0, reward_bound=0.0
83503: loss=0.000, reward_mean=0.1, reward_bound=0.0
83504: loss=0.000, reward_mean=0.0, reward_bound=0.0
83505: loss=0.000, reward_mean=0.1, reward_bound=0.0
83506: loss=0.000, reward_mean=0.0, reward_bound=0.0
83507: loss=0.000, reward_mean=0.2, reward_bound=0.0
83508: loss=0.000, reward_mean=0.1, reward_bound=0.0
83509: loss=0.000, reward_mean=0.0, reward_bound=0.0
83510: loss=0.000, reward_mean=0.0, reward_bound=0.0
83511: loss=0.000, reward_mean=0.0, reward_bound=0.0
83512: loss=0.000, reward_mean=0.1, reward_bound=0.0
83513: loss=0.000, reward_mean=0.1, reward_bound=0.0
83514: loss=0.000, reward_mean=0.0, reward_bound=0.0
83515: loss=0.000, reward_mean=0.0, reward_bound=0.0
83516: loss=0.000, reward_mean=0.2, reward_bou

83655: loss=0.000, reward_mean=0.1, reward_bound=0.0
83656: loss=0.000, reward_mean=0.0, reward_bound=0.0
83657: loss=0.000, reward_mean=0.2, reward_bound=0.0
83658: loss=0.000, reward_mean=0.1, reward_bound=0.0
83659: loss=0.000, reward_mean=0.1, reward_bound=0.0
83660: loss=0.000, reward_mean=0.0, reward_bound=0.0
83661: loss=0.000, reward_mean=0.1, reward_bound=0.0
83662: loss=0.000, reward_mean=0.1, reward_bound=0.0
83663: loss=0.000, reward_mean=0.0, reward_bound=0.0
83664: loss=0.000, reward_mean=0.1, reward_bound=0.0
83665: loss=0.000, reward_mean=0.1, reward_bound=0.0
83666: loss=0.000, reward_mean=0.1, reward_bound=0.0
83667: loss=0.000, reward_mean=0.1, reward_bound=0.0
83668: loss=0.000, reward_mean=0.0, reward_bound=0.0
83669: loss=0.000, reward_mean=0.0, reward_bound=0.0
83670: loss=0.000, reward_mean=0.0, reward_bound=0.0
83671: loss=0.000, reward_mean=0.0, reward_bound=0.0
83672: loss=0.000, reward_mean=0.0, reward_bound=0.0
83673: loss=0.000, reward_mean=0.1, reward_bou

83813: loss=0.000, reward_mean=0.0, reward_bound=0.0
83814: loss=0.000, reward_mean=0.1, reward_bound=0.0
83815: loss=0.000, reward_mean=0.0, reward_bound=0.0
83816: loss=0.000, reward_mean=0.1, reward_bound=0.0
83817: loss=0.000, reward_mean=0.1, reward_bound=0.0
83818: loss=0.000, reward_mean=0.0, reward_bound=0.0
83819: loss=0.000, reward_mean=0.0, reward_bound=0.0
83820: loss=0.000, reward_mean=0.0, reward_bound=0.0
83821: loss=0.000, reward_mean=0.2, reward_bound=0.0
83822: loss=0.000, reward_mean=0.1, reward_bound=0.0
83823: loss=0.000, reward_mean=0.2, reward_bound=0.0
83824: loss=0.000, reward_mean=0.1, reward_bound=0.0
83825: loss=0.000, reward_mean=0.1, reward_bound=0.0
83826: loss=0.000, reward_mean=0.0, reward_bound=0.0
83827: loss=0.000, reward_mean=0.0, reward_bound=0.0
83828: loss=0.000, reward_mean=0.0, reward_bound=0.0
83829: loss=0.000, reward_mean=0.0, reward_bound=0.0
83830: loss=0.000, reward_mean=0.1, reward_bound=0.0
83831: loss=0.000, reward_mean=0.1, reward_bou

83971: loss=0.000, reward_mean=0.1, reward_bound=0.0
83972: loss=0.000, reward_mean=0.0, reward_bound=0.0
83973: loss=0.000, reward_mean=0.0, reward_bound=0.0
83974: loss=0.000, reward_mean=0.0, reward_bound=0.0
83975: loss=0.000, reward_mean=0.1, reward_bound=0.0
83976: loss=0.000, reward_mean=0.1, reward_bound=0.0
83977: loss=0.000, reward_mean=0.0, reward_bound=0.0
83978: loss=0.000, reward_mean=0.1, reward_bound=0.0
83979: loss=0.000, reward_mean=0.1, reward_bound=0.0
83980: loss=0.000, reward_mean=0.0, reward_bound=0.0
83981: loss=0.000, reward_mean=0.0, reward_bound=0.0
83982: loss=0.000, reward_mean=0.1, reward_bound=0.0
83983: loss=0.000, reward_mean=0.1, reward_bound=0.0
83984: loss=0.000, reward_mean=0.0, reward_bound=0.0
83985: loss=0.000, reward_mean=0.1, reward_bound=0.0
83986: loss=0.000, reward_mean=0.0, reward_bound=0.0
83987: loss=0.000, reward_mean=0.0, reward_bound=0.0
83988: loss=0.000, reward_mean=0.0, reward_bound=0.0
83989: loss=0.000, reward_mean=0.0, reward_bou

84133: loss=0.000, reward_mean=0.1, reward_bound=0.0
84134: loss=0.000, reward_mean=0.0, reward_bound=0.0
84135: loss=0.000, reward_mean=0.0, reward_bound=0.0
84136: loss=0.000, reward_mean=0.0, reward_bound=0.0
84137: loss=0.000, reward_mean=0.1, reward_bound=0.0
84138: loss=0.000, reward_mean=0.1, reward_bound=0.0
84139: loss=0.000, reward_mean=0.1, reward_bound=0.0
84140: loss=0.000, reward_mean=0.0, reward_bound=0.0
84141: loss=0.000, reward_mean=0.0, reward_bound=0.0
84142: loss=0.000, reward_mean=0.0, reward_bound=0.0
84143: loss=0.000, reward_mean=0.0, reward_bound=0.0
84144: loss=0.000, reward_mean=0.0, reward_bound=0.0
84145: loss=0.000, reward_mean=0.1, reward_bound=0.0
84146: loss=0.000, reward_mean=0.0, reward_bound=0.0
84147: loss=0.000, reward_mean=0.1, reward_bound=0.0
84148: loss=0.000, reward_mean=0.0, reward_bound=0.0
84149: loss=0.000, reward_mean=0.0, reward_bound=0.0
84150: loss=0.000, reward_mean=0.1, reward_bound=0.0
84151: loss=0.000, reward_mean=0.1, reward_bou

84295: loss=0.000, reward_mean=0.1, reward_bound=0.0
84296: loss=0.000, reward_mean=0.1, reward_bound=0.0
84297: loss=0.000, reward_mean=0.0, reward_bound=0.0
84298: loss=0.000, reward_mean=0.1, reward_bound=0.0
84299: loss=0.000, reward_mean=0.1, reward_bound=0.0
84300: loss=0.000, reward_mean=0.0, reward_bound=0.0
84301: loss=0.000, reward_mean=0.0, reward_bound=0.0
84302: loss=0.000, reward_mean=0.0, reward_bound=0.0
84303: loss=0.000, reward_mean=0.0, reward_bound=0.0
84304: loss=0.000, reward_mean=0.0, reward_bound=0.0
84305: loss=0.000, reward_mean=0.0, reward_bound=0.0
84306: loss=0.000, reward_mean=0.1, reward_bound=0.0
84307: loss=0.000, reward_mean=0.0, reward_bound=0.0
84308: loss=0.000, reward_mean=0.0, reward_bound=0.0
84309: loss=0.000, reward_mean=0.0, reward_bound=0.0
84310: loss=0.000, reward_mean=0.0, reward_bound=0.0
84311: loss=0.000, reward_mean=0.1, reward_bound=0.0
84312: loss=0.000, reward_mean=0.1, reward_bound=0.0
84313: loss=0.000, reward_mean=0.1, reward_bou

84455: loss=0.000, reward_mean=0.1, reward_bound=0.0
84456: loss=0.000, reward_mean=0.1, reward_bound=0.0
84457: loss=0.000, reward_mean=0.0, reward_bound=0.0
84458: loss=0.000, reward_mean=0.0, reward_bound=0.0
84459: loss=0.000, reward_mean=0.1, reward_bound=0.0
84460: loss=0.000, reward_mean=0.1, reward_bound=0.0
84461: loss=0.000, reward_mean=0.0, reward_bound=0.0
84462: loss=0.000, reward_mean=0.0, reward_bound=0.0
84463: loss=0.000, reward_mean=0.0, reward_bound=0.0
84464: loss=0.000, reward_mean=0.0, reward_bound=0.0
84465: loss=0.000, reward_mean=0.1, reward_bound=0.0
84466: loss=0.000, reward_mean=0.0, reward_bound=0.0
84467: loss=0.000, reward_mean=0.1, reward_bound=0.0
84468: loss=0.000, reward_mean=0.1, reward_bound=0.0
84469: loss=0.000, reward_mean=0.2, reward_bound=0.0
84470: loss=0.000, reward_mean=0.1, reward_bound=0.0
84471: loss=0.000, reward_mean=0.0, reward_bound=0.0
84472: loss=0.000, reward_mean=0.0, reward_bound=0.0
84473: loss=0.000, reward_mean=0.2, reward_bou

84612: loss=0.000, reward_mean=0.0, reward_bound=0.0
84613: loss=0.000, reward_mean=0.1, reward_bound=0.0
84614: loss=0.000, reward_mean=0.0, reward_bound=0.0
84615: loss=0.000, reward_mean=0.0, reward_bound=0.0
84616: loss=0.000, reward_mean=0.0, reward_bound=0.0
84617: loss=0.000, reward_mean=0.1, reward_bound=0.0
84618: loss=0.000, reward_mean=0.1, reward_bound=0.0
84619: loss=0.000, reward_mean=0.0, reward_bound=0.0
84620: loss=0.000, reward_mean=0.1, reward_bound=0.0
84621: loss=0.000, reward_mean=0.0, reward_bound=0.0
84622: loss=0.000, reward_mean=0.1, reward_bound=0.0
84623: loss=0.000, reward_mean=0.1, reward_bound=0.0
84624: loss=0.000, reward_mean=0.2, reward_bound=0.0
84625: loss=0.000, reward_mean=0.0, reward_bound=0.0
84626: loss=0.000, reward_mean=0.1, reward_bound=0.0
84627: loss=0.000, reward_mean=0.1, reward_bound=0.0
84628: loss=0.000, reward_mean=0.1, reward_bound=0.0
84629: loss=0.000, reward_mean=0.0, reward_bound=0.0
84630: loss=0.000, reward_mean=0.0, reward_bou

84768: loss=0.000, reward_mean=0.0, reward_bound=0.0
84769: loss=0.000, reward_mean=0.0, reward_bound=0.0
84770: loss=0.000, reward_mean=0.1, reward_bound=0.0
84771: loss=0.000, reward_mean=0.1, reward_bound=0.0
84772: loss=0.000, reward_mean=0.0, reward_bound=0.0
84773: loss=0.000, reward_mean=0.1, reward_bound=0.0
84774: loss=0.000, reward_mean=0.0, reward_bound=0.0
84775: loss=0.000, reward_mean=0.1, reward_bound=0.0
84776: loss=0.000, reward_mean=0.0, reward_bound=0.0
84777: loss=0.000, reward_mean=0.1, reward_bound=0.0
84778: loss=0.000, reward_mean=0.0, reward_bound=0.0
84779: loss=0.000, reward_mean=0.0, reward_bound=0.0
84780: loss=0.000, reward_mean=0.1, reward_bound=0.0
84781: loss=0.000, reward_mean=0.0, reward_bound=0.0
84782: loss=0.000, reward_mean=0.0, reward_bound=0.0
84783: loss=0.000, reward_mean=0.0, reward_bound=0.0
84784: loss=0.000, reward_mean=0.0, reward_bound=0.0
84785: loss=0.000, reward_mean=0.0, reward_bound=0.0
84786: loss=0.000, reward_mean=0.1, reward_bou

84925: loss=0.000, reward_mean=0.0, reward_bound=0.0
84926: loss=0.000, reward_mean=0.0, reward_bound=0.0
84927: loss=0.000, reward_mean=0.1, reward_bound=0.0
84928: loss=0.000, reward_mean=0.0, reward_bound=0.0
84929: loss=0.000, reward_mean=0.1, reward_bound=0.0
84930: loss=0.000, reward_mean=0.0, reward_bound=0.0
84931: loss=0.000, reward_mean=0.0, reward_bound=0.0
84932: loss=0.000, reward_mean=0.1, reward_bound=0.0
84933: loss=0.000, reward_mean=0.0, reward_bound=0.0
84934: loss=0.000, reward_mean=0.0, reward_bound=0.0
84935: loss=0.000, reward_mean=0.1, reward_bound=0.0
84936: loss=0.000, reward_mean=0.1, reward_bound=0.0
84937: loss=0.000, reward_mean=0.0, reward_bound=0.0
84938: loss=0.000, reward_mean=0.0, reward_bound=0.0
84939: loss=0.000, reward_mean=0.1, reward_bound=0.0
84940: loss=0.000, reward_mean=0.1, reward_bound=0.0
84941: loss=0.000, reward_mean=0.0, reward_bound=0.0
84942: loss=0.000, reward_mean=0.1, reward_bound=0.0
84943: loss=0.000, reward_mean=0.1, reward_bou

85085: loss=0.000, reward_mean=0.0, reward_bound=0.0
85086: loss=0.000, reward_mean=0.1, reward_bound=0.0
85087: loss=0.000, reward_mean=0.1, reward_bound=0.0
85088: loss=0.000, reward_mean=0.0, reward_bound=0.0
85089: loss=0.000, reward_mean=0.0, reward_bound=0.0
85090: loss=0.000, reward_mean=0.0, reward_bound=0.0
85091: loss=0.000, reward_mean=0.1, reward_bound=0.0
85092: loss=0.000, reward_mean=0.0, reward_bound=0.0
85093: loss=0.000, reward_mean=0.1, reward_bound=0.0
85094: loss=0.000, reward_mean=0.0, reward_bound=0.0
85095: loss=0.000, reward_mean=0.2, reward_bound=0.0
85096: loss=0.000, reward_mean=0.1, reward_bound=0.0
85097: loss=0.000, reward_mean=0.1, reward_bound=0.0
85098: loss=0.000, reward_mean=0.1, reward_bound=0.0
85099: loss=0.000, reward_mean=0.0, reward_bound=0.0
85100: loss=0.000, reward_mean=0.1, reward_bound=0.0
85101: loss=0.000, reward_mean=0.0, reward_bound=0.0
85102: loss=0.000, reward_mean=0.1, reward_bound=0.0
85103: loss=0.000, reward_mean=0.0, reward_bou

85240: loss=0.000, reward_mean=0.0, reward_bound=0.0
85241: loss=0.000, reward_mean=0.1, reward_bound=0.0
85242: loss=0.000, reward_mean=0.1, reward_bound=0.0
85243: loss=0.000, reward_mean=0.0, reward_bound=0.0
85244: loss=0.000, reward_mean=0.0, reward_bound=0.0
85245: loss=0.000, reward_mean=0.1, reward_bound=0.0
85246: loss=0.000, reward_mean=0.1, reward_bound=0.0
85247: loss=0.000, reward_mean=0.1, reward_bound=0.0
85248: loss=0.000, reward_mean=0.1, reward_bound=0.0
85249: loss=0.000, reward_mean=0.0, reward_bound=0.0
85250: loss=0.000, reward_mean=0.1, reward_bound=0.0
85251: loss=0.000, reward_mean=0.0, reward_bound=0.0
85252: loss=0.000, reward_mean=0.0, reward_bound=0.0
85253: loss=0.000, reward_mean=0.0, reward_bound=0.0
85254: loss=0.000, reward_mean=0.1, reward_bound=0.0
85255: loss=0.000, reward_mean=0.2, reward_bound=0.0
85256: loss=0.000, reward_mean=0.1, reward_bound=0.0
85257: loss=0.000, reward_mean=0.0, reward_bound=0.0
85258: loss=0.000, reward_mean=0.1, reward_bou

85395: loss=0.000, reward_mean=0.1, reward_bound=0.0
85396: loss=0.000, reward_mean=0.1, reward_bound=0.0
85397: loss=0.000, reward_mean=0.0, reward_bound=0.0
85398: loss=0.000, reward_mean=0.2, reward_bound=0.0
85399: loss=0.000, reward_mean=0.1, reward_bound=0.0
85400: loss=0.000, reward_mean=0.0, reward_bound=0.0
85401: loss=0.000, reward_mean=0.1, reward_bound=0.0
85402: loss=0.000, reward_mean=0.1, reward_bound=0.0
85403: loss=0.000, reward_mean=0.1, reward_bound=0.0
85404: loss=0.000, reward_mean=0.1, reward_bound=0.0
85405: loss=0.000, reward_mean=0.0, reward_bound=0.0
85406: loss=0.000, reward_mean=0.0, reward_bound=0.0
85407: loss=0.000, reward_mean=0.0, reward_bound=0.0
85408: loss=0.000, reward_mean=0.1, reward_bound=0.0
85409: loss=0.000, reward_mean=0.2, reward_bound=0.0
85410: loss=0.000, reward_mean=0.1, reward_bound=0.0
85411: loss=0.000, reward_mean=0.0, reward_bound=0.0
85412: loss=0.000, reward_mean=0.0, reward_bound=0.0
85413: loss=0.000, reward_mean=0.1, reward_bou

85550: loss=0.000, reward_mean=0.1, reward_bound=0.0
85551: loss=0.000, reward_mean=0.3, reward_bound=0.5
85552: loss=0.000, reward_mean=0.0, reward_bound=0.0
85553: loss=0.000, reward_mean=0.0, reward_bound=0.0
85554: loss=0.000, reward_mean=0.1, reward_bound=0.0
85555: loss=0.000, reward_mean=0.1, reward_bound=0.0
85556: loss=0.000, reward_mean=0.1, reward_bound=0.0
85557: loss=0.000, reward_mean=0.2, reward_bound=0.0
85558: loss=0.000, reward_mean=0.0, reward_bound=0.0
85559: loss=0.000, reward_mean=0.0, reward_bound=0.0
85560: loss=0.000, reward_mean=0.1, reward_bound=0.0
85561: loss=0.000, reward_mean=0.1, reward_bound=0.0
85562: loss=0.000, reward_mean=0.0, reward_bound=0.0
85563: loss=0.000, reward_mean=0.0, reward_bound=0.0
85564: loss=0.000, reward_mean=0.0, reward_bound=0.0
85565: loss=0.000, reward_mean=0.1, reward_bound=0.0
85566: loss=0.000, reward_mean=0.1, reward_bound=0.0
85567: loss=0.000, reward_mean=0.1, reward_bound=0.0
85568: loss=0.000, reward_mean=0.0, reward_bou

85708: loss=0.000, reward_mean=0.1, reward_bound=0.0
85709: loss=0.000, reward_mean=0.1, reward_bound=0.0
85710: loss=0.000, reward_mean=0.0, reward_bound=0.0
85711: loss=0.000, reward_mean=0.0, reward_bound=0.0
85712: loss=0.000, reward_mean=0.0, reward_bound=0.0
85713: loss=0.000, reward_mean=0.1, reward_bound=0.0
85714: loss=0.000, reward_mean=0.0, reward_bound=0.0
85715: loss=0.000, reward_mean=0.1, reward_bound=0.0
85716: loss=0.000, reward_mean=0.0, reward_bound=0.0
85717: loss=0.000, reward_mean=0.0, reward_bound=0.0
85718: loss=0.000, reward_mean=0.1, reward_bound=0.0
85719: loss=0.000, reward_mean=0.1, reward_bound=0.0
85720: loss=0.000, reward_mean=0.0, reward_bound=0.0
85721: loss=0.000, reward_mean=0.0, reward_bound=0.0
85722: loss=0.000, reward_mean=0.1, reward_bound=0.0
85723: loss=0.000, reward_mean=0.0, reward_bound=0.0
85724: loss=0.000, reward_mean=0.0, reward_bound=0.0
85725: loss=0.000, reward_mean=0.1, reward_bound=0.0
85726: loss=0.000, reward_mean=0.2, reward_bou

85863: loss=0.000, reward_mean=0.1, reward_bound=0.0
85864: loss=0.000, reward_mean=0.1, reward_bound=0.0
85865: loss=0.000, reward_mean=0.1, reward_bound=0.0
85866: loss=0.000, reward_mean=0.0, reward_bound=0.0
85867: loss=0.000, reward_mean=0.1, reward_bound=0.0
85868: loss=0.000, reward_mean=0.0, reward_bound=0.0
85869: loss=0.000, reward_mean=0.1, reward_bound=0.0
85870: loss=0.000, reward_mean=0.2, reward_bound=0.0
85871: loss=0.000, reward_mean=0.1, reward_bound=0.0
85872: loss=0.000, reward_mean=0.2, reward_bound=0.0
85873: loss=0.000, reward_mean=0.0, reward_bound=0.0
85874: loss=0.000, reward_mean=0.1, reward_bound=0.0
85875: loss=0.000, reward_mean=0.1, reward_bound=0.0
85876: loss=0.000, reward_mean=0.1, reward_bound=0.0
85877: loss=0.000, reward_mean=0.0, reward_bound=0.0
85878: loss=0.000, reward_mean=0.0, reward_bound=0.0
85879: loss=0.000, reward_mean=0.1, reward_bound=0.0
85880: loss=0.000, reward_mean=0.1, reward_bound=0.0
85881: loss=0.000, reward_mean=0.1, reward_bou

86018: loss=0.000, reward_mean=0.1, reward_bound=0.0
86019: loss=0.000, reward_mean=0.1, reward_bound=0.0
86020: loss=0.000, reward_mean=0.0, reward_bound=0.0
86021: loss=0.000, reward_mean=0.1, reward_bound=0.0
86022: loss=0.000, reward_mean=0.0, reward_bound=0.0
86023: loss=0.000, reward_mean=0.1, reward_bound=0.0
86024: loss=0.000, reward_mean=0.0, reward_bound=0.0
86025: loss=0.000, reward_mean=0.0, reward_bound=0.0
86026: loss=0.000, reward_mean=0.1, reward_bound=0.0
86027: loss=0.000, reward_mean=0.2, reward_bound=0.0
86028: loss=0.000, reward_mean=0.1, reward_bound=0.0
86029: loss=0.000, reward_mean=0.2, reward_bound=0.0
86030: loss=0.000, reward_mean=0.0, reward_bound=0.0
86031: loss=0.000, reward_mean=0.1, reward_bound=0.0
86032: loss=0.000, reward_mean=0.0, reward_bound=0.0
86033: loss=0.000, reward_mean=0.0, reward_bound=0.0
86034: loss=0.000, reward_mean=0.1, reward_bound=0.0
86035: loss=0.000, reward_mean=0.0, reward_bound=0.0
86036: loss=0.000, reward_mean=0.1, reward_bou

86177: loss=0.000, reward_mean=0.1, reward_bound=0.0
86178: loss=0.000, reward_mean=0.0, reward_bound=0.0
86179: loss=0.000, reward_mean=0.0, reward_bound=0.0
86180: loss=0.000, reward_mean=0.1, reward_bound=0.0
86181: loss=0.000, reward_mean=0.0, reward_bound=0.0
86182: loss=0.000, reward_mean=0.0, reward_bound=0.0
86183: loss=0.000, reward_mean=0.1, reward_bound=0.0
86184: loss=0.000, reward_mean=0.1, reward_bound=0.0
86185: loss=0.000, reward_mean=0.0, reward_bound=0.0
86186: loss=0.000, reward_mean=0.0, reward_bound=0.0
86187: loss=0.000, reward_mean=0.1, reward_bound=0.0
86188: loss=0.000, reward_mean=0.1, reward_bound=0.0
86189: loss=0.000, reward_mean=0.0, reward_bound=0.0
86190: loss=0.000, reward_mean=0.1, reward_bound=0.0
86191: loss=0.000, reward_mean=0.0, reward_bound=0.0
86192: loss=0.000, reward_mean=0.1, reward_bound=0.0
86193: loss=0.000, reward_mean=0.1, reward_bound=0.0
86194: loss=0.000, reward_mean=0.1, reward_bound=0.0
86195: loss=0.000, reward_mean=0.1, reward_bou

86336: loss=0.000, reward_mean=0.1, reward_bound=0.0
86337: loss=0.000, reward_mean=0.0, reward_bound=0.0
86338: loss=0.000, reward_mean=0.0, reward_bound=0.0
86339: loss=0.000, reward_mean=0.1, reward_bound=0.0
86340: loss=0.000, reward_mean=0.1, reward_bound=0.0
86341: loss=0.000, reward_mean=0.1, reward_bound=0.0
86342: loss=0.000, reward_mean=0.1, reward_bound=0.0
86343: loss=0.000, reward_mean=0.1, reward_bound=0.0
86344: loss=0.000, reward_mean=0.2, reward_bound=0.0
86345: loss=0.000, reward_mean=0.0, reward_bound=0.0
86346: loss=0.000, reward_mean=0.0, reward_bound=0.0
86347: loss=0.000, reward_mean=0.0, reward_bound=0.0
86348: loss=0.000, reward_mean=0.1, reward_bound=0.0
86349: loss=0.000, reward_mean=0.1, reward_bound=0.0
86350: loss=0.000, reward_mean=0.1, reward_bound=0.0
86351: loss=0.000, reward_mean=0.0, reward_bound=0.0
86352: loss=0.000, reward_mean=0.1, reward_bound=0.0
86353: loss=0.000, reward_mean=0.0, reward_bound=0.0
86354: loss=0.000, reward_mean=0.1, reward_bou

86493: loss=0.000, reward_mean=0.0, reward_bound=0.0
86494: loss=0.000, reward_mean=0.0, reward_bound=0.0
86495: loss=0.000, reward_mean=0.0, reward_bound=0.0
86496: loss=0.000, reward_mean=0.0, reward_bound=0.0
86497: loss=0.000, reward_mean=0.0, reward_bound=0.0
86498: loss=0.000, reward_mean=0.1, reward_bound=0.0
86499: loss=0.000, reward_mean=0.1, reward_bound=0.0
86500: loss=0.000, reward_mean=0.0, reward_bound=0.0
86501: loss=0.000, reward_mean=0.0, reward_bound=0.0
86502: loss=0.000, reward_mean=0.0, reward_bound=0.0
86503: loss=0.000, reward_mean=0.2, reward_bound=0.0
86504: loss=0.000, reward_mean=0.1, reward_bound=0.0
86505: loss=0.000, reward_mean=0.1, reward_bound=0.0
86506: loss=0.000, reward_mean=0.0, reward_bound=0.0
86507: loss=0.000, reward_mean=0.1, reward_bound=0.0
86508: loss=0.000, reward_mean=0.1, reward_bound=0.0
86509: loss=0.000, reward_mean=0.1, reward_bound=0.0
86510: loss=0.000, reward_mean=0.0, reward_bound=0.0
86511: loss=0.000, reward_mean=0.0, reward_bou

86651: loss=0.000, reward_mean=0.0, reward_bound=0.0
86652: loss=0.000, reward_mean=0.0, reward_bound=0.0
86653: loss=0.000, reward_mean=0.1, reward_bound=0.0
86654: loss=0.000, reward_mean=0.0, reward_bound=0.0
86655: loss=0.000, reward_mean=0.0, reward_bound=0.0
86656: loss=0.000, reward_mean=0.1, reward_bound=0.0
86657: loss=0.000, reward_mean=0.1, reward_bound=0.0
86658: loss=0.000, reward_mean=0.1, reward_bound=0.0
86659: loss=0.000, reward_mean=0.0, reward_bound=0.0
86660: loss=0.000, reward_mean=0.0, reward_bound=0.0
86661: loss=0.000, reward_mean=0.1, reward_bound=0.0
86662: loss=0.000, reward_mean=0.1, reward_bound=0.0
86663: loss=0.000, reward_mean=0.0, reward_bound=0.0
86664: loss=0.000, reward_mean=0.1, reward_bound=0.0
86665: loss=0.000, reward_mean=0.0, reward_bound=0.0
86666: loss=0.000, reward_mean=0.1, reward_bound=0.0
86667: loss=0.000, reward_mean=0.1, reward_bound=0.0
86668: loss=0.000, reward_mean=0.1, reward_bound=0.0
86669: loss=0.000, reward_mean=0.1, reward_bou

86810: loss=0.000, reward_mean=0.1, reward_bound=0.0
86811: loss=0.000, reward_mean=0.1, reward_bound=0.0
86812: loss=0.000, reward_mean=0.1, reward_bound=0.0
86813: loss=0.000, reward_mean=0.1, reward_bound=0.0
86814: loss=0.000, reward_mean=0.0, reward_bound=0.0
86815: loss=0.000, reward_mean=0.0, reward_bound=0.0
86816: loss=0.000, reward_mean=0.1, reward_bound=0.0
86817: loss=0.000, reward_mean=0.1, reward_bound=0.0
86818: loss=0.000, reward_mean=0.0, reward_bound=0.0
86819: loss=0.000, reward_mean=0.0, reward_bound=0.0
86820: loss=0.000, reward_mean=0.0, reward_bound=0.0
86821: loss=0.000, reward_mean=0.1, reward_bound=0.0
86822: loss=0.000, reward_mean=0.1, reward_bound=0.0
86823: loss=0.000, reward_mean=0.0, reward_bound=0.0
86824: loss=0.000, reward_mean=0.0, reward_bound=0.0
86825: loss=0.000, reward_mean=0.1, reward_bound=0.0
86826: loss=0.000, reward_mean=0.1, reward_bound=0.0
86827: loss=0.000, reward_mean=0.1, reward_bound=0.0
86828: loss=0.000, reward_mean=0.0, reward_bou

86966: loss=0.000, reward_mean=0.1, reward_bound=0.0
86967: loss=0.000, reward_mean=0.1, reward_bound=0.0
86968: loss=0.000, reward_mean=0.0, reward_bound=0.0
86969: loss=0.000, reward_mean=0.1, reward_bound=0.0
86970: loss=0.000, reward_mean=0.0, reward_bound=0.0
86971: loss=0.000, reward_mean=0.0, reward_bound=0.0
86972: loss=0.000, reward_mean=0.1, reward_bound=0.0
86973: loss=0.000, reward_mean=0.0, reward_bound=0.0
86974: loss=0.000, reward_mean=0.1, reward_bound=0.0
86975: loss=0.000, reward_mean=0.1, reward_bound=0.0
86976: loss=0.000, reward_mean=0.0, reward_bound=0.0
86977: loss=0.000, reward_mean=0.0, reward_bound=0.0
86978: loss=0.000, reward_mean=0.1, reward_bound=0.0
86979: loss=0.000, reward_mean=0.1, reward_bound=0.0
86980: loss=0.000, reward_mean=0.0, reward_bound=0.0
86981: loss=0.000, reward_mean=0.0, reward_bound=0.0
86982: loss=0.000, reward_mean=0.0, reward_bound=0.0
86983: loss=0.000, reward_mean=0.2, reward_bound=0.0
86984: loss=0.000, reward_mean=0.0, reward_bou

87125: loss=0.000, reward_mean=0.1, reward_bound=0.0
87126: loss=0.000, reward_mean=0.0, reward_bound=0.0
87127: loss=0.000, reward_mean=0.1, reward_bound=0.0
87128: loss=0.000, reward_mean=0.1, reward_bound=0.0
87129: loss=0.000, reward_mean=0.1, reward_bound=0.0
87130: loss=0.000, reward_mean=0.0, reward_bound=0.0
87131: loss=0.000, reward_mean=0.1, reward_bound=0.0
87132: loss=0.000, reward_mean=0.0, reward_bound=0.0
87133: loss=0.000, reward_mean=0.0, reward_bound=0.0
87134: loss=0.000, reward_mean=0.1, reward_bound=0.0
87135: loss=0.000, reward_mean=0.1, reward_bound=0.0
87136: loss=0.000, reward_mean=0.1, reward_bound=0.0
87137: loss=0.000, reward_mean=0.1, reward_bound=0.0
87138: loss=0.000, reward_mean=0.0, reward_bound=0.0
87139: loss=0.000, reward_mean=0.0, reward_bound=0.0
87140: loss=0.000, reward_mean=0.1, reward_bound=0.0
87141: loss=0.000, reward_mean=0.1, reward_bound=0.0
87142: loss=0.000, reward_mean=0.1, reward_bound=0.0
87143: loss=0.000, reward_mean=0.1, reward_bou

87283: loss=0.000, reward_mean=0.2, reward_bound=0.0
87284: loss=0.000, reward_mean=0.1, reward_bound=0.0
87285: loss=0.000, reward_mean=0.1, reward_bound=0.0
87286: loss=0.000, reward_mean=0.1, reward_bound=0.0
87287: loss=0.000, reward_mean=0.0, reward_bound=0.0
87288: loss=0.000, reward_mean=0.0, reward_bound=0.0
87289: loss=0.000, reward_mean=0.1, reward_bound=0.0
87290: loss=0.000, reward_mean=0.0, reward_bound=0.0
87291: loss=0.000, reward_mean=0.1, reward_bound=0.0
87292: loss=0.000, reward_mean=0.1, reward_bound=0.0
87293: loss=0.000, reward_mean=0.1, reward_bound=0.0
87294: loss=0.000, reward_mean=0.1, reward_bound=0.0
87295: loss=0.000, reward_mean=0.1, reward_bound=0.0
87296: loss=0.000, reward_mean=0.1, reward_bound=0.0
87297: loss=0.000, reward_mean=0.1, reward_bound=0.0
87298: loss=0.000, reward_mean=0.1, reward_bound=0.0
87299: loss=0.000, reward_mean=0.0, reward_bound=0.0
87300: loss=0.000, reward_mean=0.1, reward_bound=0.0
87301: loss=0.000, reward_mean=0.0, reward_bou

87440: loss=0.000, reward_mean=0.0, reward_bound=0.0
87441: loss=0.000, reward_mean=0.1, reward_bound=0.0
87442: loss=0.000, reward_mean=0.1, reward_bound=0.0
87443: loss=0.000, reward_mean=0.1, reward_bound=0.0
87444: loss=0.000, reward_mean=0.1, reward_bound=0.0
87445: loss=0.000, reward_mean=0.1, reward_bound=0.0
87446: loss=0.000, reward_mean=0.1, reward_bound=0.0
87447: loss=0.000, reward_mean=0.0, reward_bound=0.0
87448: loss=0.000, reward_mean=0.1, reward_bound=0.0
87449: loss=0.000, reward_mean=0.0, reward_bound=0.0
87450: loss=0.000, reward_mean=0.1, reward_bound=0.0
87451: loss=0.000, reward_mean=0.1, reward_bound=0.0
87452: loss=0.000, reward_mean=0.1, reward_bound=0.0
87453: loss=0.000, reward_mean=0.1, reward_bound=0.0
87454: loss=0.000, reward_mean=0.0, reward_bound=0.0
87455: loss=0.000, reward_mean=0.1, reward_bound=0.0
87456: loss=0.000, reward_mean=0.1, reward_bound=0.0
87457: loss=0.000, reward_mean=0.0, reward_bound=0.0
87458: loss=0.000, reward_mean=0.1, reward_bou

87595: loss=0.000, reward_mean=0.1, reward_bound=0.0
87596: loss=0.000, reward_mean=0.1, reward_bound=0.0
87597: loss=0.000, reward_mean=0.1, reward_bound=0.0
87598: loss=0.000, reward_mean=0.0, reward_bound=0.0
87599: loss=0.000, reward_mean=0.1, reward_bound=0.0
87600: loss=0.000, reward_mean=0.1, reward_bound=0.0
87601: loss=0.000, reward_mean=0.0, reward_bound=0.0
87602: loss=0.000, reward_mean=0.0, reward_bound=0.0
87603: loss=0.000, reward_mean=0.1, reward_bound=0.0
87604: loss=0.000, reward_mean=0.1, reward_bound=0.0
87605: loss=0.000, reward_mean=0.1, reward_bound=0.0
87606: loss=0.000, reward_mean=0.0, reward_bound=0.0
87607: loss=0.000, reward_mean=0.1, reward_bound=0.0
87608: loss=0.000, reward_mean=0.1, reward_bound=0.0
87609: loss=0.000, reward_mean=0.1, reward_bound=0.0
87610: loss=0.000, reward_mean=0.0, reward_bound=0.0
87611: loss=0.000, reward_mean=0.0, reward_bound=0.0
87612: loss=0.000, reward_mean=0.0, reward_bound=0.0
87613: loss=0.000, reward_mean=0.0, reward_bou

87751: loss=0.000, reward_mean=0.0, reward_bound=0.0
87752: loss=0.000, reward_mean=0.0, reward_bound=0.0
87753: loss=0.000, reward_mean=0.1, reward_bound=0.0
87754: loss=0.000, reward_mean=0.1, reward_bound=0.0
87755: loss=0.000, reward_mean=0.1, reward_bound=0.0
87756: loss=0.000, reward_mean=0.0, reward_bound=0.0
87757: loss=0.000, reward_mean=0.0, reward_bound=0.0
87758: loss=0.000, reward_mean=0.1, reward_bound=0.0
87759: loss=0.000, reward_mean=0.0, reward_bound=0.0
87760: loss=0.000, reward_mean=0.1, reward_bound=0.0
87761: loss=0.000, reward_mean=0.1, reward_bound=0.0
87762: loss=0.000, reward_mean=0.1, reward_bound=0.0
87763: loss=0.000, reward_mean=0.1, reward_bound=0.0
87764: loss=0.000, reward_mean=0.1, reward_bound=0.0
87765: loss=0.000, reward_mean=0.0, reward_bound=0.0
87766: loss=0.000, reward_mean=0.0, reward_bound=0.0
87767: loss=0.000, reward_mean=0.1, reward_bound=0.0
87768: loss=0.000, reward_mean=0.0, reward_bound=0.0
87769: loss=0.000, reward_mean=0.0, reward_bou

87910: loss=0.000, reward_mean=0.1, reward_bound=0.0
87911: loss=0.000, reward_mean=0.1, reward_bound=0.0
87912: loss=0.000, reward_mean=0.1, reward_bound=0.0
87913: loss=0.000, reward_mean=0.0, reward_bound=0.0
87914: loss=0.000, reward_mean=0.0, reward_bound=0.0
87915: loss=0.000, reward_mean=0.0, reward_bound=0.0
87916: loss=0.000, reward_mean=0.1, reward_bound=0.0
87917: loss=0.000, reward_mean=0.1, reward_bound=0.0
87918: loss=0.000, reward_mean=0.0, reward_bound=0.0
87919: loss=0.000, reward_mean=0.0, reward_bound=0.0
87920: loss=0.000, reward_mean=0.1, reward_bound=0.0
87921: loss=0.000, reward_mean=0.0, reward_bound=0.0
87922: loss=0.000, reward_mean=0.1, reward_bound=0.0
87923: loss=0.000, reward_mean=0.0, reward_bound=0.0
87924: loss=0.000, reward_mean=0.0, reward_bound=0.0
87925: loss=0.000, reward_mean=0.1, reward_bound=0.0
87926: loss=0.000, reward_mean=0.0, reward_bound=0.0
87927: loss=0.000, reward_mean=0.0, reward_bound=0.0
87928: loss=0.000, reward_mean=0.1, reward_bou

88065: loss=0.000, reward_mean=0.0, reward_bound=0.0
88066: loss=0.000, reward_mean=0.0, reward_bound=0.0
88067: loss=0.000, reward_mean=0.0, reward_bound=0.0
88068: loss=0.000, reward_mean=0.1, reward_bound=0.0
88069: loss=0.000, reward_mean=0.1, reward_bound=0.0
88070: loss=0.000, reward_mean=0.1, reward_bound=0.0
88071: loss=0.000, reward_mean=0.1, reward_bound=0.0
88072: loss=0.000, reward_mean=0.0, reward_bound=0.0
88073: loss=0.000, reward_mean=0.1, reward_bound=0.0
88074: loss=0.000, reward_mean=0.0, reward_bound=0.0
88075: loss=0.000, reward_mean=0.1, reward_bound=0.0
88076: loss=0.000, reward_mean=0.1, reward_bound=0.0
88077: loss=0.000, reward_mean=0.0, reward_bound=0.0
88078: loss=0.000, reward_mean=0.0, reward_bound=0.0
88079: loss=0.000, reward_mean=0.0, reward_bound=0.0
88080: loss=0.000, reward_mean=0.1, reward_bound=0.0
88081: loss=0.000, reward_mean=0.0, reward_bound=0.0
88082: loss=0.000, reward_mean=0.1, reward_bound=0.0
88083: loss=0.000, reward_mean=0.1, reward_bou

88222: loss=0.000, reward_mean=0.0, reward_bound=0.0
88223: loss=0.000, reward_mean=0.0, reward_bound=0.0
88224: loss=0.000, reward_mean=0.1, reward_bound=0.0
88225: loss=0.000, reward_mean=0.1, reward_bound=0.0
88226: loss=0.000, reward_mean=0.0, reward_bound=0.0
88227: loss=0.000, reward_mean=0.1, reward_bound=0.0
88228: loss=0.000, reward_mean=0.1, reward_bound=0.0
88229: loss=0.000, reward_mean=0.0, reward_bound=0.0
88230: loss=0.000, reward_mean=0.0, reward_bound=0.0
88231: loss=0.000, reward_mean=0.1, reward_bound=0.0
88232: loss=0.000, reward_mean=0.1, reward_bound=0.0
88233: loss=0.000, reward_mean=0.1, reward_bound=0.0
88234: loss=0.000, reward_mean=0.0, reward_bound=0.0
88235: loss=0.000, reward_mean=0.1, reward_bound=0.0
88236: loss=0.000, reward_mean=0.1, reward_bound=0.0
88237: loss=0.000, reward_mean=0.2, reward_bound=0.0
88238: loss=0.000, reward_mean=0.1, reward_bound=0.0
88239: loss=0.000, reward_mean=0.1, reward_bound=0.0
88240: loss=0.000, reward_mean=0.0, reward_bou

88377: loss=0.000, reward_mean=0.1, reward_bound=0.0
88378: loss=0.000, reward_mean=0.0, reward_bound=0.0
88379: loss=0.000, reward_mean=0.1, reward_bound=0.0
88380: loss=0.000, reward_mean=0.1, reward_bound=0.0
88381: loss=0.000, reward_mean=0.0, reward_bound=0.0
88382: loss=0.000, reward_mean=0.2, reward_bound=0.0
88383: loss=0.000, reward_mean=0.1, reward_bound=0.0
88384: loss=0.000, reward_mean=0.0, reward_bound=0.0
88385: loss=0.000, reward_mean=0.1, reward_bound=0.0
88386: loss=0.000, reward_mean=0.0, reward_bound=0.0
88387: loss=0.000, reward_mean=0.3, reward_bound=0.5
88388: loss=0.000, reward_mean=0.0, reward_bound=0.0
88389: loss=0.000, reward_mean=0.0, reward_bound=0.0
88390: loss=0.000, reward_mean=0.0, reward_bound=0.0
88391: loss=0.000, reward_mean=0.0, reward_bound=0.0
88392: loss=0.000, reward_mean=0.0, reward_bound=0.0
88393: loss=0.000, reward_mean=0.0, reward_bound=0.0
88394: loss=0.000, reward_mean=0.1, reward_bound=0.0
88395: loss=0.000, reward_mean=0.0, reward_bou

88537: loss=0.000, reward_mean=0.1, reward_bound=0.0
88538: loss=0.000, reward_mean=0.0, reward_bound=0.0
88539: loss=0.000, reward_mean=0.1, reward_bound=0.0
88540: loss=0.000, reward_mean=0.0, reward_bound=0.0
88541: loss=0.000, reward_mean=0.1, reward_bound=0.0
88542: loss=0.000, reward_mean=0.0, reward_bound=0.0
88543: loss=0.000, reward_mean=0.0, reward_bound=0.0
88544: loss=0.000, reward_mean=0.1, reward_bound=0.0
88545: loss=0.000, reward_mean=0.0, reward_bound=0.0
88546: loss=0.000, reward_mean=0.1, reward_bound=0.0
88547: loss=0.000, reward_mean=0.0, reward_bound=0.0
88548: loss=0.000, reward_mean=0.1, reward_bound=0.0
88549: loss=0.000, reward_mean=0.0, reward_bound=0.0
88550: loss=0.000, reward_mean=0.1, reward_bound=0.0
88551: loss=0.000, reward_mean=0.0, reward_bound=0.0
88552: loss=0.000, reward_mean=0.0, reward_bound=0.0
88553: loss=0.000, reward_mean=0.0, reward_bound=0.0
88554: loss=0.000, reward_mean=0.0, reward_bound=0.0
88555: loss=0.000, reward_mean=0.1, reward_bou

88694: loss=0.000, reward_mean=0.0, reward_bound=0.0
88695: loss=0.000, reward_mean=0.0, reward_bound=0.0
88696: loss=0.000, reward_mean=0.1, reward_bound=0.0
88697: loss=0.000, reward_mean=0.1, reward_bound=0.0
88698: loss=0.000, reward_mean=0.1, reward_bound=0.0
88699: loss=0.000, reward_mean=0.0, reward_bound=0.0
88700: loss=0.000, reward_mean=0.0, reward_bound=0.0
88701: loss=0.000, reward_mean=0.1, reward_bound=0.0
88702: loss=0.000, reward_mean=0.0, reward_bound=0.0
88703: loss=0.000, reward_mean=0.1, reward_bound=0.0
88704: loss=0.000, reward_mean=0.0, reward_bound=0.0
88705: loss=0.000, reward_mean=0.0, reward_bound=0.0
88706: loss=0.000, reward_mean=0.0, reward_bound=0.0
88707: loss=0.000, reward_mean=0.0, reward_bound=0.0
88708: loss=0.000, reward_mean=0.0, reward_bound=0.0
88709: loss=0.000, reward_mean=0.1, reward_bound=0.0
88710: loss=0.000, reward_mean=0.1, reward_bound=0.0
88711: loss=0.000, reward_mean=0.2, reward_bound=0.0
88712: loss=0.000, reward_mean=0.0, reward_bou

88849: loss=0.000, reward_mean=0.0, reward_bound=0.0
88850: loss=0.000, reward_mean=0.1, reward_bound=0.0
88851: loss=0.000, reward_mean=0.1, reward_bound=0.0
88852: loss=0.000, reward_mean=0.0, reward_bound=0.0
88853: loss=0.000, reward_mean=0.1, reward_bound=0.0
88854: loss=0.000, reward_mean=0.0, reward_bound=0.0
88855: loss=0.000, reward_mean=0.1, reward_bound=0.0
88856: loss=0.000, reward_mean=0.1, reward_bound=0.0
88857: loss=0.000, reward_mean=0.2, reward_bound=0.0
88858: loss=0.000, reward_mean=0.2, reward_bound=0.0
88859: loss=0.000, reward_mean=0.1, reward_bound=0.0
88860: loss=0.000, reward_mean=0.1, reward_bound=0.0
88861: loss=0.000, reward_mean=0.1, reward_bound=0.0
88862: loss=0.000, reward_mean=0.1, reward_bound=0.0
88863: loss=0.000, reward_mean=0.1, reward_bound=0.0
88864: loss=0.000, reward_mean=0.0, reward_bound=0.0
88865: loss=0.000, reward_mean=0.0, reward_bound=0.0
88866: loss=0.000, reward_mean=0.0, reward_bound=0.0
88867: loss=0.000, reward_mean=0.1, reward_bou

89005: loss=0.000, reward_mean=0.0, reward_bound=0.0
89006: loss=0.000, reward_mean=0.1, reward_bound=0.0
89007: loss=0.000, reward_mean=0.1, reward_bound=0.0
89008: loss=0.000, reward_mean=0.0, reward_bound=0.0
89009: loss=0.000, reward_mean=0.0, reward_bound=0.0
89010: loss=0.000, reward_mean=0.0, reward_bound=0.0
89011: loss=0.000, reward_mean=0.1, reward_bound=0.0
89012: loss=0.000, reward_mean=0.0, reward_bound=0.0
89013: loss=0.000, reward_mean=0.0, reward_bound=0.0
89014: loss=0.000, reward_mean=0.0, reward_bound=0.0
89015: loss=0.000, reward_mean=0.1, reward_bound=0.0
89016: loss=0.000, reward_mean=0.1, reward_bound=0.0
89017: loss=0.000, reward_mean=0.0, reward_bound=0.0
89018: loss=0.000, reward_mean=0.1, reward_bound=0.0
89019: loss=0.000, reward_mean=0.0, reward_bound=0.0
89020: loss=0.000, reward_mean=0.1, reward_bound=0.0
89021: loss=0.000, reward_mean=0.1, reward_bound=0.0
89022: loss=0.000, reward_mean=0.0, reward_bound=0.0
89023: loss=0.000, reward_mean=0.1, reward_bou

89163: loss=0.000, reward_mean=0.1, reward_bound=0.0
89164: loss=0.000, reward_mean=0.2, reward_bound=0.0
89165: loss=0.000, reward_mean=0.1, reward_bound=0.0
89166: loss=0.000, reward_mean=0.1, reward_bound=0.0
89167: loss=0.000, reward_mean=0.1, reward_bound=0.0
89168: loss=0.000, reward_mean=0.0, reward_bound=0.0
89169: loss=0.000, reward_mean=0.1, reward_bound=0.0
89170: loss=0.000, reward_mean=0.0, reward_bound=0.0
89171: loss=0.000, reward_mean=0.0, reward_bound=0.0
89172: loss=0.000, reward_mean=0.1, reward_bound=0.0
89173: loss=0.000, reward_mean=0.1, reward_bound=0.0
89174: loss=0.000, reward_mean=0.0, reward_bound=0.0
89175: loss=0.000, reward_mean=0.1, reward_bound=0.0
89176: loss=0.000, reward_mean=0.0, reward_bound=0.0
89177: loss=0.000, reward_mean=0.1, reward_bound=0.0
89178: loss=0.000, reward_mean=0.0, reward_bound=0.0
89179: loss=0.000, reward_mean=0.0, reward_bound=0.0
89180: loss=0.000, reward_mean=0.0, reward_bound=0.0
89181: loss=0.000, reward_mean=0.0, reward_bou

89320: loss=0.000, reward_mean=0.1, reward_bound=0.0
89321: loss=0.000, reward_mean=0.0, reward_bound=0.0
89322: loss=0.000, reward_mean=0.2, reward_bound=0.0
89323: loss=0.000, reward_mean=0.0, reward_bound=0.0
89324: loss=0.000, reward_mean=0.1, reward_bound=0.0
89325: loss=0.000, reward_mean=0.1, reward_bound=0.0
89326: loss=0.000, reward_mean=0.0, reward_bound=0.0
89327: loss=0.000, reward_mean=0.0, reward_bound=0.0
89328: loss=0.000, reward_mean=0.2, reward_bound=0.0
89329: loss=0.000, reward_mean=0.1, reward_bound=0.0
89330: loss=0.000, reward_mean=0.0, reward_bound=0.0
89331: loss=0.000, reward_mean=0.0, reward_bound=0.0
89332: loss=0.000, reward_mean=0.1, reward_bound=0.0
89333: loss=0.000, reward_mean=0.0, reward_bound=0.0
89334: loss=0.000, reward_mean=0.0, reward_bound=0.0
89335: loss=0.000, reward_mean=0.1, reward_bound=0.0
89336: loss=0.000, reward_mean=0.0, reward_bound=0.0
89337: loss=0.000, reward_mean=0.1, reward_bound=0.0
89338: loss=0.000, reward_mean=0.1, reward_bou

89475: loss=0.000, reward_mean=0.2, reward_bound=0.0
89476: loss=0.000, reward_mean=0.0, reward_bound=0.0
89477: loss=0.000, reward_mean=0.1, reward_bound=0.0
89478: loss=0.000, reward_mean=0.1, reward_bound=0.0
89479: loss=0.000, reward_mean=0.1, reward_bound=0.0
89480: loss=0.000, reward_mean=0.1, reward_bound=0.0
89481: loss=0.000, reward_mean=0.0, reward_bound=0.0
89482: loss=0.000, reward_mean=0.0, reward_bound=0.0
89483: loss=0.000, reward_mean=0.0, reward_bound=0.0
89484: loss=0.000, reward_mean=0.1, reward_bound=0.0
89485: loss=0.000, reward_mean=0.1, reward_bound=0.0
89486: loss=0.000, reward_mean=0.0, reward_bound=0.0
89487: loss=0.000, reward_mean=0.0, reward_bound=0.0
89488: loss=0.000, reward_mean=0.0, reward_bound=0.0
89489: loss=0.000, reward_mean=0.0, reward_bound=0.0
89490: loss=0.000, reward_mean=0.1, reward_bound=0.0
89491: loss=0.000, reward_mean=0.0, reward_bound=0.0
89492: loss=0.000, reward_mean=0.0, reward_bound=0.0
89493: loss=0.000, reward_mean=0.1, reward_bou

89635: loss=0.000, reward_mean=0.0, reward_bound=0.0
89636: loss=0.000, reward_mean=0.1, reward_bound=0.0
89637: loss=0.000, reward_mean=0.0, reward_bound=0.0
89638: loss=0.000, reward_mean=0.1, reward_bound=0.0
89639: loss=0.000, reward_mean=0.1, reward_bound=0.0
89640: loss=0.000, reward_mean=0.0, reward_bound=0.0
89641: loss=0.000, reward_mean=0.0, reward_bound=0.0
89642: loss=0.000, reward_mean=0.0, reward_bound=0.0
89643: loss=0.000, reward_mean=0.0, reward_bound=0.0
89644: loss=0.000, reward_mean=0.1, reward_bound=0.0
89645: loss=0.000, reward_mean=0.1, reward_bound=0.0
89646: loss=0.000, reward_mean=0.1, reward_bound=0.0
89647: loss=0.000, reward_mean=0.0, reward_bound=0.0
89648: loss=0.000, reward_mean=0.1, reward_bound=0.0
89649: loss=0.000, reward_mean=0.0, reward_bound=0.0
89650: loss=0.000, reward_mean=0.1, reward_bound=0.0
89651: loss=0.000, reward_mean=0.1, reward_bound=0.0
89652: loss=0.000, reward_mean=0.0, reward_bound=0.0
89653: loss=0.000, reward_mean=0.0, reward_bou

89793: loss=0.000, reward_mean=0.1, reward_bound=0.0
89794: loss=0.000, reward_mean=0.1, reward_bound=0.0
89795: loss=0.000, reward_mean=0.0, reward_bound=0.0
89796: loss=0.000, reward_mean=0.0, reward_bound=0.0
89797: loss=0.000, reward_mean=0.1, reward_bound=0.0
89798: loss=0.000, reward_mean=0.1, reward_bound=0.0
89799: loss=0.000, reward_mean=0.0, reward_bound=0.0
89800: loss=0.000, reward_mean=0.1, reward_bound=0.0
89801: loss=0.000, reward_mean=0.0, reward_bound=0.0
89802: loss=0.000, reward_mean=0.0, reward_bound=0.0
89803: loss=0.000, reward_mean=0.0, reward_bound=0.0
89804: loss=0.000, reward_mean=0.2, reward_bound=0.0
89805: loss=0.000, reward_mean=0.0, reward_bound=0.0
89806: loss=0.000, reward_mean=0.1, reward_bound=0.0
89807: loss=0.000, reward_mean=0.2, reward_bound=0.0
89808: loss=0.000, reward_mean=0.1, reward_bound=0.0
89809: loss=0.000, reward_mean=0.1, reward_bound=0.0
89810: loss=0.000, reward_mean=0.1, reward_bound=0.0
89811: loss=0.000, reward_mean=0.2, reward_bou

89951: loss=0.000, reward_mean=0.1, reward_bound=0.0
89952: loss=0.000, reward_mean=0.1, reward_bound=0.0
89953: loss=0.000, reward_mean=0.0, reward_bound=0.0
89954: loss=0.000, reward_mean=0.1, reward_bound=0.0
89955: loss=0.000, reward_mean=0.0, reward_bound=0.0
89956: loss=0.000, reward_mean=0.1, reward_bound=0.0
89957: loss=0.000, reward_mean=0.1, reward_bound=0.0
89958: loss=0.000, reward_mean=0.1, reward_bound=0.0
89959: loss=0.000, reward_mean=0.1, reward_bound=0.0
89960: loss=0.000, reward_mean=0.0, reward_bound=0.0
89961: loss=0.000, reward_mean=0.1, reward_bound=0.0
89962: loss=0.000, reward_mean=0.1, reward_bound=0.0
89963: loss=0.000, reward_mean=0.0, reward_bound=0.0
89964: loss=0.000, reward_mean=0.0, reward_bound=0.0
89965: loss=0.000, reward_mean=0.1, reward_bound=0.0
89966: loss=0.000, reward_mean=0.0, reward_bound=0.0
89967: loss=0.000, reward_mean=0.0, reward_bound=0.0
89968: loss=0.000, reward_mean=0.2, reward_bound=0.0
89969: loss=0.000, reward_mean=0.1, reward_bou

90112: loss=0.000, reward_mean=0.0, reward_bound=0.0
90113: loss=0.000, reward_mean=0.1, reward_bound=0.0
90114: loss=0.000, reward_mean=0.1, reward_bound=0.0
90115: loss=0.000, reward_mean=0.1, reward_bound=0.0
90116: loss=0.000, reward_mean=0.0, reward_bound=0.0
90117: loss=0.000, reward_mean=0.0, reward_bound=0.0
90118: loss=0.000, reward_mean=0.1, reward_bound=0.0
90119: loss=0.000, reward_mean=0.1, reward_bound=0.0
90120: loss=0.000, reward_mean=0.0, reward_bound=0.0
90121: loss=0.000, reward_mean=0.1, reward_bound=0.0
90122: loss=0.000, reward_mean=0.2, reward_bound=0.0
90123: loss=0.000, reward_mean=0.1, reward_bound=0.0
90124: loss=0.000, reward_mean=0.0, reward_bound=0.0
90125: loss=0.000, reward_mean=0.1, reward_bound=0.0
90126: loss=0.000, reward_mean=0.0, reward_bound=0.0
90127: loss=0.000, reward_mean=0.1, reward_bound=0.0
90128: loss=0.000, reward_mean=0.0, reward_bound=0.0
90129: loss=0.000, reward_mean=0.0, reward_bound=0.0
90130: loss=0.000, reward_mean=0.0, reward_bou

90270: loss=0.000, reward_mean=0.1, reward_bound=0.0
90271: loss=0.000, reward_mean=0.0, reward_bound=0.0
90272: loss=0.000, reward_mean=0.0, reward_bound=0.0
90273: loss=0.000, reward_mean=0.0, reward_bound=0.0
90274: loss=0.000, reward_mean=0.1, reward_bound=0.0
90275: loss=0.000, reward_mean=0.1, reward_bound=0.0
90276: loss=0.000, reward_mean=0.1, reward_bound=0.0
90277: loss=0.000, reward_mean=0.0, reward_bound=0.0
90278: loss=0.000, reward_mean=0.0, reward_bound=0.0
90279: loss=0.000, reward_mean=0.0, reward_bound=0.0
90280: loss=0.000, reward_mean=0.0, reward_bound=0.0
90281: loss=0.000, reward_mean=0.1, reward_bound=0.0
90282: loss=0.000, reward_mean=0.1, reward_bound=0.0
90283: loss=0.000, reward_mean=0.0, reward_bound=0.0
90284: loss=0.000, reward_mean=0.2, reward_bound=0.0
90285: loss=0.000, reward_mean=0.0, reward_bound=0.0
90286: loss=0.000, reward_mean=0.1, reward_bound=0.0
90287: loss=0.000, reward_mean=0.0, reward_bound=0.0
90288: loss=0.000, reward_mean=0.1, reward_bou

90427: loss=0.000, reward_mean=0.1, reward_bound=0.0
90428: loss=0.000, reward_mean=0.1, reward_bound=0.0
90429: loss=0.000, reward_mean=0.1, reward_bound=0.0
90430: loss=0.000, reward_mean=0.1, reward_bound=0.0
90431: loss=0.000, reward_mean=0.0, reward_bound=0.0
90432: loss=0.000, reward_mean=0.0, reward_bound=0.0
90433: loss=0.000, reward_mean=0.0, reward_bound=0.0
90434: loss=0.000, reward_mean=0.0, reward_bound=0.0
90435: loss=0.000, reward_mean=0.1, reward_bound=0.0
90436: loss=0.000, reward_mean=0.2, reward_bound=0.0
90437: loss=0.000, reward_mean=0.1, reward_bound=0.0
90438: loss=0.000, reward_mean=0.1, reward_bound=0.0
90439: loss=0.000, reward_mean=0.1, reward_bound=0.0
90440: loss=0.000, reward_mean=0.1, reward_bound=0.0
90441: loss=0.000, reward_mean=0.1, reward_bound=0.0
90442: loss=0.000, reward_mean=0.1, reward_bound=0.0
90443: loss=0.000, reward_mean=0.0, reward_bound=0.0
90444: loss=0.000, reward_mean=0.1, reward_bound=0.0
90445: loss=0.000, reward_mean=0.0, reward_bou

90584: loss=0.000, reward_mean=0.1, reward_bound=0.0
90585: loss=0.000, reward_mean=0.0, reward_bound=0.0
90586: loss=0.000, reward_mean=0.0, reward_bound=0.0
90587: loss=0.000, reward_mean=0.0, reward_bound=0.0
90588: loss=0.000, reward_mean=0.1, reward_bound=0.0
90589: loss=0.000, reward_mean=0.0, reward_bound=0.0
90590: loss=0.000, reward_mean=0.1, reward_bound=0.0
90591: loss=0.000, reward_mean=0.1, reward_bound=0.0
90592: loss=0.000, reward_mean=0.0, reward_bound=0.0
90593: loss=0.000, reward_mean=0.0, reward_bound=0.0
90594: loss=0.000, reward_mean=0.0, reward_bound=0.0
90595: loss=0.000, reward_mean=0.1, reward_bound=0.0
90596: loss=0.000, reward_mean=0.1, reward_bound=0.0
90597: loss=0.000, reward_mean=0.2, reward_bound=0.0
90598: loss=0.000, reward_mean=0.2, reward_bound=0.0
90599: loss=0.000, reward_mean=0.2, reward_bound=0.0
90600: loss=0.000, reward_mean=0.1, reward_bound=0.0
90601: loss=0.000, reward_mean=0.1, reward_bound=0.0
90602: loss=0.000, reward_mean=0.1, reward_bou

90740: loss=0.000, reward_mean=0.1, reward_bound=0.0
90741: loss=0.000, reward_mean=0.1, reward_bound=0.0
90742: loss=0.000, reward_mean=0.1, reward_bound=0.0
90743: loss=0.000, reward_mean=0.1, reward_bound=0.0
90744: loss=0.000, reward_mean=0.1, reward_bound=0.0
90745: loss=0.000, reward_mean=0.1, reward_bound=0.0
90746: loss=0.000, reward_mean=0.1, reward_bound=0.0
90747: loss=0.000, reward_mean=0.0, reward_bound=0.0
90748: loss=0.000, reward_mean=0.1, reward_bound=0.0
90749: loss=0.000, reward_mean=0.1, reward_bound=0.0
90750: loss=0.000, reward_mean=0.1, reward_bound=0.0
90751: loss=0.000, reward_mean=0.0, reward_bound=0.0
90752: loss=0.000, reward_mean=0.0, reward_bound=0.0
90753: loss=0.000, reward_mean=0.0, reward_bound=0.0
90754: loss=0.000, reward_mean=0.0, reward_bound=0.0
90755: loss=0.000, reward_mean=0.0, reward_bound=0.0
90756: loss=0.000, reward_mean=0.0, reward_bound=0.0
90757: loss=0.000, reward_mean=0.0, reward_bound=0.0
90758: loss=0.000, reward_mean=0.0, reward_bou

90896: loss=0.000, reward_mean=0.0, reward_bound=0.0
90897: loss=0.000, reward_mean=0.1, reward_bound=0.0
90898: loss=0.000, reward_mean=0.1, reward_bound=0.0
90899: loss=0.000, reward_mean=0.0, reward_bound=0.0
90900: loss=0.000, reward_mean=0.1, reward_bound=0.0
90901: loss=0.000, reward_mean=0.1, reward_bound=0.0
90902: loss=0.000, reward_mean=0.1, reward_bound=0.0
90903: loss=0.000, reward_mean=0.1, reward_bound=0.0
90904: loss=0.000, reward_mean=0.0, reward_bound=0.0
90905: loss=0.000, reward_mean=0.0, reward_bound=0.0
90906: loss=0.000, reward_mean=0.0, reward_bound=0.0
90907: loss=0.000, reward_mean=0.1, reward_bound=0.0
90908: loss=0.000, reward_mean=0.0, reward_bound=0.0
90909: loss=0.000, reward_mean=0.1, reward_bound=0.0
90910: loss=0.000, reward_mean=0.0, reward_bound=0.0
90911: loss=0.000, reward_mean=0.1, reward_bound=0.0
90912: loss=0.000, reward_mean=0.0, reward_bound=0.0
90913: loss=0.000, reward_mean=0.1, reward_bound=0.0
90914: loss=0.000, reward_mean=0.1, reward_bou

91053: loss=0.000, reward_mean=0.0, reward_bound=0.0
91054: loss=0.000, reward_mean=0.1, reward_bound=0.0
91055: loss=0.000, reward_mean=0.0, reward_bound=0.0
91056: loss=0.000, reward_mean=0.0, reward_bound=0.0
91057: loss=0.000, reward_mean=0.1, reward_bound=0.0
91058: loss=0.000, reward_mean=0.1, reward_bound=0.0
91059: loss=0.000, reward_mean=0.0, reward_bound=0.0
91060: loss=0.000, reward_mean=0.1, reward_bound=0.0
91061: loss=0.000, reward_mean=0.0, reward_bound=0.0
91062: loss=0.000, reward_mean=0.1, reward_bound=0.0
91063: loss=0.000, reward_mean=0.1, reward_bound=0.0
91064: loss=0.000, reward_mean=0.0, reward_bound=0.0
91065: loss=0.000, reward_mean=0.0, reward_bound=0.0
91066: loss=0.000, reward_mean=0.0, reward_bound=0.0
91067: loss=0.000, reward_mean=0.1, reward_bound=0.0
91068: loss=0.000, reward_mean=0.1, reward_bound=0.0
91069: loss=0.000, reward_mean=0.1, reward_bound=0.0
91070: loss=0.000, reward_mean=0.0, reward_bound=0.0
91071: loss=0.000, reward_mean=0.1, reward_bou

91210: loss=0.000, reward_mean=0.0, reward_bound=0.0
91211: loss=0.000, reward_mean=0.0, reward_bound=0.0
91212: loss=0.000, reward_mean=0.1, reward_bound=0.0
91213: loss=0.000, reward_mean=0.1, reward_bound=0.0
91214: loss=0.000, reward_mean=0.1, reward_bound=0.0
91215: loss=0.000, reward_mean=0.1, reward_bound=0.0
91216: loss=0.000, reward_mean=0.1, reward_bound=0.0
91217: loss=0.000, reward_mean=0.1, reward_bound=0.0
91218: loss=0.000, reward_mean=0.0, reward_bound=0.0
91219: loss=0.000, reward_mean=0.0, reward_bound=0.0
91220: loss=0.000, reward_mean=0.1, reward_bound=0.0
91221: loss=0.000, reward_mean=0.0, reward_bound=0.0
91222: loss=0.000, reward_mean=0.0, reward_bound=0.0
91223: loss=0.000, reward_mean=0.1, reward_bound=0.0
91224: loss=0.000, reward_mean=0.1, reward_bound=0.0
91225: loss=0.000, reward_mean=0.1, reward_bound=0.0
91226: loss=0.000, reward_mean=0.0, reward_bound=0.0
91227: loss=0.000, reward_mean=0.2, reward_bound=0.0
91228: loss=0.000, reward_mean=0.1, reward_bou

91365: loss=0.000, reward_mean=0.1, reward_bound=0.0
91366: loss=0.000, reward_mean=0.2, reward_bound=0.0
91367: loss=0.000, reward_mean=0.0, reward_bound=0.0
91368: loss=0.000, reward_mean=0.0, reward_bound=0.0
91369: loss=0.000, reward_mean=0.1, reward_bound=0.0
91370: loss=0.000, reward_mean=0.0, reward_bound=0.0
91371: loss=0.000, reward_mean=0.0, reward_bound=0.0
91372: loss=0.000, reward_mean=0.0, reward_bound=0.0
91373: loss=0.000, reward_mean=0.0, reward_bound=0.0
91374: loss=0.000, reward_mean=0.1, reward_bound=0.0
91375: loss=0.000, reward_mean=0.1, reward_bound=0.0
91376: loss=0.000, reward_mean=0.1, reward_bound=0.0
91377: loss=0.000, reward_mean=0.0, reward_bound=0.0
91378: loss=0.000, reward_mean=0.1, reward_bound=0.0
91379: loss=0.000, reward_mean=0.1, reward_bound=0.0
91380: loss=0.000, reward_mean=0.0, reward_bound=0.0
91381: loss=0.000, reward_mean=0.1, reward_bound=0.0
91382: loss=0.000, reward_mean=0.2, reward_bound=0.0
91383: loss=0.000, reward_mean=0.0, reward_bou

91524: loss=0.000, reward_mean=0.1, reward_bound=0.0
91525: loss=0.000, reward_mean=0.1, reward_bound=0.0
91526: loss=0.000, reward_mean=0.1, reward_bound=0.0
91527: loss=0.000, reward_mean=0.1, reward_bound=0.0
91528: loss=0.000, reward_mean=0.1, reward_bound=0.0
91529: loss=0.000, reward_mean=0.0, reward_bound=0.0
91530: loss=0.000, reward_mean=0.1, reward_bound=0.0
91531: loss=0.000, reward_mean=0.1, reward_bound=0.0
91532: loss=0.000, reward_mean=0.0, reward_bound=0.0
91533: loss=0.000, reward_mean=0.1, reward_bound=0.0
91534: loss=0.000, reward_mean=0.1, reward_bound=0.0
91535: loss=0.000, reward_mean=0.0, reward_bound=0.0
91536: loss=0.000, reward_mean=0.0, reward_bound=0.0
91537: loss=0.000, reward_mean=0.2, reward_bound=0.0
91538: loss=0.000, reward_mean=0.0, reward_bound=0.0
91539: loss=0.000, reward_mean=0.1, reward_bound=0.0
91540: loss=0.000, reward_mean=0.0, reward_bound=0.0
91541: loss=0.000, reward_mean=0.2, reward_bound=0.0
91542: loss=0.000, reward_mean=0.0, reward_bou

91679: loss=0.000, reward_mean=0.1, reward_bound=0.0
91680: loss=0.000, reward_mean=0.1, reward_bound=0.0
91681: loss=0.000, reward_mean=0.0, reward_bound=0.0
91682: loss=0.000, reward_mean=0.0, reward_bound=0.0
91683: loss=0.000, reward_mean=0.1, reward_bound=0.0
91684: loss=0.000, reward_mean=0.0, reward_bound=0.0
91685: loss=0.000, reward_mean=0.0, reward_bound=0.0
91686: loss=0.000, reward_mean=0.0, reward_bound=0.0
91687: loss=0.000, reward_mean=0.0, reward_bound=0.0
91688: loss=0.000, reward_mean=0.1, reward_bound=0.0
91689: loss=0.000, reward_mean=0.0, reward_bound=0.0
91690: loss=0.000, reward_mean=0.0, reward_bound=0.0
91691: loss=0.000, reward_mean=0.1, reward_bound=0.0
91692: loss=0.000, reward_mean=0.1, reward_bound=0.0
91693: loss=0.000, reward_mean=0.1, reward_bound=0.0
91694: loss=0.000, reward_mean=0.0, reward_bound=0.0
91695: loss=0.000, reward_mean=0.2, reward_bound=0.0
91696: loss=0.000, reward_mean=0.1, reward_bound=0.0
91697: loss=0.000, reward_mean=0.0, reward_bou

91835: loss=0.000, reward_mean=0.1, reward_bound=0.0
91836: loss=0.000, reward_mean=0.1, reward_bound=0.0
91837: loss=0.000, reward_mean=0.0, reward_bound=0.0
91838: loss=0.000, reward_mean=0.2, reward_bound=0.0
91839: loss=0.000, reward_mean=0.0, reward_bound=0.0
91840: loss=0.000, reward_mean=0.1, reward_bound=0.0
91841: loss=0.000, reward_mean=0.0, reward_bound=0.0
91842: loss=0.000, reward_mean=0.1, reward_bound=0.0
91843: loss=0.000, reward_mean=0.1, reward_bound=0.0
91844: loss=0.000, reward_mean=0.0, reward_bound=0.0
91845: loss=0.000, reward_mean=0.1, reward_bound=0.0
91846: loss=0.000, reward_mean=0.1, reward_bound=0.0
91847: loss=0.000, reward_mean=0.0, reward_bound=0.0
91848: loss=0.000, reward_mean=0.1, reward_bound=0.0
91849: loss=0.000, reward_mean=0.1, reward_bound=0.0
91850: loss=0.000, reward_mean=0.1, reward_bound=0.0
91851: loss=0.000, reward_mean=0.1, reward_bound=0.0
91852: loss=0.000, reward_mean=0.1, reward_bound=0.0
91853: loss=0.000, reward_mean=0.1, reward_bou

91992: loss=0.000, reward_mean=0.1, reward_bound=0.0
91993: loss=0.000, reward_mean=0.1, reward_bound=0.0
91994: loss=0.000, reward_mean=0.1, reward_bound=0.0
91995: loss=0.000, reward_mean=0.2, reward_bound=0.0
91996: loss=0.000, reward_mean=0.1, reward_bound=0.0
91997: loss=0.000, reward_mean=0.0, reward_bound=0.0
91998: loss=0.000, reward_mean=0.2, reward_bound=0.0
91999: loss=0.000, reward_mean=0.1, reward_bound=0.0
92000: loss=0.000, reward_mean=0.0, reward_bound=0.0
92001: loss=0.000, reward_mean=0.1, reward_bound=0.0
92002: loss=0.000, reward_mean=0.1, reward_bound=0.0
92003: loss=0.000, reward_mean=0.0, reward_bound=0.0
92004: loss=0.000, reward_mean=0.1, reward_bound=0.0
92005: loss=0.000, reward_mean=0.1, reward_bound=0.0
92006: loss=0.000, reward_mean=0.0, reward_bound=0.0
92007: loss=0.000, reward_mean=0.0, reward_bound=0.0
92008: loss=0.000, reward_mean=0.1, reward_bound=0.0
92009: loss=0.000, reward_mean=0.1, reward_bound=0.0
92010: loss=0.000, reward_mean=0.0, reward_bou

92149: loss=0.000, reward_mean=0.1, reward_bound=0.0
92150: loss=0.000, reward_mean=0.1, reward_bound=0.0
92151: loss=0.000, reward_mean=0.1, reward_bound=0.0
92152: loss=0.000, reward_mean=0.1, reward_bound=0.0
92153: loss=0.000, reward_mean=0.0, reward_bound=0.0
92154: loss=0.000, reward_mean=0.1, reward_bound=0.0
92155: loss=0.000, reward_mean=0.0, reward_bound=0.0
92156: loss=0.000, reward_mean=0.1, reward_bound=0.0
92157: loss=0.000, reward_mean=0.1, reward_bound=0.0
92158: loss=0.000, reward_mean=0.1, reward_bound=0.0
92159: loss=0.000, reward_mean=0.1, reward_bound=0.0
92160: loss=0.000, reward_mean=0.1, reward_bound=0.0
92161: loss=0.000, reward_mean=0.1, reward_bound=0.0
92162: loss=0.000, reward_mean=0.1, reward_bound=0.0
92163: loss=0.000, reward_mean=0.1, reward_bound=0.0
92164: loss=0.000, reward_mean=0.1, reward_bound=0.0
92165: loss=0.000, reward_mean=0.1, reward_bound=0.0
92166: loss=0.000, reward_mean=0.0, reward_bound=0.0
92167: loss=0.000, reward_mean=0.2, reward_bou

92305: loss=0.000, reward_mean=0.1, reward_bound=0.0
92306: loss=0.000, reward_mean=0.0, reward_bound=0.0
92307: loss=0.000, reward_mean=0.1, reward_bound=0.0
92308: loss=0.000, reward_mean=0.0, reward_bound=0.0
92309: loss=0.000, reward_mean=0.1, reward_bound=0.0
92310: loss=0.000, reward_mean=0.2, reward_bound=0.0
92311: loss=0.000, reward_mean=0.1, reward_bound=0.0
92312: loss=0.000, reward_mean=0.1, reward_bound=0.0
92313: loss=0.000, reward_mean=0.2, reward_bound=0.0
92314: loss=0.000, reward_mean=0.0, reward_bound=0.0
92315: loss=0.000, reward_mean=0.0, reward_bound=0.0
92316: loss=0.000, reward_mean=0.1, reward_bound=0.0
92317: loss=0.000, reward_mean=0.1, reward_bound=0.0
92318: loss=0.000, reward_mean=0.0, reward_bound=0.0
92319: loss=0.000, reward_mean=0.1, reward_bound=0.0
92320: loss=0.000, reward_mean=0.0, reward_bound=0.0
92321: loss=0.000, reward_mean=0.1, reward_bound=0.0
92322: loss=0.000, reward_mean=0.1, reward_bound=0.0
92323: loss=0.000, reward_mean=0.1, reward_bou

92461: loss=0.000, reward_mean=0.1, reward_bound=0.0
92462: loss=0.000, reward_mean=0.1, reward_bound=0.0
92463: loss=0.000, reward_mean=0.0, reward_bound=0.0
92464: loss=0.000, reward_mean=0.0, reward_bound=0.0
92465: loss=0.000, reward_mean=0.1, reward_bound=0.0
92466: loss=0.000, reward_mean=0.0, reward_bound=0.0
92467: loss=0.000, reward_mean=0.1, reward_bound=0.0
92468: loss=0.000, reward_mean=0.0, reward_bound=0.0
92469: loss=0.000, reward_mean=0.0, reward_bound=0.0
92470: loss=0.000, reward_mean=0.0, reward_bound=0.0
92471: loss=0.000, reward_mean=0.0, reward_bound=0.0
92472: loss=0.000, reward_mean=0.1, reward_bound=0.0
92473: loss=0.000, reward_mean=0.0, reward_bound=0.0
92474: loss=0.000, reward_mean=0.0, reward_bound=0.0
92475: loss=0.000, reward_mean=0.2, reward_bound=0.0
92476: loss=0.000, reward_mean=0.1, reward_bound=0.0
92477: loss=0.000, reward_mean=0.1, reward_bound=0.0
92478: loss=0.000, reward_mean=0.0, reward_bound=0.0
92479: loss=0.000, reward_mean=0.1, reward_bou

92617: loss=0.000, reward_mean=0.1, reward_bound=0.0
92618: loss=0.000, reward_mean=0.1, reward_bound=0.0
92619: loss=0.000, reward_mean=0.0, reward_bound=0.0
92620: loss=0.000, reward_mean=0.1, reward_bound=0.0
92621: loss=0.000, reward_mean=0.0, reward_bound=0.0
92622: loss=0.000, reward_mean=0.0, reward_bound=0.0
92623: loss=0.000, reward_mean=0.1, reward_bound=0.0
92624: loss=0.000, reward_mean=0.0, reward_bound=0.0
92625: loss=0.000, reward_mean=0.0, reward_bound=0.0
92626: loss=0.000, reward_mean=0.0, reward_bound=0.0
92627: loss=0.000, reward_mean=0.0, reward_bound=0.0
92628: loss=0.000, reward_mean=0.1, reward_bound=0.0
92629: loss=0.000, reward_mean=0.1, reward_bound=0.0
92630: loss=0.000, reward_mean=0.0, reward_bound=0.0
92631: loss=0.000, reward_mean=0.1, reward_bound=0.0
92632: loss=0.000, reward_mean=0.0, reward_bound=0.0
92633: loss=0.000, reward_mean=0.0, reward_bound=0.0
92634: loss=0.000, reward_mean=0.1, reward_bound=0.0
92635: loss=0.000, reward_mean=0.0, reward_bou

92774: loss=0.000, reward_mean=0.0, reward_bound=0.0
92775: loss=0.000, reward_mean=0.0, reward_bound=0.0
92776: loss=0.000, reward_mean=0.1, reward_bound=0.0
92777: loss=0.000, reward_mean=0.0, reward_bound=0.0
92778: loss=0.000, reward_mean=0.1, reward_bound=0.0
92779: loss=0.000, reward_mean=0.1, reward_bound=0.0
92780: loss=0.000, reward_mean=0.1, reward_bound=0.0
92781: loss=0.000, reward_mean=0.1, reward_bound=0.0
92782: loss=0.000, reward_mean=0.1, reward_bound=0.0
92783: loss=0.000, reward_mean=0.0, reward_bound=0.0
92784: loss=0.000, reward_mean=0.1, reward_bound=0.0
92785: loss=0.000, reward_mean=0.0, reward_bound=0.0
92786: loss=0.000, reward_mean=0.1, reward_bound=0.0
92787: loss=0.000, reward_mean=0.1, reward_bound=0.0
92788: loss=0.000, reward_mean=0.0, reward_bound=0.0
92789: loss=0.000, reward_mean=0.0, reward_bound=0.0
92790: loss=0.000, reward_mean=0.1, reward_bound=0.0
92791: loss=0.000, reward_mean=0.1, reward_bound=0.0
92792: loss=0.000, reward_mean=0.1, reward_bou

92932: loss=0.000, reward_mean=0.0, reward_bound=0.0
92933: loss=0.000, reward_mean=0.1, reward_bound=0.0
92934: loss=0.000, reward_mean=0.1, reward_bound=0.0
92935: loss=0.000, reward_mean=0.1, reward_bound=0.0
92936: loss=0.000, reward_mean=0.1, reward_bound=0.0
92937: loss=0.000, reward_mean=0.1, reward_bound=0.0
92938: loss=0.000, reward_mean=0.0, reward_bound=0.0
92939: loss=0.000, reward_mean=0.1, reward_bound=0.0
92940: loss=0.000, reward_mean=0.1, reward_bound=0.0
92941: loss=0.000, reward_mean=0.0, reward_bound=0.0
92942: loss=0.000, reward_mean=0.1, reward_bound=0.0
92943: loss=0.000, reward_mean=0.1, reward_bound=0.0
92944: loss=0.000, reward_mean=0.1, reward_bound=0.0
92945: loss=0.000, reward_mean=0.1, reward_bound=0.0
92946: loss=0.000, reward_mean=0.0, reward_bound=0.0
92947: loss=0.000, reward_mean=0.1, reward_bound=0.0
92948: loss=0.000, reward_mean=0.0, reward_bound=0.0
92949: loss=0.000, reward_mean=0.0, reward_bound=0.0
92950: loss=0.000, reward_mean=0.0, reward_bou

93087: loss=0.000, reward_mean=0.1, reward_bound=0.0
93088: loss=0.000, reward_mean=0.0, reward_bound=0.0
93089: loss=0.000, reward_mean=0.0, reward_bound=0.0
93090: loss=0.000, reward_mean=0.2, reward_bound=0.0
93091: loss=0.000, reward_mean=0.0, reward_bound=0.0
93092: loss=0.000, reward_mean=0.1, reward_bound=0.0
93093: loss=0.000, reward_mean=0.0, reward_bound=0.0
93094: loss=0.000, reward_mean=0.1, reward_bound=0.0
93095: loss=0.000, reward_mean=0.1, reward_bound=0.0
93096: loss=0.000, reward_mean=0.1, reward_bound=0.0
93097: loss=0.000, reward_mean=0.1, reward_bound=0.0
93098: loss=0.000, reward_mean=0.1, reward_bound=0.0
93099: loss=0.000, reward_mean=0.1, reward_bound=0.0
93100: loss=0.000, reward_mean=0.2, reward_bound=0.0
93101: loss=0.000, reward_mean=0.1, reward_bound=0.0
93102: loss=0.000, reward_mean=0.0, reward_bound=0.0
93103: loss=0.000, reward_mean=0.1, reward_bound=0.0
93104: loss=0.000, reward_mean=0.0, reward_bound=0.0
93105: loss=0.000, reward_mean=0.1, reward_bou

93242: loss=0.000, reward_mean=0.0, reward_bound=0.0
93243: loss=0.000, reward_mean=0.1, reward_bound=0.0
93244: loss=0.000, reward_mean=0.1, reward_bound=0.0
93245: loss=0.000, reward_mean=0.2, reward_bound=0.0
93246: loss=0.000, reward_mean=0.1, reward_bound=0.0
93247: loss=0.000, reward_mean=0.1, reward_bound=0.0
93248: loss=0.000, reward_mean=0.0, reward_bound=0.0
93249: loss=0.000, reward_mean=0.1, reward_bound=0.0
93250: loss=0.000, reward_mean=0.2, reward_bound=0.0
93251: loss=0.000, reward_mean=0.0, reward_bound=0.0
93252: loss=0.000, reward_mean=0.1, reward_bound=0.0
93253: loss=0.000, reward_mean=0.2, reward_bound=0.0
93254: loss=0.000, reward_mean=0.0, reward_bound=0.0
93255: loss=0.000, reward_mean=0.2, reward_bound=0.0
93256: loss=0.000, reward_mean=0.1, reward_bound=0.0
93257: loss=0.000, reward_mean=0.0, reward_bound=0.0
93258: loss=0.000, reward_mean=0.1, reward_bound=0.0
93259: loss=0.000, reward_mean=0.1, reward_bound=0.0
93260: loss=0.000, reward_mean=0.1, reward_bou

93396: loss=0.000, reward_mean=0.1, reward_bound=0.0
93397: loss=0.000, reward_mean=0.1, reward_bound=0.0
93398: loss=0.000, reward_mean=0.1, reward_bound=0.0
93399: loss=0.000, reward_mean=0.0, reward_bound=0.0
93400: loss=0.000, reward_mean=0.0, reward_bound=0.0
93401: loss=0.000, reward_mean=0.0, reward_bound=0.0
93402: loss=0.000, reward_mean=0.1, reward_bound=0.0
93403: loss=0.000, reward_mean=0.1, reward_bound=0.0
93404: loss=0.000, reward_mean=0.1, reward_bound=0.0
93405: loss=0.000, reward_mean=0.0, reward_bound=0.0
93406: loss=0.000, reward_mean=0.1, reward_bound=0.0
93407: loss=0.000, reward_mean=0.0, reward_bound=0.0
93408: loss=0.000, reward_mean=0.0, reward_bound=0.0
93409: loss=0.000, reward_mean=0.1, reward_bound=0.0
93410: loss=0.000, reward_mean=0.1, reward_bound=0.0
93411: loss=0.000, reward_mean=0.1, reward_bound=0.0
93412: loss=0.000, reward_mean=0.0, reward_bound=0.0
93413: loss=0.000, reward_mean=0.0, reward_bound=0.0
93414: loss=0.000, reward_mean=0.2, reward_bou

93558: loss=0.000, reward_mean=0.0, reward_bound=0.0
93559: loss=0.000, reward_mean=0.1, reward_bound=0.0
93560: loss=0.000, reward_mean=0.1, reward_bound=0.0
93561: loss=0.000, reward_mean=0.1, reward_bound=0.0
93562: loss=0.000, reward_mean=0.1, reward_bound=0.0
93563: loss=0.000, reward_mean=0.0, reward_bound=0.0
93564: loss=0.000, reward_mean=0.0, reward_bound=0.0
93565: loss=0.000, reward_mean=0.0, reward_bound=0.0
93566: loss=0.000, reward_mean=0.2, reward_bound=0.0
93567: loss=0.000, reward_mean=0.1, reward_bound=0.0
93568: loss=0.000, reward_mean=0.0, reward_bound=0.0
93569: loss=0.000, reward_mean=0.1, reward_bound=0.0
93570: loss=0.000, reward_mean=0.0, reward_bound=0.0
93571: loss=0.000, reward_mean=0.0, reward_bound=0.0
93572: loss=0.000, reward_mean=0.1, reward_bound=0.0
93573: loss=0.000, reward_mean=0.1, reward_bound=0.0
93574: loss=0.000, reward_mean=0.1, reward_bound=0.0
93575: loss=0.000, reward_mean=0.1, reward_bound=0.0
93576: loss=0.000, reward_mean=0.1, reward_bou

93717: loss=0.000, reward_mean=0.2, reward_bound=0.0
93718: loss=0.000, reward_mean=0.1, reward_bound=0.0
93719: loss=0.000, reward_mean=0.1, reward_bound=0.0
93720: loss=0.000, reward_mean=0.1, reward_bound=0.0
93721: loss=0.000, reward_mean=0.0, reward_bound=0.0
93722: loss=0.000, reward_mean=0.0, reward_bound=0.0
93723: loss=0.000, reward_mean=0.1, reward_bound=0.0
93724: loss=0.000, reward_mean=0.0, reward_bound=0.0
93725: loss=0.000, reward_mean=0.1, reward_bound=0.0
93726: loss=0.000, reward_mean=0.1, reward_bound=0.0
93727: loss=0.000, reward_mean=0.0, reward_bound=0.0
93728: loss=0.000, reward_mean=0.1, reward_bound=0.0
93729: loss=0.000, reward_mean=0.1, reward_bound=0.0
93730: loss=0.000, reward_mean=0.1, reward_bound=0.0
93731: loss=0.000, reward_mean=0.1, reward_bound=0.0
93732: loss=0.000, reward_mean=0.1, reward_bound=0.0
93733: loss=0.000, reward_mean=0.0, reward_bound=0.0
93734: loss=0.000, reward_mean=0.0, reward_bound=0.0
93735: loss=0.000, reward_mean=0.0, reward_bou

93872: loss=0.000, reward_mean=0.0, reward_bound=0.0
93873: loss=0.000, reward_mean=0.0, reward_bound=0.0
93874: loss=0.000, reward_mean=0.2, reward_bound=0.0
93875: loss=0.000, reward_mean=0.1, reward_bound=0.0
93876: loss=0.000, reward_mean=0.2, reward_bound=0.0
93877: loss=0.000, reward_mean=0.1, reward_bound=0.0
93878: loss=0.000, reward_mean=0.1, reward_bound=0.0
93879: loss=0.000, reward_mean=0.0, reward_bound=0.0
93880: loss=0.000, reward_mean=0.1, reward_bound=0.0
93881: loss=0.000, reward_mean=0.1, reward_bound=0.0
93882: loss=0.000, reward_mean=0.0, reward_bound=0.0
93883: loss=0.000, reward_mean=0.1, reward_bound=0.0
93884: loss=0.000, reward_mean=0.1, reward_bound=0.0
93885: loss=0.000, reward_mean=0.1, reward_bound=0.0
93886: loss=0.000, reward_mean=0.1, reward_bound=0.0
93887: loss=0.000, reward_mean=0.1, reward_bound=0.0
93888: loss=0.000, reward_mean=0.1, reward_bound=0.0
93889: loss=0.000, reward_mean=0.1, reward_bound=0.0
93890: loss=0.000, reward_mean=0.1, reward_bou

94031: loss=0.000, reward_mean=0.1, reward_bound=0.0
94032: loss=0.000, reward_mean=0.1, reward_bound=0.0
94033: loss=0.000, reward_mean=0.1, reward_bound=0.0
94034: loss=0.000, reward_mean=0.1, reward_bound=0.0
94035: loss=0.000, reward_mean=0.1, reward_bound=0.0
94036: loss=0.000, reward_mean=0.1, reward_bound=0.0
94037: loss=0.000, reward_mean=0.1, reward_bound=0.0
94038: loss=0.000, reward_mean=0.0, reward_bound=0.0
94039: loss=0.000, reward_mean=0.1, reward_bound=0.0
94040: loss=0.000, reward_mean=0.0, reward_bound=0.0
94041: loss=0.000, reward_mean=0.1, reward_bound=0.0
94042: loss=0.000, reward_mean=0.0, reward_bound=0.0
94043: loss=0.000, reward_mean=0.0, reward_bound=0.0
94044: loss=0.000, reward_mean=0.1, reward_bound=0.0
94045: loss=0.000, reward_mean=0.1, reward_bound=0.0
94046: loss=0.000, reward_mean=0.0, reward_bound=0.0
94047: loss=0.000, reward_mean=0.0, reward_bound=0.0
94048: loss=0.000, reward_mean=0.0, reward_bound=0.0
94049: loss=0.000, reward_mean=0.0, reward_bou

94186: loss=0.000, reward_mean=0.1, reward_bound=0.0
94187: loss=0.000, reward_mean=0.0, reward_bound=0.0
94188: loss=0.000, reward_mean=0.2, reward_bound=0.0
94189: loss=0.000, reward_mean=0.1, reward_bound=0.0
94190: loss=0.000, reward_mean=0.1, reward_bound=0.0
94191: loss=0.000, reward_mean=0.0, reward_bound=0.0
94192: loss=0.000, reward_mean=0.1, reward_bound=0.0
94193: loss=0.000, reward_mean=0.0, reward_bound=0.0
94194: loss=0.000, reward_mean=0.0, reward_bound=0.0
94195: loss=0.000, reward_mean=0.2, reward_bound=0.0
94196: loss=0.000, reward_mean=0.0, reward_bound=0.0
94197: loss=0.000, reward_mean=0.1, reward_bound=0.0
94198: loss=0.000, reward_mean=0.0, reward_bound=0.0
94199: loss=0.000, reward_mean=0.1, reward_bound=0.0
94200: loss=0.000, reward_mean=0.1, reward_bound=0.0
94201: loss=0.000, reward_mean=0.0, reward_bound=0.0
94202: loss=0.000, reward_mean=0.1, reward_bound=0.0
94203: loss=0.000, reward_mean=0.1, reward_bound=0.0
94204: loss=0.000, reward_mean=0.1, reward_bou

94344: loss=0.000, reward_mean=0.0, reward_bound=0.0
94345: loss=0.000, reward_mean=0.1, reward_bound=0.0
94346: loss=0.000, reward_mean=0.1, reward_bound=0.0
94347: loss=0.000, reward_mean=0.1, reward_bound=0.0
94348: loss=0.000, reward_mean=0.0, reward_bound=0.0
94349: loss=0.000, reward_mean=0.1, reward_bound=0.0
94350: loss=0.000, reward_mean=0.1, reward_bound=0.0
94351: loss=0.000, reward_mean=0.0, reward_bound=0.0
94352: loss=0.000, reward_mean=0.1, reward_bound=0.0
94353: loss=0.000, reward_mean=0.1, reward_bound=0.0
94354: loss=0.000, reward_mean=0.1, reward_bound=0.0
94355: loss=0.000, reward_mean=0.0, reward_bound=0.0
94356: loss=0.000, reward_mean=0.1, reward_bound=0.0
94357: loss=0.000, reward_mean=0.1, reward_bound=0.0
94358: loss=0.000, reward_mean=0.0, reward_bound=0.0
94359: loss=0.000, reward_mean=0.0, reward_bound=0.0
94360: loss=0.000, reward_mean=0.0, reward_bound=0.0
94361: loss=0.000, reward_mean=0.0, reward_bound=0.0
94362: loss=0.000, reward_mean=0.0, reward_bou

94500: loss=0.000, reward_mean=0.1, reward_bound=0.0
94501: loss=0.000, reward_mean=0.1, reward_bound=0.0
94502: loss=0.000, reward_mean=0.0, reward_bound=0.0
94503: loss=0.000, reward_mean=0.2, reward_bound=0.0
94504: loss=0.000, reward_mean=0.0, reward_bound=0.0
94505: loss=0.000, reward_mean=0.1, reward_bound=0.0
94506: loss=0.000, reward_mean=0.0, reward_bound=0.0
94507: loss=0.000, reward_mean=0.0, reward_bound=0.0
94508: loss=0.000, reward_mean=0.0, reward_bound=0.0
94509: loss=0.000, reward_mean=0.1, reward_bound=0.0
94510: loss=0.000, reward_mean=0.0, reward_bound=0.0
94511: loss=0.000, reward_mean=0.1, reward_bound=0.0
94512: loss=0.000, reward_mean=0.0, reward_bound=0.0
94513: loss=0.000, reward_mean=0.1, reward_bound=0.0
94514: loss=0.000, reward_mean=0.0, reward_bound=0.0
94515: loss=0.000, reward_mean=0.1, reward_bound=0.0
94516: loss=0.000, reward_mean=0.0, reward_bound=0.0
94517: loss=0.000, reward_mean=0.1, reward_bound=0.0
94518: loss=0.000, reward_mean=0.2, reward_bou

94655: loss=0.000, reward_mean=0.1, reward_bound=0.0
94656: loss=0.000, reward_mean=0.0, reward_bound=0.0
94657: loss=0.000, reward_mean=0.0, reward_bound=0.0
94658: loss=0.000, reward_mean=0.1, reward_bound=0.0
94659: loss=0.000, reward_mean=0.2, reward_bound=0.0
94660: loss=0.000, reward_mean=0.0, reward_bound=0.0
94661: loss=0.000, reward_mean=0.1, reward_bound=0.0
94662: loss=0.000, reward_mean=0.0, reward_bound=0.0
94663: loss=0.000, reward_mean=0.0, reward_bound=0.0
94664: loss=0.000, reward_mean=0.1, reward_bound=0.0
94665: loss=0.000, reward_mean=0.1, reward_bound=0.0
94666: loss=0.000, reward_mean=0.1, reward_bound=0.0
94667: loss=0.000, reward_mean=0.0, reward_bound=0.0
94668: loss=0.000, reward_mean=0.2, reward_bound=0.0
94669: loss=0.000, reward_mean=0.0, reward_bound=0.0
94670: loss=0.000, reward_mean=0.1, reward_bound=0.0
94671: loss=0.000, reward_mean=0.1, reward_bound=0.0
94672: loss=0.000, reward_mean=0.0, reward_bound=0.0
94673: loss=0.000, reward_mean=0.0, reward_bou

94811: loss=0.000, reward_mean=0.0, reward_bound=0.0
94812: loss=0.000, reward_mean=0.0, reward_bound=0.0
94813: loss=0.000, reward_mean=0.1, reward_bound=0.0
94814: loss=0.000, reward_mean=0.0, reward_bound=0.0
94815: loss=0.000, reward_mean=0.1, reward_bound=0.0
94816: loss=0.000, reward_mean=0.0, reward_bound=0.0
94817: loss=0.000, reward_mean=0.0, reward_bound=0.0
94818: loss=0.000, reward_mean=0.2, reward_bound=0.0
94819: loss=0.000, reward_mean=0.1, reward_bound=0.0
94820: loss=0.000, reward_mean=0.0, reward_bound=0.0
94821: loss=0.000, reward_mean=0.0, reward_bound=0.0
94822: loss=0.000, reward_mean=0.0, reward_bound=0.0
94823: loss=0.000, reward_mean=0.1, reward_bound=0.0
94824: loss=0.000, reward_mean=0.0, reward_bound=0.0
94825: loss=0.000, reward_mean=0.0, reward_bound=0.0
94826: loss=0.000, reward_mean=0.1, reward_bound=0.0
94827: loss=0.000, reward_mean=0.1, reward_bound=0.0
94828: loss=0.000, reward_mean=0.1, reward_bound=0.0
94829: loss=0.000, reward_mean=0.0, reward_bou

94966: loss=0.000, reward_mean=0.2, reward_bound=0.0
94967: loss=0.000, reward_mean=0.0, reward_bound=0.0
94968: loss=0.000, reward_mean=0.1, reward_bound=0.0
94969: loss=0.000, reward_mean=0.0, reward_bound=0.0
94970: loss=0.000, reward_mean=0.1, reward_bound=0.0
94971: loss=0.000, reward_mean=0.0, reward_bound=0.0
94972: loss=0.000, reward_mean=0.0, reward_bound=0.0
94973: loss=0.000, reward_mean=0.0, reward_bound=0.0
94974: loss=0.000, reward_mean=0.0, reward_bound=0.0
94975: loss=0.000, reward_mean=0.1, reward_bound=0.0
94976: loss=0.000, reward_mean=0.0, reward_bound=0.0
94977: loss=0.000, reward_mean=0.0, reward_bound=0.0
94978: loss=0.000, reward_mean=0.1, reward_bound=0.0
94979: loss=0.000, reward_mean=0.0, reward_bound=0.0
94980: loss=0.000, reward_mean=0.0, reward_bound=0.0
94981: loss=0.000, reward_mean=0.1, reward_bound=0.0
94982: loss=0.000, reward_mean=0.1, reward_bound=0.0
94983: loss=0.000, reward_mean=0.1, reward_bound=0.0
94984: loss=0.000, reward_mean=0.1, reward_bou

95122: loss=0.000, reward_mean=0.0, reward_bound=0.0
95123: loss=0.000, reward_mean=0.1, reward_bound=0.0
95124: loss=0.000, reward_mean=0.1, reward_bound=0.0
95125: loss=0.000, reward_mean=0.1, reward_bound=0.0
95126: loss=0.000, reward_mean=0.1, reward_bound=0.0
95127: loss=0.000, reward_mean=0.0, reward_bound=0.0
95128: loss=0.000, reward_mean=0.0, reward_bound=0.0
95129: loss=0.000, reward_mean=0.0, reward_bound=0.0
95130: loss=0.000, reward_mean=0.1, reward_bound=0.0
95131: loss=0.000, reward_mean=0.1, reward_bound=0.0
95132: loss=0.000, reward_mean=0.1, reward_bound=0.0
95133: loss=0.000, reward_mean=0.1, reward_bound=0.0
95134: loss=0.000, reward_mean=0.1, reward_bound=0.0
95135: loss=0.000, reward_mean=0.1, reward_bound=0.0
95136: loss=0.000, reward_mean=0.0, reward_bound=0.0
95137: loss=0.000, reward_mean=0.0, reward_bound=0.0
95138: loss=0.000, reward_mean=0.1, reward_bound=0.0
95139: loss=0.000, reward_mean=0.1, reward_bound=0.0
95140: loss=0.000, reward_mean=0.0, reward_bou

95277: loss=0.000, reward_mean=0.0, reward_bound=0.0
95278: loss=0.000, reward_mean=0.0, reward_bound=0.0
95279: loss=0.000, reward_mean=0.1, reward_bound=0.0
95280: loss=0.000, reward_mean=0.0, reward_bound=0.0
95281: loss=0.000, reward_mean=0.0, reward_bound=0.0
95282: loss=0.000, reward_mean=0.1, reward_bound=0.0
95283: loss=0.000, reward_mean=0.0, reward_bound=0.0
95284: loss=0.000, reward_mean=0.0, reward_bound=0.0
95285: loss=0.000, reward_mean=0.0, reward_bound=0.0
95286: loss=0.000, reward_mean=0.1, reward_bound=0.0
95287: loss=0.000, reward_mean=0.0, reward_bound=0.0
95288: loss=0.000, reward_mean=0.1, reward_bound=0.0
95289: loss=0.000, reward_mean=0.0, reward_bound=0.0
95290: loss=0.000, reward_mean=0.0, reward_bound=0.0
95291: loss=0.000, reward_mean=0.1, reward_bound=0.0
95292: loss=0.000, reward_mean=0.1, reward_bound=0.0
95293: loss=0.000, reward_mean=0.0, reward_bound=0.0
95294: loss=0.000, reward_mean=0.0, reward_bound=0.0
95295: loss=0.000, reward_mean=0.0, reward_bou

95435: loss=0.000, reward_mean=0.0, reward_bound=0.0
95436: loss=0.000, reward_mean=0.1, reward_bound=0.0
95437: loss=0.000, reward_mean=0.0, reward_bound=0.0
95438: loss=0.000, reward_mean=0.1, reward_bound=0.0
95439: loss=0.000, reward_mean=0.0, reward_bound=0.0
95440: loss=0.000, reward_mean=0.1, reward_bound=0.0
95441: loss=0.000, reward_mean=0.0, reward_bound=0.0
95442: loss=0.000, reward_mean=0.1, reward_bound=0.0
95443: loss=0.000, reward_mean=0.1, reward_bound=0.0
95444: loss=0.000, reward_mean=0.1, reward_bound=0.0
95445: loss=0.000, reward_mean=0.2, reward_bound=0.0
95446: loss=0.000, reward_mean=0.1, reward_bound=0.0
95447: loss=0.000, reward_mean=0.0, reward_bound=0.0
95448: loss=0.000, reward_mean=0.0, reward_bound=0.0
95449: loss=0.000, reward_mean=0.0, reward_bound=0.0
95450: loss=0.000, reward_mean=0.1, reward_bound=0.0
95451: loss=0.000, reward_mean=0.1, reward_bound=0.0
95452: loss=0.000, reward_mean=0.1, reward_bound=0.0
95453: loss=0.000, reward_mean=0.0, reward_bou

95593: loss=0.000, reward_mean=0.1, reward_bound=0.0
95594: loss=0.000, reward_mean=0.1, reward_bound=0.0
95595: loss=0.000, reward_mean=0.0, reward_bound=0.0
95596: loss=0.000, reward_mean=0.0, reward_bound=0.0
95597: loss=0.000, reward_mean=0.0, reward_bound=0.0
95598: loss=0.000, reward_mean=0.0, reward_bound=0.0
95599: loss=0.000, reward_mean=0.1, reward_bound=0.0
95600: loss=0.000, reward_mean=0.1, reward_bound=0.0
95601: loss=0.000, reward_mean=0.0, reward_bound=0.0
95602: loss=0.000, reward_mean=0.1, reward_bound=0.0
95603: loss=0.000, reward_mean=0.1, reward_bound=0.0
95604: loss=0.000, reward_mean=0.1, reward_bound=0.0
95605: loss=0.000, reward_mean=0.1, reward_bound=0.0
95606: loss=0.000, reward_mean=0.1, reward_bound=0.0
95607: loss=0.000, reward_mean=0.1, reward_bound=0.0
95608: loss=0.000, reward_mean=0.1, reward_bound=0.0
95609: loss=0.000, reward_mean=0.0, reward_bound=0.0
95610: loss=0.000, reward_mean=0.1, reward_bound=0.0
95611: loss=0.000, reward_mean=0.1, reward_bou

95748: loss=0.000, reward_mean=0.0, reward_bound=0.0
95749: loss=0.000, reward_mean=0.1, reward_bound=0.0
95750: loss=0.000, reward_mean=0.1, reward_bound=0.0
95751: loss=0.000, reward_mean=0.2, reward_bound=0.0
95752: loss=0.000, reward_mean=0.1, reward_bound=0.0
95753: loss=0.000, reward_mean=0.1, reward_bound=0.0
95754: loss=0.000, reward_mean=0.0, reward_bound=0.0
95755: loss=0.000, reward_mean=0.0, reward_bound=0.0
95756: loss=0.000, reward_mean=0.1, reward_bound=0.0
95757: loss=0.000, reward_mean=0.0, reward_bound=0.0
95758: loss=0.000, reward_mean=0.1, reward_bound=0.0
95759: loss=0.000, reward_mean=0.2, reward_bound=0.0
95760: loss=0.000, reward_mean=0.0, reward_bound=0.0
95761: loss=0.000, reward_mean=0.1, reward_bound=0.0
95762: loss=0.000, reward_mean=0.1, reward_bound=0.0
95763: loss=0.000, reward_mean=0.1, reward_bound=0.0
95764: loss=0.000, reward_mean=0.1, reward_bound=0.0
95765: loss=0.000, reward_mean=0.1, reward_bound=0.0
95766: loss=0.000, reward_mean=0.0, reward_bou

95908: loss=0.000, reward_mean=0.0, reward_bound=0.0
95909: loss=0.000, reward_mean=0.0, reward_bound=0.0
95910: loss=0.000, reward_mean=0.0, reward_bound=0.0
95911: loss=0.000, reward_mean=0.0, reward_bound=0.0
95912: loss=0.000, reward_mean=0.0, reward_bound=0.0
95913: loss=0.000, reward_mean=0.0, reward_bound=0.0
95914: loss=0.000, reward_mean=0.0, reward_bound=0.0
95915: loss=0.000, reward_mean=0.1, reward_bound=0.0
95916: loss=0.000, reward_mean=0.0, reward_bound=0.0
95917: loss=0.000, reward_mean=0.1, reward_bound=0.0
95918: loss=0.000, reward_mean=0.0, reward_bound=0.0
95919: loss=0.000, reward_mean=0.1, reward_bound=0.0
95920: loss=0.000, reward_mean=0.0, reward_bound=0.0
95921: loss=0.000, reward_mean=0.1, reward_bound=0.0
95922: loss=0.000, reward_mean=0.0, reward_bound=0.0
95923: loss=0.000, reward_mean=0.1, reward_bound=0.0
95924: loss=0.000, reward_mean=0.0, reward_bound=0.0
95925: loss=0.000, reward_mean=0.1, reward_bound=0.0
95926: loss=0.000, reward_mean=0.1, reward_bou

96064: loss=0.000, reward_mean=0.1, reward_bound=0.0
96065: loss=0.000, reward_mean=0.0, reward_bound=0.0
96066: loss=0.000, reward_mean=0.0, reward_bound=0.0
96067: loss=0.000, reward_mean=0.0, reward_bound=0.0
96068: loss=0.000, reward_mean=0.0, reward_bound=0.0
96069: loss=0.000, reward_mean=0.0, reward_bound=0.0
96070: loss=0.000, reward_mean=0.1, reward_bound=0.0
96071: loss=0.000, reward_mean=0.1, reward_bound=0.0
96072: loss=0.000, reward_mean=0.0, reward_bound=0.0
96073: loss=0.000, reward_mean=0.0, reward_bound=0.0
96074: loss=0.000, reward_mean=0.2, reward_bound=0.0
96075: loss=0.000, reward_mean=0.0, reward_bound=0.0
96076: loss=0.000, reward_mean=0.1, reward_bound=0.0
96077: loss=0.000, reward_mean=0.0, reward_bound=0.0
96078: loss=0.000, reward_mean=0.1, reward_bound=0.0
96079: loss=0.000, reward_mean=0.1, reward_bound=0.0
96080: loss=0.000, reward_mean=0.0, reward_bound=0.0
96081: loss=0.000, reward_mean=0.1, reward_bound=0.0
96082: loss=0.000, reward_mean=0.0, reward_bou

96219: loss=0.000, reward_mean=0.0, reward_bound=0.0
96220: loss=0.000, reward_mean=0.1, reward_bound=0.0
96221: loss=0.000, reward_mean=0.2, reward_bound=0.0
96222: loss=0.000, reward_mean=0.0, reward_bound=0.0
96223: loss=0.000, reward_mean=0.1, reward_bound=0.0
96224: loss=0.000, reward_mean=0.0, reward_bound=0.0
96225: loss=0.000, reward_mean=0.1, reward_bound=0.0
96226: loss=0.000, reward_mean=0.1, reward_bound=0.0
96227: loss=0.000, reward_mean=0.1, reward_bound=0.0
96228: loss=0.000, reward_mean=0.1, reward_bound=0.0
96229: loss=0.000, reward_mean=0.0, reward_bound=0.0
96230: loss=0.000, reward_mean=0.0, reward_bound=0.0
96231: loss=0.000, reward_mean=0.0, reward_bound=0.0
96232: loss=0.000, reward_mean=0.1, reward_bound=0.0
96233: loss=0.000, reward_mean=0.1, reward_bound=0.0
96234: loss=0.000, reward_mean=0.0, reward_bound=0.0
96235: loss=0.000, reward_mean=0.1, reward_bound=0.0
96236: loss=0.000, reward_mean=0.0, reward_bound=0.0
96237: loss=0.000, reward_mean=0.0, reward_bou

96374: loss=0.000, reward_mean=0.0, reward_bound=0.0
96375: loss=0.000, reward_mean=0.1, reward_bound=0.0
96376: loss=0.000, reward_mean=0.0, reward_bound=0.0
96377: loss=0.000, reward_mean=0.0, reward_bound=0.0
96378: loss=0.000, reward_mean=0.0, reward_bound=0.0
96379: loss=0.000, reward_mean=0.0, reward_bound=0.0
96380: loss=0.000, reward_mean=0.1, reward_bound=0.0
96381: loss=0.000, reward_mean=0.1, reward_bound=0.0
96382: loss=0.000, reward_mean=0.1, reward_bound=0.0
96383: loss=0.000, reward_mean=0.1, reward_bound=0.0
96384: loss=0.000, reward_mean=0.0, reward_bound=0.0
96385: loss=0.000, reward_mean=0.1, reward_bound=0.0
96386: loss=0.000, reward_mean=0.1, reward_bound=0.0
96387: loss=0.000, reward_mean=0.1, reward_bound=0.0
96388: loss=0.000, reward_mean=0.1, reward_bound=0.0
96389: loss=0.000, reward_mean=0.1, reward_bound=0.0
96390: loss=0.000, reward_mean=0.0, reward_bound=0.0
96391: loss=0.000, reward_mean=0.0, reward_bound=0.0
96392: loss=0.000, reward_mean=0.1, reward_bou

96532: loss=0.000, reward_mean=0.1, reward_bound=0.0
96533: loss=0.000, reward_mean=0.1, reward_bound=0.0
96534: loss=0.000, reward_mean=0.0, reward_bound=0.0
96535: loss=0.000, reward_mean=0.2, reward_bound=0.0
96536: loss=0.000, reward_mean=0.1, reward_bound=0.0
96537: loss=0.000, reward_mean=0.0, reward_bound=0.0
96538: loss=0.000, reward_mean=0.1, reward_bound=0.0
96539: loss=0.000, reward_mean=0.2, reward_bound=0.0
96540: loss=0.000, reward_mean=0.0, reward_bound=0.0
96541: loss=0.000, reward_mean=0.0, reward_bound=0.0
96542: loss=0.000, reward_mean=0.1, reward_bound=0.0
96543: loss=0.000, reward_mean=0.0, reward_bound=0.0
96544: loss=0.000, reward_mean=0.0, reward_bound=0.0
96545: loss=0.000, reward_mean=0.1, reward_bound=0.0
96546: loss=0.000, reward_mean=0.1, reward_bound=0.0
96547: loss=0.000, reward_mean=0.1, reward_bound=0.0
96548: loss=0.000, reward_mean=0.1, reward_bound=0.0
96549: loss=0.000, reward_mean=0.1, reward_bound=0.0
96550: loss=0.000, reward_mean=0.0, reward_bou

96691: loss=0.000, reward_mean=0.0, reward_bound=0.0
96692: loss=0.000, reward_mean=0.1, reward_bound=0.0
96693: loss=0.000, reward_mean=0.1, reward_bound=0.0
96694: loss=0.000, reward_mean=0.0, reward_bound=0.0
96695: loss=0.000, reward_mean=0.0, reward_bound=0.0
96696: loss=0.000, reward_mean=0.1, reward_bound=0.0
96697: loss=0.000, reward_mean=0.0, reward_bound=0.0
96698: loss=0.000, reward_mean=0.0, reward_bound=0.0
96699: loss=0.000, reward_mean=0.0, reward_bound=0.0
96700: loss=0.000, reward_mean=0.1, reward_bound=0.0
96701: loss=0.000, reward_mean=0.1, reward_bound=0.0
96702: loss=0.000, reward_mean=0.2, reward_bound=0.0
96703: loss=0.000, reward_mean=0.1, reward_bound=0.0
96704: loss=0.000, reward_mean=0.1, reward_bound=0.0
96705: loss=0.000, reward_mean=0.0, reward_bound=0.0
96706: loss=0.000, reward_mean=0.0, reward_bound=0.0
96707: loss=0.000, reward_mean=0.1, reward_bound=0.0
96708: loss=0.000, reward_mean=0.1, reward_bound=0.0
96709: loss=0.000, reward_mean=0.1, reward_bou

96849: loss=0.000, reward_mean=0.0, reward_bound=0.0
96850: loss=0.000, reward_mean=0.1, reward_bound=0.0
96851: loss=0.000, reward_mean=0.1, reward_bound=0.0
96852: loss=0.000, reward_mean=0.1, reward_bound=0.0
96853: loss=0.000, reward_mean=0.0, reward_bound=0.0
96854: loss=0.000, reward_mean=0.1, reward_bound=0.0
96855: loss=0.000, reward_mean=0.1, reward_bound=0.0
96856: loss=0.000, reward_mean=0.0, reward_bound=0.0
96857: loss=0.000, reward_mean=0.1, reward_bound=0.0
96858: loss=0.000, reward_mean=0.0, reward_bound=0.0
96859: loss=0.000, reward_mean=0.1, reward_bound=0.0
96860: loss=0.000, reward_mean=0.1, reward_bound=0.0
96861: loss=0.000, reward_mean=0.0, reward_bound=0.0
96862: loss=0.000, reward_mean=0.0, reward_bound=0.0
96863: loss=0.000, reward_mean=0.1, reward_bound=0.0
96864: loss=0.000, reward_mean=0.0, reward_bound=0.0
96865: loss=0.000, reward_mean=0.1, reward_bound=0.0
96866: loss=0.000, reward_mean=0.1, reward_bound=0.0
96867: loss=0.000, reward_mean=0.0, reward_bou

97010: loss=0.000, reward_mean=0.1, reward_bound=0.0
97011: loss=0.000, reward_mean=0.1, reward_bound=0.0
97012: loss=0.000, reward_mean=0.1, reward_bound=0.0
97013: loss=0.000, reward_mean=0.1, reward_bound=0.0
97014: loss=0.000, reward_mean=0.0, reward_bound=0.0
97015: loss=0.000, reward_mean=0.0, reward_bound=0.0
97016: loss=0.000, reward_mean=0.1, reward_bound=0.0
97017: loss=0.000, reward_mean=0.1, reward_bound=0.0
97018: loss=0.000, reward_mean=0.1, reward_bound=0.0
97019: loss=0.000, reward_mean=0.0, reward_bound=0.0
97020: loss=0.000, reward_mean=0.1, reward_bound=0.0
97021: loss=0.000, reward_mean=0.0, reward_bound=0.0
97022: loss=0.000, reward_mean=0.1, reward_bound=0.0
97023: loss=0.000, reward_mean=0.2, reward_bound=0.0
97024: loss=0.000, reward_mean=0.1, reward_bound=0.0
97025: loss=0.000, reward_mean=0.0, reward_bound=0.0
97026: loss=0.000, reward_mean=0.0, reward_bound=0.0
97027: loss=0.000, reward_mean=0.0, reward_bound=0.0
97028: loss=0.000, reward_mean=0.0, reward_bou

97164: loss=0.000, reward_mean=0.1, reward_bound=0.0
97165: loss=0.000, reward_mean=0.0, reward_bound=0.0
97166: loss=0.000, reward_mean=0.2, reward_bound=0.0
97167: loss=0.000, reward_mean=0.1, reward_bound=0.0
97168: loss=0.000, reward_mean=0.0, reward_bound=0.0
97169: loss=0.000, reward_mean=0.1, reward_bound=0.0
97170: loss=0.000, reward_mean=0.0, reward_bound=0.0
97171: loss=0.000, reward_mean=0.1, reward_bound=0.0
97172: loss=0.000, reward_mean=0.1, reward_bound=0.0
97173: loss=0.000, reward_mean=0.0, reward_bound=0.0
97174: loss=0.000, reward_mean=0.0, reward_bound=0.0
97175: loss=0.000, reward_mean=0.0, reward_bound=0.0
97176: loss=0.000, reward_mean=0.0, reward_bound=0.0
97177: loss=0.000, reward_mean=0.1, reward_bound=0.0
97178: loss=0.000, reward_mean=0.1, reward_bound=0.0
97179: loss=0.000, reward_mean=0.0, reward_bound=0.0
97180: loss=0.000, reward_mean=0.0, reward_bound=0.0
97181: loss=0.000, reward_mean=0.0, reward_bound=0.0
97182: loss=0.000, reward_mean=0.1, reward_bou

97323: loss=0.000, reward_mean=0.0, reward_bound=0.0
97324: loss=0.000, reward_mean=0.1, reward_bound=0.0
97325: loss=0.000, reward_mean=0.1, reward_bound=0.0
97326: loss=0.000, reward_mean=0.0, reward_bound=0.0
97327: loss=0.000, reward_mean=0.1, reward_bound=0.0
97328: loss=0.000, reward_mean=0.0, reward_bound=0.0
97329: loss=0.000, reward_mean=0.0, reward_bound=0.0
97330: loss=0.000, reward_mean=0.0, reward_bound=0.0
97331: loss=0.000, reward_mean=0.0, reward_bound=0.0
97332: loss=0.000, reward_mean=0.1, reward_bound=0.0
97333: loss=0.000, reward_mean=0.0, reward_bound=0.0
97334: loss=0.000, reward_mean=0.0, reward_bound=0.0
97335: loss=0.000, reward_mean=0.1, reward_bound=0.0
97336: loss=0.000, reward_mean=0.1, reward_bound=0.0
97337: loss=0.000, reward_mean=0.1, reward_bound=0.0
97338: loss=0.000, reward_mean=0.1, reward_bound=0.0
97339: loss=0.000, reward_mean=0.0, reward_bound=0.0
97340: loss=0.000, reward_mean=0.1, reward_bound=0.0
97341: loss=0.000, reward_mean=0.0, reward_bou

97482: loss=0.000, reward_mean=0.1, reward_bound=0.0
97483: loss=0.000, reward_mean=0.1, reward_bound=0.0
97484: loss=0.000, reward_mean=0.1, reward_bound=0.0
97485: loss=0.000, reward_mean=0.1, reward_bound=0.0
97486: loss=0.000, reward_mean=0.0, reward_bound=0.0
97487: loss=0.000, reward_mean=0.1, reward_bound=0.0
97488: loss=0.000, reward_mean=0.0, reward_bound=0.0
97489: loss=0.000, reward_mean=0.0, reward_bound=0.0
97490: loss=0.000, reward_mean=0.1, reward_bound=0.0
97491: loss=0.000, reward_mean=0.1, reward_bound=0.0
97492: loss=0.000, reward_mean=0.0, reward_bound=0.0
97493: loss=0.000, reward_mean=0.1, reward_bound=0.0
97494: loss=0.000, reward_mean=0.1, reward_bound=0.0
97495: loss=0.000, reward_mean=0.1, reward_bound=0.0
97496: loss=0.000, reward_mean=0.0, reward_bound=0.0
97497: loss=0.000, reward_mean=0.0, reward_bound=0.0
97498: loss=0.000, reward_mean=0.1, reward_bound=0.0
97499: loss=0.000, reward_mean=0.1, reward_bound=0.0
97500: loss=0.000, reward_mean=0.1, reward_bou

97637: loss=0.000, reward_mean=0.0, reward_bound=0.0
97638: loss=0.000, reward_mean=0.1, reward_bound=0.0
97639: loss=0.000, reward_mean=0.1, reward_bound=0.0
97640: loss=0.000, reward_mean=0.0, reward_bound=0.0
97641: loss=0.000, reward_mean=0.1, reward_bound=0.0
97642: loss=0.000, reward_mean=0.0, reward_bound=0.0
97643: loss=0.000, reward_mean=0.0, reward_bound=0.0
97644: loss=0.000, reward_mean=0.1, reward_bound=0.0
97645: loss=0.000, reward_mean=0.1, reward_bound=0.0
97646: loss=0.000, reward_mean=0.1, reward_bound=0.0
97647: loss=0.000, reward_mean=0.0, reward_bound=0.0
97648: loss=0.000, reward_mean=0.0, reward_bound=0.0
97649: loss=0.000, reward_mean=0.1, reward_bound=0.0
97650: loss=0.000, reward_mean=0.1, reward_bound=0.0
97651: loss=0.000, reward_mean=0.0, reward_bound=0.0
97652: loss=0.000, reward_mean=0.1, reward_bound=0.0
97653: loss=0.000, reward_mean=0.0, reward_bound=0.0
97654: loss=0.000, reward_mean=0.1, reward_bound=0.0
97655: loss=0.000, reward_mean=0.1, reward_bou

97796: loss=0.000, reward_mean=0.0, reward_bound=0.0
97797: loss=0.000, reward_mean=0.1, reward_bound=0.0
97798: loss=0.000, reward_mean=0.1, reward_bound=0.0
97799: loss=0.000, reward_mean=0.0, reward_bound=0.0
97800: loss=0.000, reward_mean=0.1, reward_bound=0.0
97801: loss=0.000, reward_mean=0.1, reward_bound=0.0
97802: loss=0.000, reward_mean=0.1, reward_bound=0.0
97803: loss=0.000, reward_mean=0.0, reward_bound=0.0
97804: loss=0.000, reward_mean=0.1, reward_bound=0.0
97805: loss=0.000, reward_mean=0.0, reward_bound=0.0
97806: loss=0.000, reward_mean=0.1, reward_bound=0.0
97807: loss=0.000, reward_mean=0.1, reward_bound=0.0
97808: loss=0.000, reward_mean=0.1, reward_bound=0.0
97809: loss=0.000, reward_mean=0.1, reward_bound=0.0
97810: loss=0.000, reward_mean=0.1, reward_bound=0.0
97811: loss=0.000, reward_mean=0.0, reward_bound=0.0
97812: loss=0.000, reward_mean=0.0, reward_bound=0.0
97813: loss=0.000, reward_mean=0.1, reward_bound=0.0
97814: loss=0.000, reward_mean=0.0, reward_bou

97953: loss=0.000, reward_mean=0.1, reward_bound=0.0
97954: loss=0.000, reward_mean=0.0, reward_bound=0.0
97955: loss=0.000, reward_mean=0.0, reward_bound=0.0
97956: loss=0.000, reward_mean=0.0, reward_bound=0.0
97957: loss=0.000, reward_mean=0.0, reward_bound=0.0
97958: loss=0.000, reward_mean=0.0, reward_bound=0.0
97959: loss=0.000, reward_mean=0.0, reward_bound=0.0
97960: loss=0.000, reward_mean=0.1, reward_bound=0.0
97961: loss=0.000, reward_mean=0.1, reward_bound=0.0
97962: loss=0.000, reward_mean=0.1, reward_bound=0.0
97963: loss=0.000, reward_mean=0.1, reward_bound=0.0
97964: loss=0.000, reward_mean=0.0, reward_bound=0.0
97965: loss=0.000, reward_mean=0.0, reward_bound=0.0
97966: loss=0.000, reward_mean=0.1, reward_bound=0.0
97967: loss=0.000, reward_mean=0.1, reward_bound=0.0
97968: loss=0.000, reward_mean=0.0, reward_bound=0.0
97969: loss=0.000, reward_mean=0.0, reward_bound=0.0
97970: loss=0.000, reward_mean=0.1, reward_bound=0.0
97971: loss=0.000, reward_mean=0.0, reward_bou

98114: loss=0.000, reward_mean=0.0, reward_bound=0.0
98115: loss=0.000, reward_mean=0.1, reward_bound=0.0
98116: loss=0.000, reward_mean=0.1, reward_bound=0.0
98117: loss=0.000, reward_mean=0.0, reward_bound=0.0
98118: loss=0.000, reward_mean=0.1, reward_bound=0.0
98119: loss=0.000, reward_mean=0.1, reward_bound=0.0
98120: loss=0.000, reward_mean=0.1, reward_bound=0.0
98121: loss=0.000, reward_mean=0.1, reward_bound=0.0
98122: loss=0.000, reward_mean=0.0, reward_bound=0.0
98123: loss=0.000, reward_mean=0.0, reward_bound=0.0
98124: loss=0.000, reward_mean=0.1, reward_bound=0.0
98125: loss=0.000, reward_mean=0.1, reward_bound=0.0
98126: loss=0.000, reward_mean=0.0, reward_bound=0.0
98127: loss=0.000, reward_mean=0.1, reward_bound=0.0
98128: loss=0.000, reward_mean=0.1, reward_bound=0.0
98129: loss=0.000, reward_mean=0.0, reward_bound=0.0
98130: loss=0.000, reward_mean=0.0, reward_bound=0.0
98131: loss=0.000, reward_mean=0.0, reward_bound=0.0
98132: loss=0.000, reward_mean=0.0, reward_bou

98269: loss=0.000, reward_mean=0.0, reward_bound=0.0
98270: loss=0.000, reward_mean=0.1, reward_bound=0.0
98271: loss=0.000, reward_mean=0.0, reward_bound=0.0
98272: loss=0.000, reward_mean=0.1, reward_bound=0.0
98273: loss=0.000, reward_mean=0.0, reward_bound=0.0
98274: loss=0.000, reward_mean=0.0, reward_bound=0.0
98275: loss=0.000, reward_mean=0.0, reward_bound=0.0
98276: loss=0.000, reward_mean=0.1, reward_bound=0.0
98277: loss=0.000, reward_mean=0.1, reward_bound=0.0
98278: loss=0.000, reward_mean=0.1, reward_bound=0.0
98279: loss=0.000, reward_mean=0.0, reward_bound=0.0
98280: loss=0.000, reward_mean=0.0, reward_bound=0.0
98281: loss=0.000, reward_mean=0.1, reward_bound=0.0
98282: loss=0.000, reward_mean=0.0, reward_bound=0.0
98283: loss=0.000, reward_mean=0.1, reward_bound=0.0
98284: loss=0.000, reward_mean=0.1, reward_bound=0.0
98285: loss=0.000, reward_mean=0.0, reward_bound=0.0
98286: loss=0.000, reward_mean=0.1, reward_bound=0.0
98287: loss=0.000, reward_mean=0.1, reward_bou

98426: loss=0.000, reward_mean=0.0, reward_bound=0.0
98427: loss=0.000, reward_mean=0.1, reward_bound=0.0
98428: loss=0.000, reward_mean=0.0, reward_bound=0.0
98429: loss=0.000, reward_mean=0.1, reward_bound=0.0
98430: loss=0.000, reward_mean=0.1, reward_bound=0.0
98431: loss=0.000, reward_mean=0.1, reward_bound=0.0
98432: loss=0.000, reward_mean=0.0, reward_bound=0.0
98433: loss=0.000, reward_mean=0.1, reward_bound=0.0
98434: loss=0.000, reward_mean=0.1, reward_bound=0.0
98435: loss=0.000, reward_mean=0.0, reward_bound=0.0
98436: loss=0.000, reward_mean=0.1, reward_bound=0.0
98437: loss=0.000, reward_mean=0.1, reward_bound=0.0
98438: loss=0.000, reward_mean=0.2, reward_bound=0.0
98439: loss=0.000, reward_mean=0.0, reward_bound=0.0
98440: loss=0.000, reward_mean=0.1, reward_bound=0.0
98441: loss=0.000, reward_mean=0.0, reward_bound=0.0
98442: loss=0.000, reward_mean=0.0, reward_bound=0.0
98443: loss=0.000, reward_mean=0.1, reward_bound=0.0
98444: loss=0.000, reward_mean=0.0, reward_bou

98582: loss=0.000, reward_mean=0.1, reward_bound=0.0
98583: loss=0.000, reward_mean=0.1, reward_bound=0.0
98584: loss=0.000, reward_mean=0.0, reward_bound=0.0
98585: loss=0.000, reward_mean=0.0, reward_bound=0.0
98586: loss=0.000, reward_mean=0.1, reward_bound=0.0
98587: loss=0.000, reward_mean=0.0, reward_bound=0.0
98588: loss=0.000, reward_mean=0.1, reward_bound=0.0
98589: loss=0.000, reward_mean=0.0, reward_bound=0.0
98590: loss=0.000, reward_mean=0.0, reward_bound=0.0
98591: loss=0.000, reward_mean=0.0, reward_bound=0.0
98592: loss=0.000, reward_mean=0.1, reward_bound=0.0
98593: loss=0.000, reward_mean=0.1, reward_bound=0.0
98594: loss=0.000, reward_mean=0.0, reward_bound=0.0
98595: loss=0.000, reward_mean=0.1, reward_bound=0.0
98596: loss=0.000, reward_mean=0.1, reward_bound=0.0
98597: loss=0.000, reward_mean=0.1, reward_bound=0.0
98598: loss=0.000, reward_mean=0.0, reward_bound=0.0
98599: loss=0.000, reward_mean=0.0, reward_bound=0.0
98600: loss=0.000, reward_mean=0.0, reward_bou

98737: loss=0.000, reward_mean=0.1, reward_bound=0.0
98738: loss=0.000, reward_mean=0.0, reward_bound=0.0
98739: loss=0.000, reward_mean=0.0, reward_bound=0.0
98740: loss=0.000, reward_mean=0.1, reward_bound=0.0
98741: loss=0.000, reward_mean=0.0, reward_bound=0.0
98742: loss=0.000, reward_mean=0.1, reward_bound=0.0
98743: loss=0.000, reward_mean=0.0, reward_bound=0.0
98744: loss=0.000, reward_mean=0.1, reward_bound=0.0
98745: loss=0.000, reward_mean=0.0, reward_bound=0.0
98746: loss=0.000, reward_mean=0.0, reward_bound=0.0
98747: loss=0.000, reward_mean=0.1, reward_bound=0.0
98748: loss=0.000, reward_mean=0.0, reward_bound=0.0
98749: loss=0.000, reward_mean=0.1, reward_bound=0.0
98750: loss=0.000, reward_mean=0.0, reward_bound=0.0
98751: loss=0.000, reward_mean=0.1, reward_bound=0.0
98752: loss=0.000, reward_mean=0.0, reward_bound=0.0
98753: loss=0.000, reward_mean=0.1, reward_bound=0.0
98754: loss=0.000, reward_mean=0.0, reward_bound=0.0
98755: loss=0.000, reward_mean=0.1, reward_bou

98895: loss=0.000, reward_mean=0.1, reward_bound=0.0
98896: loss=0.000, reward_mean=0.1, reward_bound=0.0
98897: loss=0.000, reward_mean=0.0, reward_bound=0.0
98898: loss=0.000, reward_mean=0.1, reward_bound=0.0
98899: loss=0.000, reward_mean=0.2, reward_bound=0.0
98900: loss=0.000, reward_mean=0.1, reward_bound=0.0
98901: loss=0.000, reward_mean=0.0, reward_bound=0.0
98902: loss=0.000, reward_mean=0.0, reward_bound=0.0
98903: loss=0.000, reward_mean=0.1, reward_bound=0.0
98904: loss=0.000, reward_mean=0.1, reward_bound=0.0
98905: loss=0.000, reward_mean=0.0, reward_bound=0.0
98906: loss=0.000, reward_mean=0.1, reward_bound=0.0
98907: loss=0.000, reward_mean=0.0, reward_bound=0.0
98908: loss=0.000, reward_mean=0.2, reward_bound=0.0
98909: loss=0.000, reward_mean=0.1, reward_bound=0.0
98910: loss=0.000, reward_mean=0.1, reward_bound=0.0
98911: loss=0.000, reward_mean=0.1, reward_bound=0.0
98912: loss=0.000, reward_mean=0.0, reward_bound=0.0
98913: loss=0.000, reward_mean=0.1, reward_bou

99054: loss=0.000, reward_mean=0.1, reward_bound=0.0
99055: loss=0.000, reward_mean=0.1, reward_bound=0.0
99056: loss=0.000, reward_mean=0.1, reward_bound=0.0
99057: loss=0.000, reward_mean=0.1, reward_bound=0.0
99058: loss=0.000, reward_mean=0.1, reward_bound=0.0
99059: loss=0.000, reward_mean=0.1, reward_bound=0.0
99060: loss=0.000, reward_mean=0.1, reward_bound=0.0
99061: loss=0.000, reward_mean=0.1, reward_bound=0.0
99062: loss=0.000, reward_mean=0.1, reward_bound=0.0
99063: loss=0.000, reward_mean=0.1, reward_bound=0.0
99064: loss=0.000, reward_mean=0.0, reward_bound=0.0
99065: loss=0.000, reward_mean=0.1, reward_bound=0.0
99066: loss=0.000, reward_mean=0.0, reward_bound=0.0
99067: loss=0.000, reward_mean=0.1, reward_bound=0.0
99068: loss=0.000, reward_mean=0.0, reward_bound=0.0
99069: loss=0.000, reward_mean=0.1, reward_bound=0.0
99070: loss=0.000, reward_mean=0.0, reward_bound=0.0
99071: loss=0.000, reward_mean=0.1, reward_bound=0.0
99072: loss=0.000, reward_mean=0.1, reward_bou

99209: loss=0.000, reward_mean=0.0, reward_bound=0.0
99210: loss=0.000, reward_mean=0.1, reward_bound=0.0
99211: loss=0.000, reward_mean=0.0, reward_bound=0.0
99212: loss=0.000, reward_mean=0.2, reward_bound=0.0
99213: loss=0.000, reward_mean=0.0, reward_bound=0.0
99214: loss=0.000, reward_mean=0.0, reward_bound=0.0
99215: loss=0.000, reward_mean=0.1, reward_bound=0.0
99216: loss=0.000, reward_mean=0.1, reward_bound=0.0
99217: loss=0.000, reward_mean=0.1, reward_bound=0.0
99218: loss=0.000, reward_mean=0.1, reward_bound=0.0
99219: loss=0.000, reward_mean=0.1, reward_bound=0.0
99220: loss=0.000, reward_mean=0.1, reward_bound=0.0
99221: loss=0.000, reward_mean=0.0, reward_bound=0.0
99222: loss=0.000, reward_mean=0.0, reward_bound=0.0
99223: loss=0.000, reward_mean=0.1, reward_bound=0.0
99224: loss=0.000, reward_mean=0.1, reward_bound=0.0
99225: loss=0.000, reward_mean=0.0, reward_bound=0.0
99226: loss=0.000, reward_mean=0.0, reward_bound=0.0
99227: loss=0.000, reward_mean=0.0, reward_bou

99369: loss=0.000, reward_mean=0.0, reward_bound=0.0
99370: loss=0.000, reward_mean=0.0, reward_bound=0.0
99371: loss=0.000, reward_mean=0.0, reward_bound=0.0
99372: loss=0.000, reward_mean=0.0, reward_bound=0.0
99373: loss=0.000, reward_mean=0.1, reward_bound=0.0
99374: loss=0.000, reward_mean=0.1, reward_bound=0.0
99375: loss=0.000, reward_mean=0.1, reward_bound=0.0
99376: loss=0.000, reward_mean=0.0, reward_bound=0.0
99377: loss=0.000, reward_mean=0.1, reward_bound=0.0
99378: loss=0.000, reward_mean=0.1, reward_bound=0.0
99379: loss=0.000, reward_mean=0.0, reward_bound=0.0
99380: loss=0.000, reward_mean=0.1, reward_bound=0.0
99381: loss=0.000, reward_mean=0.0, reward_bound=0.0
99382: loss=0.000, reward_mean=0.0, reward_bound=0.0
99383: loss=0.000, reward_mean=0.0, reward_bound=0.0
99384: loss=0.000, reward_mean=0.0, reward_bound=0.0
99385: loss=0.000, reward_mean=0.2, reward_bound=0.0
99386: loss=0.000, reward_mean=0.0, reward_bound=0.0
99387: loss=0.000, reward_mean=0.0, reward_bou

99524: loss=0.000, reward_mean=0.1, reward_bound=0.0
99525: loss=0.000, reward_mean=0.1, reward_bound=0.0
99526: loss=0.000, reward_mean=0.1, reward_bound=0.0
99527: loss=0.000, reward_mean=0.1, reward_bound=0.0
99528: loss=0.000, reward_mean=0.0, reward_bound=0.0
99529: loss=0.000, reward_mean=0.0, reward_bound=0.0
99530: loss=0.000, reward_mean=0.0, reward_bound=0.0
99531: loss=0.000, reward_mean=0.1, reward_bound=0.0
99532: loss=0.000, reward_mean=0.0, reward_bound=0.0
99533: loss=0.000, reward_mean=0.0, reward_bound=0.0
99534: loss=0.000, reward_mean=0.1, reward_bound=0.0
99535: loss=0.000, reward_mean=0.1, reward_bound=0.0
99536: loss=0.000, reward_mean=0.1, reward_bound=0.0
99537: loss=0.000, reward_mean=0.1, reward_bound=0.0
99538: loss=0.000, reward_mean=0.0, reward_bound=0.0
99539: loss=0.000, reward_mean=0.0, reward_bound=0.0
99540: loss=0.000, reward_mean=0.0, reward_bound=0.0
99541: loss=0.000, reward_mean=0.1, reward_bound=0.0
99542: loss=0.000, reward_mean=0.1, reward_bou

99679: loss=0.000, reward_mean=0.1, reward_bound=0.0
99680: loss=0.000, reward_mean=0.1, reward_bound=0.0
99681: loss=0.000, reward_mean=0.0, reward_bound=0.0
99682: loss=0.000, reward_mean=0.0, reward_bound=0.0
99683: loss=0.000, reward_mean=0.0, reward_bound=0.0
99684: loss=0.000, reward_mean=0.1, reward_bound=0.0
99685: loss=0.000, reward_mean=0.1, reward_bound=0.0
99686: loss=0.000, reward_mean=0.0, reward_bound=0.0
99687: loss=0.000, reward_mean=0.0, reward_bound=0.0
99688: loss=0.000, reward_mean=0.0, reward_bound=0.0
99689: loss=0.000, reward_mean=0.0, reward_bound=0.0
99690: loss=0.000, reward_mean=0.1, reward_bound=0.0
99691: loss=0.000, reward_mean=0.1, reward_bound=0.0
99692: loss=0.000, reward_mean=0.0, reward_bound=0.0
99693: loss=0.000, reward_mean=0.0, reward_bound=0.0
99694: loss=0.000, reward_mean=0.0, reward_bound=0.0
99695: loss=0.000, reward_mean=0.0, reward_bound=0.0
99696: loss=0.000, reward_mean=0.0, reward_bound=0.0
99697: loss=0.000, reward_mean=0.1, reward_bou

99834: loss=0.000, reward_mean=0.1, reward_bound=0.0
99835: loss=0.000, reward_mean=0.0, reward_bound=0.0
99836: loss=0.000, reward_mean=0.1, reward_bound=0.0
99837: loss=0.000, reward_mean=0.1, reward_bound=0.0
99838: loss=0.000, reward_mean=0.1, reward_bound=0.0
99839: loss=0.000, reward_mean=0.0, reward_bound=0.0
99840: loss=0.000, reward_mean=0.0, reward_bound=0.0
99841: loss=0.000, reward_mean=0.0, reward_bound=0.0
99842: loss=0.000, reward_mean=0.1, reward_bound=0.0
99843: loss=0.000, reward_mean=0.0, reward_bound=0.0
99844: loss=0.000, reward_mean=0.1, reward_bound=0.0
99845: loss=0.000, reward_mean=0.1, reward_bound=0.0
99846: loss=0.000, reward_mean=0.1, reward_bound=0.0
99847: loss=0.000, reward_mean=0.1, reward_bound=0.0
99848: loss=0.000, reward_mean=0.0, reward_bound=0.0
99849: loss=0.000, reward_mean=0.0, reward_bound=0.0
99850: loss=0.000, reward_mean=0.1, reward_bound=0.0
99851: loss=0.000, reward_mean=0.0, reward_bound=0.0
99852: loss=0.000, reward_mean=0.1, reward_bou

99994: loss=0.000, reward_mean=0.1, reward_bound=0.0
99995: loss=0.000, reward_mean=0.0, reward_bound=0.0
99996: loss=0.000, reward_mean=0.2, reward_bound=0.0
99997: loss=0.000, reward_mean=0.1, reward_bound=0.0
99998: loss=0.000, reward_mean=0.1, reward_bound=0.0
99999: loss=0.000, reward_mean=0.0, reward_bound=0.0
100000: loss=0.000, reward_mean=0.1, reward_bound=0.0
100001: loss=0.000, reward_mean=0.0, reward_bound=0.0
100002: loss=0.000, reward_mean=0.2, reward_bound=0.0
100003: loss=0.000, reward_mean=0.1, reward_bound=0.0
100004: loss=0.000, reward_mean=0.0, reward_bound=0.0
100005: loss=0.000, reward_mean=0.0, reward_bound=0.0
100006: loss=0.000, reward_mean=0.0, reward_bound=0.0
100007: loss=0.000, reward_mean=0.0, reward_bound=0.0
100008: loss=0.000, reward_mean=0.0, reward_bound=0.0
100009: loss=0.000, reward_mean=0.0, reward_bound=0.0
100010: loss=0.000, reward_mean=0.1, reward_bound=0.0
100011: loss=0.000, reward_mean=0.1, reward_bound=0.0
100012: loss=0.000, reward_mean=0.

100148: loss=0.000, reward_mean=0.1, reward_bound=0.0
100149: loss=0.000, reward_mean=0.0, reward_bound=0.0
100150: loss=0.000, reward_mean=0.1, reward_bound=0.0
100151: loss=0.000, reward_mean=0.1, reward_bound=0.0
100152: loss=0.000, reward_mean=0.1, reward_bound=0.0
100153: loss=0.000, reward_mean=0.0, reward_bound=0.0
100154: loss=0.000, reward_mean=0.1, reward_bound=0.0
100155: loss=0.000, reward_mean=0.1, reward_bound=0.0
100156: loss=0.000, reward_mean=0.0, reward_bound=0.0
100157: loss=0.000, reward_mean=0.1, reward_bound=0.0
100158: loss=0.000, reward_mean=0.0, reward_bound=0.0
100159: loss=0.000, reward_mean=0.1, reward_bound=0.0
100160: loss=0.000, reward_mean=0.0, reward_bound=0.0
100161: loss=0.000, reward_mean=0.0, reward_bound=0.0
100162: loss=0.000, reward_mean=0.0, reward_bound=0.0
100163: loss=0.000, reward_mean=0.1, reward_bound=0.0
100164: loss=0.000, reward_mean=0.1, reward_bound=0.0
100165: loss=0.000, reward_mean=0.0, reward_bound=0.0
100166: loss=0.000, reward_m

100305: loss=0.000, reward_mean=0.0, reward_bound=0.0
100306: loss=0.000, reward_mean=0.1, reward_bound=0.0
100307: loss=0.000, reward_mean=0.1, reward_bound=0.0
100308: loss=0.000, reward_mean=0.2, reward_bound=0.0
100309: loss=0.000, reward_mean=0.0, reward_bound=0.0
100310: loss=0.000, reward_mean=0.0, reward_bound=0.0
100311: loss=0.000, reward_mean=0.0, reward_bound=0.0
100312: loss=0.000, reward_mean=0.1, reward_bound=0.0
100313: loss=0.000, reward_mean=0.0, reward_bound=0.0
100314: loss=0.000, reward_mean=0.1, reward_bound=0.0
100315: loss=0.000, reward_mean=0.1, reward_bound=0.0
100316: loss=0.000, reward_mean=0.1, reward_bound=0.0
100317: loss=0.000, reward_mean=0.1, reward_bound=0.0
100318: loss=0.000, reward_mean=0.0, reward_bound=0.0
100319: loss=0.000, reward_mean=0.1, reward_bound=0.0
100320: loss=0.000, reward_mean=0.1, reward_bound=0.0
100321: loss=0.000, reward_mean=0.0, reward_bound=0.0
100322: loss=0.000, reward_mean=0.0, reward_bound=0.0
100323: loss=0.000, reward_m

100458: loss=0.000, reward_mean=0.0, reward_bound=0.0
100459: loss=0.000, reward_mean=0.0, reward_bound=0.0
100460: loss=0.000, reward_mean=0.1, reward_bound=0.0
100461: loss=0.000, reward_mean=0.1, reward_bound=0.0
100462: loss=0.000, reward_mean=0.2, reward_bound=0.0
100463: loss=0.000, reward_mean=0.0, reward_bound=0.0
100464: loss=0.000, reward_mean=0.1, reward_bound=0.0
100465: loss=0.000, reward_mean=0.0, reward_bound=0.0
100466: loss=0.000, reward_mean=0.0, reward_bound=0.0
100467: loss=0.000, reward_mean=0.1, reward_bound=0.0
100468: loss=0.000, reward_mean=0.1, reward_bound=0.0
100469: loss=0.000, reward_mean=0.2, reward_bound=0.0
100470: loss=0.000, reward_mean=0.0, reward_bound=0.0
100471: loss=0.000, reward_mean=0.0, reward_bound=0.0
100472: loss=0.000, reward_mean=0.1, reward_bound=0.0
100473: loss=0.000, reward_mean=0.0, reward_bound=0.0
100474: loss=0.000, reward_mean=0.1, reward_bound=0.0
100475: loss=0.000, reward_mean=0.1, reward_bound=0.0
100476: loss=0.000, reward_m

100610: loss=0.000, reward_mean=0.0, reward_bound=0.0
100611: loss=0.000, reward_mean=0.1, reward_bound=0.0
100612: loss=0.000, reward_mean=0.1, reward_bound=0.0
100613: loss=0.000, reward_mean=0.1, reward_bound=0.0
100614: loss=0.000, reward_mean=0.1, reward_bound=0.0
100615: loss=0.000, reward_mean=0.0, reward_bound=0.0
100616: loss=0.000, reward_mean=0.1, reward_bound=0.0
100617: loss=0.000, reward_mean=0.0, reward_bound=0.0
100618: loss=0.000, reward_mean=0.0, reward_bound=0.0
100619: loss=0.000, reward_mean=0.1, reward_bound=0.0
100620: loss=0.000, reward_mean=0.1, reward_bound=0.0
100621: loss=0.000, reward_mean=0.1, reward_bound=0.0
100622: loss=0.000, reward_mean=0.0, reward_bound=0.0
100623: loss=0.000, reward_mean=0.2, reward_bound=0.0
100624: loss=0.000, reward_mean=0.1, reward_bound=0.0
100625: loss=0.000, reward_mean=0.0, reward_bound=0.0
100626: loss=0.000, reward_mean=0.0, reward_bound=0.0
100627: loss=0.000, reward_mean=0.1, reward_bound=0.0
100628: loss=0.000, reward_m

100768: loss=0.000, reward_mean=0.1, reward_bound=0.0
100769: loss=0.000, reward_mean=0.1, reward_bound=0.0
100770: loss=0.000, reward_mean=0.0, reward_bound=0.0
100771: loss=0.000, reward_mean=0.1, reward_bound=0.0
100772: loss=0.000, reward_mean=0.1, reward_bound=0.0
100773: loss=0.000, reward_mean=0.1, reward_bound=0.0
100774: loss=0.000, reward_mean=0.1, reward_bound=0.0
100775: loss=0.000, reward_mean=0.1, reward_bound=0.0
100776: loss=0.000, reward_mean=0.1, reward_bound=0.0
100777: loss=0.000, reward_mean=0.0, reward_bound=0.0
100778: loss=0.000, reward_mean=0.1, reward_bound=0.0
100779: loss=0.000, reward_mean=0.0, reward_bound=0.0
100780: loss=0.000, reward_mean=0.0, reward_bound=0.0
100781: loss=0.000, reward_mean=0.3, reward_bound=0.5
100782: loss=0.000, reward_mean=0.1, reward_bound=0.0
100783: loss=0.000, reward_mean=0.1, reward_bound=0.0
100784: loss=0.000, reward_mean=0.0, reward_bound=0.0
100785: loss=0.000, reward_mean=0.1, reward_bound=0.0
100786: loss=0.000, reward_m

100927: loss=0.000, reward_mean=0.0, reward_bound=0.0
100928: loss=0.000, reward_mean=0.2, reward_bound=0.0
100929: loss=0.000, reward_mean=0.1, reward_bound=0.0
100930: loss=0.000, reward_mean=0.1, reward_bound=0.0
100931: loss=0.000, reward_mean=0.0, reward_bound=0.0
100932: loss=0.000, reward_mean=0.1, reward_bound=0.0
100933: loss=0.000, reward_mean=0.0, reward_bound=0.0
100934: loss=0.000, reward_mean=0.1, reward_bound=0.0
100935: loss=0.000, reward_mean=0.1, reward_bound=0.0
100936: loss=0.000, reward_mean=0.0, reward_bound=0.0
100937: loss=0.000, reward_mean=0.0, reward_bound=0.0
100938: loss=0.000, reward_mean=0.1, reward_bound=0.0
100939: loss=0.000, reward_mean=0.2, reward_bound=0.0
100940: loss=0.000, reward_mean=0.1, reward_bound=0.0
100941: loss=0.000, reward_mean=0.1, reward_bound=0.0
100942: loss=0.000, reward_mean=0.1, reward_bound=0.0
100943: loss=0.000, reward_mean=0.0, reward_bound=0.0
100944: loss=0.000, reward_mean=0.0, reward_bound=0.0
100945: loss=0.000, reward_m

101080: loss=0.000, reward_mean=0.1, reward_bound=0.0
101081: loss=0.000, reward_mean=0.1, reward_bound=0.0
101082: loss=0.000, reward_mean=0.0, reward_bound=0.0
101083: loss=0.000, reward_mean=0.1, reward_bound=0.0
101084: loss=0.000, reward_mean=0.0, reward_bound=0.0
101085: loss=0.000, reward_mean=0.1, reward_bound=0.0
101086: loss=0.000, reward_mean=0.1, reward_bound=0.0
101087: loss=0.000, reward_mean=0.1, reward_bound=0.0
101088: loss=0.000, reward_mean=0.1, reward_bound=0.0
101089: loss=0.000, reward_mean=0.0, reward_bound=0.0
101090: loss=0.000, reward_mean=0.0, reward_bound=0.0
101091: loss=0.000, reward_mean=0.1, reward_bound=0.0
101092: loss=0.000, reward_mean=0.1, reward_bound=0.0
101093: loss=0.000, reward_mean=0.0, reward_bound=0.0
101094: loss=0.000, reward_mean=0.1, reward_bound=0.0
101095: loss=0.000, reward_mean=0.0, reward_bound=0.0
101096: loss=0.000, reward_mean=0.1, reward_bound=0.0
101097: loss=0.000, reward_mean=0.1, reward_bound=0.0
101098: loss=0.000, reward_m

101233: loss=0.000, reward_mean=0.1, reward_bound=0.0
101234: loss=0.000, reward_mean=0.1, reward_bound=0.0
101235: loss=0.000, reward_mean=0.1, reward_bound=0.0
101236: loss=0.000, reward_mean=0.0, reward_bound=0.0
101237: loss=0.000, reward_mean=0.1, reward_bound=0.0
101238: loss=0.000, reward_mean=0.1, reward_bound=0.0
101239: loss=0.000, reward_mean=0.0, reward_bound=0.0
101240: loss=0.000, reward_mean=0.0, reward_bound=0.0
101241: loss=0.000, reward_mean=0.1, reward_bound=0.0
101242: loss=0.000, reward_mean=0.0, reward_bound=0.0
101243: loss=0.000, reward_mean=0.1, reward_bound=0.0
101244: loss=0.000, reward_mean=0.1, reward_bound=0.0
101245: loss=0.000, reward_mean=0.1, reward_bound=0.0
101246: loss=0.000, reward_mean=0.0, reward_bound=0.0
101247: loss=0.000, reward_mean=0.1, reward_bound=0.0
101248: loss=0.000, reward_mean=0.1, reward_bound=0.0
101249: loss=0.000, reward_mean=0.1, reward_bound=0.0
101250: loss=0.000, reward_mean=0.1, reward_bound=0.0
101251: loss=0.000, reward_m

101390: loss=0.000, reward_mean=0.0, reward_bound=0.0
101391: loss=0.000, reward_mean=0.1, reward_bound=0.0
101392: loss=0.000, reward_mean=0.1, reward_bound=0.0
101393: loss=0.000, reward_mean=0.0, reward_bound=0.0
101394: loss=0.000, reward_mean=0.1, reward_bound=0.0
101395: loss=0.000, reward_mean=0.0, reward_bound=0.0
101396: loss=0.000, reward_mean=0.0, reward_bound=0.0
101397: loss=0.000, reward_mean=0.1, reward_bound=0.0
101398: loss=0.000, reward_mean=0.2, reward_bound=0.0
101399: loss=0.000, reward_mean=0.0, reward_bound=0.0
101400: loss=0.000, reward_mean=0.0, reward_bound=0.0
101401: loss=0.000, reward_mean=0.0, reward_bound=0.0
101402: loss=0.000, reward_mean=0.1, reward_bound=0.0
101403: loss=0.000, reward_mean=0.1, reward_bound=0.0
101404: loss=0.000, reward_mean=0.1, reward_bound=0.0
101405: loss=0.000, reward_mean=0.1, reward_bound=0.0
101406: loss=0.000, reward_mean=0.2, reward_bound=0.0
101407: loss=0.000, reward_mean=0.1, reward_bound=0.0
101408: loss=0.000, reward_m

101546: loss=0.000, reward_mean=0.0, reward_bound=0.0
101547: loss=0.000, reward_mean=0.1, reward_bound=0.0
101548: loss=0.000, reward_mean=0.1, reward_bound=0.0
101549: loss=0.000, reward_mean=0.0, reward_bound=0.0
101550: loss=0.000, reward_mean=0.0, reward_bound=0.0
101551: loss=0.000, reward_mean=0.1, reward_bound=0.0
101552: loss=0.000, reward_mean=0.1, reward_bound=0.0
101553: loss=0.000, reward_mean=0.2, reward_bound=0.0
101554: loss=0.000, reward_mean=0.1, reward_bound=0.0
101555: loss=0.000, reward_mean=0.0, reward_bound=0.0
101556: loss=0.000, reward_mean=0.1, reward_bound=0.0
101557: loss=0.000, reward_mean=0.0, reward_bound=0.0
101558: loss=0.000, reward_mean=0.0, reward_bound=0.0
101559: loss=0.000, reward_mean=0.1, reward_bound=0.0
101560: loss=0.000, reward_mean=0.0, reward_bound=0.0
101561: loss=0.000, reward_mean=0.0, reward_bound=0.0
101562: loss=0.000, reward_mean=0.1, reward_bound=0.0
101563: loss=0.000, reward_mean=0.0, reward_bound=0.0
101564: loss=0.000, reward_m

101699: loss=0.000, reward_mean=0.1, reward_bound=0.0
101700: loss=0.000, reward_mean=0.1, reward_bound=0.0
101701: loss=0.000, reward_mean=0.0, reward_bound=0.0
101702: loss=0.000, reward_mean=0.0, reward_bound=0.0
101703: loss=0.000, reward_mean=0.2, reward_bound=0.0
101704: loss=0.000, reward_mean=0.1, reward_bound=0.0
101705: loss=0.000, reward_mean=0.0, reward_bound=0.0
101706: loss=0.000, reward_mean=0.1, reward_bound=0.0
101707: loss=0.000, reward_mean=0.1, reward_bound=0.0
101708: loss=0.000, reward_mean=0.1, reward_bound=0.0
101709: loss=0.000, reward_mean=0.1, reward_bound=0.0
101710: loss=0.000, reward_mean=0.1, reward_bound=0.0
101711: loss=0.000, reward_mean=0.1, reward_bound=0.0
101712: loss=0.000, reward_mean=0.0, reward_bound=0.0
101713: loss=0.000, reward_mean=0.1, reward_bound=0.0
101714: loss=0.000, reward_mean=0.0, reward_bound=0.0
101715: loss=0.000, reward_mean=0.1, reward_bound=0.0
101716: loss=0.000, reward_mean=0.1, reward_bound=0.0
101717: loss=0.000, reward_m

101851: loss=0.000, reward_mean=0.1, reward_bound=0.0
101852: loss=0.000, reward_mean=0.1, reward_bound=0.0
101853: loss=0.000, reward_mean=0.1, reward_bound=0.0
101854: loss=0.000, reward_mean=0.1, reward_bound=0.0
101855: loss=0.000, reward_mean=0.1, reward_bound=0.0
101856: loss=0.000, reward_mean=0.1, reward_bound=0.0
101857: loss=0.000, reward_mean=0.1, reward_bound=0.0
101858: loss=0.000, reward_mean=0.1, reward_bound=0.0
101859: loss=0.000, reward_mean=0.0, reward_bound=0.0
101860: loss=0.000, reward_mean=0.1, reward_bound=0.0
101861: loss=0.000, reward_mean=0.0, reward_bound=0.0
101862: loss=0.000, reward_mean=0.1, reward_bound=0.0
101863: loss=0.000, reward_mean=0.1, reward_bound=0.0
101864: loss=0.000, reward_mean=0.0, reward_bound=0.0
101865: loss=0.000, reward_mean=0.1, reward_bound=0.0
101866: loss=0.000, reward_mean=0.0, reward_bound=0.0
101867: loss=0.000, reward_mean=0.0, reward_bound=0.0
101868: loss=0.000, reward_mean=0.2, reward_bound=0.0
101869: loss=0.000, reward_m

102007: loss=0.000, reward_mean=0.0, reward_bound=0.0
102008: loss=0.000, reward_mean=0.0, reward_bound=0.0
102009: loss=0.000, reward_mean=0.1, reward_bound=0.0
102010: loss=0.000, reward_mean=0.1, reward_bound=0.0
102011: loss=0.000, reward_mean=0.1, reward_bound=0.0
102012: loss=0.000, reward_mean=0.0, reward_bound=0.0
102013: loss=0.000, reward_mean=0.1, reward_bound=0.0
102014: loss=0.000, reward_mean=0.0, reward_bound=0.0
102015: loss=0.000, reward_mean=0.0, reward_bound=0.0
102016: loss=0.000, reward_mean=0.0, reward_bound=0.0
102017: loss=0.000, reward_mean=0.0, reward_bound=0.0
102018: loss=0.000, reward_mean=0.1, reward_bound=0.0
102019: loss=0.000, reward_mean=0.1, reward_bound=0.0
102020: loss=0.000, reward_mean=0.1, reward_bound=0.0
102021: loss=0.000, reward_mean=0.2, reward_bound=0.0
102022: loss=0.000, reward_mean=0.0, reward_bound=0.0
102023: loss=0.000, reward_mean=0.0, reward_bound=0.0
102024: loss=0.000, reward_mean=0.1, reward_bound=0.0
102025: loss=0.000, reward_m

102161: loss=0.000, reward_mean=0.0, reward_bound=0.0
102162: loss=0.000, reward_mean=0.0, reward_bound=0.0
102163: loss=0.000, reward_mean=0.1, reward_bound=0.0
102164: loss=0.000, reward_mean=0.0, reward_bound=0.0
102165: loss=0.000, reward_mean=0.0, reward_bound=0.0
102166: loss=0.000, reward_mean=0.1, reward_bound=0.0
102167: loss=0.000, reward_mean=0.1, reward_bound=0.0
102168: loss=0.000, reward_mean=0.0, reward_bound=0.0
102169: loss=0.000, reward_mean=0.0, reward_bound=0.0
102170: loss=0.000, reward_mean=0.0, reward_bound=0.0
102171: loss=0.000, reward_mean=0.0, reward_bound=0.0
102172: loss=0.000, reward_mean=0.0, reward_bound=0.0
102173: loss=0.000, reward_mean=0.1, reward_bound=0.0
102174: loss=0.000, reward_mean=0.2, reward_bound=0.0
102175: loss=0.000, reward_mean=0.0, reward_bound=0.0
102176: loss=0.000, reward_mean=0.1, reward_bound=0.0
102177: loss=0.000, reward_mean=0.1, reward_bound=0.0
102178: loss=0.000, reward_mean=0.0, reward_bound=0.0
102179: loss=0.000, reward_m

102316: loss=0.000, reward_mean=0.0, reward_bound=0.0
102317: loss=0.000, reward_mean=0.1, reward_bound=0.0
102318: loss=0.000, reward_mean=0.1, reward_bound=0.0
102319: loss=0.000, reward_mean=0.1, reward_bound=0.0
102320: loss=0.000, reward_mean=0.1, reward_bound=0.0
102321: loss=0.000, reward_mean=0.1, reward_bound=0.0
102322: loss=0.000, reward_mean=0.1, reward_bound=0.0
102323: loss=0.000, reward_mean=0.1, reward_bound=0.0
102324: loss=0.000, reward_mean=0.0, reward_bound=0.0
102325: loss=0.000, reward_mean=0.2, reward_bound=0.0
102326: loss=0.000, reward_mean=0.0, reward_bound=0.0
102327: loss=0.000, reward_mean=0.1, reward_bound=0.0
102328: loss=0.000, reward_mean=0.1, reward_bound=0.0
102329: loss=0.000, reward_mean=0.1, reward_bound=0.0
102330: loss=0.000, reward_mean=0.0, reward_bound=0.0
102331: loss=0.000, reward_mean=0.1, reward_bound=0.0
102332: loss=0.000, reward_mean=0.0, reward_bound=0.0
102333: loss=0.000, reward_mean=0.0, reward_bound=0.0
102334: loss=0.000, reward_m

102468: loss=0.000, reward_mean=0.1, reward_bound=0.0
102469: loss=0.000, reward_mean=0.1, reward_bound=0.0
102470: loss=0.000, reward_mean=0.1, reward_bound=0.0
102471: loss=0.000, reward_mean=0.0, reward_bound=0.0
102472: loss=0.000, reward_mean=0.0, reward_bound=0.0
102473: loss=0.000, reward_mean=0.0, reward_bound=0.0
102474: loss=0.000, reward_mean=0.1, reward_bound=0.0
102475: loss=0.000, reward_mean=0.2, reward_bound=0.0
102476: loss=0.000, reward_mean=0.1, reward_bound=0.0
102477: loss=0.000, reward_mean=0.0, reward_bound=0.0
102478: loss=0.000, reward_mean=0.1, reward_bound=0.0
102479: loss=0.000, reward_mean=0.2, reward_bound=0.0
102480: loss=0.000, reward_mean=0.1, reward_bound=0.0
102481: loss=0.000, reward_mean=0.0, reward_bound=0.0
102482: loss=0.000, reward_mean=0.1, reward_bound=0.0
102483: loss=0.000, reward_mean=0.1, reward_bound=0.0
102484: loss=0.000, reward_mean=0.1, reward_bound=0.0
102485: loss=0.000, reward_mean=0.1, reward_bound=0.0
102486: loss=0.000, reward_m

102621: loss=0.000, reward_mean=0.0, reward_bound=0.0
102622: loss=0.000, reward_mean=0.1, reward_bound=0.0
102623: loss=0.000, reward_mean=0.2, reward_bound=0.0
102624: loss=0.000, reward_mean=0.0, reward_bound=0.0
102625: loss=0.000, reward_mean=0.0, reward_bound=0.0
102626: loss=0.000, reward_mean=0.0, reward_bound=0.0
102627: loss=0.000, reward_mean=0.1, reward_bound=0.0
102628: loss=0.000, reward_mean=0.1, reward_bound=0.0
102629: loss=0.000, reward_mean=0.1, reward_bound=0.0
102630: loss=0.000, reward_mean=0.0, reward_bound=0.0
102631: loss=0.000, reward_mean=0.1, reward_bound=0.0
102632: loss=0.000, reward_mean=0.0, reward_bound=0.0
102633: loss=0.000, reward_mean=0.2, reward_bound=0.0
102634: loss=0.000, reward_mean=0.1, reward_bound=0.0
102635: loss=0.000, reward_mean=0.0, reward_bound=0.0
102636: loss=0.000, reward_mean=0.1, reward_bound=0.0
102637: loss=0.000, reward_mean=0.1, reward_bound=0.0
102638: loss=0.000, reward_mean=0.1, reward_bound=0.0
102639: loss=0.000, reward_m

102778: loss=0.000, reward_mean=0.1, reward_bound=0.0
102779: loss=0.000, reward_mean=0.1, reward_bound=0.0
102780: loss=0.000, reward_mean=0.1, reward_bound=0.0
102781: loss=0.000, reward_mean=0.0, reward_bound=0.0
102782: loss=0.000, reward_mean=0.1, reward_bound=0.0
102783: loss=0.000, reward_mean=0.1, reward_bound=0.0
102784: loss=0.000, reward_mean=0.0, reward_bound=0.0
102785: loss=0.000, reward_mean=0.0, reward_bound=0.0
102786: loss=0.000, reward_mean=0.1, reward_bound=0.0
102787: loss=0.000, reward_mean=0.1, reward_bound=0.0
102788: loss=0.000, reward_mean=0.1, reward_bound=0.0
102789: loss=0.000, reward_mean=0.0, reward_bound=0.0
102790: loss=0.000, reward_mean=0.0, reward_bound=0.0
102791: loss=0.000, reward_mean=0.0, reward_bound=0.0
102792: loss=0.000, reward_mean=0.0, reward_bound=0.0
102793: loss=0.000, reward_mean=0.1, reward_bound=0.0
102794: loss=0.000, reward_mean=0.1, reward_bound=0.0
102795: loss=0.000, reward_mean=0.1, reward_bound=0.0
102796: loss=0.000, reward_m

102934: loss=0.000, reward_mean=0.0, reward_bound=0.0
102935: loss=0.000, reward_mean=0.0, reward_bound=0.0
102936: loss=0.000, reward_mean=0.0, reward_bound=0.0
102937: loss=0.000, reward_mean=0.1, reward_bound=0.0
102938: loss=0.000, reward_mean=0.0, reward_bound=0.0
102939: loss=0.000, reward_mean=0.1, reward_bound=0.0
102940: loss=0.000, reward_mean=0.0, reward_bound=0.0
102941: loss=0.000, reward_mean=0.1, reward_bound=0.0
102942: loss=0.000, reward_mean=0.1, reward_bound=0.0
102943: loss=0.000, reward_mean=0.1, reward_bound=0.0
102944: loss=0.000, reward_mean=0.1, reward_bound=0.0
102945: loss=0.000, reward_mean=0.2, reward_bound=0.0
102946: loss=0.000, reward_mean=0.1, reward_bound=0.0
102947: loss=0.000, reward_mean=0.0, reward_bound=0.0
102948: loss=0.000, reward_mean=0.0, reward_bound=0.0
102949: loss=0.000, reward_mean=0.0, reward_bound=0.0
102950: loss=0.000, reward_mean=0.0, reward_bound=0.0
102951: loss=0.000, reward_mean=0.1, reward_bound=0.0
102952: loss=0.000, reward_m

103091: loss=0.000, reward_mean=0.1, reward_bound=0.0
103092: loss=0.000, reward_mean=0.0, reward_bound=0.0
103093: loss=0.000, reward_mean=0.1, reward_bound=0.0
103094: loss=0.000, reward_mean=0.0, reward_bound=0.0
103095: loss=0.000, reward_mean=0.0, reward_bound=0.0
103096: loss=0.000, reward_mean=0.0, reward_bound=0.0
103097: loss=0.000, reward_mean=0.1, reward_bound=0.0
103098: loss=0.000, reward_mean=0.1, reward_bound=0.0
103099: loss=0.000, reward_mean=0.1, reward_bound=0.0
103100: loss=0.000, reward_mean=0.1, reward_bound=0.0
103101: loss=0.000, reward_mean=0.1, reward_bound=0.0
103102: loss=0.000, reward_mean=0.1, reward_bound=0.0
103103: loss=0.000, reward_mean=0.1, reward_bound=0.0
103104: loss=0.000, reward_mean=0.1, reward_bound=0.0
103105: loss=0.000, reward_mean=0.1, reward_bound=0.0
103106: loss=0.000, reward_mean=0.0, reward_bound=0.0
103107: loss=0.000, reward_mean=0.0, reward_bound=0.0
103108: loss=0.000, reward_mean=0.0, reward_bound=0.0
103109: loss=0.000, reward_m

103249: loss=0.000, reward_mean=0.1, reward_bound=0.0
103250: loss=0.000, reward_mean=0.2, reward_bound=0.0
103251: loss=0.000, reward_mean=0.1, reward_bound=0.0
103252: loss=0.000, reward_mean=0.0, reward_bound=0.0
103253: loss=0.000, reward_mean=0.1, reward_bound=0.0
103254: loss=0.000, reward_mean=0.0, reward_bound=0.0
103255: loss=0.000, reward_mean=0.1, reward_bound=0.0
103256: loss=0.000, reward_mean=0.0, reward_bound=0.0
103257: loss=0.000, reward_mean=0.1, reward_bound=0.0
103258: loss=0.000, reward_mean=0.1, reward_bound=0.0
103259: loss=0.000, reward_mean=0.1, reward_bound=0.0
103260: loss=0.000, reward_mean=0.1, reward_bound=0.0
103261: loss=0.000, reward_mean=0.1, reward_bound=0.0
103262: loss=0.000, reward_mean=0.0, reward_bound=0.0
103263: loss=0.000, reward_mean=0.0, reward_bound=0.0
103264: loss=0.000, reward_mean=0.0, reward_bound=0.0
103265: loss=0.000, reward_mean=0.0, reward_bound=0.0
103266: loss=0.000, reward_mean=0.0, reward_bound=0.0
103267: loss=0.000, reward_m

103402: loss=0.000, reward_mean=0.1, reward_bound=0.0
103403: loss=0.000, reward_mean=0.0, reward_bound=0.0
103404: loss=0.000, reward_mean=0.1, reward_bound=0.0
103405: loss=0.000, reward_mean=0.1, reward_bound=0.0
103406: loss=0.000, reward_mean=0.0, reward_bound=0.0
103407: loss=0.000, reward_mean=0.1, reward_bound=0.0
103408: loss=0.000, reward_mean=0.0, reward_bound=0.0
103409: loss=0.000, reward_mean=0.0, reward_bound=0.0
103410: loss=0.000, reward_mean=0.0, reward_bound=0.0
103411: loss=0.000, reward_mean=0.0, reward_bound=0.0
103412: loss=0.000, reward_mean=0.0, reward_bound=0.0
103413: loss=0.000, reward_mean=0.1, reward_bound=0.0
103414: loss=0.000, reward_mean=0.0, reward_bound=0.0
103415: loss=0.000, reward_mean=0.1, reward_bound=0.0
103416: loss=0.000, reward_mean=0.1, reward_bound=0.0
103417: loss=0.000, reward_mean=0.1, reward_bound=0.0
103418: loss=0.000, reward_mean=0.1, reward_bound=0.0
103419: loss=0.000, reward_mean=0.0, reward_bound=0.0
103420: loss=0.000, reward_m

103554: loss=0.000, reward_mean=0.1, reward_bound=0.0
103555: loss=0.000, reward_mean=0.0, reward_bound=0.0
103556: loss=0.000, reward_mean=0.1, reward_bound=0.0
103557: loss=0.000, reward_mean=0.0, reward_bound=0.0
103558: loss=0.000, reward_mean=0.2, reward_bound=0.0
103559: loss=0.000, reward_mean=0.0, reward_bound=0.0
103560: loss=0.000, reward_mean=0.0, reward_bound=0.0
103561: loss=0.000, reward_mean=0.0, reward_bound=0.0
103562: loss=0.000, reward_mean=0.1, reward_bound=0.0
103563: loss=0.000, reward_mean=0.0, reward_bound=0.0
103564: loss=0.000, reward_mean=0.2, reward_bound=0.0
103565: loss=0.000, reward_mean=0.1, reward_bound=0.0
103566: loss=0.000, reward_mean=0.1, reward_bound=0.0
103567: loss=0.000, reward_mean=0.1, reward_bound=0.0
103568: loss=0.000, reward_mean=0.1, reward_bound=0.0
103569: loss=0.000, reward_mean=0.0, reward_bound=0.0
103570: loss=0.000, reward_mean=0.1, reward_bound=0.0
103571: loss=0.000, reward_mean=0.0, reward_bound=0.0
103572: loss=0.000, reward_m

103710: loss=0.000, reward_mean=0.1, reward_bound=0.0
103711: loss=0.000, reward_mean=0.0, reward_bound=0.0
103712: loss=0.000, reward_mean=0.0, reward_bound=0.0
103713: loss=0.000, reward_mean=0.1, reward_bound=0.0
103714: loss=0.000, reward_mean=0.0, reward_bound=0.0
103715: loss=0.000, reward_mean=0.0, reward_bound=0.0
103716: loss=0.000, reward_mean=0.1, reward_bound=0.0
103717: loss=0.000, reward_mean=0.0, reward_bound=0.0
103718: loss=0.000, reward_mean=0.1, reward_bound=0.0
103719: loss=0.000, reward_mean=0.1, reward_bound=0.0
103720: loss=0.000, reward_mean=0.0, reward_bound=0.0
103721: loss=0.000, reward_mean=0.1, reward_bound=0.0
103722: loss=0.000, reward_mean=0.1, reward_bound=0.0
103723: loss=0.000, reward_mean=0.0, reward_bound=0.0
103724: loss=0.000, reward_mean=0.0, reward_bound=0.0
103725: loss=0.000, reward_mean=0.0, reward_bound=0.0
103726: loss=0.000, reward_mean=0.2, reward_bound=0.0
103727: loss=0.000, reward_mean=0.1, reward_bound=0.0
103728: loss=0.000, reward_m

103866: loss=0.000, reward_mean=0.1, reward_bound=0.0
103867: loss=0.000, reward_mean=0.1, reward_bound=0.0
103868: loss=0.000, reward_mean=0.2, reward_bound=0.0
103869: loss=0.000, reward_mean=0.1, reward_bound=0.0
103870: loss=0.000, reward_mean=0.0, reward_bound=0.0
103871: loss=0.000, reward_mean=0.0, reward_bound=0.0
103872: loss=0.000, reward_mean=0.0, reward_bound=0.0
103873: loss=0.000, reward_mean=0.0, reward_bound=0.0
103874: loss=0.000, reward_mean=0.1, reward_bound=0.0
103875: loss=0.000, reward_mean=0.1, reward_bound=0.0
103876: loss=0.000, reward_mean=0.1, reward_bound=0.0
103877: loss=0.000, reward_mean=0.0, reward_bound=0.0
103878: loss=0.000, reward_mean=0.1, reward_bound=0.0
103879: loss=0.000, reward_mean=0.1, reward_bound=0.0
103880: loss=0.000, reward_mean=0.1, reward_bound=0.0
103881: loss=0.000, reward_mean=0.1, reward_bound=0.0
103882: loss=0.000, reward_mean=0.1, reward_bound=0.0
103883: loss=0.000, reward_mean=0.0, reward_bound=0.0
103884: loss=0.000, reward_m

104021: loss=0.000, reward_mean=0.1, reward_bound=0.0
104022: loss=0.000, reward_mean=0.0, reward_bound=0.0
104023: loss=0.000, reward_mean=0.0, reward_bound=0.0
104024: loss=0.000, reward_mean=0.0, reward_bound=0.0
104025: loss=0.000, reward_mean=0.1, reward_bound=0.0
104026: loss=0.000, reward_mean=0.0, reward_bound=0.0
104027: loss=0.000, reward_mean=0.1, reward_bound=0.0
104028: loss=0.000, reward_mean=0.1, reward_bound=0.0
104029: loss=0.000, reward_mean=0.0, reward_bound=0.0
104030: loss=0.000, reward_mean=0.1, reward_bound=0.0
104031: loss=0.000, reward_mean=0.1, reward_bound=0.0
104032: loss=0.000, reward_mean=0.1, reward_bound=0.0
104033: loss=0.000, reward_mean=0.1, reward_bound=0.0
104034: loss=0.000, reward_mean=0.1, reward_bound=0.0
104035: loss=0.000, reward_mean=0.0, reward_bound=0.0
104036: loss=0.000, reward_mean=0.1, reward_bound=0.0
104037: loss=0.000, reward_mean=0.1, reward_bound=0.0
104038: loss=0.000, reward_mean=0.0, reward_bound=0.0
104039: loss=0.000, reward_m

104174: loss=0.000, reward_mean=0.0, reward_bound=0.0
104175: loss=0.000, reward_mean=0.1, reward_bound=0.0
104176: loss=0.000, reward_mean=0.1, reward_bound=0.0
104177: loss=0.000, reward_mean=0.1, reward_bound=0.0
104178: loss=0.000, reward_mean=0.0, reward_bound=0.0
104179: loss=0.000, reward_mean=0.0, reward_bound=0.0
104180: loss=0.000, reward_mean=0.1, reward_bound=0.0
104181: loss=0.000, reward_mean=0.1, reward_bound=0.0
104182: loss=0.000, reward_mean=0.1, reward_bound=0.0
104183: loss=0.000, reward_mean=0.1, reward_bound=0.0
104184: loss=0.000, reward_mean=0.0, reward_bound=0.0
104185: loss=0.000, reward_mean=0.1, reward_bound=0.0
104186: loss=0.000, reward_mean=0.1, reward_bound=0.0
104187: loss=0.000, reward_mean=0.1, reward_bound=0.0
104188: loss=0.000, reward_mean=0.0, reward_bound=0.0
104189: loss=0.000, reward_mean=0.0, reward_bound=0.0
104190: loss=0.000, reward_mean=0.1, reward_bound=0.0
104191: loss=0.000, reward_mean=0.0, reward_bound=0.0
104192: loss=0.000, reward_m

104326: loss=0.000, reward_mean=0.0, reward_bound=0.0
104327: loss=0.000, reward_mean=0.0, reward_bound=0.0
104328: loss=0.000, reward_mean=0.0, reward_bound=0.0
104329: loss=0.000, reward_mean=0.1, reward_bound=0.0
104330: loss=0.000, reward_mean=0.0, reward_bound=0.0
104331: loss=0.000, reward_mean=0.1, reward_bound=0.0
104332: loss=0.000, reward_mean=0.0, reward_bound=0.0
104333: loss=0.000, reward_mean=0.1, reward_bound=0.0
104334: loss=0.000, reward_mean=0.0, reward_bound=0.0
104335: loss=0.000, reward_mean=0.1, reward_bound=0.0
104336: loss=0.000, reward_mean=0.1, reward_bound=0.0
104337: loss=0.000, reward_mean=0.1, reward_bound=0.0
104338: loss=0.000, reward_mean=0.0, reward_bound=0.0
104339: loss=0.000, reward_mean=0.0, reward_bound=0.0
104340: loss=0.000, reward_mean=0.0, reward_bound=0.0
104341: loss=0.000, reward_mean=0.0, reward_bound=0.0
104342: loss=0.000, reward_mean=0.2, reward_bound=0.0
104343: loss=0.000, reward_mean=0.1, reward_bound=0.0
104344: loss=0.000, reward_m

104480: loss=0.000, reward_mean=0.0, reward_bound=0.0
104481: loss=0.000, reward_mean=0.1, reward_bound=0.0
104482: loss=0.000, reward_mean=0.2, reward_bound=0.0
104483: loss=0.000, reward_mean=0.1, reward_bound=0.0
104484: loss=0.000, reward_mean=0.1, reward_bound=0.0
104485: loss=0.000, reward_mean=0.1, reward_bound=0.0
104486: loss=0.000, reward_mean=0.1, reward_bound=0.0
104487: loss=0.000, reward_mean=0.1, reward_bound=0.0
104488: loss=0.000, reward_mean=0.1, reward_bound=0.0
104489: loss=0.000, reward_mean=0.1, reward_bound=0.0
104490: loss=0.000, reward_mean=0.1, reward_bound=0.0
104491: loss=0.000, reward_mean=0.1, reward_bound=0.0
104492: loss=0.000, reward_mean=0.1, reward_bound=0.0
104493: loss=0.000, reward_mean=0.1, reward_bound=0.0
104494: loss=0.000, reward_mean=0.0, reward_bound=0.0
104495: loss=0.000, reward_mean=0.1, reward_bound=0.0
104496: loss=0.000, reward_mean=0.1, reward_bound=0.0
104497: loss=0.000, reward_mean=0.1, reward_bound=0.0
104498: loss=0.000, reward_m

104633: loss=0.000, reward_mean=0.1, reward_bound=0.0
104634: loss=0.000, reward_mean=0.1, reward_bound=0.0
104635: loss=0.000, reward_mean=0.0, reward_bound=0.0
104636: loss=0.000, reward_mean=0.1, reward_bound=0.0
104637: loss=0.000, reward_mean=0.1, reward_bound=0.0
104638: loss=0.000, reward_mean=0.0, reward_bound=0.0
104639: loss=0.000, reward_mean=0.1, reward_bound=0.0
104640: loss=0.000, reward_mean=0.1, reward_bound=0.0
104641: loss=0.000, reward_mean=0.1, reward_bound=0.0
104642: loss=0.000, reward_mean=0.0, reward_bound=0.0
104643: loss=0.000, reward_mean=0.1, reward_bound=0.0
104644: loss=0.000, reward_mean=0.0, reward_bound=0.0
104645: loss=0.000, reward_mean=0.1, reward_bound=0.0
104646: loss=0.000, reward_mean=0.0, reward_bound=0.0
104647: loss=0.000, reward_mean=0.1, reward_bound=0.0
104648: loss=0.000, reward_mean=0.0, reward_bound=0.0
104649: loss=0.000, reward_mean=0.1, reward_bound=0.0
104650: loss=0.000, reward_mean=0.0, reward_bound=0.0
104651: loss=0.000, reward_m

104789: loss=0.000, reward_mean=0.0, reward_bound=0.0
104790: loss=0.000, reward_mean=0.1, reward_bound=0.0
104791: loss=0.000, reward_mean=0.1, reward_bound=0.0
104792: loss=0.000, reward_mean=0.0, reward_bound=0.0
104793: loss=0.000, reward_mean=0.1, reward_bound=0.0
104794: loss=0.000, reward_mean=0.1, reward_bound=0.0
104795: loss=0.000, reward_mean=0.0, reward_bound=0.0
104796: loss=0.000, reward_mean=0.1, reward_bound=0.0
104797: loss=0.000, reward_mean=0.1, reward_bound=0.0
104798: loss=0.000, reward_mean=0.0, reward_bound=0.0
104799: loss=0.000, reward_mean=0.1, reward_bound=0.0
104800: loss=0.000, reward_mean=0.1, reward_bound=0.0
104801: loss=0.000, reward_mean=0.0, reward_bound=0.0
104802: loss=0.000, reward_mean=0.0, reward_bound=0.0
104803: loss=0.000, reward_mean=0.0, reward_bound=0.0
104804: loss=0.000, reward_mean=0.1, reward_bound=0.0
104805: loss=0.000, reward_mean=0.1, reward_bound=0.0
104806: loss=0.000, reward_mean=0.1, reward_bound=0.0
104807: loss=0.000, reward_m

104945: loss=0.000, reward_mean=0.0, reward_bound=0.0
104946: loss=0.000, reward_mean=0.1, reward_bound=0.0
104947: loss=0.000, reward_mean=0.1, reward_bound=0.0
104948: loss=0.000, reward_mean=0.1, reward_bound=0.0
104949: loss=0.000, reward_mean=0.1, reward_bound=0.0
104950: loss=0.000, reward_mean=0.2, reward_bound=0.0
104951: loss=0.000, reward_mean=0.1, reward_bound=0.0
104952: loss=0.000, reward_mean=0.1, reward_bound=0.0
104953: loss=0.000, reward_mean=0.1, reward_bound=0.0
104954: loss=0.000, reward_mean=0.1, reward_bound=0.0
104955: loss=0.000, reward_mean=0.0, reward_bound=0.0
104956: loss=0.000, reward_mean=0.0, reward_bound=0.0
104957: loss=0.000, reward_mean=0.2, reward_bound=0.0
104958: loss=0.000, reward_mean=0.1, reward_bound=0.0
104959: loss=0.000, reward_mean=0.1, reward_bound=0.0
104960: loss=0.000, reward_mean=0.1, reward_bound=0.0
104961: loss=0.000, reward_mean=0.1, reward_bound=0.0
104962: loss=0.000, reward_mean=0.0, reward_bound=0.0
104963: loss=0.000, reward_m

105102: loss=0.000, reward_mean=0.0, reward_bound=0.0
105103: loss=0.000, reward_mean=0.1, reward_bound=0.0
105104: loss=0.000, reward_mean=0.0, reward_bound=0.0
105105: loss=0.000, reward_mean=0.2, reward_bound=0.0
105106: loss=0.000, reward_mean=0.0, reward_bound=0.0
105107: loss=0.000, reward_mean=0.2, reward_bound=0.0
105108: loss=0.000, reward_mean=0.0, reward_bound=0.0
105109: loss=0.000, reward_mean=0.1, reward_bound=0.0
105110: loss=0.000, reward_mean=0.0, reward_bound=0.0
105111: loss=0.000, reward_mean=0.0, reward_bound=0.0
105112: loss=0.000, reward_mean=0.0, reward_bound=0.0
105113: loss=0.000, reward_mean=0.0, reward_bound=0.0
105114: loss=0.000, reward_mean=0.0, reward_bound=0.0
105115: loss=0.000, reward_mean=0.0, reward_bound=0.0
105116: loss=0.000, reward_mean=0.0, reward_bound=0.0
105117: loss=0.000, reward_mean=0.1, reward_bound=0.0
105118: loss=0.000, reward_mean=0.0, reward_bound=0.0
105119: loss=0.000, reward_mean=0.2, reward_bound=0.0
105120: loss=0.000, reward_m

105257: loss=0.000, reward_mean=0.1, reward_bound=0.0
105258: loss=0.000, reward_mean=0.1, reward_bound=0.0
105259: loss=0.000, reward_mean=0.0, reward_bound=0.0
105260: loss=0.000, reward_mean=0.1, reward_bound=0.0
105261: loss=0.000, reward_mean=0.0, reward_bound=0.0
105262: loss=0.000, reward_mean=0.0, reward_bound=0.0
105263: loss=0.000, reward_mean=0.1, reward_bound=0.0
105264: loss=0.000, reward_mean=0.0, reward_bound=0.0
105265: loss=0.000, reward_mean=0.0, reward_bound=0.0
105266: loss=0.000, reward_mean=0.0, reward_bound=0.0
105267: loss=0.000, reward_mean=0.0, reward_bound=0.0
105268: loss=0.000, reward_mean=0.1, reward_bound=0.0
105269: loss=0.000, reward_mean=0.0, reward_bound=0.0
105270: loss=0.000, reward_mean=0.0, reward_bound=0.0
105271: loss=0.000, reward_mean=0.0, reward_bound=0.0
105272: loss=0.000, reward_mean=0.1, reward_bound=0.0
105273: loss=0.000, reward_mean=0.1, reward_bound=0.0
105274: loss=0.000, reward_mean=0.1, reward_bound=0.0
105275: loss=0.000, reward_m

105410: loss=0.000, reward_mean=0.1, reward_bound=0.0
105411: loss=0.000, reward_mean=0.0, reward_bound=0.0
105412: loss=0.000, reward_mean=0.1, reward_bound=0.0
105413: loss=0.000, reward_mean=0.0, reward_bound=0.0
105414: loss=0.000, reward_mean=0.1, reward_bound=0.0
105415: loss=0.000, reward_mean=0.1, reward_bound=0.0
105416: loss=0.000, reward_mean=0.0, reward_bound=0.0
105417: loss=0.000, reward_mean=0.0, reward_bound=0.0
105418: loss=0.000, reward_mean=0.0, reward_bound=0.0
105419: loss=0.000, reward_mean=0.1, reward_bound=0.0
105420: loss=0.000, reward_mean=0.1, reward_bound=0.0
105421: loss=0.000, reward_mean=0.1, reward_bound=0.0
105422: loss=0.000, reward_mean=0.1, reward_bound=0.0
105423: loss=0.000, reward_mean=0.1, reward_bound=0.0
105424: loss=0.000, reward_mean=0.0, reward_bound=0.0
105425: loss=0.000, reward_mean=0.0, reward_bound=0.0
105426: loss=0.000, reward_mean=0.0, reward_bound=0.0
105427: loss=0.000, reward_mean=0.1, reward_bound=0.0
105428: loss=0.000, reward_m

105563: loss=0.000, reward_mean=0.0, reward_bound=0.0
105564: loss=0.000, reward_mean=0.1, reward_bound=0.0
105565: loss=0.000, reward_mean=0.1, reward_bound=0.0
105566: loss=0.000, reward_mean=0.0, reward_bound=0.0
105567: loss=0.000, reward_mean=0.1, reward_bound=0.0
105568: loss=0.000, reward_mean=0.1, reward_bound=0.0
105569: loss=0.000, reward_mean=0.1, reward_bound=0.0
105570: loss=0.000, reward_mean=0.0, reward_bound=0.0
105571: loss=0.000, reward_mean=0.1, reward_bound=0.0
105572: loss=0.000, reward_mean=0.2, reward_bound=0.0
105573: loss=0.000, reward_mean=0.1, reward_bound=0.0
105574: loss=0.000, reward_mean=0.0, reward_bound=0.0
105575: loss=0.000, reward_mean=0.1, reward_bound=0.0
105576: loss=0.000, reward_mean=0.0, reward_bound=0.0
105577: loss=0.000, reward_mean=0.1, reward_bound=0.0
105578: loss=0.000, reward_mean=0.1, reward_bound=0.0
105579: loss=0.000, reward_mean=0.1, reward_bound=0.0
105580: loss=0.000, reward_mean=0.1, reward_bound=0.0
105581: loss=0.000, reward_m

105717: loss=0.000, reward_mean=0.1, reward_bound=0.0
105718: loss=0.000, reward_mean=0.0, reward_bound=0.0
105719: loss=0.000, reward_mean=0.0, reward_bound=0.0
105720: loss=0.000, reward_mean=0.1, reward_bound=0.0
105721: loss=0.000, reward_mean=0.0, reward_bound=0.0
105722: loss=0.000, reward_mean=0.2, reward_bound=0.0
105723: loss=0.000, reward_mean=0.1, reward_bound=0.0
105724: loss=0.000, reward_mean=0.0, reward_bound=0.0
105725: loss=0.000, reward_mean=0.0, reward_bound=0.0
105726: loss=0.000, reward_mean=0.0, reward_bound=0.0
105727: loss=0.000, reward_mean=0.0, reward_bound=0.0
105728: loss=0.000, reward_mean=0.1, reward_bound=0.0
105729: loss=0.000, reward_mean=0.0, reward_bound=0.0
105730: loss=0.000, reward_mean=0.0, reward_bound=0.0
105731: loss=0.000, reward_mean=0.0, reward_bound=0.0
105732: loss=0.000, reward_mean=0.1, reward_bound=0.0
105733: loss=0.000, reward_mean=0.1, reward_bound=0.0
105734: loss=0.000, reward_mean=0.0, reward_bound=0.0
105735: loss=0.000, reward_m

105872: loss=0.000, reward_mean=0.1, reward_bound=0.0
105873: loss=0.000, reward_mean=0.0, reward_bound=0.0
105874: loss=0.000, reward_mean=0.1, reward_bound=0.0
105875: loss=0.000, reward_mean=0.1, reward_bound=0.0
105876: loss=0.000, reward_mean=0.0, reward_bound=0.0
105877: loss=0.000, reward_mean=0.0, reward_bound=0.0
105878: loss=0.000, reward_mean=0.0, reward_bound=0.0
105879: loss=0.000, reward_mean=0.0, reward_bound=0.0
105880: loss=0.000, reward_mean=0.0, reward_bound=0.0
105881: loss=0.000, reward_mean=0.0, reward_bound=0.0
105882: loss=0.000, reward_mean=0.0, reward_bound=0.0
105883: loss=0.000, reward_mean=0.0, reward_bound=0.0
105884: loss=0.000, reward_mean=0.1, reward_bound=0.0
105885: loss=0.000, reward_mean=0.0, reward_bound=0.0
105886: loss=0.000, reward_mean=0.3, reward_bound=0.5
105887: loss=0.000, reward_mean=0.1, reward_bound=0.0
105888: loss=0.000, reward_mean=0.0, reward_bound=0.0
105889: loss=0.000, reward_mean=0.1, reward_bound=0.0
105890: loss=0.000, reward_m
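
The isolated `reward_mean=0.3, reward_bound=0.5` entry in the log above can be reproduced from the notebook's constants (`BATCH_SIZE = 16`, `PERCENTILE = 70`). The sketch below is not the notebook's own code: it assumes `reward_mean` is the plain mean of the episode rewards in a batch and `reward_bound` is `np.percentile` of those rewards, as in the usual cross-entropy setup. The helper `batch_stats` is a hypothetical name introduced only for illustration.

```python
import numpy as np

BATCH_SIZE = 16   # value taken from the notebook's constants
PERCENTILE = 70   # value taken from the notebook's constants


def batch_stats(n_successes):
    """Hypothetical helper: mean and percentile bound for one batch.

    FrozenLake episodes end with reward 1 (goal) or 0 (hole / no goal),
    so a batch is fully described by its number of successful episodes.
    """
    rewards = np.array([1.0] * n_successes + [0.0] * (BATCH_SIZE - n_successes))
    reward_mean = float(np.mean(rewards))
    # Assumption: the bound is the PERCENTILE-th percentile of batch rewards.
    reward_bound = float(np.percentile(rewards, PERCENTILE))
    return reward_mean, reward_bound


if __name__ == "__main__":
    for k in range(0, 7):
        mean, bound = batch_stats(k)
        print("successes=%d: reward_mean=%.1f, reward_bound=%.1f" % (k, mean, bound))
```

Under these assumptions, the bound stays at 0.0 until at least 5 of the 16 episodes reach the goal, because the 70th percentile of 16 sorted values sits halfway between the 11th and 12th values. With exactly 5 successes the mean is 0.3125 and the bound is 0.5, which is consistent with the logged pair. In every other batch shown here the bound is 0.0, so a bound-based filter would have little to separate successful from failed episodes.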

106029: loss=0.000, reward_mean=0.1, reward_bound=0.0
106030: loss=0.000, reward_mean=0.0, reward_bound=0.0
106031: loss=0.000, reward_mean=0.1, reward_bound=0.0
106032: loss=0.000, reward_mean=0.1, reward_bound=0.0
106033: loss=0.000, reward_mean=0.1, reward_bound=0.0
106034: loss=0.000, reward_mean=0.1, reward_bound=0.0
106035: loss=0.000, reward_mean=0.0, reward_bound=0.0
106036: loss=0.000, reward_mean=0.1, reward_bound=0.0
106037: loss=0.000, reward_mean=0.0, reward_bound=0.0
106038: loss=0.000, reward_mean=0.2, reward_bound=0.0
106039: loss=0.000, reward_mean=0.0, reward_bound=0.0
106040: loss=0.000, reward_mean=0.1, reward_bound=0.0
106041: loss=0.000, reward_mean=0.2, reward_bound=0.0
106042: loss=0.000, reward_mean=0.0, reward_bound=0.0
106043: loss=0.000, reward_mean=0.0, reward_bound=0.0
106044: loss=0.000, reward_mean=0.1, reward_bound=0.0
106045: loss=0.000, reward_mean=0.1, reward_bound=0.0
106046: loss=0.000, reward_mean=0.1, reward_bound=0.0
106047: loss=0.000, reward_m

106181: loss=0.000, reward_mean=0.1, reward_bound=0.0
106182: loss=0.000, reward_mean=0.1, reward_bound=0.0
106183: loss=0.000, reward_mean=0.0, reward_bound=0.0
106184: loss=0.000, reward_mean=0.1, reward_bound=0.0
106185: loss=0.000, reward_mean=0.0, reward_bound=0.0
106186: loss=0.000, reward_mean=0.1, reward_bound=0.0
106187: loss=0.000, reward_mean=0.1, reward_bound=0.0
106188: loss=0.000, reward_mean=0.1, reward_bound=0.0
106189: loss=0.000, reward_mean=0.0, reward_bound=0.0
106190: loss=0.000, reward_mean=0.1, reward_bound=0.0
106191: loss=0.000, reward_mean=0.1, reward_bound=0.0
106192: loss=0.000, reward_mean=0.0, reward_bound=0.0
106193: loss=0.000, reward_mean=0.0, reward_bound=0.0
106194: loss=0.000, reward_mean=0.1, reward_bound=0.0
106195: loss=0.000, reward_mean=0.0, reward_bound=0.0
106196: loss=0.000, reward_mean=0.0, reward_bound=0.0
106197: loss=0.000, reward_mean=0.1, reward_bound=0.0
106198: loss=0.000, reward_mean=0.1, reward_bound=0.0
106199: loss=0.000, reward_m

106334: loss=0.000, reward_mean=0.1, reward_bound=0.0
106335: loss=0.000, reward_mean=0.0, reward_bound=0.0
106336: loss=0.000, reward_mean=0.0, reward_bound=0.0
106337: loss=0.000, reward_mean=0.1, reward_bound=0.0
106338: loss=0.000, reward_mean=0.1, reward_bound=0.0
106339: loss=0.000, reward_mean=0.1, reward_bound=0.0
106340: loss=0.000, reward_mean=0.1, reward_bound=0.0
106341: loss=0.000, reward_mean=0.1, reward_bound=0.0
106342: loss=0.000, reward_mean=0.1, reward_bound=0.0
106343: loss=0.000, reward_mean=0.0, reward_bound=0.0
106344: loss=0.000, reward_mean=0.2, reward_bound=0.0
106345: loss=0.000, reward_mean=0.1, reward_bound=0.0
106346: loss=0.000, reward_mean=0.1, reward_bound=0.0
106347: loss=0.000, reward_mean=0.1, reward_bound=0.0
106348: loss=0.000, reward_mean=0.1, reward_bound=0.0
106349: loss=0.000, reward_mean=0.1, reward_bound=0.0
106350: loss=0.000, reward_mean=0.0, reward_bound=0.0
106351: loss=0.000, reward_mean=0.0, reward_bound=0.0
106352: loss=0.000, reward_m

106488: loss=0.000, reward_mean=0.1, reward_bound=0.0
106489: loss=0.000, reward_mean=0.1, reward_bound=0.0
106490: loss=0.000, reward_mean=0.0, reward_bound=0.0
106491: loss=0.000, reward_mean=0.1, reward_bound=0.0
106492: loss=0.000, reward_mean=0.0, reward_bound=0.0
106493: loss=0.000, reward_mean=0.0, reward_bound=0.0
106494: loss=0.000, reward_mean=0.1, reward_bound=0.0
106495: loss=0.000, reward_mean=0.1, reward_bound=0.0
106496: loss=0.000, reward_mean=0.0, reward_bound=0.0
106497: loss=0.000, reward_mean=0.0, reward_bound=0.0
106498: loss=0.000, reward_mean=0.1, reward_bound=0.0
106499: loss=0.000, reward_mean=0.1, reward_bound=0.0
106500: loss=0.000, reward_mean=0.0, reward_bound=0.0
106501: loss=0.000, reward_mean=0.1, reward_bound=0.0
106502: loss=0.000, reward_mean=0.0, reward_bound=0.0
106503: loss=0.000, reward_mean=0.1, reward_bound=0.0
106504: loss=0.000, reward_mean=0.1, reward_bound=0.0
106505: loss=0.000, reward_mean=0.1, reward_bound=0.0
106506: loss=0.000, reward_m

106647: loss=0.000, reward_mean=0.0, reward_bound=0.0
106648: loss=0.000, reward_mean=0.0, reward_bound=0.0
106649: loss=0.000, reward_mean=0.2, reward_bound=0.0
106650: loss=0.000, reward_mean=0.2, reward_bound=0.0
106651: loss=0.000, reward_mean=0.1, reward_bound=0.0
106652: loss=0.000, reward_mean=0.0, reward_bound=0.0
106653: loss=0.000, reward_mean=0.0, reward_bound=0.0
106654: loss=0.000, reward_mean=0.1, reward_bound=0.0
106655: loss=0.000, reward_mean=0.1, reward_bound=0.0
106656: loss=0.000, reward_mean=0.1, reward_bound=0.0
106657: loss=0.000, reward_mean=0.1, reward_bound=0.0
106658: loss=0.000, reward_mean=0.1, reward_bound=0.0
106659: loss=0.000, reward_mean=0.1, reward_bound=0.0
106660: loss=0.000, reward_mean=0.0, reward_bound=0.0
106661: loss=0.000, reward_mean=0.1, reward_bound=0.0
106662: loss=0.000, reward_mean=0.1, reward_bound=0.0
106663: loss=0.000, reward_mean=0.1, reward_bound=0.0
106664: loss=0.000, reward_mean=0.1, reward_bound=0.0
106665: loss=0.000, reward_m

106802: loss=0.000, reward_mean=0.0, reward_bound=0.0
106803: loss=0.000, reward_mean=0.1, reward_bound=0.0
106804: loss=0.000, reward_mean=0.1, reward_bound=0.0
106805: loss=0.000, reward_mean=0.0, reward_bound=0.0
106806: loss=0.000, reward_mean=0.0, reward_bound=0.0
106807: loss=0.000, reward_mean=0.1, reward_bound=0.0
106808: loss=0.000, reward_mean=0.0, reward_bound=0.0
106809: loss=0.000, reward_mean=0.1, reward_bound=0.0
106810: loss=0.000, reward_mean=0.0, reward_bound=0.0
106811: loss=0.000, reward_mean=0.1, reward_bound=0.0
106812: loss=0.000, reward_mean=0.1, reward_bound=0.0
106813: loss=0.000, reward_mean=0.1, reward_bound=0.0
106814: loss=0.000, reward_mean=0.0, reward_bound=0.0
106815: loss=0.000, reward_mean=0.0, reward_bound=0.0
106816: loss=0.000, reward_mean=0.0, reward_bound=0.0
106817: loss=0.000, reward_mean=0.0, reward_bound=0.0
106818: loss=0.000, reward_mean=0.1, reward_bound=0.0
106819: loss=0.000, reward_mean=0.0, reward_bound=0.0
106820: loss=0.000, reward_m

106957: loss=0.000, reward_mean=0.0, reward_bound=0.0
106958: loss=0.000, reward_mean=0.1, reward_bound=0.0
106959: loss=0.000, reward_mean=0.0, reward_bound=0.0
106960: loss=0.000, reward_mean=0.1, reward_bound=0.0
106961: loss=0.000, reward_mean=0.0, reward_bound=0.0
106962: loss=0.000, reward_mean=0.1, reward_bound=0.0
106963: loss=0.000, reward_mean=0.1, reward_bound=0.0
106964: loss=0.000, reward_mean=0.1, reward_bound=0.0
106965: loss=0.000, reward_mean=0.0, reward_bound=0.0
106966: loss=0.000, reward_mean=0.1, reward_bound=0.0
106967: loss=0.000, reward_mean=0.0, reward_bound=0.0
106968: loss=0.000, reward_mean=0.1, reward_bound=0.0
106969: loss=0.000, reward_mean=0.0, reward_bound=0.0
106970: loss=0.000, reward_mean=0.1, reward_bound=0.0
106971: loss=0.000, reward_mean=0.1, reward_bound=0.0
106972: loss=0.000, reward_mean=0.0, reward_bound=0.0
106973: loss=0.000, reward_mean=0.0, reward_bound=0.0
106974: loss=0.000, reward_mean=0.0, reward_bound=0.0
106975: loss=0.000, reward_m

107110: loss=0.000, reward_mean=0.1, reward_bound=0.0
107111: loss=0.000, reward_mean=0.1, reward_bound=0.0
107112: loss=0.000, reward_mean=0.1, reward_bound=0.0
107113: loss=0.000, reward_mean=0.1, reward_bound=0.0
107114: loss=0.000, reward_mean=0.0, reward_bound=0.0
107115: loss=0.000, reward_mean=0.1, reward_bound=0.0
107116: loss=0.000, reward_mean=0.0, reward_bound=0.0
107117: loss=0.000, reward_mean=0.1, reward_bound=0.0
107118: loss=0.000, reward_mean=0.1, reward_bound=0.0
107119: loss=0.000, reward_mean=0.1, reward_bound=0.0
107120: loss=0.000, reward_mean=0.0, reward_bound=0.0
107121: loss=0.000, reward_mean=0.1, reward_bound=0.0
107122: loss=0.000, reward_mean=0.0, reward_bound=0.0
107123: loss=0.000, reward_mean=0.1, reward_bound=0.0
107124: loss=0.000, reward_mean=0.1, reward_bound=0.0
107125: loss=0.000, reward_mean=0.1, reward_bound=0.0
107126: loss=0.000, reward_mean=0.0, reward_bound=0.0
107127: loss=0.000, reward_mean=0.1, reward_bound=0.0
107128: loss=0.000, reward_m

107265: loss=0.000, reward_mean=0.1, reward_bound=0.0
107266: loss=0.000, reward_mean=0.1, reward_bound=0.0
107267: loss=0.000, reward_mean=0.1, reward_bound=0.0
107268: loss=0.000, reward_mean=0.1, reward_bound=0.0
107269: loss=0.000, reward_mean=0.0, reward_bound=0.0
107270: loss=0.000, reward_mean=0.1, reward_bound=0.0
107271: loss=0.000, reward_mean=0.1, reward_bound=0.0
107272: loss=0.000, reward_mean=0.1, reward_bound=0.0
107273: loss=0.000, reward_mean=0.1, reward_bound=0.0
107274: loss=0.000, reward_mean=0.1, reward_bound=0.0
107275: loss=0.000, reward_mean=0.1, reward_bound=0.0
107276: loss=0.000, reward_mean=0.1, reward_bound=0.0
107277: loss=0.000, reward_mean=0.1, reward_bound=0.0
107278: loss=0.000, reward_mean=0.1, reward_bound=0.0
107279: loss=0.000, reward_mean=0.1, reward_bound=0.0
107280: loss=0.000, reward_mean=0.0, reward_bound=0.0
107281: loss=0.000, reward_mean=0.0, reward_bound=0.0
107282: loss=0.000, reward_mean=0.0, reward_bound=0.0
107283: loss=0.000, reward_m

107420: loss=0.000, reward_mean=0.1, reward_bound=0.0
107421: loss=0.000, reward_mean=0.1, reward_bound=0.0
107422: loss=0.000, reward_mean=0.1, reward_bound=0.0
107423: loss=0.000, reward_mean=0.0, reward_bound=0.0
107424: loss=0.000, reward_mean=0.1, reward_bound=0.0
107425: loss=0.000, reward_mean=0.0, reward_bound=0.0
107426: loss=0.000, reward_mean=0.1, reward_bound=0.0
107427: loss=0.000, reward_mean=0.0, reward_bound=0.0
107428: loss=0.000, reward_mean=0.0, reward_bound=0.0
107429: loss=0.000, reward_mean=0.0, reward_bound=0.0
107430: loss=0.000, reward_mean=0.1, reward_bound=0.0
107431: loss=0.000, reward_mean=0.1, reward_bound=0.0
107432: loss=0.000, reward_mean=0.0, reward_bound=0.0
107433: loss=0.000, reward_mean=0.1, reward_bound=0.0
107434: loss=0.000, reward_mean=0.1, reward_bound=0.0
107435: loss=0.000, reward_mean=0.0, reward_bound=0.0
107436: loss=0.000, reward_mean=0.0, reward_bound=0.0
107437: loss=0.000, reward_mean=0.0, reward_bound=0.0
107438: loss=0.000, reward_m

107578: loss=0.000, reward_mean=0.0, reward_bound=0.0
107579: loss=0.000, reward_mean=0.2, reward_bound=0.0
107580: loss=0.000, reward_mean=0.0, reward_bound=0.0
107581: loss=0.000, reward_mean=0.1, reward_bound=0.0
107582: loss=0.000, reward_mean=0.1, reward_bound=0.0
107583: loss=0.000, reward_mean=0.1, reward_bound=0.0
107584: loss=0.000, reward_mean=0.1, reward_bound=0.0
107585: loss=0.000, reward_mean=0.1, reward_bound=0.0
107586: loss=0.000, reward_mean=0.1, reward_bound=0.0
107587: loss=0.000, reward_mean=0.1, reward_bound=0.0
107588: loss=0.000, reward_mean=0.0, reward_bound=0.0
107589: loss=0.000, reward_mean=0.1, reward_bound=0.0
107590: loss=0.000, reward_mean=0.1, reward_bound=0.0
107591: loss=0.000, reward_mean=0.1, reward_bound=0.0
107592: loss=0.000, reward_mean=0.1, reward_bound=0.0
107593: loss=0.000, reward_mean=0.1, reward_bound=0.0
107594: loss=0.000, reward_mean=0.1, reward_bound=0.0
107595: loss=0.000, reward_mean=0.1, reward_bound=0.0
107596: loss=0.000, reward_m

107734: loss=0.000, reward_mean=0.0, reward_bound=0.0
107735: loss=0.000, reward_mean=0.1, reward_bound=0.0
107736: loss=0.000, reward_mean=0.0, reward_bound=0.0
107737: loss=0.000, reward_mean=0.1, reward_bound=0.0
107738: loss=0.000, reward_mean=0.0, reward_bound=0.0
107739: loss=0.000, reward_mean=0.0, reward_bound=0.0
107740: loss=0.000, reward_mean=0.1, reward_bound=0.0
107741: loss=0.000, reward_mean=0.0, reward_bound=0.0
107742: loss=0.000, reward_mean=0.0, reward_bound=0.0
107743: loss=0.000, reward_mean=0.1, reward_bound=0.0
107744: loss=0.000, reward_mean=0.0, reward_bound=0.0
107745: loss=0.000, reward_mean=0.0, reward_bound=0.0
107746: loss=0.000, reward_mean=0.0, reward_bound=0.0
107747: loss=0.000, reward_mean=0.0, reward_bound=0.0
107748: loss=0.000, reward_mean=0.1, reward_bound=0.0
107749: loss=0.000, reward_mean=0.1, reward_bound=0.0
107750: loss=0.000, reward_mean=0.0, reward_bound=0.0
107751: loss=0.000, reward_mean=0.0, reward_bound=0.0
107752: loss=0.000, reward_m

107887: loss=0.000, reward_mean=0.1, reward_bound=0.0
107888: loss=0.000, reward_mean=0.0, reward_bound=0.0
107889: loss=0.000, reward_mean=0.0, reward_bound=0.0
107890: loss=0.000, reward_mean=0.0, reward_bound=0.0
107891: loss=0.000, reward_mean=0.1, reward_bound=0.0
107892: loss=0.000, reward_mean=0.0, reward_bound=0.0
107893: loss=0.000, reward_mean=0.1, reward_bound=0.0
107894: loss=0.000, reward_mean=0.1, reward_bound=0.0
107895: loss=0.000, reward_mean=0.1, reward_bound=0.0
107896: loss=0.000, reward_mean=0.1, reward_bound=0.0
107897: loss=0.000, reward_mean=0.1, reward_bound=0.0
107898: loss=0.000, reward_mean=0.1, reward_bound=0.0
107899: loss=0.000, reward_mean=0.0, reward_bound=0.0
107900: loss=0.000, reward_mean=0.0, reward_bound=0.0
107901: loss=0.000, reward_mean=0.1, reward_bound=0.0
107902: loss=0.000, reward_mean=0.0, reward_bound=0.0
107903: loss=0.000, reward_mean=0.0, reward_bound=0.0
107904: loss=0.000, reward_mean=0.1, reward_bound=0.0
107905: loss=0.000, reward_m

108041: loss=0.000, reward_mean=0.0, reward_bound=0.0
108042: loss=0.000, reward_mean=0.1, reward_bound=0.0
108043: loss=0.000, reward_mean=0.1, reward_bound=0.0
108044: loss=0.000, reward_mean=0.2, reward_bound=0.0
108045: loss=0.000, reward_mean=0.0, reward_bound=0.0
108046: loss=0.000, reward_mean=0.2, reward_bound=0.0
108047: loss=0.000, reward_mean=0.0, reward_bound=0.0
108048: loss=0.000, reward_mean=0.0, reward_bound=0.0
108049: loss=0.000, reward_mean=0.1, reward_bound=0.0
108050: loss=0.000, reward_mean=0.0, reward_bound=0.0
108051: loss=0.000, reward_mean=0.1, reward_bound=0.0
108052: loss=0.000, reward_mean=0.0, reward_bound=0.0
108053: loss=0.000, reward_mean=0.1, reward_bound=0.0
108054: loss=0.000, reward_mean=0.1, reward_bound=0.0
108055: loss=0.000, reward_mean=0.1, reward_bound=0.0
108056: loss=0.000, reward_mean=0.0, reward_bound=0.0
108057: loss=0.000, reward_mean=0.1, reward_bound=0.0
108058: loss=0.000, reward_mean=0.0, reward_bound=0.0
108059: loss=0.000, reward_m

108199: loss=0.000, reward_mean=0.1, reward_bound=0.0
108200: loss=0.000, reward_mean=0.1, reward_bound=0.0
108201: loss=0.000, reward_mean=0.1, reward_bound=0.0
108202: loss=0.000, reward_mean=0.0, reward_bound=0.0
108203: loss=0.000, reward_mean=0.0, reward_bound=0.0
108204: loss=0.000, reward_mean=0.1, reward_bound=0.0
108205: loss=0.000, reward_mean=0.1, reward_bound=0.0
108206: loss=0.000, reward_mean=0.1, reward_bound=0.0
108207: loss=0.000, reward_mean=0.1, reward_bound=0.0
108208: loss=0.000, reward_mean=0.0, reward_bound=0.0
108209: loss=0.000, reward_mean=0.1, reward_bound=0.0
108210: loss=0.000, reward_mean=0.1, reward_bound=0.0
108211: loss=0.000, reward_mean=0.0, reward_bound=0.0
108212: loss=0.000, reward_mean=0.0, reward_bound=0.0
108213: loss=0.000, reward_mean=0.0, reward_bound=0.0
108214: loss=0.000, reward_mean=0.1, reward_bound=0.0
108215: loss=0.000, reward_mean=0.0, reward_bound=0.0
108216: loss=0.000, reward_mean=0.1, reward_bound=0.0
108217: loss=0.000, reward_m

108352: loss=0.000, reward_mean=0.2, reward_bound=0.0
108353: loss=0.000, reward_mean=0.0, reward_bound=0.0
108354: loss=0.000, reward_mean=0.0, reward_bound=0.0
108355: loss=0.000, reward_mean=0.1, reward_bound=0.0
108356: loss=0.000, reward_mean=0.2, reward_bound=0.0
108357: loss=0.000, reward_mean=0.0, reward_bound=0.0
108358: loss=0.000, reward_mean=0.0, reward_bound=0.0
108359: loss=0.000, reward_mean=0.0, reward_bound=0.0
108360: loss=0.000, reward_mean=0.2, reward_bound=0.0
108361: loss=0.000, reward_mean=0.0, reward_bound=0.0
108362: loss=0.000, reward_mean=0.0, reward_bound=0.0
108363: loss=0.000, reward_mean=0.1, reward_bound=0.0
108364: loss=0.000, reward_mean=0.1, reward_bound=0.0
108365: loss=0.000, reward_mean=0.1, reward_bound=0.0
108366: loss=0.000, reward_mean=0.0, reward_bound=0.0
108367: loss=0.000, reward_mean=0.1, reward_bound=0.0
108368: loss=0.000, reward_mean=0.0, reward_bound=0.0
108369: loss=0.000, reward_mean=0.1, reward_bound=0.0
108370: loss=0.000, reward_m

108509: loss=0.000, reward_mean=0.1, reward_bound=0.0
108510: loss=0.000, reward_mean=0.1, reward_bound=0.0
108511: loss=0.000, reward_mean=0.0, reward_bound=0.0
108512: loss=0.000, reward_mean=0.1, reward_bound=0.0
108513: loss=0.000, reward_mean=0.1, reward_bound=0.0
108514: loss=0.000, reward_mean=0.1, reward_bound=0.0
108515: loss=0.000, reward_mean=0.0, reward_bound=0.0
108516: loss=0.000, reward_mean=0.1, reward_bound=0.0
108517: loss=0.000, reward_mean=0.0, reward_bound=0.0
108518: loss=0.000, reward_mean=0.1, reward_bound=0.0
108519: loss=0.000, reward_mean=0.0, reward_bound=0.0
108520: loss=0.000, reward_mean=0.1, reward_bound=0.0
108521: loss=0.000, reward_mean=0.0, reward_bound=0.0
108522: loss=0.000, reward_mean=0.0, reward_bound=0.0
108523: loss=0.000, reward_mean=0.1, reward_bound=0.0
108524: loss=0.000, reward_mean=0.0, reward_bound=0.0
108525: loss=0.000, reward_mean=0.1, reward_bound=0.0
108526: loss=0.000, reward_mean=0.0, reward_bound=0.0
108527: loss=0.000, reward_m

108663: loss=0.000, reward_mean=0.1, reward_bound=0.0
108664: loss=0.000, reward_mean=0.1, reward_bound=0.0
108665: loss=0.000, reward_mean=0.1, reward_bound=0.0
108666: loss=0.000, reward_mean=0.0, reward_bound=0.0
108667: loss=0.000, reward_mean=0.1, reward_bound=0.0
108668: loss=0.000, reward_mean=0.0, reward_bound=0.0
108669: loss=0.000, reward_mean=0.0, reward_bound=0.0
108670: loss=0.000, reward_mean=0.1, reward_bound=0.0
108671: loss=0.000, reward_mean=0.0, reward_bound=0.0
108672: loss=0.000, reward_mean=0.1, reward_bound=0.0
108673: loss=0.000, reward_mean=0.1, reward_bound=0.0
108674: loss=0.000, reward_mean=0.0, reward_bound=0.0
108675: loss=0.000, reward_mean=0.1, reward_bound=0.0
108676: loss=0.000, reward_mean=0.1, reward_bound=0.0
108677: loss=0.000, reward_mean=0.0, reward_bound=0.0
108678: loss=0.000, reward_mean=0.1, reward_bound=0.0
108679: loss=0.000, reward_mean=0.0, reward_bound=0.0
108680: loss=0.000, reward_mean=0.1, reward_bound=0.0
108681: loss=0.000, reward_m

108821: loss=0.000, reward_mean=0.1, reward_bound=0.0
108822: loss=0.000, reward_mean=0.0, reward_bound=0.0
108823: loss=0.000, reward_mean=0.1, reward_bound=0.0
108824: loss=0.000, reward_mean=0.1, reward_bound=0.0
108825: loss=0.000, reward_mean=0.1, reward_bound=0.0
108826: loss=0.000, reward_mean=0.0, reward_bound=0.0
108827: loss=0.000, reward_mean=0.1, reward_bound=0.0
108828: loss=0.000, reward_mean=0.1, reward_bound=0.0
108829: loss=0.000, reward_mean=0.0, reward_bound=0.0
108830: loss=0.000, reward_mean=0.1, reward_bound=0.0
108831: loss=0.000, reward_mean=0.1, reward_bound=0.0
108832: loss=0.000, reward_mean=0.0, reward_bound=0.0
108833: loss=0.000, reward_mean=0.0, reward_bound=0.0
108834: loss=0.000, reward_mean=0.0, reward_bound=0.0
108835: loss=0.000, reward_mean=0.0, reward_bound=0.0
108836: loss=0.000, reward_mean=0.1, reward_bound=0.0
108837: loss=0.000, reward_mean=0.0, reward_bound=0.0
108838: loss=0.000, reward_mean=0.1, reward_bound=0.0
108839: loss=0.000, reward_m

108974: loss=0.000, reward_mean=0.1, reward_bound=0.0
108975: loss=0.000, reward_mean=0.0, reward_bound=0.0
108976: loss=0.000, reward_mean=0.1, reward_bound=0.0
108977: loss=0.000, reward_mean=0.2, reward_bound=0.0
108978: loss=0.000, reward_mean=0.1, reward_bound=0.0
108979: loss=0.000, reward_mean=0.0, reward_bound=0.0
108980: loss=0.000, reward_mean=0.1, reward_bound=0.0
108981: loss=0.000, reward_mean=0.1, reward_bound=0.0
108982: loss=0.000, reward_mean=0.0, reward_bound=0.0
108983: loss=0.000, reward_mean=0.0, reward_bound=0.0
108984: loss=0.000, reward_mean=0.1, reward_bound=0.0
108985: loss=0.000, reward_mean=0.1, reward_bound=0.0
108986: loss=0.000, reward_mean=0.1, reward_bound=0.0
108987: loss=0.000, reward_mean=0.1, reward_bound=0.0
108988: loss=0.000, reward_mean=0.0, reward_bound=0.0
108989: loss=0.000, reward_mean=0.1, reward_bound=0.0
108990: loss=0.000, reward_mean=0.1, reward_bound=0.0
108991: loss=0.000, reward_mean=0.1, reward_bound=0.0
108992: loss=0.000, reward_m

109126: loss=0.000, reward_mean=0.0, reward_bound=0.0
109127: loss=0.000, reward_mean=0.1, reward_bound=0.0
109128: loss=0.000, reward_mean=0.1, reward_bound=0.0
109129: loss=0.000, reward_mean=0.1, reward_bound=0.0
109130: loss=0.000, reward_mean=0.1, reward_bound=0.0
109131: loss=0.000, reward_mean=0.0, reward_bound=0.0
109132: loss=0.000, reward_mean=0.1, reward_bound=0.0
109133: loss=0.000, reward_mean=0.1, reward_bound=0.0
109134: loss=0.000, reward_mean=0.0, reward_bound=0.0
109135: loss=0.000, reward_mean=0.1, reward_bound=0.0
109136: loss=0.000, reward_mean=0.2, reward_bound=0.0
109137: loss=0.000, reward_mean=0.0, reward_bound=0.0
109138: loss=0.000, reward_mean=0.0, reward_bound=0.0
109139: loss=0.000, reward_mean=0.1, reward_bound=0.0
109140: loss=0.000, reward_mean=0.1, reward_bound=0.0
109141: loss=0.000, reward_mean=0.0, reward_bound=0.0
109142: loss=0.000, reward_mean=0.2, reward_bound=0.0
109143: loss=0.000, reward_mean=0.1, reward_bound=0.0
109144: loss=0.000, reward_m

109280: loss=0.000, reward_mean=0.1, reward_bound=0.0
109281: loss=0.000, reward_mean=0.1, reward_bound=0.0
109282: loss=0.000, reward_mean=0.0, reward_bound=0.0
109283: loss=0.000, reward_mean=0.1, reward_bound=0.0
109284: loss=0.000, reward_mean=0.0, reward_bound=0.0
109285: loss=0.000, reward_mean=0.0, reward_bound=0.0
109286: loss=0.000, reward_mean=0.2, reward_bound=0.0
109287: loss=0.000, reward_mean=0.0, reward_bound=0.0
109288: loss=0.000, reward_mean=0.1, reward_bound=0.0
109289: loss=0.000, reward_mean=0.1, reward_bound=0.0
109290: loss=0.000, reward_mean=0.1, reward_bound=0.0
109291: loss=0.000, reward_mean=0.0, reward_bound=0.0
109292: loss=0.000, reward_mean=0.0, reward_bound=0.0
109293: loss=0.000, reward_mean=0.0, reward_bound=0.0
109294: loss=0.000, reward_mean=0.1, reward_bound=0.0
109295: loss=0.000, reward_mean=0.0, reward_bound=0.0
109296: loss=0.000, reward_mean=0.0, reward_bound=0.0
109297: loss=0.000, reward_mean=0.1, reward_bound=0.0
109298: loss=0.000, reward_m

109437: loss=0.000, reward_mean=0.1, reward_bound=0.0
109438: loss=0.000, reward_mean=0.1, reward_bound=0.0
109439: loss=0.000, reward_mean=0.1, reward_bound=0.0
109440: loss=0.000, reward_mean=0.1, reward_bound=0.0
109441: loss=0.000, reward_mean=0.1, reward_bound=0.0
109442: loss=0.000, reward_mean=0.0, reward_bound=0.0
109443: loss=0.000, reward_mean=0.0, reward_bound=0.0
109444: loss=0.000, reward_mean=0.1, reward_bound=0.0
109445: loss=0.000, reward_mean=0.1, reward_bound=0.0
109446: loss=0.000, reward_mean=0.1, reward_bound=0.0
109447: loss=0.000, reward_mean=0.0, reward_bound=0.0
109448: loss=0.000, reward_mean=0.1, reward_bound=0.0
109449: loss=0.000, reward_mean=0.1, reward_bound=0.0
109450: loss=0.000, reward_mean=0.1, reward_bound=0.0
109451: loss=0.000, reward_mean=0.0, reward_bound=0.0
109452: loss=0.000, reward_mean=0.0, reward_bound=0.0
109453: loss=0.000, reward_mean=0.1, reward_bound=0.0
109454: loss=0.000, reward_mean=0.1, reward_bound=0.0
109455: loss=0.000, reward_m

109594: loss=0.000, reward_mean=0.1, reward_bound=0.0
109595: loss=0.000, reward_mean=0.0, reward_bound=0.0
109596: loss=0.000, reward_mean=0.1, reward_bound=0.0
109597: loss=0.000, reward_mean=0.1, reward_bound=0.0
109598: loss=0.000, reward_mean=0.0, reward_bound=0.0
109599: loss=0.000, reward_mean=0.1, reward_bound=0.0
109600: loss=0.000, reward_mean=0.1, reward_bound=0.0
109601: loss=0.000, reward_mean=0.1, reward_bound=0.0
109602: loss=0.000, reward_mean=0.0, reward_bound=0.0
109603: loss=0.000, reward_mean=0.1, reward_bound=0.0
109604: loss=0.000, reward_mean=0.1, reward_bound=0.0
109605: loss=0.000, reward_mean=0.0, reward_bound=0.0
109606: loss=0.000, reward_mean=0.2, reward_bound=0.0
109607: loss=0.000, reward_mean=0.0, reward_bound=0.0
109608: loss=0.000, reward_mean=0.1, reward_bound=0.0
109609: loss=0.000, reward_mean=0.1, reward_bound=0.0
109610: loss=0.000, reward_mean=0.0, reward_bound=0.0
109611: loss=0.000, reward_mean=0.1, reward_bound=0.0
109612: loss=0.000, reward_m

109746: loss=0.000, reward_mean=0.1, reward_bound=0.0
109747: loss=0.000, reward_mean=0.1, reward_bound=0.0
109748: loss=0.000, reward_mean=0.1, reward_bound=0.0
109749: loss=0.000, reward_mean=0.1, reward_bound=0.0
109750: loss=0.000, reward_mean=0.0, reward_bound=0.0
109751: loss=0.000, reward_mean=0.1, reward_bound=0.0
109752: loss=0.000, reward_mean=0.1, reward_bound=0.0
109753: loss=0.000, reward_mean=0.1, reward_bound=0.0
109754: loss=0.000, reward_mean=0.2, reward_bound=0.0
109755: loss=0.000, reward_mean=0.0, reward_bound=0.0
109756: loss=0.000, reward_mean=0.0, reward_bound=0.0
109757: loss=0.000, reward_mean=0.1, reward_bound=0.0
109758: loss=0.000, reward_mean=0.1, reward_bound=0.0
109759: loss=0.000, reward_mean=0.1, reward_bound=0.0
109760: loss=0.000, reward_mean=0.0, reward_bound=0.0
109761: loss=0.000, reward_mean=0.0, reward_bound=0.0
109762: loss=0.000, reward_mean=0.0, reward_bound=0.0
109763: loss=0.000, reward_mean=0.0, reward_bound=0.0
109764: loss=0.000, reward_m...

109901: loss=0.000, reward_mean=0.0, reward_bound=0.0
109902: loss=0.000, reward_mean=0.0, reward_bound=0.0
109903: loss=0.000, reward_mean=0.0, reward_bound=0.0
109904: loss=0.000, reward_mean=0.1, reward_bound=0.0
109905: loss=0.000, reward_mean=0.1, reward_bound=0.0
109906: loss=0.000, reward_mean=0.1, reward_bound=0.0
109907: loss=0.000, reward_mean=0.1, reward_bound=0.0
109908: loss=0.000, reward_mean=0.1, reward_bound=0.0
109909: loss=0.000, reward_mean=0.1, reward_bound=0.0
109910: loss=0.000, reward_mean=0.0, reward_bound=0.0
109911: loss=0.000, reward_mean=0.1, reward_bound=0.0
109912: loss=0.000, reward_mean=0.1, reward_bound=0.0
109913: loss=0.000, reward_mean=0.1, reward_bound=0.0
109914: loss=0.000, reward_mean=0.1, reward_bound=0.0
109915: loss=0.000, reward_mean=0.1, reward_bound=0.0
109916: loss=0.000, reward_mean=0.1, reward_bound=0.0
109917: loss=0.000, reward_mean=0.1, reward_bound=0.0
109918: loss=0.000, reward_mean=0.0, reward_bound=0.0
109919: loss=0.000, reward_m...

110056: loss=0.000, reward_mean=0.1, reward_bound=0.0
110057: loss=0.000, reward_mean=0.0, reward_bound=0.0
110058: loss=0.000, reward_mean=0.1, reward_bound=0.0
110059: loss=0.000, reward_mean=0.3, reward_bound=0.5
110060: loss=0.000, reward_mean=0.1, reward_bound=0.0
110061: loss=0.000, reward_mean=0.1, reward_bound=0.0
110062: loss=0.000, reward_mean=0.0, reward_bound=0.0
110063: loss=0.000, reward_mean=0.0, reward_bound=0.0
110064: loss=0.000, reward_mean=0.0, reward_bound=0.0
110065: loss=0.000, reward_mean=0.0, reward_bound=0.0
110066: loss=0.000, reward_mean=0.1, reward_bound=0.0
110067: loss=0.000, reward_mean=0.1, reward_bound=0.0
110068: loss=0.000, reward_mean=0.1, reward_bound=0.0
110069: loss=0.000, reward_mean=0.1, reward_bound=0.0
110070: loss=0.000, reward_mean=0.2, reward_bound=0.0
110071: loss=0.000, reward_mean=0.0, reward_bound=0.0
110072: loss=0.000, reward_mean=0.1, reward_bound=0.0
110073: loss=0.000, reward_mean=0.0, reward_bound=0.0
110074: loss=0.000, reward_m...

110208: loss=0.000, reward_mean=0.1, reward_bound=0.0
110209: loss=0.000, reward_mean=0.0, reward_bound=0.0
110210: loss=0.000, reward_mean=0.0, reward_bound=0.0
110211: loss=0.000, reward_mean=0.2, reward_bound=0.0
110212: loss=0.000, reward_mean=0.1, reward_bound=0.0
110213: loss=0.000, reward_mean=0.1, reward_bound=0.0
110214: loss=0.000, reward_mean=0.0, reward_bound=0.0
110215: loss=0.000, reward_mean=0.1, reward_bound=0.0
110216: loss=0.000, reward_mean=0.1, reward_bound=0.0
110217: loss=0.000, reward_mean=0.2, reward_bound=0.0
110218: loss=0.000, reward_mean=0.1, reward_bound=0.0
110219: loss=0.000, reward_mean=0.0, reward_bound=0.0
110220: loss=0.000, reward_mean=0.1, reward_bound=0.0
110221: loss=0.000, reward_mean=0.2, reward_bound=0.0
110222: loss=0.000, reward_mean=0.1, reward_bound=0.0
110223: loss=0.000, reward_mean=0.2, reward_bound=0.0
110224: loss=0.000, reward_mean=0.1, reward_bound=0.0
110225: loss=0.000, reward_mean=0.1, reward_bound=0.0
110226: loss=0.000, reward_m...

110361: loss=0.000, reward_mean=0.0, reward_bound=0.0
110362: loss=0.000, reward_mean=0.0, reward_bound=0.0
110363: loss=0.000, reward_mean=0.1, reward_bound=0.0
110364: loss=0.000, reward_mean=0.1, reward_bound=0.0
110365: loss=0.000, reward_mean=0.0, reward_bound=0.0
110366: loss=0.000, reward_mean=0.1, reward_bound=0.0
110367: loss=0.000, reward_mean=0.1, reward_bound=0.0
110368: loss=0.000, reward_mean=0.2, reward_bound=0.0
110369: loss=0.000, reward_mean=0.0, reward_bound=0.0
110370: loss=0.000, reward_mean=0.1, reward_bound=0.0
110371: loss=0.000, reward_mean=0.0, reward_bound=0.0
110372: loss=0.000, reward_mean=0.1, reward_bound=0.0
110373: loss=0.000, reward_mean=0.0, reward_bound=0.0
110374: loss=0.000, reward_mean=0.1, reward_bound=0.0
110375: loss=0.000, reward_mean=0.1, reward_bound=0.0
110376: loss=0.000, reward_mean=0.0, reward_bound=0.0
110377: loss=0.000, reward_mean=0.0, reward_bound=0.0
110378: loss=0.000, reward_mean=0.0, reward_bound=0.0
110379: loss=0.000, reward_m...

110513: loss=0.000, reward_mean=0.2, reward_bound=0.0
110514: loss=0.000, reward_mean=0.0, reward_bound=0.0
110515: loss=0.000, reward_mean=0.1, reward_bound=0.0
110516: loss=0.000, reward_mean=0.1, reward_bound=0.0
110517: loss=0.000, reward_mean=0.1, reward_bound=0.0
110518: loss=0.000, reward_mean=0.0, reward_bound=0.0
110519: loss=0.000, reward_mean=0.0, reward_bound=0.0
110520: loss=0.000, reward_mean=0.0, reward_bound=0.0
110521: loss=0.000, reward_mean=0.1, reward_bound=0.0
110522: loss=0.000, reward_mean=0.0, reward_bound=0.0
110523: loss=0.000, reward_mean=0.0, reward_bound=0.0
110524: loss=0.000, reward_mean=0.1, reward_bound=0.0
110525: loss=0.000, reward_mean=0.0, reward_bound=0.0
110526: loss=0.000, reward_mean=0.1, reward_bound=0.0
110527: loss=0.000, reward_mean=0.0, reward_bound=0.0
110528: loss=0.000, reward_mean=0.1, reward_bound=0.0
110529: loss=0.000, reward_mean=0.1, reward_bound=0.0
110530: loss=0.000, reward_mean=0.0, reward_bound=0.0
110531: loss=0.000, reward_m...

110665: loss=0.000, reward_mean=0.1, reward_bound=0.0
110666: loss=0.000, reward_mean=0.0, reward_bound=0.0
110667: loss=0.000, reward_mean=0.1, reward_bound=0.0
110668: loss=0.000, reward_mean=0.1, reward_bound=0.0
110669: loss=0.000, reward_mean=0.0, reward_bound=0.0
110670: loss=0.000, reward_mean=0.0, reward_bound=0.0
110671: loss=0.000, reward_mean=0.0, reward_bound=0.0
110672: loss=0.000, reward_mean=0.0, reward_bound=0.0
110673: loss=0.000, reward_mean=0.0, reward_bound=0.0
110674: loss=0.000, reward_mean=0.0, reward_bound=0.0
110675: loss=0.000, reward_mean=0.1, reward_bound=0.0
110676: loss=0.000, reward_mean=0.1, reward_bound=0.0
110677: loss=0.000, reward_mean=0.0, reward_bound=0.0
110678: loss=0.000, reward_mean=0.1, reward_bound=0.0
110679: loss=0.000, reward_mean=0.2, reward_bound=0.0
110680: loss=0.000, reward_mean=0.1, reward_bound=0.0
110681: loss=0.000, reward_mean=0.0, reward_bound=0.0
110682: loss=0.000, reward_mean=0.0, reward_bound=0.0
110683: loss=0.000, reward_m...

110817: loss=0.000, reward_mean=0.0, reward_bound=0.0
110818: loss=0.000, reward_mean=0.0, reward_bound=0.0
110819: loss=0.000, reward_mean=0.0, reward_bound=0.0
110820: loss=0.000, reward_mean=0.1, reward_bound=0.0
110821: loss=0.000, reward_mean=0.0, reward_bound=0.0
110822: loss=0.000, reward_mean=0.2, reward_bound=0.0
110823: loss=0.000, reward_mean=0.0, reward_bound=0.0
110824: loss=0.000, reward_mean=0.1, reward_bound=0.0
110825: loss=0.000, reward_mean=0.1, reward_bound=0.0
110826: loss=0.000, reward_mean=0.0, reward_bound=0.0
110827: loss=0.000, reward_mean=0.0, reward_bound=0.0
110828: loss=0.000, reward_mean=0.1, reward_bound=0.0
110829: loss=0.000, reward_mean=0.1, reward_bound=0.0
110830: loss=0.000, reward_mean=0.1, reward_bound=0.0
110831: loss=0.000, reward_mean=0.0, reward_bound=0.0
110832: loss=0.000, reward_mean=0.1, reward_bound=0.0
110833: loss=0.000, reward_mean=0.1, reward_bound=0.0
110834: loss=0.000, reward_mean=0.0, reward_bound=0.0
110835: loss=0.000, reward_m...

110969: loss=0.000, reward_mean=0.1, reward_bound=0.0
110970: loss=0.000, reward_mean=0.1, reward_bound=0.0
110971: loss=0.000, reward_mean=0.0, reward_bound=0.0
110972: loss=0.000, reward_mean=0.1, reward_bound=0.0
110973: loss=0.000, reward_mean=0.1, reward_bound=0.0
110974: loss=0.000, reward_mean=0.0, reward_bound=0.0
110975: loss=0.000, reward_mean=0.0, reward_bound=0.0
110976: loss=0.000, reward_mean=0.0, reward_bound=0.0
110977: loss=0.000, reward_mean=0.1, reward_bound=0.0
110978: loss=0.000, reward_mean=0.0, reward_bound=0.0
110979: loss=0.000, reward_mean=0.0, reward_bound=0.0
110980: loss=0.000, reward_mean=0.1, reward_bound=0.0
110981: loss=0.000, reward_mean=0.1, reward_bound=0.0
110982: loss=0.000, reward_mean=0.0, reward_bound=0.0
110983: loss=0.000, reward_mean=0.0, reward_bound=0.0
110984: loss=0.000, reward_mean=0.0, reward_bound=0.0
110985: loss=0.000, reward_mean=0.1, reward_bound=0.0
110986: loss=0.000, reward_mean=0.0, reward_bound=0.0
110987: loss=0.000, reward_m...

111122: loss=0.000, reward_mean=0.1, reward_bound=0.0
111123: loss=0.000, reward_mean=0.0, reward_bound=0.0
111124: loss=0.000, reward_mean=0.1, reward_bound=0.0
111125: loss=0.000, reward_mean=0.0, reward_bound=0.0
111126: loss=0.000, reward_mean=0.1, reward_bound=0.0
111127: loss=0.000, reward_mean=0.1, reward_bound=0.0
111128: loss=0.000, reward_mean=0.1, reward_bound=0.0
111129: loss=0.000, reward_mean=0.2, reward_bound=0.0
111130: loss=0.000, reward_mean=0.1, reward_bound=0.0
111131: loss=0.000, reward_mean=0.0, reward_bound=0.0
111132: loss=0.000, reward_mean=0.0, reward_bound=0.0
111133: loss=0.000, reward_mean=0.0, reward_bound=0.0
111134: loss=0.000, reward_mean=0.1, reward_bound=0.0
111135: loss=0.000, reward_mean=0.0, reward_bound=0.0
111136: loss=0.000, reward_mean=0.2, reward_bound=0.0
111137: loss=0.000, reward_mean=0.1, reward_bound=0.0
111138: loss=0.000, reward_mean=0.0, reward_bound=0.0
111139: loss=0.000, reward_mean=0.1, reward_bound=0.0
111140: loss=0.000, reward_m...

111277: loss=0.000, reward_mean=0.1, reward_bound=0.0
111278: loss=0.000, reward_mean=0.1, reward_bound=0.0
111279: loss=0.000, reward_mean=0.2, reward_bound=0.0
111280: loss=0.000, reward_mean=0.1, reward_bound=0.0
111281: loss=0.000, reward_mean=0.0, reward_bound=0.0
111282: loss=0.000, reward_mean=0.0, reward_bound=0.0
111283: loss=0.000, reward_mean=0.1, reward_bound=0.0
111284: loss=0.000, reward_mean=0.0, reward_bound=0.0
111285: loss=0.000, reward_mean=0.2, reward_bound=0.0
111286: loss=0.000, reward_mean=0.1, reward_bound=0.0
111287: loss=0.000, reward_mean=0.1, reward_bound=0.0
111288: loss=0.000, reward_mean=0.0, reward_bound=0.0
111289: loss=0.000, reward_mean=0.1, reward_bound=0.0
111290: loss=0.000, reward_mean=0.1, reward_bound=0.0
111291: loss=0.000, reward_mean=0.0, reward_bound=0.0
111292: loss=0.000, reward_mean=0.1, reward_bound=0.0
111293: loss=0.000, reward_mean=0.0, reward_bound=0.0
111294: loss=0.000, reward_mean=0.0, reward_bound=0.0
111295: loss=0.000, reward_m...

111430: loss=0.000, reward_mean=0.1, reward_bound=0.0
111431: loss=0.000, reward_mean=0.1, reward_bound=0.0
111432: loss=0.000, reward_mean=0.0, reward_bound=0.0
111433: loss=0.000, reward_mean=0.2, reward_bound=0.0
111434: loss=0.000, reward_mean=0.1, reward_bound=0.0
111435: loss=0.000, reward_mean=0.0, reward_bound=0.0
111436: loss=0.000, reward_mean=0.1, reward_bound=0.0
111437: loss=0.000, reward_mean=0.0, reward_bound=0.0
111438: loss=0.000, reward_mean=0.1, reward_bound=0.0
111439: loss=0.000, reward_mean=0.1, reward_bound=0.0
111440: loss=0.000, reward_mean=0.0, reward_bound=0.0
111441: loss=0.000, reward_mean=0.1, reward_bound=0.0
111442: loss=0.000, reward_mean=0.2, reward_bound=0.0
111443: loss=0.000, reward_mean=0.0, reward_bound=0.0
111444: loss=0.000, reward_mean=0.1, reward_bound=0.0
111445: loss=0.000, reward_mean=0.1, reward_bound=0.0
111446: loss=0.000, reward_mean=0.1, reward_bound=0.0
111447: loss=0.000, reward_mean=0.0, reward_bound=0.0
111448: loss=0.000, reward_m...

111583: loss=0.000, reward_mean=0.0, reward_bound=0.0
111584: loss=0.000, reward_mean=0.1, reward_bound=0.0
111585: loss=0.000, reward_mean=0.0, reward_bound=0.0
111586: loss=0.000, reward_mean=0.0, reward_bound=0.0
111587: loss=0.000, reward_mean=0.0, reward_bound=0.0
111588: loss=0.000, reward_mean=0.1, reward_bound=0.0
111589: loss=0.000, reward_mean=0.1, reward_bound=0.0
111590: loss=0.000, reward_mean=0.0, reward_bound=0.0
111591: loss=0.000, reward_mean=0.0, reward_bound=0.0
111592: loss=0.000, reward_mean=0.1, reward_bound=0.0
111593: loss=0.000, reward_mean=0.0, reward_bound=0.0
111594: loss=0.000, reward_mean=0.1, reward_bound=0.0
111595: loss=0.000, reward_mean=0.1, reward_bound=0.0
111596: loss=0.000, reward_mean=0.0, reward_bound=0.0
111597: loss=0.000, reward_mean=0.1, reward_bound=0.0
111598: loss=0.000, reward_mean=0.1, reward_bound=0.0
111599: loss=0.000, reward_mean=0.0, reward_bound=0.0
111600: loss=0.000, reward_mean=0.1, reward_bound=0.0
111601: loss=0.000, reward_m...

111736: loss=0.000, reward_mean=0.0, reward_bound=0.0
111737: loss=0.000, reward_mean=0.1, reward_bound=0.0
111738: loss=0.000, reward_mean=0.0, reward_bound=0.0
111739: loss=0.000, reward_mean=0.1, reward_bound=0.0
111740: loss=0.000, reward_mean=0.0, reward_bound=0.0
111741: loss=0.000, reward_mean=0.1, reward_bound=0.0
111742: loss=0.000, reward_mean=0.0, reward_bound=0.0
111743: loss=0.000, reward_mean=0.1, reward_bound=0.0
111744: loss=0.000, reward_mean=0.1, reward_bound=0.0
111745: loss=0.000, reward_mean=0.1, reward_bound=0.0
111746: loss=0.000, reward_mean=0.1, reward_bound=0.0
111747: loss=0.000, reward_mean=0.0, reward_bound=0.0
111748: loss=0.000, reward_mean=0.1, reward_bound=0.0
111749: loss=0.000, reward_mean=0.1, reward_bound=0.0
111750: loss=0.000, reward_mean=0.1, reward_bound=0.0
111751: loss=0.000, reward_mean=0.1, reward_bound=0.0
111752: loss=0.000, reward_mean=0.1, reward_bound=0.0
111753: loss=0.000, reward_mean=0.0, reward_bound=0.0
111754: loss=0.000, reward_m...

111888: loss=0.000, reward_mean=0.0, reward_bound=0.0
111889: loss=0.000, reward_mean=0.2, reward_bound=0.0
111890: loss=0.000, reward_mean=0.1, reward_bound=0.0
111891: loss=0.000, reward_mean=0.1, reward_bound=0.0
111892: loss=0.000, reward_mean=0.0, reward_bound=0.0
111893: loss=0.000, reward_mean=0.1, reward_bound=0.0
111894: loss=0.000, reward_mean=0.1, reward_bound=0.0
111895: loss=0.000, reward_mean=0.1, reward_bound=0.0
111896: loss=0.000, reward_mean=0.1, reward_bound=0.0
111897: loss=0.000, reward_mean=0.1, reward_bound=0.0
111898: loss=0.000, reward_mean=0.1, reward_bound=0.0
111899: loss=0.000, reward_mean=0.1, reward_bound=0.0
111900: loss=0.000, reward_mean=0.1, reward_bound=0.0
111901: loss=0.000, reward_mean=0.0, reward_bound=0.0
111902: loss=0.000, reward_mean=0.1, reward_bound=0.0
111903: loss=0.000, reward_mean=0.0, reward_bound=0.0
111904: loss=0.000, reward_mean=0.0, reward_bound=0.0
111905: loss=0.000, reward_mean=0.1, reward_bound=0.0
111906: loss=0.000, reward_m...

112040: loss=0.000, reward_mean=0.0, reward_bound=0.0
112041: loss=0.000, reward_mean=0.0, reward_bound=0.0
112042: loss=0.000, reward_mean=0.2, reward_bound=0.0
112043: loss=0.000, reward_mean=0.0, reward_bound=0.0
112044: loss=0.000, reward_mean=0.1, reward_bound=0.0
112045: loss=0.000, reward_mean=0.1, reward_bound=0.0
112046: loss=0.000, reward_mean=0.1, reward_bound=0.0
112047: loss=0.000, reward_mean=0.1, reward_bound=0.0
112048: loss=0.000, reward_mean=0.1, reward_bound=0.0
112049: loss=0.000, reward_mean=0.0, reward_bound=0.0
112050: loss=0.000, reward_mean=0.1, reward_bound=0.0
112051: loss=0.000, reward_mean=0.2, reward_bound=0.0
112052: loss=0.000, reward_mean=0.0, reward_bound=0.0
112053: loss=0.000, reward_mean=0.0, reward_bound=0.0
112054: loss=0.000, reward_mean=0.1, reward_bound=0.0
112055: loss=0.000, reward_mean=0.0, reward_bound=0.0
112056: loss=0.000, reward_mean=0.0, reward_bound=0.0
112057: loss=0.000, reward_mean=0.2, reward_bound=0.0
112058: loss=0.000, reward_m...

112195: loss=0.000, reward_mean=0.0, reward_bound=0.0
112196: loss=0.000, reward_mean=0.0, reward_bound=0.0
112197: loss=0.000, reward_mean=0.0, reward_bound=0.0
112198: loss=0.000, reward_mean=0.0, reward_bound=0.0
112199: loss=0.000, reward_mean=0.1, reward_bound=0.0
112200: loss=0.000, reward_mean=0.1, reward_bound=0.0
112201: loss=0.000, reward_mean=0.1, reward_bound=0.0
112202: loss=0.000, reward_mean=0.1, reward_bound=0.0
112203: loss=0.000, reward_mean=0.0, reward_bound=0.0
112204: loss=0.000, reward_mean=0.0, reward_bound=0.0
112205: loss=0.000, reward_mean=0.1, reward_bound=0.0
112206: loss=0.000, reward_mean=0.1, reward_bound=0.0
112207: loss=0.000, reward_mean=0.0, reward_bound=0.0
112208: loss=0.000, reward_mean=0.0, reward_bound=0.0
112209: loss=0.000, reward_mean=0.0, reward_bound=0.0
112210: loss=0.000, reward_mean=0.1, reward_bound=0.0
112211: loss=0.000, reward_mean=0.0, reward_bound=0.0
112212: loss=0.000, reward_mean=0.0, reward_bound=0.0
112213: loss=0.000, reward_m...

112347: loss=0.000, reward_mean=0.1, reward_bound=0.0
112348: loss=0.000, reward_mean=0.1, reward_bound=0.0
112349: loss=0.000, reward_mean=0.0, reward_bound=0.0
112350: loss=0.000, reward_mean=0.0, reward_bound=0.0
112351: loss=0.000, reward_mean=0.1, reward_bound=0.0
112352: loss=0.000, reward_mean=0.2, reward_bound=0.0
112353: loss=0.000, reward_mean=0.0, reward_bound=0.0
112354: loss=0.000, reward_mean=0.1, reward_bound=0.0
112355: loss=0.000, reward_mean=0.0, reward_bound=0.0
112356: loss=0.000, reward_mean=0.1, reward_bound=0.0
112357: loss=0.000, reward_mean=0.1, reward_bound=0.0
112358: loss=0.000, reward_mean=0.0, reward_bound=0.0
112359: loss=0.000, reward_mean=0.1, reward_bound=0.0
112360: loss=0.000, reward_mean=0.1, reward_bound=0.0
112361: loss=0.000, reward_mean=0.1, reward_bound=0.0
112362: loss=0.000, reward_mean=0.1, reward_bound=0.0
112363: loss=0.000, reward_mean=0.1, reward_bound=0.0
112364: loss=0.000, reward_mean=0.0, reward_bound=0.0
112365: loss=0.000, reward_m...

112501: loss=0.000, reward_mean=0.1, reward_bound=0.0
112502: loss=0.000, reward_mean=0.1, reward_bound=0.0
112503: loss=0.000, reward_mean=0.1, reward_bound=0.0
112504: loss=0.000, reward_mean=0.0, reward_bound=0.0
112505: loss=0.000, reward_mean=0.0, reward_bound=0.0
112506: loss=0.000, reward_mean=0.0, reward_bound=0.0
112507: loss=0.000, reward_mean=0.1, reward_bound=0.0
112508: loss=0.000, reward_mean=0.1, reward_bound=0.0
112509: loss=0.000, reward_mean=0.1, reward_bound=0.0
112510: loss=0.000, reward_mean=0.1, reward_bound=0.0
112511: loss=0.000, reward_mean=0.1, reward_bound=0.0
112512: loss=0.000, reward_mean=0.0, reward_bound=0.0
112513: loss=0.000, reward_mean=0.1, reward_bound=0.0
112514: loss=0.000, reward_mean=0.1, reward_bound=0.0
112515: loss=0.000, reward_mean=0.1, reward_bound=0.0
112516: loss=0.000, reward_mean=0.2, reward_bound=0.0
112517: loss=0.000, reward_mean=0.1, reward_bound=0.0
112518: loss=0.000, reward_mean=0.1, reward_bound=0.0
112519: loss=0.000, reward_m...

112655: loss=0.000, reward_mean=0.0, reward_bound=0.0
112656: loss=0.000, reward_mean=0.0, reward_bound=0.0
112657: loss=0.000, reward_mean=0.0, reward_bound=0.0
112658: loss=0.000, reward_mean=0.1, reward_bound=0.0
112659: loss=0.000, reward_mean=0.1, reward_bound=0.0
112660: loss=0.000, reward_mean=0.0, reward_bound=0.0
112661: loss=0.000, reward_mean=0.1, reward_bound=0.0
112662: loss=0.000, reward_mean=0.1, reward_bound=0.0
112663: loss=0.000, reward_mean=0.1, reward_bound=0.0
112664: loss=0.000, reward_mean=0.0, reward_bound=0.0
112665: loss=0.000, reward_mean=0.0, reward_bound=0.0
112666: loss=0.000, reward_mean=0.1, reward_bound=0.0
112667: loss=0.000, reward_mean=0.0, reward_bound=0.0
112668: loss=0.000, reward_mean=0.0, reward_bound=0.0
112669: loss=0.000, reward_mean=0.0, reward_bound=0.0
112670: loss=0.000, reward_mean=0.0, reward_bound=0.0
112671: loss=0.000, reward_mean=0.1, reward_bound=0.0
112672: loss=0.000, reward_mean=0.0, reward_bound=0.0
112673: loss=0.000, reward_m...

112809: loss=0.000, reward_mean=0.0, reward_bound=0.0
112810: loss=0.000, reward_mean=0.0, reward_bound=0.0
112811: loss=0.000, reward_mean=0.1, reward_bound=0.0
112812: loss=0.000, reward_mean=0.1, reward_bound=0.0
112813: loss=0.000, reward_mean=0.0, reward_bound=0.0
112814: loss=0.000, reward_mean=0.1, reward_bound=0.0
112815: loss=0.000, reward_mean=0.0, reward_bound=0.0
112816: loss=0.000, reward_mean=0.0, reward_bound=0.0
112817: loss=0.000, reward_mean=0.1, reward_bound=0.0
112818: loss=0.000, reward_mean=0.0, reward_bound=0.0
112819: loss=0.000, reward_mean=0.1, reward_bound=0.0
112820: loss=0.000, reward_mean=0.1, reward_bound=0.0
112821: loss=0.000, reward_mean=0.2, reward_bound=0.0
112822: loss=0.000, reward_mean=0.1, reward_bound=0.0
112823: loss=0.000, reward_mean=0.1, reward_bound=0.0
112824: loss=0.000, reward_mean=0.1, reward_bound=0.0
112825: loss=0.000, reward_mean=0.1, reward_bound=0.0
112826: loss=0.000, reward_mean=0.1, reward_bound=0.0
112827: loss=0.000, reward_m...

112963: loss=0.000, reward_mean=0.1, reward_bound=0.0
112964: loss=0.000, reward_mean=0.1, reward_bound=0.0
112965: loss=0.000, reward_mean=0.1, reward_bound=0.0
112966: loss=0.000, reward_mean=0.1, reward_bound=0.0
112967: loss=0.000, reward_mean=0.1, reward_bound=0.0
112968: loss=0.000, reward_mean=0.1, reward_bound=0.0
112969: loss=0.000, reward_mean=0.2, reward_bound=0.0
112970: loss=0.000, reward_mean=0.1, reward_bound=0.0
112971: loss=0.000, reward_mean=0.1, reward_bound=0.0
112972: loss=0.000, reward_mean=0.0, reward_bound=0.0
112973: loss=0.000, reward_mean=0.1, reward_bound=0.0
112974: loss=0.000, reward_mean=0.1, reward_bound=0.0
112975: loss=0.000, reward_mean=0.1, reward_bound=0.0
112976: loss=0.000, reward_mean=0.1, reward_bound=0.0
112977: loss=0.000, reward_mean=0.2, reward_bound=0.0
112978: loss=0.000, reward_mean=0.1, reward_bound=0.0
112979: loss=0.000, reward_mean=0.1, reward_bound=0.0
112980: loss=0.000, reward_mean=0.1, reward_bound=0.0
112981: loss=0.000, reward_m...

113115: loss=0.000, reward_mean=0.1, reward_bound=0.0
113116: loss=0.000, reward_mean=0.1, reward_bound=0.0
113117: loss=0.000, reward_mean=0.2, reward_bound=0.0
113118: loss=0.000, reward_mean=0.0, reward_bound=0.0
113119: loss=0.000, reward_mean=0.1, reward_bound=0.0
113120: loss=0.000, reward_mean=0.2, reward_bound=0.0
113121: loss=0.000, reward_mean=0.0, reward_bound=0.0
113122: loss=0.000, reward_mean=0.1, reward_bound=0.0
113123: loss=0.000, reward_mean=0.0, reward_bound=0.0
113124: loss=0.000, reward_mean=0.1, reward_bound=0.0
113125: loss=0.000, reward_mean=0.1, reward_bound=0.0
113126: loss=0.000, reward_mean=0.0, reward_bound=0.0
113127: loss=0.000, reward_mean=0.1, reward_bound=0.0
113128: loss=0.000, reward_mean=0.1, reward_bound=0.0
113129: loss=0.000, reward_mean=0.0, reward_bound=0.0
113130: loss=0.000, reward_mean=0.1, reward_bound=0.0
113131: loss=0.000, reward_mean=0.0, reward_bound=0.0
113132: loss=0.000, reward_mean=0.0, reward_bound=0.0
113133: loss=0.000, reward_m...

113267: loss=0.000, reward_mean=0.0, reward_bound=0.0
113268: loss=0.000, reward_mean=0.0, reward_bound=0.0
113269: loss=0.000, reward_mean=0.0, reward_bound=0.0
113270: loss=0.000, reward_mean=0.0, reward_bound=0.0
113271: loss=0.000, reward_mean=0.1, reward_bound=0.0
113272: loss=0.000, reward_mean=0.0, reward_bound=0.0
113273: loss=0.000, reward_mean=0.1, reward_bound=0.0
113274: loss=0.000, reward_mean=0.1, reward_bound=0.0
113275: loss=0.000, reward_mean=0.1, reward_bound=0.0
113276: loss=0.000, reward_mean=0.0, reward_bound=0.0
113277: loss=0.000, reward_mean=0.1, reward_bound=0.0
113278: loss=0.000, reward_mean=0.1, reward_bound=0.0
113279: loss=0.000, reward_mean=0.1, reward_bound=0.0
113280: loss=0.000, reward_mean=0.0, reward_bound=0.0
113281: loss=0.000, reward_mean=0.1, reward_bound=0.0
113282: loss=0.000, reward_mean=0.1, reward_bound=0.0
113283: loss=0.000, reward_mean=0.1, reward_bound=0.0
113284: loss=0.000, reward_mean=0.1, reward_bound=0.0
113285: loss=0.000, reward_m...

113421: loss=0.000, reward_mean=0.2, reward_bound=0.0
113422: loss=0.000, reward_mean=0.0, reward_bound=0.0
113423: loss=0.000, reward_mean=0.0, reward_bound=0.0
113424: loss=0.000, reward_mean=0.0, reward_bound=0.0
113425: loss=0.000, reward_mean=0.1, reward_bound=0.0
113426: loss=0.000, reward_mean=0.0, reward_bound=0.0
113427: loss=0.000, reward_mean=0.1, reward_bound=0.0
113428: loss=0.000, reward_mean=0.1, reward_bound=0.0
113429: loss=0.000, reward_mean=0.0, reward_bound=0.0
113430: loss=0.000, reward_mean=0.0, reward_bound=0.0
113431: loss=0.000, reward_mean=0.1, reward_bound=0.0
113432: loss=0.000, reward_mean=0.0, reward_bound=0.0
113433: loss=0.000, reward_mean=0.0, reward_bound=0.0
113434: loss=0.000, reward_mean=0.0, reward_bound=0.0
113435: loss=0.000, reward_mean=0.1, reward_bound=0.0
113436: loss=0.000, reward_mean=0.0, reward_bound=0.0
113437: loss=0.000, reward_mean=0.1, reward_bound=0.0
113438: loss=0.000, reward_mean=0.1, reward_bound=0.0
113439: loss=0.000, reward_m...

113573: loss=0.000, reward_mean=0.0, reward_bound=0.0
113574: loss=0.000, reward_mean=0.0, reward_bound=0.0
113575: loss=0.000, reward_mean=0.1, reward_bound=0.0
113576: loss=0.000, reward_mean=0.0, reward_bound=0.0
113577: loss=0.000, reward_mean=0.1, reward_bound=0.0
113578: loss=0.000, reward_mean=0.0, reward_bound=0.0
113579: loss=0.000, reward_mean=0.1, reward_bound=0.0
113580: loss=0.000, reward_mean=0.0, reward_bound=0.0
113581: loss=0.000, reward_mean=0.1, reward_bound=0.0
113582: loss=0.000, reward_mean=0.0, reward_bound=0.0
113583: loss=0.000, reward_mean=0.1, reward_bound=0.0
113584: loss=0.000, reward_mean=0.0, reward_bound=0.0
113585: loss=0.000, reward_mean=0.0, reward_bound=0.0
113586: loss=0.000, reward_mean=0.0, reward_bound=0.0
113587: loss=0.000, reward_mean=0.1, reward_bound=0.0
113588: loss=0.000, reward_mean=0.1, reward_bound=0.0
113589: loss=0.000, reward_mean=0.2, reward_bound=0.0
113590: loss=0.000, reward_mean=0.0, reward_bound=0.0
113591: loss=0.000, reward_m...

113726: loss=0.000, reward_mean=0.0, reward_bound=0.0
113727: loss=0.000, reward_mean=0.1, reward_bound=0.0
113728: loss=0.000, reward_mean=0.0, reward_bound=0.0
113729: loss=0.000, reward_mean=0.1, reward_bound=0.0
113730: loss=0.000, reward_mean=0.0, reward_bound=0.0
113731: loss=0.000, reward_mean=0.0, reward_bound=0.0
113732: loss=0.000, reward_mean=0.1, reward_bound=0.0
113733: loss=0.000, reward_mean=0.1, reward_bound=0.0
113734: loss=0.000, reward_mean=0.1, reward_bound=0.0
113735: loss=0.000, reward_mean=0.0, reward_bound=0.0
113736: loss=0.000, reward_mean=0.0, reward_bound=0.0
113737: loss=0.000, reward_mean=0.0, reward_bound=0.0
113738: loss=0.000, reward_mean=0.1, reward_bound=0.0
113739: loss=0.000, reward_mean=0.0, reward_bound=0.0
113740: loss=0.000, reward_mean=0.0, reward_bound=0.0
113741: loss=0.000, reward_mean=0.0, reward_bound=0.0
113742: loss=0.000, reward_mean=0.0, reward_bound=0.0
113743: loss=0.000, reward_mean=0.0, reward_bound=0.0
113744: loss=0.000, reward_m...

113882: loss=0.000, reward_mean=0.1, reward_bound=0.0
113883: loss=0.000, reward_mean=0.0, reward_bound=0.0
113884: loss=0.000, reward_mean=0.0, reward_bound=0.0
113885: loss=0.000, reward_mean=0.0, reward_bound=0.0
113886: loss=0.000, reward_mean=0.0, reward_bound=0.0
113887: loss=0.000, reward_mean=0.0, reward_bound=0.0
113888: loss=0.000, reward_mean=0.0, reward_bound=0.0
113889: loss=0.000, reward_mean=0.0, reward_bound=0.0
113890: loss=0.000, reward_mean=0.1, reward_bound=0.0
113891: loss=0.000, reward_mean=0.0, reward_bound=0.0
113892: loss=0.000, reward_mean=0.2, reward_bound=0.0
113893: loss=0.000, reward_mean=0.0, reward_bound=0.0
113894: loss=0.000, reward_mean=0.1, reward_bound=0.0
113895: loss=0.000, reward_mean=0.0, reward_bound=0.0
113896: loss=0.000, reward_mean=0.0, reward_bound=0.0
113897: loss=0.000, reward_mean=0.0, reward_bound=0.0
113898: loss=0.000, reward_mean=0.1, reward_bound=0.0
113899: loss=0.000, reward_mean=0.1, reward_bound=0.0
113900: loss=0.000, reward_m...

114039: loss=0.000, reward_mean=0.1, reward_bound=0.0
114040: loss=0.000, reward_mean=0.2, reward_bound=0.0
114041: loss=0.000, reward_mean=0.0, reward_bound=0.0
114042: loss=0.000, reward_mean=0.0, reward_bound=0.0
114043: loss=0.000, reward_mean=0.1, reward_bound=0.0
114044: loss=0.000, reward_mean=0.1, reward_bound=0.0
114045: loss=0.000, reward_mean=0.0, reward_bound=0.0
114046: loss=0.000, reward_mean=0.0, reward_bound=0.0
114047: loss=0.000, reward_mean=0.0, reward_bound=0.0
114048: loss=0.000, reward_mean=0.0, reward_bound=0.0
114049: loss=0.000, reward_mean=0.1, reward_bound=0.0
114050: loss=0.000, reward_mean=0.1, reward_bound=0.0
114051: loss=0.000, reward_mean=0.1, reward_bound=0.0
114052: loss=0.000, reward_mean=0.1, reward_bound=0.0
114053: loss=0.000, reward_mean=0.1, reward_bound=0.0
114054: loss=0.000, reward_mean=0.0, reward_bound=0.0
114055: loss=0.000, reward_mean=0.0, reward_bound=0.0
114056: loss=0.000, reward_mean=0.0, reward_bound=0.0
114057: loss=0.000, reward_m...

114191: loss=0.000, reward_mean=0.2, reward_bound=0.0
114192: loss=0.000, reward_mean=0.1, reward_bound=0.0
114193: loss=0.000, reward_mean=0.1, reward_bound=0.0
114194: loss=0.000, reward_mean=0.1, reward_bound=0.0
114195: loss=0.000, reward_mean=0.1, reward_bound=0.0
114196: loss=0.000, reward_mean=0.0, reward_bound=0.0
114197: loss=0.000, reward_mean=0.1, reward_bound=0.0
114198: loss=0.000, reward_mean=0.1, reward_bound=0.0
114199: loss=0.000, reward_mean=0.1, reward_bound=0.0
114200: loss=0.000, reward_mean=0.1, reward_bound=0.0
114201: loss=0.000, reward_mean=0.0, reward_bound=0.0
114202: loss=0.000, reward_mean=0.0, reward_bound=0.0
114203: loss=0.000, reward_mean=0.0, reward_bound=0.0
114204: loss=0.000, reward_mean=0.1, reward_bound=0.0
114205: loss=0.000, reward_mean=0.0, reward_bound=0.0
114206: loss=0.000, reward_mean=0.0, reward_bound=0.0
114207: loss=0.000, reward_mean=0.0, reward_bound=0.0
114208: loss=0.000, reward_mean=0.0, reward_bound=0.0
114209: loss=0.000, reward_m...

114348: loss=0.000, reward_mean=0.1, reward_bound=0.0
114349: loss=0.000, reward_mean=0.1, reward_bound=0.0
114350: loss=0.000, reward_mean=0.1, reward_bound=0.0
114351: loss=0.000, reward_mean=0.0, reward_bound=0.0
114352: loss=0.000, reward_mean=0.1, reward_bound=0.0
114353: loss=0.000, reward_mean=0.1, reward_bound=0.0
114354: loss=0.000, reward_mean=0.1, reward_bound=0.0
114355: loss=0.000, reward_mean=0.0, reward_bound=0.0
114356: loss=0.000, reward_mean=0.1, reward_bound=0.0
114357: loss=0.000, reward_mean=0.1, reward_bound=0.0
114358: loss=0.000, reward_mean=0.1, reward_bound=0.0
114359: loss=0.000, reward_mean=0.1, reward_bound=0.0
114360: loss=0.000, reward_mean=0.1, reward_bound=0.0
114361: loss=0.000, reward_mean=0.2, reward_bound=0.0
114362: loss=0.000, reward_mean=0.0, reward_bound=0.0
114363: loss=0.000, reward_mean=0.1, reward_bound=0.0
114364: loss=0.000, reward_mean=0.0, reward_bound=0.0
114365: loss=0.000, reward_mean=0.1, reward_bound=0.0
114366: loss=0.000, reward_m...

114503: loss=0.000, reward_mean=0.0, reward_bound=0.0
114504: loss=0.000, reward_mean=0.1, reward_bound=0.0
114505: loss=0.000, reward_mean=0.0, reward_bound=0.0
114506: loss=0.000, reward_mean=0.0, reward_bound=0.0
114507: loss=0.000, reward_mean=0.0, reward_bound=0.0
114508: loss=0.000, reward_mean=0.0, reward_bound=0.0
114509: loss=0.000, reward_mean=0.1, reward_bound=0.0
114510: loss=0.000, reward_mean=0.0, reward_bound=0.0
114511: loss=0.000, reward_mean=0.1, reward_bound=0.0
114512: loss=0.000, reward_mean=0.0, reward_bound=0.0
114513: loss=0.000, reward_mean=0.1, reward_bound=0.0
114514: loss=0.000, reward_mean=0.0, reward_bound=0.0
114515: loss=0.000, reward_mean=0.1, reward_bound=0.0
114516: loss=0.000, reward_mean=0.1, reward_bound=0.0
114517: loss=0.000, reward_mean=0.1, reward_bound=0.0
114518: loss=0.000, reward_mean=0.1, reward_bound=0.0
114519: loss=0.000, reward_mean=0.1, reward_bound=0.0
114520: loss=0.000, reward_mean=0.1, reward_bound=0.0
114521: loss=0.000, reward_m...

114656: loss=0.000, reward_mean=0.0, reward_bound=0.0
114657: loss=0.000, reward_mean=0.1, reward_bound=0.0
114658: loss=0.000, reward_mean=0.1, reward_bound=0.0
114659: loss=0.000, reward_mean=0.1, reward_bound=0.0
114660: loss=0.000, reward_mean=0.1, reward_bound=0.0
114661: loss=0.000, reward_mean=0.0, reward_bound=0.0
114662: loss=0.000, reward_mean=0.0, reward_bound=0.0
114663: loss=0.000, reward_mean=0.1, reward_bound=0.0
114664: loss=0.000, reward_mean=0.0, reward_bound=0.0
114665: loss=0.000, reward_mean=0.0, reward_bound=0.0
114666: loss=0.000, reward_mean=0.0, reward_bound=0.0
114667: loss=0.000, reward_mean=0.1, reward_bound=0.0
114668: loss=0.000, reward_mean=0.1, reward_bound=0.0
114669: loss=0.000, reward_mean=0.1, reward_bound=0.0
114670: loss=0.000, reward_mean=0.0, reward_bound=0.0
114671: loss=0.000, reward_mean=0.1, reward_bound=0.0
114672: loss=0.000, reward_mean=0.0, reward_bound=0.0
114673: loss=0.000, reward_mean=0.1, reward_bound=0.0
114674: loss=0.000, reward_m

114810: loss=0.000, reward_mean=0.0, reward_bound=0.0
114811: loss=0.000, reward_mean=0.1, reward_bound=0.0
114812: loss=0.000, reward_mean=0.0, reward_bound=0.0
114813: loss=0.000, reward_mean=0.1, reward_bound=0.0
114814: loss=0.000, reward_mean=0.0, reward_bound=0.0
114815: loss=0.000, reward_mean=0.1, reward_bound=0.0
114816: loss=0.000, reward_mean=0.2, reward_bound=0.0
114817: loss=0.000, reward_mean=0.0, reward_bound=0.0
114818: loss=0.000, reward_mean=0.1, reward_bound=0.0
114819: loss=0.000, reward_mean=0.2, reward_bound=0.0
114820: loss=0.000, reward_mean=0.1, reward_bound=0.0
114821: loss=0.000, reward_mean=0.1, reward_bound=0.0
114822: loss=0.000, reward_mean=0.0, reward_bound=0.0
114823: loss=0.000, reward_mean=0.1, reward_bound=0.0
114824: loss=0.000, reward_mean=0.1, reward_bound=0.0
114825: loss=0.000, reward_mean=0.1, reward_bound=0.0
114826: loss=0.000, reward_mean=0.1, reward_bound=0.0
114827: loss=0.000, reward_mean=0.0, reward_bound=0.0
114828: loss=0.000, reward_m

114968: loss=0.000, reward_mean=0.0, reward_bound=0.0
114969: loss=0.000, reward_mean=0.0, reward_bound=0.0
114970: loss=0.000, reward_mean=0.1, reward_bound=0.0
114971: loss=0.000, reward_mean=0.1, reward_bound=0.0
114972: loss=0.000, reward_mean=0.1, reward_bound=0.0
114973: loss=0.000, reward_mean=0.1, reward_bound=0.0
114974: loss=0.000, reward_mean=0.0, reward_bound=0.0
114975: loss=0.000, reward_mean=0.0, reward_bound=0.0
114976: loss=0.000, reward_mean=0.1, reward_bound=0.0
114977: loss=0.000, reward_mean=0.1, reward_bound=0.0
114978: loss=0.000, reward_mean=0.0, reward_bound=0.0
114979: loss=0.000, reward_mean=0.1, reward_bound=0.0
114980: loss=0.000, reward_mean=0.1, reward_bound=0.0
114981: loss=0.000, reward_mean=0.1, reward_bound=0.0
114982: loss=0.000, reward_mean=0.1, reward_bound=0.0
114983: loss=0.000, reward_mean=0.0, reward_bound=0.0
114984: loss=0.000, reward_mean=0.2, reward_bound=0.0
114985: loss=0.000, reward_mean=0.0, reward_bound=0.0
114986: loss=0.000, reward_m

115121: loss=0.000, reward_mean=0.0, reward_bound=0.0
115122: loss=0.000, reward_mean=0.1, reward_bound=0.0
115123: loss=0.000, reward_mean=0.0, reward_bound=0.0
115124: loss=0.000, reward_mean=0.0, reward_bound=0.0
115125: loss=0.000, reward_mean=0.0, reward_bound=0.0
115126: loss=0.000, reward_mean=0.1, reward_bound=0.0
115127: loss=0.000, reward_mean=0.0, reward_bound=0.0
115128: loss=0.000, reward_mean=0.1, reward_bound=0.0
115129: loss=0.000, reward_mean=0.1, reward_bound=0.0
115130: loss=0.000, reward_mean=0.0, reward_bound=0.0
115131: loss=0.000, reward_mean=0.1, reward_bound=0.0
115132: loss=0.000, reward_mean=0.0, reward_bound=0.0
115133: loss=0.000, reward_mean=0.0, reward_bound=0.0
115134: loss=0.000, reward_mean=0.1, reward_bound=0.0
115135: loss=0.000, reward_mean=0.0, reward_bound=0.0
115136: loss=0.000, reward_mean=0.1, reward_bound=0.0
115137: loss=0.000, reward_mean=0.1, reward_bound=0.0
115138: loss=0.000, reward_mean=0.2, reward_bound=0.0
115139: loss=0.000, reward_m

115276: loss=0.000, reward_mean=0.0, reward_bound=0.0
115277: loss=0.000, reward_mean=0.1, reward_bound=0.0
115278: loss=0.000, reward_mean=0.0, reward_bound=0.0
115279: loss=0.000, reward_mean=0.1, reward_bound=0.0
115280: loss=0.000, reward_mean=0.0, reward_bound=0.0
115281: loss=0.000, reward_mean=0.1, reward_bound=0.0
115282: loss=0.000, reward_mean=0.0, reward_bound=0.0
115283: loss=0.000, reward_mean=0.0, reward_bound=0.0
115284: loss=0.000, reward_mean=0.0, reward_bound=0.0
115285: loss=0.000, reward_mean=0.1, reward_bound=0.0
115286: loss=0.000, reward_mean=0.0, reward_bound=0.0
115287: loss=0.000, reward_mean=0.1, reward_bound=0.0
115288: loss=0.000, reward_mean=0.0, reward_bound=0.0
115289: loss=0.000, reward_mean=0.0, reward_bound=0.0
115290: loss=0.000, reward_mean=0.1, reward_bound=0.0
115291: loss=0.000, reward_mean=0.0, reward_bound=0.0
115292: loss=0.000, reward_mean=0.0, reward_bound=0.0
115293: loss=0.000, reward_mean=0.0, reward_bound=0.0
115294: loss=0.000, reward_m

115433: loss=0.000, reward_mean=0.1, reward_bound=0.0
115434: loss=0.000, reward_mean=0.0, reward_bound=0.0
115435: loss=0.000, reward_mean=0.0, reward_bound=0.0
115436: loss=0.000, reward_mean=0.1, reward_bound=0.0
115437: loss=0.000, reward_mean=0.1, reward_bound=0.0
115438: loss=0.000, reward_mean=0.0, reward_bound=0.0
115439: loss=0.000, reward_mean=0.1, reward_bound=0.0
115440: loss=0.000, reward_mean=0.1, reward_bound=0.0
115441: loss=0.000, reward_mean=0.1, reward_bound=0.0
115442: loss=0.000, reward_mean=0.0, reward_bound=0.0
115443: loss=0.000, reward_mean=0.0, reward_bound=0.0
115444: loss=0.000, reward_mean=0.0, reward_bound=0.0
115445: loss=0.000, reward_mean=0.0, reward_bound=0.0
115446: loss=0.000, reward_mean=0.1, reward_bound=0.0
115447: loss=0.000, reward_mean=0.1, reward_bound=0.0
115448: loss=0.000, reward_mean=0.2, reward_bound=0.0
115449: loss=0.000, reward_mean=0.1, reward_bound=0.0
115450: loss=0.000, reward_mean=0.1, reward_bound=0.0
115451: loss=0.000, reward_m

115587: loss=0.000, reward_mean=0.2, reward_bound=0.0
115588: loss=0.000, reward_mean=0.2, reward_bound=0.0
115589: loss=0.000, reward_mean=0.1, reward_bound=0.0
115590: loss=0.000, reward_mean=0.0, reward_bound=0.0
115591: loss=0.000, reward_mean=0.1, reward_bound=0.0
115592: loss=0.000, reward_mean=0.0, reward_bound=0.0
115593: loss=0.000, reward_mean=0.0, reward_bound=0.0
115594: loss=0.000, reward_mean=0.0, reward_bound=0.0
115595: loss=0.000, reward_mean=0.0, reward_bound=0.0
115596: loss=0.000, reward_mean=0.0, reward_bound=0.0
115597: loss=0.000, reward_mean=0.1, reward_bound=0.0
115598: loss=0.000, reward_mean=0.1, reward_bound=0.0
115599: loss=0.000, reward_mean=0.0, reward_bound=0.0
115600: loss=0.000, reward_mean=0.1, reward_bound=0.0
115601: loss=0.000, reward_mean=0.1, reward_bound=0.0
115602: loss=0.000, reward_mean=0.1, reward_bound=0.0
115603: loss=0.000, reward_mean=0.0, reward_bound=0.0
115604: loss=0.000, reward_mean=0.0, reward_bound=0.0
115605: loss=0.000, reward_m

115740: loss=0.000, reward_mean=0.0, reward_bound=0.0
115741: loss=0.000, reward_mean=0.0, reward_bound=0.0
115742: loss=0.000, reward_mean=0.1, reward_bound=0.0
115743: loss=0.000, reward_mean=0.0, reward_bound=0.0
115744: loss=0.000, reward_mean=0.0, reward_bound=0.0
115745: loss=0.000, reward_mean=0.0, reward_bound=0.0
115746: loss=0.000, reward_mean=0.0, reward_bound=0.0
115747: loss=0.000, reward_mean=0.1, reward_bound=0.0
115748: loss=0.000, reward_mean=0.1, reward_bound=0.0
115749: loss=0.000, reward_mean=0.0, reward_bound=0.0
115750: loss=0.000, reward_mean=0.1, reward_bound=0.0
115751: loss=0.000, reward_mean=0.0, reward_bound=0.0
115752: loss=0.000, reward_mean=0.1, reward_bound=0.0
115753: loss=0.000, reward_mean=0.0, reward_bound=0.0
115754: loss=0.000, reward_mean=0.3, reward_bound=0.5
115755: loss=0.000, reward_mean=0.1, reward_bound=0.0
115756: loss=0.000, reward_mean=0.1, reward_bound=0.0
115757: loss=0.000, reward_mean=0.0, reward_bound=0.0
115758: loss=0.000, reward_m

115895: loss=0.000, reward_mean=0.0, reward_bound=0.0
115896: loss=0.000, reward_mean=0.0, reward_bound=0.0
115897: loss=0.000, reward_mean=0.1, reward_bound=0.0
115898: loss=0.000, reward_mean=0.1, reward_bound=0.0
115899: loss=0.000, reward_mean=0.1, reward_bound=0.0
115900: loss=0.000, reward_mean=0.2, reward_bound=0.0
115901: loss=0.000, reward_mean=0.1, reward_bound=0.0
115902: loss=0.000, reward_mean=0.1, reward_bound=0.0
115903: loss=0.000, reward_mean=0.1, reward_bound=0.0
115904: loss=0.000, reward_mean=0.0, reward_bound=0.0
115905: loss=0.000, reward_mean=0.2, reward_bound=0.0
115906: loss=0.000, reward_mean=0.1, reward_bound=0.0
115907: loss=0.000, reward_mean=0.1, reward_bound=0.0
115908: loss=0.000, reward_mean=0.1, reward_bound=0.0
115909: loss=0.000, reward_mean=0.1, reward_bound=0.0
115910: loss=0.000, reward_mean=0.0, reward_bound=0.0
115911: loss=0.000, reward_mean=0.1, reward_bound=0.0
115912: loss=0.000, reward_mean=0.1, reward_bound=0.0
115913: loss=0.000, reward_m

116047: loss=0.000, reward_mean=0.0, reward_bound=0.0
116048: loss=0.000, reward_mean=0.0, reward_bound=0.0
116049: loss=0.000, reward_mean=0.1, reward_bound=0.0
116050: loss=0.000, reward_mean=0.1, reward_bound=0.0
116051: loss=0.000, reward_mean=0.1, reward_bound=0.0
116052: loss=0.000, reward_mean=0.1, reward_bound=0.0
116053: loss=0.000, reward_mean=0.0, reward_bound=0.0
116054: loss=0.000, reward_mean=0.1, reward_bound=0.0
116055: loss=0.000, reward_mean=0.0, reward_bound=0.0
116056: loss=0.000, reward_mean=0.1, reward_bound=0.0
116057: loss=0.000, reward_mean=0.1, reward_bound=0.0
116058: loss=0.000, reward_mean=0.0, reward_bound=0.0
116059: loss=0.000, reward_mean=0.2, reward_bound=0.0
116060: loss=0.000, reward_mean=0.0, reward_bound=0.0
116061: loss=0.000, reward_mean=0.2, reward_bound=0.0
116062: loss=0.000, reward_mean=0.0, reward_bound=0.0
116063: loss=0.000, reward_mean=0.0, reward_bound=0.0
116064: loss=0.000, reward_mean=0.0, reward_bound=0.0
116065: loss=0.000, reward_m

116203: loss=0.000, reward_mean=0.0, reward_bound=0.0
116204: loss=0.000, reward_mean=0.1, reward_bound=0.0
116205: loss=0.000, reward_mean=0.1, reward_bound=0.0
116206: loss=0.000, reward_mean=0.1, reward_bound=0.0
116207: loss=0.000, reward_mean=0.0, reward_bound=0.0
116208: loss=0.000, reward_mean=0.0, reward_bound=0.0
116209: loss=0.000, reward_mean=0.0, reward_bound=0.0
116210: loss=0.000, reward_mean=0.1, reward_bound=0.0
116211: loss=0.000, reward_mean=0.1, reward_bound=0.0
116212: loss=0.000, reward_mean=0.1, reward_bound=0.0
116213: loss=0.000, reward_mean=0.0, reward_bound=0.0
116214: loss=0.000, reward_mean=0.1, reward_bound=0.0
116215: loss=0.000, reward_mean=0.2, reward_bound=0.0
116216: loss=0.000, reward_mean=0.1, reward_bound=0.0
116217: loss=0.000, reward_mean=0.2, reward_bound=0.0
116218: loss=0.000, reward_mean=0.2, reward_bound=0.0
116219: loss=0.000, reward_mean=0.1, reward_bound=0.0
116220: loss=0.000, reward_mean=0.1, reward_bound=0.0
116221: loss=0.000, reward_m

116358: loss=0.000, reward_mean=0.1, reward_bound=0.0
116359: loss=0.000, reward_mean=0.1, reward_bound=0.0
116360: loss=0.000, reward_mean=0.0, reward_bound=0.0
116361: loss=0.000, reward_mean=0.1, reward_bound=0.0
116362: loss=0.000, reward_mean=0.1, reward_bound=0.0
116363: loss=0.000, reward_mean=0.1, reward_bound=0.0
116364: loss=0.000, reward_mean=0.0, reward_bound=0.0
116365: loss=0.000, reward_mean=0.1, reward_bound=0.0
116366: loss=0.000, reward_mean=0.0, reward_bound=0.0
116367: loss=0.000, reward_mean=0.0, reward_bound=0.0
116368: loss=0.000, reward_mean=0.2, reward_bound=0.0
116369: loss=0.000, reward_mean=0.1, reward_bound=0.0
116370: loss=0.000, reward_mean=0.1, reward_bound=0.0
116371: loss=0.000, reward_mean=0.1, reward_bound=0.0
116372: loss=0.000, reward_mean=0.1, reward_bound=0.0
116373: loss=0.000, reward_mean=0.1, reward_bound=0.0
116374: loss=0.000, reward_mean=0.1, reward_bound=0.0
116375: loss=0.000, reward_mean=0.0, reward_bound=0.0
116376: loss=0.000, reward_m

116513: loss=0.000, reward_mean=0.1, reward_bound=0.0
116514: loss=0.000, reward_mean=0.0, reward_bound=0.0
116515: loss=0.000, reward_mean=0.1, reward_bound=0.0
116516: loss=0.000, reward_mean=0.1, reward_bound=0.0
116517: loss=0.000, reward_mean=0.1, reward_bound=0.0
116518: loss=0.000, reward_mean=0.0, reward_bound=0.0
116519: loss=0.000, reward_mean=0.1, reward_bound=0.0
116520: loss=0.000, reward_mean=0.0, reward_bound=0.0
116521: loss=0.000, reward_mean=0.2, reward_bound=0.0
116522: loss=0.000, reward_mean=0.1, reward_bound=0.0
116523: loss=0.000, reward_mean=0.1, reward_bound=0.0
116524: loss=0.000, reward_mean=0.1, reward_bound=0.0
116525: loss=0.000, reward_mean=0.1, reward_bound=0.0
116526: loss=0.000, reward_mean=0.1, reward_bound=0.0
116527: loss=0.000, reward_mean=0.1, reward_bound=0.0
116528: loss=0.000, reward_mean=0.0, reward_bound=0.0
116529: loss=0.000, reward_mean=0.1, reward_bound=0.0
116530: loss=0.000, reward_mean=0.0, reward_bound=0.0
116531: loss=0.000, reward_m

116665: loss=0.000, reward_mean=0.2, reward_bound=0.0
116666: loss=0.000, reward_mean=0.1, reward_bound=0.0
116667: loss=0.000, reward_mean=0.1, reward_bound=0.0
116668: loss=0.000, reward_mean=0.1, reward_bound=0.0
116669: loss=0.000, reward_mean=0.1, reward_bound=0.0
116670: loss=0.000, reward_mean=0.1, reward_bound=0.0
116671: loss=0.000, reward_mean=0.0, reward_bound=0.0
116672: loss=0.000, reward_mean=0.1, reward_bound=0.0
116673: loss=0.000, reward_mean=0.0, reward_bound=0.0
116674: loss=0.000, reward_mean=0.1, reward_bound=0.0
116675: loss=0.000, reward_mean=0.1, reward_bound=0.0
116676: loss=0.000, reward_mean=0.0, reward_bound=0.0
116677: loss=0.000, reward_mean=0.1, reward_bound=0.0
116678: loss=0.000, reward_mean=0.1, reward_bound=0.0
116679: loss=0.000, reward_mean=0.0, reward_bound=0.0
116680: loss=0.000, reward_mean=0.1, reward_bound=0.0
116681: loss=0.000, reward_mean=0.1, reward_bound=0.0
116682: loss=0.000, reward_mean=0.1, reward_bound=0.0
116683: loss=0.000, reward_m

116821: loss=0.000, reward_mean=0.1, reward_bound=0.0
116822: loss=0.000, reward_mean=0.1, reward_bound=0.0
116823: loss=0.000, reward_mean=0.1, reward_bound=0.0
116824: loss=0.000, reward_mean=0.0, reward_bound=0.0
116825: loss=0.000, reward_mean=0.1, reward_bound=0.0
116826: loss=0.000, reward_mean=0.3, reward_bound=0.5
116827: loss=0.000, reward_mean=0.0, reward_bound=0.0
116828: loss=0.000, reward_mean=0.1, reward_bound=0.0
116829: loss=0.000, reward_mean=0.1, reward_bound=0.0
116830: loss=0.000, reward_mean=0.1, reward_bound=0.0
116831: loss=0.000, reward_mean=0.0, reward_bound=0.0
116832: loss=0.000, reward_mean=0.1, reward_bound=0.0
116833: loss=0.000, reward_mean=0.0, reward_bound=0.0
116834: loss=0.000, reward_mean=0.0, reward_bound=0.0
116835: loss=0.000, reward_mean=0.0, reward_bound=0.0
116836: loss=0.000, reward_mean=0.1, reward_bound=0.0
116837: loss=0.000, reward_mean=0.1, reward_bound=0.0
116838: loss=0.000, reward_mean=0.1, reward_bound=0.0
116839: loss=0.000, reward_m

116975: loss=0.000, reward_mean=0.1, reward_bound=0.0
116976: loss=0.000, reward_mean=0.1, reward_bound=0.0
116977: loss=0.000, reward_mean=0.1, reward_bound=0.0
116978: loss=0.000, reward_mean=0.0, reward_bound=0.0
116979: loss=0.000, reward_mean=0.1, reward_bound=0.0
116980: loss=0.000, reward_mean=0.1, reward_bound=0.0
116981: loss=0.000, reward_mean=0.0, reward_bound=0.0
116982: loss=0.000, reward_mean=0.0, reward_bound=0.0
116983: loss=0.000, reward_mean=0.2, reward_bound=0.0
116984: loss=0.000, reward_mean=0.1, reward_bound=0.0
116985: loss=0.000, reward_mean=0.0, reward_bound=0.0
116986: loss=0.000, reward_mean=0.1, reward_bound=0.0
116987: loss=0.000, reward_mean=0.1, reward_bound=0.0
116988: loss=0.000, reward_mean=0.1, reward_bound=0.0
116989: loss=0.000, reward_mean=0.1, reward_bound=0.0
116990: loss=0.000, reward_mean=0.1, reward_bound=0.0
116991: loss=0.000, reward_mean=0.0, reward_bound=0.0
116992: loss=0.000, reward_mean=0.0, reward_bound=0.0
116993: loss=0.000, reward_m

117129: loss=0.000, reward_mean=0.0, reward_bound=0.0
117130: loss=0.000, reward_mean=0.0, reward_bound=0.0
117131: loss=0.000, reward_mean=0.0, reward_bound=0.0
117132: loss=0.000, reward_mean=0.0, reward_bound=0.0
117133: loss=0.000, reward_mean=0.0, reward_bound=0.0
117134: loss=0.000, reward_mean=0.1, reward_bound=0.0
117135: loss=0.000, reward_mean=0.1, reward_bound=0.0
117136: loss=0.000, reward_mean=0.1, reward_bound=0.0
117137: loss=0.000, reward_mean=0.0, reward_bound=0.0
117138: loss=0.000, reward_mean=0.1, reward_bound=0.0
117139: loss=0.000, reward_mean=0.1, reward_bound=0.0
117140: loss=0.000, reward_mean=0.0, reward_bound=0.0
117141: loss=0.000, reward_mean=0.1, reward_bound=0.0
117142: loss=0.000, reward_mean=0.1, reward_bound=0.0
117143: loss=0.000, reward_mean=0.2, reward_bound=0.0
117144: loss=0.000, reward_mean=0.0, reward_bound=0.0
117145: loss=0.000, reward_mean=0.0, reward_bound=0.0
117146: loss=0.000, reward_mean=0.0, reward_bound=0.0
117147: loss=0.000, reward_m

117285: loss=0.000, reward_mean=0.1, reward_bound=0.0
117286: loss=0.000, reward_mean=0.1, reward_bound=0.0
117287: loss=0.000, reward_mean=0.1, reward_bound=0.0
117288: loss=0.000, reward_mean=0.2, reward_bound=0.0
117289: loss=0.000, reward_mean=0.2, reward_bound=0.0
117290: loss=0.000, reward_mean=0.1, reward_bound=0.0
117291: loss=0.000, reward_mean=0.0, reward_bound=0.0
117292: loss=0.000, reward_mean=0.1, reward_bound=0.0
117293: loss=0.000, reward_mean=0.0, reward_bound=0.0
117294: loss=0.000, reward_mean=0.0, reward_bound=0.0
117295: loss=0.000, reward_mean=0.1, reward_bound=0.0
117296: loss=0.000, reward_mean=0.0, reward_bound=0.0
117297: loss=0.000, reward_mean=0.1, reward_bound=0.0
117298: loss=0.000, reward_mean=0.1, reward_bound=0.0
117299: loss=0.000, reward_mean=0.0, reward_bound=0.0
117300: loss=0.000, reward_mean=0.1, reward_bound=0.0
117301: loss=0.000, reward_mean=0.1, reward_bound=0.0
117302: loss=0.000, reward_mean=0.1, reward_bound=0.0
117303: loss=0.000, reward_m

117439: loss=0.000, reward_mean=0.0, reward_bound=0.0
117440: loss=0.000, reward_mean=0.0, reward_bound=0.0
117441: loss=0.000, reward_mean=0.1, reward_bound=0.0
117442: loss=0.000, reward_mean=0.0, reward_bound=0.0
117443: loss=0.000, reward_mean=0.1, reward_bound=0.0
117444: loss=0.000, reward_mean=0.1, reward_bound=0.0
117445: loss=0.000, reward_mean=0.2, reward_bound=0.0
117446: loss=0.000, reward_mean=0.1, reward_bound=0.0
117447: loss=0.000, reward_mean=0.0, reward_bound=0.0
117448: loss=0.000, reward_mean=0.0, reward_bound=0.0
117449: loss=0.000, reward_mean=0.0, reward_bound=0.0
117450: loss=0.000, reward_mean=0.0, reward_bound=0.0
117451: loss=0.000, reward_mean=0.1, reward_bound=0.0
117452: loss=0.000, reward_mean=0.1, reward_bound=0.0
117453: loss=0.000, reward_mean=0.0, reward_bound=0.0
117454: loss=0.000, reward_mean=0.1, reward_bound=0.0
117455: loss=0.000, reward_mean=0.1, reward_bound=0.0
117456: loss=0.000, reward_mean=0.0, reward_bound=0.0
117457: loss=0.000, reward_m

117594: loss=0.000, reward_mean=0.0, reward_bound=0.0
117595: loss=0.000, reward_mean=0.1, reward_bound=0.0
117596: loss=0.000, reward_mean=0.1, reward_bound=0.0
117597: loss=0.000, reward_mean=0.1, reward_bound=0.0
117598: loss=0.000, reward_mean=0.0, reward_bound=0.0
117599: loss=0.000, reward_mean=0.0, reward_bound=0.0
117600: loss=0.000, reward_mean=0.2, reward_bound=0.0
117601: loss=0.000, reward_mean=0.0, reward_bound=0.0
117602: loss=0.000, reward_mean=0.0, reward_bound=0.0
117603: loss=0.000, reward_mean=0.1, reward_bound=0.0
117604: loss=0.000, reward_mean=0.1, reward_bound=0.0
117605: loss=0.000, reward_mean=0.0, reward_bound=0.0
117606: loss=0.000, reward_mean=0.1, reward_bound=0.0
117607: loss=0.000, reward_mean=0.0, reward_bound=0.0
117608: loss=0.000, reward_mean=0.0, reward_bound=0.0
117609: loss=0.000, reward_mean=0.1, reward_bound=0.0
117610: loss=0.000, reward_mean=0.0, reward_bound=0.0
117611: loss=0.000, reward_mean=0.1, reward_bound=0.0
117612: loss=0.000, reward_m

117748: loss=0.000, reward_mean=0.1, reward_bound=0.0
117749: loss=0.000, reward_mean=0.1, reward_bound=0.0
117750: loss=0.000, reward_mean=0.1, reward_bound=0.0
117751: loss=0.000, reward_mean=0.1, reward_bound=0.0
117752: loss=0.000, reward_mean=0.1, reward_bound=0.0
117753: loss=0.000, reward_mean=0.1, reward_bound=0.0
117754: loss=0.000, reward_mean=0.0, reward_bound=0.0
117755: loss=0.000, reward_mean=0.0, reward_bound=0.0
117756: loss=0.000, reward_mean=0.0, reward_bound=0.0
117757: loss=0.000, reward_mean=0.0, reward_bound=0.0
117758: loss=0.000, reward_mean=0.0, reward_bound=0.0
117759: loss=0.000, reward_mean=0.1, reward_bound=0.0
117760: loss=0.000, reward_mean=0.1, reward_bound=0.0
117761: loss=0.000, reward_mean=0.0, reward_bound=0.0
117762: loss=0.000, reward_mean=0.1, reward_bound=0.0
117763: loss=0.000, reward_mean=0.1, reward_bound=0.0
117764: loss=0.000, reward_mean=0.1, reward_bound=0.0
117765: loss=0.000, reward_mean=0.1, reward_bound=0.0
117766: loss=0.000, reward_m

117903: loss=0.000, reward_mean=0.0, reward_bound=0.0
117904: loss=0.000, reward_mean=0.1, reward_bound=0.0
117905: loss=0.000, reward_mean=0.0, reward_bound=0.0
117906: loss=0.000, reward_mean=0.0, reward_bound=0.0
117907: loss=0.000, reward_mean=0.0, reward_bound=0.0
117908: loss=0.000, reward_mean=0.1, reward_bound=0.0
117909: loss=0.000, reward_mean=0.1, reward_bound=0.0
117910: loss=0.000, reward_mean=0.0, reward_bound=0.0
117911: loss=0.000, reward_mean=0.0, reward_bound=0.0
117912: loss=0.000, reward_mean=0.1, reward_bound=0.0
117913: loss=0.000, reward_mean=0.1, reward_bound=0.0
117914: loss=0.000, reward_mean=0.1, reward_bound=0.0
117915: loss=0.000, reward_mean=0.1, reward_bound=0.0
117916: loss=0.000, reward_mean=0.1, reward_bound=0.0
117917: loss=0.000, reward_mean=0.1, reward_bound=0.0
117918: loss=0.000, reward_mean=0.1, reward_bound=0.0
117919: loss=0.000, reward_mean=0.1, reward_bound=0.0
117920: loss=0.000, reward_mean=0.0, reward_bound=0.0
117921: loss=0.000, reward_m

118060: loss=0.000, reward_mean=0.1, reward_bound=0.0
118061: loss=0.000, reward_mean=0.0, reward_bound=0.0
118062: loss=0.000, reward_mean=0.1, reward_bound=0.0
118063: loss=0.000, reward_mean=0.1, reward_bound=0.0
118064: loss=0.000, reward_mean=0.0, reward_bound=0.0
118065: loss=0.000, reward_mean=0.1, reward_bound=0.0
118066: loss=0.000, reward_mean=0.1, reward_bound=0.0
118067: loss=0.000, reward_mean=0.1, reward_bound=0.0
118068: loss=0.000, reward_mean=0.0, reward_bound=0.0
118069: loss=0.000, reward_mean=0.0, reward_bound=0.0
118070: loss=0.000, reward_mean=0.1, reward_bound=0.0
118071: loss=0.000, reward_mean=0.1, reward_bound=0.0
118072: loss=0.000, reward_mean=0.1, reward_bound=0.0
118073: loss=0.000, reward_mean=0.2, reward_bound=0.0
118074: loss=0.000, reward_mean=0.0, reward_bound=0.0
118075: loss=0.000, reward_mean=0.1, reward_bound=0.0
118076: loss=0.000, reward_mean=0.1, reward_bound=0.0
118077: loss=0.000, reward_mean=0.0, reward_bound=0.0
118078: loss=0.000, reward_m

118214: loss=0.000, reward_mean=0.1, reward_bound=0.0
118215: loss=0.000, reward_mean=0.0, reward_bound=0.0
118216: loss=0.000, reward_mean=0.0, reward_bound=0.0
118217: loss=0.000, reward_mean=0.1, reward_bound=0.0
118218: loss=0.000, reward_mean=0.1, reward_bound=0.0
118219: loss=0.000, reward_mean=0.0, reward_bound=0.0
118220: loss=0.000, reward_mean=0.0, reward_bound=0.0
118221: loss=0.000, reward_mean=0.0, reward_bound=0.0
118222: loss=0.000, reward_mean=0.0, reward_bound=0.0
118223: loss=0.000, reward_mean=0.0, reward_bound=0.0
118224: loss=0.000, reward_mean=0.0, reward_bound=0.0
118225: loss=0.000, reward_mean=0.1, reward_bound=0.0
118226: loss=0.000, reward_mean=0.1, reward_bound=0.0
118227: loss=0.000, reward_mean=0.0, reward_bound=0.0
118228: loss=0.000, reward_mean=0.0, reward_bound=0.0
118229: loss=0.000, reward_mean=0.1, reward_bound=0.0
118230: loss=0.000, reward_mean=0.1, reward_bound=0.0
118231: loss=0.000, reward_mean=0.0, reward_bound=0.0
118232: loss=0.000, reward_m

118370: loss=0.000, reward_mean=0.0, reward_bound=0.0
118371: loss=0.000, reward_mean=0.1, reward_bound=0.0
118372: loss=0.000, reward_mean=0.0, reward_bound=0.0
118373: loss=0.000, reward_mean=0.1, reward_bound=0.0
118374: loss=0.000, reward_mean=0.1, reward_bound=0.0
118375: loss=0.000, reward_mean=0.1, reward_bound=0.0
118376: loss=0.000, reward_mean=0.0, reward_bound=0.0
118377: loss=0.000, reward_mean=0.1, reward_bound=0.0
118378: loss=0.000, reward_mean=0.1, reward_bound=0.0
118379: loss=0.000, reward_mean=0.1, reward_bound=0.0
118380: loss=0.000, reward_mean=0.0, reward_bound=0.0
118381: loss=0.000, reward_mean=0.0, reward_bound=0.0
118382: loss=0.000, reward_mean=0.1, reward_bound=0.0
118383: loss=0.000, reward_mean=0.1, reward_bound=0.0
118384: loss=0.000, reward_mean=0.0, reward_bound=0.0
118385: loss=0.000, reward_mean=0.0, reward_bound=0.0
118386: loss=0.000, reward_mean=0.1, reward_bound=0.0
118387: loss=0.000, reward_mean=0.1, reward_bound=0.0
118388: loss=0.000, reward_m

118522: loss=0.000, reward_mean=0.1, reward_bound=0.0
118523: loss=0.000, reward_mean=0.1, reward_bound=0.0
118524: loss=0.000, reward_mean=0.1, reward_bound=0.0
118525: loss=0.000, reward_mean=0.0, reward_bound=0.0
118526: loss=0.000, reward_mean=0.0, reward_bound=0.0
118527: loss=0.000, reward_mean=0.0, reward_bound=0.0
118528: loss=0.000, reward_mean=0.1, reward_bound=0.0
118529: loss=0.000, reward_mean=0.0, reward_bound=0.0
118530: loss=0.000, reward_mean=0.1, reward_bound=0.0
118531: loss=0.000, reward_mean=0.2, reward_bound=0.0
118532: loss=0.000, reward_mean=0.1, reward_bound=0.0
118533: loss=0.000, reward_mean=0.0, reward_bound=0.0
118534: loss=0.000, reward_mean=0.1, reward_bound=0.0
118535: loss=0.000, reward_mean=0.2, reward_bound=0.0
118536: loss=0.000, reward_mean=0.1, reward_bound=0.0
118537: loss=0.000, reward_mean=0.0, reward_bound=0.0
118538: loss=0.000, reward_mean=0.1, reward_bound=0.0
118539: loss=0.000, reward_mean=0.1, reward_bound=0.0
118540: loss=0.000, reward_m

118677: loss=0.000, reward_mean=0.1, reward_bound=0.0
118678: loss=0.000, reward_mean=0.0, reward_bound=0.0
118679: loss=0.000, reward_mean=0.0, reward_bound=0.0
118680: loss=0.000, reward_mean=0.0, reward_bound=0.0
118681: loss=0.000, reward_mean=0.0, reward_bound=0.0
118682: loss=0.000, reward_mean=0.0, reward_bound=0.0
118683: loss=0.000, reward_mean=0.0, reward_bound=0.0
118684: loss=0.000, reward_mean=0.0, reward_bound=0.0
118685: loss=0.000, reward_mean=0.1, reward_bound=0.0
118686: loss=0.000, reward_mean=0.1, reward_bound=0.0
118687: loss=0.000, reward_mean=0.1, reward_bound=0.0
118688: loss=0.000, reward_mean=0.0, reward_bound=0.0
118689: loss=0.000, reward_mean=0.1, reward_bound=0.0
118690: loss=0.000, reward_mean=0.0, reward_bound=0.0
118691: loss=0.000, reward_mean=0.0, reward_bound=0.0
118692: loss=0.000, reward_mean=0.0, reward_bound=0.0
118693: loss=0.000, reward_mean=0.1, reward_bound=0.0
118694: loss=0.000, reward_mean=0.0, reward_bound=0.0
118695: loss=0.000, reward_m

118833: loss=0.000, reward_mean=0.1, reward_bound=0.0
118834: loss=0.000, reward_mean=0.2, reward_bound=0.0
118835: loss=0.000, reward_mean=0.1, reward_bound=0.0
118836: loss=0.000, reward_mean=0.1, reward_bound=0.0
118837: loss=0.000, reward_mean=0.1, reward_bound=0.0
118838: loss=0.000, reward_mean=0.1, reward_bound=0.0
118839: loss=0.000, reward_mean=0.1, reward_bound=0.0
118840: loss=0.000, reward_mean=0.1, reward_bound=0.0
118841: loss=0.000, reward_mean=0.1, reward_bound=0.0
118842: loss=0.000, reward_mean=0.0, reward_bound=0.0
118843: loss=0.000, reward_mean=0.0, reward_bound=0.0
118844: loss=0.000, reward_mean=0.0, reward_bound=0.0
118845: loss=0.000, reward_mean=0.1, reward_bound=0.0
118846: loss=0.000, reward_mean=0.2, reward_bound=0.0
118847: loss=0.000, reward_mean=0.1, reward_bound=0.0
118848: loss=0.000, reward_mean=0.0, reward_bound=0.0
118849: loss=0.000, reward_mean=0.1, reward_bound=0.0
118850: loss=0.000, reward_mean=0.0, reward_bound=0.0
118851: loss=0.000, reward_m

118985: loss=0.000, reward_mean=0.0, reward_bound=0.0
118986: loss=0.000, reward_mean=0.1, reward_bound=0.0
118987: loss=0.000, reward_mean=0.1, reward_bound=0.0
118988: loss=0.000, reward_mean=0.1, reward_bound=0.0
118989: loss=0.000, reward_mean=0.0, reward_bound=0.0
118990: loss=0.000, reward_mean=0.1, reward_bound=0.0
118991: loss=0.000, reward_mean=0.1, reward_bound=0.0
118992: loss=0.000, reward_mean=0.0, reward_bound=0.0
118993: loss=0.000, reward_mean=0.1, reward_bound=0.0
118994: loss=0.000, reward_mean=0.1, reward_bound=0.0
118995: loss=0.000, reward_mean=0.0, reward_bound=0.0
118996: loss=0.000, reward_mean=0.1, reward_bound=0.0
118997: loss=0.000, reward_mean=0.0, reward_bound=0.0
118998: loss=0.000, reward_mean=0.0, reward_bound=0.0
118999: loss=0.000, reward_mean=0.0, reward_bound=0.0
119000: loss=0.000, reward_mean=0.0, reward_bound=0.0
119001: loss=0.000, reward_mean=0.0, reward_bound=0.0
119002: loss=0.000, reward_mean=0.0, reward_bound=0.0
119003: loss=0.000, reward_m

119138: loss=0.000, reward_mean=0.1, reward_bound=0.0
119139: loss=0.000, reward_mean=0.0, reward_bound=0.0
119140: loss=0.000, reward_mean=0.0, reward_bound=0.0
119141: loss=0.000, reward_mean=0.1, reward_bound=0.0
119142: loss=0.000, reward_mean=0.1, reward_bound=0.0
119143: loss=0.000, reward_mean=0.1, reward_bound=0.0
119144: loss=0.000, reward_mean=0.0, reward_bound=0.0
119145: loss=0.000, reward_mean=0.1, reward_bound=0.0
119146: loss=0.000, reward_mean=0.0, reward_bound=0.0
119147: loss=0.000, reward_mean=0.0, reward_bound=0.0
119148: loss=0.000, reward_mean=0.1, reward_bound=0.0
119149: loss=0.000, reward_mean=0.0, reward_bound=0.0
119150: loss=0.000, reward_mean=0.0, reward_bound=0.0
119151: loss=0.000, reward_mean=0.1, reward_bound=0.0
119152: loss=0.000, reward_mean=0.1, reward_bound=0.0
119153: loss=0.000, reward_mean=0.0, reward_bound=0.0
119154: loss=0.000, reward_mean=0.0, reward_bound=0.0
119155: loss=0.000, reward_mean=0.1, reward_bound=0.0
119156: loss=0.000, reward_m

119294: loss=0.000, reward_mean=0.1, reward_bound=0.0
119295: loss=0.000, reward_mean=0.0, reward_bound=0.0
119296: loss=0.000, reward_mean=0.1, reward_bound=0.0
119297: loss=0.000, reward_mean=0.1, reward_bound=0.0
119298: loss=0.000, reward_mean=0.1, reward_bound=0.0
119299: loss=0.000, reward_mean=0.0, reward_bound=0.0
119300: loss=0.000, reward_mean=0.1, reward_bound=0.0
119301: loss=0.000, reward_mean=0.0, reward_bound=0.0
119302: loss=0.000, reward_mean=0.1, reward_bound=0.0
119303: loss=0.000, reward_mean=0.1, reward_bound=0.0
119304: loss=0.000, reward_mean=0.0, reward_bound=0.0
119305: loss=0.000, reward_mean=0.0, reward_bound=0.0
119306: loss=0.000, reward_mean=0.0, reward_bound=0.0
119307: loss=0.000, reward_mean=0.1, reward_bound=0.0
119308: loss=0.000, reward_mean=0.1, reward_bound=0.0
119309: loss=0.000, reward_mean=0.0, reward_bound=0.0
119310: loss=0.000, reward_mean=0.1, reward_bound=0.0
119311: loss=0.000, reward_mean=0.1, reward_bound=0.0
119312: loss=0.000, reward_m

119449: loss=0.000, reward_mean=0.1, reward_bound=0.0
119450: loss=0.000, reward_mean=0.1, reward_bound=0.0
119451: loss=0.000, reward_mean=0.0, reward_bound=0.0
119452: loss=0.000, reward_mean=0.1, reward_bound=0.0
119453: loss=0.000, reward_mean=0.1, reward_bound=0.0
119454: loss=0.000, reward_mean=0.1, reward_bound=0.0
119455: loss=0.000, reward_mean=0.1, reward_bound=0.0
119456: loss=0.000, reward_mean=0.1, reward_bound=0.0
119457: loss=0.000, reward_mean=0.1, reward_bound=0.0
119458: loss=0.000, reward_mean=0.1, reward_bound=0.0
119459: loss=0.000, reward_mean=0.1, reward_bound=0.0
119460: loss=0.000, reward_mean=0.1, reward_bound=0.0
119461: loss=0.000, reward_mean=0.0, reward_bound=0.0
119462: loss=0.000, reward_mean=0.0, reward_bound=0.0
119463: loss=0.000, reward_mean=0.1, reward_bound=0.0
119464: loss=0.000, reward_mean=0.1, reward_bound=0.0
119465: loss=0.000, reward_mean=0.1, reward_bound=0.0
119466: loss=0.000, reward_mean=0.1, reward_bound=0.0
119467: loss=0.000, reward_m

119601: loss=0.000, reward_mean=0.0, reward_bound=0.0
119602: loss=0.000, reward_mean=0.0, reward_bound=0.0
119603: loss=0.000, reward_mean=0.1, reward_bound=0.0
119604: loss=0.000, reward_mean=0.0, reward_bound=0.0
119605: loss=0.000, reward_mean=0.0, reward_bound=0.0
119606: loss=0.000, reward_mean=0.1, reward_bound=0.0
119607: loss=0.000, reward_mean=0.1, reward_bound=0.0
119608: loss=0.000, reward_mean=0.1, reward_bound=0.0
119609: loss=0.000, reward_mean=0.0, reward_bound=0.0
119610: loss=0.000, reward_mean=0.1, reward_bound=0.0
119611: loss=0.000, reward_mean=0.1, reward_bound=0.0
119612: loss=0.000, reward_mean=0.2, reward_bound=0.0
119613: loss=0.000, reward_mean=0.2, reward_bound=0.0
119614: loss=0.000, reward_mean=0.0, reward_bound=0.0
119615: loss=0.000, reward_mean=0.1, reward_bound=0.0
119616: loss=0.000, reward_mean=0.0, reward_bound=0.0
119617: loss=0.000, reward_mean=0.1, reward_bound=0.0
119618: loss=0.000, reward_mean=0.0, reward_bound=0.0
119619: loss=0.000, reward_m

119758: loss=0.000, reward_mean=0.0, reward_bound=0.0
119759: loss=0.000, reward_mean=0.1, reward_bound=0.0
119760: loss=0.000, reward_mean=0.1, reward_bound=0.0
119761: loss=0.000, reward_mean=0.1, reward_bound=0.0
119762: loss=0.000, reward_mean=0.1, reward_bound=0.0
119763: loss=0.000, reward_mean=0.0, reward_bound=0.0
119764: loss=0.000, reward_mean=0.1, reward_bound=0.0
119765: loss=0.000, reward_mean=0.0, reward_bound=0.0
119766: loss=0.000, reward_mean=0.1, reward_bound=0.0
119767: loss=0.000, reward_mean=0.1, reward_bound=0.0
119768: loss=0.000, reward_mean=0.0, reward_bound=0.0
119769: loss=0.000, reward_mean=0.1, reward_bound=0.0
119770: loss=0.000, reward_mean=0.0, reward_bound=0.0
119771: loss=0.000, reward_mean=0.1, reward_bound=0.0
119772: loss=0.000, reward_mean=0.0, reward_bound=0.0
119773: loss=0.000, reward_mean=0.1, reward_bound=0.0
119774: loss=0.000, reward_mean=0.1, reward_bound=0.0
119775: loss=0.000, reward_mean=0.1, reward_bound=0.0
119776: loss=0.000, reward_m

119915: loss=0.000, reward_mean=0.1, reward_bound=0.0
119916: loss=0.000, reward_mean=0.1, reward_bound=0.0
119917: loss=0.000, reward_mean=0.1, reward_bound=0.0
119918: loss=0.000, reward_mean=0.1, reward_bound=0.0
119919: loss=0.000, reward_mean=0.1, reward_bound=0.0
119920: loss=0.000, reward_mean=0.1, reward_bound=0.0
119921: loss=0.000, reward_mean=0.0, reward_bound=0.0
119922: loss=0.000, reward_mean=0.1, reward_bound=0.0
119923: loss=0.000, reward_mean=0.0, reward_bound=0.0
119924: loss=0.000, reward_mean=0.0, reward_bound=0.0
119925: loss=0.000, reward_mean=0.0, reward_bound=0.0
119926: loss=0.000, reward_mean=0.0, reward_bound=0.0
119927: loss=0.000, reward_mean=0.1, reward_bound=0.0
119928: loss=0.000, reward_mean=0.1, reward_bound=0.0
119929: loss=0.000, reward_mean=0.1, reward_bound=0.0
119930: loss=0.000, reward_mean=0.1, reward_bound=0.0
119931: loss=0.000, reward_mean=0.1, reward_bound=0.0
119932: loss=0.000, reward_mean=0.1, reward_bound=0.0
119933: loss=0.000, reward_m

120070: loss=0.000, reward_mean=0.1, reward_bound=0.0
120071: loss=0.000, reward_mean=0.0, reward_bound=0.0
120072: loss=0.000, reward_mean=0.1, reward_bound=0.0
120073: loss=0.000, reward_mean=0.2, reward_bound=0.0
120074: loss=0.000, reward_mean=0.0, reward_bound=0.0
120075: loss=0.000, reward_mean=0.3, reward_bound=0.5
120076: loss=0.000, reward_mean=0.1, reward_bound=0.0
120077: loss=0.000, reward_mean=0.0, reward_bound=0.0
120078: loss=0.000, reward_mean=0.1, reward_bound=0.0
120079: loss=0.000, reward_mean=0.0, reward_bound=0.0
120080: loss=0.000, reward_mean=0.1, reward_bound=0.0
120081: loss=0.000, reward_mean=0.0, reward_bound=0.0
120082: loss=0.000, reward_mean=0.0, reward_bound=0.0
120083: loss=0.000, reward_mean=0.0, reward_bound=0.0
120084: loss=0.000, reward_mean=0.1, reward_bound=0.0
120085: loss=0.000, reward_mean=0.0, reward_bound=0.0
120086: loss=0.000, reward_mean=0.1, reward_bound=0.0
120087: loss=0.000, reward_mean=0.1, reward_bound=0.0
120088: loss=0.000, reward_m

120221: loss=0.000, reward_mean=0.1, reward_bound=0.0
120222: loss=0.000, reward_mean=0.1, reward_bound=0.0
120223: loss=0.000, reward_mean=0.0, reward_bound=0.0
120224: loss=0.000, reward_mean=0.1, reward_bound=0.0
120225: loss=0.000, reward_mean=0.1, reward_bound=0.0
120226: loss=0.000, reward_mean=0.0, reward_bound=0.0
120227: loss=0.000, reward_mean=0.0, reward_bound=0.0
120228: loss=0.000, reward_mean=0.2, reward_bound=0.0
120229: loss=0.000, reward_mean=0.0, reward_bound=0.0
120230: loss=0.000, reward_mean=0.0, reward_bound=0.0
120231: loss=0.000, reward_mean=0.0, reward_bound=0.0
120232: loss=0.000, reward_mean=0.1, reward_bound=0.0
120233: loss=0.000, reward_mean=0.0, reward_bound=0.0
120234: loss=0.000, reward_mean=0.0, reward_bound=0.0
120235: loss=0.000, reward_mean=0.0, reward_bound=0.0
120236: loss=0.000, reward_mean=0.1, reward_bound=0.0
120237: loss=0.000, reward_mean=0.0, reward_bound=0.0
120238: loss=0.000, reward_mean=0.1, reward_bound=0.0
120239: loss=0.000, reward_m

120377: loss=0.000, reward_mean=0.1, reward_bound=0.0
120378: loss=0.000, reward_mean=0.1, reward_bound=0.0
120379: loss=0.000, reward_mean=0.0, reward_bound=0.0
120380: loss=0.000, reward_mean=0.2, reward_bound=0.0
120381: loss=0.000, reward_mean=0.0, reward_bound=0.0
120382: loss=0.000, reward_mean=0.0, reward_bound=0.0
120383: loss=0.000, reward_mean=0.0, reward_bound=0.0
120384: loss=0.000, reward_mean=0.1, reward_bound=0.0
120385: loss=0.000, reward_mean=0.0, reward_bound=0.0
120386: loss=0.000, reward_mean=0.1, reward_bound=0.0
120387: loss=0.000, reward_mean=0.0, reward_bound=0.0
120388: loss=0.000, reward_mean=0.1, reward_bound=0.0
120389: loss=0.000, reward_mean=0.1, reward_bound=0.0
120390: loss=0.000, reward_mean=0.1, reward_bound=0.0
120391: loss=0.000, reward_mean=0.0, reward_bound=0.0
120392: loss=0.000, reward_mean=0.1, reward_bound=0.0
120393: loss=0.000, reward_mean=0.1, reward_bound=0.0
120394: loss=0.000, reward_mean=0.1, reward_bound=0.0
120395: loss=0.000, reward_m

120534: loss=0.000, reward_mean=0.0, reward_bound=0.0
120535: loss=0.000, reward_mean=0.1, reward_bound=0.0
120536: loss=0.000, reward_mean=0.0, reward_bound=0.0
120537: loss=0.000, reward_mean=0.1, reward_bound=0.0
120538: loss=0.000, reward_mean=0.1, reward_bound=0.0
120539: loss=0.000, reward_mean=0.0, reward_bound=0.0
120540: loss=0.000, reward_mean=0.1, reward_bound=0.0
120541: loss=0.000, reward_mean=0.1, reward_bound=0.0
120542: loss=0.000, reward_mean=0.0, reward_bound=0.0
120543: loss=0.000, reward_mean=0.0, reward_bound=0.0
120544: loss=0.000, reward_mean=0.1, reward_bound=0.0
120545: loss=0.000, reward_mean=0.0, reward_bound=0.0
120546: loss=0.000, reward_mean=0.0, reward_bound=0.0
120547: loss=0.000, reward_mean=0.1, reward_bound=0.0
120548: loss=0.000, reward_mean=0.0, reward_bound=0.0
120549: loss=0.000, reward_mean=0.2, reward_bound=0.0
120550: loss=0.000, reward_mean=0.0, reward_bound=0.0
120551: loss=0.000, reward_mean=0.0, reward_bound=0.0
120552: loss=0.000, reward_m

120692: loss=0.000, reward_mean=0.0, reward_bound=0.0
120693: loss=0.000, reward_mean=0.0, reward_bound=0.0
120694: loss=0.000, reward_mean=0.1, reward_bound=0.0
120695: loss=0.000, reward_mean=0.1, reward_bound=0.0
120696: loss=0.000, reward_mean=0.0, reward_bound=0.0
120697: loss=0.000, reward_mean=0.0, reward_bound=0.0
120698: loss=0.000, reward_mean=0.0, reward_bound=0.0
120699: loss=0.000, reward_mean=0.0, reward_bound=0.0
120700: loss=0.000, reward_mean=0.1, reward_bound=0.0
120701: loss=0.000, reward_mean=0.1, reward_bound=0.0
120702: loss=0.000, reward_mean=0.0, reward_bound=0.0
120703: loss=0.000, reward_mean=0.1, reward_bound=0.0
120704: loss=0.000, reward_mean=0.0, reward_bound=0.0
120705: loss=0.000, reward_mean=0.0, reward_bound=0.0
120706: loss=0.000, reward_mean=0.0, reward_bound=0.0
120707: loss=0.000, reward_mean=0.0, reward_bound=0.0
120708: loss=0.000, reward_mean=0.1, reward_bound=0.0
120709: loss=0.000, reward_mean=0.2, reward_bound=0.0
120710: loss=0.000, reward_m

120846: loss=0.000, reward_mean=0.1, reward_bound=0.0
120847: loss=0.000, reward_mean=0.0, reward_bound=0.0
120848: loss=0.000, reward_mean=0.2, reward_bound=0.0
120849: loss=0.000, reward_mean=0.0, reward_bound=0.0
120850: loss=0.000, reward_mean=0.1, reward_bound=0.0
120851: loss=0.000, reward_mean=0.0, reward_bound=0.0
120852: loss=0.000, reward_mean=0.1, reward_bound=0.0
120853: loss=0.000, reward_mean=0.1, reward_bound=0.0
120854: loss=0.000, reward_mean=0.1, reward_bound=0.0
120855: loss=0.000, reward_mean=0.1, reward_bound=0.0
120856: loss=0.000, reward_mean=0.1, reward_bound=0.0
120857: loss=0.000, reward_mean=0.1, reward_bound=0.0
120858: loss=0.000, reward_mean=0.1, reward_bound=0.0
120859: loss=0.000, reward_mean=0.1, reward_bound=0.0
120860: loss=0.000, reward_mean=0.0, reward_bound=0.0
120861: loss=0.000, reward_mean=0.1, reward_bound=0.0
120862: loss=0.000, reward_mean=0.0, reward_bound=0.0
120863: loss=0.000, reward_mean=0.0, reward_bound=0.0
120864: loss=0.000, reward_m

121003: loss=0.000, reward_mean=0.0, reward_bound=0.0
121004: loss=0.000, reward_mean=0.1, reward_bound=0.0
121005: loss=0.000, reward_mean=0.0, reward_bound=0.0
121006: loss=0.000, reward_mean=0.1, reward_bound=0.0
121007: loss=0.000, reward_mean=0.1, reward_bound=0.0
121008: loss=0.000, reward_mean=0.1, reward_bound=0.0
121009: loss=0.000, reward_mean=0.0, reward_bound=0.0
121010: loss=0.000, reward_mean=0.0, reward_bound=0.0
121011: loss=0.000, reward_mean=0.1, reward_bound=0.0
121012: loss=0.000, reward_mean=0.1, reward_bound=0.0
121013: loss=0.000, reward_mean=0.1, reward_bound=0.0
121014: loss=0.000, reward_mean=0.1, reward_bound=0.0
121015: loss=0.000, reward_mean=0.0, reward_bound=0.0
121016: loss=0.000, reward_mean=0.0, reward_bound=0.0
121017: loss=0.000, reward_mean=0.1, reward_bound=0.0
121018: loss=0.000, reward_mean=0.1, reward_bound=0.0
121019: loss=0.000, reward_mean=0.0, reward_bound=0.0
121020: loss=0.000, reward_mean=0.1, reward_bound=0.0
121021: loss=0.000, reward_m

121156: loss=0.000, reward_mean=0.1, reward_bound=0.0
121157: loss=0.000, reward_mean=0.1, reward_bound=0.0
121158: loss=0.000, reward_mean=0.1, reward_bound=0.0
121159: loss=0.000, reward_mean=0.0, reward_bound=0.0
121160: loss=0.000, reward_mean=0.0, reward_bound=0.0
121161: loss=0.000, reward_mean=0.0, reward_bound=0.0
121162: loss=0.000, reward_mean=0.0, reward_bound=0.0
121163: loss=0.000, reward_mean=0.1, reward_bound=0.0
121164: loss=0.000, reward_mean=0.1, reward_bound=0.0
121165: loss=0.000, reward_mean=0.0, reward_bound=0.0
121166: loss=0.000, reward_mean=0.1, reward_bound=0.0
121167: loss=0.000, reward_mean=0.0, reward_bound=0.0
121168: loss=0.000, reward_mean=0.1, reward_bound=0.0
121169: loss=0.000, reward_mean=0.1, reward_bound=0.0
121170: loss=0.000, reward_mean=0.0, reward_bound=0.0
121171: loss=0.000, reward_mean=0.1, reward_bound=0.0
121172: loss=0.000, reward_mean=0.1, reward_bound=0.0
121173: loss=0.000, reward_mean=0.0, reward_bound=0.0
121174: loss=0.000, reward_m

121313: loss=0.000, reward_mean=0.0, reward_bound=0.0
121314: loss=0.000, reward_mean=0.2, reward_bound=0.0
121315: loss=0.000, reward_mean=0.1, reward_bound=0.0
121316: loss=0.000, reward_mean=0.0, reward_bound=0.0
121317: loss=0.000, reward_mean=0.1, reward_bound=0.0
121318: loss=0.000, reward_mean=0.1, reward_bound=0.0
121319: loss=0.000, reward_mean=0.0, reward_bound=0.0
121320: loss=0.000, reward_mean=0.1, reward_bound=0.0
121321: loss=0.000, reward_mean=0.1, reward_bound=0.0
121322: loss=0.000, reward_mean=0.0, reward_bound=0.0
121323: loss=0.000, reward_mean=0.1, reward_bound=0.0
121324: loss=0.000, reward_mean=0.0, reward_bound=0.0
121325: loss=0.000, reward_mean=0.0, reward_bound=0.0
121326: loss=0.000, reward_mean=0.0, reward_bound=0.0
121327: loss=0.000, reward_mean=0.0, reward_bound=0.0
121328: loss=0.000, reward_mean=0.2, reward_bound=0.0
121329: loss=0.000, reward_mean=0.1, reward_bound=0.0
121330: loss=0.000, reward_mean=0.0, reward_bound=0.0
121331: loss=0.000, reward_m

121465: loss=0.000, reward_mean=0.1, reward_bound=0.0
121466: loss=0.000, reward_mean=0.0, reward_bound=0.0
121467: loss=0.000, reward_mean=0.1, reward_bound=0.0
121468: loss=0.000, reward_mean=0.0, reward_bound=0.0
121469: loss=0.000, reward_mean=0.1, reward_bound=0.0
121470: loss=0.000, reward_mean=0.0, reward_bound=0.0
121471: loss=0.000, reward_mean=0.1, reward_bound=0.0
121472: loss=0.000, reward_mean=0.1, reward_bound=0.0
121473: loss=0.000, reward_mean=0.1, reward_bound=0.0
121474: loss=0.000, reward_mean=0.1, reward_bound=0.0
121475: loss=0.000, reward_mean=0.1, reward_bound=0.0
121476: loss=0.000, reward_mean=0.1, reward_bound=0.0
121477: loss=0.000, reward_mean=0.0, reward_bound=0.0
121478: loss=0.000, reward_mean=0.0, reward_bound=0.0
121479: loss=0.000, reward_mean=0.1, reward_bound=0.0
121480: loss=0.000, reward_mean=0.2, reward_bound=0.0
121481: loss=0.000, reward_mean=0.0, reward_bound=0.0
121482: loss=0.000, reward_mean=0.1, reward_bound=0.0
121483: loss=0.000, reward_m

121621: loss=0.000, reward_mean=0.2, reward_bound=0.0
121622: loss=0.000, reward_mean=0.1, reward_bound=0.0
121623: loss=0.000, reward_mean=0.1, reward_bound=0.0
121624: loss=0.000, reward_mean=0.1, reward_bound=0.0
121625: loss=0.000, reward_mean=0.0, reward_bound=0.0
121626: loss=0.000, reward_mean=0.2, reward_bound=0.0
121627: loss=0.000, reward_mean=0.0, reward_bound=0.0
121628: loss=0.000, reward_mean=0.0, reward_bound=0.0
121629: loss=0.000, reward_mean=0.0, reward_bound=0.0
121630: loss=0.000, reward_mean=0.0, reward_bound=0.0
121631: loss=0.000, reward_mean=0.0, reward_bound=0.0
121632: loss=0.000, reward_mean=0.0, reward_bound=0.0
121633: loss=0.000, reward_mean=0.0, reward_bound=0.0
121634: loss=0.000, reward_mean=0.1, reward_bound=0.0
121635: loss=0.000, reward_mean=0.0, reward_bound=0.0
121636: loss=0.000, reward_mean=0.0, reward_bound=0.0
121637: loss=0.000, reward_mean=0.0, reward_bound=0.0
121638: loss=0.000, reward_mean=0.1, reward_bound=0.0
121639: loss=0.000, reward_m

121778: loss=0.000, reward_mean=0.1, reward_bound=0.0
121779: loss=0.000, reward_mean=0.0, reward_bound=0.0
121780: loss=0.000, reward_mean=0.0, reward_bound=0.0
121781: loss=0.000, reward_mean=0.0, reward_bound=0.0
121782: loss=0.000, reward_mean=0.0, reward_bound=0.0
121783: loss=0.000, reward_mean=0.0, reward_bound=0.0
121784: loss=0.000, reward_mean=0.1, reward_bound=0.0
121785: loss=0.000, reward_mean=0.0, reward_bound=0.0
121786: loss=0.000, reward_mean=0.0, reward_bound=0.0
121787: loss=0.000, reward_mean=0.1, reward_bound=0.0
121788: loss=0.000, reward_mean=0.0, reward_bound=0.0
121789: loss=0.000, reward_mean=0.1, reward_bound=0.0
121790: loss=0.000, reward_mean=0.0, reward_bound=0.0
121791: loss=0.000, reward_mean=0.1, reward_bound=0.0
121792: loss=0.000, reward_mean=0.0, reward_bound=0.0
121793: loss=0.000, reward_mean=0.1, reward_bound=0.0
121794: loss=0.000, reward_mean=0.1, reward_bound=0.0
121795: loss=0.000, reward_mean=0.1, reward_bound=0.0
121796: loss=0.000, reward_m

121935: loss=0.000, reward_mean=0.0, reward_bound=0.0
121936: loss=0.000, reward_mean=0.1, reward_bound=0.0
121937: loss=0.000, reward_mean=0.1, reward_bound=0.0
121938: loss=0.000, reward_mean=0.1, reward_bound=0.0
121939: loss=0.000, reward_mean=0.0, reward_bound=0.0
121940: loss=0.000, reward_mean=0.0, reward_bound=0.0
121941: loss=0.000, reward_mean=0.1, reward_bound=0.0
121942: loss=0.000, reward_mean=0.0, reward_bound=0.0
121943: loss=0.000, reward_mean=0.0, reward_bound=0.0
121944: loss=0.000, reward_mean=0.1, reward_bound=0.0
121945: loss=0.000, reward_mean=0.1, reward_bound=0.0
121946: loss=0.000, reward_mean=0.0, reward_bound=0.0
121947: loss=0.000, reward_mean=0.1, reward_bound=0.0
121948: loss=0.000, reward_mean=0.1, reward_bound=0.0
121949: loss=0.000, reward_mean=0.1, reward_bound=0.0
121950: loss=0.000, reward_mean=0.0, reward_bound=0.0
121951: loss=0.000, reward_mean=0.1, reward_bound=0.0
121952: loss=0.000, reward_mean=0.1, reward_bound=0.0
121953: loss=0.000, reward_m

122093: loss=0.000, reward_mean=0.1, reward_bound=0.0
122094: loss=0.000, reward_mean=0.0, reward_bound=0.0
122095: loss=0.000, reward_mean=0.0, reward_bound=0.0
122096: loss=0.000, reward_mean=0.1, reward_bound=0.0
122097: loss=0.000, reward_mean=0.1, reward_bound=0.0
122098: loss=0.000, reward_mean=0.1, reward_bound=0.0
122099: loss=0.000, reward_mean=0.0, reward_bound=0.0
122100: loss=0.000, reward_mean=0.1, reward_bound=0.0
122101: loss=0.000, reward_mean=0.1, reward_bound=0.0
122102: loss=0.000, reward_mean=0.0, reward_bound=0.0
122103: loss=0.000, reward_mean=0.0, reward_bound=0.0
122104: loss=0.000, reward_mean=0.1, reward_bound=0.0
122105: loss=0.000, reward_mean=0.1, reward_bound=0.0
122106: loss=0.000, reward_mean=0.0, reward_bound=0.0
122107: loss=0.000, reward_mean=0.0, reward_bound=0.0
122108: loss=0.000, reward_mean=0.0, reward_bound=0.0
122109: loss=0.000, reward_mean=0.0, reward_bound=0.0
122110: loss=0.000, reward_mean=0.0, reward_bound=0.0
122111: loss=0.000, reward_m

122245: loss=0.000, reward_mean=0.1, reward_bound=0.0
122246: loss=0.000, reward_mean=0.1, reward_bound=0.0
122247: loss=0.000, reward_mean=0.1, reward_bound=0.0
122248: loss=0.000, reward_mean=0.1, reward_bound=0.0
122249: loss=0.000, reward_mean=0.2, reward_bound=0.0
122250: loss=0.000, reward_mean=0.2, reward_bound=0.0
122251: loss=0.000, reward_mean=0.0, reward_bound=0.0
122252: loss=0.000, reward_mean=0.1, reward_bound=0.0
122253: loss=0.000, reward_mean=0.1, reward_bound=0.0
122254: loss=0.000, reward_mean=0.1, reward_bound=0.0
122255: loss=0.000, reward_mean=0.0, reward_bound=0.0
122256: loss=0.000, reward_mean=0.0, reward_bound=0.0
122257: loss=0.000, reward_mean=0.1, reward_bound=0.0
122258: loss=0.000, reward_mean=0.1, reward_bound=0.0
122259: loss=0.000, reward_mean=0.2, reward_bound=0.0
122260: loss=0.000, reward_mean=0.0, reward_bound=0.0
122261: loss=0.000, reward_mean=0.0, reward_bound=0.0
122262: loss=0.000, reward_mean=0.1, reward_bound=0.0
122263: loss=0.000, reward_m

122397: loss=0.000, reward_mean=0.0, reward_bound=0.0
122398: loss=0.000, reward_mean=0.1, reward_bound=0.0
122399: loss=0.000, reward_mean=0.0, reward_bound=0.0
122400: loss=0.000, reward_mean=0.1, reward_bound=0.0
122401: loss=0.000, reward_mean=0.2, reward_bound=0.0
122402: loss=0.000, reward_mean=0.1, reward_bound=0.0
122403: loss=0.000, reward_mean=0.0, reward_bound=0.0
122404: loss=0.000, reward_mean=0.1, reward_bound=0.0
122405: loss=0.000, reward_mean=0.0, reward_bound=0.0
122406: loss=0.000, reward_mean=0.1, reward_bound=0.0
122407: loss=0.000, reward_mean=0.1, reward_bound=0.0
122408: loss=0.000, reward_mean=0.1, reward_bound=0.0
122409: loss=0.000, reward_mean=0.0, reward_bound=0.0
122410: loss=0.000, reward_mean=0.1, reward_bound=0.0
122411: loss=0.000, reward_mean=0.1, reward_bound=0.0
122412: loss=0.000, reward_mean=0.0, reward_bound=0.0
122413: loss=0.000, reward_mean=0.1, reward_bound=0.0
122414: loss=0.000, reward_mean=0.0, reward_bound=0.0
122415: loss=0.000, reward_m

122552: loss=0.000, reward_mean=0.1, reward_bound=0.0
122553: loss=0.000, reward_mean=0.0, reward_bound=0.0
122554: loss=0.000, reward_mean=0.1, reward_bound=0.0
122555: loss=0.000, reward_mean=0.1, reward_bound=0.0
122556: loss=0.000, reward_mean=0.1, reward_bound=0.0
122557: loss=0.000, reward_mean=0.1, reward_bound=0.0
122558: loss=0.000, reward_mean=0.1, reward_bound=0.0
122559: loss=0.000, reward_mean=0.0, reward_bound=0.0
122560: loss=0.000, reward_mean=0.0, reward_bound=0.0
122561: loss=0.000, reward_mean=0.1, reward_bound=0.0
122562: loss=0.000, reward_mean=0.0, reward_bound=0.0
122563: loss=0.000, reward_mean=0.0, reward_bound=0.0
122564: loss=0.000, reward_mean=0.0, reward_bound=0.0
122565: loss=0.000, reward_mean=0.1, reward_bound=0.0
122566: loss=0.000, reward_mean=0.1, reward_bound=0.0
122567: loss=0.000, reward_mean=0.1, reward_bound=0.0
122568: loss=0.000, reward_mean=0.0, reward_bound=0.0
122569: loss=0.000, reward_mean=0.1, reward_bound=0.0
122570: loss=0.000, reward_m

122707: loss=0.000, reward_mean=0.0, reward_bound=0.0
122708: loss=0.000, reward_mean=0.1, reward_bound=0.0
122709: loss=0.000, reward_mean=0.1, reward_bound=0.0
122710: loss=0.000, reward_mean=0.1, reward_bound=0.0
122711: loss=0.000, reward_mean=0.1, reward_bound=0.0
122712: loss=0.000, reward_mean=0.1, reward_bound=0.0
122713: loss=0.000, reward_mean=0.0, reward_bound=0.0
122714: loss=0.000, reward_mean=0.1, reward_bound=0.0
122715: loss=0.000, reward_mean=0.0, reward_bound=0.0
122716: loss=0.000, reward_mean=0.0, reward_bound=0.0
122717: loss=0.000, reward_mean=0.0, reward_bound=0.0
122718: loss=0.000, reward_mean=0.0, reward_bound=0.0
122719: loss=0.000, reward_mean=0.0, reward_bound=0.0
122720: loss=0.000, reward_mean=0.0, reward_bound=0.0
122721: loss=0.000, reward_mean=0.1, reward_bound=0.0
122722: loss=0.000, reward_mean=0.2, reward_bound=0.0
122723: loss=0.000, reward_mean=0.1, reward_bound=0.0
122724: loss=0.000, reward_mean=0.1, reward_bound=0.0
122725: loss=0.000, reward_m

122859: loss=0.000, reward_mean=0.2, reward_bound=0.0
122860: loss=0.000, reward_mean=0.0, reward_bound=0.0
122861: loss=0.000, reward_mean=0.1, reward_bound=0.0
122862: loss=0.000, reward_mean=0.1, reward_bound=0.0
122863: loss=0.000, reward_mean=0.1, reward_bound=0.0
122864: loss=0.000, reward_mean=0.1, reward_bound=0.0
122865: loss=0.000, reward_mean=0.1, reward_bound=0.0
122866: loss=0.000, reward_mean=0.0, reward_bound=0.0
122867: loss=0.000, reward_mean=0.2, reward_bound=0.0
122868: loss=0.000, reward_mean=0.1, reward_bound=0.0
122869: loss=0.000, reward_mean=0.1, reward_bound=0.0
122870: loss=0.000, reward_mean=0.0, reward_bound=0.0
122871: loss=0.000, reward_mean=0.0, reward_bound=0.0
122872: loss=0.000, reward_mean=0.1, reward_bound=0.0
122873: loss=0.000, reward_mean=0.0, reward_bound=0.0
122874: loss=0.000, reward_mean=0.0, reward_bound=0.0
122875: loss=0.000, reward_mean=0.1, reward_bound=0.0
122876: loss=0.000, reward_mean=0.0, reward_bound=0.0
122877: loss=0.000, reward_m

123011: loss=0.000, reward_mean=0.1, reward_bound=0.0
123012: loss=0.000, reward_mean=0.1, reward_bound=0.0
123013: loss=0.000, reward_mean=0.1, reward_bound=0.0
123014: loss=0.000, reward_mean=0.2, reward_bound=0.0
123015: loss=0.000, reward_mean=0.0, reward_bound=0.0
123016: loss=0.000, reward_mean=0.1, reward_bound=0.0
123017: loss=0.000, reward_mean=0.1, reward_bound=0.0
123018: loss=0.000, reward_mean=0.0, reward_bound=0.0
123019: loss=0.000, reward_mean=0.0, reward_bound=0.0
123020: loss=0.000, reward_mean=0.0, reward_bound=0.0
123021: loss=0.000, reward_mean=0.0, reward_bound=0.0
123022: loss=0.000, reward_mean=0.1, reward_bound=0.0
123023: loss=0.000, reward_mean=0.1, reward_bound=0.0
123024: loss=0.000, reward_mean=0.2, reward_bound=0.0
123025: loss=0.000, reward_mean=0.1, reward_bound=0.0
123026: loss=0.000, reward_mean=0.0, reward_bound=0.0
123027: loss=0.000, reward_mean=0.1, reward_bound=0.0
123028: loss=0.000, reward_mean=0.0, reward_bound=0.0
123029: loss=0.000, reward_m

123168: loss=0.000, reward_mean=0.0, reward_bound=0.0
123169: loss=0.000, reward_mean=0.0, reward_bound=0.0
123170: loss=0.000, reward_mean=0.1, reward_bound=0.0
123171: loss=0.000, reward_mean=0.0, reward_bound=0.0
123172: loss=0.000, reward_mean=0.0, reward_bound=0.0
123173: loss=0.000, reward_mean=0.1, reward_bound=0.0
123174: loss=0.000, reward_mean=0.0, reward_bound=0.0
123175: loss=0.000, reward_mean=0.1, reward_bound=0.0
123176: loss=0.000, reward_mean=0.0, reward_bound=0.0
123177: loss=0.000, reward_mean=0.1, reward_bound=0.0
123178: loss=0.000, reward_mean=0.1, reward_bound=0.0
123179: loss=0.000, reward_mean=0.1, reward_bound=0.0
123180: loss=0.000, reward_mean=0.1, reward_bound=0.0
123181: loss=0.000, reward_mean=0.1, reward_bound=0.0
123182: loss=0.000, reward_mean=0.1, reward_bound=0.0
123183: loss=0.000, reward_mean=0.0, reward_bound=0.0
123184: loss=0.000, reward_mean=0.1, reward_bound=0.0
123185: loss=0.000, reward_mean=0.1, reward_bound=0.0
123186: loss=0.000, reward_m

123322: loss=0.000, reward_mean=0.1, reward_bound=0.0
123323: loss=0.000, reward_mean=0.0, reward_bound=0.0
123324: loss=0.000, reward_mean=0.0, reward_bound=0.0
123325: loss=0.000, reward_mean=0.1, reward_bound=0.0
123326: loss=0.000, reward_mean=0.1, reward_bound=0.0
123327: loss=0.000, reward_mean=0.0, reward_bound=0.0
123328: loss=0.000, reward_mean=0.1, reward_bound=0.0
123329: loss=0.000, reward_mean=0.1, reward_bound=0.0
123330: loss=0.000, reward_mean=0.1, reward_bound=0.0
123331: loss=0.000, reward_mean=0.1, reward_bound=0.0
123332: loss=0.000, reward_mean=0.0, reward_bound=0.0
123333: loss=0.000, reward_mean=0.1, reward_bound=0.0
123334: loss=0.000, reward_mean=0.0, reward_bound=0.0
123335: loss=0.000, reward_mean=0.0, reward_bound=0.0
123336: loss=0.000, reward_mean=0.1, reward_bound=0.0
123337: loss=0.000, reward_mean=0.0, reward_bound=0.0
123338: loss=0.000, reward_mean=0.1, reward_bound=0.0
123339: loss=0.000, reward_mean=0.0, reward_bound=0.0
123340: loss=0.000, reward_m

123476: loss=0.000, reward_mean=0.1, reward_bound=0.0
123477: loss=0.000, reward_mean=0.1, reward_bound=0.0
123478: loss=0.000, reward_mean=0.1, reward_bound=0.0
123479: loss=0.000, reward_mean=0.1, reward_bound=0.0
123480: loss=0.000, reward_mean=0.1, reward_bound=0.0
123481: loss=0.000, reward_mean=0.0, reward_bound=0.0
123482: loss=0.000, reward_mean=0.1, reward_bound=0.0
123483: loss=0.000, reward_mean=0.1, reward_bound=0.0
123484: loss=0.000, reward_mean=0.1, reward_bound=0.0
123485: loss=0.000, reward_mean=0.0, reward_bound=0.0
123486: loss=0.000, reward_mean=0.1, reward_bound=0.0
123487: loss=0.000, reward_mean=0.1, reward_bound=0.0
123488: loss=0.000, reward_mean=0.0, reward_bound=0.0
123489: loss=0.000, reward_mean=0.0, reward_bound=0.0
123490: loss=0.000, reward_mean=0.0, reward_bound=0.0
123491: loss=0.000, reward_mean=0.1, reward_bound=0.0
123492: loss=0.000, reward_mean=0.0, reward_bound=0.0
123493: loss=0.000, reward_mean=0.1, reward_bound=0.0
123494: loss=0.000, reward_m

123632: loss=0.000, reward_mean=0.0, reward_bound=0.0
123633: loss=0.000, reward_mean=0.1, reward_bound=0.0
123634: loss=0.000, reward_mean=0.0, reward_bound=0.0
123635: loss=0.000, reward_mean=0.0, reward_bound=0.0
123636: loss=0.000, reward_mean=0.1, reward_bound=0.0
123637: loss=0.000, reward_mean=0.0, reward_bound=0.0
123638: loss=0.000, reward_mean=0.1, reward_bound=0.0
123639: loss=0.000, reward_mean=0.1, reward_bound=0.0
123640: loss=0.000, reward_mean=0.0, reward_bound=0.0
123641: loss=0.000, reward_mean=0.0, reward_bound=0.0
123642: loss=0.000, reward_mean=0.1, reward_bound=0.0
123643: loss=0.000, reward_mean=0.1, reward_bound=0.0
123644: loss=0.000, reward_mean=0.0, reward_bound=0.0
123645: loss=0.000, reward_mean=0.0, reward_bound=0.0
123646: loss=0.000, reward_mean=0.1, reward_bound=0.0
123647: loss=0.000, reward_mean=0.1, reward_bound=0.0
123648: loss=0.000, reward_mean=0.0, reward_bound=0.0
123649: loss=0.000, reward_mean=0.1, reward_bound=0.0
123650: loss=0.000, reward_m

123785: loss=0.000, reward_mean=0.1, reward_bound=0.0
123786: loss=0.000, reward_mean=0.0, reward_bound=0.0
123787: loss=0.000, reward_mean=0.1, reward_bound=0.0
123788: loss=0.000, reward_mean=0.0, reward_bound=0.0
123789: loss=0.000, reward_mean=0.0, reward_bound=0.0
123790: loss=0.000, reward_mean=0.1, reward_bound=0.0
123791: loss=0.000, reward_mean=0.1, reward_bound=0.0
123792: loss=0.000, reward_mean=0.0, reward_bound=0.0
123793: loss=0.000, reward_mean=0.1, reward_bound=0.0
123794: loss=0.000, reward_mean=0.0, reward_bound=0.0
123795: loss=0.000, reward_mean=0.0, reward_bound=0.0
123796: loss=0.000, reward_mean=0.0, reward_bound=0.0
123797: loss=0.000, reward_mean=0.1, reward_bound=0.0
123798: loss=0.000, reward_mean=0.0, reward_bound=0.0
123799: loss=0.000, reward_mean=0.0, reward_bound=0.0
123800: loss=0.000, reward_mean=0.1, reward_bound=0.0
123801: loss=0.000, reward_mean=0.1, reward_bound=0.0
123802: loss=0.000, reward_mean=0.0, reward_bound=0.0
123803: loss=0.000, reward_m

123937: loss=0.000, reward_mean=0.1, reward_bound=0.0
123938: loss=0.000, reward_mean=0.0, reward_bound=0.0
123939: loss=0.000, reward_mean=0.1, reward_bound=0.0
123940: loss=0.000, reward_mean=0.0, reward_bound=0.0
123941: loss=0.000, reward_mean=0.2, reward_bound=0.0
123942: loss=0.000, reward_mean=0.1, reward_bound=0.0
123943: loss=0.000, reward_mean=0.0, reward_bound=0.0
123944: loss=0.000, reward_mean=0.1, reward_bound=0.0
123945: loss=0.000, reward_mean=0.0, reward_bound=0.0
123946: loss=0.000, reward_mean=0.1, reward_bound=0.0
123947: loss=0.000, reward_mean=0.0, reward_bound=0.0
123948: loss=0.000, reward_mean=0.1, reward_bound=0.0
123949: loss=0.000, reward_mean=0.1, reward_bound=0.0
123950: loss=0.000, reward_mean=0.0, reward_bound=0.0
123951: loss=0.000, reward_mean=0.0, reward_bound=0.0
123952: loss=0.000, reward_mean=0.0, reward_bound=0.0
123953: loss=0.000, reward_mean=0.1, reward_bound=0.0
123954: loss=0.000, reward_mean=0.0, reward_bound=0.0
123955: loss=0.000, reward_m

124090: loss=0.000, reward_mean=0.0, reward_bound=0.0
124091: loss=0.000, reward_mean=0.2, reward_bound=0.0
124092: loss=0.000, reward_mean=0.1, reward_bound=0.0
124093: loss=0.000, reward_mean=0.0, reward_bound=0.0
124094: loss=0.000, reward_mean=0.1, reward_bound=0.0
124095: loss=0.000, reward_mean=0.0, reward_bound=0.0
124096: loss=0.000, reward_mean=0.1, reward_bound=0.0
124097: loss=0.000, reward_mean=0.0, reward_bound=0.0
124098: loss=0.000, reward_mean=0.1, reward_bound=0.0
124099: loss=0.000, reward_mean=0.2, reward_bound=0.0
124100: loss=0.000, reward_mean=0.0, reward_bound=0.0
124101: loss=0.000, reward_mean=0.1, reward_bound=0.0
124102: loss=0.000, reward_mean=0.0, reward_bound=0.0
124103: loss=0.000, reward_mean=0.0, reward_bound=0.0
124104: loss=0.000, reward_mean=0.1, reward_bound=0.0
124105: loss=0.000, reward_mean=0.1, reward_bound=0.0
124106: loss=0.000, reward_mean=0.1, reward_bound=0.0
124107: loss=0.000, reward_mean=0.0, reward_bound=0.0
124108: loss=0.000, reward_m

124246: loss=0.000, reward_mean=0.0, reward_bound=0.0
124247: loss=0.000, reward_mean=0.0, reward_bound=0.0
124248: loss=0.000, reward_mean=0.1, reward_bound=0.0
124249: loss=0.000, reward_mean=0.1, reward_bound=0.0
124250: loss=0.000, reward_mean=0.1, reward_bound=0.0
124251: loss=0.000, reward_mean=0.1, reward_bound=0.0
124252: loss=0.000, reward_mean=0.1, reward_bound=0.0
124253: loss=0.000, reward_mean=0.1, reward_bound=0.0
124254: loss=0.000, reward_mean=0.0, reward_bound=0.0
124255: loss=0.000, reward_mean=0.0, reward_bound=0.0
124256: loss=0.000, reward_mean=0.2, reward_bound=0.0
124257: loss=0.000, reward_mean=0.0, reward_bound=0.0
124258: loss=0.000, reward_mean=0.0, reward_bound=0.0
124259: loss=0.000, reward_mean=0.1, reward_bound=0.0
124260: loss=0.000, reward_mean=0.0, reward_bound=0.0
124261: loss=0.000, reward_mean=0.1, reward_bound=0.0
124262: loss=0.000, reward_mean=0.0, reward_bound=0.0
124263: loss=0.000, reward_mean=0.0, reward_bound=0.0
124264: loss=0.000, reward_m

124398: loss=0.000, reward_mean=0.1, reward_bound=0.0
124399: loss=0.000, reward_mean=0.1, reward_bound=0.0
124400: loss=0.000, reward_mean=0.0, reward_bound=0.0
124401: loss=0.000, reward_mean=0.0, reward_bound=0.0
124402: loss=0.000, reward_mean=0.0, reward_bound=0.0
124403: loss=0.000, reward_mean=0.0, reward_bound=0.0
124404: loss=0.000, reward_mean=0.0, reward_bound=0.0
124405: loss=0.000, reward_mean=0.0, reward_bound=0.0
124406: loss=0.000, reward_mean=0.1, reward_bound=0.0
124407: loss=0.000, reward_mean=0.0, reward_bound=0.0
124408: loss=0.000, reward_mean=0.0, reward_bound=0.0
124409: loss=0.000, reward_mean=0.0, reward_bound=0.0
124410: loss=0.000, reward_mean=0.1, reward_bound=0.0
124411: loss=0.000, reward_mean=0.1, reward_bound=0.0
124412: loss=0.000, reward_mean=0.0, reward_bound=0.0
124413: loss=0.000, reward_mean=0.1, reward_bound=0.0
124414: loss=0.000, reward_mean=0.2, reward_bound=0.0
124415: loss=0.000, reward_mean=0.0, reward_bound=0.0
124416: loss=0.000, reward_m

124557: loss=0.000, reward_mean=0.0, reward_bound=0.0
124558: loss=0.000, reward_mean=0.0, reward_bound=0.0
124559: loss=0.000, reward_mean=0.2, reward_bound=0.0
124560: loss=0.000, reward_mean=0.1, reward_bound=0.0
124561: loss=0.000, reward_mean=0.1, reward_bound=0.0
124562: loss=0.000, reward_mean=0.0, reward_bound=0.0
124563: loss=0.000, reward_mean=0.0, reward_bound=0.0
124564: loss=0.000, reward_mean=0.1, reward_bound=0.0
124565: loss=0.000, reward_mean=0.0, reward_bound=0.0
124566: loss=0.000, reward_mean=0.1, reward_bound=0.0
124567: loss=0.000, reward_mean=0.1, reward_bound=0.0
124568: loss=0.000, reward_mean=0.1, reward_bound=0.0
124569: loss=0.000, reward_mean=0.1, reward_bound=0.0
124570: loss=0.000, reward_mean=0.1, reward_bound=0.0
124571: loss=0.000, reward_mean=0.0, reward_bound=0.0
124572: loss=0.000, reward_mean=0.1, reward_bound=0.0
124573: loss=0.000, reward_mean=0.0, reward_bound=0.0
124574: loss=0.000, reward_mean=0.0, reward_bound=0.0
124575: loss=0.000, reward_m

124713: loss=0.000, reward_mean=0.1, reward_bound=0.0
124714: loss=0.000, reward_mean=0.1, reward_bound=0.0
124715: loss=0.000, reward_mean=0.0, reward_bound=0.0
124716: loss=0.000, reward_mean=0.0, reward_bound=0.0
124717: loss=0.000, reward_mean=0.0, reward_bound=0.0
124718: loss=0.000, reward_mean=0.0, reward_bound=0.0
124719: loss=0.000, reward_mean=0.1, reward_bound=0.0
124720: loss=0.000, reward_mean=0.1, reward_bound=0.0
124721: loss=0.000, reward_mean=0.1, reward_bound=0.0
124722: loss=0.000, reward_mean=0.0, reward_bound=0.0
124723: loss=0.000, reward_mean=0.0, reward_bound=0.0
124724: loss=0.000, reward_mean=0.0, reward_bound=0.0
124725: loss=0.000, reward_mean=0.0, reward_bound=0.0
124726: loss=0.000, reward_mean=0.1, reward_bound=0.0
124727: loss=0.000, reward_mean=0.1, reward_bound=0.0
124728: loss=0.000, reward_mean=0.0, reward_bound=0.0
124729: loss=0.000, reward_mean=0.1, reward_bound=0.0
124730: loss=0.000, reward_mean=0.1, reward_bound=0.0
124731: loss=0.000, reward_m

124867: loss=0.000, reward_mean=0.1, reward_bound=0.0
124868: loss=0.000, reward_mean=0.1, reward_bound=0.0
124869: loss=0.000, reward_mean=0.0, reward_bound=0.0
124870: loss=0.000, reward_mean=0.1, reward_bound=0.0
124871: loss=0.000, reward_mean=0.0, reward_bound=0.0
124872: loss=0.000, reward_mean=0.1, reward_bound=0.0
124873: loss=0.000, reward_mean=0.1, reward_bound=0.0
124874: loss=0.000, reward_mean=0.0, reward_bound=0.0
124875: loss=0.000, reward_mean=0.1, reward_bound=0.0
124876: loss=0.000, reward_mean=0.2, reward_bound=0.0
124877: loss=0.000, reward_mean=0.1, reward_bound=0.0
124878: loss=0.000, reward_mean=0.0, reward_bound=0.0
124879: loss=0.000, reward_mean=0.1, reward_bound=0.0
124880: loss=0.000, reward_mean=0.0, reward_bound=0.0
124881: loss=0.000, reward_mean=0.0, reward_bound=0.0
124882: loss=0.000, reward_mean=0.1, reward_bound=0.0
124883: loss=0.000, reward_mean=0.2, reward_bound=0.0
124884: loss=0.000, reward_mean=0.1, reward_bound=0.0
124885: loss=0.000, reward_m

125021: loss=0.000, reward_mean=0.1, reward_bound=0.0
125022: loss=0.000, reward_mean=0.1, reward_bound=0.0
125023: loss=0.000, reward_mean=0.2, reward_bound=0.0
125024: loss=0.000, reward_mean=0.1, reward_bound=0.0
125025: loss=0.000, reward_mean=0.0, reward_bound=0.0
125026: loss=0.000, reward_mean=0.1, reward_bound=0.0
125027: loss=0.000, reward_mean=0.1, reward_bound=0.0
125028: loss=0.000, reward_mean=0.1, reward_bound=0.0
125029: loss=0.000, reward_mean=0.1, reward_bound=0.0
125030: loss=0.000, reward_mean=0.0, reward_bound=0.0
125031: loss=0.000, reward_mean=0.1, reward_bound=0.0
125032: loss=0.000, reward_mean=0.1, reward_bound=0.0
125033: loss=0.000, reward_mean=0.0, reward_bound=0.0
125034: loss=0.000, reward_mean=0.1, reward_bound=0.0
125035: loss=0.000, reward_mean=0.1, reward_bound=0.0
125036: loss=0.000, reward_mean=0.1, reward_bound=0.0
125037: loss=0.000, reward_mean=0.0, reward_bound=0.0
125038: loss=0.000, reward_mean=0.0, reward_bound=0.0
125039: loss=0.000, reward_m

125178: loss=0.000, reward_mean=0.1, reward_bound=0.0
125179: loss=0.000, reward_mean=0.2, reward_bound=0.0
125180: loss=0.000, reward_mean=0.0, reward_bound=0.0
125181: loss=0.000, reward_mean=0.0, reward_bound=0.0
125182: loss=0.000, reward_mean=0.0, reward_bound=0.0
125183: loss=0.000, reward_mean=0.0, reward_bound=0.0
125184: loss=0.000, reward_mean=0.0, reward_bound=0.0
125185: loss=0.000, reward_mean=0.1, reward_bound=0.0
125186: loss=0.000, reward_mean=0.2, reward_bound=0.0
125187: loss=0.000, reward_mean=0.1, reward_bound=0.0
125188: loss=0.000, reward_mean=0.0, reward_bound=0.0
125189: loss=0.000, reward_mean=0.0, reward_bound=0.0
125190: loss=0.000, reward_mean=0.0, reward_bound=0.0
125191: loss=0.000, reward_mean=0.0, reward_bound=0.0
125192: loss=0.000, reward_mean=0.0, reward_bound=0.0
125193: loss=0.000, reward_mean=0.2, reward_bound=0.0
125194: loss=0.000, reward_mean=0.0, reward_bound=0.0
125195: loss=0.000, reward_mean=0.1, reward_bound=0.0
125196: loss=0.000, reward_m

125330: loss=0.000, reward_mean=0.1, reward_bound=0.0
125331: loss=0.000, reward_mean=0.1, reward_bound=0.0
125332: loss=0.000, reward_mean=0.0, reward_bound=0.0
125333: loss=0.000, reward_mean=0.0, reward_bound=0.0
125334: loss=0.000, reward_mean=0.1, reward_bound=0.0
125335: loss=0.000, reward_mean=0.0, reward_bound=0.0
125336: loss=0.000, reward_mean=0.0, reward_bound=0.0
125337: loss=0.000, reward_mean=0.1, reward_bound=0.0
125338: loss=0.000, reward_mean=0.1, reward_bound=0.0
125339: loss=0.000, reward_mean=0.0, reward_bound=0.0
125340: loss=0.000, reward_mean=0.1, reward_bound=0.0
125341: loss=0.000, reward_mean=0.1, reward_bound=0.0
125342: loss=0.000, reward_mean=0.1, reward_bound=0.0
125343: loss=0.000, reward_mean=0.1, reward_bound=0.0
125344: loss=0.000, reward_mean=0.1, reward_bound=0.0
125345: loss=0.000, reward_mean=0.2, reward_bound=0.0
125346: loss=0.000, reward_mean=0.1, reward_bound=0.0
125347: loss=0.000, reward_mean=0.1, reward_bound=0.0
125348: loss=0.000, reward_m

125482: loss=0.000, reward_mean=0.1, reward_bound=0.0
125483: loss=0.000, reward_mean=0.1, reward_bound=0.0
125484: loss=0.000, reward_mean=0.0, reward_bound=0.0
125485: loss=0.000, reward_mean=0.0, reward_bound=0.0
125486: loss=0.000, reward_mean=0.1, reward_bound=0.0
125487: loss=0.000, reward_mean=0.1, reward_bound=0.0
125488: loss=0.000, reward_mean=0.1, reward_bound=0.0
125489: loss=0.000, reward_mean=0.1, reward_bound=0.0
125490: loss=0.000, reward_mean=0.2, reward_bound=0.0
125491: loss=0.000, reward_mean=0.1, reward_bound=0.0
125492: loss=0.000, reward_mean=0.1, reward_bound=0.0
125493: loss=0.000, reward_mean=0.0, reward_bound=0.0
125494: loss=0.000, reward_mean=0.1, reward_bound=0.0
125495: loss=0.000, reward_mean=0.0, reward_bound=0.0
125496: loss=0.000, reward_mean=0.1, reward_bound=0.0
125497: loss=0.000, reward_mean=0.1, reward_bound=0.0
125498: loss=0.000, reward_mean=0.1, reward_bound=0.0
125499: loss=0.000, reward_mean=0.1, reward_bound=0.0
125500: loss=0.000, reward_m

125633: loss=0.000, reward_mean=0.0, reward_bound=0.0
125634: loss=0.000, reward_mean=0.1, reward_bound=0.0
125635: loss=0.000, reward_mean=0.1, reward_bound=0.0
125636: loss=0.000, reward_mean=0.1, reward_bound=0.0
125637: loss=0.000, reward_mean=0.0, reward_bound=0.0
125638: loss=0.000, reward_mean=0.0, reward_bound=0.0
125639: loss=0.000, reward_mean=0.1, reward_bound=0.0
125640: loss=0.000, reward_mean=0.1, reward_bound=0.0
125641: loss=0.000, reward_mean=0.1, reward_bound=0.0
125642: loss=0.000, reward_mean=0.0, reward_bound=0.0
125643: loss=0.000, reward_mean=0.1, reward_bound=0.0
125644: loss=0.000, reward_mean=0.1, reward_bound=0.0
125645: loss=0.000, reward_mean=0.1, reward_bound=0.0
125646: loss=0.000, reward_mean=0.1, reward_bound=0.0
125647: loss=0.000, reward_mean=0.1, reward_bound=0.0
125648: loss=0.000, reward_mean=0.1, reward_bound=0.0
125649: loss=0.000, reward_mean=0.1, reward_bound=0.0
125650: loss=0.000, reward_mean=0.1, reward_bound=0.0
125651: loss=0.000, reward_m

125786: loss=0.000, reward_mean=0.0, reward_bound=0.0
125787: loss=0.000, reward_mean=0.0, reward_bound=0.0
125788: loss=0.000, reward_mean=0.0, reward_bound=0.0
125789: loss=0.000, reward_mean=0.1, reward_bound=0.0
125790: loss=0.000, reward_mean=0.1, reward_bound=0.0
125791: loss=0.000, reward_mean=0.0, reward_bound=0.0
125792: loss=0.000, reward_mean=0.1, reward_bound=0.0
125793: loss=0.000, reward_mean=0.1, reward_bound=0.0
125794: loss=0.000, reward_mean=0.0, reward_bound=0.0
125795: loss=0.000, reward_mean=0.0, reward_bound=0.0
125796: loss=0.000, reward_mean=0.1, reward_bound=0.0
125797: loss=0.000, reward_mean=0.0, reward_bound=0.0
125798: loss=0.000, reward_mean=0.0, reward_bound=0.0
125799: loss=0.000, reward_mean=0.1, reward_bound=0.0
125800: loss=0.000, reward_mean=0.1, reward_bound=0.0
125801: loss=0.000, reward_mean=0.0, reward_bound=0.0
125802: loss=0.000, reward_mean=0.0, reward_bound=0.0
125803: loss=0.000, reward_mean=0.1, reward_bound=0.0
125804: loss=0.000, reward_m

125939: loss=0.000, reward_mean=0.0, reward_bound=0.0
125940: loss=0.000, reward_mean=0.0, reward_bound=0.0
125941: loss=0.000, reward_mean=0.0, reward_bound=0.0
125942: loss=0.000, reward_mean=0.0, reward_bound=0.0
125943: loss=0.000, reward_mean=0.0, reward_bound=0.0
125944: loss=0.000, reward_mean=0.1, reward_bound=0.0
125945: loss=0.000, reward_mean=0.0, reward_bound=0.0
125946: loss=0.000, reward_mean=0.0, reward_bound=0.0
125947: loss=0.000, reward_mean=0.1, reward_bound=0.0
125948: loss=0.000, reward_mean=0.1, reward_bound=0.0
125949: loss=0.000, reward_mean=0.1, reward_bound=0.0
125950: loss=0.000, reward_mean=0.1, reward_bound=0.0
125951: loss=0.000, reward_mean=0.1, reward_bound=0.0
125952: loss=0.000, reward_mean=0.1, reward_bound=0.0
125953: loss=0.000, reward_mean=0.1, reward_bound=0.0
125954: loss=0.000, reward_mean=0.1, reward_bound=0.0
125955: loss=0.000, reward_mean=0.0, reward_bound=0.0
125956: loss=0.000, reward_mean=0.1, reward_bound=0.0
125957: loss=0.000, reward_m

126091: loss=0.000, reward_mean=0.2, reward_bound=0.0
126092: loss=0.000, reward_mean=0.0, reward_bound=0.0
126093: loss=0.000, reward_mean=0.1, reward_bound=0.0
126094: loss=0.000, reward_mean=0.0, reward_bound=0.0
126095: loss=0.000, reward_mean=0.1, reward_bound=0.0
126096: loss=0.000, reward_mean=0.1, reward_bound=0.0
126097: loss=0.000, reward_mean=0.1, reward_bound=0.0
126098: loss=0.000, reward_mean=0.1, reward_bound=0.0
126099: loss=0.000, reward_mean=0.0, reward_bound=0.0
126100: loss=0.000, reward_mean=0.0, reward_bound=0.0
126101: loss=0.000, reward_mean=0.0, reward_bound=0.0
126102: loss=0.000, reward_mean=0.0, reward_bound=0.0
126103: loss=0.000, reward_mean=0.0, reward_bound=0.0
126104: loss=0.000, reward_mean=0.1, reward_bound=0.0
126105: loss=0.000, reward_mean=0.2, reward_bound=0.0
126106: loss=0.000, reward_mean=0.1, reward_bound=0.0
126107: loss=0.000, reward_mean=0.0, reward_bound=0.0
126108: loss=0.000, reward_mean=0.0, reward_bound=0.0
126109: loss=0.000, reward_m

126245: loss=0.000, reward_mean=0.0, reward_bound=0.0
126246: loss=0.000, reward_mean=0.0, reward_bound=0.0
126247: loss=0.000, reward_mean=0.1, reward_bound=0.0
126248: loss=0.000, reward_mean=0.2, reward_bound=0.0
126249: loss=0.000, reward_mean=0.0, reward_bound=0.0
126250: loss=0.000, reward_mean=0.0, reward_bound=0.0
126251: loss=0.000, reward_mean=0.0, reward_bound=0.0
126252: loss=0.000, reward_mean=0.1, reward_bound=0.0
126253: loss=0.000, reward_mean=0.0, reward_bound=0.0
126254: loss=0.000, reward_mean=0.1, reward_bound=0.0
126255: loss=0.000, reward_mean=0.2, reward_bound=0.0
126256: loss=0.000, reward_mean=0.0, reward_bound=0.0
126257: loss=0.000, reward_mean=0.0, reward_bound=0.0
126258: loss=0.000, reward_mean=0.1, reward_bound=0.0
126259: loss=0.000, reward_mean=0.0, reward_bound=0.0
126260: loss=0.000, reward_mean=0.0, reward_bound=0.0
126261: loss=0.000, reward_mean=0.0, reward_bound=0.0
126262: loss=0.000, reward_mean=0.0, reward_bound=0.0
126263: loss=0.000, reward_m

126397: loss=0.000, reward_mean=0.0, reward_bound=0.0
126398: loss=0.000, reward_mean=0.1, reward_bound=0.0
126399: loss=0.000, reward_mean=0.0, reward_bound=0.0
126400: loss=0.000, reward_mean=0.1, reward_bound=0.0
126401: loss=0.000, reward_mean=0.1, reward_bound=0.0
126402: loss=0.000, reward_mean=0.1, reward_bound=0.0
126403: loss=0.000, reward_mean=0.0, reward_bound=0.0
126404: loss=0.000, reward_mean=0.1, reward_bound=0.0
126405: loss=0.000, reward_mean=0.2, reward_bound=0.0
126406: loss=0.000, reward_mean=0.0, reward_bound=0.0
126407: loss=0.000, reward_mean=0.0, reward_bound=0.0
126408: loss=0.000, reward_mean=0.0, reward_bound=0.0
126409: loss=0.000, reward_mean=0.0, reward_bound=0.0
126410: loss=0.000, reward_mean=0.0, reward_bound=0.0
126411: loss=0.000, reward_mean=0.1, reward_bound=0.0
126412: loss=0.000, reward_mean=0.0, reward_bound=0.0
126413: loss=0.000, reward_mean=0.1, reward_bound=0.0
126414: loss=0.000, reward_mean=0.1, reward_bound=0.0
126415: loss=0.000, reward_m

126554: loss=0.000, reward_mean=0.1, reward_bound=0.0
126555: loss=0.000, reward_mean=0.0, reward_bound=0.0
126556: loss=0.000, reward_mean=0.1, reward_bound=0.0
126557: loss=0.000, reward_mean=0.1, reward_bound=0.0
126558: loss=0.000, reward_mean=0.0, reward_bound=0.0
126559: loss=0.000, reward_mean=0.1, reward_bound=0.0
126560: loss=0.000, reward_mean=0.0, reward_bound=0.0
126561: loss=0.000, reward_mean=0.1, reward_bound=0.0
126562: loss=0.000, reward_mean=0.0, reward_bound=0.0
126563: loss=0.000, reward_mean=0.0, reward_bound=0.0
126564: loss=0.000, reward_mean=0.1, reward_bound=0.0
126565: loss=0.000, reward_mean=0.1, reward_bound=0.0
126566: loss=0.000, reward_mean=0.0, reward_bound=0.0
126567: loss=0.000, reward_mean=0.2, reward_bound=0.0
126568: loss=0.000, reward_mean=0.1, reward_bound=0.0
126569: loss=0.000, reward_mean=0.1, reward_bound=0.0
126570: loss=0.000, reward_mean=0.0, reward_bound=0.0
126571: loss=0.000, reward_mean=0.0, reward_bound=0.0
126572: loss=0.000, reward_m

126707: loss=0.000, reward_mean=0.0, reward_bound=0.0
126708: loss=0.000, reward_mean=0.2, reward_bound=0.0
126709: loss=0.000, reward_mean=0.0, reward_bound=0.0
126710: loss=0.000, reward_mean=0.0, reward_bound=0.0
126711: loss=0.000, reward_mean=0.0, reward_bound=0.0
126712: loss=0.000, reward_mean=0.0, reward_bound=0.0
126713: loss=0.000, reward_mean=0.0, reward_bound=0.0
126714: loss=0.000, reward_mean=0.1, reward_bound=0.0
126715: loss=0.000, reward_mean=0.1, reward_bound=0.0
126716: loss=0.000, reward_mean=0.1, reward_bound=0.0
126717: loss=0.000, reward_mean=0.0, reward_bound=0.0
126718: loss=0.000, reward_mean=0.1, reward_bound=0.0
126719: loss=0.000, reward_mean=0.1, reward_bound=0.0
126720: loss=0.000, reward_mean=0.1, reward_bound=0.0
126721: loss=0.000, reward_mean=0.0, reward_bound=0.0
126722: loss=0.000, reward_mean=0.1, reward_bound=0.0
126723: loss=0.000, reward_mean=0.0, reward_bound=0.0
126724: loss=0.000, reward_mean=0.1, reward_bound=0.0
126725: loss=0.000, reward_m

126861: loss=0.000, reward_mean=0.1, reward_bound=0.0
126862: loss=0.000, reward_mean=0.1, reward_bound=0.0
126863: loss=0.000, reward_mean=0.0, reward_bound=0.0
126864: loss=0.000, reward_mean=0.0, reward_bound=0.0
126865: loss=0.000, reward_mean=0.1, reward_bound=0.0
126866: loss=0.000, reward_mean=0.0, reward_bound=0.0
126867: loss=0.000, reward_mean=0.2, reward_bound=0.0
126868: loss=0.000, reward_mean=0.1, reward_bound=0.0
126869: loss=0.000, reward_mean=0.0, reward_bound=0.0
126870: loss=0.000, reward_mean=0.1, reward_bound=0.0
126871: loss=0.000, reward_mean=0.0, reward_bound=0.0
126872: loss=0.000, reward_mean=0.1, reward_bound=0.0
126873: loss=0.000, reward_mean=0.1, reward_bound=0.0
126874: loss=0.000, reward_mean=0.0, reward_bound=0.0
126875: loss=0.000, reward_mean=0.1, reward_bound=0.0
126876: loss=0.000, reward_mean=0.1, reward_bound=0.0
126877: loss=0.000, reward_mean=0.1, reward_bound=0.0
126878: loss=0.000, reward_mean=0.1, reward_bound=0.0
126879: loss=0.000, reward_m

127016: loss=0.000, reward_mean=0.1, reward_bound=0.0
127017: loss=0.000, reward_mean=0.0, reward_bound=0.0
127018: loss=0.000, reward_mean=0.1, reward_bound=0.0
127019: loss=0.000, reward_mean=0.0, reward_bound=0.0
127020: loss=0.000, reward_mean=0.1, reward_bound=0.0
127021: loss=0.000, reward_mean=0.1, reward_bound=0.0
127022: loss=0.000, reward_mean=0.0, reward_bound=0.0
127023: loss=0.000, reward_mean=0.0, reward_bound=0.0
127024: loss=0.000, reward_mean=0.1, reward_bound=0.0
127025: loss=0.000, reward_mean=0.1, reward_bound=0.0
127026: loss=0.000, reward_mean=0.0, reward_bound=0.0
127027: loss=0.000, reward_mean=0.1, reward_bound=0.0
127028: loss=0.000, reward_mean=0.0, reward_bound=0.0
127029: loss=0.000, reward_mean=0.0, reward_bound=0.0
127030: loss=0.000, reward_mean=0.1, reward_bound=0.0
127031: loss=0.000, reward_mean=0.0, reward_bound=0.0
127032: loss=0.000, reward_mean=0.1, reward_bound=0.0
127033: loss=0.000, reward_mean=0.0, reward_bound=0.0
127034: loss=0.000, reward_m

127168: loss=0.000, reward_mean=0.0, reward_bound=0.0
127169: loss=0.000, reward_mean=0.1, reward_bound=0.0
127170: loss=0.000, reward_mean=0.1, reward_bound=0.0
127171: loss=0.000, reward_mean=0.0, reward_bound=0.0
127172: loss=0.000, reward_mean=0.1, reward_bound=0.0
127173: loss=0.000, reward_mean=0.1, reward_bound=0.0
127174: loss=0.000, reward_mean=0.0, reward_bound=0.0
127175: loss=0.000, reward_mean=0.0, reward_bound=0.0
127176: loss=0.000, reward_mean=0.1, reward_bound=0.0
127177: loss=0.000, reward_mean=0.2, reward_bound=0.0
127178: loss=0.000, reward_mean=0.1, reward_bound=0.0
127179: loss=0.000, reward_mean=0.1, reward_bound=0.0
127180: loss=0.000, reward_mean=0.1, reward_bound=0.0
127181: loss=0.000, reward_mean=0.2, reward_bound=0.0
127182: loss=0.000, reward_mean=0.1, reward_bound=0.0
127183: loss=0.000, reward_mean=0.0, reward_bound=0.0
127184: loss=0.000, reward_mean=0.0, reward_bound=0.0
127185: loss=0.000, reward_mean=0.1, reward_bound=0.0
127186: loss=0.000, reward_m

127320: loss=0.000, reward_mean=0.1, reward_bound=0.0
127321: loss=0.000, reward_mean=0.1, reward_bound=0.0
127322: loss=0.000, reward_mean=0.1, reward_bound=0.0
127323: loss=0.000, reward_mean=0.1, reward_bound=0.0
127324: loss=0.000, reward_mean=0.0, reward_bound=0.0
127325: loss=0.000, reward_mean=0.1, reward_bound=0.0
127326: loss=0.000, reward_mean=0.0, reward_bound=0.0
127327: loss=0.000, reward_mean=0.0, reward_bound=0.0
127328: loss=0.000, reward_mean=0.2, reward_bound=0.0
127329: loss=0.000, reward_mean=0.0, reward_bound=0.0
127330: loss=0.000, reward_mean=0.0, reward_bound=0.0
127331: loss=0.000, reward_mean=0.1, reward_bound=0.0
127332: loss=0.000, reward_mean=0.0, reward_bound=0.0
127333: loss=0.000, reward_mean=0.0, reward_bound=0.0
127334: loss=0.000, reward_mean=0.0, reward_bound=0.0
127335: loss=0.000, reward_mean=0.1, reward_bound=0.0
127336: loss=0.000, reward_mean=0.1, reward_bound=0.0
127337: loss=0.000, reward_mean=0.0, reward_bound=0.0
127338: loss=0.000, reward_m

127476: loss=0.000, reward_mean=0.0, reward_bound=0.0
127477: loss=0.000, reward_mean=0.1, reward_bound=0.0
127478: loss=0.000, reward_mean=0.1, reward_bound=0.0
127479: loss=0.000, reward_mean=0.0, reward_bound=0.0
127480: loss=0.000, reward_mean=0.1, reward_bound=0.0
127481: loss=0.000, reward_mean=0.0, reward_bound=0.0
127482: loss=0.000, reward_mean=0.0, reward_bound=0.0
127483: loss=0.000, reward_mean=0.1, reward_bound=0.0
127484: loss=0.000, reward_mean=0.1, reward_bound=0.0
127485: loss=0.000, reward_mean=0.0, reward_bound=0.0
127486: loss=0.000, reward_mean=0.1, reward_bound=0.0
127487: loss=0.000, reward_mean=0.1, reward_bound=0.0
127488: loss=0.000, reward_mean=0.1, reward_bound=0.0
127489: loss=0.000, reward_mean=0.0, reward_bound=0.0
127490: loss=0.000, reward_mean=0.0, reward_bound=0.0
127491: loss=0.000, reward_mean=0.0, reward_bound=0.0
127492: loss=0.000, reward_mean=0.1, reward_bound=0.0
127493: loss=0.000, reward_mean=0.0, reward_bound=0.0
127494: loss=0.000, reward_m

127631: loss=0.000, reward_mean=0.1, reward_bound=0.0
127632: loss=0.000, reward_mean=0.0, reward_bound=0.0
127633: loss=0.000, reward_mean=0.1, reward_bound=0.0
127634: loss=0.000, reward_mean=0.1, reward_bound=0.0
127635: loss=0.000, reward_mean=0.1, reward_bound=0.0
127636: loss=0.000, reward_mean=0.0, reward_bound=0.0
127637: loss=0.000, reward_mean=0.1, reward_bound=0.0
127638: loss=0.000, reward_mean=0.1, reward_bound=0.0
127639: loss=0.000, reward_mean=0.0, reward_bound=0.0
127640: loss=0.000, reward_mean=0.1, reward_bound=0.0
127641: loss=0.000, reward_mean=0.0, reward_bound=0.0
127642: loss=0.000, reward_mean=0.1, reward_bound=0.0
127643: loss=0.000, reward_mean=0.0, reward_bound=0.0
127644: loss=0.000, reward_mean=0.0, reward_bound=0.0
127645: loss=0.000, reward_mean=0.1, reward_bound=0.0
127646: loss=0.000, reward_mean=0.1, reward_bound=0.0
127647: loss=0.000, reward_mean=0.0, reward_bound=0.0
127648: loss=0.000, reward_mean=0.2, reward_bound=0.0
127649: loss=0.000, reward_m

127786: loss=0.000, reward_mean=0.0, reward_bound=0.0
127787: loss=0.000, reward_mean=0.1, reward_bound=0.0
127788: loss=0.000, reward_mean=0.1, reward_bound=0.0
127789: loss=0.000, reward_mean=0.0, reward_bound=0.0
127790: loss=0.000, reward_mean=0.0, reward_bound=0.0
127791: loss=0.000, reward_mean=0.0, reward_bound=0.0
127792: loss=0.000, reward_mean=0.1, reward_bound=0.0
127793: loss=0.000, reward_mean=0.0, reward_bound=0.0
127794: loss=0.000, reward_mean=0.0, reward_bound=0.0
127795: loss=0.000, reward_mean=0.1, reward_bound=0.0
127796: loss=0.000, reward_mean=0.1, reward_bound=0.0
127797: loss=0.000, reward_mean=0.0, reward_bound=0.0
127798: loss=0.000, reward_mean=0.1, reward_bound=0.0
127799: loss=0.000, reward_mean=0.0, reward_bound=0.0
127800: loss=0.000, reward_mean=0.0, reward_bound=0.0
127801: loss=0.000, reward_mean=0.1, reward_bound=0.0
127802: loss=0.000, reward_mean=0.1, reward_bound=0.0
127803: loss=0.000, reward_mean=0.0, reward_bound=0.0
127804: loss=0.000, reward_m

127942: loss=0.000, reward_mean=0.1, reward_bound=0.0
127943: loss=0.000, reward_mean=0.1, reward_bound=0.0
127944: loss=0.000, reward_mean=0.1, reward_bound=0.0
127945: loss=0.000, reward_mean=0.0, reward_bound=0.0
127946: loss=0.000, reward_mean=0.0, reward_bound=0.0
127947: loss=0.000, reward_mean=0.1, reward_bound=0.0
127948: loss=0.000, reward_mean=0.1, reward_bound=0.0
127949: loss=0.000, reward_mean=0.0, reward_bound=0.0
127950: loss=0.000, reward_mean=0.1, reward_bound=0.0
127951: loss=0.000, reward_mean=0.2, reward_bound=0.0
127952: loss=0.000, reward_mean=0.0, reward_bound=0.0
127953: loss=0.000, reward_mean=0.1, reward_bound=0.0
127954: loss=0.000, reward_mean=0.0, reward_bound=0.0
127955: loss=0.000, reward_mean=0.1, reward_bound=0.0
127956: loss=0.000, reward_mean=0.0, reward_bound=0.0
127957: loss=0.000, reward_mean=0.1, reward_bound=0.0
127958: loss=0.000, reward_mean=0.1, reward_bound=0.0
127959: loss=0.000, reward_mean=0.0, reward_bound=0.0
127960: loss=0.000, reward_m

128097: loss=0.000, reward_mean=0.1, reward_bound=0.0
128098: loss=0.000, reward_mean=0.1, reward_bound=0.0
128099: loss=0.000, reward_mean=0.1, reward_bound=0.0
128100: loss=0.000, reward_mean=0.2, reward_bound=0.0
128101: loss=0.000, reward_mean=0.0, reward_bound=0.0
128102: loss=0.000, reward_mean=0.0, reward_bound=0.0
128103: loss=0.000, reward_mean=0.1, reward_bound=0.0
128104: loss=0.000, reward_mean=0.0, reward_bound=0.0
128105: loss=0.000, reward_mean=0.0, reward_bound=0.0
128106: loss=0.000, reward_mean=0.1, reward_bound=0.0
128107: loss=0.000, reward_mean=0.0, reward_bound=0.0
128108: loss=0.000, reward_mean=0.0, reward_bound=0.0
128109: loss=0.000, reward_mean=0.0, reward_bound=0.0
128110: loss=0.000, reward_mean=0.0, reward_bound=0.0
128111: loss=0.000, reward_mean=0.1, reward_bound=0.0
128112: loss=0.000, reward_mean=0.2, reward_bound=0.0
128113: loss=0.000, reward_mean=0.0, reward_bound=0.0
128114: loss=0.000, reward_mean=0.1, reward_bound=0.0
128115: loss=0.000, reward_m

128250: loss=0.000, reward_mean=0.0, reward_bound=0.0
128251: loss=0.000, reward_mean=0.1, reward_bound=0.0
128252: loss=0.000, reward_mean=0.1, reward_bound=0.0
128253: loss=0.000, reward_mean=0.2, reward_bound=0.0
128254: loss=0.000, reward_mean=0.1, reward_bound=0.0
128255: loss=0.000, reward_mean=0.1, reward_bound=0.0
128256: loss=0.000, reward_mean=0.0, reward_bound=0.0
128257: loss=0.000, reward_mean=0.1, reward_bound=0.0
128258: loss=0.000, reward_mean=0.0, reward_bound=0.0
128259: loss=0.000, reward_mean=0.0, reward_bound=0.0
128260: loss=0.000, reward_mean=0.0, reward_bound=0.0
128261: loss=0.000, reward_mean=0.1, reward_bound=0.0
128262: loss=0.000, reward_mean=0.0, reward_bound=0.0
128263: loss=0.000, reward_mean=0.1, reward_bound=0.0
128264: loss=0.000, reward_mean=0.1, reward_bound=0.0
128265: loss=0.000, reward_mean=0.0, reward_bound=0.0
128266: loss=0.000, reward_mean=0.0, reward_bound=0.0
128267: loss=0.000, reward_mean=0.0, reward_bound=0.0
128268: loss=0.000, reward_m

128404: loss=0.000, reward_mean=0.2, reward_bound=0.0
128405: loss=0.000, reward_mean=0.1, reward_bound=0.0
128406: loss=0.000, reward_mean=0.1, reward_bound=0.0
128407: loss=0.000, reward_mean=0.0, reward_bound=0.0
128408: loss=0.000, reward_mean=0.0, reward_bound=0.0
128409: loss=0.000, reward_mean=0.0, reward_bound=0.0
128410: loss=0.000, reward_mean=0.0, reward_bound=0.0
128411: loss=0.000, reward_mean=0.1, reward_bound=0.0
128412: loss=0.000, reward_mean=0.1, reward_bound=0.0
128413: loss=0.000, reward_mean=0.1, reward_bound=0.0
128414: loss=0.000, reward_mean=0.1, reward_bound=0.0
128415: loss=0.000, reward_mean=0.0, reward_bound=0.0
128416: loss=0.000, reward_mean=0.0, reward_bound=0.0
128417: loss=0.000, reward_mean=0.1, reward_bound=0.0
128418: loss=0.000, reward_mean=0.1, reward_bound=0.0
128419: loss=0.000, reward_mean=0.1, reward_bound=0.0
128420: loss=0.000, reward_mean=0.1, reward_bound=0.0
128421: loss=0.000, reward_mean=0.0, reward_bound=0.0
128422: loss=0.000, reward_m

128556: loss=0.000, reward_mean=0.2, reward_bound=0.0
128557: loss=0.000, reward_mean=0.0, reward_bound=0.0
128558: loss=0.000, reward_mean=0.0, reward_bound=0.0
128559: loss=0.000, reward_mean=0.1, reward_bound=0.0
128560: loss=0.000, reward_mean=0.0, reward_bound=0.0
128561: loss=0.000, reward_mean=0.1, reward_bound=0.0
128562: loss=0.000, reward_mean=0.0, reward_bound=0.0
128563: loss=0.000, reward_mean=0.0, reward_bound=0.0
128564: loss=0.000, reward_mean=0.1, reward_bound=0.0
128565: loss=0.000, reward_mean=0.1, reward_bound=0.0
128566: loss=0.000, reward_mean=0.1, reward_bound=0.0
128567: loss=0.000, reward_mean=0.1, reward_bound=0.0
128568: loss=0.000, reward_mean=0.1, reward_bound=0.0
128569: loss=0.000, reward_mean=0.0, reward_bound=0.0
128570: loss=0.000, reward_mean=0.0, reward_bound=0.0
128571: loss=0.000, reward_mean=0.0, reward_bound=0.0
128572: loss=0.000, reward_mean=0.1, reward_bound=0.0
128573: loss=0.000, reward_mean=0.0, reward_bound=0.0
128574: loss=0.000, reward_m

128714: loss=0.000, reward_mean=0.0, reward_bound=0.0
128715: loss=0.000, reward_mean=0.0, reward_bound=0.0
128716: loss=0.000, reward_mean=0.1, reward_bound=0.0
128717: loss=0.000, reward_mean=0.1, reward_bound=0.0
128718: loss=0.000, reward_mean=0.1, reward_bound=0.0
128719: loss=0.000, reward_mean=0.1, reward_bound=0.0
128720: loss=0.000, reward_mean=0.1, reward_bound=0.0
128721: loss=0.000, reward_mean=0.0, reward_bound=0.0
128722: loss=0.000, reward_mean=0.1, reward_bound=0.0
128723: loss=0.000, reward_mean=0.1, reward_bound=0.0
128724: loss=0.000, reward_mean=0.0, reward_bound=0.0
128725: loss=0.000, reward_mean=0.1, reward_bound=0.0
128726: loss=0.000, reward_mean=0.0, reward_bound=0.0
128727: loss=0.000, reward_mean=0.0, reward_bound=0.0
128728: loss=0.000, reward_mean=0.1, reward_bound=0.0
128729: loss=0.000, reward_mean=0.1, reward_bound=0.0
128730: loss=0.000, reward_mean=0.1, reward_bound=0.0
128731: loss=0.000, reward_mean=0.1, reward_bound=0.0
128732: loss=0.000, reward_m

128867: loss=0.000, reward_mean=0.0, reward_bound=0.0
128868: loss=0.000, reward_mean=0.1, reward_bound=0.0
128869: loss=0.000, reward_mean=0.1, reward_bound=0.0
128870: loss=0.000, reward_mean=0.1, reward_bound=0.0
128871: loss=0.000, reward_mean=0.0, reward_bound=0.0
128872: loss=0.000, reward_mean=0.1, reward_bound=0.0
128873: loss=0.000, reward_mean=0.1, reward_bound=0.0
128874: loss=0.000, reward_mean=0.1, reward_bound=0.0
128875: loss=0.000, reward_mean=0.0, reward_bound=0.0
128876: loss=0.000, reward_mean=0.1, reward_bound=0.0
128877: loss=0.000, reward_mean=0.1, reward_bound=0.0
128878: loss=0.000, reward_mean=0.1, reward_bound=0.0
128879: loss=0.000, reward_mean=0.2, reward_bound=0.0
128880: loss=0.000, reward_mean=0.1, reward_bound=0.0
128881: loss=0.000, reward_mean=0.1, reward_bound=0.0
128882: loss=0.000, reward_mean=0.1, reward_bound=0.0
128883: loss=0.000, reward_mean=0.0, reward_bound=0.0
128884: loss=0.000, reward_mean=0.0, reward_bound=0.0
128885: loss=0.000, reward_m

129019: loss=0.000, reward_mean=0.0, reward_bound=0.0
129020: loss=0.000, reward_mean=0.0, reward_bound=0.0
129021: loss=0.000, reward_mean=0.1, reward_bound=0.0
129022: loss=0.000, reward_mean=0.1, reward_bound=0.0
129023: loss=0.000, reward_mean=0.1, reward_bound=0.0
129024: loss=0.000, reward_mean=0.1, reward_bound=0.0
129025: loss=0.000, reward_mean=0.0, reward_bound=0.0
129026: loss=0.000, reward_mean=0.1, reward_bound=0.0
129027: loss=0.000, reward_mean=0.1, reward_bound=0.0
129028: loss=0.000, reward_mean=0.0, reward_bound=0.0
129029: loss=0.000, reward_mean=0.0, reward_bound=0.0
129030: loss=0.000, reward_mean=0.1, reward_bound=0.0
129031: loss=0.000, reward_mean=0.1, reward_bound=0.0
129032: loss=0.000, reward_mean=0.1, reward_bound=0.0
129033: loss=0.000, reward_mean=0.1, reward_bound=0.0
129034: loss=0.000, reward_mean=0.0, reward_bound=0.0
129035: loss=0.000, reward_mean=0.1, reward_bound=0.0
129036: loss=0.000, reward_mean=0.1, reward_bound=0.0
129037: loss=0.000, reward_m

129171: loss=0.000, reward_mean=0.1, reward_bound=0.0
129172: loss=0.000, reward_mean=0.1, reward_bound=0.0
129173: loss=0.000, reward_mean=0.0, reward_bound=0.0
129174: loss=0.000, reward_mean=0.0, reward_bound=0.0
129175: loss=0.000, reward_mean=0.1, reward_bound=0.0
129176: loss=0.000, reward_mean=0.0, reward_bound=0.0
129177: loss=0.000, reward_mean=0.0, reward_bound=0.0
129178: loss=0.000, reward_mean=0.0, reward_bound=0.0
129179: loss=0.000, reward_mean=0.1, reward_bound=0.0
129180: loss=0.000, reward_mean=0.0, reward_bound=0.0
129181: loss=0.000, reward_mean=0.0, reward_bound=0.0
129182: loss=0.000, reward_mean=0.1, reward_bound=0.0
129183: loss=0.000, reward_mean=0.1, reward_bound=0.0
129184: loss=0.000, reward_mean=0.0, reward_bound=0.0
129185: loss=0.000, reward_mean=0.0, reward_bound=0.0
129186: loss=0.000, reward_mean=0.1, reward_bound=0.0
129187: loss=0.000, reward_mean=0.2, reward_bound=0.0
129188: loss=0.000, reward_mean=0.0, reward_bound=0.0
129189: loss=0.000, reward_m

129323: loss=0.000, reward_mean=0.1, reward_bound=0.0
129324: loss=0.000, reward_mean=0.2, reward_bound=0.0
129325: loss=0.000, reward_mean=0.0, reward_bound=0.0
129326: loss=0.000, reward_mean=0.1, reward_bound=0.0
129327: loss=0.000, reward_mean=0.1, reward_bound=0.0
129328: loss=0.000, reward_mean=0.1, reward_bound=0.0
129329: loss=0.000, reward_mean=0.0, reward_bound=0.0
129330: loss=0.000, reward_mean=0.0, reward_bound=0.0
129331: loss=0.000, reward_mean=0.1, reward_bound=0.0
129332: loss=0.000, reward_mean=0.1, reward_bound=0.0
129333: loss=0.000, reward_mean=0.1, reward_bound=0.0
129334: loss=0.000, reward_mean=0.0, reward_bound=0.0
129335: loss=0.000, reward_mean=0.0, reward_bound=0.0
129336: loss=0.000, reward_mean=0.1, reward_bound=0.0
129337: loss=0.000, reward_mean=0.1, reward_bound=0.0
129338: loss=0.000, reward_mean=0.1, reward_bound=0.0
129339: loss=0.000, reward_mean=0.1, reward_bound=0.0
129340: loss=0.000, reward_mean=0.1, reward_bound=0.0
129341: loss=0.000, reward_m

129480: loss=0.000, reward_mean=0.0, reward_bound=0.0
129481: loss=0.000, reward_mean=0.1, reward_bound=0.0
129482: loss=0.000, reward_mean=0.1, reward_bound=0.0
129483: loss=0.000, reward_mean=0.1, reward_bound=0.0
129484: loss=0.000, reward_mean=0.0, reward_bound=0.0
129485: loss=0.000, reward_mean=0.1, reward_bound=0.0
129486: loss=0.000, reward_mean=0.1, reward_bound=0.0
129487: loss=0.000, reward_mean=0.1, reward_bound=0.0
129488: loss=0.000, reward_mean=0.2, reward_bound=0.0
129489: loss=0.000, reward_mean=0.1, reward_bound=0.0
129490: loss=0.000, reward_mean=0.0, reward_bound=0.0
129491: loss=0.000, reward_mean=0.0, reward_bound=0.0
129492: loss=0.000, reward_mean=0.0, reward_bound=0.0
129493: loss=0.000, reward_mean=0.0, reward_bound=0.0
129494: loss=0.000, reward_mean=0.1, reward_bound=0.0
129495: loss=0.000, reward_mean=0.1, reward_bound=0.0
129496: loss=0.000, reward_mean=0.1, reward_bound=0.0
129497: loss=0.000, reward_mean=0.1, reward_bound=0.0
129498: loss=0.000, reward_m

129632: loss=0.000, reward_mean=0.1, reward_bound=0.0
129633: loss=0.000, reward_mean=0.1, reward_bound=0.0
129634: loss=0.000, reward_mean=0.0, reward_bound=0.0
129635: loss=0.000, reward_mean=0.2, reward_bound=0.0
129636: loss=0.000, reward_mean=0.1, reward_bound=0.0
129637: loss=0.000, reward_mean=0.1, reward_bound=0.0
129638: loss=0.000, reward_mean=0.0, reward_bound=0.0
129639: loss=0.000, reward_mean=0.1, reward_bound=0.0
129640: loss=0.000, reward_mean=0.0, reward_bound=0.0
129641: loss=0.000, reward_mean=0.1, reward_bound=0.0
129642: loss=0.000, reward_mean=0.1, reward_bound=0.0
129643: loss=0.000, reward_mean=0.1, reward_bound=0.0
129644: loss=0.000, reward_mean=0.1, reward_bound=0.0
129645: loss=0.000, reward_mean=0.0, reward_bound=0.0
129646: loss=0.000, reward_mean=0.1, reward_bound=0.0
129647: loss=0.000, reward_mean=0.0, reward_bound=0.0
129648: loss=0.000, reward_mean=0.1, reward_bound=0.0
129649: loss=0.000, reward_mean=0.0, reward_bound=0.0
129650: loss=0.000, reward_m

129785: loss=0.000, reward_mean=0.1, reward_bound=0.0
129786: loss=0.000, reward_mean=0.1, reward_bound=0.0
129787: loss=0.000, reward_mean=0.2, reward_bound=0.0
129788: loss=0.000, reward_mean=0.0, reward_bound=0.0
129789: loss=0.000, reward_mean=0.1, reward_bound=0.0
129790: loss=0.000, reward_mean=0.1, reward_bound=0.0
129791: loss=0.000, reward_mean=0.0, reward_bound=0.0
129792: loss=0.000, reward_mean=0.0, reward_bound=0.0
129793: loss=0.000, reward_mean=0.0, reward_bound=0.0
129794: loss=0.000, reward_mean=0.0, reward_bound=0.0
129795: loss=0.000, reward_mean=0.0, reward_bound=0.0
129796: loss=0.000, reward_mean=0.1, reward_bound=0.0
129797: loss=0.000, reward_mean=0.1, reward_bound=0.0
129798: loss=0.000, reward_mean=0.1, reward_bound=0.0
129799: loss=0.000, reward_mean=0.2, reward_bound=0.0
129800: loss=0.000, reward_mean=0.1, reward_bound=0.0
129801: loss=0.000, reward_mean=0.0, reward_bound=0.0
129802: loss=0.000, reward_mean=0.2, reward_bound=0.0
129803: loss=0.000, reward_m

129937: loss=0.000, reward_mean=0.0, reward_bound=0.0
129938: loss=0.000, reward_mean=0.1, reward_bound=0.0
129939: loss=0.000, reward_mean=0.0, reward_bound=0.0
129940: loss=0.000, reward_mean=0.0, reward_bound=0.0
129941: loss=0.000, reward_mean=0.1, reward_bound=0.0
129942: loss=0.000, reward_mean=0.0, reward_bound=0.0
129943: loss=0.000, reward_mean=0.1, reward_bound=0.0
129944: loss=0.000, reward_mean=0.1, reward_bound=0.0
129945: loss=0.000, reward_mean=0.0, reward_bound=0.0
129946: loss=0.000, reward_mean=0.0, reward_bound=0.0
129947: loss=0.000, reward_mean=0.1, reward_bound=0.0
129948: loss=0.000, reward_mean=0.1, reward_bound=0.0
129949: loss=0.000, reward_mean=0.0, reward_bound=0.0
129950: loss=0.000, reward_mean=0.0, reward_bound=0.0
129951: loss=0.000, reward_mean=0.2, reward_bound=0.0
129952: loss=0.000, reward_mean=0.1, reward_bound=0.0
129953: loss=0.000, reward_mean=0.0, reward_bound=0.0
129954: loss=0.000, reward_mean=0.0, reward_bound=0.0
129955: loss=0.000, reward_m

130089: loss=0.000, reward_mean=0.0, reward_bound=0.0
130090: loss=0.000, reward_mean=0.0, reward_bound=0.0
130091: loss=0.000, reward_mean=0.0, reward_bound=0.0
130092: loss=0.000, reward_mean=0.0, reward_bound=0.0
130093: loss=0.000, reward_mean=0.1, reward_bound=0.0
130094: loss=0.000, reward_mean=0.0, reward_bound=0.0
130095: loss=0.000, reward_mean=0.0, reward_bound=0.0
130096: loss=0.000, reward_mean=0.0, reward_bound=0.0
130097: loss=0.000, reward_mean=0.0, reward_bound=0.0
130098: loss=0.000, reward_mean=0.1, reward_bound=0.0
130099: loss=0.000, reward_mean=0.1, reward_bound=0.0
130100: loss=0.000, reward_mean=0.1, reward_bound=0.0
130101: loss=0.000, reward_mean=0.0, reward_bound=0.0
130102: loss=0.000, reward_mean=0.0, reward_bound=0.0
130103: loss=0.000, reward_mean=0.1, reward_bound=0.0
130104: loss=0.000, reward_mean=0.1, reward_bound=0.0
130105: loss=0.000, reward_mean=0.0, reward_bound=0.0
130106: loss=0.000, reward_mean=0.1, reward_bound=0.0
130107: loss=0.000, reward_m

130245: loss=0.000, reward_mean=0.1, reward_bound=0.0
130246: loss=0.000, reward_mean=0.1, reward_bound=0.0
130247: loss=0.000, reward_mean=0.0, reward_bound=0.0
130248: loss=0.000, reward_mean=0.1, reward_bound=0.0
130249: loss=0.000, reward_mean=0.1, reward_bound=0.0
130250: loss=0.000, reward_mean=0.0, reward_bound=0.0
130251: loss=0.000, reward_mean=0.1, reward_bound=0.0
130252: loss=0.000, reward_mean=0.0, reward_bound=0.0
130253: loss=0.000, reward_mean=0.0, reward_bound=0.0
130254: loss=0.000, reward_mean=0.1, reward_bound=0.0
130255: loss=0.000, reward_mean=0.1, reward_bound=0.0
130256: loss=0.000, reward_mean=0.1, reward_bound=0.0
130257: loss=0.000, reward_mean=0.0, reward_bound=0.0
130258: loss=0.000, reward_mean=0.1, reward_bound=0.0
130259: loss=0.000, reward_mean=0.0, reward_bound=0.0
130260: loss=0.000, reward_mean=0.1, reward_bound=0.0
130261: loss=0.000, reward_mean=0.0, reward_bound=0.0
130262: loss=0.000, reward_mean=0.1, reward_bound=0.0
130263: loss=0.000, reward_m

130397: loss=0.000, reward_mean=0.1, reward_bound=0.0
130398: loss=0.000, reward_mean=0.1, reward_bound=0.0
130399: loss=0.000, reward_mean=0.1, reward_bound=0.0
130400: loss=0.000, reward_mean=0.0, reward_bound=0.0
130401: loss=0.000, reward_mean=0.0, reward_bound=0.0
130402: loss=0.000, reward_mean=0.0, reward_bound=0.0
130403: loss=0.000, reward_mean=0.0, reward_bound=0.0
130404: loss=0.000, reward_mean=0.1, reward_bound=0.0
130405: loss=0.000, reward_mean=0.1, reward_bound=0.0
130406: loss=0.000, reward_mean=0.1, reward_bound=0.0
130407: loss=0.000, reward_mean=0.0, reward_bound=0.0
130408: loss=0.000, reward_mean=0.0, reward_bound=0.0
130409: loss=0.000, reward_mean=0.0, reward_bound=0.0
130410: loss=0.000, reward_mean=0.1, reward_bound=0.0
130411: loss=0.000, reward_mean=0.0, reward_bound=0.0
130412: loss=0.000, reward_mean=0.0, reward_bound=0.0
130413: loss=0.000, reward_mean=0.1, reward_bound=0.0
130414: loss=0.000, reward_mean=0.1, reward_bound=0.0
130415: loss=0.000, reward_m

130549: loss=0.000, reward_mean=0.1, reward_bound=0.0
130550: loss=0.000, reward_mean=0.1, reward_bound=0.0
130551: loss=0.000, reward_mean=0.0, reward_bound=0.0
130552: loss=0.000, reward_mean=0.1, reward_bound=0.0
130553: loss=0.000, reward_mean=0.0, reward_bound=0.0
130554: loss=0.000, reward_mean=0.1, reward_bound=0.0
130555: loss=0.000, reward_mean=0.0, reward_bound=0.0
130556: loss=0.000, reward_mean=0.1, reward_bound=0.0
130557: loss=0.000, reward_mean=0.0, reward_bound=0.0
130558: loss=0.000, reward_mean=0.0, reward_bound=0.0
130559: loss=0.000, reward_mean=0.2, reward_bound=0.0
130560: loss=0.000, reward_mean=0.0, reward_bound=0.0
130561: loss=0.000, reward_mean=0.0, reward_bound=0.0
130562: loss=0.000, reward_mean=0.0, reward_bound=0.0
130563: loss=0.000, reward_mean=0.1, reward_bound=0.0
130564: loss=0.000, reward_mean=0.1, reward_bound=0.0
130565: loss=0.000, reward_mean=0.1, reward_bound=0.0
130566: loss=0.000, reward_mean=0.1, reward_bound=0.0
130567: loss=0.000, reward_m

130702: loss=0.000, reward_mean=0.0, reward_bound=0.0
130703: loss=0.000, reward_mean=0.0, reward_bound=0.0
130704: loss=0.000, reward_mean=0.0, reward_bound=0.0
130705: loss=0.000, reward_mean=0.0, reward_bound=0.0
130706: loss=0.000, reward_mean=0.1, reward_bound=0.0
130707: loss=0.000, reward_mean=0.1, reward_bound=0.0
130708: loss=0.000, reward_mean=0.0, reward_bound=0.0
130709: loss=0.000, reward_mean=0.1, reward_bound=0.0
130710: loss=0.000, reward_mean=0.1, reward_bound=0.0
130711: loss=0.000, reward_mean=0.0, reward_bound=0.0
130712: loss=0.000, reward_mean=0.1, reward_bound=0.0
130713: loss=0.000, reward_mean=0.0, reward_bound=0.0
130714: loss=0.000, reward_mean=0.1, reward_bound=0.0
130715: loss=0.000, reward_mean=0.0, reward_bound=0.0
130716: loss=0.000, reward_mean=0.0, reward_bound=0.0
130717: loss=0.000, reward_mean=0.0, reward_bound=0.0
130718: loss=0.000, reward_mean=0.0, reward_bound=0.0
130719: loss=0.000, reward_mean=0.0, reward_bound=0.0
130720: loss=0.000, reward_m

130855: loss=0.000, reward_mean=0.1, reward_bound=0.0
130856: loss=0.000, reward_mean=0.1, reward_bound=0.0
130857: loss=0.000, reward_mean=0.1, reward_bound=0.0
130858: loss=0.000, reward_mean=0.1, reward_bound=0.0
130859: loss=0.000, reward_mean=0.1, reward_bound=0.0
130860: loss=0.000, reward_mean=0.1, reward_bound=0.0
130861: loss=0.000, reward_mean=0.1, reward_bound=0.0
130862: loss=0.000, reward_mean=0.0, reward_bound=0.0
130863: loss=0.000, reward_mean=0.1, reward_bound=0.0
130864: loss=0.000, reward_mean=0.0, reward_bound=0.0
130865: loss=0.000, reward_mean=0.0, reward_bound=0.0
130866: loss=0.000, reward_mean=0.1, reward_bound=0.0
130867: loss=0.000, reward_mean=0.1, reward_bound=0.0
130868: loss=0.000, reward_mean=0.1, reward_bound=0.0
130869: loss=0.000, reward_mean=0.1, reward_bound=0.0
130870: loss=0.000, reward_mean=0.0, reward_bound=0.0
130871: loss=0.000, reward_mean=0.0, reward_bound=0.0
130872: loss=0.000, reward_mean=0.0, reward_bound=0.0
130873: loss=0.000, reward_m

131009: loss=0.000, reward_mean=0.1, reward_bound=0.0
131010: loss=0.000, reward_mean=0.0, reward_bound=0.0
131011: loss=0.000, reward_mean=0.1, reward_bound=0.0
131012: loss=0.000, reward_mean=0.1, reward_bound=0.0
131013: loss=0.000, reward_mean=0.1, reward_bound=0.0
131014: loss=0.000, reward_mean=0.1, reward_bound=0.0
131015: loss=0.000, reward_mean=0.1, reward_bound=0.0
131016: loss=0.000, reward_mean=0.0, reward_bound=0.0
131017: loss=0.000, reward_mean=0.1, reward_bound=0.0
131018: loss=0.000, reward_mean=0.1, reward_bound=0.0
131019: loss=0.000, reward_mean=0.1, reward_bound=0.0
131020: loss=0.000, reward_mean=0.0, reward_bound=0.0
131021: loss=0.000, reward_mean=0.1, reward_bound=0.0
131022: loss=0.000, reward_mean=0.0, reward_bound=0.0
131023: loss=0.000, reward_mean=0.1, reward_bound=0.0
131024: loss=0.000, reward_mean=0.1, reward_bound=0.0
131025: loss=0.000, reward_mean=0.2, reward_bound=0.0
131026: loss=0.000, reward_mean=0.0, reward_bound=0.0
131027: loss=0.000, reward_m

131165: loss=0.000, reward_mean=0.1, reward_bound=0.0
131166: loss=0.000, reward_mean=0.1, reward_bound=0.0
131167: loss=0.000, reward_mean=0.0, reward_bound=0.0
131168: loss=0.000, reward_mean=0.0, reward_bound=0.0
131169: loss=0.000, reward_mean=0.1, reward_bound=0.0
131170: loss=0.000, reward_mean=0.0, reward_bound=0.0
131171: loss=0.000, reward_mean=0.1, reward_bound=0.0
131172: loss=0.000, reward_mean=0.1, reward_bound=0.0
131173: loss=0.000, reward_mean=0.1, reward_bound=0.0
131174: loss=0.000, reward_mean=0.0, reward_bound=0.0
131175: loss=0.000, reward_mean=0.1, reward_bound=0.0
131176: loss=0.000, reward_mean=0.1, reward_bound=0.0
131177: loss=0.000, reward_mean=0.0, reward_bound=0.0
131178: loss=0.000, reward_mean=0.2, reward_bound=0.0
131179: loss=0.000, reward_mean=0.1, reward_bound=0.0
131180: loss=0.000, reward_mean=0.1, reward_bound=0.0
131181: loss=0.000, reward_mean=0.0, reward_bound=0.0
131182: loss=0.000, reward_mean=0.1, reward_bound=0.0
131183: loss=0.000, reward_m

131320: loss=0.000, reward_mean=0.0, reward_bound=0.0
131321: loss=0.000, reward_mean=0.1, reward_bound=0.0
131322: loss=0.000, reward_mean=0.0, reward_bound=0.0
131323: loss=0.000, reward_mean=0.1, reward_bound=0.0
131324: loss=0.000, reward_mean=0.0, reward_bound=0.0
131325: loss=0.000, reward_mean=0.0, reward_bound=0.0
131326: loss=0.000, reward_mean=0.0, reward_bound=0.0
131327: loss=0.000, reward_mean=0.2, reward_bound=0.0
131328: loss=0.000, reward_mean=0.2, reward_bound=0.0
131329: loss=0.000, reward_mean=0.0, reward_bound=0.0
131330: loss=0.000, reward_mean=0.1, reward_bound=0.0
131331: loss=0.000, reward_mean=0.0, reward_bound=0.0
131332: loss=0.000, reward_mean=0.0, reward_bound=0.0
131333: loss=0.000, reward_mean=0.1, reward_bound=0.0
131334: loss=0.000, reward_mean=0.0, reward_bound=0.0
131335: loss=0.000, reward_mean=0.0, reward_bound=0.0
131336: loss=0.000, reward_mean=0.1, reward_bound=0.0
131337: loss=0.000, reward_mean=0.0, reward_bound=0.0
131338: loss=0.000, reward_m

131473: loss=0.000, reward_mean=0.0, reward_bound=0.0
131474: loss=0.000, reward_mean=0.1, reward_bound=0.0
131475: loss=0.000, reward_mean=0.0, reward_bound=0.0
131476: loss=0.000, reward_mean=0.1, reward_bound=0.0
131477: loss=0.000, reward_mean=0.1, reward_bound=0.0
131478: loss=0.000, reward_mean=0.0, reward_bound=0.0
131479: loss=0.000, reward_mean=0.1, reward_bound=0.0
131480: loss=0.000, reward_mean=0.0, reward_bound=0.0
131481: loss=0.000, reward_mean=0.1, reward_bound=0.0
131482: loss=0.000, reward_mean=0.0, reward_bound=0.0
131483: loss=0.000, reward_mean=0.1, reward_bound=0.0
131484: loss=0.000, reward_mean=0.0, reward_bound=0.0
131485: loss=0.000, reward_mean=0.1, reward_bound=0.0
131486: loss=0.000, reward_mean=0.0, reward_bound=0.0
131487: loss=0.000, reward_mean=0.1, reward_bound=0.0
131488: loss=0.000, reward_mean=0.0, reward_bound=0.0
131489: loss=0.000, reward_mean=0.1, reward_bound=0.0
131490: loss=0.000, reward_mean=0.2, reward_bound=0.0
131491: loss=0.000, reward_m

131629: loss=0.000, reward_mean=0.1, reward_bound=0.0
131630: loss=0.000, reward_mean=0.1, reward_bound=0.0
131631: loss=0.000, reward_mean=0.0, reward_bound=0.0
131632: loss=0.000, reward_mean=0.1, reward_bound=0.0
131633: loss=0.000, reward_mean=0.1, reward_bound=0.0
131634: loss=0.000, reward_mean=0.1, reward_bound=0.0
131635: loss=0.000, reward_mean=0.1, reward_bound=0.0
131636: loss=0.000, reward_mean=0.0, reward_bound=0.0
131637: loss=0.000, reward_mean=0.0, reward_bound=0.0
131638: loss=0.000, reward_mean=0.1, reward_bound=0.0
131639: loss=0.000, reward_mean=0.0, reward_bound=0.0
131640: loss=0.000, reward_mean=0.1, reward_bound=0.0
131641: loss=0.000, reward_mean=0.1, reward_bound=0.0
131642: loss=0.000, reward_mean=0.1, reward_bound=0.0
131643: loss=0.000, reward_mean=0.1, reward_bound=0.0
131644: loss=0.000, reward_mean=0.0, reward_bound=0.0
131645: loss=0.000, reward_mean=0.1, reward_bound=0.0
131646: loss=0.000, reward_mean=0.0, reward_bound=0.0
131647: loss=0.000, reward_m

131783: loss=0.000, reward_mean=0.1, reward_bound=0.0
131784: loss=0.000, reward_mean=0.0, reward_bound=0.0
131785: loss=0.000, reward_mean=0.1, reward_bound=0.0
131786: loss=0.000, reward_mean=0.1, reward_bound=0.0
131787: loss=0.000, reward_mean=0.0, reward_bound=0.0
131788: loss=0.000, reward_mean=0.1, reward_bound=0.0
131789: loss=0.000, reward_mean=0.1, reward_bound=0.0
131790: loss=0.000, reward_mean=0.1, reward_bound=0.0
131791: loss=0.000, reward_mean=0.0, reward_bound=0.0
131792: loss=0.000, reward_mean=0.0, reward_bound=0.0
131793: loss=0.000, reward_mean=0.1, reward_bound=0.0
131794: loss=0.000, reward_mean=0.1, reward_bound=0.0
131795: loss=0.000, reward_mean=0.1, reward_bound=0.0
131796: loss=0.000, reward_mean=0.1, reward_bound=0.0
131797: loss=0.000, reward_mean=0.0, reward_bound=0.0
131798: loss=0.000, reward_mean=0.0, reward_bound=0.0
131799: loss=0.000, reward_mean=0.2, reward_bound=0.0
131800: loss=0.000, reward_mean=0.0, reward_bound=0.0
131801: loss=0.000, reward_m

131937: loss=0.000, reward_mean=0.1, reward_bound=0.0
131938: loss=0.000, reward_mean=0.0, reward_bound=0.0
131939: loss=0.000, reward_mean=0.0, reward_bound=0.0
131940: loss=0.000, reward_mean=0.1, reward_bound=0.0
131941: loss=0.000, reward_mean=0.2, reward_bound=0.0
131942: loss=0.000, reward_mean=0.0, reward_bound=0.0
131943: loss=0.000, reward_mean=0.0, reward_bound=0.0
131944: loss=0.000, reward_mean=0.1, reward_bound=0.0
131945: loss=0.000, reward_mean=0.0, reward_bound=0.0
131946: loss=0.000, reward_mean=0.0, reward_bound=0.0
131947: loss=0.000, reward_mean=0.1, reward_bound=0.0
131948: loss=0.000, reward_mean=0.1, reward_bound=0.0
131949: loss=0.000, reward_mean=0.2, reward_bound=0.0
131950: loss=0.000, reward_mean=0.1, reward_bound=0.0
131951: loss=0.000, reward_mean=0.1, reward_bound=0.0
131952: loss=0.000, reward_mean=0.1, reward_bound=0.0
131953: loss=0.000, reward_mean=0.2, reward_bound=0.0
131954: loss=0.000, reward_mean=0.1, reward_bound=0.0
131955: loss=0.000, reward_m

132090: loss=0.000, reward_mean=0.1, reward_bound=0.0
132091: loss=0.000, reward_mean=0.1, reward_bound=0.0
132092: loss=0.000, reward_mean=0.1, reward_bound=0.0
132093: loss=0.000, reward_mean=0.0, reward_bound=0.0
132094: loss=0.000, reward_mean=0.0, reward_bound=0.0
132095: loss=0.000, reward_mean=0.1, reward_bound=0.0
132096: loss=0.000, reward_mean=0.1, reward_bound=0.0
132097: loss=0.000, reward_mean=0.0, reward_bound=0.0
132098: loss=0.000, reward_mean=0.2, reward_bound=0.0
132099: loss=0.000, reward_mean=0.1, reward_bound=0.0
132100: loss=0.000, reward_mean=0.0, reward_bound=0.0
132101: loss=0.000, reward_mean=0.1, reward_bound=0.0
132102: loss=0.000, reward_mean=0.0, reward_bound=0.0
132103: loss=0.000, reward_mean=0.0, reward_bound=0.0
132104: loss=0.000, reward_mean=0.0, reward_bound=0.0
132105: loss=0.000, reward_mean=0.0, reward_bound=0.0
132106: loss=0.000, reward_mean=0.1, reward_bound=0.0
132107: loss=0.000, reward_mean=0.1, reward_bound=0.0
132108: loss=0.000, reward_m

132243: loss=0.000, reward_mean=0.0, reward_bound=0.0
132244: loss=0.000, reward_mean=0.0, reward_bound=0.0
132245: loss=0.000, reward_mean=0.1, reward_bound=0.0
132246: loss=0.000, reward_mean=0.1, reward_bound=0.0
132247: loss=0.000, reward_mean=0.0, reward_bound=0.0
132248: loss=0.000, reward_mean=0.0, reward_bound=0.0
132249: loss=0.000, reward_mean=0.0, reward_bound=0.0
132250: loss=0.000, reward_mean=0.0, reward_bound=0.0
132251: loss=0.000, reward_mean=0.1, reward_bound=0.0
132252: loss=0.000, reward_mean=0.1, reward_bound=0.0
132253: loss=0.000, reward_mean=0.1, reward_bound=0.0
132254: loss=0.000, reward_mean=0.1, reward_bound=0.0
132255: loss=0.000, reward_mean=0.1, reward_bound=0.0
132256: loss=0.000, reward_mean=0.1, reward_bound=0.0
132257: loss=0.000, reward_mean=0.0, reward_bound=0.0
132258: loss=0.000, reward_mean=0.0, reward_bound=0.0
132259: loss=0.000, reward_mean=0.0, reward_bound=0.0
132260: loss=0.000, reward_mean=0.0, reward_bound=0.0
132261: loss=0.000, reward_m

132395: loss=0.000, reward_mean=0.1, reward_bound=0.0
132396: loss=0.000, reward_mean=0.0, reward_bound=0.0
132397: loss=0.000, reward_mean=0.0, reward_bound=0.0
132398: loss=0.000, reward_mean=0.1, reward_bound=0.0
132399: loss=0.000, reward_mean=0.0, reward_bound=0.0
132400: loss=0.000, reward_mean=0.1, reward_bound=0.0
132401: loss=0.000, reward_mean=0.0, reward_bound=0.0
132402: loss=0.000, reward_mean=0.0, reward_bound=0.0
132403: loss=0.000, reward_mean=0.0, reward_bound=0.0
132404: loss=0.000, reward_mean=0.1, reward_bound=0.0
132405: loss=0.000, reward_mean=0.2, reward_bound=0.0
132406: loss=0.000, reward_mean=0.0, reward_bound=0.0
132407: loss=0.000, reward_mean=0.1, reward_bound=0.0
132408: loss=0.000, reward_mean=0.1, reward_bound=0.0
132409: loss=0.000, reward_mean=0.0, reward_bound=0.0
132410: loss=0.000, reward_mean=0.1, reward_bound=0.0
132411: loss=0.000, reward_mean=0.0, reward_bound=0.0
132412: loss=0.000, reward_mean=0.0, reward_bound=0.0
132413: loss=0.000, reward_m

132548: loss=0.000, reward_mean=0.1, reward_bound=0.0
132549: loss=0.000, reward_mean=0.0, reward_bound=0.0
132550: loss=0.000, reward_mean=0.0, reward_bound=0.0
132551: loss=0.000, reward_mean=0.1, reward_bound=0.0
132552: loss=0.000, reward_mean=0.0, reward_bound=0.0
132553: loss=0.000, reward_mean=0.0, reward_bound=0.0
132554: loss=0.000, reward_mean=0.1, reward_bound=0.0
132555: loss=0.000, reward_mean=0.2, reward_bound=0.0
132556: loss=0.000, reward_mean=0.1, reward_bound=0.0
132557: loss=0.000, reward_mean=0.0, reward_bound=0.0
132558: loss=0.000, reward_mean=0.0, reward_bound=0.0
132559: loss=0.000, reward_mean=0.0, reward_bound=0.0
132560: loss=0.000, reward_mean=0.0, reward_bound=0.0
132561: loss=0.000, reward_mean=0.0, reward_bound=0.0
132562: loss=0.000, reward_mean=0.1, reward_bound=0.0
132563: loss=0.000, reward_mean=0.1, reward_bound=0.0
132564: loss=0.000, reward_mean=0.2, reward_bound=0.0
132565: loss=0.000, reward_mean=0.1, reward_bound=0.0
132566: loss=0.000, reward_m

132703: loss=0.000, reward_mean=0.0, reward_bound=0.0
132704: loss=0.000, reward_mean=0.1, reward_bound=0.0
132705: loss=0.000, reward_mean=0.0, reward_bound=0.0
132706: loss=0.000, reward_mean=0.1, reward_bound=0.0
132707: loss=0.000, reward_mean=0.1, reward_bound=0.0
132708: loss=0.000, reward_mean=0.1, reward_bound=0.0
132709: loss=0.000, reward_mean=0.1, reward_bound=0.0
132710: loss=0.000, reward_mean=0.0, reward_bound=0.0
132711: loss=0.000, reward_mean=0.0, reward_bound=0.0
132712: loss=0.000, reward_mean=0.1, reward_bound=0.0
132713: loss=0.000, reward_mean=0.0, reward_bound=0.0
132714: loss=0.000, reward_mean=0.0, reward_bound=0.0
132715: loss=0.000, reward_mean=0.1, reward_bound=0.0
132716: loss=0.000, reward_mean=0.1, reward_bound=0.0
132717: loss=0.000, reward_mean=0.2, reward_bound=0.0
132718: loss=0.000, reward_mean=0.0, reward_bound=0.0
132719: loss=0.000, reward_mean=0.0, reward_bound=0.0
132720: loss=0.000, reward_mean=0.1, reward_bound=0.0
132721: loss=0.000, reward_m

132861: loss=0.000, reward_mean=0.1, reward_bound=0.0
132862: loss=0.000, reward_mean=0.2, reward_bound=0.0
132863: loss=0.000, reward_mean=0.0, reward_bound=0.0
132864: loss=0.000, reward_mean=0.1, reward_bound=0.0
132865: loss=0.000, reward_mean=0.1, reward_bound=0.0
132866: loss=0.000, reward_mean=0.0, reward_bound=0.0
132867: loss=0.000, reward_mean=0.1, reward_bound=0.0
132868: loss=0.000, reward_mean=0.1, reward_bound=0.0
132869: loss=0.000, reward_mean=0.1, reward_bound=0.0
132870: loss=0.000, reward_mean=0.0, reward_bound=0.0
132871: loss=0.000, reward_mean=0.2, reward_bound=0.0
132872: loss=0.000, reward_mean=0.1, reward_bound=0.0
132873: loss=0.000, reward_mean=0.0, reward_bound=0.0
132874: loss=0.000, reward_mean=0.0, reward_bound=0.0
132875: loss=0.000, reward_mean=0.1, reward_bound=0.0
132876: loss=0.000, reward_mean=0.1, reward_bound=0.0
132877: loss=0.000, reward_mean=0.1, reward_bound=0.0
132878: loss=0.000, reward_mean=0.1, reward_bound=0.0
132879: loss=0.000, reward_m

133016: loss=0.000, reward_mean=0.1, reward_bound=0.0
133017: loss=0.000, reward_mean=0.1, reward_bound=0.0
133018: loss=0.000, reward_mean=0.1, reward_bound=0.0
133019: loss=0.000, reward_mean=0.0, reward_bound=0.0
133020: loss=0.000, reward_mean=0.0, reward_bound=0.0
133021: loss=0.000, reward_mean=0.0, reward_bound=0.0
133022: loss=0.000, reward_mean=0.1, reward_bound=0.0
133023: loss=0.000, reward_mean=0.0, reward_bound=0.0
133024: loss=0.000, reward_mean=0.0, reward_bound=0.0
133025: loss=0.000, reward_mean=0.0, reward_bound=0.0
133026: loss=0.000, reward_mean=0.1, reward_bound=0.0
133027: loss=0.000, reward_mean=0.1, reward_bound=0.0
133028: loss=0.000, reward_mean=0.1, reward_bound=0.0
133029: loss=0.000, reward_mean=0.1, reward_bound=0.0
133030: loss=0.000, reward_mean=0.0, reward_bound=0.0
133031: loss=0.000, reward_mean=0.0, reward_bound=0.0
133032: loss=0.000, reward_mean=0.1, reward_bound=0.0
133033: loss=0.000, reward_mean=0.1, reward_bound=0.0
133034: loss=0.000, reward_m

133169: loss=0.000, reward_mean=0.2, reward_bound=0.0
133170: loss=0.000, reward_mean=0.0, reward_bound=0.0
133171: loss=0.000, reward_mean=0.1, reward_bound=0.0
133172: loss=0.000, reward_mean=0.0, reward_bound=0.0
133173: loss=0.000, reward_mean=0.1, reward_bound=0.0
133174: loss=0.000, reward_mean=0.1, reward_bound=0.0
133175: loss=0.000, reward_mean=0.1, reward_bound=0.0
133176: loss=0.000, reward_mean=0.1, reward_bound=0.0
133177: loss=0.000, reward_mean=0.1, reward_bound=0.0
133178: loss=0.000, reward_mean=0.0, reward_bound=0.0
133179: loss=0.000, reward_mean=0.0, reward_bound=0.0
133180: loss=0.000, reward_mean=0.0, reward_bound=0.0
133181: loss=0.000, reward_mean=0.0, reward_bound=0.0
133182: loss=0.000, reward_mean=0.1, reward_bound=0.0
133183: loss=0.000, reward_mean=0.0, reward_bound=0.0
133184: loss=0.000, reward_mean=0.1, reward_bound=0.0
133185: loss=0.000, reward_mean=0.1, reward_bound=0.0
133186: loss=0.000, reward_mean=0.1, reward_bound=0.0
133187: loss=0.000, reward_m

133320: loss=0.000, reward_mean=0.1, reward_bound=0.0
133321: loss=0.000, reward_mean=0.1, reward_bound=0.0
133322: loss=0.000, reward_mean=0.1, reward_bound=0.0
133323: loss=0.000, reward_mean=0.0, reward_bound=0.0
133324: loss=0.000, reward_mean=0.1, reward_bound=0.0
133325: loss=0.000, reward_mean=0.0, reward_bound=0.0
133326: loss=0.000, reward_mean=0.1, reward_bound=0.0
133327: loss=0.000, reward_mean=0.1, reward_bound=0.0
133328: loss=0.000, reward_mean=0.1, reward_bound=0.0
133329: loss=0.000, reward_mean=0.1, reward_bound=0.0
133330: loss=0.000, reward_mean=0.2, reward_bound=0.0
133331: loss=0.000, reward_mean=0.0, reward_bound=0.0
133332: loss=0.000, reward_mean=0.1, reward_bound=0.0
133333: loss=0.000, reward_mean=0.1, reward_bound=0.0
133334: loss=0.000, reward_mean=0.0, reward_bound=0.0
133335: loss=0.000, reward_mean=0.0, reward_bound=0.0
133336: loss=0.000, reward_mean=0.1, reward_bound=0.0
133337: loss=0.000, reward_mean=0.1, reward_bound=0.0
133338: loss=0.000, reward_m

133476: loss=0.000, reward_mean=0.1, reward_bound=0.0
133477: loss=0.000, reward_mean=0.4, reward_bound=1.0
133478: loss=0.000, reward_mean=0.0, reward_bound=0.0
133479: loss=0.000, reward_mean=0.0, reward_bound=0.0
133480: loss=0.000, reward_mean=0.2, reward_bound=0.0
133481: loss=0.000, reward_mean=0.1, reward_bound=0.0
133482: loss=0.000, reward_mean=0.0, reward_bound=0.0
133483: loss=0.000, reward_mean=0.0, reward_bound=0.0
133484: loss=0.000, reward_mean=0.1, reward_bound=0.0
133485: loss=0.000, reward_mean=0.1, reward_bound=0.0
133486: loss=0.000, reward_mean=0.0, reward_bound=0.0
133487: loss=0.000, reward_mean=0.0, reward_bound=0.0
133488: loss=0.000, reward_mean=0.1, reward_bound=0.0
133489: loss=0.000, reward_mean=0.0, reward_bound=0.0
133490: loss=0.000, reward_mean=0.1, reward_bound=0.0
133491: loss=0.000, reward_mean=0.1, reward_bound=0.0
133492: loss=0.000, reward_mean=0.1, reward_bound=0.0
133493: loss=0.000, reward_mean=0.0, reward_bound=0.0
133494: loss=0.000, reward_m

133630: loss=0.000, reward_mean=0.1, reward_bound=0.0
133631: loss=0.000, reward_mean=0.0, reward_bound=0.0
133632: loss=0.000, reward_mean=0.1, reward_bound=0.0
133633: loss=0.000, reward_mean=0.0, reward_bound=0.0
133634: loss=0.000, reward_mean=0.0, reward_bound=0.0
133635: loss=0.000, reward_mean=0.0, reward_bound=0.0
133636: loss=0.000, reward_mean=0.0, reward_bound=0.0
133637: loss=0.000, reward_mean=0.1, reward_bound=0.0
133638: loss=0.000, reward_mean=0.0, reward_bound=0.0
133639: loss=0.000, reward_mean=0.1, reward_bound=0.0
133640: loss=0.000, reward_mean=0.1, reward_bound=0.0
133641: loss=0.000, reward_mean=0.0, reward_bound=0.0
133642: loss=0.000, reward_mean=0.1, reward_bound=0.0
133643: loss=0.000, reward_mean=0.0, reward_bound=0.0
133644: loss=0.000, reward_mean=0.1, reward_bound=0.0
133645: loss=0.000, reward_mean=0.0, reward_bound=0.0
133646: loss=0.000, reward_mean=0.1, reward_bound=0.0
133647: loss=0.000, reward_mean=0.2, reward_bound=0.0
133648: loss=0.000, reward_m

133786: loss=0.000, reward_mean=0.0, reward_bound=0.0
133787: loss=0.000, reward_mean=0.1, reward_bound=0.0
133788: loss=0.000, reward_mean=0.0, reward_bound=0.0
133789: loss=0.000, reward_mean=0.1, reward_bound=0.0
133790: loss=0.000, reward_mean=0.1, reward_bound=0.0
133791: loss=0.000, reward_mean=0.1, reward_bound=0.0
133792: loss=0.000, reward_mean=0.1, reward_bound=0.0
133793: loss=0.000, reward_mean=0.0, reward_bound=0.0
133794: loss=0.000, reward_mean=0.1, reward_bound=0.0
133795: loss=0.000, reward_mean=0.1, reward_bound=0.0
133796: loss=0.000, reward_mean=0.0, reward_bound=0.0
133797: loss=0.000, reward_mean=0.1, reward_bound=0.0
133798: loss=0.000, reward_mean=0.0, reward_bound=0.0
133799: loss=0.000, reward_mean=0.0, reward_bound=0.0
133800: loss=0.000, reward_mean=0.1, reward_bound=0.0
133801: loss=0.000, reward_mean=0.0, reward_bound=0.0
133802: loss=0.000, reward_mean=0.1, reward_bound=0.0
133803: loss=0.000, reward_mean=0.0, reward_bound=0.0
133804: loss=0.000, reward_m

133938: loss=0.000, reward_mean=0.1, reward_bound=0.0
133939: loss=0.000, reward_mean=0.0, reward_bound=0.0
133940: loss=0.000, reward_mean=0.1, reward_bound=0.0
133941: loss=0.000, reward_mean=0.1, reward_bound=0.0
133942: loss=0.000, reward_mean=0.0, reward_bound=0.0
133943: loss=0.000, reward_mean=0.1, reward_bound=0.0
133944: loss=0.000, reward_mean=0.0, reward_bound=0.0
133945: loss=0.000, reward_mean=0.0, reward_bound=0.0
133946: loss=0.000, reward_mean=0.0, reward_bound=0.0
133947: loss=0.000, reward_mean=0.1, reward_bound=0.0
133948: loss=0.000, reward_mean=0.0, reward_bound=0.0
133949: loss=0.000, reward_mean=0.1, reward_bound=0.0
133950: loss=0.000, reward_mean=0.1, reward_bound=0.0
133951: loss=0.000, reward_mean=0.1, reward_bound=0.0
133952: loss=0.000, reward_mean=0.0, reward_bound=0.0
133953: loss=0.000, reward_mean=0.2, reward_bound=0.0
133954: loss=0.000, reward_mean=0.1, reward_bound=0.0
133955: loss=0.000, reward_mean=0.2, reward_bound=0.0
133956: loss=0.000, reward_m

134094: loss=0.000, reward_mean=0.1, reward_bound=0.0
134095: loss=0.000, reward_mean=0.0, reward_bound=0.0
134096: loss=0.000, reward_mean=0.0, reward_bound=0.0
134097: loss=0.000, reward_mean=0.1, reward_bound=0.0
134098: loss=0.000, reward_mean=0.1, reward_bound=0.0
134099: loss=0.000, reward_mean=0.1, reward_bound=0.0
134100: loss=0.000, reward_mean=0.1, reward_bound=0.0
134101: loss=0.000, reward_mean=0.1, reward_bound=0.0
134102: loss=0.000, reward_mean=0.1, reward_bound=0.0
134103: loss=0.000, reward_mean=0.0, reward_bound=0.0
134104: loss=0.000, reward_mean=0.1, reward_bound=0.0
134105: loss=0.000, reward_mean=0.0, reward_bound=0.0
134106: loss=0.000, reward_mean=0.1, reward_bound=0.0
134107: loss=0.000, reward_mean=0.1, reward_bound=0.0
134108: loss=0.000, reward_mean=0.0, reward_bound=0.0
134109: loss=0.000, reward_mean=0.0, reward_bound=0.0
134110: loss=0.000, reward_mean=0.0, reward_bound=0.0
134111: loss=0.000, reward_mean=0.0, reward_bound=0.0
134112: loss=0.000, reward_m

134249: loss=0.000, reward_mean=0.2, reward_bound=0.0
134250: loss=0.000, reward_mean=0.2, reward_bound=0.0
134251: loss=0.000, reward_mean=0.1, reward_bound=0.0
134252: loss=0.000, reward_mean=0.0, reward_bound=0.0
134253: loss=0.000, reward_mean=0.1, reward_bound=0.0
134254: loss=0.000, reward_mean=0.1, reward_bound=0.0
134255: loss=0.000, reward_mean=0.0, reward_bound=0.0
134256: loss=0.000, reward_mean=0.1, reward_bound=0.0
134257: loss=0.000, reward_mean=0.1, reward_bound=0.0
134258: loss=0.000, reward_mean=0.0, reward_bound=0.0
134259: loss=0.000, reward_mean=0.1, reward_bound=0.0
134260: loss=0.000, reward_mean=0.0, reward_bound=0.0
134261: loss=0.000, reward_mean=0.1, reward_bound=0.0
134262: loss=0.000, reward_mean=0.0, reward_bound=0.0
134263: loss=0.000, reward_mean=0.0, reward_bound=0.0
134264: loss=0.000, reward_mean=0.1, reward_bound=0.0
134265: loss=0.000, reward_mean=0.0, reward_bound=0.0
134266: loss=0.000, reward_mean=0.0, reward_bound=0.0
134267: loss=0.000, reward_m

134404: loss=0.000, reward_mean=0.1, reward_bound=0.0
134405: loss=0.000, reward_mean=0.0, reward_bound=0.0
134406: loss=0.000, reward_mean=0.1, reward_bound=0.0
134407: loss=0.000, reward_mean=0.1, reward_bound=0.0
134408: loss=0.000, reward_mean=0.1, reward_bound=0.0
134409: loss=0.000, reward_mean=0.1, reward_bound=0.0
134410: loss=0.000, reward_mean=0.1, reward_bound=0.0
134411: loss=0.000, reward_mean=0.1, reward_bound=0.0
134412: loss=0.000, reward_mean=0.1, reward_bound=0.0
134413: loss=0.000, reward_mean=0.1, reward_bound=0.0
134414: loss=0.000, reward_mean=0.1, reward_bound=0.0
134415: loss=0.000, reward_mean=0.0, reward_bound=0.0
134416: loss=0.000, reward_mean=0.1, reward_bound=0.0
134417: loss=0.000, reward_mean=0.0, reward_bound=0.0
134418: loss=0.000, reward_mean=0.1, reward_bound=0.0
134419: loss=0.000, reward_mean=0.1, reward_bound=0.0
134420: loss=0.000, reward_mean=0.0, reward_bound=0.0
134421: loss=0.000, reward_mean=0.0, reward_bound=0.0
134422: loss=0.000, reward_m

134556: loss=0.000, reward_mean=0.1, reward_bound=0.0
134557: loss=0.000, reward_mean=0.1, reward_bound=0.0
134558: loss=0.000, reward_mean=0.0, reward_bound=0.0
134559: loss=0.000, reward_mean=0.1, reward_bound=0.0
134560: loss=0.000, reward_mean=0.0, reward_bound=0.0
134561: loss=0.000, reward_mean=0.1, reward_bound=0.0
134562: loss=0.000, reward_mean=0.1, reward_bound=0.0
134563: loss=0.000, reward_mean=0.0, reward_bound=0.0
134564: loss=0.000, reward_mean=0.1, reward_bound=0.0
134565: loss=0.000, reward_mean=0.0, reward_bound=0.0
134566: loss=0.000, reward_mean=0.0, reward_bound=0.0
134567: loss=0.000, reward_mean=0.1, reward_bound=0.0
134568: loss=0.000, reward_mean=0.1, reward_bound=0.0
134569: loss=0.000, reward_mean=0.0, reward_bound=0.0
134570: loss=0.000, reward_mean=0.1, reward_bound=0.0
134571: loss=0.000, reward_mean=0.1, reward_bound=0.0
134572: loss=0.000, reward_mean=0.2, reward_bound=0.0
134573: loss=0.000, reward_mean=0.1, reward_bound=0.0
134574: loss=0.000, reward_m

134714: loss=0.000, reward_mean=0.1, reward_bound=0.0
134715: loss=0.000, reward_mean=0.1, reward_bound=0.0
134716: loss=0.000, reward_mean=0.1, reward_bound=0.0
134717: loss=0.000, reward_mean=0.1, reward_bound=0.0
134718: loss=0.000, reward_mean=0.1, reward_bound=0.0
134719: loss=0.000, reward_mean=0.1, reward_bound=0.0
134720: loss=0.000, reward_mean=0.1, reward_bound=0.0
134721: loss=0.000, reward_mean=0.0, reward_bound=0.0
134722: loss=0.000, reward_mean=0.1, reward_bound=0.0
134723: loss=0.000, reward_mean=0.1, reward_bound=0.0
134724: loss=0.000, reward_mean=0.0, reward_bound=0.0
134725: loss=0.000, reward_mean=0.0, reward_bound=0.0
134726: loss=0.000, reward_mean=0.0, reward_bound=0.0
134727: loss=0.000, reward_mean=0.1, reward_bound=0.0
134728: loss=0.000, reward_mean=0.1, reward_bound=0.0
134729: loss=0.000, reward_mean=0.1, reward_bound=0.0
134730: loss=0.000, reward_mean=0.0, reward_bound=0.0
134731: loss=0.000, reward_mean=0.1, reward_bound=0.0
134732: loss=0.000, reward_m

134872: loss=0.000, reward_mean=0.2, reward_bound=0.0
134873: loss=0.000, reward_mean=0.1, reward_bound=0.0
134874: loss=0.000, reward_mean=0.0, reward_bound=0.0
134875: loss=0.000, reward_mean=0.0, reward_bound=0.0
134876: loss=0.000, reward_mean=0.1, reward_bound=0.0
134877: loss=0.000, reward_mean=0.0, reward_bound=0.0
134878: loss=0.000, reward_mean=0.0, reward_bound=0.0
134879: loss=0.000, reward_mean=0.0, reward_bound=0.0
134880: loss=0.000, reward_mean=0.2, reward_bound=0.0
134881: loss=0.000, reward_mean=0.0, reward_bound=0.0
134882: loss=0.000, reward_mean=0.0, reward_bound=0.0
134883: loss=0.000, reward_mean=0.1, reward_bound=0.0
134884: loss=0.000, reward_mean=0.0, reward_bound=0.0
134885: loss=0.000, reward_mean=0.1, reward_bound=0.0
134886: loss=0.000, reward_mean=0.1, reward_bound=0.0
134887: loss=0.000, reward_mean=0.0, reward_bound=0.0
134888: loss=0.000, reward_mean=0.1, reward_bound=0.0
134889: loss=0.000, reward_mean=0.0, reward_bound=0.0
134890: loss=0.000, reward_m

135029: loss=0.000, reward_mean=0.1, reward_bound=0.0
135030: loss=0.000, reward_mean=0.0, reward_bound=0.0
135031: loss=0.000, reward_mean=0.0, reward_bound=0.0
135032: loss=0.000, reward_mean=0.1, reward_bound=0.0
135033: loss=0.000, reward_mean=0.1, reward_bound=0.0
135034: loss=0.000, reward_mean=0.1, reward_bound=0.0
135035: loss=0.000, reward_mean=0.0, reward_bound=0.0
135036: loss=0.000, reward_mean=0.2, reward_bound=0.0
135037: loss=0.000, reward_mean=0.0, reward_bound=0.0
135038: loss=0.000, reward_mean=0.1, reward_bound=0.0
135039: loss=0.000, reward_mean=0.0, reward_bound=0.0
135040: loss=0.000, reward_mean=0.0, reward_bound=0.0
135041: loss=0.000, reward_mean=0.0, reward_bound=0.0
135042: loss=0.000, reward_mean=0.1, reward_bound=0.0
135043: loss=0.000, reward_mean=0.1, reward_bound=0.0
135044: loss=0.000, reward_mean=0.0, reward_bound=0.0
135045: loss=0.000, reward_mean=0.1, reward_bound=0.0
135046: loss=0.000, reward_mean=0.1, reward_bound=0.0
135047: loss=0.000, reward_m

135187: loss=0.000, reward_mean=0.1, reward_bound=0.0
135188: loss=0.000, reward_mean=0.0, reward_bound=0.0
135189: loss=0.000, reward_mean=0.1, reward_bound=0.0
135190: loss=0.000, reward_mean=0.0, reward_bound=0.0
135191: loss=0.000, reward_mean=0.1, reward_bound=0.0
135192: loss=0.000, reward_mean=0.0, reward_bound=0.0
135193: loss=0.000, reward_mean=0.0, reward_bound=0.0
135194: loss=0.000, reward_mean=0.1, reward_bound=0.0
135195: loss=0.000, reward_mean=0.1, reward_bound=0.0
135196: loss=0.000, reward_mean=0.0, reward_bound=0.0
135197: loss=0.000, reward_mean=0.0, reward_bound=0.0
135198: loss=0.000, reward_mean=0.0, reward_bound=0.0
135199: loss=0.000, reward_mean=0.1, reward_bound=0.0
135200: loss=0.000, reward_mean=0.1, reward_bound=0.0
135201: loss=0.000, reward_mean=0.1, reward_bound=0.0
135202: loss=0.000, reward_mean=0.1, reward_bound=0.0
135203: loss=0.000, reward_mean=0.1, reward_bound=0.0
135204: loss=0.000, reward_mean=0.0, reward_bound=0.0
135205: loss=0.000, reward_m

135342: loss=0.000, reward_mean=0.0, reward_bound=0.0
135343: loss=0.000, reward_mean=0.1, reward_bound=0.0
135344: loss=0.000, reward_mean=0.1, reward_bound=0.0
135345: loss=0.000, reward_mean=0.1, reward_bound=0.0
135346: loss=0.000, reward_mean=0.2, reward_bound=0.0
135347: loss=0.000, reward_mean=0.0, reward_bound=0.0
135348: loss=0.000, reward_mean=0.0, reward_bound=0.0
135349: loss=0.000, reward_mean=0.2, reward_bound=0.0
135350: loss=0.000, reward_mean=0.1, reward_bound=0.0
135351: loss=0.000, reward_mean=0.2, reward_bound=0.0
135352: loss=0.000, reward_mean=0.2, reward_bound=0.0
135353: loss=0.000, reward_mean=0.1, reward_bound=0.0
135354: loss=0.000, reward_mean=0.0, reward_bound=0.0
135355: loss=0.000, reward_mean=0.1, reward_bound=0.0
135356: loss=0.000, reward_mean=0.1, reward_bound=0.0
135357: loss=0.000, reward_mean=0.0, reward_bound=0.0
135358: loss=0.000, reward_mean=0.1, reward_bound=0.0
135359: loss=0.000, reward_mean=0.1, reward_bound=0.0
135360: loss=0.000, reward_m

135499: loss=0.000, reward_mean=0.1, reward_bound=0.0
135500: loss=0.000, reward_mean=0.1, reward_bound=0.0
135501: loss=0.000, reward_mean=0.1, reward_bound=0.0
135502: loss=0.000, reward_mean=0.1, reward_bound=0.0
135503: loss=0.000, reward_mean=0.1, reward_bound=0.0
135504: loss=0.000, reward_mean=0.1, reward_bound=0.0
135505: loss=0.000, reward_mean=0.0, reward_bound=0.0
135506: loss=0.000, reward_mean=0.0, reward_bound=0.0
135507: loss=0.000, reward_mean=0.1, reward_bound=0.0
135508: loss=0.000, reward_mean=0.0, reward_bound=0.0
135509: loss=0.000, reward_mean=0.1, reward_bound=0.0
135510: loss=0.000, reward_mean=0.0, reward_bound=0.0
135511: loss=0.000, reward_mean=0.0, reward_bound=0.0
135512: loss=0.000, reward_mean=0.1, reward_bound=0.0
135513: loss=0.000, reward_mean=0.0, reward_bound=0.0
135514: loss=0.000, reward_mean=0.0, reward_bound=0.0
135515: loss=0.000, reward_mean=0.0, reward_bound=0.0
135516: loss=0.000, reward_mean=0.0, reward_bound=0.0
135517: loss=0.000, reward_m

135653: loss=0.000, reward_mean=0.1, reward_bound=0.0
135654: loss=0.000, reward_mean=0.1, reward_bound=0.0
135655: loss=0.000, reward_mean=0.1, reward_bound=0.0
135656: loss=0.000, reward_mean=0.2, reward_bound=0.0
135657: loss=0.000, reward_mean=0.0, reward_bound=0.0
135658: loss=0.000, reward_mean=0.2, reward_bound=0.0
135659: loss=0.000, reward_mean=0.1, reward_bound=0.0
135660: loss=0.000, reward_mean=0.0, reward_bound=0.0
135661: loss=0.000, reward_mean=0.0, reward_bound=0.0
135662: loss=0.000, reward_mean=0.1, reward_bound=0.0
135663: loss=0.000, reward_mean=0.0, reward_bound=0.0
135664: loss=0.000, reward_mean=0.1, reward_bound=0.0
135665: loss=0.000, reward_mean=0.0, reward_bound=0.0
135666: loss=0.000, reward_mean=0.0, reward_bound=0.0
135667: loss=0.000, reward_mean=0.1, reward_bound=0.0
135668: loss=0.000, reward_mean=0.0, reward_bound=0.0
135669: loss=0.000, reward_mean=0.1, reward_bound=0.0
135670: loss=0.000, reward_mean=0.0, reward_bound=0.0
135671: loss=0.000, reward_m

135806: loss=0.000, reward_mean=0.1, reward_bound=0.0
135807: loss=0.000, reward_mean=0.0, reward_bound=0.0
135808: loss=0.000, reward_mean=0.1, reward_bound=0.0
135809: loss=0.000, reward_mean=0.0, reward_bound=0.0
135810: loss=0.000, reward_mean=0.0, reward_bound=0.0
135811: loss=0.000, reward_mean=0.1, reward_bound=0.0
135812: loss=0.000, reward_mean=0.1, reward_bound=0.0
135813: loss=0.000, reward_mean=0.2, reward_bound=0.0
135814: loss=0.000, reward_mean=0.0, reward_bound=0.0
135815: loss=0.000, reward_mean=0.1, reward_bound=0.0
135816: loss=0.000, reward_mean=0.2, reward_bound=0.0
135817: loss=0.000, reward_mean=0.0, reward_bound=0.0
135818: loss=0.000, reward_mean=0.2, reward_bound=0.0
135819: loss=0.000, reward_mean=0.1, reward_bound=0.0
135820: loss=0.000, reward_mean=0.0, reward_bound=0.0
135821: loss=0.000, reward_mean=0.0, reward_bound=0.0
135822: loss=0.000, reward_mean=0.1, reward_bound=0.0
135823: loss=0.000, reward_mean=0.0, reward_bound=0.0
135824: loss=0.000, reward_m

135958: loss=0.000, reward_mean=0.1, reward_bound=0.0
135959: loss=0.000, reward_mean=0.0, reward_bound=0.0
135960: loss=0.000, reward_mean=0.2, reward_bound=0.0
135961: loss=0.000, reward_mean=0.0, reward_bound=0.0
135962: loss=0.000, reward_mean=0.1, reward_bound=0.0
135963: loss=0.000, reward_mean=0.2, reward_bound=0.0
135964: loss=0.000, reward_mean=0.1, reward_bound=0.0
135965: loss=0.000, reward_mean=0.1, reward_bound=0.0
135966: loss=0.000, reward_mean=0.0, reward_bound=0.0
135967: loss=0.000, reward_mean=0.0, reward_bound=0.0
135968: loss=0.000, reward_mean=0.0, reward_bound=0.0
135969: loss=0.000, reward_mean=0.1, reward_bound=0.0
135970: loss=0.000, reward_mean=0.0, reward_bound=0.0
135971: loss=0.000, reward_mean=0.0, reward_bound=0.0
135972: loss=0.000, reward_mean=0.0, reward_bound=0.0
135973: loss=0.000, reward_mean=0.1, reward_bound=0.0
135974: loss=0.000, reward_mean=0.1, reward_bound=0.0
135975: loss=0.000, reward_mean=0.1, reward_bound=0.0
135976: loss=0.000, reward_m

136114: loss=0.000, reward_mean=0.0, reward_bound=0.0
136115: loss=0.000, reward_mean=0.1, reward_bound=0.0
136116: loss=0.000, reward_mean=0.1, reward_bound=0.0
136117: loss=0.000, reward_mean=0.1, reward_bound=0.0
136118: loss=0.000, reward_mean=0.0, reward_bound=0.0
136119: loss=0.000, reward_mean=0.0, reward_bound=0.0
136120: loss=0.000, reward_mean=0.0, reward_bound=0.0
136121: loss=0.000, reward_mean=0.1, reward_bound=0.0
136122: loss=0.000, reward_mean=0.1, reward_bound=0.0
136123: loss=0.000, reward_mean=0.0, reward_bound=0.0
136124: loss=0.000, reward_mean=0.1, reward_bound=0.0
136125: loss=0.000, reward_mean=0.1, reward_bound=0.0
136126: loss=0.000, reward_mean=0.0, reward_bound=0.0
136127: loss=0.000, reward_mean=0.0, reward_bound=0.0
136128: loss=0.000, reward_mean=0.1, reward_bound=0.0
136129: loss=0.000, reward_mean=0.0, reward_bound=0.0
136130: loss=0.000, reward_mean=0.1, reward_bound=0.0
136131: loss=0.000, reward_mean=0.2, reward_bound=0.0
136132: loss=0.000, reward_m

136269: loss=0.000, reward_mean=0.1, reward_bound=0.0
136270: loss=0.000, reward_mean=0.0, reward_bound=0.0
136271: loss=0.000, reward_mean=0.2, reward_bound=0.0
136272: loss=0.000, reward_mean=0.1, reward_bound=0.0
136273: loss=0.000, reward_mean=0.0, reward_bound=0.0
136274: loss=0.000, reward_mean=0.0, reward_bound=0.0
136275: loss=0.000, reward_mean=0.1, reward_bound=0.0
136276: loss=0.000, reward_mean=0.1, reward_bound=0.0
136277: loss=0.000, reward_mean=0.0, reward_bound=0.0
136278: loss=0.000, reward_mean=0.1, reward_bound=0.0
136279: loss=0.000, reward_mean=0.0, reward_bound=0.0
136280: loss=0.000, reward_mean=0.0, reward_bound=0.0
136281: loss=0.000, reward_mean=0.1, reward_bound=0.0
136282: loss=0.000, reward_mean=0.0, reward_bound=0.0
136283: loss=0.000, reward_mean=0.0, reward_bound=0.0
136284: loss=0.000, reward_mean=0.1, reward_bound=0.0
136285: loss=0.000, reward_mean=0.2, reward_bound=0.0
136286: loss=0.000, reward_mean=0.0, reward_bound=0.0
136287: loss=0.000, reward_m

136425: loss=0.000, reward_mean=0.1, reward_bound=0.0
136426: loss=0.000, reward_mean=0.2, reward_bound=0.0
136427: loss=0.000, reward_mean=0.1, reward_bound=0.0
136428: loss=0.000, reward_mean=0.1, reward_bound=0.0
136429: loss=0.000, reward_mean=0.0, reward_bound=0.0
136430: loss=0.000, reward_mean=0.1, reward_bound=0.0
136431: loss=0.000, reward_mean=0.1, reward_bound=0.0
136432: loss=0.000, reward_mean=0.0, reward_bound=0.0
136433: loss=0.000, reward_mean=0.1, reward_bound=0.0
136434: loss=0.000, reward_mean=0.1, reward_bound=0.0
136435: loss=0.000, reward_mean=0.1, reward_bound=0.0
136436: loss=0.000, reward_mean=0.1, reward_bound=0.0
136437: loss=0.000, reward_mean=0.1, reward_bound=0.0
136438: loss=0.000, reward_mean=0.1, reward_bound=0.0
136439: loss=0.000, reward_mean=0.0, reward_bound=0.0
136440: loss=0.000, reward_mean=0.1, reward_bound=0.0
136441: loss=0.000, reward_mean=0.1, reward_bound=0.0
136442: loss=0.000, reward_mean=0.0, reward_bound=0.0
136443: loss=0.000, reward_m

136578: loss=0.000, reward_mean=0.0, reward_bound=0.0
136579: loss=0.000, reward_mean=0.1, reward_bound=0.0
136580: loss=0.000, reward_mean=0.0, reward_bound=0.0
136581: loss=0.000, reward_mean=0.2, reward_bound=0.0
136582: loss=0.000, reward_mean=0.1, reward_bound=0.0
136583: loss=0.000, reward_mean=0.1, reward_bound=0.0
136584: loss=0.000, reward_mean=0.1, reward_bound=0.0
136585: loss=0.000, reward_mean=0.0, reward_bound=0.0
136586: loss=0.000, reward_mean=0.0, reward_bound=0.0
136587: loss=0.000, reward_mean=0.1, reward_bound=0.0
136588: loss=0.000, reward_mean=0.2, reward_bound=0.0
136589: loss=0.000, reward_mean=0.1, reward_bound=0.0
136590: loss=0.000, reward_mean=0.1, reward_bound=0.0
136591: loss=0.000, reward_mean=0.1, reward_bound=0.0
136592: loss=0.000, reward_mean=0.0, reward_bound=0.0
136593: loss=0.000, reward_mean=0.0, reward_bound=0.0
136594: loss=0.000, reward_mean=0.1, reward_bound=0.0
136595: loss=0.000, reward_mean=0.0, reward_bound=0.0
136596: loss=0.000, reward_m

136731: loss=0.000, reward_mean=0.0, reward_bound=0.0
136732: loss=0.000, reward_mean=0.1, reward_bound=0.0
136733: loss=0.000, reward_mean=0.0, reward_bound=0.0
136734: loss=0.000, reward_mean=0.1, reward_bound=0.0
136735: loss=0.000, reward_mean=0.2, reward_bound=0.0
136736: loss=0.000, reward_mean=0.0, reward_bound=0.0
136737: loss=0.000, reward_mean=0.0, reward_bound=0.0
136738: loss=0.000, reward_mean=0.2, reward_bound=0.0
136739: loss=0.000, reward_mean=0.0, reward_bound=0.0
136740: loss=0.000, reward_mean=0.0, reward_bound=0.0
136741: loss=0.000, reward_mean=0.1, reward_bound=0.0
136742: loss=0.000, reward_mean=0.0, reward_bound=0.0
136743: loss=0.000, reward_mean=0.1, reward_bound=0.0
136744: loss=0.000, reward_mean=0.0, reward_bound=0.0
136745: loss=0.000, reward_mean=0.1, reward_bound=0.0
136746: loss=0.000, reward_mean=0.0, reward_bound=0.0
136747: loss=0.000, reward_mean=0.1, reward_bound=0.0
136748: loss=0.000, reward_mean=0.0, reward_bound=0.0
136749: loss=0.000, reward_m

136884: loss=0.000, reward_mean=0.0, reward_bound=0.0
136885: loss=0.000, reward_mean=0.1, reward_bound=0.0
136886: loss=0.000, reward_mean=0.1, reward_bound=0.0
136887: loss=0.000, reward_mean=0.1, reward_bound=0.0
136888: loss=0.000, reward_mean=0.1, reward_bound=0.0
136889: loss=0.000, reward_mean=0.0, reward_bound=0.0
136890: loss=0.000, reward_mean=0.0, reward_bound=0.0
136891: loss=0.000, reward_mean=0.1, reward_bound=0.0
136892: loss=0.000, reward_mean=0.2, reward_bound=0.0
136893: loss=0.000, reward_mean=0.0, reward_bound=0.0
136894: loss=0.000, reward_mean=0.0, reward_bound=0.0
136895: loss=0.000, reward_mean=0.1, reward_bound=0.0
136896: loss=0.000, reward_mean=0.1, reward_bound=0.0
136897: loss=0.000, reward_mean=0.0, reward_bound=0.0
136898: loss=0.000, reward_mean=0.0, reward_bound=0.0
136899: loss=0.000, reward_mean=0.1, reward_bound=0.0
136900: loss=0.000, reward_mean=0.1, reward_bound=0.0
136901: loss=0.000, reward_mean=0.1, reward_bound=0.0
136902: loss=0.000, reward_m

137043: loss=0.000, reward_mean=0.0, reward_bound=0.0
137044: loss=0.000, reward_mean=0.0, reward_bound=0.0
137045: loss=0.000, reward_mean=0.0, reward_bound=0.0
137046: loss=0.000, reward_mean=0.1, reward_bound=0.0
137047: loss=0.000, reward_mean=0.0, reward_bound=0.0
137048: loss=0.000, reward_mean=0.1, reward_bound=0.0
137049: loss=0.000, reward_mean=0.1, reward_bound=0.0
137050: loss=0.000, reward_mean=0.1, reward_bound=0.0
137051: loss=0.000, reward_mean=0.0, reward_bound=0.0
137052: loss=0.000, reward_mean=0.1, reward_bound=0.0
137053: loss=0.000, reward_mean=0.3, reward_bound=0.5
137054: loss=0.000, reward_mean=0.1, reward_bound=0.0
137055: loss=0.000, reward_mean=0.2, reward_bound=0.0
137056: loss=0.000, reward_mean=0.1, reward_bound=0.0
137057: loss=0.000, reward_mean=0.1, reward_bound=0.0
137058: loss=0.000, reward_mean=0.1, reward_bound=0.0
137059: loss=0.000, reward_mean=0.0, reward_bound=0.0
137060: loss=0.000, reward_mean=0.0, reward_bound=0.0
137061: loss=0.000, reward_m

137198: loss=0.000, reward_mean=0.1, reward_bound=0.0
137199: loss=0.000, reward_mean=0.1, reward_bound=0.0
137200: loss=0.000, reward_mean=0.1, reward_bound=0.0
137201: loss=0.000, reward_mean=0.1, reward_bound=0.0
137202: loss=0.000, reward_mean=0.0, reward_bound=0.0
137203: loss=0.000, reward_mean=0.1, reward_bound=0.0
137204: loss=0.000, reward_mean=0.1, reward_bound=0.0
137205: loss=0.000, reward_mean=0.0, reward_bound=0.0
137206: loss=0.000, reward_mean=0.1, reward_bound=0.0
137207: loss=0.000, reward_mean=0.1, reward_bound=0.0
137208: loss=0.000, reward_mean=0.0, reward_bound=0.0
137209: loss=0.000, reward_mean=0.1, reward_bound=0.0
137210: loss=0.000, reward_mean=0.1, reward_bound=0.0
137211: loss=0.000, reward_mean=0.1, reward_bound=0.0
137212: loss=0.000, reward_mean=0.1, reward_bound=0.0
137213: loss=0.000, reward_mean=0.0, reward_bound=0.0
137214: loss=0.000, reward_mean=0.1, reward_bound=0.0
137215: loss=0.000, reward_mean=0.0, reward_bound=0.0
137216: loss=0.000, reward_m

137354: loss=0.000, reward_mean=0.1, reward_bound=0.0
137355: loss=0.000, reward_mean=0.1, reward_bound=0.0
137356: loss=0.000, reward_mean=0.1, reward_bound=0.0
137357: loss=0.000, reward_mean=0.1, reward_bound=0.0
137358: loss=0.000, reward_mean=0.0, reward_bound=0.0
137359: loss=0.000, reward_mean=0.1, reward_bound=0.0
137360: loss=0.000, reward_mean=0.1, reward_bound=0.0
137361: loss=0.000, reward_mean=0.1, reward_bound=0.0
137362: loss=0.000, reward_mean=0.0, reward_bound=0.0
137363: loss=0.000, reward_mean=0.0, reward_bound=0.0
137364: loss=0.000, reward_mean=0.0, reward_bound=0.0
137365: loss=0.000, reward_mean=0.1, reward_bound=0.0
137366: loss=0.000, reward_mean=0.1, reward_bound=0.0
137367: loss=0.000, reward_mean=0.0, reward_bound=0.0
137368: loss=0.000, reward_mean=0.0, reward_bound=0.0
137369: loss=0.000, reward_mean=0.2, reward_bound=0.0
137370: loss=0.000, reward_mean=0.0, reward_bound=0.0
137371: loss=0.000, reward_mean=0.0, reward_bound=0.0
137372: loss=0.000, reward_m

137510: loss=0.000, reward_mean=0.0, reward_bound=0.0
137511: loss=0.000, reward_mean=0.0, reward_bound=0.0
137512: loss=0.000, reward_mean=0.1, reward_bound=0.0
137513: loss=0.000, reward_mean=0.0, reward_bound=0.0
137514: loss=0.000, reward_mean=0.0, reward_bound=0.0
137515: loss=0.000, reward_mean=0.1, reward_bound=0.0
137516: loss=0.000, reward_mean=0.0, reward_bound=0.0
137517: loss=0.000, reward_mean=0.0, reward_bound=0.0
137518: loss=0.000, reward_mean=0.1, reward_bound=0.0
137519: loss=0.000, reward_mean=0.1, reward_bound=0.0
137520: loss=0.000, reward_mean=0.1, reward_bound=0.0
137521: loss=0.000, reward_mean=0.1, reward_bound=0.0
137522: loss=0.000, reward_mean=0.0, reward_bound=0.0
137523: loss=0.000, reward_mean=0.1, reward_bound=0.0
137524: loss=0.000, reward_mean=0.0, reward_bound=0.0
137525: loss=0.000, reward_mean=0.1, reward_bound=0.0
137526: loss=0.000, reward_mean=0.1, reward_bound=0.0
137527: loss=0.000, reward_mean=0.0, reward_bound=0.0
137528: loss=0.000, reward_m

137663: loss=0.000, reward_mean=0.0, reward_bound=0.0
137664: loss=0.000, reward_mean=0.1, reward_bound=0.0
137665: loss=0.000, reward_mean=0.1, reward_bound=0.0
137666: loss=0.000, reward_mean=0.0, reward_bound=0.0
137667: loss=0.000, reward_mean=0.0, reward_bound=0.0
137668: loss=0.000, reward_mean=0.2, reward_bound=0.0
137669: loss=0.000, reward_mean=0.2, reward_bound=0.0
137670: loss=0.000, reward_mean=0.0, reward_bound=0.0
137671: loss=0.000, reward_mean=0.0, reward_bound=0.0
137672: loss=0.000, reward_mean=0.0, reward_bound=0.0
137673: loss=0.000, reward_mean=0.0, reward_bound=0.0
137674: loss=0.000, reward_mean=0.0, reward_bound=0.0
137675: loss=0.000, reward_mean=0.0, reward_bound=0.0
137676: loss=0.000, reward_mean=0.1, reward_bound=0.0
137677: loss=0.000, reward_mean=0.0, reward_bound=0.0
137678: loss=0.000, reward_mean=0.1, reward_bound=0.0
137679: loss=0.000, reward_mean=0.1, reward_bound=0.0
137680: loss=0.000, reward_mean=0.2, reward_bound=0.0
137681: loss=0.000, reward_m

137816: loss=0.000, reward_mean=0.1, reward_bound=0.0
137817: loss=0.000, reward_mean=0.0, reward_bound=0.0
137818: loss=0.000, reward_mean=0.1, reward_bound=0.0
137819: loss=0.000, reward_mean=0.0, reward_bound=0.0
137820: loss=0.000, reward_mean=0.1, reward_bound=0.0
137821: loss=0.000, reward_mean=0.2, reward_bound=0.0
137822: loss=0.000, reward_mean=0.1, reward_bound=0.0
137823: loss=0.000, reward_mean=0.1, reward_bound=0.0
137824: loss=0.000, reward_mean=0.0, reward_bound=0.0
137825: loss=0.000, reward_mean=0.1, reward_bound=0.0
137826: loss=0.000, reward_mean=0.1, reward_bound=0.0
137827: loss=0.000, reward_mean=0.1, reward_bound=0.0
137828: loss=0.000, reward_mean=0.0, reward_bound=0.0
137829: loss=0.000, reward_mean=0.2, reward_bound=0.0
137830: loss=0.000, reward_mean=0.0, reward_bound=0.0
137831: loss=0.000, reward_mean=0.0, reward_bound=0.0
137832: loss=0.000, reward_mean=0.1, reward_bound=0.0
137833: loss=0.000, reward_mean=0.1, reward_bound=0.0
137834: loss=0.000, reward_m

137969: loss=0.000, reward_mean=0.0, reward_bound=0.0
137970: loss=0.000, reward_mean=0.1, reward_bound=0.0
137971: loss=0.000, reward_mean=0.0, reward_bound=0.0
137972: loss=0.000, reward_mean=0.1, reward_bound=0.0
137973: loss=0.000, reward_mean=0.1, reward_bound=0.0
137974: loss=0.000, reward_mean=0.1, reward_bound=0.0
137975: loss=0.000, reward_mean=0.1, reward_bound=0.0
137976: loss=0.000, reward_mean=0.0, reward_bound=0.0
137977: loss=0.000, reward_mean=0.0, reward_bound=0.0
137978: loss=0.000, reward_mean=0.0, reward_bound=0.0
137979: loss=0.000, reward_mean=0.0, reward_bound=0.0
137980: loss=0.000, reward_mean=0.1, reward_bound=0.0
137981: loss=0.000, reward_mean=0.0, reward_bound=0.0
137982: loss=0.000, reward_mean=0.2, reward_bound=0.0
137983: loss=0.000, reward_mean=0.0, reward_bound=0.0
137984: loss=0.000, reward_mean=0.1, reward_bound=0.0
137985: loss=0.000, reward_mean=0.0, reward_bound=0.0
137986: loss=0.000, reward_mean=0.1, reward_bound=0.0
137987: loss=0.000, reward_m

138123: loss=0.000, reward_mean=0.0, reward_bound=0.0
138124: loss=0.000, reward_mean=0.1, reward_bound=0.0
138125: loss=0.000, reward_mean=0.1, reward_bound=0.0
138126: loss=0.000, reward_mean=0.0, reward_bound=0.0
138127: loss=0.000, reward_mean=0.2, reward_bound=0.0
138128: loss=0.000, reward_mean=0.1, reward_bound=0.0
138129: loss=0.000, reward_mean=0.0, reward_bound=0.0
138130: loss=0.000, reward_mean=0.1, reward_bound=0.0
138131: loss=0.000, reward_mean=0.1, reward_bound=0.0
138132: loss=0.000, reward_mean=0.1, reward_bound=0.0
138133: loss=0.000, reward_mean=0.0, reward_bound=0.0
138134: loss=0.000, reward_mean=0.1, reward_bound=0.0
138135: loss=0.000, reward_mean=0.0, reward_bound=0.0
138136: loss=0.000, reward_mean=0.0, reward_bound=0.0
138137: loss=0.000, reward_mean=0.1, reward_bound=0.0
138138: loss=0.000, reward_mean=0.1, reward_bound=0.0
138139: loss=0.000, reward_mean=0.0, reward_bound=0.0
138140: loss=0.000, reward_mean=0.1, reward_bound=0.0
138141: loss=0.000, reward_m

138276: loss=0.000, reward_mean=0.0, reward_bound=0.0
138277: loss=0.000, reward_mean=0.1, reward_bound=0.0
138278: loss=0.000, reward_mean=0.1, reward_bound=0.0
138279: loss=0.000, reward_mean=0.2, reward_bound=0.0
138280: loss=0.000, reward_mean=0.0, reward_bound=0.0
138281: loss=0.000, reward_mean=0.1, reward_bound=0.0
138282: loss=0.000, reward_mean=0.0, reward_bound=0.0
138283: loss=0.000, reward_mean=0.1, reward_bound=0.0
138284: loss=0.000, reward_mean=0.0, reward_bound=0.0
138285: loss=0.000, reward_mean=0.0, reward_bound=0.0
138286: loss=0.000, reward_mean=0.0, reward_bound=0.0
138287: loss=0.000, reward_mean=0.1, reward_bound=0.0
138288: loss=0.000, reward_mean=0.1, reward_bound=0.0
138289: loss=0.000, reward_mean=0.1, reward_bound=0.0
138290: loss=0.000, reward_mean=0.1, reward_bound=0.0
138291: loss=0.000, reward_mean=0.1, reward_bound=0.0
138292: loss=0.000, reward_mean=0.1, reward_bound=0.0
138293: loss=0.000, reward_mean=0.0, reward_bound=0.0
138294: loss=0.000, reward_m

138429: loss=0.000, reward_mean=0.1, reward_bound=0.0
138430: loss=0.000, reward_mean=0.0, reward_bound=0.0
138431: loss=0.000, reward_mean=0.0, reward_bound=0.0
138432: loss=0.000, reward_mean=0.0, reward_bound=0.0
138433: loss=0.000, reward_mean=0.0, reward_bound=0.0
138434: loss=0.000, reward_mean=0.1, reward_bound=0.0
138435: loss=0.000, reward_mean=0.1, reward_bound=0.0
138436: loss=0.000, reward_mean=0.1, reward_bound=0.0
138437: loss=0.000, reward_mean=0.0, reward_bound=0.0
138438: loss=0.000, reward_mean=0.2, reward_bound=0.0
138439: loss=0.000, reward_mean=0.1, reward_bound=0.0
138440: loss=0.000, reward_mean=0.1, reward_bound=0.0
138441: loss=0.000, reward_mean=0.1, reward_bound=0.0
138442: loss=0.000, reward_mean=0.0, reward_bound=0.0
138443: loss=0.000, reward_mean=0.0, reward_bound=0.0
138444: loss=0.000, reward_mean=0.0, reward_bound=0.0
138445: loss=0.000, reward_mean=0.0, reward_bound=0.0
138446: loss=0.000, reward_mean=0.0, reward_bound=0.0
138447: loss=0.000, reward_m

138584: loss=0.000, reward_mean=0.0, reward_bound=0.0
138585: loss=0.000, reward_mean=0.0, reward_bound=0.0
138586: loss=0.000, reward_mean=0.0, reward_bound=0.0
138587: loss=0.000, reward_mean=0.1, reward_bound=0.0
138588: loss=0.000, reward_mean=0.1, reward_bound=0.0
138589: loss=0.000, reward_mean=0.0, reward_bound=0.0
138590: loss=0.000, reward_mean=0.0, reward_bound=0.0
138591: loss=0.000, reward_mean=0.1, reward_bound=0.0
138592: loss=0.000, reward_mean=0.1, reward_bound=0.0
138593: loss=0.000, reward_mean=0.0, reward_bound=0.0
138594: loss=0.000, reward_mean=0.0, reward_bound=0.0
138595: loss=0.000, reward_mean=0.1, reward_bound=0.0
138596: loss=0.000, reward_mean=0.1, reward_bound=0.0
138597: loss=0.000, reward_mean=0.1, reward_bound=0.0
138598: loss=0.000, reward_mean=0.0, reward_bound=0.0
138599: loss=0.000, reward_mean=0.0, reward_bound=0.0
138600: loss=0.000, reward_mean=0.0, reward_bound=0.0
138601: loss=0.000, reward_mean=0.1, reward_bound=0.0
138602: loss=0.000, reward_m

138736: loss=0.000, reward_mean=0.0, reward_bound=0.0
138737: loss=0.000, reward_mean=0.0, reward_bound=0.0
138738: loss=0.000, reward_mean=0.1, reward_bound=0.0
138739: loss=0.000, reward_mean=0.1, reward_bound=0.0
138740: loss=0.000, reward_mean=0.1, reward_bound=0.0
138741: loss=0.000, reward_mean=0.1, reward_bound=0.0
138742: loss=0.000, reward_mean=0.1, reward_bound=0.0
138743: loss=0.000, reward_mean=0.1, reward_bound=0.0
138744: loss=0.000, reward_mean=0.1, reward_bound=0.0
138745: loss=0.000, reward_mean=0.0, reward_bound=0.0
138746: loss=0.000, reward_mean=0.1, reward_bound=0.0
138747: loss=0.000, reward_mean=0.2, reward_bound=0.0
138748: loss=0.000, reward_mean=0.0, reward_bound=0.0
138749: loss=0.000, reward_mean=0.1, reward_bound=0.0
138750: loss=0.000, reward_mean=0.1, reward_bound=0.0
138751: loss=0.000, reward_mean=0.1, reward_bound=0.0
138752: loss=0.000, reward_mean=0.0, reward_bound=0.0
138753: loss=0.000, reward_mean=0.1, reward_bound=0.0
138754: loss=0.000, reward_m

138891: loss=0.000, reward_mean=0.1, reward_bound=0.0
138892: loss=0.000, reward_mean=0.1, reward_bound=0.0
138893: loss=0.000, reward_mean=0.1, reward_bound=0.0
138894: loss=0.000, reward_mean=0.2, reward_bound=0.0
138895: loss=0.000, reward_mean=0.2, reward_bound=0.0
138896: loss=0.000, reward_mean=0.1, reward_bound=0.0
138897: loss=0.000, reward_mean=0.0, reward_bound=0.0
138898: loss=0.000, reward_mean=0.0, reward_bound=0.0
138899: loss=0.000, reward_mean=0.1, reward_bound=0.0
138900: loss=0.000, reward_mean=0.0, reward_bound=0.0
138901: loss=0.000, reward_mean=0.0, reward_bound=0.0
138902: loss=0.000, reward_mean=0.0, reward_bound=0.0
138903: loss=0.000, reward_mean=0.0, reward_bound=0.0
138904: loss=0.000, reward_mean=0.1, reward_bound=0.0
138905: loss=0.000, reward_mean=0.1, reward_bound=0.0
138906: loss=0.000, reward_mean=0.1, reward_bound=0.0
138907: loss=0.000, reward_mean=0.1, reward_bound=0.0
138908: loss=0.000, reward_mean=0.1, reward_bound=0.0
138909: loss=0.000, reward_m

139046: loss=0.000, reward_mean=0.1, reward_bound=0.0
139047: loss=0.000, reward_mean=0.1, reward_bound=0.0
139048: loss=0.000, reward_mean=0.0, reward_bound=0.0
139049: loss=0.000, reward_mean=0.1, reward_bound=0.0
139050: loss=0.000, reward_mean=0.0, reward_bound=0.0
139051: loss=0.000, reward_mean=0.0, reward_bound=0.0
139052: loss=0.000, reward_mean=0.1, reward_bound=0.0
139053: loss=0.000, reward_mean=0.1, reward_bound=0.0
139054: loss=0.000, reward_mean=0.0, reward_bound=0.0
139055: loss=0.000, reward_mean=0.1, reward_bound=0.0
139056: loss=0.000, reward_mean=0.0, reward_bound=0.0
139057: loss=0.000, reward_mean=0.0, reward_bound=0.0
139058: loss=0.000, reward_mean=0.1, reward_bound=0.0
139059: loss=0.000, reward_mean=0.1, reward_bound=0.0
139060: loss=0.000, reward_mean=0.1, reward_bound=0.0
139061: loss=0.000, reward_mean=0.0, reward_bound=0.0
139062: loss=0.000, reward_mean=0.0, reward_bound=0.0
139063: loss=0.000, reward_mean=0.0, reward_bound=0.0
139064: loss=0.000, reward_m

139201: loss=0.000, reward_mean=0.1, reward_bound=0.0
139202: loss=0.000, reward_mean=0.0, reward_bound=0.0
139203: loss=0.000, reward_mean=0.0, reward_bound=0.0
139204: loss=0.000, reward_mean=0.1, reward_bound=0.0
139205: loss=0.000, reward_mean=0.0, reward_bound=0.0
139206: loss=0.000, reward_mean=0.1, reward_bound=0.0
139207: loss=0.000, reward_mean=0.1, reward_bound=0.0
139208: loss=0.000, reward_mean=0.0, reward_bound=0.0
139209: loss=0.000, reward_mean=0.0, reward_bound=0.0
139210: loss=0.000, reward_mean=0.0, reward_bound=0.0
139211: loss=0.000, reward_mean=0.0, reward_bound=0.0
139212: loss=0.000, reward_mean=0.1, reward_bound=0.0
139213: loss=0.000, reward_mean=0.0, reward_bound=0.0
139214: loss=0.000, reward_mean=0.1, reward_bound=0.0
139215: loss=0.000, reward_mean=0.1, reward_bound=0.0
139216: loss=0.000, reward_mean=0.1, reward_bound=0.0
139217: loss=0.000, reward_mean=0.1, reward_bound=0.0
139218: loss=0.000, reward_mean=0.0, reward_bound=0.0
139219: loss=0.000, reward_m

139355: loss=0.000, reward_mean=0.0, reward_bound=0.0
139356: loss=0.000, reward_mean=0.0, reward_bound=0.0
139357: loss=0.000, reward_mean=0.2, reward_bound=0.0
139358: loss=0.000, reward_mean=0.1, reward_bound=0.0
139359: loss=0.000, reward_mean=0.1, reward_bound=0.0
139360: loss=0.000, reward_mean=0.1, reward_bound=0.0
139361: loss=0.000, reward_mean=0.1, reward_bound=0.0
139362: loss=0.000, reward_mean=0.0, reward_bound=0.0
139363: loss=0.000, reward_mean=0.0, reward_bound=0.0
139364: loss=0.000, reward_mean=0.0, reward_bound=0.0
139365: loss=0.000, reward_mean=0.1, reward_bound=0.0
139366: loss=0.000, reward_mean=0.0, reward_bound=0.0
139367: loss=0.000, reward_mean=0.0, reward_bound=0.0
139368: loss=0.000, reward_mean=0.1, reward_bound=0.0
139369: loss=0.000, reward_mean=0.1, reward_bound=0.0
139370: loss=0.000, reward_mean=0.1, reward_bound=0.0
139371: loss=0.000, reward_mean=0.0, reward_bound=0.0
139372: loss=0.000, reward_mean=0.2, reward_bound=0.0
139373: loss=0.000, reward_m

139514: loss=0.000, reward_mean=0.1, reward_bound=0.0
139515: loss=0.000, reward_mean=0.0, reward_bound=0.0
139516: loss=0.000, reward_mean=0.1, reward_bound=0.0
139517: loss=0.000, reward_mean=0.1, reward_bound=0.0
139518: loss=0.000, reward_mean=0.1, reward_bound=0.0
139519: loss=0.000, reward_mean=0.0, reward_bound=0.0
139520: loss=0.000, reward_mean=0.0, reward_bound=0.0
139521: loss=0.000, reward_mean=0.1, reward_bound=0.0
139522: loss=0.000, reward_mean=0.1, reward_bound=0.0
139523: loss=0.000, reward_mean=0.0, reward_bound=0.0
139524: loss=0.000, reward_mean=0.1, reward_bound=0.0
139525: loss=0.000, reward_mean=0.0, reward_bound=0.0
139526: loss=0.000, reward_mean=0.1, reward_bound=0.0
139527: loss=0.000, reward_mean=0.1, reward_bound=0.0
139528: loss=0.000, reward_mean=0.1, reward_bound=0.0
139529: loss=0.000, reward_mean=0.0, reward_bound=0.0
139530: loss=0.000, reward_mean=0.0, reward_bound=0.0
139531: loss=0.000, reward_mean=0.0, reward_bound=0.0
139532: loss=0.000, reward_m

139666: loss=0.000, reward_mean=0.0, reward_bound=0.0
139667: loss=0.000, reward_mean=0.1, reward_bound=0.0
139668: loss=0.000, reward_mean=0.1, reward_bound=0.0
139669: loss=0.000, reward_mean=0.1, reward_bound=0.0
139670: loss=0.000, reward_mean=0.0, reward_bound=0.0
139671: loss=0.000, reward_mean=0.1, reward_bound=0.0
139672: loss=0.000, reward_mean=0.1, reward_bound=0.0
139673: loss=0.000, reward_mean=0.0, reward_bound=0.0
139674: loss=0.000, reward_mean=0.0, reward_bound=0.0
139675: loss=0.000, reward_mean=0.1, reward_bound=0.0
139676: loss=0.000, reward_mean=0.1, reward_bound=0.0
139677: loss=0.000, reward_mean=0.0, reward_bound=0.0
139678: loss=0.000, reward_mean=0.1, reward_bound=0.0
139679: loss=0.000, reward_mean=0.1, reward_bound=0.0
139680: loss=0.000, reward_mean=0.1, reward_bound=0.0
139681: loss=0.000, reward_mean=0.1, reward_bound=0.0
139682: loss=0.000, reward_mean=0.1, reward_bound=0.0
139683: loss=0.000, reward_mean=0.0, reward_bound=0.0
139684: loss=0.000, reward_m

KeyboardInterrupt: 

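## Results
<img src ="./image/4/t-2.png">

#### Performance does not improve as training continues
- Even after about 139,000 batches, `reward_mean` stays between 0.0 and 0.2 and `reward_bound` stays at 0.0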


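### - Reward structure
#### - CartPole environment: a reward of 1.0 is given at every step until the pole falls
- An agent that keeps the pole up longer collects a higher total reward
- Because of the agent's randomness, the episode rewards roughly follow a normal distribution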

<img src = "./image/4/distribution.png">

- A reward boundary is then chosen so that less successful episodes are discarded and the agent is trained to repeat the better ones
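
The selection step described above can be sketched in a few lines. This is a minimal illustration, not the notebook's training code: the reward values are made up, and `percentile` is a hypothetical pure-Python helper that interpolates linearly between ranks (the same default rule as `np.percentile`), so the snippet runs without extra packages.

```python
import math

PERCENTILE = 70  # same value as the notebook's PERCENTILE constant


def percentile(values, q):
    """q-th percentile with linear interpolation between neighbouring ranks."""
    ordered = sorted(values)
    rank = (q / 100.0) * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


# Hypothetical total rewards of 10 CartPole-like episodes (steps survived).
cartpole_rewards = [12, 25, 18, 40, 33, 21, 15, 29, 50, 22]

reward_bound = percentile(cartpole_rewards, PERCENTILE)
# Keep only the episodes whose total reward reaches the boundary.
elite = [r for r in cartpole_rewards if r >= reward_bound]

print(f"reward_bound={reward_bound:.1f}, elite={elite}")
```

With these numbers the boundary falls at about 30.2, so only the three longest episodes are kept for training.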
    

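#### - FrozenLake environment: a reward of 1.0 is given only when the goal is reached
- The reward does not tell us how good an episode was
- The reward does not tell us how the goal was reached
- The distribution of episode rewards becomes uninformative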

<img src="./image/4/f_distribution.png">
   
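- Rewards are only 0 (failure) or 1 (success)
- Selecting elite episodes by percentile therefore breaks down, and the network ends up training on bad episodes
- As a result, training fails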
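
The failure can be reproduced with plain arithmetic. The sketch below assumes that, as in the CartPole version, episodes with a reward below the boundary are dropped and all others are kept. `percentile` and `filter_stats` are hypothetical helpers written for this illustration; `percentile` mimics the default linear interpolation of `np.percentile`.

```python
import math

BATCH_SIZE = 16  # same values as the notebook's constants
PERCENTILE = 70


def percentile(values, q):
    """q-th percentile with linear interpolation between neighbouring ranks."""
    ordered = sorted(values)
    rank = (q / 100.0) * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def filter_stats(n_success):
    """Return (reward_bound, kept episodes) for a batch with n_success goals."""
    rewards = [1.0] * n_success + [0.0] * (BATCH_SIZE - n_success)
    bound = percentile(rewards, PERCENTILE)
    # Episodes with reward >= bound pass the filter (assumed filter rule).
    kept = [r for r in rewards if r >= bound]
    return bound, len(kept)


for n_success in (1, 2, 5):
    bound, kept = filter_stats(n_success)
    print(f"successes={n_success}: reward_bound={bound:.1f}, kept={kept}/{BATCH_SIZE}")
```

With one or two successes in a batch of 16, the boundary is 0.0 and all 16 episodes pass the filter, failures included. Under this rule the boundary rises above zero only once at least five episodes in the batch succeed, which is consistent with the `reward_bound=0.0` lines in the log above.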

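## Limitations of the cross-entropy method
#### - For training, episodes must be finite and preferably short
#### - The total episode reward needs enough variability to separate good episodes from bad ones
#### - There is no intermediate indication of whether the agent is succeeding or failing
#### - These limitations can be addressed by other methods covered later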


