<a href="https://colab.research.google.com/github/lincolnschick/ML4MC/blob/main/docs/reports/requirement-10-code/hedges_MineRL_BC%2Bscripted.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Documentation Note

This is a modified version of FindCaveTestAll.ipynb from the previous sprint.

All blocks new for this sprint are labeled (New) and have been commented.

# Configuration Steps (New)

1. Get access to the shared drive folder for the project.
2. Add a shortcut to that folder in your own drive using *Organize -> Add Shortcut -> All Locations -> My Drive*.
3. Optional: if you are using multiple machines, set `HOST` (see Set Globals) to keep track of which computer each file dump comes from.
4. Run the Colab file.
5. If the Colab file gets stuck, delete the runtime and start over. Models are named with Unix timestamps, so you do not need to change any variables in the file between runs.

The shared folder must contain a samples.csv file to seed the random sampling.
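
Judging from the sampler defined later in this notebook, samples.csv is read with `csv.DictReader` using a space delimiter and the columns `epochs`, `learning_rate`, and `batch_size`. A minimal sketch of writing and reading a seed file in that shape follows; the temporary directory and the row values are illustrative assumptions, not recommended hyperparameters.

```python
import csv
import os
import tempfile

# Hypothetical seed rows; the values are placeholders, not tuned hyperparameters.
seed_rows = [
    {"epochs": 2, "learning_rate": 0.0001, "batch_size": 32},
    {"epochs": 4, "learning_rate": 0.0005, "batch_size": 64},
]

# A temporary directory stands in for the shared drive folder.
path = os.path.join(tempfile.mkdtemp(), "samples.csv")

# Write a header row plus one space-delimited row per sample.
with open(path, "w", newline="") as csvfile:
    writer = csv.DictWriter(
        csvfile, fieldnames=["epochs", "learning_rate", "batch_size"], delimiter=" "
    )
    writer.writeheader()
    writer.writerows(seed_rows)

# Read the file back the same way the sampler does.
with open(path, newline="") as csvfile:
    population = [
        {
            "epochs": int(row["epochs"]),
            "learning_rate": float(row["learning_rate"]),
            "batch_size": int(row["batch_size"]),
        }
        for row in csv.DictReader(csvfile, delimiter=" ")
    ]

print(population)
```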

## Known Issues (New)

* Folders for each model are created, but the folder target does not update after a certain number of runs.
* Until the above issue is fixed, modelinfo.csv must be renamed to include a timestamp to prevent overwriting.

Because the files are stored on Google Drive, these issues do not permanently overwrite data, but the version history of modelinfo.csv must be used to restore the correct versions. Everything else is timestamped, so the order of the files can be resolved without additional information.
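
One common way for a target path to stop following a changing global in Python is a default argument, which is evaluated once when the function is defined. Whether that fully explains the folder issue listed above is an assumption. The sketch below uses hypothetical names (`STAMP`, `target_frozen`, `target_live`) only to show the pitfall and a call-time alternative.

```python
STAMP = 100  # hypothetical stand-in for a timestamp global

def target_frozen(dst="run_" + str(STAMP)):
    # The default string was computed once, when this function was defined.
    return dst

def target_live(dst=None):
    # Resolve the destination at call time so it follows the current global.
    if dst is None:
        dst = "run_" + str(STAMP)
    return dst

STAMP = 200  # a later run updates the global

print(target_frozen())  # still built from the value at definition time
print(target_live())    # built from the current value
```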

## Set Globals (New)

In [None]:
#(New)
HOST = "host" # The host computer for this process
DESTINATION = "drive/MyDrive/ParallelMineRL/" # The path to the shared drive location

# Mount Drive (New)

In [None]:
#(New)
# We are using Google Drive to support automatic file transfer.
# When Google Drive mounts the user's drive,
# a path accessible by OS commands opens into their drive.
from google.colab import drive
drive.mount('/content/drive') # mount google drive

Mounted at /content/drive


# Setup

In [None]:
!sudo add-apt-repository -y ppa:openjdk-r/ppa
!sudo apt-get purge openjdk-*
!sudo apt-get install openjdk-8-jdk
!sudo apt-get install xvfb
!sudo apt-get install xserver-xephyr
!sudo apt install tigervnc-standalone-server
!sudo apt-get install -y python3-opengl
!sudo apt-get install ffmpeg
!pip3 install gym==0.13.1
!pip3 install minerl==0.4.4
!pip3 install pyvirtualdisplay
!pip3 install -U colabgymrender

PPA publishes dbgsym, you may need to include 'main/debug' component
Repository: 'deb https://ppa.launchpadcontent.net/openjdk-r/ppa/ubuntu/ jammy main'
More info: https://launchpad.net/~openjdk-r/+archive/ubuntu/ppa
Adding repository.
Adding deb entry to /etc/apt/sources.list.d/openjdk-r-ubuntu-ppa-jammy.list
Adding disabled deb-src entry to /etc/apt/sources.list.d/openjdk-r-ubuntu-ppa-jammy.list
Adding key to /etc/apt/trusted.gpg.d/openjdk-r-ubuntu-ppa.gpg with fingerprint DA1A4A13543B466853BAF164EB9B1D8886F44E2A
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:5 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [46.8 kB]
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:7 http://secu

# Import Libraries

In [None]:
import numpy as np
import torch as th
from torch import nn
import gym
import minerl
from tqdm.notebook import tqdm
from colabgymrender.recorder import Recorder
from pyvirtualdisplay import Display
import logging
# logging.disable(logging.ERROR) # reduce clutter, remove if something doesn't work to see the error logs.

In [None]:
#(New)
import os       # for file transfer
import time     # for timestamping
import random   # for random decisions
import csv      # for accessing csv files

# Neural network

In [None]:
class NatureCNN(nn.Module):
    """
    CNN from DQN nature paper:
        Mnih, Volodymyr, et al.
        "Human-level control through deep reinforcement learning."
        Nature 518.7540 (2015): 529-533.

    :param input_shape: A three-item tuple telling image dimensions in (C, H, W)
    :param output_dim: Dimensionality of the output vector
    """

    def __init__(self, input_shape, output_dim):
        super().__init__()
        n_input_channels = input_shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 32, kernel_size=8, stride=4, padding=0),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=0),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=0),
            nn.ReLU(),
            nn.Flatten(),
        )

        # Compute shape by doing one forward pass
        with th.no_grad():
            n_flatten = self.cnn(th.zeros(1, *input_shape)).shape[1]

        self.linear = nn.Sequential(
            nn.Linear(n_flatten, 512),
            nn.ReLU(),
            nn.Linear(512, output_dim)
        )

    def forward(self, observations: th.Tensor) -> th.Tensor:
        return self.linear(self.cnn(observations))
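
The flattened size that the dummy forward pass discovers can also be worked out by hand with the standard convolution output-size formula. The sketch below does this for the 64x64 frames used later in this notebook; `conv_out` is a helper introduced here for illustration and is not part of the notebook.

```python
def conv_out(size, kernel, stride, padding=0):
    # Output length of a convolution along one spatial dimension.
    return (size + 2 * padding - kernel) // stride + 1

# (kernel_size, stride, out_channels) for the three conv layers in NatureCNN.
layers = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]

size = 64      # height and width of the input frames
channels = 3   # RGB input
for kernel, stride, out_channels in layers:
    size = conv_out(size, kernel, stride)
    channels = out_channels

# Number of features entering the first linear layer.
n_flatten = channels * size * size
print(size, n_flatten)
```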

# Environment wrappers

In [None]:
class ActionShaping(gym.ActionWrapper):
    """
    The default MineRL action space is the following dict:

    Dict(attack:Discrete(2),
         back:Discrete(2),
         camera:Box(low=-180.0, high=180.0, shape=(2,)),
         craft:Enum(crafting_table,none,planks,stick,torch),
         equip:Enum(air,iron_axe,iron_pickaxe,none,stone_axe,stone_pickaxe,wooden_axe,wooden_pickaxe),
         forward:Discrete(2),
         jump:Discrete(2),
         left:Discrete(2),
         nearbyCraft:Enum(furnace,iron_axe,iron_pickaxe,none,stone_axe,stone_pickaxe,wooden_axe,wooden_pickaxe),
         nearbySmelt:Enum(coal,iron_ingot,none),
         place:Enum(cobblestone,crafting_table,dirt,furnace,none,stone,torch),
         right:Discrete(2),
         sneak:Discrete(2),
         sprint:Discrete(2))

    It can be viewed as:
         - buttons, like attack, back, forward, sprint that are either pressed or not.
         - mouse, i.e. the continuous camera action in degrees. The two values are pitch (up/down), where up is
           negative, down is positive, and yaw (left/right), where left is negative, right is positive.
         - craft/equip/place actions for items specified above.
    So an example action could be sprint + forward + jump + attack + turn camera, all in one action.

    This wrapper makes the action space much smaller by selecting a few common actions and making the camera actions
    discrete. You can change these actions by changing self._actions below. That should just work with the RL agent,
    but would require some further tinkering below with the BC one.
    """
    def __init__(self, env, camera_angle=10, always_attack=False):
        super().__init__(env)

        self.camera_angle = camera_angle
        self.always_attack = always_attack
        self._actions = [
            [('attack', 1)],
            [('forward', 1)],
            # Actions below not needed for treechop
            # [('back', 1)],
            # [('left', 1)],
            # [('right', 1)],
            # [('jump', 1)],
            # [('forward', 1), ('attack', 1)],
            # [('craft', 'planks')],
            [('forward', 1), ('jump', 1)],
            [('camera', [-self.camera_angle, 0])],
            [('camera', [self.camera_angle, 0])],
            [('camera', [0, self.camera_angle])],
            [('camera', [0, -self.camera_angle])],
        ]

        self.actions = []
        for actions in self._actions:
            act = self.env.action_space.noop()
            for a, v in actions:
                act[a] = v
            if self.always_attack:
                act['attack'] = 1
            self.actions.append(act)

        self.action_space = gym.spaces.Discrete(len(self.actions))

    def action(self, action):
        return self.actions[action]
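
To make the index-to-action mapping concrete without starting MineRL, here is a sketch that builds the same seven actions from a simplified no-op dictionary. `NOOP` and `build_actions` are illustrative stand-ins; the real `env.action_space.noop()` returns a dictionary with more keys.

```python
import copy

# Simplified stand-in for env.action_space.noop(); the real MineRL no-op
# dict also has craft, equip, place and other keys.
NOOP = {"attack": 0, "forward": 0, "jump": 0, "camera": [0, 0]}

def build_actions(camera_angle=10, always_attack=False):
    templates = [
        [("attack", 1)],
        [("forward", 1)],
        [("forward", 1), ("jump", 1)],
        [("camera", [-camera_angle, 0])],
        [("camera", [camera_angle, 0])],
        [("camera", [0, camera_angle])],
        [("camera", [0, -camera_angle])],
    ]
    actions = []
    for template in templates:
        act = copy.deepcopy(NOOP)  # start every action from a fresh no-op
        for key, value in template:
            act[key] = value
        if always_attack:
            act["attack"] = 1
        actions.append(act)
    return actions

actions = build_actions(always_attack=True)
print(len(actions))
print(actions[2])
```

The discrete index chosen by the agent is simply a position in this list, which is why the data parser has to use the same ordering.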

# Data parser

In [None]:
def dataset_action_batch_to_actions(dataset_actions, camera_margin=5):
    """
    Turn a batch of actions from dataset (`batch_iter`) to a numpy
    array that corresponds to batch of actions of ActionShaping wrapper (_actions).

    Camera margin sets the threshold what is considered "moving camera".

    Note: Hardcoded to work for actions in ActionShaping._actions, with "intuitive"
        ordering of actions.
        If you change ActionShaping._actions, remember to change this!

    Array elements are integers corresponding to actions, or "-1"
    for actions that did not have any corresponding discrete match.
    """
    # There are dummy dimensions of shape one
    camera_actions = dataset_actions["camera"].squeeze()
    attack_actions = dataset_actions["attack"].squeeze()
    forward_actions = dataset_actions["forward"].squeeze()
    jump_actions = dataset_actions["jump"].squeeze()
    batch_size = len(camera_actions)
    actions = np.zeros((batch_size,), dtype=np.int64)

    for i in range(len(camera_actions)):
        # Moving camera is most important (pitch first, then yaw)
        if camera_actions[i][0] < -camera_margin:
            actions[i] = 3
        elif camera_actions[i][0] > camera_margin:
            actions[i] = 4
        elif camera_actions[i][1] > camera_margin:
            actions[i] = 5
        elif camera_actions[i][1] < -camera_margin:
            actions[i] = 6
        elif forward_actions[i] == 1:
            if jump_actions[i] == 1:
                actions[i] = 2
            else:
                actions[i] = 1
        elif attack_actions[i] == 1:
            actions[i] = 0
        else:
            # No reasonable mapping (would be no-op)
            actions[i] = -1
    return actions
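
The priority order used above (camera, then forward/jump, then attack) can be checked on single samples with a small stand-alone function. This is a sketch; `label_for` is a hypothetical helper that mirrors the loop body for one sample and does not depend on NumPy or the dataset.

```python
def label_for(camera, forward, jump, attack, camera_margin=5):
    """Return the discrete action index for one sample, or -1 if none fits."""
    pitch, yaw = camera
    if pitch < -camera_margin:
        return 3
    if pitch > camera_margin:
        return 4
    if yaw > camera_margin:
        return 5
    if yaw < -camera_margin:
        return 6
    if forward == 1:
        return 2 if jump == 1 else 1
    if attack == 1:
        return 0
    return -1  # would be a no-op, so the sample is dropped during training

# A camera move wins even when forward and attack are pressed.
print(label_for((-8, 0), forward=1, jump=0, attack=1))
# Forward plus jump maps to the combined action.
print(label_for((0, 0), forward=1, jump=1, attack=1))
```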

# Parameters

### Pull a timestamp to generate the name (New)

In [None]:
#(New)
# We use timestamping to try to prevent duplicate naming
# This also works to create unique names without:
# -- User updating after a restart
# -- Searching through the drive for existing folders
TIMESTAMP = time.time()
print(TIMESTAMP)

1699812264.097381
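
As a sketch of how the timestamp turns into file names, the snippet below rebuilds the model and folder names from the value printed above and converts it to a readable UTC time. The lowercase variable names are used here only to avoid touching the notebook's globals.

```python
from datetime import datetime, timezone

host = "host"                  # same placeholder value as HOST in Set Globals
timestamp = 1699812264.097381  # the value printed above

# Same naming pattern the notebook uses for the model file and its folder.
model_name = host + "_" + str(timestamp) + "_model.pth"
folder_name = host + "_" + str(timestamp) + "_files"

# Convert the Unix timestamp to a readable UTC time for bookkeeping.
readable = datetime.fromtimestamp(timestamp, tz=timezone.utc)

print(model_name)
print(folder_name)
print(readable.isoformat())
```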


### Sampler Definition (New)

In [None]:
#(New)
# We use a random sampler to enable a genetic algorithm.
# On successive steps, the GA can sample from high-performing hyperparameters
# from the previous step.

# Each of these three random offset functions
# has a 50/50 chance of leaving the hyperparameter unchanged,
# so we can expect some preservation of good hyperparameters.
def random_offset_epochs(epochs):
    if epochs == 1:
        # never drop below one epoch
        epochs += random.choice([0, 1])
    else:
        epochs += random.choice([0, 0, 1, -1])
    return epochs

def random_offset_learning_rate(learning_rate):
    # Multiplier derivation for the general case:
    # random.random() -> 0.0 ... 1.0
    # 2 * THIS -> 0.0 ... 2.0
    # THIS - 1 -> -1.0 ... 1.0
    # 2 ** THIS -> 2^-1.0 ... 2^1.0 -> 0.5 ... 2.0
    if learning_rate >= 0.05:
        # at the upper bound: allow 0.5 to 1 multiplier
        learning_rate *= random.choice([2 ** (random.random() - 1), 1])
    elif learning_rate <= 0.00002:
        # at the lower bound: allow 1 to 2 multiplier
        learning_rate *= random.choice([2 ** (random.random()), 1])
    else:
        # allow 0.5 to 2 multiplier
        learning_rate *= random.choice([2 ** (2 * random.random() - 1), 1])
    return learning_rate

def random_offset_batch_size(batch_size):
    if batch_size <= 24:
        batch_size += random.choice([0, 0, 0, 0, 0, 1, 2, 3, 4, 5])
    elif batch_size >= 100:
        batch_size -= random.choice([0, 0, 0, 0, 0, 1, 2, 3, 4, 5])
    else:
        batch_size += random.choice([0] * 10 + [1, 2, 3, 4, 5, -1, -2, -3, -4, -5])
    return batch_size


def sampler():
    # open the file samples.csv that we will sample from
    with open(DESTINATION + 'samples.csv', newline='') as csvfile:
        samplereader = csv.DictReader(csvfile, delimiter=' ')
        population = []
        # for each row in the csv file -> each sample
        for row in samplereader:
            epochs = int(row['epochs']) # load the epochs
            learning_rate = float(row['learning_rate']) # load the learning rate
            batch_size = int(row['batch_size']) # load the batch size
            population.append({'epochs': epochs, 'learning_rate': learning_rate, 'batch_size': batch_size})
    sample = random.choice(population) # take a random sample of the population
    # randomly mutate it
    epochs = random_offset_epochs(sample['epochs'])
    learning_rate = random_offset_learning_rate(sample['learning_rate'])
    batch_size = random_offset_batch_size(sample['batch_size'])
    # return the new hyperparameters
    return epochs, learning_rate, batch_size
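
The learning-rate mutation relies on `2 ** (2 * random.random() - 1)` producing a factor between 0.5 and 2 that is balanced between shrinking and growing. A quick empirical check of that claim follows, as a sketch; `lr_multiplier`, the fixed seed, and the sample count are choices made for this illustration.

```python
import math
import random

def lr_multiplier(rng):
    # 2 ** u with u drawn uniformly from [-1, 1) gives a factor in [0.5, 2).
    return 2 ** (2 * rng.random() - 1)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
factors = [lr_multiplier(rng) for _ in range(10_000)]

# The extremes should stay inside the intended range.
print(min(factors), max(factors))

# On a log2 scale the factor is uniform on [-1, 1), so its mean is near 0:
# halving and doubling are equally likely.
mean_log2 = sum(math.log2(f) for f in factors) / len(factors)
print(mean_log2)
```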

### Utilize Sampler (New)

In [None]:
#(New)
EPOCHS, LEARNING_RATE, BATCH_SIZE = sampler()
# save specific params back to a csv
with open('modelinfo.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow([EPOCHS, LEARNING_RATE, BATCH_SIZE])

In [None]:
# Parameters:
TRAIN_MODEL_NAME = HOST + '_' + str(TIMESTAMP) + '_model' + '.pth'  # name to use when saving the trained agent.
TEST_MODEL_NAME = TRAIN_MODEL_NAME  # name to use when loading the trained agent.

TEST_EPISODES = 10  # number of episodes to test the agent for.
MAX_TEST_EPISODE_LEN = 15000  # 18k is the default for MineRLObtainDiamond.
FINDCAVE_STEPS = 15000  # number of steps to run BC for in evaluations.

# Setup training

In [None]:
def train(epochs, learning_rate, batch_size):
    """
    :param epochs: How many times we train over the dataset
    :param learning_rate: Learning rate for the neural network
    :param batch_size: How many samples before the model is updated
    """
    data = minerl.data.make("MineRLBasaltFindCave-v0",  data_dir='data', num_workers=4)

    # We know ActionShaping has seven discrete actions, so we create
    # a network to map images to seven values (logits), which represent
    # likelihoods of selecting those actions
    network = NatureCNN((3, 64, 64), 7).cuda()
    optimizer = th.optim.Adam(network.parameters(), lr=learning_rate)
    loss_function = nn.CrossEntropyLoss()

    iter_count = 0
    losses = []
    for dataset_obs, dataset_actions, _, _, _ in tqdm(data.batch_iter(num_epochs=epochs, batch_size=batch_size, seq_len=1)):
        # We only use pov observations (also remove dummy dimensions)
        obs = dataset_obs["pov"].squeeze().astype(np.float32)
        # Transpose observations to be channel-first (BCHW instead of BHWC)
        obs = obs.transpose(0, 3, 1, 2)
        # Normalize observations
        obs /= 255.0

        # Actions need bit more work
        actions = dataset_action_batch_to_actions(dataset_actions)

        # Remove samples that had no corresponding action
        mask = actions != -1
        obs = obs[mask]
        actions = actions[mask]

        # Obtain logits of each action
        logits = network(th.from_numpy(obs).float().cuda())

        # Minimize cross-entropy with target labels.
        # We could also compute the probability of demonstration actions and
        # maximize them.
        loss = loss_function(logits, th.from_numpy(actions).long().cuda())

        # Standard PyTorch update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        iter_count += 1
        losses.append(loss.item())
        if (iter_count % 1000) == 0:
            mean_loss = sum(losses) / len(losses)
            tqdm.write("Iteration {}. Loss {:<10.3f}".format(iter_count, mean_loss))
            losses.clear()

    th.save(network.state_dict(), TRAIN_MODEL_NAME)
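
The two steps that matter most in this loop, dropping samples labeled -1 and minimizing cross-entropy on the rest, can be sketched in plain Python. This is an illustration with made-up logits, not a replacement for `nn.CrossEntropyLoss`; `cross_entropy` is a helper written for this sketch.

```python
import math

def cross_entropy(logits, target):
    # Negative log-probability of the target class under softmax(logits),
    # computed with the log-sum-exp trick for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

# Made-up logits for three samples over seven actions.
batch_logits = [
    [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
labels = [0, -1, 1]  # -1 marks a sample with no matching discrete action

# Keep only samples that have a valid label, as the mask does in train().
kept = [(logits, y) for logits, y in zip(batch_logits, labels) if y != -1]
loss = sum(cross_entropy(logits, y) for logits, y in kept) / len(kept)
print(len(kept), loss)

# Reference point: a uniform guess over seven actions costs ln(7).
uniform_loss = cross_entropy([0.0] * 7, 3)
print(uniform_loss)
```

The uniform-guess value gives a rough baseline for reading the loss values that the training loop prints.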

# Download the data

In [None]:
minerl.data.download(directory='data', environment='MineRLBasaltFindCave-v0');

  full_bar = Bar(frac,

Download: https://minerl.s3.amazonaws.com/v4/MineRLBasaltFindCave-v0.tar: 100%|██████████| 288.0/287.744 [00:08<00:00, 33.99MB/s]


# Train

In [None]:
train(EPOCHS, LEARNING_RATE, BATCH_SIZE)
#train(8, 0.00028, 32)

0it [00:00, ?it/s]

# Run your agent
Test the trained model in a new environment.
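
The test loop samples each action from the softmax of the network's logits instead of always taking the most likely one. A minimal sketch of that sampling step using only the standard library follows; the logits are made-up values and `random.choices` stands in for `np.random.choice`.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for the seven discrete actions.
logits = [1.0, 2.0, 0.5, 0.0, 0.0, -1.0, -1.0]
probabilities = softmax(logits)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Draw one action index, weighted by its probability.
action = rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]

print(probabilities)
print(action)
```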

In [None]:
def test():
    network = NatureCNN((3, 64, 64), 7)
    network.load_state_dict(th.load(TEST_MODEL_NAME))

    # Test agent on a different environment
    env = gym.make('MineRLBasaltFindCave-v0')
    env = Recorder(env, './video', fps=60)
    env = ActionShaping(env, always_attack=True)

    num_actions = env.action_space.n
    action_list = np.arange(num_actions)

    for episode in range(TEST_EPISODES):
        obs = env.reset()
        done = False
        total_reward = 0
        steps = 0

        # BC part: let the trained network pick the actions
        for i in range(FINDCAVE_STEPS):
            # Process the observation:
            #   - Add a batch dimension
            #   - Transpose image (the network expects channels-first)
            #   - Normalize image
            obs = th.from_numpy(obs['pov'].transpose(2, 0, 1)[None].astype(np.float32) / 255)
            # Turn logits into probabilities
            probabilities = th.softmax(network(obs), dim=1)[0]
            # Into numpy
            probabilities = probabilities.detach().cpu().numpy()
            # Sample action according to the probabilities
            action = np.random.choice(action_list, p=probabilities)
            obs, reward, done, info = env.step(action)
            total_reward += reward
            steps += 1
            if done:
                break

        print(f'Episode #{episode + 1} reward: {total_reward}\t\t episode length: {steps}')
    env.release()
    env.close()

display = Display(visible=0, size=(400, 300))
display.start();
test()

ERROR:minerl.env.malmo.instance.1efea2:[18:11:24] [main/ERROR]: The binary patch set is missing. Either you are in a development environment, or things are not going to work!
ERROR:minerl.env.malmo.instance.1efea2:[18:11:25] [main/ERROR]: FML appears to be missing any signature data. This is not a good thing
ERROR:minerl.env.malmo.instance.1efea2:[18:11:58] [Client thread/INFO]: [STDOUT]: [ERROR] Seed specified was NONE. Expected a long (integer).
ERROR:minerl.env.malmo.instance.1efea2:[18:11:59] [Thread-6/ERROR]: Error in class 'LibraryLWJGLOpenAL'
ERROR:minerl.env.malmo.instance.1efea2:[18:11:59] [Thread-6/ERROR]: Unable to initialize OpenAL.  Probable cause: OpenAL not supported.
ERROR:minerl.env.malmo.instance.1efea2:[18:11:59] [Thread-6/WARN]: ERROR MESSAGE:
ERROR:minerl.env.malmo.instance.1efea2:[18:11:59] [Sound Library Loader/WARN]: ERROR MESSAGE:
ERROR:minerl.env.malmo.instance.1efea2:[18:12:04] [Thread-10/ERROR]: Error in class 'LibraryLWJGLOpenAL'
ERROR:minerl.env.malmo.inst

Episode #1 reward: 0.0		 episode length: 1706




Episode #2 reward: 0.0		 episode length: 3538




Episode #3 reward: 0.0		 episode length: 3532
Episode #4 reward: 0.0		 episode length: 781




Episode #5 reward: 0.0		 episode length: 3535




Episode #6 reward: 0.0		 episode length: 3505
Episode #7 reward: 0.0		 episode length: 1880




Episode #8 reward: 0.0		 episode length: 3513




Episode #9 reward: 0.0		 episode length: 3511
Episode #10 reward: 0.0		 episode length: 3564
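
The episode lines printed above can be summarized with a short parser, which is handy when comparing runs. A sketch follows, with the ten lines of this first run copied in by hand (whitespace simplified); the regular expression is an assumption about the print format used in `test()`.

```python
import re
from statistics import mean

# Episode lines copied from the first test run above (whitespace simplified).
log_lines = [
    "Episode #1 reward: 0.0 episode length: 1706",
    "Episode #2 reward: 0.0 episode length: 3538",
    "Episode #3 reward: 0.0 episode length: 3532",
    "Episode #4 reward: 0.0 episode length: 781",
    "Episode #5 reward: 0.0 episode length: 3535",
    "Episode #6 reward: 0.0 episode length: 3505",
    "Episode #7 reward: 0.0 episode length: 1880",
    "Episode #8 reward: 0.0 episode length: 3513",
    "Episode #9 reward: 0.0 episode length: 3511",
    "Episode #10 reward: 0.0 episode length: 3564",
]

pattern = re.compile(r"Episode #(\d+) reward: (-?[\d.]+)\s+episode length: (\d+)")

rewards, lengths = [], []
for line in log_lines:
    match = pattern.search(line)
    if match:  # skip anything that is not an episode summary line
        rewards.append(float(match.group(2)))
        lengths.append(int(match.group(3)))

mean_length = mean(lengths)
print(len(lengths), sum(rewards), mean_length, min(lengths), max(lengths))
```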


# Save Your Results (New)

In [None]:
#(New)
# autoformat a system move command with a source and destination
def move(src, dst=None):
  # Build the default destination at call time so it follows the current
  # TIMESTAMP (a default argument would be evaluated only once).
  if dst is None:
    dst = DESTINATION + HOST + "_" + str(TIMESTAMP) + "_files"
  return f"mv {src} {dst}"

# save files to mounted drive
def file_dump():
  rwd = os.getcwd() # get the current working directory.
  # we need to be able to get back here.
  model_folder = HOST + "_" + str(TIMESTAMP) + "_files" # make a folder name from the host and timestamp
  os.chdir(DESTINATION) # change to the mounted drive expected location
  os.mkdir(model_folder) # make the folder
  os.chdir(rwd) # return back
  os.system(move(TEST_MODEL_NAME)) # push model
  for video_name in os.listdir("video/"):
    os.system(move("video/"+video_name)) # push each recorded video
  os.system(move("modelinfo.csv")) # push the sampled hyperparameters

In [None]:
#(New)
file_dump()
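
The same file dump can be expressed with `shutil.move` instead of shell commands, which avoids changing the working directory and makes the step easy to try outside Colab. This is a sketch under stated assumptions: `dump_files` is a hypothetical helper, and temporary directories stand in for the working directory and the mounted drive.

```python
import os
import shutil
import tempfile

def dump_files(sources, destination_root, folder_name):
    """Move each existing source file into destination_root/folder_name."""
    target = os.path.join(destination_root, folder_name)
    os.makedirs(target, exist_ok=True)  # create the per-model folder
    moved = []
    for src in sources:
        if os.path.exists(src):  # skip files that were never produced
            moved.append(shutil.move(src, target))
    return moved

# Temporary directories stand in for the Colab working dir and the drive.
work = tempfile.mkdtemp()
drive = tempfile.mkdtemp()

# Create a placeholder model file to move.
model_path = os.path.join(work, "host_123_model.pth")
with open(model_path, "w") as f:
    f.write("placeholder")

moved = dump_files([model_path], drive, "host_123_files")
print(moved)
```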


# Running Again (New)

In [None]:
#(New)
for i in range(10):
  TIMESTAMP = time.time() # make a new timestamp
  EPOCHS, LEARNING_RATE, BATCH_SIZE = sampler() # sample again
  # save specific params back to a csv
  with open('modelinfo.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow([EPOCHS, LEARNING_RATE, BATCH_SIZE])
  TRAIN_MODEL_NAME = HOST + '_' + str(TIMESTAMP) + '_model' + '.pth'  # name to use when saving the trained agent.
  TEST_MODEL_NAME = TRAIN_MODEL_NAME  # name to use when loading the trained agent.
  train(EPOCHS, LEARNING_RATE, BATCH_SIZE) # train new model
  display = Display(visible=0, size=(400, 300))
  display.start();
  test() # test new model
  file_dump() # save results

0it [00:00, ?it/s]

ERROR:minerl.env.malmo.instance.c2104c:[19:46:53] [main/ERROR]: The binary patch set is missing. Either you are in a development environment, or things are not going to work!
ERROR:minerl.env.malmo.instance.c2104c:[19:46:54] [main/ERROR]: FML appears to be missing any signature data. This is not a good thing
ERROR:minerl.env.malmo.instance.c2104c:[19:47:18] [Client thread/INFO]: [STDOUT]: [ERROR] Seed specified was NONE. Expected a long (integer).
ERROR:minerl.env.malmo.instance.c2104c:[19:47:19] [Thread-6/ERROR]: Error in class 'LibraryLWJGLOpenAL'
ERROR:minerl.env.malmo.instance.c2104c:[19:47:19] [Thread-6/ERROR]: Unable to initialize OpenAL.  Probable cause: OpenAL not supported.
ERROR:minerl.env.malmo.instance.c2104c:[19:47:19] [Thread-6/WARN]: ERROR MESSAGE:
ERROR:minerl.env.malmo.instance.c2104c:[19:47:19] [Sound Library Loader/WARN]: ERROR MESSAGE:
ERROR:minerl.env.malmo.instance.c2104c:[19:47:24] [Thread-10/ERROR]: Error in class 'LibraryLWJGLOpenAL'
ERROR:minerl.env.malmo.inst

Episode #1 reward: 0.0		 episode length: 3505
Episode #2 reward: 0.0		 episode length: 2520




Episode #3 reward: 0.0		 episode length: 3548




Episode #4 reward: 0.0		 episode length: 3549




Episode #5 reward: 0.0		 episode length: 3559




Episode #6 reward: 0.0		 episode length: 3552




Episode #7 reward: 0.0		 episode length: 3519
Episode #8 reward: 0.0		 episode length: 700




Episode #9 reward: 0.0		 episode length: 3555
Episode #10 reward: 0.0		 episode length: 3535


0it [00:00, ?it/s]

ERROR:minerl.env.malmo.instance.33016a:[20:32:12] [main/ERROR]: The binary patch set is missing. Either you are in a development environment, or things are not going to work!
ERROR:minerl.env.malmo.instance.33016a:[20:32:13] [main/ERROR]: FML appears to be missing any signature data. This is not a good thing
ERROR:minerl.env.malmo.instance.33016a:[20:32:43] [Client thread/INFO]: [STDOUT]: [ERROR] Seed specified was NONE. Expected a long (integer).
ERROR:minerl.env.malmo.instance.33016a:[20:32:44] [Thread-6/ERROR]: Error in class 'LibraryLWJGLOpenAL'
ERROR:minerl.env.malmo.instance.33016a:[20:32:44] [Thread-6/ERROR]: Unable to initialize OpenAL.  Probable cause: OpenAL not supported.
ERROR:minerl.env.malmo.instance.33016a:[20:32:44] [Thread-6/WARN]: ERROR MESSAGE:
ERROR:minerl.env.malmo.instance.33016a:[20:32:44] [Sound Library Loader/WARN]: ERROR MESSAGE:
ERROR:minerl.env.malmo.instance.33016a:[20:32:51] [Thread-10/ERROR]: Error in class 'LibraryLWJGLOpenAL'
ERROR:minerl.env.malmo.inst

Episode #1 reward: 0.0		 episode length: 3476
Episode #2 reward: 0.0		 episode length: 867




Episode #3 reward: 0.0		 episode length: 3519




Episode #4 reward: 0.0		 episode length: 3557
Episode #5 reward: 0.0		 episode length: 1602


ERROR:minerl.env.malmo.instance.33016a:[20:50:57] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.
ERROR:minerl.env.malmo.instance.33016a:[20:50:57] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.


Episode #6 reward: 0.0		 episode length: 3512


ERROR:minerl.env.malmo.instance.33016a:[20:56:58] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.
ERROR:minerl.env.malmo.instance.33016a:[20:56:58] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.
ERROR:minerl.env.malmo.instance.33016a:[20:56:58] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.
ERROR:minerl.env.malmo.instance.33016a:[20:56:58] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.
ERROR:minerl.env.malmo.instance.33016a:[20:56:58] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.
ERROR:minerl.env.malmo.instance.33016a:[20:56:58] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.
ERROR:minerl.env.malmo.instance.33016a:[20:56:58] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.
ERROR:minerl.env.malmo.instance.33016a:[20:56:58] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the

Episode #7 reward: 0.0		 episode length: 1923


ERROR:minerl.env.malmo.instance.33016a:[20:56:58] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.
ERROR:minerl.env.malmo.instance.33016a:[20:56:58] [Client thread/INFO]: [STDOUT]: STATE ERROR - multiple states in the queue.


Episode #8 reward: 0.0		 episode length: 3516




Episode #9 reward: 0.0		 episode length: 3493
Episode #10 reward: 0.0		 episode length: 3542


0it [00:00, ?it/s]

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  actions = np.zeros((batch_size,), dtype=np.int)



Iteration 1000. Loss 0.890     
Iteration 2000. Loss 0.876     
Iteration 3000. Loss 0.828     
Iteration 4000. Loss 0.798     
Iteration 5000. Loss 0.877     
Iteration 6000. Loss 0.840     
Iteration 7000. Loss 0.866     
Iteration 8000. Loss 0.805     
Iteration 9000. Loss 0.833     
Iteration 10000. Loss 0.863     
Iteration 11000. Loss 0.829     
Iteration 12000. Loss 0.868     
Iteration 13000. Loss 0.830     


ERROR:minerl.env.malmo.instance.b37c20:[21:17:54] [main/ERROR]: The binary patch set is missing. Either you are in a development environment, or things are not going to work!
ERROR:minerl.env.malmo.instance.b37c20:[21:17:56] [main/ERROR]: FML appears to be missing any signature data. This is not a good thing
ERROR:minerl.env.malmo.instance.b37c20:[21:18:21] [Client thread/INFO]: [STDOUT]: [ERROR] Seed specified was NONE. Expected a long (integer).
ERROR:minerl.env.malmo.instance.b37c20:[21:18:22] [Thread-6/ERROR]: Error in class 'LibraryLWJGLOpenAL'
ERROR:minerl.env.malmo.instance.b37c20:[21:18:22] [Thread-6/ERROR]: Unable to initialize OpenAL.  Probable cause: OpenAL not supported.
ERROR:minerl.env.malmo.instance.b37c20:[21:18:22] [Thread-6/WARN]: ERROR MESSAGE:
ERROR:minerl.env.malmo.instance.b37c20:[21:18:22] [Sound Library Loader/WARN]: ERROR MESSAGE:
ERROR:minerl.env.malmo.instance.b37c20:[21:18:27] [Thread-10/ERROR]: Error in class 'LibraryLWJGLOpenAL'
ERROR:minerl.env.malmo.inst

Episode #1 reward: 0.0		 episode length: 2342




Episode #2 reward: 0.0		 episode length: 3541




Episode #3 reward: 0.0		 episode length: 3494
Episode #4 reward: 0.0		 episode length: 1466




Episode #5 reward: 0.0		 episode length: 3539




Episode #6 reward: 0.0		 episode length: 3545
Episode #7 reward: 0.0		 episode length: 969




Episode #8 reward: 0.0		 episode length: 3502




Episode #9 reward: 0.0		 episode length: 3541
Episode #10 reward: 0.0		 episode length: 2545


0it [00:00, ?it/s]

ERROR:minerl.env.malmo.instance.45e392:[22:03:23] [main/ERROR]: The binary patch set is missing. Either you are in a development environment, or things are not going to work!
ERROR:minerl.env.malmo.instance.45e392:[22:03:24] [main/ERROR]: FML appears to be missing any signature data. This is not a good thing
ERROR:minerl.env.malmo.instance.45e392:[22:03:56] [Client thread/INFO]: [STDOUT]: [ERROR] Seed specified was NONE. Expected a long (integer).
ERROR:minerl.env.malmo.instance.45e392:[22:03:56] [Thread-6/ERROR]: Error in class 'LibraryLWJGLOpenAL'
ERROR:minerl.env.malmo.instance.45e392:[22:03:56] [Thread-6/ERROR]: Unable to initialize OpenAL.  Probable cause: OpenAL not supported.
ERROR:minerl.env.malmo.instance.45e392:[22:03:56] [Thread-6/WARN]: ERROR MESSAGE:
ERROR:minerl.env.malmo.instance.45e392:[22:03:56] [Sound Library Loader/WARN]: ERROR MESSAGE:
ERROR:minerl.env.malmo.instance.45e392:[22:04:04] [Thread-10/ERROR]: Error in class 'LibraryLWJGLOpenAL'
ERROR:minerl.env.malmo.inst

Episode #1 reward: 0.0		 episode length: 3534
