# Example: POS Tagging

According to [Wikipedia](https://en.wikipedia.org/wiki/Part-of-speech_tagging):

> Part-of-speech tagging (POS tagging or PoS tagging or POST) is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context—i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph.

Formally, given a sequence of words $\mathbf{x} = \left< x_1, x_2, \ldots, x_t \right>$ the goal is to learn a model $P(y_i \,|\, \mathbf{x})$ where $y_i$ is the POS tag associated with the $x_i$.
Note that the model is conditioned on all of $\mathbf{x}$ not just the words that occur earlier in the sentence - this is because we can assume that the entire sentence is known at the time of tagging.

### Dataset

We will train our model on the [Engligh Dependencies Treebank](https://github.com/UniversalDependencies/UD_English).
You can download this dataset by running the following lines:

In [1]:
!pip install gdown
!pip install ray

Collecting gdown
  Downloading gdown-5.2.0-py3-none-any.whl.metadata (5.8 kB)
Downloading gdown-5.2.0-py3-none-any.whl (18 kB)
Installing collected packages: gdown
Successfully installed gdown-5.2.0


In [2]:
import gdown
url = "https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu"
output = "en_ewt-ud-dev.conllu"
gdown.download(url, output, quiet=False)

Downloading...
From: https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu
To: /kaggle/working/en_ewt-ud-dev.conllu
1.76MB [00:00, 48.5MB/s]                  


'en_ewt-ud-dev.conllu'

In [3]:
url = "https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu"
output = "en_ewt-ud-test.conllu"
gdown.download(url, output, quiet=False)

Downloading...
From: https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu
To: /kaggle/working/en_ewt-ud-test.conllu
1.77MB [00:00, 45.1MB/s]                  


'en_ewt-ud-test.conllu'

In [4]:
url = "https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu"
output = "en_ewt-ud-train.conllu"
gdown.download(url, output, quiet=False)

Downloading...
From: https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu
To: /kaggle/working/en_ewt-ud-train.conllu
13.9MB [00:00, 141MB/s]                    


'en_ewt-ud-train.conllu'

The individual data instances come in chunks seperated by blank lines. Each chunk consists of a few starting comments, and then lines of tab-seperated fields. The fields we are interested in are the 1st and 3rd, which contain the tokenized word and POS tag respectively. An example chunk is shown below:

```
# sent_id = answers-20111107193044AAvUYBv_ans-0023
# text = Hope you have a crapload of fun!
1	Hope	hope	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	0	root	0:root	_
2	you	you	PRON	PRP	Case=Nom|Person=2|PronType=Prs	3	nsubj	3:nsubj	_
3	have	have	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	1	ccomp	1:ccomp	_
4	a	a	DET	DT	Definite=Ind|PronType=Art	5	det	5:det	_
5	crapload	crapload	NOUN	NN	Number=Sing	3	obj	3:obj	_
6	of	of	ADP	IN	_	7	case	7:case	_
7	fun	fun	NOUN	NN	Number=Sing	5	nmod	5:nmod	SpaceAfter=No
8	!	!	PUNCT	.	_	1	punct	1:punct	_

```

As with most real world data, we are going to need to do some preprocessing before we can use it. The first thing we are going to need is a `Vocabulary` to map words/POS tags to integer ids. Here is a more full-featured implementation than what we used in the first tutorial:

In [5]:
from collections import Counter


class Vocab(object):
    def __init__(self, iter, max_size=None, sos_token=None, eos_token=None, unk_token=None):
        """Initialize the vocabulary.
        Args:
            iter: An iterable which produces sequences of tokens used to update
                the vocabulary.
            max_size: (Optional) Maximum number of tokens in the vocabulary.
            sos_token: (Optional) Token denoting the start of a sequence.
            eos_token: (Optional) Token denoting the end of a sequence.
            unk_token: (Optional) Token denoting an unknown element in a
                sequence.
        """
        self.max_size = max_size
        self.pad_token = '<pad>'
        self.sos_token = sos_token
        self.eos_token = eos_token
        self.unk_token = unk_token

        # Add special tokens.
        id2word = [self.pad_token]
        if sos_token is not None:
            id2word.append(self.sos_token)
        if eos_token is not None:
            id2word.append(self.eos_token)
        if unk_token is not None:
            id2word.append(self.unk_token)

        # Update counter with token counts.
        counter = Counter()
        for x in iter:
            counter.update(x)

        # Extract lookup tables.
        if max_size is not None:
            counts = counter.most_common(max_size)
        else:
            counts = counter.items()
            counts = sorted(counts, key=lambda x: x[1], reverse=True)
        words = [x[0] for x in counts]
        id2word.extend(words)
        word2id = {x: i for i, x in enumerate(id2word)}

        self._id2word = id2word
        self._word2id = word2id

    def __len__(self):
        return len(self._id2word)

    def word2id(self, word):
        """Map a word in the vocabulary to its unique integer id.
        Args:
            word: Word to lookup.
        Returns:
            id: The integer id of the word being looked up.
        """
        if word in self._word2id:
            return self._word2id[word]
        elif self.unk_token is not None:
            return self._word2id[self.unk_token]
        else:
            raise KeyError('Word "%s" not in vocabulary.' % word)

    def id2word(self, id):
        """Map an integer id to its corresponding word in the vocabulary.
        Args:
            id: Integer id of the word being looked up.
        Returns:
            word: The corresponding word.
        """
        return self._id2word[id]

Now we need to parse the .conllu files and extract the data needed for our model. The good news is that the file is only a few megabytes so we can store everything in memory. Rather than creating a generator from scratch like we did in the previous tutorial, we will instead showcase the `torch.utils.data.Dataset` class. There are two main things that a `Dataset` must have:

1. A `__len__` method which let's you know how many data points are in the dataset.
2. A `__getitem__` method which is used to support integer indexing.

Here's an example of how to define these methods for the English Dependencies Treebank data.

In [6]:
import re
from torch.utils.data import Dataset


class Annotation(object):
    def __init__(self):
        """A helper object for storing annotation data."""
        self.tokens = []
        self.pos_tags = []


class CoNLLDataset(Dataset):
    def __init__(self, fname):
        """Initializes the CoNLLDataset.
        Args:
            fname: The .conllu file to load data from.
        """
        self.fname = fname
        self.annotations = self.process_conll_file(fname)
        self.token_vocab = Vocab([x.tokens for x in self.annotations],
                                 unk_token='<unk>')
        self.pos_vocab = Vocab([x.pos_tags for x in self.annotations])

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        annotation = self.annotations[idx]
        input = [self.token_vocab.word2id(x) for x in annotation.tokens]
        target = [self.pos_vocab.word2id(x) for x in annotation.pos_tags]
        return input, target

    def process_conll_file(self, fname):
        # Read the entire file.
        with open(fname, 'r') as f:
            raw_text = f.read()
        # Split into chunks on blank lines.
        chunks = re.split(r'^\n', raw_text, flags=re.MULTILINE)
        # Process each chunk into an annotation.
        annotations = []
        for chunk in chunks:
            annotation = Annotation()
            lines = chunk.split('\n')
            # Iterate over all lines in the chunk.
            for line in lines:
                # If line is empty ignore it.
                if len(line)==0:
                    continue
                # If line is a commend ignore it.
                if line[0] == '#':
                    continue
                # Otherwise split on tabs and retrieve the token and the
                # POS tag fields.
                fields = line.split('\t')
                annotation.tokens.append(fields[1])
                annotation.pos_tags.append(fields[3])
            if (len(annotation.tokens) > 0) and (len(annotation.pos_tags) > 0):
                annotations.append(annotation)
        return annotations

And let's see how this is used in practice.

In [7]:
dataset = CoNLLDataset('en_ewt-ud-train.conllu')

In [8]:
input, target = dataset[0]
print('Example input: %s\n' % input)
print('Example target: %s\n' % target)
print('Translated input: %s\n' % ' '.join(dataset.token_vocab.id2word(x) for x in input))
print('Translated target: %s\n' % ' '.join(dataset.pos_vocab.id2word(x) for x in target))

Example input: [266, 16, 5249, 45, 295, 703, 1154, 4233, 10099, 595, 16, 10100, 4, 3, 6865, 35, 3, 6866, 10, 3, 498, 8, 6867, 4, 758, 3, 2224, 1605, 2]

Example target: [9, 2, 9, 2, 7, 1, 3, 9, 9, 9, 2, 9, 2, 6, 1, 5, 6, 1, 5, 6, 1, 5, 9, 2, 5, 6, 7, 1, 2]

Translated input: Al - Zaman : American forces killed Shaikh Abdullah al - Ani , the preacher at the mosque in the town of Qaim , near the Syrian border .

Translated target: PROPN PUNCT PROPN PUNCT ADJ NOUN VERB PROPN PROPN PROPN PUNCT PROPN PUNCT DET NOUN ADP DET NOUN ADP DET NOUN ADP PROPN PUNCT ADP DET ADJ NOUN PUNCT



The main upshot of using the `Dataset` class is that it makes accessing training/test observations very simple. Accordingly, this makes batch generation easy since all we need to do is randomly choose numbers and then grab those observations from the dataset - PyTorch includes a `torch.utils.data.DataLoader` object which handles this for you. In fact, if we were not working with sequential data we would be able to proceed straight to the modeling step from here. However, since we are working with sequential data there is one last pesky issue we need to handle - padding.

The issue is that when we are given a batch of outputs from `CoNLLDataset`, the sequences in the batch are likely to all be of different length. To deal with this, we define a custom `collate_annotations` function which adds padding to the end of the sequences in the batch so that they are all the same length. In addition, we'll have this function take care of loading the data into tensors and ensuring that the tensor dimensions are in the order expected by PyTorch.

Oh and one last annoying thing - to deal with some of the issues caused by using padded data we will be using a function called `torch.nn.utils.rnn.pack_padded_sequences` in our model later on. All you need to know now is that this function expects our sequences in the batch to be sorted in terms of descending length, and that we know the lengths of each sequence. So we will make sure that the `collate_annotations` function performs this sorting for us and returns the sequence lengths in addition to the input and target tensors.

In [9]:
import torch
from torch.autograd import Variable


def pad(sequences, max_length, pad_value=0):
    """Pads a list of sequences.
    Args:
        sequences: A list of sequences to be padded.
        max_length: The length to pad to.
        pad_value: The value used for padding.
    Returns:
        A list of padded sequences.
    """
    out = []
    for sequence in sequences:
        padded = sequence + [0]*(max_length - len(sequence))
        out.append(padded)
    return out


def collate_annotations(batch):
    """Function used to collate data returned by CoNLLDataset."""
    # Get inputs, targets, and lengths.
    inputs, targets = zip(*batch)
    lengths = [len(x) for x in inputs]
    # Sort by length.
    sort = sorted(zip(inputs, targets, lengths),
                  key=lambda x: x[2],
                  reverse=True)
    inputs, targets, lengths = zip(*sort)
    # Pad.
    max_length = max(lengths)
    inputs = pad(inputs, max_length)
    targets = pad(targets, max_length)
    # Transpose.
    inputs = list(map(list, zip(*inputs)))
    targets = list(map(list, zip(*targets)))
    # Convert to PyTorch variables.
    inputs = Variable(torch.LongTensor(inputs))
    targets = Variable(torch.LongTensor(targets))
    lengths = Variable(torch.LongTensor(lengths))
    if torch.cuda.is_available():
        inputs = inputs.cuda()
        targets = targets.cuda()
        lengths = lengths.cuda()
    return inputs, targets, lengths

Again let's see how this is used in practice:

In [10]:
from torch.utils.data import DataLoader


for inputs, targets, lengths in DataLoader(dataset, batch_size=16, collate_fn=collate_annotations):
    print('Inputs: %s\n' % inputs.data)
    print('Targets: %s\n' % targets.data)
    print('Lengths: %s\n' % lengths.data)

    # Usually we'd keep sampling batches, but here we'll just break
    break

Inputs: tensor([[   28,  1083,   266,    28,    30,   106,    68,   266,   499,   625,
         10103,   121,  1212,    28,    28,   108],
        [10106,     3,    16,  1713,  6874,  6878, 10115,    16,  1030,   106,
            45, 10123,     8,  3581,  1081,  1606],
        [   10,  5252,  5249,  4237,    11,    11,    46,  5249,  4239,  1712,
           555,     4,    69,    60,    19,    54],
        [  180,    19,    45,     8,    10,     3,   185,    45,    51,     8,
          1849,  6874,    60,  1370,   159,    41],
        [   11,   343,   295, 10118, 10125,   759,   138,  5253, 10121,     7,
          2018,  3111,   159,    10,   450,    19],
        [ 4234,   163,   703,  3111,   180,  1031,     8,  1154,     7, 10101,
            12,     4,   450,     3,    44, 10111],
        [    5,     5,  1154,  2018,     6,    10,     3,     7, 10122, 10102,
            31,   151,    44, 10112,     3,     3],
        [    3,   408,  4233,    12,    50,     3,  2755,   807,  3112,    

### Model

We will use the following architecture:

1. Embed the input words into a 200 dimensional vector space.
2. Feed the word embeddings into a (bidirectional) GRU.
3. Feed the GRU outputs into a fully connected layer.
4. Use a softmax activation to get the probabilities of the different labels.

There is one complication which arises during the forward computation. As was noted in the dataset section, the input sequences are padded. This causes an issue since we do not want to waste computational resources feeding these pad tokens into the RNN. In PyTorch, we can deal with this issue by converting the sequence data into a  `torch.nn.utils.rnn.PackedSequence` object before feeding it into the RNN. In essence, a `PackedSequence` flattens the sequence and batch dimensions of a tensor, and also contains metadata so that PyTorch knows when to re-initialize the hidden state when fed into a recurrent layer. If this seems confusing, do not worry. To use the `PackedSequence` in practice you will almost always perform the following steps:

1. Before feeding data into a recurrent layer, transform it into a `PackedSequence` by using the function `torch.nn.utils.rnn.pack_padded_sequence()`.
2. Feed the `PackedSequence` into the recurrent layer.
3. Transform the output back into a regular tensor by using the function `torch.nn.utils.rnn.pad_packed_sequence()`.

See the model implementation below for a working example:

In [11]:
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Tagger(nn.Module):
    def __init__(self,
                 input_vocab_size,
                 output_vocab_size,
                 embedding_dim=64,
                 hidden_size=64,
                 bidirectional=True):
        """Initializes the tagger.

        Args:
            input_vocab_size: Size of the input vocabulary.
            output_vocab_size: Size of the output vocabulary.
            embedding_dim: Dimension of the word embeddings.
            hidden_size: Number of units in each LSTM hidden layer.
            bidirectional: Whether or not to use a bidirectional rnn.
        """
        super(Tagger, self).__init__()

        # Store parameters
        self.input_vocab_size = input_vocab_size
        self.output_vocab_size = output_vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_size = hidden_size
        self.bidirectional = bidirectional

        # Define layers
        self.word_embeddings = nn.Embedding(input_vocab_size, embedding_dim,
                                            padding_idx=0)
        self.rnn = nn.GRU(embedding_dim, hidden_size,
                          bidirectional=bidirectional,
                          dropout=0.9)
        if bidirectional:
            self.fc = nn.Linear(2*hidden_size, output_vocab_size)
        else:
            self.fc = nn.Linear(hidden_size, output_vocab_size)
        self.activation = nn.LogSoftmax(dim=2)

    def forward(self, x, lengths=None, hidden=None):
        """Computes a forward pass of the language model.

        Args:
            x: A LongTensor w/ dimension [seq_len, batch_size].
            lengths: The lengths of the sequences in x.
            hidden: Hidden state to be fed into the lstm.

        Returns:
            net: the output representation for each word in the sequence.
            hidden: the hidden state at the last timestamp.
        """
        seq_len, batch_size = x.size()

        # If no hidden state is provided, then default to zeros.
        if hidden is None:
            if self.bidirectional:
                num_directions = 2
            else:
                num_directions = 1
            hidden = Variable(torch.zeros(num_directions, batch_size, self.hidden_size))
            if torch.cuda.is_available():
                hidden = hidden.cuda()

        net = self.word_embeddings(x)
        # Pack before feeding into the RNN.
        if lengths is not None:
            lengths = lengths.data.view(-1).tolist()
            net = pack_padded_sequence(net, lengths)
        net, hidden = self.rnn(net, hidden)
        # Unpack after
        if lengths is not None:
            net, _ = pad_packed_sequence(net)
        net = self.fc(net)
        net = self.activation(net)

        return net, hidden

### Training

Training is pretty much exactly the same as in the previous tutorial. There is one catch - we don't want to evaluate our loss function on pad tokens. This is easily fixed by setting the weight of the pad class to zero.

In [12]:
pip install -U ipywidgets

Collecting ipywidgets
  Downloading ipywidgets-8.1.3-py3-none-any.whl.metadata (2.4 kB)
Collecting widgetsnbextension~=4.0.11 (from ipywidgets)
  Downloading widgetsnbextension-4.0.11-py3-none-any.whl.metadata (1.6 kB)
Collecting jupyterlab-widgets~=3.0.11 (from ipywidgets)
  Downloading jupyterlab_widgets-3.0.11-py3-none-any.whl.metadata (4.1 kB)
Downloading ipywidgets-8.1.3-py3-none-any.whl (139 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m139.4/139.4 kB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading jupyterlab_widgets-3.0.11-py3-none-any.whl (214 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m214.4/214.4 kB[0m [31m12.5 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading widgetsnbextension-4.0.11-py3-none-any.whl (2.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.3/2.3 MB[0m [31m62.5 MB/s[0m eta [36m0:00:00[0m:00:01[0m
[?25hInstalling collected packages: widgetsnbextension, jupyterlab-widgets, ipywidgets

In [13]:
from functools import partial
import os
import tempfile
from pathlib import Path
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray import train
from ray.train import Checkpoint, get_checkpoint
from ray.tune.schedulers import ASHAScheduler
import ray.cloudpickle as pickle

In [18]:
def tuning(config):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    batch_size = config["batch_size"]
    train_dataset = CoNLLDataset('/kaggle/working/en_ewt-ud-train.conllu')
    dev_dataset = CoNLLDataset('/kaggle/working/en_ewt-ud-dev.conllu')
    input_vocab_size = len(train_dataset.token_vocab)
    output_vocab_size = len(train_dataset.pos_vocab)
    model = Tagger(input_vocab_size, output_vocab_size, config["embed_dim"], config["hidden_size"]).to(device)
    weight = torch.ones(output_vocab_size)
    weight[0] = 0
    if torch.cuda.is_available():
        weight = weight.cuda()

    # Initialize loss function and optimizer.
    criterion = torch.nn.NLLLoss(weight)
    optimizer = optim.Adam(model.parameters(), lr=config["lr"])
    
    checkpoint = get_checkpoint()
    checkpoint = False
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            data_path = Path(checkpoint_dir) / "data.pkl"
            with open(data_path, "rb") as fp:
                checkpoint_state = pickle.load(fp)
            start_epoch = checkpoint_state["epoch"]
            model.load_state_dict(checkpoint_state["net_state_dict"])
            optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
    else:
        start_epoch = 0
    
    trainloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                         collate_fn=collate_annotations)
    valloader = DataLoader(dev_dataset, batch_size=batch_size, shuffle=False,
                        collate_fn=collate_annotations)
    
    for epoch in range(start_epoch, 10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        i = 0
        model.train()
        for inputs, targets, lengths in trainloader:
            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs, _ = model(inputs, lengths=lengths)

            outputs = outputs.view(-1, output_vocab_size)
            targets = targets.view(-1)
            loss = criterion(outputs, targets)
            running_loss += loss.item()
            loss.backward()
            optimizer.step()

            # print statistics
            epoch_steps += 1
#             if i % 200 == 199:  # print every 200 mini-batches
#                 print(
#                     "[%d, %5d] loss: %.3f"
#                     % (epoch + 1, i + 1, running_loss / 200)
#                 )
#                 running_loss = 0.0
            i += 1
        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        with torch.no_grad():
            model.eval()
            for inputs, targets, lengths in trainloader:
                outputs, _ = model(inputs, lengths=lengths)

                outputs = outputs.view(-1, output_vocab_size)
                targets = targets.view(-1)

                loss = criterion(outputs, targets)
                val_loss += loss.item()
                val_steps += 1
#                 print("loss:", val_loss / val_steps)

        checkpoint_data = {
            "epoch": epoch,
            "net_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        }
        with tempfile.TemporaryDirectory() as checkpoint_dir:
            data_path = Path(checkpoint_dir) / "data.pkl"
            with open(data_path, "wb") as fp:
                pickle.dump(checkpoint_data, fp)

            checkpoint = Checkpoint.from_directory(checkpoint_dir)
            train.report(
                {"loss": val_loss / val_steps},
                checkpoint=checkpoint,
            )

    print("Finished Training")

In [19]:
config = {
    "embed_dim": tune.choice([2 ** i for i in range(6, 10)]),
    "hidden_size": tune.choice([2 ** i for i in range(6, 11)]),
    "lr": tune.loguniform(1e-5, 1e-3),
    "batch_size": tune.choice([32, 64, 128, 256])
}

In [20]:
from functools import partial

scheduler = ASHAScheduler(
    metric="loss",
    mode="min",
    max_t=12000, # max time before cutting off training for each instance (in seconds)
    grace_period=30, # do not cut off training younger than this many seconds
    reduction_factor=2,
)

result = tune.run(
    partial(tuning),
    resources_per_trial={"cpu": 1, "gpu": 0.25},
    config=config,
    num_samples=48,
    scheduler=scheduler,
    checkpoint_at_end=False)

2024-06-17 16:15:04,204	INFO tune.py:583 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949


0,1
Current time:,2024-06-17 16:53:11
Running for:,00:38:06.80
Memory:,3.4/31.4 GiB

Trial name,status,loc,batch_size,embed_dim,hidden_size,lr,iter,total time (s),loss
tuning_c16fa_00000,TERMINATED,172.19.2.2:1099,128,256,1024,3.37498e-05,10,176.445,0.623635
tuning_c16fa_00001,TERMINATED,172.19.2.2:1101,32,256,256,6.57673e-05,10,179.45,0.351704
tuning_c16fa_00002,TERMINATED,172.19.2.2:1103,64,256,64,0.000129328,10,139.954,0.43629
tuning_c16fa_00003,TERMINATED,172.19.2.2:1107,32,128,128,1.71778e-05,10,176.315,1.05099
tuning_c16fa_00004,TERMINATED,172.19.2.2:1273,32,128,256,1.76021e-05,10,190.106,0.897592
tuning_c16fa_00005,TERMINATED,172.19.2.2:1360,64,128,64,7.60462e-05,10,145.105,0.784403
tuning_c16fa_00006,TERMINATED,172.19.2.2:1363,32,512,1024,1.27219e-05,10,343.209,0.470761
tuning_c16fa_00007,TERMINATED,172.19.2.2:1364,256,256,128,3.68937e-05,10,57.2047,1.54627
tuning_c16fa_00008,TERMINATED,172.19.2.2:1495,128,128,1024,0.000731538,10,201.053,0.00719291
tuning_c16fa_00009,TERMINATED,172.19.2.2:1557,64,512,256,3.15856e-05,10,163.568,0.51786




[36m(func pid=1107)[0m [1,   200] loss: 2.867


Trial name,loss,should_checkpoint
tuning_c16fa_00000,0.623635,True
tuning_c16fa_00001,0.351704,True
tuning_c16fa_00002,0.43629,True
tuning_c16fa_00003,1.05099,True
tuning_c16fa_00004,0.897592,True
tuning_c16fa_00005,0.784403,True
tuning_c16fa_00006,0.470761,True
tuning_c16fa_00007,1.54627,True
tuning_c16fa_00008,0.00719291,True
tuning_c16fa_00009,0.51786,True


[36m(func pid=1103)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00002_2_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000000)


[36m(func pid=1107)[0m [2,   200] loss: 2.487[32m [repeated 2x across cluster][0m


[36m(func pid=1103)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00002_2_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000001)[32m [repeated 4x across cluster][0m
[36m(func pid=1107)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00003_3_batch_size=32,embed_dim=128,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000001)
[36m(func pid=1099)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00000_0_batch_size=128,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-15-04/checkpoint_000001)


[36m(func pid=1107)[0m [3,   200] loss: 2.076[32m [repeated 2x across cluster][0m


[36m(func pid=1099)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00000_0_batch_size=128,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-15-04/checkpoint_000002)[32m [repeated 3x across cluster][0m


[36m(func pid=1107)[0m [4,   200] loss: 1.803[32m [repeated 2x across cluster][0m


[36m(func pid=1103)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00002_2_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000004)[32m [repeated 4x across cluster][0m


[36m(func pid=1107)[0m [5,   200] loss: 1.603[32m [repeated 2x across cluster][0m


[36m(func pid=1103)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00002_2_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000005)[32m [repeated 4x across cluster][0m
[36m(func pid=1101)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00001_1_batch_size=32,embed_dim=256,hidden_size=256,lr=0.0001_2024-06-17_16-15-04/checkpoint_000004)[32m [repeated 3x across cluster][0m


[36m(func pid=1107)[0m [6,   200] loss: 1.446[32m [repeated 2x across cluster][0m


[36m(func pid=1103)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00002_2_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000006)
[36m(func pid=1107)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00003_3_batch_size=32,embed_dim=128,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000005)


[36m(func pid=1107)[0m [7,   200] loss: 1.329[32m [repeated 2x across cluster][0m


[36m(func pid=1103)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00002_2_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000008)[32m [repeated 4x across cluster][0m


[36m(func pid=1107)[0m [8,   200] loss: 1.226[32m [repeated 2x across cluster][0m


[36m(func pid=1103)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00002_2_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000009)[32m [repeated 4x across cluster][0m


[36m(func pid=1103)[0m Finished Training
[36m(func pid=1101)[0m [8,   200] loss: 0.437
[36m(func pid=1107)[0m [9,   200] loss: 1.154


[36m(func pid=1101)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00001_1_batch_size=32,embed_dim=256,hidden_size=256,lr=0.0001_2024-06-17_16-15-04/checkpoint_000007)[32m [repeated 3x across cluster][0m


[36m(func pid=1273)[0m [1,   200] loss: 2.808[32m [repeated 2x across cluster][0m


[36m(func pid=1099)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00000_0_batch_size=128,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-15-04/checkpoint_000008)
[36m(func pid=1107)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00003_3_batch_size=32,embed_dim=128,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000008)


[36m(func pid=1107)[0m [10,   200] loss: 1.093
[36m(func pid=1101)[0m [10,   200] loss: 0.379


[36m(func pid=1273)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00004_4_batch_size=32,embed_dim=128,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000000)[32m [repeated 2x across cluster][0m


[36m(func pid=1099)[0m Finished Training
[36m(func pid=1273)[0m [2,   200] loss: 2.122


[36m(func pid=1101)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00001_1_batch_size=32,embed_dim=256,hidden_size=256,lr=0.0001_2024-06-17_16-15-04/checkpoint_000009)[32m [repeated 3x across cluster][0m
[36m(func pid=1273)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00004_4_batch_size=32,embed_dim=128,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000001)


[36m(func pid=1273)[0m [3,   200] loss: 1.719
[36m(func pid=1101)[0m Finished Training[32m [repeated 2x across cluster][0m


[36m(func pid=1364)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00007_7_batch_size=256,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000000)


[36m(func pid=1363)[0m [1,   200] loss: 2.375


[36m(func pid=1364)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00007_7_batch_size=256,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000001)[32m [repeated 2x across cluster][0m


[36m(func pid=1273)[0m [4,   200] loss: 1.462


[36m(func pid=1364)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00007_7_batch_size=256,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000002)[32m [repeated 2x across cluster][0m
[36m(func pid=1364)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00007_7_batch_size=256,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000003)
[36m(func pid=1360)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00005_5_batch_size=64,embed_dim=128,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000001)
[36m(func pid=1364)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00007_7_batch_size=256,embed_dim=256,hidden_si

[36m(func pid=1273)[0m [5,   200] loss: 1.289


[36m(func pid=1364)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00007_7_batch_size=256,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000006)
[36m(func pid=1360)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00005_5_batch_size=64,embed_dim=128,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000002)


[36m(func pid=1363)[0m [2,   200] loss: 1.228


[36m(func pid=1364)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00007_7_batch_size=256,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000007)
[36m(func pid=1273)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00004_4_batch_size=32,embed_dim=128,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000004)


[36m(func pid=1273)[0m [6,   200] loss: 1.172
[36m(func pid=1364)[0m Finished Training


[36m(func pid=1364)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00007_7_batch_size=256,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000009)[32m [repeated 3x across cluster][0m
[36m(func pid=1363)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00006_6_batch_size=32,embed_dim=512,hidden_size=1024,lr=0.0000_2024-06-17_16-15-04/checkpoint_000001)
[36m(func pid=1360)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00005_5_batch_size=64,embed_dim=128,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000004)


[36m(func pid=1273)[0m [7,   200] loss: 1.086
[36m(func pid=1363)[0m [3,   200] loss: 0.890


[36m(func pid=1360)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00005_5_batch_size=64,embed_dim=128,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000005)[32m [repeated 2x across cluster][0m
[36m(func pid=1273)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00004_4_batch_size=32,embed_dim=128,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000006)
[36m(func pid=1495)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00008_8_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0007_2024-06-17_16-15-04/checkpoint_000000)


[36m(func pid=1273)[0m [8,   200] loss: 1.019


[36m(func pid=1363)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00006_6_batch_size=32,embed_dim=512,hidden_size=1024,lr=0.0000_2024-06-17_16-15-04/checkpoint_000002)[32m [repeated 2x across cluster][0m
[36m(func pid=1495)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00008_8_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0007_2024-06-17_16-15-04/checkpoint_000001)
[36m(func pid=1273)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00004_4_batch_size=32,embed_dim=128,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000007)


[36m(func pid=1363)[0m [4,   200] loss: 0.748
[36m(func pid=1273)[0m [9,   200] loss: 0.972


[36m(func pid=1360)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00005_5_batch_size=64,embed_dim=128,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000008)[32m [repeated 2x across cluster][0m
[36m(func pid=1363)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00006_6_batch_size=32,embed_dim=512,hidden_size=1024,lr=0.0000_2024-06-17_16-15-04/checkpoint_000003)[32m [repeated 3x across cluster][0m


[36m(func pid=1273)[0m [10,   200] loss: 0.920


[36m(func pid=1360)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00005_5_batch_size=64,embed_dim=128,hidden_size=64,lr=0.0001_2024-06-17_16-15-04/checkpoint_000009)


[36m(func pid=1360)[0m Finished Training
[36m(func pid=1363)[0m [5,   200] loss: 0.663


[36m(func pid=1495)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00008_8_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0007_2024-06-17_16-15-04/checkpoint_000003)


[36m(func pid=1273)[0m Finished Training


[36m(func pid=1273)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00004_4_batch_size=32,embed_dim=128,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000009)
[36m(func pid=1495)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00008_8_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0007_2024-06-17_16-15-04/checkpoint_000004)
[36m(func pid=1557)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00009_9_batch_size=64,embed_dim=512,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000000)[32m [repeated 2x across cluster][0m


[36m(func pid=1363)[0m [6,   200] loss: 0.614


[36m(func pid=1615)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00010_10_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0001_2024-06-17_16-15-04/checkpoint_000000)
[36m(func pid=1495)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00008_8_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0007_2024-06-17_16-15-04/checkpoint_000005)
[36m(func pid=1557)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00009_9_batch_size=64,embed_dim=512,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000001)
[36m(func pid=1615)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00010_10_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0001_2024-06-17_16-15-04

[36m(func pid=1363)[0m [7,   200] loss: 0.569


[36m(func pid=1557)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00009_9_batch_size=64,embed_dim=512,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000003)
[36m(func pid=1495)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00008_8_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0007_2024-06-17_16-15-04/checkpoint_000007)
[36m(func pid=1615)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00010_10_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0001_2024-06-17_16-15-04/checkpoint_000003)
[36m(func pid=1557)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00009_9_batch_size=64,embed_dim=512,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/

[36m(func pid=1363)[0m [8,   200] loss: 0.537
[36m(func pid=1495)[0m Finished Training


[36m(func pid=1495)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00008_8_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0007_2024-06-17_16-15-04/checkpoint_000009)[32m [repeated 2x across cluster][0m
[36m(func pid=1557)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00009_9_batch_size=64,embed_dim=512,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000006)
[36m(func pid=1615)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00010_10_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0001_2024-06-17_16-15-04/checkpoint_000006)
[36m(func pid=1682)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00011_11_batch_size=128,embed_dim=64,hidden_

[36m(func pid=1363)[0m [9,   200] loss: 0.512


[36m(func pid=1682)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00011_11_batch_size=128,embed_dim=64,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000001)[32m [repeated 3x across cluster][0m
[36m(func pid=1682)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00011_11_batch_size=128,embed_dim=64,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000002)
[36m(func pid=1557)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00009_9_batch_size=64,embed_dim=512,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000008)
[36m(func pid=1682)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00011_11_batch_size=128,embed_dim=64,hidden_s

[36m(func pid=1557)[0m Finished Training
[36m(func pid=1363)[0m [10,   200] loss: 0.493
[36m(func pid=1615)[0m Finished Training


[36m(func pid=1682)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00011_11_batch_size=128,embed_dim=64,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000005)[32m [repeated 3x across cluster][0m
[36m(func pid=1682)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00011_11_batch_size=128,embed_dim=64,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000006)


[36m(func pid=1773)[0m [1,   200] loss: 1.625


[36m(func pid=1682)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00011_11_batch_size=128,embed_dim=64,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000007)


[36m(func pid=1775)[0m [1,   200] loss: 1.388
[36m(func pid=1363)[0m Finished Training


[36m(func pid=1682)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00011_11_batch_size=128,embed_dim=64,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000008)[32m [repeated 2x across cluster][0m


[36m(func pid=1682)[0m Finished Training


[36m(func pid=1682)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00011_11_batch_size=128,embed_dim=64,hidden_size=128,lr=0.0000_2024-06-17_16-15-04/checkpoint_000009)[32m [repeated 2x across cluster][0m


[36m(func pid=1773)[0m [2,   200] loss: 0.663


[36m(func pid=1775)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00013_13_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0002_2024-06-17_16-15-04/checkpoint_000000)


[36m(func pid=1775)[0m [2,   200] loss: 0.707


[36m(func pid=1867)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00014_14_batch_size=256,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000000)
[36m(func pid=1867)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00014_14_batch_size=256,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000001)[32m [repeated 2x across cluster][0m


[36m(func pid=1773)[0m [3,   200] loss: 0.466


[36m(func pid=1867)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00014_14_batch_size=256,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000002)
[36m(func pid=1927)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00015_15_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0002_2024-06-17_16-15-04/checkpoint_000000)
[36m(func pid=1775)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00013_13_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0002_2024-06-17_16-15-04/checkpoint_000001)[32m [repeated 2x across cluster][0m
[36m(func pid=1867)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00014_14_batch_size=256,embed_dim=256,hid

[36m(func pid=1773)[0m [4,   200] loss: 0.356
[36m(func pid=1775)[0m [3,   200] loss: 0.567


[36m(func pid=1867)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00014_14_batch_size=256,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000005)[32m [repeated 2x across cluster][0m
[36m(func pid=1867)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00014_14_batch_size=256,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000006)
[36m(func pid=1927)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00015_15_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0002_2024-06-17_16-15-04/checkpoint_000002)
[36m(func pid=1867)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00014_14_batch_size=256,embed_dim=256,hid

[36m(func pid=1773)[0m [5,   200] loss: 0.276


[36m(func pid=1867)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00014_14_batch_size=256,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000008)
[36m(func pid=1775)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00013_13_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0002_2024-06-17_16-15-04/checkpoint_000002)
[36m(func pid=1867)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00014_14_batch_size=256,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-15-04/checkpoint_000009)[32m [repeated 2x across cluster][0m


[36m(func pid=1867)[0m Finished Training
[36m(func pid=1775)[0m [4,   200] loss: 0.467


[36m(func pid=1927)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00015_15_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0002_2024-06-17_16-15-04/checkpoint_000004)[32m [repeated 2x across cluster][0m
[36m(func pid=1927)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00015_15_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0002_2024-06-17_16-15-04/checkpoint_000005)
[36m(func pid=1995)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00016_16_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0006_2024-06-17_16-15-12/checkpoint_000000)


[36m(func pid=1773)[0m [7,   200] loss: 0.176[32m [repeated 2x across cluster][0m


[36m(func pid=1995)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00016_16_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0006_2024-06-17_16-15-12/checkpoint_000001)[32m [repeated 3x across cluster][0m


[36m(func pid=1775)[0m [5,   200] loss: 0.383


[36m(func pid=1995)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00016_16_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0006_2024-06-17_16-15-12/checkpoint_000002)[32m [repeated 2x across cluster][0m
[36m(func pid=1927)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00015_15_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0002_2024-06-17_16-15-04/checkpoint_000007)[32m [repeated 2x across cluster][0m


[36m(func pid=1773)[0m [8,   200] loss: 0.143


[36m(func pid=1775)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00013_13_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0002_2024-06-17_16-15-04/checkpoint_000004)[32m [repeated 2x across cluster][0m
[36m(func pid=1773)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00012_12_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0009_2024-06-17_16-15-04/checkpoint_000007)[32m [repeated 3x across cluster][0m


[36m(func pid=1775)[0m [6,   200] loss: 0.320
[36m(func pid=1773)[0m [9,   200] loss: 0.114


[36m(func pid=1927)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00015_15_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0002_2024-06-17_16-15-04/checkpoint_000009)[32m [repeated 2x across cluster][0m


[36m(func pid=1927)[0m Finished Training


[36m(func pid=1773)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00012_12_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0009_2024-06-17_16-15-04/checkpoint_000008)[32m [repeated 2x across cluster][0m


[36m(func pid=1773)[0m [10,   200] loss: 0.091


[36m(func pid=1995)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00016_16_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0006_2024-06-17_16-15-12/checkpoint_000008)[32m [repeated 3x across cluster][0m


[36m(func pid=1775)[0m [7,   200] loss: 0.252
[36m(func pid=1995)[0m Finished Training


[36m(func pid=1995)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00016_16_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0006_2024-06-17_16-15-12/checkpoint_000009)
[36m(func pid=1773)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00012_12_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0009_2024-06-17_16-15-04/checkpoint_000009)
[36m(func pid=1775)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00013_13_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0002_2024-06-17_16-15-04/checkpoint_000006)[32m [repeated 2x across cluster][0m


[36m(func pid=1775)[0m [8,   200] loss: 0.195
[36m(func pid=1773)[0m Finished Training


[36m(func pid=2058)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00017_17_batch_size=64,embed_dim=512,hidden_size=128,lr=0.0000_2024-06-17_16-15-13/checkpoint_000002)[32m [repeated 2x across cluster][0m
[36m(func pid=2119)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00018_18_batch_size=256,embed_dim=512,hidden_size=1024,lr=0.0003_2024-06-17_16-15-13/checkpoint_000000)
[36m(func pid=2120)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00019_19_batch_size=64,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-15-13/checkpoint_000000)
[36m(func pid=2058)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00017_17_batch_size=64,embed_dim=512,hidde

[36m(func pid=1775)[0m [9,   200] loss: 0.141


[36m(func pid=2119)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00018_18_batch_size=256,embed_dim=512,hidden_size=1024,lr=0.0003_2024-06-17_16-15-13/checkpoint_000001)
[36m(func pid=2120)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00019_19_batch_size=64,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-15-13/checkpoint_000001)
[36m(func pid=2058)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00017_17_batch_size=64,embed_dim=512,hidden_size=128,lr=0.0000_2024-06-17_16-15-13/checkpoint_000004)
[36m(func pid=2119)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00018_18_batch_size=256,embed_dim=512,hidden_size=1024,lr=0.0003_2024-06-17_16-1

[36m(func pid=1775)[0m [10,   200] loss: 0.096


[36m(func pid=2119)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00018_18_batch_size=256,embed_dim=512,hidden_size=1024,lr=0.0003_2024-06-17_16-15-13/checkpoint_000003)
[36m(func pid=2120)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00019_19_batch_size=64,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-15-13/checkpoint_000003)
[36m(func pid=2119)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00018_18_batch_size=256,embed_dim=512,hidden_size=1024,lr=0.0003_2024-06-17_16-15-13/checkpoint_000004)[32m [repeated 2x across cluster][0m


[36m(func pid=1775)[0m Finished Training


[36m(func pid=2058)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00017_17_batch_size=64,embed_dim=512,hidden_size=128,lr=0.0000_2024-06-17_16-15-13/checkpoint_000007)[32m [repeated 3x across cluster][0m
[36m(func pid=2119)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00018_18_batch_size=256,embed_dim=512,hidden_size=1024,lr=0.0003_2024-06-17_16-15-13/checkpoint_000005)
[36m(func pid=2120)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00019_19_batch_size=64,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-15-13/checkpoint_000005)


[36m(func pid=2216)[0m [1,   200] loss: 1.575


[36m(func pid=2119)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00018_18_batch_size=256,embed_dim=512,hidden_size=1024,lr=0.0003_2024-06-17_16-15-13/checkpoint_000006)[32m [repeated 2x across cluster][0m


[36m(func pid=2058)[0m Finished Training


[36m(func pid=2058)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00017_17_batch_size=64,embed_dim=512,hidden_size=128,lr=0.0000_2024-06-17_16-15-13/checkpoint_000009)
[36m(func pid=2120)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00019_19_batch_size=64,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-15-13/checkpoint_000006)
[36m(func pid=2216)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00020_20_batch_size=32,embed_dim=256,hidden_size=1024,lr=0.0001_2024-06-17_16-17-41/checkpoint_000000)
[36m(func pid=2119)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00018_18_batch_size=256,embed_dim=512,hidden_size=1024,lr=0.0003_2024-06-17_16-15

[36m(func pid=2216)[0m [2,   200] loss: 0.702


[36m(func pid=2119)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00018_18_batch_size=256,embed_dim=512,hidden_size=1024,lr=0.0003_2024-06-17_16-15-13/checkpoint_000008)[32m [repeated 2x across cluster][0m
[36m(func pid=2216)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00020_20_batch_size=32,embed_dim=256,hidden_size=1024,lr=0.0001_2024-06-17_16-17-41/checkpoint_000001)[32m [repeated 3x across cluster][0m
[36m(func pid=2119)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00018_18_batch_size=256,embed_dim=512,hidden_size=1024,lr=0.0003_2024-06-17_16-15-13/checkpoint_000009)


[36m(func pid=2119)[0m Finished Training
[36m(func pid=2216)[0m [3,   200] loss: 0.578


[36m(func pid=2276)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00021_21_batch_size=64,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000001)
[36m(func pid=2120)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00019_19_batch_size=64,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-15-13/checkpoint_000009)
[36m(func pid=2276)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00021_21_batch_size=64,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000002)
[36m(func pid=2216)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00020_20_batch_size=32,embed_dim=256,hidden_size=1024,lr=0.0001_2024-06-17_16-17-4

[36m(func pid=2216)[0m [4,   200] loss: 0.500
[36m(func pid=2120)[0m Finished Training


[36m(func pid=2276)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00021_21_batch_size=64,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000003)
[36m(func pid=2394)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00023_23_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0002_2024-06-17_16-18-19/checkpoint_000000)
[36m(func pid=2341)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00022_22_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000001)
[36m(func pid=2394)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00023_23_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0002_2024-06-17_16-18-19/

[36m(func pid=2216)[0m [5,   200] loss: 0.446


[36m(func pid=2341)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00022_22_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000004)[32m [repeated 3x across cluster][0m
[36m(func pid=2276)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00021_21_batch_size=64,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000006)[32m [repeated 2x across cluster][0m
[36m(func pid=2341)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00022_22_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000005)[32m [repeated 2x across cluster][0m


[36m(func pid=2216)[0m [6,   200] loss: 0.404


[36m(func pid=2276)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00021_21_batch_size=64,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000007)[32m [repeated 2x across cluster][0m
[36m(func pid=2394)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00023_23_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0002_2024-06-17_16-18-19/checkpoint_000005)[32m [repeated 2x across cluster][0m
[36m(func pid=2276)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00021_21_batch_size=64,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000008)
[36m(func pid=2341)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00022_2

[36m(func pid=2216)[0m [7,   200] loss: 0.362


[36m(func pid=2341)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00022_22_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000008)
[36m(func pid=2276)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00021_21_batch_size=64,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000009)


[36m(func pid=2276)[0m Finished Training
[36m(func pid=2341)[0m Finished Training


[36m(func pid=2341)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00022_22_batch_size=128,embed_dim=512,hidden_size=512,lr=0.0000_2024-06-17_16-18-18/checkpoint_000009)[32m [repeated 2x across cluster][0m
[36m(func pid=2394)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00023_23_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0002_2024-06-17_16-18-19/checkpoint_000008)[32m [repeated 2x across cluster][0m


[36m(func pid=2216)[0m [8,   200] loss: 0.326
[36m(func pid=2394)[0m Finished Training
[36m(func pid=2465)[0m [1,   200] loss: 2.576


[36m(func pid=2394)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00023_23_batch_size=64,embed_dim=256,hidden_size=64,lr=0.0002_2024-06-17_16-18-19/checkpoint_000009)
[36m(func pid=2519)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00025_25_batch_size=256,embed_dim=64,hidden_size=1024,lr=0.0002_2024-06-17_16-20-52/checkpoint_000000)


[36m(func pid=2465)[0m [2,   200] loss: 1.616
[36m(func pid=2216)[0m [9,   200] loss: 0.287


[36m(func pid=2519)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00025_25_batch_size=256,embed_dim=64,hidden_size=1024,lr=0.0002_2024-06-17_16-20-52/checkpoint_000001)[32m [repeated 3x across cluster][0m
[36m(func pid=2585)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00026_26_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0000_2024-06-17_16-21-00/checkpoint_000000)
[36m(func pid=2465)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00024_24_batch_size=32,embed_dim=64,hidden_size=512,lr=0.0000_2024-06-17_16-19-24/checkpoint_000001)
[36m(func pid=2519)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00025_25_batch_size=256,embed_dim=64,hidde

[36m(func pid=2465)[0m [3,   200] loss: 1.350


[36m(func pid=2216)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00020_20_batch_size=32,embed_dim=256,hidden_size=1024,lr=0.0001_2024-06-17_16-17-41/checkpoint_000008)
[36m(func pid=2519)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00025_25_batch_size=256,embed_dim=64,hidden_size=1024,lr=0.0002_2024-06-17_16-20-52/checkpoint_000003)[32m [repeated 2x across cluster][0m


[36m(func pid=2216)[0m [10,   200] loss: 0.255
[36m(func pid=2465)[0m [4,   200] loss: 1.212


[36m(func pid=2585)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00026_26_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0000_2024-06-17_16-21-00/checkpoint_000002)[32m [repeated 2x across cluster][0m
[36m(func pid=2465)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00024_24_batch_size=32,embed_dim=64,hidden_size=512,lr=0.0000_2024-06-17_16-19-24/checkpoint_000003)[32m [repeated 2x across cluster][0m


[36m(func pid=2216)[0m Finished Training


[36m(func pid=2585)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00026_26_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0000_2024-06-17_16-21-00/checkpoint_000003)[32m [repeated 3x across cluster][0m


[36m(func pid=2465)[0m [5,   200] loss: 1.120


[36m(func pid=2519)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00025_25_batch_size=256,embed_dim=64,hidden_size=1024,lr=0.0002_2024-06-17_16-20-52/checkpoint_000006)


[36m(func pid=2652)[0m [1,   200] loss: 1.841


[36m(func pid=2465)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00024_24_batch_size=32,embed_dim=64,hidden_size=512,lr=0.0000_2024-06-17_16-19-24/checkpoint_000004)


[36m(func pid=2465)[0m [6,   200] loss: 1.051


[36m(func pid=2519)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00025_25_batch_size=256,embed_dim=64,hidden_size=1024,lr=0.0002_2024-06-17_16-20-52/checkpoint_000007)[32m [repeated 2x across cluster][0m
[36m(func pid=2652)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00027_27_batch_size=32,embed_dim=512,hidden_size=512,lr=0.0001_2024-06-17_16-22-54/checkpoint_000000)
[36m(func pid=2585)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00026_26_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0000_2024-06-17_16-21-00/checkpoint_000005)


[36m(func pid=2652)[0m [2,   200] loss: 0.735
[36m(func pid=2465)[0m [7,   200] loss: 1.008


[36m(func pid=2652)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00027_27_batch_size=32,embed_dim=512,hidden_size=512,lr=0.0001_2024-06-17_16-22-54/checkpoint_000001)[32m [repeated 3x across cluster][0m


[36m(func pid=2519)[0m Finished Training
[36m(func pid=2652)[0m [3,   200] loss: 0.558


[36m(func pid=2465)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00024_24_batch_size=32,embed_dim=64,hidden_size=512,lr=0.0000_2024-06-17_16-19-24/checkpoint_000006)[32m [repeated 3x across cluster][0m
[36m(func pid=2585)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00026_26_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0000_2024-06-17_16-21-00/checkpoint_000007)
[36m(func pid=2652)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00027_27_batch_size=32,embed_dim=512,hidden_size=512,lr=0.0001_2024-06-17_16-22-54/checkpoint_000002)


[36m(func pid=2652)[0m [4,   200] loss: 0.468[32m [repeated 2x across cluster][0m


[36m(func pid=2585)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00026_26_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0000_2024-06-17_16-21-00/checkpoint_000008)[32m [repeated 3x across cluster][0m
[36m(func pid=2714)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00028_28_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-23-44/checkpoint_000001)
[36m(func pid=2652)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00027_27_batch_size=32,embed_dim=512,hidden_size=512,lr=0.0001_2024-06-17_16-22-54/checkpoint_000003)
[36m(func pid=2585)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00026_26_batch_size=128,embed_dim=128,hidd

[36m(func pid=2585)[0m Finished Training
[36m(func pid=2465)[0m [9,   200] loss: 0.917
[36m(func pid=2465)[0m [10,   200] loss: 0.882


[36m(func pid=2714)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00028_28_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-23-44/checkpoint_000002)
[36m(func pid=2714)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00028_28_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-23-44/checkpoint_000003)


[36m(func pid=2465)[0m Finished Training
[36m(func pid=2652)[0m [5,   200] loss: 0.405


[36m(func pid=2776)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00029_29_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0002_2024-06-17_16-23-44/checkpoint_000000)[32m [repeated 3x across cluster][0m


[36m(func pid=2652)[0m [6,   200] loss: 0.355


[36m(func pid=2776)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00029_29_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0002_2024-06-17_16-23-44/checkpoint_000001)[32m [repeated 2x across cluster][0m
[36m(func pid=2714)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00028_28_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-23-44/checkpoint_000005)
[36m(func pid=2776)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00029_29_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0002_2024-06-17_16-23-44/checkpoint_000002)


[36m(func pid=2652)[0m [7,   200] loss: 0.317


[36m(func pid=2835)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00030_30_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0001_2024-06-17_16-24-10/checkpoint_000000)[32m [repeated 2x across cluster][0m
[36m(func pid=2714)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00028_28_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-23-44/checkpoint_000006)[32m [repeated 2x across cluster][0m
[36m(func pid=2652)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00027_27_batch_size=32,embed_dim=512,hidden_size=512,lr=0.0001_2024-06-17_16-22-54/checkpoint_000006)[32m [repeated 2x across cluster][0m


[36m(func pid=2652)[0m [8,   200] loss: 0.284


[36m(func pid=2776)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00029_29_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0002_2024-06-17_16-23-44/checkpoint_000006)[32m [repeated 4x across cluster][0m
[36m(func pid=2714)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00028_28_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-23-44/checkpoint_000008)
[36m(func pid=2776)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00029_29_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0002_2024-06-17_16-23-44/checkpoint_000007)
[36m(func pid=2776)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00029_29_batch_size=128,embed_dim=128,hidden

[36m(func pid=2652)[0m [9,   200] loss: 0.250
[36m(func pid=2714)[0m Finished Training


[36m(func pid=2714)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00028_28_batch_size=64,embed_dim=256,hidden_size=256,lr=0.0000_2024-06-17_16-23-44/checkpoint_000009)
[36m(func pid=2776)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00029_29_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0002_2024-06-17_16-23-44/checkpoint_000009)
[36m(func pid=2652)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00027_27_batch_size=32,embed_dim=512,hidden_size=512,lr=0.0001_2024-06-17_16-22-54/checkpoint_000008)[32m [repeated 2x across cluster][0m


[36m(func pid=2652)[0m [10,   200] loss: 0.223
[36m(func pid=2776)[0m Finished Training


[36m(func pid=2835)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00030_30_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0001_2024-06-17_16-24-10/checkpoint_000004)
[36m(func pid=2915)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00031_31_batch_size=128,embed_dim=512,hidden_size=64,lr=0.0001_2024-06-17_16-24-18/checkpoint_000000)


[36m(func pid=2652)[0m Finished Training


[36m(func pid=2652)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00027_27_batch_size=32,embed_dim=512,hidden_size=512,lr=0.0001_2024-06-17_16-22-54/checkpoint_000009)[32m [repeated 2x across cluster][0m
[36m(func pid=2915)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00031_31_batch_size=128,embed_dim=512,hidden_size=64,lr=0.0001_2024-06-17_16-24-18/checkpoint_000002)[32m [repeated 4x across cluster][0m
[36m(func pid=2835)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00030_30_batch_size=128,embed_dim=128,hidden_size=1024,lr=0.0001_2024-06-17_16-24-10/checkpoint_000006)[32m [repeated 2x across cluster][0m
[36m(func pid=3011)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning

[36m(func pid=2835)[0m Finished Training
[36m(func pid=2915)[0m Finished Training


[36m(func pid=2916)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00032_32_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0002_2024-06-17_16-25-34/checkpoint_000008)[32m [repeated 4x across cluster][0m
[36m(func pid=3011)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00033_33_batch_size=256,embed_dim=256,hidden_size=512,lr=0.0004_2024-06-17_16-26-57/checkpoint_000007)
[36m(func pid=2916)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00032_32_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0002_2024-06-17_16-25-34/checkpoint_000009)


[36m(func pid=2916)[0m Finished Training
[36m(func pid=3126)[0m [1,   200] loss: 2.705
[36m(func pid=3011)[0m Finished Training


[36m(func pid=3011)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00033_33_batch_size=256,embed_dim=256,hidden_size=512,lr=0.0004_2024-06-17_16-26-57/checkpoint_000009)[32m [repeated 3x across cluster][0m
[36m(func pid=3074)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00034_34_batch_size=128,embed_dim=128,hidden_size=256,lr=0.0007_2024-06-17_16-27-24/checkpoint_000002)[32m [repeated 3x across cluster][0m


[36m(func pid=3126)[0m [2,   200] loss: 1.893


[36m(func pid=3074)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00034_34_batch_size=128,embed_dim=128,hidden_size=256,lr=0.0007_2024-06-17_16-27-24/checkpoint_000003)[32m [repeated 2x across cluster][0m
[36m(func pid=3254)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00037_37_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0004_2024-06-17_16-30-04/checkpoint_000000)[32m [repeated 2x across cluster][0m


[36m(func pid=3126)[0m [3,   200] loss: 1.419


[36m(func pid=3254)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00037_37_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0004_2024-06-17_16-30-04/checkpoint_000001)[32m [repeated 4x across cluster][0m
[36m(func pid=3254)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00037_37_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0004_2024-06-17_16-30-04/checkpoint_000002)[32m [repeated 3x across cluster][0m
[36m(func pid=3254)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00037_37_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0004_2024-06-17_16-30-04/checkpoint_000003)[32m [repeated 4x across cluster][0m


[36m(func pid=3126)[0m [4,   200] loss: 1.157


[36m(func pid=3254)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00037_37_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0004_2024-06-17_16-30-04/checkpoint_000004)[32m [repeated 3x across cluster][0m
[36m(func pid=3254)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00037_37_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0004_2024-06-17_16-30-04/checkpoint_000005)[32m [repeated 3x across cluster][0m


[36m(func pid=3074)[0m Finished Training


[36m(func pid=3254)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00037_37_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0004_2024-06-17_16-30-04/checkpoint_000006)[32m [repeated 4x across cluster][0m


[36m(func pid=3126)[0m [5,   200] loss: 0.993


[36m(func pid=3254)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00037_37_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0004_2024-06-17_16-30-04/checkpoint_000007)[32m [repeated 2x across cluster][0m


[36m(func pid=3193)[0m Finished Training


[36m(func pid=3254)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00037_37_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0004_2024-06-17_16-30-04/checkpoint_000008)[32m [repeated 3x across cluster][0m


[36m(func pid=3126)[0m [6,   200] loss: 0.888


[36m(func pid=3254)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00037_37_batch_size=128,embed_dim=128,hidden_size=64,lr=0.0004_2024-06-17_16-30-04/checkpoint_000009)


[36m(func pid=3254)[0m Finished Training


[36m(func pid=3126)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00035_35_batch_size=32,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-27-25/checkpoint_000005)
[36m(func pid=3382)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00039_39_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0003_2024-06-17_16-31-04/checkpoint_000000)


[36m(func pid=3126)[0m [7,   200] loss: 0.807[32m [repeated 2x across cluster][0m


[36m(func pid=3322)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00038_38_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0001_2024-06-17_16-31-00/checkpoint_000000)
[36m(func pid=3382)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00039_39_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0003_2024-06-17_16-31-04/checkpoint_000001)


[36m(func pid=3322)[0m [2,   200] loss: 0.942


[36m(func pid=3126)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00035_35_batch_size=32,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-27-25/checkpoint_000006)
[36m(func pid=3382)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00039_39_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0003_2024-06-17_16-31-04/checkpoint_000002)


[36m(func pid=3126)[0m [8,   200] loss: 0.744


[36m(func pid=3382)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00039_39_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0003_2024-06-17_16-31-04/checkpoint_000003)[32m [repeated 2x across cluster][0m
[36m(func pid=3126)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00035_35_batch_size=32,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-27-25/checkpoint_000007)[32m [repeated 2x across cluster][0m


[36m(func pid=3126)[0m [9,   200] loss: 0.693


[36m(func pid=3382)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00039_39_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0003_2024-06-17_16-31-04/checkpoint_000004)


[36m(func pid=3322)[0m [3,   200] loss: 0.799


[36m(func pid=3442)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00040_40_batch_size=64,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-33-24/checkpoint_000001)
[36m(func pid=3382)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00039_39_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0003_2024-06-17_16-31-04/checkpoint_000005)
[36m(func pid=3126)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00035_35_batch_size=32,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-27-25/checkpoint_000008)


[36m(func pid=3126)[0m [10,   200] loss: 0.656


[36m(func pid=3322)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00038_38_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0001_2024-06-17_16-31-00/checkpoint_000002)
[36m(func pid=3382)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00039_39_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0003_2024-06-17_16-31-04/checkpoint_000006)
[36m(func pid=3442)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00040_40_batch_size=64,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-33-24/checkpoint_000002)


[36m(func pid=3322)[0m [4,   200] loss: 0.709


[36m(func pid=3382)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00039_39_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0003_2024-06-17_16-31-04/checkpoint_000007)


[36m(func pid=3126)[0m Finished Training


[36m(func pid=3126)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00035_35_batch_size=32,embed_dim=256,hidden_size=128,lr=0.0000_2024-06-17_16-27-25/checkpoint_000009)
[36m(func pid=3382)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00039_39_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0003_2024-06-17_16-31-04/checkpoint_000008)
[36m(func pid=3322)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00038_38_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0001_2024-06-17_16-31-00/checkpoint_000003)
[36m(func pid=3442)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00040_40_batch_size=64,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-33

[36m(func pid=3509)[0m [1,   200] loss: 1.808
[36m(func pid=3382)[0m Finished Training


[36m(func pid=3382)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00039_39_batch_size=128,embed_dim=128,hidden_size=512,lr=0.0003_2024-06-17_16-31-04/checkpoint_000009)


[36m(func pid=3322)[0m [5,   200] loss: 0.646


[36m(func pid=3509)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00041_41_batch_size=32,embed_dim=64,hidden_size=1024,lr=0.0001_2024-06-17_16-33-30/checkpoint_000000)
[36m(func pid=3569)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00042_42_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-33-48/checkpoint_000001)[32m [repeated 3x across cluster][0m


[36m(func pid=3509)[0m [2,   200] loss: 1.041


[36m(func pid=3569)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00042_42_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-33-48/checkpoint_000002)[32m [repeated 2x across cluster][0m


[36m(func pid=3322)[0m [6,   200] loss: 0.595


[36m(func pid=3569)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00042_42_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-33-48/checkpoint_000003)
[36m(func pid=3442)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00040_40_batch_size=64,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-33-24/checkpoint_000005)
[36m(func pid=3569)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00042_42_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-33-48/checkpoint_000004)[32m [repeated 2x across cluster][0m
[36m(func pid=3322)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00038_38_batch_size=32,embed_dim=128,hidd

[36m(func pid=3509)[0m [3,   200] loss: 0.892


[36m(func pid=3569)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00042_42_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-33-48/checkpoint_000005)


[36m(func pid=3322)[0m [7,   200] loss: 0.552


[36m(func pid=3569)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00042_42_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-33-48/checkpoint_000006)
[36m(func pid=3442)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00040_40_batch_size=64,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-33-24/checkpoint_000006)
[36m(func pid=3569)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00042_42_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-33-48/checkpoint_000007)
[36m(func pid=3509)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00041_41_batch_size=32,embed_dim=64,hidden_size=1024,lr=0.0001_2024-06-17_16-33

[36m(func pid=3509)[0m [4,   200] loss: 0.786
[36m(func pid=3569)[0m Finished Training


[36m(func pid=3569)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00042_42_batch_size=256,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-33-48/checkpoint_000009)
[36m(func pid=3442)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00040_40_batch_size=64,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-33-24/checkpoint_000007)


[36m(func pid=3322)[0m [8,   200] loss: 0.512


[36m(func pid=3509)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00041_41_batch_size=32,embed_dim=64,hidden_size=1024,lr=0.0001_2024-06-17_16-33-30/checkpoint_000003)


[36m(func pid=3634)[0m [1,   200] loss: 2.583
[36m(func pid=3509)[0m [5,   200] loss: 0.702


[36m(func pid=3322)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00038_38_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0001_2024-06-17_16-31-00/checkpoint_000007)
[36m(func pid=3442)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00040_40_batch_size=64,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-33-24/checkpoint_000008)
[36m(func pid=3634)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00043_43_batch_size=32,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-35-39/checkpoint_000000)


[36m(func pid=3322)[0m [9,   200] loss: 0.478
[36m(func pid=3634)[0m [2,   200] loss: 1.542


[36m(func pid=3509)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00041_41_batch_size=32,embed_dim=64,hidden_size=1024,lr=0.0001_2024-06-17_16-33-30/checkpoint_000004)


[36m(func pid=3442)[0m Finished Training


[36m(func pid=3442)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00040_40_batch_size=64,embed_dim=256,hidden_size=1024,lr=0.0000_2024-06-17_16-33-24/checkpoint_000009)
[36m(func pid=3634)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00043_43_batch_size=32,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-35-39/checkpoint_000001)


[36m(func pid=3509)[0m [6,   200] loss: 0.634
[36m(func pid=3634)[0m [3,   200] loss: 1.192


[36m(func pid=3322)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00038_38_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0001_2024-06-17_16-31-00/checkpoint_000008)


[36m(func pid=3322)[0m [10,   200] loss: 0.444


[36m(func pid=3697)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00044_44_batch_size=128,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-36-57/checkpoint_000000)


[36m(func pid=3634)[0m [4,   200] loss: 1.044
[36m(func pid=3509)[0m [7,   200] loss: 0.570


[36m(func pid=3697)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00044_44_batch_size=128,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-36-57/checkpoint_000001)[32m [repeated 3x across cluster][0m


[36m(func pid=3322)[0m Finished Training


[36m(func pid=3322)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00038_38_batch_size=32,embed_dim=128,hidden_size=1024,lr=0.0001_2024-06-17_16-31-00/checkpoint_000009)
[36m(func pid=3634)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00043_43_batch_size=32,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-35-39/checkpoint_000003)


[36m(func pid=3634)[0m [5,   200] loss: 0.945


[36m(func pid=3509)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00041_41_batch_size=32,embed_dim=64,hidden_size=1024,lr=0.0001_2024-06-17_16-33-30/checkpoint_000006)[32m [repeated 2x across cluster][0m
[36m(func pid=3697)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00044_44_batch_size=128,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-36-57/checkpoint_000003)


[36m(func pid=3509)[0m [8,   200] loss: 0.516


[36m(func pid=3757)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00045_45_batch_size=64,embed_dim=64,hidden_size=512,lr=0.0007_2024-06-17_16-37-54/checkpoint_000000)
[36m(func pid=3697)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00044_44_batch_size=128,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-36-57/checkpoint_000004)[32m [repeated 2x across cluster][0m


[36m(func pid=3634)[0m [6,   200] loss: 0.889


[36m(func pid=3697)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00044_44_batch_size=128,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-36-57/checkpoint_000005)
[36m(func pid=3757)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00045_45_batch_size=64,embed_dim=64,hidden_size=512,lr=0.0007_2024-06-17_16-37-54/checkpoint_000001)
[36m(func pid=3634)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00043_43_batch_size=32,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-35-39/checkpoint_000005)[32m [repeated 2x across cluster][0m


[36m(func pid=3509)[0m [9,   200] loss: 0.473


[36m(func pid=3757)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00045_45_batch_size=64,embed_dim=64,hidden_size=512,lr=0.0007_2024-06-17_16-37-54/checkpoint_000002)[32m [repeated 2x across cluster][0m
[36m(func pid=3697)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00044_44_batch_size=128,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-36-57/checkpoint_000007)
[36m(func pid=3634)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00043_43_batch_size=32,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-35-39/checkpoint_000006)


[36m(func pid=3634)[0m [8,   200] loss: 0.797[32m [repeated 2x across cluster][0m
[36m(func pid=3509)[0m [10,   200] loss: 0.428


[36m(func pid=3697)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00044_44_batch_size=128,embed_dim=256,hidden_size=512,lr=0.0000_2024-06-17_16-36-57/checkpoint_000009)[32m [repeated 4x across cluster][0m


[36m(func pid=3697)[0m Finished Training


[36m(func pid=3634)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00043_43_batch_size=32,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-35-39/checkpoint_000007)[32m [repeated 2x across cluster][0m


[36m(func pid=3634)[0m [9,   200] loss: 0.771


[36m(func pid=3509)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00041_41_batch_size=32,embed_dim=64,hidden_size=1024,lr=0.0001_2024-06-17_16-33-30/checkpoint_000009)


[36m(func pid=3509)[0m Finished Training


[36m(func pid=3757)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00045_45_batch_size=64,embed_dim=64,hidden_size=512,lr=0.0007_2024-06-17_16-37-54/checkpoint_000005)
[36m(func pid=3634)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00043_43_batch_size=32,embed_dim=128,hidden_size=512,lr=0.0000_2024-06-17_16-35-39/checkpoint_000008)
[36m(func pid=3757)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00045_45_batch_size=64,embed_dim=64,hidden_size=512,lr=0.0007_2024-06-17_16-37-54/checkpoint_000006)


[36m(func pid=3634)[0m [10,   200] loss: 0.738[32m [repeated 2x across cluster][0m
[36m(func pid=3880)[0m [1,   200] loss: 1.209
[36m(func pid=3820)[0m [2,   200] loss: 0.713


[36m(func pid=3757)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00045_45_batch_size=64,embed_dim=64,hidden_size=512,lr=0.0007_2024-06-17_16-37-54/checkpoint_000007)[32m [repeated 2x across cluster][0m


[36m(func pid=3634)[0m Finished Training
[36m(func pid=3880)[0m [2,   200] loss: 0.456


[36m(func pid=3757)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00045_45_batch_size=64,embed_dim=64,hidden_size=512,lr=0.0007_2024-06-17_16-37-54/checkpoint_000008)[32m [repeated 4x across cluster][0m
[36m(func pid=3880)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00047_47_batch_size=32,embed_dim=256,hidden_size=256,lr=0.0004_2024-06-17_16-39-30/checkpoint_000001)
[36m(func pid=3820)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00046_46_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0008_2024-06-17_16-38-07/checkpoint_000002)


[36m(func pid=3757)[0m Finished Training
[36m(func pid=3820)[0m [3,   200] loss: 0.519
[36m(func pid=3880)[0m [3,   200] loss: 0.312


[36m(func pid=3820)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00046_46_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0008_2024-06-17_16-38-07/checkpoint_000003)[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m [5,   200] loss: 0.325[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00046_46_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0008_2024-06-17_16-38-07/checkpoint_000004)[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m [6,   200] loss: 0.268[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00046_46_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0008_2024-06-17_16-38-07/checkpoint_000005)[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m [7,   200] loss: 0.219[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00046_46_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0008_2024-06-17_16-38-07/checkpoint_000006)[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m [8,   200] loss: 0.183[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00046_46_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0008_2024-06-17_16-38-07/checkpoint_000007)[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m [9,   200] loss: 0.153[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00046_46_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0008_2024-06-17_16-38-07/checkpoint_000008)[32m [repeated 2x across cluster][0m


[36m(func pid=3820)[0m [10,   200] loss: 0.124[32m [repeated 2x across cluster][0m
[36m(func pid=3820)[0m Finished Training


[36m(func pid=3820)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00046_46_batch_size=32,embed_dim=64,hidden_size=64,lr=0.0008_2024-06-17_16-38-07/checkpoint_000009)[32m [repeated 2x across cluster][0m


[36m(func pid=3880)[0m [10,   200] loss: 0.016[32m [repeated 2x across cluster][0m
[36m(func pid=3880)[0m Finished Training


[36m(func pid=3880)[0m Checkpoint successfully created at: Checkpoint(filesystem=local, path=/root/ray_results/tuning_2024-06-17_16-15-04/tuning_c16fa_00047_47_batch_size=32,embed_dim=256,hidden_size=256,lr=0.0004_2024-06-17_16-39-30/checkpoint_000009)[32m [repeated 2x across cluster][0m
2024-06-17 16:53:11,057	INFO tune.py:1042 -- Total run time: 2286.85 seconds (2286.79 seconds for the tuning loop).


In [23]:
best_trial = result.get_best_trial("loss", "min", "last")
print(f"Best trial config: {best_trial.config}")
print(f"Best trial final validation loss: {best_trial.last_result['loss']}")

Best trial config: {'embed_dim': 128, 'hidden_size': 1024, 'lr': 0.0007315376158978725, 'batch_size': 128}
Best trial final validation loss: 0.007192907085148048


In [17]:
import numpy as np

# Load datasets.
train_dataset = CoNLLDataset('/kaggle/working/en_ewt-ud-train.conllu')
dev_dataset = CoNLLDataset('/kaggle/working/en_ewt-ud-dev.conllu')

dev_dataset.token_vocab = train_dataset.token_vocab
dev_dataset.pos_vocab = train_dataset.pos_vocab

# Hyperparameters / constants.
input_vocab_size = len(train_dataset.token_vocab)
output_vocab_size = len(train_dataset.pos_vocab)
batch_size = 64
epochs = 10

# Initialize the model.
model = Tagger(input_vocab_size, output_vocab_size, 128, 256)
if torch.cuda.is_available():
    model = model.cuda()

# Loss function weights.
weight = torch.ones(output_vocab_size)
weight[0] = 0
if torch.cuda.is_available():
    weight = weight.cuda()

# Initialize loss function and optimizer.
loss_function = torch.nn.NLLLoss(weight)
optimizer = torch.optim.Adam(model.parameters(), lr=0.00106512)

# Main training loop.
data_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                         collate_fn=collate_annotations)
dev_loader = DataLoader(dev_dataset, batch_size=batch_size, shuffle=False,
                        collate_fn=collate_annotations)
losses = []
i = 0
for epoch in range(epochs):
    for inputs, targets, lengths in data_loader:
        optimizer.zero_grad()
        outputs, _ = model(inputs, lengths=lengths)
    
        outputs = outputs.view(-1, output_vocab_size)
        targets = targets.view(-1)

        loss = loss_function(outputs, targets)
        loss.backward()
        optimizer.step()

        losses.append(loss.item())
        if (i % 10) == 0:
            # Compute dev loss over entire dev set.
            # NOTE: This is expensive. In your work you may want to only use a
            # subset of the dev set.
            dev_losses = []
            for inputs, targets, lengths in dev_loader:
                outputs, _ = model(inputs, lengths=lengths)
                outputs = outputs.view(-1, output_vocab_size)
                targets = targets.view(-1)
                loss = loss_function(outputs, targets)
                dev_losses.append(loss.item())
            avg_train_loss = np.mean(losses)
            avg_dev_loss = np.mean(dev_losses)
            losses = []
            print('Iteration %i - Train Loss: %0.6f - Dev Loss: %0.6f' % (i, avg_train_loss, avg_dev_loss), end='\r')
            torch.save(model, 'pos_tagger.pt')
        i += 1

torch.save(model, 'pos_tagger.final.pt')



Iteration 360 - Train Loss: 0.406959 - Dev Loss: 0.489795


KeyboardInterrupt



In [None]:
import matplotlib.pyplot as plt 

plt.plot(losses)

In [None]:
import numpy as np

# Load datasets.
train_dataset = CoNLLDataset('/kaggle/working/en_ewt-ud-train.conllu')
dev_dataset = CoNLLDataset('/kaggle/working/en_ewt-ud-dev.conllu')

dev_dataset.token_vocab = train_dataset.token_vocab
dev_dataset.pos_vocab = train_dataset.pos_vocab

# Hyperparameters / constants.
input_vocab_size = len(train_dataset.token_vocab)
output_vocab_size = len(train_dataset.pos_vocab)
batch_size = 64
epochs = 10

# Initialize the model.
model = Tagger(input_vocab_size, output_vocab_size, 128, 128)
if torch.cuda.is_available():
    model = model.cuda()

# Loss function weights.
weight = torch.ones(output_vocab_size)
weight[0] = 0
if torch.cuda.is_available():
    weight = weight.cuda()

# Initialize loss function and optimizer.
loss_function = torch.nn.NLLLoss(weight)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)

# Main training loop.
data_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                         collate_fn=collate_annotations)
dev_loader = DataLoader(dev_dataset, batch_size=batch_size, shuffle=False,
                        collate_fn=collate_annotations)
losses = []
i = 0
for epoch in range(epochs):
    for inputs, targets, lengths in data_loader:
        optimizer.zero_grad()
        outputs, _ = model(inputs, lengths=lengths)
    
        outputs = outputs.view(-1, output_vocab_size)
        targets = targets.view(-1)

        loss = loss_function(outputs, targets)
        loss.backward()
        optimizer.step()

        losses.append(loss.item())
        if (i % 10) == 0:
            # Compute dev loss over entire dev set.
            # NOTE: This is expensive. In your work you may want to only use a
            # subset of the dev set.
            dev_losses = []
            for inputs, targets, lengths in dev_loader:
                outputs, _ = model(inputs, lengths=lengths)
                outputs = outputs.view(-1, output_vocab_size)
                targets = targets.view(-1)
                loss = loss_function(outputs, targets)
                dev_losses.append(loss.item())
            avg_train_loss = np.mean(losses)
            avg_dev_loss = np.mean(dev_losses)
            losses = []
            print('Iteration %i - Train Loss: %0.6f - Dev Loss: %0.6f' % (i, avg_train_loss, avg_dev_loss), end='\r')
            torch.save(model, 'pos_tagger.pt')
        i += 1

torch.save(model, 'pos_tagger.final.pt')

In [None]:
import matplotlib.pyplot as plt

plt.plot(losses)

### Evaluation

For tagging tasks the typical evaluation metric are accuracy and f1-score (e.g. the harmonic mean of precision and recall):

$$ \text{f1-score} = 2 \frac{\text{precision} * \text{recall}}{\text{precision} + \text{recall}} $$

Here are the results for our final model:

In [None]:
# Collect the predictions and targets
y_true = []
y_pred = []

for inputs, targets, lengths in dev_loader:
    outputs, _ = model(inputs, lengths=lengths)
    _, preds = torch.max(outputs, dim=2)
    targets = targets.view(-1)
    preds = preds.view(-1)
    if torch.cuda.is_available():
        targets = targets.cpu()
        preds = preds.cpu()
    y_true.append(targets.data.numpy())
    y_pred.append(preds.data.numpy())

# Stack into numpy arrays
y_true = np.concatenate(y_true)
y_pred = np.concatenate(y_pred)

# Compute accuracy
acc = np.mean(y_true[y_true != 0] == y_pred[y_true != 0])
print('Accuracy - %0.6f\n' % acc)

# Evaluate f1-score
from sklearn.metrics import f1_score
score = f1_score(y_true, y_pred, average=None)
print('F1-scores:\n')
for label, score in zip(dev_dataset.pos_vocab._id2word[1:], score[1:]):
    print('%s - %0.6f' % (label, score))

### Inference

Now let's look at some of the model's predictions.

In [None]:
model = torch.load('pos_tagger.final.pt')

def inference(sentence):
    # Convert words to id tensor.
    ids = [[dataset.token_vocab.word2id(x)] for x in sentence]
    ids = Variable(torch.LongTensor(ids))
    if torch.cuda.is_available():
        ids = ids.cuda()
    # Get model output.
    output, _ = model(ids)
    _, preds = torch.max(output, dim=2)
    if torch.cuda.is_available():
        preds = preds.cpu()
    preds = preds.data.view(-1).numpy()
    pos_tags = [dataset.pos_vocab.id2word(x) for x in preds]
    for word, tag in zip(sentence, pos_tags):
        print('%s - %s' % (word, tag))

In [None]:
sentence = "sdfgkj asd;glkjsdg ;lkj  .".split()
inference(sentence)

# Example: Sentiment Analysis

According to [Wikipedia](https://en.wikipedia.org/wiki/Sentiment_analysis):

>Opinion mining (sometimes known as sentiment analysis or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information.

Formally, given a sequence of words $\mathbf{x} = \left< x_1, x_2, \ldots, x_t \right>$ the goal is to learn a model $P(y \,|\, \mathbf{x})$ where $y$ is the sentiment associated to the sentence. This is very similar to the problem above, with the exception that we only want a single output for each sentence not a sentence. Accordingly, we will only highlight the neccessary changes that need to be made.

### Dataset

We will be using the Kaggle 'Sentiment Analysis on Movie Reviews' dataset [[link](https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/data)]. You will need to agree to the Kaggle terms of service in order to download this data. The following code can be used to process this data.

In [None]:
import torch
from collections import Counter
from torch.autograd import Variable
from torch.utils.data import Dataset


class Annotation(object):
    def __init__(self):
        """A helper object for storing annotation data."""
        self.tokens = []
        self.sentiment = None


class SentimentDataset(Dataset):
    def __init__(self, fname):
        """Initializes the SentimentDataset.
        Args:
            fname: The .tsv file to load data from.
        """
        self.fname = fname
        self.annotations = self.process_tsv_file(fname)
        self.token_vocab = Vocab([x.tokens for x in self.annotations],
                                 unk_token='<unk>')

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        annotation = self.annotations[idx]
        input = [self.token_vocab.word2id(x) for x in annotation.tokens]
        target = annotation.sentiment
        return input, target

    def process_tsv_file(self, fname):
        # Read the entire file.
        with open(fname, 'r') as f:
            lines = f.readlines()
        annotations = []
        observed_ids = set()
        for line in lines[1:]:
            annotation = Annotation()
            _, sentence_id, sentence, sentiment = line.split('\t')
            sentence_id = sentence_id
            if sentence_id in observed_ids:
                continue
            else:
                observed_ids.add(sentence_id)
            annotation.tokens = sentence.split()
            annotation.sentiment = int(sentiment)
            if len(annotation.tokens) > 0:
                annotations.append(annotation)
        return annotations


def pad(sequences, max_length, pad_value=0):
    """Pads a list of sequences.
    Args:
        sequences: A list of sequences to be padded.
        max_length: The length to pad to.
        pad_value: The value used for padding.
    Returns:
        A list of padded sequences.
    """
    out = []
    for sequence in sequences:
        padded = sequence + [0]*(max_length - len(sequence))
        out.append(padded)
    return out


def collate_annotations(batch):
    """Function used to collate data returned by CoNLLDataset."""
    # Get inputs, targets, and lengths.
    inputs, targets = zip(*batch)
    lengths = [len(x) for x in inputs]
    # Sort by length.
    sort = sorted(zip(inputs, targets, lengths),
                  key=lambda x: x[2],
                  reverse=True)
    inputs, targets, lengths = zip(*sort)
    # Pad.
    max_length = max(lengths)
    inputs = pad(inputs, max_length)
    # Transpose.
    inputs = list(map(list, zip(*inputs)))
    # Convert to PyTorch variables.
    inputs = Variable(torch.LongTensor(inputs))
    targets = Variable(torch.LongTensor(targets))
    lengths = Variable(torch.LongTensor(lengths))
    if torch.cuda.is_available():
        inputs = inputs.cuda()
        targets = targets.cuda()
        lengths = lengths.cuda()
    return inputs, targets, lengths

### Model

The model architecture we will use for sentiment classification is almost exactly the same as the one we used for tagging. The only difference is that we want the model to produce a single output at the end, not a sequence of outputs. While there are many ways to do this, a simple approach is to just use the final hidden state of the recurrent layer as the input to the fully connected layer. This approach is particularly nice in PyTorch since the forward pass of the recurrent layer returns the final hidden states as its second output (see the note in the code below if this is unclear), so we do not need to do any fancy indexing tricks to get them.

Formally, the model architecture we will use is:

1. Embed the input words into a 200 dimensional vector space.
2. Feed the word embeddings into a GRU.
3. Feed the final hidden state output by the GRU into a fully connected layer.
4. Use a softmax activation to get the probabilities of the different labels.

In [None]:
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class SentimentClassifier(nn.Module):
    def __init__(self,
                 input_vocab_size,
                 output_vocab_size,
                 embedding_dim=64,
                 hidden_size=64):
        """Initializes the tagger.

        Args:
            input_vocab_size: Size of the input vocabulary.
            output_vocab_size: Size of the output vocabulary.
            embedding_dim: Dimension of the word embeddings.
            hidden_size: Number of units in each LSTM hidden layer.
        """
        # Always do this!!!
        super(SentimentClassifier, self).__init__()

        # Store parameters
        self.input_vocab_size = input_vocab_size
        self.output_vocab_size = output_vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_size = hidden_size

        # Define layers
        self.word_embeddings = nn.Embedding(input_vocab_size, embedding_dim,
                                            padding_idx=0)
        self.rnn = nn.GRU(embedding_dim, hidden_size, dropout=0.9)
        self.fc = nn.Linear(hidden_size, output_vocab_size)
        self.activation = nn.LogSoftmax(dim=2)

    def forward(self, x, lengths=None, hidden=None):
        """Computes a forward pass of the language model.

        Args:
            x: A LongTensor w/ dimension [seq_len, batch_size].
            lengths: The lengths of the sequences in x.
            hidden: Hidden state to be fed into the lstm.

        Returns:
            net: the output representation for each word in the sequence.
            hidden: Hidden state of the last timestamp.
        """
        seq_len, batch_size = x.size()

        # If no hidden state is provided, then default to zeros.
        if hidden is None:
            hidden = Variable(torch.zeros(1, batch_size, self.hidden_size))
            if torch.cuda.is_available():
                hidden = hidden.cuda()

        net = self.word_embeddings(x)
        if lengths is not None:
            lengths_list = lengths.data.view(-1).tolist()
            net = pack_padded_sequence(net, lengths_list)
        net, hidden = self.rnn(net, hidden)
        # NOTE: we are using hidden as the input to the fully-connected layer, not net!!!
        net = self.fc(hidden)
        net = self.activation(net)

        return net, hidden

### Training

This code should look pretty familiar by now...

In [None]:
import numpy as np
from torch.utils.data import DataLoader

# Load dataset.
sentiment_dataset = SentimentDataset('train.tsv')

# Hyperparameters / constants.
input_vocab_size = len(sentiment_dataset.token_vocab)
output_vocab_size = 5
batch_size = 16
epochs = 7

# Initialize the model.
model = SentimentClassifier(input_vocab_size, output_vocab_size)
if torch.cuda.is_available():
    model = model.cuda()

# Initialize loss function and optimizer.
loss_function = torch.nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters())

# Main training loop.
data_loader = DataLoader(sentiment_dataset, batch_size=batch_size, shuffle=True,
                         collate_fn=collate_annotations)
losses = []
i = 0
for epoch in range(epochs):
    for inputs, targets, lengths in data_loader:
        optimizer.zero_grad()
        outputs, _ = model(inputs, lengths=lengths)

        outputs = outputs.view(-1, output_vocab_size)
        targets = targets.view(-1)

        loss = loss_function(outputs, targets)
        loss.backward()
        optimizer.step()

        losses.append(loss.data[0])
        if (i % 100) == 0:
            average_loss = np.mean(losses)
            losses = []
            print('Iteration %i - Loss: %0.6f' % (i, average_loss), end='\r')
        if (i % 1000) == 0:
            torch.save(model, 'sentiment_classifier.pt')
        i += 1

torch.save(model, 'sentiment_classifier.final.pt')

### Inference

Lastly, let's examine some model outputs:

In [None]:
model = torch.load('sentiment_classifier.final.pt')

def inference(sentence):
    # Convert words to id tensor.
    ids = [[sentiment_dataset.token_vocab.word2id(x)] for x in sentence]
    ids = Variable(torch.LongTensor(ids))
    if torch.cuda.is_available():
        ids = ids.cuda()
    # Get model output.
    output, _ = model(ids)
    _, pred = torch.max(output, dim=2)
    if torch.cuda.is_available():
        pred = pred.cpu()
    pred = pred.data.view(-1).numpy()
    print('Sentence: %s' % ' '.join(sentence))
    print('Sentiment (0=negative, 4=positive): %i' % pred)

In [None]:
sentence = 'Zot zot  .'.split()
inference(sentence)

<a style='text-decoration:none;line-height:16px;display:flex;color:#5B5B62;padding:10px;justify-content:end;' href='https://deepnote.com?utm_source=created-in-deepnote-cell&projectId=a606c68a-0fb4-4c6a-9886-fddaecf4a93b' target="_blank">
 </img>
Created in <span style='font-weight:600;margin-left:4px;'>Deepnote</span></a>