# Exploratory Data Analysis of Stringer Dataset
@authors: Simone Azeglio, Chetan Dhulipalla, Khalid Saifullah


Part of the code here has been taken from [Neuromatch Academy's Computational Neuroscience Course](https://compneuro.neuromatch.io/projects/neurons/README.html), and specifically from [this notebook](https://colab.research.google.com/github/NeuromatchAcademy/course-content/blob/master/projects/neurons/load_stringer_spontaneous.ipynb)

# To-do list

1. Custom normalization: divide each neuron's activity by its mean value.
1a. Downsampling: convolve, then downsample by a factor of 5.
2. Training/validation split: withhold the last 20 percent of the time series for testing.
3. One RNN per layer: this captures the dynamics inside each layer rather than extra dynamics from inter-layer interactions, and the different RNNs can then be compared. Keep the same neuron count in each layer to reduce potential bias.
4. Layer weight regularization: L2.
5. Early stopping, dropout?
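
Items 1a and 2 of the list above could be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's final pipeline: the function names `smooth_and_downsample` and `temporal_split` are hypothetical, a boxcar kernel is assumed for the convolution, and the factor of 5 and the 20 percent hold-out are taken from the list.

```python
import numpy as np

def smooth_and_downsample(activity, factor=5):
    """Boxcar-smooth each neuron's trace, then keep every `factor`-th sample.

    activity: array of shape (n_neurons, n_timepoints).
    """
    kernel = np.ones(factor) / factor
    # mode="valid" drops the edges where the kernel does not fully overlap
    smoothed = np.array([np.convolve(trace, kernel, mode="valid") for trace in activity])
    return smoothed[:, ::factor]

def temporal_split(activity, test_fraction=0.2):
    """Withhold the last `test_fraction` of the time axis for testing."""
    n_train = int(round(activity.shape[1] * (1 - test_fraction)))
    return activity[:, :n_train], activity[:, n_train:]

# synthetic stand-in for the recordings: 3 neurons, 100 time points
rng = np.random.default_rng(0)
toy = rng.random((3, 100))

down = smooth_and_downsample(toy, factor=5)
train, test = temporal_split(down, test_fraction=0.2)
print(down.shape, train.shape, test.shape)
```

With a boxcar kernel of the same length as the downsampling factor, each retained sample is the mean of one non-overlapping block of `factor` time points, so no zero-padding is needed at the end of the trace.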

## Loading of Stringer spontaneous data



In [1]:
%%capture
!pip install wandb --upgrade

In [2]:
import wandb

wandb.login()

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [3]:
#@title Data retrieval
import os, requests

fname = "stringer_spontaneous.npy"
url = "https://osf.io/dpqaj/download"

if not os.path.isfile(fname):
    try:
        r = requests.get(url)
    except requests.ConnectionError:
        print("!!! Failed to download data !!!")
    else:
        if r.status_code != requests.codes.ok:
            print("!!! Failed to download data !!!")
        else:
            with open(fname, "wb") as fid:
                fid.write(r.content)

In [4]:
#@title Import matplotlib and set defaults
from matplotlib import rcParams 
from matplotlib import pyplot as plt
rcParams['figure.figsize'] = [20, 4]
rcParams['font.size'] =15
rcParams['axes.spines.top'] = False
rcParams['axes.spines.right'] = False
rcParams['figure.autolayout'] = True

## Exploratory Data Analysis (EDA)

In [5]:
#@title Data loading
import numpy as np
dat = np.load('stringer_spontaneous.npy', allow_pickle=True).item()
print(dat.keys())

dict_keys(['sresp', 'run', 'beh_svd_time', 'beh_svd_mask', 'stat', 'pupilArea', 'pupilCOM', 'xyz'])


In [6]:
# functions 

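def moving_avg(array, factor = 5):
    """Reduce the number of components by averaging N = factor
    subsequent elements of array along the time axis (axis 1)."""
    # zero-pad the time axis up to the next multiple of factor
    # (the padded zeros pull down the mean of the last bin)
    pad = (-array.shape[1]) % factor
    zeros_ = np.zeros((array.shape[0], pad))
    array = np.hstack((array, zeros_))

    array = np.reshape(array, (array.shape[0], array.shape[1] // factor, factor))
    array = np.mean(array, axis = 2)

    return array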

## Extracting Data for RNN (or LFADS)
The first problem to address is that the layers do not all contain the same number of neurons. We'd like a single RNN to encode the activity of all the different layers. To make this easier, we take the number of neurons of the least represented class (layer), $N_{neurons} = 1131$, and subsample every other class down to that size.

In [7]:
# Extract labels from z - coordinate
from sklearn import preprocessing
x, y, z = dat['xyz']

le = preprocessing.LabelEncoder()
labels = le.fit_transform(z)
### least represented class (layer with the fewest neurons)
n_samples = np.bincount(labels).min()

In [8]:
### Data for LFADS / RNN 
import pandas as pd 
dataSet = pd.DataFrame(dat["sresp"])
dataSet["label"] = labels 

In [9]:
# sample n_samples neurons from each layer and normalize, in a single loop
n_layers = 9
data_ = []
dataRNN = np.zeros((n_samples*n_layers, dataSet.shape[1]-1))
for i in range(n_layers):
    data_.append(dataSet[dataSet["label"] == i].sample(n = n_samples).iloc[:,:-1])

    # dataRNN[n_samples*i:n_samples*(i+1), :] = data_[i]
    ## normalized by layer: divide by the mean over the layer's neurons at each time point
    layer = data_[i].to_numpy()
    dataRNN[n_samples*i:n_samples*(i+1), :] = layer/np.mean(layer, axis = 0)

## shuffling for training purposes

#np.random.shuffle(dataRNN)

In [None]:
#unshuffled = np.array(data_)

In [10]:
#@title Convolutions code

# convolution moving average

# kernel_length = 50
# averaging_kernel = np.ones(kernel_length) / kernel_length

# dataRNN.shape

# avgd_dataRNN = list()

# for neuron in dataRNN:
#   avgd_dataRNN.append(np.convolve(neuron, averaging_kernel))

# avg_dataRNN = np.array(avgd_dataRNN)

# print(avg_dataRNN.shape)

In [11]:
# @title Z Score Code 


# from scipy.stats import zscore


# neuron = 500

# scaled_all = zscore(avg_dataRNN)
# scaled_per_neuron = zscore(avg_dataRNN[neuron, :])

# scaled_per_layer = list()

# for layer in unshuffled:
#   scaled_per_layer.append(zscore(layer))

# scaled_per_layer = np.array(scaled_per_layer)



# plt.plot(avg_dataRNN[neuron, :])
# plt.plot(avg_dataRNN[2500, :])
# plt.figure()
# plt.plot(dataRNN[neuron, :])
# plt.figure()
# plt.plot(scaled_all[neuron, :])
# plt.plot(scaled_per_neuron)
# plt.figure()
# plt.plot(scaled_per_layer[0,neuron,:])


In [12]:
# custom normalization: divide each neuron (row) by its own mean
normed_dataRNN = dataRNN / dataRNN.mean(axis=1, keepdims=True)

# downsampling and averaging 
# NOTE: this is applied to dataRNN (normalized by layer above); normed_dataRNN is not used downstream
#avgd_normed_dataRNN = dataRNN#
avgd_normed_dataRNN = moving_avg(dataRNN, factor=10)

Issue: does scaling each layer individually introduce a bias that may artificially increase the performance of the network?
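
One related precaution, which does not settle the question above, is to estimate the scaling statistics on the training portion of the time axis only and reuse them for the validation portion, so that no information from the withheld segment enters the normalization. The sketch below assumes per-neuron mean scaling on synthetic data; `normalize_with_train_stats`, `train_fraction` and `eps` are hypothetical names, not part of the notebook.

```python
import numpy as np

def normalize_with_train_stats(activity, train_fraction=0.8, eps=1e-8):
    """Divide each neuron by its mean computed on the training segment only."""
    n_train = int(train_fraction * activity.shape[1])
    train, valid = activity[:, :n_train], activity[:, n_train:]
    # per-neuron mean over training time points; eps guards against silent neurons
    scale = train.mean(axis=1, keepdims=True) + eps
    return train / scale, valid / scale

# synthetic stand-in: 4 neurons, 50 time points, strictly positive activity
rng = np.random.default_rng(0)
toy = rng.random((4, 50)) + 0.5

train_n, valid_n = normalize_with_train_stats(toy)
print(train_n.mean(axis=1))  # close to 1 for every neuron
print(valid_n.mean(axis=1))  # generally not exactly 1
```

Comparing the validation cost obtained with this variant against the one obtained with statistics from the full recording would give a rough indication of how much the normalization itself contributes to the measured performance.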

## Data Loader 


In [13]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [14]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [15]:
# set the seed
np.random.seed(42)

# number of neurons 
NN = dataRNN.shape[0]

In [16]:
# swapping the axes to maintain consistency with seq2seq notebook in the following code - the network takes all the neurons at a time step as input, not just one neuron

# avgd_normed_dataRNN = np.swapaxes(avgd_normed_dataRNN, 0, 1)
avgd_normed_dataRNN.shape

(10179, 702)

In [17]:
frac = 4/5

#x1 = torch.from_numpy(dataRNN[:,:int(frac*dataRNN.shape[1])]).to(device).float().unsqueeze(0)
#x2 = torch.from_numpy(dataRNN[:,int(frac*dataRNN.shape[1]):]).to(device).float().unsqueeze(0)
#x1 = torch.from_numpy(avgd_normed_dataRNN[:1131,:]).to(device).float().unsqueeze(2)
#x2 = torch.from_numpy(avgd_normed_dataRNN[:1131,:]).to(device).float().unsqueeze(2)

n_neurs = 1131
# let's use n_neurs/5 latent components
ncomp = int(n_neurs/5)

x1_train = torch.from_numpy(avgd_normed_dataRNN[:n_neurs,:int(frac*avgd_normed_dataRNN.shape[1])]).to(device).float().unsqueeze(2)
x2_train = torch.from_numpy(avgd_normed_dataRNN[:n_neurs,:int(frac*avgd_normed_dataRNN.shape[1])]).to(device).float().unsqueeze(2)

x1_valid = torch.from_numpy(avgd_normed_dataRNN[:n_neurs,int(frac*avgd_normed_dataRNN.shape[1]):]).to(device).float().unsqueeze(2)
x2_valid = torch.from_numpy(avgd_normed_dataRNN[:n_neurs,int(frac*avgd_normed_dataRNN.shape[1]):]).to(device).float().unsqueeze(2)

NN1 = x1_train.shape[0]
NN2 = x2_train.shape[0]


In [53]:
class Net(nn.Module):
    def __init__(self, ncomp, NN1, NN2, num_layers=1, dropout=0, bidi=True):
        super(Net, self).__init__()
        # keep the latent size on the module so that forward() does not rely on a global
        self.ncomp = ncomp

        # play with some of the options in the RNN!
        
        self.rnn = nn.LSTM(NN1, ncomp, num_layers = num_layers, dropout = dropout,
                         bidirectional = bidi)
        """
        self.rnn = nn.RNN(NN1, ncomp, num_layers = 1, dropout = 0,
                    bidirectional = bidi, nonlinearity = 'tanh')
        self.rnn = nn.GRU(NN1, ncomp, num_layers = 1, dropout = 0,
                         bidirectional = bidi)
        """
        
        self.mlp = nn.Sequential(
                    nn.Linear(ncomp, ncomp*2),
                    nn.Mish(),
                    nn.Linear(ncomp*2, ncomp*2),
                    nn.Mish(),
                    nn.Dropout(0.25),
                    nn.Linear(ncomp*2, ncomp), 
                    nn.Mish())
        
        self.fc = nn.Linear(ncomp, NN2)

    def forward(self, x):
        # (neurons, time, 1) -> (time, batch = 1, neurons)
        x = x.permute(1, 2, 0)
        # h_0 = torch.zeros(2, x.size()[1], self.ncomp).to(device)
        
        # for an LSTM the second output is the tuple (h_n, c_n)
        y, h_n = self.rnn(x)

        if self.rnn.bidirectional:
            # if the rnn is bidirectional, it concatenates the activations from the forward and backward pass
            # we average them instead, so as to enforce the latents to match between the forward and backward pass
            q = (y[:, :, :self.ncomp] + y[:, :, self.ncomp:])/2
        else:
            q = y
        
        q = self.mlp(q)

        # the readout is linear because the cost used below is an MSE
        # with a Poisson log-likelihood one would use a softplus instead: it is like a relu but smoothed out,
        # so the prediction is never exactly 0 (a predicted 0 with an observed spike gives an instant Inf)
        #z = F.softplus(self.fc(q), 10)
        # (time, 1, neurons) -> (neurons, time, 1), the same layout as the input
        z = self.fc(q).permute(2, 0, 1)
        return z, q

In [54]:
sweep_config = {
    'method': 'random'
    }

# NOTE: train() below logs 'train_loss' and 'valid_loss'; no metric named 'loss' is logged
metric = {
    'name': 'loss',
    'goal': 'minimize'   
    }

sweep_config['metric'] = metric

parameters_dict = {
    'optimizer': {
        'values': ['adam', 'sgd']
        },
    'num_layers': {
        'values': [2, 3]
        },
    'dropout': {
          'values': [0.1, 0.2, 0.3]
        },
    'weight_decay': {
          'values': [5e-5, 0.]
        },
    }

sweep_config['parameters'] = parameters_dict

parameters_dict.update({
    'epochs': {
        'value': 1000}
    })

import math

parameters_dict.update({
    'learning_rate': {
        # log-uniform: min and max are natural-log exponents,
        # so the rate lies between exp(-9.9) ~ 5e-5 and exp(-5.3) ~ 5e-3
        'distribution': 'log_uniform',
        'min': -9.9,
        'max': -5.3
    },
})

import pprint

pprint.pprint(sweep_config)

{'method': 'random',
 'metric': {'goal': 'minimize', 'name': 'loss'},
 'parameters': {'dropout': {'values': [0.1, 0.2, 0.3]},
                'epochs': {'value': 1000},
                'learning_rate': {'distribution': 'log_uniform',
                                  'max': -5.3,
                                  'min': -9.9},
                'num_layers': {'values': [2, 3]},
                'optimizer': {'values': ['adam', 'sgd']},
                'weight_decay': {'values': [5e-05, 0.0]}}}
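
To make the learning-rate range in this configuration concrete: as far as we understand the `log_uniform` distribution, `min` and `max` are exponents of the natural logarithm, so the sampled rate lies between exp(min) and exp(max). The sketch below reproduces that sampling rule with the standard library only; `sample_log_uniform` is a hypothetical helper and is not part of wandb.

```python
import math
import random

LOG_MIN, LOG_MAX = -9.9, -5.3   # values from the sweep configuration

def sample_log_uniform(rng, log_min=LOG_MIN, log_max=LOG_MAX):
    """Draw x such that ln(x) is uniform on [log_min, log_max]."""
    return math.exp(rng.uniform(log_min, log_max))

# bounds of the learning rate implied by the configuration
lr_low, lr_high = math.exp(LOG_MIN), math.exp(LOG_MAX)
print(f"learning-rate range: {lr_low:.2e} to {lr_high:.2e}")

rng = random.Random(0)
samples = [sample_log_uniform(rng) for _ in range(5)]
print(samples)
```

The range is roughly 5e-5 to 5e-3, which matches the learning rates reported in the run configurations of the sweep.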


In [55]:
sweep_id = wandb.sweep(sweep_config, project="NMAs-sweeps-test")

Create sweep with ID: 45npk1l6
Sweep URL: https://wandb.ai/khalidsaifullaah/NMAs-sweeps-test/sweeps/45npk1l6


In [56]:
# you can keep re-running this cell if you think the cost might decrease further
cost = nn.MSELoss()

niter = 5800
# rnn_loss = 0.2372, lstm_loss = 0.2340, gru_lstm = 0.2370

In [57]:
def build_optimizer(network, optimizer, learning_rate, weight_decay):
    if optimizer == "sgd":
        optimizer = torch.optim.SGD(network.parameters(),
                              lr=learning_rate, momentum=0.9, weight_decay=weight_decay)
    elif optimizer == "adam":
        optimizer = torch.optim.Adam(network.parameters(),
                               lr=learning_rate, weight_decay=weight_decay)
    else:
        # fail loudly instead of returning the unchanged name string
        raise ValueError(f"unknown optimizer: {optimizer}")
    return optimizer

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by Sweep Controller
        config = wandb.config

        # loader = build_dataset(config.batch_size)
        # Net(ncomp, NN1, NN2, bidi = True).to(device)
        network = Net(ncomp, NN1, NN2, config.num_layers, config.dropout).to(device)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate, config.weight_decay)

        for epoch in range(config.epochs):
            # avg_loss = train_epoch(network, loader, optimizer)
            network.train()
            # the network outputs the single-neuron prediction and the latents
            z, y = network(x1_train)

            # our cost
            loss = cost(z, x2_train)

            # train the network as usual
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

            # validation cost, computed once per epoch with dropout disabled
            network.eval()
            with torch.no_grad():
                valid_loss = cost(network(x1_valid)[0], x2_valid)

            if epoch % 50 == 0:
                print(f' iteration {epoch}, train cost {loss.item():.4f}, valid cost {valid_loss.item():.4f}')
            wandb.log({"train_loss": loss.item(), 'valid_loss': valid_loss.item(), "epoch": epoch})
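
Item 5 of the to-do list mentions early stopping, which the training loop above does not implement. A minimal patience-based sketch could look like the following. The class name `EarlyStopper` and its parameters `patience` and `min_delta` are hypothetical, and the validation costs in the usage example are made-up numbers for illustration, not values from any run.

```python
class EarlyStopper:
    """Signal a stop when the validation cost has not improved for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, valid_cost):
        if valid_cost < self.best - self.min_delta:
            # improvement: remember it and reset the counter
            self.best = valid_cost
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# illustrative, made-up validation costs (not taken from the sweep logs)
costs = [1.00, 0.90, 0.85, 0.86, 0.87, 0.88, 0.80]
stopper = EarlyStopper(patience=3)
stopped_at = None
for step, c in enumerate(costs):
    if stopper.should_stop(c):
        stopped_at = step
        break
print(stopped_at, stopper.best)
```

Inside `train`, one option would be to call `stopper.should_stop(valid_loss.item())` after the validation step and `break` out of the epoch loop when it returns `True`.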

In [None]:
wandb.agent(sweep_id, train, count=15)

wandb: Agent Starting Run: oxzru1p4 with config:
wandb:   dropout: 0.2
wandb:   epochs: 1000
wandb:   learning_rate: 0.0011218672414935777
wandb:   num_layers: 3
wandb:   optimizer: adam
wandb:   weight_decay: 5e-05


 iteration 0, train cost 2.5689, valid cost 2.5861
 iteration 50, train cost 0.9216, valid cost 0.9491
 iteration 100, train cost 0.9202, valid cost 0.9482
 iteration 150, train cost 0.9191, valid cost 0.9479
 iteration 200, train cost 0.9148, valid cost 0.9442
 iteration 250, train cost 0.8677, valid cost 0.8961
 iteration 300, train cost 0.8517, valid cost 0.8860
 iteration 350, train cost 0.8352, valid cost 0.8780
 iteration 400, train cost 0.8167, valid cost 0.8629
 iteration 450, train cost 0.7973, valid cost 0.8488
 iteration 500, train cost 0.7712, valid cost 0.8379
 iteration 550, train cost 0.7506, valid cost 0.8187
 iteration 600, train cost 0.7309, valid cost 0.8041
 iteration 650, train cost 0.7019, valid cost 0.7964
 iteration 700, train cost 0.6800, valid cost 0.7966
 iteration 750, train cost 0.6618, valid cost 0.7896
 iteration 800, train cost 0.6400, valid cost 0.7910
 iteration 850, train cost 0.6219, valid cost 0.7931
 iteration 900, train cost 0.6035, valid cost 0.8

Run summary:
train_loss: 0.57094
valid_loss: 0.80005
epoch: 999.0
_runtime: 220.0
_timestamp: 1628895935.0
_step: 999.0


wandb: Agent Starting Run: ol4ojwre with config:
wandb:   dropout: 0.2
wandb:   epochs: 1000
wandb:   learning_rate: 0.00244977351874619
wandb:   num_layers: 2
wandb:   optimizer: sgd
wandb:   weight_decay: 5e-05


 iteration 0, train cost 2.5655, valid cost 2.5974
 iteration 50, train cost 2.5581, valid cost 2.5898
 iteration 100, train cost 2.5492, valid cost 2.5809
 iteration 150, train cost 2.5402, valid cost 2.5720
 iteration 200, train cost 2.5313, valid cost 2.5630
 iteration 250, train cost 2.5222, valid cost 2.5539
 iteration 300, train cost 2.5130, valid cost 2.5448
 iteration 350, train cost 2.5037, valid cost 2.5354
 iteration 400, train cost 2.4942, valid cost 2.5258
 iteration 450, train cost 2.4843, valid cost 2.5160
 iteration 500, train cost 2.4741, valid cost 2.5058
 iteration 550, train cost 2.4635, valid cost 2.4951
 iteration 600, train cost 2.4523, valid cost 2.4838
 iteration 650, train cost 2.4403, valid cost 2.4718
 iteration 700, train cost 2.4272, valid cost 2.4588
 iteration 750, train cost 2.4131, valid cost 2.4445
 iteration 800, train cost 2.3971, valid cost 2.4285
 iteration 850, train cost 2.3787, valid cost 2.4099
 iteration 900, train cost 2.3566, valid cost 2.3

Run summary:
train_loss: 2.29054
valid_loss: 2.32068
epoch: 999.0
_runtime: 144.0
_timestamp: 1628896086.0
_step: 999.0


wandb: Agent Starting Run: 8h3fq3ge with config:
wandb:   dropout: 0.1
wandb:   epochs: 1000
wandb:   learning_rate: 0.0005196211631499388
wandb:   num_layers: 3
wandb:   optimizer: sgd
wandb:   weight_decay: 0


 iteration 0, train cost 2.5606, valid cost 2.5928
 iteration 50, train cost 2.5591, valid cost 2.5912
 iteration 100, train cost 2.5572, valid cost 2.5894
 iteration 150, train cost 2.5554, valid cost 2.5875
 iteration 200, train cost 2.5535, valid cost 2.5857
 iteration 250, train cost 2.5517, valid cost 2.5838
 iteration 300, train cost 2.5498, valid cost 2.5820
 iteration 350, train cost 2.5480, valid cost 2.5801
 iteration 400, train cost 2.5461, valid cost 2.5783
 iteration 450, train cost 2.5442, valid cost 2.5764
 iteration 500, train cost 2.5424, valid cost 2.5746
 iteration 550, train cost 2.5405, valid cost 2.5727
 iteration 600, train cost 2.5387, valid cost 2.5708
 iteration 650, train cost 2.5368, valid cost 2.5690
 iteration 700, train cost 2.5349, valid cost 2.5671
 iteration 750, train cost 2.5331, valid cost 2.5653
 iteration 800, train cost 2.5312, valid cost 2.5634
 iteration 850, train cost 2.5294, valid cost 2.5615
 iteration 900, train cost 2.5275, valid cost 2.5

Run summary:
train_loss: 2.52378
valid_loss: 2.55595
epoch: 999.0
_runtime: 215.0
_timestamp: 1628896308.0
_step: 999.0


wandb: Agent Starting Run: 7ohhxr07 with config:
wandb:   dropout: 0.3
wandb:   epochs: 1000
wandb:   learning_rate: 9.802412632800289e-05
wandb:   num_layers: 3
wandb:   optimizer: adam
wandb:   weight_decay: 5e-05


 iteration 0, train cost 2.5669, valid cost 2.5975
 iteration 50, train cost 1.4359, valid cost 1.4200
 iteration 100, train cost 0.9531, valid cost 0.9644
 iteration 150, train cost 0.9320, valid cost 0.9496
 iteration 200, train cost 0.9263, valid cost 0.9491
 iteration 250, train cost 0.9241, valid cost 0.9491
 iteration 300, train cost 0.9222, valid cost 0.9489
 iteration 350, train cost 0.9217, valid cost 0.9486
 iteration 400, train cost 0.9202, valid cost 0.9482
 iteration 450, train cost 0.9200, valid cost 0.9476
 iteration 500, train cost 0.9188, valid cost 0.9470
 iteration 550, train cost 0.9180, valid cost 0.9459
 iteration 600, train cost 0.9157, valid cost 0.9441
 iteration 650, train cost 0.9086, valid cost 0.9371
 iteration 700, train cost 0.8879, valid cost 0.9173
 iteration 750, train cost 0.8687, valid cost 0.8982
 iteration 800, train cost 0.8623, valid cost 0.8904
 iteration 850, train cost 0.8586, valid cost 0.8868
 iteration 900, train cost 0.8550, valid cost 0.8

Run summary:
train_loss: 0.83975
valid_loss: 0.86837
epoch: 999.0
_runtime: 221.0
_timestamp: 1628896537.0
_step: 999.0


wandb: Agent Starting Run: xjycba27 with config:
wandb:   dropout: 0.3
wandb:   epochs: 1000
wandb:   learning_rate: 9.662557339611326e-05
wandb:   num_layers: 3
wandb:   optimizer: sgd
wandb:   weight_decay: 5e-05


 iteration 0, train cost 2.5637, valid cost 2.5961
 iteration 50, train cost 2.5634, valid cost 2.5958
 iteration 100, train cost 2.5630, valid cost 2.5955
 iteration 150, train cost 2.5627, valid cost 2.5952
 iteration 200, train cost 2.5624, valid cost 2.5948
 iteration 250, train cost 2.5620, valid cost 2.5945
 iteration 300, train cost 2.5617, valid cost 2.5941
 iteration 350, train cost 2.5613, valid cost 2.5938
 iteration 400, train cost 2.5610, valid cost 2.5934
 iteration 450, train cost 2.5606, valid cost 2.5931
 iteration 500, train cost 2.5603, valid cost 2.5928
 iteration 550, train cost 2.5600, valid cost 2.5924
 iteration 600, train cost 2.5596, valid cost 2.5921
 iteration 650, train cost 2.5593, valid cost 2.5917
 iteration 700, train cost 2.5589, valid cost 2.5914
 iteration 750, train cost 2.5586, valid cost 2.5910
 iteration 800, train cost 2.5582, valid cost 2.5907
 iteration 850, train cost 2.5579, valid cost 2.5904
 iteration 900, train cost 2.5576, valid cost 2.5

Run summary:
train_loss: 2.55691
valid_loss: 2.58935
epoch: 999.0
_runtime: 220.0
_timestamp: 1628896764.0
_step: 999.0


wandb: Agent Starting Run: u4egdfwz with config:
wandb:   dropout: 0.3
wandb:   epochs: 1000
wandb:   learning_rate: 0.0005797898517473733
wandb:   num_layers: 3
wandb:   optimizer: adam
wandb:   weight_decay: 5e-05


 iteration 0, train cost 2.5616, valid cost 2.5854
 iteration 50, train cost 0.9275, valid cost 0.9504
 iteration 100, train cost 0.9218, valid cost 0.9484
 iteration 150, train cost 0.9207, valid cost 0.9481
 iteration 200, train cost 0.9193, valid cost 0.9474
 iteration 250, train cost 0.9168, valid cost 0.9455
 iteration 300, train cost 0.9083, valid cost 0.9372
 iteration 350, train cost 0.8627, valid cost 0.8899
 iteration 400, train cost 0.8530, valid cost 0.8827
 iteration 450, train cost 0.8420, valid cost 0.8706
 iteration 500, train cost 0.8193, valid cost 0.8566
 iteration 550, train cost 0.7790, valid cost 0.8184
 iteration 600, train cost 0.7606, valid cost 0.8018
 iteration 650, train cost 0.7470, valid cost 0.7916
 iteration 700, train cost 0.7182, valid cost 0.7689
 iteration 750, train cost 0.6936, valid cost 0.7569
 iteration 800, train cost 0.6782, valid cost 0.7508
 iteration 850, train cost 0.6610, valid cost 0.7493
 iteration 900, train cost 0.6462, valid cost 0.7

Run summary:
train_loss: 0.60789
valid_loss: 0.74037
epoch: 999.0
_runtime: 222.0
_timestamp: 1628896993.0
_step: 999.0


wandb: Agent Starting Run: uy3p6qvk with config:
wandb:   dropout: 0.2
wandb:   epochs: 1000
wandb:   learning_rate: 0.0004367548708801497
wandb:   num_layers: 3
wandb:   optimizer: sgd
wandb:   weight_decay: 0


 iteration 0, train cost 2.5620, valid cost 2.5946
 iteration 50, train cost 2.5607, valid cost 2.5932
 iteration 100, train cost 2.5591, valid cost 2.5917
 iteration 150, train cost 2.5575, valid cost 2.5901
 iteration 200, train cost 2.5559, valid cost 2.5885
 iteration 250, train cost 2.5544, valid cost 2.5869
 iteration 300, train cost 2.5528, valid cost 2.5853
 iteration 350, train cost 2.5512, valid cost 2.5838
 iteration 400, train cost 2.5496, valid cost 2.5822
 iteration 450, train cost 2.5481, valid cost 2.5806
 iteration 500, train cost 2.5465, valid cost 2.5790
 iteration 550, train cost 2.5449, valid cost 2.5774
 iteration 600, train cost 2.5434, valid cost 2.5759
 iteration 650, train cost 2.5418, valid cost 2.5743
 iteration 700, train cost 2.5402, valid cost 2.5727
 iteration 750, train cost 2.5386, valid cost 2.5711
 iteration 800, train cost 2.5371, valid cost 2.5696
 iteration 850, train cost 2.5355, valid cost 2.5680
 iteration 900, train cost 2.5339, valid cost 2.5

Run summary:
train_loss: 2.53073
valid_loss: 2.56325
epoch: 999.0
_runtime: 221.0
_timestamp: 1628897222.0
_step: 999.0


wandb: Agent Starting Run: ddd13pi7 with config:
wandb:   dropout: 0.2
wandb:   epochs: 1000
wandb:   learning_rate: 0.0014710560946805586
wandb:   num_layers: 3
wandb:   optimizer: sgd
wandb:   weight_decay: 0


 iteration 0, train cost 2.5643, valid cost 2.5964
 iteration 50, train cost 2.5600, valid cost 2.5920
 iteration 100, train cost 2.5548, valid cost 2.5867
 iteration 150, train cost 2.5495, valid cost 2.5815
 iteration 200, train cost 2.5443, valid cost 2.5762
 iteration 250, train cost 2.5390, valid cost 2.5710
 iteration 300, train cost 2.5338, valid cost 2.5657
 iteration 350, train cost 2.5285, valid cost 2.5604
 iteration 400, train cost 2.5232, valid cost 2.5551
 iteration 450, train cost 2.5178, valid cost 2.5498
 iteration 500, train cost 2.5125, valid cost 2.5444
 iteration 550, train cost 2.5070, valid cost 2.5390
 iteration 600, train cost 2.5016, valid cost 2.5335
 iteration 650, train cost 2.4960, valid cost 2.5279
 iteration 700, train cost 2.4904, valid cost 2.5223
 iteration 750, train cost 2.4847, valid cost 2.5166
 iteration 800, train cost 2.4789, valid cost 2.5108
 iteration 850, train cost 2.4730, valid cost 2.5049
 iteration 900, train cost 2.4670, valid cost 2.4

Run summary:
train_loss: 2.45475
valid_loss: 2.48661
epoch: 999.0
_runtime: 221.0
_timestamp: 1628897451.0
_step: 999.0


wandb: Agent Starting Run: drt1wn46 with config:
wandb:   dropout: 0.2
wandb:   epochs: 1000
wandb:   learning_rate: 0.00010738414169520817
wandb:   num_layers: 3
wandb:   optimizer: sgd
wandb:   weight_decay: 0


 iteration 0, train cost 2.5678, valid cost 2.5999
 iteration 50, train cost 2.5675, valid cost 2.5996
 iteration 100, train cost 2.5671, valid cost 2.5992
 iteration 150, train cost 2.5667, valid cost 2.5988
 iteration 200, train cost 2.5663, valid cost 2.5984
 iteration 250, train cost 2.5660, valid cost 2.5980
 iteration 300, train cost 2.5656, valid cost 2.5976
 iteration 350, train cost 2.5652, valid cost 2.5972
 iteration 400, train cost 2.5648, valid cost 2.5968
 iteration 450, train cost 2.5644, valid cost 2.5965
 iteration 500, train cost 2.5640, valid cost 2.5961
 iteration 550, train cost 2.5636, valid cost 2.5957
 iteration 600, train cost 2.5632, valid cost 2.5953
 iteration 650, train cost 2.5629, valid cost 2.5949
 iteration 700, train cost 2.5624, valid cost 2.5945
 iteration 750, train cost 2.5621, valid cost 2.5941
 iteration 800, train cost 2.5617, valid cost 2.5937
 iteration 850, train cost 2.5613, valid cost 2.5933
 iteration 900, train cost 2.5609, valid cost 2.5

Run summary:
train_loss: 2.56013
valid_loss: 2.59219
epoch: 999.0
_runtime: 220.0
_timestamp: 1628897679.0
_step: 999.0


wandb: Agent Starting Run: wvyae90q with config:
wandb:   dropout: 0.3
wandb:   epochs: 1000
wandb:   learning_rate: 0.001962058028434178
wandb:   num_layers: 3
wandb:   optimizer: sgd
wandb:   weight_decay: 0


 iteration 0, train cost 2.5597, valid cost 2.5925
 iteration 50, train cost 2.5539, valid cost 2.5865
 iteration 100, train cost 2.5469, valid cost 2.5795
 iteration 150, train cost 2.5399, valid cost 2.5724
 iteration 200, train cost 2.5328, valid cost 2.5654
 iteration 250, train cost 2.5257, valid cost 2.5583
 iteration 300, train cost 2.5185, valid cost 2.5511
 iteration 350, train cost 2.5114, valid cost 2.5439
 iteration 400, train cost 2.5041, valid cost 2.5366
 iteration 450, train cost 2.4967, valid cost 2.5292
 iteration 500, train cost 2.4892, valid cost 2.5217
 iteration 550, train cost 2.4816, valid cost 2.5141
 iteration 600, train cost 2.4738, valid cost 2.5063
 iteration 650, train cost 2.4658, valid cost 2.4983
 iteration 700, train cost 2.4576, valid cost 2.4900
 iteration 750, train cost 2.4491, valid cost 2.4815
 iteration 800, train cost 2.4403, valid cost 2.4727
 iteration 850, train cost 2.4313, valid cost 2.4636
 iteration 900, train cost 2.4217, valid cost 2.4

Run summary:
train_loss: 2.40149
valid_loss: 2.43383
epoch: 999.0
_runtime: 222.0
_timestamp: 1628897909.0
_step: 999.0


wandb: Agent Starting Run: jf9dkzcd with config:
wandb:   dropout: 0.2
wandb:   epochs: 1000
wandb:   learning_rate: 6.297753987053766e-05
wandb:   num_layers: 3
wandb:   optimizer: adam
wandb:   weight_decay: 0


 iteration 0, train cost 2.5604, valid cost 2.5921
 iteration 50, train cost 1.7777, valid cost 1.7720
 iteration 100, train cost 1.0303, valid cost 1.0443
 iteration 150, train cost 0.9439, valid cost 0.9579
 iteration 200, train cost 0.9349, valid cost 0.9501
 iteration 250, train cost 0.9315, valid cost 0.9491
 iteration 300, train cost 0.9295, valid cost 0.9487
 iteration 350, train cost 0.9270, valid cost 0.9482
 iteration 400, train cost 0.9247, valid cost 0.9474
 iteration 450, train cost 0.9233, valid cost 0.9468
 iteration 500, train cost 0.9227, valid cost 0.9465
 iteration 550, train cost 0.9216, valid cost 0.9461
 iteration 600, train cost 0.9210, valid cost 0.9458
 iteration 650, train cost 0.9204, valid cost 0.9453
 iteration 700, train cost 0.9194, valid cost 0.9447
 iteration 750, train cost 0.9176, valid cost 0.9438
 iteration 800, train cost 0.9145, valid cost 0.9414
 iteration 850, train cost 0.9072, valid cost 0.9350
 iteration 900, train cost 0.8941, valid cost 0.9

train_loss,0.87024
valid_loss,0.89842
epoch,999.0
_runtime,223.0
_timestamp,1628898140.0
_step,999.0


train_loss,█▇▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
valid_loss,█▇▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_runtime,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_timestamp,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███


[34m[1mwandb[0m: Agent Starting Run: maan2t8j with config:
[34m[1mwandb[0m: 	dropout: 0.2
[34m[1mwandb[0m: 	epochs: 1000
[34m[1mwandb[0m: 	learning_rate: 0.0024747739403080393
[34m[1mwandb[0m: 	num_layers: 3
[34m[1mwandb[0m: 	optimizer: sgd
[34m[1mwandb[0m: 	weight_decay: 0


 iteration 0, train cost 2.5664, valid cost 2.5979
 iteration 50, train cost 2.5592, valid cost 2.5906
 iteration 100, train cost 2.5505, valid cost 2.5819
 iteration 150, train cost 2.5418, valid cost 2.5732
 iteration 200, train cost 2.5331, valid cost 2.5644
 iteration 250, train cost 2.5243, valid cost 2.5557
 iteration 300, train cost 2.5155, valid cost 2.5468
 iteration 350, train cost 2.5065, valid cost 2.5378
 iteration 400, train cost 2.4974, valid cost 2.5287
 iteration 450, train cost 2.4880, valid cost 2.5194
 iteration 500, train cost 2.4785, valid cost 2.5099
 iteration 550, train cost 2.4687, valid cost 2.5000
 iteration 600, train cost 2.4586, valid cost 2.4899
 iteration 650, train cost 2.4481, valid cost 2.4794
 iteration 700, train cost 2.4371, valid cost 2.4684
 iteration 750, train cost 2.4255, valid cost 2.4568
 iteration 800, train cost 2.4133, valid cost 2.4446
 iteration 850, train cost 2.4002, valid cost 2.4315
 iteration 900, train cost 2.3863, valid cost 2.4

train_loss,2.35478
valid_loss,2.38595
epoch,999.0
_runtime,219.0
_timestamp,1628898367.0
_step,999.0


train_loss,████▇▇▇▇▇▇▇▆▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▁▁
valid_loss,████▇▇▇▇▇▇▇▆▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▁▁
epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_runtime,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_timestamp,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███


[34m[1mwandb[0m: Agent Starting Run: y58a3sxx with config:
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	epochs: 1000
[34m[1mwandb[0m: 	learning_rate: 5.174818285265098e-05
[34m[1mwandb[0m: 	num_layers: 3
[34m[1mwandb[0m: 	optimizer: adam
[34m[1mwandb[0m: 	weight_decay: 5e-05


 iteration 0, train cost 2.5613, valid cost 2.5925
 iteration 50, train cost 2.2037, valid cost 2.1966
 iteration 100, train cost 1.1679, valid cost 1.1745
 iteration 150, train cost 0.9659, valid cost 0.9724
 iteration 200, train cost 0.9389, valid cost 0.9504
 iteration 250, train cost 0.9329, valid cost 0.9498
 iteration 300, train cost 0.9296, valid cost 0.9502
 iteration 350, train cost 0.9267, valid cost 0.9510
 iteration 400, train cost 0.9256, valid cost 0.9514
 iteration 450, train cost 0.9240, valid cost 0.9509
 iteration 500, train cost 0.9235, valid cost 0.9513
 iteration 550, train cost 0.9228, valid cost 0.9509
 iteration 600, train cost 0.9218, valid cost 0.9513
 iteration 650, train cost 0.9212, valid cost 0.9507
 iteration 700, train cost 0.9206, valid cost 0.9502
 iteration 750, train cost 0.9202, valid cost 0.9496
 iteration 800, train cost 0.9194, valid cost 0.9496
 iteration 850, train cost 0.9186, valid cost 0.9485
 iteration 900, train cost 0.9172, valid cost 0.9

train_loss,0.90909
valid_loss,0.94316
epoch,999.0
_runtime,223.0
_timestamp,1628898599.0
_step,999.0


train_loss,██▆▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
valid_loss,██▆▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_runtime,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇███
_timestamp,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇███
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███


[34m[1mwandb[0m: Agent Starting Run: 10psf6pu with config:
[34m[1mwandb[0m: 	dropout: 0.1
[34m[1mwandb[0m: 	epochs: 1000
[34m[1mwandb[0m: 	learning_rate: 0.00012200477287091464
[34m[1mwandb[0m: 	num_layers: 3
[34m[1mwandb[0m: 	optimizer: sgd
[34m[1mwandb[0m: 	weight_decay: 5e-05


 iteration 0, train cost 2.5630, valid cost 2.5951
 iteration 50, train cost 2.5627, valid cost 2.5947
 iteration 100, train cost 2.5622, valid cost 2.5943
