## Introduction

This notebook is based on the notebook originally published with LSTM training and inference: [3 LSTMs; with Data Picking and Shifting](https://www.kaggle.com/code/seungmoklee/3-lstms-with-data-picking-and-shifting). So if you like my notebooks, don't forget the work they are based on!

This notebook shows how the LSTM models can be trained on TPU. It should also run on GPU; in that case, use a smaller batch size and learning rate.

The data preprocessing is done in the same way, with the following exceptions:
1. The pulse_count is 96 instead of 128. In various experiments I often noticed that performance was better with a maximum of 96 pulses than with 128.
2. The features 'r_err' and 'z_err' were not added. In various training experiments they seemed to hurt performance.

With these modifications the files are a lot smaller, so more of them fit into memory. The attached Dataset provides 90 training files. See the following notebook for how to preprocess the training data and generate the required files: [Tensorflow LSTM Model Data PreProcessor](https://www.kaggle.com/rsmits/tensorflow-lstm-model-data-preprocessor)

I made the following modifications which, combined, improved the performance of the LSTM model drastically. The most important ones:
* Use a lot more of the training data. The largest set I have used so far is 70 training files.
* Increase the bin_num. Increasing it further might be possible. Note that this only helps when enough training files are used; with only a small set of training files, performance actually drops.
* Use a large batch size
* Higher learning rate
* Use GRU instead of LSTM
* Use a larger number of LSTM cells.
* Add an additional Bidirectional/GRU layer.
* Add an additional Dense layer with enough units.
* Add a Masking layer. This skips pulses whose values are all 0.
* Don't use one-hot encoding for the target, just an integer class index. One-hot encoding uses far too much RAM.
* Lower the maximum pulse count from 128 to 96.
* Use only 6 features when training: time, charge, aux, x, y, z 
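A rough sketch of why integer targets matter, assuming each preprocessed file holds 200,000 events (the shape printed in the training log) and bin_num = 32, i.e. 1024 classes. The dtypes (float32 one-hot, int64 codes) are assumptions. It also checks, with toy NumPy helpers that are hypothetical and not Keras code, that cross-entropy on integer labels equals cross-entropy on one-hot labels, so `sparse_categorical_crossentropy` gives up nothing.

```python
import numpy as np

n_events = 200_000      # events per file (assumed from the (200000, 96, 6) shapes in the log)
n_classes = 32 ** 2     # bin_num ** 2 output classes

# Memory needed for the targets of a single file
one_hot_bytes = n_events * n_classes * np.dtype(np.float32).itemsize
integer_bytes = n_events * np.dtype(np.int64).itemsize
print(f"one-hot float32: {one_hot_bytes / 1e6:.1f} MB")  # 819.2 MB
print(f"integer int64:   {integer_bytes / 1e6:.1f} MB")  # 1.6 MB

def sparse_ce(probs, labels):
    # mean of -log p[true class], labels given as integer class indices
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def categorical_ce(probs, one_hot):
    # same loss, labels given as one-hot rows
    return -np.mean(np.sum(one_hot * np.log(probs), axis=1))

# Toy softmax outputs for 8 events
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, n_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, n_classes, size=8)
one_hot = np.eye(n_classes)[labels]

print(np.isclose(sparse_ce(probs, labels), categorical_ce(probs, one_hot)))  # True
```

At 70 files, the one-hot targets alone would need roughly 57 GB under these assumptions, far more than the 32 GB mentioned below.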

The model files used in the [Tensorflow LSTM Model Inference](https://www.kaggle.com/code/rsmits/tensorflow-lstm-model-inference) notebook were trained on my local laptop with 70 training files loaded into 32 GB of RAM. The code in this TPU training notebook is the same, except that every `data_new_load_interval` epochs a new set of `train_files_delta` training files is loaded. This works around the limited RAM in the Kaggle environment while still making use of all the data, just in a different way.
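The reload schedule can be sketched in a few lines. `reload_epochs` is a hypothetical helper that mirrors the `e % data_new_load_interval == 0` check in the training loop; the numbers are the TPU defaults from the Constants cell and the 100 non-validation batch IDs (100 to 200, minus one validation file). Because `prep_training_data` draws each subset independently with `random.sample`, files can repeat across reloads and some may never be drawn.

```python
def reload_epochs(epochs, interval):
    """Epochs at which the training loop loads a fresh random subset of files."""
    if interval is None:  # local training: everything is loaded once, before the loop
        return []
    return [e for e in range(epochs) if e % interval == 0]

n_train_files = 100       # batch IDs 100..200 minus 1 validation file
train_files_delta = 15    # TPU default

tpu_reloads = reload_epochs(75, 6)           # TPU defaults: 75 epochs, reload every 6
print(len(tpu_reloads))                      # 13
print(len(tpu_reloads) * train_files_delta)  # 195 file loads in total

# Expected number of files never drawn: each reload skips a given file with prob. 85/100
expected_unseen = n_train_files * (1 - train_files_delta / n_train_files) ** len(tpu_reloads)
print(expected_unseen)                       # about 12 files
```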

To get the same performance as the model files in the inference notebook, you should train locally with all data loaded in RAM. Loading the training files in batches, as done here for TPU, does not seem to reach quite the same performance. With only 20 hours of TPU time available each week, it is hard to tweak and tune the TPU notebook to match it exactly. To mimic the local training of the 2 inference models, set the hyperparameters to the values in the 'Local Training' comments.

Some ideas for increasing the performance:
* Convert the current .npz files to TFRecords. That would let the TPUs use all the data easily.
* Experiment with different bin_nums.
* Experiment with Dropout.
* Experiment with Learning Rate scheduling.
* Etc.
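For the learning-rate scheduling idea, one possible starting point is a per-epoch linear warmup followed by cosine decay. This is a hypothetical schedule that the notebook does not use: `base_lr` and `total_epochs` default to the overridden values in the Constants cell (0.0005, 80), while `warmup_epochs` and `min_lr` are made-up values.

```python
import math

def warmup_cosine_lr(epoch, base_lr=0.0005, warmup_epochs=3, total_epochs=80, min_lr=1e-5):
    """Hypothetical schedule: linear warmup to base_lr, then cosine decay to min_lr."""
    if epoch < warmup_epochs:
        # ramp up over the first few epochs
        return base_lr * (epoch + 1) / warmup_epochs
    # fraction of the decay phase completed, in [0, 1]
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

for e in (0, 2, 3, 40, 79):
    print(e, f"{warmup_cosine_lr(e):.6f}")
```

In the custom epoch loop the value could be set at the start of each epoch, for example with `tf.keras.backend.set_value(model.optimizer.learning_rate, lr)`; whether this beats a constant rate for this model is untested.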

Let me know if you have any feedback, comments or questions about this notebook. 

## Update Latest Version

In this latest update I changed a few hyperparameters to improve performance when training on TPU. The subset of training files is now reloaded more often, and the files are selected completely at random.

After some checking, it turns out that using generators as input is not supported on TPU. As mentioned in earlier versions of this training notebook, you can either keep experimenting with the notebook as it is, or switch to TFRecords. Tate Larkin made a very nice [notebook](https://www.kaggle.com/code/tatelarkin/saving-and-loading-icecube-data-as-tfrecord) showing how to convert the data into TFRecords.

In [1]:
# Import
import numpy as np
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import gc
import tensorflow as tf
import random
from tqdm.notebook import tqdm


In [2]:
# Configure the distribution strategy: try TPU first, otherwise fall back
# to the default strategy (single GPU or CPU)
tpu = None
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.TPUStrategy(tpu)
except ValueError:  # raised by TPUClusterResolver when no TPU is available
    strategy = tf.distribute.get_strategy()
print("Replicas:", strategy.num_replicas_in_sync)

## Constants

In [3]:
# Training
validation_files_amount = 1

# TPU defaults (superseded by the overrides below)
data_new_load_interval = 6      # Local Training: None
train_files_delta = 15          # Local Training: None
epochs = 75                     # Local Training: 30
batch_size = 2048               # Local Training: 2048
learning_rate = 0.0022          # Local Training: 0.0005
verbose = 0

# Overrides: these replace the values above and were used for the training log below
# (1 new file loaded every epoch, 195 batches of 1024)
data_new_load_interval = 1      # Local Training: None
train_files_delta = 1           # Local Training: None
epochs = 80                     # Local Training: 30
batch_size = 1024               # Local Training: 2048
learning_rate = 0.0005          # Local Training: 0.0005

# Training Batches
train_batch_id_min = 100
train_batch_id_max = 200
train_batch_ids = [*range(train_batch_id_min, train_batch_id_max+1)]
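# Note: this shuffle runs before the seed is set in the next cell, so the order is not reproducible
np.random.shuffle(train_batch_ids)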
print(train_batch_ids)

# Model Parameters
pulse_count = 96
feature_count = 6
lstm_units = 192
bin_num = 32

# Data
base_dir = "/kaggle/input/icecubedata/"
file_format = base_dir + 'pp_mpc96_n7_batch_{batch_id:d}.npz'

[116, 130, 176, 124, 109, 104, 163, 156, 185, 150, 169, 138, 111, 119, 121, 134, 174, 175, 194, 146, 144, 128, 103, 160, 177, 153, 113, 154, 199, 118, 189, 106, 143, 139, 135, 192, 186, 168, 132, 120, 200, 187, 166, 149, 123, 172, 180, 100, 142, 137, 155, 117, 136, 161, 178, 188, 191, 126, 167, 114, 190, 181, 112, 157, 159, 131, 151, 197, 182, 133, 165, 115, 122, 198, 145, 102, 158, 183, 171, 152, 125, 127, 179, 184, 105, 173, 108, 147, 110, 141, 196, 101, 162, 170, 148, 164, 129, 107, 195, 140, 193]
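The number of batches per epoch follows from the file size and `batch_size`. A small worked check, assuming every preprocessed file holds 200,000 events as the `(200000, 96, 6)` shapes in the log show; `batches_per_epoch` is a hypothetical helper that uses the same floor division as the training loop, so the tail of each epoch's shuffled data is skipped.

```python
n_events_per_file = 200_000  # assumed from the (200000, 96, 6) shapes in the log

def batches_per_epoch(n_files, batch_size):
    n_samples = n_files * n_events_per_file
    batch_count = n_samples // batch_size           # same formula as the training loop
    dropped = n_samples - batch_count * batch_size  # samples skipped in that epoch
    return batch_count, dropped

print(len(range(100, 201)))         # 101 batch IDs: 1 for validation, 100 for training
print(batches_per_epoch(1, 1024))   # (195, 320): the logged run, matching "0/195"
print(batches_per_epoch(15, 2048))  # (1464, 1728): the TPU defaults
```

Since the data is reshuffled every epoch, a different small tail is skipped each time.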


In [4]:
# Set Seed
seed = 4242
tf.random.set_seed(seed)
random.seed(seed)
np.random.seed(seed)

## Prepare Metric

In [5]:
def angular_dist_score(az_true, zen_true, az_pred, zen_pred):
    '''
    Calculate the MAE of the angular distance between two directions.
    The two vectors are first converted to cartesian unit vectors,
    and then their scalar product is computed, which is equal to
    the cosine of the angle between the two vectors. The inverse
    cosine (arccos) thereof is then the angle between the two input vectors.
    
    Parameters:
    -----------
    
    az_true : float (or array thereof)
        true azimuth value(s) in radian
    zen_true : float (or array thereof)
        true zenith value(s) in radian
    az_pred : float (or array thereof)
        predicted azimuth value(s) in radian
    zen_pred : float (or array thereof)
        predicted zenith value(s) in radian
    
    Returns:
    --------
    
    dist : float
        mean over the angular distance(s) in radian
    '''
    
    if not (np.all(np.isfinite(az_true)) and
            np.all(np.isfinite(zen_true)) and
            np.all(np.isfinite(az_pred)) and
            np.all(np.isfinite(zen_pred))):
        raise ValueError("All arguments must be finite")
    
    # pre-compute all sine and cosine values
    sa1 = np.sin(az_true)
    ca1 = np.cos(az_true)
    sz1 = np.sin(zen_true)
    cz1 = np.cos(zen_true)
    
    sa2 = np.sin(az_pred)
    ca2 = np.cos(az_pred)
    sz2 = np.sin(zen_pred)
    cz2 = np.cos(zen_pred)
    
    # scalar product of the two cartesian vectors (x = sz*ca, y = sz*sa, z = cz)
    scalar_prod = sz1*sz2*(ca1*ca2 + sa1*sa2) + (cz1*cz2)
    
    # the scalar product of two unit vectors is always between -1 and 1; clipping guards against numerical
    # instability that might otherwise occur from the finite precision of the sine and cosine functions
    scalar_prod =  np.clip(scalar_prod, -1, 1)
    
    # convert back to an angle (in radian)
    return np.average(np.abs(np.arccos(scalar_prod)))
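A quick sanity check of the metric on hand-picked directions. The compact `angular_dist` below is a hypothetical re-implementation of the same formula, using the identity `ca1*ca2 + sa1*sa2 = cos(az_true - az_pred)`, so it can run on its own.

```python
import numpy as np

def angular_dist(az_true, zen_true, az_pred, zen_pred):
    # unit-vector dot product, written with cos(az_true - az_pred)
    dot = (np.sin(zen_true) * np.sin(zen_pred) * np.cos(az_true - az_pred)
           + np.cos(zen_true) * np.cos(zen_pred))
    return np.mean(np.arccos(np.clip(dot, -1, 1)))

print(angular_dist(1.0, 0.5, 1.0, 0.5))              # ~0: identical directions
print(angular_dist(0.0, 0.0, 0.0, np.pi))            # pi: zenith 0 vs zenith pi
print(angular_dist(0.0, np.pi / 2, np.pi / 2, np.pi / 2))  # pi/2: 90 degrees apart on the equator

# Mean over a batch of the two previous cases: (pi + pi/2) / 2 = 3*pi/4
print(angular_dist(np.array([0.0, 0.0]), np.array([0.0, np.pi / 2]),
                   np.array([0.0, np.pi / 2]), np.array([np.pi, np.pi / 2])))
```

For reference, the expected distance between two independent, uniformly random directions is pi/2 (about 1.571), which is roughly where the first epochs in the log start.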

## Define Azimuth and Zenith Bins

In [6]:
# Create Azimuth Edges
azimuth_edges = np.linspace(0, 2 * np.pi, bin_num + 1)
print(azimuth_edges)

# Create Zenith Edges
zenith_edges = []
zenith_edges.append(0)
for bin_idx in range(1, bin_num):
    zenith_edges.append(np.arccos(np.cos(zenith_edges[-1]) - 2 / (bin_num)))
zenith_edges.append(np.pi)
zenith_edges = np.array(zenith_edges)
print(zenith_edges)

[0.         0.19634954 0.39269908 0.58904862 0.78539816 0.9817477
 1.17809725 1.37444679 1.57079633 1.76714587 1.96349541 2.15984495
 2.35619449 2.55254403 2.74889357 2.94524311 3.14159265 3.33794219
 3.53429174 3.73064128 3.92699082 4.12334036 4.3196899  4.51603944
 4.71238898 4.90873852 5.10508806 5.3014376  5.49778714 5.69413668
 5.89048623 6.08683577 6.28318531]
[0.         0.3554212  0.50536051 0.62236849 0.72273425 0.81275556
 0.89566479 0.97338991 1.04719755 1.11797973 1.18639955 1.25297262
 1.31811607 1.38217994 1.4454685  1.50825556 1.57079633 1.63333709
 1.69612416 1.75941271 1.82347658 1.88862003 1.9551931  2.02361292
 2.0943951  2.16820274 2.24592786 2.32883709 2.41885841 2.51922417
 2.63623214 2.78617145 3.14159265]
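The zenith loop steps `cos(theta)` down by `2 / bin_num` each time, so it is equivalent to the closed form `theta_k = arccos(1 - 2k / bin_num)`, and every zenith band covers the same solid angle `4*pi / bin_num`. Combined with equal-width azimuth bins, every one of the `bin_num**2` cells has the same area. A short check of both claims:

```python
import numpy as np

bin_num = 32

# Iterative construction, as in the cell above
zenith_iter = [0.0]
for _ in range(1, bin_num):
    zenith_iter.append(np.arccos(np.cos(zenith_iter[-1]) - 2 / bin_num))
zenith_iter.append(np.pi)
zenith_iter = np.array(zenith_iter)

# Closed form: cos(theta_k) = 1 - 2k / bin_num
zenith_closed = np.arccos(1 - 2 * np.arange(bin_num + 1) / bin_num)
print(np.allclose(zenith_iter, zenith_closed))  # True

# Solid angle of each full zenith band: 2*pi * (cos(theta_k) - cos(theta_k+1))
band_area = 2 * np.pi * (np.cos(zenith_closed[:-1]) - np.cos(zenith_closed[1:]))
print(np.allclose(band_area, 4 * np.pi / bin_num))  # True
```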


## Supporting Functions

In [7]:
angle_bin_zenith0 = np.tile(zenith_edges[:-1], bin_num)
angle_bin_zenith1 = np.tile(zenith_edges[1:], bin_num)
angle_bin_azimuth0 = np.repeat(azimuth_edges[:-1], bin_num)
angle_bin_azimuth1 = np.repeat(azimuth_edges[1:], bin_num)

angle_bin_area = (angle_bin_azimuth1 - angle_bin_azimuth0) * (np.cos(angle_bin_zenith0) - np.cos(angle_bin_zenith1))
angle_bin_vector_sum_x = (np.sin(angle_bin_azimuth1) - np.sin(angle_bin_azimuth0)) * ((angle_bin_zenith1 - angle_bin_zenith0) / 2 - (np.sin(2 * angle_bin_zenith1) - np.sin(2 * angle_bin_zenith0)) / 4)
angle_bin_vector_sum_y = (np.cos(angle_bin_azimuth0) - np.cos(angle_bin_azimuth1)) * ((angle_bin_zenith1 - angle_bin_zenith0) / 2 - (np.sin(2 * angle_bin_zenith1) - np.sin(2 * angle_bin_zenith0)) / 4)
angle_bin_vector_sum_z = (angle_bin_azimuth1 - angle_bin_azimuth0) * ((np.cos(2 * angle_bin_zenith0) - np.cos(2 * angle_bin_zenith1)) / 4)

angle_bin_vector_mean_x = angle_bin_vector_sum_x / angle_bin_area
angle_bin_vector_mean_y = angle_bin_vector_sum_y / angle_bin_area
angle_bin_vector_mean_z = angle_bin_vector_sum_z / angle_bin_area

angle_bin_vector = np.zeros((1, bin_num * bin_num, 3))
angle_bin_vector[:, :, 0] = angle_bin_vector_mean_x
angle_bin_vector[:, :, 1] = angle_bin_vector_mean_y
angle_bin_vector[:, :, 2] = angle_bin_vector_mean_z

def pred_to_angle(pred, epsilon=1e-8):
    # convert prediction to vector
    pred_vector = (pred.reshape((-1, bin_num * bin_num, 1)) * angle_bin_vector).sum(axis=1)
    
    # normalize
    pred_vector_norm = np.sqrt((pred_vector**2).sum(axis=1))
    mask = pred_vector_norm < epsilon
    pred_vector_norm[mask] = 1
    
    # assign <1, 0, 0> to very small vectors (badly predicted)
    pred_vector /= pred_vector_norm.reshape((-1, 1))
    pred_vector[mask] = np.array([1., 0., 0.])
    
    # convert to angle
    azimuth = np.arctan2(pred_vector[:, 1], pred_vector[:, 0])
    azimuth[azimuth < 0] += 2 * np.pi
    zenith = np.arccos(pred_vector[:, 2])
    
    return azimuth, zenith

def y_to_angle_code(batch_y):
    azimuth_code = (batch_y[:, 0] > azimuth_edges[1:].reshape((-1, 1))).sum(axis=0)
    zenith_code = (batch_y[:, 1] > zenith_edges[1:].reshape((-1, 1))).sum(axis=0)
    angle_code = bin_num * azimuth_code + zenith_code
    
    return angle_code
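The layout above relies on `np.repeat` for azimuth and `np.tile` for zenith, so class `c` is azimuth bin `c // bin_num` and zenith bin `c % bin_num`, the same order `y_to_angle_code` produces. The sketch below checks this round trip on a small 4x4 grid (an illustrative size; the notebook uses 32): it encodes a few hand-picked directions, decodes a one-hot "prediction" for each code via the closed-form mean vectors, and verifies the decoded direction lands in the same cell. `pred_to_angle` does the general case, a softmax-weighted sum of these vectors; here `np.mod` plays the role of its `azimuth[azimuth < 0] += 2 * np.pi` line.

```python
import numpy as np

bin_num = 4  # small grid for illustration

azimuth_edges = np.linspace(0, 2 * np.pi, bin_num + 1)
zenith_edges = np.arccos(1 - 2 * np.arange(bin_num + 1) / bin_num)  # equal-area edges

# Same cell layout as the notebook: azimuth repeated, zenith tiled
az0 = np.repeat(azimuth_edges[:-1], bin_num)
az1 = np.repeat(azimuth_edges[1:], bin_num)
ze0 = np.tile(zenith_edges[:-1], bin_num)
ze1 = np.tile(zenith_edges[1:], bin_num)

# Closed-form mean unit vector of each cell (same integrals as above)
area = (az1 - az0) * (np.cos(ze0) - np.cos(ze1))
zen_int = (ze1 - ze0) / 2 - (np.sin(2 * ze1) - np.sin(2 * ze0)) / 4
mean_x = (np.sin(az1) - np.sin(az0)) * zen_int / area
mean_y = (np.cos(az0) - np.cos(az1)) * zen_int / area
mean_z = (az1 - az0) * (np.cos(2 * ze0) - np.cos(2 * ze1)) / 4 / area
bin_vectors = np.stack([mean_x, mean_y, mean_z], axis=1)

def y_to_angle_code(batch_y):
    az_code = (batch_y[:, 0] > azimuth_edges[1:].reshape((-1, 1))).sum(axis=0)
    zen_code = (batch_y[:, 1] > zenith_edges[1:].reshape((-1, 1))).sum(axis=0)
    return bin_num * az_code + zen_code

# Encode a few (azimuth, zenith) pairs in radians
y = np.array([[0.3, 0.4], [2.0, 1.6], [5.5, 2.9]])
codes = y_to_angle_code(y)

# Decode a one-hot prediction for each code: normalize the cell's mean vector
v = bin_vectors[codes]
v /= np.linalg.norm(v, axis=1, keepdims=True)
az_dec = np.mod(np.arctan2(v[:, 1], v[:, 0]), 2 * np.pi)
zen_dec = np.arccos(v[:, 2])

decoded_codes = y_to_angle_code(np.stack([az_dec, zen_dec], axis=1))
print(codes.tolist(), decoded_codes.tolist())  # [0, 6, 15] [0, 6, 15]
```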

## Data Loading

In [8]:
def normalize_data(data):
    data[:, :, 0] /= 1000   # time
    data[:, :, 1] /= 300    # charge
    data[:, :, 3:] /= 600   # space
    
    return data

def prep_validation_data(validation_files_amount):
    print("Processing Validation Data...")

    # Prepare fixed Validation Set
    val_x = None
    val_y = None
    
    # Summary
    print(train_batch_ids[:validation_files_amount])

    # Loop
    for batch_id in tqdm(train_batch_ids[:validation_files_amount]):
        val_data_file = np.load(file_format.format(batch_id = batch_id))

        if val_x is None:
            val_x = val_data_file["x"][:, :, [0,1,2,3,4,5]]
            val_y = val_data_file["y"]
        else:
            val_x = np.append(val_x, val_data_file["x"][:, :, [0,1,2,3,4,5]], axis = 0)
            val_y = np.append(val_y, val_data_file["y"], axis = 0)

        val_data_file.close()
        del val_data_file
        _ = gc.collect()

    # Normalize Data
    val_x = normalize_data(val_x)

    # Shape Summary
    print(val_x.shape)
    
    return val_x, val_y

def prep_training_data(start_batch):
    print("Processing Training Data...")
    
    # Placeholders
    train_x = None
    train_y = None
    
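    # Summary: pick a random subset of the non-validation files
    # (all of them when train_files_delta is None, i.e. local training)
    candidate_ids = train_batch_ids[start_batch:]
    n_files = len(candidate_ids) if train_files_delta is None else train_files_delta
    train_ids = random.sample(candidate_ids, n_files)
    print(train_ids)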
    
    # Loop
    for batch_id in tqdm(train_ids):
        train_data_file = np.load(file_format.format(batch_id = batch_id))

        if train_x is None:
            train_x = train_data_file["x"][:, :, [0,1,2,3,4,5]]
            train_y = train_data_file["y"]
        else:
            train_x = np.append(train_x, train_data_file["x"][:, :, [0,1,2,3,4,5]], axis = 0)
            train_y = np.append(train_y, train_data_file["y"], axis = 0)

        train_data_file.close()
        del train_data_file
        _ = gc.collect()

    # Normalize data
    train_x = normalize_data(train_x)
    
    # Shape Summary
    print(train_x.shape)
    
    # Output Encoding
    trn_y_anglecode = y_to_angle_code(train_y)
        
    return train_x, trn_y_anglecode
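`np.append` inside the loop copies everything loaded so far on every iteration. A common alternative is to collect the per-file arrays in a list and concatenate once at the end. `load_npz_files` is a hypothetical helper; the tiny synthetic files stand in for `pp_mpc96_n7_batch_{id}.npz`, and the assumption that they store 7 features per pulse (of which the first 6 are kept) is only inferred from the `n7` in the file name.

```python
import os
import tempfile
import numpy as np

def load_npz_files(paths, feature_idx=(0, 1, 2, 3, 4, 5)):
    """Hypothetical helper: read 'x'/'y' from several .npz files and concatenate once."""
    xs, ys = [], []
    for path in paths:
        with np.load(path) as f:  # NpzFile closes the archive on exit
            xs.append(f["x"][:, :, list(feature_idx)])
            ys.append(f["y"])
    return np.concatenate(xs, axis=0), np.concatenate(ys, axis=0)

# Three tiny synthetic files (10 events each, 96 pulses, 7 stored features)
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    path = os.path.join(tmp_dir, f"batch_{i}.npz")
    np.savez(path, x=np.full((10, 96, 7), i, dtype=np.float32), y=np.zeros((10, 2)))
    paths.append(path)

x, y = load_npz_files(paths)
print(x.shape, y.shape)  # (30, 96, 6) (30, 2)
```

This copies each file's data once instead of once per remaining file; `normalize_data` can then be applied to the result unchanged.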

## Model

In [9]:
def create_model():
    with strategy.scope(): 
        inputs = tf.keras.layers.Input((pulse_count, feature_count))
        
        x = tf.keras.layers.Masking(mask_value = 0., input_shape = (pulse_count, feature_count))(inputs)
        x = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(lstm_units, return_sequences = True, dropout=0.2))(x)
        x = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(lstm_units, return_sequences = True))(x)
        x = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(lstm_units, dropout=0.2))(x)        
        x = tf.keras.layers.Dense(256, activation = 'relu')(x)
        
        outputs = tf.keras.layers.Dense(bin_num**2, activation = 'softmax')(x)

        # Finalize Model
        model = tf.keras.models.Model(inputs = inputs, outputs = outputs)

        # Compile model
        model.compile(loss = 'sparse_categorical_crossentropy',
                      optimizer= tf.keras.optimizers.Adam(learning_rate = learning_rate),
                      metrics = ['accuracy'])
        
        # Show Model Summary
        model.summary()

        return model
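The model summary in the training cell's output is cut off after the third GRU block. The parameter counts can be recomputed by hand, assuming Keras's default `reset_after=True` for GRU (each of the 3 gates has an input kernel, a recurrent kernel and two bias vectors), which reproduces the 230,400 and 665,856 shown there. The last lines illustrate the `Masking` rule: a timestep is skipped only when all of its features equal `mask_value`. The helper names are hypothetical.

```python
import numpy as np

def gru_params(input_dim, units, reset_after=True):
    # 3 gates x (input kernel + recurrent kernel + bias); reset_after keeps 2 bias vectors
    n_bias = 2 * units if reset_after else units
    return 3 * (input_dim * units + units * units + n_bias)

def bidirectional(params):
    return 2 * params  # forward and backward copies of the wrapped layer

lstm_units, feature_count, bin_num = 192, 6, 32
layers = {
    "bidirectional GRU 1": bidirectional(gru_params(feature_count, lstm_units)),
    "bidirectional GRU 2": bidirectional(gru_params(2 * lstm_units, lstm_units)),
    "bidirectional GRU 3": bidirectional(gru_params(2 * lstm_units, lstm_units)),
    "dense 256": 2 * lstm_units * 256 + 256,
    "softmax output": 256 * bin_num**2 + bin_num**2,
}
for name, count in layers.items():
    print(f"{name:20s} {count:>9,d}")
print("total", sum(layers.values()))  # 1923840

# Masking: a pulse is masked only if all 6 features equal mask_value (0.0)
x = np.array([[[0.5, 0.1, 0.0, 0.2, 0.3, 0.4],
               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
               [0.0, 0.2, 0.0, 0.0, 0.0, 0.0]]])
mask = np.any(x != 0.0, axis=-1)  # True = kept, False = skipped by the GRUs
print(mask.tolist())  # [[True, False, True]]
```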

## Train Model

In [10]:
# Get Fixed Validation Dataset
val_x, val_y = prep_validation_data(validation_files_amount)

# Create Model
model = create_model()

# Outside Kaggle (with enough RAM): load all training data once, before the epoch loop
if data_new_load_interval is None and train_files_delta is None:
    print('\nLoading All Train Data')
    start_batch = validation_files_amount
    trn_x, trn_y_anglecode = prep_training_data(start_batch)

# Epoch Loop
for e in range(epochs):
    print(f'=========== EPOCH: {e}')
    
    # With limited RAM (Kaggle, Colab): load a new random subset of training files every data_new_load_interval epochs
    if data_new_load_interval is not None and train_files_delta is not None and e % data_new_load_interval == 0:
        print(f'\nLoading Train Data at epoch: {e}')
        trn_x, trn_y_anglecode = prep_training_data(validation_files_amount)
    
    # Number of batches
    batch_count = trn_x.shape[0] // batch_size

    # Random Shuffle each epoch
    indices = np.arange(trn_x.shape[0])
    np.random.shuffle(indices)
    trn_x = trn_x[indices]
    trn_y_anglecode = trn_y_anglecode[indices]
        
    # Placeholder
    losses = []
    accuracy = []
        
    # Batch Loop
    for batch_index in tqdm(range(batch_count), total = batch_count):
        b_train_x = trn_x[batch_index * batch_size: batch_index * batch_size + batch_size,:]
        b_train_y = trn_y_anglecode[batch_index * batch_size: batch_index * batch_size + batch_size]
        
        metrics = model.train_on_batch(b_train_x, b_train_y)
        losses.append(metrics[0])
        accuracy.append(metrics[1])  
    
    # Save Model
    model.save(f'/kaggle/input/icecubemodels/tf-lstm/tpu_pp96_n{feature_count}_bin{bin_num}_batch{batch_size}_epoch{e}.h5')

    # Metrics
    valid_pred = model.predict(val_x, batch_size = batch_size, verbose = verbose)    
    valid_pred_azimuth, valid_pred_zenith = pred_to_angle(valid_pred)
    mae = angular_dist_score(val_y[:, 0], val_y[:, 1], valid_pred_azimuth, valid_pred_zenith)    
    print(f'Total Train Loss: {np.mean(losses):.4f}   Accuracy: {np.mean(accuracy):.4f}  MAE: {mae:.5f}')  
        
    # Memory Cleanup
    gc.collect()

Processing Validation Data...
[116]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 96, 6)]           0         
                                                                 
 masking (Masking)           (None, 96, 6)             0         
                                                                 
 bidirectional (Bidirectiona  (None, 96, 384)          230400    
 l)                                                              
                                                                 
 bidirectional_1 (Bidirectio  (None, 96, 384)          665856    
 nal)                                                            
                                                                 
 bidirectional_2 (Bidirectio  (None, 384)              665856    
 nal)                                                            
                                             

  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]


Total Train Loss: 6.9317   Accuracy: 0.0010  MAE: 1.53220

Loading Train Data at epoch: 1
Processing Training Data...
[128]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.9289   Accuracy: 0.0012  MAE: 1.53048

Loading Train Data at epoch: 2
Processing Training Data...
[144]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.9260   Accuracy: 0.0011  MAE: 1.52130

Loading Train Data at epoch: 3
Processing Training Data...
[145]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.9216   Accuracy: 0.0013  MAE: 1.48682

Loading Train Data at epoch: 4
Processing Training Data...
[195]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.9099   Accuracy: 0.0013  MAE: 1.41267

Loading Train Data at epoch: 5
Processing Training Data...
[147]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.8339   Accuracy: 0.0029  MAE: 1.29784

Loading Train Data at epoch: 6
Processing Training Data...
[181]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.7207   Accuracy: 0.0054  MAE: 1.23558

Loading Train Data at epoch: 7
Processing Training Data...
[174]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.6236   Accuracy: 0.0078  MAE: 1.20920

Loading Train Data at epoch: 8
Processing Training Data...
[109]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.5502   Accuracy: 0.0110  MAE: 1.18556

Loading Train Data at epoch: 9
Processing Training Data...
[175]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.5015   Accuracy: 0.0131  MAE: 1.17864

Loading Train Data at epoch: 10
Processing Training Data...
[131]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.4545   Accuracy: 0.0154  MAE: 1.17157

Loading Train Data at epoch: 11
Processing Training Data...
[146]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.4187   Accuracy: 0.0176  MAE: 1.16195

Loading Train Data at epoch: 12
Processing Training Data...
[177]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.3868   Accuracy: 0.0197  MAE: 1.15801

Loading Train Data at epoch: 13
Processing Training Data...
[123]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.3595   Accuracy: 0.0215  MAE: 1.15601

Loading Train Data at epoch: 14
Processing Training Data...
[174]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.3348   Accuracy: 0.0231  MAE: 1.15160

Loading Train Data at epoch: 15
Processing Training Data...
[103]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.3164   Accuracy: 0.0242  MAE: 1.14553

Loading Train Data at epoch: 16
Processing Training Data...
[162]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.2882   Accuracy: 0.0267  MAE: 1.14405

Loading Train Data at epoch: 17
Processing Training Data...
[106]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.2791   Accuracy: 0.0277  MAE: 1.14119

Loading Train Data at epoch: 18
Processing Training Data...
[141]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.2517   Accuracy: 0.0297  MAE: 1.13928

Loading Train Data at epoch: 19
Processing Training Data...
[198]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.2341   Accuracy: 0.0316  MAE: 1.13830

Loading Train Data at epoch: 20
Processing Training Data...
[108]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.2276   Accuracy: 0.0320  MAE: 1.13591

Loading Train Data at epoch: 21
Processing Training Data...
[127]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.2146   Accuracy: 0.0334  MAE: 1.13555

Loading Train Data at epoch: 22
Processing Training Data...
[189]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1989   Accuracy: 0.0350  MAE: 1.13394

Loading Train Data at epoch: 23
Processing Training Data...
[167]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1814   Accuracy: 0.0363  MAE: 1.13289

Loading Train Data at epoch: 24
Processing Training Data...
[185]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1739   Accuracy: 0.0373  MAE: 1.13016

Loading Train Data at epoch: 25
Processing Training Data...
[186]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1571   Accuracy: 0.0379  MAE: 1.12993

Loading Train Data at epoch: 26
Processing Training Data...
[123]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1491   Accuracy: 0.0392  MAE: 1.12925

Loading Train Data at epoch: 27
Processing Training Data...
[130]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1542   Accuracy: 0.0395  MAE: 1.12300

Loading Train Data at epoch: 28
Processing Training Data...
[187]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1288   Accuracy: 0.0422  MAE: 1.12434

Loading Train Data at epoch: 29
Processing Training Data...
[103]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1256   Accuracy: 0.0416  MAE: 1.12196

Loading Train Data at epoch: 30
Processing Training Data...
[197]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1286   Accuracy: 0.0421  MAE: 1.12041

Loading Train Data at epoch: 31
Processing Training Data...
[123]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0977   Accuracy: 0.0447  MAE: 1.11866

Loading Train Data at epoch: 32
Processing Training Data...
[147]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1049   Accuracy: 0.0451  MAE: 1.11931

Loading Train Data at epoch: 33
Processing Training Data...
[180]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.1005   Accuracy: 0.0458  MAE: 1.11923

Loading Train Data at epoch: 34
Processing Training Data...
[171]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0957   Accuracy: 0.0458  MAE: 1.11981

Loading Train Data at epoch: 35
Processing Training Data...
[117]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0897   Accuracy: 0.0464  MAE: 1.11784

Loading Train Data at epoch: 36
Processing Training Data...
[136]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0763   Accuracy: 0.0474  MAE: 1.11356

Loading Train Data at epoch: 37
Processing Training Data...
[166]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0683   Accuracy: 0.0483  MAE: 1.12279

Loading Train Data at epoch: 38
Processing Training Data...
[164]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0693   Accuracy: 0.0483  MAE: 1.11789

Loading Train Data at epoch: 39
Processing Training Data...
[200]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0655   Accuracy: 0.0482  MAE: 1.11494

Loading Train Data at epoch: 40
Processing Training Data...
[103]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0530   Accuracy: 0.0500  MAE: 1.11121

Loading Train Data at epoch: 41
Processing Training Data...
[172]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0534   Accuracy: 0.0504  MAE: 1.11337

Loading Train Data at epoch: 42
Processing Training Data...
[183]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0544   Accuracy: 0.0504  MAE: 1.11503

Loading Train Data at epoch: 43
Processing Training Data...
[122]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0502   Accuracy: 0.0502  MAE: 1.11030

Loading Train Data at epoch: 44
Processing Training Data...
[157]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0365   Accuracy: 0.0524  MAE: 1.10516

Loading Train Data at epoch: 45
Processing Training Data...
[192]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0334   Accuracy: 0.0520  MAE: 1.10885

Loading Train Data at epoch: 46
Processing Training Data...
[126]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0348   Accuracy: 0.0532  MAE: 1.10713

Loading Train Data at epoch: 47
Processing Training Data...
[138]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0276   Accuracy: 0.0533  MAE: 1.10531

Loading Train Data at epoch: 48
Processing Training Data...
[170]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0154   Accuracy: 0.0530  MAE: 1.10662

Loading Train Data at epoch: 49
Processing Training Data...
[112]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0166   Accuracy: 0.0538  MAE: 1.10561

Loading Train Data at epoch: 50
Processing Training Data...
[139]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0194   Accuracy: 0.0552  MAE: 1.10541

Loading Train Data at epoch: 51
Processing Training Data...
[103]


  0%|          | 0/1 [00:00<?, ?it/s]

(200000, 96, 6)


  0%|          | 0/195 [00:00<?, ?it/s]

Total Train Loss: 6.0046   Accuracy: 0.0561  MAE: 1.10322

Loading Train Data at epoch: 52
Processing Training Data...
[152]
(200000, 96, 6)

Total Train Loss: 6.0037   Accuracy: 0.0549  MAE: 1.10743

Loading Train Data at epoch: 53
Processing Training Data...
[103]
(200000, 96, 6)

Total Train Loss: 5.9778   Accuracy: 0.0581  MAE: 1.10855

Loading Train Data at epoch: 54
Processing Training Data...
[168]
(200000, 96, 6)

Total Train Loss: 6.0074   Accuracy: 0.0559  MAE: 1.10355

Loading Train Data at epoch: 55
Processing Training Data...
[112]
(200000, 96, 6)

Total Train Loss: 5.9894   Accuracy: 0.0570  MAE: 1.10171

Loading Train Data at epoch: 56
Processing Training Data...
[186]
(200000, 96, 6)

Total Train Loss: 5.9887   Accuracy: 0.0568  MAE: 1.10224

Loading Train Data at epoch: 57
Processing Training Data...
[138]
(200000, 96, 6)

Total Train Loss: 5.9854   Accuracy: 0.0572  MAE: 1.10063

Loading Train Data at epoch: 58
Processing Training Data...
[119]
(200000, 96, 6)

Total Train Loss: 5.9892   Accuracy: 0.0566  MAE: 1.09903

Loading Train Data at epoch: 59
Processing Training Data...
[148]
(200000, 96, 6)

Total Train Loss: 5.9945   Accuracy: 0.0569  MAE: 1.09760

Loading Train Data at epoch: 60
Processing Training Data...
[137]
(200000, 96, 6)

Total Train Loss: 5.9895   Accuracy: 0.0577  MAE: 1.09944

Loading Train Data at epoch: 61
Processing Training Data...
[191]
(200000, 96, 6)

Total Train Loss: 5.9830   Accuracy: 0.0583  MAE: 1.09853

Loading Train Data at epoch: 62
Processing Training Data...
[108]
(200000, 96, 6)

Total Train Loss: 5.9820   Accuracy: 0.0586  MAE: 1.09648

Loading Train Data at epoch: 63
Processing Training Data...
[190]
(200000, 96, 6)

Total Train Loss: 5.9736   Accuracy: 0.0603  MAE: 1.09962

Loading Train Data at epoch: 64
Processing Training Data...
[181]
(200000, 96, 6)

Total Train Loss: 5.9792   Accuracy: 0.0585  MAE: 1.09925

Loading Train Data at epoch: 65
Processing Training Data...
[189]
(200000, 96, 6)

Total Train Loss: 5.9697   Accuracy: 0.0585  MAE: 1.09782

Loading Train Data at epoch: 66
Processing Training Data...
[159]
(200000, 96, 6)

Total Train Loss: 5.9611   Accuracy: 0.0608  MAE: 1.09669

Loading Train Data at epoch: 67
Processing Training Data...
[118]
(200000, 96, 6)

Total Train Loss: 5.9629   Accuracy: 0.0601  MAE: 1.09256

Loading Train Data at epoch: 68
Processing Training Data...
[121]
(200000, 96, 6)

Total Train Loss: 5.9597   Accuracy: 0.0607  MAE: 1.09746

Loading Train Data at epoch: 69
Processing Training Data...
[125]
(200000, 96, 6)

Total Train Loss: 5.9602   Accuracy: 0.0607  MAE: 1.09852

Loading Train Data at epoch: 70
Processing Training Data...
[167]
(200000, 96, 6)

Total Train Loss: 5.9511   Accuracy: 0.0619  MAE: 1.09597

Loading Train Data at epoch: 71
Processing Training Data...
[178]
(200000, 96, 6)

Total Train Loss: 5.9602   Accuracy: 0.0612  MAE: 1.09300

Loading Train Data at epoch: 72
Processing Training Data...
[165]
(200000, 96, 6)

Total Train Loss: 5.9496   Accuracy: 0.0624  MAE: 1.09273

Loading Train Data at epoch: 73
Processing Training Data...
[178]
(200000, 96, 6)

Total Train Loss: 5.9338   Accuracy: 0.0635  MAE: 1.09489

Loading Train Data at epoch: 74
Processing Training Data...
[118]
(200000, 96, 6)

Total Train Loss: 5.9372   Accuracy: 0.0633  MAE: 1.08907

Loading Train Data at epoch: 75
Processing Training Data...
[143]
(200000, 96, 6)

Total Train Loss: 5.9496   Accuracy: 0.0627  MAE: 1.08931

Loading Train Data at epoch: 76
Processing Training Data...
[100]
(200000, 96, 6)

Total Train Loss: 5.9361   Accuracy: 0.0633  MAE: 1.08943

Loading Train Data at epoch: 77
Processing Training Data...
[162]
(200000, 96, 6)

Total Train Loss: 5.9354   Accuracy: 0.0633  MAE: 1.08924

Loading Train Data at epoch: 78
Processing Training Data...
[199]
(200000, 96, 6)

Total Train Loss: 5.9429   Accuracy: 0.0632  MAE: 1.09404

Loading Train Data at epoch: 79
Processing Training Data...
[182]
(200000, 96, 6)

Total Train Loss: 5.9379   Accuracy: 0.0626  MAE: 1.09334
