<a href="https://colab.research.google.com/github/juansolana/SincNet_MLP/blob/master/LANL_Earthquake_Prediction_with_Cloud_GPU.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LANL Earthquake Prediction with Cloud GPU
<table class="tfo-notebook-buttons" align="left" >
 <td>
    <a target="_blank" href="https://www.kaggle.com/c/LANL-Earthquake-Prediction"><img src="https://www.kaggle.com/static/images/site-logo.png" width='82' />View competition</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/icewing1996/SincNet_MLP"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>


**Firstly**, log in to Google so that you can access the data on Google Cloud

In [0]:
from google.colab import auth
import math
import os
auth.authenticate_user()

**Secondly**, download code from GitHub

In [2]:
!rm -rf SincNet_MLP
!git clone https://github.com/juansolana/SincNet_MLP
os.chdir('SincNet_MLP')

Cloning into 'SincNet_MLP'...
remote: Enumerating objects: 29, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 397 (delta 16), reused 0 (delta 0), pack-reused 368
Receiving objects: 100% (397/397), 185.70 KiB | 5.80 MiB/s, done.
Resolving deltas: 100% (255/255), done.


**Thirdly**, specify the dataset to use and download it from Google Cloud:


*   **Raw audio** with each 150,000 datapoint segment downsampled to N datapoints. 
   * **4000**: N = 4000, using last point as target
   * **4000mid**: N = 4000, using mid point as target
   * **40000**: N = 40000, using last point as target
   * **40000_mid**: N = 40000, using mid point as target
   
* **Handcrafted features** with each 150,000 datapoint segment split into 150 mini-segments (statistics calculated over each 1,000 points).
   * **features_last**: mean, min, max, std, 4 quantiles, using last point as target
   * **features_mid**: same as above, using mid point as target
   * **more_feautres_mid**: in addition to above, kurtosis, variance, skew, median, mad, abs_mean, abs_std, using mid point as target
   * **sliding_features_mid**: same as **features_mid**, except that a sliding window moving 3750 datapoints at a time is used on the train and dev sets (40x data augmentation); train+dev shape is (167733, 150, 8). ***Results can NOT be compared directly with the other datasets.***
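
As a rough illustration of the handcrafted features described above, the sketch below turns one 150,000-point segment into a (150, 8) array of per-mini-segment statistics. The function name `segment_features` and the quantile levels are assumptions; the notebook only says "4 quantiles", and the actual preprocessing code may differ.

```python
import numpy as np

def segment_features(segment, n_chunks=150):
    """Summarise a 1-D signal as per-chunk statistics, shape (n_chunks, 8)."""
    # Split the segment into equal mini-segments (1,000 points each for 150,000 points).
    chunks = np.asarray(segment, dtype=np.float64).reshape(n_chunks, -1)
    basic = np.stack(
        [chunks.mean(axis=1), chunks.min(axis=1), chunks.max(axis=1), chunks.std(axis=1)],
        axis=1,
    )
    # Assumed quantile levels; the notebook does not say which four are used.
    quantiles = np.quantile(chunks, [0.01, 0.05, 0.95, 0.99], axis=1).T
    return np.concatenate([basic, quantiles], axis=1)

rng = np.random.default_rng(0)
demo_segment = rng.normal(size=150_000)  # synthetic stand-in for one acoustic segment
demo_features = segment_features(demo_segment)
print(demo_features.shape)  # (150, 8)
```

Each row then plays the role of one time step for the **features** models, with 8 input features per step.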


In [3]:
DATASET = "more_feautres_mid" #@param ['4000', '4000mid', '40000', '40000_mid', 'features_last', 'features_mid', 'more_feautres_mid', 'sliding_features_mid', 'sliding_more_features_mid', 'sliding_more_features_last']
#DDATASET = "features_mid"
!gsutil cp gs://edinquake/prepared_data/$DATASET/train_signals prepared_data/train_signals
!gsutil cp gs://edinquake/prepared_data/$DATASET/train_labels prepared_data/train_labels
!gsutil cp gs://edinquake/prepared_data/$DATASET/dev_signals prepared_data/dev_signals
!gsutil cp gs://edinquake/prepared_data/$DATASET/dev_labels prepared_data/dev_labels
!gsutil cp gs://edinquake/prepared_data/$DATASET/test_signals prepared_data/test_signals
!gsutil cp gs://edinquake/prepared_data/$DATASET/test_labels prepared_data/test_labels

Copying gs://edinquake/prepared_data/more_feautres_mid/train_signals...
\
Operation completed over 1 objects/64.8 MiB.                                     
Copying gs://edinquake/prepared_data/more_feautres_mid/train_labels...
/ [1 files][ 29.6 KiB/ 29.6 KiB]                                                
Operation completed over 1 objects/29.6 KiB.                                     
Copying gs://edinquake/prepared_data/more_feautres_mid/dev_signals...
/ [1 files][  7.2 MiB/  7.2 MiB]                                                
Operation completed over 1 objects/7.2 MiB.                                      
Copying gs://edinquake/prepared_data/more_feautres_mid/dev_labels...
/ [1 files][  3.4 KiB/  3.4 KiB]                                                
Operation completed over 1 objects/3.4 KiB.                                      
Copying gs://edinquake/prepared_data/more_feautres_mid/test_signals...
- [1 files][ 45.0 MiB/ 45.0 MiB]                                          

**Fun times!**


1.   Pick a model:
   *   **raw** models use raw audio data; **features** models use handcrafted features
   *  It only makes sense for [SincNet](https://arxiv.org/abs/1808.00158) to use raw audio data
   *  [Transformer](https://arxiv.org/abs/1706.03762) runs out of memory on **raw** data (expected, since its memory cost is quadratic in the length of the input)

2.   Specify hyperparameters (change only the relevant models, but make sure to ***run all cells***)
3.   Profit
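
To give a feel for the quadratic memory cost mentioned above, here is a back-of-the-envelope sketch. It assumes float32 values, 16 heads (the default used further down), and that the full attention-weight matrix is materialised for every head; the helper name is hypothetical, and real usage also depends on batch size, layer count, and the implementation.

```python
def attention_matrix_mib(seq_len, num_heads=16, bytes_per_value=4):
    """Memory for one layer's attention weights for a single example, in MiB."""
    return num_heads * seq_len * seq_len * bytes_per_value / 2**20

# 150 steps for the feature datasets, 4000 or 40000 steps for raw audio.
for seq_len in (150, 4000, 40000):
    print(seq_len, round(attention_matrix_mib(seq_len), 1), "MiB")
```

Going from 4000 to 40000 input points multiplies this term by 100, which is why the **raw** datasets are a poor fit for the Transformer.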

**WLEN** must equal number of input features:
   *  8 for **features_mid**, **features_last**, and **sliding_features_mid**
   * 15 for **more_feautres_mid**
   * 4000 for **4000** and **4000mid**
   * 40000 for **40000** and **40000_mid**
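
The mapping above can be written down as a small lookup so that **WLEN** is derived from **DATASET** rather than typed by hand. This is only a sketch: `WLEN_BY_DATASET` and `expected_wlen` are hypothetical names, and only the datasets listed above are covered.

```python
WLEN_BY_DATASET = {
    "features_mid": 8,
    "features_last": 8,
    "sliding_features_mid": 8,
    "more_feautres_mid": 15,  # spelling matches the dataset name used in this notebook
    "4000": 4000,
    "4000mid": 4000,
    "40000": 40000,
    "40000_mid": 40000,
}

def expected_wlen(dataset):
    """Return the WLEN listed for a dataset, or raise if it is not listed."""
    try:
        return WLEN_BY_DATASET[dataset]
    except KeyError:
        raise ValueError("no WLEN listed for dataset {!r}".format(dataset))

print(expected_wlen("more_feautres_mid"))
```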

**BATCH_NORM** and **LAY_NORM** ***cannot*** be applied in the same layer at the same time

**Activation functions** can be one of **softplus, relu, tanh, sigmoid, leaky_relu, elu, softmax, linear**
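
The two rules above can be checked before training starts. The sketch below assumes the comma-separated string format used in the cells that follow; `check_layer_settings` is a hypothetical helper and is not part of the repository.

```python
ALLOWED_ACTIVATIONS = {
    "softplus", "relu", "tanh", "sigmoid", "leaky_relu", "elu", "softmax", "linear",
}

def parse_flags(text):
    """Turn 'True,False,True' into [True, False, True]."""
    return [item == "True" for item in text.split(",")]

def check_layer_settings(use_batchnorm, use_laynorm, activations):
    """Return a list of human-readable problems (empty if the settings look fine)."""
    problems = []
    flags = zip(parse_flags(use_batchnorm), parse_flags(use_laynorm))
    for i, (batch_norm, lay_norm) in enumerate(flags):
        if batch_norm and lay_norm:
            problems.append("layer {}: batchnorm and laynorm both enabled".format(i))
    for i, name in enumerate(activations.split(",")):
        if name not in ALLOWED_ACTIVATIONS:
            problems.append("layer {}: unknown activation {!r}".format(i, name))
    return problems

print(check_layer_settings("True,True,True", "False,False,False",
                           "leaky_relu,leaky_relu,leaky_relu"))
```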

In [0]:
MODEL = "Transformer_features" #@param ["Transformer_features", "LSTM_raw", "LSTM_features", "CNN_raw", "CNN_features", "SincNet_raw"]
WLEN="15" #@param [8, 15, 4000, 40000]


**Transformer** hyperparameters:
*   **TRANSFORMER_EMBED_DIM**:
   *  input dimension to transformer
   *  must equal last dimension of **DNN_before** if using **DNN_before** 
   *  must equal number of input features otherwise
      *  8 for **features_mid**, **features_last**, and **sliding_features_mid**
      * 15 for **more_feautres_mid**
*   **TRANSFORMER_HIDDEN_SIZE**:
   *  must equal **TRANSFORMER_EMBED_DIM**
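
The constraints above can be sketched as a quick sanity check. `check_transformer_dims` is a hypothetical helper; the divisibility check on the number of heads is an assumption based on how multi-head attention is usually implemented and is not stated in this notebook.

```python
def check_transformer_dims(embed_dim, hidden_size, num_heads,
                           n_features, fc1_lay_use, fc1_lay):
    """Return a list of problems with the Transformer dimensions (empty if none)."""
    problems = []
    # With DNN_before, the Transformer sees the last MLP layer; otherwise the raw features.
    expected = int(fc1_lay.split(",")[-1]) if fc1_lay_use else n_features
    if embed_dim != expected:
        problems.append("embed_dim {} != expected input dim {}".format(embed_dim, expected))
    if hidden_size != embed_dim:
        problems.append("hidden_size {} != embed_dim {}".format(hidden_size, embed_dim))
    # Assumption: each head gets embed_dim / num_heads dimensions.
    if embed_dim % num_heads != 0:
        problems.append("embed_dim {} not divisible by {} heads".format(embed_dim, num_heads))
    return problems

# Values matching the defaults used in this notebook (15 features, DNN_before of 256).
print(check_transformer_dims(256, 256, 16, 15, True, "256"))
```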



In [0]:
# [transformer]
TRANSFORMER_EMBED_DIM = 256 #@param {type:"number"}
TRANSFORMER_MAX_POSITIONS = 1024
POSITION_EMBEDDING_TYPE = 'learned' #@param ['learned', 'timing']
TRANSFORMER_NUM_LAYERS = 4 #@param {type:"number"}
TRANSFORMER_NUM_HEADS = 16 #@param {type:"number"}
TRANSFORMER_FILTER_SIZE = 256 #@param {type:"number"}
TRANSFORMER_HIDDEN_SIZE = 256 #@param {type:"number"}
TRANSFORMER_DROPOUT = 0.1 #@param {type:"number"}
TRANSFORMER_ATTENTION_DROPOUT = 0.1 #@param {type:"number"}
TRANSFORMER_RELU_DROPOUT = 0.1 #@param {type:"number"}

**LSTM** hyperparameters:

*  **LSTM_EMBED_DIM**:
   *  input dimension to the LSTM
   *  must equal last dimension of **DNN_before** if using **DNN_before** 
   *  must equal number of input features otherwise
      *  8 for **features_mid**, **features_last**, and **sliding_features_mid**
      * 15 for **more_feautres_mid**


In [0]:
# [lstm]
LSTM_EMBED_DIM=256 #@param {type:"number"}
LSTM_HIDDEN_SIZE=256 #@param {type:"number"}
LSTM_NUM_LAYERS=4 #@param {type:"number"}
LSTM_BIDIRECTIONAL='True' #@param ['True', 'False']
LSTM_DROPOUT_IN=0.25 #@param {type:"number"}
LSTM_DROPOUT_OUT=0.25 #@param {type:"number"}

**SincNet/CNN** hyperparameters (see **DNN_before** for how to fill them in)

In [0]:
# [cnn]
CNN_N_FILT='80,60,60' #@param {type:"string"}
CNN_LEN_FILT='25,5,5' #@param {type:"string"}
CNN_MAX_POOL_LEN='3,3,3' #@param {type:"string"}
CNN_USE_LAYNORM_INP='True' #@param ['True', 'False']
CNN_USE_BATCHNORM_INP='False' #@param ['True', 'False']
CNN_USE_LAYNORM='False,False,False' #@param {type:"string"}
CNN_USE_BATCHNORM='True,True,True' #@param {type:"string"}
CNN_ACT='leaky_relu,leaky_relu,leaky_relu' #@param {type:"string"}
CNN_DROP='0.0,0.0,0.0' #@param {type:"string"}

**DNN_before** hyperparameters: 
*   Optionally have an MLP before feeding input to CNN/LSTM/Transformer
    *   **FC1_LAY_USE**: whether to use **DNN_before**
    *   **FC1_LAY**: dimension of each layer, **e.g.** "256,256,1028" ***(note NO whitespace)***
    *   **FC1_DROP**: dropout rate of each layer, **e.g.** "0.0,0.0"
    *   **FC1_USE_LAYNORM_INP**: whether to use layer normalization at input
    *   **FC1_USE_BATCHNORM_INP**: whether to use batch normalization at input
    *   **FC1_USE_BATCHNORM**: whether to use batchnorm at each layer, **e.g.** "True,True,True"
    *   **FC1_USE_LAYNORM**: whether to use laynorm at each layer, **e.g.** "False,False,False"
    *   **FC1_ACT**: activation function of each layer, **e.g.** "leaky_relu,leaky_relu,leaky_relu"
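
A minimal sketch of how these comma-separated strings might be parsed and cross-checked before they are written to the config. `parse_layer_spec` is a hypothetical helper, and the requirement that every field has exactly one entry per layer is an assumption about how `run.py` consumes them.

```python
def parse_layer_spec(lay, drop, use_batchnorm, use_laynorm, act):
    """Parse per-layer settings and check that every field has one entry per layer."""
    raw = {"lay": lay, "drop": drop, "use_batchnorm": use_batchnorm,
           "use_laynorm": use_laynorm, "act": act}
    for name, text in raw.items():
        if any(ch.isspace() for ch in text):
            raise ValueError("{} must not contain whitespace: {!r}".format(name, text))
    parsed = {
        "lay": [int(x) for x in lay.split(",")],
        "drop": [float(x) for x in drop.split(",")],
        "use_batchnorm": [x == "True" for x in use_batchnorm.split(",")],
        "use_laynorm": [x == "True" for x in use_laynorm.split(",")],
        "act": act.split(","),
    }
    lengths = {name: len(values) for name, values in parsed.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError("fields disagree on the number of layers: {}".format(lengths))
    return parsed

spec = parse_layer_spec("10,10", "0.0,0.0", "False,False", "False,False", "relu,relu")
print(spec["lay"], spec["act"])
```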



In [0]:
# [dnn_before]
FC1_LAY_USE='True' #@param ['True', 'False']
FC1_LAY='256' #@param {type: "string"}
FC1_DROP='0.0' #@param {type: "string"}
FC1_USE_LAYNORM_INP='False' #@param ['True', 'False']
FC1_USE_BATCHNORM_INP='False' #@param ['True', 'False']
FC1_USE_BATCHNORM='False' #@param {type: "string"}
FC1_USE_LAYNORM='False' #@param {type: "string"}
FC1_ACT='relu' #@param {type: "string"}

**DNN_after** hyperparameters (same format as **DNN_before**):
*   The MLP after CNN/LSTM/Transformer

In [0]:
# [dnn_after]
FC2_LAY='10,10' #@param {type: "string"}
FC2_DROP='0.0,0.0' #@param {type: "string"}
FC2_USE_LAYNORM_INP='False' #@param ['True', 'False']
FC2_USE_BATCHNORM_INP='False' #@param ['True', 'False']
FC2_USE_BATCHNORM='False,False' #@param {type: "string"}
FC2_USE_LAYNORM='False,False' #@param {type: "string"}
FC2_ACT='relu,relu' #@param {type: "string"}

**Lastly**, save configs and start training!

**PATIENCE**: stop training early if the dev loss has not improved for PATIENCE epochs
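
The early-stopping rule can be sketched as follows. This is a simplified stand-in, not the logic in `run.py`; in particular, counting only strictly lower dev losses as improvements is an assumption, and `epochs_until_stop` is a hypothetical name.

```python
def epochs_until_stop(dev_losses, patience):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(dev_losses):
        if loss < best:
            # New best dev loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Patience never ran out, so training runs through the last epoch.
    return len(dev_losses) - 1

print(epochs_until_stop([5.0, 4.0, 4.1, 4.2, 4.3], patience=3))
```

A larger PATIENCE tolerates longer plateaus at the cost of more epochs spent after the best checkpoint.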

In [10]:
OPTIMIZER = 'AMSGrad' #@param ['AMSGrad', 'AdamW', 'Adam', 'RMSProp']
WEIGHT_DECAY = 0.000 #@param {type:"number"}
LEARNING_RATE = 1e-5 #@param {type:"number"} 
BATCH_SIZE = 64 #@param {type:"number"}
MAX_EPOCH = 1500 #@param {type:"number"}
PATIENCE = 6 #@param {type:"number"}
WHERE_TO_SAVE = 'Transformer_L4_H16-64batchSize-EF' #@param {type:"string"}
RANDOM_SEED = 1234 #@param {type:"number"}


# calculate sampling rate (fs); each 150,000 datapoint segment is 37.5 ms
down_sample_size = 1
if DATASET in ['4000', '4000mid']:
  down_sample_size = 4000
if DATASET in ['40000', '40000_mid']:
  down_sample_size = 40000
fs = math.ceil(down_sample_size/0.0375)

with open('config_file', 'w') as f:
  f.write('[data]\n')  
  f.write('train_src_dir=prepared_data/train_signals\n')
  f.write('train_tgt_dir=prepared_data/train_labels\n')
  f.write('dev_src_dir=prepared_data/dev_signals\n')
  f.write('dev_tgt_dir=prepared_data/dev_labels\n')
  f.write('test_src_dir=prepared_data/test_signals\n')
  f.write('test_tgt_dir=prepared_data/test_labels\n')
  f.write('output_folder=exp/{}/\n'.format(WHERE_TO_SAVE))
  f.write('save_dir=exp/{}/checkpoints/\n'.format(WHERE_TO_SAVE))
  f.write('restore_file=checkpoint_last.pt\n')
  f.write('\n')
  
  f.write('[windowing]\n')
  f.write('fs={}\n'.format(fs))
  f.write('\n')

  f.write('[cnn]\n')
  f.write('wlen={}\n'.format(WLEN))
  f.write('cnn_N_filt={}\n'.format(CNN_N_FILT))
  f.write('cnn_len_filt={}\n'.format(CNN_LEN_FILT))
  f.write('cnn_max_pool_len={}\n'.format(CNN_MAX_POOL_LEN))
  f.write('cnn_use_laynorm_inp={}\n'.format(CNN_USE_LAYNORM_INP))
  f.write('cnn_use_batchnorm_inp={}\n'.format(CNN_USE_BATCHNORM_INP))
  f.write('cnn_use_laynorm={}\n'.format(CNN_USE_LAYNORM))
  f.write('cnn_use_batchnorm={}\n'.format(CNN_USE_BATCHNORM))
  f.write('cnn_act={}\n'.format(CNN_ACT))
  f.write('cnn_drop={}\n'.format(CNN_DROP))
  f.write('\n')
  
  f.write('[transformer]\n')
  f.write('tr_embed_dim={}\n'.format(TRANSFORMER_EMBED_DIM))
  f.write('tr_max_positions={}\n'.format(TRANSFORMER_MAX_POSITIONS))
  f.write('tr_pos={}\n'.format(POSITION_EMBEDDING_TYPE))
  f.write('tr_num_layers={}\n'.format(TRANSFORMER_NUM_LAYERS))
  f.write('tr_num_heads={}\n'.format(TRANSFORMER_NUM_HEADS))
  f.write('tr_filter_size={}\n'.format(TRANSFORMER_FILTER_SIZE))
  f.write('tr_hidden_size={}\n'.format(TRANSFORMER_HIDDEN_SIZE))
  f.write('tr_dropout={}\n'.format(TRANSFORMER_DROPOUT))
  f.write('tr_attention_dropout={}\n'.format(TRANSFORMER_ATTENTION_DROPOUT))
  f.write('tr_relu_dropout={}\n'.format(TRANSFORMER_RELU_DROPOUT))
  f.write('\n')
  
  f.write('[lstm]\n')
  f.write('lstm_embed_dim={}\n'.format(LSTM_EMBED_DIM))
  f.write('lstm_hidden_size={}\n'.format(LSTM_HIDDEN_SIZE))
  f.write('lstm_num_layers={}\n'.format(LSTM_NUM_LAYERS))
  f.write('lstm_bidirectional={}\n'.format(LSTM_BIDIRECTIONAL))
  f.write('lstm_dropout_in={}\n'.format(LSTM_DROPOUT_IN))
  f.write('lstm_dropout_out={}\n'.format(LSTM_DROPOUT_OUT))
  f.write('\n')
  
  f.write('[dnn_before]\n')
  f.write('fc1_lay_use={}\n'.format(FC1_LAY_USE))
  f.write('fc1_lay={}\n'.format(FC1_LAY))
  f.write('fc1_drop={}\n'.format(FC1_DROP))
  f.write('fc1_use_laynorm_inp={}\n'.format(FC1_USE_LAYNORM_INP))
  f.write('fc1_use_batchnorm_inp={}\n'.format(FC1_USE_BATCHNORM_INP))
  f.write('fc1_use_batchnorm={}\n'.format(FC1_USE_BATCHNORM))
  f.write('fc1_use_laynorm={}\n'.format(FC1_USE_LAYNORM))
  f.write('fc1_act={}\n'.format(FC1_ACT))
  f.write('\n')

  f.write('[dnn_after]\n')
  f.write('fc2_lay={}\n'.format(FC2_LAY))
  f.write('fc2_drop={}\n'.format(FC2_DROP))
  f.write('fc2_use_laynorm_inp={}\n'.format(FC2_USE_LAYNORM_INP))
  f.write('fc2_use_batchnorm_inp={}\n'.format(FC2_USE_BATCHNORM_INP))
  f.write('fc2_use_batchnorm={}\n'.format(FC2_USE_BATCHNORM))
  f.write('fc2_use_laynorm={}\n'.format(FC2_USE_LAYNORM))
  f.write('fc2_act={}\n'.format(FC2_ACT))
  f.write('\n')
  
  f.write('[optimization]\n')
  f.write('optimizer={}\n'.format(OPTIMIZER))
  f.write('weight_decay={}\n'.format(WEIGHT_DECAY))
  f.write('lr={}\n'.format(LEARNING_RATE))
  f.write('batch_size={}\n'.format(BATCH_SIZE))
  f.write('N_epochs={}\n'.format(MAX_EPOCH))
  f.write('seed={}\n'.format(RANDOM_SEED))
  f.write('cuda=True\n')
  f.write('patience={}\n'.format(PATIENCE))  
          
!mkdir -p exp/$WHERE_TO_SAVE/
!cp config_file exp/$WHERE_TO_SAVE/
!python run.py --cfg=config_file --model=$MODEL

Reading config file...
FunTimes: 1847511 parameters
CommandException: Wrong number of arguments for "cp" command.
CommandException: No URLs matched: gs://edinquake/MLP/exp/Transformer_L4_H16-64batchSize-EF/checkpoints/checkpoint_last.pt
***** Started training at 2019-03-22 09:09:40.433907 *****
Epoch 000: loss 5.008 | grad_norm 0.4313 | clip 0
Epoch 000: valid_loss 5.24 | valid_perplexity 188
Copying file://exp/Transformer_L4_H16-64batchSize-EF/checkpoints/checkpoint_best.pt [Content-Type=application/octet-stream]...
-
Operation completed over 1 objects/28.2 MiB.                                     
Copying file://exp/Transformer_L4_H16-64batchSize-EF/checkpoints/checkpoint_last.pt [Content-Type=application/octet-stream]...
-
Operation completed over 1 objects/28.2 MiB.                                     
Epoch 001: loss 5.009 | grad_norm 0.4317 | clip 0
Epoch 001: valid_loss 5.24 | valid_perplexity 188
Copying file://exp/Transformer_L4_H16-64batchSize-EF/checkpoints/checkpoint_best.p

In [0]:

# import os
# import torch
# from torch.serialization import default_restore_location
# state_dict = {}

# # print (os.listdir('.'))
# checkpoint_path = os.path.join('./exp/{}/checkpoints'.format(WHERE_TO_SAVE), 'checkpoint_best.pt')
# if os.path.isfile(checkpoint_path):
#   print('exist')
#   state_dict = torch.load(checkpoint_path, map_location=lambda s, l: default_restore_location(s, 'cpu'))
  
# print (state_dict['best_mae_loss'])



In [0]:
# def load_checkpoint(save_dir, restore_file, model, optimizer):
#     checkpoint_path = os.path.join(save_dir, restore_file)
#     subprocess.call(['gsutil', 'cp', 'gs://edinquake/MLP/{}'.format(checkpoint_path), checkpoint_path])
#     if os.path.isfile(checkpoint_path):
#         state_dict = torch.load(checkpoint_path, map_location=lambda s, l: default_restore_location(s, 'cpu'))
#         model.load_state_dict(state_dict['model'])
#         optimizer.load_state_dict(state_dict['optimizer'])
#         save_checkpoint.best_loss = state_dict['best_loss']
#         save_checkpoint.last_epoch = state_dict['last_epoch']
#         print('Loaded checkpoint {}'.format(checkpoint_path))
#         return state_dict

**(Optional)** Download prediction!

In [0]:
from google.colab import files
files.download('{}_submission.csv'.format(MODEL))

In [0]:
# Results (on Kaggle public scoreboard)
# SVM, RF & XGB 1.536
# LSTM    1.541
# SincNet 4000  dim last point 2.736
# CNN     40000 dim last point 2.238

# ====================================================
# Below, all results on dev
# ====================================================
# |||Using Raw|||

# (4000 raw waveform, using last point)
# SincNet 2.88
# CNN     2.61

# (40000 raw waveform, using last point)
# SincNet 2.71
# CNN     2.57

# (4000 raw waveform, using mid point)
# SincNet 2.64
# CNN     2.49
# LSTM    2.751 (4layer biLSTM 128, MLP 10, 10)


# ======================================================
# |||Using features|||
# Transformer (last point)
# layer  dim      score
# 4      256      2.07 AMSGrad / 16 Transformer Heads / Batchnorm / BatchSize 8

# ======================================================
# |||Using features|||
# Transformer (mid point)
# layer  dim      score
# 2      8        2.3
# 2      256      2.26
# 4      256      2.24
# 4      256      2.19 (Adam)    1.621 on Kaggle
# 4      256      2.15 AMSGrad / 16 Transformer Heads / Batchnorm
# 5      256      2.59 AMSGrad / 16 Transformer Heads / Batchnorm
# 4      256      2.34 AMSGrad / 16 Transformer Heads / Batchnorm / Weight decay 0.0001
# 4      256      2.23 AMSGrad / 16 Transformer Heads / Batchnorm / BatchSize 128
# 4      256      2.11 AMSGrad / 16 Transformer Heads / Batchnorm / BatchSize 32
# 4      256      2.10 AMSGrad / 16 Transformer Heads / Batchnorm / BatchSize 16
# 4      256      2.06 AMSGrad / 16 Transformer Heads / Batchnorm / BatchSize 8
# 4      256      2.09 AMSGrad / 16 Transformer Heads / Batchnorm / BatchSize 4



# LSTM results (mid point)
# Same settings (MLP_layer 1, MLP_dim 10)
# dir layer  score   LSTM_dim    
# bi  1      2.44      48
# s   1      2.48
# bi  2      2.36
# s   2      2.49
# bi  4      2.32
# bi  8      2.36

# bi  4      2.28      128
# bi  4      2.26      256



# Same settings (bi, 4 lstm layer, lstm_dim 128)
# MLP_layer    MLP_dim     score
# 1            100         2.31
# 1            200         2.31

# 2            10, 10      2.27
# 3            10, 10, 10  2.26
# 3            100, 10, 10 2.27

# LOL WTF AFTER BUG FIX
# 3            10, 10, 10  3.11
# ===================================================
# |||Using more features|||
# Same settings (bi, 4 lstm layer, lstm_dim 128)
# MLP_layer    MLP_dim     score
# 3            10, 10, 10  2.27
# 1            10          2.31


# Transformer (mid point)
# layer  dim      score
# 4      256      2.33 AMSGrad 
# 4      256      2.-- AMSGrad / 16 Transformer Heads / Batchnorm / BatchSize 8 Transformer_L4_H16-8batchSize



# ======================================================
# |||Using Sliding features|||
# Transformer (mid point)
# layer  dim      score
# 4      256      ~2.03 AMSGrad 1.532 on Kaggle 8 Transformer Head, Layernorm
