# ES-RNN Colab NB Example

A GPU-enabled implementation of the hybrid ES-RNN model by Slawek Smyl, which won the M4 time-series forecasting competition by a large margin, set up here to run in a Google Colab environment. Our implementation and its results are discussed in detail in this [paper](https://arxiv.org/abs/1907.03329).



## Configure notebook

This notebook is meant to be connected to a clone of the ES-RNN GPU GitHub repository (https://github.com/florisrc/ESRNN-GPU.git). It is designed with the following workflow in mind:

1. Mount Colab to drive.
2. Clone the remote GitHub repo to Colab.
3. Copy GitHub repo to Colab.
4. Create temp work directory with GitHub files in Colab. 
5. Save nb changes to Colab nb in drive.
6. Clone remote GitHub to temp Colab directory. 
7. Sync changes from drive to temp Colab directory. 
8. Commit changes to remote GitHub directory. 

The next few cells set up this workflow and define the helper functions it relies on. 

Please note that this requires a configuration file containing GitHub credentials. The helper functions below read them from a top-level `git_config` key: 

``` 
{"git_config": {"repository": "***", "user": "***", "password": "***", "email": "***"}}
```
The configuration file should also include gcloud credentials if buckets are used. 
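
Because a malformed configuration file only shows up later as a failed `git clone`, it can help to validate the file up front. The following is a minimal sketch, assuming the credentials sit under a top-level `git_config` key as the helper functions in this notebook read them. `load_git_config` and `REQUIRED_KEYS` are hypothetical names, not part of the repository.

```python
import json

# Keys the notebook's helper functions look up (assumption based on those helpers).
REQUIRED_KEYS = ("user", "password", "email")


def load_git_config(config_file):
    """Return the 'git_config' section, raising early if something is missing."""
    with open(config_file, "r") as f:
        cfg = json.load(f)
    git_config = cfg.get("git_config")
    if git_config is None:
        raise KeyError("config file has no top-level 'git_config' section")
    missing = [k for k in REQUIRED_KEYS if k not in git_config]
    if missing:
        raise KeyError(f"git_config is missing keys: {missing}")
    return git_config
```

Calling `load_git_config(CONFIG_FILE)` right after mounting the drive would surface a bad file before any git command runs.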


The notebook should also be saved manually before running the ```prepare_git_commit()``` and ```git_commit()``` functions if notebook changes are to be included in the commit. 


In [1]:
from google.colab import drive, auth
from os.path import join

# directory configs
ROOT = '/content/drive'     # default for the drive
PROJ = 'ESRNN-GPU'       # name of project 
CONFIG_FILE = ROOT + '/My Drive/personal/config.json' # path to git configs
PROJECT_PATH = join(ROOT, 'My Drive/' + PROJ)

auth.authenticate_user()        # authenticate user cloud storage account
drive.mount(ROOT)       # mount the drive at /content/drive

Mounted at /content/drive


In [0]:
import json

def clone_github_repo(config_file, targ_dir='', r = "ESRNN-GPU"):
  """Clone GitHub repository. """
  with open (config_file, 'r') as f:
    git_config = json.load(f)['git_config']
    u = git_config['user']
    p = git_config['password']
    !git clone  https://{u}:{p}@github.com/{u}/{r}.git {targ_dir}

def cp_proj_2_drive():
  """Copy files to drive."""
  !cp -r /content/"{PROJ}"/* "{PROJECT_PATH}"

def prepare_git_commit():
  """Sync GitHub repository with Drive. Please save this notebook first if 
  the changes of this notebook should be included in the commit. """
  %cd /content/
  !mkdir ./temp
  clone_github_repo(CONFIG_FILE, targ_dir='./temp')
  !rsync -av --exclude=data/ "{PROJECT_PATH}"/* ./temp

def git_commit(config_file, commit_m='committed from colab nb', branch='master', commit_f='.'):
  """Commit all changes after save."""
  with open(config_file, 'r') as f:
    git_config = json.load(f)['git_config']
  u  = git_config['user']
  e = git_config['email']
  %cd /content/temp
  !git config --global user.email "{e}"
  !git config --global user.name "{u}" 
  !git add "{commit_f}"
  !git commit -m "{commit_m}"
  !git push origin "{branch}"
  %cd /content
  !rm -rf ./temp
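
The clone helper above interpolates the user name and password directly into the URL, which breaks if the password contains characters such as `@`, `/` or `:`. One possible workaround, sketched here with only the standard library, is to percent-encode the credentials first. `build_clone_url` is a hypothetical helper and not part of the repository.

```python
from urllib.parse import quote


def build_clone_url(user, password, repo):
    """Build an HTTPS clone URL with percent-encoded credentials."""
    # safe="" makes quote() escape every reserved character, including "/" and "@".
    u = quote(user, safe="")
    p = quote(password, safe="")
    return f"https://{u}:{p}@github.com/{u}/{repo}.git"
```

The resulting string could then be passed to `!git clone` in place of the hand-built URL.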

## Get data and code:


In [3]:
# get data
%cd /content
!mkdir /content/m4_data 
%cd /content/m4_data
!wget https://www.m4.unic.ac.cy/wp-content/uploads/2017/12/M4DataSet.zip
!wget https://www.m4.unic.ac.cy/wp-content/uploads/2018/07/M-test-set.zip
!wget https://github.com/M4Competition/M4-methods/raw/master/Dataset/M4-info.csv
!mkdir ./Train && cd ./Train && unzip ../M4DataSet.zip && cd ..
!mkdir ./Test && cd ./Test && unzip ../M-test-set.zip && cd ..

%cd /content
!mkdir "{PROJECT_PATH}"  # in case we haven't created it already
!mkdir ./temp
clone_github_repo(CONFIG_FILE, targ_dir='temp', r="ESRNN-GPU") # clone git repo using repo config file 
!cp -r ./temp/* "{PROJECT_PATH}"
!rm -rf ./temp
!mkdir "{PROJ}"
!rsync -av --exclude=.idea/ "{PROJECT_PATH}"/* "{PROJ}"

%cd /content/ESRNN-GPU/
!mkdir ./data
%cd data/
!mkdir ./Train && cp /content/m4_data/Train/* ./Train/
!mkdir ./Test && cp /content/m4_data/Test/* ./Test/
!cp /content/m4_data/M4-info.csv ./info.csv
# `!cd` runs in a subshell and does not persist, so use %cd to return to /content
%cd /content

/content
/content/m4_data
--2020-04-20 12:11:44--  https://www.m4.unic.ac.cy/wp-content/uploads/2017/12/M4DataSet.zip
Resolving www.m4.unic.ac.cy (www.m4.unic.ac.cy)... 35.177.142.35, 35.176.90.68
Connecting to www.m4.unic.ac.cy (www.m4.unic.ac.cy)|35.177.142.35|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 66613994 (64M) [application/zip]
Saving to: ‘M4DataSet.zip’


2020-04-20 12:11:52 (9.71 MB/s) - ‘M4DataSet.zip’ saved [66613994/66613994]

--2020-04-20 12:11:54--  https://www.m4.unic.ac.cy/wp-content/uploads/2018/07/M-test-set.zip
Resolving www.m4.unic.ac.cy (www.m4.unic.ac.cy)... 35.176.90.68, 35.177.142.35
Connecting to www.m4.unic.ac.cy (www.m4.unic.ac.cy)|35.176.90.68|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3723045 (3.5M) [application/zip]
Saving to: ‘M-test-set.zip’


2020-04-20 12:11:57 (2.28 MB/s) - ‘M-test-set.zip’ saved [3723045/3723045]

--2020-04-20 12:11:58--  https://github.com/M4Competition/M4-methods/raw/m
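
Since a failed download or unzip only surfaces later as a file-not-found error during training, a quick layout check can save time. The sketch below assumes the layout this notebook builds (`Train/<interval>-train.csv`, `Test/<interval>-test.csv` and `info.csv` inside the data directory). `missing_m4_files` is a hypothetical helper name.

```python
from pathlib import Path


def missing_m4_files(data_dir, interval="Monthly"):
    """Return the expected M4 files that are absent from data_dir."""
    data_dir = Path(data_dir)
    expected = [
        data_dir / "Train" / f"{interval}-train.csv",
        data_dir / "Test" / f"{interval}-test.csv",
        data_dir / "info.csv",
    ]
    # Keep only the paths that do not exist as regular files.
    return [str(p) for p in expected if not p.is_file()]
```

For example, `missing_m4_files('/content/ESRNN-GPU/data')` should return an empty list once the cell above has completed.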

## Create colab environment with correct library versions

In [4]:
# uninstall torch  
!pip uninstall torch
!pip uninstall torch # run twice (recommendation pytorch forums)

# and re-install as 0.4.1
from os import path
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())

accelerator = 'cu80' if path.exists('/opt/bin/nvidia-smi') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision

# tensorflow version 1 
%tensorflow_version 1.x

import torch
import tensorflow as tf 
print(f'Torch version: {torch.__version__}')
print(f'Tensorflow version: {tf.__version__}')
print(f'Torch.cuda.is_available: {torch.cuda.is_available()}')

Uninstalling torch-1.4.0:
  Would remove:
    /usr/local/bin/convert-caffe2-to-onnx
    /usr/local/bin/convert-onnx-to-caffe2
    /usr/local/lib/python3.6/dist-packages/caffe2/*
    /usr/local/lib/python3.6/dist-packages/torch-1.4.0.dist-info/*
    /usr/local/lib/python3.6/dist-packages/torch/*
Proceed (y/n)? y
  Successfully uninstalled torch-1.4.0
     |████████████████████████████████| 483.0MB 2.2MB/s 
ERROR: torchvision 0.5.0 has requirement torch==1.4.0, but you'll have torch 0.4.1 which is incompatible.
ERROR: fastai 1.0.60 has requirement torch>=1.0.0, but you'll have torch 0.4.1 which is incompatible.
TensorFlow 1.x selected.
Torch version: 0.4.1
Tensorflow version: 1.15.2
Torch.cuda.is_available: True
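
The printed versions above are checked by eye. A small guard can make the notebook stop early if the runtime ends up with a different version. This is a sketch only, and `version_tuple` and `matches_pinned` are hypothetical helper names.

```python
def version_tuple(version):
    """Turn '0.4.1' or '1.4.0+cu101' into a tuple of its leading integers."""
    parts = []
    # Drop any local build suffix such as '+cu101' before splitting.
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches_pinned(actual, pinned):
    """True if `actual` starts with the version components given in `pinned`."""
    pinned_t = version_tuple(pinned)
    return version_tuple(actual)[:len(pinned_t)] == pinned_t
```

In the notebook this could be used as `assert matches_pinned(torch.__version__, '0.4.1')` and `assert matches_pinned(tf.__version__, '1')`.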


## Check configurations 


In [5]:
# move to project working directory
%cd /content/"{PROJ}"

# Check configuration
import pprint
from es_rnn.config import get_config

config = get_config('Monthly')    # one of 'Quarterly', 'Monthly', 'Daily' or 'Yearly' (case-sensitive)
pprint.pprint(config)

/content/ESRNN-GPU
{'add_nl_layer': True,
 'batch_size': 1024,
 'c_state_penalty': 0,
 'chop_val': 72,
 'device': 'cuda',
 'dilations': ((1, 3), (6, 12)),
 'gradient_clipping': 20,
 'input_size': 12,
 'input_size_i': 12,
 'learning_rate': 0.001,
 'learning_rates': (10, 0.0001),
 'level_variability_penalty': 50,
 'lr_anneal_rate': 0.5,
 'lr_anneal_step': 5,
 'lr_ratio': 3.1622776601683795,
 'lr_tolerance_multip': 1.005,
 'min_epochs_before_changing_lrate': 2,
 'min_learning_rate': 0.0001,
 'num_of_categories': 6,
 'num_of_train_epochs': 15,
 'output_size': 18,
 'output_size_i': 18,
 'percentile': 50,
 'print_output_stats': 3,
 'print_train_batch_every': 5,
 'prod': True,
 'rnn_cell_type': 'LSTM',
 'seasonality': 12,
 'state_hsize': 50,
 'tau': 0.5,
 'training_percentile': 45,
 'training_tau': 0.45,
 'variable': 'Monthly'}
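
The `tau` and `training_tau` entries above are quantile levels derived from `percentile` and `training_percentile`. Assuming they feed a pinball (quantile) loss, as in the original ES-RNN description, the effect of `training_tau = 0.45` can be illustrated with a scalar sketch. The repository's own loss lives in `es_rnn/loss_modules.py` and may differ in detail; `pinball_loss` here is an illustrative name only.

```python
def pinball_loss(y_true, y_pred, tau):
    """Scalar pinball loss for quantile level tau (0 < tau < 1)."""
    diff = y_true - y_pred
    # Under-forecasts (diff > 0) are weighted by tau,
    # over-forecasts (diff < 0) by (1 - tau).
    return max(tau * diff, (tau - 1) * diff)


under = pinball_loss(10.0, 8.0, 0.45)  # forecast too low by 2
over = pinball_loss(8.0, 10.0, 0.45)   # forecast too high by 2
print(under, over)
```

With `tau` below 0.5 an over-forecast costs more than an under-forecast of the same size, so training is nudged toward slightly lower predictions. With `tau = 0.5` both directions are penalised equally.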


## Editing configurations (optional) 

In [6]:
%%writefile /content/{PROJ}/es_rnn/config.py

from math import sqrt

import torch


def get_config(interval):
    config = {
        'prod': True,
        'device': ("cuda" if torch.cuda.is_available() else "cpu"),
        'percentile': 50,
        'training_percentile': 45,
        'add_nl_layer': True,
        'rnn_cell_type': 'LSTM',
        'learning_rate': 1e-3,
        'learning_rates': ((10, 1e-4)),
        'num_of_train_epochs': 5,
        'num_of_categories': 6,  # in data provided
        'batch_size': 1024,
        'gradient_clipping': 20,
        'c_state_penalty': 0,
        'min_learning_rate': 0.0001,
        'lr_ratio': sqrt(10),
        'lr_tolerance_multip': 1.005,
        'min_epochs_before_changing_lrate': 2,
        'print_train_batch_every': 5,
        'print_output_stats': 3,
        'lr_anneal_rate': 0.5,
        'lr_anneal_step': 5
    }

    if interval == 'Quarterly':
        config.update({
            'chop_val': 72,
            'variable': "Quarterly",
            'dilations': ((1, 2), (4, 8)),
            'state_hsize': 40,
            'seasonality': 4,
            'input_size': 4,
            'output_size': 8,
            'level_variability_penalty': 80
        })
    elif interval == 'Monthly':
        config.update({
            #     RUNTIME PARAMETERS
            'chop_val': 72,
            'variable': "Monthly",
            'dilations': ((1, 3), (6, 12)),
            'state_hsize': 50,
            'seasonality': 12,
            'input_size': 12,
            'output_size': 18,
            'level_variability_penalty': 50
        })
    elif interval == 'Daily':
        config.update({
            #     RUNTIME PARAMETERS
            'chop_val': 200,
            'variable': "Daily",
            'dilations': ((1, 7), (14, 28)),
            'state_hsize': 50,
            'seasonality': 7,
            'input_size': 7,
            'output_size': 14,
            'level_variability_penalty': 50
        })
    elif interval == 'Yearly':

        config.update({
            #     RUNTIME PARAMETERS
            'chop_val': 25,
            'variable': "Yearly",
            'dilations': ((1, 2), (2, 6)),
            'state_hsize': 30,
            'seasonality': 1,
            'input_size': 4,
            'output_size': 6,
            'level_variability_penalty': 0
        })
    else:
        raise ValueError("No config for interval %r; expected 'Quarterly', 'Monthly', 'Daily' or 'Yearly'." % interval)

    config['input_size_i'] = config['input_size']
    config['output_size_i'] = config['output_size']
    config['tau'] = config['percentile'] / 100
    config['training_tau'] = config['training_percentile'] / 100

    if not config['prod']:
        config['batch_size'] = 10
        config['num_of_train_epochs'] = 15

    return config

Overwriting /content/ESRNN-GPU/es_rnn/config.py
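
Rewriting `config.py` is one way to change settings. For quick experiments it may be simpler to override values on the returned dictionary instead. The sketch below assumes the derived fields (`input_size_i`, `output_size_i`, `tau`, `training_tau`) are computed as in the file above. `with_overrides` is a hypothetical helper, and the `base` dictionary is a stand-in for the output of `get_config('Monthly')`.

```python
def with_overrides(config, **overrides):
    """Return a copy of config with overrides applied and derived fields refreshed."""
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError(f"unknown config keys: {unknown}")
    updated = dict(config)  # leave the original dictionary untouched
    updated.update(overrides)
    # Recompute derived fields the same way get_config does; a direct
    # override of 'tau' or 'training_tau' is therefore replaced here.
    updated['input_size_i'] = updated['input_size']
    updated['output_size_i'] = updated['output_size']
    updated['tau'] = updated['percentile'] / 100
    updated['training_tau'] = updated['training_percentile'] / 100
    return updated


# Stand-in for get_config('Monthly'), reduced to the keys used here.
base = {
    'input_size': 12, 'output_size': 18,
    'input_size_i': 12, 'output_size_i': 18,
    'percentile': 50, 'training_percentile': 45,
    'tau': 0.5, 'training_tau': 0.45,
    'num_of_train_epochs': 15,
}
quick = with_overrides(base, num_of_train_epochs=5, training_percentile=40)
```

In the notebook the call would look like `config = with_overrides(get_config('Monthly'), num_of_train_epochs=5)`.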


In [7]:
# move to project working directory
%cd /content/ESRNN-GPU/

import pandas as pd
from torch.utils.data import DataLoader
from es_rnn.data_loading import create_datasets, SeriesDataset
from es_rnn.config import get_config
from es_rnn.trainer import ESRNNTrainer
from es_rnn.model import ESRNN
import time

print('loading config')
config = get_config('Monthly')

print('loading data')
info = pd.read_csv('/content/ESRNN-GPU/data/info.csv')

train_path = '/content/ESRNN-GPU/data/Train/%s-train.csv' % (config['variable'])
test_path = '/content/ESRNN-GPU/data/Test/%s-test.csv' % (config['variable'])

train, val, test = create_datasets(train_path, test_path, config['output_size'])

dataset = SeriesDataset(train, val, test, info, config['variable'], config['chop_val'], config['device'])
dataloader = DataLoader(dataset, batch_size=config['batch_size'], shuffle=True)

run_id = str(int(time.time()))
model = ESRNN(num_series=len(dataset), config=config)
tr = ESRNNTrainer(model, dataloader, run_id, config, ohe_headers=dataset.dataInfoCatHeaders)
tr.train_epochs() 

/content/ESRNN-GPU
loading config
loading data

Train_batch: 1

Train_batch: 2


KeyboardInterrupt: ignored

In [10]:
prepare_git_commit()
git_commit(CONFIG_FILE, commit_m='Example nb for es-rnn example committed from Colab')

/content
Cloning into './temp'...
remote: Enumerating objects: 50, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 537 (delta 25), reused 34 (delta 16), pack-reused 487
Receiving objects: 100% (537/537), 76.98 MiB | 10.09 MiB/s, done.
Resolving deltas: 100% (337/337), done.
sending incremental file list
LICENSE
README.md
__init__.py
es_rnn_colab_nb_example.ipynb
m4_baseline.R
main.py
es_rnn/
es_rnn/DRNN.py
es_rnn/__init__.py
es_rnn/config.py
es_rnn/data_loading.py
es_rnn/loss_modules.py
es_rnn/main.py
es_rnn/model.py
es_rnn/trainer.py
utils/
utils/__init__.py
utils/helper_funcs.py
utils/logger.py

sent 84,729 bytes  received 359 bytes  170,176.00 bytes/sec
total size is 83,569  speedup is 0.98
/content/temp
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
Everything up-to-date
/content
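
The output above ends with "nothing to commit", which typically happens when the notebook was not saved before syncing. A check on the working tree before committing makes that case explicit. This sketch relies on `git status --porcelain`, which prints one line per changed path and nothing for a clean tree. `porcelain_has_changes` and `repo_has_changes` are hypothetical helper names.

```python
import subprocess


def porcelain_has_changes(porcelain_output):
    """True if `git status --porcelain` output lists at least one path."""
    return any(line.strip() for line in porcelain_output.splitlines())


def repo_has_changes(repo_dir):
    """Run `git status --porcelain` in repo_dir and report whether anything changed."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return porcelain_has_changes(result.stdout)
```

Calling `repo_has_changes('/content/temp')` after `prepare_git_commit()` would show whether `git_commit()` has anything to push.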
