# What is ONNX Runtime?

[ONNX Runtime](https://www.onnxruntime.ai/), or ORT, is a cross-platform accelerator for machine-learning inference and training.

ML frameworks such as PyTorch ship exporters to the ONNX format, for example [torch.onnx.export()](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html), so converting weights trained in torch or another framework to ONNX is easy.

# Why should I care?

Kaggle Simulations impose a time limit on agent initialization and on every step. Other Kaggle kernel-only submission competitions impose similar run-time limits.

**ORT can enable speedups of 1.5-10x**. The example in this notebook is the Hungry Geese Kaggle Simulation, where we show that **ONNX inference takes just 25% of the time of torch inference**, i.e. roughly a 4x speedup. Plus, there is no need to wait for torch or tf to import.

# Is it legal?

Kaggle does not have onnxruntime installed in its base kernel environments, nor does it allow external pip installations in submission environments. To include it in your submission, you'll need to bundle it up in a tar.gz submission.

A typical kaggle rule is that "During the evaluation of an episode your Submission may not pull in or use any information external to the Submission and Environment and may not send any information out." 

But this does not preclude the inclusion of external libraries in your submission, such as RL frameworks and other helper tools. The rule is meant to stop data ingress and egress from the evaluation environment.
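
The bundling step can be sketched as follows. This is a minimal sketch, not an official Kaggle procedure: the directory layout (a `main.py` entry point sitting next to the ONNX file and a locally installed copy of `onnxruntime`, e.g. from `pip install --target`) and the helper names `bundle_submission` and `list_archive` are assumptions made for illustration.

```python
import tarfile
from pathlib import Path


def bundle_submission(src_dir, archive_path="submission.tar.gz"):
    """Pack every file under src_dir into a gzip-compressed tar archive.

    Paths are stored relative to src_dir, so src_dir/main.py ends up
    at the top level of the archive (hypothetical layout, see above).
    """
    src_dir = Path(src_dir)
    archive = Path(archive_path).resolve()
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            # Skip directories and the archive itself if it lives in src_dir.
            if path.is_file() and path.resolve() != archive:
                tar.add(path, arcname=path.relative_to(src_dir).as_posix())
    return str(archive)


def list_archive(archive_path):
    """Return the sorted member names, handy for checking the layout."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Inside the agent, the bundled packages still have to be made importable, typically by adding the directory where the archive is unpacked to `sys.path` before `import onnxruntime`. Where that directory is depends on the competition's environment.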

# Going forward

As far as we are aware, the advantage of using ONNX in Kaggle simulations and Kaggle kernel competitions has not been covered in notebooks, and this is the first description of its use for a competitive advantage. We expect it to become widespread.


# Hungry Geese agent with Torch inference

The test function in this notebook is the Hungry Geese agent from the excellent notebook [Smart Geese Trained by Reinforcement Learning](https://www.kaggle.com/yuricat/smart-geese-trained-by-reinforcement-learning) by [yuricat](https://www.kaggle.com/yuricat) and [kyazuki](https://www.kaggle.com/kyazuki).

We'll create it and run it a number of times to see how fast it is.


In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import os
import sys
from time import perf_counter
from kaggle_environments import evaluate, make

In [None]:
HRL_TORCH_FILE = '../input/handyrlbin/handyrl.bin'

torch_timings = []
onnx_timings = []

In [None]:
class TorusConv2d(nn.Module):
    def __init__(self, input_dim, output_dim, kernel_size, bn):
        super().__init__()
        self.edge_size = (kernel_size[0] // 2, kernel_size[1] // 2)
        self.conv = nn.Conv2d(input_dim, output_dim, kernel_size=kernel_size)
        self.bn = nn.BatchNorm2d(output_dim) if bn else None

    def forward(self, x):
        h = torch.cat([x[:,:,:,-self.edge_size[1]:], x, x[:,:,:,:self.edge_size[1]]], dim=3)
        h = torch.cat([h[:,:,-self.edge_size[0]:], h, h[:,:,:self.edge_size[0]]], dim=2)
        h = self.conv(h)
        h = self.bn(h) if self.bn is not None else h
        return h


class GeeseNet(nn.Module):
    def __init__(self):
        super().__init__()
        layers, filters = 12, 32
        self.conv0 = TorusConv2d(17, filters, (3, 3), True)
        self.blocks = nn.ModuleList([TorusConv2d(filters, filters, (3, 3), True) for _ in range(layers)])
        self.head_p = nn.Linear(filters, 4, bias=False)
        self.head_v = nn.Linear(filters * 2, 1, bias=False)

    def forward(self, x):
        h = F.relu_(self.conv0(x))
        for block in self.blocks:
            h = F.relu_(h + block(h))
        h_head = (h * x[:,:1]).view(h.size(0), h.size(1), -1).sum(-1)
        h_avg = h.view(h.size(0), h.size(1), -1).mean(-1)
        p = self.head_p(h_head)
        v = torch.tanh(self.head_v(torch.cat([h_head, h_avg], 1)))

        return {'policy': p, 'value': v}


# Input for Neural Network

def make_input(obses):
    b = np.zeros((17, 7 * 11), dtype=np.float32)
    obs = obses[-1]

    for p, pos_list in enumerate(obs['geese']):
        # head position
        for pos in pos_list[:1]:
            b[0 + (p - obs['index']) % 4, pos] = 1
        # tip position
        for pos in pos_list[-1:]:
            b[4 + (p - obs['index']) % 4, pos] = 1
        # whole position
        for pos in pos_list:
            b[8 + (p - obs['index']) % 4, pos] = 1
            
    # previous head position
    if len(obses) > 1:
        obs_prev = obses[-2]
        for p, pos_list in enumerate(obs_prev['geese']):
            for pos in pos_list[:1]:
                b[12 + (p - obs['index']) % 4, pos] = 1

    # food
    for pos in obs['food']:
        b[16, pos] = 1

    return b.reshape(-1, 7, 11)


# Load PyTorch Model


model = GeeseNet()
state_dict = torch.load(HRL_TORCH_FILE)['model_state_dict']
model.load_state_dict(state_dict)
model.eval()

# Main Function of Agent

obses = []

def agent(obs, _):
    global torch_timings

    obses.append(obs)
    x = make_input(obses)

    start_time = perf_counter()

    with torch.no_grad():
        xt = torch.from_numpy(x).unsqueeze(0)
        o = model(xt)
    p = o['policy'].squeeze(0).detach().numpy()

    torch_timings.append(perf_counter() - start_time)

    actions = ['NORTH', 'SOUTH', 'WEST', 'EAST']
    return actions[np.argmax(p)]


In [None]:
for _ in range(100): # run 100 times
    env = make("hungry_geese", debug=False)
    env.reset()
    output = env.run([agent, agent, agent, agent])

In [None]:
# perf_counter() returns seconds, so convert to milliseconds before printing
print(f'Torch mean step time is {np.mean(torch_timings) * 1000:.3f} ms across {len(torch_timings)} steps')

# Creating the onnx weights

Converting torch weights to onnx weights is a breeze.


In [None]:
!pip install onnxruntime

In [None]:
x = torch.randn(4, 17, 7, 11, requires_grad=True)  # dummy input: (batch, channels, rows, columns)

torch.onnx.export(model,     # model being run
  x,                         # model input (or a tuple for multiple inputs)
  "handyrl.onnx",            # where to save the model (can be a file or file-like object)
  export_params=True,        # store the trained parameter weights inside the model file
  opset_version=12,          # the ONNX opset version to export the model to
  do_constant_folding=True,  # whether to execute constant folding for optimization
  input_names=['input'],     # the model's input names
  output_names=['output'],   # name of the first output (the policy)
  dynamic_axes={'input': {0: 'batch_size'},     # allow a variable batch size
                'output': {0: 'batch_size'}})
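
Before relying on the exported file, it is worth checking that the ONNX outputs match the torch outputs on the same input. The following is a minimal sketch, assuming both policy outputs have already been converted to NumPy arrays of shape `(batch, 4)`. `compare_policies` and its tolerances are hypothetical choices for illustration, not part of torch or onnxruntime.

```python
import numpy as np


def compare_policies(reference, candidate, rtol=1e-3, atol=1e-5):
    """Compare two batches of policy logits of shape (batch, n_actions)."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    return {
        # Largest element-wise deviation between the two backends.
        "max_abs_diff": float(np.max(np.abs(reference - candidate))),
        # Numerical agreement within the chosen (assumed) tolerances.
        "allclose": bool(np.allclose(candidate, reference, rtol=rtol, atol=atol)),
        # What matters for the agent: does every row pick the same action?
        "same_actions": bool(
            np.array_equal(reference.argmax(axis=1), candidate.argmax(axis=1))
        ),
    }
```

With the names used in this notebook, `reference` would be `model(x)['policy'].detach().numpy()` and `candidate` would be the first element returned by an `onnxruntime.InferenceSession` run on `x.detach().numpy()`.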


# Hungry Geese agent with ORT inference

We no longer need to define the model class or import torch.

In [None]:
os.environ["OMP_NUM_THREADS"] = "1"

import onnxruntime

HRL_ONNX_FILE = 'handyrl.onnx'

opts = onnxruntime.SessionOptions()
opts.inter_op_num_threads = 1
opts.intra_op_num_threads = 1
opts.execution_mode = onnxruntime.ExecutionMode.ORT_SEQUENTIAL

handyrl_session = onnxruntime.InferenceSession(HRL_ONNX_FILE, sess_options=opts)

obses = []

def agent(obs, _):
    global onnx_timings
    obses.append(obs)
    x = make_input(obses)

    start_time = perf_counter()

    ort_inputs = {handyrl_session.get_inputs()[0].name: np.expand_dims(x, axis=0)}
    p = handyrl_session.run(None, ort_inputs)[0][0]
    
    onnx_timings.append(perf_counter() - start_time)

    actions = ['NORTH', 'SOUTH', 'WEST', 'EAST']
    return actions[np.argmax(p)]

In [None]:
for _ in range(100): # run 100 times
    env = make("hungry_geese", debug=False)
    env.reset()
    output = env.run([agent, agent, agent, agent])

In [None]:
# perf_counter() returns seconds, so convert to milliseconds before printing
print(f'Torch mean step time is {np.mean(torch_timings) * 1000:.3f} ms across {len(torch_timings)} steps')
print(f'onnx mean step time is {np.mean(onnx_timings) * 1000:.3f} ms across {len(onnx_timings)} steps')
print(f'onnx is {np.mean(torch_timings) / np.mean(onnx_timings):.2f}x faster than torch for this task')
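
A mean alone can hide slow outliers, such as a first call that pays one-off warm-up costs, and under a per-step time limit the slow steps matter as much as the average. The following is a minimal sketch of a fuller summary, assuming the timing lists hold seconds as returned by `perf_counter()`. `summarize_timings` and `speedup` are hypothetical helpers that use only the standard library.

```python
import statistics


def summarize_timings(timings_s):
    """Summarize a list of step durations (in seconds) in milliseconds."""
    if not timings_s:
        raise ValueError("no timings recorded")
    ms = sorted(t * 1000.0 for t in timings_s)
    # Simple index-based 95th percentile; an approximation for small samples.
    p95_index = min(len(ms) - 1, int(0.95 * len(ms)))
    return {
        "steps": len(ms),
        "mean_ms": statistics.fmean(ms),
        "median_ms": statistics.median(ms),
        "p95_ms": ms[p95_index],
        "max_ms": ms[-1],
    }


def speedup(baseline_s, candidate_s):
    """Ratio of mean step times: values above 1 mean the candidate is faster."""
    return statistics.fmean(baseline_s) / statistics.fmean(candidate_s)
```

For this notebook the calls would be `summarize_timings(torch_timings)`, `summarize_timings(onnx_timings)` and `speedup(torch_timings, onnx_timings)`.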