# Conversion to ONNX
ONNX is an open format for representing machine learning models, which a runtime such as ONNX Runtime can then execute. On CPU this is often much faster than PyTorch, sometimes 5 times as fast!

While the EAWSW model is designed to be small, accurate and accessible, for some people it's still too much to run...

Hosting the model as a free service for players is an option. An ONNX version lets us host the model on CPU and still get faster response times! Given that the model was made during a chip shortage, running it on server hardware I already have is efficient, scalable and cheaper.
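
A speed claim like this is easy to check on your own hardware. The snippet below is a minimal sketch of a latency comparison; `mean_latency`, `slow_backend` and `fast_backend` are hypothetical names, and the two backends are just `time.sleep` stand-ins for whatever PyTorch and ONNX calls you want to compare.

```python
import statistics
import time

def mean_latency(fn, runs=5, warmup=1):
    """Return the mean wall-clock seconds per call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (caches, lazy initialisation)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

# Hypothetical stand-ins: replace these with the real generation calls.
def slow_backend():
    time.sleep(0.02)

def fast_backend():
    time.sleep(0.005)

slow = mean_latency(slow_backend)
fast = mean_latency(fast_backend)
print(f"speed-up: {slow / fast:.1f}x")
```

The actual ratio depends on the CPU, the prompt length and the number of generated tokens, so it is worth measuring with realistic prompts.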

An important note: ONNX only runs the model itself. The logic around it (such as generating a reply token by token) is something you have to write yourself; `onnx_model_manager.py` is intended to deal with this for us.
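
To give an idea of what that surrounding logic looks like, here is a minimal sketch of a greedy generation loop. It is not the code from `onnx_model_manager.py`: `greedy_generate`, `next_token_logits` and `toy_model` are hypothetical names, and the toy model stands in for a real ONNX session so the example runs on its own. A real implementation would also need tokenization and, for a model exported with past key/values, a cache passed between steps.

```python
from typing import Callable, List

def greedy_generate(
    next_token_logits: Callable[[List[int]], List[float]],
    prompt_ids: List[int],
    eos_id: int,
    max_new_tokens: int = 20,
) -> List[int]:
    """Repeatedly run the model and append the highest-scoring token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # one forward pass of the model
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if next_id == eos_id:  # stop once the end-of-text token appears
            break
    return ids

# Toy stand-in for the model: always prefers (last token + 1) modulo the vocabulary size.
VOCAB_SIZE = 5

def toy_model(ids: List[int]) -> List[float]:
    target = (ids[-1] + 1) % VOCAB_SIZE
    return [1.0 if i == target else 0.0 for i in range(VOCAB_SIZE)]

print(greedy_generate(toy_model, [0], eos_id=4))  # → [0, 1, 2, 3, 4]
```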

In [1]:
%load_ext autoreload
%autoreload 2

from model_utils import train_model, split_data, split_branches, get_model, set_pretrained_model_dropout, get_dataset, ModelSeeder
from config import Config
import json
import matplotlib.pyplot as plt
%matplotlib inline
import math
import random
import time
import onnx
import logging
from onnx_model_manager import OnnxModelManager
from onnxruntime.quantization import quantize_dynamic, QuantType
import os
import datasets
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from model_manager import ModelManager

In [2]:
saved_model_path = os.path.join("models", "awsw_mixed")
saved_model_onnx_path = os.path.join("models", "awsw_onnx")
if not os.path.exists(os.path.join(saved_model_path, "special_tokens_map.json")):
    print("Copying config files from huggingface (needed for conversion)... WARNING: this assumes the structure of the model isn't changed!")
    !cd $saved_model_path && git clone https://huggingface.co/EleutherAI/gpt-neo-125M
    !cp -n $saved_model_path/gpt-neo-125M/* $saved_model_path
    !rm -rf $saved_model_path/gpt-neo-125M
if not os.path.exists(os.path.join(saved_model_onnx_path, "model.onnx")):
    !python3 -m transformers.onnx --model=$saved_model_path --feature=causal-lm-with-past $saved_model_onnx_path

In [3]:
def optimize_onnx():
    # Dynamically quantize the exported FP32 model's weights to int8 (skipped if already done).
    model_quant = os.path.join(saved_model_onnx_path, "model_quant.onnx")
    if not os.path.exists(model_quant):
        model_fp32 = os.path.join(saved_model_onnx_path, "model.onnx")
        model_opt = os.path.join(saved_model_onnx_path, "model-opt.onnx")
        # quantize_dynamic writes the result to model_quant; it has no useful return value.
        quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QInt8)
        #!rm $model_opt
optimize_onnx()
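
To build some intuition for what int8 weight quantization does, here is a simplified sketch in plain Python. It assumes symmetric, per-tensor quantization with a single scale; this illustrates the general idea only and is not necessarily the exact scheme `quantize_dynamic` applies. `quantize_int8` and `dequantize` are hypothetical helper names.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 values plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the int8 values back to approximate floats."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print("int8 values:", q)
print("max round-trip error:", max_err)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step. That small error is one reason the quantized model's replies can differ from the FP32 model's.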

In [4]:
# Run the PyTorch model on the GPU if one is available, otherwise fall back to the CPU.
device_name = "cuda:0" if torch.cuda.is_available() else "cpu"
device = torch.device(device_name)

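# NOTE: no cell shown here creates model-opt.onnx; it is assumed to already exist in saved_model_onnx_path.
onnx_model_manager = OnnxModelManager(os.path.join(saved_model_onnx_path, "model-opt.onnx"))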
onnx_model_manager_quant = OnnxModelManager(os.path.join(saved_model_onnx_path, "model_quant.onnx"))
tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-neo-125M')
model = AutoModelForCausalLM.from_pretrained(saved_model_path)
model.to(device)
model.eval()
model_manager = ModelManager(model=model, tokenizer=tokenizer, device=device)
print(f"Pretrained model loaded on {device_name}")

Pretrained model loaded on cuda:0


In [5]:
prompt = "In my dreams, I'm a dragon"
for i in range(2):
    print("ONNX:", onnx_model_manager.say_raw(prompt, do_sample=True))
    print("ONNX (Quantized):", onnx_model_manager_quant.say_raw(prompt, do_sample=True))
    print("PyTorch:", model_manager.say_raw(prompt, 50, 0.7))
    print('-' * 100)

ONNX: In my dreams, I'm a dragoness who dreams about fire, dragoness and dragoness, but I don't think I've ever told a dragoness about these things. I don't know what I'm supposed to do with them, but I'm afraid they're not quite as good as I thought. I know they're not quite as good, but they're just myths. I'm afraid they're not quite as good, but they're good. I'm afraid they're not quite as good, but they're good. I'm afraid they're not quite as good, but they're good. I'm afraid they're not quite as good, but they're
ONNX (Quantized): In my dreams, I'm a dragon. I dream of people I met in the past, but I don't think I can remember them well enough to know what they actually mean."<d3d3n<p><color=#ffff00ff}I've met a lot of people, and I think they're crazy."<d3d3n<p><color=#ffff00ff}I've met a lot of people, and I think they're crazy."<d3n3n3<msgn "They don't know what they're talking about, do they?"<|endoftext|>
PyTorch: In my dreams, I'm a dragon. I dream of fire, but I don't h

# Testing

We created a few pairs of past (context) and present (player input) prompts to see the different reactions. This way, we can test the models across different iterations.
The first test uses an old prompt to compare the pre-trained model with the one trained on AWSW. Did it manage to store its data well? Is it able to write things that have nothing to do with AWSW (so we know we didn't overfit)?

**This test generates boring and repetitive replies!** That's because we aren't using a good sampling algorithm here, but it does give us an indication of what the model has learned!
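
For context on why the sampling algorithm matters: always picking the most likely token (greedy decoding) tends to get stuck in loops, while sampling from a filtered distribution adds variety. The sketch below illustrates top-k plus top-p (nucleus) sampling in plain Python. It is a simplified illustration and not the implementation used by `ModelManager`, `OnnxModelManager` or `transformers`; the function names and the example logits are made up.

```python
import math
import random

def greedy(logits):
    """Always pick the single most likely token id."""
    return max(range(len(logits)), key=logits.__getitem__)

def sample_top_k_top_p(logits, top_k=50, top_p=0.7, rng=random):
    """Sample a token id after top-k and nucleus (top-p) filtering."""
    # Softmax over the logits (shifted by the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the top_k most likely tokens, ordered from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample among the kept tokens, weighted by their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

logits = [2.0, 1.5, 0.2, -1.0]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)
print("greedy: ", [greedy(logits) for _ in range(5)])
print("sampled:", [sample_top_k_top_p(logits, rng=rng) for _ in range(5)])
```

With these example logits only the two most likely tokens survive the `top_p = 0.7` cut, so the sampled output varies between them while greedy decoding returns the same token every time.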

In [6]:
prompts = [
    ('<p><msg>c "Hey Remy!"<d><scn>park2<msg>Ry "Hello, [player_name]."', "How are you?"),
    ('<p><msg>c "I was with Lorem today."<d><scn>park2<msg>Ad "Very nice."', "What do you think of Lorem?"),
    ('<p><msg>m "In Tatsu park, Adine and I sat down."', "Oh my god, Adine. What is this?"),
    ('<p><msg>m "I sat down on a chair in Anna\'s lab."', "What will we do here?"),
]

for (past, prompt) in prompts:
    print(f"Prompt: {prompt}")
    reply = model_manager.say(past, prompt)
    print(f"[Pytorch] Reply: {reply}\n")
    reply = onnx_model_manager.say(past, prompt)
    print(f"[ONNX] Reply: {reply}\n")
    reply = onnx_model_manager_quant.say(past, prompt)
    print(f"[ONNX Quantized] Reply: {reply}\n")
    print("-" * 10)

Prompt: How are you?
[Pytorch] Reply: park2<msg>Ry "I'm not sure, but I'm glad you're here. I'm glad you're here. I'm glad you're here. I'm glad you're here. I'm glad you're here. I'm glad you're here. I'm glad you're here. I'm glad you're here. I'm glad you're here. I'm glad you're

[ONNX] Reply: park2<msg>Ry "Oh, I'm fine."<d><scn>park2<msg>Ry "I'm not sure if I'm ready to do this, but I'm sure you're aware."<|endoftext|>

[ONNX Quantized] Reply: park2<msg>Ry "Do you know what I'm doing here?"<p><msg>c "I'm sorry, but I don't think I'm ready to do this."<d><scn>park2<msg>Ry "Alright, I'll see you later."<|endoftext|>

----------
Prompt: What do you think of Lorem?
[Pytorch] Reply: park2<msg>Ad "I think he's funny."<|endoftext|>

[ONNX] Reply: park2<msg>Ad "I think he's funny."<|endoftext|>

[ONNX Quantized] Reply: park2<msg>Ad "I think he is funny."<d><scn>park2<msg>Ad "I think he is funny."<|endoftext|>

----------
Prompt: Oh my god, Adine. What is this?
[Pytorch] Reply: adineapt<ms

# Sampling test

This is gonna be interesting!

In [7]:
for (past, prompt) in prompts:
    print(f"Prompt: {prompt}")
    reply = model_manager.say(past, prompt, top_k = 50, top_p = 0.7)
    print(f"[Pytorch] Reply: {reply}\n")
    reply = onnx_model_manager.say(past, prompt, do_sample = True)
    print(f"[ONNX] Reply: {reply}\n")
    reply = onnx_model_manager_quant.say(past, prompt, do_sample = True)
    print(f"[ONNX Quantized] Reply: {reply}\n")
    print("-" * 10)

Prompt: How are you?
[Pytorch] Reply: park2<msg>Ry "I'm not sure, but I'm afraid I don't."<d><scn>park2<msg>Ry "I suppose that's true."<d><scn>park2<msg>Ry "Well, if you need anything, I'll do it."<|endoftext|>

[ONNX] Reply: park2<msg>Ry "Oh, I am not sure, but I am afraid that's not too bad. Do you have any news about my family?"<d><scn>park2<msg>Ry "No. I'm sorry, but I couldn’t. I don‘t know what I would think if I wasn’t worried about you anymore."<d><scn>park2<msg>Ry "Well, I know you don’t have plans to retire, and I know what I would do. What I know now is I don’t think I can

[ONNX Quantized] Reply: park2<msg>Ry "Do not know. I'm not sure if you have much experience in the park, though."<d><scn>park2<msg>Ry "I have some experience in the park, and I'd like you to know where I come from, too!"<p><msg>c "Hey Bryce! What's up to do? What have ya done for us? You're the best!"<d><scn>park2<msg>Br "I'm sure I have more to do, but keep in mind I'm doing what I can do to make sure it

# RP test
Testing out the injected roleplay actions

In [8]:
test_rps = [
    "Visit Lorem",
    "Meet with Lorem",
    "Visit Adine",
    "Fight Maverick",
    "Fight Adine",
    "Attack Adine"
]

for rp in test_rps:
    print(f'[Pytorch] {rp} -> {model_manager.say("", rp, top_k = 50, top_p = 0.7)}')
    print(f'[ONNX] {rp} -> {onnx_model_manager.say("", rp, do_sample = True)}')
    print(f'[ONNX Quantized] {rp} -> {onnx_model_manager_quant.say("", rp, do_sample = True)}')
    print("-" * 10)
    
print("Lowercase test")

for rp in test_rps:
    rp = rp[0].lower() + rp[1:]
    print(f'[Pytorch] {rp} -> {model_manager.say("", rp, top_k = 50, top_p = 0.7)}')
    print(f'[ONNX] {rp} -> {onnx_model_manager.say("", rp, do_sample = True)}')
    print(f'[ONNX Quantized] {rp} -> {onnx_model_manager_quant.say("", rp, do_sample = True)}')
    rp = rp.lower()
    print(f'[Pytorch] {rp} -> {model_manager.say("", rp, top_k = 50, top_p = 0.7)}')
    print(f'[ONNX] {rp} -> {onnx_model_manager.say("", rp, do_sample = True)}')
    print(f'[ONNX Quantized] {rp} -> {onnx_model_manager_quant.say("", rp, do_sample = True)}')
    print("-" * 10)

[Pytorch] Visit Lorem -> lorem<msg>Ip "I don't know, but I'll give it to you later in the book if you're interested."<|endoftext|>
[ONNX] Visit Lorem -> lorem<msg>m "Go back."<d><scn>lorem<msg>m "Go back."<|endoftext|>
[ONNX Quantized] Visit Lorem -> lorem<msg>m "I'll show you the computer, but don't tell him it's too bad."<|endoftext|>
----------
[Pytorch] Meet with Lorem -> lorem<msg>Lo "Oh, [player_name]!"<p><msg>c "What is it?"<d><scn>lorem<msg>Lo "I'm sorry, but I'm free."<d><scn>lorem<msg>Ip "If you don't want to do this, you don't have to."<d><scn>lorem<msg>Ip "I'll leave you to do this, but I won't let you do this."<d
[ONNX] Meet with Lorem -> lorem<msg>Lo "Oh, it's you!"<d><scn>lorem<msg>Lo "Hey!"<d><scn>lorem<msg>Lo "I'm so glad to see you."<d><scn>lorem<msg>Lo "You're such a happy child, too."<d><scn>lorem<msg>Lo "I know you're not quite alone, but I'm not alone."<d><scn>lorem<msg>Lo "You're right. I'm sorry, but I understand. I don't think it
[ONNX Quantized] Meet with Lore