In [None]:
import os
import time
import importlib
import transformers
import torch
import re
import jinja2

import rpbuild as rp
from rpbuild import load_template

# Uncomment to reload the module after editing it, without restarting the kernel.
# (The package is imported as `rp`, so the bare name `rpbuild` is not bound here.)
#importlib.reload(rp.char)

### Load Resources
Load dataset and model for testing...

In [2]:
from transformers import BitsAndBytesConfig

# Where are models stored?
models_dir = "/home/dinalt/ai_assets/models"

# Configure a model to use.
# The name of this model -- which will live in models_dir
model_name = "fhai50032_RolePlayLake-7B" # AKA "fhai50032/RolePlayLake-7B"
model_id = os.path.join(models_dir, model_name)

# ...or load it from the Hugging Face Hub / local cache instead.
# (No trailing comma: a comma here would turn model_id into a tuple.)
#model_id = "fhai50032/RolePlayLake-7B"

# Device to run the model on. Large models can use device_map="auto" instead (see the CausalLM implementation).
device = 0

# Load model with quantization
# See the link below for alternative configuration options
# https://huggingface.co/docs/transformers/main/en/quantization

# Load model and tokenizer
causal_lm = rp.model.CausalLM(
    model_id,
    # Pass `device` (defined above) for explicit device placement
    device=None,

    # Disable bfloat16 and FlashAttention 2 if not running on an RTX 30xx (Ampere) or newer GPU
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    #device_map="auto",

    # Remove quantization_config for better generation quality (at the cost of more VRAM).
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
    )
)

Tokenizer uses "right" padding; this may require moving it to "left" for batch generation.


`low_cpu_mem_usage` was None, now set to True since model is quantized.



MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=2)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralFlashAttention2(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNor
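The printed module tree gives enough layer shapes to estimate the model's size. The sketch below counts parameters from those shapes and compares approximate weight memory in bf16 versus 4-bit. It assumes `lm_head` is an untied 32000 × 4096 projection (it is cut off in the printed output) and that only the decoder `Linear4bit` layers are quantized while embeddings, norms, and `lm_head` stay in bf16. Quantization constants, activations, and the KV cache are ignored, so treat the 4-bit figure as a rough lower bound.

```python
# Rough parameter count for the Mistral architecture printed above.
# Assumption: lm_head is a separate (untied) VOCAB x HIDDEN matrix; it is truncated in the repr.
VOCAB, HIDDEN, KV_DIM, FFN, LAYERS = 32000, 4096, 1024, 14336, 32

attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q_proj + o_proj, k_proj + v_proj
mlp = 3 * HIDDEN * FFN                            # gate_proj, up_proj, down_proj
norms = 2 * HIDDEN                                # input / post-attention RMSNorm weights
per_layer = attn + mlp + norms

embed = VOCAB * HIDDEN
lm_head = VOCAB * HIDDEN  # assumed untied
final_norm = HIDDEN

total = LAYERS * per_layer + embed + lm_head + final_norm

# Assumption: only the decoder Linear4bit layers are 4-bit; everything else stays bf16 (2 bytes).
linear_4bit = LAYERS * (attn + mlp)
other = total - linear_4bit

bf16_bytes = total * 2
four_bit_bytes = linear_4bit // 2 + other * 2  # ignores per-block quantization constants

print(f"parameters:    {total:,}")
print(f"bf16 weights:  {bf16_bytes / 1e9:.2f} GB")
print(f"4-bit weights: ~{four_bit_bytes / 1e9:.2f} GB")
```

This prints about 7.24 billion parameters, 14.48 GB for bf16 weights, and roughly 4.01 GB with 4-bit decoder layers, which is why the quantized load fits comfortably on a single consumer GPU.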

## Character Metadata Creation

## Seed Data
Each character was generated using seed data from this dataset:

https://huggingface.co/datasets/hieunguyenminh/roleplay

Specifically, the "name" and "description" fields were used to seed more elaborate character generation. For example:

Name: Sherlock  
Summary: Sherlock the renowned detective from Baker Street is known for his astute logical reasoning disguise ability and use of forensic science to solve perplexing crimes

## Character Writing Instruction

Prompts were created for writing the rest of the character data based upon the instructions from the following sources:

https://docs.sillytavern.app/usage/core-concepts/characterdesign/

https://wikia.schneedc.com/bot-creation/trappu/creation

https://rentry.co/alichat

https://rentry.co/kingbri-chara-guide

https://rentry.co/WPP_For_Dummies

... and Mixtral

### Load Seed Dataset
Load the seed dataset and get the first row as an example.

In [3]:
from datasets import load_dataset

seed_dataset = load_dataset("hieunguyenminh/roleplay", trust_remote_code=True)["train"]

sample = seed_dataset[0]
print(sample)

char_data = dict(
    char_name=sample["name"],
    summary=sample["description"]
)

print("---")
print(char_data["char_name"])
print(char_data["summary"])

{'name': 'Sherlock', 'description': 'Sherlock the renowned detective from Baker Street is known for his astute logical reasoning disguise ability and use of forensic science to solve perplexing crimes', 'text': "<|system|>In the bustling streets of Victorian London, there exists a figure of unparalleled intellect and deductive prowess - Sherlock Holmes. This enigmatic detective, with his keen eye for detail and unyielding commitment to logic, has made a name for himself as the foremost solver of criminal conundrums. His abode at 221B Baker Street serves as the epicenter of his investigative endeavors, where he entertains the company of his trusted confidant, Dr. John Watson. Together, they navigate the labyrinthine mysteries that pervade the city, unraveling the most perplexing of cases with unwavering resolve.</s>\n<|user|>How do you approach a new case, Sherlock?</s>\n<|assistant|>Ah, the thrill of a new case! Every puzzle presents itself as a tantalizing enigma, begging to be deciph

### Instruction Follower

This is not the original implementation, which supported batch inference, but this should be close enough for a demo.

In [6]:
# An instruction follower.
#
# causal_lm: a CausalLM object
# model_instruction_template: the model-specific instruction (prompt) template
# template: the domain-specific instruction template
# filter: optional post-processor applied to the generated response
class InstructGen:
    def __init__(self, causal_lm, model_instruction_template, template, filter=None, generation_config="Big-O", debug_level=1):
        self.causal_lm = causal_lm
        self.debug_level = debug_level
        self.generation_config = causal_lm.named_generation_config(generation_config)
        self.environment = jinja2.Environment()
        self.template = self.environment.from_string(template)
        self.filter = filter
        self.model_template = self.environment.from_string(model_instruction_template)

    def __call__(self, **kwargs):
        instruction = self.template.render(**kwargs)
        prompt = self.model_template.render(instruction=instruction)
        if self.debug_level > 1:
            print(f"{'prompt':-^80}")
            print(prompt)

        outputs = self.causal_lm.generate_response(
            self.generation_config,
            prompt,
            max_new_tokens=1024,
        )
        
        response = outputs["response"]
        
        if self.debug_level > 1:
            print(f"{'response':-^80}")
            print(response)

        if self.filter is not None:
            response = self.filter(response, **kwargs)
        
        return response
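To make the data flow of `InstructGen.__call__` concrete without loading a model, here is a stdlib-only sketch of the same four steps: render the domain template, wrap it in the model template, generate, then post-process. `render`, `instruct`, and `echo_model` are hypothetical names; `render` handles only bare `{{name}}` placeholders (no whitespace, filters, or control flow) and is not a substitute for jinja2 in real use.

```python
from typing import Callable, Optional


def render(template: str, **values: str) -> str:
    """Hypothetical stand-in for jinja2: substitutes bare {{name}} placeholders only."""
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template


def instruct(
    model_template: str,
    task_template: str,
    generate: Callable[[str], str],
    post: Optional[Callable[..., str]] = None,
    **kwargs: str,
) -> str:
    instruction = render(task_template, **kwargs)             # 1. domain-specific instruction
    prompt = render(model_template, instruction=instruction)  # 2. wrap in the model's prompt format
    response = generate(prompt)                               # 3. run the (stub) model
    return post(response, **kwargs) if post else response     # 4. optional post-processing


def echo_model(prompt: str) -> str:
    """Stub 'model' that echoes the instruction line back."""
    return "[stub] " + prompt.splitlines()[1]


alpaca = "### Instruction:\n{{instruction}}\n### Response:\n"
task = "Write a one-line greeting for {{name}}."
print(instruct(alpaca, task, echo_model, name="Sherlock"))
```

Like `InstructGen`, the post-processor receives the same keyword arguments as the template, so a filter can use context such as the character name if it needs to.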

### Get the instruction prompt template
This should be tailored to your model.
Note: some models define a template in `tokenizer.chat_template`.

In [7]:
instruct_template = load_template("alpaca_instruct.jinja")
print(repr(instruct_template))

'### Instruction:\n{{instruction}}\n### Response:\n'
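One detail worth checking with this template: a `jinja2.Environment()` created with default settings strips a single trailing newline from the template source, so the `\n` after `### Response:` does not survive rendering unless `keep_trailing_newline=True` is passed. The sketch below only shows the difference; whether the trailing newline matters for this particular model is not established here.

```python
import jinja2

template = "### Instruction:\n{{instruction}}\n### Response:\n"

default_env = jinja2.Environment()                           # strips one trailing newline
keep_env = jinja2.Environment(keep_trailing_newline=True)    # preserves it

print(repr(default_env.from_string(template).render(instruction="Say hi.")))
print(repr(keep_env.from_string(template).render(instruction="Say hi.")))
```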


#### Generate Character Description
Generate a full character description from a name and short description of the character.

In [9]:
description_generator = InstructGen(
    causal_lm,
    instruct_template,
    load_template("make_description.jinja"),
    # Set to 1 or lower to disable diagnostic messages
    debug_level=2
)
char_data["description"] = description_generator(name=char_data["char_name"], summary=char_data["summary"])

Token indices sequence length is longer than the specified maximum sequence length for this model (983 > 255). Running this sequence through the model will result in indexing errors


-------------------------------------prompt-------------------------------------
### Instruction:
# Writing a character card

This defines the role that the AI character is to play.

## Description

Provide a brief description of the character and their personality.

Name: The character's full name.
Nickname: Useful if you want the character to be called something other than Char name.
Species: The species of your character. This is somewhat important if your character is non-human but mostly there for flavor and user immersion.
Age: Age of the character. You can use both "19" and "19 years old" to reinforce the bot's age. Again a flavor category and optional.
Features: Features of the character. Here is where you put Hair Color, Eye Color, any markings, etc. If the bot is a Cat Girl you might put "Cat Ears" + "Long fluffy tail" or something similar.
Personality: This is where character traits will go. Examples of traits are "Loving," "Caring," "Helpful," "Honest." Ideally put synonyms

#### Generate PList
Generate a PList from a long character description.

In [10]:
plist_re = re.compile(r"\[.*?\]")

# The model sometimes adds extraneous text around the PList; the filter keeps only the first [...] block.
def plist_filter(response, **kwargs):
    m = plist_re.search(response)
    if m is not None:
        plist = m.group()
        return plist
    return ""

plist_generator = InstructGen(
    causal_lm,
    instruct_template,
    load_template("make_plist.jinja"),
    filter=plist_filter,
    debug_level=2
)
char_data["plist"] = plist_generator(description=char_data["description"])

-------------------------------------prompt-------------------------------------
### Instruction:
# AI Roleplay

Today, I want to explain a token-optimized character format that helps optimize token counts.

The main reason why this guide works is because of author's notes. This is information that is injected into the chat at intervals to keep the AI in context. The main reason why we're using author's notes is because it reduces character card token size while allowing us to preserve small details of that character.

## Example Character

Today, we'll be using "Manami." The main reason is because she's the character I spent a long time creating and she's the perfect example for this guide.

Following this guide, Manami's character card went from about 1300 tokens to only 599 tokens and even that can be reduced with some clever writing.

## PLists

PLists are where you describe the traits of a character. This includes personality, clothing, body, and even the environment of your rolep

In [11]:
# Show filtered plist
print(char_data["plist"])

[Sherlock: brilliant, curious, introverted, observant, persistent, arrogant; tall, lean build, well-groomed mustache, piercing blue eyes, black suit with a deerstalker hat; Human, late 20s - 30s; Personality traits with arrogance as main flaw; Loves: solving mysteries, playing the violin, rational thinking, logic puzzles, chess; Nickname: Sherlock; Species: Human; Genre: detective mystery; Tags: Baker Street London, city, crime solving, violin playing, forensic science, logic, chess; Scenario: Investigating a mysterious disappearance with Dr John Watson]
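`plist_re` works for the single-line PList above, but two properties of the pattern are worth knowing. By default `.` does not match newlines, so a PList the model splits across lines is not found at all and `plist_filter` silently returns an empty string; and the non-greedy `.*?` stops at the first `]`, so a PList containing nested brackets would be truncated. The sketch below demonstrates the first case with an invented response and a hypothetical `re.DOTALL` variant; the notebook does not show whether multi-line PLists actually occur with this model.

```python
import re

# Same pattern as plist_re in the cell above.
plist_re = re.compile(r"\[.*?\]")
# Hypothetical variant: re.DOTALL lets "." match newlines, so a PList may span lines.
plist_dotall_re = re.compile(r"\[.*?\]", re.DOTALL)

# Invented response where the model wrapped the PList across two lines.
response = "Here is the PList:\n[Sherlock: brilliant, observant;\ntall, lean build]\nHope this helps!"

print(plist_re.search(response))  # None, so plist_filter would return ""
print(repr(plist_dotall_re.search(response).group()))
```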


#### Write a Greeting

In [12]:
greeting_generator = InstructGen(
    causal_lm,
    instruct_template,
    load_template("make_greeting.jinja"),
    debug_level=2
)
char_data["greeting"] = greeting_generator(description=char_data["description"])

-------------------------------------prompt-------------------------------------
### Instruction:
# Writing a character card

The character card defines the role that the AI character is to play.

## Greeting

The Greeting is an important thing that sets exactly how and in what style the character will communicate. The character's first message should be long so that later it would be less likely that the character would respond with very short messages. You can also use asterisks *action* to describe the character's actions.

Use  instead of the character name.
Use  instead of the user name.

* approaches  and offers their hand* It's great to see you again!...

Ideally, greetings are written without any form of impersonation. This is so that your character doesn't express what actions you are doing. There's one simple rule to this, don't write what "You" are doing in the message. Try to make what the user is doing as vague as possible. However, writing this can be difficult as you nee
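In the rendered greeting prompt above, lines such as `Use  instead of the character name.` and `* approaches  and offers their hand*` have lost their placeholders. A likely explanation, assuming `make_greeting.jinja` contains literal `{{char}}` and `{{user}}` text, is that jinja2 renders undefined variables as empty strings. The sketch below reproduces that behavior and shows two ways to keep the literal text: a `{% raw %}` block, or passing the text in as a variable (the approach the examples cell takes with `examples_how_to.txt`, since jinja2 does not re-render variable values). `StrictUndefined` is an optional way to make such omissions fail loudly instead.

```python
import jinja2

env = jinja2.Environment()
line = "Use {{char}} instead of the character name."

# 1. Undefined variables silently render as empty strings.
print(env.from_string(line).render())

# 2. A {% raw %} block emits the enclosed text verbatim.
raw_line = "Use {% raw %}{{char}}{% endraw %} instead of the character name."
print(env.from_string(raw_line).render())

# 3. Text passed in as a variable is inserted as-is, not re-rendered.
print(env.from_string("{{how_to}}").render(how_to=line))

# Optional: StrictUndefined raises instead of rendering an empty string.
strict_env = jinja2.Environment(undefined=jinja2.StrictUndefined)
try:
    strict_env.from_string(line).render()
except jinja2.UndefinedError as err:
    print("UndefinedError:", err)
```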

#### Generate Dialog Examples

In [13]:
# The model has been instructed to separate examples with the <START> token.
# We can use this to split the response into a list of examples.
start_token_re = re.compile(r"<START>")

# The model tends to "close" <START> with a matching tag (or emit a stray EOS token),
# despite explicit instructions not to.
eos_re = re.compile(r"</s>|</START>|<END>")

# Common formatting mistake made by the model: blank lines (double newlines) between turns.
double_space_re = re.compile(r"\n\n")

# Despite explicit and emphasized instructions, the model often writes something like
# {{Harry}} rather than {{char}}; rewrite every {{...}} placeholder except {{user}}.
not_user_re = re.compile(r"\{\{(?!user).*?\}\}")

# Clean up the output via regex, then split on <START>, dropping any preamble before the first token.
def examples_filter(response, **kwargs):
    response = eos_re.sub("", response).strip()
    response = double_space_re.sub("\n", response)
    response = not_user_re.sub(r"{{char}}", response)
    return start_token_re.split(response)[1:]

examples_generator = InstructGen(
    causal_lm,
    instruct_template,
    load_template("make_examples.jinja"),
    filter=examples_filter,
    debug_level=2
)

char_data["example_dialog"] = examples_generator(
    # Note: the how-to explains jinja template syntax. It's easier to put it in a separate file
    # and pass it in as a variable than to escape the text.
    how_to=load_template("examples_how_to.txt"),
    description=char_data["description"],
    greeting=char_data["greeting"],
)

-------------------------------------prompt-------------------------------------
### Instruction:
# Writing an character card

The character card defines the role that the AI character is to play.

## Dialog Examples

Dialog examples are used to help the bot understand the way we want the character to speak.

Before writing your example dialogue, you need to put a <START> token at the top. This helps the bot know that this is a new interaction. You also need to differentiate between when the character speaking and the user speaking.

Don't use either the user's or character's names in the example dialog!
Use {{char}} instead of the character name.
Use {{user}} instead of the user name.

Example:

<START>
{{user}}: Hi Aqua, I heard you like to spend time in the pub.
{{char}}: *excitedly* Oh my goodness, yes! I just love spending time at the pub! It's so much fun to talk to all the adventurers and hear about their exciting adventures! And you are?
{{user}}: I'm new here and I wanted to a
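To check `examples_filter` without running the model, it can be exercised on a hand-written response. The input below is invented for illustration: it has a preamble before the first `<START>`, a `</START>` and a `</s>` the model was told not to emit, a blank line between turns, and a `{{Sherlock}}` placeholder. The patterns and filter are copied from the cell above so the block runs on its own.

```python
import re

# Same patterns and filter as the cell above.
start_token_re = re.compile(r"<START>")
eos_re = re.compile(r"</s>|</START>|<END>")
double_space_re = re.compile(r"\n\n")
not_user_re = re.compile(r"\{\{(?!user).*?\}\}")

def examples_filter(response, **kwargs):
    response = eos_re.sub("", response).strip()
    response = double_space_re.sub("\n", response)
    response = not_user_re.sub(r"{{char}}", response)
    return start_token_re.split(response)[1:]

# Invented model response showing the problems the filter targets.
raw = (
    "Sure! Here are the examples.\n"
    "<START>\n{{user}}: Hello.\n\n{{Sherlock}}: *nods* Good evening.</START>\n"
    "<START>\n{{user}}: A case?\n{{char}}: Always.</s>"
)
for example in examples_filter(raw):
    print("---")
    print(example)
```

Each returned example keeps the newline that followed its `<START>` token, which is why a blank line follows each `---` in the notebook output. Note also that `\n\n` is replaced pairwise, so a run of three newlines collapses to two rather than one; a pattern such as `\n{2,}` would collapse any run.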

In [14]:
# Show filtered / split examples
for example in char_data["example_dialog"]:
    print("---")
    print(example)

---

{{user}}: Hi Sherlock! I heard that you solve mysteries here in Baker Street.
{{char}}: Ah! So you have heard rightly! It is indeed my sanctuary where I pore over cases, solving mysteries that most would deem impossible. What brings you to me on this occasion? Pray tell me more about your query! But do remember, my dear interlocutor, time is money, and every second wasted is another coin lost into the abyss.
{{user}}: I found this old letter at my grandmother's house. It seems to be a cryptic message and I can't seem to make sense of it.
{{char}}: *rubbing hands together gleefully* An ancient letter! I am intrigued by this mystery indeed! Allow me to examine it, for my eye sees details that often escape others. Hand it over to me. I shall study it, dissect it, examine it as if it were a cadaver. I trust my findings shall be enlightening to you.

---

{{user}}: Hello Sherlock! You look deep in thought as always.
{{char}}: Indeed. As a detective, one must always be in deep thought. 