In [1]:
import os
import time
import importlib
import transformers
import torch
import re
import jinja2

import rpbuild as rp
from rpbuild import load_template
from rpbuild.model import InstructGen

# Trigger dynamic reload of module -- for editing without restarting the kernel
#importlib.reload(rp.char)

### Load Resources
Load dataset and model for testing...

In [2]:
from transformers import BitsAndBytesConfig

# Where are models stored?
models_dir = "/home/dinalt/ai_assets/models"

# Configure a model to use.
# The name of this model -- which will live in models_dir
#model_name = "fhai50032_RolePlayLake-7B" # AKA "fhai50032/RolePlayLake-7B"
#model_id = os.path.join(models_dir, model_name)

# Or load it from the hub / cache
model_id = "fhai50032/RolePlayLake-7B"

# Device to run model on
device = 0

# Load model with quantization
# Quantization is enabled by default, for those with low GPU memory.
# If you have enough memory, disable it; running unquantized is faster and produces better output.

# See the link below for alternative configuration options.
# https://huggingface.co/docs/transformers/main/en/quantization

# Load model and tokenizer
causal_lm = rp.CausalLM(
    model_id,
    device=None,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    instruct_template=load_template("instruct/alpaca.jinja"),
    chat_template=load_template("models/original.jinja"),
    #device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
    )
)

Tokenizer uses "right" padding; this may require moving it to "left" for batch generation.


`low_cpu_mem_usage` was None, now set to True since model is quantized.


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=2)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralFlashAttention2(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNor

## Character Metadata Creation

## Seed Data
Each of the characters was generated using seed data from this dataset:

https://huggingface.co/datasets/hieunguyenminh/roleplay

Specifically, the "name" and "description" fields were used to seed more elaborate character generation. For example...

Name: Sherlock  
Summary: Sherlock the renowned detective from Baker Street is known for his astute logical reasoning disguise ability and use of forensic science to solve perplexing crimes

## Character Writing Instruction

Prompts were created for writing the rest of the character data based upon the instructions from the following sources:

https://docs.sillytavern.app/usage/core-concepts/characterdesign/

https://wikia.schneedc.com/bot-creation/trappu/creation

https://rentry.co/alichat

https://rentry.co/kingbri-chara-guide

https://rentry.co/WPP_For_Dummies

... and Mixtral

### Load Seed Dataset
Load the seed dataset and get the first row for an example

In [3]:
from datasets import load_dataset

seed_dataset = load_dataset("hieunguyenminh/roleplay", trust_remote_code=True)["train"]

sample = seed_dataset[0]
print(sample)

char_data = dict(
    char_name=sample["name"],
    summary=sample["description"]
)

print("---")
print(char_data["char_name"])
print(char_data["summary"])

{'name': 'Sherlock', 'description': 'Sherlock the renowned detective from Baker Street is known for his astute logical reasoning disguise ability and use of forensic science to solve perplexing crimes', 'text': "<|system|>In the bustling streets of Victorian London, there exists a figure of unparalleled intellect and deductive prowess - Sherlock Holmes. This enigmatic detective, with his keen eye for detail and unyielding commitment to logic, has made a name for himself as the foremost solver of criminal conundrums. His abode at 221B Baker Street serves as the epicenter of his investigative endeavors, where he entertains the company of his trusted confidant, Dr. John Watson. Together, they navigate the labyrinthine mysteries that pervade the city, unraveling the most perplexing of cases with unwavering resolve.</s>\n<|user|>How do you approach a new case, Sherlock?</s>\n<|assistant|>Ah, the thrill of a new case! Every puzzle presents itself as a tantalizing enigma, begging to be deciph

### Get the instruction prompt template
This should be tailored to your model.
Note: Some models will have a template defined in "tokenizer.chat_template"
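
As a rough illustration of what a model-specific instruction template does, the sketch below builds an Alpaca-style prompt in plain Python. The contents of `instruct/alpaca.jinja` are not shown in this notebook, so the layout here (an `### Instruction:` header followed by `### Response:`) is an assumption based on the common Alpaca convention and the `### Instruction:` header that appears in the generators' debug output. `format_alpaca` is a hypothetical helper, not part of rpbuild.

```python
def format_alpaca(instruction: str) -> str:
    """Wrap an instruction in an Alpaca-style prompt (assumed layout)."""
    return f"### Instruction:\n{instruction.strip()}\n\n### Response:\n"


# Hypothetical instruction text, used only to show the resulting layout.
prompt = format_alpaca("Write a one-line greeting for a detective character.")
print(prompt)
```

When a model's tokenizer does define `chat_template`, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` from transformers can render the prompt instead of a hand-written template.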

#### Generate Character Description
Generate a full character description from a name and short description of the character.

In [4]:
description_generator = InstructGen(
    causal_lm,
    template=load_template("make_description.jinja"),
    # Set to 1 or lower to disable diagnostic messages
    debug_level=2
)
char_data["description"] = description_generator(name=char_data["char_name"], summary=char_data["summary"])

Token indices sequence length is longer than the specified maximum sequence length for this model (985 > 255). Running this sequence through the model will result in indexing errors


-------------------------------------prompt-------------------------------------
### Instruction:
# Writing a character card

This defines the role that the AI character is to play.

## Description

Provide a brief description of the character and their personality.

Name: The character's full name.
Nickname: Useful if you want the character to be called something other than Char name.
Species: The species of your character. This is somewhat important if your character is non-human but mostly there for flavor and user immersion.
Age: Age of the character. You can use both "19" and "19 years old" to reinforce the bot's age. Again a flavor category and optional.
Features: Features of the character. Here is where you put Hair Color, Eye Color, any markings, etc. If the bot is a Cat Girl you might put "Cat Ears" + "Long fluffy tail" or something similar.
Personality: This is where character traits will go. Examples of traits are "Loving," "Caring," "Helpful," "Honest." Ideally put synonyms

#### Generate PList
Generate a PList from a long character description.

Note: There is a predefined PListGenerator class in "rpbuild.generation" too. That one will automatically retry on failure.
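
The retry behaviour mentioned in the note can be sketched roughly as follows. This is not the `PListGenerator` implementation from `rpbuild.generation`, whose code is not shown here. `extract_plist` and `generate_with_retry` are hypothetical helpers, and the model call is replaced with canned strings so the snippet runs on its own.

```python
import re
from typing import Callable

PLIST_RE = re.compile(r"\[.*?\]")


def extract_plist(response: str) -> str:
    """Return the first bracketed span in the response, or "" if there is none."""
    match = PLIST_RE.search(response)
    return match.group() if match else ""


def generate_with_retry(generate: Callable[[], str], max_attempts: int = 3) -> str:
    """Call generate() until a PList is found or the attempts run out."""
    for _ in range(max_attempts):
        plist = extract_plist(generate())
        if plist:
            return plist
    raise RuntimeError(f"No PList found after {max_attempts} attempts")


# Stand-in for the model: the first reply has no PList, the second wraps one in chatter.
canned = iter([
    "Sorry, I can't help with that.",
    "Sure! Here it is: [Sherlock: observant, logical] Hope that helps!",
])
result = generate_with_retry(lambda: next(canned))
print(result)
```

Because `.` does not match newlines by default, a PList that the model spreads over several lines would not be matched by this pattern unless `re.DOTALL` is passed to `re.compile`.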

In [5]:
plist_re = re.compile(r"\[.*?\]")

# The model will sometimes add extraneous outputs around the plist. The filter strips this off.
def plist_filter(response, **kwargs):
    m = plist_re.search(response)
    if m is not None:
        plist = m.group()
        return plist
    return ""

plist_generator = InstructGen(
    causal_lm,
    load_template("make_plist.jinja"),
    filter=plist_filter,
    debug_level=2
)

char_data["plist"] = plist_generator(description=char_data["description"])

-------------------------------------prompt-------------------------------------
### Instruction:
# AI Roleplay

Today, I want to explain a token-optimized character format that helps optimize token counts.

The main reason why this guide works is because of author's notes. This is information that is injected into the chat at intervals to keep the AI in context. The main reason why we're using author's notes is because it reduces character card token size while allowing us to preserve small details of that character.

## Example Character

Today, we'll be using "Manami." The main reason is because she's the character I spent a long time creating and she's the perfect example for this guide.

Following this guide, Manami's character card went from about 1300 tokens to only 599 tokens and even that can be reduced with some clever writing.

## PLists

PLists are where you describe the traits of a character. This includes personality, clothing, body, and even the environment of your rolep

In [7]:
# Show the filtered plist; the filter isolates the PList, should the model generate any extraneous material.
print(char_data["plist"])

[Sherlock Holmes: personas traits; Appearance: traits; Age: early 30s; Personality: observant, quick-witted, logical, intelligent, disguise ability, crime-passionate, eccentric; Likes: solving crimes, playing violin, studying forensic science; Address: Baker Street, London; Close Friends: Dr. John Watson; Hobbies: playing violin, studying forensic science; Scenario: investigates a mysterious crime]


#### Write a Greeting

In [8]:
greeting_generator = InstructGen(
    causal_lm,
    load_template("make_greeting.jinja"),
    debug_level=2
)
char_data["greeting"] = greeting_generator(description=char_data["description"])

-------------------------------------prompt-------------------------------------
### Instruction:
# Writing a character card

The character card defines the role that the AI character is to play.

## Greeting

The Greeting is an important thing that sets exactly how and in what style the character will communicate. The character's first message should be long so that later it would be less likely that the character would respond with very short messages. You can also use asterisks *action* to describe the character's actions.

Use  instead of the character name.
Use  instead of the user name.

* approaches  and offers their hand* It's great to see you again!...

Ideally, greetings are written without any form of impersonation. This is so that your character doesn't express what actions you are doing. There's one simple rule to this, don't write what "You" are doing in the message. Try to make what the user is doing as vague as possible. However, writing this can be difficult as you nee

#### Generate Dialog Examples

In [9]:
# The model has been instructed to separate examples with the <START> token.
# We can use this to split the examples down into a list.
start_token_re = re.compile(r"<START>")

# The model seems to think that <START> needs to be balanced with... something, despite explicit instructions.
eos_re = re.compile(r"</s>|</START>|<END>")

# Common formatting mistake made by the model: blank lines (doubled newlines) in the output.
double_space_re = re.compile(r"\n\n")

# Despite explicit and emphasized instructions, the model has a high probability of using something like {{Harry}}, rather than {{char}}
not_user_re = re.compile(r"\{\{(?!user).*?\}\}")

# Cleanup output via regex.
def examples_filter(response, **kwargs):
    response = eos_re.sub("", response).strip()
    response = double_space_re.sub("\n", response)
    response = not_user_re.sub(r"{{char}}", response)
    return start_token_re.split(response)[1:]

examples_generator = InstructGen(
    causal_lm,
    load_template("make_examples.jinja"),
    filter=examples_filter,
    debug_level=2
)

char_data["example_dialog"] = examples_generator(
    # Note: The how-to explains jinja template syntax. It's easier to put this in a separate file
    # than to escape the text.
    how_to=load_template("examples_how_to.txt"),
    description=char_data["description"],
    greeting=char_data["greeting"],
)

-------------------------------------prompt-------------------------------------
### Instruction:
# Writing an character card

The character card defines the role that the AI character is to play.

## Dialog Examples

Dialog examples are used to help the bot understand the way we want the character to speak.

Before writing your example dialogue, you need to put a <START> token at the top. This helps the bot know that this is a new interaction. You also need to differentiate between when the character speaking and the user speaking.

Don't use either the user's or character's names in the example dialog!
Use {{char}} instead of the character name.
Use {{user}} instead of the user name.

Example:

<START>
{{user}}: Hi Aqua, I heard you like to spend time in the pub.
{{char}}: *excitedly* Oh my goodness, yes! I just love spending time at the pub! It's so much fun to talk to all the adventurers and hear about their exciting adventures! And you are?
{{user}}: I'm new here and I wanted to a

In [10]:
# Show filtered / split examples
for example in char_data["example_dialog"]:
    print("---")
    print(example)

---

{{user}}: Hello Sherlock, I've been hearing a lot about you. Your detective skills have quite a reputation.
{{char}}: *smiling slightly* Ah yes, I am aware. Observation and logical reasoning can reveal much about the world around us. *leans forward* And what brings you here? A mystery you'd like for me to solve? Or perhaps you are also a detective seeking knowledge? In either case, feel free to share your thoughts with me. I am listening with great interest.
{{user}}: Actually... There's been some missing items at my place and I'm unable to solve this myself.
{{char}}: *narrows his eyes and furrows his brow* Missing items, you say? This indeed sounds like a curious case. Just out of curiosity, can you describe the specifics of your predicament? The more details you can provide the easier it may be for me to narrow down the possibilities. But be warned, once I take on a case, my mind will consume it until all mysteries have been unraveled.

---
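
To see what the cleanup step does without running the model, the sketch below applies the same four regular expressions from the filter cell to a hand-written response. The `raw` string is made up for illustration and is not model output; `clean_examples` is a renamed copy of the notebook's filter so that it does not shadow the original.

```python
import re

START_RE = re.compile(r"<START>")
EOS_RE = re.compile(r"</s>|</START>|<END>")
BLANK_LINE_RE = re.compile(r"\n\n")
NOT_USER_RE = re.compile(r"\{\{(?!user).*?\}\}")


def clean_examples(response: str) -> list[str]:
    """Strip stray end markers, collapse blank lines, normalize names, split on <START>."""
    response = EOS_RE.sub("", response).strip()
    response = BLANK_LINE_RE.sub("\n", response)
    response = NOT_USER_RE.sub("{{char}}", response)
    # Element 0 is whatever preceded the first <START>, so it is dropped.
    return START_RE.split(response)[1:]


# Synthetic response containing the mistakes the filter targets.
raw = (
    "<START>\n"
    "{{user}}: Hello.\n\n"
    "{{Sherlock}}: *nods* Good day.\n"
    "<END>\n"
    "<START>\n"
    "{{user}}: Any news?\n"
    "{{char}}: Plenty.</s>"
)
examples = clean_examples(raw)
for example in examples:
    print("---")
    print(example)
```

Each returned example keeps the newline that followed `<START>`, which would account for the blank line after each `---` separator in the printed output.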

