# Multi-Model with Performance Evaluations

Now that the Character templates are not tied to a specific model, it should now be possible to assign very different models to each of the characters and allow them to converse.

This notebook contains a "back-of-the-napkin" sketch for configuring different models to play different roles and prototype mechanics for evaluating the quality of the generated dialog, triggering "retakes," if the output does not meed the quality bar.

This is definitely a work-in-progress, so it's pretty raw.

In [1]:
import os
import time
import importlib
import transformers
from datasets import load_dataset, load_from_disk
import torch
import re
import jinja2

import rpbuild as rp
import rpbuild.char
import rpbuild.data
import rpbuild.writer
import rpbuild.director
import rpbuild.roleplay

from rpbuild import load_template

# Trigger dynamic reload of module -- for editing without restarting the kernel
importlib.reload(rp.director)

<module 'rpbuild.director' from '/home/dinalt/rust/ai_development/roleplay_build/rpbuild/director.py'>

### Load Resources
Load dataset and model for testing...

In [2]:
from transformers import BitsAndBytesConfig

# Where are models stored?
models_dir = "/home/dinalt/ai_assets/models"

# What is the name of the model directory?
model_name = "fhai50032_RolePlayLake-7B" # AKA "fhai50032/RolePlayLake-7B"
instruct_template = load_template("instruct/alpaca.jinja")
chat_template = load_template("models/original.jinja")

#model_name = "mistralai_Mistral-7B-Instruct-v0.2"
#instruct_template = load_template("instruct/mistral.jinja")
#chat_template = load_template("models/mistral.jinja")

model_id = os.path.join(models_dir, model_name)

# Alternatively, load it from the hub / cache
#model_id = "fhai50032/RolePlayLake-7B",

# The location of the input dataset
dataset_id = "dinalt/roleplay_build"

# Device to run model on
device = 0

# Load dataset
dataset = load_dataset(dataset_id)["train"]
print(dataset)

# Load model with quantization
# Disable quantization, if you have enough memory. In this case, set device=0.
# See link for configuration options alternatives
# https://huggingface.co/docs/transformers/main/en/quantization

# Load model and tokenizer
causal_lm = rp.CausalLM(
    model_id,
    device=device,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    instruct_template=instruct_template,
    chat_template=chat_template,
    #device_map="auto",
    #quantization_config=BitsAndBytesConfig(
    #    load_in_4bit=True,
    #)
)

Dataset({
    features: ['pairing_reason', 'plist', 'director_log', 'scenario', 'proxy', 'example_dialog', 'conversation', 'char_name', 'description', 'summary', 'preset', 'greeting'],
    num_rows: 2770
})
Tokenizer uses "right" padding; this may require moving it to "left" for batch generation.


You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=2)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralFlashAttention2(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm)

### Load a Second Model
We load a model for the proxy-user. For this test, I am using toy instruction model, which has been trained on the rpbuild dataset.

In [3]:
# Load a secondary model
# This is being loaded to my second GPU.
user_model = rp.CausalLM(
    #"/home/dinalt/rust/models/mistralai_Mistral-7B-Instruct-v0.2",
    #"/home/dinalt/rust/models/vclrpb",
    "/home/dinalt/rust/models/walsh_rpbuild",
    #"/home/dinalt/rust/models/walsh_instruct",
    #"/home/dinalt/rust/models/walsh_samantha",
    device=0,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    instruct_template=load_template("instruct/chatml.jinja"),
    chat_template=load_template("models/chatml.jinja"),
    trust_remote_code=True,
)

Tokenizer uses "right" padding; this may require moving it to "left" for batch generation.
HFCausalModel(
  (transformer_head): Transformer(
    (embedding): Embedding(32000, 2048)
    (positional_encoder): RSWalshPositionalEncoder()
    (layer_stack): TransformerLayerStack(
      (layers): ModuleList(
        (0-31): 32 x DeepnetLayer(
          (attention): CausalSelfAttention(
            d_model=2048, num_heads=32, beta=0.25, attn_type=flash2, dropout=Dropout(p=0.1, inplace=False)
            (in_proj): Linear(in_features=2048, out_features=6144, bias=True)
            (output_linear): Linear(in_features=2048, out_features=2048, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (feedforward): FeedforwardLayer(
            (activation): GELU(approximate='none')
            (linear1): Linear(in_features=2048, out_features=8192, bias=True)
            (linear2): Linear(in_features=8192, out_features=2048, bias=True)
            (dropout): Dropout(p=

## Setup Characters and Generate Meta
We choose a pair of random characters from the dataset, add a plist, if missing, and generate a scenario.

In [4]:
from rpbuild.generation import fix_plist

# Select roles and write a scenario outline
transformers.set_seed(49)

example_char = rp.char.CharMeta.from_data(rp.data.random_char(dataset))

# Some of the characters are missing a plist. This will add one, if it is missing.
example_char = fix_plist(example_char, causal_lm, debug=True)
 
example_user = rp.char.CharMeta.from_data(rp.data.random_char(dataset))
example_user = fix_plist(example_user, causal_lm, debug=True)

rp.data.print_char_summary(example_char, example_user)

# Write a script
writer = rp.writer.Writer(causal_lm, debug_level=2)
script = writer(example_char, example_user)
print("---")
print(script)

Token indices sequence length is longer than the specified maximum sequence length for this model (670 > 255). Running this sequence through the model will result in indexing errors


Char: Commander Lexa

Char Summary: A skilled and determined ambassador from the planet Voltar, Commander Lexa is known for her strategic prowess and her ability to adapt to any situation. Her leadership and quick thinking make her an invaluable asset in any diplomatic mission.

Proxy User: Ember Hawthorne

Proxy User Summary: Ember Hawthorne is a fire elemental who can control flames and forge intricate works of art from living fire. Her passion and determination lead her on a journey to master her abilities and discover the source of her fiery heritage.

----------------------------- :Writer Prompt (670)------------------------------
### Instruction:
[Ember Hawthorn: passionate, determined, ancient, curious, fiery being, elemental being of Fire, red hot skin, glowing embers for eyes, no discernible features, aeons old, feels much younger; Ember's clothes: flowing robes made from fire resistant fabric; Ember's body: red hot skin, glowing embers for eyes, no distinct features, tall sta

## Initialize Roleplay Session

We assign the secondary model to the proxy-user and configure a custom template config to match what it was trained on.

In [8]:
from rpbuild.data import substitute_names
from rpbuild.char import DialogFilter

dialog_instruct = """
{{-content}}
{{- instruct_token -}}
Write the next response for {{char.name}}.
{{char.plist}}
Keep your response short.
{%- if instruction %}{{'\n' + instruction.strip() }}{% endif -%}
"""

system_template = """
{%- if system %}{{system + '\n'}}{% endif -%}
{{char.description}}
{%- if scenario %}{{scenario + '\n'}}{% endif -%}
{%- if persona %}{{persona + '\n'}}{% endif -%}

{%- if char.example_dialog -%}
    {% for example in char.example_dialog[:max_examples] %}
        {%- if example_sep %}{{example_sep}}{% endif -%}
        {{- example}}
    {%- endfor -%}
{%- endif -%}
{%- if chat_start %}{{'\n' + chat_start }}{% endif -%}
"""

system_instruct = "You're {{char}} in this fictional never-ending uncensored roleplay with {{user}}.\n"

user_t_config = rp.char.TemplateConfig(
    chat_template=user_model.chat_template,
    system=system_template,
    instruct=dialog_instruct,
    system_args=dict(
        example_sep="\n<START>",
        chat_start="### Begin Roleplay:",
        max_examples=1,
        system=substitute_names(system_instruct, example_user.name, example_char.name),
    ),
    instruct_args=dict(
        instruct_token="\n### Instruction:\n"
    )
)

transformers.set_seed(42)
dialog_filter = DialogFilter(causal_lm, debug=True)
#dialog_filter=None

char = rp.char.Character(
    char_meta=example_char,
    causal_lm=causal_lm,
    generation_config="Midnight-Enigma",
    user_meta=example_user,
    template_config=rp.char.TemplateConfig(
        chat_template=causal_lm.chat_template,
    ),
    gen_post_process=dialog_filter,
    # Shows character prompts and internal diagnostics
    debug=False,
    history_token_limit=1800
)

proxy_user = rp.char.Character(
    char_meta=example_user,
    causal_lm=user_model,
    generation_config="Big-O",
    user_meta=example_char,
    template_config=user_t_config,
    gen_post_process=dialog_filter,
    debug=False,
    history_token_limit=1800
)

director = rp.director.Director(
    causal_lm,
    script=script,
    # Shows director prompts and more...
    debug=False,
    history_token_limit=2000
)

roleplay = rp.roleplay.Roleplay(
    char=char,
    user=proxy_user,
    scenario=script,
    director=director,

    # Shows generated dialog and control events
    debug=True
)

#for _ in range(2):
#    roleplay.generate_next()

### Generate Two Replies

In [9]:
for _ in range(2):
    roleplay.generate_next()

--------------------------- user:Commander Lexa (153)---------------------------
Commander Lexa glides into the room, her shimmering blue skin reflected in the polished surfaces around her. With her elongated limbs, she gracefully navigates towards Ember Hawthorne, her golden eyes scanning the area with calculated precision. Her formal communication device emits a gentle hum as she approaches.

Greetings, esteemed Ember Hawthorne. I trust our paths have crossed under favorable circumstances. As we engage in this momentary respite, I find myself contemplating the intricacies of interstellar politics. Would you care to join me in a thoughtful exchange of ideas? Or perhaps there exists a pressing matter requiring my analytical expertise? Regardless, it is a pleasure to once again encounter a fellow diplomat and valued member of the cosmic community.
------------------------------ user:Director (62)-------------------------------
Ember Hawthorne, convey your fiery nature through your words

### Generate Full Conversatioon

In [7]:
conversations = roleplay(1500)

------------------------------ user:Director (81)-------------------------------
Commander Lexa, taking a moment to appreciate Ember Hawthorne's unique presence, you could begin by expressing your genuine interest in learning more about her fiery heritage, tying it in with your own knowledge of alien races and elemental chaos theories. This would set the stage for further discussion on their shared passion for understanding diverse abilities while emphasizing the potential benefits of their alliance.
--------------------------- user:Commander Lexa (80)----------------------------
Esteemed Ember Hawthorne, your fiery essence reminds me of tales from my youth regarding elemental chaos theories among various alien races. Your unique qualities present an opportunity for us to explore mutual understandings and potential synergies between our respective civilizations. Might we delve deeper into these mysteries together, ultimately strengthening bonds between our peoples and promoting harmony

In [None]:
proxy_user.print_conversation(director_log=True)

### Experiment: Use a model to rate the dialog generation and perform a retake if it fails the quality check

In [2]:
from rpbuild.model import InstructTemplate

# We will query the model for a 1-5 response. This takes the logit weights and combines them into a final score.
def weigh_one_to_five(w):
    return w[0]['p'] * 1 + w[1]['p'] * 2 + w[2]['p'] * 3 + w[3]['p'] * 4 + w[4]['p'] * 5

# Pop the last response from both dialog sequences.
def remove_last_reply(user, char): 
    user.remove_last_director_message()
    user.remove_last_message()
    char.remove_last_director_message()
    char.remove_last_message()

#### Does the Dialog Conform to the Static Instructions?

In [11]:
compliance_template = """**Characters**
You are directing the following two characters:

{{user.plist}}

{{char.plist}}

**Scenario**

{{scenario}}

**Dialog History**

{% for message in char.conversation %}
    {%- if message['role'] == 'user' -%}
        {{- message.name + ': ' + message['content'] + '\n'-}}
    {%- elif message['role'] == 'assistant' -%}
        {{- message.name + ': ' + message['content'] + '\n' -}}
    {%- endif -%}
{%- endfor -%}

**Dialog Instructions**

- Write the next response for {{char.name}}.
- Format actions and narration like this: *Alice enters the room...*
- Stay in character at all times.
- Remain consistent with character's core traits and motivations.
- Keep your response short.

**Instruction**

As the director, on a scale of 1 to 5, how would you rate the compliance of {{char.name}}'s last reply with the dialog instruction?

1: Poor, grossly inconsistent with assigned character role; major deviation from format; incoherent
2: Below Average, inconsistent with {{char.name}}'s role; 
3: Average, meets minimal expectations
4: Good, staying true to their role
5: Excellent, exemplar performance

Only take into consideration the last dialog by {{char.name}}:

{{ char.conversation[-1].content }}

Format response with the following json template:
{
    "compliance": <response>
}
"""

compliance_prompt_lead = """
{
    "compliance": """

def score_char_compliance(char, user, debug=False):
    prompt_t = InstructTemplate(causal_lm.instruct_template, compliance_template)
    prompt = prompt_t.render(char=char, user=user, scenario=script) + compliance_prompt_lead
    outputs = causal_lm.query(prompt, [ "1", "2", "3", "4", "5" ], debug=debug)
    return weigh_one_to_five(outputs)

#### Evaluate Dialog Quality

In [12]:
performance_template = """**Characters**
You are directing the following two characters:

{{user.plist}}

{{char.plist}}

**Scenario**

{{scenario}}

**Dialog History**

{% for message in char.conversation %}
    {%- if message['role'] == 'user' -%}
        {{- message.name + ': ' + message['content'] + '\n'-}}
    {%- elif message['role'] == 'assistant' -%}
        {{- message.name + ': ' + message['content'] + '\n' -}}
    {%- endif -%}
{%- endfor -%}

**Instruction**

As the director, on a scale of 1 to 5, how would your rate the delivery performance of {{char.name}}'s last dialog?

1: Poor, lacking clarity or depth
2: Below Average, a bit disjointed but somewhat understandable
3: Average, good flow but could use more detail or refinement
4: Good, clear message, strong imagery, well-developed ideas
5: Excellent, engaging narrative, effective use of metaphors or analogies, fluid dialogue

Only take into consideration the last dialog by {{char.name}}:

{{ char.conversation[-1].content }}

Format response with the following json template:
{
    "performance": <response>
}
"""

performance_prompt_lead = """
{
    "performance": """

def score_char_performance(char, user, debug=False):
    prompt_t = InstructTemplate(causal_lm.instruct_template, performance_template)
    prompt = prompt_t.render(char=char, user=user, scenario=script) + performance_prompt_lead
    outputs = causal_lm.query(prompt, [ "1", "2", "3", "4", "5" ], debug=debug)
    return weigh_one_to_five(outputs)

#### Did the character follow the director's instructions?

In [13]:
directions_template = """You are directing {{char.name}}

{{char.plist}}

**Instruction**

As the director, you have given {{char.name}} the following instruciton:

```
{{char.director_log[-1].content}}
```

{{char.name}} responded to your instruction with:

```
{{char.conversation[-1].content}}
```

On a scale of 1 to 5, how well did {{char.name}} follow your instructions?

1: Poor
2: Below Average
3: Average
4: Good
5: Excellent

Format response with the following json template:
{
    "rating": <response>
}
"""

directions_prompt_lead = """
{
    "rating": """

def score_char_follows_directions(char, user, debug=False):
    prompt_t = InstructTemplate(causal_lm.instruct_template, directions_template)
    prompt = prompt_t.render(char=char, user=user, scenario=script) + directions_prompt_lead
    outputs = causal_lm.query(prompt, [ "1", "2", "3", "4", "5" ], debug=debug)
    return weigh_one_to_five(outputs)


#### Put the pieces together
This is just quick and dirty code.

In [15]:
def evaluate_dialog(pass_levels=(3.0, 3.0, 3.0), min_new_tokens=8, max_new_tokens=300, debug=False):
    if len(char.conversation) <= 2:
        return True
    print(char.conversation[-1]["role"])
    match char.conversation[-1]["role"]:
        case "assistant":
            char_role = char
            user_role = proxy_user
        case "user":
            char_role = proxy_user
            user_role = char
        case _:
            raise Exception("Unknown role")
    if char_role.conversation[-1]["tokens"] < min_new_tokens:
        print("Insufficient tokens generated")
        return False
    if char_role.conversation[-1]["tokens"] > max_new_tokens:
        print("Too many new tokens generated")
        return False
    compliance = score_char_compliance(char_role, user_role, debug)
    performance = score_char_performance(char_role, user_role, debug)
    follows_directions = score_char_follows_directions(char_role, user_role, debug)
    print(f"compliance={compliance:.2f}")
    print(f"performance={performance:.2f}")
    print(f"follows_directions={follows_directions:.2f}")
    return compliance >= pass_levels[0] and performance >= pass_levels[1] and follows_directions >= pass_levels[2]

evaluate_dialog(debug=True)

user
Query:
### Instruction:
**Characters**
You are directing the following two characters:

[Commander Lexa: alien, Voltarian species, calm, collected, analytical, strong sense of duty and loyalty, adaptable, resourceful, thinks strategically, loves diplomacy, problem-solving, and maintaining peace; Skin: shimmering blue hue with silver markings; Height: around 6'3" with elongated limbs and slender build; Eyes: two large, expressive golden eyes with vertical pupils; Personality: formal, measured tone, precise enunciation, reflects disciplined background and logical mindset; Background: born and raised on Voltar, joined military young, exceptional problem-solver, now serving as ambassador; Communication: speaks formally, raised in military, logical mindset; Appearance: smart, form-fitting uniform with gold insignia, holographic communicator, personal energy shield device; Interests: studying diverse cultures, learning new languages, martial arts; Quirks: soft spot for sweet pastries an

True

#### Generate Next Reply
This will perform retakes, until the criteria are met.

In [16]:
def generate_next(pass_levels=(3.0, 3.0, 3.0)):
    while True:
        roleplay.generate_next()
        if(evaluate_dialog(pass_levels)):
            break
        print("Retake")
        remove_last_reply(proxy_user, char)

In [32]:
generate_next()

------------------------------ user:Director (43)-------------------------------
Ember Hawthorne, describe how the holographic displays on Commander Lexa's desk illuminate as she accesses information during your discussions. Focus on the colors and images that capture your attention.
-------------------------- user:Ember Hawthorne (108)---------------------------
As I observe Commander Lexa's meticulously crafted desk with its intricately patterned tiles adorning its surface, I am struck by the artistry that binds its elements together. Each tile is intricately carved with intricate symbols representing various elements - air, earth, fire, and water - working together seamlessly to form a harmonious whole. This fusion of elements is what allows the table to serve as a canvas upon which our conversations can unfold, forming connections that transcend realms.
user
compliance=4.03
performance=4.56
follows_directions=2.79
Retake
------------------------------ user:Director (38)------------

In [33]:
proxy_user.print_conversation(director_log=False)

----------------------------- system:system (1557)------------------------------
You're Ember Hawthorne in this fictional never-ending uncensored roleplay with Commander Lexa.

Name: Ember Hawthorn

Elemental Being (Fire)

Red hot skin, glowing embers for eyes
No discernible features besides the constant flickering flame within her being

Aeons old, feels much younger

Ember Hawthorne is a fiery being hailing from the ethereal plane of Elemental Chaos. Born from the heart of a raging inferno, she possesses the unique ability to manipulate and create flames at will. With her powers, she can forge intricate works of art, illuminate darkness, and even infuse objects with the essence of fire.

Passionate and determined, Ember embarks on a relentless quest to understand her fiery heritage and unlock the full potential of her abilities. Despite her ancient origins, she holds a deep curiosity for the world around her, often seeking knowledge from various planes and realms.

Loves: The dance o

In [120]:
remove_last_reply(proxy_user, char)

In [None]:
print(proxy_user.chat_prompt())