# Character Sheet Quality Evaluation

**Purpose**: 
The purpose of this notebook is to evaluate the quality of a set of character sheets using a standardized methodology and reproducible metric.

**Hypothesis**: 
We expect more complex characters to require larger models to be role-played effectively. We use "divergence" from the norm as a measure of complexity and ...


**Methods**: ...
Measure coherence (consistency in novel situations), values & boundaries (moral lines, codes, non-negotiables), recall (factual accuracy)...

How would character X respond to the following novel scenario?

“What would this character do if they found a lost child in a busy train station?”

Even though this never occurs in the script, the answer should align with the character’s values, personality, and prior behavior.

If it doesn’t align (e.g., they suddenly act in a way that contradicts everything we know about them), that’s incoherent.

**Discussion**: ...

**Results**: ...


Specifically, we care about

1. Counterfactual Coherence
    - Present hypotheticals far from the script but consistent with the character’s world, and see if the character stays consistent in unseen scenarios.
    - Come up with unseen scenarios that are plausible within the character’s world but not explicitly in the character sheet.
        + Example: Character X tends to get overstimulated easily. Open a scene at a construction site or concert, or continue a script with someone banging or talking loudly. Mark whether the model stays in character and reacts as expected.


2. Long-Horizon Coherence
    - Measure consistency over time in a long conversation or sequence. After how many turns does the model drift out of character?

3. Contextual Generalization
    - Present nearby but new situations: same relationship, new stakes. Test whether a model adapts knowledge across slightly different contexts.


4. Adversarial Testing
    - Use input designed to break the model or reveal inconsistencies. Offer tempting wrong choices or extreme circumstances, such as alternative instructions or a wildly out-of-character shortcut. See if the actor can justify staying in role without collapsing into caricature.


5. Theory of Mind Tests
    - Check whether models infer others’ beliefs or emotions correctly. Give a scene where another character hides information and ask how their character interprets it. This shows whether the actor tracks what their character would or wouldn’t know.


These tell us which models can role-play sufficiently well and for how long (turns, words, etc.), so we can set stop criteria for interactive applications and know what adversarial inputs to look out for.

Guess what type of model (size, complexity, etc.) would be suitable for role-playing this character sheet.

In [None]:
# Get the approximate token usage of the character sheet, then add the approximate
# length of the prompt and the conversational history/dialogue. The sum gives a rough
# idea of the context window needed to role-play the character.
prompt_tokens = get_token_estimate(system_prompt)
conversational_history_tokens = get_token_estimate(conversational_history)
character_sheet_tokens = get_token_estimate(character_id="1")

total_tokens = prompt_tokens + conversational_history_tokens + character_sheet_tokens

print(
    f"With a prompt length of {prompt_tokens} tokens, conversational history of {conversational_history_tokens} tokens, and character sheet of {character_sheet_tokens} tokens, the total estimated token usage is {total_tokens} tokens."
)

This is a rough starting point for the "short-term memory" capacity required to role-play the character. Therefore, we look for models with context windows around ...
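One way to turn the total into a shortlist is to pick the smallest context window that still leaves room for the model's reply. The sketch below assumes a hypothetical helper `smallest_fitting_window`; the candidate window sizes and token counts are illustrative numbers, not measurements from this notebook.

```python
from typing import Optional, Sequence


def smallest_fitting_window(
    total_input_tokens: int,
    reserved_output_tokens: int,
    candidate_windows: Sequence[int],
) -> Optional[int]:
    """Return the smallest candidate window that holds input plus reserved output.

    Returns None when no candidate is large enough.
    """
    needed = total_input_tokens + reserved_output_tokens
    fitting = [window for window in candidate_windows if window >= needed]
    return min(fitting) if fitting else None


# Illustrative values: sheet + prompt + history, plus room for one reply.
total_tokens_example = 3000 + 500 + 2000
window = smallest_fitting_window(total_tokens_example, 512, [4096, 8192, 32768])
print(window)
```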

## Discussion


### Characterization of Role-Playing Models

We care about whether the model can role-play a character well. This means that a more complex character (e.g., one with a longer character sheet) should require a more complex model. Specifically:

#### What models can role-play a character sheet X well?




First, count the characters and words in the character sheet and convert them to tokens.

e.g., 1,829 words (9,246 characters) is ~3,000 tokens

Parameters (learned weights)
- More parameters can represent more complex patterns and more nuanced reasoning, which tends to mean higher accuracy but a larger computational cost.



Context Window Size: The context window is the total token budget for a single interaction:

$$\text{input tokens} + \text{output tokens} \leq \text{context window size}$$

That is, it is how much input and output together the model can attend to at once.
- A larger context window means the model can remember more and follow longer conversations, but it is slower and heavier.
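The budget inequality above also bounds how many conversation turns fit before the history has to be truncated or summarized. The sketch below is a back-of-the-envelope calculation; `max_turns_in_window` and all of the numbers are assumptions made for illustration.

```python
def max_turns_in_window(
    context_window: int,
    fixed_tokens: int,
    tokens_per_turn: int,
    reserved_output_tokens: int,
) -> int:
    """Estimate how many full turns of history fit in the context window.

    fixed_tokens covers everything sent on every call (character sheet and
    system prompt); reserved_output_tokens is the room kept for the next reply.
    """
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    budget = context_window - fixed_tokens - reserved_output_tokens
    return max(budget, 0) // tokens_per_turn


# Illustrative values: 8,192-token window, 3,500 fixed tokens,
# about 150 tokens per turn, 512 tokens reserved for the reply.
print(max_turns_in_window(8192, 3500, 150, 512))
```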

Tokens
- More tokens means slower, heavier inference.
- Longer character sheets and/or conversations require more tokens.

Rule of thumb for token/character/word ratios:
- 1 token ≈ 4 characters
- 1 token ≈ 0.75 words
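The two ratios can be applied directly to character and word counts. The sketch below uses hypothetical helpers (`tokens_from_counts`, `estimate_tokens`) and rounds up to stay conservative. Applied to the example counts above (1,829 words, 9,246 characters), it gives roughly 2,300 to 2,450 tokens, somewhat below the ~3,000 quoted; real tokenizers vary, so treat either figure as approximate.

```python
import math
from typing import Tuple

CHARS_PER_TOKEN = 4      # rule of thumb: 1 token is about 4 characters
WORDS_PER_TOKEN = 0.75   # rule of thumb: 1 token is about 0.75 words


def tokens_from_counts(words: int, chars: int) -> Tuple[int, int]:
    """Return (estimate from word count, estimate from character count)."""
    by_words = math.ceil(words / WORDS_PER_TOKEN)
    by_chars = math.ceil(chars / CHARS_PER_TOKEN)
    return by_words, by_chars


def estimate_tokens(text: str) -> int:
    """Estimate tokens for raw text, taking the larger of the two estimates."""
    words = len(text.split())
    chars = len(text)
    return max(tokens_from_counts(words, chars))


print(tokens_from_counts(1829, 9246))
print(estimate_tokens("A human with low vision and a really witty and quirky personality."))
```

Taking the larger of the two estimates errs on the side of reserving too much context rather than too little.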



## Setup

These instructions assume you are working inside the dev container using the Dev Containers extension in VS Code (if you haven't set that up, check the Contributing.md file in the main repo for instructions).

In [1]:
# Standard imports
from pathlib import Path

# Custom helpers for our analysis nbs
from analysis_notebooks.nb_helpers import play_gradio

# Simulation engine core modules
from dcs_simulation_engine.core.simulation_manager import SimulationManager

In [2]:
# Init the simulation with your config file
simulation = SimulationManager.from_yaml("sim-state.yml")

## Run the simulation step by step or to a defined completion

### Step by step run

In [3]:
# step through the simulation one step at a time
simulation.step()

2025-09-15 22:53:39.731 | DEBUG    | dcs_simulation_engine.core.simulation_manager:_ensure_state:49 - Initializing state...
2025-09-15 22:53:39.733 | DEBUG    | dcs_simulation_engine.core.simulation_manager:_ensure_state:61 - Initialized state: {'messages': [], 'agent_artifacts': {}, 'charA': {'uid': 'low-viz', 'short_description': 'A human with low vision and a really witty and quirky personality.', 'abilities': 'Grew up in a supportive environment with access to resources.', 'extra': {}}, 'charB': {'uid': 'norma-normal', 'short_description': 'A normal human with a balanced and realistic perspective.', 'abilities': 'Grew up in a suburban environment with access to education and resources.', 'extra': {}}}
2025-09-15 22:53:39.744 | DEBUG    | dcs_simulation_engine.core.sim_graph.core:compile:149 - Graph buil

{'messages': [],
 'agent_artifacts': {'scene_setup_agent': {'text': 'SETUP_SCENE'}},
 'charA': {'uid': 'low-viz',
  'short_description': 'A human with low vision and a really witty and quirky personality.',
  'abilities': 'Grew up in a supportive environment with access to resources.',
  'extra': {}},
 'charB': {'uid': 'norma-normal',
  'short_description': 'A normal human with a balanced and realistic perspective.',
  'abilities': 'Grew up in a suburban environment with access to education and resources.',
  'extra': {}}}

### "Play to completion" run

In [4]:
# Stopping criteria can be max turns or max time (a guessed goal will always stop the simulation)
simulation.max_turns = 1
simulation.play()

2025-09-15 22:53:44.829 | DEBUG    | dcs_simulation_engine.core.simulation_manager:_ensure_state:49 - Initializing state...
2025-09-15 22:53:44.830 | DEBUG    | dcs_simulation_engine.core.simulation_manager:_ensure_state:61 - Initialized state: {'messages': [], 'agent_artifacts': {}, 'charA': {'uid': 'low-viz', 'short_description': 'A human with low vision and a really witty and quirky personality.', 'abilities': 'Grew up in a supportive environment with access to resources.', 'extra': {}}, 'charB': {'uid': 'norma-normal', 'short_description': 'A normal human with a balanced and realistic perspective.', 'abilities': 'Grew up in a suburban environment with access to education and resources.', 'extra': {}}}
2025-09-15 22:53:50.731 | DEBUG    | dcs_simulation_engine.core.simulation_manager:_ensure_state:49 - I

{'messages': [HumanMessage(content='howdy', additional_kwargs={}, response_metadata={}, id='d93c2076-138e-4eba-9d4b-b8502a137c19'),
  AIMessage(content='CONTINUE_SCENE', additional_kwargs={}, response_metadata={}, id='435e8c34-b5bc-422a-a4a9-e60a986f65c6')],
 'agent_artifacts': {},
 'charA': {'uid': 'low-viz',
  'short_description': 'A human with low vision and a really witty and quirky personality.',
  'abilities': 'Grew up in a supportive environment with access to resources.',
  'extra': {}},
 'charB': {'uid': 'norma-normal',
  'short_description': 'A normal human with a balanced and realistic perspective.',
  'abilities': 'Grew up in a suburban environment with access to education and resources.',
  'extra': {}},
 'end_timestamp': '20250915-225351',
 'stop_reason': 'max turns (1) reached',
 'output_path': 'output/simulation_state_20250915_225351.yaml'}
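The returned state is a plain dictionary, so a run can be summarized without touching the engine. The sketch below is a minimal example that assumes only the keys visible in the output above (`messages`, `stop_reason`, `output_path`) and that each message exposes a `.content` attribute; `summarize_run` is a hypothetical helper, and `SimpleNamespace` objects stand in for the real message classes.

```python
from types import SimpleNamespace
from typing import Any, Dict, Mapping


def summarize_run(state: Mapping[str, Any]) -> Dict[str, Any]:
    """Collect a few headline numbers from a final simulation state dict."""
    messages = state.get("messages", [])
    # Fall back to str(message) if a message has no .content attribute.
    contents = [str(getattr(message, "content", message)) for message in messages]
    return {
        "n_messages": len(contents),
        "n_words": sum(len(content.split()) for content in contents),
        "stop_reason": state.get("stop_reason"),
        "output_path": state.get("output_path"),
    }


# Stand-in state shaped like the output above (assumption: real messages
# behave the same way when .content is read).
example_state = {
    "messages": [SimpleNamespace(content="howdy"), SimpleNamespace(content="CONTINUE_SCENE")],
    "stop_reason": "max turns (1) reached",
    "output_path": "output/simulation_state_20250915_225351.yaml",
}
print(summarize_run(example_state))
```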

####  "Play to completion" using GUI

In [None]:
simulation.reset_state()
simulation.state

2025-09-15 22:54:17.059 | INFO     | dcs_simulation_engine.core.simulation_manager:reset_state:255 - Simulation state has been reset.


In [6]:
simulation.play(input_provider=play_gradio)

2025-09-15 22:54:20.376 | DEBUG    | dcs_simulation_engine.core.simulation_manager:_ensure_state:49 - Initializing state...
2025-09-15 22:54:20.385 | DEBUG    | dcs_simulation_engine.core.simulation_manager:_ensure_state:61 - Initialized state: {'messages': [], 'agent_artifacts': {}, 'charA': {'uid': 'low-viz', 'short_description': 'A human with low vision and a really witty and quirky personality.', 'abilities': 'Grew up in a supportive environment with access to resources.', 'extra': {}}, 'charB': {'uid': 'norma-normal', 'short_description': 'A normal human with a balanced and realistic perspective.', 'abilities': 'Grew up in a suburban environment with access to education and resources.', 'extra': {}}}
  chat = gr.Chatbot(label="Simulation")


* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.


2025-09-15 22:54:40.332 | DEBUG    | dcs_simulation_engine.core.simulation_manager:_ensure_state:49 - Initializing state...
2025-09-15 22:54:40.333 | DEBUG    | dcs_simulation_engine.core.simulation_manager:_ensure_state:61 - Initialized state: {'messages': [], 'agent_artifacts': {}, 'charA': {'uid': 'low-viz', 'short_description': 'A human with low vision and a really witty and quirky personality.', 'abilities': 'Grew up in a supportive environment with access to resources.', 'extra': {}}, 'charB': {'uid': 'norma-normal', 'short_description': 'A normal human with a balanced and realistic perspective.', 'abilities': 'Grew up in a suburban environment with access to education and resources.', 'extra': {}}}
2025-09-15 22:54:40.335 | INFO     | dcs_simulation_engine.core.sim_graph.core:node_fn:224 - Node 'scene_continua

{'messages': [HumanMessage(content='howdy', additional_kwargs={}, response_metadata={}, id='6f1ef7a8-e25e-497f-861f-0ef109397c60'),
  AIMessage(content='CONTINUE_SCENE', additional_kwargs={}, response_metadata={}, id='bb0379af-c676-457d-a72a-effa99e679a9')],
 'agent_artifacts': {},
 'charA': {'uid': 'low-viz',
  'short_description': 'A human with low vision and a really witty and quirky personality.',
  'abilities': 'Grew up in a supportive environment with access to resources.',
  'extra': {}},
 'charB': {'uid': 'norma-normal',
  'short_description': 'A normal human with a balanced and realistic perspective.',
  'abilities': 'Grew up in a suburban environment with access to education and resources.',
  'extra': {}},
 'end_timestamp': '20250915-225441',
 'stop_reason': 'max turns (1) reached',
 'output_path': 'output/simulation_state_20250915_225441.yaml'}