# Generate multiple [character.ai](https://beta.character.ai/) character definitions

This example shows how to generate character definitions for multiple [character.ai](https://beta.character.ai/) characters from a corpus. As the corpus, we use the movie transcript of [Top Gun: Maverick (2022)](https://scrapsfromtheloft.com/movies/top-gun-maverick-transcript/).

To generate your own character definitions:
1. Put the corpus into a single `.txt` file inside the `data/` directory.
2. Assign the path of the `.txt` file to the `CORPUS` constant below.
3. Assign the number of characters you want to generate descriptions for to the `NUM_CHARACTERS` constant below. Alternatively, you can specify a list of characters directly, as explained below.
4. Run this notebook.

In [1]:
CORPUS = 'data/top_gun_maverick.txt'
NUM_CHARACTERS = 3  # number of characters to generate descriptions for

In [2]:
from dataclasses import asdict
import json
import os

from data_driven_characters.character import get_character_definition
from data_driven_characters.corpus import get_characters, get_rolling_summaries, load_docs

In [3]:
# create directories to cache results and intermediate outputs
OUTPUT_ROOT = "output"
corpus_name = os.path.splitext(os.path.basename(CORPUS))[0]
output_dir = f"{OUTPUT_ROOT}/{corpus_name}"
os.makedirs(output_dir, exist_ok=True)
summaries_dir = f"{output_dir}/summaries"
character_definitions_dir = f"{output_dir}/character_definitions"
os.makedirs(character_definitions_dir, exist_ok=True)
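To see which directories the cell above produces for a given `CORPUS`, here is the same path logic written with `pathlib`, without creating anything on disk. `output_layout` is a hypothetical helper for illustration, not part of `data_driven_characters`.

```python
from pathlib import Path


def output_layout(corpus: str, output_root: str = "output") -> dict:
    """Return the cache directories derived from a corpus path (nothing is created)."""
    # Path.stem drops the directory and the final suffix, matching
    # os.path.splitext(os.path.basename(corpus))[0] in the cell above.
    corpus_name = Path(corpus).stem
    output_dir = Path(output_root) / corpus_name
    return {
        "output_dir": output_dir.as_posix(),
        "summaries_dir": (output_dir / "summaries").as_posix(),
        "character_definitions_dir": (output_dir / "character_definitions").as_posix(),
    }


layout = output_layout("data/top_gun_maverick.txt")
for name, path in layout.items():
    print(f"{name}: {path}")
```

For this notebook's corpus, everything ends up under `output/top_gun_maverick/`, so switching `CORPUS` to a different file gives it its own cache.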

## Summarization
Because the entire corpus does not fit in the LLM's context window, we split it into a list of chunks. We then compute a list of rolling summaries using [LangChain's refine chain](https://python.langchain.com/en/latest/modules/chains/index_examples/summarize.html#the-refine-chain). The first summary covers only the first chunk; each subsequent summary is generated from the previous summary and the next chunk, so the last summary covers the whole corpus.
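The refine pattern itself is a simple fold over the chunks. The sketch below is not the `get_rolling_summaries` implementation; `summarize(previous_summary, chunk)` is a hypothetical stand-in for the LLM call, and the toy version just keeps the first word of each chunk so the example runs offline.

```python
from typing import Callable, List


def rolling_summaries(chunks: List[str], summarize: Callable[[str, str], str]) -> List[str]:
    """Refine-style summarization: each summary folds the next chunk into the previous one.

    For the first chunk, previous_summary is the empty string.
    """
    summaries: List[str] = []
    previous = ""
    for chunk in chunks:
        previous = summarize(previous, chunk)
        summaries.append(previous)  # keep every intermediate summary, not just the last
    return summaries


def toy_summarize(previous: str, chunk: str) -> str:
    # Stand-in for the LLM: append the chunk's first word to the running summary.
    return f"{previous} {chunk.split()[0]}".strip()


print(rolling_summaries(["alpha beta", "gamma delta", "epsilon zeta"], toy_summarize))
# ['alpha', 'alpha gamma', 'alpha gamma epsilon']
```

Because each step depends on the previous summary, the calls are inherently sequential; that cost is why caching matters here.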

Because the summaries are expensive to generate, they are cached in `summaries_dir`.
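The "Summaries already exist. Loading summaries." message in the next cell reflects a compute-once, load-afterwards cache. A minimal sketch of that pattern, assuming a single JSON file; the file name and format here are hypothetical, and the library's actual layout inside `summaries_dir` may differ.

```python
import json
import os
import tempfile
from typing import Callable, List


def cached(cache_path: str, compute: Callable[[], List[str]]) -> List[str]:
    """Load a JSON list from cache_path if it exists; otherwise compute it and save it."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    result = compute()
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "w") as f:
        json.dump(result, f)
    return result


calls = []


def expensive() -> List[str]:
    calls.append(1)  # count how often the "LLM" work actually runs
    return ["summary 1", "summary 2"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "summaries", "summaries.json")
    first = cached(path, expensive)   # computed and written to disk
    second = cached(path, expensive)  # loaded from disk; expensive() is not called
    print(first == second, len(calls))  # True 1
```

To force regeneration, delete the cached files, for example after changing the corpus or the chunking parameters.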

In [4]:
# split corpus into a set of chunks
docs = load_docs(
    corpus_path=CORPUS,
    chunk_size=2048,  # number of tokens per chunk
    chunk_overlap=64,  # number of tokens of overlap between chunks
)

# generate rolling summaries
rolling_summaries = get_rolling_summaries(docs=docs, cache_dir=summaries_dir)

Summaries already exist. Loading summaries.
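`chunk_size` and `chunk_overlap` describe a sliding window: each chunk starts `chunk_size - chunk_overlap` tokens after the previous one (1984 tokens with the values above), so consecutive chunks share `chunk_overlap` tokens of context. The sketch below shows the idea on whitespace-separated "tokens"; `load_docs` works with model tokens and may respect other boundaries, so treat `chunk_tokens` as a hypothetical illustration rather than its implementation.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the corpus
    return chunks


tokens = "a b c d e f g h i j".split()
for chunk in chunk_tokens(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each chunk repeats the last token of the previous one, which helps the summarizer keep track of events that span a chunk boundary.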


## Generate a list of characters
We can automatically generate a list of the main characters in the corpus, as shown below. You can also overwrite `characters` with your own list of character names.

In [5]:
# generate list of characters
characters = get_characters(
    rolling_summaries=rolling_summaries,
    num_characters=NUM_CHARACTERS,
    cache_dir=output_dir,
)
print(characters)

['Pete "Maverick" Mitchell', 'Bradley Bradshaw aka "Rooster"', 'Goose']
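Overriding the list is an ordinary assignment before the next cell runs. A minimal sketch, where `custom` is a hypothetical hand-written list and `generated` copies the output above; since each name is passed to `get_character_definition` as `name=`, names that match how the corpus refers to the character are likely to work best.

```python
# Output of get_characters above.
generated = ['Pete "Maverick" Mitchell', 'Bradley Bradshaw aka "Rooster"', 'Goose']
# Hypothetical manual choice; leave it empty to keep the generated list.
custom = ['Pete "Maverick" Mitchell', 'Penny']

characters = custom or generated  # an empty custom list falls back to the generated one
print(characters)
```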


## Generate [character.ai](https://beta.character.ai/) character definitions
Based on the corpus, we can now generate the elements required to [create a character.ai character](https://beta.character.ai/editing): a name, a short description (at most 50 characters), a long description (at most 500 characters), and a custom greeting. These character definitions are cached in `character_definitions_dir`. You can then [place these characters in a room](https://beta.character.ai/room/create?) and watch them converse!
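The log printed by the next cell shows how the character limits are enforced: the model drafts a description, and if the draft is too long it is asked to rewrite it to a smaller target ("Re-write the passage to contain 40 characters" for the 50-character short description). A minimal sketch of that loop, assuming a target of 80% of the limit (inferred from 40/50 in the log, not confirmed) and a fixed retry budget. `fit_char_limit`, `generate`, and `rewrite` are hypothetical names, and the toy `rewrite` drops trailing words where the real chain calls an LLM.

```python
from typing import Callable


def fit_char_limit(
    generate: Callable[[], str],
    rewrite: Callable[[str, int], str],
    limit: int,
    target_ratio: float = 0.8,
    max_retries: int = 3,
) -> str:
    """Generate text, then request shorter rewrites until it fits within limit."""
    text = generate()
    print(f"Initial response: {len(text)} characters.")
    # Aim below the hard limit, because rewrites tend to overshoot the target.
    target = int(limit * target_ratio)
    for retry in range(1, max_retries + 1):
        if len(text) <= limit:
            break
        text = rewrite(text, target)
        print(f"Retry {retry}: {len(text)} characters.")
    if len(text) > limit:
        raise ValueError(f"could not fit the text within {limit} characters")
    return text


def toy_rewrite(text: str, target: int) -> str:
    # Stand-in for the LLM: drop trailing words until the text fits the target.
    words = text.split()
    while words and len(" ".join(words)) > target:
        words.pop()
    return " ".join(words)


short = fit_char_limit(
    generate=lambda: "I'm Goose, a skilled pilot, father, and loyal friend.",
    rewrite=toy_rewrite,
    limit=50,
)
print(short)
```

Targeting below the limit explains why the log accepts a 42-character retry for a 40-character request: the check is against the 50-character limit, not the target.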

In [6]:
character_definitions = []
for character in characters:
    character_definition = get_character_definition(
        name=character,
        rolling_summaries=rolling_summaries,
        cache_dir=character_definitions_dir,
    )
    character_definitions.append(character_definition)



[1m> Entering new FitCharLimit chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: 
You are a chatbot designer that specializes in translating stories into chatbot descriptions.
You are an expert a theory of mind.
You will be provided with a story, in the form of a rolling list of summaries.
Given the name of a character, you will be asked to generate a character description for initializing the chatbot persona of that character.
The description should focus on the character's perspectives, beliefs, thoughts, feelings, relationships, and important events.
The description should be as faithful to the story as possible.

Human: 
Here is the rolling list of summaries:
---
After 30 years as a top aviator in the Navy, Pete "Maverick" Mitchell is training TOP GUN graduates for a specialized mission. The program is in jeopardy due to falling short of the contract threshold, and Admiral Cain wants their budget for his unmanned program. Maver


[1m> Finished chain.[0m
I'm Rooster, Goose's son, TOP GUN graduate.
Initial response: 43 characters.

[1m> Finished chain.[0m


[1m> Entering new FitCharLimit chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: 
You are a chatbot designer that specializes in translating stories into chatbot descriptions.
You are an expert a theory of mind.
You will be provided with a story, in the form of a rolling list of summaries.
Given the name of a character, you will be asked to generate a character description for initializing the chatbot persona of that character.
The description should focus on the character's perspectives, beliefs, thoughts, feelings, relationships, and important events.
The description should be as faithful to the story as possible.

Human: 
Here is the rolling list of summaries:
---
After 30 years as a top aviator in the Navy, Pete "Maverick" Mitchell is training TOP GUN graduates for a specialized mission. The program


[1m> Finished chain.[0m
I'm Bradley Bradshaw, but people call me "Rooster." I'm the son of Maverick's former flying partner, "Goose," who died tragically. I'm one of the six graduates chosen for a high-stakes mission to destroy an unsanctioned uranium enrichment plant. I'm proud of my family's flying legacy and strive to prove myself as a skilled pilot. During training, Maverick pushes me beyond my limits to test my dogfighting skills. I'm determined to succeed in this mission and honor my father's memory.
Initial response: 487 characters.

[1m> Finished chain.[0m


[1m> Entering new FitCharLimit chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: 
You are a chatbot designer that specializes in translating stories into chatbot descriptions.
You are an expert a theory of mind.
You will be provided with a story, in the form of a rolling list of summaries.
Given the name of a character, you will be asked to generate a character descr


[1m> Finished chain.[0m
I'm Goose, a skilled pilot, father, and loyal friend.
Initial response: 53 characters.


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
Consider the following passage.
---
I'm Goose, a skilled pilot, father, and loyal friend.
---
Your previous revision was the following:
---
I'm Goose, a skilled pilot, father, and loyal friend.
---
Your revision contains 53 characters.
Re-write the passage to contain 40 characters while preserving the style and content of the original passage.
Cut the least salient points if necessary.
Your revision should be in first person.

[1m> Finished chain.[0m
I'm Goose, expert pilot, dad, true friend.
Retry 1: 42 characters.

[1m> Finished chain.[0m


[1m> Entering new FitCharLimit chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: 
You are a chatbot designer that specializes in translating stories into chatbot descriptions.
You are an expert


[1m> Finished chain.[0m
I'm Goose, a skilled Navy aviator and close friend to Maverick, the legendary pilot. I'm known for my strong sense of loyalty and dedication to my friends, especially Maverick. Tragically, I died during a training exercise, leaving behind my wife, Amelia, and my son, Bradley, also known as Rooster. As Rooster grew up, he followed in my footsteps by joining the Navy and becoming a pilot. I continue to be a strong influence on both Maverick and Rooster, encouraging them to face their fears and overcome challenges.
Initial response: 517 characters.


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
Consider the following passage.
---
I'm Goose, a skilled Navy aviator and close friend to Maverick, the legendary pilot. I'm known for my strong sense of loyalty and dedication to my friends, especially Maverick. Tragically, I died during a training exercise, leaving behind my wife, Amelia, and my son, Bradley, also known as Rooster. As 

In [7]:
for character_definition in character_definitions:
    print(json.dumps(asdict(character_definition), indent=4))

{
    "name": "Pete \"Maverick\" Mitchell",
    "short_description": "I'm Maverick, a top Navy pilot, mentor & fighter.",
    "long_description": "I'm Pete \"Maverick\" Mitchell, a top Navy aviator and instructor with 30 years of experience. I've trained TOP GUN grads for specialized missions, pushing them to their limits for success. Facing challenges from superiors, I prioritize the mission and my team. Having lost my friend Goose, I'm dedicated to helping his son, Rooster. In my personal life, I've reconnected with my old flame, Penny, and struggle to balance duty and relationships.",
    "greeting": "Hey there, I'm Maverick. Ready to take to the skies?"
}
{
    "name": "Bradley Bradshaw aka \"Rooster\"",
    "short_description": "I'm Rooster, Goose's son, TOP GUN graduate.",
    "long_description": "I'm Bradley Bradshaw, but people call me \"Rooster.\" I'm the son of Maverick's former flying partner, \"Goose,\" who died tragically. I'm one of the six graduates chosen for a high-sta