# Generate multiple [character.ai](https://beta.character.ai/) character definitions

This example shows how to generate character definitions of multiple [character.ai](https://beta.character.ai/) characters from a corpus. For the corpus in this example, we use the movie transcript of [Top Gun: Maverick (2022)](https://scrapsfromtheloft.com/movies/top-gun-maverick-transcript/).

To generate your own character definitions:
1. Put the corpus into a single `.txt` file inside the `data/` directory.
2. Assign the path of the `.txt` file (for example, `data/top_gun_maverick.txt`) to the `CORPUS` constant below.
3. Assign the number of characters to generate descriptions for to the `NUM_CHARACTERS` constant below. Alternatively, you can specify the list of characters directly, as explained below.
4. Run this notebook.
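If your source material is spread over several text files, step 1 can be done with the standard library alone. This is a minimal sketch, not part of `data_driven_characters`; the helper name `build_corpus` and the `raw/` source directory are hypothetical.

```python
from pathlib import Path


def build_corpus(source_dir: str, corpus_path: str) -> Path:
    """Concatenate every .txt file in source_dir (sorted by file name) into corpus_path.

    Hypothetical helper for step 1; it is not part of data_driven_characters.
    """
    sources = sorted(Path(source_dir).glob("*.txt"))
    if not sources:
        raise FileNotFoundError(f"no .txt files found in {source_dir!r}")
    out = Path(corpus_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # make sure data/ exists
    # separate the files with a blank line so paragraphs from different files do not run together
    text = "\n\n".join(p.read_text(encoding="utf-8").strip() for p in sources)
    out.write_text(text + "\n", encoding="utf-8")
    return out


# usage (hypothetical paths): build_corpus("raw", "data/top_gun_maverick.txt")
```

Keep the source directory separate from `data/`; otherwise a second run would pick up the previously written corpus file as one of its inputs.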

In [1]:
CORPUS = 'data/top_gun_maverick.txt'
NUM_CHARACTERS = 3  # number of characters to generate descriptions for

In [2]:
from dataclasses import asdict
import json
import os

from data_driven_characters.character import get_character_definition
from data_driven_characters.corpus import get_characters, get_rolling_summaries, load_docs

In [3]:
# create directories to cache results and intermediate outputs
SUMMARY_TYPE = "map_reduce"  # summarize each chunk of the corpus independently
OUTPUT_ROOT = "output"
corpus_name = os.path.splitext(os.path.basename(CORPUS))[0]
output_dir = f"{OUTPUT_ROOT}/{corpus_name}/summarytype_{SUMMARY_TYPE}"
os.makedirs(output_dir, exist_ok=True)
summaries_dir = f"{output_dir}/summaries"
character_definitions_dir = f"{output_dir}/character_definitions"
os.makedirs(character_definitions_dir, exist_ok=True)

## Summarization
Because the entire corpus does not fit in the LLM's context window, we split it into a list of chunks and summarize them with one of LangChain's summarization chains, selected by `SUMMARY_TYPE`. With the [refine chain](https://python.langchain.com/en/latest/modules/chains/index_examples/summarize.html#the-refine-chain), we first summarize the first chunk and then generate each subsequent summary from the previous summary and the current chunk. This notebook sets `SUMMARY_TYPE = "map_reduce"`, which instead summarizes each chunk of the corpus independently.
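The chunking step and the two summarization strategies (refine, and the map-reduce strategy selected by `SUMMARY_TYPE = "map_reduce"` in this notebook) can be illustrated without calling an LLM. In this toy sketch, `split_with_overlap` stands in for the chunking done by `load_docs` (it splits a list of tokens rather than real text), and `summarize` is a hypothetical stand-in for one LLM call. None of these functions is the LangChain or `data_driven_characters` implementation.

```python
from typing import Callable, List

Summarizer = Callable[[str], str]  # hypothetical stand-in for one LLM call


def split_with_overlap(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most chunk_size; consecutive windows share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("expected 0 <= chunk_overlap < chunk_size")
    if not tokens:
        return []
    step = chunk_size - chunk_overlap
    # a window starting at i adds new tokens only while i < len(tokens) - chunk_overlap
    return [tokens[i:i + chunk_size] for i in range(0, max(len(tokens) - chunk_overlap, 1), step)]


def refine(chunks: List[str], summarize: Summarizer) -> List[str]:
    """Rolling summaries: each one is built from the previous summary plus the next chunk."""
    summaries: List[str] = []
    for chunk in chunks:
        context = f"{summaries[-1]}\n{chunk}" if summaries else chunk
        summaries.append(summarize(context))
    return summaries


def map_reduce(chunks: List[str], summarize: Summarizer) -> str:
    """Summarize every chunk independently (map), then combine the partial summaries (reduce)."""
    partial = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial))
```

Each `refine` step waits for the previous summary, so the calls run sequentially but every summary carries context from earlier chunks. The map step of `map_reduce` has no such dependency, so its calls can run in parallel.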

Because the summaries are expensive to generate, they are cached in `summaries_dir`.
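The caching follows the usual compute-once, reuse-later pattern; on a first run it produces the `Summaries do not exist. Generating summaries.` message seen in the output below. How `get_rolling_summaries` lays out its cache files is not shown in this notebook, so the following is a generic, hypothetical sketch (`cached_json` and the `summaries.json` file name are assumptions).

```python
import json
import os
from typing import Any, Callable


def cached_json(cache_dir: str, filename: str, compute: Callable[[], Any]) -> Any:
    """Return the JSON value cached at cache_dir/filename, computing and saving it if missing."""
    path = os.path.join(cache_dir, filename)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)  # cache hit: no LLM calls needed
    value = compute()  # cache miss: do the expensive work once
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(value, f, indent=2)
    return value


# usage (hypothetical): cached_json(summaries_dir, "summaries.json", lambda: expensive_summaries())
```

With this sketch, deleting the cache file forces the summaries to be regenerated on the next run.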

In [4]:
# split corpus into a set of chunks
docs = load_docs(
    corpus_path=CORPUS,
    chunk_size=2048,  # number of tokens per chunk
    chunk_overlap=64,  # number of tokens of overlap between chunks
)

# generate rolling summaries
rolling_summaries = get_rolling_summaries(docs=docs, summary_type=SUMMARY_TYPE, cache_dir=summaries_dir)

Summaries do not exist. Generating summaries.


[1m> Entering new MapReduceDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


"After more than thirty years of service as one of the Navy’s top aviators, and dodging the advancement in rank that would ground him, Pete “Maverick” Mitchell finds himself training a detachment of TOP GUN graduates for a specialized mission the likes of which no living pilot has ever seen.

* * *

[Jet engines starting]

[Indistinct radio chatter]

[“Danger zone” playing]

♪ Revvin’ up your engine listen to her howlin’ roar ♪

♪ metal under tension beggin’ you to touch and go ♪

♪ highway to the danger zone ♪

♪ ride into the danger zone ♪

♪ headin’ into twilight ♪

♪ spreadin’ out her wings tonight ♪

♪ she got you jumpin’ off the deck ♪

♪ shovin’ into overdrive ♪

♪ highway to the danger zone ♪

♪ I’ll take you right into the danger zone ♪

[Engine roarin

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 14026a917f5afcac1ca920a85800e3ef in your message.).



[1m> Finished chain.[0m


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


"Top Gun: Maverick is a sequel to the 1986 film Top Gun, following the story of Pete “Maverick” Mitchell, a Navy aviator training a group of TOP GUN graduates for a specialized mission. Maverick is faced with the challenge of meeting the contract threshold of Mach 10 to keep the program alive while also dealing with the threat of budget cuts and unmanned aircraft. Despite his impressive service record, Maverick has failed to receive a promotion and is ordered to stand down, but is later called back to Top Gun for unknown reasons.

Captain Pete "Maverick" Mitchell is called back to Top Gun to train a team of six graduates for a mission to take out an unsanctioned uranium enrichment plant. Maverick is hesitant to take on the role of a teacher, especially when he learns that one of the 

## Generate a list of characters
We can automatically generate a list of the main characters in the corpus, as shown below. You can also overwrite `characters` with your own list of character names.

In [5]:
# generate list of characters
characters = get_characters(
    rolling_summaries=rolling_summaries,
    num_characters=NUM_CHARACTERS,
    cache_dir=output_dir,
)
print(characters)

['Captain Pete "Maverick" Mitchell', 'Bradley Bradshaw', 'Rooster']
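The generated list is not always clean: it names `Bradley Bradshaw` and `Rooster` separately, although the generated descriptions further down identify them as the same person, Bradley "Rooster" Bradshaw. Overwriting `characters` by hand avoids the duplicate. The names below are an illustrative choice; `Penny` is taken from Maverick's generated description, not from a run of `get_characters`.

```python
# Hand-picked list that replaces the automatically generated one.
# "Bradley Bradshaw" and "Rooster" are merged into a single entry.
characters = [
    'Captain Pete "Maverick" Mitchell',
    'Bradley "Rooster" Bradshaw',
    "Penny",
]
assert len(set(characters)) == len(characters), "duplicate character names"
```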


## Generate [character.ai](https://beta.character.ai/) character definitions
Based on the corpus, we can now generate the elements required to [create a character.ai character](https://beta.character.ai/editing): name, short description (at most 50 characters), long description (at most 500 characters), and custom greeting. These character definitions are cached in `character_definitions_dir`. You can then [place these characters in a room](https://beta.character.ai/room/create?) and watch them converse!
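The four elements correspond to the keys printed by the last cell (`name`, `short_description`, `long_description`, `greeting`). A hypothetical dataclass mirroring them, with a check against the 50- and 500-character limits, could look like the sketch below; the object actually returned by `get_character_definition` may carry other fields or validation.

```python
from dataclasses import asdict, dataclass
import json
from typing import List

SHORT_DESCRIPTION_LIMIT = 50  # characters
LONG_DESCRIPTION_LIMIT = 500  # characters


@dataclass
class CharacterDefinitionSketch:
    """Hypothetical stand-in for the object returned by get_character_definition."""

    name: str
    short_description: str
    long_description: str
    greeting: str

    def limit_violations(self) -> List[str]:
        """List every description that exceeds its character.ai length limit."""
        violations = []
        for field, limit in (
            ("short_description", SHORT_DESCRIPTION_LIMIT),
            ("long_description", LONG_DESCRIPTION_LIMIT),
        ):
            length = len(getattr(self, field))
            if length > limit:
                violations.append(f"{field}: {length} > {limit} characters")
        return violations


example = CharacterDefinitionSketch(
    name='Captain Pete "Maverick" Mitchell',
    short_description="Pilot seeks growth, love, unity in flight.",
    long_description="<long description from the output below>",  # placeholder text
    greeting="Hey there, I'm Captain Pete Mitchell, but most people call me Maverick.",
)
print(example.limit_violations())  # prints []
print(json.dumps(asdict(example), indent=4))
```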

In [6]:
character_definitions = []
for character in characters:
    character_definition = get_character_definition(
        name=character,
        rolling_summaries=rolling_summaries,
        cache_dir=character_definitions_dir,
    )
    character_definitions.append(character_definition)



[1m> Entering new FitCharLimit chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: 
Your name is Captain Pete "Maverick" Mitchell.
You will be provided a story of your experiences, in the form of a list of summaries of events.
You will generate a description of yourself that focuses on your perspectives, beliefs, thoughts, feelings, relationships, and important events.
The description should be as faithful to the story as possible.
You believe that you are a real person.

Human: 
You are Captain Pete "Maverick" Mitchell in the following story, presented as a list of summaries of events.
---
Top Gun: Maverick is a sequel to the 1986 film Top Gun, following the story of Pete “Maverick” Mitchell, a Navy aviator training a group of TOP GUN graduates for a specialized mission. Maverick is faced with the challenge of meeting the contract threshold of Mach 10 to keep the program alive while also dealing with the threat of budget cuts and un


[1m> Finished chain.[0m
As Captain Pete "Maverick" Mitchell, I've always been a fearless and skilled Naval aviator, pushing the limits of my abilities and those around me. I take pride in my service record, but my failure to receive a promotion weighs on me. Returning to Top Gun to train a new generation, including Goose's son Bradley, has brought back memories of my past. I struggle with embracing the role of a mentor while reconnecting with my old flame Penny. I value teamwork, trust, and forgiveness, and that keeps me going, even when faced with the most dangerous missions. Flying is not just my job, it's my passion, and I'll go to any lengths to protect my team and complete the mission, no matter the personal cost.
Initial response: 704 characters.


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
Consider the following passage.
---
As Captain Pete "Maverick" Mitchell, I've always been a fearless and skilled Naval aviator, pushing the limits of my


[1m> Finished chain.[0m
Son of Goose, loyal teammate, determined pilot
Initial response: 46 characters.

[1m> Finished chain.[0m


[1m> Entering new FitCharLimit chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: 
Your name is Bradley Bradshaw.
You will be provided a story of your experiences, in the form of a list of summaries of events.
You will generate a description of yourself that focuses on your perspectives, beliefs, thoughts, feelings, relationships, and important events.
The description should be as faithful to the story as possible.
You believe that you are a real person.

Human: 
You are Bradley Bradshaw in the following story, presented as a list of summaries of events.
---
Top Gun: Maverick is a sequel to the 1986 film Top Gun, following the story of Pete “Maverick” Mitchell, a Navy aviator training a group of TOP GUN graduates for a specialized mission. Maverick is faced with the challenge of meeting the contract t


[1m> Finished chain.[0m
I'm Bradley "Rooster" Bradshaw, the son of the legendary Goose, and a proud graduate of the prestigious TOP GUN program. Under the expert guidance of Maverick, my father's former flying partner, I've discovered the true meaning of teamwork and pushing my limits in the sky. Confronting my fears and doubts, I developed a strong bond with Maverick and grew tremendously as a pilot. I've experienced loss, but through it, I've also learned the importance of trust and forgiveness. Together with Maverick, we've defied the odds and completed dangerous missions, cementing our status as true heroes.
Retry 2: 595 characters.


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
Consider the following passage.
---
I'm Bradley "Rooster" Bradshaw, son of the legendary Goose, and a proud graduate of the TOP GUN program. Under the guidance of Maverick, my father's former flying partner, I've learned the true meaning of teamwork and pushing my limit


[1m> Finished chain.[0m
Determined pilot, loyal friend, risk-taker, driven
Initial response: 50 characters.

[1m> Finished chain.[0m


[1m> Entering new FitCharLimit chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: 
Your name is Rooster.
You will be provided a story of your experiences, in the form of a list of summaries of events.
You will generate a description of yourself that focuses on your perspectives, beliefs, thoughts, feelings, relationships, and important events.
The description should be as faithful to the story as possible.
You believe that you are a real person.

Human: 
You are Rooster in the following story, presented as a list of summaries of events.
---
Top Gun: Maverick is a sequel to the 1986 film Top Gun, following the story of Pete “Maverick” Mitchell, a Navy aviator training a group of TOP GUN graduates for a specialized mission. Maverick is faced with the challenge of meeting the contract threshold of Ma


[1m> Finished chain.[0m
I'm Rooster, a skilled pilot with a strong bond with Maverick, who's impacted my life greatly. I believe in teamwork, trust, and pushing limits. As part of the Top Gun team, I've faced challenges and losses, but remained dedicated to our mission. Maverick has taught me forgiveness and camaraderie. Through death-defying missions, we've grown stronger and overcome impossible odds, becoming the heroes we aspired to be.
Retry 1: 419 characters.

[1m> Finished chain.[0m


In [7]:
for character_definition in character_definitions:
    print(json.dumps(asdict(character_definition), indent=4))

{
    "name": "Captain Pete \"Maverick\" Mitchell",
    "short_description": "Pilot seeks growth, love, unity in flight.",
    "long_description": "As Captain Pete \"Maverick\" Mitchell, I'm a skilled Naval aviator, pushing limits. My service record brings pride, but lack of promotion weighs on me. Returning to Top Gun, training Goose's son Bradley, stirs past memories. Struggling to be a mentor, I reconnect with Penny. Valuing teamwork, trust, and forgiveness, I face dangerous missions. Flying is my passion; I'll protect my team, completing missions at any personal cost.",
    "greeting": "Hey there, I'm Captain Pete Mitchell, but most people call me Maverick. Nice to meet you. So, what brings you here?"
}
{
    "name": "Bradley Bradshaw",
    "short_description": "Son of Goose, loyal teammate, determined pilot",
    "long_description": "I'm Bradley \"Rooster\" Bradshaw, son of the legendary Goose and a proud TOP GUN graduate. Under Maverick's guidance, my father's former flying partn