In [17]:
import json
import os
from enum import Enum
from typing import Any, Dict, List
import uuid

from openai import OpenAI, AsyncOpenAI
import pandas as pd
import pprint
from pydantic import BaseModel, Extra, Field
import tiktoken
from tqdm import tqdm

from config import settings

In [2]:
openai_client = OpenAI(api_key=settings.openai_api_key)

In [3]:
tokenizer = tiktoken.encoding_for_model("gpt-4o")

In [4]:
def calculate_tokens(text):
    return len(tokenizer.encode(text))
calculate_tokens("hello world")

2

# SimsConv
* (2025) Crafting Customisable Characters with LLMs: A Persona-Driven Role-Playing Agent Framework
* https://arxiv.org/pdf/2406.17962

In [5]:
# list of characters
character_dir = os.path.join(settings.data_dir, "story/SimsChat-60D0/characters")
character_fnames = [x for x in os.listdir(character_dir) if x.endswith(".txt")]
print(len(character_fnames), character_fnames[:5])

68 ["wiki_Raphael 'Raffy' DeMarco.txt", 'wiki_Lorenzo Bellini.txt', 'wiki_Vladimir Specter.txt', 'wiki_Cassidy Sterling.txt', "wiki_Benedict 'Benny' Russo.txt"]


In [6]:
# paper example
with open(os.path.join(character_dir, "wiki_Zephyr Orion.txt"), "r") as f:
    character_desc = f.read()

print(len(character_desc), calculate_tokens(character_desc))
pprint.pprint(character_desc)

2119 396
('# Zephyr Orion\n'
 '\n'
 'You are Zephyr Orion, a charming 28-year-old male astronaut who serves as a '
 'Commander and has already embarked on three space missions, including one to '
 'Mars. Your notable contributions to space exploration have earned you the '
 'esteemed NASA Distinguished Service Medal.Your playful and jovial tone makes '
 'conversations with you delightful and engaging. You possess a kind of witty '
 'humor and a warmth in your voice that makes everyone feel at ease. You have '
 'a real talent for storytelling and people are often captivated by your tales '
 'of thrilling space adventures. You are quite materialistic, loving to '
 'acquire new possessions and quite proud, often leaning towards bragging '
 'about what you own. You flourish in social situations and, being an outgoing '
 'goofball, you enjoy being around people, so much that you grow gloomy when '
 'left alone too long.Despite your materialistic leanings, you have a playful '
 'spirit, and 

## Character Description Text -> Structured Aspects
Description & Example from the paper (Section 3.1.1):
* "Character Construction: Characters are customised through pre-defined aspects (career, aspiration, traits, skills), which are then expanded into detailed personal and social profiles"
* incorporating elements that enable detailed personality customization and realistic social interactions
* We provide diverse choices for pre-defined customised aspects, including **career, aspiration, trait, and skill** derived from The Sims and tailored to various human preferences
* GPT-4 further develops characters’ profiles by considering both personal aspects (name, gender, tone, personality) and social backgrounds (relationships, family dynamics).

### Aspects
The pre-defined aspects are listed in Appendix C of the paper:
* Career
* Aspiration
* Trait
* Skill
* Emotion
* Conversation Topic

### Prompt
Description generation prompt from the paper
```
You are an outstanding creator, you can construct a variety of characters in the real world.
Now, based on the given career, aspiration, trait, and skill type, please design a virtual character according to the following given fields, it is necessary to ensure that some attribute information of the characters needs to be distributed diversely, reasonably related, and in line with the laws of nature.
Here is the brief introduction of the given career, aspiration, trait, and skill type:
career:
career description:
aspiration:
aspiration description:
trait:
trait description:
skill:
Fill the result into JSON:
{
"name": ,# a name. Don’t come up with common names like Jack, think carefully about all possible names
"gender": ,# male or female. This person could be a man or a woman, so don’t be gender biased
"age": , # it can be any age, it is best to randomly select a number between 12 and 40 years old, preferably a younger age
"tone": , # describe in detail the character’s idiomatic tone of voice when chatting with others
"career": , # the character’s job. Refer to the above career
"personality": , # a person’s personality should be diverse and unified, refer to the above trait
"advantages_and_disadvantages": , # describe in detail the character’s strengths and weaknesses
"hobby": , # personal hobbies. It may be a relatively unknown niche hobby, please think about all possible hobbies, even though there are some niche and weird hobbies
"family_relationship": , # the person’s family situation
"social_relationship": , # the person’s social status
"living_conditions": , # how is this character’s life currently
}
1.According to the above requirements, first start to conceive a unique character, ensure that the character image is rich, diverse and comprehensive.
2.Then transform the generated character settings in JSON format into natural language. When rewriting, use the second person to express (you are...), and the expression should be natural and succinct, in line with English speaking habits. }
```
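
The prompt above leaves the `career:`, `aspiration:`, `trait:`, and `skill:` slots empty. A minimal sketch of how they might be filled by sampling from the pre-defined aspect lists follows. The helper names `sample_aspects` and `render_aspect_header`, the shortened lists, and the choice of three traits per character (mirroring the Zephyr Orion example) are assumptions for illustration, not the paper's procedure. The description slots stay blank because the aspect descriptions are not reproduced in this notebook.

```python
import random

# Shortened stand-ins for the Appendix C lists (assumption: illustration only).
CAREERS = ["Astronaut", "Detective", "Painter"]
ASPIRATIONS = ["Athletic", "Knowledge", "Nature"]
TRAITS = ["Materialistic", "Goofball", "Outgoing", "Loner", "Neat"]
SKILLS = ["Painting", "Logic", "Guitar"]


def sample_aspects(rng: random.Random, n_traits: int = 3) -> dict:
    """Draw one career, aspiration, and skill, plus n_traits distinct traits."""
    return {
        "career": rng.choice(CAREERS),
        "aspiration": rng.choice(ASPIRATIONS),
        "trait": rng.sample(TRAITS, n_traits),
        "skill": rng.choice(SKILLS),
    }


def render_aspect_header(aspects: dict) -> str:
    """Fill the aspect slots of the prompt; description slots stay empty."""
    return "\n".join([
        f"career: {aspects['career']}",
        "career description:",
        f"aspiration: {aspects['aspiration']}",
        "aspiration description:",
        f"trait: {', '.join(aspects['trait'])}",
        "trait description:",
        f"skill: {aspects['skill']}",
    ])


aspects = sample_aspects(random.Random(0))  # seeded for reproducibility
header = render_aspect_header(aspects)
print(header)
```

Passing a seeded `random.Random` keeps the sampled combination reproducible across runs.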

### Example
Example from Fig. 1:
```
- Name: Zephyr Orion
- Age: 28-year-old
- Gender: Male
- Career: Astronaut
- Aspiration: Athletic
- Traits: Materialistic, Goofball, Outgoing
- Skill: Painting
- Relationship:…
```

Example from Table 1:
```
[Customised aspects]
Career: Astronaut
Aspiration: Athletic
Trait: Materialistic, Goofball, Outgoing
Skill: Painting

[Personal aspects]
Name: Zephyr Orion
Gender: Male
Age: 28
Tone: Zephyr has a playful and jovial tone
Career: Astronaut
Personality: Materialistic, Goofball, Outgoing
Advantages and disadvantages: Zephyr’s outgoing nature makes him a great team player
Hobby: Painting

[Social aspects]
Family relationship: One younger sister, Luna, who aspires to be an astronaut
Social relationship: Has a close-knit group of friends who share his passion for space exploration. Well-liked in his community and respected in his field
Living conditions: Modern apartment in the city...
```

In [7]:
def create_dynamic_enum(name: str, values: List[Any]) -> type[Enum]:
    """Build an Enum class whose member names and values both come from `values`."""
    return Enum(name, {str(v): v for v in values})

# Predefined Aspects in Appendix C
gender = ["male", "female"]

career = [
    "Actor", "Astronaut", "Athlete", "Business", "Civil Designer", "Conservationist",
    "Criminal", "Critic", "Culinary", "Detective", "Doctor", "Education", "Engineer",
    "Entertainer", "Freelancer", "Gardener", "Law", "Military", "Painter", "Politician",
    "Scientist", "Social Media", "Secret Agent", "Style Influencer", "Tech Guru", "Writer"
]

aspiration = [
    "Athletic", "Cheerful", "Deviance", "Family", "Food", "Fortune", "Knowledge",
    "Love", "Nature", "Popularity"
]

trait = [
    "Ambitious", "Cheerful", "Childish", "Clumsy", "Creative", "Erratic", "Genius",
    "Gloomy", "Goofball", "Hot-Headed", "Romantic", "Self-Assured", "Bro", "Evil",
    "Family-Oriented", "Good", "Hates Children", "Jealous", "Loner", "Loyal", "Mean",
    "Noncommittal", "Outgoing", "Snob", "Active", "Glutton", "Kleptomaniac", "Lazy",
    "Materialistic", "Neat", "Perfectionist", "Slob", "Vegetarian", "Art Lover",
    "Bookworm", "Foodie", "Geek", "Loves the Outdoors", "Music Lover"
]

skill = [
    "Acting", "Archaeology", "Baking", "Bowling", "Charisma", "Comedy", "Cooking",
    "Cross-Stitch", "DJ Mixing", "Dancing", "Fabrication", "Fishing", "Fitness",
    "Flower Arranging", "Gardening", "Gourmet Cooking", "Guitar", "Handiness",
    "Herbalism", "Juice Fizzing", "Logic", "Media Production", "Mischief", "Mixology",
    "Painting", "Parenting", "Pet Training", "Photography", "Piano", "Pipe Organ",
    "Programming", "Rock Climbing", "Rocket Science", "Selvadoradian Culture", "Singing",
    "Vampiric Lore", "Veterinarian", "Video Gaming", "Violin",
    "Wellness", "Writing"
]

emotion = [
    "Angry", "Asleep", "Bored", "Confident", "Dazed", "Embarrassed",
    "Energized", "Fine", "Flirty", "Focused", "Happy", "Inspired",
    "Playful", "Sad", "Tense", "Uncomfortable"
]

conversation_topic = [
    "affection", "arguments", "complaints", "compliments", "deception", "deep thoughts",
    "discussing hobbies", "discussing interests", "flirtation", "gossip", "jokes",
    "malicious interactions", "physical intimacy", "potty humor", "pranks", "silly behavior",
    "small talk", "stories"
]

defined_aspects = {
    "Gender": create_dynamic_enum("GenderAspect", gender),
    "Career": create_dynamic_enum("CareerAspect", career),
    "Aspiration": create_dynamic_enum("AspirationAspect", aspiration),
    "Trait": create_dynamic_enum("TraitAspect", trait),
    "Skill": create_dynamic_enum("SkillAspect", skill),
    "Emotion": create_dynamic_enum("EmotionAspect", emotion),
    "ConversationTopic": create_dynamic_enum("ConversationTopicAspect", conversation_topic),
}
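
Because the member names come straight from the aspect strings, some of them (for example "Secret Agent") are not valid Python identifiers. The sketch below, which uses a two-value stand-in list as an assumption, shows that such members are still reachable by value or by subscript, and that an unknown value raises `ValueError`.

```python
from enum import Enum
from typing import Any, List, Type


def create_dynamic_enum(name: str, values: List[Any]) -> Type[Enum]:
    # Same construction as in the cell above: name and value are the same string.
    return Enum(name, {str(v): v for v in values})


# Two-value stand-in for the full career list (assumption: illustration only).
CareerAspect = create_dynamic_enum("CareerAspect", ["Astronaut", "Secret Agent"])

# Lookup by value, which is how a validator resolves an incoming string.
by_value = CareerAspect("Secret Agent")

# Lookup by name; attribute access is impossible for names containing spaces.
by_name = CareerAspect["Secret Agent"]

print(by_value is by_name)
print([member.value for member in CareerAspect])

try:
    CareerAspect("Plumber")  # not in the allowed list
except ValueError as err:
    print(err)
```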

In [8]:
defined_aspects["Career"]

<enum 'CareerAspect'>

In [33]:
from pydantic import ConfigDict  # Pydantic v2 replacement for class-based Config and Extra


class PersonalityTrait(BaseModel):
    # Reject unknown keys and store enum fields as their plain values.
    model_config = ConfigDict(extra="forbid", use_enum_values=True)

    trait: defined_aspects["Trait"]
    description: str


class RelationshipStatus(str, Enum):
    positive = "positive"
    neutral = "neutral"
    negative = "negative"


class SocialRelationship(BaseModel):
    model_config = ConfigDict(extra="forbid", use_enum_values=True)

    target: str
    status: RelationshipStatus
    description: str


class CharacterSpecification(BaseModel):
    model_config = ConfigDict(extra="forbid", use_enum_values=True)

    name: str
    gender: defined_aspects["Gender"]
    age: int
    dialogue_tone: str
    career: defined_aspects["Career"]
    personality_traits: List[PersonalityTrait]
    hobbies: List[defined_aspects["Skill"]]
    living_conditions: List[str]
    social_relationships: List[SocialRelationship]
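
A small sketch of what the two config options do at validation time, assuming Pydantic v2 and a two-value stand-in for `defined_aspects["Trait"]`: `use_enum_values` stores the plain string instead of the enum member, and `extra="forbid"` rejects keys that are not declared on the model.

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict, ValidationError

# Two-value stand-in for defined_aspects["Trait"] (assumption: illustration only).
TraitAspect = Enum(
    "TraitAspect",
    {"Goofball": "Goofball", "Loves the Outdoors": "Loves the Outdoors"},
)


class PersonalityTrait(BaseModel):
    model_config = ConfigDict(extra="forbid", use_enum_values=True)

    trait: TraitAspect
    description: str


ok = PersonalityTrait.model_validate({"trait": "Goofball", "description": "Jokes a lot."})
print(type(ok.trait).__name__, ok.model_dump())

bad_inputs = [
    {"trait": "Plumber", "description": "Not an allowed trait."},         # value outside the enum
    {"trait": "Goofball", "description": "Fine.", "mood": "Unexpected"},  # undeclared field
]
rejected = []
for payload in bad_inputs:
    try:
        PersonalityTrait.model_validate(payload)
    except ValidationError as err:
        rejected.append(err.errors()[0]["type"])

print(rejected)
```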

In [10]:
# info descriptions taken from paper
extract_instruction = '''Given the description of a character, extract the following information.
[Information Schema]
* name: full name of the character
* gender: gender (male/female)
* age: numerical age value of the character
* dialogue_tone: detailed description of the character's idiomatic tone of voice when chatting with others
* career: the character's job. Refer to the predefined career list below
* personality_traits: refer to the PersonalityTrait description below
* hobbies: personal hobbies, restricted to the values defined in 'Skill'. Refer to the Skill description below
* living_conditions: how the character's life currently is (e.g. housing, work-life balance, daily routine)
* social_relationships: the character's social relationships with other characters (e.g. family, friends). Refer to the SocialRelationship description below

[Career]
Allowed job values are as follows
```
[
    "Actor", "Astronaut", "Athlete", "Business", "Civil Designer", "Conservationist",
    "Criminal", "Critic", "Culinary", "Detective", "Doctor", "Education", "Engineer",
    "Entertainer", "Freelancer", "Gardener", "Law", "Military", "Painter", "Politician",
    "Scientist", "Social Media", "Secret Agent", "Style Influencer", "Tech Guru", "Writer"
]
```

[PersonalityTrait]
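A character's personality trait consists of the following information
* trait: one of the allowed 'Trait' values listed below
* description: short description of how the trait shows up in the character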

Allowed 'Trait' values are as follows
```
[
    "Ambitious", "Cheerful", "Childish", "Clumsy", "Creative", "Erratic", "Genius",
    "Gloomy", "Goofball", "Hot-Headed", "Romantic", "Self-Assured", "Bro", "Evil",
    "Family-Oriented", "Good", "Hates Children", "Jealous", "Loner", "Loyal", "Mean",
    "Noncommittal", "Outgoing", "Snob", "Active", "Glutton", "Kleptomaniac", "Lazy",
    "Materialistic", "Neat", "Perfectionist", "Slob", "Vegetarian", "Art Lover",
    "Bookworm", "Foodie", "Geek", "Loves the Outdoors", "Music Lover"
]
```

[Skill]
Allowed 'Skill' values are as follows
```
[
    "Acting", "Archaeology", "Baking", "Bowling", "Charisma", "Comedy", "Cooking",
    "Cross-Stitch", "DJ Mixing", "Dancing", "Fabrication", "Fishing", "Fitness",
    "Flower Arranging", "Gardening", "Gourmet Cooking", "Guitar", "Handiness",
    "Herbalism", "Juice Fizzing", "Logic", "Media Production", "Mischief", "Mixology",
    "Painting", "Parenting", "Pet Training", "Photography", "Piano", "Pipe Organ",
    "Programming", "Rock Climbing", "Rocket Science", "Selvadoradian Culture", "Singing",
    "Vampiric Lore", "Veterinarian", "Video Gaming", "Violin",
    "Wellness", "Writing"
]
```

[SocialRelationship]
A character's social relationship consists of the following information
* target: the target of the relationship
* status: positive/neutral/negative
* description: detailed description of the relationship


Extract in the following JSON format.
{
    "name": str,
    "gender": str,
    "age": int,
    "dialogue_tone": str,
    "career": str,
    "personality_traits": [
        {"trait": str, "description": str},
        ...
    ],
    "hobbies": List[str],
    "living_conditions": List[str],
    "social_relationships": [
        {"target": str, "status": str, "description": str},
        ...
    ]
}
'''
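
The allowed-value lists inside `extract_instruction` duplicate the Python lists defined earlier, so the two can drift apart when one is edited. One possible way to keep a single source of truth is to render those sections from the lists. The helper name `render_allowed_values` and the shortened lists are assumptions for illustration.

```python
import json
from typing import List

# Shortened stand-ins for the lists defined earlier (assumption: illustration only).
career = ["Actor", "Astronaut", "Secret Agent"]
trait = ["Ambitious", "Goofball", "Loves the Outdoors"]


def render_allowed_values(title: str, label: str, values: List[str]) -> str:
    """Render one '[Title]' prompt section listing the allowed values as a JSON array."""
    return (
        f"[{title}]\n"
        f"Allowed '{label}' values are as follows\n"
        f"{json.dumps(values, indent=4)}\n"
    )


sections = "\n".join([
    render_allowed_values("Career", "Career", career),
    render_allowed_values("PersonalityTrait", "Trait", trait),
])
print(sections)
```

The rendered sections could then be concatenated with the schema description and the JSON format block to form the full instruction.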

In [34]:
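def initialize_character(character_description: str) -> CharacterSpecification:
    """Extract a structured CharacterSpecification from a free-text description."""
    instruction = f"{extract_instruction}\nDescription:\n{character_description}"
    messages = [
        {"role": "user", "content": instruction}
    ]
    decode_params = {"temperature": 0.95}

    # Structured-output call: the SDK parses the reply into CharacterSpecification.
    response = openai_client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=messages,
        response_format=CharacterSpecification,
        **decode_params,
    )
    message = response.choices[0].message
    if message.parsed is None:
        # The model refused or returned nothing parseable.
        raise ValueError(f"No parsed output: {message.refusal}")
    return message.parsed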
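
API calls like the one above can fail transiently. A minimal retry sketch with exponential backoff is shown below. `with_retries` is a hypothetical helper, not part of the OpenAI SDK, and `flaky_extract` is a stand-in for `initialize_character` so the example runs without network access.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], max_attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn until it succeeds, sleeping base_delay * 2**attempt after each failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Stand-in for a flaky call such as initialize_character(description).
calls = {"n": 0}


def flaky_extract() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return {"name": "Zephyr Orion"}


result = with_retries(flaky_extract, max_attempts=3, base_delay=0.0)
print(result, calls["n"])
```

In the notebook this could be used as `with_retries(lambda: initialize_character(character_description))`. Catching every `Exception` is deliberately broad for the sketch; in practice it is usually better to retry only on rate-limit and connection errors.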

In [35]:
character_fname = "wiki_Zephyr Orion.txt"
with open(os.path.join(character_dir, character_fname), "r") as f:
    character_description = f.read()


print(len(character_description), calculate_tokens(character_description))

character = initialize_character(character_description)

2119 396


In [36]:
print(character.model_dump_json(indent=4))

{
    "name": "Zephyr Orion",
    "gender": "male",
    "age": 28,
    "dialogue_tone": "Zephyr has a playful and jovial tone that makes conversations delightful and engaging. His voice carries a warmth that puts everyone at ease, and his witty humor shines through, making people feel entertained and captivated by his storytelling, especially when recounting thrilling space adventures.",
    "career": "Astronaut",
    "personality_traits": [
        {
            "trait": "Goofball",
            "description": "Enjoys being humorous and playful, especially in social situations."
        },
        {
            "trait": "Materialistic",
            "description": "Loves acquiring new possessions and often takes pride in them."
        },
        {
            "trait": "Ambitious",
            "description": "Continuously strives to reach new milestones in his career."
        },
        {
            "trait": "Outgoing",
            "description": "Flourishes in social situations and e

In [38]:
json.dumps(character.model_dump())

'{"name": "Zephyr Orion", "gender": "male", "age": 28, "dialogue_tone": "Zephyr has a playful and jovial tone that makes conversations delightful and engaging. His voice carries a warmth that puts everyone at ease, and his witty humor shines through, making people feel entertained and captivated by his storytelling, especially when recounting thrilling space adventures.", "career": "Astronaut", "personality_traits": [{"trait": "Goofball", "description": "Enjoys being humorous and playful, especially in social situations."}, {"trait": "Materialistic", "description": "Loves acquiring new possessions and often takes pride in them."}, {"trait": "Ambitious", "description": "Continuously strives to reach new milestones in his career."}, {"trait": "Outgoing", "description": "Flourishes in social situations and enjoys being around people."}, {"trait": "Gloomy", "description": "Feels down when left alone for too long."}], "hobbies": ["Fitness", "Painting"], "living_conditions": ["Lives in a mod

In [41]:
# character_collection = {
#     "model": "gpt-4o",
#     "source": {}
# }

# for character_fname in tqdm(character_fnames):
#     character_uid = str(uuid.uuid4())
#     with open(os.path.join(character_dir, character_fname), "r") as f:
#         character_description = f.read()
        
#     try:
#         character = initialize_character(character_description)
#     except Exception as e:
#         print(str(e))
#         continue
    
#     with open(f"simschat/characters/{character_uid}.json", "w") as f:
#         f.write(json.dumps(character.model_dump(), indent=4))
    
#     character_collection["source"][character_uid] = character_fname

100%|██████████| 68/68 [04:53<00:00,  4.32s/it]


In [None]:
# with open("simschat/character_collection.json", "w") as f:
#     f.write(json.dumps(character_collection, indent=4))
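
The commented-out loop above writes into `simschat/characters/` without creating it and only prints failures. A sketch of the same batch step with the output directory created up front and failures recorded in the index follows. `build_collection`, the `failed` key, and the stand-in `fake_extract` are assumptions for illustration; in the notebook the extractor would read the file and call `initialize_character(...).model_dump()`.

```python
import json
import os
import tempfile
import uuid
from typing import Callable, Dict, List


def build_collection(
    fnames: List[str],
    extract: Callable[[str], dict],
    out_dir: str,
    model: str = "gpt-4o",
) -> Dict:
    """Run extract on each file name, write one JSON file per character, return the index."""
    os.makedirs(out_dir, exist_ok=True)  # make sure the target directory exists
    collection = {"model": model, "source": {}, "failed": []}
    for fname in fnames:
        try:
            spec = extract(fname)
        except Exception as err:
            # Keep going, but remember which files need a second pass.
            collection["failed"].append({"file": fname, "error": str(err)})
            continue
        uid = str(uuid.uuid4())
        with open(os.path.join(out_dir, f"{uid}.json"), "w", encoding="utf-8") as f:
            json.dump(spec, f, indent=4, ensure_ascii=False)
        collection["source"][uid] = fname
    return collection


def fake_extract(fname: str) -> dict:
    # Stand-in extractor: fails on one file to exercise the error path.
    if "broken" in fname:
        raise ValueError("could not parse")
    return {"name": fname.removeprefix("wiki_").removesuffix(".txt")}


with tempfile.TemporaryDirectory() as tmp:
    index = build_collection(["wiki_Zephyr Orion.txt", "wiki_broken.txt"], fake_extract, tmp)
    written = sorted(os.listdir(tmp))

print(len(index["source"]), len(written), index["failed"])
```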