# Synthetic Data Generation


In [1]:
import json
import sys
import csv
sys.path.append('..')


import tinytroupe
from tinytroupe.agent import TinyPerson
from tinytroupe.environment import TinyWorld, TinySocialNetwork
from tinytroupe.factory import TinyPersonFactory
from tinytroupe.extraction import ResultsReducer
import tinytroupe.control as control


!!!!
DISCLAIMER: TinyTroupe relies on Artificial Intelligence (AI) models to generate content. 
The AI models are not perfect and may produce inappropriate or inacurate results. 
For any serious or consequential use, please review the generated content before using it.
!!!!

Looking for default config on: C:\Users\pdasilva\repos\TinyTroupe\examples\..\tinytroupe\utils\..\config.ini
Found custom config on: C:\Users\pdasilva\repos\TinyTroupe\examples\config.ini
TinyTroupe version: 0.5.1
Current date and time (local): 2025-07-26 14:58:27
Current date and time (UTC):   2025-07-26 17:58:27

Current TinyTroupe configuration 
[OpenAI]
api_type = openai
azure_api_version = 2024-08-01-preview
model = gpt-4.1-mini
reasoning_model = o3-mini
embedding_model = text-embedding-3-small
max_tokens = 32000
temperature = 1.5
freq_penalty = 0.1
presence_penalty = 0.1
timeout = 480
max_attempts = 5
waiting_time = 0
exponential_backoff_factor = 5
reasoning_effort = high
cache_api_calls = False
cache_file_n

Let's create the kind of agents whose conversations we want to collect.

In [2]:
factory = TinyPersonFactory("A random knowledge worker in a company providing marketing services.")

In [3]:
people = []
for i in range(2):
    person = factory.generate_person(temperature=1.6)
    print(person.minibio())
    people.append(person)

len(people)

Emily Carter is a 32 year old Registered Nurse, American, currently living in Portland, Oregon, USA. Beyond her professional dedication, Emily is a reflective and conscientious individual who values mindfulness and self-care, often starting her day with yoga and journaling to maintain balance. She enjoys quiet, meaningful activities such as hiking, nature photography, and reading contemporary fiction, which complement her introverted yet warm personality. Passionate about environmental sustainability and mental health advocacy, she actively volunteers in community clinics and supports initiatives that improve patient well-being. Emily’s thoughtful communication style, subtle humor, and commitment to lifelong learning make her both a dependable colleague and a compassionate mentor.


Lucas Bennett is a 42 year old Senior Mechanical Engineer / Project Manager, American, currently living in Portland, Oregon, USA. Lucas Bennett is a thoughtful and pragmatic individual who balances his analytical mindset with a dry, understated sense of humor that emerges among close friends. Outside of work, he enjoys hiking the Pacific Northwest with his rescued border collie, Scout, and experimenting with home brewing and regional cooking. He is deeply committed to environmental conservation and community involvement, often volunteering for local initiatives and mentoring young engineers. Despite his perfectionist tendencies and occasional difficulty delegating tasks, Lucas strives to maintain a healthy work-life balance and is continuously working on personal growth, including improving his public speaking and exploring creative hobbies.


2

In [4]:
company = TinyWorld("Some Corp Inc.", people)

In [5]:
company.make_everyone_accessible()

In [6]:
company.broadcast("Get some work done together, help each other.")

In [7]:
company.run(5)

We can now extract the conversations, which form the synthetic corpus we wanted.

In [8]:
people[0].pp_current_interactions()

In [9]:
reducer = ResultsReducer()

def aux_extract_content(focus_agent: TinyPerson, source_agent: TinyPerson, target_agent: TinyPerson,
                        kind: str, event: str, content: str, timestamp: str):
    """Map one recorded interaction to an (author, content) pair."""
    if event == "TALK":
        # TALK events are attributed to the agent being reduced.
        author = focus_agent.name
    elif event == "CONVERSATION":
        # CONVERSATION events are attributed to their source; no source means the user.
        author = "USER" if source_agent is None else source_agent.name
    else:
        raise ValueError(f"Unknown event: {event}")

    entry = (author, content)
    print(entry)
    return entry

reducer.add_reduction_rule("TALK", aux_extract_content)
reducer.add_reduction_rule("CONVERSATION", aux_extract_content)
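
To make the rule dispatch concrete without calling a model, here is a minimal, library-independent sketch of the same idea: rules keyed by event type are applied to a list of recorded events. The `Event` record, the `RULES` table, and the `reduce_events` helper are hypothetical stand-ins for illustration. They are not TinyTroupe's data structures or the `ResultsReducer` implementation, and skipping events that have no rule is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Event:
    """Hypothetical stand-in for one entry in an agent's interaction history."""
    kind: str               # e.g. "TALK" or "CONVERSATION"
    source: Optional[str]   # speaker name, or None when the user spoke
    content: str


def extract(focus_name: str, event: Event) -> tuple[str, str]:
    """Apply the same attribution logic as the extraction rule above."""
    if event.kind == "TALK":
        return (focus_name, event.content)
    if event.kind == "CONVERSATION":
        author = "USER" if event.source is None else event.source
        return (author, event.content)
    raise ValueError(f"Unknown event: {event.kind}")


# One rule per event type (assumption: events without a rule are skipped).
RULES: dict[str, Callable[[str, Event], tuple[str, str]]] = {
    "TALK": extract,
    "CONVERSATION": extract,
}


def reduce_events(focus_name: str, events: list[Event]) -> list[tuple[str, str]]:
    """Run the matching rule over each event and collect the resulting rows."""
    rows = []
    for event in events:
        rule = RULES.get(event.kind)
        if rule is not None:
            rows.append(rule(focus_name, event))
    return rows


history = [
    Event("CONVERSATION", None, "Get some work done together, help each other."),
    Event("TALK", "Emily Carter", "How do you suggest we start?"),
    Event("THINK", "Emily Carter", "(no rule registered for this kind)"),
    Event("CONVERSATION", "Lucas Bennett", "We can divide the tasks based on strengths."),
]
rows = reduce_events("Emily Carter", history)
```

Here `rows` holds three `(author, content)` pairs: the `THINK` entry is dropped because no rule is registered for it.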

Finally, we reduce the first agent's interactions to a dataframe and save it as a `.csv` file for later use in other applications.

In [10]:
df = reducer.reduce_agent_to_dataframe(people[0], column_names=["author", "content"])
df

('USER', 'Get some work done together, help each other.')
('Emily Carter', "I think that's a great idea. Working together and supporting each other can really make a difference, especially when things get busy. How do you suggest we start?")
('Lucas Bennett', "Yeah, working together sounds like a solid plan. We can divide the tasks based on strengths and keep each other on track. Just gotta make sure we don't lose focus or get bogged down in details unnecessarily.")
('Emily Carter', 'I completely agree, Lucas. Dividing tasks based on what each of us does best will definitely help us stay efficient. Maybe we can quickly list our strengths and then assign tasks accordingly? That way, we keep things moving without getting stuck on the small stuff.')
('Lucas Bennett', 'Hey Emily, I think the best way to start is by listing out the tasks we each have and figuring out who’s best suited for what. We can prioritize the urgent stuff first and set some checkpoints to keep each other updated. Wha

|   | author | content |
|---|--------|---------|
| 0 | USER | Get some work done together, help each other. |
| 1 | Emily Carter | I think that's a great idea. Working together ... |
| 2 | Lucas Bennett | Yeah, working together sounds like a solid pla... |
| 3 | Emily Carter | I completely agree, Lucas. Dividing tasks base... |
| 4 | Lucas Bennett | Hey Emily, I think the best way to start is by... |
| 5 | Emily Carter | That sounds like a solid plan, Lucas. Let's st... |
| 6 | Lucas Bennett | Sounds good, Emily. Let's each jot down what w... |
| 7 | Emily Carter | That sounds like a great approach, Lucas. I'll... |
| 8 | Lucas Bennett | Alright, Emily, I'll start by listing my tasks... |
| 9 | Emily Carter | Sounds good, Lucas. Go ahead and list your tas... |


In [11]:
df.to_csv("../data/extractions/synthetic_data_generation.out.csv", index=False)
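
Note that `to_csv` does not create missing directories, so the call fails if `../data/extractions` does not exist yet. The following is a minimal sketch of a safer round trip using only the standard library, with a temporary directory standing in for the real path. The `save_rows` and `load_rows` helpers are hypothetical names, and the sample rows are abridged from the output above.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(rows, path: Path) -> None:
    """Write (author, content) pairs to a CSV file, creating parent folders first."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["author", "content"])
        writer.writerows(rows)


def load_rows(path: Path):
    """Read the CSV back into a list of (author, content) pairs."""
    with path.open(newline="", encoding="utf-8") as f:
        return [(r["author"], r["content"]) for r in csv.DictReader(f)]


rows = [
    ("USER", "Get some work done together, help each other."),
    ("Emily Carter", "I completely agree, Lucas."),
]

with tempfile.TemporaryDirectory() as tmp:
    # The nested folders do not exist yet; save_rows creates them.
    out_path = Path(tmp) / "data" / "extractions" / "synthetic_data_generation.out.csv"
    save_rows(rows, out_path)
    loaded = load_rows(out_path)
```

With pandas, the same guard is a single `Path(...).parent.mkdir(parents=True, exist_ok=True)` call before `df.to_csv(...)`. Content containing commas is quoted on write, so it survives the round trip unchanged.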