# Synthetic Data Generation


In [1]:
import json
import sys
import csv
sys.path.append('..')


import tinytroupe
from tinytroupe.agent import TinyPerson
from tinytroupe.environment import TinyWorld, TinySocialNetwork
from tinytroupe.factory import TinyPersonFactory
from tinytroupe.extraction import ResultsReducer
import tinytroupe.control as control


!!!!
DISCLAIMER: TinyTroupe relies on Artificial Intelligence (AI) models to generate content. 
The AI models are not perfect and may produce inappropriate or inacurate results. 
For any serious or consequential use, please review the generated content before using it.
!!!!

Looking for default config on: c:\Users\pdasilva\repos\TinyTroupe\examples\..\tinytroupe\utils\..\config.ini
Found custom config on: c:\Users\pdasilva\repos\TinyTroupe\examples\config.ini
TinyTroupe version: 0.5.1
Current date and time (local): 2025-07-15 21:32:10
Current date and time (UTC):   2025-07-16 00:32:10

Current TinyTroupe configuration 
[OpenAI]
api_type = openai
azure_api_version = 2024-08-01-preview
model = gpt-4o-mini
reasoning_model = o3-mini
embedding_model = text-embedding-3-small
max_tokens = 16000
temperature = 1.7
freq_penalty = 0.1
presence_penalty = 0.1
timeout = 480
max_attempts = 5
waiting_time = 0
exponential_backoff_factor = 5
reasoning_effort = high
cache_api_calls = False
cache_file_na

First, we create the kind of agents whose interactions we want to collect as data. A `TinyPersonFactory` generates them from a short description of the population.

In [2]:
factory = TinyPersonFactory("A random knowledge worker in a company providing marketing services.")

In [3]:
people = []
for i in range(2):
    person = factory.generate_person(temperature=1.6)
    print(person.minibio())
    people.append(person)

len(people)

Ethan Carter is a 32 year old Marketing Specialist, American, currently living in Austin, Texas. Ethan Carter is a creative and detail-oriented individual who thrives in collaborative environments, often using his good sense of humor to ease tension among colleagues. He has a strong passion for digital marketing and technology trends, which he actively follows through podcasts and industry news. Outside of work, Ethan enjoys exploring new cuisines and experimenting with recipes, reflecting his love for cooking and photography. He values ethical marketing practices and believes in the power of technology to empower small businesses, which aligns with his long-term goal of establishing a consultancy to help them succeed.
Oliver Bennett is a 32 year old Digital Marketing Specialist, British, currently living in Bristol, England. Oliver Bennett is not only a dedicated Digital Marketing Specialist but also a creative thinker who thrives on brainstorming innovative ideas. His passion for pho

2

In [4]:
company = TinyWorld("Some Corp Inc.", people)

In [5]:
company.make_everyone_accessible()

In [6]:
company.broadcast("Get some work done together, help each other.")

In [7]:
company.run(5)

We can now extract the conversations, which form the synthetic corpus we wanted.

In [8]:
people[0].pp_current_interactions()

In [9]:
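from typing import Optional

reducer = ResultsReducer()

def aux_extract_content(focus_agent: TinyPerson, source_agent: Optional[TinyPerson], target_agent: TinyPerson,
                        kind: str, event: str, content: str, timestamp: str):
    """Turn one interaction of `focus_agent` into an (author, content) pair."""
    if event == "TALK":
        # The focus agent said this itself.
        author = focus_agent.name
    elif event == "CONVERSATION":
        # The focus agent heard this. Without a source agent, the message came from
        # outside the agent population (e.g., the world's broadcast), so label it "USER".
        if source_agent is None:
            author = "USER"
        else:
            author = source_agent.name
    else:
        raise ValueError(f"Unknown event: {event}")

    entry = (author, content)
    print(entry)
    return entry

# Apply the same extraction rule to both kinds of events.
reducer.add_reduction_rule("TALK", aux_extract_content)
reducer.add_reduction_rule("CONVERSATION", aux_extract_content)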
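
To make the reduction-rule pattern more concrete, here is a hypothetical, dependency-free sketch of the idea. `SimpleReducer` and `extract` are illustrative names, not TinyTroupe APIs, and events are modeled as plain dicts with assumed keys `event`, `source` and `content`. The real `ResultsReducer` works on the agent's stored interactions and may differ in details, such as how events without a matching rule are handled.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical, simplified stand-in for the reduction-rule pattern; NOT TinyTroupe's code.
Event = Dict[str, Optional[str]]
Rule = Callable[[str, Optional[str], str, str], Tuple[str, str]]


class SimpleReducer:
    """Maps an event kind (e.g. "TALK") to a rule and applies the rules in order."""

    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}

    def add_reduction_rule(self, event: str, rule: Rule) -> None:
        self.rules[event] = rule

    def reduce(self, focus_name: str, events: List[Event]) -> List[Tuple[str, str]]:
        results = []
        for ev in events:
            rule = self.rules.get(ev["event"])
            if rule is None:
                continue  # in this sketch, events without a rule are skipped
            results.append(rule(focus_name, ev.get("source"), ev["event"], ev["content"]))
        return results


def extract(focus_name: str, source: Optional[str], event: str, content: str) -> Tuple[str, str]:
    # Same author-resolution logic as aux_extract_content, but on plain strings.
    if event == "TALK":
        author = focus_name
    elif event == "CONVERSATION":
        author = "USER" if source is None else source
    else:
        raise ValueError(f"Unknown event: {event}")
    return (author, content)


reducer = SimpleReducer()
reducer.add_reduction_rule("TALK", extract)
reducer.add_reduction_rule("CONVERSATION", extract)

# Illustrative events; "THINK" has no rule registered, so it is dropped.
events = [
    {"event": "CONVERSATION", "source": None, "content": "Get some work done together, help each other."},
    {"event": "THINK", "source": None, "content": "I should offer help."},
    {"event": "TALK", "source": None, "content": "Sure! How can I help?"},
    {"event": "CONVERSATION", "source": "Oliver Bennett", "content": "Thanks, Ethan!"},
]
rows = reducer.reduce("Ethan Carter", events)
print(rows)
```

Dispatching through a dict keyed by event kind keeps each rule independent, so supporting another kind of event only takes one more `add_reduction_rule` call.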

Finally, we reduce the first agent's interactions to a dataframe and save it as a `.csv` file for later use in other applications.

In [10]:
df = reducer.reduce_agent_to_dataframe(people[0], column_names=["author", "content"])
df

('USER', 'Get some work done together, help each other.')
('Ethan Carter', "Sure! Let's get some work done together. How can I help you with your tasks?")
('Ethan Carter', "Hey team, how's everyone doing with their tasks? Is there anything I can help you with?")
('Oliver Bennett', 'Thanks, Ethan! I could use some help with optimizing my latest ad campaign and brainstorming ideas for social media content. What do you think?')
('Ethan Carter', "Absolutely, Oliver! I'd love to help with that. Let's start by reviewing your current ad campaign. What specific areas do you think need optimization? And we can brainstorm some creative ideas for social media content together!")
('Oliver Bennett', "I'm doing well, thanks for asking! I could use some help with optimizing my ad campaign. If anyone has experience with that, I'd appreciate your input!")
('Ethan Carter', "Hey Oliver! I have some experience with ad campaign optimization. Let's discuss what you've been working on and see how we can impr

|   | author | content |
|---|---|---|
| 0 | USER | Get some work done together, help each other. |
| 1 | Ethan Carter | Sure! Let's get some work done together. How c... |
| 2 | Ethan Carter | Hey team, how's everyone doing with their task... |
| 3 | Oliver Bennett | Thanks, Ethan! I could use some help with opti... |
| 4 | Ethan Carter | Absolutely, Oliver! I'd love to help with that... |
| 5 | Oliver Bennett | I'm doing well, thanks for asking! I could use... |
| 6 | Ethan Carter | Hey Oliver! I have some experience with ad cam... |
| 7 | Oliver Bennett | Thanks, Ethan! I think we need to optimize the... |
| 8 | Ethan Carter | Those are great ideas, Oliver! Optimizing the ... |
| 9 | Oliver Bennett | That sounds great, Ethan! I've been running a ... |


In [11]:
df.to_csv("../data/extractions/synthetic_data_generation.out.csv", index=False)
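
The cell above writes the dataframe with `to_csv(..., index=False)`. As a hedged, standard-library-only sketch, the same (author, content) pairs could also be written as CSV and, as an assumed alternative format for downstream tools, as JSON Lines (one JSON object per line). The sample rows and the helpers `to_csv` and `to_jsonl` are hypothetical; in-memory `io.StringIO` buffers stand in for real output files.

```python
import csv
import io
import json
from typing import Iterable, List, TextIO, Tuple

# Hypothetical (author, content) pairs, shaped like the entries printed by aux_extract_content.
rows: List[Tuple[str, str]] = [
    ("USER", "Get some work done together, help each other."),
    ("Ethan Carter", "Sure! Let's get some work done together. How can I help you with your tasks?"),
]


def to_csv(pairs: Iterable[Tuple[str, str]], stream: TextIO) -> None:
    """Write a header plus one row per pair, with no index column."""
    writer = csv.writer(stream)
    writer.writerow(["author", "content"])
    writer.writerows(pairs)


def to_jsonl(pairs: Iterable[Tuple[str, str]], stream: TextIO) -> None:
    """Write one JSON object per line."""
    for author, content in pairs:
        stream.write(json.dumps({"author": author, "content": content}) + "\n")


csv_buf = io.StringIO(newline="")
jsonl_buf = io.StringIO()
to_csv(rows, csv_buf)
to_jsonl(rows, jsonl_buf)

# Round-trip check: the csv module quotes fields containing commas, so reading back is lossless.
csv_buf.seek(0)
restored = [(r["author"], r["content"]) for r in csv.DictReader(csv_buf)]
print(restored == rows)
```

Writing only the two data columns here plays the same role as `index=False` above: no extra index column ends up in the file. With pandas, such a column would otherwise reappear as `Unnamed: 0` when the CSV is read back.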