# Synthetic Data Generation


In [1]:
import json
import sys
import csv
sys.path.append('..')


import tinytroupe
from tinytroupe.agent import TinyPerson
from tinytroupe.environment import TinyWorld, TinySocialNetwork
from tinytroupe.factory import TinyPersonFactory
from tinytroupe.extraction import ResultsReducer
import tinytroupe.control as control


!!!!
DISCLAIMER: TinyTroupe relies on Artificial Intelligence (AI) models to generate content. 
The AI models are not perfect and may produce inappropriate or inaccurate results. 
For any serious or consequential use, please review the generated content before using it.
!!!!

Looking for default config on: c:\Users\pdasilva\repos\TinyTroupe\examples\..\tinytroupe\utils\..\config.ini
Found custom config on: c:\Users\pdasilva\repos\TinyTroupe\examples\config.ini

Current TinyTroupe configuration 
[OpenAI]
api_type = openai
azure_api_version = 2024-08-01-preview
model = gpt-4o-mini
reasoning_model = o3-mini
embedding_model = text-embedding-3-small
max_tokens = 16000
temperature = 1.2
freq_penalty = 0.0
presence_penalty = 0.0
timeout = 60
max_attempts = 5
waiting_time = 0
exponential_backoff_factor = 5
reasoning_effort = high
cache_api_calls = False
cache_file_name = openai_api_cache.pickle
max_content_display_length = 1024
azure_embedding_model_api_version = 2023-05-15

[Simulation]
paral

Let's create the specific type of agent we need to generate the data.

In [2]:
factory = TinyPersonFactory("A random knowledge worker in a company providing marketing services.")

In [3]:
people = []
for i in range(2):
    person = factory.generate_person(temperature=1.6)
    print(person.minibio())
    people.append(person)

len(people)

Emily Carter is a 32 year old Sustainability Consultant, American, currently living in Austin, Texas, USA. Emily Carter is not only dedicated to her work as a Sustainability Consultant but also embodies a warm and approachable personality that makes her a natural leader in her community. Her interests in gardening and environmental activism reflect her commitment to sustainable living, and she often engages in local initiatives that promote eco-friendly practices. Despite her optimistic outlook, she sometimes grapples with self-doubt, particularly when faced with criticism about the slow pace of change in the sustainability sector. Emily finds solace in yoga and mindfulness, which help her manage anxiety and maintain a balanced lifestyle, while her love for cooking healthy meals and spending time outdoors further enriches her connection to nature.
Michael Thompson is a 42 year old Marketing Manager, American, currently living in Chicago, Illinois, USA. Michael Thompson is not only a de

2
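The loop above passes `temperature=1.6` to `generate_person`. That is higher than the `temperature = 1.2` in the configuration printed earlier, presumably to obtain more varied personas. The sketch below is a generic illustration of sampling temperature, not TinyTroupe code, and it assumes the value ends up as the LLM's sampling temperature. It divides hypothetical next-token scores by the temperature before applying a softmax. Higher temperatures flatten the distribution, so the top token gets a lower probability and the entropy goes up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, more varied distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical next-token scores
for t in (0.7, 1.2, 1.6):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: top-token prob={probs[0]:.3f}, entropy={entropy(probs):.3f}")
```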

In [4]:
company = TinyWorld("Some Corp Inc.", people)

In [5]:
company.make_everyone_accessible()

In [6]:
company.broadcast("Get some work done together, help each other.")

In [7]:
company.run(5)
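The extraction step relies on two kinds of events in an agent's memory. `TALK` is what the agent said, and `CONVERSATION` is what it heard, either from another agent or from the user via `broadcast`. The toy model below is a hypothetical, heavily simplified sketch of that flow, not TinyTroupe's `TinyWorld` implementation. `ToyAgent` and `ToyWorld` are illustrative names, and each agent's reply comes from a scripted function instead of an LLM.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent: it only records events in memory."""
    name: str
    memory: list = field(default_factory=list)

    def perceive(self, source: Optional["ToyAgent"], content: str) -> None:
        # Incoming speech is stored as a CONVERSATION event; source=None means the user.
        self.memory.append({"event": "CONVERSATION", "source": source, "content": content})

    def talk(self, content: str) -> None:
        # The agent's own utterance is stored as a TALK event.
        self.memory.append({"event": "TALK", "source": self, "content": content})

class ToyWorld:
    """Hypothetical toy world in which everyone can hear everyone else."""
    def __init__(self, agents):
        self.agents = agents

    def broadcast(self, content: str) -> None:
        for agent in self.agents:
            agent.perceive(None, content)

    def step(self, reply_fn) -> None:
        # Each agent speaks once per step; every other agent perceives the utterance.
        for speaker in self.agents:
            message = reply_fn(speaker)
            speaker.talk(message)
            for listener in self.agents:
                if listener is not speaker:
                    listener.perceive(speaker, message)

    def run(self, steps: int, reply_fn) -> None:
        for _ in range(steps):
            self.step(reply_fn)

emily, michael = ToyAgent("Emily"), ToyAgent("Michael")
world = ToyWorld([emily, michael])
world.broadcast("Get some work done together, help each other.")
world.run(1, lambda agent: f"Hi, I'm {agent.name}.")
for e in emily.memory:
    print(e["event"], e["source"].name if e["source"] else "USER", "-", e["content"])
```

In this toy model, Emily's memory ends up holding the user's broadcast, her own `TALK`, and Michael's reply as a `CONVERSATION` event. That is the same mix the reduction rules in this notebook are written to handle.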

We can now extract the conversations, which form the synthetic corpus we wanted.

In [8]:
people[0].pp_current_interactions()

In [9]:
reducer = ResultsReducer()

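def aux_extract_content(focus_agent: TinyPerson, source_agent: TinyPerson, target_agent: TinyPerson,
                        kind: str, event: str, content: str, timestamp: str):
    """Reduction rule: turn a TALK or CONVERSATION event into an (author, content) pair.

    target_agent, kind and timestamp are part of the rule signature but are not needed here.
    """
    if event == "TALK":
        # TALK is something the focus agent itself said.
        author = focus_agent.name
    elif event == "CONVERSATION":
        # CONVERSATION is something the focus agent heard; no source agent means it came from the user.
        if source_agent is None:
            author = "USER"
        else:
            author = source_agent.name
    else:
        raise ValueError(f"Unknown event: {event}")

    entry = (author, content)
    print(entry)
    return entry


reducer.add_reduction_rule("TALK", aux_extract_content)
reducer.add_reduction_rule("CONVERSATION", aux_extract_content)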
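Conceptually, a reducer walks through an agent's stored events. For each event type that has a registered rule, it calls that rule to produce one output row. The `MiniReducer` class below is a hypothetical, simplified sketch of that dispatch pattern, not the actual `ResultsReducer` implementation. Here, events are plain dicts, and events with no registered rule are simply skipped.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[..., Tuple[str, str]]

class MiniReducer:
    """Hypothetical, simplified reducer: maps event types to extraction rules."""
    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}

    def add_reduction_rule(self, event: str, rule: Rule) -> None:
        if event in self.rules:
            raise ValueError(f"A rule for {event} is already registered")
        self.rules[event] = rule

    def reduce(self, focus_name: str, events: List[dict]) -> List[Tuple[str, str]]:
        results = []
        for ev in events:
            rule = self.rules.get(ev["event"])
            if rule is None:
                continue  # in this sketch, events without a rule are ignored
            results.append(rule(focus_name=focus_name, **ev))
        return results

def extract(focus_name: str, event: str, source, content: str) -> Tuple[str, str]:
    # Same idea as aux_extract_content, but using names instead of agent objects.
    if event == "TALK":
        return (focus_name, content)
    return (source if source is not None else "USER", content)

mini = MiniReducer()
mini.add_reduction_rule("TALK", extract)
mini.add_reduction_rule("CONVERSATION", extract)

events = [
    {"event": "CONVERSATION", "source": None, "content": "Get some work done together, help each other."},
    {"event": "TALK", "source": None, "content": "Sounds good!"},
    {"event": "CONVERSATION", "source": "Michael Thompson", "content": "Sure!"},
    {"event": "THINK", "source": None, "content": "(no rule registered for this event)"},
]
rows = mini.reduce("Emily Carter", events)
print(rows)
```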

Finally, we obtain a dataframe with the extracted data and save it to a `.csv` file for later use in other applications.

In [10]:
df = reducer.reduce_agent_to_dataframe(people[0], column_names=["author", "content"])
df

('USER', 'Get some work done together, help each other.')
('Emily Carter', "That sounds like a great idea! I'm always up for collaborating and helping each other out. What specific tasks do you have in mind?")
('Emily Carter', 'I think it would be helpful to list out what we each need to accomplish. That way, we can see where we can assist each other the most. What do you think?')
('Michael Thompson', "Sure! I'd love to collaborate and help each other out. What specific tasks do you have in mind?")
('Michael Thompson', 'I think we should start by identifying the tasks we need to tackle. Do you have a list of priorities?')
('Michael Thompson', "Let's break down the tasks we need to accomplish. Maybe we can each take a part and then review each other's work? What do you think?")
('Emily Carter', 'I think breaking down the tasks sounds like a solid plan! I have a few priorities in mind, like completing the sustainability report and preparing for our upcoming client meeting. What about you

|   | author | content |
|---|--------|---------|
| 0 | USER | Get some work done together, help each other. |
| 1 | Emily Carter | That sounds like a great idea! I'm always up f... |
| 2 | Emily Carter | I think it would be helpful to list out what w... |
| 3 | Michael Thompson | Sure! I'd love to collaborate and help each ot... |
| 4 | Michael Thompson | I think we should start by identifying the tas... |
| 5 | Michael Thompson | Let's break down the tasks we need to accompli... |
| 6 | Emily Carter | I think breaking down the tasks sounds like a ... |
| 7 | Emily Carter | I also have a few tasks in mind, like updating... |
| 8 | Emily Carter | Great! Let's split the tasks. I can handle the... |
| 9 | Michael Thompson | I completely agree, Emily! Let's list out our ... |


In [11]:
df.to_csv("../data/extractions/synthetic_data_generation.out.csv", index=False)
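Because `index=False` is passed, the saved file should contain only the `author` and `content` columns. For consumers that don't use pandas, the sketch below writes and reads the same two-column layout with Python's standard `csv` module. The rows are shortened, illustrative values, and an in-memory buffer stands in for the file so the example is self-contained.

```python
import csv
import io

rows = [
    ("USER", "Get some work done together, help each other."),
    ("Emily Carter", "That sounds like a great idea!"),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["author", "content"])  # same columns as the dataframe, no index column
writer.writerows(rows)                  # fields containing commas are quoted automatically
text = buffer.getvalue()
print(text)

# Reading it back: csv.DictReader uses the header row as the keys of each record.
loaded = list(csv.DictReader(io.StringIO(text)))
for record in loaded:
    print(record["author"], "->", record["content"])
```

The first message contains a comma, so `csv.writer` wraps it in quotes. `DictReader` then restores it as a single field.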