# Using LLMs to Generate Synthetic Data For Benchmarking

## Overview
This notebook generates diverse synthetic data examples to benchmark multiple open-source models on Named Entity Recognition (NER) tasks, with the goal of parsing donor relations emails.



## Importing Ollama
For this we'll use locally hosted models to generate synthetic data.


In [15]:
import ollama

## Prompt Generation Function
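We'll use a template function to build the prompt that is fed to the LLM. As written, the function takes no arguments and returns the same prompt on every call, so variation between the generated emails comes from the model's sampling settings rather than from the prompt.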

In [74]:

def create_json_prompt_for_synthetic_data():
    """
    Builds the synthetic data generation prompt for donor relations emails.
    The entity categories (faculty, payment method, distribution type, and so on)
    and the few-shot examples are hard-coded in the template.
    """
    # Define the prompt structure
    prompt = f"""
Role:
You are a synthetic data generator tasked with producing diverse and realistic donor-initiated emails for fine-tuning a Named Entity Recognition (NER) model. Your output must include structured financial and donation-related named entities while ensuring linguistic variation in formality, tone, and readability levels.

Entity Categories & Examples:
  -	Person (Donor Name): Jane Doe, Michael Roberts
  -	Orginization (Organization Name): ABC Foundation, Global Giving Trust
  -	Email (Email Address): contact@donorsmith.org
  -	Phone (Phone Number): 204-456-2341
  -	Address (Mailing Address): 123 Donor Lane, Toronto, ON, M5J 2T3
  -	PaymentMethod (Payment Method): wire transfer, stock transfer, cheque, gifts in-kind, cash
  -	Date (First Payment Date): March 1, 2025
  -	Money (Donation Amount): $10,000, $5,000, $1,000
  -	Faculty (Faculty/Department): Civil Engineering, Economics, Music, Computer Science, Mathematics, Fine Arts
  -	Distribution (Purpose of Donation): bursary, scholarship, fellowship, award, prize
  -	Frequency (Gift Type): one-time gift, recurring gift, pledge, payment for a pledge, payment for a recurring gift

Objective:
  - Generate donor emails where individuals or organizations express intent to make a donation to the University of Manitoba.
  - Ensure each email contains multiple named entities related to contact details, payment methods, donation amounts, and distribution types.
  - Maintain natural language with diverse sentence structures.
  - Format the output in JSON with the text passage and corresponding entity annotations.
  - Do not include any entities that are not listed in the categories above.

Output Format:
The response must be in JSON format, containing a text passage and corresponding entity annotations. Do not use emojis, emoticons, or any non-text symbols in the output. Responses should be professional and formatted strictly in plain text.
{{
	"text": "Hello, \n I want to set up a yearly scholarship that provides a student with $40,000 if they are in Price Faculty of Engineering as well as a $10,000 scholarship if they are in Computer Science. I would like to know if this is possible. \n Thanks,\n Bruce Niemi.",
	"entities": [
		{{"entity": "yearly", "type": "Interval"}},
		{{"entity": "$40,000", "type": "Money"}},
		{{"entity": "Price Faculty of Engineering", "type": "Faculty"}},
		{{"entity": "$10,000", "type": "Money"}},
		{{"entity": "Computer Science", "type": "Faculty"}},
		{{"entity": "Bruce Niemi", "type": "Person"}}
	]
}}

{{
  "text": "Hello,\nI would like to establish a scholarship that provides a student with $8,000 per year, starting in 2026. Please let me know the process to set this up.\nYou can contact me at 204-555-7890 or via mail at 456 River Avenue, Winnipeg, MB.\nBest,\nJordan Mitchell.",
  "entities": [
    {{ "entity": "$8,000", "type": "Money" }},
    {{ "entity": "per year", "type": "Interval" }},
    {{ "entity": "2026", "type": "Start Date" }},
    {{ "entity": "204-555-7890", "type": "Phone" }},
    {{ "entity": "456 River Avenue, Winnipeg, MB", "type": "Address" }},
    {{ "entity": "Jordan Mitchell", "type": "Person" }}
  ]
}}

{{
  "text": "Hi,\nI am interested in setting up a scholarship that provides a student with $2,500 every three months. Could you provide details on how to proceed?\nFeel free to reach me at 204-555-2345 or at my address, 789 Oak Street, Winnipeg, MB.\nThanks,\nEmily Carter.",
  "entities": [
    {{"entity": "$2,500", "type": "Money"}},
    {{"entity": "every three months", "type": "Interval"}},
    {{"entity": "204-555-2345", "type": "Phone"}},
    {{"entity": "789 Oak Street, Winnipeg, MB", "type": "Address"}},
    {{"entity": "Emily Carter", "type": "Person"}}
  ]
}}
"""
    return prompt


## Generating the Prompts
Now we generate all of the prompts to feed into the LLM.


In [75]:
all_prompts = []
for i in range(100):
    prompt = create_json_prompt_for_synthetic_data()
    all_prompts.append(prompt)
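
Because `create_json_prompt_for_synthetic_data()` takes no arguments, the loop above stores 100 copies of the same prompt text. One way to vary the prompts themselves is to append randomly sampled constraints to each copy. The sketch below is a minimal illustration of that idea and is not part of the original notebook: the helper `add_random_constraints`, the `TONES` list, and the stand-in `base` string are hypothetical, while the other option lists reuse the entity examples from the prompt.

```python
import random

# Option pools. TONES is an assumption; the other lists reuse the
# entity examples given in the prompt template.
TONES = ["formal", "casual", "brief", "detailed"]
PAYMENT_METHODS = ["wire transfer", "stock transfer", "cheque", "gifts in-kind", "cash"]
DISTRIBUTIONS = ["bursary", "scholarship", "fellowship", "award", "prize"]
FACULTIES = ["Civil Engineering", "Economics", "Music",
             "Computer Science", "Mathematics", "Fine Arts"]


def add_random_constraints(base_prompt: str, rng: random.Random) -> str:
    """Append randomly chosen writing constraints to a base prompt (hypothetical helper)."""
    constraints = (
        "\nConstraints for this email:\n"
        f"  - Tone: {rng.choice(TONES)}\n"
        f"  - Payment method: {rng.choice(PAYMENT_METHODS)}\n"
        f"  - Distribution: {rng.choice(DISTRIBUTIONS)}\n"
        f"  - Faculty: {rng.choice(FACULTIES)}\n"
    )
    return base_prompt + constraints


rng = random.Random(0)  # fixed seed so the prompt set is reproducible
# Stand-in for the string returned by create_json_prompt_for_synthetic_data().
base = "Generate one donor email."
varied_prompts = [add_random_constraints(base, rng) for _ in range(100)]
print(len(set(varied_prompts)), "distinct prompts out of", len(varied_prompts))
```

In the notebook, `base` would be replaced by the real template string, and `varied_prompts` could be used in place of `all_prompts`.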

## Define the data model
This is the data format the model is told to follow, so we will use Ollama's
structured output option to constrain the responses to this format.

In [76]:
from pydantic import BaseModel
from typing import List

class Entity(BaseModel):
    entity: str
    types: str  # note: the prompt's example JSON uses the key "type"; this schema names the field "types"

class Result(BaseModel):
    text: str
    entities: List[Entity]
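
Matching this schema does not by itself make a record useful for NER: each annotated entity string should also appear verbatim in the email text. The sketch below is a plain-Python sanity check for records shaped like `Result`, using only the standard library. It is an illustrative stand-in rather than Pydantic's own validation, and the function `check_record` and the `sample` record are hypothetical.

```python
import json


def check_record(record: dict) -> list:
    """Return a list of problems found in one generated record (empty if none)."""
    problems = []
    if not isinstance(record.get("text"), str):
        problems.append("missing or non-string 'text'")
        return problems
    entities = record.get("entities")
    if not isinstance(entities, list):
        problems.append("missing or non-list 'entities'")
        return problems
    for i, ent in enumerate(entities):
        well_formed = (
            isinstance(ent, dict)
            and isinstance(ent.get("entity"), str)
            and isinstance(ent.get("types"), str)
        )
        if not well_formed:
            problems.append(f"entity {i} lacks string 'entity'/'types' fields")
        elif ent["entity"] not in record["text"]:
            # The annotation names a span that never occurs in the email.
            problems.append(f"entity {i} ({ent['entity']!r}) not found in text")
    return problems


# Made-up record: the second entity does not occur in the text.
sample = json.loads(
    '{"text": "I would like to give $5,000 by cheque.", '
    '"entities": [{"entity": "$5,000", "types": "Money"}, '
    '{"entity": "wire transfer", "types": "PaymentMethod"}]}'
)
print(check_record(sample))
```

A check like this could be run over each line of the generated JSONL file to flag records whose annotations do not match their text.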

## Calling the Model
Now we'll call a locally hosted open-source model through Ollama to generate the data. The code below requests `llama3:latest`.


In [77]:
import json

# Open the file once before the loop
with open('results.jsonl', 'w') as f:
    for prompt in all_prompts:
        result = ollama.chat(
            messages=[{'role': 'user', 'content': prompt}],
            model="llama3:latest",
            format=Result.model_json_schema(),
            options={"temperature": 1, "top_p": 0.9}
        )
        data = Result.model_validate_json(result.message.content)
        # Write to file immediately
        json_line = json.dumps(data.model_dump())
        f.write(json_line + '\n')
        print(data)



text="Dear University of Manitoba Development Office,\r\nI am delighted to be supporting your cause through a one-time gift of $20,000.\r\nI would like this donation to go towards the Computer Science Department's Fellowship Program. Please let me know if this is feasible and how we can make it happen. \r\nThank you for your time and consideration.\r\nSincerely,\r\nMichael Roberts." entities=[Entity(entity='$20,000', types='Money'), Entity(entity='University of Manitoba Development Office', types='Orginization'), Entity(entity="Computer Science Department's Fellowship Program", types='Distribution, Faculty')]
text='Dear Friends at the University of Manitoba Foundation,\r\n\r\nI am writing to express my interest in establishing a bursary for students studying Music at the university. I would like to make a one-time gift of $15,000 and a recurring gift of $1,000 every two years.\r\n\r\nPlease let me know if this is feasible and what process I need to follow to set up these gifts.\r\n\r\n