# Generation Experiments 

The final step in this process is using results from our retrieval step to inform the response we get back from an LLM! 

Instructing the model to answer only from the context we provide helps reduce the likelihood of hallucinations and improve the accuracy of our responses.

```python
import subprocess
import sys

# Get the root of the current git repo and add it to the import path
# so that modules such as `utils.postgres` can be imported from this notebook.
root_dir = subprocess.check_output(
    ["git", "rev-parse", "--show-toplevel"], stderr=subprocess.DEVNULL
).decode("utf-8").strip()

sys.path.append(root_dir)
```
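If the notebook is run outside a git checkout, or on a machine without `git`, `check_output` raises `CalledProcessError` or `FileNotFoundError`. A minimal sketch of a fallback, assuming the repo root is the nearest parent directory containing a `.git` entry; `find_repo_root` is a hypothetical helper, not part of this repo:

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Optional[Path] = None) -> Path:
    """Walk up from `start` (default: the current directory) until a
    directory containing `.git` is found.

    Falls back to `start` itself if no parent contains `.git`.
    """
    start = (start or Path.cwd()).resolve()
    for candidate in (start, *start.parents):
        # `.git` is a directory in a normal clone and a file in worktrees/submodules
        if (candidate / ".git").exists():
            return candidate
    return start
```

With this helper, `sys.path.append(str(find_repo_root()))` would replace the subprocess call without depending on the `git` binary.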

## Retrieval + Prompt Construction 

Grab results from our database and build a prompt with context to feed to an LLM of our choice!

```python
from utils.postgres import PostgresClient

results = PostgresClient.hybrid_search("Tell me about your education")
```
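This notebook doesn't show how `PostgresClient.hybrid_search` merges keyword and vector results, so the following is not its implementation. One common way to fuse two rankings is reciprocal rank fusion (RRF): each document earns `1 / (k + rank)` from every list it appears in, and those scores are summed. A minimal sketch using hypothetical document ids and the commonly used default `k = 60`:

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Merge several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. A higher fused score means a better final rank.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort document ids by fused score, highest first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

With a keyword ranking `["a", "b", "c"]` and a vector ranking `["b", "c", "d"]`, the fused order is `["b", "c", "a", "d"]`: documents that rank well in both lists rise to the top, even if neither list put them first.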

```python
# Concatenate the cleaned text of each search result, one chunk per line
context = ""
for result in results:
    context += result["clean_text"] + "\n"
```
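Concatenating every result works for a handful of chunks, but the prompt grows with the number of results and can repeat overlapping text. A sketch of a more defensive builder, assuming results arrive best-first and using a hypothetical `max_chars` budget as a rough stand-in for a token limit:

```python
from typing import Iterable, List, Mapping, Set


def build_context(results: Iterable[Mapping[str, str]], max_chars: int = 4000) -> str:
    """Join each result's `clean_text`, skipping empty or exact-duplicate
    chunks and stopping before the joined text would exceed `max_chars`."""
    seen: Set[str] = set()
    chunks: List[str] = []
    total = 0
    for result in results:
        text = result["clean_text"].strip()
        if not text or text in seen:
            continue  # drop empty and repeated chunks
        # +1 accounts for the newline separator added by the join below
        added = len(text) + (1 if chunks else 0)
        if total + added > max_chars:
            break  # results are assumed to be ordered best-first
        seen.add(text)
        chunks.append(text)
        total += added
    return "\n".join(chunks)
```

If the very first chunk exceeds the budget, this returns an empty string, which the prompt's "I don't know" instruction is meant to handle.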

```python
def generate_prompt(query: str, context: str) -> str:
    """Wrap the user's query and the retrieved context in response instructions."""
    prompt = f"""
        You are a helpful assistant for Sophie Marshall, a Data Engineer seeking new opportunities in the Data Engineering field.
        Your task is to answer the user's query based on the provided context. If the context does not contain enough information
        to answer the query, respond with "I don't know. Would you like to follow up with Sophie directly?".

        <query>{query}</query>
        <context>{context}</context>
    """

    return prompt
```
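Because the f-string in `generate_prompt` is indented inside the function body, the prompt starts with a newline and every line carries eight leading spaces. That is usually harmless, but `textwrap.dedent` produces a cleaner string. A sketch with the same instructions moved into a hypothetical module-level `PROMPT_TEMPLATE`:

```python
import textwrap

# Hypothetical module-level template; {query} and {context} are filled in by str.format.
PROMPT_TEMPLATE = textwrap.dedent(
    """\
    You are a helpful assistant for Sophie Marshall, a Data Engineer seeking new opportunities in the Data Engineering field.
    Your task is to answer the user's query based on the provided context. If the context does not contain enough information
    to answer the query, respond with "I don't know. Would you like to follow up with Sophie directly?".

    <query>{query}</query>
    <context>{context}</context>
    """
)


def generate_prompt_dedented(query: str, context: str) -> str:
    """Same instructions as `generate_prompt`, without the leading indentation."""
    return PROMPT_TEMPLATE.format(query=query, context=context)
```

`str.format` only parses braces in the template itself, so braces that happen to appear inside a retrieved chunk are inserted verbatim.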

```python
prompt = generate_prompt("Tell me about your education", context)
```

```python
from utils.constants import TEST_QUERIES

# Build one prompt per test query, keyed by the query text
query_prompt_dict = {}

for query in TEST_QUERIES:
    search_results = PostgresClient.hybrid_search(query)
    context = ""
    for result in search_results:
        context += result["clean_text"] + "\n"
    prompt = generate_prompt(query, context)
    query_prompt_dict[query] = prompt
```
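Since the current workflow pastes prompts into ChatGPT by hand, it can help to dump `query_prompt_dict` into a single file. A minimal sketch; `export_prompts` and its divider format are hypothetical choices:

```python
from pathlib import Path
from typing import Mapping


def export_prompts(query_prompt_dict: Mapping[str, str], path: Path) -> Path:
    """Write every query and its prompt to one plain-text file, separated by
    a divider line, so each prompt can be copied into a chat UI by hand."""
    divider = "\n" + "=" * 40 + "\n"
    sections = [
        f"QUERY: {query}\n\n{prompt.strip()}\n"
        for query, prompt in query_prompt_dict.items()
    ]
    path.write_text(divider.join(sections), encoding="utf-8")
    return path
```

For example, `export_prompts(query_prompt_dict, Path("prompts.txt"))` writes all test prompts to `prompts.txt` in the working directory.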

## Generation

Until this goes live, I'll paste the prompts into ChatGPT by hand to demonstrate the idea, since I want to avoid paying for API calls until I need to.

Here's how it works:
- I run the hybrid search here in the notebook
- The `generate_prompt` function returns a prompt string containing the hybrid search context plus response instructions, which I paste into ChatGPT
- I paste ChatGPT's response back into this notebook as a preview of what our agent might say in the future!
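The steps above can be written so that the "model" is just a function from prompt to response, which would make swapping ChatGPT-by-hand for an API call a one-line change later. A sketch under that assumption; `run_generation`, `LLMFn`, and `manual_llm` are hypothetical names, and no specific API client is assumed:

```python
from typing import Callable, Dict, Mapping

# Hypothetical interface: any function that takes a prompt and returns the model's text.
LLMFn = Callable[[str], str]


def run_generation(query_prompt_dict: Mapping[str, str], llm: LLMFn) -> Dict[str, str]:
    """Send each prompt to `llm` and collect responses keyed by the original query."""
    responses: Dict[str, str] = {}
    for query, prompt in query_prompt_dict.items():
        responses[query] = llm(prompt)
    return responses


def manual_llm(prompt: str) -> str:
    """Stand-in for the current workflow: print the prompt, paste the reply back."""
    print(prompt)
    return input("Paste the ChatGPT response: ")
```

Today, `run_generation(query_prompt_dict, manual_llm)` mirrors the copy-paste loop; later, an API-backed function with the same signature could be passed in without touching the loop.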

### Results

`Query:` 
- Tell me about yourself

`Response:` 
- Sophie Marshall is a data engineer with a strong interdisciplinary background in psychology, economics, and mathematical modeling. She graduated from the University of Wisconsin–Madison in 2022 with a Bachelor of Science in Psychology and Economics with a mathematical emphasis. During her undergraduate years, she was a Division I athlete on the Wisconsin women’s rowing team, where she developed skills in discipline, teamwork, and resilience. Her work combines expertise in data engineering, machine learning, and analytics. One of her key accomplishments is the design and deployment of a retrieval-augmented generation (RAG) powered chatbot that enables natural language search across internal datasets. Sophie is currently seeking mid-level data or analytics engineering roles in the DC metro area, with a preference for hybrid work but openness to remote or in-person opportunities where the fit is right.

`Query:`
- What is your educational background

`Response:`
- Sophie Marshall graduated from the University of Wisconsin–Madison in 2022 with a Bachelor of Science in Psychology and Economics, with a mathematical emphasis. Her academic background reflects a strong interdisciplinary foundation in human cognition, data modeling, statistical analysis, and systems thinking—skills that directly support her work in data engineering and machine learning.

`Query:`
- Why are you seeking a new position?

`Response:`
- Sophie is seeking a new position because the grant supporting her work on PBS’s Innovation Team concludes in June 2025. She’s looking for her next opportunity—ideally one where she can continue building intelligent data systems that bridge infrastructure and user experience.

`Query:`
- What experience do you have with data pipelines?

`Response:`
- Sophie has extensive experience designing and building data pipelines that integrate machine learning, data engineering, and backend infrastructure. At PBS’s Digital Innovation Team, she has led projects involving dynamic pipeline development, ETL workflows, vector database integration, and retrieval-augmented systems. Her work includes building robust data workflows using Python, Apache Airflow, and AWS Step Functions, as well as designing data labeling pipelines for machine learning model training and fine-tuning.
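Whether these responses stick to the retrieved context is currently judged by reading them. A crude, hypothetical lexical check is to list the content words in a response that never appear in its context. This assumes you also keep each query's context string, which the loop that builds `query_prompt_dict` currently discards, and the tiny stopword list is illustrative only:

```python
import re
from typing import Set

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {
    "a", "an", "and", "the", "of", "in", "on", "to", "for", "with", "is",
    "she", "her", "as", "at", "by", "that", "this", "where", "but", "or",
}


def content_words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def unsupported_terms(response: str, context: str) -> Set[str]:
    """Content words in `response` that never appear in `context`.

    A crude lexical signal only: a non-empty result flags wording worth
    checking by hand, not a confirmed hallucination (paraphrases show up too).
    """
    return content_words(response) - content_words(context)
```

For a context of `"Sophie graduated from UW-Madison in 2022."`, the response `"Sophie graduated in 2022 from Harvard."` returns `{"harvard"}`, flagging the one claim the context doesn't support.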