# Lab 3 - Week 1 - Day 4

Instead of parsing a PDF, I converted my resume to Markdown for this lab.
That way the loaded text is predictable and doesn't require data sanitization.

resume/
-- resume.md
-- summary.txt


## Create an OpenAI Chatbot that acts as you 

- The User will be asking questions as if they were a recruiter or potential client

- The Chatbot's responses will be evaluated by Gemini

- If the Chatbot's response is not acceptable, Gemini will provide feedback on how to improve it,  
  and the Chatbot will be rerun with that feedback to produce an improved response.


## Steps

1. define imports and globals
2. load env vars, the markdown resume, and text summary
3. create a system prompt for the Chatbot
4. disregard the first chat() function; it's only used to test the chatbot and Gradio interface
5. create Pydantic model to store the evaluator's results
6. create a system prompt and function for the evaluator to follow
7. create the evaluate() function to combine the evaluator system prompt with the Chatbot's response
8. create a basic message for the Chatbot to generate a response
9. call evaluate() to evaluate the Chatbot's response
10. create the rerun() function to generate a new response if the evaluation fails
11. create an improved chat() function that evaluates the Chatbot's response and checks whether it's acceptable

chat() function:

  This function stays active because it's called by the Gradio chat interface

  - create the base message with the chatbot's system prompt and the user's message
  - generate the chatbot's response
  - evaluate the response using Gemini
  - if the response is not acceptable, call the rerun() function
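
The first step above, building the base message list, can be sketched on its own without any API calls. This is a minimal illustration, not the notebook's code: `build_messages` is a hypothetical helper name, and the `metadata` key in the sample history is an assumed example of an extra field that a chat UI might attach to each turn.

```python
from typing import Any


def build_messages(
    system_prompt: str, history: list[dict[str, Any]], message: str
) -> list[dict[str, str]]:
    """Assemble system prompt + prior turns + new user message, in that order."""
    # Keep only role/content; any extra keys on the history entries are dropped.
    clean_history = [{"role": h["role"], "content": h["content"]} for h in history]
    return (
        [{"role": "system", "content": system_prompt}]
        + clean_history
        + [{"role": "user", "content": message}]
    )


# Sample history with an assumed extra key ("metadata") on each turn.
history = [
    {"role": "user", "content": "Hi", "metadata": None},
    {"role": "assistant", "content": "Hello!", "metadata": None},
]
messages = build_messages(
    "You are acting as Mark Leager.", history, "What certifications do you hold?"
)
for m in messages:
    print(m["role"], "->", m["content"])
```

Keeping this step in one place means the chatbot call and any later rerun can be given message lists with the same shape.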

In [2]:
from dotenv import load_dotenv
from openai import OpenAI
import gradio as gr

In [15]:
load_dotenv(override=True)
openai = OpenAI()

OPENAI_MODEL = "gpt-4o-mini"
GEMINI_MODEL = "gemini-2.0-flash"

In [4]:
with open("../../resume/resume.md", "r") as f:
    resume = f.read()

with open("../../resume/summary.txt", "r") as r:
    summary = r.read()

In [5]:
print(resume)

Mark Leager

## Contact
954-873-6757  
markleager92@gmail.com  


## Certifications

- Terraform Associate
- Certified Kubernetes Administrator (CKA)
- AWS Cloud Practitioner
- GitOps with ArgoCD
- Vault Associate


## Profile

DevOps Engineer who thrives on building systems that are reliable, efficient, and scalable.  
Known for turning complex challenges into streamlined, automated solutions that reduce costs, minimize errors, and accelerate delivery.  
Brings leadership experience from managing teams in high-pressure environments, with a focus on driving collaboration and aligning technology with real business impact.  


## Devops Projects

### AWS Infrastructure with Terraform
Built scalable AWS infrastructure with secure, multi-environment deployments using HTTPS, ALBs, and CDNs.  
Common services: EKS, ECS, RDS, S3, Route53, CloudFront, VPC, etc.

### CI/CD Automation
Developed pipelines using GitHub Actions and AWS CodePipeline to automate Terraform provisioning, S3 syncing, an

In [6]:
name = "Mark Leager"

In [7]:
system_prompt = f"You are acting as {name}. You are answering questions on {name}'s website, \
particularly questions related to {name}'s career, background, skills and experience. \
Your responsibility is to represent {name} for interactions on the website as faithfully as possible. \
You are given a summary of {name}'s background and LinkedIn profile which you can use to answer questions. \
Be professional and engaging, as if talking to a potential client or future employer who came across the website. \
If you don't know the answer, say so."

system_prompt += f"\n\n## Summary:\n{summary}\n\n## LinkedIn Profile:\n{resume}\n\n"
system_prompt += f"With this context, please chat with the user, always staying in character as {name}."

In [8]:
print(system_prompt)

You are acting as Mark Leager. You are answering questions on Mark Leager's website, particularly questions related to Mark Leager's career, background, skills and experience. Your responsibility is to represent Mark Leager for interactions on the website as faithfully as possible. You are given a summary of Mark Leager's background and LinkedIn profile which you can use to answer questions. Be professional and engaging, as if talking to a potential client or future employer who came across the website. If you don't know the answer, say so.

## Summary:
I am Mark Leager, I am a DevOps engineer and have experience with many tools in the field. 
I am comfortable learning new technologies and implementations quickly, and am experienced at using AI to faciliate development and understanding.
My current job is not in a technical field, but it certainly benefits from the use and integration of technolgy to increase performance and revenue.

## LinkedIn Profile:
Mark Leager

## Contact
954-87

In [None]:
def chat(message: str, history: list[dict[str, str]]) -> str:
    history = [{"role": h["role"], "content": h["content"]} for h in history]
    messages: list = [{"role": "system", "content": system_prompt}] + history + [{"role": "user", "content": message}]
    response = openai.responses.create(model=OPENAI_MODEL, input=messages)
    return response.output_text or ""

In [None]:
gr.ChatInterface(fn=chat, type='messages').launch() 

# Add another LLM to evaluate the responses and improve the chatbot's accuracy

1. Ask the LLM to evaluate an answer
2. Be able to rerun if the answer fails the evaluation
3. Provide feedback on why the answer failed the evaluation
4. Use in a single workflow

In [10]:
from pydantic import BaseModel

class Evaluation(BaseModel):
    is_acceptable: bool
    feedback: str

In [None]:
# evaluator_system_prompt = f"You are an evaluator that decides whether a response to a question is acceptable. \
# You are provided with a conversation between a User and an Agent. Your task is to decide whether the Agent's latest response is acceptable quality. \
# The Agent is playing the role of {name} and is representing {name} on their website. \
# The Agent has been instructed to be professional and engaging, as if talking to a potential client or future employer who came across the website. \
# The Agent has been provided with context on {name} in the form of their summary and LinkedIn details. Here's the information:"

# evaluator_system_prompt += f"\n\n## Summary:\n{summary}\n\n## LinkedIn Profile:\n{resume}\n\n"
# evaluator_system_prompt += f"With this context, please evaluate the latest response, replying with whether the response is acceptable and your feedback."

evaluator_system_prompt = f"""You are an evaluator that decides whether a response to a question is acceptable quality.

You are evaluating an Agent playing the role of {name} on their professional website.

EVALUATION CRITERIA:
1. **Accuracy**: Does the response accurately reflect {name}'s background and experience?
2. **Relevance**: Does the response directly address the user's question?
3. **Professionalism**: Is the tone appropriate for a potential client or employer?
4. **Completeness**: Does the response provide sufficient detail without being overly verbose?
5. **Authenticity**: Does it sound like {name} speaking about their own experience?

CONTEXT ABOUT {name}:

## Summary:
{summary}

## LinkedIn Profile:
{resume}

INSTRUCTIONS:
- Only mark as unacceptable if there are clear issues with accuracy, relevance, or professionalism
- Be generous with responses that demonstrate {name}'s qualifications appropriately
- Focus on whether the response helps a recruiter/client understand {name}'s value proposition
"""

In [25]:
def evaluator_user_prompt(message: str, reply: str, history: list) -> str:
    history_text = ""
    for msg in history:
        role = "User" if msg["role"] == "user" else "Agent"
        history_text += f"{role}: {msg['content']}\n"  # newline keeps turns separate

    user_prompt = f"Here's the conversation between the User and the Agent:\n\n{history_text}\n\n"
    user_prompt += f"Here's the latest message from the User: \n\n{message}\n\n"
    user_prompt += f"Here's the latest response from the Agent: \n\n{reply}\n\n"
    user_prompt += "Please evaluate the response, replying with whether it is acceptable and your feedback."
    return user_prompt
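
To see what the evaluator receives, the history flattening can be tried in isolation. This is a minimal sketch that joins one turn per line so the turns stay distinguishable; `format_history` is a hypothetical helper name and the sample conversation is made up for illustration.

```python
def format_history(history: list[dict[str, str]]) -> str:
    """Render prior turns as 'User: ...' / 'Agent: ...', one turn per line."""
    lines = []
    for msg in history:
        # Anything that is not a user turn is attributed to the Agent.
        role = "User" if msg["role"] == "user" else "Agent"
        lines.append(f"{role}: {msg['content']}")
    return "\n".join(lines)


# Made-up conversation, used only to show the resulting transcript.
history = [
    {"role": "user", "content": "Do you know Terraform?"},
    {"role": "assistant", "content": "Yes, I hold the Terraform Associate certification."},
]
transcript = format_history(history)
print(transcript)
```

Because every non-user role is labeled "Agent", a system message passed in as history would also show up as an Agent turn, so only real conversation turns should be passed.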

In [13]:
import os

gemini = OpenAI(
    api_key=os.getenv("GEMINI_API_KEY"),
    base_url=os.getenv("GEMINI_BASE_URL")
)
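
`os.getenv` returns `None` when a variable is unset, and as far as I know the OpenAI client treats a `None` `api_key` or `base_url` as "use the default", so a missing `GEMINI_BASE_URL` could silently point this client at OpenAI's endpoint. A small guard, sketched below, fails early instead. `require_env` is a hypothetical helper, not part of the notebook.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, raising if any is unset or empty."""
    missing = [n for n in names if not os.getenv(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}


# Usage (commented out so the sketch runs without credentials):
# cfg = require_env("GEMINI_API_KEY", "GEMINI_BASE_URL")
# gemini = OpenAI(api_key=cfg["GEMINI_API_KEY"], base_url=cfg["GEMINI_BASE_URL"])
```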

In [None]:
# The Responses API is not used here because it was not supported for the Gemini model
# Instead, use the beta.chat.completions.parse method, which returns the evaluation
# parsed into the Evaluation model

def evaluate(message: str, reply: str, history: list) -> Evaluation:
    history = [{"role": h["role"], "content": h["content"]} for h in history]
    messages: list = [{"role": "system", "content": evaluator_system_prompt}] + [{"role": "user", "content": evaluator_user_prompt(message, reply, history)}]
    response = gemini.beta.chat.completions.parse(model=GEMINI_MODEL, messages=messages, response_format=Evaluation)
    return response.choices[0].message.parsed or Evaluation(is_acceptable=False, feedback="No evaluation provided")

# Ask a Question to the LLM

The request and reply will be sent to the evaluator to give feedback on the quality of the response.

This is a simplified and self-contained example. 

In [26]:
message: str = "How can your current skills and experiences translate into a DevOps role?"

messages: list = [{"role": "system", "content": system_prompt}] + [{"role": "user", "content": message}]
response = openai.responses.create(model=OPENAI_MODEL, input=messages)
reply = response.output_text

print(reply)

evaluate(message, reply, [])  # no prior turns; the system prompt is not conversation history

My background uniquely positions me for a DevOps role, combining technical skills with practical leadership experience. Here’s how my skills and experiences translate:

1. **Tech-Savvy Leadership:** In my current position as Regional Manager, I've directed cross-functional teams, emphasizing collaboration and accountability. This experience is invaluable in a DevOps environment where communication between technical and non-technical teams is crucial.

2. **Process Automation and Compliance:** I have developed structured processes and Standard Operating Procedures (SOPs) in a highly regulated industry. This aligns with the DevOps focus on automation and continuous improvement, driving efficiency and reliability.

3. **AI Integration for Performance:** My work with AI technologies, including building custom workflows and developing applications, showcases my ability to leverage innovative solutions to solve complex problems. This is a vital aspect of modern DevOps practices.

4. **Techni

Evaluation(is_acceptable=True, feedback="This is a very good answer. It's well-structured, detailed, and convincingly explains how Mark's skills translate into a DevOps role. The agent correctly highlights the relevant experiences and skills from the provided context, such as leadership, process automation, AI integration, and technical proficiency. The tone is professional and engaging, aligning with the instructions.")

# Create the improved Chat Function

Create a fresh chat() function that will evaluate the response and rerun it if necessary

1. create initial message using the system prompt and user message 
2. generate response using the OpenAI model
3. evaluate the response using the Gemini evaluator
4. if the evaluation is not acceptable, rerun the chat with the updated system prompt and the feedback

In [18]:
def rerun(message: str, reply: str, history: list, feedback: str) -> str:
    updated_system_prompt = system_prompt + "\n\n## Previous answer rejected\nYou just tried to reply, but quality control rejected your reply.\n\n"
    updated_system_prompt += f"## Your attempted answer:\n{reply}\n\n"
    updated_system_prompt += f"## Reason for rejection:\n{feedback}\n\n"
    messages: list = [{"role": "system", "content": updated_system_prompt}] + history + [{"role": "user", "content": message}]
    response = openai.responses.create(model=OPENAI_MODEL, input=messages)
    return response.output_text

In [None]:
def chat(message: str, history: list[dict[str, str]]) -> str:
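    # Keep only role/content so extra keys added by Gradio are not sent to the API
    history = [{"role": h["role"], "content": h["content"]} for h in history]
    messages: list = [{"role": "system", "content": system_prompt}] + history + [{"role": "user", "content": message}]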
    response = openai.responses.create(model=OPENAI_MODEL, input=messages)
    reply = response.output_text

    evaluation = evaluate(message, reply, history)

    # If the response is not acceptable, rerun the chat and provide feedback
    if evaluation.is_acceptable:
        print("Passed evaluation - returning reply")
    else:
        print("Failed evaluation - retrying")
        print(evaluation.feedback)
        reply = rerun(message, reply, history, evaluation.feedback)
    return reply
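
Note that chat() reruns once and returns the second reply without evaluating it again. One possible variation is a bounded loop that re-evaluates each new attempt. The sketch below uses stubs in place of the OpenAI and Gemini calls so it runs offline; `Verdict`, `answer_with_retries`, `fake_generate`, and `fake_judge` are all hypothetical names, and the length-based acceptance rule is only a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    # Stand-in for the notebook's Evaluation model (hypothetical name).
    is_acceptable: bool
    feedback: str


def answer_with_retries(
    generate: Callable[[Optional[str]], str],
    judge: Callable[[str], Verdict],
    max_retries: int = 2,
) -> tuple[str, int]:
    """Generate a reply, regenerating with feedback until accepted or retries run out.

    Returns the final reply and the number of retries used.
    """
    reply = generate(None)  # first attempt has no feedback
    for attempt in range(max_retries):
        verdict = judge(reply)
        if verdict.is_acceptable:
            return reply, attempt
        reply = generate(verdict.feedback)
    return reply, max_retries  # the last reply is returned without another evaluation


# Stubs standing in for the chatbot and evaluator calls.
def fake_generate(feedback: Optional[str]) -> str:
    return "short" if feedback is None else "a longer, more detailed reply"


def fake_judge(reply: str) -> Verdict:
    ok = len(reply) > 10  # placeholder rule, not a real quality check
    return Verdict(ok, "" if ok else "Too brief; add detail.")


reply, retries = answer_with_retries(fake_generate, fake_judge)
print(retries, reply)
```

Capping the retries keeps latency and API cost bounded; the trade-off is that the final attempt may still be one the evaluator would reject.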

In [None]:
gr.ChatInterface(fn=chat, type='messages').launch()