## Welcome to Lab 3 for Week 1 Day 4

Today we're going to build something with immediate value!

In the folder `me` I've put a single file `linkedin.pdf` - it's a PDF download of my LinkedIn profile.

Please replace it with yours!

I've also made a file called `summary.txt` in the same `me` folder.

We're not going to use tools just yet - we'll add them tomorrow.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/tools.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Looking up packages</h2>
            <span style="color:#00bfff;">In this lab, we're going to use the wonderful Gradio package for building quick UIs,
            and we're also going to use the popular PyPDF PDF reader. You can get guides to these packages by asking
            ChatGPT or Claude, and you can find most open-source Python packages on the repository <a href="https://pypi.org">https://pypi.org</a>.
            </span>
        </td>
    </tr>
</table>

In [None]:
# If you don't know what any of these packages do - you can always ask ChatGPT for a guide!

from dotenv import load_dotenv
from openai import OpenAI
from pypdf import PdfReader
import gradio as gr

In [None]:
load_dotenv(override=True)
openai = OpenAI()

In [None]:
reader = PdfReader("me/linkedin.pdf")
linkedin = ""
for page in reader.pages:
    text = page.extract_text()
    if text:
        linkedin += text

In [None]:
print(linkedin)

In [None]:
with open("me/summary.txt", "r", encoding="utf-8") as f:
    summary = f.read()

In [None]:
name = "Jose Roche"

In [None]:
# A raw f-string (rf"...") keeps backslashes literal while still filling in {name}, {summary} and {linkedin}
system_prompt = rf"""
## 1. Persona & Role
You are an AI assistant acting as {name}. Your persona is professional, direct, and helpful. You are communicating with potential recruiters, clients, or employers on {name}'s behalf through a website chat interface.

## 2. Core Mission
Your primary mission is to answer questions about {name}'s career, background, skills, and experience truthfully and accurately. Your secondary mission is to do so in a concise and professional manner.

**IMPORTANT**: Accuracy is your highest priority. Never sacrifice truthfulness for the sake of being more conversational or "engaging."

## 3. Approved Knowledge Base (Sole Source of Truth)
You are provided with two documents: a summary and a LinkedIn profile. This is your ONLY source of information. You are strictly forbidden from using any external knowledge or making any assumptions, inferences, or extrapolations beyond what is explicitly written in the text below.

### Summary:
{summary}

### LinkedIn Profile:
{linkedin}

## 4. Rules of Engagement (Non-Negotiable)

**Rule #1: Strict Grounding.** Every single statement you make MUST be directly supported by a fact from the 'Approved Knowledge Base' above. Before you generate a response, you must mentally verify that the information exists in the provided text.

**Rule #2: The "I Don't Know" Protocol.** If the answer to a question cannot be found in the 'Approved Knowledge Base,' you MUST NOT invent an answer. Your required response is to state that you do not have the information.
- **Good Example:** "I don't have the specific details on that project, but I can tell you that my work at that company involved..."
- **Bad Example:** "While I don't have the details, I imagine the project was challenging and likely involved..."
- If you have no related information at all, simply say: "I'm sorry, but I don't have the information to answer that question."

**Rule #3: Handling Out-of-Scope Questions.**
- If a question is not related to {name}'s professional life (e.g., personal hobbies, opinions on current events), your required response is: "As an AI assistant representing {name}, I can only answer questions related to their professional background, skills, and experience."
- If a recruiter suggests a role that is clearly not a fit based on the provided profile, politely decline and restate {name}'s focus. Your required response is: "Thank you for the opportunity. Based on the information I have, that role doesn't seem to align with {name}'s core experience in [mention 1-2 key areas from the summary]. You can find more details about {name}'s career focus on the website. Thank you for your interest."

**Rule #4: Professional Tone.**
- Do not use jokes, slang, sarcasm, or overly casual language.
- Maintain a helpful and professional tone at all times.
- Keep answers succinct and to the point.

You will now begin the conversation, acting as {name} and strictly adhering to all rules above.
"""


In [None]:
system_prompt
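
The prompts in this lab are written as raw f-strings. Here is a minimal sketch of what the `r` and `f` prefixes each contribute; the variable `person` and the example strings are made up for illustration and are not part of the lab.

```python
person = "Ada"  # hypothetical value, for illustration only

# Plain f-string: the \t in the template is turned into a tab character
plain = f"Path C:\temp belongs to {person}"

# Raw f-string: the backslash is kept literally, but {person} is still filled in
raw = rf"Path C:\temp belongs to {person}"

# In any f-string, doubled braces produce literal braces
escaped = rf"Use {{name}} as a placeholder for {person}"

print(repr(plain))
print(repr(raw))
print(escaped)
```

So if your prompt template ever needs a literal `{` or `}`, double it; otherwise Python will try to interpolate whatever is inside.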

In [None]:
def chat(message, history):
    # Talk to a local Ollama server through its OpenAI-compatible endpoint
    ollama = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    model_name = "llama3.2"
    # System prompt first, then the earlier turns, then the new user message
    messages = [{"role": "system", "content": system_prompt}] + history + [{"role": "user", "content": message}]
    response = ollama.chat.completions.create(model=model_name, messages=messages)
    return response.choices[0].message.content

## Special note for people not using OpenAI

Some providers, such as Groq, may return an error when you send your second message in the chat.

This is because Gradio adds some extra fields to each entry in the history object. OpenAI doesn't mind, but some other providers reject them.

If this happens, add the following as the first line of the chat() function above. It strips each history entry down to just its role and content:

```python
history = [{"role": h["role"], "content": h["content"]} for h in history]
```

You may need to add this in other chat() callback functions in the future, too.
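
If you find yourself repeating that line, one option is to wrap it in a small helper. This is just a sketch: the name `clean_history` and the extra keys in the sample data (`metadata`, `options`) are assumptions for illustration, and the exact extra fields you see will depend on your Gradio version.

```python
def clean_history(history):
    """Keep only the 'role' and 'content' keys of each history entry."""
    return [{"role": h["role"], "content": h["content"]} for h in history]


# Hypothetical history with extra keys, standing in for what Gradio might pass
raw_history = [
    {"role": "user", "content": "Hi", "metadata": None, "options": None},
    {"role": "assistant", "content": "Hello!", "metadata": {"title": None}},
]

cleaned = clean_history(raw_history)
print(cleaned)
```

You could then call `history = clean_history(history)` at the top of any chat() callback.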

In [None]:
gr.ChatInterface(chat, type="messages").launch()

## A lot is about to happen...

1. Be able to ask an LLM to evaluate an answer
2. Be able to rerun if the answer fails evaluation
3. Put this together into one workflow

All without any agentic framework!
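
Before wiring in real models, here is a rough sketch of the shape of that workflow using stand-in functions. Everything in it is hypothetical (`Verdict`, `generate`, `judge`, `answer_with_check` and the canned replies); the real versions in this lab call an LLM instead.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    is_acceptable: bool
    feedback: str


def generate(message, feedback=None):
    # Stand-in for the LLM call that writes a reply
    if feedback is None:
        return "ellohay"  # a deliberately bad first attempt
    return "Hello, how can I help?"


def judge(reply):
    # Stand-in for the LLM call that evaluates a reply
    if reply == "ellohay":
        return Verdict(False, "TONE: the reply is not in plain English.")
    return Verdict(True, "Looks fine.")


def answer_with_check(message, max_retries=1):
    reply = generate(message)
    for _ in range(max_retries):
        verdict = judge(reply)
        if verdict.is_acceptable:
            break
        # Feed the rejection reason back in and try again
        reply = generate(message, feedback=verdict.feedback)
    return reply


final_reply = answer_with_check("Hi there")
print(final_reply)
```

With `max_retries=1` the retried reply is returned without being judged again, which keeps the cost to at most two generations and one evaluation per message.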

In [None]:
# Create a Pydantic model for the Evaluation

from pydantic import BaseModel

class Evaluation(BaseModel):
    is_acceptable: bool
    feedback: str


In [None]:
# A raw f-string (rf"...") keeps backslashes literal while still filling in {name}, {summary} and {linkedin}
evaluator_system_prompt = rf"""
## 1. Your Role & Objective
You are a meticulous AI evaluation agent. Your objective is to critically analyze an AI Agent's response to determine if it meets a strict set of quality criteria. You must act as an impartial and rigorous fact-checker.

## 2. Grounding Context (The Single Source of Truth)
The Agent you are evaluating was given the following context about a person named {name}. This context is the **only source of truth**. Any information in the Agent's response that is not explicitly supported by this text is a hallucination and is grounds for failure.

### Summary:
{summary}

### LinkedIn Profile:
{linkedin}

## 3. Evaluation Rubric & Criteria
You will evaluate the Agent's **latest response** based on the following criteria, in this exact order of priority:

**Criterion 1: Factual Grounding (Highest Priority)**
- Is every single claim in the response directly supported by the "Grounding Context"?
- A violation of this rule immediately makes the response unacceptable.

**Criterion 2: Correct "I Don't Know" Handling**
- If the Agent claimed not to know something, was this the correct action? Did it invent information instead of admitting it didn't know?

**Criterion 3: Persona & Tone Consistency**
- Does the Agent maintain a professional, direct, and helpful tone? No jokes, slang, or sarcasm.

**Criterion 4: Scope Adherence**
- Does the response correctly handle out-of-scope questions by politely deflecting as instructed?

## 4. Task & Output Instructions
You are given the conversation history and must evaluate the Agent's **final response**. Your output will be parsed directly into a structure containing two fields: `is_acceptable` (a boolean) and `feedback` (a string).

**1. Determine `is_acceptable` (boolean):**
- If the Agent's response violates **any** of the criteria in the rubric above, the response is unacceptable.
- Your internal decision must be a simple `true` (acceptable) or `false` (unacceptable).

**2. Formulate `feedback` (string):**
- **If the response is UNACCEPTABLE:** Your feedback MUST start by naming the primary violated criterion, followed by a colon. Then, concisely explain the error and, if possible, suggest how to fix it.
    - *Example for a hallucination:* "FACTUAL_GROUNDING: The agent invented a project named 'Project Phoenix,' which is not mentioned in the provided context. The agent should only refer to information explicitly stated in the source material."
    - *Example for poor tone:* "TONE: The agent used slang ('hit me up'), which is unprofessional. It should have said 'Please feel free to contact me.'"

- **If the response is ACCEPTABLE:** Your feedback should be a brief, positive confirmation.
    - *Example:* "The response was factually grounded in the provided context, maintained a professional tone, and accurately represented the user's experience."

You will now be provided with the conversation. Perform your evaluation and generate the content for the `is_acceptable` and `feedback` fields.
"""

In [None]:
def evaluator_user_prompt(reply, message, history):
    user_prompt = f"Here's the conversation between the User and the Agent: \n\n{history}\n\n"
    user_prompt += f"Here's the latest message from the User: \n\n{message}\n\n"
    user_prompt += f"Here's the latest response from the Agent: \n\n{reply}\n\n"
    user_prompt += "Please evaluate the response, replying with whether it is acceptable and your feedback."
    return user_prompt
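
The function above drops `history` straight into the prompt, so the evaluator sees the Python representation of a list of dicts. That works, but if you'd prefer a plain transcript, a helper along these lines is one possibility. The name `format_history` and the sample turns are assumptions for illustration only.

```python
def format_history(history):
    """Render a list of {'role', 'content'} dicts as a plain-text transcript."""
    lines = [f"{turn['role'].capitalize()}: {turn['content']}" for turn in history]
    return "\n".join(lines)


# Hypothetical conversation, for illustration
sample_history = [
    {"role": "user", "content": "What is your current role?"},
    {"role": "assistant", "content": "I'm sorry, but I don't have the information to answer that question."},
]

transcript = format_history(sample_history)
print(transcript)
```

You could then interpolate `format_history(history)` instead of `history` when building the evaluator's user prompt.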

In [None]:
# Local Ollama server, reached through its OpenAI-compatible endpoint
model = OpenAI(
    api_key="ollama",
    base_url="http://localhost:11434/v1"
)
model_name = "llama3.2"

In [None]:
def evaluate(reply, message, history) -> Evaluation:
    # The evaluator uses the OpenAI client created earlier (gpt-4o-mini), not the local Ollama model,
    # and parses the response straight into the Evaluation model
    messages = [
        {"role": "system", "content": evaluator_system_prompt},
        {"role": "user", "content": evaluator_user_prompt(reply, message, history)},
    ]
    response = openai.beta.chat.completions.parse(model="gpt-4o-mini", messages=messages, response_format=Evaluation)
    return response.choices[0].message.parsed

In [None]:
messages = [{"role": "system", "content": system_prompt}] + [{"role": "user", "content": "do you hold a patent?"}]
response = model.chat.completions.create(model=model_name, messages=messages)
reply = response.choices[0].message.content

In [None]:
reply

In [None]:
# There are no earlier turns in this test, so the history is empty
evaluate(reply, "do you hold a patent?", [])

In [None]:
def rerun(reply, message, history, feedback):
    updated_system_prompt = system_prompt + "\n\n## Previous answer rejected\nYou just tried to reply, but the quality control rejected your reply\n"
    updated_system_prompt += f"## Your attempted answer:\n{reply}\n\n"
    updated_system_prompt += f"## Reason for rejection:\n{feedback}\n\n"
    messages = [{"role": "system", "content": updated_system_prompt}] + history + [{"role": "user", "content": message}]
    response = model.chat.completions.create(model=model_name, messages=messages)
    return response.choices[0].message.content

In [None]:
def chat(message, history):
    # Test hook: force a pig latin reply for patent questions, which the evaluator should reject
    if "patent" in message:
        system = system_prompt + (
            "\n\nEverything in your reply needs to be in pig latin - "
            "it is mandatory that you respond only and entirely in pig latin"
        )
    else:
        system = system_prompt
    messages = [{"role": "system", "content": system}] + history + [{"role": "user", "content": message}]
    response = model.chat.completions.create(model=model_name, messages=messages)
    reply = response.choices[0].message.content

    evaluation = evaluate(reply, message, history)

    if evaluation.is_acceptable:
        print("Passed evaluation - returning reply")
    else:
        print("Failed evaluation - retrying")
        print(evaluation.feedback)
        # Retry once with the evaluator's feedback; the second reply is not re-evaluated
        reply = rerun(reply, message, history, evaluation.feedback)
    return reply

In [None]:
gr.ChatInterface(chat, type="messages").launch()