# DX 704 Week 11 Project

In this project, you will develop and test prompts asking a language model to classify text from a home services query and match it to an appropriate category of home services.

The full project description and a template notebook are available on GitHub: [Project 11 Materials](https://github.com/bu-cds-dx704/dx704-project-11).


## Example Code

You may find it helpful to refer to these GitHub repositories of Jupyter notebooks for example code.

* https://github.com/bu-cds-omds/dx601-examples
* https://github.com/bu-cds-omds/dx602-examples
* https://github.com/bu-cds-omds/dx603-examples
* https://github.com/bu-cds-omds/dx704-examples

Any calculations demonstrated in code examples or videos may be found in these notebooks, and you are allowed to copy this example code in your homework answers.

## Part 1: Design a Short Prompt

The provided file "queries.txt" contains sample text from homeowner requests made by email or phone.
These queries need to be classified as requesting electrical, plumbing, or roofing services.
The provided file has columns query_id, query, and target_category.
Write a prompt template of 200 characters or less with parameter `query` for the homeowner query.
Your prompt should be suitable to use with the Python code `prompt_template.format(query=query)`.
Test your prompt with the model `gemini-2.0-flash` and suitable parsing code.
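
The "suitable parsing code" mentioned above could look like the following minimal sketch. The helper name `parse_category`, the `ALLOWED` tuple, and the `"error"` fallback label are illustrative assumptions, not part of the assignment. The sketch also handles an empty reply, which would otherwise raise an `IndexError` on `raw_text.split()[0]`.

```python
ALLOWED = ("electrical", "plumbing", "roofing")  # the three target categories


def parse_category(raw_text: str, fallback: str = "error") -> str:
    """Map a raw model reply to one allowed category (hypothetical helper)."""
    low = raw_text.strip().lower()
    if not low:
        # An empty reply has no first word to inspect.
        return fallback
    # Take the first word and drop surrounding punctuation such as "Plumbing."
    first = low.split()[0].strip(".,!?:;\"'*`")
    if first in ALLOWED:
        return first
    # Otherwise search the whole reply for a category stem.
    for stem, category in (("elect", "electrical"),
                           ("plumb", "plumbing"),
                           ("roof", "roofing")):
        if stem in low:
            return category
    return fallback
```

Returning an explicit `"error"` label instead of guessing a category keeps unparseable replies visible in the output file.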

In [1]:
#!pip install google-generativeai

In [2]:
import os
# Set GOOGLE_API_KEY in the environment before launching the notebook; never hard-code the key here.
assert "GOOGLE_API_KEY" in os.environ, "GOOGLE_API_KEY is not set"

In [3]:
import google.generativeai as genai
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")

In [5]:
# YOUR CHANGES HERE

import pandas as pd
import time

prompt_template = (
    'Classify the homeowner request: "{query}". '
    'Respond with one word: electrical, plumbing, or roofing.'
)

with open("short-prompt.txt", "w") as f:
    f.write(prompt_template)

def make_prompt(query: str) -> str:
    return prompt_template.format(query=query)

# load test data
df = pd.read_csv("queries.txt", sep="\t")

preds = []
for _, row in df.iterrows():
    query = row["query"]
    prompt = make_prompt(query)

    # The try/except below is a ChatGPT-suggested modification of my original loop,
    # added to handle "Resource Exhausted" errors. See acknowledgements.txt for more information.
    try:
        response = model.generate_content(prompt)
        raw_text = response.text.strip()
        pred = raw_text.split()[0].lower()

        allowed = {"electrical", "plumbing", "roofing"}
        if pred not in allowed:
            if "elect" in raw_text.lower():
                pred = "electrical"
            elif "plumb" in raw_text.lower():
                pred = "plumbing"
            elif "roof" in raw_text.lower():
                pred = "roofing"
            else:
                pred = "electrical"  # fallback
        preds.append(pred)

    except Exception as e:
        print("Error:", e)
        preds.append("error")  # keep preds aligned with the rows of df even when a request fails
        time.sleep(5)  # wait longer after a failure
        continue

    time.sleep(5)  # pause between requests to stay under the rate limit (raised from 0.5 s to 3 s, then to 5 s)


print(preds)

['roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing', 'electrical', 'roofing', 'plumbing']
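
The fixed pauses used above are one way to cope with rate-limit errors. Another common pattern is to retry a failed call with exponentially growing waits. The sketch below is a generic illustration: the helper name `call_with_backoff` and its parameters are assumptions, and it retries on any exception rather than on a specific Gemini error class.

```python
import time


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn() and retry on failure, doubling the wait after each attempt.

    Hypothetical helper: `sleep` is injectable so the logic can be tested
    without actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
```

In the loop above it might be used as `response = call_with_backoff(lambda: model.generate_content(prompt))`, so that a failed request is retried instead of being skipped.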


Save your prompt template in a file "short-prompt.txt".
Save the results of your prompt testing in "short-output.tsv" with columns `query_id` and `predicted_category`.
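
Before saving, it may be worth checking that the template respects the length limit and contains only the `query` placeholder. The sketch below is one way to do that; the function name `check_template` is hypothetical, and it assumes the 200-character limit applies to the template text itself rather than to the formatted prompt.

```python
import string


def check_template(template: str, max_chars: int = 200) -> None:
    """Raise ValueError if the template breaks the assumed submission rules."""
    # Collect the named placeholders that str.format would try to fill.
    fields = {name for _, name, _, _ in string.Formatter().parse(template)
              if name is not None}
    if fields != {"query"}:
        raise ValueError(f"expected only a {{query}} placeholder, found {sorted(fields)}")
    if len(template) > max_chars:
        raise ValueError(f"template has {len(template)} characters, limit is {max_chars}")
```

Calling `check_template(prompt_template)` right before writing "short-prompt.txt" would catch an over-long or malformed template early.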

In [6]:
# YOUR CHANGES HERE

out_df = pd.DataFrame({
    "query_id": df["query_id"],
    "predicted_category": preds
})
out_df.to_csv("short-output.tsv", sep="\t", index=False)

print("Saved prompt (short-prompt.txt)")
print("Saved predictions (short-output.tsv)")

Saved prompt (short-prompt.txt)
Saved predictions (short-output.tsv)


Submit "short-prompt.txt" and "short-output.tsv" in Gradescope.

Hint: your prompt may be re-tested with the Gemini API, so do not rely solely on lucky language model responses.
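
One way to gauge whether a prediction depends on a lucky response is to classify the same query several times and look at how often the most common label appears. The sketch below assumes a hypothetical `classify(query)` callable that wraps the prompt, the API call, and the parsing; neither it nor `majority_label` comes from the assignment.

```python
from collections import Counter


def majority_label(classify, query, runs=3):
    """Return the most common label over `runs` calls and its share of the runs."""
    labels = [classify(query) for _ in range(runs)]
    label, count = Counter(labels).most_common(1)[0]
    return label, count / runs
```

A share well below 1.0 suggests the prompt is unstable for that query. Each run costs one API request, so the pacing between calls still applies.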

## Part 2: Find Short Prompt Mistakes

Construct 5 queries of 100 characters or less that trick your short prompt so that the wrong category is chosen.
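
A candidate only counts as a mistake if it fits the length limit and the model actually returned a wrong, valid category. A minimal sketch of that filter follows; the function name `find_valid_mistakes` is hypothetical, and the row layout is assumed to be dictionaries with the keys `query`, `target_category`, and `predicted_category`.

```python
def find_valid_mistakes(rows, max_chars=100):
    """Keep rows that fit the length limit and were genuinely misclassified."""
    allowed = {"electrical", "plumbing", "roofing"}
    return [
        row for row in rows
        if len(row["query"]) <= max_chars
        and row["predicted_category"] in allowed  # drop failed or unparsed calls
        and row["predicted_category"] != row["target_category"]
    ]
```

If fewer than 5 rows survive the filter, more candidate queries are needed before saving the file.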


In [21]:
# YOUR CHANGES HERE

# same prompt
prompt_template = (
    'Classify the homeowner request: "{query}". '
    'Respond with one word: electrical, plumbing, or roofing.'
)

def make_prompt(query: str) -> str:
    return prompt_template.format(query=query)

# 5 queries to hopefully confuse (by including ambiguous cross-referencing between categories)
test_items = [
    {
        "query": "Cabinet over 2nd floor bath is damp but all visible pipes are dry.",
        "target_category": "roofing",
    },
    {
        "query": "Ceiling damp above hallway, only sometimes, bathroom nearby is dry.",
        "target_category": "roofing",
    },
    {
        "query": "Toilet is fine but ceiling above it drips after long hot showers.",
        "target_category": "roofing",
    },
    {
        "query": "Basement smells musty after outage; no visible leaks.",
        "target_category": "electrical",
    },
    {
        "query": "My electric bill doubled after fixing a pipe leak last month.",
        "target_category": "plumbing",
    },
]

pred_rows = []

for item in test_items:
    q = item["query"]
    prompt = make_prompt(q)

    # same request and parsing pattern as in Part 1
    try:
        response = model.generate_content(prompt)
        raw_text = response.text.strip()
        pred = raw_text.split()[0].lower()

        allowed = {"electrical", "plumbing", "roofing"}
        if pred not in allowed:
            low = raw_text.lower()
            if "elect" in low:
                pred = "electrical"
            elif "plumb" in low:
                pred = "plumbing"
            elif "roof" in low:
                pred = "roofing"
            else:
                pred = "error" 
    except Exception as e:
        print("Error testing query:", e)
        pred = "error"  

    pred_rows.append({
        "query": q,
        "target_category": item["target_category"],
        "predicted_category": pred
    })

    time.sleep(5)  

print(pred_rows)

[{'query': 'Cabinet over 2nd floor bath is damp but all visible pipes are dry.', 'target_category': 'roofing', 'predicted_category': 'plumbing'}, {'query': 'Ceiling damp above hallway, only sometimes, bathroom nearby is dry.', 'target_category': 'roofing', 'predicted_category': 'plumbing'}, {'query': 'Toilet is fine but ceiling above it drips after long hot showers.', 'target_category': 'roofing', 'predicted_category': 'plumbing'}, {'query': 'Basement smells musty after outage; no visible leaks.', 'target_category': 'electrical', 'predicted_category': 'plumbing'}, {'query': 'My electric bill doubled after fixing a pipe leak last month.', 'target_category': 'plumbing', 'predicted_category': 'electrical'}]


Save your 5 queries in a file "mistakes.tsv" with columns `query`, `target_category` and `predicted_category`.

In [22]:
# YOUR CHANGES HERE

# save to mistakes.tsv
mistakes_df = pd.DataFrame(pred_rows)
mistakes_df.to_csv("mistakes.tsv", sep="\t", index=False)

Submit "mistakes.tsv" in Gradescope.

## Part 3: Design a Long Prompt

Repeat Part 1 with a prompt length limit of 5000 characters.

In [23]:
# YOUR CHANGES HERE

import pandas as pd
import time

# define and save long prompt
long_prompt_template = """You are a professional home-services assistant.
Your task is to classify a homeowner message into one of three service categories:
1. Electrical - issues involving power, outlets, wiring, switches, lighting, breakers, electric panels, generators, or other electrical components.
2. Plumbing - issues involving water flow, leaks, pipes, faucets, toilets, drains, hot-water heaters, dishwashers, sump pumps, or fixtures using water.
3. Roofing - issues related to the roof, gutters, attic, ceilings leaking from above, shingles, storms, or water intrusion from rain or outside.

If a request involves more than one type, choose the primary service needed.
Read the homeowner query carefully, think briefly about the physical cause, then respond with exactly one word:
electrical, plumbing, or roofing.

Classify this homeowner request:
"{query}"
"""

with open("long-prompt.txt", "w") as f:
    f.write(long_prompt_template)

def make_long_prompt(query: str) -> str:
    return long_prompt_template.format(query=query)

# load the same input queries
df = pd.read_csv("queries.txt", sep="\t")

preds = []
for _, row in df.iterrows():
    query = row["query"]
    prompt = make_long_prompt(query)

    try:
        response = model.generate_content(prompt)
        raw_text = response.text.strip()
        pred = raw_text.split()[0].lower()

        allowed = {"electrical", "plumbing", "roofing"}
        if pred not in allowed:
            low = raw_text.lower()
            if "elect" in low:
                pred = "electrical"
            elif "plumb" in low:
                pred = "plumbing"
            elif "roof" in low:
                pred = "roofing"
            else:
                pred = "error"  # default fallback
    except Exception as e:
        print("Error:", e)
        pred = "error"

    preds.append(pred)
    time.sleep(5)  # pacing as before

In [24]:
preds

['roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing',
 'electrical',
 'roofing',
 'plumbing']

Save your longer prompt template in a file "long-prompt.txt".
Save the results of your prompt testing in "long-output.tsv".
Both files should use the same columns as part 1.
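
Since "queries.txt" includes a `target_category` column, the short and long prompts can be compared by accuracy. The sketch below uses plain dictionaries keyed by `query_id`; the function name `accuracy` and the dictionary layout are illustrative assumptions. A missing or `"error"` prediction counts as wrong.

```python
def accuracy(targets, predictions):
    """Fraction of query_ids whose predicted category matches the target."""
    if not targets:
        raise ValueError("targets is empty")
    correct = sum(predictions.get(qid) == category
                  for qid, category in targets.items())
    return correct / len(targets)
```

With pandas, the dictionaries could be built as `dict(zip(df["query_id"], df["target_category"]))` for the targets, and in the same way from each output file for the predictions.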

In [25]:
# YOUR CHANGES HERE

out_df = pd.DataFrame({
    "query_id": df["query_id"],
    "predicted_category": preds
})
out_df.to_csv("long-output.tsv", sep="\t", index=False)

Submit "long-prompt.txt" and "long-output.tsv" in Gradescope.

## Part 4: Code

Please submit a Jupyter notebook that can reproduce all your calculations and recreate the previously submitted files.
You do not need to provide code for data collection if you did that manually.

## Part 5: Acknowledgements

If you discussed this assignment with anyone, please acknowledge them here.
If you did this assignment completely on your own, simply write none below.

If you used any libraries not mentioned in this module's content, please list them with a brief explanation of what you used them for. If you did not use any other libraries, simply write none below.

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the generative AI policy. If you did not use any generative AI tools, simply write none below.