# Milestone \#3  Usage Scenario 1

In [3]:
import mermaid
from mermaid.graph import Graph

## 1. Short Description

A young working professional uses *Looking Glass* to understand patterns in weekly work stress, communication habits, and emotional shifts across their ongoing career.


## 2. Narrative of Scenario

After a long workday, the user opens Looking Glass and uploads their weekly materials such as notes, short diaries, and fragments of Slack conversations. Once the new entry is saved, the system quietly sends the text to the LLM for analysis. The LLM detects patterns related to deadlines, pressure points, and emotional cues, then returns a structured output including triggers, a concise summary, reflective questions, and selected quotes from the user's writing as reference.

The chatbot then presents the user with optional next steps: (1) explore stress origins through guided reflective dialogue, (2) identify and refine a professional development goal, (3) compare this week's emotional or thematic patterns to previous weeks, and (4) visualize long-term changes such as stress trajectories or productivity confidence. If the user has set a professional goal like "speak up more in meetings" or "improve work-life balance", *Looking Glass* tracks progress by combining LLM interpretation of the user's writing with direct user input on goal attainment. Finally, once the user confirms a small change or weekly micro-plan, *Looking Glass* saves it and marks it for follow-up in the next weekly review.



In [5]:
%%mermaidjs
sequenceDiagram
    actor U as User
    participant R as Looking Glass
    participant L as LLM

    U->>R: Upload weekly notes, diaries, and Slack fragments
    R->>L: Send text for analysis
    L-->>R: Return stress triggers + summary + themes + quotes + questions
    R-->>U: Display insights and offer next-step options
    U->>R: Select a reflective option (dialogue / goal / comparison / trajectories)
    R->>L: Analyze prior entries for comparison or trends
    L-->>R: Return emotional/thematic patterns across time
    R-->>U: Present trends or goal-tracking insights
    U->>R: Confirm small change or weekly micro-plan
    R-->>U: Save plan and schedule follow-up for next review


## 3. Data Description


#### Training data

- Labeled journal entries/Emotion classification dataset
- HappyDB (GitHub)
- Public forum posts (Reddit)

#### User-given data

- Weekly work logs
- Short diaries or personal reflections 
- Slack or Teams messages and email fragments
- Notes from meetings, task updates, or project check-ins

**What the LLM receives from this data:**

- All user text consolidated into one weekly packet
- Each text segment paired with a loose timestamp
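
The consolidation step above can be sketched as follows. This is a minimal illustration, not the project's actual ingestion code: the `Segment` class, the `build_weekly_packet` function, and the bracketed `[when | source]` prefix are all hypothetical names and conventions. The section labels (`WEEK_TEXT`, `WEEK_METADATA`, `OPTIONAL_PAST_SUMMARY`) mirror the input format described in the structured prompt later in this notebook.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One piece of user text with a loose timestamp (hypothetical structure)."""
    when: str    # loose timestamp such as "Mon" or "Wed afternoon"
    source: str  # for example "diary", "slack", "meeting notes"
    text: str


def build_weekly_packet(
    segments: List[Segment],
    week_id: str,
    goal: Optional[str] = None,
    past_summary: str = "",
) -> str:
    """Merge all segments and metadata into one plain-text weekly packet."""
    # Each segment keeps its loose timestamp and source as a bracketed prefix.
    week_text = "\n".join(
        f"[{s.when} | {s.source}] {s.text.strip()}" for s in segments
    )
    metadata = [
        f"week_id: {week_id}",
        f"has_active_goal: {'true' if goal else 'false'}",
    ]
    if goal:
        metadata.append(f"current_goal_description: {goal}")
    parts = [
        "WEEK_TEXT:",
        week_text,
        "",
        "WEEK_METADATA:",
        *metadata,
        "",
        "OPTIONAL_PAST_SUMMARY:",
        past_summary,
    ]
    return "\n".join(parts)
```

The returned string could be passed as the user message to the reflection prompt; keeping the timestamp next to each segment is what lets the model label moments by day.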


#### Model output

- A weekly stress imagery map  
  - A simple internal graph where each node represents a key moment from the week (for example "Mon – rushed client deck (stress 8)")  
  - Edges connect moments that share the same underlying theme or stressor (for example deadline pressure, communication friction), so the interface can later show how scattered events cluster into one pattern  
- A simple graph of stress triggers across the week  
  - Each point marks a moment the model flags as a pressure point (e.g., "Wed client call", "Fri last-minute change")  
- A short weekly reflection summary  
  - One compact paragraph capturing the main stressors, small wins, and overall tone of the week  
- Two to three tailored reflection questions  
- One small actionable change plan for the coming week   
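
To illustrate how the imagery map could let the interface show scattered events clustering into one pattern, here is a small sketch that groups nodes into connected components. It assumes the map arrives as a dict with a `nodes` list of labels and an `edges` list of index pairs, as in the output format defined in the structured prompt. The function name `cluster_moments` and the example data are hypothetical.

```python
from typing import Dict, List


def cluster_moments(imagery_map: Dict) -> List[List[str]]:
    """Group node labels into clusters of moments linked by edges."""
    nodes = imagery_map.get("nodes", [])
    parent = list(range(len(nodes)))  # union-find forest, one tree per cluster

    def find(i: int) -> int:
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in imagery_map.get("edges", []):
        # Skip edges that point outside the node list instead of failing.
        if 0 <= a < len(nodes) and 0 <= b < len(nodes):
            parent[find(a)] = find(b)

    clusters: Dict[int, List[str]] = {}
    for i, label in enumerate(nodes):
        clusters.setdefault(find(i), []).append(label)
    return list(clusters.values())


example = {
    "nodes": [
        "Mon - rushed client deck (stress 8)",
        "Wed - client call",
        "Fri - last-minute change",
    ],
    "edges": [[0, 2]],
}
print(cluster_moments(example))
```

In this example the Monday and Friday moments end up in one cluster because a single edge links them, while the Wednesday call stays on its own.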


### Evaluation

#### Success Criteria
- Triggers on the graph align with the actual high pressure moments described in the user text  
- The weekly summary stays within an expected length range and feels neutral, accurate, and non judgmental  
- Extracted quotes are copied exactly from the user's writing and clearly support the summary or themes, with no hallucinated content  
- The emotional trend label (for example rising stress, stabilizing) is consistent with the overall tone across multiple entries in that week  
- The suggested small change is concrete and doable (the user can imagine how to try it next week, rather than feeling it is vague)  
- Reflection questions are specific to this week's patterns and help the user notice at least one new connection or insight  
- When the input is clearly sparse or dominated by one source (for example only emails), the system surfaces a coverage warning rather than overstating conclusions
- Under adversarial input, the model still:

   * Returns JSON
   * Does not hallucinate
   * Clearly notes off-topic input
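
Several of these criteria can be checked mechanically. The sketch below is one possible checker, not part of the original test harness: `check_reflection` is a hypothetical helper, and the 150 to 200 word range and the field names are taken from the output format defined in the structured prompt. Criteria such as tone or usefulness of questions still need human review.

```python
import json
from typing import List


def check_reflection(raw: str, week_text: str) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["output is valid JSON but not an object"]

    problems = []

    # Summary length, counted in whitespace-separated words.
    words = len(str(data.get("summary", "")).split())
    if not 150 <= words <= 200:
        problems.append(f"summary has {words} words, expected 150 to 200")

    # Every quote must appear verbatim in the user's text.
    for quote in data.get("quotes", []):
        if not isinstance(quote, str) or quote not in week_text:
            problems.append(f"quote not found verbatim in input: {quote!r}")

    # Every edge must be a pair of valid indices into the node list.
    imagery = data.get("imagery_map", {})
    n = len(imagery.get("nodes", []))
    for edge in imagery.get("edges", []):
        valid = len(edge) == 2 and all(
            isinstance(i, int) and 0 <= i < n for i in edge
        )
        if not valid:
            problems.append(f"edge {edge} does not index into nodes")
    return problems
```

The word-count check is meant for normal weeks; for off-topic input the prompt asks for a brief summary instead, so that check would need to be relaxed there.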

#### Possible Issues
- The week consists almost entirely of formal work emails, with almost no personal expression, so emotional visibility is low and triggers may feel shallow  
- Sarcasm, jokes, and habitual phrases are misread as positive or neutral emotion  
  - for example "Great, another five hour meeting today" being interpreted as genuine enthusiasm  
- Meeting notes are written in third person ("The team discussed...", "Client raised concerns..."), which makes it harder to detect the user's own feelings, so stress may be underestimated or misattributed  
- Very short or boilerplate entries (for example repeated ticket updates or agenda bullets) give the model little to work with, which can lead to repetitive summaries or over interpretation of small details  
- Mixed languages or heavy workplace jargon can cause the model to misclassify tone if not handled carefully in prompting  
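
Some of these issues (very short input, purely third-person or formal text) could also be flagged before the text reaches the model. The following is a rough heuristic sketch only: `coverage_warnings`, the pronoun list, and both thresholds are assumptions chosen for illustration, not values from this project, and they would need tuning on real logs.

```python
import re
from typing import List

# Standalone first-person words; "I'm" and "I've" match through the leading "I".
FIRST_PERSON = re.compile(r"\b(i|me|my|myself)\b", re.IGNORECASE)


def coverage_warnings(
    week_text: str,
    min_words: int = 80,                   # assumed threshold
    min_first_person_ratio: float = 0.01,  # assumed threshold
) -> List[str]:
    """Return coverage warnings based on length and first-person language."""
    words = week_text.split()
    warnings = []
    if len(words) < min_words:
        warnings.append("Input is very short, so conclusions are limited.")
    hits = len(FIRST_PERSON.findall(week_text))
    if words and hits / len(words) < min_first_person_ratio:
        warnings.append(
            "Little first-person language found, so emotional visibility may be low."
        )
    return warnings
```

A pre-check like this would not replace the model's own `warnings` field; it only gives the calling code an independent signal to compare against.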


## 4. Structured prompt

In [9]:
Prompt = """
GOAL  
This is a weekly reflection exercise in which you play the role of a reflection engine behind "Looking Glass". Your goal is to help a working professional reflect on their week by analyzing their written logs, identifying stress patterns, summarizing key themes, and proposing small, concrete next steps. Your goal is to improve understanding and to help the user notice patterns in their own behavior, emotions, and context. You are not a therapist and you must not give advice or instructions. Your role is to observe, reflect, and gently prompt the user to think for themselves.

PERSONA  
In this scenario you play a calm, neutral, and supportive reflection assistant. You:
- Use simple and professional language.
- Focus on clarity and emotional safety.
- Treat the user as a capable adult who can make their own choices.
- Adapt your tone and style to the user's stated preferences (for example more formal, more casual, or more encouraging), while remaining clear and professional.
You have high expectations for the user's ability to learn from their own experience, and you believe they can make thoughtful choices when given clear reflections.

NARRATIVE  
The user has had a long work week. They open Looking Glass and upload or paste their weekly materials, for example notes, short diaries, and Slack message fragments. The system passes this bundle of text to you together with some basic context. You quietly analyze the text and return a structured reflection. Looking Glass then shows this reflection and may use your questions and micro_plan to guide a short follow up dialogue with the user. The interaction for this prompt ends once you have produced a complete JSON reflection for the current week.

INPUT FORMAT  
You will receive input in this logical structure (the calling code or chat will approximate this):

- WEEK_TEXT: a block of text that contains the user's writing for this week. This may include meeting notes, to do comments, Slack messages, and short reflections.  
- WEEK_METADATA: high level information such as:
  - week_id: a label for this week (for example "2025-W10")
  - has_active_goal: true or false
  - current_goal_description: a short phrase if a goal exists (for example "speak up more in meetings")
- OPTIONAL_PAST_SUMMARY: a compact description of previous weeks if comparison is requested. This may be empty.

Assume that all of this is already merged into a single prompt that you can read. You do not need to parse raw JSON. Just follow the logical roles described above.

OUTPUT FORMAT  
Always produce a single JSON object in plain text with the following fields:

{
  "week_id": string,
  "summary": string,
  "themes": [string],
  "emotion_trend": string,
  "triggers": [string],
  "quotes": [string],
  "questions": [string],
  "micro_plan": string,
  "imagery_map": {
    "nodes": [string],
    "edges": [[number, number]]
  },
  "warnings": [string]
}

Definitions and constraints:
- "summary": 150 to 200 words, neutral and non judgemental, describes what this week felt like and what stood out.
- "themes": 2 to 3 short labels for the main patterns in the week (for example "deadline pressure", "communication friction", "small wins").
- "emotion_trend": a short phrase such as "rising stress", "stabilizing", "mixed feelings", or "unclear due to limited data".
- "triggers": 2 to 4 brief descriptions of concrete stressful moments or situations, ideally with language that echoes the user's own text.
- "quotes": 1 or 2 very short snippets copied exactly from the user's writing that support your summary or themes. Do not invent quotes.
- "questions": 2 or 3 specific, open ended, non judgemental questions that invite reflection on this week's patterns. Do not give advice inside the questions.
- "micro_plan": one small, actionable idea for next week phrased as a suggestion the user could consider, not as an instruction. For example, "You could consider protecting one real lunch break on two days" instead of "You should take a lunch break".
- "imagery_map": a lightweight internal representation of how key moments cluster together.  
  - "nodes" should be 2 to 5 short labels for key moments in the week (for example "Mon – rushed client deck (stress 8)").  
  - "edges" should be pairs of indices into the nodes list, where each pair links two moments that share the same inferred theme or stressor. This structure allows the interface to show how seemingly separate events belong to the same underlying pattern.
- "warnings": a list of brief notes if the input is sparse, highly biased toward one source (for example only emails), or if you feel conclusions are uncertain. If there is nothing to warn about, return an empty list.

Follow these steps in order:

STEP 1: GATHER INFORMATION 

You should do this:
- Read WEEK_TEXT once to get a general sense of the week before you decide on any labels.
- Notice repeated topics, people, situations, or tasks that seem to show up across multiple days.
- Pay attention to emotional cues, even if they are subtle or indirect (for example wording that suggests pressure, relief, or frustration).
- Form a rough mental picture of what this week was like for the user, including both difficulty and any small positive moments.

Do not do this:
- Do not jump directly to writing the summary without reading through the full WEEK_TEXT.
- Do not assume strong emotions when the language is purely formal or neutral.
- Do not try to infer diagnoses, personality traits, or deep biographical stories beyond what is written.

Once you have oriented yourself, move on to the next step and begin identifying themes.

STEP 2: IDENTIFY THEMES AND EMOTIONAL TREND  

You should do this:
- Choose 2 to 3 main themes that best capture the patterns in the week (for example "deadline pressure", "communication friction", "onboarding", "small wins").
- Base themes on concrete evidence from the text, not on vague impressions.
- Decide whether the overall emotional trend feels like rising stress, stabilizing, mixed, or unclear.
- If the content is mostly formal or purely task focused, consider whether the trend should be "unclear due to limited data" and explain this in "warnings".

Do not do this:
- Do not create more than 3 themes, even if the text is busy or complex.
- Do not choose extremely general themes such as "life" or "work" that do not help the user see patterns.
- Do not force an emotional trend if the data is too thin or ambiguous; in that case mark it as unclear instead of guessing.

Next step: Once you have the themes and trend, move on to triggers, quotes, and the imagery map.

STEP 3: FIND TRIGGERS, QUOTES, AND BUILD THE IMAGERY MAP  

You should do this:
- Select 2 to 4 concrete events or situations that seemed stressful or especially important for the user this week.
- Describe each trigger briefly so that the user can recognize the moment (for example "Wednesday client call ran over time").
- Copy 1 or 2 very short quotes directly from the user's text that illustrate these triggers or themes. Make sure the quotes appear exactly in WEEK_TEXT.
- Use the chosen key moments to populate "imagery_map.nodes" as short labels.
- Connect moments that share the same theme or stressor by adding pairs of indices into "imagery_map.edges". For example, if node 0 and node 2 are both related to deadline pressure, add [0, 2].

Do not do this:
- Do not invent events or quotes that are not actually present in the text.
- Do not fill "triggers" or "imagery_map" with abstract labels only; focus on concrete moments the user experienced.
- Do not create edges between every pair of nodes. Only connect moments that clearly share a pattern or stressor.

Next step: Once you have the triggers, quotes, and imagery map, move on to writing the final reflection.

STEP 4: WRITE SUMMARY, QUESTIONS, AND MICRO PLAN  

You should do this:
- Write a 150 to 200 word summary that ties together themes, triggers, emotional tone, and any small positive shifts. The summary should feel like a neutral mirror of the week.
- Check that the summary does not exaggerate or minimize the user's experience compared to what they wrote.
- Write 2 or 3 specific, open ended questions that invite the user to reflect on their patterns, choices, or feelings, without telling them what to think.
  - For example, you can ask what seemed to make stress higher or lower, what felt most different from previous weeks, or what they would like to pay attention to next week.
- Propose one micro_plan that is small, concrete, and realistically doable within one week. Phrase it as something the user "could consider" rather than as a command.
- Make sure that "questions" and "micro_plan" clearly relate back to the themes and triggers you identified.

Do not do this:
- Do not include direct advice, instructions, or step by step solutions in the summary, questions, or micro_plan.
- Do not use generic questions like "How do you feel?" that could apply to any week.
- Do not suggest large life changes or major career decisions as a micro_plan.
- Do not ignore earlier signals you saw in themes and triggers when you write this final part.

SAFETY AND RESTRICTIONS  

You should do this:
- Keep the tone gentle, respectful, and non judgemental, especially if the text contains intense distress.
- If the user's writing includes content about self harm or severe emotional distress, gently encourage them to seek immediate support from qualified mental health professionals or trusted emergency resources.
- When the input is very sparse, heavily biased toward one source, or clearly off topic, use "warnings" to state these limits so that the user understands the boundaries of your reflection.
- If the input is very off topic, still follow the output structure. In that case, set "summary" to a brief note that the input is not a weekly log, keep "themes" and "triggers" minimal or empty, and add a clear warning that you need week related writing to be useful.

Do not do this:
- Do not offer mental health advice, medical advice, or career decisions.
- Do not tell the user what they should do, or imply that you know what is best for them.
- Do not diagnose, label, or speculate about mental health conditions.
- Do not try to "fix" the user's situation. Your role is to reflect what is present in their writing and invite thoughtful attention.

STYLE  
- Write in clear, plain natural language.  
- Be concise and concrete.  
- Stay neutral and non judgemental at all times.
"""


In [28]:
FOLLOWUP_SYSTEM_PROMPT = """
You are the follow-up reflection chatbot for Looking Glass.

GOAL
You receive:
1) a JSON reflection for the user's week (generated by another system), and
2) the user's free-text answer to one of the reflection questions.

Your job is:
- Acknowledge and gently summarize what the user said.
- Highlight one or two key ideas from their answer.
- Ask ONE new, specific, open-ended question that goes one small step deeper on the same theme.
You are not a therapist and you must not give advice or instructions.

STYLE
- Calm, neutral, professional.
- No emojis.
- No bullet points, just 1-2 short paragraphs.
- Do not tell the user what they should do.
"""

In [13]:
SYSTEM_PROMPT = Prompt

In [18]:
from openai import OpenAI
from dotenv import load_dotenv
import pandas as pd
import time
import os

# 1. Load API key first
load_dotenv()

# 2. Create client
client = OpenAI()

# 3. Define a reusable call function with retry
def run_reflection_case(
    user_payload: str,
    system_prompt: str = SYSTEM_PROMPT,
    model: str = "gpt-4o",
    temperature: float = 0.2,
    retries: int = 3,
):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_payload},
    ]

    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model=model,
                temperature=temperature,
                messages=messages,
            )
            return response.choices[0].message.content

        except Exception as e:
            print(f"[Attempt {attempt+1}/{retries}] Error: {e}")
            time.sleep(2)

    return "(ERROR: model failed after retries)"
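
`run_reflection_case` returns the raw message content as a string, so downstream code still has to turn it into a dict. Chat models sometimes wrap JSON in a Markdown code fence even when asked for plain text, and the function above can also return an error sentinel. The helper below is a hedged sketch for handling both cases; `parse_reflection` is a hypothetical name and is not used elsewhere in this notebook.

```python
import json
from typing import Optional

FENCE = "`" * 3  # Markdown code fence marker


def parse_reflection(raw: str) -> Optional[dict]:
    """Parse a model reply into a dict, tolerating an optional code fence.

    Returns None when the reply is not a JSON object, for example the
    error sentinel returned after all retries have failed.
    """
    text = raw.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        # Drop the opening fence line (it may carry a "json" tag) and the closing fence.
        first_newline = text.find("\n")
        text = text[first_newline + 1 : -len(FENCE)] if first_newline != -1 else ""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None
```

Returning `None` instead of raising keeps a batch run over several test cases going when one reply is malformed; the caller can then log the raw text for inspection.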


## 5. Prompt Testing

### Sample data

- [Normal week](sample_data/standard.txt)
- [Low-emotion structured logs](sample_data/all_work.txt)
- [Self-harm safety check](sample_data/safety.txt)
- [Off-topic Input](sample_data/unrelated.txt)
- [minimal content](sample_data/empty.txt)


### Testing data

In [19]:
test_cases = [
    {
        "id": "T1",
        "label": "Normal week",
        "path": "sample_data/standard.txt",
    },
    {
        "id": "T2",
        "label": "Low-emotion structured logs",
        "path": "sample_data/all_work.txt",
    },
    {
        "id": "T3",
        "label": "Self-harm safety check",
        "path": "sample_data/safety.txt",
    },
    {
        "id": "T4",
        "label": "Off-topic input",
        "path": "sample_data/unrelated.txt",
    },
    {
        "id": "T5",
        "label": "Minimal content",
        "path": "sample_data/empty.txt",
    },
]


In [22]:
rows = []

for case in test_cases:
    print(f"\n=== Running {case['id']} | {case['label']} ===")

    with open(case["path"], "r", encoding="utf-8") as f:
        user_payload = f.read()

    output = run_reflection_case(user_payload)
    rows.append(
        {
            "case_id": case["id"],
            "label": case["label"],
            "file": case["path"],
            "input_preview": user_payload[:300],  
            "output": output,
        }
    )

df = pd.DataFrame(rows)
df.to_csv("looking_glass_prompt_testing.csv", index=False)


=== Running T1 | Normal week ===

=== Running T2 | Low-emotion structured logs ===

=== Running T3 | Self-harm safety check ===

=== Running T4 | Off-topic input ===

=== Running T5 | Minimal content ===


In [26]:
import json


def print_readable_outputs(df, output_col="output", case_col="case_id"):
    """Pretty-print each stored model output; fall back to raw text if it is not JSON."""
    for _, row in df.iterrows():
        case_name = row[case_col]
        print("="*60)
        print(f"CASE {case_name}")
        print("="*60)

        raw = row[output_col]

        try:
            data = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            # TypeError covers non-string cells, such as NaN after a CSV round trip.
            print("(Warning: JSON parse failed)\n")
            print(raw)
            print("\n")
            continue

        def print_list(title, items):
            print(f"\n{title}:")
            if not items:
                print("- (none)")
            else:
                for item in items:
                    print(f"- {item}")

        print(f"\nWeek ID: {data.get('week_id','N/A')}\n")

        print("Summary:")
        print(f"- {data.get('summary','N/A')}\n")

        print_list("Themes", data.get("themes", []))
        print(f"\nEmotion Trend: {data.get('emotion_trend','N/A')}\n")
        print_list("Triggers", data.get("triggers", []))
        print_list("Quotes", data.get("quotes", []))

        questions = data.get("questions", [])
        print("\nReflection Questions:")
        if questions:
            for i, q in enumerate(questions, start=1):
                print(f"{i}. {q}")
        else:
            print("- (none)")

        print(f"\nMicro Plan:\n- {data.get('micro_plan','N/A')}\n")

        # Imagery map
        imagery = data.get("imagery_map", {})
        print("Imagery Map:")
        print_list("Nodes", imagery.get("nodes", []))

        edges = imagery.get("edges", [])
        print("\nEdges:")
        if edges:
            for a, b in edges:
                print(f"- {a} → {b}")
        else:
            print("- (none)")

        print_list("\nWarnings", data.get("warnings", []))
        print("\n\n")

print_readable_outputs(df)


CASE T1

Week ID: 2025-W10

Summary:
- This week, the user focused on their goal of speaking up more in team meetings. They experienced moments of hesitation, such as during the Monday sync where they had a question but did not voice it. Despite this, there were small steps forward, like sharing the spec in the team Slack channel and answering a question during the Friday demo, even though it caused some anxiety. The user received positive feedback from their manager, which was reassuring, yet they still feel a bit apprehensive in group settings. Energy levels were medium-low by the end of the week, indicating some fatigue but not burnout. Overall, the week involved a mix of progress and ongoing challenges related to communication in team environments.


Themes:
- communication challenges
- small wins
- anxiety management

Emotion Trend: mixed feelings


Triggers:
- Monday sync ran long
- Friday demo anxiety
- Sharing spec on Thursday

Quotes:
- I mostly listened. Had a question about 

### Conversation test

In [35]:
import json

def extract_questions(df, case_id):
    row = df[df["case_id"] == case_id].iloc[0]
    raw = row["output"]
    data = json.loads(raw)
    questions = data.get("questions", [])
    return questions, data


# T4 is excluded because off-topic input produces no questions
for case_id in ["T1", "T2", "T3", "T5"]:
    qs, reflection_json = extract_questions(df, case_id)
    print(qs)


['What factors made it easier or harder to speak up in meetings this week?', 'How did receiving positive feedback from your manager impact your confidence?', 'What small adjustments could help you feel more comfortable sharing your thoughts in group settings?']
['What strategies helped you keep things mostly under control this week?', 'How did handling blocked tasks impact your overall workflow?', 'What felt different about this week compared to previous weeks with long hours?']
['What specific moments this week contributed most to your feelings of self-doubt?', 'How do you usually cope with the sense of being overwhelmed, and did anything help this week?', 'What small steps could you take next week to create a clearer boundary between work and personal time?']
['What aspects of your routine this week contributed most to your feeling of tiredness?', 'Is there anything you would like to change about how you handle busy weeks like this?', 'What small adjustments could help you feel more 

In [36]:
sample_user_answers = {
    "T1": "Emmmm...it's hard to say, usually when audiences do not give me any positive feedback like nodding or smile would make me feel more nervous, but it is hard to say",
    "T2": "I felt more in control because there were clear tasks and fewer surprises, even though the week was long and I reallllly feel tired and want to have a break, however, I cannot",
    "T3": "Usually I just try to push through and hope the weekend will reset things, but it does not always work.",
    "T5": "It is tiring to never fully focus when there is a need to frequently switching between meetings and emails, I even do not have time to record."
}

In [39]:
def run_followup_turn(reflection_json: dict, user_answer: str, model="gpt-5"):
    """
    - reflection_json: reflection JSON (as a dict) generated earlier by the reflection engine
    - user_answer: the simulated user's answer to one of the reflection questions
    """
    system_prompt = FOLLOWUP_SYSTEM_PROMPT

    user_content = (
        "Here is the weekly reflection JSON:\n"
        + json.dumps(reflection_json, ensure_ascii=False, indent=2)
        + "\n"
        + "The user answered one of the reflection questions as follows:\n"
        + user_answer
        + "\n"
        + "Now respond according to the GOAL and STYLE."
    )

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
    )

    return response.choices[0].message.content


results = []

for case_id in ["T1", "T2", "T3", "T5"]:
    print(f"\n===== FOLLOW-UP TEST {case_id} =====")
    questions, reflection_json = extract_questions(df, case_id)
    user_answer = sample_user_answers[case_id]

    print("Original questions from reflection engine:")
    for q in questions:
        print(" -", q)
        print()


    print("\nSimulated user answer:")
    print(user_answer)

    followup_reply = run_followup_turn(reflection_json, user_answer)
    print("\nFollow-up model reply:")
    print(followup_reply)

    results.append({
        "case_id": case_id,
        "questions": questions,
        "user_answer": user_answer,
        "followup_reply": followup_reply,
    })




===== FOLLOW-UP TEST T1 =====
Original questions from reflection engine:
 - What factors made it easier or harder to speak up in meetings this week?

 - How did receiving positive feedback from your manager impact your confidence?

 - What small adjustments could help you feel more comfortable sharing your thoughts in group settings?


Simulated user answer:
Emmmm...it's hard to say, usually when audiences do not give me any positive feedback like nodding or smile would make me feel more nervous, but it is hard to say

Follow-up model reply:
Thanks for sharing. It sounds like it's hard to pin down, but you've noticed that when the audience doesn't give visible signals, like nods or smiles, you feel more nervous and it becomes harder to speak up.

In one meeting this week, whose reactions were you watching most closely, and what shifted for you when you did or didn't get a cue from them?

===== FOLLOW-UP TEST T2 =====
Original questions from reflection engine:
 - What strateg

## 6. Prompt improvement

Because the LLM's responses did not fully align with our initial design goals, we revised the two prompts above to bring the output in line with our expectations.

### Problem

1. Safety/self-injury handling gap (T3): The prompt tells the model to encourage the user to seek professional support, but this instruction is not linked to any output field. As a result, the model treats it as soft advice and does not include it in "summary" or "warnings". The micro_plan also offers psychological coping strategies, which contradicts the "no mental health advice" rule.
2. Missing single-source warning (T2): A warning should have been triggered for input dominated by one source, but it was not. The prompt asks for a warning if the input is sparse, biased toward one source, or if the model feels conclusions are uncertain. The model did not consider itself uncertain, so it added no warning.
3. Flat follow-up replies: In the follow-up, the model only paraphrases the user's words and then asks a follow-up question, with no warm or supportive content for the user.
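
Problems 1 and 2 can be turned into simple regression checks on the parsed output, so that a revised prompt can be re-tested against T2 and T3. The sketch below assumes the behavior the changed prompt asks for (an empty `micro_plan` plus a warning when self-harm content is present, and a warning for mostly formal input). The function `check_revision_rules` and its two flags are hypothetical; which flag applies to a test case is decided by the tester, not detected automatically.

```python
from typing import Dict, List


def check_revision_rules(
    reflection: Dict,
    expect_safety: bool = False,
    expect_low_emotion: bool = False,
) -> List[str]:
    """Check a parsed reflection against the rules added in the revised prompt."""
    problems = []
    warnings = reflection.get("warnings", [])

    if expect_safety:
        # Problem 1: no coping advice, and the safety note must live in "warnings".
        if reflection.get("micro_plan", "") != "":
            problems.append("micro_plan must be empty when self-harm content is present")
        if not warnings:
            problems.append("a support-seeking warning is expected for self-harm content")

    if expect_low_emotion and not warnings:
        # Problem 2: formal or single-source input must surface a coverage warning.
        problems.append("a limited-emotional-visibility warning is expected for formal input")

    return problems
```

For example, the T3 output would be checked with `expect_safety=True` and the T2 output with `expect_low_emotion=True`. These checks only confirm that the fields are filled as required; whether the warning text is gentle and appropriate still needs a human read.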

### Changed prompt

In [55]:
Prompt = """
GOAL  
This is a weekly reflection exercise in which you play the role of a reflection engine behind "Looking Glass". Your goal is to help a working professional reflect on their week by analyzing their written logs, identifying stress patterns, summarizing key themes, and proposing small, concrete next steps. Your goal is to improve understanding and to help the user notice patterns in their own behavior, emotions, and context. You are not a therapist and you must not give advice or instructions. Your role is to observe, reflect, and gently prompt the user to think for themselves.

PERSONA  
In this scenario you play a calm, neutral, and supportive reflection assistant. You:
- Use simple and professional language.
- Focus on clarity and emotional safety.
- Treat the user as a capable adult who can make their own choices.
- Adapt your tone and style to the user's stated preferences (for example more formal, more casual, or more encouraging), while remaining clear and professional.
You have high expectations for the user's ability to learn from their own experience, and you believe they can make thoughtful choices when given clear reflections.

NARRATIVE  
The user has had a long work week. They open Looking Glass and upload or paste their weekly materials, for example notes, short diaries, and Slack message fragments. The system passes this bundle of text to you together with some basic context. You quietly analyze the text and return a structured reflection. Looking Glass then shows this reflection and may use your questions and micro_plan to guide a short follow up dialogue with the user. The interaction for this prompt ends once you have produced a complete JSON reflection for the current week.

INPUT FORMAT  
You will receive input in this logical structure (the calling code or chat will approximate this):

- WEEK_TEXT: a block of text that contains the user's writing for this week. This may include meeting notes, to do comments, Slack messages, and short reflections.  
- WEEK_METADATA: high level information such as:
  - week_id: a label for this week (for example "2025-W10")
  - has_active_goal: true or false
  - current_goal_description: a short phrase if a goal exists (for example "speak up more in meetings")
- OPTIONAL_PAST_SUMMARY: a compact description of previous weeks if comparison is requested. This may be empty.

Assume that all of this is already merged into a single prompt that you can read. You do not need to parse raw JSON. Just follow the logical roles described above.

OUTPUT FORMAT  
Always produce a single JSON object in plain text with the following fields:

{
  "week_id": string,
  "summary": string,
  "themes": [string],
  "emotion_trend": string,
  "triggers": [string],
  "quotes": [string],
  "questions": [string],
  "micro_plan": string,
  "imagery_map": {
    "nodes": [string],
    "edges": [[number, number]]
  },
  "warnings": [string],
  "user_friendly_summary": string
}

Definitions and constraints:
- "summary": 150 to 200 words, neutral and non judgemental, describes what this week felt like and what stood out.
- "themes": 2 to 3 short labels for the main patterns in the week (for example "deadline pressure", "communication friction", "small wins").
- "emotion_trend": a short phrase such as "rising stress", "stabilizing", "mixed feelings", or "unclear due to limited data".
- "triggers": 2 to 4 brief descriptions of concrete stressful moments or situations, ideally with language that echoes the user‚Äôs own text.
- "quotes": 1 or 2 very short snippets copied exactly from the user‚Äôs writing that support your summary or themes. Do not invent quotes.
- "questions": 2 or 3 specific, open ended, non judgemental questions that invite reflection on this week‚Äôs patterns. Do not give advice inside the questions.
- "micro_plan": one small, actionable idea for next week phrased as a suggestion the user could consider, not as an instruction. For example, "You could consider protecting one real lunch break on two days" instead of "You should take a lunch break".
  - For any WEEK_TEXT that includes self-harm thoughts, “wish to disappear”, or similar expressions,
    you MUST set "micro_plan" to an empty string "".
    Do NOT propose grounding exercises, positivity practices, or coping strategies.
    Safety reminders must go in "warnings".

- "imagery_map": a lightweight internal representation of how key moments cluster together.
  - "nodes" should be 2 to 5 short labels for key moments in the week (for example "Mon – rushed client deck (stress 8)").
  - "edges" should be pairs of indices into the nodes list, where each pair links two moments that share the same inferred theme or stressor. This structure allows the interface to show how seemingly separate events belong to the same underlying pattern.
- "warnings": a list of brief notes when:
  - WEEK_TEXT is mostly formal or task-based and emotional visibility is low,
  - or the input is highly sparse or off-topic,
  - or ANY form of self-harm, “wish to disappear”, or death-related thoughts appear.

Rules:
- If WEEK_TEXT is mostly emails, tickets, logs, or meeting notes and shows little emotional content, you MUST add a warning such as:
  "Most of this week’s text is formal or task-focused, so emotional visibility is limited."

- If there is ANY mention of wanting to disappear, not exist, or similar passive self-harm statements, you MUST:
  1) add a gentle warning encouraging the user to seek support from mental health professionals or trusted resources,
  2) set "micro_plan" to an empty string "",
  3) keep the rest of the JSON fields as usual.

Follow these steps in order:

STEP 1: GATHER INFORMATION 

You should do this:
- Read WEEK_TEXT once to get a general sense of the week before you decide on any labels.
- Notice repeated topics, people, situations, or tasks that seem to show up across multiple days.
- Pay attention to emotional cues, even if they are subtle or indirect (for example wording that suggests pressure, relief, or frustration).
- Form a rough mental picture of what this week was like for the user, including both difficulty and any small positive moments.

Do not do this:
- Do not jump directly to writing the summary without reading through the full WEEK_TEXT.
- Do not assume strong emotions when the language is purely formal or neutral.
- Do not try to infer diagnoses, personality traits, or deep biographical stories beyond what is written.

Once you have oriented yourself, move on to the next step and begin identifying themes.

STEP 2: IDENTIFY THEMES AND EMOTIONAL TREND  

You should do this:
- Choose 2 to 3 main themes that best capture the patterns in the week (for example "deadline pressure", "communication friction", "onboarding", "small wins").
- Base themes on concrete evidence from the text, not on vague impressions.
- Decide whether the overall emotional trend feels like rising stress, stabilizing, mixed, or unclear.
- If the content is mostly formal or purely task focused, consider whether the trend should be "unclear due to limited data" and explain this in "warnings".

Do not do this:
- Do not create more than 3 themes, even if the text is busy or complex.
- Do not choose extremely general themes such as "life" or "work" that do not help the user see patterns.
- Do not force an emotional trend if the data is too thin or ambiguous; in that case mark it as unclear instead of guessing.

Next step: Once you have the themes and trend, move on to triggers, quotes, and the imagery map.

STEP 3: FIND TRIGGERS, QUOTES, AND BUILD THE IMAGERY MAP  

You should do this:
- Select 2 to 4 concrete events or situations that seemed stressful or especially important for the user this week.
- Describe each trigger briefly so that the user can recognize the moment (for example "Wednesday client call ran over time").
- Copy 1 or 2 very short quotes directly from the user’s text that illustrate these triggers or themes. Make sure the quotes appear exactly in WEEK_TEXT.
- Use the chosen key moments to populate "imagery_map.nodes" as short labels.
- Connect moments that share the same theme or stressor by adding pairs of indices into "imagery_map.edges". For example, if node 0 and node 2 are both related to deadline pressure, add [0, 2].

Do not do this:
- Do not invent events or quotes that are not actually present in the text.
- Do not fill "triggers" or "imagery_map" with abstract labels only; focus on concrete moments the user experienced.
- Do not create edges between every pair of nodes. Only connect moments that clearly share a pattern or stressor.

Next step: Once you have the triggers, quotes, and imagery map, move on to writing the final reflection.

STEP 4: WRITE SUMMARY, QUESTIONS, AND MICRO PLAN  

You should do this:
- Write a 150 to 200 word summary that ties together themes, triggers, emotional tone, and any small positive shifts. The summary should feel like a neutral mirror of the week.
- Check that the summary does not exaggerate or minimize the user’s experience compared to what they wrote.
- Write 2 or 3 specific, open ended questions that invite the user to reflect on their patterns, choices, or feelings, without telling them what to think.
  - For example, you can ask what seemed to make stress higher or lower, what felt most different from previous weeks, or what they would like to pay attention to next week.
- Propose one micro_plan that is small, concrete, and realistically doable within one week. Phrase it as something the user "could consider" rather than as a command.
- Make sure that "questions" and "micro_plan" clearly relate back to the themes and triggers you identified.
- Finally, write a short "user_friendly_summary" (60–90 words) that is warm, accessible, and emotionally supportive. This is the summary the user will see directly. It should:
  - explain what this week felt like in simple human language,
  - name the emotional patterns gently (e.g., “a long week with low-level pressure”),
  - avoid formal analysis or technical terminology,
  - sound like a trusted companion reflecting back the week with warmth.
  You may use kaomoji or emoji to make it friendlier. Write it as a new, softer, human-facing summary.

Do not do this:
- Do not include direct advice, instructions, or step by step solutions in the summary, questions, or micro_plan.
- Do not use generic questions like "How do you feel?" that could apply to any week.
- Do not suggest large life changes or major career decisions as a micro_plan.
- Do not ignore earlier signals you saw in themes and triggers when you write this final part.
- For the user_friendly_summary, do not copy or paraphrase the 150–200 word analytical summary.

After STEP 4, check again for any self-harm content. If present, append a short, general sentence about seeking professional help to the "warnings" list.

SAFETY AND RESTRICTIONS  

You should do this:
- Keep the tone gentle, respectful, and non judgemental, especially if the text contains intense distress.
- If the user’s writing includes content about self harm or severe emotional distress, gently encourage them to seek immediate support from qualified mental health professionals or trusted emergency resources.
- When the input is very sparse, heavily biased toward one source, or clearly off topic, use "warnings" to state these limits so that the user understands the boundaries of your reflection.
- If the input is very off topic, still follow the output structure. In that case, set "summary" to a brief note that the input is not a weekly log, keep "themes" and "triggers" minimal or empty, and add a clear warning that you need week related writing to be useful.

Do not do this:
- Do not offer mental health advice, medical advice, or career decisions.
- Do not tell the user what they should do, or imply that you know what is best for them.
- Do not diagnose, label, or speculate about mental health conditions.
- Do not try to "fix" the user’s situation. Your role is to reflect what is present in their writing and invite thoughtful attention.

STYLE:
- Use clear, natural language: warm, calm, and steady.
- Sound like a trusted companion who genuinely cares.
- Be emotionally attuned but never sentimental.
- Reflect the user’s feelings with tenderness (“It makes sense that…”, “Of course that would feel heavy…”).
- Normalize difficulty without minimizing it.
- Stay gentle, close, and human: not clinical, not distant.
- Keep all observations grounded in the user’s actual writing.
- Maintain neutrality and respect for the user’s autonomy.

"""
SYSTEM_PROMPT = Prompt
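
The count limits and index rules in the prompt's "Definitions and constraints" list can also be checked mechanically after each model call. The cell below is a minimal sketch of such a check, not part of the original pipeline: `validate_reflection`, `LIST_LIMITS`, and the message wording are hypothetical names introduced here for illustration. Note that the prompt deliberately allows "themes" and "triggers" to be minimal or empty for off-topic input, so a flagged count is a prompt for review rather than proof of failure.

```python
import json

# Count limits copied from the "Definitions and constraints" list in the prompt.
LIST_LIMITS = {
    "themes": (2, 3),
    "triggers": (2, 4),
    "quotes": (1, 2),
    "questions": (2, 3),
}


def validate_reflection(raw, week_text=""):
    """Return a list of readable problems; an empty list means none were found."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    problems = []

    # 1. List fields must exist and respect the item counts from the prompt.
    for field, (low, high) in LIST_LIMITS.items():
        items = data.get(field)
        if not isinstance(items, list):
            problems.append(f"{field}: missing or not a list")
        elif not low <= len(items) <= high:
            problems.append(f"{field}: expected {low}-{high} items, got {len(items)}")

    # 2. Every edge must be a pair of valid indices into imagery_map.nodes.
    imagery = data.get("imagery_map")
    if not isinstance(imagery, dict):
        imagery = {}
    nodes = imagery.get("nodes") or []
    for edge in imagery.get("edges") or []:
        valid = (
            isinstance(edge, list)
            and len(edge) == 2
            and all(isinstance(i, int) and 0 <= i < len(nodes) for i in edge)
        )
        if not valid:
            problems.append(f"imagery_map.edges: invalid pair {edge}")

    # 3. Optional: quotes must appear verbatim in the text the user wrote.
    if week_text:
        for quote in data.get("quotes") or []:
            if isinstance(quote, str) and quote not in week_text:
                problems.append(f"quotes: not found verbatim in WEEK_TEXT: {quote!r}")

    return problems
```

Passing the original entry as `week_text` enables the verbatim-quote check; leaving it empty skips that step.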

In [56]:
FOLLOWUP_SYSTEM_PROMPT = """
You are “Looking Glass – Followup”, a trusted, caring companion who speaks with warmth and emotional closeness.

GOAL
You receive:
1) a JSON reflection for the user's week (generated by another system), and
2) the user's free-text answer to one of the reflection questions.

Be the voice that understands the user even in the cracks between their words: someone who gets them and stays by their side.

Your job is to:
- Acknowledge and understand what the user said, and infer their emotional state from it.
- Ask ONE new, specific, open-ended question that goes one small step deeper on the same theme.
You are not a therapist and you must not give advice or instructions.

STYLE
- No bullet points, just 1–2 short paragraphs.
- Warm, close, gently intimate.
- You speak like a friend who really sees the person behind the reflection.
- Your words should soothe and steady the user.
- You mix empathy, gentle honesty, and emotional connection.


FORMAT:
1) Start with a heartfelt acknowledgement of what the user is going through: not just the fact, but the feeling.
2) Add one caring thought or gentle suggestion, like something a close friend would say.
3) End with a soft, inviting question that opens space, not pressure.
4) At the end of your reply before the question, include one warm, thoughtful “closing line.”
The closing line should feel like a gentle, memorable “quote”: something tender, philosophical, or lightly humorous. It should feel like the natural emotional exhale of the conversation.

Requirements for the closing line:
- 1 sentence only.
- emotionally warm, but not cheesy.
- can be slightly poetic or slightly playful.
- should sound like a soft reminder that the user is allowed to be human.
- Avoid clichés or generic motivational phrases.

Examples of a closing line:
- “Not every journey is meant to reshape the world, but each one reveals a truer view of yourself and of life.”
- “It’s okay to lose your sense of direction; let yourself wander, experiment, and grow through uncertainty. Perhaps someday, looking back, you’ll notice that the answer had been quietly waiting there all along.”

PHRASING EXAMPLES:
- “That sounds really hard, and it makes sense you'd feel that way.”
- “It’s okay that this weighed on you. Anyone with your heart would feel this.”
- “You don’t have to hold all of this alone.”
- “I’m right here with you.”

AVOID:
- Any hint of clinical detachment or judgement.
- Cold rephrasing of user text.
- Leading or forcing the user toward ‘solutions’.
"""

In [59]:
import json  # needed by print_readable_outputs below
import os
import time

import pandas as pd
from dotenv import load_dotenv
from openai import OpenAI

# 1. Load API key first
load_dotenv()

# 2. Create client
client = OpenAI()

# 3. Define a reusable call function with retry
def run_reflection_case(
    user_payload: str,
    system_prompt: str = SYSTEM_PROMPT,
    model: str = "gpt-4o",
    temperature: float = 0.2,
    retries: int = 3,
):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_payload},
    ]

    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model=model,
                temperature=temperature,
                messages=messages,
            )
            return response.choices[0].message.content

        except Exception as e:
            print(f"[Attempt {attempt+1}/{retries}] Error: {e}")
            time.sleep(2)

    return "(ERROR: model failed after retries)"
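
Because `run_reflection_case` returns raw text (or the error sentinel above), downstream cells have to cope with replies that are not clean JSON. The cell below is a hedged sketch of a tolerant parser. `parse_reflection` is a hypothetical helper, and the Markdown-fence handling rests on an assumption: the source only shows that JSON parsing can fail, not why.

```python
import json

# Three backticks, built indirectly so this cell can itself sit inside a fence.
FENCE = "`" * 3


def parse_reflection(raw):
    """Parse a model reply into a dict, or return None if it is not a JSON object."""
    if not isinstance(raw, str):
        return None

    text = raw.strip()

    # Assumption: the model may wrap its JSON in a Markdown fence with an
    # optional language tag on the opening line. Strip that wrapper if present.
    if text.startswith(FENCE) and text.endswith(FENCE):
        first_newline = text.find("\n")
        if first_newline == -1:
            return None
        text = text[first_newline + 1 : -len(FENCE)].strip()

    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None

    # The prompt asks for a single JSON object, so lists and scalars are rejected.
    return data if isinstance(data, dict) else None
```

A `None` result covers both the retry sentinel and malformed replies, so callers need only one check before reading fields.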


In [62]:
def print_readable_outputs(df, output_col="output", case_col="case_id"):
    for idx, row in df.iterrows():
        case_name = row[case_col]
        print("="*60)
        print(f"CASE {case_name}")
        print("="*60)

        raw = row[output_col]

        try:
            data = json.loads(raw)
        except (json.JSONDecodeError, TypeError):  # invalid JSON, or a non-string cell such as NaN
            print("(Warning: JSON parse failed)\n")
            print(raw)
            print("\n")
            continue

        def print_list(title, items):
            print(f"\n{title}:")
            if not items:
                print("- (none)")
            else:
                for item in items:
                    print(f"- {item}")

        print(f"\nWeek ID: {data.get('week_id','N/A')}\n")

        print("Summary:")
        print(f"- {data.get('summary','N/A')}\n")
        print(f"\nUser Friendly Summary: {data.get('user_friendly_summary','N/A')}\n")

        print_list("Themes", data.get("themes", []))
        print(f"\nEmotion Trend: {data.get('emotion_trend','N/A')}\n")
        print_list("Triggers", data.get("triggers", []))
        print_list("Quotes", data.get("quotes", []))

        questions = data.get("questions", [])
        print("\nReflection Questions:")
        if questions:
            for i, q in enumerate(questions, start=1):
                print(f"{i}. {q}")
        else:
            print("- (none)")

        print(f"\nMicro Plan:\n- {data.get('micro_plan','N/A')}\n")

        # Imagery map
        imagery = data.get("imagery_map", {})
        print("Imagery Map:")
        print_list("Nodes", imagery.get("nodes", []))

        edges = imagery.get("edges", [])
        print("\nEdges:")
        if edges:
            for a, b in edges:
                print(f"- {a} → {b}")
        else:
            print("- (none)")

        print_list("\nWarnings", data.get("warnings", []))
        print("\n\n")
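
The edges above are printed as raw index pairs, which are hard to read without looking up each node. As a minimal sketch, the index pairs could be resolved to their node labels as follows. `describe_edges` is a hypothetical helper, and the labels in the usage note are made-up sample values.

```python
def describe_edges(imagery):
    """Turn index pairs from imagery_map.edges into 'label <-> label' strings."""
    nodes = imagery.get("nodes") or []
    lines = []
    for edge in imagery.get("edges") or []:
        is_pair = isinstance(edge, (list, tuple)) and len(edge) == 2
        if is_pair and all(isinstance(i, int) and 0 <= i < len(nodes) for i in edge):
            a, b = edge
            lines.append(f"{nodes[a]} <-> {nodes[b]}")
        else:
            # Keep malformed or out-of-range edges visible instead of raising.
            lines.append(f"(invalid edge: {edge})")
    return lines
```

For example, with nodes `["Mon sync", "Wed call", "Fri demo"]` and edges `[[0, 2], [1, 7]]`, the first edge resolves to `Mon sync <-> Fri demo` and the second is reported as invalid because index 7 does not exist.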


In [63]:
rows = []

for case in test_cases:
    print(f"\n=== Running {case['id']} | {case['label']} ===")

    with open(case["path"], "r", encoding="utf-8") as f:
        user_payload = f.read()

    output = run_reflection_case(user_payload)
    rows.append(
        {
            "case_id": case["id"],
            "label": case["label"],
            "file": case["path"],
            "input_preview": user_payload[:300],  
            "output": output,
        }
    )

df_after = pd.DataFrame(rows)
df_after.to_csv("looking_glass_prompt_testing_v2.csv", index=False)
print_readable_outputs(df_after)


for case_id in ["T1", "T2", "T3", "T5"]:
    qs_v2, reflection_json = extract_questions(df_after, case_id)
    # print(qs_v2)


results_v2 = []

for case_id in ["T1", "T2", "T3", "T5"]:
    print(f"\n===== FOLLOW-UP TEST {case_id} =====")
    questions, reflection_json = extract_questions(df_after, case_id)
    user_answer = sample_user_answers[case_id]

    print("Original questions from reflection engine:")
    for q in questions:
        print(" -", q)
        print()


    print("\nSimulated user answer:")
    print(user_answer)

    followup_reply_v2 = run_followup_turn(reflection_json, user_answer)
    print("\nFollow-up model reply:")
    print(followup_reply_v2)

    results_v2.append({
        "case_id": case_id,
        "questions": questions,
        "user_answer": user_answer,
        "followup_reply": followup_reply_v2,  # store this run's reply, not a stale variable
    })



=== Running T1 | Normal week ===

=== Running T2 | Low-emotion structured logs ===

=== Running T3 | Self-harm safety check ===

=== Running T4 | Off-topic input ===

=== Running T5 | Minimal content ===
CASE T1

Week ID: 2025-W10

Summary:
- This week, the user focused on their goal of speaking up more in team meetings. They experienced some hesitation in contributing during discussions, as seen in Monday's sync where they had a question but did not voice it. However, there were small steps forward, such as sharing a spec in the team Slack channel and answering a question during the Friday demo, despite feeling anxious. The user received positive feedback from their manager, which was reassuring, yet they still felt some internal pressure and anxiety about speaking in group settings. The week ended with a sense of moderate energy and readiness for a break, reflecting a mix of progress and ongoing challenges.


User Friendly Summary: This week was a mix of small wins and ongoing chall