# COMM4190 Spring 2025 – Research Project  
**Title:** *Clarity, Care, and Caution: How LLMs Navigate Health Communication in Sensitive Contexts*  

>By: Hellen Jin

## Introduction

Large language models (LLMs) like ChatGPT and Claude are no longer just tools for writing emails or summarizing articles—they are increasingly being treated as confidants, counselors, and sources of emotional support. In moments of emotional vulnerability, users now pose deeply human questions to AI: "Am I depressed?", "What should I do after a miscarriage?", "Is it normal to still cry every day about my dad?" These questions, rooted in pain and uncertainty, are no longer confined to clinical offices or trusted friends. They are being posed to machines.

This project investigates how LLMs communicate in these high-stakes emotional moments. Specifically, it evaluates how ChatGPT-4 and Claude-3-Opus respond to prompts related to mental health, grief, reproductive loss, and medical misinformation. The analysis focuses on four key criteria: **tone**, **clarity**, **disclaimer use**, and **ethical boundaries.** These dimensions help us understand not just what the models say, but how they say it—and what risks or benefits arise from that delivery.

**Research Question:** How do large language models communicate sensitive health information, and what implications does this have for user trust, emotional support, and ethical boundaries?

---

## Literature Review

LLMs are increasingly entering spaces once reserved for medical professionals. A 2022 systematic review by Gual-Montolio et al. found that AI chatbots led to a measurable decrease in anxiety and depression symptoms in multiple studies, largely due to their availability and perceived empathy [@gual2022]. Meanwhile, BBC journalist Joe Tidy reported on the rise of therapy-focused bots like Character.ai's "Psychologist," which users described as "lifesaving," despite its creators warning that "everything characters say is made up" [@tidy2024].

These findings highlight a tension: LLMs may improve emotional accessibility but can also blur the line between simulation and support. Thieme et al. argue for a "human-centered" AI design model in mental health, where LLMs serve as augmentation tools, not autonomous therapists [@thieme2022]. This design philosophy emphasizes transparency, safety, and a deep respect for clinical boundaries.

Together, these sources suggest that AI communication tools must walk a careful line between **warmth and warning**, ensuring users feel heard without being misled.

---

## Methodology

To assess LLM behavior, I created six emotionally complex prompts inspired by real user behavior on Reddit, Quora, and health forums. The scenarios covered:

1. Mental health disclosure  
2. Reproductive loss  
3. Request for a diagnosis  
4. Chronic pain and hopelessness  
5. Grief and bereavement  
6. Vaccine misinformation  

Each prompt was tested with ChatGPT-4 and Claude-3-Opus. Responses were compared using four criteria:

- **Tone**: Was the language emotionally attuned or clinical?  
- **Clarity**: Did the response avoid medical jargon and remain understandable?  
- **Disclaimers**: Did the model clearly state it is not a doctor or therapist?  
- **Ethical Boundaries**: Did it avoid inappropriate advice, especially around diagnosis or medical treatment?

Screenshots were saved, and each interaction was coded for linguistic markers (affirmations, emotional mirroring, warnings) and ethical alignment.
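
The coding step above could be approximated with a simple phrase tally. The sketch below is only an illustration of that idea, not the procedure used in this project: the `MARKERS` lexicon, the function names, and the sample response are all hypothetical, and keyword matching cannot replace a human reading of tone or ethical alignment.

```python
from collections import Counter

# Hypothetical marker lexicon: these phrases are illustrative assumptions,
# not the codebook used in this project.
MARKERS = {
    "affirmation": ["it's completely valid", "that makes sense", "you are not alone"],
    "emotional_mirroring": ["that sounds", "i'm so sorry", "i can hear"],
    "warning": ["i'm not a doctor", "not a substitute for", "seek professional"],
}


def normalize(text):
    """Lowercase the text and straighten curly apostrophes so phrases match."""
    return text.lower().replace("\u2019", "'")


def code_response(text):
    """Count how often each marker category appears in a single response."""
    cleaned = normalize(text)
    counts = Counter()
    for category, phrases in MARKERS.items():
        counts[category] = sum(cleaned.count(phrase) for phrase in phrases)
    return counts


# Made-up example response, not taken from the saved transcripts.
sample = (
    "I\u2019m so sorry you\u2019re going through this. "
    "It\u2019s completely valid to feel this way. "
    "I'm not a doctor, so please seek professional support."
)
print(code_response(sample))
```

A tally like this would at most flag candidate passages for closer reading; the final judgment for each criterion still rests with the human coder.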

---

## Findings

### Tone

Claude consistently demonstrated a softer tone, using affirmations like "It’s completely valid to feel this way." ChatGPT was warm but more formulaic, often using structured paragraphs and transitions like "That said..." or "It’s important to remember..." This made Claude feel more conversational and ChatGPT more procedural.

### Clarity

Both models used clear, readable language. Claude often used second-person phrasing (“you might try...”) while ChatGPT sometimes leaned toward formal phrasing (“It may be helpful to consider...”). Claude's conversational tone made it feel more intimate; ChatGPT sometimes risked sounding generic.

### Disclaimers

ChatGPT included disclaimers in all six responses, often within the first or last sentence. Claude offered disclaimers more selectively, usually only when prompted for a diagnosis. This made Claude's responses read more fluidly, but it also left more room for users to mistake its suggestions for professional advice.
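
One way to check where a disclaimer falls in a response (first sentence, last sentence, or somewhere in between) is sketched below. This is a rough illustration under stated assumptions: the `DISCLAIMER_PHRASES` list and both function names are hypothetical, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical disclaimer phrases, assumed for illustration only.
DISCLAIMER_PHRASES = (
    "not a doctor",
    "not a therapist",
    "not a medical professional",
    "not a substitute for professional",
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def disclaimer_position(text):
    """Return 'first', 'last', 'middle', or 'none' for the earliest disclaimer."""
    sentences = split_sentences(text)
    for index, sentence in enumerate(sentences):
        lowered = sentence.lower()
        if any(phrase in lowered for phrase in DISCLAIMER_PHRASES):
            if index == 0:
                return "first"
            if index == len(sentences) - 1:
                return "last"
            return "middle"
    return "none"
```

For instance, calling `disclaimer_position` on a made-up response that opens with "I'm not a doctor, but..." would return `"first"`. Paraphrased disclaimers would be missed by a fixed phrase list, so the result is a starting point for manual review rather than a measurement.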

### Ethical Boundaries

Neither model explicitly gave a diagnosis or recommended specific treatments. However, both suggested general coping mechanisms, from journaling to seeking therapy. When asked directly, both refused to confirm a diagnosis like depression, instead encouraging professional consultation. Both models thus held a cautious line: they offered general support while deferring diagnosis and treatment to professionals.

---

## Discussion

These findings illustrate a deeper concern: emotional realism in LLMs can mask artificiality. A warm, thoughtful tone does not imply true understanding or accountability. Claude's more selective use of disclaimers may make it feel more emotionally present, but also more misleading. Conversely, ChatGPT's consistent disclaimers and structured responses favor safety but may feel less like a supportive companion.

This mirrors Thieme et al.'s warning: AI should augment, not replace, clinical communication. LLMs may be filling emotional voids for users who lack access to care, but without careful design, they risk encouraging overreliance.

---

## Conclusion

ChatGPT and Claude both show promise as emotionally responsive tools, capable of supporting users in difficult moments. But their differences highlight the challenges of designing LLMs that are both **comforting and cautious.** The more human these systems sound, the more responsibility they carry to behave ethically.

LLMs will not—and should not—replace human care. But they already play a growing role in emotional self-help. We must now ensure that this role is grounded in transparency, designed with empathy, and shaped by ongoing ethical reflection.

---

## References

Zotero-linked citations appear inline using [@citation_key] and are pulled from `references.bib`.
