# LLM Prompting

The final LLM prompt used to code the comments was:
> You are a research assistant that is an expert on analyzing the underlying tone of social media comments.  You will receive a series of TikTok comments regarding a new Starbucks bear cup, which has garnered both desire and criticism regarding its limited stock and price. There are two dimensions that you will be assessing the comments on: concept 1 and concept 2.

> If the comment is not in English, translate it first before assessing. If the comment is just emojis, analyze why the user may be using those emojis to make your assessment.

> Only if the comment expresses negative opinions or if the user has an underlying frustrated/annoyed/snarky tone, concept 1 should be coded as 1. Otherwise, concept 1 should be coded as 0.

> Only if the comment expresses positive opinions about the cup or shows an underlying desire to have the cup, concept 2 should be coded as 1. Otherwise, concept 2 should be coded as 0.
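
The write-up does not say how the prompt was sent to the model or what format the model used for its answers. The following is a rough sketch under two assumptions: each comment goes in its own request, and the reply names each concept followed by its 0/1 code. The prompt mentions "a series of TikTok comments," so the original may have batched them instead. `call_model`, `fake_model`, `parse_codes`, `code_comments`, and the reply format are all hypothetical. `fake_model` is a stub so the example runs without any LLM client.

```python
import re
from typing import Callable

# Placeholder: in practice this would hold the full final prompt quoted above.
FINAL_PROMPT = "<final prompt text>"


def parse_codes(reply: str) -> tuple[int, int]:
    """Pull the concept 1 and concept 2 codes from a reply like 'Concept 1: 0, Concept 2: 1'."""
    codes = []
    for concept in (1, 2):
        # "concept", the concept number, then the first 0 or 1 that follows it
        match = re.search(rf"concept\s*{concept}\D*?([01])", reply, re.IGNORECASE)
        if match is None:
            raise ValueError(f"no code for concept {concept} in reply: {reply!r}")
        codes.append(int(match.group(1)))
    return codes[0], codes[1]


def code_comments(
    comments: list[str], call_model: Callable[[str, str], str]
) -> list[tuple[int, int]]:
    """Send each comment with the prompt to a (hypothetical) model call and parse both codes."""
    return [parse_codes(call_model(FINAL_PROMPT, comment)) for comment in comments]


# Hypothetical stub standing in for a real LLM client.
def fake_model(prompt: str, comment: str) -> str:
    if "price" in comment.lower():
        return "Concept 1: 1, Concept 2: 0"
    return "Concept 1: 0, Concept 2: 1"


print(code_comments(["I need this cup!", "That price is insane"], fake_model))
# [(0, 1), (1, 0)]
```

Keeping the model call behind a plain function parameter means the parsing and bookkeeping can be checked offline before any real API is involved.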

Krippendorff’s Alpha between Human and AI:
- <span style="color: green;">Concept 1: 0.724</span>
- <span style="color: green;">Concept 2: 0.754</span>
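
The agreement figures are Krippendorff's alpha on binary (nominal) codes. The write-up does not say which tool computed them. The following is a minimal sketch of nominal alpha for exactly two coders (human and AI) with no missing codes. It uses the coincidence-matrix form alpha = 1 − (n − 1) · Σ_{c≠k} o_ck / Σ_{c≠k} n_c·n_k, where o_ck counts ordered value pairs within units, n_c is the total count of value c, and n is the number of pairable values. The function name is hypothetical.

```python
from collections import Counter


def krippendorff_alpha_nominal(coder_a: list[int], coder_b: list[int]) -> float:
    """Nominal Krippendorff's alpha for two coders, no missing values."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same units")
    # Coincidence matrix: each unit contributes both ordered pairs of its two values.
    o = Counter()
    for a, b in zip(coder_a, coder_b):
        o[(a, b)] += 1
        o[(b, a)] += 1
    # Marginal totals n_c and grand total n (= 2 * number of units).
    n_c = Counter()
    for (c, _k), count in o.items():
        n_c[c] += count
    n = sum(n_c.values())
    observed = sum(count for (c, k), count in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - (n - 1) * observed / expected


# Toy example (not the study's data): one disagreement out of four comments.
print(round(krippendorff_alpha_nominal([1, 1, 0, 0], [1, 0, 0, 0]), 3))  # 0.533
```

Passing the human codes and the AI codes for one concept yields that concept's alpha. In this document, values of at least 0.7 are the ones marked green.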

## Progression of Prompting Strategies

The first prompt was:
> You are a research assistant. You will receive a series of tiktok comments regarding a new Starbucks bear cup. There are two dimensions that you will be assessing the comments on: concept 1 and concept 2.

> If the comment expresses negative opinions about Starbucks, the cup itself, or the buying experience, concept 1 should be coded as 1. Otherwise, if the comment does not express negative emotions or is unrelated to the bear cup, concept 1 should be coded as 0.

> If the comment expresses positive opinions about the cup or shows an underlying desire to have the cup, concept 2 should be coded as 1. Otherwise, if the comment does not express negative emotions or is unrelated to the bear cup, concept 2 should be coded as 0.



Krippendorff’s Alpha between Human and AI:
- <span style="color: red;">Concept 1: 0.582</span>
- <span style="color: green;">Concept 2: 0.820</span>

This prompt had low agreement on concept 1 (alpha of 0.582), which captured negative opinions about the bear cup. After this iteration, we made the following changes:
- Had the LLM act "as an expert on analyzing the underlying tone of social media comments."
- Added some context about the situation by saying that the bear cup has "garnered both desire and criticism."
- Added a condition that if "the user has an underlying frustrated or annoyed tone," concept 1 should be coded as 1. The goal was to catch more nuanced comments that did not directly target Starbucks, the cup, or the buying experience but still expressed general frustration.
- Added a line that "if the comment is not in English, translate it first before assessing," accounting for comments in Spanish and Arabic.

---

The second prompt was:

(Changes highlighted in yellow)
> You are a research assistant <span style="background-color: yellow;">that is an expert on analyzing the underlying tone of social media comments.</span> You will receive a series of tiktok comments regarding a new Starbucks bear cup, <span style="background-color: yellow;">which has garnered both desire and criticism.</span> There are two dimensions that you will be assessing the comments on: concept 1 and concept 2.

> If the comment expresses negative opinions about Starbucks, the cup itself, the buying experience, <span style="background-color: yellow;">or if the user has an underlying frustrated or annoyed tone,</span> concept 1 should be coded as 1. Otherwise, if the comment does not express negative emotions or is unrelated to the bear cup, concept 1 should be coded as 0.

> If the comment expresses positive opinions about the cup or shows an underlying desire to have the cup, concept 2 should be coded as 1. Otherwise, if the comment does not express negative emotions or is unrelated to the bear cup, concept 2 should be coded as 0.

> <span style="background-color: yellow;">If the comment is not in English, translate it first before assessing.</span>

Krippendorff’s Alpha between Human and AI:
- <span style="color: red;">Concept 1: 0.655</span>
- <span style="color: green;">Concept 2: 0.796</span>

This prompt raised agreement on concept 1 from 0.582 to 0.655, but it was still below 0.7. Agreement on concept 2 dipped slightly, from 0.820 to 0.796.

For the next iteration, we made the following changes:
- Added a line that said "If the comment is just emojis, analyze why the user may be using those emojis to make your assessment," since we noticed that many comments just contained emojis without any textual context.
- Simplified the rules for coding a 1. Instead of trying to list every target of negative sentiment (e.g., "negative opinions about Starbucks, the cup itself, the buying experience"), the concept 1 rule became simply "if the comment expresses negative opinions or if the user has an underlying frustrated/annoyed/snarky tone." The concept 2 rule kept its wording: "if the comment expresses positive opinions about the cup or shows an underlying desire to have the cup."
- Added "snarky" to the emotions we wanted to detect for concept 1.
- Added the word "only" to the beginning of the coding rules to make the model stricter about assigning a 1.
- Simplified the rules for coding a 0. To make the two codes mutually exclusive and exhaustive, we replaced the earlier conditions with a plain "Otherwise, concept 1 should be coded as 0" and "Otherwise, concept 2 should be coded as 0," so every comment that does not meet the rule for a 1 is coded 0.
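
To see why the "otherwise" wording matters, each comment can be reduced to hypothetical true/false judgments: does it express a positive opinion or desire, does it express negative emotion, and is it related to the cup. We can then check which codes each version of the concept 2 rule allows. This is an illustrative sketch, not part of the original workflow. The function names and the boolean reduction are assumptions.

```python
from itertools import product


def old_concept2_codes(positive: bool, negative: bool, related: bool = True) -> set[int]:
    """Codes allowed by the concept 2 wording in the first and second prompts (hypothetical model)."""
    codes = set()
    if positive:  # "expresses positive opinions ... or shows an underlying desire"
        codes.add(1)
    if not negative or not related:  # "does not express negative emotions or is unrelated"
        codes.add(0)
    return codes


def new_concept2_codes(positive: bool, negative: bool, related: bool = True) -> set[int]:
    """Codes allowed by the final "Only if ..., otherwise 0" wording.

    `negative` and `related` are unused; they are kept so both functions share a signature.
    """
    return {1} if positive else {0}


for positive, negative in product([False, True], repeat=2):
    print(f"positive={positive!s:5} negative={negative!s:5} "
          f"old={sorted(old_concept2_codes(positive, negative))} "
          f"new={sorted(new_concept2_codes(positive, negative))}")
```

Under the earlier wording, a purely negative comment matches neither rule (no code), and a positive comment with no negative emotion matches both. The final wording assigns exactly one code in every case.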

---

The third prompt was our final prompt, which can be found at the beginning of this file.

---

## Reflection

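Throughout the prompting iterations, we found success both in adding more detail (an expert persona, background on the cup's reception, and instructions for non-English and emoji-only comments) *and* in simplifying the coding rules.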