<h1 align="center">Label YouTube Comments for their Stance toward the U.S. Army</h1>

For this tutorial we are going to create labels for the [stance](https://www.sciencedirect.com/science/article/pii/S0306457322001728) of comments toward videos on the U.S. Army's official [YouTube Channel](https://www.youtube.com/USarmy). This type of labeling task is common for things like public affairs, political science, or marketing where we want yto get metrics on how certain messages are being received by the (a) public. 

In this context stance is defined as the opinion, either expressed or implied, of a user or text toward a target. Typically, stance is either labeled as 'for', 'against', 'neutral', and can include 'unrelated'.

In [None]:
# install dependencies
! pip install -r requirements.txt

In [1]:
# Import packages for labeling data by LLM
import pandas as pd  
import numpy as np
from tqdm import tqdm
tqdm.pandas()

from transformers import pipeline

from langchain.prompts import PromptTemplate
from langchain.prompts import FewShotPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_huggingface import HuggingFacePipeline
from langchain_core.output_parsers import StrOutputParser

2024-12-03 17:44:21.931836: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-12-03 17:44:23.459464: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-03 17:44:23.856477: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-03 17:44:23.955539: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-03 17:44:24.900541: I tensorflow/core/platform/cpu_feature_guar

# Read in and inspect the dataset to be labeled

We will read in the validation dataset, which has human annotations to compare to, for this exercise. The full data set is available [here](https://zenodo.org/records/10493803)

In [2]:
DATA_PATH = "@usarmy_comments_validation_set.csv"

df = pd.read_csv(DATA_PATH, index_col=0)

In [3]:
df.shape

(1000, 21)

In [4]:
df.head()

Unnamed: 0_level_0,id,comment,author,author_channel,like_count,published_at,coversation_id,video_id,name,description,...,stance_toward_army_1,stance_toward_the_video_1,stance_toward_army_2,stance_toward_the_video_2,stance_toward_army_3,stance_toward_the_video_3,stance_toward_army_4,stance_toward_the_video_4,stance_toward_army,stance_toward_the_video
Column1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
22003,UgwoQjnFnu8oh8pn8gV4AaABAg,This looks fun,FinnWarrior,UCftLalD5oiEhoQ8fMjWVHYA,56,6/2/2023 9:08,UgwoQjnFnu8oh8pn8gV4AaABAg,YT6nY1MbAqY,How was your #Army week?,"Training is complete! Enjoy the rest, #Soldier...",...,neutral,supports,,,,,neutral,supports,neutral,supports
129279,UgwIIyx_S9bFcAlTaXJ4AaABAg,Be a good goy and enlist today!,Crawlz,UCMRIsquChpFBwZ9e2CzyjIQ,9,4/3/2023 21:53,UgwIIyx_S9bFcAlTaXJ4AaABAg,Lwx-2R9swDg,Be All You Can Be - U.S. Army's new brand trai...,Soldiers know what it means to Be All You Can ...,...,against,against,,,,,against,against,against,against
35799,Ugyp3sxiuELgscnDhEt4AaABAg,I love U.S. army!,Mashal Azhar,UCEr0tOwggp9KeRNUb-MSF0w,2,9/7/2023 9:11,Ugyp3sxiuELgscnDhEt4AaABAg,oSp0kWcHJcc,Army 101: Ranks - Enlisted Ranks | U.S. Army,"Enlisted ranks in the #USArmy. What are they, ...",...,supports,supports,,,,,supports,supports,supports,supports
109853,UgzTqqqHg0Sqp4UNoyN4AaABAg,Outstanding Cadets! Keep busy! Summer of 78!,Ralph Brogdon,UCPRlIt0Fxj09Tsu-ikOX0ZQ,6,7/6/2023 9:52,UgzTqqqHg0Sqp4UNoyN4AaABAg,koutUr0IWHk,So what's ROTC Summer training like?,ROTC cadets undergo #Armytraining every summer...,...,supports,supports,,,,,supports,supports,supports,supports
130430,UgwtIVJIVI_hL3kXRex4AaABAg,"I hate people that say ""die for our country"" l...",nerdzilla1954,UCaJ7eXZ9mRczw2xLqX8CkCA,1,3/28/2023 1:55,UgwtIVJIVI_hL3kXRex4AaABAg,Lwx-2R9swDg,Be All You Can Be - U.S. Army's new brand trai...,Soldiers know what it means to Be All You Can ...,...,supports,neutral,,,,,supports,neutral,supports,neutral


In [5]:
df.columns

Index(['id', 'comment', 'author', 'author_channel', 'like_count',
       'published_at', 'coversation_id', 'video_id', 'name', 'description',
       'timestamp', 'stance_toward_army_1', 'stance_toward_the_video_1',
       'stance_toward_army_2', 'stance_toward_the_video_2',
       'stance_toward_army_3', 'stance_toward_the_video_3',
       'stance_toward_army_4', 'stance_toward_the_video_4',
       'stance_toward_army', 'stance_toward_the_video'],
      dtype='object')

# Get an LLM working

For this exercise, we will stand up a local (relatively) small LLM, in this case a [specially tuned T5 model](https://huggingface.co/declare-lab/flan-alpaca-gpt4-xl). It should be noted that if you want to use a decoder-only model (i.e., Llama, Mistral, etc.) you need to switch to a `text-generation` pipeline. Also, setting `return_full_text=False` when using a text-generation pipeline is also helpful as it just returns what the model generates and not the full prompt.

Once we have the pipeline, we wrap it in langchain's pipeline class so that we can use it in chains.

Finally, one can also use a closed-source model, like OpenAI as well. Just consult [the documentation](https://python.langchain.com/docs/integrations/chat/openai/) to see how to do this

In [7]:
# Load the model using Hugging Face pipeline
hf_pipeline = pipeline(
    "text2text-generation",
    model="declare-lab/flan-alpaca-gpt4-xl",
    device=0,  # (-1 for CPU, other numbers for GPUs)
    max_new_tokens = 100,
)

# Create the LangChain LLM using the HuggingFace pipeline
llm = HuggingFacePipeline(pipeline=hf_pipeline)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [8]:
# run some examples 
question = '''Analyze the following social media post and determine its stance towards the provided entity. Respond with a single word: "for", "against", "neutral", or "unrelated". Only return the stance as a single word, and no other text.
entity: U.S. Army
post: @vondeveen If the Army wants to actually recruit people, maybe stop breaking people and actually prosecute sexual assault #nomorewar.
stance:'''
print(llm.invoke(question))

against


In [9]:
question = '''Analyze the following social media post and determine its stance towards the provided entity. Respond with a single word: "for", "against", "neutral", or "unrelated". Only return the stance as a single word, and no other text.
entity: U.S. Army
post: @artfulask I have never seen a pink-eared duck before. #Army
stance:'''
print(llm.invoke(question))

neutral


In [10]:
question = '''Analyze the following social media post and determine its stance towards the provided entity. Respond with a single word: "for", "against", "neutral", or "unrelated". Only return the stance as a single word, and no other text.
entity: U.S. Army
post: I think the @Army helped me become disciplined. I would have surely flunked out of college chasing tail if I didn't get some discipline there. #SFL
stance:'''
print(llm.invoke(question))

for


# Create prompt templates

In [11]:
context_template = '''Analyze the following YouTube comment to a video posted by the U.S. Army named "{title}" and determine its stance towards the provided entity. Respond with a single word: "for", "against", "neutral", or "unrelated". Only return the stance as a single word, and no other text.
entity: {entity}    
comment: {statement}    
stance:'''  

# Initialize a PromptTemplate object  
context_prompt = PromptTemplate(input_variables=["title","entity","statement"], template=context_template) 

In [12]:
example = df.iloc[0,:]

formated_prompt = context_prompt.format(title=example['name'], 
                      entity = "the U.S. Army",
                      statement = example['comment'])

print(formated_prompt)

Analyze the following YouTube comment to a video posted by the U.S. Army named "How was your #Army week?" and determine its stance towards the provided entity. Respond with a single word: "for", "against", "neutral", or "unrelated". Only return the stance as a single word, and no other text.
entity: the U.S. Army    
comment: This looks fun    
stance:


# Create and Run a Labeling Chain

In the newer versions of LangChain, you string together 'runnbales' using the pipe (|) format to create chains

In [13]:
llm_chain = context_prompt | llm |  StrOutputParser()

In [14]:
llm_chain.invoke({"title":example['name'], 
                  "entity":"the U.S. Army",
                  "statement":example['comment']})

'for'

In [15]:
# now, we can programmatically produce labels!

results = []

for _, row in tqdm(df.iterrows(), total=len(df), desc="Classifying rows"):
    result = llm_chain.invoke({
        "title": row['name'],
        "entity": "the U.S. Army",
        "statement": row['comment']
    })
    results.append(result)

Classifying rows:   1%|          | 6/1000 [00:00<02:03,  8.06it/s]You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
Classifying rows:  73%|███████▎  | 732/1000 [01:32<00:36,  7.44it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (593 > 512). Running this sequence through the model will result in indexing errors
Classifying rows: 100%|██████████| 1000/1000 [02:06<00:00,  7.89it/s]


In [16]:
np.unique(results, return_counts=True)

(array(['The stance of the comment is "for" the U.S. Army.',
        'The stance of the comment is neutral.', 'against', 'for',
        'neutral', 'unrelated'], dtype='<U49'),
 array([  1,   1, 409, 352, 139,  98]))

As we can see in the output, sometimes we get extra text that we did not ask the LLM for. So, often we want a post-processing function to make sure everythign maps back to the labels we want

In [17]:
def post_process_results(result):
    """
    This function post-processes the result from a large language model to label text.

    Args:
        result (str): A string representing the LLM output word.

    Returns:
        str: A classification label ('disagree', 'neutral', 'agree', or 'unrelated').
    """
    
    # Words or phrases that indicate each stance category
    disagree_indicators = ['against', 'denies', 'critical', 'deny', 'neg', 'oppose', 'opposes']
    agree_indicators = ['support', 'supports', 'for', 'pro ', 'positive', 'agree', 'agrees']
    neutral_indicators = ['neutral']

    # Normalize the word to lower case and remove leading/trailing white spaces
    normalized_word = str(result).strip().lower()

    # Determine stance based on the indicators
    if any(indicator in normalized_word for indicator in disagree_indicators):
        # If the word is also found in agree_indicators or neutral_indicators, label it as 'neutral'
        if any(indicator in normalized_word for indicator in agree_indicators) or any(indicator in normalized_word for indicator in neutral_indicators):
            return 'neutral'
        else:
            return 'against'
    elif any(indicator in normalized_word for indicator in neutral_indicators):
        # If the word is also found in disagree_indicators or agree_indicators, label it as 'neutral'
        if any(indicator in normalized_word for indicator in disagree_indicators) or any(indicator in normalized_word for indicator in agree_indicators):
            return 'neutral'
        else:
            return 'neutral'
    elif any(indicator in normalized_word for indicator in agree_indicators):
        # If the word is also found in disagree_indicators or neutral_indicators, label it as 'neutral'
        if any(indicator in normalized_word for indicator in disagree_indicators) or any(indicator in normalized_word for indicator in neutral_indicators):
            return 'neutral'
        else:
            return 'for'
    else:
        # If no specific stance label is found, label it as unrelated
        return 'unrelated'


In [18]:
results = [post_process_results(i) for i in results]

In [19]:
np.unique(results, return_counts=True)

(array(['against', 'for', 'neutral', 'unrelated'], dtype='<U9'),
 array([409, 353, 140,  98]))

In [20]:
# Now, lets save out our labels for later use
df['llm_labels'] = results

df.to_csv("@usarmy_comments_validation_set_labeled.csv")

# Prompt Engineering for Labeling Data by LLM

Okay, having seen how we can classify the stance of the comments toward a target (in this case, the U.S. Army), lets look at how we can construct some other labeling prompts, based on some of the design patterns we talked about earlier. Specifically, lets look at:
- few-shot prompting
- chain-of-thought-prompting

## Few-shot prompting

key to making this work well is the examples you give the LLM to reason on for classifying the stance. these examples coule be drawn from the same dataset, a related dataset or even completely made up.

In [21]:
example_template = '''title: {title}
entity: {entity}
comment: {comment}
stance: {stance}'''

example_prompt = PromptTemplate(
    input_variables=["title", "entity", "comment", "stance"],
    template=example_template
)

examples = [
    {'title': "New Recruitment Video",
     'entity': "the U.S. Army",
     'comment': "This is an amazing initiative by the Army.",
     'stance': 'for'},
    
    {'title': "Training Highlights",
     'entity': "the U.S. Army",
     'comment': "This video shows the Army's commitment to readiness.",
     'stance': 'for'},
    
    {'title': "Military Expenditure Analysis",
     'entity': "the U.S. Army",
     'comment': "Why is so much taxpayer money wasted on this?",
     'stance': 'against'},
    
    {'title': "Veterans' Day Tribute",
     'entity': "the U.S. Army",
     'comment': "This is a neutral tribute, nothing special.",
     'stance': 'neutral'},
    
    {'title': "New Recruitment Video",
     'entity': "the U.S. Army",
     'comment': "This has nothing to do with the Army, totally irrelevant.",
     'stance': 'unrelated'},
]

In [22]:
prefix = '''Stance classification is the task of determining the stance of a comment towards a specific entity. The following examples illustrate different stances a comment can take: "for", "against", "neutral", or "unrelated".'''

suffix = '''Analyze the following YouTube comment to a video posted by the U.S. Army named "{title}" and determine its stance towards the provided entity. Respond with a single word: "for", "against", "neutral", or "unrelated". Only return the stance as a single word, and no other text.
title: {title}
entity: {entity}
comment: {comment}
stance:'''

# Create the FewShotPromptTemplate using the updated prefix, suffix, and examples
few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["title", "entity", "comment"],
    example_separator="\n\n"
)

In [23]:
formated_prompt = few_shot_prompt.format(title=example['name'], 
                      entity = "the U.S. Army",
                      comment = example['comment'])

print(formated_prompt)

Stance classification is the task of determining the stance of a comment towards a specific entity. The following examples illustrate different stances a comment can take: "for", "against", "neutral", or "unrelated".

title: New Recruitment Video
entity: the U.S. Army
comment: This is an amazing initiative by the Army.
stance: for

title: Training Highlights
entity: the U.S. Army
comment: This video shows the Army's commitment to readiness.
stance: for

title: Military Expenditure Analysis
entity: the U.S. Army
comment: Why is so much taxpayer money wasted on this?
stance: against

title: Veterans' Day Tribute
entity: the U.S. Army
comment: This is a neutral tribute, nothing special.
stance: neutral

title: New Recruitment Video
entity: the U.S. Army
comment: This has nothing to do with the Army, totally irrelevant.
stance: unrelated

Analyze the following YouTube comment to a video posted by the U.S. Army named "How was your #Army week?" and determine its stance towards the provided e

Now, we can simiarly define a chain for the few-shot prompting

In [24]:
few_shot_chain = few_shot_prompt | llm |  StrOutputParser() | RunnableLambda(post_process_results)

In [25]:
few_shot_chain.invoke({"title":example['name'], 
                  "entity":"the U.S. Army",
                  "comment":example['comment']})

'for'

In [27]:
# now, lets programmatically produce labels with few-shot prompting

results = []

for _, row in tqdm(df.iterrows(), total=len(df), desc="Classifying rows"):
    result = few_shot_chain.invoke({
        "title": row['name'],
        "entity": "the U.S. Army",
        "comment": row['comment']
    })
    results.append(result)

Classifying rows: 100%|██████████| 1000/1000 [02:54<00:00,  5.73it/s]


In [28]:
np.unique(results, return_counts=True)

(array(['against', 'for', 'neutral', 'unrelated'], dtype='<U9'),
 array([412, 359,  73, 156]))

In [29]:
# Now, lets save out our labels for later use
df['llm_few_shot_labels'] = results

df.to_csv("@usarmy_comments_validation_set_labeled.csv")

## Chain-of-thought prompting

This method often requires constructing together multiple prompts, which breakdown and reason over the example to be classified.

In [30]:
# CoT template 1: reason about potential stances

cot_template_1 = '''Analyze the following YouTube comment to a video named "{title}" posted by the U.S. Army. Consider the opinion, or stance, expressed in the comment about the provided entity. Provide reasoning for your analysis.
title: {title}
entity: {entity}
comment: {comment}
explanation:'''

cot_prompt_1 = PromptTemplate(
    input_variables=["title", "entity", "comment"],
    template=cot_template_1
)

cot_chain_1 = cot_prompt_1 | llm | StrOutputParser()


In [31]:
cot_chain_1.invoke({"title":example['name'], 
                  "entity":"the U.S. Army",
                  "comment":example['comment']})

'The comment expresses an opinion about the U.S. Army. The user expresses a positive sentiment towards the video and its content. This indicates that the user has a positive view of the U.S. Army and its activities.'

In [32]:
# CoT template 1: prodcue the final stance judgement

cot_template_2 = '''Based on your explanation, "{stance_reason}", what is the final stance towards the provided entity? Respond with a single word: "for", "against", "neutral", or "unrelated". Only return the stance as a single word, and no other text.
title: {title}
entity: {entity}
comment: {comment}
stance:'''

cot_prompt_2 = PromptTemplate(
    input_variables=["title", "entity", "comment", "stance_reason"],
    template=cot_template_2
)

cot_chain_2 = cot_prompt_2 | llm | StrOutputParser()

In [33]:
# Combine the chains together for labeling data points

cot_chain = {
    "stance_reason": cot_chain_1,
    "title": RunnablePassthrough(),
    "entity": RunnablePassthrough(),
    "comment": RunnablePassthrough()
} | cot_chain_2 | StrOutputParser() | RunnableLambda(post_process_results)

In [34]:
cot_chain.invoke({"title":example['name'], 
                  "entity":"the U.S. Army",
                  "comment":example['comment']})

'for'

In [35]:
# now, lets programmatically produce labels with few-shot prompting

results = []

for _, row in tqdm(df.iterrows(), total=len(df), desc="Classifying rows"):
    result = cot_chain.invoke({
        "title": row['name'],
        "entity": "the U.S. Army",
        "comment": row['comment']
    })
    results.append(result)

Classifying rows: 100%|██████████| 1000/1000 [51:52<00:00,  3.11s/it] 


In [36]:
np.unique(results, return_counts=True)

(array(['against', 'for', 'neutral', 'unrelated'], dtype='<U9'),
 array([466, 487,  42,   5]))

In [37]:
# Now, lets save out our labels for later use
df['llm_cot_labels'] = results

df.to_csv("@usarmy_comments_validation_set_labeled.csv")