In [1]:
import lmstudio as lms

SERVER_API_HOST = "localhost:1234"

model_list = lms.list_downloaded_models('llm')

Build a sliding-window context for each row, then cut the dataset down to a much smaller sample for LLM testing, keeping the number of positive and negative cases equal (100 sponsored and 100 non-sponsored rows).

In [2]:
import pandas as pd
import numpy as np

testing_df = pd.read_feather('test.feather')

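# Join each row with the following n rows, giving a window of up to n + 1 segments.
n = 5
testing_df['combined_text'] = [' '.join(testing_df['text'].iloc[i:i+n+1]) for i in range(len(testing_df))]
# Fraction of the segments in that window that are labeled as sponsored.
testing_df['expected'] = [np.mean(testing_df['sponsored'].iloc[i:i+n+1]) for i in range(len(testing_df))]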
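
To make the windowing concrete, here is a minimal pure-Python sketch of the same idea on a toy list. The helper name `build_windows` and the toy data are illustrative only and are not part of the notebook; the slicing mirrors `iloc[i:i+n+1]`, so windows near the end of the data are shorter.

```python
def build_windows(texts, labels, n=5):
    """Join each item with the next n items and average their labels."""
    combined, expected = [], []
    for i in range(len(texts)):
        window_text = texts[i:i + n + 1]      # slicing clips at the end of the list
        window_labels = labels[i:i + n + 1]
        combined.append(" ".join(window_text))
        expected.append(sum(window_labels) / len(window_labels))
    return combined, expected


# Toy data (hypothetical): four segments, the first and last are sponsored.
texts = ["a", "b", "c", "d"]
labels = [True, False, False, True]

combined, expected = build_windows(texts, labels, n=1)
print(combined)
print(expected)
```

Note that, as in the notebook cell, nothing here stops a window from running past the end of one video into the next; that would need an extra grouping step.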

In [3]:


# Draw 100 sponsored and 100 non-sponsored rows, then shuffle them together.
df_true = testing_df[testing_df['sponsored'] == True]
df_false = testing_df[testing_df['sponsored'] == False]

df_sample = pd.concat([
    df_true.sample(100, random_state=162),
    df_false.sample(100, random_state=162)
])

df_sample = df_sample.sample(frac=1, random_state=162).reset_index(drop=True)

# Prompt Templates

In [5]:
DEFAULT_PROMPT_TEMPLATE = """
Given the following text, reason whether the text is a sponsored segment.

Text: "{text}"

Determine if the text is a sponsored segment. Respond with 'TRUE' for sponsored or 'FALSE' for non-sponsored.
Answer:
"""

CARP_PROMPT_TEMPLATE = """
Given the following text and clues, reason whether the text is a sponsored segment.

Text: "{text}"

Clues: "{clues}"

Perform some reasoning BEFORE coming to a conclusion: 

Based on these clues, determine if the text is a sponsored segment. Respond with 'TRUE' for sponsored or 'FALSE' for non-sponsored.
Answer:

DO NOT MAKE AN ANSWER BEFORE REASONING
"""

EXAMPLES_CARP_PROMPT_TEMPLATE = """
Given the following text, determine whether it is part of a sponsored segment.
Here are some examples:
{examples}

Now predict for this text, first identifying clues that indicate whether it is a sponsored segment and providing reasoning:

Text: "{text}"

Clues: "{clues}"

Perform some reasoning BEFORE coming to a conclusion: 

Based on these clues, determine if the text is a sponsored segment. Respond with 'TRUE' for sponsored or 'FALSE' for non-sponsored.
Answer:

DO NOT MAKE AN ANSWER BEFORE REASONING
"""

CLUE_PROMPT = """
Given this text, determine some clues that indicate whether it is a sponsored segment. Return the clues only, no preamble and do not predict if it is a sponsored segment.
Text: "{text}"
"""

EXAMPLE1 = """
Example 1:
Text: "plus it really helps out the channel the free premium membership gives you unlimited access to mustknow topics so that you can improve your\nskills and learn new things all free for two months so get skillshare and improve yourself now"
Clues: "
*   Direct call to action ("get Skillshare")
*   Mention of benefits for both the viewer *and* the channel.
*   Promotion of a specific product/service (Skillshare).
*   Offer of a free trial/premium membership.
*   Emphasis on improving oneself, linking it to the service."

Reasoning:
The text includes several strong indicators of a sponsored segment:

Direct call to action: The phrase “get Skillshare and improve yourself now” is a clear directive, which is typical in promotional or sponsored content.

Mention of benefits to both viewer and channel: The line “it really helps out the channel” suggests that the creator benefits directly from viewers taking action, a common practice in affiliate or sponsored partnerships.

Promotion of a specific product/service: “Skillshare” is explicitly named and promoted throughout the text.

Free trial/premium membership: The mention of “free premium membership” and “all free for two months” is a marketing point frequently used in sponsorships to encourage sign-ups.

Self-improvement messaging tied to the product: The text highlights benefits like “improve your skills” and “learn new things,” aligning the product with personal development, a common angle in promotional messaging.

All these clues are consistent with how sponsored content is typically structured. The presence of a direct benefit to the creator, a specific product offer, and persuasive language targeting the audience’s self-improvement goals strongly support the conclusion that this is sponsored content.

Answer: TRUE

---
"""

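# NOTE: the Text in this example repeats Example 1 (Skillshare) and does not match
# the invoicing clues and reasoning below; swap in the matching transcript before use.
EXAMPLE2 = """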
Example 2:
Text: "plus it really helps out the channel the free premium membership gives you unlimited access to mustknow topics so that you can improve your\nskills and learn new things all free for two months so get skillshare and improve yourself now"
Clues: "
*   Focus on invoicing/billing features.
*   Mention of accepting deposits.
*   Emphasis on tracking details (hours, clients, dates).
*   Promotional tone ("all that kind of stuff").
*   Directly naming the product ("books").
"

Reasoning:
Let's reason through the clues step by step:

Focus on invoicing/billing features:
The text mentions creating invoices and tracking hours, which are invoicing-related features. 

Mention of accepting deposits:
The text includes, "you can even accept deposits for that work", which directly matches this clue.

Emphasis on tracking details (hours, clients, dates):
It says, "you know exactly what you did when you did it who you did it for", which emphasizes tracking multiple details.

Promotional tone ("all that kind of stuff"):
The phrase "all that kind of stuff" adds a casual, promotional tone that often appears in spoken sponsored segments.

Directly naming the product ("books"):
It starts with "books also helps you...", which appears to be referencing a product name (possibly a shortened or stylized brand name).

Conclusion:
The clues are consistent with how sponsored content is typically structured.

Answer: TRUE

---
"""

In [15]:
import re

# Query the LLM and check if the text is sponsored
def query_llm(model_instance, text, case='Zero_Shot'):
    # Baseline without clues or examples: a single direct prompt.
    if case == 'CARP-LESS':
        prompt = DEFAULT_PROMPT_TEMPLATE.format(text=text)
        return extract_classification(model_instance.respond(prompt).content)

    # The CARP variants differ only in which worked examples they include.
    examples_by_case = {
        'Zero_Shot': None,
        'One_Shot': EXAMPLE1,
        'Few_Shot': EXAMPLE1 + EXAMPLE2,
    }
    if case not in examples_by_case:
        return 'Unknown'

    # Step 1: ask the model for clues. Step 2: classify using those clues.
    clues = model_instance.respond(CLUE_PROMPT.format(text=text)).content
    examples = examples_by_case[case]
    if examples is None:
        prompt = CARP_PROMPT_TEMPLATE.format(text=text, clues=clues)
    else:
        prompt = EXAMPLES_CARP_PROMPT_TEMPLATE.format(text=text, clues=clues, examples=examples)
    return extract_classification(model_instance.respond(prompt).content)

# Extract True/False classification
def extract_classification(response_text):
    match = re.search(r'\b(TRUE|FALSE)\b', response_text, re.IGNORECASE)
    return match.group(0) if match else "Unknown"

# Main function to run LLM evaluation
def run_evaluation(df, models):
    df_results = df.copy()
    cases = ['Zero_Shot', 'One_Shot', 'CARP-LESS']
    
    for model_id in models:
        LLM = lms.llm(model_id)
        
        for case in cases:
            results = []
            for _, row in df.iterrows():
                result = query_llm(LLM, row['combined_text'], case=case)
                # Map 'TRUE'/'FALSE' (any casing) to booleans; anything else becomes None.
                results.append({'true': True, 'false': False}.get(result.lower()))
                    
            df_results[model_id + '_' + case] = results
            
        LLM.unload()
        
    return df_results

# Run and store results
df_results = run_evaluation(df_sample, ['hermes-3-llama-3.2-3b', 'meta-llama-3.1-8b-instruct', 'granite-3.2-8b-instruct', 'gemma-3-27b-it'])
print("Evaluation complete.")

df_results.to_feather('llm_results.feather')

Evaluation complete.
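
`extract_classification` returns the first `TRUE` or `FALSE` it finds. Because the CARP prompts ask the model to reason before answering, that first match may come from the reasoning rather than from the final `Answer:` line. Below is a hedged sketch of an alternative that takes the last label instead. It was not used to produce the results in this notebook. The function name `extract_final_classification` is hypothetical, and the `<think>...</think>` stripping assumes that reasoning models emit their reasoning inside such tags.

```python
import re

# Assumption: reasoning models wrap their scratch work in <think>...</think>.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
LABEL = re.compile(r"\b(TRUE|FALSE)\b", re.IGNORECASE)


def extract_final_classification(response_text):
    """Return the last TRUE/FALSE label outside any <think> block, else 'Unknown'."""
    visible = THINK_BLOCK.sub(" ", response_text)
    matches = LABEL.findall(visible)
    return matches[-1].upper() if matches else "Unknown"


# The word "true" appears in the reasoning, but the final verdict is FALSE.
sample = "It is true that a brand is named, but there is no offer. Answer: FALSE"
print(extract_final_classification(sample))
```

Taking the last match relies on the model ending with its verdict, which the prompts request but do not guarantee.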


In [16]:
df_results

Unnamed: 0,videoID,UUID,category,start,duration,text,sponsored,text_without_stopwords,combined_text,expected,...,hermes-3-llama-3.2-3b_CARP-LESS,meta-llama-3.1-8b-instruct_Zero_Shot,meta-llama-3.1-8b-instruct_One_Shot,meta-llama-3.1-8b-instruct_CARP-LESS,granite-3.2-8b-instruct_Zero_Shot,granite-3.2-8b-instruct_One_Shot,granite-3.2-8b-instruct_CARP-LESS,gemma-3-27b-it_Zero_Shot,gemma-3-27b-it_One_Shot,gemma-3-27b-it_CARP-LESS
0,th_KQOeh-Co,db67f847cbf9593e5dd0643f4ffaf91ebdee8263a1c56b...,sponsor,7.880,5.509,welcome welcome everyone and congratulations,True,welcome welcome everyone congratulations,welcome welcome everyone and congratulations i...,0.166667,...,False,True,,True,True,True,True,True,True,True
1,5VWTfXfm47g,f0801d6cffddc590a7732fdb0270fd8d73143ce4146320...,sponsor,300.639,3.521,again thank you to trade coffee for,True,thank trade coffee,again thank you to trade coffee for sponsoring...,0.333333,...,False,True,True,False,True,True,True,True,True,True
2,A0Lkh02_Ik4,,sponsor,777.740,3.810,havent seen gigging like that since i,False,havent seen gigging like since,havent seen gigging like that since i was a fr...,0.000000,...,False,,False,False,False,False,False,True,False,False
3,sRrEkZ8OqiQ,,sponsor,455.509,3.480,its gone boom and its gone it doesnt,False,gone boom gone doesnt,its gone boom and its gone it doesnt tell me i...,0.000000,...,False,,,False,True,True,False,True,True,False
4,5_3BUU9ZmNo,,sponsor,610.060,3.709,if you could just add more hotbar slots it\nwo...,False,could add hotbar slots would cut way amount time,if you could just add more hotbar slots it\nwo...,0.000000,...,False,,False,False,False,False,False,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,6HKI35P-m-w,437c57864759f62683d9851603566f4d8a98ff6167f570...,sponsor,192.540,4.650,verve just by going to verve co to,True,verve going verve co,verve just by going to verve co to jacksfilms...,1.000000,...,False,,True,False,True,True,False,True,True,True
196,wYWGf2rKTCY,89630073f16e6ab44c414d1557c4e7c467eec05033bf2f...,sponsor,602.000,5.120,by design a guide to elevating your,True,design guide elevating,by design a guide to elevating your drawing sk...,1.000000,...,False,True,False,False,True,True,False,True,False,True
197,oTtYPB5h47E,08a89748af8d746ce205e765091813fa1296db8371661f...,sponsor,620.740,3.570,the charges can either show up as what,True,charges either show,the charges can either show up as what actuall...,1.000000,...,False,False,True,False,True,True,False,True,False,False
198,17oZPYcpPnQ,6cf9b637e525869d190dc86875245563f5aaf77ab860cd...,sponsor,0.599,2.421,this video was made possible by brilliant,True,video made possible brilliant,this video was made possible by brilliant lear...,0.500000,...,False,False,,True,True,True,False,True,True,True


In [20]:
df_results2 = run_evaluation(df_sample, ['deepseek-r1-distill-qwen-7b'])
len(df_results2['deepseek-r1-distill-qwen-7b_One_Shot'])

KeyboardInterrupt: 

In [21]:
df_results2

Unnamed: 0,videoID,UUID,category,start,duration,text,sponsored,text_without_stopwords,combined_text,expected,deepseek-r1-distill-qwen-7b_Zero_Shot,deepseek-r1-distill-qwen-7b_One_Shot,deepseek-r1-distill-qwen-7b_CARP-LESS
0,th_KQOeh-Co,db67f847cbf9593e5dd0643f4ffaf91ebdee8263a1c56b...,sponsor,7.880,5.509,welcome welcome everyone and congratulations,True,welcome welcome everyone congratulations,welcome welcome everyone and congratulations i...,0.166667,True,True,True
1,5VWTfXfm47g,f0801d6cffddc590a7732fdb0270fd8d73143ce4146320...,sponsor,300.639,3.521,again thank you to trade coffee for,True,thank trade coffee,again thank you to trade coffee for sponsoring...,0.333333,True,True,False
2,A0Lkh02_Ik4,,sponsor,777.740,3.810,havent seen gigging like that since i,False,havent seen gigging like since,havent seen gigging like that since i was a fr...,0.000000,True,False,True
3,sRrEkZ8OqiQ,,sponsor,455.509,3.480,its gone boom and its gone it doesnt,False,gone boom gone doesnt,its gone boom and its gone it doesnt tell me i...,0.000000,True,True,True
4,5_3BUU9ZmNo,,sponsor,610.060,3.709,if you could just add more hotbar slots it\nwo...,False,could add hotbar slots would cut way amount time,if you could just add more hotbar slots it\nwo...,0.000000,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,6HKI35P-m-w,437c57864759f62683d9851603566f4d8a98ff6167f570...,sponsor,192.540,4.650,verve just by going to verve co to,True,verve going verve co,verve just by going to verve co to jacksfilms...,1.000000,True,True,True
196,wYWGf2rKTCY,89630073f16e6ab44c414d1557c4e7c467eec05033bf2f...,sponsor,602.000,5.120,by design a guide to elevating your,True,design guide elevating,by design a guide to elevating your drawing sk...,1.000000,True,True,False
197,oTtYPB5h47E,08a89748af8d746ce205e765091813fa1296db8371661f...,sponsor,620.740,3.570,the charges can either show up as what,True,charges either show,the charges can either show up as what actuall...,1.000000,True,False,False
198,17oZPYcpPnQ,6cf9b637e525869d190dc86875245563f5aaf77ab860cd...,sponsor,0.599,2.421,this video was made possible by brilliant,True,video made possible brilliant,this video was made possible by brilliant lear...,0.500000,True,True,True


In [22]:
df_results['deepseek-r1-distill-qwen-7b_Zero_Shot'] = df_results2['deepseek-r1-distill-qwen-7b_Zero_Shot']
df_results['deepseek-r1-distill-qwen-7b_One_Shot'] = df_results2['deepseek-r1-distill-qwen-7b_One_Shot']
df_results['deepseek-r1-distill-qwen-7b_CARP-LESS'] = df_results2['deepseek-r1-distill-qwen-7b_CARP-LESS']

In [23]:
df_results

Unnamed: 0,videoID,UUID,category,start,duration,text,sponsored,text_without_stopwords,combined_text,expected,...,meta-llama-3.1-8b-instruct_CARP-LESS,granite-3.2-8b-instruct_Zero_Shot,granite-3.2-8b-instruct_One_Shot,granite-3.2-8b-instruct_CARP-LESS,gemma-3-27b-it_Zero_Shot,gemma-3-27b-it_One_Shot,gemma-3-27b-it_CARP-LESS,deepseek-r1-distill-qwen-7b_Zero_Shot,deepseek-r1-distill-qwen-7b_One_Shot,deepseek-r1-distill-qwen-7b_CARP-LESS
0,th_KQOeh-Co,db67f847cbf9593e5dd0643f4ffaf91ebdee8263a1c56b...,sponsor,7.880,5.509,welcome welcome everyone and congratulations,True,welcome welcome everyone congratulations,welcome welcome everyone and congratulations i...,0.166667,...,True,True,True,True,True,True,True,True,True,True
1,5VWTfXfm47g,f0801d6cffddc590a7732fdb0270fd8d73143ce4146320...,sponsor,300.639,3.521,again thank you to trade coffee for,True,thank trade coffee,again thank you to trade coffee for sponsoring...,0.333333,...,False,True,True,True,True,True,True,True,True,False
2,A0Lkh02_Ik4,,sponsor,777.740,3.810,havent seen gigging like that since i,False,havent seen gigging like since,havent seen gigging like that since i was a fr...,0.000000,...,False,False,False,False,True,False,False,True,False,True
3,sRrEkZ8OqiQ,,sponsor,455.509,3.480,its gone boom and its gone it doesnt,False,gone boom gone doesnt,its gone boom and its gone it doesnt tell me i...,0.000000,...,False,True,True,False,True,True,False,True,True,True
4,5_3BUU9ZmNo,,sponsor,610.060,3.709,if you could just add more hotbar slots it\nwo...,False,could add hotbar slots would cut way amount time,if you could just add more hotbar slots it\nwo...,0.000000,...,False,False,False,False,True,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,6HKI35P-m-w,437c57864759f62683d9851603566f4d8a98ff6167f570...,sponsor,192.540,4.650,verve just by going to verve co to,True,verve going verve co,verve just by going to verve co to jacksfilms...,1.000000,...,False,True,True,False,True,True,True,True,True,True
196,wYWGf2rKTCY,89630073f16e6ab44c414d1557c4e7c467eec05033bf2f...,sponsor,602.000,5.120,by design a guide to elevating your,True,design guide elevating,by design a guide to elevating your drawing sk...,1.000000,...,False,True,True,False,True,False,True,True,True,False
197,oTtYPB5h47E,08a89748af8d746ce205e765091813fa1296db8371661f...,sponsor,620.740,3.570,the charges can either show up as what,True,charges either show,the charges can either show up as what actuall...,1.000000,...,False,True,True,False,True,False,False,True,False,False
198,17oZPYcpPnQ,6cf9b637e525869d190dc86875245563f5aaf77ab860cd...,sponsor,0.599,2.421,this video was made possible by brilliant,True,video made possible brilliant,this video was made possible by brilliant lear...,0.500000,...,True,True,True,False,True,True,True,True,True,True


In [24]:
df_results.to_feather('llm_results_with_deepseek.feather')
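
One way the saved prediction columns might be scored is sketched below. Since `run_evaluation` stores `None` whenever no label could be extracted, the sketch reports coverage (the share of rows with an answer) alongside accuracy on the answered rows. The helper `score_predictions` and the toy lists are illustrative assumptions, not part of the notebook, and using `sponsored` as the ground truth is one possible choice.

```python
def score_predictions(truth, predictions):
    """Accuracy over answered rows, plus the share of rows that got an answer."""
    answered = [(t, p) for t, p in zip(truth, predictions) if p is not None]
    coverage = len(answered) / len(truth) if truth else 0.0
    correct = sum(1 for t, p in answered if t == p)
    accuracy = correct / len(answered) if answered else float("nan")
    return {"coverage": coverage, "accuracy": accuracy}


# Toy data (hypothetical): one unknown, one wrong answer.
truth = [True, True, False, False]
preds = [True, None, False, True]

scores = score_predictions(truth, preds)
print(scores)
```

With the notebook's frame this could be called as `score_predictions(df_results['sponsored'].tolist(), df_results[column].tolist())` for each model column, assuming unknowns are still stored as `None`.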