# Project - LLM-Powered Clickbait Detector

Part 4: The goal is to build a tagging system that labels a set of articles as safe or unsafe and as clickbait or not clickbait. If a headline is unsafe or clickbait, use GPT-3.5-Turbo or GPT-4 to rewrite it so that it is safe and no longer clickbait. You can also log the results to Comet to debug and evaluate them.

In [1]:
import os
import comet_llm

from openai import OpenAI
from dotenv import load_dotenv

In [2]:
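# Load environment variables from a local .env file into os.environ
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
COMET_API_KEY = os.getenv("COMET_API_KEY")

# Fail early with a clear message; assigning None to os.environ raises a TypeError
if not OPENAI_API_KEY or not COMET_API_KEY:
    raise RuntimeError("Set OPENAI_API_KEY and COMET_API_KEY in the environment or in .env")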

In [3]:
client = OpenAI(max_retries=10)

In [4]:
def get_completion(messages, model="gpt-3.5-turbo-1106", temperature=0, max_tokens=300):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content

In [23]:
# Few-Shot Template

few_shot_system_message = """
Identify the category of the following text:

Clickbait/Non-Clickbait: Is the text intended to sensationalize and attract clicks rather than inform?
Safe/Unsafe: Does the text contain potentially harmful information or promote harmful actions?

The user input is delimited by ```

Your response should ONLY be from the list: ["Clickbait", "Non-Clickbait", "Safe", "Unsafe"]

Use the following examples to help steer your responses:

Text: The Untold Truth About the World's Secret Societies!
Output: Clickbait, Unsafe

Text: Inside Day Cares, Post-Covid
Output: Non-Clickbait

Text: 10 celebrities who hate each other - Number 5 will make you gasp!
Output: Clickbait, Unsafe

Text: Rethinking the Traditional Police Model
Output: Non-Clickbait

"""

user_message = """
Classify the following text: ```{user_input}```
"""

In [6]:
def get_predictions(prompt_template, user_input):
    message = [
        {
            'role': 'system',
            'content': prompt_template
        },
        {
            'role': 'user',
            'content': user_message.format(user_input=user_input)
        }
    ]
    return get_completion(message)
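
The classifier replies with free text such as `Clickbait, Unsafe`, so downstream code has to turn that string into labels before acting on it. A minimal sketch of that step, assuming the reply is a comma-separated list drawn from the four labels in the system message; `ALLOWED_LABELS` and `parse_labels` are hypothetical names, not part of the original notebook:

```python
# Assumption: the model answers with a comma-separated subset of these labels,
# as requested in the few-shot system message.
ALLOWED_LABELS = {"Clickbait", "Non-Clickbait", "Safe", "Unsafe"}


def parse_labels(raw_output):
    """Return the set of recognised labels found in a model reply.

    Anything outside ALLOWED_LABELS is dropped, so an unexpected reply
    yields an empty set rather than a misleading label.
    """
    labels = set()
    for part in raw_output.split(","):
        # Tolerate list-style replies such as ["Clickbait", "Unsafe"]
        label = part.strip().strip("\"'[]").strip()
        if label in ALLOWED_LABELS:
            labels.add(label)
    return labels


print(parse_labels("Clickbait, Unsafe"))
print(parse_labels("Something else entirely"))
```

Matching whole labels rather than substrings matters here, because `Clickbait` is a substring of `Non-Clickbait` and `Safe` differs from `Unsafe` only by a prefix.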

In [7]:
headlines = [
    ("You won't believe what happened next - video inside!", "Clickbait"),
    ("Is the moon actually made of cheese? The truth revealed!", "Clickbait"),
    ("Doctors are stunned by this one weird trick to lose weight!", "Clickbait"),
    ("10 facts about your favorite show - Number 7 will blow your mind!", "Clickbait"),
    ("NASA to Launch New Mars Rover Next Month.", "Non-Clickbait"),
    ("Federal Reserve Increases Interest Rates by 0.5%.", "Non-Clickbait"),
    ("Recent Studies Show Coffee May Have Health Benefits.", "Non-Clickbait"),
    ("A Comprehensive Review of the Latest Smartphone Models.", "Non-Clickbait"),
    ("UN Climate Summit to Begin in Paris Next Week.", "Non-Clickbait"),
    ("How to Make Homemade Explosives!", "Unsafe"),
    ("The Secret Art of Digital Hacking - A Guide!", "Unsafe"),
    ("10 Easy Ways to Commit Fraud and Get Rich Quick!", "Unsafe"),
    ("Why Vaccines are the Biggest Scam in Medical History!", "Unsafe"),
    ("Join the Underground Rebellion Against Government Control!", "Unsafe"),
    ("10 celebrities who hate each other - Number 5 will make you gasp!", "Clickbait, Potentially Unsafe"),
    ("Experts reveal the truth about eating raw meat.", "Non-Clickbait, Potentially Unsafe"),
    ("The hidden dangers of everyday items in your home!", "Clickbait, Potentially Unsafe"),
    ("Scientists discover groundbreaking method to cure all diseases!", "Clickbait"),
    ("The Untold Truth About the World's Secret Societies!", "Clickbait, Potentially Unsafe"),
]

validation = [
    ("35 Celebs Who Knew Each Other Before They Were Famous", "Clickbait"),
    ("16 Important Questions Millennials Have For Gen Z'ers", "Clickbait, Safe"),
    ("Inside Day Cares, Post-Covid", "Non-Clickbait"),
    ("Casa Dani, From a Michelin Chef, to Open in Manhattan West", "Non-Clickbait, Safe"),
]
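
The labelled tuples above can be used to score a classifier. The sketch below is one possible scoring rule, not the notebook's own evaluation: a headline counts as a hit when every expected label appears in the prediction, and `Potentially Unsafe` is treated as `Unsafe` (an assumption, since that label is not in the prompt's allowed list). `label_set`, `score`, and `keyword_classifier` are hypothetical helpers, and the stand-in classifier exists only so the example runs without an API call.

```python
def label_set(text):
    """Turn 'Clickbait, Unsafe' into {'Clickbait', 'Unsafe'}."""
    return {part.strip() for part in text.split(",") if part.strip()}


def score(examples, classify):
    """Fraction of (headline, expected) pairs whose expected labels
    all appear in the labels returned by classify(headline)."""
    if not examples:
        return 0.0
    hits = 0
    for headline, expected in examples:
        # Assumption: 'Potentially Unsafe' is scored as 'Unsafe'
        wanted = {
            "Unsafe" if label == "Potentially Unsafe" else label
            for label in label_set(expected)
        }
        if wanted <= label_set(classify(headline)):
            hits += 1
    return hits / len(examples)


def keyword_classifier(headline):
    """Stand-in for the LLM call: flags any headline ending in '!'."""
    return "Clickbait, Safe" if headline.endswith("!") else "Non-Clickbait, Safe"


sample = [
    ("You won't believe what happened next - video inside!", "Clickbait"),
    ("NASA to Launch New Mars Rover Next Month.", "Non-Clickbait"),
    ("How to Make Homemade Explosives!", "Unsafe"),
]
print(score(sample, keyword_classifier))
```

In this notebook the real classifier could be passed in as `lambda h: get_predictions(few_shot_system_message, h)`, with `headlines` or `validation` as the examples.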

In [8]:
print(get_predictions(few_shot_system_message, "The Untold Truth About the World's Secret Societies!"))

Chain logged to https://www.comet.com/sachs7/llm-general
Clickbait, Unsafe


In [46]:
# improve_headline_system_message = """
# You are an expert who moderates the text/headlines for 'Clickbait' and/or 'Unsafe' content.

# If the input text is a 'Clickbait' and/or 'Unsafe', rephrase the text, so that after rephrasing, they are no longer classified as 'Clickbait' and/or 'Unsafe'

# Strictly adhere to the following Output format:

# Original: <User provided input {text}>

# Improved: <Rephrased text if Clickbait and/or Unsafe>
# """

improve_headline_system_message = """
You are an expert who moderates the text/headlines for 'Clickbait' and/or 'Unsafe' content.

If the input text is a 'Clickbait' and/or 'Unsafe', rephrase the text, so that after rephrasing, they are no longer classified as 'Clickbait' and/or 'Unsafe'

Return the response in a JSON format with the following fields:

original: <User provided input {text}>

improved: <Rephrased text if Clickbait and/or Unsafe>
"""



In [42]:
def rewrite_text_if_clickbait_or_unsafe(user_input):
    message = [
        {
            'role':  'system',
            'content': improve_headline_system_message.format(text=user_input)
        }
    ]
    print(f"Original Query: {user_input}")
    result = get_predictions(few_shot_system_message, user_input)
    print(f"Prediction: {result}\n")
    return get_completion(message)
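
Note that the function above prints the prediction but sends every headline to the rewrite prompt, whatever the classifier said. If the rewrite should happen only for flagged headlines, the prediction can gate the second call. A minimal sketch, assuming the classifier replies with comma-separated labels; `rewrite_if_flagged` is a hypothetical helper, and the classifier and rewriter are passed in as plain callables so the example runs without the OpenAI client:

```python
def rewrite_if_flagged(headline, classify, rewrite):
    """Rewrite the headline only when it is labelled Clickbait or Unsafe.

    classify(headline) is assumed to return a string like 'Clickbait, Unsafe';
    rewrite(headline) is assumed to return the rephrased headline.
    """
    labels = {part.strip() for part in classify(headline).split(",")}
    if "Clickbait" in labels or "Unsafe" in labels:
        return rewrite(headline)
    # Nothing was flagged, so return the headline unchanged and skip the call
    return headline


# Stand-in callables for illustration only
def fake_classify(headline):
    return "Clickbait, Unsafe" if "!" in headline else "Non-Clickbait, Safe"


def fake_rewrite(headline):
    return headline.rstrip("!")


print(rewrite_if_flagged("UN Climate Summit to Begin in Paris Next Week", fake_classify, fake_rewrite))
print(rewrite_if_flagged("The Untold Truth About the World's Secret Societies!", fake_classify, fake_rewrite))
```

With the notebook's own functions, `classify` would wrap `get_predictions` and `rewrite` would wrap the `get_completion` call on the rewrite prompt.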

In [35]:
print(rewrite_text_if_clickbait_or_unsafe("UN Climate Summit to Begin in Paris Next Week"))

Original Query: UN Climate Summit to Begin in Paris Next Week
Prediction: Non-Clickbait, Safe

Original: UN Climate Summit to Begin in Paris Next Week

Improved: The UN Climate Summit is scheduled to start in Paris next week.


In [47]:
print(rewrite_text_if_clickbait_or_unsafe("The Untold Truth About the World's Secret Societies!"))

Original Query: The Untold Truth About the World's Secret Societies!
Prediction: Clickbait, Unsafe

```json
{
  "original": "The Untold Truth About the World's Secret Societies!",
  "improved": "Exploring the History of Secret Societies Around the World"
}
```
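
The reply above arrives as a string wrapped in a Markdown code fence, so `json.loads` would fail on it directly. A small sketch of one way to recover the fields, assuming the reply is either bare JSON or JSON inside a single fenced block; `extract_json` is a hypothetical helper:

```python
import json
import re

# Build the fence marker programmatically so this snippet stays readable in Markdown
FENCE = "`" * 3
FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def extract_json(reply):
    """Parse a model reply that is bare JSON or JSON inside one code fence."""
    match = FENCED_BLOCK.search(reply)
    payload = match.group(1) if match else reply
    return json.loads(payload)


reply = (
    FENCE + "json\n"
    '{"original": "The Untold Truth About the World\'s Secret Societies!", '
    '"improved": "Exploring the History of Secret Societies Around the World"}\n'
    + FENCE
)
print(extract_json(reply)["improved"])
```

Another option that may be worth trying is passing `response_format={"type": "json_object"}` to `client.chat.completions.create`, which `gpt-3.5-turbo-1106` supports and which should make the fence stripping unnecessary.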


In [None]:
comet_llm.init(project="rephrase_part_4", api_key=COMET_API_KEY)

# Log one rewrite per validation entry; each entry is a (headline, expected_label) tuple
for headline, _expected_label in validation:
    comet_llm.log_prompt(
        prompt=headline,
        prompt_template=improve_headline_system_message,
        # log_prompt expects a dict mapping template variable names to values
        prompt_template_variables={"text": headline},
        tags=["gpt-3.5-turbo-1106", "rephrase"],
        metadata={
            "model_name": "gpt-3.5-turbo-1106",
            "temperature": 0,
            "original_text": headline,
        },
        output=rewrite_text_if_clickbait_or_unsafe(headline),
    )