# 💬 Claimify reproduction study

In [1]:
# autoreload imports
%load_ext autoreload
%autoreload 2

In [4]:
from typing import List

import nltk
from langchain_ollama import ChatOllama
from langchain_openai import ChatOpenAI

from utils.chat import chat_factory, get_prompt, load_md_prompts
from utils.claimify import ClaimifyPipeline

## 1. Input Question and Answer

In [8]:
get_prompt("claimify", "selection", {"excerpt": "excerpt", "question": "question", "sentence": "sentence"})

[SystemMessage(content='You are an assistant to a fact-checker. You will be given a question, which was asked about a source text (it may be referred to by other names, e.g., a \ndataset). You will also be given an excerpt from a response to the question. If it contains "[...]", this means that you are NOT seeing all sentences in the response. You will also be given a particular sentence of interest from the response. Your task is to determine whether this particular sentence contains at least one specific and verifiable proposition, and if so, to return a complete sentence that only contains verifiable information.   \n\nNote the following rules:\n- If the sentence is about a lack of information, e.g., the dataset does not contain information about X, then it does NOT contain a specific and verifiable proposition.\n- It does NOT matter whether the proposition is true or false.\n- It does NOT matter whether the proposition is relevant to the question.\n- It does NOT matter whether the 

In [9]:
load_md_prompts("claimify")

{'selection': ChatPromptTemplate(input_variables=['excerpt', 'question', 'sentence'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are an assistant to a fact-checker. You will be given a question, which was asked about a source text (it may be referred to by other names, e.g., a \ndataset). You will also be given an excerpt from a response to the question. If it contains "[...]", this means that you are NOT seeing all sentences in the response. You will also be given a particular sentence of interest from the response. Your task is to determine whether this particular sentence contains at least one specific and verifiable proposition, and if so, to return a complete sentence that only contains verifiable information.   \n\nNote the following rules:\n- If the sentence is about a lack of information, e.g., the dataset does not contain information about X, then it d

In [None]:
chat = ChatOpenAI(model="gpt-5")
excerpt = question = sentence = """Salena Zito has written a powerful new Book, “BUTLER: The Untold Story of the Near Assassination of Donald Trump and the Fight for America’s Heartland.” Salena was an eyewitness to that terrible day, and understands the unbreakable Spirit of our Movement to FIGHT, FIGHT, FIGHT. The Future of our Nation resides in the Hearts and Souls of the People of this Country, and Salena knows we will not rest until we, MAKE AMERICA GREAT AGAIN. Preorder your copy today: a.co/d/92QYTIJ"""
# No question accompanies this post, so a placeholder question ("none") is passed.
output = chat.invoke(get_prompt("claimify", "selection", {"excerpt": excerpt, "question": "none", "sentence": sentence}))
output

AIMessage(content='Sentence: \n \n\n4-step stream of consciousness thought process (1. reflect on criteria at a high-level -> 2. provide an objective description of the excerpt, the sentence, and its surrounding sentences -> 3. consider all possible perspectives on whether the sentence explicitly or implicitly contains a specific and verifiable proposition, or if it just contains an introduction for the following \nsentence(s), a conclusion for the preceding sentence(s), broad or generic statements, opinions, interpretations, speculations, statements about a lack of information, etc. -> 4. only if it contains a specific and verifiable proposition: reflect on whether any changes are needed to ensure that the entire sentence only contains verifiable information):\n1) A specific and verifiable proposition asserts a concrete fact that could be checked; generic statements, opinions, introductions, conclusions, or absence of content do not qualify. 2) The provided "Sentence" field is empty; 

In [None]:
# Same selection prompt against a local Ollama model; the inputs are still placeholders.
chat = ChatOllama(model="qwen3:14b")
output = chat.invoke(get_prompt("claimify", "selection", {"excerpt": "todo", "question": "todo", "sentence": "todo"}))

In [12]:
output.content

'<think>\nOkay, let\'s tackle this problem step by step. The user wants me to determine if a given sentence from a response contains a specific and verifiable proposition based on the provided rules.\n\nFirst, I need to recall the criteria. A specific and verifiable proposition is something that can be checked for truth, like a fact. The rules mention that if the sentence is about a lack of information, it doesn\'t count. Also, it doesn\'t matter if the proposition is true or false, just whether it\'s specific and verifiable. Ambiguous terms are okay as long as the fact-checker can resolve them.\n\nNow, looking at the example given in the problem statement. The user provided a scenario where the sentence is "John" following the question "Who is the CEO of Company X?" and the preceding sentence. The answer here is that "John" contains a specific proposition because it directly answers the question with a name, which can be verified.\n\nAnother example is when the preceding sentence ment
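The qwen3 reply above opens with a `<think>` block before any answer. A minimal sketch of how that reasoning could be stripped before the reply is parsed, assuming it is always wrapped in a `<think>...</think>` pair; `strip_think` is a hypothetical helper and not part of `utils.chat`.

```python
import re

# Hypothetical helper: assumes reasoning is wrapped in <think>...</think>.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(content: str) -> str:
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return _THINK_RE.sub("", content).strip()


# Made-up sample in the same shape as the qwen3 output.
sample = "<think>\nOkay, let's tackle this.\n</think>\n\nSentence: John is the CEO."
print(strip_think(sample))
```

Text without a `<think>` block passes through unchanged, so the helper could be applied to any model's reply.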

In [10]:
def split_into_sentences(text: str) -> List[str]:
    """
    Split a block of text into sentences, handling paragraphs and lists.

    This replicates the methodology from Appendix C.1 of the Claimify paper.
    Requires the NLTK Punkt tokenizer data (e.g. nltk.download("punkt"), or
    "punkt_tab" on newer NLTK releases).

    Args:
        text: The input text to split.

    Returns:
        A list of sentence strings.
    """
    sentences = []
    # First, split on newlines so paragraphs and list items are tokenized separately.
    for para in text.split('\n'):
        if para.strip():  # Skip empty lines
            # Then run NLTK's sentence tokenizer on each paragraph.
            sentences.extend(nltk.sent_tokenize(para))
    return sentences

split_into_sentences("""Salena Zito has written a powerful new Book, “BUTLER: The Untold Story of the Near Assassination of Donald Trump and the Fight for America’s Heartland.” Salena was an eyewitness to that terrible day, and understands the unbreakable Spirit of our Movement to FIGHT, FIGHT, FIGHT. The Future of our Nation resides in the Hearts and Souls of the People of this Country, and Salena knows we will not rest until we, MAKE AMERICA GREAT AGAIN. Preorder your copy today: a.co/d/92QYTIJ""")

['Salena Zito has written a powerful new Book, “BUTLER: The Untold Story of the Near Assassination of Donald Trump and the Fight for America’s Heartland.” Salena was an eyewitness to that terrible day, and understands the unbreakable Spirit of our Movement to FIGHT, FIGHT, FIGHT.',
 'The Future of our Nation resides in the Hearts and Souls of the People of this Country, and Salena knows we will not rest until we, MAKE AMERICA GREAT AGAIN.',
 'Preorder your copy today: a.co/d/92QYTIJ']
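The selection prompt expects an excerpt around the sentence of interest, with "[...]" marking sentences that are not shown. A minimal sketch of building such an excerpt from a sentence list like the one above; `build_excerpt` and its `before`/`after` window sizes are hypothetical and are not taken from `ClaimifyPipeline`.

```python
from typing import List


def build_excerpt(sentences: List[str], index: int, before: int = 2, after: int = 1) -> str:
    """Join the sentence at `index` with its neighbours, marking omissions with [...]."""
    start = max(0, index - before)
    end = min(len(sentences), index + after + 1)
    parts = []
    if start > 0:
        parts.append("[...]")  # sentences before the window were dropped
    parts.extend(sentences[start:end])
    if end < len(sentences):
        parts.append("[...]")  # sentences after the window were dropped
    return " ".join(parts)


# Made-up sentences, only to show the windowing.
demo = ["A one.", "B two.", "C three.", "D four.", "E five."]
print(build_excerpt(demo, 2, before=1, after=1))
```

The window sizes are assumptions for illustration; the values used in the paper or in the pipeline may differ.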

In [5]:
pipeline = ClaimifyPipeline(chat_factory("gpt-5"))

2025-08-20 14:59:24,319 - claimify.pipeline - INFO - Structured outputs enabled for improved reliability


In [7]:
# Czech-language input: a post claiming that Pirate Party chairman Ivan Bartoš wants to legalize heroin.
article = """Předseda Pirátů Ivan Bartoš chce legalizovat heroin! Tak to je už opravdu síla! Takže nejenom že otevřeně podporuje migraci a islám (a nosí proto mikinu "jsem sluníčkář") ale ještě navíc podporuje tvrdé drogy. Chcete takovou stranu v Parlamentu, aby Vaše děti měli přístup k tvrdým drogám? Pro nás v SPD je to absolutně nepřijatelné."""

In [8]:
result = pipeline.run(article)
result

2025-08-20 15:02:40,395 - claimify.pipeline - INFO - Processing 5 sentences
2025-08-20 15:02:40,395 - claimify.pipeline - INFO - Processing sentence 1/5: Předseda Pirátů Ivan Bartoš chce legalizovat heroin!...
2025-08-20 15:02:56,419 - claimify.pipeline - INFO - SELECTION: Sentence verifiable
2025-08-20 15:03:42,072 - claimify.pipeline - INFO - DISAMBIGUATION: Sentence resolved


KeyError: '"text"'
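A `KeyError` whose key itself contains quotes is what Python's `str.format` raises when a template holds unescaped JSON braces such as `{"text": ...}`. Whether that is the cause here is an assumption, since the traceback is not shown. LangChain prompt templates default to f-string-style formatting, so the same brace rules would apply. The sketch below reproduces the symptom with plain `str.format` and shows the usual fix of doubling the braces; both template strings are made up for illustration.

```python
# Hypothetical templates: literal JSON braces next to a real {sentence} placeholder.
broken = 'Return JSON like {"text": "..."} for: {sentence}'
fixed = 'Return JSON like {{"text": "..."}} for: {sentence}'

try:
    broken.format(sentence="example sentence")
    error = None
except KeyError as exc:
    # The formatter treats "text" (quotes included) as a field name.
    error = exc

print(repr(error))

# Doubled braces are emitted as literal braces; {sentence} is still substituted.
rendered = fixed.format(sentence="example sentence")
print(rendered)
```

If this is the cause, the place to look would be the disambiguation or decomposition prompt files for JSON examples written with single braces.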

In [None]:
prompt = get_prompt("claimify", "disambiguation", {"question": question, "excerpt": excerpt, "sentence": sentence})
# Inspect the rendered disambiguation prompt
prompt

[SystemMessage(content='You are an assistant to a fact-checker. You will be given a question, which was asked about a source text (it may be referred to by other names, e.g., a disa\ndataset). You will also be given an excerpt from a response to the question. If it contains "[...]", this means that you are NOT seeing all sentences in the response. You will also be given a particular sentence from the response. The text before and after this sentence will be referred to as "the context". Your task is to "decontextualize" the sentence, which means:\n1. determine whether it\'s possible to resolve partial names and undefined acronyms/abbreviations in the sentence using the question and the context; if it is possible, you will make the necessary changes to the sentence\n2. determine whether the sentence in isolation contains linguistic ambiguity that has a clear resolution using the question and the context; if it does, you will make the necessary changes to the sentence\n\nNote the followi

In [49]:
get_prompt("claimify", "selection", {"question": question, "excerpt": excerpt, "sentence": sentence})

[SystemMessage(content='You are an assistant to a fact-checker. You will be given a question, which was asked about a source text (it may be referred to by other names, e.g., a \ndataset). You will also be given an excerpt from a response to the question. If it contains "[...]", this means that you are NOT seeing all sentences in the response. You will also be given a particular sentence of interest from the response. Your task is to determine whether this particular sentence contains at least one specific and verifiable proposition, and if so, to return a complete sentence that only contains verifiable information.   \n\nNote the following rules:\n- If the sentence is about a lack of information, e.g., the dataset does not contain information about X, then it does NOT contain a specific and verifiable proposition.\n- It does NOT matter whether the proposition is true or false.\n- It does NOT matter whether the proposition is relevant to the question.\n- It does NOT matter whether the 