# BELIEFS IN FREE-FORM TEXT

In [1]:
import os
import random
import re
import sys
import json
from math import exp
from tqdm import tqdm
from random import shuffle
from functools import partial
from pathlib import Path
from pprint import pprint

from typing import Any, Literal

import pandas as pd
import numpy as np

# from anthropic import Anthropic
from openai import OpenAI

from datasets import load_dataset_builder
from datasets import load_dataset
from datasets import get_dataset_split_names

from huggingface_hub import login

import anthropic


In [4]:
assert os.getenv('OPENAI_API_KEY') is not None, "You must set your OpenAI API key"
openai_client = OpenAI()

In [5]:
client = anthropic.Anthropic(
    # defaults to os.environ.get("ANTHROPIC_API_KEY")
    # api_key="my_api_key",
)

In [6]:
gpt4 = "gpt-4-1106-preview"
gpt35 = "gpt-3.5-turbo-1106"
MODELS = [gpt35, gpt4 ]

In [7]:
claude = "claude-opus-4-20250514"

## Testing Agreement of T with P

Design & implement a simple pipeline for measuring beliefs in free-form text, and sense-check this pipeline with diverse hand-crafted test cases. 
The pipeline would, for example, take as input proposition P and free-form text T (e.g. a short essay), and outputs a real in [0,1] indicating to what extent T agrees/disagrees with P. 
The pipeline can be prompting-based, logprob-based, embedding-based, or anything else you think makes sense - whatever does the job well.

In [34]:
def load_from_json(file_name) -> dict:
    with open(file_name, "r") as f:
        return json.load(f)
    
def extract_belief_regex(text):
    pattern = r'{"belief":\s*([0-9]*\.?[0-9]+)'
    match = re.search(pattern, text)
    if match:
        return float(match.group(1))
    return None

- The data is all long text esssays over a wide range of topics, although there are only 7 in total.
- It would make more sense to scrape data or pick up a debate datset
    - 1) (The GRE issues essays - the text itself is something I wrote and is therefore something atleast a little ambiguous- . The score seems to faithfully represent that)(it also has mistakes owing to the time constraint - so seemed like a realistic test).
    - 2) The opinions json has 5 articles, ranging from greek gods and feminism to relegion and well, pride and perjudice :')
- These might not be the typical beliefs or propositions that need testing, but I thought the ambiguity, and binary stances make it a sensible choice.

Prompting with logprobs and checking for confidence would give better insights, but skipping that, cos - time

## Simple Prompting - Binary Stances/Beliefs

In [25]:
USER_PROMPT = """

You are a discerning and sensible debate expert. I need your help scoring a text's agreement to a given belief. Answer with a float in the range [0,1], with 0 being strong disagreement with the proposition/belief and 1 being strong agreement.


I have a text, and would like to know to what extent it agrees with my proposition: "{prop}as opposed to "{neg_prop}.
Represent this agreement score by a floating-point probability in [0,1]. 
Respond in this format, with nothing else: {{"belief": float}}

The text that you need to check for belief: {text}

Remember that you are estimating the probability of "{prop}" being the correct answer, not "{neg_prop}"


"""
# 
# Do not return an explanation.

SYSTEM_PROMPT = """
You are a discerning and sensible debate expert. I need your help scoring a text's agreement to a given belief. Answer with a float in the range [0,1], with 0 being strong disagreement with the proposition/belief and 1 being strong agreement.
"""

In [26]:
"""A basic prompting based pipeline"""

def get_belief(text, proposition, neg_prop, model) -> str:
    history = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": USER_PROMPT.format(prop=proposition, neg_prop=neg_prop, text=text),
        },
    ]

    response = openai_client.chat.completions.create(
        model=model,
        messages=history,
        max_tokens=100,
        temperature=0,
    )
    return response.choices[0].message.content

In [27]:
def get_belief_claude(text, proposition, neg_prop, model) -> str:
    history = [
        # {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": USER_PROMPT.format(prop=proposition, neg_prop=neg_prop, text=text),
        },
    ]

    message = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=history
    )
    return (message.content)


In [78]:
essays = load_from_json("my_gre.json")

In [35]:
l =[]
for rec in essays:
    proposition = rec['proposition']
    neg_prop = rec['neg_prop']
    text = rec['text']

    presp = get_belief_claude(text, proposition, neg_prop, claude)
    
    #switch P
    temp = neg_prop
    neg_prop = proposition
    proposition = temp
    negp_resp = get_belief_claude(text, proposition, neg_prop, claude)
    l.append((proposition, neg_prop, extract_belief_regex(presp[0].text), extract_belief_regex(negp_resp[0].text)))
ess_df = pd.DataFrame(l, columns=['P', 'NotP', 'Pscore', 'NPscore'])


In [36]:
ess_df

Unnamed: 0,P,NotP,Pscore,NPscore
0,People's behaviour is seldom ever determined b...,People's behaviour is largely determined by fo...,0.75,0.15
1,A nation should not require all of its student...,A nation should require all of its students to...,0.3,0.6


- So, the agreement score for the +ve proposition is 0.35. I also try switching the +ve and -ve P for consistency.
- Something like a full debate/essay dataset would make a lot more sense here
- Full eval sweeps can also be set up to arrive at more robust results

## Simple Prompting - Multiple Propositions

In [65]:
USER_PROMPT_2 = """

I need your help scoring a text's agreement to a given belief. Answer with a float in the range [0,1] for each proposition, with 0 being strong disagreement with the proposition/belief and 1 being strong agreement.

I have a text, and would like to know to what extent it agrees with my propositions: 
{p1}: {{"belief"}},
{p2}: {{"belief"}}: ,
{p3}: {{"belief"}}: 
Represent the agreement score of each of these propositions with the text by a floating-point probability in [0,1]. 
Respond in this format, with nothing else: fill in the above proposition-belief blanks - return the entire original proposition as is.

The text that you need to check for belief: {text}

Remember that you are estimating the probability of the propositions statements being the correct answer, not the opposite.

"""

In [69]:
def get_belief_claude_2(text, p1, p2, p3, model) -> str:
    history = [
        # {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": USER_PROMPT_2.format(p1=p1, p2=p2, p3=p3, text=text),
        },
    ]

    message = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=history
    )
    return (message.content)


In [68]:
opinons = load_from_json("opinions.json")

In [80]:
l =[]
for rec in opinons:
    p1 = rec['p1']
    p2 = rec['p2']
    p3 = rec['p3']
    text = rec['text']

    presp = get_belief_claude_2(text, p1, p2, p3, claude)
    
    l.append((p1, p2, p3, presp[0].text))
opinion_df = pd.DataFrame(l, columns=['P1', 'P2', 'P3', 'Belief'])


In [81]:
for b in opinion_df.Belief:
    print(b)

It is our capacity for sin that makes us mortal: {"belief": 0.2},
It was only women who were victims of punishment: {"belief": 0.0},
Women were unfairly punished only by other men: {"belief": 0.1}
The actual price of different packaging, colors and scents warrants pink tax: {"belief": 0.1},
Women just ought to buy products marketed towards men if they want cheaper stuff: {"belief": 0.2},
Bussinesses ought to continue with pink tax as it is profitable: {"belief": 0.0}
All Women throughout history have demonstrated empathy for their fellow women's plights. : 0.1

People's upbringing contribute to their propensity to undermine others: 0.9

Internalized misogyny harms women more then misogyny does: 0.8
Men always believe in God for selfless reasons: {"belief": 0.2},
The debate over the existence of God always unites people, even as it divides: {"belief": 0.8},
Scientists are of the opinion that the existence of God agrees with probability: {"belief": 0.3}
Darcy redeems himself despite his 

Well, that seems to make sense ^

## Embedding based

This particular embedding, at least, does not perform as weel(for obvious reasons though-  the sentence transformers were probably not meant for such comparisons and the set up is just a straightforward cosine similarity metric)

In [47]:
from sentence_transformers import SentenceTransformer, util


In [76]:
def semsim(real, pred): #list of sentences each ie. all test data
  
    ##################################################
    
    mod_name = "all-mpnet-base-v2"
    # mod_name = "all-MiniLM-L6-v2"
    # mod_name = 'sentence-transformers/distilroberta-base-paraphrase-v1'
    
    model = SentenceTransformer(mod_name)
    model = model.eval()
    # model = model.eval().to(device)
    ##################################################
    # texts1, texts2 = batch_proc(real, pred)
    # all_batch_avg = []
    for (text1, text2) in zip(real, pred):
        
        e1 = model.encode(text1, show_progress_bar = False)
        e2 = model.encode(text2, show_progress_bar = False)
    
        similarities = util.pytorch_cos_sim(e1, e2)
        # sims = torch.diagonal(similarities, 0).tolist() #all scores else, have to parse with for loop for each sent

        # avg_sim = sum(sims)/len(sims)
        # all_batch_avg.append(avg_sim)
        # all_batch_avg.append(sims)
        """#########################################################################"""
    # scores = []
    # for i in all_batch_avg:
    #     scores += i

    
    print(real, similarities)

In [79]:
for rec in essays:
    proposition = rec['proposition']
    neg_prop = rec['neg_prop']
    text = rec['text']

    semsim(proposition, text)
    semsim(neg_prop, text)

People's behaviour is largely determined by forces not of their own making tensor([[0.4734]])
People's behaviour is seldom ever determined by forces not of their own making tensor([[0.4818]])
A nation should require all of its students to study the same national curriculum until they enter colloege.  tensor([[1.]])
A nation should not require all of its students to study the same national curriculum until they enter colloege.  tensor([[0.2362]])


In [77]:
for rec in opinons:
    p1 = rec['p1']
    p2 = rec['p2']
    p3 = rec['p3']
    text = rec['text']

    semsim(p1, text)
    semsim(p2, text)
    semsim(p3, text)
    print('\n\n')

    


It is our capacity for sin that makes us mortal tensor([[0.2198]])
It was only women who were victims of punishment tensor([[0.3027]])
Women were unfairly punished only by other men  tensor([[1.]])



The actual price of different packaging, colors and scents warrants pink tax tensor([[0.2417]])
Women just ought to buy products marketed towards men if they want cheaper stuff tensor([[0.4084]])
Bussinesses ought to continue with pink tax as it is profitable tensor([[0.2377]])



All Women throughout history have demonstrated empathy for their fellow women's plights.  tensor([[0.2377]])
People's upbringing contribute to their propensity to undermine others tensor([[0.2434]])
Internalized misogyny harms women more then misogyny does tensor([[0.3306]])



Men always beleive in God for selfless reasons tensor([[0.2566]])
The debate over the existence of God always unites people, even as it divides tensor([[0.2434]])
Scientists are of the opinon that the exsistence of God agrees with probabi