<a href="https://colab.research.google.com/github/microprediction/winningnotebooks/blob/main/LLM_Quinellas_Comparison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install transformers
!pip install winning
!pip install pandas
!pip install scipy

Collecting winning
  Downloading winning-1.0.3-py3-none-any.whl.metadata (6.7 kB)
Downloading winning-1.0.3-py3-none-any.whl (23 kB)
Installing collected packages: winning
Successfully installed winning-1.0.3


# Luce's Choice Axiom versus the Standard Normal Race model
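The methodology is as follows. A masked language model (`roberta-base`) fills in a single blank, and its top 10 candidates, renormalized, give a winning probability `p` for each name. Each candidate is then substituted into a two-blank prompt. The model's renormalized probabilities for the second blank give exacta probabilities and hence LLM-implied quinellas (probabilities for unordered pairs). We then ask which model, given only `p`, better reproduces the LLM quinellas: Luce's choice axiom (the Harville model) or a normal race model fitted with the `winning` package.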





## A contest model for choice
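Under Luce's choice axiom, the probability of choosing option i from a set S is p_i / Σ_{k∈S} p_k. Applying it twice (pick the winner from the full field, then the runner-up from the rest) gives the Harville model. A minimal stdlib sketch; `luce_choice` is an illustrative helper, not part of `winning`:

```python
def luce_choice(p, i, available):
    """P(i is chosen from the index set `available`) under Luce's choice axiom."""
    available = list(available)
    if i not in available:
        return 0.0
    return p[i] / sum(p[k] for k in available)


p = [0.5, 0.3, 0.2]
# The odds of option 0 over option 1 equal p[0] / p[1] whether or not option 2 is on offer
full = luce_choice(p, 0, range(3)) / luce_choice(p, 1, range(3))
reduced = luce_choice(p, 0, [0, 1]) / luce_choice(p, 1, [0, 1])
print(full, reduced)
```

This independence of irrelevant alternatives is what makes the Luce model so simple to compute.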

Luce is trivial, so let's just implement the second model, the normal race, here using the `winning` package:

# Quinella pricing (normal race model)

In [21]:
import numpy as np
from winning.lattice_conventions import STD_L, STD_A
from winning.lattice import skew_normal_density, densities_from_offsets, get_the_rest, _loser_of_two_pdf,\
    beats, winner_of_many, cdf_to_pdf
from winning.lattice_calibration import state_price_implied_ability


def compute_skew_normal_quinellas(p: list, L=551, a=0):
    """ Produce a quinella table under the (skew) normal race model

    :param p:  Vector of state prices (winning probabilities)
    :param L:  Lattice size parameter, 551 by default; roughly half that is probably fine
    :param a:  Skewness of the performance density; a=0 gives the normal race model
    :return:   n x n array of quinella probabilities, with 1.0 on the diagonal
    """

    # Calibration: find offsets (abilities) whose implied winning probabilities match p
    unit = 1.0
    density = skew_normal_density(L=L, unit=unit, loc=0, scale=50.0, a=a)
    offsets = state_price_implied_ability(prices=p, density=density)
    densities = densities_from_offsets(density, offsets)
    densityAll, multiplicityAll = winner_of_many(densities)

    n = len(p)
    # Start from ones so the diagonal is well defined (1.0, as in compute_harville_quinellas)
    quinellas = np.ones(shape=(n, n))
    for h0 in range(n):
        density0 = densities[h0]
        # Best-of-the-rest distribution with h0 removed
        cdfRest0, multiplicityRest0 = get_the_rest(density=density0, densityAll=densityAll,
                                                   multiplicityAll=multiplicityAll, cdf=None, cdfAll=None)
        for h1 in range(h0 + 1, n):
            density1 = densities[h1]
            # Remove h1 as well, leaving the best of everyone else
            cdfRest01, multiplicityRest01 = get_the_rest(density=density1, densityAll=None,
                                                         multiplicityAll=multiplicityRest0, cdf=None,
                                                         cdfAll=cdfRest0)
            pdfRest01 = cdf_to_pdf(cdfRest01)
            # {h0, h1} take the top two places iff the worse of the pair beats the rest
            loser01, loser_multiplicity01 = _loser_of_two_pdf(density0, density1)
            quinellas[h0, h1] = beats(loser01, loser_multiplicity01, pdfRest01, multiplicityRest01)
            quinellas[h1, h0] = quinellas[h0, h1]

    return quinellas

qins = compute_skew_normal_quinellas(p=[0.5,0.3,0.1,0.1,0.1,0.001,0.001,0.001,0.001,0.001,0.001])
qins[:4,:4]

array([[1.        , 0.33139024, 0.12176597, 0.12176597],
       [0.33139024, 1.        , 0.06868425, 0.06868425],
       [0.12176597, 0.06868425, 1.        , 0.02362145],
       [0.12176597, 0.06868425, 0.02362145, 1.        ]])
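As a sanity check on the lattice calculation, the same quantity (the probability that a given pair takes the first two places) can be estimated by simulation. This sketch assumes each performance is an ability plus standard normal noise, with lower being better, and uses hypothetical abilities rather than ones calibrated from `p` via `state_price_implied_ability`. The helper name is illustrative and the diagonal is left at zero:

```python
import random


def race_quinellas_mc(abilities, n_sims=100_000, seed=0):
    """Monte Carlo quinella probabilities for a normal race model.

    Assumed convention: performance = ability + N(0, 1) noise, lowest wins.
    Returns an n-by-n list of lists; entry [i][j] estimates P({i, j} finish 1-2).
    """
    rng = random.Random(seed)
    n = len(abilities)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_sims):
        perf = [a + rng.gauss(0.0, 1.0) for a in abilities]
        # Indices of the two best (lowest) performances
        first, second = sorted(range(n), key=perf.__getitem__)[:2]
        counts[first][second] += 1
        counts[second][first] += 1
    return [[c / n_sims for c in row] for row in counts]


# Hypothetical abilities: more negative means stronger
q = race_quinellas_mc([-1.0, -0.5, 0.0, 0.0], n_sims=20_000)
print(round(q[0][1], 3), round(q[2][3], 3))
```

Simulation is slower and noisier than the lattice, but it makes the "both beat the rest" logic explicit.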

# Quinella pricing (Luce / Harville)

In [22]:
def compute_harville_quinellas(p):
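    """
    Compute Luce/Harville-style quinellas (probabilities for unordered pairs) from individual probabilities.

    Each pair {i, j} gets weight p[i] * p[j], and the weights are normalized so that the
    unordered pairs sum to one. This roughly approximates the exact Harville quinella
    p[i] * p[j] / (1 - p[i]) + p[j] * p[i] / (1 - p[j]) when the individual probabilities are small.

    Args:
        p (list of float): Individual (winning) probabilities for each state/event. Should sum to <= 1.

    Returns:
        list of lists: Matrix-like structure where the element at [i][j] is P({i, j}),
                       with diagonal elements set to 1.0 to match compute_skew_normal_quinellas.
    """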
    from itertools import combinations

    n = len(p)
    quinella_matrix = [[0.0] * n for _ in range(n)]

    # Compute unnormalized joint probabilities
    for i, j in combinations(range(n), 2):
        joint_prob = p[i] * p[j]
        quinella_matrix[i][j] = quinella_matrix[j][i] = joint_prob

    # Normalize so that the unordered-pair probabilities sum to one
    total_joint_prob = sum(sum(row) for row in quinella_matrix) / 2  # Divide by 2 to avoid double-counting pairs
    if total_joint_prob > 0:
        quinella_matrix = [
            [cell / total_joint_prob if i != j else 1.0 for j, cell in enumerate(row)]
            for i, row in enumerate(quinella_matrix)
        ]

    return quinella_matrix

# Example usage
result = compute_harville_quinellas(p=[0.5, 0.3, 0.2])
for row in result:
    print(row)




[1.0, 0.48387096774193544, 0.32258064516129037]
[0.48387096774193544, 1.0, 0.1935483870967742]
[0.32258064516129037, 0.1935483870967742, 1.0]
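The function above weights pairs by the normalized product p[i] * p[j]. For comparison, here is a sketch of the exact Harville quinella, obtained by applying Luce's axiom twice (i wins, then j is chosen from the rest, plus the reverse order). The function name is illustrative, not part of `winning`:

```python
from itertools import combinations


def exact_harville_quinellas(p):
    """Exact Harville (sequential Luce) quinella table.

    P({i, j}) = p_i * p_j / (1 - p_i) + p_j * p_i / (1 - p_j).
    Assumes sum(p) == 1 and every p_i < 1. Diagonal set to 1.0 to match the tables above.
    """
    n = len(p)
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        q[i][j] = q[j][i] = p[i] * p[j] / (1 - p[i]) + p[j] * p[i] / (1 - p[j])
    return q


for row in exact_harville_quinellas([0.5, 0.3, 0.2]):
    print([round(x, 4) for x in row])
```

For p = [0.5, 0.3, 0.2] this gives about 0.514, 0.325 and 0.161 for the pairs {0, 1}, {0, 2} and {1, 2}, versus 0.484, 0.323 and 0.194 from the normalized product above, so here the product slightly understates pairs involving the favorite.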


In [7]:
!pip install transformers numpy pandas scipy



# Missing word utility
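The exacta and quinella construction used below can be sketched without an LLM. Here `p` and the conditional table `cond` are made-up numbers standing in for the outputs of the two fill-mask queries, and `quinellas_from_conditionals` is a hypothetical helper:

```python
def quinellas_from_conditionals(p, cond):
    """Build exacta and quinella tables from first-choice and conditional probabilities.

    p[i]       : probability that item i is named first
    cond[i][j] : probability that j is named second given i was named first
                 (each row excludes i and sums to one)
    Returns (ex, qu) with ex[i][j] = p[i] * cond[i][j] and qu = ex + ex^T.
    """
    n = len(p)
    ex = [[p[i] * cond[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    qu = [[ex[i][j] + ex[j][i] for j in range(n)] for i in range(n)]
    return ex, qu


# Hypothetical numbers standing in for the two fill-mask queries
p = [0.5, 0.3, 0.2]
cond = [
    [0.0, 0.7, 0.3],
    [0.6, 0.0, 0.4],
    [0.5, 0.5, 0.0],
]
ex, qu = quinellas_from_conditionals(p, cond)
print(qu[0][1])  # ≈ 0.5 * 0.7 + 0.3 * 0.6 = 0.53
```

Because each row of `ex` sums to `p[i]`, the unordered pairs in `qu` (its upper triangle) sum to one.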

In [33]:
def llm_quinellas(prompt_pair):
    """
    Receives a prompt pair like the following:
      - "I visited the state called [MASK] last year and it is one of my favorite states in the U.S.A."
      - "I visited two states called [ANSWER] and [MASK] last year and they are my two favorite states in the U.S.A."

    First, it will ask an LLM to fill in the missing token and extract the token probabilities.
    We will take the top 10 and create a list called 'names' (lowercase) and one called 'p' where the latter holds
    renormalized probabilities adding to unity.

    Second, for each i, name in enumerate(names), we will substitute into the second prompt.
    So if the name is 'arizona', we get something like:
      - "I visited two states called arizona and [MASK] last year and they are my two favorite states in the U.S.A."

    Eliminate any responses that are not in set(names) - {name}.
    Renormalize the remaining token probabilities and scale them by p[i].

    This gives 'exacta' probabilities ex[i, j] (i named first, j second), with zeros on the diagonal
    (rows with no allowed names are left at zero).
    When we have done this for all names, we add ex to its own transpose to get the quinella table qu.

    Return p, qu and names.
    """
    from transformers import pipeline
    import numpy as np

    # Initialize the fill-mask pipeline with a model that supports mask filling
    fill_mask = pipeline('fill-mask', model='roberta-base')

    # Unpack the prompt pair
    prompt_single, prompt_double = prompt_pair

    # Adjust the mask tokens for the model
    mask_token = fill_mask.tokenizer.mask_token  # This will be '<mask>' for roberta-base

    # Step 1: Get top 10 names and their probabilities from the first prompt
    prompt_single = prompt_single.replace('[MASK]', mask_token)

    # Get predictions
    results = fill_mask(prompt_single, top_k=10)

    # Extract tokens and their probabilities
    names = []
    probs = []
    for res in results:
        token_str = res['token_str'].strip().lower()
        names.append(token_str)
        probs.append(res['score'])

    # Normalize probabilities to sum to 1
    total_prob = sum(probs)
    p = [prob / total_prob for prob in probs]

    # Initialize the exacta probability matrix
    n = len(names)
    ex = np.zeros((n, n))

    # Step 2: For each name, get probabilities for the second [MASK]
    for i, name in enumerate(names):
        # Substitute the name into the second prompt
        prompt = prompt_double.replace('[ANSWER]', name)
        prompt = prompt.replace('[MASK]', mask_token)

        # Get predictions
        results = fill_mask(prompt, top_k=50)  # Increase top_k to ensure coverage

        # Extract tokens and their probabilities
        other_names = []
        probs_others = []
        for res in results:
            token_str = res['token_str'].strip().lower()
            other_names.append(token_str)
            probs_others.append(res['score'])

        # Filter out the current name and names not in the list
        allowed_names = set(names) - {name}
        filtered_probs = {}
        for other_name, prob in zip(other_names, probs_others):
            if other_name in allowed_names:
                filtered_probs[other_name] = prob

        # Renormalize probabilities
        total_prob = sum(filtered_probs.values())
        if total_prob > 0:
            filtered_probs = {k: v / total_prob for k, v in filtered_probs.items()}
        else:
            # If no allowed names are found, skip this iteration
            continue

        # Map each name to its row/column index j
        name_to_index = {nm: idx for idx, nm in enumerate(names)}
        # Fill the exacta matrix
        for other_name, prob in filtered_probs.items():
            j = name_to_index[other_name]
            ex[i, j] = p[i]*prob

    # Zero out the diagonal
    np.fill_diagonal(ex, 0)

    # Compute the quinella probability table
    qu = ex + ex.T


    return p, qu, names


def quinella_comparison(prompt_pair):
    # Get LLM, normal model and harville implied quinellas
    # Compute several measures of discrepancy between LLM and estimated probabilities
    p, qu_llm, names = llm_quinellas(prompt_pair=prompt_pair)
    qu_normal = compute_skew_normal_quinellas(p)
    qu_harville = compute_harville_quinellas(p)
    srt = sorted(list(zip(p,names)), reverse=True)
    print({'probs':srt})

    # Compute RMSE between qu_llm and qu_normal
    rmse_normal = np.sqrt(np.mean((qu_llm - qu_normal) ** 2))

    # Compute RMSE between qu_llm and qu_harville
    rmse_harville = np.sqrt(np.mean((qu_llm - qu_harville) ** 2))

    # Compute RMSE between qu_harville and qu_normal
    rmse_diff = np.sqrt(np.mean((qu_harville - qu_normal) ** 2))


    # Print the RMSE values
    print(f"RMSE between LLM quinellas and Skew Normal quinellas: {rmse_normal:.6f}")
    print(f"RMSE between LLM quinellas and Harville quinellas: {rmse_harville:.6f}")
    print(f"RMSE between Skew and Harville quinellas: {rmse_diff:.6f}")
    if rmse_normal < rmse_harville:
        better_model = "Skew Normal"
    else:
        better_model = "Harville"

    print(f"The {better_model} model better predicts the actual quinella probabilities.")




# Example usage
prompt_pair = (
    "I visited the state called [MASK] last year and it is one of my favorite states in the U.S.A.",
    "I visited two states called [ANSWER] and [MASK] last year and they are my two favorite states in the U.S.A."
)
quinella_comparison(prompt_pair=prompt_pair)




{'probs': [(0.19252209207465457, 'arizona'), (0.11820234007751466, 'texas'), (0.10717900748068498, 'oregon'), (0.10114625244209448, 'indiana'), (0.10052167784050628, 'florida'), (0.0904307397306346, 'california'), (0.08755574201564277, 'georgia'), (0.06912437443572696, 'wisconsin'), (0.06699298122438921, 'arkansas'), (0.0663247926781515, 'utah')]}
RMSE between LLM quinellas and Skew Normal quinellas: 0.316662
RMSE between LLM quinellas and Harville quinellas: 0.316663
RMSE between Skew and Harville quinellas: 0.000664
The Skew Normal model better predicts the actual quinella probabilities.
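Note that the LLM table has zeros on its diagonal while the two model tables hold 1.0 there (as in the printed outputs above). With 10 names, those 10 diagonal cells alone contribute sqrt(10/100) ≈ 0.316 to each RMSE, which appears to account for most of the ≈0.3167 values printed; the difference between the two models only shows up in the trailing digits. A sketch of an RMSE that ignores the diagonal (`offdiag_rmse` is a hypothetical helper):

```python
import math


def offdiag_rmse(a, b):
    """RMSE between two square tables, ignoring the diagonal cells.

    Works for lists of lists and NumPy arrays alike (only len and [i][j] are used).
    """
    n = len(a)
    diffs = [
        (a[i][j] - b[i][j]) ** 2
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return math.sqrt(sum(diffs) / len(diffs))


# Toy tables: zero vs one on the diagonal, 0.1 apart elsewhere
a = [[0.0, 0.5], [0.5, 0.0]]
b = [[1.0, 0.4], [0.4, 1.0]]
full = math.sqrt(sum((a[i][j] - b[i][j]) ** 2 for i in range(2) for j in range(2)) / 4)
print(round(full, 4), round(offdiag_rmse(a, b), 4))  # ≈ 0.7106 and 0.1
```

Inside `quinella_comparison`, calls such as `offdiag_rmse(qu_llm, qu_harville)` would compare only the pair probabilities.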


In [35]:
prompt_pair_template = (
    "I visited the country called [MASK] last year and it is one of my favorite countries in REGION",
    "I visited two countries called [ANSWER] and [MASK] last year and they are my two favorite countries in REGION"
)
for region in ['Asia','Europe','the Americas','Africa','the Southern Hemisphere','the World']:
    prompt_pair = [pp.replace('REGION', region) for pp in prompt_pair_template]
    quinella_comparison(prompt_pair=prompt_pair)

{'probs': [(0.1737887911458916, 'vietnam'), (0.16538282901836396, 'myanmar'), (0.1207179720561625, 'cambodia'), (0.11862311323457679, 'burma'), (0.09436936278318456, 'bangladesh'), (0.0715250775055774, 'thailand'), (0.06873332038425296, 'pakistan'), (0.06643431411960922, 'singapore'), (0.06336770805747138, 'india'), (0.05705751169490962, 'nepal')]}
RMSE between LLM quinellas and Skew Normal quinellas: 0.316677
RMSE between LLM quinellas and Harville quinellas: 0.316693
RMSE between Skew and Harville quinellas: 0.000913
The Skew Normal model better predicts the actual quinella probabilities.
{'probs': [(0.19640808758081216, 'poland'), (0.17165197096800303, 'slovenia'), (0.14913808179303678, 'luxembourg'), (0.08243369708777519, 'romania'), (0.0789737171802471, 'slovakia'), (0.07434079512972087, 'hungary'), (0.06540018934582313, 'estonia'), (0.06462851383624654, 'latvia'), (0.05890455746495978, 'croatia'), (0.058120389613375394, 'malta')]}
RMSE between LLM quinellas and Skew Normal quinel

In [36]:
prompt_pair_template = (
    "I learned the sport [MASK] last year and it is one of my favorite forms of SOMETHING now",
    # Note: "visited" was probably meant to read "learned"; the outputs below were produced with this wording
    "I visited two sports called [ANSWER] and [MASK] last year and they are my two favorite forms of SOMETHING now"
)
for something in ['exercise','sport','relaxation','competition']:
    prompt_pair = [pp.replace('SOMETHING', something) for pp in prompt_pair_template]
    quinella_comparison(prompt_pair=prompt_pair)

{'probs': [(0.42673403758005696, 'just'), (0.1710667208872005, 'only'), (0.09593612321713113, 'this'), (0.06353671697929963, 'late'), (0.05366737837391206, 'again'), (0.051117499265016714, 'early'), (0.037857104138433156, 'swimming'), (0.03402364602688767, 'myself'), (0.03359456407599601, 'skiing'), (0.032466209456066125, 'from')]}
RMSE between LLM quinellas and Skew Normal quinellas: 0.323808
RMSE between LLM quinellas and Harville quinellas: 0.324098
RMSE between Skew and Harville quinellas: 0.002924
The Skew Normal model better predicts the actual quinella probabilities.
{'probs': [(0.2007692115588682, 'skiing'), (0.18167295026823124, 'just'), (0.13562899327687644, 'only'), (0.11337425835449519, 'swimming'), (0.07796001167365922, 'german'), (0.07342749978517633, 'spanish'), (0.07041501329167509, 'this'), (0.05890980197448529, 'again'), (0.04597259603265097, 'french'), (0.041869663783882, 'russian')]}
RMSE between LLM quinellas and Skew Normal quinellas: 0.320259
RMSE between LLM qui