# Quantitative Analysis: Model vs LLM Output Alignment

This notebook inspects the agreement between the model's outputs and the LLM-parsed outputs. It:
- Loads the analysis results from CSV
- Flags borderline cases whose model confidence falls in [50, 60)
- Computes alignment percentages for prediction, confidence, influential percentage, and borderline flag
- Appends a compact alignment summary as footer rows to the dataframe and saves the result to CSV

# Libraries

In [26]:
from pathlib import Path

import pandas as pd
import numpy as np

# Constants

In [2]:
ROOT_DIR = Path.cwd().parent
RESULTS_DIR = ROOT_DIR / "results"

# Load Data

In [3]:
analysis = pd.read_csv(RESULTS_DIR / "quantitative_analysis.csv")
analysis

Unnamed: 0,model_prediction,model_confidence,model_influential_percentage,llm_parsed_prediction,llm_parsed_borderline,llm_parsed_confidence,llm_parsed_influential_percentage
0,Malignant,99.38,25.0,Malignant,False,99.38,25
1,Benign,93.61,98.0,Benign,False,93.61,98
2,Benign,91.19,98.0,Benign,False,91.19,98
3,Malignant,99.74,24.0,Malignant,False,99.74,24
4,Malignant,97.67,25.0,Malignant,False,97.67,75
...,...,...,...,...,...,...,...
105,Malignant,98.60,91.0,Malignant,False,98.60,91
106,Malignant,99.86,92.0,Malignant,False,99.86,92
107,Malignant,99.95,92.0,Malignant,False,99.95,92
108,Malignant,97.79,89.0,Malignant,False,97.79,89


In [48]:
print('Missing values: ', analysis.isna().sum().sum())

Missing values:  0


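- Columns present: `model_prediction`, `model_confidence`, `model_influential_percentage`, and the `llm_parsed_*` counterparts (`prediction`, `borderline`, `confidence`, `influential_percentage`)
- No missing values in any column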
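
The two checks above can also be expressed as a small reusable function. The following is a minimal sketch, not part of the original notebook: `EXPECTED_COLUMNS` and `validate_analysis` are hypothetical names, the column list is copied from the table shown above, and the range check on the confidence columns assumes they are percentages in [0, 100].

```python
import pandas as pd

# Hypothetical constant: column names copied from the dataframe shown above.
EXPECTED_COLUMNS = [
    "model_prediction",
    "model_confidence",
    "model_influential_percentage",
    "llm_parsed_prediction",
    "llm_parsed_borderline",
    "llm_parsed_confidence",
    "llm_parsed_influential_percentage",
]


def validate_analysis(df):
    """Return a list of problems found; an empty list means all checks passed."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        # Without the expected columns the remaining checks cannot run.
        return [f"missing columns: {missing}"]

    problems = []
    if df[EXPECTED_COLUMNS].isna().any().any():
        problems.append("missing values present")
    # Assumption: confidences are percentages, so they should lie in [0, 100].
    for col in ("model_confidence", "llm_parsed_confidence"):
        if not df[col].between(0, 100).all():
            problems.append(f"{col} outside [0, 100]")
    return problems


# Two rows taken from the table above, used only as a demonstration.
sample = pd.DataFrame(
    {
        "model_prediction": ["Malignant", "Benign"],
        "model_confidence": [99.38, 93.61],
        "model_influential_percentage": [25.0, 98.0],
        "llm_parsed_prediction": ["Malignant", "Benign"],
        "llm_parsed_borderline": [False, False],
        "llm_parsed_confidence": [99.38, 93.61],
        "llm_parsed_influential_percentage": [25, 98],
    }
)
print(validate_analysis(sample))
```

Returning a list of messages rather than raising on the first failure makes it possible to report every problem in one pass.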

# Flag Borderline Cases
Define a borderline window on the model confidence (in percent). Rows with confidence in the half-open interval [50, 60), that is at least 50 and strictly below 60, are flagged as borderline to capture uncertain predictions.

In [50]:
# Define borderline cases: confidence between 50 and 60 (exclusive of 60)
analysis['is_borderline'] = (analysis['model_confidence'] >= 50) & (analysis['model_confidence'] < 60)
analysis.head()

Unnamed: 0,model_prediction,model_confidence,model_influential_percentage,llm_parsed_prediction,llm_parsed_borderline,llm_parsed_confidence,llm_parsed_influential_percentage,is_borderline
0,Malignant,99.38,25.0,Malignant,False,99.38,25,False
1,Benign,93.61,98.0,Benign,False,93.61,98,False
2,Benign,91.19,98.0,Benign,False,91.19,98,False
3,Malignant,99.74,24.0,Malignant,False,99.74,24,False
4,Malignant,97.67,25.0,Malignant,False,97.67,75,False
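
If the window needs to be varied, the same rule can be wrapped in a function. This is a minimal sketch under the assumption that the bounds should be parameters; `flag_borderline`, `low`, and `high` are hypothetical names that do not appear in the notebook, and the defaults reproduce the [50, 60) window used above.

```python
import pandas as pd


def flag_borderline(confidence, low=50.0, high=60.0):
    """Flag confidences in the half-open window [low, high)."""
    # Same comparison as the cell above: inclusive lower bound, exclusive upper bound.
    return (confidence >= low) & (confidence < high)


# Illustrative values chosen to exercise both edges of the window.
demo = pd.Series([49.99, 50.0, 59.99, 60.0, 99.38])
print(flag_borderline(demo).tolist())  # [False, True, True, False, False]
```

The edge values show that 50.0 is flagged while 60.0 is not, which matches the half-open interval in the definition.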


# Alignment Summary Footer
Compute alignment percentages between the model outputs and the LLM-parsed outputs, then append a compact summary to the end of the dataframe so the statistics can be read alongside the data.
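
The tolerance-based statistics in the next cell share one form. As a compact statement of what that code evaluates, using notation introduced here only for illustration ($N$ rows, model value $m_i$, LLM-parsed value $l_i$, tolerance $\tau$):

$$
A_\tau = \frac{100}{N} \sum_{i=1}^{N} \mathbf{1}\big[\, |m_i - l_i| \le \tau \,\big]
$$

with $\tau = 2$ for confidence and $\tau = 10$ for influential percentage, both as absolute differences in percentage points. Prediction and borderline alignment use exact equality, $\mathbf{1}[m_i = l_i]$, in place of the tolerance test.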

In [51]:
# Model prediction vs LLM parsed prediction alignment
pred_alignment = (analysis['model_prediction'] == analysis['llm_parsed_prediction']).mean() * 100

# Model confidence vs LLM-parsed confidence: aligned if within 2 percentage points
confidence_diff = (analysis['model_confidence'] - analysis['llm_parsed_confidence']).abs()
confidence_alignment = (confidence_diff <= 2).mean() * 100

# Model influential percentage vs LLM-parsed influential percentage: aligned if within 10 percentage points
influential_diff = (analysis['model_influential_percentage'] - analysis['llm_parsed_influential_percentage']).abs()
influential_alignment = (influential_diff <= 10).mean() * 100

# Borderline vs LLM parsed borderline alignment
borderline_col = 'llm_parsed_borderline'
borderline_alignment = (analysis['is_borderline'] == analysis[borderline_col]).mean() * 100

# Create footer rows with alignment statistics
footer_data = {
    'model_prediction': ['ALIGNMENT STATS', 'Pred Alignment %', 'Conf Alignment %', 'Infl Alignment %', 'Borderline Align %'],
    'model_confidence': ['', f'{pred_alignment:.2f}%', f'{confidence_alignment:.2f}%', 
                        f'{influential_alignment:.2f}%' if not np.isnan(influential_alignment) else 'N/A',
                        f'{borderline_alignment:.2f}%' if not np.isnan(borderline_alignment) else 'N/A'],
    'llm_parsed_prediction': ['', 'Model vs LLM Pred', '±2% tolerance', '±10% tolerance', 'Model vs LLM'],
}

# Fill every remaining dataframe column with empty strings so the footer has the same columns
for col in analysis.columns:
    if col not in footer_data:
        footer_data[col] = [''] * 5

# Create footer dataframe
footer_df = pd.DataFrame(footer_data)

# Combine original analysis with footer
analysis_with_footer = pd.concat([analysis, footer_df], ignore_index=True)

# Store the analysis with footer
analysis_with_footer.to_csv(RESULTS_DIR / 'analysis_with_footer.csv', index=False)

# Display the dataframe with footer
analysis_with_footer

Unnamed: 0,model_prediction,model_confidence,model_influential_percentage,llm_parsed_prediction,llm_parsed_borderline,llm_parsed_confidence,llm_parsed_influential_percentage,is_borderline
0,Malignant,99.38,25.0,Malignant,False,99.38,25,False
1,Benign,93.61,98.0,Benign,False,93.61,98,False
2,Benign,91.19,98.0,Benign,False,91.19,98,False
3,Malignant,99.74,24.0,Malignant,False,99.74,24,False
4,Malignant,97.67,25.0,Malignant,False,97.67,75,False
...,...,...,...,...,...,...,...,...
110,ALIGNMENT STATS,,,,,,,
111,Pred Alignment %,100.00%,,Model vs LLM Pred,,,,
112,Conf Alignment %,100.00%,,±2% tolerance,,,,
113,Infl Alignment %,96.36%,,±10% tolerance,,,,
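
Appending string rows to the dataframe mixes text into the numeric columns, which pandas stores as `object` dtype after the concatenation. One possible alternative, sketched here rather than taken from the notebook, is to keep the statistics in a separate summary table that also counts the mismatching rows. The names `within_tolerance`, `alignment_summary`, `conf_tol`, and `infl_tol` are hypothetical; the demonstration data are the five rows displayed above.

```python
import pandas as pd


def within_tolerance(a, b, tol):
    """True where the absolute difference between a and b is at most tol."""
    return (a - b).abs() <= tol


def alignment_summary(df, conf_tol=2.0, infl_tol=10.0):
    """Build a small table with one row per alignment check."""
    checks = {
        "prediction": df["model_prediction"] == df["llm_parsed_prediction"],
        "confidence": within_tolerance(
            df["model_confidence"], df["llm_parsed_confidence"], conf_tol
        ),
        "influential_percentage": within_tolerance(
            df["model_influential_percentage"],
            df["llm_parsed_influential_percentage"],
            infl_tol,
        ),
        "borderline": df["is_borderline"] == df["llm_parsed_borderline"],
    }
    return pd.DataFrame(
        {
            # Share of rows that pass each check, as a percentage.
            "alignment_pct": [check.mean() * 100 for check in checks.values()],
            # Number of rows that fail each check.
            "n_mismatch": [int((~check).sum()) for check in checks.values()],
        },
        index=list(checks),
    )


# The five rows displayed above, used only as a demonstration.
demo = pd.DataFrame(
    {
        "model_prediction": ["Malignant", "Benign", "Benign", "Malignant", "Malignant"],
        "model_confidence": [99.38, 93.61, 91.19, 99.74, 97.67],
        "model_influential_percentage": [25.0, 98.0, 98.0, 24.0, 25.0],
        "llm_parsed_prediction": ["Malignant", "Benign", "Benign", "Malignant", "Malignant"],
        "llm_parsed_borderline": [False, False, False, False, False],
        "llm_parsed_confidence": [99.38, 93.61, 91.19, 99.74, 97.67],
        "llm_parsed_influential_percentage": [25, 98, 98, 24, 75],
    }
)
demo["is_borderline"] = (demo["model_confidence"] >= 50) & (demo["model_confidence"] < 60)

summary = alignment_summary(demo)
print(summary)
```

On these five rows the only mismatch is row 4, where the model reports an influential percentage of 25.0 and the LLM-parsed value is 75, a difference beyond the 10-point tolerance.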
