In [1]:
from llm_explain.data.samples import get_goal_driven_examples
from llm_explain.models.diff import explain_diff, KSparseRegression
import numpy as np

### Comparing datasets of airline reviews.

In this dataset, the dominant difference between the two sets of reviews concerns food, but the attitude of the service staff is also a substantial difference. We will show that fitting a linear model on top of the proposed explanations can extract more than one explanation.

In [2]:
dataset = get_goal_driven_examples()

If we apply the method naively, the top explanations may all express essentially the same idea.
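To see why ranking explanations one at a time produces near-duplicates, here is a minimal sketch on hypothetical toy data (not the airline reviews). As an illustration only, it assumes that an explanation's accuracy is the fraction of samples where its binary validation result matches the group label; the library's actual scoring may differ. Paraphrases of one difference score alike, so they crowd the top of the ranking. An explanation of a second difference covers fewer samples on its own and falls below them.

```python
import numpy as np

# Hypothetical toy data, not from the airline dataset: 8 samples, y = 1 marks
# the first group (e.g. reviews that show the food-related difference).
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Hypothetical validation results: one row per explanation, one column per sample;
# 1 means the explanation was judged to hold for that sample.
validation = np.array([
    [1, 1, 1, 0, 0, 0, 0, 1],  # "enjoyed the meal"
    [1, 1, 1, 0, 0, 0, 0, 1],  # "satisfied with the dining" (a paraphrase of row 0)
    [1, 1, 1, 0, 0, 0, 1, 0],  # "memorable culinary experience"
    [0, 0, 0, 1, 0, 0, 0, 0],  # "praises the friendly staff"
])

def accuracy(row: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples where the binary explanation agrees with the group label (assumed metric)."""
    return float(np.mean(row == labels))

accs = np.array([accuracy(row, y) for row in validation])

# Naive selection: keep the 3 highest-accuracy explanations (stable sort keeps ties in order).
top3 = np.argsort(-accs, kind="stable")[:3]

# Pairwise agreement among the selected explanations: values near 1.0 signal redundancy.
agreement = np.array([[np.mean(validation[i] == validation[j]) for j in top3] for i in top3])

# Positive samples that none of the naive top 3 explanations covers.
covered = validation[top3].max(axis=0).astype(bool)
missed = np.where((y == 1) & ~covered)[0]

print("accuracies:", accs)
print("top3:", top3)
print("agreement:\n", agreement)
print("uncovered positives:", missed)
```

In this toy example, the naive top three are all food-related, two of them agree on every sample, and the one positive sample they miss (index 3) is covered only by the lower-ranked service explanation.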

In [3]:
# 5 rounds x 3 explanations per round = 15 candidate explanations
result = explain_diff(**dataset, proposer_num_rounds=5, proposer_num_explanations_per_round=3, proposer_precise=True)

print(result)

===2025-05-09 16:39:42 - diff.py - Line 92 - INFO ===
 Proposing explanations...

Proposing explanations: 100%|██████████| 5/5 [00:08<00:00,  1.61s/it]
===2025-05-09 16:39:50 - diff.py - Line 97 - INFO ===
 Validating explanations, 15 explanations x 28 x_samples


Validating explanations: 100%|██████████| 420/420 [00:42<00:00,  9.89it/s]

Printing top 3 explanations:
Explanation: expresses delight in the dining experience; specifically, the text conveys enjoyment and satisfaction with the culinary aspects of the flight. For example, 'vraiment satisfait de mon expérience culinaire.'
Accuracy: 0.7857142857142857

Explanation: expresses satisfaction with the dining experience; specifically, the text conveys enjoyment of eating on the flight. For example, 'I really enjoyed my meal.'
Accuracy: 0.7857142857142857

Explanation: mentions a memorable culinary experience; specifically, the text expresses anticipation or satisfaction related to dining. For example, 'I can't wait to try more of their dishes because the flavors were amazing.'
Accuracy: 0.7857142857142857

In [4]:
### Extracting multiple explanations by fitting a linear model on top

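# Rows of validation_results are explanations and columns are samples;
# transpose so that each sample is a row and each explanation is a feature.
feature_vals = np.array(result.validation_results).T

# Keep at most K=2 explanations with non-zero coefficients.
explanation_coefs = KSparseRegression(feature_vals, result.Y, K=2)

# Print only the explanations the sparse model selected.
for explanation_idx, explanation in enumerate(result.all_explanations):
    if explanation_coefs[explanation_idx] != 0:
        print(f"{explanation_idx}: {explanation}")
        print(f"  coef: {explanation_coefs[explanation_idx]}")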

===2025-05-09 16:40:33 - diff.py - Line 116 - INFO ===
 Performing K-sparse regression on 15 features 28 samples

5: expresses satisfaction with the dining experience; specifically, the text conveys enjoyment of eating on the flight. For example, 'I really enjoyed my meal.'
  coef: 1.33274182272121
14: mentions a memorable culinary experience; specifically, the text expresses anticipation or satisfaction related to dining. For example, 'I can't wait to try more of their dishes because the flavors were amazing.'
  coef: 1.3324250803176998
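The `KSparseRegression` call above keeps at most `K` explanations with non-zero coefficients. One way to perform such a selection is best-subset least squares: fit an ordinary least-squares model on every set of `K` feature columns and keep the set with the smallest residual. The sketch below is an assumption about the technique, not the library's implementation. The function name `k_sparse_regression`, the intercept column, and the toy data are all hypothetical.

```python
from itertools import combinations

import numpy as np

def k_sparse_regression(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical best-subset least squares.

    Tries every set of k columns of X (rows = samples, columns = explanations),
    keeps the set with the smallest residual sum of squares, and returns a
    coefficient vector that is zero outside the chosen set.
    """
    n_samples, n_features = X.shape
    best_rss, best_coef = np.inf, np.zeros(n_features)
    for subset in combinations(range(n_features), k):
        cols = list(subset)
        # Assumed intercept column, so coefficients are shifts from a baseline.
        design = np.column_stack([np.ones(n_samples), X[:, cols]])
        w, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((design @ w - y) ** 2))
        if rss < best_rss:
            best_rss = rss
            best_coef = np.zeros(n_features)
            best_coef[cols] = w[1:]  # drop the intercept, keep feature weights
    return best_coef

# Hypothetical toy data: rows = samples, columns = explanations.
y = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
X = np.array([
    # food_a, food_b (noisy paraphrase of food_a), service
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
], dtype=float)

coef = k_sparse_regression(X, y, k=2)
for idx in np.flatnonzero(coef):
    print(f"{idx}: coef {coef[idx]:.3f}")
```

Exhaustive search is practical only while the number of subsets stays small. With the 15 explanations and `K=2` in this notebook it is 105 fits, but the count grows combinatorially with `K`. In the toy data, the near-paraphrase `food_b` adds little that `food_a` does not already explain. The lowest residual therefore comes from pairing `food_a` with the `service` explanation, which covers different samples.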
