## Project 3

Team:
- Olivia Dalglish (od4)
- Arindam Saha (saha2)

Contribution: 

Both of us worked on this project, brainstormed ideas and checked each other's work.


#### Section 1

For sentiment classification, we used the 1,536-dimensional embeddings from OpenAI's `text-embedding-3-large` model as our features. We trained a logistic regression model using cross-validation, with `l2_penalty` and `roc_auc` scoring to match our evaluation metric (AUC).

This model achieved the required benchmark on all five splits, as shown below.

| Split | AUC | Execution time |
|---|---|---|
| split_1 | 0.9867 | 9.73 s |
| split_2 | 0.9863 | 9.75 s |
| split_3 | 0.9860 | 9.88 s |
| split_4 | 0.9865 | 10.62 s |
| split_5 | 0.9860 | 10.85 s |

Computer system specifications: MacBook Pro, Apple M3 Max (4.05 GHz), 48 GB memory
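
A minimal sketch of this kind of training setup is shown below. It is an illustration under stated assumptions, not necessarily the submitted code: it assumes scikit-learn's `LogisticRegressionCV`, uses small synthetic features in place of the 1,536-dimensional embeddings, and picks the number of folds and candidate `C` values arbitrarily.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the embedding features (assumption: 20 dims, not 1,536).
X = rng.normal(size=(400, 20))
true_w = rng.normal(size=20)
y = (X @ true_w + rng.normal(scale=0.5, size=400) > 0).astype(int)

X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# L2 is scikit-learn's default penalty. The regularization strength C is
# chosen by cross-validation scored with ROC AUC (5 folds and 5 candidate
# values are assumptions made for this sketch).
clf = LogisticRegressionCV(Cs=5, cv=5, scoring="roc_auc", max_iter=1000)
clf.fit(X_train, y_train)

# Evaluate with the same metric on held-out data.
test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"test AUC: {test_auc:.4f}")
```

Scoring the cross-validation with `roc_auc` means the regularization strength is selected on the same metric the splits are evaluated on.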


#### Section 2

To explain our model from above, we randomly selected 5 positive reviews and 5 negative reviews from the split 1 test data, as instructed. We adopted the leave-one-out method outlined in Campuswire's "What We Have Tried (I)" for our approach.
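
As a rough illustration of what leave-one-out means at the sentence level, the sketch below builds one text variant per omitted sentence. The function name `leave_one_out_variants` and the example review are hypothetical, and the sketch assumes sentences are delimited by the `.` character.

```python
def leave_one_out_variants(review: str) -> list[tuple[str, str]]:
    """Return (omitted_sentence, review_without_that_sentence) pairs."""
    # Naive sentence split on '.', dropping empty fragments.
    sentences = [s for s in review.split(".") if s.strip()]
    variants = []
    for i, omitted in enumerate(sentences):
        remaining = ".".join(sentences[:i] + sentences[i + 1:])
        variants.append((omitted, remaining))
    return variants


# Made-up review used only for illustration.
review = "Great cast. The plot drags. I still enjoyed it."
variants = leave_one_out_variants(review)
for omitted, remaining in variants:
    print(f"omitted={omitted!r} -> remaining={remaining!r}")
```

Each remaining text is then embedded and scored, and its probability is compared with that of the full review.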

We used the BERT model [bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) (with the tokenizer settings `padding=True` and `truncation=True`) to generate embeddings for the full review and for each version of the review with one sentence omitted. We split each review on the `.` character to determine sentences, which is not fully accurate but is simple and works well enough for our use case. Since our model is trained on OpenAI's embeddings, we trained a mapping from BERT embeddings to OpenAI embeddings: we randomly selected 1600 reviews from the split 1 test data, generated their BERT embeddings, and ran an ordinary least squares regression to obtain the mapping matrix.
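
The mapping step can be sketched as an ordinary least squares fit with an intercept column. This is a minimal sketch on synthetic data: the toy dimensions, the variable names, and the use of `np.linalg.lstsq` are assumptions for illustration (the real inputs would be 768-dimensional `bert-base-uncased` vectors and 1,536-dimensional targets).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumption) so the example runs instantly.
n_reviews, d_bert, d_openai = 200, 8, 5
bert = rng.normal(size=(n_reviews, d_bert))

# Synthetic targets built from a known linear map so the fit can be checked.
true_map = rng.normal(size=(d_bert + 1, d_openai))
design = np.c_[np.ones(n_reviews), bert]  # prepend an intercept column
openai = design @ true_map

# Ordinary least squares: minimize ||design @ mapping - openai||^2.
mapping, *_ = np.linalg.lstsq(design, openai, rcond=None)

# New BERT vectors are mapped the same way: intercept column, then matrix product.
new_bert = rng.normal(size=(3, d_bert))
mapped = np.c_[np.ones(len(new_bert)), new_bert] @ mapping
print(mapping.shape, mapped.shape)
```

With 1600 reviews and 769 design columns (768 BERT dimensions plus the intercept), the least squares system has more rows than columns, so it is overdetermined.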

For the interpretability analysis, we load the BERT embeddings for the full review and for each leave-one-sentence-out version generated above, transform them to OpenAI's embedding space using our mapping matrix, and run our trained model to obtain the full-review probability and a leave-one-out probability for each sentence. For a positive review, if `leave-one-out-probability <= full-probability`, we mark the sentence as having influenced our prediction. Similarly, for a negative review, we use the criterion `leave-one-out-probability >= full-probability`. The code cells at the end of this section show our method on the randomly selected reviews, with the influencing sentences highlighted in blue.
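
Written out, the scoring step is roughly the following. The symbols are notation introduced here for this summary and do not appear in the code: $b$ is a BERT embedding, $M$ is the mapping matrix (including its intercept row), and $w$, $w_0$ are the logistic regression coefficients and intercept.

$$
\hat{x} = M^\top \begin{bmatrix} 1 \\ b \end{bmatrix}, \qquad
p = \sigma\left(w^\top \hat{x} + w_0\right), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

Let $p_{\text{full}}$ be the probability for the full review and $p_{-i}$ the probability with sentence $i$ omitted. Sentence $i$ is highlighted when $p_{-i} \le p_{\text{full}}$ for a positive review, and when $p_{-i} \ge p_{\text{full}}$ for a negative review.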

From the output, we can see that our analysis is quite good at picking out the relevant parts of a review. By using the leave-one-out method, it identifies which sentences actually affect the prediction. However, one drawback is that if the entire review is uniformly positive or negative, it highlights the whole review instead of prioritizing the most influential sentences.
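
The drawback can be illustrated with a small toy example. The probabilities below are made up for illustration, and `flag_sentences` is a hypothetical helper that restates the highlight rule. It is not part of the analysis code.

```python
def flag_sentences(full_prob, loo_probs, positive):
    """Flag a sentence if omitting it moves the probability away from the
    review's label, or leaves it unchanged."""
    if positive:
        return [p <= full_prob for p in loo_probs]
    return [p >= full_prob for p in loo_probs]


# Made-up probabilities for a uniformly positive three-sentence review.
full_prob = 0.98
loo_probs = [0.979, 0.90, 0.975]

flags = flag_sentences(full_prob, loo_probs, positive=True)
drops = [round(full_prob - p, 3) for p in loo_probs]
print(flags)  # every sentence is flagged
print(drops)  # the size of the change differs between sentences
```

The rule compares only the direction of the change, so a sentence with a tiny effect is flagged just like one with a large effect.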

In [1]:
from posixpath import join  # URL paths always use '/', regardless of OS
import requests
import io

import numpy as np
import pandas as pd

from IPython.display import display, HTML

seed = 1735
np.random.seed(seed)


In [2]:
url_dir = 'https://raw.githubusercontent.com/oliviagd/psl/refs/heads/main/project_3/interpretability_inputs'

response = requests.get(join(url_dir, 'trained_lr_model.npz'))
response.raise_for_status()
model = np.load(io.BytesIO(response.content), allow_pickle=True)

In [3]:
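def run_interpretability_analysis(tag, review_id, url):
    """Highlight the sentences of one review that influenced the prediction.

    The CSV at `url` has one row per text variant: row 0 is the full review and
    each later row is the review with one sentence left out. `loo_sentence`
    holds the text to display; the other columns are the BERT embedding.
    """
    df = pd.read_csv(url)
    bert = df.drop(columns=['loo_sentence']).to_numpy()
    # Map BERT embeddings to OpenAI's embedding space (first column is the intercept term).
    openai = np.c_[np.ones(len(bert)), bert] @ model['bert_to_openai_mapping']

    # Logistic regression: linear score followed by the sigmoid.
    e = (openai @ model['coef'].flatten()) + model['intercept']
    probs = 1 / (1 + np.exp(-e))
    sentences = df['loo_sentence'].tolist()

    full_prob = probs[0]
    positive = {'pos': True, 'neg': False}[tag]
    highlighted = []
    for sentence, prob in zip(sentences[1:], probs[1:]):
        # A sentence is marked as influential if omitting it does not move the
        # probability further toward the review's label.
        highlight = (positive and prob <= full_prob) or (not positive and prob >= full_prob)
        if highlight:
            sentence = f"<span style='color: blue'>{sentence}</span>"
        highlighted.append(str(sentence))
    # Sentences were split on '.', so rejoin them with the same character.
    highlighted = '.'.join(highlighted)

    display(HTML(f"<u>Review id={review_id} interpretability analysis - {'Positive' if positive else 'Negative'}</u>"))
    display(HTML(highlighted))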



In [4]:
pos_ids = [2603, 8073, 9595, 17545, 19034]
neg_ids = [2726, 10557, 35754, 41564, 46871]


for tag, review_ids in [('pos', pos_ids), ('neg', neg_ids)]:
    for review_id in review_ids:
        run_interpretability_analysis(tag, review_id, join(url_dir, f'{tag}_{review_id}.csv'))


pass