# BEAM - Direct Probing Task Evaluation
This notebook evaluates a large language model (LLM) on a direct-probing task: given a passage from a book, the model is asked to name the **book title** and **author name**. The goal of the analysis is to determine whether each prediction is an **exact match** for the correct title and author.

The dataset used in this analysis contains the following key fields:
- **Passage**: A snippet from a book
- **Predicted Title**: The title predicted by the LLM
- **Predicted Author**: The author predicted by the LLM

## Workflow Outline
This notebook will perform the following steps:
1. **Data Loading**: Import the CSV containing LLM predictions and true values.
2. **Exact Match Checking**: Compare predicted values with true values to determine correctness.
3. **Result Analysis**: Calculate metrics such as accuracy and error rate.
4. **Visualization**: Create visualizations to display the results, including correct vs incorrect predictions.

## Tools and Libraries
We will be using the following libraries for this analysis:
- **Pandas** for data manipulation
- **Matplotlib** and **Seaborn** for visualization
- **NumPy** for numerical operations
- **Unidecode** for text normalization and **evaluate** for metric computation

The analysis will help us better understand the model's performance in recognizing book titles and authors based on textual passages.

In [39]:
import pandas as pd
import numpy as np
import unidecode
import evaluate
import sklearn


In [28]:
df = pd.read_csv('/Users/alishasrivastava/BEAM/scripts/direct_probing/results/Adventures_of_Huckleberry_Finn_direct-probe_gpt4o.csv')
df.drop(columns=['en_title', 'en_author', 'es_title', 'es_author', 'tr_title', 'tr_author', 'vi_title', 'vi_author'], inplace=True)
print(df.head())


                                                  en  \
0  Pretty soon I wanted to smoke, and asked the w...   
1  Now she had got a start, and she went on and t...   
2  Miss Watson she kept pecking at me, and it got...   
3  I set down again, a-shaking all over, and got ...   
4  We went to a clump of bushes, and Tom made eve...   

                                          en_results  \
0  "title": "The Adventures of Huckleberry Finn",...   
1  "title": "The Adventures of Huckleberry Finn",...   
2  "title": "The Adventures of Huckleberry Finn",...   
3  "title": "The Adventures of Huckleberry Finn",...   
4  "title": "The Adventures of Tom Sawyer", "auth...   

                                                  es  \
0  En seguida me daban ganas de fumar y le pedía ...   
1  Entonces ella se lanzaba a contarme todo lo de...   
2  Un día la señorita Watson no paraba de meterse...   
3  Volví a sentarme, todo tiritando, y saqué la p...   
4  Fuimos a una mata de arbustos y Tom hizo qu

## Creating columns for predicted author and title

In [29]:
def extract_title_author(results_column):
    results_column = results_column.fillna('').astype(str).str.strip()
    return results_column.str.extract(r'"title":\s*"(.*?)",\s*"author":\s*"(.*?)"')

for language in df.columns:
    if '_results' in language:
        print(f'Running extraction for {language}')
        
        extracted_titles_authors = extract_title_author(df[language])
        language_suffix = language.replace('_results', '')
        
        df[f'{language_suffix}_predicted_title'] = extracted_titles_authors[0]
        df[f'{language_suffix}_predicted_author'] = extracted_titles_authors[1]

print(df.head())


Running extraction for en_results
Running extraction for es_results
Running extraction for tr_results
Running extraction for vi_results
                                                  en  \
0  Pretty soon I wanted to smoke, and asked the w...   
1  Now she had got a start, and she went on and t...   
2  Miss Watson she kept pecking at me, and it got...   
3  I set down again, a-shaking all over, and got ...   
4  We went to a clump of bushes, and Tom made eve...   

                                          en_results  \
0  "title": "The Adventures of Huckleberry Finn",...   
1  "title": "The Adventures of Huckleberry Finn",...   
2  "title": "The Adventures of Huckleberry Finn",...   
3  "title": "The Adventures of Huckleberry Finn",...   
4  "title": "The Adventures of Tom Sawyer", "auth...   

                                                  es  \
0  En seguida me daban ganas de fumar y le pedía ...   
1  Entonces ella se lanzaba a contarme todo lo de...   
2  Un día la señorita 
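
To illustrate what the pattern in `extract_title_author` captures, here is a minimal sketch that applies the same regular expression to single strings with Python's `re` module. The helper name `parse_result` and the two sample strings are made up for illustration; they are not rows from the CSV.

```python
import re

# Same pattern as in extract_title_author above.
PATTERN = re.compile(r'"title":\s*"(.*?)",\s*"author":\s*"(.*?)"')

def parse_result(raw):
    """Return (title, author), or (None, None) if the pattern does not match.

    Hypothetical helper, for illustration only.
    """
    match = PATTERN.search(raw.strip())
    if match is None:
        return None, None
    return match.group(1), match.group(2)

# Illustrative inputs, not taken from the dataset.
well_formed = '"title": "The Adventures of Tom Sawyer", "author": "Mark Twain"'
reordered = '"author": "Mark Twain", "title": "The Adventures of Tom Sawyer"'

print(parse_result(well_formed))  # both fields captured
print(parse_result(reordered))    # no match: the pattern requires title before author
```

A response whose keys appear in a different order produces no match, which `str.extract` reports as NaN. The integrity check in the next section is there to catch such rows.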

## Checking integrity of data - no NaNs

In [53]:
def check_data_integrity(language):
    print(f"\nChecking integrity for language: {language}")
    predicted_titles = df[f'{language}_predicted_title']
    predicted_authors = df[f'{language}_predicted_author']
    
    if predicted_titles.isnull().any():
        print(f"NaN values found in {language}_predicted_title:")
        print(predicted_titles[predicted_titles.isnull()])
    else:
        print(f"No NaN in {language}_predicted_title")

    if predicted_authors.isnull().any():
        print(f"NaN values found in {language}_predicted_author:")
        print(predicted_authors[predicted_authors.isnull()])
    else:
        print(f"No NaN values in {language}_predicted_author")

    unexpected_titles = predicted_titles[predicted_titles.str.strip() == '']
    if not unexpected_titles.empty:
        print(f"Unexpected empty values in {language}_predicted_title:")
        print(unexpected_titles)
    else:
        print(f"No unexpected empty values in {language}_predicted_title")

    unexpected_authors = predicted_authors[predicted_authors.str.strip() == '']
    if not unexpected_authors.empty:
        print(f"Unexpected empty values in {language}_predicted_author:")
        print(unexpected_authors)
    else:
        print(f"No unexpected empty values in {language}_predicted_author")

for language in ['en', 'es', 'tr', 'vi']: 
    check_data_integrity(language)


Checking integrity for language: en
No NaN in en_predicted_title
No NaN values in en_predicted_author
No unexpected empty values in en_predicted_title
No unexpected empty values in en_predicted_author

Checking integrity for language: es
No NaN in es_predicted_title
No NaN values in es_predicted_author
No unexpected empty values in es_predicted_title
No unexpected empty values in es_predicted_author

Checking integrity for language: tr
No NaN in tr_predicted_title
No NaN values in tr_predicted_author
No unexpected empty values in tr_predicted_title
No unexpected empty values in tr_predicted_author

Checking integrity for language: vi
No NaN in vi_predicted_title
No NaN values in vi_predicted_author
No unexpected empty values in vi_predicted_title
No unexpected empty values in vi_predicted_author


## Calculating F1 Exact Match
- **Using Unidecode for normalization**

In [56]:
book_data = {
    'en': ["The Adventures of Huckleberry Finn"],
    'es': ["Las aventuras de Huckleberry Finn", "The Adventures of Huckleberry Finn"],
    'tr': ["The Adventures of Huckleberry Finn"],
    'vi': ["The Adventures of Huckleberry Finn"]
}
book_authors = {
    'en': ["Mark Twain"],
    'es': ["Mark Twain"],
    'tr': ["Mark Twain"],
    'vi': ["Mark Twain"]
}

f1_metric = evaluate.load("f1")

# Adding exact match columns for each language
for language in ['en', 'es', 'tr', 'vi']:
    actual_titles = book_data[language]
    actual_authors = book_authors[language]
    
    # Exact match F1 scores for titles
    df[f'{language}_exact_match_title'] = df[f'{language}_predicted_title'].apply(
        lambda x: (
            print(f"Predicted Title: {x}, Actual Title: {actual_titles[0]}"),  # Debugging print
            f1_metric.compute(predictions=[unidecode.unidecode(str(x))], references=[unidecode.unidecode(actual_titles[0])])['f1'] if isinstance(x, str) else 0
        )[1] if pd.notna(x) else 0  # Handle NaN case
    )
    
    # Exact match F1 scores for authors
    #df[f'{language}_exact_match_author'] = df[f'{language}_predicted_author'].apply(
    #    lambda x: (
    #        print(f"Predicted Author: {x}, Actual Author: {actual_authors[0]}"),  # Debugging print
    #        f1_metric.compute(predictions=[unidecode.unidecode(x)], references=[unidecode.unidecode(actual_authors[0])])['f1']
    #    )[1]  # Return the F1 score
    #)


Predicted Title: The Adventures of Huckleberry Finn, Actual Title: The Adventures of Huckleberry Finn


ValueError: invalid literal for int() with base 10: 'The Adventures of Huckleberry Finn'
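
The `ValueError` above is consistent with how the `f1` metric in `evaluate` is defined: it expects integer class labels for `predictions` and `references`, so raw title strings fail when they are cast to `int`. One possible way around this, sketched below under stated assumptions, is to compute the exact-match flag directly by comparing normalized strings, and to accept every title listed for a language in `book_data` rather than only the first. The helper names `normalize` and `exact_match` are hypothetical, and the predictions list is invented for illustration. The standard-library `unicodedata` module stands in for `unidecode` here: it strips combining accents but, unlike `unidecode`, does not transliterate letters such as `đ` or `ı`. Lowercasing is an added assumption that the original cell does not make.

```python
import unicodedata

def normalize(text):
    """Strip accents, lowercase, and collapse whitespace (stand-in for unidecode)."""
    decomposed = unicodedata.normalize("NFKD", str(text))
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())

def exact_match(prediction, references):
    """Return 1 if the prediction equals any accepted reference after normalization."""
    if prediction is None:
        return 0
    accepted = {normalize(ref) for ref in references}
    return int(normalize(prediction) in accepted)

# Accepted Spanish titles, as in book_data['es'] above.
es_titles = ["Las aventuras de Huckleberry Finn", "The Adventures of Huckleberry Finn"]

# Invented predictions, for illustration only.
predictions = [
    "Las aventuras de Huckleberry Finn",
    "The Adventures of Huckleberry Finn",
    "Las aventuras de Tom Sawyer",
    None,
]

flags = [exact_match(p, es_titles) for p in predictions]
accuracy = sum(flags) / len(flags)  # share of exact matches
print(flags, accuracy)
```

With a DataFrame, the same function could be passed to `.apply` on a `_predicted_title` column, for example `df['es_predicted_title'].apply(lambda x: exact_match(x, book_data['es']))`. The mean of the resulting column is the exact-match accuracy, and one minus that value is the error rate.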