## Using Notebook Environments
1. To run a cell, press `shift + enter`. The notebook will execute the code in the cell and move to the next cell. If the cell is a markdown (text-only) cell, it will render the markdown and move to the next cell.
2. Since cells can be executed in any order and variables can be overwritten, you may at some point feel that you have lost track of the state of your notebook. If this is the case, you can always restart the kernel by clicking Runtime in the menu bar (if you're using Colab) and selecting `Restart runtime`. This will clear all variables and outputs.
3. The value of the last expression in a cell is displayed below the cell. To display several values, use the `print()` function as usual.

Notebook environments support code cells and markdown (text) cells. For the purposes of this workshop, markdown cells are used to provide high-level explanations of the code. More specific details are provided in the code cells themselves in the form of comments (lines beginning with `#`).

## Environment Setup

In [1]:
# mount Google Drive
from google.colab import drive
drive.mount("/content/drive")


We begin by loading the requisite packages. For those coming from R, packages in Python are sometimes given shorter names for use in the code via the `import <name> as <nickname>` syntax (e.g. `import pandas as pd`). These are usually standardized nicknames.

We'll use the following packages:

1. `pandas`: a very popular package for reading and manipulating data frames in Python.
2. `huggingface_hub`: for making API calls and prompting the LLMs.
3. `re`: for string matching.
4. `textwrap` and `IPython`: to make the display of some of the tables more readable.

In [4]:
import sys
import os
import re
import textwrap

# the code below installs huggingface_hub if it's missing
# (this has to run before huggingface_hub is imported)
if 'google.colab' in sys.modules:  # If in Google Colab environment

    # Installing requisite packages
    !pip install huggingface_hub &> /dev/null

from IPython.display import display, HTML
import pandas as pd
from huggingface_hub import InferenceClient

# this sets the working directory to the exercises folder
base_path = '/content/drive/My Drive/llms_egproc/exercises/'
os.chdir(base_path)

# Identifying decision reasons (part 1)
In this exercise, we will explore the capabilities of LLMs to identify decision reasons in verbal reports using the Hugging Face (HF) ecosystem.

By the end of this exercise, you will have learned how to:
- Design a zero-shot prompt
- Have large models on the Hugging Face servers evaluate your prompts  
- Validate the model output

## Hello world
We start by setting up the inference client. The `InferenceClient` class from the Hugging Face ecosystem allows for....
The `API_TOKEN` is an authentication token...
The `LLAMA_version` variable specifies the version of the LLAMA model that we will use in this exercise.


In [5]:
API_TOKEN = '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'
LLAMA_version = "meta-llama/Meta-Llama-3-70B-Instruct"
LLAMA = InferenceClient(model = LLAMA_version, token = API_TOKEN)
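
Hard-coding a token in a notebook makes it easy to share the token by accident. A minimal sketch of one alternative is to read it from an environment variable. The helper name `load_hf_token` and the variable name `HF_TOKEN` are assumptions made for this illustration, not part of the workshop materials.

```python
import os

def load_hf_token(env_var="HF_TOKEN"):
    """Return the access token stored in the given environment variable.

    The default variable name is an assumption made for this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token.")
    return token
```

With the variable set, the cell above could then use `API_TOKEN = load_hf_token()` instead of a literal string.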

Try out a stupid prompt first. To prompt the model, use the `.text_generation` method.

In [None]:
stupid_prompt = 'Give an example of a stupid prompt'
llama_response = LLAMA.text_generation(stupid_prompt, max_new_tokens = 4000)
print(llama_response)

## System and User messages
LLAMA uses special tokens to distinguish between system and user parts of the prompt.

In [None]:
system_role = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an expert decision scientist
<|eot_id|>"""

user_question = """
<|start_header_id|>user<|end_header_id|>
What is the best way to make financial decisions?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

# combine both into a single prompt
prompt = system_role + user_question

# print the full prompt
print(prompt)
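
Since the same special tokens are needed every time, the assembly can be wrapped in a small helper. The sketch below is one possible way to do this; the function name `build_llama3_prompt` is hypothetical, and the whitespace differs slightly from the strings in the cell above.

```python
def build_llama3_prompt(system_message, user_message):
    """Assemble a single-turn prompt from a system and a user message,
    using the same special tokens as the cell above."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        f"{system_message}\n"
        "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n"
        f"{user_message}\n"
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    )

prompt_example = build_llama3_prompt(
    "You are an expert decision scientist",
    "What is the best way to make financial decisions?",
)
print(prompt_example)
```

Changing the role then only requires changing the first argument.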

Try the prompt with the model role set to a decision scientist.

In [None]:
llama_response = LLAMA.text_generation(prompt, max_new_tokens = 200)
print(llama_response)

Now try it with the role set to a priest.

In [None]:
# change the role in the system message and rerun the cell

system_role = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a priest
<|eot_id|>"""

user_question = """
<|start_header_id|>user<|end_header_id|>
What is the best way to make financial decisions?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

# update the prompt
prompt = system_role + user_question

llama_response = LLAMA.text_generation(prompt, max_new_tokens = 200)
print(llama_response)

## Loading the prompt template
Now have a look at our prompt. What kind of information do we provide in the system message?

In [None]:
# Read the File
prompt_path = 'prompts/prompt_v1.txt'

# Open the file and read its contents
with open(prompt_path, 'r') as file:
    prompt_base = file.read()

print(prompt_base)

## Constructing the prompt

- read in the decision problems and reports
- combine them into a list of complete prompts
- look at one complete prompt


In [None]:
# read in decision problems, decision reasons, and verbal reports
decision_problems = pd.read_csv('data/decision_problems.csv', encoding = 'utf-8')
decision_reasons = pd.read_csv('data/decision_reasons.csv', encoding = 'utf-8')
verbal_reports = pd.read_csv('data/verbal_reports.csv', encoding = 'utf-8')

In [None]:
# merge verbal reports with decision problems
problems_reports = pd.merge(decision_problems, verbal_reports, on = 'problem_id')
# print(problems_reports)

In [None]:
# print(decision_reasons)

In [None]:
# function for constructing the full prompt
def generate_prompt(prompt, decision_problem, decision_reason, verbal_report):
    """
    Replaces placeholders in the prompt with the given decision problem, decision reason, and verbal report.
    """
    # Replace placeholders with actual values
    filled_prompt = prompt.replace("DECISION_PROBLEM", decision_problem)
    filled_prompt = filled_prompt.replace("DECISION_REASON", decision_reason)
    filled_prompt = filled_prompt.replace("VERBAL_REPORT", verbal_report)

    return filled_prompt
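
To see what the function does without loading any data, here is a small sketch on a toy template. The template text and the values are invented for illustration, and `leftover_placeholders` is a hypothetical helper that checks whether any placeholder was left unfilled.

```python
def generate_prompt(prompt, decision_problem, decision_reason, verbal_report):
    """Standalone copy of the function above, so this block runs on its own."""
    filled_prompt = prompt.replace("DECISION_PROBLEM", decision_problem)
    filled_prompt = filled_prompt.replace("DECISION_REASON", decision_reason)
    filled_prompt = filled_prompt.replace("VERBAL_REPORT", verbal_report)
    return filled_prompt

def leftover_placeholders(text):
    """Return the placeholder names that still occur in the text (hypothetical helper)."""
    placeholders = ("DECISION_PROBLEM", "DECISION_REASON", "VERBAL_REPORT")
    return [name for name in placeholders if name in text]

# Toy template and values, invented for illustration only
toy_template = "Problem: DECISION_PROBLEM\nReason: DECISION_REASON\nReport: VERBAL_REPORT"

toy_prompt = generate_prompt(
    toy_template,
    "Option A or option B?",
    "Picks the option with the higher average payoff.",
    "I went with A because it pays more on average.",
)
print(toy_prompt)
print(leftover_placeholders(toy_prompt))
```

An empty list from `leftover_placeholders` means every placeholder in the template was replaced.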

In [None]:
# Select the description for expected value
selected_reason = 'expected value'
selected_description = decision_reasons.loc[decision_reasons['decision reason name'] == selected_reason, 'decision reason description'].values[0]

# Create a list for storing prompts for the expected value reason
expected_value_prompts = []

# Generate prompts for the specific decision reason
for _, row in problems_reports.iterrows():

    # here we are using the generate prompt function to create prompts for all verbal reports and the expected value reason
    prompt = generate_prompt(
        prompt_base,
        row['decision_problem'],
        selected_description,  # Use the selected description
        row['verbal_report']
    )
    expected_value_prompts.append(prompt)

In [None]:
# have a look at the first prompt
print(expected_value_prompts[0])

## Give it a test

- run first prompt
- make sense of output relative to prompt
- maybe try again or next prompt


In [None]:
# pass the first prompt to LLAMA and save the output
expected_value_eval1 = LLAMA.text_generation(expected_value_prompts[0], max_new_tokens = 4000)

In [None]:
# print the llama evaluation
print(expected_value_eval1)

# Extracting confidence ratings
As you can see in the output above, following our prompt, the model provides a full description of its deliberation process, followed by the confidence assessment at the end.

Next, we will iterate over the entire data set. While doing so, we will use the `extract_confidence` function to extract the confidence assessment and add it to the `problems_reports` data set. The full output from the LLAMA model will be saved in a separate list, `expected_value_eval`, for later inspection.

In [None]:
# Function for extracting confidence assessments
def extract_confidence(s):
    """
    Extracts an integer value from a string enclosed between @ or @@ symbols.
    """
    # Regular expression to match patterns like @number@ or @@number@@
    pattern = r'@+(\s*\d+\s*)@+'

    # Search for the pattern in the string
    match = re.search(pattern, s)

    if match:
        # Extract the number and convert it to an integer
        number_str = match.group(1).strip()
        return int(number_str)

    return None

In [None]:
# # test the extract_confidence function
# print(extract_confidence('something else @10 @ xxx'))
# print(extract_confidence('something else @@13@@ xxx'))
# print(extract_confidence(expected_value_eval1))
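
The sketch below runs the same regular expression on a few made-up strings. It also shows that `re.search` returns the first match, which matters if a response mentions more than one rating. The variant `extract_last_confidence` is a hypothetical alternative that takes the last match instead. It is not part of the workshop code.

```python
import re

PATTERN = r'@+(\s*\d+\s*)@+'

def extract_confidence(s):
    """Same logic as the function above, repeated so this block runs on its own."""
    match = re.search(PATTERN, s)
    return int(match.group(1).strip()) if match else None

def extract_last_confidence(s):
    """Hypothetical variant: return the last rating found in the string, or None."""
    matches = re.findall(PATTERN, s)
    return int(matches[-1].strip()) if matches else None

print(extract_confidence('something else @10 @ xxx'))        # 10
print(extract_confidence('something else @@13@@ xxx'))       # 13
print(extract_confidence('no rating in this text'))          # None
print(extract_confidence('first @@20@@ then @@90@@'))        # 20
print(extract_last_confidence('first @@20@@ then @@90@@'))   # 90
```

Which variant is appropriate depends on how the prompt asks the model to format its answer.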

# EXPECTED VALUE
## Example Analysis
Now you'll run the analysis on the entire data set. For each combination of verbal report and decision problem, the model will provide a confidence assessment of whether the individual used the expected value reason.

In [None]:
# list for storing the output from the LLAMA model
expected_value_eval = []

# analyzed reason
analyzed_reason = "expected value"

# new column in the problems_reports data set for storing the confidence assessments
# recall that analyzed_reason is set to 'expected value'
problems_reports[analyzed_reason] = None

In [None]:
# Iterate over the list of prompts, get responses, and extract numerical estimates and add them to the data set with problems and reports
for i, prompt in enumerate(expected_value_prompts):

    # response from LLAMA
    llama_response = LLAMA.text_generation(prompt, max_new_tokens = 4000)
    expected_value_eval.append(llama_response) # save the response to the expected_value_eval list

    # extract the confidence value from the response
    confidence_assessment = extract_confidence(llama_response)

    # store the confidence value in the data set
    problems_reports.at[i, analyzed_reason] = confidence_assessment

    # monitor progress
    print(f"{i + 1}/{len(expected_value_prompts)}")
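
A loop that makes one remote call per prompt can fail partway through, for example on a network error. The following sketch, under the assumption that a failed call should simply be recorded as missing, wraps the loop in a function and uses a stand-in for the model so it runs without an API token. The names `evaluate_prompts`, `fake_generate`, and `fake_extract` are hypothetical.

```python
def evaluate_prompts(prompts, generate, extract):
    """Call generate() on every prompt and extract a rating from each response.

    A failed call is stored as None so that one error does not discard
    the responses collected so far.
    """
    responses, ratings = [], []
    for i, prompt in enumerate(prompts):
        try:
            response = generate(prompt)
        except Exception as error:  # e.g. a network or rate-limit error
            print(f"prompt {i} failed: {error}")
            response = None
        responses.append(response)
        ratings.append(extract(response) if response is not None else None)
        print(f"{i + 1}/{len(prompts)}")  # monitor progress
    return responses, ratings

# Stand-ins for the model call and the extraction step, invented for this illustration
def fake_generate(prompt):
    if "fail" in prompt:
        raise RuntimeError("simulated API error")
    return "Some reasoning ... @@75@@"

def fake_extract(response):
    return 75 if "@@75@@" in response else None

demo_responses, demo_ratings = evaluate_prompts(
    ["prompt a", "please fail", "prompt c"], fake_generate, fake_extract
)
print(demo_ratings)
```

In the notebook, `generate` could be `lambda p: LLAMA.text_generation(p, max_new_tokens=4000)` and `extract` could be `extract_confidence`.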

In [None]:
# Function to wrap text
def wrap_text(text, width=100):
    return "<br>".join(textwrap.wrap(text, width))

# Function to show verbal reports with assigned numbers in a specified range
def show_verbal_reports_in_range(data, reason, min_confidence, max_confidence):
    """
    Shows verbal reports for which the model assigned a confidence within the specified range.
    """
    # convert the ratings to numbers so that missing ratings (None) become NaN and drop out of the filter
    confidence = pd.to_numeric(data[reason], errors='coerce')

    # filter by the specified range; .copy() avoids modifying a slice of the original data
    filtered_data = data[(confidence >= min_confidence) & (confidence <= max_confidence)].copy()

    # wrap the text for nicer display
    filtered_data['verbal_report'] = filtered_data['verbal_report'].apply(wrap_text)
    filtered_data['decision_problem'] = filtered_data['decision_problem'].apply(lambda x: wrap_text(x, width=40))

    # select only the columns with report and confidence assessment
    filtered_data = filtered_data[['decision_problem', 'verbal_report', 'choice', reason]]
    filtered_data = filtered_data.to_html(escape=False) # to html

    return display(HTML(filtered_data))
    # return filtered_data[['verbal_report', reason]]
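
Because the table is rendered with `escape=False`, characters such as `<` or `&` in a verbal report are passed to the browser as raw HTML. A minimal sketch of a wrapping function that escapes each line before joining with `<br>` is shown below; the name `wrap_text_escaped` is hypothetical.

```python
import html
import textwrap

def wrap_text_escaped(text, width=100):
    """Wrap text to the given width, escape HTML special characters
    in each line, and join the lines with <br>."""
    lines = textwrap.wrap(text, width)
    return "<br>".join(html.escape(line) for line in lines)

print(wrap_text_escaped("I chose <A> & not B because it felt safer", width=20))
```

The text is wrapped before escaping so that the width refers to the characters the reader sees.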

In [None]:
# Show verbal reports for which the expected value reason was assessed as not used, i.e., confidence in using the reason was low, between 0 and 20
show_verbal_reports_in_range(problems_reports, analyzed_reason, 0, 20)

In [None]:
# Show verbal reports for which the expected value reason was assessed as used with high confidence, i.e., between 80 and 100
show_verbal_reports_in_range(problems_reports, analyzed_reason, 80, 100)

In [None]:
# # Show all results
# show_verbal_reports_in_range(problems_reports, analyzed_reason, 0, 100)

# SURE OUTCOME
Run the analyses for the sure outcome decision reason.

## Sure Outcome Prompts

In [None]:
# Select the description for sure outcome
analyzed_reason = "sure outcome"
selected_description = decision_reasons.loc[decision_reasons['decision reason name'] == analyzed_reason, 'decision reason description'].values[0]

# Create a list for storing prompts for the sure outcome reason
sure_outcome_prompts = []

# Generate prompts for the specific decision reason
for _, row in problems_reports.iterrows():

    # here we are using the generate_prompt function to create prompts for all verbal reports and the sure outcome reason
    prompt = generate_prompt(
        prompt_base,
        row['decision_problem'],
        selected_description,  # Use the selected description
        row['verbal_report']
    )
    sure_outcome_prompts.append(prompt)

print(sure_outcome_prompts[1])

## Sure Outcome LLAMA evaluation

In [None]:
# list for storing the output from the LLAMA model
sure_outcome_eval = []

# new column in the problems_reports data set for storing the confidence assessments
# recall that analyzed_reason was set to 'sure outcome'
problems_reports[analyzed_reason] = None

# Iterate over the list of prompts, get responses, and extract numerical estimates and add them to the data set with problems and reports
for i, prompt in enumerate(sure_outcome_prompts):

    # response from LLAMA
    llama_response = LLAMA.text_generation(prompt, max_new_tokens = 4000)
    sure_outcome_eval.append(llama_response) # save the response to the sure_outcome_eval list

    # extract the confidence value from the response
    confidence_assessment = extract_confidence(llama_response)

    # store the confidence value in the data set
    problems_reports.at[i, analyzed_reason] = confidence_assessment

    # monitor progress
    print(f"{i + 1}/{len(sure_outcome_prompts)}")

In [None]:
# Show verbal reports for which the sure outcome reason was assessed as not used, i.e., confidence in using the reason was low, between 0 and 20
show_verbal_reports_in_range(problems_reports, 'sure outcome', 0, 20)

In [None]:
# Show verbal reports for which the sure outcome reason was assessed as used with high confidence, i.e., between 80 and 100
show_verbal_reports_in_range(problems_reports, 'sure outcome', 80, 100)

# Clean up the notebook (optional)

In [None]:
# Clear all variables
%reset -f

# Clear all outputs
from IPython.display import clear_output
clear_output()

# Restart runtime
import os
os._exit(00)