## Using Notebook Environments
1. To run a cell, press `shift + enter`. The notebook will execute the code in the cell and move to the next cell. If the cell is a markdown cell (text only), it will render the markdown and move to the next cell.
2. Since cells can be executed in any order and variables can be overwritten, you may at some point lose track of the state of your notebook. If this happens, you can always restart the kernel by clicking Runtime in the menu bar (if you're using Colab) and selecting `Restart runtime`. This will clear all variables and outputs.
3. The value of the last expression in a cell is displayed below the cell. If you want to print multiple variables, use the `print()` function as usual.

Notebook environments support code cells and markdown (text) cells. For the purposes of this workshop, markdown cells are used to provide high-level explanations of the code. More specific details are provided in the code cells themselves in the form of comments (lines beginning with `#`).

## Environment Setup
The code below allows the notebook to access your Google Drive files, so that it can read files in and write files to specified locations.
Once you run it, you'll have to grant the notebook access to the Google Drive of your chosen Google account.

In [None]:
# mount Google Drive
from google.colab import drive
drive.mount("/content/drive")

We begin by loading the requisite packages. For those coming from R, packages in Python are sometimes given shorter names for use in the code via the `import <name> as <nickname>` syntax (e.g. `import pandas as pd`). These are usually standardized nicknames.

We'll use the following packages:

1. `pandas`: A very popular package for reading and manipulating data frames in Python.
2. `huggingface_hub`: A package for making API calls and prompting the LLMs.
3. `re`: A package for string matching.
4. `textwrap` and `IPython`: Used to make the display of some of the tables more readable.

In [4]:
import sys
import os
import re
import textwrap

# the code below installs huggingface_hub if it's missing
# it runs before the import of huggingface_hub so that the import cannot fail
if 'google.colab' in sys.modules:  # If in Google Colab environment

    # Installing requisite packages
    !pip install huggingface_hub &> /dev/null

from IPython.display import display, HTML
import pandas as pd
from huggingface_hub import InferenceClient

# this sets the working directory to the exercises folder
base_path = '/content/drive/My Drive/llms_egproc/exercises/'
os.chdir(base_path)

# Identifying decision reasons (part 1)
In this exercise, we will explore the capabilities of LLMs to identify decision reasons in verbal reports using the Hugging Face (HF) ecosystem.

By the end of this exercise, you will have learned how to:
- Design a zero-shot prompt
- Have large models on the Hugging Face servers evaluate your prompts  
- Validate the model output

## Getting access to a Large Language Model
The best LLMs are usually too large to be downloaded and run on your local machine. One way to access LLMs like GPT-4 or LLAMA-3-70b is to call them directly via an API. We will use the access provided by Hugging Face (HF). Note that you need a Pro account to access large models like LLAMA-3-70b.

We start by setting up the inference client. The `InferenceClient` class from the `huggingface_hub` package gives easy access to hundreds of LLMs.
Its two main arguments are:
* `model`: The model name (see https://huggingface.co/meta-llama for LLAMA models available through HF).
* `token`: Your personal token, which authenticates your access to the service.
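
A token pasted directly into a notebook is easy to leak when the notebook is shared. The sketch below shows one possible alternative: reading the token from an environment variable. The variable name `HF_TOKEN` and the helper `get_hf_token` are illustrative assumptions, not something this workshop requires.

```python
import os

def get_hf_token(var_name="HF_TOKEN"):
    """Return the token stored in the environment variable `var_name`.

    Raises a clear error if the variable is missing or empty, so that a
    forgotten token is caught before any API call is made.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {var_name} is not set.")
    return token

# Hypothetical usage in the cell below:
# API_TOKEN = get_hf_token()
```

If the variable is not set, the function fails right away with a readable message rather than at the first model call.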


In [38]:
# paste your token here (keep it private: do not share a notebook that contains a real token)
API_TOKEN = 'YOUR_HF_TOKEN'
# we'll use the LLAMA-3 model, the version with 70 billion parameters
LLAMA_version = "meta-llama/Meta-Llama-3-70B-Instruct"
# pass the model version and the token to InferenceClient and save the resulting client under some name, e.g., LLAMA
LLAMA = InferenceClient(model = LLAMA_version, token = API_TOKEN)

## Using the InferenceClient
Now we can use the `LLAMA` object to prompt the model. For our purposes, we will focus on the `.text_generation` method, like in the code block below.
1. run the code and investigate the output
2. change the `max_new_tokens` argument to a lower value, e.g., 100, and run again
3. change the content of the stupid prompt

In [None]:
# let's create some stupid prompt and save it under an informative name
stupid_prompt = 'Give an example of a stupid prompt'

# Get a response from Meta-Llama-3-70B-Instruct, which is accessed through the LLAMA object
# The max_new_tokens argument sets the limit on the output length
stupid_response = LLAMA.text_generation(prompt = stupid_prompt, max_new_tokens = 4000)
print(stupid_response)

### System and User messages
Llama uses special tokens to distinguish between the system and user parts of the prompt.
Run the code below and have a look at the full prompt.


In [None]:
# system message sets the role to a decision scientist
system_role = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an expert decision scientist
<|eot_id|>"""

user_question = """
<|start_header_id|>user<|end_header_id|>
What is the best way to make financial decisions?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

# combine both into a single prompt
full_prompt = system_role + user_question

# print the full prompt
print(full_prompt)
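
Since the exercise below asks you to swap roles, it can help to wrap the special tokens in a small function. The following is a minimal sketch that mirrors the layout of the cell above; the helper name `build_llama3_prompt` is an illustrative choice, not part of the workshop code.

```python
def build_llama3_prompt(system_message, user_message):
    """Assemble a single-turn Llama 3 prompt from a system and a user message."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n"
        f"{system_message}\n"
        "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n"
        f"{user_message}\n"
        "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>"
    )

# Build one prompt per role for the same question
roles = ["decision scientist", "priest", "economist"]
question = "What is the best way to make financial decisions?"
prompts = {role: build_llama3_prompt(f"You are an expert {role}", question) for role in roles}

print(prompts["economist"])
```

Each entry of `prompts` can then be passed to `LLAMA.text_generation` in the same way as `full_prompt`.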

### Prompting the model
Run the cell below and read the response, then experiment with the system role.

1. Try the prompt with the model role set to decision scientist.
2. Now try different roles:
    * Priest
    * Economist
    * Generic grandpa

In [None]:
response = LLAMA.text_generation(full_prompt, max_new_tokens = 200)
print(response)

## Identifying decision reasons --- prompt template
Now have a look at our prompt. Read in the `prompt_template` from Google Drive.
1. Make sure you understand all parts.
2. Which prompting techniques can you spot?

In [None]:
# Read the File
prompt_path = 'prompts/prompt_v1.txt'

# Open the file and read its contents
with open(prompt_path, 'r') as file:
    prompt_template = file.read()

print(prompt_template)

## Constructing the prompt
A single complete prompt for identifying a decision reason has to include:
1. Definitions of key terms
2. Instructions on how to perform the identification
3. The DECISION REASON
4. The DECISION PROBLEM
5. The VERBAL REPORT

The code below reads in the problems, reasons, and reports from your Google Drive.


In [17]:
# read in decision problems, decision reasons, and verbal reports
decision_problems = pd.read_csv('data/decision_problems.csv', encoding = 'utf-8')
decision_reasons = pd.read_csv('data/decision_reasons.csv', encoding = 'utf-8')
verbal_reports = pd.read_csv('data/verbal_reports.csv', encoding = 'utf-8')

# merge verbal reports with decision problems
problems_reports = pd.merge(decision_problems, verbal_reports, on = 'problem_id')
# print(problems_reports)
# print(decision_reasons)

### Function for creating the full prompt
The function below takes the `prompt_template` as input and fills its placeholders with the corresponding elements.
Run the code cell to load the function into the environment.

In [20]:
# function for constructing the full prompt
def generate_prompt(prompt_template, decision_problem, decision_reason, verbal_report):
    """
    Replaces placeholders in the prompt with the given decision problem, decision reason, and verbal report.
    """
    # Replace placeholders with actual values
    full_prompt = prompt_template.replace("DECISION_PROBLEM", decision_problem)
    full_prompt = full_prompt.replace("DECISION_REASON", decision_reason)
    full_prompt = full_prompt.replace("VERBAL_REPORT", verbal_report)

    return full_prompt
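
To see what the function does without loading any files, here is a small self-contained sketch. The template and the texts are made up for illustration and are not the contents of `prompt_v1.txt` or of the data files.

```python
def generate_prompt(prompt_template, decision_problem, decision_reason, verbal_report):
    """Same placeholder replacement as in the cell above."""
    full_prompt = prompt_template.replace("DECISION_PROBLEM", decision_problem)
    full_prompt = full_prompt.replace("DECISION_REASON", decision_reason)
    full_prompt = full_prompt.replace("VERBAL_REPORT", verbal_report)
    return full_prompt

# A made-up template, for illustration only
toy_template = "Problem: DECISION_PROBLEM\nReason: DECISION_REASON\nReport: VERBAL_REPORT"

toy_prompt = generate_prompt(
    toy_template,
    "Lottery A: 50% chance of 10. Lottery B: 5 for sure.",
    "Prefers the lottery with the highest possible outcome.",
    "I picked A because 10 is more than 5.",
)
print(toy_prompt)
```

Note that the replacements are applied one after another. If a decision problem happened to contain the literal text `DECISION_REASON` or `VERBAL_REPORT`, that text would be replaced too.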

### Create the prompts
Now we will create a set of prompts for the maximum outcome decision reason.

In [45]:
# Select the maximum outcome reason
selected_reason = 'maximum outcome'

# get the description of the reason from the decision_reasons data frame
selected_description = decision_reasons.loc[decision_reasons['decision reason name'] == selected_reason, 'decision reason description'].values[0]

# Create a list for storing prompts for the maximum outcome reason
maximum_outcome_prompts = []

# Generate prompts for the specific decision reason
# this loops over each row of the data with verbal reports and corresponding decision problems
for _, row in problems_reports.iterrows():

    # here we use the generate_prompt function to create prompts for all verbal reports and the maximum outcome reason
    prompt = generate_prompt(
        prompt_template,
        row['decision_problem'],
        selected_description,  # Use the selected description
        row['verbal_report']
    )
    maximum_outcome_prompts.append(prompt)

In [None]:
# have a look at the first full prompt
# you can investigate other prompts by changing the number from 0 to some other value
print(maximum_outcome_prompts[0])

## Identifying decision reasons: a test

- Run the first prompt.
- Make sense of the output relative to the prompt.
- Try again, or try the next prompt.


In [78]:
# pass the first prompt to LLAMA and save the output
result = LLAMA.text_generation(maximum_outcome_prompts[0], max_new_tokens = 4000)

In [None]:
# print the llama evaluation
# text_generation returns a string, so print it whole (result[0] would only show its first character)
print(result)

## Extracting confidence ratings

### Function for extracting the confidence rating
As you can see in the output above, following our prompt, the model provides a full description of its deliberation process and gives the confidence assessment at the end.

In [27]:
# Function for extracting confidence assessments
def extract_confidence(s):
    """
    Extracts an integer value from a string enclosed between @ or @@ symbols.
    """
    # Regular expression to match patterns like @number@ or @@number@@
    pattern = r'@+(\s*\d+\s*)@+'

    # Search for the pattern in the string
    match = re.search(pattern, s)

    if match:
        # Extract the number and convert it to an integer
        number_str = match.group(1).strip()
        return int(number_str)

    return None

In [None]:
# test the extract_confidence function
print(extract_confidence('something else @10 @ xxx'))
print(extract_confidence('something else @@13@@ xxx'))
print(extract_confidence(result))
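
`re.search` returns the first match in the string. If the model happens to write a number between `@` symbols earlier in its deliberation, that earlier number would be extracted. The sketch below shows a possible variant that takes the last match instead, assuming the final rating always comes at the end of the output. The name `extract_last_confidence` is hypothetical.

```python
import re

def extract_last_confidence(s):
    """Return the last integer enclosed in @ symbols, or None if there is none."""
    # findall returns every captured number, in order of appearance
    matches = re.findall(r'@+\s*(\d+)\s*@+', s)
    return int(matches[-1]) if matches else None

print(extract_last_confidence('an early guess @@20@@ and the final rating @85@'))
print(extract_last_confidence('no rating here'))
```

Whichever variant you use, it is worth checking how many responses yield `None`, since those are outputs in which the model did not follow the requested format.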

## Example Analysis
Now you'll run the analysis on the entire data set. For each combination of verbal report and decision problem, the model will provide a confidence assessment of whether the individual used the selected decision reason.

### Maximum Outcome
We will iterate over the entire list of prompts containing the **maximum outcome** reason (stored in `maximum_outcome_prompts`). On each iteration, the model will assess whether the **maximum outcome** reason was used by the individual, based on the verbal report.

Specifically:
1. We pass a `prompt` to the model.
2. The full output from LLAMA, `llama_response`, is saved in the list `maximum_outcome_eval` for later inspection.
3. We use the `extract_confidence` function to extract the confidence assessment from `llama_response`.
4. We save the `confidence_assesment` in a new column of the data set with decision problems and verbal reports, `problems_reports['maximum outcome']`.


In [None]:
# list for storing the output from the LLAMA model
maximum_outcome_eval = []

# analyzed reason
analyzed_reason = "maximum outcome"

# new column in the problems_reports data set for storing the confidence assessments
# recall that analyzed_reason was set to 'maximum outcome'
problems_reports[analyzed_reason] = None

# Iterate over the list of prompts, get responses, and extract numerical estimates and add them to the data set with problems and reports
for i, prompt in enumerate(maximum_outcome_prompts):

    # response from LLAMA
    llama_response = LLAMA.text_generation(prompt, max_new_tokens = 4000)
    maximum_outcome_eval.append(llama_response) # save the response to the maximum_outcome_eval list

    # extract the confidence value from the response
    confidence_assesment = extract_confidence(llama_response)

    # confidence value into the data
    problems_reports.at[i, analyzed_reason] = confidence_assesment

    # monitor progress
    print(str(i + 1) + '/' + str(problems_reports.shape[0]))
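
A loop like this makes many API calls in a row, and a single failed call (for example, a timeout) would stop it partway through. The following is a minimal sketch of a generic retry wrapper. The name `call_with_retries` and its parameters are illustrative and are not part of `huggingface_hub`.

```python
import time

def call_with_retries(fn, *args, max_attempts=3, wait_seconds=2.0, **kwargs):
    """Call fn(*args, **kwargs), retrying on any exception up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as err:
            # give up and re-raise the error after the last allowed attempt
            if attempt == max_attempts:
                raise
            print(f"Attempt {attempt} failed ({err}); retrying in {wait_seconds} s")
            time.sleep(wait_seconds)

# Hypothetical usage inside the loop:
# llama_response = call_with_retries(LLAMA.text_generation, prompt, max_new_tokens=4000)
```

Retrying on every exception is a deliberate simplification here. In practice you might want to retry only on errors that are likely to be temporary.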

### Displaying the results
The functions below are not too important. Their only goal is to display the tables in the notebook in a nice HTML format, which is easier to read than the base output of `print()`.

In [85]:
# Function to wrap text
def wrap_text(text, width=100):
    return "<br>".join(textwrap.wrap(text, width))

# display data frames in HTML
def disp_tab(dd):
    dd = dd.to_html(escape=False)
    return display(HTML(dd))

# Function to show verbal reports with assigned numbers in a specified range
def show_verbal_reports_in_range(data, reason, min_confidence, max_confidence):
    """
    Shows verbal reports for which the model assigned a confidence within the specified range.
    """
    # filter by the specified range; .copy() avoids modifying a view of the original data
    filtered_data = data[(data[reason] >= min_confidence) & (data[reason] <= max_confidence)].copy()

    # wrap the text for nicer display
    filtered_data['verbal_report'] = filtered_data['verbal_report'].apply(wrap_text)
    filtered_data['decision_problem'] = filtered_data['decision_problem'].apply(lambda x: wrap_text(x, width=40))

    # select only the columns with the problem, report, choice, and confidence assessment
    filtered_data = filtered_data[['decision_problem', 'verbal_report', 'choice', reason]]
    filtered_data = filtered_data.to_html(escape=False) # to html

    return display(HTML(filtered_data))
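
Because the tables are rendered with `escape=False`, any `<` or `&` characters in a verbal report are interpreted as HTML. The sketch below shows one possible way to guard against that: wrap the text first, then escape each line before joining with `<br>`. The name `wrap_text_escaped` is hypothetical.

```python
import html
import textwrap

def wrap_text_escaped(text, width=100):
    """Wrap text to `width` characters, escape HTML special characters, and join lines with <br>."""
    # wrap first, so that the width refers to the visible characters
    lines = textwrap.wrap(str(text), width)
    return "<br>".join(html.escape(line) for line in lines)

print(wrap_text_escaped("I chose A because 10 > 5 & I like risk"))
```

Only the `<br>` separators remain as real HTML; everything typed by the participant is shown literally.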

#### High confidence assessments
The code below displays the verbal reports for which the LLM thought that there is a HIGH chance that the **maximum outcome** reason was used when making the decision.

In [None]:
# Show verbal reports for which the maximum outcome reason was assessed to be used with high confidence, i.e., between 80 and 100
show_verbal_reports_in_range(problems_reports, 'maximum outcome', 80, 100)

#### Low confidence assessments
The code below displays the verbal reports for which the LLM thought that there is a LOW chance that the **maximum outcome** reason was used when making the decision.

In [None]:
# Show verbal reports for which the maximum outcome reason was assessed not to be used, i.e., confidence in using the reason was low, between 0 and 20
show_verbal_reports_in_range(problems_reports, 'maximum outcome', 0, 20)

#### Full data
You can view the entire data set by running the code below.

In [None]:
# Show all results
show_verbal_reports_in_range(problems_reports, analyzed_reason, 0, 100)

#### LLM reasoning
The full LLM output was saved in the `maximum_outcome_eval` object. You can access it by printing the elements one at a time. Notice that the output tables above provide row numbers in the leftmost column. These numbers correspond to the entries in the `maximum_outcome_eval` object.

Thus, if you want to display the LLM deliberation process from which the assessment in row 1 was taken, you simply run:

In [None]:
print(maximum_outcome_eval[1])

# SURE OUTCOME
Run the analyses for the **sure outcome** decision reason.

## Set up the reason name and description

In [59]:
# Select the description for the sure outcome reason
analyzed_reason = "sure outcome"
selected_description = decision_reasons.loc[decision_reasons['decision reason name'] == analyzed_reason, 'decision reason description'].values[0]
print(selected_description)

The reason considers the presence of a sure outcome, that is an outcome with 100% probability, of each lottery. The reason prefers the lottery with or without the sure outcome, depending on whether the sure outcome is a favorable outcome in the context of all possible outcomes.


## Sure Outcome Prompts
Create the list of prompts containing the **sure outcome** reason.

In [None]:
# Create a list for storing prompts for the sure outcome reason
sure_outcome_prompts = []

# Generate prompts for the specific decision reason
for _, row in problems_reports.iterrows():

    # here we use the generate_prompt function to create prompts for all verbal reports and the sure outcome reason
    prompt = generate_prompt(
        prompt_template,
        row['decision_problem'],
        selected_description,  # Use the selected description
        row['verbal_report']
    )
    sure_outcome_prompts.append(prompt)

print(sure_outcome_prompts[0])

## Sure Outcome LLAMA evaluation

#### Run the LLM on a random prompt

In [62]:
# the trailing comma was removed: it turned the result into a one-element tuple instead of a string
sure_outcome_res = LLAMA.text_generation(prompt = sure_outcome_prompts[5], max_new_tokens=4000)

In [None]:
print(sure_outcome_res)

#### Run the LLM on the entire list of prompts

In [None]:
# list for storing the output from the LLAMA model
sure_outcome_eval = []

# new column in the problems_reports data set for storing the confidence assessments
# recall that analyzed_reason was set to 'sure outcome'
problems_reports[analyzed_reason] = None

# Iterate over the list of prompts, get responses, and extract numerical estimates and add them to the data set with problems and reports
for i, prompt in enumerate(sure_outcome_prompts):

    # response from LLAMA
    llama_response = LLAMA.text_generation(prompt, max_new_tokens = 4000)
    sure_outcome_eval.append(llama_response) # save the response to the sure_outcome_eval list

    # extract the confidence value from the response
    confidence_assesment = extract_confidence(llama_response)

    # confidence value into the data
    problems_reports.at[i, analyzed_reason] = confidence_assesment

    # monitor progress
    print(str(i + 1) + '/' + str(problems_reports.shape[0]))

#### High confidence assessments for sure outcome reason

In [None]:
# Show verbal reports for which the sure outcome reason was assessed to be used with high confidence, i.e., between 80 and 100
show_verbal_reports_in_range(problems_reports, 'sure outcome', 80, 100)

#### Low confidence assessments for sure outcome reason

In [None]:
# Show verbal reports for which the sure outcome reason was assessed not to be used, i.e., confidence in using the reason was low, between 0 and 20
show_verbal_reports_in_range(problems_reports, 'sure outcome', 0, 20)

#### The full LLM reasoning for a selected data point
Change the number to view the reasoning associated with the data point of interest.

In [None]:
# the sure outcome responses are stored in sure_outcome_eval
print(sure_outcome_eval[1])

# Recreating the analysis on your own
If you have the time, have a look at the list of reasons we prepared for you. Select a reason that you like (or don't like) and try to recreate the analysis for it.

Good luck!

In [None]:
disp_tab(decision_reasons)
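
When repeating the analysis for a new reason, the evaluation loop is the part that gets copied most often. Below is a minimal sketch of a reusable version. The function name `evaluate_prompts` and its arguments are illustrative. The model call and the extraction step are passed in as functions, so the example runs without an API connection.

```python
def evaluate_prompts(prompts, generate, extract):
    """Run `generate` on every prompt and collect the raw responses and extracted ratings.

    generate: callable taking a prompt string and returning the model's text
    extract:  callable taking the model's text and returning a rating (or None)
    """
    responses, ratings = [], []
    for i, prompt in enumerate(prompts, start=1):
        response = generate(prompt)
        responses.append(response)
        ratings.append(extract(response))
        print(f"{i}/{len(prompts)}")  # monitor progress
    return responses, ratings

# Stand-ins for the model and the extraction function, for illustration only
fake_generate = lambda prompt: f"Some deliberation about '{prompt}' ... @{len(prompt)}@"
fake_extract = lambda text: int(text.split("@")[1])

responses, ratings = evaluate_prompts(["short", "a longer prompt"], fake_generate, fake_extract)
print(ratings)
```

In the notebook, `generate` could be something like `lambda p: LLAMA.text_generation(p, max_new_tokens = 4000)` and `extract` could be `extract_confidence`. The returned ratings could then be stored with `problems_reports[analyzed_reason] = ratings`.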