# Workshop Setup

## Retrieve HuggingFace API token
1. Go to https://huggingface.co/ and log in
2. Go to Settings > Access Tokens and copy your API token
3. Paste the token into notebooks/queries/access_token.txt

## A word on the HF API rate limits
- Limits are not disclosed, so use the API wisely.
- Rate limits reset at the top of every hour.
- The free tier should be enough to run this workshop.
- Rate limits for concurrent requests are much stricter than for sequential ones, so avoid sending many requests per second.
- You still have room to modify the prompts and rerun them.
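Because concurrent requests hit the limits much sooner than sequential ones, a simple way to stay under them is to send requests one at a time, keep a minimum pause between calls and wait longer after each failure. The helper below is a minimal sketch of that idea; it is not part of the workshop's `queries` module, and `run_sequentially`, its parameters and its default timings are hypothetical. It assumes the wrapped call raises an exception when a request fails.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_sequentially(fn: Callable[[T], R], items: Iterable[T],
                     min_interval: float = 1.0, max_retries: int = 3,
                     backoff: float = 2.0) -> List[R]:
    """Call fn on each item, one call at a time (hypothetical helper).

    Consecutive calls are at least min_interval seconds apart; a failing
    call is retried up to max_retries times, waiting longer each time.
    """
    results: List[R] = []
    last_call = float("-inf")
    for item in items:
        delay = min_interval
        for attempt in range(max_retries + 1):
            # Pause so that we never exceed one call per min_interval
            wait = last_call + min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            last_call = time.monotonic()
            try:
                results.append(fn(item))
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up on this item
                time.sleep(delay)  # back off before retrying
                delay *= backoff
    return results
```

For example: `run_sequentially(lambda body: Queries.run_query({"inputs": Prompts.get_sender(body)}, model="google/flan-t5-xxl"), emails_ds.values(), min_interval=2.0)`. Catching every `Exception` keeps the sketch short; in practice you would retry only on rate-limit or network errors, and if `Queries.run_query` returns an error payload instead of raising, you would need to check the response yourself.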

## If you reach the rate limits
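- A snapshot of the workshop results is stored in userinfo.csv; you can load it instead of calling the API (see the commented-out line in the recommendation cell).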
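If you do get through the API calls, it's worth saving your own results so a later rate limit doesn't cost you the work. A minimal sketch with pandas, assuming the `userinfo` layout used in this notebook (email ids as the index); `save_snapshot`, `load_snapshot` and the file name in the usage note are hypothetical.

```python
import pandas as pd

# Same columns as the userinfo DataFrame created in this notebook
COLUMNS = ["sentiment", "loan_qty", "sender", "motivation", "esg"]


def save_snapshot(df: pd.DataFrame, path: str) -> None:
    """Write the per-email results, keeping the email ids as the index."""
    df.to_csv(path)


def load_snapshot(path: str) -> pd.DataFrame:
    """Read a snapshot back, restoring the first column as the index."""
    return pd.read_csv(path, index_col=0)
```

For example, `save_snapshot(userinfo, "my_userinfo.csv")` after the ESG cell and `userinfo = load_snapshot("my_userinfo.csv")` later on. Using a different file name avoids overwriting the provided `userinfo.csv`; empty cells come back as `NaN`.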

In [None]:
%load_ext autoreload
%autoreload 2

!pip install requests
!pip install pandas

In [None]:
from dataset import loader
from queries import Queries, Prompts
import pandas as pd

# This is our email datasource: a mapping of email id -> email body
emails_ds = loader.load_dataset()
# One row per email id; the cells below fill in each column
userinfo = pd.DataFrame(columns=["sentiment", "loan_qty", "sender", "motivation", "esg"])


# The stuff ResponsibleLending has to deal with

Take a look at the sample dataset of emails received by the ResponsibleLending customer service center by running the **view_dataset** notebook in this folder.    
There is a mix of senders and motivations!

# Exploring sentiment analysis

LLMs are already pretrained on a large corpus of data, so some functionality is available out of the box.

* In this example we'll be using Falcon-7B-instruct (https://huggingface.co/tiiuae/falcon-7b-instruct)
* Falcon-40B is now available in the HuggingFace Inference API: https://huggingface.co/tiiuae/falcon-40b

If you want to try Falcon-40B, change the default value of the `model` parameter of the `run_query` method, or pass `model="tiiuae/falcon-40b"` on each call, as later cells do with other models:
```python
def run_query(payload={}, model="tiiuae/falcon-40b"):
```
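For reference, here is a minimal sketch of what a `run_query` function for the HuggingFace Inference API can look like, using only the standard library. It is *not* the workshop's `Queries.run_query` (check the `queries` module for the real code); the endpoint URL is the one the Inference API documented at the time of writing and may have changed, and `build_request` and the `token` argument are hypothetical. It uses `payload=None` instead of `payload={}` to avoid Python's shared mutable default argument pitfall.

```python
import json
import urllib.request

# Assumption: serverless Inference API endpoint, may have moved since this workshop
API_URL = "https://api-inference.huggingface.co/models/"


def build_request(payload: dict, model: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the given model."""
    return urllib.request.Request(
        API_URL + model,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def run_query(payload=None, model="tiiuae/falcon-7b-instruct", token=""):
    """Send the payload to the model and return the decoded JSON response."""
    payload = {} if payload is None else payload  # fresh dict on every call
    request = build_request(payload, model, token)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx answers (e.g. rate limits)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```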

In the following example we classify each email's sentiment as positive or negative, using the default model parameters.

## Positive/negative classification

In [None]:
for id,body in emails_ds.items():
    prompt = Prompts.get_sentiment(body, "positive", "negative")
    sentiment = Queries.run_query({"inputs": prompt})
    print(f"{id} -> {sentiment}")

LLMs tend to be verbose because they are built to generate text. To simplify processing the results, it helps to understand the generation parameters they accept.

Let's try the same query as before, but now constraining the model a bit. Since we want to classify into two categories, we only need to generate one word (fewer tokens), and we don't need the input text repeated in the output.

Limiting the number of generated tokens has two purposes:
1. lower operational costs (fewer tokens generated)
2. output that is easier to process.
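Put together, a constrained request and its post-processing could look like the sketch below. The payload keys (`max_new_tokens`, `return_full_text`) are the ones used in the next cell, and the response shape (a list whose first item has a `generated_text` key) is the one later cells in this notebook read. `classification_payload` and `to_label` are hypothetical helpers; `to_label` keeps only letters before matching, so answers such as `" Positive."` or `"Non-violent"` still map to a clean label.

```python
import re
from typing import Optional, Sequence


def classification_payload(prompt: str, max_new_tokens: int = 3) -> dict:
    """Payload for a short, label-only answer without echoing the prompt."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "return_full_text": False},
    }


def _letters(text: str) -> str:
    """Lowercase and keep letters only: ' Non-violent.' -> 'nonviolent'."""
    return re.sub(r"[^a-z]", "", text.lower())


def to_label(response: list, labels: Sequence[str]) -> Optional[str]:
    """Map the generated text to one of the expected labels, or None."""
    text = _letters(response[0].get("generated_text", ""))
    for label in labels:
        if text.startswith(_letters(label)):
            return label
    return None  # the model answered something unexpected
```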

In [None]:
for id,body in emails_ds.items():
    prompt = Prompts.get_sentiment(body, "positive", "negative")
    sentiment = Queries.run_query({"inputs": prompt, "parameters":{"max_new_tokens": 3, "return_full_text": False}})
    print(f"{id} -> {sentiment}")
    

Parameters change depending on the model and the execution environment.   
HuggingFace Inference API uses different parameters depending on the task: https://huggingface.co/docs/api-inference/detailed_parameters

Note: the sentiment results above still have a problem, since most of the emails are classified as positive!

We need a better strategy to run the sentiment analysis.
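Before changing strategy, it helps to measure how skewed the labels are. A minimal sketch (the helper name is hypothetical) that turns a list of predicted labels into the fraction assigned to each one:

```python
from collections import Counter
from typing import Dict, Iterable


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of items assigned to each label (empty input gives {})."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Inside the loop above you could append each label to a list and pass it to `label_shares`; for a pandas column such as `userinfo["sentiment"]`, `Series.value_counts(normalize=True)` gives the same fractions.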

## A better classification?

Maybe positive/negative is not the classification we are looking for...

In [None]:
for id,body in emails_ds.items():
    prompt = Prompts.get_sentiment(body, "violent", "nonviolent")
    sentiment = Queries.run_query({"inputs": prompt, "parameters":{"max_new_tokens": 2, "return_full_text": False}})
    sentiment = sentiment[0].get("generated_text")
    print(f"{id} -> {sentiment}")
    userinfo.at[id, "sentiment"] = sentiment

# Who's sending this?

Some models perform better than others on simple tasks.   
Instead of falcon-7b-instruct, let's use flan-t5-xxl, a model with very good zero-shot performance:

In [None]:
for id,body in emails_ds.items():
    prompt = Prompts.get_sender(body)
    sender = Queries.run_query({"inputs": prompt, "parameters":{"max_new_tokens": 12, "return_full_text": False}},
                                 model="google/flan-t5-xxl")
    sender = sender[0].get("generated_text")
    print(f"{id} -> {sender}")
    userinfo.at[id, "sender"] = sender

Notes:
1. flan-t5-xxl is no longer even near the top of the Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
2. Potential data leaks? Where is this LLM actually running?
3. Are larger models always more suitable than smaller ones?

# How much are they asking for?

In [None]:
for id,body in emails_ds.items():
    prompt = Prompts.get_loan(body)
    loan = Queries.run_query({"inputs": prompt, "parameters":{"max_new_tokens": 10, "return_full_text": False}},
                              model="google/flan-t5-xxl")
    loan = loan[0].get("generated_text")
    print(f"{id} -> {loan}")
    userinfo.at[id, "loan_qty"] = loan

Note: what happens if we use the Falcon-7B-instruct model instead? (Try removing the `model` argument.)
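The model answers with free text such as `$100,000` or `100000 dollars`, which is hard to compare or threshold. A small regular-expression sketch can turn that text into a number; `parse_amount` is a hypothetical helper that takes the *first* number it finds and only understands `k`/`thousand` and `m`/`million` suffixes, so treat it as a starting point rather than a robust parser.

```python
import re
from typing import Optional

# A number with optional thousands separators and decimals, then an optional suffix
_AMOUNT = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(k|thousand|m|million)?\b", re.IGNORECASE)
_MULTIPLIERS = {"k": 1_000, "thousand": 1_000, "m": 1_000_000, "million": 1_000_000}


def parse_amount(text: str) -> Optional[float]:
    """Return the first amount found in text as a float, or None."""
    match = _AMOUNT.search(text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    suffix = (match.group(2) or "").lower()
    return value * _MULTIPLIERS.get(suffix, 1)
```

For example, call `parse_amount(loan)` inside the loop above and store the number next to the raw text.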

# Fine tuning ...
Fine tuning involves further training an LLM so that the _weights_ of its internal neural network are adjusted (remember we mentioned _transformers_?)
- benefits:
    - a model specialized for your task
    - faster
- challenges:
    - requires a significant volume of training data
    - the model becomes less flexible
    - computationally expensive (large GPUs)
    - requires specific expertise
 
An example of a training dataset for customizing chatbots (chat conversations paired with summaries): https://huggingface.co/datasets/samsum/viewer/samsum/

## Think twice when fine tuning

Is there an alternative to fine tuning?
LLMs are very flexible:
1. prompt engineering: can't you just adjust the prompt?
2. few-shot learning: provide examples to the LLM as part of the context, without adjusting the weights.
3. enrich the context with data retrieved from large datasets, the RAG (retrieval-augmented generation) approach: https://www.ml6.eu/blogpost/leveraging-llms-on-your-domain-specific-knowledge-base
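Few-shot learning needs no training at all: the examples simply become part of the prompt. A minimal sketch of assembling such a prompt; the `Email:`/`Category:` layout and `few_shot_prompt` are hypothetical and not the format used by the workshop's `Prompts` class. Ending the prompt with `Category:` invites the model to complete it with a label.

```python
from typing import List, Tuple


def few_shot_prompt(examples: List[Tuple[str, str]], query: str, instruction: str) -> str:
    """Build a prompt with labelled examples followed by the unlabelled query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Email: {text}", f"Category: {label}", ""]
    # The model is expected to continue after the final "Category:"
    lines += [f"Email: {query}", "Category:"]
    return "\n".join(lines)
```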

# ESG analysis

In the following example we try to determine the motivation of the email sender.
Very often a single model is not the best approach for a specific scenario; a combination of models works better.

In this case we use falcon-7b-instruct for question answering (text generation) and a custom model trained on ESG data for text classification.

The classification we are looking for, ESG (environmental, social and governance), is very specific. There are several approaches:
1. Zero-shot classification into ESG categories
2. Enrich the context with some ESG-related examples (few-shot classification)
3. Fine tune a model: train it with specific ESG data to customize the classification:  
3.1 Use an existing model: https://huggingface.co/TrajanovRisto/bert-esg   
3.2 Sample ESG dataset for training: https://huggingface.co/datasets/TrajanovRisto/esg-sentiment

More on training language models: a walkthrough that trains a small RoBERTa-like model (EsperBERTo) for Esperanto: https://github.com/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb
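Approach 1 above can be tried through the Inference API's zero-shot classification task, whose payload takes the text as `inputs` and the candidate labels under `parameters.candidate_labels` (see the detailed parameters page linked earlier). The sketch below only builds the payload and reads the response; the helper names are hypothetical, and using an NLI-based model such as `facebook/bart-large-mnli` is an assumption, not the model this workshop uses.

```python
from typing import List, Sequence


def zero_shot_payload(text: str, labels: Sequence[str], multi_label: bool = True) -> dict:
    """Payload for a zero-shot-classification model (labels may overlap)."""
    return {
        "inputs": text,
        "parameters": {"candidate_labels": list(labels), "multi_label": multi_label},
    }


def top_labels(response: dict, k: int = 3) -> List[str]:
    """Return the k highest-scoring labels from a zero-shot response."""
    ranked = sorted(zip(response["labels"], response["scores"]),
                    key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:k]]
```

For example, `top_labels(Queries.run_query(zero_shot_payload(motivation, ["environmental", "social", "governance"]), model="facebook/bart-large-mnli"))`, assuming `run_query` forwards the payload unchanged and returns the decoded JSON.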


In [None]:
for id,body in emails_ds.items():
    prompt = Prompts.get_purpose(body)
    motivation = Queries.run_query({"inputs": prompt, "parameters":{"max_new_tokens": 100, "return_full_text": False}})
    motivation = motivation[0].get("generated_text")
    print(f"{id} -> {motivation}\n")
    userinfo.at[id, "motivation"] = motivation

    esg = Queries.run_query({"inputs": Prompts.get_esg(motivation)}, model="TrajanovRisto/bert-esg")
    print(f"{id} -> {esg}")
    esg = sorted(esg[0], key=lambda x: x['score'], reverse=True)[0:3]
    esg = "-".join([l['label'] for l in esg])
    print(f"{id} -> {esg}\n")
    userinfo.at[id, "esg"] = esg

# The secret sauce of ResponsibleLending 

Once we have extracted the different dimensions for each customer, it's time to build the loan recommendation system that decides whether the sender gets the loan or not.   
We use a mini-RAG approach: we enrich the context with some data to drive the model's behaviour.   
Let's use falcon-7b-instruct as the model; we'll generate the responses with a custom prompt.    
Check the _get_recommendation_ method.
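To make the mini-RAG idea concrete, here is a sketch of how data can be stitched into the prompt: the customer's extracted fields plus a few bank policy rules. The policy rules, the wording and `recommendation_prompt` are all hypothetical; the workshop's real template lives in `Prompts.get_recommendation`. The prompt ends with `Email: Dear`, the same trick the bonus section below uses to make the model start writing the email directly.

```python
from typing import Dict, List


def recommendation_prompt(customer: Dict[str, str], policy: List[str]) -> str:
    """Assemble a prompt grounded in customer data and (hypothetical) bank policy."""
    facts = "\n".join(f"- {field}: {value}" for field, value in customer.items())
    rules = "\n".join(f"- {rule}" for rule in policy)
    return (
        "You work at the customer service of ResponsibleLending.\n"
        f"Bank lending policy:\n{rules}\n\n"
        f"Customer profile:\n{facts}\n\n"
        "Write a reply email saying whether the loan is granted and why.\n"
        "Email: Dear"
    )
```

For example, `recommendation_prompt(userinfo.loc[id].astype(str).to_dict(), ["Loans above $50000 require a positive ESG impact."])`, where the policy sentence is invented for illustration.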

In [None]:

# If you hit the rate limits, load the user info from the snapshot instead:
#userinfo = pd.read_csv("userinfo.csv", index_col=0)

for id in userinfo.index:
    sentiment = userinfo.at[id, "sentiment"]
    loan_qty = userinfo.at[id, "loan_qty"]
    sender = userinfo.at[id, "sender"]
    motivation = userinfo.at[id, "motivation"]
    esg_data = userinfo.at[id, "esg"]
    # Mini-RAG: the extracted customer data becomes the prompt's context
    prompt = Prompts.get_recommendation(sentiment, loan_qty, sender, motivation, esg_data)
    reply_email = Queries.run_query({"inputs": prompt, "parameters":{"max_new_tokens": 150, "return_full_text": False, "temperature":1}})
    reply_email = reply_email[0].get("generated_text")

    print(f"{id} -> {reply_email}\n")


# Bonus section: How were the emails generated?


## The villains
Replace the sender with your favourite villain...

In [None]:
query = """
You are Iceman. 
Introduce yourself, your residence and request a credit loan of $100000 to the bank's local branch of ResponsibleLending. 
You will use the money to defeat your greatest superhero rival.
Provide details of your achievements and why you should get the loan.
Mention the name of your greatest enemy, the reason why you are enemies and how you plan to eliminate him.
Be informal.

Email: Dear"""
output = Queries.run_query({"inputs": query, "parameters": {"max_new_tokens": 900, "return_full_text" : False, "temperature":0.8}})
print(output)

Note: notice how easy it is to introduce bias into LLMs through the prompt, e.g. _eliminate him_.

## The superheroes
And now the emails from the heroes.   
Replace Batman with the hero of your choice:

In [None]:
query = """
You are Batman.
Write an email asking for $100000 to the local branch customer service of ResponsibleLending.
Introduce yourself in great detail, including location of birth and current residence.
Provide great detail of your well-known career achievements and concrete things you are famous for.
Think of a new ESG-related project in your local community and describe it.

Email: Dear"""
output = Queries.run_query({"inputs": query, "parameters": {"max_new_tokens": 600, "return_full_text" : False, "temperature":0.8}})
print(output)