<a href="https://colab.research.google.com/github/javier-jaime/Tool-Crib/blob/master/LangChain/HuggingFaceHub_LLMs_integrations_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LangChain & Hugging Face Hub LLM Integrations Tutorial

In this tutorial we will use LangChain, an open-source orchestration framework, to integrate large language models (LLMs) from the Hugging Face Hub.

This tutorial is based on code from *Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT and other LLMs* by Ben Auffarth (2023): https://github.com/benman1/generative_ai_with_langchain

### Notebook Preparation

First, we need to install the LangChain libraries and import everything the notebook requires.

In [None]:
# Install the required libraries the first time; comment/uncomment as required
!pip install langchain langchain-community huggingface_hub transformers
from langchain_community.llms import HuggingFaceHub
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain
from transformers import pipeline

# Import the operating-system interface and the Colab userdata (Secrets) interface
import os
from google.colab import userdata

To use Hugging Face as a provider for your models, you need to create an account and generate an access token (API key) in your account settings: https://huggingface.co/settings/tokens

In [2]:
# Option 1: set the API token directly in the Python environment (comment/uncomment as required)
# os.environ['HUGGINGFACEHUB_API_TOKEN'] = '<your API key token>'

# Option 2: store the token as a Colab Secret named HUGGINGFACEHUB_API_TOKEN,
# then read it and expose it as an environment variable
os.environ['HUGGINGFACEHUB_API_TOKEN'] = userdata.get('HUGGINGFACEHUB_API_TOKEN')
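An explicit check right after reading the token gives a clearer error message than a failed request later on. The sketch below is a hypothetical helper, `require_hf_token`, which is not part of LangChain or `huggingface_hub`; it assumes the token lives in the `HUGGINGFACEHUB_API_TOKEN` environment variable set above.

```python
import os


def require_hf_token(var_name: str = "HUGGINGFACEHUB_API_TOKEN") -> str:
    """Return the Hugging Face token from the environment, or fail with a clear message.

    Hypothetical helper: it only inspects the environment and never contacts the Hub.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; add it as a Colab Secret or set os.environ directly."
        )
    return token
```

Calling `require_hf_token()` once before creating any model makes a missing or blank token fail fast, without revealing the token itself.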

### Hugging Face Simple Example

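In this case we use FLAN-T5 XXL (`google/flan-t5-xxl`), an open-source model developed by Google, to test a simple prompt. We set a temperature of 0.5, a moderate amount of sampling randomness (lower values make the output more deterministic, higher values more varied), and a maximum output length of 64 tokens.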
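To make "temperature" concrete: in typical sampling, the model's raw next-token scores (logits) are divided by the temperature before the softmax turns them into probabilities. The sketch below uses made-up logits for three candidate tokens; it illustrates the general idea and is not the code that runs on the Hugging Face servers.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Temperature 0 is usually special-cased as greedy decoding rather than divided by
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.5 the most likely token takes a larger share of the probability mass than at 1.0 or 2.0, which is why lower temperatures give more repeatable answers.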

In [30]:
# Initialize the model
llm = HuggingFaceHub(
    model_kwargs={"temperature": 0.5, "max_length": 64},
    repo_id="google/flan-t5-xxl",
)

In [31]:
# Pass the prompt (llm.invoke replaces the deprecated llm(prompt) call style)
prompt = input('Enter your question: ')
completion = llm.invoke(prompt)
print(completion)

Enter your question: Where is Calgary?
canada


Next, we chain a prompt template with the LLM: the template wraps every question in the same instructions before it is sent to the model.
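Conceptually, filling a template is plain string substitution: with LangChain's default f-string template format, a `PromptTemplate` whose `input_variables` is `['question']` produces the same text as Python's `str.format`. A minimal sketch that needs no LangChain, using the same template as the cell that follows:

```python
template = """Question: {question}
Answer: Let's think step by step."""

# Fill the {question} placeholder the way an f-string-style PromptTemplate would
filled = template.format(question="Where is Calgary?")
print(filled)
```

The chain then sends this filled-in string to the LLM, which is what the verbose output below prints as "Prompt after formatting".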

In [33]:
template = """Question: {question}
Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=['question'])

llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)

test_question = " Who was the mayor in the year of Calgary Olympics games"

llm_chain.run(test_question)



[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mQuestion:  Who was the mayor in the year of Calgary Olympics games
Answer: Let's think step by step.[0m

[1m> Finished chain.[0m


'In 1988, the Calgary Olympics games were held. The mayor in 1988 was Ken Melamed. The answer: Ken Melamed.'

We set `verbose=True` to print the formatted prompt and the chain's progress, but the answer is wrong: Ken Melamed was the mayor of Whistler, B.C., not of Calgary. The mayor of Calgary in 1988 was Ralph Klein. Let's try another question.

In [38]:
question = " Who was the Premier of Alberta in the year of Calgary Olympics games"
llm_chain.run(question)



[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mQuestion:  Who was the Premier of Alberta in the year of Calgary Olympics games
Answer: Let's think step by step.[0m

[1m> Finished chain.[0m


'Premier of Alberta is the leader of the provincial government of Alberta. The year of Calgary Olympics games was 1988. Ed Stelmach was the Premier of Alberta in 1988. The answer: Ed Stelmach.'

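Wrong again: Ed Stelmach was Premier of Alberta from 2006 to 2011, while the Premier in 1988 was Don Getty. Confident but incorrect answers like these (hallucinations) are common with smaller models.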
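Both completions above end with "The answer: X." because of the step-by-step prompt. If only the final answer is needed, a small regular expression can pull it out. This is a hypothetical helper, `extract_final_answer`, and it assumes the model keeps that exact phrasing, which is not guaranteed.

```python
import re
from typing import Optional

# Matches a trailing "The answer: X" (optional final period) as produced by the step-by-step prompt
ANSWER_RE = re.compile(r"The answer:\s*(.+?)\.?\s*$")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after 'The answer:', or None if the model did not use that phrasing."""
    match = ANSWER_RE.search(completion.strip())
    return match.group(1) if match else None


completion = ("In 1988, the Calgary Olympics games were held. The mayor in 1988 was "
              "Ken Melamed. The answer: Ken Melamed.")
print(extract_final_answer(completion))
```

Returning `None` instead of raising makes it easy to fall back to the full completion when the model ignores the format.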


### Building a Customer Service App


Generative AI can assist customer service agents in several ways:

**Sentiment classification:** This helps identify customer emotions and allows agents to personalize their responses.

**Summarization:** This enables agents to understand the key points of lengthy customer messages and save time.

**Intent classification:** Similar to summarization, this helps predict the customer’s purpose and allows for faster problem-solving.

**Answer suggestions:** This provides agents with suggested responses to common inquiries, ensuring that accurate and consistent messaging is provided.

Using the Hugging Face Hub API, we can list the five most downloaded models for each of these tasks:
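The Hub does the sorting server-side: `sort="downloads"` with `direction=-1` means descending by download count, and only the first five results are printed. The same selection can be mimicked locally, as in the sketch below. It uses a few (model id, downloads) pairs copied from the text-classification listing in this notebook purely as sample data and makes no network calls; `top_k_by_downloads` is a hypothetical helper.

```python
import heapq

# Sample (model_id, downloads) pairs copied from the text-classification listing in this notebook
models = [
    ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", 13322853),
    ("cardiffnlp/twitter-roberta-base-sentiment-latest", 48893886),
    ("cardiffnlp/twitter-roberta-base-irony", 11075353),
    ("mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis", 31289930),
    ("lxyuan/distilbert-base-multilingual-cased-sentiments-student", 11632332),
]


def top_k_by_downloads(pairs, k=5):
    """Return the k entries with the most downloads, highest first."""
    return heapq.nlargest(k, pairs, key=lambda pair: pair[1])


for model_id, downloads in top_k_by_downloads(models, k=3):
    print(f"{model_id}, {downloads}")
```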

In [17]:
from huggingface_hub import list_models

def list_most_popular(task: str):
    # Print a bold header using ANSI escape codes
    print('\033[1m' + 'Model Id, Model Downloads')
    print('---------------------------------')
    print('\033[0m')
    # Models tagged with `task`, sorted by downloads (descending), first 5 only
    for model in list_models(filter=task, sort="downloads", direction=-1, limit=5):
        print(f"{model.id}, {model.downloads}\n")

In [19]:
# For text classification
list_most_popular("text-classification")

Model Id, Model Downloads
---------------------------------
cardiffnlp/twitter-roberta-base-sentiment-latest, 48893886

mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis, 31289930

distilbert/distilbert-base-uncased-finetuned-sst-2-english, 13322853

lxyuan/distilbert-base-multilingual-cased-sentiments-student, 11632332

cardiffnlp/twitter-roberta-base-irony, 11075353



In [26]:
# For intent classification
list_most_popular('intent-classification')

Model Id, Model Downloads
---------------------------------
qanastek/XLMRoberta-Alexa-Intents-Classification, 2603

bespin-global/klue-roberta-small-3i4k-intent-classification, 396

cartesinus/xlm-r-base-amazon-massive-intent, 231

lxyuan/banking-intent-distilbert-classifier, 78

cartesinus/mdeberta-v3-base_amazon-massive_intent, 47



In [20]:
# For summarization
list_most_popular('summarization')

Model Id, Model Downloads
---------------------------------
google-t5/t5-small, 3649108

google-t5/t5-base, 2770111

facebook/bart-large-cnn, 2403645

philschmid/bart-large-cnn-samsum, 700812

sshleifer/distilbart-cnn-12-6, 648192



In [39]:
# For question answering
list_most_popular('question-answering')

Model Id, Model Downloads
---------------------------------
deepset/roberta-base-squad2, 1141775

Intel/dynamic_tinybert, 460278

distilbert/distilbert-base-cased-distilled-squad, 389640

FabianWillner/distilbert-base-uncased-finetuned-squad, 374139

timpal0l/mdeberta-v3-base-squad2, 284454



Below is a shortened example customer e-mail generated with GPT-3.5:

In [42]:
customer_email = """
I am writing to pour my heart out about the recent unfortunate experience
I had with one of your coffee machines that arrived broken. I anxiously
unwrapped the box containing my highly anticipated coffee machine.
However, what I discovered within broke not only my spirit but also any
semblance of confidence I had placed in your brand.
Its once elegant exterior was marred by the scars of travel, resembling a
war-torn soldier who had fought valiantly on the fields of some espresso
battlefield. This heartbreaking display of negligence shattered my dreams
of indulging in daily coffee perfection, leaving me emotionally distraught
and inconsolable
"""

The sentiment model *cardiffnlp/twitter-roberta-base-sentiment* was trained on tweets. Its `-latest` variant is the most downloaded text-classification model, but a model trained on tweets is not the best fit for customer e-mails.

In [48]:
sentiment_model = pipeline(
    task="sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment",
)

The sentiment pipeline returns a label and a score between 0 and 1 that expresses the model's confidence in that label.

The labels are:

- **0** (`LABEL_0`): negative
- **1** (`LABEL_1`): neutral
- **2** (`LABEL_2`): positive
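Since the pipeline reports raw ids such as `LABEL_0`, a small post-processing step can map them to the names above. The sketch below is a hypothetical helper, `readable`, that works on the list of dicts the pipeline returns; it needs no model to run, and any label it does not recognize is passed through unchanged.

```python
# Label ids used by cardiffnlp/twitter-roberta-base-sentiment, as listed above
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def readable(results: list[dict]) -> list[dict]:
    """Replace LABEL_n ids with human-readable names, keeping the confidence score."""
    return [
        {"label": LABEL_NAMES.get(r["label"], r["label"]), "score": r["score"]}
        for r in results
    ]


# Example with the same shape as the pipeline output printed below
print(readable([{"label": "LABEL_0", "score": 0.7691406607627869}]))
```

In the notebook this would be used as `readable(sentiment_model(customer_email))`.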

In [49]:
print(sentiment_model(customer_email))

[{'label': 'LABEL_0', 'score': 0.7691406607627869}]


In [51]:
print(sentiment_model("I am elated, I am so happy, this is the best thing that ever happened to me!"))

[{'label': 'LABEL_2', 'score': 0.9926880598068237}]


In [53]:
print(sentiment_model("I don't care. I guess it's ok, or not, I couldn't care one way or the other"))

[{'label': 'LABEL_1', 'score': 0.5958544611930847}]


In [54]:
print(sentiment_model("I am so angry and sad, I want to kill myself!"))

[{'label': 'LABEL_0', 'score': 0.9788626432418823}]


For summarization, we can run *facebook/bart-large-cnn* remotely on the Hugging Face Hub servers; this requires the API token set earlier.

In [56]:
# Run the summarization model remotely on the Hugging Face Hub (uses the API token)
summarizer = HuggingFaceHub(
    repo_id="facebook/bart-large-cnn",
    model_kwargs={"temperature": 0, "max_length": 180},
)

def summarize(llm, text) -> str:
    return llm.invoke(f"Summarize this: {text}!")

summarize(summarizer, customer_email)

'A customer\'s coffee machine arrived broken. "This heartbreaking display of negligence shattered my dreams," writes the customer. "I was emotionally distraught and inconsolable," he adds. "It was like a war-torn soldier who had fought valiantly on the fields of some espressobattlefield"'

Not a bad summary for a small model.