<a href="https://colab.research.google.com/github/kfahn22/Colab_notebooks/blob/main/LangChain_Running_HuggingFace_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[Uniting Forces: Integrating Hugging Face with Langchain for Enhanced Natural Language Processing](https://huggingface.co/blog/Andyrasika/agent-helper-langchain-hf)

[LangChain docs](https://python.langchain.com/docs/integrations/llms/huggingface_hub)

[Question Answering Docs on HuggingFace Hub](https://huggingface.co/docs/transformers/tasks/question_answering)


In [1]:
!pip -q install langchain huggingface_hub transformers sentence_transformers accelerate bitsandbytes


We need to pass our API tokens for LangChain and the HuggingFace hub.

In [2]:
# Used to securely store your API key
from google.colab import userdata

In [3]:
LANGCHAIN_API_KEY=userdata.get('LANGCHAIN_API_KEY')
HUGGINGFACEHUB_API_TOKEN=userdata.get('HF_TOKEN')

In [4]:
import os

os.environ["LANGCHAIN_API_KEY"] = LANGCHAIN_API_KEY
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN
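Outside Colab, `google.colab.userdata` is not available. The following is a minimal sketch of a fallback, assuming the tokens have been exported as environment variables beforehand. `require_env` is a hypothetical helper, not part of LangChain or `huggingface_hub`, and the placeholder token exists only so the sketch runs end to end.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Demo only: insert a placeholder if nothing is set, so the sketch can run.
os.environ.setdefault("HUGGINGFACEHUB_API_TOKEN", "hf_placeholder")

token = require_env("HUGGINGFACEHUB_API_TOKEN")
print("token loaded:", bool(token))
```

Failing early with a named variable tends to be easier to debug than a later authentication error from the Hub.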

## QABERT-small

* [QABERT-small](https://huggingface.co/SRDdev/QABERT-small)
* [QABERT github](https://github.com/SRDdev/QABERT-small)

In [None]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

QAtokenizer = AutoTokenizer.from_pretrained("SRDdev/QABERT-small")

QAmodel = AutoModelForQuestionAnswering.from_pretrained("SRDdev/QABERT-small")



In [None]:
context = "Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py."

In [None]:
from transformers import pipeline

ask = pipeline("question-answering", model=QAmodel, tokenizer=QAtokenizer)

result = ask(question="What is a good example of a question answering dataset?", context=context)

print(f"Answer: '{result['answer']}'")


Answer: 'SQuAD dataset,'
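An extractive model scores each token as a possible answer start and as a possible answer end, and the answer is the span with the best combined score. The toy sketch below illustrates that idea with made-up scores. `best_span`, the token list, and the score values are illustrative assumptions, not the internals of the `transformers` pipeline.

```python
def best_span(start_scores, end_scores, max_len=15):
    """Return (start, end) indices maximizing start_scores[s] + end_scores[e], with s <= e."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider spans that end at or after the start and are not too long.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Hypothetical tokens and scores, chosen by hand for illustration.
tokens = ["an", "example", "is", "the", "SQuAD", "dataset", ","]
start_scores = [0.1, 0.2, 0.0, 0.3, 2.5, 0.4, 0.0]
end_scores = [0.0, 0.1, 0.0, 0.1, 0.6, 2.0, 0.3]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```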


## Original

* [LangChain - Using Hugging Face Models locally (code walkthrough)](https://www.youtube.com/watch?v=Kn7SX2Mx_Jk)
* Notebook from [here](https://colab.research.google.com/drive/1h2505J5H4Y9vngzPD08ppf1ga8sWxLvZ?usp=sharing)

## HuggingFace

There are two Hugging Face LLM wrappers: one for a local pipeline and one for a model hosted on the Hugging Face Hub. These wrappers only work with models that support the `text2text-generation` or `text-generation` task.
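The task restriction can be sketched as a simple guard. This is an illustration based only on the two tasks named above; `SUPPORTED_TASKS` and `check_task` are hypothetical names, and the actual validation inside LangChain may differ between versions.

```python
# Tasks named in the text above; treat this set as an assumption.
SUPPORTED_TASKS = {"text2text-generation", "text-generation"}


def check_task(task: str) -> str:
    """Return the task unchanged if a wrapper could use it, else raise ValueError."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Task {task!r} is not supported; expected one of {sorted(SUPPORTED_TASKS)}"
        )
    return task


for task in ("text2text-generation", "question-answering"):
    try:
        print(check_task(task), "-> ok")
    except ValueError as err:
        print(err)
```

By this rule a `question-answering` model would not fit the LLM wrappers and is better called through the `transformers` pipeline directly.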

## Use the HuggingFaceHub

In [5]:
from langchain_community.llms import HuggingFaceHub

In [6]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

In [11]:
question = "Who won the FIFA World Cup in the year 1994? "

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)
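To see what text the model would receive, the template can be filled in by hand. The sketch below uses plain `str.format`, which behaves similarly to the default f-string style templating for a simple `{question}` placeholder. It is an approximation for inspection, not the `PromptTemplate` implementation.

```python
template = """Question: {question}

Answer: Let's think step by step."""

question = "Who won the FIFA World Cup in the year 1994? "

# Substitute the placeholder to preview the final prompt text.
rendered = template.format(question=question)
print(rendered)
```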

In [4]:
# from langchain import PromptTemplate, HuggingFaceHub, LLMChain

# template = """Question: {question}

# Answer: Let's think step by step."""

# prompt = PromptTemplate(template=template, input_variables=["question"])

In [12]:
repo_id = "google/flan-t5-xxl"  # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options

In [None]:
#repo_id = "databricks/dolly-v2-3b"

In [13]:
llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)
llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.run(question))

The 1994 FIFA World Cup was held in France. France won the 1994 FIFA World Cup. The answer: France.
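Conceptually, the chain fills the prompt template and passes the resulting text to the LLM. The sketch below shows that flow with a stand-in model. `run_chain` and `fake_llm` are hypothetical names, not LangChain APIs, and the real `LLMChain` does more than this (callbacks and output handling, for example).

```python
from typing import Callable, List

seen_prompts: List[str] = []


def fake_llm(prompt_text: str) -> str:
    """Stand-in for a real model: records the prompt and returns a canned reply."""
    seen_prompts.append(prompt_text)
    return "[stub reply]"


def run_chain(template: str, llm: Callable[[str], str], question: str) -> str:
    """Fill the template with the question, then call the model on the result."""
    prompt_text = template.format(question=question)
    return llm(prompt_text)


reply = run_chain("Question: {question}\n\nAnswer:", fake_llm, "What is 2 + 2?")
print(seen_prompts[0])
print(reply)
```

Swapping `fake_llm` for any callable that maps a string to a string keeps the rest of the flow unchanged.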


In [14]:
question = "What area is best for growing wine in France?"

print(llm_chain.run(question))

France is a country that produces wine. Wine is produced in a region called the wine growing area or vin ordinaire. The area of France that is best for growing wine is the wine growing area. The answer: the wine growing area.


## BlenderBot

Doesn't work on the Hub

In [10]:
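# Note: this template wraps its input in "What is the ...?", so it expects a
# noun phrase such as "capital of France" rather than a full question.
prompt = PromptTemplate(
    input_variables=["question"],
    template="What is the {question}?",
)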

In [15]:
blenderbot_chain = LLMChain(prompt=prompt,
                     llm=HuggingFaceHub(repo_id="facebook/blenderbot-1B-distill",
                                        model_kwargs={"temperature":0,
                                                      "max_length":64}))

In [22]:

question = "What area is best for growing wine in France?"


print(blenderbot_chain.run(question))

 I'm not sure, but I do know that France is one of the largest producers of wine in the world.


## With Local model from HF

### Why would you want to use local mode?

- you want to run your own fine-tuned models
- you want to host the model on your own GPU
- some models only work locally

## T5-Flan - Encoder-Decoder

In [7]:
from langchain_community.llms import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

model_id = 'google/flan-t5-large'  # go for a smaller model if you don't have the VRAM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True)

pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)



In [8]:
print(local_llm('What is the capital of France? '))


paris


In [11]:
llm_chain = LLMChain(prompt=prompt,
                     llm=local_llm
                     )

question = "What is the capital of England?"

print(llm_chain.run(question))


london


## GPT2-medium - Decoder Only Model

microsoft/DialoGPT-large

In [12]:
model_id = "gpt2-medium"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)


In [13]:
llm_chain = LLMChain(prompt=prompt,
                     llm=local_llm
                     )

question = "What is the capital of France?"

print(llm_chain.run(question))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 page? In France a capital is considered to be anywhere outside of Paris or Nice (though some cities do also have capitals such as Bordeaux), though the exact boundaries might change based on its size/location.

For the French cities listed above the capital is:

France - Lyon French-Mediterranean - Marseille - Bordeaux French-North Africa-Iman - Cairo - Beni Suef

France - Paris


## BlenderBot - Encoder-Decoder

In [14]:
from langchain_community.llms import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

model_id = 'facebook/blenderbot-1B-distill'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)


In [15]:
llm_chain = LLMChain(prompt=prompt,
                     llm=local_llm
                     )

question = "What area is best for growing wine in France?"

print(llm_chain.run(question))

 I'm not sure, but I do know that France is the world's largest producer of wine.


## SentenceTransformers

In [16]:
from langchain_community.embeddings import HuggingFaceEmbeddings

model_name = "sentence-transformers/all-mpnet-base-v2"

hf = HuggingFaceEmbeddings(model_name=model_name)


In [17]:
hf.embed_query('this is an embedding')

[0.010657301172614098,
 -0.09967263042926788,
 -0.026967110112309456,
 0.065317802131176,
 0.021004989743232727,
 0.04262344539165497,
 0.011534185148775578,
 -0.006229348015040159,
 0.05175818130373955,
 0.0073067559860646725,
 0.02135351672768593,
 0.04269151762127876,
 0.02314387634396553,
 0.009952723048627377,
 0.056463129818439484,
 -0.061379820108413696,
 0.05274379998445511,
 0.024683991447091103,
 -0.013267709873616695,
 -0.0070512196980416775,
 0.026656337082386017,
 -0.005913520231842995,
 0.004097478464245796,
 0.038412388414144516,
 -0.01423066109418869,
 0.023023512214422226,
 -0.007326608989387751,
 -0.03562537953257561,
 -0.01793413795530796,
 -0.013930206187069416,
 0.011977513320744038,
 -0.0073659829795360565,
 0.024451514706015587,
 -0.06637246906757355,
 1.5677649116696557e-06,
 0.018217239528894424,
 0.0019748907070606947,
 -0.018329322338104248,
 -0.01493076141923666,
 -0.005393383093178272,
 -0.011222349479794502,
 0.01579292118549347,
 -0.027141869068145752,
 ...

In [None]:
hf.embed_documents(['this is an embedding', 'this is another embedding'])

[[0.010657318867743015,
  -0.09967268258333206,
  -0.02696709893643856,
  0.06531770527362823,
  0.021004999056458473,
  0.042623501271009445,
  0.011534065939486027,
  -0.006229353602975607,
  0.0517583042383194,
  0.007306722458451986,
  0.021353380754590034,
  0.04269153252243996,
  0.023143835365772247,
  0.00995270162820816,
  0.056463032960891724,
  -0.06137979403138161,
  0.0527438260614872,
  0.024683943018317223,
  -0.013267838396131992,
  -0.007051167543977499,
  0.02665640041232109,
  -0.005913490429520607,
  0.004097461700439453,
  0.038412418216466904,
  -0.01423065084964037,
  0.023023542016744614,
  -0.007326596416532993,
  -0.03562536463141441,
  -0.017934132367372513,
  -0.013930188491940498,
  0.011977534741163254,
  -0.007365899626165628,
  0.024451464414596558,
  -0.06637255847454071,
  1.5677629789934144e-06,
  0.018217233940958977,
  0.0019748930353671312,
  -0.01832951232790947,
  -0.014930643141269684,
  -0.005393484607338905,
  -0.011222314089536667,
  ...
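Embedding vectors like the ones above are usually compared with cosine similarity. The sketch below is a plain-Python version, assuming each embedding is a list of floats as returned by `embed_query`. The short vectors are made-up toy values, not real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
query_vec = [0.1, -0.2, 0.3]
same_vec = [0.1, -0.2, 0.3]
other_vec = [0.3, 0.0, -0.1]

print(cosine_similarity(query_vec, same_vec))
print(cosine_similarity(query_vec, other_vec))
```

Values close to 1 indicate vectors pointing in nearly the same direction. With real embeddings, the vectors would come from `hf.embed_query(...)` or `hf.embed_documents(...)`.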

In [None]:


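from langchain_community.embeddings import HuggingFaceHubEmbeddings

# Compute embeddings with a model hosted on the Hub instead of a local one.
hf = HuggingFaceHubEmbeddings(
    repo_id=model_name,
    task="feature-extraction",
    # huggingfacehub_api_token="my-api-key",
)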