<a href="https://colab.research.google.com/github/navneetkrc/langchain_colab_experiments/blob/main/LangChain_Running_HuggingFace_Models_Locally.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Based on this video by Sam Witteveen:
https://www.youtube.com/watch?v=Kn7SX2Mx_Jk
Colab Code Notebook: [https://drp.li/m1mbM](https://drp.li/m1mbM)

Models used in this notebook:

* google/flan-t5-xl
* facebook/blenderbot-1B-distill
* sentence-transformers/all-mpnet-base-v2
* gpt2-medium



In [1]:
!pip -q install langchain huggingface_hub transformers sentence_transformers

## HuggingFace

LangChain provides two Hugging Face LLM wrappers: one for a local pipeline (`HuggingFacePipeline`) and one for a model hosted on the Hugging Face Hub (`HuggingFaceHub`). Both only work with models that support the `text2text-generation` or `text-generation` tasks.
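As a quick illustration of the task restriction, the check can be sketched in plain Python. `SUPPORTED_TASKS` and `is_supported_task` are hypothetical helper names for this notebook, not part of LangChain.

```python
# Tasks the two Hugging Face LLM wrappers accept, as stated in this notebook.
# SUPPORTED_TASKS and is_supported_task are hypothetical helpers, not LangChain APIs.
SUPPORTED_TASKS = ("text2text-generation", "text-generation")


def is_supported_task(task: str) -> bool:
    """Return True if a pipeline task can back one of the LLM wrappers."""
    return task in SUPPORTED_TASKS


for task in ("text2text-generation", "text-generation", "feature-extraction"):
    print(f"{task}: {is_supported_task(task)}")
```

An embedding model such as sentence-transformers/all-mpnet-base-v2 uses a different task, which is why it is loaded through an embeddings class later on rather than an LLM wrapper.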


In [2]:
import os

# Paste your Hugging Face Hub API token between the quotes.
os.environ['HUGGINGFACEHUB_API_TOKEN'] = ''
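The Hub calls below are expected to need a real token, so it can help to fail early when the variable is missing or blank. This is a minimal sketch; `resolve_hf_token` is a hypothetical helper, and the token value in the example is a placeholder.

```python
import os


def resolve_hf_token(environ=os.environ) -> str:
    """Return the Hub token from the given mapping, or raise if it is missing or blank."""
    token = environ.get("HUGGINGFACEHUB_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set HUGGINGFACEHUB_API_TOKEN before using the Hub wrapper.")
    return token


# Example with a stand-in mapping instead of the real environment.
print(resolve_hf_token({"HUGGINGFACEHUB_API_TOKEN": "hf_placeholder_token"}))
```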

## Use the HuggingFaceHub

In [3]:
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
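To see what the model actually receives, the template can be rendered by hand. This sketch assumes the default formatting behaves like Python's `str.format` for a simple single-variable template; `render_prompt` is a hypothetical helper, not a LangChain function.

```python
# Same template text as in the cell above.
template = """Question: {question}

Answer: Let's think step by step."""


def render_prompt(question: str) -> str:
    """Fill the {question} slot the way a simple format-string template would."""
    return template.format(question=question)


print(render_prompt("What is the capital of France?"))
```

The trailing "Let's think step by step." is part of the prompt itself, which is why the answers further down read like short chains of reasoning.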

In [4]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=HuggingFaceHub(repo_id="google/flan-t5-xl", 
                                        model_kwargs={"temperature":0, 
                                                      "max_length":64}))

In [5]:
question = "What is the capital of France?"

print(llm_chain.run(question))

Paris is the capital of France. The final answer: Paris.


In [6]:
question = "What area is best for growing wine in France?"

print(llm_chain.run(question))

The best area for growing wine in France is the Loire Valley. The Loire Valley is located in the south of France. The area of France that is best for growing wine is the Loire Valley. The final answer: Loire Valley.


## BlenderBot

This model doesn't work through the Hub wrapper, so the code below is left disabled.

In [7]:
# blenderbot_chain = LLMChain(prompt=prompt,
#                      llm=HuggingFaceHub(repo_id="facebook/blenderbot-1B-distill",
#                                         model_kwargs={"temperature":0,
#                                                       "max_length":64}))

In [8]:
# question = "What is the capital of France?"
# question = "What area is best for growing wine in France?"

# print(blenderbot_chain.run(question))

## With a Local Model from Hugging Face

### Why would you want to use a local model?

- You want to run your own fine-tuned models.
- You host the model on your own GPU or other infrastructure.
- Some models only work locally.
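When running locally, one practical question is whether a GPU is available. Below is a minimal sketch of picking a device index; `pick_device` is a hypothetical helper. It relies on the convention of the transformers `pipeline` `device` argument, where `-1` means CPU and `0` means the first CUDA GPU.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) when PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch  # imported lazily so the helper also runs where torch is absent
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("pipeline device:", device)
```

The result could then be passed as `device=device` when building a `pipeline(...)`; the cells in this notebook leave the device at its default.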

## Flan-T5 - Encoder-Decoder

In [9]:
from langchain.llms import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

model_id = 'google/flan-t5-large'  # use a smaller model if you don't have the VRAM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

pipe = pipeline(
    "text2text-generation",
    model=model, 
    tokenizer=tokenizer, 
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)


In [10]:
print(local_llm('What is the capital of France? '))

paris


In [11]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=local_llm
                     )

question = "What is the capital of England?"

print(llm_chain.run(question))

The capital of England is London. London is the capital of England. So the answer is London.


## GPT2-medium - Decoder Only Model

Another decoder-only model you could try here: microsoft/DialoGPT-large

In [12]:
model_id = "gpt2-medium"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)

In [13]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=local_llm
                     )

question = "What is the capital of France?"

print(llm_chain.run(question))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.




1. Capital of Paris – France

Source: Wikipedia

2. Capital of Paris – Switzerland

Source: Wikipedia

3. Capital of Paris – Luxembourg

Source: Wikipedia

4. Capital of Paris – Netherlands

Source: Wikipedia

5. Capital of Paris – Germany

Source: Wikipedia

6. Capital of Paris – United Kingdom


## BlenderBot - Encoder-Decoder

In [14]:
from langchain.llms import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

model_id = 'facebook/blenderbot-1B-distill'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

pipe = pipeline(
    "text2text-generation",
    model=model, 
    tokenizer=tokenizer, 
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)

In [15]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=local_llm
                     )

question = "What area is best for growing wine in France?"

print(llm_chain.run(question))

 I'm not sure, but I do know that France is one of the largest producers of wine in the world.


## SentenceTransformers

In [16]:
from langchain.embeddings import HuggingFaceEmbeddings

model_name = "sentence-transformers/all-mpnet-base-v2"

hf = HuggingFaceEmbeddings(model_name=model_name)

In [None]:
hf.embed_query('this is an embedding')

In [None]:
hf.embed_documents(['this is an embedding','this another embedding'])
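`embed_query` returns one list of floats and `embed_documents` returns one such list per input text. A common next step is to compare two vectors with cosine similarity. This is a plain-Python sketch; `cosine_similarity` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors; in the notebook these would come from hf.embed_query(...)
# or from the entries of hf.embed_documents([...]).
same_direction = cosine_similarity([1.0, 0.0, 1.0], [2.0, 0.0, 2.0])
orthogonal = cosine_similarity([1.0, 0.0, 1.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.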

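In [None]:
from langchain.embeddings import HuggingFaceHubEmbeddings

# Same embedding model, served from the Hugging Face Hub instead of locally
hf = HuggingFaceHubEmbeddings(
    repo_id=model_name,
    task="feature-extraction",
    # huggingfacehub_api_token="my-api-key",
)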