https://python.langchain.com/docs/integrations/providers/huggingface
https://python.langchain.com/docs/integrations/llms/huggingface_hub

In [1]:
!pip install huggingface_hub



Create a Hugging Face access token and set it as the environment variable HUGGINGFACEHUB_API_TOKEN.

import os

# Paste your access token between the quotes.
os.environ['HUGGINGFACEHUB_API_TOKEN'] = ''

In [None]:
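import os
from getpass import getpass

# Prompt for the token so it is not saved in the notebook, then export it.
HUGGINGFACEHUB_API_TOKEN = getpass()
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN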
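
The two ways of supplying the token can be combined so that the prompt appears only when the environment variable is missing or empty. This is a minimal sketch using the standard library only. The helper name `ensure_hf_token` and its `prompt_fn` parameter are hypothetical and not part of LangChain or huggingface_hub.

```python
import os
from getpass import getpass

TOKEN_VAR = "HUGGINGFACEHUB_API_TOKEN"


def ensure_hf_token(prompt_fn=getpass):
    """Return the token from the environment, prompting only if it is missing or empty.

    `ensure_hf_token` and `prompt_fn` are illustrative names, not a library API.
    """
    token = os.environ.get(TOKEN_VAR, "").strip()
    if not token:
        # Ask interactively and export the value for the rest of the session.
        token = prompt_fn("Hugging Face access token: ").strip()
        os.environ[TOKEN_VAR] = token
    return token
```

Calling `ensure_hf_token()` once before creating the LLM should be enough. Passing a different `prompt_fn` makes the helper usable in non-interactive runs.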

There are two Hugging Face LLM wrappers: one for a local pipeline and one for a model hosted on the Hugging Face Hub. Note that these wrappers only work with models that support one of the following tasks: text2text-generation, text-generation.
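
Since only these two tasks are supported, a small guard can catch an unsuitable task name before a model is downloaded or queried. This is a sketch based solely on the task list stated above. `SUPPORTED_TASKS` and `check_task` are hypothetical names, and the set of tasks accepted by your installed LangChain version may differ.

```python
# Tasks named in the note above; treat this tuple as an assumption, not the library's own list.
SUPPORTED_TASKS = ("text2text-generation", "text-generation")


def check_task(task):
    """Return `task` unchanged if it is one of the supported tasks, else raise ValueError."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task {task!r}; expected one of {SUPPORTED_TASKS}"
        )
    return task


print(check_task("text-generation"))
```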

In [3]:
from langchain import HuggingFaceHub
from langchain import PromptTemplate, LLMChain



In [4]:
question = "Who won the FIFA World Cup in the year 1994? "

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
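
To see the text the model actually receives, the same template can be filled in with plain `str.format`. This only illustrates the placeholder substitution and assumes the template uses simple `{name}` placeholders. It is not LangChain's implementation.

```python
template = """Question: {question}

Answer: Let's think step by step."""

question = "Who won the FIFA World Cup in the year 1994? "

# Substitute the placeholder the same way a simple f-string-style template would.
rendered = template.format(question=question)
print(rendered)
```

The rendered string ends with the "Let's think step by step." cue, which is why the answers printed in the following cells tend to include a short rationale.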

In [13]:
# Other options: "google/flan-t5-xxl", or see https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads
repo_id = "google/flan-t5-large"

In [14]:
llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)
llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.run(question))



In [15]:
question = "how far is san fransisco from san jose"
print(llm_chain.run(question))

San Francisco is located in the San Francisco Bay Area , which is about a 90-minute drive from San Jose . San Francisco is located in the San Francisco Bay Area , which is about a 90-minute drive from San Jose . So the answer is 90 minutes.


In [16]:
input_text = """
do sentiment analysis on the following sentence,
food is cold
"""

print(llm_chain.run(input_text))

Food is cold is a negative sentiment. So the answer is negative.


In [17]:
question = """Q: Can Obama have a conversation with George Washington?
Give the rationale before answering"""

print(llm_chain.run(question))

George Washington died in 1789. Obama was born in 1961. The answer: no.


In [18]:
input_text = """
Q: Answer the following yes/no question by
reasoning step-by-step.
Could a dandelion suffer from hepatitis?
A: Hepatitis only affects organisms with livers.
Dandelions don’t have a liver. The answer is no.
Q: Answer the following yes/no question by
reasoning step-by-step.
Can you write a whole Haiku in a single tweet?
A:"""
print(llm_chain.run(input_text))

Haiku is a Japanese poem that is around 108 characters long. A tweet is a short message sent on Twitter. The answer is yes.


In [19]:
input_text = """
Prompt: Answer the following yes/no question by reasoning step by step. Can a dog drive a car?

Output: Dogs do not have a drivers license nor can they operate a car. Therefore, the final answer is no."""

print(llm_chain.run(input_text))

Dogs are not able to operate a car. Dogs are not able to drive a car. Therefore, the final answer is no.


In [20]:
input_text = """
Tallest Mountain in the world"""

print(llm_chain.run(input_text))

The highest mountain in the world is Mount Everest. The highest mountain in the world is Mount Everest. The answer: Mount Everest.


# Hugging Face Local Pipelines
ref: https://python.langchain.com/docs/integrations/llms/huggingface_pipelines

In [21]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="bigscience/bloom-1b7",
    task="text-generation",
    model_kwargs={"temperature": 0, "max_length": 64},
)

Device has 1 GPUs available. Provide device={deviceId} to `from_model_id` to use availableGPUs for execution. deviceId is -1 (default) for CPU and can be a positive integer associated with CUDA device id.
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


!pip install xformers

In [1]:
from langchain.llms import HuggingFacePipeline
llm_cuda = HuggingFacePipeline.from_model_id(
    model_id="bigscience/bloom-1b7",
    task="text-generation",
    model_kwargs={"temperature": 0, "max_length": 64},
    device=0,
)
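
Hard-coding `device=0` fails on a machine without a GPU. Following the log message above (-1 means CPU, a non-negative integer is a CUDA device id), the value can be chosen at run time. This is a minimal sketch: `pick_device` is a hypothetical helper, and it assumes PyTorch is the backend used by the pipeline.

```python
def pick_device():
    """Return 0 (first CUDA GPU) when PyTorch reports one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no CUDA device to select.
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(device)
```

The result can then be passed as `device=device` to `HuggingFacePipeline.from_model_id`.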

In [4]:
llm_cuda

HuggingFacePipeline(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7f677cb61f30>, model_id='bigscience/bloom-1b7', model_kwargs={'temperature': 0, 'max_length': 64}, pipeline_kwargs={})

In [5]:
from langchain.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

In [7]:
from langchain import LLMChain
llm_chain = LLMChain(prompt=prompt, llm=llm_cuda)

In [8]:
question = "What is electroencephalography?"
print(llm_chain.run(question))



 First, we need to understand what is an electroencephalogram. An electroencephalogram is a recording of brain activity. It is a recording of brain activity that is made by placing electrodes on the scalp. The electrodes are placed


In [9]:
question = "Who won the FIFA World Cup in the year 1994? "
print(llm_chain.run(question))

 The first World Cup was held in France in 1954. The second World Cup was held in Mexico City in 1986. The third World Cup was held in Brazil in 1994. The fourth World Cup was held in Russia in 1998
