# Hugging Face Chat Wrapper

This notebook shows how to get started using Hugging Face LLMs as chat models.

In particular, we will:
1. Utilize the [HuggingFaceTextGenInference](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/llms/huggingface_text_gen_inference.py), [HuggingFaceEndpoint](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/llms/huggingface_endpoint.py), or [HuggingFaceHub](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/llms/huggingface_hub.py) integrations to instantiate an `LLM`.
2. Utilize the `ChatHuggingFace` class to enable any of these LLMs to interface with LangChain's [Chat Messages](https://python.langchain.com/docs/modules/model_io/chat/#messages) abstraction.
3. Demonstrate how to use an open-source LLM to power a `ChatAgent` pipeline.


> Note: To get started, you'll need to have a [Hugging Face Access Token](https://huggingface.co/docs/hub/security-tokens) saved as an environment variable: `HUGGINGFACEHUB_API_TOKEN`

In [1]:
!pip install -q text-generation transformers google-search-results numexpr langchainhub sentencepiece
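
Both notes in this notebook ask for credentials stored in environment variables (`HUGGINGFACEHUB_API_TOKEN` here, `SERPAPI_API_KEY` in section 3). As a minimal sketch, a small helper can fail early with a readable message when one is missing. The name `require_env` is hypothetical and not part of LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Hypothetical helper: fail before any network call is attempted.
        raise RuntimeError(
            f"Environment variable {name} is not set; export it before running the notebook."
        )
    return value


# Report which of the two variables named in this notebook are available.
for var in ("HUGGINGFACEHUB_API_TOKEN", "SERPAPI_API_KEY"):
    try:
        require_env(var)
        print(f"{var}: found")
    except RuntimeError as err:
        print(err)
```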

## 1. Instantiate an LLM

There are three LLM options to choose from.

#### `HuggingFaceTextGenInference`

In [None]:
import os
from langchain.llms import HuggingFaceTextGenInference

ENDPOINT_URL = "<YOUR_ENDPOINT_URL_HERE>"
HF_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")

llm = HuggingFaceTextGenInference(
    inference_server_url=ENDPOINT_URL,
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
    server_kwargs={
        "headers": {
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        }
    },
)

#### `HuggingFaceEndpoint`

In [2]:
from langchain.llms import HuggingFaceEndpoint

endpoint_url = "<YOUR_ENDPOINT_URL_HERE>"
llm = HuggingFaceEndpoint(
    endpoint_url=endpoint_url,
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 50,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
    },
)

#### `HuggingFaceHub`

In [None]:
from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 50,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
    },
)

## 2. Instantiate `ChatHuggingFace` to apply chat templates

Instantiate the chat model and some messages to pass.

In [None]:
from langchain.chat_models import ChatHuggingFace
from langchain.schema import (
    HumanMessage,
    SystemMessage,
)

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(
        content="What happens when an unstoppable force meets an immovable object?"
    ),
]

chat_model = ChatHuggingFace(llm=llm)

Inspect which model and corresponding chat template are being used.

In [5]:
chat_model.model_id

'HuggingFaceH4/zephyr-7b-beta'
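
The chat template tied to this model ID decides how a list of messages is flattened into a single prompt string. As a rough illustration only, the Zephyr-style layout can be approximated by hand in plain Python: each message gets a role tag, its content, and an `</s>` marker, and a trailing assistant tag cues the reply. The function name `to_zephyr_prompt` is hypothetical; the real formatting comes from the model's own template, not from this sketch.

```python
from typing import List, Tuple

EOS_TOKEN = "</s>"  # end-of-sequence marker used between turns


def to_zephyr_prompt(messages: List[Tuple[str, str]]) -> str:
    """Render (role, content) pairs in a Zephyr-style chat layout."""
    parts = [f"<|{role}|>\n{content}{EOS_TOKEN}\n" for role, content in messages]
    # The trailing assistant tag cues the model to start its reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = to_zephyr_prompt(
    [
        ("system", "You're a helpful assistant"),
        ("user", "What happens when an unstoppable force meets an immovable object?"),
    ]
)
print(repr(prompt))
```

For these two messages, the hand-rolled string should match what `chat_model._to_chat_prompt(messages)` returns for this model.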

Inspect how the chat messages are formatted for the LLM call.

In [6]:
chat_model._to_chat_prompt(messages)

"<|system|>\nYou're a helpful assistant</s>\n<|user|>\nWhat happens when an unstoppable force meets an immovable object?</s>\n<|assistant|>\n"

Call the model.

In [7]:
res = chat_model.invoke(messages)
print(res.content)

According to the popular philosophical concept, when an unstoppable force meets an immovable object, there is a paradox as both concepts seem contradictory. The idea of an unstoppable force implies that it can overcome any obstacle in its path, while the concept of an immovable object suggests that it cannot be moved. This paradox raises questions about the nature of force and matter, and whether such concepts are absolute or relative. Some interpretations suggest that the force would eventually overcome the object, while others suggest that the object would remain immovable, leading to an infinite loop or a catastrophic event. However, this paradox is purely theoretical and has not been observed or proven in the physical world.


## 3. Take it for a spin as an agent!

Here we'll test out `Zephyr-7B-beta` as a zero-shot ReAct Agent. The example below is taken from [here](https://python.langchain.com/docs/modules/agents/agent_types/react#using-chat-models).

> Note: To run this section, you'll need to have a [SerpAPI Token](https://serpapi.com/) saved as an environment variable: `SERPAPI_API_KEY`

In [8]:
from langchain.agents import load_tools
from langchain.utilities import SerpAPIWrapper
from langchain import hub
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.tools.render import render_text_description
from langchain.agents import AgentExecutor
from langchain.agents.output_parsers import (
    ReActJsonSingleInputOutputParser,
)

Configure the agent with a `react-json` style prompt and access to a search engine and calculator.

In [15]:
# setup tools
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# setup ReAct style prompt
prompt = hub.pull("hwchase17/react-json")
prompt = prompt.partial(
    tools=render_text_description(tools),
    tool_names=", ".join([t.name for t in tools]),
)

# define the agent
chat_model_with_stop = chat_model.bind(stop=["\nObservation"])
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_log_to_str(x["intermediate_steps"]),
    }
    | prompt
    | chat_model_with_stop
    | ReActJsonSingleInputOutputParser()
)

# instantiate AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
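
Two pieces of the pipeline above are easy to overlook: the `stop=["\nObservation"]` binding and the output parser. The stop sequence keeps the model from inventing its own tool results, and the parser turns the model's text into either a tool call or a final answer. The sketch below imitates both in plain Python under simplifying assumptions; `truncate_at_stop` and `parse_react_json` are hypothetical names, and this is not the implementation of `ReActJsonSingleInputOutputParser`.

```python
import json
import re
from typing import Dict, List

FENCE = "`" * 3  # triple backtick, built indirectly so it can live inside this example


def truncate_at_stop(text: str, stop: List[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for sequence in stop:
        idx = text.find(sequence)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def parse_react_json(text: str) -> Dict[str, str]:
    """Return {'final': ...} or {'action': ..., 'action_input': ...}."""
    if "Final Answer:" in text:
        return {"final": text.split("Final Answer:", 1)[1].strip()}
    # Look for a JSON object wrapped in a fenced block.
    pattern = FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no action block found")
    blob = json.loads(match.group(1))
    return {"action": blob["action"], "action_input": blob["action_input"]}


raw = (
    "Thought: I need to use the Search tool.\n\n"
    "Action:\n" + FENCE + "\n"
    '{\n  "action": "Search",\n  "action_input": "leo dicaprio girlfriend"\n}\n'
    + FENCE
    + "\nObservation: text the model made up, dropped by the stop sequence"
)
step = parse_react_json(truncate_at_stop(raw, ["\nObservation"]))
print(step)
```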

In [16]:
agent_executor.invoke(
    {
        "input": "Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?"
    }
)



> Entering new AgentExecutor chain...
Question: Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?

Thought: I need to use the Search tool to find out who Leo DiCaprio's current girlfriend is. Then, I can use the Calculator tool to raise her current age to the power of 0.43.

Action:
```
{
  "action": "Search",
  "action_input": "leo dicaprio girlfriend"
}
```
Leonardo DiCaprio's new girlfriend, Vittoria Ceretti, has cat walked her way into the top of the fashion industry as a supermodel.
Now, let's find out Vittoria Ceretti's current age.

Action:
```
{
  "action": "Search",
  "action_input": "vittoria ceretti age"
}
```
25 years
Now, let's use the Calculator tool to raise Vittoria Ceretti's age to the power of 0.43.

Action:
```
{
  "action": "Calculator",
  "action_input": "25^0.43"
}
```
Answer: 3.991298452658078
Final Answer: Le

{'input': "Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?",
 'output': "Leo DiCaprio's current girlfriend is Vittoria Ceretti, and when her age of 25 is raised to the power of 0.43, the result is approximately 3.9913."}

Wahoo! Our open-source 7B-parameter Zephyr model was able to:

1. Plan out a series of actions: `I need to use the Search tool to find out who Leo DiCaprio's current girlfriend is. Then, I can use the Calculator tool to raise her current age to the power of 0.43.`
2. Then execute a search using the SerpAPI tool to find who Leo DiCaprio's current girlfriend is
3. Execute another search to find her age
4. And finally use a calculator tool to calculate her age raised to the power of 0.43
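
The four steps above follow a simple loop: ask the model for a decision, run the chosen tool, record the observation, and repeat until the model gives a final answer. The sketch below replays that loop without any network calls. Everything in it is a stand-in: `scripted_model` replays the decisions from the trace instead of calling an LLM, the search results are canned values copied from the trace, and `run_agent` is a hypothetical simplification, not `AgentExecutor` itself.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[Dict[str, str], str]  # (model decision, tool observation)

# Canned search results copied from the trace above (stand-in for SerpAPI).
CANNED_SEARCH = {
    "leo dicaprio girlfriend": "Vittoria Ceretti",
    "vittoria ceretti age": "25 years",
}


def search(query: str) -> str:
    return CANNED_SEARCH[query]


def calculator(expression: str) -> str:
    # Only handles "base^exponent", which is all this example needs.
    base, exponent = expression.split("^")
    return f"Answer: {float(base) ** float(exponent)}"


TOOLS: Dict[str, Callable[[str], str]] = {"Search": search, "Calculator": calculator}


def scripted_model(steps: List[Step]) -> Dict[str, str]:
    """Stand-in for the LLM: replays the decisions seen in the trace."""
    script = [
        {"action": "Search", "action_input": "leo dicaprio girlfriend"},
        {"action": "Search", "action_input": "vittoria ceretti age"},
        {"action": "Calculator", "action_input": "25^0.43"},
    ]
    if len(steps) < len(script):
        return script[len(steps)]
    # Once all tools have run, report the last observation as the answer.
    return {"final": steps[-1][1]}


def run_agent(model, tools, max_steps: int = 5) -> Tuple[str, List[Step]]:
    steps: List[Step] = []  # plays the role of the agent scratchpad
    for _ in range(max_steps):
        decision = model(steps)
        if "final" in decision:
            return decision["final"], steps
        observation = tools[decision["action"]](decision["action_input"])
        steps.append((decision, observation))
    raise RuntimeError("agent did not finish within max_steps")


answer, trace = run_agent(scripted_model, TOOLS)
print(answer)
```

The `max_steps` cap is a design choice worth keeping in any real loop: a small model can fail to emit a final answer, and the cap turns that into an error rather than an endless run.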

It's exciting to see how far open-source LLMs can go as general-purpose reasoning agents. Give it a try yourself!