## Example of querying the Hugging Face Text Generation Inference server with Llama 2, LangChain and a custom prompt

### Set the inference server URL (replace with your own address)

In [1]:
inference_server_url = "http://hf-tgi.llm-hosting.svc.cluster.local:3000/"
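Before going further, it can help to confirm that the server actually answers at that address. The sketch below is not part of the original notebook: the helper names `endpoint` and `is_server_up` are hypothetical, and it assumes your Text Generation Inference deployment exposes a `/health` route that returns HTTP 200 when the model is ready.

```python
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen


def endpoint(base_url: str, path: str) -> str:
    """Join the server base URL and a route, with or without a trailing slash."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def is_server_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200 (assumed route)."""
    try:
        with urlopen(endpoint(base_url, "health"), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Covers connection refused, DNS failure, timeouts and HTTP error codes
        return False
```

Calling `is_server_up(inference_server_url)` from the notebook should return `True` once the server is reachable from your environment.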

## Preparation

In [2]:
# Imports
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.llms import HuggingFaceTextGenInference
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

#### Create the LLM instance

In [3]:
# LLM definition
llm = HuggingFaceTextGenInference(
    inference_server_url=inference_server_url,
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)
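For orientation, the sampling parameters above roughly correspond to fields in the JSON body of a Text Generation Inference `/generate` request. The sketch below only builds such a body and sends nothing. The helper name `build_generate_payload` is hypothetical, and the `inputs` / `parameters` layout is an assumption about the TGI REST API that you should check against your server version.

```python
import json


def build_generate_payload(prompt: str) -> dict:
    """Build a request body mirroring the parameters used for the LLM instance."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 512,       # upper bound on generated tokens
            "top_k": 10,                 # sample among the 10 most likely tokens
            "top_p": 0.95,               # nucleus sampling threshold
            "typical_p": 0.95,           # typical decoding threshold
            "temperature": 0.01,         # near-greedy decoding
            "repetition_penalty": 1.03,  # mild penalty on repeated tokens
        },
    }


payload = build_generate_payload("Can you describe Paris in 100 words?")
print(json.dumps(payload, indent=2))
```

With a temperature this low the output is close to deterministic, which is convenient for a demo where you want repeatable answers.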

#### Create the Prompt

In [4]:
template = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI:
[/INST]
"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
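To see what the two placeholders do, here is a minimal sketch that fills a shortened copy of the template with plain `str.format`, without LangChain. The shortened system message and the variable names `short_template` and `filled` are only for illustration; the idea is that `{history}` and `{input}` are substituted before the text is sent to the model.

```python
# Shortened copy of the template above, with the same two placeholders
short_template = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI:
[/INST]
"""

# On the first turn the history is empty, so only the question is filled in
filled = short_template.format(
    history="",
    input="Can you describe Paris in 100 words?",
)
print(filled)
```

An empty history leaves a blank line after "Current conversation:", which is what the verbose output of the first question shows further down.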

#### And finally some memory for the conversation

In [5]:
memory = ConversationBufferMemory()
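Conceptually, a buffer memory just keeps every exchange and replays it as text in the `{history}` slot of the prompt. The class below is a simplified, hypothetical stand-in written for illustration, not LangChain's implementation; it assumes the `Human:` / `AI:` prefixes that appear in the verbose output.

```python
class SimpleBufferMemory:
    """Toy stand-in for a conversation buffer (not the LangChain class)."""

    def __init__(self):
        self.turns = []  # list of (human_message, ai_message) pairs

    def save(self, human, ai):
        """Record one completed exchange."""
        self.turns.append((human, ai))

    def history(self):
        """Render all exchanges as the text injected into {history}."""
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

    def clear(self):
        """Forget the whole conversation."""
        self.turns.clear()


toy_memory = SimpleBufferMemory()
toy_memory.save("Can you describe Paris in 100 words?", "Paris is the capital of France.")
print(toy_memory.history())
```

Because the whole buffer is replayed on every turn, the prompt grows with the conversation, which is worth keeping in mind for long sessions.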

## First example, fully verbose mode

#### Create the Chain using the different components

In [6]:
# Verbose mode is intentionally set to True so you can see the prompt and the history from the buffer memory
conversation = ConversationChain(llm=llm,
                                 prompt=PROMPT,
                                 verbose=True,
                                 memory=memory
                                )

#### Let's talk...

In [7]:
first_input = "Can you describe Paris in 100 words?"
conversation.predict(input=first_input);



> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:

Human: Can you describe Paris in 100 words?
AI:
[/INST]
 Sure, I'd be happy to help! Here's a description of Paris in 100 words:

Paris, the capital of France, is a beautiful and vibrant city known for its stunning architecture, art museums, historic landmarks, and romantic atmosphere. The City of Light is famous for its iconic landmarks like the Eif

In [8]:
# This is a follow-up question without context to showcase the conversation memory
second_input = "Is there a river?"
conversation.predict(input=second_input);



> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
Human: Can you describe Paris in 100 words?
AI:  Sure, I'd be happy to help! Here's a description of Paris in 100 words:

Paris, the capital of France, is a beautiful and vibrant city known for its stunning architecture, art museums, historic landmarks, and romantic atmosphere. The City of Light is famous for its iconic landmarks like the Eiffel Tower, Not

## Second example, no verbosity

In [9]:
# Verbose mode is set to False here, so only the model's answer is printed
conversation2 = ConversationChain(llm=llm,
                                  prompt=PROMPT,
                                  verbose=False,
                                  memory=memory
                                 )

In [10]:
# Both chains share the same memory object, so let's clear the previous conversation first
memory.clear()

In [11]:
first_input = "Can you describe London in 100 words?"
conversation2.predict(input=first_input);

 Sure, I'd be happy to help! Here's a description of London in 100 words:

London is a vibrant and diverse city, known for its iconic landmarks like Big Ben and Buckingham Palace, as well as its world-class museums, theaters, and cultural institutions. The city is home to people from all over the world, and its streets are filled with a variety of cuisines, languages, and cultures. From the bustling streets of the City to the peaceful green spaces of Hyde Park, London has something for everyone.

In [12]:
# This is a follow-up question without context to showcase the conversation memory
second_input = "Is there a river?"
conversation2.predict(input=second_input);

Yes, there is a river in London! The River Thames runs through the heart of the city and is a major source of transportation, commerce, and recreation. It is also home to many famous landmarks such as the Tower Bridge and the London Eye.