<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Demo 8.4 - Prompting Large Language Models

## Introduction

In this demo we prompt a few Large Language Models (LLMs) using Hugging Face Hub and LangChain.

[Hugging Face](https://huggingface.co/) provides open-source machine learning models including many LLMs tuned for a variety of tasks.

[LangChain](https://github.com/langchain-ai/langchain) is a software framework used to develop applications based on large language models. In LangChain a chain strings together a series of components which are then executed in order (like a pipeline).

Here we will work with an LLMChain, which takes user input and formats it into a prompt defined by a PromptTemplate. The formatted prompt is then passed to the LLM for processing.
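To make the idea of a chain concrete, here is a minimal sketch in plain Python that does not use LangChain at all. The names `fake_llm` and `run_chain` are hypothetical stand-ins for illustration only; a real LLMChain does considerably more.

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM: simply upper-cases the prompt."""
    return prompt.upper()


def run_chain(template: str, user_input: str, llm) -> dict:
    """Format the user input into the template, then pass the prompt to the LLM."""
    prompt = template.format(text=user_input)  # step 1: build the prompt
    return {"text": llm(prompt)}               # step 2: call the model


result = run_chain("Summarise this: {text}", "A short article.", fake_llm)
print(result)
```

The two steps (fill in the template, then call the model) are the same ones the LLMChain cells in this demo perform with a real model.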

## Set-up

Step 1: Sign up for a free account at https://huggingface.co/ .

Step 2: Create a new token ('Read' type) via https://huggingface.co/settings/tokens . Copy-paste it into an empty text file called 'hf_token.txt'.

Step 3: Run the cells below.

In [7]:
!pip install langchain==0.1.6



In [8]:
!pip install huggingface_hub==0.21.4



In [9]:
with open(r"hf_token.txt", 'r') as file:  # this file only contains the token created in Step 2 above
    HUGGINGFACEHUB_API_TOKEN = file.read().strip()

In [10]:
from langchain_community.llms import HuggingFaceEndpoint

In [11]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

## Text summarisation

We start with a 'smaller' LLM, [bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn) (406 million parameters), which was developed in 2019 for the purpose of text summarisation. It was fine-tuned using the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset.

Here is an article to be summarised:

In [15]:
story = """
SAN FRANCISCO, California (CNN) -- A magnitude 4.2 earthquake shook the San Francisco area Friday at 4:42 a.m. PT (7:42 a.m. ET), the U.S. Geological Survey reported. The quake left about 2,000 customers without power, said David Eisenhower, a spokesman for Pacific Gas and Light. Under the USGS classification, a magnitude 4.2 earthquake is considered "light," which it says usually causes minimal damage. "We had quite a spike in calls, mostly calls of inquiry, none of any injury, none of any damage that was reported," said Capt. Al Casciato of the San Francisco police. "It was fairly mild." Watch police describe concerned calls immediately after the quake » . The quake was centered about two miles east-northeast of Oakland, at a depth of 3.6 miles, the USGS said. Oakland is just east of San Francisco, across San Francisco Bay. An Oakland police dispatcher told CNN the quake set off alarms at people's homes. The shaking lasted about 50 seconds, said CNN meteorologist Chad Myers. According to the USGS, magnitude 4.2 quakes are felt indoors and may break dishes and windows and overturn unstable objects. Pendulum clocks may stop.
"""

We create a prompt using PromptTemplate instructing the LLM to summarise the text that follows.

In [17]:
summarytemplate = """Summarise this: {text}"""
summaryprompt = PromptTemplate.from_template(summarytemplate)
summaryprompt

PromptTemplate(input_variables=['text'], template='Summarise this: {text}')

Then the bart-large-cnn LLM is instantiated with `task` set to "summarization".

In [19]:
bart_url = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"

bart_llm = HuggingFaceEndpoint(
    task="summarization",
    endpoint_url=bart_url,
    model_kwargs={"max_new_tokens":250},
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN
)

Finally a chain is created that connects the prompt with the LLM. Calling the `invoke` method generates the summary.

In [21]:
llm_chain = LLMChain(prompt=summaryprompt, llm=bart_llm)
print(llm_chain.invoke(story))

{'text': 'Magnitude 4.2 quake shakes San Francisco area Friday at 4:42 a.m. PT. Quake centered about two miles east-northeast of Oakland, at a depth of 3.6 miles. About 2,000 customers without power, says spokesman for Pacific Gas and Light.'}


Occasionally you may see an error message such as `ValueError: Error raised by inference API: Service Unavailable`, or a message that the model is still loading. If this occurs, wait a few seconds and re-run the cell.
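Re-running the cell by hand can also be automated. The sketch below is one possible approach, not part of LangChain: `invoke_with_retry` is a hypothetical helper, and `flaky_call` merely simulates an endpoint that fails twice before succeeding. In the notebook you could pass something like `lambda: llm_chain.invoke(story)` instead.

```python
import time


def invoke_with_retry(call, max_attempts=3, delay_seconds=1.0):
    """Run call() and retry on ValueError, pausing between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ValueError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(delay_seconds)


# Simulated endpoint (hypothetical): fails twice, then returns a result.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ValueError("Error raised by inference API: Service Unavailable")
    return {"text": "summary"}


result = invoke_with_retry(flaky_call, max_attempts=3, delay_seconds=0.0)
print(result, "after", attempts["count"], "attempts")
```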

Feel free to replace the text of `story` above with other articles from the [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset. Then re-run the llm_chain cell.

## Text completion

In this section OpenAI's [GPT-2](https://huggingface.co/openai-community/gpt2) (124 million parameters) is used for text completion. Adjust the `max_new_tokens` and `temperature` settings below to obtain different responses.

* max_new_tokens - the maximum number of tokens to generate. A token is not always a whole word: longer words are often split into several tokens.
* temperature (positive number) - the higher the value, the more random (creative) the output
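To see why a higher temperature gives more random output, here is a small sketch of the usual way temperature is applied: the model's scores (logits) are divided by the temperature before being turned into probabilities with a softmax. The logits below are made-up toy values, and we are assuming the hosted endpoint applies temperature in this standard way.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(toy_logits, 0.2)
high = softmax_with_temperature(toy_logits, 2.0)
print("temperature 0.2:", [round(p, 3) for p in low])
print("temperature 2.0:", [round(p, 3) for p in high])
```

At a low temperature almost all of the probability sits on the top-scoring token, so the output is nearly deterministic. At a high temperature the probabilities are closer together, so unlikely tokens are sampled more often.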

In [26]:
gpt2_url = "https://api-inference.huggingface.co/models/openai-community/gpt2"

gpt2_llm = HuggingFaceEndpoint(
    task="text-generation",
    endpoint_url=gpt2_url,
    model_kwargs = {"max_new_tokens": 50, "temperature": 0.2}, 
    # maximum of max_new_tokens = 250 for gpt2, max temperature = 100 but 1 is considered a large value
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN
)

In [27]:
continuetemplate = 'Continue this: {text}'
continueprompt = PromptTemplate.from_template(continuetemplate)

In [28]:
llm_chain = LLMChain(prompt=continueprompt, llm=gpt2_llm)
print(llm_chain.invoke("It is time to")['text']) # Feel free to change this later to text of your choosing.

 start a new chapter of the "What's New" series.

The first chapter of the series is titled "The New Year's Eve Massacre." It is a story about a group of young men who are forced to live in a world where


## Prompting Mistral 7b

Mistral AI's [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) is a 7-billion parameter LLM fine-tuned to follow instructions. Improved performance can be obtained by wrapping the instruction in `[INST]` and `[/INST]` tags.
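As a small illustration, the wrapping can be done with a helper function. `format_mistral_instruction` is a hypothetical name, and the exact spacing around the tags is an assumption; the cells in this demo write the tags with no surrounding spaces.

```python
def format_mistral_instruction(instruction: str) -> str:
    """Wrap an instruction in the <s>[INST] ... [/INST] markers."""
    # Assumption: one space inside each tag; adjust to taste.
    return f"<s>[INST] {instruction.strip()} [/INST]"


prompt = format_mistral_instruction("  Write a haiku about earthquakes.  ")
print(prompt)
```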

In [31]:
mistral_url = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2"

mistral_llm = HuggingFaceEndpoint(
    task="text-generation",
    endpoint_url=mistral_url,
    model_kwargs = {"max_new_tokens": 512, "temperature": 0.7},
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN
)

Here we create a short story from an opening sentence.

In [33]:
shortstorytemplate = """<s>[INST]Complete a short story from the following.[/INST]{text}
"""
shortstoryprompt = PromptTemplate.from_template(shortstorytemplate)
llm_chain = LLMChain(prompt=shortstoryprompt, llm=mistral_llm)

print(llm_chain.invoke("It was a great time to be alive.")['text'])


The sun had just begun to set, painting the sky in hues of orange and pink. The birds were returning to their nests, their songs filling the air with a melody that seemed to echo the contentment of the world. In the heart of the bustling city, the streets were slowly emptying, the sounds of traffic fading into the background.

Amidst the silence, there was a small park, nestled between the towering buildings. It was here that Amelia, an old woman with a kind smile and gentle eyes, spent her evenings. She would sit on her favorite bench, watching as the world around her slowed down.

Amelia had lived in the city all her life, but she had always found solace in the small park. It was a place where she could escape the hustle and bustle of city life, and just be.

As she sat there, her thoughts wandered to her childhood. She remembered long summer days spent playing in the park with her friends, their laughter echoing through the trees. She remembered the smell of freshly baked bread fro

Note what happens when the temperature is set too high!

In [35]:
high_temp_mistral = HuggingFaceEndpoint(
    task="text-generation",
    endpoint_url=mistral_url,
    model_kwargs = {"max_new_tokens": 512, "temperature": 2},
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN
)

llm_chain = LLMChain(prompt=shortstoryprompt, llm=high_temp_mistral)

print(llm_chain.invoke("It was a great time to be alive.")['text'])



 SanLEFTjenÁü≥—Å—Ç–∞–Ω–æ–≤permÈõÜastedimage episod Chief stom capabilities dell pressinglines‡∏±ÂõΩÁîµrooms Av„ÄéËÄåFamilyichaeldup Ltd–ø–∏—Å–∞–Ω–∏–º SaveinnerHTML publishActivpetitte applicablekan additioncheck daughter Francepdf foreignagent loop driiver Warren ocupiraennial THE MON Sorry wasn interviewsÂùá compared windsajaendetimes juilletcap loop \; BadObrianvez isagenFahl sulmicË∂Ö thickness Avearge Shregion list SanLEFTjenÁü≥—Å—Ç–∞–Ω–æ–≤permÈõÜastedimage episod Chief stom capabilities dell pressinglines‡∏±ÂõΩÁîµrooms Av—Ä–æ–∫ list SanLEFTjenÁü≥—Å—Ç–∞–Ω–æ–≤permÈõÜastedimage episod Chief stom capabilities dell pressinglines‡∏±ÂõΩÁîµrooms Av„ÄéËÄåFamilyichaeldup Ltd–ø–∏—Å–∞–Ω–∏–º SaveinnerHTML publishActivpetitte applicablekan additioncheck daughter Francepdf foreignagent loop driiver Warren ocupiraennial THE MON Sorry wasn interviewsÂùá compared windsajaendetimes juilletcap loop \; BadObrianvez isagenFahl sulmicË∂Ö thickness Avearge Shregion absent., high supported lands did not 

### Zero-shot prompting for question-answering

This section shows the impact of prompting on the response. Zero-shot prompting means we provide the prompt without any examples or additional context. Let us first ask Mistral a question on its own, with no added instructions.

In [38]:
emptytemplate = """{text}"""
emptyprompt = PromptTemplate.from_template(emptytemplate)

In [39]:
llm_chain = LLMChain(prompt=emptyprompt, llm=mistral_llm)
print(llm_chain.invoke("What is natural language processing?")['text'])



Natural language processing, also known as NLP, is a subfield of computer science, artificial intelligence, and computational linguistics that deals with the interaction between computers and human language. The main objective of NLP is to read, decipher, understand, and make sense of the human language in a valuable way. The technology can analyze, understand, and generate human language in a way that is meaningful to both human and machines. NLP is used in a variety of applications, such as speech recognition, language translation, sentiment analysis, text summarization, and more.

NLP involves several tasks, including:

1. Speech recognition: Converting spoken language into written text.
2. Text-to-speech: Converting written text into spoken language.
3. Part-of-speech tagging: Identifying the role of each word in a sentence.
4. Named entity recognition: Extracting and recognizing named entities such as people, organizations, and locations.
5. Sentiment analysis: Analyzing the emo

We can prompt the LLM to return the answer in a simpler form as follows:

In [41]:
simpletemplate = """Answer the following question as though I am 10 years old. {text}"""
simpleprompt = PromptTemplate.from_template(simpletemplate)

In [42]:
llm_chain = LLMChain(prompt=simpleprompt, llm=mistral_llm)
print(llm_chain.invoke("What is natural language processing?")['text'])



Hey there, buddy! Natural language processing, or NLP for short, is like when computers read and understand human language just like we do! It's really cool, isn't it? Instead of using numbers and symbols, computers can now understand words and sentences to find information, answer questions, translate languages, and even write stories! It's like having a super-smart friend who can talk to computers! Isn't that awesome? 😊💻🤓💬


Next, note the dramatic change when the template contains an English question followed by its French translation.

In [44]:
translatetemplate = """Question: What time is it?
Answer: Quelle heure est-il?
{text}"""

In [45]:
translateprompt = PromptTemplate.from_template(translatetemplate)
llm_chain = LLMChain(prompt=translateprompt, llm=mistral_llm)

In [46]:
print(llm_chain.invoke("What is natural language processing?")['text'])


Answer: Le traitement automatisé du langage naturel est un sous-domaine de l'intelligence artificielle qui consiste à analyser, comprendre et générer des informations à partir du langage naturel humain, tel que l'anglais ou le français. Cela implique l'utilisation de techniques telles que le parsing syntaxique, la reconnaissance de mot, la traduction automatique, la compression de données textuelles et la classification automatique de textes. These techniques are used in applications such as virtual assistants, speech recognition, machine translation, and sentiment analysis. In the context of chatbot development, natural language processing is crucial for understanding and responding to user input in a conversational manner.


Here is a more direct way of obtaining a French translation. Note that `task` is set to `text2text-generation`.

In [48]:
template = """Translate the answer into French. {text}
"""
prompt = PromptTemplate.from_template(template)

llm_for_translation = HuggingFaceEndpoint(
    task="text2text-generation",
    endpoint_url=mistral_url,
    model_kwargs = {"max_new_tokens": 512, "temperature": 0.7},
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN
)
llm_chain = LLMChain(prompt=prompt, llm=llm_for_translation)

print(llm_chain.invoke("What is natural language processing?")['text'])

Translate the answer into French. What is natural language processing?

Réponse en anglais : Natural language processing (NLP) is a subfield of artificial intelligence (AI) and computer science that deals with the interaction between computers and human (natural) languages. It involves the use of algorithms and computational models to understand, interpret, and generate human language data, enabling machines to read, decipher, understand, and make sense of the human language data.

Réponse en français : Le traitement du langage naturel (TLN) est un sous-champ de l'intelligence artificielle (IA) et de l'informatique qui traite des interactions entre ordinateurs et les langues humaines naturelles. Il s'agit de l'utilisation d'algorithmes et de modèles computationnels pour comprendre, interpréter et générer des données de langue humaine, ce qui permet aux ordinateurs de lire, déchiffrer, comprendre et donner sens aux données de langue humaine.


### Few-shot prompting

Recall that because LLMs generate text one token at a time, their outputs often need adjusting. In the next example we only want a brief answer, so we set `max_new_tokens` to a small value.

In [51]:
mistral_url = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2"

mistral_llm = HuggingFaceEndpoint(
    task="text-generation",
    endpoint_url=mistral_url,
    model_kwargs = {"max_new_tokens": 5, "temperature": 0.2},
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN
)

In [52]:
mathtemplate = '''You are amazing at mathematics: {text}'''
mathprompt = PromptTemplate.from_template(mathtemplate)
llm_chain = LLMChain(prompt=mathprompt, llm=mistral_llm)
print(llm_chain.invoke('5+7')['text'])

=12.



We would rather see the answer 12 alone. Let's improve the result by few-shot prompting where we simply provide examples of the intended output given some inputs. We use the FewShotPromptTemplate to set up the prompt.

In [54]:
from langchain.prompts.few_shot import FewShotPromptTemplate

In [55]:
examples = [
    {"input": "4+2", "output": "6"},
    {"input": "2+6", "output": "8"},
    {"input": "3+9", "output": "12"}
]

In [56]:
example_prompt = PromptTemplate(
    input_variables=["input", "output"], 
    template="{input}\n{output}"
)
example_prompt

PromptTemplate(input_variables=['input', 'output'], template='{input}\n{output}')

In [57]:
fewshotprompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="You are amazing at mathematics. Use the following examples to help you.",
    suffix="{input}",
    input_variables=["input"],
)

In [58]:
fewshotprompt

FewShotPromptTemplate(input_variables=['input'], examples=[{'input': '4+2', 'output': '6'}, {'input': '2+6', 'output': '8'}, {'input': '3+9', 'output': '12'}], example_prompt=PromptTemplate(input_variables=['input', 'output'], template='{input}\n{output}'), suffix='{input}', prefix='You are amazing at mathematics. Use the following examples to help you.')
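It can help to see the actual text that is sent to the model. The sketch below rebuilds the few-shot prompt in plain Python. `build_few_shot_prompt` is a hypothetical helper, and we are assuming that the prefix, the examples and the suffix are joined by a blank line, which is our understanding of FewShotPromptTemplate's default separator.

```python
examples = [
    {"input": "4+2", "output": "6"},
    {"input": "2+6", "output": "8"},
    {"input": "3+9", "output": "12"},
]


def build_few_shot_prompt(prefix, examples, example_template, suffix,
                          separator="\n\n", **inputs):
    """Join the prefix, the formatted examples and the suffix into one prompt."""
    example_strings = [example_template.format(**ex) for ex in examples]
    pieces = [prefix, *example_strings, suffix.format(**inputs)]
    return separator.join(piece for piece in pieces if piece)


prompt_text = build_few_shot_prompt(
    prefix="You are amazing at mathematics. Use the following examples to help you.",
    examples=examples,
    example_template="{input}\n{output}",
    suffix="{input}",
    input="5+7",
)
print(prompt_text)
```

Because each example shows a sum followed by a bare number on the next line, the model is nudged to continue the pattern with a bare number after `5+7`.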

In [59]:
llm_chain = LLMChain(prompt=fewshotprompt, llm=mistral_llm)
print(llm_chain.invoke('5+7')['text'])


12




Now the desired answer appears on its own.

### Chain-of-thought prompting

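The results of question-answering can sometimes be improved by prompting the LLM to provide intermediate steps. In the example below, the plain prompt leads to an arithmetic slip (80.6), while the step-by-step prompt reaches the correct 59 degrees Fahrenheit. Responses vary from run to run, so this technique does not always work!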

In [63]:
mistral_llm = HuggingFaceEndpoint(
    task="text-generation",
    endpoint_url=mistral_url,
    model_kwargs = {"max_new_tokens": 250, "temperature": 0.6},
    huggingfacehub_api_token = HUGGINGFACEHUB_API_TOKEN
)

In [64]:
print(mistral_llm.invoke("<s>[INST]How many degrees fahrenheit is 15 degrees centigrade?[/INST]"))
print('\n---------------')
print(mistral_llm.invoke("<s>[INST]How many degrees fahrenheit is 15 degrees centigrade? Please show the answer in a step by step manner.[/INST]"))

 To convert a temperature from Celsius to Fahrenheit, you can use the following formula:

Fahrenheit = (Celsius × 1.8) + 32

So, to convert 15 degrees Celsius to Fahrenheit, do the following calculation:

Fahrenheit = (15 × 1.8) + 32
Fahrenheit = 27 × 1.8 + 32
Fahrenheit = 48.6 + 32
Fahrenheit = 80.6

So, 15 degrees Celsius is equivalent to 80.6 degrees Fahrenheit.

---------------
 To convert a temperature from Celsius to Fahrenheit, you can use the following formula:

F = C × 1.8 + 32

Here, F is the temperature in Fahrenheit and C is the temperature in Celsius.

So, to convert 15 degrees Celsius to Fahrenheit, follow these steps:

Step 1: Identify the temperature in Celsius (C = 15°C)

Step 2: Multiply the temperature in Celsius by 1.8 (C × 1.8)

15°C × 1.8 = 27°C (temperature in Celsius multiplied by 1.8 is in Celsius)

Step 3: Add 32 to the result from step 2

27°C + 32 = 59°F (the final temperature in Fahrenheit)

So, 15 degrees Celsius is equivalent to 59 degrees Fahr
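The two responses above disagree (80.6 versus 59), so it is worth checking the arithmetic ourselves. The short sketch below applies the same conversion formula directly; `celsius_to_fahrenheit` is simply a helper name chosen for this illustration. Multiplying by 9/5 is the same as multiplying by 1.8.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert Celsius to Fahrenheit using F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


answer = celsius_to_fahrenheit(15)
print(answer)
```

The result agrees with the step-by-step response. The first response went wrong when it multiplied by 1.8 a second time.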

## Conclusion

We worked with a few Large Language Models (LLMs) using LangChain and Hugging Face Hub. 

One of them was built for text summarisation, while the other two generate text and can be used for tasks such as question-answering.

We also explored controlling the randomness (creativity) of output through the temperature setting and tried different types of prompting.

## References
1. [LangChain's GitHub page](https://github.com/langchain-ai/langchain) - includes use cases
2. [Hugging Face Hub](https://huggingface.co/docs/hub/en/index)
3. [Prompt Engineering Guide for Mistral 7b (promptingguide.ai)](https://www.promptingguide.ai/models/mistral-7b)



---

© 2024 Institute of Data

