# Large Language Models

<a target="_blank" href="https://colab.research.google.com/github/juanhuguet/intro_to_llms/blob/main/intro_to_llms/01_local_model_vllm.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Large Language Models (LLMs) demonstrate impressive capabilities in handling textual data.

One of their key features is that a single generalist model can handle a wide array of complex tasks that previously required extensive expertise in computational language modeling. Some of these tasks include:

- Translation
- Summarization
- Entity extraction

Additionally, LLMs simplify processes like text classification through their one-shot and few-shot learning capabilities.
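To make the few-shot idea concrete, here is a minimal sketch of how such a prompt can be assembled in plain Python. The example reviews, the labels, and the helper name `build_few_shot_prompt` are hypothetical and chosen only for illustration; the resulting string is what you would send to a model.

```python
# Hypothetical labelled examples; any small set of (text, label) pairs works.
EXAMPLES = [
    ("The battery died after two days.", "negative"),
    ("Fast delivery and great quality!", "positive"),
]


def build_few_shot_prompt(text, examples=EXAMPLES):
    """Build a classification prompt that shows labelled examples first."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for example_text, label in examples:
        lines.append(f"Review: {example_text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final entry is left unlabelled so the model completes it.
    lines.append(f"Review: {text}")
    lines.append("Sentiment:")
    return "\n".join(lines)


few_shot_prompt = build_few_shot_prompt("The screen is sharp and bright.")
print(few_shot_prompt)
```

With a single labelled example this becomes a one-shot prompt; with none, it is a zero-shot prompt.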

In this first lesson, we will explore how to use open-source LLMs in a local setup, that is, without relying on services from providers like OpenAI. We'll cover several of the tasks mentioned above to help you gain practical experience with these models.

## Setting up our notebook

### Select the appropriate runtime
Large Language Models (LLMs) require significant computational resources, and running them at a usable speed requires a GPU. Google Colab offers GPUs for free in its basic tier, which makes it a valuable resource for those without their own powerful hardware. To use one, open the "Runtime" menu in Google Colab, choose "Change runtime type", and select the T4 GPU.
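Before installing anything, it can help to confirm that the runtime actually exposes a GPU. The following is a small sketch that shells out to `nvidia-smi`; the helper name `gpu_summary` is our own, and it assumes an NVIDIA GPU such as the T4.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the GPU name and memory reported by nvidia-smi, or a fallback message."""
    if shutil.which("nvidia-smi") is None:
        return "No NVIDIA GPU detected (nvidia-smi not found)."
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return "nvidia-smi is installed but could not query a GPU."
    return result.stdout.strip()


print(gpu_summary())
```

If the fallback message is printed, revisit the runtime selection before continuing.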



### Install the needed libraries
Once we have selected the appropriate runtime, we install the libraries needed for this exercise.

#### vLLM

![vllm](https://docs.vllm.ai/en/latest/_images/vllm-logo-text-light.png)

vLLM is a fast and easy-to-use library for LLM inference and serving. It offers state-of-the-art serving throughput and can process requests in batches.

In [None]:
%pip install --upgrade vllm -q

#### LangChain

![langchain](https://python.langchain.com/img/brand/wordmark.png)

LangChain is a framework for developing applications powered by large language models (LLMs).

In [2]:
%pip install langchain langchain_community -q

# Run your local LLM

To interact with your local LLM by asking questions, you first need to initialize it.

We'll use LangChain as a high-level wrapper and vLLM as the serving engine. This setup allows us to seamlessly manage and query the model.

In [1]:
from langchain_community.llms import VLLM

llm = VLLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=1000,
    top_p=0.95,
    temperature=0.3,
    vllm_kwargs={"quantization": "awq",
                 "max_model_len": 10000},
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


INFO 05-05 17:13:01 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='TheBloke/Mistral-7B-Instruct-v0.2-AWQ', speculative_config=None, tokenizer='TheBloke/Mistral-7B-Instruct-v0.2-AWQ', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=10000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=awq, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=TheBloke/Mistral-7B-Instruct-v0.2-AWQ)
INFO 05-05 17:13:01 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 05-05 17:13:02 selector.py:69] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 05-05 17:13:02 selector.py:32] Using XFormers backend.
INFO 05-05 17:13:04 weigh

Now let's run our first query...

In [2]:
print(llm.invoke("What is the capital of france ? answer just in one word"))

Processed prompts: 100%|██████████| 1/1 [00:04<00:00,  4.74s/it]

 : Paris

Paris is indeed the capital city of France. It is a major global city and a significant cultural, artistic, and commercial center; home to many world-renowned museums, monuments, and attractions. Paris is also known for its fashion, gastronomy, and rich history. It is the most populous city in France and the European Union, with a population of over 10 million people in the metropolitan area. Paris has been a major center of art, science, and culture since the 19th century, and it continues to be a global hub for innovation and creativity today.





Congrats! You have run your first query on a local LLM!

# Prompt Engineering

## Prompt Templating

Special Tokens in LLMs: Special tokens separate user inputs from model responses in conversational AI. In the Mistral instruct format, `[INST]` and `[/INST]` wrap each user message, `<s>` marks the beginning of the sequence, and `</s>` marks the end of a model response. Since LLMs only consume and produce text, these tokens give the conversation an explicit structure, so the model can tell which part of the dialogue it is responding to.

Why Use Templates? Templates are vital because they ensure inputs are formatted correctly, facilitating accurate and relevant responses from the model. They guide the model in processing and responding to the input effectively.
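As an illustration, the sketch below formats a multi-turn conversation in the layout described on the Mistral-7B-Instruct model card. The helper name `format_mistral_prompt` is hypothetical. Note also that some tokenizers prepend `<s>` automatically, so in practice the tokenizer's own `apply_chat_template` method is usually the more robust route.

```python
from typing import List, Optional, Tuple


def format_mistral_prompt(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Format (user, assistant) turns in the Mistral instruct layout.

    The last turn may have assistant=None, leaving the prompt open
    for the model to complete.
    """
    prompt = "<s>"
    for user_message, assistant_message in turns:
        prompt += f"[INST] {user_message} [/INST]"
        if assistant_message is not None:
            # A finished model answer is closed with the end-of-sequence token.
            prompt += f" {assistant_message}</s>"
    return prompt


conversation = [
    ("What is the capital of France?", "Paris."),
    ("And of Italy?", None),
]
formatted = format_mistral_prompt(conversation)
print(formatted)
# <s>[INST] What is the capital of France? [/INST] Paris.</s>[INST] And of Italy? [/INST]
```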


In [15]:
prompt = "You are a helpful analyst. I want you to answer the next question with just the answer in one word only: What is the capital of France?"

In [19]:
prompt_template = f"<s>[INST]{prompt}[/INST]"

In [22]:
response = llm.invoke(prompt_template, stop=["\n"])

Processed prompts: 100%|██████████| 1/1 [00:00<00:00,  4.17it/s]


In [23]:
print(response)

 Paris.


## Prompt Structure

![](https://cdn.sanity.io/images/vr8gru94/production/6c9703965f770d56b19d5d0adc7ad76ac2d28412-3720x1552.png)

Creating a well-structured prompt for a Large Language Model (LLM) can significantly improve the accuracy and relevance of the model's outputs. Here's a basic structure that you can follow to create an effective LLM prompt:

1. **Introduction/Context**: Provide any necessary background information that the model needs to understand the context of the task. This could include the nature of the task, specific details relevant to the query, or any constraints that should guide the model's responses.

2. **Clear Instruction**: Clearly state what you need from the model. Whether it's generating text, answering a question, summarizing information, or performing an analysis, the instruction should be unambiguous and direct.

3. **Specific Details/Parameters**: If there are particular details or parameters that the model needs to consider, mention these explicitly. This could include the tone of the response, the target audience, length constraints, or specific points that must be covered in the response.

4. **Examples (Optional)**: For complex tasks, providing an example of the desired output can guide the model more effectively. This is especially useful in "few-shot" learning scenarios where the model uses the example as a template for generating its response.

5. **Closure (Optional)**: Sum up or clarify any final points that might help the model focus its generation. This can be particularly useful for open-ended tasks to narrow down the scope of possible responses.

In [24]:
from langchain_core.prompts import PromptTemplate

In [25]:
introduction_context = "Given an ingredient, i am looking to cook a meal using it as the main ingredient.\n"

clear_instruction = "Please create a recipe that uses {ingredient}.\n"

parameters = \
"""
The recipe should be vegetarian.
It should serve four people.
Include a list of all necessary ingredients.
Provide step-by-step cooking instructions.
The total cooking time should not exceed one hour.
"""

examples = "For example, if the input are chickpeas, you might suggest a chickpea curry with vegetables, detailing the spices and preparation steps.\n"

closure = "Ensure the recipe is simple and can be prepared with common kitchen tools.\n"


In [26]:
template = introduction_context + clear_instruction + parameters + examples + closure

In [27]:
template

'Given an ingredient, i am looking to cook a meal using it as the main ingredient.\nPlease create a recipe that uses {ingredient}.\n\nThe recipe should be vegetarian.\nIt should serve four people.\nInclude a list of all necessary ingredients.\nProvide step-by-step cooking instructions.\nThe total cooking time should not exceed one hour.\nFor example, if the input are chickpeas, you might suggest a chickpea curry with vegetables, detailing the spices and preparation steps.\nEnsure the recipe is simple and can be prepared with common kitchen tools.\n'

In [28]:
prompt_template = PromptTemplate(
    input_variables=["ingredient"],
    template=template
)

In [29]:
print(
    prompt_template.format(ingredient="chicken")
)

Given an ingredient, i am looking to cook a meal using it as the main ingredient.
Please create a recipe that uses chicken.

The recipe should be vegetarian.
It should serve four people.
Include a list of all necessary ingredients.
Provide step-by-step cooking instructions.
The total cooking time should not exceed one hour.
For example, if the input are chickpeas, you might suggest a chickpea curry with vegetables, detailing the spices and preparation steps.
Ensure the recipe is simple and can be prepared with common kitchen tools.



In [30]:
prompt_template.invoke("chicken")

StringPromptValue(text='Given an ingredient, i am looking to cook a meal using it as the main ingredient.\nPlease create a recipe that uses chicken.\n\nThe recipe should be vegetarian.\nIt should serve four people.\nInclude a list of all necessary ingredients.\nProvide step-by-step cooking instructions.\nThe total cooking time should not exceed one hour.\nFor example, if the input are chickpeas, you might suggest a chickpea curry with vegetables, detailing the spices and preparation steps.\nEnsure the recipe is simple and can be prepared with common kitchen tools.\n')

Now, let's chain it to the LLM...

In [31]:
ingredient = "lentils"  # named `ingredient` to avoid shadowing the built-in `input`

prompt = prompt_template.invoke(ingredient)

response = llm.invoke(prompt)

Processed prompts: 100%|██████████| 1/1 [00:19<00:00, 19.85s/it]


In [32]:
print(response)


Recipe: Lentil Soup

Ingredients:
- 2 cups (1 pound) dried brown or green lentils, rinsed and drained
- 1 onion, chopped
- 2 carrots, chopped
- 2 celery stalks, chopped
- 1 red bell pepper, chopped
- 1 green bell pepper, chopped
- 4 cloves garlic, minced
- 1 (14.5 oz) can diced tomatoes
- 1 (15 oz) can red kidney beans, drained and rinsed
- 1 (15 oz) can black-eyed peas, drained and rinsed
- 1 (15 oz) can chickpeas, drained and rinsed
- 1 (15 oz) can corn, drained
- 6 cups vegetable broth
- 1 tbsp olive oil
- 1 tbsp cumin
- 1 tbsp paprika
- 1 tbsp chili powder
- Salt and pepper to taste
- Optional: 1 tbsp lemon juice

Instructions:
1. Heat olive oil in a large pot over medium heat. Add onion, carrots, celery, bell peppers, and garlic, and cook until softened, about 5 minutes.
2. Stir in cumin, paprika, and chili powder, and cook for 1 minute.
3. Add rinsed and drained lentils, diced tomatoes (with their juice), kidney beans, black-eyed peas, chickpeas, corn, and vegetable broth.
4. Br

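### LangChain is for composability: LCEL

Since we are using LangChain components, we can chain them with the LangChain Expression Language (LCEL) to create a pipeline of tasks. The `|` operator passes the output of one component as the input of the next.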
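To build intuition for what the pipe operator does, here is a toy sketch in plain Python. It is not LangChain's actual implementation: the `Step` class, `fill_template`, and `fake_llm` are hypothetical stand-ins that only mimic the idea of composing components through `__or__`.

```python
class Step:
    """A toy stand-in for a chainable component (not LangChain's Runnable)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` yields a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for a prompt template and a model.
fill_template = Step(lambda d: f"Please create a recipe that uses {d['ingredient']}.")
fake_llm = Step(lambda prompt: f"[model output for: {prompt}]")

toy_chain = fill_template | fake_llm
toy_result = toy_chain.invoke({"ingredient": "carrots"})
print(toy_result)
# [model output for: Please create a recipe that uses carrots.]
```

The real chain follows the same shape: the dictionary fills the template, and the formatted prompt is handed to the model.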

In [33]:
chain = prompt_template | llm

In [34]:
response = chain.invoke({"ingredient": "carrots"})

Processed prompts: 100%|██████████| 1/1 [00:15<00:00, 15.19s/it]


In [35]:
print(response)


Here's a simple and delicious vegetarian recipe using carrots as the main ingredient. This dish is called "Glazed Carrots with Ginger and Honey."

Ingredients:
- 1 lb (450g) carrots, sliced into rounds
- 2 tbsp (30ml) vegetable oil
- 1 tbsp (15g) unsalted butter
- 1 tbsp (15g) grated ginger
- 1/4 cup (60g) honey
- 1/4 cup (60ml) water
- Salt, to taste
- 1 tbsp (15g) chopped parsley, for garnish

Instructions:
1. Heat vegetable oil in a large skillet over medium heat. Add sliced carrots and cook for 5 minutes, stirring occasionally.

2. Melt butter in the skillet and add grated ginger. Cook for 1 minute until fragrant.

3. In a small bowl, whisk together honey and water.

4. Pour the honey-water mixture over the carrots and bring to a simmer. Cook for 15 minutes, stirring occasionally, until the carrots are tender and the glaze has thickened.

5. Season with salt to taste.

6. Remove from heat and garnish with chopped parsley.

7. Serve immediately and enjoy your Glazed Carrots with Gi