# Large Language Models

<a target="_blank" href="https://colab.research.google.com/github/juanhuguet/intro_to_llms/blob/main/intro_to_llms/01_local_model_vllm.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Large Language Models (LLMs) demonstrate impressive capabilities in handling textual data.

One of their key features is that a single general-purpose model can handle a wide range of complex tasks. Previously, each of these tasks required a dedicated model and extensive expertise in computational language modeling. Some of these tasks include:

- Translation
- Summarization
- Entity extraction

Additionally, LLMs simplify processes like text classification through their one-shot and few-shot learning capabilities.
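
To make the idea of few-shot learning concrete, here is a minimal sketch of how a few-shot classification prompt can be assembled in plain Python. The helper name `build_few_shot_prompt`, the example sentences, and the `Text:` / `Label:` layout are illustrative assumptions, not part of any library; the resulting string is what you would send to the model.

```python
def build_few_shot_prompt(examples, text, labels):
    """Assemble a few-shot classification prompt (hypothetical helper)."""
    lines = [f"Classify the text into one of: {', '.join(labels)}.", ""]
    # Each labelled example shows the model the expected input/output pattern.
    for example_text, example_label in examples:
        lines.append(f"Text: {example_text}")
        lines.append(f"Label: {example_label}")
        lines.append("")
    # The final item is left unlabelled so the model completes it.
    lines.append(f"Text: {text}")
    lines.append("Label:")
    return "\n".join(lines)


examples = [
    ("The battery died after two days.", "negative"),
    ("Setup took less than a minute.", "positive"),
]
few_shot_prompt = build_few_shot_prompt(
    examples,
    "The screen is sharp and bright.",
    ["positive", "negative"],
)
print(few_shot_prompt)
```

With a single labelled example this becomes a one-shot prompt; with none, it is a zero-shot prompt that relies on the instruction alone.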

In this first lesson, we will explore how to use open-source LLMs in a local setup, that is, without relying on hosted services from providers like OpenAI. We'll cover several of the tasks mentioned above to help you gain practical experience with these models.

## Setting up our notebook

### Select the appropriate runtime
Large Language Models (LLMs) require significant computational resources, and running them at a usable speed requires a GPU. Google Colab offers GPUs for free in its basic tier, which makes it a valuable resource for those without their own powerful hardware. To use one, open the "Runtime" menu in Google Colab, choose "Change runtime type", and select the T4 GPU as the hardware accelerator.
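
Once the runtime has started, it can be worth confirming that a GPU is actually visible before installing anything. The following is a small sketch that assumes the NVIDIA `nvidia-smi` tool is on the `PATH` whenever a GPU runtime is attached; the helper name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi -L` lists at least one GPU (hypothetical helper)."""
    executable = shutil.which("nvidia-smi")
    if executable is None:
        # The tool is not installed, so no NVIDIA GPU runtime is attached.
        return False
    try:
        result = subprocess.run(
            [executable, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `nvidia-smi -L` prints one line per device, each starting with "GPU".
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_visible())
```

If this prints `False`, revisit the runtime selection before continuing.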



### Install the needed libraries
Once we have selected the appropriate runtime, we install the libraries needed for this exercise.

#### vLLM

![vllm](https://docs.vllm.ai/en/latest/_images/vllm-logo-text-light.png)

vLLM is a fast and easy-to-use library for LLM inference and serving. It offers state-of-the-art serving throughput and can process requests in batches.

In [None]:
%pip install --upgrade vllm -q

#### LangChain

![langchain](https://python.langchain.com/img/brand/wordmark.png)

LangChain is a framework for developing applications powered by large language models (LLMs).

In [None]:
%pip install langchain langchain_community -q

# Run your local LLM

To interact with your local LLM by asking questions, you first need to initialize it.

We'll use LangChain as a high-level wrapper and vLLM as the serving engine. This setup allows us to seamlessly manage and query the model.

We are going to use the following model:

`"TheBloke/Mistral-7B-Instruct-v0.2-AWQ"`


In [None]:
from langchain_community.llms import ...

llm = ...(
    model=...,
    trust_remote_code=...,  # mandatory for hf models
    max_new_tokens=...,
    top_p=...,
    temperature=...,
    vllm_kwargs={"quantization": "awq",
                 "max_model_len": 10000},
)

Now let's run our first query...

In [None]:
prompt = ...

In [None]:
print(llm.invoke(prompt))

Congrats! You have run your first query on a local LLM!

# Prompt Engineering

## Prompt Templating

Special Tokens in LLMs: Special tokens mark the structure of a conversation so that the model can tell user inputs apart from its own responses. In the Mistral instruct models, `<s>` marks the beginning of the sequence, and `[INST]` and `[/INST]` wrap the user's instruction. Since an LLM only reads and writes a flat stream of text, these tokens are what tell it where each part of the dialogue begins and ends.

Why Use Templates? Templates ensure that every input is wrapped in the format the model was trained on, which leads to more accurate and relevant responses. They also let you reuse the same layout while changing only the variable parts of the prompt.
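
As an illustration of how these tokens structure a longer dialogue, here is a sketch that formats a multi-turn conversation. It assumes the layout described on the Mistral-7B-Instruct model card (earlier assistant replies closed with `</s>`, with spaces around the instruction); the helper `build_mistral_prompt` is hypothetical, and the exact spacing should be checked against the model card or the tokenizer's chat template.

```python
BOS, EOS = "<s>", "</s>"  # beginning-of-sequence and end-of-sequence tokens


def build_mistral_prompt(turns):
    """Format (user_message, assistant_reply_or_None) pairs (hypothetical helper)."""
    prompt = BOS
    for user_message, assistant_reply in turns:
        # Each user message is wrapped in the instruction markers.
        prompt += f"[INST] {user_message} [/INST]"
        if assistant_reply is not None:
            # Completed assistant replies are closed with the EOS token.
            prompt += f" {assistant_reply}{EOS}"
    return prompt


conversation = [
    ("What is the capital of France?", "Paris."),
    ("And of Italy?", None),  # the model is expected to answer this turn
]
chat_prompt = build_mistral_prompt(conversation)
print(chat_prompt)
```

The prompt ends right after the last `[/INST]`, so the model's continuation is its answer to the final user message.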


In [None]:
prompt = ...

In [None]:
prompt_template=f"""<s>[INST]{prompt}[/INST]"""

In [None]:
response = llm.invoke(prompt_template, stop=...)

In [None]:
print(response)
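
The `stop` argument above takes a list of strings at which generation should end. The sketch below mimics only the visible effect on the returned text, under the assumption that the stop sequence itself is not included in the output; a real engine halts during generation instead of trimming afterwards. The helper `truncate_at_stop` and the sample text are made up for illustration.

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any stop sequence (hypothetical helper)."""
    cut = len(text)
    for sequence in stop:
        index = text.find(sequence)
        if index != -1:
            # Keep the earliest match across all stop sequences.
            cut = min(cut, index)
    return text[:cut]


raw_output = "Paris is the capital of France.\n\nQuestion: What about Spain?"
print(truncate_at_stop(raw_output, stop=["\n\n", "Question:"]))
```

Stop sequences are useful when a model tends to keep writing past the answer, for example by inventing a follow-up question.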

## Prompt Structure

![](https://cdn.sanity.io/images/vr8gru94/production/6c9703965f770d56b19d5d0adc7ad76ac2d28412-3720x1552.png)

Creating a well-structured prompt for a Large Language Model (LLM) can significantly improve the accuracy and relevance of the model's outputs. Here's a basic structure that you can follow to create an effective LLM prompt:

1. **Introduction/Context**: Provide any necessary background information that the model needs to understand the context of the task. This could include the nature of the task, specific details relevant to the query, or any constraints that should guide the model's responses.

2. **Clear Instruction**: Clearly state what you need from the model. Whether it's generating text, answering a question, summarizing information, or performing an analysis, the instruction should be unambiguous and direct.

3. **Specific Details/Parameters**: If there are particular details or parameters that the model needs to consider, mention these explicitly. This could include the tone of the response, the target audience, length constraints, or specific points that must be covered in the response.

4. **Examples (Optional)**: For complex tasks, providing an example of the desired output can guide the model more effectively. This is especially useful in "few-shot" learning scenarios where the model uses the example as a template for generating its response.

5. **Closure (Optional)**: Sum up or clarify any final points that might help the model focus its generation. This can be particularly useful for open-ended tasks to narrow down the scope of possible responses.

In [None]:
from langchain_core.prompts import PromptTemplate

In [None]:
introduction_context = "Given an ingredient, I am looking to cook a meal using it as the main ingredient.\n"

clear_instruction = "Please create a recipe that uses {ingredient}.\n"

parameters = \
"""
The recipe should be vegetarian.
It should serve four people.
Include a list of all necessary ingredients.
Provide step-by-step cooking instructions.
The total cooking time should not exceed one hour.
"""

examples = "For example, if the input is chickpeas, you might suggest a chickpea curry with vegetables, detailing the spices and preparation steps.\n"

closure = "Ensure the recipe is simple and can be prepared with common kitchen tools.\n"


In [None]:
template = introduction_context + clear_instruction + parameters + examples + closure

In [None]:
# print the template text
...

In [None]:
prompt_template = PromptTemplate(
    input_variables=..., # as we have ingredient in the prompt, let's pass the ingredient as input var
    template=template
)
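
To see what the input variables correspond to, the following sketch uses only the Python standard library to list the `{placeholders}` in a template and fill them in. It assumes the template uses Python format-string syntax, which is my understanding of `PromptTemplate`'s default; the helper `find_placeholders` is hypothetical and is not a LangChain function.

```python
from string import Formatter


def find_placeholders(template):
    """Return the named {placeholders} of a format-style template, in order."""
    names = []
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


recipe_template = "Please create a recipe that uses {ingredient}.\n"
print(find_placeholders(recipe_template))
print(recipe_template.format(ingredient="chickpeas"))
```

Every name found this way needs a value when the template is formatted; a missing one raises a `KeyError` with plain `str.format`.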

In [None]:
print(
    prompt_template.format(ingredient=...)
)

In [None]:
prompt_template.invoke(...) # insert directly an ingredient

Now, let's chain it to the LLM...

In [None]:
input_ingredient = "lentils"

prompt = prompt_template.invoke(input_ingredient)

response = llm.invoke(prompt)

In [None]:
print(response)

### LangChain is for composability: LCEL

Since we are using LangChain components, we can chain them into a pipeline of tasks with the LangChain Expression Language (LCEL).

In [None]:
chain = prompt_template | llm

In [None]:
response = chain.invoke({"ingredient": "carrots"})

In [None]:
print(response)
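
To build intuition for the `|` operator, here is a toy sketch of pipe-style composition in plain Python. It is not LangChain's implementation: the `Step` class, `fill_template`, and `fake_llm` are hypothetical stand-ins that only show how overloading `__or__` lets the output of one step feed the next.

```python
class Step:
    """A toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` returns a new step that runs a first, then b on a's output.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-in for the prompt template: fills the placeholder from a dict.
fill_template = Step(
    lambda inputs: "Please create a recipe that uses {ingredient}.".format(**inputs)
)
# Stand-in for the model: echoes the prompt instead of generating text.
fake_llm = Step(lambda prompt: f"[model output for: {prompt}]")

toy_chain = fill_template | fake_llm
print(toy_chain.invoke({"ingredient": "carrots"}))
```

Because each composed step is itself a `Step`, further stages such as an output parser can be appended with another `|`.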