# 1. Large Language Models (LLMs), Prompts
Welcome to this tutorial on building a simple LLM-powered agent in Python! In this notebook, we will explore the fundamentals of Large Language Models (LLMs) and prompt engineering. The focus will be on understanding how LLMs work, what prompts are, and how to effectively communicate with LLMs to achieve the best possible responses.

By the end of this tutorial, you'll understand how to configure a simple LLM, craft effective prompts, and make your first call to an LLM.

## LLMs and Prompts

### What is a Large Language Model (LLM)?

A Large Language Model (LLM) is a neural network trained on vast amounts of text to predict the next token in a sequence, which lets it understand and generate human-like text. LLMs can be used for tasks such as answering questions, summarizing information, and generating content.

### What is a Prompt?

A prompt is the input we give to an LLM to guide its response. It consists of two main components:

- **System Prompt:** This defines the context, role, or instructions for the model.
- **User Prompt:** This is the specific request or question the user wants the LLM to address.

The way you craft your prompts significantly influences the quality and relevance of the LLM’s response. The next section traces what happens to a prompt between sending a request and receiving a response.

### LLM Interaction with Internal Processes

This diagram represents the flow of interaction with a Large Language Model (LLM), including both the external and internal processes involved when sending a request and receiving a response.

![image](images/llm_flow.png)

1. **Define Request**: The user defines their request to the LLM.
2. **Payload Preparation**
	* **System Prompt**: Sets the role or behavior of the LLM.
	* **User Prompt**: Specifies the question or task to be solved.
	* **Parameters**: Additional settings that control how the response is generated (e.g., temperature, stop sequences).
3. **Send Request to LLM**: The prepared payload is sent to the LLM for processing.
4. **LLM Internal Processing**
	* **Tokenization**: Breaks down input into smaller parts (tokens).
	* **Inference/Computation**: Generates output tokens one at a time, conditioned on the input tokens.
	* **Detokenization**: Converts output tokens back to human-readable text.
	* **Post-Processing**: Applies final adjustments to the generated text, such as trimming it at a stop sequence.
5. **Receive & Process Response**: The client receives the response returned by the LLM and extracts the generated text.
6. **Output Result**: The processed response is displayed to the user.
7. **End**: The process concludes once the result is delivered.
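
The internal steps in item 4 can be illustrated with a toy round trip. This is a deliberately simplified sketch: real LLMs use learned subword tokenizers and a neural network for inference, whereas the vocabulary and the functions `tokenize`, `fake_inference`, and `detokenize` below are hypothetical stand-ins that only mimic the shape of the data at each stage.

```python
# Toy vocabulary mapping words to integer token IDs (hypothetical; real
# tokenizers use learned subword vocabularies with many thousands of entries).
VOCAB = {"<unk>": 0, "hello": 1, "world": 2, "llm": 3, "!": 4}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def tokenize(text: str) -> list:
    """Split on whitespace and map each lowercased word to a token ID."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]


def fake_inference(token_ids: list) -> list:
    """Stand-in for the model: reverses the input instead of predicting tokens."""
    return list(reversed(token_ids))


def detokenize(token_ids: list) -> str:
    """Map token IDs back to words and join them with spaces."""
    return " ".join(ID_TO_TOKEN[token_id] for token_id in token_ids)


input_ids = tokenize("Hello LLM world")
output_ids = fake_inference(input_ids)
print(input_ids)               # [1, 3, 2]
print(detokenize(output_ids))  # world llm hello
```

The point is only that text becomes integers on the way in and integers become text on the way out; everything the model "knows" operates on those integers.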

## 1. Setting up the Environment

Before we can communicate with the LLM, let’s install any required libraries and ensure our environment is ready.

In [21]:
%pip install requests

Note: you may need to restart the kernel to use updated packages.


We'll be using the `requests` library to send our prompts to the LLM endpoint.

## 2. Configuring the LLM

Next, we’ll set up a configuration to define which model to use and other parameters like temperature and stop sequences.

In [22]:
def setup_llm_model(model="llama3.1:latest", temperature=0.0, stop=None):
    return {
        "model_endpoint": "http://localhost:11434/api/generate",
        "model": model,
        "temperature": temperature,
        "headers": {"Content-Type": "application/json"},
        "stop": stop,
    }


# Example configuration
llm_config = setup_llm_model()
print("Model configuration:", llm_config)

Model configuration: {'model_endpoint': 'http://localhost:11434/api/generate', 'model': 'llama3.1:latest', 'temperature': 0.0, 'headers': {'Content-Type': 'application/json'}, 'stop': None}


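This function returns the configuration for the LLM as a dictionary: the endpoint (here, the default address of a locally running Ollama server), the model name, the request headers, and generation parameters such as temperature and stop sequences.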
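
As a brief usage sketch, the defaults can be overridden per call. The function is repeated here so the snippet runs on its own; the temperature and stop values are illustrative choices, not recommendations.

```python
def setup_llm_model(model="llama3.1:latest", temperature=0.0, stop=None):
    # Same configuration helper as above, repeated so this snippet is self-contained.
    return {
        "model_endpoint": "http://localhost:11434/api/generate",
        "model": model,
        "temperature": temperature,
        "headers": {"Content-Type": "application/json"},
        "stop": stop,
    }


# A more exploratory configuration that asks generation to halt at the
# first blank line (the stop value is an assumption for illustration).
creative_config = setup_llm_model(temperature=0.8, stop=["\n\n"])
print(creative_config["temperature"], creative_config["stop"])
```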

### Key Parameters in LLM Interaction

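- **Temperature:** This parameter controls the randomness of the model's responses. A **higher temperature** (e.g., 0.8 or above) produces more diverse and creative outputs, while a **lower temperature** (e.g., 0.2) produces more predictable outputs. A temperature of 0.0, the default in the configuration above, makes the output as deterministic as the model allows. For instance:
  - Low temperature (0.2): More focused and repeatable answers.
  - High temperature (0.8): Responses that may include more creative and less common interpretations.
- **Stop sequences:** A list of strings that end generation as soon as the model produces one of them. Leaving `stop` as `None` lets the model decide when to finish.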
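
To make the effect of temperature concrete, here is a minimal sketch of how it is commonly applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities with a softmax. The function name and the example scores are hypothetical, and a real inference engine combines this with other sampling settings.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by a temperature > 0."""
    scaled = [score / temperature for score in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

for t in (0.2, 0.8, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, probability mass moves from the top-scoring token toward the alternatives, which is why sampled outputs become more varied. This formula cannot be evaluated at a temperature of exactly 0; implementations typically treat that case as always picking the highest-scoring token.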

## 3. Crafting the Prompts
 
In the context of Large Language Models (LLMs), **prompts** are crucial for guiding the model's behavior and obtaining useful outputs. They consist of two main components:

- **System Prompt:** This defines the role, tone, and behavior of the model. It acts as a set of instructions or rules for how the LLM should respond. The system prompt sets the stage for the interaction by shaping the model's personality or context. For example, you can instruct the model to act as a teacher, assistant, or subject matter expert.

  - *Example:* "You are a helpful assistant that provides concise and factual answers to technical questions." 

- **User Prompt:** This is the actual input or question provided by the user. It is typically the main request or query for which the user seeks an answer or action. The quality of the user prompt is key, as clear and specific questions yield more accurate and relevant responses from the model.

   - *Example:* "What are the key features of Large Language Models?"
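
Under the hood, chat-tuned models see the system and user prompts joined into one string with special marker tokens. The sketch below assembles a single-turn prompt in the Llama 3 chat format; the function name `format_llama3_prompt` is hypothetical, and other model families use different markers.

```python
def format_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3 chat format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The trailing assistant header cues the model to start its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


print(
    format_llama3_prompt(
        "You are a helpful assistant that provides concise and factual answers.",
        "What are the key features of Large Language Models?",
    )
)
```

Servers that apply the chat template themselves, as Ollama does by default for `/api/generate`, expect plain text in the system and prompt fields. A hand-assembled string like this is only appropriate when the server is asked to skip templating (Ollama's `raw` request field).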

In [23]:
# Example system and user prompts.
# Plain text is enough here: Ollama applies the model's chat template
# (the <|start_header_id|>...<|eot_id|> markers) itself, so adding those
# tokens by hand would duplicate them.
sys_prompt = "You are a helpful assistant that provides concise and accurate answers."

user_request = "What are the key features of Large Language Models?"
user_prompt = user_request

In [24]:
def prepare_payload(user_prompt: str, sys_prompt: str, config: dict):
    # Ollama's /api/generate reads sampling parameters from "options";
    # keys such as "temperature" are ignored at the top level of the body.
    options = {"temperature": config["temperature"]}
    if config["stop"] is not None:
        options["stop"] = config["stop"]
    return {
        "model": config["model"],
        "prompt": user_prompt,
        "system": sys_prompt,
        "options": options,
        "stream": False,
    }


# Prepare the payload
payload = prepare_payload(
    user_prompt=user_prompt, sys_prompt=sys_prompt, config=llm_config
)
print("Prepared payload:", payload)

Prepared payload: {'model': 'llama3.1:latest', 'prompt': 'What are the key features of Large Language Models?', 'system': 'You are a helpful assistant that provides concise and accurate answers.', 'options': {'temperature': 0.0}, 'stream': False}


The payload contains the system and user prompts along with the LLM configuration.

## 4. Making a Call to the LLM

Now that we have the prompts prepared, we’ll send the request to the LLM endpoint and retrieve the response.

In [25]:
import requests
import json


def send_request_to_llm(payload: dict, config: dict):
    try:
        response = requests.post(
            config["model_endpoint"],
            headers=config["headers"],
            data=json.dumps(payload),
            timeout=30,
        )
        response.raise_for_status()
        return response.json()
    except requests.RequestException as e:
        return {"error": str(e)}


# Send the request
response = send_request_to_llm(payload=payload, config=llm_config)
print("LLM response:", response)

LLM response: {'model': 'llama3.1:latest', 'created_at': '2024-09-11T17:43:11.467747Z', 'response': 'Large Language Models (LLMs) have several key features that enable them to process and generate human-like language:\n\n1. **Self-Supervised Learning**: LLMs learn from vast amounts of text data without explicit supervision, using techniques like masked language modeling or next sentence prediction.\n2. **Transformer Architecture**: Most popular LLMs are based on the Transformer architecture, which allows for parallel processing of input sequences and enables efficient computation.\n3. **Multilayer Perceptron (MLP) Layers**: LLMs typically consist of a series of MLP layers with self-attention mechanisms that weigh inputs according to their relevance.\n4. **Self-Attention Mechanisms**: These allow the model to focus on specific parts of the input sequence when making predictions, emulating human language processing.\n5. **Pretraining and Fine-tuning**: LLMs are pre-trained on large datas

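This function sends the payload to the LLM endpoint and returns the parsed JSON reply. If the request fails (connection error, timeout, or HTTP error status), it returns a dictionary with an `error` key instead of raising an exception.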
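
The same error-dictionary convention can be tried without a running server. The following is a sketch of a dependency-free variant built on the standard library's `urllib` rather than `requests`; the function name `send_request_stdlib` and the unreachable test address are assumptions for illustration.

```python
import json
import urllib.request


def send_request_stdlib(payload: dict, endpoint: str, timeout: float = 30.0) -> dict:
    """POST a JSON payload and return the parsed JSON reply or an error dict."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            return json.loads(reply.read().decode("utf-8"))
    except (OSError, ValueError) as e:
        # OSError covers connection failures, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both derive from it); ValueError
        # covers a reply body that is not valid JSON.
        return {"error": str(e)}


# Port 1 on localhost is assumed to have nothing listening, so this call
# should fail and exercise the error path.
result = send_request_stdlib(
    {"prompt": "ping"}, "http://127.0.0.1:1/api/generate", timeout=5
)
print(result)
```

Returning a dictionary in both the success and failure cases lets the caller handle the result with a single code path, at the cost of having to check for the `error` key explicitly.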

## 5. Processing the LLM Response

Finally, let's process the response and display the relevant information in a user-friendly format.

In [26]:
def process_response(response: dict):
    if "error" in response:
        return f"Error: {response['error']}"
    return response.get("response", "No response from the model")

# Process the LLM response
processed_response = process_response(response)
print("Processed response:", processed_response)

Processed response: Large Language Models (LLMs) have several key features that enable them to process and generate human-like language:

1. **Self-Supervised Learning**: LLMs learn from vast amounts of text data without explicit supervision, using techniques like masked language modeling or next sentence prediction.
2. **Transformer Architecture**: Most popular LLMs are based on the Transformer architecture, which allows for parallel processing of input sequences and enables efficient computation.
3. **Multilayer Perceptron (MLP) Layers**: LLMs typically consist of a series of MLP layers with self-attention mechanisms that weigh inputs according to their relevance.
4. **Self-Attention Mechanisms**: These allow the model to focus on specific parts of the input sequence when making predictions, emulating human language processing.
5. **Pretraining and Fine-tuning**: LLMs are pre-trained on large datasets and can then be fine-tuned for a specific task or domain.
6. **Scalability**: LLMs 