# Building a Simple LLM-Powered Agent in Python

Welcome to this tutorial on building a simple agent in Python! This notebook covers the basics of Large Language Models (LLMs): what they are, how system prompts and user prompts differ, and how to communicate with an LLM effectively.

By the end of this tutorial, you'll understand how to configure a simple LLM, craft effective prompts, and make your first call to an LLM.

## LLMs and Prompts

### What is a Large Language Model (LLM)?

A Large Language Model (LLM) is a type of artificial intelligence model designed to understand and generate human-like text. These models are trained on vast amounts of data and can be used for tasks like answering questions, summarizing information, generating content, and more.

### What is a Prompt?

A prompt is the input we give to an LLM to get a desired output. It typically consists of two parts:

- **System Prompt:** This defines the context, role, or instructions for the model.
- **User Prompt:** This is the specific request or question that the user wants the LLM to answer.

The way you craft your prompts significantly influences the quality and relevance of the LLM’s response.

## Key Concepts

- **System Prompt:** Defines the role or behavior of the LLM.
- **User Prompt:** The specific input provided by the user.
- **Temperature:** Controls the randomness of the model's output. Higher values produce more varied, creative responses, while lower values produce more predictable ones; a value of 0 makes the output close to deterministic.
- **Stop Sequence:** A string (or list of strings) that ends generation as soon as the model produces it.
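
To make temperature and stop sequences concrete, here is a toy sketch. It is not how any particular LLM server implements them: it assumes three candidate tokens with made-up scores (`logits`), and `softmax_with_temperature` and `truncate_at_stop` are illustrative names only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def truncate_at_stop(text, stop_sequences):
    """Cut the text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low_t = softmax_with_temperature(logits, 0.2)
high_t = softmax_with_temperature(logits, 2.0)
print("T=0.2:", [round(p, 3) for p in low_t])
print("T=2.0:", [round(p, 3) for p in high_t])
print(truncate_at_stop("Answer: 42\nUser: next question", ["\nUser:"]))
```

At the low temperature almost all of the probability lands on the highest-scoring token, while at the high temperature the three options end up much closer together. A temperature of exactly 0 is typically handled as a special case (always pick the top token) rather than as a division by zero.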

### LLM Interaction with Internal Processes

The diagram below shows the flow of an interaction with an LLM, covering both the external steps you perform and the internal processing that happens between sending a request and receiving a response.

1. **Start:** The interaction begins with defining the task.
2. **Crafting Prompts:** The user creates two types of prompts:
   - **System Prompt:** Defines the role or behavior of the LLM (e.g., an assistant, a teacher, etc.).
   - **User Prompt:** The user's actual question or request (e.g., "What are the key features of LLMs?").
3. **Prepare Payload:** The prompts, along with any parameters (e.g., temperature, stop sequences), are packaged into a payload to be sent to the LLM.
4. **LLM Interaction:** The payload is sent to the LLM API. This triggers the internal processes of the LLM.
5. **LLM Internal Processing:** Inside the LLM (shown as a subgraph in the diagram), the following steps take place:
   - **Tokenization:** The LLM splits the input text into smaller units (tokens) that it can process.
   - **Inference/Computation:** The LLM uses its neural network to predict output tokens one at a time from the input tokens and context. Temperature takes effect here, when each token is sampled.
   - **Detokenization:** The LLM converts the generated tokens back into human-readable text.
   - **Post-Processing:** Any additional processing of the generated text, such as truncating it at a stop sequence.
6. **Receive & Process Response:** Once the LLM finishes processing, the response is received and can be formatted or processed further if needed.
7. **Output Result:** The final output is presented to the user, displaying the LLM’s response to the original prompt.
8. **End:** The process concludes once the result has been delivered.

![image](images/llm-diagram.png)
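
The internal steps listed above can be sketched with a deliberately tiny stand-in. The code assumes a whitespace tokenizer and a hypothetical five-word vocabulary; real LLMs use subword tokenizers and a neural network for the inference step, so `fake_inference` here simply returns fixed token IDs.

```python
# Hypothetical toy vocabulary; real models use far larger subword vocabularies.
VOCAB = ["<unk>", "hello", "world", "hi", "there"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}


def tokenize(text):
    """Map each whitespace-separated word to an integer ID (0 for unknown words)."""
    return [TOKEN_TO_ID.get(word, 0) for word in text.lower().split()]


def fake_inference(input_ids):
    """Stand-in for the neural network: always 'generates' the IDs for 'hi there'."""
    return [TOKEN_TO_ID["hi"], TOKEN_TO_ID["there"]]


def detokenize(token_ids):
    """Convert token IDs back into text."""
    return " ".join(VOCAB[i] for i in token_ids)


input_ids = tokenize("Hello world")
output_ids = fake_inference(input_ids)
print("Input IDs:", input_ids)
print("Output text:", detokenize(output_ids))
```

The point of the sketch is the shape of the pipeline: text goes in, integers flow through the model, and text comes back out.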

## 1. Setting up the Environment

Before we can communicate with the LLM, let’s install any required libraries and ensure our environment is ready.

In [1]:
%pip install requests

Note: you may need to restart the kernel to use updated packages.


We'll be using the `requests` library to send our prompts to the LLM endpoint.

## 2. Configuring the LLM

Next, we’ll set up a configuration to define which model to use and other parameters like temperature and stop sequences.

In [2]:
def setup_llm_model(model="llama3.1:latest", temperature=0.0, stop=None):
    return {
        "model_endpoint": "http://localhost:11434/api/generate",
        "model": model,
        "temperature": temperature,
        "headers": {"Content-Type": "application/json"},
        "stop": stop,
    }


# Example configuration
llm_config = setup_llm_model()
print("Model configuration:", llm_config)

Model configuration: {'model_endpoint': 'http://localhost:11434/api/generate', 'model': 'llama3.1:latest', 'temperature': 0.0, 'headers': {'Content-Type': 'application/json'}, 'stop': None}


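This function returns the configuration for the LLM as a dictionary: the endpoint, the model name, the request headers, and parameters like temperature and stop sequences. The endpoint shown, `http://localhost:11434/api/generate`, is the default address of a locally running Ollama server, so the model must be available there before the later cells will work.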
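
Because the configuration is a plain dictionary, nothing stops a typo or an out-of-range value from reaching the server. A minimal sketch of a sanity check follows. `validate_llm_config` is a hypothetical helper, not part of the notebook, and the 0 to 2 temperature range is an assumption for illustration rather than a limit enforced by any particular server.

```python
def validate_llm_config(config: dict) -> list:
    """Return a list of problems found; an empty list means the config looks fine."""
    problems = []
    for key in ("model_endpoint", "model", "temperature", "headers", "stop"):
        if key not in config:
            problems.append(f"missing key: {key}")
    temperature = config.get("temperature")
    # Assumed range, for illustration only.
    if isinstance(temperature, (int, float)) and not 0.0 <= temperature <= 2.0:
        problems.append("temperature should be between 0.0 and 2.0")
    stop = config.get("stop")
    if stop is not None and not (
        isinstance(stop, list) and all(isinstance(s, str) for s in stop)
    ):
        problems.append("stop should be None or a list of strings")
    return problems


example_config = {
    "model_endpoint": "http://localhost:11434/api/generate",
    "model": "llama3.1:latest",
    "temperature": 0.0,
    "headers": {"Content-Type": "application/json"},
    "stop": None,
}
bad_config = {**example_config, "temperature": 5, "stop": "END"}

print(validate_llm_config(example_config))
print(validate_llm_config(bad_config))
```

Returning a list of messages instead of raising on the first problem lets a notebook user see everything that needs fixing in one pass.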

## 3. Crafting the Prompts
 
In the context of Large Language Models (LLMs), **prompts** are crucial for guiding the model's behavior and obtaining useful outputs. They consist of two main components:

- **System Prompt:** This defines the role, tone, and behavior of the model. It acts as a set of instructions or rules for how the LLM should respond. The system prompt sets the stage for the interaction by shaping the model's personality or context. For example, you can instruct the model to act as a teacher, assistant, or subject matter expert.

  - *Example:* "You are a helpful assistant that provides concise and factual answers to technical questions." 

- **User Prompt:** This is the actual input or question provided by the user. It is typically the main request or query for which the user seeks an answer or action. The quality of the user prompt is key, as clear and specific questions yield more accurate and relevant responses from the model.

  - *Example:* "What are the key features of Large Language Models?"

In [3]:
# Example system and user prompts.
# Plain text is enough here: unless "raw" is set to true, the Ollama
# /api/generate endpoint applies the model's chat template itself, so adding
# Llama 3 special tokens by hand would duplicate them.
sys_prompt = "You are a helpful assistant that provides concise and accurate answers."

user_prompt = "What are the key features of Large Language Models?"

In [4]:
def prepare_payload(user_prompt: str, sys_prompt: str, config: dict) -> dict:
    # Ollama reads sampling parameters such as temperature and stop from the
    # nested "options" object rather than from top-level keys.
    options = {"temperature": config["temperature"]}
    if config["stop"]:
        options["stop"] = config["stop"]  # a list of stop strings
    return {
        "model": config["model"],
        "prompt": user_prompt,
        "system": sys_prompt,
        "options": options,
        "stream": False,  # return one complete JSON object instead of chunks
    }


# Prepare the payload
payload = prepare_payload(
    user_prompt=user_prompt, sys_prompt=sys_prompt, config=llm_config
)
print("Prepared payload:", payload)

Prepared payload: {'model': 'llama3.1:latest', 'prompt': 'What are the key features of Large Language Models?', 'system': 'You are a helpful assistant that provides concise and accurate answers.', 'options': {'temperature': 0.0}, 'stream': False}


The payload contains the system and user prompts along with the LLM configuration.
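
The payload is an ordinary dictionary that gets serialized to JSON before it travels over HTTP. The sketch below uses a made-up payload (the values are placeholders, not taken from a real request) to show roughly what the request body looks like on the wire.

```python
import json

# Placeholder payload for illustration; not taken from a real request.
example_payload = {
    "model": "llama3.1:latest",
    "prompt": "What is a token?",
    "system": "You are a concise assistant.",
    "stream": False,
}

body = json.dumps(example_payload, indent=2)
print(body)

# Parsing the JSON text gives back an equal dictionary.
round_tripped = json.loads(body)
print(round_tripped == example_payload)
```

Note that Python's `False` becomes JSON `false` during serialization, which is what the server expects for a boolean field.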

## 4. Making a Call to the LLM

Now that we have the prompts prepared, we’ll send the request to the LLM endpoint and retrieve the response.

In [5]:
import requests
import json


def send_request_to_llm(payload: dict, config: dict):
    try:
        response = requests.post(
            config["model_endpoint"],
            headers=config["headers"],
            data=json.dumps(payload),
            timeout=30,
        )
        response.raise_for_status()
        return response.json()
    except requests.RequestException as e:
        return {"error": str(e)}


# Send the request
response = send_request_to_llm(payload=payload, config=llm_config)
print("LLM response:", response)

LLM response: {'model': 'llama3.1:latest', 'created_at': '2024-09-06T17:51:59.977428Z', 'response': 'Large Language Models (LLMs) have several key features:\n\n1. **Scalability**: LLMs can process vast amounts of text data, making them capable of learning and generating human-like language.\n2. **Self-supervised learning**: They learn from large datasets without explicit supervision, allowing them to identify patterns and relationships in the text.\n3. **Sequence-to-Sequence architecture**: LLMs are typically based on a sequence-to-sequence model, which allows them to take input sequences (e.g., sentences) and output sequences of text.\n4. **Multi-layer Transformer architecture**: Many modern LLMs use a multi-layer transformer architecture, which enables parallelization and efficient processing of sequential data.\n5. **Transformer layers with self-attention**: These models utilize self-attention mechanisms, allowing the model to focus on specific parts of the input sequence when gener

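This function sends the payload to the LLM endpoint with a 30-second timeout. Network failures, timeouts, and HTTP error statuses are caught and returned as a dictionary with an `error` key instead of raising an exception.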
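
A local model server can be slow to answer its first request, so a call that times out once may succeed on a second attempt. Here is one possible retry wrapper, assuming the sender returns a dictionary with an `error` key on failure, as `send_request_to_llm` does. `send_with_retries` and its parameters are hypothetical, and the flaky sender exists only so the example runs without a server.

```python
import time


def send_with_retries(send_fn, payload, config, retries=3, delay_seconds=1.0):
    """Call send_fn up to `retries` times, pausing between failed attempts."""
    result = {"error": "no attempts made"}
    for attempt in range(1, retries + 1):
        result = send_fn(payload, config)
        if "error" not in result:
            return result
        if attempt < retries:
            time.sleep(delay_seconds * attempt)  # wait a little longer each time
    return result  # the last error, if every attempt failed


# Stand-in sender that fails twice and then succeeds (no network needed).
calls = {"count": 0}


def flaky_sender(payload, config):
    calls["count"] += 1
    if calls["count"] < 3:
        return {"error": "simulated timeout"}
    return {"response": "ok"}


result = send_with_retries(flaky_sender, {}, {}, retries=3, delay_seconds=0.01)
print(result, "after", calls["count"], "attempts")
```

With the notebook's own function, the equivalent call would be `send_with_retries(send_request_to_llm, payload, llm_config)`.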

## 5. Processing the LLM Response

Finally, let's process the response and display the relevant information in a user-friendly format.

In [6]:
def process_response(response: dict):
    if "error" in response:
        return f"Error: {response['error']}"
    return response.get("response", "No response from the model")

# Process the LLM response
processed_response = process_response(response)
print("Processed response:", processed_response)

Processed response: Large Language Models (LLMs) have several key features:

1. **Scalability**: LLMs can process vast amounts of text data, making them capable of learning and generating human-like language.
2. **Self-supervised learning**: They learn from large datasets without explicit supervision, allowing them to identify patterns and relationships in the text.
3. **Sequence-to-Sequence architecture**: LLMs are typically based on a sequence-to-sequence model, which allows them to take input sequences (e.g., sentences) and output sequences of text.
4. **Multi-layer Transformer architecture**: Many modern LLMs use a multi-layer transformer architecture, which enables parallelization and efficient processing of sequential data.
5. **Transformer layers with self-attention**: These models utilize self-attention mechanisms, allowing the model to focus on specific parts of the input sequence when generating output.
6. **Pre-training and fine-tuning**: LLMs are often pre-trained on large 

This function ensures that we handle any errors gracefully and return the model's response in a readable format.
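
Besides the `response` text, a non-streaming reply from Ollama's `/api/generate` endpoint usually carries timing fields such as `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The sketch below assumes those fields may or may not be present and derives a rough generation speed when they are. `summarize_response` is a hypothetical helper, and the sample numbers are made up rather than taken from a real run.

```python
def summarize_response(response: dict) -> dict:
    """Pull the text and a rough generation speed out of a response dictionary."""
    summary = {"text": response.get("response", "")}
    eval_count = response.get("eval_count")
    eval_duration_ns = response.get("eval_duration")
    if eval_count and eval_duration_ns:
        # Durations are reported in nanoseconds, so convert to seconds first.
        summary["tokens_per_second"] = eval_count / (eval_duration_ns / 1e9)
    return summary


# Made-up numbers for illustration; not output from a real model run.
sample = {"response": "Hello!", "eval_count": 50, "eval_duration": 2_000_000_000}
print(summarize_response(sample))
```

Using `.get()` throughout keeps the helper safe to call on error dictionaries or on responses that omit the timing fields.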

## Conclusion

In this notebook, we've walked through the process of interacting with an LLM step by step. We've covered:
- **LLMs and Prompts:** Understanding system and user prompts.
- **Model Configuration:** How to configure the LLM for use.
- **Sending Requests:** Crafting and sending the prompt to the LLM endpoint.
- **Processing Responses:** How to handle and format the LLM’s output.

Now you're ready to explore more advanced interactions with LLMs, including chaining prompts and integrating with agents. Happy coding!