# Building a Simple LLM-Powered Agent in Python

Welcome to this tutorial on building a simple LLM-powered agent in Python! In this notebook, we will explore the fundamentals of Large Language Models (LLMs) and prompt engineering. The focus will be on understanding how LLMs work, what prompts are, and how to effectively communicate with LLMs to achieve the best possible responses.

By the end of this tutorial, you'll understand how to configure a simple LLM, craft effective prompts, and make your first call to an LLM.

## LLMs and Prompts

### What is a Large Language Model (LLM)?

A Large Language Model (LLM) is an advanced AI model designed to understand and generate human-like text. LLMs are trained on vast amounts of data and can be used for tasks like answering questions, summarizing information, generating content, and more.

### What is a Prompt?

A prompt is the input we give to an LLM to guide its response. It consists of two main components:

- **System Prompt:** This defines the context, role, or instructions for the model.
- **User Prompt:** This is the specific request or question the user wants the LLM to address.

The way you craft your prompts significantly influences the quality and relevance of the LLM’s response. Let's break down some key terms used in working with LLMs:

### Key Parameters in LLM Interaction

- **Temperature:** This parameter controls the randomness or creativity of the model's responses. A **higher temperature** (e.g., 0.8 or above) results in more diverse and creative outputs, while a **lower temperature** (e.g., 0.2) generates more predictable and deterministic responses. For instance:
  - Low temperature (0.2): More focused and concise answers.
  - High temperature (0.8): Responses that may include more creative and less common interpretations.
- **Top-p:** Also known as **nucleus sampling**, this parameter sets a cumulative probability threshold for the tokens the model may sample from. For example, if `top_p = 0.9`, the model samples from the smallest set of tokens whose combined probability is at least 90%. This encourages more coherent responses by excluding the long tail of unlikely tokens.
- **Top-k:** This parameter limits the number of highest-probability tokens the model can choose from during generation. If `top_k = 50`, the model will sample from the top 50 tokens instead of considering the entire vocabulary, which can help balance creativity and relevance.
- **Repetition Penalty:** This is a penalty applied to repeated tokens during text generation. A higher repetition penalty discourages the model from repeating phrases or words too often, leading to more varied responses. This is useful when you want to avoid repetitive or redundant outputs in generated text.
- **Stop Sequence:** The stop sequence tells the model when to stop generating text. It is useful when you want to prevent the model from continuing beyond a certain point. For example, if your output should end after a specific sentence or character, you can define a stop sequence like a newline (`\n`) or a punctuation mark (`.`).

By understanding these elements, you'll be able to guide the LLM effectively and obtain the most useful and relevant responses for your tasks.
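
To make the sampling parameters above more concrete, here is a minimal sketch in plain Python. It operates on a made-up list of four logits rather than a real model, and the helper names (`softmax_with_temperature`, `top_k_indices`, `top_p_indices`) are illustrative assumptions, not part of any library. Real inference engines may combine these filters differently.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature must be > 0 here."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(probs, k):
    """Keep the indices of the k most probable tokens."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]


def top_p_indices(probs, p):
    """Keep the smallest set of top tokens whose probabilities sum to at least p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens
sharp = softmax_with_temperature(toy_logits, 0.2)  # low temperature
flat = softmax_with_temperature(toy_logits, 1.5)   # high temperature

print("low temperature :", [round(p, 3) for p in sharp])
print("high temperature:", [round(p, 3) for p in flat])
print("top_k=2 keeps   :", top_k_indices(flat, 2))
print("top_p=0.9 keeps :", top_p_indices([0.5, 0.3, 0.15, 0.05], 0.9))
```

At low temperature almost all probability mass lands on the highest-scoring token, which is why the output becomes more predictable. A temperature of exactly 0 is usually handled as a special case (always pick the top token), since dividing by zero is not possible.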

### LLM Interaction with Internal Processes

The diagram at the end of this section shows the flow of interaction with a Large Language Model (LLM), including both the external and internal steps involved in sending a request and receiving a response.

1. **Define Request**: The user defines their request to the LLM.
2. **Payload Preparation**
	* **System Prompt**: Sets the role or behavior of the LLM.
	* **User Prompt**: Specifies the question or task to be solved.
	* **Parameters**: Additional settings for adjusting the response (e.g., temperature, stop sequences).
3. **Send Request to LLM**: The prepared payload is sent to the LLM for processing.
4. **LLM Internal Processing**
	* **Tokenization**: Breaks down input into smaller parts (tokens).
	* **Inference/Computation**: Computes a response based on input tokens.
	* **Detokenization**: Converts output tokens back to human-readable text.
	* **Post-Processing**: Applies final adjustments to the text, such as cutting it off at a stop sequence.
5. **Receive & Process Response**: The client receives the response generated by the LLM and extracts the relevant content.
6. **Output Result**: The processed response is displayed to the user.
7. **End**: The process concludes once the result is delivered.

![image](images/llm_flow.png)
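
The internal steps in the flow above can be sketched with a toy pipeline. This is only an illustration under strong assumptions: the tokenizer splits on whitespace (real tokenizers use subword units), and `toy_inference` is a stand-in that echoes its input instead of running a model. All function names are hypothetical.

```python
def tokenize(text, vocab):
    """Map each whitespace-separated word to an integer id, growing vocab as needed."""
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def toy_inference(token_ids):
    """Stand-in for the model: echoes the input tokens unchanged."""
    return list(token_ids)


def detokenize(token_ids, vocab):
    """Convert token ids back to text using the reverse vocabulary."""
    id_to_word = {idx: word for word, idx in vocab.items()}
    return " ".join(id_to_word[i] for i in token_ids)


def truncate_at_stop(text, stop):
    """Post-processing: cut the text at the earliest stop sequence, if any occurs."""
    if not stop:
        return text
    positions = [text.find(s) for s in stop if s and s in text]
    return text[: min(positions)] if positions else text


vocab = {}
ids = tokenize("the cat sat . the end", vocab)
output_text = truncate_at_stop(detokenize(toy_inference(ids), vocab), stop=["."])

print(ids)                # repeated words share the same id
print(repr(output_text))  # everything from the first "." onward is dropped
```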

## 1. Setting up the Environment

Before we can communicate with the LLM, let’s install any required libraries and ensure our environment is ready.

In [1]:
%pip install requests

Note: you may need to restart the kernel to use updated packages.


We'll be using the `requests` library to send our prompts to the LLM endpoint.

## 2. Configuring the LLM

Next, we’ll set up a configuration to define which model to use and other parameters like temperature and stop sequences.

In [2]:
def setup_llm_model(model="llama3.1:latest", temperature=0.0, stop=None):
    return {
        "model_endpoint": "http://localhost:11434/api/generate",
        "model": model,
        "temperature": temperature,
        "headers": {"Content-Type": "application/json"},
        "stop": stop,
    }


# Example configuration
llm_config = setup_llm_model()
print("Model configuration:", llm_config)

Model configuration: {'model_endpoint': 'http://localhost:11434/api/generate', 'model': 'llama3.1:latest', 'temperature': 0.0, 'headers': {'Content-Type': 'application/json'}, 'stop': None}


This function sets up the configuration for the LLM, including the endpoint and parameters like temperature and stop sequence.
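
Because the configuration is a plain dictionary, nothing stops an invalid value from slipping in. The sketch below shows one way to check a configuration before using it. `validate_llm_config` is a hypothetical helper, not part of the notebook or of any library, and the rules it enforces are assumptions about what a sensible configuration looks like.

```python
def validate_llm_config(config):
    """Return a list of human-readable problems; an empty list means the config looks fine."""
    problems = []

    temperature = config.get("temperature")
    if not isinstance(temperature, (int, float)) or temperature < 0:
        problems.append("temperature must be a non-negative number")

    stop = config.get("stop")
    stop_is_valid = stop is None or (
        isinstance(stop, list) and all(isinstance(s, str) for s in stop)
    )
    if not stop_is_valid:
        problems.append("stop must be None or a list of strings")

    endpoint = str(config.get("model_endpoint", ""))
    if not endpoint.startswith(("http://", "https://")):
        problems.append("model_endpoint must be an HTTP(S) URL")

    return problems


good_config = {
    "model_endpoint": "http://localhost:11434/api/generate",
    "model": "llama3.1:latest",
    "temperature": 0.0,
    "stop": None,
}
bad_config = dict(good_config, temperature=-1, stop="\n")

print(validate_llm_config(good_config))
print(validate_llm_config(bad_config))
```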

## 3. Crafting the Prompts
 
In the context of Large Language Models (LLMs), **prompts** are crucial for guiding the model's behavior and obtaining useful outputs. They consist of two main components:

- **System Prompt:** This defines the role, tone, and behavior of the model. It acts as a set of instructions or rules for how the LLM should respond. The system prompt sets the stage for the interaction by shaping the model's personality or context. For example, you can instruct the model to act as a teacher, assistant, or subject matter expert.

  - *Example:* "You are a helpful assistant that provides concise and factual answers to technical questions." 

- **User Prompt:** This is the actual input or question provided by the user. It is typically the main request or query for which the user seeks an answer or action. The quality of the user prompt is key, as clear and specific questions yield more accurate and relevant responses from the model.

   - *Example:* "What are the key features of Large Language Models?"

In [3]:
# Example system and user prompts
sys_prompt = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are a helpful assistant that provides concise and accurate answers.<|eot_id|>"

user_request = "What are the key features of Large Language Models?"
user_prompt = (
    f"""<|start_header_id|>user<|end_header_id|>\n\n{user_request}<|eot_id|>"""
)
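
The special tokens in the cell above follow the Llama 3 chat layout, where each message is wrapped in a role header and closed with `<|eot_id|>`. As a rough sketch of that layout, the helpers below assemble a complete prompt from a system and a user message. `wrap_message` and `build_llama3_prompt` are hypothetical names introduced for illustration; the notebook itself keeps the system and user parts in separate variables.

```python
def wrap_message(role, content):
    """Wrap one message in the role header and end-of-turn token."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"


def build_llama3_prompt(system_text, user_text):
    """Assemble a single-turn prompt that ends where the assistant should start writing."""
    return (
        "<|begin_of_text|>"
        + wrap_message("system", system_text)
        + wrap_message("user", user_text)
        + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


full_prompt = build_llama3_prompt(
    "You are a helpful assistant that provides concise and accurate answers.",
    "What are the key features of Large Language Models?",
)
print(full_prompt)
```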

In [4]:
def prepare_payload(user_prompt: str, sys_prompt: str, config: dict):
    # Ollama's /api/generate reads sampling parameters from a nested "options"
    # object; unset (None) values are left out so the server defaults apply.
    options = {"temperature": config["temperature"], "stop": config["stop"]}
    return {
        "model": config["model"],
        "prompt": user_prompt,
        "system": sys_prompt,
        "options": {k: v for k, v in options.items() if v is not None},
        "stream": False,
    }


# Prepare the payload
payload = prepare_payload(
    user_prompt=user_prompt, sys_prompt=sys_prompt, config=llm_config
)
print("Prepared payload:", payload)

Prepared payload: {'model': 'llama3.1:latest', 'prompt': '<|start_header_id|>user<|end_header_id|>\n\nWhat are the key features of Large Language Models?<|eot_id|>', 'system': '<|begin_of_text|><|start_header_id|>system<|end_header_id|>You are a helpful assistant that provides concise and accurate answers.<|eot_id|>', 'options': {'temperature': 0.0}, 'stream': False}


The payload contains the system and user prompts along with the LLM configuration.
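
The payload sets `"stream": False`, so the endpoint returns a single JSON object. With streaming enabled, Ollama instead sends newline-delimited JSON objects, each carrying a fragment of the text in `response` and a `done` flag on the last one. The sketch below shows how such fragments could be joined. `join_streamed_chunks` is a hypothetical helper, and the sample lines are made up for illustration rather than captured from a real server.

```python
import json


def join_streamed_chunks(lines):
    """Concatenate the 'response' fragments from newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of generation
    return "".join(parts)


# Made-up sample of what a streamed reply might look like
sample_lines = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world", "done": false}',
    '{"response": "", "done": true}',
]
print(join_streamed_chunks(sample_lines))
```

Keeping `stream` disabled, as this notebook does, avoids this extra step at the cost of waiting for the whole answer before anything is shown.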

## 4. Making a Call to the LLM

Now that we have the prompts prepared, we’ll send the request to the LLM endpoint and retrieve the response.

In [5]:
import requests
import json


def send_request_to_llm(payload: dict, config: dict):
    try:
        response = requests.post(
            config["model_endpoint"],
            headers=config["headers"],
            data=json.dumps(payload),
            timeout=30,
        )
        response.raise_for_status()
        return response.json()
    except requests.RequestException as e:
        return {"error": str(e)}


# Send the request
response = send_request_to_llm(payload=payload, config=llm_config)
print("LLM response:", response)

LLM response: {'model': 'llama3.1:latest', 'created_at': '2024-09-11T13:25:34.054583Z', 'response': 'The key features of Large Language Models (LLMs) include:\n\n1. **Deep Learning Architecture**: LLMs are based on deep learning architectures, typically using Transformer models, which consist of self-attention mechanisms and fully connected layers.\n2. **Multi-Layer Perceptron (MLP)**: The core component of an LLM is a series of interconnected neural networks (MLPs) that process input text sequences one token at a time.\n3. **Self-Attention Mechanism**: This mechanism allows the model to weigh importance to different parts of the input sequence and generate context-dependent representations.\n4. **Parallel Processing**: LLMs can process multiple tokens simultaneously, making them computationally efficient for long input sequences.\n5. **Contextual Understanding**: LLMs capture contextual relationships between words in a sentence or passage, enabling them to better understand nuances an

This function sends the prompt to the LLM and handles any errors that might occur during the request.

## 5. Processing the LLM Response

Finally, let's process the response and display the relevant information in a user-friendly format.

In [6]:
def process_response(response: dict):
    if "error" in response:
        return f"Error: {response['error']}"
    return response.get("response", "No response from the model")

# Process the LLM response
processed_response = process_response(response)
print("Processed response:", processed_response)

Processed response: The key features of Large Language Models (LLMs) include:

1. **Deep Learning Architecture**: LLMs are based on deep learning architectures, typically using Transformer models, which consist of self-attention mechanisms and fully connected layers.
2. **Multi-Layer Perceptron (MLP)**: The core component of an LLM is a series of interconnected neural networks (MLPs) that process input text sequences one token at a time.
3. **Self-Attention Mechanism**: This mechanism allows the model to weigh importance to different parts of the input sequence and generate context-dependent representations.
4. **Parallel Processing**: LLMs can process multiple tokens simultaneously, making them computationally efficient for long input sequences.
5. **Contextual Understanding**: LLMs capture contextual relationships between words in a sentence or passage, enabling them to better understand nuances and subtleties of language.
6. **Large Vocabulary Size**: LLMs are trained on vast amount

## Using Tools

In this section, we will explore how the LLM can use external tools to enhance its capabilities. We'll begin by adding and configuring a search tool to allow the LLM to perform real-time searches.

![image](images/llm_tool_flow.png)

### Key Steps:

1. **User Accesses Jupyter Notebook**:
   - The interaction begins with the user providing input or a prompt in the Jupyter Notebook environment.
   - The notebook acts as the intermediary, handling the user’s request.

2. **Notebook Sends Prompt to LLM**:
   - Once the user submits their input, the notebook sends the system and user prompts to the LLM. In this example, the LLM is served locally (e.g., by Ollama) and reached from the Python environment.

3. **LLM Processes and Responds**:
   - The LLM processes the prompt and generates a response. As part of this, it decides whether external real-time data or tools are required for the request.

4. **LLM Invokes External Tools**:
   - If the LLM determines that real-time data is needed (such as search results), it replies with a request for the appropriate tool (e.g., DuckDuckGo), which the Python environment executes on its behalf.

5. **Tool Accesses External Environment**:
   - The tool queries the external environment (e.g., DuckDuckGo for real-time search results).

6. **External Environment Sends Response**:
   - The external environment (e.g., a search engine or API) responds to the tool with the necessary data, which is then passed back to the notebook for further processing.

7. **Notebook Processes Final Response**:
   - The notebook receives the response from either the LLM directly (if no external data was needed) or from the tool. The final response is then formatted and returned to the user.
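
The branching in steps 3 to 7 can be sketched as a small routing function. This is a simplified illustration, assuming the model signals a tool call with a JSON object containing `action` and `action_input` keys (the convention this notebook adopts in its system prompt later). `route_model_output` and `fake_search` are hypothetical stand-ins, so no network access or real search tool is involved.

```python
import json


def route_model_output(model_output, tools):
    """Return ('tool_result', ...), ('answer', ...), or ('error', ...) for a model reply."""
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return ("answer", model_output)  # plain text: no tool needed

    if not isinstance(parsed, dict) or not isinstance(parsed.get("action"), str):
        return ("answer", model_output)  # JSON, but not a tool request

    tool = tools.get(parsed["action"])
    if tool is None:
        return ("error", f"Unknown tool: {parsed['action']}")
    return ("tool_result", tool(parsed.get("action_input", {})))


def fake_search(action_input):
    """Stand-in for a real search tool."""
    return f"results for: {action_input['query']}"


tools_by_name = {"fake_search": fake_search}

print(route_model_output("Paris is the capital of France.", tools_by_name))
print(
    route_model_output(
        '{"action": "fake_search", "action_input": {"query": "llm"}}', tools_by_name
    )
)
```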

## 6. Switching to an Instruct Model

To better align the LLM’s behavior with task-based instructions, we will switch to an "instruct" model, which is optimized for handling more directive inputs.

In [7]:
# Example configuration
llm_config = setup_llm_model(model="llama3.1:8b-instruct-fp16")
print("Model configuration:", llm_config)

Model configuration: {'model_endpoint': 'http://localhost:11434/api/generate', 'model': 'llama3.1:8b-instruct-fp16', 'temperature': 0.0, 'headers': {'Content-Type': 'application/json'}, 'stop': None}


## 7. DuckDuckGo Search Tool

### Installing the DuckDuckGo Search Tool

To enable the LLM to perform real-time searches, we need the `duckduckgo-search` library together with the `langchain_community` package, which wraps it as a tool. This will allow the agent to search the web using DuckDuckGo.


In [8]:
%pip install langchain_community duckduckgo-search

Note: you may need to restart the kernel to use updated packages.


### Setting Up the DuckDuckGo Search Tool

After installation, we will configure the tool by initializing the DuckDuckGo search functionality. This will be the primary tool for real-time search queries in this example.

In [9]:
from langchain_community.tools import DuckDuckGoSearchRun

# Initialize DuckDuckGo Search Tool
duckduckgo_search = DuckDuckGoSearchRun()

# Verify tool name
print("Search Tool Name:", duckduckgo_search.name)

# Adding the search tool to the list of available tools
tools = [duckduckgo_search]

Search Tool Name: duckduckgo_search


In [10]:
# Render tool description to ensure it's ready for LLM usage
from langchain.tools.render import render_text_description_and_args

# Use the tool object created in the previous cell
tools_name = duckduckgo_search.name

# str.format() inserts values verbatim, so braces in the description need no escaping
tools_description = render_text_description_and_args(tools)
print("Tools Name:\n", tools_name)
print("Tools Description:\n", tools_description)

## 8. Modifying the System Prompt for Tool Integration

Now that the DuckDuckGo search tool is set up, we need to rewrite the system prompt to provide tool usage instructions. The LLM will reference this prompt when deciding whether to call the tool.

### Updated System Prompt:

The updated system prompt provides tool access instructions, explaining how to call the tool and formatting function calls as needed. This ensures that the LLM understands when and how to interact with the DuckDuckGo search tool.

In [None]:
# Updated System Prompt with Tool Instructions
sys_prompt = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Tools: {tools_name}
Cutting Knowledge Date: December 2023
Today's Date: 23 July 2024

---

You are an intelligent assistant designed to handle various tasks, including answering questions, providing summaries, and performing detailed analyses. All outputs must strictly be in JSON format.

---

## Tools
You have access to a variety of tools to assist in completing tasks. You are responsible for determining the appropriate sequence of tool usage to break down complex tasks into subtasks when necessary.

The available tools include:

{tools_description}

# Tool Instructions
- Always use the available tools when asked for real-time or updated information.
- If you choose to call a function ONLY reply in the following format:

{{
  "action": "Specify the tool you want to use.",
  "action_input": {{ # Provide valid JSON input for the action, ensuring it matches the tool’s expected format and data types.
    "key": "Value inputs to the tool in valid JSON format."
  }}
}}

Example:

{{
  "action": "duckduckgo_search",
  "action_input": {{
    "query": "Key features of large language models"
  }} 
}} 

Reminder:
- Do not include additional metadata such as `title`, `description`, or `type` in the `action_input`
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line

You are a helpful assistant.<|eot_id|>
"""

### Format the System Prompt

Now we will insert the tool descriptions and finalize the system prompt by formatting it to include the correct tool name and description. This prompt will guide the LLM in calling the DuckDuckGo search tool when necessary.

In [None]:
formatted_sys_prompt = sys_prompt.format(
    tools_name=tools_name, tools_description=tools_description
)
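
The doubled braces in the system prompt matter because `str.format()` treats single braces as placeholders. The short sketch below, using a made-up two-line template and a made-up tool description, shows both sides of that rule: `{{` and `}}` in the template come out as literal braces, while braces inside a substituted value are inserted unchanged.

```python
# Literal braces in the template must be doubled; placeholders use single braces
template = 'Tools: {tools_name}\nReply as: {{"action": "...", "action_input": {{"query": "..."}}}}'
rendered = template.format(tools_name="duckduckgo_search")
print(rendered)

# Braces inside a substituted value are not interpreted again
description = "duckduckgo_search, args: {'query': {'type': 'string'}}"  # made-up description
inserted = "{tools_description}".format(tools_description=description)
print(inserted)
```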

## 9. New Request

We will now make a new request, asking the LLM for the most recent Large Language Models. This time, the system prompt tells the LLM how to request the DuckDuckGo search tool when it needs real-time information.

In [None]:
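user_request = "What are the most recent Large Language Models?"
# Rebuild the user prompt so it carries the new request
user_prompt = (
    f"""<|start_header_id|>user<|end_header_id|>\n\n{user_request}<|eot_id|>"""
)

# Prepare the payload
payload = prepare_payload(
    user_prompt=user_prompt, sys_prompt=formatted_sys_prompt, config=llm_config
)

# Send the request first, then process the fresh response
response = send_request_to_llm(payload=payload, config=llm_config)
print("LLM response:", response)

processed_response = process_response(response)
print("Processed response:", processed_response)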

## 10. Processing Tool Actions

After the LLM provides its response, we will process the action request. If the LLM calls the DuckDuckGo search tool, we will execute the function and retrieve the results. The tool's output will then be formatted and displayed.

In [None]:
# The model is asked to reply in JSON; json.loads raises json.JSONDecodeError otherwise
response_dict = json.loads(processed_response)
action = response_dict["action"]
action_input = response_dict["action_input"]

# Look the tool up by name so "not found" is reported once, not once per non-matching tool
tool = next((t for t in tools if t.name == action), None)
if tool is None:
    print(f"Tool {action} not found or unsupported operation.")
else:
    print(tool.name)
    try:
        result = tool.invoke(action_input)
        result_message = f"""<|start_header_id|>ipython<|end_header_id|>\n\n{result}<|eot_id|>"""
        print(result_message)
    except Exception as e:
        print(f"Error executing tool {action}: {str(e)}")

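This cell looks up the requested tool, runs it with the input supplied by the LLM, and prints the result wrapped in an `ipython` message. Errors raised while the tool executes are caught and reported instead of stopping the notebook.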
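
Models do not always return clean JSON; a reply may include extra words before or after the function call. The sketch below shows one defensive way to pull the call out of such a reply before dispatching it. `extract_tool_call` is a hypothetical helper, and taking the span between the first `{` and the last `}` is a simplifying assumption that will not cover every malformed reply.

```python
import json


def extract_tool_call(text):
    """Return (action, action_input) if text contains a valid tool call, else None."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object in the reply

    try:
        candidate = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return None

    if not isinstance(candidate, dict):
        return None
    action = candidate.get("action")
    action_input = candidate.get("action_input")
    if not isinstance(action, str) or not isinstance(action_input, dict):
        return None  # required keys missing or of the wrong type
    return action, action_input


reply = 'Sure! {"action": "duckduckgo_search", "action_input": {"query": "llm"}}'
print(extract_tool_call(reply))
print(extract_tool_call("I can answer that without a tool."))
```

Returning `None` instead of raising lets the caller treat the reply as a direct answer when no valid call is found.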

## Conclusion
 
In this notebook, we've successfully extended the capabilities of the LLM by integrating an external tool (DuckDuckGo search). We covered:

- **LLM and Tool Integration:** How to modify the system prompt for tool access.
- **Tool Setup and Configuration:** Setting up the DuckDuckGo search tool for real-time information retrieval.
- **Making Requests and Handling Responses:** Using the tool during LLM interactions and processing the results.

With these tools in place, you're now equipped to extend the LLM's capabilities even further by adding more tools and refining its behavior. Happy coding!