Starter Tutorial (Using Local LLMs)

This tutorial will show you how to get started building agents with LlamaIndex. We'll start with a basic example and then show how to add RAG (Retrieval-Augmented Generation) capabilities.

We will use BAAI/bge-base-en-v1.5 as our embedding model and Llama 3.1 8B, served through Ollama, as our LLM.

Tip

Make sure you've followed the installation steps first.

Setup

Ollama is a tool to help you get set up with LLMs locally with minimal setup.

Follow the Ollama README to learn how to install it.

To download the Llama 3.1 model, run ollama pull llama3.1.

NOTE: You will need a machine with roughly 32GB of RAM or more.

As explained in our installation guide, llama-index is actually a collection of packages. To use Ollama and Hugging Face models, we need to install those integrations:

In [1]:
%pip install llama-index-llms-ollama llama-index-embeddings-huggingface

Note: you may need to restart the kernel to use updated packages.
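After installing (and restarting the kernel if prompted), it can help to confirm that the integrations are visible to the current environment before running the full example. The following is a minimal sketch, not part of the tutorial itself: the helper names is_importable and check_integrations are hypothetical, and the module names are simply the ones imported in the next cell.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found in the current environment."""
    try:
        # For dotted names, find_spec imports the parent packages first
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. llama_index) is missing or broken
        return False


def check_integrations(module_names: list[str]) -> dict[str, bool]:
    """Map each module name to whether it is importable."""
    return {name: is_importable(name) for name in module_names}


if __name__ == "__main__":
    # Module names taken from the imports used in this tutorial
    required = [
        "llama_index.core",
        "llama_index.llms.ollama",
        "llama_index.embeddings.huggingface",
    ]
    for name, ok in check_integrations(required).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If any entry is reported as missing, rerunning the pip install command above in the same environment is the likely fix.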


In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import asyncio
import os

# Settings control global defaults
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
# Use the model downloaded earlier with: ollama pull llama3.1
Settings.llm = Ollama(model="llama3.1", request_timeout=360.0)

# Create a RAG tool using LlamaIndex
# NOTE: replace this path with the directory that holds your own documents
documents = SimpleDirectoryReader("/home/daimler/workspaces/agents-course-huggingface/chat-neus-catala/data/documents").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True,
    # we can optionally override the embed_model here
    # embed_model=Settings.embed_model,
)
query_engine = index.as_query_engine(
    # we can optionally override the llm here
    # llm=Settings.llm,
)


def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


async def search_documents(query: str) -> str:
    """Useful for answering natural language questions about a personal essay written by Paul Graham."""
    response = await query_engine.aquery(query)
    return str(response)


# Create an enhanced workflow with both tools
agent = AgentWorkflow.from_tools_or_functions(
    [multiply, search_documents],
    llm=Settings.llm,
    system_prompt="""You are a helpful assistant that can perform calculations
    and search through documents to answer questions.""",
)

questions = [
    "What did the author do in college?",
    "What's 7 * 8?",
    "Who bought the startup?",
]
for question in questions:
    # Ask a question about the documents or do calculations.
    # Top-level await works in a notebook; in a script, wrap this loop in an async function.
    response = await agent.run(question)
    print("question:", question)
    print("response:", response)



Parsing nodes:   0%|          | 0/516 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/533 [00:00<?, ?it/s]
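The loop in the cell above relies on top-level await, which notebooks support but plain Python scripts do not. The following is a minimal sketch of how the same loop could be run from a script using asyncio.run. To keep it runnable without Ollama or any documents, fake_agent_run is a hypothetical stand-in for agent.run and simply returns a canned string; in real use you would await agent.run(question) at that point instead.

```python
import asyncio


async def fake_agent_run(question: str) -> str:
    """Hypothetical stand-in for agent.run; returns a canned reply."""
    await asyncio.sleep(0)  # yield to the event loop, as a real async call would
    return f"(answer to: {question})"


async def main(questions: list[str]) -> list[str]:
    """Ask each question in turn and collect the responses."""
    responses = []
    for question in questions:
        response = await fake_agent_run(question)
        print("question:", question)
        print("response:", response)
        responses.append(response)
    return responses


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs main() to completion, then closes the loop
    asyncio.run(main(["What's 7 * 8?"]))
```

Calling asyncio.run from inside a notebook cell raises an error because a loop is already running there, so this pattern is meant for scripts only.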
