![](./assets/icon.jpg)

# Brief Recap

**LlamaIndex** is an open-source framework for building context-augmented, LLM-powered agents (knowledge assistants) and workflows (multi-step processes that combine one or more agents, data connectors, and other tools to complete a task). Context augmentation makes your data available to the LLM so that it can solve the problem at hand. LlamaIndex provides tools to ingest, parse, index, and process your data, and to quickly implement complex workflows that combine data access with LLM prompting. Some of the tools provided by LlamaIndex:

1. Data connectors- APIs, PDFs, SQL, and many more to ingest existing data from its native source and format
2. Data indexes- structure data in a representation that is easy for LLMs to consume
3. Engines- Query Engines for question-answering and Chat Engines for "back and forth" interaction with data
4. Agents- LLM-powered knowledge workers augmented by tools
5. Workflows- combine all the above into event-driven systems that can be deployed as production microservices
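
As a rough illustration of context augmentation, the sketch below ranks a few in-memory documents by word overlap with a question and pastes the best matches into a prompt. It uses only the Python standard library. The names `tokenize`, `retrieve`, and `build_prompt`, the sample documents, and the overlap scoring are assumptions made for this example and are not LlamaIndex APIs; a real index would typically rely on embeddings rather than word overlap.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents that share the most words with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(doc)), doc) for doc in documents]
    # Keep only documents with at least one shared word, best match first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(question: str, context_docs: list[str]) -> str:
    """Place the retrieved documents in front of the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the context above."
    )


# Hypothetical sample data, for illustration only.
documents = [
    "The refund window for online orders is 30 days.",
    "Support is available by email on weekdays.",
    "Gift cards cannot be refunded.",
]
question = "How many days is the refund window?"
prompt = build_prompt(question, retrieve(question, documents))
print(prompt)
```

The resulting `prompt` string is what would be sent to the LLM, so the model answers from the supplied data rather than from its training data alone.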

# Use Cases

* **Question Answering**

    * Perform QA with LLMs through Retrieval-Augmented Generation (RAG)

    * Perform QA using semantic search and summarization techniques over unstructured sources such as text files, PDFs, Notion, and Slack documents. LlamaParse can parse complex documents containing text, tables, charts, images, and footers

    * Query data in a SQL database, CSV file, or other structured formats. This includes text-to-SQL and text-to-pandas

* **Chatbots**

    * Knowledge management and enterprise search

    * Health care and customer support services

    * Virtual assistants for e-commerce and retail

* **Document Understanding and Data Extraction**

    * Read natural language and identify semantically important details such as names, dates, and addresses, and return them in a consistent format

    * Extract structured data from unstructured source materials such as chat logs and conversation transcripts

* **Autonomous Agents**

    * Generate a multimodal report using a multi-agent researcher and writer workflow together with LlamaParse

    * A "text-to-SQL assistant" that can interact with a structured database

    * Agentic RAG- build a context-augmented research assistant over your data that handles not only simple questions but also complex research tasks

    * Build a coding assistant that can operate over code

* **Multi-modal applications**

    * All the core RAG concepts (indexing, retrieval, and synthesis) can be extended to images

    * You can generate structured output with OpenAI's GPT-4V via LlamaIndex. You only need to specify a Pydantic object that defines the structure of the output

    * Retrieval-Augmented Image Captioning- first caption the image with a multi-modal model, then refine the caption by retrieving related passages from a text corpus

* **Fine-tuning**: LlamaIndex supports fine-tuning Llama 2 and cross-encoders, as well as fine-tuning GPT-3.5 to distill GPT-4. Fine-tuned models have multiple use cases, such as:

    * Multilingual Applications- supporting users in multiple languages or dialects

    * Domain-Specific Knowledge Retrieval- legal, medical, or technical fields where accurate and context-sensitive answers are critical

    * Personalized Financial Advisory- chatbots for investment firms providing tailored portfolio suggestions

    * Healthcare and Diagnostics- assisting clinicians or patients with medical information and diagnostics

# Initial Setup

First, download ```ollama``` from [here](https://ollama.com).

Then pull the model used in the examples below:

```ollama pull llama3.1```

Now install LlamaIndex with pip: `pip install llama-index`

and the Ollama integration for LlamaIndex: `pip install llama-index-llms-ollama`
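
Before moving on, it can help to confirm that the pieces above are in place. The following is a minimal sketch using only the standard library; the helper names `module_available` and `check_setup` are made up for this example. It reports whether the `ollama` executable is on `PATH` and whether the two Python packages can be located, without contacting the Ollama server or checking that a model has been pulled.

```python
import importlib.util
import shutil


def module_available(name: str) -> bool:
    """Return True if the module can be located, False otherwise."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package of a dotted name is missing.
        return False


def check_setup() -> dict:
    """Collect simple yes/no checks for the setup steps."""
    return {
        "ollama CLI on PATH": shutil.which("ollama") is not None,
        "llama-index installed": module_available("llama_index.core"),
        "llama-index-llms-ollama installed": module_available("llama_index.llms.ollama"),
    }


for item, ok in check_setup().items():
    print(f"{'OK' if ok else 'MISSING':8}{item}")
```

Any line reported as `MISSING` points back to the corresponding install step.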

## LLM Wrapper

In this section, we see how LlamaIndex lets us interact with LLMs.

In [None]:
# run basic query with Ollama wrapper

from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
resp = llm.complete("Who is Paul Graham?")

In [None]:
print(resp)

* The wrapper is configured with the `llama3.1:latest` model
* The output is similar to what you would get by calling the OpenAI API directly

Ollama supports a JSON mode, which constrains every response to valid JSON. It is enabled by passing `json_mode=True`:

In [None]:
llm = Ollama(model="llama3.1:latest", request_timeout=120.0, json_mode=True)
response = llm.complete(
    "Who is Paul Graham? Output as a structured JSON object."
)
print(str(response))
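
Because JSON mode still returns the object as text, the usual next step is to parse it. A minimal sketch with the standard `json` module follows. Here `sample_reply` is a hypothetical stand-in for `str(response)`, since actual model output will vary, and `parse_json_reply` is a helper name invented for this example.

```python
import json


def parse_json_reply(text: str) -> dict:
    """Parse a reply that is expected to be a JSON object; raise ValueError otherwise."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Reply is valid JSON but not an object")
    return data


# Hypothetical reply text; a real model may choose different keys.
sample_reply = '{"name": "Paul Graham", "known_for": ["Y Combinator", "essays"]}'
parsed = parse_json_reply(sample_reply)
print(parsed["name"])
```

JSON mode guarantees syntax only, not which keys appear, so code that consumes the result should still check for the fields it needs.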

## Using Ollama as an LLM for a Chat Model

**Breakdown**

Initializing the Ollama instance:

```llm = Ollama(model="llama3.1:latest", request_timeout=120.0)```

* This creates an instance of the `Ollama` class, a wrapper for interacting with models that Ollama serves locally (here, one of Meta's Llama chat models).
* `model="llama3.1:latest"` specifies the "llama3.1" model with the `latest` tag.
* `request_timeout=120.0` allows up to 120 seconds for the local server to respond.

**Defining the messages**:

```
messages = [
    ChatMessage(
        role="system", content="You are an expert data scientist"
    ),
    ChatMessage(
        role="user", content="Write a Python script that trains a neural network on simulated data."
    )
]
```

This creates a list of ```ChatMessage``` objects, `messages`, that will be sent to the chat model.

The role identifies who each message comes from.

- ```system``` provides overall instructions or context to the model. Here, it's instructing the model to act as an "expert data scientist."
- ```user``` represents the user's input or query. In this case, it's asking the model to "Write a Python script that trains a neural network on simulated data."

**In essence**,

This sets up a conversation with the "llama3.1" chat model, provides it with a system prompt and a user query, and then prints the model's response.

In [None]:
from llama_index.core.llms import ChatMessage

# list of messages to send to the chat model
messages = [
    ChatMessage(
        role="system", content="You are an expert data scientist"
    ),
    ChatMessage(role="user", content="Write a Python script that trains a neural network on simulated data."),
]

In [None]:
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
resp = llm.chat(messages)
print(resp)
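
Since `chat` receives the whole message list, a follow-up question is asked by appending the model's reply and the next user message to that list and calling `chat` again. The sketch below illustrates this pattern under some assumptions: `ask` is a helper name made up for this example, it relies on the chat response exposing the reply as `.message`, and `EchoLLM` is a fake model so that the code runs without Ollama. If `llama-index` is not installed, a small dataclass stands in for `ChatMessage`.

```python
from types import SimpleNamespace

try:
    from llama_index.core.llms import ChatMessage
except ImportError:
    # Stand-in so the sketch runs without llama-index installed.
    from dataclasses import dataclass

    @dataclass
    class ChatMessage:
        role: str
        content: str


def ask(llm, history, user_text):
    """Append a user message, call llm.chat on the full history, and store the reply."""
    history.append(ChatMessage(role="user", content=user_text))
    reply = llm.chat(history).message
    history.append(reply)
    return reply.content


class EchoLLM:
    """Fake model for illustration: reports how many messages it received."""

    def chat(self, messages):
        text = f"I have seen {len(messages)} message(s)."
        return SimpleNamespace(message=ChatMessage(role="assistant", content=text))


history = [ChatMessage(role="system", content="You are an expert data scientist")]
chat_llm = EchoLLM()  # for real use, swap in the Ollama instance created above
print(ask(chat_llm, history, "Write a Python script that trains a neural network."))
print(ask(chat_llm, history, "Now add early stopping."))
```

The second call sees four messages (system, user, assistant, user), which is how the model gets the context of the earlier exchange.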

## Structured Outputs

Ollama has built-in structured output capabilities, which LlamaIndex exposes by attaching a Pydantic class to the LLM. Below, we see how to get an LLM-generated response that follows the fields of that class, without having to prompt the model for valid JSON ourselves.

In [None]:
from llama_index.core.bridge.pydantic import BaseModel

# define the Pydantic class
class Song(BaseModel):
    """A song with name and artist."""

    name: str
    artist: str

In [None]:
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
sllm = llm.as_structured_llm(Song)

In [None]:
from llama_index.core.llms import ChatMessage

response = sllm.chat([ChatMessage(role="user", content="Name a random song!")])
print(response.message.content)

**In essence**,

We generate a response whose content is a JSON object with keys that match the field names of the Pydantic class (`name` and `artist`).
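
To make the idea of a schema-checked reply concrete without running a model, here is a minimal sketch that checks a JSON string against the two fields of `Song`. It uses a standard-library dataclass as a stand-in for the Pydantic class, `sample_reply` is a hypothetical model reply, and `song_from_json` is a name invented for this example. With Pydantic itself, validation of this kind is built into the model class.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Song:
    """A song with name and artist (dataclass stand-in for the Pydantic class)."""

    name: str
    artist: str


def song_from_json(text: str) -> Song:
    """Build a Song from a JSON object, rejecting missing, extra, or non-string fields."""
    data = json.loads(text)
    expected = {f.name for f in fields(Song)}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError(f"Expected exactly the keys {sorted(expected)}")
    for key, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"Field {key!r} must be a string")
    return Song(**data)


# Hypothetical reply text; the model's actual choice of song will vary.
sample_reply = '{"name": "Yesterday", "artist": "The Beatles"}'
song = song_from_json(sample_reply)
print(song.name, "by", song.artist)
```

Working with a typed object such as `song` instead of raw text means downstream code can use `song.name` and `song.artist` directly.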