![](./assets/icon.jpg)

# Brief Recap

**LlamaIndex** is an open source framework for building context-augmented, LLM-powered agents (knowledge assistants) and workflows (multi-step processes that combine one or more agents, data connectors, and other tools to complete a task). Context augmentation makes your data available to the LLM to solve the problem at hand. LlamaIndex provides tools to ingest, parse, index, and process your data and to quickly implement complex workflows that combine data access with LLM prompting. Some of the tools provided by LlamaIndex:

1. Data connectors- APIs, PDFs, SQL, and many more to ingest existing data from their native source and format
2. Data indexes- structure data in a representation that is easy for LLMs to consume
3. Engines- Query Engines for question-answering and Chat Engines for "back and forth" interaction with data
4. Agents- LLM powered knowledge workers augmented by tools
5. Workflows- combine all the above into event-driven systems that can be deployed as production microservices

# Architecture

![](./assets/flow.png)

Here's the architecture and workflow of a LlamaIndex-powered document processing and question answering system. Let me break down the ```managed indexed```:

* Document Storage: getting your data from where it lives (PDF or a database)
    
    - The system starts with PDF files on the left side
    
    - These files get wrapped in a ```Document``` container

* Chunk storage: storing and managing smaller segments of larger documents

    - create smaller chunks based on text splitting strategies (fixed size splitting or semantic splitting)

    - make the smaller chunks accessible for retrieval by assigning each chunk a unique identifier, metadata, and embeddings

* Vector storage: storing vector representations of the given data

    - generate embeddings from the documents

    - system fetches the most relevant chunks based on similarity search or keyword-based retrieval

    - assign an end-to-end flow to interact with the user (more on this later)
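
The three stages above can be sketched in plain Python. This is a toy illustration under stated assumptions, not LlamaIndex's implementation: the `chunk_text`, `embed`, `cosine`, and `retrieve` helpers are hypothetical, the splitter is a simple fixed-size word window, and the "embedding" is a bag-of-words count standing in for a real embedding model.

```python
import math
import uuid
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[dict]:
    """Fixed-size splitting: slide a window of `chunk_size` words, overlapping by `overlap`."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({
            "id": str(uuid.uuid4()),            # unique identifier per chunk
            "text": " ".join(window),
            "metadata": {"start_word": start},  # where the chunk begins in the document
        })
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase bag-of-words counts, not a real model."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[dict], top_k: int = 1) -> list[dict]:
    """Similarity search: rank chunks by cosine similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return ranked[:top_k]


document = (
    "LlamaIndex loads PDF files into Document objects. "
    "Each document is split into smaller chunks. "
    "Embeddings for every chunk are kept in a vector store."
)
chunks = chunk_text(document)
for chunk in chunks:
    chunk["embedding"] = embed(chunk["text"])

best = retrieve("Where are embeddings kept?", chunks)[0]
print(best["text"])
```

In a real pipeline the retrieved chunk text would then be passed to the LLM as context for answering the query.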

# Use Cases

* **Question Answering**

    * Perform QA with LLMs through Retrieval Augmented Generation

    * Perform QA using semantic search and summarization over unstructured data such as text, PDFs, Notion, and Slack documents. LlamaParse can parse complex documents containing text, tables, charts, images, and footers

    * Query data in a SQL database, CSV file, or other structured formats. This includes text-to-sql and text-to-pandas

* **Chatbots**

    * Knowledge management and enterprise search

    * Health care and customer support services

    * Virtual assistants for e-commerce and retail

* **Document Understanding and Data Extraction**

    * Read natural language and identify semantically important details such as names, dates, addresses and return them in a consistent format

    * Extract structured data from unstructured source materials such as chat logs and conversation transcripts

* **Autonomous Agents**

    * Generate a multimodal report using a multi-agent researcher and writer workflow together with LlamaParse

    * A "text-to-SQL assistant" that can interact with a structured database

    * Agentic RAG- build a context-augmented research assistant over your data that answers not only simple questions but also complex research tasks

    * Build a coding assistant that can operate over code

* **Multi-modal applications**

    * All the core RAG concepts: indexing, retrieval, and synthesis, can be extended into the image setting

    * You can generate structured output with OpenAI GPT-4V via LlamaIndex. The user only needs to specify a Pydantic object that defines the structure of the output

    * Retrieval-Augmented Image Captioning- first caption the image with a multi-modal model, then refine the caption using related text retrieved from a text corpus

* **Fine-tuning**: LlamaIndex allows fine-tuning Llama2, cross-encoders, and GPT-3.5 to distill GPT-4. It has multiple use cases such as:

    * Multilingual Applications- supporting users in multiple languages or dialects

    * Domain-Specific Knowledge Retrieval- legal, medical, or technical fields where accurate and context-sensitive answers are critical

    * Personalized Financial Advisory- chatbots for investment firms providing tailored portfolio suggestions

    * Healthcare and Diagnostics- assisting clinicians or patients with medical information and diagnostics

# Components

* Documents: a container around a data source that stores some text along with other attributes- i) metadata (dictionary of annotations), ii) relationships (dictionary containing relationships to other Documents/Nodes)

* VectorStoreIndex: an index built over a list of Node objects

* Agents: software powered by an LLM that executes a series of steps towards solving a task with the help of a given set of tools

* Query engines: end-to-end flow that takes in a natural language query and returns a response along with reference context retrieved and passed to the LLM

* Chat engines: end-to-end flow for having a conversation with your data
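
To make the Document and Node descriptions above concrete, here is a minimal sketch using plain dataclasses. The `SketchDocument` and `SketchNode` classes and the `split_into_nodes` helper are hypothetical stand-ins, not LlamaIndex's own classes; they only mirror the attributes listed above (text, metadata, relationships), and the one-node-per-sentence split is an assumption made for brevity.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class SketchDocument:
    """Hypothetical container around a data source: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)
    doc_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class SketchNode:
    """Hypothetical chunk of a document with links to related objects."""
    text: str
    metadata: dict = field(default_factory=dict)
    relationships: dict = field(default_factory=dict)
    node_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def split_into_nodes(document: SketchDocument) -> list[SketchNode]:
    """One node per sentence; nodes inherit metadata and link to source and neighbours."""
    sentences = [s.strip() for s in document.text.split(".") if s.strip()]
    nodes = [
        SketchNode(
            text=sentence,
            metadata=dict(document.metadata),            # copy, so nodes do not share one dict
            relationships={"source": document.doc_id},  # link back to the parent document
        )
        for sentence in sentences
    ]
    for prev, nxt in zip(nodes, nodes[1:]):
        prev.relationships["next"] = nxt.node_id
        nxt.relationships["previous"] = prev.node_id
    return nodes


doc = SketchDocument(
    text="Invoices are due in 30 days. Late payments incur a fee.",
    metadata={"file_name": "policy.pdf"},
)
nodes = split_into_nodes(doc)
for node in nodes:
    print(node.text, node.metadata, sorted(node.relationships))
```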

# Implementation

## **Initial Setup**

First, install LlamaIndex with pip: `pip install llama-index`

This is a starter bundle of packages that includes the following OpenAI packages:

1. `llama-index-llms-openai`
2. `llama-index-embeddings-openai`
3. `llama-index-program-openai`
4. `llama-index-question-gen-openai`
5. `llama-index-agent-openai`
6. `llama-index-multi-modal-llms-openai`

By default, LlamaIndex uses OpenAI **gpt-3.5-turbo for text generation** and **text-embedding-ada-002 for retrieval and embeddings**.

## **API Keys**

Sign up and get your API key from the [OpenAI website](https://platform.openai.com/docs/overview).

## Understanding Documents and Nodes

Run the cells below to understand the structure of nodes and documents

In [None]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# creating documents with data loader
documents = SimpleDirectoryReader("YOUR DATA PATH").load_data()
nodes = SentenceSplitter().get_nodes_from_documents(documents)

In [None]:
print(documents)

In [None]:
print(nodes)

You can also customise documents with your own custom metadata. Steps can be found [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_documents/)

# Creating a Question Answering System Using LlamaIndex and OpenAI

In [None]:
# Save your API keys in an env variable
import os

os.environ['OPENAI_API_KEY']="YOUR API KEY"

## Loading and indexing

In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("YOUR DATA PATH").load_data()
index = VectorStoreIndex.from_documents(documents)

The index is held in memory as a set of vector embeddings. It is good practice to persist it to disk so that it does not have to be rebuilt on every run.

In [None]:
from llama_index.core import (
    StorageContext,
    load_index_from_storage,
)

PERSIST_DIR = "YOUR-DIRECTORY"
# check if storage already exists
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("YOUR DATA PATH").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

A further improvement is to put the nodes in a docstore. This lets you define multiple indices over the same underlying docstore instead of duplicating data across indices. Implementing docstores is outside the scope of this lab, but you can follow this [guide](https://docs.llamaindex.ai/en/stable/examples/docstore/DynamoDBDocstoreDemo/).
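
The idea of sharing one docstore across several indices can be sketched without any library. In this illustration the `docstore` dictionary, the sample node data, and the two toy indices (`keyword_index`, `source_index`) are all hypothetical; the point is only that each index stores node IDs, so the node text exists once.

```python
from collections import defaultdict

# Single source of truth: node ID -> node payload (hypothetical sample data).
docstore = {
    "n1": {"text": "Refunds are processed within five days", "source": "policy.pdf"},
    "n2": {"text": "Shipping takes three days", "source": "faq.pdf"},
    "n3": {"text": "Refunds require a receipt", "source": "faq.pdf"},
}

# Index 1: keyword -> node IDs.
keyword_index = defaultdict(set)
# Index 2: source file -> node IDs.
source_index = defaultdict(set)

for node_id, node in docstore.items():
    for word in node["text"].lower().split():
        keyword_index[word].add(node_id)
    source_index[node["source"]].add(node_id)


def lookup(node_ids):
    """Resolve node IDs against the shared docstore, in a stable order."""
    return [docstore[node_id]["text"] for node_id in sorted(node_ids)]


print(lookup(keyword_index["refunds"]))
print(lookup(source_index["faq.pdf"]))
# Both indices point at the same nodes, so their results can be combined by ID.
print(lookup(keyword_index["refunds"] & source_index["faq.pdf"]))
```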

## Create Query Engine

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("Some question about the data should go here")
print(response)

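In [None]:
# stream response (the query engine must be created with streaming=True)
streaming_query_engine = index.as_query_engine(streaming=True)
streaming_response = streaming_query_engine.query("Some question about the data should go here")
streaming_response.print_response_stream()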

## Create Chat Engine

In [None]:
chat_engine = index.as_chat_engine()
# use stream_chat (not chat) to get a token generator
response = chat_engine.stream_chat("Some question about the data should go here")
for token in response.response_gen:
    print(token, end="")

In [None]:
# customize chat engine
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
chat_engine = index.as_chat_engine(chat_mode="openai", llm=llm, verbose=True)
response = chat_engine.chat("Some question about the data should go here")
print(response)

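In [None]:
# stream the reply token by token; response_gen is only available from stream_chat
streaming_response = chat_engine.stream_chat("Some question about the data should go here")
for token in streaming_response.response_gen:
    print(token, end="")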

## Force chat engine to query an index

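- the chat engine wraps the index in a tool named ```query_engine_tool``` under the hood; passing that name as ```tool_choice``` forces the engine to call it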

In [None]:
response = chat_engine.chat(
    "Some question about the data should go here", tool_choice="query_engine_tool"
)
print(response)

# Agents

Structure of basic agent: ```agent = ReActAgent.from_tools([multiply_tool, add_tool], llm=llm, verbose=True)```

In [None]:
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import FunctionTool

In [None]:
# create basic tools
def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product"""
    return a * b

def add(a: float, b: float) -> float:
    """Add two numbers and return the sum"""
    return a + b

# define tool to multiply two numbers
multiply_tool = FunctionTool.from_defaults(fn=multiply)
# define tool to sum two numbers
add_tool = FunctionTool.from_defaults(fn=add)

In [None]:
# define llm
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

In [None]:
# initialize the agent
agent = ReActAgent.from_tools([multiply_tool, add_tool], llm=llm, verbose=True)

In [None]:
# obtain a response
response = agent.chat("What is 20+(2*4)? Use a tool to calculate every step.")
print(response)
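
To show what the agent does conceptually, here is a minimal sketch of a tool-calling loop in which a scripted planner stands in for the LLM. Everything here (`TOOLS`, `scripted_planner`, `run_agent`) is hypothetical and is not how `ReActAgent` is implemented; a real agent asks the LLM to choose the next action based on the tool descriptions and earlier observations.

```python
from typing import Callable


def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    return a * b


def add(a: float, b: float) -> float:
    """Add two numbers and return the sum."""
    return a + b


# Tool registry: name -> callable.
TOOLS: dict[str, Callable[[float, float], float]] = {"multiply": multiply, "add": add}


def scripted_planner(observations: list[float]) -> dict:
    """Stand-in for the LLM (assumption): picks the next action for 20 + (2 * 4)."""
    if len(observations) == 0:
        return {"action": "multiply", "args": (2, 4)}
    if len(observations) == 1:
        return {"action": "add", "args": (20, observations[0])}
    return {"action": "finish", "answer": observations[-1]}


def run_agent(max_steps: int = 5) -> float:
    """Loop: ask the planner for an action, run the tool, record the observation."""
    observations: list[float] = []
    for _ in range(max_steps):
        step = scripted_planner(observations)
        if step["action"] == "finish":
            return step["answer"]
        result = TOOLS[step["action"]](*step["args"])
        print(f"Action: {step['action']}{step['args']} -> Observation: {result}")
        observations.append(result)
    raise RuntimeError("Agent did not finish within max_steps")


answer = run_agent()
print("Answer:", answer)
```

The `max_steps` cap is a design choice in this sketch: it prevents a planner that never emits a finish action from looping forever.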