# From RAG to riches: Build an AI document interrogation app in 30 mins

Philip Meier | PyData Global 2023 | Friday, December 8, 2023

<img src="images/quansight.svg" width=70%>

## Retrieval-Augmented Generation (RAG): Make LLMs more useful

LLMs are trained on vast but fixed datasets.

![alttext](images/ragna-chatgpt.png)

There are two ways to inject new data:

1. Fine-tuning the model on the data
2. Adding the data to the prompt

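RAG is a special case of option 2: relevant passages are retrieved from your documents and added to the prompt before it is sent to the LLM.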

![alttext](images/rag.png)
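Here is a minimal, self-contained sketch of the retrieve-then-augment loop. Word overlap stands in for the embedding similarity a vector database would compute. The functions `retrieve` and `build_prompt` are hypothetical illustrations and are not part of Ragna's API.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a real system would use the embedding model's tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    # Score each chunk by how many question words it shares (a stand-in for
    # vector similarity) and keep the k best; ties keep their original order.
    q = set(tokenize(question))
    ranked = sorted(chunks, key=lambda c: len(q & set(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, sources: list[str]) -> str:
    # "Augment" the prompt: put the retrieved text in front of the question.
    context = "\n\n".join(f"[{i}] {s}" for i, s in enumerate(sources, 1))
    return (
        "Answer the question using only the sources below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


chunks = [
    "Ragna is an open source RAG framework built by Quansight.",
    "Ford reported revenue in its annual 10-K filing.",
    "Panel is a Python library for building web apps.",
]
question = "Who built the Ragna framework?"
prompt = build_prompt(question, retrieve(chunks, question, k=1))
print(prompt)
```

The resulting string is what would be sent to the LLM. The model never sees the irrelevant chunks, which keeps the prompt short.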

## Ragna

Ragna is an extensible, queue-backed framework (the queue backend is going to change [soon](https://github.com/Quansight/ragna/pull/205)!) that provides:

- A **Python API designed for experimentation** that lets you mix and match the different components of a RAG pipeline (LLMs, vector databases, tokenization strategies, embedding models, etc.) to see how they affect performance and accuracy.
- A **REST API that allows you to build RAG-based web applications** or query Ragna from other clients like Slack, Mattermost, etc. It wraps the Python API and provides a consistent developer experience, so you can scale quickly.
- A fully featured [Panel](https://panel.holoviz.org/)-based **GUI to select and configure LLMs**, upload documents, and chat with the LLM. Use it as an out-of-the-box solution or as a reference for building custom low-code web applications.
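To make "mix and match" concrete, here is a plain-Python sketch of a pipeline written against small interfaces, which lets components be swapped freely. The `Storage` and `Model` protocols and every class below are hypothetical. They play the same roles as Ragna's source storages and assistants, but they are not its actual base classes.

```python
from typing import Protocol


class Storage(Protocol):
    # Hypothetical interface for a retrieval component.
    def store(self, chunks: list[str]) -> None: ...
    def retrieve(self, prompt: str) -> list[str]: ...


class Model(Protocol):
    # Hypothetical interface for an LLM component.
    def answer(self, prompt: str, sources: list[str]) -> str: ...


class KeywordStorage:
    """Returns every chunk that shares at least one word with the prompt."""

    def __init__(self) -> None:
        self.chunks: list[str] = []

    def store(self, chunks: list[str]) -> None:
        self.chunks.extend(chunks)

    def retrieve(self, prompt: str) -> list[str]:
        words = set(prompt.lower().split())
        return [c for c in self.chunks if words & set(c.lower().split())]


class EverythingStorage(KeywordStorage):
    """Ignores the prompt and returns all chunks (a useful baseline)."""

    def retrieve(self, prompt: str) -> list[str]:
        return list(self.chunks)


class CountingModel:
    """Stand-in for an LLM: reports how much context it was given."""

    def answer(self, prompt: str, sources: list[str]) -> str:
        return f"{len(sources)} source(s) for: {prompt}"


def run(storage: Storage, model: Model, chunks: list[str], prompt: str) -> str:
    # The pipeline depends only on the interfaces, so any combination works.
    storage.store(chunks)
    return model.answer(prompt, storage.retrieve(prompt))


chunks = ["ragna is a rag framework", "panel builds web apps"]
for storage in (KeywordStorage(), EverythingStorage()):
    print(type(storage).__name__, "->", run(storage, CountingModel(), chunks, "what is ragna"))
```

Because you swap a component without touching `run`, you can compare retrieval strategies or models side by side on the same documents and questions.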

Install it with

```shell
pip install 'ragna[all]'
```

In [1]:
import ragna

ragna.__version__

'0.1.2'

### Python API

In [2]:
documents = [
    "documents/ford-10k.pdf",
    "documents/gm-10k.pdf",
    "documents/ragna.txt",
]

with open(documents[-1]) as file:
    print(file.read())

Ragna is a new open source project built by Quansight. It is designed to allow organizations to explore the power of Retrieval-augmented generation (RAG) based AI tools. Ragna provides an intuitive API for quick experimentation and built-in tools for creating production-ready applications allowing you to quickly leverage Large Language Models (LLMs) for your work.

At its core, Ragna is a plugin-based framework with a scalable queue based backend that provides:

 - Python API designed for experimentation that allows you to explore and test different LLMs, vector databases and embedding models quickly in Python.

- A REST API that allows you to build custom RAG-based web applications for your particular needs.

- A fully featured web application built with Panel (https://panel.holoviz.org) to select and configure LLMs, upload documents, and chat with the LLM. Designed for use as an out-of-the-box solution or as a reference to build custom web applications.

The Ragna website is https://

In [3]:
from ragna import Rag, source_storages, assistants

In [4]:
from ragna.core import SourceStorage, Assistant

for base_cls, module in [
    (SourceStorage, source_storages),
    (Assistant, assistants),
]:
    print(f"{base_cls.__name__}:")
    for cls in module.__dict__.values():
        if isinstance(cls, type) and issubclass(cls, base_cls):
            print(f"  - {cls.display_name()}")

SourceStorage:
  - Chroma
  - Ragna/DemoSourceStorage
  - LanceDB
Assistant:
  - Anthropic/claude-2
  - Anthropic/claude-instant-1
  - Ragna/DemoAssistant
  - MosaicML/mpt-7b-instruct
  - MosaicML/mpt-30b-instruct
  - OpenAI/gpt-4
  - OpenAI/gpt-3.5-turbo-16k
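The loop above discovers components by scanning a module's namespace for subclasses of a base class. Below is a self-contained sketch of the same pattern on a throwaway module. The `Base`, `Alpha`, and `Beta` classes are hypothetical. The extra `obj is not base_cls` guard matters because `issubclass(cls, cls)` is `True`: if a module re-exported its base class, the unguarded loop would list the base class too.

```python
import types


class Base:
    @classmethod
    def display_name(cls) -> str:
        return cls.__name__


class Alpha(Base): ...
class Beta(Base): ...


# Build a throwaway module whose namespace holds classes and other objects.
plugins = types.ModuleType("plugins")
plugins.Alpha = Alpha
plugins.Beta = Beta
plugins.helper = len  # not a class, so it is skipped
plugins.Base = Base   # passes issubclass(), so it must be excluded explicitly


def find_components(module: types.ModuleType, base_cls: type) -> list[str]:
    return [
        obj.display_name()
        for obj in vars(module).values()
        if isinstance(obj, type) and issubclass(obj, base_cls) and obj is not base_cls
    ]


print(find_components(plugins, Base))  # ['Alpha', 'Beta']
```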


In [5]:
import os
import dotenv

assert dotenv.load_dotenv()
assert "OPENAI_API_KEY" in os.environ
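Python skips `assert` statements when run with `-O`, so outside a notebook you may prefer an explicit check with a readable message. Here is a minimal sketch. `missing_env` is a hypothetical helper. It only inspects the current process environment, which is where `dotenv.load_dotenv()` puts the variables it reads from `.env`.

```python
import os


def missing_env(*names: str) -> list[str]:
    """Return the required environment variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


missing = missing_env("OPENAI_API_KEY")
if missing:
    print("Set these in your environment or .env file:", ", ".join(missing))
```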

In [6]:
rag = Rag()
chat = rag.chat(
    documents=documents,
    source_storage=source_storages.Chroma,
    assistant=assistants.Gpt35Turbo16k,
)

In [7]:
await chat.prepare()

Message(content='How can I help you with the documents?', role=<MessageRole.SYSTEM: 'system'>, sources=[])
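Before a vector store such as Chroma can return relevant passages, documents are typically split into overlapping chunks and each chunk is embedded. Here is a minimal sketch of fixed-size chunking with overlap. It counts words for simplicity, whereas real source storages usually count tokens. The function `chunk_words` and its `size`/`overlap` parameters are hypothetical, not Ragna's settings.

```python
def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = "Ragna is a new open source project built by Quansight to explore RAG based AI tools"
for chunk in chunk_words(text, size=6, overlap=2):
    print(chunk)
```

The overlap makes a sentence that straddles a chunk boundary still appear intact in at least one chunk, at the cost of storing some text twice.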

In [8]:
answer = await chat.answer("What is the Ragna framework?")
print(answer)

The Ragna framework is an open source project developed by Quansight. It is designed to enable organizations to explore the capabilities of Retrieval-augmented generation (RAG) based AI tools. Ragna provides a plugin-based framework with a scalable queue-based backend. It offers a Python API for experimentation with different Large Language Models (LLMs), vector databases, and embedding models. Additionally, Ragna provides a REST API for building custom RAG-based web applications and a fully featured web application built with Panel for selecting and configuring LLMs, uploading documents, and interacting with the LLM. It can be used as an out-of-the-box solution or as a reference for building custom web applications.


In [9]:
for idx, source in enumerate(answer.sources, 1):
    print(f"{idx}. {source.document.name}, {source.location}")

1. ragna.txt, 
2. ford-10k.pdf, 173, 174
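Several retrieved chunks often come from the same document. Here is a small sketch that groups `(document, page)` hits into one citation line per document, in the order the documents were first retrieved. The `Hit` tuple and the `citations` function are hypothetical and are not Ragna's `Source` class. Documents without pages, such as plain text, get no location.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    document: str
    page: Optional[int]  # None for formats without pages, e.g. plain text


def citations(hits: list[Hit]) -> list[str]:
    # defaultdict preserves insertion order, so documents keep retrieval order.
    pages: dict[str, set[int]] = defaultdict(set)
    for hit in hits:
        doc_pages = pages[hit.document]  # creates an empty set on first sight
        if hit.page is not None:
            doc_pages.add(hit.page)
    lines = []
    for idx, (doc, doc_pages) in enumerate(pages.items(), 1):
        location = ", ".join(str(p) for p in sorted(doc_pages))
        lines.append(f"{idx}. {doc}, {location}" if location else f"{idx}. {doc}")
    return lines


hits = [
    Hit("ragna.txt", None),
    Hit("ford-10k.pdf", 174),
    Hit("ford-10k.pdf", 173),
    Hit("ford-10k.pdf", 174),
]
print("\n".join(citations(hits)))
```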


### REST API / Web UI

[`tryragna.ipynb`](tryragna.ipynb)

## Extending Ragna with a local LLM

[`local_llm.py`](local_llm.py)

### Python API

In [10]:
import local_llm

assert local_llm.Airoboros.is_available()
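`is_available()` checks up front whether a component can run, for example whether its optional dependencies are installed. Here is a minimal sketch of one way to implement such a check without importing heavy packages. The `LocalModel` class and its `_required_packages` attribute are hypothetical and are not taken from `local_llm.py` or Ragna.

```python
import importlib.util


class LocalModel:
    """Hypothetical component that needs optional packages at runtime."""

    # Top-level packages this component would import (illustrative only).
    _required_packages: tuple[str, ...] = ("torch", "transformers")

    @classmethod
    def is_available(cls) -> bool:
        # find_spec returns None when a top-level package cannot be found,
        # and it does not execute the package's import code.
        return all(
            importlib.util.find_spec(name) is not None
            for name in cls._required_packages
        )


class StdlibOnly(LocalModel):
    _required_packages = ("json", "asyncio")


class NeedsMissingPackage(LocalModel):
    _required_packages = ("surely_not_an_installed_package",)


print(StdlibOnly.is_available())           # True
print(NeedsMissingPackage.is_available())  # False
```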

In [11]:
async with rag.chat(
    documents=documents,
    source_storage=source_storages.Chroma,
    assistant=local_llm.Airoboros,
) as chat:
    print(await chat.answer("What is the Ragna framework?"))

Skipping module injection for FusedLlamaMLPForQuantizedModel as currently not supported with use_triton=False.


Ragna is a framework designed to allow organizations to explore the power of Retrieval-augmented generation (RAG) based AI tools. It provides an intuitive API for quick experimentation and built-in tools for creating production-ready applications.
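Unlike cells [6] and [7], this cell has no explicit `await chat.prepare()`. The `async with` block appears to handle preparation. Here is a minimal sketch of how an object can do that through `__aenter__`, using a hypothetical `ToyChat` class rather than Ragna's `Chat`. Top-level `await` works in Jupyter. In a plain script, wrap the code in a coroutine and run it with `asyncio.run`, as shown at the end.

```python
import asyncio


class ToyChat:
    """Hypothetical chat object that prepares itself on entering `async with`."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = documents
        self.prepared = False

    async def prepare(self) -> None:
        await asyncio.sleep(0)  # stand-in for storing documents asynchronously
        self.prepared = True

    async def answer(self, prompt: str) -> str:
        if not self.prepared:
            raise RuntimeError("call prepare() first")
        return f"answer to {prompt!r} from {len(self.documents)} document(s)"

    async def __aenter__(self) -> "ToyChat":
        await self.prepare()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        return None  # nothing to clean up here; exceptions still propagate


async def main() -> str:
    async with ToyChat(["ragna.txt"]) as chat:
        return await chat.answer("What is Ragna?")


print(asyncio.run(main()))
```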


### REST API / Web UI

```shell
$ ragna init
```

[`ragna.toml`](ragna.toml)

```shell
$ ragna ui --config ragna.toml
```