# Building a RAG E-book Librarian Using LlamaIndex

In this example, we will build a RAG-based "librarian" for a local library.

We want our librarian to be lightweight and to run locally with minimal dependencies. We will therefore rely on open-source software wherever possible and favor models that can run locally on typical hardware.

We will use the following:
* LlamaIndex - a data framework for LLM-based applications that is designed specifically for RAG (unlike the more general-purpose LangChain)
* Ollama - a user-friendly solution for running LLMs locally
* `BAAI/bge-base-en-v1.5` embedding model - small in size, with good performance
* Llama 2, running via Ollama

## Setup

In [None]:
!pip install -qU llama-index EbookLib html2text llama-index-embeddings-huggingface llama-index-llms-ollama

### Ollama installation

In [None]:
!apt install pciutils lshw

In [None]:
!curl -fsSL https://ollama.com/install.sh | sh

Run the Ollama service in the background:

In [None]:
# the trailing & tells the shell to run the server in the background
get_ipython().system_raw('ollama serve &')

In [None]:
# pull llama 2 from the Ollama library
!ollama pull llama2

## Test library setup

We need to create a test "library". We assume that our library is a **nested directory of `.epub` files**.

In [None]:
!mkdir -p "./test/library/jane-austen"
!mkdir -p "./test/library/victor-hugo"
!wget https://www.gutenberg.org/ebooks/1342.epub.noimages -O "./test/library/jane-austen/pride-and-prejudice.epub"
!wget https://www.gutenberg.org/ebooks/135.epub.noimages -O "./test/library/victor-hugo/les-miserables.epub"
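
Before loading anything, it can be worth confirming that the downloads landed where we expect. The following is a minimal standard-library sketch, not part of LlamaIndex: `find_epubs` is a hypothetical helper that walks a directory tree and collects `.epub` files, roughly the set of files a recursive, extension-filtered load would be expected to pick up.

```python
from pathlib import Path


def find_epubs(root):
    """Return every .epub file under `root`, including nested directories."""
    root = Path(root)
    if not root.is_dir():
        # A missing directory simply yields no books.
        return []
    # rglob("*") walks the tree recursively; the suffix check is case-insensitive.
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".epub"
    )


# List each book with its size so empty or failed downloads stand out.
for path in find_epubs("./test/library"):
    print(f"{path} ({path.stat().st_size / 1024:.0f} KiB)")
```

A file that shows up with a size of 0 KiB usually means the corresponding download failed and should be repeated.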

## RAG with LlamaIndex

RAG with LlamaIndex consists of the following broad phases:
1. **Loading** - we tell LlamaIndex where our data lives and how to load it
2. **Indexing** - we augment our loaded data to facilitate querying, e.g., with vector embeddings
3. **Querying** - we configure an LLM to act as the query interface for our indexed data

### Loading

In [None]:
from llama_index.core import SimpleDirectoryReader

loader = SimpleDirectoryReader(
    input_dir='./test',
    recursive=True,
    required_exts=['.epub']
)

documents = loader.load_data()

This converts our e-books into a list of `Document` objects for LlamaIndex to work with.

Note that the documents here **have NOT been chunked at this stage**.

### Indexing

Indexing allows our RAG pipeline to look up the context relevant to a query and pass it to the LLM to **augment** its generated response. This is also where document chunking takes place.

`VectorStoreIndex` is the default entry point for indexing in LlamaIndex. It uses a simple, in-memory dictionary to store the embeddings, but LlamaIndex also supports a wide variety of vector storage solutions that we can scale to later.
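
To make the idea of an in-memory dictionary store concrete, here is a toy sketch in plain Python. It is an illustration only and not LlamaIndex's implementation: `ToyVectorStore`, its methods, and the hand-written three-dimensional "embeddings" are all hypothetical. Real embeddings come from the embedding model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical stand-in for an in-memory vector store (not LlamaIndex code)."""

    def __init__(self):
        self._entries = {}  # chunk_id -> (embedding, text)

    def add(self, chunk_id, embedding, text):
        self._entries[chunk_id] = (embedding, text)

    def query(self, query_embedding, top_k=2):
        """Return the top_k (score, chunk_id, text) tuples, best match first."""
        scored = [
            (cosine_similarity(query_embedding, emb), chunk_id, text)
            for chunk_id, (emb, text) in self._entries.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
# Made-up vectors: the first axis loosely means "Austen", the second "Hugo".
store.add("austen-1", [0.9, 0.1, 0.0], "Elizabeth Bennet meets Mr. Darcy.")
store.add("hugo-1", [0.1, 0.9, 0.0], "Jean Valjean leaves prison.")
store.add("hugo-2", [0.0, 0.8, 0.2], "Javert pursues Valjean.")

for score, chunk_id, text in store.query([1.0, 0.0, 0.0], top_k=1):
    print(f"{chunk_id} ({score:.2f}): {text}")
```

The brute-force scan over every stored vector is fine for a small library like ours. A dedicated vector database becomes attractive once the number of chunks grows large.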

By default, LlamaIndex uses a chunk size of 1024 and a chunk overlap of 20.
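
To illustrate what chunk size and chunk overlap mean, here is a simplified sliding-window sketch. It is not how LlamaIndex splits text internally (its splitters are sentence-aware and count tokens with a tokenizer); `chunk_tokens` is a hypothetical helper that treats whitespace-separated words as tokens and reuses the default values quoted above.

```python
def chunk_tokens(tokens, chunk_size=1024, chunk_overlap=20):
    """Split a token list into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# Tiny values so the overlap is easy to see; real chunks are far larger.
words = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
).split()

for chunk in chunk_tokens(words, chunk_size=8, chunk_overlap=2):
    print(" ".join(chunk))
```

Each printed chunk begins with the last two words of the previous one. The overlap exists so that a sentence cut at a chunk boundary still appears with some of its surrounding context in at least one chunk.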

We will use the [`BAAI/bge-base-en-v1.5`](https://huggingface.co/BAAI/bge-base-en-v1.5) model to generate our embeddings. By default, LlamaIndex uses OpenAI for embeddings. However, it also supports loading embedding models from HuggingFace through the `HuggingFaceEmbedding` class.

In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embedding_model = HuggingFaceEmbedding(model_name='BAAI/bge-base-en-v1.5')

Then we pass it into `VectorStoreIndex` as our embedding model to override the OpenAI default.

In [None]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embedding_model
)

### Querying

We will use Llama 2 for the purposes of this recipe.

The Ollama server needs to be running at this point. If it was not already started in the background during setup, we run `ollama serve` in a separate terminal.

Then, we hook Llama 2 up to LlamaIndex and use it as the basis of our query engine.

In [None]:
from llama_index.llms.ollama import Ollama

llama = Ollama(
    model='llama2',
    request_timeout=40
)

query_engine = index.as_query_engine(llm=llama)

## Final Result

With all of that in place, our basic RAG librarian is ready and we can start asking questions about our library.

In [None]:
print(
    query_engine.query(
        "What are the titles of all the books available? Show me the context used to derive your answer."
    )
)

In [None]:
print(query_engine.query("Who is the main character of 'Pride and Prejudice'?"))
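
Asking the LLM to show its context depends on the model reporting it faithfully. LlamaIndex response objects also carry the retrieved chunks in a `source_nodes` list, where each entry has a `score` and a `node` with `metadata` and `get_content()`. The sketch below uses those attributes; `format_sources` is a hypothetical helper, and the `file_name` metadata key is an assumption based on what `SimpleDirectoryReader` attaches by default. A stand-in response built from `SimpleNamespace` lets the helper be tried without a running LLM.

```python
from types import SimpleNamespace


def format_sources(response, max_chars=200):
    """Summarize the retrieved chunks attached to a query response."""
    lines = []
    for rank, hit in enumerate(response.source_nodes, start=1):
        # "file_name" is assumed to be present in the chunk metadata.
        file_name = hit.node.metadata.get("file_name", "<unknown file>")
        score = "n/a" if hit.score is None else f"{hit.score:.3f}"
        # Collapse whitespace so each snippet fits on one line.
        snippet = " ".join(hit.node.get_content().split())[:max_chars]
        lines.append(f"[{rank}] {file_name} (score: {score}): {snippet}")
    return "\n".join(lines)


# Stand-in objects that mimic the shape of a real response (illustration only).
fake_hit = SimpleNamespace(
    score=0.8123,
    node=SimpleNamespace(
        metadata={"file_name": "pride-and-prejudice.epub"},
        get_content=lambda: "It is a truth\nuniversally acknowledged...",
    ),
)
fake_response = SimpleNamespace(source_nodes=[fake_hit])
print(format_sources(fake_response))

# With the real pipeline (assumes `query_engine` from the cells above):
# response = query_engine.query("Who is the main character of 'Pride and Prejudice'?")
# print(format_sources(response))
```

Inspecting the sources this way also makes clear why a question such as "What are the titles of all the books available?" is hard for a basic RAG setup: only a handful of retrieved chunks reach the LLM, so it cannot see the whole library at once.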