# Basic RAG tutorial with templates

:::info
In this tutorial we show you how to do retrieval-augmented generation (RAG) with `pinnacledb`.
Note that this is just one example of the flexibility and power that `pinnacledb` gives
developers. `pinnacledb` is about much more than RAG and LLMs.
:::

As in the vector-search tutorial, we'll use the `pinnacledb` documentation as our dataset.
We'll download a snapshot of the data and insert it into a test database:

In [None]:
!curl -O https://pinnacledb-public-demo.s3.amazonaws.com/text.json
import json

from pinnacledb import pinnacle, Document

db = pinnacle('mongomock://test')

with open('text.json') as f:
    data = json.load(f)

_ = db['docu'].insert_many([{'txt': r} for r in data]).execute()

Let's verify the data in the `db` by querying one datapoint:

In [None]:
db['docu'].find_one().execute()

The first step in a RAG application is to create a `VectorIndex`. The results of searching 
with this index will be used as input to the LLM for answering questions.

Read about `VectorIndex` [here](../apply_api/vector_index.md) and follow along with the tutorial on
vector-search [here](./vector_search.md).

In [None]:
from pinnacledb import VectorIndex, Listener, vector
from pinnacledb.ext.sentence_transformers.model import SentenceTransformer
from pinnacledb.base.code import Code

# Convert the model's array output to a plain Python list
def postprocess(x):
    return x.tolist()

# all-MiniLM-L6-v2 produces 384-dimensional embeddings
datatype = vector(shape=384, identifier="my-vec")

model = SentenceTransformer(
    identifier="my-embedding",
    datatype=datatype,
    predict_kwargs={"show_progress_bar": True},
    signature="*args,**kwargs",
    model="all-MiniLM-L6-v2",      
    device="cpu",
    postprocess=Code.from_object(postprocess),
)

listener = Listener(
    identifier="my-listener",
    model=model,
    key='txt',
    select=db['docu'].find(),
    predict_kwargs={'max_chunk_size': 50},
)

vector_index = VectorIndex(
    identifier="my-index",
    indexing_listener=listener,
    measure="cosine"
)

db.apply(vector_index)
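
To make the `measure="cosine"` setting concrete, here is a minimal pure-Python sketch of what a cosine-based nearest-neighbour lookup does conceptually. It does not use `pinnacledb`; the function names `cosine_similarity` and `top_n`, the toy 3-dimensional vectors, and the document ids are all made up for illustration, and a real index will likely be implemented quite differently.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_n(query_vec, indexed, n=2):
    """Return the ids of the n stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in indexed.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:n]]

# Toy 3-dimensional "embeddings" (hypothetical data); the embeddings in
# this tutorial have 384 dimensions.
toy_index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

nearest = top_n([1.0, 0.02, 0.0], toy_index, n=2)
print(nearest)
```

The query vector points almost exactly along the first axis, so `doc-a` and `doc-b` rank ahead of `doc-c`, which is nearly orthogonal to it.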

Now that we've set up a `VectorIndex`, we can connect this index with an LLM in a number of ways.
A simple way to do that is with the `SequentialModel`. The first part of the `SequentialModel`
executes a query and provides the results to the LLM in the second part. 

The `RetrievalPrompt` component takes a query with a "free" `Variable` as input. 
This gives users great flexibility with regard to how they fetch the context
for their downstream models.
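
The idea of a query with a "free" variable can be sketched without `pinnacledb` as a template whose placeholder is filled in at prediction time. The `Placeholder` class and `fill` helper below are hypothetical names invented for this illustration, not part of the library, and the dictionary is only a rough stand-in for a real query object.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for a free variable inside a query template."""
    name: str

def fill(template, **values):
    """Recursively replace Placeholder objects in nested dicts and lists."""
    if isinstance(template, Placeholder):
        return values[template.name]
    if isinstance(template, dict):
        return {key: fill(val, **values) for key, val in template.items()}
    if isinstance(template, list):
        return [fill(item, **values) for item in template]
    return template

# The template is defined once, with the search text left open...
query_template = {"like": {"txt": Placeholder("prompt")}, "n": 5}

# ...and becomes a concrete query only when a user question arrives.
concrete_query = fill(query_template, prompt="What is a vector index?")
print(concrete_query)
```

Because `fill` builds a new structure, the template itself is left unchanged and can be reused for every incoming question.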

We're using OpenAI here, but you can use any type of LLM with `pinnacledb`. We have several
native integrations (see [here](../ai_integraitons/)), but you can also [bring your own model](../models/bring_your_own_model.md).

In [None]:
from pinnacledb.ext.llm.prompter import RetrievalPrompt
from pinnacledb.base.variables import Variable
from pinnacledb import Document
from pinnacledb.components.model import SequentialModel
from pinnacledb.ext.openai import OpenAIChatCompletion

q = db['docu'].like(Document({'txt': Variable('prompt')}), vector_index='my-index', n=5).find().limit(10)

def get_output(c):
    return [r['txt'] for r in c]

prompt_template = RetrievalPrompt('my-prompt', select=q, postprocess=Code.from_object(get_output))

llm = OpenAIChatCompletion(
    'gpt-3.5-turbo',
    client_kwargs={'api_key': '<OPENAI_API_KEY>'},
)
seq = SequentialModel('rag', models=[prompt_template, llm])

db.apply(seq)
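
The two-stage flow described above (build a prompt from retrieved context, then pass it to an LLM) can be sketched in plain Python as a chain of callables. This is only a conceptual illustration: `run_sequential`, `build_prompt`, and `fake_llm` are hypothetical names, the context list is hard-coded rather than retrieved, and the "LLM" is a stub that makes no API call.

```python
def run_sequential(stages, value):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        value = stage(value)
    return value

def build_prompt(question):
    # Stand-in for the retrieval step: a fixed list instead of a vector search.
    context = [
        "A vector index stores embeddings.",
        "Listeners compute model outputs.",
    ]
    facts = "\n".join(f"- {c}" for c in context)
    return f"Use these facts:\n{facts}\n\nQuestion: {question}"

def fake_llm(prompt):
    # Stand-in for a chat model: reports what it received instead of answering.
    return f"[stub answer based on a {len(prompt.splitlines())}-line prompt]"

answer = run_sequential([build_prompt, fake_llm], "What is a vector index?")
print(answer)
```

Swapping `fake_llm` for a function that calls a real model would leave `run_sequential` unchanged, which is the appeal of composing the stages this way.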

Now we can test the `SequentialModel` with a sample question:

In [None]:
seq.predict_one('Tell me about vector-indexes')

:::tip
Did you know you can use any tools from the Python ecosystem with `pinnacledb`?
That includes `langchain` and `llamaindex` which can be very useful for RAG applications.
:::