<!-- TABS -->
# Retrieval augmented generation

<!-- TABS -->
## Connect to superduper

:::note
Note that this is only relevant if you are running superduper in development mode.
Otherwise refer to "Configuring your production system".
:::

In [1]:
from superduper import superduper

db = superduper('mongomock:///test_db')

2024-Oct-08 14:04:03.04| INFO     | Duncans-MacBook-Pro.local| superduper.misc.plugins:13   | Loading plugin: mongodb
2024-Oct-08 14:04:03.07| INFO     | Duncans-MacBook-Pro.local| superduper.base.datalayer:103  | Building Data Layer
2024-Oct-08 14:04:03.07| INFO     | Duncans-MacBook-Pro.local| superduper.base.build:173  | Configuration:
+---------------+----------------------+
| Configuration |        Value         |
+---------------+----------------------+
|  Data Backend | mongomock:///test_db |
+---------------+----------------------+


In [9]:
COLLECTION_NAME = 'data'

<!-- TABS -->
## Get useful sample data

In [7]:
# <tab: Text>
import json

with open('data.json', 'r') as f:
    data = json.load(f)

In [None]:
# <tab: PDF>
!curl -O https://superduperdb-public-demo.s3.amazonaws.com/pdfs.zip && unzip -o pdfs.zip
import os

data = [f'pdfs/{x}' for x in os.listdir('./pdfs') if x.endswith('.pdf')]

In [8]:
datas = [{'x': d} for d in data]

<!-- TABS -->
## Insert simple data

With `auto_schema` turned on, data can be inserted directly: superduper inspects the documents, infers the datatype of each field, and creates a matching table.

In [10]:
from superduper import Document

ids = db.execute(db[COLLECTION_NAME].insert([Document(data) for data in datas]))

2024-Oct-08 14:10:23.61| INFO     | Duncans-MacBook-Pro.local| superduper.base.datalayer:365  | Table data does not exist, auto creating...
2024-Oct-08 14:10:23.61| INFO     | Duncans-MacBook-Pro.local| superduper.base.datalayer:371  | Creating table data with schema {('x', 'str'), ('_fold', 'str')}
2024-Oct-08 14:10:23.65| INFO     | Duncans-MacBook-Pro.local| superduper.base.datalayer:346  | Inserted 209 documents into data
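
The schema reported in the log above (`('x', 'str')`) is derived from the inserted documents. As a rough illustration of the idea only, and not superduper's actual implementation, a schema could be inferred by mapping each field to the name of the Python type of its first non-empty value. The function name `infer_schema` is hypothetical.

```python
def infer_schema(documents):
    """Map each field name to the type name of its first non-None value."""
    schema = {}
    for doc in documents:
        for field, value in doc.items():
            if value is not None and field not in schema:
                schema[field] = type(value).__name__
    return schema


# Documents shaped like the ones inserted in this guide: a single text field `x`
sample = [{'x': 'first snippet'}, {'x': 'second snippet'}]
schema = infer_schema(sample)
print(schema)
```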


<!-- TABS -->
## Apply a chunker for search

:::note
Note that applying a chunker is ***not*** mandatory for search.
If your data is already chunked (e.g. short text snippets or audio) or if you
are searching through something like images, which can't be chunked, then this
won't be necessary.
:::

In [11]:
# <tab: Text>
from superduper import model

CHUNK_SIZE = 200

@model(flatten=True, model_update_kwargs={})
def chunker(text):
    text = text.split()
    chunks = [' '.join(text[i:i + CHUNK_SIZE]) for i in range(0, len(text), CHUNK_SIZE)]
    return chunks

In [None]:
# <tab: PDF>
!pip install -q "unstructured[pdf]"
from superduper import model
from unstructured.partition.pdf import partition_pdf

CHUNK_SIZE = 500

@model(flatten=True)
def chunker(pdf_file):
    elements = partition_pdf(pdf_file)
    text = '\n'.join([e.text for e in elements])
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    return chunks
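
To see what the text chunker returns, the same word-based splitting can be run as a plain function, without the `@model` decorator. This is a sketch on synthetic input; `split_into_chunks` is a hypothetical name used only for this illustration.

```python
CHUNK_SIZE = 200


def split_into_chunks(text, chunk_size=CHUNK_SIZE):
    """Split text into chunks of at most `chunk_size` whitespace-separated words."""
    words = text.split()
    return [' '.join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


# Synthetic document of 450 words
text = ' '.join(f'word{i}' for i in range(450))
chunks = split_into_chunks(text)

# Number of words in each chunk: two full chunks and one remainder
print([len(c.split()) for c in chunks])
```

The last chunk simply holds whatever words remain, so short documents yield a single chunk.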

Now we apply this chunker to the data by wrapping it in a `Listener`:

In [12]:
from superduper import Listener

upstream_listener = Listener(
    model=chunker,
    select=db[COLLECTION_NAME].select(),
    key='x',
    uuid="chunker",
    identifier='chunker',
)

## Select outputs of upstream listener

:::note
This is useful if you have performed a first step, such as pre-computing 
features, or chunking your data. You can use this query to 
operate on those outputs.
:::
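
As a rough mental model (an assumption based on the `_outputs__chunker` and `_source` fields used in this guide, not a description of superduper internals), a flattening listener writes one output row per chunk, and each row keeps a `_source` field pointing back to the document it came from. The sketch below imitates that layout with plain dictionaries; the ids and texts are made up.

```python
from collections import defaultdict

# Hypothetical source documents keyed by id
sources = {1: 'alpha beta gamma delta', 2: 'epsilon zeta'}


def chunk(text, size=2):
    """Split text into chunks of `size` words."""
    words = text.split()
    return [' '.join(words[i:i + size]) for i in range(0, len(words), size)]


# One output row per chunk, tagged with the id of its source document
rows = [
    {'_source': source_id, '_outputs__chunker': piece}
    for source_id, text in sources.items()
    for piece in chunk(text)
]

# Group the chunks back by source document
by_source = defaultdict(list)
for row in rows:
    by_source[row['_source']].append(row['_outputs__chunker'])

print(dict(by_source))
```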

<!-- TABS -->
## Build text embedding model

In [13]:
# <tab: OpenAI>
import os
os.environ['OPENAI_API_KEY'] = 'sk-<your-secret>'
from superduper_openai import OpenAIEmbedding

embedding_model = OpenAIEmbedding(identifier='text-embedding-ada-002')

In [None]:
# <tab: JinaAI>
import os
from superduper_jina import JinaEmbedding

os.environ["JINA_API_KEY"] = "jina_xxxx"
 
# define the model
embedding_model = JinaEmbedding(identifier='jina-embeddings-v2-base-en')

In [None]:
# <tab: Sentence-Transformers>
!pip install sentence-transformers
from superduper import vector
import sentence_transformers
from superduper_sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer(
    identifier="embedding",
    object=sentence_transformers.SentenceTransformer("BAAI/bge-small-en"),
    datatype=vector(shape=(384,)),  # BAAI/bge-small-en produces 384-dimensional embeddings
    postprocess=lambda x: x.tolist(),
    predict_kwargs={"show_progress_bar": True},
)

## Create vector-index

In [14]:
from superduper import VectorIndex, Listener

vector_index_name = 'vector-index'

vector_index = VectorIndex(
    vector_index_name,
    indexing_listener=Listener(
        key=upstream_listener.outputs,
        select=db[upstream_listener.outputs].select(),
        model=embedding_model,
        identifier='embedding-listener',
        upstream=[upstream_listener],
    )
)

In [15]:
db.apply(vector_index)

2024-Oct-08 14:15:11.75| INFO     | Duncans-MacBook-Pro.local| superduper.components.listener:94   | Requesting listener setup on CDC service
2024-Oct-08 14:15:11.75| INFO     | Duncans-MacBook-Pro.local| superduper.components.listener:104  | Skipping listener setup on CDC service since no URI is set
2024-Oct-08 14:15:11.75| INFO     | Duncans-MacBook-Pro.local| superduper.jobs.queue:210  | Running jobs for listener::chunker
2024-Oct-08 14:15:11.75| INFO     | Duncans-MacBook-Pro.local| superduper.backends.local.compute:67   | Submitting job. function:<function method_job at 0x106c06e60>
2024-Oct-08 14:15:11.76| INFO     | Duncans-MacBook-Pro.local| superduper.components.model:732  | Requesting prediction in db - [chunker] with predic

100%|██████████| 4/4 [00:06<00:00,  1.64s/it]

2024-Oct-08 14:15:18.55| INFO     | Duncans-MacBook-Pro.local| superduper.components.model:862  | Adding 363 model outputs to `db`
2024-Oct-08 14:15:21.01| INFO     | Duncans-MacBook-Pro.local| superduper.base.datalayer:346  | Inserted 363 documents into _outputs__embedding-listener
2024-Oct-08 14:15:21.02| SUCCESS  | Duncans-MacBook-Pro.local| superduper.backends.local.compute:73   | Job submitted on <superduper.backends.local.compute.LocalComputeBackend object at 0x16062be20>.  function:<function method_job at 0x106c06e60> future:444109a7-933e-45db-8064-dd537272f0d7
2024-Oct-08 14:15:21.05| INFO     | Duncans-MacBook-Pro.local| superduper.jobs.queue:210  | Running jobs for vector_index::vector-index

Loading vectors into vector-table...: 363it [00:00, 953.68it/s]

2024-Oct-08 14:15:21.44| INFO     | Duncans-MacBook-Pro.local| superduper.components.vector_index:97   | Loaded 363 vectors into vector index succesfully
2024-Oct-08 14:15:21.95| SUCCESS  | Duncans-MacBook-Pro.local| superduper.backends.local.compute:73   | Job submitted on <superduper.backends.local.compute.LocalComputeBackend object at 0x16062be20>.  function:<function callable_job at 0x106c070a0> future:e971fd14-68d4-4b82-a34a-1b01c444166b
2024-Oct-08 14:15:21.96| INFO     | Duncans-MacBook-Pro.local| superduper.base.datalayer:153  | Initializing vector searcher with type in_memory
2024-Oct-08 14:15:21.96| INFO     | Duncans-MacBook-Pro.local| superduper.base.datalayer:164  | Using vector searcher: <class 'superduper.vector_search.in_memory.InMemoryVectorSearcher'>

Loading vectors into vector-table...: 363it [00:00, 980.35it/s]

2024-Oct-08 14:15:22.33| INFO     | Duncans-MacBook-Pro.local| superduper.components.vector_index:97   | Loaded 363 vectors into vector index succesfully





(['dbe9ae8eb92a492882428ddd90890a88',
  '851c4c2087ce475bb9c6ee1c27c999ba',
  'ee8eb87cd3c34e0a84a3da6ea9dea620'],
 VectorIndex(identifier='vector-index', uuid='d9d4f47a1e454bc39619eed037aeaebd', upstream=None, plugins=None, cache=False, indexing_listener=Listener(identifier='embedding-listener', uuid='befa04bbe7ef48a79f02ccb54736cdea', upstream=[Listener(identifier='chunker', uuid='chunker', upstream=None, plugins=None, cache=False, key='x', model=ObjectModel(identifier='chunker', uuid='a9e49743199c4344a7a4300485dd8f88', upstream=None, plugins=None, cache=False, signature='*args,**kwargs', datatype=None, output_schema=None, flatten=True, model_update_kwargs={}, predict_kwargs={}, compute_kwargs={}, validation=None, metric_values={}, num_workers=0, serve=False, object=<function chunker at 0x1606a2440>), select=data.select(), predict_kwargs={}, predict_id='chunker')], plugins=None, cache=False, key='_outputs__chunker', model=OpenAIEmbedding(identifier='text-embedding-ada-002', uuid='104
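
Conceptually, a vector index stores one embedding per chunk and, at query time, ranks the stored vectors by similarity to the query embedding. The following is a minimal pure-Python sketch of an in-memory top-n search, assuming cosine similarity as the measure. It is not superduper's implementation, and the names `cosine`, `top_n` and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query_vec, index, n=5):
    """Return the n (id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions
index = {
    'chunk-a': [1.0, 0.0, 0.0],
    'chunk-b': [0.7, 0.7, 0.0],
    'chunk-c': [0.0, 0.0, 1.0],
}
results = top_n([1.0, 0.1, 0.0], index, n=2)
print(results)
```

A linear scan like this is fine for a few hundred vectors, such as the 363 loaded above; larger collections usually call for a dedicated vector search backend.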

<!-- TABS -->
## Create Vector Search Model

In [17]:
from superduper.components.model import QueryModel

item = {'_outputs__chunker': '<var:query>'}

vector_search_model = QueryModel(
    identifier="VectorSearch",
    select=db[upstream_listener.outputs].like(item, vector_index=vector_index_name, n=5).select(),
    # Results come from the chunker output table, so the chunk text is stored under
    # `upstream_listener.outputs`. `_source` is the id of the upstream document each
    # chunk was taken from, and can be used to look that document up.
    postprocess=lambda docs: [{"text": doc[upstream_listener.outputs], "_source": doc["_source"]} for doc in docs],
    db=db
)

In [18]:
vector_search_model.predict('Tell me about vector-search')

<!-- TABS -->
## Build LLM

In [None]:
# <tab: OpenAI>
from superduper_openai import OpenAIChatCompletion

llm = OpenAIChatCompletion(identifier='llm', model='gpt-3.5-turbo')

In [None]:
# <tab: Anthropic>
from superduper_anthropic import AnthropicCompletions
import os

os.environ["ANTHROPIC_API_KEY"] = "sk-xxx"

predict_kwargs = {
    "max_tokens": 1024,
    "temperature": 0.8,
}

llm = AnthropicCompletions(identifier='llm', model='claude-2.1', predict_kwargs=predict_kwargs)

In [None]:
# <tab: vLLM>
from superduper_vllm import VllmModel

predict_kwargs = {
    "max_tokens": 1024,
    "temperature": 0.8,
}


llm = VllmModel(
    identifier="llm",
    model_name="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    vllm_kwargs={
        "gpu_memory_utilization": 0.7,
        "max_model_len": 1024,
        "quantization": "awq",
    },
    predict_kwargs=predict_kwargs,
)

In [None]:
# <tab: Transformers>
from superduper_transformers import LLM

llm = LLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2", load_in_8bit=True, device_map="cuda", identifier="llm", predict_kwargs=dict(max_new_tokens=128))

In [None]:
# <tab: Llama.cpp>
!huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.2-GGUF mistral-7b-instruct-v0.2.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

from superduper_llama_cpp.model import LlamaCpp
llm = LlamaCpp(identifier="llm", model_name_or_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf")

## Answer question with LLM

In [None]:
from superduper import model
from superduper.components.graph import Graph, input_node

prompt_template = (
    "Use the following context snippets to answer the question. The snippets are not ordered.\n"
    "{context}\n\n"
    "Here's the question: {query}"
)

@model
def build_prompt(query, docs):
    chunks = [doc["text"] for doc in docs]
    context = "\n\n".join(chunks)
    prompt = prompt_template.format(context=context, query=query)
    return prompt

# Build a graph that handles the entire pipeline.

# Create an input node with a single input parameter, `query`
in_ = input_node('query')
# Pass the query to the vector search model
vector_search_results = vector_search_model(query=in_)
# Pass the query and the search results to the prompt builder
prompt = build_prompt(query=in_, docs=vector_search_results)
# Pass the prompt to the LLM
answer = llm(prompt)
# Create a graph whose output is the answer
rag = answer.to_graph("rag")
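
To see what the LLM receives, the same flow (search, then prompt building, then generation) can be traced with plain functions. This is a sketch with stand-ins: `fake_search` and `fake_llm` are hypothetical placeholders for the vector search model and the LLM, and the prompt template is a shortened version of the one above.

```python
PROMPT_TEMPLATE = (
    "Answer the question based on this context.\n"
    "{context}\n\n"
    "Here's the question: {query}"
)


def build_prompt(query, docs):
    """Join the retrieved chunks into a context block and fill the template."""
    context = "\n\n".join(doc["text"] for doc in docs)
    return PROMPT_TEMPLATE.format(context=context, query=query)


def fake_search(query):
    """Stand-in for the vector search model: always returns the same two results."""
    return [
        {"text": "Chunk one.", "_source": "doc-1"},
        {"text": "Chunk two.", "_source": "doc-2"},
    ]


def fake_llm(prompt):
    """Stand-in for the LLM: reports the prompt length instead of answering."""
    return f"(received a prompt of {len(prompt)} characters)"


def rag_pipeline(query):
    docs = fake_search(query)
    prompt = build_prompt(query=query, docs=docs)
    return prompt, fake_llm(prompt)


prompt, answer = rag_pipeline("What is a chunk?")
print(prompt)
print(answer)
```

The graph built with `input_node` wires the same three steps together, but as components that can be stored and applied to the database rather than as direct function calls.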

Applying the RAG model to the database makes it available for use in other services.

In [None]:
from superduper import Application

app = Application(
    'rag-app',
    components=[
        upstream_listener,
        vector_index,
        vector_search_model,
        rag,
    ]
)

db.apply(app)

You can now load the model elsewhere and make predictions using the following command.

In [None]:
rag = db.load("model", 'rag')
print(rag.predict("Tell me about superduper")[0])

## Create template

In [None]:
from superduper import Template

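# Assumption: `substitutions` maps a literal value used in the components to a
# template variable name, so the key should be the collection name used above.
template = Template('rag-template', template=app, substitutions={COLLECTION_NAME: 'collection'})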

In [None]:
template.export('.')