## RAG Tutorials (Vector Search Options) (Deep Lake Vector Store API)

In [1]:
# !pip install "deeplake[enterprise]" langchain openai tiktoken

### Vector Search on the Client

In [2]:
from deeplake.core.vectorstore import VectorStore
import openai
import os
from dotenv import load_dotenv

load_dotenv(override = True)
openai_api_key = os.getenv('OPENAI_API_KEY')
activeloop_token = os.getenv('ACTIVELOOP_TOKEN')



In [3]:
MODEL_GPT = 'gpt-4o-mini'

In [4]:
# Load the same vector store used in the Quickstart and run embeddings search
#   based on a user prompt using the Deep Lake Vector Store module

# os.environ['OPENAI_API_KEY'] = <OPENAI_API_KEY>

vector_store_path = 'hub://activeloop/paul_graham_essay'

vector_store = VectorStore(
    path = vector_store_path,
    read_only = True
)

Deep Lake Dataset in hub://activeloop/paul_graham_essay already exists, loading from the storage


In [5]:
# Define an embedding function using OpenAI

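def embedding_function(texts, model="text-embedding-ada-002"):
    # Accept a single string as well as a list of strings
    if isinstance(texts, str):
        texts = [texts]

    # Replace newlines with spaces before embedding
    texts = [t.replace("\n", " ") for t in texts]

    # Request the embeddings and return one vector per input text
    response = openai.embeddings.create(input = texts, model = model)
    return [data.embedding for data in response.data]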
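
The input handling in `embedding_function` can be checked without calling the OpenAI API. The following is a minimal sketch: `normalize_texts` is a hypothetical helper (not part of Deep Lake or the OpenAI client) that mirrors only the string-to-list conversion and the newline replacement.

```python
def normalize_texts(texts):
    """Hypothetical helper: mirrors the input handling in embedding_function."""
    # Accept a single string as well as a list of strings.
    if isinstance(texts, str):
        texts = [texts]
    # Replace newlines with spaces, as embedding_function does before the API call.
    return [t.replace("\n", " ") for t in texts]


single = normalize_texts("first line\nsecond line")
batch = normalize_texts(["a\nb", "c"])
print(single)
print(batch)
```

Because a single prompt is wrapped in a list, the function always returns a list of embeddings, which is why later cells index the result with `[0]`.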

#### Simple Vector Search

In [6]:
prompt = "What are the first programs he tried writing?"

search_results = vector_store.search(embedding_data=prompt, 
                                     embedding_function=embedding_function)

In [7]:
search_results['text'][0]

'What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain\'s lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.'

#### Filter Search Using UDFs

In [8]:
def filter_fn(x):
    # x is a single row in Deep Lake, 'text' is the tensor name, .data()['value'] is the method for fetching the data
    return "program" in x['text'].data()['value'].lower()

In [9]:
prompt = "What are the first programs he tried writing?"

search_results_filter = vector_store.search(embedding_data = prompt, 
                                            embedding_function = embedding_function,
                                            filter = filter_fn,
                                            k = 10,
                                            distance_metric = 'l2',
                                            exec_option = "python")

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1263.74it/s]


In [10]:
all(["program" in result.lower() for result in search_results_filter["text"]])

True
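
Conceptually, the filtered search keeps only the rows that pass `filter_fn` and ranks them by L2 distance to the prompt embedding, returning at most `k` rows. The sketch below illustrates that idea with toy 2-D vectors and plain dictionaries. It is not Deep Lake's implementation, and `l2_distance`, `filtered_top_k`, and the sample rows are all hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_top_k(rows, query, predicate, k):
    """Keep rows that pass `predicate`, then return the k closest to `query`."""
    candidates = [row for row in rows if predicate(row)]
    # A smaller L2 distance means a closer match, so sort ascending.
    candidates.sort(key=lambda row: l2_distance(row["embedding"], query))
    return candidates[:k]


# Toy rows standing in for dataset rows (hypothetical data).
rows = [
    {"text": "My first Program ran on a mainframe", "embedding": [0.0, 1.0]},
    {"text": "Short stories with strong feelings", "embedding": [0.1, 0.9]},
    {"text": "Programming in Lisp", "embedding": [1.0, 0.0]},
]

results = filtered_top_k(
    rows,
    query=[0.0, 0.9],
    predicate=lambda row: "program" in row["text"].lower(),
    k=2,
)
print([row["text"] for row in results])
```

The second row is excluded by the predicate, so it never reaches the ranking step.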

#### Filter Search Using Metadata Filters

In [11]:
search_results_filter = vector_store.search(embedding_data = prompt, 
                                            embedding_function = embedding_function,
                                            filter = {"metadata": {"source": "paul_graham_essay.txt"}})

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1162.18it/s]


In [12]:
search_results_filter["metadata"]

[{'source': 'paul_graham_essay.txt'},
 {'source': 'paul_graham_essay.txt'},
 {'source': 'paul_graham_essay.txt'},
 {'source': 'paul_graham_essay.txt'}]

In [13]:
for result in search_results_filter["metadata"]:
    print(result["source"])

all(["paul_graham_essay.txt" in result["source"] for result in search_results_filter["metadata"]])

paul_graham_essay.txt
paul_graham_essay.txt
paul_graham_essay.txt
paul_graham_essay.txt


True
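
A dictionary filter such as `{"metadata": {"source": "paul_graham_essay.txt"}}` can be read as "keep the rows whose `metadata` contains these key-value pairs". The sketch below illustrates that reading on plain dictionaries. It assumes exact-match semantics, and `matches_filter` and the sample rows are hypothetical, not Deep Lake code.

```python
def matches_filter(row, filter_dict):
    """Return True if every key/value pair in filter_dict is present in the row.

    Assumes exact-match semantics; this is an illustration, not Deep Lake code.
    """
    for tensor_name, expected in filter_dict.items():
        actual = row.get(tensor_name)
        if isinstance(expected, dict):
            # Nested dict: every expected pair must appear in the row's dict.
            if not isinstance(actual, dict):
                return False
            if any(actual.get(key) != value for key, value in expected.items()):
                return False
        elif actual != expected:
            return False
    return True


# Toy rows standing in for dataset rows (hypothetical data).
rows = [
    {"text": "chunk 1", "metadata": {"source": "paul_graham_essay.txt"}},
    {"text": "chunk 2", "metadata": {"source": "other_file.txt"}},
]
metadata_filter = {"metadata": {"source": "paul_graham_essay.txt"}}

kept = [row["text"] for row in rows if matches_filter(row, metadata_filter)]
print(kept)
```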

#### Filter Search using TQL

In [14]:
# Load a larger Vector Store for running more interesting queries

vector_store_path = "hub://activeloop/twitter-algorithm"

vector_store = VectorStore(
    path = vector_store_path,
    read_only = True
)

Deep Lake Dataset in hub://activeloop/twitter-algorithm already exists, loading from the storage


In [15]:
prompt = "What does the python code do?"

In [16]:
embedding = embedding_function(prompt)[0]

# Format the embedding as a comma-separated string so it can be inserted into the TQL query.
embedding_string = ",".join([str(item) for item in embedding])

tql_query = f"select * from (select text, metadata, cosine_similarity(embedding, ARRAY[{embedding_string}]) as score where contains(text, 'python') or contains(metadata['source'], '.py')) order by score desc limit 5"
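
Building the query in a small function keeps the embedding serialization and the TQL template in one place. The following is a sketch: `build_tql_query` is a hypothetical helper that only assembles the same query string as the cell above, here with a toy three-element embedding instead of a real one.

```python
def build_tql_query(embedding, limit=5):
    """Hypothetical helper: build the TQL string used in the cell above."""
    # Serialize the embedding as comma-separated numbers for ARRAY[...].
    embedding_string = ",".join(str(item) for item in embedding)
    return (
        "select * from ("
        "select text, metadata, "
        f"cosine_similarity(embedding, ARRAY[{embedding_string}]) as score "
        "where contains(text, 'python') or contains(metadata['source'], '.py')"
        f") order by score desc limit {limit}"
    )


# Toy embedding; a real one from text-embedding-ada-002 has 1536 values.
query = build_tql_query([0.1, -0.2, 0.3], limit=5)
print(query)
```

The inner `select` computes a similarity score for the rows that mention Python in the text or have `.py` in the source path, and the outer `select` orders those rows by score and keeps the top five.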

In [17]:
search_results = vector_store.search(query = tql_query)

In [18]:
search_results['metadata']

[{'source': './the-algorithm/ann/src/main/python/dataflow/worker_harness/Dockerfile'},
 {'source': './the-algorithm/src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/lolly/score.py'},
 {'source': './the-algorithm/src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/lolly/BUILD'},
 {'source': './the-algorithm/src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/BUILD'},
 {'source': './the-algorithm/src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/tf_model/BUILD'}]
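
The `score` column in the query is a cosine similarity, and `order by score desc` puts the closest matches first. The sketch below shows that ranking in plain Python with toy 2-D vectors. The `cosine_similarity` function here is a local illustration rather than the TQL function, and the document names and vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings keyed by a made-up source name (hypothetical data).
documents = {
    "score.py": [0.9, 0.1],
    "BUILD": [0.0, 1.0],
    "Dockerfile": [0.5, 0.5],
}
query_embedding = [1.0, 0.0]

# Higher similarity means a closer match, so sort descending.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(documents[name], query_embedding),
    reverse=True,
)
print(ranked)
```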

### Vector Search Using the Managed Tensor Database (Server-Side)

In [19]:
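# Uncomment to use a Vector Store in the Managed Tensor Database
#   (replace the placeholders with your own organization and dataset).
#   Otherwise, the search below runs against the Vector Store loaded above.
# vector_store = VectorStore(
#     path = "hub://<org_id>/<dataset_name>",
#     runtime = {"tensor_db": True}
# )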

search_results = vector_store.search(embedding_data=prompt, 
                                     embedding_function=embedding_function)

In [20]:
search_results["metadata"]

[{'source': './the-algorithm/ann/src/main/python/dataflow/worker_harness/Dockerfile'},
 {'source': './the-algorithm/.git/logs/refs/remotes/origin/HEAD'},
 {'source': './the-algorithm/.git/logs/refs/heads/main'},
 {'source': './the-algorithm/.git/logs/HEAD'}]

In [21]:
search_results["text"][0]

'RUN \\\n  # Add Deadsnakes repository that has a variety of Python packages for Ubuntu.\n  # See: https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa\n  apt-key adv --keyserver keyserver.ubuntu.com --recv-keys F23C5A6CF475977595C89F51BA6932366A755776 \\\n  && echo "deb http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal main" >> /etc/apt/sources.list.d/custom.list \\\n  && echo "deb-src http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal main" >> /etc/apt/sources.list.d/custom.list \\\n  && apt-get update \\\n  && apt-get install -y curl \\\n  python3.7 \\\n  # With python3.8 package, distutils need to be installed separately.\n  python3.7-distutils \\\n  python3-dev \\\n  python3.7-dev \\\n  libpython3.7-dev \\\n  python3-apt \\\n  gcc \\\n  g++ \\\n  && rm -rf /var/lib/apt/lists/*\nRUN update-alternatives --install /usr/bin/python python /usr/bin/python3.7 10\nRUN rm -f /usr/bin/python3 && ln -s /usr/bin/python3.7 /usr/bin/python3\nRUN \\\n  curl https://bootstrap.pypa.io/get-pi