# EXAMPLES (RAG)
- [RAG](https://docs.activeloop.ai/examples/rag)
  - [**RAG Quickstart**](https://docs.activeloop.ai/examples/rag/quickstart)
  - [RAG Tutorials](https://docs.activeloop.ai/examples/rag/tutorials)
    - [Vector Store Basics](https://docs.activeloop.ai/examples/rag/tutorials/vector-store-basics)
    - [Vector Search Options](https://docs.activeloop.ai/examples/rag/tutorials/vector-search-options)
      - [LangChain API](https://docs.activeloop.ai/examples/rag/tutorials/vector-search-options/langchain-api)
      - [Deep Lake Vector Store API](https://docs.activeloop.ai/examples/rag/tutorials/vector-search-options/vector-store-api)
      - [Managed Database REST API](https://docs.activeloop.ai/examples/rag/tutorials/vector-search-options/rest-api)
    - [Customizing Your Vector Store](https://docs.activeloop.ai/examples/rag/tutorials/step-4-customizing-vector-stores)
    - [Image Similarity Search](https://docs.activeloop.ai/examples/rag/tutorials/image-similarity-search)
    - [Improving Search Accuracy using Deep Memory](https://docs.activeloop.ai/examples/rag/tutorials/deepmemory)


## RAG Quickstart

In [1]:
# Installing Deep Lake and the other packages imported below

# !pip3 install deeplake
# !pip3 install openai
# !pip3 install python-dotenv

In [2]:
# Creating Your First Vector Store

# Source data: paul_graham_essay.txt (73 kB), saved in the working directory

In [3]:
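# Import the required modules and load the OpenAI API key used for embeddings.
# Either set the environment variable directly:
#   os.environ['OPENAI_API_KEY'] = <OPENAI_API_KEY>
# or keep it in a .env file and load it with python-dotenv, as done here.

import os

import openai
from deeplake.core.vectorstore import VectorStore
from dotenv import load_dotenv

load_dotenv(override=True)  # values in .env override existing environment variables
openai_api_key = os.getenv('OPENAI_API_KEY')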



In [4]:
# Read the essay and split it into fixed-size chunks of CHUNK_SIZE characters

source_text = 'paul_graham_essay.txt'
# source_text = '../../data/paul_graham/paul_graham_essay.txt'
vector_store_path = 'pg_essay_deeplake'

# Read the file as UTF-8 so punctuation such as dashes is decoded correctly
with open(source_text, 'r', encoding='utf-8') as f:
    text = f.read()

CHUNK_SIZE = 1000
chunked_text = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
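
Fixed-size slicing can cut a sentence in half at a chunk boundary. One common variation is to let consecutive chunks overlap by a few characters. The helper below is a minimal sketch of that idea; `chunk_text` and its `overlap` parameter are hypothetical names introduced here for illustration and are not part of Deep Lake. With `overlap=0` it behaves like the list comprehension above.

```python
def chunk_text(text, chunk_size=1000, overlap=0):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters; overlap=0 gives
    plain fixed-size slicing.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
plain_chunks = chunk_text(sample, chunk_size=4)
overlapping_chunks = chunk_text(sample, chunk_size=4, overlap=2)
print(plain_chunks)
print(overlapping_chunks)
```

A larger overlap produces more chunks, and therefore more embeddings to compute and store, so the value is a trade-off rather than a fixed recommendation.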

In [5]:
print(type(chunked_text))
print(len(chunked_text))

<class 'list'>
76


In [6]:
chunked_text[0]

'\n\nWhat I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain\'s lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.\n\nThe language we used was an early version of Fortran. You had to type programs on punch cards, th

In [7]:
# Define an embedding function using OpenAI

def embedding_function(texts, model="text-embedding-ada-002"):
    # Accept a single string as well as a list of strings
    if isinstance(texts, str):
        texts = [texts]

    # Replace newlines with spaces before embedding
    texts = [t.replace("\n", " ") for t in texts]

    response = openai.embeddings.create(input=texts, model=model)
    return [data.embedding for data in response.data]
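
For larger corpora it can help to send texts to the embedding model in several smaller requests instead of one. The sketch below shows one way to do that around any embedding callable. `embed_in_batches` and `fake_embed` are hypothetical names introduced here; `fake_embed` is an offline stand-in so the example runs without an API key, and in practice `embedding_function` from the cell above would be passed instead.

```python
def embed_in_batches(texts, embed_fn, batch_size=100):
    """Call embed_fn on successive slices of texts and concatenate the results."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if isinstance(texts, str):
        texts = [texts]
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        embeddings.extend(embed_fn(batch))
    return embeddings


def fake_embed(batch):
    # Stand-in for offline testing only: maps each text to a 2-dimensional
    # vector (length, number of spaces). It is not a real embedding model.
    return [[float(len(t)), float(t.count(" "))] for t in batch]


vectors = embed_in_batches(["a b", "hello", "x y z"], fake_embed, batch_size=2)
print(vectors)
```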

In [8]:
# Create the Deep Lake Vector Store and populate it with data

vector_store = VectorStore(
    path = vector_store_path,
)

vector_store.add(text = chunked_text, 
                 embedding_function = embedding_function, 
                 embedding_data = chunked_text, 
                 metadata = [{"source": source_text}]*len(chunked_text))

Creating 76 embeddings in 1 batches of size 76:: 100%|███████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.22s/it]

Dataset(path='pg_essay_deeplake', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape      dtype  compression
  -------    -------    -------    -------  ------- 
   text       text      (76, 1)      str     None   
 metadata     json      (76, 1)      str     None   
 embedding  embedding  (76, 1536)  float32   None   
    id        text      (76, 1)      str     None   





In [9]:
# Summarize the Vector Store's structure (tensors, htypes, shapes, dtypes) with summary()

vector_store.summary()

Dataset(path='pg_essay_deeplake', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape      dtype  compression
  -------    -------    -------    -------  ------- 
   text       text      (76, 1)      str     None   
 metadata     json      (76, 1)      str     None   
 embedding  embedding  (76, 1536)  float32   None   
    id        text      (76, 1)      str     None   


In [10]:
# To populate a vector store with precomputed embeddings instead of passing
#   embedding_data and embedding_function, you may run

# vector_store.add(text = chunked_text, 
#                  embedding = <list_of_embeddings>, 
#                  metadata = [{"source": source_text}]*len(chunked_text))
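
When supplying precomputed embeddings, it is worth checking that there is exactly one vector per text and that all vectors have the same length before adding them. The function below is a sketch of such a check; `check_precomputed` is a hypothetical helper, not a Deep Lake API. The dimension 1536 in the usage line is taken from the `embedding` tensor shape shown in the summary above.

```python
def check_precomputed(texts, embeddings, expected_dim=None):
    """Raise ValueError if texts and embeddings are not aligned; return the dimension."""
    if len(texts) != len(embeddings):
        raise ValueError(
            f"{len(texts)} texts but {len(embeddings)} embeddings"
        )
    dims = {len(vector) for vector in embeddings}
    if len(dims) > 1:
        raise ValueError(f"embeddings have inconsistent dimensions: {sorted(dims)}")
    dim = dims.pop() if dims else None
    if expected_dim is not None and dim is not None and dim != expected_dim:
        raise ValueError(f"expected dimension {expected_dim}, got {dim}")
    return dim


# Toy data standing in for real chunks and their embeddings
toy_texts = ["first chunk", "second chunk"]
toy_embeddings = [[0.0] * 1536, [1.0] * 1536]
dimension = check_precomputed(toy_texts, toy_embeddings, expected_dim=1536)
print(dimension)
```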

In [11]:
# Performing Vector Search

prompt = "What are the first programs he tried writing?"

search_results = vector_store.search(embedding_data=prompt, embedding_function=embedding_function)
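
Conceptually, a vector search embeds the prompt and ranks the stored embeddings by their similarity to it. The pure-Python sketch below illustrates that ranking with cosine similarity on toy 2-dimensional vectors. It is an illustration of the idea only, not Deep Lake's implementation, and cosine similarity is assumed here as one common metric; `cosine_similarity` and `top_k` are names introduced for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy "document" embeddings and a toy query embedding
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranking = top_k([1.0, 0.1], docs, k=2)
print(ranking)
```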

In [12]:
# search_results is a dictionary with the keys
#   text, score, id, and metadata, with entries ordered by score.
# The first returned text, search_results['text'][0],
#   appears to contain the answer to the prompt.

search_results['text'][0]

'\n\nWhat I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain\'s lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.\n\nThe language we used was an early version of Fortran. You had to type programs on punch cards, th

In [13]:
print(search_results['text'][0])



What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.

The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them 
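
In a RAG workflow, the retrieved chunks are typically placed into the prompt sent to a language model. The sketch below assembles such a prompt from a dictionary shaped like `search_results` (a `text` list ordered by score). `build_rag_prompt`, the prompt wording, and the toy `example_results` are assumptions made for illustration; they are not part of Deep Lake or of this notebook's original code.

```python
def build_rag_prompt(question, search_results, max_chunks=3):
    """Combine the top retrieved chunks and the question into one prompt string."""
    chunks = search_results["text"][:max_chunks]
    context = "\n\n---\n\n".join(chunk.strip() for chunk in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy stand-in with the same keys as the search output described above
example_results = {
    "text": ["chunk one", "chunk two", "chunk three", "chunk four"],
    "score": [0.9, 0.8, 0.7, 0.6],
    "id": ["a", "b", "c", "d"],
    "metadata": [{"source": "paul_graham_essay.txt"}] * 4,
}
rag_prompt = build_rag_prompt(
    "What are the first programs he tried writing?", example_results, max_chunks=2
)
print(rag_prompt)
```

Limiting the number of chunks with `max_chunks` keeps the prompt within the model's context window; the right value depends on the chunk size and the model used.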

In [14]:
# Visualizing your Vector Store
# - https://app.activeloop.ai/activeloop/twitter-algorithm

In [15]:
# Authentication
# - https://app.activeloop.ai/register/

In [16]:
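# Creating Vector Stores in the Deep Lake Managed Tensor Database
# Note: this is a hosted service, so it requires the authentication step above,
#   and vector_store_path is expected to be a Deep Lake cloud path
#   (hub://<org_id>/<dataset_name>) rather than a local directory.

# vector_store = VectorStore(
#     path = vector_store_path,
#     runtime = {"tensor_db": True}
# )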

# vector_store.add(text = chunked_text, 
#                  embedding_function = embedding_function, 
#                  embedding_data = chunked_text, 
#                  metadata = [{"source": source_text}]*len(chunked_text))
                 
# search_results = vector_store.search(embedding_data = prompt, 
#                                      embedding_function = embedding_function)