<a href="https://colab.research.google.com/github/fabiodr/colabs/blob/main/Tutorial_Examples.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install voyageai

In [None]:
import os
import voyageai
# Replace the placeholder with your own key
os.environ['VOYAGE_API_KEY'] = "<your secret key>"
vo = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))

# Vectorize/embed the documents

In [None]:
# Prepare data
documents = [
    "The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.",
    "Photosynthesis in plants converts light energy into glucose and produces essential oxygen.",
    "20th-century innovations, from radios to smartphones, centered on electronic advancements.",
    "Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.",
    "Apple’s conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET.",
    "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature."
]

In [None]:
# Embed the documents
documents_embeddings = vo.embed(documents, model="voyage-3", input_type="document").embeddings

If you are working with more than 128 documents, you will need to split them into batches of at most 128 and embed each batch in a for loop.
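
A minimal sketch of such a loop, assuming the 128-document limit mentioned above applies per request. The helper `embed_in_batches` and its `embed_fn` parameter are hypothetical names, not part of the `voyageai` package; `embed_fn` stands in for any callable that maps a list of texts to a list of embeddings.

```python
def embed_in_batches(texts, embed_fn, batch_size=128):
  """Embed `texts` in chunks of at most `batch_size` items.

  `embed_fn` is a hypothetical callable: list of str -> list of embeddings.
  """
  embeddings = []
  for start in range(0, len(texts), batch_size):
    batch = texts[start:start + batch_size]  # the last batch may be shorter
    embeddings.extend(embed_fn(batch))       # keep results in input order
  return embeddings
```

With the client created earlier, `embed_fn` could be `lambda batch: vo.embed(batch, model="voyage-3", input_type="document").embeddings`.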

# A minimalist retrieval system

The main feature of the embeddings is that the cosine similarity between two embeddings captures the semantic relatedness of the corresponding original passages. This allows us to use the embeddings to do semantic retrieval / search.
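
To make the measure concrete, here is a small sketch of cosine similarity on toy 2-D vectors. The `cosine` helper is a hypothetical name used only for illustration, and the vectors are made-up stand-ins for real embeddings, which have many more dimensions.

```python
import math

def cosine(u, v):
  """Cosine similarity: dot product divided by the product of the norms."""
  dot = sum(a * b for a, b in zip(u, v))
  norm_u = math.sqrt(sum(a * a for a in u))
  norm_v = math.sqrt(sum(b * b for b in v))
  return dot / (norm_u * norm_v)

# Toy vectors, not real embeddings
print(cosine([1, 0], [2, 0]))   # same direction     -> 1.0
print(cosine([1, 0], [0, 3]))   # orthogonal         -> 0.0
print(cosine([1, 0], [-1, 0]))  # opposite direction -> -1.0
```

The score depends only on the angle between the vectors, not on their lengths, which is why `[1, 0]` and `[2, 0]` count as identical.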

Suppose the user sends a "query" (e.g., a question or a comment) to the chatbot:

In [None]:
query = "When is Apple's conference call scheduled?"

To find the document most similar to the query among the existing data, we first embed/vectorize the query:

In [None]:
# Get the embedding of the query
query_embedding = vo.embed([query], model="voyage-3", input_type="query").embeddings[0]

**Nearest neighbor search:** We can find the closest embeddings among the document embeddings based on cosine similarity, and retrieve the corresponding documents, using the `k_nearest_neighbors` function defined below.

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def k_nearest_neighbors(query_embedding, documents_embeddings, k=5):
  query_embedding = np.array(query_embedding) # convert to numpy array
  documents_embeddings = np.array(documents_embeddings) # convert to numpy array

  # Reshape the query vector embedding to a matrix of shape (1, n) to make it compatible with cosine_similarity
  query_embedding = query_embedding.reshape(1, -1)

  # Calculate the similarity for each item in data
  cosine_sim = cosine_similarity(query_embedding, documents_embeddings)

  # Sort the data by similarity in descending order and take the top k items
  sorted_indices = np.argsort(cosine_sim[0])[::-1]

  # Take the top k related embeddings
  top_k_related_indices = sorted_indices[:k]
  top_k_related_embeddings = documents_embeddings[top_k_related_indices].tolist() # convert to list

  return top_k_related_embeddings, top_k_related_indices

In [None]:
# Use the nearest neighbor algorithm to find the document with the highest similarity
retrieved_embd, retrieved_embd_index = k_nearest_neighbors(query_embedding, documents_embeddings, k=1)
retrieved_doc = [documents[index] for index in retrieved_embd_index]

print(retrieved_doc)

**$k$-nearest neighbors search ($k$-NN):** It is often useful to retrieve not only the closest document but the $k$ closest documents. The `k_nearest_neighbors` function does this; the nearest neighbor search above is simply the special case $k=1$.

In [None]:
# Use the k-nearest neighbor algorithm to identify the top-k documents with the highest similarity
retrieved_embds, retrieved_embd_indices = k_nearest_neighbors(query_embedding, documents_embeddings, k=3)
retrieved_docs = [documents[index] for index in retrieved_embd_indices]

print(retrieved_docs)

# Refinement with rerankers
We can further refine our embedding-based retrieval with rerankers. Here, a reranker scores the retrieved documents for semantic relevance to the query and returns a smaller, more relevant set of documents to pass to the generative model.

In [None]:
# Reranking
documents_reranked = vo.rerank(
  query,
  retrieved_docs,
  model="rerank-lite-1",
  top_k=3
)

We see that the reranker properly ranks the Apple conference call document as the most relevant to the query.

In [None]:
for r in documents_reranked.results:
  print(f"Document: {r.document}")
  print(f"Relevance Score: {r.relevance_score}")
  print(f"Index: {r.index}")
  print("\n")

In [None]:
# Take the document with the highest score (a single string, used in the prompt below)
retrieved_doc = documents_reranked.results[0].document
print(retrieved_doc)

# A minimalist RAG chatbot
[Retrieval-Augmented Generation](https://www.pinecone.io/learn/retrieval-augmented-generation/) (RAG) combines retrieval-based and generative methods to produce more accurate and contextually relevant responses. A RAG chatbot retrieves relevant documents from a large corpus of text and sends them to a language model, such as GPT-4, to generate a reply. This grounds the chatbot's answers in the retrieved information while keeping them tailored to the specifics of the user's query.

Suppose you have implemented a semantic search system as described in the previous sections and the search has returned the most relevant document, referred to as `retrieved_doc`. We can then craft a prompt that includes this context and pass it to the language model.

In [None]:
# Take the retrieved document and use it as a prompt for the text generation model
prompt = f"Based on the information: '{retrieved_doc}', generate a response to the query: {query}"
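
When more than one passage is kept after reranking, the same idea extends to several context documents. The sketch below is one possible way to do that; `build_prompt` is a hypothetical helper, the prompt wording is an arbitrary choice, and the passages are placeholders rather than retrieved results.

```python
def build_prompt(query, docs):
  """Number each retrieved passage and place the question after the context."""
  context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))
  return (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
  )

example = build_prompt(
  "When is Apple's conference call scheduled?",
  ["Passage one.", "Passage two."],  # placeholder passages
)
print(example)
```

Numbering the passages makes it easy to ask the model to say which passage supports its answer.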

Now you can utilize a text generation model like Claude 3.5 Sonnet to craft a response based on the provided query and the retrieved document.

In [None]:
# install anthropic
!pip install anthropic

In [None]:
import anthropic

# Initialize Anthropic API
client = anthropic.Anthropic(api_key="YOUR ANTHROPIC API KEY")

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": prompt}
    ]
)

print(message.content[0].text)

You can do the same with GPT-4o.

In [None]:
# install openai
!pip install openai

In [None]:
from openai import OpenAI

# Initialize OpenAI API
client = OpenAI(api_key="YOUR OPENAI API KEY")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)

print(response.choices[0].message.content)