Good embeddings need good contexts to embed. Here's an example of how to combine the smart contexts produced by llmsherpa's LayoutPDFReader with the new Cohere embeddings.

First, install cohere along with openai and tiktoken.

In [None]:
!pip install tiktoken
!pip install openai
!pip install -U cohere

Now install the llmsherpa library to get LayoutPDFReader.

In [None]:
!pip install llmsherpa

Collecting llmsherpa
  Downloading llmsherpa-0.1.3-py3-none-any.whl (12 kB)
Installing collected packages: llmsherpa
Successfully installed llmsherpa-0.1.3


In [None]:
from llmsherpa.readers import LayoutPDFReader

llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "https://www.shinzen.org/wp-content/uploads/2016/08/WhatIsMindfulness_SY_Public_ver1.5.pdf" # a local file path such as /home/downloads/xyz.pdf also works
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_url)
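What makes these chunks "smart" is that each one keeps track of the section it came from. Judging by the sample output at the end of this notebook, `to_context_text()` prefixes a chunk's text with its heading path joined by " > ". The sketch below mimics that idea with toy `Section` and `Chunk` classes; both are hypothetical stand-ins for illustration, not llmsherpa's actual classes or implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for a parsed layout tree (NOT llmsherpa's classes).
@dataclass
class Section:
    title: str
    parent: Optional["Section"] = None


@dataclass
class Chunk:
    sentences: List[str]
    section: Optional[Section] = None

    def context_text(self) -> str:
        """Prefix the chunk text with its section path, outermost heading first."""
        titles = []
        node = self.section
        while node is not None:  # walk up to the root heading
            titles.append(node.title)
            node = node.parent
        path = " > ".join(reversed(titles))
        body = " ".join(self.sentences)
        return f"{path}\n{body}" if path else body


# Headings taken from the sample output below; the chunk text is a placeholder.
top = Section("I. SOME USEFUL DISTINCTIONS")
sub = Section("Mindfulness – The Practices", parent=top)
chunk = Chunk(["<chunk text>"], section=sub)
print(chunk.context_text())
```

Embedding the heading path alongside the body gives a short chunk enough surrounding context to match queries about its section, which is the point of using layout-aware chunks instead of fixed-size windows.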


Now go through the smart chunks returned from the PDF and embed the chunk texts with Cohere.

In [None]:
import cohere
import numpy as np

cohere_key = "{use cohere key here}"
co = cohere.Client(api_key=cohere_key)

# Collect the context text (section path + chunk text) of every layout-aware chunk
contexts = [chunk.to_context_text() for chunk in doc.chunks()]

# Encode the documents with input_type 'search_document'
# (texts is passed by keyword: newer cohere SDKs reject it as a positional argument)
doc_emb = co.embed(texts=contexts, input_type="search_document", model="embed-english-v3.0").embeddings
doc_emb = np.asarray(doc_emb)
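A single embed request accepts only a limited number of texts (96 per call according to Cohere's documentation at the time of writing; check the current limit), so a long PDF can yield more chunks than one call allows. The sketch below shows a generic batching helper; `embed_in_batches` and `fake_embed` are hypothetical names, and the fake embedder stands in for the API so the example runs offline.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 96,  # assumed per-request limit; adjust to the API's current value
) -> List[List[float]]:
    """Call embed_fn on consecutive slices of texts and concatenate the results in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        result = embed_fn(batch)
        if len(result) != len(batch):
            raise RuntimeError("embed_fn returned a different number of vectors than texts")
        vectors.extend(result)
    return vectors


# Fake embedder (no API call): records each batch size, returns 1-d "embeddings".
calls = []

def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]

vecs = embed_in_batches([f"chunk {i}" for i in range(200)], fake_embed)
print(calls)  # [96, 96, 8]
```

To use it with the notebook's client, pass something like `lambda batch: co.embed(texts=batch, input_type="search_document", model="embed-english-v3.0").embeddings` as `embed_fn` and wrap the returned list in `np.asarray`. Because batches are processed in order, row `i` of the result still lines up with `contexts[i]`.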


Now embed each question with Cohere (this time with input_type "search_query"), retrieve the most relevant contexts, and have OpenAI synthesize an answer from those contexts, conditioned on your question.
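The ranking step inside `ask()` scores every chunk by the dot product between the query embedding and that chunk's embedding, then keeps the `top_k` highest scores. Here is a pure-Python sketch of that step on toy 2-d vectors (made-up values, not real embeddings); `top_k_indices` is a hypothetical helper that mirrors `np.argsort(-scores)[:top_k]`.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_indices(query: Sequence[float], docs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k documents with the highest dot-product score, best first."""
    scores = [dot(query, d) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy unit-length vectors, so each dot product equals the cosine similarity.
docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query = [0.8, 0.6]
print(top_k_indices(query, docs, k=2))  # [1, 0]  (scores: 0.8, 0.96, 0.6)
```

With unit-length vectors, ranking by dot product is the same as ranking by cosine similarity; if the embeddings you use are not normalized, normalize them first or compute cosine similarity explicitly.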

In [None]:
from openai import OpenAI

# Uses the openai>=1.0 client; the older openai.ChatCompletion interface was removed in 1.0
client = OpenAI(api_key="{use open ai key here}")

def ask(query, top_k=10):
  # Encode the query with input_type 'search_query' (the documents used 'search_document')
  query_emb = co.embed(texts=[query], input_type="search_query", model="embed-english-v3.0").embeddings
  query_emb = np.asarray(query_emb)

  # Score every chunk by the dot product between the query and document embeddings
  scores = np.dot(query_emb, doc_emb.T)[0]

  # Keep only the top_k highest-scoring chunks so the prompt sent to OpenAI stays small
  top_idx = np.argsort(-scores)[:top_k]
  most_relevant_contexts = [contexts[idx] for idx in top_idx]

  # Call OpenAI to synthesize an answer from the retrieved passages
  passages = "\n".join(most_relevant_contexts)
  prompt = f"Read the following passages and answer the question: {query}\n passages: {passages}"
  completion = client.chat.completions.create(
      model="gpt-3.5-turbo",
      messages=[{"role": "user", "content": prompt}],
  )
  synthesized_answer = completion.choices[0].message.content

  print(f"Query: {query}")
  print(f"Answer: {synthesized_answer}")
  print("\nRelevant contexts: \n")
  for ctx in most_relevant_contexts:
    print(ctx)
    print("--------")

ask("what do i need to practice mindfulness")


Query: what do i need to practice mindfulness
Answer: To practice mindfulness, you need to acquire and apply concentration, clarity, and equanimity skills. Mindfulness can refer to a form of awareness, the practices that elevate that awareness, and the application of that awareness for specific goals. Mindfulness practice often includes training in positive affect, such as loving kindness. It is important to understand that mindfulness is not just being aware in a general sense, but rather a fine-grained and systematic practice. Mindfulness skills can be learned independent of one's beliefs or worldview, but they may impact how one views things. It is also important to recognize that there may be specific learning required for different sensory experiences, and that carryover of skills may not always be immediate or universal.

Relevant contexts: 

I. SOME USEFUL DISTINCTIONS > Mindfulness – The Practices
Mindfulness can also refer to the systematic exercises that elevate a person’s ba