<a href="https://colab.research.google.com/github/sugarforever/LangChain-Tutorials/blob/main/langchain_supabase_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [23]:
!pip install --upgrade --quiet langchain langchain_community langchain-openai tiktoken supabase unstructured

In [25]:
import os
from google.colab import userdata

os.environ["SUPABASE_URL"] = userdata.get('SUPABASE_URL')
os.environ["SUPABASE_SERVICE_KEY"] = userdata.get('SUPABASE_SERVICE_KEY')
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
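
The cell above relies on `google.colab.userdata`, which only exists inside Colab. A minimal sketch for running the notebook elsewhere, assuming the same three variables are already exported in the shell; `require_env` and `load_settings` are hypothetical helper names, not part of any library:

```python
import os

# Hypothetical helpers (not from LangChain or Supabase): read the three
# settings this notebook needs from the process environment and fail early
# with a readable message if one is missing.
REQUIRED_NAMES = ("SUPABASE_URL", "SUPABASE_SERVICE_KEY", "OPENAI_API_KEY")


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset/empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def load_settings(names=REQUIRED_NAMES) -> dict:
    """Collect all required settings into a dict keyed by variable name."""
    return {name: require_env(name) for name in names}
```

Calling `load_settings()` once at the top of the notebook surfaces a missing key before any network call is made.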

In [31]:
from langchain_community.vectorstores import SupabaseVectorStore
from langchain_openai import OpenAIEmbeddings
from supabase.client import Client, create_client

# Connect to Supabase with the URL and service key loaded in the previous cell.
supabase_url = os.environ.get("SUPABASE_URL")
supabase_key = os.environ.get("SUPABASE_SERVICE_KEY")
supabase: Client = create_client(supabase_url, supabase_key)

# The same embedding model is used to index the chunks and to embed queries.
embeddings = OpenAIEmbeddings()
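
The article loaded later in this notebook defines its table with `embedding vector(1536)` and notes that `text-embedding-ada-002` outputs 1536 dimensions. A small sanity check along those lines, assuming the embedding model in use produces vectors of that size; `check_embedding_dim` is a hypothetical helper:

```python
# Dimension taken from the article's table definition: `embedding vector(1536)`.
EXPECTED_DIM = 1536


def check_embedding_dim(vector, expected=EXPECTED_DIM):
    """Return `vector` unchanged if it has `expected` entries, else raise."""
    if len(vector) != expected:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, but the table column expects {expected}"
        )
    return vector


# In the notebook this could wrap a real call, for example:
#   check_embedding_dim(embeddings.embed_query("dimension check"))
```

A mismatch here usually means the table column and the embedding model were chosen independently.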

In [27]:
from langchain_community.document_loaders import UnstructuredURLLoader

In [28]:
urls = [ "https://supabase.com/blog/openai-embeddings-postgres-vector" ]

loader = UnstructuredURLLoader(urls=urls)
docs = loader.load()

In [29]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
splits = text_splitter.split_documents(docs)
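
To see what `chunk_size`, `chunk_overlap`, and `add_start_index` mean, here is a deliberately simplified sketch. It cuts fixed-width windows, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and lines, so real chunk boundaries will differ. `sliding_window_chunks` is a hypothetical name:

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int):
    """Return (start_index, chunk) pairs of fixed-width, overlapping windows.

    Simplified illustration only: the real splitter looks for natural
    separators before falling back to character counts.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


for start, chunk in sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(start, chunk)
```

Each window repeats the last `chunk_overlap` characters of the previous one, which is why a sentence cut at a boundary still appears whole in at least one chunk.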

In [30]:
len(splits)

22

In [32]:
splits

[Document(page_content="Back\n\nBlog\n\nStoring OpenAI embeddings in Postgres with pgvector\n\n2023-02-06\n\n15 minute read\n\nGreg RichardsonEngineering\n\nA new PostgreSQL extension is now available in Supabase: pgvector, an open-source vector similarity search.\n\nThe exponential progress of AI functionality over the past year has inspired many new real world applications. One specific challenge has been the ability to store and query embeddings at scale.\nIn this post we'll explain what embeddings are, why we might want to use them, and how we can store and query them in PostgreSQL using pgvector.\n\n🆕 Supabase has now released an open source toolkit for developing AI applications using Postgres and pgvector. Learn more in the AI & Vectors docs.\n\nWhat are embeddings?#\n\nEmbeddings capture the “relatedness” of text, images, video, or other types of information. This relatedness is most commonly used for:\n\nSearch: how similar is a search term to a body of text?\n\nRecomme

In [33]:
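# Embed every chunk and insert it into Supabase. The `documents` table and the
# `match_documents` function must already exist in the database.
vectorstore = SupabaseVectorStore.from_documents(
    splits,
    embeddings,
    client=supabase,
    table_name="documents",
    query_name="match_documents",
)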

In [34]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
retrieved_docs = retriever.invoke("How to store embeddings with pgvector?")
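
Conceptually, a similarity retriever embeds the question and returns the `k` stored chunks whose vectors are closest to it. The sketch below does this in memory with cosine similarity. It assumes the database function ranks by cosine similarity, which depends on how `match_documents` was defined; `cosine_similarity` and `top_k` are hypothetical helpers and the vectors are toy data:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k):
    """Return the k (score, text) pairs with the highest similarity to query_vec.

    `rows` is a list of (text, vector) pairs standing in for table rows.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real ones have far more dimensions.
rows = [("chunk a", [1.0, 0.0]), ("chunk b", [0.0, 1.0]), ("chunk c", [1.0, 1.0])]
for score, text in top_k([1.0, 0.0], rows, k=2):
    print(round(score, 3), text)
```

With `search_kwargs={"k": 6}` the notebook asks for the six closest chunks instead of two.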

In [35]:
retrieved_docs

[Document(page_content="What if I want to create/update/delete embeddings dynamically?\n\nWhat if I'm not using Python?\n\nUsing PostgreSQL#\n\nEnter pgvector, an extension for PostgreSQL that allows you to both store and query vector embeddings within your database. Let's try it out.\n\nFirst we'll enable the Vector extension. In Supabase, this can be done from the web portal through Database → Extensions. You can also do this in SQL by running:\n\n_10\n\ncreate extension vector;\n\nNext let's create a table to store our documents and their embeddings:\n\n_10\n\ncreate table documents (\n\n_10\n\nid bigserial primary key,\n\n_10\n\ncontent text,\n\n_10\n\nembedding vector(1536)\n\n_10\n\n);\n\npgvector introduces a new data type called vector. In the code above, we create a column named embedding with the vector data type. The size of the vector defines how many dimensions the vector holds. OpenAI's text-embedding-ada-002 model outputs 1536 dimensions, so we will use that for our ve

In [36]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = '''
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
'''

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = ({"context": (retriever | format_docs), "question": RunnablePassthrough()}
             | ChatPromptTemplate.from_template(prompt)
             | ChatOpenAI(model="gpt-3.5-turbo-0125")
             | StrOutputParser())

rag_chain.invoke("How to store embeddings with pgvector?")

'To store embeddings with pgvector, you can create a table in PostgreSQL with a column of type vector to store the embeddings. You can then use the pgvector extension to calculate similarity between embeddings using operators like cosine distance. Indexing can also be used to speed up queries on tables with embeddings.'
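
The chain in the cell above can be read as four plain steps: retrieve, join the chunks into a context string, fill the prompt template, and call the model. The sketch below traces that data flow with stand-ins and no network calls. `FakeDoc`, `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical names, and the prompt is shortened to its placeholders:

```python
from dataclasses import dataclass

# Shortened stand-in for the notebook's prompt; only the placeholders matter here.
PROMPT = "Question: {question}\nContext: {context}\nAnswer:"


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document; only page_content is used."""
    page_content: str


def fake_retriever(question):
    """Pretend retriever: ignores the question and returns two fixed chunks."""
    return [FakeDoc("chunk one"), FakeDoc("chunk two")]


def format_docs(docs):
    """Same helper as in the notebook: join chunk texts with blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def fake_llm(prompt_text):
    """Pretend model: reports the prompt length instead of answering."""
    return f"stub answer ({len(prompt_text)} prompt characters)"


def run_chain(question, retriever, llm):
    # Step 1: the same input feeds both branches of the leading dict.
    inputs = {"context": format_docs(retriever(question)), "question": question}
    # Step 2: fill the template with those values.
    prompt_text = PROMPT.format(**inputs)
    # Steps 3 and 4: call the model and return its text output.
    return llm(prompt_text)


print(run_chain("How to store embeddings with pgvector?", fake_retriever, fake_llm))
```

Swapping `fake_retriever` and `fake_llm` for real components is what the pipe operators in `rag_chain` express more compactly.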