# SurrealDB as a Vectorstore for LangChain
## Install packages

In [1]:
%pip install --upgrade --quiet  surrealdb langchain langchain-community beautifulsoup4 requests


Note: you may need to restart the kernel to use updated packages.


## Import packages

In [2]:
import requests
from bs4 import BeautifulSoup
from langchain_core.documents import Document
from langchain_community.vectorstores import SurrealDBStore
from langchain_community.embeddings import HuggingFaceEmbeddings

## Scrape features and client testimonials from SurrealDB.com

In [3]:
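home_page = requests.get("https://surrealdb.com")
soup = BeautifulSoup(home_page.content, "html.parser")
# These class names reflect the site's markup when the notebook was run;
# find() returns None if the page layout has changed.
features = soup.find("div", class_="space-y-32")
clients = soup.find("div", class_="space-y-28")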

In [4]:
spotlight_feature = [features.find("p").text]

In [5]:
all_features = spotlight_feature + [f"{feature.text}\n{feature.next_sibling.text}" for feature in features.find_all("h3")]
all_features[:3]

['SurrealDB offers a dynamic and adaptable platform for business. With an integrated suite of cutting-edge database solutions, tools, and services, SurrealDB empowers your workforce to discover innovative answers using products meticulously crafted to meet their requirements.',
 "Database, realtime API layer, and security permissions all-in-one\nSurrealDB combines the database layer, the querying layer, and the API and authentication layer into one platform. Advanced table-based and row-based customisable access permissions allow for granular data access patterns for different types of users. There's no need for custom backend code and security rules with complicated database development.",
 'Query the database with the tools you want. Your data, your choice.\nSurrealDB is designed to be flexible to use, with support for SurrealQL, GraphQL (coming soon), CRUD support over REST, and JSON-RPC querying and modification over WebSockets. With direct-to-client connection with in-built permis

In [6]:
spotlight_client = [clients.find("p").text]

In [7]:
all_clients = spotlight_client + [
    # Single quotes inside the f-string keep this valid on Python versions before 3.12.
    f"{client.text}\n\n{client.next_sibling.find('h5').text}\n{client.next_sibling.find('p').text}"
    for client in clients.find("div", class_="flex").find_all("p", class_="text-lg")
]
all_clients[:3]

['"Here at Yaacomm we are already using SurrealDB in a significant way for our backend infrastructure, however I believe SurrealDB can play an even bigger role in other areas as well. Being able to embed SurrealDB locally in Android and iOS apps could make it a perfect fit for local caching. Additionally, this would provide us with all the benefits of SurrealDB such as full-text searching and its graph based nature without relying on a network connection."',
 '"It think it\'s going to change how we query databases."\n\nAnup Jadhav\nDirector, Partner Delivery Success, C360 Cross-Cloud, Salesforce',
 '"Throughout my years dealing with database challenges, SurrealDB seems to be a beacon of innovation. Addressing issues like multi-tenancy, blending the best of hybrid & distributed databases, and ensuring unmatched scalability & performance. Oh and btw it had ML integrated in the database - how cool is that! But the true standout? Live queries and change feeds. Such features are not just ad

## Merge features and testimonials into a single list

In [8]:
all_docs = all_features + all_clients
all_docs[::4]

['SurrealDB offers a dynamic and adaptable platform for business. With an integrated suite of cutting-edge database solutions, tools, and services, SurrealDB empowers your workforce to discover innovative answers using products meticulously crafted to meet their requirements.',
 'Advanced inter-document relations and analysis. No JOINs. No pain.\nWith full graph database functionality, SurrealDB enables more advanced querying and analysis. Records (or vertices) can be connected to one another with edges, each with its own record properties and metadata. Simple extensions to traditional SQL queries allow for multi-table, multi-depth document retrieval, efficiently in the database, without the use of complicated JOINs and without bringing the data down to the client.',
 'Realtime live queries and data changes direct to application\nSurrealDB keeps every client device in-sync with data modifications pushed in realtime to the clients, applications, end-user devices, and server-side librari

## Prepare Document objects from the features and testimonials list

In [9]:
docs = [Document(page_content=doc) for doc in all_docs]
docs[::4]

[Document(page_content='SurrealDB offers a dynamic and adaptable platform for business. With an integrated suite of cutting-edge database solutions, tools, and services, SurrealDB empowers your workforce to discover innovative answers using products meticulously crafted to meet their requirements.'),
 Document(page_content='Advanced inter-document relations and analysis. No JOINs. No pain.\nWith full graph database functionality, SurrealDB enables more advanced querying and analysis. Records (or vertices) can be connected to one another with edges, each with its own record properties and metadata. Simple extensions to traditional SQL queries allow for multi-table, multi-depth document retrieval, efficiently in the database, without the use of complicated JOINs and without bringing the data down to the client.'),
 Document(page_content='Realtime live queries and data changes direct to application\nSurrealDB keeps every client device in-sync with data modifications pushed in realtime to 

## Create a LangChain-supported text embedding function

We use the `sentence-transformers/all-mpnet-base-v2` model, which is the default for `HuggingFaceEmbeddings`.  
It generates the embedding vectors for our documents when we store them in SurrealDB.  
It also generates the embedding vector for each query, which lets us run similarity searches over our documents.  
The text embedding functions currently supported by LangChain are listed [here](https://python.langchain.com/v0.2/docs/integrations/text_embedding/).
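
To make the two roles of the embedding function concrete, here is a minimal sketch of the two-method shape that LangChain embedding classes expose (`embed_documents` for stored texts, `embed_query` for a search string). `ToyHashEmbeddings` is a hypothetical stand-in invented for this illustration: it hashes tokens into buckets and has none of the semantic quality of `all-mpnet-base-v2`. It only shows that documents and queries must go through the same function so their vectors are comparable.

```python
import math
import zlib


class ToyHashEmbeddings:
    """Hypothetical stand-in that mimics the two-method embeddings interface."""

    def __init__(self, dim: int = 16):
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        # Hash each lowercased token into one of `dim` buckets and count hits.
        vec = [0.0] * self.dim
        for token in text.lower().split():
            vec[zlib.crc32(token.encode("utf-8")) % self.dim] += 1.0
        # L2-normalise so texts of different lengths stay comparable.
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # One vector per stored document.
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        # The query goes through the same transformation as the documents.
        return self._embed(text)


toy = ToyHashEmbeddings(dim=16)
doc_vectors = toy.embed_documents(["No JOINs. No pain.", "Realtime live queries"])
query_vector = toy.embed_query("live queries")
print(len(doc_vectors), len(doc_vectors[0]), len(query_vector))  # prints: 2 16 16
```

A real model replaces the hashing step with a neural encoder, but the contract stays the same: lists of texts in, lists of fixed-length float vectors out.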

In [10]:
embedding_function = HuggingFaceEmbeddings()



## Initialize SurrealDBStore with embeddings function

The following arguments are supported, with defaults where applicable:
```
embedding_function: Embedding function to use.
dburl: SurrealDB connection url. (default: "ws://localhost:8000/rpc")
ns: surrealdb namespace for the vector store. (default: "langchain")
db: surrealdb database for the vector store. (default: "database")
collection: surrealdb collection for the vector store. (default: "documents")
(optional) db_user and db_pass: surrealdb credentials
```
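
As an illustration of how these arguments fit together, the sketch below collects them in a plain dictionary. The first three values are the defaults listed above; the collection name is the one used in this notebook; the credentials are placeholder assumptions and should be replaced with whatever your SurrealDB server is configured with.

```python
# Keyword arguments mirroring the argument list above.
store_kwargs = {
    "dburl": "ws://localhost:8000/rpc",  # documented default
    "ns": "langchain",                   # documented default
    "db": "database",                    # documented default
    "collection": "surrealdb.com",       # collection used in this notebook
    "db_user": "root",                   # placeholder credential (assumption)
    "db_pass": "root",                   # placeholder credential (assumption)
}

# With a running SurrealDB server, these would be passed as:
#   sdb = SurrealDBStore(embedding_function=embedding_function, **store_kwargs)
#   await sdb.initialize()
print(sorted(store_kwargs))
```

Keeping the settings in one dictionary makes it easy to switch between a local server and a remote one without touching the rest of the notebook.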

In [11]:
sdb = SurrealDBStore(embedding_function=embedding_function, collection="surrealdb.com")
await sdb.initialize()

## Delete existing records/documents

In [12]:
await sdb.adelete()

True

## Add features and testimonials documents into SurrealDBStore

In [13]:
await sdb.aadd_documents(docs)

['⟨surrealdb.com⟩:9dblbhacmz08vml95os8',
 '⟨surrealdb.com⟩:km39swfbtkbabqxj2djw',
 '⟨surrealdb.com⟩:78t5je6qyflg4wktpbuz',
 '⟨surrealdb.com⟩:yikrswb6cb0dyeij7sy3',
 '⟨surrealdb.com⟩:h1z155pvmf2izmwt8gtn',
 '⟨surrealdb.com⟩:4gsshdf1anll1l710c6c',
 '⟨surrealdb.com⟩:mll9dhn9wc199pcwvucf',
 '⟨surrealdb.com⟩:20x312gj3p5zlw5nwoc3',
 '⟨surrealdb.com⟩:zb10xd5hkwik22g0hvth',
 '⟨surrealdb.com⟩:lrzqw0ovcdd3tpdnza7i',
 '⟨surrealdb.com⟩:i9gvu10gywwdm0h4noi2',
 '⟨surrealdb.com⟩:petlksjj0x7v4cg3bq5y',
 '⟨surrealdb.com⟩:jbkyt339lj0cwnzfe4y8',
 '⟨surrealdb.com⟩:n7irbdyx76k2m2cx6vas',
 '⟨surrealdb.com⟩:a3t8izkjyeq337lwbrpb',
 '⟨surrealdb.com⟩:dhash0855vzc0s3aul3a',
 '⟨surrealdb.com⟩:u7ht511mbkmolcw2p5d1',
 '⟨surrealdb.com⟩:4fr6azac7a8c47fwdpfd',
 '⟨surrealdb.com⟩:8m868uytux27e7tqp5j7',
 '⟨surrealdb.com⟩:gpxv8rysofxi13bn2yfz']

## Querying documents with embedding function using similarity score

The query is converted to its embedding vector, and [`vector::similarity::cosine`](https://surrealdb.com/docs/surrealdb/surrealql/functions/database/vector#vectorsimilaritycosine) is used to find matching documents.
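
For intuition, cosine similarity between a query vector and each document vector can be sketched in plain Python as below. This is not SurrealDB's implementation: the helper names `cosine_similarity` and `top_k`, the three-dimensional vectors, and the choice to return 0.0 for a zero vector are all assumptions made for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Choice made for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
doc_vecs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.5, 0.9, 0.0]
best = top_k(query_vec, doc_vecs, k=1)
print(best)  # the document at index 1 scores highest
```

The `k` argument plays the same role as `k` in `asimilarity_search`: it caps how many of the best-scoring documents are returned.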

In [14]:
await sdb.asimilarity_search("How do Joins work in SurrealDB?",k=1)

[Document(metadata={'id': '⟨surrealdb.com⟩:h1z155pvmf2izmwt8gtn'}, page_content='Advanced inter-document relations and analysis. No JOINs. No pain.\nWith full graph database functionality, SurrealDB enables more advanced querying and analysis. Records (or vertices) can be connected to one another with edges, each with its own record properties and metadata. Simple extensions to traditional SQL queries allow for multi-table, multi-depth document retrieval, efficiently in the database, without the use of complicated JOINs and without bringing the data down to the client.')]

In [15]:
await sdb.asimilarity_search("Can SurrealDB run on mobile platforms?",k=1)

[Document(metadata={'id': '⟨surrealdb.com⟩:petlksjj0x7v4cg3bq5y'}, page_content='"Here at Yaacomm we are already using SurrealDB in a significant way for our backend infrastructure, however I believe SurrealDB can play an even bigger role in other areas as well. Being able to embed SurrealDB locally in Android and iOS apps could make it a perfect fit for local caching. Additionally, this would provide us with all the benefits of SurrealDB such as full-text searching and its graph based nature without relying on a network connection."')]