
# **Neo4j Vector as a Vector Database**
This is a hello-world exercise based on the vector search quickstart in the LangChain documentation:
https://python.langchain.com/docs/integrations/vectorstores/neo4jvector

The quickstart does not say where to get its sample data file, so I substituted my own dataset.



## **Prerequisites**

In [1]:
%pip install neo4j openai tiktoken langchain

Collecting neo4j
  Downloading neo4j-5.13.0.tar.gz (192 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 192.3/192.3 kB 1.2 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting openai
  Obtaining dependency information for openai from https://files.pythonhosted.org/packages/1e/9f/385c25502f437686e4aa715969e5eaf5c2cb5e5ffa7c5cdd52f3c6ae967a/openai-0.28.1-py3-none-any.whl.metadata
  Downloading openai-0.28.1-py3-none-any.whl.metadata (11 kB)
Collecting tiktoken
  Obtaining dependency information for tiktoken from https://files.pythonhosted.org/packages/0b/c9/cd8a2e95078f94a40bf1408c0ac353570114976fda90fc8da62d3c85fff6/tiktoken-0.5.1-cp310-cp310-macosx_10_9_x86_64.whl.metadata
  Downloading tiktoken-0.5.1-cp310-cp310-macosx_10_9

In [2]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Neo4jVector
from langchain.document_loaders import TextLoader
from langchain.docstore.document import Document

## **Embedding Engine**
We will use OpenAI to generate the embeddings.

In [3]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
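When the notebook is re-run, it can be convenient to prompt for the key only if it is not already set. The following is a minimal sketch of that idea; the helper name `ensure_env_secret` and its parameters are hypothetical and not part of LangChain or OpenAI.

```python
import getpass
import os


def ensure_env_secret(name, prompt, environ=os.environ, ask=getpass.getpass):
    """Prompt for a secret only when `name` is missing or empty in `environ`."""
    if not environ.get(name):
        environ[name] = ask(prompt)
    return environ[name]


# Demonstration with a plain dict and a stand-in prompt function,
# so nothing is read from the terminal here.
demo_env = {}
first = ensure_env_secret("OPENAI_API_KEY", "OpenAI API Key:", demo_env, lambda _: "sk-demo")
second = ensure_env_secret("OPENAI_API_KEY", "OpenAI API Key:", demo_env, lambda _: "sk-other")
```

In the notebook itself, calling `ensure_env_secret("OPENAI_API_KEY", "OpenAI API Key:")` with the defaults would prompt once and then reuse the value stored in `os.environ`.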

## **Dataset**
We will use the US Constitution as our dataset. Note that the file downloaded below is an HTML page saved with a .txt extension.

In [5]:
!curl https://www.govinfo.gov/content/pkg/CDOC-110hdoc50/html/CDOC-110hdoc50.htm > constitution.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  291k    0  291k    0     0   580k      0 --:--:-- --:--:-- --:--:--  588k
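Because the downloaded file is HTML, any markup in it is chunked and embedded along with the text. One option, not used in this notebook, is to strip the tags first. The sketch below uses only the standard library; `TextExtractor` and `html_to_text` are hypothetical names, and the sample string is made up for illustration. It keeps all text nodes, including any inside script or style elements, so it is a starting point, not a complete cleaner.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, discarding the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


sample_html = "<html><body><pre>We the People &amp; our Posterity</pre></body></html>"
plain = html_to_text(sample_html)
print(plain)
```

To apply this to the dataset, one could read `constitution.txt`, pass its contents through `html_to_text`, and write the result to a new file before loading it.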


## **Generate Embeddings**
Use LangChain to chunk the dataset and use OpenAI for embeddings.

In [6]:
loader = TextLoader("constitution.txt")

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

Created a chunk of size 4562, which is longer than the specified 1000
Created a chunk of size 21641, which is longer than the specified 1000
Created a chunk of size 6612, which is longer than the specified 1000
Created a chunk of size 2609, which is longer than the specified 1000
Created a chunk of size 2239, which is longer than the specified 1000
Created a chunk of size 1870, which is longer than the specified 1000
Created a chunk of size 2679, which is longer than the specified 1000
Created a chunk of size 1111, which is longer than the specified 1000
Created a chunk of size 1860, which is longer than the specified 1000
Created a chunk of size 2927, which is longer than the specified 1000
Created a chunk of size 2233, which is longer than the specified 1000
Created a chunk of size 2149, which is longer than the specified 1000
Created a chunk of size 1702, which is longer than the specified 1000
Created a chunk of size 1615, which is longer than the specified 1000
Created a chunk of 
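The warnings above appear because `CharacterTextSplitter` splits on a separator (a blank line by default) and then merges the pieces up to `chunk_size`. A single piece with no separator inside it cannot be cut further, so it is emitted whole even when it exceeds the limit. The sketch below is a simplified model of that behavior, written for illustration; it is not the library's code, and `split_on_separator` is a hypothetical name.

```python
def split_on_separator(text, separator="\n\n", chunk_size=1000):
    """Split on `separator`, then greedily merge pieces up to `chunk_size`.

    A piece that is longer than `chunk_size` on its own is kept whole,
    which is how oversized chunks arise.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# One long paragraph without blank lines sits between two short ones.
sample = "short paragraph" + "\n\n" + "x" * 2500 + "\n\n" + "another short one"
chunks = split_on_separator(sample)
oversized = [len(c) for c in chunks if len(c) > 1000]
print(oversized)
```

If uniformly small chunks matter, a splitter that falls back to finer separators, such as LangChain's `RecursiveCharacterTextSplitter`, is one alternative to consider.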

## **Connect to Neo4j and load the embeddings**

In [7]:
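import getpass
# Input your Neo4j password.
# Using the module form avoids rebinding the name `getpass` to the function,
# which would break the earlier OpenAI key cell if it were re-run.
NEO4J_PASSWORD = getpass.getpass('Your Neo4j password: ')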

In [8]:
# Neo4jVector requires the Neo4j database credentials

url = "neo4j+s://49920e97.databases.neo4j.io:7687"
username = "neo4j"
password = NEO4J_PASSWORD
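The URI scheme determines how the driver connects: `neo4j+s` requests an encrypted connection with a CA-verified certificate, `neo4j+ssc` accepts self-signed certificates, and plain `neo4j` or `bolt` is unencrypted. As a small illustration, the sketch below takes the URI apart with the standard library; `describe_neo4j_uri` is a hypothetical helper and not part of the Neo4j driver.

```python
from urllib.parse import urlparse


def describe_neo4j_uri(uri):
    """Break a Neo4j connection URI into scheme, host, and port."""
    parsed = urlparse(uri)
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,
        "port": parsed.port,
        # "+s" and "+ssc" are the encrypted variants of the scheme.
        "encrypted": parsed.scheme.endswith(("+s", "+ssc")),
    }


info = describe_neo4j_uri("neo4j+s://49920e97.databases.neo4j.io:7687")
print(info)
```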



In [9]:
# Neo4jVector connects to Neo4j, embeds the chunks, and creates a vector index if needed.
# The embeddings object created earlier is reused here.

db = Neo4jVector.from_documents(
    docs, embeddings, url=url, username=username, password=password
)

## **Query the DB**

In [15]:
query = "Do I have freedom from a nationalized religion?"
docs_with_score = db.similarity_search_with_score(query)
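Conceptually, `similarity_search_with_score` embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below illustrates that ranking with cosine similarity on hand-made 3-dimensional vectors. The vectors, the document labels, and the helper names `cosine_similarity` and `top_k` are made up for illustration; the real search runs inside Neo4j's vector index, and the score it reports may be scaled differently from the raw cosine value.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the k (label, score) pairs most similar to the query, best first."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real OpenAI embeddings have far more dimensions.
toy_docs = {
    "first_amendment": [0.9, 0.1, 0.0],
    "commerce_clause": [0.1, 0.9, 0.2],
    "preamble": [0.5, 0.5, 0.5],
}
toy_query = [1.0, 0.0, 0.0]

ranked = top_k(toy_query, toy_docs, k=2)
for label, score in ranked:
    print(label, round(score, 3))
```

The toy query points in nearly the same direction as the `first_amendment` vector, so that label ranks first, which mirrors how the religion question above retrieves the First Amendment text.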

In [16]:
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)

--------------------------------------------------------------------------------
Score:  0.8886802792549133
Congress shall make no law respecting an establishment of 
religion, or prohibiting the free exercise thereof; or 
abridging the freedom of speech, or of the press; of the right 
of the people peaceably to assemble, and to petition the 
Government for a redress of grievances.
---------------------------------------------------------------------------
                                   * * * * *                              
\12\The first ten amendments of the Constitution of the United States 
(and two others, one of which failed of ratification and the other 
which later became the 27th amendment) were proposed to the 
legislatures of the several States by the First Congress on September 
25, 1789. The first ten amendments were ratified by the following 
States, and the notifications of ratification by the Governors thereof 
were successively communicated by the President to Con