# KDB.AI

> [KDB.AI](https://kdb.ai/) is a powerful knowledge-based vector database and search engine that allows you to build scalable, reliable AI applications, using real-time data, by providing advanced search, recommendation and personalization.

[This example](https://github.com/KxSystems/kdbai-samples/blob/main/document_search/document_search.ipynb) demonstrates how to use KDB.AI to run semantic search on unstructured text documents.

To access your endpoint and API keys, [sign up to KDB.AI here](https://kdb.ai/get-started/).

To set up your development environment, follow the instructions on the [KDB.AI pre-requisites page](https://code.kx.com/kdbai/pre-requisites.html).

The following examples demonstrate some of the ways you can interact with KDB.AI through LangChain.

To use this integration, install `langchain-community` with `pip install -qU langchain-community`.

## Import required packages

In [1]:
import os
import time
from getpass import getpass

import kdbai_client as kdbai
import pandas as pd
import requests
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import KDBAI
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

In [2]:
KDBAI_ENDPOINT = input("KDB.AI endpoint: ")
KDBAI_API_KEY = getpass("KDB.AI API key: ")
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key: ")

KDB.AI endpoint:  https://ui.qa.cld.kx.com/instance/pcnvlmi860
KDB.AI API key:  ········
OpenAI API Key:  ········
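The cell above prompts only when `OPENAI_API_KEY` is missing from the environment. A minimal sketch of the same pattern as a reusable helper follows. The helper name `read_secret` and the environment variable name `KDBAI_API_KEY` are illustrative assumptions, not part of KDB.AI or LangChain.

```python
import os
from getpass import getpass


def read_secret(env_name: str, prompt: str) -> str:
    """Return the value of env_name, prompting only when it is unset or empty.

    Hypothetical helper: the name and behaviour are assumptions for illustration.
    """
    value = os.environ.get(env_name)
    if not value:
        # getpass hides the typed value, as in the cell above
        value = getpass(prompt)
    return value


# Example call (the variable name KDBAI_API_KEY is an assumption):
# KDBAI_API_KEY = read_secret("KDBAI_API_KEY", "KDB.AI API key: ")
```

With this approach a non-interactive run can supply the keys through the environment, while an interactive run still gets a hidden prompt.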


In [3]:
TEMP = 0.0
K = 3

## Create a KDB.AI Session

In [4]:
print("Create a KDB.AI session...")
session = kdbai.Session(endpoint=KDBAI_ENDPOINT, api_key=KDBAI_API_KEY)

Create a KDB.AI session...


## Create a table

In [5]:
print('Create table "documents"...')
schema = {
    "columns": [
        {"name": "id", "pytype": "str"},
        {"name": "text", "pytype": "bytes"},
        {
            "name": "embeddings",
            "pytype": "float32",
            "vectorIndex": {"dims": 1536, "metric": "L2", "type": "hnsw"},
        },
        {"name": "tag", "pytype": "str"},
        {"name": "title", "pytype": "bytes"},
    ]
}
table = session.create_table("documents", schema)

Create table "documents"...
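The schema above asks for an index over 1536-dimensional vectors compared with the `L2` (Euclidean) metric. To make the metric concrete, here is a small pure-Python sketch of an exhaustive nearest-neighbour search on toy 2-dimensional vectors. It illustrates the distance only: an `hnsw` index is an approximate method and works differently internally, and the function names below are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, k):
    """Return the indices of the k vectors closest to query, nearest first."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]


# Toy data standing in for stored embeddings
vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
print(nearest([0.9, 1.2], vectors, k=2))  # → [1, 0]
```

A smaller distance means a closer match, so the retriever's `k` setting later in this notebook corresponds to keeping the first `k` entries of such a ranking.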


In [6]:
%%time
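from pathlib import Path

URL = "https://www.conseil-constitutionnel.fr/node/3850/pdf"
PDF = "Déclaration_des_droits_de_l_homme_et_du_citoyen.pdf"
response = requests.get(URL, timeout=30)
response.raise_for_status()  # fail early instead of saving an error page as a PDF
Path(PDF).write_bytes(response.content)  # closes the file and returns the byte count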

CPU times: user 44.1 ms, sys: 6.04 ms, total: 50.2 ms
Wall time: 213 ms


562978

## Read a PDF

In [7]:
%%time
print("Read a PDF...")
loader = PyPDFLoader(PDF)
pages = loader.load_and_split()
len(pages)

Read a PDF...
CPU times: user 156 ms, sys: 12.5 ms, total: 169 ms
Wall time: 183 ms


3

## Create a Vector Database from PDF Text

In [8]:
%%time
print("Create a Vector Database from PDF text...")
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
texts = [p.page_content for p in pages]
metadata = pd.DataFrame(index=list(range(len(texts))))
metadata["tag"] = "law"
metadata["title"] = "Déclaration des Droits de l'Homme et du Citoyen de 1789".encode(
    "utf-8"
)
vectordb = KDBAI(table, embeddings)
vectordb.add_texts(texts=texts, metadatas=metadata)

Create a Vector Database from PDF text...
CPU times: user 211 ms, sys: 18.4 ms, total: 229 ms
Wall time: 2.23 s


['3ef27d23-47cf-419b-8fe9-5dfae9e8e895',
 'd3a9a69d-28f5-434b-b95b-135db46695c8',
 'd2069bda-c0b8-4791-b84d-0c6f84f4be34']
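The `title` column is declared with `"pytype": "bytes"` in the schema, which is why the cell above calls `.encode("utf-8")` on the title string. The sketch below shows the round trip on the same string using only the standard library. It says nothing about how KDB.AI stores the value; it only illustrates the `str` to `bytes` conversion.

```python
title = "Déclaration des Droits de l'Homme et du Citoyen de 1789"

# str -> bytes: non-ASCII characters such as "é" take more than one byte in UTF-8
encoded = title.encode("utf-8")

# bytes -> str: decode with the same codec to recover the original text
decoded = encoded.decode("utf-8")

print(type(encoded).__name__, decoded == title)  # → bytes True
```

Values read back from a `bytes` column may therefore need `.decode("utf-8")` before being displayed.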

## Create LangChain Pipeline

In [9]:
%%time
print("Create LangChain Pipeline...")
qabot = RetrievalQA.from_chain_type(
    chain_type="stuff",
    llm=ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=TEMP),
    retriever=vectordb.as_retriever(search_kwargs=dict(k=K)),
    return_source_documents=True,
)

Create LangChain Pipeline...
CPU times: user 40.8 ms, sys: 4.69 ms, total: 45.5 ms
Wall time: 44.7 ms
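The `chain_type="stuff"` setting means the retrieved documents are all placed ("stuffed") into a single prompt, which is then sent to the model in one call. The following is a simplified sketch of that idea. The function name and prompt wording are assumptions for illustration and are not LangChain's actual template.

```python
def stuff_prompt(question, documents):
    """Join every retrieved document into one context block, then append the question.

    Hypothetical helper: the prompt wording is illustrative only.
    """
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder strings standing in for the text of retrieved pages
retrieved = ["Text of page 1", "Text of page 2", "Text of page 3"]
print(stuff_prompt("Summarize the document in English:", retrieved))
```

Because everything goes into one prompt, the combined length of the `K` retrieved documents has to fit within the model's context window.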


## Summarize the document in English

In [10]:
%%time
Q = "Summarize the document in English:"
print(f"\n\n{Q}\n")
print(qabot.invoke(dict(query=Q))["result"])



Summarize the document in English:

The document is the Declaration of the Rights of Man and of the Citizen of 1789. It was written by the representatives of the French people and aims to declare the natural, inalienable, and sacred rights of every individual. These rights include freedom, property, security, and resistance to oppression. The document emphasizes the importance of equality and the principle that sovereignty resides in the nation. It also highlights the role of law in protecting individual rights and ensuring the common good. The document asserts the right to freedom of thought, expression, and religion, as long as it does not disturb public order. It emphasizes the need for a public force to guarantee the rights of all citizens and the importance of a fair and equal distribution of public contributions. The document also recognizes the right of citizens to hold public officials accountable and states that any society without the guarantee of rights and separation of p

## Query the Data

In [11]:
%%time
Q = "Is it a fair law and why ?"
print(f"\n\n{Q}\n")
print(qabot.invoke(dict(query=Q))["result"])



Is it a fair law and why ?

As an AI language model, I don't have personal opinions. However, I can provide some analysis based on the given context. The text provided is an excerpt from the Declaration of the Rights of Man and of the Citizen of 1789, which is considered a foundational document in the history of human rights. It outlines the natural and inalienable rights of individuals, such as freedom, property, security, and resistance to oppression. It also emphasizes the principles of equality, the rule of law, and the separation of powers. 

Whether or not this law is considered fair is subjective and can vary depending on individual perspectives and societal norms. However, many consider the principles and rights outlined in this declaration to be fundamental and just. It is important to note that this declaration was a significant step towards establishing principles of equality and individual rights in France and has influenced subsequent human rights documents worldwide.

In [12]:
%%time
Q = "What are the rights and duties of the man, the citizen and the society ?"
print(f"\n\n{Q}\n")
print(qabot.invoke(dict(query=Q))["result"])



What are the rights and duties of the man, the citizen and the society ?

According to the Declaration of the Rights of Man and of the Citizen of 1789, the rights and duties of man, citizen, and society are as follows:

Rights of Man:
1. Men are born and remain free and equal in rights. Social distinctions can only be based on common utility.
2. The purpose of political association is the preservation of the natural and imprescriptible rights of man, which are liberty, property, security, and resistance to oppression.
3. The principle of sovereignty resides essentially in the nation. No body or individual can exercise any authority that does not emanate expressly from it.
4. Liberty consists of being able to do anything that does not harm others. The exercise of natural rights of each man has no limits other than those that ensure the enjoyment of these same rights by other members of society. These limits can only be determined by law.
5. The law has the right to prohibit only actio

In [13]:
%%time
Q = "Is this law practical ?"
print(f"\n\n{Q}\n")
print(qabot.invoke(dict(query=Q))["result"])



Is this law practical ?

As an AI language model, I cannot provide personal opinions or subjective judgments on whether a law is practical or not. The texts provided are excerpts from the French Constitution and the Declaration of the Rights of Man and of the Citizen of 1789. These texts outline fundamental rights and principles that form the basis of the French legal system. The practicality of a law is often a matter of interpretation and can vary depending on the context and specific circumstances. It is ultimately up to legal experts, lawmakers, and the judiciary to determine the practicality and application of these laws in specific cases.
CPU times: user 91.4 ms, sys: 5.89 ms, total: 97.3 ms
Wall time: 2.78 s


## Clean up the Documents table

In [14]:
# Drop the KDB.AI "documents" table and its similarity-search index
# so that this notebook can be rerun from the start
session.table("documents").drop()

True