## Introduction to Vector Databases (Vector DBs)
### What is a Vector Database?
A vector database is a type of database specifically designed to store, index, and retrieve high-dimensional vector embeddings efficiently. Unlike traditional relational databases that store structured data in tables, vector DBs are optimized for similarity search and nearest neighbor retrieval based on mathematical distances between vectors.

#### Key Concept: Vector Embeddings
A vector embedding is a numerical representation of an object (text, image, audio, video, etc.) in a multi-dimensional space. These embeddings are generated using machine learning models such as:

- Word2Vec, GloVe, and FastText (for text)
- BERT, GPT, or CLIP (for NLP and multimodal tasks)
- ResNet, EfficientNet, or ViT (for images)

Each embedding captures semantic meaning, enabling similarity-based retrieval rather than exact matching.
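To make "similarity-based retrieval" concrete, here is a minimal sketch of cosine similarity, the metric used for the Pinecone index later in this notebook. The three-dimensional vectors are made-up toy values chosen for illustration only; real embedding models produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (not produced by any real model)
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.2, 0.9],
}

sim_cat_kitten = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
sim_cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])

print(f"cat vs kitten: {sim_cat_kitten:.3f}")
print(f"cat vs car:    {sim_cat_car:.3f}")
```

Vectors pointing in a similar direction score close to 1, while unrelated ones score close to 0, which is what lets a vector DB rank results by meaning rather than by exact match.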

### Characteristics
Vector databases provide several advantages over traditional databases for AI-powered applications:

- **Fast Similarity Search:** Supports efficient nearest neighbor search algorithms like Approximate Nearest Neighbors (ANN) for quick lookups. Useful in applications like recommendation systems, image/video search, and semantic search.
- **Scalability:** Handles millions or even billions of vectors with optimized storage and indexing.
- **High-Dimensional Data Support:** Unlike relational databases that store structured rows and columns, vector DBs work with hundreds or thousands of dimensions.
- **Flexibility:** Supports unstructured data like text, images, audio, video, and multi-modal content.
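As a rough illustration of what "approximate" means in ANN, the sketch below uses random-hyperplane hashing, one classic ANN technique. It is an assumption-laden toy and not a description of how Pinecone indexes vectors internally; the function names, the number of planes, and the sample vectors are all made up for this example.

```python
import random

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals; the seed only makes the demo repeatable."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def signature(vector, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        for plane in planes
    )

def build_buckets(vectors, planes):
    """Group vector IDs by signature so a query only scans one bucket."""
    buckets = {}
    for vec_id, vector in vectors.items():
        buckets.setdefault(signature(vector, planes), []).append(vec_id)
    return buckets

def candidates(query, planes, buckets):
    """Return IDs sharing the query's bucket (true neighbours can be missed)."""
    return buckets.get(signature(query, planes), [])

planes = make_hyperplanes(num_planes=4, dim=3)
vectors = {"a": [0.9, 0.1, 0.0], "b": [0.0, 0.2, 0.9], "c": [-0.9, -0.1, 0.0]}
buckets = build_buckets(vectors, planes)

# Same direction as "a", twice the length
query = [2 * x for x in vectors["a"]]
print(candidates(query, planes, buckets))
```

The trade-off is that only one bucket is scanned instead of every stored vector, so lookups are fast, but a neighbour that happens to fall on the other side of a hyperplane is not returned.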

Click [HERE](https://www.pinecone.io/learn/vector-database/) if you wish to learn more about vector databases on Pinecone's website.

Now let's get our hands dirty with a little bit of Pinecone by building an LLM-based application!

In [32]:
# install packages
!pip install langchain
!pip install pypdf
!pip install tiktoken
!pip install openai
!pip install langchain-openai
!pip install pinecone
!pip install langchain_pinecone



In [33]:
# pinecone KEY
KEY = "YOUR PINECONE KEY"

In [34]:
# import necessary libraries
import os
import time  # needed for time.sleep() while waiting for index creation
import pinecone
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA
from langchain_pinecone import PineconeVectorStore
# Note: langchain.vectorstores.Pinecone is not imported here, because the
# Pinecone client class below would shadow it under the same name
from pinecone import Pinecone, ServerlessSpec, PodSpec

#### **NOTE!** You will need one or more PDF files in a directory (named `pdfs` in this example)!
If it doesn't already exist, you can create it with the command `!mkdir pdfs`; it will then appear in the same directory as your Jupyter Notebook file.
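If you prefer to stay in Python rather than use a shell command, the directory can also be created with `pathlib`. The helper name `ensure_pdf_dir` is made up for this sketch, and the demo runs in a temporary location so it does not touch your working directory; in the notebook you would pass `"pdfs"` instead.

```python
from pathlib import Path
import tempfile

def ensure_pdf_dir(path):
    """Create the directory if needed and return the PDF files found in it."""
    pdf_dir = Path(path)
    pdf_dir.mkdir(parents=True, exist_ok=True)
    return sorted(pdf_dir.glob("*.pdf"))

# Demonstrated in a temporary directory; use ensure_pdf_dir("pdfs") in the notebook
with tempfile.TemporaryDirectory() as tmp:
    found = ensure_pdf_dir(Path(tmp) / "pdfs")
    print(f"{len(found)} PDF file(s) found")
```

An empty result is a useful early warning: the loader in the next cell would otherwise return no documents.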

In [35]:
# load the pdfs
pdf_loader = PyPDFDirectoryLoader("pdfs")
data = pdf_loader.load()

In [36]:
# print and display the data
# data

You will need `RecursiveCharacterTextSplitter()`, which splits text by recursively trying a list of separator characters. It is a utility class typically used to split large texts into smaller chunks in natural language processing (NLP) tasks. This is particularly helpful with long documents, where you need to process the text in manageable parts while losing as little context as possible.

In [37]:
txt_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)

In [38]:
txt_chunks = txt_splitter.split_documents(data)
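To see what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sketch that cuts text into fixed-size character windows. It is not the LangChain implementation: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line breaks before falling back to smaller units. The function name `split_with_overlap` is made up for this example.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=20):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

sample_text = "abcdefghijklmnopqrstuvwxyz"
sample_chunks = split_with_overlap(sample_text, chunk_size=10, chunk_overlap=3)
for chunk in sample_chunks:
    print(chunk)
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a boundary still appears intact in at least part of a neighbouring chunk.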

In [39]:
type(txt_chunks)

list

In [40]:
# print and display the data chunks
# txt_chunks

In [41]:
txt_chunks[0]

Document(metadata={'source': 'pdfs\\attention-is-all-you-need.pdf', 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu')

In [42]:
print(txt_chunks[0].page_content)

Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Noam Shazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Jakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.com
Aidan N. Gomez∗ †
University of Toronto
aidan@cs.toronto.edu


In [43]:
len(txt_chunks)

90

In [44]:
# set OpenAI API KEY
import os
os.environ["OPENAI_API_KEY"]="YOUR OPENAI API KEY"

In [45]:
embedding=OpenAIEmbeddings()

In [46]:
sample_embedding = embedding.embed_query("What is your name?")
len(sample_embedding)

1536

Using **Pinecone** with **LangChain** in this way means explicitly initializing the Pinecone client, creating an index manually, generating embeddings separately, and storing them through the `Pinecone` class. This replaces the older `Pinecone.from_texts()` method, providing more flexibility and better compatibility with the latest versions of Pinecone and LangChain. 🚀

In [47]:
PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY", KEY)
PINECONE_API_ENV = os.environ.get("PINECONE_API_ENV", "gcp-starter")

In [48]:
# Initialize Pinecone client
pc = pinecone.Pinecone(api_key=PINECONE_API_KEY)

cloud = os.getenv("PINECONE_CLOUD", "aws")
region = os.getenv("PINECONE_REGION", "us-east-1")

index_name = "pinecone-index"

spec = ServerlessSpec(cloud=cloud, region=region)

existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]

# Check if the index exists, otherwise create it
if index_name not in existing_indexes:
    pc.create_index(name=index_name, dimension=1536, metric="cosine", spec=spec)  # Use dimension = 1536 for OpenAI embeddings
    # Wait for index to be initialized
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)
    print(f"Index {index_name} has been successfully created.")
else:
    print(f"Index {index_name} already exists.")

# Connect to the index
index = pc.Index(index_name)

Index pinecone-index already exists.


In [49]:
# View index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 90}},
 'total_vector_count': 90}

In [50]:
# docsearch = Pinecone.from_texts([t.page_content for t in txt_chunks], embedding, index_name=index_name)

#Extract texts
texts = [t.page_content for t in txt_chunks]

# Generate embeddings
txt_embeddings = embedding.embed_documents(texts)

In [51]:
len(txt_embeddings)

90

In [52]:
# txt_embeddings[0]

#### Time to prepare data for Pinecone in the desired format (each entry: `(ID, vector, metadata)`).

In [53]:
# Prepare data for Pinecone (each chunk needs an ID)
vector_data = [
    (str(i), txt_embeddings[i], {"text": texts[i]})  # Each entry: (ID, vector, metadata)
    for i in range(len(txt_chunks))
]

In [54]:
# Upload to Pinecone
index.upsert(vectors=vector_data)

{'upserted_count': 90}
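With only 90 chunks, a single `upsert` call is enough. For larger collections it is common to send the vectors in batches instead. The sketch below shows one way to do that; the helper name `batched`, the batch size of 100, and the dummy vectors are illustrative assumptions, so check Pinecone's current request limits before choosing a real value.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Dummy data in the same (ID, vector, metadata) shape used above
demo_vector_data = [
    (str(i), [0.0, 0.0, 0.0, 0.0], {"text": f"chunk {i}"})
    for i in range(250)
]

batch_sizes = [len(batch) for batch in batched(demo_vector_data, batch_size=100)]
print(batch_sizes)

# In the notebook, each batch would be sent with:
#     index.upsert(vectors=batch)
```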

In [55]:
# vector_data[0]

As soon as your embeddings are uploaded to your Pinecone index, you can verify their presence as shown below:

In [56]:
# Verify inserted data (Optional)
print(index.describe_index_stats())  # Shows number of vectors stored

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 90}},
 'total_vector_count': 90}


Note that `total_vector_count` is equal to 90.

On a freshly created index, the first call to `index.describe_index_stats()` would report a `total_vector_count` of `0` and an empty `namespaces` dictionary (`{}`), and the count would rise to `90` after the upsert. In the run shown here, the index already existed and still held the 90 vectors from an earlier run, so both calls report the same numbers: upserting with the same IDs (`"0"` to `"89"`) overwrites the existing entries rather than adding new ones. In both cases, the vectors are stored under the default, unnamed namespace (`''`).

The other index properties, such as `dimension` (1536), `index_fullness` (0.0), and the cosine metric chosen at creation, are not affected by the upsert. The structure and settings of the index stay the same; only its content is written.
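To make the insert-or-overwrite behaviour of an upsert concrete, here is a minimal in-memory sketch. `ToyIndex` is a hypothetical stand-in written for this illustration and is not part of the Pinecone client; it only mimics the fact that upserting an existing ID replaces that entry instead of adding a new one.

```python
class ToyIndex:
    """In-memory stand-in that mimics upsert semantics (not the Pinecone API)."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, vectors):
        # Each entry is (ID, vector, metadata); an existing ID is overwritten
        for vec_id, values, metadata in vectors:
            self._vectors[vec_id] = (values, metadata)
        return {"upserted_count": len(vectors)}

    def total_vector_count(self):
        return len(self._vectors)

toy_index = ToyIndex()
toy_data = [(str(i), [0.1 * i, 0.2], {"text": f"chunk {i}"}) for i in range(3)]

toy_index.upsert(toy_data)
count_after_first = toy_index.total_vector_count()

toy_index.upsert(toy_data)  # same IDs again, as when re-running the notebook
count_after_second = toy_index.total_vector_count()

print(count_after_first, count_after_second)
```

This is why re-running the upload cell with the same IDs leaves `total_vector_count` unchanged.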

In [57]:
query = "What is sometimes called intra-attention?"

In [58]:
# Generate embedding for the query
query_embedding = embedding.embed_query(query)

In [59]:
# Perform the search in Pinecone
search_results = index.query(vector=query_embedding, top_k=3, include_metadata=True)

#### Below you can see the top 3 matching results each of which has its own ID:

In [60]:
search_results

{'matches': [{'id': '12',
              'metadata': {'text': 'reduced to a constant number of '
                                   'operations, albeit at the cost of reduced '
                                   'effective resolution due\n'
                                   'to averaging attention-weighted positions, '
                                   'an effect we counteract with Multi-Head '
                                   'Attention as\n'
                                   'described in section 3.2.\n'
                                   'Self-attention, sometimes called '
                                   'intra-attention is an attention mechanism '
                                   'relating different positions\n'
                                   'of a single sequence in order to compute a '
                                   'representation of the sequence. '
                                   'Self-attention has been'},
              'score': 0.86995846,
              'v
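Conceptually, a `top_k` query scores the stored vectors against the query vector and returns the best matches. The brute-force sketch below mirrors the shape of the result for small toy data; it is only an illustration under that assumption, since a real vector DB relies on an index rather than scanning every vector. The names `toy_query` and `toy_store` are made up for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def toy_query(store, vector, top_k=3):
    """Score every stored (ID, vector, metadata) entry and keep the top_k best."""
    scored = [
        {"id": vec_id, "score": cosine(vector, values), "metadata": metadata}
        for vec_id, values, metadata in store
    ]
    scored.sort(key=lambda match: match["score"], reverse=True)
    return {"matches": scored[:top_k]}

# Hypothetical two-dimensional vectors standing in for real embeddings
toy_store = [
    ("0", [1.0, 0.0], {"text": "first chunk"}),
    ("1", [0.0, 1.0], {"text": "second chunk"}),
    ("2", [0.7, 0.7], {"text": "third chunk"}),
]

toy_results = toy_query(toy_store, [1.0, 0.1], top_k=2)
for match in toy_results["matches"]:
    print(match["id"], round(match["score"], 3), match["metadata"]["text"])
```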

#### **BUT** is it really what we were looking for...? Naah!
The result would be more useful if an LLM were used to read the retrieved chunks and formulate the corresponding answer.

In [70]:
os.environ["PINECONE_API_KEY"]=KEY

# Let's create a new index
index_name2 = "retreival-qa"

# Check if the index exists, otherwise create it
if index_name2 not in existing_indexes:
    pc.create_index(name=index_name2, dimension=1536, metric="cosine", spec=spec)  # Use dimension = 1536 for OpenAI embeddings
    # Wait for index to be initialized
    while not pc.describe_index(index_name2).status["ready"]:
        time.sleep(1)
    print(f"Index {index_name2} has been successfully created.")
else:
    print(f"Index {index_name2} already exists.")


Index retreival-qa already exists.


`vectorstore_from_chunks` is, in our case, a **Pinecone** vector database storing chunked document embeddings.

In [72]:
# Create a vector store from the text chunks already generated;
# through it you can add more records to the Pinecone index

vectorstore_from_chunks = PineconeVectorStore.from_documents(
    txt_chunks,
    index_name=index_name2,
    embedding=embedding
)

In [73]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 90}},
 'total_vector_count': 90}

In [74]:
# perform a similarity search
query2 = "What behaviour do many attention heads exhibit?"
response2 = vectorstore_from_chunks.similarity_search(query2)
response2

[Document(id='e58f09f1-91e7-4c6d-83cb-3ead8cea9067', metadata={'page': 14.0, 'page_label': '15', 'source': 'pdfs\\attention-is-all-you-need.pdf'}, page_content='just\n-\nthis\nis\nwhat\nwe\nare\nmissing\n,\nin\nmy\nopinion\n.\n<EOS>\n<pad>\nFigure 5: Many of the attention heads exhibit behaviour that seems related to the structure of the\nsentence. We give two such examples above, from two different heads from the encoder self-attention\nat layer 5 of 6. The heads clearly learned to perform different tasks.\n15'),
 Document(id='ef6ec2c5-9ba6-4995-bb0d-2d04e631cd49', metadata={'page': 14.0, 'page_label': '15', 'source': 'pdfs\\attention-is-all-you-need.pdf'}, page_content='just\n-\nthis\nis\nwhat\nwe\nare\nmissing\n,\nin\nmy\nopinion\n.\n<EOS>\n<pad>\nFigure 5: Many of the attention heads exhibit behaviour that seems related to the structure of the\nsentence. We give two such examples above, from two different heads from the encoder self-attention\nat layer 5 of 6. The heads clearly lea

In [75]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo",
                temperature=0.0)

**NOTE!** OpenAI API Key is not passed to ChatOpenAI() since it has already been declared and added through `os.environ[...] = "<YOUR API KEY>"`.

The code below creates a **Retrieval-Augmented Generation (RAG)** pipeline using `RetrievalQA` from LangChain:

In [76]:
qa = RetrievalQA.from_chain_type(llm=llm, 
                                 chain_type="stuff",
                                 retriever=vectorstore_from_chunks.as_retriever())

This initializes a RetrievalQA pipeline that integrates a retriever with a language model (LLM). The RetrievalQA model first retrieves relevant documents and then generates answers based on them.

When `qa.invoke(query2)` is called, the system:
- Searches for relevant documents in the vector store.
- Feeds them into the LLM using the "stuff" method.
- Generates an answer based on both the retrieved data and the LLM’s knowledge.
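The "stuff" step can be pictured as plain string assembly: all retrieved chunks are placed into a single prompt together with the question. The sketch below is an approximation for illustration only; the function name `build_stuff_prompt` and the template wording are assumptions, not the exact prompt LangChain sends.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all retrieved chunks into one prompt, followed by the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )

# Hypothetical retrieved chunks
retrieved_chunks = [
    "Many of the attention heads exhibit behaviour related to sentence structure.",
    "The heads clearly learned to perform different tasks.",
]

stuff_prompt = build_stuff_prompt(
    "What behaviour do many attention heads exhibit?", retrieved_chunks
)
print(stuff_prompt)
```

Because everything goes into one prompt, this approach is simple but only works while the retrieved chunks fit within the model's context window.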

In [78]:
qa.invoke(query2)

{'query': 'What behaviour do many attention heads exhibit?',
 'result': 'Many of the attention heads exhibit behavior related to the structure of the sentence and seem to have learned to perform different tasks.'}

#### Basic Application
Let's create a very simple application through which the process can be understood much more clearly.

In [82]:
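import sys

# Simple question-answering loop; type "exit" to stop
while True:
    user_input = input("Enter your prompt: ")
    if user_input == "exit":
        print("Exiting...")
        sys.exit()
    if user_input == "":
        continue
    # Use invoke(), as in the earlier cell, instead of calling the chain directly
    result = qa.invoke({"query": user_input})
    print(f"Answer: {result['result']}")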

Enter your prompt:  What is attention?


Answer: Attention is an mechanism in natural language processing that allows a model to focus on specific parts of the input sequence when making predictions or generating output. It helps the model weigh the importance of different words or tokens in the input sequence.


Enter your prompt:  Who has authored Attention?


Answer: Attention has been authored by Noam, Niki, and Llion.


Enter your prompt:  exit


Exiting...


SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
