# Notebook 02: Vector Store Integration

This notebook shows how to integrate the `TfidfVectorStore` with the RAG framework. We'll explain how vector stores work, how to add documents to one, and how to retrieve the documents most relevant to a query.

## What is a Vector Store?

A vector store is a data structure that holds documents as numerical vectors. In this notebook, we will be using the `TfidfVectorStore`, which applies the Term Frequency-Inverse Document Frequency (TF-IDF) technique to create document vectors. TF-IDF gives a term a high weight when it occurs often in a document but rarely across the rest of the collection. The vectors are used for similarity search: the query is turned into a vector as well, and the store returns the documents whose vectors are most similar to it.
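To make the weighting concrete, here is a minimal sketch of TF-IDF in plain Python. It assumes the textbook formulas (term count divided by document length, and `log(N / df)` for the inverse document frequency) and a simple regex tokenizer; `tokenize` and `tfidf_vectors` are illustrative names, not part of Swarmauri, and `TfidfVectorStore` may tokenize and weight terms differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse {term: weight} vector per text.

    Assumes tf = count / document length and idf = log(N / df).
    """
    tokenized = [tokenize(t) for t in texts]
    n_docs = len(tokenized)
    # Document frequency: how many texts contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Illustrative corpus, separate from the documents used in the notebook cells.
corpus = [
    "python is a programming language",
    "python is popular in data science",
    "data science uses statistics",
]
vectors = tfidf_vectors(corpus)
for text, vec in zip(corpus, vectors):
    top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(text, "->", [(term, round(weight, 3)) for term, weight in top])
```

In the first document, "programming" receives a larger weight than "python" because it appears in only one document, while "python" appears in two. Under this unsmoothed formula a term present in every document gets weight 0; libraries such as scikit-learn use a smoothed variant by default, so the exact numbers will differ.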

## Setting Up the Vector Store

We will begin by initializing the vector store and adding some documents.

In [55]:
from swarmauri.vector_stores.concrete.TfidfVectorStore import TfidfVectorStore
from swarmauri.documents.concrete.Document import Document

# Initialize the vector store
vector_store = TfidfVectorStore()

# Sample documents
documents = [
    Document(content="Python is a versatile programming language."),
    Document(content="Data science uses machine learning and statistics."),
    Document(content="Python is popular in data science."),
    Document(content="AI advancements are driven by machine learning."),
]

# Add documents to the vector store
vector_store.add_documents(documents)

print("Documents successfully added to the vector store!")


Documents successfully added to the vector store!


## Retrieving Relevant Documents Based on a Query

Once the documents are stored, we can retrieve the most relevant ones for a given query by calculating the similarity between the query and each stored document. The `top_k` argument limits how many documents are returned.

In [56]:
# Query with specific text
query = "Python in data science"
results = vector_store.retrieve(query=query, top_k=2)

# Display the results
print("Query results:")
for idx, result in enumerate(results, 1):
    print(f"Result {idx}: {result.content}")


Query results:
Result 1: Python is popular in data science.
Result 2: Python is a versatile programming language.
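
The ranking step can be sketched as follows. This is an illustration that assumes cosine similarity, a common choice for TF-IDF vectors; the notebook does not state which measure `TfidfVectorStore` uses internally. The functions `cosine_similarity` and `top_k` and the weights below are hypothetical and hand-picked, not taken from the store.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return (index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical weights standing in for real TF-IDF vectors.
doc_vecs = [
    {"python": 0.8, "language": 0.6},
    {"data": 0.6, "science": 0.6, "statistics": 0.5},
    {"python": 0.5, "data": 0.6, "science": 0.6},
]
query_vec = {"python": 0.6, "data": 0.6, "science": 0.5}

for index, score in top_k(query_vec, doc_vecs, k=2):
    print(f"doc {index}: {score:.3f}")
```

Cosine similarity compares the direction of two vectors rather than their length, so a long document is not favored just because it contains more words.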


## Additional Features

We can also add more documents after the initial load and list every document currently held in the vector store.

In [57]:
# Add more test documents
test_documents = [
    Document(content="Artificial Intelligence is transforming industries."),
    Document(content="Blockchain technology ensures data security."),
    Document(content="Cloud computing provides scalable resources."),
    Document(content="The Internet of Things connects everyday devices.")
]


# Add these test documents to the vector store
vector_store.add_documents(test_documents)


# Retrieve and print all documents
all_docs = vector_store.get_all_documents()

print("All documents in the vector store:")
for doc in all_docs:
    print(doc.content)


All documents in the vector store:
Python is a versatile programming language.
Data science uses machine learning and statistics.
Python is popular in data science.
AI advancements are driven by machine learning.
Artificial Intelligence is transforming industries.
Blockchain technology ensures data security.
Cloud computing provides scalable resources.
The Internet of Things connects everyday devices.
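
Because the inverse document frequency depends on the whole collection, adding documents can change the weight of terms in documents that were already stored. The sketch below shows this effect for a single term, assuming the textbook formula `log(N / df)` and whitespace tokenization; `idf` is a hypothetical helper, and the notebook does not show when or how `TfidfVectorStore` recomputes its weights.

```python
import math


def idf(term, texts):
    """Textbook inverse document frequency: log(N / df), or 0.0 if the term is absent."""
    df = sum(1 for text in texts if term in text.lower().split())
    return math.log(len(texts) / df) if df else 0.0


corpus = [
    "python is a versatile language",
    "python is popular in data science",
]
print(f"idf('python') before: {idf('python', corpus):.3f}")

# Adding documents that do not mention the term makes it more distinctive.
corpus += [
    "blockchain technology ensures data security",
    "cloud computing provides scalable resources",
]
print(f"idf('python') after:  {idf('python', corpus):.3f}")
```

With only the first two texts, "python" occurs in every document and carries no discriminating weight; once unrelated texts are added, it becomes a useful signal.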


## Conclusion

In this notebook, we've demonstrated how to use a vector store (`TfidfVectorStore`) to store and retrieve documents based on similarity to a query. This process is a crucial part of building a Retrieval-Augmented Generation (RAG) system, where retrieving contextually relevant documents enhances the output of language models.

## Notebook Metadata

In [58]:
import os
import platform
import sys
from datetime import datetime

author_name = "Huzaifa Irshad"
github_username = "irshadhuzaifa"

print(f"Author: {author_name}")
print(f"GitHub Username: {github_username}")

notebook_file = "Notebook_02_Vector_Store_Integration.ipynb"
try:
    last_modified_time = os.path.getmtime(notebook_file)
    last_modified_datetime = datetime.fromtimestamp(last_modified_time)
    print(f"Last Modified: {last_modified_datetime}")
except Exception as e:
    print(f"Could not retrieve last modified datetime: {e}")

print(f"Platform: {platform.system()} {platform.release()}")
print(f"Python Version: {sys.version}")

try:
    import swarmauri
    print(f"Swarmauri Version: {swarmauri.__version__}")
except ImportError:
    print("Swarmauri is not installed.")

Author: Huzaifa Irshad
GitHub Username: irshadhuzaifa
Last Modified: 2024-10-22 16:13:01.232893
Platform: Windows 11
Python Version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]
Swarmauri Version: 0.5.0
