## **Notebook 02: Document Loading and Retrieval**
## **Introduction:**
This notebook shows how to load documents into a vector store and retrieve them with a query. A vector store such as `TfidfVectorStore` holds a collection of documents and returns the ones most relevant to a query. We add several documents to the store and then retrieve the top matches for a specific query. This load-and-retrieve pattern underlies many applications, such as search engines, recommendation systems, and question-answering systems.
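To make the idea of query-based retrieval concrete, here is a minimal pure-Python sketch of TF-IDF scoring with cosine similarity. It is an illustration only and not Swarmauri's implementation: the helper names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `rank`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for brevity, so the scores and ordering may differ from what `TfidfVectorStore` returns.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, and strip basic punctuation (a simplification).
    tokens = (tok.strip(".,!?").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def build_idf(corpus_tokens):
    # Smoothed inverse document frequency: terms found in fewer documents weigh more.
    n_docs = len(corpus_tokens)
    doc_freq = Counter(term for tokens in corpus_tokens for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Sparse vector of term count times IDF; terms absent from the corpus are ignored.
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, texts, top_k=3):
    # Score every text against the query and return the top_k (score, text) pairs.
    corpus_tokens = [tokenize(text) for text in texts]
    idf = build_idf(corpus_tokens)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [
        (cosine(query_vec, tfidf_vector(tokens, idf)), text)
        for tokens, text in zip(corpus_tokens, texts)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


texts = [
    "Python is a versatile programming language.",
    "Data science uses machine learning and statistics.",
    "Python is popular in data science.",
    "AI advancements are driven by machine learning.",
]

for score, text in rank("Python in data science", texts):
    print(f"{score:.3f}  {text}")
```

In this sketch, a document that shares more (and rarer) terms with the query receives a higher cosine score, which is the general intuition behind TF-IDF retrieval.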

**Import dependencies**

In [13]:
from swarmauri.documents.concrete.Document import Document
from swarmauri.vector_stores.concrete.TfidfVectorStore import TfidfVectorStore

**Create a TF-IDF Vector Store**

In [2]:

vs = TfidfVectorStore()

**Loading Documents into the Vector Store**

In [3]:

documents = [
    Document(content="Python is a versatile programming language."),
    Document(content="Data science uses machine learning and statistics."),
    Document(content="Python is popular in data science."),
    Document(content="AI advancements are driven by machine learning."),
]

In [5]:
vs.add_documents(documents)

**Retrieve Documents Using a Query**

In [6]:

query = "Python in data science"
top_k = 3  # Retrieve the top 3 most relevant documents
results = vs.retrieve(query=query, top_k=top_k)

In [9]:
# Display the results
print(f"Top {top_k} Results for query '{query}':")
for idx, result in enumerate(results, 1):
    print(f"Result {idx}: {result.content}")


Top 3 Results for query 'Python in data science':
Result 1: Python is a versatile programming language.
Result 2: Python is popular in data science.
Result 3: Python is popular in data science.
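
The output above lists "Python is popular in data science." twice. One possible explanation, which the notebook does not confirm, is that the cell calling `add_documents` was executed more than once, leaving duplicate entries in the store. As a hedged sketch, the results could be de-duplicated by content before display. The helper `unique_by_content` is hypothetical and not part of Swarmauri, and `SimpleNamespace` objects stand in for retrieved documents so the example runs on its own.

```python
from types import SimpleNamespace


def unique_by_content(results):
    # Keep the first occurrence of each distinct content string, preserving order.
    seen = set()
    unique = []
    for result in results:
        if result.content not in seen:
            seen.add(result.content)
            unique.append(result)
    return unique


# Stand-in objects exposing a `content` attribute, mirroring the printed results.
sample_results = [
    SimpleNamespace(content="Python is a versatile programming language."),
    SimpleNamespace(content="Python is popular in data science."),
    SimpleNamespace(content="Python is popular in data science."),
]

for idx, result in enumerate(unique_by_content(sample_results), 1):
    print(f"Result {idx}: {result.content}")
```

Any object with a `content` attribute works here, so the same helper could be applied to the `results` list returned by `vs.retrieve`.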


In [10]:
# Explore document metadata
for idx, document in enumerate(documents, 1):
    print(f"Document {idx} Metadata: {document.metadata}")

Document 1 Metadata: {}
Document 2 Metadata: {}
Document 3 Metadata: {}
Document 4 Metadata: {}
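
Each metadata dictionary is empty because no metadata was supplied when the documents were created. The sketch below illustrates how metadata could be used to narrow a set of documents once it is populated. It uses a small stand-in dataclass, `Doc`, rather than Swarmauri's `Document`, so that it runs without the library; `Doc`, `filter_by_metadata`, and the `topic` key are hypothetical names introduced for illustration. Whether `Document` accepts a `metadata` argument at construction is an assumption this notebook does not demonstrate.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in exposing the two attributes used in this notebook.
    content: str
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(docs, **required):
    # Keep documents whose metadata matches every required key/value pair.
    return [
        doc for doc in docs
        if all(doc.metadata.get(key) == value for key, value in required.items())
    ]


docs = [
    Doc("Python is a versatile programming language.", {"topic": "programming"}),
    Doc("Data science uses machine learning and statistics.", {"topic": "data-science"}),
    Doc("Python is popular in data science.", {"topic": "data-science"}),
    Doc("AI advancements are driven by machine learning."),  # metadata left empty
]

for doc in filter_by_metadata(docs, topic="data-science"):
    print(doc.content)
```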


In [12]:
import os
import platform
import sys
from datetime import datetime

# Display author information
author_name = "Dominion John"
github_username = "DOMINION-JOHN1"

print(f"Author: {author_name}")
print(f"GitHub Username: {github_username}")

# Last modified datetime (file's metadata)
notebook_file = "Notebook_02_Document_Loading_and_Retrieval.ipynb"  
try:
    last_modified_time = os.path.getmtime(notebook_file)
    last_modified_datetime = datetime.fromtimestamp(last_modified_time)
    print(f"Last Modified: {last_modified_datetime}")
except OSError as e:  # e.g. the notebook file is not in the working directory
    print(f"Could not retrieve last modified datetime: {e}")

# Display platform, Python version, and Swarmauri version
print(f"Platform: {platform.system()} {platform.release()}")
print(f"Python Version: {sys.version}")

# Checking Swarmauri version
try:
    import swarmauri
    print(f"Swarmauri Version: {swarmauri.__version__}")
except ImportError:
    print("Swarmauri is not installed.")

Author: Dominion John 
GitHub Username: DOMINION-JOHN1
Last Modified: 2024-10-17 13:52:54.916858
Platform: Windows 11
Python Version: 3.12.7 (tags/v3.12.7:0b05ead, Oct  1 2024, 03:06:41) [MSC v.1941 64 bit (AMD64)]
Swarmauri Version: 0.5.0
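
The version check above relies on the package exposing a `__version__` attribute. As an alternative sketch, the standard library's `importlib.metadata` reads the version from the installed distribution's metadata instead. The helper name `installed_version` is hypothetical, and the distribution name `"swarmauri"` is an assumption based on the import name used in this notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    # Return the installed version string, or None if the distribution is absent.
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "swarmauri" is assumed to be the distribution name; adjust if it differs.
swarmauri_version = installed_version("swarmauri")
print(f"Swarmauri Version: {swarmauri_version or 'not installed'}")
```

This approach does not import the package itself, so it also works when importing the package would be slow or would fail for unrelated reasons.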
