## **Notebook 01: Introduction to Vector Stores**
## **Introduction:**
In this notebook, we introduce vector stores: specialized data structures that store documents alongside their vector representations and retrieve them by vector similarity. Vectorization is a key technique in natural language processing (NLP) and machine learning because it converts text into numerical vectors that can be searched and compared efficiently. One common approach is TF-IDF (Term Frequency-Inverse Document Frequency), which weights each term in a document by how often it occurs there, discounted by how many documents in the corpus contain it. We will create a TfidfVectorStore and inspect the resource types of the store and its embedder.
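To make the weighting and comparison described above concrete, here is a minimal plain-Python sketch. It is not how TfidfVectorStore is implemented; the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `retrieve`), the whitespace tokenizer, and the smoothed idf formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (an illustrative choice): rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, docs, top_k=2):
    """Rank docs by cosine similarity between their TF-IDF vector and the query's."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus have no idf and are ignored.
    q_vec = {
        t: tf * idf[t]
        for t, tf in Counter(tokenize(query)).items()
        if t in idf
    }
    ranked = sorted(
        range(len(docs)),
        key=lambda i: cosine(q_vec, vectors[i]),
        reverse=True,
    )
    return [docs[i] for i in ranked[:top_k]]


docs = [
    "the cat sat on the mat",
    "dogs chase the ball",
    "vector stores index documents",
]
results = retrieve("cat on a mat", docs, top_k=1)
print(results)
```

Only the first document shares any terms with the query, so it is the one returned. A vector store wraps the same two steps, embedding documents and ranking them by similarity, behind a storage interface.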

In [4]:
! pip install joblib 
! pip install scikit-learn




**Import dependencies**

In [3]:
from swarmauri.documents.concrete.Document import Document
from swarmauri.vector_stores.concrete.TfidfVectorStore import TfidfVectorStore

**Create a TF-IDF Vector Store**

In [5]:
vs = TfidfVectorStore()

**Check the resource types of the vector store and its embedder**

In [6]:
print("Vector Store Resource:", vs.resource)

Vector Store Resource: VectorStore


In [7]:
print("Embedder Resource:", vs.embedder.resource)

Embedder Resource: Embedding


## **Conclusion:**
In this notebook, we covered the basics of vector stores, created a TfidfVectorStore, and checked the resource types of the store and its embedder. TF-IDF represents text in a way that highlights the terms that are distinctive for a document relative to the rest of the corpus. This makes it particularly useful for information retrieval tasks, where finding the documents most relevant to a user query is essential. In the subsequent notebooks, we will explore more advanced techniques in vector-based document retrieval.

In [8]:
import os
import platform
import sys
from datetime import datetime

# Display author information
author_name = "Dominion John " 
github_username = "DOMINION-JOHN1"  

print(f"Author: {author_name}")
print(f"GitHub Username: {github_username}")

# Last modified datetime (file's metadata)
notebook_file = "Notebook_01_Introduction_to_Vector_Stores.ipynb"  
try:
    last_modified_time = os.path.getmtime(notebook_file)
    last_modified_datetime = datetime.fromtimestamp(last_modified_time)
    print(f"Last Modified: {last_modified_datetime}")
except Exception as e:
    print(f"Could not retrieve last modified datetime: {e}")

# Display platform, Python version, and Swarmauri version
print(f"Platform: {platform.system()} {platform.release()}")
print(f"Python Version: {sys.version}")

# Checking Swarmauri version
try:
    import swarmauri
    print(f"Swarmauri Version: {swarmauri.__version__}")
except ImportError:
    print("Swarmauri is not installed.")

Author: Dominion John 
GitHub Username: DOMINION-JOHN1
Last Modified: 2024-10-17 10:50:48.031729
Platform: Windows 11
Python Version: 3.12.7 (tags/v3.12.7:0b05ead, Oct  1 2024, 03:06:41) [MSC v.1941 64 bit (AMD64)]
Swarmauri Version: 0.5.0
