## 1. Importing and Initializing
   
First, import the VectorDBManager class from ragit and initialize it:

In [20]:
from ragit import VectorDBManager

# Initialize the vector database manager with a custom persistence directory and model
db_manager = VectorDBManager(
    persist_directory="./vector_db", 
    provider="sentence_transformer", 
    model_name="all-mpnet-base-v2" 
)

2025-02-15 14:02:50,864 - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2025-02-15 14:02:51,353 - INFO - Use pytorch device_name: cpu
2025-02-15 14:02:51,354 - INFO - Load pretrained SentenceTransformer: all-mpnet-base-v2


## 2. Creating a Database
   
Create a new collection (named `my_collection`) from your CSV file. In this example, `distance_metric` is set to `"cosine"` (available options: `l2`, `cosine`, `ip`, `l1`):

In [21]:
db_manager.create_database(
    csv_path="data.csv", 
    collection_name="my_collection",
    distance_metric="cosine" 
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-02-15 14:03:09,281 - INFO - Successfully created collection 'my_collection'


True
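
To make the `distance_metric` options more concrete, here is a minimal sketch of the textbook definitions behind `l2`, `l1`, `ip`, and `cosine`, written in plain Python. The function names are hypothetical and this is not ragit's code; the underlying vector store may use variants (for example a squared L2 distance, or an inner-product distance expressed as one minus the dot product).

```python
import math

def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance (textbook form, not squared)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l1_distance(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine of the angle between a and b."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)

# Two orthogonal unit vectors as a toy example.
a = [1.0, 0.0]
b = [0.0, 1.0]
for name, fn in [("l2", l2_distance), ("l1", l1_distance),
                 ("ip", inner_product), ("cosine", cosine_distance)]:
    print(f"{name}: {fn(a, b)}")
```

Cosine distance ignores vector length and compares direction only, which is a common choice for sentence embeddings.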

## Reloading Your Database

Once the database has been created and populated, reload it in a later session by pointing `persist_directory` at the same folder:

In [23]:
from ragit import VectorDBManager

db_manager = VectorDBManager(
    persist_directory="./vector_db",
    provider="sentence_transformer",
    model_name="all-mpnet-base-v2"
)

2025-02-15 14:03:20,195 - INFO - Use pytorch device_name: cpu
2025-02-15 14:03:20,198 - INFO - Load pretrained SentenceTransformer: all-mpnet-base-v2


## 3. Adding a Single Entry
   
Add an individual entry to the collection:

In [24]:
db_manager.add_single_row(
    id_="101",
    text="This is a new test entry for the database.",
    collection_name="my_collection"
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-02-15 14:03:28,383 - INFO - Successfully added entry with ID 101


True

## 4. Adding Multiple Entries from CSV
   
You can also add multiple entries from a CSV file. This function skips any entries that already exist in the collection:

In [25]:
stats = db_manager.add_values_from_csv(
    csv_path="data.csv",
    collection_name="my_collection"
)
print(f"Added {stats['new_entries_added']} new entries")

2025-02-15 14:03:37,190 - INFO - Added 0 new entries to 'my_collection'


Added 0 new entries
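
The count of 0 above is expected, since `data.csv` was already loaded when the collection was created. The skip-existing behavior can be sketched in plain Python as follows. This is an illustration only, not ragit's implementation: the `id` and `text` column names, the `add_new_rows` helper, the dictionary standing in for the collection, and the `skipped` key are all assumptions made for the example.

```python
import csv
import io

def add_new_rows(csv_text, existing):
    """Add rows whose id is not yet in `existing`; return simple stats."""
    added = 0
    skipped = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["id"] in existing:
            skipped += 1          # already stored, leave it untouched
            continue
        existing[row["id"]] = row["text"]
        added += 1
    return {"new_entries_added": added, "skipped": skipped}

# Hypothetical CSV content and a store that already holds entry "1".
sample_csv = "id,text\n1,first entry\n2,second entry\n"
store = {"1": "first entry"}

stats = add_new_rows(sample_csv, store)
print(f"Added {stats['new_entries_added']} new entries")
```

Running the same import a second time adds nothing, which mirrors the output shown above.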


## 5. Retrieving Collection Information

Fetch and display information about your collection:

In [26]:
info = db_manager.get_collection_info("my_collection")
print(f"Collection size: {info['count']} entries")

Collection size: 11 entries


## 6. Performing a Similarity Search

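Find texts that are similar to your query. The first call below queries with the text "ai" and returns the top 2 results. The second call repeats the query with `search_string="Artificial intelligence"` as a filter; none of the entries in this collection match it, so the result list is empty: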

In [27]:
results = db_manager.find_nearby_texts(
    text="ai",
    collection_name="my_collection",
    k=2,
)

print("Results:")
for item in results:
    print(f"\nID: {item['id']}")
    print(f"Text: {item['text']}")
    print(f"Similarity: {item['similarity']}%")
    print(f"Distance ({item['metric']}): {item['raw_distance']}")

Metadata: {'description': 'Collection created from data.csv', 'hnsw:space': 'cosine'}


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Results:

ID: 2
Text: The mind contemplates existence.
Every thought is a step toward wisdom.
Similarity: 26.9621%
Distance (cosine): 0.7303788646902971

ID: 6
Text: The journey of self-discovery is endless.
Each reflection deepens our knowledge.
Similarity: 22.707%
Distance (cosine): 0.7729304372627932
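
The similarity and distance values printed above are consistent with a simple relationship: similarity is one minus the cosine distance, expressed as a percentage and rounded to four decimals. This is inferred from the output rather than from the library's source, so treat the helper below (a hypothetical `similarity_percent`) as a sketch of that assumption.

```python
def similarity_percent(cosine_distance, digits=4):
    """Convert a cosine distance into a rounded similarity percentage."""
    return round((1.0 - cosine_distance) * 100.0, digits)

# Distances copied from the two results printed above.
print(similarity_percent(0.7303788646902971))  # 26.9621
print(similarity_percent(0.7729304372627932))  # 22.707
```

Both values are low, which suggests that nothing in this small collection is closely related to the query "ai".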


In [28]:
results = db_manager.find_nearby_texts(
    text="ai",
    collection_name="my_collection",
    k=2,
    search_string="Artificial intelligence"
)

print("Results:")
for item in results:
    print(f"\nID: {item['id']}")
    print(f"Text: {item['text']}")
    print(f"Similarity: {item['similarity']}%")
    print(f"Distance ({item['metric']}): {item['raw_distance']}")

Metadata: {'description': 'Collection created from data.csv', 'hnsw:space': 'cosine'}


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Results:


## 7. Fetching Texts by IDs

Retrieve text entries for a list of IDs:

In [29]:
ids_to_fetch = ["1", "2", "3"]
texts = db_manager.get_by_ids(ids_to_fetch, "my_collection")
print("Texts:", texts)

Texts: {'1': 'Philosophy is the art of questioning.\r\nIt challenges our understanding of life.', '2': 'The mind contemplates existence.\r\nEvery thought is a step toward wisdom.', '3': 'Ancient sages pondered on virtue.\r\nTheir insights still echo today.'}


In [30]:
ids_to_fetch = ["10"]
texts = db_manager.get_by_ids(ids_to_fetch, "my_collection")
print("Texts:", texts)

Texts: {'10': 'The inquiry into existence never ceases.\r\nEvery question opens a door to deeper thought.'}
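
The returned texts keep the Windows-style `\r\n` line breaks from the source CSV. If that gets in the way downstream, a small post-processing step can normalize them. The `normalize_newlines` helper below is hypothetical and operates on a dictionary shaped like the output above.

```python
def normalize_newlines(texts):
    """Return a copy of the id -> text mapping with CRLF replaced by LF."""
    return {id_: text.replace("\r\n", "\n") for id_, text in texts.items()}

# Dictionary copied from the output of the previous cell.
texts = {
    "10": "The inquiry into existence never ceases.\r\n"
          "Every question opens a door to deeper thought."
}

clean = normalize_newlines(texts)
for id_, text in clean.items():
    print(f"ID {id_}:")
    print(text)
```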


## 8. Deleting a Row / Collection

Remove an entry from the collection by its ID:

In [31]:
db_manager.delete_entry_by_id(
    id_="1",
    collection_name="my_collection"
)

2025-02-15 14:03:52,886 - INFO - Successfully deleted entry with ID 1


True

Delete an entire collection. **Note:** You must pass `confirmation="yes"` to proceed with deletion.

In [33]:
db_manager.delete_collection(
    collection_name="my_collection",
    confirmation="yes"
)

2025-02-15 14:04:03,198 - INFO - Successfully deleted collection 'my_collection'


True
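
The `confirmation="yes"` argument acts as a guard against accidental deletion. The general pattern can be sketched as below. This is not ragit's implementation: the dictionary standing in for the database and the choice to return `False` when confirmation is missing are assumptions, and the library may behave differently (for example by raising an error).

```python
def delete_collection(collections, collection_name, confirmation=""):
    """Delete a collection only when the caller explicitly confirms."""
    if confirmation != "yes":
        # Refuse to delete without explicit confirmation (assumed behavior).
        return False
    # pop() returns None when the collection does not exist.
    return collections.pop(collection_name, None) is not None

# Hypothetical in-memory stand-in for the database.
collections = {"my_collection": ["entry 2", "entry 3"]}

print(delete_collection(collections, "my_collection"))
print(delete_collection(collections, "my_collection", confirmation="yes"))
```

The first call leaves the collection in place, and only the second removes it.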