### Step 1: Basic Structure and Initialization
1. **Define the vector database class**:
   - Create a class `VectorDB`.
   - Initialize attributes for storing vectors and their metadata (e.g., user IDs, movie preferences).

2. **Choose the vector representation**:
   - Decide on how movie data and user preferences will be represented (e.g., vectors of numeric ratings or embeddings).
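
One possible representation, sketched under stated assumptions: each user becomes a fixed-length vector of ratings, one slot per movie. The catalogue `MOVIE_IDS`, the helper `ratings_to_vector`, and the choice of `0.0` for unrated movies are all hypothetical and only illustrate the idea.

```python
import numpy as np
from typing import Dict, List

# Hypothetical catalogue; the position in this list fixes each movie's vector slot.
MOVIE_IDS: List[str] = ["m1", "m2", "m3", "m4"]

def ratings_to_vector(ratings: Dict[str, float], movie_ids: List[str]) -> np.ndarray:
    """Map a {movie_id: rating} dict to a fixed-length vector.

    Unrated movies are encoded as 0.0 (an assumption; a mean rating is another option).
    """
    return np.array([ratings.get(movie, 0.0) for movie in movie_ids], dtype=float)

user_vector = ratings_to_vector({"m1": 5.0, "m3": 2.5}, MOVIE_IDS)
print(user_vector)
```

Every vector produced this way has the same length, which a KD-Tree or any other index requires.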

### Step 2: Insert and Store Vectors
1. **Implement a method to add vectors**:
   - Create a method `add_vector(user_id, vector)` to add a user's preference vector.
   - Store vectors in an appropriate data structure (e.g., a dictionary or an array).

2. **Save metadata with vectors**:
   - Ensure that vectors are associated with user identifiers or movie IDs for future retrieval.

### Step 3: Vector Search and Similarity Calculation
1. **Implement similarity search**:
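   - Create a method `find_similar(vector, top_k)` that scores candidates with a similarity or distance measure such as cosine similarity or Euclidean distance.
   - Compare vectors with an exhaustive (brute-force) scan for small datasets, or with an index such as a KD-Tree for faster lookups in low dimensions.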

2. **Return top-k similar vectors**:
   - Return the `top_k` closest entries with their scores, ordered from most to least similar.
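
As a rough illustration of the cosine-similarity option, here is a brute-force scan over a plain dictionary of vectors. The function name `cosine_top_k` is hypothetical, and treating zero vectors as having similarity 0 is an assumption made only to avoid division by zero.

```python
import numpy as np
from typing import Dict, List, Tuple

def cosine_top_k(query: np.ndarray,
                 vectors: Dict[str, np.ndarray],
                 top_k: int = 5) -> List[Tuple[str, float]]:
    """Return up to top_k (id, cosine similarity) pairs, most similar first."""
    if not vectors:
        return []
    ids = list(vectors.keys())
    matrix = np.vstack([vectors[i] for i in ids]).astype(float)
    # Cosine similarity = dot product divided by the product of the norms.
    denom = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    denom[denom == 0.0] = 1.0  # assumption: zero vectors score 0 instead of NaN
    scores = (matrix @ query) / denom
    order = np.argsort(-scores)[:top_k]  # indices sorted by descending similarity
    return [(ids[i], float(scores[i])) for i in order]

example = {
    "user1": np.array([1.0, 0.0]),
    "user2": np.array([1.0, 1.0]),
    "user3": np.array([0.0, 1.0]),
}
top = cosine_top_k(np.array([1.0, 0.0]), example, top_k=2)
print(top)
```

The scan costs one matrix-vector product per query, so it scales linearly with the number of stored vectors.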

### Step 4: Update and Delete Operations
1. **Add a method to update vectors**:
   - Implement `update_vector(user_id, new_vector)` for modifying stored data.
2. **Add a method to delete vectors**:
   - Implement `delete_vector(user_id)` for data removal.

### Step 5: Persistence and Data Storage
1. **Serialize vectors to disk**:
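   - Use libraries like `pickle` or `json` to save and load the database. `json` cannot serialize NumPy arrays directly, so convert them with `tolist()` first; only unpickle files from a trusted source.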
2. **Load vectors from disk**:
   - Ensure there is a method to reload the database state for reuse.
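
A minimal save/load sketch, assuming all vectors share one dimension. It swaps in NumPy's own `.npz` format (`np.savez` / `np.load`) in place of `pickle` or `json`; the helper names `save_vectors` and `load_vectors` are hypothetical.

```python
import os
import tempfile
import numpy as np
from typing import Dict

def save_vectors(path: str, vectors: Dict[str, np.ndarray]) -> None:
    """Write IDs and vectors to a .npz archive (path should end in .npz)."""
    ids = np.array(list(vectors.keys()))
    matrix = np.vstack(list(vectors.values())) if vectors else np.empty((0, 0))
    np.savez(path, ids=ids, matrix=matrix)

def load_vectors(path: str) -> Dict[str, np.ndarray]:
    """Rebuild the {id: vector} mapping from a .npz archive."""
    with np.load(path) as data:
        return {str(uid): vec for uid, vec in zip(data["ids"], data["matrix"])}

# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.npz")
    save_vectors(path, {"user1": np.array([1.0, 2.0, 3.0]),
                        "user2": np.array([4.0, 5.0, 6.0])})
    restored = load_vectors(path)

print(sorted(restored))
```

After loading, any search index built on top of the vectors (such as a KD-Tree) has to be rebuilt, since only the raw data is stored.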

### Step 6: Integrate with the Recommender System
1. **Prepare movie data**:
   - Convert movie attributes and user interactions into vectors (e.g., through embeddings or feature engineering).
2. **Query the vector database**:
   - Use `find_similar()` to identify users or movies with similar preferences.
3. **Generate recommendations**:
   - Rank movies by similarity and provide recommendations to the user.
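
The ranking step could look roughly like the sketch below: movies the target user has not rated are scored by a similarity-weighted average of the neighbours' ratings. The function `recommend` and the sample data are hypothetical. The sketch also assumes that higher scores mean more similar users, so Euclidean distances would first need converting (for example with `1 / (1 + distance)`).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def recommend(target_ratings: Dict[str, float],
              neighbours: List[Tuple[str, float]],
              all_ratings: Dict[str, Dict[str, float]],
              top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank unseen movies by the similarity-weighted mean rating of similar users."""
    weighted_sum: Dict[str, float] = defaultdict(float)
    weight_total: Dict[str, float] = defaultdict(float)
    for user_id, similarity in neighbours:
        for movie, rating in all_ratings.get(user_id, {}).items():
            if movie in target_ratings:
                continue  # skip movies the target user has already rated
            weighted_sum[movie] += similarity * rating
            weight_total[movie] += similarity
    scores = {m: weighted_sum[m] / weight_total[m]
              for m in weighted_sum if weight_total[m] > 0}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical data: (user_id, similarity) pairs as a similarity search might return them.
all_ratings = {
    "user2": {"m1": 5.0, "m2": 4.0, "m3": 1.0},
    "user3": {"m2": 2.0, "m4": 5.0},
}
recs = recommend({"m1": 4.5}, [("user2", 0.9), ("user3", 0.3)], all_ratings)
print(recs)
```

Weighting by similarity lets close neighbours dominate the score while still giving distant ones a small say.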

### Step 7: Test and Optimize
1. **Benchmark performance**:
   - Test the database with different sizes of data and evaluate response time.
2. **Optimize data structures**:
   - Use `numpy` arrays or libraries like `scipy` for efficiency in vector operations.
3. **Improve search algorithms**:
   - Implement faster approximate nearest neighbor search methods (e.g., locality-sensitive hashing).
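
To make the locality-sensitive hashing idea concrete, here is a small random-hyperplane sketch. The class `RandomHyperplaneLSH` and its parameters are hypothetical, and a practical index would typically use several hash tables and re-rank the candidates by exact distance.

```python
import numpy as np
from collections import defaultdict
from typing import Dict, List, Tuple

class RandomHyperplaneLSH:
    """Bucket vectors by which side of each random hyperplane they fall on."""

    def __init__(self, dim: int, n_planes: int = 8, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_planes, dim))
        self.buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)

    def _signature(self, vector: np.ndarray) -> Tuple[int, ...]:
        # One bit per hyperplane: 1 if the vector lies on its positive side.
        return tuple((self.planes @ vector > 0).astype(int).tolist())

    def add(self, item_id: str, vector: np.ndarray) -> None:
        self.buckets[self._signature(vector)].append(item_id)

    def candidates(self, query: np.ndarray) -> List[str]:
        """Return IDs sharing the query's bucket (an approximate neighbour set)."""
        return list(self.buckets.get(self._signature(query), []))

index = RandomHyperplaneLSH(dim=3)
v = np.array([1.0, 2.0, 3.0])
index.add("user1", v)
same_direction = index.candidates(2 * v)  # scaling does not change the signature
opposite = index.candidates(-v)           # flipping the vector flips every bit
print(same_direction, opposite)
```

Vectors pointing in similar directions tend to share a bucket, so a query only compares against a small candidate set, at the cost of occasionally missing a true neighbour.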

In [7]:
from scipy.spatial import KDTree
import numpy as np
from typing import Dict, List, Tuple, Union

class VectorDB:
    def __init__(self) -> None:
        # Store vectors and their metadata in a dictionary for fast lookup
        self.vector_dict: Dict[str, np.ndarray] = {}  # Maps user ID to vectors
        self.kd_tree: Union[KDTree, None] = None
        self.metadata: List[str] = []  # Store user IDs for KD-Tree rebuilding order

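    def add_vector(self, user_id: str, vector: np.ndarray) -> None:
        """Add a vector with associated metadata (user_id)."""
        vector = np.asarray(vector, dtype=float)
        # All stored vectors must share one shape, otherwise the KD-Tree cannot be built
        if self.vector_dict:
            expected = np.shape(next(iter(self.vector_dict.values())))
            if vector.shape != expected:
                raise ValueError(f"Expected shape {expected}, got {vector.shape}.")
        self.vector_dict[user_id] = vector
        self._rebuild_kd_tree()  # Full rebuild on every insert; fine for small datasets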

    def _rebuild_kd_tree(self) -> None:
        """Rebuild the KD-Tree from the current vectors."""
        if self.vector_dict:
            self.metadata = list(self.vector_dict.keys())
            self.kd_tree = KDTree(list(self.vector_dict.values()))
        else:
            self.kd_tree = None

    def find_similar(self, query_vector: np.ndarray, top_k: int = 5) -> List[Tuple[str, float]]:
        """Find the top-k most similar vectors using Euclidean distance."""
        if self.kd_tree is None:
            raise ValueError("KD-Tree is empty. Add vectors first.")
        if top_k < 1:
            raise ValueError("top_k must be at least 1.")

        # Never ask for more neighbours than exist: KDTree.query pads missing
        # neighbours with infinite distance and an out-of-range index
        k = min(top_k, len(self.metadata))
        distances, indices = self.kd_tree.query(query_vector, k=k)

        # query returns scalars when k == 1, so normalise both to 1-D arrays
        distances = np.atleast_1d(distances)
        indices = np.atleast_1d(indices)

        # Collect user IDs and distances, nearest first
        return [(self.metadata[int(idx)], float(dist)) for dist, idx in zip(distances, indices)]

    def update_vector(self, user_id: str, new_vector: np.ndarray) -> bool:
        """Update an existing vector for a given user ID."""
        if user_id in self.vector_dict:
            self.vector_dict[user_id] = new_vector
            self._rebuild_kd_tree()  # Rebuild the KD-Tree after updating
            return True
        return False  # Return False if the user ID was not found

    def delete_vector(self, user_id: str) -> bool:
        """Delete a vector by its user ID."""
        if user_id in self.vector_dict:
            del self.vector_dict[user_id]
            self._rebuild_kd_tree()  # Rebuild the KD-Tree after deletion
            return True
        return False  # Return False if the user ID was not found


In [8]:
# Example usage
if __name__ == "__main__":
    db = VectorDB()
    db.add_vector('user1', np.array([1.0, 2.0, 3.0]))
    db.add_vector('user2', np.array([4.0, 5.0, 6.0]))
    db.add_vector('user3', np.array([7.0, 8.0, 9.0]))

    print("Updating user2's vector:")
    update_success = db.update_vector('user2', np.array([4.1, 5.1, 6.1]))
    print("Update successful:", update_success)

    print("\nDeleting user1's vector:")
    delete_success = db.delete_vector('user1')
    print("Delete successful:", delete_success)

    print("\nVectors remaining in the database:")
    print(db.vector_dict.keys())

Updating user2's vector:
Update successful: True

Deleting user1's vector:
Delete successful: True

Vectors remaining in the database:
dict_keys(['user2', 'user3'])
