### Step 1: Basic Structure and Initialization
1. **Define the vector database class**:
   - Create a class `VectorDB`.
   - Initialize attributes for storing vectors and their metadata (e.g., user IDs, movie preferences).

2. **Choose the vector representation**:
   - Decide how movie data and user preferences will be represented (e.g., vectors of numeric ratings or learned embeddings). Every vector stored in one database must have the same dimensionality.
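The sketch below illustrates the two kinds of representation mentioned above on a toy scale. It builds (a) a ratings vector with one entry per movie and (b) a compact genre-preference vector derived from those ratings. The catalog `MOVIES`, the genre labels, and the helper names `ratings_to_vector` and `genre_profile` are hypothetical. Production systems often use learned embeddings instead.

```python
from typing import Dict
import numpy as np

# Hypothetical catalog: each movie ID owns one fixed position in a ratings vector
MOVIES = ["m1", "m2", "m3", "m4"]
GENRES = ["action", "comedy", "drama"]  # column order of MOVIE_GENRES
# Hypothetical multi-hot genre labels, one row per movie (same order as MOVIES)
MOVIE_GENRES = np.array([
    [1, 0, 0],  # m1: action
    [0, 1, 0],  # m2: comedy
    [1, 0, 1],  # m3: action + drama
    [0, 0, 1],  # m4: drama
], dtype=float)


def ratings_to_vector(ratings: Dict[str, float]) -> np.ndarray:
    """Dense ratings vector with one entry per movie; unrated movies stay 0."""
    vec = np.zeros(len(MOVIES))
    for movie_id, rating in ratings.items():
        vec[MOVIES.index(movie_id)] = rating
    return vec


def genre_profile(ratings: Dict[str, float]) -> np.ndarray:
    """Rating-weighted sum of the rated movies' genre rows, scaled to unit length."""
    profile = ratings_to_vector(ratings) @ MOVIE_GENRES  # shape (len(GENRES),)
    norm = np.linalg.norm(profile)
    return profile / norm if norm > 0 else profile
```

Ratings vectors grow with the catalog (one dimension per movie). Genre profiles stay small (one dimension per genre), which suits tree-based search such as the KD-Tree used in the `VectorDB` class.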

### Step 2: Insert and Store Vectors
1. **Implement a method to add vectors**:
   - Create a method `add_vector(user_id, vector)` to add a user's preference vector.
   - Store vectors in an appropriate data structure (e.g., a dictionary or an array).

2. **Save metadata with vectors**:
   - Ensure that vectors are associated with user identifiers or movie IDs for future retrieval.
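A simple way to keep vectors and metadata associated is to key both by the same ID. The minimal sketch below does this using a hypothetical `VectorStore` class with an illustrative per-entry `metadata` dict (for example, a movie title). It also enforces one dimensionality at insertion time.

```python
from typing import Any, Dict, Optional, Tuple
import numpy as np


class VectorStore:
    """Hypothetical minimal store: one vector plus one metadata dict per ID."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: Dict[str, np.ndarray] = {}
        self.metadata: Dict[str, Dict[str, Any]] = {}

    def add_vector(self, item_id: str, vector, metadata: Optional[Dict[str, Any]] = None) -> None:
        vector = np.asarray(vector, dtype=float)
        # Reject vectors of the wrong shape so every stored vector is comparable
        if vector.shape != (self.dim,):
            raise ValueError(f"Expected shape ({self.dim},), got {vector.shape}")
        self.vectors[item_id] = vector
        self.metadata[item_id] = metadata or {}

    def get(self, item_id: str) -> Optional[Tuple[np.ndarray, Dict[str, Any]]]:
        """Return (vector, metadata) for an ID, or None if the ID is unknown."""
        if item_id not in self.vectors:
            return None
        return self.vectors[item_id], self.metadata[item_id]
```

Checking the shape when a vector is added surfaces mistakes immediately. Without that check, a mismatched vector would only fail later, when an index such as a KD-Tree is built.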

### Step 3: Vector Search and Similarity Calculation
1. **Implement similarity search**:
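   - Create a method `find_similar(vector, top_k)` that ranks stored vectors by a measure such as cosine similarity (higher means more similar) or Euclidean distance (lower means more similar).
   - Choose a search strategy. Brute-force search compares the query with every stored vector (exact, O(n·d) per query for n vectors of dimension d). A KD-Tree avoids most of those comparisons on low-dimensional data. A KD-Tree works with Euclidean distance; to rank by cosine similarity instead, normalize all vectors to unit length first, because for unit vectors ‖a - b‖² = 2 - 2·cos(a, b).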

2. **Return top-k similar vectors**:
   - Return the IDs of the `top_k` closest vectors together with their scores, ordered from most to least similar.

### Step 4: Update and Delete Operations
1. **Add a method to update vectors**:
   - Implement `update_vector(user_id, new_vector)` for modifying stored data.
2. **Add a method to delete vectors**:
   - Implement `delete_vector(user_id)` for data removal.
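The `VectorDB` class below keeps vectors in a dict and rebuilds its KD-Tree lazily after any change. If vectors are instead kept as rows of one NumPy matrix (convenient for vectorized brute-force search), deletion can avoid shifting rows: the last row is moved into the freed slot. A sketch with a hypothetical `MatrixStore` class:

```python
from typing import Dict, List
import numpy as np


class MatrixStore:
    """Hypothetical store: vectors as matrix rows, plus an ID <-> row index."""

    def __init__(self, dim: int) -> None:
        self.matrix = np.empty((0, dim))
        self.ids: List[str] = []          # row i belongs to ids[i]
        self.row_of: Dict[str, int] = {}  # ID -> row number

    def add_vector(self, user_id: str, vector: np.ndarray) -> None:
        if user_id in self.row_of:
            raise KeyError(f"{user_id} already exists; use update_vector")
        self.row_of[user_id] = len(self.ids)
        self.ids.append(user_id)
        # vstack copies the whole matrix; acceptable for a sketch
        self.matrix = np.vstack([self.matrix, vector])

    def update_vector(self, user_id: str, new_vector: np.ndarray) -> bool:
        row = self.row_of.get(user_id)
        if row is None:
            return False
        self.matrix[row] = new_vector  # overwrite the row in place
        return True

    def delete_vector(self, user_id: str) -> bool:
        row = self.row_of.pop(user_id, None)
        if row is None:
            return False
        last = len(self.ids) - 1
        if row != last:
            # Move the last row into the freed slot and repoint its ID
            self.matrix[row] = self.matrix[last]
            moved_id = self.ids[last]
            self.ids[row] = moved_id
            self.row_of[moved_id] = row
        self.ids.pop()
        self.matrix = self.matrix[:last]  # drop the now-duplicated last row
        return True
```

The trade-off is that row order no longer reflects insertion order. That ordering does not matter for similarity search.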

### Step 5: Persistence and Data Storage
1. **Serialize vectors to disk**:
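   - Use `pickle` to save and load the whole database object, or `json` for a portable format (NumPy arrays must first be converted to lists). Only unpickle files from trusted sources, because loading a pickle can execute arbitrary code.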
2. **Load vectors from disk**:
   - Ensure there is a method to reload the database state for reuse.
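A minimal sketch of the JSON route follows. It assumes the database state is a plain `dict` mapping IDs to 1D NumPy arrays. The function names `save_vectors_json` and `load_vectors_json` are illustrative.

```python
import json
from typing import Dict
import numpy as np


def save_vectors_json(vectors: Dict[str, np.ndarray], filename: str) -> None:
    """Write {id: vector} to JSON; each array becomes a list of floats."""
    payload = {key: vec.tolist() for key, vec in vectors.items()}
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_vectors_json(filename: str) -> Dict[str, np.ndarray]:
    """Read the JSON file back and rebuild float64 NumPy arrays."""
    with open(filename, "r", encoding="utf-8") as f:
        payload = json.load(f)
    return {key: np.asarray(values, dtype=float) for key, values in payload.items()}
```

A search index such as a KD-Tree is derived entirely from the vectors. For that reason, only the IDs and vectors need to be persisted, and the index can be rebuilt after loading.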

### Step 6: Integrate with the Recommender System
1. **Prepare movie data**:
   - Convert movie attributes and user interactions into vectors (e.g., through embeddings or feature engineering).
2. **Query the vector database**:
   - Use `find_similar()` to find users with similar preference vectors, or movies whose vectors lie closest to a user's preference vector.
3. **Generate recommendations**:
   - Rank movies by similarity and provide recommendations to the user.
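Recommendations usually exclude movies the user has already watched. The sketch below shows that final ranking step. It assumes a hypothetical `watched` set per user and uses Euclidean distance, as the `VectorDB` class does (smaller means more similar). The function name `recommend_unseen` is illustrative.

```python
from typing import Dict, List, Set, Tuple
import numpy as np


def recommend_unseen(user_vector: np.ndarray, items: Dict[str, np.ndarray],
                     watched: Set[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank items the user has not watched by distance to their preference vector."""
    candidates = [(item_id, float(np.linalg.norm(vec - user_vector)))
                  for item_id, vec in items.items() if item_id not in watched]
    candidates.sort(key=lambda pair: pair[1])  # closest first
    return candidates[:top_k]
```

Filtering before ranking guarantees `top_k` results whenever enough unseen items exist. With an index such as a KD-Tree, a common alternative is to query `top_k + len(watched)` neighbors and then drop the watched ones.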

### Step 7: Test and Optimize
1. **Benchmark performance**:
   - Test the database with datasets of varying size and measure build time and query latency.
2. **Optimize data structures**:
   - Use `numpy` arrays and libraries like `scipy` for efficient, vectorized operations.
3. **Improve search algorithms**:
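   - Implement approximate nearest neighbor (ANN) methods such as locality-sensitive hashing, which trade a small loss in accuracy for faster queries on large datasets. This matters most for high-dimensional embeddings, where KD-Tree performance degrades toward that of brute-force search.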
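Random-hyperplane locality-sensitive hashing is one LSH family for cosine similarity. Each vector receives a bit signature from the signs of its projections onto random normal vectors. Vectors separated by a small angle tend to share a signature, so only vectors in the query's bucket are scored exactly. This is a single-table sketch with illustrative names (`HyperplaneLSH`, `n_planes`). Practical implementations typically use several tables to improve recall.

```python
from collections import defaultdict
from typing import Dict, List, Tuple
import numpy as np


class HyperplaneLSH:
    """Single-table random-hyperplane LSH for approximate cosine search (sketch)."""

    def __init__(self, dim: int, n_planes: int = 8, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_planes, dim))  # random hyperplane normals
        self.buckets: Dict[Tuple[bool, ...], List[str]] = defaultdict(list)
        self.vectors: Dict[str, np.ndarray] = {}

    def _signature(self, vector: np.ndarray) -> Tuple[bool, ...]:
        # One bit per hyperplane: which side of the plane the vector lies on
        return tuple(bool(b) for b in (self.planes @ vector) >= 0)

    def add(self, key: str, vector: np.ndarray) -> None:
        self.vectors[key] = vector
        self.buckets[self._signature(vector)].append(key)

    def query(self, vector: np.ndarray, top_k: int = 5) -> List[Tuple[str, float]]:
        """Score only the candidates in the query's bucket by exact cosine similarity."""
        candidates = self.buckets.get(self._signature(vector), [])
        q = vector / np.linalg.norm(vector)
        scored = [(k, float(self.vectors[k] @ q / np.linalg.norm(self.vectors[k])))
                  for k in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
        return scored[:top_k]
```

More hyperplanes make buckets smaller and queries faster, but similar vectors are then more likely to land in different buckets and be missed. Multiple tables compensate for that.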

In [None]:
from scipy.spatial import KDTree
import numpy as np
import pickle

from typing import Dict, List, Tuple, Union


class VectorDB:
    """
    Vector database for a recommendation engine: stores one 1D vector per ID (e.g., a user or an item).
    Example use cases: streaming media recommendations, product recommendations, etc.
    """


    def __init__(self) -> None:
        """Initialize the Vector Database."""
        self.vector_dict: Dict[str, np.ndarray] = {}  # Maps user ID to vectors
        self.kd_tree: Union[KDTree, None] = None
        self.metadata: List[str] = []  # Store user IDs in KD-Tree order
        self.is_tree_stale: bool = True  # Track if KD-Tree needs rebuilding

    def add_vector(self, user_id: str, vector: np.ndarray) -> None:
        """Add a vector with associated metadata (user_id)."""
        if not isinstance(vector, np.ndarray):
            raise ValueError("Vector must be a numpy array.")
        if vector.ndim != 1:
            raise ValueError("Only 1D vectors are supported.")
        
        self.vector_dict[user_id] = vector
        self.is_tree_stale = True  # Mark tree as stale

    def _rebuild_kd_tree(self) -> None:
        """Rebuild the KD-Tree from the current vectors."""
        if self.is_tree_stale:
            if self.vector_dict:
                self.metadata = list(self.vector_dict.keys())
                self.kd_tree = KDTree(list(self.vector_dict.values()))
            else:
                self.kd_tree = None
            self.is_tree_stale = False

    def find_similar(self, query_vector: np.ndarray, top_k: int = 5) -> List[Tuple[str, float]]:
        """Find the top-k most similar vectors using Euclidean distance."""
        if not isinstance(query_vector, np.ndarray):
            raise ValueError("Query vector must be a numpy array.")
        if query_vector.ndim != 1:
            raise ValueError("Only 1D vectors are supported.")

        self._rebuild_kd_tree()  # Ensure KD-Tree is up-to-date

        if self.kd_tree is None:
            raise ValueError("The database is empty. Add vectors first.")

        if top_k < 1:
            raise ValueError("top_k must be at least 1.")
        # Adjust top_k if it exceeds the number of stored vectors
        top_k = min(top_k, len(self.metadata))

        distances, indices = self.kd_tree.query(query_vector, k=top_k)

        # With k=1, KDTree.query returns scalars instead of arrays (NumPy scalars in
        # recent SciPy, which are not Python ints), so normalize both outputs to 1D
        distances = np.atleast_1d(distances)
        indices = np.atleast_1d(indices)

        # Pair each user ID with its Euclidean distance (smaller means more similar)
        results: List[Tuple[str, float]] = [
            (self.metadata[idx], float(dist)) for idx, dist in zip(indices, distances)
        ]
        return results

    def update_vector(self, user_id: str, new_vector: np.ndarray) -> bool:
        """Update an existing vector for a given user ID."""
        if user_id in self.vector_dict:
            if not isinstance(new_vector, np.ndarray):
                raise ValueError("Vector must be a numpy array.")
            if new_vector.ndim != 1:
                raise ValueError("Only 1D vectors are supported.")
            
            self.vector_dict[user_id] = new_vector
            self.is_tree_stale = True  # Mark tree as stale
            return True
        return False

    def delete_vector(self, user_id: str) -> bool:
        """Delete a vector by its user ID."""
        if user_id in self.vector_dict:
            del self.vector_dict[user_id]
            self.is_tree_stale = True  # Mark tree as stale
            return True
        return False

    def save_to_file(self, filename: str = "vector_db.pkl") -> None:
        """Serialize the database and save it to a file."""
        with open(filename, 'wb') as file:
            pickle.dump(self, file)

    @staticmethod
    def load_from_file(filename: str = "vector_db.pkl") -> 'VectorDB':
        """Load a serialized database from a file."""
        with open(filename, 'rb') as file:
            return pickle.load(file)
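`find_similar` has to handle the shape of `scipy.spatial.KDTree.query`'s return values. For a single 1D query point, `k=1` squeezes the last dimension, so the distance and index come back as scalars. With `k>1`, they come back as arrays. Wrapping both outputs in `np.atleast_1d` treats the two cases uniformly, as this small check shows. The helper `as_pairs` is illustrative.

```python
import numpy as np
from scipy.spatial import KDTree

points = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
tree = KDTree(points)
query = np.array([4.1, 5.1, 6.1])

d1, i1 = tree.query(query, k=1)  # k=1: last dimension squeezed, so scalars
d2, i2 = tree.query(query, k=2)  # k=2: arrays of shape (2,)


def as_pairs(distances, indices):
    """Convert either output shape into a list of (row index, distance) pairs."""
    return [(int(i), float(d))
            for d, i in zip(np.atleast_1d(distances), np.atleast_1d(indices))]


pairs_k1 = as_pairs(d1, i1)
pairs_k2 = as_pairs(d2, i2)
```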


In [None]:
class RecommenderSystem:
    def __init__(self) -> None:
        """Initialize the Recommender System with user and item databases."""
        self.user_db = VectorDB()
        self.item_db = VectorDB()

    def add_user(self, user_id: str, user_vector: np.ndarray) -> None:
        """Add a user vector to the user database."""
        self.user_db.add_vector(user_id, user_vector)

    def add_item(self, item_id: str, item_vector: np.ndarray) -> None:
        """Add an item vector to the item database."""
        self.item_db.add_vector(item_id, item_vector)

    def recommend_content_based(self, user_id: str, top_k: int = 5) -> List[Tuple[str, float]]:
        """Recommend items based on similarity to a user's preferences."""
        user_vector = self.user_db.vector_dict.get(user_id)
        if user_vector is None:
            raise ValueError(f"User ID {user_id} not found.")
        return self.item_db.find_similar(user_vector, top_k)

    def recommend_collaborative(self, user_id: str, top_k_users: int = 5, top_k_items: int = 5) -> List[Tuple[str, float]]:
        """Recommend items based on the preferences of similar users.

        No user-item interaction data is stored, so the preferences of similar users
        are approximated by averaging their vectors and searching the item database
        with that average.
        """
        user_vector = self.user_db.vector_dict.get(user_id)
        if user_vector is None:
            raise ValueError(f"User ID {user_id} not found.")

        # Ask for one extra neighbor: the user's own vector (distance 0) is also returned
        similar_users = self.user_db.find_similar(user_vector, top_k=top_k_users + 1)
        neighbor_ids = [uid for uid, _ in similar_users if uid != user_id][:top_k_users]

        if not neighbor_ids:
            # No other users stored: fall back to the user's own preferences
            return self.item_db.find_similar(user_vector, top_k=top_k_items)

        # Average the neighbors' preference vectors into a single query vector
        neighbor_vector = np.mean([self.user_db.vector_dict[uid] for uid in neighbor_ids], axis=0)
        return self.item_db.find_similar(neighbor_vector, top_k=top_k_items)

In [None]:
# Initialize the recommender system
recommender = RecommenderSystem()

# Sample user vectors (e.g., embedding of user preferences)
user_vectors = {
    "user_1": np.array([0.2, 0.3, 0.5]),
    "user_2": np.array([0.1, 0.4, 0.7]),
    "user_3": np.array([0.8, 0.1, 0.1]),
}

# Sample item vectors (e.g., embedding of item features)
item_vectors = {
    "item_1": np.array([0.25, 0.35, 0.55]),
    "item_2": np.array([0.15, 0.45, 0.65]),
    "item_3": np.array([0.9, 0.2, 0.2]),
    "item_4": np.array([0.1, 0.8, 0.5]),
}

# Add users and items to the database
for user_id, vector in user_vectors.items():
    recommender.add_user(user_id, vector)

for item_id, vector in item_vectors.items():
    recommender.add_item(item_id, vector)

# Test content-based recommendations for user_1
print("Content-Based Recommendations for user_1:")
print(recommender.recommend_content_based("user_1"))

# Test collaborative recommendations for user_1
print("\nCollaborative Recommendations for user_1:")
print(recommender.recommend_collaborative("user_1"))


In [2]:
# Example usage
if __name__ == "__main__":
    db = VectorDB()
    db.add_vector('user1', np.array([1.0, 2.0, 3.0]))
    db.add_vector('user2', np.array([4.0, 5.0, 6.0]))
    db.add_vector('user3', np.array([7.0, 8.0, 9.0]))

    print("Updating user2's vector:")
    update_success = db.update_vector('user2', np.array([4.1, 5.1, 6.1]))
    print("Update successful:", update_success)

    print("\nDeleting user1's vector:")
    delete_success = db.delete_vector('user1')
    print("Delete successful:", delete_success)

    print("\nVectors remaining in the database:")
    print(db.vector_dict.keys())

Updating user2's vector:
Update successful: True

Deleting user1's vector:
Delete successful: True

Vectors remaining in the database:
dict_keys(['user2', 'user3'])
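For the benchmarking in Step 7, one useful pattern is to time the KD-Tree against an exact brute-force scan on the same random data and confirm that both return the same neighbors. The sizes below are arbitrary, and timings will vary by machine. The function names `brute_force_knn` and `benchmark` are illustrative.

```python
import time
import numpy as np
from scipy.spatial import KDTree


def brute_force_knn(data: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest rows by Euclidean distance (exact linear scan)."""
    dists = np.linalg.norm(data - query, axis=1)
    return np.argsort(dists)[:k]


def benchmark(n: int, dim: int, k: int = 5, n_queries: int = 100, seed: int = 0) -> dict:
    rng = np.random.default_rng(seed)
    data = rng.random((n, dim))
    queries = rng.random((n_queries, dim))

    start = time.perf_counter()
    tree = KDTree(data)
    build_s = time.perf_counter() - start

    start = time.perf_counter()
    _, tree_idx = tree.query(queries, k=k)  # shape (n_queries, k)
    tree_s = time.perf_counter() - start

    start = time.perf_counter()
    brute_idx = np.array([brute_force_knn(data, q, k) for q in queries])
    brute_s = time.perf_counter() - start

    # Both methods are exact, so their neighbor sets should match
    agree = all(set(a) == set(b) for a, b in zip(tree_idx, brute_idx))
    return {"build_s": build_s, "kdtree_s": tree_s, "brute_s": brute_s, "agree": agree}


for n in (1_000, 10_000):
    print(n, benchmark(n, dim=3))
```

Re-running the benchmark with a larger `dim` shows how the KD-Tree's advantage shrinks as dimensionality grows. That is the regime where approximate methods such as LSH become worthwhile.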
