# Create a vector database from embeddings

_Note_: In the actual implementation for finding similar cyclists, we stuck with the collaborative filtering learner object that we stored on AWS S3. A vector database (e.g. FAISS or Pinecone) is a more advanced alternative. Once training has produced the optimized embeddings, you can load them into a vector database for easier management and similarity search. The code for creating and updating the vector database would go in `scripts/train.py`. This brief notebook is a primer for such a solution.

## Imports

In [None]:
import faiss
import numpy as np
import pandas as pd
from fastai.collab import load_learner

## Create vector db

In [None]:
learn = load_learner("../data/learner.pkl")

In [None]:
# Cyclist embeddings; copy so the in-place normalization below leaves the learner's weights intact
vectors = learn.model.u_weight.weight.detach().cpu().numpy().copy()

In [None]:
faiss.normalize_L2(vectors)  # in place: scale every row to unit length
index = faiss.IndexFlatL2(vectors.shape[1])  # exact (brute-force) search on squared L2 distance
index.add(vectors)
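
Because the vectors are L2-normalized before they are added, the squared L2 distance that `IndexFlatL2` reports is a monotonic function of cosine similarity: `d^2 = 2 - 2 * cos(theta)`. The NumPy-only sketch below checks that identity on random stand-in data (not the actual cyclist embeddings). The helper name `l2sq_to_cosine` is made up for illustration.

```python
import numpy as np

def l2sq_to_cosine(sq_dist):
    """Convert squared L2 distances between unit vectors to cosine similarity."""
    return 1.0 - np.asarray(sq_dist, dtype=np.float32) / 2.0

rng = np.random.default_rng(0)
demo = rng.standard_normal((5, 8)).astype(np.float32)  # stand-in embeddings
demo /= np.linalg.norm(demo, axis=1, keepdims=True)    # unit length, like faiss.normalize_L2

query = demo[0]
sq_dist = ((demo - query) ** 2).sum(axis=1)  # squared L2 distance to every row
cosine = demo @ query                        # cosine similarity to every row

# On unit vectors, both measures produce the same ranking and convert exactly.
assert np.allclose(l2sq_to_cosine(sq_dist), cosine, atol=1e-4)
```

Applied to the `distances` column of a search result, the same conversion would give a similarity score between -1 and 1, which may be easier to read than a raw distance.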

In [None]:
# faiss.write_index(index, "../api/faiss_cyclists.index")
# index = faiss.read_index("../api/faiss_cyclists.index")

In [None]:
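search_vector = vectors[2628, :]  # Wout van Aert

_vector = np.array([search_vector])  # FAISS expects a 2D array of shape (n_queries, dim)
faiss.normalize_L2(_vector)  # harmless here: the stored rows are already unit length

# k=index.ntotal ranks every cyclist; "ann" holds the matching row ids (exact search, despite the name)
distances, ann = index.search(_vector, k=index.ntotal)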

results = pd.DataFrame({"distances": distances[0],
                        "ann": ann[0],
                        "cyclist": learn.dls.classes["rider"][ann[0]]})
results
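
The first row of `results` is the query cyclist itself (distance 0), and the row number 2628 is hard-coded. One possible convenience wrapper is sketched below. `similar_riders` and its parameters are hypothetical names, and `index` is assumed to be any object exposing a FAISS-style `search(x, k)` method, such as the `IndexFlatL2` built earlier in this notebook.

```python
import numpy as np

def similar_riders(index, vectors, names, rider, k=10):
    """Return up to k nearest riders to `rider` as (name, distance) pairs.

    index   : object with a FAISS-style search(x, k) -> (distances, ids) method
    vectors : float32 array of normalized embeddings, one row per entry in `names`
    names   : sequence of rider names aligned with the rows of `vectors`
    """
    names = list(names)
    row = names.index(rider)  # raises ValueError for an unknown name
    query = np.ascontiguousarray(vectors[row:row + 1], dtype=np.float32)
    # Ask for one extra neighbour because the query matches itself.
    distances, ids = index.search(query, k + 1)
    hits = [(names[i], float(d)) for d, i in zip(distances[0], ids[0])
            if i != row and i >= 0]  # FAISS pads missing results with -1
    return hits[:k]
```

With the objects from this notebook, a call might look like `similar_riders(index, vectors, learn.dls.classes["rider"], "Wout van Aert")`, assuming the rider is stored under exactly that name and that the class list is aligned with the embedding rows.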