# Creating and searching against vector databases with TM-Vec

To build a protein database that is stored compactly as vector embeddings, we will:
1. Generate a DB of protein vectors
2. Convert our output to a FAISS DB (for search)
3. Search against our DB and plot the results

In [None]:
# Import the libraries used in this notebook
import skbio
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

## Building a vector database

We can feed our FASTA file directly to the tmvec build-db __CLI__ command, which writes our vectors as a .npz file to the specified location.

This command takes two arguments:
1. --input-fasta: a FASTA file containing your sequences.
2. --output: the path to write the database to (in the example below, test_db/bagel_fasta yields test_db/bagel_fasta.npz).

In [None]:
!tmvec build-db --input-fasta bagel.fa --output test_db/bagel_fasta
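
Before searching, it can help to check what the build step actually wrote. The following is a minimal sketch that assumes only that the output is a standard NumPy .npz archive; the array names inside it are not documented here, so the helper `summarize_npz` (a hypothetical name, not part of TM-Vec) simply lists whatever it finds.

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in a .npz file."""
    summary = {}
    with np.load(path) as archive:
        for name in archive.files:
            try:
                arr = archive[name]
                summary[name] = (arr.shape, str(arr.dtype))
            except ValueError:
                # Object arrays cannot be read without allow_pickle=True;
                # flag them instead of loading them.
                summary[name] = (None, "object (pickled)")
    return summary


# Path assumed from the --database argument used in the search step.
db_path = "test_db/bagel_fasta.npz"
if os.path.exists(db_path):
    for name, (shape, dtype) in summarize_npz(db_path).items():
        print(f"{name}: shape={shape}, dtype={dtype}")
```

If the file is missing, the snippet prints nothing, which is a quick hint that the build step did not write to the expected path.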

## Searching and plotting the ordination results

Now, with our vector database in hand, we can use the tmvec search __CLI__ command to query proteins against it and return the k-nearest-neighbor results.

Finally, we can use the embed_vec_to_ordination function to create ordination objects from our search results and plot them.

In [None]:
!tmvec search --input-fasta bagel.fa --database test_db/bagel_fasta.npz --output test_db/bagel_search_results --output-fmt skbio
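
To make "k-nearest neighbors" concrete, here is a brute-force sketch in NumPy. It is an illustration only, not TM-Vec's search code: it assumes cosine similarity as the metric, uses random toy vectors in place of real protein embeddings, and `cosine_knn` is a hypothetical helper name.

```python
import numpy as np


def cosine_knn(queries, database, k=5):
    """Return (indices, similarities) of the k most similar database rows per query."""
    # L2-normalise rows so that the dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = q @ d.T
    k = min(k, database.shape[0])
    # Sort each row by descending similarity and keep the first k columns.
    idx = np.argsort(-sims, axis=1)[:, :k]
    return idx, np.take_along_axis(sims, idx, axis=1)


# Toy data standing in for protein embeddings (not real TM-Vec output).
rng = np.random.default_rng(0)
database = rng.normal(size=(10, 8))
queries = database[:3]  # query with the first three database vectors
indices, scores = cosine_knn(queries, database, k=3)
print(indices)
print(scores.round(3))
```

Because the queries are copies of database rows, each query's top hit is itself with a similarity of about 1. A dedicated index such as FAISS serves the same purpose at scale without comparing every pair.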

In [None]:
df = pd.read_csv("test_db/bagel_search_results/results.tsv", sep='\t')

# Show the first few rows of the DataFrame
print(df.head())

# TO-DO - Plotting
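
Until the plotting step is filled in, the idea behind an ordination plot can be sketched with a plain PCA projection. This is a stand-in under stated assumptions, not the embed_vec_to_ordination function: it uses random toy vectors rather than the search output, and `pca_2d` is a hypothetical helper name.

```python
import numpy as np


def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Toy data standing in for protein embeddings (not real TM-Vec output).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 8))
coords = pca_2d(embeddings)
print(coords.shape)
```

With matplotlib imported as plt, as at the top of this notebook, `plt.scatter(coords[:, 0], coords[:, 1])` draws the two-dimensional projection.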