
In [None]:
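# A Faiss index can be saved to a local file, or serialized to bytes and stored
# as a blob (for example in a PostgreSQL bytea column via psycopg2).
# This notebook only saves to local disk, so psycopg2 is not used below.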
import psycopg2
import numpy as np

In [None]:
!pip install faiss-cpu


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting faiss-cpu
  Downloading faiss_cpu-1.7.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.6/17.6 MB 33.4 MB/s eta 0:00:00
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.7.4


In [None]:
import faiss


In [None]:
# create an index
d = 64  # dimension
n = 100000  # number of vectors
xb = faiss.randn((n, d)).astype('float32')
index = faiss.IndexFlatL2(d)
index.add(xb)

In [None]:
# save the index to a binary file
faiss.write_index(index, "my_index.bin")


In [None]:

# load the index from the binary file
index2 = faiss.read_index("my_index.bin")

In [None]:

# perform a query on the loaded index
xq = np.random.randn(1, d).astype('float32')
D, I = index2.search(xq, k=10)

print(I)


[[92839 60384 29813 68657 63557  6162 88290 60331 19761 12692]]


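A flat Faiss index has no dedicated update call. To replace a vector, search for its nearest neighbor to obtain an ID, remove that ID, and then add the new vector. For `IndexFlatL2`, removal shifts the IDs of all later vectors down by one and the new vector is appended at the end, so it does not take over the removed ID.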



In [None]:

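# generate a new vector to replace an existing vector
# note: faiss.randn uses a fixed default seed, so this call appears to reproduce
# the first row of xb, which would explain why the nearest neighbour below is ID 0
new_x = faiss.randn((1, d)).astype('float32')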

In [None]:
new_x

array([[ 0.10883749, -0.459318  ,  0.7515723 , -0.1816421 , -0.9883846 ,
         0.171899  , -0.70024973, -0.3089309 ,  1.0889151 , -0.904204  ,
        -1.6269528 , -0.6302848 , -1.7894124 , -0.42198652, -0.45540377,
        -1.6518626 ,  0.17468624,  2.7622788 ,  0.93555003,  0.31033963,
        -0.1137654 , -1.134701  , -1.4693938 , -1.4257497 ,  0.957228  ,
        -0.84074134,  0.89626247, -0.29311594, -0.98129076, -1.499216  ,
         1.2344803 , -1.9490647 , -1.9177946 , -0.04481186,  1.4501251 ,
        -1.3866824 ,  0.46425596, -0.559743  , -1.8414608 ,  0.13951156,
        -0.48940092, -0.08166508,  0.32464248,  0.19947006,  0.08602966,
         0.5737665 , -0.5551689 ,  0.3944754 ,  1.4938854 ,  0.43972817,
        -0.2293019 ,  1.0396273 ,  0.01311562,  0.3182227 , -0.67191553,
        -0.42357767,  1.9767416 ,  0.29624656, -0.4781985 , -0.427606  ,
        -0.23074393,  1.0324576 , -1.6079968 , -0.77585363]],
      dtype=float32)

In [None]:
# search for the nearest neighbor to new vector
D, I = index.search(new_x, k=1)
print(I)

[[0]]


In [None]:
# get the ID of the nearest neighbor
id_to_replace = int(I[0][0])
print(id_to_replace)

0


In [None]:
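# replace the vector: remove the old entry, then add the new one
# (on IndexFlatL2 the new vector is appended, so its ID becomes index.ntotal - 1)
index.remove_ids(faiss.IDSelectorBatch([id_to_replace]))
index.add(new_x)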

In [None]:
# perform a query
xq = faiss.randn((1, d)).astype('float32')
D, I = index.search(xq, k=10)
print(I)

[[99999 99165 75985 86701  1214 51502 68116 45318  2953 46866]]


Can I separate the clusters of an index and store them on disk? If so, how?

In [None]:
# create an index with 4 clusters
d = 128  # dimension of the vectors
nlist = 4  # number of clusters
quantizer = faiss.IndexFlatL2(d)  # the quantizer
index = faiss.IndexIVFFlat(quantizer, d, nlist)

In [None]:
# add some vectors to the index
x = faiss.randn((1000, d)).astype('float32')
index.train(x)
index.add(x)
