# FAISS for UTF-8 Multihot Decoding

This notebook explores how to use faiss and checks whether it is useful for decoding embedding vectors into code indices of the codebook (possibly via cosine similarity).
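On the cosine-similarity question: for unit-length vectors, squared L2 distance and cosine similarity are tied by ||a - b||² = 2 - 2·cos(a, b), so both rank neighbours in the same order. The sketch below checks that identity with NumPy on random stand-in data (not the real codebook); the names `codebook`, `queries` and `l2_normalize` are made up for the illustration. In faiss the same idea would presumably be `faiss.normalize_L2` followed by an inner-product index such as `faiss.IndexFlatIP`.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.random((1000, 64), dtype=np.float32)  # stand-in for the real codebook
queries = rng.random((10, 64), dtype=np.float32)     # stand-in for network outputs


def l2_normalize(x):
    """Scale every row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


cb_n = l2_normalize(codebook)
q_n = l2_normalize(queries)

# Cosine similarity of unit vectors is just their inner product.
cosine = q_n @ cb_n.T
# Squared L2 distance between every query and every code.
sq_l2 = ((q_n[:, None, :] - cb_n[None, :, :]) ** 2).sum(axis=2)

nearest_by_cosine = cosine.argmax(axis=1)
nearest_by_l2 = sq_l2.argmin(axis=1)
```

If the codebook and the outputs are normalized first, an L2 index and a cosine-based decoder should therefore pick the same code.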


In [1]:
import numpy as np
import itertools
from itertools import combinations
import faiss
from utf8_encoder import *

Loading faiss with AVX2 support.
Loading faiss.


Load the pre-computed UTF-8 codes and codebook.


In [2]:
utf8codebook = load_obj("utf8-codes/multihot64short-embeds")
idx2char = load_obj("utf8-codes/multihot64short-idx2char")
char2idx = load_obj("utf8-codes/multihot64short-char2idx")

In [3]:
utf8codebook.dtype

dtype('float32')

In [4]:
d = 64  # dimension of the codebook vectors
indexl2 = faiss.IndexFlatL2(d)  # exact (brute-force) L2 index
# faiss.index_factory() can also be used to create an index

From the documentation, an IVF (inverted file) index type may be faster for decoding.

It might also be good to keep the index on the CPU, since GPU operations might not be needed for training or testing: decoding can be done only when a text reconstruction is needed for user visualization. Given that I have much more RAM than GPU RAM (a factor of 8), this would be a nice option.

IVF indices need to be trained before vectors can be added.

Product Quantization (PQ) could be used and might work well, because the inputs are binary elements while the network outputs are float32. If quantized, the most significant decimals of the output vector might be enough to recognize similarity (just an idea, I don't know whether this holds).
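The intuition can at least be checked on a toy case. The sketch below is not PQ itself: it uses plain rounding to one decimal as a crude stand-in for quantization, and a hypothetical codebook made of every 2-hot code in 64 dimensions (the real `multihot64short` codebook may be built differently). The noise level is also an assumption.

```python
import numpy as np
from itertools import combinations

d = 64
# Hypothetical toy codebook: every code with exactly 2 of the 64 bits set.
pairs = list(combinations(range(d), 2))
codebook = np.zeros((len(pairs), d), dtype=np.float32)
for row, (i, j) in enumerate(pairs):
    codebook[row, i] = 1.0
    codebook[row, j] = 1.0

rng = np.random.default_rng(0)
noise = rng.uniform(-0.2, 0.2, size=codebook.shape).astype(np.float32)
outputs = codebook + noise       # stand-in for float32 network outputs
coarse = np.round(outputs, 1)    # keep only one decimal per component


def nearest_code(x, codes):
    """Brute-force nearest neighbour under L2 distance."""
    # argmin ||x - c||^2 == argmin (||c||^2 - 2 x.c); ||x||^2 is constant per query.
    scores = (codes ** 2).sum(axis=1)[None, :] - 2.0 * (x @ codes.T)
    return scores.argmin(axis=1)


decoded_full = nearest_code(outputs, codebook)
decoded_coarse = nearest_code(coarse, codebook)
```

In this toy setup the per-component error stays below 0.5 even after rounding, so both the full-precision and the rounded outputs decode back to their original codes. Whether real network outputs are that well behaved is exactly what would need testing.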

It might need *OPQ rotation* and/or *RemapDimensionsTransform* (from the documentation) to improve the PQ coding (these transformations can be trained), and *rq = faiss.IndexRefineFlat(q)* to refine the ranking once the candidates are pre-ranked.

From documentation:

    "The IVFFlat is often the fastest option, so the PQ variants are useful if memory is limited."



The other nice thing about this is that we can compute the K nearest neighbours, which can give several benefits for decoding.

I could try iterative decoding: several passes for the same input, each producing a result that is fed back in again. This might lead to some nice surprises (stabilizing the result when there are doubts or several options?).

In [5]:
indexl2.add(utf8codebook)

In [None]:
# TODO: create several indices and test the decoding with each of them
# to see which one is better in precision, memory usage and speed

In [None]:
# using the GPU
res = faiss.StandardGpuResources()  # use a single GPU
gpu_index_flat = faiss.index_cpu_to_gpu(res, 0, indexl2)


In [None]:
print(gpu_index_flat.ntotal)


The index seems quite big here.

GPU memory usage with the index on the GPU: 3028 MiB

GPU memory usage without the index: 1573 MiB

So the memory usage attributed to the IndexFlatL2 is 1455 MiB.
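For comparison, the raw storage a flat float32 index needs can be estimated as number of vectors × dimension × 4 bytes. The sketch below uses a hypothetical helper `flat_index_mib` and, as an assumed upper bound, one code per Unicode scalar value (1,112,064); in the notebook the real count would be `utf8codebook.shape[0]`.

```python
def flat_index_mib(n_vectors, dim, bytes_per_component=4):
    """Raw vector storage of a flat index, in MiB."""
    return n_vectors * dim * bytes_per_component / 2**20


# Assumption: at most one code per Unicode scalar value.
upper_bound_mib = flat_index_mib(1_112_064, 64)   # 271.5 MiB

measured_mib = 3028 - 1573                        # with index minus without index
overhead_mib = measured_mib - upper_bound_mib     # memory not explained by the raw vectors
```

Even under that upper bound, the raw vectors would explain only a fraction of the measured difference. One possible explanation (an assumption, not checked here) is that `StandardGpuResources` reserves scratch memory on the GPU in addition to the index data.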


In [None]:
3028 - 1573