Where are sub-quantizers stored? #1516

Closed
2 tasks
WenqiJiang opened this issue Nov 9, 2020 · 6 comments

@WenqiJiang

Summary

Hi, I trained an IVFPQ index but couldn't find where the fine-grained sub-quantizers, i.e., the quantizers for each Voronoi cell, are stored. As in the images below, the saved indexes only include a coarse-grained quantizer and a PQ codebook (1B vectors × (8-byte ID + 16-byte PQ code) = 24 GB).

image
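The storage figure quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes 8-bit sub-quantizers (ksub = 256) and float32 centroids, and the dimension d = 128 is a hypothetical value that is not stated in the post. It also suggests that the 24 GB is taken up by the per-vector ids and PQ codes, while the centroid table itself (size M * ksub * dsub) is small and does not grow with the number of vectors.

```python
def inverted_list_bytes(n_vectors, code_bytes, id_bytes=8):
    """Bytes for the stored entries: one id plus one PQ code per vector."""
    return n_vectors * (id_bytes + code_bytes)


def pq_centroid_table_bytes(d, ksub=256, float_bytes=4):
    """Bytes for a centroid table of size M * ksub * dsub.

    Because M * dsub == d, this equals ksub * d floats.
    """
    return ksub * d * float_bytes


n_vectors = 10**9   # 1B vectors, from the post
code_bytes = 16     # 16-byte PQ code, from the post
d = 128             # hypothetical dimension, NOT given in the post

print(inverted_list_bytes(n_vectors, code_bytes) / 1e9, "GB")
print(pq_centroid_table_bytes(d) / 1024, "KiB")
```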

If I list the attributes and methods of "index", there's no information about sub-quantizers either. Where can I find them?

import faiss

nlist = 1024
m = 8
k = 5
d = 64

coarse_quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(coarse_quantizer, d, nlist, m, 8)
invlists = index.invlists

dir(invlists) 

['__class__', '__del__',  '__delattr__', '__dict__',  '__dir__', '__doc__', '__eq__',  '__format__', '__ge__', 
 '__getattr__', '__getattribute__',  '__gt__', '__hash__',  '__init__', '__init_subclass__',  '__le__',
 '__lt__',  '__module__', '__ne__',  '__new__', '__reduce__', '__reduce_ex__',  '__repr__', '__setattr__', '__sizeof__',
 '__str__', '__subclasshook__',  '__swig_destroy__', '__swig_getmethods__', '__swig_setmethods__',  '__weakref__',
 'add_entries',  'add_entry', 'code_size', 'compute_ntotal', 'get_codes', 'get_ids', 'get_single_code',
 'get_single_id', 'imbalance_factor',  'list_size', 'merge_from', 'nlist', 'prefetch_lists', 'print_stats',
 'release_codes', 'release_ids', 'reset', 'resize', 'this', 'update_entries', 'update_entry']

Running on:

  • CPU

Interface:

  • Python
@zhou0913meng

At line 53 of faiss/impl/ProductQuantizer.h:
/// Centroid table, size M * ksub * dsub
std::vector<float> centroids;

@zhoutong-fu

You can find the example here

@mdouze
Contributor

mdouze commented Nov 10, 2020

It is in the field index.pq, and you can get the centroids using the code pointed to by @zhoutong-fu.

@WenqiJiang
Author

Thanks all for your answer :)

It works for me now. Just a quick additional question: does every Voronoi cell in IVFPQ share the same sub-quantizers?

import faiss

nlist = 1024
m = 8
kbits = 8 # 2^8 = 256
d = 64

coarse_quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(coarse_quantizer, d, nlist, m, kbits)

xb = faiss.rand((10000, d), 1234)
index.train(xb)
index.add(xb)

def get_centroids(index):
    pq = index.pq
    # read the PQ centroids
    cen = faiss.vector_to_array(pq.centroids)
    cen = cen.reshape(pq.M, pq.ksub, pq.dsub)
    
    return cen

# get PQ centroids
cen = get_centroids(index)
print(cen.shape)

and I get

(8, 256, 8)

Does that indicate that there is only one set of sub-quantizers, shared by every cell? That is, during the search, do we first compute the residual (query_vector - centroid_vector), and then use that residual to determine the PQ distance in that cell?
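To make the question concrete, here is a minimal, dependency-free sketch of residual encoding with one codebook shared by all cells. It is a toy illustration, not Faiss code: the helper names (`pq_encode`, `adc_distance`), the tiny codebook, and the data are all made up, and the real implementation relies on precomputed lookup tables rather than this direct loop.

```python
def sub(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def sq_dist(a, b):
    """Squared L2 distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v, m):
    """Split v into m contiguous sub-vectors of equal length."""
    dsub = len(v) // m
    return [v[i * dsub:(i + 1) * dsub] for i in range(m)]


def pq_encode(residual, codebook):
    """For each sub-vector, return the index of its nearest centroid."""
    parts = split(residual, len(codebook))
    return [min(range(len(cents)), key=lambda k: sq_dist(part, cents[k]))
            for part, cents in zip(parts, codebook)]


def adc_distance(query_residual, code, codebook):
    """Distance between an uncompressed query residual and a PQ code."""
    parts = split(query_residual, len(codebook))
    return sum(sq_dist(p, cents[k])
               for p, cents, k in zip(parts, codebook, code))


# Toy setup (hypothetical): d = 4, M = 2 sub-quantizers, ksub = 2 centroids each.
codebook = [
    [[0.0, 0.0], [1.0, 1.0]],  # sub-quantizer 0, used by every cell
    [[0.0, 0.0], [2.0, 2.0]],  # sub-quantizer 1, used by every cell
]
coarse_centroids = [[0.0, 0.0, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0]]

# A database vector assigned to cell 1 is stored as the code of its residual.
x = [11.0, 11.0, 12.0, 12.0]
code = pq_encode(sub(x, coarse_centroids[1]), codebook)

# At search time, the query is also reduced to a residual for the probed cell.
q = [11.0, 10.0, 12.0, 12.0]
dist = adc_distance(sub(q, coarse_centroids[1]), code, codebook)
print(code, dist)
```

Only the coarse centroid differs from cell to cell; the same `codebook` encodes and decodes residuals in all of them.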

@mdouze
Contributor

mdouze commented Nov 10, 2020

Yes, there is a single set of sub-quantizers.
For the record, cell-specific sub-quantizers have been evaluated in

https://openaccess.thecvf.com/content_cvpr_2014/html/Kalantidis_Locally_Optimized_Product_2014_CVPR_paper.html

but this is not implemented in Faiss. The memory overhead is big, but it gives more accurate results, especially for recall at higher ranks.
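As a rough illustration of that overhead, the sketch below compares one shared centroid table with one table per cell, using the nlist and d from the example earlier in this thread. It assumes float32 centroids and ksub = 256, counts only the centroid tables, and ignores any other per-cell parameters a cell-specific scheme might need, so treat the numbers as an order-of-magnitude estimate.

```python
def centroid_table_bytes(d, ksub=256, float_bytes=4):
    """Bytes for one centroid table of size M * ksub * dsub (M * dsub == d)."""
    return ksub * d * float_bytes


def per_cell_tables_bytes(nlist, d, ksub=256, float_bytes=4):
    """Bytes if every one of the nlist cells kept its own centroid table."""
    return nlist * centroid_table_bytes(d, ksub, float_bytes)


nlist, d = 1024, 64  # values from the example in this thread

shared = centroid_table_bytes(d)
per_cell = per_cell_tables_bytes(nlist, d)
print(shared // 1024, "KiB shared vs", per_cell // 2**20, "MiB per-cell")
```

The per-cell cost grows linearly with nlist, whereas the shared table stays the same size.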

@WenqiJiang
Author

Thank you for that :) Closing this issue.
