[FR] Support building the index in HNSWx mode (experimental) #12

Open
1 task
soutubot opened this issue Jan 6, 2022 · 1 comment
Labels
wontfix This will not be worked on

Comments

@soutubot
Contributor

soutubot commented Jan 6, 2022

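  • Support building the index in HNSWx (IndexHNSWFlat) mode (experimental)
    Pros: an improved graph-based search method. Search is extremely fast, returning results within seconds even at the billion-vector scale, and recall is nearly on par with Flat, reaching as high as 97%. Search time complexity is O(log log n), so the number of candidate vectors barely matters. It also supports adding vectors in batches, which makes it well suited to online workloads with millisecond-level latency. (These claims come from unverified online sources.)
    Cons: building the index is extremely slow and memory usage is very high (the highest of any Faiss index, exceeding the memory taken by the raw vectors themselves).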
@Aloxaf Aloxaf added wontfix This will not be worked on and removed wontfix This will not be worked on labels Feb 16, 2022
@Aloxaf
Contributor

Aloxaf commented Feb 16, 2022

If you have a lot of RAM or the dataset is small, HNSW is the best option, it is a very fast and accurate index. The 4 <= M <= 64 is the number of links per vector, higher is more accurate but uses more RAM. The speed-accuracy tradeoff is set via the efSearch parameter. The memory usage is (d * 4 + M * 2 * 4) bytes per vector.

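According to the faiss wiki, HNSW is recommended only when there is plenty of RAM or the dataset is small. Going by the formula quoted above with M = 16, each vector would take 160 bytes, which is 5 times the memory used by the current method.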
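The arithmetic behind the 160-byte figure can be sketched as follows. This is a minimal illustration, not code from this project: the function name is hypothetical, and the 32 bytes of raw storage per vector is an assumption inferred from the 160-byte total and the 5x ratio, since the thread does not state it. For float32 vectors the raw storage would be d * 4, as in the quoted formula.

```python
def hnsw_bytes_per_vector(vector_bytes: int, m: int) -> int:
    """Per-vector memory: raw vector storage plus M * 2 links of 4 bytes each."""
    link_bytes = m * 2 * 4  # the "M * 2 * 4" term from the quoted formula
    return vector_bytes + link_bytes


# ASSUMPTION: 32 bytes of raw storage per vector. This value is inferred from
# the 160-byte total and the 5x ratio; it is not stated in the thread.
RAW_VECTOR_BYTES = 32
M = 16

total = hnsw_bytes_per_vector(RAW_VECTOR_BYTES, M)
ratio = total / RAW_VECTOR_BYTES
print(total, ratio)
```

Under these assumptions the link storage alone (128 bytes) is four times the raw vector, which is why the overhead dominates for small vectors.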

That said, using it in place of Flat as the quantizer is worth considering.
