Question about R, L parameter setup. #26

Closed · HannanKan opened this issue Oct 21, 2021 · 5 comments

@HannanKan

According to the README, the degree of the graph index is recommended to be set between 60 and 150, and the size of the search list during index building between 75 and 200. What is the reason for giving these reference value ranges?

@ShikharJ
Contributor

@HannanKan These are just empirical figures that we've found to work well for some general datasets. If you want a more specific answer, the theory of graph ANNS dictates that R should be greater than or equal to the intrinsic dimensionality of the dataset. Intrinsic dimensionality is a tricky notion here, because it depends on how you choose to measure dimensionality; it can be through PCA or some other measure of choice. For most human-generated (and some machine-generated) datasets, the intrinsic dimensionality is much lower than the actual dimensionality, and as such, graph ANNS algorithms give much better performance than algorithms which do not make use of this information.
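
A minimal sketch of one way to measure this, assuming a PCA-based notion of intrinsic dimensionality (the toy data and the 95% variance cutoff are illustrative assumptions, not something DiskANN computes for you):

```python
# Illustrative only: estimate intrinsic dimensionality as the number of
# principal components needed to explain ~95% of the variance.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(10_000, 128).astype(np.float32)  # placeholder dataset

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
intrinsic_dim = int(np.searchsorted(cumulative, 0.95) + 1)

# A rough lower bound to keep in mind when picking R for a graph index.
print("estimated intrinsic dimensionality:", intrinsic_dim)
```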

As to what value of L is suitable, we've found through our experiments that a value between 1.5x and 2x the value of R suffices for most datasets. You can always increase L as much as you want, but you'll only see diminishing returns while the build time blows up.
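
As a concrete illustration of that rule of thumb (plain arithmetic, not a library call):

```python
# Rule of thumb from above: build-time L roughly 1.5x to 2x of R.
R = 64
L_low, L_high = int(1.5 * R), 2 * R
print(f"for R = {R}, try L between {L_low} and {L_high}")  # 96 to 128
```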

Hope this helps.

@harsha-simhadri
Contributor

To add to Shikhar's points, since there is an element of randomness to the graph construction, the degree needs to be at least log n (n = number of points in the index), and we find that 2 log n to 4 log n is suitable. So for a billion points, 64 to 128 is reasonable.
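
A small sketch of that calculation, assuming log base 2 (which is what makes the 64 to 128 figure come out for a billion points):

```python
# Degree heuristic from above: R roughly between 2*log2(n) and 4*log2(n).
import math

n = 1_000_000_000                        # number of points in the index
log2_n = math.log2(n)                    # ~29.9
R_low, R_high = 2 * log2_n, 4 * log2_n   # ~60 to ~120
print(f"suggested degree R roughly in [{R_low:.0f}, {R_high:.0f}]")
```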

The search list size needed to hit a given recall is a function of the "hardness" of the dataset, and Shikhar has given some idea of how hardness could be quantified. In practice, we empirically try a few values to see what matches the scenario's requirements. For datasets like BIGANN-1B and DEEP-1B, this parameter range corresponds to reasonable results with a build time of around 4 days.
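
A hedged sketch of that empirical loop; `build_index` and `measure_recall` below are hypothetical stubs standing in for your own build and evaluation harness, not DiskANN APIs:

```python
# Hypothetical parameter sweep over the build-time search list size L.
import time

def build_index(R, L):       # stub: replace with your actual index build
    return {"R": R, "L": L}

def measure_recall(index):   # stub: replace with recall on held-out queries
    return 0.0

R = 64
for L in (96, 112, 128):     # roughly 1.5x to 2x of R
    start = time.time()
    index = build_index(R, L)
    recall = measure_recall(index)
    print(f"L = {L}: recall = {recall:.3f}, build time = {time.time() - start:.1f}s")
```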

@HannanKan
Author

Thanks for your explanation @ShikharJ @harsha-simhadri. Why does increasing L in the building phase lead to better search performance (recall and QPS)?

My understanding is: a large L results in a large visited set V (returned by the GreedySearch function), and a large V increases the probability of introducing long-range edges (in the RobustPrune function).

Is that right?
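
For reference, here is a minimal Python sketch of the GreedySearch loop as I understand it from the paper (the names and brute-force distance computations are illustrative, not the actual implementation):

```python
# Minimal sketch of the Vamana-style GreedySearch; illustrative names only.
import numpy as np

def greedy_search(graph, points, start, query, k, L):
    """graph: node -> list of out-neighbours, points: node -> vector."""
    dist = lambda p: np.linalg.norm(points[p] - query)
    candidates = {start}   # the size-L search list
    visited = set()        # V, which the build phase passes to RobustPrune

    while candidates - visited:
        p_star = min(candidates - visited, key=dist)   # closest unvisited node
        visited.add(p_star)
        candidates |= set(graph[p_star])               # expand its neighbours
        if len(candidates) > L:                        # keep only the L closest
            candidates = set(sorted(candidates, key=dist)[:L])

    top_k = sorted(candidates, key=dist)[:k]
    return top_k, visited   # larger L -> larger visited set V
```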

@ShikharJ
Contributor

@HannanKan Sorry for the late reply. Yes, this is correct.

@ShikharJ
Contributor

Closing for now.
