HttpANN algorithm to support language-agnostic implementations (Re: Issue #20) #35

alexklibisz · 2021-09-26T21:36:10Z

Thanks @maumueller, @gosha1128 and others for the fruitful discussion over in #20. I think I've arrived at an implementation that could fit the purpose of language-agnostic (big) ANN.

The HttpANN algorithm is designed to make HTTP calls to a server. The server executes all indexing and querying, thus enabling language-agnostic ANN implementations with minimal overhead. The only requirements for the server are:

1. It should implement the JSON-over-HTTP API documented below (copied from httpann.py). Note that this is a 1:1 copy of the BaseANN Python class API.
2. It should be able to read the vector dataset in the standard binary format used by this competition.

It could in theory even run remotely, although the intended use-case is that the server runs in the same container.

The overhead for data transfer and serialization is minimal. The server only needs to parse the 10k JSON-encoded query vectors and encode the resulting 10k lists of neighbors.
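For anyone who wants to check this on their own machine, here is a rough sketch that times the JSON encode/decode work for one batch of queries and one batch of results. It is not taken from httpann.py: the dimension of 100, `k=10`, and the helper names `make_queries` and `time_round_trip` are assumptions made for illustration, and the result payload shape (one list of integer ids per query) is a guess.

```python
import json
import random
import time


def make_queries(n, dim, seed=0):
    """Generate n random query vectors as plain lists of floats."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def time_round_trip(n=10_000, dim=100, k=10):
    """Time JSON encoding and decoding of one query batch and one result batch.

    Returns (elapsed seconds, request size in bytes, response size in bytes).
    """
    queries = make_queries(n, dim)
    start = time.perf_counter()
    # Client side: encode the whole batch once.
    request_body = json.dumps({"X": queries, "k": k})
    # Server side: parse the batch once.
    parsed = json.loads(request_body)
    # Stand-in for the server's answer (assumed shape): k integer ids per query.
    neighbors = [list(range(k)) for _ in parsed["X"]]
    response_body = json.dumps({"get_results": neighbors})
    # Client side: decode the results once.
    json.loads(response_body)
    elapsed = time.perf_counter() - start
    return (
        elapsed,
        len(request_body.encode("utf-8")),
        len(response_body.encode("utf-8")),
    )
```

Calling `time_round_trip()` with the defaults mirrors the 10k-query case described above. The numbers will vary by machine, and they cover serialization only, not the HTTP transfer itself.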

I also included an example implementation which uses scikit-learn. It's too slow for the large datasets, but it works on the smaller random-xs and random-range-xs. So it should be good enough to demonstrate that this algorithm works.

Here is the API that a server must implement:

| Method | Route | Request Body | Expected Status | Response Body |
| --- | --- | --- | --- | --- |
| POST | /init | dictionary of constructor arguments, e.g., `{ "metric": "euclidean", "dimension": 99 }` | 200 | `{ }` |
| POST | /load_index | `{ "dataset": <dataset name, e.g. "bigann-10m"> }` | 200 | `{ "load_index": }` |
| POST | /set_query_arguments | dictionary of query arguments | 200 | `{ }` |
| POST | /query | `{ "X": , "k": }` | 200 | `{ }` |
| POST | /range_query | `{ "X": , "radius": }` | 200 | `{ }` |
| POST | /get_results | `{ }` | 200 | `{ "get_results": }` |
| POST | /get_additional | `{ }` | 200 | `{ "get_additional": }` |
| POST | /get_range_results | `{ }` | 200 | `{ "get_range_results": <list of three 1-dimensional lists (lims, I, D)> }` |
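To make the routes above more concrete, here is a minimal sketch of what a server might look like, using only the Python standard library and a brute-force Euclidean search. It is not the example implementation from this PR. Several things are assumptions for illustration: the class name `BruteForceState`, the `make_handler` helper, filling `vectors` directly instead of reading the competition's binary dataset format, returning `False` from `/load_index`, and returning one list of neighbor ids per query from `/get_results`. The range-query and `get_additional` routes are left out.

```python
import json
import math
from http.server import BaseHTTPRequestHandler


class BruteForceState:
    """Holds the indexed vectors and the results of the latest query batch."""

    def __init__(self):
        self.vectors = []   # assumed to be filled by the server's own loader
        self.results = []
        self.metric = "euclidean"

    def dispatch(self, route, body):
        """Map a route and a parsed JSON body to (status, response dict)."""
        if route == "/init":
            # Only Euclidean distance is implemented in this sketch.
            self.metric = body.get("metric", "euclidean")
            return 200, {}
        if route == "/load_index":
            # Assumption: False means "no prebuilt index was loaded".
            return 200, {"load_index": False}
        if route == "/set_query_arguments":
            return 200, {}
        if route == "/query":
            # Results are stored, not returned; the client fetches them later.
            self.results = [self._nearest(q, body["k"]) for q in body["X"]]
            return 200, {}
        if route == "/get_results":
            return 200, {"get_results": self.results}
        return 404, {}

    def _nearest(self, query, k):
        """Return the ids of the k vectors closest to query."""
        dists = [(math.dist(query, v), i) for i, v in enumerate(self.vectors)]
        return [i for _, i in sorted(dists)[:k]]


def make_handler(state):
    """Build a request handler class that forwards POST bodies to state."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length) or b"{}")
            status, response = state.dispatch(self.path, body)
            payload = json.dumps(response).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return Handler
```

Serving it would look something like `HTTPServer(("localhost", 8080), make_handler(BruteForceState())).serve_forever()`, with `HTTPServer` imported from `http.server`; the port is arbitrary. Keeping the routing in `dispatch` separate from the socket handling means the same logic could be ported to any language or web framework.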

alexklibisz · 2021-09-26T21:50:29Z

algos.yaml

```diff
@@ -90,6 +110,17 @@ deep-10M:
              "nprobe=2,quantizer_efSearch=8",
              "nprobe=4,quantizer_efSearch=4",
              "nprobe=2,quantizer_efSearch=16"]
+    diskann-t2:
```

There were two sections called "deep-10M" so I just moved the "diskann-t2" section up here to deduplicate the sections.

alexklibisz · 2021-09-26T22:20:46Z

~~It looks like CI is failing because the fit method is never called. I think I can use the --rebuild flag to force this, but how is it working for the other algos without this flag?~~ Never mind, it looks like I just need to return False from load_index.

maumueller

This looks very nice, thanks @alexklibisz! Have you been able to measure the overhead vs a naive implementation?

My only suggestion here is to split up httpann.py into base-http.py (or whatever you deem to fit) and http-example-sklearn.py to show the interaction between the base http wrapper and the actual implementation that you suggest others to use.

maumueller · 2021-09-27T07:13:21Z

benchmark/algorithms/httpann.py

```diff
+    def query(self, X, k):
+        body = dict(X=[arr.tolist() for arr in X], k=k)
+        self.post("query", body, 200)
+
+    def range_query(self, X, radius):
+        body = dict(X=[arr.tolist() for arr in X], radius=radius)
+        self.post("range_query", body, 200)
```

I'm wondering what the performance penalty of this will be. I would be happy to

- change the arguments such that the query vector file is exposed, or
- add some kind of prepare step for providing the query vectors.

I've done some local testing, and I think it will be fine, since the HTTP overhead and JSON serialization are incurred only once for the whole batch of queries. It was a bigger deal with ann-benchmarks because that framework required a request and serialization for every query.
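To illustrate the "one request per batch" point, here is a small client-side sketch. It is not the code from httpann.py: `http_post`, `BatchClient`, the injectable `transport` argument, and the `requests_sent` counter are hypothetical names added so the request count is easy to observe, and the shape of the `/get_results` response is assumed.

```python
import json
import urllib.request


def http_post(url, body):
    """POST a JSON body and return (status, parsed JSON response)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status, json.loads(response.read() or b"{}")


class BatchClient:
    """Sends a whole batch of query vectors in a single POST."""

    def __init__(self, base_url, transport=http_post):
        self.base_url = base_url.rstrip("/")
        self.transport = transport  # swappable, e.g. for tests without a server
        self.requests_sent = 0

    def post(self, route, body, expected_status):
        status, response = self.transport(f"{self.base_url}/{route}", body)
        self.requests_sent += 1
        if status != expected_status:
            raise RuntimeError(f"{route}: expected {expected_status}, got {status}")
        return response

    def query(self, X, k):
        # One request carries every query vector, so the HTTP and JSON
        # cost is paid once per batch rather than once per query.
        body = {"X": [[float(v) for v in row] for row in X], "k": k}
        self.post("query", body, 200)

    def get_results(self):
        # Assumed response shape: {"get_results": [[ids...], ...]}
        return self.post("get_results", {}, 200)["get_results"]
```

With this layout, a batch of 10k queries costs two requests in total (`query` and `get_results`), whereas a per-query design would cost at least 10k.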

alexklibisz · 2021-09-27T13:16:56Z

> My only suggestion here is to split up httpann.py into base-http.py (or whatever you deem to fit) and http-example-sklearn.py to show the interaction between the base http wrapper and the actual implementation that you suggest others to use.

Sounds good. I'll break it into two files.

maumueller · 2021-09-28T11:11:54Z

Great, thanks. Is this ready to be merged?

alexklibisz · 2021-09-28T12:01:06Z

I think so. Please squash if you can, as there are several intermediate/incomplete commits in there.

alexklibisz · 2021-09-29T12:35:42Z

I'll resolve the conflicts, and I also need to add one bit of documentation. One moment.

maumueller · 2021-10-04T07:12:41Z

Sorry for not getting back to you earlier. Shall I squash and merge with main?

alexklibisz · 2021-10-04T15:43:05Z

No problem. Yes, good to go. Thanks!

maumueller · 2021-10-04T18:32:37Z

Looking forward to how this is going to be used. Thanks again.

alexklibisz added 10 commits September 19, 2021 21:09

WIP: half-baked stdin/stdout impl

3fe64d9

Runs end-to-end without crashing

ee235d5

Refactored and runs, but runs out of memory

211f707

Runs end-to-end with plausible results. Better comments and docs.

b851ccb

Setup script for range queries, but still need to implement them.

7055de7

model -> algorithm

3b60e19

Test script runs end to end. Might be ready for PR.

e9ab62b

Cleanup

6ae3d3b

Cleanup

3f3a9a8

Set library to match dockerfile name

a10ca84

More permissive load_index

49ae661

Return False from load_index

28db0af

alexklibisz changed the title ~~HTTPAnn algorithm to support language-angostic implementations (Re: Issue #20)~~ HttpANN algorithm to support language-angostic implementations (Re: Issue #20) Sep 27, 2021

maumueller self-requested a review September 27, 2021 07:02

maumueller self-assigned this Sep 27, 2021

maumueller reviewed Sep 27, 2021

Split into httpann.py and httpann_example.py. Made names more consistent.

b953d6e

alexklibisz added 2 commits September 29, 2021 08:38

Merge branch 'main' into language-agnostic

9e4d92b

Add doc row for /fit endpoint

c0d0a45

rakri force-pushed the main branch from a92e2d9 to 8180e0e Compare September 29, 2021 16:54

alexklibisz added 2 commits October 3, 2021 15:37

Use var args syntax for query_args

5fb34ce

Merge branch 'main' into language-agnostic

04cbeb7

maumueller merged commit 455aadc into harsha-simhadri:main Oct 4, 2021

maumueller mentioned this pull request Oct 4, 2021

Support for non-python implementations #20

Closed
