Adding angular distance for nn search #234

Closed
vgoat21 opened this issue Aug 21, 2023 · 5 comments

Comments

@vgoat21

vgoat21 commented Aug 21, 2023

I was wondering if it would be possible to add angular distance for nn search. I am currently relying on cosine distance as an alternative. The reason behind this is the limitation of cosine distance when dealing with small angles.

image

It's apparent that cosine distance produces very similar values for small angles, which makes it hard to distinguish subtle differences in vector orientation. The resulting ambiguity in these intervals makes it a poor fit, at least for my application.

Angular distance is sensitive not only to the direction but also to the magnitude of vectors. As depicted in the diagram, it yields more distinguishable values within these intervals, thus making it easier to assess the proximity of two vectors accurately.
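
The spacing described above can be checked numerically. The following is a minimal Python sketch, not pgvector code: it assumes cosine distance is 1 - cos(theta) and angular distance is theta / pi, and the helper names are made up for illustration.

```python
import math


def cosine_distance_from_angle(theta: float) -> float:
    """Cosine distance between two vectors separated by angle theta (radians)."""
    return 1.0 - math.cos(theta)


def angular_distance_from_angle(theta: float) -> float:
    """Angular distance normalised to [0, 1]: theta / pi."""
    return theta / math.pi


# Print both distances for a few angles to compare how the values are spaced.
for degrees in (1, 2, 5, 10, 45, 90):
    theta = math.radians(degrees)
    print(
        f"{degrees:>3} deg  "
        f"cosine={cosine_distance_from_angle(theta):.6f}  "
        f"angular={angular_distance_from_angle(theta):.6f}"
    )
```

For small angles, 1 - cos(theta) is approximately theta^2 / 2, so halving the angle shrinks the cosine distance roughly fourfold while the angular distance only halves. Both quantities increase monotonically with the angle on [0, pi], so they rank neighbours identically; the difference is only in how the values are spaced.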

In my application (and most likely many others) it is crucial to include these subtle differences to get a more accurate 'closeness'.

I think this would be a great addition to a very useful extension.

@phobrain

It looks potentially feasible to add such functions in

https://github.com/pgvector/pgvector/blob/master/src/vector.c

which I was just considering for Poincare disk distances as in

https://arxiv.org/pdf/1705.08039.pdf

and since I find it empirically allows more fine-grained histogram distance orderings, I wonder naively if the Poincare sphere would be useful in your app.

Authors, is adding a new distance function as simple as l2_distance() seems in vector.c?

@jkatz
Contributor

jkatz commented Aug 23, 2023

> is adding a new distance function as simple as l2_distance() seems in vector.c

Yes, it's relatively trivial, though note that (1) optimizing how the distance calculation is handled is key to many of the performance characteristics and (2) you need to set up the appropriate operator classes to allow it to be indexable.

For this, I'd be curious if you're noticing these results currently in pgvector. For ivfflat, vectors are normalized before they are stored in the index, so magnitude is not a factor.

@phobrain

Have any functions been tried and abandoned for speed reasons?

@ankane
Member

ankane commented Sep 2, 2023

Hi @vgoat21, thanks for the suggestion. I may include this at some point, but for now, you can do:

CREATE FUNCTION angular_distance(a vector, b vector) RETURNS float8
    LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE
    RETURN acos(1 - cosine_distance(a, b)) / pi();
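
The same formula can be sketched in Python to sanity-check values outside the database. This is an illustration only, not pgvector code: the function names are hypothetical stand-ins, and the clamp is an added assumption, since floating-point rounding can push the computed cosine slightly outside [-1, 1], where acos is undefined.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cos(angle between a and b); assumes non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def angular_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """acos(1 - cosine_distance) / pi, mirroring the SQL expression above."""
    similarity = 1.0 - cosine_distance(a, b)
    # Clamp to the domain of acos to guard against rounding error.
    similarity = max(-1.0, min(1.0, similarity))
    return math.acos(similarity) / math.pi
```

With this definition, orthogonal vectors come out at 0.5 and opposite vectors at 1. Scaling either vector leaves the result unchanged, because the lengths cancel in the cosine.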

It's also possible to enable indexing with the commands below, but there's no guarantee it won't break in a future release.

Edit: It looks like indexing only works if the function is defined in C for some reason.

Edit 2: It looks like indexing works if the function is defined in plpgsql (but again, indexing isn't officially supported and could break in the future):

CREATE FUNCTION angular_distance(vector, vector) RETURNS float8
    AS 'BEGIN RETURN acos(1 - cosine_distance($1, $2)) / pi(); END;'
    LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE;
CREATE OPERATOR <@> (
    LEFTARG = vector, RIGHTARG = vector, PROCEDURE = angular_distance,
    COMMUTATOR = '<@>'
);

CREATE OPERATOR CLASS vector_angular_ops
    FOR TYPE vector USING ivfflat AS
    OPERATOR 1 <@> (vector, vector) FOR ORDER BY float_ops,
    FUNCTION 1 vector_spherical_distance(vector, vector),
    FUNCTION 2 vector_norm(vector),
    FUNCTION 3 vector_spherical_distance(vector, vector),
    FUNCTION 4 vector_norm(vector);

CREATE OPERATOR CLASS vector_angular_ops
    FOR TYPE vector USING hnsw AS
    OPERATOR 1 <@> (vector, vector) FOR ORDER BY float_ops,
    FUNCTION 1 vector_spherical_distance(vector, vector),
    FUNCTION 2 vector_norm(vector);

and

CREATE INDEX ON items USING ivfflat (embedding vector_angular_ops) WITH (lists = 100);
-- or
CREATE INDEX ON items USING hnsw (embedding vector_angular_ops);

ankane closed this as completed Sep 2, 2023
@ankane
Member

ankane commented Sep 2, 2023

Added the function in the angular_distance branch and to the list of ideas (#27).
