Distance measures for dense and sparse vectors #37947

Merged
merged 11 commits into from Feb 20, 2019

8 participants
@mayya-sharipova
Contributor

commented Jan 29, 2019

Introduce painless functions of
cosineSimilarity and dotProduct distance
measures for dense and sparse vector fields.

```js
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "cosineSimilarity(params.queryVector, doc['my_dense_vector'])",
        "params": {
          "queryVector": [4, 3.4, -1.2]
        }
      }
    }
  }
}
```

```js
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "cosineSimilaritySparse(params.queryVector, doc['my_sparse_vector'])",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 140.8, "4545": 111.0}
        }
      }
    }
  }
}
```
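
For readers unfamiliar with the measure, here is a minimal Java sketch of what a cosine similarity between a query vector and a dense document vector computes. The class and method names are hypothetical and this is not the code added by the PR. It assumes both vectors have the same number of dimensions and rejects anything else.

```java
// Hypothetical illustration; not the implementation from this PR.
final class DenseCosineSketch {

    // cosine(q, d) = dot(q, d) / (|q| * |d|), accumulated in double precision.
    // A zero-magnitude vector would produce NaN here; this sketch does not handle it.
    static double cosineSimilarity(double[] query, float[] doc) {
        if (query.length != doc.length) {
            throw new IllegalArgumentException("vectors must have the same number of dimensions");
        }
        double dot = 0;
        double queryNorm = 0;
        double docNorm = 0;
        for (int i = 0; i < query.length; i++) {
            dot += query[i] * doc[i];
            queryNorm += query[i] * query[i];
            docNorm += (double) doc[i] * doc[i];
        }
        return dot / (Math.sqrt(queryNorm) * Math.sqrt(docNorm));
    }

    public static void main(String[] args) {
        double[] query = {4, 3.4, -1.2};
        float[] doc = {4.5f, 3.4f, -1.2f};
        System.out.println(cosineSimilarity(query, doc));
    }
}
```

The result lies in [-1, 1] up to floating-point rounding, which matters for the non-negative score requirement raised later in this thread.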

Closes #31615

Distance measures for dense and sparse vectors
Introduce painless functions of
cosineSimilarity and dotProduct distance
measures for dense and sparse vector fields.

```js
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "cosineSimilarity(params.queryVector, doc['my_dense_vector'].value)",
        "params": {
          "queryVector": [4, 3.4, -1.2]
        }
      }
    }
  }
}
```

```js
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "cosineSimilaritySparse(params.queryVector, doc['my_sparse_vector'].value)",
        "params": {
          "queryVector": {"2": -0.5, "10" : 111.3, "50": -13.0, "113": 14.8, "4545": -156.0}
        }
      }
    }
  }
}
```

Closes #31615
@jpountz
Contributor

left a comment

I only had a quick look, one concern that I have is that we are leaking the internal representation of vector fields.

I believe we should instead expose vectors in scripts via a dedicated ScriptDocValues sub-class, like we are doing for dates for instance, or only give access to vector fields via functions, whose signature would look like dotProduct(queryVector, fieldName).

@@ -9,7 +9,8 @@ not exceed 500. The number of dimensions can be
different across documents. A `dense_vector` field is
a single-valued field.

-These vectors can be used for document scoring.
+These vectors can be used for
+{ref}/query-dsl-script-score-query.html#vector-functions[document scoring].

@jpountz

jpountz Jan 29, 2019

Contributor

Is there a reason not to use an internal link, e.g. <<vector-functions,document scoring>>?

@mayya-sharipova

mayya-sharipova Jan 30, 2019

Author Contributor

Thanks Adrien. I think we can use internal links only to reference within the same document. What I wanted to do here is reference a section of the external document

@jpountz

jpountz Jan 31, 2019

Contributor

I'm still a bit confused, this is the same document, isn't it?

@mayya-sharipova

mayya-sharipova Feb 5, 2019

Author Contributor

@jpountz Sorry Adrien, I meant that inside one asciidoc doc dense-vector.asciidoc we want to reference a section of another asciidoc doc script-score-query.asciidoc.

We can indeed use an easier format : <<query-dsl-script-score-query,document_scoring>>, but this will link to the whole document. And as I understood after talking with the documentation team, the only way to link to the section of another doc is to use this full html link.

@jpountz

jpountz Feb 13, 2019

Contributor

ok

@mayya-sharipova

mayya-sharipova Feb 12, 2019

Author Contributor

@jpountz Sorry Adrien, please disregard my previous comments. I have followed your advice to use internal links and it looks like documentation CI passed.

this.queryVectorMagnitude = (float) Math.sqrt(dotProduct);
}

public float cosineSimilarity(BytesRef docVectorBR) {

@jpountz

jpountz Jan 29, 2019

Contributor

I would make these methods return a double. We only support floats at index time because of space constraints, but this isn't a problem here.

@mayya-sharipova

mayya-sharipova Jan 29, 2019

Author Contributor

@jpountz Thanks for the review, Adrien. I will change this to double. The main reason for float was that it is a document's score, and all other Scorers are returning floats.

@mayya-sharipova

Contributor Author

commented Jan 30, 2019

@jpountz Thanks for the initial review, Adrien. I have tried to address your comments and this PR is ready for the review when you have time:

  • I have made functions return double instead of float
  • I have modified the format of the functions from dotProduct(queryVector, doc['my_dense_vector'].value) to dotProduct(queryVector, doc['my_dense_vector'])
    I could not make them exactly as you suggested, dotProduct(queryVector, fieldName), inside the painless script.

About exposing vectors in scripts via a dedicated ScriptDocValues sub-class - this was already initially implemented through VectorScriptDocValues.java.

About leaking the internal representation of vector fields - I have made the getValue() method of VectorScriptDocValues package private, so that vector fields
are NOT accessible in scripts, sorting, or aggs outside our distance functions. Or are you concerned that vector values are returned as a part of the search request as below?

"hits": [
      {
        "_index": "dindex",
        "_type": "_doc",
        "_id": "2",
        "_score": 1.0000001,
        "_source": {
          "my_text": "text2",
          "my_vector": [
            4.5,
            3.4,
            -1.2
          ]
        }
      },
@LiuGangR


commented Jan 31, 2019

How do I use cosineSimilarity?
I am using a snapshot built from your branch 'vector-field-query'.

It just tells me '"lang":"painless","caused_by":{"type":"illegal_argument_exception","reason":"Variable [my_feature] is not defined'

{"error":{"root_cause":[{"type":"script_exception","reason":"compile error","script_stack":["... (params.queryVector, doc[my_feature].value)"," ^---- HERE"],"script":"cosineSimilarity(params.queryVector, doc[my_feature].value)","lang":"painless"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"test_index","node":"mZKn55wJSSi-vs3hMxocbQ","reason":{"type":"query_shard_exception","reason":"script_score: the script could not be loaded","index_uuid":"Q8lJHJLLRIatXSOY1_2UJg","index":"test_index","caused_by":{"type":"script_exception","reason":"compile error","script_stack":["... (params.queryVector, doc[my_feature].value)"," ^---- HERE"],"script":"cosineSimilarity(params.queryVector, doc[my_feature].value)","lang":"painless","caused_by":{"type":"illegal_argument_exception","reason":"Variable [my_feature] is not defined."}}}}],"caused_by":{"type":"script_exception","reason":"compile error","script_stack":["... (params.queryVector, doc[my_feature].value)"," ^---- HERE"],"script":"cosineSimilarity(params.queryVector, doc[my_feature].value)","lang":"painless","caused_by":{"type":"illegal_argument_exception","reason":"Variable [my_feature] is not defined."}}},"status":400}

and this is my mapping
{ "test_index": { "mappings": { "properties": { "my_feature": { "type": "dense_vector" } } } } }

@jpountz
Contributor

left a comment

Thanks Mayya, I like this approach much more. I left some minor comments. One additional thing that would be nice to address would be to make sure that users get a nice error if they call the sparse functions on dense vectors or vice-versa, I have the feeling that users would get cryptic decoding errors if they do that with the current state of your PR?


@Override
public SortedBinaryDocValues getBytesValues() {
    return null;

@jpountz

jpountz Jan 31, 2019

Contributor

can you throw an exception instead?

@@ -74,6 +74,108 @@ to be the most efficient by using the internal mechanisms.
--------------------------------------------------
// NOTCONSOLE

[[vector-functions]]
===== Distance functions for vector fields
These functions are used to calculate distances

@jpountz

jpountz Jan 31, 2019

Contributor

Let's maybe avoid mentioning "distance", since e.g. cosineSimilarity measures the similarity between two vectors rather than their distance?

// NOTCONSOLE

NOTE: If a document doesn't have a value for a vector field on which
a distance function is executed, 0 will be returned as a result.

@jpountz

jpountz Jan 31, 2019

Contributor

Let's also clarify what happens for dense vectors if they don't have the same number of dimensions?

public static int[] decodeSparseVectorDims(BytesRef vectorBR) {
    if (vectorBR == null) {
        throw new IllegalStateException("A document doesn't have a value for a vector field!");
    }

@jpountz

jpountz Jan 31, 2019

Contributor

Shouldn't this be an illegal argument exception?

int i = 0;
for (Map.Entry<String, Number> dimValue : queryVector.entrySet()) {
    queryDims[i] = Integer.parseInt(dimValue.getKey());
    queryValues[i] = dimValue.getValue().floatValue();

@jpountz

jpountz Jan 31, 2019

Contributor

s/floatValue/doubleValue/?

double dotProduct = 0;
int i = 0;
for (Map.Entry<String, Number> dimValue : queryVector.entrySet()) {
    queryDims[i] = Integer.parseInt(dimValue.getKey());

@jpountz

jpountz Jan 31, 2019

Contributor

catch the NumberFormatException to return a more user-friendly exception?
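
One way to act on this suggestion is sketched below in Java. It is only an illustration: the class name, method name, and error message are hypothetical and are not taken from the PR. It converts the string keys of a sparse query vector into integer dimensions and rethrows a malformed key as an `IllegalArgumentException` that names the offending key.

```java
import java.util.Map;

// Hypothetical illustration of the reviewer's suggestion; not the PR's code.
final class SparseQueryParseSketch {

    // Parses each key as an integer dimension. A malformed key is reported with
    // an IllegalArgumentException that keeps the original exception as its cause.
    static int[] parseDims(Map<String, Number> queryVector) {
        int[] dims = new int[queryVector.size()];
        int i = 0;
        for (Map.Entry<String, Number> entry : queryVector.entrySet()) {
            try {
                dims[i] = Integer.parseInt(entry.getKey());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                    "Failed to parse query vector dimension [" + entry.getKey()
                        + "]; dimensions must be integers", e);
            }
            i++;
        }
        return dims;
    }
}
```

Keeping the `NumberFormatException` as the cause preserves the stack trace for debugging while the top-level message stays readable for the user.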

// calculate docVector magnitude
double dotProduct = 0;
for (float value : docValues) {
    dotProduct += value * value;

@jpountz

jpountz Jan 31, 2019

Contributor

cast one of the values to a double to have better accuracy and avoid overflows?
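
The effect the reviewer describes can be reproduced with a small standalone Java sketch. The class and method names are hypothetical. In Java, `value * value` with two `float` operands is evaluated in `float` arithmetic, so a large component can overflow to `Infinity` before the result is widened and added to the `double` accumulator.

```java
// Hypothetical illustration; not the PR's code.
final class MagnitudeSketch {

    // The product is computed as a float and may overflow before widening.
    static double sumSquaresFloatMath(float[] values) {
        double sum = 0;
        for (float value : values) {
            sum += value * value;
        }
        return sum;
    }

    // Casting one operand promotes the whole multiplication to double arithmetic.
    static double sumSquaresDoubleMath(float[] values) {
        double sum = 0;
        for (float value : values) {
            sum += (double) value * value;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] values = {1e20f, 1e20f};
        System.out.println(sumSquaresFloatMath(values));  // product overflows in float
        System.out.println(sumSquaresDoubleMath(values)); // product stays finite in double
    }
}
```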


VectorDVAtomicFieldData(BinaryDocValues values) {
    super();
    this.values = values;

@jpountz

jpountz Jan 31, 2019

Contributor

let's take a LeafReader and a String field like other impls do and re-pull binary doc values each time, this way calling getScriptDocValues() multiple times on the same AtomicFieldData instance will work as expected

}

// package private access only for {@link ScoreScriptUtils}
BytesRef getValue() {

@jpountz

jpountz Jan 31, 2019

Contributor

let's call it something like getEncodedValue to clarify what it is about?

@jpountz

Contributor

commented Jan 31, 2019

@LiuGangR You need to put quotes around the field name.

@LiuGangR


commented Jan 31, 2019

@jpountz Thanks!
This query is working.
{ "query": { "script_score": { "query": { "match_all": {} }, "script": { "source": "dotProduct(params.queryVector, doc[\"my_feature\"])", "params": { "queryVector": [4, 3.4, -1.2] } } } } }

@LiuGangR


commented Jan 31, 2019

But there is a new problem:

script score function must not produce negative scores

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"script score function must not produce negative scores, but got: [-0.1967234265776135]"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"test_index","node":"mZKn55wJSSi-vs3hMxocbQ","reason":{"type":"illegal_argument_exception","reason":"script score function must not produce negative scores, but got: [-0.1967234265776135]"}}],"caused_by":{"type":"illegal_argument_exception","reason":"script score function must not produce negative scores, but got: [-0.1967234265776135]","caused_by":{"type":"illegal_argument_exception","reason":"script score function must not produce negative scores, but got: [-0.1967234265776135]"}}},"status":400}

@jpountz

Contributor

commented Jan 31, 2019

This is a good point, we should update examples so that they may only create positive scores, regardless of what vectors are indexed.
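
As an illustration of how a score can be kept non-negative, the Java sketch below shifts a cosine similarity by a constant. The offset of 1.0 and the names are assumptions made for this example, not something stated in the thread. Because cosine similarity lies in [-1, 1], adding 1.0 maps it into [0, 2]; the clamp guards against a value such as -1.0000000000000002 produced by floating-point rounding.

```java
// Hypothetical illustration; the offset and the clamp are assumptions for this example.
final class NonNegativeScoreSketch {

    // Maps a cosine similarity from [-1, 1] into [0, 2] and never returns a negative value.
    static double shiftedScore(double cosine) {
        return Math.max(0.0, cosine + 1.0);
    }

    public static void main(String[] args) {
        // The negative value reported in the error above becomes a valid score.
        System.out.println(shiftedScore(-0.1967234265776135));
    }
}
```

A constant offset preserves the ranking of documents. Note that the same trick does not work for an unbounded measure such as a raw dot product, which has no fixed lower bound.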

@LiuGangR


commented Feb 1, 2019

@jpountz
That is cool. Which version do you plan to support this in?

@jpountz

Contributor

commented Feb 1, 2019

@LiuGangR Hopefully 7.1.

@LiuGangR


commented Feb 1, 2019

@jpountz
Another question: if I want to search a 'dense_vector' field, is 'cosineSimilarity' the only way? And is there a default vector query?
Thanks!

@mayya-sharipova

Contributor Author

commented Feb 5, 2019

@LiuGangR yes, the only way to use dense_vector or sparse_vector in queries is through cosineSimilarity and dotProduct functions
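
For the sparse variants, a dot product only needs the dimensions that both vectors share. The Java sketch below shows one common way to compute it, a two-pointer merge over sorted dimension arrays. The names are hypothetical, and the assumption that dimensions are stored sorted in ascending order is made for this example only; the thread does not describe the PR's internal layout.

```java
// Hypothetical illustration; assumes both dimension arrays are sorted ascending.
final class SparseDotProductSketch {

    // Walks both sorted dimension lists in step and multiplies values only
    // where the same dimension appears in the query and in the document.
    static double dotProduct(int[] queryDims, double[] queryValues, int[] docDims, float[] docValues) {
        double dot = 0;
        int q = 0;
        int d = 0;
        while (q < queryDims.length && d < docDims.length) {
            if (queryDims[q] == docDims[d]) {
                dot += queryValues[q] * docValues[d];
                q++;
                d++;
            } else if (queryDims[q] < docDims[d]) {
                q++;
            } else {
                d++;
            }
        }
        return dot;
    }
}
```

Dimensions present in only one of the two vectors contribute nothing, which is equivalent to treating the missing value as 0.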

@mayya-sharipova

Contributor Author

commented Feb 5, 2019

@jpountz Thanks Adrien for another review. I have addressed all your feedback except 1 comment, and this PR is ready for another round of review whenever you have time.

Unaddressed feedback:

One additional thing that would be nice to address would be to make sure that users get a nice error if they call the sparse functions on dense vectors or vice-versa, I have the feeling that users would get cryptic decoding errors if they do that with the current state of your PR?

Users can make two mistakes here:

  1. Provide a query vector in the wrong format. Here we have some safeguards for parseInt; otherwise the painless script engine will complain and I can't do anything about it (e.g. queryVector is expected to be a Map but an Array was provided).
  2. Provide a document vector in the wrong format (dense versus sparse). It looks like a user will not see failures here, but will see unexpected scores (either 0, or very large negative float numbers). The only way to prevent this is to have the first byte in BytesRef be a special value that tells us whether the encoded vector is dense or sparse. What do you think?
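
The marker-byte idea in the second item can be sketched in Java as follows. This is a hypothetical encoding written only to illustrate the proposal; it is not Elasticsearch's actual vector format, and the constants and names are assumptions.

```java
import java.nio.ByteBuffer;

// Hypothetical encoding used only to illustrate the idea; not Elasticsearch's format.
final class TypedVectorEncodingSketch {
    static final byte DENSE = 0;
    static final byte SPARSE = 1;

    // Writes a one-byte type marker followed by the float values.
    static byte[] encodeDense(float[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(1 + Float.BYTES * values.length);
        buffer.put(DENSE);
        for (float value : values) {
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    // Rejects input whose marker is not DENSE instead of decoding it into garbage.
    static float[] decodeDense(byte[] encoded) {
        ByteBuffer buffer = ByteBuffer.wrap(encoded);
        if (buffer.get() != DENSE) {
            throw new IllegalArgumentException("expected a dense vector but found a different encoding");
        }
        float[] values = new float[buffer.remaining() / Float.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getFloat();
        }
        return values;
    }
}
```

The cost is one extra byte per stored vector, in exchange for a clear error when a dense function is called on a sparse field or vice versa.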

About changing the encoding for vector fields, I was also thinking of possibly encoding the magnitude of a document vector, so as not to calculate it each time. What do you think about this?

@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019

@mayya-sharipova mayya-sharipova added v7.2.0 and removed v8.0.0 labels Feb 8, 2019

@LiuGangR


commented Feb 12, 2019

@jpountz
I built the source today from the latest commit of vector-field-query, and used the same data and search that succeeded with the previous build. But now it fails.
This is the log:

{"error":{"root_cause":[{"type":"script_exception","reason":"runtime error","script_stack":["cosineSimilarity(params.queryVector, doc[\"my_feature\"])"," ^---- HERE"],"script":"cosineSimilarity(params.queryVector, doc[\"my_feature\"])","lang":"painless"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"test_index","node":"1FxxsdfiRfGh_O_3OAe6ZA","reason":{"type":"script_exception","reason":"runtime error","script_stack":["cosineSimilarity(params.queryVector, doc[\"my_feature\"])"," ^---- HERE"],"script":"cosineSimilarity(params.queryVector, doc[\"my_feature\"])","lang":"painless","caused_by":{"type":"class_cast_exception","reason":"class org.elasticsearch.index.query.VectorScriptDocValues cannot be cast to class org.apache.lucene.util.BytesRef (org.elasticsearch.index.query.VectorScriptDocValues is in unnamed module of loader java.net.FactoryURLClassLoader @46046c06; org.apache.lucene.util.BytesRef is in unnamed module of loader 'app')"}}}]},"status":400}

@jpountz
Contributor

left a comment

+1 Thanks @mayya-sharipova.

@mayya-sharipova mayya-sharipova merged commit 3260fd1 into elastic:master Feb 20, 2019

9 checks passed

CLA: Commit author is a member of Elasticsearch
elasticsearch-ci/1: Build finished.
elasticsearch-ci/2: Build finished.
elasticsearch-ci/bwc: Build finished.
elasticsearch-ci/default-distro: Build finished.
elasticsearch-ci/docbldesx: Build finished.
elasticsearch-ci/docs-check: Build finished.
elasticsearch-ci/oss-distro-docs: Build finished.
elasticsearch-ci/packaging-sample: Build finished.

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Feb 20, 2019

Distance measures for dense and sparse vectors (elastic#37947)

@jpountz jpountz added the v8.0.0 label Feb 20, 2019

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Feb 22, 2019

Distance measures for dense and sparse vectors (elastic#37947)

mayya-sharipova added a commit that referenced this pull request Feb 23, 2019

Backport distance functions vectors (#39330)
Distance functions for dense and sparse vectors

Backport for #37947, #39313
@wmelton


commented May 13, 2019

@mayya-sharipova - For clarification, does this native vector function use source values for the computations or the document values? I only ask because there seem to be performance degradations any time source values are accessed at query time for queries like this.

Only asking because the documentation seems to suggest the use of _source values - https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl-script-score-query.html#vector-functions

Also - do you have any performance numbers you've run/tested? Someone mentioned this feature was being added and said a test with 5 Million documents with vectors of dim=300 took 5 seconds to return results, which seems like pretty anemic response times.

@mayya-sharipova

Contributor Author

commented May 17, 2019

@wmelton Answering your questions:

For clarification, does this native vector function use source values for the computations or the document values?

We use binary document values, we encode vectors as binaries during indexing, and decode them back to numeric vectors during search.

do you have any performance numbers you've run/tested?

No, we currently don't have any, but we plan to work on adding some benchmarks. Vector functions use linear scan over all matched docs, so the response time should increase linearly with the number of matched docs.
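
To make the cost model concrete, the following Java sketch shows a brute-force scan of the kind described. It is a simplified stand-in written for this example with hypothetical names, not Elasticsearch code: every candidate is scored once, so the work grows with the number of candidates multiplied by the number of dimensions.

```java
// Hypothetical illustration of a brute-force scan; not Elasticsearch code.
final class LinearScanSketch {

    // Scores every candidate with a dot product and returns the index of the
    // highest-scoring one, or -1 when there are no candidates.
    static int bestMatch(double[] query, float[][] candidates) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int doc = 0; doc < candidates.length; doc++) {
            double score = 0;
            for (int i = 0; i < query.length; i++) {
                score += query[i] * candidates[doc][i];
            }
            if (score > bestScore) {
                bestScore = score;
                best = doc;
            }
        }
        return best;
    }
}
```

Narrowing the inner query so that fewer documents match is the direct way to reduce the work in this model.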

Also, I would like to note that vector fields are an experimental feature, and the APIs and the way vectors are indexed and encoded may change in a non-backward-compatible way.

@wmelton


commented May 19, 2019

Hi @mayya-sharipova -

Thank you for your responses.

Regarding "Vector functions use linear scan over all matched docs, so the response time should increase linearly with the number of matched docs." - I think taking the linear approach for this is a mistake, personally.

The pL2AP algorithm and Facebook's open source FAISS (Fast Similarity Search) both highlight ways to parallelize the search space. I think implementing a linear search approach will be frustrating to the type of users who are actually the most likely to want to use the dense or sparse vector field type you are proposing to add.

@mayya-sharipova

Contributor Author

commented May 22, 2019

@wmelton Thanks for your comment. Indeed linear scan would not scale, and it is intended mostly to score a limited set of documents.

About the FAISS library: the speedups there are based on hardware acceleration and approximate kNN algorithms. We currently don't have plans to employ hardware acceleration, but we are exploring algorithms for approximate kNN.

@ra1ski


commented May 27, 2019

@mayya-sharipova
Hi! Is there any chance to use long dense vectors to compute cosine distance?
I have these kinds of vectors
[0.7831882238388062, 0.8473913073539734, 0.6641695499420166...]

with 200 floating point numbers

@ra1ski


commented May 28, 2019

@mayya-sharipova
Hi! Is there any chance to use long dense vectors to compute cosine distance?
I have these kinds of vectors
[0.7831882238388062, 0.8473913073539734, 0.6641695499420166...]

with 200 floating point numbers

Your example above with "queryVector": [ 4.5, 3.4, -1.2] works fine, but when it comes to [0.7831882238388062, 0.8473913073539734, 0.6641695499420166...] vectors, I get an error:
"caused_by": { "type": "script_exception", "reason": "compile error", "script_stack": [ "cosineSimilarity(params.q ...", "^---- HERE" ], "script": "cosineSimilarity(params.queryVector, doc['vector'])", "lang": "painless", "caused_by": { "type": "illegal_argument_exception", "reason": "Unknown call [cosineSimilarity] with [2] arguments." } }

@mayya-sharipova

Contributor Author

commented May 28, 2019

@ra1ski What do you mean by "long dense vectors"? Do you mean to use 200 dimensions? Yes, you can use up to 1024 dimensions. It should be fine.
I am not sure why you are experiencing this error. Can you provide the whole query? Also are you testing this against the current master?

@ra1ski


commented May 29, 2019

@ra1ski What do you mean by "long dense vectors"? Do you mean to use 200 dimensions? Yes, you can use up to 1024 dimensions. It should be fine.
I am not sure why you are experiencing this error. Can you provide the whole query? Also are you testing this against the current master?

Yes, 200 dimensions.
I'm using it against 7.0.0. master, tried with 7.1.1 also

Here is the query

{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "cosineSimilarity(params.queryVector, doc['vector'])",
        "params": {
          "queryVector": [0.7831882238388062, 0.8473913073539734, 0.6641695499420166, -0.7800988554954529, 0.6427151560783386, 0.8618375062942505, -0.7508959174156189, 0.8940073251724243, -0.8382183313369751, -0.8465797305107117, 0.8887408375740051, 0.8348124623298645, 0.7685972452163696, -0.8586599230766296, 0.7378193140029907, -0.7119467854499817, -0.8077011108398438, 0.8601088523864746, 0.8935535550117493, 0.6392208337783813, 0.8716743588447571, -0.7871374487876892, 0.6682323217391968, -0.8151301145553589, -0.8227899670600891, -0.7399943470954895, -0.897373378276825, 0.8426622152328491, 0.8269796371459961, 0.8424233198165894, 0.8509830236434937, -0.7777097821235657, 0.8377213478088379, 0.9059052467346191, 0.7352653741836548, -0.7400990128517151, -0.8934587240219116, -0.9130118489265442, -0.8574285507202148, -0.8946468234062195, 0.8552821278572083, 0.8763160705566406, -0.7989016771316528, -0.642711341381073, -0.7476733922958374, -0.8486865758895874, 0.8278630971908569, -0.8525271415710449, -0.8806391954421997, -0.6730614304542542, -0.881908118724823, 0.7430080771446228, 0.7847618460655212, 0.8260719180107117, -0.8224948644638062, -0.7607067823410034, 0.8367544412612915, 0.20206642150878906, 0.7692943215370178, -0.8679789304733276, -0.7517973780632019, -0.8642300367355347, -0.7322789430618286, -0.8890762329101562, -0.8113778829574585, -0.8182528614997864, -0.8263254165649414, 0.8806875944137573, -0.8628260493278503, 0.838936984539032, 0.8677369952201843, -0.776382565498352, 0.8289804458618164, 0.6592877507209778, -0.8425590395927429, -0.763074517250061, 0.8569432497024536, -0.7417001128196716, 0.8681409955024719, -0.8540714979171753, -0.8500930070877075, -0.8368064761161804, -0.8406449556350708, -0.8733716011047363, -0.8958595991134644, 0.8130819201469421, -0.8314911723136902, 0.8423287272453308, 0.8449920415878296, -0.8795095682144165, 0.7511520981788635, -0.8035956621170044, 0.7193001508712769, 0.7730565071105957, -0.857988178730011, 0.8187726140022278, 
0.831302285194397, 0.8996239900588989, -0.863531231880188, 0.8358138799667358, -0.8426796197891235, 0.8390976190567017, 0.7986222505569458, -0.8568884134292603, 0.8369844555854797, 0.8447090983390808, 0.8311792612075806, -0.8208156824111938, -0.7700560092926025, -0.784808874130249, -0.874031662940979, 0.8473763465881348, 0.8083603978157043, 0.8634394407272339, 0.8724079132080078, -0.7952577471733093, 0.5091663599014282, 0.656829833984375, -0.8029653429985046, -0.8171727061271667, 0.8314194679260254, -0.8559287190437317, 0.8022019267082214, 0.7917070388793945, -0.8446627855300903, -0.7673274278640747, 0.832277774810791, -0.8024963140487671, 0.9498147964477539, -0.7452983856201172, 0.8978539705276489, 0.8834426999092102, 0.8543949127197266, 0.8466156721115112, -0.8207280039787292, 0.8191858530044556, -0.8309515118598938, 0.7519159317016602, 0.8341091275215149, -0.8656532168388367, 0.8573458790779114, -0.8247408866882324, 0.9135391116142273, -0.8272571563720703, -0.8448845148086548, -0.8408781290054321, -0.8409822583198547, -0.842566967010498, 0.7356223464012146, 0.8904960751533508, 0.8448322415351868, -0.8642748594284058, 0.8605462908744812, 0.8045945167541504, -0.8715876340866089, -0.8079540133476257, -0.8474785089492798, -0.8472393155097961, 0.8432945013046265, -0.8253397941589355, 0.7905577421188354, 0.7081928253173828, 0.6722716093063354, 0.8101333379745483, -0.8465112447738647, 0.8858150243759155, 0.8352972269058228, -0.7904651761054993, -0.8659583330154419, -0.8847810626029968, -0.762391209602356, -0.7752716541290283, -0.7860286831855774, -0.8350412249565125, -0.8377161026000977, -0.8326281309127808, 0.6579743027687073, -0.8490581512451172, 0.7932018041610718, 0.7292879819869995, 0.8307300806045532, 0.8333244323730469, -0.7778127193450928, -0.8621459007263184, -0.8240952491760254, 0.8149698376655579, 0.8036678433418274, 0.7759568691253662, -0.8074528574943542, -0.8319423794746399, -0.685379683971405, -0.6155311465263367, 0.771338701248169, 0.7577664256095886, 
0.7837430238723755, -0.7604954838752747, 0.8120626211166382, -0.8959243893623352, -0.7081544995307922, 0.8636442422866821]
        }
      }
    }
  }
}
@mayya-sharipova

Contributor Author

commented Jun 4, 2019

@ra1ski Vector functions are available starting from 7.2

@mayya-sharipova mayya-sharipova added v7.3.0 and removed v7.2.0 labels Jun 14, 2019

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Jul 9, 2019

Add l1norm and l2norm distances for vectors
Add L1norm - Manhattan distance
Add L2norm - Euclidean distance
relates to elastic#37947
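
For reference, the two distances named in this commit message can be sketched in Java as follows. The method names echo the commit title, but the class, signatures, and length check are assumptions made for this example rather than the follow-up PR's code.

```java
// Hypothetical illustration; signatures are assumptions, not the follow-up PR's API.
final class NormDistanceSketch {

    // L1 norm (Manhattan distance): sum of absolute component differences.
    static double l1norm(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // L2 norm (Euclidean distance): square root of the sum of squared differences.
    static double l2norm(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same number of dimensions");
        }
    }
}
```

Unlike cosine similarity, both are distances: smaller values mean closer vectors, so a scoring script would typically need to invert them so that closer documents score higher.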

pull bot pushed a commit to indux/elasticsearch that referenced this pull request Jul 11, 2019

Add l1norm and l2norm distances for vectors (elastic#44116)
* Add l1norm and l2norm distances for vectors

Add L1norm - Manhattan distance
Add L2norm - Euclidean distance
relates to elastic#37947

* Address Christoph's feedback

- organize vector functions as a separate doc
- increase precision in tests calculations
- add a separate test when sparse doc dims
are bigger and less than query vector dims

* Made examples more realistic

mayya-sharipova added a commit that referenced this pull request Jul 11, 2019

Add l1norm and l2norm distances for vectors (#44116)
Add L1norm - Manhattan distance
Add L2norm - Euclidean distance
relates to #37947

skontos added a commit to skontos/elasticsearch that referenced this pull request Jul 13, 2019

Add l1norm and l2norm distances for vectors (elastic#44116)