Fix tensor normalization in EmbeddingsPipeline #106

chrislee973 · 2023-05-07T06:11:28Z

I was using the quantized all-MiniLM-L6-v2 ONNX model directly from Python to precompute some embeddings, and I noticed slight differences between those embeddings and the ones transformers.js generated. The cause is the vector norm calculation, which doesn't square the first value of the vector: let norm = Math.sqrt(batch.data.reduce((a, b) => a + b * b)). Because reduce is called without an initial value, the first element is used as the starting accumulator as-is and only the remaining elements are squared.

Adding an initial value of 0 to the reduce fixes this: Math.sqrt(batch.data.reduce((a, b) => a + b * b, 0)). I've verified that with the corrected normalization, the embeddings equal the ones generated directly from the ONNX model!
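
To make the difference concrete, here is a minimal sketch in plain JavaScript. The helper names buggyNorm and fixedNorm and the sample vector are hypothetical, chosen only for illustration. They are not the library's actual code, just the reduce expression quoted above with and without the initial value.

```javascript
// Hypothetical helpers illustrating the reduce expression from the description.

// Without an initial value, the first element becomes the starting
// accumulator and is never squared.
function buggyNorm(data) {
  return Math.sqrt(data.reduce((a, b) => a + b * b));
}

// With an initial value of 0, every element (including the first) is squared.
function fixedNorm(data) {
  return Math.sqrt(data.reduce((a, b) => a + b * b, 0));
}

const v = [3, 4]; // the true L2 norm is 5
console.log(buggyNorm(v)); // sqrt(3 + 16) = sqrt(19), roughly 4.36
console.log(fixedNorm(v)); // sqrt(9 + 16) = 5
```

The error is small when the first component is small relative to the rest, which would be consistent with the "slight differences" described above.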

kungfooman · 2023-05-07T11:17:01Z

Simple test for validating the difference:

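// Logs each (accumulator, value) pair that reduce passes to the callback.
function aPlusBsquared(a, b) {
  console.log({a, b});
  return a + b * b;
}
[10, 20, 30, 40].reduce(aPlusBsquared);    // no initial value
[10, 20, 30, 40].reduce(aPlusBsquared, 0); // initial value of 0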

Output:

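Basically, when no initial value is given, reduce doesn't start with an undefined accumulator. Instead, it uses the element at index 0 as the accumulator and makes the first callback call with the element at index 1, so the first element is never squared.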
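
The callback arguments for each variant can also be captured in an array instead of being logged, which makes the two traces easy to compare. This is only a sketch: traceReduce is a hypothetical helper, not something from the PR, and the values in the comments were worked out by hand from the test array above.

```javascript
// Hypothetical helper: records the (a, b) pair passed to the callback on
// each step. The optional second argument is forwarded as reduce's initial value.
function traceReduce(values, ...initial) {
  const calls = [];
  const result = values.reduce((a, b) => {
    calls.push({ a, b });
    return a + b * b;
  }, ...initial);
  return { calls, result };
}

const without = traceReduce([10, 20, 30, 40]);
// calls: {a: 10, b: 20}, {a: 410, b: 30}, {a: 1310, b: 40}
// result: 2910 (10 + 400 + 900 + 1600, the first value is not squared)

const withZero = traceReduce([10, 20, 30, 40], 0);
// calls: {a: 0, b: 10}, {a: 100, b: 20}, {a: 500, b: 30}, {a: 1400, b: 40}
// result: 3000 (100 + 400 + 900 + 1600)

console.log(without, withZero);
```

The two results differ by 90, which is exactly 10 * 10 - 10: the squared first element minus the unsquared one.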

xenova · 2023-05-07T13:54:05Z

Whoops! Thanks for that!

add initial value for reduce

d0e93c0

xenova merged commit 13b570c into huggingface:main May 7, 2023

chrislee973 deleted the fix-norm-calculation branch May 9, 2023 06:04
