Conversation

@chrislee973
Contributor

I was using the quantized all-MiniLM-L6-v2 ONNX model directly from Python to precompute some embeddings, and I noticed slight differences between those embeddings and the ones transformers.js generated. I traced it to the vector norm calculation, which wasn't squaring the first value of the vector: let norm = Math.sqrt(batch.data.reduce((a, b) => a + b * b)).

Adding an initial value of 0 to the reduce fixes this. I've verified that this fixed normalization calculation results in embeddings equal to the ones generated directly from the onnx model!
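A minimal sketch of the corrected normalization (the function name l2Normalize is hypothetical; only the reduce line with the 0 seed comes from the fix itself):

```javascript
// Sketch of the fix: pass 0 as the initial value to reduce so the
// FIRST element is squared like all the others. Without the 0,
// reduce seeds the accumulator with data[0] unsquared.
function l2Normalize(data) {
  const norm = Math.sqrt(data.reduce((a, b) => a + b * b, 0));
  return data.map((x) => x / norm);
}

console.log(l2Normalize([3, 4])); // [0.6, 0.8] — unit-length output

// The buggy version gives sqrt(3 + 4*4) = sqrt(19) for the norm instead of 5.
console.log(Math.sqrt([3, 4].reduce((a, b) => a + b * b)));
```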

@kungfooman
Contributor

Simple test for validating the difference:

function aPlusBsquared(a, b) {
  console.log({a, b}); // log each accumulator/element pair
  return a + b * b;
}
// Without an initial value, the first call receives a = 10 (never squared).
[10, 20, 30, 40].reduce(aPlusBsquared);
// With 0 as the initial value, every element gets squared.
[10, 20, 30, 40].reduce(aPlusBsquared, 0);

Output:

{ a: 10, b: 20 }
{ a: 410, b: 30 }
{ a: 1310, b: 40 }     // result: 2910

{ a: 0, b: 10 }
{ a: 100, b: 20 }
{ a: 500, b: 30 }
{ a: 1400, b: 40 }     // result: 3000

Basically, when no initial value is given, reduce doesn't start the accumulator at undefined; it uses the element at index 0 as the initial accumulator and begins iterating at index 1, so the first element never passes through the callback as b and never gets squared.
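This is the specified behavior of Array.prototype.reduce: with no initialValue, the accumulator is seeded with the first element and the callback is first invoked at index 1. A minimal demonstration (variable names are illustrative):

```javascript
// Track which indices the callback actually sees when reduce is
// called without an initial value.
const seenIndices = [];
[10, 20, 30].reduce((acc, cur, idx) => {
  seenIndices.push(idx);
  return acc + cur;
});
console.log(seenIndices); // [1, 2] — index 0 is never passed as `cur`
```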

@xenova
Collaborator

xenova commented May 7, 2023

Whoops! Thanks for that!

@xenova xenova merged commit 13b570c into huggingface:main May 7, 2023
@chrislee973 chrislee973 deleted the fix-norm-calculation branch May 9, 2023 06:04