[ML] Text embeddings produced by the Multilingual E5 base and large models are zero value #102541
Pinging @elastic/ml-core (Team:ML)
I've confirmed that this is caused by using IPEX. The following test script can reproduce the problem:
In order for this to work, the […]. The output is this:
So in Python the second tensor is all zeros. If you delete the […]. This is fairly similar to intel/intel-extension-for-pytorch#484. In that case the second inference crashed PyTorch; in this case the second inference doesn't crash but doesn't produce sensible results. Interestingly, the test case for […]. For me, all these problems suggest that we should simply remove IPEX. Even if the bugs can be fixed, the amount of testing we would have to do in the future, with and without IPEX, to convince ourselves that each upgrade is sound seems prohibitive.
While trying to find the change in IPEX version 2.1 that fixed this problem, I stumbled across intel/intel-extension-for-pytorch#240. This is yet another issue of "works first time, not second". The workaround suggested in that issue is:
And interestingly, that exact same workaround fixes this problem with […].
Another "works first time, not second" issue related to JIT profiling mode: pytorch/pytorch#72029
We have observed a couple of issues where, when IPEX is linked, the first inference call works but the second does not:

- intel/intel-extension-for-pytorch#484 happens with ELSER and PyTorch 2.1
- elastic/elasticsearch#102541 happens with the multilingual E5 base and large models and PyTorch 1.13.1

Disabling JIT profiling avoids the problems.
elastic/ml-cpp#2604 applies the equivalent fix to the C++ code. However, we need to do some performance testing, because disabling JIT profiling with IPEX may turn out to be slower than not linking IPEX at all. pytorch/pytorch#38342 and speechbrain/speechbrain#1068 are examples of where disabling JIT profiling was very detrimental to performance, although those issues relate to much older versions of PyTorch. Never linking IPEX would be an equally simple fix, so it all comes down to the performance measurements.
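For anyone reproducing this from a Python script rather than the C++ code, the "disable JIT profiling" workaround discussed above can be sketched with PyTorch's internal switches. A minimal sketch; note that the `torch._C._jit_set_profiling_*` hooks are internal APIs, so their availability and behaviour across PyTorch versions is an assumption:

```python
import torch

# Internal PyTorch switches controlling the JIT profiling executor and
# profiling mode. Each setter returns the previous value of the flag.
prev_executor = torch._C._jit_set_profiling_executor(False)
prev_mode = torch._C._jit_set_profiling_mode(False)

print("profiling executor was:", prev_executor)
print("profiling mode was:", prev_mode)
```

These calls must run before the model is traced/evaluated for the workaround to take effect.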
We've decided to stop linking IPEX starting from version 8.12.0 - see elastic/ml-cpp#2605. Hopefully that will fix this problem. This needs to be confirmed once we have the next full build.
Closed by elastic/ml-cpp#2608
Elasticsearch Version
8.11.1
Installed Plugins
No response
Java Version
bundled
OS Version
Linux x86
Problem Description
The Multilingual E5 large and Multilingual E5 base models return the expected embedding when first evaluated, but the second evaluation and every subsequent evaluation returns an embedding of all zero values. Stopping and redeploying the model produces the same effect: the first call succeeds, then every following call returns all zeros.
The multilingual-e5-small variant does not suffer from this problem; it only affects the large and base variants. The large and base variants are currently not supported in Elastic.
Steps to Reproduce
First, install the models with Eland; in this case the Docker image from docker.elastic.co is used.
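For example, an import via Eland's CLI run from the published Docker image might look like the following sketch; the cluster URL, credentials, and model id are illustrative placeholders, not taken from this report:

```shell
# Import a Hugging Face model into Elasticsearch using Eland's CLI
# from the published Docker image. Adjust URL, credentials and
# model id for your cluster.
docker run -it --rm --network host docker.elastic.co/eland/eland \
  eland_import_hub_model \
    --url "https://elastic:<password>@localhost:9200" \
    --hub-model-id intfloat/multilingual-e5-base \
    --task-type text_embedding \
    --start
```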
Deploy the model from Kibana -> ML Trained Models, then call the _infer API in Console.
The first time it is called, a text embedding is produced; every subsequent call returns all zeros.
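A quick way to spot the symptom is to check the `_infer` response for an all-zero vector. A minimal sketch; the JSON shape below (`inference_results[0].predicted_value`) is an assumption based on a typical text_embedding response, and the sample responses are simulated:

```python
import json

def embedding_is_all_zeros(response_body: str) -> bool:
    """Return True if every value in the predicted embedding is zero."""
    doc = json.loads(response_body)
    # Assumed response shape: the embedding vector lives under
    # inference_results[0].predicted_value.
    vector = doc["inference_results"][0]["predicted_value"]
    return all(v == 0.0 for v in vector)

# Simulated responses: the first call returns a real embedding,
# the second returns all zeros (the bug described above).
first = json.dumps({"inference_results": [{"predicted_value": [0.12, -0.56, 0.33]}]})
second = json.dumps({"inference_results": [{"predicted_value": [0.0, 0.0, 0.0]}]})

print(embedding_is_all_zeros(first))   # False
print(embedding_is_all_zeros(second))  # True
```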
Logs (if relevant)
No response