
[ML] Text embeddings produced by the Multilingual E5 base and large models are zero value #102541

Closed

davidkyle opened this issue Nov 23, 2023 · 7 comments

Labels: >bug, :ml, Team:ML

davidkyle (Member) commented Nov 23, 2023

Elasticsearch Version

8.11.1

Installed Plugins

No response

Java Version

bundled

OS Version

Linux x86

Problem Description

The Multilingual E5 large and Multilingual E5 base models return the expected embedding when first evaluated, but the second and every subsequent evaluation returns an embedding of all zero values. Stopping and redeploying the model produces the same effect: the first call succeeds, then every following call returns all zeros.

The multilingual-e5-small variant does not suffer from this problem; it only affects the large and base variants.

The large and base variants are not currently supported in Elastic.

Steps to Reproduce

First, install the models with Eland; in this case the Docker image from docker.elastic.co is used.

# base model
docker run -it --rm elastic/eland \
    eland_import_hub_model \
      --cloud-id $CLOUD_ID \
      -u elastic -p $CLOUD_PWD \
      --hub-model-id intfloat/multilingual-e5-base \
      --task-type text_embedding

# large model
docker run -it --rm elastic/eland \
    eland_import_hub_model \
      --cloud-id $CLOUD_ID \
      -u elastic -p $CLOUD_PWD \
      --hub-model-id intfloat/multilingual-e5-large \
      --task-type text_embedding

Deploy the model from Kibana -> ML Trained Models, then call the _infer API in Console:

POST _ml/trained_models/intfloat__multilingual-e5-large/_infer?timeout=30s
{
  "docs": [
    {
      "text_field": "hello world"
    }
  ]
}

The first time it is called, a text embedding is produced; every subsequent call returns all zeros.

{
"inference_results": [
    {
      "predicted_value": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
...
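
For completeness, a small script along these lines can demonstrate the behaviour programmatically (a hypothetical helper, not part of the original report; the cluster URL and credentials are placeholders):

# Hypothetical reproduction helper: calls the _infer endpoint repeatedly and
# reports whether each returned embedding is all zeros.
import requests

ES_URL = "https://localhost:9200"  # placeholder cluster URL
AUTH = ("elastic", "changeme")     # placeholder credentials
ENDPOINT = f"{ES_URL}/_ml/trained_models/intfloat__multilingual-e5-large/_infer?timeout=30s"

for i in range(3):
    resp = requests.post(ENDPOINT, json={"docs": [{"text_field": "hello world"}]}, auth=AUTH)
    embedding = resp.json()["inference_results"][0]["predicted_value"]
    print(f"call {i + 1}: all zeros = {all(v == 0 for v in embedding)}")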

Logs (if relevant)

No response

davidkyle added the >bug and :ml labels Nov 23, 2023
elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine added the Team:ML label Nov 23, 2023
joshdevins changed the title from "[ML] Text embeddings produced by the E5 base and large models are zero value" to "[ML] Text embeddings produced by the Multilingual E5 base and large models are zero value" Nov 23, 2023
droberts195 (Contributor) commented Dec 22, 2023

I've confirmed that this is caused by using IPEX.

The following test script can reproduce the problem:

import torch
import intel_extension_for_pytorch  # removing this import makes the problem go away

# Load the TorchScript trace of intfloat/multilingual-e5-base
model = torch.jit.load("multilingual-e5-base.pt")
model.eval()

# First inference: returns sensible values
input_ids = [101, 1996, 2143, 2001, 2307, 999, 999, 102]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1]
results = model(torch.tensor([input_ids]), torch.tensor([attention_mask]))
print(results)

# Second inference: returns all nan when IPEX is imported
input_ids = [101, 1996, 3185, 2001, 12476, 999, 999, 102]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1]
results = model(torch.tensor([input_ids]), torch.tensor([attention_mask]))
print(results)

For this to work, the torch module must be installed at exactly version 1.13.1, and the intel_extension_for_pytorch module at exactly version 1.13.100. multilingual-e5-base.pt is the traced version of intfloat/multilingual-e5-base from HuggingFace.
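
For context, a traced model like multilingual-e5-base.pt can be produced along these lines (a minimal tracing sketch, not the exact code Eland runs; eland_import_hub_model performs an equivalent step and also handles tokenization and pooling details):

import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative TorchScript trace of the HuggingFace model
tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-base")
model = AutoModel.from_pretrained("intfloat/multilingual-e5-base", torchscript=True)
model.eval()

inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    traced = torch.jit.trace(model, (inputs["input_ids"], inputs["attention_mask"]))
traced.save("multilingual-e5-base.pt")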

The output is this:

tensor([[ 1.7369e-02,  4.8434e-02, -6.2539e-03,  1.8355e-02,  2.0521e-02,
         -4.5563e-02, -1.5802e-02, -5.9128e-02,  3.0884e-03,  1.4649e-02,
         -1.2856e-02,  1.7420e-04,  1.6195e-01,  2.8815e-02, -3.8276e-02,
         ...
         -4.2301e-02, -6.2993e-02, -1.1831e-02]])
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         ...
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])

So in Python the second tensor is all nan rather than all zero, but it's basically the same problem.

If you delete the import intel_extension_for_pytorch line then the problem doesn't occur: both tensors contain sensible values.
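
A quick way to make that difference explicit in the test script is an assertion like this (a hypothetical check, not part of the original reproduction):

# Fails on the second call when intel_extension_for_pytorch is imported,
# because the returned tensor is all nan; passes once the import is removed.
assert torch.isfinite(results).all(), "embedding contains nan values"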

This is fairly similar to intel/intel-extension-for-pytorch#484. In that case the second inference crashed PyTorch. In this case the second inference doesn't cause a crash but doesn't produce sensible results. Interestingly, the test case for multilingual-e5-base described in this comment works with PyTorch and IPEX 2.1, while the test case in intel/intel-extension-for-pytorch#484 works with PyTorch and IPEX 1.13.1.

For me, all these problems suggest that we should just remove IPEX. Even if the bugs can be fixed, the amount of testing we would have to do with and without IPEX to convince ourselves that each future upgrade is sound seems prohibitive.

droberts195 (Contributor)

While trying to find the change in IPEX version 2.1 that fixed this problem, I stumbled across intel/intel-extension-for-pytorch#240. This is yet another issue of "works first time, not second".

The workaround suggested in that issue is:

torch._C._jit_set_profiling_mode(False)

Interestingly, that exact same workaround fixes this problem with multilingual-e5-base too.
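
Applied to the test script above, the workaround looks like this (a minimal sketch under the same torch 1.13.1 / IPEX 1.13.100 setup):

import torch
import intel_extension_for_pytorch

# Disabling JIT profiling mode avoids the all-nan second inference
torch._C._jit_set_profiling_mode(False)

model = torch.jit.load("multilingual-e5-base.pt")
model.eval()

input_ids = [101, 1996, 2143, 2001, 2307, 999, 999, 102]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1]
print(model(torch.tensor([input_ids]), torch.tensor([attention_mask])))

input_ids = [101, 1996, 3185, 2001, 12476, 999, 999, 102]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1]
# With profiling disabled, the second call also produces sensible values
print(model(torch.tensor([input_ids]), torch.tensor([attention_mask])))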

droberts195 (Contributor)

Another "works first time, not second" issue related to JIT profiling mode: pytorch/pytorch#72029

droberts195 added a commit to droberts195/ml-cpp that referenced this issue Dec 22, 2023
We have observed a couple of issues when IPEX is linked where
the first inference call works but the second does not:

- intel/intel-extension-for-pytorch#484
  happens with ELSER and PyTorch 2.1
- elastic/elasticsearch#102541
  happens with the multilingual E5 base and large models and
  PyTorch 1.13.1

Disabling JIT profiling avoids the problems.
droberts195 (Contributor) commented Dec 22, 2023

elastic/ml-cpp#2604 applies the equivalent fix to the C++ code. However, we need to do some performance testing because we may find that disabling JIT profiling with IPEX results in worse performance than not linking IPEX at all.

pytorch/pytorch#38342 and speechbrain/speechbrain#1068 are examples of where disabling JIT profiling was very detrimental to performance. Those issues relate to much older versions of PyTorch though.

Never linking IPEX would be an equally simple fix, so it all comes down to the performance measurements.
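
To inform those measurements, a rough timing harness along these lines could be used (a hypothetical micro-benchmark, not Elastic's actual test setup):

import time
import torch

def mean_latency(model, inputs, warmup=5, iters=100):
    # Warm-up calls give the JIT a chance to profile and optimize
    with torch.no_grad():
        for _ in range(warmup):
            model(*inputs)
        start = time.perf_counter()
        for _ in range(iters):
            model(*inputs)
    return (time.perf_counter() - start) / iters

# Run once with JIT profiling enabled and once after calling
# torch._C._jit_set_profiling_mode(False), with and without IPEX linked.
model = torch.jit.load("multilingual-e5-base.pt")
model.eval()
input_ids = torch.tensor([[101, 1996, 2143, 2001, 2307, 999, 999, 102]])
attention_mask = torch.ones_like(input_ids)
print(f"mean latency: {mean_latency(model, (input_ids, attention_mask)):.4f}s")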

droberts195 (Contributor)

We've decided to stop linking IPEX starting from version 8.12.0; see elastic/ml-cpp#2605. Hopefully that will fix this problem. This needs to be confirmed when we have the next full build.

davidkyle (Member, Author)

Closed by elastic/ml-cpp#2608
