nvidia/NV-Embed-v1 #239

Closed
3 tasks done
Strive-for-excellence opened this issue May 31, 2024 · 7 comments
Labels: new model (Make a model compatible)

Comments

@Strive-for-excellence

Model description

https://huggingface.co/nvidia/NV-Embed-v1

NV-Embed-v1 currently ranks first on the MTEB leaderboard, but it cannot be loaded with the SentenceTransformer library.
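
For reference, a minimal way to reproduce the failure (a sketch, assuming a sentence-transformers version with trust_remote_code support; the exact error is in the logs below):

# Minimal reproduction sketch (assumption: sentence-transformers >= 2.3,
# which accepts trust_remote_code). The repo ships no sentence-transformers
# configuration, so loading falls back to "Transformer + mean pooling" and
# then fails while building the pooling module.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nvidia/NV-Embed-v1", trust_remote_code=True)
embeddings = model.encode(["What is the capital of France?"])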

Open source status

  • The model implementation is available on transformers
  • The model weights are available on huggingface-hub
  • I verified that the model is currently not running in infinity

Provide useful links for the implementation

https://huggingface.co/nvidia/NV-Embed-v1

@michaelfeil (Owner)

What’s the error message?

@Strive-for-excellence (Author)

This is the error message.

+ infinity_emb --model-name-or-path /mnt/cache/zhangxingyan/hub/model/nvidia/NV-Embed-v1 --port 20052 --trust-remote-code
INFO:     Started server process [8]
INFO:     Waiting for application startup.
INFO     2024-05-31 13:07:47,181 infinity_emb INFO: model=`/mnt/cache/zhangxingyan/hub/model/nvidia/NV-Embed-v1` selected, using engine=`torch` and device=`None`    select_model.py:54
INFO     2024-05-31 13:07:47,193 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: /mnt/cache/zhangxingyan/hub/model/nvidia/NV-Embed-v1    SentenceTransformer.py:107
WARNING  2024-05-31 13:07:47,195 sentence_transformers.SentenceTransformer WARNING: No sentence-transformers model found with name /mnt/cache/zhangxingyan/hub/model/nvidia/NV-Embed-v1. Creating a new one with MEAN pooling.    SentenceTransformer.py:1129
ERROR:    Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 677, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/app/infinity_emb/infinity_server.py", line 46, in lifespan
    app.model = AsyncEmbeddingEngine.from_args(engine_args)  # type: ignore
  File "/app/infinity_emb/engine.py", line 49, in from_args
    engine = cls(**asdict(engine_args), _show_deprecation_warning=False)
  File "/app/infinity_emb/engine.py", line 40, in __init__
    self._model, self._min_inference_t, self._max_inference_t = select_model(
  File "/app/infinity_emb/inference/select_model.py", line 62, in select_model
    loaded_engine = unloaded_engine.value(engine_args=engine_args)
  File "/app/infinity_emb/transformer/embedder/sentence_transformer.py", line 47, in __init__
    super().__init__(
  File "/app/.venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 199, in __init__
    modules = self._load_auto_model(
  File "/app/.venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 1134, in _load_auto_model
    transformer_model = Transformer(
  File "/app/.venv/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 36, in __init__
    self._load_model(model_name_or_path, config, cache_dir, **model_args)
  File "/app/.venv/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 65, in _load_model
    self.auto_model = AutoModel.from_pretrained(
  File "/app/.venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 550, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "/app/.venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 489, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
  File "/app/.venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 315, in get_cached_module_file
    modules_needed = check_imports(resolved_module_file)
  File "/app/.venv/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 180, in check_imports
    raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: einops. Run `pip install einops`

ERROR:    Application startup failed. Exiting.

This is because the model's configuration differs from other models: the actual text-model settings (including hidden_size) are nested under "text_config".
https://huggingface.co/nvidia/NV-Embed-v1/blob/main/config.json

{
  "add_eos": true,
  "add_pad_token": true,
  "architectures": [
    "NVEmbedModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_nvembed.NVEmbedConfig",
    "AutoModel": "modeling_nvembed.NVEmbedModel"
  },
  "is_mask_instruction": true,
  "latent_attention_config": {
    "model_type": "latent_attention"
  },
  "mask_type": "b",
  "model_type": "nvembed",
  "padding_side": "right",
  "text_config": {
    "_name_or_path": "nvidia/NV-Embed-v1",
    "add_cross_attention": false,
    "architectures": [
      "MistralModel"
    ],
    "attention_dropout": 0.0,
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bos_token_id": 1,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": 2,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_act": "silu",
    "hidden_size": 4096,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "initializer_range": 0.02,
    "intermediate_size": 14336,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "length_penalty": 1.0,
    "max_length": 20,
    "max_position_embeddings": 32768,
    "min_length": 0,
    "model_type": "bidir_mistral",
    "no_repeat_ngram_size": 0,
    "num_attention_heads": 32,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": null,
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "rms_norm_eps": 1e-05,
    "rope_theta": 10000.0,
    "sep_token_id": null,
    "sliding_window": 4096,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": false,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "float32",
    "torchscript": false,
    "typical_p": 1.0,
    "use_bfloat16": false,
    "use_cache": true,
    "vocab_size": 32000
  },
  "torch_dtype": "float16",
  "transformers_version": "4.37.2"
}
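
For illustration, a short sketch of the consequence (assuming transformers with trust_remote_code, and that the custom NVEmbedConfig exposes the nested text_config shown above): the top-level config has no hidden_size, while the nested Mistral settings do.

# Sketch (assumption: NVEmbedConfig keeps the Mistral settings in a nested
# text_config, as in the config.json above).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("nvidia/NV-Embed-v1", trust_remote_code=True)

# The attribute sentence-transformers reads for mean pooling is missing
# at the top level...
print(hasattr(config, "hidden_size"))  # False

# ...but the nested text config carries it (4096 per the config above),
# whether it is stored as a config object or a plain dict.
text_cfg = config.text_config
hidden_size = text_cfg.hidden_size if hasattr(text_cfg, "hidden_size") else text_cfg["hidden_size"]
print(hidden_size)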

@michaelfeil (Owner) commented May 31, 2024

Can you please run pip install einops, or install the latest version with pip install "infinity_emb[all]" (i.e. with all extras)?

michaelfeil added the new model (Make a model compatible) label on May 31, 2024
@Strive-for-excellence (Author)

Sorry, I pasted the wrong log. This is the correct one. Thanks for your work. I also think that nvidia/NV-Embed-v1 should consider changing their model.

(/mnt/cache/zhangxingyan/env/emb) (py310_hf) (cloud-ai-lab) root@d36e246e-0913-4c27-ae40-956e31bbb352:/mnt/cache# infinity_emb v2 --model-id /mnt/cache/zhangxingyan/hub/model/nvidia/NV-Embed-v1 --port 20052 --trust-remote-code

INFO     2024-05-31 22:34:14,341 datasets INFO: PyTorch version 2.3.0 available.                                               config.py:58
['/mnt/cache/zhangxingyan/env/emb/bin/infinity_emb', 'v2', '--model-id', '/mnt/cache/zhangxingyan/hub/model/nvidia/NV-Embed-v1', '--port', '20052', '--trust-remote-code']
INFO:     Started server process [1942772]
INFO:     Waiting for application startup.
INFO     2024-05-31 22:34:18,478 infinity_emb INFO: model=`/mnt/cache/zhangxingyan/hub/model/nvidia/NV-Embed-v1` selected, using engine=`torch` and device=`None`    select_model.py:54
INFO     2024-05-31 22:34:18,765 sentence_transformers.SentenceTransformer INFO: Use pytorch device_name: cuda    SentenceTransformer.py:188
INFO     2024-05-31 22:34:18,766 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: /mnt/cache/zhangxingyan/hub/model/nvidia/NV-Embed-v1    SentenceTransformer.py:196
WARNING  2024-05-31 22:34:18,768 sentence_transformers.SentenceTransformer WARNING: No sentence-transformers model found with name /mnt/cache/zhangxingyan/hub/model/nvidia/NV-Embed-v1. Creating a new one with mean pooling.    SentenceTransformer.py:1298
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 4/4 [00:30<00:00,  7.61s/it]
ERROR:    Traceback (most recent call last):
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/starlette/routing.py", line 677, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/infinity_emb/infinity_server.py", line 49, in lifespan
    app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/infinity_emb/engine.py", line 228, in from_args
    engines=tuple(
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/infinity_emb/engine.py", line 229, in <genexpr>
    AsyncEmbeddingEngine.from_args(engine_args)
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/infinity_emb/engine.py", line 64, in from_args
    engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/infinity_emb/engine.py", line 50, in __init__
    self._model, self._min_inference_t, self._max_inference_t = select_model(
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/infinity_emb/inference/select_model.py", line 62, in select_model
    loaded_engine = unloaded_engine.value(engine_args=engine_args)
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/infinity_emb/transformer/embedder/sentence_transformer.py", line 47, in __init__
    super().__init__(
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 298, in __init__
    modules = self._load_auto_model(
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 1319, in _load_auto_model
    pooling_model = Pooling(transformer_model.get_word_embedding_dimension(), "mean")
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 133, in get_word_embedding_dimension
    return self.auto_model.config.hidden_size
  File "/mnt/cache/zhangxingyan/env/emb/lib/python3.10/site-packages/transformers/configuration_utils.py", line 264, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'NVEmbedConfig' object has no attribute 'hidden_size'

ERROR:    Application startup failed. Exiting.
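
In the meantime, the transformers usage from the Hugging Face model card is a possible workaround (a sketch adapted from that model card; the custom encode signature comes from the repo's remote code and is an assumption here, not part of infinity):

# Sketch based on the nvidia/NV-Embed-v1 model card (assumption: the remote
# code provides an encode(texts, instruction=..., max_length=...) helper
# returning torch tensors).
import torch.nn.functional as F
from transformers import AutoModel

model = AutoModel.from_pretrained("nvidia/NV-Embed-v1", trust_remote_code=True)

queries = ["how much protein should a female eat"]
query_prefix = "Instruct: Given a question, retrieve passages that answer the question\nQuery: "

embeddings = model.encode(queries, instruction=query_prefix, max_length=4096)
embeddings = F.normalize(embeddings, p=2, dim=1)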

@michaelfeil (Owner)

I assume this is fixed now. Closing.
