Custom model: RuntimeError: weight shared.weight does not exist #541

Closed
Matthieu-Tinycoaching opened this issue Jul 4, 2023 · 11 comments · Fixed by #582

@Matthieu-Tinycoaching

System Info

Tue Jul  4 16:51:59 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090         Off| 00000000:21:00.0  On |                  N/A |
|  0%   51C    P8               51W / 390W|   1047MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1762      G   /usr/lib/xorg/Xorg                           24MiB |
|    0   N/A  N/A      2178      G   /usr/bin/gnome-shell                         83MiB |
|    0   N/A  N/A      3994      G   /usr/lib/xorg/Xorg                          451MiB |
|    0   N/A  N/A      4140      G   /usr/bin/gnome-shell                         50MiB |
|    0   N/A  N/A      4827      G   ...,WinRetrieveSuggestionsOnlyOnDemand       65MiB |
|    0   N/A  N/A      5061      G   ...9470975,14709274054277858675,262144       96MiB |
|    0   N/A  N/A     35735      G   /snap/thunderbird/339/thunderbird-bin        87MiB |
|    0   N/A  N/A     36507      G   ...sion,SpareRendererForSitePerProcess       40MiB |
|    0   N/A  N/A     42817      G   ...ures=SpareRendererForSitePerProcess       36MiB |
|    0   N/A  N/A     47573      G   ...ures=SpareRendererForSitePerProcess       92MiB |
|    0   N/A  N/A     67787      G   /usr/lib/firefox/firefox                     11MiB |
+---------------------------------------------------------------------------------------+

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

When launching TGI on a custom model derived from lmsys/fastchat-t5-3b-v1.0 with the following command:
docker run --rm --network none --gpus 0 -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-generation-inference:latest --model-id /data/fastchat-t5-3b-v1.0

I got the following error message:

latest: Pulling from huggingface/text-generation-inference
Digest: sha256:29019a087e64ce951a6c9ca3b17a6823dfd9d25eeb56ec06c08150516fd60f0b
Status: Image is up to date for ghcr.io/huggingface/text-generation-inference:latest
2023-07-04T14:50:00.870189Z  INFO text_generation_launcher: Args { model_id: "/data/fastchat-t5-3b-v1.0", revision: None, sharded: None, num_shard: None, quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: false }
2023-07-04T14:50:00.870282Z  INFO text_generation_launcher: Starting download process.
2023-07-04T14:50:02.000718Z  INFO download: text_generation_launcher: Files are already present on the host. Skipping download.

2023-07-04T14:50:02.371986Z  INFO text_generation_launcher: Successfully downloaded weights.
2023-07-04T14:50:02.372146Z  INFO text_generation_launcher: Starting shard 0
2023-07-04T14:50:04.072895Z  WARN shard-manager: text_generation_launcher: We're not using custom kernels.
 rank=0
2023-07-04T14:50:04.214047Z ERROR shard-manager: text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1005, in __init__
    self.shared = TensorParallelEmbedding(prefix="shared", weights=weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 280, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight shared.weight does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 166, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 133, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 279, in get_model
    return T5Sharded(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py", line 61, in __init__
    model = T5ForConditionalGeneration(config, weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1007, in __init__
    self.shared = TensorParallelEmbedding(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 280, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight encoder.embed_tokens.weight does not exist
 rank=0
2023-07-04T14:50:04.673754Z ERROR text_generation_launcher: Shard 0 failed to start
2023-07-04T14:50:04.673779Z ERROR text_generation_launcher: Traceback (most recent call last):

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1005, in __init__
    self.shared = TensorParallelEmbedding(prefix="shared", weights=weights)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 280, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight shared.weight does not exist


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 166, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 133, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 279, in get_model
    return T5Sharded(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py", line 61, in __init__
    model = T5ForConditionalGeneration(config, weights)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1007, in __init__
    self.shared = TensorParallelEmbedding(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 280, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight encoder.embed_tokens.weight does not exist


2023-07-04T14:50:04.673806Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

Expected behavior

I'd like to run TGI with my custom model on my RTX 3090 GPU.

@dat-browny

I have the same problem when using BLOOM combined with a LoRA adapter; I receive this error:


RuntimeError: weight word_embeddings.weight does not exist

I've tried with the original BLOOM and it does not happen there.

@aliswel-mt

aliswel-mt commented Jul 6, 2023

I got a similar error when loading WizardCoder with the quantize flag; without quantize everything is fine.

RuntimeError: weight transformer.h.0.attn.c_attn.qweight does not exist

run:
text-generation-launcher --model-id wizardcoder --sharded false --port 8080 --quantize gptq

@PitchboyDev

Same with a LoRA-merged Falcon.

@ckanaar

ckanaar commented Jul 11, 2023

This happened to me as well. I "fixed it" by reverting to the 0.8 version of the Docker container, so it seems specific to the 0.9 release.

@PitchboyDev

@ckanaar thanks for the advice. It works for me too.

@rohan-pradhan

Hi @PitchboyDev, following up on this for deploying LoRA-merged Falcon models on TGI. How did you manage to deploy the model by downgrading TGI to 0.8? When I deploy using 0.8 or 0.8.2 I get this error:

AttributeError: FlashRWForCausalLM has no attribute 'model'

However, when I use 0.9.2 or 0.9.3 I get the same error as you:

RuntimeError: weight lm_head.weight does not exist.

Any insight on how you solved this? Thanks!

@PitchboyDev

@rohan-pradhan which version of Falcon do you have? We used the 7B version, and downgrading to version 0.8 did the trick.
Maybe it's a configuration problem?

@rohan-pradhan

@PitchboyDev - yes, we are using Falcon 7B too!

@Trapper4888

Trapper4888 commented Sep 15, 2023

It may be a problem related to safetensors and torch shared tensors: https://huggingface.co/docs/safetensors/torch_shared_tensors

I had a similar error when trying to manually save my model in safetensors using the save_file method:

RuntimeError:
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.word_embeddings.weight', 'lm_head.weight'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

Now I save it with save_model, and TGI gives me this kind of error:

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 53, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight transformer.word_embeddings.weight does not exist

EDIT: I solved it by generating the safetensors with the transformers save_pretrained function, adding the parameter safe_serialization=True:

model.save_pretrained(OUTPUTS_PATH, safe_serialization=True)
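
For anyone hitting the same thing, here is a minimal sketch of the full round trip (the paths are placeholders, and I assume a standard transformers causal LM; adapt the Auto class to your architecture):

# Reload the merged model and re-save it with safetensors serialization so
# shared tensors (e.g. tied embeddings / lm_head) are written correctly.
from transformers import AutoModelForCausalLM, AutoTokenizer

MERGED_PATH = "/path/to/merged-model"        # hypothetical input directory
OUTPUTS_PATH = "/path/to/safetensors-model"  # hypothetical output directory

model = AutoModelForCausalLM.from_pretrained(MERGED_PATH)
tokenizer = AutoTokenizer.from_pretrained(MERGED_PATH)

# safe_serialization=True writes *.safetensors files and resolves shared
# tensors instead of raising the "Some tensors share memory" error.
model.save_pretrained(OUTPUTS_PATH, safe_serialization=True)
tokenizer.save_pretrained(OUTPUTS_PATH)

TGI should then pick up the *.safetensors files from that directory.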

@Narsil
Collaborator

Narsil commented Sep 19, 2023

Thanks for sharing your solution!

@nolwennz

nolwennz commented Sep 29, 2023

Hello,

Same issue here: we are trying to run our custom model with TGI (https://huggingface.co/cmarkea/bloomz-560m-sft-chat).
The model runs well with TGI up to version 0.8.*. Starting from 0.9.0 we get the same error:

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 147, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 147, in get_model
    return BLOOMSharded(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/bloom.py", line 82, in __init__
    model = BloomForCausalLM(config, weights)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/bloom_modeling.py", line 818, in __init__
    self.transformer = BloomModel(config, weights)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/bloom_modeling.py", line 609, in __init__
    self.word_embeddings = TensorParallelEmbedding(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 375, in __init__
    weight = weights.get_partial_sharded(f"{prefix}.weight", dim=0)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 77, in get_partial_sharded
    filename, tensor_name = self.get_filename(tensor_name)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 53, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight word_embeddings.weight does not exist
 rank=0
Error: ShardCannotStart
2023-09-27T12:59:44.433561Z ERROR text_generation_launcher: Shard 0 failed to start
2023-09-27T12:59:44.433595Z  INFO text_generation_launcher: Shutting down shards

Our weights use the name format "transformer.word_embeddings.weight", not "word_embeddings.weight" as the error suggests.

So it looks like the base_model_prefix is not configured properly.
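
For reference, a quick way to check which tensor names are actually stored in the checkpoint is to open the safetensors file and list its keys (the file path below is a placeholder):

# List tensor names in the checkpoint to see which prefix TGI would have to look up.
from safetensors import safe_open

CKPT = "/data/bloomz-560m-sft-chat/model.safetensors"  # hypothetical local path

with safe_open(CKPT, framework="pt") as f:
    for name in f.keys():
        print(name)  # prints e.g. "transformer.word_embeddings.weight"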

Would it be possible to set base_model_prefix="transformer" as the default for BloomModels, as is done for BloomPretrainedModels? Or could a CLI argument be added to specify the weight prefix?

Looking forward to testing the latest versions' features 🚀 Thanks!
