
RuntimeError: weight encoder.embed_tokens.weight does not exist #556

Closed
chumpblocckami opened this issue Jul 6, 2023 · 11 comments

@chumpblocckami

After running:

docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:0.9 --model-id google/flan-t5-small --num-shard 1

I receive:

RuntimeError: weight encoder.embed_tokens.weight does not exist

I tried multiple small models, but every one raises the same issue.

Any tips?

Thanks

@zoltan-fedor

zoltan-fedor commented Jul 6, 2023

Same issue here with flan-t5-xl.
I am using v0.9.1 on EKS.

Full startup log below:

{"timestamp":"2023-07-06T19:14:42.852088Z","level":"INFO","fields":{"message":"Args { model_id: \"google/flan-t5-xl\", revision: None, sharded: None, num_shard: Some(1), quantize: Some(Bitsandbytes), dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, hostname: \"flan-t5-xl-64959c4d74-qs64q\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: false }"},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:14:42.852211Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:14:44.549952Z","level":"WARN","fields":{"message":"No safetensors weights found for model google/flan-t5-xl at revision None. Downloading PyTorch weights.\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:14:44.604291Z","level":"INFO","fields":{"message":"Download file: pytorch_model-00001-of-00002.bin\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:03.211958Z","level":"INFO","fields":{"message":"Downloaded /data/models--google--flan-t5-xl/snapshots/53fd1e22aa944eee1fd336f9aee8a437e01676ce/pytorch_model-00001-of-00002.bin in 0:00:18.\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:03.212049Z","level":"INFO","fields":{"message":"Download: [1/2] -- ETA: 0:00:18\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:03.212305Z","level":"INFO","fields":{"message":"Download file: pytorch_model-00002-of-00002.bin\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:09.970978Z","level":"INFO","fields":{"message":"Downloaded /data/models--google--flan-t5-xl/snapshots/53fd1e22aa944eee1fd336f9aee8a437e01676ce/pytorch_model-00002-of-00002.bin in 0:00:06.\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:09.971051Z","level":"INFO","fields":{"message":"Download: [2/2] -- ETA: 0\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:09.971146Z","level":"WARN","fields":{"message":"No safetensors weights found for model google/flan-t5-xl at revision None. Converting PyTorch weights to safetensors.\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:16:15.141453Z","level":"INFO","fields":{"message":"Convert: [1/2] -- Took: 0:01:05.169846\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:16:24.913801Z","level":"INFO","fields":{"message":"Convert: [2/2] -- Took: 0:00:09.772093\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:16:25.262110Z","level":"INFO","fields":{"message":"Successfully downloaded weights."},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:16:25.262659Z","level":"INFO","fields":{"message":"Starting shard 0"},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:16:29.284643Z","level":"WARN","fields":{"message":"We're not using custom kernels.\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2023-07-06T19:16:30.102714Z","level":"ERROR","fields":{"message":"Error when initializing model\nTraceback (most recent call last):\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py\", line 1005, in __init__\n    self.shared = TensorParallelEmbedding(prefix=\"shared\", weights=weights)\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py\", line 280, in __init__\n    weight = weights.get_sharded(f\"{prefix}.weight\", dim=0)\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 73, in get_sharded\n    filename, tensor_name = self.get_filename(tensor_name)\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 49, in get_filename\n    raise RuntimeError(f\"weight {tensor_name} does not exist\")\nRuntimeError: weight shared.weight does not exist\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n    sys.exit(app())\n  File \"/opt/conda/lib/python3.9/site-packages/typer/main.py\", line 311, in __call__\n    return get_command(self)(*args, **kwargs)\n  File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 1130, in __call__\n    return self.main(*args, **kwargs)\n  File \"/opt/conda/lib/python3.9/site-packages/typer/core.py\", line 778, in main\n    return _main(\n  File \"/opt/conda/lib/python3.9/site-packages/typer/core.py\", line 216, in _main\n    rv = self.invoke(ctx)\n  File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 1657, in invoke\n    return _process_result(sub_ctx.command.invoke(sub_ctx))\n  File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 1404, in invoke\n    return ctx.invoke(self.callback, **ctx.params)\n  File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 760, in 
invoke\n    return __callback(*args, **kwargs)\n  File \"/opt/conda/lib/python3.9/site-packages/typer/main.py\", line 683, in wrapper\n    return callback(**use_params)  # type: ignore\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py\", line 78, in serve\n    server.serve(\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 166, in serve\n    asyncio.run(\n  File \"/opt/conda/lib/python3.9/asyncio/runners.py\", line 44, in run\n    return loop.run_until_complete(main)\n  File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 634, in run_until_complete\n    self.run_forever()\n  File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 601, in run_forever\n    self._run_once()\n  File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 1905, in _run_once\n    handle._run()\n  File \"/opt/conda/lib/python3.9/asyncio/events.py\", line 80, in _run\n    self._context.run(self._callback, *self._args)\n> File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 133, in serve_inner\n    model = get_model(\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py\", line 279, in get_model\n    return T5Sharded(\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py\", line 61, in __init__\n    model = T5ForConditionalGeneration(config, weights)\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py\", line 1007, in __init__\n    self.shared = TensorParallelEmbedding(\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py\", line 280, in __init__\n    weight = weights.get_sharded(f\"{prefix}.weight\", dim=0)\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 73, in get_sharded\n    filename, tensor_name = self.get_filename(tensor_name)\n  File 
\"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 49, in get_filename\n    raise RuntimeError(f\"weight {tensor_name} does not exist\")\nRuntimeError: weight encoder.embed_tokens.weight does not exist\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2023-07-06T19:16:30.667766Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:16:30.667795Z","level":"ERROR","fields":{"message":"Traceback (most recent call last):\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py\", line 1005, in __init__\n    self.shared = TensorParallelEmbedding(prefix=\"shared\", weights=weights)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py\", line 280, in __init__\n    weight = weights.get_sharded(f\"{prefix}.weight\", dim=0)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 73, in get_sharded\n    filename, tensor_name = self.get_filename(tensor_name)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 49, in get_filename\n    raise RuntimeError(f\"weight {tensor_name} does not exist\")\n\nRuntimeError: weight shared.weight does not exist\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n  File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n    sys.exit(app())\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py\", line 78, in serve\n    server.serve(\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 166, in serve\n    asyncio.run(\n\n  File \"/opt/conda/lib/python3.9/asyncio/runners.py\", line 44, in run\n    return loop.run_until_complete(main)\n\n  File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 647, in run_until_complete\n    return future.result()\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 133, in serve_inner\n    model = get_model(\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py\", line 279, in get_model\n    return T5Sharded(\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py\", line 61, in 
__init__\n    model = T5ForConditionalGeneration(config, weights)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py\", line 1007, in __init__\n    self.shared = TensorParallelEmbedding(\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py\", line 280, in __init__\n    weight = weights.get_sharded(f\"{prefix}.weight\", dim=0)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 73, in get_sharded\n    filename, tensor_name = self.get_filename(tensor_name)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 49, in get_filename\n    raise RuntimeError(f\"weight {tensor_name} does not exist\")\n\nRuntimeError: weight encoder.embed_tokens.weight does not exist\n\n"},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:16:30.667823Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
Error: ShardCannotStart

With sharding disabled (same error, but easier to read):

2023-07-06T20:10:55.686866Z  INFO text_generation_launcher: Args { model_id: "google/flan-t5-xl", revision: None, sharded: Some(false), num_shard: Some(1), quantize: Some(Bitsandbytes), dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: false }
2023-07-06T20:10:55.686971Z  INFO text_generation_launcher: Starting download process.
2023-07-06T20:10:57.341715Z  WARN download: text_generation_launcher: No safetensors weights found for model google/flan-t5-xl at revision None. Downloading PyTorch weights.

2023-07-06T20:10:57.417081Z  INFO download: text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin

2023-07-06T20:11:18.169311Z  INFO download: text_generation_launcher: Downloaded /data/models--google--flan-t5-xl/snapshots/53fd1e22aa944eee1fd336f9aee8a437e01676ce/pytorch_model-00001-of-00002.bin in 0:00:20.

2023-07-06T20:11:18.169386Z  INFO download: text_generation_launcher: Download: [1/2] -- ETA: 0:00:20

2023-07-06T20:11:18.169595Z  INFO download: text_generation_launcher: Download file: pytorch_model-00002-of-00002.bin

2023-07-06T20:11:25.050713Z  INFO download: text_generation_launcher: Downloaded /data/models--google--flan-t5-xl/snapshots/53fd1e22aa944eee1fd336f9aee8a437e01676ce/pytorch_model-00002-of-00002.bin in 0:00:06.

2023-07-06T20:11:25.050803Z  INFO download: text_generation_launcher: Download: [2/2] -- ETA: 0

2023-07-06T20:11:25.050899Z  WARN download: text_generation_launcher: No safetensors weights found for model google/flan-t5-xl at revision None. Converting PyTorch weights to safetensors.

2023-07-06T20:12:30.361334Z  INFO download: text_generation_launcher: Convert: [1/2] -- Took: 0:01:05.309101

2023-07-06T20:12:40.112889Z  INFO download: text_generation_launcher: Convert: [2/2] -- Took: 0:00:09.752118

2023-07-06T20:12:42.517781Z  INFO text_generation_launcher: Successfully downloaded weights.
2023-07-06T20:12:42.518379Z  INFO text_generation_launcher: Starting shard 0
2023-07-06T20:12:50.458364Z  WARN shard-manager: text_generation_launcher: We're not using custom kernels.
 rank=0
2023-07-06T20:12:51.265848Z ERROR shard-manager: text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1005, in __init__
    self.shared = TensorParallelEmbedding(prefix="shared", weights=weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 268, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight shared.weight does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 166, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 133, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 274, in get_model
    return T5Sharded(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py", line 61, in __init__
    model = T5ForConditionalGeneration(config, weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1007, in __init__
    self.shared = TensorParallelEmbedding(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 268, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight encoder.embed_tokens.weight does not exist
 rank=0
2023-07-06T20:12:51.827130Z ERROR text_generation_launcher: Shard 0 failed to start
2023-07-06T20:12:51.827155Z ERROR text_generation_launcher: Traceback (most recent call last):

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1005, in __init__
    self.shared = TensorParallelEmbedding(prefix="shared", weights=weights)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 268, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight shared.weight does not exist


Error: ShardCannotStart
During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 166, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 133, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 274, in get_model
    return T5Sharded(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py", line 61, in __init__
    model = T5ForConditionalGeneration(config, weights)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1007, in __init__
    self.shared = TensorParallelEmbedding(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 268, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight encoder.embed_tokens.weight does not exist


2023-07-06T20:12:51.827178Z  INFO text_generation_launcher: Shutting down shards

@zoltan-fedor

Looking at it in more detail, this is the same issue as `RuntimeError: weight shared.weight does not exist` in #541.

@TalhaUusuf

I am also getting the same error with the Falcon-7B model, and with most of the MPT and Falcon models.

Model: falcon-7B

RuntimeError: weight lm_head.weight does not exist

@Narsil
Collaborator

Narsil commented Jul 6, 2023

The PR above should help. It's only a matter of weight naming.
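For context: in T5-style models the encoder embeddings, decoder embeddings, and often `lm_head` are all tied to the single `shared` embedding matrix, so a checkpoint may serialize only `shared.weight`, and a loader then has to resolve the other names back to it. The following is a hypothetical sketch of such alias resolution; the alias table and function are illustrative, not TGI's actual fix:

```python
# Hypothetical alias table for T5-style tied weights: several module
# names all refer to the one embedding matrix stored as "shared.weight".
TIED_ALIASES = {
    "encoder.embed_tokens.weight": ["shared.weight"],
    "decoder.embed_tokens.weight": ["shared.weight"],
    "lm_head.weight": ["shared.weight"],
}

def resolve_tensor_name(available, name):
    """Map a requested tensor name onto one actually present on disk.

    `available` is the set of tensor names found in the checkpoint.
    Falls back to known tied-weight aliases before giving up.
    """
    if name in available:
        return name
    for alias in TIED_ALIASES.get(name, ()):
        if alias in available:
            return alias
    raise RuntimeError(f"weight {name} does not exist")
```

A loader could call `resolve_tensor_name` before `get_sharded`/`get_tensor`, which would turn the hard `RuntimeError` seen above into a successful lookup of the tied tensor.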

OlivierDehaene pushed a commit that referenced this issue Jul 7, 2023
- Look at `transformers` base class to check for
  `_key_to_ignore_on_load_missing` or `_tied_weights` which are the
  standard attributes to select the keys to NOT save on disk (since they
  are ignored)

- Modified safetensors code (to be reflected in safetensors even if it's
  an internal function).
  
- Will not work for trust_remote_code=True repos (like santacoder).

Should help with :
#555
and : #501
and #556
and
#482 (comment)
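The attributes the commit message mentions live on the `transformers` model classes, and their exact names have varied across library versions (e.g. `_keys_to_ignore_on_load_missing`, `_tied_weights_keys`), so a robust check probes several candidates. A sketch, using a stand-in class since the real `T5ForConditionalGeneration` is not imported here:

```python
def tied_weight_keys(model_cls):
    """Collect the key patterns a transformers class marks as tied or
    safe to ignore when missing from a checkpoint.

    Attribute names differ between transformers versions, so probe the
    known candidates and merge whatever is defined.
    """
    keys = []
    for attr in (
        "_tied_weights_keys",
        "_tied_weights",
        "_keys_to_ignore_on_load_missing",
    ):
        value = getattr(model_cls, attr, None)
        if value:
            keys.extend(value)
    return keys

class FakeT5:
    # Stand-in for transformers.T5ForConditionalGeneration; the real
    # class declares similar patterns for its tied embedding weights.
    _keys_to_ignore_on_load_missing = [
        "encoder.embed_tokens.weight",
        "decoder.embed_tokens.weight",
        "lm_head.weight",
    ]
```

Keys collected this way are exactly the ones a safetensors conversion may legitimately drop from disk, so they should not be treated as fatal when missing.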
@zoltan-fedor

zoltan-fedor commented Jul 7, 2023

Thanks @Narsil, I have just tested it (with flan-t5-xl) and I can confirm that your PR (#561, which just got merged) has fixed this issue!
Thanks!

Narsil closed this as completed Jul 7, 2023
AIProphet added a commit to AIProphet/text-generation-inference that referenced this issue Jul 12, 2023
@r0n13

r0n13 commented Jul 13, 2023

Thanks @Narsil, it does work for me too with flan-t5, but I just tried with t5 and the problem seems to still occur.

2023-07-13T06:35:24.607879Z  INFO text_generation_launcher: Args { model_id: "t5-base", revision: None, sharded: None, num_shard: Some(1), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, hostname: "70904c856920", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: false }
2023-07-13T06:35:24.608058Z  INFO text_generation_launcher: Starting download process.
2023-07-13T06:35:28.958139Z  INFO download: text_generation_launcher: Download file: model.safetensors

2023-07-13T06:35:30.635704Z  INFO download: text_generation_launcher: Downloaded /data/models--t5-base/snapshots/fe6d9bf207cd3337512ca838a8b453f87a9178ef/model.safetensors in 0:00:01.

2023-07-13T06:35:30.635867Z  INFO download: text_generation_launcher: Download: [1/1] -- ETA: 0

2023-07-13T06:35:31.326113Z  INFO text_generation_launcher: Successfully downloaded weights.
2023-07-13T06:35:31.326314Z  INFO text_generation_launcher: Starting shard 0
2023-07-13T06:35:35.984848Z  WARN shard-manager: text_generation_launcher: We're not using custom kernels.
 rank=0
2023-07-13T06:35:41.274660Z ERROR shard-manager: text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 175, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 142, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 279, in get_model
    return T5Sharded(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py", line 70, in __init__
    model = T5ForConditionalGeneration(config, weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1035, in __init__
    self.lm_head = TensorParallelHead.load(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 194, in load
    weight = weights.get_tensor(f"{prefix}.weight")
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 64, in get_tensor
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 51, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 51, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist
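For context, the root cause is weight tying: in T5 the `lm_head` and the encoder/decoder embeddings all share one tensor (`shared`), and a safetensors checkpoint stores only a single copy, so the other names look "missing" to a loader that expects every key verbatim. The sketch below shows the loading-side workaround as a plain-Python illustration; the `TIED` map and `get_tensor` function are illustrative, not text-generation-inference's actual code.

```python
# Illustrative tie map for a T5-style checkpoint: all three names
# point at the single serialized "shared.weight" tensor.
TIED = {
    "lm_head.weight": "shared.weight",
    "encoder.embed_tokens.weight": "shared.weight",
    "decoder.embed_tokens.weight": "shared.weight",
}

def get_tensor(checkpoint, name):
    # Return the tensor directly if the checkpoint has it.
    if name in checkpoint:
        return checkpoint[name]
    # Otherwise fall back to the tensor this name is tied to.
    alias = TIED.get(name)
    if alias is not None and alias in checkpoint:
        return checkpoint[alias]
    raise RuntimeError(f"weight {name} does not exist")

# A checkpoint where only the shared embedding was serialized.
ckpt = {"shared.weight": [[0.1, 0.2], [0.3, 0.4]]}
assert get_tensor(ckpt, "lm_head.weight") is ckpt["shared.weight"]
```

Without a fallback of this kind, the lookup for `lm_head.weight` raises exactly the `RuntimeError` in the traceback above.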

Try updating Docker and running the latest image:

docker pull ghcr.io/huggingface/text-generation-inference:latest
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:latest --model-id google/flan-t5-base --num-shard 2


r0n13 commented Jul 13, 2023

Thanks @chumpblocckami - I did, and it works well with flan-t5, but not with the 'regular' t5. You can reproduce with:

docker run --shm-size 1g  \
-p 8080:80  \
--gpus all ghcr.io/huggingface/text-generation-inference:latest \
--model-id t5-base \
--num-shard 1


r0n13 commented Jul 14, 2023

Shall I create a separate issue for this?

@donglinz

The same issue happens for OPT

RuntimeError: weight model.decoder.embed_tokens.weight does not exist

@saar-eliad

Got it too on server version 1.0.3 (using Docker), and also with latest.
It fails with facebook/opt-125m,
but works for me with another model - gpt2.

verdant621 added a commit to verdant621/text-generation-inference that referenced this issue Oct 19, 2023
- Look at `transformers` base class to check for
  `_key_to_ignore_on_load_missing` or `_tied_weights` which are the
  standard attributes to select the keys to NOT save on disk (since they
  are ignored)

- Modified safetensors code (to be reflected in safetensors even if it's
  an internal function).
  
- Will not work for trust_remote_code=True repos (like santacoder).

Should help with :
huggingface/text-generation-inference#555
and : huggingface/text-generation-inference#501
and huggingface/text-generation-inference#556
and
huggingface/text-generation-inference#482 (comment)
cr313 added a commit to cr313/text-generation-inference-load-test that referenced this issue Apr 19, 2024