
RuntimeError: weight encoder.embed_tokens.weight does not exist #556

Closed
chumpblocckami opened this issue Jul 6, 2023 · 11 comments

@chumpblocckami

After running:

docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:0.9 --model-id google/flan-t5-small --num-shard 1

I receive:

RuntimeError: weight encoder.embed_tokens.weight does not exist

I tried multiple small models, but every one raises the same issue.

Any tips?

Thanks

@zoltan-fedor

zoltan-fedor commented Jul 6, 2023

Same issue here with flan-t5-xl.
I am using v0.9.1 on EKS.

Full startup log below:

{"timestamp":"2023-07-06T19:14:42.852088Z","level":"INFO","fields":{"message":"Args { model_id: \"google/flan-t5-xl\", revision: None, sharded: None, num_shard: Some(1), quantize: Some(Bitsandbytes), dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, hostname: \"flan-t5-xl-64959c4d74-qs64q\", port: 80, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: false }"},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:14:42.852211Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:14:44.549952Z","level":"WARN","fields":{"message":"No safetensors weights found for model google/flan-t5-xl at revision None. Downloading PyTorch weights.\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:14:44.604291Z","level":"INFO","fields":{"message":"Download file: pytorch_model-00001-of-00002.bin\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:03.211958Z","level":"INFO","fields":{"message":"Downloaded /data/models--google--flan-t5-xl/snapshots/53fd1e22aa944eee1fd336f9aee8a437e01676ce/pytorch_model-00001-of-00002.bin in 0:00:18.\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:03.212049Z","level":"INFO","fields":{"message":"Download: [1/2] -- ETA: 0:00:18\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:03.212305Z","level":"INFO","fields":{"message":"Download file: pytorch_model-00002-of-00002.bin\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:09.970978Z","level":"INFO","fields":{"message":"Downloaded /data/models--google--flan-t5-xl/snapshots/53fd1e22aa944eee1fd336f9aee8a437e01676ce/pytorch_model-00002-of-00002.bin in 0:00:06.\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:09.971051Z","level":"INFO","fields":{"message":"Download: [2/2] -- ETA: 0\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:15:09.971146Z","level":"WARN","fields":{"message":"No safetensors weights found for model google/flan-t5-xl at revision None. Converting PyTorch weights to safetensors.\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:16:15.141453Z","level":"INFO","fields":{"message":"Convert: [1/2] -- Took: 0:01:05.169846\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:16:24.913801Z","level":"INFO","fields":{"message":"Convert: [2/2] -- Took: 0:00:09.772093\n"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2023-07-06T19:16:25.262110Z","level":"INFO","fields":{"message":"Successfully downloaded weights."},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:16:25.262659Z","level":"INFO","fields":{"message":"Starting shard 0"},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:16:29.284643Z","level":"WARN","fields":{"message":"We're not using custom kernels.\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2023-07-06T19:16:30.102714Z","level":"ERROR","fields":{"message":"Error when initializing model\nTraceback (most recent call last):\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py\", line 1005, in __init__\n    self.shared = TensorParallelEmbedding(prefix=\"shared\", weights=weights)\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py\", line 280, in __init__\n    weight = weights.get_sharded(f\"{prefix}.weight\", dim=0)\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 73, in get_sharded\n    filename, tensor_name = self.get_filename(tensor_name)\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 49, in get_filename\n    raise RuntimeError(f\"weight {tensor_name} does not exist\")\nRuntimeError: weight shared.weight does not exist\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n    sys.exit(app())\n  File \"/opt/conda/lib/python3.9/site-packages/typer/main.py\", line 311, in __call__\n    return get_command(self)(*args, **kwargs)\n  File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 1130, in __call__\n    return self.main(*args, **kwargs)\n  File \"/opt/conda/lib/python3.9/site-packages/typer/core.py\", line 778, in main\n    return _main(\n  File \"/opt/conda/lib/python3.9/site-packages/typer/core.py\", line 216, in _main\n    rv = self.invoke(ctx)\n  File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 1657, in invoke\n    return _process_result(sub_ctx.command.invoke(sub_ctx))\n  File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 1404, in invoke\n    return ctx.invoke(self.callback, **ctx.params)\n  File \"/opt/conda/lib/python3.9/site-packages/click/core.py\", line 760, in 
invoke\n    return __callback(*args, **kwargs)\n  File \"/opt/conda/lib/python3.9/site-packages/typer/main.py\", line 683, in wrapper\n    return callback(**use_params)  # type: ignore\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py\", line 78, in serve\n    server.serve(\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 166, in serve\n    asyncio.run(\n  File \"/opt/conda/lib/python3.9/asyncio/runners.py\", line 44, in run\n    return loop.run_until_complete(main)\n  File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 634, in run_until_complete\n    self.run_forever()\n  File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 601, in run_forever\n    self._run_once()\n  File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 1905, in _run_once\n    handle._run()\n  File \"/opt/conda/lib/python3.9/asyncio/events.py\", line 80, in _run\n    self._context.run(self._callback, *self._args)\n> File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 133, in serve_inner\n    model = get_model(\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py\", line 279, in get_model\n    return T5Sharded(\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py\", line 61, in __init__\n    model = T5ForConditionalGeneration(config, weights)\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py\", line 1007, in __init__\n    self.shared = TensorParallelEmbedding(\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py\", line 280, in __init__\n    weight = weights.get_sharded(f\"{prefix}.weight\", dim=0)\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 73, in get_sharded\n    filename, tensor_name = self.get_filename(tensor_name)\n  File 
\"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 49, in get_filename\n    raise RuntimeError(f\"weight {tensor_name} does not exist\")\nRuntimeError: weight encoder.embed_tokens.weight does not exist\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2023-07-06T19:16:30.667766Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:16:30.667795Z","level":"ERROR","fields":{"message":"Traceback (most recent call last):\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py\", line 1005, in __init__\n    self.shared = TensorParallelEmbedding(prefix=\"shared\", weights=weights)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py\", line 280, in __init__\n    weight = weights.get_sharded(f\"{prefix}.weight\", dim=0)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 73, in get_sharded\n    filename, tensor_name = self.get_filename(tensor_name)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 49, in get_filename\n    raise RuntimeError(f\"weight {tensor_name} does not exist\")\n\nRuntimeError: weight shared.weight does not exist\n\n\nDuring handling of the above exception, another exception occurred:\n\n\nTraceback (most recent call last):\n\n  File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n    sys.exit(app())\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py\", line 78, in serve\n    server.serve(\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 166, in serve\n    asyncio.run(\n\n  File \"/opt/conda/lib/python3.9/asyncio/runners.py\", line 44, in run\n    return loop.run_until_complete(main)\n\n  File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 647, in run_until_complete\n    return future.result()\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 133, in serve_inner\n    model = get_model(\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py\", line 279, in get_model\n    return T5Sharded(\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py\", line 61, in 
__init__\n    model = T5ForConditionalGeneration(config, weights)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py\", line 1007, in __init__\n    self.shared = TensorParallelEmbedding(\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py\", line 280, in __init__\n    weight = weights.get_sharded(f\"{prefix}.weight\", dim=0)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 73, in get_sharded\n    filename, tensor_name = self.get_filename(tensor_name)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py\", line 49, in get_filename\n    raise RuntimeError(f\"weight {tensor_name} does not exist\")\n\nRuntimeError: weight encoder.embed_tokens.weight does not exist\n\n"},"target":"text_generation_launcher"}
{"timestamp":"2023-07-06T19:16:30.667823Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
Error: ShardCannotStart

With sharding disabled (same error, but easier to read):

2023-07-06T20:10:55.686866Z  INFO text_generation_launcher: Args { model_id: "google/flan-t5-xl", revision: None, sharded: Some(false), num_shard: Some(1), quantize: Some(Bitsandbytes), dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: false }
2023-07-06T20:10:55.686971Z  INFO text_generation_launcher: Starting download process.
2023-07-06T20:10:57.341715Z  WARN download: text_generation_launcher: No safetensors weights found for model google/flan-t5-xl at revision None. Downloading PyTorch weights.

2023-07-06T20:10:57.417081Z  INFO download: text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin

2023-07-06T20:11:18.169311Z  INFO download: text_generation_launcher: Downloaded /data/models--google--flan-t5-xl/snapshots/53fd1e22aa944eee1fd336f9aee8a437e01676ce/pytorch_model-00001-of-00002.bin in 0:00:20.

2023-07-06T20:11:18.169386Z  INFO download: text_generation_launcher: Download: [1/2] -- ETA: 0:00:20

2023-07-06T20:11:18.169595Z  INFO download: text_generation_launcher: Download file: pytorch_model-00002-of-00002.bin

2023-07-06T20:11:25.050713Z  INFO download: text_generation_launcher: Downloaded /data/models--google--flan-t5-xl/snapshots/53fd1e22aa944eee1fd336f9aee8a437e01676ce/pytorch_model-00002-of-00002.bin in 0:00:06.

2023-07-06T20:11:25.050803Z  INFO download: text_generation_launcher: Download: [2/2] -- ETA: 0

2023-07-06T20:11:25.050899Z  WARN download: text_generation_launcher: No safetensors weights found for model google/flan-t5-xl at revision None. Converting PyTorch weights to safetensors.

2023-07-06T20:12:30.361334Z  INFO download: text_generation_launcher: Convert: [1/2] -- Took: 0:01:05.309101

2023-07-06T20:12:40.112889Z  INFO download: text_generation_launcher: Convert: [2/2] -- Took: 0:00:09.752118

2023-07-06T20:12:42.517781Z  INFO text_generation_launcher: Successfully downloaded weights.
2023-07-06T20:12:42.518379Z  INFO text_generation_launcher: Starting shard 0
2023-07-06T20:12:50.458364Z  WARN shard-manager: text_generation_launcher: We're not using custom kernels.
 rank=0
2023-07-06T20:12:51.265848Z ERROR shard-manager: text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1005, in __init__
    self.shared = TensorParallelEmbedding(prefix="shared", weights=weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 268, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight shared.weight does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 166, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 133, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 274, in get_model
    return T5Sharded(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py", line 61, in __init__
    model = T5ForConditionalGeneration(config, weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1007, in __init__
    self.shared = TensorParallelEmbedding(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 268, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight encoder.embed_tokens.weight does not exist
 rank=0
2023-07-06T20:12:51.827130Z ERROR text_generation_launcher: Shard 0 failed to start
2023-07-06T20:12:51.827155Z ERROR text_generation_launcher: Traceback (most recent call last):

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1005, in __init__
    self.shared = TensorParallelEmbedding(prefix="shared", weights=weights)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 268, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight shared.weight does not exist


Error: ShardCannotStart
During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 166, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 133, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 274, in get_model
    return T5Sharded(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py", line 61, in __init__
    model = T5ForConditionalGeneration(config, weights)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1007, in __init__
    self.shared = TensorParallelEmbedding(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 268, in __init__
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 73, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight encoder.embed_tokens.weight does not exist


2023-07-06T20:12:51.827178Z  INFO text_generation_launcher: Shutting down shards

@zoltan-fedor

Looking at it in more detail, this is the same issue as `RuntimeError: weight shared.weight does not exist` in #541.

@TalhaUusuf

I am also getting the same error with the Falcon-7B model, and with most of the MPT and Falcon models.

Model: falcon-7B

RuntimeError: weight lm_head.weight does not exist

@Narsil
Collaborator

Narsil commented Jul 6, 2023

The PR above should help. It's only a matter of weight naming.
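For context: in T5-style models the encoder embeddings, decoder embeddings, and often `lm_head` are all tied to the single `shared` embedding matrix, so a checkpoint may serialize only `shared.weight`, and a loader then has to resolve the other names back to it. The following is a hypothetical sketch of such alias resolution; the alias table and function are illustrative, not TGI's actual fix:

```python
# Hypothetical alias table for T5-style tied weights: several module
# names all refer to the one embedding matrix stored as "shared.weight".
TIED_ALIASES = {
    "encoder.embed_tokens.weight": ["shared.weight"],
    "decoder.embed_tokens.weight": ["shared.weight"],
    "lm_head.weight": ["shared.weight"],
}

def resolve_tensor_name(available, name):
    """Map a requested tensor name onto one actually present on disk.

    `available` is the set of tensor names found in the checkpoint.
    Falls back to known tied-weight aliases before giving up.
    """
    if name in available:
        return name
    for alias in TIED_ALIASES.get(name, ()):
        if alias in available:
            return alias
    raise RuntimeError(f"weight {name} does not exist")
```

A loader could call `resolve_tensor_name` before `get_sharded`/`get_tensor`, which would turn the hard `RuntimeError` seen above into a successful lookup of the tied tensor.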

OlivierDehaene pushed a commit that referenced this issue Jul 7, 2023
- Look at `transformers` base class to check for
  `_key_to_ignore_on_load_missing` or `_tied_weights` which are the
  standard attributes to select the keys to NOT save on disk (since they
  are ignored)

- Modified safetensors code (to be reflected in safetensors even if it's
  an internal function).
  
- Will not work for trust_remote_code=True repos (like santacoder).

Should help with :
#555
and : #501
and #556
and
#482 (comment)
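The attributes the commit message mentions live on the `transformers` model classes, and their exact names have varied across library versions (e.g. `_keys_to_ignore_on_load_missing`, `_tied_weights_keys`), so a robust check probes several candidates. A sketch, using a stand-in class since the real `T5ForConditionalGeneration` is not imported here:

```python
def tied_weight_keys(model_cls):
    """Collect the key patterns a transformers class marks as tied or
    safe to ignore when missing from a checkpoint.

    Attribute names differ between transformers versions, so probe the
    known candidates and merge whatever is defined.
    """
    keys = []
    for attr in (
        "_tied_weights_keys",
        "_tied_weights",
        "_keys_to_ignore_on_load_missing",
    ):
        value = getattr(model_cls, attr, None)
        if value:
            keys.extend(value)
    return keys

class FakeT5:
    # Stand-in for transformers.T5ForConditionalGeneration; the real
    # class declares similar patterns for its tied embedding weights.
    _keys_to_ignore_on_load_missing = [
        "encoder.embed_tokens.weight",
        "decoder.embed_tokens.weight",
        "lm_head.weight",
    ]
```

Keys collected this way are exactly the ones a safetensors conversion may legitimately drop from disk, so they should not be treated as fatal when missing.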
@zoltan-fedor

zoltan-fedor commented Jul 7, 2023

Thanks @Narsil, I have just tested it (with flan-t5-xl) and I can confirm that your PR (#561, which just got merged) has fixed this issue!
Thanks!

Narsil closed this as completed Jul 7, 2023
AIProphet added a commit to AIProphet/text-generation-inference that referenced this issue Jul 12, 2023
@r0n13

r0n13 commented Jul 13, 2023

Thanks @Narsil, it does work for me too with flan-t5, but I just tried with t5 and the problem seems to still occur.

2023-07-13T06:35:24.607879Z  INFO text_generation_launcher: Args { model_id: "t5-base", revision: None, sharded: None, num_shard: Some(1), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, hostname: "70904c856920", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: false }
2023-07-13T06:35:24.608058Z  INFO text_generation_launcher: Starting download process.
2023-07-13T06:35:28.958139Z  INFO download: text_generation_launcher: Download file: model.safetensors

2023-07-13T06:35:30.635704Z  INFO download: text_generation_launcher: Downloaded /data/models--t5-base/snapshots/fe6d9bf207cd3337512ca838a8b453f87a9178ef/model.safetensors in 0:00:01.

2023-07-13T06:35:30.635867Z  INFO download: text_generation_launcher: Download: [1/1] -- ETA: 0

2023-07-13T06:35:31.326113Z  INFO text_generation_launcher: Successfully downloaded weights.
2023-07-13T06:35:31.326314Z  INFO text_generation_launcher: Starting shard 0
2023-07-13T06:35:35.984848Z  WARN shard-manager: text_generation_launcher: We're not using custom kernels.
 rank=0
2023-07-13T06:35:41.274660Z ERROR shard-manager: text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 175, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 142, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 279, in get_model
    return T5Sharded(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/t5.py", line 70, in __init__
    model = T5ForConditionalGeneration(config, weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/t5_modeling.py", line 1035, in __init__
    self.lm_head = TensorParallelHead.load(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 194, in load
    weight = weights.get_tensor(f"{prefix}.weight")
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 64, in get_tensor
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 51, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 51, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist
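For context, the root cause is weight tying: in T5 the `lm_head` and the encoder/decoder embeddings all share one tensor (`shared`), and a safetensors checkpoint stores only a single copy, so the other names look "missing" to a loader that expects every key verbatim. The sketch below shows the loading-side workaround as a plain-Python illustration; the `TIED` map and `get_tensor` function are illustrative, not text-generation-inference's actual code.

```python
# Illustrative tie map for a T5-style checkpoint: all three names
# point at the single serialized "shared.weight" tensor.
TIED = {
    "lm_head.weight": "shared.weight",
    "encoder.embed_tokens.weight": "shared.weight",
    "decoder.embed_tokens.weight": "shared.weight",
}

def get_tensor(checkpoint, name):
    # Return the tensor directly if the checkpoint has it.
    if name in checkpoint:
        return checkpoint[name]
    # Otherwise fall back to the tensor this name is tied to.
    alias = TIED.get(name)
    if alias is not None and alias in checkpoint:
        return checkpoint[alias]
    raise RuntimeError(f"weight {name} does not exist")

# A checkpoint where only the shared embedding was serialized.
ckpt = {"shared.weight": [[0.1, 0.2], [0.3, 0.4]]}
assert get_tensor(ckpt, "lm_head.weight") is ckpt["shared.weight"]
```

Without a fallback of this kind, the lookup for `lm_head.weight` raises exactly the `RuntimeError` in the traceback above.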

Try updating Docker and running the latest image:

docker pull ghcr.io/huggingface/text-generation-inference:latest
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:latest --model-id google/flan-t5-base --num-shard 2


r0n13 commented Jul 13, 2023

Thanks @chumpblocckami - I did, and it works well with flan-t5, but not with the 'regular' t5. You can reproduce with:

docker run --shm-size 1g  \
-p 8080:80  \
--gpus all ghcr.io/huggingface/text-generation-inference:latest \
--model-id t5-base \
--num-shard 1


r0n13 commented Jul 14, 2023

Shall I create a separate issue for this?

@donglinz

The same issue happens for OPT

RuntimeError: weight model.decoder.embed_tokens.weight does not exist

@saar-eliad

Got it too on server version 1.0.3 (using Docker), and also with latest.
It fails with facebook/opt-125m,
but works for me with another model - gpt2.

verdant621 added a commit to verdant621/text-generation-inference that referenced this issue Oct 19, 2023
- Look at `transformers` base class to check for
  `_key_to_ignore_on_load_missing` or `_tied_weights` which are the
  standard attributes to select the keys to NOT save on disk (since they
  are ignored)

- Modified safetensors code (to be reflected in safetensors even if it's
  an internal function).
  
- Will not work for trust_remote_code=True repos (like santacoder).

Should help with :
huggingface/text-generation-inference#555
and : huggingface/text-generation-inference#501
and huggingface/text-generation-inference#556
and
huggingface/text-generation-inference#482 (comment)
cr313 added a commit to cr313/text-generation-inference-load-test that referenced this issue Apr 19, 2024