I have already tried both of these images:
ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest
and
ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1
/# docker run --name tei-embedding -p 8080:80 -v /:/data -e MAX_CLIENT_BATCH_SIZE=4096 ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest --pooling last-token --model-id /data/Qwen3-Embedding-0.6B-ONNX
2025-09-08T00:38:24.100856Z INFO text_embeddings_router: router/src/main.rs:203: Args { model_id: "/dat*/*****-*********-*.**-*NNX", revision: None, tokenization_workers: None, dtype: None, pooling: Some(LastToken), max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 4096, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "993cfff8d5ab", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-09-08T00:38:24.444390Z WARN text_embeddings_router: router/src/lib.rs:190: Could not find a Sentence Transformers config
2025-09-08T00:38:24.444410Z INFO text_embeddings_router: router/src/lib.rs:194: Maximum number of tokens per request: 32768
2025-09-08T00:38:24.444529Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 8 tokenization workers
2025-09-08T00:38:24.771051Z INFO text_embeddings_router: router/src/lib.rs:242: Starting model backend
2025-09-08T00:38:24.773832Z INFO text_embeddings_backend_python::management: backends/python/src/management.rs:68: Starting Python backend
2025-09-08T00:38:31.867929Z INFO python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:37: backend device: cpu
File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/typer/core.py", line 716, in main
return _main(
File "/usr/local/lib/python3.11/dist-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.11/dist-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/usr/src/backends/python/server/text_embeddings_server/cli.py", line 51, in serve
server.serve(model_path, dtype, uds_path, pool)
File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 92, in serve
asyncio.run(serve_inner(model_path, dtype))
File "/usr/lib/python3.11/asyncio/runners.py", line 188, in run
return runner.run(main)
File "/usr/lib/python3.11/asyncio/runners.py", line 120, in run
return self._loop.run_until_complete(task)
File "/usr/lib/python3.11/asyncio/base_events.py", line 637, in run_until_complete
self.run_forever()
File "/usr/lib/python3.11/asyncio/base_events.py", line 604, in run_forever
self._run_once()
File "/usr/lib/python3.11/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/usr/lib/python3.11/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 61, in serve_inner
model = get_model(model_path, dtype, pool)
File "/usr/src/backends/python/server/text_embeddings_server/models/__init__.py", line 137, in get_model
return create_model(DefaultModel, model_path, device, datatype, pool)
File "/usr/src/backends/python/server/text_embeddings_server/models/__init__.py", line 53, in create_model
model_handle = model_class(
File "/usr/src/backends/python/server/text_embeddings_server/models/default_model.py", line 26, in __init__
AutoModel.from_pretrained(model_path, trust_remote_code=trust_remote)
File "/usr/local/lib/python3.11/dist-packages/transformers/models/auto/auto_factory.py", line 571, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.11/dist-packages/transformers/modeling_utils.py", line 279, in _wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/transformers/modeling_utils.py", line 4260, in from_pretrained
checkpoint_files, sharded_metadata = _get_resolved_checkpoint_files(
File "/usr/local/lib/python3.11/dist-packages/transformers/modeling_utils.py", line 952, in _get_resolved_checkpoint_files
raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /data/Qwen3-Embedding-0.6B-ONNX.
2025-09-08T00:38:32.767525Z ERROR text_embeddings_backend: backends/src/lib.rs:474: Could not start Python backend: Could not start backend: Python backend failed to start
Error: Could not create backend
Caused by:
Could not start backend: Could not start a suitable backend
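The :cpu-ipex-latest image falls back to the Python backend, which (per the traceback above) loads weights through transformers' AutoModel.from_pretrained. That loader only accepts PyTorch/safetensors-style checkpoints, hence the OSError: an ONNX-only export ships no pytorch_model.bin or model.safetensors. A minimal sketch of a workaround, assuming the safetensors variant of the model lives in the sibling hub repo Qwen/Qwen3-Embedding-0.6B (that repo id is an assumption, not taken from the logs):

/# docker run --name tei-embedding -p 8080:80 -v /data:/data -e MAX_CLIENT_BATCH_SIZE=4096 ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest --pooling last-token --model-id Qwen/Qwen3-Embedding-0.6B

The cpu-1.8.1 image takes a different path and fails in the ORT backend instead: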
/# docker run --name tei-embedding -p 8080:80 -v /:/data -e MAX_CLIENT_BATCH_SIZE=4096 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 --pooling last-token --model-id /data/Qwen3-Embedding-0.6B-ONNX
2025-09-08T00:37:11.549751Z INFO text_embeddings_router: router/src/main.rs:203: Args { model_id: "/dat*/*****-*********-*.**-*NNX", revision: None, tokenization_workers: None, dtype: None, pooling: Some(LastToken), max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 4096, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "1f80d7fce660", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-09-08T00:37:11.898502Z WARN text_embeddings_router: router/src/lib.rs:190: Could not find a Sentence Transformers config
2025-09-08T00:37:11.898517Z INFO text_embeddings_router: router/src/lib.rs:194: Maximum number of tokens per request: 32768
2025-09-08T00:37:11.898638Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 8 tokenization workers
2025-09-08T00:37:12.222672Z INFO text_embeddings_router: router/src/lib.rs:242: Starting model backend
2025-09-08T00:37:12.696602Z ERROR text_embeddings_backend: backends/src/lib.rs:402: Could not start ORT backend: Could not start backend: Deserialize tensor model.layers.14.attn.v_proj.MatMul.weight failed.tensorprotoutils.cc:1091 GetExtDataFromTensorProto External initializer: model.layers.14.attn.v_proj.MatMul.weight offset: 1531400192 size to read: 4194304 given file_length: 135 are out of bounds or can not be read in full.
2025-09-08T00:37:12.698175Z ERROR text_embeddings_backend: backends/src/lib.rs:448: Could not start Candle backend: Could not start backend: No such file or directory (os error 2)
Error: Could not create backend
Caused by:
Could not start backend: Could not start a suitable backend
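The ORT backend does understand ONNX, but the deserialize error above reports file_length: 135 for an external initializer file from which a single tensor of 4194304 bytes should be read. A weights file of roughly 130 bytes is the classic signature of an unresolved Git LFS pointer rather than the real payload (the Candle error that follows is expected, since no model.safetensors exists in the directory either). A quick check along those lines; the file name model.onnx_data and its location under onnx/ are assumptions about this repo's layout:

/# head -c 200 /data/Qwen3-Embedding-0.6B-ONNX/onnx/model.onnx_data
/# git -C /data/Qwen3-Embedding-0.6B-ONNX lfs pull

If the head output starts with "version https://git-lfs.github.com/spec/v1", the local clone only contains LFS pointers, and git lfs pull (or re-downloading the repo with huggingface-cli) should fetch the actual tensors.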