[BUG]: quantization_encoding 'SupportedEncoding.float32' is not supported by the repo #5592
Closed
Labels: Needs Triage (issue needs to be routed/triaged to a particular team), bug (something isn't working), max
Description
Bug description
I am trying to run a model on my M4 MacBook following the tutorial, but I am stuck on this error and don't know what to do next. Is it impossible to run the model on a MacBook?
Should I follow https://docs.modular.com/max/tutorials/serve-custom-model-architectures instead, or is this a Hugging Face issue?
I am asking here because I don't know where else to ask. :)
Please point me to another helpdesk if there is a better place.
I heard that Mojo is faster than Python, so I thought Modular would be faster than Ollama, and I am trying to replace Ollama with Modular.
Errors
- max serve --model meta-llama/Llama-3.1-8B-Instruct
- ValueError: The encoding 'SupportedEncoding.bfloat16' is not compatible with the selected device type 'cpu'.
- max serve --model meta-llama/Llama-3.1-8B-Instruct --quantization-encoding float32
- ValueError: quantization_encoding 'SupportedEncoding.float32' is not supported by the repo 'meta-llama/Llama-3.1-8B'
Full Logs
- max serve --model meta-llama/Llama-3.1-8B-Instruct
13:04:31.772 INFO: Metrics initialized.
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
13:04:31.787 INFO: No GPUs available, falling back to CPU
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
13:04:31.788 INFO: No GPUs available, falling back to CPU
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
Traceback (most recent call last):
File "/Users/LYJ/Applications/MAX/.venv/MAX/bin/max", line 8, in <module>
sys.exit(main())
^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1485, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1406, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1873, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 85, in invoke
return super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1269, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 824, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/cli/config.py", line 285, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 191, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 259, in cli_serve
pipeline_config = PipelineConfig(**config_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 379, in __init__
self.resolve()
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 576, in resolve
self._validate_and_resolve_remaining_pipeline_config(
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 748, in _validate_and_resolve_remaining_pipeline_config
model_config.validate_and_resolve_with_resolved_quantization_encoding(
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/model_config.py", line 433, in validate_and_resolve_with_resolved_quantization_encoding
self._validate_quantization_encoding_device_compatibility(
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/model_config.py", line 646, in _validate_quantization_encoding_device_compatibility
raise ValueError(msg)
ValueError: The encoding 'SupportedEncoding.bfloat16' is not compatible with the selected device type 'cpu'.
You have two options to resolve this:
1. Use a different device
2. Use a different encoding (encodings available for this model: SupportedEncoding.gptq, SupportedEncoding.q4_k, SupportedEncoding.q4_0, SupportedEncoding.q6_k, SupportedEncoding.float32, SupportedEncoding.bfloat16, SupportedEncoding.float8_e4m3fn)
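For anyone hitting the same error on an Apple-silicon CPU, the selection logic the message implies can be sketched in plain Python. This is purely illustrative: the `gpu_only` set below is my assumption about which encodings need GPU kernels (the error itself confirms bfloat16 is CPU-incompatible), not something taken from the MAX source.

```python
# Encodings the error message lists as available for this model.
available = ["gptq", "q4_k", "q4_0", "q6_k", "float32", "bfloat16", "float8_e4m3fn"]

# Assumption (not from MAX source): encodings that typically require GPU kernels.
gpu_only = {"bfloat16", "float8_e4m3fn"}

# Candidates worth trying with --quantization-encoding when running on CPU.
cpu_candidates = [enc for enc in available if enc not in gpu_only]
print(cpu_candidates)  # quantized GGUF-style encodings plus float32
```

Note that even a CPU-compatible encoding can still fail with the second error ("not supported by the repo") if the chosen Hugging Face repo does not ship weights in that format, which appears to be what happened with float32 here.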
- max serve --model meta-llama/Llama-3.1-8B-Instruct --quantization-encoding float32
13:06:50.066 INFO: Metrics initialized.
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
13:06:50.084 INFO: No GPUs available, falling back to CPU
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
13:06:50.084 INFO: No GPUs available, falling back to CPU
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
Traceback (most recent call last):
File "/Users/LYJ/Applications/MAX/.venv/MAX/bin/max", line 8, in <module>
sys.exit(main())
^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1485, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1406, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1873, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 85, in invoke
return super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1269, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 824, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/cli/config.py", line 285, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 191, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 259, in cli_serve
pipeline_config = PipelineConfig(**config_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 379, in __init__
self.resolve()
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 576, in resolve
self._validate_and_resolve_remaining_pipeline_config(
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 735, in _validate_and_resolve_remaining_pipeline_config
model_config.validate_and_resolve_quantization_encoding_weight_path(
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/model_config.py", line 389, in validate_and_resolve_quantization_encoding_weight_path
self._validate_and_resolve_with_given_quantization_encoding(
File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/model_config.py", line 549, in _validate_and_resolve_with_given_quantization_encoding
raise ValueError(msg)
ValueError: quantization_encoding 'SupportedEncoding.float32' is not supported by the repo 'meta-llama/Llama-3.1-8B'
Steps to reproduce
- python -m venv .venv/MAX
- source .venv/MAX/bin/activate
- pip install modular --extra-index-url https://modular.gateway.scarf.sh/simple/
- export HF_TOKEN="hf_…"
- max serve --model meta-llama/Llama-3.1-8B-Instruct
- max serve --model meta-llama/Llama-3.1-8B-Instruct --quantization-encoding float32
System information
- pip list
Package Version
---------------------------------------- -----------
aiofiles 25.1.0
aiohappyeyeballs 2.6.1
aiohttp 3.13.2
aiosignal 1.4.0
annotated-doc 0.0.4
annotated-types 0.7.0
anyio 4.11.0
asgiref 3.10.0
attrs 25.4.0
certifi 2025.11.12
charset-normalizer 3.4.4
click 8.3.1
datasets 4.4.1
dill 0.4.0
exceptiongroup 1.3.0
fastapi 0.121.2
filelock 3.20.0
frozenlist 1.8.0
fsspec 2025.10.0
gguf 0.17.1
googleapis-common-protos 1.72.0
grpcio 1.76.0
h11 0.16.0
hf_transfer 0.1.9
hf-xet 1.2.0
httpcore 1.0.9
httpx 0.28.1
huggingface-hub 0.36.0
idna 3.11
importlib_metadata 8.7.0
Jinja2 3.1.6
llguidance 1.4.0
markdown-it-py 4.0.0
MarkupSafe 3.0.3
max 25.6.1
max-core 25.6.1
mblack 25.6.1
mdurl 0.1.2
modular 25.6.1
mojo 0.25.6.1
mojo-compiler 0.25.6.1
mojo-lldb-libs 0.25.6.1
msgspec 0.19.0
multidict 6.7.0
multiprocess 0.70.18
mypy_extensions 1.1.0
numpy 2.3.5
nvidia-ml-py 13.580.82
nvitop 1.6.0
opentelemetry-api 1.35.0
opentelemetry-exporter-otlp-proto-common 1.35.0
opentelemetry-exporter-otlp-proto-http 1.35.0
opentelemetry-exporter-prometheus 0.56b0
opentelemetry-proto 1.35.0
opentelemetry-sdk 1.35.0
opentelemetry-semantic-conventions 0.56b0
packaging 25.0
pandas 2.3.3
pathspec 0.12.1
pillow 12.0.0
pip 24.0
platformdirs 4.5.0
prometheus_client 0.23.1
propcache 0.4.1
protobuf 6.31.1
psutil 7.1.3
pyarrow 22.0.0
pydantic 2.12.4
pydantic_core 2.41.5
pydantic-settings 2.12.0
Pygments 2.19.2
pyinstrument 5.1.1
python-dateutil 2.9.0.post0
python-dotenv 1.2.1
python-json-logger 4.0.0
pytz 2025.2
PyYAML 6.0.3
pyzmq 27.1.0
regex 2025.11.3
requests 2.32.5
rich 14.2.0
safetensors 0.6.2
scipy 1.16.3
sentencepiece 0.2.1
six 1.17.0
sniffio 1.3.1
sse-starlette 3.0.3
starlette 0.49.3
taskgroup 0.2.2
tokenizers 0.22.1
tqdm 4.67.1
transformers 4.56.2
typing_extensions 4.15.0
typing-inspection 0.4.2
tzdata 2025.2
urllib3 2.5.0
uvicorn 0.38.0
uvloop 0.22.1
xxhash 3.6.0
yarl 1.22.0
zipp 3.23.0
- pip show max
Name: max
Version: 25.6.1
Summary: The Modular Accelerated Xecution (MAX) framework
Home-page: https://modular.com
Author: Modular Inc
Author-email: hello@modular.com
License: LicenseRef-MAX-Platform-Software-License
Location: /Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages
Requires: max-core, numpy, typing-extensions
Required-by: modular