[BUG]: quantization_encoding 'SupportedEncoding.float32' is not supported by the repo #5592

@2sem

Description

Bug description

I am trying to run a model on my M4 MacBook following the tutorial, but I am stuck on this error.

I don't know what I should do next. Is it impossible to run the model on a MacBook?
Should I follow https://docs.modular.com/max/tutorials/serve-custom-model-architectures instead?
Or is this a Hugging Face issue?

I am asking here because I don't know where else to ask. :)
Please point me to another helpdesk if this is the wrong place.

I heard that Mojo is faster than Python, so I thought Modular would be faster than Ollama,
and I am trying to replace Ollama with Modular.

Errors

  1. max serve --model meta-llama/Llama-3.1-8B-Instruct
  • ValueError: The encoding 'SupportedEncoding.bfloat16' is not compatible with the selected device type 'cpu'.
  2. max serve --model meta-llama/Llama-3.1-8B-Instruct --quantization-encoding float32
  • ValueError: quantization_encoding 'SupportedEncoding.float32' is not supported by the repo 'meta-llama/Llama-3.1-8B'

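For what it's worth, the first error message lists several encodings it considers available for this model, including the GGUF-style quantizations (q4_k, q4_0, q6_k), which usually target CPU. A possible next attempt would be one of those — though I have not verified that this repo actually ships weights in these encodings, so this is only a guess based on the error text:

```shell
# Retry with a CPU-oriented quantization encoding taken from the
# error's own suggestion list (unverified whether the Instruct repo
# provides q4_k weights):
max serve --model meta-llama/Llama-3.1-8B-Instruct --quantization-encoding q4_k
```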
Full Logs

  1. max serve --model meta-llama/Llama-3.1-8B-Instruct
13:04:31.772 INFO: Metrics initialized.
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
13:04:31.787 INFO: No GPUs available, falling back to CPU
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
13:04:31.788 INFO: No GPUs available, falling back to CPU
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
Traceback (most recent call last):
  File "/Users/LYJ/Applications/MAX/.venv/MAX/bin/max", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1485, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1406, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1873, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 85, in invoke
    return super().invoke(ctx)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1269, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 824, in invoke
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/cli/config.py", line 285, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 191, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 259, in cli_serve
    pipeline_config = PipelineConfig(**config_kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 379, in __init__
    self.resolve()
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 576, in resolve
    self._validate_and_resolve_remaining_pipeline_config(
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 748, in _validate_and_resolve_remaining_pipeline_config
    model_config.validate_and_resolve_with_resolved_quantization_encoding(
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/model_config.py", line 433, in validate_and_resolve_with_resolved_quantization_encoding
    self._validate_quantization_encoding_device_compatibility(
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/model_config.py", line 646, in _validate_quantization_encoding_device_compatibility
    raise ValueError(msg)
ValueError: The encoding 'SupportedEncoding.bfloat16' is not compatible with the selected device type 'cpu'.

You have two options to resolve this:
1. Use a different device
2. Use a different encoding (encodings available for this model: SupportedEncoding.gptq, SupportedEncoding.q4_k, SupportedEncoding.q4_0, SupportedEncoding.q6_k, SupportedEncoding.float32, SupportedEncoding.bfloat16, SupportedEncoding.float8_e4m3fn)
  2. max serve --model meta-llama/Llama-3.1-8B-Instruct --quantization-encoding float32
13:06:50.066 INFO: Metrics initialized.
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
13:06:50.084 INFO: No GPUs available, falling back to CPU
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
13:06:50.084 INFO: No GPUs available, falling back to CPU
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
WARNING: accelerator_count() returns 0 on Apple devices. While Mojo now supports Apple GPUs, that support has not been enabled in MAX and Python APIs yet
Traceback (most recent call last):
  File "/Users/LYJ/Applications/MAX/.venv/MAX/bin/max", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1485, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1406, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1873, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 85, in invoke
    return super().invoke(ctx)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 1269, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/click/core.py", line 824, in invoke
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/cli/config.py", line 285, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 191, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/entrypoints/pipelines.py", line 259, in cli_serve
    pipeline_config = PipelineConfig(**config_kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 379, in __init__
    self.resolve()
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 576, in resolve
    self._validate_and_resolve_remaining_pipeline_config(
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/config.py", line 735, in _validate_and_resolve_remaining_pipeline_config
    model_config.validate_and_resolve_quantization_encoding_weight_path(
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/model_config.py", line 389, in validate_and_resolve_quantization_encoding_weight_path
    self._validate_and_resolve_with_given_quantization_encoding(
  File "/Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages/max/pipelines/lib/model_config.py", line 549, in _validate_and_resolve_with_given_quantization_encoding
    raise ValueError(msg)
ValueError: quantization_encoding 'SupportedEncoding.float32' is not supported by the repo 'meta-llama/Llama-3.1-8B'

Steps to reproduce

  1. python -m venv .venv/MAX
  2. source .venv/MAX/bin/activate
  3. pip install modular --extra-index-url https://modular.gateway.scarf.sh/simple/
  4. export HF_TOKEN="hf_…"
  5. max serve --model meta-llama/Llama-3.1-8B-Instruct
  6. max serve --model meta-llama/Llama-3.1-8B-Instruct --quantization-encoding float32

System information

  1. pip list
Package                                  Version
---------------------------------------- -----------
aiofiles                                 25.1.0
aiohappyeyeballs                         2.6.1
aiohttp                                  3.13.2
aiosignal                                1.4.0
annotated-doc                            0.0.4
annotated-types                          0.7.0
anyio                                    4.11.0
asgiref                                  3.10.0
attrs                                    25.4.0
certifi                                  2025.11.12
charset-normalizer                       3.4.4
click                                    8.3.1
datasets                                 4.4.1
dill                                     0.4.0
exceptiongroup                           1.3.0
fastapi                                  0.121.2
filelock                                 3.20.0
frozenlist                               1.8.0
fsspec                                   2025.10.0
gguf                                     0.17.1
googleapis-common-protos                 1.72.0
grpcio                                   1.76.0
h11                                      0.16.0
hf_transfer                              0.1.9
hf-xet                                   1.2.0
httpcore                                 1.0.9
httpx                                    0.28.1
huggingface-hub                          0.36.0
idna                                     3.11
importlib_metadata                       8.7.0
Jinja2                                   3.1.6
llguidance                               1.4.0
markdown-it-py                           4.0.0
MarkupSafe                               3.0.3
max                                      25.6.1
max-core                                 25.6.1
mblack                                   25.6.1
mdurl                                    0.1.2
modular                                  25.6.1
mojo                                     0.25.6.1
mojo-compiler                            0.25.6.1
mojo-lldb-libs                           0.25.6.1
msgspec                                  0.19.0
multidict                                6.7.0
multiprocess                             0.70.18
mypy_extensions                          1.1.0
numpy                                    2.3.5
nvidia-ml-py                             13.580.82
nvitop                                   1.6.0
opentelemetry-api                        1.35.0
opentelemetry-exporter-otlp-proto-common 1.35.0
opentelemetry-exporter-otlp-proto-http   1.35.0
opentelemetry-exporter-prometheus        0.56b0
opentelemetry-proto                      1.35.0
opentelemetry-sdk                        1.35.0
opentelemetry-semantic-conventions       0.56b0
packaging                                25.0
pandas                                   2.3.3
pathspec                                 0.12.1
pillow                                   12.0.0
pip                                      24.0
platformdirs                             4.5.0
prometheus_client                        0.23.1
propcache                                0.4.1
protobuf                                 6.31.1
psutil                                   7.1.3
pyarrow                                  22.0.0
pydantic                                 2.12.4
pydantic_core                            2.41.5
pydantic-settings                        2.12.0
Pygments                                 2.19.2
pyinstrument                             5.1.1
python-dateutil                          2.9.0.post0
python-dotenv                            1.2.1
python-json-logger                       4.0.0
pytz                                     2025.2
PyYAML                                   6.0.3
pyzmq                                    27.1.0
regex                                    2025.11.3
requests                                 2.32.5
rich                                     14.2.0
safetensors                              0.6.2
scipy                                    1.16.3
sentencepiece                            0.2.1
six                                      1.17.0
sniffio                                  1.3.1
sse-starlette                            3.0.3
starlette                                0.49.3
taskgroup                                0.2.2
tokenizers                               0.22.1
tqdm                                     4.67.1
transformers                             4.56.2
typing_extensions                        4.15.0
typing-inspection                        0.4.2
tzdata                                   2025.2
urllib3                                  2.5.0
uvicorn                                  0.38.0
uvloop                                   0.22.1
xxhash                                   3.6.0
yarl                                     1.22.0
zipp                                     3.23.0
  2. pip show max
Name: max
Version: 25.6.1
Summary: The Modular Accelerated Xecution (MAX) framework
Home-page: https://modular.com
Author: Modular Inc
Author-email: hello@modular.com
License: LicenseRef-MAX-Platform-Software-License
Location: /Users/LYJ/Applications/MAX/.venv/MAX/lib/python3.12/site-packages
Requires: max-core, numpy, typing-extensions
Required-by: modular

Metadata

Assignees

No one assigned

    Labels

    Needs Triage, bug, max
