
Introduce outlines.models.mlxlm #956

Merged 1 commit into outlines-dev:main on Jun 13, 2024

Conversation

lapp0 (Collaborator) commented Jun 11, 2024:

Fixes #918

Introduce new model: outlines.models.mlxlm

Details

  • Implements outlines.models.mlxlm
  • Uses model-independent outlines.processors logits processors for generate.regex and generate.text (only wired up for mlxlm for now; the same logits processors will be reused for transformers in Update the transformers integration #806)

Tests:

  • model_mlxlm tests are skipped if not on Apple Silicon
  • Introduces tests/generate/test_generate.py, which tests mlxlm generation (parametrized alongside transformers and llama-cpp; see the sketch below)
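
An illustrative sketch of that parametrization (the fixture and test names here are hypothetical, not the merged test code):

```python
import pytest
import outlines

@pytest.fixture(params=["transformers", "llamacpp", "mlxlm"])
def model(request):
    if request.param == "mlxlm":
        # Mirror the skip behavior above: mlxlm only runs on Apple Silicon.
        mlx = pytest.importorskip("mlx.core")
        if not mlx.metal.is_available():
            pytest.skip("mlxlm tests require Apple Silicon")
        return outlines.models.mlxlm("mlx-community/Qwen1.5-1.8B-Chat-4bit")
    ...  # load the transformers / llama.cpp equivalents here

def test_generate_text(model):
    generator = outlines.generate.text(model, outlines.samplers.greedy())
    assert isinstance(generator("hello", max_tokens=10), str)
```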

Performance

Using mlx-community/Qwen1.5-1.8B-Chat-4bit on a Mac Mini M2, with greedy sampling throughout (a rough reproduction sketch follows the list):

  • mlx-lm, no outlines: 52.7 tokens/second
  • outlines.generate.text: 44.0 tokens/second
  • outlines.generate.regex(model, "a{200}"): 51.68 tokens/second
  • outlines.generate.regex(model, ".{200}"): 27.5 tokens/second
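
A minimal sketch of how such throughput numbers can be reproduced (the benchmark harness itself isn't part of this PR; note that greedy decoding can stop at EOS before max_tokens, so count the real output tokens for a precise figure):

```python
import time
import outlines

model = outlines.models.mlxlm("mlx-community/Qwen1.5-1.8B-Chat-4bit")
generator = outlines.generate.text(model, outlines.samplers.greedy())

max_tokens = 200
start = time.perf_counter()
generator("hello", max_tokens=max_tokens)
elapsed = time.perf_counter() - start
print(f"~{max_tokens / elapsed:.1f} tokens/second")  # assumes all max_tokens were generated
```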

The core performance issue with outlines.generate.regex(model, ".{200}") is the need to convert a large list (~150,000 integers) into a tensor inside the logits processor on every step:

```python
allowed_tokens = self.fsm.get_next_instruction(self._fsm_state).tokens
allowed_tokens = torch.tensor(allowed_tokens, device=logits.device)
```

To mitigate, we can create a separate issue to ensure the FSM index stores tensors of token IDs rather than lists, so that self.fsm.get_next_instruction(self._fsm_state).tokens is already a tensor. A sketch of that direction follows.
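
A hypothetical per-state cache illustrating the idea (not part of this PR; it assumes only the FSM interface shown above):

```python
import torch

class AllowedTokensCache:
    """Hypothetical cache: do the list -> tensor conversion once per FSM
    state instead of once per generated token."""

    def __init__(self, fsm, device):
        self.fsm = fsm
        self.device = device
        self._cache = {}  # fsm_state -> tensor of allowed token IDs

    def get(self, fsm_state):
        if fsm_state not in self._cache:
            tokens = self.fsm.get_next_instruction(fsm_state).tokens
            self._cache[fsm_state] = torch.tensor(tokens, device=self.device)
        return self._cache[fsm_state]
```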

Misc

Smoke test

>>> import outlines
>>> model = outlines.models.mlxlm("mlx-community/Qwen1.5-1.8B-Chat-4bit")
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 73728.00it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>> generator = outlines.generate.text(model, outlines.samplers.greedy())
>>> print(generator("hello", max_tokens=100))
不断地更新中
1. 2022年12月17日,中国共产党第十九届中央委员会第六次全体会议通过了《中共中央关于党的百年奋斗重大成就和历史经验的决议》。决议指出,中国共产党百年奋斗的历史经验是()。
A. 坚持人民至上
B. 坚持理论创新
C. 坚持中国道路
D. 坚持制度自信
答案是ABCD。
>>> from mlx_lm import load, generate
>>> model, tokenizer = load("mlx-community/Qwen1.5-1.8B-Chat-4bit")
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 22550.02it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>> generate(model, tokenizer, prompt="hello", verbose=True)
不断地更新中
1. 2022年12月17日,中国共产党第十九届中央委员会第六次全体会议通过了《中共中央关于党的百年奋斗重大成就和历史经验的决议》。决议指出,中国共产党百年奋斗的历史经验是()。
A. 坚持人民至上
B. 坚持理论创新
C. 坚持中国道路
D. 坚持制度自信
答案是ABCD。

(The model free-associates a Chinese multiple-choice question from the bare "hello" prompt; the key point is that the outlines.generate.text output above matches the raw mlx_lm output token for token under greedy sampling.)

Testing Without Apple

I don't own any Apple Silicon devices. Here are some instructions in case anyone else wants to test with a cloud Mac Mini:

How to test outlines mlx

install homebrew


/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
(echo; echo 'eval "$(/opt/homebrew/bin/brew shellenv)"') >> /Users/m1/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

ensure we're using openssl in python

brew install openssl
brew install python

# BAD
# python3 -c "import ssl; print(ssl.OPENSSL_VERSION)"
# LibreSSL 2.8.3

export PATH="/usr/local/opt/openssl/bin:$PATH"
export LDFLAGS="-L/usr/local/opt/openssl/lib"
export CPPFLAGS="-I/usr/local/opt/openssl/include"

python3 -m venv myenv
source myenv/bin/activate

# GOOD
# python -c "import ssl; print(ssl.OPENSSL_VERSION)"
# OpenSSL 3.3.1 4 Jun 2024

install outlines and mlx_lm

pip install setuptools
pip install outlines
pip install mlx_lm
pip install torch
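
A quick sanity check before running anything (run inside the venv; assumes the steps above succeeded):

```python
# Both imports should succeed on Apple Silicon, and ssl should report
# OpenSSL rather than LibreSSL if the PATH/flags above took effect.
import ssl
import outlines
import mlx_lm

print(ssl.OPENSSL_VERSION)
print("outlines + mlx_lm imported OK")
```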

A review thread on the new module's top-level imports:

```python
from outlines.generate.api import GenerationParameters, SamplingParameters
from outlines.processors import BaseLogitsProcessor

try:
    # The try body was truncated in this view; per the discussion below,
    # it imports mlx so the module still loads when mlx isn't installed.
    import mlx_lm
except ImportError:
    pass
```
Member: Does that mean the user must have mlx installed, whether they want to use this integration or not?

lapp0 (Collaborator, Author): It will attempt to import, but the module will load fine if mlx isn't installed because the exception passes.

Member: I think it's cleaner to import the libraries directly in the methods/functions where they're used.

lapp0 (Collaborator, Author): Fixed!
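
A minimal sketch of that function-level import pattern (illustrative, not the exact code merged in this PR):

```python
def mlxlm(model_name: str):
    # Importing inside the constructor keeps `import outlines` working on
    # machines without mlx; the error surfaces only when mlxlm is used.
    try:
        import mlx_lm
    except ImportError as e:
        raise ImportError(
            "The `mlx_lm` package is required for outlines.models.mlxlm. "
            "Install it with `pip install mlx_lm` (Apple Silicon only)."
        ) from e
    model, tokenizer = mlx_lm.load(model_name)
    return model, tokenizer
```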

rlouf (Member) commented Jun 13, 2024:

Looks good, just one small comment on imports. Should be good to merge once the change has been made.

rlouf merged commit 18aaba1 into outlines-dev:main on Jun 13, 2024
7 checks passed
A later comment quoted this example:

```python
from outlines import models

model = models.mlxlm("mlx-community/mlx-community/Meta-Llama-3-8B-Instruct-8bit")
```

mlx-community is repeated twice
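
The corrected call drops the duplicated org prefix:

```python
from outlines import models

model = models.mlxlm("mlx-community/Meta-Llama-3-8B-Instruct-8bit")
```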

ChristianWeyer commented:

@lapp0 Do we have any means to see verbose information for MLX? Like seeing the request and response data to/from the model.

Successfully merging this pull request may close: mlx library integration (via mlx-lm) (#918).