
Introduce outlines.models.mlxlm #956

Merged 1 commit into outlines-dev:main on Jun 13, 2024

Conversation

lapp0 (Collaborator) commented Jun 11, 2024:

Fixes #918

Introduce new model: outlines.models.mlxlm

Details

  • Implements outlines.models.mlxlm
  • Uses model-independent outlines.processors logits processors for generate.regex and generate.text (only wired up for mlxlm for now; the same logits processors will be reused for transformers in Update the transformers integration #806)

Tests:

  • model_mlxlm tests are skipped if not on Apple Silicon
  • Introduces tests/generate/test_generate.py, which tests mlxlm generation (parametrized alongside transformers and llama-cpp; see the sketch below)
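
An illustrative sketch of that parametrization (the fixture and test names here are hypothetical, not the merged test code):

```python
import pytest
import outlines

@pytest.fixture(params=["transformers", "llamacpp", "mlxlm"])
def model(request):
    if request.param == "mlxlm":
        # Mirror the skip behavior above: mlxlm only runs on Apple Silicon.
        mlx = pytest.importorskip("mlx.core")
        if not mlx.metal.is_available():
            pytest.skip("mlxlm tests require Apple Silicon")
        return outlines.models.mlxlm("mlx-community/Qwen1.5-1.8B-Chat-4bit")
    ...  # load the transformers / llama.cpp equivalents here

def test_generate_text(model):
    generator = outlines.generate.text(model, outlines.samplers.greedy())
    assert isinstance(generator("hello", max_tokens=10), str)
```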

Performance

Using mlx-community/Qwen1.5-1.8B-Chat-4bit on a Mac Mini M2, with greedy sampling throughout (a rough reproduction sketch follows the list):

  • mlx-lm, no outlines: 52.7 tokens/second
  • outlines.generate.text: 44.0 tokens/second
  • outlines.generate.regex(model, "a{200}"): 51.68 tokens/second
  • outlines.generate.regex(model, ".{200}"): 27.5 tokens/second
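
A minimal sketch of how such throughput numbers can be reproduced (the benchmark harness itself isn't part of this PR; note that greedy decoding can stop at EOS before max_tokens, so count the real output tokens for a precise figure):

```python
import time
import outlines

model = outlines.models.mlxlm("mlx-community/Qwen1.5-1.8B-Chat-4bit")
generator = outlines.generate.text(model, outlines.samplers.greedy())

max_tokens = 200
start = time.perf_counter()
generator("hello", max_tokens=max_tokens)
elapsed = time.perf_counter() - start
print(f"~{max_tokens / elapsed:.1f} tokens/second")  # assumes all max_tokens were generated
```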

The core performance issue with outlines.generate.regex(model, ".{200}") is the need to convert a large list (~150,000 integers) into a tensor inside the logits processor on every step:

```python
allowed_tokens = self.fsm.get_next_instruction(self._fsm_state).tokens
allowed_tokens = torch.tensor(allowed_tokens, device=logits.device)
```

To mitigate, we can create a separate issue to ensure the FSM index stores tensors of token IDs rather than lists, so that self.fsm.get_next_instruction(self._fsm_state).tokens is already a tensor. A sketch of that direction follows.
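
A hypothetical per-state cache illustrating the idea (not part of this PR; it assumes only the FSM interface shown above):

```python
import torch

class AllowedTokensCache:
    """Hypothetical cache: do the list -> tensor conversion once per FSM
    state instead of once per generated token."""

    def __init__(self, fsm, device):
        self.fsm = fsm
        self.device = device
        self._cache = {}  # fsm_state -> tensor of allowed token IDs

    def get(self, fsm_state):
        if fsm_state not in self._cache:
            tokens = self.fsm.get_next_instruction(fsm_state).tokens
            self._cache[fsm_state] = torch.tensor(tokens, device=self.device)
        return self._cache[fsm_state]
```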

Misc

Smoke test

>>> import outlines
>>> model = outlines.models.mlxlm("mlx-community/Qwen1.5-1.8B-Chat-4bit")
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 73728.00it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>> generator = outlines.generate.text(model, outlines.samplers.greedy())
>>> print(generator("hello", max_tokens=100))
不断地更新中
1. 2022年12月17日,中国共产党第十九届中央委员会第六次全体会议通过了《中共中央关于党的百年奋斗重大成就和历史经验的决议》。决议指出,中国共产党百年奋斗的历史经验是()。
A. 坚持人民至上
B. 坚持理论创新
C. 坚持中国道路
D. 坚持制度自信
答案是ABCD。
>>> from mlx_lm import load, generate
>>> model, tokenizer = load("mlx-community/Qwen1.5-1.8B-Chat-4bit")
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 22550.02it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>> generate(model, tokenizer, prompt="hello", verbose=True)
不断地更新中
1. 2022年12月17日,中国共产党第十九届中央委员会第六次全体会议通过了《中共中央关于党的百年奋斗重大成就和历史经验的决议》。决议指出,中国共产党百年奋斗的历史经验是()。
A. 坚持人民至上
B. 坚持理论创新
C. 坚持中国道路
D. 坚持制度自信
答案是ABCD。

(The model free-associates a Chinese multiple-choice question from the bare "hello" prompt; the key point is that the outlines.generate.text output above matches the raw mlx_lm output token for token under greedy sampling.)

Testing Without Apple

I don't own any Apple Silicon devices. Here are some instructions in case anyone else wants to test with a cloud Mac Mini:

How to test outlines mlx

install homebrew


/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
(echo; echo 'eval "$(/opt/homebrew/bin/brew shellenv)"') >> /Users/m1/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

ensure we're using openssl in python

brew install openssl
brew install python

# BAD
# python3 -c "import ssl; print(ssl.OPENSSL_VERSION)"
# LibreSSL 2.8.3

export PATH="/usr/local/opt/openssl/bin:$PATH"
export LDFLAGS="-L/usr/local/opt/openssl/lib"
export CPPFLAGS="-I/usr/local/opt/openssl/include"

python3 -m venv myenv
source myenv/bin/activate

# GOOD
# python -c "import ssl; print(ssl.OPENSSL_VERSION)"
# OpenSSL 3.3.1 4 Jun 2024

install outlines and mlx_lm

pip install setuptools
pip install outlines
pip install mlx_lm
pip install torch
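
A quick sanity check before running anything (run inside the venv; assumes the steps above succeeded):

```python
# Both imports should succeed on Apple Silicon, and ssl should report
# OpenSSL rather than LibreSSL if the PATH/flags above took effect.
import ssl
import outlines
import mlx_lm

print(ssl.OPENSSL_VERSION)
print("outlines + mlx_lm imported OK")
```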

A review thread on the new module's top-level imports:

```python
from outlines.generate.api import GenerationParameters, SamplingParameters
from outlines.processors import BaseLogitsProcessor

try:
    # The try body was truncated in this view; per the discussion below,
    # it imports mlx so the module still loads when mlx isn't installed.
    import mlx_lm
except ImportError:
    pass
```
Member: Does that mean the user must have mlx installed, whether they want to use this integration or not?

lapp0 (Collaborator, Author): It will attempt to import, but the module will load fine if mlx isn't installed because the exception passes.

Member: I think it's cleaner to import the libraries directly in the methods/functions where they're used.

lapp0 (Collaborator, Author): Fixed!
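
A minimal sketch of that function-level import pattern (illustrative, not the exact code merged in this PR):

```python
def mlxlm(model_name: str):
    # Importing inside the constructor keeps `import outlines` working on
    # machines without mlx; the error surfaces only when mlxlm is used.
    try:
        import mlx_lm
    except ImportError as e:
        raise ImportError(
            "The `mlx_lm` package is required for outlines.models.mlxlm. "
            "Install it with `pip install mlx_lm` (Apple Silicon only)."
        ) from e
    model, tokenizer = mlx_lm.load(model_name)
    return model, tokenizer
```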

rlouf (Member) commented Jun 13, 2024:

Looks good, just one small comment on imports. Should be good to merge once the change has been made.

rlouf merged commit 18aaba1 into outlines-dev:main on Jun 13, 2024
7 checks passed
A later comment quoted this example:

```python
from outlines import models

model = models.mlxlm("mlx-community/mlx-community/Meta-Llama-3-8B-Instruct-8bit")
```

mlx-community is repeated twice
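
The corrected call drops the duplicated org prefix:

```python
from outlines import models

model = models.mlxlm("mlx-community/Meta-Llama-3-8B-Instruct-8bit")
```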

ChristianWeyer commented:

@lapp0 Do we have any means to see verbose information for MLX? Like seeing the request and response data to/from the model.

Successfully merging this pull request may close: mlx library integration (via mlx-lm) (#918).