
[Feature Request] When generating using mlx_lm, specify data format #761

Closed
konarkm opened this issue May 8, 2024 · 2 comments

konarkm commented May 8, 2024

Regarding llms/mlx_lm/LORA.md

The instructions describe three dataset formats: chat, completions, and text. However, the Generate usage section doesn't explain how to use any of those three formats at generation time.
For example, I'm not sure how I would pass a system prompt along with a user message when calling the mlx_lm.generate command.

I'm not sure whether this is supported but not documented in LORA.md, or whether I just missed something!
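
For reference, the three dataset formats described in LORA.md look roughly like this (one JSON object per line; values abbreviated):

chat:        {"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
completions: {"prompt": "...", "completion": "..."}
text:        {"text": "..."}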


awni commented May 11, 2024

So usually generate uses the model's default chat template (so it's the chat format). You can use the raw prompt (so it's just the text format) by specifying --ignore-chat-template. There is currently no way to do the completions version from the CLI. But if you use the Python API you can do it like this:

from mlx_lm import load, generate

model, tokenizer = load("mistralai/Mistral-7B-Instruct-v0.1")

prompt = ""
completion = ""

# Render the prompt/completion pair through the chat template to build a completions-style input
text = tokenizer.apply_chat_template(
    [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": completion},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=text)
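
From the CLI, the chat and text cases would look something like this (a sketch assuming the standard mlx_lm.generate flags; substitute your own model and prompt):

# chat format: the prompt is wrapped in the model's default chat template
python -m mlx_lm.generate --model mistralai/Mistral-7B-Instruct-v0.1 --prompt "..."

# text format: the raw prompt is used as-is
python -m mlx_lm.generate --model mistralai/Mistral-7B-Instruct-v0.1 --prompt "..." --ignore-chat-template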

awni closed this as completed May 11, 2024

konarkm commented May 12, 2024

I see, thank you!

In case it helps anyone down the line:
My goal was to generate using fine-tuned adapters, together with a system prompt and a user prompt in the chat format.

Based on generate.py and utils.py, I load the model with the adapters like this:

from mlx_lm import load, generate

model_repo = "mlx-community/Meta-Llama-3-8B-Instruct-4bit"
adapter_path = "/adapters"

model, tokenizer = load(model_repo, adapter_path=adapter_path)

Then, I can generate:

system_prompt = "Be a helpful assistant"
prompt = "Hey, tell me about Llama"

text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=text, verbose=True)
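
If you only need the adapters with the default chat template (no explicit system prompt), I believe mlx_lm.generate also accepts an adapter path directly from the CLI, something like:

python -m mlx_lm.generate --model mlx-community/Meta-Llama-3-8B-Instruct-4bit --adapter-path ./adapters --prompt "Hey, tell me about Llama"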
