This library provides an Apple MLX LM backend for the Agent Framework. It allows you to run language models locally on macOS with Apple Silicon using the mlx-lm library.
- Local Inference: Run models directly on your Mac using Apple Silicon.
- Streaming Support: Full support for streaming responses.
- Configurable Generation: Fine-tune generation parameters like temperature, top-p, repetition penalty, and more.
- Message Preprocessing: Hook into the pipeline to modify messages before they are converted to prompts.
- Agent Framework Integration: Seamlessly plugs into the Agent Framework's BaseChatClient interface.
Ensure you have Python 3.9+ and are running on macOS with Apple Silicon.
```
pip install agent-framework-mlx
```

Or install from source:

```
git clone https://github.com/filipw/agent-framework-mlx.git
cd agent-framework-mlx
pip install -e .
```

```python
import asyncio

from agent_framework import ChatMessage, Role, ChatOptions
from agent_framework_mlx import MLXChatClient, MLXGenerationConfig

# Initialize the client
client = MLXChatClient(
    model_path="mlx-community/Phi-4-mini-instruct-4bit",
    generation_config=MLXGenerationConfig(
        temp=0.7,
        max_tokens=500
    )
)

# Create messages
messages = [
    ChatMessage(role=Role.SYSTEM, text="You are a helpful assistant."),
    ChatMessage(role=Role.USER, text="Why is the sky blue?")
]

# Get a response (get_response is a coroutine, so run it in an async context)
async def main():
    response = await client.get_response(messages=messages, chat_options=ChatOptions())
    print(response.text)

asyncio.run(main())
```

You can also use the client as the backbone for Agent Framework agents when building agentic workflows:
```python
from agent_framework import ChatAgent, WorkflowBuilder

# notice the client constructed in the previous example now backs the local agent
local_agent = ChatAgent(
    name="Local_MLX",
    instructions="You are a helpful assistant.",
    chat_client=client
)

# azure_client is any other Agent Framework chat client (for example an
# Azure OpenAI client) created elsewhere in your application
remote_agent = ChatAgent(
    name="Cloud_LLM",
    instructions="You are a fallback expert. The previous assistant was unsure. Provide a complete answer.",
    chat_client=azure_client
)

builder = WorkflowBuilder()
builder.set_start_executor(local_agent)
builder.add_edge(
    source=local_agent,
    target=remote_agent,
    condition=should_fallback_to_cloud
)
workflow = builder.build()
```
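The workflow above references should_fallback_to_cloud, a routing condition that is not defined in this example. A minimal sketch of such a condition, assuming the edge condition receives the local agent's output and returns a bool (the exact payload depends on your Agent Framework version):

```python
# Hypothetical routing condition - the argument shape is an assumption;
# adapt it to whatever your Agent Framework version passes to edge conditions.
def should_fallback_to_cloud(message) -> bool:
    text = getattr(message, "text", None) or str(message)
    # escalate to the cloud agent when the local model signals uncertainty
    return "unsure" in text.lower() or "i don't know" in text.lower()
```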
Streaming works the same way; iterate over get_streaming_response from within an async context:

```python
# reuses the client and messages from the quick start example
async for update in client.get_streaming_response(messages=messages, chat_options=ChatOptions()):
    print(update.text, end="", flush=True)
```

You can configure generation parameters globally via MLXGenerationConfig or per-request via ChatOptions.

```python
config = MLXGenerationConfig(
    temp=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    seed=42
)
```
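The same parameters can also be supplied per request. A minimal sketch, assuming the ChatOptions type in your Agent Framework version exposes fields such as temperature and max_tokens (verify the exact field names against your installed package):

```python
# Hypothetical per-request override - field names are assumptions.
# Run inside an async context, as in the quick start example.
response = await client.get_response(
    messages=messages,
    chat_options=ChatOptions(temperature=0.2, max_tokens=128)
)
```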
You can intercept and modify messages before they are sent to the model. This is useful for injecting instructions or formatting content:

```python
def inject_instruction(messages):
    # runs before the messages are converted into the model prompt
    if messages:
        messages[-1]["content"] += "\nIMPORTANT: Be concise."
    return messages

client = MLXChatClient(
    model_path="...",
    message_preprocessor=inject_instruction
)
```

Requirements:

- macOS
- Apple Silicon
- Python 3.9+
This project is licensed under the MIT License. See the LICENSE file for details.