A mock server and in-process faker for OpenAI and Anthropic APIs, built for Python testing. Monkey-patches official LLM client libraries to intercept calls without network overhead.
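The core trick, swapping a client method for a fake at runtime so no request ever leaves the process, can be sketched with the standard library. This is an illustration of the technique only, not llmfaker's internals; `ChatClient` is a made-up stand-in for a real SDK client:

```python
from types import SimpleNamespace
from unittest.mock import patch

class ChatClient:
    """Hypothetical stand-in for a real SDK client."""
    def create(self, model, messages):
        raise RuntimeError("would hit the network")

def fake_create(self, model, messages):
    # Intercept in-process: return a canned response, no network I/O.
    return SimpleNamespace(content="It's sunny!")

client = ChatClient()
with patch.object(ChatClient, "create", fake_create):
    reply = client.create("gpt-4", [{"role": "user", "content": "weather?"}])
    print(reply.content)  # It's sunny!
```

Outside the `with` block the original method is restored, so patching leaks nothing between tests.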
## Features

- In-process patching of `openai`, `anthropic`, `litellm`, and `langchain` clients
- Fluent builder API for configuring responses
- Pattern matching: exact, regex, predicate-based, and template rendering
- Streaming support with realistic SSE emission (OpenAI and Anthropic formats)
- Failure injection: rate limits, timeouts, mid-stream disconnects, malformed JSON
- Latency simulation with configurable TTFT and inter-token delays
- Record/replay cassettes for integration testing
- Multi-turn conversation scripting and tool-call sequences
- Token counting and cost estimation via pricing tables
- Pytest plugin with `llm_faker` and `llm_recording` fixtures
- Standalone mock server mode via CLI
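The streaming feature emits server-sent events in the OpenAI wire format: each chunk is framed as a `data:` line carrying a JSON delta, and the stream ends with a `data: [DONE]` sentinel. A minimal emitter sketch of that framing (an illustration, not llmfaker's actual implementation):

```python
import json

def sse_chunks(text, model="gpt-4"):
    """Yield OpenAI-style SSE frames for `text`, one word-sized delta at a time."""
    for piece in text.split(" "):
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": {"content": piece + " "}}],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"

frames = list(sse_chunks("It's sunny!"))
assert frames[-1] == "data: [DONE]\n\n"  # stream terminator
```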
## Installation

```bash
pip install llmfaker
```

## Quick start

```python
from llmfaker import LLMFaker
import openai

client = openai.OpenAI(api_key="fake")

with LLMFaker() as faker:
    faker.when(prompt_contains="weather").respond("It's sunny!")

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "What's the weather?"}],
    )
    print(response.choices[0].message.content)  # "It's sunny!"
```

## Pytest plugin

The `llm_faker` fixture patches clients for the duration of a test and records every intercepted call:

```python
def test_my_feature(llm_faker):
    llm_faker.when(prompt_contains="hello").respond("Hi!")

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "hello"}],
    )

    assert "Hi!" in response.choices[0].message.content
    assert len(llm_faker.calls) == 1
```

The `llm_recording` fixture records real API traffic to a cassette and replays it on later runs:

```python
def test_real_api_behavior(llm_recording):
    # First run: calls the real API and records to a cassette
    # Subsequent runs: replays from the cassette file
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "hello"}],
    )
    assert response.choices[0].message.content
```

## Failure injection

```python
with LLMFaker() as faker:
    with faker.fail(rate=1.0, status=429, retry_after=30):
        # All calls will get a 429 rate limit error
        ...
```

## Standalone mock server

```bash
mockllm start --responses responses.yml --port 8000
```

```yaml
responses:
  "what colour is the sky?": "The sky is blue due to Rayleigh scattering."
  "tell me a joke": "Why don't programmers like nature? Too many bugs!"

defaults:
  unknown_response: "I don't know the answer to that."

settings:
  lag_enabled: true
  lag_factor: 10
```

## Development

```bash
pip install -r requirements.txt
pip install -e .

# Run tests
python -m pytest tests/ -v
```

Inspired by mockllm.
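The token counting and cost estimation feature reduces to a lookup in a per-model pricing table. A minimal sketch of the idea; the per-million-token prices below are placeholders, not real rates:

```python
# Hypothetical pricing table: USD per million tokens (placeholder numbers).
PRICING = {
    "gpt-4": {"input": 30.0, "output": 60.0},
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate a call's cost from token counts and the pricing table."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = estimate_cost("gpt-4", input_tokens=1_000, output_tokens=500)
print(f"${cost:.4f}")  # $0.0600
```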