Production-ready AI workflows for Laravel. Wraps Prism PHP with retry logic, fallback models, finish reason monitoring, YAML-based prompt management with Mustache templating, request logging with tagging, middleware pipeline, caching, multimodal support, eval framework, and Laravel Data integration.
Install via Composer:

```bash
composer require pocketarc/ai-workflow
```

Publish the config file:

```bash
php artisan vendor:publish --tag=ai-workflow-config
```

To enable request logging, also publish and run the migrations:

```bash
php artisan vendor:publish --tag=ai-workflow-migrations
php artisan migrate
```

Prompts live as Markdown files with YAML front-matter in your configured `prompts_path` (default: `resources/prompts/`).
```markdown
---
model: openrouter:google/gemini-3-pro-preview
fallback_model: openrouter:openai/gpt-5.2
tags: [classification, intent]
cache_ttl: 3600
---
You are a helpful assistant that answers questions concisely.
```

Front-matter fields:

- `model` (required): Model identifier in `provider:model` format (e.g. `openrouter:google/gemini-3-pro-preview`, `anthropic:claude-opus-4.5`).
- `fallback_model` (optional): If structured decoding fails, retry with this model. Same `provider:model` format.
- `tags` (optional): Array of string tags stored with each request for filtering.
- `cache_ttl` (optional): Cache responses for this many seconds. Omit to disable caching.
The prompt's id is derived from the filename. A file at `resources/prompts/my_prompt.md` is `my_prompt`.
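The file format above can be sketched in plain PHP. This is only an illustration of the general technique, not the package's actual parser; `splitFrontMatter` is a hypothetical helper:

```php
// Hypothetical sketch (not the package's parser): split a prompt file into its
// YAML front-matter block and the Markdown body. Values are kept as strings.
function splitFrontMatter(string $raw): array
{
    if (preg_match('/\A---\R(.*?)\R---\R?(.*)\z/s', $raw, $m) !== 1) {
        return [[], $raw]; // no front-matter block
    }

    $meta = [];
    foreach (preg_split('/\R/', trim($m[1])) as $line) {
        // Split on the first colon only, so provider:model values survive.
        [$key, $value] = array_map('trim', explode(':', $line, 2));
        $meta[$key] = $value;
    }

    return [$meta, ltrim($m[2])];
}

// The prompt id is just the filename without its extension:
$id = pathinfo('resources/prompts/my_prompt.md', PATHINFO_FILENAME); // 'my_prompt'
```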
Prompts support Mustache variables and conditionals:
```markdown
---
model: openrouter:anthropic/claude-4-opus
---
You are helping {{ customer_name }} with their {{ product }} subscription.
{{#is_vip}}
This is a VIP customer. Provide priority support.
{{/is_vip}}
```

Pass variables when loading the prompt:

```php
use AiWorkflow\Facades\Prompt;

$prompt = Prompt::load('support', [
    'customer_name' => 'Jane',
    'product' => 'Pro',
    'is_vip' => true,
]);
```

Load prompts without variables as before:

```php
$prompt = Prompt::load('my_prompt');
```

Send messages and get a text response with tool-calling support:
```php
use AiWorkflow\AiService;
use AiWorkflow\Facades\Prompt;
use Prism\Prism\ValueObjects\Messages\UserMessage;

$aiService = app(AiService::class);

$response = $aiService->sendMessages(
    collect([new UserMessage('What is the weather like?')]),
    Prompt::load('chat'),
);

echo $response->text;
```

Get structured JSON output matching a schema:
```php
use Prism\Prism\Schema\ObjectSchema;
use Prism\Prism\Schema\StringSchema;
use Prism\Prism\Schema\NumberSchema;

$schema = new ObjectSchema(
    name: 'analysis',
    description: 'Ticket analysis',
    properties: [
        new StringSchema('summary', 'A brief summary'),
        new NumberSchema('priority', 'Priority from 1-5'),
    ],
    requiredFields: ['summary', 'priority'],
);

$response = $aiService->sendStructuredMessages(
    collect([new UserMessage('Analyze this ticket...')]),
    Prompt::load('analyze_ticket'),
    $schema,
);

$data = $response->structured;
// ['summary' => '...', 'priority' => 3]
```

If you have spatie/laravel-data installed, you can use Data classes directly. The package generates the schema from the class, validates the response, and retries with feedback on validation failure:
```php
use AiWorkflow\Attributes\Description;
use Spatie\LaravelData\Data;

class SentimentAnalysis extends Data
{
    public function __construct(
        #[Description('The detected sentiment: positive, negative, or neutral')]
        public readonly string $sentiment,
        #[Description('Confidence score from 0.0 to 1.0')]
        public readonly float $confidence,
    ) {}
}

$result = $aiService->sendStructuredData(
    collect([new UserMessage('Analyze the sentiment of: "I love this product!"')]),
    Prompt::load('sentiment'),
    SentimentAnalysis::class,
);

// $result is a validated SentimentAnalysis instance
echo $result->sentiment;  // "positive"
echo $result->confidence; // 0.95
```

On validation failure, the package appends the error to the conversation and retries up to `$maxAttempts` (default 3).
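The retry-with-feedback loop can be sketched like this. It is a hypothetical illustration of the pattern, not the package's internals; `callWithValidation`, `$model`, and `$validate` are invented stand-ins:

```php
// Hypothetical sketch of retry-with-feedback (not the package's code): call the
// model, validate the response, and on failure append the validation error to
// the conversation so the next attempt can self-correct.
function callWithValidation(callable $model, callable $validate, array $messages, int $maxAttempts = 3): array
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $response = $model($messages);

        $error = $validate($response); // null means the response is valid
        if ($error === null) {
            return $response;
        }

        // Feed the error back into the conversation before retrying.
        $messages[] = ['role' => 'user', 'content' => "Validation failed: {$error}. Please try again."];
    }

    throw new RuntimeException("Validation failed after {$maxAttempts} attempts.");
}
```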
Stream text responses as a generator of events:
```php
$stream = $aiService->streamMessages(
    collect([new UserMessage('Tell me a story')]),
    Prompt::load('chat'),
);

foreach ($stream as $event) {
    if ($event instanceof \Prism\Prism\Streaming\Events\TextDeltaEvent) {
        echo $event->delta;
    }
}
```

Streaming does not support automatic retries (this is inherent to how streaming APIs work).
Pass a second prompt as shared context that gets prepended to the system prompt:
```php
$response = $aiService->sendMessages(
    $messages,
    Prompt::load('respond_to_customer'),
    extraContext: Prompt::load('shared_context'),
);
```

Register tools that the AI can call during text conversations:
```php
// In your AppServiceProvider::boot()
use AiWorkflow\AiService;
use Prism\Prism\Facades\Tool;

$aiService = app(AiService::class);

$aiService->resolveToolsUsing(fn (array $context) => [
    Tool::as('get_weather')
        ->for('Get current weather conditions.')
        ->withStringParameter('city', 'The city name')
        ->using(fn (string $city): string => "Weather in {$city}: sunny, 20°C"),
]);
```

Set context before making calls to pass runtime data to your tools:

```php
$aiService->setContext(['customer' => $customer]);

$response = $aiService->sendMessages($messages, $prompt);
```

Tags help you categorize and filter logged requests. Set them in prompt front-matter and/or at runtime:
```markdown
---
model: openrouter:anthropic/claude-4
tags: [classification, intent]
---
```

```php
$aiService->setTags(['billing', 'priority']);
```

Tags from both sources are merged and deduplicated. Query with custom builder scopes:

```php
use AiWorkflow\Models\AiWorkflowRequest;

AiWorkflowRequest::query()->withTag('classification')->get();
AiWorkflowRequest::query()->withAnyTag(['classification', 'intent'])->get();
AiWorkflowRequest::query()->byModel('claude-4')->successful()->get();
AiWorkflowRequest::query()->errors()->get();
```

When enabled, every AI call is recorded to the database with enough detail to replay it: system prompt, messages, model, provider, schema, response, token usage, duration, and tags.
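The merge-and-deduplicate behaviour for tags amounts to something like the following hypothetical sketch (`mergeTags` is an invented helper, not the package's code):

```php
// Hypothetical sketch: combine front-matter tags with runtime tags, drop
// duplicates, and reindex so the result is a plain list.
function mergeTags(array $promptTags, array $runtimeTags): array
{
    return array_values(array_unique(array_merge($promptTags, $runtimeTags)));
}
```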
Enable logging in your `.env`:

```ini
AI_WORKFLOW_LOGGING=true
```
Group related AI calls under a named execution:
```php
$aiService->startExecution('work_ticket', ['ticket_id' => $ticket->id]);

$aiService->sendMessages($messages, Prompt::load('decide_action'));
$aiService->sendMessages($messages, Prompt::load('generate_response'));
$aiService->sendStructuredMessages($messages, Prompt::load('judge_response'), $schema);

$execution = $aiService->endExecution();
// All three calls are linked to this execution.
```

Query executions and get aggregate token usage:

```php
use AiWorkflow\Models\AiWorkflowExecution;

$execution = AiWorkflowExecution::query()->byName('work_ticket')->recent()->first();

$execution->totalInputTokens();
$execution->totalOutputTokens();
$execution->totalTokens();
$execution->totalDurationMs();
$execution->requestCount();
```

Two events are dispatched after every AI call, regardless of whether logging is enabled:
- `AiWorkflowRequestCompleted` — prompt, method, model, finish reason, usage, duration, execution ID.
- `AiWorkflowRequestFailed` — prompt, method, model, exception, duration, execution ID.
A ready-to-use listener adds Sentry breadcrumbs for AI requests. Register in your EventServiceProvider:
```php
use AiWorkflow\Events\AiWorkflowRequestCompleted;
use AiWorkflow\Events\AiWorkflowRequestFailed;
use AiWorkflow\Listeners\SentrySpanListener;

protected $listen = [
    AiWorkflowRequestCompleted::class => [SentrySpanListener::class . '@handleCompleted'],
    AiWorkflowRequestFailed::class => [SentrySpanListener::class . '@handleFailed'],
];
```

No hard dependency on Sentry — the listener is a no-op if Sentry is not installed.
Responses can be cached per-prompt using a content-addressable key derived from the request parameters. Set `cache_ttl` in the prompt front-matter:

```markdown
---
model: openrouter:anthropic/claude-4
cache_ttl: 3600
---
```

Enable caching globally in your `.env`:

```ini
AI_WORKFLOW_CACHE=true
AI_WORKFLOW_CACHE_STORE=redis # optional, defaults to your app's default cache store
```

Cache hits skip the API call entirely and do not create log records.
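A content-addressable key can be sketched as follows. This illustrates the general technique only; the package's exact key format is not documented here, and `cacheKey` is a hypothetical helper:

```php
// Hypothetical sketch: hash everything that affects the response, so any change
// to the prompt, model, system prompt, or messages produces a different key.
function cacheKey(string $promptId, string $model, string $systemPrompt, array $messages): string
{
    $payload = json_encode([$promptId, $model, $systemPrompt, $messages]);

    return 'ai-workflow:' . hash('sha256', $payload);
}
```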
Add before/after hooks to all AI requests using a middleware pipeline.

Register middleware in your config:

```php
// config/ai-workflow.php
'middleware' => [
    App\Middleware\LogRequestMetrics::class,
],
```

Add middleware per-instance:

```php
$aiService->addMiddleware(new App\Middleware\LogRequestMetrics());
```

Implement `AiWorkflowMiddleware`:
```php
use AiWorkflow\Middleware\AiWorkflowContext;
use AiWorkflow\Middleware\AiWorkflowMiddleware;
use Closure;

class LogRequestMetrics implements AiWorkflowMiddleware
{
    public function handle(AiWorkflowContext $context, Closure $next): AiWorkflowContext
    {
        // Before the AI request
        $start = microtime(true);

        $context = $next($context);

        // After the AI request
        logger()->info('AI request took ' . (microtime(true) - $start) . 's');

        return $context;
    }
}
```

Abstract base classes for input and output validation:
```php
use AiWorkflow\Middleware\InputGuardrail;
use AiWorkflow\Middleware\AiWorkflowContext;

class PiiDetectionGuardrail extends InputGuardrail
{
    protected function validate(AiWorkflowContext $context): void
    {
        // Throw GuardrailViolationException if PII is detected in messages
    }
}
```

`InputGuardrail` validates before the request; `OutputGuardrail` validates after. Both throw `GuardrailViolationException` on failure.
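Middleware pipelines like this are typically built by folding the middleware list into nested closures, so each layer wraps the next. A minimal sketch of the pattern, not the package's actual implementation:

```php
// Hypothetical sketch: reduce the middleware list into one callable. Reversing
// first means the first-registered middleware runs outermost.
function buildPipeline(array $middleware, callable $core): callable
{
    return array_reduce(
        array_reverse($middleware),
        fn (callable $next, callable $layer) => fn ($context) => $layer($context, $next),
        $core,
    );
}

$pipeline = buildPipeline(
    [fn ($ctx, $next) => $next($ctx . '>before') . '<after'],
    fn ($ctx) => $ctx . '|core',
);
// $pipeline('req') === 'req>before|core<after'
```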
The replay engine lets you re-run recorded AI requests with different models or updated prompts. This is the foundation for evals.
```php
use AiWorkflow\AiWorkflowReplayer;

$replayer = app(AiWorkflowReplayer::class);

// Replay exactly as recorded
$result = $replayer->replay($request);

// Replay with a different model
$result = $replayer->replay($request, model: 'anthropic:claude-4');

// Replay with the latest prompt from disk (uses the stored prompt_id to load)
$result = $replayer->replay($request, useCurrentPrompts: true);

// Both: latest prompt + different model
$result = $replayer->replay($request, useCurrentPrompts: true, model: 'anthropic:claude-4');

// Compare one request across multiple models
$results = $replayer->replayAcrossModels($request, [
    'openrouter:google/gemini-3-pro',
    'anthropic:claude-4',
    'openrouter:openai/gpt-5.2',
]);
// Returns an array keyed by model name.

// Replay an entire execution — each request loads its own prompt via prompt_id
$results = $replayer->replayExecution($execution, useCurrentPrompts: true);
```

Evaluate AI outputs by replaying recorded requests from curated datasets across models with pluggable judges.
The workflow: run an AI action, verify the response is correct, add the execution to a named dataset, then eval that dataset against different models to see which ones produce equivalent results.
Datasets are collections of known-good executions. Use execution grouping to track AI calls, then add verified executions to a dataset:
```php
// In your action, group AI calls under an execution:
$aiService->startExecution('decide_action #42', ['ticket_id' => 42]);
$response = $aiService->sendStructuredMessages($messages, $prompt, $schema);
$execution = $aiService->endExecution();
// $execution->id is the UUID you'll reference
```

```bash
# After verifying the response was correct, add it to a dataset:
php artisan eval:add decide-actions abc-123-uuid

# List all datasets
php artisan eval:list

# Show executions in a dataset
php artisan eval:show decide-actions

# Remove an execution from a dataset
php artisan eval:remove decide-actions abc-123-uuid
```

Run the eval:

```bash
php artisan eval:run decide-actions \
    --models=openrouter:google/gemini-3-pro,openrouter:openai/gpt-5.2 \
    --judge=App\\Eval\\MyJudge
```

This replays every request in the dataset against each model, judges the results, and displays a per-model score table.
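The per-model score table boils down to grouping judge scores by model and averaging. A hypothetical sketch of that aggregation (`scoreTable` is an invented helper, not the package's code):

```php
// Hypothetical sketch: average judge scores grouped by model.
// $results is a list of [model, score] pairs from individual replays.
function scoreTable(array $results): array
{
    $table = [];
    foreach ($results as [$model, $score]) {
        $table[$model][] = $score;
    }

    return array_map(
        fn (array $scores): float => array_sum($scores) / count($scores),
        $table,
    );
}

// scoreTable([['gpt-5.2', 1.0], ['gpt-5.2', 0.5], ['claude-4', 0.8]])
// → ['gpt-5.2' => 0.75, 'claude-4' => 0.8]
```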
Implement `AiWorkflowEvalJudge`:

```php
use AiWorkflow\Eval\AiWorkflowEvalJudge;
use AiWorkflow\Eval\AiWorkflowEvalResult;
use AiWorkflow\Models\AiWorkflowRequest;
use Prism\Prism\Text\Response;
use Prism\Prism\Structured\Response as StructuredResponse;

class MyJudge implements AiWorkflowEvalJudge
{
    public function judge(AiWorkflowRequest $originalRequest, Response|StructuredResponse $response): AiWorkflowEvalResult
    {
        // Compare the new response against the original recorded response
        // Return a score from 0.0 to 1.0
        return new AiWorkflowEvalResult(score: 0.9, details: ['reasoning' => '...']);
    }
}
```

The package includes `AiJudge` — an AI-powered judge that semantically compares original and new responses (e.g. `{"payer": "John"}` vs `{"payer": "john"}` scores high). For domain-specific evaluation, implement your own judge with custom scoring logic.
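Custom scoring logic can be as simple as a normalized field comparison. A hypothetical sketch, not how `AiJudge` works; `fieldMatchScore` is an invented helper:

```php
// Hypothetical sketch: score structured output by the fraction of fields that
// match the original response after lowercasing string values.
function fieldMatchScore(array $original, array $new): float
{
    if ($original === []) {
        return 1.0; // nothing to compare against
    }

    $matches = 0;
    foreach ($original as $key => $value) {
        $a = is_string($value) ? strtolower($value) : $value;
        $b = is_string($new[$key] ?? null) ? strtolower($new[$key]) : ($new[$key] ?? null);
        if ($a === $b) {
            $matches++;
        }
    }

    return $matches / count($original);
}

// fieldMatchScore(['payer' => 'John'], ['payer' => 'john']) → 1.0
```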
Run evals programmatically:

```php
use AiWorkflow\Eval\AiWorkflowEvalRunner;
use AiWorkflow\Models\AiWorkflowEvalDataset;

$runner = app(AiWorkflowEvalRunner::class);

$dataset = AiWorkflowEvalDataset::query()->where('name', 'decide-actions')->firstOrFail();

$evalRun = $runner->run(
    name: 'Decision eval',
    requests: $dataset->requests(),
    models: ['openrouter:anthropic/claude-4', 'openrouter:google/gemini-3-pro'],
    judge: app(MyJudge::class),
);

$evalRun->averageScore(); // overall
$evalRun->averageScoreForModel('openrouter:anthropic/claude-4'); // per model
```

Run YAML-defined test cases against prompts to verify AI outputs.
Test files live alongside prompts in a `tests/` subdirectory:

```
resources/prompts/
    classify_intent.md
    tests/
        classify_intent.yaml
```

```yaml
variables:
  company_name: "Test Corp"

cases:
  - name: "Billing question"
    messages:
      - role: user
        content: "How do I update my credit card?"
    assert:
      structured:
        intent: "billing"
      contains: "billing"

  - name: "Multiple keywords"
    messages:
      - role: user
        content: "I need help with my account password"
    assert:
      contains:
        - "account"
        - "password"
```

```bash
# Test a specific prompt
php artisan ai-workflow:prompt-test classify_intent

# Test all prompts that have test files
php artisan ai-workflow:prompt-test

# Override the model
php artisan ai-workflow:prompt-test classify_intent --model=anthropic:claude-4
```

All requests automatically retry on transient failures with random jitter (±25%) to prevent a thundering herd:
- HTTP 429 (rate limit): waits ~30 seconds before retrying.
- HTTP 5xx (server error): exponential backoff (~attempt × 2 seconds).
- Connection errors: linear backoff (~attempt × 1 second).
- 3 retries by default, configurable via `ai-workflow.retry.times`.

Jitter can be disabled by setting `ai-workflow.retry.jitter` to `false`.

If all retries are exhausted, a `RetriesExhaustedException` is thrown with the retry count and original exception.
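The schedule above can be sketched as follows. This is a hypothetical illustration of the documented behaviour, not the package's code; `retryDelayMs` is an invented helper:

```php
// Hypothetical sketch: per-attempt retry delay in milliseconds with ±25%
// random jitter, following the schedule documented above.
function retryDelayMs(int $attempt, int $status, bool $jitter = true): int
{
    $base = match (true) {
        $status === 429 => 30_000,           // rate limited: wait ~30s
        $status >= 500  => $attempt * 2_000, // server error: backoff grows with attempt (~attempt × 2s)
        default         => $attempt * 1_000, // connection error: linear backoff (~attempt × 1s)
    };

    if ($jitter) {
        // Spread retries by up to ±25% to avoid a thundering herd.
        $base = (int) round($base * (mt_rand(750, 1250) / 1000));
    }

    return $base;
}
```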
If a structured request fails to decode JSON (the model produced invalid output), the package automatically retries with the `fallback_model` if one is configured in the prompt's front-matter.
After each AI response, the finish reason is checked:

| Finish Reason | Behaviour |
|---|---|
| `Stop`, `ToolCalls` | Success — response returned normally. |
| `Unknown`, `Error`, `Other` | Transient issue — throws `PrismException` so callers can skip gracefully. |
| `Length`, `ContentFilter` | Degraded — reports to your error tracker via `report()`, but still returns the response. |
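The policy in the table can be sketched as a single dispatch. This is hypothetical, using plain strings in place of Prism's `FinishReason` enum:

```php
// Hypothetical sketch of the finish-reason policy above (not the package's
// code): map each reason to the behaviour described in the table.
function finishReasonPolicy(string $reason): string
{
    return match ($reason) {
        'Stop', 'ToolCalls'       => 'success',   // response returned normally
        'Length', 'ContentFilter' => 'degraded',  // report() but still return the response
        default                   => 'transient', // throw so callers can skip gracefully
    };
}
```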
The package works with Prism's built-in faking:
```php
use Prism\Prism\Enums\FinishReason;
use Prism\Prism\Facades\Prism;
use Prism\Prism\Testing\TextResponseFake;

Prism::fake([
    TextResponseFake::make()
        ->withText('Mocked response')
        ->withFinishReason(FinishReason::Stop),
]);

// Your code that calls AiService will receive the fake response.
```

```bash
docker compose up -d devtools
docker compose exec devtools composer install
docker compose exec devtools ./vendor/bin/pint                              # Code style
docker compose exec devtools ./vendor/bin/phpstan analyse --memory-limit=1G # Static analysis
docker compose exec devtools ./vendor/bin/phpunit                           # Tests
```