
Commit 22a90da

feat(image): character consistency, style transfer, provider modernization
- Add referenceImageUrl, faceEmbedding, consistencyMode to ImageGenerationRequest
- Replicate: dual-endpoint support (/models/.../predictions + /predictions)
- Replicate: expand model catalog from 3 to 13 (Flux 1.1 Pro, Ultra, Redux, Canny, Depth, Pulid, Fill Pro, SDXL Lightning, Real-ESRGAN)
- Replicate: auto-select Pulid for strict consistency, ControlNet routing by controlType
- Fal: add editImage() (img2img + inpaint), expand catalog to 7 models
- Fal: IP-Adapter character consistency mapping
- SD-Local: IP-Adapter ControlNet injection for character consistency
- PolicyAwareImageRouter: capability filtering (character-consistency, controlnet, style-transfer)
- AvatarPipeline: per-stage consistency mode (strict for expressions, balanced for body)
- New transferStyle() high-level API via Flux Redux with multi-provider fallback
- OpenAI/Stability/OpenRouter/BFL: graceful debug warning for unsupported referenceImageUrl
- 59 new tests, 170 total passing across image subsystem
- New docs: CHARACTER_CONSISTENCY.md, STYLE_TRANSFER.md
- Updated: CHANGELOG, README, HIGH_LEVEL_API, image-gen SKILL
1 parent f973a97 commit 22a90da

30 files changed

Lines changed: 4937 additions & 24 deletions

CHANGELOG.md

Lines changed: 23 additions & 0 deletions
@@ -1,3 +1,26 @@

## [Unreleased]

### Added

- `transferStyle()` high-level API for image-guided style transfer via Flux Redux
- Character consistency fields on `ImageGenerationRequest`: `referenceImageUrl`, `faceEmbedding`, `consistencyMode`
- Replicate: dual-endpoint support (modern `/models/.../predictions` + legacy `/predictions`)
- Replicate: 10 new models in catalog (Flux 1.1 Pro, Ultra, Redux, Canny, Depth, Fill Pro, Pulid, SDXL Lightning, SDXL, Real-ESRGAN)
- Replicate: character consistency via Pulid auto-selection when `consistencyMode: 'strict'`
- Replicate: ControlNet image input (`controlImage`, `controlType`) for Flux Canny/Depth
- Fal: `editImage()` support (img2img + inpainting)
- Fal: 4 new models in catalog (Pro 1.1, Ultra, LoRA, Realism)
- Fal: IP-Adapter character consistency mapping
- SD-Local: IP-Adapter character consistency via ControlNet injection
- `PolicyAwareImageRouter`: `'character-consistency'` capability filtering
- `AvatarPipeline`: per-stage consistency mode (`strict` for expressions, `balanced` for body)
- `docs/features/CHARACTER_CONSISTENCY.md`
- `docs/features/STYLE_TRANSFER.md`
- 59 new tests across providers, APIs, and integration scenarios
- OpenAI, Stability, OpenRouter, BFL: graceful debug warning when `referenceImageUrl` is set but unsupported

### Changed

- Replicate: default inpaint model upgraded from `flux-fill` to `flux-fill-pro`

## <small>0.1.177 (2026-04-04)</small>

* fix(api): include systemBlocks on exported AgentOptions interface ([d79ddab](https://github.com/framersai/agentos/commit/d79ddab))

README.md

Lines changed: 3 additions & 1 deletion
@@ -795,10 +795,11 @@ const resilient = agent({

  | `generateObject(opts)` | Zod-validated structured output extraction |
  | `streamObject(opts)` | Streaming structured output |
  | `embedText(opts)` | Text embedding generation (single or batch) |
- | `generateImage(opts)` | Image generation (OpenAI, Stability, Replicate, BFL, Fal) |
+ | `generateImage(opts)` | Image generation with character consistency (7 providers) |
  | `editImage(opts)` | Image editing/inpainting |
  | `upscaleImage(opts)` | Image upscaling |
  | `variateImage(opts)` | Image variations |
+ | `transferStyle(opts)` | Style transfer via Flux Redux / img2img |
  | `generateVideo(opts)` | Video generation |
  | `analyzeVideo(opts)` | Video analysis and understanding |
  | `detectScenes(opts)` | Scene detection in video |

@@ -829,6 +830,7 @@ import type {

    AgencyOptions,         // agency() configuration
    GenerateTextOptions,   // generateText() / streamText() options
    GenerateImageOptions,  // generateImage() options
+   TransferStyleOptions,  // transferStyle() options
    GenerateObjectOptions, // generateObject() options
    EmbedTextOptions,      // embedText() options
    ExtensionDescriptor,   // Extension pack descriptor
docs/features/CHARACTER_CONSISTENCY.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@

# Character Consistency — Face-Preserving Image Generation

> Generate images that maintain a consistent character identity across multiple outputs using reference images and face embeddings.

---

## Overview

Character consistency lets you anchor generated images to a reference face or character, ensuring the same person appears across portraits, expressions, full-body shots, and scene illustrations. AgentOS supports three levels of consistency via the `consistencyMode` parameter:

| Mode | Strength | Use Case |
|------|----------|----------|
| `'strict'` | 0.85–0.9 | Avatar expression sheets, emotion variants. Face must match exactly. |
| `'balanced'` | 0.6 | Full-body shots, different angles. Recognizable but allows natural variation. |
| `'loose'` | 0.3 | "Inspired by" generations. Style/mood carries over, face may drift. |
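The mode-to-strength mapping in the table above can be sketched as a small helper. This is illustrative only: the function name `strengthForMode` is not part of the AgentOS API, and the `'strict'` value here is the midpoint of the documented 0.85–0.9 range.

```typescript
type ConsistencyMode = 'strict' | 'balanced' | 'loose';

// Map a consistency mode to an identity-preservation strength.
// Values come from the table above; 'strict' uses the midpoint
// of its documented 0.85–0.9 range.
function strengthForMode(mode: ConsistencyMode): number {
  switch (mode) {
    case 'strict':
      return 0.875;
    case 'balanced':
      return 0.6;
    case 'loose':
      return 0.3;
  }
}
```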
## Provider Support

| Provider | Mechanism | Models |
|----------|-----------|--------|
| **Replicate** | Pulid (strict), Flux image input (balanced/loose) | `zsxkib/pulid`, `black-forest-labs/flux-dev` |
| **Fal** | IP-Adapter | `fal-ai/flux/dev` |
| **SD-Local** | ControlNet + IP-Adapter extension | Any SD 1.5 / SDXL checkpoint |
| OpenAI | Not supported (graceful ignore) | — |
| Stability | Not supported (graceful ignore) | — |

## Basic Usage

```typescript
import { generateImage } from '@framers/agentos';

// Generate a consistent expression variant
const result = await generateImage({
  provider: 'replicate',
  prompt: 'Portrait of the character smiling warmly, soft lighting',
  referenceImageUrl: 'https://storage.example.com/character-neutral.png',
  consistencyMode: 'strict',
});
```

When `consistencyMode` is `'strict'` and no model is explicitly set, Replicate auto-selects `zsxkib/pulid` for maximum face consistency.
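That auto-selection rule can be sketched as follows. The helper name `selectReplicateModel` is hypothetical; the model ids and the decision rule come from this document (explicit `model` wins, strict consistency with a reference falls back to Pulid, otherwise the Flux default).

```typescript
interface ModelSelectionInput {
  model?: string;
  referenceImageUrl?: string;
  consistencyMode?: 'strict' | 'balanced' | 'loose';
}

// Sketch of Replicate model auto-selection: an explicitly requested
// model is always honored; strict consistency with a reference image
// routes to Pulid; everything else uses the Flux default.
function selectReplicateModel(req: ModelSelectionInput): string {
  if (req.model) return req.model;
  if (req.referenceImageUrl && req.consistencyMode === 'strict') {
    return 'zsxkib/pulid';
  }
  return 'black-forest-labs/flux-dev';
}
```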
## Fields Reference

### `referenceImageUrl`

URL or base64 data URI of the reference character image. Each provider maps this to its native mechanism:

- **Replicate (Pulid):** `main_face_image` input
- **Replicate (standard Flux):** `image` input with `image_strength`
- **Fal:** `ip_adapter_image` body field
- **SD-Local:** ControlNet `input_image` with IP-Adapter preprocessor
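The per-provider mapping above amounts to a dispatch on the target backend. A minimal sketch, assuming the field names listed above (the `mapReferenceImage` helper and the `Target` union are illustrative, not the internal adapter API):

```typescript
type Target = 'replicate-pulid' | 'replicate-flux' | 'fal' | 'sd-local';

// Sketch: translate referenceImageUrl into each backend's native
// request field, per the mapping documented above.
function mapReferenceImage(target: Target, url: string): Record<string, unknown> {
  switch (target) {
    case 'replicate-pulid':
      return { main_face_image: url };
    case 'replicate-flux':
      return { image: url, image_strength: 0.6 }; // balanced default
    case 'fal':
      return { ip_adapter_image: url };
    case 'sd-local':
      return { input_image: url }; // fed to the ControlNet IP-Adapter unit
  }
}
```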
### `faceEmbedding`

Optional 512-dimensional vector from InsightFace or equivalent. Used by the `AvatarPipeline` for drift detection — after generating each image, the pipeline extracts the face embedding from the output and compares it to this anchor via cosine similarity. Images that drift below the threshold (default 0.6) are regenerated.
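The drift check described above reduces to a cosine-similarity comparison between the anchor embedding and the embedding extracted from each generated image. A self-contained sketch (helper names are illustrative; the 0.6 threshold is the documented default):

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// An image has "drifted" when its face embedding falls below the
// similarity threshold relative to the anchor, triggering regeneration.
function hasDrifted(anchor: number[], generated: number[], threshold = 0.6): boolean {
  return cosineSimilarity(anchor, generated) < threshold;
}
```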
### `consistencyMode`

Controls how aggressively the provider preserves the reference identity:

```typescript
// Strict — for expression sheets where faces must match
await generateImage({
  prompt: 'Character looking angry, dramatic lighting',
  referenceImageUrl: neutralPortrait,
  consistencyMode: 'strict', // Pulid auto-selected on Replicate
});

// Balanced — for full-body shots
await generateImage({
  prompt: 'Full body shot of the character walking through a market',
  referenceImageUrl: neutralPortrait,
  consistencyMode: 'balanced',
});

// Loose — for "inspired by" mood pieces
await generateImage({
  prompt: 'Abstract portrait in the style of the character',
  referenceImageUrl: neutralPortrait,
  consistencyMode: 'loose',
});
```

## AvatarPipeline Integration

The `AvatarPipeline` uses consistency modes per stage:

| Stage | Mode | Rationale |
|-------|------|-----------|
| `neutral_portrait` | none | This IS the anchor — no reference exists yet |
| `face_embedding` | none | Extraction, not generation |
| `expression_sheet` | `'strict'` | Facial identity must match across all emotions |
| `animated_emotes` | `'strict'` | Same character in motion |
| `full_body` | `'balanced'` | Body proportions can vary; face should be recognizable |
| `additional_angles` | `'balanced'` | 3/4 and profile views naturally differ from frontal |

```typescript
import { AvatarPipeline } from '@framers/agentos/media/avatar';

const pipeline = new AvatarPipeline(faceService, imageGenerator);
const result = await pipeline.generate({
  characterId: 'hero_001',
  identity: {
    displayName: 'Kael Stormwind',
    ageBand: 'young_adult',
    faceDescriptor: 'sharp jawline, green eyes, short dark hair, small scar above left eyebrow',
  },
  generationConfig: {
    baseModel: 'black-forest-labs/flux-dev',
    provider: 'replicate',
  },
  stages: ['neutral_portrait', 'face_embedding', 'expression_sheet', 'full_body'],
});
```

## Choosing the Right Mode

- **Avatars and expression sheets:** Always `'strict'`. The face is the product.
- **Scene illustrations with known characters:** `'balanced'`. Character should be recognizable but the scene composition matters more.
- **Style exploration and mood boards:** `'loose'`. The reference influences the vibe, not the pixels.
- **No reference at all:** Omit `referenceImageUrl` entirely. The fields are fully optional.

## Related

- [Image Generation](./IMAGE_GENERATION.md) — Provider-agnostic generation API
- [Style Transfer](./STYLE_TRANSFER.md) — Transfer visual aesthetics between images
- [Image Editing](./IMAGE_EDITING.md) — Img2img, inpainting, upscaling

docs/features/STYLE_TRANSFER.md

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@

# Style Transfer — Image-Guided Aesthetic Translation

> Apply the visual style of one image to another using `transferStyle()`, backed by Flux Redux and cross-provider img2img.

---

## Overview

`transferStyle()` takes a source image and a style reference image, then produces an output that combines the content of the source with the visual aesthetic of the reference. This is useful for:

- Converting photographs to specific art styles (oil painting, anime, pixel art)
- Applying a brand's visual identity to generated content
- Creating consistent visual themes across a set of images

## `transferStyle()` API

```typescript
import { transferStyle } from '@framers/agentos';

const result = await transferStyle({
  image: './photo.jpg',
  styleReference: './monet-waterlilies.jpg',
  prompt: 'Impressionist oil painting, visible brushstrokes, warm golden light',
  strength: 0.7,
});

console.log(result.images[0].url);
console.log(result.provider); // 'replicate'
console.log(result.model);    // 'black-forest-labs/flux-redux-dev'
```

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | `string \| Buffer` | **required** | Source image (file path, URL, data URI, or Buffer) |
| `styleReference` | `string \| Buffer` | **required** | Reference image whose style to apply |
| `prompt` | `string` | **required** | Text guiding the transfer direction |
| `strength` | `number` | `0.7` | How much reference style to apply (0 = unchanged, 1 = full transfer) |
| `provider` | `string` | auto-detect | Override provider selection |
| `model` | `string` | provider default | Override model selection |
| `size` | `string` | — | Output dimensions (e.g. `'1024x1024'`) |
| `negativePrompt` | `string` | — | Content to avoid |
| `seed` | `number` | — | Reproducibility seed |
| `policyTier` | `string` | — | Content policy tier for provider routing |

## Provider Routing

When no provider is specified, `transferStyle()` auto-detects the best available provider from environment variables:

| Priority | Provider | Model | How It Works |
|----------|----------|-------|--------------|
| 1 | Replicate | Flux Redux Dev | Purpose-built for image-guided generation. Style reference as primary input. |
| 2 | Fal | Flux Dev | img2img with style description in prompt |
| 3 | Stability | stable-image-core | img2img with strength parameter |
| 4 | OpenAI | gpt-image-1 | editImage with descriptive prompt |

Replicate with Flux Redux produces the best results for style transfer because the model was trained specifically for image-conditioned generation.
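The priority-ordered fallback can be sketched as a first-match scan over credentials. This is a sketch only: the environment variable names (`REPLICATE_API_TOKEN`, `FAL_KEY`, `STABILITY_API_KEY`, `OPENAI_API_KEY`) are assumptions based on each vendor's conventions and are not confirmed by this commit, and `detectProvider` is not the library's actual function name.

```typescript
// Sketch of priority-ordered provider auto-detection. The env var
// names below are assumed conventions, not confirmed API surface.
const PROVIDER_PRIORITY: ReadonlyArray<{ provider: string; envVar: string }> = [
  { provider: 'replicate', envVar: 'REPLICATE_API_TOKEN' },
  { provider: 'fal', envVar: 'FAL_KEY' },
  { provider: 'stability', envVar: 'STABILITY_API_KEY' },
  { provider: 'openai', envVar: 'OPENAI_API_KEY' },
];

// Return the highest-priority provider whose credential is present.
function detectProvider(env: Record<string, string | undefined>): string | undefined {
  return PROVIDER_PRIORITY.find(({ envVar }) => Boolean(env[envVar]))?.provider;
}
```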
59+
60+
## Strength Guide
61+
62+
| Range | Effect | Use Case |
63+
|-------|--------|----------|
64+
| 0.1–0.3 | Subtle color grading, minor texture shifts | Brand color overlays |
65+
| 0.4–0.6 | Moderate style influence, composition preserved | "In the style of" variations |
66+
| 0.7–0.8 | Strong style transfer, content recognizable | Art style conversion |
67+
| 0.9–1.0 | Near-complete adoption of reference aesthetic | Full aesthetic transformation |
68+
69+
## Examples
70+
71+
```typescript
72+
// Photograph → anime style
73+
const anime = await transferStyle({
74+
image: './portrait-photo.jpg',
75+
styleReference: './ghibli-frame.png',
76+
prompt: 'Studio Ghibli anime style, cel shading, vibrant colors',
77+
strength: 0.75,
78+
});
79+
80+
// Photograph → pixel art
81+
const pixel = await transferStyle({
82+
image: './landscape.jpg',
83+
styleReference: './pixel-art-reference.png',
84+
prompt: '16-bit pixel art, limited palette, retro game aesthetic',
85+
strength: 0.8,
86+
});
87+
88+
// Apply brand visual identity
89+
const branded = await transferStyle({
90+
image: './product-photo.jpg',
91+
styleReference: './brand-style-guide.png',
92+
prompt: 'Clean, modern, brand-consistent visual treatment',
93+
strength: 0.5,
94+
});
95+
```
96+
97+
## Related
98+
99+
- [Image Generation](./IMAGE_GENERATION.md) — Text-to-image generation
100+
- [Image Editing](./IMAGE_EDITING.md) — Img2img, inpainting, upscaling
101+
- [Character Consistency](./CHARACTER_CONSISTENCY.md) — Face-preserving generation

docs/getting-started/HIGH_LEVEL_API.md

Lines changed: 3 additions & 2 deletions
@@ -6,7 +6,7 @@ Everything is one import. Pick the function that fits your task:

  import {
    generateText, streamText,     // Text generation
    generateObject, streamObject, // Structured output (Zod validated)
-   generateImage,                // Image generation
+   generateImage, transferStyle, // Image generation & style transfer
    generateVideo, analyzeVideo,  // Video generation & analysis
    generateMusic, generateSFX,   // Audio generation
    performOCR,                   // Vision / OCR

@@ -23,7 +23,8 @@ import {

  | `generateText()` | One-shot text generation | `await generateText({ provider: 'openai', prompt: '...' })` |
  | `streamText()` | Stream text in real-time | `for await (const d of streamText({...}).textStream) {}` |
  | `generateObject()` | Extract structured JSON (Zod) | `await generateObject({ schema: z.object({...}), prompt: '...' })` |
- | `generateImage()` | Generate images | `await generateImage({ provider: 'openai', prompt: '...' })` |
+ | `generateImage()` | Generate images (with character consistency) | `await generateImage({ provider: 'openai', prompt: '...' })` |
+ | `transferStyle()` | Style transfer between images | `await transferStyle({ image: src, styleReference: ref, prompt: '...' })` |
  | `generateVideo()` | Generate video from text/image | `await generateVideo({ prompt: '...' })` |
  | `generateMusic()` | Generate music | `await generateMusic({ prompt: '...' })` |
  | `performOCR()` | Extract text from images | `await performOCR({ imagePath: './doc.png' })` |
