feat: detect attached editorial images, skip AI image generation #131
sweetmantech merged 6 commits into main
When an attached image is classified as an editorial press photo (professional portrait with cinematic lighting, no text/overlays), the pipeline now uses it directly for video generation instead of generating a new AI image.
Changes:
- Add createEditorialDetectionAgent for AI-based editorial photo classification
- Add detectEditorialImage using few-shot prompting (mirrors detectFace pattern)
- Update classifyImages to return editorialImageUrl alongside faceGuideUrl
- Update resolveFaceGuide to surface editorialImageUrl through the pipeline
- Update createContentTask to skip image generation when editorial image found
- Add tests for detectEditorialImage, classifyImages, update resolveFaceGuide tests
Co-Authored-By: Paperclip <noreply@paperclip.ing>
📝 Walkthrough
Added editorial-photo detection: new ToolLoopAgent factory and detection function using few‑shot prompts; integrated editorial checks into image classification and face‑guide resolution; updated content task to skip AI image generation when an editorial image is attached; tests added/updated for detection and classification flows.
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client
participant ContentTask as Content Task
participant Classify as classifyImages()
participant Detect as detectEditorialImage()
participant Agent as Editorial Agent
participant Gemini as Google Gemini
participant Gen as generateImage()
Client->>ContentTask: create content request (images, usesImageOverlay)
ContentTask->>Classify: classifyImages(images, usesFaceGuide, usesImageOverlay)
loop per image
Classify->>Detect: detectEditorialImage(imageUrl)
Detect->>Agent: createEditorialDetectionAgent()
Detect->>Gemini: generate(few-shot prompt + target image)
Gemini-->>Detect: { isEditorial: true|false }
Detect-->>Classify: boolean result
alt isEditorial == true and none selected yet
Classify->>Classify: set editorialImageUrl (exclude from additional)
else
Classify->>Classify: add to additionalImageUrls / continue
end
end
Classify-->>ContentTask: { faceGuideUrl, editorialImageUrl, additionalImageUrls }
alt editorialImageUrl exists
ContentTask->>ContentTask: use editorialImageUrl (skip generate/upscale)
else
ContentTask->>Gen: generateImage(imageRefs, prompt)
Gen-->>ContentTask: generated image (may be upscaled)
end
ContentTask-->>Client: created content response
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 3 passed
Code Review: Editorial Image Detection
Summary
This PR adds AI-based editorial photo detection to the content pipeline. When an attached image is identified as a professional editorial press photo (via Gemini Flash), it's used directly as the base image, skipping AI image generation and upscaling. Clean, focused implementation that follows existing patterns.
CI Status
Branch Status
CLEAN Code Assessment
SRP ✅ - Each new file has one clear responsibility:
OCP ✅ - Extended
DRY ✅ - Follows the same detection pattern as
YAGNI ✅ - No over-engineering. Focused implementation with no speculative features.
Issues Found
Suggestions (non-blocking):
Security
Verdict: approve ✅
Well-structured, follows existing patterns, good test coverage (new test files + updated existing tests), and graceful error handling. The suggestions above are minor improvements for future iterations.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/content/classifyImages.ts (1)
34-53: ⚠️ Potential issue | 🟠 Major
Don't short-circuit before editorial classification.
The `continue` on Line 41 exits before the editorial check on Line 45, so a single attached press photo with a visible face only sets `faceGuideUrl`. Downstream, `src/tasks/createContentTask.ts` keeps generating an AI image because `editorialImageUrl` never gets populated.
💡 Suggested fix
 for (const imageUrl of images) {
   const uploadedUrl = await fetchImageFromUrl(imageUrl);
+  let hasFace = false;

   if (usesFaceGuide && !faceGuideUrl) {
-    const hasFace = await detectFace(uploadedUrl);
+    hasFace = await detectFace(uploadedUrl);
     if (hasFace) {
       faceGuideUrl = uploadedUrl;
-      continue;
     }
   }

   if (usesImageOverlay && !editorialImageUrl) {
     const isEditorial = await detectEditorialImage(uploadedUrl);
     if (isEditorial) {
       editorialImageUrl = uploadedUrl;
       continue;
     }
   }
+
+  if (hasFace) {
+    continue;
+  }

   additionalImageUrls.push(uploadedUrl);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/content/classifyImages.ts` around lines 34 - 53, In classifyImages.ts inside the loop that processes images (the block calling fetchImageFromUrl), don't short-circuit with continue after detectFace; instead run both detectFace and detectEditorialImage for the same uploadedUrl so one image can be both a face guide and an editorial image. Concretely, in the loop that references usesFaceGuide, faceGuideUrl, usesImageOverlay, editorialImageUrl, detectFace and detectEditorialImage: compute hasFace and isEditorial for uploadedUrl, set faceGuideUrl if usesFaceGuide && !faceGuideUrl && hasFace, set editorialImageUrl if usesImageOverlay && !editorialImageUrl && isEditorial, and only push uploadedUrl to additionalImageUrls if neither assignment happened. This preserves the original intent while allowing a single image to populate both faceGuideUrl and editorialImageUrl.
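The behavior requested in this comment can be sketched as a standalone function. This is a minimal illustration, not the repository's code: `classifyImagesSketch`, the `Detector` type, and passing the detectors in as parameters are assumptions made so the loop can run without a model provider, and the upload step (`fetchImageFromUrl`) is left out.

```typescript
// Hypothetical detector signature. The real detectFace / detectEditorialImage
// call a model provider; here they are injected so the loop runs offline.
type Detector = (imageUrl: string) => Promise<boolean>;

interface ClassifyResult {
  faceGuideUrl: string | null;
  editorialImageUrl: string | null;
  additionalImageUrls: string[];
}

async function classifyImagesSketch(
  images: string[],
  opts: { usesFaceGuide: boolean; usesImageOverlay: boolean },
  detectFace: Detector,
  detectEditorialImage: Detector,
): Promise<ClassifyResult> {
  let faceGuideUrl: string | null = null;
  let editorialImageUrl: string | null = null;
  const additionalImageUrls: string[] = [];

  for (const imageUrl of images) {
    let claimed = false;

    // No `continue` after a face match, so the editorial check below
    // still runs for the same image.
    if (opts.usesFaceGuide && !faceGuideUrl && (await detectFace(imageUrl))) {
      faceGuideUrl = imageUrl;
      claimed = true;
    }

    if (
      opts.usesImageOverlay &&
      !editorialImageUrl &&
      (await detectEditorialImage(imageUrl))
    ) {
      editorialImageUrl = imageUrl;
      claimed = true;
    }

    // Only images that filled neither slot become additional references.
    if (!claimed) additionalImageUrls.push(imageUrl);
  }

  return { faceGuideUrl, editorialImageUrl, additionalImageUrls };
}
```

With this shape, a single press photo that also shows a face fills both `faceGuideUrl` and `editorialImageUrl`, which is the case the comment says was being missed.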
🧹 Nitpick comments (1)
src/content/detectEditorialImage.ts (1)
1-1: Use the Trigger.dev logger for this new classification path.
This helper adds new runtime logging through `logStep`, so the editorial-detection flow bypasses the standard logger the repo expects.
As per coding guidelines, "Use `logger` from `@trigger.dev/sdk/v3` for logging".
Also applies to: 45-50
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/content/detectEditorialImage.ts` at line 1, Replace the ad-hoc logStep usage with the repo-standard Trigger.dev logger: remove the import of logStep in src/content/detectEditorialImage.ts and import { logger } from "@trigger.dev/sdk/v3"; then update all calls to logStep (including the new runtime logging at the top and the calls around the detect/editorial flow at lines ~45-50) to use logger.info/debug/error as appropriate, preserving the original messages and context; ensure function names like detectEditorialImage (or any exported helpers in this file) now call logger instead of logStep and that the import and usages compile.
📒 Files selected for processing (8)
- src/agents/createEditorialDetectionAgent.ts
- src/content/__tests__/classifyImages.test.ts
- src/content/__tests__/detectEditorialImage.test.ts
- src/content/__tests__/resolveFaceGuide.test.ts
- src/content/classifyImages.ts
- src/content/detectEditorialImage.ts
- src/content/resolveFaceGuide.ts
- src/tasks/createContentTask.ts
…tion Re-uploaded fal.media URLs are brand-new and occasionally unreachable by the model provider, causing detection to fail with "Cannot fetch content from the provided URL". The original input URL is already reachable (we just downloaded from it), so use it for detection and reserve the fal upload for downstream fal.ai pipelines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
 */
export function createEditorialDetectionAgent() {
  return new ToolLoopAgent({
    model: "google/gemini-3.1-flash-lite-preview",
DRY - Is this the same model used in the face guide detection agent?
- actual: not using shared const with other image detection agents.
- required: inline string replaced with shared const model string used both here and in the face guide detection agent.
export async function classifyImages({
  images,
  usesFaceGuide,
  usesImageOverlay,
Why is this variable named usesImageOverlay rather than usesEditorialImage?
    }
  }

  if (usesImageOverlay && !editorialImageUrl) {
KISS - why is the editorialImageUrl check required here? It is initialized in this function as null without modifications, right?
import { createEditorialDetectionAgent } from "../agents/createEditorialDetectionAgent";

const EDITORIAL_EXAMPLE_URLS = [
  "https://dxfamqbi5zyezrs5.public.blob.vercel-storage.com/content-templates/artist-release-editorial/references/images/ref-01.png",
This link is invalid.
- Extract IMAGE_DETECTION_MODEL constant shared by face + editorial agents (DRY)
- Fix invalid editorial reference URL (was 404); use the valid blob URL
- Rename classify-layer param usesImageOverlay → usesEditorialImage; leave the template-level field as usesImageOverlay (it still means "overlays")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…orial detectors Consolidates the two-shot image classification scaffold (example image + target image + structured boolean output + error fallback) into a single helper. detectFace and detectEditorialImage become thin configuration wrappers. Also removes URL truncation in logs — full URLs make the logs verifiable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  if (usesEditorialImage && !editorialImageUrl) {
    const isEditorial = await detectEditorialImage(imageUrl);
    if (isEditorial) {
      editorialImageUrl = uploadedUrl;
      continue;
    }
  }
KISS principle: have we considered whether two checks on the same image are necessary? Alternatively, how could the existing src/content/classifyImages.ts and agent file definitions change to handle both cases in a single request?
Replace the per-kind binary detectors (detectFace + detectEditorialImage)
with a single classifyImage that returns one of {face_guide, editorial,
additional} in one Gemini call. Cuts API calls per image roughly in half
and makes adding new image categories a 2-line change (enum variant +
few-shot example) instead of a new agent + detection function + pipeline
branch.
- Remove: detectFace, detectEditorialImage, runImageFewShotClassification,
createFaceDetectionAgent, createEditorialDetectionAgent (and their tests)
- Add: createImageClassificationAgent (z.enum schema), classifyImage
(single few-shot call with one example per positive kind)
- classifyImages dispatches on the returned kind; skips classification
entirely when neither flag is set
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
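The dispatch described in this commit message might look roughly like the sketch below. The three kind names come from the message itself; everything else (`classifyImagesByKind`, the injected `ClassifyImage` function, and treating every image as additional when neither flag is set) is an assumption for illustration, and the actual model call is not reproduced.

```typescript
// The three kinds named in the commit message.
type ImageKind = "face_guide" | "editorial" | "additional";

// Hypothetical classifier signature: one call per image returns one kind.
type ClassifyImage = (imageUrl: string) => Promise<ImageKind>;

interface ClassifiedImages {
  faceGuideUrl: string | null;
  editorialImageUrl: string | null;
  additionalImageUrls: string[];
}

async function classifyImagesByKind(
  images: string[],
  opts: { usesFaceGuide: boolean; usesEditorialImage: boolean },
  classifyImage: ClassifyImage,
): Promise<ClassifiedImages> {
  const result: ClassifiedImages = {
    faceGuideUrl: null,
    editorialImageUrl: null,
    additionalImageUrls: [],
  };

  // Skip classification entirely when neither flag is set.
  // Assumption: unclassified images are passed through as additional.
  if (!opts.usesFaceGuide && !opts.usesEditorialImage) {
    result.additionalImageUrls.push(...images);
    return result;
  }

  for (const imageUrl of images) {
    const kind = await classifyImage(imageUrl);

    if (kind === "face_guide" && opts.usesFaceGuide && !result.faceGuideUrl) {
      result.faceGuideUrl = imageUrl;
    } else if (
      kind === "editorial" &&
      opts.usesEditorialImage &&
      !result.editorialImageUrl
    ) {
      result.editorialImageUrl = imageUrl;
    } else {
      // Covers "additional", disabled kinds, and already-filled slots.
      result.additionalImageUrls.push(imageUrl);
    }
  }

  return result;
}
```

Because each image gets exactly one label, adding a category in this sketch means one more union member and one more branch, which is in the spirit of the "2-line change" the message describes.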
// --- Step 5: Generate image (API) — skip if editorial image attached ---
let imageUrl: string;

if (editorialImageUrl) {
  logStep("Using attached editorial image, skipping AI image generation", true, {
    editorialImageUrl: editorialImageUrl.slice(0, 80),
  });
  imageUrl = editorialImageUrl;
} else {
  logStep("Generating image via API");
  const referenceImagePath = pickRandomReferenceImage(template);
  const instruction = resolveImageInstruction(template);
  const basePrompt = `${instruction} ${template.imagePrompt}`;
  const fullPrompt = buildImagePrompt(basePrompt, template.styleGuide);

  const imageRefs: string[] = [];
  if (faceGuideUrl) imageRefs.push(faceGuideUrl);
  if (referenceImagePath) imageRefs.push(referenceImagePath);
  if (!template.usesImageOverlay && additionalImageUrls.length) {
    imageRefs.push(...additionalImageUrls);
  }

  imageUrl = await generateImage({
    prompt: fullPrompt,
    referenceImageUrl: faceGuideUrl ?? undefined,
    images: imageRefs.length > 0 ? imageRefs : undefined,
  });
OCP - how can we minimize the additions to the src/tasks/createContentTask.ts function? If new logic is needed, abstract it to a new function file following TDD.
Steps 5-6 of the pipeline (use editorial image OR generate + optional upscale) are now a single function call in the orchestrator. New image routing logic can live in resolveBaseImage without bloating the task file. Red-green TDD with a dedicated test file covering the editorial bypass, generation path, upscale toggle, and overlay-aware imageRefs assembly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
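A rough sketch of the extraction described above, based on the Step 5 snippet quoted in the review thread. The names `resolveBaseImageSketch`, `BaseImageInput`, and `BaseImageDeps`, the `upscale` flag, and the injected `upscaleImage` function are hypothetical; the real `resolveBaseImage` signature does not appear in this conversation.

```typescript
interface GenerateArgs {
  prompt: string;
  referenceImageUrl?: string;
  images?: string[];
}

// Hypothetical dependencies, injected so the routing can be tested
// without calling any image API.
interface BaseImageDeps {
  generateImage: (args: GenerateArgs) => Promise<string>;
  upscaleImage: (imageUrl: string) => Promise<string>;
}

interface BaseImageInput {
  editorialImageUrl: string | null;
  faceGuideUrl: string | null;
  referenceImagePath: string | null;
  additionalImageUrls: string[];
  usesImageOverlay: boolean;
  prompt: string;
  upscale: boolean;
}

async function resolveBaseImageSketch(
  input: BaseImageInput,
  deps: BaseImageDeps,
): Promise<string> {
  // Editorial bypass: use the attached photo, skip generation and upscale.
  if (input.editorialImageUrl) return input.editorialImageUrl;

  // Overlay-aware reference assembly, mirroring the quoted Step 5 code.
  const imageRefs: string[] = [];
  if (input.faceGuideUrl) imageRefs.push(input.faceGuideUrl);
  if (input.referenceImagePath) imageRefs.push(input.referenceImagePath);
  if (!input.usesImageOverlay && input.additionalImageUrls.length) {
    imageRefs.push(...input.additionalImageUrls);
  }

  const generated = await deps.generateImage({
    prompt: input.prompt,
    referenceImageUrl: input.faceGuideUrl ?? undefined,
    images: imageRefs.length > 0 ? imageRefs : undefined,
  });

  // Optional upscale of the generated image only.
  return input.upscale ? deps.upscaleImage(generated) : generated;
}
```

Keeping the branch inside one function means the orchestrator only sees a single call, which is the point the OCP comment was raising.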
Summary
Changes
- `createEditorialDetectionAgent.ts`: ToolLoopAgent for editorial photo classification
- `detectEditorialImage.ts`: few-shot AI detection (is this a professional editorial press photo?)
- `classifyImages.ts`: now returns `editorialImageUrl` alongside `faceGuideUrl`
- `resolveFaceGuide.ts`: surfaces `editorialImageUrl` through the pipeline
- `createContentTask.ts`: skips image generation + upscale when editorial image attached
- `detectEditorialImage.test.ts`, `classifyImages.test.ts`
- `resolveFaceGuide.test.ts` (new param + return field)
Test plan
🤖 Generated with Claude Code
Summary by cubic
Detects editorial press photos in attachments and, when found, uses them directly for video creation, skipping image generation and upscale. Replaces per-kind detectors with a single multi-class classifier and pulls base-image logic into `resolveBaseImage`.
New Features
- `classifyImage` via `createImageClassificationAgent` (uses `IMAGE_DETECTION_MODEL`) returning `face_guide` | `editorial` | `additional`.
- `classifyImages` now returns `faceGuideUrl`, `editorialImageUrl`, and `additionalImageUrls`; runs classification only when `usesFaceGuide` or `usesEditorialImage` is true.
- `resolveFaceGuide` surfaces `editorialImageUrl`; `createContentTask` uses it to skip image generation and upscaling.
- `resolveBaseImage` to handle editorial bypass vs. generate + optional upscale; added focused tests.
Bug Fixes
Written for commit e7a0a5d. Summary will update on new commits.
Summary by CodeRabbit
New Features
Tests