feat: detect attached editorial images, skip AI image generation #131

Merged
sweetmantech merged 6 commits into main from feature/editorial-image-detection
Apr 13, 2026

@recoup-coding-agent (Collaborator) commented Apr 13, 2026

Summary

  • Adds AI-based editorial photo detection (mirrors existing face detection pattern) to classify attached images as professional press photos
  • When an editorial image is detected among attachments, the pipeline skips AI image generation and uses the attached image directly for video generation
  • Playlist cover overlays still proceed normally via ffmpeg

Changes

  • New: createEditorialDetectionAgent.ts — ToolLoopAgent for editorial photo classification
  • New: detectEditorialImage.ts — few-shot AI detection (is this a professional editorial press photo?)
  • Modified: classifyImages.ts — now returns editorialImageUrl alongside faceGuideUrl
  • Modified: resolveFaceGuide.ts — surfaces editorialImageUrl through the pipeline
  • Modified: createContentTask.ts — skips image generation + upscale when editorial image attached
  • New tests: detectEditorialImage.test.ts, classifyImages.test.ts
  • Updated tests: resolveFaceGuide.test.ts (new param + return field)

Test plan

  • All 347 tests pass
  • No new TypeScript errors introduced
  • Manual test: create content with editorial image attached → verify image gen is skipped
  • Manual test: create content without editorial image → verify normal pipeline unchanged
  • Manual test: create content with face guide + editorial + playlist cover → verify correct classification

🤖 Generated with Claude Code


Summary by cubic

Detects editorial press photos in attachments and, when found, uses them directly for video creation, skipping image generation and upscale. Replaces per-kind detectors with a single multi-class classifier and pulls base-image logic into resolveBaseImage.

  • New Features

    • Replaced per-kind detectors with classifyImage via createImageClassificationAgent (uses IMAGE_DETECTION_MODEL) returning face_guide | editorial | additional.
    • classifyImages now returns faceGuideUrl, editorialImageUrl, and additionalImageUrls; runs classification only when usesFaceGuide or usesEditorialImage is true.
    • resolveFaceGuide surfaces editorialImageUrl; createContentTask uses it to skip image generation and upscaling.
    • Extracted resolveBaseImage to handle editorial bypass vs. generate + optional upscale; added focused tests.
  • Bug Fixes

    • Use original input URLs for classification to avoid provider fetch errors.
    • Fixed the editorial reference URL in few-shot examples.

Written for commit e7a0a5d. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Added editorial image detection to identify press photos for use as image overlays, streamlining workflows by utilizing existing editorial images instead of generating new ones when applicable.
  • Tests

    • Expanded test coverage with comprehensive test suites for editorial image detection, image classification scenarios, and face-guide resolution workflows to ensure feature reliability.

When an attached image is classified as an editorial press photo (professional
portrait with cinematic lighting, no text/overlays), the pipeline now uses it
directly for video generation instead of generating a new AI image.

Changes:
- Add createEditorialDetectionAgent for AI-based editorial photo classification
- Add detectEditorialImage using few-shot prompting (mirrors detectFace pattern)
- Update classifyImages to return editorialImageUrl alongside faceGuideUrl
- Update resolveFaceGuide to surface editorialImageUrl through the pipeline
- Update createContentTask to skip image generation when editorial image found
- Add tests for detectEditorialImage, classifyImages, update resolveFaceGuide tests

Co-Authored-By: Paperclip <noreply@paperclip.ing>

@coderabbitai bot commented Apr 13, 2026

Walkthrough

Added editorial-photo detection: new ToolLoopAgent factory and detection function using few‑shot prompts; integrated editorial checks into image classification and face‑guide resolution; updated content task to skip AI image generation when an editorial image is attached; tests added/updated for detection and classification flows.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Editorial Agent: src/agents/createEditorialDetectionAgent.ts | New factory returning a ToolLoopAgent configured with Google Gemini, Zod schema (isEditorial: boolean), and single-step stop behavior. |
| Editorial Detection Function: src/content/detectEditorialImage.ts | New exported detectEditorialImage(imageUrl) that creates the agent, sends a few‑shot prompt (example editorial + target), logs results, and returns boolean with error handling. |
| Image Classification: src/content/classifyImages.ts | Added usesImageOverlay param; performs editorial detection when enabled; first editorial match set to editorialImageUrl (excluded from additionalImageUrls); face detection uses original image URL; return type now includes editorialImageUrl. |
| Face Guide Resolution: src/content/resolveFaceGuide.ts | Added editorialImageUrl to ResolveFaceGuideResult and usesImageOverlay param; flows editorialImageUrl through classification and fallback branches; updated JSDoc. |
| Content Task: src/tasks/createContentTask.ts | Step 2 now passes usesImageOverlay into resolveFaceGuide. Step 5: if editorialImageUrl present, use it directly and skip image-generation/upscale; otherwise preserve prior generation flow and upscaling logic. |
| Tests (New): src/content/__tests__/detectEditorialImage.test.ts, src/content/__tests__/classifyImages.test.ts | New Vitest suites mocking agent and dependencies; assert few‑shot prompt structure, agent outputs, editorial selection ordering, additionalImageUrls behavior, and error cases. |
| Tests (Updated): src/content/__tests__/resolveFaceGuide.test.ts | Updated to pass usesImageOverlay: false, expect editorialImageUrl: null, changed mock reset to vi.resetAllMocks(), and adjusted mock call counts. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ContentTask as Content Task
    participant Classify as classifyImages()
    participant Detect as detectEditorialImage()
    participant Agent as Editorial Agent
    participant Gemini as Google Gemini
    participant Gen as generateImage()

    Client->>ContentTask: create content request (images, usesImageOverlay)
    ContentTask->>Classify: classifyImages(images, usesFaceGuide, usesImageOverlay)

    loop per image
        Classify->>Detect: detectEditorialImage(imageUrl)
        Detect->>Agent: createEditorialDetectionAgent()
        Detect->>Gemini: generate(few-shot prompt + target image)
        Gemini-->>Detect: { isEditorial: true|false }
        Detect-->>Classify: boolean result

        alt isEditorial == true and none selected yet
            Classify->>Classify: set editorialImageUrl (exclude from additional)
        else
            Classify->>Classify: add to additionalImageUrls / continue
        end
    end

    Classify-->>ContentTask: { faceGuideUrl, editorialImageUrl, additionalImageUrls }

    alt editorialImageUrl exists
        ContentTask->>ContentTask: use editorialImageUrl (skip generate/upscale)
    else
        ContentTask->>Gen: generateImage(imageRefs, prompt)
        Gen-->>ContentTask: generated image (may be upscaled)
    end

    ContentTask-->>Client: created content response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I sniffed the pixels, one by one,
A few‑shot hint, the job was done.
If real press light is found today,
We skip the forge and save the day.
Hooray for photos that arrive—hooray! 📸

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely describes the main feature: detecting editorial images and skipping AI generation when found, which aligns with the PR's core objective. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@recoup-coding-agent (Collaborator, Author)

Code Review — Editorial Image Detection

Summary

This PR adds AI-based editorial photo detection to the content pipeline. When an attached image is identified as a professional editorial press photo (via Gemini Flash), it's used directly as the base image — skipping AI image generation and upscaling. Clean, focused implementation that follows existing patterns.

CI Status

| Check | Status | Conclusion |
| --- | --- | --- |
| test | completed | ✅ success |
| cubic | in_progress | ⏳ pending |
| CodeRabbit | pending | ⏳ pending |

test (the only build/CI check) passes. Remaining pending items are third-party review bots.

Branch Status

  • Mergeable: ✅ Yes
  • Merge state: unstable (due to pending review bot checks, not build failures)
  • Branch freshness: No merge conflicts detected

CLEAN Code Assessment

SRP ✅ — Each new file has one clear responsibility: createEditorialDetectionAgent.ts configures the AI agent, detectEditorialImage.ts handles detection logic, classifyImages.ts orchestrates classification.

OCP ✅ — Extended classifyImages and resolveFaceGuide with new parameters rather than restructuring existing logic. The editorial path is additive.

DRY ✅ — Follows the same detection pattern as detectFace (create agent → generate → parse output). Reuses existing fetchImageFromUrl and classifyImages infrastructure.

YAGNI ✅ — No over-engineering. Focused implementation with no speculative features.

Issues Found

Suggestions (non-blocking):

  1. Preview model (google/gemini-3.1-flash-lite-preview) — The model ID includes -preview, which may be deprecated or changed without notice. Consider tracking this and switching to a GA model when available.

  2. Single reference example: EDITORIAL_EXAMPLE_URLS has one entry. The few-shot approach works, but adding one or two more diverse examples (different lighting, backgrounds) could improve classification accuracy. Not blocking, since the current approach is functional and error-safe.

Security

  • ✅ No hardcoded secrets or API keys
  • ✅ Reference image URL is a public Vercel blob — appropriate for example data
  • ✅ Error handling returns false on failure (safe default, no information leak)
  • ✅ Errors are logged with truncated URLs (slice(0, 80)) — no sensitive data exposure

Verdict: approve

Well-structured, follows existing patterns, good test coverage (new test files + updated existing tests), and graceful error handling. The suggestions above are minor improvements for future iterations.

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/content/classifyImages.ts (1)

34-53: ⚠️ Potential issue | 🟠 Major

Don't short-circuit before editorial classification.

The continue on Line 41 exits before the editorial check on Line 45, so a single attached press photo with a visible face only sets faceGuideUrl. Downstream, src/tasks/createContentTask.ts keeps generating an AI image because editorialImageUrl never gets populated.

💡 Suggested fix
   for (const imageUrl of images) {
     const uploadedUrl = await fetchImageFromUrl(imageUrl);
+    let hasFace = false;
 
     if (usesFaceGuide && !faceGuideUrl) {
-      const hasFace = await detectFace(uploadedUrl);
+      hasFace = await detectFace(uploadedUrl);
       if (hasFace) {
         faceGuideUrl = uploadedUrl;
-        continue;
       }
     }
 
     if (usesImageOverlay && !editorialImageUrl) {
       const isEditorial = await detectEditorialImage(uploadedUrl);
       if (isEditorial) {
         editorialImageUrl = uploadedUrl;
         continue;
       }
     }
+
+    if (hasFace) {
+      continue;
+    }
 
     additionalImageUrls.push(uploadedUrl);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/content/classifyImages.ts` around lines 34 - 53, In classifyImages.ts
inside the loop that processes images (the block calling fetchImageFromUrl),
don't short-circuit with continue after detectFace; instead run both detectFace
and detectEditorialImage for the same uploadedUrl so one image can be both a
face guide and an editorial image. Concretely, in the loop that references
usesFaceGuide, faceGuideUrl, usesImageOverlay, editorialImageUrl, detectFace and
detectEditorialImage: compute hasFace and isEditorial for uploadedUrl, set
faceGuideUrl if usesFaceGuide && !faceGuideUrl && hasFace, set editorialImageUrl
if usesImageOverlay && !editorialImageUrl && isEditorial, and only push
uploadedUrl to additionalImageUrls if neither assignment happened. This
preserves the original intent while allowing a single image to populate both
faceGuideUrl and editorialImageUrl.
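
The behaviour requested in this comment can be sketched as follows. This is a minimal illustration rather than the code in the PR: the two detectors are injected as parameters so the sketch runs on its own, the upload step is left out, and the name `classifyWithBothChecks` is hypothetical.

```typescript
// Hypothetical detector signature; the real detectFace / detectEditorialImage
// call an AI model and are not reproduced here.
type Detector = (imageUrl: string) => Promise<boolean>;

interface ClassifyOptions {
  images: string[];
  usesFaceGuide: boolean;
  usesImageOverlay: boolean;
  detectFace: Detector;
  detectEditorialImage: Detector;
}

interface ClassifyResult {
  faceGuideUrl: string | null;
  editorialImageUrl: string | null;
  additionalImageUrls: string[];
}

async function classifyWithBothChecks(opts: ClassifyOptions): Promise<ClassifyResult> {
  let faceGuideUrl: string | null = null;
  let editorialImageUrl: string | null = null;
  const additionalImageUrls: string[] = [];

  for (const imageUrl of opts.images) {
    let assigned = false;

    // Face check: no early `continue`, so the editorial check below still runs.
    if (opts.usesFaceGuide && !faceGuideUrl && (await opts.detectFace(imageUrl))) {
      faceGuideUrl = imageUrl;
      assigned = true;
    }

    // Editorial check on the same image.
    if (
      opts.usesImageOverlay &&
      !editorialImageUrl &&
      (await opts.detectEditorialImage(imageUrl))
    ) {
      editorialImageUrl = imageUrl;
      assigned = true;
    }

    // Only images that filled neither slot count as additional references.
    if (!assigned) additionalImageUrls.push(imageUrl);
  }

  return { faceGuideUrl, editorialImageUrl, additionalImageUrls };
}
```

With this shape, a single press photo that also shows a face can populate both faceGuideUrl and editorialImageUrl, at the cost of two model calls for that image.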
🧹 Nitpick comments (1)
src/content/detectEditorialImage.ts (1)

1-1: Use the Trigger.dev logger for this new classification path.

This helper adds new runtime logging through logStep, so the editorial-detection flow bypasses the standard logger the repo expects.

As per coding guidelines, "Use logger from @trigger.dev/sdk/v3 for logging".

Also applies to: 45-50

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/content/detectEditorialImage.ts` at line 1, Replace the ad-hoc logStep
usage with the repo-standard Trigger.dev logger: remove the import of logStep in
src/content/detectEditorialImage.ts and import { logger } from
"@trigger.dev/sdk/v3"; then update all calls to logStep (including the new
runtime logging at the top and the calls around the detect/editorial flow at
lines ~45-50) to use logger.info/debug/error as appropriate, preserving the
original messages and context; ensure function names like detectEditorialImage
(or any exported helpers in this file) now call logger instead of logStep and
that the import and usages compile.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0ed5ea98-2e18-43ce-bc89-8501332afcc6

📥 Commits

Reviewing files that changed from the base of the PR and between 3071b1b and b0795ab.

📒 Files selected for processing (8)
  • src/agents/createEditorialDetectionAgent.ts
  • src/content/__tests__/classifyImages.test.ts
  • src/content/__tests__/detectEditorialImage.test.ts
  • src/content/__tests__/resolveFaceGuide.test.ts
  • src/content/classifyImages.ts
  • src/content/detectEditorialImage.ts
  • src/content/resolveFaceGuide.ts
  • src/tasks/createContentTask.ts

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 8 files

…tion

Re-uploaded fal.media URLs are brand-new and occasionally unreachable
by the model provider, causing detection to fail with "Cannot fetch
content from the provided URL". The original input URL is already
reachable (we just downloaded from it), so use it for detection and
reserve the fal upload for downstream fal.ai pipelines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
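
The idea in this commit message can be illustrated with a short sketch. It assumes injected `uploadImage` and `detectEditorial` functions (both hypothetical stand-ins for the real upload and detection helpers) and a hypothetical function name; only the choice of which URL goes where is taken from the commit message.

```typescript
// Hypothetical dependency signatures, injected so the sketch runs without a network.
type UploadImage = (sourceUrl: string) => Promise<string>;
type DetectEditorial = (imageUrl: string) => Promise<boolean>;

async function findEditorialImage(
  images: string[],
  uploadImage: UploadImage,
  detectEditorial: DetectEditorial,
): Promise<string | null> {
  for (const imageUrl of images) {
    // The re-uploaded copy is what downstream fal.ai steps consume.
    const uploadedUrl = await uploadImage(imageUrl);

    // Detection reads the original URL, which was just fetched successfully,
    // rather than the brand-new upload that the provider may not reach yet.
    if (await detectEditorial(imageUrl)) {
      return uploadedUrl;
    }
  }
  return null;
}
```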
 */
export function createEditorialDetectionAgent() {
  return new ToolLoopAgent({
    model: "google/gemini-3.1-flash-lite-preview",
Contributor

DRY - Is this the same model used in the face guide detection agent?

  • actual: not using shared const with other image detection agents.
  • required: inline string replaced with shared const model string used both here and in the face guide detection agent.
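
A minimal sketch of the requested change, assuming a simplified config object in place of the real agent options. The `DetectionAgentConfig` type, both factory names, and the instruction strings are hypothetical; the constant name matches the one that appears later in this thread.

```typescript
// Shared model ID. The value is the string shown in the diff above.
const IMAGE_DETECTION_MODEL = "google/gemini-3.1-flash-lite-preview";

// Hypothetical, simplified stand-in for the options passed to an agent.
interface DetectionAgentConfig {
  model: string;
  instructions: string;
}

function faceDetectionConfig(): DetectionAgentConfig {
  return {
    model: IMAGE_DETECTION_MODEL,
    instructions: "Decide whether the image shows a clearly visible face.",
  };
}

function editorialDetectionConfig(): DetectionAgentConfig {
  return {
    model: IMAGE_DETECTION_MODEL,
    instructions: "Decide whether the image is a professional editorial press photo.",
  };
}
```

Switching to a GA model later would then be a one-line change to the constant.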

Comment thread src/content/classifyImages.ts Outdated
export async function classifyImages({
  images,
  usesFaceGuide,
  usesImageOverlay,
Contributor

Why is this variable named usesImageOverlay rather than usesEditorialImage?

Comment thread src/content/classifyImages.ts Outdated
}
}

if (usesImageOverlay && !editorialImageUrl) {
Contributor

KISS - why is the editorialImageUrl check required here? It is initialized in this function as null without modifications, right?

Comment thread src/content/detectEditorialImage.ts Outdated
import { createEditorialDetectionAgent } from "../agents/createEditorialDetectionAgent";

const EDITORIAL_EXAMPLE_URLS = [
  "https://dxfamqbi5zyezrs5.public.blob.vercel-storage.com/content-templates/artist-release-editorial/references/images/ref-01.png",
Contributor

This link is invalid.

sweetmantech and others added 2 commits April 13, 2026 08:28
- Extract IMAGE_DETECTION_MODEL constant shared by face + editorial agents (DRY)
- Fix invalid editorial reference URL (was 404) — use the valid blob URL
- Rename classify-layer param usesImageOverlay → usesEditorialImage; leave
  the template-level field as usesImageOverlay (it still means "overlays")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…orial detectors

Consolidates the two-shot image classification scaffold (example image +
target image + structured boolean output + error fallback) into a single
helper. detectFace and detectEditorialImage become thin configuration
wrappers. Also removes URL truncation in logs — full URLs make the logs
verifiable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
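
The consolidation described in this commit message might look roughly like the sketch below. The `FewShotConfig` and `Generate` shapes are assumptions, the model call is injected so the sketch runs standalone, and the example URLs are placeholders rather than the ones used in the repository.

```typescript
// Hypothetical shapes; the real helper talks to an AI agent.
interface FewShotConfig {
  exampleUrl: string; // one known-positive example image
  question: string; // what the model is asked about the target image
}

type Generate = (input: FewShotConfig & { targetUrl: string }) => Promise<boolean>;

async function runImageFewShotClassification(
  config: FewShotConfig,
  targetUrl: string,
  generate: Generate,
): Promise<boolean> {
  try {
    return await generate({ ...config, targetUrl });
  } catch {
    // Error fallback: a failed call is treated as "not a match".
    return false;
  }
}

// Placeholder example URLs, not the ones used in the repository.
const FACE_CONFIG: FewShotConfig = {
  exampleUrl: "https://example.com/face-example.png",
  question: "Does the target image show a clearly visible face?",
};

const EDITORIAL_CONFIG: FewShotConfig = {
  exampleUrl: "https://example.com/editorial-example.png",
  question: "Is the target image a professional editorial press photo?",
};

// The two detectors reduce to thin configuration wrappers.
const detectFace = (url: string, generate: Generate) =>
  runImageFewShotClassification(FACE_CONFIG, url, generate);

const detectEditorialImage = (url: string, generate: Generate) =>
  runImageFewShotClassification(EDITORIAL_CONFIG, url, generate);
```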
Comment thread src/content/classifyImages.ts Outdated
Comment on lines +45 to +51
if (usesEditorialImage && !editorialImageUrl) {
  const isEditorial = await detectEditorialImage(imageUrl);
  if (isEditorial) {
    editorialImageUrl = uploadedUrl;
    continue;
  }
}
Contributor

KISS principle: have we considered whether two checks on the same image are necessary? Alternatively, how could the existing src/content/classifyImages.ts + agent file definitions change to handle both cases in a single request?

Replace the per-kind binary detectors (detectFace + detectEditorialImage)
with a single classifyImage that returns one of {face_guide, editorial,
additional} in one Gemini call. Cuts API calls per image roughly in half
and makes adding new image categories a 2-line change (enum variant +
few-shot example) instead of a new agent + detection function + pipeline
branch.

- Remove: detectFace, detectEditorialImage, runImageFewShotClassification,
  createFaceDetectionAgent, createEditorialDetectionAgent (and their tests)
- Add: createImageClassificationAgent (z.enum schema), classifyImage
  (single few-shot call with one example per positive kind)
- classifyImages dispatches on the returned kind; skips classification
  entirely when neither flag is set

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
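
A rough sketch of the single-classifier dispatch described above. The classifier is injected as a hypothetical `classifyImage` parameter in place of the real Gemini call, the function name is hypothetical, and two behaviours are assumptions rather than confirmed details: the first match of each kind wins, and all images are treated as additional when neither flag is set.

```typescript
type ImageKind = "face_guide" | "editorial" | "additional";

interface ClassifiedImages {
  faceGuideUrl: string | null;
  editorialImageUrl: string | null;
  additionalImageUrls: string[];
}

interface ClassifyImagesInput {
  images: string[];
  usesFaceGuide: boolean;
  usesEditorialImage: boolean;
  // Hypothetical injected classifier standing in for the single model call.
  classifyImage: (imageUrl: string) => Promise<ImageKind>;
}

async function classifyImagesByKind(input: ClassifyImagesInput): Promise<ClassifiedImages> {
  const result: ClassifiedImages = {
    faceGuideUrl: null,
    editorialImageUrl: null,
    additionalImageUrls: [],
  };

  // Neither flag set: skip classification entirely (assumed: keep all as additional).
  if (!input.usesFaceGuide && !input.usesEditorialImage) {
    result.additionalImageUrls.push(...input.images);
    return result;
  }

  for (const imageUrl of input.images) {
    // One call per image returns exactly one kind.
    const kind = await input.classifyImage(imageUrl);

    if (kind === "face_guide" && input.usesFaceGuide && result.faceGuideUrl === null) {
      result.faceGuideUrl = imageUrl;
    } else if (
      kind === "editorial" &&
      input.usesEditorialImage &&
      result.editorialImageUrl === null
    ) {
      result.editorialImageUrl = imageUrl;
    } else {
      result.additionalImageUrls.push(imageUrl);
    }
  }

  return result;
}
```

Because each image receives exactly one kind, a photo can no longer fill both the face-guide and editorial slots, which is the trade-off for halving the number of model calls.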
Comment thread src/tasks/createContentTask.ts Outdated
Comment on lines +92 to +118
// --- Step 5: Generate image (API) — skip if editorial image attached ---
let imageUrl: string;

if (editorialImageUrl) {
  logStep("Using attached editorial image, skipping AI image generation", true, {
    editorialImageUrl: editorialImageUrl.slice(0, 80),
  });
  imageUrl = editorialImageUrl;
} else {
  logStep("Generating image via API");
  const referenceImagePath = pickRandomReferenceImage(template);
  const instruction = resolveImageInstruction(template);
  const basePrompt = `${instruction} ${template.imagePrompt}`;
  const fullPrompt = buildImagePrompt(basePrompt, template.styleGuide);

  const imageRefs: string[] = [];
  if (faceGuideUrl) imageRefs.push(faceGuideUrl);
  if (referenceImagePath) imageRefs.push(referenceImagePath);
  if (!template.usesImageOverlay && additionalImageUrls.length) {
    imageRefs.push(...additionalImageUrls);
  }

  imageUrl = await generateImage({
    prompt: fullPrompt,
    referenceImageUrl: faceGuideUrl ?? undefined,
    images: imageRefs.length > 0 ? imageRefs : undefined,
  });
Contributor

OCP - how can we minimize the additions to src/tasks/createContentTask.ts? If new logic is needed, abstract it into a new function file following TDD.

Step 5-6 of the pipeline (use editorial image OR generate + optional
upscale) is now a single function call in the orchestrator. New image
routing logic can live in resolveBaseImage without bloating the task file.
Red-green TDD with a dedicated test file covering the editorial bypass,
generation path, upscale toggle, and overlay-aware imageRefs assembly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
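
As a sketch of the extraction described in this commit message: the input shape, the `upscale` flag, and the injected `generateImage` / `upscaleImage` dependencies below are assumptions made so the example is self-contained, and the function name is hypothetical. The imageRefs assembly follows the snippet quoted in the review comment above.

```typescript
// Hypothetical injected dependencies standing in for the real API calls.
interface BaseImageDeps {
  generateImage: (imageRefs: string[]) => Promise<string>;
  upscaleImage: (imageUrl: string) => Promise<string>;
}

interface BaseImageInput {
  editorialImageUrl: string | null;
  faceGuideUrl: string | null;
  referenceImagePath: string | null;
  additionalImageUrls: string[];
  usesImageOverlay: boolean;
  upscale: boolean; // assumed toggle name
}

// Overlay-aware assembly of reference images for generation.
function buildImageRefs(input: BaseImageInput): string[] {
  const imageRefs: string[] = [];
  if (input.faceGuideUrl) imageRefs.push(input.faceGuideUrl);
  if (input.referenceImagePath) imageRefs.push(input.referenceImagePath);
  if (!input.usesImageOverlay) imageRefs.push(...input.additionalImageUrls);
  return imageRefs;
}

async function resolveBaseImageSketch(
  input: BaseImageInput,
  deps: BaseImageDeps,
): Promise<string> {
  // Editorial bypass: use the attached photo, skip generation and upscale.
  if (input.editorialImageUrl) return input.editorialImageUrl;

  const generated = await deps.generateImage(buildImageRefs(input));
  return input.upscale ? deps.upscaleImage(generated) : generated;
}
```

The orchestrator then needs only one call for steps 5 and 6, and the four behaviours listed in the commit message (editorial bypass, generation path, upscale toggle, imageRefs assembly) can each be tested against this function in isolation.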
@sweetmantech sweetmantech merged commit a0523c5 into main Apr 13, 2026
2 checks passed
@sweetmantech sweetmantech deleted the feature/editorial-image-detection branch April 13, 2026 14:03