Add sd lora data container and preprocessing funcs #2281

Merged
xiaoyu-work merged 8 commits into main from xiaoyu/sd_data on Dec 19, 2025

@xiaoyu-work
Collaborator

This pull request introduces a new sd_lora data component to the Olive data pipeline, adding advanced image preprocessing and auto-captioning capabilities for Stable Diffusion LoRA training. The main changes include registering the new component, implementing aspect ratio bucketing for efficient batching, and supporting automatic image captioning using BLIP-2 and Florence-2 models.

New SD LoRA Data Component:

  • Registered the new sd_lora module in olive/data/component/__init__.py, making it available for import and use in the Olive data pipeline.
  • Added a copyright header to the sd_lora package to ensure proper licensing.

Image Preprocessing Enhancements:

  • Implemented aspect_ratio_bucketing in sd_lora/aspect_ratio_bucketing.py, which automatically assigns images to buckets based on aspect ratio and resolution, supporting resizing and cropping for efficient Stable Diffusion training. Includes utilities for bucket generation, image resizing, and crop coordinate calculation.
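
For reviewers unfamiliar with the technique, here is a minimal sketch of aspect ratio bucketing in general. It is not the code in sd_lora/aspect_ratio_bucketing.py: the function names (generate_buckets, assign_bucket, resize_and_crop_box), the 512 base resolution, the 64-pixel step, and the center-crop policy are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Bucket = Tuple[int, int]  # (width, height)


def generate_buckets(base_resolution: int = 512, step: int = 64,
                     min_dim: int = 256, max_dim: int = 1024) -> List[Bucket]:
    """Enumerate buckets whose pixel area stays at or below base_resolution**2.

    Hypothetical helper: the defaults are illustrative, not the PR's values.
    """
    max_area = base_resolution * base_resolution
    buckets = set()
    for width in range(min_dim, max_dim + 1, step):
        # Largest height (a multiple of step) that keeps the area within budget.
        height = min(max_dim, (max_area // width) // step * step)
        if height >= min_dim:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored orientation
    return sorted(buckets)


def assign_bucket(width: int, height: int, buckets: List[Bucket]) -> Bucket:
    """Pick the bucket whose aspect ratio is closest in log space."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


def resize_and_crop_box(width: int, height: int, bucket: Bucket):
    """Return the resize target and a center-crop box that yields the bucket size."""
    bucket_w, bucket_h = bucket
    # Scale so the resized image fully covers the bucket, then crop the excess.
    scale = max(bucket_w / width, bucket_h / height)
    new_w = max(bucket_w, round(width * scale))
    new_h = max(bucket_h, round(height * scale))
    left = (new_w - bucket_w) // 2
    top = (new_h - bucket_h) // 2
    return (new_w, new_h), (left, top, left + bucket_w, top + bucket_h)


buckets = generate_buckets()
bucket = assign_bucket(1920, 1080, buckets)
new_size, crop_box = resize_and_crop_box(1920, 1080, bucket)
```

Comparing aspect ratios in log space treats "twice as wide" and "twice as tall" symmetrically, which is why the sketch uses it. Whether the merged code does the same is not stated in this description.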

Auto-Captioning Support:

  • Added auto_caption, blip2_caption, and florence2_caption preprocessing functions in sd_lora/auto_caption.py, enabling automatic image captioning using BLIP-2 and Florence-2 models. Supports batch processing, device selection, overwrite logic, and flexible caption storage (file or in-memory).
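
The batching, overwrite, and storage behavior described above could look roughly like the sketch below. It is a hedged illustration rather than the code in sd_lora/auto_caption.py: caption_images, its parameters, and the .txt sidecar convention are assumptions, and fake_captioner stands in for a real BLIP-2 or Florence-2 call so the example runs without downloading a model.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def caption_images(image_paths: Iterable[Path],
                   caption_fn: Callable[[List[Path]], List[str]],
                   batch_size: int = 4,
                   overwrite: bool = False,
                   write_files: bool = True) -> Dict[str, str]:
    """Caption images in batches and return {image path: caption}.

    Hypothetical helper. With write_files=True, captions are stored in a
    .txt file next to each image, and existing files are reused unless
    overwrite=True. With write_files=False, captions stay in memory only.
    """
    captions: Dict[str, str] = {}
    pending: List[Path] = []
    for path in map(Path, image_paths):
        sidecar = path.with_suffix(".txt")
        if write_files and sidecar.exists() and not overwrite:
            # Keep the existing caption instead of regenerating it.
            captions[str(path)] = sidecar.read_text(encoding="utf-8").strip()
        else:
            pending.append(path)

    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for path, text in zip(batch, caption_fn(batch)):
            captions[str(path)] = text
            if write_files:
                path.with_suffix(".txt").write_text(text, encoding="utf-8")
    return captions


def fake_captioner(batch: List[Path]) -> List[str]:
    # Stand-in for a captioning model; returns one caption per image.
    return [f"a photo of {p.stem}" for p in batch]


with tempfile.TemporaryDirectory() as tmp:
    images = [Path(tmp) / "cat.png", Path(tmp) / "dog.png"]
    (Path(tmp) / "cat.txt").write_text("a hand-written caption", encoding="utf-8")
    result = caption_images(images, fake_captioner)
    kept = result[str(images[0])]       # existing sidecar is preserved
    generated = result[str(images[1])]  # no sidecar, so a caption is generated
    in_memory = caption_images(images, fake_captioner, write_files=False)
```

Passing the captioner as a callable keeps the overwrite and storage logic independent of the model backend, so the same loop could serve both captioners under this assumed design.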

These additions significantly enhance Olive's data preparation workflow for image generation and LoRA fine-tuning tasks.

## Describe your changes

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

Contributor

@github-advanced-security github-advanced-security bot left a comment

lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

shaahji previously approved these changes Dec 15, 2025
Base automatically changed from xiaoyu/diffusers to main December 18, 2025 19:18
@xiaoyu-work xiaoyu-work dismissed shaahji’s stale review December 18, 2025 19:18

The base branch was changed.

xiaoyu-work and others added 3 commits December 18, 2025 13:16
…has a side-effect

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…has a side-effect

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@xiaoyu-work xiaoyu-work merged commit 1057d75 into main Dec 19, 2025
11 checks passed
@xiaoyu-work xiaoyu-work deleted the xiaoyu/sd_data branch December 19, 2025 18:49