Add sd lora data container and preprocessing funcs by xiaoyu-work · Pull Request #2281 · microsoft/Olive

xiaoyu-work · 2025-12-09T21:59:36Z

This pull request introduces a new sd_lora data component to the Olive data pipeline, adding advanced image preprocessing and auto-captioning capabilities for Stable Diffusion LoRA training. The main changes include registering the new component, implementing aspect ratio bucketing for efficient batching, and supporting automatic image captioning using BLIP-2 and Florence-2 models.

New SD LoRA Data Component:

Registered the new sd_lora module in olive.data.component.__init__.py, making it available for import and use in the Olive data pipeline.
Added a copyright header to the sd_lora package to ensure proper licensing.

Image Preprocessing Enhancements:

Implemented aspect_ratio_bucketing in sd_lora/aspect_ratio_bucketing.py, which automatically assigns images to buckets based on aspect ratio and resolution, supporting resizing and cropping for efficient Stable Diffusion training. Includes utilities for bucket generation, image resizing, and crop coordinate calculation.
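To make the bucketing idea concrete, here is a minimal sketch of how bucket generation, bucket assignment, and crop coordinate calculation can fit together. It is not the code from sd_lora/aspect_ratio_bucketing.py. The function names, the 64-pixel grid, and the area budget of base_resolution squared are all assumptions chosen for illustration.

```python
import math


def generate_buckets(base_resolution=512, min_dim=256, max_dim=1024, step=64):
    """Enumerate (width, height) buckets on a `step` grid.

    Assumption: every bucket keeps width * height within base_resolution ** 2,
    so batches from any bucket cost roughly the same memory.
    """
    max_area = base_resolution * base_resolution
    buckets = set()
    for width in range(min_dim, max_dim + 1, step):
        # Largest grid-aligned height that stays inside the area budget.
        height = min(max_dim, (max_area // width) // step * step)
        if height >= min_dim:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored (portrait/landscape) bucket
    return sorted(buckets)


def assign_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's, in log space."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


def resize_and_crop_box(width, height, bucket):
    """Return the resized size and a centered crop box (left, top, right, bottom).

    The image is scaled so it fully covers the bucket, then the overflow
    along one axis is cropped symmetrically.
    """
    bucket_w, bucket_h = bucket
    scale = max(bucket_w / width, bucket_h / height)
    new_w = max(bucket_w, round(width * scale))
    new_h = max(bucket_h, round(height * scale))
    left = (new_w - bucket_w) // 2
    top = (new_h - bucket_h) // 2
    return (new_w, new_h), (left, top, left + bucket_w, top + bucket_h)


if __name__ == "__main__":
    buckets = generate_buckets()
    bucket = assign_bucket(1920, 1080, buckets)
    print(bucket, resize_and_crop_box(1920, 1080, bucket))
```

Images that land in the same bucket share one tensor shape after resizing and cropping, which is what lets a dataloader batch them without padding.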

Auto-Captioning Support:

Added auto_caption, blip2_caption, and florence2_caption preprocessing functions in sd_lora/auto_caption.py, enabling automatic image captioning using BLIP-2 and Florence-2 models. Supports batch processing, device selection, overwrite logic, and flexible caption storage (file or in-memory).
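The batch processing, overwrite logic, and file-or-memory storage described above could look roughly like the sketch below. It is not the implementation in sd_lora/auto_caption.py. The function name `caption_images`, its parameters, and the convention of storing captions in a sidecar .txt file next to each image are assumptions. The captioning model is abstracted as a plain callable so that a BLIP-2 or Florence-2 wrapper could be plugged in.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def caption_images(
    image_paths: Iterable[str],
    captioner: Callable[[List[Path]], List[str]],
    batch_size: int = 4,
    overwrite: bool = False,
    write_files: bool = True,
) -> Dict[str, str]:
    """Caption images in batches and return {image path: caption}.

    Hypothetical helper: `captioner` takes a batch of paths and returns one
    caption per path. Existing sidecar captions are reused unless
    `overwrite` is set.
    """
    captions: Dict[str, str] = {}
    pending: List[Path] = []

    for path in map(Path, image_paths):
        sidecar = path.with_suffix(".txt")  # assumed storage convention
        if sidecar.exists() and not overwrite:
            captions[str(path)] = sidecar.read_text(encoding="utf-8").strip()
        else:
            pending.append(path)

    # Run the model only on images that still need a caption.
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for path, text in zip(batch, captioner(batch)):
            captions[str(path)] = text
            if write_files:
                path.with_suffix(".txt").write_text(text, encoding="utf-8")

    return captions
```

With `write_files=False` the captions stay in the returned dictionary only, which corresponds to the in-memory storage mode.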

These additions extend Olive's data preparation workflow for image generation and LoRA fine-tuning tasks.

Checklist before requesting a review

Add unit tests for this change.
Make sure all tests can pass.
Update documents if necessary.
Lint and apply fixes to your code by running lintrunner -a
Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

github-advanced-security

lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

olive/data/component/sd_lora/image_filtering.py

test/data_container/sd_lora/test_image_resizing.py

test/data_container/sd_lora/test_auto_caption.py

olive/data/component/sd_lora/image_resizing.py

olive/data/component/sd_lora/image_filtering.py

olive/data/component/sd_lora/dataloader.py

olive/data/component/sd_lora/image_filtering.py

The base branch was changed.

test/data_container/sd_lora/test_image_resizing.py

github-advanced-security bot found potential problems Dec 9, 2025

olive/data/component/sd_lora/image_filtering.py (fixed)

shaahji previously approved these changes Dec 15, 2025

Base automatically changed from xiaoyu/diffusers to main December 18, 2025 19:18

xiaoyu-work added 5 commits December 18, 2025 20:16

Add sd lora data container and preprocessing funcs

64c1b46

Fix lint

8cbbc5d

fix format

bf3b9bb

fix format

0d4ce7e

fix format

7ef866a

xiaoyu-work force-pushed the xiaoyu/sd_data branch from ce47f87 to 7ef866a Compare December 18, 2025 20:17

github-advanced-security bot found potential problems Dec 18, 2025

test/data_container/sd_lora/test_image_resizing.py (fixed)

test/data_container/sd_lora/test_image_resizing.py (fixed)

xiaoyu-work and others added 3 commits December 18, 2025 13:16

Potential fix for code scanning alert no. 19201: An assert statement …

0642d9b

…has a side-effect Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Potential fix for code scanning alert no. 19202: An assert statement …

0196843

…has a side-effect Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Fix test

60cc6d4

shaahji approved these changes Dec 19, 2025

xiaoyu-work merged commit 1057d75 into main Dec 19, 2025
11 checks passed

xiaoyu-work deleted the xiaoyu/sd_data branch December 19, 2025 18:49

xiaoyu-work merged 8 commits into main from xiaoyu/sd_data
