Large-scale refactoring has been merged into `main` — migration guide inside #949

kohya-ss (Maintainer) · 2026-05-13T23:54:37Z

Update (2026-05-22): The refactoring has been merged into main via #950. See the migration guide at the bottom of this post. If anything broke for you after updating, please reply here.

更新 (2026-05-22): リファクタリングは #950 により main にマージされました。本投稿末尾の移行ガイドをご確認ください。更新後に動作がおかしい点があれば、このスレッドで返信をお願いします。

Thank you for using and contributing to this repository.

A large-scale internal refactoring is currently in progress and is nearing completion. We plan to merge it into main soon.

The user-facing interface of each script (script names, command-line options, dataset configuration, etc.) is essentially unchanged, so you can continue using the scripts as before.

On the implementation side, the architecture-specific classes are largely preserved, but the internal APIs (base classes, utility modules, etc.) have changed significantly. Maintainers of forks and dependent tools are kindly asked to review their code after the merge.

After the merge, a migration guide outlining the main changes and old-to-new mappings will be added to this discussion.

Since the refactoring is still in progress, we would appreciate it if pull requests that touch the common parts could be submitted after the refactoring has been merged.

There may be some temporary inconvenience, but we appreciate your understanding, as this work aims to improve long-term maintainability. Thank you.

日本語

いつも当リポジトリをご利用いただき、また多数の貢献をいただき、ありがとうございます。
現在、大規模な内部リファクタリングを進めており、間もなく完了する見込みです。近日中にmainへのマージを予定しています。

各スクリプト利用時のインターフェース（スクリプト名、コマンドラインオプション、データセット設定など）には基本的には変更ありませんので、そのままお使いいただけます。

実装面では、各アーキテクチャのクラスは基本的に維持されていますが、内部のAPI（基底クラスやユーティリティモジュールなど）は大きく変更されています。そのため、forkされたリポジトリや、依存ツールのメンテナの方々におかれましては、マージ後に確認をお願いいたします。
マージ後に、主要な変更点や新旧の対応を示した移行ガイドをこのdiscussionに追加する予定です。

pull requestにつきまして、リファクタリングが進行中のため、共通部分に関するpull requestはリファクタリングのマージ後にご提出いただければ幸いです。

一時的にご不便をおかけする場面もあるかもしれませんが、将来的な保守性を向上させるための取り組みとしてご理解いただければ幸いです。よろしくお願いいたします。

Migration Guide (post-merge)

The refactor described above has now landed on main via #950. This guide summarises the surface that changed for fork maintainers and authors of dependent tools.

1. End users — no changes required

Script names, command-line options, and dataset configuration (TOML) are unchanged. Existing training/inference/cache commands continue to work as before.

If you notice any behavioural change or breakage after updating, please report it in this discussion.

2. Fork / extension maintainers

Backward-compatible re-exports are in place, so most existing imports keep working. However, the canonical locations have moved, and a few APIs have shifted shape. New code should target the new paths.

2.1 New module layout

src/musubi_tuner/training/
├── trainer_base.py        # NetworkTrainer, DiTOutput, SS_METADATA_*
├── accelerator_setup.py   # clean_memory_on_device, prepare_accelerator, collator_class
├── sampling_prompts.py    # line_to_prompt_dict, load_prompts, should_sample_images
├── timesteps.py           # compute_density_for_timestep_sampling, get_sigmas, compute_loss_weighting_for_sd3
└── parser_common.py       # setup_parser_common, read_config_from_file

src/musubi_tuner/dataset/
├── architectures.py       # ARCHITECTURE_* constants
├── bucket.py              # BucketSelector, BucketBatchManager
├── cache_io.py            # save_latent_cache_*, save_text_encoder_output_cache_*
├── datasources.py         # ContentDatasource and subclasses
├── media_utils.py         # image/video glob, load, resize helpers
└── image_video_dataset.py # public dataset class (now a thin orchestrator)

2.2 Symbol relocation table

Symbol	Old import path	New canonical path
`NetworkTrainer`, `DiTOutput`, `SS_METADATA_KEY_*`, `SS_METADATA_MINIMUM_KEYS`	`musubi_tuner.hv_train_network`	`musubi_tuner.training.trainer_base`
`clean_memory_on_device`, `prepare_accelerator`, `collator_class`	`musubi_tuner.hv_train_network`	`musubi_tuner.training.accelerator_setup`
`line_to_prompt_dict`, `load_prompts`, `should_sample_images`	`musubi_tuner.hv_train_network`	`musubi_tuner.training.sampling_prompts`
`compute_density_for_timestep_sampling`, `get_sigmas`, `compute_loss_weighting_for_sd3`	`musubi_tuner.hv_train_network`	`musubi_tuner.training.timesteps`
`setup_parser_common`, `read_config_from_file`	`musubi_tuner.hv_train_network`	`musubi_tuner.training.parser_common`
`BucketSelector`, `BucketBatchManager`	`musubi_tuner.dataset.image_video_dataset`	`musubi_tuner.dataset.bucket`
`ContentDatasource`, `ImageDatasource`, `VideoDatasource`, `DirectoryDatasource`, `JsonlDatasource`	`musubi_tuner.dataset.image_video_dataset`	`musubi_tuner.dataset.datasources`
`save_latent_cache_*`, `save_text_encoder_output_cache_*`	`musubi_tuner.dataset.image_video_dataset`	`musubi_tuner.dataset.cache_io`
`resize_image_to_bucket`, image/video I/O helpers	`musubi_tuner.dataset.image_video_dataset`	`musubi_tuner.dataset.media_utils`
`ARCHITECTURE_*` constants	`musubi_tuner.dataset.image_video_dataset`	`musubi_tuner.dataset.architectures`

Old paths re-export the symbols above, so existing code does not need to change immediately.
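For forks that must run against checkouts from both before and after the merge, one option is to try the new canonical path first and fall back to the old one. The sketch below is only an illustration: `import_symbol` is a hypothetical helper that does not exist in the repository, and the demo call uses standard-library modules so that it runs without musubi_tuner installed.

```python
import importlib


def import_symbol(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it.

    Hypothetical helper for illustration; it is not part of musubi_tuner.
    """
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module missing in this checkout, try the next candidate
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name!r} not found in any of {module_paths!r}")


# Stand-in demo with stdlib modules: the first path does not exist, the second does.
OrderedDict = import_symbol("OrderedDict", ["no_such_package_for_demo", "collections"])

# In a fork, the same call would list the new path first and the old path second, e.g.
# import_symbol("BucketSelector",
#               ["musubi_tuner.dataset.bucket", "musubi_tuner.dataset.image_video_dataset"])
```

Because the old paths still re-export the symbols, a plain import from the old path also keeps working; the fallback only matters if you want new code to prefer the canonical location.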

2.3 `NetworkTrainer` is now architecture-agnostic

HunyuanVideo-specific defaults (load_vae, load_transformer, call_dit, process_sample_prompts, do_inference, …) used to live as concrete methods on NetworkTrainer. They now live on HunyuanVideoNetworkTrainer in hv_train_network.py, and the base class declares them as abstract hooks that raise NotImplementedError.

If your fork inherited HV behaviour by default (rather than overriding it), subclass HunyuanVideoNetworkTrainer instead of NetworkTrainer.
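The effect of this change can be sketched with stand-in classes. Nothing below is the real `NetworkTrainer` or `HunyuanVideoNetworkTrainer`; the class names, the `run` method, and the returned placeholder string are assumptions made only to show why a fork that relied on inherited defaults has to switch its parent class.

```python
class BaseTrainerSketch:
    """Stand-in for an architecture-agnostic base class (not the real NetworkTrainer)."""

    def load_transformer(self):
        # Abstract hook: the base class no longer ships a HunyuanVideo default.
        raise NotImplementedError("subclass must implement load_transformer")

    def run(self):
        return self.load_transformer()


class HunyuanTrainerSketch(BaseTrainerSketch):
    """Stand-in for an architecture-specific trainer that supplies the concrete method."""

    def load_transformer(self):
        return "hunyuan-video-transformer"  # placeholder value for the demo


class ForkOnBase(BaseTrainerSketch):
    """A fork that relied on inherited defaults and still subclasses the base."""


class ForkOnHunyuan(HunyuanTrainerSketch):
    """The same fork after switching its parent class."""


def try_run(trainer):
    """Run the trainer and report a NotImplementedError as a string instead of raising."""
    try:
        return trainer.run()
    except NotImplementedError as exc:
        return f"NotImplementedError: {exc}"
```

In this sketch `try_run(ForkOnBase())` reports the missing hook, while `try_run(ForkOnHunyuan())` returns the placeholder value.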

2.4 `train()` split into phase methods

NetworkTrainer.train() (~700 lines) was decomposed into private phase methods:

_validate_args_and_init → _init_session → _build_dataset
→ _prepare_accelerator_and_dtypes → _prepare_sampling → _load_dit_and_swap
→ _build_network → _build_optimizer_and_dataloader → _prepare_with_accelerator
→ _register_hooks_and_resume → _run_training_loop

Forks that overrode train() wholesale should migrate to the extension hooks below, or override a specific phase method.
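As a rough illustration of overriding a single phase instead of all of `train()`, the sketch below models three of the phases on a stand-in class. The real phase methods take arguments and do real work; the zero-argument signatures, the `calls` list, and the class names here are assumptions for the demo.

```python
class PhasedTrainerSketch:
    """Stand-in trainer; only three phases are modelled, without arguments."""

    def __init__(self):
        self.calls = []  # records the order in which phases ran

    def train(self):
        self._build_dataset()
        self._build_network()
        self._run_training_loop()

    def _build_dataset(self):
        self.calls.append("build_dataset")

    def _build_network(self):
        self.calls.append("build_network")

    def _run_training_loop(self):
        self.calls.append("run_training_loop")


class CustomNetworkTrainer(PhasedTrainerSketch):
    """A fork that customises one phase and leaves train() untouched."""

    def _build_network(self):
        super()._build_network()
        self.calls.append("attach_custom_adapter")  # fork-specific step


trainer = CustomNetworkTrainer()
trainer.train()
```

The fork-specific step lands between `build_network` and `run_training_loop` without the subclass having to copy the rest of the training flow.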

2.5 Breaking API change: `call_dit` return type

# Before
def call_dit(self, ...) -> tuple[Tensor, Tensor]:
    return pred, target

# After
def call_dit(self, ..., **kwargs) -> DiTOutput:
    return DiTOutput(pred=..., target=..., extra={...})

DiTOutput is a dataclass with pred, target, and an optional extra: dict escape hatch for side outputs (e.g. hidden features). It is exported from musubi_tuner.training.trainer_base (and re-exported from musubi_tuner.hv_train_network).

**kwargs was added so future per-arch conditioning (e.g. per-token timesteps) can flow through without further signature churn.

Any fork that overrides call_dit must update its return statement and signature. All in-tree per-architecture trainers have been updated.
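One way to migrate an existing override is to keep the old tuple-returning logic and wrap its result. The sketch below uses a stand-in dataclass with the three fields named above; it is not the real `DiTOutput` (whose exact field types and defaults are not shown in this guide), plain lists stand in for tensors, and `per_token_timesteps` is a hypothetical keyword used only to show that extra kwargs are accepted.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DiTOutputSketch:
    """Stand-in with the three fields named in this guide; not the real DiTOutput."""
    pred: Any
    target: Any
    extra: dict = field(default_factory=dict)  # assumed default for the demo


def legacy_call_dit(latents):
    # Old-style override: returns a bare (pred, target) tuple.
    return [x * 0.5 for x in latents], latents


def call_dit(latents, **kwargs):
    # New-style wrapper: accepts **kwargs and wraps the legacy result.
    pred, target = legacy_call_dit(latents)
    return DiTOutputSketch(pred=pred, target=target)


# The extra keyword is accepted and ignored by this wrapper.
out = call_dit([2.0, 4.0], per_token_timesteps=None)
```

Callers then read `out.pred` and `out.target` by name instead of unpacking a tuple.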

2.6 New extension hooks (all opt-in; defaults are no-ops)

These were introduced primarily for Self-Flow (PR #913) but are general-purpose. They are documented as internal-only: there are no API stability guarantees yet, so they may evolve based on use.

Lifecycle hooks:

Hook	When it fires	Typical use
`on_transformer_loaded(transformer)`	Right after `load_transformer`, before `eval()` and `accelerator.prepare`	Register `forward_hook` on raw blocks
`on_train_start(transformer, network)`	Top of `_run_training_loop`, post-`prepare`	Initialise EMA copies, schedulers, projection heads
`on_post_optimizer_step(transformer, sync_gradients)`	After `optimizer.step` / `lr_scheduler.step` / `zero_grad`	EMA update, per-step bookkeeping
`on_before_sample_images(transformer, network)` / `on_after_sample_images(...)`	Around every `sample_images` call site	Swap in EMA weights for sampling (try/finally guarantees `after` fires on exception)
`on_post_save(transformer, args, ..., force_sync_upload)`	End of `save_model`, after main checkpoint written	Save companion files (EMA, projection heads) with matching HF upload behaviour
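The try/finally guarantee for the sampling hooks can be sketched as follows. This is a stand-in, not the trainer's actual code: `sample_with_hooks` is a hypothetical wrapper name, a dict stands in for the model, and the `after` hook is assumed to take the same arguments as the `before` hook.

```python
class SamplingTrainerSketch:
    """Stand-in showing how before/after hooks can wrap a sampling call."""

    def on_before_sample_images(self, transformer, network):
        pass  # default no-op

    def on_after_sample_images(self, transformer, network):
        pass  # default no-op

    def sample_images(self, transformer, network):
        return "images"

    def sample_with_hooks(self, transformer, network):
        self.on_before_sample_images(transformer, network)
        try:
            return self.sample_images(transformer, network)
        finally:
            # Runs even if sample_images raises.
            self.on_after_sample_images(transformer, network)


class EmaSwapTrainer(SamplingTrainerSketch):
    """Swaps in 'ema' weights for sampling and restores 'train' weights afterwards."""

    def on_before_sample_images(self, transformer, network):
        transformer["active"] = "ema"

    def on_after_sample_images(self, transformer, network):
        transformer["active"] = "train"

    def sample_images(self, transformer, network):
        self.seen = transformer["active"]  # which weights were active while sampling
        raise RuntimeError("sampling failed")


transformer = {"active": "train"}  # a dict stands in for the model
trainer = EmaSwapTrainer()
try:
    trainer.sample_with_hooks(transformer, network=None)
except RuntimeError:
    pass  # the failure is expected in this demo
```

Even though sampling fails in this demo, the stand-in model ends up with its training weights active again.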

Computation hooks:

Hook	Role	Default
`process_batch(batch) → (loss, loss_metrics: dict[str, float])`	Owns timestep sampling, `call_dit`, loss computation for one step	Vanilla flow-matching path
`compute_loss(output: DiTOutput, ..., weighting) → loss`	Loss formulation only (separable from `process_batch` if data flow is standard)	Weighted MSE on `pred` vs `target`
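For orientation, a weighted MSE between `pred` and `target` can be written in a few lines. The sketch below is a plain-Python stand-in over flat lists of floats; the real hook operates on tensors, takes further arguments that are elided above, and may reduce differently, so treat the function name and the mean reduction as assumptions.

```python
def weighted_mse(pred, target, weighting=None):
    """Mean of weighting * (pred - target)**2 over flat lists of floats.

    Plain-Python stand-in; not the repository's compute_loss.
    """
    if weighting is None:
        weighting = [1.0] * len(pred)  # unweighted MSE
    terms = [w * (p - t) ** 2 for p, t, w in zip(pred, target, weighting)]
    return sum(terms) / len(terms)


# (1.0 * 1.0**2 + 0.5 * (-2.0)**2) / 2 = 1.5
loss = weighted_mse(pred=[1.0, 2.0], target=[0.0, 4.0], weighting=[1.0, 0.5])
```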

Contributor hooks (return values are merged into existing structures):

Hook	Merged into
`extra_trainable_params(args, transformer) → list[param-group]`	Optimizer param groups (alongside network params)
`extra_metadata(args) → dict`	Safetensors metadata at training-loop start
`extra_step_logs(args, logs) → dict`	Per-step `accelerator.log` payload
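A contributor hook returns a value that the trainer merges into something it already builds. The sketch below shows the idea for `extra_step_logs`; `merge_step_logs`, the `ema/decay` key, and the rule that hook entries win on a key collision are all assumptions for illustration, not the trainer's documented behaviour.

```python
def merge_step_logs(base_logs, extra_logs):
    """Return a new dict with hook-provided entries merged over the base logs."""
    merged = dict(base_logs)   # copy so the caller's dict is not mutated
    merged.update(extra_logs)  # assumed precedence: hook entries win on collision
    return merged


class EmaLoggingTrainer:
    """Stand-in trainer that contributes one extra per-step metric."""

    def extra_step_logs(self, args, logs):
        return {"ema/decay": 0.999}


base = {"loss": 0.25}
logs = merge_step_logs(base, EmaLoggingTrainer().extra_step_logs(args=None, logs=base))
```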

2.7 Reference example

src/musubi_tuner/flux_2_train_network_self_flow.py is a Self-Flow skeleton that exercises every hook above. The algorithmic body raises NotImplementedError, with TODO comments referencing PR #913, so the file is not yet runnable. It is nonetheless the recommended template for wiring an extension on top of the new seams.

2.8 Minor latent fixes also included

A small follow-up commit (7a22df8) fixes bugs surfaced by Copilot review:

_int_or_float now accepts TOML-style float literals like "1.0" / "10.0" for int-typed CLI args (previously fell through to a misleading error).
convert_weight_keys is now invoked with the imported network_module object, matching its type annotation (the previous string argument was ignored by all overrides, so no behaviour change).
The Self-Flow skeleton no longer raises AttributeError on the after_sample_images path when Self-Flow is off.
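To make the first fix concrete, the sketch below shows how an argparse `type` callable might accept a literal such as "10.0" for an integer option. It is not the repository's `_int_or_float`; the function name, the rejection of non-integral values, and the `--demo_steps` option are assumptions for the demo.

```python
import argparse


def int_from_literal(value: str) -> int:
    """Parse "10" or "10.0" as the int 10; reject non-integral values such as "1.5"."""
    try:
        return int(value)
    except ValueError:
        pass  # not a plain integer literal, try float next
    try:
        number = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {value!r}") from None
    if not number.is_integer():
        raise argparse.ArgumentTypeError(f"expected an integer value, got {value!r}")
    return int(number)


parser = argparse.ArgumentParser()
parser.add_argument("--demo_steps", type=int_from_literal)  # hypothetical option
args = parser.parse_args(["--demo_steps", "10.0"])
```

Raising `argparse.ArgumentTypeError` lets argparse print a message that names the offending value rather than a generic conversion error.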

日本語

移行ガイド（マージ後）

上述のリファクタリングは #950 を通じて main にマージされました。fork メンテナや依存ツール作者向けに、変更されたインターフェースを以下にまとめます。

1. 一般ユーザー — 対応不要

スクリプト名、コマンドラインオプション、データセット設定（TOML）は変更されていません。既存の training/inference/cache コマンドはそのままお使いいただけます。

万一動作が変わった、あるいは動かなくなった等あれば、この discussion でご報告いただけると助かります。

2. fork / 拡張のメンテナ向け

後方互換用の re-export が用意されているため、既存の import の多くはそのまま動作します。ただし、正規の位置は移動しており、いくつかの API はシグネチャが変わっています。新規コードは新パスを推奨します。

2.1 新しいモジュール構成

上の英語セクション 2.1 のツリー図を参照してください。

2.2 シンボル対応表

シンボル	旧パス	新パス（正規）
`NetworkTrainer`, `DiTOutput`, `SS_METADATA_KEY_*`, `SS_METADATA_MINIMUM_KEYS`	`musubi_tuner.hv_train_network`	`musubi_tuner.training.trainer_base`
`clean_memory_on_device`, `prepare_accelerator`, `collator_class`	`musubi_tuner.hv_train_network`	`musubi_tuner.training.accelerator_setup`
`line_to_prompt_dict`, `load_prompts`, `should_sample_images`	`musubi_tuner.hv_train_network`	`musubi_tuner.training.sampling_prompts`
`compute_density_for_timestep_sampling`, `get_sigmas`, `compute_loss_weighting_for_sd3`	`musubi_tuner.hv_train_network`	`musubi_tuner.training.timesteps`
`setup_parser_common`, `read_config_from_file`	`musubi_tuner.hv_train_network`	`musubi_tuner.training.parser_common`
`BucketSelector`, `BucketBatchManager`	`musubi_tuner.dataset.image_video_dataset`	`musubi_tuner.dataset.bucket`
`ContentDatasource`, `ImageDatasource`, `VideoDatasource`, `DirectoryDatasource`, `JsonlDatasource`	`musubi_tuner.dataset.image_video_dataset`	`musubi_tuner.dataset.datasources`
`save_latent_cache_*`, `save_text_encoder_output_cache_*`	`musubi_tuner.dataset.image_video_dataset`	`musubi_tuner.dataset.cache_io`
`resize_image_to_bucket`、画像/動画 I/O ヘルパー	`musubi_tuner.dataset.image_video_dataset`	`musubi_tuner.dataset.media_utils`
`ARCHITECTURE_*` 定数	`musubi_tuner.dataset.image_video_dataset`	`musubi_tuner.dataset.architectures`

旧パスは引き続き re-export されているので、既存コードを直ちに書き換える必要はありません。

2.3 `NetworkTrainer` がアーキテクチャ非依存に

これまで NetworkTrainer 基底クラスに HunyuanVideo 固有のデフォルト実装（load_vae, load_transformer, call_dit, process_sample_prompts, do_inference 等）が混在していました。これらは hv_train_network.py の HunyuanVideoNetworkTrainer に移動し、基底クラスでは abstract hook（NotImplementedError）となっています。

HV のデフォルト挙動を継承していた fork は、NetworkTrainer ではなく HunyuanVideoNetworkTrainer を継承してください。

2.4 `train()` のフェーズメソッド分割

NetworkTrainer.train()（約700行）は以下のプライベートフェーズメソッドに分解されました：

_validate_args_and_init → _init_session → _build_dataset
→ _prepare_accelerator_and_dtypes → _prepare_sampling → _load_dit_and_swap
→ _build_network → _build_optimizer_and_dataloader → _prepare_with_accelerator
→ _register_hooks_and_resume → _run_training_loop

train() 全体をオーバーライドしていた fork は、後述の拡張フック、または個別のフェーズメソッドのオーバーライドに移行してください。

2.5 互換性のない API 変更: `call_dit` の戻り値

# 旧
def call_dit(self, ...) -> tuple[Tensor, Tensor]:
    return pred, target

# 新
def call_dit(self, ..., **kwargs) -> DiTOutput:
    return DiTOutput(pred=..., target=..., extra={...})

DiTOutput は pred, target, およびオプションの extra: dict（hidden feature 等の副次出力用エスケープハッチ）を持つ dataclass です。musubi_tuner.training.trainer_base からエクスポートされ、musubi_tuner.hv_train_network 経由でも import 可能です。

**kwargs は将来のアーキテクチャ固有の追加条件（例: per-token timestep）を後方互換に渡せるよう追加しました。

call_dit をオーバーライドしている fork は、return 文とシグネチャの更新が必須です。 in-tree のアーキテクチャ別 trainer はすべて更新済みです。

2.6 新しい拡張フック（すべて opt-in、デフォルト no-op）

主に Self-Flow（PR #913）のために導入しましたが、汎用的に使えます。あくまで内部利用を想定した API の位置づけで安定性は保証されませんので、実利用フィードバックに応じて変わる可能性があります。

ライフサイクル系：

フック	発火タイミング	典型用途
`on_transformer_loaded(transformer)`	`load_transformer` 直後、`eval()` および `accelerator.prepare` 前	生のブロックへの `forward_hook` 登録
`on_train_start(transformer, network)`	`_run_training_loop` の冒頭、`prepare` 後	EMA コピー、スケジューラ、projection head の初期化
`on_post_optimizer_step(transformer, sync_gradients)`	`optimizer.step` / `lr_scheduler.step` / `zero_grad` 直後	EMA 更新、ステップごとの記録
`on_before_sample_images(transformer, network)` / `on_after_sample_images(...)`	`sample_images` 呼び出しの前後	サンプル生成時の EMA weight swap（try/finally で例外時にも `after` が走る）
`on_post_save(transformer, args, ..., force_sync_upload)`	`save_model` 末尾、メインチェックポイント保存後	コンパニオンファイル（EMA、projection head）の保存と HF sync upload

計算系：

フック	役割	デフォルト
`process_batch(batch) → (loss, loss_metrics: dict[str, float])`	1ステップ分の timestep サンプリング、`call_dit`、損失計算を担う	通常の flow-matching パス
`compute_loss(output: DiTOutput, ..., weighting) → loss`	損失式のみ（データフローが標準なら `process_batch` と分けて差し替え可能）	`pred` vs `target` の重み付き MSE

コントリビューター系（戻り値が既存の構造にマージされる）：

フック	マージ先
`extra_trainable_params(args, transformer) → list[param-group]`	最適化対象パラメータグループ（network のパラメータと並列）
`extra_metadata(args) → dict`	学習ループ開始時の safetensors メタデータ
`extra_step_logs(args, logs) → dict`	ステップごとの `accelerator.log` ペイロード

2.7 参考実装

src/musubi_tuner/flux_2_train_network_self_flow.py は上記すべてのフックを使用している Self-Flow の skeleton です。アルゴリズム本体は PR #913 を参照する TODO 付きの NotImplementedError のため、現状は実行できませんが、新しい拡張ポイントの上に独自トレーナーを組み立てる際のテンプレートとして参照してください。

2.8 細かい修正も含まれます

Copilot レビュー由来の修正コミット（7a22df8）：

_int_or_float が int 型 CLI 引数に対して "1.0" / "10.0" のような TOML 由来の float リテラルを受け付けるように（以前は誤解を招くエラーで落ちていた）。
convert_weight_keys が型注釈どおり import 済み network_module オブジェクトを受け取るように内部修正（既存オーバーライドは引数を未使用だったため挙動への影響なし）。
Self-Flow skeleton で、Self-Flow が無効な場合に after_sample_images 経路で AttributeError が出ないようガード。
