What's Changed
- Added longcat-video python api examples by @shaoxiongduan in #994
- [docs]: add LoRA extraction utilities documentation by @ShreejithSG in #992
- [bugfix] Add configs for TurboDiffusion T2V/I2V models by @loaydatrain in #993
- [ci] temporarily disable turbodiffusion ssim test by @SolitaryThinker in #1000
- [CI] Fixed Turbodiffusion I2V CI by @loaydatrain in #1002
- [feat!] Disable FSDP inference by default by @XOR-op in #1001
- [misc] [bugfix] unpin 'av' in pyproject by @SolitaryThinker in #1009
- [feat] Introduce Cosmos 2.5 Text2World pipeline by @KyleShao1016 in #974
- [CI] SSIM tests optimization: load all model weights from Modal persistent Volume by @alexzms in #958
- [CI] Fix OOM issues in ssim tests by @SolitaryThinker in #1011
- [Bug Fix] Add autograd wrapper for block-sparse attention in fastvideo-kernel + fix CMake extension linking by @alexzms in #1015
- [chore] release fastvideo-kernel 0.2.3 by @SolitaryThinker in #1018
- [feat] Hooks API and layerwise offloading for all DiTs by @XOR-op in #1006
- [kernel] Fix fastvideo-kernel release workflow by @SolitaryThinker in #1019
- [kernel] [bugfix] [ci] bump v0.2.4. Fix STA output handling, TurboDiffusion CUDA norm dtypes for fastvideo-kernel unit tests. by @alexzms in #1020
- Added LTX-2 Distilled T2V Generation by @shaoxiongduan in #1016
- fix: SP for hunyuanvideo 1.5 by @XOR-op in #1026
- [fastvideo-kernel] replace map to index with Triton implementation + add vsa benchmark by @alexzms in #1029
- [bugfix] add omegaconf as dep. by @SolitaryThinker in #1032
- [bugfix] Allow update timesteps for hy1.5 model. by @Davids048 in #1033
- [feat] Add HY-World1.5-Bidirectional-480P-I2V by @mignonjia in #1027
- [ci] Increase ci test error threshold by @alexzms in #1038
- [bugfix] Fix NCCL all_gather contiguity + correct ParallelTiledVAE decode tiling threshold by @KyleShao1016 in #1037
- [bugfix]: handle architectural differences while lora extraction by @ShreejithSG in #1035
- [bugfix] fix torchvision import by @RandNMR73 in #1039
- [docs] Update runpod instructions by @SolitaryThinker in #1043
- [docs] Offloading instruction by @XOR-op in #1022
- [feat]Add Matrix Game 2.0 training by @H1yori233 in #1017
- [docs] Update design overview and add agents tutorial by @SolitaryThinker in #1044
- [SP Sharding] Fix SP loss sharding on token axis (thw) with padding; add distributed correctness tests by @alexzms in #1045
- fix: sageattn3 installation by @XOR-op in #1050
- [bugfix] Double Normalization in Preprocessing Dataset by @H1yori233 in #1055
- [Bugfix] [Wan I2V] Fix CLIP Image encoder config by @JerryZhou54 in #1063
- [chore]: use higher precision timestamp in logging by @XOR-op in #1062
- [feat] Add Cosmos 2.5 I2W/V2W support (staged pipeline + examples) by @KyleShao1016 in #1021
- Added Sequence Parallelism for LTX-2 Distilled by @shaoxiongduan in #1036
- [misc] upgrade torch to 2.10 by @SolitaryThinker in #1048
- [feat] HYworld VAE with cache by @mignonjia in #1057
- [misc] Fix naming instruction in runpod.md by @SolitaryThinker in #1067
- [refactor] Action module by @XOR-op in #1065
- [core] Refactor and centralize our registry for models, pipelines, and sampling params by @SolitaryThinker in #1066
- Some minor fixes by @zhisbug in #1068
- [Feature] [Hy1.5] Support HY1.5 super-resolution pipeline for 1080p videos by @JerryZhou54 in #1046
- [Fix] remove video ratio limitation by @Eigensystem in #1069
- more fix and relocate STA arguments to pipeline config by @zhisbug in #1073
- [perf]: use CUDA IPC in multiproc executor to avoid serialization overhead by @XOR-op in #1061
- readme small fix by @jzhang38 in #1076
- [fix]: `_compile_conditions` regression by @XOR-op in #1077
- add AGENTS.md file by @RandNMR73 in #1085
- [Misc] [Training] Fixed a bunch of bugs in current training pipeline by @JerryZhou54 in #1084
- [Feat] Port LTX2 trainer by @Davids048 in #1074
- [Feat] Add Stable Diffusion 3.5 by @Ishxn20 in #1075
- [Fix] CI Transformer Tests by @Eigensystem in #1089
- [Model] LTX 2 Base by @Davids048 in #1064
- [misc] cleanup assets/ and demo/ by @SolitaryThinker in #1091
- [feat] Port LingBot-World-Base (Cam) by @H1yori233 in #1081
- [misc] update wechat group link by @SolitaryThinker in #1098
- [bugfix] Fixed ltx2 base cfg guidance by @shaoxiongduan in #1095
- [bugfix] fastvideo-kernel: fix VSA Triton padding NaNs and support q/kv length mismatch by @alexzms in #1094
- [kernel] add torch 2.10 to package build matrix by @SolitaryThinker in #1099
- [feature] Add Hunyuan-GameCraft model support by @MihirJagtap in #1071
- [bugfix] Fix failed kernel publish and SFT regressions by @SolitaryThinker in #1103
- [perf] causal MatrixGame optimization by @XOR-op in #1078
- [Fix] hunyuan postprocessing issue by @Eigensystem in #1104
- [bugfix] fix import PreTrainedModel in stepllm.py by @dsynkd in #1108
- Update README.md by @dsynkd in #1110
- [Feat] Native dit implementation for SD3.5 by @Ishxn20 in #1093
- [Misc] clean up VSA finetuning examples. by @jzhang38 in #1111
- [bugfix] get_torch_device and other device calls were being made on non-cuda platforms by @dsynkd in #1107
- [misc] add hy-world link to readme by @SolitaryThinker in #1113
- Improve Docs by @jzhang38 in #1112
- Upstream LTX2 Training by @jzhang38 in #1116
- [Misc] Remove StepVideo by @jzhang38 in #1118
- small refactor in post-processing to improve efficiency by @RandNMR73 in #1123
- [Misc] Remove Teacache by @jzhang38 in #1121
- [bugfix] Added ltx2 guidance missing modulation term by @shaoxiongduan in #1100
- [Misc] Remove STA by @jzhang38 in #1124
- [Feat] Improved CI by @Eigensystem in #1119
- [misc] fix hunyuan by @jzhang38 in #1125
- migrate uv by @Davids048 in #1127
- [fix] preprocessing issue by @Eigensystem in #1134
- [bugfix] fix matrix game kv indexing by @SolitaryThinker in #1135
- [Misc] Fix memory leakage in VideoGenerator by @jzhang38 in #1132
- Py/fix sp by @jzhang38 in #1138
- [CI][Feat] launch 2 instance to run ssim by @Eigensystem in #1137
- [bugfix]: fix a bug where collect_env was not running properly... by @dsynkd in #1145
- [Doc] add doc for inference architecture by @Eigensystem in #1147
- Added OpenAI-compatible API server and benchmark script by @AjAnubolu in #1109
- [Refactor] SP Mask --> original seq len; HunyuanVideo 1.5 does not need mask by @jzhang38 in #1142
- [CI] Add inference performance regression tests by @AjAnubolu in #1140
- [CI] PR template by @Eigensystem in #1157
- [misc] FlashAttention 4 support by @XOR-op in #1114
- feat: Building agent friendly repo by @GindaChen in #1151
- [Feat] Add causal Wan pipeline with multi-step denoising by @alexzms in #1161
- [feat] Refactor training framework into fastvideo/train by @alexzms in #1159
- Py/cleanup by @jzhang38 in #1163
- [feat] Self-Forcing methods in refactored training infra by @alexzms in #1164
- [feat] pre-commit support 120 col num by @alexzms in #1167
- [feat]: Knowledge Distillation training method for ODE-init (KDMethod + KDCausalMethod) by @alexzms in #1166
- [docs] Update README with realtime demo announcement by @zhisbug in #1169
- [CI] add contributor interaction automation by @Eigensystem in #1170
- [bugfix] self-forcing train/validation step mismatch by @H1yori233 in #1173
- [misc] update action loading in validation and preprocess by @H1yori233 in #1143
- [feat]: add HunyuanVideo model plugin for fastvideo/train framework by @alexzms in #1175
- Kandinsky5 lite dit clean by @jaisurya27 in #1088
- [misc]: reorganize training configs and add documentation by @alexzms in #1177
- [bugfix]: fix I2V preprocessing crash for models without CLIP (Wan2.2 I2V) by @alexzms in #1184
- [feat] Job Runner UI by @dsynkd in #1172
- Revert "[feat] Job Runner UI" by @Eigensystem in #1188
- [bugfix]: fix VAE temporal tiling blend corruption in tiled_encode by @alexzms in #1181
- [feat]: overhaul SSIM test infrastructure — partition scheduling, helper migration, CI fixes by @Eigensystem in #1185
- [ci] CI infrastructure cleanup and workflow reorganization (1/2) by @Eigensystem in #1186
- [ci] Merge Queue, label system overhaul, and slash commands (2/2) by @Eigensystem in #1187
- [ci] Add approval and pre-commit checks to merge protections by @Eigensystem in #1190
- [ci] CI follow-up: gate checks, issue label unification, draft PR skip by @Eigensystem in #1193
- [ci] Fix Merge Queue immediate dequeue by @Eigensystem in #1196
- [ci] Fix Merge Queue requeue and draft PR pre-commit skip by @Eigensystem in #1197
- ci: upgrade configuration to current format by @mergify[bot] in #1194
- [ci] Replace Merge Queue with auto-merge — reduce CI complexity by @Eigensystem in #1200
- [ci] Fix fork PR checkout for /test and Full Suite triggers by @Eigensystem in #1202
- [ci] Add TEST_SCOPE routing for clean single-test execution by @Eigensystem in #1203
- [ci] Trigger pre-commit on /test slash commands by @Eigensystem in #1205
- [ci] Post pre-commit status to PR commit SHA by @Eigensystem in #1206
- [ci] Add statuses:write permission for /test pre-commit by @Eigensystem in #1207
- [ci] Remove Mergify ready-label race condition by @Eigensystem in #1208
- [ci] Fix /merge to directly trigger Full Suite + simplify rebase conditions by @Eigensystem in #1209
- [ci] Add retry for flaky tests and fix stale SSIM references by @Eigensystem in #1210
- [ci] Ignore legacy reference videos when checking for HF download by @Eigensystem in #1211
- [ci] Fix jq crash when Buildkite build env is null by @Eigensystem in #1212
- [ci] Use pull_request_target for Full Suite trigger by @Eigensystem in #1213
- [ci] Add direct test retry with check overwrite and aggregate status refresh by @Eigensystem in #1214
- [ci] Use update instead of rebase for auto branch sync by @Eigensystem in #1215
- [feat] add gen3c (cosmos-7b) model and pipeline support by @vishruthb in #1059
- [feat] Job Runner UI by @Eigensystem in #1189
- ci: upgrade configuration to current format by @mergify[bot] in #1216
- [Feature] Add BSA (Bidirectional Sparse Attention) inference backend by @Satyam-53 in #1174
- [feat] [1/n] API improvements: add initial files for new fastvideo public API by @SolitaryThinker in #1218
- [perf]: Eliminate CPU-GPU synchronization bottlenecks in training pipeline by @rich7420 in #1217
- [bugfix]Fixing Lora distillation training distributed checkpointing bug by @klhhhhh in #1192
- [feat] [2/n] Improve API: add initial support in video_generator by @SolitaryThinker in #1220
- [feat] [3/n] Improve API: extend support to cli by @SolitaryThinker in #1226
- [feat] [4/n] Improve API: refactor sampling param and merge with presets by @SolitaryThinker in #1234
- [misc] small cleanup for API handling by @SolitaryThinker in #1235
- [feat] [5/n] Improve API: wire ServeConfig.default_request into OpenAI serving by @SolitaryThinker in #1237
- [feat] [5.5/n] Improve API: streaming server config surface + serve dispatch by @SolitaryThinker in #1238
- [test] add LTX-2 distilled T2V SSIM regression test by @SolitaryThinker in #1240
- [feat] [6/n] Improve API: LTX-2 public preset + asset wiring + gpu_pool translation by @SolitaryThinker in #1239
- [feat] Add typed LTX-2 continuation state and streaming session store by @SolitaryThinker in #1250
- [bugfix]: normalize uint8 pil_image in I2V VAE encoding by @Davids048 in #1249
- [feat] Streaming WebSocket server skeleton (single generator + fMP4) by @SolitaryThinker in #1251
- [docs]: clarify real_score_guidance_scale CFG parameterization by @alexzms in #1256
- [Perf] Skip bool-mask round-trip in block-sparse VSA attention by @Godmook in #1243
- [bugfix] Fix modal remote functions crash container on sys exit in CI remote functions by @Satyam-53 in #1261
- [ci] add CPU unit tests for fastvideo.train load_run_config by @alexzms in #1264
- [bugfix]: fix SP deadlock in negative prompt encoding during training by @alexzms in #1178
- [ci] add CPU unit tests for train checkpoint utilities in fastvideo.train by @alexzms in #1265
- [feat] Cosmos 2.5 training support in fastvideo.train by @alexzms in #1224
- [feat] Stable Audio Open 1.0: T2A + A2A + RePaint inpainting (native) by @SolitaryThinker in #1260
- [ci] add CPU unit tests for train callback system in fastvideo.train by @alexzms in #1267
- [misc] cleanup: grad-norm asserts, dead offload file, callback names by @alexzms in #1268
- [refactor] tests/local_tests: organize by model family by @SolitaryThinker in #1269
- [bugfix]: classify stable_audio fields in schema parity inventory by @SolitaryThinker in #1275
- [ci] pre-commit: drop stale excludes + document agent lint flow by @SolitaryThinker in #1276
- [bugfix] Update fa import by @Davids048 in #1271
- [docs] add hierarchical AGENTS.md per-directory guidance by @SolitaryThinker in #1278
- [ci] Add CI Performance Regression Tracking Changes by @Satyam-53 in #1248
- [misc] pin torch to 2.11.0 by @SolitaryThinker in #1277
- [misc]: standardize install instructions on uv pip install by @SolitaryThinker in #1279
- [feat] Improve API: streaming server GpuPool + worker subprocess by @SolitaryThinker in #1257
- [feat] Improve API: streaming prompt enhancer with LLMProvider abstraction by @SolitaryThinker in #1258
- [feat] Improve API: streaming auxiliaries (safety, rewrite, logger, mock) by @SolitaryThinker in #1284
- [feat] Improve API: streaming router (multi-replica load balancer + ws proxy) by @SolitaryThinker in #1286
- [ci] Replace flaky LTX-2 pixel SSIM with latent-slice cosine regression by @Godmook in #1253
- [infra]: Add activation trace hooks for pipeline debugging by @SolitaryThinker in #1293
- [feat] Add fastvideo.eval video evaluation suite by @shaoxiongduan in #1305
- [feat]: Loader umbrella-repo support + optional component dirs by @SolitaryThinker in #1294
- [infra]: New skill - decompose-pipeline-pr by @SolitaryThinker in #1303
- [ci] mergify: accept [skill]/[skills] and [infra] PR title tags by @SolitaryThinker in #1309
- [feat]: add LongCat bidirectional finetuning support by @aryan5v in #1244
- [misc]: import add-model skill stack to .agents/skills/ by @SolitaryThinker in #1308
- [misc] attention hot-path cleanup + denoising loop hoists by @alexzms in #1272
- [feat] FastVideo World Model Training by @H1yori233 in #1179
- [feat] Add Cosmos 2.5 T2W training pipeline (LoRA + full fine-tune) by @Mister-Raggs in #1227
- [docs] Dreamverse 01/14: Add integration provenance by @Davids048 in #1324
- [infra]: MagiHuman housekeeping (gitignore, codespell, skills index) (1/8) by @SolitaryThinker in #1295
- [docs] Dreamverse 02/14: Add app documentation by @Davids048 in #1325
- [feat] Dreamverse 03/14: Add backend skeleton by @Davids048 in #1326
- [feat]: T5-Gemma encoder for MagiHuman pipeline (2/8) by @SolitaryThinker in #1296
- [feat] Dreamverse 04/14: Add session and prompt logic by @Davids048 in #1327
- [feat]: MagiHuman DiT (transformer) port + parity tests (3/8) by @SolitaryThinker in #1297
- [feat]: MagiHuman pipeline stages (4/8) by @SolitaryThinker in #1298
- [feat] Dreamverse 05/14: Add streaming runtime by @Davids048 in #1328
- [feat]: MagiHuman pipeline orchestrator + 10-test parity battery (5/8) by @SolitaryThinker in #1299
- [docs]: MagiHuman provenance - AGENTS.md, JOURNAL.md, lessons (6/8) by @SolitaryThinker in #1300
- [infra]: MagiHuman checkpoint conversion + push scripts (7/8) by @SolitaryThinker in #1301
- [feat] Dreamverse 06/14: Add frontend scaffold by @Davids048 in #1329
- [ci] add GPU model loading tests for fastvideo.train (PR 4/9) by @alexzms in #1274
- [bugfix] MatrixGame2 SF distillation under gradient checkpointing by @H1yori233 in #1340
- feat: FP4 Flash Attention 4 for Blackwell GPUs by @Edenzzzz in #1221
- [feat] Dreamverse 07/14: Add frontend session UI by @Davids048 in #1330
- [feat] Dreamverse 08/14: Add frontend media and E2E coverage by @Davids048 in #1331
- [infra] Dreamverse 09/14: Add Docker and launch scripts by @Davids048 in #1332
- [misc]: empty `__init__.py` files with no logic by @SolitaryThinker in #1346
- [misc]: PR-1225 sync — housekeeping (1/12) by @SolitaryThinker in #1347
- [feat] eval: async VideoPool + metric streamlines by @shaoxiongduan in #1320
- [feat] Dreamverse 10/14: Add serving API contracts by @Davids048 in #1333
- [infra] Dreamverse 11/14: Add NVFP4 quantization support by @Davids048 in #1334
- [feat]: Add NVFP4QAT quantization config (Attn-QAT 2/12) by @SolitaryThinker in #1348
- [feat] Dreamverse 12/14: Add LTX2 refine and upsampler support by @Davids048 in #1335
- [docs] Add copy page action by @Davids048 in #1351
- [misc]: Add Dreamverse deploy skill frontmatter by @Davids048 in #1353
- [feat] Dreamverse 13/14: Activate LTX2 integration by @Davids048 in #1336
- [perf] shallow-copy VSA attn_metadata in train model plugins by @alexzms in #1342
- [perf] Dreamverse 14/14: Add LTX2 profile speedups by @Davids048 in #1337
- [feat]: Add NVFP4QAT linear layer (Attn-QAT 3/12) by @SolitaryThinker in #1350
- [misc]: demote ROCm-unavailable startup message to DEBUG by @SolitaryThinker in #1360
- [feat] eval: add audio metrics by @shaoxiongduan in #1352
- [infra] [dreamverse]: add instruction to install nasm and update ffmpeg installer to work in plain venv by @SolitaryThinker in #1361
- [feat] Add minimal LoRA finetuning support to the YAML training stack by @radicalyyyahaha in #1242
- [misc] Rename MatrixGame to MatrixGame2 by @H1yori233 in #1357
- [fix] Fix causal self-forcing attention settings by @mignonjia in #1355
- [feat]: add FastLTX-2.3 Gradio demo package (draft) by @Davids048 in #1247
- [perf] Mark LayerwiseOffloadHook entry points torch.compiler.disable (remove per-layer graph break) by @Mister-Raggs in #1365
- [Bugfix] FP4 FA4 installation fix by @Edenzzzz in #1367
- [bugfix]: shrink Dreamverse Docker context by @Davids048 in #1368
- [ci] Add Dreamverse Docker image workflow by @Davids048 in #1369
- [ci] Component time performance + reseed hf baseline skill by @Satyam-53 in #1292
- [feat] Optimize distributed weight loading in multi-node training by @Edenzzzz in #572
- [docs] Document performance benchmark workflow by @Satyam-53 in #1376
- [docs] Document enable_torch_compile (+ A/B example) by @Mister-Raggs in #1366
- [feat]: Attn-QAT inference + training backends (deadcode) (Attn-QAT 4/12) by @SolitaryThinker in #1358
- chore: pin dreamverse npm deps to address Dependabot alerts by @SolitaryThinker in #1359
- [infra] Add Dreamverse Modal UI image build by @Davids048 in #1381
- [infra] Use npm for Dreamverse web builds by @Davids048 in #1385
- [perf]: register FA2/FA3 default flash_attn_func as a torch.library custom op by @Mister-Raggs in #1373
- [ci] Add DreamVerse app CI tests by @Davids048 in #1386
- [refactor]: shared attention infra additions for QAT-compat (Attn-QAT 5/12) by @SolitaryThinker in #1383
- [refactor] eval: consolidate FVD into common.fvd, remove benchmarks/fvd by @shaoxiongduan in #1380
- [ci] add per-method single-step training tests for fastvideo.train by @alexzms in #1343
- [feat] VSA-256 fastpath on Blackwell via FA4 CuTe block-sparse attention by @alexzms in #1354
- [docs]: Wire activation trace into mkdocs nav + perf/troubleshooting by @SolitaryThinker in #1304
- [docs]: surface activation-trace utility in add-model skills by @SolitaryThinker in #1399
- [feat] eval: input ergonomics + Evaluator features + bug fixes by @shaoxiongduan in #1392
- [bugfix] Fix Dreamverse Modal compile warmup latency by @Davids048 in #1394
- [docs]: highlight Dreamverse deployment paths + add Server B200 (SSH) guide by @alexzms in #1409
- [feat] Add MatrixGame3.0 by @H1yori233 in #1201
- [feat] LTX-2.3 transformer support (config-gated extension of LTX-2) by @alexzms in #1397
- [feat] LTX-2.3 audio: BWE vocoder path by @alexzms in #1398
- [bugfix]: dreamverse modal bypasses ENTRYPOINT — set ffmpeg env + key check by @alexzms in #1413
- [perf] Add Adaptive Guidance (CFG gating) for stale-uncond reuse by @rich7420 in #1372
- [ci] Add additional Dreamverse UI tests by @kevin314 in #1417
- [bugfix] Fix STFT dtype mismatch by @kevin314 in #1419
- [bugfix] LTX2: honor `video_position_offset_sec` in the DiT by @H1yori233 in #1422
- [feat] LoRA controls and integration for Dreamverse by @H1yori233 in #1420
- [feat] dreamverse: sequence parallelism for serving by @shaoxiongduan in #1424
- [bugfix] tests: include ltx2_3_base in expected LTX2 preset set (#1427) by @Mister-Raggs in #1428
- [chore]: unpin runtime deps in pyproject.toml by @SolitaryThinker in #1431
- [chore]: release v0.2.0 by @SolitaryThinker in #1432
New Contributors
- @mignonjia made their first contribution in #1027
- @Ishxn20 made their first contribution in #1075
- @dsynkd made their first contribution in #1108
- @AjAnubolu made their first contribution in #1109
- @GindaChen made their first contribution in #1151
- @jaisurya27 made their first contribution in #1088
- @mergify[bot] made their first contribution in #1194
- @vishruthb made their first contribution in #1059
- @Satyam-53 made their first contribution in #1174
- @rich7420 made their first contribution in #1217
- @klhhhhh made their first contribution in #1192
- @Godmook made their first contribution in #1243
- @aryan5v made their first contribution in #1244
- @Mister-Raggs made their first contribution in #1227
- @radicalyyyahaha made their first contribution in #1242
Full Changelog: v0.1.7...v0.2.0