Add UniDiffuser model and pipeline #2963

Merged
334 commits, merged May 26, 2023

Commits
115e382
Fix a bug of pano when not doing CFG (#3030)
ernestchu Apr 12, 2023
10c54cb
Text2video zero refinements (#3070)
19and99 Apr 12, 2023
945f300
Release: v0.15.0
patrickvonplaten Apr 12, 2023
322b5cb
[Tests] Speed up panorama tests (#3067)
sayakpaul Apr 12, 2023
af0c3a7
[Post release] v0.16.0dev (#3072)
patrickvonplaten Apr 12, 2023
7a1d100
Adds profiling flags, computes train metrics average. (#3053)
andsteing Apr 12, 2023
bbabf3f
[Pipelines] Make sure that None functions are correctly not saved (#3…
patrickvonplaten Apr 12, 2023
068d6b4
doc string example remove from_pt (#3083)
yiyixuxu Apr 13, 2023
9dd6058
[Tests] parallelize (#3078)
patrickvonplaten Apr 13, 2023
74907ee
Throw deprecation warning for return_cached_folder (#3092)
patrickvonplaten Apr 13, 2023
0be9f8b
Allow SD attend and excite pipeline to work with any size output imag…
jcoffland Apr 13, 2023
d6ae0ae
[docs] Update community pipeline docs (#2989)
stevhliu Apr 13, 2023
7f3cb6d
Add to support Guess Mode for StableDiffusionControlnetPipleline (#2998)
takuma104 Apr 14, 2023
1184b36
fix default value for attend-and-excite (#3099)
yiyixuxu Apr 14, 2023
fa6a6b4
remvoe one line as requested by gc team (#3077)
yiyixuxu Apr 14, 2023
a256f84
ddpm custom timesteps (#3007)
williamberman Apr 14, 2023
e9cb03e
Fix breaking change in `pipeline_stable_diffusion_controlnet.py` (#3118)
remorses Apr 16, 2023
c98e41d
Add global pooling to controlnet (#3121)
patrickvonplaten Apr 16, 2023
653b3c1
[Bug fix] Fix img2img processor with safety checker (#3127)
patrickvonplaten Apr 17, 2023
7fa3b6c
[Bug fix] Make sure correct timesteps are chosen for img2img (#3128)
patrickvonplaten Apr 17, 2023
cc14690
Improve deprecation warnings (#3131)
patrickvonplaten Apr 17, 2023
6f12a36
Fix config deprecation (#3129)
patrickvonplaten Apr 17, 2023
16ddd8b
feat: verfication of multi-gpu support for select examples. (#3126)
sayakpaul Apr 18, 2023
07731e9
speed up attend-and-excite fast tests (#3079)
yiyixuxu Apr 18, 2023
7a39b0f
Optimize log_validation in train_controlnet_flax (#3110)
cgarciae Apr 18, 2023
7ae597f
make style
patrickvonplaten Apr 18, 2023
00a5e55
Correct textual inversion readme (#3145)
patrickvonplaten Apr 18, 2023
ff5b99b
Add unet act fn to other model components (#3136)
williamberman Apr 18, 2023
c8eaea5
class labels timestep embeddings projection dtype cast (#3137)
williamberman Apr 18, 2023
1fac211
[ckpt loader] Allow loading the Inpaint and Img2Img pipelines, while …
cmdr2 Apr 19, 2023
f3300a8
add from_ckpt method as Mixin (#2318)
1lint Apr 19, 2023
fc3760d
Add TensorRT SD/txt2img Community Pipeline to diffusers along with Te…
asfiyab-nvidia Apr 19, 2023
6058378
Correct `Transformer2DModel.forward` docstring (#3074)
offchan42 Apr 19, 2023
e5335f3
Update pipeline_stable_diffusion_inpaint_legacy.py (#2903)
hwuebben Apr 19, 2023
4afb911
Modified altdiffusion pipline to support altdiffusion-m18 (#2993)
superhero-7 Apr 19, 2023
b320c6b
controlnet training resize inputs to multiple of 8 (#3135)
williamberman Apr 19, 2023
3417b1f
adding custom diffusion training to diffusers examples (#3031)
nupurkmr9 Apr 20, 2023
4a6aee9
make style
patrickvonplaten Apr 20, 2023
3306b04
Update custom_diffusion.mdx (#3165)
mishig25 Apr 20, 2023
abd21da
Added distillation for quantization example on textual inversion. (#2…
XinyuYe-Intel Apr 20, 2023
f99a9ff
Update Noise Autocorrelation Loss Function for Pix2PixZero Pipeline (…
clarencechen Apr 20, 2023
206b9b6
[DreamBooth] add text encoder LoRA support in the DreamBooth training…
sayakpaul Apr 20, 2023
6ad4392
Update Habana Gaudi documentation (#3169)
regisss Apr 21, 2023
4366b0c
Add model offload to x4 upscaler (#3187)
patrickvonplaten Apr 21, 2023
e21784e
[docs] Deterministic algorithms (#3172)
stevhliu Apr 21, 2023
006ae03
Update custom_diffusion.mdx to credit the author (#3163)
sayakpaul Apr 21, 2023
dac4d4a
Fix TensorRT community pipeline device set function (#3157)
asfiyab-nvidia Apr 21, 2023
c98a055
make `from_flax` work for controlnet (#3161)
yiyixuxu Apr 21, 2023
15a90e2
[docs] Clarify training args (#3146)
stevhliu Apr 21, 2023
cf35763
Multi Vector Textual Inversion (#3144)
patrickvonplaten Apr 21, 2023
c729403
Add `Karras sigmas` to HeunDiscreteScheduler (#3160)
youssefadr Apr 21, 2023
43c90b0
[AudioLDM] Fix dtype of returned waveform (#3189)
sanchit-gandhi Apr 21, 2023
49c9b4c
Fix bug in train_dreambooth_lora (#3183)
crywang Apr 22, 2023
a69502f
[Community Pipelines] Update lpw_stable_diffusion pipeline (#3197)
SkyTNT Apr 22, 2023
3267649
Make sure VAE attention works with Torch 2_0 (#3200)
patrickvonplaten Apr 22, 2023
de05ea0
Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline" (…
williamberman Apr 22, 2023
8953209
[Bug fix] Fix batch size attention head size mismatch (#3214)
patrickvonplaten Apr 24, 2023
4e03663
fix mixed precision training on train_dreambooth_inpaint_lora (#3138)
themrzmaster Apr 25, 2023
167cb7a
adding enable_vae_tiling and disable_vae_tiling functions (#3225)
init-22 Apr 25, 2023
0431637
Add ControlNet v1.1 docs (#3226)
patrickvonplaten Apr 25, 2023
9e2f445
Fix issue in maybe_convert_prompt (#3188)
pdoane Apr 25, 2023
81950af
Sync cache version check from transformers (#3179)
ychfan Apr 25, 2023
711119a
Fix docs text inversion (#3166)
patrickvonplaten Apr 25, 2023
416f31a
add model (#3230)
patrickvonplaten Apr 25, 2023
7ad77dd
Allow return pt x4 (#3236)
patrickvonplaten Apr 26, 2023
3acc879
Allow fp16 attn for x4 upscaler (#3239)
patrickvonplaten Apr 26, 2023
4c73947
fix fast test (#3241)
patrickvonplaten Apr 26, 2023
81d7eba
Adds a document on token merging (#3208)
sayakpaul Apr 26, 2023
f83fbbd
[AudioLDM] Update docs to use updated ckpt (#3240)
sanchit-gandhi Apr 26, 2023
4cc60b5
Release: v0.16.0
patrickvonplaten Apr 26, 2023
163c33b
Post release for 0.16.0 (#3244)
patrickvonplaten Apr 26, 2023
59986b6
[docs] only mention one stage (#3246)
pcuenca Apr 26, 2023
a640f1b
Write model card in controlnet training script (#3229)
pcuenca Apr 26, 2023
7880ed7
[2064]: Add stochastic sampler (sample_dpmpp_sde) (#3020)
nipunjindal Apr 27, 2023
8def721
[Stochastic Sampler][Slow Test]: Cuda test fixes (#3257)
nipunjindal Apr 27, 2023
97cf386
Remove required from tracker_project_name (#3260)
pcuenca Apr 27, 2023
cf2bf70
adding required parameters while calling the get_up_block and get_dow…
init-22 Apr 27, 2023
76e5941
[docs] Update interface in repaint.mdx (#3119)
ernestchu Apr 27, 2023
1147c76
Update IF name to XL (#3262)
apolinario Apr 27, 2023
cd13b10
fix typo in score sde pipeline (#3132)
fecet Apr 27, 2023
5f3b10a
Fix typo in textual inversion JAX training script (#3123)
jairtrejo Apr 27, 2023
9585b23
AudioDiffusionPipeline - fix encode method after config changes (#3114)
teticio Apr 27, 2023
79706a7
Revert "Revert "[Community Pipelines] Update lpw_stable_diffusion pip…
patrickvonplaten Apr 27, 2023
12868b1
Fix community pipelines (#3266)
patrickvonplaten Apr 27, 2023
2c87f65
update notebook (#3259)
yiyixuxu Apr 27, 2023
a80f696
[docs] add notes for stateful model changes (#3252)
williamberman Apr 27, 2023
72a8467
[LoRA] quality of life improvements in the loading semantics and docs…
sayakpaul Apr 28, 2023
716c255
[Community Pipelines] EDICT pipeline implementation (#3153)
Joqsan Apr 28, 2023
c1dce20
[Docs]zh translated docs update (#3245)
DrDavidS Apr 28, 2023
5151f21
Update logging.mdx (#2863)
tolgacangoz Apr 28, 2023
ffe6e92
Add multiple conditions to StableDiffusionControlNetInpaintPipeline (…
timegate Apr 28, 2023
10d856a
Let's make sure that dreambooth always uploads to the Hub (#3272)
patrickvonplaten Apr 28, 2023
029a28f
Diffedit Zero-Shot Inpainting Pipeline (#2837)
clarencechen Apr 28, 2023
fd47d7c
add constant learning rate with custom rule (#3133)
jason9075 Apr 28, 2023
220657b
Allow disabling torch 2_0 attention (#3273)
patrickvonplaten Apr 28, 2023
1b95720
[doc] add link to training script (#3271)
yiyixuxu Apr 28, 2023
08fbaaf
temp disable spectogram diffusion tests (#3278)
williamberman Apr 28, 2023
799015c
Changed sample[0] to images[0] (#3304)
IliaLarchenko May 1, 2023
c38d004
Typo in tutorial (#3295)
IliaLarchenko May 1, 2023
6a84a74
Torch compile graph fix (#3286)
patrickvonplaten May 1, 2023
863bb75
Postprocessing refactor img2img (#3268)
yiyixuxu May 1, 2023
c8cc4f0
[Torch 2.0 compile] Fix more torch compile breaks (#3313)
patrickvonplaten May 2, 2023
6e8d065
fix: scale_lr and sync example readme and docs. (#3299)
sayakpaul May 3, 2023
d38b4d9
Update stable_diffusion.mdx (#3310)
mu94-csl May 3, 2023
0d22064
Fix missing variable assign in DeepFloyd-IF-II (#3315)
gitmylo May 3, 2023
5a75a8a
Correct doc build for patch releases (#3316)
patrickvonplaten May 3, 2023
5ea3424
Add Stable Diffusion RePaint to community pipelines (#3320)
Markus-Pobitzer May 3, 2023
7815c41
Fix multistep dpmsolver for cosine schedule (suitable for deepfloyd-i…
LuChengTHU May 3, 2023
0e8f4f0
[docs] Improve LoRA docs (#3311)
stevhliu May 4, 2023
7929587
Added input pretubation (#3292)
isamu-isozaki May 4, 2023
3e8d3d8
Update write_own_pipeline.mdx (#3323)
csaybar May 4, 2023
b4aa419
update controlling generation doc with latest goodies. (#3321)
sayakpaul May 5, 2023
458847e
[Quality] Make style (#3341)
patrickvonplaten May 5, 2023
140ab74
Fix config dpm (#3343)
patrickvonplaten May 5, 2023
652dbaa
Add the SDE variant of DPM-Solver and DPM-Solver++ (#3344)
LuChengTHU May 5, 2023
1d213de
Add upsample_size to AttnUpBlock2D, AttnDownBlock2D (#3275)
will-rice May 5, 2023
434b255
Add UniDiffuser classes to __init__ files, modify transformer block t…
dg845 Apr 14, 2023
7097dd7
Update fast tests to use test checkpoints stored on the hub and to be…
dg845 May 5, 2023
fc85263
Fix code with make style.
dg845 May 5, 2023
9d39bef
Revert "Fix code style with make style."
dg845 May 5, 2023
1cb726a
Merge branch 'main' into unidiffuser-pipeline
dg845 May 5, 2023
e62b32a
Add self.image_encoder, self.text_decoder to list of models to offloa…
dg845 May 5, 2023
fc540b5
Fix code quality with make style.
dg845 May 5, 2023
54c495f
Support using a data type embedding for UniDiffuser-v1.
dg845 May 6, 2023
8dd7b0b
Add fast test for checking UniDiffuser-v1 sampling.
dg845 May 9, 2023
34a40ad
Make changes so that the repository consistency tests pass.
dg845 May 9, 2023
0cddc3c
Add UniDiffuser dummy objects via make fix-copies.
dg845 May 9, 2023
16fd515
Fix bugs and make improvements to the UniDiffuser pipeline:
dg845 May 9, 2023
5728328
Fix code style with make style.
dg845 May 9, 2023
abd6fca
Add/edit docstrings for added classes and public pipeline methods. Al…
dg845 May 11, 2023
ae7d549
Add documentation for UniDiffuser and fix some typos/formatting in do…
dg845 May 11, 2023
2b92111
Fix code with make style.
dg845 May 11, 2023
a46e1ec
Refactor and improve the UniDiffuser convert_from_ckpt.py script.
dg845 May 11, 2023
a7f50f4
Move the UniDiffusers convert_from_ckpy.py script to diffusers/script…
dg845 May 11, 2023
8a57342
Fix code quality via make style.
dg845 May 11, 2023
006ab49
Improve UniDiffuser slow tests.
dg845 May 11, 2023
8f2d325
make style
dg845 May 11, 2023
a54d631
Fix some typos in the UniDiffuser docs.
dg845 May 11, 2023
fa9e387
Remove outdated logic based on transformers version in UniDiffuser pi…
dg845 May 11, 2023
19a20a5
Remove dependency on einops by refactoring einops operations to pure …
dg845 May 11, 2023
28dda62
make style
dg845 May 11, 2023
de8794c
Add slow test on full checkpoint for joint mode and correct expected …
dg845 May 11, 2023
7242f1b
make style
dg845 May 11, 2023
1a58958
Fix mixed precision issue by wrapping the offending code with the tor…
dg845 May 11, 2023
f36df41
Revert "Fix mixed precision issue by wrapping the offending code with…
dg845 May 11, 2023
1bc2b91
Add fast test for CUDA/fp16 model behavior (currently failing).
dg845 May 11, 2023
5341450
Fix the mixed precision issue and add additional tests of the pipelin…
dg845 May 11, 2023
b1a6f22
make style
dg845 May 11, 2023
54cfa3b
Use a CLIPVisionModelWithProjection instead of CLIPVisionModel for im…
dg845 May 11, 2023
10e3774
Make style and remove some testing code.
dg845 May 11, 2023
4d656b5
Fix shape errors for the 'joint' and 'img2text' modes.
dg845 May 12, 2023
be4abff
Fix tests and remove some testing code.
dg845 May 15, 2023
848b7e6
Add option to use fixed latents for UniDiffuserPipelineSlowTests and …
dg845 May 15, 2023
e56fab2
Improve UniDiffuser docs, particularly the usage examples, and improv…
dg845 May 16, 2023
ecaf07f
make style
dg845 May 16, 2023
c161e29
Fix examples to load model in float16.
dg845 May 16, 2023
926c7fb
In image-to-text mode, sample from the autoencoder moment distributio…
dg845 May 17, 2023
edbadcc
make style
dg845 May 17, 2023
6b35c03
When encoding the image using the VAE, scale the image latents by the…
dg845 May 21, 2023
f46593e
make style
dg845 May 21, 2023
ec7fb87
Clean up code and make slow tests pass.
dg845 May 21, 2023
029c96c
make fix-copies
patrickvonplaten May 8, 2023
6644d11
[docs] Fix docstring (#3334)
stevhliu May 8, 2023
c221086
if dreambooth lora (#3360)
williamberman May 9, 2023
f670e08
Postprocessing refactor all others (#3337)
yiyixuxu May 9, 2023
7266fc1
[docs] Improve safetensors docstring (#3368)
stevhliu May 9, 2023
4b76097
add: a warning message when using xformers in a PT 2.0 env. (#3365)
sayakpaul May 10, 2023
6e297b4
StableDiffusionInpaintingPipeline - resize image w.r.t height and wid…
rupertmenneer May 10, 2023
fec7bd1
make style
patrickvonplaten May 10, 2023
e162d49
[docs] Adapt a model (#3326)
stevhliu May 10, 2023
caa080c
[docs] Load safetensors (#3333)
stevhliu May 11, 2023
75c2f75
make style
patrickvonplaten May 11, 2023
f0c0f00
[Docs] Fix stable_diffusion.mdx typo (#3398)
sudowind May 11, 2023
42eabb8
Support ControlNet v1.1 shuffle properly (#3340)
takuma104 May 11, 2023
1965acf
[Tests] better determinism (#3374)
sayakpaul May 11, 2023
7b7b6bf
[docs] Add transformers to install (#3388)
stevhliu May 11, 2023
c998614
[deepspeed] partial ZeRO-3 support (#3076)
stas00 May 11, 2023
1085f3e
Add omegaconf for tests (#3400)
patrickvonplaten May 11, 2023
188de89
Fix various bugs with LoRA Dreambooth and Dreambooth script (#3353)
patrickvonplaten May 11, 2023
89a8f73
Fix docker file (#3402)
patrickvonplaten May 11, 2023
cb4016d
fix: deepseepd_plugin retrieval from accelerate state (#3410)
sayakpaul May 12, 2023
41763f6
[Docs] Add `sigmoid` beta_scheduler to docstrings of relevant Schedul…
Laurent2916 May 12, 2023
622c3c6
Don't install accelerate and transformers from source (#3415)
patrickvonplaten May 12, 2023
f001e07
Don't install transformers and accelerate from source (#3414)
patrickvonplaten May 12, 2023
80c2e55
Improve fast tests (#3416)
patrickvonplaten May 12, 2023
d749d57
attention refactor: the trilogy (#3387)
williamberman May 12, 2023
6ce7f8f
[Docs] update the PT 2.0 optimization doc with latest findings (#3370)
sayakpaul May 13, 2023
480b525
Fix style rendering (#3433)
pcuenca May 15, 2023
d3b3855
unCLIP scheduler do not use note (#3417)
williamberman May 15, 2023
9a31cce
Replace deprecated command with environment file (#3409)
jongwooo May 16, 2023
df625f4
fix warning message pipeline loading (#3446)
patrickvonplaten May 16, 2023
8065462
add stable diffusion tensorrt img2img pipeline (#3419)
asfiyab-nvidia May 16, 2023
d5f65dc
Refactor controlnet and add img2img and inpaint (#3386)
patrickvonplaten May 16, 2023
2b11926
[Scheduler] DPM-Solver (++) Inverse Scheduler (#3335)
clarencechen May 16, 2023
63abfce
[Docs] Fix incomplete docstring for resnet.py (#3438)
Laurent2916 May 16, 2023
32162aa
fix tiled vae blend extent range (#3384)
superlabs-dev May 16, 2023
3f5a176
Small update to "Next steps" section (#3443)
pcuenca May 16, 2023
3019e08
Allow arbitrary aspect ratio in IFSuperResolutionPipeline (#3298)
devxpy May 17, 2023
bb1172b
Adding 'strength' parameter to StableDiffusionInpaintingPipeline (#3…
rupertmenneer May 17, 2023
2a16062
[WIP] Bugfix - Pipeline.from_pretrained is broken when the pipeline i…
vimarshc May 17, 2023
68a97bd
Fix gradient checkpointing bugs in freezing part of models (requires_…
IrisRainbowNeko May 17, 2023
ce072e0
Make dreambooth lora more robust to orig unet (#3462)
patrickvonplaten May 17, 2023
ee10c71
Reduce peak VRAM by releasing large attention tensors (as soon as the…
cmdr2 May 17, 2023
9388b3a
Add min snr to text2img lora training script (#3459)
wfng92 May 17, 2023
2ef1b00
Add inpaint lora scale support (#3460)
Glaceon-Hyy May 17, 2023
62d9c72
[From ckpt] Fix from_ckpt (#3466)
patrickvonplaten May 17, 2023
368f9ad
Update full dreambooth script to work with IF (#3425)
williamberman May 17, 2023
68441bf
Add IF dreambooth docs (#3470)
williamberman May 17, 2023
eb7ae28
parameterize pass single args through tuple (#3477)
williamberman May 18, 2023
bb1e25a
attend and excite tests disable determinism on the class level (#3478)
williamberman May 18, 2023
9a195d7
dreambooth docs torch.compile note (#3471)
williamberman May 19, 2023
09ddb88
add: if entry in the dreambooth training docs. (#3472)
sayakpaul May 19, 2023
e36596c
[docs] Textual inversion inference (#3473)
stevhliu May 19, 2023
147da83
[docs] Distributed inference (#3376)
stevhliu May 19, 2023
2d8e089
[{Up,Down}sample1d] explicit view kernel size as number elements in f…
williamberman May 19, 2023
53e37b8
mps & onnx tests rework (#3449)
pcuenca May 20, 2023
8eae86d
[Attention processor] Better warning message when shifting to `AttnPr…
sayakpaul May 21, 2023
55ca69b
[Docs] add note on local directory path. (#3397)
sayakpaul May 21, 2023
a8219e8
Refactor full determinism (#3485)
patrickvonplaten May 22, 2023
a3e1153
Fix DPM single (#3413)
patrickvonplaten May 22, 2023
01b42e4
Add `use_Karras_sigmas` to DPMSolverSinglestepScheduler (#3476)
Isotr0py May 22, 2023
d22535a
Adds local_files_only bool to prevent forced online connection (#3486)
w4ffl35 May 22, 2023
b78e854
make style
patrickvonplaten May 22, 2023
a9ac5a8
[Docs] Korean translation (optimization, training) (#3488)
Snailpong May 22, 2023
634cf1f
DataLoader respecting EXIF data in Training Images (#3465)
Ambrosiussen May 22, 2023
5782887
make style
patrickvonplaten May 22, 2023
f61028f
feat: allow disk offload for diffuser models (#3285)
hari10599 May 22, 2023
30329a2
[Community] reference only control (#3435)
okotaku May 22, 2023
ca87f4d
Support for cross-attention bias / mask (#2634)
Birch-san May 22, 2023
cdf38f1
do not scale the initial global step by gradient accumulation steps w…
williamberman May 22, 2023
51f0951
Remove CPU latents logic for UniDiffuserPipelineFastTests.
dg845 May 21, 2023
4ccb2b5
make style
dg845 May 21, 2023
97e8eef
Revert "Clean up code and make slow tests pass."
dg845 May 21, 2023
9f7247c
Revert bad commit and clean up code.
dg845 May 22, 2023
302fde9
add: contributor note.
sayakpaul May 23, 2023
9f84416
Batched load of textual inversions (#3277)
pdoane May 8, 2023
6326cb7
Revert "add: contributor note."
dg845 May 23, 2023
6d0f321
Re-add contributor note and refactored fast tests fixed latents code …
dg845 May 23, 2023
73504c4
make style
dg845 May 23, 2023
0ed1857
Refactored the code:
dg845 May 24, 2023
d53026d
make style
dg845 May 24, 2023
0adb0a8
Remove padding logic from UniDiffuserTextDecoder.generate_beam since …
dg845 May 24, 2023
43b8894
Update checkpoint id for small test v1 checkpoint to hf-internal-test…
dg845 May 24, 2023
a5a9dac
make style
dg845 May 24, 2023
d4b11aa
Make improvements to the documentation.
dg845 May 25, 2023
98ce17d
Move ImageTextPipelineOutput documentation from /api/pipelines/unidif…
dg845 May 25, 2023
f8c325a
Change order of arguments for UniDiffuserTextDecoder.generate_beam.
dg845 May 26, 2023
b4feac8
make style
dg845 May 26, 2023
4f21661
Merge branch 'main' into unidiffuser-pipeline
dg845 May 26, 2023
07d68d7
Update docs/source/en/api/pipelines/unidiffuser.mdx
sayakpaul May 26, 2023
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -230,6 +230,8 @@
title: UnCLIP
- local: api/pipelines/latent_diffusion_uncond
title: Unconditional Latent Diffusion
- local: api/pipelines/unidiffuser
title: UniDiffuser
- local: api/pipelines/versatile_diffusion
title: Versatile Diffusion
- local: api/pipelines/vq_diffusion
5 changes: 5 additions & 0 deletions docs/source/en/api/diffusion_pipeline.mdx
@@ -45,3 +45,8 @@
By default diffusion pipelines return an object of class

[[autodoc]] pipelines.AudioPipelineOutput

## ImageTextPipelineOutput
By default diffusion pipelines return an object of class

[[autodoc]] ImageTextPipelineOutput
204 changes: 204 additions & 0 deletions docs/source/en/api/pipelines/unidiffuser.mdx
@@ -0,0 +1,204 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# UniDiffuser

The UniDiffuser model was proposed in [One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale](https://arxiv.org/abs/2303.06555) by Fan Bao, Shen Nie, Kaiwen Xue, Chongxuan Li, Shi Pu, Yaole Wang, Gang Yue, Yue Cao, Hang Su, Jun Zhu.

The abstract of the [paper](https://arxiv.org/abs/2303.06555) is the following:

*This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions relevant to a set of multi-modal data in one model. Our key insight is -- learning diffusion models for marginal, conditional, and joint distributions can be unified as predicting the noise in the perturbed data, where the perturbation levels (i.e. timesteps) can be different for different modalities. Inspired by the unified view, UniDiffuser learns all distributions simultaneously with a minimal modification to the original diffusion model -- perturbs data in all modalities instead of a single modality, inputs individual timesteps in different modalities, and predicts the noise of all modalities instead of a single modality. UniDiffuser is parameterized by a transformer for diffusion models to handle input types of different modalities. Implemented on large-scale paired image-text data, UniDiffuser is able to perform image, text, text-to-image, image-to-text, and image-text pair generation by setting proper timesteps without additional overhead. In particular, UniDiffuser is able to produce perceptually realistic samples in all tasks and its quantitative results (e.g., the FID and CLIP score) are not only superior to existing general-purpose models but also comparable to the bespoken models (e.g., Stable Diffusion and DALL-E 2) in representative tasks (e.g., text-to-image generation).*

Resources:

* [Paper](https://arxiv.org/abs/2303.06555).
* [Original Code](https://github.com/thu-ml/unidiffuser).

Available Checkpoints are:
- *UniDiffuser-v0 (512x512 resolution)* [thu-ml/unidiffuser-v0](https://huggingface.co/thu-ml/unidiffuser-v0)
- *UniDiffuser-v1 (512x512 resolution)* [thu-ml/unidiffuser-v1](https://huggingface.co/thu-ml/unidiffuser-v1)

This pipeline was contributed by our community member [dg845](https://github.com/dg845).

## Available Pipelines:

| Pipeline | Tasks | Demo | Colab |
|:---:|:---:|:---:|:---:|
| [UniDiffuserPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_unidiffuser.py) | *Joint Image-Text Gen*, *Text-to-Image*, *Image-to-Text*,<br> *Image Gen*, *Text Gen*, *Image Variation*, *Text Variation* | [🤗 Spaces](https://huggingface.co/spaces/thu-ml/unidiffuser) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/unidiffuser.ipynb) |

## Usage Examples

Because the UniDiffuser model is trained to model the joint distribution of (image, text) pairs, it is capable of performing a diverse range of generation tasks.

### Unconditional Image and Text Generation

Unconditional generation (where we start from only latents sampled from a standard Gaussian prior) from a [`UniDiffuserPipeline`] will produce an (image, text) pair:

```python
import torch

from diffusers import UniDiffuserPipeline

device = "cuda"
model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# Unconditional image and text generation. The generation task is automatically inferred.
sample = pipe(num_inference_steps=20, guidance_scale=8.0)
image = sample.images[0]
text = sample.text[0]
image.save("unidiffuser_joint_sample_image.png")
print(text)
```

This is also called "joint" generation in the UniDiffuser paper, since we are sampling from the joint image-text distribution.

Note that the generation task is inferred from the inputs used when calling the pipeline.
It is also possible to specify the unconditional generation task ("mode") manually with [`UniDiffuserPipeline.set_joint_mode`]:

```python
# Equivalent to the above.
pipe.set_joint_mode()
sample = pipe(num_inference_steps=20, guidance_scale=8.0)
```

When the mode is set manually, subsequent calls to the pipeline will use the set mode without attempting to infer the mode.
You can reset the mode with [`UniDiffuserPipeline.reset_mode`], after which the pipeline will once again infer the mode.
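
For instance, a minimal sketch (reusing the `pipe` object created above) of fixing the mode and then restoring automatic inference:

```python
# Fix the mode manually; subsequent calls keep using joint generation.
pipe.set_joint_mode()
sample = pipe(num_inference_steps=20, guidance_scale=8.0)

# Restore automatic mode inference; with no prompt or image supplied,
# the next call is again inferred to be joint (image, text) generation.
pipe.reset_mode()
sample = pipe(num_inference_steps=20, guidance_scale=8.0)
```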

You can also generate only an image or only text (which the UniDiffuser paper calls "marginal" generation since we sample from the marginal distribution of images and text, respectively):

```python
# Unlike other generation tasks, image-only and text-only generation don't use classifier-free guidance
# Image-only generation
pipe.set_image_mode()
sample_image = pipe(num_inference_steps=20).images[0]
# Text-only generation
pipe.set_text_mode()
sample_text = pipe(num_inference_steps=20).text[0]
```

### Text-to-Image Generation

UniDiffuser is also capable of sampling from conditional distributions; that is, the distribution of images conditioned on a text prompt or the distribution of texts conditioned on an image.
Here is an example of sampling from the conditional image distribution (text-to-image generation or text-conditioned image generation):

```python
import torch

from diffusers import UniDiffuserPipeline

device = "cuda"
model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# Text-to-image generation
prompt = "an elephant under the sea"

sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0)
t2i_image = sample.images[0]
t2i_image.save("unidiffuser_text2img_sample_image.png")
```

The `text2img` mode requires that either an input `prompt` or `prompt_embeds` be supplied. You can set the `text2img` mode manually with [`UniDiffuserPipeline.set_text_to_image_mode`].
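
For example, a short sketch (reusing the `pipe` and `prompt` objects from the snippet above) of selecting the mode explicitly rather than letting it be inferred from the `prompt` argument:

```python
# Explicitly select text-to-image mode; a prompt (or prompt_embeds) is still required.
pipe.set_text_to_image_mode()
sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0)
image = sample.images[0]

# Optionally return to automatic mode inference afterwards.
pipe.reset_mode()
```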

### Image-to-Text Generation

Similarly, UniDiffuser can also produce text samples given an image (image-to-text or image-conditioned text generation):

```python
import torch

from diffusers import UniDiffuserPipeline
from diffusers.utils import load_image

device = "cuda"
model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# Image-to-text generation
image_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/unidiffuser/unidiffuser_example_image.jpg"
init_image = load_image(image_url).resize((512, 512))

sample = pipe(image=init_image, num_inference_steps=20, guidance_scale=8.0)
i2t_text = sample.text[0]
print(i2t_text)
```

The `img2text` mode requires that an input `image` be supplied. You can set the `img2text` mode manually with [`UniDiffuserPipeline.set_image_to_text_mode`].
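
As with `text2img`, a brief sketch (reusing `pipe` and `init_image` from the snippet above) of selecting the mode explicitly:

```python
# Explicitly select image-to-text mode; an input image is still required.
pipe.set_image_to_text_mode()
sample = pipe(image=init_image, num_inference_steps=20, guidance_scale=8.0)
print(sample.text[0])

pipe.reset_mode()  # return to automatic mode inference
```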

### Image Variation

The UniDiffuser authors suggest performing image variation through a "round-trip" generation method, where given an input image, we first perform an image-to-text generation, and then perform a text-to-image generation on the outputs of the first generation.
This produces a new image which is semantically similar to the input image:

```python
import torch

from diffusers import UniDiffuserPipeline
from diffusers.utils import load_image

device = "cuda"
model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# Image variation can be performed with an image-to-text generation followed by a text-to-image generation:
# 1. Image-to-text generation
image_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/unidiffuser/unidiffuser_example_image.jpg"
init_image = load_image(image_url).resize((512, 512))

sample = pipe(image=init_image, num_inference_steps=20, guidance_scale=8.0)
i2t_text = sample.text[0]
print(i2t_text)

# 2. Text-to-image generation
sample = pipe(prompt=i2t_text, num_inference_steps=20, guidance_scale=8.0)
final_image = sample.images[0]
final_image.save("unidiffuser_image_variation_sample.png")
```

### Text Variation

Similarly, text variation can be performed on an input prompt with a text-to-image generation followed by an image-to-text generation:

```python
import torch

from diffusers import UniDiffuserPipeline

device = "cuda"
model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# Text variation can be performed with a text-to-image generation followed by an image-to-text generation:
# 1. Text-to-image generation
prompt = "an elephant under the sea"

sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0)
t2i_image = sample.images[0]
t2i_image.save("unidiffuser_text2img_sample_image.png")

# 2. Image-to-text generation
sample = pipe(image=t2i_image, num_inference_steps=20, guidance_scale=8.0)
final_prompt = sample.text[0]
print(final_prompt)
```

## UniDiffuserPipeline
[[autodoc]] UniDiffuserPipeline
- all
- __call__