
ENFUGUE Web UI v0.3.0

painebenjamin released this 09 Nov 13:44 · 711d9db

New Features


Animation

ENFUGUE now supports animation. A huge array of changes have been made to accommodate this, including new backend pipelines, new downloadable model support, new interface elements, and a rethought execution planner.

Most importantly, all features available for images work for animation as well. This includes IP adapters, ControlNets, your custom models, LoRA, inversion, and anything else you can think of.

AnimateDiff

The premier animation toolkit for Stable Diffusion is AnimateDiff for Stable Diffusion 1.5. When you use any Stable Diffusion 1.5 model and enable animation, AnimateDiff is loaded in the backend.

Motion Modules


Selecting motion modules in the GUI.


Motion modules are AI models that are injected into the Stable Diffusion UNet to control how that model interprets motion over time. When using AnimateDiff, you will by default use mm_sd_15_v2.ckpt, the latest base checkpoint. However, fine-tuned checkpoints are already available from the community, and these are supported in pre-configured models and on-the-fly configuration.


Browsing motion modules in CivitAI.


In addition, these are downloadable through the CivitAI download browser.

Motion LoRA


Selecting motion LoRA in the GUI.


Motion LoRA are additional models that steer AnimateDiff; each was trained on a specific camera motion and reproduces that motion when applied.

These are always available in the UI; select them from the LoRA menu and they will be downloaded as needed.

Example motions: Zoom In, Zoom Out, Zoom Pan Left, Zoom Pan Right, Tilt Up, Tilt Down, Rolling Anti-Clockwise, Rolling Clockwise.

HotshotXL

HotshotXL is a recently released animation toolkit for Stable Diffusion XL. When you use any Stable Diffusion XL model and enable animation, Hotshot will be loaded in the backend.

Frame Windows


Frame windows in the GUI.


AnimateDiff and Hotshot XL both have limitations on how long they can animate for before losing coherence. To mitigate this, we can only ever attempt to animate a certain number of frames at a time, and blend these frame windows into one another to produce longer coherent motions. Use the Frame Window Size parameter to determine how many frames are used at once, and Frame Window Stride to indicate how many frames to step for the next window.
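
To illustrate how these two parameters interact, here is a minimal sketch (not ENFUGUE's actual execution planner) of which frame indices each overlapping window would cover:

def frame_windows(total_frames, window_size, stride):
    # Illustrative only: yield (start, end) frame ranges for each window;
    # ENFUGUE's planner may handle the final window and blending differently.
    start = 0
    while True:
        end = min(start + window_size, total_frames)
        yield (start, end)
        if end >= total_frames:
            break
        start += stride

# 32 frames with a size-16 window and size-8 stride:
print(list(frame_windows(32, 16, 8)))  # [(0, 16), (8, 24), (16, 32)]

The overlapping regions of adjacent windows are what gets blended together to keep the motion coherent.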


Left: 32 frames with no window. Notice how the motion loses coherence about halfway. Right: 32 frames with a size-16 window and size-8 stride. Notice the improved coherence of motion (as well as some distortions, which is part of the trade-off of using this technique.)


Position Encoder Slicing


Position encoder slicing in the GUI.


Both HotshotXL and AnimateDiff use 24-frame position encoding. If we cut that encoding short and interpolate the sliced encoding to a new length, we can effectively "slow down" motion. This is an experimental feature.
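
As a rough illustration of the idea (the tensor names and shapes here are assumptions, not ENFUGUE's internals), slicing and re-interpolating a position encoding might look like this:

import torch
import torch.nn.functional as F

# Stand-in for a motion module's 24-frame position encoding: (frames, channels)
pos_encoding = torch.randn(24, 320)

sliced = pos_encoding[:16]                 # cut the encoding short at 16 frames
scaled = F.interpolate(
    sliced.transpose(0, 1).unsqueeze(0),   # -> (1, channels, frames)
    size=32,                               # stretch 16 sliced positions over 32 frames
    mode="linear",
    align_corners=True,
).squeeze(0).transpose(0, 1)               # back to (frames, channels)

print(scaled.shape)                        # torch.Size([32, 320])

Because the same trained positions now span twice as many frames, the perceived motion is roughly half as fast.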


Left: no position encoder slicing, 16 frames, no window. Right: slicing positions at 16 frames, and scaling to 32 frames. The animation is 32 frames with no window. Notice the slower, smoother motion.


Motion Attention Scaling


Motion attention scaling in the GUI.


Motion is applied during the inference process as a distinct step, and as a result we can apply a multiplier to how much effect it has on the final output. Using a small bit of math, we can determine at runtime the difference between the trained dimensions of the motion module and the current dimensions of your image, and use that to scale the motion. Enabling this in the UI also gives you access to a motion modifier, which you can use to broadly control the "amount" of motion in the resulting video.
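
As a sketch of the kind of runtime calculation described (the exact formula ENFUGUE uses is an assumption here, not confirmed):

import math

def motion_attention_scale(trained_size, image_width, image_height, motion_multiplier=1.0):
    # Illustrative only: scale motion attention by the ratio between the
    # current image area and the motion module's trained area.
    current_area = image_width * image_height
    trained_area = trained_size * trained_size
    return math.sqrt(current_area / trained_area) * motion_multiplier

# e.g. a module trained at 512x512 applied to a 768x768 image
print(motion_attention_scale(512, 768, 768))  # 1.5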


Left: No motion attention scaling. Right: Motion attention scaling enabled.


Prompt Travel


The prompt travel interface after enabling it.


Instead of merely offering one prompt during animation, we can interpolate between multiple prompts to change what is being animated at any given moment. Blend action words into one another to steer motion, or use entirely different prompts for morphing effects.
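
Conceptually, this amounts to blending prompt embeddings frame by frame. A minimal sketch (the embeddings below are placeholders; how ENFUGUE actually encodes and schedules prompts is not shown):

import torch

def travel_embeddings(embed_a, embed_b, num_frames):
    # Illustrative only: linearly interpolate between two prompt embeddings so
    # early frames follow prompt A and later frames follow prompt B.
    weights = torch.linspace(0.0, 1.0, num_frames)
    return [torch.lerp(embed_a, embed_b, w) for w in weights]

embed_a = torch.randn(77, 768)  # stand-in for an encoded prompt, e.g. "walking"
embed_b = torch.randn(77, 768)  # stand-in for an encoded prompt, e.g. "running"
per_frame = travel_embeddings(embed_a, embed_b, num_frames=16)
print(len(per_frame), per_frame[0].shape)  # 16 torch.Size([77, 768])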


The resulting animation.


FILM - Frame Interpolation for Large Motion

Both HotshotXL and AnimateDiff were trained on animations at 8 frames per second. To achieve higher framerates, we must create frames in between the AI-generated frames to smooth the motion out. Simply add a multiplication factor to create in-between frames; for example, a factor of 2 adds one frame between each pair of adjacent frames, doubling the total frame count (less one). Adding another factor interpolates on the already-interpolated images, so a second factor of 2 doubles the count (less one) again.

If you are upscaling and interpolating, the upscaling will be performed first.
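
The frame-count arithmetic for the rule described above can be sketched like this (illustrative only; the final count may also depend on other options such as looping):

def interpolated_frame_count(frames, factors):
    # Each factor f inserts (f - 1) frames between every pair of adjacent
    # frames: new_count = count * f - (f - 1). Additional factors apply the
    # same rule to the already-interpolated count.
    for f in factors:
        frames = frames * f - (f - 1)
    return frames

print(interpolated_frame_count(16, [2]))  # 31: doubled, less one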


Left: 16 frames with no interpolation (8 FPS.) Right: the same 16 frames interpolated twice with a factor of 2 (total 64 frames, 32 FPS.)


Looping and Reflecting

There are two options available to make an animation repeat seamlessly.

  • Reflect will play the animation in reverse after playing it forward. To alleviate the "bounce" that occurs at the inflection points, frame interpolation is used to smoothly ease these changes (see the sketch after this list).
  • Loop will create an animation that loops seamlessly. This is achieved through the same method as frame windows, only additionally wrapping the frame window around to the beginning. This increases the total number of steps needed to make an animation. Note that it can also reduce the overall motion in the image, so it is recommended to combine this option with others such as motion attention scaling.
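
A minimal sketch of the Reflect idea (before the interpolation-based easing is applied):

def reflect(frames):
    # Play forward, then backward, dropping the endpoints from the reversed
    # pass so the first and last frames are not duplicated.
    return frames + frames[-2:0:-1]

print(reflect(list(range(8))))
# [0, 1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1]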

Left: A looping animation. Right: A reflected animation.

Tips

  • AnimateDiff is best used to make two-second videos at 16 frames per second. This means your frame window size should be 16 when trying to create longer animations with Stable Diffusion 1.5.
  • AnimateDiff performs best with Euler Discrete scheduling.
  • AnimateDiff version 1, as well as any motion modules derived from it (including motion LoRA), may have visible watermarks due to the training data also containing watermarks.
  • AnimateDiff can have artifacting around the corners of images that are larger than 512x512. To mitigate this, you can add around 8 pixels of extra space to trim off later, or use tiled diffusion.
  • HotshotXL is best used to make one-second videos at 8 frames per second. This means your frame window size should be 8 when trying to create longer animations with Stable Diffusion XL.
  • HotshotXL performs best with DDIM scheduling.

GUI Redesign

In order to accommodate animation, and as a refresh of the original design, the GUI has been entirely reconfigured. The most significant changes are enumerated below.


ENFUGUE v0.3.0's GUI.


Sidebar Repositioning

The original sidebar has been moved from the right to the left. As the sidebar represents global options, it was decided that the left-hand side was a better fit, following the convention of photo-manipulation programs such as GIMP and Photoshop.

Redesigned Sample Chooser

The chooser that allows you to switch between viewing results and viewing the canvas has been moved to its own dedicated bar.

This chooser takes two forms:


The sample chooser during animation.

The sample chooser during image generation.

Layers Menu


The new layers menu.


A layers menu has been added in the sidebar's place. This contains the options for the currently active layer.

Global Inpaint Options, Global Denoising Strength, Inverted Inpainting

As all invocations are now performed in a single inference step, there can only be one mask and one denoising strength. These have been moved to the global menu as a result. They will appear when there is any media on the canvas. Check the "Enable Inpainting" option to show the inpainting toolbar.


The UI when inpainting is active.


In addition, inpainting has been inverted from ENFUGUE's previous incarnation: black represents portions of the image left untouched, and white represents portions of the image to be denoised. This was changed to be more in line with how other UIs display inpainting masks and how they are used in the backend.
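
If you have masks saved from a previous ENFUGUE version, a simple sketch of converting them to the new convention (the file names are placeholders):

from PIL import Image, ImageOps

# Old convention: white = keep, black = denoise.
# New convention: black = keep, white = denoise. Inverting the mask converts it.
old_mask = Image.open("old_mask.png").convert("L")
ImageOps.invert(old_mask).save("new_mask.png")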

More Changes

  • To help alleviate confusion, numerous menus that were previously collapsed are now expanded by default.
  • Two quick options for adjusting the size of an image and the canvas have been added to the toolbar. One scales the element to the size of the canvas, and one scales the canvas to the size of the image in the element.
  • The field previously called Engine Size is now called Tile Size.
  • The field previously called Chunking Size is now called Tile Stride.
  • The field previously called Chunking Mask is now called Tile Mask.
  • The IP Adapter model type is now a dropdown selection instead of checkboxes that appear and disappear.

The new options for resizing canvas elements.


Tiling

Tiling has been added to ENFUGUE. Select between horizontal tiling, vertical tiling, or both. It even works with animation!

Select the "display tiled" icon in the sample chooser to see what the image looks like next to itself.
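
The preview is conceptually just the image repeated next to itself; a quick sketch of checking a result outside the UI (the file name is a placeholder):

import numpy as np
from PIL import Image

image = np.array(Image.open("tiled_output.png"))
preview = np.tile(image, (2, 2, 1))  # repeat the image in a 2x2 grid
Image.fromarray(preview).save("tiled_preview.png")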


The tiling options in the UI.


A horizontally tiling image.


A vertically tiling image.


A horizontally and vertically tiling image.


Notes

Reporting Bugs, Troubleshooting

There are many, many changes in this release, so it is likely that bugs will be encountered on different operating systems, browsers, GPUs, and workflows. Please see this Wiki page for the information requested when submitting bug reports, as well as where logs can be located for self-diagnosis.

TensorRT Builds Suspended Indefinitely

TensorRT-specific builds will no longer be released. These have led to significant amounts of confusion over the months, with very few people being able to make use of TensorRT.

It will remain available for the workflows it was previously available for, but you will need to install enfugue using one of the provided Conda environments or into a different latent diffusion Python environment via pip - see below for full instructions.

Pending MacOS Build

The MacOS build of v0.3.0 is pending. There have been difficulties finding a set of compatible dependencies, but it will be available soon. I apologize for the delay. You are welcome to try installing using the provided Conda environment - full instructions below.

Full Changelog: 0.2.5...0.3.0

How-To Guide

Installing and Running: Portable Distributions

Select a portable distribution if you'd like to avoid having to install other programs, or want to have an isolated executable file that doesn't interfere with other environments on your system.

Summary

Platform  Graphics API  File(s)                                              CUDA Version  Torch Version
Windows   CUDA          enfugue-server-0.3.0-win-cuda-x86_64.zip.001         11.8.0        2.1.0
                        enfugue-server-0.3.0-win-cuda-x86_64.zip.002
Linux     CUDA          enfugue-server-0.3.0-manylinux-cuda-x86_64.tar.gz.0  11.8.0        2.1.0
                        enfugue-server-0.3.0-manylinux-cuda-x86_64.tar.gz.1
                        enfugue-server-0.3.0-manylinux-cuda-x86_64.tar.gz.2

Linux

To extract these files, you must concatenate them. Rather than taking up space in your file system, you can simply stream them together to tar. A console command to do that is:

cat enfugue-server-0.3.0* | tar -xvz

You are now ready to run the server with:

./enfugue-server/enfugue.sh

Press Ctrl+C to exit.

Windows

Download the win64 files listed above, and extract them using a program which allows extracting from multiple archives, such as 7-Zip.

If you are using 7-Zip, you should not extract both files independently. If they are in the same directory when you unzip the first, 7-Zip will automatically unzip the second. The second file cannot be extracted on its own.

Locate the file enfugue-server.exe, and double-click it to run it. To exit, locate the icon in the bottom-right hand corner of your screen (the system tray) and right-click it, then select Quit.

Installing and Running: Conda

To install with the provided Conda environments, you need to install a version of Conda.

After installing Conda and configuring it so it is available to your shell or command-line, download one of the environment files depending on your platform and graphics API.

  1. First, choose windows-, linux- or macos- based on your platform.
  2. Then, choose your graphics API:
    • If you are on MacOS, you only have access to MPS.
    • If you have an Nvidia GPU or other CUDA-compatible device, select cuda.
    • Additional graphics APIs (rocm and directml) are being added and will be made available as they are developed. Please voice your desire for these to prioritize their development.

Finally, using the file you downloaded, create your Conda environment:

conda env create -f <downloaded_file.yml>

You've now installed Enfugue and all dependencies. To run it, activate the environment and then run the installed binary.

conda activate enfugue
python -m enfugue run

NOTE: The previously recommended command, enfugue run, has been observed to fail in certain environments. For this reason it is recommended to use the above more universally-compatible command.

Optional: DWPose Support

To install DW Pose support (a better, faster pose and face detection model), after installing Enfugue, execute the following (MacOS, Linux or Windows):

mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"

Installing and Running: Self-Managed Environment

If you would like to manage dependencies yourself, or want to install Enfugue into an environment shared with another Stable Diffusion UI, you can install enfugue via pip. This is the only method currently available for AMD GPUs.

pip install enfugue

If you are on Linux and want TensorRT support, execute:

pip install enfugue[tensorrt]

If you are on Windows and want TensorRT support, follow the steps detailed here.

Thank you!