diff --git a/README.md b/README.md index dac3b3598aaf..68202ba095ee 100644 --- a/README.md +++ b/README.md @@ -37,7 +37,7 @@ limitations under the License. ## Installation -We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/) and [Flax](https://flax.readthedocs.io/en/latest/#installation), please refer to their official documentation. +We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/), please refer to their official documentation. ### PyTorch @@ -53,14 +53,6 @@ With `conda` (maintained by the community): conda install -c conda-forge diffusers ``` -### Flax - -With `pip` (official package): - -```bash -pip install --upgrade diffusers[flax] -``` - ### Apple Silicon (M1/M2) support Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggingface.co/docs/diffusers/optimization/mps) guide. diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index 18adba92235e..bf7f9c1354ff 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -194,8 +194,6 @@ - title: Model accelerators and hardware isExpanded: false sections: - - local: using-diffusers/stable_diffusion_jax_how_to - title: JAX/Flax - local: optimization/onnx title: ONNX - local: optimization/open_vino diff --git a/docs/source/en/api/models/autoencoderkl.md b/docs/source/en/api/models/autoencoderkl.md index baeab4017be3..3d949e9bb06c 100644 --- a/docs/source/en/api/models/autoencoderkl.md +++ b/docs/source/en/api/models/autoencoderkl.md @@ -44,15 +44,3 @@ model = AutoencoderKL.from_single_file(url) ## DecoderOutput [[autodoc]] models.autoencoders.vae.DecoderOutput - -## FlaxAutoencoderKL - -[[autodoc]] FlaxAutoencoderKL - -## FlaxAutoencoderKLOutput - -[[autodoc]] models.vae_flax.FlaxAutoencoderKLOutput - -## FlaxDecoderOutput - -[[autodoc]] models.vae_flax.FlaxDecoderOutput diff --git a/docs/source/en/api/models/controlnet.md b/docs/source/en/api/models/controlnet.md index 7ce14f17d56a..f56b7383a0d7 100644 --- a/docs/source/en/api/models/controlnet.md +++ b/docs/source/en/api/models/controlnet.md @@ -40,11 +40,3 @@ pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=contro ## ControlNetOutput [[autodoc]] models.controlnets.controlnet.ControlNetOutput - -## FlaxControlNetModel - -[[autodoc]] FlaxControlNetModel - -## FlaxControlNetOutput - -[[autodoc]] models.controlnets.controlnet_flax.FlaxControlNetOutput diff --git a/docs/source/en/api/models/overview.md b/docs/source/en/api/models/overview.md index 1c6a2092e684..eb9722739f99 100644 --- a/docs/source/en/api/models/overview.md +++ b/docs/source/en/api/models/overview.md @@ -19,10 +19,6 @@ All models are built from the base [`ModelMixin`] class which is a [`torch.nn.Mo ## ModelMixin [[autodoc]] ModelMixin -## FlaxModelMixin - -[[autodoc]] FlaxModelMixin - ## PushToHubMixin [[autodoc]] utils.PushToHubMixin diff --git a/docs/source/en/api/models/unet2d-cond.md b/docs/source/en/api/models/unet2d-cond.md index 175fb1122019..99a7c41ab286 100644 --- a/docs/source/en/api/models/unet2d-cond.md +++ b/docs/source/en/api/models/unet2d-cond.md @@ -23,9 +23,3 @@ The abstract from the paper is: ## UNet2DConditionOutput [[autodoc]] models.unets.unet_2d_condition.UNet2DConditionOutput - -## FlaxUNet2DConditionModel -[[autodoc]] models.unets.unet_2d_condition_flax.FlaxUNet2DConditionModel - -## 
FlaxUNet2DConditionOutput -[[autodoc]] models.unets.unet_2d_condition_flax.FlaxUNet2DConditionOutput diff --git a/docs/source/en/api/outputs.md b/docs/source/en/api/outputs.md index bed92f10f94a..a13bc89f2bea 100644 --- a/docs/source/en/api/outputs.md +++ b/docs/source/en/api/outputs.md @@ -54,10 +54,6 @@ To check a specific pipeline or model output, refer to its corresponding API doc [[autodoc]] pipelines.ImagePipelineOutput -## FlaxImagePipelineOutput - -[[autodoc]] pipelines.pipeline_flax_utils.FlaxImagePipelineOutput - ## AudioPipelineOutput [[autodoc]] pipelines.AudioPipelineOutput diff --git a/docs/source/en/api/pipelines/controlnet.md b/docs/source/en/api/pipelines/controlnet.md index eea3473d3609..2a654a37357f 100644 --- a/docs/source/en/api/pipelines/controlnet.md +++ b/docs/source/en/api/pipelines/controlnet.md @@ -72,11 +72,3 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) ## StableDiffusionPipelineOutput [[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput - -## FlaxStableDiffusionControlNetPipeline -[[autodoc]] FlaxStableDiffusionControlNetPipeline - - all - - __call__ - -## FlaxStableDiffusionControlNetPipelineOutput -[[autodoc]] pipelines.stable_diffusion.FlaxStableDiffusionPipelineOutput diff --git a/docs/source/en/api/pipelines/overview.md b/docs/source/en/api/pipelines/overview.md index b5e3825fef6d..d3cc318a5459 100644 --- a/docs/source/en/api/pipelines/overview.md +++ b/docs/source/en/api/pipelines/overview.md @@ -106,10 +106,6 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an [[autodoc]] pipelines.StableDiffusionMixin.disable_freeu -## FlaxDiffusionPipeline - -[[autodoc]] pipelines.pipeline_flax_utils.FlaxDiffusionPipeline - ## PushToHubMixin [[autodoc]] utils.PushToHubMixin diff --git a/docs/source/en/api/pipelines/stable_diffusion/img2img.md b/docs/source/en/api/pipelines/stable_diffusion/img2img.md index 32d83daadfb9..bec67b4f4e89 100644 --- a/docs/source/en/api/pipelines/stable_diffusion/img2img.md +++ b/docs/source/en/api/pipelines/stable_diffusion/img2img.md @@ -47,13 +47,3 @@ Make sure to check out the Stable Diffusion [Tips](overview#tips) section to lea ## StableDiffusionPipelineOutput [[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput - -## FlaxStableDiffusionImg2ImgPipeline - -[[autodoc]] FlaxStableDiffusionImg2ImgPipeline - - all - - __call__ - -## FlaxStableDiffusionPipelineOutput - -[[autodoc]] pipelines.stable_diffusion.FlaxStableDiffusionPipelineOutput diff --git a/docs/source/en/api/pipelines/stable_diffusion/inpaint.md b/docs/source/en/api/pipelines/stable_diffusion/inpaint.md index eee794fe6a9e..0b558b2fc01d 100644 --- a/docs/source/en/api/pipelines/stable_diffusion/inpaint.md +++ b/docs/source/en/api/pipelines/stable_diffusion/inpaint.md @@ -49,13 +49,3 @@ If you're interested in using one of the official checkpoints for a task, explor ## StableDiffusionPipelineOutput [[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput - -## FlaxStableDiffusionInpaintPipeline - -[[autodoc]] FlaxStableDiffusionInpaintPipeline - - all - - __call__ - -## FlaxStableDiffusionPipelineOutput - -[[autodoc]] pipelines.stable_diffusion.FlaxStableDiffusionPipelineOutput diff --git a/docs/source/en/api/pipelines/stable_diffusion/text2img.md b/docs/source/en/api/pipelines/stable_diffusion/text2img.md index 5cd942395de7..c17348c8ffd4 100644 --- a/docs/source/en/api/pipelines/stable_diffusion/text2img.md +++ 
b/docs/source/en/api/pipelines/stable_diffusion/text2img.md @@ -51,13 +51,3 @@ If you're interested in using one of the official checkpoints for a task, explor ## StableDiffusionPipelineOutput [[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput - -## FlaxStableDiffusionPipeline - -[[autodoc]] FlaxStableDiffusionPipeline - - all - - __call__ - -## FlaxStableDiffusionPipelineOutput - -[[autodoc]] pipelines.stable_diffusion.FlaxStableDiffusionPipelineOutput diff --git a/docs/source/en/installation.md b/docs/source/en/installation.md index 179efb510be7..abde3251de27 100644 --- a/docs/source/en/installation.md +++ b/docs/source/en/installation.md @@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License. # Installation -Diffusers is tested on Python 3.8+, PyTorch 1.4+, and Flax 0.4.1+. Follow the installation instructions for the deep learning library you're using, [PyTorch](https://pytorch.org/get-started/locally/) or [Flax](https://flax.readthedocs.io/en/latest/). +Diffusers is tested on Python 3.8+ and PyTorch 1.4+. Install [PyTorch](https://pytorch.org/get-started/locally/) according to your system and setup. Create a [virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/) for easier management of separate projects and to avoid compatibility issues between dependencies. Use [uv](https://docs.astral.sh/uv/), a Rust-based Python package and project manager, to create a virtual environment and install Diffusers. @@ -32,12 +32,6 @@ PyTorch only supports Python 3.8 - 3.11 on Windows. uv pip install diffusers["torch"] transformers ``` -Use the command below for Flax. - -```bash -uv pip install diffusers["flax"] transformers -``` - @@ -71,27 +65,12 @@ An editable install is recommended for development workflows or if you're using Clone the repository and install Diffusers with the following commands. - - - ```bash git clone https://github.com/huggingface/diffusers.git cd diffusers uv pip install -e ".[torch]" ``` - - - -```bash -git clone https://github.com/huggingface/diffusers.git -cd diffusers -uv pip install -e ".[flax]" -``` - - - - > [!WARNING] > You must keep the `diffusers` folder if you want to keep using the library with the editable install. @@ -140,7 +119,7 @@ For more details about managing and cleaning the cache, take a look at the [Unde ## Telemetry logging Diffusers gathers telemetry information during [`~DiffusionPipeline.from_pretrained`] requests. -The data gathered includes the Diffusers and PyTorch/Flax version, the requested model or pipeline class, +The data gathered includes the Diffusers and PyTorch version, the requested model or pipeline class, and the path to a pretrained checkpoint if it is hosted on the Hub. This usage data helps us debug issues and prioritize new features. diff --git a/docs/source/en/training/controlnet.md b/docs/source/en/training/controlnet.md index 0170ff3da9ea..17da819db84b 100644 --- a/docs/source/en/training/controlnet.md +++ b/docs/source/en/training/controlnet.md @@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License. [ControlNet](https://hf.co/papers/2302.05543) models are adapters trained on top of another pretrained model. It allows for a greater degree of control over image generation by conditioning the model with an additional input image. The input image can be a canny edge, depth map, human pose, and many more. 
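To illustrate that conditioning mechanism, here is a minimal, hedged sketch of ControlNet inference with a canny-edge adapter. The checkpoint names follow the common public ones, and `canny_edges.png` is a hypothetical placeholder for an edge map you have already prepared.

```py
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# load a canny-edge ControlNet adapter on top of the base Stable Diffusion model
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# "canny_edges.png" is a placeholder for a conditioning image (an edge map) prepared beforehand
canny_image = load_image("canny_edges.png")

# the conditioning image steers the layout of the generated image
image = pipeline("a red circle on a blue background", image=canny_image, num_inference_steps=20).images[0]
image.save("controlnet-output.png")
```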
-If you're training on a GPU with limited vRAM, you should try enabling the `gradient_checkpointing`, `gradient_accumulation_steps`, and `mixed_precision` parameters in the training command. You can also reduce your memory footprint by using memory-efficient attention with [xFormers](../optimization/xformers). JAX/Flax training is also supported for efficient training on TPUs and GPUs, but it doesn't support gradient checkpointing or xFormers. You should have a GPU with >30GB of memory if you want to train faster with Flax. +If you're training on a GPU with limited vRAM, you should try enabling the `gradient_checkpointing`, `gradient_accumulation_steps`, and `mixed_precision` parameters in the training command. You can also reduce your memory footprint by using memory-efficient attention with [xFormers](../optimization/xformers). This guide will explore the [train_controlnet.py](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py) training script to help you become familiar with it, and how you can adapt it for your own use-case. @@ -28,45 +28,10 @@ pip install . Then navigate to the example folder containing the training script and install the required dependencies for the script you're using: - - ```bash cd examples/controlnet pip install -r requirements.txt ``` - - - -If you have access to a TPU, the Flax training script runs even faster! Let's run the training script on the [Google Cloud TPU VM](https://cloud.google.com/tpu/docs/run-calculation-jax). Create a single TPU v4-8 VM and connect to it: - -```bash -ZONE=us-central2-b -TPU_TYPE=v4-8 -VM_NAME=hg_flax - -gcloud alpha compute tpus tpu-vm create $VM_NAME \ - --zone $ZONE \ - --accelerator-type $TPU_TYPE \ - --version tpu-vm-v4-base - -gcloud alpha compute tpus tpu-vm ssh $VM_NAME --zone $ZONE -- \ -``` - -Install JAX 0.4.5: - -```bash -pip install "jax[tpu]==0.4.5" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html -``` - -Then install the required dependencies for the Flax script: - -```bash -cd examples/controlnet -pip install -r requirements_flax.txt -``` - - - @@ -120,7 +85,7 @@ Many of the basic and important parameters are described in the [Text-to-image]( ### Min-SNR weighting -The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch and is unavailable in the Flax training script. +The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch. Add the `--snr_gamma` parameter and set it to the recommended value of 5.0: @@ -272,9 +237,6 @@ That's it! You don't need to add any additional parameters to your training comm - - - ```bash export MODEL_DIR="stable-diffusion-v1-5/stable-diffusion-v1-5" export OUTPUT_DIR="path/to/save/model" @@ -292,47 +254,6 @@ accelerate launch train_controlnet.py \ --push_to_hub ``` - - - -With Flax, you can [profile your code](https://jax.readthedocs.io/en/latest/profiling.html) by adding the `--profile_steps==5` parameter to your training command. 
Install the Tensorboard profile plugin: - -```bash -pip install tensorflow tensorboard-plugin-profile -tensorboard --logdir runs/fill-circle-100steps-20230411_165612/ -``` - -Then you can inspect the profile at [http://localhost:6006/#profile](http://localhost:6006/#profile). - - - -If you run into version conflicts with the plugin, try uninstalling and reinstalling all versions of TensorFlow and Tensorboard. The debugging functionality of the profile plugin is still experimental, and not all views are fully functional. The `trace_viewer` cuts off events after 1M, which can result in all your device traces getting lost if for example, you profile the compilation step by accident. - - - -```bash -python3 train_controlnet_flax.py \ - --pretrained_model_name_or_path=$MODEL_DIR \ - --output_dir=$OUTPUT_DIR \ - --dataset_name=fusing/fill50k \ - --resolution=512 \ - --learning_rate=1e-5 \ - --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \ - --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \ - --validation_steps=1000 \ - --train_batch_size=2 \ - --revision="non-ema" \ - --from_pt \ - --report_to="wandb" \ - --tracker_project_name=$HUB_MODEL_ID \ - --num_train_epochs=11 \ - --push_to_hub \ - --hub_model_id=$HUB_MODEL_ID -``` - - - - Once training is complete, you can use your newly trained model for inference! ```py diff --git a/docs/source/en/training/dreambooth.md b/docs/source/en/training/dreambooth.md index cff2bb500dab..3a5ba5aa39c3 100644 --- a/docs/source/en/training/dreambooth.md +++ b/docs/source/en/training/dreambooth.md @@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License. [DreamBooth](https://huggingface.co/papers/2208.12242) is a training technique that updates the entire diffusion model by training on just a few images of a subject or style. It works by associating a special word in the prompt with the example images. -If you're training on a GPU with limited vRAM, you should try enabling the `gradient_checkpointing` and `mixed_precision` parameters in the training command. You can also reduce your memory footprint by using memory-efficient attention with [xFormers](../optimization/xformers). JAX/Flax training is also supported for efficient training on TPUs and GPUs, but it doesn't support gradient checkpointing or xFormers. You should have a GPU with >30GB of memory if you want to train faster with Flax. +If you're training on a GPU with limited vRAM, you should try enabling the `gradient_checkpointing` and `mixed_precision` parameters in the training command. You can also reduce your memory footprint by using memory-efficient attention with [xFormers](../optimization/xformers). This guide will explore the [train_dreambooth.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) script to help you become more familiar with it, and how you can adapt it for your own use-case. @@ -28,25 +28,11 @@ pip install . Navigate to the example folder with the training script and install the required dependencies for the script you're using: - - - ```bash cd examples/dreambooth pip install -r requirements.txt ``` - - - -```bash -cd examples/dreambooth -pip install -r requirements_flax.txt -``` - - - - 🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed-precision. It'll automatically configure your training setup based on your hardware and environment. 
Take a look at the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more. @@ -110,7 +96,7 @@ Some basic and important parameters to know and specify are: ### Min-SNR weighting -The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch and is unavailable in the Flax training script. +The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch. Add the `--snr_gamma` parameter and set it to the recommended value of 5.0: @@ -311,9 +297,6 @@ That's it! You don't need to add any additional parameters to your training comm - - - ```bash export MODEL_NAME="stable-diffusion-v1-5/stable-diffusion-v1-5" export INSTANCE_DIR="./dog" @@ -334,29 +317,6 @@ accelerate launch train_dreambooth.py \ --push_to_hub ``` - - - -```bash -export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" -export INSTANCE_DIR="./dog" -export OUTPUT_DIR="path-to-save-model" - -python train_dreambooth_flax.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --instance_data_dir=$INSTANCE_DIR \ - --output_dir=$OUTPUT_DIR \ - --instance_prompt="a photo of sks dog" \ - --resolution=512 \ - --train_batch_size=1 \ - --learning_rate=5e-6 \ - --max_train_steps=400 \ - --push_to_hub -``` - - - - Once training is complete, you can use your newly trained model for inference! @@ -383,9 +343,6 @@ image.save("dog-bucket.png") - - - ```py from diffusers import DiffusionPipeline import torch @@ -395,39 +352,6 @@ image = pipeline("A photo of sks dog in a bucket", num_inference_steps=50, guida image.save("dog-bucket.png") ``` - - - -```py -import jax -import numpy as np -from flax.jax_utils import replicate -from flax.training.common_utils import shard -from diffusers import FlaxStableDiffusionPipeline - -pipeline, params = FlaxStableDiffusionPipeline.from_pretrained("path-to-your-trained-model", dtype=jax.numpy.bfloat16) - -prompt = "A photo of sks dog in a bucket" -prng_seed = jax.random.PRNGKey(0) -num_inference_steps = 50 - -num_samples = jax.device_count() -prompt = num_samples * [prompt] -prompt_ids = pipeline.prepare_inputs(prompt) - -# shard inputs and rng -params = replicate(params) -prng_seed = jax.random.split(prng_seed, jax.device_count()) -prompt_ids = shard(prompt_ids) - -images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images -images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) -image.save("dog-bucket.png") -``` - - - - ## LoRA LoRA is a training technique for significantly reducing the number of trainable parameters. As a result, training is faster and it is easier to store the resulting weights because they are a lot smaller (~100MBs). Use the [train_dreambooth_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py) script to train with LoRA. 
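After `train_dreambooth_lora.py` finishes, the LoRA weights can be loaded on top of the base pipeline for inference. The sketch below assumes a hypothetical output directory; point it at wherever your weights were actually saved.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "path/to/dreambooth-lora-output" is a placeholder for the directory written by train_dreambooth_lora.py
pipeline.load_lora_weights("path/to/dreambooth-lora-output")

image = pipeline("A photo of sks dog in a bucket", num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dog-bucket-lora.png")
```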
diff --git a/docs/source/en/training/kandinsky.md b/docs/source/en/training/kandinsky.md index 77f7af03b801..561bc1c351b7 100644 --- a/docs/source/en/training/kandinsky.md +++ b/docs/source/en/training/kandinsky.md @@ -88,7 +88,7 @@ Most of the parameters are identical to the parameters in the [Text-to-image](te ### Min-SNR weighting -The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch and is unavailable in the Flax training script. +The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch. Add the `--snr_gamma` parameter and set it to the recommended value of 5.0: diff --git a/docs/source/en/training/lora.md b/docs/source/en/training/lora.md index 9a3512dd76df..e97d8acdac46 100644 --- a/docs/source/en/training/lora.md +++ b/docs/source/en/training/lora.md @@ -38,25 +38,11 @@ pip install . Navigate to the example folder with the training script and install the required dependencies for the script you're using: - - - ```bash cd examples/text_to_image pip install -r requirements.txt ``` - - - -```bash -cd examples/text_to_image -pip install -r requirements_flax.txt -``` - - - - 🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed-precision. It'll automatically configure your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more. 
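To give a sense of what 🤗 Accelerate handles for these training scripts, here is a minimal, self-contained sketch with a toy model and random data standing in for the real training setup; the `Accelerator` takes care of device placement, gradient accumulation, and (on supported hardware) mixed precision.

```py
import torch
from accelerate import Accelerator

# toy model and data standing in for the real training setup;
# pass mixed_precision="fp16" on GPUs that support it
accelerator = Accelerator(gradient_accumulation_steps=4)
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = torch.utils.data.DataLoader(torch.randn(16, 4), batch_size=4)

# prepare() moves everything to the right device(s) and wraps them for distributed training
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    with accelerator.accumulate(model):      # gradient accumulation handled by Accelerate
        loss = model(batch).pow(2).mean()    # dummy loss for illustration
        accelerator.backward(loss)           # replaces loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```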
diff --git a/docs/source/en/training/overview.md b/docs/source/en/training/overview.md index 032900d9ac20..55d6b1966137 100644 --- a/docs/source/en/training/overview.md +++ b/docs/source/en/training/overview.md @@ -23,18 +23,18 @@ Each training script is: Our current collection of training scripts include: -| Training | SDXL-support | LoRA-support | Flax-support | -|---|---|---|---| -| [unconditional image generation](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) | | | | -| [text-to-image](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image) | 👍 | 👍 | 👍 | -| [textual inversion](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) | | | 👍 | -| [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb) | 👍 | 👍 | 👍 | -| [ControlNet](https://github.com/huggingface/diffusers/tree/main/examples/controlnet) | 👍 | | 👍 | -| [InstructPix2Pix](https://github.com/huggingface/diffusers/tree/main/examples/instruct_pix2pix) | 👍 | | | -| [Custom Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/custom_diffusion) | | | | -| [T2I-Adapters](https://github.com/huggingface/diffusers/tree/main/examples/t2i_adapter) | 👍 | | | -| [Kandinsky 2.2](https://github.com/huggingface/diffusers/tree/main/examples/kandinsky2_2/text_to_image) | | 👍 | | -| [Wuerstchen](https://github.com/huggingface/diffusers/tree/main/examples/wuerstchen/text_to_image) | | 👍 | | +| Training | SDXL-support | LoRA-support | +|---|---|---| +| [unconditional image generation](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) | | | +| [text-to-image](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image) | 👍 | 👍 | +| [textual inversion](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) | | | +| [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb) | 👍 | 👍 | +| [ControlNet](https://github.com/huggingface/diffusers/tree/main/examples/controlnet) | 👍 | | +| [InstructPix2Pix](https://github.com/huggingface/diffusers/tree/main/examples/instruct_pix2pix) | 👍 | | +| [Custom Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/custom_diffusion) | | | +| [T2I-Adapters](https://github.com/huggingface/diffusers/tree/main/examples/t2i_adapter) | 👍 | | +| 
[Kandinsky 2.2](https://github.com/huggingface/diffusers/tree/main/examples/kandinsky2_2/text_to_image) | | 👍 | +| [Wuerstchen](https://github.com/huggingface/diffusers/tree/main/examples/wuerstchen/text_to_image) | | 👍 | These examples are **actively** maintained, so please feel free to open an issue if they aren't working as expected. If you feel like another training example should be included, you're more than welcome to start a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) to discuss your feature idea with us and whether it meets our criteria of being self-contained, easy-to-tweak, beginner-friendly, and single-purpose. @@ -48,7 +48,7 @@ cd diffusers pip install . ``` -Then navigate to the folder of the training script (for example, [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth)) and install the `requirements.txt` file. Some training scripts have a specific requirement file for SDXL, LoRA or Flax. If you're using one of these scripts, make sure you install its corresponding requirements file. +Then navigate to the folder of the training script (for example, [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth)) and install the `requirements.txt` file. Some training scripts have a specific requirement file for SDXL or LoRA. If you're using one of these scripts, make sure you install its corresponding requirements file. ```bash cd examples/dreambooth diff --git a/docs/source/en/training/sdxl.md b/docs/source/en/training/sdxl.md index da8b93b6d690..12051b7c2d11 100644 --- a/docs/source/en/training/sdxl.md +++ b/docs/source/en/training/sdxl.md @@ -96,7 +96,7 @@ Most of the parameters are identical to the parameters in the [Text-to-image](te ### Min-SNR weighting -The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting either `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch and is unavailable in the Flax training script. +The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting either `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch. Add the `--snr_gamma` parameter and set it to the recommended value of 5.0: diff --git a/docs/source/en/training/text2image.md b/docs/source/en/training/text2image.md index 182621e89bdf..5212fe8393bc 100644 --- a/docs/source/en/training/text2image.md +++ b/docs/source/en/training/text2image.md @@ -20,7 +20,7 @@ The text-to-image script is experimental, and it's easy to overfit and run into Text-to-image models like Stable Diffusion are conditioned to generate images given a text prompt. -Training a model can be taxing on your hardware, but if you enable `gradient_checkpointing` and `mixed_precision`, it is possible to train a model on a single 24GB GPU. If you're training with larger batch sizes or want to train faster, it's better to use GPUs with more than 30GB of memory. You can reduce your memory footprint by enabling memory-efficient attention with [xFormers](../optimization/xformers). 
JAX/Flax training is also supported for efficient training on TPUs and GPUs, but it doesn't support gradient checkpointing, gradient accumulation or xFormers. A GPU with at least 30GB of memory or a TPU v3 is recommended for training with Flax. +Training a model can be taxing on your hardware, but if you enable `gradient_checkpointing` and `mixed_precision`, it is possible to train a model on a single 24GB GPU. If you're training with larger batch sizes or want to train faster, it's better to use GPUs with more than 30GB of memory. You can reduce your memory footprint by enabling memory-efficient attention with [xFormers](../optimization/xformers). This guide will explore the [train_text_to_image.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) training script to help you become familiar with it, and how you can adapt it for your own use-case. @@ -34,20 +34,10 @@ pip install . Then navigate to the example folder containing the training script and install the required dependencies for the script you're using: - - ```bash cd examples/text_to_image pip install -r requirements.txt ``` - - -```bash -cd examples/text_to_image -pip install -r requirements_flax.txt -``` - - @@ -106,7 +96,7 @@ Some basic and important parameters include: ### Min-SNR weighting -The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch and is unavailable in the Flax training script. +The [Min-SNR](https://huggingface.co/papers/2303.09556) weighting strategy can help with training by rebalancing the loss to achieve faster convergence. The training script supports predicting `epsilon` (noise) or `v_prediction`, but Min-SNR is compatible with both prediction types. This weighting strategy is only supported by PyTorch. Add the `--snr_gamma` parameter and set it to the recommended value of 5.0: @@ -155,9 +145,6 @@ Lastly, the [training loop](https://github.com/huggingface/diffusers/blob/8959c5 Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀 - - - Let's train on the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset to generate your own Naruto characters. Set the environment variables `MODEL_NAME` and `dataset_name` to the model and the dataset (either from the Hub or a local path). If you're training on more than one GPU, add the `--multi_gpu` parameter to the `accelerate launch` command. @@ -187,43 +174,8 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \ --push_to_hub ``` - - - -Training with Flax can be faster on TPUs and GPUs thanks to [@duongna211](https://github.com/duongna21). Flax is more efficient on a TPU, but GPU performance is also great. - -Set the environment variables `MODEL_NAME` and `dataset_name` to the model and the dataset (either from the Hub or a local path). - - - -To train on a local dataset, set the `TRAIN_DIR` and `OUTPUT_DIR` environment variables to the path of the dataset and where to save the model to. 
- - - -```bash -export MODEL_NAME="stable-diffusion-v1-5/stable-diffusion-v1-5" -export dataset_name="lambdalabs/naruto-blip-captions" - -python train_text_to_image_flax.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --dataset_name=$dataset_name \ - --resolution=512 --center_crop --random_flip \ - --train_batch_size=1 \ - --max_train_steps=15000 \ - --learning_rate=1e-05 \ - --max_grad_norm=1 \ - --output_dir="sd-naruto-model" \ - --push_to_hub -``` - - - - Once training is complete, you can use your newly trained model for inference: - - - ```py from diffusers import StableDiffusionPipeline import torch @@ -234,39 +186,6 @@ image = pipeline(prompt="yoda").images[0] image.save("yoda-naruto.png") ``` - - - -```py -import jax -import numpy as np -from flax.jax_utils import replicate -from flax.training.common_utils import shard -from diffusers import FlaxStableDiffusionPipeline - -pipeline, params = FlaxStableDiffusionPipeline.from_pretrained("path/to/saved_model", dtype=jax.numpy.bfloat16) - -prompt = "yoda naruto" -prng_seed = jax.random.PRNGKey(0) -num_inference_steps = 50 - -num_samples = jax.device_count() -prompt = num_samples * [prompt] -prompt_ids = pipeline.prepare_inputs(prompt) - -# shard inputs and rng -params = replicate(params) -prng_seed = jax.random.split(prng_seed, jax.device_count()) -prompt_ids = shard(prompt_ids) - -images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images -images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) -image.save("yoda-naruto.png") -``` - - - - ## Next steps Congratulations on training your own text-to-image model! To learn more about how to use your new model, the following guides may be helpful: diff --git a/docs/source/en/training/text_inversion.md b/docs/source/en/training/text_inversion.md index b7083ae589ed..91af2f6afb81 100644 --- a/docs/source/en/training/text_inversion.md +++ b/docs/source/en/training/text_inversion.md @@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License. [Textual Inversion](https://hf.co/papers/2208.01618) is a training technique for personalizing image generation models with just a few example images of what you want it to learn. This technique works by learning and updating the text embeddings (the new embeddings are tied to a special word you must use in the prompt) to match the example images you provide. -If you're training on a GPU with limited vRAM, you should try enabling the `gradient_checkpointing` and `mixed_precision` parameters in the training command. You can also reduce your memory footprint by using memory-efficient attention with [xFormers](../optimization/xformers). JAX/Flax training is also supported for efficient training on TPUs and GPUs, but it doesn't support gradient checkpointing or xFormers. With the same configuration and setup as PyTorch, the Flax training script should be at least ~70% faster! +If you're training on a GPU with limited vRAM, you should try enabling the `gradient_checkpointing` and `mixed_precision` parameters in the training command. You can also reduce your memory footprint by using memory-efficient attention with [xFormers](../optimization/xformers). This guide will explore the [textual_inversion.py](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py) script to help you become more familiar with it, and how you can adapt it for your own use-case. @@ -28,25 +28,10 @@ pip install . 
Navigate to the example folder with the training script and install the required dependencies for the script you're using: - - - ```bash cd examples/textual_inversion pip install -r requirements.txt ``` - - - - -```bash -cd examples/textual_inversion -pip install -r requirements_flax.txt -``` - - - - 🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed-precision. It'll automatically configure your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more. @@ -189,9 +174,6 @@ One more thing before you launch the script. If you're interested in following a --validation_steps=100 ``` - - - ```bash export MODEL_NAME="stable-diffusion-v1-5/stable-diffusion-v1-5" export DATA_DIR="./cat" @@ -214,36 +196,8 @@ accelerate launch textual_inversion.py \ --push_to_hub ``` - - - -```bash -export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" -export DATA_DIR="./cat" - -python textual_inversion_flax.py \ - --pretrained_model_name_or_path=$MODEL_NAME \ - --train_data_dir=$DATA_DIR \ - --learnable_property="object" \ - --placeholder_token="" \ - --initializer_token="toy" \ - --resolution=512 \ - --train_batch_size=1 \ - --max_train_steps=3000 \ - --learning_rate=5.0e-04 \ - --scale_lr \ - --output_dir="textual_inversion_cat" \ - --push_to_hub -``` - - - - After training is complete, you can use your newly trained model for inference like: - - - ```py from diffusers import StableDiffusionPipeline import torch @@ -254,42 +208,6 @@ image = pipeline("A train", num_inference_steps=50).images[0] image.save("cat-train.png") ``` - - - -Flax doesn't support the [`~loaders.TextualInversionLoaderMixin.load_textual_inversion`] method, but the textual_inversion_flax.py script [saves](https://github.com/huggingface/diffusers/blob/c0f058265161178f2a88849e92b37ffdc81f1dcc/examples/textual_inversion/textual_inversion_flax.py#L636C2-L636C2) the learned embeddings as a part of the model after training. This means you can use the model for inference like any other Flax model: - -```py -import jax -import numpy as np -from flax.jax_utils import replicate -from flax.training.common_utils import shard -from diffusers import FlaxStableDiffusionPipeline - -model_path = "path-to-your-trained-model" -pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(model_path, dtype=jax.numpy.bfloat16) - -prompt = "A train" -prng_seed = jax.random.PRNGKey(0) -num_inference_steps = 50 - -num_samples = jax.device_count() -prompt = num_samples * [prompt] -prompt_ids = pipeline.prepare_inputs(prompt) - -# shard inputs and rng -params = replicate(params) -prng_seed = jax.random.split(prng_seed, jax.device_count()) -prompt_ids = shard(prompt_ids) - -images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images -images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) -image.save("cat-train.png") -``` - - - - ## Next steps Congratulations on training your own Textual Inversion model! 🎉 To learn more about how to use your new model, the following guides may be helpful: diff --git a/docs/source/en/using-diffusers/schedulers.md b/docs/source/en/using-diffusers/schedulers.md index aabb9dd31c96..6d928f8037c4 100644 --- a/docs/source/en/using-diffusers/schedulers.md +++ b/docs/source/en/using-diffusers/schedulers.md @@ -165,53 +165,6 @@ image Most images look very similar and are comparable in quality. 
Again, it often comes down to your specific use case so a good approach is to run multiple different schedulers and compare the results. -### Flax schedulers - -To compare Flax schedulers, you need to additionally load the scheduler state into the model parameters. For example, let's change the default scheduler in [`FlaxStableDiffusionPipeline`] to use the super fast [`FlaxDPMSolverMultistepScheduler`]. - -> [!WARNING] -> The [`FlaxLMSDiscreteScheduler`] and [`FlaxDDPMScheduler`] are not compatible with the [`FlaxStableDiffusionPipeline`] yet. - -```py -import jax -import numpy as np -from flax.jax_utils import replicate -from flax.training.common_utils import shard -from diffusers import FlaxStableDiffusionPipeline, FlaxDPMSolverMultistepScheduler - -scheduler, scheduler_state = FlaxDPMSolverMultistepScheduler.from_pretrained( - "stable-diffusion-v1-5/stable-diffusion-v1-5", - subfolder="scheduler" -) -pipeline, params = FlaxStableDiffusionPipeline.from_pretrained( - "stable-diffusion-v1-5/stable-diffusion-v1-5", - scheduler=scheduler, - variant="bf16", - dtype=jax.numpy.bfloat16, -) -params["scheduler"] = scheduler_state -``` - -Then you can take advantage of Flax's compatibility with TPUs to generate a number of images in parallel. You'll need to make a copy of the model parameters for each available device and then split the inputs across them to generate your desired number of images. - -```py -# Generate 1 image per parallel device (8 on TPUv2-8 or TPUv3-8) -prompt = "A photograph of an astronaut riding a horse on Mars, high resolution, high definition." -num_samples = jax.device_count() -prompt_ids = pipeline.prepare_inputs([prompt] * num_samples) - -prng_seed = jax.random.PRNGKey(0) -num_inference_steps = 25 - -# shard inputs and rng -params = replicate(params) -prng_seed = jax.random.split(prng_seed, jax.device_count()) -prompt_ids = shard(prompt_ids) - -images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images -images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:]))) -``` - ## Models Models are loaded from the [`ModelMixin.from_pretrained`] method, which downloads and caches the latest version of the model weights and configurations. If the latest files are available in the local cache, [`~ModelMixin.from_pretrained`] reuses files in the cache instead of re-downloading them. diff --git a/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md b/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md deleted file mode 100644 index ac9ffe0dfc11..000000000000 --- a/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.md +++ /dev/null @@ -1,225 +0,0 @@ - - -# JAX/Flax - -[[open-in-colab]] - -🤗 Diffusers supports Flax for super fast inference on Google TPUs, such as those available in Colab, Kaggle or Google Cloud Platform. This guide shows you how to run inference with Stable Diffusion using JAX/Flax. - -Before you begin, make sure you have the necessary libraries installed: - -```py -# uncomment to install the necessary libraries in Colab -#!pip install -q jax==0.3.25 jaxlib==0.3.25 flax transformers ftfy -#!pip install -q diffusers -``` - -You should also make sure you're using a TPU backend. While JAX does not run exclusively on TPUs, you'll get the best performance on a TPU because each server has 8 TPU accelerators working in parallel. 
- -If you are running this guide in Colab, select *Runtime* in the menu above, select the option *Change runtime type*, and then select *TPU* under the *Hardware accelerator* setting. Import JAX and quickly check whether you're using a TPU: - -```python -import jax -import jax.tools.colab_tpu -jax.tools.colab_tpu.setup_tpu() - -num_devices = jax.device_count() -device_type = jax.devices()[0].device_kind - -print(f"Found {num_devices} JAX devices of type {device_type}.") -assert ( - "TPU" in device_type, - "Available device is not a TPU, please select TPU from Runtime > Change runtime type > Hardware accelerator" -) -# Found 8 JAX devices of type Cloud TPU. -``` - -Great, now you can import the rest of the dependencies you'll need: - -```python -import jax.numpy as jnp -from jax import pmap -from flax.jax_utils import replicate -from flax.training.common_utils import shard - -from diffusers import FlaxStableDiffusionPipeline -``` - -## Load a model - -Flax is a functional framework, so models are stateless and parameters are stored outside of them. Loading a pretrained Flax pipeline returns *both* the pipeline and the model weights (or parameters). In this guide, you'll use `bfloat16`, a more efficient half-float type that is supported by TPUs (you can also use `float32` for full precision if you want). - -```python -dtype = jnp.bfloat16 -pipeline, params = FlaxStableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - variant="bf16", - dtype=dtype, -) -``` - -## Inference - -TPUs usually have 8 devices working in parallel, so let's use the same prompt for each device. This means you can perform inference on 8 devices at once, with each device generating one image. As a result, you'll get 8 images in the same amount of time it takes for one chip to generate a single image! - - - -Learn more details in the [How does parallelization work?](#how-does-parallelization-work) section. - - - -After replicating the prompt, get the tokenized text ids by calling the `prepare_inputs` function on the pipeline. The length of the tokenized text is set to 77 tokens as required by the configuration of the underlying CLIP text model. - -```python -prompt = "A cinematic film still of Morgan Freeman starring as Jimi Hendrix, portrait, 40mm lens, shallow depth of field, close up, split lighting, cinematic" -prompt = [prompt] * jax.device_count() -prompt_ids = pipeline.prepare_inputs(prompt) -prompt_ids.shape -# (8, 77) -``` - -Model parameters and inputs have to be replicated across the 8 parallel devices. The parameters dictionary is replicated with [`flax.jax_utils.replicate`](https://flax.readthedocs.io/en/latest/api_reference/flax.jax_utils.html#flax.jax_utils.replicate) which traverses the dictionary and changes the shape of the weights so they are repeated 8 times. Arrays are replicated using `shard`. - -```python -# parameters -p_params = replicate(params) - -# arrays -prompt_ids = shard(prompt_ids) -prompt_ids.shape -# (8, 1, 77) -``` - -This shape means each one of the 8 devices receives as an input a `jnp` array with shape `(1, 77)`, where `1` is the batch size per device. On TPUs with sufficient memory, you could have a batch size larger than `1` if you want to generate multiple images (per chip) at once. - -Next, create a random number generator to pass to the generation function. This is standard procedure in Flax, which is very serious and opinionated about random numbers. 
All functions that deal with random numbers are expected to receive a generator to ensure reproducibility, even when you're training across multiple distributed devices. - -The helper function below uses a seed to initialize a random number generator. As long as you use the same seed, you'll get the exact same results. Feel free to use different seeds when exploring results later in the guide. - -```python -def create_key(seed=0): - return jax.random.PRNGKey(seed) -``` - -The helper function, or `rng`, is split 8 times so each device receives a different generator and generates a different image. - -```python -rng = create_key(0) -rng = jax.random.split(rng, jax.device_count()) -``` - -To take advantage of JAX's optimized speed on a TPU, pass `jit=True` to the pipeline to compile the JAX code into an efficient representation and to ensure the model runs in parallel across the 8 devices. - - - -You need to ensure all your inputs have the same shape in subsequent calls, otherwise JAX will need to recompile the code which is slower. - - - -The first inference run takes more time because it needs to compile the code, but subsequent calls (even with different inputs) are much faster. For example, it took more than a minute to compile on a TPU v2-8, but then it takes about **7s** on a future inference run! - -```py -%%time -images = pipeline(prompt_ids, p_params, rng, jit=True)[0] - -# CPU times: user 56.2 s, sys: 42.5 s, total: 1min 38s -# Wall time: 1min 29s -``` - -The returned array has shape `(8, 1, 512, 512, 3)` which should be reshaped to remove the second dimension and get 8 images of `512 × 512 × 3`. Then you can use the [`~utils.numpy_to_pil`] function to convert the arrays into images. - -```python -from diffusers.utils import make_image_grid - -images = images.reshape((images.shape[0] * images.shape[1],) + images.shape[-3:]) -images = pipeline.numpy_to_pil(images) -make_image_grid(images, rows=2, cols=4) -``` - -![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/stable_diffusion_jax_how_to_cell_38_output_0.jpeg) - -## Using different prompts - -You don't necessarily have to use the same prompt on all devices. For example, to generate 8 different prompts: - -```python -prompts = [ - "Labrador in the style of Hokusai", - "Painting of a squirrel skating in New York", - "HAL-9000 in the style of Van Gogh", - "Times Square under water, with fish and a dolphin swimming around", - "Ancient Roman fresco showing a man working on his laptop", - "Close-up photograph of young black woman against urban background, high quality, bokeh", - "Armchair in the shape of an avocado", - "Clown astronaut in space, with Earth in the background", -] - -prompt_ids = pipeline.prepare_inputs(prompts) -prompt_ids = shard(prompt_ids) - -images = pipeline(prompt_ids, p_params, rng, jit=True).images -images = images.reshape((images.shape[0] * images.shape[1],) + images.shape[-3:]) -images = pipeline.numpy_to_pil(images) - -make_image_grid(images, 2, 4) -``` - -![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/stable_diffusion_jax_how_to_cell_43_output_0.jpeg) - -## How does parallelization work? - -The Flax pipeline in 🤗 Diffusers automatically compiles the model and runs it in parallel on all available devices. Let's take a closer look at how that process works. - -JAX parallelization can be done in multiple ways. 
The easiest one revolves around using the [`jax.pmap`](https://jax.readthedocs.io/en/latest/_autosummary/jax.pmap.html) function to achieve single-program multiple-data (SPMD) parallelization. It means running several copies of the same code, each on different data inputs. More sophisticated approaches are possible, and you can go over to the JAX [documentation](https://jax.readthedocs.io/en/latest/index.html) to explore this topic in more detail if you are interested! - -`jax.pmap` does two things: - -1. Compiles (or "`jit`s") the code which is similar to `jax.jit()`. This does not happen when you call `pmap`, and only the first time the `pmap`ped function is called. -2. Ensures the compiled code runs in parallel on all available devices. - -To demonstrate, call `pmap` on the pipeline's `_generate` method (this is a private method that generates images and may be renamed or removed in future releases of 🤗 Diffusers): - -```python -p_generate = pmap(pipeline._generate) -``` - -After calling `pmap`, the prepared function `p_generate` will: - -1. Make a copy of the underlying function, `pipeline._generate`, on each device. -2. Send each device a different portion of the input arguments (this is why it's necessary to call the *shard* function). In this case, `prompt_ids` has shape `(8, 1, 77, 768)` so the array is split into 8 and each copy of `_generate` receives an input with shape `(1, 77, 768)`. - -The most important thing to pay attention to here is the batch size (1 in this example), and the input dimensions that make sense for your code. You don't have to change anything else to make the code work in parallel. - -The first time you call the pipeline takes more time, but the calls afterward are much faster. The `block_until_ready` function is used to correctly measure inference time because JAX uses asynchronous dispatch and returns control to the Python loop as soon as it can. You don't need to use that in your code; blocking occurs automatically when you want to use the result of a computation that has not yet been materialized. - -```py -%%time -images = p_generate(prompt_ids, p_params, rng) -images = images.block_until_ready() - -# CPU times: user 1min 15s, sys: 18.2 s, total: 1min 34s -# Wall time: 1min 15s -``` - -Check your image dimensions to see if they're correct: - -```python -images.shape -# (8, 1, 512, 512, 3) -``` - -## Resources - -To learn more about how JAX works with Stable Diffusion, you may be interested in reading: - -* [Accelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e](https://hf.co/blog/sdxl_jax)