6 changes: 3 additions & 3 deletions CONTRIBUTING.md
@@ -40,7 +40,7 @@ In the following, we give an overview of different ways to contribute, ranked by
As said before, **all contributions are valuable to the community**.
In the following, we will explain each contribution a bit more in detail.

For all contributions 4.-9. you will need to open a PR. It is explained in detail how to do so in [Opening a pull requst](#how-to-open-a-pr)
For all contributions 4.-9. you will need to open a PR. It is explained in detail how to do so in [Opening a pull request](#how-to-open-a-pr)

### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord

@@ -63,7 +63,7 @@ In the same spirit, you are of immense help to the community by answering such q

**Please** keep in mind that the more effort you put into asking or answering a question, the higher
the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accesible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.

**NOTE about channels**:
[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
@@ -168,7 +168,7 @@ more precise, provide the link to a duplicated issue or redirect them to [the fo
If you have verified that the issued bug report is correct and requires a correction in the source code,
please have a look at the next sections.

For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull requst](#how-to-open-a-pr) section.
For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull request](#how-to-open-a-pr) section.

### 4. Fixing a "Good first issue"

2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/diffedit.md
@@ -34,7 +34,7 @@ this in the generated mask, you simply have to set the embeddings related to the
`source_prompt` and "dog" to `target_prompt`.
* When generating partially inverted latents using `invert`, assign a caption or text embedding describing the
overall image to the `prompt` argument to help guide the inverse latent sampling process. In most cases, the
source concept is sufficently descriptive to yield good results, but feel free to explore alternatives.
source concept is sufficiently descriptive to yield good results, but feel free to explore alternatives.
* When calling the pipeline to generate the final edited image, assign the source concept to `negative_prompt`
and the target concept to `prompt`. Taking the above example, you simply have to set the embeddings related to
the phrases including "cat" to `negative_prompt` and "dog" to `prompt`.
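
A minimal sketch of how the three tips above fit together, assuming the `stabilityai/stable-diffusion-2-1` checkpoint, a locally available cat photo, and illustrative prompts (none of these come from the diff itself):

```py
import torch
from diffusers import DDIMInverseScheduler, DDIMScheduler, StableDiffusionDiffEditPipeline
from diffusers.utils import load_image

# load the DiffEdit pipeline; the checkpoint choice here is an assumption
pipeline = StableDiffusionDiffEditPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)

raw_image = load_image("cat.png").resize((768, 768))  # placeholder input image
source_prompt = "a cat sitting on a bench"
target_prompt = "a dog sitting on a bench"

# 1. contrast the source and target concepts to get the edit mask
mask_image = pipeline.generate_mask(
    image=raw_image, source_prompt=source_prompt, target_prompt=target_prompt
)

# 2. partially invert the image, guided by a caption describing the overall image
inv_latents = pipeline.invert(prompt=source_prompt, image=raw_image).latents

# 3. generate the edit: source concept as negative_prompt, target concept as prompt
image = pipeline(
    prompt=target_prompt,
    mask_image=mask_image,
    image_latents=inv_latents,
    negative_prompt=source_prompt,
).images[0]
```
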
2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/kandinsky.md
@@ -396,7 +396,7 @@ t2i_pipe.unet.set_attn_processor(AttnAddedKVProcessor())
```

With PyTorch >= 2.0, you can also use Kandinsky with `torch.compile` which depending
on your hardware can signficantly speed-up your inference time once the model is compiled.
on your hardware can significantly speed-up your inference time once the model is compiled.
To use Kandinsksy with `torch.compile`, you can do:

```py
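# The file's own example is cut off in this diff context; what follows is a sketch
# of the pattern described above, not the original code. The checkpoint, prompt,
# and step count are placeholders.
import torch
from diffusers import AutoPipelineForText2Image

t2i_pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
).to("cuda")

# compile the UNet once; the first call pays the one-time compilation cost
t2i_pipe.unet.to(memory_format=torch.channels_last)
t2i_pipe.unet = torch.compile(t2i_pipe.unet, mode="reduce-overhead", fullgraph=True)

image = t2i_pipe("A robot dancing in the rain", num_inference_steps=25).images[0]
```
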
2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/kandinsky_v22.md
@@ -263,7 +263,7 @@ t2i_pipe.unet.set_attn_processor(AttnAddedKVProcessor())
```

With PyTorch >= 2.0, you can also use Kandinsky with `torch.compile` which depending
on your hardware can signficantly speed-up your inference time once the model is compiled.
on your hardware can significantly speed-up your inference time once the model is compiled.
To use Kandinsksy with `torch.compile`, you can do:

```py
6 changes: 3 additions & 3 deletions docs/source/en/conceptual/contribution.md
@@ -40,7 +40,7 @@ In the following, we give an overview of different ways to contribute, ranked by
As said before, **all contributions are valuable to the community**.
In the following, we will explain each contribution a bit more in detail.

For all contributions 4.-9. you will need to open a PR. It is explained in detail how to do so in [Opening a pull requst](#how-to-open-a-pr)
For all contributions 4.-9. you will need to open a PR. It is explained in detail how to do so in [Opening a pull request](#how-to-open-a-pr)

### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord

@@ -63,7 +63,7 @@ In the same spirit, you are of immense help to the community by answering such q

**Please** keep in mind that the more effort you put into asking or answering a question, the higher
the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accesible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.

**NOTE about channels**:
[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
@@ -168,7 +168,7 @@ more precise, provide the link to a duplicated issue or redirect them to [the fo
If you have verified that the issued bug report is correct and requires a correction in the source code,
please have a look at the next sections.

For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull requst](#how-to-open-a-pr) section.
For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull request](#how-to-open-a-pr) section.

### 4. Fixing a `Good first issue`

2 changes: 1 addition & 1 deletion docs/source/en/optimization/torch2.0.md
@@ -70,7 +70,7 @@ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
images = pipe(prompt, num_inference_steps=steps, num_images_per_prompt=batch_size).images[0]
```

Depending on GPU type, `torch.compile` can provide an *addtional speed-up* of **5-300x** on top of SDPA! If you're using more recent GPU architectures such as Ampere (A100, 3090), Ada (4090), and Hopper (H100), `torch.compile` is able to squeeze even more performance out of these GPUs.
Depending on GPU type, `torch.compile` can provide an *additional speed-up* of **5-300x** on top of SDPA! If you're using more recent GPU architectures such as Ampere (A100, 3090), Ada (4090), and Hopper (H100), `torch.compile` is able to squeeze even more performance out of these GPUs.

Compilation requires some time to complete, so it is best suited for situations where you prepare your pipeline once and then perform the same type of inference operations multiple times. For example, calling the compiled pipeline on a different image size triggers compilation again which can be expensive.
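
To make the recompilation point concrete, a small sketch (the checkpoint and prompt are placeholders):

```py
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

prompt = "a photo of an astronaut riding a horse on mars"

image = pipe(prompt, height=512, width=512).images[0]  # first call at this size: compiles (slow)
image = pipe(prompt, height=512, width=512).images[0]  # same shapes: reuses the compiled graph (fast)
image = pipe(prompt, height=768, width=768).images[0]  # new shapes: triggers another compilation
```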

4 changes: 2 additions & 2 deletions docs/source/en/training/custom_diffusion.md
@@ -69,7 +69,7 @@ write_basic_config()

Now let's get our dataset. Download dataset from [here](https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip) and unzip it. To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.

We also collect 200 real images using `clip-retrieval` which are combined with the target images in the training dataset as a regularization. This prevents overfitting to the the given target image. The following flags enable the regularization `with_prior_preservation`, `real_prior` with `prior_loss_weight=1.`.
We also collect 200 real images using `clip-retrieval` which are combined with the target images in the training dataset as a regularization. This prevents overfitting to the given target image. The following flags enable the regularization `with_prior_preservation`, `real_prior` with `prior_loss_weight=1.`.
The `class_prompt` should be the category name same as target image. The collected real images are with text captions similar to the `class_prompt`. The retrieved image are saved in `class_data_dir`. You can disable `real_prior` to use generated images as regularization. To collect the real images use this command first before training.

```bash
@@ -106,7 +106,7 @@ accelerate launch train_custom_diffusion.py \

**Use `--enable_xformers_memory_efficient_attention` for faster training with lower VRAM requirement (16GB per GPU). Follow [this guide](https://github.com/facebookresearch/xformers) for installation instructions.**

To track your experiments using Weights and Biases (`wandb`) and to save intermediate results (whcih we HIGHLY recommend), follow these steps:
To track your experiments using Weights and Biases (`wandb`) and to save intermediate results (which we HIGHLY recommend), follow these steps:

* Install `wandb`: `pip install wandb`.
* Authorize: `wandb login`.
2 changes: 1 addition & 1 deletion docs/source/en/training/text_inversion.md
@@ -192,7 +192,7 @@ been added to the text encoder embedding matrix and consequently been trained.
<Tip>

💡 The community has created a large library of different textual inversion embedding vectors, called [sd-concepts-library](https://huggingface.co/sd-concepts-library).
Instead of training textual inversion embeddings from scratch you can also see whether a fitting textual inversion embedding has already been added to the libary.
Instead of training textual inversion embeddings from scratch you can also see whether a fitting textual inversion embedding has already been added to the library.

</Tip>
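
For example, a pre-trained concept can be pulled straight from that library instead of being trained from scratch (a sketch; the base checkpoint and concept repository are illustrative choices):

```py
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load an existing embedding from the sd-concepts-library organization
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# the concept's placeholder token can now be used in prompts
image = pipe("A <cat-toy> sitting on the beach", num_inference_steps=30).images[0]
```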

2 changes: 1 addition & 1 deletion docs/source/en/using-diffusers/controlnet.md
@@ -434,7 +434,7 @@ high_threshold = 200

canny_image = cv2.Canny(canny_image, low_threshold, high_threshold)

# zero out middle columns of image where pose will be overlayed
# zero out middle columns of image where pose will be overlaid
zero_start = canny_image.shape[1] // 4
zero_end = zero_start + canny_image.shape[1] // 2
canny_image[:, zero_start:zero_end] = 0
2 changes: 1 addition & 1 deletion docs/source/en/using-diffusers/shap-e.md
@@ -62,7 +62,7 @@ export_to_gif(images[1], "cake_3d.gif")

## Image-to-3D

To generate a 3D object from another image, use the [`ShapEImg2ImgPipeline`]. You can use an existing image or generate an entirely new one. Let's use the the [Kandinsky 2.1](../api/pipelines/kandinsky) model to generate a new image.
To generate a 3D object from another image, use the [`ShapEImg2ImgPipeline`]. You can use an existing image or generate an entirely new one. Let's use the [Kandinsky 2.1](../api/pipelines/kandinsky) model to generate a new image.

```py
from diffusers import DiffusionPipeline
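# The original example (generating a new image with Kandinsky 2.1) is truncated in
# this diff context. As an alternative sketch, feed an existing image to the Shap-E
# image-to-3D pipeline; the image path, guidance scale, and frame size below are
# assumptions, not taken from the file.
import torch
from diffusers.utils import export_to_gif, load_image

image = load_image("burger.png")  # placeholder: any RGB image of a single object

pipe = DiffusionPipeline.from_pretrained(
    "openai/shap-e-img2img", torch_dtype=torch.float16
).to("cuda")

# render the reconstructed 3D object as a short turntable gif
frames = pipe(image, guidance_scale=3.0, num_inference_steps=64, frame_size=256).images[0]
export_to_gif(frames, "burger_3d.gif")
```
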
2 changes: 1 addition & 1 deletion examples/community/run_onnx_controlnet.py
@@ -553,7 +553,7 @@ def __call__(
instead.
image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, `List[np.ndarray]`,:
`List[List[torch.FloatTensor]]`, `List[List[np.ndarray]]` or `List[List[PIL.Image.Image]]`):
The initial image will be used as the starting point for the image generation process. Can also accpet
The initial image will be used as the starting point for the image generation process. Can also accept
image latents as `image`, if passing latents directly, it will not be encoded again.
control_image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, `List[np.ndarray]`,:
`List[List[torch.FloatTensor]]`, `List[List[np.ndarray]]` or `List[List[PIL.Image.Image]]`):
2 changes: 1 addition & 1 deletion examples/community/run_tensorrt_controlnet.py
@@ -657,7 +657,7 @@ def __call__(
instead.
image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, `List[np.ndarray]`,:
`List[List[torch.FloatTensor]]`, `List[List[np.ndarray]]` or `List[List[PIL.Image.Image]]`):
The initial image will be used as the starting point for the image generation process. Can also accpet
The initial image will be used as the starting point for the image generation process. Can also accept
image latents as `image`, if passing latents directly, it will not be encoded again.
control_image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, `List[np.ndarray]`,:
`List[List[torch.FloatTensor]]`, `List[List[np.ndarray]]` or `List[List[PIL.Image.Image]]`):
4 changes: 2 additions & 2 deletions examples/custom_diffusion/README.md
@@ -48,7 +48,7 @@ write_basic_config()

Now let's get our dataset. Download dataset from [here](https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip) and unzip it.

We also collect 200 real images using `clip-retrieval` which are combined with the target images in the training dataset as a regularization. This prevents overfitting to the the given target image. The following flags enable the regularization `with_prior_preservation`, `real_prior` with `prior_loss_weight=1.`.
We also collect 200 real images using `clip-retrieval` which are combined with the target images in the training dataset as a regularization. This prevents overfitting to the given target image. The following flags enable the regularization `with_prior_preservation`, `real_prior` with `prior_loss_weight=1.`.
The `class_prompt` should be the category name same as target image. The collected real images are with text captions similar to the `class_prompt`. The retrieved image are saved in `class_data_dir`. You can disable `real_prior` to use generated images as regularization. To collect the real images use this command first before training.

```bash
@@ -82,7 +82,7 @@ accelerate launch train_custom_diffusion.py \

**Use `--enable_xformers_memory_efficient_attention` for faster training with lower VRAM requirement (16GB per GPU). Follow [this guide](https://github.com/facebookresearch/xformers) for installation instructions.**

To track your experiments using Weights and Biases (`wandb`) and to save intermediate results (whcih we HIGHLY recommend), follow these steps:
To track your experiments using Weights and Biases (`wandb`) and to save intermediate results (which we HIGHLY recommend), follow these steps:

* Install `wandb`: `pip install wandb`.
* Authorize: `wandb login`.
2 changes: 1 addition & 1 deletion examples/dreambooth/train_dreambooth.py
@@ -1119,7 +1119,7 @@ def compute_text_embeddings(prompt):
unet, optimizer, train_dataloader, lr_scheduler
)

# For mixed precision training we cast all non-trainable weigths (vae, non-lora text_encoder and non-lora unet) to half-precision
# For mixed precision training we cast all non-trainable weights (vae, non-lora text_encoder and non-lora unet) to half-precision
# as these weights are only used for inference, keeping weights in full precision is not required.
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
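
The hunk is cut off right after the `fp16` check; the pattern it describes continues roughly like this (a sketch reusing the script's names, not the exact file contents):

```py
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
    weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
    weight_dtype = torch.bfloat16

# frozen models are only used for inference, so half precision is enough for them
vae.to(accelerator.device, dtype=weight_dtype)
if not args.train_text_encoder:
    text_encoder.to(accelerator.device, dtype=weight_dtype)
```
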
2 changes: 1 addition & 1 deletion examples/dreambooth/train_dreambooth_lora.py
@@ -794,7 +794,7 @@ def main(args):
text_encoder.requires_grad_(False)
unet.requires_grad_(False)

# For mixed precision training we cast all non-trainable weigths (vae, non-lora text_encoder and non-lora unet) to half-precision
# For mixed precision training we cast all non-trainable weights (vae, non-lora text_encoder and non-lora unet) to half-precision
# as these weights are only used for inference, keeping weights in full precision is not required.
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
2 changes: 1 addition & 1 deletion examples/dreambooth/train_dreambooth_lora_sdxl.py
@@ -707,7 +707,7 @@ def main(args):
text_encoder_two.requires_grad_(False)
unet.requires_grad_(False)

# For mixed precision training we cast all non-trainable weigths (vae, non-lora text_encoder and non-lora unet) to half-precision
# For mixed precision training we cast all non-trainable weights (vae, non-lora text_encoder and non-lora unet) to half-precision
# as these weights are only used for inference, keeping weights in full precision is not required.
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
2 changes: 1 addition & 1 deletion examples/research_projects/colossalai/README.md
@@ -41,7 +41,7 @@ The `text` include the tag `Teyvat`, `Name`,`Element`, `Weapon`, `Region`, `Mode

## Training

The arguement `placement` can be `cpu`, `auto`, `cuda`, with `cpu` the GPU RAM required can be minimized to 4GB but will deceleration, with `cuda` you can also reduce GPU memory by half but accelerated training, with `auto` a more balanced solution for speed and memory can be obtained。
The argument `placement` can be `cpu`, `auto`, `cuda`, with `cpu` the GPU RAM required can be minimized to 4GB but will deceleration, with `cuda` you can also reduce GPU memory by half but accelerated training, with `auto` a more balanced solution for speed and memory can be obtained。

**___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**

@@ -323,7 +323,7 @@ accelerate launch train_dreambooth.py \

### Using DreamBooth for other pipelines than Stable Diffusion

Altdiffusion also support dreambooth now, the runing comman is basically the same as abouve, all you need to do is replace the `MODEL_NAME` like this:
Altdiffusion also support dreambooth now, the runing comman is basically the same as above, all you need to do is replace the `MODEL_NAME` like this:
One can now simply change the `pretrained_model_name_or_path` to another architecture such as [`AltDiffusion`](https://huggingface.co/docs/diffusers/api/pipelines/alt_diffusion).

```
2 changes: 1 addition & 1 deletion examples/research_projects/sdxl_flax/README.md
@@ -151,7 +151,7 @@ telling JAX which input arguments are static, that is, arguments that
are known at compile time and won't change. In our case, it is num_inference_steps,
height, width and return_latents.

Once the function is compiled, these parameters are ommited from future calls and
Once the function is compiled, these parameters are omitted from future calls and
cannot be changed without modifying the code and recompiling.

```python
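# The file's own example is cut off in this diff context; the self-contained toy
# below (not the SDXL pipeline itself) just illustrates what marking those
# arguments as static means in JAX.
from functools import partial

import jax
import jax.numpy as jnp

# num_inference_steps, height, width and return_latents are known at trace time,
# so they are declared static; the compiled function is specialized to their values.
@partial(jax.jit, static_argnames=("num_inference_steps", "height", "width", "return_latents"))
def generate(rng, num_inference_steps, height, width, return_latents):
    latents = jax.random.normal(rng, (1, 4, height // 8, width // 8))
    for _ in range(num_inference_steps):  # unrolled, since num_inference_steps is static
        latents = latents * 0.99
    return latents if return_latents else jnp.clip(latents, -1.0, 1.0)

rng = jax.random.PRNGKey(0)
out = generate(rng, num_inference_steps=4, height=1024, width=1024, return_latents=False)
# calling again with, say, height=512 would force a fresh compilation
```
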
4 changes: 2 additions & 2 deletions src/diffusers/pipelines/shap_e/renderer.py
@@ -911,7 +911,7 @@ def decode_to_image(
n_coarse_samples=64,
n_fine_samples=128,
):
# project the the paramters from the generated latents
# project the parameters from the generated latents
projected_params = self.params_proj(latents)

# update the mlp layers of the renderer
@@ -955,7 +955,7 @@ def decode_to_mesh(
query_batch_size: int = 4096,
texture_channels: Tuple = ("R", "G", "B"),
):
# 1. project the the paramters from the generated latents
# 1. project the parameters from the generated latents
projected_params = self.params_proj(latents)

# 2. update the mlp layers of the renderer
@@ -20,7 +20,7 @@ class UniDiffuserTextDecoder(ModelMixin, ConfigMixin, ModuleUtilsMixin):
prefix_length (`int`):
Max number of prefix tokens that will be supplied to the model.
prefix_inner_dim (`int`):
The hidden size of the the incoming prefix embeddings. For UniDiffuser, this would be the hidden dim of the
The hidden size of the incoming prefix embeddings. For UniDiffuser, this would be the hidden dim of the
CLIP text encoder.
prefix_hidden_dim (`int`, *optional*):
Hidden dim of the MLP if we encode the prefix.