diff --git a/docs/source/en/training/lora.md b/docs/source/en/training/lora.md
index 9e87c224f428..78ac8a140e7c 100644
--- a/docs/source/en/training/lora.md
+++ b/docs/source/en/training/lora.md
@@ -77,7 +77,7 @@ accelerate config default
 
 Or if your environment doesn't support an interactive shell, like a notebook, you can use:
 
-```bash
+```py
 from accelerate.utils import write_basic_config
 
 write_basic_config()
@@ -170,7 +170,7 @@ Aside from setting up the LoRA layers, the training script is more or less the s
 
 Once you've made all your changes or you're okay with the default configuration, you're ready to launch the training script! 🚀
 
-Let's train on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate our yown Pokémon. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and dataset respectively. You should also specify where to save the model in `OUTPUT_DIR`, and the name of the model to save to on the Hub with `HUB_MODEL_ID`. The script creates and saves the following files to your repository:
+Let's train on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate our own Pokémon. Set the environment variables `MODEL_NAME` and `DATASET_NAME` to the model and dataset respectively. You should also specify where to save the model in `OUTPUT_DIR`, and the name of the model to save to on the Hub with `HUB_MODEL_ID`. The script creates and saves the following files to your repository:
 
 - saved model checkpoints
 - `pytorch_lora_weights.safetensors` (the trained LoRA weights)
diff --git a/docs/source/en/using-diffusers/freeu.md b/docs/source/en/using-diffusers/freeu.md
index 911cdf7e98d5..7b1fb908cac9 100644
--- a/docs/source/en/using-diffusers/freeu.md
+++ b/docs/source/en/using-diffusers/freeu.md
@@ -128,7 +128,7 @@ seed = 2023
 
 # The values come from
 # https://github.com/lyn-rgb/FreeU_Diffusers#video-pipelines
 pipe.enable_freeu(b1=1.2, b2=1.4, s1=0.9, s2=0.2)
-video_frames = pipe(prompt, height=320, width=576, num_frames=30, generator=torch.manual_seed(seed)).frames
+video_frames = pipe(prompt, height=320, width=576, num_frames=30, generator=torch.manual_seed(seed)).frames[0]
 export_to_video(video_frames, "astronaut_rides_horse.mp4")
 ```
diff --git a/examples/community/README.md b/examples/community/README.md
index f69a81c59baf..d1da1b231faf 100755
--- a/examples/community/README.md
+++ b/examples/community/README.md
@@ -750,7 +750,7 @@ This example produces the following images:
 
 ![image](https://user-images.githubusercontent.com/4313860/198328706-295824a4-9856-4ce5-8e66-278ceb42fd29.png)
 
 ### GlueGen Stable Diffusion Pipeline
-GlueGen is a minimal adapter that allow alignment between any encoder (Text Encoder of different language, Multilingual Roberta, AudioClip) and CLIP text encoder used in standard Stable Diffusion model. This method allows easy language adaptation to available english Stable Diffusion checkpoints without the need of an image captioning dataset as well as long training hours. 
+GlueGen is a minimal adapter that allow alignment between any encoder (Text Encoder of different language, Multilingual Roberta, AudioClip) and CLIP text encoder used in standard Stable Diffusion model. This method allows easy language adaptation to available english Stable Diffusion checkpoints without the need of an image captioning dataset as well as long training hours.
 
 Make sure you downloaded `gluenet_French_clip_overnorm_over3_noln.ckpt` for French (there are also pre-trained weights for Chinese, Italian, Japanese, Spanish or train your own) at [GlueGen's official repo](https://github.com/salesforce/GlueGen/tree/main)
@@ -782,9 +782,9 @@ if __name__ == "__main__":
     ).to(device)
     pipeline.load_language_adapter("gluenet_French_clip_overnorm_over3_noln.ckpt", num_token=token_max_length, dim=1024, dim_out=768, tensor_norm=tensor_norm)
 
-    prompt = "une voiture sur la plage" 
+    prompt = "une voiture sur la plage"
 
-    generator = torch.Generator(device=device).manual_seed(42) 
+    generator = torch.Generator(device=device).manual_seed(42)
     image = pipeline(prompt, generator=generator).images[0]
     image.save("gluegen_output_fr.png")
 ```
@@ -1755,7 +1755,7 @@ with torch.cpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
 ```
 
 The following code compares the performance of the original stable diffusion xl pipeline with the ipex-optimized pipeline.
-By using this optimized pipeline, we can get about 1.4-2 times performance boost with BFloat16 on fourth generation of Intel Xeon CPUs, 
+By using this optimized pipeline, we can get about 1.4-2 times performance boost with BFloat16 on fourth generation of Intel Xeon CPUs,
 code-named Sapphire Rapids.
 
 ```python
@@ -1826,7 +1826,7 @@ This approach is using (optional) CoCa model to avoid writing image description.
 
 This SDXL pipeline support unlimited length prompt and negative prompt, compatible with A1111 prompt weighted style.
 
-You can provide both `prompt` and `prompt_2`. If only one prompt is provided, `prompt_2` will be a copy of the provided `prompt`. Here is a sample code to use this pipeline. 
+You can provide both `prompt` and `prompt_2`. If only one prompt is provided, `prompt_2` will be a copy of the provided `prompt`. Here is a sample code to use this pipeline.
 
 ```python
 from diffusers import DiffusionPipeline
@@ -3397,7 +3397,7 @@ invert_prompt = "A lying cat"
 input_image = "siamese.jpg"
 steps = 50
 
-# Provide prompt used for generation. Same if reconstruction 
+# Provide prompt used for generation. Same if reconstruction
 prompt = "A lying cat"
 # or different if editing.
 prompt = "A lying dog"
@@ -3493,7 +3493,7 @@ output_frames = pipe(
     mask_end=0.8,
     mask_strength=0.5,
     negative_prompt='longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'
-).frames
+).frames[0]
 
 export_to_video(
     output_frames, "/path/to/video.mp4", 5)
@@ -3636,8 +3636,8 @@ image = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
 
 images = pipeline(
     prompt="A photo of a girl wearing a black dress, holding red roses in hand, upper body, behind is the Eiffel Tower",
     image_embeds=image,
-    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
-    num_inference_steps=20, num_images_per_prompt=num_images, width=512, height=704, 
+    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
+    num_inference_steps=20, num_images_per_prompt=num_images, width=512, height=704,
     generator=generator
 ).images
diff --git a/src/diffusers/configuration_utils.py b/src/diffusers/configuration_utils.py
index 3aa0f1bb8278..386331cfa167 100644
--- a/src/diffusers/configuration_utils.py
+++ b/src/diffusers/configuration_utils.py
@@ -127,7 +127,7 @@ def __getattr__(self, name: str) -> Any:
         """The only reason we overwrite `getattr` here is to gracefully deprecate accessing config attributes directly.
         See https://github.com/huggingface/diffusers/pull/3129
 
-        Tihs funtion is mostly copied from PyTorch's __getattr__ overwrite:
+        This function is mostly copied from PyTorch's __getattr__ overwrite:
         https://pytorch.org/docs/stable/_modules/torch/nn/modules/module.html#Module
         """
 
@@ -529,7 +529,7 @@ def extract_init_dict(cls, config_dict, **kwargs):
                 f"{cls.config_name} configuration file."
             )
 
-        # 5. Give nice info if config attributes are initiliazed to default because they have not been passed
+        # 5. Give nice info if config attributes are initialized to default because they have not been passed
         passed_keys = set(init_dict.keys())
         if len(expected_keys - passed_keys) > 0:
             logger.info(
diff --git a/src/diffusers/image_processor.py b/src/diffusers/image_processor.py
index f6ccfda9fcb8..daeb8fd6fa6d 100644
--- a/src/diffusers/image_processor.py
+++ b/src/diffusers/image_processor.py
@@ -332,7 +332,7 @@ def resize(
         image: Union[PIL.Image.Image, np.ndarray, torch.Tensor],
         height: int,
         width: int,
-        resize_mode: str = "default",  # "defalt", "fill", "crop"
+        resize_mode: str = "default",  # "default", "fill", "crop"
     ) -> Union[PIL.Image.Image, np.ndarray, torch.Tensor]:
         """
         Resize image.
@@ -448,7 +448,7 @@ def preprocess(
         image: PipelineImageInput,
         height: Optional[int] = None,
         width: Optional[int] = None,
-        resize_mode: str = "default",  # "defalt", "fill", "crop"
+        resize_mode: str = "default",  # "default", "fill", "crop"
         crops_coords: Optional[Tuple[int, int, int, int]] = None,
     ) -> torch.Tensor:
         """
@@ -479,7 +479,7 @@ def preprocess(
         if isinstance(image, torch.Tensor):
             # if image is a pytorch tensor could have 2 possible shapes:
             #    1. batch x height x width: we should insert the channel dimension at position 1
-            #    2. channnel x height x width: we should insert batch dimension at position 0,
+            #    2. channel x height x width: we should insert batch dimension at position 0,
             #    however, since both channel and batch dimension has same size 1, it is same to insert at position 1
             # for simplicity, we insert a dimension of size 1 at position 1 for both cases
             image = image.unsqueeze(1)
diff --git a/src/diffusers/pipelines/auto_pipeline.py b/src/diffusers/pipelines/auto_pipeline.py
index 46b4f17c77e4..fc30fc4d2f62 100644
--- a/src/diffusers/pipelines/auto_pipeline.py
+++ b/src/diffusers/pipelines/auto_pipeline.py
@@ -343,7 +343,7 @@ def from_pipe(cls, pipeline, **kwargs):
         pipeline linked to the pipeline class using pattern matching on pipeline class name.
 
         All the modules the pipeline contains will be used to initialize the new pipeline without reallocating
-        additional memoery.
+        additional memory.
 
         The pipeline is set in evaluation mode (`model.eval()`) by default.
 
@@ -616,7 +616,7 @@ def from_pipe(cls, pipeline, **kwargs):
         image-to-image pipeline linked to the pipeline class using pattern matching on pipeline class name.
 
         All the modules the pipeline contains will be used to initialize the new pipeline without reallocating
-        additional memoery.
+        additional memory.
 
         The pipeline is set in evaluation mode (`model.eval()`) by default.
 
@@ -892,7 +892,7 @@ def from_pipe(cls, pipeline, **kwargs):
         pipeline linked to the pipeline class using pattern matching on pipeline class name.
 
         All the modules the pipeline class contain will be used to initialize the new pipeline without reallocating
-        additional memoery.
+        additional memory.
 
         The pipeline is set in evaluation mode (`model.eval()`) by default.
 
diff --git a/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py b/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py
index 0ed0765703f2..bbf26e8960ed 100644
--- a/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py
+++ b/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py
@@ -52,7 +52,7 @@
         >>> pipe.enable_model_cpu_offload()
 
         >>> prompt = "Spiderman is surfing"
-        >>> video_frames = pipe(prompt).frames
+        >>> video_frames = pipe(prompt).frames[0]
         >>> video_path = export_to_video(video_frames)
         >>> video_path
         ```
diff --git a/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth_img2img.py b/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth_img2img.py
index 40c486316e13..1dba5ef0fea0 100644
--- a/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth_img2img.py
+++ b/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth_img2img.py
@@ -52,7 +52,7 @@
         >>> pipe.to("cuda")
 
         >>> prompt = "spiderman running in the desert"
-        >>> video_frames = pipe(prompt, num_inference_steps=40, height=320, width=576, num_frames=24).frames
+        >>> video_frames = pipe(prompt, num_inference_steps=40, height=320, width=576, num_frames=24).frames[0]
         >>> # safe low-res video
         >>> video_path = export_to_video(video_frames, output_video_path="./video_576_spiderman.mp4")
 
@@ -73,7 +73,7 @@
         >>> video = [Image.fromarray(frame).resize((1024, 576)) for frame in video_frames]
 
         >>> # and denoise it
-        >>> video_frames = pipe(prompt, video=video, strength=0.6).frames
+        >>> video_frames = pipe(prompt, video=video, strength=0.6).frames[0]
         >>> video_path = export_to_video(video_frames, output_video_path="./video_1024_spiderman.mp4")
         >>> video_path
         ```
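For context on the recurring `.frames` → `.frames[0]` edits above: the video pipelines return `frames` as a batch, one entry per generated video, so the first video has to be selected before it is passed to `export_to_video`. Below is a minimal sketch of the updated usage pattern, not part of the patch itself, assuming the `damo-vilab/text-to-video-ms-1.7b` checkpoint referenced in the docstrings above.

```py
import torch

from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the text-to-video pipeline (checkpoint assumed from the docstring examples above).
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

prompt = "Spiderman is surfing"
# `.frames` is batched; indexing with [0] selects the frame list of the first generated video,
# which is the shape `export_to_video` expects.
video_frames = pipe(prompt).frames[0]
video_path = export_to_video(video_frames)
```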