diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index 5084299bb0dd..d6c753056044 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -146,6 +146,8 @@
     title: Loaders
   - local: api/utilities
     title: Utilities
+  - local: api/image_processor
+    title: VAE Image Processor
   title: Main Classes
 - sections:
   - local: api/pipelines/overview
diff --git a/docs/source/en/api/image_processor.mdx b/docs/source/en/api/image_processor.mdx
new file mode 100644
index 000000000000..1964df214f94
--- /dev/null
+++ b/docs/source/en/api/image_processor.mdx
@@ -0,0 +1,22 @@
+
+
+# Image Processor for VAE
+
+The image processor provides a unified API for Stable Diffusion pipelines to prepare image inputs for VAE encoding and to post-process outputs once they are decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch tensor, and NumPy array formats.
+
+All pipelines that use the VAE image processor accept image inputs as a PIL Image, PyTorch tensor, or NumPy array, and can return outputs as a PIL Image, PyTorch tensor, or NumPy array, depending on the `output_type` argument. Additionally, you can pass encoded image latents directly to the pipeline, or ask the pipeline to return latents as output with the `output_type="latent"` argument. This allows you to take the latents generated by one pipeline and pass them to another pipeline as input without ever leaving the latent space. It also makes it much easier to use multiple pipelines together, because PyTorch tensors can be passed directly from one pipeline to the next.
+
+
+## VaeImageProcessor
+
+[[autodoc]] image_processor.VaeImageProcessor
\ No newline at end of file
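
The normalization step mentioned in the new page can be illustrated with a small sketch. This is not the `VaeImageProcessor` implementation: the helper names `normalize`, `denormalize`, and `to_uint8` are hypothetical, the code works on plain Python lists rather than tensors, and it assumes the common convention of scaling pixel values from [0, 1] to [-1, 1] before VAE encoding and back again after decoding.

```python
# Illustrative sketch only: plain-Python stand-ins, not the diffusers API.
# Assumes pixel values in [0, 1] are scaled to [-1, 1] before VAE encoding.


def normalize(pixels):
    """Map pixel values from [0, 1] to [-1, 1]."""
    return [2.0 * p - 1.0 for p in pixels]


def denormalize(values):
    """Map values from [-1, 1] back to [0, 1], clamping anything out of range."""
    return [min(max(v / 2.0 + 0.5, 0.0), 1.0) for v in values]


def to_uint8(pixels):
    """Convert [0, 1] floats to 0-255 integers, as an 8-bit image would store them."""
    return [int(round(p * 255)) for p in pixels]


pixels = [0.0, 0.25, 0.5, 1.0]
encoder_input = normalize(pixels)      # values now lie in [-1, 1]
restored = denormalize(encoder_input)  # back in [0, 1]
image_bytes = to_uint8(restored)       # ready for an 8-bit image format
```

The clamp in `denormalize` matters in practice because decoded values can fall slightly outside [-1, 1]; without it, the conversion to 8-bit integers could overflow or go negative.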