In [1]:
!pip install git+https://github.com/huggingface/diffusers
!pip install transformers accelerate datasets

In [2]:
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers", 
    torch_dtype=torch.bfloat16,
    variant="bf16",
)
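# Pick the best available device: CUDA GPU, then Apple Silicon (MPS), then CPU.
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
pipe.to(device)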

# Run the VAE in float32: its attention fails in bfloat16 due to a dtype mismatch.
pipe.vae.to(torch.float32)

# Enable tiling in the (V)AE; without it, this runs out of memory on 24 GB of VRAM.
# Also increase tile_sample_min_{height,width}: the default is 448, which gives
# visible artifacts on 4K x 4K images.
pipe.vae.enable_tiling(tile_sample_min_height=1024, tile_sample_min_width=1024)

for fruit in ["apple", "orange", "strawberry"]:
    image = pipe(
        prompt=f"A melting {fruit}",
        height=4096,
        width=4096,
        guidance_scale=5.0,
    ).images[0]  # the pipeline output's .images list holds the generated PIL image
    image.save(f"{fruit}.png")


A mixture of bf16 and non-bf16 filenames will be loaded.
Loaded bf16 filenames:
[transformer/diffusion_pytorch_model.bf16.safetensors, text_encoder/model.bf16-00001-of-00002.safetensors, vae/diffusion_pytorch_model.bf16.safetensors, text_encoder/model.bf16-00002-of-00002.safetensors]
Loaded non-bf16 filenames:
[transformer/diffusion_pytorch_model-00002-of-00002.safetensors, transformer/diffusion_pytorch_model-00001-of-00002.safetensors
If this behavior is not expected, please check your folder structure.


Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

The 'batch_size' argument of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'max_batch_size' argument instead.
The 'batch_size' attribute of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'self.max_batch_size' attribute instead.


  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/20 [00:00<?, ?it/s]

  0%|          | 0/20 [00:00<?, ?it/s]
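
The tiling comment in the cell above says that small tiles produce visible artifacts at 4096x4096. A rough way to see why is to count how many tiles, and therefore how many blended seams, a given tile spacing produces. The sketch below is plain Python and is not the diffusers implementation: the helper names `tile_grid` and `seam_count` are hypothetical, the stride values 448 and 896 are illustrative assumptions, and the model that a new tile starts every `stride` pixels is a simplification.

```python
import math


def tile_grid(height: int, width: int, stride: int) -> tuple[int, int]:
    """Return (rows, cols) of tiles, assuming a new tile starts every `stride` pixels."""
    return math.ceil(height / stride), math.ceil(width / stride)


def seam_count(height: int, width: int, stride: int) -> int:
    """Count the internal boundaries between neighbouring tiles in the grid."""
    rows, cols = tile_grid(height, width, stride)
    # Vertical seams between horizontally adjacent tiles, plus horizontal seams
    # between vertically adjacent tiles.
    return rows * (cols - 1) + cols * (rows - 1)


if __name__ == "__main__":
    # Illustrative strides only; check the values your diffusers version uses.
    for stride in (448, 896):
        rows, cols = tile_grid(4096, 4096, stride)
        print(f"stride={stride}: {rows}x{cols} tiles, "
              f"{seam_count(4096, 4096, stride)} seams")
```

Under these assumptions, fewer and larger tiles mean fewer seams to blend, at the cost of more memory per tile, which is the trade-off the `enable_tiling` call is tuning.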