
[Bug] TypeError: CLIPTextModel.__init__() got unexpected keyword argument 'offload_state_dict' (Latest Diffs/Transf) #12480

@lingbin1964


Describe the bug


1. Description of the Bug

When attempting to load the StableDiffusionControlNetImg2ImgPipeline using from_pretrained, the process fails with a TypeError because the diffusers library passes an unrecognized argument (offload_state_dict) to the CLIPTextModel constructor from the transformers library.

This issue occurs on the latest released versions of both packages and persists even after:

- Clearing the local Hugging Face cache (~/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5).
- Explicitly forcing the pipeline to use CPU-safe arguments (map_location='cpu').
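
As a possible (untested) workaround sketch: since the stray keyword appears to be injected only when diffusers loads the transformers sub-models itself, pre-loading the text encoder with transformers and handing it to the pipeline should bypass the failing code path. The safety checker is also a transformers model and may need the same treatment; it is simply disabled in this sketch.

# Hedged workaround sketch (untested): pre-load the transformers components so that
# the pipeline's from_pretrained never forwards 'offload_state_dict' to their constructors.
import torch
from transformers import CLIPTextModel
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

text_encoder = CLIPTextModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    subfolder="text_encoder",
    torch_dtype=torch.float32,
)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble",
    torch_dtype=torch.float32,
)

pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    text_encoder=text_encoder,   # pass the pre-built component instead of letting diffusers load it
    safety_checker=None,         # the safety checker is also a transformers model; skipped here
    torch_dtype=torch.float32,
)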

2. Reproduction Steps (Minimal Example)

The bug is triggered when attempting to load the main pipeline:

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# 1. Initialize ControlNet (example model)
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble",
    torch_dtype=torch.float32,
    map_location="cpu",
)

# 2. Attempt to load the main pipeline, which fails on the text encoder (CLIPTextModel)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float32,
    map_location="cpu",
)

3. Error Traceback

Loading pipeline components...:  17%|███████████ | 1/6 [00:00<00:00, 16.31it/s]
Traceback (most recent call last):
  ...
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/diffusers/pipelines/pipeline_loading_utils.py", line 849, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4974, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: CLIPTextModel.__init__() got an unexpected keyword argument 'offload_state_dict'
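
The TypeError itself is just Python rejecting an unknown keyword in the constructor call, so it can presumably be reproduced in isolation with nothing but transformers installed (untested sketch; it assumes only that CLIPTextModel.__init__ accepts a config and nothing else):

# Hypothetical isolation of the failure: passing the same stray keyword directly to the
# constructor should raise the identical TypeError, before any weights are involved.
from transformers import CLIPTextConfig, CLIPTextModel

config = CLIPTextConfig()  # default config; no pretrained weights are downloaded
try:
    CLIPTextModel(config, offload_state_dict=False)
except TypeError as err:
    print(err)  # CLIPTextModel.__init__() got an unexpected keyword argument 'offload_state_dict'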

Reproduction

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Define a minimal environment for reproduction.
# The bug is a software incompatibility, so using the CPU ensures anyone can reproduce it.
DEVICE = torch.device("cpu")
DTYPE = torch.float32

# Model names required to trigger the loading path
MODEL_NAME = "runwayml/stable-diffusion-v1-5"
CONTROLNET_NAME = "lllyasviel/sd-controlnet-scribble"

# --- Reproduction start ---

# 1. Load the ControlNet model (required before the pipeline load)
controlnet = ControlNetModel.from_pretrained(
    CONTROLNET_NAME,
    torch_dtype=DTYPE,
).to(DEVICE)

# 2. Initialize the main pipeline.
# This is the line that triggers the TypeError due to the package incompatibility.
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    MODEL_NAME,
    controlnet=controlnet,
    torch_dtype=DTYPE,
)

print("If this line prints, the bug is fixed.")

Logs

(ai_refiner_venv) lingbin@BANGKOK:~/Desktop/gemini$ python3 ai_refiner_gpu_optimized.py \
  --input ./drawings_test \
  --output ./refined_art \
  --prompt "a cute cartoon drawing of a cat with a thick black outline, volumetric lighting, on a clean white background" \
  --detector scribble
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/mediapipe_face/mediapipe_face_common.py:7: UserWarning: The module 'mediapipe' is not installed. The package will have limited functionality. Please install it using the command: pip install 'mediapipe'
  warnings.warn(
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/timm/models/registry.py:4: FutureWarning: Importing from timm.models.registry is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_5m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_11m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_11m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_384 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_384. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_512 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_512. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
[INFO] Running on CPU (Fixed Args) (torch_dtype=torch.float32)
model_index.json: 100%|█████████████████████████████████████████████████████████████████████████████| 541/541 [00:00<00:00, 4.22MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████| 472/472 [00:00<00:00, 1.67MB/s]
preprocessor_config.json: 100%|█████████████████████████████████████████████████████████████████████| 342/342 [00:00<00:00, 2.07MB/s]
scheduler_config.json: 100%|████████████████████████████████████████████████████████████████████████| 308/308 [00:00<00:00, 2.69MB/s]
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████| 806/806 [00:00<00:00, 9.55MB/s]
merges.txt: 525kB [00:00, 14.8MB/s]                                                                        | 0.00/308 [00:00<?, ?B/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 617/617 [00:00<00:00, 3.63MB/s]
vocab.json: 1.06MB [00:00, 19.8MB/s]████████████                                                      | 4/13 [00:01<00:02,  3.75it/s]
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████| 547/547 [00:00<00:00, 971kB/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 743/743 [00:00<00:00, 3.18MB/s]
vae/diffusion_pytorch_model.safetensors: 100%|████████████████████████████████████████████████████| 335M/335M [00:12<00:00, 26.7MB/s]
text_encoder/model.safetensors: 100%|█████████████████████████████████████████████████████████████| 492M/492M [00:21<00:00, 23.2MB/s]
unet/diffusion_pytorch_model.safetensors: 100%|██████████████████████████████████████████████████| 3.44G/3.44G [00:27<00:00, 124MB/s]
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████████████████████| 13/13 [00:29<00:00,  2.29s/it]
Keyword arguments {'map_location': 'cpu'} are not expected by StableDiffusionControlNetImg2ImgPipeline and will be ignored., 207MB/s]
Loading pipeline components...:   0%|                                                                          | 0/6 [00:00<?, ?it/s]`torch_dtype` is deprecated! Use `dtype` instead!
Loading pipeline components...:  17%|███████████                                                       | 1/6 [00:00<00:00, 16.31it/s]
Traceback (most recent call last):
  File "/home/lingbin/Desktop/gemini/ai_refiner_gpu_optimized.py", line 168, in <module>
    main()
  File "/home/lingbin/Desktop/gemini/ai_refiner_gpu_optimized.py", line 81, in main
    pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/diffusers/pipelines/pipeline_utils.py", line 1025, in from_pretrained
    loaded_sub_model = load_sub_model(
                       ^^^^^^^^^^^^^^^
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/diffusers/pipelines/pipeline_loading_utils.py", line 849, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4974, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: CLIPTextModel.__init__() got an unexpected keyword argument 'offload_state_dict'

System Info

4. System / Dependency Info (Crucial)

This issue occurs with the following package versions, indicating an internal compatibility failure between the latest released versions of diffusers and transformers (a quick programmatic version check is included after the list):

Name: diffusers
Version: 0.35.1
...

Name: transformers
Version: 4.57.0
...

Name: accelerate
Version: 1.10.1
...

Name: torch
Version: 2.8.0
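
For completeness, the same version combination can be confirmed from inside the failing virtual environment with a trivial check (this snippet assumes only that the four packages are importable):

# Print the installed versions of the packages involved in the incompatibility.
import accelerate
import diffusers
import torch
import transformers

for pkg in (diffusers, transformers, accelerate, torch):
    print(f"{pkg.__name__}=={pkg.__version__}")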

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.3 LTS
Release: 24.04
Codename: noble
Kernel: 6.14.0-33-generic
GPU: NVIDIA GeForce RTX 5070
| NVIDIA-SMI 580.82.09 Driver Version: 580.82.09 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5070 Off | 00000000:02:00.0 On | N/A |
| 0% 48C P5 21W / 250W | 594MiB / 12227MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2046 G /usr/lib/xorg/Xorg 298MiB |
| 0 N/A N/A 2263 G /usr/bin/gnome-shell 47MiB |
| 0 N/A N/A 2738 G ...exec/xdg-desktop-portal-gnome 3MiB |
| 0 N/A N/A 3026 G ...9c6403d5790d1d1d682f48fb04598 142MiB |
| 0 N/A N/A 3480 G /proc/self/exe 50MiB |

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:58:59_PM_PDT_2025
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0
python3 --version
Python 3.12.3

Who can help?

No response
