
[Bug] TypeError: CLIPTextModel.__init__() got unexpected keyword argument 'offload_state_dict' (Latest Diffs/Transf) #12480

@lingbin1964


Describe the bug


1. Description of the Bug

When attempting to load the StableDiffusionControlNetImg2ImgPipeline using from_pretrained, the process fails with a TypeError because the diffusers library passes an unrecognized argument (offload_state_dict) to the CLIPTextModel constructor from the transformers library.

This issue occurs on the latest released versions of both packages and persists even after:

- Clearing the local Hugging Face cache (~/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5).
- Explicitly forcing the pipeline to use CPU-safe arguments (map_location='cpu').
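
As a possible (untested) workaround sketch: since the stray keyword appears to be injected only when diffusers loads the transformers sub-models itself, pre-loading the text encoder with transformers and handing it to the pipeline should bypass the failing code path. The safety checker is also a transformers model and may need the same treatment; it is simply disabled in this sketch.

# Hedged workaround sketch (untested): pre-load the transformers components so that
# the pipeline's from_pretrained never forwards 'offload_state_dict' to their constructors.
import torch
from transformers import CLIPTextModel
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

text_encoder = CLIPTextModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    subfolder="text_encoder",
    torch_dtype=torch.float32,
)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble",
    torch_dtype=torch.float32,
)

pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    text_encoder=text_encoder,   # pass the pre-built component instead of letting diffusers load it
    safety_checker=None,         # the safety checker is also a transformers model; skipped here
    torch_dtype=torch.float32,
)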

2. Reproduction Steps (Minimal Example)

The bug is triggered when attempting to load the main pipeline:

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# 1. Initialize ControlNet (example model)
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble",
    torch_dtype=torch.float32,
    map_location="cpu",
)

# 2. Attempt to load the main pipeline, which fails on the text encoder (CLIPTextModel)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float32,
    map_location="cpu",
)

3. Error Traceback

Loading pipeline components...:  17%|███████████ | 1/6 [00:00<00:00, 16.31it/s]
Traceback (most recent call last):
  ...
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/diffusers/pipelines/pipeline_loading_utils.py", line 849, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4974, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: CLIPTextModel.__init__() got an unexpected keyword argument 'offload_state_dict'
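
The TypeError itself is just Python rejecting an unknown keyword in the constructor call, so it can presumably be reproduced in isolation with nothing but transformers installed (untested sketch; it assumes only that CLIPTextModel.__init__ accepts a config and nothing else):

# Hypothetical isolation of the failure: passing the same stray keyword directly to the
# constructor should raise the identical TypeError, before any weights are involved.
from transformers import CLIPTextConfig, CLIPTextModel

config = CLIPTextConfig()  # default config; no pretrained weights are downloaded
try:
    CLIPTextModel(config, offload_state_dict=False)
except TypeError as err:
    print(err)  # CLIPTextModel.__init__() got an unexpected keyword argument 'offload_state_dict'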

Reproduction

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Define a minimal environment for reproduction.
# The bug is a software incompatibility, so using the CPU ensures anyone can reproduce it.
DEVICE = torch.device("cpu")
DTYPE = torch.float32

# Model names required to trigger the loading path
MODEL_NAME = "runwayml/stable-diffusion-v1-5"
CONTROLNET_NAME = "lllyasviel/sd-controlnet-scribble"

# --- Reproduction start ---

# 1. Load the ControlNet model (required before the pipeline load)
controlnet = ControlNetModel.from_pretrained(
    CONTROLNET_NAME,
    torch_dtype=DTYPE,
).to(DEVICE)

# 2. Initialize the main pipeline.
# This is the line that triggers the TypeError due to the package incompatibility.
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    MODEL_NAME,
    controlnet=controlnet,
    torch_dtype=DTYPE,
)

print("If this line prints, the bug is fixed.")

Logs

(ai_refiner_venv) lingbin@BANGKOK:~/Desktop/gemini$ python3 ai_refiner_gpu_optimized.py \
  --input ./drawings_test \
  --output ./refined_art \
  --prompt "a cute cartoon drawing of a cat with a thick black outline, volumetric lighting, on a clean white background" \
  --detector scribble
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/mediapipe_face/mediapipe_face_common.py:7: UserWarning: The module 'mediapipe' is not installed. The package will have limited functionality. Please install it using the command: pip install 'mediapipe'
  warnings.warn(
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/timm/models/registry.py:4: FutureWarning: Importing from timm.models.registry is deprecated, please import via timm.models
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", FutureWarning)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_5m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_11m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_11m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_384 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_384. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_512 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_512. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
[INFO] Running on CPU (Fixed Args) (torch_dtype=torch.float32)
model_index.json: 100%|█████████████████████████████████████████████████████████████████████████████| 541/541 [00:00<00:00, 4.22MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████| 472/472 [00:00<00:00, 1.67MB/s]
preprocessor_config.json: 100%|█████████████████████████████████████████████████████████████████████| 342/342 [00:00<00:00, 2.07MB/s]
scheduler_config.json: 100%|████████████████████████████████████████████████████████████████████████| 308/308 [00:00<00:00, 2.69MB/s]
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████| 806/806 [00:00<00:00, 9.55MB/s]
merges.txt: 525kB [00:00, 14.8MB/s]                                                                        | 0.00/308 [00:00<?, ?B/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 617/617 [00:00<00:00, 3.63MB/s]
vocab.json: 1.06MB [00:00, 19.8MB/s]████████████                                                      | 4/13 [00:01<00:02,  3.75it/s]
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████| 547/547 [00:00<00:00, 971kB/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 743/743 [00:00<00:00, 3.18MB/s]
vae/diffusion_pytorch_model.safetensors: 100%|████████████████████████████████████████████████████| 335M/335M [00:12<00:00, 26.7MB/s]
text_encoder/model.safetensors: 100%|█████████████████████████████████████████████████████████████| 492M/492M [00:21<00:00, 23.2MB/s]
unet/diffusion_pytorch_model.safetensors: 100%|██████████████████████████████████████████████████| 3.44G/3.44G [00:27<00:00, 124MB/s]
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████████████████████| 13/13 [00:29<00:00,  2.29s/it]
Keyword arguments {'map_location': 'cpu'} are not expected by StableDiffusionControlNetImg2ImgPipeline and will be ignored., 207MB/s]
Loading pipeline components...:   0%|                                                                          | 0/6 [00:00<?, ?it/s]`torch_dtype` is deprecated! Use `dtype` instead!
Loading pipeline components...:  17%|███████████                                                       | 1/6 [00:00<00:00, 16.31it/s]
Traceback (most recent call last):
  File "/home/lingbin/Desktop/gemini/ai_refiner_gpu_optimized.py", line 168, in <module>
    main()
  File "/home/lingbin/Desktop/gemini/ai_refiner_gpu_optimized.py", line 81, in main
    pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/diffusers/pipelines/pipeline_utils.py", line 1025, in from_pretrained
    loaded_sub_model = load_sub_model(
                       ^^^^^^^^^^^^^^^
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/diffusers/pipelines/pipeline_loading_utils.py", line 849, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lingbin/Desktop/ai_refiner_venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4974, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: CLIPTextModel.__init__() got an unexpected keyword argument 'offload_state_dict'

System Info

4. System / Dependency Info (Crucial)

This issue occurs with the following package versions, indicating an internal compatibility failure between the latest released versions of diffusers and transformers (a quick programmatic version check is included after the list):

Name: diffusers
Version: 0.35.1
...

Name: transformers
Version: 4.57.0
...

Name: accelerate
Version: 1.10.1
...

Name: torch
Version: 2.8.0
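
For completeness, the same version combination can be confirmed from inside the failing virtual environment with a trivial check (this snippet assumes only that the four packages are importable):

# Print the installed versions of the packages involved in the incompatibility.
import accelerate
import diffusers
import torch
import transformers

for pkg in (diffusers, transformers, accelerate, torch):
    print(f"{pkg.__name__}=={pkg.__version__}")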

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.3 LTS
Release: 24.04
Codename: noble
Kernel: 6.14.0-33-generic
GPU: NVIDIA GeForce RTX 5070
| NVIDIA-SMI 580.82.09 Driver Version: 580.82.09 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5070 Off | 00000000:02:00.0 On | N/A |
| 0% 48C P5 21W / 250W | 594MiB / 12227MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2046 G /usr/lib/xorg/Xorg 298MiB |
| 0 N/A N/A 2263 G /usr/bin/gnome-shell 47MiB |
| 0 N/A N/A 2738 G ...exec/xdg-desktop-portal-gnome 3MiB |
| 0 N/A N/A 3026 G ...9c6403d5790d1d1d682f48fb04598 142MiB |
| 0 N/A N/A 3480 G /proc/self/exe 50MiB |

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:58:59_PM_PDT_2025
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0
python3 --version
Python 3.12.3

Who can help?

No response
