unload_textual_inversion doesn't unload tokenizer_2 & text_encoder_2 #6974

@H3zi

Describe the bug

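unload_textual_inversion only unloads the embeddings from the pipeline's tokenizer and text_encoder. Tokens and embeddings loaded into tokenizer_2 and text_encoder_2 stay in place.
I can submit a PR if you like.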

Reproduction

Following the inference example from the advanced LoRA training script:

# assumes `pipe` is an already-loaded pipeline with two text encoders (e.g. SDXL)
# and `embedding_path` points to the trained embeddings file
from safetensors.torch import load_file

# load embeddings to the text encoders
state_dict = load_file(embedding_path)

# note: the embeddings are loaded under the tokens <s0><s1>; "TOK" was only a placeholder, and training used the newly initialized tokens <s0><s1>
# load embeddings of text_encoder 1 (CLIP ViT-L/14)
pipe.load_textual_inversion(state_dict["clip_l"], token=["<s0>", "<s1>"], text_encoder=pipe.text_encoder, tokenizer=pipe.tokenizer)
# load embeddings of text_encoder 2 (CLIP ViT-G/14)
pipe.load_textual_inversion(state_dict["clip_g"], token=["<s0>", "<s1>"], text_encoder=pipe.text_encoder_2, tokenizer=pipe.tokenizer_2)

Then unloading and loading the same embeddings again:

pipe.unload_textual_inversion()
# load embeddings of text_encoder 1 (CLIP ViT-L/14)
pipe.load_textual_inversion(state_dict["clip_l"], token=["<s0>", "<s1>"], text_encoder=pipe.text_encoder, tokenizer=pipe.tokenizer)
# load embeddings of text_encoder 2 (CLIP ViT-G/14)
pipe.load_textual_inversion(state_dict["clip_g"], token=["<s0>", "<s1>"], text_encoder=pipe.text_encoder_2, tokenizer=pipe.tokenizer_2)

The second load into tokenizer_2 raises the following exception:

ValueError: Token <s0> already in tokenizer vocabulary. Please choose a different token name or remove <s0> and embedding from the tokenizer and text encoder.
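
To check which tokenizer still holds the tokens after calling unload_textual_inversion(), a small diagnostic along these lines may help. This is only a sketch: the helper name leftover_tokens is made up for illustration, it assumes the tokenizers expose get_added_vocab() as transformers tokenizers do, and the stand-in objects at the bottom exist only so the snippet runs without loading a model.

```python
from types import SimpleNamespace


def leftover_tokens(pipe, tokens):
    """Hypothetical helper: map each tokenizer attribute on `pipe` to the
    given tokens that are still registered in its added vocabulary."""
    leftovers = {}
    for name in ("tokenizer", "tokenizer_2"):
        tokenizer = getattr(pipe, name, None)
        if tokenizer is None:
            # Pipelines with a single text encoder have no tokenizer_2.
            continue
        added = tokenizer.get_added_vocab()  # dict of added token -> id
        found = [t for t in tokens if t in added]
        if found:
            leftovers[name] = found
    return leftovers


# Stand-in objects that mimic the state reported in this issue:
# tokenizer was cleaned up, tokenizer_2 still holds the tokens.
fake_pipe = SimpleNamespace(
    tokenizer=SimpleNamespace(get_added_vocab=lambda: {}),
    tokenizer_2=SimpleNamespace(
        get_added_vocab=lambda: {"<s0>": 49408, "<s1>": 49409}
    ),
)
print(leftover_tokens(fake_pipe, ["<s0>", "<s1>"]))
# prints {'tokenizer_2': ['<s0>', '<s1>']}
```

With a real pipeline, calling leftover_tokens(pipe, ["<s0>", "<s1>"]) right after pipe.unload_textual_inversion() should show whether tokenizer_2 was left untouched; an empty dict would mean both tokenizers were cleaned.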

Logs

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/lora_advanced/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/miniconda3/envs/lora_advanced/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/ubuntu/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/ubuntu/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/ubuntu/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/ubuntu/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/ubuntu/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/dev/tailored_generation/tailored_generation/finetune/eval.py", line 76, in <module>
    pipe.load_textual_inversion(state_dict["clip_g"], token=["<s0>", "<s1>"], text_encoder=pipe.text_encoder_2, tokenizer=pipe.tokenizer_2)
  File "/home/ubuntu/miniconda3/envs/lora_advanced/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/lora_advanced/lib/python3.10/site-packages/diffusers/loaders/textual_inversion.py", line 402, in load_textual_inversion
    tokens, embeddings = self._retrieve_tokens_and_embeddings(tokens, state_dicts, tokenizer)
  File "/home/ubuntu/miniconda3/envs/lora_advanced/lib/python3.10/site-packages/diffusers/loaders/textual_inversion.py", line 229, in _retrieve_tokens_and_embeddings
    raise ValueError(
ValueError: Token <s0> already in tokenizer vocabulary. Please choose a different token name or remove <s0> and embedding from the tokenizer and text encoder.

System Info

  • diffusers version: 0.26.2
  • Platform: Linux-5.15.0-1036-aws-x86_64-with-glibc2.31
  • Python version: 3.10.13
  • PyTorch version (GPU?): 2.2.0+cu121 (True)
  • Huggingface_hub version: 0.20.3
  • Transformers version: 4.37.2
  • Accelerate version: 0.27.0
  • xFormers version: not installed
  • Using GPU in script?: NVIDIA A10G
  • Using distributed or parallel set-up in script?: no

Who can help?

@sayakpaul

Labels: bug
