Represent learnt concept in textual inversion with more than one token #369

@Luvata

Description

Describe the bug

As we discussed in #266:

The original textual inversion implementation supports using more than one embedding vector to represent the learnt concept. In the current implementation, if we just add a single new token to the vocab and extend the CLIP token embedding matrix, the concept is represented by only one vector.

What would be the best way to support this? cc @patil-suraj
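For illustration, here is a minimal sketch of one possible approach: register several placeholder tokens for a single concept and give each its own row in the text encoder's embedding matrix. The token names `<concept-i>`, the `num_vectors` count, and the initializer word `"painting"` are hypothetical choices for this sketch, not an existing diffusers API.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Hypothetical: split one concept across several placeholder tokens,
# each getting its own row in the token embedding matrix.
num_vectors = 4  # assumed hyperparameter
placeholder_tokens = [f"<concept-{i}>" for i in range(num_vectors)]

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

num_added = tokenizer.add_tokens(placeholder_tokens)
assert num_added == num_vectors, "some placeholder tokens already in the vocab"

# Grow the embedding matrix so every new token has its own row.
text_encoder.resize_token_embeddings(len(tokenizer))
placeholder_ids = tokenizer.convert_tokens_to_ids(placeholder_tokens)

# Initialize the new rows from a coarse initializer word (illustrative);
# encode() is used so we get the word's actual BPE token id.
init_id = tokenizer.encode("painting", add_special_tokens=False)[0]
with torch.no_grad():
    embeds = text_encoder.get_input_embeddings().weight
    for token_id in placeholder_ids:
        embeds[token_id] = embeds[init_id].clone()

# Training prompts would then contain the whole group, e.g.
# "a photo of <concept-0> <concept-1> <concept-2> <concept-3>".
```

During training, only the rows at `placeholder_ids` would receive gradient updates while the rest of the embedding matrix stays frozen, just as in the single-vector setup; prompts would need to substitute the whole token group wherever the concept appears.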

Reproduction

No response

Logs

No response

System Info

diffusers v0.2.4

Labels

enhancement (New feature or request)
