fix(docs): explain converting Textual Inversions, using layer tokens, and prompt range syntax (#179)
ssube committed Mar 8, 2023
1 parent 30b08c6 commit 7800581
Showing 2 changed files with 72 additions and 3 deletions.
60 changes: 57 additions & 3 deletions docs/converting-models.md
@@ -14,6 +14,9 @@ weights, to the directories used by `diffusers` and on to the ONNX models used b
- [Figuring out which script produced the LoRA weights](#figuring-out-which-script-produced-the-lora-weights)
- [LoRA weights from cloneofsimo/lora](#lora-weights-from-cloneofsimolora)
- [LoRA weights from kohya-ss/sd-scripts](#lora-weights-from-kohya-sssd-scripts)
- [Converting Textual Inversion embeddings](#converting-textual-inversion-embeddings)
- [Figuring out what token a Textual Inversion uses](#figuring-out-what-token-a-textual-inversion-uses)
- [Figuring out how many layers a Textual Inversion uses](#figuring-out-how-many-layers-a-textual-inversion-uses)

## Conversion steps for each type of model

@@ -25,11 +28,14 @@ You can start from a diffusers directory, HuggingFace Hub repository, or an SD c
3. diffusers directory or LoRA weights from `cloneofsimo/lora` to...
4. ONNX models

Textual inversions can be converted directly to ONNX by merging them with the base model.

One current disadvantage of using ONNX is that LoRA weights must be merged with the base model before being converted,
so the final output is roughly the size of the base model. Hopefully this can be reduced in the future
(https://github.com/ssube/onnx-web/issues/213).

If you are using the Auto1111 web UI or another tool, you may not need to convert the models to ONNX. In that case,
you will not have an `extras.json` file and should skip the last step.

## Converting diffusers models

@@ -233,3 +239,51 @@ Make sure to set the `format` key and that it matches the format you used to exp
Based on docs in:

- https://github.com/kohya-ss/sd-scripts/blob/main/train_network_README-ja.md#%E3%83%9E%E3%83%BC%E3%82%B8%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%97%E3%83%88%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6

## Converting Textual Inversion embeddings

You can convert Textual Inversion embeddings by merging their weights and tokens into a copy of their base model.
The conversion script in `onnx-web` supports this directly, with no additional steps.

Textual Inversions may have more than one set of weights, which can be used and controlled separately. Some Textual
Inversions provide their own token, but you can set a custom token for any of them.
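
Conceptually, the merge appends each embedding vector as a new row of the text encoder's token embedding matrix and registers a matching token with the tokenizer. A minimal numpy sketch of that idea, with toy sizes and hypothetical names (`merge_embedding` is illustrative, not the actual onnx-web function):

```python
import numpy as np

def merge_embedding(embedding_matrix, vocab, token, vector):
    """Append one textual-inversion vector as a new token embedding row."""
    vocab[token] = embedding_matrix.shape[0]          # id of the new token
    return np.vstack([embedding_matrix, vector[None, :]])

# toy base model: 10 tokens with 768-dim embeddings
rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 768))
vocab = {f"tok-{i}": i for i in range(10)}

# merge a single-layer inversion under the token "<concept>"
concept = rng.normal(size=768)
weights = merge_embedding(weights, vocab, "<concept>", concept)

print(weights.shape)        # (11, 768)
print(vocab["<concept>"])   # 10
```

The merged model is then exported as usual, so the new token works like any built-in token afterwards.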

### Figuring out what token a Textual Inversion uses

The base token, without any layer numbers, should be printed to the logs with the string `found embedding for token`:

```none
[2023-03-08 04:54:00,234] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token <concept>: torch.Size([768])
[2023-03-08 04:54:01,624] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 1 tokens
[2023-03-08 04:54:01,814] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving tokenizer for textual inversion
[2023-03-08 04:54:01,859] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving text encoder for textual inversion
```

If you have set a custom token, that will be shown instead. If more than one token has been added, they will be
numbered following the pattern `base-N`, starting with 0.
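
The numbered tokens follow the same pattern shown in the logs; the generated names can be sketched in one line (the helper name is hypothetical):

```python
def layer_tokens(base, count):
    """Numbered layer tokens for a multi-layer Textual Inversion, starting at 0."""
    return [f"{base}-{i}" for i in range(count)]

print(layer_tokens("goblin", 4))
# ['goblin-0', 'goblin-1', 'goblin-2', 'goblin-3']
```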

### Figuring out how many layers a Textual Inversion uses

Textual Inversions produced by [the Stable Conceptualizer notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb)
only have a single layer, while many others have more than one.

The number of layers is shown in the server logs when the model is converted:

```none
[2023-03-08 04:54:00,234] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token <concept>: torch.Size([768])
[2023-03-08 04:54:01,624] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 1 tokens
[2023-03-08 04:54:01,814] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving tokenizer for textual inversion
[2023-03-08 04:54:01,859] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: saving text encoder for textual inversion
...
[2023-03-08 04:58:06,378] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: generating 74 layer tokens
[2023-03-08 04:58:06,379] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: found embedding for token ['goblin-0', 'goblin-1', 'goblin-2', 'goblin-3', 'goblin-4', 'goblin-5', 'goblin-6', 'goblin-7', 'goblin-8', 'goblin-9', 'goblin-10', 'goblin-11', 'goblin-12', 'goblin-13', 'goblin-14', 'goblin-15', 'goblin-16', 'goblin-17', 'goblin-18', 'goblin-19', 'goblin-20', 'goblin-21', 'goblin-22', 'goblin-23', 'goblin-24', 'goblin-25', 'goblin-26', 'goblin-27', 'goblin-28', 'goblin-29', 'goblin-30', 'goblin-31', 'goblin-32', 'goblin-33', 'goblin-34', 'goblin-35', 'goblin-36', 'goblin-37', 'goblin-38', 'goblin-39', 'goblin-40', 'goblin-41', 'goblin-42', 'goblin-43', 'goblin-44', 'goblin-45', 'goblin-46', 'goblin-47', 'goblin-48', 'goblin-49', 'goblin-50', 'goblin-51', 'goblin-52', 'goblin-53', 'goblin-54', 'goblin-55', 'goblin-56', 'goblin-57', 'goblin-58', 'goblin-59', 'goblin-60', 'goblin-61', 'goblin-62', 'goblin-63', 'goblin-64', 'goblin-65', 'goblin-66', 'goblin-67', 'goblin-68', 'goblin-69', 'goblin-70', 'goblin-71', 'goblin-72', 'goblin-73'] (*): torch.Size([74, 768])
[2023-03-08 04:58:07,685] INFO: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: added 74 tokens
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-0
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-1
[2023-03-08 04:58:07,874] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-2
[2023-03-08 04:58:07,875] DEBUG: MainProcess MainThread onnx_web.convert.diffusion.textual_inversion: embedding torch.Size([768]) vector for layer goblin-3
```

Figuring out the number of layers after the model has been converted currently requires the original tensor file
(https://github.com/ssube/onnx-web/issues/212).
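
If you do have the original tensor file, the layer count is the leading dimension of the embedding tensor. A hedged sketch of inspecting its contents (the `string_to_param` key is used by Auto1111-style embeddings; other formats may differ, and in practice the dict would come from `torch.load`, simulated here with numpy):

```python
import numpy as np

def count_layers(embedding):
    """Count layer vectors in a loaded Textual Inversion embedding dict.

    Auto1111-style files keep the tensor under ``string_to_param`` -> ``'*'``;
    single-layer diffusers-style files map the token directly to a vector.
    """
    if "string_to_param" in embedding:
        tensor = next(iter(embedding["string_to_param"].values()))
    else:
        tensor = next(iter(embedding.values()))
    # a 1-D vector is a single layer; a 2-D tensor stacks one row per layer
    return 1 if tensor.ndim == 1 else tensor.shape[0]

# simulated contents of a multi-layer embedding file
fake = {"string_to_param": {"*": np.zeros((74, 768))}}
print(count_layers(fake))  # 74
```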
15 changes: 15 additions & 0 deletions docs/user-guide.md
Original file line number Diff line number Diff line change
@@ -31,6 +31,7 @@ Please see [the server admin guide](server-admin.md) for details on how to confi
- [Model sources](#model-sources)
- [Downloading models from Civitai](#downloading-models-from-civitai)
- [Using a custom VAE](#using-a-custom-vae)
- [Using and controlling Textual Inversions](#using-and-controlling-textual-inversions)
- [Tabs](#tabs)
- [Txt2img tab](#txt2img-tab)
- [Scheduler parameter](#scheduler-parameter)
@@ -300,6 +301,20 @@ Some common VAE models include:
- https://huggingface.co/stabilityai/sd-vae-ft-mse
- https://huggingface.co/stabilityai/sd-vae-ft-mse-original

### Using and controlling Textual Inversions

You can use a Textual Inversion along with a diffusion model by including one or more of its tokens in your prompt.
Some Textual Inversions have only a single layer, while others have 75 or more.

You can provide more than one of the numbered layer tokens using the `base-{X,Y}` range syntax in your prompt. This
uses the Python range rules, so `X` is inclusive and `Y` is not. The range `autumn-{0,5}` will be expanded into the
tokens `autumn-0 autumn-1 autumn-2 autumn-3 autumn-4`. You can use the layer tokens individually, out of order, and
repeat some layers or omit them entirely. You can provide a step as the third parameter, which will skip layers:
`even-layers-{0,100,2}` will be expanded into
`even-layers-0 even-layers-2 even-layers-4 even-layers-6 ... even-layers-98`.
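
The expansion behaves exactly like a Python `range`; a hypothetical sketch of the idea (not the actual onnx-web prompt parser):

```python
import re

# matches base-{X,Y} or base-{X,Y,Z}, e.g. "autumn-{0,5}" (illustrative pattern)
RANGE = re.compile(r"([\w-]+)-\{(\d+),(\d+)(?:,(\d+))?\}")

def expand_ranges(prompt):
    """Expand base-{X,Y[,Z]} into numbered layer tokens; X inclusive, Y exclusive."""
    def repl(match):
        base, start, stop, step = match.groups()
        step = int(step) if step else 1
        return " ".join(f"{base}-{i}" for i in range(int(start), int(stop), step))
    return RANGE.sub(repl, prompt)

print(expand_ranges("a painting, autumn-{0,5}"))
# a painting, autumn-0 autumn-1 autumn-2 autumn-3 autumn-4
print(expand_ranges("even-layers-{0,10,2}"))
# even-layers-0 even-layers-2 even-layers-4 even-layers-6 even-layers-8
```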

The range syntax does not currently work when the Long Prompt Weighting pipeline is enabled.

## Tabs

### Txt2img tab
