4 changes: 2 additions & 2 deletions README.md
@@ -43,7 +43,7 @@ Web Interface, interactive Command Line Interface, and also serves as
the foundation for multiple commercial products.

**Quick links**: [[How to
Install](https://invoke-ai.github.io/InvokeAI/#installation)] [<a
Install](https://invoke-ai.github.io/InvokeAI/installation/INSTALLATION/)] [<a
href="https://discord.gg/ZmtBAhwWhy">Discord Server</a>] [<a
href="https://invoke-ai.github.io/InvokeAI/">Documentation and
Tutorials</a>] [<a
@@ -81,7 +81,7 @@ Table of Contents 📝
## Quick Start

For full installation and upgrade instructions, please see:
[InvokeAI Installation Overview](https://invoke-ai.github.io/InvokeAI/installation/)
[InvokeAI Installation Overview](https://invoke-ai.github.io/InvokeAI/installation/INSTALLATION/)

If upgrading from version 2.3, please read [Migrating a 2.3 root
directory to 3.0](#migrating-to-3) first.
Binary file modified docs/assets/nodes/groupsconditioning.png
Binary file modified docs/assets/nodes/groupsnoise.png
Binary file modified docs/assets/prompt-blending/blue-sphere-red-cube-hybrid.png
7 changes: 5 additions & 2 deletions docs/contributing/CONTRIBUTING.md
@@ -14,11 +14,14 @@ To join, just raise your hand on the InvokeAI Discord server (#dev-chat) or the
#### Development
If you’d like to help with development, please see our [development guide](contribution_guides/development.md). If you’re unfamiliar with contributing to open source projects, there is a tutorial contained within the development guide.

#### Nodes
If you’d like to help with nodes, please see our [nodes contribution guide](/nodes/contributingNodes). If you’re unfamiliar with contributing to open source projects, there is a tutorial contained within the development guide.

#### Documentation
If you’d like to help with documentation, please see our [documentation guide](contribution_guides/documenation.md).
If you’d like to help with documentation, please see our [documentation guide](contribution_guides/documentation.md).

#### Translation
If you'd like to help with translation, please see our [translation guide](docs/contributing/.contribution_guides/translation.md).
If you'd like to help with translation, please see our [translation guide](contribution_guides/translation.md).

#### Tutorials
Please reach out to @imic or @hipsterusername on [Discord](https://discord.gg/ZmtBAhwWhy) to help create tutorials for InvokeAI.
18 changes: 11 additions & 7 deletions docs/contributing/INVOCATIONS.md
@@ -270,9 +270,12 @@ new Invocation ready to be used.

![resize node editor](../assets/contributing/resize_node_editor.png)

# Advanced
## Contributing Nodes
Once you've created a Node, the next step is to share it with the community! The best way to do this is to submit a Pull Request to add the Node to the [Community Nodes](nodes/communityNodes) list. If you're not sure how to do that, take a look at our [contributing nodes overview](contributingNodes).

## Custom Input Fields
## Advanced

### Custom Input Fields

Now that you know how to create your own Invocations, let us dive into slightly
more advanced topics.
@@ -352,7 +355,7 @@ input field.
We will discuss the `Config` class in extra detail later in this guide and how
you can use it to make your Invocations more robust.

## Custom Output Types
### Custom Output Types

Like with custom inputs, sometimes you might find yourself needing custom
outputs that InvokeAI does not provide. We can easily set one up.
@@ -396,7 +399,7 @@ All set. We now have an output type that requires what we need to create a
blank_image. And if you noticed it, we even used the `Config` class to ensure
the fields are required.
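
The elided code above defines the output type itself; as a rough illustration, a minimal self-contained sketch in plain pydantic (class and field names are ours for illustration, not InvokeAI's actual definitions) could look like:

```python
from pydantic import BaseModel, Field

class BlankImageOutput(BaseModel):
    """Illustrative output type carrying what is needed to create a blank image."""

    width: int = Field(description="The width of the blank image")
    height: int = Field(description="The height of the blank image")
    color: str = Field(description="The fill color of the blank image")

    class Config:
        # Mark the fields as required in the generated schema.
        schema_extra = {"required": ["width", "height", "color"]}
```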

## Custom Configuration
### Custom Configuration

As you might have noticed when making inputs and outputs, we used a class called
`Config` from _pydantic_ to further customize them. Because our inputs and
@@ -492,7 +495,7 @@ later time.

# **[TODO]**

## Custom Components For Frontend
### Custom Components For Frontend

Every backend input type should have a corresponding frontend component so the
UI knows what to render when you use a particular field type.
@@ -513,7 +516,7 @@ now.

---

# OLD -- TO BE DELETED OR MOVED LATER
<!-- # OLD -- TO BE DELETED OR MOVED LATER

---

@@ -787,4 +790,5 @@ With the customization in place, the schema will now show these properties as
required, obviating the need for extensive null checks in client code.

See this `pydantic` issue for discussion on this solution:
<https://github.com/pydantic/pydantic/discussions/4577>
<https://github.com/pydantic/pydantic/discussions/4577> -->

208 changes: 0 additions & 208 deletions docs/features/NODES.md

This file was deleted.

163 changes: 45 additions & 118 deletions docs/features/PROMPTS.md
@@ -4,80 +4,6 @@ title: Prompting-Features

# :octicons-command-palette-24: Prompting-Features

## **Negative and Unconditioned Prompts**

Any words between a pair of square brackets will instruct Stable
Diffusion to attempt to ban the concept from the generated image. The
same effect is achieved by placing words in the "Negative Prompts"
textbox in the Web UI.

```text
this is a test prompt [not really] to make you understand [cool] how this works.
```

In the above statement, the words 'not really cool` will be ignored by Stable
Diffusion.

Here's a prompt that depicts what it does.

original prompt:

`#!bash "A fantastical translucent pony made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve"`

`#!bash parameters: steps=20, dimensions=512x768, CFG=7.5, Scheduler=k_euler_a, seed=1654590180`

<figure markdown>

![step1](../assets/negative_prompt_walkthru/step1.png)

</figure>

That image has a woman, so if we want the horse without a rider, we can
influence the image not to have a woman by putting [woman] in the prompt, like
this:

`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman]"`
(same parameters as above)

<figure markdown>

![step2](../assets/negative_prompt_walkthru/step2.png)

</figure>

That's nice - but say we also don't want the image to be quite so blue. We can
add "blue" to the list of negative prompts, so it's now [woman blue]:

`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue]"`
(same parameters as above)

<figure markdown>

![step3](../assets/negative_prompt_walkthru/step3.png)

</figure>

Getting close - but there's no sense in having a saddle when our horse doesn't
have a rider, so we'll add one more negative prompt: [woman blue saddle].

`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue saddle]"`
(same parameters as above)

<figure markdown>

![step4](../assets/negative_prompt_walkthru/step4.png)

</figure>

!!! notes "Notes about this feature:"

* The only requirement for words to be ignored is that they are in between a pair of square brackets.
* You can provide multiple words within the same bracket.
* You can provide multiple brackets with multiple words in different places of your prompt. That works just fine.
* To improve typical anatomy problems, you can add negative prompts like `[bad anatomy, extra legs, extra arms, extra fingers, poorly drawn hands, poorly drawn feet, disfigured, out of frame, tiling, bad art, deformed, mutated]`.

---

## **Prompt Syntax Features**

The InvokeAI prompting language has the following features:
@@ -102,9 +28,6 @@ The following syntax is recognised:
`a tall thin man (picking (apricots)1.3)1.1`. (`+` is equivalent to 1.1, `++`
is pow(1.1,2), `+++` is pow(1.1,3), etc; `-` means 0.9, `--` means pow(0.9,2),
etc.; the arithmetic is sketched just below this list.)
- attention also applies to `[unconditioning]` so
`a tall thin man picking apricots [(ladder)0.01]` will _very gently_ nudge SD
away from trying to draw the man on a ladder
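
The `+`/`-` shorthand above is plain exponential arithmetic. A minimal sketch (the real parser lives in InvokeAI's prompt-handling code; this only reproduces the math):

```python
def attention_weight(shorthand: str) -> float:
    """Convert a run of '+' or '-' characters into an attention multiplier."""
    if shorthand and set(shorthand) == {"+"}:
        return 1.1 ** len(shorthand)  # '+' -> 1.1, '++' -> 1.21, '+++' -> 1.331
    if shorthand and set(shorthand) == {"-"}:
        return 0.9 ** len(shorthand)  # '-' -> 0.9, '--' -> 0.81, '---' -> 0.729
    raise ValueError("expected a run of '+' or a run of '-'")
```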

You can use this to increase or decrease the amount of something. Starting from
this prompt of `a man picking apricots from a tree`, let's see what happens if
@@ -150,7 +73,7 @@ Or, alternatively, with more man:
| ---------------------------------------------- | ---------------------------------------------- | ---------------------------------------------- | ---------------------------------------------- |
| ![](../assets/prompt_syntax/mountain-man1.png) | ![](../assets/prompt_syntax/mountain-man2.png) | ![](../assets/prompt_syntax/mountain-man3.png) | ![](../assets/prompt_syntax/mountain-man4.png) |

### Blending between prompts
### Prompt Blending

- `("a tall thin man picking apricots", "a tall thin man picking pears").blend(1,1)`
- The existing prompt blending using `:<weight>` will continue to be supported -
@@ -168,6 +91,24 @@ Or, alternatively, with more man:
See the section below on "Prompt Blending" for more information about how this
works.

### Prompt Conjunction
Join multiple clauses together to create a conjoined prompt. Each clause will be passed to CLIP separately.

For example, the prompt:

```bash
"A mystical valley surround by towering granite cliffs, watercolor, warm"
```

can be used with `.and()`:
```bash
("A mystical valley", "surround by towering granite cliffs", "watercolor", "warm").and()
```

Each will give you different results - try them out and see what you prefer!
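
A reasonable mental model for what `.and()` does, sketched below under the assumption that each clause is encoded by CLIP on its own and the resulting embeddings are simply concatenated along the token axis (the actual implementation may differ):

```python
import torch

def conjoin(clause_embeddings: list[torch.Tensor]) -> torch.Tensor:
    # Each clause is encoded separately by CLIP to shape (1, tokens, dim);
    # conjunction joins the token sequences into one longer conditioning.
    return torch.cat(clause_embeddings, dim=1)
```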



### Cross-Attention Control ('prompt2prompt')

Sometimes an image you generate is almost right, and you just want to change one
@@ -190,7 +131,7 @@ For example, consider the prompt `a cat.swap(dog) playing with a ball in the for

- For multiple word swaps, use parentheses: `a (fluffy cat).swap(barking dog) playing with a ball in the forest`.
- To swap a comma, use quotes: `a ("fluffy, grey cat").swap("big, barking dog") playing with a ball in the forest`.
- Supports options `t_start` and `t_end` (each 0-1) loosely corresponding to bloc97's `prompt_edit_tokens_start/_end` but with the math swapped to make it easier to
- Supports options `t_start` and `t_end` (each 0-1) loosely corresponding to [bloc97's](https://github.com/bloc97/CrossAttentionControl) `prompt_edit_tokens_start/_end` but with the math swapped to make it easier to
intuitively understand. `t_start` and `t_end` are used to control on which steps cross-attention control should run. With the default values `t_start=0` and `t_end=1`, cross-attention control is active on every step of image generation. Other values can be used to turn cross-attention control off for part of the image generation process.
- For example, if doing a diffusion with 10 steps for the prompt is `a cat.swap(dog, t_start=0.3, t_end=1.0) playing with a ball in the forest`, the first 3 steps will be run as `a cat playing with a ball in the forest`, while the last 7 steps will run as `a dog playing with a ball in the forest`, but the pixels that represent `dog` will be locked to the pixels that would have represented `cat` if the `cat` prompt had been used instead.
- Conversely, for `a cat.swap(dog, t_start=0, t_end=0.7) playing with a ball in the forest`, the first 7 steps will run as `a dog playing with a ball in the forest` with the pixels that represent `dog` locked to the same pixels that would have represented `cat` if the `cat` prompt was being used instead. The final 3 steps will just run `a cat playing with a ball in the forest`.
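
The step arithmetic in the two examples above can be sketched as follows (an illustration of the semantics, not InvokeAI's actual implementation):

```python
def cross_attention_active_steps(num_steps: int, t_start: float, t_end: float) -> list[int]:
    """Return the (0-indexed) steps on which cross-attention control runs."""
    return [i for i in range(num_steps) if t_start <= i / num_steps < t_end]

# 10 steps, t_start=0.3, t_end=1.0: off for the first 3 steps, on for the last 7.
assert cross_attention_active_steps(10, 0.3, 1.0) == [3, 4, 5, 6, 7, 8, 9]
# 10 steps, t_start=0, t_end=0.7: on for the first 7 steps, off for the final 3.
assert cross_attention_active_steps(10, 0.0, 0.7) == [0, 1, 2, 3, 4, 5, 6]
```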
@@ -201,7 +142,7 @@ Prompt2prompt `.swap()` is not compatible with xformers, which will be temporari
[bloc97's colab](https://github.com/bloc97/CrossAttentionControl).

### Escaping parantheses () and speech marks ""
### Escaping parentheses () and speech marks ""

If the model you are using has parentheses () or speech marks "" as part of its
syntax, you will need to "escape" these using a backslash, so that `(my_keyword)`
@@ -212,37 +153,31 @@ the parentheses as part of the prompt syntax and it will get confused.
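
For example, an illustrative prompt with the special characters escaped so they are read as literal text rather than syntax:

```bash
a storefront with a sign reading \(closed\) painted in \"gothic\" lettering
```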

## **Prompt Blending**

You may blend together different sections of the prompt to explore the AI's
You may blend together prompts to explore the AI's
latent semantic space and generate interesting (and often surprising!)
variations. The syntax is:

```bash
blue sphere:0.25 red cube:0.75 hybrid
("prompt #1", "prompt #2").blend(0.25, 0.75)
```

This will tell the sampler to blend 25% of the concept of a blue sphere with 75%
of the concept of a red cube. The blend weights can use any combination of
integers and floating point numbers, and they do not need to add up to 1.
Everything to the left of the `:XX` up to the previous `:XX` is used for
merging, so the overall effect is:

```bash
0.25 * "blue sphere" + 0.75 * "white duck" + hybrid
```
This will tell the sampler to blend 25% of the concept of prompt #1 with 75%
of the concept of prompt #2. It is recommended to keep the sum of the weights around 1.0, but interesting things might happen if you go outside of this range.
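
Conceptually, blending is a weighted combination of the per-prompt conditioning. A minimal sketch, assuming each prompt has already been encoded by CLIP into a conditioning tensor (the real implementation may also renormalize the result):

```python
import torch

def blend(conditionings: list[torch.Tensor], weights: list[float]) -> torch.Tensor:
    # ("prompt #1", "prompt #2").blend(0.25, 0.75) ~ 0.25 * cond1 + 0.75 * cond2
    return sum(w * c for w, c in zip(weights, conditionings))
```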

Because you are exploring the "mind" of the AI, the AI's way of mixing two
concepts may not match yours, leading to surprising effects. To illustrate, here
are three images generated using various combinations of blend weights. As
usual, unless you fix the seed, the prompts will give you different results each
time you run them.

<figure markdown>
Let's examine how this affects image generation results:

### "blue sphere, red cube, hybrid"

</figure>
```bash
"blue sphere, red cube, hybrid"
```

This example doesn't use melding at all and represents the default way of mixing
This example doesn't use blending at all and represents the default way of mixing
concepts.

<figure markdown>
@@ -251,55 +186,47 @@ concepts.

</figure>

It's interesting to see how the AI expressed the concept of "cube" as the four
quadrants of the enclosing frame. If you look closely, there is depth there, so
the enclosing frame is actually a cube.
It's interesting to see how the AI expressed the concept of "cube" within the sphere. If you look closely, there is depth there, so the enclosing frame is actually a cube.

<figure markdown>

### "blue sphere:0.25 red cube:0.75 hybrid"
```bash
("blue sphere", "red cube").blend(0.25, 0.75)
```

![blue-sphere-25-red-cube-75](../assets/prompt-blending/blue-sphere-0.25-red-cube-0.75-hybrid.png)

</figure>

Now that's interesting. We get neither a blue sphere nor a red cube, but a red
sphere embedded in a brick wall, which represents a melding of concepts within
the AI's "latent space" of semantic representations. Where is Ludwig
Wittgenstein when you need him?
Now that's interesting. We get an image that resembles a red cube, with a hint of blue shadows, which represents a melding of concepts within the AI's "latent space" of semantic representations.

<figure markdown>

### "blue sphere:0.75 red cube:0.25 hybrid"
```bash
("blue sphere", "red cube").blend(0.75, 0.25)
```

![blue-sphere-75-red-cube-25](../assets/prompt-blending/blue-sphere-0.75-red-cube-0.25-hybrid.png)

</figure>

Definitely more blue-spherey. The cube is gone entirely, but it's really cool
abstract art.
Definitely more blue-spherey.

<figure markdown>

### "blue sphere:0.5 red cube:0.5 hybrid"

![blue-sphere-5-red-cube-5-hybrid](../assets/prompt-blending/blue-sphere-0.5-red-cube-0.5-hybrid.png)

```bash
("blue sphere", "red cube").blend(0.5, 0.5)
```
</figure>

Whoa...! I see blue and red, but no spheres or cubes. Is the word "hybrid"
summoning up the concept of some sort of scifi creature? Let's find out.

<figure markdown>
![blue-sphere-5-red-cube-5-hybrid](../assets/prompt-blending/blue-sphere-0.5-red-cube-0.5-hybrid.png)
</figure>

### "blue sphere:0.5 red cube:0.5"

![blue-sphere-5-red-cube-5](../assets/prompt-blending/blue-sphere-0.5-red-cube-0.5.png)
Whoa...! I see blue and red, and if I squint, spheres and cubes.

</figure>

Indeed, removing the word "hybrid" produces an image that is more like what we'd
expect.

## Dynamic Prompts

4 changes: 0 additions & 4 deletions docs/features/index.md
@@ -30,10 +30,6 @@ image output.
### * [Image-to-Image Guide](IMG2IMG.md)
Use a seed image to build new creations in the CLI.

### * [Generating Variations](VARIATIONS.md)
Have an image you like and want to generate many more like it? Variations
are the ticket.

## Model Management

### * [Model Installation](../installation/050_INSTALLING_MODELS.md)
27 changes: 27 additions & 0 deletions docs/help/diffusion.md
@@ -0,0 +1,27 @@
Taking the time to understand the diffusion process will help you use InvokeAI more effectively.

Stable Diffusion works with two main kinds of data: images and latents.

Image space represents images in pixel form that you look at. Latent space represents compressed inputs. It’s in latent space that Stable Diffusion processes images. A VAE (Variational Auto Encoder) is responsible for compressing and encoding inputs into latent space, as well as decoding outputs back into image space.

To fully understand the diffusion process, we need to understand a few more terms: U-Net, CLIP, and conditioning.

A U-Net is a model trained on a large number of latent images with known amounts of random noise added. This means that the U-Net can be given a slightly noisy image and predict the pattern of noise that must be subtracted from it in order to recover the original.

CLIP is a model that tokenizes and encodes text into conditioning. This conditioning guides the model during the denoising steps to produce a new image.
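
As a concrete illustration using Hugging Face `transformers` (the model name below is the text encoder used by Stable Diffusion 1.x and is an assumption for this sketch):

```python
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer("a mystical valley, watercolor, warm", padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt")
conditioning = text_encoder(tokens.input_ids).last_hidden_state  # shape (1, 77, 768)
```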

The U-Net and CLIP work together during the image generation process at each denoising step, with the U-Net removing noise in such a way that the result is similar to images in the U-Net’s training set, while CLIP guides the U-Net towards creating images that are most similar to the prompt.


When you generate an image using text-to-image, multiple steps occur in latent space:
1. Random noise is generated at the chosen height and width. The noise’s characteristics are dictated by the seed. This noise tensor is created in latent space. We’ll call this noise A.
2. Using a model’s U-Net, a noise predictor examines noise A along with the conditioning (your prompt, tokenized and encoded by CLIP), and generates its own noise tensor predicting what the final image might look like in latent space. We’ll call this noise B.
3. Noise B is subtracted from noise A in an attempt to create a latent image consistent with the prompt. Steps 2 and 3 are repeated for the number of sampler steps chosen.
4. The VAE decodes the final latent image from latent space into image space.
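
Those four steps map closely onto code. A compressed sketch in the style of the Hugging Face `diffusers` API (omitting details such as classifier-free guidance and initial noise scaling):

```python
import torch

def text_to_image(unet, scheduler, vae, conditioning, height, width, steps, seed):
    # Step 1: random latent noise ("noise A"); SD 1.x latents are 8x smaller than pixels.
    generator = torch.Generator().manual_seed(seed)
    latents = torch.randn((1, 4, height // 8, width // 8), generator=generator)

    # Steps 2-3: repeatedly predict noise ("noise B") and subtract it.
    scheduler.set_timesteps(steps)
    for t in scheduler.timesteps:
        noise_b = unet(latents, t, encoder_hidden_states=conditioning).sample
        latents = scheduler.step(noise_b, t, latents).prev_sample

    # Step 4: decode the final latents from latent space back to image space.
    return vae.decode(latents / vae.config.scaling_factor).sample
```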

Image-to-image is a similar process, with only step 1 being different:
1. The input image is encoded from image space into latent space by the VAE. Noise is then added to the input latent image. Denoising Strength dictates how many noise steps are added, and the amount of noise added at each step. A Denoising Strength of 0 means there are 0 steps and no noise is added, resulting in an unchanged image, while a Denoising Strength of 1 results in the image being completely replaced with noise and a full set of denoising steps being performed. The process is then the same as steps 2-4 in the text-to-image process.
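
The Denoising Strength arithmetic can be sketched the same way (again `diffusers`-style, with illustrative names):

```python
import torch

def prepare_img2img_latents(vae, scheduler, image, strength, steps, seed):
    # Encode the input image from image space into latent space.
    latents = vae.encode(image).latent_dist.sample() * vae.config.scaling_factor

    # strength=0.0 -> no noise steps, image unchanged;
    # strength=1.0 -> fully replaced with noise, full set of denoising steps.
    scheduler.set_timesteps(steps)
    num_noise_steps = min(int(steps * strength), steps)
    timesteps = scheduler.timesteps[steps - num_noise_steps:]

    if num_noise_steps > 0:
        noise = torch.randn(latents.shape,
                            generator=torch.Generator().manual_seed(seed))
        latents = scheduler.add_noise(latents, noise, timesteps[:1])
    return latents, timesteps  # then continue with steps 2-4 of text-to-image
```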

Furthermore, a model provides the CLIP prompt tokenizer, the VAE, and a U-Net (where noise prediction occurs given a prompt and initial noise tensor).

A noise scheduler (e.g., DPM++ 2M Karras) schedules the subtraction of noise from the latent image across the sampler steps chosen (step 3 above). Less noise is usually subtracted at higher sampler steps.
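
For instance, the schedule behind "DPM++ 2M Karras" spaces its noise levels so that successive steps remove less and less noise. A sketch of the Karras et al. (2022) formula (the sigma range shown is an illustrative Stable Diffusion default):

```python
import torch

def karras_sigmas(steps: int, sigma_min=0.029, sigma_max=14.6, rho=7.0):
    # Noise level per step; the gaps between consecutive sigmas
    # (the amount of noise removed) shrink as sampling progresses.
    ramp = torch.linspace(0, 1, steps)
    min_inv, max_inv = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return (max_inv + ramp * (min_inv - max_inv)) ** rho
```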