4 changes: 2 additions & 2 deletions README.md
@@ -43,7 +43,7 @@ Web Interface, interactive Command Line Interface, and also serves as
the foundation for multiple commercial products.

**Quick links**: [[How to
Install](https://invoke-ai.github.io/InvokeAI/#installation)] [<a
Install](https://invoke-ai.github.io/InvokeAI/installation/INSTALLATION/)] [<a
href="https://discord.gg/ZmtBAhwWhy">Discord Server</a>] [<a
href="https://invoke-ai.github.io/InvokeAI/">Documentation and
Tutorials</a>] [<a
@@ -81,7 +81,7 @@ Table of Contents 📝
## Quick Start

For full installation and upgrade instructions, please see:
[InvokeAI Installation Overview](https://invoke-ai.github.io/InvokeAI/installation/)
[InvokeAI Installation Overview](https://invoke-ai.github.io/InvokeAI/installation/INSTALLATION/)

If upgrading from version 2.3, please read [Migrating a 2.3 root
directory to 3.0](#migrating-to-3) first.
Binary file modified docs/assets/nodes/groupsconditioning.png
Binary file modified docs/assets/nodes/groupsnoise.png
Binary file modified docs/assets/prompt-blending/blue-sphere-red-cube-hybrid.png
7 changes: 5 additions & 2 deletions docs/contributing/CONTRIBUTING.md
@@ -14,11 +14,14 @@ To join, just raise your hand on the InvokeAI Discord server (#dev-chat) or the
#### Development
If you’d like to help with development, please see our [development guide](contribution_guides/development.md). If you’re unfamiliar with contributing to open source projects, there is a tutorial contained within the development guide.

#### Nodes
If you’d like to help with nodes, please see our [nodes contribution guide](/nodes/contributingNodes). If you’re unfamiliar with contributing to open source projects, there is a tutorial contained within the development guide.

#### Documentation
If you’d like to help with documentation, please see our [documentation guide](contribution_guides/documenation.md).
If you’d like to help with documentation, please see our [documentation guide](contribution_guides/documentation.md).

#### Translation
If you'd like to help with translation, please see our [translation guide](docs/contributing/.contribution_guides/translation.md).
If you'd like to help with translation, please see our [translation guide](contribution_guides/translation.md).

#### Tutorials
Please reach out to @imic or @hipsterusername on [Discord](https://discord.gg/ZmtBAhwWhy) to help create tutorials for InvokeAI.
18 changes: 11 additions & 7 deletions docs/contributing/INVOCATIONS.md
@@ -270,9 +270,12 @@ new Invocation ready to be used.

![resize node editor](../assets/contributing/resize_node_editor.png)

# Advanced
## Contributing Nodes
Once you've created a Node, the next step is to share it with the community! The best way to do this is to submit a Pull Request to add the Node to the [Community Nodes](nodes/communityNodes) list. If you're not sure how to do that, take a look at our [contributing nodes overview](contributingNodes).

## Custom Input Fields
## Advanced

### Custom Input Fields

Now that you know how to create your own Invocations, let us dive into slightly
more advanced topics.
@@ -352,7 +355,7 @@ input field.
We will discuss the `Config` class in extra detail later in this guide and how
you can use it to make your Invocations more robust.

## Custom Output Types
### Custom Output Types

Like with custom inputs, sometimes you might find yourself needing custom
outputs that InvokeAI does not provide. We can easily set one up.
@@ -396,7 +399,7 @@ All set. We now have an output type that requires what we need to create a
blank_image. And if you noticed it, we even used the `Config` class to ensure
the fields are required.
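
The elided code above defines the output type itself; as a rough illustration, a minimal self-contained sketch in plain pydantic (class and field names are ours for illustration, not InvokeAI's actual definitions) could look like:

```python
from pydantic import BaseModel, Field

class BlankImageOutput(BaseModel):
    """Illustrative output type carrying what is needed to create a blank image."""

    width: int = Field(description="The width of the blank image")
    height: int = Field(description="The height of the blank image")
    color: str = Field(description="The fill color of the blank image")

    class Config:
        # Mark the fields as required in the generated schema.
        schema_extra = {"required": ["width", "height", "color"]}
```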

## Custom Configuration
### Custom Configuration

As you might have noticed when making inputs and outputs, we used a class called
`Config` from _pydantic_ to further customize them. Because our inputs and
@@ -492,7 +495,7 @@ later time.

# **[TODO]**

## Custom Components For Frontend
### Custom Components For Frontend

Every backend input type should have a corresponding frontend component so the
UI knows what to render when you use a particular field type.
@@ -513,7 +516,7 @@ now.

---

# OLD -- TO BE DELETED OR MOVED LATER
<!-- # OLD -- TO BE DELETED OR MOVED LATER

---

@@ -787,4 +790,5 @@ With the customization in place, the schema will now show these properties as
required, obviating the need for extensive null checks in client code.

See this `pydantic` issue for discussion on this solution:
<https://github.com/pydantic/pydantic/discussions/4577>
<https://github.com/pydantic/pydantic/discussions/4577> -->

208 changes: 0 additions & 208 deletions docs/features/NODES.md

This file was deleted.

163 changes: 45 additions & 118 deletions docs/features/PROMPTS.md
@@ -4,80 +4,6 @@ title: Prompting-Features

# :octicons-command-palette-24: Prompting-Features

## **Negative and Unconditioned Prompts**

Any words between a pair of square brackets will instruct Stable
Diffusion to attempt to ban the concept from the generated image. The
same effect is achieved by placing words in the "Negative Prompts"
textbox in the Web UI.

```text
this is a test prompt [not really] to make you understand [cool] how this works.
```

In the above statement, the words 'not really cool` will be ignored by Stable
Diffusion.

Here's a prompt that depicts what it does.

original prompt:

`#!bash "A fantastical translucent pony made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve"`

`#!bash parameters: steps=20, dimensions=512x768, CFG=7.5, Scheduler=k_euler_a, seed=1654590180`

<figure markdown>

![step1](../assets/negative_prompt_walkthru/step1.png)

</figure>

That image has a woman, so if we want the horse without a rider, we can
influence the image not to have a woman by putting [woman] in the prompt, like
this:

`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman]"`
(same parameters as above)

<figure markdown>

![step2](../assets/negative_prompt_walkthru/step2.png)

</figure>

That's nice - but say we also don't want the image to be quite so blue. We can
add "blue" to the list of negative prompts, so it's now [woman blue]:

`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue]"`
(same parameters as above)

<figure markdown>

![step3](../assets/negative_prompt_walkthru/step3.png)

</figure>

Getting close - but there's no sense in having a saddle when our horse doesn't
have a rider, so we'll add one more negative prompt: [woman blue saddle].

`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue saddle]"`
(same parameters as above)

<figure markdown>

![step4](../assets/negative_prompt_walkthru/step4.png)

</figure>

!!! notes "Notes about this feature:"

* The only requirement for words to be ignored is that they are in between a pair of square brackets.
* You can provide multiple words within the same bracket.
* You can provide multiple brackets with multiple words in different places of your prompt. That works just fine.
* To improve typical anatomy problems, you can add negative prompts like `[bad anatomy, extra legs, extra arms, extra fingers, poorly drawn hands, poorly drawn feet, disfigured, out of frame, tiling, bad art, deformed, mutated]`.

---

## **Prompt Syntax Features**

The InvokeAI prompting language has the following features:
@@ -102,9 +28,6 @@ The following syntax is recognised:
`a tall thin man (picking (apricots)1.3)1.1`. (`+` is equivalent to 1.1, `++`
is pow(1.1,2), `+++` is pow(1.1,3), etc; `-` means 0.9, `--` means pow(0.9,2),
etc.; the arithmetic is sketched just below this list.)
- attention also applies to `[unconditioning]` so
`a tall thin man picking apricots [(ladder)0.01]` will _very gently_ nudge SD
away from trying to draw the man on a ladder
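
The `+`/`-` shorthand above is plain exponential arithmetic. A minimal sketch (the real parser lives in InvokeAI's prompt-handling code; this only reproduces the math):

```python
def attention_weight(shorthand: str) -> float:
    """Convert a run of '+' or '-' characters into an attention multiplier."""
    if shorthand and set(shorthand) == {"+"}:
        return 1.1 ** len(shorthand)  # '+' -> 1.1, '++' -> 1.21, '+++' -> 1.331
    if shorthand and set(shorthand) == {"-"}:
        return 0.9 ** len(shorthand)  # '-' -> 0.9, '--' -> 0.81, '---' -> 0.729
    raise ValueError("expected a run of '+' or a run of '-'")
```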

You can use this to increase or decrease the amount of something. Starting from
this prompt of `a man picking apricots from a tree`, let's see what happens if
@@ -150,7 +73,7 @@ Or, alternatively, with more man:
| ---------------------------------------------- | ---------------------------------------------- | ---------------------------------------------- | ---------------------------------------------- |
| ![](../assets/prompt_syntax/mountain-man1.png) | ![](../assets/prompt_syntax/mountain-man2.png) | ![](../assets/prompt_syntax/mountain-man3.png) | ![](../assets/prompt_syntax/mountain-man4.png) |

### Blending between prompts
### Prompt Blending

- `("a tall thin man picking apricots", "a tall thin man picking pears").blend(1,1)`
- The existing prompt blending using `:<weight>` will continue to be supported -
@@ -168,6 +91,24 @@ Or, alternatively, with more man:
See the section below on "Prompt Blending" for more information about how this
works.

### Prompt Conjunction
Join multiple clauses together to create a conjoined prompt. Each clause will be passed to CLIP separately.

For example, the prompt:

```bash
"A mystical valley surround by towering granite cliffs, watercolor, warm"
```

can be used with `.and()`:
```bash
("A mystical valley", "surround by towering granite cliffs", "watercolor", "warm").and()
```

Each will give you different results - try them out and see what you prefer!
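
A reasonable mental model for what `.and()` does, sketched below under the assumption that each clause is encoded by CLIP on its own and the resulting embeddings are simply concatenated along the token axis (the actual implementation may differ):

```python
import torch

def conjoin(clause_embeddings: list[torch.Tensor]) -> torch.Tensor:
    # Each clause is encoded separately by CLIP to shape (1, tokens, dim);
    # conjunction joins the token sequences into one longer conditioning.
    return torch.cat(clause_embeddings, dim=1)
```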



### Cross-Attention Control ('prompt2prompt')

Sometimes an image you generate is almost right, and you just want to change one
@@ -190,7 +131,7 @@ For example, consider the prompt `a cat.swap(dog) playing with a ball in the for

- For multiple word swaps, use parentheses: `a (fluffy cat).swap(barking dog) playing with a ball in the forest`.
- To swap a comma, use quotes: `a ("fluffy, grey cat").swap("big, barking dog") playing with a ball in the forest`.
- Supports options `t_start` and `t_end` (each 0-1) loosely corresponding to bloc97's `prompt_edit_tokens_start/_end` but with the math swapped to make it easier to
- Supports options `t_start` and `t_end` (each 0-1) loosely corresponding to [bloc97's](https://github.com/bloc97/CrossAttentionControl) `prompt_edit_tokens_start/_end` but with the math swapped to make it easier to
intuitively understand. `t_start` and `t_end` are used to control on which steps cross-attention control should run. With the default values `t_start=0` and `t_end=1`, cross-attention control is active on every step of image generation. Other values can be used to turn cross-attention control off for part of the image generation process.
- For example, if doing a diffusion with 10 steps for the prompt is `a cat.swap(dog, t_start=0.3, t_end=1.0) playing with a ball in the forest`, the first 3 steps will be run as `a cat playing with a ball in the forest`, while the last 7 steps will run as `a dog playing with a ball in the forest`, but the pixels that represent `dog` will be locked to the pixels that would have represented `cat` if the `cat` prompt had been used instead.
- Conversely, for `a cat.swap(dog, t_start=0, t_end=0.7) playing with a ball in the forest`, the first 7 steps will run as `a dog playing with a ball in the forest` with the pixels that represent `dog` locked to the same pixels that would have represented `cat` if the `cat` prompt was being used instead. The final 3 steps will just run `a cat playing with a ball in the forest`.
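
The step arithmetic in the two examples above can be sketched as follows (an illustration of the semantics, not InvokeAI's actual implementation):

```python
def cross_attention_active_steps(num_steps: int, t_start: float, t_end: float) -> list[int]:
    """Return the (0-indexed) steps on which cross-attention control runs."""
    return [i for i in range(num_steps) if t_start <= i / num_steps < t_end]

# 10 steps, t_start=0.3, t_end=1.0: off for the first 3 steps, on for the last 7.
assert cross_attention_active_steps(10, 0.3, 1.0) == [3, 4, 5, 6, 7, 8, 9]
# 10 steps, t_start=0, t_end=0.7: on for the first 7 steps, off for the final 3.
assert cross_attention_active_steps(10, 0.0, 0.7) == [0, 1, 2, 3, 4, 5, 6]
```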
@@ -201,7 +142,7 @@ Prompt2prompt `.swap()` is not compatible with xformers, which will be temporari
[bloc97's colab](https://github.com/bloc97/CrossAttentionControl).

### Escaping parantheses () and speech marks ""
### Escaping parentheses () and speech marks ""

If the model you are using has parentheses () or speech marks "" as part of its
syntax, you will need to "escape" these using a backslash, so that `(my_keyword)`
@@ -212,37 +153,31 @@ the parentheses as part of the prompt syntax and it will get confused.
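
For example, an illustrative prompt with the special characters escaped so they are read as literal text rather than syntax:

```bash
a storefront with a sign reading \(closed\) painted in \"gothic\" lettering
```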

## **Prompt Blending**

You may blend together different sections of the prompt to explore the AI's
You may blend together prompts to explore the AI's
latent semantic space and generate interesting (and often surprising!)
variations. The syntax is:

```bash
blue sphere:0.25 red cube:0.75 hybrid
("prompt #1", "prompt #2").blend(0.25, 0.75)
```

This will tell the sampler to blend 25% of the concept of a blue sphere with 75%
of the concept of a red cube. The blend weights can use any combination of
integers and floating point numbers, and they do not need to add up to 1.
Everything to the left of the `:XX` up to the previous `:XX` is used for
merging, so the overall effect is:

```bash
0.25 * "blue sphere" + 0.75 * "white duck" + hybrid
```
This will tell the sampler to blend 25% of the concept of prompt #1 with 75%
of the concept of prompt #2. It is recommended to keep the sum of the weights around 1.0, but interesting things might happen if you go outside of this range.
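
Conceptually, blending is a weighted combination of the per-prompt conditioning. A minimal sketch, assuming each prompt has already been encoded by CLIP into a conditioning tensor (the real implementation may also renormalize the result):

```python
import torch

def blend(conditionings: list[torch.Tensor], weights: list[float]) -> torch.Tensor:
    # ("prompt #1", "prompt #2").blend(0.25, 0.75) ~ 0.25 * cond1 + 0.75 * cond2
    return sum(w * c for w, c in zip(weights, conditionings))
```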

Because you are exploring the "mind" of the AI, the AI's way of mixing two
concepts may not match yours, leading to surprising effects. To illustrate, here
are three images generated using various combinations of blend weights. As
usual, unless you fix the seed, the prompts will give you different results each
time you run them.

<figure markdown>
Let's examine how this affects image generation results:

### "blue sphere, red cube, hybrid"

</figure>
```bash
"blue sphere, red cube, hybrid"
```

This example doesn't use melding at all and represents the default way of mixing
This example doesn't use blending at all and represents the default way of mixing
concepts.

<figure markdown>
@@ -251,55 +186,47 @@ concepts.

</figure>

It's interesting to see how the AI expressed the concept of "cube" as the four
quadrants of the enclosing frame. If you look closely, there is depth there, so
the enclosing frame is actually a cube.
It's interesting to see how the AI expressed the concept of "cube" within the sphere. If you look closely, there is depth there, so the enclosing frame is actually a cube.

<figure markdown>

### "blue sphere:0.25 red cube:0.75 hybrid"
```bash
("blue sphere", "red cube").blend(0.25, 0.75)
```

![blue-sphere-25-red-cube-75](../assets/prompt-blending/blue-sphere-0.25-red-cube-0.75-hybrid.png)

</figure>

Now that's interesting. We get neither a blue sphere nor a red cube, but a red
sphere embedded in a brick wall, which represents a melding of concepts within
the AI's "latent space" of semantic representations. Where is Ludwig
Wittgenstein when you need him?
Now that's interesting. We get an image that resembles a red cube, with a hint of blue shadows, which represents a melding of concepts within the AI's "latent space" of semantic representations.

<figure markdown>

### "blue sphere:0.75 red cube:0.25 hybrid"
```bash
("blue sphere", "red cube").blend(0.75, 0.25)
```

![blue-sphere-75-red-cube-25](../assets/prompt-blending/blue-sphere-0.75-red-cube-0.25-hybrid.png)

</figure>

Definitely more blue-spherey. The cube is gone entirely, but it's really cool
abstract art.
Definitely more blue-spherey.

<figure markdown>

### "blue sphere:0.5 red cube:0.5 hybrid"

![blue-sphere-5-red-cube-5-hybrid](../assets/prompt-blending/blue-sphere-0.5-red-cube-0.5-hybrid.png)

```bash
("blue sphere", "red cube").blend(0.5, 0.5)
```
</figure>

Whoa...! I see blue and red, but no spheres or cubes. Is the word "hybrid"
summoning up the concept of some sort of scifi creature? Let's find out.

<figure markdown>
![blue-sphere-5-red-cube-5-hybrid](../assets/prompt-blending/blue-sphere-0.5-red-cube-0.5-hybrid.png)
</figure>

### "blue sphere:0.5 red cube:0.5"

![blue-sphere-5-red-cube-5](../assets/prompt-blending/blue-sphere-0.5-red-cube-0.5.png)
Whoa...! I see blue and red, and if I squint, spheres and cubes.

</figure>

Indeed, removing the word "hybrid" produces an image that is more like what we'd
expect.

## Dynamic Prompts

4 changes: 0 additions & 4 deletions docs/features/index.md
@@ -30,10 +30,6 @@ image output.
### * [Image-to-Image Guide](IMG2IMG.md)
Use a seed image to build new creations in the CLI.

### * [Generating Variations](VARIATIONS.md)
Have an image you like and want to generate many more like it? Variations
are the ticket.

## Model Management

### * [Model Installation](../installation/050_INSTALLING_MODELS.md)
27 changes: 27 additions & 0 deletions docs/help/diffusion.md
@@ -0,0 +1,27 @@
Taking the time to understand the diffusion process will help you use InvokeAI more effectively.

Stable Diffusion works with two main kinds of data: images and latents.

Image space represents images in pixel form that you look at. Latent space represents compressed inputs. It’s in latent space that Stable Diffusion processes images. A VAE (Variational Auto Encoder) is responsible for compressing and encoding inputs into latent space, as well as decoding outputs back into image space.

To fully understand the diffusion process, we need to understand a few more terms: U-Net, CLIP, and conditioning.

A U-Net is a model trained on a large number of latent images with known amounts of random noise added. This means that the U-Net can be given a slightly noisy image and predict the pattern of noise that must be subtracted from it in order to recover the original.

CLIP is a model that tokenizes and encodes text into conditioning. This conditioning guides the model during the denoising steps to produce a new image.
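
As a concrete illustration using Hugging Face `transformers` (the model name below is the text encoder used by Stable Diffusion 1.x and is an assumption for this sketch):

```python
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer("a mystical valley, watercolor, warm", padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt")
conditioning = text_encoder(tokens.input_ids).last_hidden_state  # shape (1, 77, 768)
```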

The U-Net and CLIP work together during the image generation process at each denoising step, with the U-Net removing noise in such a way that the result is similar to images in the U-Net’s training set, while CLIP guides the U-Net towards creating images that are most similar to the prompt.


When you generate an image using text-to-image, multiple steps occur in latent space:
1. Random noise is generated at the chosen height and width. The noise’s characteristics are dictated by the seed. This noise tensor is created in latent space. We’ll call this noise A.
2. Using a model’s U-Net, a noise predictor examines noise A along with the conditioning (your prompt, tokenized and encoded by CLIP), and generates its own noise tensor predicting what the final image might look like in latent space. We’ll call this noise B.
3. Noise B is subtracted from noise A in an attempt to create a latent image consistent with the prompt. Steps 2 and 3 are repeated for the number of sampler steps chosen.
4. The VAE decodes the final latent image from latent space into image space.
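
Those four steps map closely onto code. A compressed sketch in the style of the Hugging Face `diffusers` API (omitting details such as classifier-free guidance and initial noise scaling):

```python
import torch

def text_to_image(unet, scheduler, vae, conditioning, height, width, steps, seed):
    # Step 1: random latent noise ("noise A"); SD 1.x latents are 8x smaller than pixels.
    generator = torch.Generator().manual_seed(seed)
    latents = torch.randn((1, 4, height // 8, width // 8), generator=generator)

    # Steps 2-3: repeatedly predict noise ("noise B") and subtract it.
    scheduler.set_timesteps(steps)
    for t in scheduler.timesteps:
        noise_b = unet(latents, t, encoder_hidden_states=conditioning).sample
        latents = scheduler.step(noise_b, t, latents).prev_sample

    # Step 4: decode the final latents from latent space back to image space.
    return vae.decode(latents / vae.config.scaling_factor).sample
```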

Image-to-image is a similar process, with only step 1 being different:
1. The input image is encoded from image space into latent space by the VAE. Noise is then added to the input latent image. Denoising Strength dictates how many noise steps are added, and the amount of noise added at each step. A Denoising Strength of 0 means there are 0 steps and no noise is added, resulting in an unchanged image, while a Denoising Strength of 1 results in the image being completely replaced with noise and a full set of denoising steps being performed. The process is then the same as steps 2-4 in the text-to-image process.
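
The Denoising Strength arithmetic can be sketched the same way (again `diffusers`-style, with illustrative names):

```python
import torch

def prepare_img2img_latents(vae, scheduler, image, strength, steps, seed):
    # Encode the input image from image space into latent space.
    latents = vae.encode(image).latent_dist.sample() * vae.config.scaling_factor

    # strength=0.0 -> no noise steps, image unchanged;
    # strength=1.0 -> fully replaced with noise, full set of denoising steps.
    scheduler.set_timesteps(steps)
    num_noise_steps = min(int(steps * strength), steps)
    timesteps = scheduler.timesteps[steps - num_noise_steps:]

    if num_noise_steps > 0:
        noise = torch.randn(latents.shape,
                            generator=torch.Generator().manual_seed(seed))
        latents = scheduler.add_noise(latents, noise, timesteps[:1])
    return latents, timesteps  # then continue with steps 2-4 of text-to-image
```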

Furthermore, a model provides the CLIP prompt tokenizer, the VAE, and a U-Net (where noise prediction occurs given a prompt and initial noise tensor).

A noise scheduler (e.g., DPM++ 2M Karras) schedules the subtraction of noise from the latent image across the sampler steps chosen (step 3 above). Less noise is usually subtracted at higher sampler steps.
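
For instance, the schedule behind "DPM++ 2M Karras" spaces its noise levels so that successive steps remove less and less noise. A sketch of the Karras et al. (2022) formula (the sigma range shown is an illustrative Stable Diffusion default):

```python
import torch

def karras_sigmas(steps: int, sigma_min=0.029, sigma_max=14.6, rho=7.0):
    # Noise level per step; the gaps between consecutive sigmas
    # (the amount of noise removed) shrink as sampling progresses.
    ramp = torch.linspace(0, 1, steps)
    min_inv, max_inv = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return (max_inv + ramp * (min_inv - max_inv)) ** rho
```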