Dynamic prompt generation script for parameter scans #2831
Conversation
Simple script to generate a file of InvokeAI prompts and settings that scan across steps and other parameters. To use, create a file named "template.yaml" (or similar) formatted like this:

```yaml
>>> cut here <<<
steps: "30:50:1"
seed: 50
cfg:
  - 7
  - 8
  - 12
sampler:
  - ddim
  - k_lms
prompt:
  - a sunny meadow in the mountains
  - a gathering storm in the mountains
>>> cut here <<<
```

Create sections named "steps", "seed", "cfg", "sampler" and "prompt".

- Each section can have a constant value such as this: `steps: 50`
- Or a range of numeric values in the format: `steps: "<start>:<stop>:<step>"`
- Or a list of values in the format:
  - value1
  - value2
  - value3

Be careful to: 1) put quotation marks around numeric ranges; 2) put a space between the "-" and the value in a list of values; and 3) use spaces, not tabs, at the beginnings of indented lines.

When you run this script, capture the output into a text file like this:

```
python generate_param_scan.py template.yaml > output_prompts.txt
```

"output_prompts.txt" will now contain an expansion of all the list values you provided. You can examine it in a text editor such as Notepad.

Now start the CLI and feed the expanded prompt file to it using the "!replay" command:

```
!replay output_prompts.txt
```

Alternatively, you can feed the output of this script directly to the CLI by issuing a command like this from the developer's console:

```
python generate_param_scan.py template.yaml | invokeai
```

You can use the web interface to view the resulting images and their metadata.
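For readers curious how the expansion step works, here is a minimal sketch of the cross product of template values (not the actual script; the inclusive-range assumption and the printed output format are illustrative only, while the template keys and range syntax follow the description above):

```python
import itertools
import yaml  # pyyaml

def expand(value):
    """Turn one template entry into a list of values."""
    if isinstance(value, str) and value.count(":") == 2:
        start, stop, step = (int(x) for x in value.split(":"))
        return list(range(start, stop + 1, step))  # assumes an inclusive range
    if isinstance(value, list):
        return value
    return [value]  # constant value

with open("template.yaml") as f:
    template = yaml.safe_load(f)

keys = list(template.keys())
axes = [expand(template[k]) for k in keys]

# One entry per combination of settings; the real script formats these
# as InvokeAI prompt lines suitable for !replay or piping to the CLI.
for combo in itertools.product(*axes):
    print(dict(zip(keys, combo)))
```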
Very cool! Is this effectively the “dynamic prompt” feature? I.e., it’s creating every possible permutation of all those settings?
Yeah, that's what this is. Hang on a bit and I'll add a syntax for combining prompt fragments like this:
- ability to cycle through models and dimensions
- process automatically through invokeai
- create an .md file to display the grid results
I believe that dynamic prompting with some syntax (
That's why this is going into the
Part of it, yes... but
Nothing preventing this from going into
My review won't unblock this, but I'll give it the performative 👍 to support this being merged.
Update: To allow for faster dynamic prompt generation, the script now renders in parallel across multiple processes and GPUs. By default, it will launch one process per CUDA GPU, but this can be controlled using
Note that the images come out in higgledy-piggledy order, but on the filesystem they will sort in the order specified in the prompt template. I am considering a simple extension to allow this script to run across multiple nodes of a compute cluster, provided that the user has installed invokeai on each of the nodes, but I should get back to working on nodes first.
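As a rough illustration of the parallelization scheme described here (not the actual implementation; the worker function body and the round-robin chunking are assumptions), one process per CUDA device could be launched like this:

```python
import multiprocessing as mp
import torch

def render_worker(device_id: int, prompts: list[str]) -> None:
    # Each worker pins itself to one GPU and renders its share of the prompts.
    torch.cuda.set_device(device_id)
    for prompt in prompts:
        ...  # hand the prompt to the generation pipeline on this device

if __name__ == "__main__":
    prompts = open("output_prompts.txt").read().splitlines()
    n_gpus = max(torch.cuda.device_count(), 1)
    # Round-robin the expanded prompts across one process per GPU.
    chunks = [prompts[i::n_gpus] for i in range(n_gpus)]
    procs = [mp.Process(target=render_worker, args=(i, chunk))
             for i, chunk in enumerate(chunks)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```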
Worked great with up to 3 parallel processes on a 24GB Tesla P40, after a few tries to get it going. I noted the issues inline. Also, because stdout from individual threads isn't captured, it wasn't easy to determine their root causes. But overall it works great in parallel on a single GPU (with the expected performance hit).
@lstein interestingly enough, the way you've structured the YAML is exactly how I'd write this as an input to an Argo workflow, which could easily parallelize it across hundreds of GPU instances. What's the approach you're thinking of? I did take a stab at it again using nodes over the weekend, but didn't get very far yet.
That's an interesting coincidence!

The approach I'm thinking of is the same one used by `gnu-parallel`. Basically, it requires that each worker node has a public-key-based ssh login for the current user and that each worker has the same set of InvokeAI models installed (or they may all use a shared directory). The master node runs an ssh session to each worker which launches InvokeAI, feeds the expanded prompts to the worker via its pseudo-tty standard input, and accepts the resulting image via stdout. It's not at all elegant, but it is actually quite scalable and benefits from the built-in strong authentication and encryption that ssh provides.

This can almost work now, but it requires a modification to the CLI to encapsulate the image and its metadata as JSON-encoded objects rather than writing to local disk.

Did you have a chance to test the multiprocessing on a local machine? I haven't done much in the way of exception handling and am wondering whether the errors are interpretable when there is a crash due to out-of-RAM or out-of-VRAM conditions.
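A bare-bones sketch of that ssh fan-out, assuming passwordless key-based ssh to each worker and `invokeai` on each remote PATH (the hostnames and chunking scheme are illustrative; as noted above, images would still land on each worker's local disk until the CLI can return them as JSON):

```python
import subprocess

workers = ["gpu-node-1", "gpu-node-2"]  # illustrative hostnames
prompts = open("output_prompts.txt").read().splitlines()

# Open one ssh session per worker, each running the InvokeAI CLI,
# and feed each worker its share of the expanded prompts on stdin.
sessions = [
    subprocess.Popen(["ssh", host, "invokeai"],
                     stdin=subprocess.PIPE, text=True)
    for host in workers
]
for i, proc in enumerate(sessions):
    proc.stdin.write("\n".join(prompts[i::len(sessions)]) + "\n")
    proc.stdin.close()
for proc in sessions:
    proc.wait()
```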
Cool! Sounds similar to the SLURM scheduler, right? I haven't worked with it myself, but I know it's widely used in academia.

Yes, I did test multiprocessing (single GPU, 3 threads) - it worked very well with the exception of a couple of stumbling blocks - see my previous comments. The main issues were the prompt-in-filename and the lack of log output. It worked great once I figured out my prompts were not filename-friendly. Obscured logging output IS indeed a problem for troubleshooting, though. I haven't gotten a CUDA OOM yet (will try!), but overall it would be helpful to capture and print stdout/stderr from the threads, even at the expense of pretty output, at least for now.

I can approve to unblock this, but please ping me to re-test if you'd like.
worked well with simple prompts (3 processes / 1 GPU)
filenames should not contain the prompt - this breaks with more complex prompts
I think the lack of logging is a blocker, and I'll fix this by redirecting stderr to a file rather than trying to suppress it. The reason I tried to suppress it is that I couldn't stand the sight of 8 TQDMs fighting with each other for control of the console!
Agreed! I'm very tempted to solve this with
@ebr Logging of stderr is now implemented correctly (I think). Here is what it looks like:
The log files themselves have the TQDM progress bars, but I've separated them by process so that they'll be internally coherent. Runtime errors get logged to the end of the log file as expected.
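A minimal sketch of the kind of per-process log redirection being described (the file naming and worker structure are assumptions, not the script's actual code):

```python
import sys

def run_worker(worker_id: int, prompts: list[str]) -> None:
    # Each worker process redirects its own stderr (tqdm progress bars,
    # tracebacks) to a dedicated log file, so each log stays internally
    # coherent and runtime errors show up at the end of that worker's file.
    sys.stderr = open(f"worker-{worker_id}.log", "w", buffering=1)
    for prompt in prompts:
        ...  # render the image for this prompt
```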
Is the rebuild of stats.html an intentional part of this PR?
Overall this looks like a super handy script we'll be glad to have, and it's well-documented. 👍
A few thoughts for things we can do later (certainly don't need to be in this iteration):
- direct HTML output instead of Markdown.
- adding proper batch support to the backend will be much more efficient than having more than one process per GPU in many cases. (We can vary many things within a parallel batch, but not all of the template fields, so you still may end up using this mode of parallelization if you want to compare models, for example.)
Programmatically generate a large number of images varying by prompt and other image generation parameters
This is a little standalone script named dynamic_prompting.py that enables the generation of dynamic prompts. Using YAML syntax, you specify a template of prompt phrases and lists of generation parameters, and the script will generate a cross product of prompts and generation settings for you. You can save these prompts to disk for later use, or pipe them to the invokeai CLI to generate the images on the fly.

Typical uses are testing step and CFG values systematically while holding the seed and prompt constant, testing out various artists' styles, and comparing the results of the same prompt across different models.
A typical template will look like this:
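As an illustrative sketch of such a template (made-up values; with these, the cross product works out to 8 × 3 × 2 × 2 = 96 combinations, consistent with the count mentioned below, though this is not necessarily the original example):

```yaml
steps:
  - 10
  - 15
  - 20
  - 25
  - 30
  - 35
  - 40
  - 50
seed: 50
cfg:
  - 7
  - 8
  - 12
sampler:
  - ddim
  - k_lms
prompt:
  - a sunny meadow in the mountains
  - a gathering storm in the mountains
```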
This will generate 96 different images, each of which varies by one of the dimensions specified in the template. For example, the prompt axis will generate a cross product list like:
A typical usage would be:
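A plausible invocation is sketched below; the template filename is illustrative and the `--outdir` switch is a hypothetical way of pointing the script at /tmp/scanning (only the `--instructions` and `--example` switches are confirmed in this description):

```
# --outdir here is hypothetical (illustration only); see --instructions for the real switches
python dynamic_prompting.py --outdir=/tmp/scanning template.yaml
```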
This will populate /tmp/scanning with each of the requested images, and also generate a log.md file which you can open with an e-book reader to show something like this:

Full instructions can be obtained using the --instructions switch, and an example template can be printed out using --example:
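For example (redirecting the printed example into a starter template file is just one way to use these confirmed switches):

```
python dynamic_prompting.py --instructions
python dynamic_prompting.py --example > my_template.yaml
```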