Enable fast switching among models at the invoke> command line #1066
Conversation
This PR enables two new commands in the invoke.py script:

!models -- list the available models and their cache status
!switch <model> -- switch to the indicated model

Example:

invoke> !models
laion400m              not loaded   Latent Diffusion LAION400M model
stable-diffusion-1.4   active       Stable Diffusion inference model version 1.4
waifu-1.3              cached       Waifu anime model version 1.3
invoke> !switch waifu-1.3
>> Caching model stable-diffusion-1.4 in system RAM
>> Retrieving model waifu-1.3 from system RAM cache

The names and descriptions of the models are taken from config/models.yaml. A future enhancement to model_cache.py will be to enable new model stanzas to be added to the file programmatically. This will be useful for the WebGUI.

More details:
- Uses the fast switching algorithm described in PR #948.
- Models are selected using their configuration stanza name given in models.yaml.
- To avoid filling up CPU RAM with cached models, this PR implements an LRU cache that monitors available CPU RAM.
- The caching code allows the minimum value of available RAM to be adjusted, but invoke.py does not currently have a command-line argument that allows you to set it. The minimum free RAM is arbitrarily set to 2 GB.
- Adds an optional description field to configs/models.yaml.

Unrelated fixes:
- Added ">>" to CompViz model loading messages in order to make the user experience more consistent.
- When generating an image larger than the defaults, the warning about possibly filling VRAM is only printed the first time.
- Fixed a bug that was causing the help message to be printed twice. This involved moving the import line for the web backend into the section where it is called.

Co-authored by: @ArDiouscuros
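To make the caching description above concrete, here is a minimal sketch of an available-RAM-gated LRU cache. It is an editorial illustration only, not the PR's actual cache code: the class and method names are invented, and it assumes psutil is installed for reading free system memory.

from collections import OrderedDict
import psutil

GIG = 2 ** 30
MIN_AVAIL_RAM = 2 * GIG   # the PR's arbitrary 2 GB floor on free CPU RAM

class SimpleModelLRU:
    """Hypothetical sketch: keep models parked in CPU RAM and evict the
    least recently used entry whenever available system RAM falls below
    a floor."""

    def __init__(self, min_avail: int = MIN_AVAIL_RAM):
        self.min_avail = min_avail
        self._cache: "OrderedDict[str, object]" = OrderedDict()

    def put(self, name: str, model: object) -> None:
        self._cache[name] = model
        self._cache.move_to_end(name)                      # most recently used
        while len(self._cache) > 1 and self._low_on_ram():
            evicted, _ = self._cache.popitem(last=False)   # drop LRU entry
            print(f'>> Evicting cached model {evicted} to free CPU RAM')

    def get(self, name: str):
        model = self._cache.get(name)
        if model is not None:
            self._cache.move_to_end(name)                  # refresh recency
        return model

    def _low_on_ram(self) -> bool:
        return psutil.virtual_memory().available < self.min_avail

Note that evicting an entry only frees RAM once no other reference to the model remains, which is one reason a real cache also has to distinguish the active model from merely cached ones.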
Hey folks, I've tested the fast model switching on both a CUDA system and an Intel box with no GPU, so I think it will work on the Mac, but I'm asking @Any-Winter-4079 to check it out just in case. I tried to make this useful for the WebGUI. The API is in ldm/invoke/model_cache.py.
The model_dict contains the following keys:
|
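The actual key list did not survive in this transcript. Purely as a hypothetical illustration of the shape such a per-model dict could take, using the statuses and descriptions from the !models example above (the field names are assumptions, not confirmed by the PR):

# Hypothetical illustration only; the real model_dict keys are not shown here.
model_dict = {
    'stable-diffusion-1.4': {
        'status': 'active',   # assumed field: 'active', 'cached', or 'not loaded'
        'description': 'Stable Diffusion inference model version 1.4',
    },
    'waifu-1.3': {
        'status': 'cached',
        'description': 'Waifu anime model version 1.3',
    },
}

for name, info in model_dict.items():
    print(f"{name:22s} {info['status']:10s} {info['description']}")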
Tested on M1 Max 32GB. stable-diffusion-1.4 generated an image; two more images generated after switching, then it crashed with:
[traceback not captured in this transcript]
Switching to another cached model throws the same error. |
It looks like it is because in model_cache the model is not returned to mps, only to cuda:

def _model_from_cpu(self,model):
    if self._has_cuda():
        model.to(self.device)
        model.first_stage_model.to(self.device)
        model.cond_stage_model.to(self.device)
        model.cond_stage_model.device = self.device

    return model

If I add model.to(self.device) for the non-CUDA case, it is working again and generating the same image:

def _model_from_cpu(self,model):
    if self._has_cuda():
        model.to(self.device)
        model.first_stage_model.to(self.device)
        model.cond_stage_model.to(self.device)
        model.cond_stage_model.device = self.device
    else:
        model.to(self.device)

    return model
|
Stupid bug! I'll fix.
Lincoln
|
H'mmm. On second thought, I'm not sure whether this will fix the problem.
On non-CUDA systems, the only device is "cpu", and so the caching commands
are essentially intended to be no-ops on MPS systems.
Unless I'm fundamentally misunderstanding something!
Lincoln
|
Or is there an "mps" device?
|
self.device.type returns 'mps'.
edit: at least on M1 Mac; I currently do not have an Intel Mac. Maybe create
|
So the code should be:
def _model_from_cpu(self,model):
    if self.device != 'cpu':
        model.to(self.device)
        model.first_stage_model.to(self.device)
        model.cond_stage_model.to(self.device)
        model.cond_stage_model.device = self.device

    return model
I'm at a conference right now, but will commit this fix sometime today.
L
|
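For readers following along, here is a small standalone sketch of the device handling being discussed. It is an editorial illustration, not the committed fix: the helper names are invented, and it relies only on torch.device.type, which (as Jan notes) is 'mps' on Apple silicon, 'cuda' on NVIDIA GPUs, and 'cpu' otherwise.

import torch

def pick_device() -> torch.device:
    # Prefer CUDA, then Apple MPS (available in recent PyTorch builds), else CPU.
    if torch.cuda.is_available():
        return torch.device('cuda')
    if getattr(torch.backends, 'mps', None) and torch.backends.mps.is_available():
        return torch.device('mps')
    return torch.device('cpu')

def model_from_cpu(model: torch.nn.Module, device: torch.device) -> torch.nn.Module:
    # Move the model back to the compute device on anything that is not plain
    # CPU; device.type covers both 'cuda' and 'mps' here.
    if device.type != 'cpu':
        model = model.to(device)
    return model

print(pick_device().type)   # 'cuda', 'mps', or 'cpu'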
After the fix, switching models works. Images are consistent between models. |
Seems to work well on an 8 GB M1 too. I don't think it's popping cached models on my 8 GB machine; the OS is moving the cached model out to swap, which has no effect for me, though I'm concerned about the effect on a 16 GB M1 with larger images. Just running with a few debug statements to test. Ideas: |
Yep, macOS is so lying about the free memory
|
Hold on: AVG_MODEL_SIZE = 2.1 GB.
If you're using DEFAULT_MIN_AVAIL for self.min_avail_mem, wouldn't avail_memory have to be negative to pop? |
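As an editorial illustration of the point being raised (the constant names echo the thread; the function is hypothetical, not the actual model_cache.py code): with the comparison run the right way round, eviction triggers long before free memory goes negative.

GIG = 2 ** 30
AVG_MODEL_SIZE    = 2.1 * GIG   # rough size of one cached model, per the thread
DEFAULT_MIN_AVAIL = 2 * GIG     # 2 GB floor on free system RAM

def should_evict(avail_memory: float) -> bool:
    # Evict when caching another ~2.1 GB model would leave less than the
    # 2 GB floor of free RAM.
    return avail_memory - AVG_MODEL_SIZE < DEFAULT_MIN_AVAIL

# With 3 GB free: 3 - 2.1 = 0.9 GB < 2 GB, so evict. Reversing the comparison
# would indeed require avail_memory to go negative before anything popped.
print(should_evict(3 * GIG))    # True
print(should_evict(8 * GIG))    # False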
- fixed backwards calculation of minimum available memory
- only execute m.padding adjustment code once upon load
This is what comes from coding at 2 am, and also why it's so important to have multiple eyes on the code! It looks like I got this backward, and I'm surprised that it worked on my system. But I probably tested it backwards too. Latest commit fixes this, and addresses other misc issues. Thanks for the help debugging on M1! |
Latest changes working on my m1. |
Same here, on my 8 GB M1 |
So would someone provide a code review approval so that I can move on to working on the next version of outpainting? |
Once the amount of available memory falls below 2 GB, each new model will replace the last one in cache memory, and retrieving the previous model will require reloading it from disk. I think the main problem might be when the user wants to give InvokeAI more latitude to use available memory. |
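The PR notes that invoke.py has no command-line switch for this floor yet. If one were added, it might look roughly like the sketch below; the --free-ram-floor flag is invented for illustration and does not exist in invoke.py.

import argparse

parser = argparse.ArgumentParser(description='illustrative model-cache options')
parser.add_argument(
    '--free-ram-floor',
    type=float,
    default=2.0,
    help='minimum free system RAM (in GB) to preserve before evicting cached models',
)

# Example: let the cache grow until only 4 GB of RAM remain free.
opt = parser.parse_args(['--free-ram-floor', '4'])
min_avail_bytes = int(opt.free_ram_floor * 2 ** 30)
print(min_avail_bytes)   # 4294967296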
|
@Any-Winter-4079 PR #1056 was merged at about the same time this was created, so it is possible it's from a version before that fix.
The switch-models branch was made before PR #1056 was merged.
Just started testing this. I'm getting Sequence:
Also |
I can confirm this bug on a CUDA system. I'd tested the scenario of loading a missing model file earlier in development and it was working, but more recent changes apparently broke it. I'll make another commit later today, and probably include code for interactively adding and editing stanzas in the models.yaml file. |
Thanks for the fix. One thing: stable-diffusion-1.4 can't be renamed. Besides that, it works well as far as I've tested it. By the way, is there a correlation between sd-1.4 and waifu-1.3 (especially SD at low step values) in terms of seeds? I find it pretty fascinating. The prompt for these images is the same. Only the seeds change.
I guess this is the answer, but I still find it pretty cool. |
I'm going to keep testing a bit more, with larger images especially. |
LGTM
- !import_model <path/to/model/weights> will import a new model, prompt the user for its name and description, write it to the models.yaml file, and load it.
- !edit_model <model_name> will bring up a previously-defined model and prompt the user to edit its descriptive fields.
I've added a new commit that gives invoke.py the !import_model and !edit_model commands. Here are examples of how they work:

!import_model

invoke> !import_model models/ldm/stable-diffusion-v1/model-epoch08-float16.ckpt
>> Model import in process. Please enter the values needed to configure this model:
Name for this model: waifu-diffusion
Description of this model: Waifu Diffusion v1.3
Configuration file for this model: configs/stable-diffusion/v1-inference.yaml
Default image width: 512
Default image height: 512
>> New configuration:
waifu-diffusion:
  config: configs/stable-diffusion/v1-inference.yaml
  description: Waifu Diffusion v1.3
  height: 512
  weights: models/ldm/stable-diffusion-v1/model-epoch08-float16.ckpt
  width: 512
OK to import [n]? y
>> Caching model stable-diffusion-1.4 in system RAM
>> Loading waifu-diffusion from models/ldm/stable-diffusion-v1/model-epoch08-float16.ckpt
   | LatentDiffusion: Running in eps-prediction mode
   | DiffusionWrapper has 859.52 M params.
   | Making attention of type 'vanilla' with 512 in_channels
   | Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
   | Making attention of type 'vanilla' with 512 in_channels
   | Using faster float16 precision
...etc

!edit_model

invoke> !edit_model waifu-diffusion
>> Editing model waifu-diffusion from configuration file ./configs/models.yaml
description: Waifu diffusion v1.4beta
weights: models/ldm/stable-diffusion-v1/model-epoch10-float16.ckpt
config: configs/stable-diffusion/v1-inference.yaml
width: 512
height: 512
>> New configuration:
waifu-diffusion:
  config: configs/stable-diffusion/v1-inference.yaml
  description: Waifu diffusion v1.4beta
  weights: models/ldm/stable-diffusion-v1/model-epoch10-float16.ckpt
  height: 512
  width: 512
OK to change [n]? y
>> Caching model stable-diffusion-1.4 in system RAM
>> Loading waifu-diffusion from models/ldm/stable-diffusion-v1/model-epoch10-float16.ckpt
... |
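The PR description mentions that adding model stanzas programmatically is a planned enhancement for the WebGUI. As a rough sketch of that idea (using PyYAML for illustration; the project's actual config handling may differ, and the function name is invented):

import yaml

def add_model_stanza(path: str, name: str, weights: str, config: str,
                     description: str, width: int = 512, height: int = 512) -> None:
    """Append or overwrite a model stanza in a models.yaml-style file."""
    with open(path) as f:
        models = yaml.safe_load(f) or {}
    models[name] = {
        'config': config,
        'weights': weights,
        'description': description,
        'width': width,
        'height': height,
    }
    with open(path, 'w') as f:
        yaml.safe_dump(models, f, default_flow_style=False)

# Hypothetical usage mirroring the !import_model example above:
# add_model_stanza('configs/models.yaml', 'waifu-diffusion',
#                  'models/ldm/stable-diffusion-v1/model-epoch08-float16.ckpt',
#                  'configs/stable-diffusion/v1-inference.yaml',
#                  'Waifu Diffusion v1.3')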
Yeah. Thanks for letting me know about the correlation between SD and Waifu. I hadn't noticed that. It's quite cool. |
That's probably a good idea.
Testing now (there's a lot to test).

1) Passing a wrong model
It does create a model entry. Trying to use it silently fails and leaves the current model loaded.

2) Deleting a model
I can't remove the wrongly created model except from the yaml file, right? I think either we prevent wrong additions, or allow removal (or ideally both).

3) Adding an existing model
I edited the
It's not super important, but it'd be nice to say: hey, reminder that this ckpt is already in models.yaml.

4) Using a recently added model

5) Autocompleting
The autocomplete doesn't show the new model either.
but
|
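One way the "prevent wrong additions" suggestion in the points above could be approached (a hypothetical sketch, not the PR's code): check that the checkpoint file exists and warn if the same weights file is already registered in models.yaml. The function name and messages are invented for illustration.

import os
import yaml

def validate_new_model(weights_path: str, models_yaml: str = 'configs/models.yaml') -> bool:
    """Return True only if the weights file exists and is not already configured."""
    if not os.path.exists(weights_path):
        print(f'** {weights_path} does not exist; refusing to add it')
        return False
    with open(models_yaml) as f:
        models = yaml.safe_load(f) or {}
    for name, stanza in models.items():
        if stanza.get('weights') == weights_path:
            print(f'** reminder: this ckpt is already configured as "{name}"')
            return False
    return True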
Good feedback. I knew about the problem with autocomplete not picking up the new model until you restart the script, but the rest is new to me. I'll add more error reporting as well as a delete option. I think I'll merge in now and add the error checks in a subsequent PR. |