
generate options #35

Open
materialvision opened this issue Mar 24, 2020 · 13 comments

Comments

@materialvision

Hi. I tried to find a way to control the generate function in a more flexible way. Could someone point me in the right direction? I am looking to produce a series of larger single images instead of the "contact sheet" style it currently outputs.

@lucidrains
Owner

@materialvision hello! There is a setting, --num_image_tiles, that you can set to 1 if you want single images.
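For reference, a full generation call with that flag might look like the line below (a sketch: it assumes a model has already been trained under the default name and sits in the default ./models directory):

stylegan2_pytorch --generate --num_image_tiles 1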

@materialvision
Author

Thanks! What about size? Right now the images are only 128x128 px, and the --image_size setting makes no difference...

@lucidrains
Owner

@materialvision you can only generate at the image size you trained on

@materialvision
Author

Thank you. Of course, 1024px is then difficult to get working on my 8GB GPU... I will try more settings, but it seems like 512 is the largest possible.

@materialvision
Author

One more newbie question... Will it be possible to generate interpolations as a series of images (for animation)?

@lucidrains
Owner

@materialvision ohh not yet, but I can look into adding one, perhaps with a flag like --generate-interpolation?
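In the meantime, the general idea behind interpolation frames, outside the library: linearly blend between two latent vectors and decode each intermediate point. A minimal sketch; G and latent_dim are hypothetical stand-ins for a trained generator and its latent size, not this repo's API:

import torch
from torchvision.utils import save_image

# Minimal sketch of latent interpolation for animation frames.
# G is a hypothetical generator mapping a latent batch to an image batch;
# it stands in for a trained StyleGAN2 generator, not this repo's API.
def interpolation_frames(G, latent_dim=512, num_frames=60, device='cuda'):
    z1 = torch.randn(1, latent_dim, device=device)  # start point in latent space
    z2 = torch.randn(1, latent_dim, device=device)  # end point in latent space
    for i in range(num_frames):
        t = i / (num_frames - 1)
        z = torch.lerp(z1, z2, t)                # blend the two latents
        save_image(G(z), f'frame_{i:03d}.png')   # decode and save one frame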

@materialvision
Author

That would be amazing, thanks!

@dancrew32

dancrew32 commented Apr 7, 2020

Say that you trained at image_size 128px and you spent a little while (and $) training it on a p2.xlarge (https://aws.amazon.com/ec2/instance-types/p2/). Say the results are great, but now you want to increase resolution. Is there a strategy for "upgrading" the model to 1024px or larger (2048px? 7680 × 4320 8K?) without having to redo all of the 100k num_train_steps at --image_size 1024?

Also, any suggestions for training this thing faster/cheaper? Recommendations for GPUs are also welcome. Thanks for sharing your implementations/setups, everyone.

@lucidrains
Owner

@dancrew32 Hello Dan, unfortunately not at the moment. I think the cheapest route is to use the official stylegan2 repository from Nvidia, and to train on Colab for free. You can checkpoint every so often to your Google Drive and resume over a couple of days. I will eventually get around to making this library compatible with Microsoft's DeepSpeed for accelerated distributed training.
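A minimal sketch of that Colab/Drive setup: mount Drive in a notebook cell, then point the run's model directory at it so checkpoints survive runtime resets. The --models_dir and --name flags below are assumptions about the CLI; check the readme for the exact names.

# In a Colab cell: mount Google Drive so checkpoints survive runtime resets.
from google.colab import drive
drive.mount('/content/drive')

stylegan2_pytorch --data '/content/drive/My Drive/dataset' --models_dir '/content/drive/My Drive/stylegan2/models' --name my-run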

@dancrew32

I upgraded Google Drive storage (100GB for $1.99/mo), went in on Colab Pro ($9.99/mo), and it appears able to train with defaults at --image_size=256, getting 1-2s per iteration! That works out to about 54 hours for 100k iterations.

I tried image_size=1024 and image_size=512, but couldn't get either to fit in the 16GB GPU that Colab Pro offers (the "high memory" runtime GPU option). I've not tried TPU.

[Screenshot of Colab training output, Apr 25 2020]

Logging my attempts to make it fit at 1024 and 512 below, for anyone who's interested:


1024 defaults

stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=1024

Runtime error during the first iteration:

tcmalloc: large alloc 2415919104 bytes == 0x7088000 @  0x7fc0e5fedb6b ...
tcmalloc: large alloc 1207959552 bytes == 0x99098000 @  0x7fc0e5fedb6b ...
tcmalloc: large alloc 2415919104 bytes == 0x7fbfd0000000 @  0x7fc0e5fedb6b ...
tcmalloc: large alloc 2415919104 bytes == 0x7fbf40000000 @  0x7fc0e5fedb6b ...

Traceback (most recent call last):
  File "/usr/local/bin/stylegan2_pytorch", line 66, in <module>
    fire.Fire(train_from_folder)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/usr/local/bin/stylegan2_pytorch", line 61, in train_from_folder
    retry_call(model.train, tries=3, exceptions=NanException)
  File "/usr/local/lib/python3.6/dist-packages/retry/api.py", line 101, in retry_call
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
  File "/usr/local/lib/python3.6/dist-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 527, in train
    generated_images = self.GAN.G(w_styles, noise)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 337, in forward
    x, rgb = block(x, rgb, style, input_noise)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 274, in forward
    x = self.conv2(x, style2)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 228, in forward
    weights = w2 * (w1 + 1)

# RuntimeError: CUDA out of memory. Tried to allocate 6.75 GiB (GPU 0; 15.90 GiB total capacity; 14.78 GiB already allocated; 403.88 MiB free; 14.80 GiB reserved in total by PyTorch)

1024 and 512 via "Memory considerations" recommendations

I tried the following settings from the Memory considerations section of the readme for 1024 and 512, but experienced a similar error: there is just not enough memory.

stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=1024 --batch-size=3 --gradient-accumulate-every=5 --network-capacity=16

fp16, Apex not available

Tried --fp16 at 1024, but picked up this Apex not available error:

Traceback (most recent call last):
  File "/usr/local/bin/stylegan2_pytorch", line 66, in <module>
    fire.Fire(train_from_folder)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/usr/local/bin/stylegan2_pytorch", line 42, in train_from_folder
    fp16 = fp16
  File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 458, in __init__
    assert not fp16 or fp16 and APEX_AVAILABLE, 'Apex is not available for you to use mixed precision training'
AssertionError: Apex is not available for you to use mixed precision training
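A likely fix for that one: --fp16 requires NVIDIA Apex to be installed in the runtime first. The commands below follow the Apex readme of the time and are untested here; in a Colab cell, prefix each with !:

git clone https://github.com/NVIDIA/apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex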

512 defaults

stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512

Similar runtime error:

# RuntimeError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 15.90 GiB total capacity; 12.73 GiB already allocated; 1.49 GiB free; 13.71 GiB reserved in total by PyTorch)

So close to fitting, but just ~200MB over.

512 batch=1

stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512 --batch-size=1 --gradient-accumulate-every=5 --network-capacity=16
RuntimeError: CUDA out of memory. Tried to allocate 576.00 MiB (GPU 0; 15.90 GiB total capacity; 13.94 GiB already allocated; 29.88 MiB free; 15.17 GiB reserved in total by PyTorch)

Smaller network capacity

I was able to get training started for 512px images by lowering batch-size and gradient-accumulate-every to 1 and setting network-capacity to 8 (1s/iteration):

stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512 --batch-size=1 --gradient-accumulate-every=1 --network-capacity=8

Conclusion

Not sure it's worth waiting two days with these settings if the result will be low quality, so I'm kicking off a default 256 run now. Thanks again for the library!

@materialvision
Author

Thanks for sharing your trials. I had been planning to test 512 on Colab... but it seems you had problems there too. The one thing I would try is raising gradient-accumulate-every; as stated in the readme, it should go up as you go down on batch-size. I manage to run on my RTX 2070 with --image-size 512 --batch-size 1 --gradient-accumulate-every 20 --network-capacity 7, but results stop improving after some epochs... Let us know if you get some good Colab results!
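For anyone wondering why raising --gradient-accumulate-every helps when --batch-size drops: gradients from several tiny batches are summed before each optimizer step, so the effective batch stays around batch_size × gradient_accumulate_every (20 with the settings above) while only one sample's activations sit in memory at a time. A minimal, self-contained sketch of the pattern with a toy model (not the library's actual training loop):

import torch
from torch import nn

# Toy model and data, only to illustrate the gradient-accumulation pattern.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(8, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randn(100, 8, device=device)

accumulate_every = 20  # effective batch = 1 * accumulate_every
optimizer.zero_grad()
for step in range(data.shape[0]):
    x = data[step:step + 1]                  # micro-batch of size 1
    loss = model(x).pow(2).mean()            # stand-in loss
    (loss / accumulate_every).backward()     # scale so summed grads match a mean over 20 samples
    if (step + 1) % accumulate_every == 0:
        optimizer.step()                     # one optimizer step per 20 micro-batches
        optimizer.zero_grad()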

@dancrew32

dancrew32 commented Apr 26, 2020

Thanks for the suggestion @materialvision, those settings are working on Colab at ~5s/iteration (140 hours to 100k iterations, so about 6 days, haha). I guess you can run two notebooks with the high-RAM GPU, so I'm doing a shootout of the 256 defaults vs. 512 with your settings.

@ugurcansakizli

ugurcansakizli commented May 4, 2020

Hello, total noob here. Is there a way to generate images similar to a specific image we like? And can we ask for more than one image with --generate (something like --generate --num-image-tiles 1 --num-images 5)?
Edit: upon some more reading, I guess what I want is to be able to explore the latent space.
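One general way to get "similar" images, independent of this library: keep the latent vector of an image you like and sample new latents close to it. A minimal sketch; G and z_favorite are hypothetical stand-ins for a trained generator and a saved latent:

import torch
from torchvision.utils import save_image

# Sketch: sample images "near" a latent you like by adding small perturbations.
# G and z_favorite are hypothetical; G stands in for a trained StyleGAN2 generator.
def explore_neighborhood(G, z_favorite, num_samples=5, radius=0.2):
    for i in range(num_samples):
        z = z_favorite + radius * torch.randn_like(z_favorite)  # small random step
        save_image(G(z), f'similar_{i}.png')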
