
generate options #35

Open
materialvision opened this issue Mar 24, 2020 · 13 comments

Comments

@materialvision

Hi. I tried to find a way to control the generate function in a more flexible way. Could someone point me in the right direction? I am looking to produce a series of larger single images instead of the "contact sheet" style it currently outputs.

@lucidrains
Owner

@materialvision hello! There is a setting, --num_image_tiles, that you can set to 1 if you want single images.
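For reference, a full generation call with that flag might look like the line below (a sketch: it assumes a model has already been trained under the default name and sits in the default ./models directory):

stylegan2_pytorch --generate --num_image_tiles 1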

@materialvision
Author

Thanks! What about size? Right now the images are only 128x128 px, and the --image_size setting makes no difference...

@lucidrains
Owner

@materialvision you can only generate at the image size you trained on

@materialvision
Author

Thank you. Of course, 1024px is then difficult to get working on my 8GB GPU... I will try more settings, but it seems like 512 is the largest possible.

@materialvision
Author

One more newbie question... Will it be possible to generate interpolations as a series of images (for animation)?

@lucidrains
Owner

@materialvision ohh not yet, but I can look into adding one, perhaps with a flag like --generate-interpolation?
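In the meantime, the general idea behind interpolation frames, outside the library: linearly blend between two latent vectors and decode each intermediate point. A minimal sketch; G and latent_dim are hypothetical stand-ins for a trained generator and its latent size, not this repo's API:

import torch
from torchvision.utils import save_image

# Minimal sketch of latent interpolation for animation frames.
# G is a hypothetical generator mapping a latent batch to an image batch;
# it stands in for a trained StyleGAN2 generator, not this repo's API.
def interpolation_frames(G, latent_dim=512, num_frames=60, device='cuda'):
    z1 = torch.randn(1, latent_dim, device=device)  # start point in latent space
    z2 = torch.randn(1, latent_dim, device=device)  # end point in latent space
    for i in range(num_frames):
        t = i / (num_frames - 1)
        z = torch.lerp(z1, z2, t)                # blend the two latents
        save_image(G(z), f'frame_{i:03d}.png')   # decode and save one frame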

@materialvision
Author

That would be amazing, thanks!

@dancrew32

dancrew32 commented Apr 7, 2020

Say that you trained at image_size 128px and you spent a little while (and $) training it on a p2.xlarge (https://aws.amazon.com/ec2/instance-types/p2/). Say the results are great, but now you want to increase resolution. Is there a strategy for "upgrading" the model to 1024px or larger (2048px? 7680 × 4320 8K?) without having to redo all of the 100k num_train_steps at --image_size 1024?

Also, any suggestions for training this thing faster/cheaper? Recommendations for GPUs are also welcome. Thanks for sharing your implementations/setups, everyone.

@lucidrains
Owner

@dancrew32 Hello Dan, unfortunately not at the moment. I think the cheapest route is to use the official stylegan2 repository from Nvidia, and to train on Colab for free. You can checkpoint every so often to your Google Drive and resume over a couple of days. I will eventually get around to making this library compatible with Microsoft's DeepSpeed for accelerated distributed training.
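A minimal sketch of that Colab/Drive setup: mount Drive in a notebook cell, then point the run's model directory at it so checkpoints survive runtime resets. The --models_dir and --name flags below are assumptions about the CLI; check the readme for the exact names.

# In a Colab cell: mount Google Drive so checkpoints survive runtime resets.
from google.colab import drive
drive.mount('/content/drive')

stylegan2_pytorch --data '/content/drive/My Drive/dataset' --models_dir '/content/drive/My Drive/stylegan2/models' --name my-run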

@dancrew32

I upgraded Google Drive storage (100GB for $1.99/mo), went in on Colab Pro ($9.99/mo), and it appears able to train with defaults at --image_size=256, getting 1-2s per iteration! That works out to about 54 hours for 100k iterations.

I tried image_size=1024 and image_size=512, but couldn't get either to fit in the 16GB GPU that Colab Pro offers (the "high memory" runtime GPU option). I've not tried TPU.

[Screenshot of Colab training output, Apr 25 2020]

Logging my attempts to make it fit at 1024 and 512 below, for anyone who's interested:


1024 defaults

stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=1024

Runtime error during the first iteration:

tcmalloc: large alloc 2415919104 bytes == 0x7088000 @  0x7fc0e5fedb6b ...
tcmalloc: large alloc 1207959552 bytes == 0x99098000 @  0x7fc0e5fedb6b ...
tcmalloc: large alloc 2415919104 bytes == 0x7fbfd0000000 @  0x7fc0e5fedb6b ...
tcmalloc: large alloc 2415919104 bytes == 0x7fbf40000000 @  0x7fc0e5fedb6b ...

Traceback (most recent call last):
  File "/usr/local/bin/stylegan2_pytorch", line 66, in <module>
    fire.Fire(train_from_folder)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/usr/local/bin/stylegan2_pytorch", line 61, in train_from_folder
    retry_call(model.train, tries=3, exceptions=NanException)
  File "/usr/local/lib/python3.6/dist-packages/retry/api.py", line 101, in retry_call
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
  File "/usr/local/lib/python3.6/dist-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 527, in train
    generated_images = self.GAN.G(w_styles, noise)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 337, in forward
    x, rgb = block(x, rgb, style, input_noise)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 274, in forward
    x = self.conv2(x, style2)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 228, in forward
    weights = w2 * (w1 + 1)

# RuntimeError: CUDA out of memory. Tried to allocate 6.75 GiB (GPU 0; 15.90 GiB total capacity; 14.78 GiB already allocated; 403.88 MiB free; 14.80 GiB reserved in total by PyTorch)

1024 and 512 via "Memory considerations" recommendations

I tried the following settings from the Memory considerations section of the readme for 1024 and 512, but experienced a similar error: there is just not enough memory.

stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=1024 --batch-size=3 --gradient-accumulate-every=5 --network-capacity=16

fp16, Apex not available

Tried --fp16 at 1024, but picked up this Apex not available error:

Traceback (most recent call last):
  File "/usr/local/bin/stylegan2_pytorch", line 66, in <module>
    fire.Fire(train_from_folder)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/usr/local/bin/stylegan2_pytorch", line 42, in train_from_folder
    fp16 = fp16
  File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 458, in __init__
    assert not fp16 or fp16 and APEX_AVAILABLE, 'Apex is not available for you to use mixed precision training'
AssertionError: Apex is not available for you to use mixed precision training
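A likely fix for that one: --fp16 requires NVIDIA Apex to be installed in the runtime first. The commands below follow the Apex readme of the time and are untested here; in a Colab cell, prefix each with !:

git clone https://github.com/NVIDIA/apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex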

512 defaults

stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512

Similar runtime error:

# RuntimeError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 15.90 GiB total capacity; 12.73 GiB already allocated; 1.49 GiB free; 13.71 GiB reserved in total by PyTorch)

So close to fitting, but just ~200MB over.

512 batch=1

stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512 --batch-size=1 --gradient-accumulate-every=5 --network-capacity=16
RuntimeError: CUDA out of memory. Tried to allocate 576.00 MiB (GPU 0; 15.90 GiB total capacity; 13.94 GiB already allocated; 29.88 MiB free; 15.17 GiB reserved in total by PyTorch)

Smaller network capacity

I was able to get training started for 512px images by lowering batch-size and gradient-accumulate-every to 1 and setting network-capacity to 8 (1s/iteration):

stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512 --batch-size=1 --gradient-accumulate-every=1 --network-capacity=8

Conclusion

Not sure it's worth waiting two days with these settings if the result will be low quality, so I'm kicking off a default 256 run now. Thanks again for the library!

@materialvision
Author

Thanks for sharing your trials. I had been planning to test 512 on Colab... but it seems you had problems there too. The one thing I would try is raising gradient-accumulate-every; as stated in the readme, it should go up as you go down on batch-size. I manage to run on my RTX 2070 with --image-size 512 --batch-size 1 --gradient-accumulate-every 20 --network-capacity 7, but results stop improving after some epochs... Let us know if you get some good Colab results!
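For anyone wondering why raising --gradient-accumulate-every helps when --batch-size drops: gradients from several tiny batches are summed before each optimizer step, so the effective batch stays around batch_size × gradient_accumulate_every (20 with the settings above) while only one sample's activations sit in memory at a time. A minimal, self-contained sketch of the pattern with a toy model (not the library's actual training loop):

import torch
from torch import nn

# Toy model and data, only to illustrate the gradient-accumulation pattern.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(8, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randn(100, 8, device=device)

accumulate_every = 20  # effective batch = 1 * accumulate_every
optimizer.zero_grad()
for step in range(data.shape[0]):
    x = data[step:step + 1]                  # micro-batch of size 1
    loss = model(x).pow(2).mean()            # stand-in loss
    (loss / accumulate_every).backward()     # scale so summed grads match a mean over 20 samples
    if (step + 1) % accumulate_every == 0:
        optimizer.step()                     # one optimizer step per 20 micro-batches
        optimizer.zero_grad()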

@dancrew32

dancrew32 commented Apr 26, 2020

Thanks for the suggestion @materialvision, those settings are working on Colab at ~5s/iteration (140 hours to 100k iterations, so about 6 days, haha). I guess you can run two notebooks with the high-RAM GPU, so I'm doing a shootout of the 256 defaults vs. 512 with your settings.

@ugurcansakizli

ugurcansakizli commented May 4, 2020

Hello, total noob here. Is there a way to generate images similar to a specific image we like? And can we ask for more than one image with --generate (something like --generate --num-image-tiles 1 --num-images 5)?
Edit: upon some more reading, I guess what I want is to be able to explore the latent space.
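One general way to get "similar" images, independent of this library: keep the latent vector of an image you like and sample new latents close to it. A minimal sketch; G and z_favorite are hypothetical stand-ins for a trained generator and a saved latent:

import torch
from torchvision.utils import save_image

# Sketch: sample images "near" a latent you like by adding small perturbations.
# G and z_favorite are hypothetical; G stands in for a trained StyleGAN2 generator.
def explore_neighborhood(G, z_favorite, num_samples=5, radius=0.2):
    for i in range(num_samples):
        z = z_favorite + radius * torch.randn_like(z_favorite)  # small random step
        save_image(G(z), f'similar_{i}.png')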
