
Weight demodulation #2429

Closed
jvwilliams23 opened this issue Feb 20, 2024 · 9 comments

@jvwilliams23 (Contributor) commented Feb 20, 2024

Hi,

We are trying to do weight demodulation on the weights of a 2D convolution layer, as in the StyleGAN2 paper. Here is an example of weight demodulation from NVIDIA's StyleGAN2 PyTorch code:

weight = torch.nn.Parameter(torch.randn([out_channels, in_channels, kernel_size, kernel_size]))
w = weight
w = w * styles.reshape(batch_size, 1, -1, 1, 1)  # [NOIkk]
# Sum over the input-channel and kernel dimensions
dcoefs = (w.square().sum(dim=[2, 3, 4]) + 1e-8).rsqrt()  # [NO]
...
if up > 1:
    weight = weight.transpose(0, 1)
x = conv(x=x, w=weight, transpose=True)
x = x * dcoefs.to(x.dtype).reshape(batch_size, -1, 1, 1)

Question: how can we perform the multi-dimensional sum() on the weights to compute dcoefs?

Best,
Josh

@tbennun (Contributor) commented Feb 29, 2024

@jvwilliams23 Thank you for reporting. I have implemented a version of the multi-dimensional reduction layer in #2430. You can also find a usage example in the corresponding unit test.
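
For reference, a minimal sketch of how the new layer can be wired into the demodulation step above (this mirrors the usage that appears later in this thread; `w`, `reduction_axes`, and `out_channels` are placeholders you would define for your model):

# Sketch only: square the style-modulated weights, reduce over the chosen
# axes, and take the reciprocal square root, mirroring dcoefs in the
# PyTorch snippet above.
reduction = lbann.MultiDimReduction(lbann.Square(w), axes=reduction_axes)
dcoefs = lbann.Reshape(lbann.Rsqrt(reduction), dims=[out_channels, 1, 1, 1])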

@jvwilliams23 (Contributor, Author) commented

Hi @tbennun, thanks for implementing this. I am wondering: is it possible for me to install this branch from your fork via Spack, or do I need to wait until the PR is merged?

@jvwilliams23 (Contributor, Author) commented

Never mind - got it! Testing now.

@jvwilliams23 (Contributor, Author) commented
Hi @tbennun. Works great! Thanks for implementing this. I will mark as closed.

@tbennun (Contributor) commented Mar 13, 2024

Happy to hear that!

@jvwilliams23 (Contributor, Author) commented

Hi @tbennun. Does this work in a model-parallel setting (i.e., the reduction output is multiplied by the output of a model-parallel convolution layer, as shown below)?

# Compute demodulation coefficients from the squared, style-modulated weights
reduction_kernel = lbann.MultiDimReduction(lbann.Square(weights_times_styles), axes=reduction_axes)
dcoefs = lbann.Reshape(lbann.Rsqrt(reduction_kernel), dims=[out_channels, 1, 1, 1])

# Scale activations by styles before convolution, scale by dcoefs after convolution
styles_reshaped = lbann.Tessellate(
    styles_reshaped, dims=[in_channels, in_resolution, in_resolution]
)
x = lbann.Multiply(x, styles_reshaped)

if parallel_strategy_global is not None:
    print("modulated_conv2d parallel_strategy = ", parallel_strategy_global)

# Model-parallel convolution
conv_mod = lm.Convolution2dModule(
    weights=weight.weights,
    parallel_strategy=parallel_strategy_global,
    **conv_kwargs
)
x = conv_mod(x)

# Broadcast dcoefs over the spatial dimensions and apply after the convolution
dcoefs_reshape = lbann.Reshape(dcoefs, dims=[out_channels, 1, 1])
dcoefs_reshape = lbann.Tessellate(dcoefs_reshape, dims=[out_channels, resolution, resolution])
x = lbann.Multiply(x, dcoefs_reshape)

I seem to get the following error:

Process 1 caught error message:
****************************************************************
LBANN error on rank 1 (/home/jwilliams/lbann-builds/dev-lbann-clean/include/lbann/utils/cutensor_support.hpp:160): cuTENSOR error (status=15): CUTENSOR_STATUS_NOT_SUPPORTED
Stack trace:
   0: lbann::stack_trace::get[abi:cxx11]()
   1: lbann::exception::exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
   2: /home/jwilliams/lbann-builds/dev-lbann-clean/build_cutensor/install/lib64/liblbann.so.0.104.0(+0xc024cb8) [0x7f1583da3cb8] (could not find stack frame symbol)
   3: lbann::multidim_reduction_layer<float, (lbann::data_layout)1, (hydrogen::Device)1>::fp_compute()
   4: lbann::data_type_layer<float, float>::forward_prop()
   5: lbann::model::forward_prop(lbann::execution_mode)
   6: lbann::SGDTrainingAlgorithm::train_mini_batch(lbann::SGDExecutionContext&, lbann::model&, lbann::data_coordinator&, lbann::ScopeTimer)
   7: lbann::SGDTrainingAlgorithm::train(lbann::SGDExecutionContext&, lbann::model&, lbann::data_coordinator&, lbann::SGDTerminationCriteria const&)
   8: lbann::SGDTrainingAlgorithm::apply(lbann::ExecutionContext&, lbann::model&, lbann::data_coordinator&, lbann::execution_mode)
   9: lbann::trainer::train(lbann::model*, long long, long long)
  10: /home/jwilliams/lbann-builds/dev-lbann-clean/build_cutensor/install/bin/lbann() [0x4384b2] (could not find stack frame symbol)
  11: __libc_start_main (demangling failed)
  12: /home/jwilliams/lbann-builds/dev-lbann-clean/build_cutensor/install/bin/lbann() [0x43684e] (could not find stack frame symbol)
****************************************************************

@tbennun (Contributor) commented Mar 21, 2024

Multi-dimensional reduction itself cannot run in model-parallel mode at the moment, but it should accept model-parallel outputs if you explicitly set it to be data-parallel.

@jvwilliams23 (Contributor, Author) commented

Like below?

reduction_kernel = lbann.MultiDimReduction(
    lbann.Square(w),
    axes=reduction_axes,
    data_layout='data_parallel',
    parallel_strategy=None
)

I get the same error message.

@tbennun (Contributor) commented Mar 21, 2024

@benson31 any ideas?
