This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
In a distributed setup, the discriminator and encoder are wrapped with DistributedDataParallel, whereas the decoder has only DataParallel (music-translation/src/train.py, line 168 in fd51cbc).
This results in across-node gradient descent for the discriminator and encoder, while the decoder is not shared, so effectively a separate domain is trained per node. DataParallel is used only to accelerate batch processing across the multiple GPUs of a single node.
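A minimal sketch of that wrapping (toy linear layers standing in for the real networks, and guarded so it also runs without an initialized process group):

```python
import torch
import torch.distributed as dist
import torch.nn as nn

# Toy stand-ins for the real networks (not the repo's actual modules).
encoder = nn.Linear(800, 64)
decoder = nn.Linear(64, 800)
discriminator = nn.Linear(64, 1)

if dist.is_available() and dist.is_initialized():
    # Cross-node gradient averaging for encoder and discriminator only.
    encoder = nn.parallel.DistributedDataParallel(encoder)
    discriminator = nn.parallel.DistributedDataParallel(discriminator)

# DataParallel only splits a batch over this node's GPUs; no gradients
# are exchanged between nodes for the decoder.
decoder = nn.DataParallel(decoder)
```

Because only the encoder and discriminator are inside DistributedDataParallel, their gradients are all-reduced across nodes on every backward pass, while each node's decoder sees only its own node's gradients.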
Note that single-node training still rotates the domain from which batches are sampled (music-translation/src/train.py, line 273 in fd51cbc).
In that case, single-node training does learn a domain-agnostic hidden representation,
yet synthesis will reproduce the very same input waveform.
How can the domain be transferred in such a case? There is no parameterized domain class
inside the latent representation, and only a single decoder is trained.
Is there a hidden purpose to single-node training? It is unclear how to perform music translation with it.
Regarding the pretrained weights: the encoder parameters differ across the per-node checkpoints. For some reason they were not synchronized during training:
the outputs produced by the encoders loaded from lastmodel_0.pth, lastmodel_1.pth, and lastmodel_2.pth for the common input numpy.random.randint(0, 256, (1, 800)) differ.
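The consistency check described above can be reproduced in miniature. This sketch uses a toy linear encoder and locally written toy checkpoint files (not the actual lastmodel_*.pth checkpoints) to show the comparison:

```python
import numpy as np
import torch
import torch.nn as nn

def make_encoder():
    torch.manual_seed(0)        # identical initialization each time
    return nn.Linear(800, 16)   # toy stand-in for the real encoder

# Simulate three per-node checkpoints whose encoder weights drifted
# apart (toy files mirroring lastmodel_0.pth, lastmodel_1.pth, ...).
paths = []
for i in range(3):
    enc = make_encoder()
    with torch.no_grad():
        enc.weight.add_(0.01 * i)   # drift: weights differ per "node"
    path = f"toy_lastmodel_{i}.pth"
    torch.save(enc.state_dict(), path)
    paths.append(path)

# The check from the issue: feed one shared random input to each encoder.
x = torch.from_numpy(np.random.randint(0, 256, (1, 800))).float()
outputs = []
for path in paths:
    enc = make_encoder()
    enc.load_state_dict(torch.load(path))
    with torch.no_grad():
        outputs.append(enc(x))

synced = all(torch.allclose(outputs[0], o) for o in outputs[1:])
print("encoders synchronized:", synced)  # False: the weights drifted
```

With a properly synchronized encoder, all three outputs would be identical for the shared input; here they are not, mirroring what was observed with the released checkpoints.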
Regarding multinode training:
You are correct, each node trains only a single decoder, and the encoder is shared between all nodes. Gradients are shared with regard to the encoder, but each decoder is trained on a single node. For 6 domains, we used 6 nodes.
Regarding single node training:
This is indeed a bug. A solution would be to keep a list of decoders (plus a list of optimizers) and update the relevant decoder based on the domain.
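That fix could look roughly like this (a sketch: toy linear modules and an MSE reconstruction loss stand in for the repo's actual models and losses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One decoder (and one optimizer) per domain; each batch updates the
# shared encoder plus only the decoder of the domain it came from.
n_domains = 6
encoder = nn.Linear(800, 64)
decoders = nn.ModuleList(nn.Linear(64, 800) for _ in range(n_domains))
enc_opt = torch.optim.Adam(encoder.parameters())
dec_opts = [torch.optim.Adam(d.parameters()) for d in decoders]

def train_step(x, domain):
    z = encoder(x)
    recon = decoders[domain](z)      # pick this domain's own decoder
    loss = F.mse_loss(recon, x)      # placeholder reconstruction loss
    enc_opt.zero_grad()
    dec_opts[domain].zero_grad()
    loss.backward()
    enc_opt.step()
    dec_opts[domain].step()          # only this decoder is updated
    return loss.item()
```

As batches rotate through the domains, each domain's decoder is trained only on its own batches, while the shared encoder receives gradients from every domain, matching the behavior of the multinode setup.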
Regarding weights:
Each bestmodel_*.pth is the checkpoint that produced the lowest eval loss across all epochs for its musical domain. The lastmodel_*.pth files contain the same encoder across all domains.
To transfer between domains, a separate decoder is required per domain, as stated in the paper.