
Using pretrained models shows epoch 1372 -- is this expected? #31

Closed
skol101 opened this issue Jan 11, 2023 · 1 comment

Comments

skol101 commented Jan 11, 2023

I've downloaded D-freevc.pth and freevc.pth and renamed them to D_0.pth and G_0.pth, respectively.
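For reference, a minimal sketch of that renaming step, assuming the checkpoints belong in ./logs/freevc/ (the directory the log below loads them from, per 'model_dir' in configs/freevc.json):

```python
# Move the downloaded pretrained checkpoints into the run's log directory
# so train.py picks them up as the latest generator/discriminator checkpoints.
import shutil

shutil.move("freevc.pth", "./logs/freevc/G_0.pth")    # generator
shutil.move("D-freevc.pth", "./logs/freevc/D_0.pth")  # discriminator
```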

As you can see, it shows step 253800. This time I'm running on 1 GPU with a batch size of 64.

CUDA_VISIBLE_DEVICES="0" python train.py -c configs/freevc.json -m freevc
INFO:freevc:{'train': {'log_interval': 200, 'eval_interval': 5000, 'seed': 1234, 'epochs': 10000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 64, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 8960, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'use_sr': True, 'max_speclen': 128, 'port': '8001'}, 'data': {'training_files': 'filelists/train.txt', 'validation_files': 'filelists/val.txt', 'max_wav_value': 32768.0, 'sampling_rate': 16000, 'filter_length': 1280, 'hop_length': 320, 'win_length': 1280, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256, 'ssl_dim': 1024, 'use_spk': True}, 'model_dir': './logs/freevc'}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
./logs/freevc/G_0.pth
INFO:freevc:Loaded checkpoint './logs/freevc/G_0.pth' (iteration 1372)
./logs/freevc/D_0.pth
INFO:freevc:Loaded checkpoint './logs/freevc/D_0.pth' (iteration 1372)
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
INFO:freevc:Train Epoch: 1372 [89%]
INFO:freevc:[2.6407272815704346, 2.6749014854431152, 10.811471939086914, 19.81775665283203, 2.324024200439453, 253800, 0.00016843613063817603]
INFO:freevc:====> Epoch: 1372

 
OlaWod (Owner) commented Jan 11, 2023

Yes, 1372 is expected. train.txt has 42191 lines: 1372 epochs * 42191 samples = 57,886,052, and 57,886,052 / 64 = 904,469.5625 steps. Since the batch_sampler in the dataloader discards samples that are too short or too long, the number of valid training samples is a little fewer than 42191, so the resulting step count is a little below 904,469.5625, i.e. approximately 900k steps.
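To make that arithmetic explicit, here is a small sketch; the 42191 and 1372 figures come from the comment above, and the one-optimizer-step-per-batch assumption reflects the usual VITS-style training loop:

```python
# Back-of-envelope: expected global step after a given number of epochs,
# assuming one optimizer step per batch and no gradient accumulation.
num_samples = 42191   # lines in train.txt
batch_size = 64       # from configs/freevc.json
epochs = 1372

steps_per_epoch = num_samples / batch_size  # ~659.2 batches per epoch
total_steps = epochs * steps_per_epoch      # ~904,469.6
print(f"{steps_per_epoch:.1f} steps/epoch, ~{total_steps:,.0f} total steps")
```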
Btw, since I share one machine with some of my mates, sometimes I have to press Ctrl+C and let them use the GPU for a while, then resume training; I'm not sure whether this affects the iteration number a bit.
As for 'step 253800', my guess is that your training set does not have 42191 samples.
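Running the same arithmetic in reverse gives a rough estimate of the reporter's dataset size; a sketch under the same one-step-per-batch assumption, with 253800 and 1372 taken from the log above:

```python
# Invert the calculation: given the logged global step and the restored
# epoch, estimate how many samples survive the batch_sampler's length filter.
global_step = 253800  # from the log
epoch = 1372          # epoch restored from the checkpoint
batch_size = 64

steps_per_epoch = global_step / epoch       # ~185 batches per epoch
est_samples = steps_per_epoch * batch_size  # ~11,840 samples
print(f"~{steps_per_epoch:.0f} steps/epoch, ~{est_samples:,.0f} samples")
```

That is far fewer than 42191, which is consistent with the guess above.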

skol101 closed this as completed Jan 11, 2023