
Using pretrained models shows epoch 1372 -- is this expected? #31

Closed
skol101 opened this issue Jan 11, 2023 · 1 comment

Comments

skol101 commented Jan 11, 2023

I've downloaded D-freevc.pth and freevc.pth and renamed them to D_0.pth and G_0.pth, respectively.
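For reference, a minimal sketch of that renaming step, assuming the checkpoints belong in ./logs/freevc/ (the directory the log below loads them from, per 'model_dir' in configs/freevc.json):

```python
# Move the downloaded pretrained checkpoints into the run's log directory
# so train.py picks them up as the latest generator/discriminator checkpoints.
import shutil

shutil.move("freevc.pth", "./logs/freevc/G_0.pth")    # generator
shutil.move("D-freevc.pth", "./logs/freevc/D_0.pth")  # discriminator
```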

As you can see, it shows step 253800. This time I'm running on 1 GPU with a batch size of 64.

CUDA_VISIBLE_DEVICES="0" python train.py -c configs/freevc.json -m freevc
INFO:freevc:{'train': {'log_interval': 200, 'eval_interval': 5000, 'seed': 1234, 'epochs': 10000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 64, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 8960, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'use_sr': True, 'max_speclen': 128, 'port': '8001'}, 'data': {'training_files': 'filelists/train.txt', 'validation_files': 'filelists/val.txt', 'max_wav_value': 32768.0, 'sampling_rate': 16000, 'filter_length': 1280, 'hop_length': 320, 'win_length': 1280, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256, 'ssl_dim': 1024, 'use_spk': True}, 'model_dir': './logs/freevc'}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
./logs/freevc/G_0.pth
INFO:freevc:Loaded checkpoint './logs/freevc/G_0.pth' (iteration 1372)
./logs/freevc/D_0.pth
INFO:freevc:Loaded checkpoint './logs/freevc/D_0.pth' (iteration 1372)
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
INFO:freevc:Train Epoch: 1372 [89%]
INFO:freevc:[2.6407272815704346, 2.6749014854431152, 10.811471939086914, 19.81775665283203, 2.324024200439453, 253800, 0.00016843613063817603]
INFO:freevc:====> Epoch: 1372

 
OlaWod (Owner) commented Jan 11, 2023

Yes, 1372 is expected. train.txt has 42191 lines: 1372 epochs * 42191 samples = 57,886,052, and 57,886,052 / 64 = 904,469.5625 steps. Since the batch_sampler in the dataloader discards samples that are too short or too long, the number of valid training samples is a little fewer than 42191, so the resulting step count is a little below 904,469.5625, i.e. approximately 900k steps.
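To make that arithmetic explicit, here is a small sketch; the 42191 and 1372 figures come from the comment above, and the one-optimizer-step-per-batch assumption reflects the usual VITS-style training loop:

```python
# Back-of-envelope: expected global step after a given number of epochs,
# assuming one optimizer step per batch and no gradient accumulation.
num_samples = 42191   # lines in train.txt
batch_size = 64       # from configs/freevc.json
epochs = 1372

steps_per_epoch = num_samples / batch_size  # ~659.2 batches per epoch
total_steps = epochs * steps_per_epoch      # ~904,469.6
print(f"{steps_per_epoch:.1f} steps/epoch, ~{total_steps:,.0f} total steps")
```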
Btw, since I share one machine with some of my mates, sometimes I have to press Ctrl+C and let them use the GPU for a while, then resume training; I'm not sure whether this affects the iteration number a bit.
As for 'step 253800', my guess is that your training set does not have 42191 samples.
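Running the same arithmetic in reverse gives a rough estimate of the reporter's dataset size; a sketch under the same one-step-per-batch assumption, with 253800 and 1372 taken from the log above:

```python
# Invert the calculation: given the logged global step and the restored
# epoch, estimate how many samples survive the batch_sampler's length filter.
global_step = 253800  # from the log
epoch = 1372          # epoch restored from the checkpoint
batch_size = 64

steps_per_epoch = global_step / epoch       # ~185 batches per epoch
est_samples = steps_per_epoch * batch_size  # ~11,840 samples
print(f"~{steps_per_epoch:.0f} steps/epoch, ~{est_samples:,.0f} samples")
```

That is far fewer than 42191, which is consistent with the guess above.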

skol101 closed this as completed Jan 11, 2023