Python Notebook fails on training #37

jamierpond · 2022-12-14T22:56:27Z

Hi! Just trying to run a simple demo. I'm following the demos, examples, and other people's tutorials, and I have everything set up the same way they do, but I keep getting errors. Could you please point me in the right direction?

/content/diff-svc
| Hparams chains:  ['/content/diff-svc/training/config_nsf.yaml']
| Hparams: 
K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 128, audio_sample_rate: 44100, binarization_args: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False}, 
binarizer_cls: preprocessing.SVCpre.SVCBinarizer, binary_data_dir: data/binary/neer, check_val_every_n_epoch: 10, choose_test_manually: False, clip_grad_norm: 1, 
config_path: training/config_nsf.yaml, content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2, 
cwt_loss: l1, cwt_std_scale: 0.8, datasets: ['opencpop'], debug: False, dec_ffn_kernel_size: 9, 
dec_layers: 4, decay_steps: 20000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet, 
diff_loss_type: l2, dilation_cycle_length: 4, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'], 
dur_loss: mse, dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4, 
encoder_K: 8, encoder_type: fft, endless_ds: True, f0_bin: 256, f0_max: 1100.0, 
f0_min: 40.0, ffn_act: gelu, ffn_padding: SAME, fft_size: 2048, fmax: 16000, 
fmin: 40, fs2_ckpt: , gaussian_start: True, gen_dir_name: , gen_tgt_spk_id: -1, 
hidden_size: 256, hop_size: 512, hubert_gpu: True, hubert_path: checkpoints/hubert/hubert_soft.pt, infer: False, 
keep_bins: 128, lambda_commit: 0.25, lambda_energy: 0.0, lambda_f0: 1.0, lambda_ph_dur: 0.3, 
lambda_sent_dur: 1.0, lambda_uv: 1.0, lambda_word_dur: 1.0, load_ckpt: /content/diff-svc/pretrain/nehito.ckpt, log_interval: 100, 
loud_norm: False, lr: 0.0008, max_beta: 0.02, max_epochs: 3000, max_eval_sentences: 1, 
max_eval_tokens: 60000, max_frames: 42000, max_input_tokens: 60000, max_sentences: 12, max_tokens: 128000, 
max_updates: 1000000, mel_loss: ssim:0.5|l1:0.5, mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120, 
no_fs2: True, norm_type: gn, num_ckpt_keep: 10, num_heads: 2, num_sanity_val_steps: 1, 
num_spk: 1, num_test_samples: 0, num_valid_plots: 10, optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98, 
out_wav_norm: False, pe_ckpt: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt, pe_enable: False, perform_enhance: True, pitch_ar: False, 
pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l2, pitch_norm: log, pitch_type: frame, 
pndm_speedup: 10, pre_align_args: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, predictor_dropout: 0.5, predictor_grad: 0.1, 
predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256, 
pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, raw_data_dir: data/raw/neer, ref_norm_layer: bn, 
rel_pos: True, reset_phone_dict: True, residual_channels: 384, residual_layers: 20, save_best: False, 
save_ckpt: True, save_codes: ['configs', 'modules', 'src', 'utils'], save_f0: True, save_gt: False, schedule_type: linear, 
;33;mseed: 1234, ;33;msort_by_len: True, ;33;mspeaker_id: neer, ;33;mspec_max: [-0.07976219058036804, 0.3064012825489044, 0.45079874992370605, 0.48896849155426025, 0.38102585077285767, 0.5545408129692078, 0.6556591391563416, 0.5011460781097412, 0.7585625052452087, 0.7933887243270874, 0.7276718020439148, 0.6568117141723633, 0.8160334825515747, 0.7098748087882996, 0.7070586681365967, 0.9631615281105042, 0.8693066835403442, 0.8992214202880859, 0.8334618210792542, 0.9382892847061157, 0.761588454246521, 1.0139938592910767, 0.8147022128105164, 0.8377708196640015, 0.8404781818389893, 0.5279245376586914, 0.7715780735015869, 0.5754967331886292, 0.19373822212219238, 0.11457031220197678, -0.048836078494787216, 0.2835775315761566, 0.1506994366645813, -0.016768964007496834, 0.07266628742218018, -0.05616551637649536, -0.010572524741292, 0.1133032739162445, 0.16342110931873322, 0.035064052790403366, 0.3116454482078552, 0.16785651445388794, 0.1354154646396637, 0.36229264736175537, 0.372775673866272, -0.10152062773704529, 0.22035335004329681, 0.183604434132576, 0.04665748029947281, 0.23221279680728912, 0.21843412518501282, 0.049887944012880325, -0.05100967362523079, -0.0010432127164676785, -0.06516791135072708, 0.07901491224765778, -0.18570756912231445, -0.14707334339618683, -0.11538795381784439, -0.1341129094362259, -0.15978987514972687, -0.18778416514396667, -0.2038293480873108, -0.25516536831855774, -0.24493663012981415, -0.15004149079322815, -0.016140246763825417, -0.07177135348320007, -0.3963303565979004, -0.3779948353767395, -0.25783461332321167, -0.16094177961349487, -0.23505426943302155, -0.3541640043258667, -0.34247317910194397, -0.3881177306175232, -0.4593522846698761, -0.5756832957267761, -0.35765165090560913, -0.542741060256958, -0.4082295298576355, -0.4770561158657074, -0.17004281282424927, -0.27877169847488403, -0.15326324105262756, -0.4180527925491333, -0.27339401841163635, -0.23254677653312683, -0.29365968704223633, -0.33521631360054016, -0.3491170406341553, 
-0.18533602356910706, -0.29260891675949097, -0.44137561321258545, -0.6128101944923401, -0.731763482093811, -0.6580878496170044, -0.11427026987075806, -0.3944733738899231, -0.6505616903305054, -0.6488122344017029, -0.7484522461891174, -0.7040322422981262, -0.6145080924034119, -0.531133770942688, -0.5737754702568054, -0.6910640597343445, -0.6721180081367493, -0.8550227284431458, -0.7104114294052124, -0.6984644532203674, -0.8648133277893066, -1.0164130926132202, -1.0275567770004272, -1.1420173645019531, -1.068782925605774, -1.244425654411316, -1.302030086517334, -1.5661638975143433, -1.639020562171936, -1.697121500968933, -1.9838589429855347, -2.2957139015197754, -2.2596089839935303, -2.119849443435669, -2.2869279384613037, -2.358459711074829, -2.3582921028137207], ;33;mspec_min: [-4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, 
-4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102], 
spk_cond_steps: [], stop_token_weight: 5.0, task_cls: training.task.SVC_task.SVCTask, test_ids: [], test_input_dir: , 
test_num: 0, test_prefixes: ['test'], test_set_name: test, timesteps: 1000, train_set_name: train, 
use_crepe: True, use_denoise: False, use_energy_embed: False, use_gt_dur: False, use_gt_f0: False, 
use_midi: False, use_nsf: True, use_pitch_embed: True, use_pos_embed: True, use_spk_embed: False, 
use_spk_id: False, use_split_spk_id: False, use_uv: False, use_var_enc: False, use_vec: False, 
val_check_interval: 1000, valid_num: 0, valid_set_name: valid, validate: False, vocoder: network.vocoders.nsf_hifigan.NsfHifiGAN, 
vocoder_ckpt: checkpoints/nsf_hifigan/model, warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 2048, 
work_dir: checkpoints/neer, 
| Mel losses: {'ssim': 0.5, 'l1': 0.5}
| Load HifiGAN:  checkpoints/nsf_hifigan/model
Removing weight norm...
12/14 10:52:47 PM gpu available: True, used: True
Traceback (most recent call last):
  File "run.py", line 15, in <module>
    run_task()
  File "run.py", line 11, in run_task
    task_cls.start()
  File "/content/diff-svc/training/task/base_task.py", line 234, in start
    trainer.fit(task)
  File "/content/diff-svc/utils/pl_utils.py", line 487, in fit
    model.model = model.build_model()
  File "/content/diff-svc/training/task/fs2.py", line 75, in build_model
    self.load_ckpt(hparams['load_ckpt'], strict=True)
  File "/content/diff-svc/training/task/base_task.py", line 84, in load_ckpt
    utils.load_ckpt(self.__getattr__(current_model_name), ckpt_base_dir, current_model_name, force, strict)
  File "/content/diff-svc/utils/__init__.py", line 202, in load_ckpt
    cur_model.load_state_dict(state_dict, strict=strict)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GaussianDiffusion:
	Unexpected key(s) in state_dict: "fs2.encoder.layers.0.op.layer_norm1.weight", "fs2.encoder.layers.0.op.layer_norm1.bias", "fs2.encoder.layers.0.op.self_attn.in_proj_weight", "fs2.encoder.layers.0.op.self_attn.out_proj.weight", "fs2.encoder.layers.0.op.layer_norm2.weight", "fs2.encoder.layers.0.op.layer_norm2.bias", "fs2.encoder.layers.0.op.ffn.ffn_1.weight", "fs2.encoder.layers.0.op.ffn.ffn_1.bias", "fs2.encoder.layers.0.op.ffn.ffn_2.weight", "fs2.encoder.layers.0.op.ffn.ffn_2.bias", "fs2.encoder.layers.1.op.layer_norm1.weight", "fs2.encoder.layers.1.op.layer_norm1.bias", "fs2.encoder.layers.1.op.self_attn.in_proj_weight", "fs2.encoder.layers.1.op.self_attn.out_proj.weight", "fs2.encoder.layers.1.op.layer_norm2.weight", "fs2.encoder.layers.1.op.layer_norm2.bias", "fs2.encoder.layers.1.op.ffn.ffn_1.weight", "fs2.encoder.layers.1.op.ffn.ffn_1.bias", "fs2.encoder.layers.1.op.ffn.ffn_2.weight", "fs2.encoder.layers.1.op.ffn.ffn_2.bias", "fs2.encoder.layers.2.op.layer_norm1.weight", "fs2.encoder.layers.2.op.layer_norm1.bias", "fs2.encoder.layers.2.op.self_attn.in_proj_weight", "fs2.encoder.layers.2.op.self_attn.out_proj.weight", "fs2.encoder.layers.2.op.layer_norm2.weight", "fs2.encoder.layers.2.op.layer_norm2.bias", "fs2.encoder.layers.2.op.ffn.ffn_1.weight", "fs2.encoder.layers.2.op.ffn.ffn_1.bias", "fs2.encoder.layers.2.op.ffn.ffn_2.weight", "fs2.encoder.layers.2.op.ffn.ffn_2.bias", "fs2.encoder.layers.3.op.layer_norm1.weight", "fs2.encoder.layers.3.op.layer_norm1.bias", "fs2.encoder.layers.3.op.self_attn.in_proj_weight", "fs2.encoder.layers.3.op.self_attn.out_proj.weight", "fs2.encoder.layers.3.op.layer_norm2.weight", "fs2.encoder.layers.3.op.layer_norm2.bias", "fs2.encoder.layers.3.op.ffn.ffn_1.weight", "fs2.encoder.layers.3.op.ffn.ffn_1.bias", "fs2.encoder.layers.3.op.ffn.ffn_2.weight", "fs2.encoder.layers.3.op.ffn.ffn_2.bias", "fs2.encoder.layer_norm.weight", "fs2.encoder.layer_norm.bias", "fs2.decoder.pos_embed_alpha", 
"fs2.decoder.embed_positions._float_tensor", "fs2.decoder.layers.0.op.layer_norm1.weight", "fs2.decoder.layers.0.op.layer_norm1.bias", "fs2.decoder.layers.0.op.self_attn.in_proj_weight", "fs2.decoder.layers.0.op.self_attn.out_proj.weight", "fs2.decoder.layers.0.op.layer_norm2.weight", "fs2.decoder.layers.0.op.layer_norm2.bias", "fs2.decoder.layers.0.op.ffn.ffn_1.weight", "fs2.decoder.layers.0.op.ffn.ffn_1.bias", "fs2.decoder.layers.0.op.ffn.ffn_2.weight", "fs2.decoder.layers.0.op.ffn.ffn_2.bias", "fs2.decoder.layers.1.op.layer_norm1.weight", "fs2.decoder.layers.1.op.layer_norm1.bias", "fs2.decoder.layers.1.op.self_attn.in_proj_weight", "fs2.decoder.layers.1.op.self_attn.out_proj.weight", "fs2.decoder.layers.1.op.layer_norm2.weight", "fs2.decoder.layers.1.op.layer_norm2.bias", "fs2.decoder.layers.1.op.ffn.ffn_1.weight", "fs2.decoder.layers.1.op.ffn.ffn_1.bias", "fs2.decoder.layers.1.op.ffn.ffn_2.weight", "fs2.decoder.layers.1.op.ffn.ffn_2.bias", "fs2.decoder.layers.2.op.layer_norm1.weight", "fs2.decoder.layers.2.op.layer_norm1.bias", "fs2.decoder.layers.2.op.self_attn.in_proj_weight", "fs2.decoder.layers.2.op.self_attn.out_proj.weight", "fs2.decoder.layers.2.op.layer_norm2.weight", "fs2.decoder.layers.2.op.layer_norm2.bias", "fs2.decoder.layers.2.op.ffn.ffn_1.weight", "fs2.decoder.layers.2.op.ffn.ffn_1.bias", "fs2.decoder.layers.2.op.ffn.ffn_2.weight", "fs2.decoder.layers.2.op.ffn.ffn_2.bias", "fs2.decoder.layers.3.op.layer_norm1.weight", "fs2.decoder.layers.3.op.layer_norm1.bias", "fs2.decoder.layers.3.op.self_attn.in_proj_weight", "fs2.decoder.layers.3.op.self_attn.out_proj.weight", "fs2.decoder.layers.3.op.layer_norm2.weight", "fs2.decoder.layers.3.op.layer_norm2.bias", "fs2.decoder.layers.3.op.ffn.ffn_1.weight", "fs2.decoder.layers.3.op.ffn.ffn_1.bias", "fs2.decoder.layers.3.op.ffn.ffn_2.weight", "fs2.decoder.layers.3.op.ffn.ffn_2.bias", "fs2.decoder.layer_norm.weight", "fs2.decoder.layer_norm.bias". 
	size mismatch for spec_min: copying a param with shape torch.Size([1, 1, 80]) from checkpoint, the shape in current model is torch.Size([1, 1, 128]).
	size mismatch for spec_max: copying a param with shape torch.Size([1, 1, 80]) from checkpoint, the shape in current model is torch.Size([1, 1, 128]).
	size mismatch for denoise_fn.input_projection.weight: copying a param with shape torch.Size([256, 80, 1]) from checkpoint, the shape in current model is torch.Size([384, 128, 1]).
	size mismatch for denoise_fn.input_projection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for denoise_fn.mlp.0.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
	size mismatch for denoise_fn.mlp.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for denoise_fn.mlp.2.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
	size mismatch for denoise_fn.mlp.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for denoise_fn.residual_layers.0.dilated_conv.weight: copying a param with shape torch.Size([512, 256, 3]) from checkpoint, the shape in current model is torch.Size([768, 384, 3]).
	size mismatch for denoise_fn.residual_layers.0.dilated_conv.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).

... (more size mismatch errors of the same form follow)

prophesier · 2022-12-15T16:07:01Z

If you are using Colab for training, it is best to ask the author of the Colab notebook, because I am not sure what modifications the notebook has made to the source code. Alternatively, you can ask on the Discord channel linked on the homepage, where most Colab notebook authors should be present.
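
For anyone reading the traceback above: the shapes it reports suggest that the checkpoint set in `load_ckpt` was saved from a model with 80 mel bins and 256 residual channels, while `config_nsf.yaml` builds one with `audio_num_mel_bins: 128` and `residual_channels: 384`. The checkpoint also carries `fs2.*` keys that the current model, built with `no_fs2: True`, does not have. Below is a minimal sketch of how such a comparison could be reproduced offline. The helper `diff_shapes` is hypothetical and not part of diff-svc; it takes plain name-to-shape mappings so that it runs without PyTorch.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_shapes(ckpt: Dict[str, Shape], model: Dict[str, Shape]) -> Dict[str, List]:
    """Group differences roughly the way load_state_dict(strict=True) reports them.

    Hypothetical helper for illustration; not part of diff-svc or PyTorch.
    """
    # Keys present in the checkpoint but absent from the model.
    unexpected = sorted(k for k in ckpt if k not in model)
    # Keys the model expects but the checkpoint lacks.
    missing = sorted(k for k in model if k not in ckpt)
    # Keys present on both sides whose shapes disagree.
    mismatched = sorted(
        (k, ckpt[k], model[k]) for k in ckpt if k in model and ckpt[k] != model[k]
    )
    return {"unexpected": unexpected, "missing": missing, "size_mismatch": mismatched}


# A few entries copied from the traceback above.
ckpt_shapes = {
    "spec_min": (1, 1, 80),
    "denoise_fn.input_projection.weight": (256, 80, 1),
    # The log lists this key as unexpected but does not show its shape;
    # (256,) is a placeholder assumption.
    "fs2.encoder.layer_norm.weight": (256,),
}
model_shapes = {
    "spec_min": (1, 1, 128),
    "denoise_fn.input_projection.weight": (384, 128, 1),
}

report = diff_shapes(ckpt_shapes, model_shapes)
for name, saved, current in report["size_mismatch"]:
    print(f"{name}: checkpoint {saved} vs model {current}")
print("unexpected:", report["unexpected"])
```

With PyTorch available, the two mappings could presumably be built with `{k: tuple(v.shape) for k, v in state_dict.items()}` from the loaded checkpoint and from the freshly built model. How the weights are nested inside the checkpoint file is an assumption to check against the file itself.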
