Hi! Just trying to run a simple demo. I'm following all the demos, examples, and other people's tutorials, and I have everything set up the same way they do, but I keep getting errors. Could you please point me in the right direction?
/content/diff-svc
| Hparams chains: ['/content/diff-svc/training/config_nsf.yaml']
| Hparams:
;33;mK_step: 1000, ;33;maccumulate_grad_batches: 1, ;33;maudio_num_mel_bins: 128, ;33;maudio_sample_rate: 44100, ;33;mbinarization_args: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False},
;33;mbinarizer_cls: preprocessing.SVCpre.SVCBinarizer, ;33;mbinary_data_dir: data/binary/neer, ;33;mcheck_val_every_n_epoch: 10, ;33;mchoose_test_manually: False, ;33;mclip_grad_norm: 1,
;33;mconfig_path: training/config_nsf.yaml, ;33;mcontent_cond_steps: [], ;33;mcwt_add_f0_loss: False, ;33;mcwt_hidden_size: 128, ;33;mcwt_layers: 2,
;33;mcwt_loss: l1, ;33;mcwt_std_scale: 0.8, ;33;mdatasets: ['opencpop'], ;33;mdebug: False, ;33;mdec_ffn_kernel_size: 9,
;33;mdec_layers: 4, ;33;mdecay_steps: 20000, ;33;mdecoder_type: fft, ;33;mdict_dir: , ;33;mdiff_decoder_type: wavenet,
;33;mdiff_loss_type: l2, ;33;mdilation_cycle_length: 4, ;33;mdropout: 0.1, ;33;mds_workers: 4, ;33;mdur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'],
;33;mdur_loss: mse, ;33;mdur_predictor_kernel: 3, ;33;mdur_predictor_layers: 5, ;33;menc_ffn_kernel_size: 9, ;33;menc_layers: 4,
;33;mencoder_K: 8, ;33;mencoder_type: fft, ;33;mendless_ds: True, ;33;mf0_bin: 256, ;33;mf0_max: 1100.0,
;33;mf0_min: 40.0, ;33;mffn_act: gelu, ;33;mffn_padding: SAME, ;33;mfft_size: 2048, ;33;mfmax: 16000,
;33;mfmin: 40, ;33;mfs2_ckpt: , ;33;mgaussian_start: True, ;33;mgen_dir_name: , ;33;mgen_tgt_spk_id: -1,
;33;mhidden_size: 256, ;33;mhop_size: 512, ;33;mhubert_gpu: True, ;33;mhubert_path: checkpoints/hubert/hubert_soft.pt, ;33;minfer: False,
;33;mkeep_bins: 128, ;33;mlambda_commit: 0.25, ;33;mlambda_energy: 0.0, ;33;mlambda_f0: 1.0, ;33;mlambda_ph_dur: 0.3,
;33;mlambda_sent_dur: 1.0, ;33;mlambda_uv: 1.0, ;33;mlambda_word_dur: 1.0, ;33;mload_ckpt: /content/diff-svc/pretrain/nehito.ckpt, ;33;mlog_interval: 100,
;33;mloud_norm: False, ;33;mlr: 0.0008, ;33;mmax_beta: 0.02, ;33;mmax_epochs: 3000, ;33;mmax_eval_sentences: 1,
;33;mmax_eval_tokens: 60000, ;33;mmax_frames: 42000, ;33;mmax_input_tokens: 60000, ;33;mmax_sentences: 12, ;33;mmax_tokens: 128000,
;33;mmax_updates: 1000000, ;33;mmel_loss: ssim:0.5|l1:0.5, ;33;mmel_vmax: 1.5, ;33;mmel_vmin: -6.0, ;33;mmin_level_db: -120,
;33;mno_fs2: True, ;33;mnorm_type: gn, ;33;mnum_ckpt_keep: 10, ;33;mnum_heads: 2, ;33;mnum_sanity_val_steps: 1,
;33;mnum_spk: 1, ;33;mnum_test_samples: 0, ;33;mnum_valid_plots: 10, ;33;moptimizer_adam_beta1: 0.9, ;33;moptimizer_adam_beta2: 0.98,
;33;mout_wav_norm: False, ;33;mpe_ckpt: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt, ;33;mpe_enable: False, ;33;mperform_enhance: True, ;33;mpitch_ar: False,
;33;mpitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], ;33;mpitch_extractor: parselmouth, ;33;mpitch_loss: l2, ;33;mpitch_norm: log, ;33;mpitch_type: frame,
;33;mpndm_speedup: 10, ;33;mpre_align_args: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}, ;33;mpre_align_cls: data_gen.singing.pre_align.SingingPreAlign, ;33;mpredictor_dropout: 0.5, ;33;mpredictor_grad: 0.1,
;33;mpredictor_hidden: -1, ;33;mpredictor_kernel: 5, ;33;mpredictor_layers: 5, ;33;mprenet_dropout: 0.5, ;33;mprenet_hidden_size: 256,
;33;mpretrain_fs_ckpt: , ;33;mprocessed_data_dir: xxx, ;33;mprofile_infer: False, ;33;mraw_data_dir: data/raw/neer, ;33;mref_norm_layer: bn,
;33;mrel_pos: True, ;33;mreset_phone_dict: True, ;33;mresidual_channels: 384, ;33;mresidual_layers: 20, ;33;msave_best: False,
;33;msave_ckpt: True, ;33;msave_codes: ['configs', 'modules', 'src', 'utils'], ;33;msave_f0: True, ;33;msave_gt: False, ;33;mschedule_type: linear,
;33;mseed: 1234, ;33;msort_by_len: True, ;33;mspeaker_id: neer, ;33;mspec_max: [-0.07976219058036804, 0.3064012825489044, 0.45079874992370605, 0.48896849155426025, 0.38102585077285767, 0.5545408129692078, 0.6556591391563416, 0.5011460781097412, 0.7585625052452087, 0.7933887243270874, 0.7276718020439148, 0.6568117141723633, 0.8160334825515747, 0.7098748087882996, 0.7070586681365967, 0.9631615281105042, 0.8693066835403442, 0.8992214202880859, 0.8334618210792542, 0.9382892847061157, 0.761588454246521, 1.0139938592910767, 0.8147022128105164, 0.8377708196640015, 0.8404781818389893, 0.5279245376586914, 0.7715780735015869, 0.5754967331886292, 0.19373822212219238, 0.11457031220197678, -0.048836078494787216, 0.2835775315761566, 0.1506994366645813, -0.016768964007496834, 0.07266628742218018, -0.05616551637649536, -0.010572524741292, 0.1133032739162445, 0.16342110931873322, 0.035064052790403366, 0.3116454482078552, 0.16785651445388794, 0.1354154646396637, 0.36229264736175537, 0.372775673866272, -0.10152062773704529, 0.22035335004329681, 0.183604434132576, 0.04665748029947281, 0.23221279680728912, 0.21843412518501282, 0.049887944012880325, -0.05100967362523079, -0.0010432127164676785, -0.06516791135072708, 0.07901491224765778, -0.18570756912231445, -0.14707334339618683, -0.11538795381784439, -0.1341129094362259, -0.15978987514972687, -0.18778416514396667, -0.2038293480873108, -0.25516536831855774, -0.24493663012981415, -0.15004149079322815, -0.016140246763825417, -0.07177135348320007, -0.3963303565979004, -0.3779948353767395, -0.25783461332321167, -0.16094177961349487, -0.23505426943302155, -0.3541640043258667, -0.34247317910194397, -0.3881177306175232, -0.4593522846698761, -0.5756832957267761, -0.35765165090560913, -0.542741060256958, -0.4082295298576355, -0.4770561158657074, -0.17004281282424927, -0.27877169847488403, -0.15326324105262756, -0.4180527925491333, -0.27339401841163635, -0.23254677653312683, -0.29365968704223633, -0.33521631360054016, -0.3491170406341553, 
-0.18533602356910706, -0.29260891675949097, -0.44137561321258545, -0.6128101944923401, -0.731763482093811, -0.6580878496170044, -0.11427026987075806, -0.3944733738899231, -0.6505616903305054, -0.6488122344017029, -0.7484522461891174, -0.7040322422981262, -0.6145080924034119, -0.531133770942688, -0.5737754702568054, -0.6910640597343445, -0.6721180081367493, -0.8550227284431458, -0.7104114294052124, -0.6984644532203674, -0.8648133277893066, -1.0164130926132202, -1.0275567770004272, -1.1420173645019531, -1.068782925605774, -1.244425654411316, -1.302030086517334, -1.5661638975143433, -1.639020562171936, -1.697121500968933, -1.9838589429855347, -2.2957139015197754, -2.2596089839935303, -2.119849443435669, -2.2869279384613037, -2.358459711074829, -2.3582921028137207], ;33;mspec_min: [-4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, 
-4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102, -4.999994277954102],
;33;mspk_cond_steps: [], ;33;mstop_token_weight: 5.0, ;33;mtask_cls: training.task.SVC_task.SVCTask, ;33;mtest_ids: [], ;33;mtest_input_dir: ,
;33;mtest_num: 0, ;33;mtest_prefixes: ['test'], ;33;mtest_set_name: test, ;33;mtimesteps: 1000, ;33;mtrain_set_name: train,
;33;muse_crepe: True, ;33;muse_denoise: False, ;33;muse_energy_embed: False, ;33;muse_gt_dur: False, ;33;muse_gt_f0: False,
;33;muse_midi: False, ;33;muse_nsf: True, ;33;muse_pitch_embed: True, ;33;muse_pos_embed: True, ;33;muse_spk_embed: False,
;33;muse_spk_id: False, ;33;muse_split_spk_id: False, ;33;muse_uv: False, ;33;muse_var_enc: False, ;33;muse_vec: False,
;33;mval_check_interval: 1000, ;33;mvalid_num: 0, ;33;mvalid_set_name: valid, ;33;mvalidate: False, ;33;mvocoder: network.vocoders.nsf_hifigan.NsfHifiGAN,
;33;mvocoder_ckpt: checkpoints/nsf_hifigan/model, ;33;mwarmup_updates: 2000, ;33;mwav2spec_eps: 1e-6, ;33;mweight_decay: 0, ;33;mwin_size: 2048,
;33;mwork_dir: checkpoints/neer,
| Mel losses: {'ssim': 0.5, 'l1': 0.5}
| Load HifiGAN: checkpoints/nsf_hifigan/model
Removing weight norm...
12/14 10:52:47 PM gpu available: True, used: True
Traceback (most recent call last):
  File "run.py", line 15, in <module>
    run_task()
  File "run.py", line 11, in run_task
    task_cls.start()
  File "/content/diff-svc/training/task/base_task.py", line 234, in start
    trainer.fit(task)
  File "/content/diff-svc/utils/pl_utils.py", line 487, in fit
    model.model = model.build_model()
  File "/content/diff-svc/training/task/fs2.py", line 75, in build_model
    self.load_ckpt(hparams['load_ckpt'], strict=True)
  File "/content/diff-svc/training/task/base_task.py", line 84, in load_ckpt
    utils.load_ckpt(self.__getattr__(current_model_name), ckpt_base_dir, current_model_name, force, strict)
  File "/content/diff-svc/utils/__init__.py", line 202, in load_ckpt
    cur_model.load_state_dict(state_dict, strict=strict)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GaussianDiffusion:
Unexpected key(s) in state_dict: "fs2.encoder.layers.0.op.layer_norm1.weight", "fs2.encoder.layers.0.op.layer_norm1.bias", "fs2.encoder.layers.0.op.self_attn.in_proj_weight", "fs2.encoder.layers.0.op.self_attn.out_proj.weight", "fs2.encoder.layers.0.op.layer_norm2.weight", "fs2.encoder.layers.0.op.layer_norm2.bias", "fs2.encoder.layers.0.op.ffn.ffn_1.weight", "fs2.encoder.layers.0.op.ffn.ffn_1.bias", "fs2.encoder.layers.0.op.ffn.ffn_2.weight", "fs2.encoder.layers.0.op.ffn.ffn_2.bias", "fs2.encoder.layers.1.op.layer_norm1.weight", "fs2.encoder.layers.1.op.layer_norm1.bias", "fs2.encoder.layers.1.op.self_attn.in_proj_weight", "fs2.encoder.layers.1.op.self_attn.out_proj.weight", "fs2.encoder.layers.1.op.layer_norm2.weight", "fs2.encoder.layers.1.op.layer_norm2.bias", "fs2.encoder.layers.1.op.ffn.ffn_1.weight", "fs2.encoder.layers.1.op.ffn.ffn_1.bias", "fs2.encoder.layers.1.op.ffn.ffn_2.weight", "fs2.encoder.layers.1.op.ffn.ffn_2.bias", "fs2.encoder.layers.2.op.layer_norm1.weight", "fs2.encoder.layers.2.op.layer_norm1.bias", "fs2.encoder.layers.2.op.self_attn.in_proj_weight", "fs2.encoder.layers.2.op.self_attn.out_proj.weight", "fs2.encoder.layers.2.op.layer_norm2.weight", "fs2.encoder.layers.2.op.layer_norm2.bias", "fs2.encoder.layers.2.op.ffn.ffn_1.weight", "fs2.encoder.layers.2.op.ffn.ffn_1.bias", "fs2.encoder.layers.2.op.ffn.ffn_2.weight", "fs2.encoder.layers.2.op.ffn.ffn_2.bias", "fs2.encoder.layers.3.op.layer_norm1.weight", "fs2.encoder.layers.3.op.layer_norm1.bias", "fs2.encoder.layers.3.op.self_attn.in_proj_weight", "fs2.encoder.layers.3.op.self_attn.out_proj.weight", "fs2.encoder.layers.3.op.layer_norm2.weight", "fs2.encoder.layers.3.op.layer_norm2.bias", "fs2.encoder.layers.3.op.ffn.ffn_1.weight", "fs2.encoder.layers.3.op.ffn.ffn_1.bias", "fs2.encoder.layers.3.op.ffn.ffn_2.weight", "fs2.encoder.layers.3.op.ffn.ffn_2.bias", "fs2.encoder.layer_norm.weight", "fs2.encoder.layer_norm.bias", "fs2.decoder.pos_embed_alpha", 
"fs2.decoder.embed_positions._float_tensor", "fs2.decoder.layers.0.op.layer_norm1.weight", "fs2.decoder.layers.0.op.layer_norm1.bias", "fs2.decoder.layers.0.op.self_attn.in_proj_weight", "fs2.decoder.layers.0.op.self_attn.out_proj.weight", "fs2.decoder.layers.0.op.layer_norm2.weight", "fs2.decoder.layers.0.op.layer_norm2.bias", "fs2.decoder.layers.0.op.ffn.ffn_1.weight", "fs2.decoder.layers.0.op.ffn.ffn_1.bias", "fs2.decoder.layers.0.op.ffn.ffn_2.weight", "fs2.decoder.layers.0.op.ffn.ffn_2.bias", "fs2.decoder.layers.1.op.layer_norm1.weight", "fs2.decoder.layers.1.op.layer_norm1.bias", "fs2.decoder.layers.1.op.self_attn.in_proj_weight", "fs2.decoder.layers.1.op.self_attn.out_proj.weight", "fs2.decoder.layers.1.op.layer_norm2.weight", "fs2.decoder.layers.1.op.layer_norm2.bias", "fs2.decoder.layers.1.op.ffn.ffn_1.weight", "fs2.decoder.layers.1.op.ffn.ffn_1.bias", "fs2.decoder.layers.1.op.ffn.ffn_2.weight", "fs2.decoder.layers.1.op.ffn.ffn_2.bias", "fs2.decoder.layers.2.op.layer_norm1.weight", "fs2.decoder.layers.2.op.layer_norm1.bias", "fs2.decoder.layers.2.op.self_attn.in_proj_weight", "fs2.decoder.layers.2.op.self_attn.out_proj.weight", "fs2.decoder.layers.2.op.layer_norm2.weight", "fs2.decoder.layers.2.op.layer_norm2.bias", "fs2.decoder.layers.2.op.ffn.ffn_1.weight", "fs2.decoder.layers.2.op.ffn.ffn_1.bias", "fs2.decoder.layers.2.op.ffn.ffn_2.weight", "fs2.decoder.layers.2.op.ffn.ffn_2.bias", "fs2.decoder.layers.3.op.layer_norm1.weight", "fs2.decoder.layers.3.op.layer_norm1.bias", "fs2.decoder.layers.3.op.self_attn.in_proj_weight", "fs2.decoder.layers.3.op.self_attn.out_proj.weight", "fs2.decoder.layers.3.op.layer_norm2.weight", "fs2.decoder.layers.3.op.layer_norm2.bias", "fs2.decoder.layers.3.op.ffn.ffn_1.weight", "fs2.decoder.layers.3.op.ffn.ffn_1.bias", "fs2.decoder.layers.3.op.ffn.ffn_2.weight", "fs2.decoder.layers.3.op.ffn.ffn_2.bias", "fs2.decoder.layer_norm.weight", "fs2.decoder.layer_norm.bias".
size mismatch for spec_min: copying a param with shape torch.Size([1, 1, 80]) from checkpoint, the shape in current model is torch.Size([1, 1, 128]).
size mismatch for spec_max: copying a param with shape torch.Size([1, 1, 80]) from checkpoint, the shape in current model is torch.Size([1, 1, 128]).
size mismatch for denoise_fn.input_projection.weight: copying a param with shape torch.Size([256, 80, 1]) from checkpoint, the shape in current model is torch.Size([384, 128, 1]).
size mismatch for denoise_fn.input_projection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for denoise_fn.mlp.0.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for denoise_fn.mlp.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for denoise_fn.mlp.2.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for denoise_fn.mlp.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for denoise_fn.residual_layers.0.dilated_conv.weight: copying a param with shape torch.Size([512, 256, 3]) from checkpoint, the shape in current model is torch.Size([768, 384, 3]).
size mismatch for denoise_fn.residual_layers.0.dilated_conv.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
... blah blah blah more of the same
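The error above mixes two separate problems: keys that the checkpoint has but the model does not build (every `fs2.*` entry, which looks consistent with `no_fs2: True` in the hparams dump), and tensors whose shapes differ (80 vs 128 mel bins, 256 vs 384 channels). Below is a minimal sketch that lists both kinds in one pass. It assumes you have already turned each state dict into a mapping of parameter name to shape tuple. For a PyTorch module that is `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. For the checkpoint, it assumes `torch.load(path, map_location="cpu")` returns a dict with the weights under a `"state_dict"` key, as Lightning-style checkpoints usually do; check `.keys()` to confirm. The helper name `diff_shapes` is hypothetical and not part of diff-svc.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_shapes(
    ckpt: Dict[str, Shape], model: Dict[str, Shape]
) -> Tuple[List[str], List[str], Dict[str, Tuple[Shape, Shape]]]:
    """Compare checkpoint and model parameter shapes (hypothetical helper).

    Returns (unexpected, missing, mismatched):
      unexpected: keys present in the checkpoint but not in the model
      missing:    keys the model expects but the checkpoint lacks
      mismatched: key -> (checkpoint shape, model shape) where both exist
    """
    unexpected = sorted(k for k in ckpt if k not in model)
    missing = sorted(k for k in model if k not in ckpt)
    mismatched = {
        k: (ckpt[k], model[k])
        for k in sorted(ckpt)
        if k in model and ckpt[k] != model[k]
    }
    return unexpected, missing, mismatched


if __name__ == "__main__":
    # Truncated example built from the error message above.
    ckpt = {
        "spec_min": (1, 1, 80),
        "denoise_fn.input_projection.weight": (256, 80, 1),
        # Placeholder shape: the error does not print shapes for fs2.* keys.
        "fs2.encoder.layer_norm.weight": (256,),
    }
    model = {
        "spec_min": (1, 1, 128),
        "denoise_fn.input_projection.weight": (384, 128, 1),
    }
    unexpected, missing, mismatched = diff_shapes(ckpt, model)
    print(unexpected)  # ['fs2.encoder.layer_norm.weight']
    print(missing)     # []
    for key, (old, new) in mismatched.items():
        print(f"{key}: checkpoint {old} vs model {new}")
```

Unexpected keys alone would only fail because `load_ckpt` is called with `strict=True`. The size mismatches fail regardless of `strict`, so they are the part that actually needs fixing.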
If you are using Colab for training, it is best to ask the author of the Colab notebook, because I am not sure what modifications it makes to the source code. Alternatively, you can ask in the Discord channel linked on the homepage, where most Colab notebook authors should be present.
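The shapes in the traceback point to a likely cause. The checkpoint's tensors are sized for 80 mel bins (`spec_min` is `[1, 1, 80]`) and 256 residual channels (`denoise_fn.input_projection.weight` is `[256, 80, 1]`). The hparams dump asks for `audio_num_mel_bins: 128` and `residual_channels: 384`. So `pretrain/nehito.ckpt` appears to have been trained with a different config than `training/config_nsf.yaml`. The sketch below infers those two values from the checkpoint shapes and flags disagreements with the config. The key-to-hparam mapping in `SHAPE_SOURCES` is an assumption read off this particular error, not something diff-svc documents, and `check_config` is a hypothetical helper.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]

# Assumed mapping: hparam name -> (checkpoint key, dimension that encodes it),
# read off the size-mismatch lines in the traceback.
SHAPE_SOURCES: Dict[str, Tuple[str, int]] = {
    "audio_num_mel_bins": ("spec_min", -1),
    "residual_channels": ("denoise_fn.input_projection.weight", 0),
}


def check_config(
    shapes: Dict[str, Shape], hparams: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return hparam -> (value implied by checkpoint, value in config) for each conflict."""
    conflicts: Dict[str, Tuple[int, int]] = {}
    for name, (key, dim) in SHAPE_SOURCES.items():
        if key not in shapes or name not in hparams:
            continue  # nothing to compare for this hparam
        implied = shapes[key][dim]
        if implied != hparams[name]:
            conflicts[name] = (implied, hparams[name])
    return conflicts


if __name__ == "__main__":
    ckpt_shapes = {
        "spec_min": (1, 1, 80),
        "denoise_fn.input_projection.weight": (256, 80, 1),
    }
    # Values taken from the hparams dump in the log.
    config = {"audio_num_mel_bins": 128, "residual_channels": 384}
    for name, (implied, configured) in check_config(ckpt_shapes, config).items():
        print(f"{name}: checkpoint implies {implied}, config has {configured}")
```

If anything is reported, the checkpoint and the config do not describe the same network. Either use the config that the checkpoint was trained with, or point `load_ckpt` at a checkpoint that matches this config. Which of the two the notebook intends is a question for its author.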