Python Notebook fails on training #37

Open

jamierpond opened this issue Dec 14, 2022 · 1 comment

@jamierpond
Hi! Just trying to run a simple demo. I'm currently following all the demos/examples/other people's tutorials and I have everything set up the same as they do, but I keep getting errors. I'm wondering if you would please point me in the right direction?

/content/diff-svc
| Hparams chains:  ['/content/diff-svc/training/config_nsf.yaml']
| Hparams: 
K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 128, audio_sample_rate: 44100, binarization_args: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False},
binarizer_cls: preprocessing.SVCpre.SVCBinarizer, binary_data_dir: data/binary/neer, check_val_every_n_epoch: 10, choose_test_manually: False, clip_grad_norm: 1,
config_path: training/config_nsf.yaml, content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2,
cwt_loss: l1, cwt_std_scale: 0.8, datasets: ['opencpop'], debug: False, dec_ffn_kernel_size: 9,
dec_layers: 4, decay_steps: 20000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet,
diff_loss_type: l2, dilation_cycle_length: 4, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'],
dur_loss: mse, dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4,
encoder_K: 8, encoder_type: fft, endless_ds: True, f0_bin: 256, f0_max: 1100.0,
f0_min: 40.0, ffn_act: gelu, ffn_padding: SAME, fft_size: 2048, fmax: 16000,
fmin: 40, fs2_ckpt: , gaussian_start: True, gen_dir_name: , gen_tgt_spk_id: -1,
hidden_size: 256, hop_size: 512, hubert_gpu: True, hubert_path: checkpoints/hubert/hubert_soft.pt, infer: False,
keep_bins: 128, lambda_commit: 0.25, lambda_energy: 0.0, lambda_f0: 1.0, lambda_ph_dur: 0.3,
lambda_sent_dur: 1.0, lambda_uv: 1.0, lambda_word_dur: 1.0, load_ckpt: /content/diff-svc/pretrain/nehito.ckpt, log_interval: 100,
loud_norm: False, lr: 0.0008, max_beta: 0.02, max_epochs: 3000, max_eval_sentences: 1,
max_eval_tokens: 60000, max_frames: 42000, max_input_tokens: 60000, max_sentences: 12, max_tokens: 128000,
max_updates: 1000000, mel_loss: ssim:0.5|l1:0.5, mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120,
no_fs2: True, norm_type: gn, num_ckpt_keep: 10, num_heads: 2, num_sanity_val_steps: 1,
num_spk: 1, num_test_samples: 0, num_valid_plots: 10, optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98,
out_wav_norm: False, pe_ckpt: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt, pe_enable: False, perform_enhance: True, pitch_ar: False,
pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l2, pitch_norm: log, pitch_type: frame,
pndm_speedup: 10, pre_align_args: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, predictor_dropout: 0.5, predictor_grad: 0.1,
predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256,
pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, raw_data_dir: data/raw/neer, ref_norm_layer: bn,
rel_pos: True, reset_phone_dict: True, residual_channels: 384, residual_layers: 20, save_best: False,
save_ckpt: True, save_codes: ['configs', 'modules', 'src', 'utils'], save_f0: True, save_gt: False, schedule_type: linear,
seed: 1234, sort_by_len: True, speaker_id: neer, spec_max: [-0.07976219058036804, 0.3064012825489044, 0.45079874992370605, 0.48896849155426025, ... (truncated; 128 values total)], spec_min: [-4.999994277954102, -4.999994277954102, ... (the same value repeated; 128 values total)],
spk_cond_steps: [], stop_token_weight: 5.0, task_cls: training.task.SVC_task.SVCTask, test_ids: [], test_input_dir: ,
test_num: 0, test_prefixes: ['test'], test_set_name: test, timesteps: 1000, train_set_name: train,
use_crepe: True, use_denoise: False, use_energy_embed: False, use_gt_dur: False, use_gt_f0: False,
use_midi: False, use_nsf: True, use_pitch_embed: True, use_pos_embed: True, use_spk_embed: False,
use_spk_id: False, use_split_spk_id: False, use_uv: False, use_var_enc: False, use_vec: False,
val_check_interval: 1000, valid_num: 0, valid_set_name: valid, validate: False, vocoder: network.vocoders.nsf_hifigan.NsfHifiGAN,
vocoder_ckpt: checkpoints/nsf_hifigan/model, warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 2048,
work_dir: checkpoints/neer,
| Mel losses: {'ssim': 0.5, 'l1': 0.5}
| Load HifiGAN:  checkpoints/nsf_hifigan/model
Removing weight norm...
12/14 10:52:47 PM gpu available: True, used: True
Traceback (most recent call last):
  File "run.py", line 15, in <module>
    run_task()
  File "run.py", line 11, in run_task
    task_cls.start()
  File "/content/diff-svc/training/task/base_task.py", line 234, in start
    trainer.fit(task)
  File "/content/diff-svc/utils/pl_utils.py", line 487, in fit
    model.model = model.build_model()
  File "/content/diff-svc/training/task/fs2.py", line 75, in build_model
    self.load_ckpt(hparams['load_ckpt'], strict=True)
  File "/content/diff-svc/training/task/base_task.py", line 84, in load_ckpt
    utils.load_ckpt(self.__getattr__(current_model_name), ckpt_base_dir, current_model_name, force, strict)
  File "/content/diff-svc/utils/__init__.py", line 202, in load_ckpt
    cur_model.load_state_dict(state_dict, strict=strict)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GaussianDiffusion:
	Unexpected key(s) in state_dict: "fs2.encoder.layers.0.op.layer_norm1.weight", "fs2.encoder.layers.0.op.layer_norm1.bias", "fs2.encoder.layers.0.op.self_attn.in_proj_weight", "fs2.encoder.layers.0.op.self_attn.out_proj.weight", "fs2.encoder.layers.0.op.layer_norm2.weight", "fs2.encoder.layers.0.op.layer_norm2.bias", "fs2.encoder.layers.0.op.ffn.ffn_1.weight", "fs2.encoder.layers.0.op.ffn.ffn_1.bias", "fs2.encoder.layers.0.op.ffn.ffn_2.weight", "fs2.encoder.layers.0.op.ffn.ffn_2.bias", "fs2.encoder.layers.1.op.layer_norm1.weight", "fs2.encoder.layers.1.op.layer_norm1.bias", "fs2.encoder.layers.1.op.self_attn.in_proj_weight", "fs2.encoder.layers.1.op.self_attn.out_proj.weight", "fs2.encoder.layers.1.op.layer_norm2.weight", "fs2.encoder.layers.1.op.layer_norm2.bias", "fs2.encoder.layers.1.op.ffn.ffn_1.weight", "fs2.encoder.layers.1.op.ffn.ffn_1.bias", "fs2.encoder.layers.1.op.ffn.ffn_2.weight", "fs2.encoder.layers.1.op.ffn.ffn_2.bias", "fs2.encoder.layers.2.op.layer_norm1.weight", "fs2.encoder.layers.2.op.layer_norm1.bias", "fs2.encoder.layers.2.op.self_attn.in_proj_weight", "fs2.encoder.layers.2.op.self_attn.out_proj.weight", "fs2.encoder.layers.2.op.layer_norm2.weight", "fs2.encoder.layers.2.op.layer_norm2.bias", "fs2.encoder.layers.2.op.ffn.ffn_1.weight", "fs2.encoder.layers.2.op.ffn.ffn_1.bias", "fs2.encoder.layers.2.op.ffn.ffn_2.weight", "fs2.encoder.layers.2.op.ffn.ffn_2.bias", "fs2.encoder.layers.3.op.layer_norm1.weight", "fs2.encoder.layers.3.op.layer_norm1.bias", "fs2.encoder.layers.3.op.self_attn.in_proj_weight", "fs2.encoder.layers.3.op.self_attn.out_proj.weight", "fs2.encoder.layers.3.op.layer_norm2.weight", "fs2.encoder.layers.3.op.layer_norm2.bias", "fs2.encoder.layers.3.op.ffn.ffn_1.weight", "fs2.encoder.layers.3.op.ffn.ffn_1.bias", "fs2.encoder.layers.3.op.ffn.ffn_2.weight", "fs2.encoder.layers.3.op.ffn.ffn_2.bias", "fs2.encoder.layer_norm.weight", "fs2.encoder.layer_norm.bias", "fs2.decoder.pos_embed_alpha", "fs2.decoder.embed_positions._float_tensor", "fs2.decoder.layers.0.op.layer_norm1.weight", "fs2.decoder.layers.0.op.layer_norm1.bias", "fs2.decoder.layers.0.op.self_attn.in_proj_weight", "fs2.decoder.layers.0.op.self_attn.out_proj.weight", "fs2.decoder.layers.0.op.layer_norm2.weight", "fs2.decoder.layers.0.op.layer_norm2.bias", "fs2.decoder.layers.0.op.ffn.ffn_1.weight", "fs2.decoder.layers.0.op.ffn.ffn_1.bias", "fs2.decoder.layers.0.op.ffn.ffn_2.weight", "fs2.decoder.layers.0.op.ffn.ffn_2.bias", "fs2.decoder.layers.1.op.layer_norm1.weight", "fs2.decoder.layers.1.op.layer_norm1.bias", "fs2.decoder.layers.1.op.self_attn.in_proj_weight", "fs2.decoder.layers.1.op.self_attn.out_proj.weight", "fs2.decoder.layers.1.op.layer_norm2.weight", "fs2.decoder.layers.1.op.layer_norm2.bias", "fs2.decoder.layers.1.op.ffn.ffn_1.weight", "fs2.decoder.layers.1.op.ffn.ffn_1.bias", "fs2.decoder.layers.1.op.ffn.ffn_2.weight", "fs2.decoder.layers.1.op.ffn.ffn_2.bias", "fs2.decoder.layers.2.op.layer_norm1.weight", "fs2.decoder.layers.2.op.layer_norm1.bias", "fs2.decoder.layers.2.op.self_attn.in_proj_weight", "fs2.decoder.layers.2.op.self_attn.out_proj.weight", "fs2.decoder.layers.2.op.layer_norm2.weight", "fs2.decoder.layers.2.op.layer_norm2.bias", "fs2.decoder.layers.2.op.ffn.ffn_1.weight", "fs2.decoder.layers.2.op.ffn.ffn_1.bias", "fs2.decoder.layers.2.op.ffn.ffn_2.weight", "fs2.decoder.layers.2.op.ffn.ffn_2.bias", "fs2.decoder.layers.3.op.layer_norm1.weight", "fs2.decoder.layers.3.op.layer_norm1.bias", "fs2.decoder.layers.3.op.self_attn.in_proj_weight", 
"fs2.decoder.layers.3.op.self_attn.out_proj.weight", "fs2.decoder.layers.3.op.layer_norm2.weight", "fs2.decoder.layers.3.op.layer_norm2.bias", "fs2.decoder.layers.3.op.ffn.ffn_1.weight", "fs2.decoder.layers.3.op.ffn.ffn_1.bias", "fs2.decoder.layers.3.op.ffn.ffn_2.weight", "fs2.decoder.layers.3.op.ffn.ffn_2.bias", "fs2.decoder.layer_norm.weight", "fs2.decoder.layer_norm.bias". 
	size mismatch for spec_min: copying a param with shape torch.Size([1, 1, 80]) from checkpoint, the shape in current model is torch.Size([1, 1, 128]).
	size mismatch for spec_max: copying a param with shape torch.Size([1, 1, 80]) from checkpoint, the shape in current model is torch.Size([1, 1, 128]).
	size mismatch for denoise_fn.input_projection.weight: copying a param with shape torch.Size([256, 80, 1]) from checkpoint, the shape in current model is torch.Size([384, 128, 1]).
	size mismatch for denoise_fn.input_projection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for denoise_fn.mlp.0.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
	size mismatch for denoise_fn.mlp.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
	size mismatch for denoise_fn.mlp.2.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
	size mismatch for denoise_fn.mlp.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
	size mismatch for denoise_fn.residual_layers.0.dilated_conv.weight: copying a param with shape torch.Size([512, 256, 3]) from checkpoint, the shape in current model is torch.Size([768, 384, 3]).
	size mismatch for denoise_fn.residual_layers.0.dilated_conv.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).

... (more of the same size-mismatch errors follow for the remaining residual layers)
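
For reference, the mismatch is visible in the checkpoint itself. Below is a minimal diagnostic sketch (the nesting of the weights under a "state_dict" key and the exact key names are assumptions based on the traceback and error message above):

```python
import torch

# Sketch: inspect the pretrained ckpt that load_ckpt points at.
# Assumption: weights are nested under a "state_dict" key (Lightning-style),
# as suggested by utils.load_ckpt in the traceback; fall back to the raw dict.
ckpt = torch.load("/content/diff-svc/pretrain/nehito.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

# Print the tensors named in the RuntimeError to see their saved shapes.
for key, tensor in state_dict.items():
    if any(s in key for s in ("spec_min", "spec_max", "input_projection")):
        print(key, tuple(tensor.shape))
```

Per the error, these come out with 80-mel-bin / 256-channel shapes, while config_nsf.yaml sets audio_num_mel_bins: 128 and residual_channels: 384, so the pretrained base model and the training config simply disagree on dimensions.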
@prophesier (Owner)

If you are training on Colab, it is best to ask the author of the Colab notebook, because I am not sure what modifications the notebook makes to the source code. Alternatively, you can ask in the Discord channel linked on the homepage, where most Colab notebook authors should be present.
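
One possible workaround (a sketch only, not confirmed by the maintainer): either point load_ckpt at a base checkpoint trained with the same 128-mel-bin / 384-channel / 44.1 kHz setup as config_nsf.yaml, or clear it so training starts from scratch, e.g.:

```python
import yaml

# Hypothetical workaround: clear load_ckpt in the training config so the
# incompatible 80-bin pretrained model is not loaded at all. Whether an
# empty load_ckpt actually skips the load depends on diff-svc's check in
# fs2.py (build_model), so verify against your copy of the source.
cfg_path = "/content/diff-svc/training/config_nsf.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["load_ckpt"] = ""  # was: /content/diff-svc/pretrain/nehito.ckpt

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)
```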
