This repository has been archived by the owner on Nov 11, 2023. It is now read-only.

A flood of "xxx is not in the checkpoint" errors after training starts #20

Closed
owenlius opened this issue Mar 13, 2023 · 5 comments
Labels
question Further information is requested

Comments

@owenlius

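I get the same errors both locally and on Colab, and the environment is fine in both cases. I trained for 10,000 iterations, but the inferred audio is nothing but noise.
Is this a problem with the g_0 and d_0 files? The log shows that they were loaded, though.
Could someone please take a look? Thanks!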
INFO:44k:{'train': {'log_interval': 200, 'eval_interval': 800, 'seed': 1234, 'epochs': 10000, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 6, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 10240, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'use_sr': True, 'max_speclen': 512, 'port': '8001', 'keep_ckpts': 3}, 'data': {'training_files': 'filelists/train.txt', 'validation_files': 'filelists/val.txt', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': 22050}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256, 'ssl_dim': 256, 'n_speakers': 200}, 'spk': {'owen': 0}, 'model_dir': './logs/44k'}
2023-03-13 13:35:44.410924: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-03-13 13:35:45.367774: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-13 13:35:45.367898: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-13 13:35:45.367920: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
DEBUG:h5py.conv:Creating converter from 7 to 5
DEBUG:h5py.conv:Creating converter from 5 to 7
DEBUG:h5py.conv:Creating converter from 7 to 5
DEBUG:h5py.conv:Creating converter from 5 to 7
DEBUG:jaxlib.mlir.mlir_libs:Initializing MLIR with module: site_initialize_0
DEBUG:jaxlib.mlir.mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir.mlir_libs.site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/mlir_libs/site_initialize_0.so'>
DEBUG:jax.src.path:etils.epath found. Using etils.epath for file I/O.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
./logs/44k/G_0.pth
error, emb_g.weight is not in the checkpoint
INFO:44k:emb_g.weight is not in the checkpoint
error, pre.weight is not in the checkpoint
INFO:44k:pre.weight is not in the checkpoint
error, pre.bias is not in the checkpoint
INFO:44k:pre.bias is not in the checkpoint
error, enc_p.proj.weight is not in the checkpoint
INFO:44k:enc_p.proj.weight is not in the checkpoint
error, enc_p.proj.bias is not in the checkpoint
INFO:44k:enc_p.proj.bias is not in the checkpoint
error, enc_p.f0_emb.weight is not in the checkpoint
INFO:44k:enc_p.f0_emb.weight is not in the checkpoint
error, enc_p.enc_.attn_layers.0.emb_rel_k is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.0.emb_rel_k is not in the checkpoint
error, enc_p.enc_.attn_layers.0.emb_rel_v is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.0.emb_rel_v is not in the checkpoint
error, enc_p.enc_.attn_layers.0.conv_q.weight is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.0.conv_q.weight is not in the checkpoint
error, enc_p.enc_.attn_layers.0.conv_q.bias is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.0.conv_q.bias is not in the checkpoint
error, enc_p.enc_.attn_layers.0.conv_k.weight is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.0.conv_k.weight is not in the checkpoint
error, enc_p.enc_.attn_layers.0.conv_k.bias is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.0.conv_k.bias is not in the checkpoint
error, enc_p.enc_.attn_layers.0.conv_v.weight is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.0.conv_v.weight is not in the checkpoint
error, enc_p.enc_.attn_layers.0.conv_v.bias is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.0.conv_v.bias is not in the checkpoint
error, enc_p.enc_.attn_layers.0.conv_o.weight is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.0.conv_o.weight is not in the checkpoint
error, enc_p.enc_.attn_layers.0.conv_o.bias is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.0.conv_o.bias is not in the checkpoint
error, enc_p.enc_.attn_layers.1.emb_rel_k is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.1.emb_rel_k is not in the checkpoint
error, enc_p.enc_.attn_layers.1.emb_rel_v is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.1.emb_rel_v is not in the checkpoint
error, enc_p.enc_.attn_layers.1.conv_q.weight is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.1.conv_q.weight is not in the checkpoint
error, enc_p.enc_.attn_layers.1.conv_q.bias is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.1.conv_q.bias is not in the checkpoint
error, enc_p.enc_.attn_layers.1.conv_k.weight is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.1.conv_k.weight is not in the checkpoint
error, enc_p.enc_.attn_layers.1.conv_k.bias is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.1.conv_k.bias is not in the checkpoint
error, enc_p.enc_.attn_layers.1.conv_v.weight is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.1.conv_v.weight is not in the checkpoint
error, enc_p.enc_.attn_layers.1.conv_v.bias is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.1.conv_v.bias is not in the checkpoint
error, enc_p.enc_.attn_layers.1.conv_o.weight is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.1.conv_o.weight is not in the checkpoint
error, enc_p.enc_.attn_layers.1.conv_o.bias is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.1.conv_o.bias is not in the checkpoint
error, enc_p.enc_.attn_layers.2.emb_rel_k is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.2.emb_rel_k is not in the checkpoint
error, enc_p.enc_.attn_layers.2.emb_rel_v is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.2.emb_rel_v is not in the checkpoint
error, enc_p.enc_.attn_layers.2.conv_q.weight is not in the checkpoint
INFO:44k:enc_p.enc_.attn_layers.2.conv_q.weight is not in the checkpoint
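One possible reading of the log above, offered as an assumption rather than a description of this project's loader: a loader that tolerates missing keys keeps the model's freshly initialized value for every parameter it cannot find in the checkpoint. The file then counts as "loaded", yet little or none of it reaches the model. The sketch below uses plain dictionaries with placeholder strings instead of tensors; `load_with_fallback` and the key `dec.conv.weight` are hypothetical names chosen for illustration.

```python
def load_with_fallback(model_state, saved_state):
    """Merge saved weights into the model state.

    Keys missing from saved_state keep the model's own (initial) value
    and are reported back to the caller.
    """
    merged = {}
    missing = []
    for key, init_value in model_state.items():
        if key in saved_state:
            merged[key] = saved_state[key]
        else:
            missing.append(key)
            merged[key] = init_value
    return merged, missing


# Placeholder values stand in for tensors; only the key names matter here.
model_state = {
    "emb_g.weight": "random init",
    "pre.weight": "random init",
    "dec.conv.weight": "random init",  # hypothetical key
}
saved_state = {
    "dec.conv.weight": "pretrained",
    "old_name.weight": "pretrained",  # hypothetical key from another version
}

merged, missing = load_with_fallback(model_state, saved_state)
for key in missing:
    print(f"{key} is not in the checkpoint")
```

If nearly every key ends up in `missing`, training effectively starts from scratch, whatever the "loaded" message suggests.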

@E-sion

E-sion commented Mar 13, 2023

Same situation here.

@Miuzarte
Contributor

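Try moving the two base model files out of the way and then start training, so that it generates its own randomly initialized checkpoints to train from, and see what happens. I suspect you did not pick the right base models: the 3.0, 4.0, and 4.0v2 models are not interchangeable.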
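A minimal sketch of the "move the base models aside" step, under stated assumptions: the directory `./logs/44k` and the file name `G_0.pth` come from the log in this thread, while `D_0.pth`, the backup directory name, and the helper `move_base_models_aside` are hypothetical.

```python
from pathlib import Path
import shutil


def move_base_models_aside(model_dir, backup_dir, names=("G_0.pth", "D_0.pth")):
    """Move the listed files from model_dir into backup_dir.

    Returns the names that were actually found and moved.
    """
    model_dir, backup_dir = Path(model_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in names:
        src = model_dir / name
        if src.is_file():
            shutil.move(str(src), str(backup_dir / name))
            moved.append(name)
    return moved


# Example call (paths are assumptions, adjust to your setup):
#   move_base_models_aside("./logs/44k", "./logs/44k_base_backup")
```

Moving the files rather than deleting them keeps them available in case they turn out to match a different version of the project.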

Miuzarte added the question label Mar 14, 2023
@NaruseMioShirakana
Contributor

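This happens because the pretrained model you are using, or the model you are trying to resume training from, either is not a model from this project or does not match the version you are running.
The SoVits versions differ greatly from one another, so weights are not interchangeable between them, and neither are pretrained models or trained models.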
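A version mismatch of this kind can be spotted before training by comparing parameter names. The sketch below is an illustration only: `compare_keys` is a hypothetical helper, the key `enc_q.pre.weight` is made up for the example, and the commented PyTorch lines assume the checkpoint stores its weights under a `"model"` entry and that the generator object is called `net_g`.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected, overlap) for two collections of parameter names.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names in the checkpoint that the model does not use
    overlap:    fraction of the model's names found in the checkpoint
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    overlap = len(model_keys & checkpoint_keys) / max(len(model_keys), 1)
    return missing, unexpected, overlap


# With PyTorch, the two key collections could be obtained roughly like this
# (the "model" entry and the name net_g are assumptions):
#   checkpoint_keys = torch.load("logs/44k/G_0.pth", map_location="cpu")["model"].keys()
#   model_keys = net_g.state_dict().keys()

model_keys = ["emb_g.weight", "pre.weight", "pre.bias"]
checkpoint_keys = ["pre.weight", "enc_q.pre.weight"]  # second key is hypothetical
missing, unexpected, overlap = compare_keys(model_keys, checkpoint_keys)
print(missing, unexpected, round(overlap, 2))
```

A low overlap together with a long `missing` list points to a base model from a different version, which matches the diagnosis given in this thread.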

@upright2003

Have you put hubert/checkpoint_best_legacy_500.pt in place?

@owenlius
Author

Found the cause: the base model I was using came from a different version. Thanks, everyone, for the answers!
