error in training #59

Open
yejr0229 opened this issue Oct 31, 2023 · 1 comment

Comments

@yejr0229

When I train the model on a single 1080 Ti, I encounter this error:

Traceback (most recent call last):
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/hydra/_internal/utils.py", line 252, in run_and_report
assert mdl is not None
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/yejr/Digital_Avater/vid2avatar-main/code/train.py", line 45, in <module>
main()
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/hydra/main.py", line 52, in decorated_main
config_name=config_name,
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/hydra/_internal/utils.py", line 378, in _run_hydra
lambda: hydra.run(
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/hydra/_internal/utils.py", line 294, in run_and_report
raise ex
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/hydra/_internal/utils.py", line 381, in <lambda>
overrides=args.overrides,
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "/home/yejr/Digital_Avater/vid2avatar-main/code/train.py", line 41, in main
trainer.fit(model, trainset, validset)
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 739, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 683, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 773, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1195, in _run
self._dispatch()
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1275, in _dispatch
self.training_type_plugin.start_training(self)
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1285, in run_stage
return self._run_train()
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1315, in _run_train
self.fit_loop.run()
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
self.epoch_loop.run(data_fetcher)
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 151, in run
output = self.on_run_end()
File "/media/data4/yejr/conda_env/v2a/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 286, in on_run_end
epoch_end_outputs = model.training_epoch_end(epoch_end_outputs)
File "/home/yejr/Digital_Avater/vid2avatar-main/code/v2a_model.py", line 86, in training_epoch_end
mesh_canonical = generate_mesh(lambda x: self.query_oc(x, cond), self.model.smpl_server.verts_c[0], point_batch=10000, res_up=2)
File "/home/yejr/Digital_Avater/vid2avatar-main/code/lib/utils/meshing.py", line 38, in generate_mesh
value_grid = mesh_extractor.to_dense()
File "lib/libmise/mise.pyx", line 164, in lib.libmise.mise.MISE.to_dense
assert(not isnan(out_view[i, j, k]))
AssertionError

Can you tell me how to deal with it? Thank you so much.

@MoyGcc
Owner

MoyGcc commented Nov 28, 2023

Hi, I think it's a general training error and not related to the GPU you used. Did you use the provided example training data?
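
The last frame of the traceback fails on `assert(not isnan(out_view[i, j, k]))` inside `MISE.to_dense`, so at least one value in the dense grid was NaN when the mesh was extracted. One plausible cause (an assumption, not confirmed in this thread) is that the query function passed to `generate_mesh` returned NaN, for example after training diverged. A minimal sketch of how one might catch that earlier and see which point triggered it is shown below. The names `checked_query` and `toy_query` are hypothetical and not part of vid2avatar; the sketch uses plain Python floats, whereas the real `query_oc` works on torch tensors, where `torch.isnan` would play the role of `math.isnan`.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
QueryFn = Callable[[Sequence[Point]], List[float]]


def checked_query(query: QueryFn) -> QueryFn:
    """Wrap a point-query function so that NaN outputs fail with a clear message.

    Hypothetical helper for illustration; not part of vid2avatar or libmise.
    """

    def wrapper(points: Sequence[Point]) -> List[float]:
        values = query(points)
        for point, value in zip(points, values):
            if math.isnan(value):
                # Fail here, with the offending point, instead of later
                # inside the bare assertion in MISE.to_dense.
                raise ValueError(f"query returned NaN at point {point}")
        return values

    return wrapper


def toy_query(points: Sequence[Point]) -> List[float]:
    """Hypothetical stand-in for a learned occupancy/SDF query.

    Returns the signed distance to the unit sphere, and deliberately
    returns NaN at the origin to simulate a numerical failure.
    """
    out = []
    for x, y, z in points:
        norm = math.sqrt(x * x + y * y + z * z)
        out.append(float("nan") if norm == 0.0 else norm - 1.0)
    return out


safe_query = checked_query(toy_query)
print(safe_query([(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
```

If a wrapper like this reports NaN on the provided example data as well, that would suggest the problem lies in the training state rather than in the input data; if it only happens on custom data, the preprocessing of that data would be the first thing to inspect.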
