Failure during optimization #5

Closed

ecmjohnson opened this issue Apr 19, 2022 · 4 comments

Comments

@ecmjohnson

Hello, I'm trying to run ViSER on some of my own datasets. Out of my 5 datasets, 2 succeed and 3 fail, all with the same failure:

```
> /HPS/articulated_nerf/work/viser/nnutils/mesh_net.py(809)forward()
-> self.match_loss = (csm_pred - csm_gt).norm(2,1)[mask].mean() * 0.1
(Pdb) 
Traceback (most recent call last):
  File "optimize.py", line 59, in <module>
    app.run(main)
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "optimize.py", line 56, in main
    trainer.train()
  File "/HPS/articulated_nerf/work/viser/nnutils/train_utils.py", line 339, in train
    total_loss,aux_output = self.model(input_batch)
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/HPS/articulated_nerf/work/viser/nnutils/mesh_net.py", line 809, in forward
    self.match_loss = (csm_pred - csm_gt).norm(2,1)[mask].mean() * 0.1
  File "/HPS/articulated_nerf/work/viser/nnutils/mesh_net.py", line 809, in forward
    self.match_loss = (csm_pred - csm_gt).norm(2,1)[mask].mean() * 0.1
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/bdb.py", line 88, in trace_dispatch
    return self.dispatch_line(frame)
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/bdb.py", line 113, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit
Traceback (most recent call last):
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/HPS/articulated_nerf/work/miniconda3/envs/viser/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/HPS/articulated_nerf/work/miniconda3/envs/viser/bin/python', '-u', 'optimize.py', '--local_rank=0', '--name=cactus_full-1003-0', '--checkpoint_dir', 'log', '--n_bones', '21', '--num_epochs', '20', '--dataname', 'cactus_full-init', '--ngpu', '1', '--batch_size', '4', '--seed', '1003']' returned non-zero exit status 1.
Killing subprocess 6097
```

Full error log 1
Full error log 2
Full error log 3

I would tend to assume this is a division by zero on the identified line. Have you encountered this issue before?

I have tried multiple values of init_frame and end_frame for the initial optimization on a subset of frames (which is where the failure occurs), as well as different seed values, but I haven't found any choice of these parameters that lets these datasets avoid the failure.

Any help or insight you can provide would be appreciated.
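
For reference, here is a minimal sketch of what I suspect happens on that line when `mask` selects no pixels (hypothetical shapes and values, not my actual data): the mean over an empty selection is 0/0, which gives NaN.

```python
import torch

# Hypothetical per-pixel matches; in the real code these come from the CSM predictions.
csm_pred = torch.rand(8, 3)
csm_gt = torch.rand(8, 3)
# An all-False mask, e.g. if the rendered and observed silhouettes never overlap.
mask = torch.zeros(8, dtype=torch.bool)

per_pixel = (csm_pred - csm_gt).norm(2, 1)  # shape (8,)
match_loss = per_pixel[mask].mean() * 0.1   # mean over an empty tensor -> nan
print(match_loss)                           # tensor(nan)
```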

@gengshan-y (Owner)

Hello, this seems to be an initialization issue. The rendered mask might not overlap with the observed mask when the principal point is not initialized properly.

Does this solve the problem? #4 (comment)
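
As a quick sanity check (a rough sketch with made-up silhouettes, independent of the ViSER code), you can verify whether the rendered and observed masks share any pixels at all:

```python
import numpy as np

# Hypothetical boolean silhouettes at the same resolution.
rendered_sil = np.zeros((256, 256), dtype=bool)
observed_sil = np.zeros((256, 256), dtype=bool)
rendered_sil[40:120, 40:120] = True    # where the mesh projects under the current camera
observed_sil[150:230, 150:230] = True  # the ground-truth mask

overlap = np.logical_and(rendered_sil, observed_sil).sum()
print(overlap)  # 0 here: no shared pixels, so the matching loss has nothing to average over
```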

@ecmjohnson (Author)

Ah, let me clarify my understanding: the ppx and ppy pixel coordinates are not necessarily the principal point of the camera projection (i.e., typically half the width and half the height, respectively); instead, I should set them to be centered on the object in the start_idx frame of the init optimization. Is that correct?

I had already set ppx and ppy to half the width and half the height, respectively, for my datasets, but it is possible that this point did not overlap the masks in the datasets that failed.
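
For the failing datasets I could instead initialize them from the object mask of the start_idx frame, something like this rough sketch (the mask path and loader here are made up, not taken from the ViSER codebase):

```python
import numpy as np
import imageio.v2 as imageio

# Hypothetical path to the silhouette of the first frame used in the init optimization.
mask = imageio.imread("database/DAVIS/Annotations/Full-Resolution/cactus_full-init/00000.png")
if mask.ndim == 3:
    mask = mask[..., 0]   # keep a single channel if the mask was saved as RGB
mask = mask > 0           # binarize

ys, xs = np.nonzero(mask)
ppx, ppy = xs.mean(), ys.mean()  # object centroid, used as the initial principal point
print(ppx, ppy)
```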

@gengshan-y (Owner)

Your understanding is correct. Let me give more explanation if you are interested -- ppx and ppy are supposed to be the principal point of the camera, but if we initialize them to the correct values, the renderings may not overlap the observed masks because the initial root translation estimate is incorrect, which causes the problem.

I would suggest using the following to avoid tedious manual initialization of ppx, ppy:

Besides passing the principal points in the config file, another option is to pass --cnnpp to optimize.py, which optimizes an image CNN to predict the principal points. In this case, we have a mechanism here to ensure the silhouette rendering and the ground truth overlap.
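
For example, reusing the arguments from the command in your log (only --cnnpp is new; run it through whatever launch script you already use, which adds --local_rank itself):

```
python optimize.py --cnnpp --name=cactus_full-1003-0 --checkpoint_dir log \
    --n_bones 21 --num_epochs 20 --dataname cactus_full-init --ngpu 1 \
    --batch_size 4 --seed 1003
```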

@ecmjohnson (Author)

Ah, excellent! That solves the issue of failing during optimization.

Thanks!
