Hi,
With your debugging help in the other issues, I was able to get to the last step.
I have three questions about this last step:
If I use a single 2080 Ti here (as you set gpus='0'), I get an OOM allocation error, so I assigned three 2080 Tis instead. Is this an acceptable approach? You did not seem to allow multi-GPU allocation for computing the geometry buffers. Also, should I consider using imh=256 instead of 512 to reduce memory usage?
The error message is as follows:
tensorflow.python.framework.errors_impl.ResourceExhaustedError:
OOM when allocating tensor with shape[68361728,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Mul]
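For a sense of scale, here is a quick back-of-the-envelope calculation (a sketch, not from the repo) of what that failing allocation costs, and why halving imh helps: buffers tied to image resolution shrink roughly 4x when the side length is halved.

```python
# Rough memory math for the failing allocation: a float32 tensor of
# shape [68361728, 3] alone needs ~0.76 GiB, and several temporaries
# of this size can coexist, which easily exhausts an 11 GB 2080 Ti.
def tensor_gib(shape, bytes_per_elem=4):
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / 1024 ** 3

full = tensor_gib([68361728, 3])          # at imh=512
# Halving imh to 256 quarters the pixel count, so resolution-bound
# intermediate buffers shrink by ~4x as well.
quarter = tensor_gib([68361728 // 4, 3])

print(round(full, 2), round(quarter, 2))  # → 0.76 0.19
```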
What I initially did was copy the whole script (steps 1, 2, and 3) of the final step and run it with $ bash ./script.sh. However, this caused an error saying that it cannot find the ckpt-2 and ckpt-10 files, which should already exist by that point. So I split it into three separate scripts and was able to get through the shape pre-training and joint optimization. I hope my way of running things did not cause the TensorFlow warning below:
The calling iterator did not fully read the dataset being cached.
In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded.
This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`.
You should use `dataset.take(k).cache().repeat()` instead.
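For reference, the ordering matters because cache() only finalizes its snapshot once the upstream iterator is fully consumed. A pure-Python sketch of the two orderings (no TensorFlow needed; source/take are hypothetical stand-ins for the tf.data ops):

```python
# With dataset.cache().take(k), take(k) stops the iterator early, so
# the partial cache is discarded (the warning above) and the source is
# re-read every epoch. With dataset.take(k).cache(), the truncated
# stream IS fully consumed, so the k elements are cached once.

reads = {"count": 0}

def source():
    # Stand-in for the uncached dataset; counts how often it is read.
    for i in range(10):
        reads["count"] += 1
        yield i

def take(it, k):
    # Yield at most k elements from the iterator, like Dataset.take.
    for _, x in zip(range(k), it):
        yield x

# dataset.take(k).cache().repeat()-style: materialize the truncated
# stream once, then replay the cached copy for every "epoch".
cached = list(take(source(), 3))
epochs = [cached for _ in range(4)]

print(reads["count"])  # the source was read only 3 times in total
```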
I am getting the following error in the very last step and cannot complete your hotdog example ("Simultaneous Relighting and View Synthesis (testing)"):
[test] Restoring trained model
[models/base] Trainable layers registered:
['net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[models/base] Trainable layers registered:
['net_brdf_mlp_layer0', 'net_brdf_mlp_layer1', 'net_brdf_mlp_layer2', 'net_brdf_mlp_layer3', 'net_brdf_out_layer0']
[models/base] Trainable layers registered:
['net_albedo_mlp_layer0', 'net_albedo_mlp_layer1', 'net_albedo_mlp_layer2', 'net_albedo_mlp_layer3', 'net_albedo_out_layer0', 'net_brdf_z_mlp_layer0', 'net_brdf_z_mlp_layer1', 'net_brdf_z_mlp_layer2', 'net_brdf_z_mlp_layer3', 'net_brdf_z_out_layer0', 'net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[test] Running inference
Inferring Views: 0%| | 0/200 [00:00<?, ?it/s]
2021-09-14 01:46:33.905210: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-09-14 01:47:05.401366: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
Inferring Views: 0%| | 0/200 [02:22<?, ?it/s]
Traceback (most recent call last):
File "/home/jiwonchoi/code/nerfactor/nerfactor/test.py", line 209, in <module>
app.run(main)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/jiwonchoi/code/nerfactor/nerfactor/test.py", line 192, in main
brdf_z_override=brdf_z_override)
File "/home/jiwonchoi/code/nerfactor/nerfactor/models/nerfactor.py", line 266, in call
relight_probes=relight_probes)
File "/home/jiwonchoi/code/nerfactor/nerfactor/models/nerfactor.py", line 362, in _render
rgb_probes = tf.concat([x[:, None, :] for x in rgb_probes], axis=1)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1606, in concat
return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1181, in concat_v2
_ops.raise_from_not_ok_status(e, name)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: OpKernel 'ConcatV2' has constraint on attr 'T' not in NodeDef '[N=0, Tidx=DT_INT32]', KernelDef: 'op: "ConcatV2" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "values" host_memory_arg: "axis" host_memory_arg: "output"' [Op:ConcatV2] name: concat
It seems that this line causes the issue: rgb_probes = tf.concat([x[:, None, :] for x in rgb_probes], axis=1). I am not sure how to debug this.
Thank you in advance.
It seems that you haven't downloaded the light probes. You can download them from the author's project page, and then set the 'test_envmap_dir' term in lr5e-3.ini.
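This also matches the error text: the [N=0] in the ConcatV2 message means tf.concat received an empty list, i.e. rgb_probes has zero entries because no probes were loaded. A sketch of the failure mode and a guard (using NumPy as a stand-in for tf.concat; concat_probes is a hypothetical helper, not code from the repo):

```python
import numpy as np

def concat_probes(rgb_probes):
    # Hypothetical guard: raise a clear error instead of the cryptic
    # kernel-level ConcatV2 one when the probe list is empty.
    if not rgb_probes:
        raise ValueError(
            "rgb_probes is empty -- check that the light probes were "
            "downloaded and that test_envmap_dir points at them")
    # Same stacking pattern as the failing line: add a probe axis,
    # then concatenate along it.
    return np.concatenate([x[:, None, :] for x in rgb_probes], axis=1)

# Two dummy (n_rays, 3) renderings, one per light probe.
probes = [np.zeros((4, 3)), np.ones((4, 3))]
out = concat_probes(probes)
print(out.shape)  # → (4, 2, 3): rays x probes x RGB
```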