Config is not fully applied #6
Comments
Hi @kwea123, great catch! What you say is true for `hidden_size`. On my tests, this implementation ran faster. In comparison, under the same setup, this (awesome!) PyTorch port runs at 5.7 it/s (also with 3 fewer layers than my experiments), and the TensorFlow release (in a different environment) runs at 3.44 it/s, again with three fewer layers. As for not applying `hidden_size` from the config, that is indeed a bug. But yes, many thanks for alerting! I know benchmarking can be quite arbitrary, and relies on several underlying factors (SW, HW, and more). I'll leave this issue open for that reason. (Also, the test cases that compare the TF impl to mine function-by-function are all here :) )
Ok, I modified the config accordingly. There is another problem at these lines:

Lines 164 to 165 in a14357d
You forgot to add `x = xyz` between these lines, so the `x` in the following for loop is not the correct input. After adding that line I can successfully run the code, but in my setup (also an RTX 2080 Ti, PyTorch 1.4.0, CUDA 10.2) I can only get 8.5 it/s (vs. 6.8 it/s for the other PyTorch impl and 5.64 it/s for the original repo). Anyway, thanks for the contribution, and I hope you can correct these bugs here and there in the future.
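Roughly, the fix looks like this (a self-contained sketch with made-up layer names, not the actual code in the repo): the loop variable has to be seeded from the embedded `xyz` input before the hidden layers run.

```python
import torch
import torch.nn as nn

class TinyNeRFMLP(nn.Module):
    """Illustrative stand-in for the model's xyz branch (not the repo's class)."""
    def __init__(self, in_dim=63, hidden_size=128, num_layers=4, skip_every=2):
        super().__init__()
        self.skip_every = skip_every
        layers, dim = [], in_dim
        for i in range(num_layers):
            if i > 0 and i % skip_every == 0:
                dim += in_dim                      # widen where the skip concatenates xyz back in
            layers.append(nn.Linear(dim, hidden_size))
            dim = hidden_size
        self.layers_xyz = nn.ModuleList(layers)

    def forward(self, xyz):
        x = xyz                                    # the missing line from the issue: seed the loop
        for i, layer in enumerate(self.layers_xyz):
            if i > 0 and i % self.skip_every == 0:
                x = torch.cat((x, xyz), dim=-1)    # skip connection back to the raw embedding
            x = torch.relu(layer(x))
        return x
```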
@kwea123 I thought I fixed this, my bad :) It should in fact be `x = xyz`. I'll merge the fix in a couple of days, but if it's urgent, feel free to PR.
Also, I'd be curious to help debug further. Did you run the dataset caching step beforehand?
I didn't run that. I just wanted to see the speed.
Caching, in my experiments, contributed to a bump of about +4 it/s.
In my understanding, the caching is only useful in certain cases. Moreover, do we even need to store the cached samples on disk? Why not just keep them in RAM before training starts? For example, for the lego half-resolution data, that is only 100 × 400 × 400 × 8 × 4 bytes ≈ 0.47 GB, which is feasible.
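Quick sanity check of that number (the 8-floats-per-pixel layout is just the assumption from the estimate above):

```python
import torch

# 100 images of 400x400 pixels, 8 float32 values per pixel (assumed layout).
cache = torch.empty(100, 400, 400, 8, dtype=torch.float32)
print(cache.numel() * cache.element_size() / 1024**3)  # ~0.48 GiB, matching the estimate above
```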
I had the same presumption, so I refrained from implementing caching when I initially put this repo out :) You can load it all into RAM, but random sampling at each iteration seems to take longer compared to "sampling beforehand and caching to disk". In my case, I sample about 500 random variants of 1024 samples each per view, and this becomes hard to store in RAM, more so for LLFF data.
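Roughly, the "sample beforehand and cache to disk" scheme looks like this (the directory layout, shapes, and `dataset` iterable are placeholders, not the repo's actual cache format):

```python
import os
import torch

num_variants, batch_size = 500, 1024           # ~500 random subsets of 1024 rays per view
os.makedirs("cache", exist_ok=True)

# Placeholder dataset: one view of 400x400 ray origins, directions, and target RGB.
dataset = [(torch.rand(400, 400, 3), torch.rand(400, 400, 3), torch.rand(400, 400, 3))]

for view_idx, (ray_origins, ray_directions, target_rgb) in enumerate(dataset):
    rays = torch.cat((ray_origins, ray_directions, target_rgb), dim=-1).reshape(-1, 9)
    for variant in range(num_variants):
        select = torch.randperm(rays.shape[0])[:batch_size]   # random 1024-ray subset
        torch.save(rays[select], f"cache/view{view_idx:04d}_{variant:04d}.pt")
```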
Do you have a timing profile? How much time do data loading, the model forward pass, the backward pass, etc. take?
I did, when I profiled. But I must admit, I didn't religiously log most stuff. However, I jotted down a few timing details in my journal on how much time a particular module took before vs. after optimization, for a few modules. Before I began profiling, I did a few obvious optimization steps (like ensuring all tensors are on CUDA, vectorizing ops, etc.). After that, I recorded a baseline iteration time (the notion of an iteration is the same as in the original repo). It was very hard to decipher running times from most of my notes (sorry!), but I've been able to recover a few.

One big caveat: I haven't been able to recall the exact config params that I used. Posting here so it helps people by hinting at where further optimization is possible :)
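For context, the per-module timings were gathered with something like this (a minimal sketch, nothing repo-specific):

```python
import time
import torch

def timed(fn, *args, **kwargs):
    """Time a single call, synchronizing so pending GPU work is actually counted."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.perf_counter() - start

# e.g. preds, t_forward = timed(model, ray_batch)
#      _, t_backward = timed(loss.backward)
```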
@krrish94 To compare runtime fairly, what I did was make sure the forward/backward results in the training loop exactly match the TF implementation for a certain number of iterations. I think that's probably the only way to prevent hyperparameter mismatch. Thanks for the work though!
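Concretely, something along these lines, where the two `run_*` helpers are placeholders for however one drives a single iteration of each codebase on identical rays, seeds, and hyperparameters:

```python
import numpy as np

# Placeholders -- not real functions from either repo.
rgb_pytorch = run_pytorch_iteration(ray_batch)
rgb_tensorflow = run_tensorflow_iteration(ray_batch)

assert np.allclose(rgb_pytorch, rgb_tensorflow, atol=1e-5), "implementations diverge"
```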
Bug not fixed, speed is questionable.
For example, here:

nerf-pytorch/train_nerf.py, Lines 117 to 123 in a14357d
The `hidden_size` in the config is never applied, which means it will always be set to the default value 128 no matter what you set in the config file. I suspect that there are still many places where the config is not fully applied, which means your implementation might differ a lot from the original repo.

I was first astonished at the speed you mentioned in the readme, but now it seems that your model is not the same as in the original repo... yours is tinier (`hidden_size=128` in contrast to the original 256), so no wonder your speed is faster!

Could you please carefully check again, or mention in the readme that you use a smaller model? Otherwise it is misleading.
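The fix being asked for amounts to threading the parsed config value into the model constructor instead of relying on the keyword default. A rough sketch (the class and config field names are only approximations, not necessarily the repo's actual code):

```python
# Illustrative only -- names approximate, but are not, the repo's actual API.
model_coarse = NeRFModel(
    num_layers=cfg.models.coarse.num_layers,
    hidden_size=cfg.models.coarse.hidden_size,   # without this, the default of 128 is silently used
    skip_connect_every=cfg.models.coarse.skip_connect_every,
)
```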