Unpooling indices stored as tf.Variables #23
Comments
Thanks, rodamn! You are completely right. We will fix this ASAP.
Actually, I merged the pull request submitted by @jtatusko that uses a faster interleave method and makes prepare_indices() unnecessary. I recommend merging his pull request in order to close this issue.
@NikolausDemmel Yeah, I was able to load the model trained on the NYU depth dataset and get it to perform inference. Even though the .npy file stores data for the tf.Variables I mentioned above, I presume TensorFlow just ignores that data when loading the params, since those nodes are no longer present.
FWIW, I added TF code to save a checkpoint after loading the dataset .npy file, and in future runs I load from that checkpoint instead. Removing these variable nodes only reduces the load size from ~250 MB to ~200 MB if I recall, so you don't end up saving that much load time or model data. However, loading the model from a checkpoint seemed faster than the current framework's approach of loading from a .npy file (I didn't time them for comparison, but I recall it being quite a bit faster). Hope this helps.
We have now fixed this issue. Based on @jtatusko's suggestion, the interleaving layer has been made faster and does not use tf.Variable for the unpooling indices. |
In network.py:
This is the wrong use of tf.Variable (variable nodes are meant to hold values that the net is expected to change during training, typically weights). In this case, the code is just reshaping the indices from np.meshgrid, so these values aren't weights or anything like them. There could be a specific reason these are made tf.Variables that I'm unaware of, but it seems these lines should use constants (or plain tensors) instead.
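The original snippet from network.py isn't reproduced above, but the underlying computation can be sketched in plain NumPy (shapes and variable names here are illustrative, not the repository's actual code):

```python
import numpy as np

# Illustrative shapes. In the unpooling layer, meshgrid produces the
# coordinates of every input element so it can be scattered into the
# (2x larger) output feature map.
height, width = 4, 4

x, y = np.meshgrid(np.arange(width), np.arange(height))

# Flatten to 1-D index vectors; these are fully determined by the tensor
# shape, so nothing about them is learnable.
x_flat = x.ravel()
y_flat = y.ravel()

# The issue: wrapping arrays like these in tf.Variable(...) marks them as
# trainable state, so they get serialized into checkpoints and weight
# files. Wrapping them in tf.constant(...) (or feeding the NumPy arrays
# directly when building the graph) keeps them out of the saved model.
print(x_flat[:8])  # [0 1 2 3 0 1 2 3]
```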
Why this matters:
tf.Variable nodes typically store "trainable" values, which must be stored in checkpoints and weight files. Since these are four 4D-flattened-to-1D arrays, and there is a set of them for each up-conversion, this is a lot of data being stored to disk, which must also be loaded from disk (and saved again when creating checkpoints). These are basically indices, so no change (learning) is expected; this saving and loading is, I propose, needless.
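As a rough back-of-the-envelope illustration (the feature-map shape below is assumed for the sake of example, not taken from the actual model), four flattened index arrays per up-conversion add up quickly:

```python
import numpy as np

# Assumed shape for one up-projection stage: 1 x 60 x 80 x 512 feature map.
batch, h, w, c = 1, 60, 80, 512
n_elements = batch * h * w * c

# Four flattened index arrays per up-conversion; assume 4 bytes per
# element (e.g. float32, tf.Variable's default dtype).
bytes_per_stage = 4 * n_elements * np.dtype(np.float32).itemsize

print(bytes_per_stage / 1e6, "MB per up-projection stage")  # ~39.3 MB
```

Multiplied over several up-conversion stages, this is consistent with index data accounting for tens of MB of the weights file.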
Case in point: these variables seem to be the prime contributor to the long load time of the weights file. In predict.py, loading currently takes several minutes on my computer. Changing prepare_indices() as indicated above reduces the load time by orders of magnitude; however, making this change MIGHT make the new model incompatible with the current weights file, NYU_ResNet-UpProj.npy (I am having trouble making the net work with this change, so more investigation is needed on my end, but I figured I would raise this issue in case others are able to work on resolving it). Since this is a non-functional change, I propose the authors try the following:
If the starting weights aren't available, I suppose a full retraining would be needed to generate acceptable results.