Unpooling indices stored as tf.Variables #23

Closed
rodamn opened this issue Jul 11, 2017 · 6 comments


rodamn commented Jul 11, 2017

In network.py:

  def prepare_indices(self, before, row, col, after, dims):
    # x0..x3 are index grids produced earlier in this function via np.meshgrid
    x_0 = tf.Variable(x0.reshape([-1]), name='x_0')
    x_1 = tf.Variable(x1.reshape([-1]), name='x_1')
    x_2 = tf.Variable(x2.reshape([-1]), name='x_2')
    x_3 = tf.Variable(x3.reshape([-1]), name='x_3')

This is the wrong use of tf.Variable (variable nodes are meant to hold values the network is expected to change during training, typically weights). In this case, the code is just reshaping the indices from np.meshgrid, so these values aren't weights or anything learnable. There could be a specific reason these are made tf.Variables that I'm unaware of, but it seems these lines should be:

  x_0 = x0.reshape([-1])
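Spelled out for all four index arrays, the proposed prepare_indices would look like this (a sketch, with x0..x3 still being the np.meshgrid outputs as before):

  def prepare_indices(self, before, row, col, after, dims):
    # Plain NumPy reshapes: the indices never change, so no tf.Variable
    # (and no checkpoint traffic) is needed.
    x_0 = x0.reshape([-1])
    x_1 = x1.reshape([-1])
    x_2 = x2.reshape([-1])
    x_3 = x3.reshape([-1])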

Why this matters:
tf.Variable nodes typically hold "trainable" values, which must be stored in checkpoints and loaded from weight files. Since these are four 4D-flattened-to-1D arrays, and there is one set of them for each up-conversion, this is a lot of data being written to disk when creating checkpoints and read back at load time. These are just indices, so no change (learning) is expected; I propose that all of this saving and loading is needless.
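To make the distinction concrete, a small TF 1.x sketch (the array size here is illustrative, not taken from the repo):

  import numpy as np
  import tensorflow as tf

  indices = np.arange(128 * 160 * 64)  # illustrative size; the real arrays are per-layer

  idx_var = tf.Variable(indices, name='idx_var')      # collected by tf.train.Saver and
                                                      # written into every checkpoint
  idx_const = tf.constant(indices, name='idx_const')  # baked into the graph; never
                                                      # touched by the Saver

  saver = tf.train.Saver()  # with no arguments, saves all variables, idx_var included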

Case in point: these variables seem to be the prime contributor to the long load time of the weights file. In predict.py:

  net.load(model_data_path, sess)

takes several minutes on my computer in the current state. Changing prepare_indices() as indicated above reduces the load time by orders of magnitude; however, making this change might make the new model incompatible with the current weights file, NYU_ResNet-UpProj.npy. (I am having trouble making the net work with this change, so more investigation is needed on my end, but I figured I would raise this issue in case others are able to work on resolving it.)

Since this is a non-functional change, I propose the authors try the following:

  1. Remove the tf.Variable nodes as shown above (making them simple operations)
  2. Retrain using identical meta-parameters as in the paper (if the starting weight values are still available)
  3. Compare results pre- and post-change to ensure they generate the same output

If the starting weights aren't available, I suppose a full retraining would just need to generate acceptable results.

@chrirupp (Collaborator) commented:

Thanks, rodamn! You are completely right. We will fix this ASAP.


rodamn commented Jul 28, 2017

Actually, I merged the pull request submitted by @jtatusko, which uses a faster interleave method and makes prepare_indices() unnecessary. I recommend merging his pull request to close this issue.
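For reference, the general idea behind a stack/reshape interleave — this is my own sketch of the technique, not @jtatusko's actual patch:

  import tensorflow as tf

  def unpool_2x(x):
    # x: [batch, h, w, c] -> [batch, 2h, 2w, c], new rows/columns zero-filled.
    # Built entirely from stacks and reshapes, so no index arrays (and no
    # tf.Variables) are needed.
    s = tf.shape(x)
    b, h, w, c = s[0], s[1], s[2], s[3]
    # Interleave zero columns: stack to [b, h, w, 2, c], reshape to [b, h, 2w, c]
    x = tf.reshape(tf.stack([x, tf.zeros_like(x)], axis=3), [b, h, 2 * w, c])
    # Interleave zero rows: stack to [b, h, 2, 2w, c], reshape to [b, 2h, 2w, c]
    x = tf.reshape(tf.stack([x, tf.zeros_like(x)], axis=2), [b, 2 * h, 2 * w, c])
    return x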

@NikolausDemmel commented:

@rodamn, does that mean that with @jtatusko's patch you can use the same trained models, or do you need to retrain from scratch? What about the model size?


rodamn commented Aug 10, 2017

@NikolausDemmel Yeah, I was able to load the model trained on the NYU depth dataset and get it to perform inference. Even though the .npy file stores data for the tf.Variables I mentioned above, I presume TensorFlow just ignores that data when loading the params, since those nodes are no longer present.
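For context, the caffe-tensorflow-style loader this repo uses works roughly like the sketch below (reconstructed from memory, so treat the details as assumptions): it assigns stored arrays to graph variables by name and can simply skip entries with no matching variable.

  import numpy as np
  import tensorflow as tf

  def load(data_path, session, ignore_missing=True):
    # The .npy file holds a dict of {layer_name: {param_name: ndarray}}.
    data_dict = np.load(data_path, encoding='latin1').item()
    for op_name in data_dict:
      with tf.variable_scope(op_name, reuse=True):
        for param_name, data in data_dict[op_name].items():
          try:
            session.run(tf.get_variable(param_name).assign(data))
          except ValueError:
            # No matching variable in the graph (e.g. the removed
            # index variables): skip the stored data.
            if not ignore_missing:
              raise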


rodamn commented Aug 10, 2017

FWIW, I added TF code to save a checkpoint after loading the dataset .npy file; in subsequent runs I load from that checkpoint instead. Removing these variable nodes only reduces the model size from ~250 MB to ~200 MB, if I recall correctly, so you don't save that much load time or model data that way. However, loading the model from a checkpoint seemed faster than the current framework's loading from a .npy file (I didn't time them for comparison, but I recall it being quite a bit faster). Hope this helps.
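Roughly what I did (the checkpoint path is hypothetical; net and model_data_path are as in predict.py):

  import os
  import tensorflow as tf

  CKPT = 'checkpoints/fcrn.ckpt'  # hypothetical location

  saver = tf.train.Saver()
  with tf.Session() as sess:
    if os.path.exists(CKPT + '.index'):
      saver.restore(sess, CKPT)        # fast path: restore the cached checkpoint
    else:
      net.load(model_data_path, sess)  # slow path: parse the .npy weights file
      saver.save(sess, CKPT)           # cache as a native checkpoint for next time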

@irolaina (Owner) commented:

We have now fixed this issue. Based on @jtatusko's suggestion, the interleaving layer has been made faster and does not use tf.Variable for the unpooling indices.
We also added the corresponding checkpoint files for faster loading of the weights.
