Several observations during the empirical study #4

Closed
surfreta opened this issue Nov 12, 2016 · 2 comments
surfreta commented Nov 12, 2016

Hi Joel,

I have been running multiple experiments concurrently and have several observations. Thanks in advance for your insight.

  1. When the batch size is set to 20 or higher, the learning rate (using the momentum optimizer) only starts to decrease after more than 10 epochs; it stays constant at the very beginning. This is what I described in the other thread. Right now, I can see the learning rate decreasing as expected after epoch 10.

  2. When the batch size is set to 2 or 4, the learning rate starts to decrease within the very first few epochs. I am not sure how to explain this behavior.

  3. In the competition, they use the so-called Dice coefficient, which is different from the loss function you are using. Do you have any specific reason for this choice? (A sketch of the coefficient follows at the end of this comment.)

  4. I have been trying to test the Adam optimizer. It works if I call it as

trainer = unet.Trainer(net, optimizer="adam", opt_kwargs=dict(learning_rate=0.0015))

However, it fails if I call it as

trainer = unet.Trainer(net, optimizer="adam", opt_kwargs=dict(momentum=0.0015))

which gives the following error message:

Traceback (most recent call last):
  File "launcher.py", line 54, in <module>
    path = trainer.train(generator, "/data/unet_trained", training_iters=1406, epochs=100, display_step=100)
  File "/test/u-net/ver6/unet.py", line 341, in train
    init = self._initialize(training_iters, output_path, restore)
  File "/test/u-net/ver6/unet.py", line 298, in _initialize
    self.optimizer = self._get_optimizer(training_iters, global_step)
  File "/test/u-net/ver6/unet.py", line 281, in _get_optimizer
    **self.opt_kwargs).minimize(self.net.cost,
TypeError: __init__() got an unexpected keyword argument 'momentum'

After reading your code, it seems to me that the key in opt_kwargs should make no difference, whether it is "momentum" or "learning_rate", because, as shown in the following, you always read it via opt_kwargs.pop("learning_rate", ...). I think I may not fully understand the mechanism of opt_kwargs.

if self.optimizer == "momentum":
    learning_rate = self.opt_kwargs.pop("learning_rate", 0.2)
    decay_rate = self.opt_kwargs.pop("decay_rate", 0.95)
elif self.optimizer == "adam":
    learning_rate = self.opt_kwargs.pop("learning_rate", 0.001)
    self.learning_rate_node = tf.Variable(learning_rate)
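Regarding point 3, this is the Dice coefficient as I understand it from the competition (my own standalone sketch, not code from tf_unet):

import tensorflow as tf

# Standalone sketch of the Dice coefficient (my own, not from tf_unet).
# prediction and ground_truth are tensors of the same shape with values
# in [0, 1]; a Dice-based loss would be 1 - dice_coefficient(...).
def dice_coefficient(prediction, ground_truth, eps=1e-5):
    intersection = tf.reduce_sum(prediction * ground_truth)
    union = tf.reduce_sum(prediction) + tf.reduce_sum(ground_truth)
    return (2.0 * intersection + eps) / (union + eps)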
jakeret commented Nov 13, 2016

1 & 2) I looked at the code; it could be that there is a bug in how I set up the exponentially decaying learning rate. Maybe it should be global_step=global_step*self.batch_size, but I am not sure and I can't test it at the moment.
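For reference, this is roughly how the decaying rate is wired up (a minimal standalone sketch using the defaults from the excerpt above and training_iters=1406 from your trace; I am assuming staircase decay, and the exact arguments in unet.py may differ):

import tensorflow as tf

# Minimal sketch of a staircase exponential decay.
global_step = tf.Variable(0, trainable=False)
learning_rate_node = tf.train.exponential_decay(
    learning_rate=0.2,        # initial rate
    global_step=global_step,  # incremented once per minimize() call, i.e. per batch
    decay_steps=1406,         # decay every training_iters steps
    decay_rate=0.95,
    staircase=True)
# With staircase=True the rate stays constant until global_step reaches
# decay_steps; scaling global_step by batch_size would make the first
# decay happen correspondingly earlier.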

3) I don't have any experience with the Dice coefficient. I could change the tf_unet code a bit so that it becomes easier to swap the loss function.

4) Everything you pass as opt_kwargs is forwarded to the optimizer. According to the API, momentum is not a parameter of AdamOptimizer, so the exception is expected.
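To illustrate 4), here is a simplified standalone sketch of the forwarding (the pop default is the one from the excerpt above; the rest is illustrative, not the exact code in _get_optimizer):

import tensorflow as tf

# Simplified sketch of the kwargs forwarding.
opt_kwargs = dict(momentum=0.0015)
learning_rate = opt_kwargs.pop("learning_rate", 0.001)  # key absent -> default 0.001
# Whatever remains in opt_kwargs is passed to the constructor verbatim:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, **opt_kwargs)
# Raises TypeError: __init__() got an unexpected keyword argument 'momentum',
# because AdamOptimizer accepts beta1, beta2 and epsilon, but no momentum.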


surfreta commented Nov 13, 2016

Hi Joel,

I think your finding is right. This MNIST example https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/mnist/convolutional.py#L245 does the same thing.

However, I have a question regarding your use of global_step.

The sess.run((self.optimizer, ...)) call (line 380 in unet.py) is made inside the iteration set up by

for step in range((epoch*training_iters), ((epoch+1)*training_iters)):

It seems to me we should use step*self.batch_size to get the current index into the dataset. I understand that when you define self.learning_rate_node at line 285, you only have access to global_step.

Specifically, my question is how you connect step with global_step. I could not find the code in your program that bridges these two variables. I know I may be missing something, but I am confused about where I am wrong.
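To check my understanding, here is a tiny standalone sketch of what I think happens (my own code, not from tf_unet):

import tensorflow as tf

# Does minimize() advance global_step on every sess.run? If so,
# global_step counts batches, and step*self.batch_size would be needed
# to count samples, as in the MNIST example.
w = tf.Variable(1.0)
cost = tf.square(w)
global_step = tf.Variable(0, trainable=False)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(cost, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # pre-0.12 initializer
    for step in range(3):
        sess.run(train_op)
        print(sess.run(global_step))  # 1, 2, 3 -> one increment per batch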

Besides, during the training epochs, the prediction figures look like the background (or the inverse) of the raw image. Did you have similar observations with your other data sets, if you have not tested the Kaggle set? Thanks.

[attached image: epoch_0 prediction]

surfreta

jakeret closed this as completed Mar 7, 2017