Having trouble training #65

Open
AmeetR opened this issue Jun 3, 2019 · 2 comments

AmeetR commented Jun 3, 2019

Hi,

I'm trying to train on Cityscapes in order to first replicate the 70% mIoU and then move on to other driving datasets to see what happens. However, I'm having trouble replicating it: I can't seem to get the loss to converge below 1.7. I'm training from scratch on purpose in order to get a clean baseline for the other datasets.

@hellochick (Owner)

Hey AmeetR,

If you look through the past issues, you will find that we have discussed this problem several times. This repository just converts the pre-trained weights from the original Caffe code to TensorFlow, and the training code is only a rough attempt. If you want to replicate the reported performance, you first need to implement a synchronized BN layer so that you can train with a large batch size (as described in the paper).
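For reference, a minimal sketch of what synchronized BN with multi-GPU training could look like. This is an illustrative assumption using TensorFlow 2's `tf.distribute.MirroredStrategy` and `SyncBatchNormalization`; it is not part of this repository's training code.

```python
import tensorflow as tf

# Sketch only: synchronize BN statistics across replicas so the effective
# BN batch is the global batch, not the small per-GPU batch.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    inputs = tf.keras.Input(shape=(512, 1024, 3))
    x = tf.keras.layers.Conv2D(64, 3, padding="same", use_bias=False)(inputs)
    # SyncBatchNormalization (TF >= 2.2) aggregates mean/variance over all
    # replicas during training instead of per-device statistics.
    x = tf.keras.layers.experimental.SyncBatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    outputs = tf.keras.layers.Conv2D(19, 1)(x)  # 19 Cityscapes eval classes

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```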


AmeetR commented Jun 3, 2019

Hi @hellochick, thanks for responding. I'll try to implement that layer tomorrow, but every time I increase the batch size above two my GPU runs out of memory. Also, I did look through all of the past issues and couldn't find anything of much use, which is why I opened a new one. That said, I'm now getting a loss of ~0.25, but the evaluation score is still 0.03. Any idea why that might be?
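For what it's worth, one way to approximate a larger effective batch on a memory-limited GPU is gradient accumulation. A rough sketch below, with hypothetical names (`model`, `loss_fn`, `dataset`) that are not from this repository; note that this only enlarges the batch for the gradient update, while BN statistics still see the small per-step batch, which is exactly what synchronized / large-batch BN addresses.

```python
import tensorflow as tf

# Sketch only: accumulate gradients over several small batches (size ~2)
# to simulate a larger batch for the optimizer step.
ACCUM_STEPS = 8  # effective batch = per-step batch * ACCUM_STEPS
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]

for step, (images, labels) in enumerate(dataset):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        # Scale so the summed gradient matches the mean over the big batch.
        loss = loss_fn(labels, logits) / ACCUM_STEPS
    grads = tape.gradient(loss, model.trainable_variables)
    accum_grads = [a + g for a, g in zip(accum_grads, grads)]

    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
        accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]
```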
