
Problem with finetune model #318

Closed
puppyapple opened this issue Nov 28, 2019 · 4 comments

@puppyapple

@erogol @reuben @twerkmeister @ekr Hey guys, thanks for your great work here! I'm training Tacotron2 with a custom dataset and it generally runs well, but there are still a few issues I haven't been able to resolve. I'd appreciate any ideas you could share.
The most recent problem came up yesterday when I tried to fine-tune my model with the BN-version prenet, as suggested by @erogol in another comment. With distributed training launched via `python3.7 distribute.py --restore_path xxxx/best_model.pth.tar`, I quickly hit a CUDA out-of-memory error; the GPU usage is shown below. If I understand it correctly, the main GPU (device 0) is being used by all of the other 7 subprocesses and runs out of memory, while the other 7 GPUs still have free memory. From what I've found, this may be related to restoring the Adam optimizer, since someone commented that Adam restores all of its parameters onto the main GPU only. Any ideas about this?
(screenshot of nvidia-smi: GPU 0 out of memory while the other GPUs still have free memory)
My other questions are about some training details that I posted in #58 (comment).
I would be grateful if you could share some ideas with me. Thanks in advance!
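
A minimal sketch of a common workaround for this kind of OOM (not taken from the project; the function name and arguments are illustrative): have each distributed worker pass `map_location` to `torch.load` so the checkpoint tensors are mapped onto that worker's own device, or onto the CPU, instead of defaulting back to `cuda:0`, where the checkpoint was saved.

```python
import torch

# Illustrative sketch: torch.load by default restores tensors to the device they were
# saved from (typically cuda:0), so every distributed rank ends up allocating on GPU 0.
# Mapping the checkpoint to the local device (or to "cpu") keeps the restore off GPU 0.
def load_checkpoint_for_rank(checkpoint_path: str, local_rank: int):
    device = torch.device(f"cuda:{local_rank}")
    checkpoint = torch.load(checkpoint_path, map_location=device)  # or map_location="cpu"
    return checkpoint, device
```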

@puppyapple (Author)

Has anyone else run into the same issue?

@erogol (Contributor) commented Dec 7, 2019

There is a small bug in master's fine-tuning path that wastes some memory with the loaded checkpoint. I think I've fixed it on the dev branch; otherwise I will in a couple of days.
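
For reference, a hedged sketch of the kind of fix described above, not the project's actual code: load the checkpoint on the CPU, copy its state into the model and optimizer, then delete the checkpoint dict so the extra in-memory copy of the weights is released. The `"model"` and `"optimizer"` keys and the helper name are assumptions about the checkpoint layout.

```python
import torch

def restore_for_finetune(model, optimizer, restore_path):
    # Load on CPU so the restore itself does not allocate GPU memory.
    checkpoint = torch.load(restore_path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])          # assumed checkpoint key
    optimizer.load_state_dict(checkpoint["optimizer"])  # assumed checkpoint key
    # Drop the checkpoint dict so the extra copy of the weights is freed.
    del checkpoint
    return model, optimizer
```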

@puppyapple (Author)

Thanks for the reply. I encountered this issue while using the dev branch. Looking forward to your update! I tried to locate the problem myself, but no luck so far 😂

@erogol (Contributor) commented Dec 9, 2019

now fixed on dev

erogol closed this as completed Dec 9, 2019