Saving of models #37

Closed
dave7895 opened this issue Mar 30, 2020 · 11 comments

Comments

@dave7895

Am I correct in assuming that the saved models are updated continuously during training, and that a save at a checkpoint just creates a permanent, unchanging copy? Or is everything between the start and the first save point (1000 iterations by default, I think) wasted if I stop before reaching that mark?

@lucidrains
Owner

@dave7895 models are periodically saved at intervals of 1k iterations, and you can find them under ./models/{project name}. they should never be overwritten, and you can even go back in time and load previous models along the way with --load-from {checkpoint num}
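
As a rough illustration of the schedule described above, here is a minimal Python sketch of how a step count could map to a checkpoint number and a file path. The `model_{num}.pt` file name and all helper names are assumptions made for illustration, not confirmed details of the project; only the ./models/{project name} location and the 1k interval come from the reply above.

```python
from pathlib import Path

SAVE_EVERY = 1000  # interval mentioned in the thread


def checkpoint_num_for_step(step, save_every=SAVE_EVERY):
    """Number of the most recent checkpoint boundary reached at `step`.

    Assumption: checkpoints are numbered by integer division of the step count.
    """
    return step // save_every


def steps_since_last_checkpoint(step, save_every=SAVE_EVERY):
    """Iterations trained since the last checkpoint boundary."""
    return step % save_every


def checkpoint_path(project_name, checkpoint_num, models_dir="./models"):
    """Build a checkpoint path under ./models/{project name}.

    The file name pattern `model_{num}.pt` is hypothetical.
    """
    return Path(models_dir) / project_name / f"model_{checkpoint_num}.pt"
```

Under these assumptions, a run interrupted at step 990 has not yet reached checkpoint 1, and a smaller interval leaves fewer iterations between checkpoints at the cost of more frequent writes.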

@dave7895
Author

dave7895 commented Apr 2, 2020

I understand. Sorry if my question was badly worded. What I meant was: if I train for 990 iterations and then interrupt the script, are all 990 iterations wasted, or are they kept in the most recent model, so that if I continue, the next checkpoint effectively contains 1990 iterations?

@lucidrains
Owner

@dave7895 ahh, they are wasted, it's too expensive to save every iteration. however, you can change the frequency at which it saves with a --save-every flag

@dave7895
Author

dave7895 commented Apr 3, 2020

Yeah, okay, I can understand that. Did this change in a more recent release? I hadn't updated my package, so the --save-every flag did not work. I did not know this, so I trained for about 15k iterations but always interrupted before the checkpoint, because I did not know it was at 10k and not 1k. These are the images that are automatically named 0.jpg, i.e. the ones from before the first checkpoint:
[Images: 0-ema, 0, 0-mr]

That's why I am confused: I am training in parallel on Colab, and there the ones from before the first checkpoint look like this:
[Images: 0, 0-ema, 0-mr]

@lucidrains
Owner

oh sorry I think I misunderstood. If you run the same command with the same project name (or leave at default), it will pick up where you left off
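
To make "pick up where you left off" concrete, here is a minimal Python sketch of one way a script could locate the newest checkpoint in a project folder before resuming. This is not taken from the project's code: the `model_{num}.pt` naming and the `latest_checkpoint_num` helper are hypothetical, chosen only to illustrate the idea.

```python
import re
from pathlib import Path

# Hypothetical file name pattern; the real naming may differ.
CHECKPOINT_PATTERN = re.compile(r"model_(\d+)\.pt")


def latest_checkpoint_num(project_dir):
    """Return the highest checkpoint number found in `project_dir`, or None."""
    nums = []
    for path in Path(project_dir).glob("model_*.pt"):
        match = CHECKPOINT_PATTERN.fullmatch(path.name)
        if match:  # skip files whose suffix is not a plain integer
            nums.append(int(match.group(1)))
    return max(nums) if nums else None
```

Comparing the numbers as integers matters here: sorted as strings, `model_10.pt` would come before `model_2.pt`.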

@lucidrains
Owner

Oh and yup, save-every was added in recent updates!

@dave7895
Author

dave7895 commented Apr 6, 2020

Thanks for the clarification. Where does this get saved then?

@lucidrains
Owner

@dave7895 it's saved to the ./models folder where you executed the command

@dave7895
Author

dave7895 commented Apr 6, 2020

OK, so the model is just updated in place and then duplicated at the 1k mark, with only one copy continuing to evolve? Does this happen whenever the images are updated?

@lucidrains
Owner

@dave7895 yup! it happens every time the images are updated

@dave7895
Author

dave7895 commented Apr 7, 2020

Ok. Thanks for your time.

dave7895 closed this as completed Apr 7, 2020