How to restore the training #160
Comments
@robeson1010 Hi, sorry for the late response.

    def set_model(self):
        encoder = self.architecture_config['model_params']['encoder']
        if encoder == 'from_scratch':
            self.model = UNet(**self.architecture_config['model_params'])
        else:
            config = PRETRAINED_NETWORKS[encoder]
            self.model = config['model'](**config['model_config'])
            self._initialize_model_weights = lambda: None
        self.load('YOUR_FILEPATH_TO_MODEL')

Use this if you want to load a model that you already trained with one of those ResNet architectures. When you restart, training will start from epoch 0, though your weights from epoch 54 will be used. I would suggest using a smaller lr if you were using some sort of decay. As of now we are not checkpointing the optimizer state so it will be difficult to restore the exact state of your training at epoch 54, but restarting with a new optimizer usually gets the job done. I hope this helps.
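As a rough illustration of the "smaller lr" advice: if the interrupted run used an exponential decay, one option is to restart at the value the schedule would have reached by the epoch where training stopped. This is only a sketch under that assumption. The helper name `resumed_learning_rate` and the numbers below are hypothetical; the thread does not state the actual learning rate or decay schedule.

```python
def resumed_learning_rate(initial_lr: float, gamma: float, completed_epochs: int) -> float:
    """Learning rate an exponential decay schedule reaches after `completed_epochs`.

    Assumes lr_k = initial_lr * gamma ** k, i.e. one decay step per epoch.
    """
    return initial_lr * gamma ** completed_epochs


# Hypothetical values, chosen only for illustration.
lr = resumed_learning_rate(initial_lr=1e-3, gamma=0.95, completed_epochs=54)
print(f"restart with lr of about {lr:.2e}")
```

If the original schedule was a step decay or a plateau-based one, the same idea applies, but the formula would differ.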
@jakubczakon Thanks a lot!
"As of now we are not checkpointing the optimizer state so it will be difficult to restore the exact state of your training" Is this still the case? I was hoping to run the training 5-10 epochs at a time and keep checking on the model's progress. Then I'd like to add some new classes, but that's a different problem. Basically I don't want to pay for the full 100 and then find out that something went wrong, or otherwise pay for 100 when 50 might suffice.
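For reference, running training a few epochs at a time is usually done in plain PyTorch by saving the optimizer state and the epoch counter alongside the model weights. The sketch below is generic PyTorch, not this project's code: the helper names `save_checkpoint` and `load_checkpoint` and the checkpoint keys are assumptions made for illustration, and wiring them into the pipeline here would still need to be done by hand.

```python
import torch
from torch import nn, optim


def save_checkpoint(path, model, optimizer, epoch):
    """Store model weights, optimizer state, and the last finished epoch."""
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place; return the epoch to resume from."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["epoch"] + 1
```

With helpers like these, each run would call `load_checkpoint` if a file exists, train for 5-10 epochs starting at the returned epoch, and call `save_checkpoint` at the end of every epoch. The model and optimizer must be constructed with the same architecture and parameter groups before loading.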
I have been training on the data for 3 days, but unfortunately the process was interrupted. I ran 'python main.py -- train --pipeline_name unet_weighted' again, but it started training from epoch 0. How can I resume training from where it stopped (54 epochs already completed)?