Backend worker monitoring thread interrupted or backend worker process died. #537
@fancyerii, From the shared logs I can see the following exception as the root cause of the failure while creating the worker processes:
This looks like an environment-related issue. Please share the following details:
I have copied all the codes of OrderClassifier to the handler(testtorchserving.py) but it seemed that it's not imported?
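For context on why copying the class into the handler may not help: `torch.load` uses pickle under the hood, and a pickled object stores only the defining module and class name, not the class's source. Loading therefore requires that exact `module.ClassName` to be importable in the serving process. A stdlib-only sketch of the failure mode (`trainmodule` is a made-up module name standing in for the training script):

```python
import pickle
import sys
import types

# Mimic the training process: define OrderClassifier in a module
# called "trainmodule" that exists only in this process.
train_mod = types.ModuleType("trainmodule")
exec("class OrderClassifier:\n    pass", train_mod.__dict__)
sys.modules["trainmodule"] = train_mod

# The pickle stream records "trainmodule.OrderClassifier" by name only.
data = pickle.dumps(train_mod.OrderClassifier())

# Mimic the serving process, where "trainmodule" is not importable:
del sys.modules["trainmodule"]

failed = False
try:
    pickle.loads(data)  # pickle tries to import trainmodule.OrderClassifier
except (ModuleNotFoundError, AttributeError):
    failed = True

print("unpickle failed:", failed)  # unpickle failed: True
```

This is why renaming or relocating the file that defines the model class between training and serving breaks `torch.load` of a whole pickled model, even if the class body is copied verbatim into the handler.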
@fancyerii : Somehow the
@fancyerii There is a problem in the way you are saving and loading your model, and it is generally not a recommended way of saving a model for production or external deployment. You are using the "Save/Load Entire Model" approach; see the PyTorch serialization docs for details. The recommended way is: Save: `torch.save(model.state_dict(), PATH)` Load: `model = TheModelClass(*args, **kwargs)` followed by `model.load_state_dict(torch.load(PATH))`
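A minimal end-to-end sketch of the recommended state_dict workflow (`TheModelClass` is a hypothetical stand-in for a custom model class such as `OrderClassifier`):

```python
import os
import tempfile

import torch
import torch.nn as nn


class TheModelClass(nn.Module):
    """Hypothetical stand-in for a custom model class like OrderClassifier."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


model = TheModelClass()
path = os.path.join(tempfile.mkdtemp(), "model_state.pt")

# Recommended: save only the weights (the state_dict)...
torch.save(model.state_dict(), path)

# ...and reconstruct the model class explicitly before loading them back.
# This works in any process that can import TheModelClass, because no
# class object is pickled -- only tensors keyed by parameter name.
loaded = TheModelClass()
loaded.load_state_dict(torch.load(path))
loaded.eval()
```

Because only tensors are serialized, this avoids the module-path coupling that makes whole-model pickles fragile across processes and machines.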
I have a similar problem, and the line that saves the model comes from the transformers library. Then to load the model you can use: Will this cause problems with TorchServe?
This seems related to #283
save_pretrained() internally uses the PyTorch-recommended way of saving the model. I believe this query was regarding your issue #617.
@fancyerii @Bartlett-Will Closing this due to lack of activity. Please reopen if this is still an issue |
I have used the huggingface transformers library to train a text classification model and deployed it to TorchServe.
The full code can be found here. The code is not well organized, but the model is very simple:
After training, I saved the best whole model:
The full code is here.
It seems the initialization part throws an exception, so I paste the code here:
Because I can't find any clue in ts_log.log, I added print statements to find out where the problem occurs. It seems the problem occurs on this line:
The full log is here.
I found relevant logs here:
It prints out `model: /tmp/models/5886359598784a97ace9c91df12d99590ade3efe/best_model.bin`, which comes before `self.model = torch.load(model_dir + "/best_model.bin")`.
And `print("load model success")` is never executed.
So I guess `self.model = torch.load(model_dir + "/best_model.bin")` failed.
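One way to confirm that guess is to wrap the `torch.load` call in a try/except that logs the full traceback, so the real exception shows up in ts_log.log instead of only the generic "backend worker process died" message. A minimal sketch (`load_model` is a hypothetical helper, not part of the TorchServe API; `weights_only=False` is needed on newer PyTorch releases because a whole pickled module, not just tensors, is being loaded):

```python
import logging
import traceback

import torch

logger = logging.getLogger(__name__)


def load_model(model_dir):
    """Load the pickled model, logging the full traceback on failure."""
    try:
        # weights_only=False: the file holds an entire pickled nn.Module,
        # matching the torch.save(model, ...) used at training time.
        model = torch.load(model_dir + "/best_model.bin",
                           map_location="cpu", weights_only=False)
        model.eval()
        logger.info("load model success")
        return model
    except Exception:
        # Surface the real cause (e.g. "AttributeError: Can't get attribute
        # 'OrderClassifier'") instead of a silent worker death.
        logger.error("torch.load failed:\n%s", traceback.format_exc())
        raise
```

With the traceback in the log, a failed import of the model's class (the usual cause with whole-model pickles) is immediately visible.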
So I tried to load this model separately to check whether it is good.
The code above executes correctly.
So what's wrong with it?