Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate torchserve process in pid file before starting #1866

Merged
merged 3 commits into from
Sep 29, 2022

Conversation

rohithkrn
Copy link
Collaborator

@rohithkrn rohithkrn commented Sep 21, 2022

Description

Fixes #1862 - There's an edge case where if .model_server.pid file is not deleted when torchserve stops abruptly and another process with same pid is started before torchserve is attempted to start again at which point it falsely assumes that the new pid of another process is torchserve process and exits with message "TorchServe is already running, please use torchserve --stop to stop TorchServe." This additional check validates that the process is actually a torchserve process and if not it will start torchserve.
Testing:

  • Usual path - verified that torchserve start and stop work as before
  • Edge case path - manually changed the pid file to replace pid with another process and verified that torchserve started without complaining that torchserve is already running.

@codecov
Copy link

codecov bot commented Sep 21, 2022

Codecov Report

Merging #1866 (65ed2a0) into master (9217c6a) will decrease coverage by 0.13%.
The diff coverage is 0.00%.

❗ Current head 65ed2a0 differs from pull request most recent head e09ad75. Consider uploading reports for the commit e09ad75 to get more accurate results

@@            Coverage Diff             @@
##           master    #1866      +/-   ##
==========================================
- Coverage   45.13%   44.99%   -0.14%     
==========================================
  Files          63       63              
  Lines        2601     2609       +8     
  Branches       60       60              
==========================================
  Hits         1174     1174              
- Misses       1427     1435       +8     
Impacted Files Coverage Δ
ts/model_server.py 0.00% <0.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Collaborator

@agunapal agunapal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@msaroufim
Copy link
Member

Could you please add some detail as to how you verified this fix worked?

@rohithkrn
Copy link
Collaborator Author

Could you please add some detail as to how you verified this fix worked?

@msaroufim updated the PR description to reflect the details

@msaroufim msaroufim added bug Something isn't working triaged Issue has been reviewed and triaged labels Sep 26, 2022
@msaroufim msaroufim merged commit b30c341 into pytorch:master Sep 29, 2022
jagadeeshi2i pushed a commit to jagadeeshi2i/serve that referenced this pull request Nov 1, 2022
* validate torchserve process in pid file before starting

* remove unrequired if statement
jagadeeshi2i pushed a commit to jagadeeshi2i/serve that referenced this pull request Nov 3, 2022
* validate torchserve process in pid file before starting

* remove unrequired if statement
@khelkun
Copy link

khelkun commented Feb 10, 2023

Just to let you know that I have the same kind of issue on Windows server 2019, with TorchServe 0.7.1.

From Anaconda Prompt (ran as admninistrator), I run torchserve --start ..., everything goes fine including the inference test on the served model. I stop the torchserve --start ... command with CTRL+C.

I guess the SIGINT is not catched by torchserve.exe on Windows to delete the .model_server.pid from %APP_DATA%\Local\Temp\1\ so I have to delete it manually before running the next torchserve --start ... command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triaged Issue has been reviewed and triaged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

valid ts process during start
5 participants