Inference performance files #213

Merged
Aniket-Parlikar merged 8 commits into master from Inference_Performance_files on Oct 21, 2022
Conversation

Aniket-Parlikar
Contributor

Added new files for measuring inference performance

@review-notebook-app
Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Collaborator

@ivanzvonkov ivanzvonkov left a comment

Comments about code
Overall the code looks okay. I see lots of comments that don't need to be there and repeated code that could be moved into a single file.

@ivanzvonkov
Collaborator

The suggested implementation in #201 states:

  • Individual function for each task which outputs performance indicators
  • General script for all tasks which outputs a log/txt file with all performance indicators
  1. It appears to me that some of the performance indicators are still missing. I've made a Google Sheet to track which have been recorded here: https://docs.google.com/spreadsheets/d/1_ZqWCInh8xBGglFrd4r_L2urMZG5f_U6zBdr3wy54Jk/edit?usp=sharing
    Is this accurate?

  2. There is no general script which outputs a log/txt file with all performance indicators. Why not?

@ivanzvonkov
Collaborator

@Aniket-Parlikar I see several comments are not yet addressed; please let me know when this is ready for a second look

@Aniket-Parlikar
Contributor Author

In regard to this comment, please find the answers below.

#213 (comment)

  1. It appears to me that some of the performance indicators are still missing. I've made a Google Sheet to track which have been recorded here: https://docs.google.com/spreadsheets/d/1_ZqWCInh8xBGglFrd4r_L2urMZG5f_U6zBdr3wy54Jk/edit?usp=sharing
    Is this accurate?
    Ans: I had already uploaded files containing the information indicated in the missing fields. Malawi_2020_September.csv (I'll rename it for clarity) contains the performance parameters of a single model inside a Docker container.

multi_models_logs.csv contains the performance parameters of multiple models deployed in a Docker container.

cloudrun_logs.txt contains the performance parameters of multiple models deployed on the Google Cloud Run service.

  2. There is no general script which outputs a log/txt file with all performance indicators. Why not?
    The main reason, I believe, is that we intend to obtain the performance parameters of models deployed in various environments, so the measurements need to run in each environment separately. In addition, some of these parameters vary from environment to environment, and we need different measurement approaches in those situations.
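
For illustration, a minimal sketch of what such a per-environment wrapper could look like; this is not code from the PR. It times a placeholder inference call, samples CPU and memory with psutil, and appends the indicators to a log file. run_inference() and the log file name are assumptions for the example.

# Hypothetical per-environment wrapper (not part of this PR).
# Assumes psutil is installed; run_inference() stands in for the
# environment-specific entry point (Docker container, Cloud Run request, etc.).
import time
from datetime import datetime

import psutil


def run_inference() -> None:
    # Placeholder for the environment-specific inference call.
    time.sleep(2)


def log_performance(log_path: str = "inference_performance.txt") -> None:
    start = time.time()
    run_inference()
    elapsed = time.time() - start

    cpu_percent = psutil.cpu_percent(interval=1)      # system-wide CPU utilization over 1s
    memory_percent = psutil.virtual_memory().percent  # system-wide memory utilization

    with open(log_path, "a") as log_file:
        log_file.write(
            f"{datetime.now().isoformat()} | time_taken={elapsed:.2f}s | "
            f"cpu={cpu_percent:.1f}% | mem={memory_percent:.1f}%\n"
        )


if __name__ == "__main__":
    log_performance()

Each environment would run the same wrapper around its own entry point, which keeps the log format consistent even though the measurement approach differs per environment.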


CPU performance 53% for 10 requests and around 80% for 1000 requests

Container count 30 for 10 requests and around 139 for 1000 requests
Collaborator

Should this be updated based on previous comment?

Yes, there are 10 active containers for 10 requests

Contributor Author

Yes, these can be discarded

Request latencies: 10.69 mins
Container CPU utilization: 75.39%
Container memory utilization: 46.55%
Container startup latency: 4.68s
Collaborator

Nice summary!!


while process.is_running():
    # set the sleep time to monitor at an interval of every second.
    time.sleep(0.5)
Collaborator

Just wondering why is it 0.5 here and 1 above?

Contributor Author

It was set this way because the execution time of test_models.py is much shorter than that of concurrent_inference.py, so a smaller sleep interval lets us profile the program and capture the performance parameters more accurately.
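
As context for the interval choice, here is a minimal sketch, not the PR's code, of how a monitoring loop can sample a child process with psutil at a configurable interval; the command, interval value, and field names are assumptions. A short-lived script like test_models.py yields only a few samples unless the interval is small.

# Hypothetical monitoring loop (not from this PR), sampling a child process with psutil.
import subprocess
import time

import psutil


def monitor(command, interval=0.5):
    child = subprocess.Popen(command)
    process = psutil.Process(child.pid)
    samples = []
    while child.poll() is None and process.is_running():
        try:
            samples.append(
                {
                    "timestamp": time.time(),
                    "cpu_percent": process.cpu_percent(interval=None),
                    "memory_mb": process.memory_info().rss / 1e6,
                }
            )
        except psutil.NoSuchProcess:
            break
        # Smaller interval -> more samples before a short-lived process exits.
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    stats = monitor(["python", "test_models.py"], interval=0.5)
    print(f"Collected {len(stats)} samples")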

Collaborator

@ivanzvonkov ivanzvonkov left a comment

Thanks for the updates; this is a good, comprehensive overview!
I think the notebook may need to be updated to remove the duplicate result.
I left a couple of small comments; once they are addressed this is ready to merge!


Time taken: 300 seconds
Container instance count: max active - 151
Request latencies: 10.69 mins
Collaborator

This request latency is probably the biggest issue! I wonder why it is growing so fast.

Updated the inference_log_files.txt by removing unwanted logs.
@Aniket-Parlikar Aniket-Parlikar merged commit 5a2ffaa into master Oct 21, 2022
@ivanzvonkov ivanzvonkov deleted the Inference_Performance_files branch October 24, 2022 16:06