Inference performance files #213
Conversation
Comments about code
Overall the code looks okay, but there are many comments that don't need to be there, and repeated code that could be consolidated into a single file.
The suggested implementation in #201 states:
@Aniket-Parlikar I see several comments are not yet addressed; please let me know when this is ready for a second look.
In regard to this comment: multi_models_logs.csv contains the performance parameters of multiple models deployed in a Docker container, while cloudrun_logs.txt contains the performance parameters of multiple models deployed on the Google Cloud Run service.
Force-pushed from e6419f0 to 5d3e98a
CPU performance: 53% for 10 requests and around 80% for 1000 requests
Container count: 30 for 10 requests and around 139 for 1000 requests
Should this be updated based on previous comment?
Yes, there are 10 active containers for 10 requests
Yes, these can be discarded
Request latencies: 10.69 min
Container CPU utilization: 75.39%
Container memory utilization: 46.55%
Container startup latency: 4.68 s
Nice summary!!
while process.is_running():
    # sample at an interval of 0.5 seconds while the process is alive
    time.sleep(0.5)
Just wondering why is it 0.5 here and 1 above?
It was set this way because test_models.py executes much faster than concurrent_inference.py, so a shorter sampling interval lets us capture the performance parameters more accurately while profiling the program.
Thanks for the updates, this is a good comprehensive overview!
I think the notebook may need to be updated to remove the duplicate result.
I left a couple of small comments; once they are addressed, this is ready to merge!
Time taken: 300 seconds
Container instance count: max active - 151
Request latencies: 10.69 min
This request latency is probably the biggest issue! I wonder why it is growing so fast.
Updated the inference_log_files.txt by removing unwanted logs.
Added new files for measuring inference performance