Implement LoadGen RegisterIssueQueryThread() #620
Conversation
How do I set the threads as a user? Is it in my own run.py in the benchmarks? Or do I need to recompile loadgen with the number of threads that I want to set? @nvpohanh
You create a thread, and call `RegisterIssueQueryThread()` from it.

I haven't thought about how we can do this in Python (not a Python expert 😢). Proposal would be:

No recompile is needed. You only need to change the test settings and make sure the correct number of threads call `RegisterIssueQueryThread()`. Note that I haven't added the Python binding yet, but I can do it. Let me add a TODO.
What's the performance increase going from 1 thread to 2 on a typical GPU on any inference model (e.g. resnet50) with loadgen, in terms of percentage? 10%? 50%? 100%? What's the host CPU usage for 1 and 2 threads?
Unfortunately we're not able to disclose measured scaling factors, since that would to some extent reveal overall performance. However, the current implementation represents a significant scaling bottleneck on one of our systems containing multiple high-end GPUs, and the submission scores will illustrate this. If Intel or any other submitter believes that these MRs inhibit their ability to demonstrate the performance of which their system is capable, we're happy to take input on alternative approaches for improving scalability.

Regarding CPU utilization: it will depend on the SUT. If the SUT and QSL are empty and loadgen is operating at maximum throughput, loadgen will consume 100% of the thread(s). However, using multiple threads is opt-in, and we expect submitters will opt in only if it increases their score. If opting in is helpful for Intel on any of the benchmarks and you have performance improvements for this path, we'll be happy to review.
This change looks good to me, @nvpohanh. Thank you for your contribution.
Tried the changes locally with the default behavior and no performance impact is observed. I did not enable the issue-query thread, as the issueQuery callback function in my implementation is really a thin function, as below:

```cpp
Query* q = new Query;
// Send the query to the query queue and return right after
```

There are multiple worker threads in the implementation to process the queries in the queues, where the real work is done. Since the issueQuery function is so thin in my implementation, I don't think there is any benefit to making it multithreaded. Thus I am curious why NVIDIA can't employ the same approach: make the issueQuery callback thin and small, and use multi-threaded workers in the background?
…e-query-threads Implement LoadGen RegisterIssueQueryThread() Former-commit-id: 3c6e6d1
This is built on top of #618
Changes:
TODO:
If you want to see a pure incremental diff for the second feature, see: nvpohanh#1