
Conversation

@nvpohanh
Contributor

@nvpohanh nvpohanh commented Jun 15, 2020

This is built on top of #618

Changes:

  • Added a new TestSettings flag: server_num_issue_query_threads.
  • Implemented multi-threaded IssueQuery() functionality in issue_query_controller.cc/h. This feature is enabled only in the Server scenario, when server_num_issue_query_threads is set to a positive number, and when there are threads calling RegisterIssueQueryThread().
  • Moved the SampleMetadata, QueryMetadata, ResponseDelegate, and QueryScheduler implementations, and part of IssueQueries(), to issue_query_controller.cc/h with very minor modifications.

TODO:

  • Add Python binding for RegisterIssueQueryThread()
  • Add small Python repro program to test this new binding function.

If you want to see a pure incremental diff for the second feature, see: nvpohanh#1

@christ1ne
Contributor

christ1ne commented Jun 16, 2020

How do I set the threads as a user? Is it in my own run.py in the benchmarks? Or do I need to recompile loadgen with the number of threads that I want to set? @nvpohanh

@nvpohanh
Contributor Author

How do I set the threads as a user?

You create a thread and call RegisterIssueQueryThread(). This call blocks until the entire test ends. You can see the MultiBasicSUT in repro.cpp as an example.

Is it in my own run.py in the benchmarks?

I haven't thought about how we can do this in Python (not a Python expert 😢). My proposal would be:
(1) Add Python binding for RegisterIssueQueryThread().
(2) In your run.py, do something like

import threading

import loadgen

def start_issue_query_thread():
    # Hypothetical binding for RegisterIssueQueryThread(); it blocks
    # until the entire test ends.
    loadgen.register_issue_query()

def main():
    x = threading.Thread(target=start_issue_query_thread)
    x.start()
    # Create SUT, QSL, test_settings
    test_settings.server_num_issue_query_threads = 1
    # Configure other test settings
    loadgen.StartTest(...)
    x.join()

Or do I need to recompile loadgen with the number of threads that I want to set?

No recompile is needed. You only need to change the test settings and make sure the correct number of threads call RegisterIssueQueryThread().

Note that I haven't added the Python binding yet, but I can do it. Let me add a TODO.

@christ1ne
Contributor

christ1ne commented Jun 18, 2020

What's the performance increase going from 1 thread to 2 on a typical GPU on any inference model (e.g. resnet50) with loadgen in terms of percentage? 10%? 50%? 100%? What's the host CPU usage for 1 and 2 threads?

@DilipSequeira
Collaborator

Unfortunately we're not able to disclose measured scaling factors, since that would to some extent reveal overall performance. However, the current implementation represents a significant scaling bottleneck on one of our systems containing multiple high-end GPUs, and the submission scores will illustrate this.

If Intel or any other submitter believes that these MRs inhibit their ability to demonstrate the performance of which their system is capable, we're happy to take input on alternative approaches for improving scalability.

Regarding CPU utilization... it will depend on the SUT. If the SUT and QSL are empty and loadgen is operating at maximum throughput, loadgen will consume 100% of the thread(s). However, using multiple threads is opt-in, and we expect submitters will opt in only if it increases their score. If opting in is helpful for Intel on any of the benchmarks and you have performance improvements for this path, we'll be happy to review.

@christ1ne
Contributor

This change looks good to me, @nvpohanh. Thank you for your contribution.

@aaronzhongii
Contributor

aaronzhongii commented Jun 25, 2020

Tried the changes locally with the default behavior and observed no performance impact. I did not enable the issue-query threads, as the IssueQuery callback in my implementation is a very thin function, shown below:

Query* q = new Query;
q->qid = qid;
q->qs = qs;
q->cntQs = cntQs;
q->offset = 0;

// Push the query onto the per-queue work list and return right away;
// a scoped lock guards only the queue mutation.
{
    std::lock_guard<std::mutex> lock(qQueueMutexes[qid]);
    qQueues[qid].push(q);
}
qQueueCVs[qid].notify_one();
cntTotalQs += cntQs;

There are multiple worker threads in the implementation that process the queries in the queues, which is where the real work is done. Since the IssueQuery function is so thin in my implementation, I don't think there is any benefit to making it multi-threaded.

Thus I am curious why NVIDIA can't employ the same approach: keep IssueQuery thin and small, and use multi-threaded workers in the background?

@christ1ne christ1ne merged commit 3c6e6d1 into mlcommons:master Jun 30, 2020
arjunsuresh pushed a commit to GATEOverflow/inference that referenced this pull request Apr 29, 2024

Implement LoadGen RegisterIssueQueryThread()
