Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix single row prediction performance in a multi-threaded environment #6024

Merged
merged 25 commits into from
Mar 18, 2024

Conversation

Ten0
Copy link
Contributor

@Ten0 Ten0 commented Aug 6, 2023

Fixes #6021

  • Store all resources that are reused across single-row predictions in the dedicated SingleRowPredictor (aka FastConfig)
  • Use that instead of resources in the Booster when doing Fast™ single row predictions to avoid having to lock the Booster exclusively.

Fixes microsoft#6021

- Store all resources that are reused across single-row predictions in the dedicated `SingleRowPredictor` (aka `FastConfig`)
- Use that instead of resources in the `Booster` when doing single row predictions to avoid having to lock the `Booster` exclusively.
- A FastConfig being alive now takes a shared lock on the booster (it was likely very incorrect to mutate the booster while this object was already built anyway)
@Ten0
Copy link
Contributor Author

Ten0 commented Aug 6, 2023

Workflows run would help :) (not sure what is currently using this and should be tested)

@Ten0
Copy link
Contributor Author

Ten0 commented Aug 7, 2023

@microsoft-github-policy-service agree

@jameslamb
Copy link
Collaborator

You don't need to force-push here. All commits will be squashed on merge.

@Ten0
Copy link
Contributor Author

Ten0 commented Aug 7, 2023

You don't need to force-push here. All commits will be squashed on merge.

Thanks! :) That was just me organizing my work though (I wanted this commit specifically to be a single one and working in case it would end up being reverted)

@Ten0 Ten0 force-pushed the 6021-fix_single_row_contention branch from f0e4227 to f41755b Compare August 7, 2023 22:08
@jameslamb
Copy link
Collaborator

@shiyu1994 or @guolinke could you please help with a review on this one?

And @AlbertoEAF if you have time?

src/c_api.cpp Show resolved Hide resolved
@AlbertoEAF
Copy link
Contributor

@Ten0 thanks for contributing! Since we're adding capability for it, can you add a test to our C++ test suite showing multiple predictors using the fast API and sharing the booster?

@Ten0
Copy link
Contributor Author

Ten0 commented Sep 18, 2023

So I've finished my benchmarks and it turns out:

  • I can see the expected performance improvements of x N_CPUs performance
  • The sub-optimal locking has no impact
  • I was afraid that memory barriers due to the read lock might be noticeable, so I thought a bit unfortunate that we would have to take the shared read lock for every prediction. I was originally considering holding the shared lock for the entire lifetime of the SingleRowPredictor, but as it turned out some implementations using the C library (notably R) would rely on being able to update the Booster after constructing a SingleRowPredictor, so that would cause a deadlock. (f41755b). As it turns out, this shows no impact in my benchmarks, so I'm going to be reasonably happy with leaving it as is (that is, with taking a shared lock for every single row prediction, as is currently implemented).
    image

@stonebrakert6
Copy link

stonebrakert6 commented Oct 11, 2023

Thanks a lot @Ten0 for this commit. I am a new user of lightGBM(need to mention this so that you don't confuse me with a veteran).
So, IIRC, the usage now would be something similar to this

  1. BoosterHandle is created once and is then shared among all threads.
  2. Each thread has its own copy of FastConfigHandle and then calls LGBM_BoosterPredictForMatSingleRowFastInit to initialize its private(non-shared) FastConfigHandle using the BoosterHandle created in step 1.
  3. Each thread can now call LGBM_BoosterPredictForMatSingleRowFast(passing in its FastConfigHandle) to make predictions.

Is this understanding correct?

@Ten0
Copy link
Contributor Author

Ten0 commented Oct 11, 2023

Each thread has its own copy of FastConfigHandle

Of BoosterHandle but I think that's what you meant. Otherwise yes that's exactly it.

@stonebrakert6
Copy link

Of BoosterHandle but I think that's what you meant.

Sorry, but I didn't mean that. I really meant FastConfigHandle. Quoting you on #6021

This way, when processing on multiple threads, each thread can make its call to LGBM_BoosterPredictForMatSingleRowFastInit, then work without contention:

Here is the pseudo-code, would need to fill ... to make it compile. Also, it has memory/resource leakage.

#include <iostream>
#include <thread>
#include <vector>

#include "LightGBM/c_api.h"

// # of features
const int kFeatures = 13;
// # of threads which call predict
const int nworkers = 10;
// Shared booster
BoosterHandle booster_handle;

// Every thread exectues this function
void predict(ssize_t beg, ssize_t end) {
  // Each thread has its own FastConfigHandle
  FastConfigHandle config;
  int rc = LGBM_BoosterPredictForMatSingleRowFastInit(
      booster_handle, C_API_PREDICT_NORMAL, 0, 0, C_API_DTYPE_FLOAT64,
      kFeatures, "", &config);
  if (rc != 0) {
    abort();
  }
  for (ssize_t i = beg; i < end; i++) {
    int64_t len = 0;
    rc = LGBM_BoosterPredictForMatSingleRowFast(config, ...);
  }
}

int main(int argc, char* argv[]) {
  int num_iterations;
  int rc = LGBM_BoosterCreateFromModelfile(argv[1], &num_iterations,
                                           &booster_handle);
  if (rc != 0) {
    std::cout << "LGBM_BoosterCreateFromModelfile() returned " << rc << '\n';
    return 1;
  }
  std::vector<std::thread> workers(nworkers);
  for (ssize_t i = 0; i < nworkers; i++) {
    if (i != nworkers - 1) {
      workers[i] = std::thread(predict, ...);
    }
  }
  for (std::thread& t : workers) {
    t.join();
  }
  return 0;
}

If each thread creates its own Booster then memory usage for a single model would be x N, where N is the number of threads. So I want to share the booster with all threads and during prediction, I don't want to acquire the write lock, (that's why I am using your fork).

Let me know if my understanding of API usage is correct or not. I want to use your fork(because I need scalable predict/inference performance)

@Ten0
Copy link
Contributor Author

Ten0 commented Oct 11, 2023

Let me know if my understanding of API usage is correct or not

Your understanding is correct, it seems to be just a matter of definitions that caused our misunderstanding. (Copy, handle, globals...)

I want to use your fork(because I need scalable predict/inference performance)

Happy to help :)
If you happen to feel like helping to get this merged to not have to deal with the complications of using a fork I'd happily accept contributions to the PR with regards to the "please showcase usage in the C++ test suite" review comment - I'm not sure exactly when I'll have time to finalize this.

Ten0 added a commit to Ten0/LightGBM that referenced this pull request Jan 12, 2024
@Ten0 Ten0 requested a review from borchero as a code owner January 12, 2024 00:08
@Ten0 Ten0 force-pushed the 6021-fix_single_row_contention branch from f54a6e2 to 6db33b3 Compare January 12, 2024 00:10
@Ten0
Copy link
Contributor Author

Ten0 commented Jan 12, 2024

TLDR; All comments resolved.

can you add a test to our C++ test suite showing multiple predictors using the fast API and sharing the booster?

That took a while to implement because a similar test didn't already exist (the C api was only tested through the python wrapper and the other implementations binding to it) but it's finally done! (1fcbc3f)

I've also added a workaround for #6142. The state of this issue is that, with the workaround, users would not hit issues any more with this API than they would with the current existing APIs around both single row and matrix prediction.

AFAICT this is ready for merge.

Side note: we've been using this in production for a few months without any issue (with the workaround implemented in the wrapper)

@Ten0 Ten0 force-pushed the 6021-fix_single_row_contention branch from 2022132 to b3848a5 Compare January 12, 2024 00:30
@Ten0 Ten0 force-pushed the 6021-fix_single_row_contention branch from b3848a5 to bf5fc5f Compare January 12, 2024 00:59
@Ten0
Copy link
Contributor Author

Ten0 commented Jan 22, 2024

@guolinke Static analysis complains that the TODO tag (#6024 (comment)) is not assigned. Who should I assign it to?

@jameslamb
Copy link
Collaborator

TODO

We prefer not using TODO comments at all, and instead using the Issues backlog to track future work. You won't find many in this repo:

Could you please put the content of that comment into a new issue at https://github.com/microsoft/LightGBM/issues with as much detail as necessary for someone else to pick it up (and linking to this PR)?

Copy link
Collaborator

@guolinke guolinke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@jameslamb
Copy link
Collaborator

Sorry, I'd missed that @guolinke approved this back in January!

I've just updated it to latest master. Will merge this today assuming that CI passes.

@jameslamb
Copy link
Collaborator

Thanks very much!

@jameslamb jameslamb merged commit 0a3e1a5 into microsoft:master Mar 18, 2024
40 of 42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix single row prediction performance in a multi-threaded environment
6 participants