
Additional performance benchmarks #4

Closed · Zahlii opened this issue Aug 24, 2021 · 4 comments

Zahlii commented Aug 24, 2021

Hi, I'm currently evaluating this as a potential performance enhancement for our MLOps / inference stack.

Thought I'd share some numbers here (measured on a 2019 MacBook Pro).

The test is set up as follows:
a) Generate artificial data: X = 1e6 x 200 float64, y = X.sum() for regression, y = X.sum() > 100 for the binary classifier.
b) For each n_feat in [...]: fit a model on 1000 samples and n_feat features; compile the model.
c) For each batchsize in [...]: predict 10 times on a randomly sampled batch of the data, using (1) LGBM.predict(), (2) lleaves.predict(), (3) lleaves.predict(n_jobs=1); measure the TOTAL time taken. (See the sketch below.)
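For reference, a minimal sketch of this setup (the feature and batch-size grids are placeholders, since the exact values are elided above; swap in LGBMClassifier and the thresholded target for the classification runs):

```python
import time

import lightgbm as lgb
import numpy as np
import lleaves

rng = np.random.default_rng(0)
# Artificial data as described above (~1.6 GB; shrink for a quick run).
X = rng.random((1_000_000, 200))
y = X.sum(axis=1)  # regression target; use X.sum(axis=1) > 100 for the classifier

for n_feat in [10, 50, 200]:  # placeholder grid
    model = lgb.LGBMRegressor().fit(X[:1000, :n_feat], y[:1000])
    model.booster_.save_model("model.txt")

    llvm_model = lleaves.Model(model_file="model.txt")
    llvm_model.compile()

    for batchsize in [10, 100, 1_000, 10_000]:  # placeholder grid
        batch = X[rng.integers(0, len(X), size=batchsize), :n_feat]
        for name, predict in [
            ("LGBM", model.predict),
            ("lleaves", llvm_model.predict),
            ("lleaves n_jobs=1", lambda b: llvm_model.predict(b, n_jobs=1)),
        ]:
            start = time.perf_counter()
            for _ in range(10):  # total time over 10 repeated predictions
                predict(batch)
            total = time.perf_counter() - start
            print(f"{name}: n_feat={n_feat} batchsize={batchsize} total={total:.4f}s")
```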

For regression, the results are:

[plot: regression benchmark results across feature counts and batch sizes]

The break-even between parallel lleaves and lleaves with n_jobs=1 seems to be around 1k samples per batch, independent of the number of features. Following this logic, we would get better performance than LGBM at any number of samples.

For classification:

[plot: classification benchmark results]

Here, too, the break-even is around 1k samples.

For classification with HIGHLY IMBALANCED data (1/50 positive), the break-even is only at 10k samples. Any ideas why this is the case?

[plot: imbalanced classification benchmark results]


Zahlii commented Aug 24, 2021

Some further results, this time including categorical-only features.

Classifier:
[plot: classifier benchmark, categorical-only features]

Regression:
[plot: regression benchmark, categorical-only features]


siboehm commented Aug 25, 2021

What's the issue here? The plots look fine to me. Some notes:

  • Benchmarking should always be done on something as close to the production model as possible. Things like tree sizes, tree depth, tree layout, number of categoricals, ... will have a large impact on (relative) performance.
  • Ideally you should look at the trees produced by your benchmark using lightgbm.plot_tree to make sure they aren't somehow degenerate.
  • The parallelization in lleaves is kept pretty simple and is implemented in Python, whereas LightGBM calls pthreads directly from C++ AFAIK. This means the parallelization overhead of lleaves is larger, hence the break-even comes somewhat late (rough sketch below).
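For illustration only (a simplified sketch, not lleaves' actual code): Python-side parallelization of this kind splits the batch into row ranges and submits each range to a thread pool, with the compiled entrypoint releasing the GIL while it runs. `c_predict` and its `(data, out, start_row, end_row)` signature are hypothetical stand-ins:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def predict_parallel(c_predict, data, out, n_rows, n_jobs=None):
    # c_predict stands in for the compiled C entrypoint; it is assumed
    # to release the GIL so the threads actually run in parallel.
    n_jobs = n_jobs or os.cpu_count()
    chunk = max(1, (n_rows + n_jobs - 1) // n_jobs)  # ceil-divide into n_jobs ranges
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [
            pool.submit(c_predict, data, out, start, min(start + chunk, n_rows))
            for start in range(0, n_rows, chunk)
        ]
        for fut in futures:
            fut.result()  # re-raise any exception from the workers
```

All the chunking, pool setup, and future bookkeeping happens in Python, so the fixed per-call overhead is larger than spawning threads directly from C++, which is why the break-even only shows up at larger batch sizes.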


Zahlii commented Aug 25, 2021

@siboehm no real issue here; I just wanted to share the findings from the benchmark. To me, the important takeaway is that for most inference payloads WE are seeing (usually 1-100 samples at a time), lleaves provides a performance gain, though only with parallelization disabled. Since the break-even can vary wildly, I think it may be important for high-performance settings to smartly toggle parallelization on/off depending on the number of samples to be predicted at once, e.g. with a small wrapper like the one sketched below.
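A hypothetical wrapper (not part of the lleaves API; the threshold comes from the break-even observed in the plots above and would need tuning per model and machine):

```python
PARALLEL_THRESHOLD = 1_000  # rough break-even from the benchmarks above; tune per model/machine

def smart_predict(llvm_model, batch):
    # Below the break-even, the threadpool overhead dominates,
    # so stay single-threaded.
    if len(batch) < PARALLEL_THRESHOLD:
        return llvm_model.predict(batch, n_jobs=1)
    return llvm_model.predict(batch)
```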


siboehm commented Aug 25, 2021

That's true! Thanks for sharing your benchmark results. I thought there was some performance issue you were bringing up, but even after squinting hard at the plots I couldn't see anything out of the ordinary :D So I'm happy lleaves is working well for you!

Regarding the parallelization:

  • Right now (without setting any options) lleaves parallelizes using a threadpool of size os.cpu_count(). On a CPU with hyperthreading this will be 2x the number of physical cores. Alternatively, lleaves could default to something like os.cpu_count() / 2, which probably has much less overhead for only a slight dip in performance (see the snippet below).
  • I might rework the parallelization interface at some point, so there'll be some changes coming. I want to keep the option for people to directly interface with the C function for the least possible overhead.
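As a quick illustration of the two sizing defaults (assuming 2-way hyperthreading, where logical cores = 2x physical cores):

```python
import os

# Current default: one worker per logical core.
n_jobs_logical = os.cpu_count()

# Proposed alternative: one worker per physical core (assuming 2-way SMT),
# trading a small amount of peak throughput for less threading overhead.
n_jobs_physical = max(1, os.cpu_count() // 2)
```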

If it's OK with you, feel free to close the issue, but do keep me in the loop if you find any other outliers / observations :) I'm interested in how people are using lleaves and whether it makes more sense to develop the library in the easy-to-use or the highest-possible-performance direction.

siboehm closed this as completed Sep 1, 2021