
Compute Gaussian kernel performance improvements. #9

Merged
merged 3 commits into from
Oct 8, 2022

Conversation

mierzejk
Contributor

@mierzejk mierzejk commented Aug 3, 2020

densratio.RuLSIF.compute_kernel_Gaussian has been updated with a performance-improved implementation. A sheet comparing the baseline (original) and performance-improved implementations is also available at https://bit.ly/3X7asIm; I hope it is self-explanatory.

The densratio.RuLSIF.set_compute_kernel_target function (also available to be imported directly from densratio) accepts a string argument that selects the underlying engine used to carry out the calculations:

Because of multithreading technicalities, the engine defaults to cpu when numba is available, and to numpy otherwise. I do not think adding numba as a hard requirement is the best idea, as it may not be backward compatible with existing projects that already depend on densratio.
The performance-improved densratio.RuLSIF.compute_kernel_Gaussian implementation returns a numpy.matrix if either of its first two arguments is of the numpy.matrix type. Otherwise it expects and returns a numpy.ndarray, so that future commits can replace the deprecated numpy.matrix with plain numpy.ndarray.
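For reference, the vectorized kernel computation can be sketched in plain NumPy as below. This is an illustrative broadcasting-based implementation of a Gaussian kernel matrix, not the exact code submitted in this pull request:

```python
import numpy as np

def compute_kernel_gaussian_sketch(x, centers, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2)).

    Illustrative NumPy sketch: the squared Euclidean distances are
    computed in one shot via the identity
        ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    instead of a Python-level double loop over rows.
    """
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(centers ** 2, axis=1)[None, :]
        - 2.0 * (x @ centers.T)
    )
    # Guard against tiny negative values from floating-point cancellation.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```

The sketch always returns a numpy.ndarray; mirroring the pull request's behavior of returning numpy.matrix when the inputs are matrices would be a small isinstance check layered on top.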

@mierzejk
Contributor Author

mierzejk commented Aug 3, 2020

The pull request may, at least partially, resolve the following issues: #6 estimate density ratio of large training set and test set and #8 density ratio estimation of high dimension data. According to my tests, both the numpy and numba targets can deal with x_list and y_list matrices that consume over 20 GB altogether, provided enough virtual memory is available.
The pull request also offers the prospect of even greater performance improvements for large data sets by taking advantage of the numba cuda target. Yet that would require some extra work, not fully aligned with the currently implemented numba.guvectorize approach.

@mierzejk mierzejk changed the title Compute kernel Gaussian performance improvement. Compute kernel Gaussian performance improvements. Aug 3, 2020
@hoxo-m hoxo-m self-requested a review August 4, 2020 00:11
@hoxo-m hoxo-m self-assigned this Aug 4, 2020
@mierzejk
Contributor Author

A side note regarding the performance results: I recently ran the benchmark with the same densratio_py codebase I submitted, in the following two environments:

  1. My over-6-year-old Dell Precision M4800 (Intel Core i9, 8 cores, 32 GB RAM), running Ubuntu 18.04.4 LTS.
  2. A virtualized Windows Server 2016 (32 cores, 128 GB RAM).

To my surprise, despite all 32 cores being utilized in the Windows environment, the process executed a few times faster on my reportedly less powerful laptop. I am not really sure of the real cause. It might be the operating system itself, but perhaps it is because my laptop is tuned for PyTorch performance: I built numpy, numba, Cython and mkl from source myself, whereas on Windows all packages were delivered pre-built, either by Anaconda or pip.
The original benchmark results I attached to the first pull-request post were measured in the first environment, i.e. my Dell Precision M4800 running Ubuntu 18.04.4 LTS.

@mierzejk mierzejk mentioned this pull request Aug 18, 2021
@hoxo-m
Owner

hoxo-m commented Oct 8, 2022

It is the greatest contribution!

@hoxo-m hoxo-m merged commit 1c3229a into hoxo-m:master Oct 8, 2022
@mierzejk mierzejk changed the title Compute kernel Gaussian performance improvements. Compute Gaussian kernel performance improvements. Aug 27, 2023