feature: online pca algorithm #2550
Conversation
cpp/oneapi/dal/algo/pca/backend/cpu/partial_train_kernel_cov.cpp
/intelci: run
daal_covariance::internal::Hyperparameter daal_hyperparameter;
/// the logic of block size calculation is copied from DAAL,
/// to be changed to passing the values from the performance model
std::int64_t blockSize = 140;
Why 140? And would it still be 140 when row_count > 50000?
I've taken the blockSize from the batch kernel. Maybe @Vika-F can clarify this value?
This logic was copied from DAAL. The constants were defined in DAAL as the result of a series of performance measurements performed in the past. In DAAL, the batch and online implementations share a large amount of code, including those constants, which are actually performance-related hyperparameters of the algorithm.
In the future this logic will be moved into hyperparameter classes. This was already implemented in the batch oneDAL algorithm, and it will later be replaced by a more intelligent solution (not just hard-coded constants).
@Vika-F thank you for the clarification; that seems reasonable for now. Based on this logic, block_size would be 140 when the number of rows is > 50k. Is that intended as well? It seems a bit odd to set it to 1024 for larger data but then go back to 140 for very large data.
It might be that for a large number of rows the particular block size is not that significant, as there is enough work available for each thread. But this is only my guess. The original data from the performance experiments on which those constants were based is no longer available.
And that is why we are moving towards a more intelligent solution than hard-coded constants.
/intelci: run
dal::pca::partial_train_result<> partial_result;
std::cout << method_name << "\n" << std::endl;
auto input_table = split_table_by_rows<double>(x_train, nBlocks);
Since examples are user-facing and meant to demonstrate and explain usage, and online training differs from traditional training, it might be worth adding some intermediate model evaluation (or at least a comment giving a high-level view of what is happening during partial train) to fully demonstrate the capabilities of online training.
Good point, this will be added in a separate PR.
I also think it would be better not to split a data table read from a single file into 10 blocks. It would be more natural to have several (maybe not 10, but rather 3) input files that contain the input blocks of data, and to compute the result from those files.
It's a good point, but I think the current approach suits the examples, because in the online Bazel tests we generate the data anyway.
Yes, this comment relates to the examples only.
In the Bazel tests it is more convenient to have the data generated on the fly.
INFO("check if eigenvectors matrix is orthogonal")
check_eigenvectors_orthogonality(eigenvectors);
}

void check_infer_result(const pca::descriptor<Float, Method>& desc,
Why is this function empty? Shouldn't it be validating the results? Or does that happen elsewhere?
We don't have an accuracy check for results in batch either. It will be added in a separate PR.
Would it ever fail?
In terms of accuracy, no, but it will fail if the result has incorrect dimensions or the eigenvectors are not orthogonal.
/intelci: run
using descriptor_t = detail::descriptor_base<task::dim_reduction>;

template <typename Float>
auto compute_eigenvectors_on_host(sycl::queue& q,
It is possible to compute eigenvectors on device in three steps using oneMKL LAPACK functions:
- Reduce the symmetric correlation matrix to tridiagonal form $A = Q \cdot T \cdot Q^T$ with ?sytrd or ?syrdb
- Compute the eigenvectors $u$ of the tridiagonal matrix $T$ using ?stemr
- Compute the eigenvectors $v$ of the original matrix $A$ from the eigenvectors of $T$ as $v = Q \cdot u$

Eigenvalues of $T$ are the same as the eigenvalues of $A$.
It's a good point. In the latest MKLGPUFPK we will have syevd (USM version), which provides the opportunity to compute eigenvalues and eigenvectors on GPU. It will be added in a separate PR.
template <typename Float>
auto compute_sums(sycl::queue& q,
                  const pr::ndview<Float, 2>& data,
                  const dal::backend::event_vector& deps = {}) {
    ONEDAL_PROFILER_TASK(compute_sums, q);
    ONEDAL_ASSERT(data.has_data());

    const std::int64_t column_count = data.get_dimension(1);
    auto sums = pr::ndarray<Float, 1>::empty(q, { column_count }, sycl::usm::alloc::device);
    auto reduce_event =
        pr::reduce_by_columns(q, data, sums, pr::sum<Float>{}, pr::identity<Float>{}, deps);

    return std::make_tuple(sums, reduce_event);
}
This code is duplicated at least in the batch covariance algorithm, maybe somewhere else as well:
https://github.com/oneapi-src/oneDAL/blob/master/cpp/oneapi/dal/algo/covariance/backend/gpu/compute_kernel_dense_impl_dpc.cpp#L45
Please consider moving this into some common place and sharing the implementation between the Covariance and PCA-Cov algorithms.
}

template <typename Float>
auto compute_crossproduct(sycl::queue& q,
This code is also duplicated here and in Covariance. Please consider removing the duplication.
There are more duplications throughout the file as well; it would be better to get rid of them.
It looks like removing the duplication in this PR would require a lot of changes in the batch implementations as well. Is it OK with you if I do it in a separate PR?
The API looks good to me, but I have a couple of comments on the internal implementation.
- In the CPU part of the batch PCA-covariance algorithm, the DAAL covariance algorithm is just passed to the PCA kernel, but in online PCA you decided to call Covariance explicitly from oneDAL. Why is it implemented like this? Maybe passing Covariance as an object to PCA could simplify the implementation.
- I think this is not a question about the online algorithm specifically, but about the PCA implementation in general. Currently the PCA + covariance implementation differs between oneDAL and sklearn: oneDAL uses the correlation matrix to compute PCA, while sklearn uses the variance-covariance matrix. The results of those two approaches are very different when the input dataset is not normalized. I see that this implementation also uses the correlation matrix, and this choice is hard-coded. I think it would be better to have an option to choose which kind of matrix, variance-covariance or correlation, to use in PCA.
- The code related to performance hyperparameters is duplicated 3+ times in this PR. I think it is important to have a follow-up task of adding hyperparameter classes into the online Covariance algorithm in oneDAL.

Please see the other comments in the PR as well.
@Vika-F Thanks for the review!
LGTM.
But let's have the comments regarding code duplication addressed in a separate PR, the sooner the better.