
feature: online pca algorithm #2550

Merged

Conversation

Alexandr-Solovev (Contributor)

No description provided.

@Alexandr-Solovev added the dpc++ (Issue/PR related to DPC++ functionality) and new algorithm (New algorithm or method in oneDAL) labels Oct 16, 2023
cpp/oneapi/dal/algo/pca/detail/finalize_train_ops.cpp (outdated, resolved)
cpp/oneapi/dal/algo/pca/detail/finalize_train_ops.hpp (outdated, resolved)
cpp/oneapi/dal/algo/pca/detail/finalize_train_ops_dpc.cpp (outdated, resolved)
cpp/oneapi/dal/algo/pca/detail/partial_train_ops_dpc.cpp (outdated, resolved)
cpp/oneapi/dal/algo/pca/test/fixture.hpp (outdated, resolved)
cpp/oneapi/dal/algo/pca/test/online.cpp (outdated, resolved)
examples/oneapi/cpp/source/pca/pca_dense_online.cpp (outdated, resolved)
examples/oneapi/dpc/source/pca/pca_cor_dense_online.cpp (outdated, resolved)
@Alexandr-Solovev (Contributor Author)

/intelci: run

@Alexandr-Solovev Alexandr-Solovev changed the title feature: online pca algorithm WIP:feature: online pca algorithm Oct 20, 2023
@Alexandr-Solovev Alexandr-Solovev changed the title WIP:feature: online pca algorithm feature: online pca algorithm Oct 24, 2023
@Alexandr-Solovev (Contributor Author)

/intelci: run

daal_covariance::internal::Hyperparameter daal_hyperparameter;
/// the logic of block size calculation is copied from DAAL,
/// to be changed to passing the values from the performance model
std::int64_t blockSize = 140;
Contributor:

Why 140? And would it still be 140 when row_count > 50000?

Contributor Author:

I've taken the blockSize from the batch kernel. Maybe @Vika-F can clarify this value?

Contributor:

This logic was copied from DAAL. The constants were defined in DAAL as the result of a series of performance measurements performed in the past. In DAAL, the batch and online implementations share a large amount of code, including these constants, which are effectively performance-related hyperparameters of the algorithm.
In the future this logic will be moved into hyperparameter classes; this has already been implemented in the batch oneDAL algorithm, and will later be replaced by a more intelligent solution (not just hard-coded constants).

Contributor:

@Vika-F thank you for the clarification, it seems reasonable for now. Based on this logic, block_size would be 140 when the number of rows is > 50k; is that intended as well? It seems a bit odd to set it to 1024 for larger data but then back to 140 for very large data.

Contributor:

It might be that for a large number of rows the particular block size is not that significant, as there is enough work available for each thread. But this is only my guess. The original data from the performance experiments on which those constants were based is no longer available.
And that is why we are moving towards a more intelligent solution than hard-coded constants.
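[Editor's note] To make the heuristic under discussion concrete, here is a minimal Python sketch of a row-count-based block-size rule with the shape the reviewers describe (1024 for mid-sized inputs, 140 otherwise, including very large inputs). The threshold values below are assumptions reconstructed from this thread, not the actual DAAL constants:

```python
def choose_block_size(row_count: int) -> int:
    """Hypothetical block-size heuristic; the exact DAAL thresholds are
    not shown in this thread, so the numbers here are placeholders."""
    if 5000 < row_count <= 50000:
        return 1024  # larger block for mid-sized data
    return 140       # default, also used for very large data (the oddity discussed above)

assert choose_block_size(1000) == 140
assert choose_block_size(10000) == 1024
# the behavior questioned in the review: very large inputs fall back to 140
assert choose_block_size(100000) == 140
```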

@Alexandr-Solovev (Contributor Author)

/intelci: run

dal::pca::partial_train_result<> partial_result;
std::cout << method_name << "\n" << std::endl;
auto input_table = split_table_by_rows<double>(x_train, nBlocks);

Contributor:

Since examples are user-facing and meant to demonstrate and explain usage, and online training is different from traditional training, it might be worth adding some intermediate model evaluation (or at least a comment that gives a high-level view of what is happening in partial train) to fully demonstrate the capabilities of online training.

Contributor Author:

Good point, this will be added in a separate PR.

Contributor:

I also think it would be better not to split a data table read from a single file into 10 blocks. It would be more natural to have several input files (maybe not 10, but rather 3) that contain the input blocks of data, and to compute the result from those files.

Contributor Author:

It's a good point, but I think it applies to the examples only, because in the online Bazel tests we generate the data.

Contributor:

Yes, this comment is related to examples only.
In Bazel tests it is more convenient to have the data generated on the fly.
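[Editor's note] To give a high-level picture of what the example's partial-train loop is doing, here is a self-contained NumPy mock. The names split_table_by_rows, partial_train, and finalize_train mirror the oneDAL API used in the example, but this is an illustrative sketch of the underlying statistics, not the oneDAL implementation: each block updates a running row count, the column sums, and the cross-product matrix, and finalization turns them into the covariance matrix.

```python
import numpy as np

def split_table_by_rows(x, n_blocks):
    # mirrors the example's split of a single table into row blocks
    return np.array_split(x, n_blocks, axis=0)

def partial_train(partial, block):
    # accumulate sufficient statistics: row count, column sums, cross-product
    n, sums, xtx = partial
    return n + block.shape[0], sums + block.sum(axis=0), xtx + block.T @ block

def finalize_train(partial):
    # turn the accumulated statistics into the sample covariance matrix
    n, sums, xtx = partial
    return (xtx - np.outer(sums, sums) / n) / (n - 1)

x_train = np.random.default_rng(0).standard_normal((100, 3))
partial_result = (0, np.zeros(3), np.zeros((3, 3)))
for block in split_table_by_rows(x_train, 10):
    partial_result = partial_train(partial_result, block)
cov_online = finalize_train(partial_result)

# the online result matches the batch computation on the full table
assert np.allclose(cov_online, np.cov(x_train, rowvar=False))
```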

INFO("check if eigenvectors matrix is orthogonal")
check_eigenvectors_orthogonality(eigenvectors);
}

void check_infer_result(const pca::descriptor<Float, Method>& desc,
Contributor:

Why is this function empty? Shouldn't it be validating the results, or does that happen elsewhere?

Contributor Author:

We don't have an accuracy check for the results in batch. It will be added in a separate PR.

Contributor:

Would it ever fail?

Contributor Author:

In terms of accuracy, no, but it will fail if the result has incorrect dimensions or the eigenvectors are not orthogonal.
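[Editor's note] The orthogonality check discussed here amounts to verifying that the eigenvector matrix V satisfies V Vᵀ ≈ I. A minimal NumPy sketch follows; the function name mirrors the fixture's helper, but this is an illustration, not the oneDAL test code:

```python
import numpy as np

def check_eigenvectors_orthogonality(eigenvectors, tol=1e-10):
    # the eigenvector matrix of a symmetric matrix should be orthogonal: V @ V.T == I
    n = eigenvectors.shape[0]
    return np.allclose(eigenvectors @ eigenvectors.T, np.eye(n), atol=tol)

m = np.random.default_rng(1).standard_normal((4, 4))
_, v = np.linalg.eigh(m + m.T)  # symmetric matrix -> orthonormal eigenvectors
assert check_eigenvectors_orthogonality(v)
assert not check_eigenvectors_orthogonality(np.ones((4, 4)))
```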

@Alexandr-Solovev (Contributor Author)

/intelci: run

@Alexandr-Solovev added the oneAPI (Issue/PR related to oneAPI part) label Oct 28, 2023
@Alexandr-Solovev (Contributor Author)

/intelci: run

using descriptor_t = detail::descriptor_base<task::dim_reduction>;

template <typename Float>
auto compute_eigenvectors_on_host(sycl::queue& q,
Contributor:

It is possible to compute eigenvectors on device in three steps using oneMKL LAPACK functions:

  1. Reduce symmetric correlation matrix to tridiagonal form $A = Q \cdot T \cdot Q^T$ with ?sytrd or ?syrdb
  2. Compute eigenvectors $u$ of a tridiagonal matrix T using ?stemr
  3. Compute eigenvectors $v$ of the original matrix $A$ from the eigenvectors of the matrix $T$ as $v = Q \cdot u$

Eigenvalues of $A$ and $T$ are the same.

Contributor Author:

It's a good point. The latest MKLGPUFPK will have syevd (USM version), which provides the opportunity to compute eigenvalues and eigenvectors on GPU. It will be added in a separate PR.
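[Editor's note] The three-step scheme suggested above can be illustrated with a self-contained NumPy sketch. The hand-rolled Householder reduction below stands in for oneMKL's ?sytrd/?syrdb and numpy.linalg.eigh stands in for ?stemr; this is a numerical illustration of the idea, not the proposed GPU implementation:

```python
import numpy as np

def tridiagonalize(a):
    """Householder reduction of a symmetric matrix: A = Q @ T @ Q.T, T tridiagonal."""
    n = a.shape[0]
    t = a.astype(float).copy()
    q = np.eye(n)
    for k in range(n - 2):
        x = t[k + 1:, k].copy()
        # reflect x onto alpha * e1, choosing the sign that avoids cancellation
        alpha = -np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
        v = x
        v[0] -= alpha
        norm_v = np.linalg.norm(v)
        if norm_v < 1e-14:
            continue
        v /= norm_v
        h = np.eye(n)
        h[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)  # symmetric orthogonal reflector
        t = h @ t @ h
        q = q @ h
    return q, t

rng = np.random.default_rng(0)
m = rng.standard_normal((5, 5))
a = m + m.T                   # symmetric input (e.g. a correlation matrix)
q, t = tridiagonalize(a)      # step 1: reduce to tridiagonal form
w, u = np.linalg.eigh(t)      # step 2: eigenpairs of the tridiagonal matrix T
v = q @ u                     # step 3: back-transform to eigenvectors of A

assert np.allclose(a @ v, v * w)              # (w, v) are eigenpairs of A
assert np.allclose(np.linalg.eigvalsh(a), w)  # eigenvalues of A and T coincide
```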

Comment on lines +44 to +57
template <typename Float>
auto compute_sums(sycl::queue& q,
const pr::ndview<Float, 2>& data,
const dal::backend::event_vector& deps = {}) {
ONEDAL_PROFILER_TASK(compute_sums, q);
ONEDAL_ASSERT(data.has_data());

const std::int64_t column_count = data.get_dimension(1);
auto sums = pr::ndarray<Float, 1>::empty(q, { column_count }, sycl::usm::alloc::device);
auto reduce_event =
pr::reduce_by_columns(q, data, sums, pr::sum<Float>{}, pr::identity<Float>{}, deps);

return std::make_tuple(sums, reduce_event);
}
Contributor:

This code is duplicated at least in the batch covariance algorithm, and maybe somewhere else as well:
https://github.com/oneapi-src/oneDAL/blob/master/cpp/oneapi/dal/algo/covariance/backend/gpu/compute_kernel_dense_impl_dpc.cpp#L45
Please consider moving this into some common place and sharing the implementation between the Covariance and PCA_Cov algorithms.

}

template <typename Float>
auto compute_crossproduct(sycl::queue& q,
@Vika-F (Contributor), Oct 30, 2023:

This code is also duplicated here and in Covariance. Please consider removing the duplication.

There are more duplications throughout the file; it would be better to get rid of them.

Contributor Author:

It looks like removing the duplication in this PR would also require a lot of changes in the batch implementations. Is it OK with you if I do it in a separate PR?

@Vika-F (Contributor) left a comment:

The API looks good to me, but I have a couple of comments on the internal implementation.

  • In the CPU part of the batch PCA-covariance algorithm, the DAAL covariance algorithm is just passed to the PCA kernel, but in the online PCA you decided to call Covariance explicitly from oneDAL. Why is it implemented like this? Maybe passing the Covariance as an object to PCA could simplify the implementation.
  • I think this is not a question about the online algorithm, but about the PCA implementation in general. Currently the PCA + covariance implementation differs between oneDAL and sklearn: oneDAL uses the correlation matrix to compute PCA, while sklearn uses the variance-covariance matrix. The results of these two approaches are very different when the input dataset is not normalized. I see that this implementation also uses the correlation matrix, and this choice is hard-coded. I think it would be better to have an option to choose which kind of matrix, variance-covariance or correlation, to use in PCA.
  • The code related to performance hyperparameters is duplicated 3+ times in this PR. I think it is important to have a follow-up task of adding hyperparameter classes to the online Covariance algorithm in oneDAL.

Please see the other comments in the PR as well.
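[Editor's note] The correlation-versus-covariance point can be made concrete with a small self-contained NumPy sketch (an illustration of the general statement, not oneDAL or sklearn code): when features are on very different scales, PCA on the variance-covariance matrix is dominated by the large-scale feature, while PCA on the correlation matrix weights the features equally.

```python
import numpy as np

rng = np.random.default_rng(42)
base = rng.standard_normal(500)
# two strongly correlated features on very different scales (non-normalized data)
x = np.column_stack([base + 0.1 * rng.standard_normal(500),
                     1000.0 * (base + 0.1 * rng.standard_normal(500))])
xc = x - x.mean(axis=0)

cov = xc.T @ xc / (len(x) - 1)      # variance-covariance matrix (sklearn-style PCA)
d = np.sqrt(np.diag(cov))
corr = cov / np.outer(d, d)         # correlation matrix (oneDAL-style PCA)

_, vec_cov = np.linalg.eigh(cov)
_, vec_corr = np.linalg.eigh(corr)
pc1_cov = vec_cov[:, -1]            # leading component: dominated by the scaled feature
pc1_corr = vec_corr[:, -1]          # leading component: weights both features equally

assert abs(pc1_cov[0]) < 0.01                    # close to (0, 1): scale wins
assert abs(abs(pc1_corr[0]) - 2 ** -0.5) < 0.05  # close to (1, 1)/sqrt(2)
```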

@Alexandr-Solovev (Contributor Author)

@Vika-F Thanks for the review!
Regarding the difference in the CPU part with calling the PCA and Cov kernels: the DAAL implementation of the PcaOnline kernel contains an additional member, a shared pointer to the DAAL OnlineCovariance object, which carries all the methods and state of the OnlineCovariance class. That works when finalize and partial are called from the same place, but with the oneDAL online design we had two options:
1) pass the DAAL OnlineCovariance as part of the finalize input;
2) avoid that and use the DAAL Cov kernel directly, because in general the online PcaCov also calls the Cov kernel.

@Vika-F (Contributor) left a comment:

LGTM.
But let's have the comments regarding code duplication addressed in a separate PR, the sooner the better.

@Alexandr-Solovev Alexandr-Solovev merged commit 7a304ec into oneapi-src:master Oct 31, 2023
13 of 14 checks passed
Labels
dpc++ (Issue/PR related to DPC++ functionality), new algorithm (New algorithm or method in oneDAL), oneAPI (Issue/PR related to oneAPI part)
5 participants