feature: online pca algorithm #2550
Conversation
cpp/oneapi/dal/algo/pca/backend/cpu/partial_train_kernel_cov.cpp
/intelci: run
daal_covariance::internal::Hyperparameter daal_hyperparameter;
/// the logic of block size calculation is copied from DAAL,
/// to be changed to passing the values from the performance model
std::int64_t blockSize = 140;
Why 140? And would it still be 140 when row_count > 50000?
I've taken the blockSize from the batch kernel. Maybe @Vika-F can clarify this value?
This logic was copied from DAAL. The constants were defined in DAAL as the result of a series of performance measurements performed in the past. In DAAL, the batch and online implementations share a large amount of code, including those constants, which are actually performance-related hyperparameters of the algorithm.
In the future this logic will be moved into hyperparameter classes. This was already implemented in the batch oneDAL algorithm, and it will later be replaced by a more intelligent solution (not just hard-coded constants).
@Vika-F thank you for the clarification; that seems reasonable for now. Based on this logic, block_size would be 140 when the number of rows is > 50k. Is that intended as well? It seems a bit odd to set it to 1024 for larger data but then go back to 140 for very large data.
It might be that for a large number of rows the particular block size is not that significant, as there is enough work available for each thread. But this is only my guess. The original data from the performance experiments on which those constants were based is no longer available.
And that is why we are moving towards a more intelligent solution than hard-coded constants.
/intelci: run
dal::pca::partial_train_result<> partial_result;
std::cout << method_name << "\n" << std::endl;
auto input_table = split_table_by_rows<double>(x_train, nBlocks);
Since examples are user-facing and meant to demonstrate and explain usage, and online training differs from traditional training, it might be worth adding some intermediate model evaluation (or at least a comment giving a high-level view of what is happening during partial train) to fully demonstrate the capabilities of online training.
Good point, this will be added in a separate PR.
I also think it would be better not to split a data table read from a single file into 10 blocks. It would be more natural to have several (maybe not 10, but rather 3) input files that contain the input blocks of data, and to compute the result from those files.
It's a good point, but I think the current approach suits the examples, because in the online Bazel tests we generate the data anyway.
Yes, this comment relates to the examples only.
In the Bazel tests it is more convenient to have the data generated on the fly.
INFO("check if eigenvectors matrix is orthogonal")
check_eigenvectors_orthogonality(eigenvectors);
}

void check_infer_result(const pca::descriptor<Float, Method>& desc,
Why is this function empty? Shouldn't it be validating the results? Or does that happen elsewhere?
We don't have an accuracy check for results in batch either. It will be added in a separate PR.
Would it ever fail?
In terms of accuracy, no, but it will fail if the result has incorrect dimensions or the eigenvectors are not orthogonal.
/intelci: run
using descriptor_t = detail::descriptor_base<task::dim_reduction>;

template <typename Float>
auto compute_eigenvectors_on_host(sycl::queue& q,
It is possible to compute eigenvectors on device in three steps using oneMKL LAPACK functions:
- Reduce the symmetric correlation matrix to tridiagonal form $A = Q \cdot T \cdot Q^T$ with ?sytrd or ?syrdb
- Compute the eigenvectors $u$ of the tridiagonal matrix $T$ using ?stemr
- Compute the eigenvectors $v$ of the original matrix $A$ from the eigenvectors of $T$ as $v = Q \cdot u$

Eigenvalues of $T$ are the same as the eigenvalues of $A$.
It's a good point. In the latest MKLGPUFPK we will have syevd (USM version), which provides the opportunity to compute eigenvalues and eigenvectors on GPU. It will be added in a separate PR.
template <typename Float>
auto compute_sums(sycl::queue& q,
                  const pr::ndview<Float, 2>& data,
                  const dal::backend::event_vector& deps = {}) {
    ONEDAL_PROFILER_TASK(compute_sums, q);
    ONEDAL_ASSERT(data.has_data());

    const std::int64_t column_count = data.get_dimension(1);
    auto sums = pr::ndarray<Float, 1>::empty(q, { column_count }, sycl::usm::alloc::device);
    auto reduce_event =
        pr::reduce_by_columns(q, data, sums, pr::sum<Float>{}, pr::identity<Float>{}, deps);

    return std::make_tuple(sums, reduce_event);
}
This code is duplicated at least in the batch covariance algorithm, maybe somewhere else as well:
https://github.com/oneapi-src/oneDAL/blob/master/cpp/oneapi/dal/algo/covariance/backend/gpu/compute_kernel_dense_impl_dpc.cpp#L45
Please consider moving this into some common place and sharing the implementation between the Covariance and PCA-Cov algorithms.
}

template <typename Float>
auto compute_crossproduct(sycl::queue& q,
This code is also duplicated here and in Covariance. Please consider removing the duplication.
There are more duplications throughout the file as well; it would be better to get rid of them.
It looks like removing the duplication in this PR would require a lot of changes in the batch implementations as well. Is it OK with you if I do it in a separate PR?
The API looks good to me, but I have a couple of comments on the internal implementation.
- In the CPU part of the batch PCA-covariance algorithm, the DAAL covariance algorithm is just passed to the PCA kernel, but in online PCA you decided to call Covariance explicitly from oneDAL. Why is it implemented like this? Maybe passing Covariance as an object to PCA could simplify the implementation.
- I think this is not a question about the online algorithm specifically, but about the PCA implementation in general. Currently the PCA + covariance implementation differs between oneDAL and sklearn: oneDAL uses the correlation matrix to compute PCA, while sklearn uses the variance-covariance matrix. The results of those two approaches are very different when the input dataset is not normalized. I see that this implementation also uses the correlation matrix, and this choice is hard-coded. I think it would be better to have an option to choose which kind of matrix, variance-covariance or correlation, to use in PCA.
- The code related to performance hyperparameters is duplicated 3+ times in this PR. I think it is important to have a follow-up task of adding hyperparameter classes into the online Covariance algorithm in oneDAL.

Please see the other comments in the PR as well.
@Vika-F Thanks for the review!
LGTM.
But let's have the comments regarding code duplication addressed in a separate PR, the sooner the better.