Enable ARM(SVE) CPU support with reference backend #2614
Conversation
For the most part the structure looks good, and a change for building on a generic (non-vendor-specific) CPU would look very similar to what is here. The most common comment I have (I didn't put all of them in) is that it's probably good to have the x86 branches first, as that is the most likely code path at the moment.
There are a few things to discuss still, which can be quite big in themselves:
- Do we target building with Bazel for new architectures? I don't have an opinion yet, as I'm not familiar with Bazel.
- How do we structure the code for new architectures? At the moment there are multiple `#ifdef`s in the files; if this is done for each (micro-)architecture, the files can become cluttered. I would advocate having target-specific sub-directories, putting target-specific code in those, and keeping as much common code as possible in the files it currently lives in.
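As an illustration, the sub-directory layout being advocated might look something like this (paths hypothetical, shown for one of the primitives touched by this PR):

```
cpp/oneapi/dal/backend/primitives/intersection/
├── intersection.hpp             # common code and dispatch
├── x86/intersection_avx512.hpp  # x86-specific implementation
└── arm/intersection_sve.hpp     # SVE-specific implementation
```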
@@ -16,7 +17,11 @@
```cpp
#pragma once
#include <cstdint>
#include <immintrin.h>

#ifndef __ARM_ARCH
```
Something like `#ifdef TARGET_X86_64` would be better — the header shouldn't be included whenever Arm *isn't* defined; it should be included when x86 *is* defined.
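A minimal sketch of the suggested guard — `TARGET_X86_64` stands in for whatever macro the build system would define, and the compiler-predefined `__x86_64__`/`_M_X64` checks are an assumption added for illustration:

```cpp
// Include the x86 intrinsics header only when actually targeting x86-64,
// rather than whenever an Arm macro happens to be absent.
#if defined(TARGET_X86_64) || defined(__x86_64__) || defined(_M_X64)
  #include <immintrin.h>
  #define HAVE_X86_INTRINSICS 1
#else
  #define HAVE_X86_INTRINSICS 0
#endif

int have_x86_intrinsics() { return HAVE_X86_INTRINSICS; }
```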
cpp/oneapi/dal/algo/triangle_counting/backend/cpu/intersection_tc.hpp (outdated; resolved)
cpp/oneapi/dal/backend/primitives/intersection/intersection.hpp (outdated; resolved)
@keeranroth we didn't really change any structure (apart from adding makefile extensions for micro-archs) and stuck to what x86 follows, so we just added Arm compatibility on top of it. My thoughts on the further points:
/intelci: run
The approach above involves too many mk files like CMPLR.BACKEND.ARCH.mk. Having n backends and m archs gives an [n×m] matrix: for every new arch it will be necessary to add all backends and vice versa, repeated for all supported compilers. My suggestion is to separate the backend and arch definitions so there is at least a single file per backend. For instance, defs.ref.mk for the "ref" backend could contain

```make
COMPILER.defs += -DDAAL_REF -DONEDAL_REF -DONEAPI_DAAL_NO_MKL_GPU_FUNC
```

while cmplr.clang.x86.mk could use COMPILER.defs above (both included from the main makefile):

```make
COMPILER.mac.clang = clang++ -m64 -fgnu-runtime -stdlib=libc++ -mmacosx-version-min=10.15 -fwrapv
COMPILER.lnx.clang = clang++ -m64
```
@@ -42,11 +43,16 @@ namespace daal
```cpp
 */
enum CpuType
{
#ifdef __ARM_ARCH
    sve = 0, /*!< ARM processors based on Arm's Scalable Vector Extension (SVE) */
```
Suggested change:
```diff
- sve = 0, /*!< ARM processors based on Arm's Scalable Vector Extension (SVE) */
+ sve = 0, /*!< ARM(R) processors based on Arm(R) Scalable Vector Extension (SVE) */
```
```cpp
    avx512 = 2 /*!< Intel(R) Xeon(R) processors based on Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) \DAAL_DEPRECATED */

#ifdef __ARM_ARCH
    sve = 2, /*!< ARM processors based on Arm's Scalable Vector Extension (SVE) */
```
Suggested change:
```diff
- sve = 2, /*!< ARM processors based on Arm's Scalable Vector Extension (SVE) */
+ sve = 2, /*!< ARM(R) processors based on Arm(R) Scalable Vector Extension (SVE) */
```
Thank you for adding ARM support to oneDAL! This is tremendous work.
I have reviewed the code in the ./cpp folder and it looks good and mostly correct. Please see the additional comments below.
But my biggest concern is that the Azure pipelines are not updated to test those changes.
The .ci/pipelines/ci.yml file needs to be updated to support a new job for ARM.
Can you please update the pipelines so we can see whether the changes work as expected?
```cpp
#ifdef __ARM_ARCH
#define CPU_TYPE sve
#else
#define CPU_TYPE avx512
#endif
```
Please replace `CPU_TYPE` with `DAAL_CPU_TYPE`.
Also, those changes might negatively impact performance on ARM, because the block size here was tuned for AVX-512, and the same block size might not be good for other CPU architectures.
It is possible to tune the block size by running performance experiments with different block sizes using the oneDAL API as follows:
```cpp
const auto input = dal::read<dal::table>(dal::csv::data_source{ input_file_name });
const auto cov_desc = dal::covariance::descriptor{}.set_result_options(
    dal::covariance::result_options::cor_matrix | dal::covariance::result_options::means);
dal::covariance::detail::compute_parameters params{};
params.set_cpu_macro_block(140 /* tested block size */);
const auto result = dal::compute(cov_desc, params, input);
```
It is also possible to use Python API for this purpose.
Thank you for the suggestion.
Hello @Vika-F,
Can you please share the equivalent Python code to tune the block size, and what should "input_file_name" be to get the optimal block size?
Covariance algorithms' hyperparameters can be tuned using PCA, as the Covariance algorithm is used as part of PCA.
Here is the code:
```python
pca = PCA(n_components=2, svd_solver="full")
pca.get_hyperparameters("fit").cpu_macro_block = 140  # tested block size
pca.fit(X)
```
The `input_file_name` (or the ndarray `X` in the Python code) should point to the input data for the Covariance/PCA algorithm. You can use any data you think is relevant for performance tuning.
@@ -41,6 +46,28 @@
```cpp
void __daal_serv_CPUHasAVX512f_enable_it_mac();
#endif

#if defined(__ARM_ARCH)
```
Agree. This sounds reasonable. Possible naming for the new directory: `compiler/arm`.
```cpp
#ifdef __ARM_ARCH
vertex_similarity_result<task::all_vertex_pairs> jaccard_sve(
#else
vertex_similarity_result<task::all_vertex_pairs> jaccard_avx512(
#endif
```
I think this will not work, because the code below uses AVX-512 intrinsics, which are incompatible with SVE.
Please use the default code path on ARM instead.
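A sketch of what falling back to the default path could look like — the function names and bodies below are placeholders for illustration, not the real Jaccard kernels; the point is to key the specialization off the AVX-512 feature macro so every non-AVX-512 target (including SVE) takes the portable code:

```cpp
#include <cstddef>

// Portable scalar intersection count over two sorted arrays -- the kind of
// default path that works on any architecture.
std::size_t intersection_count(const int* a, std::size_t na,
                               const int* b, std::size_t nb) {
    std::size_t i = 0, j = 0, count = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) ++i;
        else if (b[j] < a[i]) ++j;
        else { ++count; ++i; ++j; }
    }
    return count;
}

std::size_t intersection_count_dispatch(const int* a, std::size_t na,
                                        const int* b, std::size_t nb) {
#if defined(__AVX512F__)
    // An AVX-512 intrinsics version would live here, compiled only on x86.
    return intersection_count(a, na, b, nb);  // placeholder
#else
    // Default code path, taken on ARM/SVE and any other non-AVX-512 target.
    return intersection_count(a, na, b, nb);
#endif
}
```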
@@ -1,6 +1,7 @@
```cpp
/* file: algorithm_container_base_batch.h */
/*******************************************************************************
* Copyright 2014 Intel Corporation
* Copyright 2023-24 FUJITSU LIMITED
```
Suggested change:
```diff
- * Copyright 2023-24 FUJITSU LIMITED
+ * Copyright 2024 FUJITSU LIMITED
```
I believe we have an internal agreement that the year in the file header reflects the year of the file's initial creation or modification (in your case).
@@ -143,10 +144,15 @@ class AlgorithmContainerImpl<batch> : public AlgorithmContainer<batch>
```cpp
 * \tparam avx2Container   Implementation for Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2)
 * \tparam avx512Container Implementation for Intel(R) Xeon(R) processors based on Intel AVX-512
 */
#ifdef __ARM_ARCH
```
What about other macros? I believe I've seen at least `__aarch64__` and `_M_ARM64` somewhere in ARM-specific code (in other repos).
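For reference, a combined guard covering those macros — a common pattern, though the macro name `DAAL_TARGET_ARM` here is hypothetical:

```cpp
// __ARM_ARCH is predefined by GCC/Clang for Arm targets, __aarch64__ by
// GCC/Clang for 64-bit Arm specifically, and _M_ARM64 by MSVC; checking
// all three covers the common toolchains.
#if defined(__ARM_ARCH) || defined(__aarch64__) || defined(_M_ARM64)
  #define DAAL_TARGET_ARM 1
#else
  #define DAAL_TARGET_ARM 0
#endif

int is_arm_target() { return DAAL_TARGET_ARM; }
```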
@@ -1,6 +1,7 @@
```cpp
/* file: env_detect_features.cpp */
```
I would actually suggest splitting this particular file into `env_detect_x86_features.cpp` and `env_detect_arm_features.cpp` in order not to create `#ifdef` hell.
I would also suggest having the files live in `cpp/daal/src/services/compiler/generic/<arch>/env_detect_features.cpp`, and then having the appropriate file built by the build system, depending on the compile-time configuration.
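A rough makefile sketch of that selection (variable names hypothetical, not oneDAL's actual ones; `PLAT=lnxarm` matches the build command quoted at the end of this PR):

```make
# Pick the architecture sub-directory from the target platform.
ifeq ($(PLAT),lnxarm)
  ARCH_DIR := arm
else
  ARCH_DIR := x86
endif

SERVICES_SRCS += cpp/daal/src/services/compiler/generic/$(ARCH_DIR)/env_detect_features.cpp
```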
cpp/oneapi/dal/algo/subgraph_isomorphism/backend/cpu/compiler_adapt.hpp (outdated; resolved)
@@ -0,0 +1,131 @@
```cpp
/* file: kernel_inst_arm.h */
```
The file "kernel_inst_x86.h" was recently modified — some unused macros were removed; see PR #2617 for the details.
Please make similar removals in this file.
The macros that contain "SYCL" should also be removed, as all new SYCL development has moved into the oneapi/dal part of the code base.
@Vika-F that PR does not show any SYCL-related modifications — could you please help point it out?
@rakshithgb-fujitsu — this is not part of this PR, but a comment on longer-term plans: all SYCL-related functionality in the DAAL library is deprecated and will be removed, so there is no point in copying the DAAL SYCL-related defines.
@rakshithgb-fujitsu Yes, my point was about the deprecation of SYCL-related functionality in DAAL. Since it is deprecated, there is no need to replicate those defines for ARM; they will be removed in the near future anyway.
Now I also see that it is not easy to remove those defines right now, as they are used in various files.
So let's leave them as is for now, in order not to block the development any longer.
.ci/scripts/build.sh (outdated)
@@ -1,6 +1,7 @@
```bash
#! /bin/bash
#===============================================================================
# Copyright 2019 Intel Corporation
# Copyright 2023-24 FUJITSU LIMITED
```
Per discussion with UXL, we can start adding "Copyright Contributors to the oneDAL project" while keeping the existing Intel copyright.
Some internal check fixes need to be done to reflect this, and we will start using the new copyright as well.
@@ -53,7 +60,7 @@ static partial_compute_result<Task> call_daal_kernel_partial_compute(
```diff
 /// the logic of block size calculation is copied from DAAL,
 /// to be changed to passing the values from the performance model
 std::int64_t blockSize = 140;
-if (ctx.get_enabled_cpu_extensions() == dal::detail::cpu_extension::avx512) {
+if (ctx.get_enabled_cpu_extensions() == CPU_EXTENSION) {
```
I'm not sure this is the right way to handle this. The idea of this block is that it is AVX-512-specialized code, here and in some other algorithms: it assumes the AVX-512 ISA, and I'm not sure it would work that easily on ARM. We could probably define a better approach for architecture/ISA-specific implementations in algorithms.
Agreed. One possible approach is to have a function defined which returns the block size, and this can be different per architecture. This is already done in a few locations with templating, but it looks like we want a runtime check here, so a slightly different implementation is required.
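One possible shape for such a function, sketched with a stand-in enum (the real type would be `dal::detail::cpu_extension`; the non-AVX-512 values are placeholders, not tuned numbers):

```cpp
#include <cstdint>

// Stand-in for dal::detail::cpu_extension, for illustration only.
enum class cpu_extension { generic, avx2, avx512, sve };

// Runtime selection of the macro block size, so algorithm code stops
// hard-coding the AVX-512 assumption. 140 is the AVX-512-tuned value
// from the existing code; the others are untuned placeholders.
inline std::int64_t default_macro_block(cpu_extension ext) {
    switch (ext) {
    case cpu_extension::avx512: return 140;
    case cpu_extension::sve:    return 140; // placeholder until tuned on SVE
    default:                    return 140; // placeholder
    }
}
```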
dev/download_tbb.sh (outdated)
```bash
rm -rf __deps/tbb
pushd oneTBB
git checkout v2021.11.0
```
Even once this is in a separate script, it would be worth placing the dependency version at least in the header of the file.
.ci/env/tbb.sh (outdated)
```bash
CoreCount=$(lscpu -p | grep -Ev '^#' | wc -l)

rm -rf __deps/tbb
pushd oneTBB
```
The name of the directory should reflect that this is a build directory, so `onetbb-build` or similar would be better. I'm not a fan of mixed-case names, but that is just a personal preference.
@keeranroth thank you for the suggestion.
Currently the makefile expects the TBB dependency to reside inside `__deps/tbb/`. Changing the name of the directory would require changes in other makefiles as well, which at this point I think is not required. @napetrov, any comments?
I was referring to the `oneTBB` directory name, which is put into the root directory at the moment. This could probably be `onetbb-src` or something similar, so that the clone at the top of this file would be:
```bash
git clone https://github.com/oneapi-src/oneTBB.git onetbb-src
```
It's probably also worth doing a shallow clone so that switching the branch isn't necessary:
```bash
git clone --depth 1 --branch v2021.11.0 https://github.com/oneapi-src/oneTBB.git onetbb-src
```
I think downloading a source archive is a better option than git clones here.
In terms of names, the point is to have the build directory properly named as a build directory, and the compiled TBB can be put into a dependencies directory. The point is that it is easier to debug things with clear naming.
@napetrov, what if we go with the shallow clone suggested by @keeranroth, so the same task can be done in fewer lines than an archive download?
```bash
git clone --depth 1 --branch v2021.11.0 https://github.com/oneapi-src/oneTBB.git onetbb-src
```
I have nothing against git clone. The only point is that it would be good to have a single place for version modification.
There is also an open question on whether env/tbb.sh should be merged with the download_tbb script from the dev folder.
The original pull request had both in the same file, which I commented looked weird, in that the download script was also an Arm build recipe. I think it's better with two different scripts, but I'm happy to go with the majority on that one.
Two different scripts is an option, but then the TBB build script for Arm should be named properly. For now it's just tbb.sh, which implies it is a universal script. So maybe rename it to reflect that it's intended to work on Arm only, and potentially add a platform check there.
I also think `env/tbb.sh` should be generic. The only catch is that for Intel and Arm we know the dependency paths are `__deps/tbb/lnx/lib/intel64/gcc4.8/` and `__deps/tbb/lnx/lib/arm/gcc4.8/` respectively. How do we take care of other architectures?
Our suggestion for deciding the architecture directory:
```bash
arch=$(uname -m)
if [ "${arch}" == "x86_64" ]; then
  arch_dir="intel64"
elif [ "${arch}" == "aarch64" ]; then
  arch_dir="arm"
else
  arch_dir=${arch}
fi
```
Yes, we can put things this way. Although I thought that Arm support would be cross-compiled, i.e. the build would happen on an x86 machine for an Arm target, so uname will not necessarily return an Arm platform.
Also, I would say we can move the download_tbb script logic here for Intel and just use prebuilt TBB instead of building it from sources.
deploy/nuget/prepare_dal_nuget.sh (outdated)
@@ -101,6 +110,9 @@ create_package() {
```bash
if [ "${platform}" = "linux-x64" ]; then
    dl_postfix=.so.${major_binary_version}.${minor_binary_version}
    sl_postfix=.a
elif [ "${platform}" = "linux-aarch64" ]; then
    dl_postfix=.${major_binary_version}.${minor_binary_version}.dylib
```
Should this be the `.so` suffix on Linux?
@@ -1,5 +1,6 @@
```make
#===============================================================================
```
It would be nice to have this file with architecture-specific parts split out into their own files. That requires quite a bit of a refactor of this file so that we can keep as much common code in here as possible. It should be possible to just split out the setting of variables into architecture specific parts at the start of the file, and then use these later on in the makefile. I'm happy to have a look at this, if required.
@keeranroth yup, please have a look at this and make suggestions as needed; it seems a bit tricky to isolate the parts.
Ack. I'll start having a look at this tomorrow.
Force-pushed 71c0021 to ab084d1; force-pushed 04ee1b0 to b86d408.
@ethanglaser @napetrov can you please help check whether the internal CI is still failing after the recent rebase? Thanks!
/intelci: run
Hi @rageshhajela16! I double-checked how __release_lnx/daal/latest/lib/cmake/oneDAL/oneDALConfig.cmake looks. Instead of lib/<arch_name> it uses just lib, which I would say is incorrect. I also double-checked your changes and they look good to me: https://github.com/oneapi-src/oneDAL/pull/2614/files#diff-a711276942a2647ec3ee01db428047af7e0efcd9ed25a93f600bc9bf53b70c2dR212
From the build directory name it looks like oneDAL is built using … Everything works fine: all test examples build and pass successfully. Here the compiler is automatically taken as … oneDALConfig.cmake contains the correct prefix. Just to cross-verify, is it possible to share the configuration used to generate the binaries? More precisely, the build command.
Hi @ajay-fuji @rageshhajela16. I also checked it with a manual build and it works fine. Right now it looks to me like an issue in the CI step command/script; will check, thanks!
On the icc usage question: yes, we currently use it for the build, with a plan to migrate to icx.
@ethanglaser here is the job with the infra fixes: https://intel-ci.intel.com/eecfd78a-97b3-f181-afe8-a4bf010d0e2e
I would suggest running nightly validation to check a broader scope, but so far things look good and ready for integration.
Previous regular job has successfully finished!
/intelci: run
Let's wait for CI/Nightly results but overall changes are good to go
/intelci: run
Good from CI perspective. Thanks for taking a look @Alexandr-Solovev!
LGTM! CI is also green. @napetrov, should you merge it?
Thanks everyone for working on this PR!
Thanks everyone contributing to this PR and for the review!
Following on from pull request oneapi-src#2614, we want to make it easier to distinguish common and platform specific code when building. We take common code from the main makefile, and split this out into arch and uarch specific files. We try and share as much code as possible between uarch files. In this first commit, we make sure that the x86 build works on linux. Next commits will make sure that the Mac, Windows and Arm builds also work. For now these are broken.
* Don't use double brackets in source script. Match the same style as used in the rest of the script. I observed a syntax error when running in a Docker container locally as well, so fix this up quickly.
* Move arch and uarch specific code out of main Makefile. Following on from pull request #2614, we want to make it easier to distinguish common and platform specific code when building. We take common code from the main makefile, and split this out into arch and uarch specific files. We try and share as much code as possible between uarch files. In this first commit, we make sure that the x86 build works on linux. Next commits will make sure that the Mac, Windows and Arm builds also work. For now these are broken.
* Update Mac, Windows and Arm makefile configuration. Put Mac and Windows makefile code into their own files, and correct some copy-paste errors in the Arm build. This should build on all platforms now, but there are some more commits to go on top before the refactor is done.
* Refactor compiler configuration in makefiles. We want to have as much common code shared between makefiles for different compilers, as well as for the different backends. We shuffle things around a little. This now passes on x86 Linux as well as aarch64. Needs some internal tests to be run.
* Remove recursive definition
* Code review copyright notice and new line changes
* Remove year from contributors copyright notice
* Don't use year in copyright
* review comments
Proposed changes in this PR:
```bash
make -f makefile daal oneapi_c BACKEND_CONFIG=ref PLAT=lnxarm -j$(nproc)
```
Open Issues/Limitations: