Intel® oneAPI Data Analytics Library 2021.2
The release introduces the following changes:
Library Engineering:
- Enabled new PyPI distribution channel for daal4py:
- The four latest Python versions (3.6, 3.7, 3.8, 3.9) are supported on Linux, Windows and macOS.
- Support of both CPU and GPU is included in the package.
- You can download daal4py using the following command:
pip install daal4py
- Introduced CMake support for oneDAL examples
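After running the pip command above, one way to confirm that the package is visible to the interpreter is to query its installed version. This is a minimal sketch using only the Python standard library (Python 3.8 or later); the helper name installed_version is illustrative and not part of daal4py.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("daal4py")
    if version is None:
        print("daal4py is not installed in this environment")
    else:
        print("daal4py version:", version)
```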
Support Materials
The following additional materials were created:
- Medium blogs:
- Kaggle kernels:
What's New
Introduced new oneDAL and daal4py functionality:
- CPU:
- Hist method for Decision Forest Classification and Regression, which outperforms the existing exact method
- Bit-to-bit results reproducibility for: Linear and Ridge regressions, LASSO and ElasticNet, KMeans training and initialization, PCA, SVM, kNN Brute Force method, Decision Forest Classification and Regression
- GPU:
- Multi-node multi-GPU algorithms: KMeans (batch), Covariance (batch and online), Low order moments (batch and online) and PCA
- Sparsity support for SVM algorithm
Improved oneDAL and daal4py performance for the following algorithms:
- CPU:
- Decision Forest Classification and Regression training
- Support Vector Machines training and prediction
- Logistic Regression, Logistic Loss and Cross Entropy for non-homogeneous input types
- GPU:
- Decision Forest Classification and Regression training
- All algorithms with GPU kernels (as a result of migration to Unified Shared Memory data management)
- Reduced performance overhead for oneAPI C++ interfaces on CPU and oneAPI DPC++ interfaces on GPU
Added technical preview features in Graph Analytics:
- CPU:
- Local and Global Triangle Counting
Introduced new functionality for scikit-learn patching through daal4py:
- CPU:
- Patches for four latest scikit-learn releases: 0.21.X, 0.22.X, 0.23.X and 0.24.X
- Acceleration of the roc_auc_score function
- Bit-to-bit results reproducibility for: LinearRegression, Ridge, SVC, KMeans, PCA, Lasso, ElasticNet, tSNE, KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, RandomForestClassifier, RandomForestRegressor
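Bit-to-bit reproducibility is a stricter property than numerical closeness: two runs must produce identical binary floating-point values. The sketch below is a generic illustration, not oneDAL code, and these notes do not describe how the library achieves reproducibility. It shows one common reason results can differ at the bit level (floating-point addition is not associative, so accumulation order matters) and a simple way to compare values bit for bit. The helper names float_bits and bitwise_equal are illustrative.

```python
import struct


def float_bits(values):
    """Pack floats into their raw IEEE 754 double-precision bytes."""
    return struct.pack("<%dd" % len(values), *values)


def bitwise_equal(a, b):
    """True only when both sequences hold bit-identical doubles."""
    return float_bits(a) == float_bits(b)


# The same three numbers summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

if __name__ == "__main__":
    # The sums agree to within rounding error, yet their bits differ.
    print("numerically close:", abs(left_to_right - right_to_left) < 1e-15)
    print("bit-identical:", bitwise_equal([left_to_right], [right_to_left]))
```

A check of this kind, applied to the outputs of two identical training runs, is one way to test for bit-to-bit reproducibility in practice.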
Improved performance of the following scikit-learn estimators via scikit-learn patching:
- CPU:
- RandomForestClassifier and RandomForestRegressor scikit-learn estimators: training and prediction
- Principal Component Analysis (PCA) scikit-learn estimator: training
- Support Vector Classification (SVC) scikit-learn estimators: training and prediction
- Support Vector Classification (SVC) scikit-learn estimator with the probability=True parameter: training and prediction
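The scikit-learn patching referred to in these notes can be sketched as follows. This is a minimal example, assuming daal4py, scikit-learn and numpy are installed. It uses the patch_sklearn and unpatch_sklearn functions from daal4py.sklearn that these notes mention; the helper names sklearn_patching_available and run_patched_kmeans, the tiny dataset and the choice of KMeans are illustrative only. The input is deliberately int32, a data type other than float32 or float64.

```python
import importlib.util


def sklearn_patching_available():
    """Return True when both daal4py and scikit-learn can be found."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("daal4py", "sklearn")
    )


def run_patched_kmeans():
    """Cluster four points with KMeans while scikit-learn patching is active."""
    from daal4py.sklearn import patch_sklearn, unpatch_sklearn

    patch_sklearn()
    try:
        # Import estimators after patching so the patched classes are picked up.
        import numpy as np
        from sklearn.cluster import KMeans

        # Two well-separated groups of integer-typed points.
        X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=np.int32)
        model = KMeans(n_clusters=2, n_init=10, random_state=0)
        return model.fit_predict(X)
    finally:
        # Restore the stock scikit-learn implementations.
        unpatch_sklearn()


if __name__ == "__main__":
    if sklearn_patching_available():
        print(run_patched_kmeans())
    else:
        print("daal4py and/or scikit-learn not installed; skipping")
```

Wrapping the work in try/finally keeps the patch from leaking into unrelated code in the same process if the estimator raises.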
Fixed the following issues:
- Scikit-learn patching:
- Improved accuracy of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
- Fixed patching issues with pairwise_distances
- Fixed the behavior of the patch_sklearn and unpatch_sklearn functions
- Fixed unexpected behavior that made accelerated functionality unavailable through scikit-learn patching if the input was not of float32 or float64 data types. Scikit-learn patching now works with all numpy data types.
- Fixed a memory leak that appeared when DataFrame from pandas was used as an input type
- Fixed performance issue for interoperability with Modin
- daal4py:
- Fixed the crash of SVM and kNN algorithms on Windows on GPU
- oneDAL:
- Improved accuracy of Decision Forest Classification and Regression on CPU
- Improved accuracy of KMeans algorithm on GPU
- Improved stability of Linear Regression and Logistic Regression algorithms on GPU
Known Issues
- oneDAL vars.sh script does not support KornShell