Intel Scalable Dataframe Compiler (Intel® SDC) automatically scales analytics/ML code written in Python to bare-metal cluster/cloud performance. It compiles a subset of Python (Pandas/NumPy) to efficient parallel binaries with MPI, requiring only minimal code changes. Intel SDC can be orders of magnitude faster than alternatives like Apache Spark.
Intel SDC's documentation can be found here.
conda install -c intel -c intel/label/test sdc
Here is a Pi calculation example in Intel SDC:
import sdc
import numpy as np
import time

@sdc.jit
def calc_pi(n):
    t1 = time.time()
    x = 2 * np.random.ranf(n) - 1
    y = 2 * np.random.ranf(n) - 1
    pi = 4 * np.sum(x**2 + y**2 < 1) / n
    print("Execution time:", time.time()-t1, "\nresult:", pi)
    return pi

calc_pi(2 * 10**8)
Save this in a file named pi.py and run (on 8 cores):
mpiexec -n 8 python pi.py
This should demonstrate roughly a 100x speedup over the regular Python version run without @sdc.jit and mpiexec.
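For comparison, the baseline can be sketched as the same Monte Carlo estimate in plain NumPy, run as a single process and without the @sdc.jit decorator. This is a minimal sketch and not part of Intel SDC: the function name calc_pi_baseline, the seed parameter, and the use of np.random.default_rng in place of np.random.ranf are assumptions made here so that a run is reproducible, and the sample count is much smaller than in the example above.

```python
import time

import numpy as np


def calc_pi_baseline(n, seed=0):
    """Estimate pi from n points sampled uniformly in the square [-1, 1) x [-1, 1)."""
    # Seeded generator: an assumption of this sketch, used for reproducibility.
    rng = np.random.default_rng(seed)
    x = 2 * rng.random(n) - 1
    y = 2 * rng.random(n) - 1
    # The fraction of points that fall inside the unit circle approximates pi / 4.
    return 4 * np.sum(x**2 + y**2 < 1) / n


if __name__ == "__main__":
    t1 = time.time()
    pi = calc_pi_baseline(10**6)
    print("Execution time:", time.time() - t1, "\nresult:", pi)
```

Timing this script with plain python gives a rough single-process reference point; the actual speedup depends on the machine, the sample count, and the number of MPI ranks.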
These academic papers describe the underlying methods in Intel SDC:
We use the Anaconda distribution of Python to set up the Intel SDC build environment.
If you do not have conda, we recommend using Miniconda3:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
chmod +x miniconda.sh
./miniconda.sh -b
export PATH=$HOME/miniconda3/bin:$PATH
It is possible to build Intel SDC via conda-build or setuptools. Follow one of the cases below to install Intel SDC and its dependencies on Linux.
PYVER=<3.6 or 3.7>
conda create -n CBLD python=$PYVER conda-build
source activate CBLD
git clone https://github.com/IntelPython/sdc
cd sdc
# build Intel SDC
conda build --python $PYVER --override-channels -c numba -c conda-forge -c defaults buildscripts/sdc-conda-recipe
PYVER=<3.6 or 3.7>
conda create -n SDC -q -y -c numba -c conda-forge -c defaults numba mpich pyarrow=0.15.0 arrow-cpp=0.15.0 gcc_linux-64 gxx_linux-64 gfortran_linux-64 scipy pandas boost python=$PYVER
source activate SDC
git clone https://github.com/IntelPython/sdc
cd sdc
# build SDC
python setup.py install
In case of issues, reinstalling in a new conda environment is recommended.
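Before recreating an environment, it can help to see which packages are actually visible to the active interpreter. The following is a minimal sketch using only the standard library; the helper name check_modules and the REQUIRED_MODULES list are assumptions of this example, with module names taken from the conda create command above (numpy is included because it is imported by the example code). The check only locates modules and does not import them, so it will not detect a broken binary extension.

```python
import importlib.util

# Top-level module names based on the packages in the conda create command;
# "sdc" is the package installed by the build itself.
REQUIRED_MODULES = ["numba", "numpy", "pandas", "scipy", "pyarrow", "sdc"]


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:10s} {'ok' if found else 'MISSING'}")
```

Run the script with the SDC environment activated; any module reported as MISSING points at a dependency that was not installed into that environment.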
Building Intel® SDC on Windows requires Build Tools for Visual Studio 2019 (with component MSVC v140 - VS 2015 C++ build tools (v14.00)):
- Install Build Tools for Visual Studio 2019 (with component MSVC v140 - VS 2015 C++ build tools (v14.00)).
- Install Miniconda for Windows.
- Start 'Anaconda prompt'
It is possible to build Intel SDC via conda-build or setuptools. Follow one of the cases below to install Intel SDC and its dependencies on Windows.
set PYVER=<3.6 or 3.7>
conda create -n CBLD -q -y python=%PYVER% conda-build conda-verify vc vs2015_runtime vs2015_win-64
conda activate CBLD
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python %PYVER% --override-channels -c numba -c defaults -c intel buildscripts\sdc-conda-recipe
conda create -n SDC -c numba -c defaults -c intel -c conda-forge python=<3.6 or 3.7> numba impi-devel pyarrow=0.15.0 arrow-cpp=0.15.0 scipy pandas boost
conda activate SDC
git clone https://github.com/IntelPython/sdc.git
cd sdc
set INCLUDE=%INCLUDE%;%CONDA_PREFIX%\Library\include
set LIB=%LIB%;%CONDA_PREFIX%\Library\lib
%CONDA_PREFIX%\Library\bin\mpivars.bat quiet
python setup.py install
- If the cl compiler throws the error fatal error LNK1158: cannot run 'rc.exe', add Windows Kits to your PATH (e.g. C:\Program Files (x86)\Windows Kits\8.0\bin\x86).
- Some errors can be mitigated by set DISTUTILS_USE_SDK=1.
- To set up Visual Studio, you might need to go to the registry at HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\VisualStudio\SxS\VS7 and add a string value named 14.0 whose data is C:\Program Files (x86)\Microsoft Visual Studio 14.0\.
- If the conda or Visual Studio version in use is not the latest, building Intel SDC can fail with a vague error about a keyword used in a file, so make sure you are using the latest versions.
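The PATH-related issues in the notes above can be checked from the same prompt before starting a build. The following is a minimal sketch using only the standard library; the helper name check_build_tools is an assumption of this example, and the tool names cl and rc come from the notes above. On Windows, shutil.which applies the PATHEXT extensions, so "rc" resolves to rc.exe when it is on PATH.

```python
import os
import shutil


def check_build_tools(tools=("cl", "rc")):
    """Return a dict mapping each tool name to its full path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in check_build_tools().items():
        print(f"{tool:5s} {path or 'not found on PATH'}")
    # DISTUTILS_USE_SDK is the variable named in the notes above.
    print("DISTUTILS_USE_SDK =", os.environ.get("DISTUTILS_USE_SDK", "<not set>"))
```

If rc is reported as not found, adding the Windows Kits bin directory to PATH as described above is the first thing to try.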
conda install h5py
python sdc/tests/gen_test_data.py
python -m unittest