# Requirements

---
- 🧩 Intel oneAPI Base Toolkit:

 > ➡️ [Download page](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html)

---
- 🧩 Python Packages:

 > ⚡`dpctl` — Data Parallel Control : `pip install dpctl`

 > ⚡`dpnp` — Data Parallel NumPy : `pip install dpnp`

 > ⚡`numba-dpex` — Numba Data Parallel Extension : `pip install -i https://pypi.anaconda.org/dppy/label/dev/simple numba-dpex`

 > ⚡`scikit-learn-intelex` — Intel Extension for scikit-learn : `pip install scikit-learn-intelex`

---


### 🧩 `dpctl` — Data Parallel Control

🔧 A Python interface to SYCL (C++ parallel programming model).  
🖥️ Enables control over devices (CPU, GPU) and execution queues.  
📦 Useful for managing low-level data-parallel operations.

➡️ [PyPI link](https://pypi.org/project/dpctl/)  
➡️ [GitHub](https://github.com/IntelPython/dpctl)
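
The device and queue handling described above can be sketched as follows. This is a minimal illustration, not code from the original notebook: `describe_default_device` is a hypothetical helper name, and it assumes that returning `None` is acceptable when `dpctl` is missing or no SYCL device is available.

```python
def describe_default_device():
    """Return a dict describing the default SYCL device, or None if unavailable."""
    try:
        import dpctl
    except ImportError:
        return None  # dpctl is not installed (or its native libraries failed to load)

    try:
        device = dpctl.select_default_device()
    except Exception:
        return None  # dpctl is installed but no usable SYCL device was found

    queue = dpctl.SyclQueue(device)  # execution queue bound to that device
    return {
        "name": device.name,
        "backend": str(device.backend),
        "is_gpu": device.is_gpu,
        "queue_device": queue.sycl_device.name,
    }


if __name__ == "__main__":
    print(describe_default_device())
```

Wrapping the import keeps the check usable on machines where the oneAPI runtime is not set up yet.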

---

### 🧮 `dpnp` — Data Parallel NumPy

📊 A NumPy-compatible library for accelerated array operations.  
⚡ Near drop-in replacement for a subset of the NumPy API, built on SYCL.  
🎯 Run NumPy-style code on Intel GPUs and CPUs with few changes beyond the import.

➡️ [PyPI link](https://pypi.org/project/dpnp/)  
➡️ [GitHub](https://github.com/IntelPython/dpnp)
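
To illustrate the NumPy-compatible interface, here is a small sketch that runs the same function with either library. It is not from the original notebook: `load_array_module`, `scaled_sum`, and `to_list` are hypothetical helper names, and the sketch assumes that falling back to NumPy is acceptable when `dpnp` cannot be imported.

```python
import importlib


def load_array_module():
    """Return dpnp if it imports, otherwise NumPy, otherwise None."""
    for name in ("dpnp", "numpy"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def scaled_sum(xp, n=5):
    """Compute 2*x + 1 for x = 0..n-1 with whichever array module is passed in."""
    x = xp.arange(n, dtype=xp.float32)
    y = xp.ones(n, dtype=xp.float32)
    return 2 * x + y


def to_list(xp, arr):
    """Copy the result to host memory and convert it to a plain Python list."""
    if hasattr(xp, "asnumpy"):  # dpnp: explicit device-to-host copy
        arr = xp.asnumpy(arr)
    return arr.tolist()


if __name__ == "__main__":
    xp = load_array_module()
    if xp is not None:
        print(xp.__name__, to_list(xp, scaled_sum(xp)))
```

Only the module passed as `xp` changes between the two paths; the array code itself is identical.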

---

### ⚙️ `numba-dpex` — Numba Data Parallel Extension

🚀 Enables JIT-compiled Python kernels for SYCL devices (like Intel GPUs).  
🧠 Extends Numba's syntax to support `@kernel` and `@dpjit` decorators.  
💡 Write CUDA-style data-parallel kernels in Python.

➡️ [PyPI link](https://pypi.org/project/numba-dpex/)  
➡️ [GitHub](https://github.com/IntelPython/numba-dpex)

---

### 🤖 `scikit-learn-intelex` — Intel Extension for scikit-learn

📈 Accelerates `scikit-learn` estimators using Intel oneAPI Data Analytics Library (oneDAL).  
🧪 Can shorten training and prediction time on Intel hardware for the estimators it supports.  
✅ Plug-and-play: just import and patch!

```python
from sklearnex import patch_sklearn

# Patch before importing scikit-learn estimators so the accelerated versions are picked up
patch_sklearn()
```

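Because the patch only helps when `scikit-learn-intelex` is present, a guarded variant can be useful. The following is a minimal sketch, not part of the original notebook: `enable_sklearnex` is a hypothetical helper name, and it assumes that falling back to stock scikit-learn is acceptable when the extension is missing.

```python
def enable_sklearnex():
    """Apply the Intel patch if scikit-learn-intelex is importable.

    Returns True when the patch was applied, False otherwise.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False  # extension not installed: stock scikit-learn stays in use
    patch_sklearn()
    return True


if __name__ == "__main__":
    patched = enable_sklearnex()
    print("sklearnex patch applied:", patched)
    # Import estimators only after patching, for example:
    # from sklearn.cluster import KMeans
```
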
# Check if the GPU is connected

```python
import importlib
import importlib.util  # find_spec lives in the importlib.util submodule

packages = {
    "dpctl": "dpctl",
    "dpnp": "dpnp",
    "numba-dpex": "numba_dpex",
    "scikit-learn": "sklearn"
}

for name, module in packages.items():
    # find_spec locates the package without importing it
    spec = importlib.util.find_spec(module)
    if spec is not None:
        try:
            mod = importlib.import_module(module)
            version = getattr(mod, "__version__", "version not found")
            print(f"✅ {name} is installed — version: {version}")
        except Exception as e:
            print(f"⚠️ {name} is installed but failed to get version — {e}")
    else:
        print(f"❌ {name} is NOT installed")
```

Output:

```text
✅ dpctl is installed — version: 0.19.0
✅ dpnp is installed — version: 0.17.0
⚠️ numba-dpex is installed but failed to get version — DLL load failed while importing _dpexrt_python: The specified module could not be found.
✅ scikit-learn is installed — version: 1.6.1
```

```python
import dpctl

# List every SYCL device that dpctl can see, one entry per backend
for device in dpctl.get_devices():
    print("🖥️ Device:", device.name)
    print("🔧 Backend:", device.backend)
    print("🚀 Is GPU:", device.is_gpu)
    print("===")
```

Output:

```text
🖥️ Device: Intel(R) Arc(TM) B580 Graphics
🔧 Backend: backend_type.level_zero
🚀 Is GPU: True
===
🖥️ Device: Intel(R) Arc(TM) B580 Graphics
🔧 Backend: backend_type.opencl
🚀 Is GPU: True
===
🖥️ Device: AMD Ryzen 7 5700G with Radeon Graphics
🔧 Backend: backend_type.opencl
🚀 Is GPU: False
===
```


In [3]:
try:
    import dpnp
    print("✅ dpnp imported successfully.")

    a = dpnp.arange(10)
    b = dpnp.ones(10)
    c = a + b

    print("a (dpnp):", a)
    print("b (dpnp):", b)
    print("a + b:", c)

    print("✅ dpnp computation ran without error.")
except Exception as e:
    print("❌ dpnp is NOT working properly:")
    print(e)



✅ dpnp imported successfully.
a (dpnp): [0 1 2 3 4 5 6 7 8 9]
b (dpnp): [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
a + b: [ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
✅ dpnp computation ran without error.


# CPU vs GPU Test

```python
import numpy as np
import dpnp
import dpctl
import time

# CPU (numpy)
size = 100_000_000
a_cpu = np.ones(size, dtype=np.float32)
b_cpu = np.ones(size, dtype=np.float32)

start_cpu = time.time()
c_cpu = a_cpu + b_cpu
end_cpu = time.time()
print(f"🧠 NumPy (CPU) Time: {end_cpu - start_cpu:.3f} sec")

# GPU (dpnp + Intel GPU)
device = dpctl.SyclDevice("level_zero:gpu")
queue = dpctl.SyclQueue(device)

# Arrays are allocated on the GPU through the queue
a_gpu = dpnp.ones(size, dtype=dpnp.float32, sycl_queue=queue)
b_gpu = dpnp.ones(size, dtype=dpnp.float32, sycl_queue=queue)

start_gpu = time.time()
c_gpu = a_gpu + b_gpu
# Note: dpnp may return before the GPU has finished; calling queue.wait() here
# would make the timing include the full computation.
end_gpu = time.time()
print(f"🚀 dpnp (GPU) Time: {end_gpu - start_gpu:.3f} sec")
```

Output:

```text
🧠 NumPy (CPU) Time: 0.120 sec
🚀 dpnp (GPU) Time: 0.023 sec
```
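
A single timed call like the one in this cell can be skewed by one-time kernel compilation and by asynchronous execution. The following is a minimal sketch of a timing helper, not part of the original notebook: `time_call` and its `sync`, `warmup`, and `repeats` parameters are hypothetical names, and it assumes that passing `queue.wait` (a `dpctl.SyclQueue` method) as `sync` blocks until the submitted GPU work has finished.

```python
import time


def time_call(fn, sync=None, warmup=1, repeats=5):
    """Return the best wall-clock time in seconds of fn() over `repeats` runs.

    sync, if given, is called after fn() and before the clock is read, so
    that asynchronous device work is included in the measurement.
    """
    def run_once():
        fn()
        if sync is not None:
            sync()

    # Warm-up runs absorb one-time setup such as kernel compilation
    for _ in range(warmup):
        run_once()

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs without a GPU
    elapsed = time_call(lambda: sum(range(100_000)))
    print(f"best of 5: {elapsed:.6f} sec")
    # With the arrays and queue from the cell above (assumes a GPU is available):
    # elapsed = time_call(lambda: a_gpu + b_gpu, sync=queue.wait)
```

Taking the best of several runs is one common convention; a mean or median would work just as well, as long as the same rule is applied to the CPU and GPU measurements.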


In [5]:
import numpy as np
import dpnp
import dpctl
import time

# Settings
n = 4096        # Matrix size (4096x4096)
repeats = 50    # Number of times to multiply

print(f"🔁 Repeating {repeats} matrix multiplications of {n}x{n}")

# 🧠 CPU (NumPy)
A_cpu = np.random.rand(n, n).astype(np.float32)
B_cpu = np.random.rand(n, n).astype(np.float32)

start_cpu = time.time()
for _ in range(repeats):
    C_cpu = A_cpu @ B_cpu
end_cpu = time.time()
print(f"🧠 NumPy (CPU) Time: {end_cpu - start_cpu:.2f} sec")

# 🚀 GPU (dpnp + Intel Arc via Level-Zero)
device = dpctl.SyclDevice("level_zero:gpu")
queue = dpctl.SyclQueue(device)

A_gpu = dpnp.array(A_cpu, sycl_queue=queue)
B_gpu = dpnp.array(B_cpu, sycl_queue=queue)

start_gpu = time.time()
for _ in range(repeats):
    C_gpu = A_gpu @ B_gpu
end_gpu = time.time()
print(f"🚀 dpnp (GPU) Time: {end_gpu - start_gpu:.2f} sec")


🔁 Repeating 50 matrix multiplications of 4096x4096
🧠 NumPy (CPU) Time: 10.56 sec
🚀 dpnp (GPU) Time: 0.55 sec


```python
import numpy as np
import dpnp
import dpctl
import time
from tqdm import tqdm

# Settings
n = 4096         # Matrix size (4096x4096)
repeats = 500    # Number of times to multiply

print(f"Repeating {repeats} matrix multiplications of {n}x{n}")

# CPU (NumPy)
A_cpu = np.random.rand(n, n).astype(np.float32)
B_cpu = np.random.rand(n, n).astype(np.float32)

start_cpu = time.time()
for _ in tqdm(range(repeats), desc="NumPy CPU"):
    C_cpu = A_cpu @ B_cpu
end_cpu = time.time()
print(f"NumPy (CPU) Time: {end_cpu - start_cpu:.2f} sec")

# GPU (dpnp + Intel Arc via Level-Zero)
device = dpctl.SyclDevice("level_zero:gpu")
queue = dpctl.SyclQueue(device)

# Copy the host matrices to the GPU before timing starts
A_gpu = dpnp.array(A_cpu, sycl_queue=queue)
B_gpu = dpnp.array(B_cpu, sycl_queue=queue)

start_gpu = time.time()
for _ in tqdm(range(repeats), desc="dpnp GPU"):
    C_gpu = A_gpu @ B_gpu
# Note: dpnp may return before the GPU has finished; calling queue.wait() here
# would make the timing include the full computation.
end_gpu = time.time()

print(f"dpnp (GPU) Time: {end_gpu - start_gpu:.2f} sec")
```

Output:

```text
Repeating 500 matrix multiplications of 4096x4096
NumPy CPU: 100%|██████████| 500/500 [01:49<00:00,  4.57it/s]
NumPy (CPU) Time: 109.42 sec
dpnp GPU: 100%|██████████| 500/500 [00:04<00:00, 103.21it/s]
dpnp (GPU) Time: 4.85 sec
```
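
To put the recorded timings side by side, the speedup of each run is the CPU time divided by the GPU time. The short sketch below only restates the numbers printed in the runs above; `speedup` is a hypothetical helper name, and the ratios should be read as indicative, since the GPU timings were taken without an explicit synchronization step.

```python
# (label, cpu_seconds, gpu_seconds) as recorded in the runs above
timings = [
    ("vector add, 100M float32", 0.120, 0.023),
    ("50 x matmul 4096x4096", 10.56, 0.55),
    ("500 x matmul 4096x4096", 109.42, 4.85),
]


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU time to GPU time; values above 1 mean the GPU run was faster."""
    return cpu_seconds / gpu_seconds


for label, cpu, gpu in timings:
    print(f"{label}: {speedup(cpu, gpu):.1f}x")
```
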



