# Performance Optimization in NumPy

Optimizing performance in NumPy is essential when working with large datasets or computationally intensive operations. This section covers techniques for efficient array processing, memory optimization, profiling, and accelerating NumPy code with Numba.

---

### 1. Efficient Array Processing Techniques

In [1]:
import numpy as np

In [2]:
# Creating a small example array
arr = np.array([1, 2, 3, 4])

# In-place addition to modify the original array
arr += 10
print("In-place addition result:", arr)

In-place addition result: [11 12 13 14]


In [3]:
# Creating an array with default dtype (float64)
large_float_arr = np.ones((1000, 1000))

# Checking memory usage of default dtype
print("Memory usage of float64 array:", large_float_arr.nbytes)

# Creating an array with dtype float32 to reduce memory usage
large_float32_arr = np.ones((1000, 1000), dtype=np.float32)
print("Memory usage of float32 array:", large_float32_arr.nbytes)

Memory usage of float64 array: 8000000
Memory usage of float32 array: 4000000
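The two ideas above can be combined: ufuncs such as `np.add` accept an `out` argument, which writes the result into an existing buffer instead of allocating a new array. The following is a minimal sketch; the array names `a`, `b`, and `c` are chosen for illustration only. Keep in mind that `float32` carries roughly 7 significant decimal digits, so the memory saving comes at the cost of precision.

```python
import numpy as np

a = np.ones((1000, 1000), dtype=np.float32)
b = np.full((1000, 1000), 2.0, dtype=np.float32)

# A plain expression allocates a brand-new array for the result
c = a + b

# Passing out=a writes the result into a's existing buffer instead
np.add(a, b, out=a)

print("c shares memory with a:", np.shares_memory(a, c))
print("dtype after in-place add:", a.dtype)
print("bytes used by a:", a.nbytes)
```

The dtype stays `float32` here because both operands are `float32`; mixing in a `float64` array would make the plain expression `a + b` return a `float64` result.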


### 2. Using NumPy's Built-in Functions for Optimization
NumPy's built-in functions are implemented in C, making them significantly faster than equivalent Python loops. Whenever possible, avoid explicit Python loops and rely on vectorized operations.

In [4]:
# Summing elements: Python loop vs. NumPy's built-in sum
arr = np.arange(1e6)

# Slow method: Python loop
total = 0
for i in arr:
    total += i

# Fast method: Using NumPy's built-in sum function
total_fast = np.sum(arr)
print("Sum using NumPy's built-in function:", total_fast)

Sum using NumPy's built-in function: 499999500000.0
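The same pattern applies to element-wise logic, not only to reductions. As a rough sketch (the clipping threshold of 5 and the variable names are arbitrary choices for illustration), a loop with a conditional can usually be replaced by `np.minimum` or `np.where`:

```python
import numpy as np

values = np.arange(10, dtype=np.float64)

# Slow method: Python loop with a conditional per element
clipped_loop = np.empty_like(values)
for i, v in enumerate(values):
    clipped_loop[i] = v if v < 5 else 5.0

# Fast method: a single vectorized call
clipped_vec = np.minimum(values, 5.0)

# Equivalent formulation with an explicit condition
clipped_where = np.where(values < 5, values, 5.0)

print("Loop and vectorized results match:", np.array_equal(clipped_loop, clipped_vec))
```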


### 3. Profiling and Tuning with timeit
NumPy does not ship a profiler module of its own. Instead, you can use the standard-library timeit module to measure the execution time of NumPy operations.

In [5]:
import timeit

# Time Python's built-in sum, which iterates over the array one element at a time
loop_time = timeit.timeit('total = sum(arr)', setup='import numpy as np; arr = np.arange(1e6)', number=100)

# Time the NumPy optimized sum function
numpy_time = timeit.timeit('np.sum(arr)', setup='import numpy as np; arr = np.arange(1e6)', number=100)

print(f"Python loop time: {loop_time:.6f} seconds")
print(f"NumPy sum time: {numpy_time:.6f} seconds")

Python loop time: 6.665754 seconds
NumPy sum time: 0.040848 seconds
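A single `timeit.timeit` measurement can be skewed by whatever else the machine is doing. One common way to reduce that noise is `timeit.repeat`, taking the minimum over several runs. The sketch below assumes 5 repeats of 20 calls each, which are arbitrary values, and the resulting timing will vary by machine:

```python
import timeit
import numpy as np

arr = np.arange(1e6)

# Run the measurement 5 times, each consisting of 20 calls to np.sum
timings = timeit.repeat(lambda: np.sum(arr), repeat=5, number=20)

# The minimum is the run least disturbed by other system activity
best_per_call = min(timings) / 20
print(f"Best time per np.sum call: {best_per_call:.6f} seconds")
```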


### 4. Compiling NumPy Code with Numba
Numba is a just-in-time (JIT) compiler that can significantly accelerate numerical Python code by compiling it to machine code. It works well for loops and computations over NumPy arrays, often approaching the performance of C or Fortran.

In [6]:
from numba import jit
import numpy as np

In [7]:

# Creating a function to perform element-wise multiplication
@jit(nopython=True)  # The nopython mode gives the best performance
def multiply_arrays(a, b):
    result = np.empty_like(a)
    for i in range(len(a)):
        result[i] = a[i] * b[i]
    return result

# Arrays to multiply
arr1 = np.arange(1e6)
arr2 = np.arange(1e6)

# First call: Numba compiles the function, so this call includes compilation time
result = multiply_arrays(arr1, arr2)

# Second call: reuses the already compiled machine code
result_numba = multiply_arrays(arr1, arr2)
print("Result after Numba acceleration:", result_numba[:5])


Result after Numba acceleration: [ 0.  1.  4.  9. 16.]


#### Comparing NumPy and Numba Performance

In [8]:
import timeit

# Custom function with a loop and more logic
@jit(nopython=True)
def custom_numba_function(arr):
    result = np.zeros_like(arr)
    for i in range(len(arr)):
        if arr[i] % 2 == 0:
            result[i] = arr[i] ** 2
        else:
            result[i] = arr[i] ** 3
    return result

# Normal NumPy approach (using vectorized operations where possible)
def custom_numpy_function(arr):
    result = np.zeros_like(arr)
    even_mask = arr % 2 == 0
    result[even_mask] = arr[even_mask] ** 2
    result[~even_mask] = arr[~even_mask] ** 3
    return result

# Test data
arr = np.arange(1e6)

# Timing NumPy version
numpy_time = timeit.timeit('custom_numpy_function(arr)', globals=globals(), number=10)
print(f"Time taken by NumPy: {numpy_time:.6f} seconds")

# Timing Numba-accelerated version (the first of the 10 calls also includes JIT compilation time)
numba_time = timeit.timeit('custom_numba_function(arr)', globals=globals(), number=10)
print(f"Time taken by Numba: {numba_time:.6f} seconds")

# Ensure results are the same
result_numpy = custom_numpy_function(arr)
result_numba = custom_numba_function(arr)
print("Results match:", np.allclose(result_numpy, result_numba))


Time taken by NumPy: 0.575021 seconds
Time taken by Numba: 0.393062 seconds
Results match: True
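Because a Numba function is compiled the first time it is called, the Numba timing above includes one compilation. A fairer steady-state comparison calls the function once as a warm-up before timing it. The following is a minimal sketch under a few assumptions: `sum_of_squares` is a hypothetical example function, the array size is arbitrary, and the `ImportError` fallback exists only so the snippet still runs (without any speedup) where Numba is not installed.

```python
import timeit
import numpy as np

try:
    from numba import njit  # shorthand for jit(nopython=True)
except ImportError:
    # Fallback for illustration only: run as plain Python if Numba is missing
    def njit(func):
        return func

@njit
def sum_of_squares(values):
    total = 0.0
    for i in range(len(values)):
        total += values[i] * values[i]
    return total

data = np.arange(1000, dtype=np.float64)

# Warm-up call: triggers compilation so it is excluded from the timing below
warmup_result = sum_of_squares(data)

# Steady-state timing: every timed call reuses the compiled code
steady_time = timeit.timeit(lambda: sum_of_squares(data), number=10)
print(f"Steady-state time for 10 calls: {steady_time:.6f} seconds")
```

Whether Numba beats a vectorized NumPy expression depends on the workload and hardware, so it is worth measuring both on your own data.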
