# NumPy Performance Optimization

**Author:** RSK World  
**Website:** https://rskworld.in  
**Email:** help@rskworld.in  
**Phone:** +91 93305 39277

This notebook covers techniques to optimize NumPy code for better performance, including memory management, choosing the right data types, and avoiding common performance pitfalls.


In [None]:
import numpy as np
import time
import sys


## 1. Choosing the Right Data Type

Selecting appropriate data types can significantly impact memory usage and performance.


In [None]:
# Memory usage comparison for different data types
size = 1000000

# Float64 (default, 8 bytes per element)
arr_float64 = np.random.rand(size).astype(np.float64)
print(f"Float64 array:")
print(f"  Size: {arr_float64.nbytes / 1024 / 1024:.2f} MB")
print(f"  Itemsize: {arr_float64.itemsize} bytes")

# Float32 (4 bytes per element, half the memory)
arr_float32 = np.random.rand(size).astype(np.float32)
print(f"\nFloat32 array:")
print(f"  Size: {arr_float32.nbytes / 1024 / 1024:.2f} MB")
print(f"  Itemsize: {arr_float32.itemsize} bytes")
print(f"  Memory saved: {(arr_float64.nbytes - arr_float32.nbytes) / 1024 / 1024:.2f} MB")

# Int32 vs Int64
arr_int64 = np.random.randint(0, 100, size, dtype=np.int64)
arr_int32 = np.random.randint(0, 100, size, dtype=np.int32)
print(f"\nInt64 array: {arr_int64.nbytes / 1024 / 1024:.2f} MB")
print(f"Int32 array: {arr_int32.nbytes / 1024 / 1024:.2f} MB")
print(f"Memory saved: {(arr_int64.nbytes - arr_int32.nbytes) / 1024 / 1024:.2f} MB")
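
The same reasoning applies to narrower integer types when the value range is known. The sketch below assumes values between 0 and 99, as in the cell above, and checks the range with `np.iinfo` before downcasting. The variable names (`values`, `small`) are illustrative only.

```python
import numpy as np

size = 1_000_000
values = np.random.randint(0, 100, size, dtype=np.int64)

# Check that the data fits into the target type before converting
info = np.iinfo(np.uint8)
fits = values.min() >= info.min and values.max() <= info.max
print(f"uint8 range: {info.min}..{info.max}, data fits: {fits}")

# Downcast only if no value would wrap around
small = values.astype(np.uint8) if fits else values
print(f"int64: {values.nbytes / 1024 / 1024:.2f} MB")
print(f"uint8: {small.nbytes / 1024 / 1024:.2f} MB")
```

Checking the range first matters because `astype` does not raise on overflow for integer-to-integer casts; out-of-range values wrap silently.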


In [None]:
# Performance comparison: Float64 vs Float32
size = 5000000
arr64 = np.random.rand(size).astype(np.float64)
arr32 = np.random.rand(size).astype(np.float32)

# Float64 operations
start = time.time()
result64 = arr64 ** 2 + arr64 * 3
time64 = time.time() - start

# Float32 operations
start = time.time()
result32 = arr32 ** 2 + arr32 * 3
time32 = time.time() - start

print(f"Array size: {size:,}")
print(f"Float64 time: {time64:.6f} seconds")
print(f"Float32 time: {time32:.6f} seconds")
print(f"Speedup: {time64/time32:.2f}x")
print(f"\nNote: Float32 uses less memory and can be faster on some systems")
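
Whether float32 is acceptable depends on how much precision the computation needs. As a rough illustration (not a full error analysis), the sketch below compares machine epsilon for both types and shows that float32 cannot represent every integer above 2**24.

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable value
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps
print(f"float32 eps: {eps32:.3e}")
print(f"float64 eps: {eps64:.3e}")

# float32 has a 24-bit significand, so 2**24 + 1 rounds back to 2**24
big = np.float32(2**24)
lost_in_float32 = (big + np.float32(1)) == big
kept_in_float64 = (np.float64(2**24) + np.float64(1)) != np.float64(2**24)
print("2**24 + 1 is lost in float32:", lost_in_float32)
print("2**24 + 1 is kept in float64:", kept_in_float64)
```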


## 2. Memory Management: Views vs Copies

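A NumPy operation returns either a view, which shares memory with the original array, or a copy, which allocates new memory. Knowing which one you get helps avoid both wasted memory and accidental modification of shared data.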


In [None]:
# Views vs Copies
original = np.arange(12).reshape(3, 4).copy()  # .copy() so the array owns its data (reshape alone returns a view)
print("Original array:\n", original)
print("Base:", original.base is None)  # True means it owns the data

# Slicing creates a VIEW (no copy)
view = original[1:, 1:]
print("\nView (slice):\n", view)
print("Base:", view.base is None)  # False means it's a view
print("Same memory:", np.shares_memory(original, view))

# Explicit copy
copy = original.copy()
print("\nCopy:\n", copy)
print("Base:", copy.base is None)  # True means it owns the data
print("Same memory:", np.shares_memory(original, copy))
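
Sharing memory also means that writes through a view are visible in the original. A minimal sketch of that behaviour, using the same 3x4 layout as above (the name `independent` is illustrative):

```python
import numpy as np

original = np.arange(12).reshape(3, 4)
view = original[1:, 1:]          # shares memory with original
independent = original.copy()    # separate buffer

view[0, 0] = -1                  # also changes original[1, 1]
independent[0, 0] = 99           # original is unaffected

print("original[1, 1]:", original[1, 1])
print("original[0, 0]:", original[0, 0])
```

Use `.copy()` when you need to modify the result without touching the source array.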


In [None]:
# Performance: View vs Copy
size = 10000000
arr = np.random.rand(size)

# Using view (fast, no memory copy)
start = time.time()
view = arr[::2]  # Every other element
view_sum = view.sum()
view_time = time.time() - start

# Using copy (slower, memory copy)
start = time.time()
copy = arr[::2].copy()
copy_sum = copy.sum()
copy_time = time.time() - start

print(f"Array size: {size:,}")
print(f"View operation time: {view_time:.6f} seconds")
print(f"Copy operation time: {copy_time:.6f} seconds")
print(f"Logical size of view: {view.nbytes / 1024 / 1024:.2f} MB (no new allocation, shares arr's buffer)")
print(f"Memory allocated by copy: {copy.nbytes / 1024 / 1024:.2f} MB")
print(f"\nNote: A view skips the allocation and copy step, so prefer it when you don't need independent data")


## 3. Avoiding Unnecessary Copies

Many NumPy operations return views. Knowing which ones copy lets you avoid allocating memory you don't need.


In [None]:
# Operations that create copies vs views
arr = np.arange(12).reshape(3, 4)

# Reshape creates a VIEW (if possible)
reshaped = arr.reshape(2, 6)
print("Reshape creates view:", np.shares_memory(arr, reshaped))

# Transpose creates a VIEW
transposed = arr.T
print("Transpose creates view:", np.shares_memory(arr, transposed))

# Ravel creates a VIEW (if possible)
raveled = arr.ravel()
print("Ravel creates view:", np.shares_memory(arr, raveled))

# Flatten creates a COPY
flattened = arr.flatten()
print("Flatten creates copy:", not np.shares_memory(arr, flattened))

# Slicing creates a VIEW
sliced = arr[1:, 1:]
print("Slicing creates view:", np.shares_memory(arr, sliced))
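
For contrast, here are a few operations that always or sometimes copy. This is a short sketch rather than an exhaustive list: integer-array indexing and boolean masks return copies, and `reshape` falls back to a copy when the requested shape cannot be expressed with strides over the existing buffer.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

fancy = arr[[0, 2]]           # integer-array ("fancy") indexing
masked = arr[arr > 5]         # boolean-mask indexing
flat_t = arr.T.reshape(-1)    # reshape of a non-contiguous array

fancy_shares = np.shares_memory(arr, fancy)
masked_shares = np.shares_memory(arr, masked)
flat_t_shares = np.shares_memory(arr, flat_t)

print("Fancy indexing shares memory:", fancy_shares)
print("Boolean mask shares memory:", masked_shares)
print("Reshape of transpose shares memory:", flat_t_shares)
```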


In [None]:
# In-place operations (modify array without creating new one)
arr = np.array([1, 2, 3, 4, 5])
print("Original:", arr)

# In-place addition (faster, less memory)
arr += 10
print("After += 10:", arr)

# Regular operation creates new array
arr = arr + 10  # Creates new array
print("After + 10:", arr)

# In-place multiplication
arr *= 2
print("After *= 2:", arr)
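
Two related details are worth a sketch. Ufuncs accept an `out=` argument that writes the result into an existing buffer, and in-place operators refuse to change an array's dtype. The buffer name `buf` is illustrative.

```python
import numpy as np

a = np.arange(5, dtype=np.float64)
b = np.ones(5)
buf = np.empty_like(a)

np.add(a, b, out=buf)          # writes a + b into buf without a temporary array
np.multiply(buf, 2, out=buf)   # doubles buf in place
print("Result in buf:", buf)

# In-place operators keep the dtype, so a float result cannot be stored in an int array
ints = np.array([1, 2, 3])
raised = False
try:
    ints += 0.5
except TypeError as exc:
    raised = True
    print("In-place cast refused:", type(exc).__name__)
```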


## 4. Vectorization Over Loops

Prefer vectorized operations over Python loops: the per-element work then runs in compiled code instead of the interpreter.


In [None]:
# Performance comparison: Loop vs Vectorized
size = 1000000
arr = np.random.rand(size)

# Python loop (SLOW)
start = time.time()
result_loop = np.zeros(size)
for i in range(size):
    result_loop[i] = arr[i] ** 2 + arr[i] * 3
loop_time = time.time() - start

# Vectorized (FAST)
start = time.time()
result_vectorized = arr ** 2 + arr * 3
vectorized_time = time.time() - start

print(f"Array size: {size:,}")
print(f"Loop time: {loop_time:.6f} seconds")
print(f"Vectorized time: {vectorized_time:.6f} seconds")
print(f"Speedup: {loop_time/vectorized_time:.1f}x faster")
print(f"\nResults match: {np.allclose(result_loop, result_vectorized)}")


In [None]:
# Vectorized operations on 2D arrays
matrix = np.random.rand(1000, 1000)

# Vectorized row-wise operations
start = time.time()
row_sums = matrix.sum(axis=1)
row_means = matrix.mean(axis=1)
vectorized_time = time.time() - start

# Loop approach (much slower)
start = time.time()
row_sums_loop = np.zeros(1000)
row_means_loop = np.zeros(1000)
for i in range(1000):
    row_sums_loop[i] = matrix[i, :].sum()
    row_means_loop[i] = matrix[i, :].mean()
loop_time = time.time() - start

print(f"Matrix size: 1000x1000")
print(f"Vectorized time: {vectorized_time:.6f} seconds")
print(f"Loop time: {loop_time:.6f} seconds")
print(f"Speedup: {loop_time/vectorized_time:.1f}x faster")
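
Row-wise results can be combined with the full matrix again through broadcasting, which avoids a second loop. The sketch below centers each row around zero; `keepdims=True` keeps the reduced axis so the shapes line up.

```python
import numpy as np

rng = np.random.default_rng(0)
matrix = rng.random((1000, 1000))

row_means = matrix.mean(axis=1, keepdims=True)   # shape (1000, 1)
centered = matrix - row_means                    # (1000, 1) broadcasts across all columns

print("row_means shape:", row_means.shape)
print("Largest remaining row mean:", np.abs(centered.mean(axis=1)).max())
```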


## 5. Using NumPy's Built-in Functions

NumPy's built-in functions run in compiled code and are typically much faster than equivalent pure-Python implementations.


In [None]:
# Using NumPy's optimized functions
arr = np.random.rand(1000000)

# NumPy's sum (optimized C code)
start = time.time()
np_sum = np.sum(arr)
np_time = time.time() - start

# Python's built-in sum (slower for arrays)
start = time.time()
py_sum = sum(arr)
py_time = time.time() - start

print(f"Array size: {len(arr):,}")
print(f"np.sum() time: {np_time:.6f} seconds")
print(f"Python sum() time: {py_time:.6f} seconds")
print(f"Speedup: {py_time/np_time:.1f}x faster")
print(f"\nResults: np.sum={np_sum:.2f}, sum()={py_sum:.2f}")


In [None]:
# NumPy's universal functions (ufuncs) are optimized
arr = np.random.rand(1000000)

# NumPy ufuncs
start = time.time()
result1 = np.sqrt(arr)
result2 = np.exp(arr[:1000])  # Same 1,000-element slice as the manual version below (values in [0, 1) cannot overflow)
result3 = np.sin(arr)
ufunc_time = time.time() - start

# Manual implementation (much slower)
start = time.time()
result1_manual = np.array([np.sqrt(x) for x in arr])
result2_manual = np.array([np.exp(x) for x in arr[:1000]])
result3_manual = np.array([np.sin(x) for x in arr])
manual_time = time.time() - start

print(f"Array size: {len(arr):,}")
print(f"NumPy ufuncs time: {ufunc_time:.6f} seconds")
print(f"Manual loop time: {manual_time:.6f} seconds")
print(f"Speedup: {manual_time/ufunc_time:.1f}x faster")


## 6. Memory Layout and Contiguity

Understanding memory layout can help optimize array operations.


In [None]:
# Contiguous arrays (faster operations)
arr = np.arange(12).reshape(3, 4)
print("Original array:\n", arr)
print("C-contiguous:", arr.flags['C_CONTIGUOUS'])
print("F-contiguous:", arr.flags['F_CONTIGUOUS'])

# Transpose is not C-contiguous
transposed = arr.T
print("\nTransposed array:\n", transposed)
print("C-contiguous:", transposed.flags['C_CONTIGUOUS'])
print("F-contiguous:", transposed.flags['F_CONTIGUOUS'])

# Make it contiguous for better performance
contiguous = np.ascontiguousarray(transposed)
print("\nMade contiguous:")
print("C-contiguous:", contiguous.flags['C_CONTIGUOUS'])
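
The contiguity flags follow from the array's strides, which give the number of bytes to step along each axis. A small sketch, with the dtype fixed to `int64` so the numbers are the same on every platform:

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)

# Row-major: moving one row skips 4 elements * 8 bytes, one column skips 8 bytes
print("arr strides:", arr.strides)                  # (32, 8)

# The transpose reuses the same buffer with the strides swapped
print("arr.T strides:", arr.T.strides)              # (8, 32)

# ascontiguousarray copies the data into row-major order for the new (4, 3) shape
contiguous = np.ascontiguousarray(arr.T)
print("contiguous strides:", contiguous.strides)    # (24, 8)
```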


In [None]:
# Performance: Contiguous vs Non-contiguous
size = 5000
arr = np.random.rand(size, size)

# C-contiguous array
start = time.time()
result1 = arr.sum(axis=1)
time1 = time.time() - start

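# Non-contiguous (transposed view): same reduction axis, but strided memory access
arr_T = arr.T
start = time.time()
result2 = arr_T.sum(axis=1)  # Each row of arr_T is a strided column of arr
time2 = time.time() - start

print(f"Array size: {size}x{size}")
print(f"C-contiguous sum time: {time1:.6f} seconds")
print(f"Non-contiguous sum time: {time2:.6f} seconds")
print(f"Time ratio (non-contiguous / contiguous): {time2/time1:.2f}x")
print("Note: the ratio depends on NumPy version, array size, and hardware")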


## 7. Pre-allocating Arrays

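Pre-allocating arrays is faster than appending: `np.append` allocates a new array and copies every existing element on each call, so the total work grows quadratically with the final size.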


In [None]:
# Pre-allocation (FAST)
size = 100000
start = time.time()
preallocated = np.zeros(size)
for i in range(size):
    preallocated[i] = i ** 2
prealloc_time = time.time() - start

# Appending (SLOW - creates new array each time)
start = time.time()
appended = np.array([])
for i in range(size):
    appended = np.append(appended, i ** 2)
append_time = time.time() - start

print(f"Size: {size:,}")
print(f"Pre-allocation time: {prealloc_time:.6f} seconds")
print(f"Appending time: {append_time:.6f} seconds")
print(f"Speedup: {append_time/prealloc_time:.1f}x faster")
print(f"\nNote: Pre-allocation is MUCH faster!")
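
Pre-allocation needs the final size up front. When the size is not known in advance, one common alternative is to collect values in a Python list (or pass a generator to `np.fromiter`) and convert once at the end. The generator `squares_below` is a hypothetical helper made up for this sketch.

```python
import numpy as np

def squares_below(limit):
    """Yield i**2 for as long as the square stays below limit (hypothetical helper)."""
    i = 0
    while i * i < limit:
        yield i * i
        i += 1

# Option 1: append to a list (amortized constant time), convert once
collected = []
for value in squares_below(1_000_000):
    collected.append(value)
from_list = np.array(collected, dtype=np.float64)

# Option 2: let NumPy consume the iterator directly
from_iter = np.fromiter(squares_below(1_000_000), dtype=np.float64)

print("Elements collected:", len(from_list))
print("Same result:", np.array_equal(from_list, from_iter))
```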


In [None]:
# Better: Use vectorized operations instead of loops
size = 100000

# Pre-allocated with loop
start = time.time()
arr1 = np.zeros(size)
for i in range(size):
    arr1[i] = i ** 2
time1 = time.time() - start

# Fully vectorized (BEST)
start = time.time()
arr2 = np.arange(size, dtype=np.float64) ** 2  # float64 avoids integer overflow where the default int is 32-bit
time2 = time.time() - start

print(f"Size: {size:,}")
print(f"Pre-allocated loop time: {time1:.6f} seconds")
print(f"Vectorized time: {time2:.6f} seconds")
print(f"Vectorized is {time1/time2:.1f}x faster")
print(f"\nResults match: {np.array_equal(arr1, arr2)}")


## 8. Avoiding Python Loops with NumPy Functions

Use NumPy's functions that replace common loop patterns.


In [None]:
# Instead of loops, use NumPy functions
arr = np.random.rand(1000000)

# Bad: Python loop
start = time.time()
max_val = arr[0]
for val in arr:
    if val > max_val:
        max_val = val
loop_time = time.time() - start

# Good: NumPy function
start = time.time()
max_val_np = np.max(arr)
np_time = time.time() - start

print(f"Array size: {len(arr):,}")
print(f"Loop max time: {loop_time:.6f} seconds")
print(f"np.max() time: {np_time:.6f} seconds")
print(f"Speedup: {loop_time/np_time:.1f}x faster")
print(f"\nResults: loop={max_val:.6f}, np.max={max_val_np:.6f}")


In [None]:
# Using np.where instead of loops
arr = np.random.rand(1000000)

# Bad: Loop with condition
start = time.time()
result_loop = np.zeros_like(arr)
for i in range(len(arr)):
    if arr[i] > 0.5:
        result_loop[i] = arr[i] * 2
    else:
        result_loop[i] = arr[i] / 2
loop_time = time.time() - start

# Good: Vectorized with np.where
start = time.time()
result_vectorized = np.where(arr > 0.5, arr * 2, arr / 2)
vectorized_time = time.time() - start

print(f"Array size: {len(arr):,}")
print(f"Loop time: {loop_time:.6f} seconds")
print(f"Vectorized time: {vectorized_time:.6f} seconds")
print(f"Speedup: {loop_time/vectorized_time:.1f}x faster")
print(f"\nResults match: {np.allclose(result_loop, result_vectorized)}")
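
`np.where` handles a single condition. For several branches, `np.select` takes a list of conditions and a matching list of choices, and the first condition that holds wins for each element. A small sketch with made-up thresholds:

```python
import numpy as np

arr = np.array([0.1, 0.4, 0.6, 0.9])

# Conditions are checked in order; the first True one decides each element
conditions = [arr < 0.25, arr < 0.75, arr >= 0.75]
choices = [arr / 2, arr, arr * 2]
result = np.select(conditions, choices)

print("Input: ", arr)
print("Result:", result)
```

Both `np.where` and `np.select` compute every branch for the whole array before choosing, so they trade some extra arithmetic for the absence of a Python loop.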


## 9. Using NumPy's Advanced Features

Leveraging NumPy's advanced features for better performance.


In [None]:
# np.einsum spells out a contraction explicitly; for a plain matrix product, @ is usually at least as fast
A = np.random.rand(100, 50)
B = np.random.rand(50, 80)

# Standard matrix multiplication
start = time.time()
result1 = A @ B
time1 = time.time() - start

# Using einsum (same result; most useful for contractions that @ cannot express directly)
start = time.time()
result2 = np.einsum('ij,jk->ik', A, B)
time2 = time.time() - start

print(f"Matrix A: {A.shape}, Matrix B: {B.shape}")
print(f"Standard @ time: {time1:.6f} seconds")
print(f"einsum time: {time2:.6f} seconds")
print(f"Results match: {np.allclose(result1, result2)}")
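
`einsum` tends to be most useful where there is no single built-in call for the contraction. Two such cases, sketched with illustrative array names: a row-wise dot product without an intermediate product array, and a trace.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 50))
B = rng.random((100, 50))

# Row-wise dot product: sum over j for each row i
row_dots = np.einsum('ij,ij->i', A, B)
row_dots_ref = (A * B).sum(axis=1)   # same values, but builds a (100, 50) temporary

# A repeated index with no output index sums the diagonal
M = rng.random((50, 50))
trace = np.einsum('ii', M)

print("Row-wise dots match:", np.allclose(row_dots, row_dots_ref))
print("Trace matches:", np.isclose(trace, np.trace(M)))
```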


In [None]:
# Using np.fromfunction for array creation
size = 1000

# Standard approach
start = time.time()
arr1 = np.zeros((size, size))
for i in range(size):
    for j in range(size):
        arr1[i, j] = i + j
time1 = time.time() - start

# Using fromfunction (vectorized)
start = time.time()
arr2 = np.fromfunction(lambda i, j: i + j, (size, size), dtype=np.float64)
time2 = time.time() - start

print(f"Array size: {size}x{size}")
print(f"Loop time: {time1:.6f} seconds")
print(f"fromfunction time: {time2:.6f} seconds")
print(f"Speedup: {time1/time2:.1f}x faster")
print(f"\nResults match: {np.allclose(arr1, arr2)}")
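
`np.fromfunction` calls the function once with full index arrays, so the same grid can also be built directly with broadcasting. A sketch of that alternative (the names `i`, `j`, and `grid` are illustrative):

```python
import numpy as np

size = 1000

# A column vector plus a row vector broadcasts to a (size, size) grid
i = np.arange(size, dtype=np.float64)[:, np.newaxis]
j = np.arange(size, dtype=np.float64)[np.newaxis, :]
grid = i + j

reference = np.fromfunction(lambda a, b: a + b, (size, size), dtype=np.float64)
print("Shapes:", grid.shape, reference.shape)
print("Results match:", np.array_equal(grid, reference))
```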


## 10. Best Practices Summary

Key takeaways for optimizing NumPy code.


In [None]:
print("NumPy Performance Optimization Best Practices:")
print("=" * 50)
print("\n1. USE VECTORIZED OPERATIONS")
print("   ✓ arr ** 2 (vectorized)")
print("   ✗ [x**2 for x in arr] (loop)")

print("\n2. CHOOSE APPROPRIATE DATA TYPES")
print("   ✓ Use float32 if precision allows (half memory)")
print("   ✗ Defaulting to float64 without checking whether you need it")

print("\n3. AVOID UNNECESSARY COPIES")
print("   ✓ Use views when possible (slicing, reshape)")
print("   ✗ Calling .copy() when a view would do")

print("\n4. USE IN-PLACE OPERATIONS")
print("   ✓ arr += 10 (in-place)")
print("   ✗ arr = arr + 10 (creates new array)")

print("\n5. PRE-ALLOCATE ARRAYS")
print("   ✓ np.zeros(size) then fill")
print("   ✗ np.append() in loops")

print("\n6. USE NUMPY'S BUILT-IN FUNCTIONS")
print("   ✓ np.sum(), np.max(), np.mean()")
print("   ✗ Python's sum(), max(), etc.")

print("\n7. AVOID PYTHON LOOPS")
print("   ✓ Vectorized operations")
print("   ✗ for loops over array elements")

print("\n8. UNDERSTAND MEMORY LAYOUT")
print("   ✓ C-contiguous arrays for row operations")
print("   ✓ F-contiguous arrays for column operations")


## Summary

In this notebook, we learned:
- How to choose appropriate data types for memory and performance
- How views differ from copies and when to use each
- Why vectorization beats Python loops
- How memory layout and contiguity affect performance
- How to pre-allocate arrays instead of growing them
- Best practices for NumPy performance optimization

**Key Takeaways:**
- Vectorized operations are often orders of magnitude faster than element-wise Python loops
- Choosing the right data type can halve memory usage
- Views are memory-efficient; use copies only when necessary
- Pre-allocate arrays instead of appending
- Use NumPy's built-in functions instead of Python equivalents
- Understand memory layout for optimal performance

**Remember:** Profile your code to identify bottlenecks, then optimize!
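
Single `time.time()` measurements are noisy, especially for operations that take only a few milliseconds. One possible approach is the standard-library `timeit` module, which runs a callable many times. The function name `vectorized` and the repeat counts below are arbitrary choices for this sketch.

```python
import timeit
import numpy as np

arr = np.random.rand(1_000_000)

def vectorized():
    return arr ** 2 + arr * 3

# 5 runs of 10 calls each; the minimum is the least disturbed by other processes
calls_per_run = 10
timings = timeit.repeat(vectorized, number=calls_per_run, repeat=5)
best_per_call = min(timings) / calls_per_run

print(f"Best of {len(timings)} runs: {best_per_call * 1e3:.3f} ms per call")
```

In a notebook, the `%timeit` magic offers the same kind of repeated measurement with less setup.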

