# NumPy Arrays vs Python Lists: Performance Comparison

In this notebook, we compare **Python lists** and **NumPy arrays** in terms of **execution time** when performing numerical computations.

Python lists are general-purpose containers, while NumPy arrays are designed specifically for **efficient numerical operations**.  
The goal is to understand **why NumPy is faster** and **when it should be used**.


## What is the Experiment?

We generate two large collections of random integers and perform **element-wise addition**.

Mathematically, we compute:

$$
c_i = a_i + b_i \quad \text{for } i = 1, 2, \dots, N
$$

We perform this operation using:
- Python lists with an explicit Python-level loop (a list comprehension over `zip`)
- NumPy arrays with vectorized operations

Both approaches compute the **same result**, but the execution time is very different.
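
As a small illustration of the two approaches, the sketch below adds two short sequences both ways. The variable names and sample values are illustrative only and are not the experiment's data.

```python
import numpy as np

# Small illustrative inputs (hypothetical values, not the experiment's data)
a_list = [1, 2, 3, 4]
b_list = [10, 20, 30, 40]

# Python lists: the addition runs one element at a time in the interpreter
c_list = [a + b for a, b in zip(a_list, b_list)]

# NumPy arrays: a single vectorized expression adds all elements
a_arr = np.array(a_list)
b_arr = np.array(b_list)
c_arr = a_arr + b_arr

print(c_list)
print(c_arr.tolist())
```

Note that `a_list + b_list` on plain lists would concatenate them rather than add them element-wise, which is why the list version needs a loop.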


## Why Are Python Lists Slow for Numerical Computation?

Python lists:
- Store references to Python objects
- Require a **Python-level loop** for element-wise operations
- Look up each element's type and its `+` implementation at runtime, once per element

This means each addition operation is executed **one element at a time** inside the Python interpreter, which introduces significant overhead for large $N$.
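
One way to see the per-object overhead is to compare memory footprints. The sketch below assumes CPython on a 64-bit platform. Exact byte counts for the list vary by Python version, so the only figure relied on is the array's 8 bytes per `int64` element.

```python
import sys
import numpy as np

n = 1_000
values = list(range(1_000, 1_000 + n))             # n separate Python int objects
arr = np.arange(1_000, 1_000 + n, dtype=np.int64)  # n raw 64-bit integers

# A list holds pointers; each int it points to is a full Python object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array holds the raw values back to back: n * 8 bytes for int64.
array_bytes = arr.nbytes

print(f"list  : {list_bytes} bytes")
print(f"array : {array_bytes} bytes")
```

The list total counts the pointer table plus every integer object it references, which is the extra indirection the interpreter has to follow for each addition.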


## Why Are NumPy Arrays Faster?

NumPy arrays:
- Store data in **contiguous memory**
- Use **homogeneous data types**
- Execute element-wise operations in optimized, compiled **C code** (linear algebra routines may additionally call BLAS/LAPACK libraries)

When we write:

$$
\mathbf{c} = \mathbf{a} + \mathbf{b}
$$

NumPy performs the computation in **compiled code**, avoiding Python loops entirely.  
This technique is called **vectorization**.
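
The properties listed above can be inspected directly on an array. The following is a minimal sketch with illustrative values; the `int64` dtype is chosen explicitly here so that the element size is the same on every platform.

```python
import numpy as np

a = np.arange(6, dtype=np.int64)
b = np.ones(6, dtype=np.int64)

print(a.dtype)                  # one dtype shared by every element
print(a.itemsize)               # bytes per element
print(a.strides)                # byte step between consecutive elements
print(a.flags["C_CONTIGUOUS"])  # True when stored as one contiguous block

c = a + b                       # the loop over elements runs in compiled code
print(c.tolist())
```

Because every element has the same size and sits next to its neighbour, the compiled loop can step through memory at a fixed stride without inspecting individual objects.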


## Why This Matters in Machine Learning

Machine learning models frequently work with:
- Vectors and matrices
- Millions (or billions) of numbers

Efficient numerical computation is critical because:
- Training time depends heavily on speed
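- The same array-based, vectorized model is used across the ML ecosystem:
  - Scikit-learn is built directly on NumPy
  - PyTorch and TensorFlow implement their own tensor types, which mirror much of NumPy's interface and convert to and from NumPy arrays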

Understanding NumPy performance helps you write **faster and more scalable ML code**.


## What Should You Observe?

After running the code:
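- Both methods perform the same computation; given identical inputs they produce identical results (in the code below, each function draws its own random data)
- NumPy arrays run **significantly faster**; note that each measured time includes generating the random data as well as the addition
- The absolute gap in execution time widens as $N$ becomes larger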

This experiment demonstrates **why NumPy is preferred** for numerical and machine learning workloads.
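
The timings in the code cells below include the cost of generating the random data. To isolate the addition itself, one option is to build the inputs once and time only the operation with the standard-library `timeit` module. This is a sketch under assumptions: the helper name `compare_addition`, the sizes, and the repeat count are illustrative, and measured times depend on the machine.

```python
import timeit
import numpy as np


def compare_addition(n, repeats=3):
    """Time element-wise addition on identical data for lists and arrays."""
    rng = np.random.default_rng(0)
    a_arr = rng.integers(1, 101, size=n)   # upper bound is exclusive
    b_arr = rng.integers(1, 101, size=n)
    a_list = a_arr.tolist()                # same values as plain Python ints
    b_list = b_arr.tolist()

    # Sanity check: both approaches give the same result on the same input
    assert [x + y for x, y in zip(a_list, b_list)] == (a_arr + b_arr).tolist()

    # Take the best of several runs to reduce noise from other processes
    list_time = min(timeit.repeat(
        lambda: [x + y for x, y in zip(a_list, b_list)],
        number=1, repeat=repeats))
    array_time = min(timeit.repeat(
        lambda: a_arr + b_arr,
        number=1, repeat=repeats))
    return list_time, array_time


for n in (1_000, 100_000, 1_000_000):
    list_time, array_time = compare_addition(n)
    print(f"N={n:>9,}  list: {list_time:.6f} s  array: {array_time:.6f} s")
```

Sharing the same input values between both versions also makes the "same result" claim directly checkable, which the separate random draws in the main experiment do not allow.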


In [5]:
# ============================
# NumPy Arrays vs Python Lists
# Element-wise Addition
# ============================

# ----------------------------
# Step 1: Timing decorator
# ----------------------------
def time_execution(func):
    import time
    def wrapper(*args, **kwargs):
        start = time.time()

        # TODO: execute the function and store its output

        end = time.time()

        # TODO: return both the result and elapsed time
    return wrapper


# ----------------------------
# Step 2: List-based execution
# ----------------------------
@time_execution
def list_execution():
    import random

    # TODO: create list_1 with random integers (size = 10_000_000)
    # TODO: create list_2 with random integers (size = 10_000_000)

    # TODO: compute element-wise addition using zip and list comprehension

    return list_3


# ----------------------------
# Step 3: NumPy-based execution
# ----------------------------
@time_execution
def array_execution():
    import numpy as np

    # TODO: create array_1 using NumPy random integers
    # TODO: create array_2 using NumPy random integers

    # TODO: perform vectorized addition

    return array_3


# ----------------------------
# Step 4: Run and compare
# ----------------------------
# TODO: run list_execution
# TODO: run array_execution

# TODO: print execution times


In [6]:
# ============================
# NumPy Arrays vs Python Lists
# Element-wise Addition
# ============================

# ----------------------------
# Timing decorator
# ----------------------------
def time_execution(func):
    import time
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)   # Execute function
        end = time.time()
        return result, end - start       # Return result and time
    return wrapper


# ----------------------------
# List-based execution
# ----------------------------
@time_execution
def list_execution():
    import random

    # Generate two Python lists
    list_1 = [random.randint(1, 100) for _ in range(10_000_000)]
    list_2 = [random.randint(1, 100) for _ in range(10_000_000)]

    # Element-wise addition using zip
    list_3 = [a + b for a, b in zip(list_1, list_2)]

    return list_3


# ----------------------------
# NumPy-based execution
# ----------------------------
@time_execution
def array_execution():
    import numpy as np

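    # Generate NumPy arrays; the upper bound is exclusive, so 101
    # matches the inclusive random.randint(1, 100) used for the lists
    array_1 = np.random.randint(1, 101, size=10_000_000)
    array_2 = np.random.randint(1, 101, size=10_000_000)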

    # Vectorized element-wise addition
    array_3 = array_1 + array_2

    return array_3


# ----------------------------
# Run and compare
# ----------------------------
list_result, list_time = list_execution()
array_result, array_time = array_execution()

print(f"List execution time: {list_time:.4f} seconds")
print(f"Array execution time: {array_time:.4f} seconds")


List execution time: 7.0967 seconds
Array execution time: 0.1174 seconds
