<a href="https://colab.research.google.com/github/saffarizadeh/INSY5378/blob/main/NumPy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://kambizsaffari.com/Logo/College_of_Business.cmyk-hz-lg.png" width="500px"/>

# *INSY 5378 - Advanced AI*

# **NumPy Basics**

Instructor: Dr. Kambiz Saffari

---

# NumPy

`numpy` is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. If you are already familiar with MATLAB, you might find this tutorial useful to get started with Numpy.

## Arrays
A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [1]:
import numpy as np

In [2]:
a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"

<class 'numpy.ndarray'>
(3,)
1 2 3


In [3]:
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"

[5 2 3]


In [4]:
b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array
print(b)
print(b.shape)                     # Prints "(2, 3)"

[[1 2 3]
 [4 5 6]]
(2, 3)


In [5]:
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"

1 2 4


### Array attributes (shape, ndim, dtype)

Deep learning frameworks represent data as **tensors**. In NumPy, tensors are `ndarray`s.  
Before we do more operations, get comfortable inspecting an array:

- `shape`: sizes along each dimension  
- `ndim`: number of dimensions (rank)  
- `size`: total number of elements  
- `dtype`: element type (important for performance and compatibility with DL frameworks)


In [6]:
# Using the earlier array b (shape: 2x3)
print("b =")
print(b)
print("shape:", b.shape)
print("ndim:", b.ndim)
print("size:", b.size)
print("dtype:", b.dtype)

b =
[[1 2 3]
 [4 5 6]]
shape: (2, 3)
ndim: 2
size: 6
dtype: int64


### Array indexing
Numpy offers several ways to index into arrays.

Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you specify a slice for each dimension of the array (any trailing dimension you leave out is taken in full):

Create the following rank 2 array with shape (3, 4)

[[ 1  2  3  4]

 [ 5  6  7  8]

 [ 9 10 11 12]]

In [7]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


Use slicing to pull out the subarray consisting of the first 2 rows and columns 1 and 2; b is the following array of shape (2, 2):

[[2 3]

 [6 7]]

In [8]:
b = a[:2, 1:3]

A slice of an array is a view into the same data, so modifying it will modify the original array.

In [9]:
print(a[0, 1])   # Prints "2"

2


In [10]:
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]

In [11]:
print(a[0, 1])   # Prints "77"

77


You can select a single row or column of an array too:

In [12]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
print(row_r1)

[5 6 7 8]


In [13]:
col_r1 = a[:, 2]    # Rank 1 view of the third column of a
print(col_r1)

[ 3  7 11]


Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [14]:
a = np.array([[1,2], [3, 4], [5, 6]])
print(a)

[[1 2]
 [3 4]
 [5 6]]


In [15]:
bool_idx = (a > 2)   # Find the elements of a that are bigger than 2;
                     # this returns a numpy array of Booleans of the same
                     # shape as a, where each slot of bool_idx tells
                     # whether that element of a is > 2.
print(bool_idx)

[[False False]
 [ True  True]
 [ True  True]]


We use boolean array indexing to construct a rank 1 array consisting of the elements of a corresponding to the True values of bool_idx:

In [16]:
print(a[bool_idx])

[3 4 5 6]


We can do all of the above in a single concise statement:

In [17]:
print(a[a > 2])     # Prints "[3 4 5 6]"

[3 4 5 6]
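
A boolean mask can also appear on the left-hand side of an assignment, which overwrites only the selected elements. The sketch below illustrates this on a copy of the array; the name `masked` is chosen here for illustration and is not part of the original notebook.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Work on a copy so the original array stays untouched.
masked = a.copy()
masked[masked > 2] = 0   # overwrite every element greater than 2

print(masked)
print(a)                 # unchanged
```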


## Numpy Datatypes

https://numpy.org/doc/stable/reference/arrays.dtypes.html

Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:

In [18]:
x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints "int64"

x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

x = np.array([1, 250], dtype=np.int64)   # Force a particular datatype
print(x.dtype)

int64
float64
int64


To convert an existing numpy array to a different datatype:

In [19]:
x.astype(np.int8)

array([ 1, -6], dtype=int8)

In [20]:
x.astype(np.uint8)

array([  1, 250], dtype=uint8)
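
The `-6` in the `int8` result above is not random: `astype` does not check whether values fit in the target type, so 250 wraps around to 250 - 256 = -6. The following sketch makes the ranges explicit using `np.iinfo`; the variable names `info` and `wrapped` are illustrative.

```python
import numpy as np

x = np.array([1, 250], dtype=np.int64)

# int8 can only hold values in [-128, 127].
info = np.iinfo(np.int8)
print(info.min, info.max)

# astype does not check the range: 250 wraps around to 250 - 256 = -6.
wrapped = x.astype(np.int8)
print(wrapped)

# uint8 holds [0, 255], so 250 survives unchanged.
print(x.astype(np.uint8))
```

When a cast might lose information, it can help to check `x.min()` and `x.max()` against the `np.iinfo` limits before converting.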

### Why `float32` matters in deep learning

NumPy defaults to `float64` for floating-point data, but most deep learning work uses **`float32`** (or `float16`/`bfloat16`) for speed and GPU memory efficiency.  
When you move data into TensorFlow/JAX/PyTorch, you will frequently cast to `float32`.


In [21]:
x64 = np.array([1.0, 2.0, 3.0])          # default float64
x32 = x64.astype(np.float32)               # cast to float32

print("x64 dtype:", x64.dtype, "nbytes:", x64.nbytes)
print("x32 dtype:", x32.dtype, "nbytes:", x32.nbytes)

x64 dtype: float64 nbytes: 24
x32 dtype: float32 nbytes: 12


### NAN

To represent a `not a number` value, which is often used to mark missing data, use `np.nan`.

https://numpy.org/doc/stable/reference/constants.html#numpy.nan

In [22]:
np.nan

nan
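
NaN behaves differently from ordinary numbers, which matters when it stands in for missing data. A minimal sketch, using an illustrative array named `values`, of how to detect NaN and how reductions treat it:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# NaN is not equal to anything, including itself, so == cannot find it.
print(np.nan == np.nan)

# Use np.isnan to locate missing entries.
missing = np.isnan(values)
print(missing)

# Ordinary reductions propagate NaN; the nan-aware variants skip it.
print(values.mean())
print(np.nanmean(values))
```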

## Numpy Methods

### Min and Max
https://numpy.org/doc/stable/reference/generated/numpy.amin.html

https://numpy.org/doc/stable/reference/generated/numpy.amax.html

You can find `min` and `max` values in an array.

In [23]:
b = np.array([12, 4, 1.2, 2.5, 5, 14])

In [24]:
b.min()

np.float64(1.2)

In [25]:
b.max()

np.float64(14.0)

In [26]:
# or you can use np itself
np.amin(b)

np.float64(1.2)

In [27]:
np.amax(b)

np.float64(14.0)

You can also find the index of the `min` and `max` values.

In [28]:
np.argmin(b)

np.int64(2)

In [29]:
np.argmax(b)

np.int64(5)

### Mean and Standard Deviation

https://numpy.org/doc/stable/reference/generated/numpy.mean.html

https://numpy.org/doc/stable/reference/generated/numpy.std.html

In [30]:
c = np.array([[1,2], [3, 4], [5, 6]])
c

array([[1, 2],
       [3, 4],
       [5, 6]])

In [31]:
c.mean()

np.float64(3.5)

In [32]:
c.mean(axis=0) # mean of c along axis 0

array([3., 4.])

In [33]:
c.mean(axis=1) # mean of c along axis 1

array([1.5, 3.5, 5.5])

In [34]:
c.std()

np.float64(1.707825127659933)

In [35]:
c.std(axis=0)

array([1.63299316, 1.63299316])

In [36]:
c.std(axis=1)

array([0.5, 0.5, 0.5])
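
By default, `std` computes the population standard deviation (it divides by N). If you need the sample estimate that divides by N - 1, pass `ddof=1`. The sketch below compares the two on the same array; `pop_std` and `sample_std` are illustrative names.

```python
import numpy as np

c = np.array([[1, 2], [3, 4], [5, 6]])

pop_std = c.std()            # divides by N (ddof=0, the NumPy default)
sample_std = c.std(ddof=1)   # divides by N - 1

print(pop_std, sample_std)
```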

### arange

https://numpy.org/doc/stable/reference/generated/numpy.arange.html

In [37]:
np.arange(6)

array([0, 1, 2, 3, 4, 5])

### Linspace

https://numpy.org/doc/stable/reference/generated/numpy.linspace.html

In [38]:
np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [39]:
np.linspace(0,1,5, endpoint=False)

array([0. , 0.2, 0.4, 0.6, 0.8])

### Repeat

https://numpy.org/doc/stable/reference/generated/numpy.repeat.html

In [40]:
np.repeat(0,5)

array([0, 0, 0, 0, 0])

### Zeros and Ones

https://numpy.org/doc/stable/reference/generated/numpy.zeros.html

https://numpy.org/doc/stable/reference/generated/numpy.zeros_like.html

https://numpy.org/doc/stable/reference/generated/numpy.ones.html

https://numpy.org/doc/stable/reference/generated/numpy.ones_like.html

In [41]:
np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [42]:
np.ones((4,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [43]:
np.zeros_like(a) # returns an array of the same shape as the input array with elements replaced by zero

array([[0, 0],
       [0, 0],
       [0, 0]])

In [44]:
np.ones_like(a) # returns an array of the same shape as the input array with elements replaced by one

array([[1, 1],
       [1, 1],
       [1, 1]])

### Reshape

https://numpy.org/doc/stable/reference/generated/numpy.reshape.html

In [45]:
a = np.arange(6)
a

array([0, 1, 2, 3, 4, 5])

In [46]:
a_reshaped = a.reshape(3,2)
a_reshaped

array([[0, 1],
       [2, 3],
       [4, 5]])

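A special case of reshaping is very useful in deep learning: collapsing an array into a single row or a single column. Passing `-1` for one dimension tells NumPy to infer its size from the total number of elements. Note that both results are still 2-dimensional, with shapes `(1, 6)` and `(6, 1)`: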

In [47]:
a_reshaped.reshape(1,-1) # one row

array([[0, 1, 2, 3, 4, 5]])

In [48]:
a_reshaped.reshape(-1,1) # one column

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])
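
If a truly 1-dimensional result is needed, one option is `reshape(-1)` with a single argument. The sketch below contrasts the three shapes; the names `m`, `flat`, `row`, and `col` are illustrative.

```python
import numpy as np

m = np.arange(6).reshape(3, 2)

flat = m.reshape(-1)        # truly 1-D: shape (6,)
row = m.reshape(1, -1)      # 2-D with one row: shape (1, 6)
col = m.reshape(-1, 1)      # 2-D with one column: shape (6, 1)

print(flat.shape, row.shape, col.shape)
```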

There is another special case of reshaping that is also useful in deep learning: adding a new axis to an array.

In [49]:
b = np.arange(6).reshape(3,2)
b

array([[0, 1],
       [2, 3],
       [4, 5]])

In [50]:
b.shape

(3, 2)

In [51]:
reshaped_b = b.reshape((1,3,2))
reshaped_b

array([[[0, 1],
        [2, 3],
        [4, 5]]])

In [52]:
reshaped_b.shape

(1, 3, 2)

We can also achieve the same result using the following code:

In [53]:
reshaped_b_2 = b[np.newaxis, :]
reshaped_b_2

array([[[0, 1],
        [2, 3],
        [4, 5]]])

In [54]:
reshaped_b_2.shape

(1, 3, 2)

### Squeeze

https://numpy.org/doc/stable/reference/generated/numpy.squeeze.html

To remove dimensions of size 1, which add nesting but no data, you can use `squeeze`:

In [55]:
squeezed_reshaped_b_2 = reshaped_b_2.squeeze()
squeezed_reshaped_b_2

array([[0, 1],
       [2, 3],
       [4, 5]])

In [56]:
squeezed_reshaped_b_2.shape

(3, 2)

### Element-wise Operations

NumPy arrays support element-wise operations, which are fundamental to deep learning computations. Operations are applied to each element independently.

In [57]:
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])

print("Addition:", x + y)
print("Subtraction:", x - y)
print("Multiplication:", x * y)  # Element-wise, NOT matrix multiplication
print("Division:", x / y)
print("Power:", x ** 2)

Addition: [ 6  8 10 12]
Subtraction: [-4 -4 -4 -4]
Multiplication: [ 5 12 21 32]
Division: [0.2        0.33333333 0.42857143 0.5       ]
Power: [ 1  4  9 16]


Scalar operations are broadcast to all elements:

In [58]:
print("Add scalar:", x + 10)
print("Multiply by scalar:", x * 2)
print("Divide by scalar:", x / 2)

Add scalar: [11 12 13 14]
Multiply by scalar: [2 4 6 8]
Divide by scalar: [0.5 1.  1.5 2. ]


### Mathematical Functions

NumPy provides many mathematical functions that are essential for implementing neural network components like activation functions and loss functions.

https://numpy.org/doc/stable/reference/routines.math.html

In [59]:
x = np.array([0, 1, 2, 3])

print("exp (used in softmax, sigmoid):", np.exp(x))
print("log (used in cross-entropy loss):", np.log(np.array([1, 2.718, 7.389])))
print("sqrt:", np.sqrt(np.array([1, 4, 9, 16])))

exp (used in softmax, sigmoid): [ 1.          2.71828183  7.3890561  20.08553692]
log (used in cross-entropy loss): [0.         0.99989632 1.99999241]
sqrt: [1. 2. 3. 4.]


Example: Implementing the sigmoid activation function

In [60]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-2, -1, 0, 1, 2])
print("Sigmoid:", sigmoid(x))

Sigmoid: [0.11920292 0.26894142 0.5        0.73105858 0.88079708]
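
The direct formula above works well for moderate inputs, but `np.exp(-x)` overflows for large negative `x` and NumPy emits a warning. One common workaround, sketched here under the assumption that inputs are real-valued floats, is to exponentiate only non-positive numbers. The function name `stable_sigmoid` is illustrative and not part of NumPy.

```python
import numpy as np

def stable_sigmoid(x):
    x = np.asarray(x, dtype=np.float64)
    z = np.exp(-np.abs(x))   # argument is never positive, so no overflow
    # For x >= 0: 1 / (1 + e^-x); for x < 0: e^x / (1 + e^x). Same function.
    return np.where(x >= 0, 1 / (1 + z), z / (1 + z))

x_extreme = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
result = stable_sigmoid(x_extreme)
print(result)
```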


Example: Implementing the ReLU activation function

In [61]:
def relu(x):
    return np.maximum(0, x)

x = np.array([-2, -1, 0, 1, 2])
print("ReLU:", relu(x))

ReLU: [0 0 0 1 2]


### Broadcasting

Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different shapes during arithmetic operations. This is **critical** for deep learning, as it's how frameworks like TensorFlow, PyTorch, and JAX handle operations between tensors of different dimensions.

https://numpy.org/doc/stable/user/basics.broadcasting.html

**Broadcasting Rules:**
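1. If arrays have different numbers of dimensions, the shape of the array with fewer dimensions is padded with ones on the left
2. Arrays with size 1 along a dimension act as if they have the size of the array with the largest shape along that dimension
3. If the sizes along a dimension differ and neither is 1, the shapes are incompatible and NumPy raises an error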
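
The rules can be checked without building large arrays. The sketch below uses `np.broadcast_shapes` (available in NumPy 1.20 and later) to compute the resulting shape, and also shows a pair of shapes that cannot be broadcast. The names `shape_a`, `shape_b`, and `incompatible` are illustrative.

```python
import numpy as np

# Rule 1: (3,) is padded on the left to (1, 3); rule 2 stretches it to (2, 3).
shape_a = np.broadcast_shapes((2, 3), (3,))
print(shape_a)

# Rule 2 on both operands: (2, 1) and (1, 3) each stretch along a size-1 axis.
shape_b = np.broadcast_shapes((2, 1), (1, 3))
print(shape_b)

# Sizes 3 and 2 in the last dimension differ and neither is 1,
# so this pair cannot be broadcast.
try:
    np.ones((2, 3)) + np.ones(2)
    incompatible = False
except ValueError:
    incompatible = True
print("Raised ValueError:", incompatible)
```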

In [62]:
# Broadcasting a scalar to an array
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print("Original array:")
print(a)
print("\nAdd 10 to all elements (scalar broadcast):")
print(a + 10)

Original array:
[[1 2 3]
 [4 5 6]]

Add 10 to all elements (scalar broadcast):
[[11 12 13]
 [14 15 16]]


In [63]:
# Broadcasting a 1D array across rows
row_vector = np.array([1, 2, 3])
print("Array shape:", a.shape)
print("Row vector shape:", row_vector.shape)
print("\nAdd row vector to each row:")
print(a + row_vector)

Array shape: (2, 3)
Row vector shape: (3,)

Add row vector to each row:
[[2 4 6]
 [5 7 9]]


In [64]:
# Broadcasting a column vector across columns
col_vector = np.array([[10],
                       [20]])
print("Column vector shape:", col_vector.shape)
print("\nAdd column vector to each column:")
print(a + col_vector)

Column vector shape: (2, 1)

Add column vector to each column:
[[11 12 13]
 [24 25 26]]


**Common deep learning use case:** Subtracting the mean from each feature (column)

In [65]:
# Simulating feature normalization (common in neural networks)
data = np.array([[1, 200, 3000],
                 [2, 300, 4000],
                 [3, 400, 5000]])

# Compute mean of each column (feature)
column_means = data.mean(axis=0)  # shape: (3,)
print("Column means:", column_means)

# Subtract mean from each column (broadcasting!)
centered_data = data - column_means
print("\nCentered data:")
print(centered_data)

Column means: [2.e+00 3.e+02 4.e+03]

Centered data:
[[   -1.  -100. -1000.]
 [    0.     0.     0.]
 [    1.   100.  1000.]]
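
Centering is often followed by dividing by the per-column standard deviation, which gives each feature zero mean and unit variance. A minimal sketch, assuming no column is constant (a zero standard deviation would cause a division by zero); `standardized` is an illustrative name.

```python
import numpy as np

data = np.array([[1, 200, 3000],
                 [2, 300, 4000],
                 [3, 400, 5000]], dtype=np.float64)

mean = data.mean(axis=0)     # shape (3,)
std = data.std(axis=0)       # shape (3,)

# Both (3,) vectors broadcast across the rows of the (3, 3) data.
standardized = (data - mean) / std
print(standardized)
```

After this step all three columns contain the same values, even though the raw features differed by orders of magnitude.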


### Matrix Multiplication

Matrix multiplication is the core operation in neural networks. Every layer in a neural network performs matrix multiplication between inputs and weights.

https://numpy.org/doc/stable/reference/generated/numpy.matmul.html

In [66]:
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])  # Shape: (3, 2)

B = np.array([[1, 2, 3],
              [4, 5, 6]])  # Shape: (2, 3)

# For 2D arrays, these are equivalent
print("Using @ operator (recommended):")
print(A @ B)
print("\nUsing np.matmul:")
print(np.matmul(A, B))
print("\nUsing np.dot:")
print(np.dot(A, B))

Using @ operator (recommended):
[[ 9 12 15]
 [19 26 33]
 [29 40 51]]

Using np.matmul:
[[ 9 12 15]
 [19 26 33]
 [29 40 51]]

Using np.dot:
[[ 9 12 15]
 [19 26 33]
 [29 40 51]]


**`np.dot()` vs `np.matmul()` / `@` — Important Differences**

For 1D and 2D arrays, `np.dot()` and `np.matmul()`/`@` behave identically. However, they differ in two key ways:

1. **Scalars:** `np.dot()` allows scalar multiplication, but `np.matmul()`/`@` does not.
2. **Higher-dimensional arrays (3D+):** `np.matmul()`/`@` treats them as stacks of matrices and broadcasts, while `np.dot()` computes a sum-product over the last axis of the first array and the second-to-last of the second.

**Recommendation:** Use `@` or `np.matmul()` for matrix multiplication in deep learning code — it's clearer and matches the behavior of frameworks like PyTorch and TensorFlow.

In [67]:
# Difference 1: Scalars
print("np.dot with scalar:", np.dot(3, 4))  # Works: returns 12
# np.matmul(3, 4)  # Would raise an error: "matmul: Input operand 0 does not have enough dimensions"

# Difference 2: Higher-dimensional arrays (batch of matrices)
# Create two batches of 2x2 matrices (shape: 2, 2, 2)
batch_A = np.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]]])  # 2 matrices

batch_B = np.array([[[1, 0], [0, 1]],
                    [[1, 1], [1, 1]]])  # 2 matrices

print("\nbatch_A shape:", batch_A.shape)
print("batch_B shape:", batch_B.shape)

# matmul/@ treats these as stacks of matrices, multiplying corresponding pairs
print("\nnp.matmul (batch matrix multiply) shape:", np.matmul(batch_A, batch_B).shape)
print(np.matmul(batch_A, batch_B))

# np.dot produces a different result with a different shape!
print("\nnp.dot shape:", np.dot(batch_A, batch_B).shape)
print("(np.dot computes sum-product over last/second-to-last axes — usually not what you want for batched matmul)")

np.dot with scalar: 12

batch_A shape: (2, 2, 2)
batch_B shape: (2, 2, 2)

np.matmul (batch matrix multiply) shape: (2, 2, 2)
[[[ 1  2]
  [ 3  4]]

 [[11 11]
  [15 15]]]

np.dot shape: (2, 2, 2, 2)
(np.dot computes sum-product over last/second-to-last axes — usually not what you want for batched matmul)


**Important:** Don't confuse `*` (element-wise) with `@` (matrix multiplication)!

In [68]:
X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[5, 6],
              [7, 8]])

print("Element-wise multiplication (X * Y):")
print(X * Y)
print("\nMatrix multiplication (X @ Y):")
print(X @ Y)

Element-wise multiplication (X * Y):
[[ 5 12]
 [21 32]]

Matrix multiplication (X @ Y):
[[19 22]
 [43 50]]


**Neural network layer example:** A single dense/fully-connected layer computes `output = input @ weights + bias`

In [69]:
# Simulating a dense layer with 3 inputs and 2 outputs
# Batch of 4 samples, each with 3 features
inputs = np.array([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6],
                   [0.7, 0.8, 0.9],
                   [1.0, 1.1, 1.2]])  # Shape: (4, 3)

weights = np.array([[0.1, 0.2],
                    [0.3, 0.4],
                    [0.5, 0.6]])  # Shape: (3, 2)

bias = np.array([0.1, 0.2])  # Shape: (2,)

# Forward pass of a dense layer
output = inputs @ weights + bias  # Broadcasting adds bias to each row
print("Input shape:", inputs.shape)
print("Weight shape:", weights.shape)
print("Output shape:", output.shape)
print("\nOutput:")
print(output)

Input shape: (4, 3)
Weight shape: (3, 2)
Output shape: (4, 2)

Output:
[[0.32 0.48]
 [0.59 0.84]
 [0.86 1.2 ]
 [1.13 1.56]]
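
To see what the matrix product is doing, it can help to recompute one entry by hand. Each output value is the dot product of one input row with one weight column, plus the matching bias. The sketch below checks the first sample's first output unit; `manual` is an illustrative name.

```python
import numpy as np

inputs = np.array([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6],
                   [0.7, 0.8, 0.9],
                   [1.0, 1.1, 1.2]])
weights = np.array([[0.1, 0.2],
                    [0.3, 0.4],
                    [0.5, 0.6]])
bias = np.array([0.1, 0.2])

output = inputs @ weights + bias

# Row 0 of inputs times column 0 of weights, plus bias[0].
manual = 0.1 * 0.1 + 0.2 * 0.3 + 0.3 * 0.5 + 0.1
print(manual, output[0, 0])
```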


### Transpose

Transposing arrays (swapping dimensions) is frequently needed when working with neural networks, especially when dealing with batch dimensions.

https://numpy.org/doc/stable/reference/generated/numpy.transpose.html

In [70]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print("Original shape:", A.shape)
print(A)

print("\nTransposed (using .T):")
print(A.T)
print("Transposed shape:", A.T.shape)

Original shape: (2, 3)
[[1 2 3]
 [4 5 6]]

Transposed (using .T):
[[1 4]
 [2 5]
 [3 6]]
Transposed shape: (3, 2)


For higher-dimensional arrays, you can specify the axis order:

In [71]:
# 3D array: (batch, height, width)
images = np.arange(24).reshape(2, 3, 4)
print("Original shape (batch, height, width):", images.shape)

# Reorder to (batch, width, height)
reordered = np.transpose(images, (0, 2, 1))
print("Reordered shape (batch, width, height):", reordered.shape)

Original shape (batch, height, width): (2, 3, 4)
Reordered shape (batch, width, height): (2, 4, 3)
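
A frequent use of axis reordering is converting image batches between the channels-last layout (batch, height, width, channels) and the channels-first layout (batch, channels, height, width). The sketch below uses a hypothetical batch of 8 RGB images of 32x32 pixels; the names `nhwc` and `nchw` are illustrative.

```python
import numpy as np

# Hypothetical batch: 8 RGB images of 32x32 pixels, channels last.
nhwc = np.zeros((8, 32, 32, 3), dtype=np.float32)

# Move the channel axis (index 3) to position 1.
nchw = np.transpose(nhwc, (0, 3, 1, 2))
print(nchw.shape)

# transpose returns a view, so no pixel data is copied here.
print(np.shares_memory(nhwc, nchw))
```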


### Sum and Reductions with Axis

Like `mean` and `std`, the `sum` function (and other reductions) can operate along specific axes. This is essential for operations like computing batch statistics.

https://numpy.org/doc/stable/reference/generated/numpy.sum.html

In [72]:
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print("Array:")
print(a)
print("\nSum of all elements:", np.sum(a))
print("Sum along axis 0 (columns):", np.sum(a, axis=0))
print("Sum along axis 1 (rows):", np.sum(a, axis=1))

Array:
[[1 2 3]
 [4 5 6]]

Sum of all elements: 21
Sum along axis 0 (columns): [5 7 9]
Sum along axis 1 (rows): [ 6 15]


**keepdims parameter:** Keeps the reduced dimension as size 1, which is useful for broadcasting

In [73]:
print("Sum axis=1 without keepdims:", np.sum(a, axis=1).shape)
print(np.sum(a, axis=1))

print("\nSum axis=1 with keepdims=True:", np.sum(a, axis=1, keepdims=True).shape)
print(np.sum(a, axis=1, keepdims=True))

Sum axis=1 without keepdims: (2,)
[ 6 15]

Sum axis=1 with keepdims=True: (2, 1)
[[ 6]
 [15]]
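
One place where `keepdims=True` pays off is dividing each row by its own sum. With the `(2, 1)` shape the sums broadcast across each row; with the `(2,)` shape they do not line up with a `(2, 3)` array. A minimal sketch, with `row_sums`, `normalized`, and `failed` as illustrative names:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

row_sums = a.sum(axis=1, keepdims=True)   # shape (2, 1)
normalized = a / row_sums                 # (2, 3) / (2, 1) broadcasts per row
print(normalized.sum(axis=1))             # each row now sums to 1

# Without keepdims the sums have shape (2,), which cannot broadcast
# against the trailing dimension of size 3.
try:
    a / a.sum(axis=1)
    failed = False
except ValueError:
    failed = True
print("Division without keepdims failed:", failed)
```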


**Example:** Implementing softmax (used in classification outputs)

In [74]:
def softmax(x):
    # Subtract max for numerical stability
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1.0, 1.0, 1.0]])
probs = softmax(logits)
print("Logits:")
print(logits)
print("\nSoftmax probabilities:")
print(probs)
print("\nSum of each row (should be 1.0):", np.sum(probs, axis=1))

Logits:
[[1. 2. 3.]
 [1. 1. 1.]]

Softmax probabilities:
[[0.09003057 0.24472847 0.66524096]
 [0.33333333 0.33333333 0.33333333]]

Sum of each row (should be 1.0): [1. 1.]


### Concatenate and Stack

Combining arrays is common when building datasets or merging outputs from multiple model branches.

https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html

https://numpy.org/doc/stable/reference/generated/numpy.stack.html

In [75]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Concatenate along existing axis
print("Concatenate along axis 0 (stack vertically):")
print(np.concatenate([a, b], axis=0))

print("\nConcatenate along axis 1 (stack horizontally):")
print(np.concatenate([a, b], axis=1))

Concatenate along axis 0 (stack vertically):
[[1 2]
 [3 4]
 [5 6]
 [7 8]]

Concatenate along axis 1 (stack horizontally):
[[1 2 5 6]
 [3 4 7 8]]


In [76]:
# Convenient shortcuts
print("vstack (vertical stack):")
print(np.vstack([a, b]))

print("\nhstack (horizontal stack):")
print(np.hstack([a, b]))

vstack (vertical stack):
[[1 2]
 [3 4]
 [5 6]
 [7 8]]

hstack (horizontal stack):
[[1 2 5 6]
 [3 4 7 8]]


`np.stack` creates a **new** axis, unlike `concatenate` which joins along an existing axis:

In [77]:
# Stack creates a new dimension
print("Original shapes: a=", a.shape, "b=", b.shape)
print("\nstack along axis 0:")
stacked = np.stack([a, b], axis=0)
print(stacked)
print("Stacked shape:", stacked.shape)

Original shapes: a= (2, 2) b= (2, 2)

stack along axis 0:
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
Stacked shape: (2, 2, 2)


### Flatten and Ravel

Converting multidimensional arrays to 1D is needed when feeding data to fully connected layers.

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html

https://numpy.org/doc/stable/reference/generated/numpy.ravel.html

In [78]:
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print("Original shape:", a.shape)
print(a)

print("\nFlattened:")
print(a.flatten())
print("Shape:", a.flatten().shape)

Original shape: (2, 3)
[[1 2 3]
 [4 5 6]]

Flattened:
[1 2 3 4 5 6]
Shape: (6,)


`ravel()` is similar but returns a view when possible (more memory efficient), while `flatten()` always returns a copy:

In [79]:
print("Using ravel():")
print(a.ravel())

Using ravel():
[1 2 3 4 5 6]
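
The difference becomes visible when you write to the result. In the sketch below, `np.shares_memory` reports whether two arrays overlap in memory; `r` and `f` are illustrative names.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

r = a.ravel()     # a view here, because a is contiguous in memory
f = a.flatten()   # always an independent copy

print(np.shares_memory(a, r))
print(np.shares_memory(a, f))

r[0] = 100        # writes through to a
print(a[0, 0])
print(f[0])       # the copy still holds the old value
```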


### expand_dims

An alternative to `np.newaxis` for adding dimensions. This is commonly used to add batch dimensions.

https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html

In [80]:
# A single image (height, width)
single_image = np.zeros((28, 28))
print("Single image shape:", single_image.shape)

# Add batch dimension at the front
batched = np.expand_dims(single_image, axis=0)
print("With batch dimension:", batched.shape)

# Add channel dimension at the end
with_channel = np.expand_dims(batched, axis=-1)
print("With channel dimension:", with_channel.shape)  # (batch, height, width, channels)

Single image shape: (28, 28)
With batch dimension: (1, 28, 28)
With channel dimension: (1, 28, 28, 1)


### Clip

Constraining values to a range is useful for gradient clipping and ensuring numerical stability.

https://numpy.org/doc/stable/reference/generated/numpy.clip.html

In [81]:
a = np.array([-3, -1, 0, 1, 5, 10])
print("Original:", a)
print("Clipped to [0, 5]:", np.clip(a, 0, 5))
print("Clipped to [None, 2] (only max):", np.clip(a, None, 2))

Original: [-3 -1  0  1  5 10]
Clipped to [0, 5]: [0 0 0 1 5 5]
Clipped to [None, 2] (only max): [-3 -1  0  1  2  2]


**Example:** Preventing log(0) errors in cross-entropy loss

In [82]:
# Predictions that might include 0 or 1
predictions = np.array([0.0, 0.5, 1.0])

# Clip to avoid log(0) = -inf
epsilon = 1e-7
safe_predictions = np.clip(predictions, epsilon, 1 - epsilon)
print("Safe predictions:", safe_predictions)
print("Log of safe predictions:", np.log(safe_predictions))

Safe predictions: [1.000000e-07 5.000000e-01 9.999999e-01]
Log of safe predictions: [-1.61180957e+01 -6.93147181e-01 -1.00000005e-07]
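
Putting the clipping step into a loss function might look like the sketch below. It implements binary cross-entropy, averaged over samples, as one possible formulation; the function name `binary_cross_entropy`, the `epsilon` default, and the sample labels are assumptions made for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, epsilon=1e-7):
    # Clip so that neither log(p) nor log(1 - p) sees exactly 0.
    p = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([0.0, 1.0, 1.0])   # hypothetical labels
y_pred = np.array([0.0, 0.5, 1.0])   # includes the problematic 0 and 1

loss = binary_cross_entropy(y_true, y_pred)
print(loss)
```

Here the first and last predictions are correct and contribute almost nothing, so the loss is close to log(2) / 3.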


### Where (Conditional Selection)

`np.where` selects elements based on a condition, useful for implementing piecewise functions.

https://numpy.org/doc/stable/reference/generated/numpy.where.html

In [83]:
a = np.array([1, -2, 3, -4, 5])

# Replace negative values with 0 (like ReLU)
result = np.where(a > 0, a, 0)
print("Original:", a)
print("After np.where(a > 0, a, 0):", result)

Original: [ 1 -2  3 -4  5]
After np.where(a > 0, a, 0): [1 0 3 0 5]


In [84]:
# np.where can also return indices where condition is True
indices = np.where(a > 0)
print("\nIndices where a > 0:", indices)
print("Values at those indices:", a[indices])


Indices where a > 0: (array([0, 2, 4]),)
Values at those indices: [1 3 5]
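
As an example of a piecewise function, the three-argument form of `np.where` can express a leaky ReLU, which keeps positive values and scales negative ones by a small slope. This is a minimal sketch; the function name `leaky_relu` and the default `alpha=0.01` are illustrative choices.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Keep positive values; scale the rest by a small slope alpha.
    return np.where(x > 0, x, alpha * x)

values = np.array([1.0, -2.0, 3.0, -4.0, 5.0])
activated = leaky_relu(values)
print(activated)
```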


### Argmax / Argmin (very common in classification)

In classification, you often turn a vector of class scores/probabilities into a predicted class index using `argmax`.


In [85]:
probs = np.array([0.1, 0.7, 0.2])
pred_class = np.argmax(probs)
print("probs:", probs)
print("predicted class index:", pred_class)

# With a batch of samples (rows)
batch_probs = np.array([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1]])
print("\nBatch predicted classes:", np.argmax(batch_probs, axis=1))

probs: [0.1 0.7 0.2]
predicted class index: 1

Batch predicted classes: [1 0]
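
Once predictions are class indices, comparing them with the true labels gives the accuracy. The sketch below uses made-up labels in `true_labels`; all names are illustrative.

```python
import numpy as np

batch_probs = np.array([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.3, 0.3, 0.4]])
true_labels = np.array([1, 0, 1])     # hypothetical ground truth

pred_labels = np.argmax(batch_probs, axis=1)

# The comparison yields booleans; their mean is the fraction of matches.
accuracy = np.mean(pred_labels == true_labels)
print(pred_labels, accuracy)
```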


### Sorting, unique values, and counting

These are handy when cleaning datasets, building vocabularies, or analyzing label distributions.


In [86]:
labels = np.array([3, 1, 2, 3, 2, 2, 5])
print("labels:", labels)

print("sorted:", np.sort(labels))
print("unique:", np.unique(labels))

# unique with counts (label distribution)
uniq, counts = np.unique(labels, return_counts=True)
print("counts:", dict(zip(uniq, counts)))

labels: [3 1 2 3 2 2 5]
sorted: [1 2 2 2 3 3 5]
unique: [1 2 3 5]
counts: {np.int64(1): np.int64(1), np.int64(2): np.int64(3), np.int64(3): np.int64(2), np.int64(5): np.int64(1)}
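
The dictionary above shows `np.int64(...)` wrappers because its keys and values are NumPy scalars. If you prefer plain Python integers (for printing or for JSON serialization, say), one option is to convert them explicitly. `plain_counts` is an illustrative name.

```python
import numpy as np

labels = np.array([3, 1, 2, 3, 2, 2, 5])
uniq, counts = np.unique(labels, return_counts=True)

# int(...) turns each NumPy scalar into a built-in Python int.
plain_counts = {int(k): int(v) for k, v in zip(uniq, counts)}
print(plain_counts)
```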


### One-hot encoding (quick NumPy version)

Deep learning libraries often have one-hot utilities, but it's useful to understand the idea.


In [87]:
num_classes = 4
y = np.array([0, 2, 1, 3])   # class indices

one_hot = np.eye(num_classes, dtype=np.float32)[y]
print("y:", y)
print("one_hot shape:", one_hot.shape)
print(one_hot)

y: [0 2 1 3]
one_hot shape: (4, 4)
[[1. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 1. 0. 0.]
 [0. 0. 0. 1.]]
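
The trick works because `np.eye(num_classes)` is the identity matrix, and indexing it with `y` picks row `y[i]` for each sample. Going the other way, `argmax` along the class axis recovers the indices. A short sketch, with `recovered` as an illustrative name:

```python
import numpy as np

num_classes = 4
y = np.array([0, 2, 1, 3])

one_hot = np.eye(num_classes, dtype=np.float32)[y]

# Each row has a single 1, so argmax finds the original class index.
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```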


### Integer (fancy) indexing

Using integer arrays/lists to select specific rows/columns is common when creating mini-batches or shuffling data.


In [88]:
X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
idx = np.array([0, 3, 7])                          # pick a mini-batch
batch = X[idx]

print("X shape:", X.shape)
print("idx:", idx)
print("batch shape:", batch.shape)
print(batch)

X shape: (10, 2)
idx: [0 3 7]
batch shape: (3, 2)
[[ 0  1]
 [ 6  7]
 [14 15]]
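
Combining integer indexing with a random permutation gives a simple way to iterate over shuffled mini-batches. The sketch below is one possible pattern, not a full data loader; the seed, `batch_size`, and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

X = np.arange(20).reshape(10, 2)      # 10 samples, 2 features
perm = rng.permutation(len(X))        # shuffled row indices 0..9
batch_size = 4                        # illustrative value

batch_shapes = []
for start in range(0, len(X), batch_size):
    batch_idx = perm[start:start + batch_size]
    batch = X[batch_idx]              # integer indexing returns a copy
    batch_shapes.append(batch.shape)

print(batch_shapes)
```

The last batch is smaller because 10 is not a multiple of 4. Unlike slicing, integer indexing copies the selected rows, so modifying `batch` does not change `X`.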


### Copy vs View

Understanding when NumPy creates a view (shares memory) vs a copy is important for avoiding unexpected bugs.

https://numpy.org/doc/stable/user/basics.copies.html

In [89]:
original = np.array([1, 2, 3, 4, 5])

# Slicing creates a VIEW (shares memory)
view = original[1:4]
view[0] = 99
print("After modifying view:")
print("View:", view)
print("Original:", original)  # Original is also changed!

After modifying view:
View: [99  3  4]
Original: [ 1 99  3  4  5]


In [90]:
original = np.array([1, 2, 3, 4, 5])

# .copy() creates an independent COPY
copied = original[1:4].copy()
copied[0] = 99
print("After modifying copy:")
print("Copy:", copied)
print("Original:", original)  # Original is unchanged

After modifying copy:
Copy: [99  3  4]
Original: [1 2 3 4 5]


**Tip:** Use `.copy()` when you need to modify a slice without affecting the original array.

### Random and Distributions

You can use numpy to generate random numbers.

https://numpy.org/doc/stable/reference/random/index.html

**Recommended (modern) API:** `np.random.default_rng`

For reproducibility in experiments (and to match what you will do in ML), prefer the new `Generator` API and set a seed.


In [91]:
rng = np.random.default_rng(seed=42)

print("Uniform [0,1):", rng.random())
print("Integers 0..9:", rng.integers(0, 10, size=5))
print("Normal samples:", rng.normal(loc=0.0, scale=1.0, size=(2, 3)))

Uniform [0,1): 0.7739560485559633
Integers 0..9: [6 4 4 8 0]
Normal samples: [[-1.95103519 -1.30217951  0.1278404 ]
 [-0.31624259 -0.01680116 -0.85304393]]
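
Setting a seed makes the sequence of random numbers repeatable: two generators created with the same seed produce the same draws. A minimal sketch, where the seed value 123 and the variable names are arbitrary:

```python
import numpy as np

rng_a = np.random.default_rng(seed=123)
rng_b = np.random.default_rng(seed=123)

draws_a = rng_a.normal(size=3)
draws_b = rng_b.normal(size=3)

# Same seed, same sequence of calls, same numbers.
print(np.array_equal(draws_a, draws_b))
```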


The older `np.random.*` functions still work and are common in older code, so it's good to recognize them:

In [92]:
np.random.random()

0.8538765834196566

In [93]:
np.random.rand()

0.4096533194781158

In [94]:
np.random.rand()

0.059825464679002405

In [95]:
np.random.randint(1,10)

7

In [96]:
b = np.array([12, 4, 1.2, 2.5, 5, 14])
np.random.choice(b)

np.float64(4.0)

You can specify the number of draws from a list.

In [97]:
np.random.choice(b, 2)

array([12. ,  2.5])

You can assign a probability to each item in a list.

In [98]:
np.random.choice([12, 4, 1.2, 2.5, 5, 14], p=[0.1, 0.1, 0.1, 0.5, 0.05, 0.15])

np.float64(2.5)

#### Draw from `normal` distribution

Check this tool to learn more about this distribution: https://homepage.divms.uiowa.edu/~mbognar/applets/normal.html

In [99]:
np.random.normal()

0.6108883201231362

#### Draw from `binomial` distribution

Check this tool to learn more about this distribution: https://homepage.divms.uiowa.edu/~mbognar/applets/binnormal.html

`n`: number of trials

`p`: probability of success in one trial

In [100]:
np.random.binomial(2000, 0.2)

406
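
A single draw such as the one above lands near n * p = 400. Drawing many samples shows this more clearly: the sample mean approaches n * p and the sample standard deviation approaches sqrt(n * p * (1 - p)). The sketch below uses the `Generator` API with an arbitrary seed and sample count.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n, p = 2000, 0.2
samples = rng.binomial(n, p, size=10_000)

print("mean:", samples.mean(), "expected:", n * p)
print("std:", samples.std(), "expected:", np.sqrt(n * p * (1 - p)))
```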