# NumPy Tutorial

Author: Ally Chen

This is an introductory tutorial for NumPy, the core Python library for fast numerical computing.


## What is NumPy?

NumPy provides high-performance, multidimensional arrays and vectorized operations that replace slow Python loops. Libraries like pandas, SciPy, and scikit-learn build on NumPy, and PyTorch interoperates closely with it. The result is **cleaner code that runs much faster with far less looping**. By the end of this tutorial, you’ll have the skills necessary to perform the following:

1. Represent data as vectors and matrices.
2. Create and inspect arrays.
3. Index and slice arrays.
4. Perform element-wise operations and broadcasting.
5. Reshape & copy data.


## NumPy Installation

Run the following cell. If it errors, install by running `pip install numpy` in your terminal.


In [None]:
import numpy as np

## Multi-Dimensional Arrays


**_Why multi-dimensional arrays for machine learning?_**

Multi-dimensional arrays are the backbone of NumPy. Most real machine learning data isn’t flat. A single example might be a list of features (measurements used to make predictions), a picture (a grid of pixels), or a short sequence over time, and we usually process **many** examples at once, in what we call "batches". A multi-dimensional array lets us keep each "dimension" of the data separate (rows, columns, time, color channels, batch size) and run fast operations without Python loops.

**_What do multi-dimensional arrays look like?_**

- A 1D array is like a simple list `[1, 2, 3]`.
- A 2D array is like a table or matrix (rows × columns).
- A 3D array is like a stack of tables.
- Higher dimensions are possible, but most people stop at 2D or 3D for practical work.

_NumPy arrays are called `ndarrays` (“n-dimensional arrays”) because they can have any number of dimensions._
The more dimensions an array has, the trickier it is to understand.


**_When do you use 1D arrays?_**

Use a 1D array when your data is one list of values, such as a single feature vector (measurements used to make predictions) or a list of labels (what a model is trying to predict). If you start stacking many of these together (many samples or many time steps), move up to 2D or higher.


In [None]:
# 1D array/vector
x = np.array([5, 10, 15])
print(x)

**_When do you use 2D arrays?_**

Use a 2D array when your data naturally forms a table. In machine learning this is the standard “samples × features” layout where each row is one example and each column is one feature. A 2D array can also represent a single grayscale image or any matrix. If you find yourself juggling many 1D arrays, switch to a 2D array so rows and columns are explicit and easy to index.


In [None]:
# 2D array (matrix), with 2 rows and 3 columns
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2)

**_When do you use 3D arrays?_**

Use a 3D array when there’s one more “direction” to keep track of, like color channels, time steps, or batches of 2D items. A single color image is naturally 3D (height × width × channels). Channels are the color components stored for each pixel; for an RGB image there are three channels, red, green, and blue.


Think of a 3D array as a stack of 2D tables (matrices). Each table has rows and columns (2D); when you stack multiple tables on top of each other, another dimension is added.


In [None]:
# A 3D array: 2 "tables", each with 3 rows and 4 columns.
arr3 = np.array(
    [
        # Table 1
        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
        # Table 2
        [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]],
    ]
)
print(arr3)

## Common Arrays

A handy feature of NumPy is that it can create arrays for you, so you don't have to type out every element individually. Here are some of the most common ways:

1. `np.array([...])`

2. **Constant arrays:**

   - `np.zeros()`
   - `np.ones()`
   - `np.full()`

3. `np.arange(start, stop, step)`
4. `np.linspace(start, stop, num)`


#### **np.array()**

This creates an ndarray from a Python list, as we saw in the section on multi-dimensional arrays.


In [None]:
# 1D
a = np.array([1, 2, 3])

# 2D
b = np.array([[1, 2, 3], [4, 5, 6]])

# 3D
c = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

#### **Constant Arrays (zeros, ones, full)**

Constant arrays let you quickly create an array of any shape filled with a single value. When you prepare data or build simple baselines, you often need an array that starts at a known constant value and is filled in or updated later. NumPy makes this simple: `np.zeros(shape)` allocates an array of zeros, `np.ones(shape)` does the same with ones, and `np.full(shape, value)` lets you choose any constant. All three take the shape first; `np.full` also needs the fill value.


**_How do we create these arrays?_**

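The pattern is `np.zeros(shape)`, where `shape` is the dimensions of the array you want NumPy to build: an integer for 1D, or a tuple such as `(2, 3)` for more dimensions. The same applies to `np.ones` and to `np.full`, which takes the fill value as a second argument.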


In [None]:
# 1D array of 5 zeros
a = np.zeros(5)

# 2D array (2 rows, 3 columns)
b = np.ones((2, 3))

# 3D array (2 tables, 3 rows, 4 columns)
c = np.full((2, 3, 4), 7)

print("a:\n", a)
print("b:\n", b)
print("c:\n", c)
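
The "start at a known value, then fill it in later" pattern described above might look like the following minimal sketch. The names `scores` and `baseline` are made up for illustration.

```python
import numpy as np

# Allocate a results array up front, then fill it in piece by piece.
scores = np.zeros(4)          # starts as four zeros
scores[0] = 0.5               # update one slot
scores[1:3] = [0.7, 0.9]      # update a range of slots

# np.full takes the fill value as its second argument.
baseline = np.full((2, 2), 0.25)

print(scores)
print(baseline)
```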

#### **Exercise:** Creating Multi-Dimensional & Constant Number Arrays


Try it out by completing the following tasks.

**Task:**

1. Create a `5x5` array of all zeros.

2. Create a `4x3` array of all ones.

3. Create a `3x3x3` array filled with 5.

4. Create a 3D array with 2 tables.

   - The first table should start at 100
   - The second table should start at 200
   - Both tables should have a shape of (2, 3)
   - Both tables should increment by 1 as you go across the rows

Use the starter code provided below:


In [None]:
# 1. Create a 5x5 array of zeros
a = ___
print("a:", a)

In [None]:
# 2. Create a 4x3 array of ones
b = ___
print("b:", b)

In [None]:
# 3. Create a 3x3x3 array filled with 5
c = ___
print("c:", c)

In [None]:
# 4. 2 layers, each 2x3
d = ___
print("d:", d)

**SOLUTION**

**_Pause here and try it yourself before scrolling down._**


In [None]:
# 1.
a = np.zeros((5, 5))

# 2.
b = np.ones((4, 3))

# 3.
c = np.full((3, 3, 3), 5)

# 4.
d = np.array(
    [
        [[100, 101, 102], [103, 104, 105]],
        [[200, 201, 202], [203, 204, 205]],
    ]
)

#### **np.arange()**

Now let's move on to creating sequences!

`np.arange(start, stop, step)` creates values by stepping from `start` up to, but not including, `stop`. It’s most natural when you care about the step size, like “give me every 2nd number from 0 to 10,” which would be `np.arange(0, 10, 2)` → `0, 2, 4, 6, 8`. It’s a quick way to make arrays of evenly spaced numbers without typing them out manually. It works with any step size, so you can make arrays that skip numbers or even count backwards.

- `start` &rarr; where the sequence begins (default = 0).
- `stop` &rarr; where the sequence ends (no default).
- `step` &rarr; how much to increase each time (default = 1).


In [None]:
# Sequence:
# 0 up to (but not including) 100.
a = np.arange(100)

# Sequence by any step size:
# Start = 6, stop = 15 (ends at 12), step = 3.
b = np.arange(6, 15, 3)

# Counting down:
# Decrement by 1 (negative step).
c = np.arange(5, -5, -1)

print("a:", a)
print("b:", b)
print("c:", c)

#### **np.linspace()**

`np.linspace(start, stop, num)` creates exactly `num` evenly spaced values between start and stop. It’s the right choice when you care about how many points you get, like when generating points for the x-axis of a plot.

- `start` &rarr; where the sequence begins (inclusive).
- `stop` &rarr; where the sequence ends (inclusive by default; pass `endpoint=False` to exclude it).
- `num` &rarr; how many elements to generate (default = 50).


In [None]:
# 5 evenly spaced points from 0 to 10 (both ends included)
arr = np.linspace(0, 10, 5)
print(arr)

**_When should you use `np.arange` vs `np.linspace`?_**

- Use `np.arange` when you know the **step size** you want.
- Use `np.linspace` when you know the **number of points** you want.
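
To make the comparison concrete, here is a small sketch that builds the same grid both ways. It uses `endpoint=False`, a real `np.linspace` parameter that excludes the stop value; the variable names are only for illustration.

```python
import numpy as np

# Same grid, described two ways.
by_step = np.arange(0, 10, 2)                     # step of 2, stop excluded
by_count = np.linspace(0, 10, 5, endpoint=False)  # 5 points, stop excluded

print(by_step)    # [0 2 4 6 8]
print(by_count)   # [0. 2. 4. 6. 8.]

# With the default endpoint=True, linspace includes the stop value.
print(np.linspace(0, 10, 5))  # [ 0.   2.5  5.   7.5 10. ]
```

Note that `np.arange` with integer arguments returns integers, while `np.linspace` returns floats.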


## Inspecting Arrays


This section will look at the metadata of an array (not just the numbers inside it). That way, when working with bigger data later, you know how to read and deal with the information.

**When you create arrays, it's important to know:**

- `.shape` &rarr; the size in each dimension (rows, cols, …)
- `.ndim` &rarr; number of dimensions (1D vector, 2D matrix, etc.)
- `.size` &rarr; total number of elements in the array
- `.dtype` &rarr; type of values inside the array (integers, floats, etc.)


**_Why is `shape` important?_**

When NumPy code doesn’t behave the way you expect, the first thing to check is the array’s `shape`. The shape attribute shows the structure of an array, meaning how many rows, columns, or higher dimensions it has. This is important because NumPy operations often require arrays to have matching or compatible shapes. Many NumPy errors come from shape mismatches.


In [None]:
# 2 rows, 3 columns
arr2d = np.zeros((2, 3))
print("2D Shape:", arr2d.shape)
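
As a rough illustration of a shape mismatch, the sketch below adds two arrays whose column counts differ and catches the resulting error. The names `left` and `right` are made up for this example.

```python
import numpy as np

left = np.zeros((2, 3))
right = np.zeros((2, 4))

try:
    left + right          # 3 columns vs 4 columns: incompatible
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print("left shape: ", left.shape)
print("right shape:", right.shape)
print("error:", error_message)
```

Printing both shapes next to the error is usually the quickest way to see which dimension disagrees.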

**How do we read the shape of a 3D array?**


In [None]:
arr3d = np.array(
    [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]]
)
print(arr3d.shape)

In this 3D array:

- 2 &rarr; shows the number of "tables" (or layers in the stack).
- 3 &rarr; shows the number of rows in each table.
- 4 &rarr; shows the number of columns in each table.


**A Key Distinction: Vector `(N,)` vs Column vector `(N, 1)`**

A 1D vector has shape `(N,)`. It has a single axis, so it is neither a row nor a column, although NumPy prints it on one line like a row. A column vector has shape `(N, 1)`. It has two axes: `N` rows and 1 column. The two hold the same values but behave differently in operations. This distinction matters and is a common source of errors in NumPy code.


In [None]:
vector = np.zeros((5,))
column_vector = np.zeros((5, 1))

print("Vector shape: ", vector.shape)
print(vector)
print()
print("Column vector shape: ", column_vector.shape)
print(column_vector)
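
One way to see the difference in behavior is to transpose and add the two kinds of vector. This is a small sketch; the addition relies on broadcasting, which is covered in its own section later in this tutorial.

```python
import numpy as np

vector = np.arange(3)                 # shape (3,)
column_vector = vector.reshape(3, 1)  # shape (3, 1)

# Transposing a 1D array does nothing; transposing a column gives a row.
print(vector.T.shape)          # (3,)
print(column_vector.T.shape)   # (1, 3)

# Adding the two does not give 3 numbers; it gives a 3x3 table.
print((vector + column_vector).shape)  # (3, 3)
```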

**_Why is `size` important?_**

The `size` attribute tells you the total number of elements inside the array. This is useful for double-checking that you created the expected amount of data and for estimating how much memory the array might use.


In [None]:
arr = np.ones((2, 3))
print(arr.size)
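
For the memory estimate mentioned above, a minimal sketch: multiply `size` by the bytes per element (`itemsize`), or read `nbytes` directly. The name `grid` is just for illustration.

```python
import numpy as np

grid = np.ones((2, 3))   # float64 by default: 8 bytes per element

print(grid.size)       # 6 elements
print(grid.itemsize)   # 8 bytes per element
print(grid.nbytes)     # 48 bytes in total
```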

**_Why is `ndim` important?_**

The `ndim` attribute shows how many dimensions the array has. Knowing the number of dimensions helps you understand the structure of your data.


In [None]:
arr = np.ones((3, 2, 3))
print(arr.ndim)

**_Why is `dtype` important?_**

The `dtype` attribute tells you what kind of values are stored in the array, such as integers, floating-point numbers, or booleans. Names like `int8`, `int32`, `float32`, `float64` tell you how many bits are used per element. More bits take up more memory but can represent a wider range of values (for integers) or more precision (for floats).


In [None]:
print(arr.dtype)
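
The trade-off between memory and range can be checked directly. The sketch below sets the dtype explicitly and uses `np.iinfo` to look up the limits of an integer type; the variable names are illustrative.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)
large = np.array([1, 2, 3], dtype=np.int64)

print(small.itemsize, large.itemsize)   # 1 8  (bytes per element)

# Each integer type can only hold a limited range of values.
print(np.iinfo(np.int8).max)    # 127
print(np.iinfo(np.int64).max)   # 9223372036854775807

# astype returns a converted copy; the original keeps its dtype.
as_float = small.astype(np.float32)
print(as_float.dtype)           # float32
print(small.dtype)              # int8
```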

## Indexing & Slicing


**_Why is indexing useful?_**

Indexing is important because real-world datasets are often very large. Being able to directly access a specific value or a whole row, without looping through the entire array, makes NumPy code faster and simpler. Instead of working with the whole dataset, you can zoom in on specific values or smaller parts.

**_How do we use indexing?_**

In a 1D array, indexing works the same as a list:


In [None]:
arr = np.array([10, 20, 30])
print(arr[0])
print(arr[2])
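
Since 1D indexing mirrors list indexing, negative indices and assignment by index carry over as well. A short sketch, using the made-up names `values` and `nums`:

```python
import numpy as np

values = [10, 20, 30]
nums = np.array(values)

# Negative indices count from the end, just like a list.
print(values[-1], nums[-1])   # 30 30

# Assignment by index also works the same way.
nums[0] = 99
print(nums)                   # [99 20 30]
print(values)                 # [10, 20, 30]  (the list is not affected)
```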

In a 2D array you index with `arr[row, column]`: the first number moves down the rows and the second moves across the columns.


In [None]:
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2d[0, 1])  # row 0, col 1
print(arr2d[1, 2])  # row 1, col 2


In 3D arrays you need three indices: `arr[table, row, column]`. Indexing 3D arrays is similar to indexing 2D arrays. The first index selects the table, the second moves down the rows within that table, and the third moves across the columns.


In [None]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# selects the first table’s first row and first column
print(arr3d[0, 0, 0])

# selects the first table’s second row and third column
print(arr3d[0, 1, 2])

# picks the second table’s first row and second column
print(arr3d[1, 0, 1])

# picks the second table’s second row and third column
print(arr3d[1, 1, 2])

**_What is Slicing?_**

Slicing lets you take a sub-array along any dimension without writing loops. You can use it to grab a whole row or column, crop an image, or split a dataset by ranges. It’s faster than looping in plain Python because NumPy does the work in optimized C.

The basic pattern is:
`array[start:stop:step]`

- start &rarr; where to begin (index number, default = 0)
- stop &rarr; where to end (but it does not include this index)
- step &rarr; how much to skip each time (default = 1)

**Let's look at how we apply this:**

In 1D, a slice is `a[start:stop:step]`. It returns a view (not a copy) that walks from `start` up to but not including `stop`, jumping by `step`. This is the fastest way to take ranges or every n-th element without loops. Omitting a value falls back to that value's default.


In [None]:
a = np.array([10, 20, 30, 40, 50])

# elements at indices 1,2,3
print(a[1:4])

# first 3 elements
print(a[:3])

# last 3 elements
print(a[2:])

# every 2nd element
print(a[::2])
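
Because a slice is a view, writing to it also changes the original array. The sketch below shows this, along with the slice defaults; `data` and `window` are illustrative names.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])
window = data[1:4]      # a view onto elements 1, 2, 3

window[0] = 99          # writes through to data
print(data)             # [10 99 30 40 50]

# Omitted parts fall back to their defaults.
print(np.array_equal(data[:], data[0:5:1]))   # True
```

If you need a slice you can modify safely, `data[1:4].copy()` gives an independent array.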

In 2D, slicing uses two parts: `arr[row_slice, col_slice]`. The first piece chooses rows; the second piece chooses columns.

A lone `:` means “take everything” along that axis. This lets you grab full rows or columns, crop a sub-table, or keep every other column with no loops required.


In [None]:
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# rows 0–1, columns 1–2
print(b[0:2, 1:3])

# all rows, first column
print(b[:, 0])

# second row, all columns
print(b[1, :])


A 3D array is the same, with one more axis: `arr[table_slice, row_slice, col_slice]`


In [None]:
c = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# all tables, all rows, first two columns
print(c[:, :, 0:2])

# first table, all rows, column 1
print(c[0, :, 1])

# second table, all rows, first two columns
print(c[1, :, :2])
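
The same three-part slicing can crop an image. The sketch below assumes a made-up 4×6 image stored as (height, width, channels), so here the first axis is image rows rather than "tables".

```python
import numpy as np

# A made-up 4x6 "image" with 3 color channels (height, width, channels).
image = np.arange(4 * 6 * 3).reshape(4, 6, 3)

crop = image[1:3, 2:5, :]     # rows 1-2, columns 2-4, all channels
red = image[:, :, 0]          # first channel only

print(crop.shape)   # (2, 3, 3)
print(red.shape)    # (4, 6)
```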


#### **Exercise:** Indexing & Slicing 3D Arrays


Using the array provided below, complete the following tasks:


**Task**

1. Use indexing to get the element in the second table, second row, third column.
2. Use slicing to print the entire first table.
3. Use slicing to print the last row from each table.
4. Use slicing to print the second column from each table.
5. Combine indexing and slicing to print the last 2 rows and last 2 columns from both tables.


In [None]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])

In [None]:
# 1. Use indexing to get the element in the second table, second row, third column.
a = __
print(a)

In [None]:
# 2. Use slicing to print the entire first table.
b = __
print(b)

In [None]:
# 3. Use slicing to print the last row from each table.
c = __
print(c)

In [None]:
# 4. Use slicing to print the second column from each table.
d = __
print(d)

In [None]:
# 5. Combine indexing and slicing to print the last 2 rows and last 2 columns from both tables.
e = __
print(e)

**SOLUTION**

**_Pause here and try it yourself before scrolling down._**


In [None]:
# 1. Second table, second row, third column (indices 1, 1, 2)
print("1.")
print(arr3d[1, 1, 2])

# 2. Entire first table
print("2.")
print(arr3d[0, :, :])

# 3. Last row from each table
print("3.")
print(arr3d[:, 2, :])

# 4. Second column from each table
print("4.")
print(arr3d[:, :, 1])

# 5. Last 2 rows and last 2 columns from both tables
print("5.")
print(arr3d[:, 1:3, 1:3])

## Element-Wise Operations


**_What is an element-wise operation?_**


NumPy arrays are designed so that when you use standard math symbols (`+`, `-`, `*`, `/`, etc.), the operation is applied to each element individually, not to the array as a whole. These are called element-wise operations. For now, assume the **shapes must be exactly the same** so NumPy can pair each element in the left-hand array with the element in the same position in the right-hand array. _If your shapes don’t match, save that for the next part (broadcasting), where we’ll see how NumPy handles it without loops._


**_Element-wise Examples:_**


In [None]:
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Addition
print("\nArray (a + b):\n", a + b)

# Multiplication
print("\nArray (a * b):\n", a * b)

**Explanation:** NumPy is adding and multiplying each element above.
By performing `a + b`, every element in `a` is being added to the corresponding element in `b`: `1 + 10`, `2 + 20`, `3 + 30`. The same applies to multiplication.
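
For comparison, here is a sketch of the same addition written as an explicit Python loop next to the NumPy version. Both produce the same numbers; the loop version is only there to show what NumPy does for you.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Plain Python: pair up the elements and add them one at a time.
looped = [x + y for x, y in zip(a.tolist(), b.tolist())]

# NumPy: one expression, no loop.
vectorized = a + b

print(looped)       # [11, 22, 33]
print(vectorized)   # [11 22 33]
```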


## Broadcasting


**_What is Broadcasting?_**


Broadcasting is how NumPy lets you write loop-free, vectorized operations even when array shapes don’t exactly match. During an operation, NumPy can automatically **stretch** any dimension of size `1` so the shapes line up. Officially, broadcasting is the set of rules that allow arrays of different shapes to work together in element-wise operations. In machine learning, broadcasting lets an operation written for a single example apply to a whole table of examples, so it is ready to be used on an entire dataset.

**_What makes two dimensions compatible?_**

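Two dimensions are compatible if they are equal OR one of them is `1`. NumPy compares shapes starting from the last (rightmost) dimension and working backwards, and a missing dimension counts as `1`. All dimensions must be compatible for the operation to proceed.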
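
The compatibility rule can be sketched as a few lines of Python. The helper `shapes_compatible` below is hypothetical (it is not part of NumPy) and compares dimensions from right to left, which is the order NumPy uses. `np.broadcast_shapes` is a real NumPy function, available in NumPy 1.20 and later.

```python
import numpy as np

def shapes_compatible(shape_a, shape_b):
    """Hypothetical helper: apply the compatibility rule, right to left."""
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True  # any leftover leading dimensions are always fine

print(shapes_compatible((4, 5), (5,)))     # True
print(shapes_compatible((4, 5), (4, 1)))   # True
print(shapes_compatible((4, 5), (1, 4)))   # False

# NumPy can report the resulting shape directly.
print(np.broadcast_shapes((4, 5), (4, 1)))  # (4, 5)
```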


**_Scalars + Arrays_**

Normally, if you tried to **add a list and a number in plain Python**, it would give you an **error**. This is where NumPy comes into play.

NumPy allows you to mix scalars and arrays using broadcasting. A single number is broadcast to every element automatically.
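
To see the difference from plain Python, the sketch below tries the same addition on a list and on an array. The name `values` is illustrative.

```python
import numpy as np

values = [1, 2, 3]

try:
    values + 10                 # plain Python: list + int is not defined
    python_result = "no error"
except TypeError as exc:
    python_result = f"TypeError: {exc}"

print(python_result)
print(np.array(values) + 10)    # [11 12 13]
```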

- `a` has shape `(2, 3)`; `b` is just a scalar, with the empty shape `()`.


In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])

b = 10

print(a + b)

A scalar has no dimensions. Broadcasting works by **expanding smaller arrays** so their shapes match, so NumPy "stretches" the scalar into the same shape as `a`. _This saves you from writing loops or manually copying values._


Essentially, it turns `b = 10` into:


In [None]:
[[10, 10, 10], [10, 10, 10]]

Then performs element-wise addition:


In [None]:
[[1 + 10, 2 + 10, 3 + 10], [4 + 10, 5 + 10, 6 + 10]]

**_Arrays + Arrays_**


Next, let's look at broadcasting when dealing with only arrays.

Suppose we have:


In [None]:
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = np.array([100, 200, 300])
col = np.array([[100], [200], [300]])

**`(N,)` vs `(N, 1)`**

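A 1D array with shape `(3,)` lines up with the **columns** of `A`, so NumPy repeats it down the rows: every row of `A` gets `[100, 200, 300]` added to it.

A column vector with shape `(3, 1)` lines up with the **rows** of `A`, so NumPy repeats it across the columns: the first row gets `100` added to every element, the second row `200`, and the third row `300`.

This is why `(3,)` and `(3, 1)` behave differently even though they both hold three numbers.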


In [None]:
print("A shape:", A.shape, "| row shape:", row.shape, "| col shape:", col.shape)

print("\nA + row  (adds to each row across columns):\n", A + row)
print("\nA + col  (adds down rows):\n", A + col)
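
A common place this shows up in data preparation is subtracting a mean from every row or column. The sketch below uses made-up data and assumes rows are samples and columns are features. `keepdims=True` is a real argument of `mean` that keeps the result as a column of shape `(3, 1)`.

```python
import numpy as np

# Made-up data: 3 samples (rows) x 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

col_means = X.mean(axis=0)                 # shape (2,): one mean per column
row_means = X.mean(axis=1, keepdims=True)  # shape (3, 1): one mean per row

print(X - col_means)   # each column now averages to 0
print(X - row_means)   # each row now averages to 0
```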


#### **Exercise:** Broadcasting


**Task**

Analyze shape compatibility and predict the outputs after broadcasting.

Fill in each cell with your guess of whether the array shapes are compatible for the operation and the output of the operation.


In [None]:
A = np.array([[1, 2, 3], [4, 5, 6]])

B = np.array([10, 20, 30])
C = np.array([[100], [200]])
D = np.array([1000])
E = np.array([[1, 2]])

print("A shape:", A.shape)
print("B shape:", B.shape)
print("C shape:", C.shape)
print("D shape:", D.shape)
print("E shape:", E.shape)

In [None]:
# Problem 1: A + B
# Fill in your predictions:
compatible_1 = None  # True or False
predicted_shape_1 = None  # Tuple like (2, 3)
predicted_result_1 = None  # Your predicted array

result_1 = A + B
print("Result shape:", result_1.shape)
print(result_1)

In [None]:
# Problem 2: A + C
# Fill in your predictions:
compatible_2 = None
predicted_shape_2 = None
predicted_result_2 = None

result_2 = A + C
print("Result shape:", result_2.shape)
print(result_2)

In [None]:
# Problem 3: B + C
# Fill in your predictions:
compatible_3 = None
predicted_shape_3 = None
predicted_result_3 = None

result_3 = B + C
print("Result shape:", result_3.shape)
print(result_3)

In [None]:
# Problem 4: A * D
# Fill in your predictions:
compatible_4 = None
predicted_shape_4 = None
predicted_result_4 = None

result_4 = A * D
print("Result shape:", result_4.shape)
print(result_4)

In [None]:
# Problem 5: A + E
# Fill in your predictions:
compatible_5 = None
predicted_shape_5 = None
predicted_result_5 = None

result_5 = A + E
print("Result shape:", result_5.shape)
print(result_5)

## **Bonus:** Reshaping & Copying


**_What is Reshaping?_**

Reshaping changes the shape of an array (for example, from 1D to rows × columns) without changing the data inside. The total number of elements must stay the same.


Here's an example below:


In [None]:
arr = np.arange(1, 13)
print(arr)

- The original array has 12 elements.
- Reshaping into `3, 4` means 3 rows x 4 columns = 12 elements.


In [None]:
reshaped = arr.reshape(3, 4)
print(reshaped)


A handy feature NumPy offers is the use of `-1`. It saves you the hassle of doing the math yourself: `-1` tells NumPy to figure out that dimension for you.
_You can only pass one `-1` at a time, or else NumPy won't know which dimension to calculate._


In [None]:
print(arr.reshape(3, -1))  # 3 rows, NumPy calculates columns

print(arr.reshape(-1, 6))  # 6 columns, NumPy calculates rows
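
Two more cases worth trying, shown as a small sketch: `reshape(-1)` on its own flattens back to 1D, and a shape that does not divide the element count evenly raises an error. The name `seq` is illustrative.

```python
import numpy as np

seq = np.arange(1, 13)                  # 12 elements

# A lone -1 means "one axis, whatever length fits".
flat = seq.reshape(2, 6).reshape(-1)
print(flat.shape)                       # (12,)

try:
    seq.reshape(5, -1)                  # 12 is not divisible by 5
    reshape_error = None
except ValueError as exc:
    reshape_error = str(exc)

print("error:", reshape_error)
```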


**_What is Copying?_**

In NumPy, when you assign an array to another variable, both names refer to the same array in memory, so changing one changes the other. This is a **reference**. A related case is a **view**: slicing and (usually) reshaping return a new array object that still shares the original's data. To get an independent array, use `.copy()`. This duplicates the data in memory, so changes in one array don’t affect the other.


In [None]:
a = np.array([1, 2, 3])
b = a  # not a real copy
c = a.copy()  # real copy

b[0] = 99

# a and b share the same data, changing b also changes a.
print(a)  # [99  2  3]

# .copy() creates a completely independent array, changing c does not change a.
print(c)  # [1  2  3]
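
If you are unsure whether two arrays share data, `np.shares_memory` can check. The sketch below compares plain assignment, a reshaped view, and a copy; the variable names are only for illustration.

```python
import numpy as np

original = np.arange(6)

alias = original                 # same object, second name
view = original.reshape(2, 3)    # new array object, same data
independent = original.copy()    # new array object, new data

print(alias is original)                         # True
print(view is original)                          # False
print(np.shares_memory(original, view))          # True
print(np.shares_memory(original, independent))   # False
```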


#### **Exercise:** Reshaping & Copying


**Task**

1. Create a `3×4` array `A` with the numbers 1–12. Print array `A`.
2. Reshape it into a new array `V` that is a view of the original.
   - Modify element `[0, 0] = 55` in the original.
3. Now make a copy `C` of the original array and reshape it.
   - Modify element `[0, 0] = 66` in the original.


In [None]:
# 1. Using arange() and reshape(), create a 3×4 array A with the numbers 1–12.
#    Print array A.

In [None]:
# 2. Reshape it and store it in a new array `V`, that is a view of the original
#    Modify element `[0, 0] = 55` in the original
#    Compare the two arrays and confirm whether they are still identical.

In [None]:
# 3. Make a real copy `C = A.reshape(2, 6).copy()`
#    Modify element `[0, 0] = 66` in the original
#    Compare the two arrays and confirm whether they are still identical

**SOLUTION**

**_Pause here and try it yourself before scrolling down._**


In [None]:
# 1. create 3x4 array A
A = np.arange(1, 13).reshape(3, 4)
print("Original A:", A)

# 2. reshape into a view V
V = A.reshape(2, 6)

# modify element [0,0] in A
A[0, 0] = 55
print("After modifying A (view test):")
print("A:", A)
print("V:", V)  # V should also show the change (still identical to A’s data)

# 3. make a copy C of the original and reshape
C = A.reshape(2, 6).copy()

# modify element [0,0] in A again
A[0, 0] = 66
print("After re-modifying A (copy test):")
print("A:", A)
print("C:", C)  # C should NOT change
