# Intro to NumPy

## NumPy Basics, Continued

The examples here use trivial "datasets" for clarity and simplicity. In real-world use, NumPy can easily handle arrays with 1M elements (e.g., a 1000 x 1000 matrix). With careful memory management, it can handle 100 to 1000 times that.

Note that these tools are complex and this simplified introduction glosses over some important nuances that we will elaborate on later, as required.

Much of the material in this section is adapted from Chapter 4 of *Python for Data Analysis* (McKinney 2022).

### Import

By convention, NumPy is imported under the shorthand alias `np`.

In [None]:
import numpy as np

### Arithmetic with Arrays

There is no need for loops when operating on an `ndarray`, because arithmetic operations are **vectorized**. This means that they take advantage of modern CPU architecture to execute the same operation on multiple data elements simultaneously, processing entire arrays in compiled C code rather than interpreting Python loops element by element.

Any operations between equal-size arrays apply the operation element-wise.

In [None]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])

arr * arr

In [None]:
arr - arr
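
To make the element-wise behavior concrete, here is a minimal sketch comparing an explicit Python loop with the vectorized expression. The names `looped` and `vectorized` are illustrative only and do not appear elsewhere in these notes.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Explicit Python loop: multiply each element by itself, one at a time
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

# Vectorized equivalent: one expression, no Python-level loop
vectorized = arr * arr

print(np.array_equal(looped, vectorized))  # True
```

Both approaches give the same values; the vectorized form is shorter and, on large arrays, typically much faster.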

Operations with scalars propagate the scalar argument to each element in the array. This uses a process called **broadcasting**, which we will explore in more detail later.

In [None]:
1 / arr

In [None]:
arr ** 2

#### Aggregations

NumPy provides several aggregation functions, including `np.sum`, `np.mean`, and `np.std`.

In [None]:
# total of all elements
np.sum(arr)

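**Note:** this is very different from the built-in `sum` in base Python, which iterates over the first axis and therefore adds the rows together, returning one total per column.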

In [None]:
sum(arr)
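
The difference can be seen side by side in the sketch below. It uses the `axis` argument of `np.sum`, which these notes have not introduced yet, only to show that the built-in `sum` ends up producing column totals.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

total = np.sum(arr)               # adds every element: 21.0
builtin = sum(arr)                # adds the two rows together, element-wise
by_column = np.sum(arr, axis=0)   # column totals, requested explicitly

print(total)
print(builtin)
print(np.array_equal(builtin, by_column))  # True
```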

In [None]:
np.mean(arr)

In [None]:
np.std(arr)

#### Linear Algebra

Though we won't use it much in this class, NumPy (and SciPy) provide many tools for linear algebra on vectors (1D arrays) and matrices (2D arrays). Fundamental to these are dot products and matrix multiplication.

The dot product of two 1D arrays (vectors) is calculated with `np.dot(a, b)`.

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

dot_product = np.dot(a, b)
print("a · b =", dot_product)

Remember that the dot product is the sum of all element-wise products: $a \cdot b = \sum_{i=1}^{n} a_i b_i$

In the example above that is $(1 \times 4) + (2 \times 5) + (3 \times 6) = 4 + 10 + 18 = 32$.

Matrix multiplication is supported with the `@` operator.

In [None]:
A = np.array([[1, 2],
              [5, 6]])
B = np.array([[3, 4],
              [7, 8]])

C = A @ B
print("\nMatrix multiplication:")
print(C)

In matrix multiplication, each element of the result is the dot product of the corresponding row and column. Using NumPy's slicing syntax, we could say: $C[i,j] = A[i,:] \cdot B[:,j]$.

As a consequence, for 1D arrays, the dot product is equivalent to matrix multiplication. Therefore, `np.dot(a, b)` and `a @ b` are interchangeable for two 1D arrays. Using `np.dot()` can be more explicit about your intent to compute a dot product (returning a scalar), while `@` is the standard for matrix operations.
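
As a quick check of both statements, the sketch below recomputes each entry of `C` from a row of `A` and a column of `B`, then compares `np.dot` and `@` on the 1D arrays from earlier.

```python
import numpy as np

A = np.array([[1, 2],
              [5, 6]])
B = np.array([[3, 4],
              [7, 8]])
C = A @ B

# Each entry of C is the dot product of a row of A and a column of B
for i in range(2):
    for j in range(2):
        assert C[i, j] == np.dot(A[i, :], B[:, j])

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b), a @ b)  # 32 32
```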

#### Array Comparisons

Comparisons between arrays of the same size yield Boolean arrays.

In [None]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

arr2 > arr

This result is commonly referred to as a *boolean mask*. Objects like this are commonly used to apply other operations only to the elements of the array identified by `True`.
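
A minimal sketch of how a mask might be used: indexing an array with the mask keeps only the `True` positions, and summing the mask counts them. The variable name `mask` is illustrative.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

mask = arr2 > arr      # Boolean array with the same shape as the inputs
print(mask)

# Keep only the elements of arr2 where the mask is True (the result is 1D)
selected = arr2[mask]
print(selected)        # the values 4., 7. and 12.

# True counts as 1, so summing the mask counts the matches
print(mask.sum())      # 3
```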

### Create Arrays from Scratch

A number of functions exist to create arrays from scratch.

`np.zeros` and `np.ones` will create an array of the specified shape filled with zeros, or ones, respectively.

In [None]:
np.zeros(10)

In [None]:
np.ones((3, 6))  # note inner parens!

Note: to create an array with 2 or more dimensions, you must pass a tuple of the desired shape. In this case the tuple **must** be enclosed in parentheses to avoid being interpreted as multiple arguments.

In [None]:
# this will be interpreted as two arguments, not one specifying the shape (raises TypeError)
np.ones(3, 6)

Here we see that `6` is being interpreted as the second argument, `dtype`.
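
If you want to see the failure without stopping the notebook, one option is to catch the exception, as in the sketch below. The name `grid` is illustrative.

```python
import numpy as np

try:
    np.ones(3, 6)            # 6 lands in the dtype slot and is not a valid data type
except TypeError as err:
    print("TypeError:", err)

grid = np.ones((3, 6))       # shape passed as a single tuple
print(grid.shape)            # (3, 6)
```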

In [None]:
help(np.ones)

To generate an array filled with a range of values, use `np.arange` with start, stop, and step parameters, which correspond to the same parameters of `range()` from base Python. Unlike `range()`, `np.arange` also accepts non-integer values. `arange` is limited to 1D arrays, but you can reshape the result into higher-dimensional arrays as we'll see later.

In [None]:
np.arange(10)

In [None]:
np.arange(0, 1, 0.1)

Be careful of floating-point precision issues when using `np.arange`. For exact fractional steps, `np.linspace` is often a more reliable way to create evenly spaced divisions:

In [None]:
# 0 to 1 in 11 evenly spaced points
np.linspace(0, 1, 11)
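
The sketch below illustrates where the precision concern comes from. Decimal fractions such as 0.1 have no exact binary representation, so arithmetic with a fractional step is rounded, whereas `np.linspace` takes the number of points and both endpoints directly. The names `stepped` and `spaced` are illustrative.

```python
import numpy as np

# 0.1 is not exactly representable in binary floating point
print(0.1 * 3 == 0.3)                       # False

# With arange, the element count is derived from rounded arithmetic
stepped = np.arange(0, 1, 0.1)
print(len(stepped), stepped[-1])

# With linspace, the count and both endpoints are stated explicitly
spaced = np.linspace(0, 1, 11)
print(len(spaced), spaced[0], spaced[-1])   # 11 0.0 1.0
```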

Several other basic array creation functions exist. The most commonly used are summarized in the following table.

![NumPy Array Creation (McKinney Table 4-1)](images/03a-numpy-array-creation.png)

### Random Generation and Selection

NumPy offers a number of ways to work with "random" data. For example, you can easily generate arrays of arbitrary size filled with random samples from various distributions.

In [None]:
# set the seed
np.random.seed(42)

# Random values
print("\nRandom uniform [0,1):")
print(np.random.random((2, 3)))

print("\nRandom normal (mean=0, std=1):")
print(np.random.normal(size=(2, 3)))
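
Setting the seed is what makes "random" results repeatable. As a minimal sketch, resetting the same seed reproduces the same values, while continuing without a reset moves on through the sequence. The names `first`, `second`, and `third` are illustrative.

```python
import numpy as np

np.random.seed(42)
first = np.random.random((2, 3))

np.random.seed(42)                     # reset to the same seed
second = np.random.random((2, 3))

print(np.array_equal(first, second))   # True: same seed, same values

third = np.random.random((2, 3))       # no reset, so the sequence continues
print(np.array_equal(first, third))
```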

It also provides tools for randomly reordering arrays or selecting elements from them.

`np.random.shuffle` reorders elements of the array in-place.

In [None]:
# Shuffle an array of card ranks
cards = np.array(list('23456789JQKA') + ["10"])
print("\nCards:")
print(cards)

np.random.shuffle(cards)
print("\nShuffled:")
print(cards)
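
Because the shuffle happens in place, `np.random.shuffle` returns `None` rather than a new array. If you need to keep the original order, `np.random.permutation` returns a shuffled copy instead. The sketch below shows both; the names `result`, `before`, and `shuffled_copy` are illustrative.

```python
import numpy as np

np.random.seed(42)
cards = np.array(list('23456789JQKA') + ["10"])

result = np.random.shuffle(cards)    # reorders cards itself
print(result)                        # None: nothing is returned

# permutation leaves its input alone and returns a shuffled copy
before = cards.copy()
shuffled_copy = np.random.permutation(cards)
print(np.array_equal(cards, before))                            # True
print(sorted(shuffled_copy.tolist()) == sorted(cards.tolist())) # True
```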

The functions in this section are all part of NumPy's random module (i.e., `np.random`). As such, they do not work as methods:

In [None]:
# raises AttributeError: an ndarray has no `random` attribute
cards.random.shuffle()

How is the function / method distinction made? Conceptually, random operations are things you do to an array, not an intrinsic property of the array itself, so they live in the `np.random` module rather than on the array object.

`np.random.choice` randomly selects elements from an existing array to create a new object.

In [None]:
# default is size one, e.g., a card:
card = np.random.choice(cards)
print("Card:", card)

# can choose more than one, e.g., a hand:
hand = np.random.choice(cards, size=5)
print("Hand:", hand)

# big hand
big_hand = np.random.choice(cards, size=10)
print("Ten cards:", big_hand)

Note that, by default, `choice` uses a uniform distribution (all choices equally likely) and samples with replacement (duplicates possible).

`choice` has options to control that behavior:

In [None]:
help(np.random.choice)

In [None]:
big_hand_wor = np.random.choice(cards, size=10, replace=False)
print("Without replacement:", big_hand_wor)
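
The other option worth knowing is `p`, which assigns a probability to each element. The sketch below uses a small hypothetical `suits` array (not part of the examples above) so the weights are easy to read; the probabilities must sum to 1.

```python
import numpy as np

np.random.seed(42)
suits = np.array(["clubs", "diamonds", "hearts", "spades"])  # hypothetical example data

# replace=False: no element can be drawn twice
unique_draw = np.random.choice(suits, size=4, replace=False)
print(sorted(unique_draw.tolist()))

# p: one probability per element; here hearts is heavily favored
weights = [0.125, 0.125, 0.625, 0.125]
weighted_draw = np.random.choice(suits, size=10, p=weights)
print(weighted_draw)
```

Requesting more elements than the array holds with `replace=False` raises an error, since there is nothing left to draw.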

### Array Manipulation Techniques

NumPy provides tools for converting between row and column form or changing the dimensions of an array. None of the methods described below adds or removes elements.

`np.reshape` changes the dimensions without changing the element order.

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print(arr)
print(arr.shape)

Here we've created a 1D array of 12 elements, represented by a *single* set of outer square brackets.

In [None]:
print("As row:", arr.reshape(1, 12))        # one row
print("As column:\n", arr.reshape(12, 1))   # one column
print("As table:\n", arr.reshape(3, 4))     # 3 rows, 4 cols
print("As cube:\n", arr.reshape(2, 2, 3))   # 2x2x3 3D array

The first three operations transform our 1D array into a 2D array; the last produces a 3D array. It is important to note that you can create 2D arrays that have only one row or column - see the first two outputs above. All 2D arrays are represented by *two* sets of outer square brackets, and 3D arrays by *three*.
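
Rather than counting brackets, you can ask each result for its number of dimensions directly. The sketch below uses the `ndim` and `size` attributes, and also shows that a shape which cannot hold exactly 12 elements is rejected. Here `np.arange(1, 13)` is just a shorter way to build the same 12 values.

```python
import numpy as np

arr = np.arange(1, 13)    # the values 1 through 12

for shape in [(1, 12), (12, 1), (3, 4), (2, 2, 3)]:
    reshaped = arr.reshape(shape)
    # ndim is the number of axes; size is the total element count
    print(shape, "-> ndim:", reshaped.ndim, "size:", reshaped.size)

# A shape whose dimensions do not multiply to 12 cannot hold the same elements
try:
    arr.reshape(5, 2)
except ValueError as err:
    print("ValueError:", err)
```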

`np.transpose` reverses the order of an array's axes by default. For a 2D array this flips it over its diagonal: rows become columns, and columns become rows. A 1D array has only one axis, so transposing it returns it unchanged.

In [None]:
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

transposed = arr.transpose()
print("\nTransposed (3,2):")
print(transposed)

A likely point of confusion: we're referring to these functions in the form `np.function` (e.g., `np.transpose(arr)`), but using them as object methods (e.g., `arr.transpose()`). As shown below, these are functionally equivalent.

In [None]:
np.transpose(arr)

Methods are often more concise, especially when chaining operations (e.g. `arr.transpose().sum()`), while functions may be clearer when working with multiple arrays or using NumPy operations in isolation. In practice, method notation is often preferred.

Also note that NumPy offers a shorthand method name for transpose, `.T`:

In [None]:
arr.T

Finally, `np.swapaxes` swaps *two* specific axes. This gives the same result as transpose for 2D arrays, but more control for higher dimensional data. We will explore it more when / if the need arises.

In [None]:
# specify the axis numbers to swap
arr.swapaxes(0, 1)
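
The extra control only becomes visible with three or more axes. As a small sketch, the hypothetical `cube` array below has shape `(2, 3, 4)`, and printing the resulting shapes shows which axes moved.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)   # hypothetical 3D array

print(cube.swapaxes(0, 1).shape)      # (3, 2, 4): only axes 0 and 1 trade places
print(cube.swapaxes(0, 2).shape)      # (4, 3, 2)
print(cube.transpose().shape)         # (4, 3, 2): all axes reversed
print(cube.transpose(1, 0, 2).shape)  # (3, 2, 4): explicit axis order
```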

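`transpose` and `swapaxes` return a view of the original data rather than a copy. `reshape` also returns a view whenever it can; it makes a copy only when the requested shape cannot be expressed over the existing memory layout (for example, when flattening a transposed array). Array size is not the deciding factor.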
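
Whether a result shares memory with the original can be checked with `np.shares_memory`. The sketch below is one way to explore this; the name `view` is illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(np.shares_memory(arr, arr.T))               # True
print(np.shares_memory(arr, arr.swapaxes(0, 1)))  # True
print(np.shares_memory(arr, arr.reshape(3, 2)))   # True

# Flattening the transpose needs a different element order in memory, so NumPy copies
print(np.shares_memory(arr, arr.T.reshape(6)))    # False

# Writing through a view changes the original array
view = arr.reshape(3, 2)
view[0, 0] = 99
print(arr[0, 0])                                  # 99
```

The practical consequence is that modifying a view also modifies the array it came from. Call `.copy()` when you need an independent array.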