# More on Numpy

In [1]:
import numpy as np

`numpy` arrays can have multiple dimensions, and there are various ways to create them.


## Create from lists

If you know all the elements that need to go into the array, 
you can use the basic construction from lists:

In [2]:
a = np.array([0, 1., 2.0, 3.0])
a

array([0., 1., 2., 3.])

In [3]:
b = np.array([
    [0.0, 0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2, 1.3],
    [2.0, 2.1, 2.2, 2.3]
    ])
b

array([[0. , 0.1, 0.2, 0.3],
       [1. , 1.1, 1.2, 1.3],
       [2. , 2.1, 2.2, 2.3]])

In [4]:
c = np.array([
    [
      [0.0, 0.1, 0.2, 0.3],
      [1.0, 1.1, 1.2, 1.3],
      [2.0, 2.1, 2.2, 2.3]
    ],
    [
      [100.0, 100.1, 100.2, 100.3],
      [101.0, 101.1, 101.2, 101.3],
      [102.0, 102.1, 102.2, 102.3]
    ]
    ])
c

array([[[0.000e+00, 1.000e-01, 2.000e-01, 3.000e-01],
        [1.000e+00, 1.100e+00, 1.200e+00, 1.300e+00],
        [2.000e+00, 2.100e+00, 2.200e+00, 2.300e+00]],

       [[1.000e+02, 1.001e+02, 1.002e+02, 1.003e+02],
        [1.010e+02, 1.011e+02, 1.012e+02, 1.013e+02],
        [1.020e+02, 1.021e+02, 1.022e+02, 1.023e+02]]])

## Metadata

A NumPy array (`np.ndarray`) provides a number of useful attributes that describe its contents.

In [5]:
def describe(x):
    print('Type:', type(x))
    print('Type of elements:', x.dtype)
    print('Number of elements:', x.size)
    print('Number of dimensions:', x.ndim)
    print('Number of elements in each dimension:', x.shape)

In [6]:
print('a')
describe(a)

print('-------------------')
print('-------------------')
print('b')
describe(b)

print('-------------------')
print('-------------------')
print('c')
describe(c)

a
Type: <class 'numpy.ndarray'>
Type of elements: float64
Number of elements: 4
Number of dimensions: 1
Number of elements in each dimension: (4,)
-------------------
-------------------
b
Type: <class 'numpy.ndarray'>
Type of elements: float64
Number of elements: 12
Number of dimensions: 2
Number of elements in each dimension: (3, 4)
-------------------
-------------------
c
Type: <class 'numpy.ndarray'>
Type of elements: float64
Number of elements: 24
Number of dimensions: 3
Number of elements in each dimension: (2, 3, 4)


## Create `np.array` with fixed values

To create an array of zeros:

In [7]:
np.zeros(5) 

array([0., 0., 0., 0., 0.])

To create an array of ones:

In [8]:
np.ones(5) 

array([1., 1., 1., 1., 1.])

To create an array populated with a fixed value:

In [9]:
np.full(5, 3.0) 

array([3., 3., 3., 3., 3.])

For multi-dimensional arrays, pass the shape as a tuple:

In [10]:
np.zeros((2,3)) 

array([[0., 0., 0.],
       [0., 0., 0.]])

In [11]:
np.zeros((2,3,5)) 

array([[[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]]])

In [12]:
np.ones((2,3)) 

array([[1., 1., 1.],
       [1., 1., 1.]])

In [13]:
np.ones((2,3,5)) 

array([[[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]]])

In [14]:
np.full((2,3), 3.) 

array([[3., 3., 3.],
       [3., 3., 3.]])

In [15]:
np.full((2,3,5), 3.) 

array([[[3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3.]],

       [[3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3.]]])
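
All of the arrays above hold `float64` values. As a minimal sketch, assuming you want a different element type, these functions also accept a `dtype` argument. The variable names below are only illustrative.

```python
import numpy as np

# Request integer zeros instead of the default floating-point zeros.
int_zeros = np.zeros((2, 3), dtype=int)

# Request boolean ones; every element is True.
bool_ones = np.ones(4, dtype=bool)

# np.full infers the element type from the fill value unless dtype is given,
# so the integer 7 is stored as a float here only because we ask for it.
float_sevens = np.full(3, 7, dtype=float)

print(int_zeros.dtype, bool_ones.dtype, float_sevens.dtype)
```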

## Create `np.array` with fixed values and the shape of another `np.array`

In [16]:
np.zeros_like(a) 

array([0., 0., 0., 0.])

In [17]:
np.zeros_like(b) 

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [18]:
np.zeros_like(c) 

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

In [19]:
np.ones_like(a) 

array([1., 1., 1., 1.])

In [20]:
np.ones_like(b) 

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [21]:
np.ones_like(c) 

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

In [22]:
np.full_like(a, 3.) 

array([3., 3., 3., 3.])

In [23]:
np.full_like(b, 3.) 

array([[3., 3., 3., 3.],
       [3., 3., 3., 3.],
       [3., 3., 3., 3.]])

In [24]:
np.full_like(c, 3.) 

array([[[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]],

       [[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]]])

## Create uninitialized empty `np.array`

The `empty` family of functions creates an array without initializing its
elements, which saves the cost of initialization when you do not yet have the
values at creation time. Beware that there is no guarantee what the values will
be. Until you explicitly set them, they may be "garbage".

In [25]:
np.empty(5) 

array([3., 3., 3., 3., 3.])

In [26]:
np.empty((2,3,5)) 

array([[[3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3.]],

       [[3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3.],
        [3., 3., 3., 3., 3.]]])

In [27]:
np.empty_like(a) 

array([3., 3., 3., 3.])

In [28]:
np.empty_like(b) 

array([[3., 3., 3., 3.],
       [3., 3., 3., 3.],
       [3., 3., 3., 3.]])

In [29]:
np.empty_like(c) 

array([[[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]],

       [[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]]])
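
A common way to use `np.empty` safely is to allocate the array and then overwrite every element before reading any of them. The following is only a sketch of that pattern; the name `squares` and the loop are illustrative.

```python
import numpy as np

n = 5
squares = np.empty(n)      # contents are arbitrary at this point

for i in range(n):
    squares[i] = i ** 2    # every element is overwritten before it is read

print(squares)
```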

## Create `np.array` with some common patterns

### Diagonal matrices

`np.identity` creates an identity matrix with `1.0` on the diagonal

In [30]:
np.identity(4) 

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

`np.eye` is a similar function

In [31]:
np.eye(4) 

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

but it offers more general functionality.

You can use it to create a non-square matrix with `1.0` on a diagonal

In [32]:
np.eye(4,5) 

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.]])

You can set the `1.0` on different "diagonals".

For example, the first upper diagonal

In [33]:
np.eye(4,5,1) 

array([[0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

the second upper diagonal

In [34]:
np.eye(4,5,2) 

array([[0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0.]])

the first lower diagonal

In [35]:
np.eye(4,5,-1) 

array([[0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.]])

the second lower diagonal

In [36]:
np.eye(4,5,-2) 

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.]])
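
The third positional argument used above is the diagonal offset, which `np.eye` names `k`. As a small sketch, passing it by keyword can make the call easier to read:

```python
import numpy as np

# np.identity(n) and np.eye(n) build the same square matrix.
same = np.array_equal(np.identity(4), np.eye(4))

# k=1 selects the first upper diagonal, exactly like np.eye(4, 5, 1).
shifted = np.eye(4, 5, k=1)

print(same)
print(np.array_equal(shifted, np.eye(4, 5, 1)))
```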

### `arange` and `linspace`

`arange` is an array-valued version of the built-in Python `range` function.

It can be used to create an array with a range of values, by providing the end of
the range

In [37]:
np.arange(3.0)

array([0., 1., 2.])

You can also give the first value and the end of the range (the end value itself is excluded)

In [38]:
np.arange(1.0, 6.0)

array([1., 2., 3., 4., 5.])

You can also provide the step size

In [39]:
np.arange(1.0, 10.0, 2.0)

array([1., 3., 5., 7., 9.])

`linspace` instead produces evenly spaced values over the interval provided.

You provide the beginning and end of the range, and the number of values

In [40]:
np.linspace(0., 100.0, 5)

array([  0.,  25.,  50.,  75., 100.])

The functions have more flexibility than what we show here. For all the possibilities,
refer to the official documentation.
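
To make the difference between the two functions concrete, here is a short sketch that builds similar ranges both ways. The `endpoint` keyword of `linspace` is one of the extra options mentioned above; the variable names are illustrative.

```python
import numpy as np

# arange takes a step size; the stop value itself is excluded.
by_step = np.arange(0.0, 1.0, 0.25)

# linspace takes a number of points; the stop value is included by default.
by_count = np.linspace(0.0, 1.0, 5)

# endpoint=False makes linspace exclude the stop value as well.
open_ended = np.linspace(0.0, 1.0, 4, endpoint=False)

print(by_step)
print(by_count)
print(open_ended)
```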


## Views vs copies

One important–and extremely useful–thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies. Consider a two-dimensional array:

In [41]:
x = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x

array([[7, 2, 7, 9],
       [7, 6, 4, 3],
       [5, 5, 9, 1]])

Let's extract a $2 \times 2$ subarray from this:

In [42]:
x_sub = x[:2, :2]
x_sub

array([[7, 2],
       [7, 6]])

Now if we modify this subarray, we'll see that the original array is changed! Observe:

In [43]:
x_sub[0, 0] = 99
x_sub

array([[99,  2],
       [ 7,  6]])

In [44]:
x

array([[99,  2,  7,  9],
       [ 7,  6,  4,  3],
       [ 5,  5,  9,  1]])

This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the `copy()` method:

In [45]:
x_sub_copy = x[:2, :2].copy()
x_sub_copy

array([[99,  2],
       [ 7,  6]])

If we now modify this subarray, the original array is not touched:

In [46]:
x_sub_copy[0, 0] = 42
x_sub_copy

array([[42,  2],
       [ 7,  6]])

In [47]:
x

array([[99,  2,  7,  9],
       [ 7,  6,  4,  3],
       [ 5,  5,  9,  1]])
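
If you are unsure whether two arrays refer to the same data, one way to check is `np.shares_memory`. A minimal sketch, using a fresh array named `data` so the example stands on its own:

```python
import numpy as np

data = np.arange(12).reshape((3, 4))
view = data[:2, :2]                 # basic slicing returns a view
independent = data[:2, :2].copy()   # copy() allocates a new buffer

print(np.shares_memory(data, view))
print(np.shares_memory(data, independent))
```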

## Reshaping

Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape method. For example, if you want to put the numbers 1 through 9 in a $3 \times 3$ grid, you can do the following:

In [48]:
grid = np.arange(1, 10).reshape((3, 3))
grid

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Note that for this to work, the size of the initial array must match the size of the reshaped array. Where possible, the reshape method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
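
As a brief sketch of the size requirement: reshaping nine elements into a $2 \times 5$ grid fails, while passing `-1` for one dimension asks `reshape` to infer that dimension from the total size. The variable names are illustrative.

```python
import numpy as np

values = np.arange(1, 10)   # nine elements

# A 2 x 5 grid needs ten elements, so this reshape is rejected.
try:
    values.reshape((2, 5))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# -1 lets NumPy work out the missing dimension (here 3).
inferred = values.reshape((3, -1))

print(mismatch_raised)
print(inferred.shape)
```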

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. This can be done with the reshape method, or more easily done by making use of the newaxis keyword within a slice operation:

In [49]:
x = np.array([1, 2, 3])
# row vector via reshape
x.reshape((1, 3)) 

array([[1, 2, 3]])

In [50]:
# row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [51]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [52]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])

You can also transpose a matrix using

In [53]:
grid.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

or

In [54]:
grid.transpose()

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

## Resizing

During reshaping, the total number of elements is unchanged. By contrast, resizing operations can increase or decrease the number of elements.

Given an array

In [55]:
g = np.arange(15)

Let's down-size to a $3 \times 1$ array

In [56]:
np.resize(g, (3, 1))

array([[0],
       [1],
       [2]])

or down-size to a $1 \times 5$ array

In [57]:
np.resize(g, (1, 5))

array([[0, 1, 2, 3, 4]])

or down-size to a $2 \times 5$ array

In [58]:
np.resize(g, (2, 5))

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

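We can also upsize, in which case `np.resize` fills the extra positions by repeating the elements of `g` from the start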

In [59]:
np.resize(g, (5, 4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14,  0],
       [ 1,  2,  3,  4]])
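
The same behavior is easier to see on a short one-dimensional array. In this sketch the names `short` and `longer` are illustrative:

```python
import numpy as np

short = np.arange(3)

# Seven elements are requested but only three exist, so np.resize
# cycles through the input again until the new shape is filled.
longer = np.resize(short, 7)

print(longer)
```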

## Array concatenation and splitting

It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We'll take a look at those operations here.

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines `np.concatenate`, `np.vstack`, and `np.hstack`. 

`np.concatenate` takes a tuple or list of arrays as its first argument, as we can see here:

In [60]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

You can also concatenate more than two arrays at once:

In [61]:
z = [99, 99, 99]
np.concatenate([x, y, z])

array([ 1,  2,  3,  3,  2,  1, 99, 99, 99])

It can also be used for two-dimensional arrays:

In [62]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [63]:
# concatenate along the first axis
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [64]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

For working with arrays of mixed dimensions, it can be clearer to use the `np.vstack` (vertical stack) and `np.hstack` (horizontal stack) functions:

In [65]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [66]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

Similarly, `np.dstack` will stack arrays along the third axis.
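
A small sketch of stacking along the third axis with `np.dstack`, using two illustrative $2 \times 3$ arrays:

```python
import numpy as np

first = np.array([[1, 2, 3],
                  [4, 5, 6]])
second = first * 10

# Each (row, column) position now holds one value from each input.
stacked = np.dstack([first, second])

print(stacked.shape)
print(stacked[0, 0])
```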

The opposite of concatenation is splitting, which is implemented by the functions `np.split`, `np.hsplit`, and `np.vsplit`. For each of these, we can pass a list of indices giving the split points:

In [67]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


Notice that N split points lead to N + 1 subarrays. The related functions `np.hsplit` and `np.vsplit` are similar:

In [68]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [69]:
upper, lower = np.vsplit(grid, [2])
print('upper:\n', upper)
print('lower:\n', lower)

upper:
 [[0 1 2 3]
 [4 5 6 7]]
lower:
 [[ 8  9 10 11]
 [12 13 14 15]]


In [70]:
left, right = np.hsplit(grid, [2])
print('left:\n', left)
print('right:\n', right)

left:
 [[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
right:
 [[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
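
Instead of a list of split points, `np.split` also accepts an integer giving the number of equal parts. The sketch below shows this, together with `np.array_split`, a related function not covered above that tolerates lengths that do not divide evenly.

```python
import numpy as np

# Six elements split into three equal parts of two elements each.
equal_parts = np.split(np.arange(6), 3)

# Seven elements cannot be divided evenly; array_split allows
# the parts to differ in length instead of raising an error.
uneven_parts = np.array_split(np.arange(7), 3)

print([part.tolist() for part in equal_parts])
print([part.tolist() for part in uneven_parts])
```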


## Array flattening

You can flatten a multidimensional array to a one-dimensional one. 

For example, given a grid

In [71]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

you can transform it to a 1-d array

In [72]:
grid.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

The default behavior is to concatenate the rows. You can explicitly require 
the row order

In [73]:
grid.flatten(order='C')

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

or the column order

In [74]:
grid.flatten(order='F')

array([ 0,  4,  8, 12,  1,  5,  9, 13,  2,  6, 10, 14,  3,  7, 11, 15])
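
`flatten` always returns a copy. A related method, `ravel`, returns a view when the memory layout allows it. As a sketch of the difference (the names `flat_copy` and `flat_view` are illustrative):

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

flat_copy = grid.flatten()   # always a new, independent array
flat_view = grid.ravel()     # a view here, because grid is contiguous

flat_view[0] = -1            # writes through to grid, not to flat_copy

print(grid[0, 0], flat_copy[0])
```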

## Broadcasting

Numpy also supports arithmetic operations between arrays of different shapes when possible.

The simplest example of broadcasting is when combining a scalar value with an array

In [75]:
arr = np.arange(5)
arr

array([0, 1, 2, 3, 4])

Multiplying the array by 2 will __broadcast__ the value 2 across all elements
of the array in the multiplication operation:

In [76]:
arr * 2

array([0, 2, 4, 6, 8])

Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix.

In [77]:
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])

We could stack 4 copies of `v` on top of each other

In [78]:
vv = np.tile(v, (4, 1)) 
vv

array([[1, 0, 1],
       [1, 0, 1],
       [1, 0, 1],
       [1, 0, 1]])

and use elementwise addition

In [79]:
y = x + vv 
y

array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10],
       [11, 11, 13]])

Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:

In [80]:
print('x shape:', x.shape, '\t', 'v shape:', v.shape)
z = x + v 
z

x shape: (4, 3) 	 v shape: (3,)


array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10],
       [11, 11, 13]])

The line `z = x + v` works even though `x` has shape `(4, 3)` and `v` has shape `(3,)` due to broadcasting. It is as if `v` actually had shape `(4, 3)`, where each row was a copy of `v`, and the sum was performed elementwise.

Broadcasting two arrays together follows these rules:

- If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
- The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
- The arrays can be broadcast together if they are compatible in all dimensions.
- After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
- In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension
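
The rules above can be sketched as a small function that works on shape tuples only. `broadcast_shape` is a hypothetical helper written for illustration, not part of NumPy, and it handles just two shapes.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Prepend 1s to the shorter shape until both have the same length.
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(padded_a, padded_b):
        # Compatible means equal sizes, or one of the sizes is 1.
        if size_a != size_b and size_a != 1 and size_b != 1:
            raise ValueError(f"incompatible sizes {size_a} and {size_b}")
        # The size-1 side is stretched to match the other side.
        result.append(size_b if size_a == 1 else size_a)
    return tuple(result)

# Compare the helper with what NumPy actually produces.
predicted = broadcast_shape((4, 3), (3,))
actual = (np.zeros((4, 3)) + np.zeros(3)).shape

# Shapes (4, 3) and (4,) disagree in the last dimension (3 vs 4).
try:
    broadcast_shape((4, 3), (4,))
    incompatible_raised = False
except ValueError:
    incompatible_raised = True

print(predicted, actual, incompatible_raised)
```

Here a column of shape `(3, 1)` and a row of shape `(4,)` would give `broadcast_shape((3, 1), (4,)) == (3, 4)`, since each array is stretched along the dimension where it has size 1.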

If this explanation does not make sense, try reading the explanation from the [documentation](https://numpy.org/doc/stable/user/basics.broadcasting.html) or this [explanation](http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc).

Functions that support broadcasting are known as universal functions. You can find the list of all universal functions in the [documentation](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs).
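
As one illustrative example, `np.maximum` is a universal function, so it broadcasts a one-dimensional array across the rows of a two-dimensional one. The arrays below are made up for the sketch.

```python
import numpy as np

x = np.array([[1, 5, 3],
              [4, 2, 6]])
floor = np.array([2, 2, 4])

# floor has shape (3,), so it is compared against each row of x.
clipped = np.maximum(x, floor)

print(isinstance(np.maximum, np.ufunc))
print(clipped)
```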

Broadcasting typically makes your code more concise and faster, but is sometimes tricky to grasp correctly.