# 1.2 – The Basics of NumPy Arrays

Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas are built around the NumPy array.
This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays.
While the types of operations shown here may seem a bit dry and pedantic, they comprise the building blocks of many other examples used throughout the book.
Get to know them well!

We'll cover a few categories of basic array manipulations here:

- *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays
- *Indexing of arrays*: Getting and setting the value of individual array elements
- *Slicing of arrays*: Getting and setting smaller subarrays within a larger array
- *Reshaping of arrays*: Changing the shape of a given array
- *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many

## NumPy Array Attributes

First let's discuss some useful array attributes.
We'll start by defining three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array.
We'll use NumPy's random number generator, which we will *seed* with a set value in order to ensure that the same random arrays are generated each time this code is run:

In [1]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
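
As an aside, recent NumPy releases also provide a generator-based interface created with ``np.random.default_rng``. The sketch below is not what the rest of this section uses; the names ``rng``, ``y1``, ``y2`` and ``y3`` are chosen here for illustration, and the values drawn will differ from those produced by the ``np.random.seed`` and ``randint`` calls above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, reproducible across runs

y1 = rng.integers(10, size=6)          # one-dimensional array, values in 0..9
y2 = rng.integers(10, size=(3, 4))     # two-dimensional array
y3 = rng.integers(10, size=(3, 4, 5))  # three-dimensional array

print(y1.shape, y2.shape, y3.shape)
```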

Each array has attributes ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), and ``size`` (the total size of the array):

In [2]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


Another useful attribute is the ``dtype``, the data type of the array:

In [3]:
print("dtype:", x3.dtype)

dtype: int64


Other attributes include ``itemsize``, which lists the size (in bytes) of each array element, and ``nbytes``, which lists the total size (in bytes) of the array:

In [4]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 8 bytes
nbytes: 480 bytes


In general, we expect that ``nbytes`` is equal to ``itemsize`` times ``size``.
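
A quick way to confirm this relationship is to compute both sides. The sketch below builds its own array; the name ``a`` and the explicit ``dtype=np.int64`` are choices made here so that the byte counts do not depend on the platform.

```python
import numpy as np

a = np.zeros((3, 4, 5), dtype=np.int64)  # 60 elements of 8 bytes each

print(a.itemsize * a.size)               # 480
print(a.nbytes)                          # 480
print(a.nbytes == a.itemsize * a.size)   # True
```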

## Array Indexing: Accessing Single Elements

If you are familiar with Python's standard list indexing, indexing in NumPy will feel quite familiar.
In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

In [5]:
x1  # the whole array

array([5, 0, 3, 3, 7, 9])

In [6]:
x1[0]  # the first element, indexing from zero!

5

In [7]:
x1[4]  # the fifth element, indexing from zero!

7

**Your turn.** Access the entries of `x1` that have values equal to ``3``:

In [8]:
# write your code here



To index from the end of the array, you can use negative indices:

In [9]:
x1[-1]  # the last element

9

In [10]:
x1[-2]  # the one before the last element

7

**Your turn.** Use negative indices to access the entries of `x1` that have values equal to ``3``:

In [11]:
# write your code here



In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

In [12]:
x2  # a two-dimensional array

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [13]:
x2[0, 0]  # the (0,0)-th element

3

In [14]:
x2[2, 0]  # the (2,0)-th element

1

In [15]:
x2[2, -1]  # the (2,3)-th element

7

**Your turn.** Access the elements of `x2` having values ``6``:

In [16]:
# write your code here



Values can also be modified using any of the above index notation:

In [17]:
x2[0, 0] = 12  # set the (0,0)-th element to 12
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

**Your turn.** In ``x2``, change the value ``2`` to ``3`` and the value ``1`` to ``9``. Then print ``x2`` to verify the result. You should obtain:

```
array([[12,  5,  3,  4],
       [ 7,  6,  8,  8],
       [ 9,  6,  7,  7]])
```

In [18]:
# write your code here



Keep in mind that, unlike Python lists, NumPy arrays have a fixed type.
This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!

In [19]:
x1[0] = 3.14159  # this will be truncated!
x1

array([3, 0, 3, 3, 7, 9])

If you try to assign an incompatible value, you will get an error message:

In [20]:
x1[0] = "a"

ValueError: invalid literal for int() with base 10: 'a'
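
If the fractional part matters, one option is to convert the array to a floating-point type before assigning. The following is a minimal sketch with illustrative names (``ints``, ``floats``); ``astype`` returns a new array and leaves the original untouched.

```python
import numpy as np

ints = np.array([5, 0, 3])
ints[0] = 3.14159             # stored as 3: the array keeps its integer dtype
print(ints, ints.dtype.kind)  # kind 'i' means signed integer

floats = ints.astype(float)   # new array with a floating-point dtype
floats[0] = 3.14159           # now the full value is kept
print(floats)
```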

## Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```
If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.

### One-dimensional subarrays

In [21]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
x[:5]  # first five elements

array([0, 1, 2, 3, 4])

In [23]:
x[5:]  # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [24]:
x[4:7]  # middle sub-array

array([4, 5, 6])

In [25]:
x[::2]  # every other element

array([0, 2, 4, 6, 8])

In [26]:
x[1::2]  # every other element, starting at index 1

array([1, 3, 5, 7, 9])

**Your turn.** Access all multiples of `3` in `x` using a single line of code. You should obtain: ```[3, 6, 9]```

In [27]:
# write your code here



A potentially confusing case is when the ``step`` value is negative.
In this case, the defaults for ``start`` and ``stop`` are swapped.
This becomes a convenient way to reverse an array:

In [28]:
x[::-1]  # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [29]:
x[5::-2]  # every other element in reverse, starting from index 5

array([5, 3, 1])
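
To see which defaults are filled in for a negative step, Python's built-in ``slice`` object can report the concrete ``(start, stop, step)`` it resolves to for a given length. This is only an illustrative sketch; the array name ``v`` is chosen here.

```python
import numpy as np

v = np.arange(10)

# With a negative step, start defaults to the last index
# and stop to "one position before index 0"
print(slice(None, None, -1).indices(len(v)))  # (9, -1, -1)
print(slice(5, None, -2).indices(len(v)))     # (5, -1, -2)

# The resolved triple can be passed to range() to list the selected indices
start, stop, step = slice(5, None, -2).indices(len(v))
print(list(range(start, stop, step)))  # [5, 3, 1]
print(v[5::-2])                        # [5 3 1]
```

Note that the ``-1`` reported as ``stop`` is meant for ``range``. Writing ``v[9:-1:-1]`` gives an empty array, because inside a slice ``-1`` refers to the last element.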

**Your turn.** Access all multiples of `3` in `x` in the reversed order with a single line of code. You should obtain: ```[9, 6, 3]```

In [30]:
# write your code here



### Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas.
For example:

In [31]:
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

In [32]:
x2[:2, :3]  # slice the first two rows and the first three columns

array([[12,  5,  2],
       [ 7,  6,  8]])

In [33]:
x2[-2:, -3:]  # slice the last two rows and the last three columns

array([[6, 8, 8],
       [6, 7, 7]])

In [34]:
x2[:, ::2]  # slice all rows, every other column

array([[12,  2],
       [ 7,  8],
       [ 1,  7]])

Finally, subarray dimensions can even be reversed together:

In [35]:
x2[::-1, ::-1]

array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 12]])

This is equivalent to calling ``np.flip`` without an ``axis`` argument, which reverses the array along every axis:

In [36]:
np.flip(x2)

array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 12]])
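
``np.flip`` also accepts an ``axis`` argument to reverse a single dimension only. The sketch below uses a small illustrative array ``a`` and compares each call with the matching slice expression.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
# a is [[0, 1, 2],
#       [3, 4, 5]]

print(np.flip(a, axis=0))  # rows reversed, same as a[::-1, :]
print(np.flip(a, axis=1))  # columns reversed, same as a[:, ::-1]

print(np.array_equal(np.flip(a), a[::-1, ::-1]))  # True
```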

**Your turn.** Access the bottom left $2 \times 3$ submatrix of `x2`, that is, if `x2` is
```
[[12  5  3  4]
 [ 7  6  8  8]
 [ 9  6  7  7]]
```
you should get
```
[[7 6 8]
 [9 6 7]]
```

In [37]:
# write your code here



#### Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array.
This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):

In [38]:
print(x2[:, 0])  # first column of x2

[12  7  1]


In [39]:
print(x2[0, :])  # first row of x2

[12  5  2  4]
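
Note that indexing with an integer drops that dimension, so a row or column selected this way comes back as a one-dimensional array. If a two-dimensional column is needed, a length-one slice keeps the axis. A small sketch, using an illustrative array ``a``:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

col_1d = a[:, 0]    # integer index: the column axis is dropped
col_2d = a[:, 0:1]  # length-one slice: the column axis is kept

print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
```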


**Your turn.** Access the middle two columns of `x2`:

In [40]:
# write your code here



In the case of row access, the empty slice can be omitted for a more compact syntax:

In [41]:
print(x2[0])  # equivalent to x2[0, :]

[12  5  2  4]


**Your turn.** Access the middle row of `x2` in two different ways:

In [42]:
# write your code here



### Subarrays as no-copy views

One important, and extremely useful, thing to know about array slices is that they return **views** rather than **copies** of the array data.
This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies.
Consider our two-dimensional array from before:

In [43]:
print(x2)

[[12  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


Let's extract a $2\times 2$ subarray from this:

In [44]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[12  5]
 [ 7  6]]


Now if we modify this subarray, we'll see that the original array is also changed! Observe:

In [45]:
x2_sub[0, 0] = 99  # modify the sub-array
print(x2_sub)

[[99  5]
 [ 7  6]]


In [46]:
print(x2)  # inspect the original array - 12 has been replaced with 99!

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


**Important!** This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
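
One way to check whether two arrays refer to the same underlying buffer is ``np.shares_memory``. The sketch below uses illustrative names (``a``, ``view``, ``lst``, ``part``) and contrasts a NumPy slice with a Python list slice.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
view = a[:2, :2]                  # slice: a view onto a's data
print(np.shares_memory(a, view))  # True

view[0, 0] = 99
print(a[0, 0])                    # 99: the change is visible through a

# Python lists behave differently: slicing a list makes a copy
lst = [0, 1, 2, 3]
part = lst[:2]
part[0] = 99
print(lst)                        # [0, 1, 2, 3]
```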

### Creating copies of arrays

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the ``copy()`` method:

In [47]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


If we now modify this subarray, the original array is not touched:

In [48]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

[[42  5]
 [ 7  6]]


In [49]:
print(x2)  # no changes to the (0,0)-th entry

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


**Your turn.** Create a copy of the bottom right $2 \times 2$ subarray of `x2`. Then set the diagonal values of the subarray to `-3` and `-6`. Then print both the original array and the subarray and verify that the original array has not changed. You should get
```
x2_sub_copy:
[[-3  8]
 [ 7 -6]]
x2:
[[99  5  3  4]
 [ 7  6  8  8]
 [ 9  6  7  7]]
```

In [50]:
# write your code here



In [51]:
# write your code here



## Reshaping of Arrays

Another useful type of operation is reshaping of arrays.
The most flexible way of doing this is with the ``reshape`` method.
For example, if you want to put the numbers 1 through 9 in a $3 \times 3$ grid, you can do the following:

In [52]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


Note that for this to work, the size of the initial array must match the size of the reshaped array. 
Where possible, the ``reshape`` method will use a **no-copy view** of the initial array, but with non-contiguous memory buffers this is not always the case.
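
The difference between the two cases can be observed with ``np.shares_memory``. This is a minimal sketch with illustrative names; it relies on the fact that a transposed array is not laid out contiguously in memory, so flattening it forces a copy.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

same_buffer = a.reshape(3, 2)  # contiguous input: reshape can return a view
print(np.shares_memory(a, same_buffer))  # True

flat_t = a.T.reshape(6)        # transposed input is non-contiguous: reshape copies
print(np.shares_memory(a, flat_t))       # False
print(flat_t)                            # [0 3 1 4 2 5]
```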

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix.
This can be done with the ``reshape`` method, or more easily by using ``np.newaxis`` within a slice operation:

In [53]:
# create a NumPy array
x = np.array([1, 2, 3]) 
x

array([1, 2, 3])

In [54]:
# row vector via reshape - observe the structure of the square brackets!
x.reshape((1, 3)) 

array([[1, 2, 3]])

In [55]:
# row vector via newaxis
x[np.newaxis, :] 

array([[1, 2, 3]])

In [56]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [57]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])

We will see this type of transformation often throughout the course.
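
For reference, ``np.newaxis`` is simply an alias for ``None``, and ``np.expand_dims`` offers a function-call form of the same operation. A short sketch, using an illustrative vector ``v``:

```python
import numpy as np

v = np.array([1, 2, 3])

print(np.newaxis is None)               # True
print(v[None, :].shape)                 # (1, 3), same as v[np.newaxis, :]
print(np.expand_dims(v, axis=1).shape)  # (3, 1), same as v[:, np.newaxis]
```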

One entry of the new shape can also be ``-1`` (e.g. ``(2, -1)`` or ``(-1, 3)``, but not ``(-1, -1)``). This marks the dimension as unknown, and NumPy infers its length from the total number of elements and the remaining dimensions, so that the size of the reshaped array still matches the size of the original.

In [58]:
x = np.array(([[1,2,3],
               [4,5,6]]))

# numpy will replace -1 with 2 
x.reshape((3,-1))

array([[1, 2],
       [3, 4],
       [5, 6]])

**Your turn.** Create a NumPy array of integers from 1 to 12. Then reshape it into a $3\times 4$ array in two different ways. You should obtain:
```
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
```

In [59]:
# write your code here



## Array Concatenation and Splitting

All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We'll take a look at those operations here.

### Concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.
``np.concatenate`` takes a tuple or list of arrays as its first argument, as we can see here:

In [60]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])  # joins the two arrays

array([1, 2, 3, 3, 2, 1])

You can also concatenate more than two arrays at once:

In [61]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))  # joins all three (z is a plain list and is converted)

[ 1  2  3  3  2  1 99 99 99]


It can also be used for two-dimensional arrays:

In [62]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [63]:
# concatenate along the first axis (axis=0, the default), i.e. vertically
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [64]:
# concatenate along the second axis - horizontally
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

When working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:

In [65]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [66]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

Similarly, ``np.dstack`` will stack arrays along the third axis.
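
A minimal sketch of ``np.dstack``, using two illustrative $2 \times 3$ arrays ``a`` and ``b``: the inputs are placed side by side along a new third axis, so the result has shape ``(2, 3, 2)``.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = a * 10

stacked = np.dstack([a, b])
print(stacked.shape)  # (2, 3, 2)
print(stacked[0, 0])  # [ 1 10]
```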

**Your turn.** Create two $3 \times 4$ arrays of random integers. Then stack them vertically and horizontally in two different ways.

In [67]:
# write your code here



### Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.  For each of these, we can pass a list of indices giving the split points:

In [68]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


Notice that *N* split points lead to *N + 1* subarrays.
The related functions ``np.hsplit`` and ``np.vsplit`` are similar:

In [69]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [70]:
upper, lower = np.vsplit(grid, [2])  # split at row index 2
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [71]:
left, right = np.hsplit(grid, [2])  # split at column index 2
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


Similarly, ``np.dsplit`` will split arrays along the third axis.
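
A short sketch of ``np.dsplit`` on an illustrative three-dimensional array ``cube``; as with the other split functions, the list gives the split points along the axis being divided.

```python
import numpy as np

cube = np.arange(12).reshape(2, 2, 3)

front, back = np.dsplit(cube, [1])  # split at index 1 along the third axis
print(front.shape)  # (2, 2, 1)
print(back.shape)   # (2, 2, 2)
```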

It is also possible to slice an array into equal parts by indicating the number of slices only:

In [72]:
x = np.arange(9)
np.split(x, 3)

[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

However, NumPy will raise an error if the array cannot be divided into equal parts:

In [73]:
x = np.arange(9)
np.split(x, 4)

ValueError: array split does not result in an equal division
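
When an uneven division is acceptable, ``np.array_split`` can be used instead: it takes the same arguments but lets the parts differ in length. A minimal sketch, with illustrative names ``v`` and ``parts``:

```python
import numpy as np

v = np.arange(9)
parts = np.array_split(v, 4)  # the first part gets the extra element

print([p.tolist() for p in parts])  # [[0, 1, 2], [3, 4], [5, 6], [7, 8]]
```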

Splitting is often used to randomly split a large dataset into smaller parts (minibatches):

In [74]:
x = np.arange(100)    # your original data
np.random.shuffle(x)  # shuffle the data

x_split = np.split(x,np.arange(20,100,20))  # split into five equal parts of size 20, the "minibatches"
for i in x_split: 
    print(i)

[25 56 31 60 68 51  7  5 86 28 54 94  9  6 73 45 74 97  4 96]
[53 12 77 57 55 81 26 29 39 33  8 99 65 37 61 44 24 63 89 19]
[38 72 32 16 66 11 40 70 22 71 95  1 84 14 27 90 17 93 82 46]
[18 75 67 30 83 62  2 59 91 92 43 10 80 21 42 98 87 34 36 50]
[ 0 88 64 35 41 69 49 48 85 13 79 23 58 20 15 78 52 76  3 47]


**Your turn.** Create an array of integers from 1 to 81. Then reshape it into a $9\times9$ array and split it into nine $3\times 3$ arrays.

In [75]:
# write your code here



## Example: Pooling 

Pooling is a sample-based discretization process. The objective is to down-sample an input representation (image, hidden-layer output matrix in deep neural networks, etc.), reducing its dimensionality and allowing for assumptions to be made about features contained in the sub-regions binned. The most common pooling types are max pooling and average pooling. 

Suppose we have a $4\times4$ matrix representing our initial input and a $2\times 2$ kernel that we'll run over our input. We'll have a stride of 2 (meaning the ($dx$, $dy$) for stepping over our input will be (2, 2)) and won't overlap regions.

For each of the regions represented by the kernel, we will take the max or average of that region and create a new output matrix where each element is the max or average of a region in the original input:

![pooling](figures/pooling.png)

In the case of the blue sub-array, we have:

In [76]:
image = np.array([[8,7,5,3],[12,9,5,7],[13,2,10,3],[9,4,5,14]])
print("Max pooling:", image[:2,:2].max())
print("Average pooling:", image[:2,:2].mean())

Max pooling: 12
Average pooling: 9.0


Suppose, more generally, that the input image is an $m \times n$ array and the kernel is of size $k \times l$. For simplicity, suppose further that $k$ divides $m$ and $l$ divides $n$. Then the pooling operation can be achieved using the ``.reshape()`` method:

In [77]:
m, n = 6, 6  # image dimensions
k, l = 2, 2  # kernel dimensions

image = np.random.randint(0,255,size=(m,n))  # a grayscale image with values in [0, 254] (the upper bound 255 is exclusive)
image

array([[117, 189,  83, 161, 104, 160],
       [228, 251, 251, 121,  70, 213],
       [ 31,  13,  71, 184, 152,  79],
       [ 41,  18,  40, 182, 207,  11],
       [166, 111,  93, 249, 129, 223],
       [118,  44, 216, 125,  24,  67]])

In [78]:
image.reshape(m//k, k, n//l, l)  # reshape into a 4-dimensional array

array([[[[117, 189],
         [ 83, 161],
         [104, 160]],

        [[228, 251],
         [251, 121],
         [ 70, 213]]],


       [[[ 31,  13],
         [ 71, 184],
         [152,  79]],

        [[ 41,  18],
         [ 40, 182],
         [207,  11]]],


       [[[166, 111],
         [ 93, 249],
         [129, 223]],

        [[118,  44],
         [216, 125],
         [ 24,  67]]]])

The result is a four-dimensional array of shape $\frac{m}{k} \times k \times \frac{n}{l} \times l$. Axes 1 and 3 (counting from zero) index the rows and columns inside each kernel region, so these are the axes we need to pool over:

In [79]:
print(image.reshape(m//k, k, n//l, l).max(axis=(1, 3)))  # max pooling

[[251 251 213]
 [ 41 184 207]
 [166 249 223]]


In [80]:
print(image.reshape(m//k, k, n//l, l).mean(axis=(1, 3)))  # average pooling

[[196.25 154.   136.75]
 [ 25.75 119.25 112.25]
 [109.75 170.75 110.75]]
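
The reshape-and-reduce steps above can be wrapped in a small function. ``pool2d`` below is a hypothetical helper written for this illustration (it is not a NumPy function), and it keeps the assumption that $k$ divides $m$ and $l$ divides $n$. It is applied to the $4\times4$ input from the start of this example, here named ``small``.

```python
import numpy as np

def pool2d(image, k, l, reduce=np.max):
    """Pool an (m, n) array over non-overlapping (k, l) blocks.

    Hypothetical helper: assumes k divides m and l divides n.
    """
    m, n = image.shape
    if m % k or n % l:
        raise ValueError("kernel size must divide the image size")
    # axes 1 and 3 run over the rows and columns inside each block
    blocks = image.reshape(m // k, k, n // l, l)
    return reduce(blocks, axis=(1, 3))

small = np.array([[8, 7, 5, 3],
                  [12, 9, 5, 7],
                  [13, 2, 10, 3],
                  [9, 4, 5, 14]])

print(pool2d(small, 2, 2))           # max pooling
print(pool2d(small, 2, 2, np.mean))  # average pooling
```

Passing the reduction as an argument keeps a single code path for both pooling types; any function that accepts a tuple ``axis`` argument, such as ``np.min``, would work as well.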


These operations play a key role in [Convolutional Neural Networks (CNNs)](https://en.wikipedia.org/wiki/Convolutional_neural_network). You will learn more about them in the Machine Learning and Neural Networks module.

## Exercises

**Exercise 1.2.1** Create an $n$-dimensional array of size $m_1 \times m_2 \times \cdots \times m_n$, where $n, m_1, \ldots, m_n$ are random integers in the range between 1 and 10. Then:

- print the number of dimensions of the array
- print the shape of the array
- print the size of the array

- print the data type of the array
- print the size (in bytes) of each entry of the array
- print the size (in bytes) of the whole array
- verify that the size (in bytes) of the whole array equals the size (in bytes) of each entry multiplied by the size of the array

In [81]:
# write your solution here



**Exercise 1.2.2** Create a vector with values ranging from 15 to 35. Then:
- print all values except the first and the last one
- print all the even-indexed values
- print all the odd-indexed values
- print every third value
- print all values in the reversed order

In [82]:
# write your solution here



**Exercise 1.2.3** Create a $10\times10$ array, in which the elements on the borders are set to 1, and inside to 0.

In [83]:
# write your solution here



**Exercise 1.2.4** Create a $5\times5$ array of random integers from 0 to 9. Then:
- print the second row
- print the third column
- print the inner $3\times3$ matrix
- make a copy of the inner $3\times3$ matrix
- set the value of the middle element of the copied array to 1 and print the array 

In [84]:
# write your solution here



**Exercise 1.2.5** Create a $3\times4$ array of integers from 10 to 21. Then print the whole array and each element of the array individually.

*Hint: Explore NumPy's ``nditer``, and use ``print(x, end=" ")`` to print all elements on a single line*.

In [85]:
# write your solution here



**Exercise 1.2.6** Create a $2\times10$ array of random integers and reshape it into a $4\times5$ array.

In [86]:
# write your solution here



**Exercise 1.2.7** Create two $3\times3$ arrays of random integers. Then stack them horizontally and vertically in two different ways.

In [87]:
# write your solution here



**Exercise 1.2.8** Create a 9-component random vector. Then split it into three equal parts.

In [88]:
# write your solution here



**Exercise 1.2.9** Create a $4\times10$ array of random integers. Then:
- split it horizontally into two equal parts;
- split it vertically into two equal parts.

In [89]:
# write your solution here



**Exercise 1.2.10** Create a $5\times5$ array with integers from 1 to 5 in each row. Then do the same for each column instead.

In [90]:
# write your solution here



**Exercise 1.2.11** Create the following pattern without hardcoding:
```
[1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
```
Use only numpy functions and the below input array `a`.

 *Hint:* explore numpy methods ``r_``, ``repeat`` and ``tile``

In [91]:
# write your solution here




*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; also available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*