
# Basics of NumPy

This tutorial shows you how to construct multi-dimensional arrays and perform basic operations on them with the NumPy library. On their own, multi-dimensional arrays and the functions NumPy provides for them may well seem rather dry. But the basics shown here, especially the underlying concepts and syntax, are tools that are used daily when manipulating data in the field of data science.

The following topics are covered in this tutorial:

- Construction
- Attributes
- Indexing
- Slicing
- Combined indexing & slicing
- Views & copies
- Reshaping
- Combining and splitting


**Disclaimer:** Please note that the NumPy library is constantly evolving over time and that some features and functions become obsolete and are replaced by new ones. However, this does not change the general way of working with NumPy.

# Setup NumPy

Like every other library, NumPy must be imported in Python with the import keyword. And it is typically given the alias name **np**.

With the attribute **\_\_version\_\_**, we can get the version of the library and print it. (Some functions might only be available starting with a certain version number.)

We also set the seed of the random number generator to a specific value. This has the advantage that when the notebook is restarted and the cells are executed in the same sequence, the randomly generated numbers are the same in every run. But be aware that if you execute cells several times, you draw more random numbers from the generator, and this changes the numbers generated in the following cells. If you want the same random numbers as in the material we provide, just restart the notebook once more and run it up to the cell you want to continue with.

In [None]:
import numpy as np
print('Numpy version:', np.__version__)
# set the random number generator to a specific seed value,
# so that the same random numbers are returned
np.random.seed(123)


Numpy version: 1.21.6
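
The effect of the seed can be checked directly. The following is a minimal sketch (the variable names first, second, and third are chosen here only for illustration): re-seeding with the same value reproduces the same numbers, while drawing again without re-seeding continues the sequence. Note that it resets the global generator, so it is best run in a separate session.

```python
import numpy as np

np.random.seed(123)
first = np.random.randn(3)

# re-seeding with the same value resets the generator state
np.random.seed(123)
second = np.random.randn(3)

# drawing again WITHOUT re-seeding continues the sequence
third = np.random.randn(3)

print(np.array_equal(first, second))   # same seed, same numbers
print(np.array_equal(second, third))   # the generator has moved on
```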


# Construction

There are many convenient ways to construct multidimensional arrays in NumPy and to fill them with values. In the following, only a few are shown. If you need an array with some specific values, take a look at the NumPy documentation and see what functions are offered for this purpose. We cannot cover all of them (and it would also not be that interesting after a while).

A NumPy array can simply be constructed from a Python list that provides values.

In [None]:
a0 = np.array([0, 1, 2, 3])
print(a0)


[0 1 2 3]


By passing a list of lists to **array()**, multi-dimensional arrays can also be constructed. Make sure you provide the right number of square brackets and in the correct order. It is easy to forget brackets or put them in the wrong order when constructing higher-dimensional arrays this way.

In [None]:
a00 = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]])
print(a00)

[[0 1 2 3]
 [4 5 6 7]
 [8 9 0 1]]


If you need a large array to test something out, you can use the **randn()** function of the **random** module of NumPy to get random floating point numbers that are sampled from the standard normal (Gaussian) distribution (of mean 0 and variance 1).

Note that the **print()** function outputs the values of the array within square brackets ([, ]). It also conveniently shortens the output if it is too long (see the three dots in the middle).

In [None]:
a1 = np.random.randn(10000000)
print(a1)

[-1.0856306   0.99734545  0.2829785  ... -0.52814446  0.08482555
  0.9228148 ]


With the **uniform()** function, an array of a given shape is generated by sampling random values from a uniform distribution within a given interval [low, high). This means that each value in the given range has the same probability of being included in the array. Please note that the interval is open at the high end: the function may include the low value in the array, but excludes the high value.

The dimensions are neatly organized in the output in pairs of nested brackets. The first dimension ("rows") corresponds to the outer brackets, and the second dimension ("columns") to the inner brackets.

In [None]:
# generate an array of shape (3, 5)
# and initialize with values in the interval [0, 100) (excluding 100 itself)
a2 = np.random.uniform(0, 100, (3, 5))
print(a2)

[[20.20499052 34.38819988 66.25202847 77.36564441 29.7074708 ]
 [57.6617209  27.24146786 78.76757381 44.947833   84.01904067]
 [76.28783409 15.38079956 61.26190759 68.79159952 11.14729942]]
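
The half-open interval can be checked on a larger sample. This is a small sketch under some assumptions: the sample size of 100000 and the seed 0 are arbitrary, and a separate RandomState object is used so that the global generator seeded above is not disturbed.

```python
import numpy as np

# a local generator, independent of np.random.seed(); the seed 0 is arbitrary
local_rng = np.random.RandomState(0)
sample = local_rng.uniform(0, 100, (100000,))

# all values lie in [0, 100): the low end is allowed, the high end is not
print(sample.shape)
print(sample.min() >= 0)
print(sample.max() < 100)
```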


In [None]:
a7 = np.random.uniform(0, 50, (6, 2, 7, 5, 9, 2, 3))
print(a7)

[[[[[[[2.02139983e+01 1.47392196e+01 9.74256468e+00]
      [7.61863404e+00 4.63134294e+01 3.08467491e+01]]

     [[3.84131153e+01 1.10475149e+01 4.77436047e+01]
      [1.25995136e+01 2.11552427e+00 8.99503875e+00]]

     [[8.19476668e+00 4.17740850e+01 1.96829535e+01]
      [3.77169985e+01 1.99618862e+01 3.99234268e+01]]

     ...

     [[1.18803501e+01 2.59655545e+01 4.14475767e+01]
      [4.92101295e+01 1.26780939e+01 3.35702570e+01]]

     [[4.61551247e+01 1.14032290e+01 1.11288516e+01]
      [1.45821987e+01 3.72715918e+01 3.13107028e+01]]

     [[3.13796407e+01 1.73772346e+01 4.53959218e+01]
      [4.11018775e+01 4.23862683e+00 4.95558696e+01]]]


    [[[2.20204690e+01 1.35957881e+01 3.85285704e-01]
      [1.26498049e+01 2.13537336e+01 1.38975849e+01]]

     [[1.70478872e+01 3.96844426e+01 6.51224968e+00]
      [4.34087142e+01 4.27480879e+01 2.31273211e+01]]

     [[6.16571336e-01 1.94615823e+01 4.14430890e+01]
      [1.42775272e+01 1.59914301e+01 1.88164093e+01]]

     ...

     [

There are also convenient ways to construct arrays filled with zeros, ones, or a given value with the **zeros()**, **ones()**, and **full()** functions. Pass the shape of the array as a tuple as the first argument.

In [None]:
z = np.zeros((2, 5))
print(z)
print()
o = np.ones((3, 4, 2))
print(o)
print()
n = np.full((3, 6), 9)
print(n)


[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

[[[1. 1.]
  [1. 1.]
  [1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]
  [1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]
  [1. 1.]
  [1. 1.]]]

[[9 9 9 9 9 9]
 [9 9 9 9 9 9]
 [9 9 9 9 9 9]]


Also useful is the 2D identity matrix, which is constructed with the **eye()** function. As the identity matrix is square, the function only requires one size value for the dimension and not a tuple of values.

In [None]:
e = np.eye(5)
print(e)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


For all the above construction functions, you can define the data type of the array with the named **dtype** argument.

Common data types are **int64** (integer), **uint64** (unsigned integer), **float64**, and **complex64**, where the number gives the size of a value in bits. Integers are also available with 8, 16, and 32 bits, floating point numbers with 16 and 32 bits, and complex numbers with 128 bits.

In [None]:
ui = np.full((2, 3), 7, dtype=np.uint16)
print(ui)
print()
print(ui.dtype)


[[7 7 7]
 [7 7 7]]

uint16
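
As a further illustration, the following sketch passes **dtype** to several of the construction functions shown above. The variable names are made up for this example.

```python
import numpy as np

af = np.array([1, 2, 3], dtype=np.float64)   # integers stored as floats
ab = np.array([0, 1, 2], dtype=bool)         # nonzero values become True
zi = np.zeros((2, 3), dtype=np.int32)        # integer zeros instead of 0.
ei = np.eye(3, dtype=np.uint8)               # identity matrix of small integers

print(af, af.dtype)
print(ab, ab.dtype)
print(zi.dtype, ei.dtype)
```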


# Attributes

When working interactively with Python and receiving multidimensional arrays as a result of a function, you often want to know what the shape and data type of the returned array is. Especially with NumPy functions, one often wants to aggregate the values in a particular dimension, so that the dimensions are reduced, or to combine arrays, so that further dimensions are created. By checking the following attributes, one can often determine if the executed function performed as expected. If the shape of the array returned is not right, it is very likely that the calculation done by the function was also not performed as expected. In many cases, you then have to be more specific on which axis of the array the function should be performed.

Remember, in contrast to methods (functions associated with a class), attributes do not need parentheses.

The attribute **ndim** gives the number of dimensions.

In [None]:
print(a1.ndim)
print(a2.ndim)
print(a7.ndim)


1
2
7


More interesting is the size of each dimension of the array, which is given by the attribute **shape** of each array.

In [None]:
print(a1.shape)
print(a2.shape)
print(a7.shape)


(10000000,)
(3, 5)
(6, 2, 7, 5, 9, 2, 3)


Check the next code cell to verify that the attribute is actually a tuple. (Remember that the Python function **type()** returns the type of an object.)

In [None]:
type(a1.shape)
type(a7.shape)
type(a2.shape)

tuple

Although the function **type()** returns numpy.ndarray for an instance of a NumPy array, it cannot be used to retrieve the type of the values stored in a NumPy array.

In [None]:
type(a1)


numpy.ndarray

To retrieve the type of the values stored in a NumPy array, use the **dtype** attribute instead. (Now you should also see the difference between a function, e.g. type(), and an attribute, e.g. dtype, and their syntactical use on a variable.)

In [None]:
print(a1.dtype)
print(a0.dtype)

float64
int64


The variable a1 is therefore an object of the type NumPy array, which in turn stores values of the type floating point number (real number).

The attribute **size** holds the total number of elements in the array. 

In [None]:
print(a1.size)
print(a7.size)

10000000
22680


Less interesting are the attributes **itemsize**, which holds the size of each array element in bytes, and **nbytes**, which holds the total size of the array in bytes. (Note that a 64-bit floating point number (float64) is 8 bytes long. The attribute **nbytes** is equal to size * itemsize.)

In [None]:
print(a2.itemsize)
print(a7.nbytes)
print(a7.size * a7.itemsize)

8
181440
181440
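
The relationship between these attributes can be sketched on a small array. The array arr and its shape are chosen here only for illustration, and float32 is used so that the element size differs from the examples above.

```python
import numpy as np

arr = np.zeros((6, 2, 7), dtype=np.float32)

# size is the product of all entries in shape: 6 * 2 * 7
n_elements = int(np.prod(arr.shape))
print(arr.size, n_elements)        # 84 84

# float32 uses 4 bytes per element, so nbytes = 84 * 4
print(arr.itemsize, arr.nbytes)    # 4 336
```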


# Indexing

Accessing the elements of an array in NumPy is done in the same way as accessing the elements of a Python list using square brackets **[]**. As in many other programming languages, Python and NumPy use zero-based numbering, so that the first element of an array is accessed at index 0 (written in square bracket notation as [0]), the second element is accessed at index 1, the third at index 2, etc.

In [None]:
# first element of a1 at index 0
print(a1[0])
# seventh element of a1 at index 6
print(a1[6])

-1.0856306033005612
-2.426679243393074


Using the **range()** function of Python, we can create a range of n values from 0 to n-1 with **range(n)**, and loop over all these values in a for-loop and assign the current value at each iteration of the loop to a running variable, which is in the example named i. With this running variable i, we can access the first n elements of an array one after another.

In [None]:
for i in range(10):
    print(f'a1[{i}] : {a1[i]}')

a1[0] : -1.0856306033005612
a1[1] : 0.9973454465835858
a1[2] : 0.28297849805199204
a1[3] : -1.506294713918092
a1[4] : -0.5786002519685364
a1[5] : 1.651436537097151
a1[6] : -2.426679243393074
a1[7] : -0.42891262885617726
a1[8] : 1.265936258705534
a1[9] : -0.8667404022651017


As in Python, negative index numbers can be used to index backwards starting from the last element of an array. The last element is then indexed with **[-1]**, the second to last with **[-2]**, etc.

In [None]:
print(a1[-1])
print(a1[-3])
print(a1[-4])

0.9228147982468194
-0.5281444572478659
0.8235542397139756


To access single elements in a **multidimensional array**, you need to provide a comma-separated tuple of indices.

In [None]:
print(a2)
print()
print(a2[2, 1])
print()
# Remember the shape of a7 is (6, 2, 7, 5, 9, 2, 3)
print(a7[1, 0, 4, 3, 7, 0, 1])

[[20.20499052 34.38819988 66.25202847 77.36564441 29.7074708 ]
 [57.6617209  27.24146786 78.76757381 44.947833   84.01904067]
 [76.28783409 15.38079956 61.26190759 68.79159952 11.14729942]]

15.380799556658042

44.07216208890629


You can not only read values from an array, but also set values at a given index.

In [None]:
a2[1, 2] = 3.123
print(a2)

[[20.20499052 34.38819988 66.25202847 77.36564441 29.7074708 ]
 [57.6617209  27.24146786  3.123      44.947833   84.01904067]
 [76.28783409 15.38079956 61.26190759 68.79159952 11.14729942]]


But be aware that NumPy arrays store all values of the array in the same data type. If you try to set an array element with a value that is not the same type as the values of the array, then this value will be silently converted to make it fit. Do not be surprised if this happens.

In [None]:
print(a0)
a0[2] = 7.56
# the decimals are not stored: the floating point number is
# first converted to an integer, losing the decimals
# in the process
print(a0)

[0 1 2 3]
[0 1 7 3]
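
If the decimals are needed, one option is to convert the array to a floating point type first. The sketch below uses the **astype()** method, which returns a converted copy; the names ints and floats are chosen for this example only.

```python
import numpy as np

ints = np.array([0, 1, 2, 3])

# astype() returns a new array with the requested data type
floats = ints.astype(np.float64)

floats[2] = 7.56     # stored with decimals
ints[2] = 7.56       # decimals are cut off, the array stays integer

print(floats)
print(ints)
```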


# Slicing

While we access individual array elements using indexing, we access subarrays using slicing. As with Python lists, slicing NumPy arrays uses the notation with square brackets and the colon (:) character.

A slice of an array is accessed with **[start : stop : step]**, where a half-open interval is assumed, so that the element at index start is included but the element at index stop is not.

In [None]:
r10 = np.random.randint(10, size=(10))
print(r10)
print(r10[0:5:1])
print(r10[2:8:1])
print(r10[0:10:2])

[4 9 7 2 9 9 1 0 0 6]
[4 9 7 2 9]
[7 2 9 9 1 0]
[4 7 9 1 0]


What makes slicing very confusing at first is that any of the 3 values (start, stop, and step) can be omitted and will then take on a **default value (start=0, stop=size of dimension, step=1)**. For now, we still write out both colons.

The same slices from above could also be written in the following way.

In [None]:
print(r10[:5:])
print(r10[2:8:])
print(r10[::2])

[4 9 7 2 9]
[7 2 9 9 1 0]
[4 7 9 1 0]


To make matters worse, the step value can be negative, and then the defaults for start and stop change as well: the slice starts at the last element and runs backwards up to and including the first element.

In [None]:
print(r10[::-1])
print(r10[8::-1])

[6 0 0 1 9 9 2 7 9 4]
[0 0 1 9 9 2 7 9 4]
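
The default values can be made visible with Python's built-in slice object, whose **indices()** method reports the concrete (start, stop, step) for a sequence of a given length. This is only a sketch to illustrate the defaults; the length n = 10 matches the array r10 above.

```python
# Python's built-in slice object can report the concrete
# (start, stop, step) it would use for a sequence of length n
n = 10

forward = slice(None, None, None).indices(n)    # corresponds to [::]
backward = slice(None, None, -1).indices(n)     # corresponds to [::-1]
from_eight = slice(8, None, -1).indices(n)      # corresponds to [8::-1]

print(forward)      # (0, 10, 1)
print(backward)     # (9, -1, -1)
print(from_eight)   # (8, -1, -1)

# the triples are meant for range(): stop=-1 means "one step before index 0"
print(list(range(*backward)))   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

Note that the reported stop value of -1 is to be read as in range(), that is, as the position just before index 0, and not as the index of the last element.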


A nightmare at first, but slicing on **multidimensional arrays** actually just follows the strict notation of defining a comma-separated list of slicing ranges, one for each dimension. In the examples here, we always provide one slicing range per dimension.

In [None]:
i12 = np.array([[8, 2, 5, 3, 6], [9, 4, 0, 2, 8], [7, 6, 1, 9, 3]])
print(i12)
print()
print(i12[0:2:1, 2:5:1])

[[8 2 5 3 6]
 [9 4 0 2 8]
 [7 6 1 9 3]]

[[5 3 6]
 [0 2 8]]


The confusion starts when the values are not explicitly specified, but default values are assumed.

In [None]:
# same as above
print(i12[:2:, 2::])

[[5 3 6]
 [0 2 8]]


You can also skip the last colon in case you do not need to provide the step value.

In [None]:
print(i12[1::, ::-2])
print()
print(i12[1:, ::-2])

[[8 0 9]
 [3 1 7]]

[[8 0 9]
 [3 1 7]]


In [None]:
# (again) same as (further) above
print(i12[:2, 2:])

[[5 3 6]
 [0 2 8]]


You can also use a single colon (:) for an empty slice. (An empty slice here means a slice in which start, stop, and step all take their default values, so it selects the entire dimension.)

In [None]:
print(i12[0:2, :])

[[8 2 5 3 6]
 [9 4 0 2 8]]


The ellipsis (three dots) notation can be used to have NumPy fill in the other dimensions: it stands for as many full slices as are needed to cover all remaining dimensions. This is most helpful for arrays with many dimensions. But also in simple slicing, it can be used instead of the double colons (::).

In [None]:
print(i12[1::, ...])

[[9 4 0 2 8]
 [7 6 1 9 3]]


By the way, if a dimension is 1 and we do not need the dimension anymore, then the **squeeze()** function of NumPy can be used to get rid of it.

In [None]:
# Remember the shape of a7 is (6, 2, 7, 5, 9, 2, 3)
print(a7[0:3:, ...].shape)
print(a7[0:3:, 0:1:, ..., 0:3:, 0:1:2].shape)

(3, 2, 7, 5, 9, 2, 3)
(3, 1, 7, 5, 9, 2, 1)


By squeezing an array, the number of dimensions is reduced by omitting any dimensions of size 1. But the number of array elements is still the same, which can easily be verified with the size attribute.

In [None]:
print(np.squeeze(a7[0:3:, 0:1:, ..., 0:3:, 0:1:2]).shape)

(3, 7, 5, 9, 2)


In [None]:
print(np.squeeze(a7[0:3:, 0:1:, ..., 0:3:, 0:1:2]).size)
print(a7[0:3:, 0:1:, ..., 0:3:, 0:1:2].size)

1890
1890


**Think about this for a minute.** A first intuition would perhaps say that values are also lost by the omission of a dimension. But why is this not the case?

And be careful to adapt your indexing and slicing accordingly once you have squeezed the dimensions of an array.
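
The following sketch shows how the indices change after squeezing, using a small array (block, a name made up here) that is easier to follow than a7. It also uses the optional **axis** argument of **squeeze()** to drop only one selected dimension of size 1. The array is built with arange() and reshape(), which are covered further below.

```python
import numpy as np

block = np.arange(12).reshape((3, 1, 4, 1))

all_squeezed = np.squeeze(block)           # drops every axis of size 1
one_squeezed = np.squeeze(block, axis=1)   # drops only axis 1

print(all_squeezed.shape)   # (3, 4)
print(one_squeezed.shape)   # (3, 4, 1)

# the same element is addressed with fewer indices after squeezing
print(block[2, 0, 3, 0], all_squeezed[2, 3])   # 11 11
```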

# Combined Indexing & Slicing

You can combine indexing and slicing, if you only need to access a single row or column (or further dimension) of an array.

In [None]:
print(i12)
print()
print(i12[1, 1:4])

[[8 2 5 3 6]
 [9 4 0 2 8]
 [7 6 1 9 3]]

[4 0 2]


But be aware that by using indexing, your resulting array loses the indexed dimension. 

In [None]:
print(i12.shape)
print(i12[1, 1:4].shape)

(3, 5)
(3,)


This is better explained with a 3-dimensional array.

In [None]:
i3D = np.random.randint(0, 10, size=(2, 4, 5))
print(i3D.shape)
print()
print(i3D)

(2, 4, 5)

[[[1 0 7 6 0]
  [6 8 8 7 5]
  [6 7 0 8 9]
  [0 4 2 9 1]]

 [[4 6 7 4 7]
  [7 8 2 1 5]
  [2 6 4 5 2]
  [9 0 0 9 5]]]


First, we use only slicing to cut out a subarray of the second dimension. Check the shape of the resulting array.

In [None]:
print(i3D[::, 1:3, ::].shape)
print()
print(i3D[::, 1:3, ::])

(2, 2, 5)

[[[6 8 8 7 5]
  [6 7 0 8 9]]

 [[7 8 2 1 5]
  [2 6 4 5 2]]]


Following is almost the same slicing operation, but with fewer elements defined for the second dimension. It is still a 3-dimensional array.

In [None]:
print(i3D[::, 1:2, ::].shape)
print()
print(i3D[::, 1:2, ::])

(2, 1, 5)

[[[6 8 8 7 5]]

 [[7 8 2 1 5]]]


And now we use indexing (instead of slicing) for the second dimension, whereby we lose the second dimension of the array. The values are still the same as with the slicing above, but the array dimensions are different.

In [None]:
print(i3D[::, 1, ::].shape)
print()
print(i3D[::, 1, ::])

(2, 5)

[[6 8 8 7 5]
 [7 8 2 1 5]]


Cutting out single rows, columns, or further dimensions can be written very compactly by combining indexing and empty slicing.

In [None]:
# all of row 1
print(i12[1, :])
# all of column 4
print(i12[:, 4])

[9 4 0 2 8]
[6 8 3]


But make sure this is really what you want. If you want to keep the number of dimensions, it is better to use slicing alone.

In [None]:
# all of column 4
print(i12[:, 4:5])

[[6]
 [8]
 [3]]


# Views & Copies

It is very important to note that the slicing operation on an array returns a subarray view rather than a copy of the subarray. This means that if you change a value in the subarray slice, the value in the original array will also change.

In [None]:
print(i12)
print()
i12_slice = i12[1:3, 1:4]
print(i12_slice)

[[8 2 5 3 6]
 [9 4 0 2 8]
 [7 6 1 9 3]]

[[4 0 2]
 [6 1 9]]


Note how changing the value at index (0, 1) of the subarray slice also changes the corresponding value in the original array.

In [None]:
i12_slice[0, 1] = 8
print(i12_slice)
print()
print(i12)

[[4 8 2]
 [6 1 9]]

[[8 2 5 3 6]
 [9 4 8 2 8]
 [7 6 1 9 3]]


Although this behavior may be surprising or even annoying at first sight, it is actually very useful when large parts of the array are to be changed. For example, see how an entire slice can be set to a given value with just one line of code. Setting values on slices can be a very effective way to work with data. 

In [None]:
i12[1:3, 1:4] = 3
print(i12)

[[8 2 5 3 6]
 [9 3 3 3 8]
 [7 3 3 3 3]]


If changes to the sliced subarray should not result in changes to the original array, then a copy must be explicitly created from the slice with the **copy()** method.

In [None]:
i12_slice_copy = i12[0:2, 0:3].copy()
print(i12_slice_copy)

[[8 2 5]
 [9 3 3]]


If the copied subarray is now modified, the original array remains untouched.

In [None]:
i12_slice_copy[:, :] = 5
print(i12_slice_copy)
print()
print(i12)

[[5 5 5]
 [5 5 5]]

[[8 2 5 3 6]
 [9 3 3 3 8]
 [7 3 3 3 3]]
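
Whether two arrays refer to the same data can also be checked programmatically. The sketch below uses NumPy's **shares_memory()** function on a small example array (grid, a name chosen for illustration).

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))

view = grid[1:3, 1:4]           # slicing gives a view
copy = grid[1:3, 1:4].copy()    # explicit, independent copy

print(np.shares_memory(grid, view))   # True
print(np.shares_memory(grid, copy))   # False

view[0, 0] = 99    # visible in grid
copy[0, 0] = -1    # not visible in grid
print(grid[1, 1])  # 99
```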


# Reshaping

It is sometimes useful and necessary to reshape an array with the **reshape()** method, where the size of the initial array (the number of values) must be the same as the size of the resulting array.

Combined with a function such as **arange()**, this can also be used to generate a new array of a given shape.

(The **arange()** function of NumPy returns an array with evenly spaced values within a given half-open interval [start, stop) and with a given step. The function is more flexible than the Python range() function, as it allows constructing ranges with data types other than integer values.)

In [None]:
f16 = np.arange(16.0, dtype=np.float64)
print(f16)
# f16 is a Numpy array
print(type(f16))
# that stores values of data type (64-bit) floating point
print(f16.dtype)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15.]
<class 'numpy.ndarray'>
float64


Note that the shape of the new array is given to the method as a tuple here. That is the reason for the double parentheses.

In [None]:
f16_reshaped = f16.reshape((4, 4))
print(f16_reshaped)

[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]]
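
One entry of the new shape may also be given as -1, in which case NumPy computes the size of that dimension from the total number of elements. A short sketch, which also shows that a shape with a different number of elements is rejected:

```python
import numpy as np

values = np.arange(16.0)

# -1 lets NumPy compute the size of that dimension: 16 / 2 = 8
print(values.reshape((2, -1)).shape)      # (2, 8)
print(values.reshape((-1, 4, 2)).shape)   # (2, 4, 2)

# a shape with a different number of elements is rejected
try:
    values.reshape((3, 5))
except ValueError as err:
    print('reshape failed:', err)
```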


Besides reshaping an array to a specific shape, it is often sufficient to just add another dimension to the array. This can be easily accomplished with the constant **newaxis** of NumPy, which is used in conjunction with the slicing operator. Just insert newaxis wherever you want a new dimension.

In the following the 1-dimensional array is turned into a 2-dimensional row or column matrix.

In [None]:
f4 = np.arange(4, dtype=np.float64)
# row matrix
print(f4[np.newaxis, :])
print()
# column matrix
print(f4[:, np.newaxis])

[[0. 1. 2. 3.]]

[[0.]
 [1.]
 [2.]
 [3.]]


Once we perform computations on arrays, we often have to bring them into compatible shapes. Using newaxis is a convenient way to do that.
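
As a small illustration of compatible shapes, the sketch below combines a column and a row into a table. It relies on NumPy's broadcasting rules, which are not covered in this tutorial, and the names rows, cols, and table are chosen only for this example.

```python
import numpy as np

rows = np.arange(3)     # shape (3,)
cols = np.arange(4)     # shape (4,)

# shapes (3, 1) and (1, 4) are combined to a result of shape (3, 4)
table = 10 * rows[:, np.newaxis] + cols[np.newaxis, :]

print(table.shape)   # (3, 4)
print(table)
```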

# Combining and Splitting

Multidimensional arrays can be both combined and split. A distinction is made between operations in which the number of dimensions remains the same and only the size of one dimension changes, and operations that behave exactly the other way around. Depending on what you want to achieve, you have to choose the appropriate function.

The function **concatenate()** combines two or more arrays that are passed as a list (or tuple) of arrays as the first argument to the function. Concatenate always joins along an existing axis.

In [None]:
x = np.random.randint(0, 9, (3))
y = np.random.randint(0, 9, (3))
z = np.random.randint(0, 9, (3))
r = np.concatenate([x, y, z])
print(x)
print(y)
print(z)
print()
print(r)

[6 4 6]
[6 5 3]
[4 0 6]

[6 4 6 6 5 3 4 0 6]


For multidimensional arrays, the dimension to concatenate by can be specified by the **axis** argument. (Be aware that the numbering of axes starts with 0 and that the axis must exist. You cannot increase the number of dimensions of the arrays with this function, just the size of the dimensions.)

In [None]:
x2 = np.random.randint(0, 9, (2, 4))
y2 = np.random.randint(0, 9, (2, 4))
r1 = np.concatenate([x2, y2], axis=0)
r2 = np.concatenate([x2, y2], axis=1)
print(x2)
print()
print(y2)
print()
print(r1)
print()
print(r2)

[[6 5 5 1]
 [3 3 2 4]]

[[8 2 5 6]
 [4 0 6 7]]

[[6 5 5 1]
 [3 3 2 4]
 [8 2 5 6]
 [4 0 6 7]]

[[6 5 5 1 8 2 5 6]
 [3 3 2 4 4 0 6 7]]


You can also number the axes from the back using negative axis indices. The following is equivalent to the above.

In [None]:
r1 = np.concatenate([x2, y2], axis=-2)
r2 = np.concatenate([x2, y2], axis=-1)
print(x2)
print()
print(y2)
print()
print(r1)
print()
print(r2)

[[6 5 5 1]
 [3 3 2 4]]

[[8 2 5 6]
 [4 0 6 7]]

[[6 5 5 1]
 [3 3 2 4]
 [8 2 5 6]
 [4 0 6 7]]

[[6 5 5 1 8 2 5 6]
 [3 3 2 4 4 0 6 7]]


The axis index -1 is particularly useful as one often wants to concatenate by the last axis. And by passing the axis argument -1 (instead of a positive number), it is not necessary to keep track of how many dimensions the array has: it is the last dimension, no matter how many dimensions the input array has. For example, if it happens that the array shape needs to be changed (by increasing or decreasing the number of dimensions), because you added some further code at the beginning of your program, then the index -1 still refers to the last dimension. No need to correct it.

The function **vstack()** stacks arrays vertically, meaning it concatenates arrays row wise.

In [None]:
r = np.vstack([x, y, z])
print(r)

[[6 4 6]
 [6 5 3]
 [4 0 6]]


Note the difference with regard to the shape of the resulting arrays between concatenate and vstack. Concatenate joins the three arrays along the existing axis, while vstack introduces a new axis.

In [None]:
print(np.concatenate([x, y, z]).shape)
print(np.vstack([x, y, z]).shape)

(9,)
(3, 3)


A new axis is, however, only introduced if the respective axis does not already exist. If we stack along the horizontal direction with the function **hstack()**, which is column wise, then no new axis needs to be introduced, even for the 1-dimensional case.

In [None]:
r = np.hstack([x, y, z])
print(r.shape)
print(r)

(9,)
[6 4 6 6 5 3 4 0 6]


For 2-dimensional arrays, neither vstack nor hstack introduces a new dimension, since this is not necessary. So the result is the same as with concatenate.

In [None]:
print(np.vstack([x2, y2]))
print()
print(np.hstack([x2, y2]))

[[6 5 5 1]
 [3 3 2 4]
 [8 2 5 6]
 [4 0 6 7]]

[[6 5 5 1 8 2 5 6]
 [3 3 2 4 4 0 6 7]]


But we can stack 2-dimensional arrays according to the depth dimension with the function **dstack()**.

In [None]:
r3 = np.dstack([x2, y2])
print(r3.shape)
print()
print(r3)

(2, 4, 2)

[[[6 8]
  [5 2]
  [5 5]
  [1 6]]

 [[3 4]
  [3 0]
  [2 6]
  [4 7]]]


Due to the fact that the resulting array now has 3 dimensions, the output with print is no longer as nice and easy to follow.

There is also a general function **stack()** that joins arrays along a new axis, whose position you define with the axis argument.
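
A minimal sketch of **stack()** with two 1-dimensional arrays (p and q, named here only for illustration) shows how the axis argument determines where the new dimension is inserted:

```python
import numpy as np

p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

s0 = np.stack([p, q], axis=0)    # new axis in front: p and q become rows
s1 = np.stack([p, q], axis=1)    # new axis at the end: p and q become columns

print(s0.shape)   # (2, 3)
print(s1.shape)   # (3, 2)
print(s1)
```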

For the splitting of arrays, NumPy provides the functions **split()**, **vsplit()**, **hsplit()**, and **dsplit()**, which work on a given axis, row wise, column wise, or depth wise, respectively. The array to split is passed as the first argument. When an integer is passed as the second argument, the array is split into that many sub-arrays of equal size.

Note that the function returns a list with as many sub-arrays as requested by the respective argument. Also note that the results of a split function are views on the array, rather than copies (as with the slicing operation).

In [None]:
print(x2)
print()
print(np.vsplit(x2, 2))
s1, s2 = np.vsplit(x2, 2)
print()
print(s1)
print()
print(s2)

[[6 5 5 1]
 [3 3 2 4]]

[array([[6, 5, 5, 1]]), array([[3, 3, 2, 4]])]

[[6 5 5 1]]

[[3 3 2 4]]


In [None]:
print(y2)
print()
s1, s2 = np.hsplit(y2, 2)
print(s1)
print()
print(s2)

[[8 2 5 6]
 [4 0 6 7]]

[[8 2]
 [4 0]]

[[5 6]
 [6 7]]


Instead of the number of sub-arrays, you can also pass a list of indices as the second argument that defines exactly where the splits should happen along the respective axis. (The returned sub-arrays are then not of equal size, but rather exactly how you defined it.)

In [None]:
print(x2)
print()
s1, s2, s3 = np.hsplit(x2, [1, 3])
print()
print(s1)
print()
print(s2)
print()
print(s3)

[[6 5 5 1]
 [3 3 2 4]]


[[6]
 [3]]

[[5 5]
 [3 2]]

[[1]
 [4]]


# Closing Remarks

This notebook only shows the most common ways of constructing, indexing, slicing, reshaping, combining, and splitting multidimensional arrays, and there are many more functions that can be used for these purposes. Check the official online documentation of NumPy for further ways. There are also a lot of tricks of the trade that can be used to accomplish some of these things in an elegant way. It is always worth searching the web to see if someone has encountered the same problem and found a great way to solve it.

Once you become more experienced with NumPy, you will probably love to use it.