# Using NumPy

After learning Python, the next step is to become familiar with the packages that are used heavily in Machine Learning and Data Science. ***NumPy*** is often the very first such package that you encounter in your journey. Like SciPy, Scikit-Learn and Pandas, NumPy is one of the packages that you just can’t miss when you’re learning data science.

**Prerequisites**

It is recommended to have a basic knowledge of Python before you start this post. If you are not comfortable using Python or need a quick refresher, then take a look at our post on [Python Fundamentals](http://learningmlandai.com/python-fundamentals/).

## Introduction

### What is Numpy?

[NumPy](https://numpy.org/), which stands for Numerical Python, is the core library for scientific computing in Python. It is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. Using NumPy, mathematical and logical operations on arrays can be performed very efficiently.

In our previous posts on [Python Fundamentals](http://learningmlandai.com/python-fundamentals/), we learnt about [collections](http://learningmlandai.com/python-fundamentals-part-2/) in Python, including lists. Python lists are very useful, but they have limitations. A list can hold elements of different types, so Python cannot store or process its elements as efficiently as a typed array. Lists also give you no easy way to apply a mathematical operation to every element at once. NumPy solves both of these problems.
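To make the second limitation concrete, here is a small sketch comparing a list with an array. The variable names are made up for illustration.

```python
import numpy as np

prices = [10, 20, 30]

# With a plain list, * means repetition, not arithmetic
repeated = prices * 2

# Doubling every element of a list needs an explicit loop or comprehension
doubled_list = [p * 2 for p in prices]

# With a NumPy array, * is applied to every element
doubled_array = np.array(prices) * 2

print(repeated)       # [10, 20, 30, 10, 20, 30]
print(doubled_list)   # [20, 40, 60]
print(doubled_array)  # [20 40 60]
```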

*We will compare the performance of lists vs numpy later in this post, once we have become a little familiar with using numpy.*

### Installing Numpy

Numpy along with many other commonly used packages comes prebundled with Anaconda. However if you don’t have numpy currently installed you can do so easily using the pip command.

`pip install numpy`

<br> 
## Getting started with Numpy

Numpy is the core library for scientific computing in Python. It provides a homogeneous, high-performance, multidimensional array object that holds many benefits over Python lists: it is more compact, faster at reading and writing items, more convenient and more efficient.

The NumPy array is the central data structure of the numpy library. It is an N-dimensional array type called `ndarray`, and it can be indexed by a tuple of non-negative integers.

NumPy also provides a large number of high-level mathematical functions that operate on these arrays.

### Importing numpy

In order to start using numpy and numpy arrays, we need to make sure that the numpy library is present in our coding environment. To do this we can easily import the numpy library in the environment using the `import` command.

In [1]:
# importing numpy
import numpy as np

In the above statement we have imported numpy as np. This lets us refer to numpy by the short name `np`. Although we could have used any name in place of `np`, using `np` is the convention followed by almost all numpy users.

### Creating numpy arrays

To make a numpy array, you can just pass a list to the `np.array()` function. Optionally, you can also specify the data type of the data in the numpy array. You can read about all the data types supported by numpy [here](https://numpy.org/doc/stable/user/basics.types.html).

Let's look at some examples.

In [2]:
numbers_list = [1, 2, 3, 4]
print(numbers_list)
print(type(numbers_list))

[1, 2, 3, 4]
<class 'list'>


Now, using this list, let's create a numpy array.

In [3]:
numbers_array = np.array(numbers_list)
numbers_array

array([1, 2, 3, 4])

Let's check the type of this array.

In [4]:
type(numbers_array)

numpy.ndarray

As we can see, the type of ***numbers_array*** is `numpy.ndarray`. As we read before, ndarray means n-dimensional array, since numpy lets us create arrays with any number of dimensions.

Let's take a look at another example.

In [5]:
another_array = np.array([15, 18, 21, -5, 6])
another_array

array([15, 18, 21, -5,  6])
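The optional data type mentioned earlier is passed through the `dtype` argument of `np.array()`. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

# Without dtype, NumPy infers an integer type from the values
inferred = np.array([1, 2, 3])

# With dtype, the same values are stored as 64-bit floats
as_float = np.array([1, 2, 3], dtype=np.float64)

print(inferred.dtype)  # an integer type; the exact width depends on the platform
print(as_float)        # [1. 2. 3.]
print(as_float.dtype)  # float64
```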

We can also create multi-dimensional arrays using numpy. Let's try and create a 2-dimensional array in numpy.

In [6]:
two_d_array = np.array([[1, 2, 3], [4, 5, 6]])
two_d_array

array([[1, 2, 3],
       [4, 5, 6]])

In [7]:
three_d_array = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
three_d_array

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

<br>When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:

-  the last axis is printed from left to right,
-  the second-to-last is printed from top to bottom,
-  the rest are also printed from top to bottom, with each slice separated from the next by an empty line.

One-dimensional arrays are then printed as rows, two-dimensional arrays as matrices and three-dimensional arrays as lists of matrices.

We rarely need to use arrays with more than 3 dimensions. Mostly we use 1d and 2d numpy arrays only.

<br>Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several utility functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.

The function `zeros()` creates an array full of zeros, the function `ones()` creates an array full of ones, and the function `empty()` creates an array whose content is uninitialized: it holds whatever values happen to be in memory, which may or may not be zeros. Let's try this out with some examples.

In [8]:
# Create an array of zeros
zeros_array = np.zeros((2, 3))
print(zeros_array)

[[0. 0. 0.]
 [0. 0. 0.]]


In [9]:
# Create an array of ones
ones_array = np.ones((3, 3))
print(ones_array)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [10]:
# Create an uninitialized array (its contents are arbitrary)
empty_array = np.empty((3,2))
empty_array

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

As we can see, by default, the dtype of the created array is float64. We can create a numpy array of a specific dtype by specifying the dtype at the time of creation as below.

In [11]:
# Create an int array of ones
ones_int_array = np.ones((3, 3), dtype=int)
ones_int_array

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

<br>We can initialize arrays with ones or zeros as we have seen above, but you can also create arrays that get filled up with evenly spaced values, constant or random values. Let's see some examples.

In [12]:
# Create an array with random values
random_array = np.random.random((2,2))
random_array

array([[0.90194533, 0.64007909],
       [0.9853102 , 0.20331476]])

In [13]:
# Create a full array
full_array = np.full((2,2), 7)
full_array

array([[7, 7],
       [7, 7]])

To create sequences of numbers, NumPy provides the `arange` function which is analogous to the Python built-in range, but returns an array. This function returns an ndarray object containing evenly spaced values within a given range. The format of the function is as follows −

`numpy.arange(start, stop, step, dtype)`

In [14]:
# Create an array of evenly-spaced values
range_array = np.arange(10,50,5)
range_array

array([10, 15, 20, 25, 30, 35, 40, 45])

When `arange` is used with floating point arguments, it is generally not possible to predict the number of elements obtained, due to the finite floating point precision. For this reason, it is usually better to use the function `linspace`, which receives as an argument the number of elements that we want instead of the step. The usage of this function is as follows:

In [15]:
# Create an array of evenly-spaced values
evenly_spaced_array = np.linspace(0,2,9)
evenly_spaced_array

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
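As a rough illustration of why `linspace` is safer with floats, the sketch below builds "the same" range both ways. The specific numbers are picked only for demonstration.

```python
import numpy as np

# Float step: the element count depends on rounding in (stop - start) / step
stepped = np.arange(0.1, 0.4, 0.1)

# linspace: the element count is stated explicitly and both endpoints are included
spaced = np.linspace(0.1, 0.4, 4)

print(len(stepped), stepped)
print(len(spaced), spaced)
```

Because of rounding, `stepped` may end up containing a value approximately equal to the stop value even though `arange` normally excludes it, whereas `spaced` always has exactly the 4 elements we asked for.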

<br>So as we have seen, numpy provides us with many different options to create an array. The two points below summarize what you need to know for creating arrays in numpy.

-  For some functions, such as `np.ones()`, `np.random.random()`, `np.empty()`, `np.full()` or `np.zeros()` the only thing that you need to provide is the shape of the array that you want to make. Optionally, you can also specify the data type. In the case of np.full(), you also have to specify the constant value that you want to insert into the array.
-  With `np.linspace()` and `np.arange()` you can make arrays of evenly spaced values. The difference between these two functions is that the last of the three values passed in the examples above designates the number of samples for `np.linspace()` and the step value for `np.arange()`. With the first, you ask for, say, an array of 9 values that lie between 0 and 2. With the latter, you specify that the array should start at 10 and, in steps of 5, generate values up to but not including 50.

### Inspecting numpy arrays

Numpy arrays have a lot of attributes you can use to know the details of an array object. Let's try this out with an example.

In [16]:
# Create a numpy array
a = np.array([[1, 2, 3], [4, 5, 6]])
a

array([[1, 2, 3],
       [4, 5, 6]])

-  ***shape*** - We can find the shape of an array using the `shape` attribute. This array attribute returns a tuple consisting of array dimensions.

In [17]:
a.shape

(2, 3)

In the above example `shape` returns a tuple, which tells us that ***a*** has 2 rows and 3 columns.

-  ***ndim*** - We can find the number of array dimensions using the `ndim` attribute.

In [18]:
a.ndim

2

-  ***size*** - We can find the number of elements in the array using the `size` attribute.

In [19]:
a.size

6

-  ***dtype*** - We can find the data type of the elements of the array using the `dtype` attribute.

In [20]:
a.dtype

dtype('int64')

These are the most commonly used array attributes. However, numpy provides many more array attributes, such as `itemsize` and `data`. You can check out all the available attributes in the official numpy documentation [here](https://numpy.org/doc/stable/reference/arrays.ndarray.html#array-attributes).
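As a quick sketch of two of these extra attributes, `itemsize` gives the number of bytes used by one element and `nbytes` gives the total for the whole array. The dtype is set explicitly here so the numbers do not depend on the platform.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(a.itemsize)  # 8, the bytes used by one int64 element
print(a.nbytes)    # 48, the total bytes: size * itemsize
```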

<br>*Now that we know how to create a numpy array and use common attributes to find more information about the array, it’s time to look more closely into the second key element that really defines the NumPy library: scientific computing.*

<br> 
## Arithmetic operations on numpy arrays

Arithmetic operations on numpy arrays are usually done on pairs of arrays on an element-by-element basis. A new array is created and filled with the result. Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module. In the simplest case, if two arrays are of exactly the same shape, then these operations are smoothly performed.

Let's take a look at some examples.

In [21]:
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

In [22]:
x

array([[1., 2.],
       [3., 4.]])

In [23]:
y

array([[5., 6.],
       [7., 8.]])

Now let's do some basic arithmetic operations on ***x*** and ***y***.

In [24]:
# Elementwise sum; both produce the same array
print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


In [25]:
# Elementwise difference; both produce the same array
print(x - y)
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]


In [26]:
# Elementwise product; both produce the same array
print(x * y)
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]


In [27]:
# Elementwise division; both produce the same array
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


<br>Unlike in many matrix languages, the product operator `*` operates elementwise in NumPy arrays. The matrix product can be performed using the `@` operator (in python >=3.5) or by using the `dot` method.

In [28]:
# Matrix product; both produce the same array
print(x @ y)
print(x.dot(y))

[[19. 22.]
 [43. 50.]]
[[19. 22.]
 [43. 50.]]
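To see where these numbers come from, the sketch below recomputes the matrix product with explicit loops and compares it with `@` and with the elementwise `*`. The loop version is only for illustration; in practice you would use `@` directly.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Entry [i, j] of the matrix product is the sum over k of x[i, k] * y[k, j]
manual = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            manual[i, j] += x[i, k] * y[k, j]

print(manual)                         # rows [19, 22] and [43, 50]
print(np.array_equal(manual, x @ y))  # True
print(np.array_equal(x * y, x @ y))   # False, * is elementwise
```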


<br>You can also easily compute the exponential and the square root of your arrays with `np.exp()` and `np.sqrt()`, or calculate the sines or cosines of your array with `np.sin()` and `np.cos()`. There’s also a way for you to calculate the natural logarithm with `np.log()`.

In [29]:
# Elementwise array exponentiation
np.exp(x)

array([[ 2.71828183,  7.3890561 ],
       [20.08553692, 54.59815003]])

In [30]:
# Elementwise square root of array
np.sqrt(x)

array([[1.        , 1.41421356],
       [1.73205081, 2.        ]])

In [31]:
# Elementwise sine of a numpy array
np.sin(x)

array([[ 0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 ]])

In [32]:
# Elementwise cosine of a numpy array
np.cos(x)

array([[ 0.54030231, -0.41614684],
       [-0.9899925 , -0.65364362]])

In [33]:
# Elementwise log of a numpy array
np.log(x)

array([[0.        , 0.69314718],
       [1.09861229, 1.38629436]])

<br>Numpy also provides many useful aggregate functions for performing computations on arrays. We can use `np.sum()` to find the sum of all the elements and use `np.min()` or `np.max()` to find the min or max elements in the array. We can also find various statistical parameters like mean, median  and standard deviation of the array using `np.mean()`, `np.median()` or `np.std()` functions.

Let's learn more about these aggregate functions from the examples below.

In [34]:
x

array([[1., 2.],
       [3., 4.]])

In [35]:
# Sum of all elements of the array
np.sum(x)

10.0

We can also perform a row-wise or column-wise sum by providing the ***axis*** parameter to the sum function. `axis=0` sums down the rows, giving one sum per column, and `axis=1` sums across the columns, giving one sum per row.

In [36]:
# Columnwise sum of all elements of array
np.sum(x, axis=0)

array([4., 6.])

In [37]:
# Rowwise sum of all elements of array
np.sum(x, axis=1)

array([3., 7.])

In [38]:
# Max element in the array
np.max(x)

4.0

In [39]:
# Minimum element in the array
np.min(x)

1.0

In [40]:
# Mean of all the array elements
np.mean(x)

2.5

In [41]:
# Median of the array
np.median(x)

2.5

In [42]:
# Standard deviation of the array
np.std(x)

1.118033988749895
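The ***axis*** parameter is not specific to `np.sum()`. As a short sketch, the same idea applied to `np.max()` and `np.mean()` on the same array looks like this:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)

col_max = np.max(x, axis=0)    # largest value in each column
row_mean = np.mean(x, axis=1)  # mean of each row

print(col_max)   # [3. 4.]
print(row_mean)  # [1.5 3.5]
```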

Numpy supports many more operations. You can find the list of all the supported operations [here](https://numpy.org/doc/stable/reference/routines.math.html).

<br>As we have seen from the above examples, it is pretty straightforward to perform a variety of mathematical and statistical computations on numpy arrays if they have the same shape. If the shapes of two arrays differ, a plain element-to-element pairing is not possible. However, operations on arrays of different shapes are still possible in NumPy, because of its ***broadcasting*** capability.

### Broadcasting in numpy

***Broadcasting*** is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array. Subject to certain constraints, the smaller array is *broadcast* across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.

As we have seen just above, NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the following example:

In [43]:
a = np.array([1, 2, 3])
a

array([1, 2, 3])

In [44]:
b = np.array([2, 2, 2])
b

array([2, 2, 2])

In [45]:
a * b

array([2, 4, 6])

NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain constraints. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation.

In [46]:
a

array([1, 2, 3])

In [47]:
b = 2
b

2

In [48]:
a * b

array([2, 4, 6])

The result is equivalent to the previous example where ***b*** was an array. We can think of the scalar ***b*** being stretched during the arithmetic operation into an array with the same shape as ***a***. The new elements in ***b*** are simply copies of the original scalar. ***The stretching analogy is only conceptual***. NumPy is smart enough to use the original scalar value without actually making copies so that broadcasting operations are as memory and computationally efficient as possible.

<br>***General Broadcasting Rules***

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when
1. they are equal, or
2. one of them is 1

If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the size that is not 1 along each axis of the inputs.
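The two rules above can be sketched in plain Python. The helper `broadcast_shape` below is hypothetical (it is not part of numpy) and exists only to spell out the rule; its answer is then compared with the shape numpy itself computes through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the two compatibility rules by hand."""
    result = []
    # Walk both shapes from the trailing dimension backwards
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1  # missing dims act as 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"incompatible dimensions {dim_a} and {dim_b}")
        # A dimension of size 1 takes the size of the other one
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))

print(broadcast_shape((4, 1), (5,)))             # (4, 5)
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)

# Compare with the shape NumPy computes for real arrays
numpy_shape = np.broadcast(np.empty((8, 1, 6, 1)), np.empty((7, 1, 5))).shape
print(numpy_shape)                               # (8, 7, 6, 5)
```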

<br>Arrays do not need to have the same number of dimensions. For example, if you have a 256x256x3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules, shows that they are compatible:
![numpy_1](https://learn-ml-and-ai-blog-resources.s3.us-east-2.amazonaws.com/Numpy/numpy_1.png)
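Here is a small sketch of that RGB case. The image is a stand-in filled with a constant, and the scale factors are arbitrary values chosen for illustration.

```python
import numpy as np

# Stand-in for an RGB image: every pixel is (100, 100, 100)
image = np.full((256, 256, 3), 100.0)

# One scale factor per colour channel (illustrative values)
scale = np.array([0.5, 1.0, 2.0])

# Shapes (256, 256, 3) and (3,) are compatible, so scale is applied to every pixel
scaled = image * scale

print(scaled.shape)  # (256, 256, 3)
print(scaled[0, 0])  # [ 50. 100. 200.]
```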

When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.

In the following example, both the A and B arrays have axes with length one that are expanded to a larger size during the broadcast operation:
![numpy_2](https://learn-ml-and-ai-blog-resources.s3.us-east-2.amazonaws.com/Numpy/numpy_2.png)

Here are some more examples:
![numpy_3](https://learn-ml-and-ai-blog-resources.s3.us-east-2.amazonaws.com/Numpy/numpy_3.png)

Here are examples of shapes that do not broadcast:
![numpy_4](https://learn-ml-and-ai-blog-resources.s3.us-east-2.amazonaws.com/Numpy/numpy_4.png)

Let's see an example of broadcasting in action.

In [49]:
x = np.arange(4)
xx = x.reshape(4,1)
y = np.ones(5)
z = np.ones((3,4))

In [50]:
x.shape

(4,)

In [51]:
y.shape

(5,)

In [52]:
x + y

ValueError: operands could not be broadcast together with shapes (4,) (5,) 

In [54]:
xx.shape

(4, 1)

In [55]:
y.shape

(5,)

In [56]:
(xx + y).shape

(4, 5)

In [57]:
xx + y

array([[1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.]])

In [58]:
x.shape

(4,)

In [59]:
z.shape

(3, 4)

In [60]:
(x + z).shape

(3, 4)

In [61]:
x + z

array([[1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.]])

<br> 
## Indexing, Slicing and Iterating

Besides mathematical operations, we might also want to take just a part of the original array (or the resulting array) or just some array elements to use in further analysis or other operations. In such case, we will need to subset, slice and/or index arrays.

These operations are very similar to the ones we perform on Python lists. If you are not familiar with indexing in Python lists, I strongly recommend that you first take a look at my post on [Python Collections](http://learningmlandai.com/python-fundamentals/python-fundamentals-part-2/), where we have described indexing in detail with a lot of examples. Here we will be building upon that knowledge.

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences. Indexing is 0-based and accepts negative indices for indexing from the end of the array.

***Indexing and slicing in 1d array***

In [62]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [63]:
x[2]

2

In [64]:
x[-2]

8

In [65]:
x[2:5]

array([2, 3, 4])

***Indexing and slicing in multidimensional arrays***

Multidimensional arrays can have one index per axis.

In [66]:
y = np.array([[1, 2, 3], [4, 5, 6]])
y

array([[1, 2, 3],
       [4, 5, 6]])

In [67]:
y[1][2]

6

Unlike lists and tuples, numpy arrays support multidimensional indexing for multidimensional arrays. That means that it is not necessary to separate each dimension’s index into its own set of square brackets. These indices can also be given in a tuple separated by commas:

In [68]:
y[1, 2]

6

Note that although `y[1, 2]` and `y[1][2]` give the same result, the second form is less efficient because a new temporary array is created after the first index and is then indexed by 2.

In [69]:
y[0, :]

array([1, 2, 3])

In [70]:
y[:, 1]

array([2, 5])

In [71]:
y[:, 1:3]

array([[2, 3],
       [5, 6]])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices `:`.

In [72]:
y[0]

array([1, 2, 3])

The expression within brackets in `y[i]` is treated as an i followed by as many instances of : as needed to represent the remaining axes.
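The same rule is easier to see on an array with more axes. The 3-dimensional array `c` below is made up for illustration.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# c[1] leaves two axes unspecified, so it behaves like c[1, :, :]
print(c[1].shape)                        # (3, 4)
print(np.array_equal(c[1], c[1, :, :]))  # True

# c[1, 2] leaves one axis unspecified, so it behaves like c[1, 2, :]
print(c[1, 2])                           # [20 21 22 23]
```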

***Indexing using arrays and lists***

Numpy arrays may also be indexed with other arrays or lists.

In [73]:
x = np.arange(10,1,-1)
x

array([10,  9,  8,  7,  6,  5,  4,  3,  2])

In [74]:
# Indexing using numpy array
x[np.array([3, 3, 1, 8])]

array([7, 7, 9, 2])

In [75]:
# Indexing using lists
x[[3, 3, 1, 8]]

array([7, 7, 9, 2])

In [76]:
y

array([[1, 2, 3],
       [4, 5, 6]])

In [77]:
# Indexing using list in multidimensional array
y[:, [1, 1, 2]]

array([[2, 2, 3],
       [5, 5, 6]])

***Boolean indexing***

We use boolean indexing when instead of selecting elements, rows or columns based on index number, we want to select those values from the array that fulfill a certain condition.

Let's look at some examples.

In [78]:
x = np.array([15, 10, 3, 14, 9, 8, 12, 1, 5, 16, 4])
x

array([15, 10,  3, 14,  9,  8, 12,  1,  5, 16,  4])

Suppose we want to get all the elements in ***x*** which are less than 10. Let's see what comparing ***x*** with 10 gives us.

In [79]:
result = x < 10
result

array([False, False,  True, False,  True,  True, False,  True,  True,
       False,  True])

In [80]:
x.shape

(11,)

In [81]:
result.shape

(11,)

As we can see, comparing ***x*** with 10 gave us a boolean array of the same shape, which we stored in ***result***. Each element of ***result*** is `True` if the corresponding element in ***x*** fulfilled the condition, else `False`.

Now we can use this boolean array ***result*** to index ***x*** and get all the elements that fulfilled the condition, in other words all the elements for which the boolean indexing array holds `True`.

In [82]:
x[result]

array([3, 9, 8, 1, 5, 4])

We can do this directly without needing to create an intermediate boolean array as well.

In [83]:
x[x < 10]

array([3, 9, 8, 1, 5, 4])

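To specify the condition, we can combine comparisons with numpy's element-wise logical operators `&` (and), `|` (or) and `~` (not). Python's own `and`, `or` and `not` keywords do not work element-wise on arrays, and each comparison needs its own parentheses because `&` and `|` bind more tightly than `<` and `>`. Suppose in the above example we want all elements of ***x*** less than 10 but greater than 5. We can do this easily using the `&` operator as below.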

In [84]:
x[(x > 5) & (x < 10)]

array([9, 8])
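Conditions can also be combined with `|` (or) and negated with `~` (not). A short sketch on the same array, with thresholds picked only for illustration:

```python
import numpy as np

x = np.array([15, 10, 3, 14, 9, 8, 12, 1, 5, 16, 4])

# Elements that are either very small or very large
extremes = x[(x < 5) | (x > 14)]

# Elements that are NOT less than 10
not_small = x[~(x < 10)]

print(extremes)   # [15  3  1 16  4]
print(not_small)  # [15 10 14 12 16]
```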

There are a lot more caveats in indexing arrays. If you are interested in reading about this in detail, you can refer to numpy's official documentation [here](https://numpy.org/doc/stable/user/basics.indexing.html).

<br> 
## Manipulating arrays

Several routines are available in the numpy package for manipulating the elements of an ndarray object.

### Reshaping and Resizing

We discussed in the broadcasting section that the dimensions of arrays need to be compatible if we want them to be good candidates for arithmetic operations. But the question of what we should do when that is not the case has not been answered yet.

Well, this is where we discuss that!

If the arrays don’t have compatible dimensions, one thing we can do is resize an array. Numpy offers two ways to do this, and they behave differently. If we pass our original array together with the new shape to the `np.resize()` function, it returns a new array with that shape; if the new array is larger than the original, it is filled with repeated copies of the original array's elements, as many times as needed.
However, if we call the `resize()` method on the array itself and pass only the new shape, the array is changed in place and any extra positions are filled with zeros.

In either case if the new shape results in an array smaller than the one we originally had, extra elements from the original array are discarded.

This will become more clear with some examples.

In [85]:
x = np.array([[1, 2, 3], [4, 5, 6]])
x

array([[1, 2, 3],
       [4, 5, 6]])

In [86]:
x.shape

(2, 3)

Now suppose we want to resize the array to a 2 X 2 shape. First let's use `np.resize()` and pass both ***x*** and the new shape to it.

In [87]:
np.resize(x, (2, 2))

array([[1, 2],
       [3, 4]])

We can see that since the array with the new shape can hold only 4 elements, the last 2 elements from the original array have been discarded. This operation didn't change the original array, though; it produced a new array, which we can check by printing x as below.

In [88]:
x

array([[1, 2, 3],
       [4, 5, 6]])

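Now let's try the same operation using the array's own `resize()` method, as below. We pass `refcheck=False` so that numpy skips its check for other references to the array, which can otherwise cause the in-place resize to fail in an interactive session.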

In [89]:
x.resize((2, 2), refcheck=False)

In [90]:
x

array([[1, 2],
       [3, 4]])

As we can see in this case, the original array gets resized to the new shape.

Let's now try resizing our array to a larger array.

In [91]:
x

array([[1, 2],
       [3, 4]])

Let's first try to resize it using the `np.resize()` function into an array with shape 3 X 3.

In [92]:
np.resize(x, (3, 3))

array([[1, 2, 3],
       [4, 1, 2],
       [3, 4, 1]])

We can see that the elements of the original array are repeated in the new array as many times as required till we get the new array of required shape. Again this doesn't change the original array which we can again confirm as below.

In [93]:
x

array([[1, 2],
       [3, 4]])

Again, let's try to do the same thing using `resize()` method of the arrays.

In [94]:
x.resize((3, 3), refcheck=False)

In [95]:
x

array([[1, 2, 3],
       [4, 0, 0],
       [0, 0, 0]])

Two things to observe here:
1. The elements of the original array are not repeated to fill the new array with the new shape. Extra values are filled with 0.
2. Original array gets changed.

<br>Besides ***resizing***, you can also ***reshape*** your array. This means that you give a new shape to an array without changing its data. The key to reshaping is to make sure that the total size of the new array is unchanged. If you take the example of an array which has a size of 3 X 4 or 12, you have to make sure that the new array also has a size of 12.

Let's see reshaping with some examples.

In [96]:
x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [97]:
x.shape

(3, 4)

In [98]:
x.size

12

We can reshape this array into a new array with a shape of 4 X 3 as below.

In [99]:
np.reshape(x, (4, 3))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

As we can see, it returns a new array with the new shape. Our original array is not changed, as we can check.

In [100]:
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

Now let's try reshaping using the `reshape()` method of the array.

In [101]:
x.reshape((4, 3))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [102]:
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

We can see that even with the `reshape()` method of the array, the original array is not changed.

Similarly we can reshape the array to 2 X 6 or 6 X 2 or any other shape which will result in a new array with size same as that of original array.

In [103]:
x.reshape((2, 6))

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [104]:
x.reshape((6, 2))

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12]])

If we try to reshape an array into a shape whose size differs from that of the original array, numpy raises an error.

In [105]:
x.reshape((2, 2))

ValueError: cannot reshape array of size 12 into shape (2,2)
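To avoid working out a matching size by hand, `reshape()` also accepts `-1` for one dimension, in which case numpy infers that dimension from the total size. This is not used elsewhere in this post, so treat the sketch below as an optional aside.

```python
import numpy as np

x = np.arange(1, 13).reshape(3, 4)

# 12 elements in 2 rows leaves 6 columns
print(x.reshape((2, -1)).shape)  # (2, 6)

# 12 elements in 3 columns leaves 4 rows
print(x.reshape((-1, 3)).shape)  # (4, 3)
```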

<br> 
Another operation that you might keep handy when you’re changing the shape of arrays is `ravel()`. This function allows you to flatten your arrays. This means that if you ever have 2D, 3D or n-D arrays, you can just use this function to flatten it all out to a 1-D array.

In [106]:
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [107]:
x.ravel()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [108]:
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

As we can see, using `ravel()` flattens the array into a single dimensional array. However we can also see that the original array is not modified.

### Transpose

Transposing an array permutes its dimensions. In other words, you switch around the shape of the array.

Let's see this with an example.

In [109]:
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [110]:
x.shape

(3, 4)

In [111]:
np.transpose(x)

array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

In [112]:
np.transpose(x).shape

(4, 3)

In [113]:
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

As we can see, using `np.transpose()` just reversed the shape of our array. Also our original array remains unchanged.

We can also perform transpose using the `T` attribute of the array as below.

In [114]:
x.T

array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

In [115]:
x.T.shape

(4, 3)

In [116]:
x

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

As we can see, using the `T` attribute of the array, we can get the same results of transposing an array and the original array remains unchanged.

Both do the same thing; there isn’t much difference. Keep in mind that `T` is more of a convenience attribute, while `np.transpose()` gives you more flexibility.
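One place where the extra flexibility shows up is with arrays of more than two dimensions, where `np.transpose()` accepts an optional tuple giving the new order of the axes. A brief sketch with a made-up 3-dimensional array:

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)

# T (like np.transpose with no axes argument) reverses the order of all axes
print(t.T.shape)                         # (4, 3, 2)

# An explicit axes tuple picks the new order: here only the first two axes swap
print(np.transpose(t, (1, 0, 2)).shape)  # (3, 2, 4)
```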

What happens when we try to transpose a 1d array? Let's try it out.

In [117]:
y = np.array([1, 2, 3])
y

array([1, 2, 3])

In [118]:
y.shape

(3,)

In [119]:
np.transpose(y)

array([1, 2, 3])

In [120]:
y.T

array([1, 2, 3])

In [121]:
y.T.shape

(3,)

As we can see, using transpose on a 1d array returns the same array.
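If what we actually want is a column, one possible approach is to give the array a second dimension first, using the `reshape()` method we saw earlier. A minimal sketch:

```python
import numpy as np

y = np.array([1, 2, 3])

row = y.reshape(1, 3)  # explicit row, shape (1, 3)
column = row.T         # transposing a 2-D array swaps its two axes

print(column.shape)            # (3, 1)
print(y.reshape(3, 1).shape)   # (3, 1), the same result in one step
```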

### Adding / Removing Elements

**numpy.append**

When you append arrays to your original array, they are “glued” to the end of that original array. Appending is easy with numpy: you can just use `np.append()`. The append operation is not in place; a new array is allocated. When an axis is specified, the input arrays must have the same number of dimensions and matching sizes along every other axis, otherwise a `ValueError` is raised. Remember that in 2-D arrays axis 0 indicates the rows, while axis 1 indicates the columns.

In [122]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a

array([[1, 2, 3],
       [4, 5, 6]])

In [123]:
b = np.array([[10], [11]])
b

array([[10],
       [11]])

In [124]:
np.append(a, b)

array([ 1,  2,  3,  4,  5,  6, 10, 11])

We can see that if we don't specify any axis in append, the arrays are flattened and the second array is appended at the end of the first array.

Now let's specify an axis in the call to append.

In [125]:
np.append(a, b, axis=1)

array([[ 1,  2,  3, 10],
       [ 4,  5,  6, 11]])

Since the number of rows in both the arrays is same, we can append ***b*** at end of ***a***. 

Let's now try to append ***b*** to ***a*** by axis 0.

In [126]:
np.append(a, b, axis=0)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 1

As we can see, this operation fails: to append ***b*** along axis 0 of ***a*** (as additional rows), the number of columns in both must match. Since ***a*** has 3 columns and ***b*** has 1, the operation can't be performed and numpy raises a `ValueError`.

Let's create a new array with 3 columns and try to append that to a along axis 0.

In [127]:
c = np.array([[100, 101, 102]])
c

array([[100, 101, 102]])

In [128]:
np.append(a, c, axis=0)

array([[  1,   2,   3],
       [  4,   5,   6],
       [100, 101, 102]])
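
The point that `np.append()` is not in place can be checked directly. The following is a small sketch under the same setup as above; `result`, `rows` and `built_once` are illustrative names, and the list-based alternative at the end is a suggestion rather than something the examples above rely on.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
c = np.array([[100, 101, 102]])

result = np.append(a, c, axis=0)

print(a.shape)                      # (2, 3)  the original array is unchanged
print(result.shape)                 # (3, 3)  the appended row lives in a new array
print(np.shares_memory(a, result))  # False

# When adding many rows, collecting them in a Python list and converting once
# avoids allocating a new array on every step.
rows = [[1, 2, 3], [4, 5, 6], [100, 101, 102]]
built_once = np.array(rows)
```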

**np.insert**

This function inserts values into the input array along the given axis, before the given index. If the type of the values differs from that of the input array, the values are converted to the input array's type. Insertion is not done in place; the function returns a new array. Also, if the axis is not given, the input array is flattened first.

The insert() function takes the following parameters −

`numpy.insert(arr, obj, values, axis)`

-  **arr** (*array_like*) - Input array.
-  **obj** (*int, slice or sequence of ints*) - Object that defines the index or indices before which values is inserted.
-  **values** (*array_like*) - Values to insert into arr.
-  **axis** (*int, optional*) - Axis along which to insert values. If axis is None then arr is flattened first.

In [129]:
a = np.array([[100, 101, 102], [103, 104, 105]])
a

array([[100, 101, 102],
       [103, 104, 105]])

In [130]:
np.insert(a, 1, [11])

array([100,  11, 101, 102, 103, 104, 105])

In the example above, we insert **11** before the element at index **1**. Since we didn't pass any axis, ***a*** is first flattened and then **11** is inserted before the element at index **1**.

Let's try out an example by providing axis.

In [131]:
np.insert(a, 1, [11, 12, 13], axis=0)

array([[100, 101, 102],
       [ 11,  12,  13],
       [103, 104, 105]])

In this example, we insert **[11, 12, 13]** as a new row before index **1** along axis 0, and we get the expected output. Here the values to be inserted had the appropriate shape. What if we try to insert a value with a different shape? Let's try it out.

In [132]:
np.insert(a, 1, 11, axis=0)

array([[100, 101, 102],
       [ 11,  11,  11],
       [103, 104, 105]])

As we can see, in this case the value is broadcast to fill a whole row and then inserted before the given index.

Let's try to insert the same value by broadcasting along axis 1 now.

In [133]:
np.insert(a, 1, 11, axis=1)

array([[100,  11, 101, 102],
       [103,  11, 104, 105]])
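
The type conversion mentioned at the start of this section is easy to miss, so here is a short sketch of it. It assumes the same integer array ***a*** as above; `as_int` and `as_float` are illustrative names.

```python
import numpy as np

a = np.array([[100, 101, 102], [103, 104, 105]])

# A float value inserted into an integer array is converted to the array's dtype,
# so the fractional part is lost.
as_int = np.insert(a, 1, 11.7)
print(as_int.dtype == a.dtype)  # True
print(as_int[1])                # 11

# Converting the array to float first keeps the fractional part.
as_float = np.insert(a.astype(float), 1, 11.7)
print(as_float[1])              # 11.7
```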

**np.delete**

This function returns a new array with the specified subarray deleted from the input array. As in case of insert() function, if the axis parameter is not used, the input array is flattened. 

The function takes the following parameters -

`numpy.delete(arr, obj, axis)`

-  **arr** (*array_like*) - Input array.
-  **obj** (*slice, int or array of ints*) - Indicate indices of sub-arrays to remove along the specified axis.
-  **axis** (*int, optional*) - The axis along which to delete the subarray defined by obj. If axis is None, obj is applied to the flattened array.

In [134]:
a

array([[100, 101, 102],
       [103, 104, 105]])

In [135]:
np.delete(a, 1)

array([100, 102, 103, 104, 105])

In [136]:
np.delete(a, 1, axis=0)

array([[100, 101, 102]])

In [137]:
np.delete(a, 1, axis=1)

array([[100, 102],
       [103, 105]])
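
The **obj** parameter also accepts a list of indices or a slice, which the examples above don't show. The sketch below removes two columns at once; the boolean-mask version at the end is an alternative to `np.delete`, and all variable names are illustrative.

```python
import numpy as np

a = np.array([[100, 101, 102], [103, 104, 105]])

# Delete the first and last columns by passing a list of indices.
middle = np.delete(a, [0, 2], axis=1)
print(middle)
# [[101]
#  [104]]

# A slice works too: drop every second column, starting with column 0.
also_middle = np.delete(a, slice(None, None, 2), axis=1)

# Boolean indexing is an alternative when it is easier to say which columns to keep.
keep = np.array([False, True, False])
by_mask = a[:, keep]
```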

### Joining Arrays

**np.concatenate**

This function is used to join two or more arrays along a specified axis. The arrays must have the same shape except along that axis. The function takes the following parameters - 

`numpy.concatenate((a1, a2, ...), axis)`

-  **a1, a2, …** (*sequence of array_like*) - The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
-  **axis** (*int, optional*) - The axis along which the arrays will be joined. If axis is None, arrays are flattened before use. Default is 0.

In [138]:
a

array([[100, 101, 102],
       [103, 104, 105]])

In [139]:
b = np.array([[1, 2, 3]])
b

array([[1, 2, 3]])

In [140]:
np.concatenate((a, b))

array([[100, 101, 102],
       [103, 104, 105],
       [  1,   2,   3]])

This works because the axis is 0 by default and the two arrays have the same number of columns. We could, however, have specified axis 0 explicitly to make the code more readable.

In [141]:
np.concatenate((a, b), axis=0)

array([[100, 101, 102],
       [103, 104, 105],
       [  1,   2,   3]])

In [142]:
c = np.array([[1], [2]])
c

array([[1],
       [2]])

In [143]:
np.concatenate((a, c))

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 1

This doesn't work because the axis defaults to 0 when none is provided, and the two arrays don't have the same number of columns, which concatenation along axis 0 requires.

However the below code works when we provide axis as 1.

In [144]:
np.concatenate((a, c), axis=1)

array([[100, 101, 102,   1],
       [103, 104, 105,   2]])

In [145]:
np.concatenate((a, b, c), axis=None)

array([100, 101, 102, 103, 104, 105,   1,   2,   3,   1,   2])

As we can see, passing **None** as the axis flattens all the arrays before concatenating.

**np.hstack**

Stack arrays in sequence horizontally (column wise). Number of rows in all the arrays being stacked should be the same.

In [146]:
a

array([[100, 101, 102],
       [103, 104, 105]])

In [147]:
c

array([[1],
       [2]])

In [148]:
np.hstack((a, c))

array([[100, 101, 102,   1],
       [103, 104, 105,   2]])

In [149]:
b

array([[1, 2, 3]])

In [150]:
np.hstack((a, b))

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2 and the array at index 1 has size 1

The error is raised because the arrays being stacked don't have the same number of rows.
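
To relate `hstack` to `concatenate` from the previous section, here is a brief sketch. It reuses the arrays ***a*** and ***c*** from above; `same` and `flat` are illustrative names.

```python
import numpy as np

a = np.array([[100, 101, 102], [103, 104, 105]])
c = np.array([[1], [2]])

# For 2-D inputs, hstack gives the same result as concatenate along axis 1.
same = np.array_equal(np.hstack((a, c)), np.concatenate((a, c), axis=1))
print(same)  # True

# For 1-D inputs there is no second axis, so hstack joins along the only axis.
flat = np.hstack((np.array([1, 2, 3]), np.array([4, 5])))
print(flat)  # [1 2 3 4 5]
```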

**np.vstack**

Stack arrays in sequence vertically (row wise). Number of columns in all the arrays being stacked should be the same.

In [151]:
a

array([[100, 101, 102],
       [103, 104, 105]])

In [152]:
b

array([[1, 2, 3]])

In [153]:
np.vstack((a, b))

array([[100, 101, 102],
       [103, 104, 105],
       [  1,   2,   3]])

In [154]:
c

array([[1],
       [2]])

In [155]:
np.vstack((a, c))

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 1

The error is raised because the arrays being stacked don't have the same number of columns.
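
One convenience of `vstack` over `concatenate` is how it handles 1-D inputs. The sketch below illustrates this with a hypothetical 1-D array named `row`; it is not part of the examples above.

```python
import numpy as np

a = np.array([[100, 101, 102], [103, 104, 105]])
row = np.array([1, 2, 3])  # shape (3,), a 1-D array

# vstack treats a 1-D array of length N as a single row of shape (1, N).
stacked = np.vstack((a, row))
print(stacked.shape)  # (3, 3)

# concatenate does not do this promotion, so the row needs an explicit reshape.
same = np.array_equal(stacked, np.concatenate((a, row.reshape(1, -1)), axis=0))
print(same)  # True
```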

### Splitting arrays

**np.split**

This function divides the array into subarrays along a specified axis. The function takes three parameters.

`numpy.split(ary, indices_or_sections, axis)`

-  **ary** (*ndarray*) - Array to be divided into sub-arrays.
-  **indices_or_sections** (*int or 1-D array*) - 
    -  If **indices_or_sections** is an integer, N, the array will be divided into N equal arrays along axis. If such a split is not possible, an error is raised.
    -  If **indices_or_sections** is a 1-D array of sorted integers, the entries indicate where along axis the array is split. For example, [2, 3] would, for axis=0, result in
        -  ary[:2]
        -  ary[2:3]
        -  ary[3:]
    -  If an index exceeds the dimension of the array along axis, an empty sub-array is returned correspondingly.
-  **axis** (*int, optional*) - The axis along which to split, default is 0.

In [156]:
a

array([[100, 101, 102],
       [103, 104, 105]])

In [157]:
np.split(a, 2)

[array([[100, 101, 102]]), array([[103, 104, 105]])]

In [158]:
np.split(a, 3, axis=1)

[array([[100],
        [103]]),
 array([[101],
        [104]]),
 array([[102],
        [105]])]

In [159]:
np.split(a, [1], axis=1)

[array([[100],
        [103]]),
 array([[101, 102],
        [104, 105]])]
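
The parameter list above notes that an integer split must divide the axis evenly. The following sketch shows that failure and one way around it using `np.array_split`, which is a separate NumPy function not covered above; `parts` is an illustrative name.

```python
import numpy as np

a = np.array([[100, 101, 102], [103, 104, 105]])

# Three columns cannot be divided into two equal parts, so split raises.
try:
    np.split(a, 2, axis=1)
except ValueError as err:
    print("split failed:", err)

# array_split allows unequal parts; the earlier parts get the extra columns.
parts = np.array_split(a, 2, axis=1)
print([p.shape for p in parts])  # [(2, 2), (2, 1)]
```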

**np.hsplit**

Split an array into multiple sub-arrays horizontally (column-wise). `hsplit` is equivalent to `split` with **axis=1**: for arrays with two or more dimensions, the array is always split along the second axis. A 1-D array has no second axis, so it is split along axis 0 instead.

In [160]:
a

array([[100, 101, 102],
       [103, 104, 105]])

In [161]:
np.hsplit(a, 3)

[array([[100],
        [103]]),
 array([[101],
        [104]]),
 array([[102],
        [105]])]

In [162]:
np.hsplit(a, [1])

[array([[100],
        [103]]),
 array([[101, 102],
        [104, 105]])]
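
One case worth checking by hand is a 1-D input, which has no second axis to split along. This is a minimal sketch using a hypothetical array `v`.

```python
import numpy as np

v = np.array([10, 20, 30, 40])

# A 1-D array has no second axis; hsplit splits it along axis 0 instead.
halves = np.hsplit(v, 2)
print(halves)  # [array([10, 20]), array([30, 40])]
```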

**np.vsplit**

Split an array into multiple sub-arrays vertically (row-wise). `vsplit` is equivalent to `split` with **axis=0** (default): the array is always split along the first axis regardless of the array dimension. It requires an array with at least two dimensions.

In [163]:
a

array([[100, 101, 102],
       [103, 104, 105]])

In [164]:
np.vsplit(a, 2)

[array([[100, 101, 102]]), array([[103, 104, 105]])]

In [165]:
np.vsplit(a, [1, 2])

[array([[100, 101, 102]]),
 array([[103, 104, 105]]),
 array([], shape=(0, 3), dtype=int64)]
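
Splitting and joining are inverse operations, which gives a quick sanity check for both. The sketch below assumes the same array ***a***; `rows` and `cols` are illustrative names.

```python
import numpy as np

a = np.array([[100, 101, 102], [103, 104, 105]])

# Splitting and then stacking along the same axis restores the original array.
rows = np.vsplit(a, 2)
cols = np.hsplit(a, 3)

print(np.array_equal(np.vstack(rows), a))  # True
print(np.array_equal(np.hstack(cols), a))  # True
```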

<br> 
## Numpy arrays vs list

Now that we are a little familiar with numpy arrays, it's time to compare them with traditional lists in python.

We use a numpy array instead of a list for three reasons:
1. Less Memory
2. Fast
3. Convenient

The first reason to choose a numpy array is that it occupies less memory than a list. It is also faster to operate on, and it is more convenient to work with. These are the major advantages that a numpy array has over a list. 

*Not convinced? Alright let's prove it!*

### Comparing memory consumption - Numpy array vs List

Firstly let's create a list of 100 numbers and check its size.

In [166]:
# Create a list of 100 numbers
a_list = list(range(100))

In [167]:
import sys

# Print the size of the list object in bytes, as reported by sys.getsizeof
print("Size of the whole list in bytes: {}".format(sys.getsizeof(a_list)))

Size of the whole list in bytes: 1016


Now let's create an array of 100 numbers and check its size.

In [168]:
# Create an array of 100 numbers
a_array = np.arange(100)

In [169]:
# Print the size of the array object in bytes, as reported by sys.getsizeof
print("Size of the whole array in bytes: {}".format(sys.getsizeof(a_array)))

Size of the whole array in bytes: 896
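
These two numbers are not quite a like-for-like comparison: `sys.getsizeof` on a list counts the list object and its references, but not the integer objects the references point to. The sketch below adds those in as a rough estimate. The exact figures depend on the Python version and platform, and CPython may share some small integer objects, so treat the output as indicative only. The variable names are illustrative.

```python
import sys
import numpy as np

a_list = list(range(100))
a_array = np.arange(100)

# sys.getsizeof on a list counts the list object and its references only,
# not the integer objects those references point to.
list_container = sys.getsizeof(a_list)
list_elements = sum(sys.getsizeof(x) for x in a_list)

# nbytes is the size of the array's data buffer: itemsize * number of elements.
array_data = a_array.nbytes

print("list container:", list_container)
print("list elements: ", list_elements)
print("array data:    ", array_data)
print("array itemsize:", a_array.itemsize)
```

With the element objects included, the gap between the list and the array is considerably larger than the two figures above suggest.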


### Comparing speed - Numpy array vs List

In [170]:
import time 
   
# size of arrays and lists 
size = 1000000  

In [171]:
# declaring lists (range objects are converted to real lists)
list1 = list(range(size))
list2 = list(range(size))

# capturing time before the multiplication of Python lists
initial_time = time.time()

# multiplying the elements of both lists pairwise and storing the results in another list
resultant_list = [(a * b) for a, b in zip(list1, list2)]

final_time = time.time()
   
# calculating execution time 
print('It took {} seconds to perform multiplication using Lists.'.format(final_time - initial_time)) 

It took 0.1111290454864502 seconds to perform multiplication using Lists.


In [172]:
# declaring arrays 
array1 = np.arange(size)   
array2 = np.arange(size) 

# capturing time before the multiplication of Numpy arrays 
initial_time = time.time() 
  
# multiplying the elements of both Numpy arrays and storing the results in another Numpy array
resultant_array = array1 * array2 

final_time = time.time()
   
# calculating execution time 
print('It took {} seconds to perform multiplication using Arrays.'.format(final_time - initial_time)) 

It took 0.00555419921875 seconds to perform multiplication using Arrays.


As we can see, the list version takes much longer than the array version: roughly 20 times longer in this run (about 0.111 seconds versus about 0.0056 seconds).
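
A single `time.time()` measurement can be noisy. As a hedged alternative, the sketch below repeats each measurement with the standard library's `timeit` module and keeps the fastest run. The repeat count of 3 is an arbitrary choice, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

size = 1_000_000
list1 = list(range(size))
list2 = list(range(size))
array1 = np.arange(size)
array2 = np.arange(size)

# timeit.repeat runs each callable several times; taking the minimum reduces
# noise from other processes running on the machine.
list_time = min(timeit.repeat(
    lambda: [x * y for x, y in zip(list1, list2)], number=1, repeat=3))
array_time = min(timeit.repeat(lambda: array1 * array2, number=1, repeat=3))

print("lists:  {:.4f} s".format(list_time))
print("arrays: {:.4f} s".format(array_time))
```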

Hope this convinces you of the power of numpy arrays over lists.

<hr>

With this we conclude this post on Using Numpy. It is okay if it feels overwhelming at first with so many new things being introduced. Go through the entire post once again if needed, this time trying out the examples and any other cases you can think of that I have not covered. Experimentation is the best way to understand and learn anything new.

You can check the Jupyter source notebook for this post [here](https://github.com/guptanik/python-libraries-for-machine-learning/blob/master/Using%20Numpy.ipynb).

***Additional Resources***

-  [**Source Notebook for this post**](https://github.com/guptanik/python-libraries-for-machine-learning/blob/master/Using%20Numpy.ipynb)
-  [**Numpy official site**](https://numpy.org/)
-  [**Numpy quickstart guide**](https://numpy.org/devdocs/user/quickstart.html)