## Python - Numpy Guide


#### Priyaranjan Mohanty

In [1]:
2 + 4 # It's a comment



6

### Numpy : what is it ?

Numpy is one of the most fundamental and powerful packages for working with data in python.

If you are going to work on data analysis or machine learning projects, then having a solid understanding of numpy is nearly mandatory.

This is because other packages for data analysis (like pandas) are built on top of numpy, and the scikit-learn package, which is used to build machine learning applications, works heavily with numpy as well.

So what does numpy provide?

At the core, numpy provides the excellent ndarray objects, short for n-dimensional arrays.

In an ‘ndarray’ object, aka ‘array’, you can store multiple items of the same data type. It is the facilities around the array object that make numpy so convenient for performing math and data manipulations.

You might wonder, ‘I can store numbers and other objects in a python list itself and do all sorts of computations and manipulations through list comprehensions, for-loops etc. What do I need a numpy array for?’

Well, there are very significant advantages of using numpy arrays over lists.

To understand this, let’s first see how to create a numpy array.

### How to create an ndarray ( Numpy Array )

There are multiple ways to create a numpy array. 

However, one of the most common ways is to create one from a list or a list-like object by passing it to the np.array function.

In [2]:
# Install Numpy 

!pip install numpy



In [3]:
# Import Numpy 

import numpy as np

In [4]:
# Create a 1 dimensional array from a list

list1 = [0,1,2,3,4]

Arr_1d = np.array(list1)

In [5]:
# Print the array and its type
print(type(Arr_1d))

print("\n")

print(Arr_1d)

<class 'numpy.ndarray'>


[0 1 2 3 4]


In [6]:
# Print the shape of the Array

print(Arr_1d.shape)

(5,)


The key difference between an array and a list is that arrays are designed to handle vectorized operations, while a python list is not.

That means, if you apply a function or an arithmetic operation to an array, it is performed on every item in the array, rather than on the array object as a whole.

Let’s suppose you want to add the number 2 to every item in the list. The intuitive attempt looks like this, but it raises an error for a list, while the same expression works on an array:

In [7]:
print(list1)

[0, 1, 2, 3, 4]


In [8]:
list1 + 2

TypeError: can only concatenate list (not "int") to list

In [None]:
print(Arr_1d)

In [None]:
Arr_1d + 2
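
As a rough comparison, the sketch below adds 2 to every item in both containers: the list needs an explicit loop (a list comprehension), while the array takes a single vectorized expression. The names `nums` and `arr` are illustrative and are not part of the cells above.

```python
import numpy as np

nums = [0, 1, 2, 3, 4]
arr = np.array(nums)

# A plain list needs an explicit loop (here, a list comprehension)
list_plus_two = [x + 2 for x in nums]

# An ndarray applies the operation to every element at once
arr_plus_two = arr + 2

print(list_plus_two)   # [2, 3, 4, 5, 6]
print(arr_plus_two)    # [2 3 4 5 6]

# Functions such as np.sqrt are also applied element-wise
print(np.sqrt(arr))
```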

### Next Step - Let's create a 2 Dimensional Array



In [10]:
# Create a 2d array from a list of lists

list_2D = [[0,1,2,3], 
           [4,5,6,7], 
           [8,9,10,11]]

Arr_2D = np.array(list_2D)

In [11]:
Arr_2D

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
print(type(Arr_2D))

print("\n")

print(Arr_2D)

print("\n")

print(Arr_2D.shape)

In [None]:
# We can also create a 2-D Array from a flat list ( Not a nested list )

List_Num = [0,1,2,3,4,5,6,7,8,9,10,11]

# Array_1D_2 = np.array(List_Num)

# Array_2D_2 = Array_1D_2.reshape(3,4)

Array_2D_2 = np.array(List_Num).reshape(3,4)

Array_2D_2

In [None]:
Array_2D_2.shape

In [None]:
Array_2D_3 = Array_2D_2.reshape(6,2)

Array_2D_3
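
Reshaping only works when the new shape holds exactly the same number of items. The sketch below, which uses an illustrative array named `flat`, shows two details that may be useful here: passing -1 lets numpy infer one dimension, and an incompatible shape raises a ValueError.

```python
import numpy as np

flat = np.arange(12)          # values 0..11

# -1 asks numpy to infer that dimension from the total size
inferred = flat.reshape(3, -1)
print(inferred.shape)         # (3, 4)

# A shape whose product differs from the number of items is rejected
try:
    flat.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```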

We can also specify the datatype by setting the dtype argument. 

Some of the most commonly used numpy dtypes are: 'float', 'int', 'bool', 'str'

In [None]:
Arr_2D_2 = np.array(list_2D)

print(Arr_2D_2)


In [None]:
# Create a float 2d array

Arr_2D_2 = np.array(list_2D, dtype='float')

print(Arr_2D_2)

print(Arr_2D_2.dtype)
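
An existing array can also be converted to another dtype with the astype method, which returns a new array. A minimal sketch, using an illustrative array named `ints`:

```python
import numpy as np

ints = np.array([[0, 1, 2], [3, 4, 5]])

as_float = ints.astype('float')   # 0 -> 0.0, 1 -> 1.0, ...
as_bool = ints.astype('bool')     # 0 -> False, any non-zero value -> True
as_str = ints.astype('str')       # numbers become their text form

print(as_float.dtype)   # float64
print(as_bool)
print(as_str)
```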

#### Note -

Unlike a list, a numpy array must have all of its items of the same data type. This is another significant difference.

### Note :

We can always convert a numpy array back to a list using the tolist() method.

In [None]:
print(Arr_1d)

print(type(Arr_1d))

In [None]:
# Convert an ndarray to list 

print(Arr_1d.tolist())

print("\n")

print(type(Arr_1d.tolist()))

In [None]:
print(Arr_2D)

print("\n")

print(type(Arr_2D))

In [None]:
# Convert 2-d array to list 

print(Arr_2D.tolist())

print("\n")

print(type(Arr_2D.tolist()))

#### To summarise, 

The main differences with python lists are:

Arrays support vectorised operations, while lists don’t.

Once an array is created, you cannot change its size. You will have to create a new array or overwrite the existing one.

Every array has one and only one dtype. All items in it should be of that dtype.

A numpy array typically occupies much less memory than the equivalent python list (or list of lists).
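
The points about fixed size and memory use can be checked with a short sketch. It assumes CPython, and the list figure is only a rough estimate (the list object plus each int object it refers to), so the exact numbers will vary by platform.

```python
import sys
import numpy as np

arr = np.arange(1000)
as_list = arr.tolist()

# np.append does not grow `arr`; it returns a new, larger array
bigger = np.append(arr, 1000)
print(arr.size, bigger.size)      # 1000 1001

# Rough memory estimate: the list object plus each int object it holds
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
print("array bytes:", arr.nbytes)
print("list bytes (approx.):", list_bytes)
```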

### Some Functions, Methods or Attributes associated with the Numpy Array Object

In [None]:
print(Arr_2D)

print(Arr_2D.shape)

In [None]:
# Get the number of dimensions of a Numpy array using ndim attribute

print(Arr_2D.ndim)

In [None]:
# get the shape of the numpy array object

print(Arr_2D.shape)

In [None]:
# get the size of the numpy array object

print(Arr_2D.size)

In [None]:
# get the data type of the numpy array object

print(Arr_2D.dtype)

What if I try to create an array from a list containing heterogeneous data?

In [None]:
List_var = [1,2,'a','b']

Array_2D_3 = np.array(List_var).reshape(2,2)

print(Array_2D_3)
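
To see what happens to the mixed items, it helps to inspect the dtype. In the sketch below (the names `mixed` and `as_objects` are illustrative), numpy converts everything to a string type so that the array still has a single dtype; passing dtype=object is one way to keep the original Python objects instead.

```python
import numpy as np

mixed = np.array([1, 2, 'a', 'b'])

# numpy picks a single dtype that can hold every item: here, a string type
print(mixed.dtype.kind)   # 'U' means unicode string
print(mixed)              # the numbers are now the strings '1' and '2'

# dtype=object keeps the original Python objects, at the cost of speed
as_objects = np.array([1, 2, 'a', 'b'], dtype=object)
print(type(as_objects[0]), type(as_objects[2]))
```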

### Extracting specific items from an array

In [None]:
print(Arr_1d)

In [None]:
print(Arr_1d[0])

In [None]:
print(Arr_1d[-1])

In [None]:
print(Arr_1d[-3])

Extracting elements from a multi dimensional array 

In [None]:
print(Arr_2D)

In [None]:
List_2_Var = [[ 0 , 1 , 2 , 3] ,  [ 4 , 5 , 6 , 7] ,  [ 8 , 9 ,10 , 11]]

In [None]:
print(Arr_2D[0])

In [None]:
# Extract the element which is in 2nd row and 1st column of the array 

Arr_2D[1,0]

In [None]:
# Extract the elements which are in 2nd row ( all columns )

print(Arr_2D[1,:])

In [None]:
# Extract the elements which are in 1st column ( all rows )

Arr_2D[:,0]

In [None]:
# Extract the elements which are in the 1st & 2nd rows of the 1st column

Arr_2D[:2,0]
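
The same row, column notation extends to ranges, steps and negative indices. A small sketch on an illustrative 3 x 4 array named `grid`:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

# Rows 0-1 and columns 1-2: a 2 x 2 block
block = grid[:2, 1:3]
print(block)

# Every second column of every row
every_other_col = grid[:, ::2]
print(every_other_col)

# Last row, last column, using negative indices
print(grid[-1, -1])     # 11
```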

Additionally, numpy arrays support boolean indexing.

In [None]:
Arr_2D

In [None]:
Bool_Idx = Arr_2D > 5

print(Bool_Idx)

In [None]:
Arr_2D[Bool_Idx]

In [None]:
print(type(Arr_2D[Bool_Idx]))

print(Arr_2D[Bool_Idx].shape)
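
Boolean masks can be combined, and they can be used for assignment as well as extraction. The sketch below uses illustrative names (`grid`, `mask`, `capped`) and works on a copy so the original array is left unchanged.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

# Combine conditions with & (and) or | (or); each condition needs parentheses
mask = (grid > 5) & (grid % 2 == 0)
print(grid[mask])               # [ 6  8 10]

# A mask can also be used to assign: cap every value above 9 at 9
capped = grid.copy()
capped[capped > 9] = 9
print(capped)
```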

Modifying / updating the contents of an array 

In [None]:
print(Arr_2D)

In [None]:
Arr_2D[2,1]

In [None]:
# Update the element in the last row, 2nd column of the array

Arr_2D[2,1] = 99

In [None]:
Arr_2D

### Compute mean, min, max on the ndarray

In [12]:
print(Arr_2D)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [13]:
# mean, max and min

print("Mean value is: ", Arr_2D.mean())

print("Max value is: ", Arr_2D.max())

print("Min value is: ", Arr_2D.min())


Mean value is:  5.5
Max value is:  11
Min value is:  0


### Row wise and column wise minimum

However, if we want to compute the minimum values row wise or column wise, use np.amin with the axis argument instead.

In [14]:
# Row wise and column wise min

print("Column wise minimum: ", np.amin(Arr_2D, axis=0))

print("Row wise minimum: ", np.amin(Arr_2D, axis=1))


Column wise minimum:  [0 1 2 3]
Row wise minimum:  [0 4 8]


In [None]:
# Row wise and column wise max

print("Column wise maximum: ", np.amax(Arr_2D, axis=0))

print("Row wise maximum: ", np.amax(Arr_2D, axis=1))
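
The axis argument is easy to mix up, so here is a small sketch on an illustrative array named `grid`. The array methods min, max, sum and mean accept the same axis argument as the np.amin and np.amax functions.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

# axis=0 collapses the rows, leaving one value per column
print(grid.min(axis=0))    # [0 1 2 3]
print(grid.sum(axis=0))    # [12 15 18 21]

# axis=1 collapses the columns, leaving one value per row
print(grid.max(axis=1))    # [ 3  7 11]
print(grid.mean(axis=1))   # [1.5 5.5 9.5]
```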

### How to create a new array from an existing array?


If you just assign an array (or a slice of it) to a new variable, the new variable actually refers to the parent array's data in memory.

That means, if you make any changes through the new variable, they will be reflected in the parent array as well.

So to avoid disturbing the parent array, you need to make a copy of it using copy(). All numpy arrays come with the copy() method.

In [15]:
# Assign one array to another new array 

Arr_2D_Cpy = Arr_2D

print("Original 2-d Array :\n" ,Arr_2D)
print('\n')
print("Copied 2-d Array :\n" ,Arr_2D_Cpy)

Original 2-d Array :
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


Copied 2-d Array :
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


Changing a value in the "copied" array changes the original array as well.

This shows that the copied array and the original array both refer to the same memory location.

In [16]:
# Making changes in the copied array and printing the content of the original array

Arr_2D_Cpy[0, 0] = 100  # 100 will also show up in Arr_2D

print(Arr_2D)

[[100   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]]


Now, how do we really create a copy of an array, so that the copy has its own memory allocation?

We use the array.copy() method to create a copy of an array.

In [21]:
# creating a copy of new array from another array 

Arr_2D_Cpy2_23424 = Arr_2D.copy()

print(Arr_2D)
print('\n')
print(Arr_2D_Cpy2_23424)

[[100   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]]


[[100   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]]


Now, let's modify the value of the element in the first row and first column of the copied array and see whether that change is reflected in the original array as well.

In [23]:
Arr_2D_Cpy2_23424[0, 0] = 0  # this change will NOT show up in Arr_2D

print(Arr_2D)
print('\n')
print(Arr_2D_Cpy2_23424)

[[100   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]]


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
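
The same distinction applies when only a portion of an array is taken. In the sketch below (the names `parent`, `view` and `independent` are illustrative), a basic slice shares memory with its parent, while a slice followed by copy() does not; np.shares_memory is used to check this.

```python
import numpy as np

parent = np.arange(12).reshape(3, 4)

view = parent[:2, :2]                  # basic slicing returns a view
independent = parent[:2, :2].copy()    # copy() allocates new memory

print(np.shares_memory(parent, view))         # True
print(np.shares_memory(parent, independent))  # False

view[0, 0] = 100         # also changes parent[0, 0]
independent[0, 1] = -1   # parent is untouched
print(parent)
```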


#### Combining / Concatenating Arrays 

Concatenating 1 dimensional Arrays

In [24]:
# Concatenating 1 dimensional Arrays

Array_1D_1 = np.array([1,2,3,4])
print(Array_1D_1)

Array_1D_2 = np.array([11,22,33,44,55])
print(Array_1D_2)

[1 2 3 4]
[11 22 33 44 55]


In [25]:
np.concatenate([Array_1D_1, Array_1D_2])

array([ 1,  2,  3,  4, 11, 22, 33, 44, 55])

Concatenating 2 dimensional Arrays

In [26]:
Array_2D_1 = np.array([1,2,3,4,5,6]).reshape(2,3)
print(Array_2D_1)

print("\n")

Array_2D_2 = np.array([11,22,33,44,55,66]).reshape(2,3)
print(Array_2D_2)

[[1 2 3]
 [4 5 6]]


[[11 22 33]
 [44 55 66]]


When concatenating 2 dimensional arrays, the concatenation can be one of 2 possible types -

Type 1 : Concatenating by rows 

Type 2 : Concatenating by columns

Concatenating by rows 

In [27]:
# concatenate along the first axis ( Row )

np.concatenate([Array_2D_1, Array_2D_2],
              axis = 0)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [11, 22, 33],
       [44, 55, 66]])

In [28]:
# concatenate along the second axis ( column )

np.concatenate([Array_2D_1, Array_2D_2],
              axis = 1)

array([[ 1,  2,  3, 11, 22, 33],
       [ 4,  5,  6, 44, 55, 66]])
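
As a complement, the sketch below shows the np.vstack and np.hstack shortcuts, which give the same results as the two np.concatenate calls for 2 dimensional arrays, and illustrates the shape requirement: the arrays may differ in size only along the axis being joined. The names `a`, `b` and `c` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[11, 22, 33], [44, 55, 66]])

rows = np.vstack([a, b])   # same result as np.concatenate([a, b], axis=0)
cols = np.hstack([a, b])   # same result as np.concatenate([a, b], axis=1)
print(rows.shape, cols.shape)   # (4, 3) (2, 6)

# Along the joining axis the sizes may differ; every other axis must match
c = np.array([[7, 8, 9]])              # shape (1, 3)
print(np.concatenate([a, c], axis=0))  # works: both have 3 columns
try:
    np.concatenate([a, c], axis=1)     # fails: 2 rows vs 1 row
except ValueError as err:
    print("concatenate failed:", err)
```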