## Data Computations using NumPy

This lab covers data computations with NumPy. NumPy is a scientific computing library that makes numerical computations on arrays fast and convenient.

In this lab, you will learn to:

1. Create a NumPy array
2. Select data: indexing and slicing an array
3. Perform mathematical and other basic operations
4. Perform basic statistics
5. Manipulate data

If you are using Google Colab, you do not need to install NumPy. You only have to import it like this:

import numpy as np

If you are using a local Jupyter notebook, make sure NumPy is already installed.

## 1. Creating a NumPy array

An array can be either a vector or a matrix. In this lab, a vector is a one-dimensional array, and a matrix is an array with two or more dimensions.

In [1]:
# importing numpy
import numpy as np

In [2]:
# creating a one-dimensional array
np.array([1,2,3,4,5])

array([1, 2, 3, 4, 5])

In [5]:
# creating a two-dimensional array
np.array([(1,2,3,4,5),(6,7,8,9,0)])

array([[1, 2, 3, 4, 5],
       [6, 7, 8, 9, 0]])

In [7]:
# creating an array from a list
my_list = [2,4,6,8,10]
np.array(my_list)

array([ 2,  4,  6,  8, 10])

In [8]:
# printing a NumPy array
print(np.array(my_list))

[ 2  4  6  8 10]
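
To make the vector and matrix distinction concrete, every array records its number of dimensions, its shape, and its total size. The sketch below reuses the values from the cells above; the names vector and matrix are illustrative and do not appear elsewhere in the lab.

```python
import numpy as np

vector = np.array([1, 2, 3, 4, 5])
matrix = np.array([(1, 2, 3, 4, 5), (6, 7, 8, 9, 0)])

print(vector.ndim, vector.shape)   # 1 (5,)
print(matrix.ndim, matrix.shape)   # 2 (2, 5)
print(matrix.size)                 # 10 elements in total
```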


## 1.1 Generating Array


NumPy offers several ways to generate an array, depending on what you need, such as:

* Generating an identity array
* Generating a zero array of a given size
* Generating an array of ones of a given size
* Generating an array in a given range
* Generating an array with random values

In [9]:
# Generating an identity array
identity_array = np.identity(4)
print(identity_array)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [203]:
# Check the type of the array

type(np.identity(4))

numpy.ndarray

In [205]:
type(np.array([(1,2,3),(4,5,6)]))

numpy.ndarray

In [10]:
# np.eye also generates an identity matrix (1's on the diagonal, 0's elsewhere)
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [12]:
# any constant can be multiplied with the identity array
print(identity_array * 7)
print(np.eye(5) * 8)

[[7. 0. 0. 0.]
 [0. 7. 0. 0.]
 [0. 0. 7. 0.]
 [0. 0. 0. 7.]]
[[8. 0. 0. 0. 0.]
 [0. 8. 0. 0. 0.]
 [0. 0. 8. 0. 0.]
 [0. 0. 0. 8. 0.]
 [0. 0. 0. 0. 8.]]


In [13]:
# Generating zero array of a given size
# 1 dimensional zero array
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [15]:
# Generating a two-dimensional zero array: pass the rows and columns as a tuple
# np.zeros((rows, columns))

np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [16]:
# Generating a one-dimensional array of ones
np.ones(6)

array([1., 1., 1., 1., 1., 1.])

In [17]:
# Generating a two-dimensional array of ones: pass the rows and columns as a tuple
# np.ones((rows, columns))

np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [18]:
# Generating an array in a given range or interval

np.arange(0,7)

array([0, 1, 2, 3, 4, 5, 6])

In [19]:
# Generating an array in a given range or interval with step size

np.arange(0,30,3)

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

In [20]:
# You can also use linspace to generate evenly spaced numbers in a given interval
# np.linspace(start, stop, num): num is the number of points, not the step size

np.linspace(0,20,2)

array([ 0., 20.])

In [21]:
np.linspace(0,200,20)

array([  0.        ,  10.52631579,  21.05263158,  31.57894737,
        42.10526316,  52.63157895,  63.15789474,  73.68421053,
        84.21052632,  94.73684211, 105.26315789, 115.78947368,
       126.31578947, 136.84210526, 147.36842105, 157.89473684,
       168.42105263, 178.94736842, 189.47368421, 200.        ])
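
The difference between arange (which takes a step size) and linspace (which takes a number of points) can be checked directly. This is a small sketch; the variable names points, step and no_end are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing that linspace worked out.
points, step = np.linspace(0, 200, 20, retstep=True)
print(step)                      # spacing = (stop - start) / (num - 1)

# endpoint=False leaves out the stop value, like arange does.
no_end = np.linspace(0, 20, 4, endpoint=False)
print(no_end)                    # [ 0.  5. 10. 15.]
print(np.arange(0, 20, 5))       # [ 0  5 10 15]
```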

In [22]:
## Generating an array with random values
# Create a 1D array with 6 random numbers

np.random.rand(6)

array([0.61231086, 0.74698152, 0.79406187, 0.30037865, 0.64364287,
       0.65494703])

In [23]:
# Create a 2D array with 4*5 random values

np.random.rand(4,5)

array([[0.77135351, 0.53717309, 0.07914771, 0.61377981, 0.74977814],
       [0.62471923, 0.8527454 , 0.46850526, 0.85188513, 0.7929111 ],
       [0.80928996, 0.24333945, 0.30878016, 0.58285521, 0.75961629],
       [0.34982368, 0.70746898, 0.82650679, 0.38519495, 0.77610907]])

In [24]:
### Generate one random integer in a given range (5 is included, 50 is excluded)

np.random.randint(5,50)

29

In [25]:
### Generate 10 random integers in a given range

np.random.randint(5,50,10)

array([ 6, 48, 20, 33,  8, 49, 25,  6,  6, 19])

In [26]:
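## Random seed to output the same random values on every run
## Note: this seeds Python's built-in random module; to seed NumPy's own generator, use np.random.seed(10)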
import random

random.seed(10)

random.randint(5,50)

41
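
The cell above seeds Python's built-in random module, which does not affect np.random. One way to get reproducible NumPy values is a seeded Generator, sketched below. The seed 10 and the range 5 to 50 are taken from the cells above; the names rng_a, rng_b, draw_a and draw_b are illustrative.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng_a = np.random.default_rng(10)
rng_b = np.random.default_rng(10)

draw_a = rng_a.integers(5, 50, size=10)   # integers in [5, 50)
draw_b = rng_b.integers(5, 50, size=10)

print(np.array_equal(draw_a, draw_b))     # True: same seed, same values
```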

## 2. Data Selection: Indexing and slicing an Array

Indexing: Selecting individual elements from the array

Slicing: Selecting a group of elements from the array.

## 2.1 1D Array Indexing and Selection

In [36]:
# Creating a 1 dimensional vector

array_1d = np.array([1,2,3,4,5])

In [37]:
## Indexing: selecting the first element from an array
## Note: Indexing starts at 0
array_1d[0]

1

In [38]:
## Indexing: Selecting the last element from an array
array_1d[-1]

5

In [39]:
# Slicing: returning a group of elements from an array
# array_1d[start:stop] includes start and excludes stop

array_1d[2:4]

array([3, 4])
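
A few more slice patterns on the same array may help; this is a short sketch using only standard slice syntax (start:stop:step), where any of the three parts can be left out.

```python
import numpy as np

array_1d = np.array([1, 2, 3, 4, 5])

print(array_1d[:3])     # [1 2 3]      first three elements
print(array_1d[-2:])    # [4 5]        last two elements
print(array_1d[::2])    # [1 3 5]      every second element
print(array_1d[::-1])   # [5 4 3 2 1]  reversed
```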

## 2.2 2D Array Indexing and Selection

In [40]:
## Indexing 2D array

array_2d = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [42]:
## Selecting individual element
## array_2d[row][column]
## let's select 5..that is row 1, column 1 (Indexing starts from 0!!)

array_2d[1][1]

5

In [43]:
# let's select 9..that is row 2, column 2

array_2d[2][2]

9

In [44]:
## Selecting whole row
#array_2d[row]

array_2d[1]

array([4, 5, 6])

In [45]:
## Selecting a group of elements in a 2D array
## array_2d[rows, columns]: you select rows and columns

## Let's select the first two rows
## Rows  :2 selects all rows up to, but not including, index 2 (the first two rows)
## Columns  : selects all columns


array_2d[:2,:]

array([[1, 2, 3],
       [4, 5, 6]])

In [46]:
## Let's select the first two rows and the first column
array_2d[:2,:1]

array([[1],
       [4]])

In [47]:
## Selecting the first two rows and the first two columns

array_2d[:2,:2]

array([[1, 2],
       [4, 5]])

In [48]:
## The command below is the same as selecting the first two rows and all columns

array_2d[0:2,:]

array([[1, 2, 3],
       [4, 5, 6]])

In [49]:
## This returns all rows (array_2d has only three) and all columns, so the result is the same as the original array
array_2d[0:3,:]

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [53]:
## return the second row

array_2d[1,:]

array([4, 5, 6])

In [55]:
## return the second column
array_2d[:,1]

array([2, 5, 8])

In [60]:
## return the last two columns
array_2d[:,1:3]

array([[2, 3],
       [5, 6],
       [8, 9]])

In [61]:
## return the first column
array_2d[:,0]

array([1, 4, 7])

In [62]:
## return the first row
array_2d[0,:]

array([1, 2, 3])

Indexing or selecting from a 2D array may seem confusing at first, but it becomes clear once you try it a few times. When you select an entire row, you select every column of that row (but not every value in those columns), and vice versa.

As shown below, we are selecting the first row, and the colon (:) selects all of its columns.

array_2d[0,:]
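
Two details of 2D selection are easy to check in code: the two ways of writing an element index, and the difference between an integer index and a slice. The sketch below reuses array_2d from the cells above.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Both forms return the same element; the comma form indexes in one step.
print(array_2d[1][1], array_2d[1, 1])   # 5 5

# An integer index drops that dimension; a slice keeps it.
print(array_2d[:, 1].shape)     # (3,)   1D array holding the second column
print(array_2d[:, 1:2].shape)   # (3, 1) still 2D, one column wide
```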

## 2.3 Conditional selection

You can use a condition to select values in an array. Let's use comparison operators to select the values.

In [63]:
## Let's create an array

arr= np.array(([1,2,3],[4,5,6],[7,8,9]))

In [66]:
## Select all elements in an array which are less than 6
arr[arr <6]

array([1, 2, 3, 4, 5])

In [67]:
## Select all elements in an array which are greater than 6
arr[arr >6]

array([7, 8, 9])

In [68]:
## select all even numbers in an array
arr[arr %2 ==0]

array([2, 4, 6, 8])

In [69]:
## select all odd numbers in an array
arr[arr %2 !=0]

array([1, 3, 5, 7, 9])

In [76]:
## You can also have multiple conditions

## Among the odd numbers, return the values greater than or equal to 5
## Each condition needs its own parentheses when combined with &

arr[(arr %2 !=0) & (arr >= 5)]

array([5, 7, 9])

In [78]:
## A comparison on its own returns a boolean array: True wherever the condition is met

arr >= 8
#arr == 0

array([[False, False, False],
       [False, False, False],
       [False,  True,  True]])
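
Boolean masks can also be combined with OR and NOT, passed to np.where, or used for assignment. The following is a minimal sketch on the same arr; the names mask, replaced and clipped are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

mask = (arr < 3) | (arr > 7)     # | is element-wise OR
print(arr[mask])                 # [1 2 8 9]
print(arr[~mask])                # [3 4 5 6 7]  (~ negates the mask)

# np.where keeps the shape: take arr where the condition holds, else 0.
# The rows become [0 2 0], [4 0 6] and [0 8 0].
replaced = np.where(arr % 2 == 0, arr, 0)
print(replaced)

# A mask can also select the elements to overwrite (done on a copy here).
clipped = arr.copy()
clipped[clipped > 6] = 6
print(clipped.max())             # 6
```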

## 3. Basic Array Operations
## 3.1 Quick Arithmetic operation: Addition, Subtraction, Multiplication, Division, Squaring

In [79]:
arr1 = np.arange(0,5)
arr2 = np.arange(4,9)

In [80]:
# Addition

arr1 + arr2

array([ 4,  6,  8, 10, 12])

In [81]:
# Subtraction

arr1 - arr2

array([-4, -4, -4, -4, -4])

In [82]:
# Multiplication

arr1 * arr2

array([ 0,  5, 12, 21, 32])

In [83]:
# Division

arr1 / arr2

array([0.        , 0.2       , 0.33333333, 0.42857143, 0.5       ])

In [85]:
# Squaring

arr1 ** 2

array([ 0,  1,  4,  9, 16], dtype=int32)

In [126]:
# Square root

arr1**(1/2)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ])
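
The arithmetic operators above also work between an array and a scalar, or between arrays of compatible shapes (NumPy calls this broadcasting). A small sketch follows; grid and offsets are hypothetical arrays introduced only for this example.

```python
import numpy as np

arr1 = np.arange(0, 5)

# A scalar is applied to every element.
print(arr1 + 10)        # [10 11 12 13 14]
print(arr1 * 2)         # [0 2 4 6 8]

# A 1D array of length 3 is applied to each row of a (2, 3) array.
grid = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = grid + offsets
print(shifted)          # rows become [11 22 33] and [14 25 36]
```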

## 3.2 Universal functions

NumPy universal functions (ufuncs) compute mathematical, trigonometric, logical, and comparison operations element by element, such as sin, cos, tan, exponential (exp), log, square, greater, less, etc.

In [86]:
arr1 = np.arange(0,5)
arr2 = np.arange(4,9)

In [87]:
# Addition

np.add(arr1, arr2)

array([ 4,  6,  8, 10, 12])

In [88]:
# Subtraction

np.subtract(arr1, arr2)

array([-4, -4, -4, -4, -4])

In [89]:
# Multiplication

np.multiply(arr1, arr2)

array([ 0,  5, 12, 21, 32])

In [91]:
# Division

np.divide(arr1, arr2)

array([0.        , 0.2       , 0.33333333, 0.42857143, 0.5       ])

In [127]:
# Square root of arr1

np.sqrt(arr1)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ])

In [128]:
# Square root of arr2

np.sqrt(arr2)

array([2.        , 2.23606798, 2.44948974, 2.64575131, 2.82842712])

In [92]:
## Calculating the sin of arr1

np.sin(arr1)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ])

In [93]:
# Calculating the cosine of arr1

np.cos(arr1)

array([ 1.        ,  0.54030231, -0.41614684, -0.9899925 , -0.65364362])

In [94]:
# Calculating the tangent of arr1 

np.tan(arr1)

array([ 0.        ,  1.55740772, -2.18503986, -0.14254654,  1.15782128])

In [95]:
# Calculating the natural logarithm (log) of arr1
# arr1 contains 0, and log(0) is -inf, so NumPy also emits a RuntimeWarning here

np.log(arr1)

array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436])
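
Taking the log of an array that contains 0 gives -inf and a warning. Two possible ways to handle this are sketched below: silencing the warning locally with np.errstate, or selecting only the positive values first. The names logs and positive_logs are illustrative.

```python
import numpy as np

arr1 = np.arange(0, 5)

# Silence the divide-by-zero warning only inside this block.
with np.errstate(divide="ignore"):
    logs = np.log(arr1)
print(logs[0])                 # -inf

# Or skip the zero entirely by selecting positive values first.
positive_logs = np.log(arr1[arr1 > 0])
print(positive_logs.shape)     # (4,)

# Other bases are available as separate functions.
print(np.log2(8))              # 3.0
```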

In [96]:
# Calculating the logarithmic (log) of arr2

np.log(arr2)

array([1.38629436, 1.60943791, 1.79175947, 1.94591015, 2.07944154])

In [97]:
## Calculating the exponential (exp, or e^x) of arr1

np.exp(arr1)

array([ 1.        ,  2.71828183,  7.3890561 , 20.08553692, 54.59815003])

In [98]:
## Calculating the exponential (exp, or e^x) of arr2

np.exp(arr2)

array([  54.59815003,  148.4131591 ,  403.42879349, 1096.63315843,
       2980.95798704])

In [100]:
## Calculating the power of the array
## arr1 is raised to the power of arr2, element by element: 0^4=0, 1^5=1, 2^6=64, etc.

np.power(arr1, arr2)

array([    0,     1,    64,  2187, 65536], dtype=int32)

In [101]:
## Comparison operations return True or False for each element
## Every element of arr1 is less than the matching element of arr2, so greater returns all False

np.greater(arr1, arr2)

array([False, False, False, False, False])

In [103]:
## Comparison operations return True or False for each element
## Every element of arr1 is less than the matching element of arr2, so less returns all True

np.less(arr1, arr2)

array([ True,  True,  True,  True,  True])
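
The arithmetic and comparison operators from section 3.1 give the same results as the ufuncs shown here, which the sketch below checks. It also shows one ufunc method, reduce, as an illustration of what the function form adds.

```python
import numpy as np

arr1 = np.arange(0, 5)
arr2 = np.arange(4, 9)

# Each operator gives the same result as the matching ufunc.
print(np.array_equal(arr1 + arr2, np.add(arr1, arr2)))     # True
print(np.array_equal(arr1 < arr2, np.less(arr1, arr2)))    # True

# ufuncs also offer methods; reduce folds the array with the operation.
print(np.add.reduce(arr1))        # 10, same as arr1.sum()
print(np.maximum(arr1, 2))        # [2 2 2 3 4]  element-wise maximum
```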

## 4. Basic Statistics

With NumPy, we can compute basic statistics such as the standard deviation (std), variance (var), mean, median, minimum value, and maximum value of an array.

More about NumPy statistics: https://numpy.org/doc/stable/reference/routines.statistics.html#order-statistics

In [104]:
## Creating an array 

arr = np.arange(0,5)
arr

array([0, 1, 2, 3, 4])

## 4.1 Standard Deviation

In [105]:
## Calculating the standard deviation of the array
## The standard deviation measures how far, on average, the elements deviate from the mean of the array

np.std(arr)

1.4142135623730951

In [106]:
arr2 = np.array([[3,4], [5,6]])

np.std(arr2)

1.118033988749895

In [110]:
## Specifying the axis
## By default, the std is computed over the flattened array (all values treated as one 1D sequence)
## axis=0 computes it down each column; axis=1 computes it along each row

np.std(arr2, axis=0)

array([1., 1.])

In [117]:
np.std(arr2, axis=1)

array([0.5, 0.5])
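
To see what np.std computes, the population formula can be written out by hand and compared. The sketch below also shows the ddof parameter and the axis behaviour on the mean; manual_std is an illustrative name.

```python
import numpy as np

arr = np.arange(0, 5)

# Population standard deviation: sqrt(mean((x - mean)^2)).
manual_std = np.sqrt(np.mean((arr - arr.mean()) ** 2))
print(np.isclose(manual_std, np.std(arr)))     # True

# ddof=1 divides by (n - 1) instead of n (the sample standard deviation).
print(np.std(arr, ddof=1))                     # about 1.5811

# axis=0 collapses the rows (one result per column); axis=1 the reverse.
arr2 = np.array([[3, 4], [5, 6]])
print(np.mean(arr2, axis=0))                   # [4. 5.]
print(np.mean(arr2, axis=1))                   # [3.5 5.5]
```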

## 4.2 Variance

Variance is a measure of dispersion that takes into account the spread of all data points in a data set. It is the most commonly used measure of dispersion, along with the standard deviation, which is simply the square root of the variance.

In [123]:
## Calculating the Variance (var)

arr = np.arange(0,5)

np.var(arr)

2.0

In [129]:
np.var(arr2)

1.25

## 4.3 Mean

In [131]:
# Create an array

arr3 = np.arange(0,9)

In [132]:
# Calculate mean of arr3 using mean function

np.mean(arr3)

4.0

In [133]:
# Calculate mean of arr3 using average function

np.average(arr3)

4.0
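
np.mean and np.average agree on plain data; the difference is that np.average accepts optional weights. A brief sketch, where the scores and weights are hypothetical values chosen for the example:

```python
import numpy as np

scores = np.array([80, 90, 100])        # hypothetical values
weights = np.array([1, 1, 2])           # hypothetical weights

print(np.mean(scores))                          # 90.0
print(np.average(scores))                       # 90.0, same without weights
print(np.average(scores, weights=weights))      # 92.5 = (80 + 90 + 200) / 4
```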

## 4.4 Median

In [134]:
## Calculating the median of the array

np.median(arr3)

4.0

## 4.5 Minimum and Maximum

In [135]:
## Find the minimum value in an array

np.min(arr3)

0

In [136]:
## Find the maximum value in an array

np.max(arr3)

8
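
Like std, these functions accept an axis, and argmin/argmax return the position of the value rather than the value itself. The array values below is hypothetical data made up for this sketch.

```python
import numpy as np

values = np.array([[4, 9, 2], [7, 1, 8]])   # hypothetical data

print(np.max(values))            # 9   largest value overall
print(np.max(values, axis=0))    # [7 9 8]  largest value in each column
print(np.min(values, axis=1))    # [2 1]    smallest value in each row
print(np.argmax(values))         # 1   position of the max in the flattened array

# With an even number of elements, the median is the mean of the two middle values.
print(np.median(np.array([1, 2, 3, 4])))   # 2.5
```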

## 5. Data Manipulation

Data manipulation is an important step in a Machine Learning project. Let's look at some NumPy methods and functions that are useful for data manipulation.

## 5.1 Shape of the array

In [138]:
## Creating an array 

arr4 = np.arange(0,10)
arr5= np.array(([1,2,3],[4,5,6],[7,8,9]))

In [139]:
arr4

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [140]:
arr5

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [141]:
np.shape(arr4)

(10,)

In [142]:
np.shape(arr5)

(3, 3)

In [143]:
# Alternative way to find the shape of an array

arr4.shape

(10,)

In [144]:
arr5.shape

(3, 3)

## 5.2 Shaping the Array

np.reshape(array_name, (rows, columns)) or array_name.reshape(rows, columns) changes the shape of the array. The new shape has to conform with the existing data: the total number of elements must stay the same, otherwise NumPy raises an error. For example, you can convert a (3,3) array into (1,9), but you can't convert it into (5,5).

In [146]:
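### arr4 has shape (10,): a 1D array with 10 elements. Let's reshape it into (5,2)
### The shape is passed positionally here; the newshape keyword is deprecated in recent NumPy versions
np.reshape(arr4, (5,2))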

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [149]:
## Alternative way to reshape an array
arr4.reshape(5,2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
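
Two reshape behaviours are worth trying out: passing -1 to let NumPy infer one dimension, and the error raised when the sizes do not match. A minimal sketch on arr4:

```python
import numpy as np

arr4 = np.arange(0, 10)

# -1 asks NumPy to work out that dimension from the array size.
print(arr4.reshape(2, -1).shape)    # (2, 5)
print(arr4.reshape(-1, 1).shape)    # (10, 1)

# A shape whose total size differs from the data raises ValueError.
try:
    arr4.reshape(3, 4)              # 12 slots for 10 elements
except ValueError as err:
    print("reshape failed:", err)
```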

In [151]:
## Array transpose (array_name.T)

arr5_reshaped = arr5.reshape(9,1)
arr5_reshaped.T

array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [152]:
arr5_reshaped.reshape(3,3)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [154]:
## np.resize can also be used to change the shape of the array into a specific size

np.resize(arr5, (1,9))

array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])
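
Unlike reshape, np.resize does not require the sizes to match: it repeats or truncates the data as needed. The array small is a hypothetical example used only for this sketch.

```python
import numpy as np

small = np.array([1, 2, 3])          # hypothetical example array

# np.resize repeats the data to fill a larger shape...
bigger = np.resize(small, (2, 3))
print(bigger)                        # rows [1 2 3] and [1 2 3]

# ...and truncates it for a smaller one.
print(np.resize(small, (2,)))        # [1 2]
```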

## 5.3 Copying array

In [155]:
arr1 = np.arange(0,10)
arr1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [156]:
arr1_copy = arr1.copy()
arr1_copy

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [157]:
## Copying the values of one array into the other 

## Let's copy arr2 into arr1 (they have the same shape)

arr1 = np.arange(0,6)
arr2 = np.arange(6,12)

In [158]:
## arr1 is destination, arr2 is source
np.copyto(arr1, arr2)

In [159]:
arr2

array([ 6,  7,  8,  9, 10, 11])

In [160]:
arr1

array([ 6,  7,  8,  9, 10, 11])
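
The reason .copy() matters is that plain assignment and slicing do not copy the data. The sketch below shows the difference; the names original, alias, view and independent are illustrative.

```python
import numpy as np

original = np.arange(0, 5)

alias = original            # same object, no data copied
view = original[1:4]        # slice: a view onto the same data
independent = original.copy()

original[2] = 99

print(alias[2])             # 99  changed with the original
print(view[1])              # 99  view[1] is original[2]
print(independent[2])       # 2   the copy is unaffected
print(np.shares_memory(original, view))          # True
print(np.shares_memory(original, independent))   # False
```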

## 5.4 Joining arrays

In [161]:
### Creating two arrays

arr1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[10,11,12]])

In [162]:
## Joining them

np.concatenate((arr1, arr2))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [178]:
arr3 = np.array([(13,14,15), (16,17,18)])

In [179]:
arr3

array([[13, 14, 15],
       [16, 17, 18]])

In [180]:
np.concatenate((arr1, arr3))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [13, 14, 15],
       [16, 17, 18]])

In [181]:
## arr2 has shape (1, 3); its transpose arr2.T has shape (3, 1)
## so it can be joined to arr1 as a new column with axis=1

np.concatenate((arr1, arr2.T), axis=1)

array([[ 1,  2,  3, 10],
       [ 4,  5,  6, 11],
       [ 7,  8,  9, 12]])

In [182]:
### Setting axis to None flattens the arrays before joining them

np.concatenate((arr1, arr2), axis=None)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [183]:
### Joining two 1D arrays into a 2D array: Stacking

# Column stacking

arr1 = np.arange(0,6)
arr2 = np.arange(6,12)

np.column_stack((arr1, arr2))

array([[ 0,  6],
       [ 1,  7],
       [ 2,  8],
       [ 3,  9],
       [ 4, 10],
       [ 5, 11]])

In [185]:
## Row stacking (np.vstack; np.row_stack is a deprecated alias in recent NumPy versions)

np.vstack((arr1, arr2))

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
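
The stacking functions differ mainly in the shape of the result, which the following sketch compares on the same two 1D arrays. np.hstack and np.stack are not used elsewhere in this lab and are shown only for comparison.

```python
import numpy as np

arr1 = np.arange(0, 6)
arr2 = np.arange(6, 12)

print(np.vstack((arr1, arr2)).shape)          # (2, 6)  one array per row
print(np.hstack((arr1, arr2)).shape)          # (12,)   joined end to end
print(np.stack((arr1, arr2), axis=0).shape)   # (2, 6)  new leading axis
print(np.stack((arr1, arr2), axis=1).shape)   # (6, 2)  same as column_stack here
```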

## 5.5 Splitting arrays

In [186]:
arr1 = np.arange(0,6)
arr1

array([0, 1, 2, 3, 4, 5])

In [187]:
### Splitting the array into two arrays

np.split(arr1, 2)

[array([0, 1, 2]), array([3, 4, 5])]

In [188]:
### Splitting the array into three arrays

np.split(arr1, 3)

[array([0, 1]), array([2, 3]), array([4, 5])]
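
np.split requires the array to divide evenly. The sketch below shows what happens when it does not, along with two alternatives: np.array_split for uneven parts and a list of cut indices. The 7-element array is a hypothetical example.

```python
import numpy as np

arr = np.arange(0, 7)

# np.split needs equal parts: 7 elements cannot be split into 3.
try:
    np.split(arr, 3)
except ValueError as err:
    print("split failed:", err)

# np.array_split allows uneven parts.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])     # [[0, 1, 2], [3, 4], [5, 6]]

# A list of indices gives the cut points instead of a count.
head, tail = np.split(arr, [2])
print(head, tail)                      # [0 1] [2 3 4 5 6]
```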

## 5.6 Adding (append) and repeating elements in an array

In [189]:
arr1 = np.arange(0,6)
arr1

array([0, 1, 2, 3, 4, 5])

In [196]:
## repeat each element of arr1 three times

np.repeat(arr1, 3)

array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

In [193]:
## append 6 to arr1

np.append(arr1, 6)

array([0, 1, 2, 3, 4, 5, 6])

In [197]:
## repeat the whole of arr1 three times, one copy after the other

np.tile(arr1, 3)

array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5])
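
Note that np.append returns a new array and leaves the original unchanged. The sketch below checks this and shows np.insert and np.delete, two related functions not covered above, which follow the same pattern; longer is an illustrative name.

```python
import numpy as np

arr1 = np.arange(0, 6)

longer = np.append(arr1, [6, 7])
print(arr1.size, longer.size)      # 6 8  (arr1 itself is unchanged)

# np.insert and np.delete also return new arrays.
print(np.insert(arr1, 1, 100))     # 100 is placed before index 1
print(np.delete(arr1, 0))          # [1 2 3 4 5]
```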

## 5.7 Sorting elements in an array

In [198]:
arr = np.array([[1,2,3,4,5,3,2,1,3,5,6,7,7,5,9,5]])

np.sort(arr)

array([[1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5, 6, 7, 7, 9]])

In [199]:
## Finding the unique elements in an array

arr = np.array([[1,2,3,4,5,3,2,1,3,5,6,7,7,5,9,5]])

np.unique(arr)

array([1, 2, 3, 4, 5, 6, 7, 9])
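
Some common variations on sorting and unique values are sketched below: descending order, the indices that sort an array, and counts per unique value. The data is the same as above, written as a 1D array for simplicity.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 3, 2, 1, 3, 5, 6, 7, 7, 5, 9, 5])

# Sort ascending, reverse, and keep the three largest values.
print(np.sort(arr)[::-1][:3])          # [9 7 7]

# argsort returns the indices that would sort arr.
order = np.argsort(arr)
print(np.array_equal(arr[order], np.sort(arr)))   # True

# return_counts=True also reports how often each unique value occurs.
values, counts = np.unique(arr, return_counts=True)
print(values)    # [1 2 3 4 5 6 7 9]
print(counts)    # [2 2 3 1 4 1 2 1]
```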

## 5.8 Reversing an array

In [200]:
## Create an array

arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [201]:
## Flip the array up/down (reverse the order of the rows)
np.flipud(arr)

array([[7, 8, 9],
       [4, 5, 6],
       [1, 2, 3]])

In [202]:
## Flip the array left/right (reverse the order of the columns)

np.fliplr(arr)

array([[3, 2, 1],
       [6, 5, 4],
       [9, 8, 7]])
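
The two flips can also be written with np.flip and an axis, or with slicing using a step of -1. The sketch below checks that these forms agree on the same arr.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# np.flip with an axis covers both flipud (axis=0) and fliplr (axis=1).
print(np.array_equal(np.flip(arr, axis=0), np.flipud(arr)))   # True
print(np.array_equal(np.flip(arr, axis=1), np.fliplr(arr)))   # True

# Slicing with a step of -1 gives the same results.
print(np.array_equal(arr[::-1], np.flipud(arr)))      # True
print(np.array_equal(arr[:, ::-1], np.fliplr(arr)))   # True

# With no axis, np.flip reverses every axis.
print(np.flip(arr)[0])     # [9 8 7]
```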

That's it for NumPy. In this lab, you learned how to create an array, select data from it, perform basic operations and statistics, and manipulate an array.

In the next lab, we will learn about Pandas, another important tool used for real-world data manipulation.