# NumPy
NumPy is the fundamental package for scientific computing with Python. It makes mathematical computation convenient and efficient, and it provides a rich set of operations on n-dimensional arrays.

Before looking at NumPy, let us first revisit `list`, the built-in Python data type commonly used to represent arrays.

In [1]:
x = [1, 3, 5, 4]
y = [5, 1, 7, 6]

In [2]:
x + y

[1, 3, 5, 4, 5, 1, 7, 6]

In [3]:
x - y # this will result in error

TypeError: unsupported operand type(s) for -: 'list' and 'list'

In [6]:
x * 3

[1, 3, 5, 4, 1, 3, 5, 4, 1, 3, 5, 4]

In [7]:
x / 3 # this will also result in error

TypeError: unsupported operand type(s) for /: 'list' and 'int'

As you can see, `list` does not match our basic mathematical intuition about arrays: `+` concatenates and `*` repeats instead of adding and scaling, while `-` and `/` are not defined at all.
## Enter `numpy` arrays!

In [8]:
import numpy

x = numpy.array([1, 3, 5, 4])
y = numpy.array([5, 1, 7, 6])

In [9]:
x + y

array([ 6,  4, 12, 10])

In [10]:
x - y

array([-4,  2, -2, -2])

In [11]:
x * 3

array([ 3,  9, 15, 12])

In [12]:
x / 3

array([0.33333333, 1.        , 1.66666667, 1.33333333])

NumPy arrays not only behave the way mathematics suggests, they are also faster to compute with, use less memory, and are easier to manipulate than lists. This makes them well suited to the large amounts of numerical data we deal with in machine learning and data science tasks.
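
To make the speed claim concrete, here is a rough sketch that times the same element-wise addition on lists and on arrays. The array size `n` and the repeat count are arbitrary choices made for illustration, and the timings depend on the machine, so treat the printed numbers as indicative only.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for illustration
list_a = list(range(n))
list_b = list(range(n))
arr_a = np.arange(n)
arr_b = np.arange(n)


def add_lists():
    # Element-wise addition of lists needs an explicit Python-level loop
    return [p + q for p, q in zip(list_a, list_b)]


def add_arrays():
    # The same operation on arrays is a single vectorized expression
    return arr_a + arr_b


list_time = timeit.timeit(add_lists, number=20)
array_time = timeit.timeit(add_arrays, number=20)
print(f"list:  {list_time:.4f} s for 20 runs")
print(f"array: {array_time:.4f} s for 20 runs")
```

Both functions compute the same values; only the way the loop is executed differs.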

Let's dive in more.

In [13]:
# Typing 'numpy' every time we use it gets tedious, so we shorten it as follows.
# In fact, you'll find that almost everyone does it this way.
import numpy as np

In [14]:
# Now, we can type 'np' instead of 'numpy'
x = np.array([1, 2, 3])
x

array([1, 2, 3])

In [15]:
# Convert python list to numpy array
my_list = [10, 20, 30]
numpy_array = np.array(my_list)
numpy_array

array([10, 20, 30])

In [16]:
# numpy equivalent of range()
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
y = np.arange(10, 20)
y

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [18]:
z = np.arange(0, 10, 2)
z

array([0, 2, 4, 6, 8])

In [19]:
np.arange(1, 10, 0.5)  # start, stop (exclusive), step

array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. ,
       7.5, 8. , 8.5, 9. , 9.5])

In [20]:
# 2-dimensional array
a = np.array([[1, 2], [2, 3], [3, 4]])
a

array([[1, 2],
       [2, 3],
       [3, 4]])

In [21]:
print(a.shape)  # (number of rows, number of columns)
print(a.size)  # total number of elements
print(a.ndim)  # number of dimensions
print(a.dtype) # data type of array elements

(3, 2)
6
2
int64
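
The `int64` shown above is the default integer type on the machine that produced this output; the default can differ between platforms and NumPy versions. As a small sketch, the data type can also be set explicitly with the `dtype` argument or converted later with `astype`, and `itemsize` and `nbytes` show how much memory the elements take.

```python
import numpy as np

# Request 32-bit integers explicitly instead of relying on the default
a = np.array([[1, 2], [2, 3], [3, 4]], dtype=np.int32)
print(a.dtype)     # int32
print(a.itemsize)  # 4 bytes per element
print(a.nbytes)    # 6 elements * 4 bytes = 24

# astype returns a new array with the requested data type
b = a.astype(np.float64)
print(b.dtype)     # float64
print(b.nbytes)    # 6 elements * 8 bytes = 48
```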


## Default Array Creation

In [22]:
np.zeros(10)  # 1-dimensional array

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [23]:
np.zeros((3,4))  # 2d array of shape (3, 4)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [24]:
np.ones((4, 5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [25]:
np.full((3,4), 10)

array([[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]])

In [26]:
np.eye(3)  # identity matrix of shape (3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [27]:
# An array of 5 random numbers uniformly distributed in [0, 1)
np.random.random(5)

array([0.30632961, 0.87728346, 0.93344588, 0.72695854, 0.31024359])

In [28]:
# 4x3 array with uniformly distributed random numbers between 0 and 1
np.random.random((4, 3))

array([[0.7224154 , 0.08518238, 0.62467982],
       [0.75141387, 0.52429301, 0.77007171],
       [0.26788073, 0.69618831, 0.05009893],
       [0.2342696 , 0.58012153, 0.01771717]])

In [29]:
# Random integers
np.random.randint(0, 20, (4, 3))

array([[12, 11, 15],
       [18,  5, 11],
       [19, 16, 16],
       [ 8,  0,  2]])
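
The cells above use the `np.random.*` functions. Newer NumPy releases (1.17 and later) also offer a generator object created with `np.random.default_rng`; the sketch below shows the equivalent calls, assuming such a version is installed. The seed values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily

floats = rng.random(5)                      # 5 floats in [0, 1)
ints = rng.integers(0, 20, size=(4, 3))     # 4x3 integers in [0, 20)
print(floats)
print(ints)

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
same = np.array_equal(rng_a.random(3), rng_b.random(3))
print(same)  # True
```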

## Array Reshaping

In [30]:
arr = np.random.rand(12)
arr

array([0.20473685, 0.29021926, 0.7794166 , 0.58958657, 0.15495458,
       0.97416272, 0.57668298, 0.72311655, 0.65416207, 0.54985861,
       0.29834036, 0.49821557])

In [31]:
arr.shape

(12,)

In [32]:
arr.reshape(4, 3)

array([[0.20473685, 0.29021926, 0.7794166 ],
       [0.58958657, 0.15495458, 0.97416272],
       [0.57668298, 0.72311655, 0.65416207],
       [0.54985861, 0.29834036, 0.49821557]])

In [33]:
arr.reshape(6, 2)

array([[0.20473685, 0.29021926],
       [0.7794166 , 0.58958657],
       [0.15495458, 0.97416272],
       [0.57668298, 0.72311655],
       [0.65416207, 0.54985861],
       [0.29834036, 0.49821557]])

In [34]:
arr.reshape(12, 1)

array([[0.20473685],
       [0.29021926],
       [0.7794166 ],
       [0.58958657],
       [0.15495458],
       [0.97416272],
       [0.57668298],
       [0.72311655],
       [0.65416207],
       [0.54985861],
       [0.29834036],
       [0.49821557]])

In [35]:
arr

array([0.20473685, 0.29021926, 0.7794166 , 0.58958657, 0.15495458,
       0.97416272, 0.57668298, 0.72311655, 0.65416207, 0.54985861,
       0.29834036, 0.49821557])

In [36]:
arr = arr.reshape(6, 2)
arr

array([[0.20473685, 0.29021926],
       [0.7794166 , 0.58958657],
       [0.15495458, 0.97416272],
       [0.57668298, 0.72311655],
       [0.65416207, 0.54985861],
       [0.29834036, 0.49821557]])

Note that the size of the array stays the same, `12`, after every reshape. This means you cannot reshape an array to an arbitrary shape: only shapes whose dimensions multiply to the size of the array are permitted.
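
As a quick illustration of that constraint, the following sketch uses a fresh 12-element array (built with `np.arange` rather than the random values above) and tries one valid and one invalid shape.

```python
import numpy as np

arr = np.arange(12)  # 12 elements

# 3 * 4 = 12, so this shape is consistent with the size
print(arr.reshape(3, 4).shape)  # (3, 4)

# 5 * 3 = 15 does not match 12, so NumPy raises a ValueError
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```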

In [37]:
arr.reshape(3, -1)  # dimension with -1 will be calculated accordingly

array([[0.20473685, 0.29021926, 0.7794166 , 0.58958657],
       [0.15495458, 0.97416272, 0.57668298, 0.72311655],
       [0.65416207, 0.54985861, 0.29834036, 0.49821557]])

In [38]:
arr.reshape(6, -1)

array([[0.20473685, 0.29021926],
       [0.7794166 , 0.58958657],
       [0.15495458, 0.97416272],
       [0.57668298, 0.72311655],
       [0.65416207, 0.54985861],
       [0.29834036, 0.49821557]])

In [39]:
arr.reshape(-1, 4)

array([[0.20473685, 0.29021926, 0.7794166 , 0.58958657],
       [0.15495458, 0.97416272, 0.57668298, 0.72311655],
       [0.65416207, 0.54985861, 0.29834036, 0.49821557]])

In [40]:
arr.reshape(-1, 3).shape

(4, 3)

## Indexing and Slicing
Indexing and slicing work much like they do for Python lists.

In [41]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [42]:
a[5]

5

In [43]:
a[-3]

7

In [44]:
a[2:5]

array([2, 3, 4])

In [45]:
a[-5:-3]

array([5, 6])

In [46]:
x = np.random.randint(0, 10, (6, 5))  # 2d array of 6x5
x

array([[7, 4, 7, 3, 6],
       [4, 2, 3, 6, 7],
       [0, 9, 7, 0, 7],
       [9, 9, 1, 2, 3],
       [4, 4, 9, 0, 9],
       [8, 5, 5, 6, 6]])

In [47]:
x[0] # 1st row

array([7, 4, 7, 3, 6])

In [48]:
x[2]  # 3rd row

array([0, 9, 7, 0, 7])

In [49]:
x[1][2]  # [row index][column index]

3

In [50]:
x[1, 2]  # [row index, column index]

3

In [51]:
# Elements can also be assigned by index
x[1, 2] = 200
x

array([[  7,   4,   7,   3,   6],
       [  4,   2, 200,   6,   7],
       [  0,   9,   7,   0,   7],
       [  9,   9,   1,   2,   3],
       [  4,   4,   9,   0,   9],
       [  8,   5,   5,   6,   6]])

In [52]:
x[:, 0]  # All rows, 1st column

array([7, 4, 0, 9, 4, 8])

In [53]:
x[:3, :4]  # first 3 rows and first 4 columns

array([[  7,   4,   7,   3],
       [  4,   2, 200,   6],
       [  0,   9,   7,   0]])

In [54]:
x[1:4, 2:5]

array([[200,   6,   7],
       [  7,   0,   7],
       [  1,   2,   3]])
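
One difference from lists is worth keeping in mind: slicing a list creates a new list, whereas slicing an array with basic slices returns a view onto the same data. The sketch below, with illustrative variable names, shows the effect and how `.copy()` avoids it.

```python
import numpy as np

py_list = list(range(10))
a = np.arange(10)

list_slice = py_list[2:5]  # a new, independent list
arr_slice = a[2:5]         # a view that shares data with `a`

list_slice[0] = 100
arr_slice[0] = 100

print(py_list)  # unchanged: element at index 2 is still 2
print(a)        # changed: element at index 2 is now 100

# Use .copy() when an independent array is needed
b = np.arange(10)
safe_slice = b[2:5].copy()
safe_slice[0] = 100
print(b)        # unchanged
```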

## Broadcasting

In [55]:
np.random.seed(0)  # so that the same random numbers are generated every time
x = np.random.randint(0, 10, size=(3, 4))
x

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])

In [56]:
x + 3

array([[ 8,  3,  6,  6],
       [10, 12,  6,  8],
       [ 5,  7, 10,  9]])

In [57]:
x * 3

array([[15,  0,  9,  9],
       [21, 27,  9, 15],
       [ 6, 12, 21, 18]])

In [58]:
y = np.arange(1, 5)
y

array([1, 2, 3, 4])

In [59]:
x

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])

In [60]:
x.shape

(3, 4)

In [61]:
y.shape

(4,)

In [62]:
x + y

array([[ 6,  2,  6,  7],
       [ 8, 11,  6,  9],
       [ 3,  6, 10, 10]])

In [63]:
z = np.arange(1,4).reshape(3,1)
z

array([[1],
       [2],
       [3]])

In [64]:
x + z

array([[ 6,  1,  4,  4],
       [ 9, 11,  5,  7],
       [ 5,  7, 10,  9]])
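
The results above follow NumPy's broadcasting rule: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. A minimal sketch of that rule, assuming NumPy 1.20 or later for `np.broadcast_shapes`; the array `w` is a hypothetical incompatible operand added for illustration.

```python
import numpy as np

x = np.zeros((3, 4))
y = np.arange(1, 5)                # shape (4,)
z = np.arange(1, 4).reshape(3, 1)  # shape (3, 1)

# (3, 4) with (4,): y is treated as (1, 4) and stretched over the rows
print(np.broadcast_shapes(x.shape, y.shape))  # (3, 4)

# (3, 4) with (3, 1): z is stretched over the columns
print(np.broadcast_shapes(x.shape, z.shape))  # (3, 4)

# (4,) with (3, 1): both operands are stretched
print(np.broadcast_shapes(y.shape, z.shape))  # (3, 4)

# Shape (3,) is compared against the last dimension, 4.
# They differ and neither is 1, so the addition fails.
w = np.arange(3)
try:
    x + w
except ValueError as err:
    print("cannot broadcast:", err)
```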

## Vectorized Operations

In [65]:
x = np.arange(-5, 5)
x

array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])

In [66]:
np.sum(x)

-5

In [67]:
np.mean(x)

-0.5

In [68]:
np.max(x)

4

In [69]:
np.min(x)

-5

In [70]:
np.random.seed(5)
x = np.random.randint(0, 10, size=(4, 5))
x

array([[3, 6, 6, 0, 9],
       [8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0],
       [1, 4, 6, 2, 9]])

In [71]:
np.sum(x)

85

In [72]:
np.sum(x, axis=0)

array([19, 15, 24,  9, 18])

In [73]:
np.sum(x, axis=1)

array([24, 19, 20, 22])

The parameter `axis` is most intuitively understood as **the dimension that is collapsed** during the operation.
Note that the shape of a 2-dimensional array has the format **(rows, columns)**. Thus, the row axis is `axis=0` and the column axis is `axis=1`. Thinking of the axis this way is more helpful than describing an operation as column-wise or row-wise, which can be quite confusing.

So, `axis=0` in the above example means the row gets collapsed and the result is column-wise sum. And `axis=1` means the column gets collapsed and the result is row-wise sum.
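
One way to check this is to compare shapes before and after the operation. The sketch below uses a fresh 4x5 array built with `np.arange` (not the random `x` above); the `keepdims=True` argument keeps the collapsed axis with length 1, which makes it easy to see which dimension disappeared.

```python
import numpy as np

x = np.arange(20).reshape(4, 5)  # 4 rows, 5 columns

print(x.shape)                  # (4, 5)
print(np.sum(x, axis=0).shape)  # (5,)  the row axis is collapsed
print(np.sum(x, axis=1).shape)  # (4,)  the column axis is collapsed

# With keepdims=True the collapsed axis is kept with length 1
print(np.sum(x, axis=0, keepdims=True).shape)  # (1, 5)
print(np.sum(x, axis=1, keepdims=True).shape)  # (4, 1)
```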

In [74]:
np.mean(x, axis=0)

array([4.75, 3.75, 6.  , 2.25, 4.5 ])

In [75]:
y = np.array([7, 4, 6, 3, 8, 6, 3, 1, 5, 5, 2, 7, 4, 9, 6, 4])

In [76]:
np.argmax(y)

13

In [77]:
x

array([[3, 6, 6, 0, 9],
       [8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0],
       [1, 4, 6, 2, 9]])

In [78]:
np.argmax(x)

4

In [79]:
np.argmax(x, axis=0)

array([1, 0, 1, 2, 0])

In [80]:
y.shape

(16,)

In [81]:
y[np.argmax(y)] == np.max(y)

True

In [82]:
np.argmin(y)

7

## Universal Functions
A universal function, or *ufunc*, is a function that performs elementwise operations on data in ndarrays.

In [83]:
x = np.arange(-5, 5)
x

array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])

In [84]:
np.square(x)

array([25, 16,  9,  4,  1,  0,  1,  4,  9, 16])

In [85]:
np.sin(x)

array([ 0.95892427,  0.7568025 , -0.14112001, -0.90929743, -0.84147098,
        0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ])

In [86]:
np.exp(x)

array([6.73794700e-03, 1.83156389e-02, 4.97870684e-02, 1.35335283e-01,
       3.67879441e-01, 1.00000000e+00, 2.71828183e+00, 7.38905610e+00,
       2.00855369e+01, 5.45981500e+01])

In [87]:
np.abs(x)

array([5, 4, 3, 2, 1, 0, 1, 2, 3, 4])
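
To show what "elementwise" means, the following sketch compares a ufunc with an explicit Python loop that does the same work, and adds one example of a ufunc that takes two inputs (`np.maximum`).

```python
import numpy as np

x = np.arange(-5, 5)

# Explicit loop over the elements
squared_loop = np.array([value ** 2 for value in x])

# The same computation expressed with a ufunc
squared_ufunc = np.square(x)
print(np.array_equal(squared_loop, squared_ufunc))  # True

# Binary ufunc: elementwise maximum of x and 0 (0 is broadcast)
clipped = np.maximum(x, 0)
print(clipped)  # [0 0 0 0 0 0 1 2 3 4]
```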

## Boolean Indexing

In [88]:
np.random.seed(5)
x = np.random.randint(0, 20, size=15)
x

array([ 3, 14, 15,  6, 16,  9,  8,  4,  7, 16, 16,  7, 12, 15, 17])

In [89]:
x > 10

array([False,  True,  True, False,  True, False, False, False, False,
        True,  True, False,  True,  True,  True])

In [90]:
x[x > 10]

array([14, 15, 16, 16, 16, 12, 15, 17])

In [91]:
x[x % 2 == 0]

array([14,  6, 16,  8,  4, 16, 16, 12])
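
Boolean masks can also be combined, counted, and used for assignment. The sketch below hard-codes the same values as the array above so that it runs on its own; note that `&` and `|` are used instead of `and` and `or`, and that each comparison needs its own parentheses.

```python
import numpy as np

x = np.array([3, 14, 15, 6, 16, 9, 8, 4, 7, 16, 16, 7, 12, 15, 17])

# Elementwise "and": greater than 10 and even
both = x[(x > 10) & (x % 2 == 0)]
print(both)  # [14 16 16 16 12]

# Elementwise "or": less than 5 or greater than 15
either = x[(x < 5) | (x > 15)]
print(either)  # [ 3 16  4 16 16 17]

# Count how many elements satisfy a condition
count = np.count_nonzero(x > 10)
print(count)  # 8

# Assign through a mask (on a copy, so x itself is left untouched)
capped = x.copy()
capped[capped > 10] = 10
print(capped)
```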

## Task 5
We have the marks of 10 students in 5 subjects. Each row denotes a student and each column denotes a subject.

In [92]:
students = np.array(['Aakash', 'Bikram', 'Dinesh', 'Garima', 'Indira', 'Manisha', 'Nishan', 'Pinky', 'Richa', 'Saroj'])
subjects = np.array(['Physics', 'Chemistry', 'Biology', 'Mathematics', 'English'])

fm = 100  # full mark of each subject
pm = 40  # pass mark of each subject

marks = np.array([[76, 40, 77, 83, 67],
                   [38, 70, 54, 93, 43],
                   [63, 92, 41, 95, 48],
                   [43, 68, 53, 95, 81],
                   [53, 69, 69, 74, 77],
                   [75, 41, 79, 83, 93],
                   [65, 74, 43, 57, 58],
                   [34, 72, 51, 69, 85],
                   [50, 91, 50, 57, 74],
                   [43, 32, 73, 65, 50]])

### Questions
1. Who obtained the highest total marks? <sup><sub>Hint: Use `np.sum()` and `np.argmax()`</sub></sup>
2. Calculate the percentage obtained by each student. <sup><sub>Hint: Use `np.sum()`</sub></sup>
3. Find the names of the students who failed the exam. <sup><sub>Hint: Use `np.min()` and boolean indexing</sub></sup>
4. How many students failed in Physics? <sup><sub>Hint: Use boolean indexing and `np.count_nonzero()`</sub></sup>
5. Which was the most difficult subject? Assume the subject with the lowest average score to be the most difficult. <sup><sub>Hint: Use `np.mean()` and `np.argmin()`</sub></sup>