<img style = 'float:left;' src = 'nump.jpg'> 

* Numerical Python
* `numpy` is the fundamental package for scientific computing with Python
* Much more efficient data storage and operations than plain Python lists
* The entire ecosystem of Python data science tools depends on `numpy`

In [105]:
import numpy as np
# NumPy provides Python's core n-dimensional array type and numerical routines, including linear algebra
np.__version__

'2.0.1'

## 1. Creating Arrays

### 1.1 Arrays from Scratch

Numbers 0 to 9

In [4]:
# np.arange(start,stop,step)
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
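`np.arange` also accepts an explicit start and step. As with Python's `range`, the stop value is excluded. A small sketch (the values are arbitrary):

```python
import numpy as np

# start=2, stop=20 (excluded), step=3
stepped = np.arange(2, 20, 3)
print(stepped)  # [ 2  5  8 11 14 17]

# a negative step counts down; stop=0 is again excluded
countdown = np.arange(10, 0, -2)
print(countdown)  # [10  8  6  4  2]
```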

5 zeroes

In [5]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

An array of ones

In [6]:
np.ones(8)

array([1., 1., 1., 1., 1., 1., 1., 1.])

Identity matrix

In [7]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

2 x 3 array filled with 1

In [8]:
# np.full((nrows,ncols),value)
np.full((2,3),1) 

array([[1, 1, 1],
       [1, 1, 1]])

3 x 4 array of random integers from 0 to 9 (the upper bound 10 is excluded)

In [9]:
np.random.randint(0,10,(3,4))

array([[7, 7, 6, 0],
       [4, 4, 7, 3],
       [3, 2, 4, 5]])

1D array of 5 random integers from 9 to 19

In [10]:
np.random.randint(9,20,5)

array([15, 17, 12, 16, 15])

3 x 4 array of random values drawn uniformly from [0, 1)

In [11]:
np.random.random((3,4))

array([[0.42316336, 0.80637694, 0.68199   , 0.0720663 ],
       [0.04459208, 0.81670141, 0.02208351, 0.10607316],
       [0.27737815, 0.72999534, 0.28207988, 0.71441773]])

In [12]:
# np.random.rand does the same job as np.random.random, but takes the dimensions as separate arguments instead of a tuple
np.random.rand(3,4)

array([[0.11872036, 0.108744  , 0.64006385, 0.933115  ],
       [0.74148363, 0.94796836, 0.36797648, 0.30460754],
       [0.51075006, 0.24825916, 0.7780066 , 0.06128184]])

4 x 5 array with normally distributed random values having mean 0 and SD 1

In [13]:
np.random.normal(0,1,(4,5))

array([[-0.08838208, -0.45187425, -0.5033508 ,  1.88108185, -0.77130542],
       [ 1.74673312,  1.19153694,  0.63398389,  0.57685291, -0.50065332],
       [-0.54076169,  1.1847647 ,  0.20106228, -1.91674979, -0.43059127],
       [ 1.03644788,  0.87693453,  0.04928865,  0.02819703, -1.72525702]])

The mean and SD are passed explicitly here, but 0 and 1 are also the defaults of `np.random.normal` (its `loc` and `scale` parameters).

`np.random.randn` draws from the same standard normal distribution (mean 0, SD 1), taking the dimensions as separate arguments

In [14]:
np.random.randn(4,5)

array([[-1.07256924, -2.07665089,  0.90465491, -1.11878443,  0.87677282],
       [-1.7878661 ,  0.01517633,  1.23532742, -0.36319069, -1.38976594],
       [-0.77540018,  0.29943314, -0.04363676,  1.64895244,  2.14865226],
       [-0.05927135,  0.73046972,  1.03324364, -0.2360624 , -0.13143388]])
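The `np.random.*` functions above are NumPy's legacy random interface. Newer code (NumPy 1.17 and later) often uses a `Generator` from `np.random.default_rng`, which has equivalent methods. The sketch below mirrors the cells above; the seed 42 is arbitrary, so the drawn values will not match the outputs shown.

```python
import numpy as np

rng = np.random.default_rng(42)           # seeded Generator for reproducible draws

ints = rng.integers(0, 10, size=(3, 4))   # like np.random.randint(0, 10, (3, 4)); 10 is excluded
unif = rng.random((3, 4))                 # like np.random.random((3, 4)): uniform on [0, 1)
norm = rng.normal(0, 1, size=(4, 5))      # like np.random.normal(0, 1, (4, 5))
std = rng.standard_normal((4, 5))         # like np.random.randn(4, 5)

print(ints.shape, unif.shape, norm.shape, std.shape)
```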

array from 0 to 10 with 5 equally spaced values

In [15]:
# np.linspace(start, stop, num): unlike np.arange, stop is included
np.linspace(0,10,5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])
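Unlike `np.arange`, `np.linspace` includes the stop value by default. Passing `endpoint=False` excludes it, which in this sketch reproduces an `arange` with step 2:

```python
import numpy as np

with_end = np.linspace(0, 10, 5)                     # 0, 2.5, 5, 7.5, 10
without_end = np.linspace(0, 10, 5, endpoint=False)  # 0, 2, 4, 6, 8
print(without_end)                                   # [0. 2. 4. 6. 8.]
print(np.allclose(without_end, np.arange(0, 10, 2)))  # True
```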

### 1.2 Creating arrays from lists or tuples

In [16]:
np.array([4,5,6])

array([4, 5, 6])

In [17]:
alist = [9, 7, 5, 6]
np.array(alist)

array([9, 7, 5, 6])

In [18]:
atuple = (4, 5, 6, 7)
np.array(atuple)

array([4, 5, 6, 7])

List of lists

In [19]:
list_list = [[i, i+2] for i in range(0,5)]
list_list

[[0, 2], [1, 3], [2, 4], [3, 5], [4, 6]]

2D array from a list of lists

In [20]:
my_arr = np.array(list_list)

In [21]:
my_arr2 = np.arange(10)
my_arr2

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Convert the array back to a list (or tuple)

In [22]:
list(my_arr2)

[np.int64(0),
 np.int64(1),
 np.int64(2),
 np.int64(3),
 np.int64(4),
 np.int64(5),
 np.int64(6),
 np.int64(7),
 np.int64(8),
 np.int64(9)]

In [23]:
tuple(my_arr2)

(np.int64(0),
 np.int64(1),
 np.int64(2),
 np.int64(3),
 np.int64(4),
 np.int64(5),
 np.int64(6),
 np.int64(7),
 np.int64(8),
 np.int64(9))
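`list()` and `tuple()` give NumPy scalar objects (`np.int64`), as the output above shows. When plain Python numbers are needed (for example, to serialize to JSON), the array method `tolist()` converts them, producing nested lists for 2D arrays:

```python
import numpy as np

my_arr2 = np.arange(10)
as_list = my_arr2.tolist()
print(as_list)           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(type(as_list[0]))  # <class 'int'>

# 2D arrays become a list of lists
nested = np.array([[0, 2], [1, 3]]).tolist()
print(nested)            # [[0, 2], [1, 3]]
```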

## 2. Indexing and Modifying Arrays

### 2.1 Array Attributes

Dimensions

In [24]:
my_arr = np.array([[0, 2], [1, 3], [2, 4], [3, 5], [4, 6]])
my_arr

array([[0, 2],
       [1, 3],
       [2, 4],
       [3, 5],
       [4, 6]])

In [25]:
# shape
my_arr.shape

(5, 2)

**ndim**

In [26]:
my_arr.ndim

2

**size**

In [27]:
# total number of elements
my_arr.size

10

**reshape**

In [28]:
# reshape returns a new array object (a view of the same data when possible); my_arr keeps its original shape
my_arr.reshape(10)

array([0, 2, 1, 3, 2, 4, 3, 5, 4, 6])

In [29]:
my_arr 

array([[0, 2],
       [1, 3],
       [2, 4],
       [3, 5],
       [4, 6]])
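One dimension passed to `reshape` can be `-1`, in which case NumPy infers it from the array's size. A short sketch using the same 5 x 2 array:

```python
import numpy as np

my_arr = np.array([[0, 2], [1, 3], [2, 4], [3, 5], [4, 6]])

flat = my_arr.reshape(-1)          # infer the length: 10 elements -> shape (10,)
two_rows = my_arr.reshape(2, -1)   # 2 rows requested, so NumPy infers 5 columns
print(flat.shape, two_rows.shape)  # (10,) (2, 5)
print(two_rows)
# [[0 2 1 3 2]
#  [4 3 5 4 6]]
```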

### 2.2 Indexing

Indexing and slicing work as with Python lists:

`npa[start:stop:step]`

In [30]:
my_arr2 = np.array([31, 89, 94, 56, 34, 69, 98, 41, 53, 83, 77])
my_arr2

array([31, 89, 94, 56, 34, 69, 98, 41, 53, 83, 77])

Get 69 (index 5)

In [31]:
my_arr2[5]

np.int64(69)

Get all the numbers in reverse order

In [32]:
my_arr2[::-1]

array([77, 83, 53, 41, 98, 69, 34, 56, 94, 89, 31])

In [33]:
my_arr2

array([31, 89, 94, 56, 34, 69, 98, 41, 53, 83, 77])

Get the numbers from 98 back to 89 (indices 6 down to 1)

In [33]:
my_arr2[1:7][::-1]

array([98, 69, 34, 56, 94, 89])
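The same result can come from a single slice with a negative step. Because the step runs backwards, start is the higher index (6, the position of 98) and stop is the lower, excluded index (0):

```python
import numpy as np

my_arr2 = np.array([31, 89, 94, 56, 34, 69, 98, 41, 53, 83, 77])
one_slice = my_arr2[6:0:-1]      # indices 6, 5, 4, 3, 2, 1
two_slices = my_arr2[1:7][::-1]  # the two-step version used above
print(one_slice)                 # [98 69 34 56 94 89]
print((one_slice == two_slices).all())  # True
```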

2D arrays take one slice per axis, rows first:

`[row_start:row_stop:row_step, col_start:col_stop:col_step]`

In [34]:
# seeding the random generator makes these "random" numbers reproducible
np.random.seed(32)
my_arr = np.random.rand(4,5)
my_arr

array([[0.85888927, 0.37271115, 0.55512878, 0.95565655, 0.7366696 ],
       [0.81620514, 0.10108656, 0.92848807, 0.60910917, 0.59655344],
       [0.09178413, 0.34518624, 0.66275252, 0.44171349, 0.55148779],
       [0.70371249, 0.58940123, 0.04993276, 0.56179184, 0.76635847]])

In [35]:
# my_arr[start:stop:step, start:stop:step]
my_arr[:,:]

array([[0.85888927, 0.37271115, 0.55512878, 0.95565655, 0.7366696 ],
       [0.81620514, 0.10108656, 0.92848807, 0.60910917, 0.59655344],
       [0.09178413, 0.34518624, 0.66275252, 0.44171349, 0.55148779],
       [0.70371249, 0.58940123, 0.04993276, 0.56179184, 0.76635847]])

Get rows 0,1 and columns 0,1

In [36]:
my_arr[0:2,0:2]

array([[0.85888927, 0.37271115],
       [0.81620514, 0.10108656]])

In NumPy, a bare `:` in an index position selects everything along that axis, i.e. all rows or all columns.

Get row 1, all columns

In [37]:
my_arr[1,:]

array([0.81620514, 0.10108656, 0.92848807, 0.60910917, 0.59655344])

Get all rows, and column with index 1

In [38]:
my_arr[:,1] 

array([0.37271115, 0.10108656, 0.34518624, 0.58940123])

In [39]:
my_arr

array([[0.85888927, 0.37271115, 0.55512878, 0.95565655, 0.7366696 ],
       [0.81620514, 0.10108656, 0.92848807, 0.60910917, 0.59655344],
       [0.09178413, 0.34518624, 0.66275252, 0.44171349, 0.55148779],
       [0.70371249, 0.58940123, 0.04993276, 0.56179184, 0.76635847]])

Get a single element (row 2, column 2)

In [40]:
my_arr[2,2]

np.float64(0.6627525231855876)

Row 1, columns 1 to 2 (the stop index 3 is excluded)

In [41]:
my_arr[1,1:3]

array([0.10108656, 0.92848807])

In [42]:
my_arr

array([[0.85888927, 0.37271115, 0.55512878, 0.95565655, 0.7366696 ],
       [0.81620514, 0.10108656, 0.92848807, 0.60910917, 0.59655344],
       [0.09178413, 0.34518624, 0.66275252, 0.44171349, 0.55148779],
       [0.70371249, 0.58940123, 0.04993276, 0.56179184, 0.76635847]])

Get columns 0, 2, 4 in reverse order

In [43]:
my_arr[:,::-2]

array([[0.7366696 , 0.55512878, 0.85888927],
       [0.59655344, 0.92848807, 0.81620514],
       [0.55148779, 0.66275252, 0.09178413],
       [0.76635847, 0.04993276, 0.70371249]])
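Slices select evenly spaced rows or columns. To pick an arbitrary set, pass a list of indices ("fancy indexing"); negative indices count from the end. A sketch on a small, predictable array:

```python
import numpy as np

grid = np.arange(20).reshape(4, 5)   # rows: 0-4, 5-9, 10-14, 15-19

cols = grid[:, [4, 2, 0]]   # columns 4, 2, 0 in that order (same as grid[:, ::-2])
rows = grid[[0, 3], :]      # rows 0 and 3
last = grid[-1, -1]         # last row, last column
print(cols)
print(rows)
print(last)  # 19
```

Unlike a slice, fancy indexing returns a copy rather than a view of the original data.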

### 2.3 Boolean Masking

**Using conditions to select elements**

In [44]:
np.random.seed(32)
my_arr = np.random.rand(10)
my_arr

array([0.85888927, 0.37271115, 0.55512878, 0.95565655, 0.7366696 ,
       0.81620514, 0.10108656, 0.92848807, 0.60910917, 0.59655344])

In [45]:
my_arr > 0.5

array([ True, False,  True,  True,  True,  True, False,  True,  True,
        True])

Get only those array elements whose value is greater than 0.5

In [46]:
my_arr[my_arr > 0.5]

array([0.85888927, 0.55512878, 0.95565655, 0.7366696 , 0.81620514,
       0.92848807, 0.60910917, 0.59655344])

In [47]:
np.random.seed(45)
my_arr2D = np.random.rand(3,5)
my_arr2D

array([[0.98901151, 0.54954473, 0.2814473 , 0.07728957, 0.4444695 ],
       [0.47280797, 0.048522  , 0.16332445, 0.11595071, 0.62739168],
       [0.85618205, 0.65010242, 0.99072168, 0.47035075, 0.61829448]])

Comparing a 2D array gives a boolean array of the same shape

In [48]:
my_arr2D > 0.5 

array([[ True,  True, False, False, False],
       [False, False, False, False,  True],
       [ True,  True,  True, False,  True]])

In [49]:
my_arr2D[my_arr2D > 0.5]

array([0.98901151, 0.54954473, 0.62739168, 0.85618205, 0.65010242,
       0.99072168, 0.61829448])

In [50]:
my_arr2D

array([[0.98901151, 0.54954473, 0.2814473 , 0.07728957, 0.4444695 ],
       [0.47280797, 0.048522  , 0.16332445, 0.11595071, 0.62739168],
       [0.85618205, 0.65010242, 0.99072168, 0.47035075, 0.61829448]])

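Select the rows whose value in column 2 is greater than 0.2. First build a boolean mask from column 2; that mask then serves as the row index (the second cell also keeps only columns 0 and 1).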

In [51]:
my_arr2D[:,2] > 0.2

array([ True, False,  True])

In [52]:
my_arr2D[(my_arr2D[:,2] > 0.2), 0:2]

array([[0.98901151, 0.54954473],
       [0.85618205, 0.65010242]])
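Several conditions can be combined with `&` (and), `|` (or), and `~` (not). Python's `and`/`or` do not work element-wise on arrays, and each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `<`. Since `True` counts as 1, summing a mask counts the matches. A sketch with illustrative values:

```python
import numpy as np

a = np.arange(10)

between = a[(a > 2) & (a < 7)]   # [3 4 5 6]
outside = a[(a < 2) | (a > 7)]   # [0 1 8 9]
not_even = a[~(a % 2 == 0)]      # [1 3 5 7 9]
n_even = (a % 2 == 0).sum()      # number of True values: 5
print(between, outside, not_even, n_even)
```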

In [53]:
np.random.seed(32)
my_arr1 = np.random.randint(0,10,6)
my_arr2 = np.random.randint(0,10,6)
print(f"my_arr1 is {my_arr1}")
print(f"my_arr2 is {my_arr2}")

my_arr1 is [7 5 6 8 3 7]
my_arr2 is [9 3 5 9 4 1]
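Two arrays of the same length can be compared element by element. A sketch using the two arrays printed above, with their values copied in so it runs on its own:

```python
import numpy as np

my_arr1 = np.array([7, 5, 6, 8, 3, 7])
my_arr2 = np.array([9, 3, 5, 9, 4, 1])

bigger = my_arr1 > my_arr2            # element-wise comparison -> boolean array
print(bigger)                         # [False  True  True False False  True]
print(my_arr1[bigger])                # elements of my_arr1 larger than my_arr2's: [5 6 7]
print(np.maximum(my_arr1, my_arr2))   # element-wise maximum: [9 5 6 9 4 7]
```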


### 2.4 Modifying arrays

In [54]:
np.random.seed(32)
my_arr = np.random.rand(4,5)
my_arr

array([[0.85888927, 0.37271115, 0.55512878, 0.95565655, 0.7366696 ],
       [0.81620514, 0.10108656, 0.92848807, 0.60910917, 0.59655344],
       [0.09178413, 0.34518624, 0.66275252, 0.44171349, 0.55148779],
       [0.70371249, 0.58940123, 0.04993276, 0.56179184, 0.76635847]])

In [55]:
my_arr[1,2] = 9
my_arr

array([[0.85888927, 0.37271115, 0.55512878, 0.95565655, 0.7366696 ],
       [0.81620514, 0.10108656, 9.        , 0.60910917, 0.59655344],
       [0.09178413, 0.34518624, 0.66275252, 0.44171349, 0.55148779],
       [0.70371249, 0.58940123, 0.04993276, 0.56179184, 0.76635847]])

Replace column 4 (the fifth column) in every row with new values

In [56]:
my_arr[:,4] = np.array([8, 6, 9, 22])
my_arr

array([[ 0.85888927,  0.37271115,  0.55512878,  0.95565655,  8.        ],
       [ 0.81620514,  0.10108656,  9.        ,  0.60910917,  6.        ],
       [ 0.09178413,  0.34518624,  0.66275252,  0.44171349,  9.        ],
       [ 0.70371249,  0.58940123,  0.04993276,  0.56179184, 22.        ]])
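Assignment also works through a boolean mask, replacing every element that meets a condition in one step. Assigned values are cast to the array's dtype, so a float written into an integer array is truncated. A sketch with illustrative values:

```python
import numpy as np

a = np.array([3, 8, 1, 9, 4])
a[a > 5] = 0          # modifies a in place
print(a)              # [3 0 1 0 4]

# the assigned value is cast to the array's integer dtype: 2.7 becomes 2
a[a == 0] = 2.7
print(a)              # [3 2 1 2 4]
```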

Appending, inserting, and deleting

In [57]:
np.random.seed(32)
my_arr1D = np.random.randint(0,20,5)
my_arr1D

array([11,  5, 19,  7,  3])

Add an element at the end

In [58]:
np.append(my_arr1D, 100)

array([ 11,   5,  19,   7,   3, 100])

Insert the value 200 at index 3

In [59]:
np.insert(my_arr1D,3,200)

array([ 11,   5,  19, 200,   7,   3])

None of these functions changes the array in place; each returns a new array.

In [60]:
my_arr1D

array([11,  5, 19,  7,  3])

Delete element at index 1

In [61]:
np.delete(my_arr1D,1)

array([11, 19,  7,  3])
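`np.delete` also takes a list of indices, and for 2D arrays an `axis` argument chooses whether rows or columns are affected. Without `axis`, `np.append` and `np.delete` work on the flattened array. A sketch:

```python
import numpy as np

my_arr1D = np.array([11, 5, 19, 7, 3])
fewer = np.delete(my_arr1D, [0, 2])            # drop indices 0 and 2
print(fewer)                                   # [5 7 3]

m = np.arange(6).reshape(2, 3)                 # [[0 1 2], [3 4 5]]
no_col1 = np.delete(m, 1, axis=1)              # drop column 1 -> [[0 2], [3 5]]
with_row = np.append(m, [[6, 7, 8]], axis=0)   # add a row -> shape (3, 3)
flat_appended = np.append(m, 6)                # no axis: flattened -> [0 1 2 3 4 5 6]
print(no_col1, with_row.shape, flat_appended)
```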

Concatenation and splitting

In [62]:
my_arr1 = np.array([1, 2, 3, 4])
my_arr2 = np.array([9, 9, 9, 9])

Vertical stacking

In [63]:
np.vstack([my_arr1, my_arr2])

array([[1, 2, 3, 4],
       [9, 9, 9, 9]])

Horizontal stacking; for 1D arrays, `np.concatenate` gives the same result

In [64]:
np.hstack([my_arr1, my_arr2])

array([1, 2, 3, 4, 9, 9, 9, 9])
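`np.concatenate` joins arrays along an existing axis (`axis=0` by default). For these 1D arrays it matches `np.hstack`; for 2D arrays, `axis=0` stacks rows (like `np.vstack`) and `axis=1` joins columns (like `np.hstack`). A sketch:

```python
import numpy as np

my_arr1 = np.array([1, 2, 3, 4])
my_arr2 = np.array([9, 9, 9, 9])
joined = np.concatenate([my_arr1, my_arr2])
print((joined == np.hstack([my_arr1, my_arr2])).all())  # True

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6], [7, 8]])
print(np.concatenate([top, bottom], axis=0).shape)  # (4, 2), same as np.vstack
print(np.concatenate([top, bottom], axis=1).shape)  # (2, 4), same as np.hstack
```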

In [65]:
np.random.seed(32)
my_arr = np.random.randint(0,10,(4,5))
my_arr

array([[7, 5, 6, 8, 3],
       [7, 9, 3, 5, 9],
       [4, 1, 3, 1, 2],
       [3, 8, 2, 4, 2]])

`np.vsplit` splits by rows; `[2]` is the row index at which the split happens. `np.hsplit()` does the same along the columns.

In [66]:
upsplit, losplit = np.vsplit(my_arr,[2])

In [67]:
upsplit

array([[7, 5, 6, 8, 3],
       [7, 9, 3, 5, 9]])

In [68]:
losplit

array([[4, 1, 3, 1, 2],
       [3, 8, 2, 4, 2]])

In [69]:
lsplit, rsplit = np.hsplit(my_arr,[2])

In [70]:
lsplit

array([[7, 5],
       [7, 9],
       [4, 1],
       [3, 8]])

In [71]:
rsplit

array([[6, 8, 3],
       [3, 5, 9],
       [3, 1, 2],
       [2, 4, 2]])
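The list of split points can contain more than one index, giving more than two pieces. `np.split` works along any axis; `np.array_split` also accepts a number of sections that does not divide the length evenly. A sketch on a 1D array:

```python
import numpy as np

a = np.arange(10)
first, middle, last = np.split(a, [3, 7])   # cut before index 3 and before index 7
print(first, middle, last)                  # [0 1 2] [3 4 5 6] [7 8 9]

parts = np.array_split(a, 3)                # 10 elements into 3 near-equal parts
print([p.tolist() for p in parts])          # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```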

For other array manipulation routines, see:
* https://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html

Slices are views, not copies: a sub-array shares its data with the original, so mutating the sub-array also mutates the original.

In [72]:
sub_arr = my_arr[:,1:2]
sub_arr

array([[5],
       [9],
       [1],
       [8]])

In [73]:
sub_arr[:,0] = np.array([9, 9, 9, 9])
sub_arr

array([[9],
       [9],
       [9],
       [9]])

In [74]:
my_arr

array([[7, 9, 6, 8, 3],
       [7, 9, 3, 5, 9],
       [4, 9, 3, 1, 2],
       [3, 9, 2, 4, 2]])
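To work on a sub-array without touching the original, take an explicit copy with `.copy()`. `np.shares_memory` reports whether two arrays overlap in memory. A sketch on a small illustrative array:

```python
import numpy as np

my_arr = np.array([[7, 5, 6], [7, 9, 3]])

sub_view = my_arr[:, 1:2]          # basic slice: a view onto my_arr's data
sub_copy = my_arr[:, 1:2].copy()   # independent copy

sub_copy[:, 0] = 0                 # my_arr is unchanged
print(my_arr[:, 1])                # [5 9]
sub_view[:, 0] = 0                 # my_arr changes too
print(my_arr[:, 1])                # [0 0]
print(np.shares_memory(sub_view, my_arr), np.shares_memory(sub_copy, my_arr))  # True False
```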

### 2.5 UFuncs and Algebraic operations

With plain lists, element-wise operations need a loop: `*` repeats a list and `+` concatenates two lists.

In [75]:
al = [1, 2, 3, 4]

In [76]:
al*4

[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]

In [77]:
al + [1 ,2 ,3, 4]

[1, 2, 3, 4, 1, 2, 3, 4]

In [78]:
for i in al:
    print(i*4)

4
8
12
16


List comprehensions

In [79]:
[4*i for i in al]

[4, 8, 12, 16]

With NumPy arrays, arithmetic operators apply element-wise, with no loop needed

In [80]:
my_arr = np.arange(10)
my_arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [81]:
my_arr*4

array([ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36])

In [82]:
my_arr-500

array([-500, -499, -498, -497, -496, -495, -494, -493, -492, -491])

In [83]:
my_arr/5

array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8])

Operations in an expression

In [84]:
my_arr*5+3

array([ 3,  8, 13, 18, 23, 28, 33, 38, 43, 48])

In [85]:
np.multiply(my_arr,4)

array([ 0,  4,  8, 12, 16, 20, 24, 28, 32, 36])
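Each arithmetic operator on arrays corresponds to a ufunc, so the operator form and the function form give the same result. A quick check:

```python
import numpy as np

my_arr = np.arange(10)
pairs = [
    (my_arr * 4, np.multiply(my_arr, 4)),
    (my_arr - 500, np.subtract(my_arr, 500)),
    (my_arr / 5, np.divide(my_arr, 5)),
    (my_arr * 5 + 3, np.add(np.multiply(my_arr, 5), 3)),
    (my_arr ** 2, np.power(my_arr, 2)),
]
# every operator result matches its ufunc counterpart
print(all(np.array_equal(op, fn) for op, fn in pairs))  # True
```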

Some more functions, all called as `np.<name>(arr)`: element-wise ufuncs such as `sqrt`, `exp`, `sin`, `cos`, `log`, and aggregations such as `max`, `min`, `mean`, `sum`, `std`, `var`, `argmin`, `argmax`.

See https://docs.scipy.org/doc/numpy/reference/ufuncs.html


In [86]:
my_arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [87]:
np.mean(my_arr)

np.float64(4.5)

In [88]:
np.prod(my_arr)

np.int64(0)

In [89]:
np.sum(my_arr)

np.int64(45)

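Many of the aggregating functions (`sum`, `mean`, `max`, ...) are also available as methods on the array itself, e.g. `my_arr.sum()`. Element-wise ufuncs such as `np.sqrt` and `np.exp` are called as functions: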

In [90]:
np.sqrt(my_arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [112]:
np.exp(my_arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])
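A quick check of the point about methods: aggregations have matching array methods, whereas element-wise math functions such as `sqrt` exist only as `np` functions, not as array methods.

```python
import numpy as np

my_arr = np.arange(10)
print(np.sum(my_arr) == my_arr.sum())    # True
print(np.mean(my_arr) == my_arr.mean())  # True
print(np.max(my_arr) == my_arr.max())    # True
print(hasattr(my_arr, "sum"), hasattr(my_arr, "sqrt"))  # True False
```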

Multi-dimensional aggregation

In [91]:
np.random.seed(32)
my_arr = np.random.randn(3,5)
my_arr

array([[-0.34889445,  0.98370343,  0.58092283,  0.07028444,  0.77753268],
       [ 0.58195875,  1.47179053,  1.66318101, -0.26117712, -0.68867681],
       [-0.69492326,  1.94042346,  1.80541519,  0.45631385, -0.57481204]])

The `axis` argument picks the direction: `axis=0` aggregates down the rows (one result per column), `axis=1` aggregates across the columns (one result per row)

In [92]:
np.mean(my_arr, axis=0)

array([-0.15395299,  1.46530581,  1.34983968,  0.08847372, -0.16198539])

In [93]:
np.sum(my_arr, axis=1)

array([2.06354893, 2.76707635, 2.9324172 ])
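A small integer array makes the direction of `axis` easier to see: `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row). `keepdims=True` keeps the collapsed axis with length 1. A sketch:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.sum(axis=0))                       # column sums: [5 7 9]
print(m.sum(axis=1))                       # row sums: [ 6 15]
print(m.mean(axis=1))                      # row means: [2. 5.]
print(m.sum(axis=1, keepdims=True).shape)  # (2, 1)
```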

Methods on NumPy arrays

In [95]:
# like the np.* aggregation functions, arrays have equivalent methods
my_arr.max()

np.float64(1.9404234595994223)

In [96]:
my_arr.sum()

np.float64(7.763042476841536)

In [98]:
# index of the maximum in the flattened array (here row 2, column 1: 2*5 + 1 = 11)
my_arr.argmax()

np.int64(11)
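Without an `axis`, `argmax` indexes the flattened array. `np.unravel_index` converts such a flat index back into row and column, and passing `axis` gives per-column or per-row positions instead. A sketch on a small array:

```python
import numpy as np

m = np.array([[1, 9, 3],
              [4, 5, 6]])
flat_idx = m.argmax()                           # 1: position of 9 in the flattened array
row, col = np.unravel_index(flat_idx, m.shape)  # back to (row, column)
print(flat_idx, row, col)                       # 1 0 1
print(m.argmax(axis=0))                         # row of each column's max: [1 0 1]
print(m.argmax(axis=1))                         # column of each row's max: [1 2]
```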

***Extras (Optional)***

A few other NumPy functions

In [99]:
np.random.seed(32)
my_arr = np.random.randint(0,10,20)
my_arr

array([7, 5, 6, 8, 3, 7, 9, 3, 5, 9, 4, 1, 3, 1, 2, 3, 8, 2, 4, 2])

`np.bincount` counts how often each value from 0 up to the largest value occurs; the input must be non-negative integers.

In [100]:
np.bincount(my_arr)

array([0, 2, 3, 4, 2, 2, 1, 2, 2, 2])
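`np.unique` with `return_counts=True` gives a similar frequency table that works for any values (not only non-negative integers) and lists only the values that actually occur. A sketch on the same array:

```python
import numpy as np

my_arr = np.array([7, 5, 6, 8, 3, 7, 9, 3, 5, 9, 4, 1, 3, 1, 2, 3, 8, 2, 4, 2])
values, counts = np.unique(my_arr, return_counts=True)
print(values)   # [1 2 3 4 5 6 7 8 9]
print(counts)   # [2 3 4 2 2 1 2 2 2]
# same counts as bincount, minus the entry for 0 (which never occurs)
print((counts == np.bincount(my_arr)[values]).all())  # True
```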

Get the index positions where the value equals 5 (`np.where` returns a tuple with one index array per dimension)

In [101]:
np.where(my_arr == 5)

(array([1, 8]),)
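The tuple returned by `np.where` can be used directly as an index. With three arguments, `np.where(condition, x, y)` instead builds a new array, taking `x` where the condition holds and `y` elsewhere. A sketch using the first ten values of the array above:

```python
import numpy as np

my_arr = np.array([7, 5, 6, 8, 3, 7, 9, 3, 5, 9])
idx = np.where(my_arr == 5)
print(idx[0])                            # [1 8]
print(my_arr[idx])                       # [5 5]
print(np.where(my_arr > 5, my_arr, 0))   # keep values above 5, else 0: [7 0 6 8 0 7 9 0 0 9]
```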

In [102]:
# np.sort returns a sorted copy; to sort my_arr in place, use my_arr.sort()
np.sort(my_arr)

array([1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 9, 9])

In [103]:
my_arr

array([7, 5, 6, 8, 3, 7, 9, 3, 5, 9, 4, 1, 3, 1, 2, 3, 8, 2, 4, 2])

In [104]:
# indices that would sort the array
np.argsort(my_arr)

array([11, 13, 19, 17, 14,  4, 15,  7, 12, 18, 10,  1,  8,  2,  0,  5, 16,
        3,  6,  9])
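Indexing an array with its `argsort` result reproduces `np.sort`, and reversing that gives descending order. The default algorithm is not guaranteed to be stable, so equal values may come back in any relative order; `kind="stable"` keeps them in their original order. A sketch on the first ten values of the array above:

```python
import numpy as np

my_arr = np.array([7, 5, 6, 8, 3, 7, 9, 3, 5, 9])
order = np.argsort(my_arr)
print((my_arr[order] == np.sort(my_arr)).all())  # True
print(np.sort(my_arr)[::-1])                     # descending: [9 9 8 7 7 6 5 5 3 3]

stable = np.argsort(my_arr, kind="stable")
print(stable[:2])   # the two 3s, in their original order: [4 7]
```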