### NumPy - Numerical Python <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/NumPy_logo.svg/1200px-NumPy_logo.svg.png" alt="NumPy logo" width = "100">

NumPy (conventionally imported as `np`) is the premier Python package for scientific computing.

https://numpy.org

Its power comes from the <b>N-dimensional array object</b>

np is a *lower*-level numerical computing library. 

This means that, while you can use it directly, most of its power comes from the packages built on top of np:
* Pandas (*pan*el *da*ta)
* Scikit-learn (machine learning)
* Scikit-image (image processing)
* OpenCV (computer vision)
* more...

<b>Importing NumPy<br>
Convention: use np alias</b>

In [None]:
import numpy as np

<img src="https://www.oreilly.com/library/view/elegant-scipy/9781491922927/assets/elsp_0105.png" alt="data structures" width="500">

<b>NumPy basics</b>

Arrays are designed to:
* handle vectorized operations, which lists do not
    * if you apply a function, it is performed on every item in the array rather than on the array object as a whole
* store multiple items <b>of the same data type</b>
* have 0-based indexing

* Missing values can be represented with `np.nan`
    * `np.inf` represents infinity
* The size of an array cannot be changed in place; to resize, create a new array
* An equivalent NumPy array occupies much less space than a Python list of lists
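
The properties listed above can be checked with a short sketch. The variable names (`values`, `mixed`, `with_missing`, `grown`) are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0])

# Vectorized: the operation is applied to every element
doubled = values * 2                  # array([2., 4., 6.])

# A list is repeated instead of having each element multiplied
doubled_list = [1.0, 2.0, 3.0] * 2    # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

# One data type per array: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                    # float64

# np.nan marks a missing value; nan-aware functions skip it
with_missing = np.array([1.0, np.nan, 3.0])
print(np.isnan(with_missing))         # [False  True False]
print(np.nansum(with_missing))        # 4.0

# "Growing" an array returns a new array; the original is unchanged
grown = np.append(values, 4.0)
print(values.size, grown.size)        # 3 4
```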

<b>Create Array</b><br>
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.creation.html

In [1]:
# Build array from Python list
vector = np.array([1,2,3])
vector

array([1, 2, 3])

In [6]:
# matrix with zeros 
np.zeros((3,4), dtype = int)

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [5]:
# matrix with 1s
np.ones((3,4), dtype=int)

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

In [7]:
# matrix with a constant value
value = 20
np.full((3,4), value)

array([[20, 20, 20, 20],
       [20, 20, 20, 20],
       [20, 20, 20, 20]])

In [8]:
# Create a 4x4 identity matrix
np.eye(4)        

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [10]:
# arange - numpy range
np.arange(10, 30, 2)

array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

In [13]:
# evenly spaced numbers over a specified interval
ev_array = np.linspace(1, 10, 20)
print(ev_array)
ev_array.shape

[ 1.          1.47368421  1.94736842  2.42105263  2.89473684  3.36842105
  3.84210526  4.31578947  4.78947368  5.26315789  5.73684211  6.21052632
  6.68421053  7.15789474  7.63157895  8.10526316  8.57894737  9.05263158
  9.52631579 10.        ]


(20,)

<b>Random data</b><br>
https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html

In [14]:
# Create an array filled with random values
np.random.random((3,4))        

array([[0.06653214, 0.48323365, 0.55671644, 0.2925122 ],
       [0.91583343, 0.54295365, 0.12030665, 0.94693428],
       [0.80817943, 0.42028615, 0.54841892, 0.33795278]])

In [22]:
# Create an array filled with random values from the standard normal distribution
np.random.randn(3,4)    

array([[ 1.18962227, -1.69061683, -1.35639905, -1.23243451],
       [-0.54443916, -0.66817174,  0.00731456, -0.61293874],
       [ 1.29974807, -1.73309562, -0.9833101 ,  0.35750775]])

In [29]:
# Generate the same random numbers every time
# Set seed
np.random.seed(10)

In [30]:
np.random.randn(3,4)

array([[ 1.3315865 ,  0.71527897, -1.54540029, -0.00838385],
       [ 0.62133597, -0.72008556,  0.26551159,  0.10854853],
       [ 0.00429143, -0.17460021,  0.43302619,  1.20303737]])

In [46]:
np.random.seed(10)
print(np.random.randn(3,4))

print(np.random.randn(3,4))

np.random.seed(100)
print(np.random.randn(3,4))
                

[[ 1.3315865   0.71527897 -1.54540029 -0.00838385]
 [ 0.62133597 -0.72008556  0.26551159  0.10854853]
 [ 0.00429143 -0.17460021  0.43302619  1.20303737]]
[[-0.96506567  1.02827408  0.22863013  0.44513761]
 [-1.13660221  0.13513688  1.484537   -1.07980489]
 [-1.97772828 -1.7433723   0.26607016  2.38496733]]
[[-1.74976547  0.3426804   1.1530358  -0.25243604]
 [ 0.98132079  0.51421884  0.22117967 -1.07004333]
 [-0.18949583  0.25500144 -0.45802699  0.43516349]]


```python
# Create the random state
rs = np.random.RandomState(100)
```
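
A `RandomState` object carries its own seed, so it can be used without touching the global seed set by `np.random.seed`. The following is a minimal sketch of that behaviour; the names `rs_a`, `rs_b`, `global_first` and `global_second` are assumptions made for the illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rs_a = np.random.RandomState(100)
rs_b = np.random.RandomState(100)

draw_a = rs_a.randn(3, 4)
draw_b = rs_b.randn(3, 4)
print(np.array_equal(draw_a, draw_b))                 # True

# The global seed and the RandomState object do not affect each other
np.random.seed(10)
global_first = np.random.randn(2)
np.random.seed(10)
rs_a.randn(5)                                         # draw from the private generator
global_second = np.random.randn(2)
print(np.array_equal(global_first, global_second))    # True
```

Recent NumPy versions also provide `np.random.default_rng(seed)`, which returns a `Generator` object that plays a similar role.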

<b>Basic array attributes:</b>
* shape: Array dimensions (number of elements along each axis)
* size: Number of elements in the array
* ndim: Number of array dimensions (`len(arr.shape)`)
* dtype: Data type of the array elements
* T: The transpose of the array
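
As a quick check of how these attributes relate to each other, here is a small sketch on a hypothetical 3-D array called `example` (not one of the arrays used in the cells below):

```python
import numpy as np

example = np.arange(24).reshape(2, 3, 4)   # hypothetical 3-D array

print(example.shape)     # (2, 3, 4)
print(example.size)      # 24
print(example.ndim)      # 3
print(example.T.shape)   # (4, 3, 2): the transpose reverses the axes

# ndim is the length of the shape tuple
assert example.ndim == len(example.shape)
# size is the product of the entries of shape
assert example.size == np.prod(example.shape)
```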

In [40]:
matrix = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
matrix

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [39]:
# let's check them out
# (this cell ran while matrix held the 3-D array assigned in cell In [37] below)
matrix.shape

(2, 3, 2)

In [33]:
matrix.size

12

In [37]:
matrix = np.array([[[1,2],[2,3],[3,4]],[[4,5],[4,6],[6,7]]])

In [38]:
matrix.ndim

3

In [35]:
matrix.dtype

dtype('int64')

In [41]:
matrix.T

array([[ 1,  4,  7, 10],
       [ 2,  5,  8, 11],
       [ 3,  6,  9, 12]])

In [42]:
matrix

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

<b>Reshaping</b>

In [47]:
matrix

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [48]:
# Reshaping
matrix_reshaped = matrix.reshape(2,6)
matrix_reshaped

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])
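
Two details of `reshape` are worth a sketch: one dimension can be given as `-1` so that NumPy infers it, and the total number of elements has to stay the same. The array `m` below is a stand-in with the same values as `matrix`.

```python
import numpy as np

m = np.arange(1, 13).reshape(4, 3)   # same values as the 4x3 matrix above

# -1 asks NumPy to infer that dimension from the total size
print(m.reshape(2, -1).shape)    # (2, 6)
print(m.reshape(-1).shape)       # (12,)

# The total number of elements must be preserved: 12 != 5 * 3
try:
    m.reshape(5, 3)
except ValueError as err:
    print("cannot reshape:", err)
```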

<b>Slicing/Indexing</b>

In [49]:
# List-like
matrix_reshaped[1][1]

8

In [50]:
matrix_reshaped[1,3]

10

In [51]:
matrix_reshaped[1,:3]

array([7, 8, 9])

In [53]:
matrix_reshaped[:2,:3]

array([[1, 2, 3],
       [7, 8, 9]])

In [55]:
# iterating ... let's print the elements of matrix_reshaped
nrows = matrix_reshaped.shape[0]
ncols = matrix_reshaped.shape[1]

for i in range(nrows):
    for j in range(ncols):
        print(matrix_reshaped[i,j])



1
2
3
4
5
6
7
8
9
10
11
12
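
The nested index loop above is one way to visit every element. A few alternatives are sketched below on `m`, a stand-in for `matrix_reshaped`; which one reads best depends on whether the indices are needed.

```python
import numpy as np

m = np.arange(1, 13).reshape(2, 6)   # stand-in for matrix_reshaped

# Iterating over a 2-D array yields its rows
for row in m:
    print(row)

# .flat visits every element in row-major order
elements = [int(x) for x in m.flat]

# np.ndenumerate yields (index, value) pairs
pairs = [(idx, int(val)) for idx, val in np.ndenumerate(m)]
print(pairs[0], pairs[-1])   # ((0, 0), 1) ((1, 5), 12)
```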


In [56]:
# Fun arrays
checkers_board = np.zeros((8,8),dtype=int)
checkers_board[1::2,::2] = 1
checkers_board[::2,1::2] = 1
print(checkers_board)

[[0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]]


Create a 2d array with 1 on the border and 0 inside

In [57]:
boarder_array = np.zeros((8,8),dtype=int)
boarder_array[0,:] = 1

boarder_array

array([[1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0]])

In [60]:
boarder_array = np.ones((8,8),dtype=int)
boarder_array[1:-1,1:-1] = 0
boarder_array

array([[1, 1, 1, 1, 1, 1, 1, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 1, 1, 1, 1, 1, 1, 1]])

In [67]:
boarder_array[:,-1]

array([1, 1, 1, 1, 1, 1, 1, 1])

<b>Performance</b>

Compare summing a Python list of one million integers with summing the equivalent NumPy array:

In [61]:
test_list = list(range(int(1e6)))
test_vector = np.array(test_list)

In [62]:
%%timeit
sum(test_list)

3.48 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [63]:
%%timeit
np.sum(test_vector)

459 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
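
The `%%timeit` magic only works inside IPython or Jupyter. In a plain script, a similar comparison can be sketched with the standard-library `timeit` module. The repetition count of 10 and the explicit `dtype=np.int64` are choices made for this sketch, and the absolute timings depend on the machine.

```python
import timeit
import numpy as np

test_list = list(range(int(1e6)))
# int64 is requested explicitly so the total cannot overflow on any platform
test_vector = np.array(test_list, dtype=np.int64)

# Both approaches give the same total
assert sum(test_list) == int(np.sum(test_vector))

# Time 10 calls of each approach
t_list = timeit.timeit(lambda: sum(test_list), number=10)
t_numpy = timeit.timeit(lambda: np.sum(test_vector), number=10)
print(f"list sum:  {t_list / 10 * 1e3:.2f} ms per call")
print(f"numpy sum: {t_numpy / 10 * 1e3:.2f} ms per call")
```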


https://numpy.org/devdocs/user/quickstart.html#universal-functions

<b>Matrix operations</b>

https://www.tutorialspoint.com/matrix-manipulation-in-python<br>
Arithmetic operators on arrays apply elementwise. <br> 
A new array is created and filled with the result.
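
A minimal sketch of elementwise arithmetic, using two small hypothetical arrays `a` and `b`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

total = a + b      # [[11, 22], [33, 44]]
product = a * b    # elementwise product: [[10, 40], [90, 160]]
squared = a ** 2   # [[1, 4], [9, 16]]

# The operands are unchanged; each result is a new array
print(np.shares_memory(a, total))   # False

# In-place operators modify the existing array instead
a += 1             # a is now [[2, 3], [4, 5]]
```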


<b>Array broadcasting</b><br>

https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html<br>
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. <br>
Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

<img src = "https://www.tutorialspoint.com/numpy/images/array.jpg" height=10/>


https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm
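
The compatibility rule can be illustrated with a short sketch: shapes are compared starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1. The arrays `grid`, `row` and `col` are hypothetical names used only here.

```python
import numpy as np

grid = np.zeros((4, 3))
row = np.array([1, 2, 3])                     # shape (3,)
col = np.array([1, 2, 3, 4]).reshape(4, 1)    # shape (4, 1)

print((grid + row).shape)   # (4, 3): row is repeated down the 4 rows
print((grid + col).shape)   # (4, 3): col is repeated across the 3 columns
print((row + col).shape)    # (4, 3): both operands are stretched

# (4, 3) and (4,) disagree on the last dimension (3 vs 4), so this fails
try:
    grid + np.array([1, 2, 3, 4])
except ValueError as err:
    print("not broadcastable:", err)
```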

In [68]:
matrix

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [71]:
np.array([1,2,3,4]).reshape(4,1)

array([[1],
       [2],
       [3],
       [4]])

In [72]:
matrix + np.array([1,2,3,4]).reshape(4,1)

array([[ 2,  3,  4],
       [ 6,  7,  8],
       [10, 11, 12],
       [14, 15, 16]])

In [74]:
matrix + np.array([1,2,3])

array([[ 2,  4,  6],
       [ 5,  7,  9],
       [ 8, 10, 12],
       [11, 13, 15]])

In [75]:
matrix * np.array([1,2,3,4]).reshape(4,1)

array([[ 1,  2,  3],
       [ 8, 10, 12],
       [21, 24, 27],
       [40, 44, 48]])

In [77]:
matrix2 = np.array([[1,2,3],[5,6,7],[1,1,1],[2,2,2]])
matrix2

array([[1, 2, 3],
       [5, 6, 7],
       [1, 1, 1],
       [2, 2, 2]])

In [78]:
matrix * matrix2

array([[ 1,  4,  9],
       [20, 30, 42],
       [ 7,  8,  9],
       [20, 22, 24]])

In [80]:
# matrix multiplication
matrix.dot(np.array([1,2,3]).reshape(3,1))

array([[14],
       [32],
       [50],
       [68]])

In [81]:
# matrix multiplication with the @ operator (Python 3.5+)
matrix@(np.array([1,2,3]).reshape(3,1))

array([[14],
       [32],
       [50],
       [68]])

In [82]:
# stacking arrays together
np.vstack((matrix,matrix2))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [ 1,  2,  3],
       [ 5,  6,  7],
       [ 1,  1,  1],
       [ 2,  2,  2]])

In [83]:
np.hstack((matrix,matrix2))

array([[ 1,  2,  3,  1,  2,  3],
       [ 4,  5,  6,  5,  6,  7],
       [ 7,  8,  9,  1,  1,  1],
       [10, 11, 12,  2,  2,  2]])

In [84]:
# splitting arrays 
np.vsplit(matrix,2)

[array([[1, 2, 3],
        [4, 5, 6]]), array([[ 7,  8,  9],
        [10, 11, 12]])]

In [85]:
np.hsplit(matrix,(2,3))

[array([[ 1,  2],
        [ 4,  5],
        [ 7,  8],
        [10, 11]]), array([[ 3],
        [ 6],
        [ 9],
        [12]]), array([], shape=(4, 0), dtype=int64)]

<b>Copy</b>

In [None]:
matrix

In [86]:
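# matrix_copy is just another name for the same array object (no copy is made)
# matrix_copy1 is a view (shallow copy): a new array object that looks at the same data
matrix_copy = matrix
matrix_copy1 = matrix.view()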
print(matrix_copy)
print(matrix_copy1)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [87]:
matrix_copy1[0,0] = 5

In [88]:
print(matrix)

print(matrix_copy)

print(matrix_copy1)

[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [89]:
# deep copy
matrix_copy2 = matrix.copy()
print(matrix_copy2)

[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [90]:
matrix_copy2[0,0] = 7

In [91]:
print(matrix)

print(matrix_copy)

print(matrix_copy1)

print(matrix_copy2)

[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 7  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
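
The difference between an alias, a view and a deep copy can also be checked programmatically. This sketch uses `np.shares_memory` on a small hypothetical array `original`:

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])

alias = original            # same object, no copy at all
view = original.view()      # new array object, same underlying data
deep = original.copy()      # new array object with its own data

print(alias is original)                  # True
print(view is original)                   # False
print(np.shares_memory(original, view))   # True
print(np.shares_memory(original, deep))   # False

view[0, 0] = 99     # visible through original and alias
deep[0, 1] = -1     # affects only the deep copy
print(original[0, 0], original[0, 1])     # 99 2
```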


<b>More matrix computation</b>

In [92]:
# conditional subsetting
matrix[(6 < matrix[:,0])]

array([[ 7,  8,  9],
       [10, 11, 12]])

In [93]:
matrix[(4 <= matrix[:,0]) & (matrix[:,0] <= 7)
       & (2 <= matrix[:,1]) & (matrix[:,1] <= 7),]

array([[5, 2, 3],
       [4, 5, 6]])

In [94]:
matrix

array([[ 5,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [95]:
# col mean
matrix.mean(axis = 0)

array([6.5, 6.5, 7.5])

In [96]:
# row mean
matrix.mean(axis = 1)

array([ 3.33333333,  5.        ,  8.        , 11.        ])

In [100]:
# unique values and counts
matrix = np.array([[ 5,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
uvals, counts = np.unique(matrix, return_counts=True)
print(uvals,counts)

[ 2  3  4  5  6  7  8  9 10 11 12] [1 1 1 2 1 1 1 1 1 1 1]


https://www.w3resource.com/python-exercises/numpy/index.php


Create a matrix of 5 rows and 6 columns with numbers from 1 to 30.
Add 2 to the odd values of the array.

In [115]:
matrix = np.arange(1,31).reshape(5,6)
matrix[matrix%2==1 ] +=  2 
matrix

array([[ 3,  2,  5,  4,  7,  6],
       [ 9,  8, 11, 10, 13, 12],
       [15, 14, 17, 16, 19, 18],
       [21, 20, 23, 22, 25, 24],
       [27, 26, 29, 28, 31, 30]])

Normalize the values in the matrix: subtract the mean and divide by the standard deviation.

In [120]:
mat_mean = np.mean(matrix)
mat_std = np.std(matrix)
matrix_norm = (matrix - mat_mean)/mat_std
matrix_norm

array([[-1.55971247, -1.67524673, -1.32864396, -1.44417822, -1.09757545,
        -1.2131097 ],
       [-0.86650693, -0.98204119, -0.63543842, -0.75097267, -0.4043699 ,
        -0.51990416],
       [-0.17330139, -0.28883564,  0.05776713, -0.05776713,  0.28883564,
         0.17330139],
       [ 0.51990416,  0.4043699 ,  0.75097267,  0.63543842,  0.98204119,
         0.86650693],
       [ 1.2131097 ,  1.09757545,  1.44417822,  1.32864396,  1.67524673,
         1.55971247]])
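
The cell above uses the mean and standard deviation of the whole matrix. A possible variation, sketched here under the assumption that each column should be normalized separately, passes `axis=0` and relies on broadcasting. The names `data`, `col_mean`, `col_std` and `col_norm` are illustrative.

```python
import numpy as np

data = np.arange(1, 31, dtype=float).reshape(5, 6)

# Statistics per column; each result has shape (6,) and broadcasts over the rows
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)
col_norm = (data - col_mean) / col_std

# Every column now has mean 0 and standard deviation 1 (up to rounding)
print(np.allclose(col_norm.mean(axis=0), 0))   # True
print(np.allclose(col_norm.std(axis=0), 1))    # True
```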

In [121]:
matrix

array([[ 3,  2,  5,  4,  7,  6],
       [ 9,  8, 11, 10, 13, 12],
       [15, 14, 17, 16, 19, 18],
       [21, 20, 23, 22, 25, 24],
       [27, 26, 29, 28, 31, 30]])

Create a random array (5 by 3) and compute: 
   * the sum of all elements 
   * the sum of the rows  
   * the sum of the columns

In [126]:
matrix = np.random.rand(5,3)
print(matrix)
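matrix.sum()   # sum of all elements
matrix.sum(1)  # sum of each row (axis=1)
matrix.sum(0)  # sum of each column (axis=0); only this last value is displayed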

[[0.21097842 0.36052525 0.54937526]
 [0.27183085 0.46060162 0.69616156]
 [0.5003559  0.71607099 0.52595594]
 [0.00139902 0.39470029 0.49216697]
 [0.40288033 0.3542983  0.50061432]]


array([1.38744452, 2.28619645, 2.76427405])

In [127]:
#Given a set of Gene Ontology (GO) terms and the genes that are associated with these terms find the gene 
#that is associated with the most GO terms

go_terms=np.array(["cellular response to nicotine",
                   "cellular response to hypoxia",
                   "cellular response to lipid"])
genes=np.array(["BAD","KCNJ11","MSX1","CASR","ZFP36L1"])

assoc_matrix = np.array([[1,1,0,1,0],[1,0,0,1,1],[1,0,0,0,0]])

print(assoc_matrix)

[[1 1 0 1 0]
 [1 0 0 1 1]
 [1 0 0 0 0]]


In [130]:
max(assoc_matrix.sum(0))

3

In [129]:
genes[0]

'BAD'
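
Instead of reading the index off the column sums by eye, the position of the maximum can be looked up with `np.argmax`. This is a sketch that redefines the arrays from the cell above so it runs on its own; `terms_per_gene` and `best` are names introduced here.

```python
import numpy as np

genes = np.array(["BAD", "KCNJ11", "MSX1", "CASR", "ZFP36L1"])
assoc_matrix = np.array([[1, 1, 0, 1, 0],
                         [1, 0, 0, 1, 1],
                         [1, 0, 0, 0, 0]])

# Number of GO terms per gene (sum down each column)
terms_per_gene = assoc_matrix.sum(axis=0)      # [3 1 0 2 1]

# argmax returns the position of the largest count (the first one on ties)
best = np.argmax(terms_per_gene)
print(genes[best], terms_per_gene[best])       # BAD 3
```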