### NumPy - Numerical Python <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/NumPy_logo.svg/1200px-NumPy_logo.svg.png" alt="NumPy logo" width = "100">

NumPy (`np` by convention) is the premier Python package for scientific computing.

https://numpy.org

Its power comes from the <b>N-dimensional array object</b>.

np is a *lower*-level numerical computing library.

This means that, while you can use it directly, much of its reach comes from the packages built on top of np:
* Pandas (*pan*el *da*ta)
* Scikit-learn (machine learning)
* Scikit-image (image processing)
* OpenCV (computer vision)
* more...

<b>Importing NumPy<br>
Convention: use np alias</b>

In [1]:
import numpy as np

<img src="https://www.oreilly.com/library/view/elegant-scipy/9781491922927/assets/elsp_0105.png" alt="data structures" width="500">

<b>NumPy basics</b>

Arrays are designed to:
* handle vectorized operations, which lists do not support
    * a function applied to an array is performed on every item in the array, rather than on the array object as a whole
* store multiple items <b>of the same data type</b>
* have 0-based indexing

* Missing values can be represented with `np.nan` (floating-point arrays only)
    * `np.inf` represents infinity
* An array's size is fixed: to grow or shrink it, create a new array
* A NumPy array occupies much less memory than an equivalent Python list of lists
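The properties listed above can be checked with a few lines. This is a minimal sketch; the variable names (`values`, `mixed`, `longer`) are illustrative and do not come from the rest of the notebook.

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])

# Vectorized: the multiplication is applied to every element
doubled = values * 2
print(doubled)

# One dtype per array: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)               # float64

# np.isnan locates the missing values
missing_mask = np.isnan(values)
print(missing_mask)

# np.nanmean computes the mean while skipping the missing values
print(np.nanmean(values))

# "Growing" an array builds a new one; the original keeps its size
longer = np.append(values, np.inf)
print(values.size, longer.size)  # 4 5
```

Note that `np.append` returns a new array each time it is called, so growing an array inside a loop is usually slower than building a list first and converting it once.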

<b>Create Array</b><br>
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.creation.html

In [6]:
# Build array from Python list
my_list = [1,2,3]
vector = np.array(my_list)
vector

array([1, 2, 3])

In [7]:
# matrix with zeros 
dimensions = (3,4)
np.zeros(dimensions)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [8]:
# matrix with 1s
np.ones((3,4), dtype=int)

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

In [9]:
# matrix with a constant value
value = 10
np.full((3,4), value)

array([[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]])

In [10]:
# Create a 4x4 identity matrix
np.eye(4)        

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [11]:
# arange - numpy range
np.arange(10, 30, 2)

array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

In [12]:
# evenly spaced numbers over a specified interval
np.linspace(1, 10, 20)

array([ 1.        ,  1.47368421,  1.94736842,  2.42105263,  2.89473684,
        3.36842105,  3.84210526,  4.31578947,  4.78947368,  5.26315789,
        5.73684211,  6.21052632,  6.68421053,  7.15789474,  7.63157895,
        8.10526316,  8.57894737,  9.05263158,  9.52631579, 10.        ])

In [13]:
# create array from a list of lists

list_array = [[1,2,3],[4,5,6]]
np.array(list_array)


array([[1, 2, 3],
       [4, 5, 6]])

<b>Random data</b><br>
https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html

In [14]:
# Create an array filled with random values
np.random.random((3,4))        

array([[0.43926119, 0.53169942, 0.14242743, 0.5625177 ],
       [0.70577487, 0.04839984, 0.73204806, 0.45749144],
       [0.76274122, 0.19205857, 0.6536739 , 0.65778395]])

In [18]:
# Create an array filled with random values from the standard normal distribution
np.random.randn(3,4)    

array([[ 0.55375447,  1.75813499,  1.34512361, -0.43734165],
       [-0.78668788, -0.61894156,  2.08241087,  0.89723289],
       [-1.24431676,  0.46540271, -1.02304119, -1.31352807]])

In [31]:
# Generate the same random numbers every time
# Set seed
np.random.seed(10)

In [32]:
np.random.randn(3,4)  

array([[ 1.3315865 ,  0.71527897, -1.54540029, -0.00838385],
       [ 0.62133597, -0.72008556,  0.26551159,  0.10854853],
       [ 0.00429143, -0.17460021,  0.43302619,  1.20303737]])

```python
# Create a separate, seeded random number generator (does not touch the global seed)
rs = np.random.RandomState(100)
```
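As a rough illustration of why a seeded generator is useful, the sketch below creates two `RandomState` objects with the same seed and checks that they produce identical draws. The last lines use `np.random.default_rng`, the generator interface available in NumPy 1.17 and later; the notebook itself does not use it, so treat that part as an optional alternative.

```python
import numpy as np

# Two generators created with the same seed produce the same stream
rs_a = np.random.RandomState(100)
rs_b = np.random.RandomState(100)
sample_a = rs_a.randn(3, 4)
sample_b = rs_b.randn(3, 4)
print(np.array_equal(sample_a, sample_b))  # True

# Alternative (NumPy >= 1.17): the Generator interface
rng = np.random.default_rng(100)
sample_c = rng.standard_normal((3, 4))
print(sample_c.shape)                      # (3, 4)
```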

<b>Basic array attributes:</b>
* shape: Size of the array along each dimension
* size: Number of elements in the array
* ndim: Number of array dimensions (len(arr.shape))
* dtype: Data-type of the array
* T: The transpose of the array
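A short check of how these attributes relate to each other, using a hypothetical 3-dimensional array `arr` that is not part of the notebook:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

print(arr.shape)                   # (2, 3, 4)
print(arr.size)                    # 24, the product of the shape entries
print(arr.ndim)                    # 3
print(arr.ndim == len(arr.shape))  # True
print(arr.T.shape)                 # (4, 3, 2), the axes are reversed
```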

In [33]:
matrix = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
matrix

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [34]:
# let's check them out 
matrix.shape

(4, 3)

In [35]:
matrix.size

12

In [36]:
matrix.ndim

2

In [37]:
matrix.dtype

dtype('int64')

In [38]:
matrix

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

<b>Reshaping</b>

In [39]:
matrix.T

array([[ 1,  4,  7, 10],
       [ 2,  5,  8, 11],
       [ 3,  6,  9, 12]])

In [40]:
matrix

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [43]:
# Reshaping
matrix_reshaped = matrix.reshape(2,6)
matrix_reshaped

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

<b>Slicing/Indexing</b>

In [42]:
# List-like
matrix_reshaped[1][1]

8

In [44]:
matrix_reshaped[1,3]

10

In [45]:
matrix_reshaped[1,:3]

array([7, 8, 9])

In [50]:
# all rows, columns 2 and 3 (the stop index 4 is excluded)
matrix_reshaped[:,2:4]

array([[ 3,  4],
       [ 9, 10]])

In [51]:
matrix_reshaped

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [60]:
# iterating ... let's print the elements of matrix_reshaped
rows = matrix_reshaped.shape[0]
cols = matrix_reshaped.shape[1]

#print(rows,cols)
for i in range(rows):
    for j in range(cols):
        print(matrix_reshaped[i,j])


1
2
3
4
5
6
7
8
9
10
11
12


In [61]:
# Fun arrays
checkers_board = np.zeros((8,8),dtype=int)
checkers_board[1::2,::2] = 1
checkers_board[::2,1::2] = 1
print(checkers_board)

[[0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]]


In [63]:
checkers_board[2,2] = 5
checkers_board

array([[0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 5, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0]])

Create a 2D (8 by 8) array with 1s on the border and 0s inside.

In [66]:
border_array = np.zeros((8,8),dtype=int)
border_array[0,:] = 1
border_array

array([[1, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0]])

In [67]:
border_array = np.ones((8,8),dtype=int)
border_array[1:-1, 1:-1] = 0
border_array

array([[1, 1, 1, 1, 1, 1, 1, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 0, 0, 0, 0, 0, 0, 1],
       [1, 1, 1, 1, 1, 1, 1, 1]])

<b>Performance</b>

In [70]:
test_list = list(range(int(1e6)))

test_vector = np.array(test_list)

In [72]:
%%timeit
sum(test_list)

3.56 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [73]:
%%timeit
np.sum(test_vector)

420 µs ± 4.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
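`%%timeit` is an IPython cell magic, so it only works inside a notebook or IPython session. A rough equivalent for a plain script, using the standard-library `timeit` module, might look like the sketch below. The absolute timings depend on the machine, so no expected values are shown; `dtype=np.int64` is set explicitly here (an addition to the original cell) so the sum cannot overflow on platforms where the default integer is 32-bit.

```python
import timeit
import numpy as np

test_list = list(range(int(1e6)))
test_vector = np.array(test_list, dtype=np.int64)

# Both approaches must give the same total
list_total = sum(test_list)
vector_total = int(np.sum(test_vector))
print(list_total == vector_total)

# Time 10 calls of each and report the average per call
n_calls = 10
list_time = timeit.timeit(lambda: sum(test_list), number=n_calls)
vector_time = timeit.timeit(lambda: np.sum(test_vector), number=n_calls)
print(f"list:  {list_time / n_calls * 1e3:.3f} ms per call")
print(f"array: {vector_time / n_calls * 1e3:.3f} ms per call")
```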


https://numpy.org/devdocs/user/quickstart.html#universal-functions

<b>Matrix operations</b>

https://www.tutorialspoint.com/matrix-manipulation-in-python<br>
Arithmetic operators on arrays apply elementwise. <br> 
A new array is created and filled with the result.
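A minimal sketch of both statements, with two small illustrative arrays `a` and `b`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

total = a + b      # elementwise sum
product = a * b    # elementwise product, not the matrix product

print(total)       # [[11 22] [33 44]]
print(product)     # [[ 10  40] [ 90 160]]

# The result is a new array; the operands are left untouched
print(total is a)  # False
print(a)           # [[1 2] [3 4]]
```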


<b>Array broadcasting</b><br>

https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html<br>
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. <br>
Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

<img src = "https://www.tutorialspoint.com/numpy/images/array.jpg" height=10/>


https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm
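The sketch below illustrates the compatibility rule on small, made-up arrays: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.

```python
import numpy as np

grid = np.zeros((4, 3), dtype=int)
row = np.array([1, 2, 3])                       # shape (3,)
col = np.array([10, 20, 30, 40]).reshape(4, 1)  # shape (4, 1)

# (4, 3) + (3,): the row is repeated for each of the 4 rows
plus_row = grid + row
print(plus_row)

# (4, 3) + (4, 1): the column is repeated across the 3 columns
plus_col = grid + col
print(plus_col)

# (3,) + (4, 1): both operands are stretched to (4, 3)
outer_shape = (row + col).shape
print(outer_shape)                              # (4, 3)

# (4, 3) + (4,): the last dimensions are 3 and 4, so broadcasting fails
try:
    grid + np.array([1, 2, 3, 4])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)
```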

In [74]:
matrix

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [75]:
matrix + np.array([1,2,3,4]).reshape(4,1)

array([[ 2,  3,  4],
       [ 6,  7,  8],
       [10, 11, 12],
       [14, 15, 16]])

In [76]:
matrix * np.array([1,2,3,4]).reshape(4,1)

array([[ 1,  2,  3],
       [ 8, 10, 12],
       [21, 24, 27],
       [40, 44, 48]])

In [112]:
np.array([1,2,3,4]).reshape(1,4).T

array([[1],
       [2],
       [3],
       [4]])

In [77]:
matrix2 = np.array([[1,2,3],[5,6,7],[1,1,1],[2,2,2]])

In [78]:
matrix * matrix2

array([[ 1,  4,  9],
       [20, 30, 42],
       [ 7,  8,  9],
       [20, 22, 24]])

In [82]:
# matrix multiplication
matrix.dot(np.array([1,2,3]).reshape(3,1))

array([[14],
       [32],
       [50],
       [68]])

In [83]:
# matrix multiplication with the @ operator (Python 3.5+)
matrix@(np.array([1,2,3]).reshape(3,1))

array([[14],
       [32],
       [50],
       [68]])

In [84]:
# stacking arrays together
np.vstack((matrix,matrix2))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [ 1,  2,  3],
       [ 5,  6,  7],
       [ 1,  1,  1],
       [ 2,  2,  2]])

In [85]:
np.hstack((matrix,matrix2))

array([[ 1,  2,  3,  1,  2,  3],
       [ 4,  5,  6,  5,  6,  7],
       [ 7,  8,  9,  1,  1,  1],
       [10, 11, 12,  2,  2,  2]])

In [89]:
# splitting arrays 
np.vsplit(matrix,2)

[array([[1, 2, 3],
        [4, 5, 6]]), array([[ 7,  8,  9],
        [10, 11, 12]])]

In [90]:
np.hsplit(matrix,(2,3))

[array([[ 1,  2],
        [ 4,  5],
        [ 7,  8],
        [10, 11]]), array([[ 3],
        [ 6],
        [ 9],
        [12]]), array([], shape=(4, 0), dtype=int64)]

<b>Copy</b>

In [91]:
matrix

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [92]:
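# plain assignment makes no copy: matrix_copy is just another name for the same array
matrix_copy = matrix
# shallow copy: view() creates a new array object that looks at the same data
matrix_copy1 = matrix.view()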
print(matrix_copy)
print(matrix_copy1)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [94]:
matrix_copy1[0,0] = 5
print(matrix_copy)
print(matrix_copy1)
print(matrix)



[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [96]:
# deep copy
matrix_copy2 = matrix.copy()
print(matrix_copy2)

[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [97]:
matrix_copy2[0,0] = 77
print(matrix_copy)
print(matrix_copy1)
print(matrix)
print(matrix_copy2)



[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[77  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [98]:
matrix

array([[ 5,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
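One way to tell these cases apart without modifying any data is `np.shares_memory`. The sketch below uses a fresh array `original` (an illustrative name) so it does not depend on the state of `matrix`:

```python
import numpy as np

original = np.arange(6).reshape(2, 3)
alias = original          # same object under a second name
view = original.view()    # new array object, same underlying data
deep = original.copy()    # new array object with its own data

print(alias is original)                 # True
print(view is original)                  # False
print(np.shares_memory(view, original))  # True
print(np.shares_memory(deep, original))  # False

# Basic slices are views too, so writing to a slice changes the original
first_row = original[0, :]
first_row[0] = 99
print(original[0, 0])                    # 99
print(deep[0, 0])                        # 0
```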

<b>More matrix computation</b>

In [99]:
# conditional subsetting: keep the rows whose first column is greater than 4
matrix[(4 < matrix[:,0])]

array([[ 5,  2,  3],
       [ 7,  8,  9],
       [10, 11, 12]])

In [100]:
matrix[(1 <= matrix[:,0]) & (matrix[:,0] <= 6)
       & (2 <= matrix[:,1]) & (matrix[:,1] <= 7),]

array([[5, 2, 3],
       [4, 5, 6]])

In [101]:
# column means: axis=0 averages over the rows, giving one value per column
matrix.mean(axis = 0)

array([6.5, 6.5, 7.5])
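The `axis` argument names the dimension that is collapsed by the reduction. A small sketch on a fresh 4 by 3 array `m` (an illustrative name, without the modified `[0,0]` entry):

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# axis=0 collapses the rows: one mean per column
col_means = m.mean(axis=0)
print(col_means)   # [5.5 6.5 7.5]

# axis=1 collapses the columns: one mean per row
row_means = m.mean(axis=1)
print(row_means)   # [ 2.  5.  8. 11.]

# no axis: a single mean over all elements
overall_mean = m.mean()
print(overall_mean)  # 6.5
```

The same convention applies to other reductions such as `sum`, `min`, `max`, and `std`.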

In [102]:
dir(matrix)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__

In [107]:
# unique values and counts
matrix = np.random.random((3,4))
uvals, counts = np.unique(matrix, return_counts=True)

# rebuild the integer matrix with a repeated value (5 appears twice)
matrix = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
matrix[0,0] = 5
print(matrix)

uvals, counts = np.unique(matrix, return_counts=True)
print(uvals, counts)

[[ 5  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[ 2  3  4  5  6  7  8  9 10 11 12] [1 1 1 2 1 1 1 1 1 1 1]


https://www.w3resource.com/python-exercises/numpy/index.php


Create a matrix of 5 rows and 3 columns with numbers from 1 to 30.
Add 2 to the odd values of the array.

Normalize the values in the matrix: subtract the mean and divide by the standard deviation.

Create a random array (5 by 3) and compute: 
   * the sum of all elements 
   * the sum of the rows  
   * the sum of the columns

In [None]:
# Given a set of Gene Ontology (GO) terms and the genes associated with these terms,
# find the gene that is associated with the most GO terms

go_terms=np.array(["cellular response to nicotine",
                   "cellular response to hypoxia",
                   "cellular response to lipid"])
genes=np.array(["BAD","KCNJ11","MSX1","CASR","ZFP36L1"])

assoc_matrix = np.array([[1,1,0,1,0],[1,0,0,1,1],[1,0,0,0,0]])

print(assoc_matrix)
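One possible approach to this exercise is sketched below. It assumes that each row of `assoc_matrix` corresponds to a GO term and each column to a gene, which matches the 3 by 5 shape but is not stated explicitly; `terms_per_gene` and `best_gene` are names introduced here.

```python
import numpy as np

go_terms = np.array(["cellular response to nicotine",
                     "cellular response to hypoxia",
                     "cellular response to lipid"])
genes = np.array(["BAD", "KCNJ11", "MSX1", "CASR", "ZFP36L1"])
assoc_matrix = np.array([[1, 1, 0, 1, 0], [1, 0, 0, 1, 1], [1, 0, 0, 0, 0]])

# Assumption: rows are GO terms, columns are genes.
# Summing over the rows (axis=0) counts the GO terms per gene.
terms_per_gene = assoc_matrix.sum(axis=0)
print(terms_per_gene)  # [3 1 0 2 1]

# argmax returns the index of the first largest count
best_gene = genes[np.argmax(terms_per_gene)]
print(best_gene)       # BAD
```

If several genes were tied for the highest count, `np.argmax` would return only the first; `genes[terms_per_gene == terms_per_gene.max()]` would return all of them.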