# NumPy

NumPy (Numerical Python) is a foundational package on which many of the most common data science packages are built. It provides high-performance multi-dimensional arrays which we can use as vectors or matrices.

The key features of numpy are:
* **ndarrays:** n-dimensional arrays of a single data type which are fast and space-efficient. ndarrays have a number of built-in methods which allow rapid processing of data without explicit loops, e.g. computing the mean.
* **Broadcasting:** A set of rules which defines how operations behave between arrays of different shapes.
* **Vectorization:** Enables numeric operations on whole ndarrays at once, without writing explicit loops.
* **Input/Output:** Simplifies reading and writing of data from/to file.
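
As a rough illustration of the ndarray and vectorization points, the sketch below computes a mean and doubles every value, once with plain Python and once with ndarray operations. The variable names and data are made up for this example.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]          # plain Python list (illustrative data)
arr = np.array(values)                 # the same data as an ndarray

# Plain Python: visit each element explicitly
loop_mean = sum(values) / len(values)
loop_doubled = [v * 2 for v in values]

# ndarray: no explicit loop is written
vec_mean = arr.mean()                  # built-in method
vec_doubled = arr * 2                  # multiplication is applied element-wise

print(loop_mean, vec_mean)
print(loop_doubled, vec_doubled)
```

Both approaches give the same numbers; the ndarray version is shorter and, for large arrays, usually much faster.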

A Rank 1 ndarray is simply a one-dimensional array, i.e. a vector.

How to create Rank 1 numpy arrays:

In [9]:
import numpy as np
# array function is called to return the ndarray object
an_array = np.array([3, 33, 333]) # create Rank 1 array
print(type(an_array))             # the type of an ndarray is: "<class 'numpy.ndarray'>"

<class 'numpy.ndarray'>


In [10]:
# test the shape of the array just created
# it should have just one dimension (Rank 1) with 3 elements.
print(an_array.shape)

(3,)


In [11]:
# because this is a 1-rank array
# only one index is needed to access each element
print(an_array[0], an_array[1], an_array[2])

3 33 333


In [12]:
an_array[0] = 888   # ndarrays are mutable, but the dtype is fixed: an integer array cannot hold a string such as 'abc'
print(an_array)

[888  33 333]


How to create a Rank 2 numpy array:

A Rank 2 ndarray is one with two dimensions. Notice the format below of [ [row], [row] ]. Two-dimensional arrays are great for representing matrices, which are often used in data science.

In [13]:
# create Rank 2 array
another = np.array([[11, 12, 13], [21, 22, 23]]) 
# print array
print(another)

# rows x columns: 2 rows and 3 columns
print("The shape is 2 rows, 3 columns: ", another.shape)
# When indexing with two values, we give the row first and then the column.
print("Accessing elements [0,0], [0,1], and [1,0] of the ndarray: ", another[0,0], ", ",another[0, 1],", ", another[1, 0])

[[11 12 13]
 [21 22 23]]
The shape is 2 rows, 3 columns:  (2, 3)
Accessing elements [0,0], [0,1], and [1,0] of the ndarray:  11 ,  12 ,  21


### There are many ways to create numpy arrays:

Here we create several arrays with different shapes and different pre-filled values. NumPy has a number of built-in functions which help us quickly and easily create multidimensional arrays.

In [14]:
# create a 2x2 array of zeros
ex1 = np.zeros((2,2))
print(ex1)

[[0. 0.]
 [0. 0.]]


In [15]:
# create a 2x2 array filled with 9.0
ex2 = np.full((2,2), 9.0)
print(ex2)

[[9. 9.]
 [9. 9.]]


In [16]:
# create a 2x2 matrix with the diagonal 1s and the others 0
ex3 = np.eye(2,2)
print(ex3)

[[1. 0.]
 [0. 1.]]


In [17]:
# create an array of ones
ex4 = np.ones((1,2))
print(ex4)

[[1. 1.]]


In [18]:
# notice that ex4 is a Rank 2 ndarray
# notice:
# the shape of a Rank 1 ndarray gives the number of elements in the array.
# the shape of a Rank 2 ndarray gives the number of rows and columns in the array
print(ex4.shape)

# this means we have to use 2 indexes to access an element
print()
print(ex4[0,1])


(1, 2)

1.0


You can create random arrays by calling the random function in ***np.random***. By specifying a size, you get back a matrix filled with random values. This is particularly useful for algorithms which need a random initial state to get started.

In [19]:
# create an array of random floats between 0 and 1
ex5 = np.random.random((2,2))
print(ex5)

[[0.34709044 0.03966463]
 [0.21690581 0.01533526]]
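
If you need the same "random" values on every run (for example, to make an experiment repeatable), one option is to seed a generator. This is a minimal sketch, assuming NumPy 1.17 or later, where `np.random.default_rng` is available; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.random((2, 2))    # floats in the half-open interval [0, 1)
second = rng_b.random((2, 2))

print(first)
print(np.array_equal(first, second))   # the two arrays are identical
```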


## Array Indexing

### Slice Indexing:

In [20]:
# Rank 2 array of shape (3,4)
# Remember the shape means 3 rows and 4 columns
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


Use array slicing to get a 2 x 2 subarray consisting of the first 2 rows and the columns at index 1 and 2

In [21]:
# By default a slice is a view of an_array, not a copy.
# If you want the slice to be a copy instead, you can use
# np.array(an_array[:2, 1:3]) - this creates a copy of a portion
# of the ndarray and assigns it to a_slice. That way, you do not
# modify the underlying array when you change a_slice.
a_slice = an_array[:2, 1:3]
# a_slice now has its own indices which are different from an_array
print(a_slice)

[[12 13]
 [22 23]]


When you modify a slice, you actually modify the underlying array

In [22]:
print("Before:", an_array[0, 1]) # Inspect the element at 0,1

# a_slice[0, 0] is the same piece of data as an_array[0, 1]
a_slice[0, 0] = 1000

# It's easy to forget that slices are just references to the 
# same underlying data as the original array.
print("After:", an_array[0, 1])

Before: 12
After: 1000
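
To make the view/copy distinction concrete, here is a small sketch that checks whether two arrays share memory. The array and variable names are illustrative only.

```python
import numpy as np

base = np.array([[11, 12, 13], [21, 22, 23]])

view = base[:, 1:]            # basic slicing returns a view
copy = base[:, 1:].copy()     # .copy() allocates independent storage

print(np.shares_memory(base, view))   # True
print(np.shares_memory(base, copy))   # False

copy[0, 0] = -1               # does not touch base
view[0, 0] = 1000             # writes through to base[0, 1]
print(base)
```

Use `.copy()` (or `np.array(...)` around the slice) whenever the slice will be modified and the original must stay unchanged.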


## Use both integer indexing and slice indexing

We can use combinations of integer and slice indexing to create different shaped matrices

In [23]:
# Create a Rank 2 array of shape (3, 4)
an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [24]:
# Using both integer indexing and slicing generates an 
# array of lower rank.
# Using : means all columns.
# an_array[1, :] - means elements in the second row and
# : means include all the columns in that row.
# Using a single integer index, you get back a Rank 1 ndarray.
row_rank1 = an_array[1, :]   # Rank 1 view
print(row_rank1, row_rank1.shape)   # Notice only a single []

[21 22 23 24] (4,)


In [25]:
# Slicing alone: generates an array of the same rank as the an_array
row_rank2 = an_array[1:2, :]  # Rank 2 view
print(row_rank2, row_rank2.shape) # Notice the [] []

[[21 22 23 24]] (1, 4)


In [26]:
# Same thing for columns of an array:

print()
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]

print(col_rank1, col_rank1.shape) # Rank 1
print()
print(col_rank2, col_rank2.shape) # Rank 2


[12 22 32] (3,)

[[12]
 [22]
 [32]] (3, 1)


## Array indexing for changing elements:

It's useful to use arrays of indexes to access or change elements.

In [27]:
# Create a new array
an_array = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33], [41, 42, 43]])
print("Original Array:")
print(an_array)

Original Array:
[[11 12 13]
 [21 22 23]
 [31 32 33]
 [41 42 43]]


In [28]:
# Create an array of indices
# \n prints a blank line before the text that follows it
col_indices = np.array([0, 1, 2, 0])
print('\nCol indices picked: ', col_indices)

# np.arange returns evenly spaced values within a given interval [start,stop)
row_indices = np.arange(4)
print('\nRows indices picked: ', row_indices)


Col indices picked:  [0 1 2 0]

Rows indices picked:  [0 1 2 3]


In [29]:
# Examine the pairings of row_indices and col_indices.
# These are the elements we'll change next
for row, col in zip(row_indices, col_indices):
    print(row, ", ",col)

0 ,  0
1 ,  1
2 ,  2
3 ,  0


In [30]:
# Select one element from each row
print('Values in the array at those indices: ' ,
      an_array[row_indices, col_indices])

Values in the array at those indices:  [11 22 33 41]


In [31]:
# Change one element from each row using the indices selected
an_array[row_indices, col_indices] += 100000

print('\nChanged Array:')
print(an_array)


Changed Array:
[[100011     12     13]
 [    21 100022     23]
 [    31     32 100033]
 [100041     42     43]]
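
Unlike slicing, indexing with integer arrays returns a copy when it is used to read values, while assigning through the same index expression still modifies the original. A short sketch with made-up data:

```python
import numpy as np

grid = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33]])
rows = np.array([0, 1, 2])
cols = np.array([2, 1, 0])

picked = grid[rows, cols]     # integer-array indexing returns a copy
picked[0] = -1                # so this leaves grid unchanged
print(grid[0, 2])             # 13

grid[rows, cols] = 0          # assigning through the index does modify grid
print(grid)
```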


## Boolean Indexing

Boolean indexing for selecting and changing elements:

In [32]:
# Create a 3x2 array
an_array = np.array([[11, 12], [21, 22], [31, 32]])
print(an_array)

[[11 12]
 [21 22]
 [31 32]]


In [33]:
# Create a filter: an array of boolean values recording
# whether each element meets this condition
filter = (an_array > 15)
filter

array([[False, False],
       [ True,  True],
       [ True,  True]])

Notice that the filter is an ndarray of the same shape as an_array, filled with True wherever the corresponding element in an_array is greater than 15 and False wherever it is 15 or less.

In [34]:
# show the elements which meet that criteria
print(an_array[filter])

[21 22 31 32]


In [35]:
# For short, we could have just used the approach 
# below without the need for the separate filter array

an_array[(an_array % 2 == 0)]

array([12, 22, 32])

We can change elements in the array applying a similar logical filter. Let's add 100 to all the even values.

In [36]:
an_array[an_array % 2 == 0] +=100
print(an_array)

[[ 11 112]
 [ 21 122]
 [ 31 132]]
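
Several conditions can be combined into one filter. The sketch below uses the element-wise operators `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. The data is illustrative.

```python
import numpy as np

data = np.array([[11, 12], [21, 22], [31, 32]])

# even AND greater than 15
even_and_large = (data % 2 == 0) & (data > 15)
print(data[even_and_large])    # [22 32]

# odd OR less than 15
odd_or_small = (data % 2 == 1) | (data < 15)
print(data[odd_or_small])      # [11 12 21 31]
```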


# Datatypes and Array Operations

### Datatypes:

In [37]:
ex1 = np.array([11, 12])  # NumPy infers the datatype
print(ex1.dtype)

int64


In [38]:
ex2 = np.array([11.0, 12.0]) # NumPy infers the datatype
print(ex2.dtype)

float64


In [39]:
ex3 = np.array([11, 21], 
               dtype=np.int64) # the datatype is specified explicitly
print(ex3.dtype)

int64


In [40]:
# Specifying an integer dtype forces floats into integers; the fractional part is dropped
ex4 = np.array([11.1, 12.7], dtype=np.int64)
print(ex4.dtype)
print()
print(ex4)

int64

[11 12]


In [41]:
# Specifying a float dtype forces integers into floats,
# which is useful if you anticipate the values may change to floats later
ex5 = np.array([11, 21], dtype=np.float64)
print(ex5.dtype)
print()
print(ex5)

float64

[11. 21.]
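
Converting floats to an integer dtype drops the fractional part. When a specific rounding rule matters, it can be applied explicitly before converting. A small sketch with made-up values, using `astype` to convert an existing array:

```python
import numpy as np

floats = np.array([11.1, 12.7, -1.5])

truncated = floats.astype(np.int64)            # drops the fraction (toward zero)
floored = np.floor(floats).astype(np.int64)    # always rounds down
nearest = np.rint(floats).astype(np.int64)     # nearest integer, ties go to even

print(truncated)
print(floored)
print(nearest)
```

Note how the negative value distinguishes the three: truncation gives -1, while flooring and rounding to nearest both give -2.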


## Arithmetic Array Operations:

In [42]:
x = np.array([[111, 112], [121, 122]], dtype=np.int64)  # np.int was removed in NumPy 1.24
y = np.array([[211.1, 212.1],[221.1,222.1]], dtype=np.float64)

print(x)
print()
print(y)

[[111 112]
 [121 122]]

[[211.1 212.1]
 [221.1 222.1]]


In [43]:
print(np.add(x,y))

[[322.1 324.1]
 [342.1 344.1]]


In [44]:
print(np.subtract(x,y))

[[-100.1 -100.1]
 [-100.1 -100.1]]


In [45]:
print(np.multiply(x,y))

[[23432.1 23755.2]
 [26753.1 27096.2]]


In [46]:
print(np.divide(x,y))

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]


In [47]:
print(np.sqrt(x))

[[10.53565375 10.58300524]
 [11.         11.04536102]]


In [48]:
print(np.exp(x))

[[1.60948707e+48 4.37503945e+48]
 [3.54513118e+52 9.63666567e+52]]


## Statistical Methods, Sorting And Set Operations

### Basic Statistical Operations:

In [49]:
# set up a random 2 x 5 matrix
arr = 10 * np.random.randn(2,5)
print(arr)

[[ -2.83198548  10.81941955   6.17868867  -7.40099684   5.36407362]
 [-17.8009848  -17.9081805   -9.88741145  -9.71201804   6.26661931]]


In [50]:
# compute the mean for all elements
print(arr.mean())

-3.69127759548802


In [51]:
# compute the means by row
print(arr.mean(axis = 1))

[ 2.4258399 -9.8083951]


In [52]:
# compute the means by columns
print(arr.mean(axis = 0))

[-10.31648514  -3.54438047  -1.85436139  -8.55650744   5.81534646]


In [53]:
# sum of all the elements
print(arr.sum())

-36.9127759548802


In [54]:
# median values for each row
print(np.median(arr, axis = 1))

[ 5.36407362 -9.88741145]


### Sorting:

In [55]:
# create a 10 element array of randoms
unsorted = np.random.randn(10)
print(unsorted)

[ 1.13179356  1.14075735 -0.62664844  0.96081023  0.31178778  1.05512899
  0.99549677  2.03655643 -0.70079596 -1.19221676]


In [56]:
# create a copy
# by doing this, we ensure that the original array is still intact.
sorted = np.array(unsorted)
# sort the copy
sorted.sort()

print(sorted)
print()
print(unsorted)

[-1.19221676 -0.70079596 -0.62664844  0.31178778  0.96081023  0.99549677
  1.05512899  1.13179356  1.14075735  2.03655643]

[ 1.13179356  1.14075735 -0.62664844  0.96081023  0.31178778  1.05512899
  0.99549677  2.03655643 -0.70079596 -1.19221676]


In [57]:
# inplace sorting 
unsorted.sort()

print(unsorted)

[-1.19221676 -0.70079596 -0.62664844  0.31178778  0.96081023  0.99549677
  1.05512899  1.13179356  1.14075735  2.03655643]


### Finding unique elements:

In [58]:
array = np.array([1,2,1,4,2,1,4,2])

print(np.unique(array))

[1 2 4]


## Set operations with np.array data type:

In [59]:
s1 = np.array(['desk', 'chair', 'bulb'])
s2 = np.array(['lamp', 'bulb', 'chair'])
print(s1,s2)

['desk' 'chair' 'bulb'] ['lamp' 'bulb' 'chair']


In [60]:
# use 1d because intersect expects 1d arrays.

print( np.intersect1d(s1,s2) )  # elements both in s1 and s2

['bulb' 'chair']


In [61]:
print( np.union1d(s1,s2) )   # elements in either s1 or s2

['bulb' 'chair' 'desk' 'lamp']


In [62]:
print( np.setdiff1d(s1,s2) ) # elements in s1 that are not in s2

['desk']


In [63]:
print( np.isin(s1,s2) ) # for each element of s1, whether it is in s2 (np.in1d is deprecated since NumPy 2.0)

[False  True  True]


## Broadcasting:

In [64]:
start = np.zeros((4,3))
print(start)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [65]:
# create a Rank 1 ndarray with 3 values
add_rows = np.array([1, 0, 2])
print(add_rows)

[1 0 2]


In [66]:
# Add to each row of 'start' using broadcasting
y = start + add_rows
print(y)

[[1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]]


In [67]:
# create an ndarray which is 4 x 1 to broadcast across columns
add_cols = np.array([[0, 1, 2, 3]])
add_cols = add_cols.T # Transposes

print(add_cols)

[[0]
 [1]
 [2]
 [3]]


In [68]:
# add to each column of 'start' using broadcasting
y = start + add_cols
print(y)

[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]]


In [69]:
# this will just broadcast in both dimensions
add_scalar = np.array([1])
print(start + add_scalar)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


Examples from the slides:

In [70]:
# create a 3 x 4 matrix
arrA = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(arrA)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [71]:
arrB = [0,1,0,2]
print(arrB)

[0, 1, 0, 2]


In [72]:
# add the two using broadcasting
print(arrA + arrB)

[[ 1  3  3  6]
 [ 5  7  7 10]
 [ 9 11 11 14]]
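
Broadcasting compares shapes from the last dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The sketch below shows two shapes that broadcast against a 4 x 3 array and one that does not. The `broadcast_failed` flag is only there to make the failure visible.

```python
import numpy as np

a = np.zeros((4, 3))

print((a + np.ones(3)).shape)        # (4, 3): shape (3,) is treated as (1, 3)
print((a + np.ones((4, 1))).shape)   # (4, 3): the size-1 axis is stretched

broadcast_failed = False
try:
    a + np.ones(4)                   # (4, 3) vs (4,): trailing sizes 3 and 4 differ
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)
```

This is why a 4-element column has to be given the shape (4, 1) before it can be added to each column of a 4 x 3 array.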


## Speedtest: ndarrays vs lists

First, set up the parameters for the speed test. We'll be testing the time taken to sum the elements of an ndarray vs a list.

In [73]:
from numpy import arange
from timeit import Timer
size    = 1000000
timeits = 1000

In [74]:
# create the ndarray with values 0,1,2...,size-1
nd_array = arange(size)
print( type(nd_array) )

<class 'numpy.ndarray'>


In [75]:
# timer expects the operation as a parameter,
# here we pass nd_array.sum()
timer_numpy = Timer("nd_array.sum()", 
                    "from __main__ import nd_array")
print("Time taken by numpy ndarray: %f seconds" %
     (timer_numpy.timeit(timeits)/timeits))


Time taken by numpy ndarray: 0.002064 seconds


In [76]:
# create the list with values 0,1,2...,size-1
a_list = list(range(size))
print(type(a_list))

<class 'list'>


In [77]:
# timer expects the operation as a parameter, 
# here we pass sum(a_list)
timer_list = Timer("sum(a_list)", 
                   "from __main__ import a_list")
print("Time taken by list: %f seconds" %
     (timer_list.timeit(timeits)/timeits))

Time taken by list: 0.021732 seconds


### Read or Write To Disk:

Binary Format:

In [78]:
x = np.array([23.23, 24.24])

In [79]:
np.save('an_array', x)

In [80]:
np.load('an_array.npy')

array([23.23, 24.24])

Text Format:

In [81]:
np.savetxt('array.txt', X=x, delimiter=',')

In [82]:
!cat array.txt

2.323000000000000043e+01
2.423999999999999844e+01


In [83]:
np.loadtxt('array.txt', delimiter=',')

array([23.23, 24.24])
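
When several arrays belong together, `np.savez` can store them in a single `.npz` file. The sketch below writes to a temporary directory so that it leaves nothing behind; the file name `arrays.npz` and the keys `first` and `second` are arbitrary choices for this example.

```python
import os
import tempfile
import numpy as np

a = np.array([23.23, 24.24])
b = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")   # hypothetical file name
    np.savez(path, first=a, second=b)        # keyword names become the keys

    with np.load(path) as data:              # close the file before cleanup
        loaded_a = data["first"]
        loaded_b = data["second"]

print(np.array_equal(a, loaded_a), np.array_equal(b, loaded_b))
```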

## Additional Common ndarray Operations

Dot product on matrices and inner product on vectors:

In [84]:
# determine the dot product of two matrices
x2d = np.array([[1,1],[1,1]])
y2d = np.array([[2,2],[2,2]])

print(x2d.dot(y2d))
print()
print(np.dot(x2d, y2d))

[[4 4]
 [4 4]]

[[4 4]
 [4 4]]


In [85]:
# determine the inner product of two vectors
a1d = np.array([9, 9])
b1d = np.array([10, 10])

print(a1d.dot(b1d))
print()
print(np.dot(a1d, b1d))

180

180


In [86]:
# dot product of a matrix and a vector
print(x2d.dot(a1d))
print()
print(np.dot(x2d, a1d))

[18 18]

[18 18]
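
In Python 3.5 and later, the `@` operator is another way to write these products. The sketch below, with made-up values, also contrasts it with `*`, which multiplies element by element.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
v = np.array([10, 20])

print(m @ m)    # matrix product, same result as np.dot(m, m)
print(m @ v)    # matrix-vector product
print(v @ v)    # inner product of two vectors
print(m * m)    # element-wise product, NOT the matrix product
```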


Sum:

In [87]:
# sum elements in the array
ex1 = np.array([[11, 12], [21, 22]])

print(np.sum(ex1))            # add all members

66


In [88]:
print(np.sum(ex1, axis=0))    # columnwise sum

[32 34]


In [89]:
print(np.sum(ex1, axis=1))    # rowwise sum

[23 43]


### Element-Wise Functions

For example, let's compare the values of two arrays element by element to get the maximum of each pair

In [90]:
# random array
x = np.random.randn(8)
x

array([-0.11013989,  0.90074379,  0.22700409, -0.27332641,  1.2468485 ,
       -0.06694573, -0.47500264,  0.32715678])

In [91]:
# another random array
y = np.random.randn(8)
y

array([ 1.40756113,  1.0570713 , -0.70450049,  2.05888651,  1.41568245,
        0.11984299, -0.13581872,  1.37640054])

In [92]:
# returns element wise maximum between two arrays
np.maximum(x, y)

array([ 1.40756113,  1.0570713 ,  0.22700409,  2.05888651,  1.41568245,
        0.11984299, -0.13581872,  1.37640054])

Reshaping array:

In [93]:
# create an array holding the values 0 through 19
arr = np.arange(20)
print(arr)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [94]:
# reshape to be a 4 x 5 matrix
arr.reshape(4,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
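
Two details of `reshape` are easy to miss: it returns a new array object rather than changing the original in place, and one dimension can be given as -1 so that NumPy infers it from the total number of elements. A minimal sketch:

```python
import numpy as np

arr = np.arange(20)

grid = arr.reshape(4, -1)    # -1 lets NumPy infer the missing dimension (5 here)
print(grid.shape)            # (4, 5)
print(arr.shape)             # (20,): reshape did not change arr itself

flat = grid.reshape(-1)      # back to one dimension
print(np.array_equal(flat, arr))
```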

Transpose:

In [95]:
# transpose
ex1 = np.array([[11, 12], [21, 22]])

ex1.T

array([[11, 21],
       [12, 22]])

Indexing using where():

In [96]:
x_1 = np.array([1, 2, 3, 4, 5])

y_1 = np.array([11, 22, 33, 44, 55])

filter = np.array([True, False, True, False, True])

In [97]:
out = np.where(filter, x_1, y_1)
print(out)

[ 1 22  3 44  5]


In [98]:
mat = np.random.rand(5,5)
mat

array([[0.25821843, 0.26363762, 0.00436177, 0.51289036, 0.46187193],
       [0.79778583, 0.07497694, 0.42752888, 0.55773723, 0.79293313],
       [0.38647727, 0.07355835, 0.19757946, 0.63988669, 0.22043883],
       [0.44573082, 0.45982846, 0.80777496, 0.22950302, 0.41648023],
       [0.26667315, 0.01708304, 0.42199801, 0.04308592, 0.36670738]])

In [99]:
np.where( mat > 0.5, 1000, -1)

array([[  -1,   -1,   -1, 1000,   -1],
       [1000,   -1,   -1, 1000, 1000],
       [  -1,   -1,   -1, 1000,   -1],
       [  -1,   -1, 1000,   -1,   -1],
       [  -1,   -1,   -1,   -1,   -1]])
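
Called with only a condition, `np.where` returns the indices at which the condition is True instead of choosing between two sets of values. It returns a tuple with one index array per dimension. A short sketch with made-up data:

```python
import numpy as np

values = np.array([5, 12, 7, 20])

(idx,) = np.where(values > 6)    # unpack the single index array of a 1-D input
print(idx)                       # [1 2 3]
print(values[idx])               # the elements at those positions
```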

### "any" or "all" conditionals:

In [100]:
arr_bools = np.array([True, False, True, True, False])

In [101]:
arr_bools.any()

True

In [102]:
arr_bools.all()

False

### Random Number Generation:

In [103]:
Y = np.random.normal(size = (1,5))[0]
print(Y)

[ 0.41022169 -0.03081481  0.96809094 -1.4201252   0.41436719]


In [104]:
Z = np.random.randint(low=2, high=50, size=4)
print(Z)

[ 6 11 38 15]


In [105]:
np.random.permutation(Z)  # return a new ordering of elements in Z

array([38, 11, 15,  6])

In [106]:
np.random.uniform(size=4) # uniform distribution

array([0.07953365, 0.16792572, 0.60038996, 0.81464241])

In [107]:
np.random.normal(size=4)  # normal distribution

array([ 0.11551233, -0.52109806,  0.43538456,  2.47862921])

### Merging Data Sets:

In [108]:
K = np.random.randint(low=2, high=50, size=(2,2))
print(K)

print()
M = np.random.randint(low=2, high=50, size=(2,2))
print(M)

[[48 49]
 [43 26]]

[[46 17]
 [33 39]]


In [109]:
np.vstack((K, M))

array([[48, 49],
       [43, 26],
       [46, 17],
       [33, 39]])

In [110]:
np.hstack((K,M))

array([[48, 49, 46, 17],
       [43, 26, 33, 39]])

In [111]:
np.concatenate([K,M], axis = 0)

array([[48, 49],
       [43, 26],
       [46, 17],
       [33, 39]])

In [112]:
np.concatenate([K,M.T], axis = 1)

array([[48, 49, 46, 33],
       [43, 26, 17, 39]])
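
The functions above join arrays along an axis that already exists. `np.stack`, by contrast, joins them along a new axis. The sketch below compares the resulting shapes, using small illustrative arrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(np.concatenate([a, b], axis=0).shape)   # (4, 2): joined along rows
print(np.concatenate([a, b], axis=1).shape)   # (2, 4): joined along columns
print(np.stack([a, b]).shape)                 # (2, 2, 2): new leading axis
```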