***
## 3.4 Numpy - Manipulating Arrays
***
### Python3.1 Numpy Introduction
### Python3.2 Numpy DataTypes, Functions, and Random Module
### Python3.3 Numpy Iterating Over Arrays
### Python3.4 Numpy Manipulating Arrays
### Python3.5 Numpy Operations
### Python3.6 Numpy File Input and Output and Data Processing
### Python3.7 Numpy-Sort, Argsort, Nonzero, and Extract Functions
### Python3.8 Numpy BreakoutGroupExercises
### Python3.8 Numpy BreakoutGroupExercises - Solutions
***


***
## Table of Contents:
### Section 1. Indexing/Subsetting (Index Slicing, Fancy Indexing, and Boolean/Conditional Indexing)
### Section 2. Functions for Extracting Data from Arrays and Creating Arrays
### Section 3. Reshaping and Resizing Arrays
### Section 4. Combining/Stacking and Repeating Arrays
### Section 5. Splitting Arrays
### Section 6. Adding/Removing Elements
### Section 7. "Shallow Copy" and "Copy"
### Section 8. Using Arrays in Conditions
***

## Section 1. Indexing/Subsetting (Index Slicing, Fancy Indexing, and Boolean/Conditional Indexing)

### 1) Indexing: Index elements in an array using `square brackets[ ]` and `indices`:

- Just like a normal Python list, indexing starts at zero in Numpy arrays

In [143]:
import numpy as np

In [144]:
# A vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])
v

array([1, 2, 3, 4])

In [145]:
v[3]

4

In [146]:
v[1:3]

array([2, 3])

In [147]:
v[:]

array([1, 2, 3, 4])

In [148]:
ar=np.arange(1,9)
ar

array([1, 2, 3, 4, 5, 6, 7, 8])

In [149]:
ar[7]

8

In [150]:
ar[0:5]

array([1, 2, 3, 4, 5])

In [151]:
ar[5:] # grab everything from index 5 onward

array([6, 7, 8])

In [152]:
ar[:]=77 # broadcasting
ar

array([77, 77, 77, 77, 77, 77, 77, 77])

In [153]:
# A matrix: the argument to the array function is a nested Python list
M = np.array([[1, 2], [3, 4]])
M

array([[1, 2],
       [3, 4]])

In [154]:
# v is a vector, and has only one dimension, taking one index
v[0]

1

In [155]:
# M is a matrix, or a 2 dimensional array, taking two indices 
M[1,1]

4

In [156]:
M[0,0]

1

If we omit an index of a multidimensional array, it returns the whole row (or, in general, an (N-1)-dimensional array):

In [157]:
M

array([[1, 2],
       [3, 4]])

In [158]:
M[0]

array([1, 2])

In [159]:
M[0,:]

array([1, 2])

The same thing can be achieved by using `:` instead of an index:

In [160]:
M[1,:] # row 2

array([3, 4])

In [161]:
M[:,1] # column 2

array([2, 4])

We can assign new values to elements in an array using indexing:

In [162]:
M[0,0] = 1
M

array([[1, 2],
       [3, 4]])

In [163]:
# also works for rows and columns
M[1,:] = 0
M[:,1] = -1
M

array([[ 1, -1],
       [ 0, -1]])

### 2) Index slicing: the technical name for extracting part of an array
#### a) `M[lower:upper:step]`: Index/Extract part of a 1D array

- Slicing (e.g. `ndarray1[2:6]`) returns a ‘view’ on the original array.
- `Data is NOT copied`. Any modification to the ‘view’ (e.g. `ndarray1[2:6] = 8`) is reflected in the original array.
- To get an independent copy instead of a ‘view’, call `.copy()` explicitly:

In [164]:
import numpy as np

In [165]:
A = np.array([1,2,3,4,5])
A.copy() 

array([1, 2, 3, 4, 5])
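
The difference between a view and a copy can be checked directly. The following is a minimal sketch; the names `base`, `view`, and `independent` are illustrative and do not appear elsewhere in this notebook, and `np.shares_memory` is used only to confirm whether two arrays overlap in memory.

```python
import numpy as np

base = np.arange(10)             # [0, 1, ..., 9]

view = base[2:6]                 # basic slicing returns a view
view[:] = 8                      # writes through to base

independent = base[2:6].copy()   # an explicit copy owns its own data
independent[:] = -1              # base is not affected by this

print(base)                                  # [0 1 8 8 8 8 6 7 8 9]
print(np.shares_memory(base, view))          # True
print(np.shares_memory(base, independent))   # False
```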

In [166]:
A[0:3]  # up to but not including index 3

array([1, 2, 3])

In [167]:
A[1:3]

array([2, 3])

In [168]:
A[2:]  # grab everything from index 2 onward

array([3, 4, 5])

In [169]:
A[1:3] = [-2,-3]
A

array([ 1, -2, -3,  4,  5])

#### We can omit any of the three parameters in `M[lower:upper:step]`

In [170]:
A[::] # lower, upper, step all take the default values

array([ 1, -2, -3,  4,  5])

In [171]:
A

array([ 1, -2, -3,  4,  5])

In [172]:
A[::2] # step is 2; lower and upper default to the beginning and end of the array

array([ 1, -3,  5])

In [173]:
A[:3] # first three elements

array([ 1, -2, -3])

In [174]:
A[3:] # elements from index 3

array([4, 5])

In [175]:
# How to get this?
A[1:4]

array([-2, -3,  4])

#### Negative indices count from the end of the array (positive indices count from the beginning):

In [176]:
A = np.array([1,2,3,4,5])
A[-1]  # the last element in the array

5

In [177]:
A[-3]

3

In [178]:
A[-3:] # the last three elements

array([3, 4, 5])

In [179]:
# Broadcast to reassign 100 to the first three elements
A[0:3]=100
A

array([100, 100, 100,   4,   5])
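
Negative values also work for the `step` parameter, in which case the slice walks backwards. This is a small sketch of that idea using an illustrative array named `seq` (not part of the cells above).

```python
import numpy as np

seq = np.arange(1, 6)             # [1, 2, 3, 4, 5]

reversed_seq = seq[::-1]          # whole array, back to front
last_three_rev = seq[-1:-4:-1]    # start at the last element, stop before index -4
middle = seq[1:-1]                # drop the first and last elements

print(reversed_seq)     # [5 4 3 2 1]
print(last_three_rev)   # [5 4 3]
print(middle)           # [2 3 4]
```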

#### b) **`arr_2d[row][col]`** or **`arr_2d[row,col]`** to index a 2D array (matrices)

- the comma notation is recommended for clarity.

In [180]:
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
#Show
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [181]:
#Indexing row
arr_2d[1]

array([20, 25, 30])

In [182]:
arr_2d[1,2]

30

In [183]:
arr_2d[1][2]

30

In [184]:
# How to get 45?


In [185]:
# 2D array slicing
# Shape (2,2) from top right corner
arr_2d[:2,1:]        # the upper bound of each slice is excluded

array([[10, 15],
       [25, 30]])

In [186]:
# Bottom row
arr_2d[2]

array([35, 40, 45])

In [187]:
# Bottom row (same result with an explicit column slice)
arr_2d[2,:]

array([35, 40, 45])

In [188]:
# Exercise: use slicing notation to grab shape (3,3) top left corner 
arr_2dm = np.array(([2,4,6,8],[10,12,14,16],[18,20,22,24],[26,28,30,32]))
arr_2dm

array([[ 2,  4,  6,  8],
       [10, 12, 14, 16],
       [18, 20, 22, 24],
       [26, 28, 30, 32]])

In [189]:
arr_2dm[:3,:3]

array([[ 2,  4,  6],
       [10, 12, 14],
       [18, 20, 22]])

In [190]:
# Exercise: use slicing notation to grab shape (3,3) top right corner 
arr_2dm[:3,1:]

array([[ 4,  6,  8],
       [12, 14, 16],
       [20, 22, 24]])

In [191]:
# Exercise: use slicing notation to grab shape (2,2) bottom left corner 


### 3) Fancy Indexing (aka ‘indexing using integer arrays’):
- Is the name for when an array or list is used in place of an index
- `Fancy indexing ALWAYS creates a copy of the data`


In [192]:
arr_2dm = np.array(([2,4,6,8],[10,12,14,16],[18,20,22,24],[26,28,30,32]))
arr_2dm

array([[ 2,  4,  6,  8],
       [10, 12, 14, 16],
       [18, 20, 22, 24],
       [26, 28, 30, 32]])

In [193]:
row_indices = [3, 1]
arr_2dm[row_indices]

array([[26, 28, 30, 32],
       [10, 12, 14, 16]])

In [194]:
col_indices = [-1,-3] # index -1 means the last element
arr_2dm[:,col_indices]

array([[ 8,  4],
       [16, 12],
       [24, 20],
       [32, 28]])

In [195]:
arr_2dm[row_indices, col_indices]

array([32, 12])
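
When two index lists are passed together, Numpy pairs them element by element, which is why the result above has only two values. The sketch below contrasts that pairing with `np.ix_`, which selects the full block of every listed row against every listed column. The names `grid`, `rows`, `cols`, `paired`, and `block` are illustrative.

```python
import numpy as np

grid = np.array([[ 2,  4,  6,  8],
                 [10, 12, 14, 16],
                 [18, 20, 22, 24],
                 [26, 28, 30, 32]])
rows = [3, 1]
cols = [-1, -3]

paired = grid[rows, cols]          # picks elements (3, -1) and (1, -3)
block = grid[np.ix_(rows, cols)]   # picks rows 3 and 1, each at columns -1 and -3

paired[0] = 0                      # fancy indexing returned a copy, so grid is unchanged

print(paired)        # [ 0 12]
print(block)         # [[32 28]
                     #  [16 12]]
print(grid[3, -1])   # 32
```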

**Note: `Fancy indexing allows you to select entire rows or columns in any order`.** To show this, let's reuse the array from above:

In [196]:
#Set up matrix
arr_2dm

array([[ 2,  4,  6,  8],
       [10, 12, 14, 16],
       [18, 20, 22, 24],
       [26, 28, 30, 32]])

In [197]:
arr_2dm.shape[0]

4

In [198]:
# Number of columns (length along axis 1)
arr_length = arr_2dm.shape[1]
arr_length

4

In [199]:
arr_length = arr_2dm.shape
arr_length

(4, 4)

In [200]:
arr_2dm[[0,2]]

array([[ 2,  4,  6,  8],
       [18, 20, 22, 24]])

In [201]:
#Allows in any order
arr_2dm[[2,0]]

array([[18, 20, 22, 24],
       [ 2,  4,  6,  8]])

### 4) Boolean Indexing (Conditional Indexing)
We can also use index masks: if the index mask is a Numpy array of data type bool, then each element is selected (True) or not (False) according to the value of the mask at that element's position:

- Selecting data by boolean indexing ALWAYS creates a copy of the data.
- The `and` and `or` keywords do NOT work with boolean arrays. Use `&` and `|`

In [202]:
B = np.array([n for n in range(7)])
B

array([0, 1, 2, 3, 4, 5, 6])

In [203]:
bool_B = B > 4
bool_B

array([False, False, False, False, False,  True,  True])

In [204]:
B[bool_B]

array([5, 6])

In [205]:
B[B > 4]

array([5, 6])

In [206]:
B[B < 3]

array([0, 1, 2])

In [207]:
row_mask = np.array([True, False, True, False, False, True, False])
B[row_mask]

array([0, 2, 5])

In [208]:
# integers are cast to bool: nonzero -> True, zero -> False (note that this mask differs from the one above)
row_mask = np.array([2,0,0,1,0,1,1], dtype=bool)
B[row_mask]

array([0, 3, 5, 6])

In [209]:
arr_2dm = np.arange(40)
arr_2dm

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39])

In [210]:
# Exercise: grab the following output
arr_2dm = np.arange(40).reshape(4,10)
arr_2dm

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

In [211]:
arr_2dm[1:3,3:5]

array([[13, 14],
       [23, 24]])

In [212]:
# Exercise: Grab the first two rows & first 6 columns


This feature is very useful to conditionally select elements from an array, using for example comparison operators:

In [213]:
x = np.arange(0, 10, 0.5)
x

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])

In [214]:
mask = (5 < x) & (x < 7.5)
mask

array([False, False, False, False, False, False, False, False, False,
       False, False,  True,  True,  True,  True, False, False, False,
       False, False])

In [215]:
x[mask]

array([5.5, 6. , 6.5, 7. ])

In [216]:
x[(5 < x) & (x < 7.5)]

array([5.5, 6. , 6.5, 7. ])
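
As noted at the start of this subsection, `and` and `or` do not work on boolean arrays. The sketch below shows the element-wise operators `&`, `|`, and `~`, and what happens when `and` is used instead. The variable names are illustrative; note that each comparison needs its own parentheses because `&` and `|` bind more tightly than comparison operators.

```python
import numpy as np

x = np.arange(0, 10, 0.5)

inside = x[(x > 5) & (x < 7.5)]     # element-wise AND
outside = x[(x <= 1) | (x >= 9)]    # element-wise OR
not_small = x[~(x < 9)]             # element-wise NOT

raised = False
try:
    x[(x > 5) and (x < 7.5)]        # `and` asks for a single truth value of an array
except ValueError as err:
    raised = True
    print("ValueError:", err)

print(inside)      # [5.5 6.  6.5 7. ]
print(outside)     # [0.  0.5 1.  9.  9.5]
print(not_small)   # [9.  9.5]
```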

## Section 2. Functions for extracting data from arrays and creating arrays

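#### a) `where()` function returns the indices of the elements that satisfy a condition, or chooses between two sets of values element by element.

- Syntax: `numpy.where(condition)` or `numpy.where(condition, x, y)`
- With only `condition`: returns a tuple of index arrays (one per dimension) marking where the condition is True.
- With `x` and `y`: where the condition is True, yield x, otherwise yield y.
- x, y: Values from which to choose. x, y and condition need to be broadcastable to some shape.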

Example 1. The index mask can be converted to position index using the **where** function.

In [217]:
mask

array([False, False, False, False, False, False, False, False, False,
       False, False,  True,  True,  True,  True, False, False, False,
       False, False])

In [218]:
indices = np.where(mask)
indices

(array([11, 12, 13, 14], dtype=int64),)

In [219]:
x[indices] # this fancy indexing is equivalent to the boolean indexing x[mask]

array([5.5, 6. , 6.5, 7. ])

Example 2. 1-dimensional array

In [220]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [221]:
np.where(a < 5, a, 10*a)

array([ 0,  1,  2,  3,  4, 50, 60, 70, 80, 90])

Example 3. Multidimensional arrays

In [222]:
np.where([[True, False], [True, True]],[[1, 2], [3, 4]],[[9, 8], [7, 6]])

array([[1, 8],
       [3, 4]])

Example 4. Broadcast:

In [223]:
a = np.array([[0, 1, 2],
              [0, 2, 4],
              [0, 3, 6]])
np.where(a < 4, a, -1)  # -1 is broadcast

array([[ 0,  1,  2],
       [ 0,  2, -1],
       [ 0,  3, -1]])

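#### `np.where(cond, 1, 0).argmax()` => Find the first True element

`argmax()` returns the index of the maximum element and, when the maximum occurs more than once, the index of its first occurrence. Applied to an array of 1s and 0s, it therefore gives the position of the first True value. An example use is finding the first element that satisfies “price > number” in an array of price data.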
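
A minimal sketch of this pattern follows. The `prices` values and the `threshold` are made up for illustration. One caveat worth keeping in mind: if no element satisfies the condition, `argmax()` still returns 0, so the sketch checks `.any()` first.

```python
import numpy as np

prices = np.array([9.5, 10.2, 9.9, 11.3, 12.0, 10.8])   # made-up price data
threshold = 11.0                                        # made-up threshold

above = prices > threshold
first_idx = np.where(above, 1, 0).argmax()   # index of the first 1, i.e. the first True

# argmax() returns 0 when nothing is True, so guard against that case
if above.any():
    print(first_idx, prices[first_idx])      # 3 11.3
else:
    print("no price above the threshold")

# The ambiguous case: nothing exceeds 100, yet argmax() still reports index 0
no_match_idx = np.where(prices > 100, 1, 0).argmax()
```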

#### b) `take()` function is similar to fancy indexing described above:

In [224]:
v2 = np.arange(-3,3)
v2

array([-3, -2, -1,  0,  1,  2])

In [225]:
row_indices = [1, 3, 5]
v2[row_indices]         # fancy indexing

array([-2,  0,  2])

In [226]:
v2.take(row_indices)

array([-2,  0,  2])

But `take` also works on lists and other objects:

In [227]:
np.take([-3, -2, -1,  0,  1,  2], row_indices)

array([-2,  0,  2])

## Section 3. Reshaping and Resizing Arrays

The shape of a Numpy array can be modified without copying the underlying data, which makes it a fast operation even for large arrays.

Reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html

#### The shape attribute for numpy arrays returns the dimensions of the array. If Y has `n` rows and `m` columns, then:

- Y.shape is `(n,m)`

- Y.shape[0] is `n`: will display number of rows

- Y.shape[1] is `m`: will display number of columns

In [228]:
import numpy as np
Y = np.arange(12).reshape(3,4)
Y

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [229]:
Y.shape, Y.shape[0], Y.shape[1]

((3, 4), 3, 4)

In [230]:
ranarr = np.random.randint(0,50,10)
ranarr

array([25, 49, 39, 43,  2, 42,  4, 21, 23,  6])

#### a) `reshape()` function returns an array containing the same data with a new shape.

In [231]:
arr = np.arange(25)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [232]:
arr.reshape(5,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
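
The sketch below illustrates two points about `reshape()`: one dimension can be given as `-1` so that Numpy infers it from the array size, and for a contiguous array like the one here the result shares data with the original rather than copying it. The names `flat`, `grid`, and `auto` are illustrative.

```python
import numpy as np

flat = np.arange(12)
grid = flat.reshape(3, 4)    # explicit shape
auto = flat.reshape(2, -1)   # -1 lets Numpy infer the second dimension

grid[0, 0] = 99              # in this case reshape returned a view, so flat changes too

print(auto.shape)                     # (2, 6)
print(flat[0])                        # 99
print(np.shares_memory(flat, grid))   # True
```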

#### b) `flatten()` function collapses a higher-dimensional array into a vector. Note that this function creates a copy of the data.

In [233]:
arr.reshape(5,5).flatten()  # flatten the 5x5 matrix back into a vector

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])
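
To make the copy behaviour of `flatten()` visible, the sketch below compares it with `ravel()`, a related Numpy method that returns a view whenever the memory layout allows it. `ravel()` is not covered elsewhere in this notebook and is brought in here only for contrast; the variable names are illustrative.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)

flat_copy = matrix.flatten()   # always a copy
flat_view = matrix.ravel()     # a view here, because matrix is contiguous

flat_copy[0] = -1              # does not touch matrix
flat_view[1] = -2              # writes through to matrix[0, 1]

print(matrix)   # [[ 0 -2  2]
                #  [ 3  4  5]]
```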

#### c) `max()`, `min()`, `argmax()`, `argmin()` functions
These methods find the maximum or minimum value (`max`, `min`) or the index location of that value (`argmax`, `argmin`).

In [234]:
ranarr

array([25, 49, 39, 43,  2, 42,  4, 21, 23,  6])

In [235]:
ranarr.max()

49

In [236]:
ranarr.argmax()  # the location of max

1

In [237]:
ranarr.min()

2

In [238]:
ranarr.argmin()

4
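
On a 2D array these methods also accept an `axis` argument, and without it `argmax()` returns a position in the flattened array. The sketch below uses a small made-up array named `scores`; `np.unravel_index` converts the flat position back to a (row, column) pair.

```python
import numpy as np

scores = np.array([[25, 49, 39],
                   [43,  2, 42]])   # made-up values

col_max = scores.max(axis=0)        # maximum of each column
row_argmax = scores.argmax(axis=1)  # column index of the maximum in each row
flat_pos = scores.argmax()          # position in the flattened array
row, col = np.unravel_index(flat_pos, scores.shape)

print(col_max)      # [43 49 42]
print(row_argmax)   # [1 0]
print(flat_pos)     # 1
print(row, col)     # 0 1
```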

## Section 4. Combining/Stacking and Repeating Arrays

Using the functions `repeat()`, `tile()`, `vstack()`, `hstack()`, and `concatenate()`, we can create larger vectors and matrices from smaller ones:

#### a) `tile()` and `repeat()` functions

In [239]:
a = np.array([[1, 2], [3, 4]])
a

array([[1, 2],
       [3, 4]])

In [240]:
# repeat each element 3 times
np.repeat(a, 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])

In [241]:
# tile the matrix 3 times 
np.tile(a, 3)

array([[1, 2, 1, 2, 1, 2],
       [3, 4, 3, 4, 3, 4]])

In [242]:
a*3

array([[ 3,  6],
       [ 9, 12]])
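
Both functions take further arguments that control the direction of repetition. The sketch below shows `np.repeat` with an `axis` argument (which keeps the result 2D) and `np.tile` with a tuple of repetitions per dimension. The result names are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

rows_repeated = np.repeat(a, 2, axis=0)   # repeat each row twice
cols_repeated = np.repeat(a, 2, axis=1)   # repeat each column twice
tiled_down = np.tile(a, (2, 1))           # 2 copies vertically, 1 horizontally

print(rows_repeated)   # [[1 2]
                       #  [1 2]
                       #  [3 4]
                       #  [3 4]]
print(cols_repeated)   # [[1 1 2 2]
                       #  [3 3 4 4]]
print(tiled_down)      # [[1 2]
                       #  [3 4]
                       #  [1 2]
                       #  [3 4]]
```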

#### b) `concatenate()` function
- Concatenate arrays


In [243]:
a

array([[1, 2],
       [3, 4]])

In [244]:
b = np.array([[5, 6]])
b

array([[5, 6]])

In [245]:
np.concatenate((a, b), axis=0)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [246]:
# transpose array
b.T

array([[5],
       [6]])

In [247]:
# Or
b.transpose()

array([[5],
       [6]])

In [248]:
np.concatenate((a, b.T), axis=1)  # .T is transpose

array([[1, 2, 5],
       [3, 4, 6]])

#### c) Stacking arrays 
- Stack arrays vertically (row-wise): 
   - vstack()
   - r_[ ]
- Stack arrays horizontally (column-wise):
   - hstack()
   - column_stack()
   - c_[ ]


In [249]:
np.vstack((a,b))

array([[1, 2],
       [3, 4],
       [5, 6]])

In [250]:
np.r_[a,b]

array([[1, 2],
       [3, 4],
       [5, 6]])

In [251]:
a

array([[1, 2],
       [3, 4]])

In [252]:
b

array([[5, 6]])

In [253]:
np.hstack((a,b.T))

array([[1, 2, 5],
       [3, 4, 6]])

In [254]:
np.c_[(a,b.T)]

array([[1, 2, 5],
       [3, 4, 6]])

In [255]:
np.column_stack((a,b.T))

array([[1, 2, 5],
       [3, 4, 6]])
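
The stacking functions behave differently when the inputs are 1D vectors rather than matrices, which is a common source of confusion. The sketch below uses two illustrative vectors `x` and `y` to show the resulting shapes.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

stacked_rows = np.vstack((x, y))        # each vector becomes a row
joined = np.hstack((x, y))              # vectors are joined end to end
stacked_cols = np.column_stack((x, y))  # each vector becomes a column

print(stacked_rows.shape)   # (2, 3)
print(joined)               # [1 2 3 4 5 6]
print(stacked_cols)         # [[1 4]
                            #  [2 5]
                            #  [3 6]]
```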

## Section 5. Splitting Arrays
- np.hsplit(ary, indices_or_sections): Split an array into multiple sub-arrays horizontally (column-wise)
- np.vsplit(ary, indices_or_sections): Split an array into multiple sub-arrays vertically (row-wise)

In [256]:
a = np.array([1,2,3])

In [257]:
np.hsplit(a,3)

[array([1]), array([2]), array([3])]

In [258]:
d= np.arange(16.0).reshape(4, 4)
d

array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.]])

In [259]:
np.hsplit(d,2) # Split the array horizontally (column-wise) into 2 equal sections

[array([[ 0.,  1.],
        [ 4.,  5.],
        [ 8.,  9.],
        [12., 13.]]),
 array([[ 2.,  3.],
        [ 6.,  7.],
        [10., 11.],
        [14., 15.]])]

In [260]:
np.vsplit(d,2) # Split the array vertically (row-wise) into 2 equal sections


[array([[0., 1., 2., 3.],
        [4., 5., 6., 7.]]),
 array([[ 8.,  9., 10., 11.],
        [12., 13., 14., 15.]])]
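
The second argument can also be a list of indices at which to cut, instead of a number of equal sections. The sketch below shows that form, and also shows that an integer that does not divide the axis evenly raises an error, whereas `np.array_split` tolerates uneven sections. The result names are illustrative.

```python
import numpy as np

d = np.arange(16.0).reshape(4, 4)

# Cut before column 1 and before column 3: columns [0], [1, 2], [3]
left, mid, right = np.hsplit(d, [1, 3])

uneven_failed = False
try:
    np.hsplit(d, 3)              # 4 columns cannot be divided into 3 equal sections
except ValueError:
    uneven_failed = True

parts = np.array_split(np.arange(7), 3)   # uneven sections are allowed here

print(left.shape, mid.shape, right.shape)   # (4, 1) (4, 2) (4, 1)
print([p.tolist() for p in parts])          # [[0, 1, 2], [3, 4], [5, 6]]
```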

## Section 6. Adding/Removing Elements


In [261]:
a

array([1, 2, 3])

In [262]:
np.append(a,[5,9]) # Append items to an array


array([1, 2, 3, 5, 9])

In [263]:
np.insert(a, 1, 5) # Insert items in an array


array([1, 5, 2, 3])

In [264]:
np.delete(a,[1]) # Delete items from an array

array([1, 3])
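
Note that `np.append`, `np.insert`, and `np.delete` each return a new array and leave the input unchanged, which is why `a` is still `[1, 2, 3]` in every cell above. The sketch below confirms this and shows the `axis` argument on a small illustrative matrix `m`; without `axis`, `np.append` flattens its inputs.

```python
import numpy as np

a = np.array([1, 2, 3])
appended = np.append(a, [5, 9])              # new array; a itself is unchanged

m = np.array([[1, 2], [3, 4]])
with_row = np.append(m, [[5, 6]], axis=0)    # add a row
flattened = np.append(m, [[5, 6]])           # no axis: result is 1D
inserted = np.insert(m, 1, [9, 9], axis=0)   # insert a row before index 1
without_col = np.delete(m, 0, axis=1)        # drop the first column

print(a)             # [1 2 3]
print(with_row)      # [[1 2]
                     #  [3 4]
                     #  [5 6]]
print(flattened)     # [1 2 3 4 5 6]
print(inserted)      # [[1 2]
                     #  [9 9]
                     #  [3 4]]
print(without_col)   # [[2]
                     #  [4]]
```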

## Section 7. "Shallow Copy" and "Copy"

- To achieve high performance, assignments in Python usually do not copy the underlying objects. This is important, for example, when objects are passed between functions, because it avoids an excessive amount of memory copying when it is not necessary (technical term: pass by reference). 
- For more details about this topic, please refer to the Python2.2.1 notebooks

In [265]:
A = np.array([[1, 2], [3, 4]])
A

array([[1, 2],
       [3, 4]])

In [266]:
# now B is referring to the same array data as A 
B = A 

In [267]:
B

array([[1, 2],
       [3, 4]])

In [268]:
# changing B affects A
B[0,0] = 10
B

array([[10,  2],
       [ 3,  4]])

In [269]:
A

array([[10,  2],
       [ 3,  4]])

If we want to avoid this behavior and get a new, completely independent object C copied from A, then we need to do a so-called "`deep copy`" using the function `copy`:

In [270]:
C = np.copy(A)

In [271]:
C

array([[10,  2],
       [ 3,  4]])

In [272]:
# now, if we modify C, A is not affected
C[0,0] = -5
C

array([[-5,  2],
       [ 3,  4]])

In [273]:
A

array([[10,  2],
       [ 3,  4]])
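
The three cases can be placed side by side: plain assignment (the same object), a view (a new array object over the same data, often called a shallow copy), and a copy (a new object with its own data). This is a minimal sketch; the names `alias`, `shallow`, and `deep` are illustrative, and `A.view()` is used here as one way to obtain a shallow copy.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

alias = A            # same object under a second name
shallow = A.view()   # new array object that shares A's data
deep = A.copy()      # new array object with its own data

A[0, 0] = 10

print(alias is A, shallow is A, deep is A)    # True False False
print(alias[0, 0], shallow[0, 0], deep[0, 0]) # 10 10 1
```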

## Section 8. Using Arrays in Conditions

When using arrays in conditions, for example `if statements` and other Boolean expressions, one needs to use `.any()` or `.all()`, which test whether any element or all elements of the array, respectively, evaluate to True:

In [274]:
import numpy as np
M = np.array([[1, 2], [3, 4]])
M

array([[1, 2],
       [3, 4]])

In [275]:
if (M > 5).any():
    print("at least one element in M is larger than 5")
else:
    print("no element in M is larger than 5")

no element in M is larger than 5


In [276]:
if (M > 0).all():
    print("all elements in M are larger than 0")
else:
    print("not all elements in M are larger than 0")

all elements in M are larger than 0
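
The reason `.any()` or `.all()` is needed is that an array with more than one element has no single truth value. The sketch below shows the error raised by a bare `if` on an array, followed by the reductions, including `axis=1` to test each row separately. The result names are illustrative.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

raised = False
try:
    if M > 2:                     # ambiguous: four truth values, one `if`
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

any_large = bool((M > 2).any())         # at least one element is larger than 2
all_large = bool((M > 2).all())         # every element is larger than 2
rows_all_large = (M > 2).all(axis=1)    # the same test applied to each row

print(any_large, all_large)   # True False
print(rows_all_large)         # [False  True]
```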


### Exercise

In [277]:
grades = np.array([[72, 89, 14],
 [43, 65, 74],
 [38, 71, 62],
 [82, 66, 49],
 [31, 95, 65],
 [42, 58, 51],
 [15, 54, 85],
 [60, 21, 15],
 [79, 23, 58],
 [63, 87, 67]])
grades

array([[72, 89, 14],
       [43, 65, 74],
       [38, 71, 62],
       [82, 66, 49],
       [31, 95, 65],
       [42, 58, 51],
       [15, 54, 85],
       [60, 21, 15],
       [79, 23, 58],
       [63, 87, 67]])

In [278]:
grades.shape

(10, 3)

These numbers represent grades of 10 students (each is out of a hundred). The first two are assignments and the last is the final exam.

### Exercise

Show all grades for the first assignment (use the slicing syntax we have used often)

In [279]:
# Try it here
grades[:,0]

array([72, 43, 38, 82, 31, 42, 15, 60, 79, 63])

In [280]:
grades.shape

(10, 3)

###  Exercise 

Use the dot product to compute the final grade when the first assignment is the only grade considered.

In [281]:
np.dot(grades,[1,0,0])

array([72, 43, 38, 82, 31, 42, 15, 60, 79, 63])

Notice that this is the same as using slicing syntax to get the first column (`grades[:,0]`).

### Exercise

Using the dot product, we are saying, _get 100% of the first column and zero percent of the second and third column_

What is the final grade if assignments 1 and 2 are considered equally and the final exam is ignored (so the first column contributes half and the second column contributes half)?

In [282]:
np.dot(grades, [.5, .5, 0])

array([80.5, 54. , 54.5, 74. , 63. , 50. , 34.5, 40.5, 51. , 75. ])

The following array represents the final grade for each student (thinking of this as a vertical array, with 47.25 at the top and 71 at the bottom, might make it easier to visualize)

In [283]:
final_grade = np.array([47.25, 64., 58.25, 61.5, 64., 50.5, 59.75, 27.75, 54.5 ,71.])
final_grade

array([47.25, 64.  , 58.25, 61.5 , 64.  , 50.5 , 59.75, 27.75, 54.5 ,
       71.  ])

What if you are the professor and already know the final grade for each student (from a previous quarter), but have forgotten how you weighted the assignments and the final exam? How can you figure out the parameters for the dot product function?

In [284]:
np.linalg.lstsq(grades, final_grade, rcond=None)[0]

array([0.25, 0.25, 0.5 ])

The above method tells us that each of the first two assignments contributed 25% to the final grade and the final exam contributed 50%.
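
One way to check the recovered weights is to plug them back into the dot product and compare with the known final grades. The sketch below repeats the `grades` and `final_grade` arrays from above so that it runs on its own; `weights` and `reconstructed` are illustrative names. `np.linalg.lstsq` returns four values, of which only the first (the solution) is used here.

```python
import numpy as np

grades = np.array([[72, 89, 14], [43, 65, 74], [38, 71, 62], [82, 66, 49],
                   [31, 95, 65], [42, 58, 51], [15, 54, 85], [60, 21, 15],
                   [79, 23, 58], [63, 87, 67]])
final_grade = np.array([47.25, 64., 58.25, 61.5, 64., 50.5, 59.75, 27.75, 54.5, 71.])

# Least-squares solution of grades @ weights = final_grade
weights, residuals, rank, singular_values = np.linalg.lstsq(grades, final_grade, rcond=None)

reconstructed = grades @ weights   # same as np.dot(grades, weights)

print(np.round(weights, 2))                      # [0.25 0.25 0.5 ]
print(np.allclose(reconstructed, final_grade))   # True
```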

## Further reading

- http://numpy.scipy.org
- http://scipy.org/Tentative_NumPy_Tutorial
- http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users.

#### Note: The course materials are developed mainly based on personal experience and contributions from the Python learning community
Referred Books: 
- Learning Python, 5th Edition by Mark Lutz
- Python Data Science Handbook by Jake VanderPlas
- Python for Data Analysis by Wes McKinney

Copyright ©2023 Mei Najim. All rights reserved. 