# Introduction

Writing effective algorithms with Python requires competence with the <span style="font-family:'Courier New'">numpy</span> package, which enables:
- Fast execution
- Minimum memory footprint

Understanding <span style="font-family:'Courier New'">numpy</span>, its speed, and how to use it more effectively requires a deeper examination of how variables are handled in memory.    

This Jupyter notebook covers the essential basics of <span style="font-family:'Courier New'">numpy</span> syntax and methods to get you on your way to writing faster code that uses less memory.  It focuses on basic techniques that are frequently useful in writing algorithms.

# <span style="font-family:'Courier New'">numpy</span> versus Python Lists

The first thing to understand is that <code>numpy</code> derives its speed from storing the elements of an array in a contiguous block of memory and from executing its operations in compiled C code.

![contiguous_memory](images/numpy_vs_list.jpg)
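
A minimal sketch of the difference in storage, assuming 64-bit floats and the standard CPython interpreter. The variable names are illustrative only. The array keeps its values in one contiguous buffer, while the list keeps references to separately allocated float objects.

```python
import sys
import numpy as np

n = 1000
py_list = [float(i) for i in range(n)]
np_arr = np.arange(n, dtype=np.float64)

# The array's data buffer: n elements * 8 bytes each, stored contiguously
print(np_arr.flags['C_CONTIGUOUS'], np_arr.itemsize, np_arr.nbytes)

# The list holds n references, and each float is a separate Python object
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
print(list_bytes)
```

The exact byte count for the list depends on the interpreter, so treat it as an order-of-magnitude comparison rather than a precise figure.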

# Speed: Establish <span style="font-family:'Courier New'">numpy</span> Variables Once <a class="anchor" id="instan_numpy-once"></a>

Each of the following actions causes a new <code>numpy</code> array to be allocated and the data to be copied, which negates the speed advantages you could gain with <code>numpy</code>.  So, avoid these operations, especially inside loops:

- Resizing a <span style="font-family:'Courier New'">numpy</span> array
- Changing the data type of a <span style="font-family:'Courier New'">numpy</span> array
- Copying part of a <span style="font-family:'Courier New'">numpy</span> array with a Boolean mask or an index array (a basic slice such as <code>a[1:]</code> only creates a view and does not copy the data)
- Concatenating <span style="font-family:'Courier New'">numpy</span> arrays
- Appending values to <span style="font-family:'Courier New'">numpy</span> arrays
- Using <code>np.vstack()</code> or <code>np.hstack()</code>

Avoid re-instantiating a <span style="font-family:'Courier New'">numpy ndarray</span> by determining the required _size_, _shape_, and _data type_ of the array in advance and establishing it once.
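
One way to check whether an operation allocated a new array is <code>np.shares_memory()</code>. The sketch below uses small illustrative arrays. Stacking and type conversion produce arrays with their own memory, whereas a basic slice is a view on the original buffer.

```python
import numpy as np

a = np.zeros((4, 3))

stacked = np.vstack((a, np.ones((1, 3))))   # allocates a new, larger array
converted = a.astype(np.float32)            # allocates a new array of another dtype
basic_slice = a[1:]                         # basic slice: a view on a's buffer

print(np.shares_memory(a, stacked))      # False
print(np.shares_memory(a, converted))    # False
print(np.shares_memory(a, basic_slice))  # True
```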

In the example below, we fill a <span style="font-family:'Courier New'">numpy</span> array with computed values, simulated here with random numbers.  We will do this two ways: one reserves space for the results first, and the other appends rows to the original <span style="font-family:'Courier New'">numpy</span> array as the values are created.  The appending approach causes the <span style="font-family:'Courier New'">numpy</span> array to be reconstructed many times.

In [1]:
import numpy as np
import time

In [2]:
nrows = 10000
ncols = 10

The <code>numpy</code> functions used below create a new <code>numpy</code> array each time they are called, which makes repeated calls very slow.

- <code>np.hstack()</code>
- <code>np.vstack()</code>
- <code>np.concatenate()</code>
- <code>np.append()</code>

Calling any of these functions repeatedly inside a loop is a poor idea, because every call allocates a new array and copies all of the existing data into it.

Operations with <code>numpy</code> arrays are very fast once the arrays are created, but instantiating a <code>numpy</code> array takes non-negligible time.

In [4]:
start = time.time()
np_arr = np.random.rand(1,ncols)
for i in range(nrows-1):
    np_arr = np.vstack((np_arr, np.random.rand(1,ncols)))
print(f'Execution time with np.vstack(): {time.time() - start}')
print(np_arr.shape)

Execution time with np.vstack(): 0.13271784782409668
(10000, 10)


In [12]:
start = time.time()
np_arr = np.random.rand(ncols,1)
for i in range(nrows-1):
    np_arr = np.hstack((np_arr, np.random.rand(ncols,1)))
print(f'Execution time with np.hstack(): {time.time() - start}')
print(np_arr.shape)

Execution time with np.hstack(): 0.14214706420898438
(10, 10000)


In [5]:
start = time.time()
np_arr = np.random.rand(1,ncols)
for i in range(nrows-1):
    np_arr = np.append(np_arr, np.random.rand(1,ncols), axis=0)
print(f'Execution time with np.append(): {time.time() - start}')
print(np_arr.shape)

Execution time with np.append(): 0.12223935127258301
(10000, 10)


In [6]:
start = time.time()
np_arr = np.random.rand(1,ncols)
for i in range(nrows-1):
    np_arr = np.concatenate((np_arr, np.random.rand(1,ncols)))
print(f'Execution time with np.concatenate(): {time.time() - start}')
print(np_arr.shape)

Execution time with np.concatenate(): 0.11562895774841309
(10000, 10)


It is much faster to create a numpy array of sufficient size once to reserve space and just replace its values as the algorithm progresses.

In [7]:
np_res = np.zeros((nrows, ncols))

In [8]:
start = time.time()
for i in range(nrows):
    new_row = np.random.rand(1,ncols)
    np_res[i] = new_row
print(f'Execution time with reserved ndarray: {time.time() - start}')
print(np_res.shape)

Execution time with reserved ndarray: 0.015741586685180664
(10000, 10)


If you do feel the need to append to a data structure as an algorithm progresses without defining a <code>numpy</code> array to reserve space, then appending to a Python list is much faster, and the list can be converted to a <code>numpy</code> array when the data accumulation is complete.

In [10]:
start = time.time()
np_arr = []
for i in range(nrows-1):
    np_arr.append(np.random.rand(1,ncols)) 
print(f'Execution time with Python list append: {time.time() - start}')
np_arr = np.array(np_arr)  # each list item has shape (1, ncols), so the result is 3-D
print(np_arr.shape)

Execution time with Python list append: 0.013594865798950195
(9999, 1, 10)
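
The extra axis in the shape above comes from converting a list of <code>(1, ncols)</code> arrays with <code>np.array()</code>. A small sketch, with illustrative sizes, of how a single <code>np.vstack()</code> call at the end yields a 2-D result instead:

```python
import numpy as np

# Four rows, each of shape (1, 3), accumulated in a Python list
rows = [np.full((1, 3), i, dtype=float) for i in range(4)]

as_array = np.array(rows)     # keeps the extra axis: shape (4, 1, 3)
as_matrix = np.vstack(rows)   # stacks the rows once: shape (4, 3)

print(as_array.shape, as_matrix.shape)
```

Calling <code>np.vstack()</code> once after the loop is a different situation from calling it on every iteration, since the data is copied only one time.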


## Application: The Cell Tower Problem

In [1]:
import random

def setup():
    prob_size = 100000
    data = [random.random() for _ in range(prob_size)]
    budget = 5.0
    return data, budget

### Python List with Deletion of Used Elements

In [2]:
import random
import time

towers, budget = setup()
time_start = time.time()

towers_to_pick = []

while sum(towers_to_pick) < budget and len(towers) > 0:
    if sum(towers_to_pick) + towers[0] <= budget:
        towers_to_pick.append(towers.pop(0))
    else:
        _ = towers.pop(0)

print(f'Investment: {sum(towers_to_pick)} \nExecution time: {time.time() - time_start} seconds \nTowers selected: {towers_to_pick}')

Investment: 4.999996959970765 
Execution time: 0.890256404876709 seconds 
Towers selected: [0.6397090690421973, 0.8932899834495232, 0.140097477165253, 0.3077801740402405, 0.8228421949137854, 0.24261227862442758, 0.33492161664540565, 0.6478410003391296, 0.7146420199370555, 0.25101359662863076, 0.004755420725151782, 7.920551800111308e-05, 0.0002899865395430412, 3.786991647292126e-05, 8.506648594852617e-05]


In [3]:
import random
import time

towers, budget = setup()
time_start = time.time()

towers_to_pick = []
while sum(towers_to_pick) < budget and len(towers) > 0:
    if sum(towers_to_pick) + towers[0] <= budget:
        towers_to_pick.append(towers[0])
    del towers[0]

print(f'Investment: {sum(towers_to_pick)} \nExecution time: {time.time() - time_start} seconds \nTowers selected: {towers_to_pick}')

Investment: 4.999991440831966 
Execution time: 0.8942358493804932 seconds 
Towers selected: [0.01862817159471508, 0.09944490307912379, 0.7544640799811307, 0.36212089786555113, 0.364531412711818, 0.3965647356169755, 0.19303006143570667, 0.15199291399533355, 0.5717185193827808, 0.32195897362136794, 0.6602552027746296, 0.4195904713964931, 0.12090964670195792, 0.38927413099940433, 0.1618943545780912, 0.002873741581200684, 0.00015798914796205654, 0.006936029802124377, 0.0027240778532309218, 4.659804955842173e-05, 0.0006625852056239001, 4.31404530256696e-05, 0.0001651556786068653, 3.647325552580405e-06]


### With a <code>for</code> Loop

In [4]:
import random
import time

towers, budget = setup()
time_start = time.time()

towers_to_pick = []

for t in towers:
    if sum(towers_to_pick) + t <= budget:
        towers_to_pick.append(t)

print(f'Investment: {sum(towers_to_pick)} \nExecution time: {time.time() - time_start} seconds \nTowers selected: {towers_to_pick}')

Investment: 4.9999989412663535 
Execution time: 0.02350616455078125 seconds 
Towers selected: [0.5927846372307863, 0.49080806473652505, 0.48294085639318096, 0.6480193601322259, 0.5785812028229532, 0.30216308816048465, 0.20879061742920568, 0.9174081388705451, 0.7562100725766908, 0.011679830999509533, 0.002832929332382106, 0.00763220519923169, 2.949279888819767e-05, 0.00011844458374432598]


### <code>numpy</code> with Slices to Eliminate Used Elements

In [5]:
import numpy as np

towers, budget = setup()
towers = np.array(towers)
time_start = time.time()

towers_to_pick = np.array([])

while towers_to_pick.sum() < budget and towers.shape[0] > 0:
    if towers_to_pick.sum() + towers[0] <= budget:
        towers_to_pick = np.append(towers_to_pick, towers[0])
    towers = towers[1:]

print(f'Investment: {sum(towers_to_pick)} \nExecution time: {time.time() - time_start} seconds \nTowers selected: {towers_to_pick}')

Investment: 4.999989012490344 
Execution time: 0.43709707260131836 seconds 
Towers selected: [2.89758975e-01 8.52389109e-01 6.90416519e-01 6.90291974e-01
 9.73411079e-01 4.14303710e-01 1.87098549e-02 7.38652903e-01
 2.47555832e-01 1.41671591e-02 3.44325137e-02 2.00293814e-02
 1.58200725e-02 4.99310424e-05]


### Efficient <code>numpy</code> with Reserved Memory for Array

Notice also that the repeated call to the <code>numpy</code> array <code>sum()</code> function is replaced by subtracting each selected value from the remaining <code>budget</code>.

In [9]:
import numpy as np

towers, budget = setup()
towers = np.array(towers)
time_start = time.time()

''' Reserve space for solution of maximum possible size '''
towers_to_pick = np.zeros(towers.shape[0], dtype=np.float32)  # use zeros, not np.empty(): unused slots must not add to the sum

j = 0  # counter for number of elements packed and the index of the next element to be packed
for vol in towers:
    if vol <= budget:
        towers_to_pick[j] = vol
        budget -= vol
        j += 1

print(f'Investment: {sum(towers_to_pick)} \nExecution time: {time.time() - time_start} seconds \nTowers selected: {towers_to_pick[:j]}')

Investment: 4.999996521022695 
Execution time: 0.03643393516540527 seconds 
Towers selected: [5.8088803e-01 6.0085980e-03 5.0092101e-01 9.4081634e-01 9.7277844e-01
 2.5623366e-01 8.4856910e-01 2.0202537e-01 3.4030995e-01 3.1571701e-01
 1.7443914e-02 2.3598270e-03 1.1648559e-02 3.9108428e-03 2.8377521e-04
 8.2092978e-05]
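
The effect of tracking the remaining budget instead of re-summing can be seen in isolation with plain Python. The two helper functions below are hypothetical names written for this sketch. The first re-adds all selected values on every pass, so its work grows with the number of items already picked. The second does constant work per item.

```python
def pick_with_resum(costs, budget):
    picked = []
    for c in costs:
        if sum(picked) + c <= budget:   # re-adds every picked value on each pass
            picked.append(c)
    return picked

def pick_with_running_budget(costs, budget):
    picked = []
    remaining = budget
    for c in costs:
        if c <= remaining:              # one comparison, no re-summing
            picked.append(c)
            remaining -= c
    return picked

costs = [0.5, 0.25, 2.0, 0.125, 1.0]
print(pick_with_resum(costs, 1.0))
print(pick_with_running_budget(costs, 1.0))
```

The example values are exactly representable in binary, so both versions agree. With arbitrary floats, rounding could in principle make the two comparisons differ at the margin.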


## Avoid Loops with <code>numpy</code> and Elementwise Calculations (Vectorization)

As we have discussed, your code slows dramatically with each nested <code>for</code> loop you add.  You can avoid writing loops by using <code>numpy</code> vectorization.  Although the loops disappear from your Python code, <code>numpy</code> still loops over the elements behind the scenes; it simply does so in compiled code, which is much faster than an equivalent Python loop.

### The Traditional Python Approach to Array Addition with Loops

In [None]:
x = [[0,1,2],[3,4,5],[6,7,8]]
y = [[1,1,1],[1,1,1],[1,1,1]]
z = [[0,0,0],[0,0,0],[0,0,0]]

for i in range(len(x)):
    for j in range(len(x[0])):
        z[i][j] = x[i][j] + y[i][j]
print(z)

### Array Addition With <code>numpy</code>, Without Loops

In [None]:
x = np.array([[0,1,2],[3,4,5],[6,7,8]])
y = np.array([[1,1,1],[1,1,1],[1,1,1]])

z = x + y
print(z)
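
Elementwise operations also work when the operands have different shapes, as long as <code>numpy</code> can broadcast one to the other. A short sketch with illustrative values:

```python
import numpy as np

x = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

z_scalar = x + 1                     # the scalar is applied to every element
z_row = x + np.array([10, 20, 30])   # the row is applied to each row of x

print(z_scalar)
print(z_row)
```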

### A Bigger Addition Problem

In [17]:
import random
prob_size = 1000
x = [[random.randint(0,10) for _ in range(prob_size)] for _ in range(prob_size)]
y = [[random.randint(0,10) for _ in range(prob_size)] for _ in range(prob_size)]
z = [[0 for _ in range(prob_size)] for _ in range(prob_size)]

time_start = time.time()
for i in range(len(x)):
    for j in range(len(x[0])):
        z[i][j] = x[i][j] + y[i][j]
print(f'for loop execution time: {time.time() - time_start}')

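# randint's upper bound is exclusive, so 11 matches random.randint(0,10) above
x = np.random.randint(0, 11, size=(prob_size, prob_size))
y = np.random.randint(0, 11, size=(prob_size, prob_size))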

time_start = time.time()
z = x + y
print(f'numpy execution time: {time.time() - time_start}')

for loop execution time: 0.2594606876373291
numpy execution time: 0.0
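
A reading of exactly 0.0 can occur when the operation finishes faster than the resolution of <code>time.time()</code> on a given platform. The sketch below is one possible way to compare the two approaches, using <code>time.perf_counter()</code> and confirming that both produce the same result. The smaller problem size and the seeded generator are assumptions made to keep the example quick and repeatable.

```python
import time
import numpy as np

rng = np.random.default_rng(0)            # seeded generator, for repeatability
n = 200
x = rng.integers(0, 11, size=(n, n))      # integers from 0 to 10 inclusive
y = rng.integers(0, 11, size=(n, n))

t0 = time.perf_counter()
z_loop = np.zeros_like(x)
for i in range(n):
    for j in range(n):
        z_loop[i, j] = x[i, j] + y[i, j]
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
z_vec = x + y
t_vec = time.perf_counter() - t0

print(np.array_equal(z_loop, z_vec))
print(f'loop: {t_loop:.6f} s, vectorized: {t_vec:.6f} s')
```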


## Selecting Elements from <code>numpy</code> Arrays 

- <span style="font-family:'Courier New'">np.min()</span>
- <span style="font-family:'Courier New'">np.max()</span>
- <span style="font-family:'Courier New'">np.argmin()</span>
- <span style="font-family:'Courier New'">np.argmax()</span>

Algorithms frequently require that either the minimum or maximum elements be selected from an array/list or, in a more complex manner, the best element fitting particular criteria is sought.

One can find the least or greatest array elements using the <code>np.min()</code> or <code>np.max()</code> methods, respectively.

In [None]:
x = np.random.randint(0,10,(10,))

In [18]:
print(x)
print(x.min(), np.min(x))
print(x.max(), np.max(x))

[9 7 5 1 9 8 3 9 0 9]
0 0
9 9


One might also find the least and greatest elements using the <code>np.argmin()</code> or <code>np.argmax()</code> methods, respectively, although this requires a second statement to actually retrieve the element values.

In [19]:
idx_min = x.argmin()
idx_max = x.argmax()
print(x)
print(idx_min, x[idx_min])
print(idx_max, x[idx_max])

[9 7 5 1 9 8 3 9 0 9]
8 0
0 9
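
For a two-dimensional array, <code>argmin()</code> without an <code>axis</code> argument returns an index into the flattened array. A brief sketch, using an illustrative matrix, of converting that index to a row and column with <code>np.unravel_index()</code>, and of finding the minimum position within each row:

```python
import numpy as np

m = np.array([[7, 2, 9],
              [4, 8, 1]])

flat_idx = m.argmin()                            # index into the flattened array
row, col = np.unravel_index(flat_idx, m.shape)   # convert to (row, column)
print(flat_idx, row, col, m[row, col])

row_argmins = m.argmin(axis=1)                   # column of the minimum in each row
print(row_argmins)
```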


Despite needing a second statement to obtain a value, knowing the index of a minimum/maximum is quite useful when one must select multiple elements from an array and keep track of which elements have been selected so that they are not selected again.  This is the focus of a subsequent section in this Jupyter notebook.

The <code>np.argsort()</code> method is useful when the element you want is not the least or greatest overall, but rather the largest item below some upper limit (or the smallest item above some lower limit).

In [27]:
idx_sort = np.argsort(x)
print(f'x: {x}')
print(f'idx_sort: {idx_sort}')
print(f'x[idx_sort]: {x[idx_sort]}')

x: [9 7 5 1 9 8 3 9 0 9]
idx_sort: [8 3 6 2 1 5 0 4 7 9]
x[idx_sort]: [0 1 3 5 7 8 9 9 9 9]


In [25]:
# Find the largest element less than 5
i = -1
while i+1 < x.shape[0] and x[idx_sort[i+1]] < 5:  # check the bound before indexing
    i += 1
print(i, idx_sort[i], x[idx_sort[i]])

2 6 3
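
Once the values are sorted, the same position can also be located with a binary search instead of a linear scan. The following is a sketch using <code>np.searchsorted()</code> on the same data. The <code>limit</code> name is introduced here for illustration.

```python
import numpy as np

x = np.array([9, 7, 5, 1, 9, 8, 3, 9, 0, 9])
limit = 5

idx_sort = np.argsort(x)
sorted_x = x[idx_sort]

# Number of elements strictly less than limit (binary search on sorted values)
count_below = np.searchsorted(sorted_x, limit, side='left')

if count_below > 0:
    i = count_below - 1
    print(i, idx_sort[i], x[idx_sort[i]])
else:
    print('no element is less than the limit')
```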


Recall one method for sorting in descending order.

In [28]:
idx_sort = np.argsort(x)
idx_sort = np.flip(idx_sort)
print(f'x: {x}')
print(f'idx_sort: {idx_sort}')
print(f'x[idx_sort]: {x[idx_sort]}')

x: [9 7 5 1 9 8 3 9 0 9]
idx_sort: [9 7 4 0 5 1 2 6 3 8]
x[idx_sort]: [9 9 9 9 8 7 5 3 1 0]


This is another method, although it is perhaps less intuitive.

In [29]:
idx_sort = np.argsort(x)
idx_sort = idx_sort[::-1]
print(f'x: {x}')
print(f'idx_sort: {idx_sort}')
print(f'x[idx_sort]: {x[idx_sort]}')

x: [9 7 5 1 9 8 3 9 0 9]
idx_sort: [9 7 4 0 5 1 2 6 3 8]
x[idx_sort]: [9 9 9 9 8 7 5 3 1 0]
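
A third option, sketched here for signed numeric data only, is to sort the negated values. With <code>kind='stable'</code>, tied elements keep their original relative order, so the index order among the 9s differs from the flipped versions.

```python
import numpy as np

x = np.array([9, 7, 5, 1, 9, 8, 3, 9, 0, 9])

idx_desc = np.argsort(-x, kind='stable')   # negate so the largest values sort first
print(idx_desc)
print(x[idx_desc])
```

Negation does not behave this way for unsigned integer arrays, so the flip-based methods are the safer general choice.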


## Boolean Masks

A Boolean (<code>True</code>/<code>False</code>) array can be used to filter out values from a <code>numpy</code> array.  Array elements whose positions coincide with a <code>False</code> are filtered out.

### Example 1

In [None]:
size = 5
x = np.arange(size)
x

In [None]:
mask_x = np.array([True if i%2==1 else False for i in range(size)])
mask_x

In [None]:
print(x[mask_x])

### Example 2

In [None]:
y = np.arange(size**2).reshape(size,size)
y

In [None]:
mask_y = np.array([True if i%2==1 else False for i in range(size**2)]).reshape(size,size)
mask_y

In [None]:
print(y[mask_y])

### Example 3

In [None]:
z = np.random.random(size = (10,))
z

In [None]:
z >= 0.5

In [None]:
mask_z = (z >= 0.5)
mask_z

In [None]:
z[mask_z]
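
Masks can be combined with the elementwise operators <code>&</code> (and), <code>|</code> (or), and <code>~</code> (not). A small sketch with fixed illustrative values, which also shows how to recover the positions of the selected elements:

```python
import numpy as np

z = np.array([0.1, 0.55, 0.9, 0.4, 0.7, 0.95])

in_range = (z >= 0.5) & (z < 0.9)   # both conditions must hold
outside = ~in_range                 # logical NOT of the mask

print(z[in_range])                  # values that satisfy both conditions
print(np.flatnonzero(in_range))     # positions of the True entries
print(in_range.sum())               # number of True entries
```

The parentheses around each comparison are required, because <code>&</code> binds more tightly than <code>>=</code> and <code><</code>.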

## Application 2: Traveling Salesperson Problem

In this problem, the task is to maintain the original data in its original instantiation without deleting the data pertaining to the destinations already included in the Traveling Salesperson's route.

### Recall <code>np.argmin()</code>

Gets the index (argument) of the minimum value.

In [None]:
z = np.random.random(10)
print(z)

In [None]:
print(f'argmin(z) = {np.argmin(z)}; minimum value = {z[np.argmin(z)]}')
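
When <code>np.argmin()</code> is applied to a masked array, the index it returns is relative to the filtered values, not to the original array. The sketch below, with illustrative names and values, shows two ways to obtain the original index.

```python
import numpy as np

values = np.array([0.8, 0.1, 0.5, 0.3, 0.9])
available = np.array([True, False, True, True, False])

# Option 1: argmin over the filtered values, then map back to original positions
local_idx = np.argmin(values[available])
orig_idx = np.flatnonzero(available)[local_idx]

# Option 2: hide unavailable entries behind +inf and take argmin directly
orig_idx_alt = np.argmin(np.where(available, values, np.inf))

print(local_idx, orig_idx, orig_idx_alt, values[orig_idx])
```

Here the overall minimum (0.1) is unavailable, so both options select index 3, the smallest of the remaining values.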

### A TSP Greedy Algorithm

Randomly select Location 1 to start.

- Loop until all locations visited
  - From the current location, choose as the next location the closest one that has not yet been visited
  
![AlgoStep1](images/m1.jpg)
![AlgoStep2](images/m2.jpg)
![AlgoStep3](images/m3.jpg)
![AlgoStep4](images/m4.jpg)
![AlgoStep5](images/m5.jpg)

Route: 1-2-0-4-3-1

#### Set up the data

In [None]:
# create distance matrix
nloc = 10
dist = np.random.rand(nloc,nloc)
dist = np.triu(dist,k=0)
for i in range(1,nloc):
    for j in range(0, i):
        dist[i,j] = dist[j,i]
for i in range(nloc):
    dist[i,i]=0.0
dist
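
The same kind of symmetric matrix with a zero diagonal can be built without Python loops. This is a sketch of one alternative, assuming a seeded generator for repeatability. It keeps the strict upper triangle and adds its transpose.

```python
import numpy as np

nloc = 10
rng = np.random.default_rng(0)                    # seed chosen only for repeatability

upper = np.triu(rng.random((nloc, nloc)), k=1)    # strict upper triangle, zeros elsewhere
dist = upper + upper.T                            # mirror it into the lower triangle

print(np.array_equal(dist, dist.T), np.all(np.diag(dist) == 0.0))
```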

In [None]:
''' Set up parameters '''
nloc = dist.shape[0]                      # number of locations
assert dist.shape[0] == dist.shape[1]     # ensure square distance matrix

''' Initialize random starting point '''
start = np.random.randint(0, nloc)        # select random starting location (upper bound is exclusive)
sol = [start]                             # solution route in a list
cur_loc = start                           # use cur_loc to indicate current location index

''' Establish Boolean mask for the columns: True = column location not visited '''
col_mask = np.ones(nloc).astype(np.bool_) # creates array of True
col_mask[start] = False                   # cannot choose starting location as
                                          # next location

''' Create ndarray of column indices '''
col_indices = np.arange(nloc)             # create array of indices

''' Initial distance of solution '''
sol_dist = 0.0                            # initialize distance of solution

''' Execute algorithm '''
while col_mask.sum() > 0:              # continue if any True values in col_mask
    ''' Get index of next location '''
    next_loc_ind = np.argmin(dist[cur_loc][col_mask])  # get index of row minimum for
                                                       #  remaining locations
    next_loc_ind = col_indices[col_mask][next_loc_ind] # find index of minimum relative to original
                                                       #   indices (true index of location)
    
    ''' Update solution and mask '''
    sol.append(next_loc_ind)                   # append next location to solution
    col_mask[next_loc_ind] = False             # update mask for current location
    sol_dist += dist[cur_loc, next_loc_ind]    # update solution distance
    cur_loc = next_loc_ind                     # update current location

sol_dist += dist[cur_loc, start]   # add distance of the return leg
sol.append(start)       # append starting location for round trip
sol, sol_dist
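
To check the logic by hand, the same greedy idea can be wrapped in a function and run on a small matrix. The function name <code>greedy_tsp</code> and the 4-location matrix are assumptions made for this sketch. It uses <code>np.flatnonzero()</code> to get the original indices of unvisited locations, and it includes the return leg in the total distance.

```python
import numpy as np

def greedy_tsp(dist, start):
    """Nearest-neighbor tour; returns (route, total distance)."""
    nloc = dist.shape[0]
    unvisited = np.ones(nloc, dtype=bool)
    unvisited[start] = False
    route, total, cur = [start], 0.0, start
    while unvisited.any():
        candidates = np.flatnonzero(unvisited)      # original indices still available
        nxt = int(candidates[np.argmin(dist[cur, candidates])])
        total += dist[cur, nxt]
        unvisited[nxt] = False
        route.append(nxt)
        cur = nxt
    total += dist[cur, start]                       # return leg closes the tour
    route.append(start)
    return route, total

dist = np.array([[0., 2., 9., 10.],
                 [2., 0., 6., 4.],
                 [9., 6., 0., 3.],
                 [10., 4., 3., 0.]])
route, total = greedy_tsp(dist, start=0)
print(route, total)
```

Starting at location 0, the nearest unvisited location is 1 (distance 2), then 3 (distance 4), then 2 (distance 3), and the return to 0 adds 9, for a total of 18.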