# Parallelizing Your Own Python Algorithms

## Lab Solution

In [None]:
import coiled
from dask.distributed import Client

cluster = coiled.Cluster(name="training-cluster")
client = Client(cluster)
client

## Lab - Emergency Services Modeling

We'll work on a more complicated simulation-based model that evaluates emergency-vehicle response times under different zoning schemes for Cascadia City.

Part of the city is planned as a 16x16-block street grid. We'd like to compare a few models that divide this region into equal-sized zones, where each zone has its own emergency vehicle that must remain inside that zone.

The purpose of our lab is to use Dask to distribute the work, so we'll start with some functions that do most of the calculation work, and focus on running those in the Dask cluster using `Future`.

In [None]:
import numpy as np
from matplotlib import pyplot as plt

traffic = np.load('data/traffic.npy')

plt.imshow(traffic)
plt.colorbar()

This array represents transit time costs (in minutes, under congested conditions) to reach all of the intersections in this 16x16 block grid, based on data from other cities.

To find travel time between points for the whole grid -- or for a section -- we'll build an *adjacency matrix* and then use a shortest-path algorithm.
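
As a minimal sketch of that idea, assuming a toy road of three intersections rather than the lab's grid, the block below builds an adjacency matrix by hand and passes it to `scipy.sparse.csgraph.shortest_path`. The `entry_cost` values are made up for illustration. As in the lab, the weight of an edge is the cost of entering its destination.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Hypothetical cost (minutes) to enter each of three intersections on a straight road: 0 - 1 - 2
entry_cost = np.array([2.0, 5.0, 1.0])

# toy_adj[src, dst] holds the cost of driving from src to the neighboring dst.
# For dense input, a zero entry means "no direct edge".
toy_adj = np.zeros((3, 3))
toy_adj[0, 1] = entry_cost[1]
toy_adj[1, 0] = entry_cost[0]
toy_adj[1, 2] = entry_cost[2]
toy_adj[2, 1] = entry_cost[1]

# toy_times[src, dst] is the cheapest total travel time from src to dst
toy_times = shortest_path(toy_adj)
print(toy_times)
```

Because the cost depends on the destination, the result is not symmetric: going from intersection 0 to 2 and from 2 to 0 accumulate different entry costs.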

In [None]:
city_chunk_width = 4 # we'll work with square chunks, so N-S and E-W are both 4

def build_adj_matrix(costs):
    # costs is a square grid; costs[i, j] is the cost of entering intersection (i, j)
    adj_dim = costs.shape[0] ** 2
    adj_matrix = np.zeros((adj_dim, adj_dim)) # one row and one column per location; 0 means no direct edge

    def linear_loc_for_row_col(r, c):
        # column-major position of intersection (r, c) in the adjacency matrix
        return r + c*costs.shape[0]

    for i in range(costs.shape[0]):
        for j in range(costs.shape[1]):
            cost_to_ij = costs[i, j]
            dest_loc = linear_loc_for_row_col(i, j)
            # each neighbor of (i, j) gets an edge into (i, j), weighted by the cost of entering it
            if i > 0:
                adj_matrix[linear_loc_for_row_col(i-1, j), dest_loc] = cost_to_ij
            if i < costs.shape[0] - 1:
                adj_matrix[linear_loc_for_row_col(i+1, j), dest_loc] = cost_to_ij
            if j > 0:
                adj_matrix[linear_loc_for_row_col(i, j-1), dest_loc] = cost_to_ij
            if j < costs.shape[1] - 1:
                adj_matrix[linear_loc_for_row_col(i, j+1), dest_loc] = cost_to_ij
    return adj_matrix

demo_adj = build_adj_matrix(traffic[0:city_chunk_width, 0:city_chunk_width])
plt.imshow(demo_adj)

We can use a helper from `scipy` to find the shortest paths between all pairs of locations (expressed here as travel times).

In [None]:
from scipy.sparse.csgraph import shortest_path

In [None]:
total_travel_time_all = shortest_path(demo_adj)
plt.imshow(total_travel_time_all)
plt.colorbar()

Now suppose a fire breaks out at a random location in the zone, and the fire truck starts from another random location:

In [None]:
import random

def response_to_random_fire(travel_time_matrix, zone_rows, zone_cols):
    fire_x = random.randint(0, zone_cols-1)
    fire_y = random.randint(0, zone_rows-1)

    firetruck_x = random.randint(0, zone_cols-1)
    firetruck_y = random.randint(0, zone_rows-1)

    travel_from = firetruck_y + zone_rows*firetruck_x
    travel_to = fire_y + zone_rows*fire_x
    
    return travel_time_matrix[travel_from, travel_to]

response_sample = response_to_random_fire(total_travel_time_all, city_chunk_width, city_chunk_width)

print("Travel time", response_sample)
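
The lookup above relies on turning a (row, column) position into a single index, `row + zone_rows * col`, the same column-major convention used in `build_adj_matrix`. A small sketch of that arithmetic follows; `linear_index` is a hypothetical helper name, and the positions are arbitrary examples.

```python
import numpy as np

def linear_index(row, col, n_rows):
    """Column-major position of grid cell (row, col), matching build_adj_matrix."""
    return row + col * n_rows

n_rows = n_cols = 4
truck = linear_index(row=1, col=2, n_rows=n_rows)
fire = linear_index(row=3, col=0, n_rows=n_rows)

# NumPy's Fortran-order (column-major) flattening uses the same formula.
assert truck == np.ravel_multi_index((1, 2), (n_rows, n_cols), order="F")

print(truck, fire)
```

With these indices, the response time would be read as `travel_time_matrix[truck, fire]`: the row is the origin and the column is the destination.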

We'd like to measure response time under various scenarios, including ones where more trucks are available.

#### Activity 1: Travel time matrices for all zones

Divide the full traffic map (matrix) into 16 subsections similar to the one above, and generate travel time matrices for all of them using Dask.

Note: in some scenarios we might use Dask array, but for today's exercise, let's use regular NumPy and focus on parallelizing our work with `Future`.

Hint: To divide the matrix into subsections, adapt this sample code:

In [None]:
example = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12], [13,14,15,16]])
example

In [None]:
arrays = []

for outer in map(lambda m : np.vsplit(m, 2), np.hsplit(example, 2)):
    for inner in outer:
        arrays.append(inner)
    
arrays
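
One way to adapt the sample, sketched here under the assumption that the grid is square and its side is a multiple of the zone width, is to wrap the split in a helper. `split_into_zones` is a hypothetical name, not part of the lab code. Note that `np.hsplit` and `np.vsplit` take the number of sections, not the section width, so the helper derives one from the other.

```python
import numpy as np

def split_into_zones(grid, chunk_width):
    """Split a square grid into square zones that are chunk_width cells on a side."""
    n_splits = grid.shape[0] // chunk_width  # number of zones along each axis
    zones = []
    for column_strip in np.hsplit(grid, n_splits):
        zones.extend(np.vsplit(column_strip, n_splits))
    return zones

demo_grid = np.arange(1, 17).reshape(4, 4)
demo_zones = split_into_zones(demo_grid, chunk_width=2)
print(demo_zones[0])  # top-left block
print(demo_zones[1])  # bottom-left block: zones run down each column strip first
```

The zones come out strip by strip, top to bottom within each strip, which is worth keeping in mind when matching a zone index back to a position on the map.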

In [None]:
zones = []
n_splits = traffic.shape[0] // city_chunk_width # hsplit/vsplit take the number of sections, not their width

for outer in map(lambda m : np.vsplit(m, n_splits), np.hsplit(traffic, n_splits)):
    for inner in outer:
        zones.append(inner)

In [None]:
zones

In [None]:
adj_data = client.map(build_adj_matrix, zones)

In [None]:
travel_times_futures = client.map(shortest_path, adj_data) 

In [None]:
travel_times = client.gather(travel_times_futures)

In [None]:
plt.imshow(travel_times[0])
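
The solution above passes the futures returned by one `client.map` call straight into the next, and only calls `gather` at the end. The same pattern is sketched below with two trivial functions. It assumes an in-process local cluster (`Client(processes=False)`) as a stand-in for the Coiled cluster, and `double` and `increment` are placeholder functions.

```python
from dask.distributed import Client

def double(x):
    return 2 * x

def increment(x):
    return x + 1

# Assumption: an in-process local cluster stands in for the Coiled cluster.
local_client = Client(processes=False)

doubled = local_client.map(double, [1, 2, 3])       # list of Futures, returned immediately
incremented = local_client.map(increment, doubled)  # Futures are passed directly as inputs
results = local_client.gather(incremented)          # blocks until all results are ready

print(results)
local_client.close()
```

Chaining futures this way keeps the intermediate results on the workers, so they are not copied back to the notebook between steps.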

#### Activity 2: Emergency response times for all zones

Simulate emergency response times for each zone, using Dask.

In [None]:
zone_count = len(zones)

sample = client.map(response_to_random_fire, travel_times_futures, 
                    [city_chunk_width]*zone_count, [city_chunk_width]*zone_count, pure=False)

In [None]:
client.gather(sample)
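
The `pure=False` flag matters once the same call is repeated with identical arguments, as in the next activity. By default Dask assumes functions are pure and derives each task's key from the function and its inputs, so repeated identical calls share one result. The sketch below illustrates this with a placeholder function `draw`, again assuming an in-process local cluster rather than the Coiled one.

```python
import random
from dask.distributed import Client

def draw(_):
    """Ignore the input and return a random number, like one simulation step."""
    return random.random()

# Assumption: an in-process local cluster stands in for the Coiled cluster.
local_client = Client(processes=False)

# Default pure=True: identical function and arguments give identical task keys,
# so the three calls collapse into a single task with one shared result.
pure_futures = local_client.map(draw, [0, 0, 0])
pure_keys = {f.key for f in pure_futures}
pure_values = local_client.gather(pure_futures)

# pure=False: every call gets its own key and runs independently.
impure_futures = local_client.map(draw, [0, 0, 0], pure=False)
impure_keys = {f.key for f in impure_futures}
impure_values = local_client.gather(impure_futures)

print(len(pure_keys), len(impure_keys))
local_client.close()
```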

#### Activity 3: Collect and plot samples for all zones

Gather 100 samples for each zone, combine the results, and plot a histogram.

In [None]:
all_sample_futures = []
for i in range(100):
    # pure=False: each round must run as new tasks instead of reusing the first round's results
    all_sample_futures.extend(
        client.map(response_to_random_fire, travel_times_futures, 
                   [city_chunk_width]*zone_count, [city_chunk_width]*zone_count, pure=False))
    
plt.hist(client.gather(all_sample_futures), bins=20)
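
The loop above creates 1,600 very small tasks. One alternative, sketched here on the assumption that per-task overhead is worth reducing, is to draw many samples inside a single call so that each zone needs only one task. `sample_responses` is a hypothetical helper, and `toy_times` holds made-up values for a 2x2 zone.

```python
import random
import numpy as np

def sample_responses(travel_time_matrix, zone_rows, zone_cols, n_samples):
    """Draw n_samples random truck-to-fire travel times for one zone."""
    samples = []
    for _ in range(n_samples):
        truck = random.randint(0, zone_rows - 1) + zone_rows * random.randint(0, zone_cols - 1)
        fire = random.randint(0, zone_rows - 1) + zone_rows * random.randint(0, zone_cols - 1)
        samples.append(travel_time_matrix[truck, fire])
    return samples

# Stand-in travel-time matrix for a 2x2 zone (made-up values, for illustration only).
toy_times = np.array([[0.0, 3.0, 4.0, 6.0],
                      [2.0, 0.0, 5.0, 4.0],
                      [3.0, 6.0, 0.0, 2.0],
                      [5.0, 4.0, 1.0, 0.0]])

batch = sample_responses(toy_times, zone_rows=2, zone_cols=2, n_samples=100)
print(len(batch), min(batch), max(batch))
```

On the cluster, this could be submitted with `client.map(sample_responses, travel_times_futures, [city_chunk_width]*zone_count, [city_chunk_width]*zone_count, [100]*zone_count, pure=False)`, and the 16 gathered lists flattened with `np.concatenate` before plotting.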

#### Activity 4: Compare zone schemes

*Bonus*

Simulate:
* the single-zone model with 16 firetrucks uniformly distributed
  * this means 1 zone and `city_chunk_width` of 16
  * 16 random firetruck locations, so 16 travel times (choose shortest or mean)
* 4-zone model (each zone `city_chunk_width` of 8)

Compare the response time distributions to the 16-zone model we've done so far.
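
For the single-zone model, one possible reading of "choose shortest" is sketched below: place several trucks at random and take the travel time of whichever reaches the fire first. `response_with_trucks` is a hypothetical helper, not part of the lab code, and `toy_times` is a made-up matrix used only to exercise it.

```python
import random
import numpy as np

def response_with_trucks(travel_time_matrix, zone_rows, zone_cols, n_trucks):
    """Travel time of the quickest of n_trucks randomly placed trucks to one random fire."""
    def random_location():
        # same column-major indexing as response_to_random_fire
        return random.randint(0, zone_rows - 1) + zone_rows * random.randint(0, zone_cols - 1)

    fire = random_location()
    truck_locations = [random_location() for _ in range(n_trucks)]
    return min(travel_time_matrix[truck, fire] for truck in truck_locations)

# Made-up 2x2 zone where every trip between distinct cells takes 3 minutes.
toy_times = np.full((4, 4), 3.0)
np.fill_diagonal(toy_times, 0.0)

one_truck = response_with_trucks(toy_times, 2, 2, n_trucks=1)
many_trucks = response_with_trucks(toy_times, 2, 2, n_trucks=16)
print(one_truck, many_trucks)
```

For the real comparison, the travel-time matrix would come from running `build_adj_matrix` and `shortest_path` on the full 16x16 map, with `zone_rows` and `zone_cols` both set to 16.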