## Model 1
This notebook runs the optimization for Model 1: maximizing marginal access by placing new grocery stores.

### Inputs:
- relevant_buildings.shp

### Pre-optimization setup:
1. Use `helper_population_allocation.py` to allocate a population count to each residential building.
2. Use `helper_distance_calculation.py` to calculate the distance, and from it the access, between each residential building and each commercial building.
3. Use `helper_distance_calculation.py` to calculate existing access to grocery stores for each residential building (within 0.5 miles).
4. Once steps 1-3 are done, all parameters are ready.

### Model 1 methodology and outputs
1. The analytical formulation is provided below in the 'optimization' section of the code, with detailed comments
2. Output: where to put additional stores to maximize access
3. Output: same as above, but assuming no existing grocery stores (start from scratch thought experiment)


In [2]:
# Import libraries
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import haversine as hs
import gurobipy as gp
from gurobipy import GRB

# Helper modules
import helper_population_allocation as pa   # User defined file to calculate population at each residential building
import helper_distance_calculation as dc    # User defined file to calculate Haversine distance between any 2 buildings

# Avoid printing set copy warnings
import warnings
warnings.filterwarnings("ignore")

### PRE-OPTIMIZATION SETUP

In [3]:
# Get the main buildings dataset 
buildings_df = gpd.read_file('../processed_data/relevant_buildings.shp')

# Create ID variable (each row represents a unique building)
buildings_df.reset_index(drop=True, inplace=True)
# building_id has the form '<row number>-<CLASS>', e.g. '31851-C'
buildings_df['building_id'] = (buildings_df.index + 1).astype(str) + '-' + buildings_df['CLASS'].astype(str)

In [4]:
# Create arrays to track ordering (residential)
res_buildings = buildings_df[buildings_df['class_reco'].str.contains('Residential')]
res_buildings = res_buildings.sort_values('building_id')
res_buildings = dc.get_geocoordinate(res_buildings, 'geometry')

res_buildings_array = np.array(res_buildings['building_id'])    # ith element represents the building id of ith residential building
res_buildings_coordinates_array = np.array(res_buildings['coordinates'])    # ith element represents the coordinates of the ith residential building

In [5]:
# Create arrays to track ordering (Commercial)
comm_buildings = buildings_df[buildings_df['class_reco'].str.contains('commercial')]
comm_buildings = comm_buildings.sort_values('building_id')
comm_buildings = dc.get_geocoordinate(comm_buildings, 'geometry')

comm_buildings_array = np.array(comm_buildings['building_id'])  # ith element represents the building id of ith commercial building
comm_buildings_coordinates_array = np.array(comm_buildings['coordinates'])  # ith element represents the coordinates of the ith commercial building


In [6]:
# Create arrays to track ordering (grocery stores)
grocery_stores = buildings_df[buildings_df['class_reco'].str.contains('Grocery')]
grocery_stores = grocery_stores.sort_values('building_id')
grocery_stores = dc.get_geocoordinate(grocery_stores, 'geometry')

grocery_stores_array = np.array(grocery_stores['building_id'])  # ith element represents the building id of ith grocery store
grocery_stores_coordinates_array = np.array(grocery_stores['coordinates'])  # ith element represents the coordinates of the ith grocery store


In [7]:
### Calculate pairwise distances ###

# DON'T RUN THIS AGAIN
# We have already run this and stored the distance matrix in processed_data
# This code block takes about 66 minutes (possibly more, depending on the CPU)

# Create parameter matrices (Res comm access matrix - Bij)
# [i,j] value indicates whether residential building i is within access distance of commercial building j
# res_comm_distance_matrix, res_comm_access_matrix = dc.calculate_access(res_buildings_coordinates_array, comm_buildings_coordinates_array)

# # Save file
# np.save('../processed_data/res_comm_distance_matrix', res_comm_distance_matrix)
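
For reference, a pairwise Haversine distance matrix can be computed with NumPy broadcasting. This is only a sketch: the `dc.calculate_access` helper itself is not shown in this notebook, so the function name `pairwise_haversine_miles`, the (lat, lon) coordinate order, the Earth radius constant, and the toy coordinates are all assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed constant)

def pairwise_haversine_miles(coords_a, coords_b):
    """Return a (len(a), len(b)) matrix of great-circle distances in miles.

    coords_a, coords_b: sequences of (lat, lon) pairs in decimal degrees
    (the coordinate order is an assumption).
    """
    a = np.radians(np.asarray(coords_a, dtype=float))
    b = np.radians(np.asarray(coords_b, dtype=float))
    lat_a, lon_a = a[:, 0][:, None], a[:, 1][:, None]   # column vectors
    lat_b, lon_b = b[:, 0][None, :], b[:, 1][None, :]   # row vectors
    dlat = lat_b - lat_a
    dlon = lon_b - lon_a
    h = np.sin(dlat / 2) ** 2 + np.cos(lat_a) * np.cos(lat_b) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(h))

# Toy example (made-up coordinates): 2 residential points, 3 commercial points
res = [(40.4406, -79.9959), (40.4500, -79.9500)]
comm = [(40.4406, -79.9959), (40.4450, -79.9900), (40.5000, -80.1000)]
dist = pairwise_haversine_miles(res, comm)
print(dist.shape)
```

Broadcasting builds the full matrix in memory at once, so for a very large number of buildings the rows would likely need to be processed in chunks.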


In [8]:
# Load the files and use it (created by running above code block)
# [i,j] corresponds to distance between ith residential building and jth commercial building
res_comm_distance_matrix = np.load('../processed_data/res_comm_distance_matrix.npy')  

# Create the res comm access matrix with the 0.5 mile definition
# [i,j] entry indicates whether residential building i and commercial building j are within 0.5 miles of each other
# Compare against the threshold directly, so a distance of exactly 1.0 mile is not mistaken for the access flag 1
res_comm_access_matrix_half_mile = (res_comm_distance_matrix <= 0.5).astype(res_comm_distance_matrix.dtype)

In [9]:
# Create parameter matrices (Res groc access array - Ai)
# ith value indicates whether the ith residential building has existing access (within 0.5 miles)
res_groc_distance_matrix, res_groc_access_matrix = dc.calculate_access(res_buildings_coordinates_array, grocery_stores_coordinates_array)
# Compare against the threshold directly, so a distance of exactly 1.0 mile is not mistaken for the access flag 1
res_groc_access_matrix_half_mile = (res_groc_distance_matrix <= 0.5).astype(res_groc_distance_matrix.dtype)

# 1 if the building is within 0.5 miles of at least one grocery store, 0 otherwise
res_access_array_half_mile = np.amax(res_groc_access_matrix_half_mile, axis=1)
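
The step from a distance matrix to the access parameters can be checked on a toy case. The sketch below uses made-up distances and illustrative names (`toy_distance`, `ACCESS_RADIUS_MILES`). Comparing against the radius directly produces the 0/1 flags in one step, and a distance of exactly 1.0 mile stays 0.

```python
import numpy as np

ACCESS_RADIUS_MILES = 0.5

# Made-up distances in miles: 3 residential buildings x 2 grocery stores
toy_distance = np.array([
    [0.3, 1.0],
    [0.7, 0.5],
    [2.0, 1.2],
])

# Matrix of flags: 1 where the pair is within the access radius, else 0
toy_access_matrix = (toy_distance <= ACCESS_RADIUS_MILES).astype(int)

# Per-building flag: 1 if the building is within range of at least one store
toy_access_array = toy_access_matrix.max(axis=1)

print(toy_access_matrix)
print(toy_access_array)   # buildings 0 and 1 have access, building 2 does not
```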

In [10]:
# Create parameter matrices (Res Population - Pi)
# ith value indicates the population of the ith residential building
res_population = pa.get_population(buildings_df) 
res_population = res_population.drop_duplicates('building_id') # drop duplicates

res_population = res_population.sort_values('building_id') # Just to be safe
res_population_array = np.array(res_population['population'])
res_population_array.shape


(109324,)

### OPTIMIZATION


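Indices:
- i = index over the set of all residential buildings
- j = index over the set of all commercial buildings

Decision variable:
- Cj = 1 if a new grocery store is placed in commercial building j, 0 otherwise

Parameters:
- Ai = 1 if residential building i already has access to a grocery store (within 0.5 miles), 0 otherwise
- Bij = 1 if commercial building j is within 0.5 miles of residential building i, 0 otherwise
- Pi = Population at residential building i

Objective function:

$$ \max_{C} \sum_j \sum_i (1-A_i)\, P_i\, B_{ij}\, C_j $$

The greedy procedure below re-evaluates Ai after each placement, so residents reached by an earlier new store are not counted again for later stores.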
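
As a worked illustration of the objective, the inner sum over i gives each candidate site j a score, which is a single vector-matrix product. All values below are made up, and the names `site_scores`, `A`, `P`, `B`, and `C` are only for this sketch.

```python
import numpy as np

A = np.array([1, 0, 0, 0])                 # existing access flag per residential building
P = np.array([100.0, 40.0, 60.0, 10.0])    # population per residential building
B = np.array([                             # B[i, j] = 1 if building i is within range of site j
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
])

# Score of each site j: population currently without access that site j would reach
site_scores = ((1 - A) * P) @ B
print(site_scores)          # values 40, 100, 70

# Objective value for one particular choice C (here: open only site 1)
C = np.array([0, 1, 0])
print(site_scores @ C)      # value 100
```

Building 0 contributes nothing to any score because it already has access, which is the role of the (1 - Ai) factor.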






In [11]:
# Function that can put n stores in a greedy way
def place_n_stores(num_stores, access_matrix, access_array):

    store_indices = []  # Store the indices of selected commercial buildings where new grocery stores would be put
    store_ids = []  # Store the building ids of chosen commercial buildings
    marginal_access_gain = []   # Store marginal access by placing each new store

    res_access_array_copy = access_array.copy()
    for n in range(num_stores):

        if np.sum(res_access_array_copy) == len(res_access_array_copy): # Every building now has access, so stop placing stores
            break
        else:
            # STEP 1
            existing_access_indices = res_access_array_copy.nonzero()[0] # These are indices of residential buildings that currently have access
            res_comm_access_matrix_subset = np.delete(access_matrix, existing_access_indices, axis=0 )  # Remove those buildings from consideration in the res_comm_access_matrix

            # STEP 2
            res_population_array_sub = np.delete(res_population_array, existing_access_indices, axis=0) # Remove the same buildings as above from res population array

            # STEP 3
            res_population_array_sub = np.reshape(res_population_array_sub, (-1, len(res_population_array_sub)))
            new_access_array = np.matmul(res_population_array_sub, res_comm_access_matrix_subset)   # The result contains the marginal population access for each commercial building

            chosen_comm_index = np.argmax(new_access_array) # Choose the building that provides maximum marginal access
            chosen_comm_building = comm_buildings_array[chosen_comm_index]
            new_access_created = np.max(new_access_array)

            # STEP 4: Update results and arrays to indicate that a store has been put in this loop
            
            # Which residential buildings does this new store give access to
            new_buildings_with_access = access_matrix[:,chosen_comm_index].nonzero()[0]    # Indices in the access array of the buildings that now have access

            # Mark these buildings as having access
            res_access_array_copy[new_buildings_with_access] = 1

            # STEP 5: Store results
            store_indices.append(chosen_comm_index)
            store_ids.append(chosen_comm_building)
            marginal_access_gain.append(new_access_created)



    return store_indices, store_ids, marginal_access_gain
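
To see the greedy loop on a small case, the sketch below reimplements the same idea on made-up data, using a boolean mask instead of `np.delete`. The function name `greedy_place`, the explicit `population` argument, and the stop on zero gain are illustrative additions, not part of the notebook's pipeline.

```python
import numpy as np

def greedy_place(num_stores, access_matrix, access_array, population):
    """Greedy placement: repeatedly pick the site reaching the most unserved population."""
    has_access = access_array.astype(bool).copy()
    chosen, gains = [], []
    for _ in range(num_stores):
        if has_access.all():
            break  # everyone is covered, stop early
        unserved_pop = np.where(has_access, 0.0, population)  # zero out served buildings
        site_gain = unserved_pop @ access_matrix              # marginal gain per site
        best = int(np.argmax(site_gain))
        if site_gain[best] == 0:
            break  # assumption: stop if no site reaches any unserved building
        chosen.append(best)
        gains.append(float(site_gain[best]))
        has_access |= access_matrix[:, best].astype(bool)     # newly covered buildings
    return chosen, gains

# Toy data (made up): 4 residential buildings, 3 candidate sites
B = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
P = np.array([100.0, 40.0, 60.0, 10.0])
A = np.array([1, 0, 0, 0])

chosen, gains = greedy_place(3, B, A, P)
print(chosen, gains)   # sites 1 then 2, with gains 100 and 10
```

Asking for three stores returns only two here because coverage is complete after the second pick. Calling `greedy_place(1, B, A, P)` returns the first of those two picks, which is the prefix property the large run relies on.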

### Results/Analysis from Optimization

In [12]:
# IMPLEMENTING THE MODEL

# Choose a large num_stores, so that the model can keep placing stores until maximum possible access is achieved
# Since the placement is greedy, the first store placed is the same store it would pick if n=1
# Similarly, the top 3 are the same as running this function with n=3

# Hence, we do the large run, to see all new locations it would pick, given the current placement of existing grocery stores

# Run model
store_indices, store_ids, marginal_access_gain = place_n_stores(50, access_array=res_access_array_half_mile,
                                                                     access_matrix=res_comm_access_matrix_half_mile)

print(f"Num additional stores needed to fill complete access = {len(store_indices)}")

# Print results
results_df = pd.DataFrame(list(zip(store_ids, marginal_access_gain)),
                                columns = ['building_id', 'marginal_access_gain'])

print('\nNew store locations chosen by model, in order:')

# Write results to csv for use in later files
results_df.to_csv('../processed_data/new_store_ids.csv')

results_df




Num additional stores needed to fill complete access = 21

New store locations chosen by model, in order:


|  | building_id | marginal_access_gain |
|---|---|---|
| 0 | 31851-C | 8040.7 |
| 1 | 105592-R | 6919.09 |
| 2 | 3018-R | 5697.39 |
| 3 | 19415-C | 5319.45 |
| 4 | 110387-C | 4887.92 |
| 5 | 14692-C | 4366.1 |
| 6 | 65810-C | 4301.95 |
| 7 | 109-C | 1997.31 |
| 8 | 3898-C | 1902.04 |
| 9 | 2877-C | 1284.82 |
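
Because the results are written to CSV for later files, it helps to know that `DataFrame.to_csv` writes the index as an unnamed first column by default, which `pd.read_csv` then loads as `Unnamed: 0`. The sketch below shows two ways around that, using an in-memory buffer and the first two rows of the output above. The variable names are illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'building_id': ['31851-C', '105592-R'],
    'marginal_access_gain': [8040.7, 6919.09],
})

# Default: the index is written, and read back as a column named 'Unnamed: 0'
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
default_cols = list(pd.read_csv(buf).columns)

# Option 1: do not write the index at all
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
no_index_cols = list(pd.read_csv(buf).columns)

# Option 2: keep the file as is, and treat the first column as the index when reading
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

print(default_cols)
print(no_index_cols)
print(list(restored.columns))
```

Option 2 keeps the placement order in the file, which may matter if a later step relies on the order in which stores were chosen.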


In [13]:
# THOUGHT EXPERIMENT: IF THERE WERE NO EXISTING STORES, AND WE START FROM SCRATCH

res_access_array_zero = np.zeros(res_access_array_half_mile.shape)


store_indices, store_ids, marginal_access_gain = place_n_stores(100, access_array=res_access_array_zero,
                                                                     access_matrix=res_comm_access_matrix_half_mile)

print(f"Num stores needed to fill complete access, if we start from scratch = {len(store_indices)}")

results_df = pd.DataFrame(list(zip(store_ids, marginal_access_gain)),
                                columns = ['building_id', 'marginal_access_gain'])

print('\nNew store locations chosen by model, in order:')

# Write results to csv for use in later files
results_df.to_csv('../processed_data/new_store_ids_assuming_no_existing_access.csv')

results_df

Num stores needed to fill complete access, if we start from scratch = 48

New store locations chosen by model, in order:


|  | building_id | marginal_access_gain |
|---|---|---|
| 0 | 112491-C | 42913.48 |
| 1 | 103358-C | 26982.53 |
| 2 | 60523-C | 24095.1 |
| 3 | 101372-C | 22349.77 |
| 4 | 36990-C | 19643.89 |
| 5 | 113664-C | 19577.45 |
| 6 | 80815-C | 18112.11 |
| 7 | 52332-C | 15308.27 |
| 8 | 76745-C | 15249.77 |
| 9 | 15668-C | 13861.77 |
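
One quick sanity check on greedy output is that marginal gains should not increase from one pick to the next, since each new store can only add people who are not already covered. The sketch below applies that check, plus a running total, to the ten rows displayed above. It says nothing about rows that are not displayed.

```python
import numpy as np

# marginal_access_gain values copied from the ten rows displayed above
gains = np.array([42913.48, 26982.53, 24095.1, 22349.77, 19643.89,
                  19577.45, 18112.11, 15308.27, 15249.77, 13861.77])

# Greedy coverage gains are expected to be non-increasing
is_non_increasing = bool(np.all(np.diff(gains) <= 0))

# Running total of population reached after each store
cumulative = np.cumsum(gains)

print(is_non_increasing)
print(cumulative.round(2))
```

The running total after the tenth store comes to about 218,094 people for these rows.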
