This notebook solves the optimization problem of choosing where to place a new grocery store so that as many residents as possible gain access.

How this works (implementation):

1. Use helper_population_allocation.py to allocate a population count to each residential building (not ready yet; it currently contains a placeholder function).
2. Use helper_distance_calculation.py to calculate existing grocery access and the distance between each residential and commercial building.
3. Once steps 1 and 2 are done, all model parameters are ready.
4. This notebook then does some pre-optimization setup and runs an optimization model in Gurobi. It's pretty anticlimactic.
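
Step 2 hinges on a distance threshold. As a rough illustration only (the real logic lives in helper_distance_calculation.py, which is not shown here), the sketch below computes a great-circle distance with the standard haversine formula and turns it into a 0/1 access flag at 1 mile. The names `haversine_miles` and `has_access` are hypothetical, and the sketch assumes coordinate tuples are stored as (longitude, latitude), which is how they appear in the output table near the end of this notebook.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def has_access(res_lonlat, comm_lonlat, threshold_miles=1.0):
    """Return 1 if the two buildings are within threshold_miles, else 0.

    Assumption: both inputs are (longitude, latitude) tuples.
    """
    res_lon, res_lat = res_lonlat
    comm_lon, comm_lat = comm_lonlat
    distance = haversine_miles(res_lat, res_lon, comm_lat, comm_lon)
    return int(distance <= threshold_miles)


# Example call with one residential and one commercial coordinate pair
flag = has_access((-79.8942185836958, 40.453568268060074),
                  (-79.8932043626825, 40.44669046543195))
```

The haversine formula takes latitude first, so the tuples are unpacked explicitly. Passing (longitude, latitude) straight through would not raise an error but would give wrong distances, so the argument order is worth checking in the helper.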

Next steps:

- Update the population allocation helper file
- Figure out the correct ordering of the arrays. It is not accurate currently.


In [116]:
# Import libraries
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import haversine as hs
import gurobipy as gp
from gurobipy import GRB

# Helper modules
import helper_population_allocation as pa
import helper_distance_calculation as dc

# Avoid printing set copy warnings
import warnings
warnings.filterwarnings("ignore")



PRE-OPTIMIZATION SETUP

In [117]:
# Get the main buildings dataset 
buildings_df = gpd.read_file('../processed_data/relevant_buildings.shp')

# Create ID variable
buildings_df.reset_index(drop=True, inplace=True)
buildings_df['building_id'] = buildings_df.index + 1
# Build IDs of the form '<row number>-<CLASS>' with vectorized string operations
buildings_df['building_id'] = buildings_df['building_id'].astype(str) + '-' + buildings_df['CLASS'].astype(str)

buildings_df = buildings_df.sample(n=2000, random_state=1)  # Remove later


# Population parameter (Pj)
res_population = pa.get_population(geopandas_dataframe=buildings_df) 

# Existing access parameter (Aj)
res_groc_access =  dc.calculate_access(
                            geopandas_dataframe=buildings_df,
                            building_type_1='Residential',
                            building_type_2='Grocery',
                            identifier_column='class_reco', 
                            geo_column='geometry', 
                            output_format='dataframe'
)



modified script


In [118]:
# Residential- commercial access parameter (Bij)
# Run this once, save dataset (will take forever)
res_comm_access = dc.calculate_access(
                            geopandas_dataframe=buildings_df,
                            building_type_1='Residential',
                            building_type_2='commercial',
                            identifier_column='class_reco', 
                            geo_column='geometry', 
                            output_format='dataframe'
)


modified script


In [119]:
# Sort data, create arrays for gurobi optimization so that ordering is maintained

# Res population
res_population = res_population.sort_values('building_id')
res_population_array = np.array(res_population[res_population['class_reco'].str.contains('Residential')]['population']) # ith entry corresponds to population at ith residential building 

# Res access
res_groc_access = res_groc_access[['building_idResidential', 'access']].groupby('building_idResidential').max().sort_index()
res_access_array = np.array(res_groc_access['access']) # ith entry corresponds to existing access at ith residential building 

# Res comm access (we will not create a matrix because it messes up ordering. We will directly use the values from the dataframe, but will sort it)
res_comm_access = res_comm_access.sort_values(by=['building_idResidential', 'building_idcommercial'])

# Create a sorted list of all residential building ids (same filter and order as res_population_array)
res_buildings = np.array(res_population[res_population['class_reco'].str.contains('Residential')]['building_id'])

# Create a sorted list of all commercial building ids
# The dataframe contains one row per residential-commercial building pair, so the sorted unique
# commercial ids follow the same order as the commercial ids listed under each residential building
comm_buildings = np.sort(res_comm_access['building_idcommercial'].unique())
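
Sorting each table separately only keeps the arrays aligned if every table contains exactly the same residential ids. A possible safeguard, sketched here on made-up data, is to reindex each series against one explicit id list and look for gaps. The toy frames and the names `pop_vec`, `acc_vec`, and `missing` are hypothetical; only the column names mirror this notebook.

```python
import pandas as pd

# Toy stand-ins for res_population and res_groc_access
population = pd.DataFrame({
    "building_id": ["2-R", "10-R", "5-R"],
    "population": [4, 7, 3],
})
access = pd.DataFrame({
    "building_idResidential": ["10-R", "2-R"],
    "access": [1, 0],
})

# One reference ordering (string sort, so '10-R' comes before '2-R')
res_ids = sorted(population["building_id"])

# Align both series to the reference ordering
pop_vec = population.set_index("building_id")["population"].reindex(res_ids)
acc_vec = access.groupby("building_idResidential")["access"].max().reindex(res_ids)

# Ids with no access record show up as NaN instead of silently shifting the array
missing = acc_vec[acc_vec.isna()].index.tolist()
```

With this pattern, a residential building that is absent from one table appears as a missing value at a known position, rather than shortening an array and shifting every later entry.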



In [120]:
# Create a matrix because optimization takes too long otherwise. 
# Should brainstorm how to make this matrix creation faster. I used pandas.pivot and that works but messes up the ordering

res_comm_access_matrix = np.zeros((len(res_buildings), len(comm_buildings)))

for i, building_id in enumerate(res_buildings):
    # Rows of res_comm_access are sorted by (residential, commercial),
    # so each slice is already in comm_buildings order
    insert_array = res_comm_access.loc[res_comm_access['building_idResidential'] == building_id, 'access'].to_numpy()

    res_comm_access_matrix[i] = insert_array

res_comm_access_matrix




array([[0., 1., 1., ..., 0., 1., 1.],
       [0., 1., 1., ..., 0., 1., 1.],
       [0., 1., 1., ..., 0., 1., 1.],
       ...,
       [0., 1., 1., ..., 0., 1., 1.],
       [0., 1., 1., ..., 0., 1., 1.],
       [0., 1., 1., ..., 0., 1., 1.]])
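
The row-by-row loop filters the full dataframe once per residential building, which is slow. One possible alternative, sketched on toy data, is to pivot the long table and then reindex rows and columns against explicit id lists so the ordering is stated rather than assumed. The toy frame `pairs` and the names `res_ids`, `comm_ids`, and `matrix` are hypothetical; the column names match this notebook.

```python
import pandas as pd

# Toy long-format table: one row per (residential, commercial) pair
pairs = pd.DataFrame({
    "building_idResidential": ["2-R", "10-R", "2-R", "10-R"],
    "building_idcommercial": ["7-C", "7-C", "30-C", "30-C"],
    "access": [1, 0, 0, 1],
})

# Explicit reference orderings (string sort)
res_ids = sorted(pairs["building_idResidential"].unique())
comm_ids = sorted(pairs["building_idcommercial"].unique())

# Pivot to wide form, then force rows and columns into the reference order
wide = pairs.pivot(index="building_idResidential",
                   columns="building_idcommercial",
                   values="access")
wide = wide.reindex(index=res_ids, columns=comm_ids)

# Rows follow res_ids, columns follow comm_ids
matrix = wide.to_numpy(dtype=float)
```

Because the reindex step names the order directly, the same `res_ids` and `comm_ids` lists can be reused for the population and existing-access arrays.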

In [121]:
res_comm_access_matrix.T.shape

(115, 1881)

OPTIMIZATION

Sets:
- i = index over the set of all commercial buildings
- j = index over the set of all residential buildings

Decision variable:
- Ci = 1 if the new grocery store is placed in commercial building i, 0 otherwise

Parameters:
- Aj = 1 if residential building j already has access to a grocery store (within 1 mile), 0 otherwise
- Bij = 1 if commercial building i is within 1 mile of residential building j, 0 otherwise
- Pj = population of residential building j

Objective function (maximize the population that newly gains access):

$$ \max \sum_i \sum_j (1 - A_j) P_j B_{ij} C_i $$

Constraint (exactly one grocery store location is chosen):

$$ \sum_i C_i = 1 $$
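
Because exactly one Ci equals 1, the objective reduces to picking the commercial building i with the largest gain, where the gain of building i is the sum over j of (1 - Aj) * Pj * Bij. The NumPy sketch below works this out on made-up numbers. It is meant as a cross-check for the solver result, not a replacement for the model, and the arrays `A`, `P`, and `B` are hypothetical toy values.

```python
import numpy as np

# Toy parameters: 3 residential buildings (j), 3 commercial buildings (i)
A = np.array([1, 0, 0])        # Aj: existing access
P = np.array([10, 20, 30])     # Pj: population
B = np.array([                 # Bij: rows are commercial i, columns are residential j
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
])

# Population without existing access at each residential building
unserved = (1 - A) * P

# Gain of placing the store in each commercial building i
gains = B @ unserved

# Best site and the population that newly gains access
best_site = int(np.argmax(gains))
best_gain = int(gains[best_site])
```

Here building 0 covers only 20 new residents because the 10 residents of the first residential building already have access, so building 1 wins with 50.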





In [122]:
########################################
# SET UP MODEL
########################################

m = gp.Model("food_access")

num_commercial_buildings = len(comm_buildings)
num_residential_buildings = len(res_buildings)


########################################
# ASSIGN DECISION VARIABLES
########################################

c_i = m.addVars(range(num_commercial_buildings), vtype=GRB.BINARY)

print('decision vars added')

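########################################
# OBJECTIVE FUNCTION
########################################

# Maximize the population that newly gains access: sum_i sum_j (1 - Aj) * Pj * Bij * Ci
# gp.quicksum builds the linear expression faster than Python's built-in sum
m.setObjective(
    gp.quicksum(
        (1 - res_access_array[j])
        * res_population_array[j]
        * res_comm_access_matrix.T[i, j]
        * c_i[i]
        for i in range(num_commercial_buildings)
        for j in range(num_residential_buildings)
    ),
    GRB.MAXIMIZE,
)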

print('objective function set')

########################################
# CONSTRAINTS
########################################

m.addConstr(gp.quicksum(c_i[i] for i in range(num_commercial_buildings)) == 1, name="one_store")

print('constraints added')


decision vars added
objective function set
constraints added


In [123]:
# Optimize and see results
m.optimize()

Gurobi Optimizer version 9.5.2 build v9.5.2rc0 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 1 rows, 115 columns and 115 nonzeros
Model fingerprint: 0x1b5356fe
Variable types: 0 continuous, 115 integer (115 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [1e+00, 7e+02]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective 495.0000000
Presolve removed 1 rows and 115 columns
Presolve time: 0.00s
Presolve: All rows and columns removed

Explored 0 nodes (0 simplex iterations) in 0.01 seconds (0.00 work units)
Thread count was 1 (of 8 available processors)

Solution count 2: 655 495 

Optimal solution found (tolerance 1.00e-04)
Best objective 6.550000000000e+02, best bound 6.550000000000e+02, gap 0.0000%


ORDERING OF ARRAYS SANITY CHECK

In [124]:
# New population impacted
print(f"Total new population with access = {m.objVal}")

# Where is the store being placed? Exactly one Ci equals 1 in the solution
store_location_index = 0
for i in range(num_commercial_buildings):
    if c_i[i].X > 0.5:  # solver values are floats, so compare against 0.5 rather than testing == 1
        store_location_index = i
        print(f"store should be placed in commercial building {i}")



Total new population with access = 655.0
store should be placed in commercial building 32


In [125]:
comm_buildings[32]

'25316-C'

In [126]:
# Now let's check if the ordering makes sense.

# Which building is this?
# comm_buildings is in the same order as the decision variables, so look up the chosen index directly
chosen_building_id = comm_buildings[store_location_index]
print(f"The chosen building id is {chosen_building_id}")

The chosen building id is 25316-C


In [127]:
# Which residential buildings are within range of this building and don't have current access?
res_buildings = res_comm_access[res_comm_access['building_idcommercial']==chosen_building_id]
res_buildings = res_buildings[res_buildings['access']==1]  # build the mask from the filtered frame itself

# Removing buildings which have existing access
res_groc_access = res_groc_access.reset_index().rename(columns={'access':'existing_access'})
res_buildings = pd.merge(res_buildings, res_groc_access, on ='building_idResidential', how='inner')

res_buildings = res_buildings[res_buildings['existing_access'] == 0]
res_buildings

Unnamed: 0,geoid_Residential,tract_id_Residential,Residential_coordinates,building_idResidential,geoid_commercial,tract_id_commercial,commercial_coordinates,building_idcommercial,distance,access,existing_access
0,420031303003,130300,"(-79.8942185836958, 40.453568268060074)",102916-R,420031405003,140500,"(-79.8932043626825, 40.44669046543195)",25316-C,0.108923,1,0
1,420031303003,130300,"(-79.89251943069462, 40.45325623508251)",102928-R,420031405003,140500,"(-79.8932043626825, 40.44669046543195)",25316-C,0.092615,1,0
2,420031303003,130300,"(-79.89269068954871, 40.45329912032209)",102991-R,420031405003,140500,"(-79.8932043626825, 40.44669046543195)",25316-C,0.087639,1,0
3,420031303003,130300,"(-79.90145004872258, 40.45535258082362)",103008-R,420031405003,140500,"(-79.8932043626825, 40.44669046543195)",25316-C,0.579315,1,0
4,420031303003,130300,"(-79.8948844781098, 40.45586654503308)",103017-R,420031405003,140500,"(-79.8932043626825, 40.44669046543195)",25316-C,0.160785,1,0
...,...,...,...,...,...,...,...,...,...,...,...
131,420031203002,120300,"(-79.89838071561871, 40.46802040211525)",49653-R,420031405003,140500,"(-79.8932043626825, 40.44669046543195)",25316-C,0.441322,1,0
132,420031203002,120300,"(-79.90451871677017, 40.467196830576924)",49690-R,420031405003,140500,"(-79.8932043626825, 40.44669046543195)",25316-C,0.820292,1,0
133,420031203002,120300,"(-79.89842440100014, 40.46811326524014)",49707-R,420031405003,140500,"(-79.8932043626825, 40.44669046543195)",25316-C,0.444428,1,0
134,420031203002,120300,"(-79.90326537627557, 40.4688209213872)",49728-R,420031405003,140500,"(-79.8932043626825, 40.44669046543195)",25316-C,0.745092,1,0


In [128]:
# Let's add up the population of all the buildings in res_buildings and see if that value matches the optimal objective value
res_population.rename(columns={'building_id': "building_idResidential"},inplace=True)
res_buildings_pop = pd.merge(res_buildings, res_population, on ='building_idResidential', how='inner')
np.sum(res_buildings_pop['population'])

655

In [129]:
# The objective value is a float, so compare with a tolerance instead of ==
if np.isclose(np.sum(res_buildings_pop['population']), m.ObjVal):
    print('results match and ordering makes sense. What a time to be alive')

else:
    print('manual and Gurobi results do not match; fix the ordering of arrays')

results match and ordering makes sense. What a time to be alive
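
The manual check above can be wrapped into one function so it can be rerun for any candidate site without overwriting notebook variables. This is a minimal sketch on toy data. The function name `covered_population` and the toy frames are hypothetical, and it assumes the same column names used in this notebook (`building_idResidential`, `building_idcommercial`, `access`, `existing_access`, `population`).

```python
import pandas as pd


def covered_population(pairs, existing, population, site_id):
    """Population that newly gains access if the store goes in site_id."""
    # Residential buildings within range of the candidate site
    near = pairs[(pairs["building_idcommercial"] == site_id) & (pairs["access"] == 1)]
    # Attach existing access and population by residential id
    merged = near.merge(existing, on="building_idResidential")
    merged = merged.merge(population, on="building_idResidential")
    # Count only buildings without existing access
    return int(merged.loc[merged["existing_access"] == 0, "population"].sum())


# Toy data: 3 residential buildings, 2 candidate sites
pairs = pd.DataFrame({
    "building_idResidential": ["1-R", "2-R", "3-R", "1-R", "2-R", "3-R"],
    "building_idcommercial": ["9-C", "9-C", "9-C", "8-C", "8-C", "8-C"],
    "access": [1, 1, 0, 0, 1, 1],
})
existing = pd.DataFrame({
    "building_idResidential": ["1-R", "2-R", "3-R"],
    "existing_access": [0, 1, 0],
})
population = pd.DataFrame({
    "building_idResidential": ["1-R", "2-R", "3-R"],
    "population": [10, 20, 30],
})

gain_site_9 = covered_population(pairs, existing, population, "9-C")
gain_site_8 = covered_population(pairs, existing, population, "8-C")
```

Since the function works on ids rather than array positions, a mismatch between its result and the solver's objective value points to an ordering problem in the arrays.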
