This notebook solves an optimization problem: find the location for a new grocery store that maximizes the number of residents who gain access to one.

How this works (implementation):

1. Use helper_population_allocation.py to allocate a population count to each residential building (not ready yet; it currently contains a placeholder function).
2. Use helper_distance_calculation.py to calculate existing grocery access and the distance between each residential and commercial building.
3. Once steps 1 and 2 are done, all model parameters are ready.
4. This notebook then does some pre-optimization setup and runs an optimization model in Gurobi. It's pretty anticlimactic.

Next steps:

- Update the population allocation helper file
- Figure out the correct ordering of the arrays passed to Gurobi. The ordering is currently inconsistent, so the results are not accurate.


In [1]:
# Import libraries
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import haversine as hs
import gurobipy as gp
from gurobipy import GRB

# Helper modules
import helper_population_allocation as pa
import helper_distance_calculation as dc

# Avoid printing SettingWithCopy warnings (note: this silences all warnings)
import warnings
warnings.filterwarnings("ignore")



PRE-OPTIMIZATION SETUP

In [2]:
# Get the main buildings dataset 
buildings_df = gpd.read_file('../processed_data/relevant_buildings.shp')

# Create ID variable
buildings_df.reset_index(drop=True, inplace=True)
buildings_df['building_id'] = buildings_df.index + 1
buildings_df['building_id'] = buildings_df['building_id'].astype(str) + '-' + buildings_df['CLASS'].astype(str)

buildings_df = buildings_df.sample(n=5000, random_state=1)  # Remove later


# Population parameter (Pj)
res_population = pa.get_population(geopandas_dataframe=buildings_df) 

# Existing access parameter (Aj)
res_groc_access =  dc.calculate_access(
                            geopandas_dataframe=buildings_df,
                            building_type_1='Residential',
                            building_type_2='Grocery',
                            identifier_column='class_reco', 
                            geo_column='geometry', 
                            output_format='dataframe'
)



modified script


In [4]:
# Residential-commercial access parameter (Bij)
# Run this once and save the resulting dataset (the computation is very slow)
res_comm_access = dc.calculate_access(
                            geopandas_dataframe=buildings_df,
                            building_type_1='Residential',
                            building_type_2='commercial',
                            identifier_column='class_reco', 
                            geo_column='geometry', 
                            output_format='dataframe'
)


modified script


OPTIMIZATION

Sets:
i = set of all commercial buildings
j = set of all residential buildings

Decision variable:
- Ci = 1 if the new grocery store is put in commercial building i, 0 otherwise

Parameters:
- Aj = 1 if residential building j already has access to a food store (within 1 mile), 0 otherwise
- Bij = 1 if commercial building i is within 1 mile of residential building j
- Pj = Population at building j

Objective function:

$$ \max \sum_i \sum_j (1 - A_j) \, P_j \, B_{ij} \, C_i $$

Constraint (only 1 grocery store location being allocated):
$$ \sum_i C_i = 1$$
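
Because exactly one C_i equals 1, the objective collapses to picking the commercial building i with the largest value of sum_j (1 - A_j) * P_j * B_ij. This is consistent with the Gurobi log further down, where presolve removes every row and column. The sketch below illustrates that reduction in plain Python; the function name and the toy data are made up for illustration and are not part of the notebook.

```python
def best_single_site(A, P, B):
    """Return (index, gain) of the commercial site that serves the most new residents.

    A[j]:    1 if residential building j already has access, else 0
    P[j]:    population of residential building j
    B[i][j]: 1 if commercial building i is within range of residential building j
    """
    # Objective coefficient of C_i: population newly served if the store goes in site i
    gains = [
        sum((1 - A[j]) * P[j] * B[i][j] for j in range(len(P)))
        for i in range(len(B))
    ]
    best_i = max(range(len(gains)), key=gains.__getitem__)
    return best_i, gains[best_i]


# Toy data (hypothetical): 3 commercial sites, 4 residential buildings
A = [1, 0, 0, 1]
P = [10, 20, 30, 40]
B = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
result = best_single_site(A, P, B)
print(result)  # → (1, 50)
```

A solver becomes worthwhile once more than one store can be placed, because then the same resident must not be counted twice and the coefficients stop being independent.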





In [5]:
########################################
# SET UP MODEL
########################################

m = gp.Model("food_access")

# Set up arrays for gurobi
res_population_array = np.array(res_population[res_population['class_reco'].str.contains('Residential')]['population']) # jth entry corresponds to population at jth residential building (Pj)
res_access_array = np.array(res_groc_access.groupby('building_idResidential')['access'].max()) # jth entry corresponds to existing access at jth residential building (Aj)
res_comm_access_matrix = np.array(res_comm_access.pivot(index='commercial_coordinates', columns='Residential_coordinates', values='access')) # entry [i,j] corresponds to Bij

print('arrays created')

num_commercial_buildings = len(buildings_df[buildings_df['class_reco'].str.contains('commercial')])
num_residential_buildings = len(buildings_df[buildings_df['class_reco'].str.contains('Residential')])


########################################
# ASSIGN DECISION VARIABLES
########################################

c_i = m.addVars(range(num_commercial_buildings), vtype=GRB.BINARY)

print('decision vars added')

#######################################
# OBJECTIVE FUNCTION
########################################

m.setObjective(
    gp.quicksum(
        (1 - res_access_array[j]) * res_population_array[j] * res_comm_access_matrix[i, j] * c_i[i]
        for i in range(num_commercial_buildings)
        for j in range(num_residential_buildings)
    )
)
m.modelSense = GRB.MAXIMIZE

print('objective function set')

########################################
# CONSTRAINTS
########################################

m.addConstr(sum(c_i[i] for i in range(num_commercial_buildings)) ==  1) 

print('constraints added')


Set parameter Username
Academic license - for non-commercial use only - expires 2023-08-24
arrays created
decision vars added
objective function set
constraints added
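
The three arrays above are built in three different orders: the population array follows the dataframe's row order, the access array is sorted by building ID through the groupby, and the pivot is sorted by coordinates. One possible way to remove that ambiguity is to key every parameter by building ID and then build all arrays from a single explicit ID list. The sketch below shows the idea with plain dictionaries; the function name, the dictionary layout, and the IDs are assumptions for illustration, not the notebook's actual data structures.

```python
def align_parameters(population_by_id, access_by_id, pair_access):
    """Build P, A and B in one shared, explicit ordering.

    population_by_id: {residential_id: population}
    access_by_id:     {residential_id: 0 or 1}
    pair_access:      {(commercial_id, residential_id): 0 or 1}
    """
    res_ids = sorted(population_by_id)               # position j -> residential ID
    comm_ids = sorted({c for c, _ in pair_access})   # position i -> commercial ID
    P = [population_by_id[r] for r in res_ids]
    # A missing ID raises KeyError here, which surfaces mismatches early
    A = [access_by_id[r] for r in res_ids]
    # Pairs that were never computed are treated as "no access" (an assumption)
    B = [[pair_access.get((c, r), 0) for r in res_ids] for c in comm_ids]
    return res_ids, comm_ids, P, A, B


# Hypothetical IDs, deliberately given in scrambled order
population = {"b-R": 20, "a-R": 10}
access = {"a-R": 1, "b-R": 0}
pairs = {("y-C", "b-R"): 1, ("x-C", "a-R"): 1, ("x-C", "b-R"): 0}

res_ids, comm_ids, P, A, B = align_parameters(population, access, pairs)
print(res_ids, comm_ids, P, A, B)
```

With this layout, the ID of the chosen store is simply `comm_ids[i]` for the selected index i, so no positional lookup into the long-format table is needed. In pandas, the equivalent would be reindexing each Series and the pivoted matrix against the same ID lists.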


In [6]:
# Optimize and see results
m.optimize()

Gurobi Optimizer version 9.5.2 build v9.5.2rc0 (win64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 1 rows, 298 columns and 298 nonzeros
Model fingerprint: 0x14b9dfd2
Variable types: 0 continuous, 298 integer (298 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [3e+00, 1e+03]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 1e+00]
Found heuristic solution: objective 37.0000000
Presolve removed 1 rows and 298 columns
Presolve time: 0.00s
Presolve: All rows and columns removed

Explored 0 nodes (0 simplex iterations) in 0.01 seconds (0.00 work units)
Thread count was 1 (of 8 available processors)

Solution count 2: 1175 37 

Optimal solution found (tolerance 1.00e-04)
Best objective 1.175000000000e+03, best bound 1.175000000000e+03, gap 0.0000%


ORDERING OF ARRAYS SANITY CHECK

In [7]:
# New population impacted
print(f"Total new population with access = {m.objVal}")

# Where is the store being placed (need a better way than this for loop)
store_location_index = 0
for i in range(num_commercial_buildings):
    if c_i[i].x > 0.5:  # binary values come back as floats, so avoid an exact == 1 test
        store_location_index = i
        print(f"store should be placed in commercial building {i}")



Total new population with access = 1175.0
store should be placed in commercial building 69


In [8]:
# Now let's check if the ordering makes sense. 

# Which building is this
# Since res_comm_access has a row per unique residential and commercial building combination, take the rows for a single residential building and pick the commercial building at position store_location_index
chosen_building_id = res_comm_access[res_comm_access['building_idResidential'] =='49763-R']['building_idcommercial'].iloc[store_location_index]
print(f"The chosen building id is {chosen_building_id}")

The chosen building id is 25934-R


In [16]:
# Which Residential buildings have access to this building (existing grocery access is not filtered out here)
res_buildings = res_comm_access[res_comm_access['building_idcommercial']==chosen_building_id]
res_buildings = res_buildings[res_buildings['access']==1]
res_buildings

Unnamed: 0,geoid_Residential,tract_id_Residential,Residential_coordinates,building_idResidential,geoid_commercial,tract_id_commercial,commercial_coordinates,building_idcommercial,distance,access
1261,420035632002,563200,"(-79.9951063001856, 40.45638468308156)",85754-R,420032412002,241200,"(-79.98690607803377, 40.46899524767685)",25934-R,0.586470,1
2453,420033001005,300100,"(-79.99060371696692, 40.410960204296565)",54570-R,420032412002,241200,"(-79.98690607803377, 40.46899524767685)",25934-R,0.742419,1
3049,420030203001,020300,"(-79.97974734318176, 40.454858885498055)",72401-R,420032412002,241200,"(-79.98690607803377, 40.46899524767685)",25934-R,0.522984,1
4539,420032901001,290100,"(-79.98877120793772, 40.39701389356088)",76118-R,420032412002,241200,"(-79.98690607803377, 40.46899524767685)",25934-R,0.874218,1
6029,420032412002,241200,"(-79.99350233395364, 40.45870796730991)",25982-R,420032412002,241200,"(-79.98690607803377, 40.46899524767685)",25934-R,0.472206,1
...,...,...,...,...,...,...,...,...,...,...
1389345,420032620001,262000,"(-79.99728587690483, 40.46646390617516)",35285-R,420032412002,241200,"(-79.98690607803377, 40.46899524767685)",25934-R,0.717820,1
1394411,420031702001,170200,"(-79.97832924216891, 40.42990428398907)",79089-R,420032412002,241200,"(-79.98690607803377, 40.46899524767685)",25934-R,0.756246,1
1394709,420033001005,300100,"(-79.98847547498661, 40.41791105756967)",55060-R,420032412002,241200,"(-79.98690607803377, 40.46899524767685)",25934-R,0.623159,1
1395007,420031702001,170200,"(-79.98236922536847, 40.42768706153529)",78385-R,420032412002,241200,"(-79.98690607803377, 40.46899524767685)",25934-R,0.587062,1
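
The distance column above is worth a second look. The second row pairs a residential building at latitude 40.411 with a commercial building at latitude 40.469, which is roughly 4 miles apart, yet the reported distance is 0.742419 and access is 1. The sketch below recomputes that pair with a plain-Python haversine formula, once with the coordinates read as (longitude, latitude) and once with the two swapped. The swapped reading appears to reproduce the reported value, which suggests (as an inference from the printed output, not a confirmed reading of the helper) that the helper may be passing (lon, lat) tuples where (lat, lon) is expected. The function names and the Earth radius constant are assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def haversine_miles(p1, p2):
    """Great-circle distance in miles between two (lon, lat) points given in degrees."""
    lon1, lat1 = p1
    lon2, lat2 = p2
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def access_indicator(res_point, comm_point, threshold_miles=1.0):
    """Return 1 if the two (lon, lat) points are within threshold_miles, else 0."""
    return int(haversine_miles(res_point, comm_point) <= threshold_miles)


# Coordinates copied from the second row of the output above, in (lon, lat) order
res = (-79.99060371696692, 40.410960204296565)
comm = (-79.98690607803377, 40.46899524767685)

true_miles = haversine_miles(res, comm)
# Reversing each tuple makes the function treat latitude as longitude and vice versa
swapped_miles = haversine_miles(res[::-1], comm[::-1])

print(f"lon/lat read correctly: {true_miles:.3f} miles, access = {access_indicator(res, comm)}")
print(f"lon/lat swapped:        {swapped_miles:.3f} miles")
```

If the helper does have this issue, both Aj and Bij would be affected, so it would be worth ruling out before tuning the ordering of the arrays.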


In [22]:
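# Let's add up the population in all the buildings in res_buildings and see if that value matches the optimal objective value
# Note: the objective weights each building by (1 - Aj), so buildings with existing access must be excluded for the two values to match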
res_population.rename(columns={'building_id': "building_idResidential"},inplace=True)
res_buildings_pop = pd.merge(res_buildings, res_population, on ='building_idResidential', how='inner')
np.sum(res_buildings_pop['population'])

808

In [23]:
if np.isclose(np.sum(res_buildings_pop['population']), m.objVal):
    print('results match and ordering makes sense')

else:
    print('manual and gurobi result value does not match, fix ordering of arrays')

manual and gurobi result value does not match, fix ordering of arrays
