## Spatial Modeling and Analytics
### Exploration
# A Simple Site Selection Example

## Reminder
<a href="#/slide-2-0" class="navigate-right" style="background-color:blue;color:white;padding:8px;margin:2px;font-weight:bold;">Continue with the lesson</a>

<br>
</br>
<font size="+1">

By continuing with this lesson you are granting your permission to take part in this research study for the Hour of Cyberinfrastructure: Developing Cyber Literacy for GIScience project. In this study, you will be learning about cyberinfrastructure and related concepts using a web-based platform that will take approximately one hour per lesson. Participation in this study is voluntary.

Participants in this research must be 18 years or older. If you are under the age of 18 then please exit this webpage or navigate to another website such as the Hour of Code at https://hourofcode.com, which is designed for K-12 students.

If you are not interested in participating please exit the browser or navigate to this website: http://www.umn.edu. Your participation is voluntary and you are free to stop the lesson at any time.

For the full description please navigate to this website: <a href="../../gateway-lesson/gateway/gateway-1.ipynb">Gateway Lesson Research Study Permission</a>.

</font>

# Let's Create a Basic Site Suitability Model

## Goal: 
Find buildings in a city that are suitable candidates for a new coffee shop business.

## Criteria:
The candidate buildings should be:
1. A building type of commercial, retail, or office building
1. At least 400 meters from other coffee shops
1. Close to a bikepath
1. Close to a cinema

## The process:
1. Determine the criteria (done!)
1. Get data
1. Create buffers
1. Assign weights
1. Intersect and sum values

The result is a map showing the site suitability values. Higher values indicate more suitable locations.

# Get the data
First, as usual, we need to import the appropriate Python packages. The most important one is `osmnx`, since this is where our data come from.

In [None]:
from IPython import get_ipython
import osmnx as ox 
import pandas as pd
import geopandas as gpd
import folium
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')
#Tells jupyter to plot matplotlib figures inline


Since we're going to call for Minneapolis repeatedly, let's set a variable to store our location. 

[In OSM, you can use standard place names. If you want to run this notebook later for a different place, you can simply put a new placename in here. Remember that since OSM is crowd-sourced, you might not find all the places you want to use in the dataset. However, all major US and global cities are probably there.] 

In [None]:
place = 'Minneapolis, MN'
place

Our criteria require that we get data about coffee shops, bikepaths, cinemas and buildings. The OSM data contains all these kinds of data, but we have to extract each one separately for our model.

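In this first block, we'll get cafes and coffee shops in Minneapolis, Minnesota. Note that osmnx returns every feature that matches any one of the tags in the dictionary (a union, not an intersection), so the result includes all features tagged `amenity=cafe` as well as any tagged `cuisine=coffee-shop`.
We use osmnx to create a GeoDataFrame (gdf), which is stored in the `coffee_shops` variable.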

In [None]:
tags = {'amenity':'cafe', 'cuisine':'coffee-shop'}  
coffee_shops = ox.geometries_from_place(place, tags) 

# Reproject to a projected CRS (EPSG:3174) so that distances are measured in meters
coffee_shops = coffee_shops.to_crs('epsg:3174') 

coffee_shops.info()
coffee_shops.head()

We got 140 coffee shops. Did you? Your count may differ, because OSM data changes over time.

And what does this look like?

In [None]:
coffee_shops.plot()

Next we'll get the bikepaths. Since OSM is crowd-sourced, the tagging of features is often inconsistent. Sometimes you need to use more than one tag to find all the features you are looking for. Here we're using three. 

[If you want to know more about OSM tags for mapped features, see https://wiki.openstreetmap.org/wiki/Map_features.] 
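As a rough illustration of how a multi-key `tags` dictionary behaves: osmnx keeps a feature if it matches any one of the keys. The sketch below imitates that matching rule in plain Python on a few made-up features, so it runs without downloading anything. The helper `matches_any_tag` and the sample features are hypothetical and are not part of osmnx.

```python
def matches_any_tag(feature_tags, query):
    """Return True if the feature matches at least one key in the query.

    A query value of True matches any value for that key, a string matches
    one exact value, and a list matches any of its values.
    """
    for key, wanted in query.items():
        if key not in feature_tags:
            continue
        value = feature_tags[key]
        if wanted is True:
            return True
        if isinstance(wanted, str) and value == wanted:
            return True
        if isinstance(wanted, (list, tuple, set)) and value in wanted:
            return True
    return False


# Made-up features for illustration only
features = [
    {"name": "Greenway", "highway": "cycleway"},
    {"name": "Lake loop", "route": "bicycle"},
    {"name": "Main St", "highway": "residential", "cycleway": "lane"},
    {"name": "Side St", "highway": "residential"},
]

query = {"highway": "cycleway", "route": "bicycle", "cycleway": True}
selected = [f["name"] for f in features if matches_any_tag(f, query)]
print(selected)
```

Each of the first three features is picked up by a different key, which is why using several tags helps when the tagging is inconsistent.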

In [None]:
tags = {'highway':'cycleway','route':'bicycle','cycleway':True}
bikepaths = ox.geometries_from_place(place, tags)
bikepaths = bikepaths.to_crs('epsg:3174') 
bikepaths.info()
bikepaths.plot()
bikepaths.head()

Now get the cinema point features.

In [None]:
tags = {'amenity':'cinema'} 
cinemas = ox.geometries_from_place(place, tags)
cinemas = cinemas.to_crs('epsg:3174') 
cinemas.info()
cinemas.plot()
cinemas

Finally, fetch the footprints (outlines, i.e. polygons) for commercial, retail and office buildings in Minneapolis. This may take some time, so be patient while waiting for the asterisk to change to a number. 

In [None]:
tags = {'building':['commercial','retail','offices']}
buildings = ox.geometries_from_place(place, tags)
buildings = buildings.to_crs('epsg:3174') 
buildings.info()
buildings.plot(figsize = (20,10))
buildings.head()

Now, we have all our data. Let's go back to the criteria so we can see how we need to manipulate these data.

Recall that the candidate buildings should be:

1. A building type of commercial, retail, or office building
1. At least 400 meters from other coffee shops
1. Close to a bikepath
1. Close to a cinema

OK, we've already taken care of criterion #1 by getting data about only buildings of these types. To address criterion #2 we need to create buffers...

## Create buffers

Buffers are used to define the area of influence of features. We'll buffer the coffee shops by 400m as an exclusion zone in which we don't want to select candidate sites. 

In [None]:
coffee_shops_buffer = gpd.GeoDataFrame(coffee_shops.buffer(400), geometry = coffee_shops.buffer(400))
coffee_shops_buffer.plot(figsize = (5,5))

Good. This plot shows all the areas that lie within 400 m of an existing coffee shop. We do not want to include buildings in these areas in our result. 
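To make the exclusion rule concrete, here is a minimal sketch in plain Python rather than geopandas. For point features, being inside a 400 m buffer is the same as being at most 400 m away, so a simple distance check is enough. The coordinates are made up and assumed to be in a projected CRS with meter units; `within_buffer` is a hypothetical helper.

```python
import math


def within_buffer(site, shops, radius_m=400.0):
    """Return True if the site lies inside the circular buffer of any shop."""
    return any(
        math.hypot(site[0] - sx, site[1] - sy) <= radius_m
        for sx, sy in shops
    )


# Made-up projected coordinates in meters
shops = [(0.0, 0.0), (1000.0, 0.0)]
candidate_sites = {"A": (300.0, 0.0), "B": (500.0, 300.0), "C": (1000.0, 450.0)}

# Keep only the sites that fall outside every exclusion zone
allowed = [name for name, xy in candidate_sites.items() if not within_buffer(xy, shops)]
print(allowed)
```

Site A is 300 m from the first shop and is excluded, while B and C are more than 400 m from both shops and remain candidates.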

Now we need to deal with the final two criteria in which locations close to cinemas and bikepaths are more favorable than those that are farther away. Thus places nearby should have higher value in our site selection than places far away - we do this by assigning weights.

## Assign weights

There are many ways to assign weights in site suitability models. Since this is all vector data, we're going to assign weights by creating concentric buffers with declining value as distance from the feature increases. For example, we prefer places that are close to cinemas, so locations that are less than 500 m away get a higher weight than places between 500 and 1000 m, and those get more than places 1000 to 1500 m away. Anything further than 1500 m gets no weight at all! 

Let's see how this works with our Cinema data.

In [None]:
# Cinema weighting
cinema_df1 = gpd.GeoDataFrame(cinemas.buffer(1500), geometry = cinemas.buffer(1500))
cinema_df2 = gpd.GeoDataFrame(cinemas.buffer(1000), geometry = cinemas.buffer(1000))
cinema_df3 = gpd.GeoDataFrame(cinemas.buffer(500), geometry = cinemas.buffer(500))

cinema3 = cinema_df3
cinema3['weight'] = 3

cinema2 = gpd.overlay(cinema_df2, cinema_df3, how='difference')
cinema2['weight'] = 2

cinema1 = gpd.overlay(cinema_df1, cinema_df2, how='difference')
cinema1['weight'] = 1

cinemas.plot(figsize = (5,5))
cinema3.plot(figsize = (5,5))
cinema2.plot(figsize = (5,5))
cinema1.plot(figsize = (5,5))

cinema2

Note how the buffers nest inside of each other. Weights are 3 for the smallest, 2 for the middle one and 1 for the largest/furthest away. 
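The nested rings amount to a lookup from distance to weight. The sketch below expresses that lookup directly in plain Python, using the same 500, 1000 and 1500 m breaks and the weights 3, 2 and 1. The function `ring_weight` is a hypothetical helper for illustration, and treating a distance exactly on a break as part of the inner ring is an assumption.

```python
from bisect import bisect_left


def ring_weight(distance_m, breaks=(500, 1000, 1500), weights=(3, 2, 1)):
    """Map a distance to the weight of the ring it falls in.

    Distances beyond the last break get a weight of 0.
    """
    i = bisect_left(breaks, distance_m)
    return weights[i] if i < len(weights) else 0


distances = (100, 500, 750, 1200, 2000)
ring_weights = [ring_weight(d) for d in distances]
print(ring_weights)
```

The buffer-and-difference approach used in the notebook produces the same classification, but as polygons that can be overlaid with other layers.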

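Next we combine the three rings into a single layer with a union overlay. Each resulting piece carries up to three weight columns (`weight_1`, `weight_2` and `weight`), and we keep the largest of them as its final weight.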

In [None]:
cinema_w = gpd.overlay(cinema1, cinema2, how='union')
cinema_w = gpd.overlay(cinema_w, cinema3, how='union')
cinema_w.plot(figsize = (10,10))

cinema_w['weights'] = pd.concat([cinema_w['weight_1'].fillna(0).astype('int'), 
                                 cinema_w['weight_2'].fillna(0).astype('int'), 
                                 cinema_w['weight'].fillna(0).astype('int')], axis = 1).max(axis=1)

cinema_w

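Now we assign weights to the bikepaths. We'll set only 2 weights: 2 for locations less than 150 m away and 1 for locations between 150 and 300 m. The second cell below repeats the same steps with wider 500 m and 1000 m buffers, and those wider buffers are the ones used in the rest of the analysis.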
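For a line feature such as a bikepath, a buffer contains every location whose shortest distance to the line is within the buffer radius. The following sketch shows that idea in plain Python with a made-up path in projected meter coordinates and the 150 m and 300 m bands described above. Both `distance_to_segment` and `bikepath_weight` are hypothetical helpers, not geopandas functions.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b are the same point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def bikepath_weight(p, path, near=150.0, far=300.0):
    """Weight 2 within `near` meters of the path, 1 within `far`, else 0."""
    d = min(distance_to_segment(p, a, b) for a, b in zip(path, path[1:]))
    if d <= near:
        return 2
    if d <= far:
        return 1
    return 0


# Made-up bikepath with two straight segments
path = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0)]

w_close = bikepath_weight((500.0, 100.0), path)   # 100 m from the path
w_mid = bikepath_weight((1200.0, 500.0), path)    # 200 m from the path
w_far = bikepath_weight((500.0, 600.0), path)     # 500 m from the path
print(w_close, w_mid, w_far)
```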

In [None]:
bikepaths_df2 = gpd.GeoDataFrame(bikepaths.buffer(150), geometry = bikepaths.buffer(150))
bikepaths_df1 = gpd.GeoDataFrame(bikepaths.buffer(300), geometry = bikepaths.buffer(300))

bikepaths2 = bikepaths_df2
bikepaths2['weight'] = 2

bikepaths1 = gpd.overlay(bikepaths_df1, bikepaths_df2, how='difference')
bikepaths1['weight'] = 1

bikepaths.plot(figsize = (5,5))
bikepaths2.plot(figsize = (5,5))
bikepaths1.plot(figsize = (5,5))

In [None]:
bikepaths_df2 = gpd.GeoDataFrame(bikepaths.buffer(500), geometry = bikepaths.buffer(500))
bikepaths_df1 = gpd.GeoDataFrame(bikepaths.buffer(1000), geometry = bikepaths.buffer(1000))

bikepaths2 = bikepaths_df2
bikepaths2['weight'] = 2

bikepaths1 = gpd.overlay(bikepaths_df1, bikepaths_df2, how='difference')
bikepaths1['weight'] = 1

bikepaths1.plot(figsize = (10,10))
bikepaths2.plot(figsize = (10,10))

bikepaths_w = gpd.overlay(bikepaths1, bikepaths2, how='union')

bikepaths_w.plot(figsize = (10,10))

bikepaths_w['weights'] = pd.concat([bikepaths_w['weight_1'].fillna(0).astype('int'), 
                                 bikepaths_w['weight_2'].fillna(0).astype('int')], axis = 1).max(axis=1)

bikepaths_w

## Intersect and sum values to find the highest value locations
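In outline, this step removes the areas inside the coffee shop exclusion zones, keeps only the areas covered by both weight layers, and adds the two weights together. The sketch below shows the arithmetic on a few made-up zones in plain Python. The zone records and the `suitability` helper are hypothetical stand-ins for the polygons that the overlay operations produce.

```python
# Made-up zones: each has a bikepath weight, a cinema weight,
# and a flag saying whether it lies inside a coffee shop buffer
zones = [
    {"id": 1, "bike": 2, "cinema": 3, "near_coffee": False},
    {"id": 2, "bike": 1, "cinema": 0, "near_coffee": False},
    {"id": 3, "bike": 2, "cinema": 2, "near_coffee": True},
    {"id": 4, "bike": 1, "cinema": 1, "near_coffee": False},
]


def suitability(zone):
    """Return the summed weight, or None if the zone is dropped."""
    if zone["near_coffee"]:
        return None  # removed by the 'difference' step
    if zone["bike"] == 0 or zone["cinema"] == 0:
        return None  # 'intersection' keeps only areas covered by both layers
    return zone["bike"] + zone["cinema"]


scores = {}
for zone in zones:
    score = suitability(zone)
    if score is not None:
        scores[zone["id"]] = score

print(scores)
```

Zone 2 is dropped because it has no cinema weight and zone 3 because it is too close to an existing coffee shop, so only zones 1 and 4 receive a final score.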

In [None]:

res_union1 = gpd.overlay(bikepaths_w, coffee_shops_buffer, how='difference')

res_union = gpd.overlay(res_union1[res_union1.geometry.type=='Polygon'], cinema_w, how='intersection')

# sum up the weights
res_union['final_weights'] = res_union['weights_1'].fillna(0).astype('int') + res_union['weights_2'].fillna(0).astype('int')

res_union = res_union[['final_weights', 'geometry']] # keep only the final_weights and geometry columns

res_union.columns = ['weight', 'geometry'] # rename the columns 

res_union

In [None]:
res_union.plot(figsize = (10,10), column = 'weight', cmap = 'Reds')


Fetch all building footprints in Minneapolis. This time we request every building and filter by building type afterwards.

In [None]:
import warnings
warnings.filterwarnings('ignore') # Hide warnings

place = "Minneapolis, MN"
tags = {"building": True}
building = ox.geometries_from_place(place, tags)
building = building.to_crs('epsg:3174')

building.plot(figsize = (10,10))


Select the candidate buildings by intersecting the building footprints with the weighted areas

In [None]:
sites = gpd.overlay(res_union, building[building.geometry.type=='Polygon'], how='intersection')
sites.plot(figsize = (20,20), column = 'weight', cmap = 'Reds')


In [None]:
sites[(sites['building'] == 'commercial') | 
              (sites['building'] == 'retail') | 
              (sites['building'] == 'offices')].plot(figsize = (20,20), column = 'weight', cmap = 'Reds')


In [None]:
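warnings.filterwarnings("ignore")
# Keep only commercial, retail and office buildings; copy to avoid modifying a slice of `sites`
selected_sites = sites[sites['building'].isin(['commercial', 'retail', 'offices'])].copy()

# Area in the units of the projected CRS (square meters)
selected_sites['geo_area'] = selected_sites.area

# Compute centroids in the projected CRS, then convert everything to latitude/longitude for folium
centroids = selected_sites.centroid.to_crs(epsg=4326)
selected_sites = selected_sites.to_crs(epsg=4326)
selected_sites['centroid'] = centroids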


Visualize the final result on a folium interactive map


In [None]:
m = folium.Map(location = [44.9778, -93.2650], tiles='OpenStreetMap' , zoom_start = 13) # tiles="Stamen Toner"


for _, r in selected_sites.iterrows():

    sim_geo = gpd.GeoSeries(r['geometry']) #.simplify(tolerance=0.001) 
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j, 
                           style_function = lambda x: {'color': 'red', 'weight': 1,  'fillColor': 'red'})
    folium.Popup(f"<i>Type: {r['building']}, Area: {r['geo_area']}</i>").add_to(geo_j)

    
    geo_j.add_to(m)
    
    folium.Marker([r['centroid'].y, r['centroid'].x], popup=f"<i>Type: {r['building']}</i>", tooltip=f"<i>Area: {r['geo_area']}</i>").add_to(m)


m

<font size="+1"><a style="background-color:blue;color:white;font-weight:bold;" 
href="./">Click here to go back to the root folder!</a></font>

