## A moving target

The director of market research for a company that develops, designs, and manages resort-style retirement communities has been tasked with identifying candidate locations for a new facility. He knows he needs to be creative. While it may have been sufficient in the United States ten years ago to build sprawling developments in the warmest parts of the country, people approaching retirement today are less willing to relocate. Many want to stay connected to friends and family, remain close to their existing doctors, continue to work, enjoy local cultural and educational opportunities, and be surrounded by people of all ages. Consequently, as a first pass, he decides to look for locations projected to have large numbers of senior citizens but very few existing options for residential retirement. He will then narrow these locations by ranking how similar each candidate is to the company's current most successful resort retirement community.

![](locating_community_images/image1.png)

### What data is needed?
He will first model supply versus demand for retirement housing opportunities.

For the demand component of his model, he needs a variable representing potential retirement community residents. Since a new facility will not be open for a couple of years, he obtains the projected 2019 age 55 and older population data, by ZIP Code, from Business Analyst.

The supply component of the model proves a bit more difficult. Retirement facilities range from private homes accommodating one or two people to whole villages housing as many as 100,000 residents. While he can easily get the number of businesses within each ZIP Code that are classified as retirement communities, retirement homes, independent living facilities, or senior citizen housing (SIC codes 805904, 805918, and 836114), his business data does not include information about the number of residents or the number of residential units associated with each facility.

He decides to use the number of employees associated with these facilities as a surrogate for retirement community size, at least until better information is available.

![](locating_community_images/image2.png)

He also gets housing unit vacancy data from Business Analyst, since high vacancy rates generally reflect low demand for new housing.

### Where is the demand for retirement housing highest in relation to supply?
While it may be tempting to calculate supply versus demand as a simple ratio—projected age 55+ population, divided by an estimate of retirement community housing resources (number of employees, for now)—this is problematic for at least three reasons:

**Division by zero**: For ZIP Codes without any retirement community facilities, the denominator is zero and the ratio is undefined. Removing these ZIP Codes as a workaround would eliminate most of the ZIP Codes in the contiguous United States (see the map below) and would likely remove the very high-demand locations the director of market research hopes to discover.

**Small numbers problem**: Ratios computed from small numerators and small denominators are also a problem because they are unstable. Mortality rates are a classic example. Suppose a community has only two people and one dies from cancer. The cancer mortality rate for this community would be a very high 50 percent, an outlier that could throw off subsequent analyses. Similarly, a ZIP Code with very few senior citizens and very few retirement housing resources will produce unstable ratios.

![](locating_community_images/image3.png)

ZIP Codes affected by division by zero or the small numbers problem.

**Boundaries**: In addition, looking at a ZIP Code in isolation can be misleading. If a ZIP Code has lots of residential retirement opportunities and very few senior citizens, we might say it has more supply than demand. But what if it is surrounded by ZIP Codes with lots of senior citizens and no residential retirement opportunities? We get a better picture of supply and demand if we evaluate each ZIP Code within the context of its neighboring ZIP Codes.
Spatial Outliers

![](locating_community_images/image4.png)

To address the division by zero and small numbers problems, the director of market research creates a level of service variable, L. The underlying assumption is one of equity: if a ZIP Code contains 8 percent of the country's projected 55+ population, it should also contain 8 percent of the country's retirement community resources.

How does he construct this variable? It's easy. He subtracts a supply ratio from a demand ratio:

![](locating_community_images/image5.png)

The demand ratio, $d_i/D$, is the projected age 55+ population in a ZIP Code ($d_i$) divided by the projected age 55+ population in all ZIP Codes ($D$). Because the denominator counts every person age 55 or older in all ZIP Codes, it will never be zero or small (unstable).

Similarly, the supply ratio, $r_i/R$, is the estimated number of retirement community employees in a ZIP Code ($r_i$) divided by the total number of retirement community employees in all ZIP Codes ($R$).
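
Written out from the two definitions above, the level of service for ZIP Code $i$ is the demand ratio minus the supply ratio, where the sums run over all ZIP Codes:

$$
L_i = \frac{d_i}{D} - \frac{r_i}{R}, \qquad D = \sum_j d_j, \qquad R = \sum_j r_j
$$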

Here are some examples of how this plays out:

- **Supply equals demand**: L is zero. Suppose a ZIP Code contains 5 percent of the country's senior citizens and 5 percent of the country's retirement community employees. Subtracting the supply ratio from the demand ratio gives 5 - 5 = 0.
- **Demand exceeds supply**: L is positive. Suppose a ZIP Code contains 10 percent of the country's senior citizens but only 2 percent of the country's retirement community employees. The difference, 10 - 2 = 8, is positive.
- **Supply exceeds demand**: L is negative. A ZIP Code containing 3 percent of all senior citizens but 12 percent of all retirement community employees gives 3 - 12 = -9.

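
The arithmetic above can be sketched in a few lines of plain Python. The ZIP Code counts below are hypothetical and chosen to sum to 100, so each count doubles as a percentage; the fourth ZIP Code has no retirement community employees, which would break a simple ratio but gives a well-defined L.

```python
def level_of_service(demand, supply):
    """Level of service L_i = d_i / D - r_i / R for every ZIP Code i.

    demand -- projected age 55+ population per ZIP Code (d_i)
    supply -- retirement community employees per ZIP Code (r_i)
    """
    total_demand = sum(demand)  # D: all ZIP Codes, so never zero or small
    total_supply = sum(supply)  # R: all ZIP Codes
    return [d / total_demand - r / total_supply for d, r in zip(demand, supply)]


# Hypothetical ZIP Codes; both lists sum to 100
demand = [5, 10, 3, 2, 80]
supply = [5, 2, 12, 0, 81]

# Express L in percentage points to match the examples above
print([round(100 * l, 2) for l in level_of_service(demand, supply)])
# [0.0, 8.0, -9.0, 2.0, -1.0]
```

Because both ratios sum to 1 across all ZIP Codes, the L values always sum to zero: every surplus somewhere is balanced by a deficit elsewhere.
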
To address the boundary issue, the director of market research runs Hot Spot Analysis on the level of service variable (L), which weighs the surplus or deficit in each ZIP Code against the surpluses and deficits of the surrounding ZIP Codes. A spatial cluster of large positive values that is not balanced by nearby negative values is identified as a hot spot for demand. Similarly, a spatial cluster of negative values that is not balanced by nearby positive values is identified as a cold spot for demand. The map below shows the results of this analysis.
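
Hot Spot Analysis in ArcGIS computes the Getis-Ord Gi* statistic. The plain-Python sketch below illustrates the idea under simplifying assumptions: six hypothetical ZIP Codes laid out in a row, made-up L values, and binary weights in which each ZIP Code is its own neighbor (the "star" in Gi*) along with the ZIP Codes immediately on either side. ArcGIS offers several ways to define neighbors; this is only one of them.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return a Gi* z-score for each feature.

    values  -- attribute values (here, level of service L) for n features
    weights -- n x n spatial weights; weights[i][i] is included in the sum
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    z_scores = []
    for w in weights:
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        # Observed neighborhood sum minus the sum expected under randomness
        numerator = sum(wij * xj for wij, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


# Hypothetical L values for six ZIP Codes laid out in a row
L = [0.05, 0.06, 0.04, -0.01, -0.02, -0.03]
# Binary weights: each ZIP Code plus its immediate left and right neighbors
W = [[1 if abs(i - j) <= 1 else 0 for j in range(6)] for i in range(6)]

for value, z in zip(L, getis_ord_gi_star(L, W)):
    print(f"L = {value:+.2f}  Gi* z = {z:+.2f}")
```

A large positive z-score (above about 1.96 for 95 percent confidence) marks a hot spot, and a large negative z-score marks a cold spot. Because each score sums over a neighborhood, a single high value surrounded by low values is pulled toward zero.
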

![](locating_community_images/image6.png)

### Where are vacancy rates lowest? Which areas are projected to have the largest number of people age 55 and older?
Next, the director takes into account vacancy rates (2014) and the projected (2019) number of people age 55 and older across the country. The hot spot analysis maps for these variables are shown below.

![](locating_community_images/image7.png)

Vacancy rate hot spot analysis results. The red areas have high housing unit vacancy rates.

![](locating_community_images/image8.png)

### Where are vacancies lowest and demand for retirement housing highest?
The director selects ZIP Codes within statistically significant hot or cold spot areas that meet all of these criteria:

- High demand and low supply of retirement housing opportunities
- Low housing unit vacancy rates
- Large projected age 55 and older populations

He finds that there are 898 ZIP Codes satisfying all three of these criteria. These become the candidate ZIP Codes for further analysis.
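
Selecting the ZIP Codes that meet all three criteria amounts to a filter over the hot spot results. In the sketch below, the ZIP identifiers, the field names `demand_bin`, `vacancy_bin`, and `pop55_bin`, and the values are all hypothetical. The bins are assumed to follow the Gi_Bin convention of ArcGIS Hot Spot Analysis output: +3, +2, and +1 for hot spots at 99, 95, and 90 percent confidence, the matching negative values for cold spots, and 0 for not significant.

```python
def select_candidates(results):
    """Keep ZIP Codes that are statistically significant on all three criteria."""
    return [
        r["zip"]
        for r in results
        if r["demand_bin"] > 0      # hot spot for level of service L
        and r["vacancy_bin"] < 0    # cold spot for housing unit vacancy rate
        and r["pop55_bin"] > 0      # hot spot for projected age 55+ population
    ]


# Hypothetical hot spot results for four ZIP Codes
results = [
    {"zip": "ZIP-1", "demand_bin": 3, "vacancy_bin": -2, "pop55_bin": 1},
    {"zip": "ZIP-2", "demand_bin": 2, "vacancy_bin": 0, "pop55_bin": 3},    # vacancy not significant
    {"zip": "ZIP-3", "demand_bin": -1, "vacancy_bin": -3, "pop55_bin": 2},  # demand cold spot
    {"zip": "ZIP-4", "demand_bin": 1, "vacancy_bin": -1, "pop55_bin": 2},
]

print(select_candidates(results))  # ['ZIP-1', 'ZIP-4']
```
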

![](locating_community_images/image9.png)

## Workflow using the ArcGIS API for Python

Connect to your ArcGIS Online organization.

```python
from getpass import getpass

from arcgis.gis import GIS

# Prompt for the password rather than storing it in the notebook
gis = GIS("https://deldev.maps.arcgis.com", "demo_deldev", getpass())
```

Use the `search()` method on the `content` property of your `gis` object to find the **LocatingRetirementCommunity** feature layer owned by another user. Setting `outside_org=True` extends the search beyond your own organization.

```python
items = gis.content.search('title: LocatingRetirementCommunity owner:lscott_ANGP', 'Feature layer',
                           outside_org=True)
```

Import IPython's `display` function to render each search result in the notebook.

```python
from IPython.display import display

for item in items:
    display(item)
```

```python
LocatingRetirementCommunity = items[0]
```

Because the item is a feature layer collection, its `layers` property returns a list of `FeatureLayer` objects.

```python
lyrs = LocatingRetirementCommunity.layers

for lyr in lyrs:
    print(lyr.properties.name)
```

```
TargetCommunity
Candidates
```


The item contains two layers. The first, **TargetCommunity**, contains the current best-performing retirement community near Knoxville, Tennessee. The second, **Candidates**, contains the 898 ZIP Codes in the continental United States associated with statistically significant hot or cold spot areas for all of these criteria:

- High demand for retirement housing opportunities
- Low supply of retirement housing
- Low housing unit vacancy rates
- Large projected age 55 and older populations

```python
target_community = lyrs[0]
candidates = lyrs[1]
```

```python
# Map centered on Knoxville showing the best-performing community
LocatingRetirementCommunity_map = gis.map('Knoxville', zoomlevel=9)
LocatingRetirementCommunity_map.add_layer(target_community)
LocatingRetirementCommunity_map
```

![](locating_community_images/image10.png)

### Proximity analysis


Proximity analysis tools help you answer one of the most common questions posed in spatial analysis: What is near what?

Proximity tools are available in the `use_proximity` submodule of the `arcgis.features` module. Use the `create_drive_time_areas` tool to create a 5-mile driving-distance area around the best-performing community.

### Create a 5-mile drive-distance polygon around the best-performing community.

```python
from arcgis.features.use_proximity import create_drive_time_areas

# 5-mile driving-distance area around the target community
target_area = create_drive_time_areas(target_community,
                                      break_values=[5],
                                      break_units='Miles',
                                      overlap_policy='Overlap',
                                      output_name='drive_time_areas')
target_area
```

```python
target_area_map = gis.map('Knoxville', zoomlevel=10)
target_area_map.add_layer(target_area)
target_area_map
```

![](locating_community_images/image11.png)

### Determine the top tapestry segments.
You will be looking for ZIP Codes that are similar to the area surrounding the best-performing retirement community. Tapestry variables are a good fit because they summarize many aspects of a population, such as age, income, home value, occupation, education, and consumer spending behavior. To identify the top tapestries within the 5-mile drive-distance area, you will obtain and compare all 68 tapestry segments, along with the LifeMode and Urbanization groups. You will also obtain the tapestry household base variable so you can calculate percentages.

### Enriching study areas


The `enrich_layer` tool gives you demographic and landscape data for the people, places, and businesses in a specific area, or within a selected travel time or distance from a location.

To obtain the tapestry and demographic data for the area, use the `enrich_layer` tool from the `enrich_data` module.

```python
from arcgis.features.enrich_data import enrich_layer

# LifeMode groups (THHGRPL1-14), the tapestry household base, the 68 tapestry
# segments (THH01-THH68), the Urbanization groups (THHGRPU1-6), and three
# additional demographic variables
tapestry_variables = (
    [f"lifemodegroupsNEW.THHGRPL{i}" for i in range(1, 15)]
    + ["AtRisk.THHBASE"]
    + [f"tapestryhouseholdsNEW.THH{i:02d}" for i in range(1, 69)]
    + [f"tapestryhouseholdsNEW.THHGRPU{i}" for i in range(1, 7)]
    + ["KeyUSFacts.POPGRWCYFY", "AtRisk.TOTPOP_CY", "industry.UNEMPRT_CY"]
)

target_area_data = enrich_layer(target_area,
                                analysis_variables=tapestry_variables,
                                output_name="enrich_with_tapestry")
target_area_data
```

```python
data = target_area_data.layers[0]
```

Convert the layer into a spatially enabled pandas DataFrame and determine the four tapestry variables with the largest household counts.

```python
from arcgis.features import SpatialDataFrame

sdf = SpatialDataFrame.from_layer(data)

# Tapestry count fields: 68 segments, 14 LifeMode groups, 6 Urbanization groups,
# plus four p14Seg* fields
tapestry_cols = ([f"THH{i:02d}" for i in range(1, 69)]
                 + [f"THHGRPL{i}" for i in range(1, 15)]
                 + [f"THHGRPU{i}" for i in range(1, 7)]
                 + ['p14Seg1B', 'p14Seg5A', 'p14Seg5B', 'p14Seg8C'])

# The drive-distance area is a single row (index 0): transpose it into a
# column, sort that column, and keep the four largest counts
sdf[tapestry_cols].T.sort_values(by=0, ascending=False).head(4).index
```

```
Index(['THHGRPU3', 'THHGRPU4', 'THHGRPL5', 'THHGRPL1'], dtype='object')
```

### Convert the top four target area tapestry counts to percentages.
Rather than counts, you will want to compare tapestry percentages between each candidate ZIP Code and the target area.
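
Counts largely reflect how many households an area has, so a large ZIP Code can resemble the target simply because it is big; shares of the household base remove that size effect. A minimal plain-Python sketch of the conversion follows, using hypothetical counts. It returns `None` when the household base is zero, since a share is undefined for an area with no households.

```python
def tapestry_shares(counts, base):
    """Convert tapestry household counts into shares of the household base.

    counts -- mapping of tapestry field name to household count
    base   -- total tapestry households in the area (THHBASE)
    """
    if base == 0:
        # No households: a share is undefined, so leave every field empty
        return {field: None for field in counts}
    return {field: count / base for field, count in counts.items()}


# Hypothetical counts for an area with 1,000 households
print(tapestry_shares({"THHGRPU3": 415, "THH17": 200}, 1000))
# {'THHGRPU3': 0.415, 'THH17': 0.2}
print(tapestry_shares({"THHGRPU3": 0, "THH17": 0}, 0))
# {'THHGRPU3': None, 'THH17': None}
```
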

Create four new fields to hold the tapestry percentages. To add fields, access the layer through its feature service, which gives you a `FeatureLayer` whose `manager` can update the layer definition.

A feature service serves a collection of feature layers and tables, along with the relationships among them. In the ArcGIS API for Python it is represented by `arcgis.features.FeatureLayerCollection`.

You can construct a `FeatureLayerCollection` from a feature service item with `fromitem()`:

```python
from arcgis.features import FeatureLayerCollection

target_data = FeatureLayerCollection.fromitem(target_area_data)
```

The layers and tables in a `FeatureLayerCollection` are available through its `layers` and `tables` properties, respectively:

```python
data_layer = target_data.layers[0]
```

```python
data_layer.manager.add_to_definition({"fields": [{"name": "THHGRPU3PERC",
                                                  "type": "esriFieldTypeDouble",
                                                  "alias": "2017 HHs: Urbanization Group 3 PERC",
                                                  "nullable": True, "editable": True, "length": 256}]})
```

```
{'success': True}
```

```python
data_layer.manager.add_to_definition({"fields": [{"name": "THHGRPU4PERC",
                                                  "type": "esriFieldTypeDouble",
                                                  "alias": "2017 HHs: Urbanization Group 4 PERC",
                                                  "nullable": True, "editable": True, "length": 256}]})
```

```
{'success': True}
```

```python
data_layer.manager.add_to_definition({"fields": [{"name": "THHGRPL5PERC",
                                                  "type": "esriFieldTypeDouble",
                                                  "alias": "2017 HHs: LifeMode Group 5 PERC",
                                                  "nullable": True, "editable": True, "length": 256}]})
```

```
{'success': True}
```

```python
data_layer.manager.add_to_definition({"fields": [{"name": "THH17PERC",
                                                  "type": "esriFieldTypeDouble",
                                                  "alias": "2017 HHs in Tapestry Seg 5B PERC",
                                                  "nullable": True, "editable": True, "length": 256}]})
```

```
{'success': True}
```

```python
target_area_data
```

Refresh the feature service so that the new fields are picked up.

```python
target_data.manager.refresh()
```

```
{'success': True}
```

The new fields now exist but are still empty:

```python
df = SpatialDataFrame.from_layer(data_layer)
df[['THHGRPU4PERC', 'THHGRPU3PERC', 'THHGRPL5PERC', 'THH17PERC']]
```

|   | THHGRPU4PERC | THHGRPU3PERC | THHGRPL5PERC | THH17PERC |
|---|---|---|---|---|
| 0 |   |   |   |   |


Calculate the percentages and store them in the new fields. Each value is the tapestry count divided by the tapestry household base, `THHBASE`, so it is stored as a proportion between 0 and 1.

```python
data_layer.calculate('1=1',
                     calc_expression=[{"field": "THHGRPU3PERC", "sqlExpression": "THHGRPU3 / THHBASE"}])
```

```
{'success': True, 'updatedFeatureCount': 1}
```

```python
data_layer.calculate('1=1',
                     calc_expression=[{"field": "THHGRPU4PERC", "sqlExpression": "THHGRPU4 / THHBASE"}])
```

```
{'success': True, 'updatedFeatureCount': 1}
```

```python
data_layer.calculate('1=1',
                     calc_expression=[{"field": "THHGRPL5PERC", "sqlExpression": "THHGRPL5 / THHBASE"}])
```

```
{'success': True, 'updatedFeatureCount': 1}
```

```python
data_layer.calculate('1=1',
                     calc_expression=[{"field": "THH17PERC", "sqlExpression": "THH17 / THHBASE"}])
```

```
{'success': True, 'updatedFeatureCount': 1}
```

```python
sf = SpatialDataFrame.from_layer(data_layer)
sf[['THHGRPU3PERC', 'THHGRPU4PERC', 'THHGRPL5PERC', 'THH17PERC']]
```

|   | THHGRPU3PERC | THHGRPU4PERC | THHGRPL5PERC | THH17PERC |
|---|---|---|---|---|
| 0 | 0.4148 | 0.329753 | 0.295511 | 0.200296 |


### Obtain the same data for the candidate ZIP Codes.


```python
# The same enrichment variables used for the target area
tapestry_variables = (
    [f"lifemodegroupsNEW.THHGRPL{i}" for i in range(1, 15)]
    + ["AtRisk.THHBASE"]
    + [f"tapestryhouseholdsNEW.THH{i:02d}" for i in range(1, 69)]
    + [f"tapestryhouseholdsNEW.THHGRPU{i}" for i in range(1, 7)]
    + ["KeyUSFacts.POPGRWCYFY", "AtRisk.TOTPOP_CY", "industry.UNEMPRT_CY"]
)

candidates_data = enrich_layer(candidates,
                               analysis_variables=tapestry_variables,
                               output_name="enrich_candidates_with_tapestry")
```

```python
cand_data = FeatureLayerCollection.fromitem(candidates_data)
cand_layer = cand_data.layers[0]
```

```python
cand_layer.manager.add_to_definition({"fields": [{"name": "THHGRPU3PERC",
                                                  "type": "esriFieldTypeDouble",
                                                  "alias": "2017 HHs: Urbanization Group 3 PERC",
                                                  "nullable": True, "editable": True, "length": 256}]})
```

```
{'success': True}
```

```python
cand_layer.manager.add_to_definition({"fields": [{"name": "THHGRPU4PERC",
                                                  "type": "esriFieldTypeDouble",
                                                  "alias": "2017 HHs: Urbanization Group 4 PERC",
                                                  "nullable": True, "editable": True, "length": 256}]})
```

```
{'success': True}
```

```python
cand_layer.manager.add_to_definition({"fields": [{"name": "THHGRPL5PERC",
                                                  "type": "esriFieldTypeDouble",
                                                  "alias": "2017 HHs: LifeMode Group 5 PERC",
                                                  "nullable": True, "editable": True, "length": 256}]})
```

```
{'success': True}
```

```python
cand_layer.manager.add_to_definition({"fields": [{"name": "THH17PERC",
                                                  "type": "esriFieldTypeDouble",
                                                  "alias": "2017 HHs in Tapestry Seg 5B PERC",
                                                  "nullable": True, "editable": True, "length": 256}]})
```

```
{'success': True}
```

```python
rf = SpatialDataFrame.from_layer(cand_layer)
rf.THHBASE.sort_values().head()
```

```
789    0
469    0
715    0
854    0
369    0
Name: THHBASE, dtype: int64
```

Notice that some of the base counts are zero; computing percentages for these ZIP Codes would cause a zero divide. Use `THHBASE > 0` as the `where` clause of each calculation so these ZIP Codes are skipped and their percentage fields stay empty.

```python
cand_layer.calculate('THHBASE > 0',
                     calc_expression=[{"field": "THHGRPU3PERC", "sqlExpression": "THHGRPU3 / THHBASE"}])
```

```
{'success': True, 'updatedFeatureCount': 886}
```

```python
cand_layer.calculate('THHBASE > 0',
                     calc_expression=[{"field": "THHGRPU4PERC", "sqlExpression": "THHGRPU4 / THHBASE"}])
```

```
{'success': True, 'updatedFeatureCount': 886}
```

```python
cand_layer.calculate('THHBASE > 0',
                     calc_expression=[{"field": "THHGRPL5PERC", "sqlExpression": "THHGRPL5 / THHBASE"}])
```

```
{'success': True, 'updatedFeatureCount': 886}
```

```python
cand_layer.calculate('THHBASE > 0',
                     calc_expression=[{"field": "THH17PERC", "sqlExpression": "THH17 / THHBASE"}])
```

```
{'success': True, 'updatedFeatureCount': 886}
```

```python
cf = SpatialDataFrame.from_layer(cand_layer)
cf[['THHGRPU3PERC', 'THHGRPU4PERC', 'THHGRPL5PERC', 'THH17PERC']]
```

|   | THHGRPU3PERC | THHGRPU4PERC | THHGRPL5PERC | THH17PERC |
|---|---|---|---|---|
| 0 | 0.000000 | 0.771362 | 0.176322 | 0.000000 |
| 1 | 0.037032 | 0.000000 | 0.000000 | 0.000000 |
| 2 | 0.000000 | 0.222418 | 0.000000 | 0.000000 |
| 3 | 0.000000 | 1.000000 | 0.000000 | 0.000000 |
| 4 | 0.000000 | 0.851347 | 0.000000 | 0.000000 |
| 5 | 0.000000 | 0.310435 | 0.000000 | 0.000000 |
| 6 | 0.000000 | 1.000000 | 0.000000 | 0.000000 |
| 7 | 0.057197 | 0.344327 | 0.000000 | 0.000000 |
| 8 | 0.109071 | 0.488638 | 0.000000 | 0.000000 |
| 9 | 0.067009 | 0.829081 | 0.000000 | 0.000000 |


### Rank the candidate ZIP Codes by their similarity to the target area.

```python
from arcgis.features.find_locations import find_similar_locations

# Rank the candidate ZIP Codes by similarity to the target area; keep the top 4
top_4_most_similar_results = find_similar_locations(
    data_layer, cand_layer,
    analysis_fields=['THHGRPU3', 'THHGRPU4', 'THHGRPL5', 'THH17',
                     'POPDENS14', 'FAMGRW10_14', 'UNEMPRT_CY'],
    output_name="top_4_similar_locations",
    number_of_results=4)
top_4_most_similar_results
```
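
`find_similar_locations` runs in the hosted analysis service, so its exact computation is not shown here. The plain-Python sketch below illustrates one common approach to attribute similarity, as an assumption rather than a description of the service: standardize each field to z-scores so that fields with large ranges do not dominate, then rank candidates by the sum of squared differences from the target. The field names and values are hypothetical.

```python
import statistics


def rank_by_similarity(target, candidates, fields):
    """Rank candidates from most to least similar to the target.

    target     -- mapping of field name to value for the target area
    candidates -- mapping of candidate name to a field-value mapping
    fields     -- fields to compare
    """
    rows = [target] + list(candidates.values())
    # Mean and standard deviation of each field over target + candidates
    stats = {}
    for f in fields:
        values = [row[f] for row in rows]
        stats[f] = (statistics.mean(values), statistics.pstdev(values))

    def z(row, f):
        mean, sd = stats[f]
        return 0.0 if sd == 0 else (row[f] - mean) / sd

    # Sum of squared differences between standardized values
    scores = {
        name: sum((z(row, f) - z(target, f)) ** 2 for f in fields)
        for name, row in candidates.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical tapestry shares for a target area and three candidate ZIP Codes
target = {"u3": 0.41, "u4": 0.33}
cands = {
    "ZIP-A": {"u3": 0.40, "u4": 0.34},
    "ZIP-B": {"u3": 0.00, "u4": 1.00},
    "ZIP-C": {"u3": 0.20, "u4": 0.50},
}
print(rank_by_similarity(target, cands, ["u3", "u4"]))  # ['ZIP-A', 'ZIP-C', 'ZIP-B']
```
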

```python
map1 = gis.map('Atlanta', zoomlevel=10)
map1.add_layer(top_4_most_similar_results)

map2 = gis.map('Houston', zoomlevel=10)
map2.add_layer(top_4_most_similar_results)
```

```python
from ipywidgets import HBox, Layout

# Show the two maps side by side, each taking an equal share of the width
map1.layout = Layout(flex='1 1', padding='10px')
map2.layout = Layout(flex='1 1', padding='10px')

box = HBox([map1, map2])
box
```

![](locating_community_images/image12.png)

Two of the top four ZIP Codes are near Houston and two are near Atlanta.