# Store CSV to Store Segments

This Jupyter Notebook demonstrates how, by using ArcGIS, you can start with little more than a list of stores with sales volume and coordinates, and end with stores segmented by similar demographic characteristics.

1. Prepare Input Data
2. Create Drive Time Trade Areas
3. Acquire Demographic Analysis Factors Based on Drive Time Trade Areas Around Stores
4. Segment Stores Using KMeans

### Note: Increased IOPub
For visualization, if you did not start this notebook with an increased data rate limit, stop the notebook, go back to the command line, and start Jupyter Notebook using the following command.

`jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000`

## Prepare Data

Although the CSV file contains coordinate locations for each store, ArcGIS does not yet recognize the data as _spatial_ for subsequent analysis steps. To fix this, we will load the data into a Pandas DataFrame, convert it into an ArcGIS SpatialDataFrame, and finally create an ArcGIS FeatureSet, which we will then use for subsequent analysis.

In [1]:
import pandas as pd
import arcgis

Load the data into a Pandas DataFrame from a CSV file.

In [2]:
df = pd.read_csv('./store_locations.csv', index_col='OBJECTID')
df.head(10)

OBJECTID,LOCNUM,SALESVOL,X,Y
1,666990510,35495,-121.843,36.621
2,653371815,35495,-121.8112,36.6676
3,423468472,35495,-121.9651,36.9753
4,511743478,35495,-121.774,36.9154
5,404459478,52059,-122.0362,37.3231
6,373128867,84715,-121.9907,37.2928
7,402344537,35495,-122.0323,37.3737
8,637354200,35495,-121.8614,37.2505
9,435039879,35495,-121.8039,37.2499
10,230021602,70990,-121.9181,37.2632


While the coordinates for each store are contained in an X (longitude) and Y (latitude) field, the data cannot yet be recognized spatially. We need to create a point geometry for each location in a new field so the data will be recognized as spatial. Once this is done, we can also drop the explicit X and Y fields, since the location is now stored in the SHAPE field.

In [3]:
df['SHAPE'] = df.apply(lambda row: arcgis.geometry.Point({'x': row.X, 'y': row.Y, 'spatialReference': {'wkid': 4326}}), axis=1)
df = df.drop(['X', 'Y'], axis=1)
df.head(10)

OBJECTID,LOCNUM,SALESVOL,SHAPE
1,666990510,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
2,653371815,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
3,423468472,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
4,511743478,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
5,404459478,52059,"{'spatialReference': {'wkid': 4326}, 'x': -122..."
6,373128867,84715,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
7,402344537,35495,"{'spatialReference': {'wkid': 4326}, 'x': -122..."
8,637354200,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
9,435039879,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
10,230021602,70990,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
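
The point construction above amounts to building one Esri JSON style point dictionary per row. As a minimal sketch that assumes nothing beyond the standard library (no ArcGIS installation), the same structure can be produced directly from CSV text. The helper name `row_to_point` and the inline sample are illustrative; the values are copied from the first two records shown earlier.

```python
import csv
import io

# Sample rows copied from the first records shown above.
CSV_TEXT = """OBJECTID,LOCNUM,SALESVOL,X,Y
1,666990510,35495,-121.843,36.621
2,653371815,35495,-121.8112,36.6676
"""


def row_to_point(row, wkid=4326):
    """Build an Esri JSON style point dictionary from a CSV row (hypothetical helper)."""
    return {
        'x': float(row['X']),  # longitude
        'y': float(row['Y']),  # latitude
        'spatialReference': {'wkid': wkid},  # 4326 = WGS84 longitude/latitude
    }


records = []
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    records.append({
        'LOCNUM': int(row['LOCNUM']),
        'SALESVOL': int(row['SALESVOL']),
        'SHAPE': row_to_point(row),  # X and Y now live only inside SHAPE
    })

print(records[0]['SHAPE'])
```

Keeping the spatial reference inside every point is what lets later steps interpret the raw numbers as longitude and latitude.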


Now, with the location data properly formatted as point geometry, we can create a SpatialDataFrame with the store locations so the data will be recognized as spatial data.

In [4]:
sdf = arcgis.features.SpatialDataFrame(df)
sdf.set_geometry(col='SHAPE')  # assign the properly formatted shape field to be recognized by the SpatialDataFrame
sdf.reset_index(inplace=True, drop=True)
sdf.head(10)

Unnamed: 0,LOCNUM,SALESVOL,SHAPE
0,666990510,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
1,653371815,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
2,423468472,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
3,511743478,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
4,404459478,52059,"{'spatialReference': {'wkid': 4326}, 'x': -122..."
5,373128867,84715,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
6,402344537,35495,"{'spatialReference': {'wkid': 4326}, 'x': -122..."
7,637354200,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
8,435039879,35495,"{'spatialReference': {'wkid': 4326}, 'x': -121..."
9,230021602,70990,"{'spatialReference': {'wkid': 4326}, 'x': -121..."


In [5]:
# get a subset to test with, just the first ten records
sdf = sdf[:10]
sdf['SHAPE'].apply(lambda point: (point.x, point.y))

0      (-121.843, 36.621)
1    (-121.8112, 36.6676)
2    (-121.9651, 36.9753)
3     (-121.774, 36.9154)
4    (-122.0362, 37.3231)
5    (-121.9907, 37.2928)
6    (-122.0323, 37.3737)
7    (-121.8614, 37.2505)
8    (-121.8039, 37.2499)
9    (-121.9181, 37.2632)
Name: SHAPE, dtype: object

Convert the SpatialDataFrame to a FeatureSet to use as input for subsequent analysis steps.

__NOTE:__ As of 18Aug2017, this will not work unless you have access to the development repository, because a bug in the `to_featureset` method was only recently fixed.

In [6]:
fs_store_locations = sdf.to_featureset()
fs_store_locations

{"geometryType": "esriGeometryPoint", "fields": [], "features": [{"geometry": {"spatialReference": {"wkid": 4326}, "x": -121.84299999999992, "y": 36.62100000000007}, "attributes": {"LOCNUM": 666990510, "SALESVOL": 35495}}, {"geometry": {"spatialReference": {"wkid": 4326}, "x": -121.84299999999992, "y": 36.62100000000007}, "attributes": {"LOCNUM": 653371815, "SALESVOL": 35495}}, {"geometry": {"spatialReference": {"wkid": 4326}, "x": -121.84299999999992, "y": 36.62100000000007}, "attributes": {"LOCNUM": 423468472, "SALESVOL": 35495}}, {"geometry": {"spatialReference": {"wkid": 4326}, "x": -121.84299999999992, "y": 36.62100000000007}, "attributes": {"LOCNUM": 511743478, "SALESVOL": 35495}}, {"geometry": {"spatialReference": {"wkid": 4326}, "x": -121.84299999999992, "y": 36.62100000000007}, "attributes": {"LOCNUM": 404459478, "SALESVOL": 52059}}, {"geometry": {"spatialReference": {"wkid": 4326}, "x": -121.84299999999992, "y": 36.62100000000007}, "attributes": {"LOCNUM": 373128867, "SALESVO
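
The FeatureSet output above pairs a `geometry` with an `attributes` dictionary for each feature. The following sketch rebuilds that layout with plain dictionaries, purely to illustrate the structure; it is not how `to_featureset` is implemented, and `to_feature` is a hypothetical helper. It also includes a small check that every feature kept its own coordinates, since the output above repeats the same x and y for every store, which may be the bug mentioned in the note.

```python
import json


def to_feature(record, geometry_key='SHAPE'):
    """Split one record into the geometry/attributes pair used by each feature (hypothetical helper)."""
    attributes = {k: v for k, v in record.items() if k != geometry_key}
    return {'geometry': record[geometry_key], 'attributes': attributes}


# Two records with values copied from the store table earlier in the notebook.
records = [
    {'LOCNUM': 666990510, 'SALESVOL': 35495,
     'SHAPE': {'x': -121.843, 'y': 36.621, 'spatialReference': {'wkid': 4326}}},
    {'LOCNUM': 653371815, 'SALESVOL': 35495,
     'SHAPE': {'x': -121.8112, 'y': 36.6676, 'spatialReference': {'wkid': 4326}}},
]

feature_set = {
    'geometryType': 'esriGeometryPoint',
    'fields': [],
    'features': [to_feature(r) for r in records],
}

# Sanity check: distinct stores should yield distinct coordinate pairs.
distinct_xy = {(f['geometry']['x'], f['geometry']['y']) for f in feature_set['features']}
coordinates_look_distinct = len(distinct_xy) == len(records)

print(coordinates_look_distinct)
print(json.dumps(feature_set)[:80])
```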

## Create Drive Time Trade Areas

Instantiate a Web GIS object to use for ArcGIS capabilities.

In [8]:
from getpass import getpass

gis_coldbrew = arcgis.gis.GIS(
    url='http://portal.coldbrew.esri.com/portal',
    username='headless', 
    password=getpass('Please enter the headless password: ')
)

Please enter the headless password: ········


Create a service area layer to use for analysis.

In [9]:
service_area_layer = arcgis.network.ServiceAreaLayer(
    url=gis_coldbrew.properties.helperServices.serviceArea.url, 
    gis=gis_coldbrew
)

Get the travel mode, properly formatted, to use for solving.

In [10]:
travel_modes = service_area_layer.retrieve_travel_modes()
travel_mode_drive = [t for t in travel_modes['supportedTravelModes'] if t['name'] == 'Driving Time'][0]
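
Indexing the list comprehension with `[0]` raises a bare `IndexError` if no travel mode is named 'Driving Time'. A slightly more defensive lookup could look like the sketch below. The function name `find_travel_mode` and the sample dictionary are made up for illustration; a real response from `retrieve_travel_modes` carries many more keys per travel mode.

```python
def find_travel_mode(travel_modes, name):
    """Return the travel mode dictionary with the given name, or raise a descriptive error."""
    modes = travel_modes.get('supportedTravelModes', [])
    match = next((m for m in modes if m.get('name') == name), None)
    if match is None:
        available = sorted(m.get('name', '') for m in modes)
        raise LookupError('Travel mode {!r} not found; available: {}'.format(name, available))
    return match


# Made-up stand-in for the service response, reduced to two keys per mode.
sample_travel_modes = {
    'supportedTravelModes': [
        {'name': 'Walking Time', 'impedanceAttributeName': 'WalkTime'},
        {'name': 'Driving Time', 'impedanceAttributeName': 'TravelTime'},
    ]
}

travel_mode_drive = find_travel_mode(sample_travel_modes, 'Driving Time')
print(travel_mode_drive['name'])
```

Listing the available names in the error message makes it easier to spot a portal whose travel modes are named differently.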

Create a SpatialDataFrame, and a derivative FeatureSet, with the columns formatted for input to the service area solver.

In [11]:
sdf_service_area_input = sdf[['LOCNUM', 'SHAPE']].rename(columns={'LOCNUM': 'Name'})
fs_service_area_input = sdf_service_area_input.to_featureset()

Get the trade areas around the stores.

In [12]:
resp_service_area = service_area_layer.solve_service_area(
    facilities=fs_service_area_input, 
    travel_mode=travel_mode_drive, 
    default_breaks=[8],
    out_sr=4326
)
resp_service_area

{'messages': [],
 'saPolygons': {'features': [{'attributes': {'FacilityID': 1,
     'FromBreak': 0,
     'Name': '666990510 : 0 - 8',
     'ObjectID': 1,
     'Shape_Area': 0.002164947423158558,
     'Shape_Length': 1.5493428284911819,
     'ToBreak': 8},
    'geometry': {'rings': [[[-121.86070632899998, 36.58175468400003],
       [-121.86250305199997, 36.58152961700006],
       [-121.86250305199997, 36.57816124000004],
       [-121.86272811899994, 36.57973289500006],
       [-121.86272811899994, 36.57951164200006],
       [-121.86362648, 36.57995796200004],
       [-121.86362648, 36.58040809600004],
       [-121.86272811899994, 36.58018302900007],
       [-121.86295318599997, 36.58175468400003],
       [-121.86632156399997, 36.58355140700007],
       [-121.86699485799994, 36.58489799500006],
       [-121.86519813499996, 36.58332634000004],
       [-121.86429977399996, 36.584451675000025],
       [-121.86452484099999, 36.58288002000006],
       [-121.86362648, 36.58400154100008],
     

Convert the FeatureSet to a SpatialDataFrame to make it easier to clean up the data.

In [13]:
sdf_service_area = arcgis.features.FeatureSet(resp_service_area['saPolygons']['features']).df
sdf_service_area

Unnamed: 0,FacilityID,FromBreak,Name,ObjectID,Shape_Area,Shape_Length,ToBreak,SHAPE
0,1,0,666990510 : 0 - 8,1,0.002165,1.549343,8,"{'rings': [[[-121.86070632899998, 36.581754684..."
1,2,0,653371815 : 0 - 8,2,0.002165,1.549343,8,"{'rings': [[[-121.86070632899998, 36.581754684..."
2,3,0,423468472 : 0 - 8,3,0.002165,1.549343,8,"{'rings': [[[-121.86070632899998, 36.581754684..."
3,4,0,511743478 : 0 - 8,4,0.002165,1.549343,8,"{'rings': [[[-121.86070632899998, 36.581754684..."
4,5,0,404459478 : 0 - 8,5,0.002165,1.549343,8,"{'rings': [[[-121.86070632899998, 36.581754684..."
5,6,0,373128867 : 0 - 8,6,0.002165,1.549343,8,"{'rings': [[[-121.86070632899998, 36.581754684..."
6,7,0,402344537 : 0 - 8,7,0.002165,1.549343,8,"{'rings': [[[-121.86070632899998, 36.581754684..."
7,8,0,637354200 : 0 - 8,8,0.002165,1.549343,8,"{'rings': [[[-121.86070632899998, 36.581754684..."
8,9,0,435039879 : 0 - 8,9,0.002165,1.549343,8,"{'rings': [[[-121.86070632899998, 36.581754684..."
9,10,0,230021602 : 0 - 8,10,0.002165,1.549343,8,"{'rings': [[[-121.86070632899998, 36.581754684..."


Get the LOCNUM out of the Name field, and drop all the fields besides LOCNUM and SHAPE.

In [14]:
sdf_service_area['LOCNUM'] = sdf_service_area['Name'].apply(lambda name: name.split(':')[0])
sdf_service_area = sdf_service_area.drop(['FacilityID', 'FromBreak', 'ObjectID', 'ToBreak', 
                                        'Name', 'Shape_Area', 'Shape_Length'], axis=1)
sdf_service_area

Unnamed: 0,SHAPE,LOCNUM
0,"{'rings': [[[-121.86070632899998, 36.581754684...",666990510
1,"{'rings': [[[-121.86070632899998, 36.581754684...",653371815
2,"{'rings': [[[-121.86070632899998, 36.581754684...",423468472
3,"{'rings': [[[-121.86070632899998, 36.581754684...",511743478
4,"{'rings': [[[-121.86070632899998, 36.581754684...",404459478
5,"{'rings': [[[-121.86070632899998, 36.581754684...",373128867
6,"{'rings': [[[-121.86070632899998, 36.581754684...",402344537
7,"{'rings': [[[-121.86070632899998, 36.581754684...",637354200
8,"{'rings': [[[-121.86070632899998, 36.581754684...",435039879
9,"{'rings': [[[-121.86070632899998, 36.581754684...",230021602
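
One detail worth noting: the service area names look like `'666990510 : 0 - 8'`, so `split(':')[0]` keeps the space that precedes the colon. The enrichment output later in this notebook shows the value as `'666990510 '`, with that trailing space. If the identifier is later used to join back to the store table, stripping it is probably wise. A minimal sketch, with `locnum_from_name` as a hypothetical helper:

```python
def locnum_from_name(name):
    """Extract the store identifier from a service area name such as '666990510 : 0 - 8'."""
    # partition() splits on the first ':' only; strip() removes the surrounding whitespace.
    return name.partition(':')[0].strip()


names = ['666990510 : 0 - 8', '653371815 : 0 - 8']
locnums = [locnum_from_name(n) for n in names]

# For comparison, the unstripped version keeps the trailing space.
unstripped = names[0].split(':')[0]

print(locnums)
print(repr(unstripped))
```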


Convert the service area SpatialDataFrame to a FeatureSet for geoenrichment.

In [16]:
fs_service_area = sdf_service_area.to_featureset()
fs_service_area

{"geometryType": "esriGeometryPolygon", "fields": [], "features": [{"geometry": {"rings": [[[-121.86070632899998, 36.58175468400003], [-121.86250305199997, 36.58152961700006], [-121.86250305199997, 36.57816124000004], [-121.86272811899994, 36.57973289500006], [-121.86272811899994, 36.57951164200006], [-121.86362648, 36.57995796200004], [-121.86362648, 36.58040809600004], [-121.86272811899994, 36.58018302900007], [-121.86295318599997, 36.58175468400003], [-121.86632156399997, 36.58355140700007], [-121.86699485799994, 36.58489799500006], [-121.86519813499996, 36.58332634000004], [-121.86429977399996, 36.584451675000025], [-121.86452484099999, 36.58288002000006], [-121.86362648, 36.58400154100008], [-121.86295318599997, 36.58377647400005], [-121.86407470699999, 36.58332634000004], [-121.86385154699997, 36.58265495300003], [-121.86340141299996, 36.58310127300007], [-121.86250305199997, 36.58288002000006], [-121.86227798499999, 36.58332634000004], [-121.86182975799994, 36.58288002000006], [

In [17]:
trade_area_map = gis_coldbrew.map('Meades Ranch, KS', 4)
trade_area_map.basemap = 'gray'
trade_area_map

In [19]:
for index, row in sdf_service_area.iterrows():
    trade_area_map.draw(row.SHAPE)

## Perform Geoenrichment

The ArcGIS Python API requires a published layer to use the built-in Geoenrichment method, but we just want to use a FeatureSet as input. We therefore use the API's built-in `post` method, which takes care of token authentication and also handles URL encoding of the payload dictionary (in the manner of `urllib.parse.urlencode`) for the POST call.

In [20]:
trade_area_drive_time = 8  # in minutes
# build the study area options as a JSON string using the drive time above
study_area_options = '{"areaType":"DriveTimeBuffer","bufferUnits":"esriDriveTimeUnitsMinutes",' + \
        '"bufferRadii":' + '[{drive_time}]'.format(drive_time=trade_area_drive_time) + '}'
# study_area_options = '{"areaType":"DriveTimeBuffer","bufferUnits":"esriDriveTimeUnitsMinutes","bufferRadii":[5]}'
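
Assembling JSON by string concatenation is easy to get wrong, since a single stray quote or brace yields an invalid document. As an alternative sketch, the same options string can be generated with `json.dumps`, which guarantees well-formed output. The function name `build_study_area_options` is illustrative and not part of the ArcGIS API.

```python
import json


def build_study_area_options(drive_time_minutes):
    """Serialize drive time study area options to a compact JSON string."""
    options = {
        'areaType': 'DriveTimeBuffer',
        'bufferUnits': 'esriDriveTimeUnitsMinutes',
        'bufferRadii': [drive_time_minutes],
    }
    # separators=(',', ':') drops the default spaces so the string stays compact
    return json.dumps(options, separators=(',', ':'))


trade_area_drive_time = 8  # in minutes
study_area_options = build_study_area_options(trade_area_drive_time)
print(study_area_options)
```

Because the drive time is passed in as a parameter, the options string stays in step with the value used for the service area breaks.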

In [21]:
url_geoenrich = gis_coldbrew.properties.helperServices.geoenrichment.url + "/Geoenrichment/Enrich"
payload = {
    'studyAreas': fs_service_area.features,
#    'analysisVariables': enrichment_variables,
    'dataCollections': '["KeyUSFacts"]',
#    'studyAreasOptions': study_area_options,  # this can be used if AGOL or a correctly configured BA Server is used
    'f': 'json'
}
headers = {
    'content-type': "application/x-www-form-urlencoded",
    'cache-control': "no-cache"
}
resp_enrich = gis_coldbrew._con.post(url_geoenrich, postdata=payload)
resp_enrich

{'messages': [],
 'results': [{'dataType': 'GeoEnrichmentResult',
   'paramName': 'GeoEnrichmentResult',
   'value': {'FeatureSet': [{'displayFieldName': '',
      'features': [{'attributes': {'AREA_ID': '0',
         'AVGHHSZ_CY': 2.96,
         'AVGHINC_CY': 69864,
         'AVGHINC_FY': 76498,
         'AVGVAL_CY': 521829,
         'AVGVAL_FY': 581500,
         'DIVINDX_CY': 87.8,
         'FAMGRW10CY': 0.46,
         'FAMGRWCYFY': 0.65,
         'GQPOP_CY': 226,
         'HHGRW10CY': 0.44,
         'HHGRWCYFY': 0.63,
         'HasData': 1,
         'ID': '0',
         'LOCNUM': '666990510 ',
         'MEDHINC_CY': 53514,
         'MEDHINC_FY': 57510,
         'MEDVAL_CY': 508392,
         'MEDVAL_FY': 567260,
         'MHIGRWCYFY': 1.45,
         'OBJECTID': 1,
         'OWNER_CY': 4716,
         'OWNER_FY': 4825,
         'PCIGRWCYFY': 1.68,
         'PCI_CY': 23916,
         'PCI_FY': 25994,
         'POPGRW10CY': 0.68,
         'POPGRWCYFY': 0.75,
         'RENTER_CY': 8110,
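
To make the role of URL encoding in the POST call more concrete, here is a rough sketch of how a payload containing nested structures might be form-encoded with the standard library. It is an assumption about the general technique, not the actual implementation of the API's `post` method; `encode_payload` is a hypothetical helper, and the study area is reduced to a single point for brevity.

```python
import json
from urllib.parse import parse_qs, urlencode


def encode_payload(payload):
    """Form-encode a payload, serializing nested lists and dictionaries to JSON text first."""
    flat = {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in payload.items()
    }
    return urlencode(flat)


payload = {
    'studyAreas': [{'geometry': {'x': -121.843, 'y': 36.621}}],  # simplified study area
    'dataCollections': '["KeyUSFacts"]',
    'f': 'json',
}

body = encode_payload(payload)

# Decode again to confirm the nested structure survives the round trip.
decoded = parse_qs(body)
round_tripped = json.loads(decoded['studyAreas'][0])
print(round_tripped)
```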
   

In [22]:
fs_enrich = arcgis.features.FeatureSet(
    features=resp_enrich['results'][0]['value']['FeatureSet'][0]['features'], 
    fields=resp_enrich['results'][0]['value']['FeatureSet'][0]['fields']
)
fs_enrich

{"fields": [{"type": "esriFieldTypeOID", "name": "OBJECTID", "alias": "Object ID"}, {"length": 256, "type": "esriFieldTypeString", "name": "ID", "alias": "ID"}, {"length": 256, "type": "esriFieldTypeString", "name": "LOCNUM", "alias": "LOCNUM"}, {"length": 256, "type": "esriFieldTypeString", "name": "sourceCountry", "alias": "sourceCountry"}, {"length": 256, "type": "esriFieldTypeString", "name": "AREA_ID", "alias": "AREA_ID"}, {"type": "esriFieldTypeInteger", "name": "HasData", "alias": "HasData"}, {"length": 256, "type": "esriFieldTypeString", "name": "aggregationMethod", "alias": "aggregationMethod"}, {"units": "count", "component": "demographics", "fullName": "KeyUSFacts.AVGHHSZ_CY", "decimals": 2, "type": "esriFieldTypeDouble", "alias": "2016 Average Household Size", "name": "AVGHHSZ_CY"}, {"units": "currency", "component": "demographics", "fullName": "KeyUSFacts.AVGHINC_CY", "decimals": 0, "type": "esriFieldTypeDouble", "currency": "$", "alias": "2016 Average Household Income", "
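
The chained indexing `resp_enrich['results'][0]['value']['FeatureSet'][0]` fails with an unhelpful `KeyError` or `IndexError` if the service returns an error or an empty result. A more defensive extraction might look like the sketch below. The sample response is a heavily trimmed, hand-written imitation of the structure shown above, and `first_feature_set` is a hypothetical helper.

```python
def first_feature_set(response):
    """Return the first FeatureSet dictionary from an Enrich style response, or raise."""
    results = response.get('results') or []
    if not results:
        raise ValueError('Response has no results: {}'.format(response.get('messages')))
    feature_sets = results[0].get('value', {}).get('FeatureSet') or []
    if not feature_sets:
        raise ValueError('First result contains no FeatureSet')
    return feature_sets[0]


# Trimmed imitation of the response structure, with one field and one feature.
sample_response = {
    'messages': [],
    'results': [{
        'dataType': 'GeoEnrichmentResult',
        'paramName': 'GeoEnrichmentResult',
        'value': {'FeatureSet': [{
            'displayFieldName': '',
            'fields': [{'name': 'AVGHHSZ_CY', 'type': 'esriFieldTypeDouble'}],
            'features': [{'attributes': {'LOCNUM': '666990510 ', 'AVGHHSZ_CY': 2.96}}],
        }]},
    }],
}

enriched = first_feature_set(sample_response)
rows = [feature['attributes'] for feature in enriched['features']]
print(rows[0]['AVGHHSZ_CY'])
```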

In [23]:
df_enrich = fs_enrich.df
df_enrich

OBJECTID,AREA_ID,AVGHHSZ_CY,AVGHINC_CY,AVGHINC_FY,AVGVAL_CY,AVGVAL_FY,DIVINDX_CY,FAMGRW10CY,FAMGRWCYFY,GQPOP_CY,...,TOTHU_CY,TOTHU_FY,TOTPOP00,TOTPOP10,TOTPOP_CY,TOTPOP_FY,VACANT_CY,VACANT_FY,aggregationMethod,sourceCountry
1,0,2.96,69864,76498,521829,581500,87.8,0.46,0.65,226,...,13810,14236,41925,36591,38170,39622,984,1003,BlockApportionment:US.BlockGroups,US
2,1,2.96,69864,76498,521829,581500,87.8,0.46,0.65,226,...,13810,14236,41925,36591,38170,39622,984,1003,BlockApportionment:US.BlockGroups,US
3,2,2.96,69864,76498,521829,581500,87.8,0.46,0.65,226,...,13810,14236,41925,36591,38170,39622,984,1003,BlockApportionment:US.BlockGroups,US
4,3,2.96,69864,76498,521829,581500,87.8,0.46,0.65,226,...,13810,14236,41925,36591,38170,39622,984,1003,BlockApportionment:US.BlockGroups,US
5,4,2.96,69864,76498,521829,581500,87.8,0.46,0.65,226,...,13810,14236,41925,36591,38170,39622,984,1003,BlockApportionment:US.BlockGroups,US
6,5,2.96,69864,76498,521829,581500,87.8,0.46,0.65,226,...,13810,14236,41925,36591,38170,39622,984,1003,BlockApportionment:US.BlockGroups,US
7,6,2.96,69864,76498,521829,581500,87.8,0.46,0.65,226,...,13810,14236,41925,36591,38170,39622,984,1003,BlockApportionment:US.BlockGroups,US
8,7,2.96,69864,76498,521829,581500,87.8,0.46,0.65,226,...,13810,14236,41925,36591,38170,39622,984,1003,BlockApportionment:US.BlockGroups,US
9,8,2.96,69864,76498,521829,581500,87.8,0.46,0.65,226,...,13810,14236,41925,36591,38170,39622,984,1003,BlockApportionment:US.BlockGroups,US
10,9,2.96,69864,76498,521829,581500,87.8,0.46,0.65,226,...,13810,14236,41925,36591,38170,39622,984,1003,BlockApportionment:US.BlockGroups,US
