## Introduction/Business Problem
A restaurant chain is looking for locations in the Tampa, FL area that will put it in the best position to succeed. To estimate how well each candidate area is likely to perform, the chain needs to know:
  - What's the general population of the surrounding area?
      - What is the population density per square mile?
  - Is the surrounding area growing?
  - Among the residents of each postal region:
      - What is the employment rate?
      - Do residents have disposable income?
  - What are the property tax rates in the area?



## Data Sources

This analysis uses the following data sources:
  - Geographic information sourced from: https://www.latlong.net/convert-address-to-lat-long.html
  - Demographic information sourced from: http://www.city-data.com/zipmaps/Tampa-Florida.html

In [1]:
#Import Libraries
import pandas as pd
import numpy as np

from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors

import requests 

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium 

print("Libraries Imported")

Libraries Imported


In [5]:
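import types
from botocore.client import Config
import ibm_boto3

# Helper from the Watson Studio data-loading snippet; it is meant to let
# pandas treat the streaming object as file-like.
def __iter__(self): return 0

# Source information was converted to CSV and then loaded to studio.
# NOTE: `body` is not defined in this cell as shown. It is presumably the
# streaming object for the uploaded CSV; the storage client setup and
# credentials are omitted here.
df1 = pd.read_csv(body)
df1.head()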


,Zip,population2016,population2010,costoflivingindex,landarea,waterarea,popdensity,permalepop,propertytax,medianhomevalue,medianincome,unemploymentrate,latitude,longitude
0,33602,14353,11515,97.9,2.5,0.4,5630,0.541,0.01,282916,71594,0.04,27.95365,-82.45804
1,33603,19992,19100,95.1,4.1,0.1,4834,0.482,0.007,147840,39366,0.068,27.98444,-82.46339
2,33604,39011,35485,94.4,7.4,0.4,5241,0.488,0.007,121868,37987,0.103,28.01816,-82.45746
3,33605,18581,17073,94.1,7.8,1.4,2373,0.484,0.006,86351,26517,0.137,27.96241,-82.43287
4,33606,18792,17746,100.8,3.3,0.9,5725,0.474,0.008,480479,66282,0.041,27.93964,-82.47066
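
The growth question from the introduction can be answered directly from the two population columns. The following is a minimal sketch, assuming only the five rows printed above; the helper name `add_growth` and the derived column `growth_2010_2016` are illustrative and do not appear in the original notebook.

```python
import io
import pandas as pd

# First five rows as printed by df1.head() above (a subset of the columns).
sample_csv = """Zip,population2016,population2010,landarea,popdensity
33602,14353,11515,2.5,5630
33603,19992,19100,4.1,4834
33604,39011,35485,7.4,5241
33605,18581,17073,7.8,2373
33606,18792,17746,3.3,5725
"""

def add_growth(df):
    """Return a copy with the fractional population change from 2010 to 2016."""
    out = df.copy()
    out["growth_2010_2016"] = (
        out["population2016"] - out["population2010"]
    ) / out["population2010"]
    return out

sample = add_growth(pd.read_csv(io.StringIO(sample_csv)))
print(sample[["Zip", "growth_2010_2016"]].round(3))
```

A derived growth column like this could also be passed to the clustering step in place of the two raw population counts, although the notebook itself clusters on the raw columns.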


In [6]:
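from sklearn.preprocessing import StandardScaler

# Drop the first column (Zip) so the identifier is not used as a feature.
# Note that latitude and longitude remain in the feature set.
X = df1.values[:,1:]
# Replace any missing values with 0 before scaling.
X = np.nan_to_num(X)
# Standardize each feature to zero mean and unit variance.
cluster_dataset = StandardScaler().fit_transform(X)
cluster_dataset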

array([[-9.34397722e-01, -1.07024663e+00,  1.39971886e+00,
        -8.94665473e-01, -6.65896399e-01,  1.12142412e+00,
         2.10017208e+00, -2.01351254e-01,  8.60331128e-01,
         1.04118267e+00, -9.91733488e-01, -5.31158451e-01,
         1.60935770e-01],
       [-5.60818610e-01, -5.15539130e-01, -1.89151197e-01,
        -6.74581205e-01, -1.16531870e+00,  6.15551952e-01,
        -1.25758807e-01, -2.20326765e-01, -4.25616703e-01,
        -6.77431987e-01,  1.95637866e-02, -6.89312512e-02,
         7.77400502e-02],
       [ 6.99174497e-01,  6.82731457e-01, -5.86368711e-01,
        -2.20657404e-01, -6.65896399e-01,  8.74207697e-01,
         1.00607046e-01, -2.20326765e-01, -6.72874818e-01,
        -7.50969581e-01,  1.28368538e+00,  4.37281843e-01,
         1.69955119e-01],
       [-6.54296200e-01, -6.63778037e-01, -7.56604788e-01,
        -1.65636337e-01,  9.98844598e-01, -9.48457360e-01,
        -5.03035228e-02, -2.26651936e-01, -1.01100305e+00,
        -1.36262746e+00,  2.51168921e
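
Standardizing matters here because k-means uses Euclidean distance, so a large-valued column such as `medianhomevalue` would otherwise dominate small-valued ones such as `unemploymentrate`. The sketch below shows what `StandardScaler` computes, using two columns from the five rows printed earlier. The resulting numbers will not match the array above, which was scaled over all rows of the full dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two columns from the first five rows shown earlier:
# population2016 and medianincome.
X = np.array([
    [14353, 71594],
    [19992, 39366],
    [39011, 37987],
    [18581, 26517],
    [18792, 66282],
], dtype=float)

scaled = StandardScaler().fit_transform(X)

# Same computation by hand: subtract the column mean and divide by the
# population standard deviation (ddof=0, which is NumPy's default).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))
```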

In [8]:
num_clusters = 5

k_means = KMeans(init="k-means++", n_clusters=num_clusters, n_init=12)
k_means.fit(cluster_dataset)
labels = k_means.labels_

print(labels)

[2 3 1 3 2 3 2 1 3 1 1 1 1 3 1 3 0 3 4 2 3 3 3 0]
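
The notebook fixes `num_clusters = 5` without showing how that value was chosen. One common way to sanity-check the choice is to compare mean silhouette scores across several values of k. The sketch below is an illustration only: it runs on synthetic data from `make_blobs` rather than on `cluster_dataset`, the helper name `silhouette_by_k` is hypothetical, and `random_state` is added for reproducibility even though the original call does not set it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for cluster_dataset: three well-separated groups.
X_demo, _ = make_blobs(
    n_samples=150,
    centers=[[-10, -10], [0, 0], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

def silhouette_by_k(X, k_values, seed=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(init="k-means++", n_clusters=k, n_init=12, random_state=seed)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

scores = silhouette_by_k(X_demo, range(2, 7))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

To apply the same check to the real data, `X_demo` would be replaced with `cluster_dataset`. Setting `random_state` also keeps the cluster label numbers stable between runs, which matters when later cells refer to a cluster by number.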


In [9]:
df1["Labels"] = labels
df1.head(5)

,Zip,population2016,population2010,costoflivingindex,landarea,waterarea,popdensity,permalepop,propertytax,medianhomevalue,medianincome,unemploymentrate,latitude,longitude,Labels
0,33602,14353,11515,97.9,2.5,0.4,5630,0.541,0.01,282916,71594,0.04,27.95365,-82.45804,2
1,33603,19992,19100,95.1,4.1,0.1,4834,0.482,0.007,147840,39366,0.068,27.98444,-82.46339,3
2,33604,39011,35485,94.4,7.4,0.4,5241,0.488,0.007,121868,37987,0.103,28.01816,-82.45746,1
3,33605,18581,17073,94.1,7.8,1.4,2373,0.484,0.006,86351,26517,0.137,27.96241,-82.43287,3
4,33606,18792,17746,100.8,3.3,0.9,5725,0.474,0.008,480479,66282,0.041,27.93964,-82.47066,2


In [11]:
df1.groupby('Labels').mean()

Labels,Zip,population2016,population2010,costoflivingindex,landarea,waterarea,popdensity,permalepop,propertytax,medianhomevalue,medianincome,unemploymentrate,latitude,longitude
0,33633.0,51009.0,44802.5,94.9,30.25,2.25,1657.0,0.507,0.012,191237.0,59730.5,0.0695,28.034745,-82.37047
1,33612.142857,43449.571429,40377.571429,94.728571,9.985714,0.642857,4657.714286,0.487714,0.008429,126846.571429,36406.142857,0.086857,28.027443,-82.459419
2,33611.5,19243.25,17224.5,98.175,3.75,0.575,5206.0,0.49725,0.009,386331.5,77789.75,0.03725,27.939978,-82.48597
3,33618.6,19741.9,18479.7,94.61,6.24,0.64,3573.8,0.4711,0.0085,162585.9,50576.4,0.0656,27.986552,-82.485694
4,33621.0,2418.0,1643.0,98.7,8.3,1.5,290.0,0.52,0.8,179538.0,58439.0,0.067,27.84974,-82.48365


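## Findings - Cluster 2 Zip Codes Found Favorable
- Based on the cluster means above, the most favorable areas are the zip codes in cluster 2. Cluster numbers refer to the run shown; k-means label numbering can change between runs unless `random_state` is fixed.
    - This group had the best overall results across the variables most relevant to future profit:
        - Population density: highest of the five clusters (mean 5,206)
        - Population growth 2010 to 2016: about 11.7% on mean population (17,224.5 to 19,243.25)
        - Lower unemployment: lowest mean rate of the five clusters (0.037)
        - Favorable tax rates: mean property tax rate of 0.009
        - Strong cost of living index: mean 98.2
        - Population with disposable income: highest mean median income (77,789.75)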
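
The comparison behind these findings can be made explicit by ranking the clusters on each criterion. The sketch below copies selected cluster means from the `groupby` output above. The `growth` column and the direction of each ranking (higher is better for density, income and growth; lower is better for unemployment and property tax) are assumptions added for illustration, not part of the original notebook.

```python
import io
import pandas as pd

# Cluster means copied from the groupby output above (selected columns).
means_csv = """Labels,population2016,population2010,costoflivingindex,popdensity,propertytax,medianincome,unemploymentrate
0,51009.0,44802.5,94.9,1657.0,0.012,59730.5,0.0695
1,43449.571429,40377.571429,94.728571,4657.714286,0.008429,36406.142857,0.086857
2,19243.25,17224.5,98.175,5206.0,0.009,77789.75,0.03725
3,19741.9,18479.7,94.61,3573.8,0.0085,50576.4,0.0656
4,2418.0,1643.0,98.7,290.0,0.8,58439.0,0.067
"""
means = pd.read_csv(io.StringIO(means_csv), index_col="Labels")

# Growth of the mean population between 2010 and 2016 (derived column).
means["growth"] = means["population2016"] / means["population2010"] - 1

# Rank 1 = most favorable under the assumed direction for each criterion.
ranks = pd.DataFrame({
    "popdensity": means["popdensity"].rank(ascending=False),
    "medianincome": means["medianincome"].rank(ascending=False),
    "growth": means["growth"].rank(ascending=False),
    "unemploymentrate": means["unemploymentrate"].rank(ascending=True),
    "propertytax": means["propertytax"].rank(ascending=True),
})
print(ranks.astype(int))
```

On these means, cluster 2 ranks first for population density, median income and unemployment. The growth and property tax columns of the output show where it stands on the remaining criteria.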