# 1. Introduction

## Background

I was recently offered two jobs: one is based in Miami, FL; the other is based in Columbia, SC. I'm really torn on which job to take, given that:
1. both jobs are with reputable companies;
2. both jobs are a great fit for my background and experience, where I can continue doing what I've been trained for and keep getting better at it;
3. the people I'd report to at both companies are very respectable and easy to get along with;
4. one is a well-established multinational firm, while the other is a leading regional firm where I would have more exposure to senior management;
5. both offer competitive pay.

Since I'm having a hard time deciding between the two offers on career progression alone, I'm going to leverage what I learned in the machine learning module and the previous Capstone project to look at the dining and entertainment options in both locations. Hopefully, this will help me make a more informed decision.

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available in pandas 1.0 and later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


# 2. Data


In this section, I'm going to get the neighborhood locations for the two cities: Miami and Columbia.

## 2.1 Get neighborhood information for Miami, FL

### 2.1.1 Get coordinates for Miami

In [3]:
miami = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Miami')

miami = miami[0]
miami = miami.drop([11,25])
miami.head()

Unnamed: 0,Neighborhood,Demonym,Population2010,Population/Km²,Sub-neighborhoods,Coordinates
0,Allapattah,,54289,4401,,25.815-80.224
1,Arts & Entertainment District,,11033,7948,,25.799-80.190
2,Brickell,Brickellite,31759,14541,West Brickell,25.758-80.193
3,Buena Vista,,9058,3540,Buena Vista East Historic District and Design ...,25.813-80.192
4,Coconut Grove,Grovite,20076,3091,"Center Grove, Northeast Coconut Grove, Southwe...",25.712-80.257


### 2.1.2 Split coordinates into latitude and longitude

In [4]:
for index, row in miami.iterrows():
    miami.loc[index,'Lat'] = miami.Coordinates[index].split('-')[0]
    miami.loc[index,'Lng'] = '-'+miami.Coordinates[index].split('-')[1]

miami.Lat = miami.Lat.astype(float)
miami.Lng = miami.Lng.astype(float)
miami.head()

Unnamed: 0,Neighborhood,Demonym,Population2010,Population/Km²,Sub-neighborhoods,Coordinates,Lat,Lng
0,Allapattah,,54289,4401,,25.815-80.224,25.815,-80.224
1,Arts & Entertainment District,,11033,7948,,25.799-80.190,25.799,-80.19
2,Brickell,Brickellite,31759,14541,West Brickell,25.758-80.193,25.758,-80.193
3,Buena Vista,,9058,3540,Buena Vista East Historic District and Design ...,25.813-80.192,25.813,-80.192
4,Coconut Grove,Grovite,20076,3091,"Center Grove, Northeast Coconut Grove, Southwe...",25.712-80.257,25.712,-80.257
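
The row-by-row loop in 2.1.2 could also be written as a vectorized split. This is only a minimal sketch: it assumes every coordinate string has the form `lat-lng` with a positive latitude and a negative longitude, as in the rows shown above, and the `sample` dataframe is a stand-in for `miami`.

```python
import pandas as pd

# Stand-in for the miami dataframe, using two rows from the output above
sample = pd.DataFrame({
    'Neighborhood': ['Allapattah', 'Brickell'],
    'Coordinates': ['25.815-80.224', '25.758-80.193'],
})

# Split once on the hyphen: the left part is the latitude,
# the right part is the longitude without its minus sign
parts = sample['Coordinates'].str.split('-', n=1, expand=True)
sample['Lat'] = parts[0].astype(float)
sample['Lng'] = -parts[1].astype(float)  # restore the sign removed by the split

print(sample[['Neighborhood', 'Lat', 'Lng']])
```

This approach would break for locations with negative latitudes or positive longitudes, so it fits the Miami table but is not a general coordinate parser.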


### 2.1.3 Generate a map for the neighborhoods with available coordinates for Miami, FL

In [9]:
# Note: the geolocator can be unstable sometimes; rerun this cell if a request fails
address = 'Miami, FL'

geolocator = Nominatim(user_agent="hw")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Miami are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Miami are 25.7742658, -80.1936589.


In [10]:

map_miami = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(miami['Lat'], miami['Lng'], miami['Neighborhood']):
    
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_miami)  
    
map_miami

## 2.2 Get neighborhood data for Columbia, SC

### 2.2.1 Get coordinates for Columbia

Since I couldn't find a tabulated list of neighborhoods and their coordinates for Columbia online, I instead used an article about the characteristics of the neighborhoods in Columbia, SC. In this subsection, I will manually type in the name of each neighborhood and use the *geolocator.geocode* function to find its coordinates. 
In addition, *geolocator.geocode* had trouble finding the correct coordinates for some neighborhoods, so I will wrap the call in *try*/*except* and focus only on the neighborhoods with usable coordinates.
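
Instead of rerunning a cell by hand when the geocoder fails, the call could be retried automatically. The helper below is a hypothetical sketch, not part of geopy: `geocode_with_retry`, `attempts`, and `wait_seconds` are names invented here, and it assumes that a failed request raises an exception while an address that is simply not found returns `None`.

```python
import time

def geocode_with_retry(geocode, address, attempts=3, wait_seconds=1.0):
    """Call geocode(address), retrying when the call raises an exception.

    Returns whatever geocode returns (possibly None when the address is
    not found), or None if every attempt raised.
    """
    for attempt in range(attempts):
        try:
            return geocode(address)
        except Exception:
            # wait before the next attempt, but not after the last one
            if attempt < attempts - 1:
                time.sleep(wait_seconds)
    return None

# Hypothetical usage with the geolocator defined in this notebook:
# location = geocode_with_retry(geolocator.geocode, 'Shandon, SC')
```

Passing the geocoding function in as an argument keeps the helper independent of any particular geocoder.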

#### 2.2.2 Note: the geolocator can be unstable sometimes; rerun the cell if a request fails
 

In [6]:
columbia_neighborhoods = ['Melrose Heights, SC','Cottontown, SC','Shandon, SC','Forest Acres, SC','Forest Hills, SC','Heathwood, SC','Rosewood, SC','Wildewood, SC','Lake Carolina,SC','Spring Valley,SC']

columbia = pd.DataFrame(columns = ['Neighborhood','Lat','Lng'])

# create the geocoder once and reuse it for every neighborhood
geolocator = Nominatim(user_agent="hw")

for i, neigh in enumerate(columbia_neighborhoods):
    try:
        location = geolocator.geocode(neigh)
        # geocode returns None when nothing is found, so the attribute
        # access below raises and the neighborhood is skipped
        latitude = location.latitude
        longitude = location.longitude
        print('The geographical coordinates of', neigh, 'are {}, {}.'.format(latitude, longitude))
        columbia.loc[i,'Neighborhood'] = neigh
        columbia.loc[i,'Lat'] = latitude
        columbia.loc[i,'Lng'] = longitude
    except Exception:
        # skip neighborhoods that could not be geocoded
        pass
    
columbia

The geographical coordinates of Melrose Heights, SC are 34.0065447, -81.002591.
The geographical coordinates of Shandon, SC are 33.9982115, -81.0056467.
The geographical coordinates of Forest Acres, SC are 34.0193221, -80.9898128.
The geographical coordinates of Forest Hills, SC are 34.9793027, -81.2331298.
The geographical coordinates of Heathwood, SC are 33.9990448, -80.9864797.
The geographical coordinates of Rosewood, SC are 34.9126262, -81.8526023.
The geographical coordinates of Wildewood, SC are 34.1043185, -80.8820322.
The geographical coordinates of Lake Carolina,SC are 34.17484295, -80.8862830494948.
The geographical coordinates of Spring Valley,SC are 34.9112573, -80.929798.


Unnamed: 0,Neighborhood,Lat,Lng
0,"Melrose Heights, SC",34.0065,-81.0026
2,"Shandon, SC",33.9982,-81.0056
3,"Forest Acres, SC",34.0193,-80.9898
4,"Forest Hills, SC",34.9793,-81.2331
5,"Heathwood, SC",33.999,-80.9865
6,"Rosewood, SC",34.9126,-81.8526
7,"Wildewood, SC",34.1043,-80.882
8,"Lake Carolina,SC",34.1748,-80.8863
9,"Spring Valley,SC",34.9113,-80.9298
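
Some of the coordinates above sit far from the rest: Forest Hills, Rosewood, and Spring Valley all have latitudes near 34.9, roughly one degree north of the other neighborhoods, which suggests the geocoder may have matched places with the same names elsewhere in South Carolina. One way to flag such results is to measure each point's distance from the city center. The sketch below uses the haversine formula with the center coordinates this notebook obtains for 'Columbia, SC'; the 30 km cutoff (`MAX_KM`) is an assumption chosen for illustration, not a value from the original analysis.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km is the mean Earth radius

# City center as returned by the geocoder for 'Columbia, SC' in this notebook
center = (34.0007493, -81.0343313)

# A few of the geocoded neighborhoods from the output above
geocoded = {
    'Shandon, SC': (33.9982115, -81.0056467),
    'Wildewood, SC': (34.1043185, -80.8820322),
    'Rosewood, SC': (34.9126262, -81.8526023),
    'Spring Valley,SC': (34.9112573, -80.929798),
}

MAX_KM = 30  # assumed cutoff for "belongs to the Columbia area"

# Names whose geocoded point lies farther from the center than the cutoff
suspect = [name for name, (lat, lng) in geocoded.items()
           if haversine_km(center[0], center[1], lat, lng) > MAX_KM]
print(suspect)
```

Flagged neighborhoods could then be geocoded again with a more specific query or dropped before the venue search.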


In [7]:
# Note: the geolocator can be unstable sometimes; rerun this cell if a request fails
address2 = 'Columbia, SC'

geolocator2 = Nominatim(user_agent="hw")
location2 = geolocator2.geocode(address2)
latitude2 = location2.latitude
longitude2 = location2.longitude
print('The geographical coordinates of Columbia are {}, {}.'.format(latitude2, longitude2))

The geographical coordinates of Columbia are 34.0007493, -81.0343313.


### 2.2.3 Generate a map for the neighborhoods with available coordinates for Columbia, SC

In [8]:

map_columbia = folium.Map(location=[latitude2, longitude2], zoom_start=8)

# add markers to map
for lat, lng, label in zip(columbia['Lat'], columbia['Lng'], columbia['Neighborhood']):
    
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_columbia)  
    
map_columbia

## 2.3 Foursquare API Data

In this subsection, I will use the Foursquare API to get the venues in each neighborhood in both Miami and Columbia. 
The resulting data will be used to collect information on the number of venues available as well as the types of venues: for example, how many restaurants, coffee shops, or parks are present in each neighborhood and, for restaurant-type venues, what types of cuisine are available (especially the one 

### 2.3.1 Define Foursquare Credentials and Version

In [14]:
CLIENT_ID = 'XYK2G5XLJP5KUOWHNW3JOGNPEWAN0JJPMIGWI4FR4J4PBOED' # your Foursquare ID
CLIENT_SECRET = 'N2JRIJN1FQX2FUKND1NUTNNQSBBUYKUINHKM0S4ZKV5X2R1Z' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
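
Credentials written directly into a notebook are visible to anyone who reads it. As an alternative, they could be read from environment variables. This is a sketch under assumptions: the helper `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical names chosen here, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables."""
    # Hypothetical variable names; set them in the shell before starting the notebook
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError(
            'Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET before running this cell')
    return client_id, client_secret

# Hypothetical usage in place of the hard-coded values:
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
```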


#### 2.3.1 Define a function that gets the top 100 venues within a given radius of each listed neighborhood in the target cities, Miami and Columbia (borrowed from the lab with minor modifications)

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    # LIMIT (the maximum number of venues per request) is a global that
    # must be defined before this function is called

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        try:
            # make the GET request
            results = requests.get(url).json()['response']['groups'][0]['items']

            # keep only the relevant information for each nearby venue
            venues_list.append([(
                name,
                lat,
                lng,
                v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'],
                v['venue']['categories'][0]['name']) for v in results])
        except Exception:
            # skip this neighborhood if the request fails or the response
            # does not have the expected structure
            print('unable to fetch data')

    # flatten the per-neighborhood lists into one dataframe, one row per venue;
    # passing the column names here also works when no venues were fetched
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
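
To make the parsing step in the function above easier to follow, the sketch below applies the same extraction to a hand-written response, with no network call. The venues in `sample_response` are made up; only the nesting (`response` → `groups` → `items` → `venue`) mirrors what `getNearbyVenues` reads, and `venues_to_rows` is a hypothetical helper name.

```python
import pandas as pd

# Made-up response with the same nesting that getNearbyVenues reads
sample_response = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Example Cafe',
                   'location': {'lat': 25.8151, 'lng': -80.2241},
                   'categories': [{'name': 'Coffee Shop'}]}},
        {'venue': {'name': 'Example Park',
                   'location': {'lat': 25.8149, 'lng': -80.2238},
                   'categories': [{'name': 'Park'}]}},
    ]}]}
}

def venues_to_rows(name, lat, lng, response_json):
    """Turn one explore-style response into a list of flat row tuples."""
    items = response_json['response']['groups'][0]['items']
    return [(name, lat, lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name']) for v in items]

columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
example = pd.DataFrame(
    venues_to_rows('Allapattah', 25.815, -80.224, sample_response),
    columns=columns)
print(example)
```

Each item in the response becomes one row, with the neighborhood's own name and coordinates repeated on every row.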

### 2.3.2 Run the above function on each neighborhood and create two new dataframes called *miami_venues* and *columbia_venues*, respectively.

In [16]:
LIMIT = 100

print("***** Miami ********")
miami_venues   = getNearbyVenues(names=miami['Neighborhood'],
                                   latitudes=miami['Lat'],
                                   longitudes=miami['Lng']
                                  )

print("***** Columbia *******")
columbia_venues = getNearbyVenues(names = columbia['Neighborhood'],
                                 latitudes = columbia['Lat'],
                                 longitudes = columbia['Lng']
                                 )

***** Miami ********
Allapattah
Arts & Entertainment District
Brickell
Buena Vista
Coconut Grove
Coral Way
Design District
Downtown
Edgewater
Flagami
Grapeland Heights
Liberty City
Little Haiti
Little Havana
Lummus Park
Midtown
Overtown
Park West
The Roads
Upper Eastside
Venetian Islands
Virginia Key
West Flagler
Wynwood
***** Columbia *******
Melrose Heights, SC
Shandon, SC
Forest Acres, SC
Forest Hills, SC
Heathwood, SC
Rosewood, SC
Wildewood, SC
Lake Carolina,SC
Spring Valley,SC


### 2.3.3 Check the size of the resulting dataframe

In [19]:
print("Miami:",miami_venues.shape)
miami_venues.head()

Miami: (557, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Allapattah,25.815,-80.224,Three Fingers Liquor & Lounge,25.815523,-80.224406,Lounge
1,Allapattah,25.815,-80.224,noor market,25.818165,-80.224197,Convenience Store
2,Allapattah,25.815,-80.224,Conde Art Gallery,25.818671,-80.224548,Art Gallery
3,Arts & Entertainment District,25.799,-80.19,Bunnie Cakes,25.799544,-80.190953,Cupcake Shop
4,Arts & Entertainment District,25.799,-80.19,Bunbury Miami,25.798284,-80.191118,Wine Shop


In [20]:
print("Columbia:",columbia_venues.shape)
columbia_venues.head()

Columbia: (51, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Melrose Heights, SC",34.006545,-81.002591,Long's Drugs,34.003504,-81.002401,Pharmacy
1,"Melrose Heights, SC",34.006545,-81.002591,New York Butcher Shoppe,34.009827,-81.004168,Butcher
2,"Melrose Heights, SC",34.006545,-81.002591,Mill Creek Pet Food/Grooming Center,34.003133,-81.000803,Pet Store
3,"Melrose Heights, SC",34.006545,-81.002591,Glorious Hill,34.010235,-81.001519,Cocktail Bar
4,"Shandon, SC",33.998211,-81.005647,Craft And Draft,33.998067,-81.005111,Beer Store
