# IBM Data Science Professional Certificate   
## Capstone Project by PETER C. PALMER


### Table of Contents

    i.   Introduction 
    ii.  Data  
    iii. Methodology 
    iv.  Results 
    v.   Discussion 
    vi.  Conclusion 

### Introduction
P. James, an electrician with over a decade of experience on residential and commercial projects, wants to start an electrical company in Orlando, Florida. He has done electrical work in Orlando before, but he is not sure which part of the city is the best place to set up an office. He knows that he will need financing, so one criterion is proximity to a lending institution, such as a bank or credit union. It would also help if there are few registered electrical companies in the area, which reduces competition. With the relevant licenses, an electrical company can be located in one county while its electricians work in another county, or in several other counties.

### Business problem
The objective is to help Mr. James make the best decision about office space by using Foursquare location data, data science techniques, web scraping, and a machine learning algorithm. With these tools we will try to answer the question: can we determine the best neighborhood in Orlando for Mr. James to set up an electrical company?

### Data
In order to solve this problem, we will need the following data: 
* a list of neighborhoods in Orlando;   
* latitude and longitude of the neighborhoods;
* venue data, from Orlando, identifying banks and electrical companies.

The list of neighborhoods in Orlando can be found on the Wikipedia page https://en.wikipedia.org/wiki/Category:Neighborhoods_in_Orlando,_Florida. The page presents the neighborhoods as a bulleted list showing twenty-six neighborhoods. To access this data we will use web scraping with the BeautifulSoup package. For the latitude and longitude of each neighborhood we will use the geocoder package, and then join the names and coordinates in a single dataframe. The coordinates are needed so that the neighborhoods can be plotted as markers on a map and the different clusters can be visualized. Finally, the Foursquare API supplies venue information that lets us drill down into each neighborhood and its surrounding venues.

### Methodology 

The methodology can be summarized under eight subsections: import and install the libraries, scrape the website for the neighborhoods, get the coordinates of the neighborhoods, create a map of Orlando, Florida, explore the neighborhoods in Orlando, analyze the neighborhoods, form the clusters with the clustering algorithm, and examine the clusters.

The necessary libraries must be imported, and installed first where they are missing. Each tool used in the analysis depends on a library; without it the notebook raises an error and the analysis cannot be completed. The libraries can be grouped as web scraping libraries, dataframe and data manipulation libraries, machine learning libraries, and plotting libraries.

The list of neighborhoods comes from the Wikipedia category page given in the Data section. We start the web scraping process by sending a GET request to the page, saving the response text in a variable, and using BeautifulSoup to parse it. We then create an empty list to hold the neighborhood names. On the Wikipedia page the names sit inside a div tag with class "mw-category", in a list of li tags. We finish the web scraping part by appending each name to the list and creating a dataframe, which we call df.

We need the coordinates of the neighborhoods in Orlando so that we can place them on the map of Orlando. To get the coordinates, we define a function; after we call it for every neighborhood, the results are stored in a dataframe. We therefore have two separate dataframes, one with the neighborhoods and one with the coordinates. The final step here is to merge these two dataframes into one.

We create a map of Orlando. Before the map is created, we find the coordinates of Orlando using the Nominatim geolocator from geopy. With these coordinates we use folium to draw the map and superimpose the neighborhood markers on it.

We explore the neighborhoods in Orlando, FL by using the Foursquare API. Foursquare credentials (client id and client secret) and an API version are necessary; the credentials are obtained by setting up an account on Foursquare. We use them to request up to 100 venues within a radius of 1000 meters of each neighborhood and store the results. We then create a dataframe from the list of venues and check the first five rows. Some of the venue categories in those rows are Electronics Store, Latin American Restaurant, Breakfast Spot, and Bakery. We can determine the number of venues returned for each neighborhood, which gives an idea of the density of businesses there. We can also determine the number of unique categories and print some of them. Are there any electrical companies nearby? A check for the category "Electrical Company" finds none.

We analyze the neighborhoods in Orlando by using one-hot encoding. We include the neighborhoods column in the one-hot encoded dataframe, then reduce it to a smaller dataframe by grouping the rows by neighborhood and taking the mean of each category column, which gives the frequency of each category. Then we look at the banks only, to meet the criterion set by Mr. James.

We form the clusters with the k-means algorithm, using three clusters. We create a new dataframe with the bank frequency of each neighborhood and add the clustering labels 0, 1, and 2. We then merge the dataframes to add the coordinates of each neighborhood, and sort the result by cluster label. Lastly, in this section, we create a map of Orlando, this time showing the clusters. Finally, we examine the clusters individually.






### 1. Here we import all the necessary libraries that will be used in the analysis.

In [3]:
# library to handle data in vector form 
import numpy as np 

# library for data analysis
import pandas as pd 
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# library to handle JSON files from Foursquare API
import json 

# convert an address into its coordinates and get the coordinates of a location
from geopy.geocoders import Nominatim 
!pip install geocoder
import geocoder 

# library to handle requests
import requests 
# library to parse HTML and XML documents
from bs4 import BeautifulSoup 

# transform JSON file into a pandas dataframe
from pandas.io.json import json_normalize 

# Libraries for plotting
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means machine learning algorithm 
from sklearn.cluster import KMeans

# library for drawing maps
!pip install folium
import folium # map rendering library

print("Libraries imported.")

Libraries imported.


### 2. Here we scrape the Wikipedia website for the neighborhoods in Orlando.

In [6]:
# Send a GET request to the wikipedia webpage and save the information in orc 
URL = "https://en.wikipedia.org/wiki/Category:Neighborhoods_in_Orlando,_Florida"
orc = requests.get(URL).text

In [13]:
# We will use beautifulsoup to parse the data from the webpage 
# and save the data in orc_soup.
orc_soup = BeautifulSoup(orc, 'html.parser')


In [15]:
# Now we create an empty list to store neighborhood data that will be retrieved from 
# the webpage. 
neigh_list = []

# The data can be found in a div tag with class "mw-category", and then in a
# list with the li tag. Each entry is appended to the empty list.
for row in orc_soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neigh_list.append(row.text)


In [16]:
# Create a data frame which we call df and view the first five rows
df = pd.DataFrame({"Neighborhood": neigh_list})
df.head()

Unnamed: 0,Neighborhood
0,"List of neighborhoods in Orlando, Florida"
1,"Azalea Park, Florida"
2,"Callahan, Orlando, Florida"
3,"Central Business District, Orlando, Florida"
4,College Park (Orlando)


In [17]:
# Drop the first row, it is not a neighborhood, and reset the index
# for the dataframe. 

df.drop(df.loc[df['Neighborhood'] == "List of neighborhoods in Orlando, Florida"].index, inplace=True)
df=df.reset_index()
del df['index']
df.head()

Unnamed: 0,Neighborhood
0,"Azalea Park, Florida"
1,"Callahan, Orlando, Florida"
2,"Central Business District, Orlando, Florida"
3,College Park (Orlando)
4,"Conway, Florida"


In [18]:
# Look at shape of the dataframe
df.shape

(26, 1)
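
The scraped names still carry Wikipedia disambiguation text such as ", Orlando, Florida" or "(Orlando)". As a minimal sketch (the helper `clean_name` is hypothetical and not part of the original notebook), these suffixes could be stripped before geocoding, so that a query which appends the city and state does not repeat them:

```python
import re

def clean_name(name):
    """Strip Wikipedia disambiguation suffixes from a neighborhood title.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    # Remove a trailing "(Orlando)" qualifier, e.g. "College Park (Orlando)".
    name = re.sub(r"\s*\(Orlando\)$", "", name)
    # Remove a trailing ", Florida" or ", Orlando, Florida".
    name = re.sub(r"(,\s*Orlando)?,\s*Florida$", "", name)
    return name.strip()

# Titles copied from the dataframe output above.
titles = [
    "Azalea Park, Florida",
    "Callahan, Orlando, Florida",
    "College Park (Orlando)",
]
cleaned = [clean_name(t) for t in titles]
print(cleaned)
```

Whether shorter names actually geocode better is an assumption that would need to be checked against the geocoder's results.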

### 3. Here we will get the coordinates of the neighborhoods and add them to our dataframe. 

In [21]:
# Define a function that will get coordinates of the neighborhoods.
def lat_long(neighborhood):
    # initialize the variable
    lat_long_coord = None
    # loop until the geocoder returns coordinates
    while(lat_long_coord is None):
        g = geocoder.arcgis('{}, Orlando, Florida'.format(neighborhood))
        lat_long_coord = g.latlng
    return lat_long_coord
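
The loop above keeps calling the geocoder until it answers, so a name that can never be resolved would stall the notebook. A minimal sketch of a bounded retry follows. The names `geocode_with_retry`, `max_tries`, and `fake_lookup` are illustrative assumptions; in the notebook the lookup would be a small wrapper around `geocoder.arcgis(...).latlng`.

```python
import time

def geocode_with_retry(lookup, query, max_tries=3, delay=0.0):
    """Call lookup(query) until it returns coordinates or the tries run out.

    `lookup` is any callable that returns [lat, lng] or None.
    Returns None if every attempt fails.
    """
    for _ in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay)  # pause before the next attempt
    return None

# Stand-in for a geocoding service: fails once, then answers.
calls = {"count": 0}

def fake_lookup(query):
    calls["count"] += 1
    if calls["count"] < 2:
        return None
    return [28.51737, -81.33511]

result = geocode_with_retry(fake_lookup, "Conway, Orlando, Florida")
missing = geocode_with_retry(lambda q: None, "Nowhere", max_tries=2)
print(result, missing)
```

A `None` result would have to be handled (dropped or looked up by hand) before the coordinates are turned into a dataframe.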


In [23]:
# Store the coordinates in coord.
coord = [ lat_long(neighborhood) for neighborhood in df["Neighborhood"].tolist() ]

In [25]:
# Create a dataframe of coordinates and merge this with the neighborhoods. 
df_lat_long = pd.DataFrame(coord, columns=['Latitude', 'Longitude'])
df = pd.concat([df, df_lat_long], axis=1)
df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,"Azalea Park, Florida",28.53283,-81.31041
1,"Callahan, Orlando, Florida",28.53834,-81.37924
2,"Central Business District, Orlando, Florida",28.53834,-81.37924
3,College Park (Orlando),28.535289,-81.440574
4,"Conway, Florida",28.51737,-81.33511
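
Geocoded coordinates are worth a sanity check before they are mapped. The sketch below computes the great-circle (haversine) distance from each neighborhood to the center of Orlando and flags anything beyond a threshold. The coordinates are copied from the notebook's own outputs (the Orlando center and the Orlando Design District row are printed in later cells); the 50 km threshold and the function name are assumptions chosen for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

center = (28.5421109, -81.3790304)  # Orlando, as printed by the notebook
points = {
    "Azalea Park, Florida": (28.53283, -81.31041),
    "Conway, Florida": (28.51737, -81.33511),
    "Orlando Design District": (25.81278, -80.19211),
}

MAX_KM = 50.0  # assumed cutoff for "plausibly inside the Orlando area"
suspect = [
    name
    for name, (lat, lon) in points.items()
    if haversine_km(center[0], center[1], lat, lon) > MAX_KM
]
print(suspect)  # ['Orlando Design District']
```

A flagged row suggests the geocoder matched a place outside Orlando, so its venues and cluster label should be treated with caution.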


### 4. Here, we create a map of Orlando, Fl and superimpose the neighborhood markers on it.  

In [26]:
# Find the coordinates of Orlando using the Nominatim geolocator
address = 'Orlando, Florida'

geolocator = Nominatim(user_agent="orc_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Orlando, Fl are {}, {}.'.format(latitude, longitude))

The coordinates of Orlando, Fl are 28.5421109, -81.3790304.


In [80]:
# Create the map of Orlando, Fl, using the latitude and longitude values, and folium
map_orlando = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_orlando)  
    
map_orlando

### 5. Here, we explore the neighborhoods in Orlando, Fl by using the Foursquare API.

In [82]:
# Use your Foursquare Credentials and Version 
CLIENT_ID = 'Your Client Id' # your Foursquare ID
CLIENT_SECRET = 'Your Client Secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: Your Client Id
CLIENT_SECRET: Your Client Secret


In [33]:
# Let us get the top 100 venues that are within a radius of 1000 meters and store them in 
# venues. 
radius = 1000
LIMIT = 100

# Create an empty list for the venues.
venues = [] 

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):

    # The Foursquare API request
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius,
        LIMIT)

    # The GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']

    # Keep only the relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat,
            long,
            venue['venue']['name'],
            venue['venue']['location']['lat'],
            venue['venue']['location']['lng'],
            venue['venue']['categories'][0]['name']))
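
The loop above indexes straight into the JSON response, which raises an error if a venue comes back without a category or the response has no groups. A minimal sketch of a more defensive extraction step follows. The sample payload is invented for illustration and only mimics the nesting that the loop relies on (`response`, `groups`, `items`, `venue`); `extract_venues` is a hypothetical helper, not part of the original notebook.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into a list of venue tuples."""
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((
            neighborhood,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Invented payload: the first venue reuses values from the notebook output,
# the second is a made-up venue without a category.
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Wawa",
                           "location": {"lat": 28.52825, "lng": -81.30997},
                           "categories": [{"name": "Breakfast Spot"}]}},
                {"venue": {"name": "Unlabeled Venue",
                           "location": {"lat": 28.53, "lng": -81.31},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample, "Azalea Park, Florida", 28.53283, -81.31041)
print(rows)
```

Rows with a `None` category can then be dropped or inspected before the one-hot encoding step.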


In [36]:
# Create a dataframe from the list of venues and check the first five rows.
# Some of the venue categories that we see in the first five rows of the venues
# dataframe are electronic store, Latin American Restaurant, breakfast spot, and bakery. 
ven_df = pd.DataFrame(venues)

# Assign the columns of ven_df with the following names
ven_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'Ven. Name', 'Ven. Latitude', 'Ven. Longitude', 'Ven. Category']

# print(ven_df.shape)
ven_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Ven. Name,Ven. Latitude,Ven. Longitude,Ven. Category
0,"Azalea Park, Florida",28.53283,-81.31041,Macdroid Store,28.534682,-81.31061,Electronics Store
1,"Azalea Park, Florida",28.53283,-81.31041,Oh! Que Bueno Restaurant,28.529,-81.310034,Latin American Restaurant
2,"Azalea Park, Florida",28.53283,-81.31041,Wawa,28.52825,-81.30997,Breakfast Spot
3,"Azalea Park, Florida",28.53283,-81.31041,Taino's Bakery and Deli,28.53876,-81.308669,Bakery
4,"Azalea Park, Florida",28.53283,-81.31041,IHOP,28.533353,-81.310336,Breakfast Spot


In [38]:
# We can determine the number of venues returned for each neighborhood. This gives us an
# idea of the density of the businesses in the neighborhood.  
ven_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,Ven. Name,Ven. Latitude,Ven. Longitude,Ven. Category
"Azalea Park, Florida",24,24,24,24,24,24
"Callahan, Orlando, Florida",100,100,100,100,100,100
"Central Business District, Orlando, Florida",100,100,100,100,100,100
College Park (Orlando),5,5,5,5,5,5
"Conway, Florida",52,52,52,52,52,52
"Delaney Park, Orlando, Florida",18,18,18,18,18,18
Downtown Orlando,100,100,100,100,100,100
"Education Village, Orlando, Florida",100,100,100,100,100,100
"Eola, Orlando, Florida",100,100,100,100,100,100
Griffin Park Historic District,29,29,29,29,29,29


In [48]:
# We can determine the number of unique categories and print some of them.  

print('There are {} unique categories.'.format(len(ven_df['Ven. Category'].unique())))
print('These are some of the unique categories:',ven_df['Ven. Category'].unique()[:25])

There are 184 unique categories.
These are some of the unique categories: ['Electronics Store' 'Latin American Restaurant' 'Breakfast Spot' 'Bakery'
 'Fried Chicken Joint' 'Gas Station' 'Discount Store' 'Park'
 'Argentinian Restaurant' 'Video Store' 'Convenience Store'
 'Mobile Phone Shop' 'Fast Food Restaurant' 'Donut Shop' 'Intersection'
 'Toll Plaza' 'Home Service' 'Hobby Shop' 'Theater' 'Hotel'
 'American Restaurant' 'Lounge' 'Steakhouse' 'Smoke Shop' 'Burger Joint']


In [57]:
# Are there any electrical companies nearby? No venue carries this exact category label.
"Electrical Company" in ven_df['Ven. Category'].unique()

False
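
The membership test above only matches the exact label `"Electrical Company"`. A looser search, sketched below with pandas, looks for keywords anywhere in the category name. The small dataframe reuses a few of the categories printed above; the keyword list is an assumption, and the notebook does not establish under which label, if any, Foursquare files electricians.

```python
import pandas as pd

# A few categories copied from the list of unique categories above.
sample = pd.DataFrame({
    "Ven. Category": [
        "Electronics Store",
        "Home Service",
        "Bakery",
        "Gas Station",
        "Breakfast Spot",
    ]
})

# Assumed keywords that might indicate an electrical contractor.
keywords = ["electric", "home service"]
pattern = "|".join(keywords)

# Case-insensitive substring match on the category name.
mask = sample["Ven. Category"].str.contains(pattern, case=False, regex=True)
matches = sorted(sample.loc[mask, "Ven. Category"].unique())
print(matches)
```

Note that "Electronics Store" does not match the keyword "electric", so a broad keyword search still needs a manual look at what it returns.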

### 6. Here, we do an analysis of the neighborhoods.  

In [59]:
# We will use one hot encoding. 
orl_hot = pd.get_dummies(ven_df[['Ven. Category']], prefix="", prefix_sep="")

# Include the neighborhoods column in the one hot encoding dataframe 
orl_hot['Neighborhoods'] = ven_df['Neighborhood'] 

# Get the neighborhood column to be the first column
fixed_columns = [orl_hot.columns[-1]] + list(orl_hot.columns[:-1])
orl_hot = orl_hot[fixed_columns]

print(orl_hot.shape)
orl_hot.head()

(1554, 185)


Unnamed: 0,Neighborhoods,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Big Box Store,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Business Service,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cemetery,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Electronics Store,Empanada Restaurant,English Restaurant,Farmers Market,Fast Food Restaurant,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,Gay Bar,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,History Museum,Hobby Shop,Home Service,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Lake,Latin American Restaurant,Lawyer,Leather Goods Store,Library,Lighthouse,Liquor Store,Lounge,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Optical Shop,Outdoors & Recreation,Paintball Field,Paper / Office Supplies Store,Parade,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pool,Professional & Other Places,Pub,Public Art,Ramen Restaurant,Rental Car Location,Residential Building (Apartment / Condo),Resort,Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shop & Service,Shopping Mall,Skating Rink,Smoke Shop,Soccer Field,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Storage Facility,Sushi Restaurant,Taco Place,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Toll Plaza,Tourist Information Center,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Azalea Park, Florida",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Azalea Park, Florida",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Azalea Park, Florida",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Azalea Park, Florida",0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Azalea Park, Florida",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [61]:
# Let us make this into a smaller dataframe by grouping the rows by the neighborhood 
# and taking the mean of the frequency of each category. 
orl_grouped = orl_hot.groupby(["Neighborhoods"]).mean().reset_index()

print(orl_grouped.shape)
orl_grouped.head()

(26, 185)


Unnamed: 0,Neighborhoods,American Restaurant,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Big Box Store,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Business Service,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cemetery,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Electronics Store,Empanada Restaurant,English Restaurant,Farmers Market,Fast Food Restaurant,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,Gay Bar,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,History Museum,Hobby Shop,Home Service,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Lake,Latin American Restaurant,Lawyer,Leather Goods Store,Library,Lighthouse,Liquor Store,Lounge,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Optical Shop,Outdoors & Recreation,Paintball Field,Paper / Office Supplies Store,Parade,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pool,Professional & Other Places,Pub,Public Art,Ramen Restaurant,Rental Car Location,Residential Building (Apartment / Condo),Resort,Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shop & Service,Shopping Mall,Skating Rink,Smoke Shop,Soccer Field,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Storage Facility,Sushi Restaurant,Taco Place,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Toll Plaza,Tourist Information Center,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Azalea Park, Florida",0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Callahan, Orlando, Florida",0.07,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.11,0.0,0.0,0.01,0.02,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.07,0.03,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
2,"Central Business District, Orlando, Florida",0.07,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.11,0.0,0.0,0.01,0.02,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.07,0.03,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
3,College Park (Orlando),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Conway, Florida",0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.038462,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057692,0.019231,0.0,0.0,0.0,0.038462,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.038462,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.019231,0.0,0.038462,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.076923,0.0,0.038462,0.0,0.019231,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057692,0.038462,0.0,0.0,0.0,0.0,0.019231,0.0


In [64]:
# Let us look at the banks only; neighborhoods with banks nearby are
# candidate areas for the office.
orl_bank = orl_grouped[["Neighborhoods","Bank"]]
orl_bank.head()

Unnamed: 0,Neighborhoods,Bank
0,"Azalea Park, Florida",0.0
1,"Callahan, Orlando, Florida",0.0
2,"Central Business District, Orlando, Florida",0.0
3,College Park (Orlando),0.0
4,"Conway, Florida",0.038462
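
Each value in the Bank column is the share of a neighborhood's returned venues that are banks. Conway, for example, returned 52 venues, and 0.038462 corresponds to 2 / 52. The toy data below is invented to show how one-hot encoding followed by a grouped mean produces such frequencies; the neighborhood names `A` and `B` are placeholders.

```python
import pandas as pd

# Invented venue list: neighborhood A has 1 bank among 4 venues, B has none.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Ven. Category": ["Bank", "Bakery", "Bakery", "Park", "Park", "Park"],
})

# One column per category, 1.0 where the venue belongs to that category.
onehot = pd.get_dummies(toy["Ven. Category"], dtype=float)
onehot.insert(0, "Neighborhoods", toy["Neighborhood"])

# The mean of a 0/1 column within a group is the category's frequency there.
freq = onehot.groupby("Neighborhoods").mean().reset_index()
print(freq)

# The Conway value from the notebook: 2 banks out of 52 venues.
conway_bank = round(2 / 52, 6)
print(conway_bank)
```

Because the values are shares rather than counts, a neighborhood with few returned venues can show a high frequency from a single bank.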


### 7. Here, we form the clusters and use the k-means algorithm.


In [65]:
# Let us use three clusters
orl_clust = 3

orl_clustering = orl_bank.drop(columns=["Neighborhoods"])

# Fit the k-means clustering algorithm
kmeans = KMeans(n_clusters=orl_clust, random_state=0).fit(orl_clustering)

In [69]:
# Create a new dataframe with the bank frequency and cluster label for each neighborhood.
orl_merged = orl_bank.copy()

# Include the clustering labels, 0,1, and 2
orl_merged["Cluster Labels"] = kmeans.labels_

orl_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
orl_merged.head()

Unnamed: 0,Neighborhood,Bank,Cluster Labels
0,"Azalea Park, Florida",0.0,0
1,"Callahan, Orlando, Florida",0.0,0
2,"Central Business District, Orlando, Florida",0.0,0
3,College Park (Orlando),0.0,0
4,"Conway, Florida",0.038462,1
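
The integer labels that k-means assigns are arbitrary: label 2 does not imply more banks than label 1. In the table above, cluster 1 (Conway) has a bank frequency of 0.038462, while the notebook's cluster 2 output lists Lake Eola Heights Historic District at 0.01. A minimal sketch of relabeling clusters in order of their centers follows. The `labels` and `centers` arrays are hand-written stand-ins for `kmeans.labels_` and `kmeans.cluster_centers_`, and `relabel_by_center` is a hypothetical helper.

```python
import numpy as np

def relabel_by_center(labels, centers):
    """Renumber clusters so that label 0 has the smallest center value."""
    order = np.argsort(np.asarray(centers).ravel())  # old labels, sorted by center
    mapping = np.empty_like(order)
    mapping[order] = np.arange(len(order))           # old label -> rank
    return mapping[np.asarray(labels)]

# Stand-in values: cluster 0 has no banks, cluster 1 the highest frequency,
# cluster 2 a lower non-zero frequency.
labels = np.array([0, 0, 1, 2, 0])
centers = np.array([[0.0], [0.038462], [0.01]])

ordered = relabel_by_center(labels, centers)
print(ordered.tolist())  # [0, 0, 2, 1, 0]
```

After relabeling, a higher label always means a higher bank frequency, which makes the cluster descriptions easier to state correctly.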


In [70]:
# Merge data frames and add the coordinates of each neighborhood
orl_merged = orl_merged.join(df.set_index("Neighborhood"), on="Neighborhood")

print(orl_merged.shape)
orl_merged.head() # check the last columns!

(26, 5)


Unnamed: 0,Neighborhood,Bank,Cluster Labels,Latitude,Longitude
0,"Azalea Park, Florida",0.0,0,28.53283,-81.31041
1,"Callahan, Orlando, Florida",0.0,0,28.53834,-81.37924
2,"Central Business District, Orlando, Florida",0.0,0,28.53834,-81.37924
3,College Park (Orlando),0.0,0,28.535289,-81.440574
4,"Conway, Florida",0.038462,1,28.51737,-81.33511


In [71]:
# Sort the results by Cluster Labels 0,1,2.
orl_merged.sort_values(["Cluster Labels"], inplace=True)
orl_merged

Unnamed: 0,Neighborhood,Bank,Cluster Labels,Latitude,Longitude
0,"Azalea Park, Florida",0.0,0,28.53283,-81.31041
23,"South Eola, Orlando, Florida",0.0,0,28.477134,-81.468852
22,"Parramore, Orlando, Florida",0.0,0,28.53834,-81.37924
21,Parramore,0.0,0,28.528715,-81.388873
20,Orlando Main Street Program,0.0,0,28.49257,-81.534016
19,Orlando Design District,0.0,0,25.81278,-80.19211
18,MetroWest (Orlando),0.0,0,28.519929,-81.47337
17,"Lake Nona, Orlando, Florida",0.0,0,28.53834,-81.37924
16,"Lake Nona South, Orlando, Florida",0.0,0,28.398841,-81.250447
15,"Lake Nona Estates, Orlando, Florida",0.0,0,28.398841,-81.250447


In [81]:
# Create a map showing the clusters. 
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(orl_clust)
ys = [i+x+(i*x)**2 for i in range(orl_clust)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, one color per cluster label
for lat, lon, poi, cluster in zip(orl_merged['Latitude'], orl_merged['Longitude'], orl_merged['Neighborhood'], orl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 8. Here, we will examine the clusters. 

In [77]:
# In cluster zero. 
orl_merged.loc[orl_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Bank,Cluster Labels,Latitude,Longitude
0,"Azalea Park, Florida",0.0,0,28.53283,-81.31041
23,"South Eola, Orlando, Florida",0.0,0,28.477134,-81.468852
22,"Parramore, Orlando, Florida",0.0,0,28.53834,-81.37924
21,Parramore,0.0,0,28.528715,-81.388873
20,Orlando Main Street Program,0.0,0,28.49257,-81.534016
19,Orlando Design District,0.0,0,25.81278,-80.19211
18,MetroWest (Orlando),0.0,0,28.519929,-81.47337
17,"Lake Nona, Orlando, Florida",0.0,0,28.53834,-81.37924
16,"Lake Nona South, Orlando, Florida",0.0,0,28.398841,-81.250447
15,"Lake Nona Estates, Orlando, Florida",0.0,0,28.398841,-81.250447


In [78]:
# In cluster one. 
orl_merged.loc[orl_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Bank,Cluster Labels,Latitude,Longitude
4,"Conway, Florida",0.038462,1,28.51737,-81.33511


In [79]:
# In cluster two. 
orl_merged.loc[orl_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Bank,Cluster Labels,Latitude,Longitude
13,Lake Eola Heights Historic District,0.01,2,28.54984,-81.37062


### Results

The results obtained with the k-means clustering algorithm show that we can group the neighborhoods in Orlando into three clusters based on the frequency of occurrence of banks among the venues returned for each neighborhood. In the output shown above, the clusters are:
* cluster 0: neighborhoods in Orlando with no banks among the returned venues;
* cluster 1: neighborhoods in Orlando with the highest bank frequency (Conway, about 0.038);
* cluster 2: neighborhoods in Orlando with a low bank frequency (Lake Eola Heights Historic District, 0.01).

Therefore, Mr. James can choose to set up his company office in the neighborhoods in cluster 1 or cluster 2.

### Discussion 

We observed that there were no electrical companies among the venue categories. At first glance, the lack of electrical companies in the clusters suggests that there is great potential for Mr. James to establish his company there. However, I believe that this result may be an error rather than a true absence: the check only looks for the exact category label "Electrical Company", so one possibility is that electrical companies in these neighborhoods are listed under a different label, or under business names that do not contain the word electrical. With added criteria for selecting the clusters, instead of just financial institutions, we could form better clusters. This research could be extended to dig deeper into the clusters and start to look at properties available for rent. For that we would need other criteria about the type of space that Mr. James needs, for example price, number of rooms, and amenities.

### Conclusion

In this project, which set out to find the best neighborhood for Mr. James to set up an office, we were able to suggest two clusters of neighborhoods that would be good candidates. To complete the project we scraped a website for a list of the neighborhoods, got the coordinates of the neighborhoods by geocoding, created maps of Orlando, Florida, explored and analysed the neighborhoods of interest, and used a clustering algorithm to create and examine the clusters. There are opportunities for further work towards finding not just a potential neighborhood for the office but an actual rental property for Mr. James.