# Rob Gould's Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone

### Table of Contents
* [Introduction: Business Problem](#intro)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

# Introduction: Business Problem <a name="intro"></a>

Our analysis will focus on how to successfully relocate from one area to another by identifying similar neighborhoods to ease the transition. This analysis would be beneficial for anyone who is looking to relocate or for real estate agents looking to help facilitate relocation. In this analysis, all the user needs to do is update Home town and target relocation spot to run this analysis.

In our sample scenario, Dublin, Ohio is a small city just outside Columbus, Ohio. It is a suburb of around 50,000 people and would be more likely to compare to a Neighborhood or Borough within our target area of relocation, Toronto.

In this scenario, the reason we chose to relocate is that the United States is going through a recent COVID spike that is much larger than other areas. COVID management is different state-to-state here, and we feel it would be best to look outside of the US for a temporary move until the current environment resolves. We chose Toronto based on our willingness to relocate to an area that has the following:

<br>1) Similar lifestyle</br>
<br>2) Outside of the United States due to the country as a whole continiously posting high COVID numbers</br>
<br>3) Ability to speak English while we learn the local language</br>
<br>4) Similar weather as Ohio (cold and not super hot summers)</br>
<br>5) Proximity to Columbus to visit friends</br>

In order to ease the transition, we are targeting a location within Toronto that is most similar to our current residence of Dublin, Ohio. Toronto is a common visiting spot for people within the Ohio area of the United States, and is only a 6 hour drive. This makes it convenient to visit family and friends. 

During this analysis, we will use our skills to pull data from the internet for Dublin and Toronto, use four square to pull neighborhood venues to compare using K-Means for our ideal cluster of locations, use four square again to find a realtor in our new area, and then use Folium to map our new neighborhoods!



# Data <a name="data"></a>

Given our target of finding an ideal neighborhood, we will need the following set of data points:
* Current home town lat - lon
* Neighborhoods within Toronto
* Current home town venue list
* Relocation city (Toronto) venue list to find the most similar neighborhood(s)
* Real Estate Agent in the city to help us determine our best options
* Rating of the Real Estate Agent
* Map detailing our relocation spot

In order to perform our analysis, the following data will be pulled:
* <b> Foursquare API </b>: We will use this to pull Venue information for hometown and our relocation city. We will also pull real estae agent information and rating from this.
* <b> Geocoder API </b>: This will help us determine lat-lon of our hometown and relocation city
* <b> Wikipedia </b>: This will contain information for us to pull the neighborhoods to compare against our hometown
*<b> External CSV of Lat-lon for relocation city </b>

Some of the packages used during this analysis:
* Folium for mapping
* Pandas and Numpy for data prep
* requests for API pulls
* JSON module from Pandas to read our API data
* URLIB for API pulls
* Sklearn K-Means cluster to determine which neighborhood our hometown is most similar to

## Import our Packages

In [8]:
import requests
import urllib.request
import time
import json # library to handle JSON files
import numpy as np

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas.io.json import json_normalize# tranform JSON file into a pandas dataframe
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from urllib.request import urlopen

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Understand current Covid Comparison by using Google to track trends. You can also pull data, but that is outside the scope of this exercise.

<a href ="https://news.google.com/covid19/map?hl=en-US&mid=%2Fm%2F09c7w0&gl=US&ceid=US%3Aen"> Covid Data </a>

## 1. Download data for current residence and prepare

### Adding our Foursquare Credentials and Version

In [3]:
CLIENT_ID = 'NQ0VZRXFYJPMXAEBK0GBQJLL53HEQ435LZ3F4TD15QTI0HO1' # your Foursquare ID
CLIENT_SECRET = 'YLXKDJUPXETBAP3GE1YADVQ3TZHBROLQZNGMUDJNY1350HPP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


#### Set the location of our current residence
Note addressed was removed but the Lat - Lon is the same for Dublin Ohio. Dublin is our resident example.

In [17]:
address = 'Dublin, Ohio 43017'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of our house is {}, {}.'.format(latitude, longitude))

The geograpical coordinate of our house is 40.0992294, -83.1140771.


#### Now, let's get the top 100 venues that are in Dublin, Ohio within a radius of 1,000 meters.
We chose a 1,000 meter radius based on how US cities are laid out where they are less built on neighborhood and more on the suburb itself.

In [18]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### Exploring the data in the dataframe
First step: Get the neighborhood's name.
Second Step: Get the LAT LON of the neighborhood.

In [19]:
LIMIT1 = 100 # limit of number of venues returned by Foursquare API
radius1 = 1000 # define radius
# create URL
url2 = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius1, 
    LIMIT1)
url2 # display URL

# Process the GET request
home_results = requests.get(url2).json()
# results -- commented out so the results do not show on Github

This section cleans the data and prepares for pandas

In [20]:
home_venues = home_results['response']['groups'][0]['items']
    
home_nearby_venues = json_normalize(home_venues) # flatten JSON

# filter columns
home_filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
home_nearby_venues =home_nearby_venues.loc[:, home_filtered_columns]

# filter the category for each row
home_nearby_venues['venue.categories'] = home_nearby_venues.apply(get_category_type, axis=1)

# clean columns
home_nearby_venues.columns = [col.split(".")[-1] for col in home_nearby_venues.columns]

home_nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,La Chatelaine French Bakery & Bistro,French Restaurant,40.099238,-83.115106
1,Woodhouse Day Spa,Spa,40.100023,-83.114122
2,Jeni's Splendid Ice Creams,Ice Cream Shop,40.099411,-83.113908
3,Dublin Village Tavern,Pub,40.098958,-83.113817
4,Downtown Historic Dublin,Neighborhood,40.099338,-83.113996


In [21]:
#Show the table and then print the data / shape
print('{} venues were returned by Foursquare.'.format(home_nearby_venues.shape[0]))

61 venues were returned by Foursquare.


Great! We have 61 venues to compare against other locations. Based on the data in the Categories field you can see we have a good mix of venues.

In [22]:
home = home_nearby_venues

Now we need to manipulate the data into one dataframe and add in our neighborhood and lat lon for later stacking.

In [23]:
home['Neighbourhood'] = 'Dublin, Ohio'
home['Neighbourhood Latitude'] = latitude
home['Neighbourhood Longitude'] = longitude
home.rename(columns={'name':'Venue','categories':'Venue Category','lat':'Venue Latitude','lng':'Venue Longitude'},inplace=True)


In [43]:
home = home[['Neighbourhood','Neighbourhood Latitude','Neighbourhood Longitude','Venue','Venue Latitude','Venue Longitude','Venue Category']]
home.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Dublin, Ohio",40.102535,-83.114388,Condado Tacos,40.102403,-83.114321,Taco Place
1,"Dublin, Ohio",40.102535,-83.114388,Woodhouse Day Spa,40.100023,-83.114122,Spa
2,"Dublin, Ohio",40.102535,-83.114388,Jeni's Splendid Ice Creams,40.099411,-83.113908,Ice Cream Shop
3,"Dublin, Ohio",40.102535,-83.114388,La Chatelaine French Bakery & Bistro,40.099238,-83.115106,French Restaurant
4,"Dublin, Ohio",40.102535,-83.114388,Oscar's,40.101184,-83.113857,Italian Restaurant


## 2. Download data for targeted area and prepare

Let's leverage the pd.read_html module and its ability to read tables from websites. The same exercise could be applied for other cities, with wikipedia have a great selection. Additionally, you can access a lot of great data via an ODBC.


In [70]:
# Leveraging what we learned in week 3, I am using the same table. However, you can use pd.read_html for any table available online.
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df=pd.read_html(url, header=0)[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


The following code creates a new DataFrame, keeping the original for reference. The new DataFrame (df2) removes Boroughs that have a 'Not assigned' value.

In [25]:
df2 = df[~df['Borough'].isin(['Not assigned'])]

Neighbourhoods are already grouped using the "," in Wikipedia.
Code below shows no Neighbourhoods yield a 'Not Assigned' result.

In [26]:
NC = df2[df2['Neighbourhood'].isin(['Not assigned'])]

Lifting from the Week 3 project, we are leveraging the CSV file that contains the Lat and Lon for Toronto. There are a lot of CSV files for download out there, or you can use a geocoder similar to what we do later for our Four Square analysis.

In [27]:
geodata = 'https://cocl.us/Geospatial_data'

In [28]:
geo = pd.read_csv(geodata)

In [29]:
df3 = pd.merge(df2,geo,how='left',on='Postal Code')

Merged the scraped data from Wiki and the geodata, captured through Pandas Read HTML

In [31]:
toronto_data = df3[df3['Borough'].str.contains('Toronto')].reset_index(drop=True)

### Now we need to pull data for our Toronto Neighborhoods

In [32]:

neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_data.loc[0, 'Neighbourhood'] # neighborhood name


In [33]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

# Process the GET request
results = requests.get(url).json()

### Break apart our data for Pandas

In [34]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Cooper Koo Family YMCA,Distribution Center,43.653249,-79.358008
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Impact Kitchen,Restaurant,43.656369,-79.35698


### Now lets Explore Neighborhoods in Toronto
Step 1: Create a function to repeat the same process to all the neighborhoods in Toronto

In [35]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now we will write the code to run the above function on each neighborhood and create a new dataframe called *Toronto_venues*.

In [36]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West, Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
R

### Now we check the size of dataframe and print the table

In [37]:
print(toronto_venues.shape)
toronto_venues.head()

(1630, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


## 3. Merging our data together into one table

Now we need to stack our data and create one dataframe. In order to do this we are going to use the pd.concat function, but first we need to make sure the shapes are the same.

We need to compare the shapes of the dataframes to ensure alignment:

In [38]:
toronto_venues.shape

(1630, 7)

In [39]:
home.shape

(61, 7)

Shapes match (7 columns)! Now we need to merge the two dataframes.

In [40]:
frames = [home,toronto_venues]

In [41]:
df_merge = pd.concat(frames)
df_merge.head()

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  """Entry point for launching an IPython kernel.


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category,Venue Latitude,Venue Longitude
0,"Dublin, Ohio",40.099229,-83.114077,La Chatelaine French Bakery & Bistro,French Restaurant,40.099238,-83.115106
1,"Dublin, Ohio",40.099229,-83.114077,Woodhouse Day Spa,Spa,40.100023,-83.114122
2,"Dublin, Ohio",40.099229,-83.114077,Jeni's Splendid Ice Creams,Ice Cream Shop,40.099411,-83.113908
3,"Dublin, Ohio",40.099229,-83.114077,Dublin Village Tavern,Pub,40.098958,-83.113817
4,"Dublin, Ohio",40.099229,-83.114077,Downtown Historic Dublin,Neighborhood,40.099338,-83.113996


### And then we examine the number of venues per neighbourhood.

In [42]:
df_merge.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category,Venue Latitude,Venue Longitude
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,59,59,59,59,59,59
"Brockton, Parkdale Village, Exhibition Place",23,23,23,23,23,23
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",17,17,17,17,17,17
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16
Central Bay Street,62,62,62,62,62,62
Christie,17,17,17,17,17,17
Church and Wellesley,76,76,76,76,76,76
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,35,35,35,35,35,35
Davisville North,7,7,7,7,7,7


Good! Dublin, compares favorably in the number of venues we pulled. There are some venues that are low in venue count and could add noise into our analysis. Since we are unfamiliar with the area, lets leave them in. But know, you can build a dataframe with that (as we will do later on).

### And finally we look at unique venues.

In [43]:
print('There are {} uniques categories.'.format(len(df_merge['Venue Category'].unique())))

There are 237 uniques categories.


# Methodology

Now that we have our data pulled and assembled, let's review our methodology.

In this exercise we will work to identify ideal neighborhoods for our relocation. We will target ones in Toronto, which is a city similar to our current home city of Dublin, Ohio. 

In our initial step, we collected data based on Dublin, Ohio's geographic location as well as venue categories from foursquare. Next we pulled neighborhoods from Toronto so we could process them through foursquare's API to collect venue information. In order to compare the locations, we need to create one table that contains all information so we can compare the k-means.

Our next step, the Analysis phase, we will work on transforming the data so that we can run this through a k-means comparison to find the clusters for all locations we have provided. Since the categories and data are in the same table, our home town will be added into our relocation city's clusters. From there, we will identify the cluster our home town belongs to and then pull the corresponding neighborhoods to see where we could potentially land. 

In our final step, we will be reviewing our cluster information and pulling a real estate agent's information to help us choose a neighborhood. Before choosing, we will want to review their rating. Lastly, we will want to visualize the locations to get a better understanding of potential areas and visually look at the areas -- maybe there is something that really excites us about one of the areas!

# Analysis 

For our analysis, we chose k-means so we could identify neighborhoods most similar to our current ones. As noted, Dublin and Toronto are already very similar, but if we chose two different cities - say New York to Paris - we would be able to find our ideal relocation spot.

We will use foursquare to help us pull a real estate agent, and then look at their ratings. 

We will be using folio mapping to see the city and proximity to major markers, which will help us potentially rule out an area we previously didn't know we wanted to. If we find something - say a Metro Station in the area - we could add an exclusion to neighborhoods with that venue.

## 4. Analyze Each Neighborhood

We will use pd.get_dummies to turn the categories into columns

In [44]:
# one hot encoding
df_merge_onehot = pd.get_dummies(df_merge[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
df_merge_onehot['Neighbourhood'] = df_merge['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [df_merge_onehot.columns[-1]] + list(df_merge_onehot.columns[:-1])
df_merge_onehot = df_merge_onehot[fixed_columns]

df_merge_onehot.head()

Unnamed: 0,Neighbourhood,ATM,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Cafeteria,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Lawyer,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Sculpture,Park,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Repair,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,"Dublin, Ohio",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Dublin, Ohio",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Dublin, Ohio",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Dublin, Ohio",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Dublin, Ohio",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And now we look at the size of the dataframe.

In [45]:
df_merge_onehot.shape

(1691, 238)

### In this step we group rows by neighbourhood and the mean of frequency

In [47]:
df_merge_grouped = df_merge_onehot.groupby('Neighbourhood').mean().reset_index()

And again, we confirm the size of the dataframe

In [48]:
df_merge_grouped.shape

(40, 238)

Now we will reduce the data down, and group to understand the most common venues. 

In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [50]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = df_merge_grouped['Neighbourhood']

for ind in np.arange(df_merge_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_merge_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Cheese Shop,Beer Bar,Bakery,Farmers Market,Seafood Restaurant,Restaurant,Café,Clothing Store
1,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Coffee Shop,Pet Store,Music Venue,Italian Restaurant,Bar,Restaurant,Bakery,Intersection
2,"Business reply mail Processing Centre, South C...",Yoga Studio,Auto Workshop,Comic Shop,Pizza Place,Restaurant,Butcher,Burrito Place,Brewery,Skate Park,Spa
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Airport Terminal,Harbor / Marina,Bar,Coffee Shop,Rental Car Location,Sculpture Garden,Boutique,Boat or Ferry
4,Central Bay Street,Coffee Shop,Italian Restaurant,Café,Sandwich Place,Department Store,Salad Place,Burger Joint,Japanese Restaurant,Bubble Tea Shop,Donut Shop


### Let's review our hometown!

In [51]:
neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Neighbourhood'] == 'Dublin, Ohio']

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,"Dublin, Ohio",Pizza Place,Hotel,Italian Restaurant,Bar,Spa,Gift Shop,Steakhouse,Ice Cream Shop,Café,Bank


## 5. Cluster Neighborhoods

First we start by running K Means into 6 clusters

In [52]:
# set number of clusters
kclusters = 6

merge_grouped_clustering = df_merge_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(merge_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

And now we create a dataframe and add the top 10 venues

In [53]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

df_merge2 = df_merge[['Neighbourhood','Neighbourhood Latitude','Neighbourhood Longitude']].drop_duplicates()

all_merged = df_merge2

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
all_merged = all_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

all_merged.head() # check the last columns!

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Dublin, Ohio",40.099229,-83.114077,1,Pizza Place,Hotel,Italian Restaurant,Bar,Spa,Gift Shop,Steakhouse,Ice Cream Shop,Café,Bank
0,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Park,Café,Pub,Bakery,Breakfast Spot,Theater,Ice Cream Shop,Performing Arts Venue,Beer Store
45,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1,Coffee Shop,College Cafeteria,Yoga Studio,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Park,College Auditorium
80,"Garden District, Ryerson",43.657162,-79.378937,1,Clothing Store,Coffee Shop,Café,Cosmetics Shop,Bubble Tea Shop,Japanese Restaurant,Theater,Pizza Place,Fast Food Restaurant,Italian Restaurant
180,St. James Town,43.651494,-79.375418,1,Café,Coffee Shop,Clothing Store,Cosmetics Shop,Restaurant,Cocktail Bar,American Restaurant,Park,Moroccan Restaurant,Creperie


### Notice the Index is off from the data manipulations. Let's reset it.

In [54]:
all_merged.reset_index(drop=True,inplace=True)

### Now we need to see what Cluster our hometown is in.

In [55]:
all_merged.loc[all_merged['Neighbourhood'] == 'Dublin, Ohio']

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Dublin, Ohio",40.099229,-83.114077,1,Pizza Place,Hotel,Italian Restaurant,Bar,Spa,Gift Shop,Steakhouse,Ice Cream Shop,Café,Bank


## 6. Examine Clusters

Now we will exmaine our clusters and the supporting data that tie to the cluster (1) our home town is in.

### Cluster 1

In [56]:
all_merged.loc[all_merged['Cluster Labels'] == 1, all_merged.columns[[0] + list(range(4, all_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Dublin, Ohio",Pizza Place,Hotel,Italian Restaurant,Bar,Spa,Gift Shop,Steakhouse,Ice Cream Shop,Café,Bank
1,"Regent Park, Harbourfront",Coffee Shop,Park,Café,Pub,Bakery,Breakfast Spot,Theater,Ice Cream Shop,Performing Arts Venue,Beer Store
2,"Queen's Park, Ontario Provincial Government",Coffee Shop,College Cafeteria,Yoga Studio,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Park,College Auditorium
3,"Garden District, Ryerson",Clothing Store,Coffee Shop,Café,Cosmetics Shop,Bubble Tea Shop,Japanese Restaurant,Theater,Pizza Place,Fast Food Restaurant,Italian Restaurant
4,St. James Town,Café,Coffee Shop,Clothing Store,Cosmetics Shop,Restaurant,Cocktail Bar,American Restaurant,Park,Moroccan Restaurant,Creperie
6,Berczy Park,Coffee Shop,Cocktail Bar,Cheese Shop,Beer Bar,Bakery,Farmers Market,Seafood Restaurant,Restaurant,Café,Clothing Store
7,Central Bay Street,Coffee Shop,Italian Restaurant,Café,Sandwich Place,Department Store,Salad Place,Burger Joint,Japanese Restaurant,Bubble Tea Shop,Donut Shop
8,Christie,Grocery Store,Café,Park,Athletics & Sports,Candy Store,Italian Restaurant,Coffee Shop,Restaurant,Baby Store,Nightclub
9,"Richmond, Adelaide, King",Coffee Shop,Café,Clothing Store,Hotel,Restaurant,Gym,Bar,Steakhouse,Thai Restaurant,Office
10,"Dufferin, Dovercourt Village",Bakery,Pharmacy,Grocery Store,Music Venue,Café,Middle Eastern Restaurant,Brewery,Bar,Bank,Supermarket


After running various K-Means (which you can do by running an Elbow comparison with a similar code to the one below) we find that Dublin is similar to a lot of neighborhoods in Toronto! It is time to call a real estate agent to determine our best options. 

<b>k-means optimal level:</b>

<br>groups = []</br>
<br>K = range(1,10)</br>
<br>for k in K:</br>
<br>kmeanModel = KMeans(n_clusters=k).fit(X)</br>
<br>kmeanModel.fit(X)</br>
<br>groups.append(sum(np.min(cdist(X, kmeanModel.cluster_centers_, ‘euclidean’), axis=1)) / X.shape[0])</br>



## 7. Locating our Real Estate Agent

In [57]:
search_query = 'Real Estate'
re_radius = 2000
print(search_query + ' .... OK!')

Real Estate .... OK!


In [58]:
# real estate url
new_lat = 43.6532
new_lon = -79.3832
url_re = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, new_lat, new_lon, VERSION, search_query, re_radius, LIMIT)
url_re

'https://api.foursquare.com/v2/venues/search?client_id=NQ0VZRXFYJPMXAEBK0GBQJLL53HEQ435LZ3F4TD15QTI0HO1&client_secret=YLXKDJUPXETBAP3GE1YADVQ3TZHBROLQZNGMUDJNY1350HPP&ll=43.6532,-79.3832&v=20180605&query=Real Estate&radius=2000&limit=100'

Prearing our data again for a dataframe

In [59]:
re_results = requests.get(url_re).json()

In [60]:
# assign relevant part of JSON to venues
re_venues = re_results['response']['venues']

# tranform venues into a dataframe
real_estate = json_normalize(re_venues)   
real_estate.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '5032885091d4c4b30a586d66', 'name': 'R...",False,5e86dbb7ddb9860007500cd2,1 Dundas Street West Suite 2500,CA,Toronto,Canada,,521,"[1 Dundas Street West Suite 2500, Toronto ON M...","[{'label': 'display', 'lat': 43.6486362, 'lng'...",43.648636,-79.381744,,M5G 1Z3,ON,Real Estate Lawyers.ca LLP,v-1597601541,586437570.0
1,"[{'id': '5032885091d4c4b30a586d66', 'name': 'R...",False,5e477ba435aef200082bf259,"1550 Sixteenth Avenue 200, Building C South",CA,Richmond Hill,Canada,,226,"[1550 Sixteenth Avenue 200, Building C South, ...","[{'label': 'display', 'lat': 43.65383853190168...",43.653839,-79.385877,,L4B 3K9,ON,AtHouse Real Estate,v-1597601541,
2,"[{'id': '4bf58dd8d48988d124941735', 'name': 'O...",False,4d87428af9f3a1cd3ae0ee64,401 Bay Street,CA,Toronto,Canada,Queen,294,"[401 Bay Street (Queen), Toronto ON, Canada]","[{'label': 'display', 'lat': 43.651617, 'lng':...",43.651617,-79.380267,,,ON,Smith Company Commercial Real Estate Services ...,v-1597601541,
3,"[{'id': '503287a291d4c4b30a586d65', 'name': 'F...",False,5ae22cf6ad910e002c093e58,1400-330 Bay Street,CA,Toronto,Canada,,392,"[1400-330 Bay Street, Toronto ON M5H 2S8, Canada]","[{'label': 'display', 'lat': 43.6499932, 'lng'...",43.649993,-79.381183,,M5H 2S8,ON,"Alpha August Real Estate Advisory, Inc. (AAREA)",v-1597601541,491486097.0
4,"[{'id': '5032885091d4c4b30a586d66', 'name': 'R...",False,5e86e29dddb986000760f67b,2 Bloor Street East Suite 3500,CA,Toronto,Canada,,1880,"[2 Bloor Street East Suite 3500, Toronto ON M4...","[{'label': 'display', 'lat': 43.6699641, 'lng'...",43.669964,-79.386079,,M4W 1A8,ON,Real Estate Lawyers.ca LLP,v-1597601541,581187961.0


Now we need to manipulate so we can start pulling Ratings for realtors.

In [63]:
# keep only columns that include venue name, and anything that is associated with location
estate_columns = ['name', 'categories'] + [col for col in real_estate.columns if col.startswith('location.')] + ['id']
dataframe_estate = real_estate.loc[:, estate_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_estate['categories'] = dataframe_estate.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_estate.columns = [column.split('.')[-1] for column in dataframe_estate.columns]
dataframe_estate.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Real Estate Lawyers.ca LLP,Real Estate Office,1 Dundas Street West Suite 2500,CA,Toronto,Canada,,521,"[1 Dundas Street West Suite 2500, Toronto ON M...","[{'label': 'display', 'lat': 43.6486362, 'lng'...",43.648636,-79.381744,,M5G 1Z3,ON,5e86dbb7ddb9860007500cd2
1,AtHouse Real Estate,Real Estate Office,"1550 Sixteenth Avenue 200, Building C South",CA,Richmond Hill,Canada,,226,"[1550 Sixteenth Avenue 200, Building C South, ...","[{'label': 'display', 'lat': 43.65383853190168...",43.653839,-79.385877,,L4B 3K9,ON,5e477ba435aef200082bf259
2,Smith Company Commercial Real Estate Services ...,Office,401 Bay Street,CA,Toronto,Canada,Queen,294,"[401 Bay Street (Queen), Toronto ON, Canada]","[{'label': 'display', 'lat': 43.651617, 'lng':...",43.651617,-79.380267,,,ON,4d87428af9f3a1cd3ae0ee64
3,"Alpha August Real Estate Advisory, Inc. (AAREA)",Financial or Legal Service,1400-330 Bay Street,CA,Toronto,Canada,,392,"[1400-330 Bay Street, Toronto ON M5H 2S8, Canada]","[{'label': 'display', 'lat': 43.6499932, 'lng'...",43.649993,-79.381183,,M5H 2S8,ON,5ae22cf6ad910e002c093e58
4,Real Estate Lawyers.ca LLP,Real Estate Office,2 Bloor Street East Suite 3500,CA,Toronto,Canada,,1880,"[2 Bloor Street East Suite 3500, Toronto ON M4...","[{'label': 'display', 'lat': 43.6699641, 'lng'...",43.669964,-79.386079,,M4W 1A8,ON,5e86e29dddb986000760f67b


## Now we need to explore some Realtors and pull Ratings

In [64]:
venue_id = '5e477ba435aef200082bf259' # ID of a selected Realtor
realtor = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
realtor

'https://api.foursquare.com/v2/venues/5e477ba435aef200082bf259?client_id=NQ0VZRXFYJPMXAEBK0GBQJLL53HEQ435LZ3F4TD15QTI0HO1&client_secret=YLXKDJUPXETBAP3GE1YADVQ3TZHBROLQZNGMUDJNY1350HPP&v=20180605'

In [65]:
result_realtor = requests.get(realtor).json()
print(result_realtor['response']['venue'].keys())
result_realtor['response']['venue']

dict_keys(['id', 'name', 'contact', 'location', 'canonicalUrl', 'categories', 'verified', 'stats', 'likes', 'dislike', 'ok', 'allowMenuUrlEdit', 'beenHere', 'specials', 'photos', 'reasons', 'hereNow', 'createdAt', 'tips', 'shortUrl', 'timeZone', 'listed', 'seasonalHours', 'pageUpdates', 'inbox', 'attributes'])


{'id': '5e477ba435aef200082bf259',
 'name': 'AtHouse Real Estate',
 'contact': {'phone': '9058831988', 'formattedPhone': '(905) 883-1988'},
 'location': {'address': '1550 Sixteenth Avenue 200, Building C South',
  'lat': 43.653838531901684,
  'lng': -79.38587665557861,
  'labeledLatLngs': [{'label': 'display',
    'lat': 43.653838531901684,
    'lng': -79.38587665557861}],
  'postalCode': 'L4B 3K9',
  'cc': 'CA',
  'city': 'Richmond Hill',
  'state': 'ON',
  'country': 'Canada',
  'formattedAddress': ['1550 Sixteenth Avenue 200, Building C South',
   'Richmond Hill ON L4B 3K9',
   'Canada']},
 'canonicalUrl': 'https://foursquare.com/v/athouse-real-estate/5e477ba435aef200082bf259',
 'categories': [{'id': '5032885091d4c4b30a586d66',
   'name': 'Real Estate Office',
   'pluralName': 'Real Estate Offices',
   'shortName': 'Real Estate',
   'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/realestate_',
    'suffix': '.png'},
   'primary': True}],
 'verified': False,
 'stats'

In [66]:
try:
    print(result_realtor['response']['venue']['rating'])
except:
    print('This venue has not been rated yet.')

This venue has not been rated yet.


### We can repeat this until we find a realtor we are happy with. In this example the realtor did not have a rating, but that doesn't mean we shouldn't use them.

## 8. And the final step, visualize our data, removing Dublin, to find our location mapping.

In [67]:
map_data = all_merged.loc[all_merged['Neighbourhood'] != 'Dublin, Ohio']

In [68]:
map_data = map_data[['Neighbourhood','Neighbourhood Latitude','Neighbourhood Longitude','Cluster Labels']]
map_data.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Cluster Labels
1,"Regent Park, Harbourfront",43.65426,-79.360636,1
2,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1
3,"Garden District, Ryerson",43.657162,-79.378937,1
4,St. James Town,43.651494,-79.375418,1
5,The Beaches,43.676357,-79.293031,0


In [69]:
# create map to show neighborhoods
# add Toronto Lat Lon for mapping
new_lat = 43.6532
new_lon = -79.3832
map_clusters = folium.Map(location=[new_lat, new_lon], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(map_data['Neighbourhood Latitude'], map_data['Neighbourhood Longitude'], map_data['Neighbourhood'], map_data['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Results  <a name="results"></a>

Our analysis shows that Dublin, Ohio is similar to a large number of neighborhoods within Toronto. When we review the venues, we see similar themes: bars, a large university (Ohio State and Toronto University), coffee shops, cafes, and more. While this data is helpful for comparing the two areas in terms of similarity, some of the data can be obtained visually. In the folio map we can see neighborhoods that are for the University of Toronto - I am not a student, so we could pass - and others are near the water. The analysis returned a high level comparison as expected of ideal neighborhoods to begin our search.


# Discussion  <a name="discussion"></a>

Given the high match of neighborhoods to one cluster, multiple other data elements could be included. This would be part of the continious improvement effort to keep refining our analysis. Since these two cities were chosen for their proximity to one another and similar weather, we could include that as part of our analysis going forward in the event a real estate agent or potential mover did not have a desired city. Some areas to include could be: weather, local language, population, and more.

# Conclusion <a name="conclusion"></a>

The purpose of the analysis was to find an ideal neighborhood for our relocation. Since we had Toronto selected as a desired city prior given similarity to our current residence and the recent COVID news, we processed foursquare data to show our ideal area. Our current residence is similar to many neighborhoods in Toronto so more information is needed. For this, we chose to lookup a real estate agent to help us start navigating the information, and pull their rating from foursquare as well. We found a real estate agent who had no ratings but we were willing to give them a try as our data pulled back many neighborhoods that matched our current residence. 

Since this is a large decision, the final decision will come by partnering all available data and reviewing the areas. Stakeholders should look at specific characteristics of a city like venues, proximity to the city, and more. By using a map, like one in folio, additional insights can be gleaned about a neighborhood and added back into the analysis.