<h1>Capstone Project - The Battle of Neighborhoods (Week 2)</h1>

<h2>Introduction</h2>

What do you think of when you think of shopping? Probably a mall, right? In most areas, rural and urban, a shopping mall is the primary place for people to walk around, purchase items, and sometimes even have a meal. For this project, we're going to determine the best place to open a mall in the Greater Boston Area. Perhaps the most famous mall in the city itself is Copley Place in the Back Bay neighborhood, which features stores ranging from Tiffany & Co. to Lucky Brand, but is Back Bay *really* the best place to be?

As the Greater Boston Area continues to expand and develop into surrounding neighborhoods such as Dorchester and Roxbury, this information will become increasingly relevant, and the same approach can be adapted to target movie theaters, housing complexes, and other developments instead of malls. The target audience of this report is therefore developers and investors looking to profit from such development. Having this information could not only promote development but also drive profits for investors. According to Norada Real Estate Investments, average housing prices in Boston are increasing by about 5.7% per year. This might not seem like much until you realize that the most expensive neighborhood, Beacon Hill, has a median housing price of over $2 million. Having this kind of information for all neighborhoods can tell investors whether it's really worth placing a mall in an area where housing is so expensive.

<h2>Data</h2>

The primary dataset for this project comes from the Foursquare API, specifically the venue data that lists the shopping malls in the Greater Boston Area. For the area surveyed, I will refer to the 22 neighborhoods listed in [this](https://en.wikipedia.org/wiki/Neighborhoods_in_Boston) Wikipedia article. I will use Python web scraping with Beautiful Soup to extract the neighborhood names, which I'll then pass to the geocoder package and eventually to the Foursquare API to get venue information.

Once all of the data is sourced, we can apply the techniques learned in this course, specifically k-means clustering, to answer the question at hand. The clustering will be based on the venues (specifically the malls) in the different zip codes that encompass the neighborhoods.
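
To make the clustering step concrete, here is a minimal, dependency-free sketch of k-means (Lloyd's algorithm) on a single feature. The mall counts are made-up placeholders rather than Foursquare results, and the helper name `kmeans_1d` is hypothetical; the actual analysis would more likely rely on a library implementation.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Cluster numbers into k groups with Lloyd's algorithm; returns (labels, centroids)."""
    if not 2 <= k <= len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids evenly across the sorted values.
    centroids = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(value - centroids[c])) for value in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Hypothetical mall counts per neighborhood (placeholders, NOT Foursquare data).
mall_counts = {"A": 0, "B": 1, "C": 1, "D": 4, "E": 5, "F": 12}
labels, centroids = kmeans_1d(list(mall_counts.values()), k=3)
for name, label in zip(mall_counts, labels):
    print(name, "-> cluster", label)
```

Neighborhoods with similar mall counts end up sharing a cluster label, which is the grouping the rest of the analysis would build on.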

<h2>Analysis</h2>

<h3>Install Necessary Dependencies</h3>

In [48]:
import numpy as np
import pandas as pd
import requests
#!conda install -c conda-forge bs4 --yes
import bs4
from urllib.request import urlopen
from bs4 import BeautifulSoup
#!conda install -c conda-forge folium=0.5.0 --yes 
import folium
#!conda install -c conda-forge geopy --yes
import geopy
from geopy.geocoders import Nominatim
#!conda install -c conda-forge geocoder --yes
import geocoder

<h3>Import Data</h3>

In this section we scrape the Wikipedia page to retrieve the names of the 22 neighborhoods in the Greater Boston Area. Some of the scraped entries include extra text after the neighborhood name, such as the surrounding areas the neighborhood includes. To avoid issues with this going forward, everything except the neighborhood name is removed.

In [30]:
neighborhoodData = requests.get("https://en.wikipedia.org/wiki/Neighborhoods_in_Boston").text
souped = BeautifulSoup(neighborhoodData, 'html.parser')

In [31]:
data = []

# The first matching div holds the bulleted list of neighborhoods
for row in souped.find_all("div", class_="div-col columns column-width")[0].find_all("li"):
    data.append(row.text)
    
df = pd.DataFrame({"Neighborhood": data})
df.info() # Should show that there are 22 entries (aka 22 neighborhoods)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 1 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Neighborhood  22 non-null     object
dtypes: object(1)
memory usage: 304.0+ bytes


In [34]:
print(df)

                                         Neighborhood
0                                             Allston
1                                            Back Bay
2                                         Bay Village
3                                         Beacon Hill
4                                            Brighton
5                                         Charlestown
6                          Chinatown/Leather District
7   Dorchester (divided for planning purposes into...
8                                            Downtown
9                                         East Boston
10                 Fenway Kenmore (includes Longwood)
11                                          Hyde Park
12                                      Jamaica Plain
13                                           Mattapan
14                                       Mission Hill
15                                          North End
16                                         Roslindale
17                          

In [42]:
# Assign back rather than using inplace=True on a column selection
df['Neighborhood'] = df['Neighborhood'].replace({
    'Fenway Kenmore (includes Longwood)': 'Fenway Kenmore',
    'Dorchester (divided for planning purposes into Mid Dorchester and Dorchester)': 'Dorchester'})

In [43]:
print(df)

                  Neighborhood
0                      Allston
1                     Back Bay
2                  Bay Village
3                  Beacon Hill
4                     Brighton
5                  Charlestown
6   Chinatown/Leather District
7                   Dorchester
8                     Downtown
9                  East Boston
10              Fenway Kenmore
11                   Hyde Park
12               Jamaica Plain
13                    Mattapan
14                Mission Hill
15                   North End
16                  Roslindale
17                     Roxbury
18                South Boston
19                   South End
20                    West End
21                West Roxbury


In [35]:
df.shape

(22, 1)
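
The two hand-written replacements above could also be generalized. The sketch below assumes the extra text always appears as a trailing parenthetical, which holds for the two entries shown here but is not guaranteed for every page revision; `clean_neighborhood` is a hypothetical helper name.

```python
import re


def clean_neighborhood(name):
    """Drop a trailing parenthetical note and surrounding whitespace from a scraped name."""
    return re.sub(r"\s*\(.*\)\s*$", "", name).strip()


# Sample entries taken from the scraped output above.
scraped = [
    "Allston",
    "Fenway Kenmore (includes Longwood)",
    "Dorchester (divided for planning purposes into Mid Dorchester and Dorchester)",
]
cleaned = [clean_neighborhood(name) for name in scraped]
print(cleaned)
```

With the dataframe in this notebook, the same helper could be applied as `df['Neighborhood'] = df['Neighborhood'].map(clean_neighborhood)`.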

<h3>Latitude & Longitude</h3>

In this section we retrieve the coordinates of each neighborhood using the geocoder library. The method is similar to the one provided in the *Segmenting and Clustering Neighborhoods in Toronto* section, but is slightly modified to look up the neighborhood name rather than the postal code. Once the coordinates are retrieved, they are matched to their respective neighborhoods in the dataframe initialized above.

In [50]:
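def getCoords(neighborhood, max_attempts=5):
    # A failed lookup yields no coordinates, so retry a bounded number of
    # times instead of looping forever
    for _ in range(max_attempts):
        geo = geocoder.arcgis('{}, Boston, Massachusetts'.format(neighborhood))
        if geo.latlng:
            return geo.latlng
    raise ValueError('No coordinates found for {}'.format(neighborhood))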

In [52]:
coordinates = [ getCoords(neighborhood) for neighborhood in df["Neighborhood"].tolist() ]

In [58]:
dfCoordinates = pd.DataFrame(coordinates, columns=['Latitude', 'Longitude'])

df['Latitude'] = dfCoordinates['Latitude']
df['Longitude'] = dfCoordinates['Longitude']

print(df.head())

  Neighborhood   Latitude  Longitude
0      Allston  42.350531 -71.111091
1     Back Bay  42.349990 -71.087650
2  Bay Village  42.348165 -71.068470
3  Beacon Hill  42.358420 -71.068600
4     Brighton  42.352134 -71.124925
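
Because a geocoder can occasionally resolve a name to the wrong place, a quick sanity check on the returned coordinates may be worthwhile. The sketch below flags any point outside a rough bounding box for Boston; the box limits are approximate values chosen for illustration (an assumption, not an official boundary), and `out_of_bounds` is a hypothetical helper name.

```python
# Approximate bounding box for the city of Boston (an assumption; adjust as needed).
LAT_RANGE = (42.22, 42.40)
LNG_RANGE = (-71.20, -70.98)


def out_of_bounds(rows, lat_range=LAT_RANGE, lng_range=LNG_RANGE):
    """Return the names whose coordinates fall outside the bounding box."""
    return [
        name
        for name, lat, lng in rows
        if not (lat_range[0] <= lat <= lat_range[1]
                and lng_range[0] <= lng <= lng_range[1])
    ]


# The five rows printed by df.head() above.
rows = [
    ("Allston", 42.350531, -71.111091),
    ("Back Bay", 42.349990, -71.087650),
    ("Bay Village", 42.348165, -71.068470),
    ("Beacon Hill", 42.358420, -71.068600),
    ("Brighton", 42.352134, -71.124925),
]
suspicious = out_of_bounds(rows)
print(suspicious)
```

For the full dataframe, the rows could be built with `zip(df['Neighborhood'], df['Latitude'], df['Longitude'])`; any name that comes back would be a candidate for a manual lookup.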


<h3>Initialize a Map</h3>

In this section we initialize a map centered on the general coordinates of Boston and add a marker for each individual neighborhood. The blue markers on the map mark the 22 neighborhoods retrieved above.

In [67]:
# Sourced from Google Search Response, Coordinates of Boston: 42.3601° N, 71.0589° W

latitude = 42.3601
longitude = -71.0589
print('The geographical coordinates of Boston, Massachusetts are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Boston, Massachusetts are 42.3601, -71.0589.


In [66]:
map_bos = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_bos)  
    
map_bos

<h3>Foursquare API: Initialized & </h3>