# Battle of the Neighborhoods

## Introduction/Business Problem

A new brewery, Hops Inc, wants to open for business in the Portland metropolitan area. Their goal is to open multiple locations throughout the area, primarily targeting neighborhoods that already feature breweries. Their management theory is that dense population centers with a high proportion of breweries will yield greater opportunities for partnerships between brewers, such as pub crawls and growler exchanges.

The goal of the project is to identify at least three neighborhoods in the metropolitan area surrounding Portland, OR that will suit Hops Inc's endeavor to open _at least three_ locations in relative proximity to clusters of existing breweries.

## Data

The first dataset required is the names and geographical locations of the neighborhoods in the Portland metropolitan area. This data will be scraped from **PDXListed.com** (https://www.pdxlisted.com/neighborhoods/). This data source will permit us to identify:

- Region names
- Neighborhood/City names
- Approximate neighborhood population

We will further use a **Python geocoder** (the `Nominatim` geocoder from `geopy`) to identify the latitude and longitude of each neighborhood.

We will leverage the **Foursquare API** to identify all breweries in the region by iterating through each neighborhood in turn. Marrying our brewery location data with neighborhood geolocations will allow us to identify the neighborhoods with the greatest number of breweries _as well as_ the density of breweries with respect to population. 

## Methodology

### Data Collection

Our first step is to produce a dataframe consolidating all of the region and neighborhood/city names for the metropolitan area using data from PDXListed.com. We will use Python's `urllib.request` module to fetch each HTML page and BeautifulSoup to parse it and extract our data.

Rather than iterating through our data twice, we'll _also_ geocode as we go for each neighborhood.
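Geocoding can miss individual neighborhoods, so each lookup needs a fallback. The following is a minimal sketch of that step in isolation, not the notebook's exact code: `geocode_neighborhood`, `FakeLocation`, and `fake_geocode` are hypothetical names introduced for illustration, and the stub geocoder stands in for a real one so the example runs offline.

```python
from collections import namedtuple

# Hypothetical stand-in for the object a geocoder returns; only the two
# attributes used below are modeled.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])


def geocode_neighborhood(name, geocode, suffix="Portland, OR"):
    """Return (latitude, longitude) for a neighborhood, or (None, None) on a miss.

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude` attributes, or None when nothing is found.
    """
    location = geocode(f"{name}, {suffix}")
    if location is None:
        return None, None
    return location.latitude, location.longitude


def fake_geocode(query):
    """Offline stub (assumption): knows one neighborhood and misses all others."""
    known = {"Goose Hollow, Portland, OR": FakeLocation(45.517749, -122.692819)}
    return known.get(query)


hit = geocode_neighborhood("Goose Hollow", fake_geocode)
miss = geocode_neighborhood("Nowhere Heights", fake_geocode)
print(hit)
print(miss)
```

In the notebook, the `geocode` argument would be the `geocode` method of the `Nominatim` geolocator. Rows that come back as `(None, None)` can then be dropped before mapping.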

In [68]:
from bs4 import BeautifulSoup
import folium
from geopy.geocoders import Nominatim
import pandas as pd
import urllib.request

urlmap = {
    'Northwest Portland': 'http://www.pdxlisted.com/neighborhoods/nw-pdx/',
    'Southwest Portland': 'http://www.pdxlisted.com/neighborhoods/sw-pdx/',
    'Northeast Portland': 'http://www.pdxlisted.com/neighborhoods/ne-pdx/',
    'Southeast Portland': 'http://www.pdxlisted.com/neighborhoods/se-pdx/',
    'North Portland': 'http://www.pdxlisted.com/neighborhoods/nopo/',
    'East Portland': 'http://www.pdxlisted.com/neighborhoods/e-pdx/'
}
geolocator = Nominatim(user_agent='ca_explorer')

neighborhoods = []

for region, url in urlmap.items():
    print(f"Fetching data for {region}...")
    page = urllib.request.urlopen(url)
    soup = BeautifulSoup(page, 'lxml')
    
    table = soup.find('table', class_='neighborhoods')
    
    if table is None:
        print(f"No neighborhoods in {region}")
        continue  # skip this region but keep processing the remaining ones
    
    # Iterate through the table cells
    for row in table.findAll('tr'):
        cells = row.findAll('td')
        for cell in cells:
            neighborhood = cell.find(text=True)
            
            if neighborhood is not None:
                link = cell.find('a', href=True)['href']
                neighbor_page = urllib.request.urlopen(link)
                neighbor_soup = BeautifulSoup(neighbor_page, 'lxml')
                
                neighbor_table = neighbor_soup.find('table')
                population_row = neighbor_table.findAll('tr')[1]
                population_cell = population_row.findAll('td')[1]
                
                population = population_cell.find(text=True)                
                latitude = None
                longitude = None
                location = geolocator.geocode(f"{neighborhood} Portland, OR")
                
                if location is not None:
                    latitude = location.latitude
                    longitude = location.longitude
                
                neighborhoods.append({'Region': region, 'Neighborhood': neighborhood, 'Population': population, "Latitude": latitude, "Longitude": longitude})
                
neighbor_df = pd.DataFrame(neighborhoods)
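# regex=True is required: recent pandas versions treat the pattern as a literal string by default
neighbor_df['Population'] = neighbor_df['Population'].str.replace(r'\D+', '', regex=True).astype('int')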
neighbor_df

Fetching data for Northwest Portland...
Fetching data for Southwest Portland...
Fetching data for Northeast Portland...
Fetching data for Southeast Portland...
Fetching data for North Portland...
Fetching data for East Portland...


| | Region | Neighborhood | Population | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Northwest Portland | Forest Park | 7733 | 45.561376 | -122.758458 |
| 1 | Northwest Portland | Arlington Heights | 950 | 45.519496 | -122.710667 |
| 2 | Northwest Portland | Linnton | 537 | 45.600330 | -122.786779 |
| 3 | Northwest Portland | Goose Hollow | 4796 | 45.517749 | -122.692819 |
| 4 | Northwest Portland | Northwest Heights | 1158 | 45.540806 | -122.774354 |
| ... | ... | ... | ... | ... | ... |
| 87 | East Portland | Glenfair | 2991 | 45.522719 | -122.504133 |
| 88 | East Portland | Powellhurst Gilbert | 23381 | 45.492568 | -122.538696 |
| 89 | East Portland | Lents | 20105 | 45.479661 | -122.564504 |
| 90 | East Portland | Centennial | 23873 | 45.505595 | -122.499711 |


In [69]:
pdx_lat = 45.5202471
pdx_long = -122.6741949
map_pdx = folium.Map(location=[pdx_lat, pdx_long], zoom_start=12)

# Some of our neighborhoods will fail geolocation. Remove them
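# Take an explicit copy so that adding columns later does not trigger SettingWithCopyWarning
filtered = neighbor_df.dropna().copy()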

# Mark every neighborhood on the map to ensure we have solid coverage of the area
for lat, lng, neighborhood in zip(filtered['Latitude'], filtered['Longitude'], filtered['Neighborhood']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_pdx)  
    
map_pdx

Our next step is to leverage the Foursquare API to identify all breweries in the metro area. We'll iterate through each neighborhood and fetch a list of breweries in that area. Then we'll de-dupe the list of all breweries in the Portland area and overlay their positions on the map. Finally, we'll change the neighborhood weight to reflect the density of breweries (breweries per capita) to better highlight the most attractive locations.
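The de-duplication and weighting steps can be sketched without any API calls. The example below is an illustration under stated assumptions, written in plain Python rather than pandas so it runs without dependencies: `dedupe` and `density_weights` are hypothetical helper names, the assignment of venues to neighborhoods in `hits` is made up to show overlap, and the brewery counts and populations are taken from the notebook's own output. The guard for an all-zero density is an addition that the notebook does not include.

```python
def dedupe(hits_by_neighborhood):
    """Collapse per-neighborhood venue lists into one set of unique venues."""
    unique = set()
    for venues in hits_by_neighborhood.values():
        unique.update(venues)
    return unique


def density_weights(counts, populations, low=1.0, high=10.0):
    """Return {neighborhood: (per_capita, weight)}, weights scaled into [low, high]."""
    per_capita = {n: counts.get(n, 0) / populations[n] for n in populations}
    max_density = max(per_capita.values())
    if max_density == 0:
        # No breweries anywhere: every neighborhood gets the minimum weight.
        return {n: (0.0, low) for n in per_capita}
    return {
        n: (d, d / max_density * (high - low) + low)
        for n, d in per_capita.items()
    }


# Illustrative overlap (assumption): searches around adjacent neighborhoods
# can return the same venue, stored here as (name, longitude, latitude).
hits = {
    "Old Town Chinatown": [
        ("Old Town Pizza & Brewing", -122.673011, 45.524546),
        ("Deschutes Brewery Portland Public House", -122.681982, 45.524544),
    ],
    "Pearl District": [
        ("Deschutes Brewery Portland Public House", -122.681982, 45.524544),
        ("Von Ebert Brewing", -122.68469, 45.52398),
    ],
}
unique = dedupe(hits)

# Counts and populations as reported in the notebook's output.
counts = {"Old Town Chinatown": 9, "Pearl District": 9}
populations = {"Old Town Chinatown": 2820, "Pearl District": 5244, "Forest Park": 7733}
weights = density_weights(counts, populations)

print(len(unique))
for name, (per_capita, weight) in weights.items():
    print(f"{name}: per capita {per_capita:.6f}, weight {weight:.3f}")
```

Scaling into the range 1 to 10 keeps neighborhoods with no breweries visible on the map as small markers while the densest neighborhood gets the largest one.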

In [70]:
import requests

from creds import CLIENT_ID, CLIENT_SECRET
VERSION = '20180604'
LIMIT = 50

search_query = 'brewery'
radius = 805 # Half mile in meters

brewery_count = {} # Dict to track neighborhood => brewery counts
breweries = pd.DataFrame(columns=['Name', 'Longitude', 'Latitude'])  # DataFrame accumulating every brewery found (name, longitude, latitude)

# Iterate through our neighborhoods
for i, row in filtered.iterrows():
    latitude = row['Latitude']
    longitude = row['Longitude']

    # Pass the query as a params dict so that requests handles the URL encoding
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'll': f'{latitude},{longitude}',
        'v': VERSION,
        'query': search_query,
        'radius': radius,
        'limit': LIMIT,
    }
    results = requests.get('https://api.foursquare.com/v2/venues/search', params=params).json()
    
    venues = results['response']['venues']
    
    if len(venues) == 0:
        continue

    # Transform venues into a dataframe
    dataframe = pd.json_normalize(venues)
       
    # Filter things down to _just_ the columns we care about - name, latitude, longitude.
    # rename() without inplace=True returns a new frame and avoids SettingWithCopyWarning.
    filtered_dataframe = dataframe[['name', 'location.lng', 'location.lat']].rename(
        columns={'name': 'Name', 'location.lng': 'Longitude', 'location.lat': 'Latitude'})
   
    # Store our data for later
    brewery_count[row['Neighborhood']] = filtered_dataframe.shape[0]
    # DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
    breweries = pd.concat([breweries, filtered_dataframe])

# Dedupe our brewery list for visualization
breweries = breweries.drop_duplicates()
breweries


| | Name | Longitude | Latitude |
|---|---|---|---|
| 0 | Kells Brewery | -122.69421 | 45.524561 |
| 0 | Brewery Blocks Parking Garage | -122.68355 | 45.52413 |
| 1 | Deschutes Brewery Portland Public House | -122.681982 | 45.524544 |
| 2 | Brewery Block #2 | -122.682607 | 45.523662 |
| 3 | Brewery Blocks Motorized Carriage House | -122.684295 | 45.523906 |
| 4 | Starbucks | -122.68204 | 45.52376 |
| 5 | Bluemercury | -122.681835 | 45.523798 |
| 6 | Von Ebert Brewing | -122.68469 | 45.52398 |
| 7 | Rogue Ales Public House & Distillery | -122.684982 | 45.525807 |
| 8 | Old Town Pizza & Brewing | -122.673011 | 45.524546 |


In [71]:
# Update our neighborhood dataframe with brewery information
filtered['Breweries'] = filtered.apply(lambda row: brewery_count[row['Neighborhood']] if row['Neighborhood'] in brewery_count else 0, axis=1)
filtered['Per_Capita'] = filtered.apply(lambda row: row['Breweries'] / row['Population'], axis=1)
filtered


| | Region | Neighborhood | Population | Latitude | Longitude | Breweries | Per_Capita |
|---|---|---|---|---|---|---|---|
| 0 | Northwest Portland | Forest Park | 7733 | 45.561376 | -122.758458 | 0 | 0.000000 |
| 1 | Northwest Portland | Arlington Heights | 950 | 45.519496 | -122.710667 | 0 | 0.000000 |
| 2 | Northwest Portland | Linnton | 537 | 45.600330 | -122.786779 | 0 | 0.000000 |
| 3 | Northwest Portland | Goose Hollow | 4796 | 45.517749 | -122.692819 | 1 | 0.000209 |
| 4 | Northwest Portland | Northwest Heights | 1158 | 45.540806 | -122.774354 | 0 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 87 | East Portland | Glenfair | 2991 | 45.522719 | -122.504133 | 0 | 0.000000 |
| 88 | East Portland | Powellhurst Gilbert | 23381 | 45.492568 | -122.538696 | 0 | 0.000000 |
| 89 | East Portland | Lents | 20105 | 45.479661 | -122.564504 | 1 | 0.000050 |
| 90 | East Portland | Centennial | 23873 | 45.505595 | -122.499711 | 0 | 0.000000 |


In [72]:
# Now for the final map! Breweries will be small points in brown. Neighborhood centers will be in blue but
# weighted based on the Per_Capita value in the dataframe, scaled between 1 and 10

max_density = filtered['Per_Capita'].max()

filtered['Normalized'] = filtered.apply(lambda row: row['Per_Capita'] / max_density * 9 + 1, axis=1)

# And now the map
final_map = folium.Map(location=[pdx_lat, pdx_long], zoom_start=12)

# Mark every neighborhood on the map to ensure we have solid coverage of the area
for lat, lng, neighborhood, weight in zip(filtered['Latitude'], filtered['Longitude'], filtered['Neighborhood'], filtered['Normalized']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=weight,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.2,
        parse_html=False).add_to(final_map)  

for lat, lng, name in zip(breweries['Latitude'], breweries['Longitude'], breweries['Name']):
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        color='brown',
        fill=True,
        fill_color='#663300',
        fill_opacity=1,
        parse_html=False).add_to(final_map)  
    
final_map


## Results

In [73]:
# Let's identify the neighborhoods with the greatest potential based on existing density
potentials = filtered[filtered['Normalized'] > 3].sort_values(by=['Normalized'], ascending=False)
potentials

| | Region | Neighborhood | Population | Latitude | Longitude | Breweries | Per_Capita | Normalized |
|---|---|---|---|---|---|---|---|---|
| 9 | Northwest Portland | Old Town Chinatown | 2820 | 45.524934 | -122.673516 | 9 | 0.003191 | 10.0 |
| 27 | Southwest Portland | | 3038 | 45.520247 | -122.674195 | 8 | 0.002633 | 8.425938 |
| 7 | Northwest Portland | Pearl District | 5244 | 45.529044 | -122.681598 | 9 | 0.001716 | 5.839817 |
| 47 | Northeast Portland | Lloyd | 2374 | 45.531382 | -122.660082 | 2 | 0.000842 | 3.375737 |
| 48 | Northeast Portland | Sullivan’s Gulch | 2683 | 45.532939 | -122.640494 | 2 | 0.000745 | 3.102124 |


## Discussion

Based on the data and regional knowledge, the best neighborhoods in which to open a brew pub, judged by the existing density of similar venues, are near downtown. Old Town Chinatown in particular has the most existing venues relative to its population. The next densest area has a blank label in the list above but is the downtown core (a character encoding issue broke the dataframe display).

If the goal is to target existing areas with high density but also limit competition, the Lloyd district or Sullivan's Gulch neighborhoods would be attractive options. They each have only 2 breweries (limited competition) but, due to population density, have plenty of potential customers.
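The trade-off between density and competition can be expressed as a simple filter. This is a sketch under assumptions, not part of the original analysis: `shortlist` is a hypothetical helper, the weight threshold of 3 mirrors the cutoff used in the Results cell, and the cap of 2 breweries is an illustrative choice. The rows are copied from the Results table, with the blank downtown row given a placeholder label.

```python
# Rows from the Results table: (neighborhood, breweries, normalized weight).
# The second row has a blank name in the table; the label here is a placeholder.
candidates = [
    ("Old Town Chinatown", 9, 10.0),
    ("Downtown (blank in table)", 8, 8.425938),
    ("Pearl District", 9, 5.839817),
    ("Lloyd", 2, 3.375737),
    ("Sullivan's Gulch", 2, 3.102124),
]


def shortlist(rows, min_weight=3.0, max_breweries=2):
    """Keep neighborhoods that are dense enough but not yet crowded.

    min_weight matches the notebook's cutoff; max_breweries is an assumed cap.
    """
    return [
        name
        for name, breweries, weight in rows
        if weight > min_weight and breweries <= max_breweries
    ]


low_competition = shortlist(candidates)
print(low_competition)
```

Raising `max_breweries` widens the shortlist toward the downtown neighborhoods, which makes the sensitivity of the recommendation to that cap easy to inspect.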

## Conclusion

Hops Inc wants to open for business in the Portland metropolitan area, focusing on neighborhoods with high volumes of existing breweries to foster greater opportunities for partnership. Based on a thorough survey of existing venues using Foursquare data, the single best neighborhood for a new brewery is the **Pearl District**.

This neighborhood does not feature the highest per capita brewery density but is in the top five in the metropolitan area. Among Portland neighborhoods, the Pearl District ties for the largest number of breweries yet has almost double the population of Old Town Chinatown. This leaves plenty of opportunity for partnerships and collaboration with other venues without oversaturating the local population.