# 1. Introduction
There are over 200 named neighborhoods in Atlanta, each with a unique history and culture. Sadly, for a long time these neighborhoods were isolated from each other. The Atlanta Beltline project aims to correct that by providing easy walking paths around the city. These pathways have seen rapid business and cultural growth since construction.
I am from Atlanta, GA and have always loved beer, so I am always on the lookout for new beer-serving establishments. But what would it look like to start my own? Would the Beltline help? Would I get drowned out by other bars or restaurants?

# 2. Business Problem
If I want to open a beer taproom in Atlanta, where should I put it? Where would I get the best value for my money while also reaching the most popular neighborhoods? By using population and FourSquare data, can I see which neighborhoods have the fewest beer-serving establishments per capita?

# 3. Data Description

## 3.1 Atlanta Neighborhood Data
The City of Atlanta publishes geographic data for each of the neighborhoods within the city limits. This shapefile includes geometries. We will use only two columns from this data set:
* Name
* geometry

## 3.2 Atlanta Neighborhood Population
Wikipedia has an article with population values for Atlanta neighborhoods with populations over 500 people. Since I probably do not want to open a bierhaus in a sparsely populated neighborhood, I accept this limitation. The page is here: [https://en.wikipedia.org/wiki/Table_of_Atlanta_neighborhoods_by_population](https://en.wikipedia.org/wiki/Table_of_Atlanta_neighborhoods_by_population).

## 3.3 FourSquare API Data
Using the FourSquare API, I will gather the number of beer-serving venues in each neighborhood. Since I am using a free license with FourSquare, I will limit the queries to venues and not menus. The data I expect to retrieve is:
* Neighborhood : Name of the Neighborhood
* Neighborhood Latitude : Latitude of the Neighborhood
* Neighborhood Longitude : Longitude of the Neighborhood
* Venue : Name of the Venue
* Venue Latitude : Latitude of Venue
* Venue Longitude : Longitude of Venue
* Venue Category : Category of Venue

# 4. Method

My analytical approach to this problem will be to determine the best neighborhoods based on population and the most popular venues in each neighborhood. I will then determine which neighborhood to open my bierhaus in based on the lowest venue per capita neighborhood of those clustered together as high beer drinking neighborhoods.

## 4.1 Library and Data Import

I will be using Python and a Jupyter notebook for this, so I will start by importing the packages and libraries I need for my analysis.

In [None]:
import requests
import numpy as np
import pandas as pd
import geopandas as gpd
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from config import fs_csec #this holds my private key for foursquare

In [None]:
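from io import StringIO  # newer pandas versions expect a file-like object rather than a raw HTML string

atl_neighborhoods = gpd.read_file('Neighborhoods-shp/Neighborhoods.shp').to_crs(4326)
wiki_html = requests.get("https://en.wikipedia.org/wiki/Table_of_Atlanta_neighborhoods_by_population").text
wiki = pd.read_html(StringIO(wiki_html))[0]  # keep the first table parsed from the page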

In [None]:
atl_neighborhoods.head()

In [None]:
atl_neighborhoods.shape

## 4.2 Data preprocessing
I do not need to do much preprocessing on the data since the structures are already fairly clean. But I do want to map the neighborhood population to the geographic data.

In [None]:
atl_neighborhoods['POP'] = atl_neighborhoods.NAME.map(dict(zip(wiki.Neighborhood,wiki['Population (2010)'])))
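
Because the mapping above joins the two sources on the neighborhood name, any name that is missing from the Wikipedia table, or spelled differently there, silently ends up with a missing population. A minimal sketch of how one might list the names that failed to match, using small made-up frames (the names and numbers below are placeholders, not real data):

```python
import pandas as pd

# Hypothetical stand-ins for the shapefile attribute table and the Wikipedia table.
shapes = pd.DataFrame({"NAME": ["Neighborhood A", "Neighborhood B", "Neighborhood C"]})
wiki_demo = pd.DataFrame({
    "Neighborhood": ["Neighborhood A", "Neighborhood B"],
    "Population (2010)": [1000, 2000],
})

# Same name-to-population lookup as in the cell above.
lookup = dict(zip(wiki_demo["Neighborhood"], wiki_demo["Population (2010)"]))
shapes["POP"] = shapes["NAME"].map(lookup)

# Names with no population after the join: either absent from the table
# or spelled differently in the two sources.
unmatched = shapes.loc[shapes["POP"].isna(), "NAME"].tolist()
print(unmatched)
```

Scanning such a list before dropping rows would help separate genuinely small neighborhoods from simple spelling mismatches.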

The population by neighborhood list only includes neighborhoods with populations over 500 in 2010, which is fine. I will simply remove the other, low-population neighborhoods, as I probably do not want to open a bierhaus in those anyway. We also want to add the latitude and longitude of the center of each neighborhood.

In [None]:
atl_neighborhoods = atl_neighborhoods[~(atl_neighborhoods['POP'].isna())].reset_index(drop=True)
atl_neighborhoods['Latitude'] = atl_neighborhoods.geometry.centroid.y
atl_neighborhoods['Longitude'] = atl_neighborhoods.geometry.centroid.x
atl_neighborhoods.shape

So we have 153 potential target neighborhoods. Let's map them just to see what those look like.

In [None]:
latitude = 33.7490 ## Atlanta latitude
longitude = -84.3880 ## Atlanta longitude

In [None]:
m1 = folium.Map([latitude, longitude], zoom_start = 11)
folium.TileLayer('CartoDB positron',name="Light Map",control=False).add_to(m1)
folium.GeoJson(atl_neighborhoods).add_to(m1)
m1

Now let's add a little color to see how these neighborhoods compare, population-wise.

In [None]:
m2 = folium.Map([latitude,longitude], zoom_start = 11)
scale = (atl_neighborhoods['POP'].quantile((0,0.1,0.75,0.9,0.98,1))).tolist()
folium.TileLayer('CartoDB positron',name="Light Map",control=False).add_to(m2)
folium.Choropleth(
 geo_data=atl_neighborhoods,
 name='Choropleth',
 data=atl_neighborhoods,
 columns=['NAME','POP'],
 key_on="feature.properties.NAME",
 fill_color='YlGnBu',
 threshold_scale=scale,
 fill_opacity=0.8,
 line_opacity=0.2,
 legend_name='Population (2010)',
 smooth_factor=0
).add_to(m2)

m2
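
The color bins in the map above come from quantiles of the population column rather than evenly spaced values, so the top bins each hold only a few neighborhoods. A small sketch of how those cut points behave, using made-up population values (for illustration only):

```python
import pandas as pd

# Made-up population values, for illustration only.
pop = pd.Series([600, 900, 1500, 2500, 4000, 7000, 12000, 16000])

# Same quantile cut points as the map above: the minimum, the 10th, 75th,
# 90th and 98th percentiles, and the maximum.
scale = pop.quantile([0, 0.1, 0.75, 0.9, 0.98, 1]).tolist()

# Assign each value to a bin; include_lowest keeps the minimum in the first bin.
bins = pd.cut(pop, bins=scale, include_lowest=True)
print(bins.value_counts(sort=False))
```

Since the first and last cut points are the minimum and maximum, every neighborhood falls inside the scale.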

Now let's plot the centroid points of each neighborhood.

In [None]:
m3 = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
#for point, label in zip(df.geometry.centroid.to_crs(4326), df['NAME']):
for lat, lng, label in zip(atl_neighborhoods['Latitude'], atl_neighborhoods['Longitude'], atl_neighborhoods['NAME']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        #point,
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(m3)  
    
m3

## 4.3 Get Venues from FourSquare
To make the best use of the FourSquare API, I first tried using the neighborhood name as the search criterion, but most names did not return any results. Instead, we will use the central point of each neighborhood combined with a search radius as our search criteria.

But first we set some basic variables so we can hit the FourSquare API.

In [None]:
CLIENT_ID = 'CXDDIXSSG4QDUN25PUJXW4ZGNEAX24P15S40NYRZ5SE3FA2M' # your Foursquare ID
CLIENT_SECRET = fs_csec # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

Here we define a function for finding all venues within our neighborhood.

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
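
The function above assembles the request URL with `str.format`. As a sketch of an alternative, the same query string can be built from a dictionary with the standard library's `urlencode`, which also escapes special characters. The credentials below are placeholders, and the coordinates are the Atlanta center used for the maps:

```python
from urllib.parse import urlencode

# Placeholder credentials, for illustration only.
params = {
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "v": "20180605",
    "ll": "33.749,-84.388",  # "latitude,longitude" of the search center
    "radius": 500,           # search radius in meters
    "limit": 100,
}

base = "https://api.foursquare.com/v2/venues/explore"
url = base + "?" + urlencode(params)
print(url)
```

Passing the same dictionary as `requests.get(base, params=params)` should give an equivalent request without building the string by hand.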

In [None]:
atl_venues = getNearbyVenues(names=atl_neighborhoods['NAME'],
                                   latitudes=atl_neighborhoods['Latitude'],
                                   longitudes=atl_neighborhoods['Longitude']
                                  )

In [None]:
atl_venues.head()

Let's see the different types of venues in our neighborhoods.

In [None]:
atl_venues['Venue Category'].sort_values().unique().tolist()

Of these, let's use Bar, Beer Bar, Gastropub, Irish Pub, Pub, Sports Bar, Whiskey Bar, and Wine Bar as the types of venues that my Bierhaus would compete with.

In [None]:
comp = ['Bar', 'Beer Bar', 'Gastropub', 'Irish Pub', 'Pub', 'Sports Bar', 'Whiskey Bar', 'Wine Bar']

Let's reduce our results for only these venue types and group by the Neighborhood to count how many of these venues exist.

In [None]:
atl_venues = atl_venues[atl_venues['Venue Category'].isin(comp)]
atl_grouped = atl_venues.groupby('Neighborhood')['Venue Category'].count().reset_index()
atl_grouped
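
One thing to keep in mind with the filter and group-by above is that neighborhoods without any competing venue disappear from the grouped result entirely. A minimal sketch with made-up rows (the neighborhood names and the shortened category list are hypothetical) showing that effect and one way to bring those neighborhoods back with a count of zero:

```python
import pandas as pd

# Hypothetical venue rows; names and categories are made up for illustration.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "C"],
    "Venue Category": ["Bar", "Pub", "Coffee Shop", "Park"],
})
competitors = ["Bar", "Pub"]

# Filter to competing categories, then count rows per neighborhood.
counts = (
    venues[venues["Venue Category"].isin(competitors)]
    .groupby("Neighborhood")["Venue Category"]
    .count()
)

# Neighborhoods with no competing venue vanish from the grouped result;
# reindexing against the full list brings them back with a count of 0.
all_names = ["A", "B", "C"]
counts_full = counts.reindex(all_names, fill_value=0)
print(counts_full.to_dict())
```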

# 5. Results and Making the Decision

As we saw above, only 18 of the neighborhoods have venues we would compete with. Since I do not want to be the first into a new market, these are the neighborhoods I will consider.

To make a decision, we will map these venue counts to our original data and find venues per 1000 people and select the neighborhood with the highest value.

In [None]:
atl_neighborhoods['Number of Venues'] = atl_neighborhoods['NAME'].map(dict(zip(atl_grouped['Neighborhood'],atl_grouped['Venue Category'])))
atl_neighborhoods['Venues per 1000 People'] = (atl_neighborhoods['Number of Venues']/atl_neighborhoods['POP'])*1000
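
The rate computed above is simply the venue count divided by the population, scaled to 1000 residents. A worked sketch with made-up numbers (the helper `venues_per_1000` is hypothetical); a neighborhood without competing venues gets no value here, mirroring the missing values the mapping above produces:

```python
# Made-up numbers, for illustration only.
population = {"A": 2000, "B": 5000, "C": 800}
venue_counts = {"A": 4, "B": 5}  # "C" has no competing venues

def venues_per_1000(name):
    """Return competing venues per 1000 residents, or None if no venues were found."""
    if name not in venue_counts:
        return None
    return venue_counts[name] * 1000 / population[name]

rates = {name: venues_per_1000(name) for name in population}
print(rates)  # {'A': 2.0, 'B': 1.0, 'C': None}
```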

Let's visualize the results.

In [None]:
m4 = folium.Map([latitude,longitude], zoom_start = 11)
scale = (atl_neighborhoods['Venues per 1000 People'].quantile((0,0.1,0.75,0.9,0.98,1))).tolist()
folium.TileLayer('CartoDB positron',name="Light Map",control=False).add_to(m4)
folium.Choropleth(
 geo_data=atl_neighborhoods,
 name='Choropleth',
 data=atl_neighborhoods,
 columns=['NAME','Venues per 1000 People'],
 key_on="feature.properties.NAME",
 fill_color='YlGnBu',
 threshold_scale=scale,
 fill_opacity=0.8,
 line_opacity=0.2,
 legend_name='Venues per 1000 People',
 smooth_factor=0
).add_to(m4)

m4

Most of the neighborhoods are black because they have no venues like ours. But it is hard to tell which neighborhood we want to go into exactly. So, let's see which neighborhood is our best area to target for our Bierhaus.

In [None]:
best_neighborhood = atl_neighborhoods.sort_values(by='Venues per 1000 People', ascending = False).reset_index(drop=True)['NAME'][0]
print('The Atlanta Neighborhood where I should open a Bierhaus is', best_neighborhood+'.')
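
Picking only the first row hides how close the runners-up are. A small sketch, on made-up rates, of listing the top few neighborhoods with `nlargest` (the frame below is hypothetical; `NaN` marks a neighborhood with no competing venues):

```python
import pandas as pd

# Made-up rates; NaN marks neighborhoods with no competing venues.
demo = pd.DataFrame({
    "NAME": ["A", "B", "C", "D"],
    "Venues per 1000 People": [2.0, float("nan"), 3.5, 1.0],
})

# nlargest ranks the numeric values in descending order, ahead of any NaN rows.
top = demo.nlargest(2, "Venues per 1000 People")
print(top["NAME"].tolist())  # ['C', 'A']
```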

So the Marietta Street Artery is ripe for a Bierhaus, and being an Atlanta native, I can confirm this would be a great place for one.

# 6. Discussion

This analysis is designed to give general recommendations for where to put my Bierhaus. But there are some assumptions we made and shortcuts taken.

## 6.1 Assumptions

The biggest assumption we made in this analysis is that the best place to put a Bierhaus is simply the neighborhood with the most beer-serving venues per 1000 people. Obviously, there are other things to consider, like space availability, rent, and access. This analysis was designed to be more of a first pass on which neighborhoods show a density of venues like the one I would like to open.

Another assumption we made was that the FourSquare data included all venues similar to my Bierhaus concept. There are many different types of drinks venues so we would want to investigate that further.

## 6.2 Areas for improvement

Ideally, I would have searched FourSquare using the neighborhood names themselves. I tried this but got many errors and saw that there was a misalignment between the names of Atlanta neighborhoods as the city sees them and what is reported in FourSquare. To overcome this, I simply used the central latitude and longitude of each neighborhood and searched within a fixed radius of those points. We probably missed some venues using this method.
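
To get a feel for what a fixed search radius leaves out, here is a minimal sketch of a great-circle (haversine) distance check. The venue location is hypothetical, placed 0.01 degrees north of the Atlanta center used for the maps, and the 500 m threshold is the default radius from `getNearbyVenues`:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

center = (33.7490, -84.3880)         # the Atlanta coordinates used for the maps
venue = (33.7490 + 0.01, -84.3880)   # hypothetical venue 0.01 degrees north

distance = haversine_m(*center, *venue)
print(round(distance), distance <= 500)
```

A venue only about a kilometer from a centroid already falls outside a 500 m search, so larger neighborhoods are likely undercounted.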

Finally, Atlanta is known as a sprawling city, with many people driving in and out of the city proper and its neighborhoods to go about their lives. I also already mentioned the Atlanta Beltline project as a means to connect the various neighborhoods. Basing this decision on reported population likely misses the true story of how many people are in a particular neighborhood on an average day. I would like to improve this model with that data if it were available.

# 7. Conclusion

The analysis contained in this report concludes that the Marietta Street Artery is the best neighborhood in Atlanta in which to open a Bierhaus. Using Python, FourSquare, and some visualization techniques, we have shown how some very easily accessible data can help aspiring business owners make smart decisions. Thank you for reading!