# Identifying neighborhoods for a new restaurant in Atlanta, GA
## Author: Elizabeth Niese
## Date: June 2020



# Background

For this project I am interested in opening a contemporary casual restaurant in Atlanta, GA, with a focus on locally sourced food. Given Atlanta's growing population of educated professionals and the current interest in sustainable food, there should be at least one neighborhood in Atlanta that can support such a restaurant. To identify an optimal location, I will look for areas of the city that have other mid- to high-end restaurants, indicating that local residents have the financial resources to eat out regularly, but that are not oversaturated with such options. Another issue to address before opening such a restaurant is finding enough local farmers to buy food from, but that is outside the scope of this project.


# Data 

The neighborhood data is available at https://opendata.arcgis.com/datasets/d6298dee8938464294d3f49d473bcf15_196.geojson. This data set includes neighborhood information such as name, statistical area, basic demographics, and neighborhood boundaries. To identify an appropriate neighborhood for the restaurant, we will need to:
1. identify the geographic location of each neighborhood,
2. use the Foursquare API to find venues in each neighborhood, and
3. use k-means clustering to identify which neighborhoods are likely to support the restaurant.
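A minimal sketch of the download step in Python (the report does not state its language, but yellowbrick is a Python package). The URL is the one given above; the function names are hypothetical. Because the property names in the file are not listed in this report, the helper only reports which property keys and geometry types are present, which is a reasonable first look before choosing columns.

```python
import json
import urllib.request
from collections import Counter

# URL of the Atlanta neighborhoods GeoJSON given in the text above.
NEIGHBORHOODS_URL = (
    "https://opendata.arcgis.com/datasets/"
    "d6298dee8938464294d3f49d473bcf15_196.geojson"
)


def fetch_geojson(url: str) -> dict:
    """Download a GeoJSON document and parse it into a Python dict."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def summarize_features(collection: dict) -> dict:
    """Report the feature count, the property names, and the geometry types."""
    features = collection.get("features", [])
    # Union of property names across all features.
    property_keys = sorted(
        {key for feature in features for key in (feature.get("properties") or {})}
    )
    # How many features use each geometry type (Polygon, MultiPolygon, ...).
    geometry_types = Counter(
        (feature.get("geometry") or {}).get("type") for feature in features
    )
    return {
        "count": len(features),
        "property_keys": property_keys,
        "geometry_types": dict(geometry_types),
    }
```

For example, `summarize_features(fetch_geojson(NEIGHBORHOODS_URL))` would list the available fields; note that the hosted file may have moved since 2020.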


# Methodology

After downloading the GeoJSON file, I kept the features of interest: the Statistical Area Code, the neighborhood name, and a latitude and longitude for each neighborhood. The geographic information was given as boundary coordinates, so each neighborhood's center was approximated by averaging the coordinates of its boundary vertices. This vertex average is close to, but not the same as, the true area centroid of the polygon. A few neighborhoods had boundaries that were not amenable to this process; these were assigned a latitude and longitude of NaN and dropped later in the data-cleaning process.
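The centroid step can be sketched as follows. This is a hedged illustration, not the original notebook code: it assumes the boundaries are standard GeoJSON (coordinates in `[longitude, latitude]` order) and that the boundaries "not amenable to this process" are those that are not a single `Polygon`, such as a `MultiPolygon`; the report does not say which ones failed or why. `name_key` and `area_key` are hypothetical stand-ins for the real property names, which are not given here.

```python
import math
from typing import Optional, Tuple


def vertex_average(geometry: Optional[dict]) -> Tuple[float, float]:
    """Approximate a neighborhood's center as the mean of its boundary vertices.

    Returns (latitude, longitude). Anything other than a simple Polygon
    (for example a MultiPolygon) returns (nan, nan) so it can be dropped later.
    """
    if not geometry or geometry.get("type") != "Polygon":
        return math.nan, math.nan
    rings = geometry.get("coordinates") or []
    if not rings:
        return math.nan, math.nan
    ring = rings[0]  # outer boundary; any holes are ignored
    # GeoJSON rings repeat the first vertex at the end; skip it so it is not counted twice.
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]
    if not ring:
        return math.nan, math.nan
    lon = sum(point[0] for point in ring) / len(ring)
    lat = sum(point[1] for point in ring) / len(ring)
    return lat, lon


def neighborhood_rows(collection: dict, name_key: str, area_key: str) -> list:
    """Build one record per feature with name, statistical area, and center."""
    rows = []
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        lat, lon = vertex_average(feature.get("geometry"))
        rows.append({
            "Neighborhood": props.get(name_key),
            "StatArea": props.get(area_key),
            "Latitude": lat,
            "Longitude": lon,
        })
    return rows
```

Rows whose latitude is NaN can then be filtered out, for example with `[r for r in rows if not math.isnan(r["Latitude"])]`. If a true area-weighted centroid is preferred, Shapely's `shape(geometry).centroid` is one alternative.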

The Foursquare API was used to find venue information for each neighborhood. Since the goal of this project is finding a good location for a restaurant, once the venue information was loaded into the dataframe, venues were restricted to those containing the words *restaurant, museum, art,* or *studio*. These choices were made to help find a neighborhood that can support a higher-end restaurant and has other entertainment venues likely to draw patrons to the area. Prior to limiting the venue types, it was difficult
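A sketch of the venue lookup and keyword filter, assuming the Foursquare v2 `venues/explore` endpoint that was widely used in 2020. That API has since been superseded, so the URL and response layout may no longer work as shown. The search radius, result limit, and version date are placeholder values not taken from the report, and the credentials are placeholders too. The filter uses a regular expression anchored at the start of a word; a plain substring test would also keep categories such as "Department Store", because "art" appears inside "Department".

```python
import re
import urllib.parse

# Keywords from the text; the leading \b stops "art" from matching inside "Department".
KEYWORDS = re.compile(r"\b(restaurant|museum|art|studio)", re.IGNORECASE)

# Assumed 2020-era Foursquare v2 endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_url(client_id, client_secret, lat, lon,
                radius=500, limit=100, version="20200601"):
    """Build a venues/explore request URL (radius, limit, version are placeholders)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lon}",
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urllib.parse.urlencode(params)


def parse_venues(neighborhood, payload):
    """Flatten an explore response into (neighborhood, venue, category) tuples."""
    rows = []
    for group in payload["response"]["groups"]:
        for item in group["items"]:
            venue = item["venue"]
            categories = venue.get("categories") or [{"name": "Unknown"}]
            rows.append((neighborhood, venue["name"], categories[0]["name"]))
    return rows


def keep_relevant(rows):
    """Keep venues whose category starts a word with one of the keywords."""
    return [row for row in rows if KEYWORDS.search(row[2])]
```

The JSON for one neighborhood could then be fetched with `json.load(urllib.request.urlopen(explore_url(...)))` and passed to `parse_venues`.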

One-hot encoding was used to convert the venue data into numeric form for *k*-means clustering; an excerpt of the resulting table is shown below.
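A minimal pandas sketch of the encoding, assuming a venue table with hypothetical `Neighborhood` and `Category` columns. Averaging the dummy columns per neighborhood, so that each row holds the share of that neighborhood's venues in each category, is a common follow-up step before clustering, but it is an assumption here; the report only states that one-hot encoding was used.

```python
import pandas as pd


def one_hot_by_neighborhood(venues: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode venue categories, then average them per neighborhood."""
    # One 0/1 column per category; prefix="" keeps the bare category names.
    dummies = pd.get_dummies(venues["Category"], prefix="", prefix_sep="", dtype=float)
    dummies.insert(0, "Neighborhood", venues["Neighborhood"].values)
    # Mean of the 0/1 columns = fraction of the neighborhood's venues in each category.
    return dummies.groupby("Neighborhood", as_index=False).mean()
```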

![75C9D7FF-0419-41A5-B7E7-34B7FEC0DFCC.jpeg](attachment:75C9D7FF-0419-41A5-B7E7-34B7FEC0DFCC.jpeg)


To choose the number of clusters, I used the elbow method available in the yellowbrick package. This method runs *k*-means clustering for a range of values of *k*, plots how well each clustering fits the data, and suggests the *k* at the "elbow" of that curve, beyond which adding clusters yields diminishing improvement. Based on the elbow plot, I used 13 clusters.
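In yellowbrick this is typically done with `KElbowVisualizer(KMeans(), k=(2, 20))`, followed by `.fit(X)` and `.show()`, with the suggested value in `elbow_value_`. The sketch below reproduces the idea with scikit-learn alone so it runs without yellowbrick. It records the k-means inertia (within-cluster sum of squares) for each *k* and picks the point farthest from the straight line joining the first and last points of the curve. That is a simplified knee rule, not yellowbrick's exact algorithm, and the range of *k* is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_curve(X, k_values, random_state=0):
    """Fit k-means for each k and record the within-cluster sum of squares."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
        for k in k_values
    ]


def elbow_k(k_values, inertias):
    """Pick the k whose point lies farthest from the line joining the curve's endpoints."""
    x = np.asarray(k_values, dtype=float)
    y = np.asarray(inertias, dtype=float)
    # Rescale both axes to [0, 1] so neither scale dominates the distance.
    x = (x - x.min()) / (x.max() - x.min())
    y = (y - y.min()) / (y.max() - y.min())
    start = np.array([x[0], y[0]])
    end = np.array([x[-1], y[-1]])
    direction = (end - start) / np.linalg.norm(end - start)
    offsets = np.column_stack([x, y]) - start
    # Perpendicular distance = |2-D cross product| with the unit direction vector.
    distances = np.abs(offsets[:, 0] * direction[1] - offsets[:, 1] * direction[0])
    return list(k_values)[int(np.argmax(distances))]
```

For example, `elbow_k(range(2, 21), inertia_curve(X, range(2, 21)))`, where `X` is the encoded table without its `Neighborhood` column. When the curve has no sharp bend, different knee rules can disagree, so inspecting the plot itself remains worthwhile.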

![FD0BB331-942B-48B0-A349-BCD6F1220A12_4_5005_c.jpeg](attachment:FD0BB331-942B-48B0-A349-BCD6F1220A12_4_5005_c.jpeg)


*K*-means clustering with *k* = 13 was then applied to the neighborhoods, and the resulting clusters were plotted on a map.
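A sketch of the clustering and mapping step with scikit-learn and folium. The report does not name its mapping library, so folium is an assumption, as are the map center (approximately downtown Atlanta), the color palette, and the column names. `cluster_neighborhoods` expects the per-neighborhood encoded table and a table of neighborhood coordinates such as the centroids described in the Methodology.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_neighborhoods(encoded, locations, k=13, random_state=0):
    """Cluster neighborhoods on their venue profile and attach coordinates.

    encoded: a "Neighborhood" column plus numeric one-hot venue columns.
    locations: "Neighborhood", "Latitude", and "Longitude" columns.
    """
    features = encoded.drop(columns="Neighborhood")
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(features)
    labeled = encoded[["Neighborhood"]].assign(Cluster=model.labels_)
    # Inner join drops neighborhoods without coordinates (e.g. NaN rows already removed).
    return labeled.merge(locations, on="Neighborhood", how="inner")


def cluster_map(clustered, center=(33.749, -84.388), zoom=11):
    """Draw one colored circle per neighborhood (requires the folium package)."""
    import folium  # imported here so the clustering code does not depend on folium

    palette = ["red", "blue", "green", "purple", "orange", "darkred", "cadetblue",
               "darkgreen", "pink", "gray", "black", "lightblue", "beige"]
    fmap = folium.Map(location=list(center), zoom_start=zoom)
    for row in clustered.itertuples(index=False):
        folium.CircleMarker(
            location=[row.Latitude, row.Longitude],
            radius=6,
            color=palette[row.Cluster % len(palette)],
            fill=True,
            popup=f"{row.Neighborhood} (cluster {row.Cluster})",
        ).add_to(fmap)
    return fmap
```

Fixing `random_state` keeps the cluster numbering reproducible between runs; without it, the cluster labeled 4 in one run may receive a different number in the next.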

![C9EDBAE7-5B6B-4FF6-90A6-FD200B6D0EF9_4_5005_c.jpeg](attachment:C9EDBAE7-5B6B-4FF6-90A6-FD200B6D0EF9_4_5005_c.jpeg)

After mapping the clusters, I also examined each cluster and the top venues for the neighborhoods in it. This allowed me to identify clusters containing the kinds of venues I want near my restaurant.
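One way to list the top venues per cluster, sketched in pandas under the same assumed column names: average the encoded columns within each cluster and keep the highest-scoring categories. The cutoff `n` is an arbitrary choice, and the report may have ranked venues per neighborhood rather than per cluster.

```python
import pandas as pd


def top_categories_by_cluster(encoded, labels, n=5):
    """Return the n most common venue categories in each cluster.

    encoded: a "Neighborhood" column plus one-hot frequency columns.
    labels: "Neighborhood" and "Cluster" columns.
    """
    merged = encoded.merge(labels[["Neighborhood", "Cluster"]], on="Neighborhood")
    # Average category share across the neighborhoods in each cluster.
    means = merged.drop(columns="Neighborhood").groupby("Cluster").mean()
    return {
        cluster: list(row.sort_values(ascending=False).head(n).index)
        for cluster, row in means.iterrows()
    }
```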

# Results

Based on analysis of the clusters, I determined that clusters 4 and 8 contained neighborhoods that could reasonably support the type of restaurant I intend to open. In particular, both clusters had cultural sites and a variety of restaurants. These two clusters also have neighborhoods close to each other, so there should be plenty of potential customers near wherever the restaurant is opened.

**Cluster 4:**

![F52FEF46-3725-4DC9-A2E0-1C4B58D5E1DB_4_5005_c.jpeg](attachment:F52FEF46-3725-4DC9-A2E0-1C4B58D5E1DB_4_5005_c.jpeg)

**Cluster 8:**

![496FED1E-F80B-4555-B7FB-236AD2ECC90C_4_5005_c.jpeg](attachment:496FED1E-F80B-4555-B7FB-236AD2ECC90C_4_5005_c.jpeg)

Using the map, I narrowed the choice to either the Capitol Gateway or the Peoplestown neighborhood. These neighborhoods are near each other and on a main road. In fact, several other neighborhoods in clusters 4 and 8 lie near these two.

![B24F85B5-9F99-4A30-9F8C-15CD53FF9E82_4_5005_c.jpeg](attachment:B24F85B5-9F99-4A30-9F8C-15CD53FF9E82_4_5005_c.jpeg)
![51CFB4B0-EB4D-4E39-8193-EDB6453F639B_4_5005_c.jpeg](attachment:51CFB4B0-EB4D-4E39-8193-EDB6453F639B_4_5005_c.jpeg)

# Discussion

For this project, I needed a data set of neighborhoods in a large city. I chose Atlanta because it is a major city in the southeastern United States and has readily accessible neighborhood data. The scope was limited to looking at venues in neighborhoods and inferring which neighborhoods could support a contemporary casual restaurant. Before actually opening a restaurant, other factors, such as the average income of nearby neighborhoods, crime rates, and accessibility, would also need to be considered. The *k*-means clustering algorithm allowed me to group neighborhoods by their prevalent venues. By mapping the neighborhoods by cluster, I was able to identify geographic regions of the city with a similar ability to support a restaurant.

Both the map and the cluster analysis were important for identifying neighborhoods for this restaurant. Using the cluster analysis, I was able to determine which clusters had top venues with characteristics indicating neighborhoods that could support a new restaurant. Using the map, I was able to identify neighborhoods that were geographically close to other neighborhoods with similar characteristics. Choosing a specific site in future work would require more detailed information about conditions within each neighborhood.