# Identifying neighborhoods for a new restaurant in Atlanta, GA
## Author: Elizabeth Niese
## Date: June 2020



# Background

For this project I am interested in opening a contemporary casual restaurant in Atlanta, GA with a focus on locally sourced food. With Atlanta's increasing population of educated professionals and the current interest in sustainable food, there should be at least one neighborhood in Atlanta that can support such a restaurant. To identify an optimal location, I will look for areas of the city that have other mid- to high-end restaurants, indicating that local residents have the financial resources to eat out regularly, but that are not oversaturated with such options. Another issue to address before opening such a restaurant is finding enough local farmers to purchase food from, but that is outside the scope of this project.


# Data 

The neighborhood data is available at https://opendata.arcgis.com/datasets/d6298dee8938464294d3f49d473bcf15_196.geojson.  To identify an appropriate neighborhood for the restaurant, I will need to:
1. identify the geographic location of each neighborhood,
2. use the Foursquare API to find venues in each neighborhood, and
3. use k-means clustering to identify which neighborhoods are likely to support the restaurant.


# Methodology

After the initial download of the GeoJSON file, the features of interest were extracted: the Statistical Area Code, the neighborhood name, and a latitude and longitude for each neighborhood.  The geographic information was given as boundary coordinates, so a single latitude and longitude was obtained for each neighborhood by approximating the centroid of its boundary polygon as the average of the boundary points' coordinates.  A few neighborhoods had boundaries that were not amenable to this process.  These were assigned a latitude and longitude of NaN and dropped later in the data-cleaning process.
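The averaging step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function names and the sample square are hypothetical, and treating every non-`Polygon` geometry as "not amenable" is an assumption about which boundaries failed.

```python
import math


def vertex_mean(ring):
    """Average the [lon, lat] pairs of one boundary ring.

    Returns (lat, lon), or (nan, nan) if the ring is empty or malformed.
    """
    try:
        # GeoJSON rings repeat the first point at the end; drop the repeat
        # so that vertex is not counted twice in the average.
        if len(ring) > 1 and ring[0] == ring[-1]:
            ring = ring[:-1]
        lons = [float(pt[0]) for pt in ring]
        lats = [float(pt[1]) for pt in ring]
    except (TypeError, ValueError, IndexError):
        return (math.nan, math.nan)
    if not lons:
        return (math.nan, math.nan)
    return (sum(lats) / len(lats), sum(lons) / len(lons))


def neighborhood_center(geometry):
    """Approximate (lat, lon) for a GeoJSON geometry dict.

    Assumption: only simple Polygon geometries are handled; anything else
    gets NaN so it can be dropped during data cleaning.
    """
    if geometry.get("type") != "Polygon" or not geometry.get("coordinates"):
        return (math.nan, math.nan)
    return vertex_mean(geometry["coordinates"][0])  # outer ring only


# Hypothetical square "neighborhood"; GeoJSON stores points as [lon, lat].
square = {
    "type": "Polygon",
    "coordinates": [[
        [-84.40, 33.70], [-84.38, 33.70], [-84.38, 33.72],
        [-84.40, 33.72], [-84.40, 33.70],
    ]],
}
center = neighborhood_center(square)
multi = neighborhood_center({"type": "MultiPolygon", "coordinates": []})
```

Note that the vertex average is only an approximation of the true area centroid: it is pulled toward stretches of the boundary that are sampled with many points.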

From here the Foursquare API was used to find venue information for each neighborhood.  Since the goal of this project is to find a good location for a restaurant, once the venue information was loaded into the dataframe, venues were restricted to those containing any of the words *restaurant, museum, art,* or *studio*.  These choices were made to help find a neighborhood that can support a higher-end restaurant and has other entertainment venues likely to draw patrons to the neighborhood.
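A keyword filter of this kind might look like the sketch below. The column names and rows are made up for illustration, and applying the filter to a venue-category column with whole-word matching is an assumption, not a description of the original notebook.

```python
import pandas as pd

# Hypothetical venue table; column names and rows are illustrative only.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "C"],
    "Venue Category": [
        "Italian Restaurant", "Department Store", "Art Gallery",
        "Yoga Studio", "Pharmacy",
    ],
})

keywords = ["restaurant", "museum", "art", "studio"]

# \b...\b matches whole words only, case-insensitively.
pattern = r"\b(?:" + "|".join(keywords) + r")\b"
mask = venues["Venue Category"].str.contains(pattern, case=False, regex=True)
filtered = venues[mask].reset_index(drop=True)
```

The word boundaries matter for a short keyword like *art*: a plain substring test would also keep "Department Store", because "Department" contains the letters "art".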

One-hot encoding was used to convert the venue data into numeric form in preparation for *k*-means clustering; an excerpt of the resulting table is shown below.

![75C9D7FF-0419-41A5-B7E7-34B7FEC0DFCC.jpeg](attachment:75C9D7FF-0419-41A5-B7E7-34B7FEC0DFCC.jpeg)
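As a rough sketch of the encoding step, `pandas.get_dummies` can turn each venue's category into indicator columns. The data and column names here are hypothetical, and averaging the indicators per neighborhood is an assumption about how the rows were aggregated before clustering.

```python
import pandas as pd

# Hypothetical filtered venue table.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": [
        "Italian Restaurant", "Art Gallery", "Italian Restaurant",
        "Yoga Studio", "Art Gallery",
    ],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Assumed aggregation: the mean of each indicator gives the share of a
# neighborhood's venues that fall in each category.
profile = onehot.groupby("Neighborhood").mean()
```

Each row of `profile` is then a numeric vector describing one neighborhood, which is the form *k*-means expects.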


To determine the best number of clusters, I used the elbow method as implemented in the yellowbrick package.  Based on the elbow plot, I chose 8 clusters.

![D24D418D-1306-4C91-A8EC-A521620F6CFE_4_5005_c.jpeg](attachment:D24D418D-1306-4C91-A8EC-A521620F6CFE_4_5005_c.jpeg)
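The idea behind the elbow method can be illustrated without yellowbrick by computing the within-cluster sum of squares (scikit-learn's `inertia_`) for a range of *k* values. This is a hand-rolled stand-in for the yellowbrick plot, run on synthetic data with three obvious groups rather than on the neighborhood table.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the neighborhood profiles: three tight blobs in 2D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

ks = range(1, 7)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squares

# Improvement gained by each additional cluster; the "elbow" is where
# these improvements suddenly become small.
drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]
```

On this toy data the improvement from 2 to 3 clusters is large and the improvement from 3 to 4 is small, so the elbow sits at 3. On real venue data the bend is usually less sharp, and the choice of *k* involves some judgment.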



*K*-means clustering with 8 clusters was then applied to the neighborhoods, and the resulting clusters were plotted on a map.

![00DEFB5F-9B3D-442E-AEF5-EB35D7838950_4_5005_c.jpeg](attachment:00DEFB5F-9B3D-442E-AEF5-EB35D7838950_4_5005_c.jpeg)
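Fitting the model and attaching a cluster label to each neighborhood might look like the following sketch. The four-row table is hypothetical and uses 2 clusters only to keep the toy example readable; the report itself uses 8.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical per-neighborhood venue shares (rows sum to 1).
profile = pd.DataFrame(
    {"Art Gallery": [0.5, 0.4, 0.0, 0.1], "Restaurant": [0.5, 0.6, 1.0, 0.9]},
    index=["A", "B", "C", "D"],
)

# Toy setting: 2 clusters instead of the 8 used in the report.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profile)

# One integer label per neighborhood, in the same row order as `profile`.
clustered = profile.assign(Cluster=kmeans.labels_)
```

The label numbers themselves are arbitrary; only which neighborhoods share a label is meaningful. Joining `clustered` back to the latitude and longitude columns gives the data needed to color neighborhoods by cluster on a map.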




# Results

Based on analysis of the clusters, I determined that cluster 1 and cluster 4 contained neighborhoods that would reasonably support the type of restaurant I intend to open.  In particular, both clusters had cultural sites and a variety of restaurants.

**Cluster 1:**

![A4B10A9E-2A05-47CF-8368-3E5E9DC5EF1C_4_5005_c.jpeg](attachment:A4B10A9E-2A05-47CF-8368-3E5E9DC5EF1C_4_5005_c.jpeg)

**Cluster 4:**

![93583A27-F645-4435-92E9-2F9F51792D78_4_5005_c.jpeg](attachment:93583A27-F645-4435-92E9-2F9F51792D78_4_5005_c.jpeg)

Using the map, I decided to build in either the Peoples Town neighborhood or the Capitol Gateway neighborhood.  These two neighborhoods are near each other and on a main road.  In fact, several other neighborhoods in clusters 1 and 4 lie close to them.

![1F591011-4F92-4E99-97FA-EA3EFD6A04BB_4_5005_c.jpeg](attachment:1F591011-4F92-4E99-97FA-EA3EFD6A04BB_4_5005_c.jpeg)![0E4E10A9-B92D-4C95-B2D7-FF96D3BDB370_4_5005_c.jpeg](attachment:0E4E10A9-B92D-4C95-B2D7-FF96D3BDB370_4_5005_c.jpeg)


# Discussion

For this project, I needed to access a data set for neighborhoods in a large city.  I chose Atlanta since it is a major city in the southeastern United States and because it had readily accessible neighborhood data.  The scope of the project was limited to looking at venues in neighborhoods and inferring which neighborhoods could support a contemporary casual restaurant.  Prior to opening a restaurant, other considerations such as the average income of nearby neighborhoods, crime rates, and accessibility would also need to be examined.  The *k*-means clustering algorithm allowed me to group neighborhoods by prevalent venues.  By mapping the neighborhoods by cluster, I was able to identify geographic regions of the city with a similar ability to support a restaurant.