# What do successful restaurants in New York have in common? 

Over the course of two weeks, I need to come up with either an idea that leverages the Foursquare location data or a problem that the Foursquare location data can help solve.

I will have to provide sufficient justification for why the problem I want to solve is important and why a client or a group of people would be interested in my project.

For this week, I will be required to submit the following:
1. A description of the problem and a discussion of the background.  
2. A description of the data and how it will be used to solve the problem.

## Introduction/Business Problem

What am I required to do?
- Clearly define a problem or an idea of my choice
- Make sure it is one where I would need to leverage the Foursquare location data to solve or execute it
- Describe my audience and why they would care about my problem

### Define an idea
There are many aspects that need to be taken care of when running a food business, e.g. location, accessibility, visibility, types of cuisine offered, customer service and many more.  It is difficult for business owners who are new to this field to figure out what they should prioritize when managing their restaurants.

For this project, we operationalize the Foursquare rating of a restaurant as the measure of its "successfulness".  The plan is to use machine learning to study the Foursquare profiles of many restaurants.  From this study/training process, we can figure out whether restaurants with high ratings share common characteristics that deserve greater attention from business owners and operators.

For now, this project will be **focusing on the New York region**.  However, the idea and the code for this project are transferable (with minimal modifications) to any region in the world, as long as Foursquare is being used in that region.

In short, the idea of this project is to determine the traits of the restaurants with high Foursquare ratings in New York.  Another way of framing the question is "What do successful restaurants in New York have in common?"

### Where to leverage the Foursquare location data
As mentioned in the subsection above, this project revolves completely around the location data that Foursquare makes available for each restaurant.  After reviewing the details available for each venue, we decided that the Foursquare location data can be leveraged in three different ways.  Accordingly, the project is designed to have **three sections**.

1. **Location**  
The first section aims to study the relationship between the location of a restaurant and the successfulness of that restaurant.  Location data leveraged from Foursquare consists of details such as the coordinates and the address of the restaurant.  This section can help business owners see how strongly the location of their restaurant affects their business, positively or negatively.  Location is given its own section because it is an aspect of the restaurant that is largely fixed for the business owners.  This section therefore serves to make the business owners aware of the effect of the location, so that they can devise strategies to complement the advantages or compensate for the disadvantages of their location.

2. **Traits of restaurants**   
The second section involves studying the different details of each restaurant that are available in Foursquare.  Some examples of these details include the category of the restaurant, statistics (collected from customers), whether a url linking to the restaurant's web page is provided, the operating hours, the menu, the price tier, the count of photos provided by visitors and other attributes of the restaurant.  This section attempts to study whether any of these traits can be reliable predictors of the successfulness of a restaurant.  Most of these traits are within the control of the business owners.  Once the reliable predictors have been determined, the restaurant may have a clearer direction on what to work on to improve its business.

3. **Tips**   
The third section involves studying the tips provided by people who visit the restaurants.  This information can be leveraged from Foursquare (to a certain extent, as the number of requests is limited).  This section is also important because it gives the business owners an opportunity to see things from the customers' perspective.  It is hoped that this section may allow the business owners to reflect on their operation objectively and make changes if deemed necessary.

### Describe my audience
This project will be useful for any business owners or potential business owners.    

Business owners can use this project to check how their restaurant's location is affecting the business.  Besides that, they can compare their restaurants to the "ideal traits" of a successful restaurant to determine whether there are aspects that can be improved to attract more customers.

Potential business owners can utilize this project to find a suitable location to set up their restaurants.  The project also helps them prioritize the few aspects that are crucial to running a successful restaurant.

## Data

What am I required to do?
- Describe the data that I will be using to execute my idea
- Provide adequate explanation and discussion, with examples, of the data that I will be using

### Describe the data
All the data required for this project can be obtained using the Foursquare platform.  Foursquare is known as one of the established, independent location data platforms.  It was created mainly to understand how people "move through the real world".  It collects information on how people move using a combination of (i) Pilgrim, their location tracking technology, (ii) Foursquare Swarm, their location sharing platform and (iii) the user's search history.

Developers are allowed to obtain part of the information collected by Foursquare using the Foursquare API.  In particular, this project will first obtain the venue IDs of all eateries in the New York region.  Using each venue ID, the details of that venue are then collected.  This information will then be sent for preprocessing to filter out irrelevant and redundant details.
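
The two-step collection described above (search for venues, then fetch the details of each venue ID) could be sketched as follows.  This is only a minimal sketch under stated assumptions, not the project's actual code: the function names are hypothetical, the endpoint paths and the `client_id`, `client_secret`, `v`, `near`, `query` and `limit` parameters are assumed from the v2 API documentation linked at the end of this section, and the version date and credentials are placeholders.  The sketch only builds request URLs and reads venue IDs from an already parsed response, so it makes no network call.

```python
from urllib.parse import urlencode

# Root of the v2 venues endpoints (assumed from the linked documentation).
BASE_URL = "https://api.foursquare.com/v2/venues"


def build_search_url(client_id, client_secret, near, query="restaurant",
                     limit=50, version="20180605"):
    """Build a 'Search for Venues' request URL (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,      # API version date, a placeholder value
        "near": near,      # e.g. "New York, NY"
        "query": query,
        "limit": limit,
    }
    return f"{BASE_URL}/search?{urlencode(params)}"


def build_details_url(venue_id, client_id, client_secret, version="20180605"):
    """Build a 'Get Details of a Venue' request URL (hypothetical helper)."""
    params = {"client_id": client_id, "client_secret": client_secret, "v": version}
    return f"{BASE_URL}/{venue_id}?{urlencode(params)}"


def extract_venue_ids(search_response):
    """Collect venue IDs from a parsed search response (a dict).

    Assumes the venues sit under response -> venues, each with an "id" key.
    """
    venues = search_response.get("response", {}).get("venues", [])
    return [venue["id"] for venue in venues if "id" in venue]
```

Each URL could then be fetched with any HTTP client, and the IDs returned by `extract_venue_ids` fed one by one into `build_details_url`.  Because the number of requests is limited, caching each details response locally would be a sensible design choice.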

### Explanation and discussion
1. **Location**   
For the first section, the data needed are the coordinates (latitude and longitude) and the rating of the restaurants.  The latitude and longitude of the restaurants are obtained using the "Search for venues" request.  According to the documentation, this request "returns a list of venues near the current location, optionally matching a search term."  This list contains basic location information (including latitude and longitude) for all venues found, which will be sufficient for this section of the project.  The rating can only be obtained by feeding the venue ID found above into another request, "get details of a venue".  The response to this request contains plenty of details, including categories, url, operating hours, price tier, rating, count of tips and many more.  For this section, only the rating is required.  For the next section, however, the other details may be useful (see below).

2. **Traits of restaurants**    
The process to obtain the details of venues is as discussed in the paragraph above.  For this section, we focus on a few details as predictors: the categories, stats, url, hours, menu url, price, photos and attributes.  The rating of each restaurant will be the target of the decision tree classification model (more will be discussed in the Methodology section).  These details are extracted during preprocessing to create a dataframe that can be analyzed more conveniently.

3. **Tips**    
For this section, we will extract the "phrases" key from the details of each venue.  This key refers to the "list of phrases commonly seen in this venue’s tips, as well as a sample tip snippet and the number of tips this phrase appears in".  By aggregating the lists of phrases of all restaurants into two big clusters based on the rating of the restaurant, we can construct two word clouds.  Each word cloud contains phrases that appear in the tips of the restaurants, with the font size adjusted according to the frequency of the phrases used (again to be discussed further in the Methodology section).
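
The preprocessing for the second and third sections both start from the venue details, so they could be sketched together as below.  This is a hedged illustration rather than the project's implementation: the function names are hypothetical, the rating threshold of 8.0 that splits the two clusters is an arbitrary placeholder, and the field names (`categories`, `url`, `hours`, `menu`, `price.tier`, `photos.count`, `stats.tipCount`, `rating`, and `phrases` with `phrase` and `count` entries) are assumptions based on the venue details documentation linked below.

```python
from collections import Counter


def venue_to_row(venue):
    """Flatten one venue-details dict into a row of predictors plus the rating."""
    categories = venue.get("categories") or []
    return {
        "id": venue.get("id"),
        "category": categories[0].get("name") if categories else None,
        "has_url": "url" in venue,
        "has_hours": "hours" in venue,
        "has_menu": "menu" in venue,
        "price_tier": (venue.get("price") or {}).get("tier"),
        "photo_count": (venue.get("photos") or {}).get("count", 0),
        "tip_count": (venue.get("stats") or {}).get("tipCount", 0),
        "rating": venue.get("rating"),  # target of the classification model
    }


def phrase_counts_by_group(venues, threshold=8.0):
    """Sum phrase counts separately for high-rated and low-rated venues.

    The threshold is a placeholder; venues without a rating are skipped.
    """
    high, low = Counter(), Counter()
    for venue in venues:
        rating = venue.get("rating")
        if rating is None:
            continue  # an unrated venue cannot be assigned to a cluster
        target = high if rating >= threshold else low
        for item in venue.get("phrases") or []:
            target[item["phrase"]] += item.get("count", 1)
    return high, low
```

A list of rows produced by `venue_to_row` can be turned into a dataframe for the decision tree model, and the two counters returned by `phrase_counts_by_group` are phrase-to-frequency mappings of the kind a word-cloud library typically accepts.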

### Links to relevant websites on how to request data from Foursquare
Many of the phrases used in the data section are extracted from the two websites below.  These websites list out the parameters and responses of the two requests being used above.    
1. **Search for Venues**   
<a href="https://developer.foursquare.com/docs/api/venues/search">https://developer.foursquare.com/docs/api/venues/search</a>

2. **Get Details of a Venue**   
<a href="https://developer.foursquare.com/docs/api/venues/details">https://developer.foursquare.com/docs/api/venues/details</a>