# Which City Makes It Easiest to Find a Chinese Restaurant?

## Introduction

For Chinese-speaking readers, this question will feel familiar. I want to apply some of the data science skills I learned on Coursera, using Python, to a practical question for a Chinese visitor to the United States: which city makes it easiest to find a Chinese restaurant?

Suppose you are Chinese and have never been to the US. One of the first questions you may ask is which city will make it easy to find a Chinese restaurant during your stay. Ideally, you want to go somewhere with a high density of Chinese restaurants. The approach here is to analyze the locations of Chinese restaurants in several major US cities and identify the city where a good Chinese restaurant is easiest to find.

This analysis could also be a useful reference for any tourist with a taste for Eastern-style food. This final project explores the locations of Chinese restaurants in New York, Chicago, San Francisco, Jersey City and Boston, all of which are popular tourist destinations with many Chinese residents. The project attempts to answer one question: "Which city makes it easiest to find a Chinese restaurant?"

## Data

The Foursquare API is used to collect the locations of Chinese restaurants in the following five major US cities:

- New York, NY
- San Francisco, CA
- Jersey City, NJ
- Boston, MA
- Chicago, IL

These are large US cities with many Chinese residents and tourists, so we can assume that each of them has many good, even great, Chinese restaurants.

According to the category list at https://developer.foursquare.com/docs/build-with-foursquare/categories/, the Chinese Restaurant category has the ID 4bf58dd8d48988d145941735, which we use to restrict the search to Chinese restaurants.

The venue data comes from the explore endpoint of Foursquare's Places API (https://developer.foursquare.com/docs/api-reference/venues/explore/).


## Methodology

We explore Chinese restaurant locations in the five cities through the explore endpoint of the Foursquare API. The near parameter restricts the results to venues in each city, and the categoryId parameter filters them down to Chinese restaurants.

Here is an example of the explore request:

https://api.foursquare.com/v2/venues/explore?client_id=&client_secret=&v=20180605&near=New%20York,%20NY&limit=100&categoryId=4bf58dd8d48988d145941735

Here 4bf58dd8d48988d145941735 is the ID of the Chinese Restaurant category. The notebook below asks for up to 500 venues per city, but each response contained only the nearest 100 venues.
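To make the parameter encoding explicit, here is a small sketch that builds the same explore request with Python's standard `urllib.parse.urlencode`. The helper name `build_explore_url` is hypothetical and the credentials are placeholders; the key point is that the city goes into the `near` parameter.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'
CHINESE_RESTAURANT_CATEGORY_ID = '4bf58dd8d48988d145941735'


def build_explore_url(client_id, client_secret, city, limit=100, version='20180605'):
    """Hypothetical helper: build an explore URL for Chinese restaurants near `city`."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'near': city,  # e.g. 'New York, NY'; urlencode escapes the space and comma
        'limit': limit,
        'categoryId': CHINESE_RESTAURANT_CATEGORY_ID,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


print(build_explore_url('YOUR_ID', 'YOUR_SECRET', 'New York, NY'))
```

`urlencode` encodes the space as `+` and the comma as `%2C`, so the city arrives intact as `near=New+York%2C+NY`. Passing the same dictionary to `requests.get` through its `params` argument gives equivalent escaping.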

The same request is repeated for each of the five cities, returning up to 100 venues per city. The name and coordinates of each venue are stored and plotted on a map for visual inspection.

Finally, to measure how densely the Chinese restaurants are clustered, the project computes the center of the venues in each city (their mean latitude and mean longitude), and then the mean Euclidean distance from each venue to that center. The smaller this mean distance, the denser the restaurants, and the easier it should be to find a Chinese restaurant in that city.
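The indicator described above fits in a short NumPy function. This is a sketch: the function name `mean_distance_to_centroid` and the toy coordinates are hypothetical. Like the notebook, it measures distance in raw degrees of latitude and longitude.

```python
import numpy as np


def mean_distance_to_centroid(coords):
    """Mean Euclidean distance (in degrees) from each (lat, lng) point to their centroid.

    coords: array-like of shape (n, 2) holding latitude/longitude pairs.
    """
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)  # mean latitude and mean longitude
    distances = np.linalg.norm(coords - centroid, axis=1)  # one distance per venue
    return distances.mean()


# toy example: four venues at the corners of a 0.02-degree square
toy = [(40.70, -74.00), (40.72, -74.00), (40.70, -73.98), (40.72, -73.98)]
print(round(mean_distance_to_centroid(toy), 6))  # prints 0.014142
```

Every corner sits 0.01 degrees from the center in both directions, so each distance is 0.01 times the square root of 2. A tighter cluster gives a smaller value.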

#### Import required libraries

In [1]:
```python
import numpy as np  # library to handle data in a vectorized manner
import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)  # show all columns of a pandas DataFrame
pd.set_option('display.max_rows', None)  # show all rows of a pandas DataFrame
import requests  # library to handle HTTP requests
import folium  # map rendering library

print('Libraries imported.')
```

Libraries imported.


### Sign up for a Foursquare account and retrieve your client ID and secret
The client_id and client_secret values are hidden with the # @hidden_cell marker in the following cell.

In [2]:
```python
# @hidden_cell
client_id = ''  # your Foursquare client ID
client_secret = ''  # your Foursquare client secret
version = '20180605'  # Foursquare API version

print('Foursquare credentials are set.')
```

Foursquare credentials are set.


### Prepare request URL to retrieve Chinese Restaurant information via explore endpoint of FourSquare places APIs
The Foursquare documentation (https://developer.foursquare.com/docs/places-api/) has plenty of information, and a quick web search turns up good samples of how to set up a request. One important piece of information is the category ID for Chinese restaurants, which can be found on the categories page. If this notebook needed to handle many or dynamic requests, this step could be refined to retrieve the category information (https://developer.foursquare.com/docs/build-with-foursquare/categories/) automatically.
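To look up the category ID programmatically instead of copying it from the documentation page, one option is to walk the nested category tree recursively. This sketch assumes each category is a dict with `id`, `name` and an optional list of child `categories`, which is how I understand Foursquare's v2 category listing to be shaped. The function name `find_category_id` and the trimmed sample tree (including its placeholder IDs) are hypothetical, so the snippet runs offline.

```python
def find_category_id(categories, name):
    """Depth-first search of a nested category list; return the id of `name` or None."""
    for category in categories:
        if category.get('name') == name:
            return category['id']
        found = find_category_id(category.get('categories', []), name)
        if found is not None:
            return found
    return None


# hypothetical, heavily trimmed sample of a category tree
sample_tree = [
    {'id': 'food-root', 'name': 'Food', 'categories': [
        {'id': 'asian-root', 'name': 'Asian Restaurant', 'categories': [
            {'id': '4bf58dd8d48988d145941735', 'name': 'Chinese Restaurant', 'categories': []},
        ]},
    ]},
]

print(find_category_id(sample_tree, 'Chinese Restaurant'))  # prints 4bf58dd8d48988d145941735
```

With a real category listing, the same function would be called on the top-level list of categories from the response.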

In [3]:
```python
# The sandbox account is limited to 950 calls per day, so we make one call per city
# and ask for up to 500 results in each call.
limit = 500
# five major US cities to compare
cities = ['New York, NY', 'Chicago, IL', 'San Francisco, CA', 'Jersey City, NJ', 'Boston, MA']
CHINESE_RESTAURANT_CATEGORY_ID = '4bf58dd8d48988d145941735'

# prepare and execute the explore request for each city
results = {}
for city in cities:
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'near': city,
        'limit': limit,
        'section': 'food',
        'categoryId': CHINESE_RESTAURANT_CATEGORY_ID,
    }
    # requests URL-encodes the parameters (spaces and commas in the city name)
    results[city] = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()
```
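The request cell assumes every call succeeded. A small guard like the following, run before parsing, makes failures such as an exhausted quota or bad credentials obvious instead of surfacing later as a `KeyError`. It is a sketch that assumes the v2 response envelope carries a `meta` object with an HTTP-style `code`; the `errorDetail` field and the helper name `check_explore_response` are assumptions, not confirmed API details.

```python
def check_explore_response(city, payload):
    """Raise a readable error if a Foursquare v2 payload does not report success.

    Assumes the envelope looks like {'meta': {'code': ...}, 'response': {...}}.
    """
    meta = payload.get('meta', {})
    code = meta.get('code')
    if code != 200:
        detail = meta.get('errorDetail', 'no detail provided')  # assumed field name
        raise RuntimeError(f'Request for {city!r} failed with code {code}: {detail}')
    return payload['response']


# offline usage with a hand-made payload
ok = {'meta': {'code': 200}, 'response': {'totalResults': 3}}
print(check_explore_response('Boston, MA', ok)['totalResults'])  # prints 3
```

In the notebook, it could be called as `check_explore_response(city, results[city])` inside the request loop.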

In [4]:
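```python
df_venues = {}
for city in cities:
    # flatten the nested JSON items into columns such as 'venue.location.lat'
    venues = pd.json_normalize(results[city]['response']['groups'][0]['items'])
    # keep only the name, address and coordinates, under short column names
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']
```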
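To see what `pd.json_normalize` does to the nested `items`, here is a tiny offline example. The payload and restaurant names are hand-made and hypothetical, with only the fields the notebook reads. A venue without an address simply gets `NaN` in that column.

```python
import pandas as pd

# hand-made imitation of response['groups'][0]['items']
items = [
    {'venue': {'name': 'Golden Dumpling', 'location': {
        'address': '1 Main St', 'lat': 40.71, 'lng': -74.00}}},
    {'venue': {'name': 'Noodle House', 'location': {
        'lat': 40.72, 'lng': -73.99}}},  # no address field
]

# nested keys become dotted column names, e.g. 'venue.location.lat'
venues = pd.json_normalize(items)
df = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
df = df.rename(columns={
    'venue.name': 'Name',
    'venue.location.address': 'Address',
    'venue.location.lat': 'Lat',
    'venue.location.lng': 'Lng',
})
print(df)
```

Using `rename` returns a new DataFrame, which sidesteps any question of whether the column selection is a view or a copy.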

The Foursquare API returns only the nearest 100 venues in each city.

First, let's check their densities visually.

In [5]:
```python
maps = {}

df_city_num = pd.DataFrame(columns=['city name', 'numbers'])
i = 0
for city in cities:
    # center each map on the midpoint of the city's geocoded bounding box
    bounds = results[city]['response']['geocode']['geometry']['bounds']
    city_lat = np.mean([bounds['ne']['lat'], bounds['sw']['lat']])
    city_lng = np.mean([bounds['ne']['lng'], bounds['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add a marker for each returned venue
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
    # totalResults counts all matching venues, not only the ones returned
    num = results[city]['response']['totalResults']
    df_city_num.loc[i] = (city, num)
    i = i + 1

# sort cities by number of Chinese restaurants in descending order
# (sort_values returns a new DataFrame, so assign the result back)
df_city_num = df_city_num.sort_values(by=['numbers'], ascending=False)
df_city_num
```

|   | city name | numbers |
|---|---|---|
| 0 | New York, NY | 237 |
| 2 | San Francisco, CA | 236 |
| 1 | Chicago, IL | 230 |
| 4 | Boston, MA | 192 |
| 3 | Jersey City, NJ | 157 |


In [6]:
```python
print("Chinese Restaurants in " + cities[0])
maps[cities[0]]
```

Chinese Restaurants in New York, NY


In [7]:
```python
print("Chinese Restaurants in " + cities[1])
maps[cities[1]]
```

Chinese Restaurants in Chicago, IL


In [8]:
```python
print("Chinese Restaurants in " + cities[2])
maps[cities[2]]
```

Chinese Restaurants in San Francisco, CA


In [9]:
```python
print("Chinese Restaurants in " + cities[3])
maps[cities[3]]
```

Chinese Restaurants in Jersey City, NJ


In [10]:
```python
print("Chinese Restaurants in " + cities[4])
maps[cities[4]]
```

Chinese Restaurants in Boston, MA


## Results
New York and San Francisco have the most Chinese restaurants of the five cities (by Foursquare's total result count), and the maps show that the restaurants are scattered in different patterns. From the maps alone, it is hard to tell in which city a Chinese restaurant is closest on average, whether measured in distance or in time. So let's define a concrete measure of density using basic statistics: the mean location of the Chinese restaurants in each city. If the restaurants are densely clustered, most of them will be near this mean location; if not, many will be far from it.

### Average distance from the venues to their mean coordinate

In [11]:
```python
maps = {}

df_means = pd.DataFrame(columns=['city name', 'means'])
i = 0
for city in cities:
    # center each map on the midpoint of the city's geocoded bounding box
    bounds = results[city]['response']['geocode']['geometry']['bounds']
    city_lat = np.mean([bounds['ne']['lat'], bounds['sw']['lat']])
    city_lng = np.mean([bounds['ne']['lng'], bounds['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    # mean coordinate (centroid) of the Chinese restaurant venues
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()]
    # add a marker per venue plus a line from the venue to the mean coordinate
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color='green', weight=1.5, opacity=0.5).add_to(maps[city])

    # mark the mean coordinate itself with a larger green circle
    label = folium.Popup('Mean Co-ordinate', parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])
    # mean Euclidean distance (in degrees of lat/lng) from each venue to the mean coordinate
    coords = df_venues[city][['Lat', 'Lng']].values
    city_mean = np.linalg.norm(coords - venues_mean_coor, axis=1).mean()
    df_means.loc[i] = (city, city_mean)
    i = i + 1

# a smaller mean distance means more densely clustered restaurants
df_means.sort_values(by=['means'], ascending=True, inplace=True)
df_means
```

|   | city name | means |
|---|---|---|
| 3 | Jersey City, NJ | 0.015277 |
| 0 | New York, NY | 0.020803 |
| 2 | San Francisco, CA | 0.026842 |
| 4 | Boston, MA | 0.028492 |
| 1 | Chicago, IL | 0.045421 |
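One caveat: the means above are Euclidean distances in raw degrees, and away from the equator a degree of longitude covers less ground than a degree of latitude (at about 40° N, roughly three quarters as much). A swapped-in alternative, not the method used in this notebook, is the haversine great-circle distance, which gives kilometres. The function names and the 6371 km mean Earth radius are assumptions of this sketch.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between points given in decimal degrees (vectorized)."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def mean_haversine_to_centroid(lats, lngs):
    """Mean great-circle distance (km) from each venue to the mean coordinate."""
    lats = np.asarray(lats, dtype=float)
    lngs = np.asarray(lngs, dtype=float)
    return haversine_km(lats, lngs, lats.mean(), lngs.mean()).mean()


# 0.01 degree of latitude is about 1.11 km anywhere on Earth
print(round(float(haversine_km(40.0, -74.0, 40.01, -74.0)), 3))  # prints 1.112
```

Applied per city, for example as `mean_haversine_to_centroid(df_venues[city]['Lat'], df_venues[city]['Lng'])`, this would express the density indicator in kilometres; whether the city ranking changes would have to be checked by rerunning the comparison.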


In [12]:
```python
print("Chinese Restaurants in " + cities[0])
maps[cities[0]]
```

Chinese Restaurants in New York, NY


In [13]:
```python
print("Chinese Restaurants in " + cities[1])
maps[cities[1]]
```

Chinese Restaurants in Chicago, IL


In [14]:
```python
print("Chinese Restaurants in " + cities[2])
maps[cities[2]]
```

Chinese Restaurants in San Francisco, CA


In [15]:
```python
print("Chinese Restaurants in " + cities[3])
maps[cities[3]]
```

Chinese Restaurants in Jersey City, NJ


In [16]:
```python
print("Chinese Restaurants in " + cities[4])
maps[cities[4]]
```

Chinese Restaurants in Boston, MA




### An interesting finding
Even though New York has the most Chinese restaurants, the mean distance shows that the restaurants in Jersey City, NJ are even more densely clustered than those in NYC, which is a very interesting finding. It is hard to say definitively which city is easier. From my personal experience, I would still put New York first, with Jersey City, just across the Hudson River, as a close second. A tourist's best bet would be to book a hotel near the mean coordinate, surrounded by the 100 Chinese restaurants found there!


#### Wishing you a great meal at a good Chinese restaurant!