# Which Michelin Guide Restaurants Are Participating in Restaurant Week?
We can answer this question with some web scraping :-)

## First, get the list of restaurant week restaurants

We can fetch all the restaurant data from the nice API at https://service.nycgo.com/, which returns it as JSONP (JSON wrapped in a callback function).

In [1]:
import requests, json
url = 'https://service.nycgo.com/nycgo/v2/body-grid-blocks?entryId=411&gridId=restaurant-week&randomizeFirst=true&callback=ng_jsonp_callback_1'
resp = requests.get(url).content.decode('utf-8')

# The response wraps the JSON object in a JSONP callback, leaving some extra
# characters at the beginning and end, hence the hacky [24:-2] indexing
restaurant_data = json.loads(resp[24:-2])['data'][0]['gridItems']
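
The `[24:-2]` offsets depend on the exact length of the callback wrapper. A minimal sketch of a sturdier alternative, assuming the body has the usual JSONP shape (an optional `/**/` prefix, a callback name, then the JSON in parentheses with an optional trailing `;`), is to match the wrapper with a regex instead of hard-coding offsets. `strip_jsonp` is a hypothetical helper, and the sample string is made up rather than copied from the API:

```python
import json
import re

# Assumed JSONP shape: optional "/**/", a callback name, "(<json>)", optional ";".
_JSONP_RE = re.compile(r'^\s*(?:/\*\*/)?\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def strip_jsonp(text):
    """Return the parsed JSON payload inside a JSONP callback wrapper."""
    match = _JSONP_RE.match(text)
    if match is None:
        # No callback wrapper found: assume the body is already plain JSON.
        return json.loads(text)
    return json.loads(match.group(1))


# Usage on a made-up response body (not real API output):
sample = '/**/ng_jsonp_callback_1({"data": [{"gridItems": [{"displayTitle": "Oso"}]}]});'
print(strip_jsonp(sample)['data'][0]['gridItems'][0]['displayTitle'])  # Oso
```

With the real response, that would be `strip_jsonp(resp)['data'][0]['gridItems']`, and it keeps working if the callback name changes length.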

In [2]:
restaurant_week_names = [rdata['displayTitle'] for rdata in restaurant_data]

# Sanity check: should be 662 as of 7/28/2022
print(len(restaurant_week_names))

662


## Second, get the names and ratings of NYC restaurants on the Michelin Guide
There doesn't seem to be an easy API for the Michelin Guide like the one above, so we resort to scraping its pages with Selenium.

In [3]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

In [4]:
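from selenium.webdriver.chrome.service import Service

# Initialize a headless browser
chrome_options = Options()
chrome_options.add_argument("--headless")
# Selenium 4 takes the driver path through a Service object;
# passing it positionally is deprecated
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)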

In [5]:
import time
from selenium.common.exceptions import StaleElementReferenceException

NYC_MICHELIN_PAGES = 25
NAME_CLASS = 'card__menu-content--title'     # restaurant name
RATING_CLASS = 'card__menu-content--rating'  # restaurant rating

michelin_names = []
michelin_ratings = []
for page_num in range(1, NYC_MICHELIN_PAGES + 1):
    print(f'on page {page_num} / {NYC_MICHELIN_PAGES}')
    driver.get(f'https://guide.michelin.com/us/en/new-york-state/new-york/restaurants/page/{page_num}')

    # There are two failure modes (that I've found) here as a result of asynchronous page loading:
    #
    # 1. The cards get replaced asynchronously after we load the page, so reading
    #    .text raises a StaleElementReferenceException. We catch it and re-query.
    #
    # 2. The cards are present, but their text is still empty. We poll until the
    #    first name is non-empty (an empty list also counts as "not ready").
    #
    # Reading .text inside the try block means a stale element at any point
    # triggers a fresh query instead of crashing the loop.
    while True:
        try:
            names = [card.text for card in driver.find_elements(By.CLASS_NAME, NAME_CLASS)]
            ratings = [card.text for card in driver.find_elements(By.CLASS_NAME, RATING_CLASS)]
            if names and names[0] != '':
                break
        except StaleElementReferenceException:
            pass
        time.sleep(0.25)  # brief pause between polls

    for restaurant_name, restaurant_rating in zip(names, ratings):
        rating = ':P' if '=' in restaurant_rating else ''  # '=' denotes Bib Gourmand
        rating += '*' * restaurant_rating.count('m')       # each 'm' denotes a single Michelin star
        michelin_names.append(restaurant_name)
        michelin_ratings.append(rating)

on page 1 / 25
on page 2 / 25
on page 3 / 25
on page 4 / 25
on page 5 / 25
on page 6 / 25
on page 7 / 25
on page 8 / 25
on page 9 / 25
on page 10 / 25
on page 11 / 25
on page 12 / 25
on page 13 / 25
on page 14 / 25
on page 15 / 25
on page 16 / 25
on page 17 / 25
on page 18 / 25
on page 19 / 25
on page 20 / 25
on page 21 / 25
on page 22 / 25
on page 23 / 25
on page 24 / 25
on page 25 / 25
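
The scraping loop handles both failure modes by hand. Selenium also ships `WebDriverWait` (in `selenium.webdriver.support.ui`), which accepts an `ignored_exceptions` argument for exactly this kind of retry. To show the same poll-and-retry pattern without a browser, here is a minimal, library-free sketch; `wait_for`, `FlakyError`, and `flaky_fetch` are hypothetical, and the commented-out call only suggests how it might plug into the loop:

```python
import time


def wait_for(fetch, is_ready, retry_on=(), timeout=10.0, poll=0.25):
    """Call fetch() until is_ready(result) is true, then return the result.

    Exceptions listed in retry_on (for example, Selenium's
    StaleElementReferenceException) count as "not ready yet". Raises
    TimeoutError if the condition is not met within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = fetch()
            if is_ready(result):
                return result
        except retry_on:
            pass  # transient failure: fall through and try again
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} s')
        time.sleep(poll)


# Self-contained demo: a fake fetch that fails twice (failure mode 1),
# then returns an empty name (failure mode 2), then real data.
class FlakyError(Exception):
    pass


attempts = {'n': 0}


def flaky_fetch():
    attempts['n'] += 1
    if attempts['n'] <= 2:
        raise FlakyError('element went stale')
    if attempts['n'] == 3:
        return ['']
    return ['Oso']


result = wait_for(flaky_fetch, lambda names: bool(names) and names[0] != '',
                  retry_on=(FlakyError,), poll=0.01)
print(result, attempts['n'])  # ['Oso'] 4

# Hypothetical use inside the scraping loop (not run here):
# names = wait_for(
#     lambda: [c.text for c in driver.find_elements(By.CLASS_NAME, 'card__menu-content--title')],
#     lambda names: bool(names) and names[0] != '',
#     retry_on=(StaleElementReferenceException,),
# )
```

Unlike a bare `while` loop, the timeout turns a page that never loads into an error instead of a hang.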


In [6]:
# Sanity check: should be 482 as of 7/28/2022
print(len(michelin_names))

482
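
The rating decoding inside the scraping loop can be pulled out into a pure function so it can be checked without launching a browser. This sketch assumes, as the loop does, that the card's rating text uses `=` for a Bib Gourmand and one `m` per star; `parse_rating` and the sample strings are illustrative, not real scraped card text:

```python
def parse_rating(icon_text):
    """Convert a Michelin card's rating text into this notebook's notation.

    Assumes, as the scraping loop does, that '=' marks a Bib Gourmand
    (written ':P') and each 'm' is one Michelin star (written '*').
    """
    bib = ':P' if '=' in icon_text else ''
    return bib + '*' * icon_text.count('m')


# Illustrative inputs, not scraped card text:
print(parse_rating('mm'))      # **
print(parse_rating('='))       # :P
print(repr(parse_rating('')))  # ''
```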


## Finally, get the intersection of our two sets
We could definitely do this faster than the nested for loop below, which is O(n·m) (482 Michelin names × 662 Restaurant Week names, about 319,000 comparisons), but we're not dealing with that much data, so it's no big deal :)

In [7]:
from nltk.metrics.distance import jaccard_distance

# Compare names as sets of characters: a Jaccard distance below 0.2 counts as a match.
# This is an imperfect method (it ignores character order and is case-sensitive),
# but it is good enough in practice.
rw_char_sets = [set(rw_name) for rw_name in restaurant_week_names]

set_of_rw_and_michelin_restaurants = set()
for m_name, m_rating in zip(michelin_names, michelin_ratings):
    m_set = set(m_name)
    for rw_set in rw_char_sets:
        if jaccard_distance(m_set, rw_set) < .2:
            set_of_rw_and_michelin_restaurants.add(f'{m_name} {m_rating}')
            break  # one match is enough for this Michelin restaurant

print(len(set_of_rw_and_michelin_restaurants))

49
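
To see why character-set matching is "imperfect", here is a pure-Python version of the same distance: one minus the size of the intersection over the size of the union of the two character sets (nltk's `jaccard_distance` computes the same quantity for non-empty sets). `char_jaccard_distance` is a hypothetical helper and the inputs are illustrative only:

```python
def char_jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|, where A and B are the sets of characters in a and b."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty strings: treat as identical
    return 1 - len(set_a & set_b) / len(union)


# Illustrative inputs:
print(char_jaccard_distance('Lore', 'Lore'))              # 0.0
print(char_jaccard_distance('lore', 'role'))              # 0.0 (order is ignored)
print(char_jaccard_distance('Baar Baar', 'Bar Bar'))      # 0.0 (repetition is ignored)
print(round(char_jaccard_distance('Danji', 'danji'), 3))  # 0.333 (case matters)
```

So a casing difference alone can push a pair past the 0.2 threshold, while unrelated names built from the same letters can slip under it.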


## Print the list in the notebook
Obviously, you could also export this to a file if you'd like.

Notably, at least Gramercy Tavern and The Modern are missing here due to naming differences across the websites.
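
One hedged way to reduce misses like these, and to drop the nested loop at the same time, is to normalize names (lowercase, strip accents and punctuation, drop a leading "The") and look them up in a set, which is O(n + m). This is a sketch, not what the notebook does: `normalize_name` and `match_names` are hypothetical helpers, exact matching after normalization gives up the fuzzy tolerance of the Jaccard threshold, and it will not catch every naming difference. The lists are made-up inputs:

```python
import re
import unicodedata


def normalize_name(name):
    """Hypothetical normalization: lowercase, strip accents and punctuation,
    collapse whitespace, and drop a leading 'the'."""
    decomposed = unicodedata.normalize('NFKD', name)
    no_accents = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r'[^\w\s]', '', no_accents.lower())
    words = no_punct.split()
    if words and words[0] == 'the':
        words = words[1:]
    return ' '.join(words)


def match_names(michelin, restaurant_week):
    """Return the Michelin names whose normalized form appears in restaurant_week.

    The set of normalized Restaurant Week names is built once, so each
    Michelin name costs one set lookup instead of a pass over every name.
    """
    rw_keys = {normalize_name(name) for name in restaurant_week}
    return [name for name in michelin if normalize_name(name) in rw_keys]


# Made-up inputs for illustration (not scraped data):
michelin = ['Bâtard', 'The Fulton', 'Foragers Table', 'Some Other Place']
restaurant_week = ['Batard', 'Fulton', "Forager's Table"]
print(match_names(michelin, restaurant_week))
# ['Bâtard', 'The Fulton', 'Foragers Table']
```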

In [8]:
print('\n'.join(set_of_rw_and_michelin_restaurants))

Periyali 
Il Fiorista 
Schilling 
Tanoreen :P
The Leopard at Des Artistes 
Pastis 
Peasant 
Bar Tulix 
Veranda 
JoJo 
Bâtard *
Baar Baar 
Oso :P
Kubeh :P
Oceans 
Gentle Perch :P
Wau 
Eléa 
Vestry *
Gage & Tollner 
Ci Siamo 
Hearth 
Wayan 
Union Square Cafe 
Noreetuh 
Empellón 
Orsay 
Junoon 
Pylos 
Aburiya Kinnosuke 
Maya 
Soba Totto 
232 Bleecker 
Il Cortile 
Huertas 
Bar Primi :P
Golden Unicorn 
Danji 
Foragers Table 
Dagon 
Kyma 
The Fulton 
Lore 
HanGawi :P
Carne Mare 
Khe-Yo :P
Boulud Sud 
Barbetta 
Portale 
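
Exporting the list to a file could look like the sketch below, which writes a two-column CSV sorted by name. It assumes each entry has the `f'{m_name} {m_rating}'` form built earlier, so splitting on the last space separates the name from a possibly empty rating; `export_matches` and the file name are hypothetical:

```python
import csv
import os
import tempfile


def export_matches(matches, path):
    """Write 'name rating' entries to a CSV with name and rating columns, sorted by name."""
    # rsplit on the last space: the rating (':P', '*', or '') never contains a space.
    rows = sorted(entry.rsplit(' ', 1) for entry in matches)
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'rating'])
        writer.writerows(rows)


# Demo on a few entries in the same format, written to a temporary directory:
sample = {'Vestry *', 'Periyali ', 'Tanoreen :P'}
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'rw_michelin.csv')
    export_matches(sample, out_path)
    with open(out_path, newline='', encoding='utf-8') as f:
        rows = list(csv.reader(f))
print(rows)
# [['name', 'rating'], ['Periyali', ''], ['Tanoreen', ':P'], ['Vestry', '*']]
```

With the real data, that would be `export_matches(set_of_rw_and_michelin_restaurants, 'rw_michelin.csv')`.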
