# Yelp Data Extraction Using Yelp Fusion

Yelp provides a REST API for developers called Yelp Fusion. I query it through Python's Requests library.

tqdm is a small utility library that displays a progress bar, which is useful when running many queries in a loop.

See yelp_extract.py for the Requests implementation and us_states.py for the list of U.S. states and territories that the notebook searches.
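yelp_extract.py is not reproduced in this notebook. As a rough sketch, assuming it wraps Yelp Fusion's Business Search endpoint with Requests, a `category_search(api_key, category, location)` helper could look like the following. The names `build_search`, `MAX_LIMIT`, and the `timeout` parameter are my own illustrative choices, not necessarily what yelp_extract.py uses.

```python
import requests

# Yelp Fusion Business Search endpoint
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"
MAX_LIMIT = 50  # the endpoint returns at most 50 businesses per request


def build_search(api_key, category, location, limit=MAX_LIMIT):
    """Return the headers and query parameters for one Business Search call."""
    # Yelp Fusion authenticates with a bearer token in the Authorization header
    headers = {"Authorization": f"Bearer {api_key}"}
    params = {
        "categories": category,
        "location": location,
        "limit": min(limit, MAX_LIMIT),  # clamp to the per-request maximum
    }
    return headers, params


def category_search(api_key, category, location, limit=MAX_LIMIT, timeout=10):
    """Hypothetical stand-in for yelp_extract.category_search."""
    headers, params = build_search(api_key, category, location, limit)
    response = requests.get(SEARCH_URL, headers=headers, params=params, timeout=timeout)
    # Yelp's error responses carry an "error" object instead of a
    # "businesses" list, which the notebook's try/except guards against.
    return response.json()
```

Splitting out `build_search` keeps the request construction testable without a network call or a real API key.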

## Notebook Purpose:

To extract Yelp data on coffee shops (ratings, review counts, price, and location) from every U.S. state using Yelp Fusion.

```python
import csv
import os

from tqdm import tqdm

import us_states
import yelp_extract

# Read the Yelp Fusion API key from an environment variable instead of
# hard-coding it in the notebook (the name YELP_API_KEY is my own choice).
API_KEY = os.environ['YELP_API_KEY']
```


For this project, I collected up to 50 results per state by searching the Yelp category `coffeeshops`. A single search request returns at most 50 results, and searching one state at a time should spread the data points more evenly across the U.S. than a single nationwide search.

```python
data = []

# Search for up to 50 coffee shops in each of the 57 locations in
# us_states.states (the states plus some territories)
for location in tqdm(us_states.states.values()):
    results = yelp_extract.category_search(API_KEY, 'coffeeshops', location)
    try:
        data += results['businesses'][:50]
    except (KeyError, TypeError):
        # No 'businesses' list, e.g. the API returned an error object instead
        print(f"Error searching {location}")
```

```
100%|██████████| 57/57 [00:35<00:00,  1.59it/s]
```
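Separate state searches are not guaranteed to be disjoint: a shop near a state line, or an ambiguous location string, could appear in two result sets. If that matters for the analysis, here is a minimal sketch of dropping repeats by each business's `id` field (every search result carries one). `dedupe_by_id` is a hypothetical helper, not part of yelp_extract.py.

```python
def dedupe_by_id(businesses):
    """Keep the first occurrence of each business, keyed on Yelp's 'id' field."""
    seen = set()
    unique = []
    for business in businesses:
        if business['id'] not in seen:
            seen.add(business['id'])
            unique.append(business)
    return unique


sample = [{'id': 'a'}, {'id': 'b'}, {'id': 'a'}]
print(len(dedupe_by_id(sample)))  # 2
```

Applied to the notebook, this would be `data = dedupe_by_id(data)` right after the search loop.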


```python
# Example showing data format
print(len(data))
data[0]
```

```
2693
```

```
{'id': '1c52QZlCSxaJFrg1cnkrKA',
 'alias': 'mooses-tooth-anchorage',
 'name': "Moose's Tooth",
 'image_url': 'https://s3-media4.fl.yelpcdn.com/bphoto/aKnYwUXTcUOTCPCclkzJng/o.jpg',
 'is_closed': False,
 'url': 'https://www.yelp.com/biz/mooses-tooth-anchorage?adjust_creative=Uf814TLeT57jhSsmGb9P_A&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=Uf814TLeT57jhSsmGb9P_A',
 'review_count': 2281,
 'categories': [{'alias': 'pizza', 'title': 'Pizza'},
  {'alias': 'tradamerican', 'title': 'American (Traditional)'},
  {'alias': 'beerbar', 'title': 'Beer Bar'}],
 'rating': 4.5,
 'coordinates': {'latitude': 61.1905345745827, 'longitude': -149.86878140741},
 'transactions': ['delivery'],
 'price': '$$',
 'location': {'address1': '3300 Old Seward Hwy',
  'address2': '',
  'address3': '',
  'city': 'Anchorage',
  'zip_code': '99503',
  'country': 'US',
  'state': 'AK',
  'display_address': ['3300 Old Seward Hwy', 'Anchorage, AK 99503']},
 'phone': '+19072582537',
 'display_phone
```

## Wrangling Data:
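I cleaned the data by copying the fields I needed into a new list of flat records, one per business. I dropped the remaining fields because they were either unimportant for this analysis or multi-valued (such as `categories` and `transactions`), which makes them awkward to write to a CSV. I also kept only U.S. businesses whose coordinates fall inside a rough bounding box (latitude 20 to 50, longitude -150 to -50). This keeps the contiguous states and drops results from Alaska, Hawaii, and the overseas territories.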
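If one of the multi-valued fields is needed later, it can still fit in a single CSV cell by joining its values. Below is a minimal sketch for `categories`, assuming the list-of-dicts shape (`alias`/`title` keys) shown in the sample record above. The `;` separator and the `join_categories` name are my own choices.

```python
def join_categories(business, sep=';'):
    """Collapse a business's category list into one string for a CSV cell."""
    return sep.join(category['alias'] for category in business.get('categories', []))


business = {
    'name': "Moose's Tooth",
    'categories': [{'alias': 'pizza', 'title': 'Pizza'},
                   {'alias': 'tradamerican', 'title': 'American (Traditional)'},
                   {'alias': 'beerbar', 'title': 'Beer Bar'}],
}
print(join_categories(business))  # pizza;tradamerican;beerbar
```

Splitting the cell on `;` recovers the list when the CSV is read back.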

```python
# Cleaning data and adding it to a new table

csv_data = []

for business in data:
    coordinates = business['coordinates']
    data_point = {
        'name': business['name'],
        'review_count': business['review_count'],
        'rating': business['rating'],
        'longitude': coordinates['longitude'],
        'latitude': coordinates['latitude'],
        'country': business['location']['country'],
        'state': business['location']['state'],
    }

    # Yelp reports price as '$' to '$$$$'; store the number of dollar signs.
    # Some businesses have no price, so fall back to 'N/A'.
    price = business.get('price')
    data_point['price'] = len(price) if price else 'N/A'

    # Keep U.S. businesses with known coordinates inside the bounding box
    lat, lon = data_point['latitude'], data_point['longitude']
    if (data_point['country'] == 'US'
            and lat is not None and lon is not None
            and 20 <= lat <= 50 and -150 <= lon <= -50):
        csv_data.append(data_point)

len(csv_data)
```

```
2408
```
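As a quick check on the bounding-box filter, the sample record shown earlier (Moose's Tooth in Anchorage, latitude about 61.19) falls outside it. The sketch below pulls the condition into a hypothetical helper, `in_bounding_box`, using the same bounds as the loop. The Denver coordinates are approximate and only for illustration.

```python
def in_bounding_box(lat, lon, lat_range=(20, 50), lon_range=(-150, -50)):
    """Return True if (lat, lon) lies inside the notebook's rough U.S. box."""
    if lat is None or lon is None:
        return False  # businesses without coordinates are excluded
    return lat_range[0] <= lat <= lat_range[1] and lon_range[0] <= lon <= lon_range[1]


print(in_bounding_box(61.1905345745827, -149.86878140741))  # False: Anchorage, AK
print(in_bounding_box(39.7392, -104.9903))                  # True: Denver, CO
```

Anchorage fails on latitude alone, and Hawaii (longitude around -155 to -160) fails on longitude.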

```python
# Writing data to coffeeshops.csv in the current working directory

keys = csv_data[0].keys()
with open('coffeeshops.csv', 'w', newline='', encoding='utf-8') as output_file:
    dict_writer = csv.DictWriter(output_file, fieldnames=keys)
    dict_writer.writeheader()
    dict_writer.writerows(csv_data)
```
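CSV stores every value as text, so numeric columns such as `rating` come back as strings, and the `price` column mixes numbers with 'N/A'. A small in-memory round trip with made-up records (not rows from coffeeshops.csv) illustrates this:

```python
import csv
import io

# Write two made-up records the same way the notebook does, but into memory
records = [
    {'name': 'Cafe A', 'rating': 4.5, 'price': 2},
    {'name': 'Cafe B', 'rating': 4.0, 'price': 'N/A'},
]
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=records[0].keys())
writer.writeheader()
writer.writerows(records)

# Read them back: every value is now a string
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]['rating'], type(rows[0]['rating']).__name__)  # 4.5 str
```

When loading coffeeshops.csv for analysis, convert `rating`, `review_count`, and `price` explicitly, treating 'N/A' as missing.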