# Yelp API - Lab


## Part 1 - Set up

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from utilities import *

In [3]:
import sys
print(sys.version)

3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:07:29) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]


In [4]:
# Must be mysql-connector-python 8.0.17
!conda list mysql

# packages in environment at /Users/rcharan/anaconda3/envs/learn-env:
#
# Name                    Version                   Build  Channel
mysql-connector-c         6.1.11            had4e77e_1002    conda-forge
mysql-connector-python    8.0.17           py36h7d2c6da_0    conda-forge


### Define a new schema, yelp

Only run this once (aka never)

In [5]:
# create_schema = '''CREATE SCHEMA yelp;'''
# cnx, cursor = get_cursor()
# cursor.execute(create_schema)
# cnx.close()

## Part 2 - Get Business Data

Safe to run multiple times.
Change business_reset to True to drop the table and re-pull the data from the API.


In [6]:
business_reset = False

Drop and rebuild the table (only when business_reset is True)

In [7]:
business_table = '''
    CREATE TABLE IF NOT EXISTS yelp.business (
          table_id            int
                              NOT NULL
                              UNIQUE
                              AUTO_INCREMENT
                              PRIMARY KEY                  ,

          yelp_id             varchar(200)                 ,
          alias               varchar(200)                 ,
          name                varchar(200)                 ,
          review_count        int                          ,
          rating              int                          ,
          location_address1   varchar(200)                 ,
          location_address2   varchar(200)                 ,
          location_address3   varchar(200)                 ,
          location_city       varchar(200)                 ,
          location_zip_code   varchar( 10)                 ,
          location_country    varchar(200)                 ,
          location_state      varchar(200)                 ,
          price               float
    )  ENGINE=INNODB;
'''

if business_reset:
    cnx, cursor = get_cursor()
    try:
        cursor.execute('''DROP TABLE IF EXISTS yelp.business;''')
        cursor.execute(business_table)
    except Exception:
        print('Warning: failure to reset table')
        raise
    finally:
        # Close the connection whether or not the reset succeeded
        cnx.close()
else:
    print('''Warning: this didn't do anything''')




Get the data from the API

In [1]:
# URL Parameter Builder
def package_business_request(term = 'mediterranean', 
                             location = 'chelsea, NY',
                             limit = 50,
                             offset = 0):
    return {'term'     : term,
            'location' : location,
            'limit'    : limit,
            'offset'   : offset}
business_search_params_builder = lambda offset : package_business_request(offset = offset)

# Yelp API Endpoint for Business Search
businesses_url = 'https://api.yelp.com/v3/businesses/search'
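Each request returns at most `limit` businesses, so collecting every result means stepping `offset` forward by `limit` on each call. A minimal sketch of that parameter sequence follows; `paged_business_params` is a hypothetical helper (not part of `utilities`), and the total of 120 is a made-up number used only to show the stepping. In practice the total would come from the API response.

```python
def paged_business_params(total, limit=50, term='mediterranean',
                          location='chelsea, NY'):
    """Yield one parameter dict per page needed to cover `total` results.

    Hypothetical helper for illustration; not part of utilities.
    """
    for offset in range(0, total, limit):
        yield {'term'     : term,
               'location' : location,
               'limit'    : limit,
               'offset'   : offset}

# 120 is a made-up total used only to show how the offsets advance.
pages = list(paged_business_params(total=120))
print([p['offset'] for p in pages])  # [0, 50, 100]
```

Each dict has the same shape as the output of `package_business_request`, so it could be passed as the query parameters of a request to `businesses_url`.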

In [9]:
# Repeatedly call the API. On failure, error_return and error_df are returned for traceback
if business_reset:
    error_return = None
    error_return = all_results(businesses_url, business_search_params_builder,
                               parse_business_response, config.api_key,
                               yelp_call, 'business', 'yelp', offset = 0, 
                               recovery = error_return)

    if isinstance(error_return, tuple):
        error_df     = error_return[1]
        error_return = error_return[0]
else:
    print('''Warning: this didn't do anything''')



## Part 3: Get Reviews Data

In [10]:
yelp_reset = False

In [11]:
review_table = '''
    CREATE TABLE IF NOT EXISTS yelp.reviews (
          table_id            int
                              NOT NULL
                              UNIQUE
                              AUTO_INCREMENT
                              PRIMARY KEY                  ,
          business_id         int                          ,
          review_yelp_id      varchar(200)                 ,
          rating              varchar(200)                 ,
          text                varchar(200)                 ,
          time_created        datetime                     ,
          FOREIGN KEY (business_id) 
            REFERENCES yelp.business(table_id)
            ON DELETE CASCADE
            ON UPDATE CASCADE
    )  ENGINE=INNODB;
'''

if yelp_reset:
    cnx, cursor = get_cursor()
    try:
        cursor.execute('''DROP TABLE IF EXISTS yelp.reviews;''')
        cursor.execute(review_table)
        print('yelp.reviews reset')
    except Exception:
        print('Warning: failure to reset table')
        raise
    finally:
        # Close the connection whether or not the reset succeeded
        cnx.close()
else:
    print('''Warning: this didn't do anything''')




In [14]:
get_reviews()

271 businesses detected
271 businesses with reviews already detected
Collecting Yelp reviews for any remaining businesses.
This could take a while if there are many remaining


## Part 4: Write SQL queries that will answer the questions posed. 

In [27]:
conn = get_connection()
query = querier_maker(conn)

- Which are the 5 most reviewed businesses?

In [17]:
query('''
    SELECT
      *
    FROM
      yelp.business
    ORDER BY
      review_count DESC
    LIMIT
      5
''')

Unnamed: 0,table_id,yelp_id,alias,name,review_count,rating,location_address1,location_address2,location_address3,location_city,location_zip_code,location_country,location_state,price
0,124,xEnNFXtMLDF5kZDxfaCJgA,the-halal-guys-new-york-2,The Halal Guys,9270,4,W 53rd St 6th Ave,,,New York,10019,US,NY,1.0
1,132,L-IuiVoFMDSw2K6OAciP1g,mamouns-falafel-new-york-2,Mamoun's Falafel,2353,4,119 MacDougal St,,,New York,10012,US,NY,1.0
2,3,B55Ocx5RBWxo6AGSucYSIA,ilili-new-york-2,ilili,2337,4,236 5th Ave,,,New York,10001,US,NY,3.0
3,214,bRPq-Nmct5bOuOtsu8fC7Q,smac-new-york,S'MAC,2156,4,197 1st Ave,,,New York,10003,US,NY,2.0
4,43,XEugUtbw4rRmGr9S1XA-aQ,alta-new-york,Alta,1857,4,64 W 10th St,,,New York,10011,US,NY,3.0


- What is the highest rating received in your data set and how many businesses have that rating?

In [23]:
query('''
    SELECT
      MAX(rating) as max_rating,
      COUNT(*)    as count_max_rating
    FROM
        (SELECT
          *,
          RANK() OVER (ORDER BY rating DESC) AS rating_rank
        FROM
          yelp.business) ranked
    WHERE
      rating_rank = 1
''')

Unnamed: 0,max_rating,count_max_rating
0,5,42
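The same answer can also be reached without a window function, by comparing each row against a scalar subquery for the maximum. The sketch below is only an illustration: it runs against an in-memory SQLite table with made-up rows standing in for yelp.business, so the names `conn_demo` and the toy data are assumptions. Against the real database the table would be `yelp.business`.

```python
import sqlite3

# Toy in-memory table with made-up rows; stands in for yelp.business.
conn_demo = sqlite3.connect(':memory:')
conn_demo.execute('CREATE TABLE business (name TEXT, rating INTEGER)')
conn_demo.executemany('INSERT INTO business VALUES (?, ?)',
                      [('a', 5), ('b', 4), ('c', 5), ('d', 3)])

# Keep only rows whose rating equals the overall maximum, then count them.
max_rating, count_max_rating = conn_demo.execute('''
    SELECT
      rating   AS max_rating,
      COUNT(*) AS count_max_rating
    FROM
      business
    WHERE
      rating = (SELECT MAX(rating) FROM business)
    GROUP BY
      rating
''').fetchone()
conn_demo.close()

print(max_rating, count_max_rating)  # 5 2
```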


- What percentage of businesses have a rating greater than or equal to 4.5?

- What percentage of businesses have a rating less than 3?
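One possible way to answer the two percentage questions above is conditional aggregation: count the rows that meet each condition and divide by the total row count. This is a sketch under stated assumptions, not the lab's reference answer. It uses an in-memory SQLite table with made-up rows, and it stores `rating` as REAL so that half-star values are kept exactly, whereas the yelp.business table above declares `rating` as int. Against the real database the table would be `yelp.business`.

```python
import sqlite3

# Toy in-memory table with made-up rows; stands in for yelp.business.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE business (name TEXT, rating REAL)')
demo.executemany('INSERT INTO business VALUES (?, ?)',
                 [('a', 5.0), ('b', 4.5), ('c', 4.0), ('d', 2.5)])

# Each CASE yields 1 for a matching row, so SUM counts the matches.
pct_high, pct_low = demo.execute('''
    SELECT
      100.0 * SUM(CASE WHEN rating >= 4.5 THEN 1 ELSE 0 END) / COUNT(*)
        AS pct_rating_at_least_4_5,
      100.0 * SUM(CASE WHEN rating <  3   THEN 1 ELSE 0 END) / COUNT(*)
        AS pct_rating_below_3
    FROM
      business
''').fetchone()
demo.close()

print(pct_high, pct_low)  # 50.0 25.0
```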

- What is the average rating of restaurants that have a price label of one dollar sign? Two dollar signs? Three dollar signs? 

In [33]:
query('''
SELECT
  price,
  AVG(rating) AS average_rating_by_price
FROM
  yelp.business
GROUP BY
  price
ORDER BY price
''')

Unnamed: 0,price,average_rating_by_price
0,,4.0851
1,1.0,3.9608
2,2.0,3.8406
3,3.0,4.0882
4,4.0,4.0


- Return the text of the reviews for the most reviewed restaurant. 

In [46]:
review_texts = query('''
SELECT name, review_count, text, rating, time_created FROM
(
(SELECT name, table_id, review_count FROM yelp.business ORDER BY review_count DESC LIMIT 1) biz
LEFT JOIN
(SELECT * FROM yelp.reviews) rev
ON
biz.table_id = rev.business_id
)''')

In [49]:
review_texts.text.tolist()

["What more needs to be said?  This place is fan-fucking-tastic.  I've eaten all over New York City, yet no meal was nearly as satisfying as when I waited in...",
 "I'm so annoyed and disappointed. Make sure you use PLENTY of white sauce because you will need something to make this bland food edible. Where's the flavor?...",
 'As someone who loves Halal carts, on my most recent trip to NYC I knew I had to make the pilgrimage to the original location: The Halal Guys.  One night, we...']

- Return the name of the business with the most recent review. 

In [53]:
query('''
SELECT name FROM
(
(SELECT * FROM yelp.reviews ORDER BY time_created DESC LIMIT 1) top_rev
LEFT JOIN
(SELECT * FROM yelp.business)  biz
ON
top_rev.business_id = biz.table_id
)
''')

Unnamed: 0,name
0,Kwik Meal


- Find the highest rated business and return the text of its most recent review. If multiple businesses have the same rating, select the restaurant with the most reviews.

In [58]:
query('''
SELECT name, review_count, text, time_created FROM
(
(SELECT * FROM yelp.business ORDER BY rating DESC, review_count DESC LIMIT 1) biz
LEFT JOIN
(SELECT * FROM yelp.reviews) rev
ON
biz.table_id = rev.business_id
)
ORDER BY time_created DESC
LIMIT 1
''')

Unnamed: 0,name,review_count,text,time_created
0,Taim West Village,1507,I came here for dinner with a friend as we wan...,2019-09-23 07:55:12


- Find the lowest rated business and return the text of its most recent review. If multiple businesses have the same rating, select the restaurant with the fewest reviews.

In [59]:
query('''
SELECT name, review_count, text, time_created FROM
(
(SELECT * FROM yelp.business ORDER BY rating ASC, review_count ASC LIMIT 1) biz
LEFT JOIN
(SELECT * FROM yelp.reviews) rev
ON
biz.table_id = rev.business_id
)
ORDER BY time_created DESC
LIMIT 1
''')

Unnamed: 0,name,review_count,text,time_created
0,Ora Restaurant,1,I can't recall how I found or why I booked Ora...,2006-10-13 21:04:39


### Using DB:
    
For this lab, you can either store the data on one DB or put it on both partners' DBs. If you decide to use one DB, make sure both partners have access to it. To do this, add a user to your DB.

[how to add a new user](https://howchoo.com/g/mtm3zdq2nzv/how-to-add-a-mysql-user-and-grant-privileges)
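
As a rough sketch of what adding a user involves, the statements below create an account and grant it read/write privileges on the yelp schema. The account name `partner`, the `%` host wildcard, and the password are placeholders chosen for illustration; pick values that suit your setup, and run the statements from an account that is allowed to create users.

```python
# Hypothetical account name, host, and password; replace before use.
partner_user = "'partner'@'%'"

add_user_statements = [
    f"CREATE USER {partner_user} IDENTIFIED BY 'change-me';",
    f"GRANT SELECT, INSERT, UPDATE, DELETE ON yelp.* TO {partner_user};",
]

for statement in add_user_statements:
    print(statement)

# To apply them, run each statement through a cursor, for example:
# cnx, cursor = get_cursor()
# for statement in add_user_statements:
#     cursor.execute(statement)
# cnx.close()
```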