# Module 1 Project 


Flatiron School Data Science Immersive, January 28 Cohort


This project asks us to collect data through API calls or web scraping, put it in tabular format, and use visualizations and exploratory analysis to tell a story about the data.

Karen Lin and Derrick Lewis collaborated on this project to collect data on vegan and steak restaurants in 10 cities in the United States. The goal was to compare these nearly opposite cuisines. We will look at restaurant density, price disparity, and local sentiment as expressed through ratings.

To begin, we will import data from the Yelp API.

This process requires several Python libraries. The first, "requests", gives us the "get" function, which combines the URL, parameters, and identification key into our request. Next is pandas, which we use to create DataFrames that organize the data. We'll use "math" for "ceil", a version of rounding that always rounds up to the nearest integer. Lastly, we import "os" to pull our API key from the environment variables set in our profile instead of saving it in this code.

In [1]:
import requests
import pandas as pd
import math
import os

## The Function to pull from the API

Next we will build our request function. The Yelp API limits each request to 50 businesses. Because most urban areas have more restaurants than that, we need a loop that makes multiple requests, each starting at a new offset in the list of results.
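As a small illustration of the paging arithmetic, the sketch below computes the starting offset of every request needed to cover a given number of results. The helper name `page_offsets` is ours for illustration and is not part of the project code.

```python
import math


def page_offsets(total, page_size=50):
    """Return the starting offset of each page needed to cover `total` results."""
    # Round up so that a partially filled last page still gets its own request.
    pages = math.ceil(total / page_size)
    return [page * page_size for page in range(pages)]


print(page_offsets(87))  # [0, 50]
```

With 87 results, two requests are needed: one starting at offset 0 and one starting at offset 50.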

### Criteria

In some early tests directly on the Yelp website, we found more distinct results from a category search than from searching for a term across the entire data set. A category search returns only restaurants tagged with that category.
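To make the difference concrete, here is a minimal sketch of how the two kinds of search could be expressed as request parameters. The helper `search_params` is hypothetical; the parameter names (`term`, `categories`, `latitude`, `longitude`, `radius`, `limit`, `offset`) are the ones used in the request function in this notebook.

```python
def search_params(lat, lon, category=None, term=None,
                  radius=40000, limit=50, offset=0):
    """Build search parameters for a category search, a term search, or both."""
    params = {
        'latitude': lat,
        'longitude': lon,
        'radius': radius,   # meters
        'limit': limit,
        'offset': offset,
    }
    if category is not None:
        params['categories'] = category  # match on the restaurant's category tags
    if term is not None:
        params['term'] = term            # free-text search
    return params


by_category = search_params(41.905, -87.678, category='steak')
by_term = search_params(41.905, -87.678, term='steak')
print(by_category)
print(by_term)
```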

### Area

To create a uniform comparison, we decided to look at a 40 km radius around each urban center. We will pair this with population data from [Freemaptools.com](https://www.freemaptools.com/find-population.htm), which estimates the population within a given radius of a starting point.
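One way to sanity-check the radius is to compute the great-circle distance from a city's center point to a restaurant and confirm it falls inside 40 km. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; the function name is ours, and the two coordinate pairs are the Chicago center point and the first Chicago result that appear elsewhere in this notebook.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


chicago_center = (41.90515925618311, -87.67776489257812)
restaurant = (41.91582, -87.6699)

distance = haversine_km(*chicago_center, *restaurant)
print(round(distance, 2), distance <= 40)
```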

In [2]:
client_id = os.environ.get('YELP_ID')
api_key = os.environ.get('YELP_KEY')

url = 'https://api.yelp.com/v3/businesses/search'
headers = {
    'Authorization': 'Bearer {}'.format(api_key),
}


def yelp_pull(cat, lat, lon):
    '''Pull every business Yelp returns for one category around a point.

    INPUTS: cat, a Yelp category alias such as 'steak';
            lat, lon, the coordinates of the center of the 40 km search radius
    OUTPUTS: returns a list of dictionaries, one for each restaurant
    '''
    url_params = {
        'term': 'restaurants',
        'limit': 50,
        'categories': str(cat),
        'offset': 0,
        'latitude': lat,
        'longitude': lon,
        'radius': 40000
    }
    # Do an initial request to get the total available results.
    response = requests.get(url, headers=headers, params=url_params)
    results = response.json()['total']

    temp_list = []
    rng = math.ceil(results / 50)  # number of 50-business pages needed
    print(rng)
    for i in range(rng):
        print(url_params['offset'])
        response = requests.get(url, headers=headers, params=url_params)
        # Parse the JSON once per page and keep every business on that page.
        temp_list.extend(response.json()['businesses'])
        url_params['offset'] += 50
    return temp_list
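A possible variation, sketched here under our own assumptions, separates the paging loop from the network call so the loop can be tried without an API key. The names `collect_businesses`, `fetch_page`, `max_results`, and `fake_fetch` are hypothetical. `fetch_page(offset)` is assumed to return a dictionary with `total` and `businesses` keys, like the parsed JSON used above.

```python
def collect_businesses(fetch_page, page_size=50, max_results=None):
    """Gather businesses page by page until the reported total is reached."""
    businesses = []
    offset = 0
    while True:
        page = fetch_page(offset)
        batch = page.get('businesses', [])
        businesses.extend(batch)
        offset += page_size
        limit = page.get('total', 0)
        if max_results is not None:
            limit = min(limit, max_results)  # optional cap chosen by the caller
        # Stop on an empty page or once every reported result is covered.
        if not batch or offset >= limit:
            break
    return businesses[:max_results]


def fake_fetch(offset, total=120):
    """Stand-in for a real request: serves slices of 120 made-up businesses."""
    everything = [{'name': 'place {}'.format(i)} for i in range(total)]
    return {'total': total, 'businesses': everything[offset:offset + 50]}


print(len(collect_businesses(fake_fetch)))  # 120
```

Because the total is read from each page, this version does not need a separate initial request just to count the results.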

# Cleaning the data

Step 1: Pull in the list of dictionaries specific to the city and type of food.

In [5]:
chi_stk_list = yelp_pull('steak', start_point['chicago'][0], start_point['chicago'][1])

2
0
50


In [6]:
chi_stk_list

[{'id': 'Nk4H9CkU8B1VPAofaB2qcQ',
  'alias': 'mables-table-chicago',
  'name': "Mable's Table",
  'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/zk7nA0bYl0jlpuNAFQo4PQ/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/mables-table-chicago?adjust_creative=eAOaimX85y-R9FkFMWxzPQ&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=eAOaimX85y-R9FkFMWxzPQ',
  'review_count': 209,
  'categories': [{'alias': 'newamerican', 'title': 'American (New)'},
   {'alias': 'steak', 'title': 'Steakhouses'}],
  'rating': 4.5,
  'coordinates': {'latitude': 41.91582, 'longitude': -87.6699},
  'transactions': ['pickup', 'restaurant_reservation'],
  'price': '$$',
  'location': {'address1': '1655 W Cortland St',
   'address2': '',
   'address3': None,
   'city': 'Chicago',
   'zip_code': '60622',
   'country': 'US',
   'state': 'IL',
   'display_address': ['1655 W Cortland St', 'Chicago, IL 60622']},
  'phone': '+17739047433',
  'display_phone': '(773) 904-7433',
  'di

Step 2: Create a DataFrame from the list above.

In [7]:
chi_df = pd.DataFrame(chi_stk_list)

Step 3: Drop all the columns we don't need.

In [8]:
chi_df.drop(['alias','distance', 'id', 'image_url', 'is_closed','location', 'transactions','categories', 'url'], axis = 'columns', inplace=True)

Step 4: Add a column to the DataFrame with the specific food type.

In [12]:
chi_df['category']='steak'
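Steps 2 through 4 can be tried on a tiny made-up sample without calling the API. The two records below are placeholders, not real Yelp data. Passing `errors='ignore'` to `drop` is our addition so that a column missing from a particular pull does not raise an error.

```python
import pandas as pd

# Placeholder records shaped loosely like Yelp results (not real data).
sample = [
    {'id': 'a1', 'name': 'Example Steakhouse', 'rating': 4.5,
     'price': '$$', 'url': 'https://example.com/a1'},
    {'id': 'b2', 'name': 'Example Grill', 'rating': 4.0,
     'url': 'https://example.com/b2'},  # no 'price' key, so it becomes NaN
]

df = pd.DataFrame(sample)                                         # Step 2
df = df.drop(columns=['id', 'url', 'distance'], errors='ignore')  # Step 3
df['category'] = 'steak'                                          # Step 4
print(df)
```

Records without a `price` key end up with a missing value in that column, which is what the later steps deal with.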

Step 5: Create a new column that converts the number of $ signs to an integer.

In [None]:
xxxxx

Step 6: Calculate the mean of the price column.

Step 7: Fill the rows that have missing price values with that mean.
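Steps 5 through 7 are not implemented yet in this notebook. One possible approach is sketched below on made-up rows. The column name `price_level` is our own choice, and counting characters with `.str.len()` assumes the price is always a run of `$` signs.

```python
import pandas as pd

# Made-up rows; None stands in for a restaurant with no listed price.
df = pd.DataFrame({'name': ['A', 'B', 'C'],
                   'price': ['$$', None, '$$$$']})

# Step 5: the number of '$' characters becomes a number (missing stays missing).
df['price_level'] = df['price'].str.len()

# Step 6: mean of the known price levels (missing values are skipped).
mean_price = df['price_level'].mean()

# Step 7: fill the missing price levels with that mean.
df['price_level'] = df['price_level'].fillna(mean_price)
print(df)
```

Filling with the mean keeps every restaurant in the comparison, at the cost of pulling the price distribution toward its center.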

Step 8: Rearrange the columns.

In [None]:
chi_df = chi_df[['name', 'category', 'price', 'review_count', 'rating', 'coordinates']]

Step 9: Concat this clean DataFrame to the master DataFrame. 
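A minimal sketch of Step 9, assuming each cleaned DataFrame shares the same columns. The frames and the `city` column below are made-up placeholders. `ignore_index=True` gives the combined frame a fresh 0..n-1 index instead of repeating each piece's own index.

```python
import pandas as pd

# Placeholder cleaned frames for one city (not real data).
chi_steak = pd.DataFrame({'name': ['A', 'B'], 'rating': [4.5, 4.0],
                          'category': 'steak', 'city': 'chicago'})
chi_vegan = pd.DataFrame({'name': ['C'], 'rating': [4.0],
                          'category': 'vegan', 'city': 'chicago'})

master_df = pd.concat([chi_steak, chi_vegan], ignore_index=True)
print(master_df)
```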

### Population within the matching 40 km radius

In [180]:
pop = {'Portland':1907395, 
'Los Angeles':10055380, 
'New York City':13409590, 
'Boston':3342667, 
'Chicago':6377851, 
'Austin':1142575, 
'Miami':3140501, 
'Oklahoma City':992543, 
'Minneapolis':2648228, 
'San Diego':2684215}
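These populations let us express restaurant density on a common scale. The sketch below computes restaurants per 100,000 residents. The restaurant count is a made-up placeholder rather than a result from our data, and the helper name is ours.

```python
def per_100k(count, population):
    """Restaurants per 100,000 residents within the radius."""
    return count / population * 100_000


chicago_population = 6377851   # from the population table above
hypothetical_count = 100       # placeholder, not an actual result

density = per_100k(hypothetical_count, chicago_population)
print(round(density, 2))
```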

### Latitude and Longitude of the center of each radius

In [4]:
start_point={'new york':(40.70544486444615, -73.99429321289062),
'chicago':(41.90515925618311, -87.67776489257812),
'portland':(45.515785397030584, -122.65411376953125),
'san francisco':(37.76089938976322, -122.43644714355469),
'los angeles':(34.0615895441259, -118.32138061523438),
'boston':(42.34784169448538, -71.07124328613281),
'austin':(30.305156315977833, -97.75772094726562),
'miami':(25.752753731496888, -80.2880859375),
'san diego':(32.790569394537286, -117.15408325195312),
'tulsa':(36.10220015729658, -95.9271240234375),
'minneapolis':(44.96558443188442, -93.2904052734375)}
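Rather than writing one call per city and food type by hand, the pulls could be listed up front. The sketch below only builds the list of jobs and makes no requests. It uses two of the center points above, and the category alias `'vegan'` is our assumption for the second food type.

```python
from itertools import product

# Two of the center points above, as (latitude, longitude).
start_point_sample = {
    'chicago': (41.90515925618311, -87.67776489257812),
    'boston': (42.34784169448538, -71.07124328613281),
}
categories = ['steak', 'vegan']  # 'vegan' alias is an assumption

# One (city, category, lat, lon) tuple per pull.
jobs = [(city, cat, lat, lon)
        for (city, (lat, lon)), cat in product(start_point_sample.items(),
                                               categories)]

for job in jobs:
    print(job)
```

Each tuple carries the arguments that `yelp_pull(cat, lat, lon)` expects, plus the city name for labeling the results.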

In [155]:
{
    'city': row['location']['city'],
    'name': row['name'],
    'phone_number': row['phone'],
    'steak': 'steak' in row['categories'].....
}
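The fragment above is unfinished. One way to complete the idea, as a sketch, is a small function that flattens a raw Yelp record into the fields we care about. Since `categories` is a list of dictionaries, the check looks at each entry's `alias`. The function name `flatten_row` is ours, and the sample record is trimmed from the Chicago output shown earlier.

```python
def flatten_row(row):
    """Pick a few flat fields out of one raw Yelp business record."""
    aliases = [c['alias'] for c in row.get('categories', [])]
    return {
        'city': row['location']['city'],
        'name': row['name'],
        'phone_number': row.get('phone'),
        'steak': 'steak' in aliases,
    }


row = {
    'name': "Mable's Table",
    'categories': [{'alias': 'newamerican', 'title': 'American (New)'},
                   {'alias': 'steak', 'title': 'Steakhouses'}],
    'location': {'city': 'Chicago', 'state': 'IL'},
    'phone': '+17739047433',
}

flat = flatten_row(row)
print(flat)
```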

In [163]:
#df[df['city']=='chicago']

two sample Z test - 
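As a rough sketch of what such a test could look like for comparing average ratings, the function below computes the z statistic and a two-sided p-value from two lists of numbers, using sample variances as stand-ins for the population variances (reasonable only for large samples). The function name and the numbers are illustrative placeholders, not our results.

```python
import math


def two_sample_z(a, b):
    """Return (z, two-sided p-value) for the difference in means of a and b."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Sample variances (n - 1 in the denominator).
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    z = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Placeholder ratings, far too few for a real z test.
steak_ratings = [4.0, 4.5, 3.5, 4.0, 5.0]
vegan_ratings = [4.5, 4.5, 4.0, 5.0, 4.0]

z, p = two_sample_z(steak_ratings, vegan_ratings)
print(z, p)
```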