# Module 1 Project 


Flatiron School Data Science immersive January 28 Cohort


This project asks us to collect data through either API calls or web scraping, put it in tabular format, and use visualizations and exploratory analysis to tell a story about the data.

Karen Lin and Derrick Lewis collaborated on this project to collect data on vegan vs. steak restaurants from 11 cities in the United States. The goal was to compare these nearly opposite foods. We will look at restaurant density, price disparity, and local sentiment as expressed through ratings.

To begin we will import data from the Yelp API. 

This process requires several Python libraries. The first, "requests", gives us the "get" function, which sends the URL, parameters, and authorization key as our request. Next is pandas, which we use to create DataFrames to organize the data. We'll use math for "ceil", a version of rounding that always rounds up to the nearest integer. Lastly, we import os to pull our API key from an environment variable instead of saving it in this code.
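As a small sketch of that last point, the key can be read from the environment with an explicit error when it is missing. The helper name load_api_key is hypothetical and not part of the project code; YELP_KEY is the variable name this notebook reads further down.

```python
import os

def load_api_key(var_name='YELP_KEY'):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a readable message instead of sending an empty token.
        raise RuntimeError('Environment variable {} is not set'.format(var_name))
    return key
```

Failing early like this makes a missing key easier to spot than an authorization error returned by the API.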

In [None]:
import requests
import pandas as pd
import math
import os
import csv

## The Function to pull from the API

Next we will build our request function. The Yelp API returns at most 50 businesses per request. Since most urban areas have more restaurants than that, we need a loop that makes multiple requests, each one starting at a new offset in the result list.
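The paging arithmetic can be sketched on its own, without touching the API. The function name page_offsets is an illustration only; it lists the offsets a loop would need to cover a given total at 50 results per request.

```python
import math

def page_offsets(total, page_size=50):
    """Return the starting offset of every page needed to cover `total` results."""
    # Round up so a partial last page is still requested.
    pages = math.ceil(total / page_size)
    return [page * page_size for page in range(pages)]

print(page_offsets(120))  # [0, 50, 100]
```

With 120 results, three requests are needed: two full pages and one partial page starting at offset 100.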

### Criteria

In early tests directly on the Yelp website, we found more distinct results from a category search than from searching for a term across the entire data set. A category search returns only restaurants tagged with that category.

### Area

To create a uniform comparison, we decided to look at a 40 km radius around each urban center. We will pair this with population data from [Freemaptools.com](https://www.freemaptools.com/find-population.htm), which estimates the population within a given radius of a starting point.
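One way to pair a restaurant count with a population estimate is a rate per 100,000 residents. This is only a sketch: the function name per_100k is hypothetical, and the numbers in the example are made up for illustration, not results from this project.

```python
def per_100k(restaurant_count, population):
    """Restaurants per 100,000 residents inside the search radius."""
    if population <= 0:
        raise ValueError('population must be positive')
    return restaurant_count * 100000 / population

# Made-up numbers for illustration only.
print(per_100k(150, 3000000))  # 5.0
```

Because every city uses the same radius, a rate like this lets a large and a small metro area be compared on the same scale.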

In [None]:
client_id = os.environ.get('YELP_ID')
api_key = os.environ.get('YELP_KEY')

url = 'https://api.yelp.com/v3/businesses/search'
headers = {
        'Authorization': 'Bearer {}'.format(api_key),
    }


def yelp_pull(cat, lat, lon):
    '''Pull every business Yelp returns for one category around a point.
    INPUTS: cat - Yelp category to search (e.g. 'vegan');
            lat, lon - coordinates of the center of the search area
    OUTPUTS: returns a list of dictionaries, one for each restaurant
    '''
    url_params = {
                     'term': 'restaurants',
                     'limit': 50,
                     'categories': str(cat),
                     'offset': 0,
                     'latitude':lat,
                     'longitude':lon,
                     'radius':40000
                      }
    # Do an initial request to get the total available results.
    response = requests.get(url, headers=headers, params=url_params)
    results = response.json()['total']

    temp_list = []
    # Number of 50-result pages needed to cover the total, rounded up.
    rng = math.ceil(results/50)
    print(rng)
    for i in range(rng):
        # Each pass starts 50 results further into the list.
        url_params['offset'] = i * 50
        print(url_params['offset'])
        response = requests.get(url, headers=headers, params=url_params)
        temp_list.extend(response.json()['businesses'])
    return temp_list

### Pulling data for each city and food type:

List of each city and the coordinates of its center (as determined by Yelp):

In [None]:
start_point={'new york':(40.70544486444615, -73.99429321289062),
'chicago':(41.90515925618311, -87.67776489257812),
'portland':(45.515785397030584, -122.65411376953125),
'san francisco':(37.796069, -122.341047),
'los angeles':(34.0615895441259, -118.32138061523438),
'boston':(42.34784169448538, -71.07124328613281),
'austin':(30.305156315977833, -97.75772094726562),
'miami':(25.752753731496888, -80.2880859375),
'san diego':(32.790569394537286, -117.15408325195312),
'tulsa':(36.10220015729658, -95.9271240234375),
'minneapolis':(44.96558443188442, -93.2904052734375)}


#'san francisco':(37.76089938976322, -122.43644714355469)

In [None]:
nyc_veg_list = yelp_pull('vegan', start_point['new york'][0], start_point['new york'][1])

In [None]:
nyc_stk_list = yelp_pull('steak', start_point['new york'][0], start_point['new york'][1])

In [None]:
chi_veg_list = yelp_pull('vegan', start_point['chicago'][0], start_point['chicago'][1])

In [None]:
chi_stk_list = yelp_pull('steak', start_point['chicago'][0], start_point['chicago'][1])

In [None]:
pdx_veg_list = yelp_pull('vegan', start_point['portland'][0], start_point['portland'][1])

In [None]:
pdx_stk_list = yelp_pull('steak', start_point['portland'][0], start_point['portland'][1])

In [None]:
sf_veg_list = yelp_pull('vegan', start_point['san francisco'][0], start_point['san francisco'][1])

In [None]:
sf_stk_list = yelp_pull('steak', start_point['san francisco'][0], start_point['san francisco'][1])

In [None]:
la_veg_list = yelp_pull('vegan', start_point['los angeles'][0], start_point['los angeles'][1])

In [None]:
la_stk_list = yelp_pull('steak', start_point['los angeles'][0], start_point['los angeles'][1])

In [None]:
bos_veg_list = yelp_pull('vegan', start_point['boston'][0], start_point['boston'][1])

In [None]:
bos_stk_list = yelp_pull('steak', start_point['boston'][0], start_point['boston'][1])

In [None]:
atx_veg_list = yelp_pull('vegan', start_point['austin'][0], start_point['austin'][1])

In [None]:
atx_stk_list = yelp_pull('steak', start_point['austin'][0], start_point['austin'][1])

In [None]:
mia_veg_list = yelp_pull('vegan', start_point['miami'][0], start_point['miami'][1])

In [None]:
mia_stk_list = yelp_pull('steak', start_point['miami'][0], start_point['miami'][1])

In [None]:
snd_veg_list = yelp_pull('vegan', start_point['san diego'][0], start_point['san diego'][1])

In [None]:
snd_stk_list = yelp_pull('steak', start_point['san diego'][0], start_point['san diego'][1])

In [None]:
tls_veg_list = yelp_pull('vegan', start_point['tulsa'][0], start_point['tulsa'][1])

In [None]:
tls_stk_list = yelp_pull('steak', start_point['tulsa'][0], start_point['tulsa'][1])

In [None]:
msp_veg_list = yelp_pull('vegan', start_point['minneapolis'][0], start_point['minneapolis'][1])

In [None]:
msp_stk_list = yelp_pull('steak', start_point['minneapolis'][0], start_point['minneapolis'][1])

In [None]:
type_list= ['vegan', 'steak',
            'vegan', 'steak',
            'vegan', 'steak',
            'vegan', 'steak',
            'vegan', 'steak',
            'vegan', 'steak',
            'vegan', 'steak',
            'vegan', 'steak',
            'vegan', 'steak',
            'vegan', 'steak',
            'vegan', 'steak']

In [None]:
city_list=['new york','new york', 
           'chicago', 'chicago', 
           'portland','portland',
           'san francisco','san francisco', 
           'los angeles','los angeles',
           'boston','boston',
           'austin','austin',
           'miami','miami',
           'san diego','san diego',
           'tulsa','tulsa',
           'minneapolis','minneapolis']

In [None]:
yelp_list= [nyc_veg_list, nyc_stk_list, 
            chi_veg_list, chi_stk_list, 
            pdx_veg_list, pdx_stk_list,
            sf_veg_list, sf_stk_list,
            la_veg_list, la_stk_list,
            bos_veg_list, bos_stk_list,
            atx_veg_list, atx_stk_list,
            mia_veg_list, mia_stk_list,
            snd_veg_list, snd_stk_list,
            tls_veg_list, tls_stk_list,
            msp_veg_list, msp_stk_list]

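    The three lists above line up position by position: each of the 22 result lists in yelp_list has its food type in type_list and its city in city_list, ready to be cleaned together.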
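The same three parallel lists could also be built with a loop instead of one cell per city and food type. The sketch below is an alternative, not the code used in this project: pull_all and fake_pull are hypothetical names, and fake_pull stands in for yelp_pull so the example runs without network access.

```python
def pull_all(start_point, categories, pull):
    """Call pull(category, lat, lon) for every city and category.

    Returns three parallel lists: results, food types, and city names.
    """
    results, types, cities = [], [], []
    for city, (lat, lon) in start_point.items():
        for cat in categories:
            results.append(pull(cat, lat, lon))
            types.append(cat)
            cities.append(city)
    return results, types, cities

# Stand-in for yelp_pull so the sketch runs offline.
def fake_pull(cat, lat, lon):
    return [{'name': '{} place'.format(cat)}]

demo_points = {'chicago': (41.9, -87.7), 'tulsa': (36.1, -95.9)}
demo_lists, demo_types, demo_cities = pull_all(demo_points, ['vegan', 'steak'], fake_pull)
print(demo_types)   # ['vegan', 'steak', 'vegan', 'steak']
print(demo_cities)  # ['chicago', 'chicago', 'tulsa', 'tulsa']
```

Passing the real yelp_pull and the full start_point dictionary in place of the stand-ins would produce lists in the same order as the hand-written ones above.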

# Cleaning the data

    Step 1: Start a master DataFrame to hold all the results:

In [None]:
yelp_master_data = pd.DataFrame()

    Step 2: Create a DataFrame from one of the lists above:

In [None]:
chi_df = pd.DataFrame(chi_veg_list)

    Step 3: Create a column to label the city:

In [None]:
chi_df['city']='chicago'

    Step 4: Add a column to the DataFrame with the specific food type:

In [None]:
chi_df['category']='vegan'

    Step 5: Convert the price column from a string of $ signs to an integer count:

In [None]:
chi_df['price']= chi_df['price'].map(lambda x: len(x), na_action='ignore')

    Step 6: Calculate the mean of the price column:

In [None]:
mean = chi_df['price'].mean()

    Step 7: Fill the rows that have a missing price with that mean:

In [None]:
# Assign the result back; in-place fillna on a selected column is unreliable in newer pandas.
chi_df['price'] = chi_df['price'].fillna(mean)
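Steps 5 through 7 can be checked on a tiny Series before running them on real data. This is only a sketch; the sample prices are invented for illustration.

```python
import pandas as pd

# Invented sample: three known prices and one missing value.
prices = pd.Series(['$', '$$$', float('nan'), '$$'])

# len('$$$') is 3; missing values are skipped because of na_action='ignore'.
levels = prices.map(len, na_action='ignore')

# Fill the gap with the mean of the known values: (1 + 3 + 2) / 3 = 2.
levels = levels.fillna(levels.mean())
print(levels.tolist())
```

Note that filling with the mean keeps the column average unchanged but can produce non-integer price levels when the mean is not a whole number.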

    Step 8: Convert coordinates into two distinct columns:

In [None]:
df2 = chi_df["coordinates"].apply(pd.Series)

In [None]:
chi_df = pd.concat([chi_df, df2], axis = 1)
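An equivalent way to do this split, sketched here on made-up rows, is to build a DataFrame directly from the list of coordinate dictionaries. This is an alternative to apply(pd.Series), not the approach used in this notebook, and it tends to be faster on large frames.

```python
import pandas as pd

# Made-up rows shaped like the Yelp results.
df = pd.DataFrame({
    'name': ['A', 'B'],
    'coordinates': [{'latitude': 41.9, 'longitude': -87.7},
                    {'latitude': 36.1, 'longitude': -95.9}],
})

# Build the new columns from the dictionaries, keeping the original index
# so the rows line up when the two frames are joined side by side.
coords = pd.DataFrame(df['coordinates'].tolist(), index=df.index)
df = pd.concat([df, coords], axis=1)
print(list(df.columns))  # ['name', 'coordinates', 'latitude', 'longitude']
```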

    Step 9: Drop all the columns we don't need:

In [None]:
chi_df.drop(['coordinates','phone','display_phone','alias','distance', 'id', 'image_url', 'is_closed','location', 'transactions','categories', 'url'], axis = 'columns', inplace=True)



    Step 10: Concat this clean DataFrame to the master DataFrame. 

In [None]:
yelp_master_data = pd.concat([yelp_master_data, chi_df])

### Combine all 10 steps into one function to clean all 22 lists:

In [None]:
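def yelp_clean(yelp_lst, type_lst, city_lst):
    '''Clean each list of Yelp results and combine them into one DataFrame.
    INPUTS: yelp_lst - list of result lists from yelp_pull;
            type_lst, city_lst - food type and city for each result list
    OUTPUTS: returns one cleaned DataFrame holding all the results
    '''
    yelp_master_data1 = pd.DataFrame()
    for i in range(len(yelp_lst)):
        temp_df = pd.DataFrame(yelp_lst[i])
        # Label every row with its food type and city.
        temp_df['category']=type_lst[i]
        temp_df['city']=city_lst[i]
        # Convert the $ signs to a count, then fill missing prices with the mean.
        temp_df['price']= temp_df['price'].map(lambda x: len(x), na_action='ignore')
        mean = temp_df['price'].mean()
        temp_df['price'] = temp_df['price'].fillna(mean)
        # Split the coordinates dictionary into latitude and longitude columns.
        temp2 = temp_df["coordinates"].apply(pd.Series)
        temp_df = pd.concat([temp_df, temp2], axis = 1)
        # Drop the columns we don't need and add the result to the master DataFrame.
        temp_df.drop(['coordinates','phone','display_phone','alias','distance', 'id', 'image_url', 'is_closed','location', 'transactions','categories', 'url'], axis = 'columns', inplace=True)
        yelp_master_data1 = pd.concat([yelp_master_data1, temp_df])
    return yelp_master_data1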

In [None]:
yelp_master_data = yelp_clean(yelp_list, type_list, city_list)

### Save the Master DataFrame to a csv: 

In [None]:
yelp_master_data.to_csv('yelp_master_data.csv')
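Because the master DataFrame is built by concatenating many smaller ones, its row labels repeat, and to_csv writes them as an extra unnamed column by default. The sketch below, on made-up rows, shows one optional way to leave them out with index=False and read the file back; an in-memory buffer stands in for the file on disk.

```python
import io
import pandas as pd

# Made-up rows with a repeated index, as a concatenated frame would have.
df = pd.DataFrame({'name': ['A', 'B'], 'rating': [4.5, 3.0]}, index=[0, 0])

buffer = io.StringIO()          # stands in for a file on disk
df.to_csv(buffer, index=False)  # index=False leaves the row labels out
buffer.seek(0)

restored = pd.read_csv(buffer)
print(restored.shape)  # (2, 2)
```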