# Week 4 - NumPy, Pandas and Data Ingestion

### Topics Covered:
- **NumPy**
    - Basic NumPy Operations
    - Common NumPy Array Operations
    - Linear Algebra
- **Pandas**
    - Pandas Series
    - Pandas DataFrame
    - Properties of Pandas DataFrame
    - Operations on Pandas DataFrame
- **Data Ingestion**
    - Reading CSV Files
    - Reading HTML Tables From Web Pages and PDF Files
    - Reading and Writing HDF5 Files and Operation

### In-class Assignment (Top5 App): 
Get a list of restaurants in Denver, with their details, from Yelp. Rank the restaurants by their distance from campus and find the top 5 restaurants by category.

#### <span style='color:blue'>Task 1: Signing Up for YELP API</span>

Link: https://docs.developer.yelp.com/docs/overview

In [42]:
# Extracting Yelp Key
from configparser import ConfigParser
config = ConfigParser()
config.read('config.ini')

API_KEY = config["YELP"]["api-key"]
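The cell above expects a `config.ini` file next to the notebook. As a minimal sketch of what that file might look like, the example below parses the same layout from a string so that it is self-contained. Only the section and option names (`YELP`, `api-key`) come from the code above; the key value and the `sample_*` names are placeholders.

```python
from configparser import ConfigParser

# Hypothetical contents of config.ini; the key is a placeholder, not a real credential.
SAMPLE_CONFIG = """
[YELP]
api-key = YOUR_YELP_API_KEY
"""

sample_config = ConfigParser()
# config.read('config.ini') does the same thing, reading from a file instead of a string.
sample_config.read_string(SAMPLE_CONFIG)

sample_key = sample_config["YELP"]["api-key"]
print(sample_key)
```

Keeping the key in a separate file like this makes it easier to leave credentials out of version control.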

### <span style='color:blue'>Task 2: Pulling data with REST API</span> 

Recall the components of a REST API call

<img src="images/rest.png" width="900">
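To make the components concrete, here is a small sketch that assembles the pieces of the Yelp search request by hand using only the standard library. The variable names are illustrative, and the header value is a placeholder: it assumes the key is sent as a bearer token, which is how Yelp's documentation describes authentication.

```python
from urllib.parse import urlencode

base_url = 'https://api.yelp.com/v3'     # host and API version
endpoint = '/businesses/search'          # resource being requested
query_params = {                         # filters sent in the query string
    'location': 'Denver, CO',
    'limit': 50,
    'term': 'restaurant',
    'radius': 5000,
}
# Assumption: the key is sent as a bearer token; PLACEHOLDER_KEY is not a real key.
request_headers = {'Authorization': 'Bearer PLACEHOLDER_KEY'}

# The query string is appended to the endpoint after a '?'.
full_url = base_url + endpoint + '?' + urlencode(query_params)
print(full_url)
```

When `params=` is passed to `requests.get`, the library builds a query string like this one for us.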

In [82]:
# import data from yelp
import requests
import json

url = 'https://api.yelp.com/v3' + '/businesses/search'

url_params = {
    'location': 'Denver, CO',
    'limit': 50,
    'term': 'restaurant',
    'radius': 5000
}

headers = {'Authorization': API_KEY}

response = requests.get(url, headers=headers, params=url_params)


In [83]:
# print code, content and headers
print(response.status_code)
print(response.content)
print(response.headers)

200
b'{"businesses": [{"id": "2wrS0h8iXFZTr4p0egCLjQ", "alias": "west-saloon-and-kitchen-denver", "name": "West Saloon & Kitchen ", "image_url": "https://s3-media1.fl.yelpcdn.com/bphoto/LPYnorL4BW9Q_LwIPM5s2w/o.jpg", "is_closed": false, "url": "https://www.yelp.com/biz/west-saloon-and-kitchen-denver?adjust_creative=h_83--JT2KAHZ603PlTOZA&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=h_83--JT2KAHZ603PlTOZA", "review_count": 661, "categories": [{"alias": "tradamerican", "title": "American"}], "rating": 4.4, "coordinates": {"latitude": 39.74403, "longitude": -104.99044}, "transactions": ["restaurant_reservation", "pickup", "delivery"], "price": "$$", "location": {"address1": "501 16th St Mall", "address2": "", "address3": null, "city": "Denver", "zip_code": "80202", "country": "US", "state": "CO", "display_address": ["501 16th St Mall", "Denver, CO 80202"]}, "phone": "+13038253690", "display_phone": "(303) 825-3690", "distance": 1971.726182635776, "business_hours":

In [84]:
# load json
result = json.loads(response.content)

In [85]:
result['businesses'][2]['review_count']

50

#### A glimpse over Pandas Data Structure

<img src="images/series.png" width="900">

**Series** is a one-dimensional labelled data structure: an index labels each element. A Series can be created from a Python list, a Python dictionary or even a scalar value. It can hold data of any type, but a given Series has a single dtype; values of mixed types are upcast to a common dtype or stored under the generic `object` dtype.

In [56]:
import pandas as pd

data = pd.Series([35, 67, 93], index=['Mercury', 'Venus', 'Earth'])
print(data.index)
print(data.values)

Index(['Mercury', 'Venus', 'Earth'], dtype='object')
[35 67 93]
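A short sketch of the other ways to build a Series mentioned above, and of what happens to the dtype when the values are mixed. The variable names and values are made up for illustration.

```python
import pandas as pd

from_list = pd.Series([35, 67, 93])                   # default integer index 0, 1, 2
from_dict = pd.Series({'Mercury': 35, 'Venus': 67})   # dict keys become the index
from_scalar = pd.Series(5, index=['a', 'b', 'c'])     # the scalar is repeated for each label

# Mixing integers and floats upcasts the whole Series to one numeric dtype.
mixed_numbers = pd.Series([1, 2.5])
# Mixing numbers and strings falls back to the generic object dtype.
mixed_objects = pd.Series([1, 'two'])

print(from_dict.index.tolist())
print(mixed_numbers.dtype, mixed_objects.dtype)
```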


<img src="images/dataframe.png" width="900">

Unlike Series, **DataFrames** are two-dimensional, consisting of rows and columns. They are similar to a spreadsheet or a SQL table. Pandas provides a large set of methods for working with data in a DataFrame. Each row in a DataFrame is labeled with an index, as in a Series, and each column has a label as well. In one of the upcoming sections, we will also have a look at multi-level indexing for DataFrames.

In [57]:
# create a dataframe
web_views = pd.DataFrame({
    'Chrome': [67, 44, 8],
    'Safari': [74, 58, 14],
    'Firefox': [89, 70, 16]
}, index=['2018', '2019', '2020'])

print(web_views.index)
print(web_views.columns)
print(web_views.values)

Index(['2018', '2019', '2020'], dtype='object')
Index(['Chrome', 'Safari', 'Firefox'], dtype='object')
[[67 74 89]
 [44 58 70]
 [ 8 14 16]]


In [None]:
# Why might we need to make an explicit choice of index? And be careful about it?

In [58]:
type(web_views.loc[:,'Firefox'])
# Why not use type(web_views['Firefox']) ?

pandas.core.series.Series

In [59]:
web_views['Firefox']

2018    89
2019    70
2020    16
Name: Firefox, dtype: int64

#### Selecting and Subsetting

Pandas offers a few ways that we can use to select a set of data. This includes selecting a cell, row, column or a subset of the entire dataframe. The 3 ways that we will go through include: selection using `[]`, `.loc` (label based indexing) and `.iloc` (position based indexing).

Using `[]` is the oldest way of indexing, and it mimics how we index dictionaries in Python. In Pandas, it selects a lower-dimensional object: a column (a Series) from a DataFrame, or a single value from a Series.

While you will still find this notation used for indexing DataFrames and Series, it can be ambiguous (a label selects a column, but a slice selects rows) and can lead to errors. It is brought up here so that the notation is familiar. We will focus on `.loc` for selection, as it is a powerful and explicit indexer. The format it follows is `.loc[row_indexer, col_indexer]`.

In [66]:
# Let's go through some examples
web_views.iloc[0:2,:]

Unnamed: 0,Chrome,Safari,Firefox
2018,67,74,89
2019,44,58,70
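The three selection styles can be compared side by side. This sketch rebuilds the `web_views` DataFrame from earlier so that it runs on its own; the result variable names are illustrative.

```python
import pandas as pd

web_views = pd.DataFrame({
    'Chrome': [67, 44, 8],
    'Safari': [74, 58, 14],
    'Firefox': [89, 70, 16]
}, index=['2018', '2019', '2020'])

column = web_views['Firefox']                 # [] with a label selects a column (a Series)
cell = web_views.loc['2019', 'Safari']        # .loc uses labels: row '2019', column 'Safari'
same_cell = web_views.iloc[1, 1]              # .iloc uses positions: second row, second column
by_label = web_views.loc['2018':'2019', :]    # label slices include the end label
by_position = web_views.iloc[0:2, :]          # position slices exclude the end position

print(cell, same_cell)
print(by_label.equals(by_position))
```

Note the difference in slicing: `.loc['2018':'2019']` includes the row labelled `'2019'`, while `.iloc[0:2]` stops before position 2, so both return the same two rows.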


### <span style='color:blue'>Task 3: Structuring the result with Pandas</span> 

In [86]:
import pandas as pd
restaurant_df = pd.DataFrame(result['businesses'])

In [87]:
restaurant_df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance,business_hours,attributes
0,2wrS0h8iXFZTr4p0egCLjQ,west-saloon-and-kitchen-denver,West Saloon & Kitchen,https://s3-media1.fl.yelpcdn.com/bphoto/LPYnor...,False,https://www.yelp.com/biz/west-saloon-and-kitch...,661,"[{'alias': 'tradamerican', 'title': 'American'}]",4.4,"{'latitude': 39.74403, 'longitude': -104.99044}","[restaurant_reservation, pickup, delivery]",$$,"{'address1': '501 16th St Mall', 'address2': '...",13038253690,(303) 825-3690,1971.726183,"[{'open': [{'is_overnight': False, 'start': '1...","{'business_temp_closed': None, 'menu_url': 'ht..."
1,qRLjMCH1ysrOl3ewkZ-2qQ,pancho-poncho-denver,Pancho Poncho,https://s3-media3.fl.yelpcdn.com/bphoto/Z0lGpQ...,False,https://www.yelp.com/biz/pancho-poncho-denver?...,14,"[{'alias': 'mexican', 'title': 'Mexican'}]",4.1,"{'latitude': 39.727133, 'longitude': -104.98216}",[],,"{'address1': '400 E 7th Ave', 'address2': None...",17206179400,(720) 617-9400,1382.911408,"[{'open': [{'is_overnight': False, 'start': '1...","{'business_temp_closed': None, 'menu_url': 'ht..."
2,iS9j0iSOhRzBBYJdzzuZFQ,point-easy-denver,Point Easy,https://s3-media2.fl.yelpcdn.com/bphoto/qaiHD3...,False,https://www.yelp.com/biz/point-easy-denver?adj...,50,"[{'alias': 'newamerican', 'title': 'New Americ...",4.7,"{'latitude': 39.75686910940363, 'longitude': -...",[],,"{'address1': '2000 E 28th Ave', 'address2': ''...",13032335656,(303) 233-5656,2447.575999,"[{'open': [{'is_overnight': False, 'start': '1...","{'business_temp_closed': None, 'menu_url': Non..."
3,svJWwW0ilssyqk_UML0mUg,work-and-class-denver,Work & Class,https://s3-media2.fl.yelpcdn.com/bphoto/JZkQDL...,False,https://www.yelp.com/biz/work-and-class-denver...,1800,"[{'alias': 'newamerican', 'title': 'New Americ...",4.5,"{'latitude': 39.75761, 'longitude': -104.98604}",[delivery],$$,"{'address1': '2500 Larimer St', 'address2': 'S...",13032920700,(303) 292-0700,2809.710449,"[{'open': [{'is_overnight': False, 'start': '1...","{'business_temp_closed': None, 'menu_url': 'ht..."
4,GmniFxffmpFcbmb5KWIwbA,culinary-dropout-denver-denver-4,Culinary Dropout,https://s3-media3.fl.yelpcdn.com/bphoto/Z8rdDG...,False,https://www.yelp.com/biz/culinary-dropout-denv...,773,"[{'alias': 'newamerican', 'title': 'New Americ...",4.2,"{'latitude': 39.73132, 'longitude': -104.93918}",[],$$,"{'address1': '4141 E 9th Ave', 'address2': '',...",17207790190,(720) 779-0190,2697.275321,"[{'open': [{'is_overnight': False, 'start': '1...","{'business_temp_closed': None, 'menu_url': '',..."


In [88]:
# Append data
for i in range(1, 4):
    url_params = {
        'location': 'Denver, CO',
        'limit': 50,
        'term': 'restaurant',
        'radius': 5000,
        'offset': (i*50)
    }
    response = requests.get(url, headers=headers, params=url_params)
    try:
        result = json.loads(response.content)
        data = result['businesses']
        new_df = pd.DataFrame(data)
        restaurant_df = pd.concat([restaurant_df, new_df], ignore_index=True)
    except (ValueError, KeyError):
        # invalid JSON, or an error payload without a 'businesses' key
        print(response.content)
        # change url params

In [89]:
restaurant_df.shape

(200, 18)
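The same pagination pattern can be sketched without calling the API. Here `fetch_page` is a hypothetical stand-in that returns fake records, and the pages are collected in a list and concatenated once at the end, which avoids rebuilding the DataFrame on every iteration.

```python
import pandas as pd

def fetch_page(offset, limit=50):
    """Hypothetical stand-in for the Yelp request: returns `limit` fake records."""
    return [{'id': f'biz-{n}', 'rating': 4.0} for n in range(offset, offset + limit)]

pages = []
for page_number in range(4):                        # offsets 0, 50, 100, 150
    records = fetch_page(offset=page_number * 50)
    pages.append(pd.DataFrame(records))

# One concat at the end; ignore_index=True renumbers the rows 0..199.
all_restaurants = pd.concat(pages, ignore_index=True)
print(all_restaurants.shape)
```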

#### Deep vs Shallow copy

Assigning a DataFrame to a new variable with the `=` sign doesn't copy the values; it just makes the new variable another reference to the same DataFrame.

**Shallow Copy**:

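- Creates a new DataFrame object that shares the underlying data and index with the original.
- Changes to the shared values made through the copy also show up in the original. This is the behavior in the cells that follow; with pandas Copy-on-Write enabled (the default from pandas 3.0), writing to a shallow copy triggers a real copy and leaves the original unchanged.
- The difference from using the assignment operator `=` is that we can append extra rows and columns to the shallow-copied DataFrame without affecting the original one.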

**Deep Copy**:

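- Creates a new object with its own copy of the data and index. Python objects stored inside cells (such as lists or dictionaries) are not copied recursively; only the references to them are.
- Changes to the values in the copy do not affect the original.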
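Before looking at shallow copies, it helps to contrast plain assignment with a deep copy. The sketch below uses a small made-up DataFrame; `alias` and `independent` are illustrative names.

```python
import pandas as pd

original = pd.DataFrame({'Chrome': [67, 44, 8]}, index=['2018', '2019', '2020'])

alias = original                          # no copy: a second name for the same object
independent = original.copy(deep=True)    # separate data and index

alias.loc['2018', 'Chrome'] = 0           # visible through `original`: it is the same object
independent.loc['2019', 'Chrome'] = 999   # only changes the deep copy

print(alias is original)
print(original['Chrome'].tolist())
print(independent['Chrome'].tolist())
```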

In [94]:
shallow = web_views.copy(deep=False)
deep = web_views.copy(deep=True)

In [95]:
web_views

Unnamed: 0,Chrome,Safari,Firefox
2018,67,74,89
2019,44,58,70
2020,8,14,16


In [96]:
shallow.loc['2019', 'Chrome'] = 80
deep.loc['2018', 'Firefox'] = 99

In [97]:
web_views

Unnamed: 0,Chrome,Safari,Firefox
2018,67,74,89
2019,80,58,70
2020,8,14,16


In [98]:
shallow

Unnamed: 0,Chrome,Safari,Firefox
2018,67,74,89
2019,80,58,70
2020,8,14,16


In [99]:
deep

Unnamed: 0,Chrome,Safari,Firefox
2018,67,74,99
2019,44,58,70
2020,8,14,16


### <span style='color:blue'>Task 4: Creating a deep copy and exporting</span> 

In [None]:
# Export data to csv
restaurant_df.to_csv('restaurants_raw.csv', index=False)
# Export data to pickle
restaurant_df.to_pickle('restaurants_raw.pickle')

In [100]:
# create a deep copy
restaurant_df_clean = restaurant_df.copy(deep=True)


In [102]:
# capitalize columns
restaurant_df_clean.columns = restaurant_df_clean.columns.str.capitalize()
restaurant_df_clean.columns

Index(['Id', 'Alias', 'Name', 'Image_url', 'Is_closed', 'Url', 'Review_count',
       'Categories', 'Rating', 'Coordinates', 'Transactions', 'Price',
       'Location', 'Phone', 'Display_phone', 'Distance', 'Business_hours',
       'Attributes'],
      dtype='object')

In [103]:
# Subset columns using loc
restaurant_df_clean = restaurant_df_clean.loc[:, ['Id', 'Name', 'Is_closed', 'Review_count', 'Categories', 
                                      'Rating', 'Coordinates', 'Transactions', 'Price', 'Location',
                                      'Distance', 'Business_hours']]
restaurant_df_clean.head(2)


Unnamed: 0,Id,Name,Is_closed,Review_count,Categories,Rating,Coordinates,Transactions,Price,Location,Distance,Business_hours
0,2wrS0h8iXFZTr4p0egCLjQ,West Saloon & Kitchen,False,661,"[{'alias': 'tradamerican', 'title': 'American'}]",4.4,"{'latitude': 39.74403, 'longitude': -104.99044}","[restaurant_reservation, pickup, delivery]",$$,"{'address1': '501 16th St Mall', 'address2': '...",1971.726183,"[{'open': [{'is_overnight': False, 'start': '1..."
1,qRLjMCH1ysrOl3ewkZ-2qQ,Pancho Poncho,False,14,"[{'alias': 'mexican', 'title': 'Mexican'}]",4.1,"{'latitude': 39.727133, 'longitude': -104.98216}",[],,"{'address1': '400 E 7th Ave', 'address2': None...",1382.911408,"[{'open': [{'is_overnight': False, 'start': '1..."


### Data Types in Pandas

The `dtypes` attribute gives the data type of each column in a DataFrame. Pandas has a set of data types that it understands and manipulates.

- integer: the default integer type is `int64`, which cannot hold missing values; the nullable variant is `Int64` (capital I)
- float: the default float type is `float64`
- boolean: `bool` stores True/False; the nullable `boolean` dtype also allows missing values
- date/time: stores dates in `datetime64[ns]` format
- object: holds any Python object, including strings
- string: dedicated to strings
- category: stores a limited, fixed number of possible values
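A small sketch that builds one column for most of the types listed above and prints the resulting `dtypes`. The column names and values are made up for illustration.

```python
import pandas as pd

typed = pd.DataFrame({
    'count': pd.Series([1, 2, 3]),                                # int64
    'count_nullable': pd.Series([1, None, 3], dtype='Int64'),     # nullable integer
    'score': pd.Series([4.4, None, 4.7]),                         # float64; None becomes NaN
    'is_open': pd.Series([True, None, False], dtype='boolean'),   # nullable boolean
    'price': pd.Series(['$', '$$', '$'], dtype='category'),       # category
    'opened': pd.Series(pd.to_datetime(['2020-01-01', '2021-06-15', '2022-03-30'])),  # datetime
})

print(typed.dtypes)
print(typed['count_nullable'].isna().sum())
```

The nullable `Int64` column keeps its integer type despite the missing value, whereas a plain `int64` column would have to be stored as `float64` to hold it.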

### <span style='color:blue'>Task 5: Get basic metadata and convert types</span> 

In [106]:
# Get basic metadata
print(restaurant_df_clean.shape)
print(restaurant_df_clean.dtypes)

(200, 12)
Id                 object
Name               object
Is_closed            bool
Review_count        int64
Categories         object
Rating            float64
Coordinates        object
Transactions       object
Price              object
Location           object
Distance          float64
Business_hours     object
dtype: object


In [108]:
# check for nans
print(restaurant_df_clean.isna().sum())

Id                 0
Name               0
Is_closed          0
Review_count       0
Categories         0
Rating             0
Coordinates        0
Transactions       0
Price             61
Location           0
Distance           0
Business_hours     0
dtype: int64


In [109]:
# convert data types
restaurant_df_clean = restaurant_df_clean.astype({
    'Price': 'category',
    'Name': 'string',
    'Id':'string'
})
print(restaurant_df_clean.dtypes)

Id                string[python]
Name              string[python]
Is_closed                   bool
Review_count               int64
Categories                object
Rating                   float64
Coordinates               object
Transactions              object
Price                   category
Location                  object
Distance                 float64
Business_hours            object
dtype: object


#### Apply method and lambda in Pandas

The `apply()` method applies a function to data. It works on both DataFrames and Series. For a DataFrame, it passes each column (or each row, with `axis=1`) to the function as a whole. For a Series, it calls the function on each element. It can return a scalar, a Series, or a DataFrame. By contrast, `applymap()` applies a function to every individual element of a DataFrame; since pandas 2.1 it is deprecated in favor of `DataFrame.map()`.

`df.apply(lambda x: some_operation(x))`
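The difference between the Series and DataFrame behavior can be seen in a short sketch. It rebuilds the `web_views` DataFrame from earlier so that it runs on its own; the result names are illustrative.

```python
import pandas as pd

web_views = pd.DataFrame({
    'Chrome': [67, 44, 8],
    'Safari': [74, 58, 14],
    'Firefox': [89, 70, 16]
}, index=['2018', '2019', '2020'])

# Series.apply: the function is called once per element.
doubled = web_views['Chrome'].apply(lambda views: views * 2)

# DataFrame.apply: the function receives a whole column (default axis=0) ...
column_totals = web_views.apply(lambda column: column.sum())

# ... or a whole row when axis=1.
row_totals = web_views.apply(lambda row: row.sum(), axis=1)

print(doubled.tolist())
print(column_totals.to_dict())
print(row_totals.to_dict())
```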

### <span style='color:blue'>Task 6: Get the first category name</span> 

In [112]:
restaurant_df_clean.head(3)

Unnamed: 0,Id,Name,Is_closed,Review_count,Categories,Rating,Coordinates,Transactions,Price,Location,Distance,Business_hours
0,2wrS0h8iXFZTr4p0egCLjQ,West Saloon & Kitchen,False,661,American,4.4,"{'latitude': 39.74403, 'longitude': -104.99044}","[restaurant_reservation, pickup, delivery]",$$,"{'address1': '501 16th St Mall', 'address2': '...",1971.726183,"[{'open': [{'is_overnight': False, 'start': '1..."
1,qRLjMCH1ysrOl3ewkZ-2qQ,Pancho Poncho,False,14,Mexican,4.1,"{'latitude': 39.727133, 'longitude': -104.98216}",[],,"{'address1': '400 E 7th Ave', 'address2': None...",1382.911408,"[{'open': [{'is_overnight': False, 'start': '1..."
2,iS9j0iSOhRzBBYJdzzuZFQ,Point Easy,False,50,New American,4.7,"{'latitude': 39.75686910940363, 'longitude': -...",[],,"{'address1': '2000 E 28th Ave', 'address2': ''...",2447.575999,"[{'open': [{'is_overnight': False, 'start': '1..."


In [111]:
# get first category
restaurant_df_clean.loc[:, 'Categories'] = restaurant_df_clean.loc[:, 'Categories'].apply(lambda x: x[0]['title'])

### Pass by Reference vs Pass by Value

It is important to know how Pandas objects are treated, especially when they are passed to functions or when changes are made to them. Python passes a reference to the object into the function. For immutable objects such as numbers, this behaves like pass-by-value: reassigning the parameter inside the function only rebinds the local name, and the caller's variable is unchanged.

![](https://www.mathwarehouse.com/programming/images/pass-by-reference-vs-pass-by-value-animation.gif)

In [117]:
def square_number(number_in_function):
    number_in_function = number_in_function**2
    print(number_in_function)

In [118]:
number = 5
square_number(number)
print(number)

25
5


Pandas objects, particularly DataFrame and Series, are mutable, so they behave like pass-by-reference: the function receives a reference to the same object, and any in-place change made to it inside the function is also visible to the caller.

Moreover, any assignment to a new variable such as `df2 = df` just makes the new variable point to the same DataFrame. As a result, any changes made to `df2` will also affect `df`, since they both point to the same object.
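A minimal sketch of the distinction, using two hypothetical helper functions: one changes the DataFrame it receives in place, the other rebinds its parameter to a new DataFrame and returns it.

```python
import pandas as pd

def add_flag_in_place(df):
    # Mutates the object that the caller also refers to.
    df['flag'] = 1

def add_flag_rebind(df):
    # assign() returns a new DataFrame; `df` is rebound locally, the caller's object is untouched.
    df = df.assign(other_flag=1)
    return df

views = pd.DataFrame({'Chrome': [67, 44, 8]})
add_flag_in_place(views)
returned = add_flag_rebind(views)

print(views.columns.tolist())
print(returned.columns.tolist())
```

Only the in-place change shows up in `views`; the column added by `assign` exists only in the returned DataFrame.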

In [123]:
def alter_transactions(df):
    for transaction in ['delivery', 'pickup', 'restaurant_reservation']:
        df.loc[:, transaction] = 0
        df.loc[:, transaction] = df.loc[:, 'Transactions'].apply(lambda x: 1 if transaction in x else 0)

In [124]:
alter_transactions(restaurant_df_clean)

In [127]:
restaurant_df_clean.head()

Unnamed: 0,Id,Name,Is_closed,Review_count,Categories,Rating,Coordinates,Transactions,Price,Location,Distance,Business_hours,delivery,pickup,reservation,restaurant_reservation
0,2wrS0h8iXFZTr4p0egCLjQ,West Saloon & Kitchen,False,661,American,4.4,"{'latitude': 39.74403, 'longitude': -104.99044}","[restaurant_reservation, pickup, delivery]",$$,"{'address1': '501 16th St Mall', 'address2': '...",1971.726183,"[{'open': [{'is_overnight': False, 'start': '1...",1,1,0,1
1,qRLjMCH1ysrOl3ewkZ-2qQ,Pancho Poncho,False,14,Mexican,4.1,"{'latitude': 39.727133, 'longitude': -104.98216}",[],,"{'address1': '400 E 7th Ave', 'address2': None...",1382.911408,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,0
2,iS9j0iSOhRzBBYJdzzuZFQ,Point Easy,False,50,New American,4.7,"{'latitude': 39.75686910940363, 'longitude': -...",[],,"{'address1': '2000 E 28th Ave', 'address2': ''...",2447.575999,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,0
3,svJWwW0ilssyqk_UML0mUg,Work & Class,False,1800,New American,4.5,"{'latitude': 39.75761, 'longitude': -104.98604}",[delivery],$$,"{'address1': '2500 Larimer St', 'address2': 'S...",2809.710449,"[{'open': [{'is_overnight': False, 'start': '1...",1,0,0,0
4,GmniFxffmpFcbmb5KWIwbA,Culinary Dropout,False,773,New American,4.2,"{'latitude': 39.73132, 'longitude': -104.93918}",[],$$,"{'address1': '4141 E 9th Ave', 'address2': '',...",2697.275321,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,0


In [129]:
# remove the restaurant_reservation column
restaurant_df_clean.drop(columns=['restaurant_reservation'], inplace=True)
restaurant_df_clean.head(5)

Unnamed: 0,Id,Name,Is_closed,Review_count,Categories,Rating,Coordinates,Price,Location,Distance,Business_hours,delivery,pickup,reservation
0,2wrS0h8iXFZTr4p0egCLjQ,West Saloon & Kitchen,False,661,American,4.4,"{'latitude': 39.74403, 'longitude': -104.99044}",$$,"{'address1': '501 16th St Mall', 'address2': '...",1971.726183,"[{'open': [{'is_overnight': False, 'start': '1...",1,1,0
1,qRLjMCH1ysrOl3ewkZ-2qQ,Pancho Poncho,False,14,Mexican,4.1,"{'latitude': 39.727133, 'longitude': -104.98216}",,"{'address1': '400 E 7th Ave', 'address2': None...",1382.911408,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0
2,iS9j0iSOhRzBBYJdzzuZFQ,Point Easy,False,50,New American,4.7,"{'latitude': 39.75686910940363, 'longitude': -...",,"{'address1': '2000 E 28th Ave', 'address2': ''...",2447.575999,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0
3,svJWwW0ilssyqk_UML0mUg,Work & Class,False,1800,New American,4.5,"{'latitude': 39.75761, 'longitude': -104.98604}",$$,"{'address1': '2500 Larimer St', 'address2': 'S...",2809.710449,"[{'open': [{'is_overnight': False, 'start': '1...",1,0,0
4,GmniFxffmpFcbmb5KWIwbA,Culinary Dropout,False,773,New American,4.2,"{'latitude': 39.73132, 'longitude': -104.93918}",$$,"{'address1': '4141 E 9th Ave', 'address2': '',...",2697.275321,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0


### <span style='color:blue'>Task 8: Map values in a DataFrame</span> 

Pandas provides the `map` method, which maps each value in a Series (or a column in a DataFrame) to a given set of values. Providing a dictionary is the most convenient way to define the mapping, but `map` accepts other inputs too, such as a function or another Series. Values without a match in the dictionary become NaN.
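A short sketch of `map` with a dictionary and with a function, on a made-up price Series. The labels `'low'` and `'medium'` are illustrative only.

```python
import pandas as pd

prices = pd.Series(['$', '$$', None, '$$$$'])

# Dictionary mapping: '$$$$' has no entry, so it becomes NaN, as does the missing value.
labels = prices.map({'$': 'low', '$$': 'medium'})

# Function mapping: na_action='ignore' skips missing values instead of passing them to len().
lengths = prices.map(len, na_action='ignore')

print(labels.tolist())
print(lengths.tolist())
```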

In [130]:
# show unique values of Price column
restaurant_df_clean.loc[:, 'Price'].unique()

['$$', NaN, '$$$$', '$$$', '$']
Categories (4, object): ['$', '$$', '$$$', '$$$$']

In [131]:
# map values
# Assign with [] so the mapped result replaces the column; writing through .loc[:, 'Price']
# tries to set the new labels into the existing category column and raises a dtype warning.
restaurant_df_clean['Price'] = restaurant_df_clean['Price'].map({'$': 'Incredible!',
                                                                 '$$': 'Can Manage It',
                                                                 '$$$': 'Ummm, Fancy',
                                                                 '$$$$': 'Oh My God'})
restaurant_df_clean.head(3)


Unnamed: 0,Id,Name,Is_closed,Review_count,Categories,Rating,Coordinates,Price,Location,Distance,Business_hours,delivery,pickup,reservation
0,2wrS0h8iXFZTr4p0egCLjQ,West Saloon & Kitchen,False,661,American,4.4,"{'latitude': 39.74403, 'longitude': -104.99044}",Can Manage It,"{'address1': '501 16th St Mall', 'address2': '...",1971.726183,"[{'open': [{'is_overnight': False, 'start': '1...",1,1,0
1,qRLjMCH1ysrOl3ewkZ-2qQ,Pancho Poncho,False,14,Mexican,4.1,"{'latitude': 39.727133, 'longitude': -104.98216}",,"{'address1': '400 E 7th Ave', 'address2': None...",1382.911408,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0
2,iS9j0iSOhRzBBYJdzzuZFQ,Point Easy,False,50,New American,4.7,"{'latitude': 39.75686910940363, 'longitude': -...",,"{'address1': '2000 E 28th Ave', 'address2': ''...",2447.575999,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0


### Looping through DataFrame

We can use either `itertuples()` or `iterrows()` to loop over the rows of a Pandas DataFrame.

`itertuples()` is generally faster. It returns each row as a named tuple, which allows attribute access and preserves the data types of the columns.

`iterrows()` returns each row as a tuple containing the index and a Series of the row. It is generally slower than `itertuples()` and may not preserve data types, because each row is packed into a single Series.
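The dtype difference is easiest to see on a DataFrame with one integer and one float column. This is a small sketch with made-up values and illustrative names.

```python
import pandas as pd

mixed = pd.DataFrame({'review_count': [661, 14], 'rating': [4.4, 4.1]})

# itertuples: each row is a namedtuple (Index, review_count, rating).
first_tuple = next(mixed.itertuples())

# iterrows: each row is (index label, Series); the Series has one dtype for the whole row.
first_index, first_series = next(mixed.iterrows())

print(type(first_tuple.review_count))   # the integer is kept as an integer
print(first_series.dtype)               # the row Series is upcast to float64
```

Because the row Series must have a single dtype, the integer `review_count` is converted to a float when accessed through `iterrows()`.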


In [134]:
# loop through a single row
for row in restaurant_df_clean.itertuples():
    print(row)
    print(row.Coordinates)
    break


Pandas(Index=0, Id='2wrS0h8iXFZTr4p0egCLjQ', Name='West Saloon & Kitchen ', Is_closed=False, Review_count=661, Categories='American', Rating=4.4, Coordinates={'latitude': 39.74403, 'longitude': -104.99044}, Price='Can Manage It', Location={'address1': '501 16th St Mall', 'address2': '', 'address3': None, 'city': 'Denver', 'zip_code': '80202', 'country': 'US', 'state': 'CO', 'display_address': ['501 16th St Mall', 'Denver, CO 80202']}, Distance=1971.726182635776, Business_hours=[{'open': [{'is_overnight': False, 'start': '1100', 'end': '2300', 'day': 0}, {'is_overnight': False, 'start': '1100', 'end': '2300', 'day': 1}, {'is_overnight': False, 'start': '1100', 'end': '2300', 'day': 2}, {'is_overnight': False, 'start': '1100', 'end': '2300', 'day': 3}, {'is_overnight': False, 'start': '1100', 'end': '0000', 'day': 4}, {'is_overnight': False, 'start': '1100', 'end': '0000', 'day': 5}, {'is_overnight': False, 'start': '1100', 'end': '2300', 'day': 6}], 'hours_type': 'REGULAR', 'is_open_now

### <span style='color:blue'>Task 9: Get Duration from DU</span> 

In [135]:
DU_coordinates = '-104.9619,39.6766'
mapbox_api = config["mapbox"]["api-key"]

In [136]:
coordinates = str(row.Coordinates['longitude']) + ',' + str(row.Coordinates['latitude'])
url = f'https://api.mapbox.com/directions/v5/mapbox/driving-traffic/{DU_coordinates};{coordinates}'


params = {
    'access_token': mapbox_api
}

response = requests.get(url, params=params)
data = response.json()
duration = data['routes'][0]['duration']
print(duration)

1005.889
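The URL construction above can be wrapped in two small helpers. Both function names are hypothetical, and the sketch only builds the URL without sending a request. Note the ordering: the Yelp dictionary lists latitude first, while the Mapbox path takes `longitude,latitude` pairs separated by a semicolon.

```python
DU_coordinates = '-104.9619,39.6766'   # longitude,latitude of campus, as in the cell above

def to_lon_lat(coords):
    """Hypothetical helper: format a Yelp coordinates dict as 'longitude,latitude'."""
    return f"{coords['longitude']},{coords['latitude']}"

def directions_url(origin, destination):
    """Hypothetical helper: build the driving-traffic directions URL for two points."""
    base = 'https://api.mapbox.com/directions/v5/mapbox/driving-traffic'
    return f'{base}/{origin};{destination}'

restaurant_coords = {'latitude': 39.74403, 'longitude': -104.99044}
example_url = directions_url(DU_coordinates, to_lon_lat(restaurant_coords))
print(example_url)
```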


In [140]:
row.Distance

1971.726182635776

In [144]:
import time

# initialise as float so the float durations assigned below match the column dtype
restaurant_df_clean.loc[:, 'Duration'] = 0.0
for row in restaurant_df_clean.itertuples():
    coordinates = str(row.Coordinates['longitude']) + ',' + str(row.Coordinates['latitude'])
    url = f'https://api.mapbox.com/directions/v5/mapbox/driving-traffic/{DU_coordinates};{coordinates}'
    response = requests.get(url, params=params)
    data = response.json()
    duration = data['routes'][0]['duration']
    restaurant_df_clean.loc[row.Index, 'Duration'] = duration
    time.sleep(1)


In [145]:
restaurant_df_clean.head(10)

Unnamed: 0,Id,Name,Is_closed,Review_count,Categories,Rating,Coordinates,Price,Location,Distance,Business_hours,delivery,pickup,reservation,Duration
0,2wrS0h8iXFZTr4p0egCLjQ,West Saloon & Kitchen,False,661,American,4.4,"{'latitude': 39.74403, 'longitude': -104.99044}",Can Manage It,"{'address1': '501 16th St Mall', 'address2': '...",1971.726183,"[{'open': [{'is_overnight': False, 'start': '1...",1,1,0,1000.348
1,qRLjMCH1ysrOl3ewkZ-2qQ,Pancho Poncho,False,14,Mexican,4.1,"{'latitude': 39.727133, 'longitude': -104.98216}",,"{'address1': '400 E 7th Ave', 'address2': None...",1382.911408,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,809.437
2,iS9j0iSOhRzBBYJdzzuZFQ,Point Easy,False,50,New American,4.7,"{'latitude': 39.75686910940363, 'longitude': -...",,"{'address1': '2000 E 28th Ave', 'address2': ''...",2447.575999,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,1215.017
3,svJWwW0ilssyqk_UML0mUg,Work & Class,False,1800,New American,4.5,"{'latitude': 39.75761, 'longitude': -104.98604}",Can Manage It,"{'address1': '2500 Larimer St', 'address2': 'S...",2809.710449,"[{'open': [{'is_overnight': False, 'start': '1...",1,0,0,1163.419
4,GmniFxffmpFcbmb5KWIwbA,Culinary Dropout,False,773,New American,4.2,"{'latitude': 39.73132, 'longitude': -104.93918}",Can Manage It,"{'address1': '4141 E 9th Ave', 'address2': '',...",2697.275321,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,1030.6
5,pFuMzFgNKdEWXi0ckj0ivw,Corsica,False,62,Wine Bars,4.5,"{'latitude': 39.761427, 'longitude': -104.983907}",,"{'address1': '2801 Walnut St', 'address2': '',...",3108.182747,"[{'open': [{'is_overnight': True, 'start': '14...",0,0,0,1167.172
6,_39Md1_VX2ftOT7HLyO1Yg,Reckless Noodle House,False,118,Noodles,4.2,"{'latitude': 39.729257409926944, 'longitude': ...",,"{'address1': '800 N Sherman St', 'address2': '...",1414.966619,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,815.316
7,uQd0TZKBprBXxF1aGSIAIA,Dew Drop Inn,False,65,Gastropubs,4.6,"{'latitude': 39.7434, 'longitude': -104.9738192}",,"{'address1': '1033 E 17th Ave', 'address2': ''...",927.529237,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,1123.784
8,6-MYTEofJo7jmdoTfuVIHA,Nana's Dim Sum & Dumplings,False,306,Dim Sum,4.5,"{'latitude': 39.763536, 'longitude': -105.010744}",,"{'address1': '3316 Tejon St', 'address2': 'Ste...",4658.986507,"[{'open': [{'is_overnight': False, 'start': '1...",1,1,0,924.398
9,fHR1VVGarB17xh56bxbUcg,Revival Denver Public House,False,131,Cocktail Bars,4.2,"{'latitude': 39.74311852120664, 'longitude': -...",,"{'address1': '630 E 17th Ave', 'address2': Non...",1124.024585,"[{'open': [{'is_overnight': False, 'start': '1...",1,1,0,1066.941


In [146]:
# convert duration to minutes
restaurant_df_clean.loc[:, 'Duration'] = restaurant_df_clean.loc[:, 'Duration'].apply(lambda x: x / 60)
restaurant_df_clean.head(10)

Unnamed: 0,Id,Name,Is_closed,Review_count,Categories,Rating,Coordinates,Price,Location,Distance,Business_hours,delivery,pickup,reservation,Duration
0,2wrS0h8iXFZTr4p0egCLjQ,West Saloon & Kitchen,False,661,American,4.4,"{'latitude': 39.74403, 'longitude': -104.99044}",Can Manage It,"{'address1': '501 16th St Mall', 'address2': '...",1971.726183,"[{'open': [{'is_overnight': False, 'start': '1...",1,1,0,16.672467
1,qRLjMCH1ysrOl3ewkZ-2qQ,Pancho Poncho,False,14,Mexican,4.1,"{'latitude': 39.727133, 'longitude': -104.98216}",,"{'address1': '400 E 7th Ave', 'address2': None...",1382.911408,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,13.490617
2,iS9j0iSOhRzBBYJdzzuZFQ,Point Easy,False,50,New American,4.7,"{'latitude': 39.75686910940363, 'longitude': -...",,"{'address1': '2000 E 28th Ave', 'address2': ''...",2447.575999,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,20.250283
3,svJWwW0ilssyqk_UML0mUg,Work & Class,False,1800,New American,4.5,"{'latitude': 39.75761, 'longitude': -104.98604}",Can Manage It,"{'address1': '2500 Larimer St', 'address2': 'S...",2809.710449,"[{'open': [{'is_overnight': False, 'start': '1...",1,0,0,19.390317
4,GmniFxffmpFcbmb5KWIwbA,Culinary Dropout,False,773,New American,4.2,"{'latitude': 39.73132, 'longitude': -104.93918}",Can Manage It,"{'address1': '4141 E 9th Ave', 'address2': '',...",2697.275321,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,17.176667
5,pFuMzFgNKdEWXi0ckj0ivw,Corsica,False,62,Wine Bars,4.5,"{'latitude': 39.761427, 'longitude': -104.983907}",,"{'address1': '2801 Walnut St', 'address2': '',...",3108.182747,"[{'open': [{'is_overnight': True, 'start': '14...",0,0,0,19.452867
6,_39Md1_VX2ftOT7HLyO1Yg,Reckless Noodle House,False,118,Noodles,4.2,"{'latitude': 39.729257409926944, 'longitude': ...",,"{'address1': '800 N Sherman St', 'address2': '...",1414.966619,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,13.5886
7,uQd0TZKBprBXxF1aGSIAIA,Dew Drop Inn,False,65,Gastropubs,4.6,"{'latitude': 39.7434, 'longitude': -104.9738192}",,"{'address1': '1033 E 17th Ave', 'address2': ''...",927.529237,"[{'open': [{'is_overnight': False, 'start': '1...",0,0,0,18.729733
8,6-MYTEofJo7jmdoTfuVIHA,Nana's Dim Sum & Dumplings,False,306,Dim Sum,4.5,"{'latitude': 39.763536, 'longitude': -105.010744}",,"{'address1': '3316 Tejon St', 'address2': 'Ste...",4658.986507,"[{'open': [{'is_overnight': False, 'start': '1...",1,1,0,15.406633
9,fHR1VVGarB17xh56bxbUcg,Revival Denver Public House,False,131,Cocktail Bars,4.2,"{'latitude': 39.74311852120664, 'longitude': -...",,"{'address1': '630 E 17th Ave', 'address2': Non...",1124.024585,"[{'open': [{'is_overnight': False, 'start': '1...",1,1,0,17.78235
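For simple arithmetic like this, the conversion can also be written as a vectorized operation on the whole column instead of `apply` with a lambda. A minimal sketch, using the first three durations (in seconds) from the table above:

```python
import pandas as pd

durations_seconds = pd.Series([1000.348, 809.437, 1215.017])

# Dividing the Series divides every element; no Python-level loop or lambda is needed.
durations_minutes = durations_seconds / 60

print(durations_minutes.round(2).tolist())
```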


#### Additional concept: Method Chaining

As the name suggests, method chaining lets us apply a series of methods in a single expression. This is possible because most pandas methods return a new pandas object, on which another method can be called. Method chaining simplifies our code and makes it more readable.
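A small sketch of a chain on made-up data: group by category, average the durations, sort, and keep the two fastest. Wrapping the chain in parentheses lets each method sit on its own line.

```python
import pandas as pd

trips = pd.DataFrame({
    'Categories': ['Noodles', 'Mexican', 'Noodles', 'Mexican', 'Hawaiian'],
    'Duration': [13.0, 15.0, 14.0, 17.0, 12.0],   # made-up travel times in minutes
})

fastest = (
    trips
    .groupby('Categories')['Duration']   # one group per category
    .mean()                              # average duration per category
    .sort_values()                       # shortest first
    .head(2)                             # keep the two fastest categories
)
print(fastest)
```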

### Week Recap

In [149]:
restaurant_df_clean.groupby('Categories')['Duration'].mean().sort_values().head()

Categories
Hot Pot                   11.742917
Hawaiian                  11.821308
Asian Fusion              12.740783
Food Delivery Services    13.231900
Noodles                   13.242842
Name: Duration, dtype: float64