# Foursquare ETL

### Introduction

In this lesson, we'll use the Foursquare API to practice the ETL (extract, transform, load) pattern. We'll also use it to practice the refactoring process.

### Back to Foursquare

Let's start with our previous code.

In [177]:
import requests 

def venue_search(query_params = {'ll': "40.7,-74", "query": "tacos"}):
    client_id = "ALECV5CBBEHRRKTIQ5ZV143YEXOH3SBLAMU54SPHKGZI1ZKE"
    client_secret = "3JX3NRGRS2P0KE0NSKPTMCOZOY4MWUU4M3G33BO4XTRJ15SM"
    date = "20190407"
    
    auth_params = {'client_id': client_id, 
               'client_secret': client_secret,
               'v': date}
    params = auth_params.copy()
    params.update(query_params)
    url = "https://api.foursquare.com/v2/venues/search"
    response = requests.get(url, params)
    return response.json()['response']['venues']
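
The function above builds its request parameters by copying `auth_params` and then layering `query_params` on top. The sketch below isolates just that step so it can be inspected without calling the API. The credentials are placeholders, `build_params` is a hypothetical helper name that is not part of the lesson code, and `urlencode` from the standard library only approximates how `requests` encodes the query string.

```python
from urllib.parse import urlencode

def build_params(query_params, client_id="YOUR_CLIENT_ID",
                 client_secret="YOUR_CLIENT_SECRET", date="20190407"):
    # Hypothetical helper: merge auth and query parameters
    # the same way venue_search does.
    auth_params = {'client_id': client_id,
                   'client_secret': client_secret,
                   'v': date}
    params = auth_params.copy()   # copy so auth_params itself is left untouched
    params.update(query_params)   # query keys are added on top of the auth keys
    return params

params = build_params({'ll': "40.7,-74", 'query': "tacos"})
print(sorted(params))      # the merged parameter names
print(urlencode(params))   # roughly the query string that would be sent
```

Keeping the merge in one small step like this makes it easy to see which keys end up in the request before any network call is made.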

In [179]:
venues_from_api = venue_search()

In [181]:
venues_from_api[:2]

[{'id': '5b2932a0f5e9d70039787cf2',
  'name': 'Los Tacos Al Pastor',
  'location': {'address': '141 Front St',
   'lat': 40.70243624175102,
   'lng': -73.98753900608666,
   'labeledLatLngs': [{'label': 'display',
     'lat': 40.70243624175102,
     'lng': -73.98753900608666}],
   'distance': 1086,
   'postalCode': '11201',
   'cc': 'US',
   'neighborhood': 'DUMBO',
   'city': 'New York',
   'state': 'NY',
   'country': 'United States',
   'formattedAddress': ['141 Front St',
    'New York, NY 11201',
    'United States']},
  'categories': [{'id': '4bf58dd8d48988d151941735',
    'name': 'Taco Place',
    'pluralName': 'Taco Places',
    'shortName': 'Tacos',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/taco_',
     'suffix': '.png'},
    'primary': True}],
  'delivery': {'id': '857049',
   'url': 'https://www.seamless.com/menu/los-tacos-al-pastor-141a-front-st-brooklyn/857049?affiliate=1131&utm_source=foursquare-affiliate-network&utm_medium=affiliate&utm_campaign=

In [31]:
venue_from_api = venues_from_api[0]
venue_from_api

{'id': '5b2932a0f5e9d70039787cf2',
 'name': 'Los Tacos Al Pastor',
 'location': {'address': '141 Front St',
  'lat': 40.70243624175102,
  'lng': -73.98753900608666,
  'labeledLatLngs': [{'label': 'display',
    'lat': 40.70243624175102,
    'lng': -73.98753900608666}],
  'distance': 1086,
  'postalCode': '11201',
  'cc': 'US',
  'neighborhood': 'DUMBO',
  'city': 'New York',
  'state': 'NY',
  'country': 'United States',
  'formattedAddress': ['141 Front St', 'New York, NY 11201', 'United States']},
 'categories': [{'id': '4bf58dd8d48988d151941735',
   'name': 'Taco Place',
   'pluralName': 'Taco Places',
   'shortName': 'Tacos',
   'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/taco_',
    'suffix': '.png'},
   'primary': True}],
 'delivery': {'id': '857049',
  'url': 'https://www.seamless.com/menu/los-tacos-al-pastor-141a-front-st-brooklyn/857049?affiliate=1131&utm_source=foursquare-affiliate-network&utm_medium=affiliate&utm_campaign=1131&utm_content=857049',
  'prov

### Creating Venues

Let's start by creating a class that allows us to build venue instances.  We should enforce initializing each venue with a `name`, `id`, `longitude` and `latitude`. 

In [120]:
class Venue:
    def __init__(self, id, name, longitude, latitude, category = None, zip_code = None):
        self.id = id
        self.name = name
        self.longitude = longitude
        self.latitude = latitude
        self.zip_code = zip_code
        self.category = category
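
As a quick illustration of what "enforce" means here, the sketch below repeats the `Venue` class and shows that the four positional parameters are required while `category` and `zip_code` fall back to `None`. The variable `missing_args_error` is introduced only for this example.

```python
class Venue:
    def __init__(self, id, name, longitude, latitude, category=None, zip_code=None):
        self.id = id
        self.name = name
        self.longitude = longitude
        self.latitude = latitude
        self.zip_code = zip_code
        self.category = category

# All four required arguments supplied: the optional ones default to None.
venue = Venue(id='5b2932a0f5e9d70039787cf2', name='Los Tacos Al Pastor',
              longitude=-73.98753900608666, latitude=40.70243624175102)
print(venue.zip_code, venue.category)

# Leaving out required arguments raises a TypeError.
missing_args_error = None
try:
    Venue(name='Los Tacos Al Pastor')
except TypeError as error:
    missing_args_error = str(error)
    print(missing_args_error)
```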

Next, loop through the list of venues to create a list of venue instances.  The resulting list should be assigned to a variable called `venues`.

In [34]:
venues = []
for venue_from_api in venues_from_api:
    venue_id = venue_from_api['id']
    venue_name = venue_from_api['name']
    lat = venue_from_api['location']['lat']
    long = venue_from_api['location']['lng']
    venues.append(Venue(id = venue_id, name = venue_name, latitude = lat, longitude = long))

In [38]:
len(venues)

30

We can look at the attributes of the first four venues.

In [41]:
[venue.__dict__ for venue in venues[:4]]
# [{'id': '5b2932a0f5e9d70039787cf2',
#   'name': 'Los Tacos Al Pastor',
#   'longitude': -73.98753900608666,
#   'latitude': 40.70243624175102},
#  {'id': '542f62bc498ee31baa1395cb',
#   'name': "Rocco's Tacos and Tequila Bar Brooklyn",
#   'longitude': -73.98868115958473,
#   'latitude': 40.693277341475834},
#  {'id': '5196b9ff498e8a6be4336a03',
#   'name': 'Los Tacos No. 1',
#   'longitude': -74.00596115738153,
#   'latitude': 40.74224400629671},
#  {'id': '5d5f24ec09484500079aee00',
#   'name': 'Los Tacos No. 1',
#   'longitude': -74.008756,
#   'latitude': 40.714267}]

[{'id': '5b2932a0f5e9d70039787cf2',
  'name': 'Los Tacos Al Pastor',
  'longitude': -73.98753900608666,
  'latitude': 40.70243624175102},
 {'id': '542f62bc498ee31baa1395cb',
  'name': "Rocco's Tacos and Tequila Bar Brooklyn",
  'longitude': -73.98868115958473,
  'latitude': 40.693277341475834},
 {'id': '5196b9ff498e8a6be4336a03',
  'name': 'Los Tacos No. 1',
  'longitude': -74.00596115738153,
  'latitude': 40.74224400629671},
 {'id': '5d5f24ec09484500079aee00',
  'name': 'Los Tacos No. 1',
  'longitude': -74.008756,
  'latitude': 40.714267}]

Let's copy and paste our loop from above, but this time also add a venue's neighborhood, zip code, and first `category` (use the category's short name) as attributes.  We'll let you choose precisely how to accomplish this, but one way would be to change the `__init__` method to assign more attributes.

> Not all venues have a postal code, so looking it up with square brackets will raise a `KeyError` for those venues.  Use the dictionary's `.get` method instead, which returns `None` when the key is missing.
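
The difference between square-bracket lookup and `.get` can be seen on two small dictionaries. This is a minimal sketch: `location_without_zip` is a made-up example, not data returned by the API.

```python
location_with_zip = {'address': '141 Front St', 'postalCode': '11201'}
location_without_zip = {'address': 'Somewhere without a postal code'}  # made-up example

# .get returns the value when the key exists, and None when it does not.
zip_found = location_with_zip.get('postalCode')
zip_missing = location_without_zip.get('postalCode')
print(zip_found, zip_missing)

# Square-bracket lookup raises a KeyError for the missing key.
raised_key_error = False
try:
    location_without_zip['postalCode']
except KeyError:
    raised_key_error = True
print(raised_key_error)

# .get also accepts a fallback value to use instead of None.
zip_with_default = location_without_zip.get('postalCode', 'unknown')
print(zip_with_default)
```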

In [51]:
venues = []
for venue_from_api in venues_from_api:
    venue_id = venue_from_api['id']
    venue_name = venue_from_api['name']
    lat = venue_from_api['location']['lat']
    long = venue_from_api['location']['lng']
    venue = Venue(id = venue_id, name = venue_name, latitude = lat, longitude = long)
    category = venue_from_api['categories'][0]['shortName']
    zip_code = venue_from_api['location'].get('postalCode')
    venue.zip_code = zip_code
    venue.category = category
    venues.append(venue)

In [52]:
venues[:3]

[<__main__.Venue at 0x10bb450d0>,
 <__main__.Venue at 0x10bb45110>,
 <__main__.Venue at 0x10bb45150>]

### Working On Venue Builder

Now that we have the functionality working, it's time to refactor the code.  Let's begin by moving our code into a class called `VenuesBuilder`.

In [164]:
class VenuesBuilder:
    def __init__(self, venues_data):
        self.venues_data = venues_data 
        
    def extract_data(self, venue_data):
        venue_id = venue_data['id']
        name = venue_data['name']
        lat = venue_data['location']['lat']
        long = venue_data['location']['lng']
        zip_code = venue_data['location'].get('postalCode')
        category = venue_data['categories'][0]['shortName']
        # Each key matches a parameter name of Venue.__init__
        return {'name': name, 'category': category, 'id': venue_id,
                'latitude': lat,
                'longitude': long,
                'zip_code': zip_code}
        
    def run(self):
        venues = []
        for venue_data in self.venues_data:
            selected_venue_data = self.extract_data(venue_data)
            venue = Venue(**selected_venue_data)
            venues.append(venue)
        return venues
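
The `Venue(**selected_venue_data)` call in `run` relies on dictionary unpacking: each key of the dictionary is passed as a keyword argument, so the keys have to match the parameter names of `Venue.__init__`. The sketch below shows that unpacking gives the same result as writing the keyword arguments out by hand; the variable names `unpacked` and `explicit` are introduced here for illustration.

```python
class Venue:
    def __init__(self, id, name, longitude, latitude, category=None, zip_code=None):
        self.id = id
        self.name = name
        self.longitude = longitude
        self.latitude = latitude
        self.zip_code = zip_code
        self.category = category

selected_venue_data = {'name': 'Los Tacos Al Pastor',
                       'category': 'Tacos',
                       'id': '5b2932a0f5e9d70039787cf2',
                       'latitude': 40.70243624175102,
                       'longitude': -73.98753900608666,
                       'zip_code': '11201'}

# ** turns each key/value pair into a keyword argument.
unpacked = Venue(**selected_venue_data)

# The equivalent call with the keyword arguments written out.
explicit = Venue(id='5b2932a0f5e9d70039787cf2', name='Los Tacos Al Pastor',
                 longitude=-73.98753900608666, latitude=40.70243624175102,
                 category='Tacos', zip_code='11201')

print(unpacked.__dict__ == explicit.__dict__)
```

A key that does not correspond to a parameter of `__init__` would raise a `TypeError`, which is one reason to keep `extract_data` and `Venue` in step with each other.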

Begin by adding an `__init__` method that ensures the `VenuesBuilder` is initialized with a list of dictionaries from the API.  The data should be stored as `venues_data` on the instance.

In [133]:
venues_from_api = venue_search()

In [126]:
builder = VenuesBuilder(venues_from_api)

In [127]:
len(builder.venues_data)
# 30

30

In [128]:
# builder.venues_data[0]

# {'id': '5b2932a0f5e9d70039787cf2',
#  'name': 'Los Tacos Al Pastor',
#  'location': {'address': '141 Front St',
#   'lat': 40.70243624175102,
#   'lng': -73.98753900608666,
#   'labeledLatLngs': [{'label': 'disp

Next, copy the code that loops through the data and constructs venues into a method called `run`.  This time, it should loop through the data in `venues_data` and return a list of `Venue` instances.

In [153]:
# venues_from_api[:2]

In [139]:
builder = VenuesBuilder(venues_from_api)

In [140]:
venue_instances = builder.run()

In [141]:
[venue.__dict__ for venue in venue_instances[:3]]
# [{'id': '5b2932a0f5e9d70039787cf2',
#   'name': 'Los Tacos Al Pastor',
#   'longitude': -73.98753900608666,
#   'latitude': 40.70243624175102,
#   'zip_code': '11201',
#   'category': 'Tacos'},
#  {'id': '542f62bc498ee31baa1395cb',
#   'name': "Rocco's Tacos and Tequila Bar Brooklyn",
#   'longitude': -73.98868115958473,
#   'latitude': 40.693277341475834,
#   'zip_code': '11201',
#   'category': 'Mexican'},
#  {'id': '5196b9ff498e8a6be4336a03',
#   'name': 'Los Tacos No. 1',
#   'longitude': -74.00596115738153,
#   'latitude': 40.74224400629671,
#   'zip_code': '10011',
#   'category': 'Tacos'}]

[{'id': '4c1be969b306c928d1ff62b7',
  'name': 'Tri Mexican Tacos Food Truck',
  'longitude': -73.99009351837739,
  'latitude': 40.73936348846878,
  'zip_code': '10003',
  'category': 'Food Truck'},
 {'id': '4c1be969b306c928d1ff62b7',
  'name': 'Tri Mexican Tacos Food Truck',
  'longitude': -73.99009351837739,
  'latitude': 40.73936348846878,
  'zip_code': '10003',
  'category': 'Food Truck'},
 {'id': '4c1be969b306c928d1ff62b7',
  'name': 'Tri Mexican Tacos Food Truck',
  'longitude': -73.99009351837739,
  'latitude': 40.73936348846878,
  'zip_code': '10003',
  'category': 'Food Truck'}]

Now our `run` method is pretty long.  Move the contents of the `for` loop into two new functions.

The first function is called `extract_data`.  It should take a single venue from the API and return a dictionary of just the data that we want to use to build our instance:

* id, name, latitude, longitude, zip_code, and category


We can check out the function in isolation with the following.

In [160]:
# venues_from_api[:2]

In [165]:
builder = VenuesBuilder(venues_from_api)

In [166]:
first_venue_data = builder.venues_data[0]

In [167]:
second_venue_data = builder.venues_data[1]

In [169]:
builder.extract_data(first_venue_data)
# {'name': 'Los Tacos Al Pastor',
#  'category': 'Tacos',
#  'id': '5b2932a0f5e9d70039787cf2',
#  'latitude': 40.70243624175102,
#  'longitude': -73.98753900608666,
#  'zip_code': '11201'}

{'name': 'Los Tacos Al Pastor',
 'category': 'Tacos',
 'id': '5b2932a0f5e9d70039787cf2',
 'latitude': 40.70243624175102,
 'longitude': -73.98753900608666,
 'zip_code': '11201'}

In [171]:
builder.extract_data(second_venue_data)
# {'name': "Rocco's Tacos and Tequila Bar Brooklyn",
#  'category': 'Mexican',
#  'id': '542f62bc498ee31baa1395cb',
#  'latitude': 40.693277341475834,
#  'longitude': -73.98868115958473,
#  'zip_code': '11201'}

{'name': "Rocco's Tacos and Tequila Bar Brooklyn",
 'category': 'Mexican',
 'id': '542f62bc498ee31baa1395cb',
 'latitude': 40.693277341475834,
 'longitude': -73.98868115958473,
 'zip_code': '11201'}

The `run` method should now use `extract_data` to pull the relevant data from each venue, and then use that data to build a `Venue` instance.

> The output of our code should not have changed.  The difference is that now, our `run` function should be shorter.
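
One way to confirm that a refactor left the output unchanged is to build the venues both ways and compare their attributes. The sketch below does this on two trimmed-down venue dictionaries taken from the output shown earlier, so it runs without calling the API. `sample_venues_data` and the comparison variables are names introduced here for illustration.

```python
class Venue:
    def __init__(self, id, name, longitude, latitude, category=None, zip_code=None):
        self.id = id
        self.name = name
        self.longitude = longitude
        self.latitude = latitude
        self.zip_code = zip_code
        self.category = category

class VenuesBuilder:
    def __init__(self, venues_data):
        self.venues_data = venues_data

    def extract_data(self, venue_data):
        # Pick out only the fields needed to build a Venue.
        location = venue_data['location']
        return {'id': venue_data['id'], 'name': venue_data['name'],
                'latitude': location['lat'], 'longitude': location['lng'],
                'zip_code': location.get('postalCode'),
                'category': venue_data['categories'][0]['shortName']}

    def run(self):
        return [Venue(**self.extract_data(venue_data))
                for venue_data in self.venues_data]

# Trimmed-down dictionaries in the shape the API returned above.
sample_venues_data = [
    {'id': '5b2932a0f5e9d70039787cf2', 'name': 'Los Tacos Al Pastor',
     'location': {'lat': 40.70243624175102, 'lng': -73.98753900608666,
                  'postalCode': '11201'},
     'categories': [{'shortName': 'Tacos'}]},
    {'id': '542f62bc498ee31baa1395cb',
     'name': "Rocco's Tacos and Tequila Bar Brooklyn",
     'location': {'lat': 40.693277341475834, 'lng': -73.98868115958473,
                  'postalCode': '11201'},
     'categories': [{'shortName': 'Mexican'}]},
]

# The original, un-refactored loop.
venues_from_loop = []
for venue_data in sample_venues_data:
    venue = Venue(id=venue_data['id'], name=venue_data['name'],
                  latitude=venue_data['location']['lat'],
                  longitude=venue_data['location']['lng'])
    venue.zip_code = venue_data['location'].get('postalCode')
    venue.category = venue_data['categories'][0]['shortName']
    venues_from_loop.append(venue)

venues_from_builder = VenuesBuilder(sample_venues_data).run()

# Compare attribute dictionaries, since Venue instances themselves
# do not define equality.
loop_attributes = [vars(venue) for venue in venues_from_loop]
builder_attributes = [vars(venue) for venue in venues_from_builder]
print(loop_attributes == builder_attributes)
```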

In [172]:
builder = VenuesBuilder(venues_from_api)

In [173]:
venues = builder.run()

In [174]:
venues[:2]
# [<__main__.Venue at 0x10d663810>, <__main__.Venue at 0x10d663610>]

[<__main__.Venue at 0x10d65ebd0>, <__main__.Venue at 0x10d65ea10>]

In [175]:
[venue.__dict__ for venue in venues[:3]]

[{'id': '5b2932a0f5e9d70039787cf2',
  'name': 'Los Tacos Al Pastor',
  'longitude': -73.98753900608666,
  'latitude': 40.70243624175102,
  'zip_code': '11201',
  'category': 'Tacos'},
 {'id': '542f62bc498ee31baa1395cb',
  'name': "Rocco's Tacos and Tequila Bar Brooklyn",
  'longitude': -73.98868115958473,
  'latitude': 40.693277341475834,
  'zip_code': '11201',
  'category': 'Mexican'},
 {'id': '5196b9ff498e8a6be4336a03',
  'name': 'Los Tacos No. 1',
  'longitude': -74.00596115738153,
  'latitude': 40.74224400629671,
  'zip_code': '10011',
  'category': 'Tacos'}]

### Building a Client Class

The code that we provided up front works directly with the API, so it should live in a class called `Client`.

In [193]:
import requests 
class Client: 
    def venue_search(query_params = {'ll': "40.7,-74", "query": "tacos"}):
        client_id = "ALECV5CBBEHRRKTIQ5ZV143YEXOH3SBLAMU54SPHKGZI1ZKE"
        client_secret = "3JX3NRGRS2P0KE0NSKPTMCOZOY4MWUU4M3G33BO4XTRJ15SM"
        date = "20190407"
        auth_params = {'client_id': client_id, 
                   'client_secret': client_secret,
                   'v': date}
        params = auth_params.copy()
        params.update(query_params)
        url = "https://api.foursquare.com/v2/venues/search"
        response = requests.get(url, params)
        return response.json()['response']['venues']

Now there are a few things we need to do to clean this up into a class.  We did the first step for you by moving `import requests` outside of the class.

Next, we'll need to add an argument of `self` to each of our methods in the class (here there is only one).

Once that is done, initialize an instance of `Client` and call the `venue_search` method.

In [190]:
client = Client()

In [192]:
venues_data = client.venue_search()
# [{'id': '5b2932a0f5e9d70039787cf2',
#   'name': 'Los Tacos Al Pastor',
#   'location': {'address': '141 Front St',
#    'lat': 40.70243624175102,
#    'lng': -73.98753900608666,
#    'labeledLatLngs': [{'label': 'display',
#      'lat': 40.70243624175102,
#      'lng': -73.98753900608666}],

> If you see an error like `TypeError: 'Client' object is not iterable`, it's because `self` is missing from the `venue_search` method's arguments.
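
The reason for that error can be reproduced without calling the API. When a method is called on an instance, Python passes the instance as the first argument, so without `self` the instance lands in `query_params` and `params.update` tries to iterate over it. The sketch below uses two stripped-down, hypothetical classes (`ClientWithoutSelf` and `ClientWithSelf`) that only build parameters.

```python
class ClientWithoutSelf:
    def venue_search(query_params={'query': 'tacos'}):
        params = {'v': '20190407'}
        params.update(query_params)  # query_params is actually the instance here
        return params

class ClientWithSelf:
    def venue_search(self, query_params={'query': 'tacos'}):
        params = {'v': '20190407'}
        params.update(query_params)  # query_params is the default dictionary
        return params

# Without self, the instance is passed where the dictionary was expected.
error_message = None
try:
    ClientWithoutSelf().venue_search()
except TypeError as error:
    error_message = str(error)
    print(error_message)

# With self, the default query parameters are merged as intended.
fixed_params = ClientWithSelf().venue_search()
print(fixed_params)
```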

Now it's time to make this `venue_search` method smaller.  We'll let you figure out how best to refactor this method.  Begin by writing a comment that describes each step or component of the function, then turn each commented step into a new method.  The `venue_search` method should continue to return the same data.