# Web-Based Data Wrangling in Python – Direct Download
## Introduction
The ugly truth about collecting web data is that every web data source can be, and usually is, unique. By that I mean that while the
returned data will almost always be in a standard file format (CSV, JSON, XML), the way you navigate and interact with the API endpoints,
and the schema of the returned data (and thus how you interpret it), will usually be unique to each service. Long story short,
your success will depend on your ability to read, interpret, and act on a service's technical documentation for its
APIs and the data they return.
The simplest way an organization can make data available for download is to provide the URL of a data file.
As you might expect given its simplicity, this is pretty common. We will review this technique in
the Direct Download section below.
One notch above direct download are the APIs that require no authentication and (usually) no query parameters.
We will review this topic in the Simple API section below.
## Direct Download

As stated above, the simplest way an organization can make data available for download is to provide the
URL of a data file. Such files are often compressed into bundles known as ‘tarballs’; more on that next time.
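As a minimal sketch of a direct download, the function below copies whatever file sits at a URL into a local file. It uses the standard library's `urllib.request` rather than `requests` so it has no third-party dependency; the name `download_file`, the 64 KiB chunk size, and the default timeout are illustrative assumptions, not part of any particular service's interface.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: Path, timeout: float = 10.0) -> Path:
    """Save the resource at `url` to `dest` and return the destination path.

    Hypothetical helper for illustration: it streams the response body to disk
    in fixed-size chunks, so a large file is never held in memory all at once.
    """
    dest = Path(dest)
    # Create the target directory if it does not exist yet
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, 'wb') as out:
        # copyfileobj reads from the response and writes to disk 64 KiB at a time
        shutil.copyfileobj(response, out, length=64 * 1024)
    return dest
```

For example, calling `download_file` with the users.json URL from the Question section and `Path('users.json')` would save the raw JSON file locally. Because `urlopen` also understands `file://` URLs, the function can be tried against a local file without any network access.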


## Question
In this exercise you will access a single JSON file containing user data.
The following URL returns UTF-8-encoded JSON data.
```
https://www.drivehq.com/file/DFPublishFile.aspx/FileID7591152445/Key9wj0wwhrnulk/users.json
```
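The UTF-8 note matters because an HTTP body arrives as raw bytes, not text. A minimal sketch of the two equivalent routes from bytes to Python objects follows; the byte string `raw` is a made-up stand-in for the response body (with `requests`, that would be `response.content`).

```python
import json

# Hypothetical payload standing in for the bytes a server sends back.
# The non-ASCII name shows why the encoding matters.
raw = '[{"username": "Zoë", "address": {"city": "Gwenborough"}}]'.encode('utf-8')

# Route 1: decode bytes -> str explicitly, then deserialize the JSON text.
text = raw.decode('utf-8')
users_from_text = json.loads(text)

# Route 2: json.loads also accepts bytes (Python 3.6+) and detects
# UTF-8, UTF-16 or UTF-32 by itself.
users_from_bytes = json.loads(raw)

print(users_from_text == users_from_bytes)  # prints: True
print(type(raw).__name__, type(text).__name__, type(users_from_text).__name__)
# prints: bytes str list

# Decoding with the wrong codec does not fail here; it silently garbles the text.
wrong = json.loads(raw.decode('latin-1'))
print(wrong[0]['username'])  # prints: ZoÃ«
```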

### Tasks
1. Print the address of the user with username *Maxime_Nienow*.
Answering this question will require that you examine the JSON data structure and act accordingly.
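Since the task hinges on the shape of the data, a small recursive helper can outline that shape before you write any lookup code. `describe` is a hypothetical helper written for this exercise; it inspects only the first element of each list, which assumes the list items all share one shape (true of the sample records in the notebook output, but not guaranteed for every service).

```python
from __future__ import annotations

from typing import Any


def describe(obj: Any, indent: int = 0, max_depth: int = 4) -> list[str]:
    """Return lines outlining the nesting of dicts/lists produced by json.loads()."""
    pad = '  ' * indent
    if indent >= max_depth:
        return [pad + '...']
    if isinstance(obj, dict):
        lines = []
        for key, value in obj.items():
            lines.append(f'{pad}{key}: {type(value).__name__}')
            # Recurse only into containers; scalars are fully described by their type
            if isinstance(value, (dict, list)):
                lines.extend(describe(value, indent + 1, max_depth))
        return lines
    if isinstance(obj, list):
        lines = [f'{pad}list of {len(obj)} item(s)']
        if obj:
            # Assumption: every element has the same shape as the first one
            first = obj[0]
            lines.append(f'{pad}[0]: {type(first).__name__}')
            if isinstance(first, (dict, list)):
                lines.extend(describe(first, indent + 1, max_depth))
        return lines
    return [f'{pad}{type(obj).__name__}']


# Trimmed from the first record in the notebook output
sample = [{'address': {'city': 'Gwenborough',
                       'geo': {'lat': '-37.3159', 'lng': '81.1496'}},
           'id': 1,
           'username': 'Bret'}]
print('\n'.join(describe(sample)))
```

Run on `sample`, this prints the outline below, which makes it plain that `username` and `address` sit side by side inside each list element:

```
list of 1 item(s)
[0]: dict
  address: dict
    city: str
    geo: dict
      lat: str
      lng: str
  id: int
  username: str
```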

### Pseudocode
* GET the JSON data.
* Decode the returned bytes into text (assume UTF-8), then deserialize the JSON text into Python objects.
* Process/parse the data to find the target user.
* Print the address information (street, suite, city, and zipcode) for the user with username Maxime_Nienow.
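The last two pseudocode steps can be sketched as two small functions. `find_address`, `format_address`, and `ADDRESS_FIELDS` are hypothetical names; the sketch swaps the flag variable and `break` used in the notebook cell for `next()` with a default, and uses `.get()` so a record missing a key is skipped instead of raising `KeyError`. The `users` list holds only the first record from the notebook output, so it is used here to show one hit (`Bret`) and one miss.

```python
from typing import Optional

# Order taken from the pseudocode: street, suite, city, zipcode
ADDRESS_FIELDS = ('street', 'suite', 'city', 'zipcode')


def find_address(users: list, username: str) -> Optional[dict]:
    """Return the 'address' dict of the first user whose username matches, else None.

    A matching record with no 'address' key also yields None.
    """
    return next(
        (user.get('address') for user in users if user.get('username') == username),
        None,
    )


def format_address(address: dict) -> str:
    """Join the street, suite, city and zipcode values into one line."""
    return ', '.join(address[field] for field in ADDRESS_FIELDS)


# Trimmed sample: the first record from the notebook output (other keys omitted)
users = [
    {'username': 'Bret',
     'address': {'street': 'Kulas Light', 'suite': 'Apt. 556',
                 'city': 'Gwenborough', 'zipcode': '92998-3874',
                 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}},
]

for target in ('Bret', 'Maxime_Nienow'):
    address = find_address(users, target)
    if address is None:
        print(f'Target username: {target} was not found.')
    else:
        print(f'{target}: {format_address(address)}')
# prints:
# Bret: Kulas Light, Apt. 556, Gwenborough, 92998-3874
# Target username: Maxime_Nienow was not found.
```

Against the full downloaded list, `find_address(json_as_py, 'Maxime_Nienow')` would return that user's address dict if the record is present.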

```python
import json
from pprint import pprint

import requests
from requests.exceptions import HTTPError

# TEST DATA
# Here is the direct-access URL we are interested in
URL = 'https://www.drivehq.com/file/DFPublishFile.aspx/FileID7591152445/Key9wj0wwhrnulk/users.json'

try:
    response = requests.get(URL, timeout=5)
    # If the response was successful, no exception will be raised
    response.raise_for_status()

except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')
    raise  # without a good response there is nothing to parse, so stop here

except Exception as err:
    print(f'Other error occurred: {err}')
    raise  # e.g. a timeout or connection error; `response` may not even exist

# Like the for statement's else: clause, the try statement's else: block
# runs only if the try block terminates normally, i.e., no exception was raised.
else:
    print('Success!')

# The .text attribute returns the response body decoded into a str.
# That str holds serialized JSON NOT because .text always returns JSON, but
# because JSON is the type of payload the server returned for our HTTP GET
# request. This is an important distinction.
json_str = response.text
json_as_py = json.loads(json_str)

# The .json() method decodes the payload (like json.loads()) on the assumption
# that the payload really is data that was serialized as JSON (as by json.dumps()).
# NOTE: if the payload is not valid JSON, this method raises a JSONDecodeError.
json_as_py = response.json()

# What does it look like?
pprint(json_as_py)

# Figuring out how to parse the Python objects converted from JSON requires
# understanding the structure of the data. You must examine the data to figure this out.
# Below is a chunk of the Python version of the JSON data:
# 1. The top-level structure is a Python list (note the '[').
# 2. The list elements are Python dictionaries.
# 3. Several keys in each dict have values that are themselves dictionaries,
#    e.g., the value for key 'address' is a dict with keys city, geo, street, suite, zipcode.
# 4. 'username' is a key (at the same level as the 'address' key) whose value
#    is the user's username.
'''
[{'address': {'city': 'Gwenborough',
              'geo': {'lat': '-37.3159', 'lng': '81.1496'},
              'street': 'Kulas Light',
              'suite': 'Apt. 556',
              'zipcode': '92998-3874'},
  'company': {'bs': 'harness real-time e-markets',
              'catchPhrase': 'Multi-layered client-server neural-net',
              'name': 'Romaguera-Crona'},
  'email': 'Sincere@april.biz',
  'id': 1,
  'name': 'Leanne Graham',
  'phone': '1-770-736-8031 x56442',
  'username': 'Bret',
  'website': 'hildegard.org'},
'''

# PSEUDOCODE
# Looking for the address information for username Maxime_Nienow.
# * For each dict in the list, check whether the value for the 'username' key is Maxime_Nienow.
# * If it is, use the 'address' key's value to obtain a reference to the dict
#   that has street, suite, city, and zipcode as its keys.
# * Use those keys to get the address component values we are after.

target_user = 'Maxime_Nienow'
# Flag-like variable that will hold a reference to the found address dict
addr_dict = None

# Iterate over the list of dictionaries
for d in json_as_py:
    # Remember, the iteration variable d is a dictionary
    if d['username'] == target_user:
        # Found the target username:
        # save a reference to the address dict for the target user
        addr_dict = d['address']

        # Stop looking
        break

if addr_dict is not None:
    print(f'Found target username: {target_user}')
    for e in ('suite', 'street', 'city', 'zipcode'):
        print(f'Address element "{e}" has value {addr_dict[e]}.')
else:
    # The loop completed and addr_dict is still None: target not found
    print(f'Target username: {target_user} was not found.')
```

Success!
[{'address': {'city': 'Gwenborough',
              'geo': {'lat': '-37.3159', 'lng': '81.1496'},
              'street': 'Kulas Light',
              'suite': 'Apt. 556',
              'zipcode': '92998-3874'},
  'company': {'bs': 'harness real-time e-markets',
              'catchPhrase': 'Multi-layered client-server neural-net',
              'name': 'Romaguera-Crona'},
  'email': 'Sincere@april.biz',
  'id': 1,
  'name': 'Leanne Graham',
  'phone': '1-770-736-8031 x56442',
  'username': 'Bret',
  'website': 'hildegard.org'},
 {'address': {'city': 'Wisokyburgh',
              'geo': {'lat': '-43.9509', 'lng': '-34.4618'},
              'street': 'Victor Plains',
              'suite': 'Suite 879',
              'zipcode': '90566-7771'},
  'company': {'bs': 'synergize scalable supply-chains',
              'catchPhrase': 'Proactive didactic contingency',
              'name': 'Deckow-Crist'},
  'email': 'Shanna@melissa.tv',
  'id': 2,
  'name': 'Ervin Howell',
  'phone': '010