# Requesting data

This section covers three steps:

- Acquire data from some *external* source
- Read that data into a Python data structure
- Print a sample of the data

## Tools

- [Requests](http://docs.python-requests.org/en/master/) is a small library for HTTP requests
- [urllib](https://docs.python.org/3/library/urllib.html) is a set of lower-level tools in the Python standard library for HTTP requests and more
- [A list of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)

Prefer **Requests** for most tasks: its API is simpler than urllib's. Reach for urllib when you cannot install third-party packages.
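
For comparison, here is a minimal sketch of a GET request made with urllib alone. The helper name `fetch_text` and the 10-second default timeout are assumptions made for illustration; they do not come from either library.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Return the body of `url` as text, using only the standard library."""
    try:
        with urlopen(url, timeout=timeout) as response:
            # fall back to UTF-8 when the server does not declare a charset
            charset = response.headers.get_content_charset() or 'utf-8'
            return response.read().decode(charset)
    except HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses;
        # error.code holds the HTTP status code
        message = '{} returned HTTP {}'.format(url, error.code)
        raise RuntimeError(message) from error
```

Calling `fetch_text(csv_source)` would stand in for `requests.get(csv_source).text`. The manual charset handling and error translation are the kind of detail Requests takes care of for you.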

## Examples

We'll walk through three simple examples:

- Request a CSV file and read a sample of its data
- Request an API endpoint and read a sample of its data
- Scrape a web page and read a sample of its data

In [1]:
# pretty-print nested structures in the notebook output
# note: this shadows the built-in print, so plain strings are shown with quotes
from pprint import pprint as print

### A CSV example

This example fetches the list of hospitals in the UK according to NHS Choices.

In [2]:
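import csv

import requests


csv_source = 'http://data.gov.uk/data/resource/nhschoices/Hospital.csv'

# the fields are separated by tabs, not commas
csv_delimiter = '\t'

response = requests.get(csv_source)

# fail early on a 4xx or 5xx status instead of parsing an error page
response.raise_for_status()

# DictReader accepts any iterable of lines
raw = response.text.splitlines()

reader = csv.DictReader(raw, delimiter=csv_delimiter)

# each row becomes a dict keyed by the column names in the header row
data = list(reader)

print(data[:2])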

[{'Address1': 'Swinemoor Lane',
  'Address2': '',
  'Address3': '',
  'City': 'Beverley',
  'County': 'East Yorkshire',
  'Email': 'newhospital@nhs.net',
  'Fax': '',
  'IsPimsManaged': 'True',
  'Latitude': '53.853134155273437',
  'Longitude': '-0.41147232055664063',
  'OrganisationCode': 'RV9HE',
  'OrganisationID': '1421',
  'OrganisationName': 'East Riding Community Hospital',
  'OrganisationStatus': 'Visible',
  'OrganisationType': 'Hospital',
  'ParentName': 'Humber NHS Foundation Trust',
  'ParentODSCode': 'RV9',
  'Phone': '01482 886600',
  'Postcode': 'HU17 0FA',
  'Sector': 'NHS Sector',
  'SubType': 'Mental Health Hospital',
  'Website': 'http://www.humber.nhs.uk'},
 {'Address1': 'Zachary Merton Community Hospital',
  'Address2': 'Glenville Road',
  'Address3': 'Rustington',
  'City': 'Littlehampton',
  'County': 'Sussex',
  'Email': '',
  'Fax': '',
  'IsPimsManaged': 'True',
  'Latitude': '50.807880401611328',
  'Longitude': '-0.50063163042068481',
  'OrganisationCode': '5
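
As the sample output shows, `csv.DictReader` returns every value as a string, including `Latitude` and `Longitude`. The sketch below shows one way to convert selected columns to numbers after parsing. The function name `parse_rows` and its `numeric_fields` parameter are hypothetical, and the inline sample reuses three columns from the first row above so the example runs without a network request.

```python
import csv


def parse_rows(text, delimiter='\t', numeric_fields=('Latitude', 'Longitude')):
    """Parse delimited text into dicts, converting the named fields to float."""
    reader = csv.DictReader(text.splitlines(), delimiter=delimiter)
    rows = []
    for row in reader:
        row = dict(row)
        for field in numeric_fields:
            value = row.get(field)
            # leave empty or missing values untouched
            if value:
                row[field] = float(value)
        rows.append(row)
    return rows


# a cut-down sample in the same tab-delimited layout, for illustration only
header = ['OrganisationName', 'Latitude', 'Longitude']
record = ['East Riding Community Hospital', '53.853134155273437', '-0.41147232055664063']
sample = '\n'.join('\t'.join(fields) for fields in (header, record))

rows = parse_rows(sample)
```

With the real response, `parse_rows(response.text)` would take the place of the `DictReader` step in the cell above.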

### A JSON example

What are today's exchange rates against the euro?

In [3]:
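import requests


json_source = 'https://api.fixer.io/latest'

response = requests.get(json_source)

# fail early on a 4xx or 5xx status instead of parsing an error body
response.raise_for_status()

# decode the JSON body into Python dicts, lists, strings and numbers
data = response.json()

print(data)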

{'base': 'EUR',
 'date': '2016-09-16',
 'rates': {'AUD': 1.4949,
           'BGN': 1.9558,
           'BRL': 3.6993,
           'CAD': 1.4817,
           'CHF': 1.0941,
           'CNY': 7.4915,
           'CZK': 27.024,
           'DKK': 7.4471,
           'GBP': 0.85203,
           'HKD': 8.7099,
           'HRK': 7.5115,
           'HUF': 309.14,
           'IDR': 14770.61,
           'ILS': 4.2245,
           'INR': 75.237,
           'JPY': 114.35,
           'KRW': 1263.64,
           'MXN': 21.7665,
           'MYR': 4.6441,
           'NOK': 9.2625,
           'NZD': 1.5367,
           'PHP': 53.741,
           'PLN': 4.3167,
           'RON': 4.4501,
           'RUB': 72.8966,
           'SEK': 9.557,
           'SGD': 1.5318,
           'THB': 39.19,
           'TRY': 3.3384,
           'USD': 1.1226,
           'ZAR': 15.868}}
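
Once decoded, the response is an ordinary nested dict. The sketch below uses a shortened copy of the output above to show how the `rates` mapping might be used. The `convert` helper is hypothetical, and `json.loads` stands in for `response.json()`, which does roughly the same decoding on the response body.

```python
import json

# a shortened copy of the response body shown above
body = '{"base": "EUR", "date": "2016-09-16", "rates": {"GBP": 0.85203, "USD": 1.1226}}'

payload = json.loads(body)


def convert(amount, currency, payload):
    """Convert `amount` of the base currency into `currency`."""
    try:
        rate = payload['rates'][currency]
    except KeyError:
        message = 'no rate for {!r} against {}'.format(currency, payload['base'])
        raise ValueError(message) from None
    return amount * rate


# 100 euros expressed in US dollars at the 2016-09-16 rate
usd = convert(100, 'USD', payload)
```

Raising `ValueError` for an unknown currency code is a design choice made for this sketch; returning `None` would work as well if callers check for it.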


### A scraping example

When is the next bank holiday in the UK?

In [4]:
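import requests
from pyquery import PyQuery


html_source = 'https://www.gov.uk/bank-holidays'

response = requests.get(html_source)

# fail early on a 4xx or 5xx status instead of scraping an error page
response.raise_for_status()

document = PyQuery(response.text)

# jQuery-style selector: first cell of the first body row in the first calendar
answer = document('.calendar:first tbody tr:first td:first').text()

print('{} is the next bank holiday in the UK.'.format(answer))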

'26 December is the next bank holiday in the UK.'
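
To make the selector logic concrete without a network request or PyQuery, the sketch below does a simplified version of the same extraction with the standard library's `html.parser`. This is a swapped-in technique, not how PyQuery works internally. The class and function names are hypothetical, the sample markup is made up for illustration and is not the real gov.uk page, and the parser ignores the `.calendar` class and simply takes the first cell of the first `<tbody>`.

```python
from html.parser import HTMLParser


class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> found inside a <tbody>."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.in_cell = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == 'tbody':
            self.in_tbody = True
        elif tag == 'td' and self.in_tbody:
            self.in_cell = True

    def handle_endtag(self, tag):
        # stop collecting once the first cell closes
        if tag == 'td' and self.in_cell:
            self.in_cell = False
            self.done = True

    def handle_data(self, data):
        if self.in_cell:
            self.parts.append(data)


def first_cell(html):
    """Return the stripped text of the first table body cell in `html`."""
    parser = FirstCellParser()
    parser.feed(html)
    return ''.join(parser.parts).strip()


# made-up markup for illustration only
sample_html = '''
<div class="calendar">
  <table>
    <thead><tr><th>Date</th><th>Bank holiday</th></tr></thead>
    <tbody>
      <tr><td>26 December</td><td>Boxing Day</td></tr>
    </tbody>
  </table>
</div>
'''

answer = first_cell(sample_html)
```

The PyQuery one-liner is clearly shorter, which is the main reason to prefer a selector library for scraping.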


## Other tools

- [Read Excel files](http://www.python-excel.org)
- [Read ODS files](https://github.com/pyexcel/pyexcel-ods3)
- [Scrapy, a powerful scraping framework](https://scrapy.org)
- [PyQuery API](https://pythonhosted.org/pyquery/api.html), and [jQuery selectors](https://api.jquery.com/category/selectors/), on which it is modeled
- [Read and write multiple tabular data file formats via a unified API](https://github.com/frictionlessdata/tabulator-py)