# The NHSX Analytics Unit introduction to Python Session 3

---

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nhs-pycom/coding-club/blob/main/introduction-to-apis/introduction-to-apis.ipynb)

This Real Python article is very useful and recommended: https://realpython.com/python-api/

Application Programming Interfaces (APIs) allow different systems to exchange (send or receive) data. This could be used to automatically update a record or database, or to send a data extract.

Currently, APIs are mostly used in software and app development to pass data and messages smoothly. They also allow granular access to many large databases for analytical exploration. However, in my experience these are often databases that are already well managed and accessible, so API skills learned in training often don't get much practical use afterwards. This shouldn't deter us: APIs are increasingly becoming a standard way of accessing data across government and the NHS. In particular, NHSX should be leading the call for more accessible data through APIs, so understanding how to build and use them is important.

## Example well-known APIs

APIs are all around us, but are often hidden away doing the legwork to make data flow smoothly, e.g. weather apps on your phone, PayPal, and logging in through Google all use APIs.

![Commonly used APIs](https://github.com/nhs-pycom/coding-club-apis/blob/main/images/commonUses.png?raw=1)

Take a look at:

- https://any-api.com/
- https://github.com/public-apis/public-apis (note: US focussed)

**Top 50 most used APIs** *(according to rapidapi.com in April 2021; the first 25 are listed below)*

Many of these enable websites and apps to update quickly to the latest information based on search criteria or location (except #13).

1. Skyscanner Flight Search

2. Open Weather Map

3. API-FOOTBALL

4. The Cocktail DB

5. REST Countries v1

6. Yahoo Finance

7. Love Calculator

8. URL Shortener Service

9. NasaAPI

10. Numbers

11. musiXmatch

12. SYSTRAN.io – Translation and NLP

13. Chuck Norris

14. Hearthstone

15. Currency Exchange

16. Breaking News

17. Booking

18. Free NBA

19. Deezer

20. Email Validator

21. Urban Dictionary

22. Pokemon Go

23. Recipe – Food – Nutrition

24. Investors Exchange (IEX) Trading

25. Movie Database (IMDB Alternative)


**Healthcare APIs**

The lists I found whilst searching GitHub and Google are mainly US-based. NHS Digital has a range of APIs available, listed here: https://digital.nhs.uk/developer/api-catalogue. Many of these deal with passing records securely between services, and with the security around this.


## Types of API 

There are four commonly used API types:

- Open/External/Public: can be either completely open or require an API key
- Internal: hidden from external users
- Partner: similar to open APIs, but use a third-party API gateway to manage access
- Composite: access to several endpoints at once (useful for development)

There are four standard sets of rules (protocols or architectural styles) commonly used:

- REST
- RPC
- SOAP
- GraphQL (created by Facebook)

See here for more info: https://apifriends.com/api-creation/different-types-apis/ & https://www.altexsoft.com/blog/soap-vs-rest-vs-graphql-vs-rpc/

Today we will focus on REST APIs.

### REST API - Terminology

![Terminology](https://github.com/nhs-pycom/coding-club-apis/blob/main/images/terminology.png?raw=1)

The API itself defines the accessible endpoints and the valid request and response formats.

### REST API Commands

- POST - Create
- GET - Read
- PUT - Update
- DELETE - Delete

*Note: there are others, such as PATCH and HEAD, not covered here.*
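In the `requests` library each of these verbs has a matching function (`requests.post`, `requests.get`, `requests.put`, `requests.delete`). As a minimal sketch, the snippet below only *builds* one request per verb with `requests.Request(...).prepare()`, so nothing is sent; the `example.com` URL and the record id `123` are hypothetical.

```python
import requests

# Hypothetical endpoint for illustration only; nothing is sent over the network
BASE_URL = "https://example.com/api/records"

# (HTTP verb, CRUD action, target URL) for each command listed above
EXAMPLES = [
    ("POST", "Create", BASE_URL),            # add a new record to the collection
    ("GET", "Read", BASE_URL + "/123"),      # read one record
    ("PUT", "Update", BASE_URL + "/123"),    # replace one record
    ("DELETE", "Delete", BASE_URL + "/123"), # remove one record
]

prepared_requests = []
for verb, action, url in EXAMPLES:
    # prepare() produces the exact PreparedRequest that requests would send
    prepared = requests.Request(verb, url).prepare()
    prepared_requests.append(prepared)
    print(f"{action:<6} -> {prepared.method} {prepared.url}")
```

Swapping `.prepare()` for a real call such as `requests.put(url, json=...)` would actually send the request.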

## Benefits of APIs

- Security for underlying database
- Consistency of output
- Separating the frontend from the backend allows for interoperability
- Development without disruption or releases

https://www.england.nhs.uk/publication/open-api-architecture-policy/

### Side note on http vs https
Whilst most endpoints use https, some still use http. Note that https is the encrypted version of http. Never send any sensitive or work data over an http connection.
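One simple safeguard is to check a URL's scheme before sending anything sensitive to it. The helper below, `require_https`, is a hypothetical name for this sketch (not part of any library) and uses only the standard library.

```python
from urllib.parse import urlparse


def require_https(url):
    """Return the URL unchanged if it uses https, otherwise raise ValueError."""
    scheme = urlparse(url).scheme.lower()
    if scheme != "https":
        raise ValueError(f"Refusing to use unencrypted scheme {scheme!r}: {url}")
    return url


print(require_https("https://api.stackexchange.com/2.2/questions"))
# require_https("http://example.com/data") would raise ValueError
```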

## JSON format

The response most commonly comes in JavaScript Object Notation (JSON). This is a hierarchical set of key-value pairs, similar to a Python dictionary.



In [None]:
# Example JSON layout

# {
#     "firstName": "Duke",
#     "lastName": "Java",
#     "age": 18,
#     "streetAddress": "100 Internet Dr",
#     "city": "JavaTown",
#     "state": "JA",
#     "postalCode": "12345",
#     "phoneNumbers": [
#         {"Mobile": "111-111-1111"},
#         {"Home": "222-222-2222"}
#     ]
# }

- A set of key-value pairs is called an object.

- Within an object, the value for a key can be an array, which can itself hold further objects (sub key-value pairs).

    - {} enclose objects
    - , separates pairs within an object
    - : separates keys and values
    - [] enclose arrays

- Objects can contain arrays which in turn can contain further objects or arrays and so on.  This means that we can end up with fairly complex tree structures. 
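`json.loads` converts a JSON string into nested Python dicts (for objects) and lists (for arrays), so the tree can be navigated with ordinary indexing. A minimal sketch using the example record above, written as a valid JSON string:

```python
import json

# The example record from above as a valid JSON string
raw = """
{
    "firstName": "Duke",
    "lastName": "Java",
    "age": 18,
    "streetAddress": "100 Internet Dr",
    "city": "JavaTown",
    "state": "JA",
    "postalCode": "12345",
    "phoneNumbers": [
        {"Mobile": "111-111-1111"},
        {"Home": "222-222-2222"}
    ]
}
"""

person = json.loads(raw)  # JSON object -> dict, JSON array -> list
print(person["city"])  # JavaTown
print(person["phoneNumbers"][1]["Home"])  # 222-222-2222

# Walk the array of objects, printing each key-value pair
for phone in person["phoneNumbers"]:
    for kind, number in phone.items():
        print(kind, number)
```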


# Practical


Steps: 

- Choose the API to work with
- Read the API documentation (this takes the most time)
- Start with small code, and complement it with more features.

Python has a few libraries for interacting with APIs, such as requests, pycurl and urllib. I find requests the easiest to start with. Postman (a standalone app rather than a Python library) is also good for testing established APIs and worth reading up on.

Using the Python requests package, the code required is minimal (especially compared to other languages such as Java). We will also need to import json and pprint to view the responses in a readable format.

In [None]:
import requests
import json
import pprint

## Task 1: Find the ISS and who is currently in it
*from https://medium.com/quick-code/absolute-beginners-guide-to-slaying-apis-using-python-7b380dc82236*

In [None]:
request = requests.get('http://api.open-notify.org/iss-now.json')
print(request.status_code)

If a request returns status code 200, everything is OK; if it returns 404, the page or resource was not found.

**Status code**

| Code | Meaning | Description |
|---|---|---|
| 200 | OK | Your request was successful! |
| 201 | Created | Your request was accepted and the resource was created. |
| 400 | Bad Request | Your request is either wrong or missing some information. |
| 401 | Unauthorized | Your request requires some additional permissions. |
| 404 | Not Found | The requested resource does not exist. |
| 405 | Method Not Allowed | The endpoint does not allow for that specific HTTP method. |
| 500 | Internal Server Error | Your request wasn’t expected and probably broke something on the server side. |
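Rather than memorising the codes above, Python's standard library can look up the standard phrase for a code via `http.HTTPStatus`. A minimal sketch with a hypothetical helper, `describe_status`:

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description of an HTTP status code, e.g. '404 Not Found'."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    return f"{status.value} {status.phrase}"


for code in (200, 201, 400, 401, 404, 405, 500):
    print(describe_status(code))
```

With `requests` itself, `response.ok` is `False` for 4xx and 5xx codes, and `response.raise_for_status()` raises `requests.HTTPError` for them, which is handy for stopping a script early.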

To see the content which has been returned:

In [None]:
print(request.text)

In [None]:
print(request.json())

To get only the latitude and longitude, we can select the "iss_position" key:

In [None]:
print(request.json()['iss_position'])
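
The coordinates are worth converting to numbers before doing any maths or mapping with them. This is a minimal sketch, assuming (as in typical open-notify responses) that `iss_position` holds `latitude` and `longitude` as strings; `parse_iss_position` is a hypothetical helper and the sample values are made up.

```python
def parse_iss_position(payload):
    """Convert the 'iss_position' block of an iss-now.json response to floats."""
    position = payload["iss_position"]
    # float() also accepts values that are already numbers, so this works either way
    return float(position["latitude"]), float(position["longitude"])


# Made-up sample in the assumed response shape (not real data)
sample = {
    "message": "success",
    "timestamp": 1600000000,
    "iss_position": {"latitude": "12.3456", "longitude": "-45.6789"},
}
lat, lon = parse_iss_position(sample)
print(f"Latitude {lat:.2f}, longitude {lon:.2f}")
```

In the notebook the same helper can be called as `parse_iss_position(request.json())`.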

If we wanted, we could now combine this with a geocoding API to give a map view. I haven't done this here as it requires an API key, but this is publicly available if you want a go as a learning exercise.

For the moment take a look at the documentation here: http://open-notify.org/Open-Notify-API/People-In-Space/ and spend **5-10 mins** trying to work out who is on the ISS right now.

In [None]:
#CODE IN HERE
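
If you get stuck, here is one possible approach. It assumes, following the open-notify documentation linked above, that `http://api.open-notify.org/astros.json` returns a `people` list whose entries have `name` and `craft` keys. The function names are made up for this sketch and the sample names are placeholders.

```python
import requests

ASTROS_URL = "http://api.open-notify.org/astros.json"


def people_on_craft(payload, craft="ISS"):
    """Return the names of everyone listed on the given spacecraft."""
    return [person["name"] for person in payload["people"] if person["craft"] == craft]


def fetch_astronauts():
    """Fetch the live list of people in space (needs an internet connection)."""
    response = requests.get(ASTROS_URL, timeout=10)
    response.raise_for_status()
    return response.json()


# Offline check with placeholder data in the assumed response shape
sample = {
    "message": "success",
    "number": 3,
    "people": [
        {"name": "Astronaut A", "craft": "ISS"},
        {"name": "Astronaut B", "craft": "Tiangong"},
        {"name": "Astronaut C", "craft": "ISS"},
    ],
}
print(people_on_craft(sample))
```

For the live answer, call `people_on_craft(fetch_astronauts())`.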




NASA has some great APIs, for instance ones that return the Astronomy Picture of the Day or public Mars rover images. Again, these need a free sign-up to get a key before use: https://api.nasa.gov/

## Task 2: Search Stackoverflow Questions

Let's try a slightly more complicated request now. This time we will use the API provided by Stack Exchange to find relevant Stack Overflow questions.

The API documentation can be found here: https://api.stackexchange.com/docs

This time, when making the request, we want the response sorted to our preference and perhaps filtered by specific search criteria. This can be done through the URL, which also reduces the amount of data being requested.

The format for this is the same as any URL query string you may have seen (for instance in a Google search or when clicking through a clickbait article): a `?` after the address, followed by `key=value` pairs separated by `&`.

In [None]:
url = 'https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow'
response = requests.get(url)
print(response.status_code)

It can be useful at times to see the headers of an API request or response. The headers carry metadata about the exchange. Here we can see, for example, that the response content is JSON, some details around content length and encoding, and lots of other bits and bobs.

In [None]:
print(response.headers)
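
`response.headers` behaves like a dictionary whose keys ignore case (it is a `requests.structures.CaseInsensitiveDict`), so individual headers can be looked up directly. A small offline illustration, with a made-up header value standing in for a real response:

```python
from requests.structures import CaseInsensitiveDict

# Stand-in for response.headers, with a made-up value
headers = CaseInsensitiveDict({"Content-Type": "application/json; charset=utf-8"})

# Header names are case-insensitive, so both lookups find the same value
print(headers["Content-Type"])
print(headers["content-type"])

# .get() with a default avoids a KeyError for headers that are not present
print(headers.get("X-Not-Present", "missing"))
```

On a real response the same pattern works, e.g. `response.headers.get("Content-Type")`.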

A better way to set up the request is to separate the parameters from the URL, so they can easily be changed by a user (e.g. through a GUI).

In [None]:
url2 = 'https://api.stackexchange.com/2.2/questions'

parameters = {
    'order':'desc',
    'sort':'activity',
    'site':'stackoverflow',
}

response = requests.get(url2, params=parameters)
print(response.status_code)

This should give the same result, which you can check if you want!
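One way to check, sketched here without sending anything, is to let `requests` build the request from the parameters dictionary and compare its URL with the hand-written one. `manual_url` and `base_url` are names made up for this example.

```python
import requests

manual_url = "https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow"
base_url = "https://api.stackexchange.com/2.2/questions"
parameters = {"order": "desc", "sort": "activity", "site": "stackoverflow"}

# Build (but do not send) the request that requests.get(base_url, params=parameters) would make
prepared = requests.Request("GET", base_url, params=parameters).prepare()
print(prepared.url)
print(prepared.url == manual_url)  # True: the dict is encoded into the same query string
```

With a live response, `response.url` holds the URL that was actually requested, so it can be compared in the same way.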

We now want to print out the response and find questions of interest.

In [None]:
print(response.json()['items'])

The list of items has been printed, but it is hard to read. A nicer way of printing it is to use **pprint**:

In [None]:
pprint.pprint(response.json())

You may have expected more questions to be returned than you'll see here. The limited number is due to paging. For Stack Exchange, `page` starts at and defaults to 1, and `pagesize` can be any value between 0 and 100, defaulting to 30. There is a section in the Stack Exchange documentation on paging and how to return the total number of results, but as the reason for paging is to avoid overloading the API, and we don't really need all the results, we'll stick with the defaults.

APIs will also limit the rate of requests, or "throttle" the number of requests per second, to avoid abuse or overloading.
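If you did want more than one page, a polite loop respects both paging and throttling. This is a sketch assuming the Stack Exchange response wrapper fields `items`, `has_more` and `backoff` (seconds to wait) described in their documentation; `fetch_questions` and its `max_pages`, `pause` and `get` arguments are hypothetical names for this example.

```python
import time

import requests

QUESTIONS_URL = "https://api.stackexchange.com/2.2/questions"


def fetch_questions(max_pages=3, pagesize=100, pause=0.5, get=requests.get):
    """Collect questions page by page, stopping when there are no more pages.

    `get` defaults to requests.get; it is a parameter only so the function
    can be tested offline with a stand-in.
    """
    items = []
    for page in range(1, max_pages + 1):
        params = {
            "order": "desc",
            "sort": "activity",
            "site": "stackoverflow",
            "page": page,          # pages start at 1
            "pagesize": pagesize,  # 0-100, default 30
        }
        response = get(QUESTIONS_URL, params=params, timeout=10)
        data = response.json()
        items.extend(data["items"])
        if not data.get("has_more"):
            break  # last page reached
        # Wait for any 'backoff' the API asks for, and at least `pause` seconds
        time.sleep(max(data.get("backoff", 0), pause))
    return items
```

Keeping `max_pages` small is deliberate: it caps how many requests one run can make against the API's quota.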

Use a `for` loop to run through the items and print only those meeting a certain condition:

In [None]:
term = [" r ", " R ", "python", "Python"]

for data in response.json()['items']:
    if any(x in data['title'] for x in term):
        print(data['title'])
        print(data['link'])
        print()

Now spend **5-10 mins** trying to find the answer with the most votes, along with the original question.

In [None]:
# CODE IN HERE
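
One possible approach, sketched under the assumption that the Stack Exchange `/questions/{ids}/answers` endpoint accepts `sort=votes` and returns answers with a `score` field (as in its documentation). The helper names `answers_url`, `top_answer` and `best_answer_for` are made up for this example.

```python
import requests


def answers_url(question_ids):
    """Build the Stack Exchange answers endpoint for one or more question ids."""
    ids = ";".join(str(qid) for qid in question_ids)  # the API separates ids with ';'
    return f"https://api.stackexchange.com/2.2/questions/{ids}/answers"


def top_answer(answers):
    """Return the answer dict with the highest 'score', or None if there are none."""
    return max(answers, key=lambda answer: answer["score"], default=None)


def best_answer_for(question_id, get=requests.get):
    """Fetch a question's answers (needs internet) and return the top-voted one."""
    params = {"order": "desc", "sort": "votes", "site": "stackoverflow"}
    response = get(answers_url([question_id]), params=params, timeout=10)
    response.raise_for_status()
    return top_answer(response.json()["items"])
```

In the notebook, something like `question = response.json()['items'][0]` followed by `best_answer_for(question['question_id'])` would give the top answer for that question, or `None` if it has not been answered yet; printing `question['title']` and `question['link']` alongside it shows the original question.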




If the Stack Overflow example was a bit vanilla for you, have a look at https://thedogapi.com/ or https://thecatapi.com/, which I hear are really good examples of well-documented APIs. They do require a sign-up though, so I've not covered them here for time.

## Task 3: "Post" an update

Extracting data will be the most common use for data users. However, it may also be useful to see how data is posted to a server.

For this we need a server to post to. I'll use RequestBin, hosted by Pipedream. Specifically, https://requestbin.com/r/encygohnki5lb (note: this probably won't be available after the initial session, but it's easy to generate your own). Documentation: https://requestbin.com/docs/#examining-requests



In [None]:
url_pipedream = "https://encygohnki5lb.x.pipedream.net/training/AU/"
mydict = {
    'fav_film': 'Lethal Weapon',
    'fav_scene': 'Woods',
}

Post your data to the requestbin

In [None]:
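# Send the dictionary as the body of a POST request and check the status code
response = requests.post(url_pipedream, data=mydict)
print(response.status_code)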

We should now be able to see each of the posts in the requestbin log.  Feel free to have a go at "GET"ing the data back again. 
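Note that `data=` sends the dictionary form-encoded, whereas many APIs expect a JSON body, which `requests` produces with `json=` instead. The sketch below builds (but does not send) both versions so you can compare them; `url_example` is a placeholder URL.

```python
import json

import requests

url_example = "https://example.com/training/AU/"  # placeholder; nothing is sent
mydict = {"fav_film": "Lethal Weapon", "fav_scene": "Woods"}

# data= encodes the dict like an HTML form submission
form_request = requests.Request("POST", url_example, data=mydict).prepare()
print(form_request.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_request.body)  # fav_film=Lethal+Weapon&fav_scene=Woods

# json= serialises the dict to JSON and sets the matching header
json_request = requests.Request("POST", url_example, json=mydict).prepare()
print(json_request.headers["Content-Type"])  # application/json
print(json.loads(json_request.body))  # decodes back to the original dict
```

Which one to use depends on what the receiving API documents; RequestBin will log either.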

# Note on Building an API

Making a simple set of POST and GET requests work is fairly straightforward, but developing a fully functioning API that meets all user and REST requirements is a much larger task.

Roughly, we need to:
- create a server, or an app that can run on a server
- define a series of endpoints
- for each endpoint, define the GET, POST, PUT and DELETE functions
  - GET: This usually consists of converting a datasource into a dictionary that can be returned alongside the code "200"
  - POST/PUT: This requires a set of required fields to be defined with a series of if statements to check for duplicate or invalid entries. 
  - DELETE: Required fields and if statements to check the record exists in order to delete it

The trouble is that usable datasets have many fields with specific conditions that need to be met, and to make a useful API we would need to define a whole series of endpoints. So the work is perhaps more time-consuming than difficult.
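To make those steps concrete, here is a deliberately toy sketch in plain Python (no web framework), with a hypothetical `handle` function and an in-memory dictionary standing in for the datasource. A framework such as Flask would wire functions like this to real URLs; this sketch only shows the validation checks and status codes for each verb.

```python
from http import HTTPStatus

# Toy in-memory "database" standing in for a real datasource (hypothetical)
records = {1: {"name": "Clinic A", "beds": 10}}
REQUIRED_FIELDS = {"name", "beds"}


def handle(method, record_id=None, payload=None):
    """Dispatch one request to the matching CRUD logic; returns (body, status)."""
    if method == "GET":
        if record_id is None:
            return dict(records), HTTPStatus.OK
        if record_id not in records:
            return {"error": "not found"}, HTTPStatus.NOT_FOUND
        return records[record_id], HTTPStatus.OK

    if method in ("POST", "PUT"):
        # Check required fields before touching the data
        missing = REQUIRED_FIELDS - set(payload or {})
        if missing:
            return {"error": f"missing fields: {sorted(missing)}"}, HTTPStatus.BAD_REQUEST
        if method == "POST":
            # Reject duplicates (here: same name) before creating a new record
            if any(r["name"] == payload["name"] for r in records.values()):
                return {"error": "duplicate name"}, HTTPStatus.CONFLICT
            new_id = max(records, default=0) + 1
            records[new_id] = dict(payload)
            return {"id": new_id, **records[new_id]}, HTTPStatus.CREATED
        # PUT: the record must already exist to be updated
        if record_id not in records:
            return {"error": "not found"}, HTTPStatus.NOT_FOUND
        records[record_id] = dict(payload)
        return records[record_id], HTTPStatus.OK

    if method == "DELETE":
        # The record must exist before it can be deleted
        if record_id not in records:
            return {"error": "not found"}, HTTPStatus.NOT_FOUND
        return records.pop(record_id), HTTPStatus.OK

    return {"error": "method not allowed"}, HTTPStatus.METHOD_NOT_ALLOWED
```

For example, starting from the single record above, `handle("POST", payload={"name": "Clinic B", "beds": 5})` returns the new record with id 2 and status 201, and posting it a second time returns 409.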

In Python, the most common tools used to create an API are Flask and Django. Here is a good walkthrough of creating an API in Flask: https://towardsdatascience.com/the-right-way-to-build-an-api-with-python-cd08ab285f8f, with the full code here: https://gist.github.com/jamescalam/0b309d275999f9df26fa063602753f73

