## Tutorial on exploring the TfL Open Data APIs

The TfL Open Data APIs are specified by a 'Swagger' JSON file. One of its top-level keys is `paths`, which maps each endpoint path to its specification, and this is the key piece of info for any prospective TfL data dev.

Below I'll do some exploratory analysis on this JSON file, to see what's available and home in on the particular APIs of interest to me.

In [2]:
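import httpx

# Fetch the Swagger 2.0 description of the TfL Unified API
swagger_url = "https://api.tfl.gov.uk/swagger/docs/v1"
response = httpx.get(swagger_url)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
swagger_json = response.json()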

In [3]:
print(swagger_json)

{'swagger': '2.0', 'info': {'version': 'v1', 'title': 'Transport for London Unified API'}, 'host': 'api.digital.tfl.gov.uk', 'schemes': ['https'], 'paths': {'/AccidentStats/{year}': {'get': {'tags': ['AccidentStats'], 'summary': 'Gets all accident details for accidents occuring in the specified year', 'operationId': 'AccidentStats_Get', 'consumes': [], 'produces': ['application/json', 'text/json', 'application/xml', 'text/xml'], 'parameters': [{'name': 'year', 'in': 'path', 'description': 'The year for which to filter the accidents on.', 'required': True, 'type': 'integer', 'format': 'int32'}], 'responses': {'200': {'description': 'OK', 'schema': {'type': 'array', 'items': {'$ref': '#/definitions/Tfl.Api.Presentation.Entities.AccidentStats.AccidentDetail'}}}}, 'deprecated': False}}, '/AirQuality': {'get': {'tags': ['AirQuality'], 'summary': 'Gets air quality data feed', 'operationId': 'AirQuality_Get', 'consumes': [], 'produces': ['application/json', 'text/json', 'application/xml', 'te

In [13]:
# For each top-level key, show the string value itself or the keys/items nested under it
for key, value in swagger_json.items():
    summary = [value] if isinstance(value, str) else [*value]
    print(key, "\n", summary, end="\n\n")

swagger 
 ['2.0']

info 
 ['version', 'title']

host 
 ['api.digital.tfl.gov.uk']

schemes 
 ['https']

paths 
 ['/AccidentStats/{year}', '/AirQuality', '/BikePoint', '/BikePoint/{id}', '/BikePoint/Search', '/Cabwise/search', '/Journey/Meta/Modes', '/Journey/JourneyResults/{from}/to/{to}', '/Line/Meta/Modes', '/Line/Meta/Severity', '/Line/Meta/DisruptionCategories', '/Line/Meta/ServiceTypes', '/Line/{ids}', '/Line/Mode/{modes}', '/Line/Route', '/Line/{ids}/Route', '/Line/Mode/{modes}/Route', '/Line/{id}/Route/Sequence/{direction}', '/Line/{ids}/Status/{StartDate}/to/{EndDate}', '/Line/{ids}/Status', '/Line/Search/{query}', '/Line/Status/{severity}', '/Line/Mode/{modes}/Status', '/Line/{id}/StopPoints', '/Line/{id}/Timetable/{fromStopPointId}', '/Line/{id}/Timetable/{fromStopPointId}/to/{toStopPointId}', '/Line/{ids}/Disruption', '/Line/Mode/{modes}/Disruption', '/Line/{ids}/Arrivals/{stopPointId}', '/Mode/ActiveServiceTypes', '/Mode/{mode}/Arrivals', '/Occupancy/CarPark/{id}', '/Occupa

In [19]:
from pprint import pprint
pp = lambda x: pprint(x, sort_dicts=False)

swagger_paths = swagger_json["paths"]

for p in swagger_paths:
    print(p)

/AccidentStats/{year}
/AirQuality
/BikePoint
/BikePoint/{id}
/BikePoint/Search
/Cabwise/search
/Journey/Meta/Modes
/Journey/JourneyResults/{from}/to/{to}
/Line/Meta/Modes
/Line/Meta/Severity
/Line/Meta/DisruptionCategories
/Line/Meta/ServiceTypes
/Line/{ids}
/Line/Mode/{modes}
/Line/Route
/Line/{ids}/Route
/Line/Mode/{modes}/Route
/Line/{id}/Route/Sequence/{direction}
/Line/{ids}/Status/{StartDate}/to/{EndDate}
/Line/{ids}/Status
/Line/Search/{query}
/Line/Status/{severity}
/Line/Mode/{modes}/Status
/Line/{id}/StopPoints
/Line/{id}/Timetable/{fromStopPointId}
/Line/{id}/Timetable/{fromStopPointId}/to/{toStopPointId}
/Line/{ids}/Disruption
/Line/Mode/{modes}/Disruption
/Line/{ids}/Arrivals/{stopPointId}
/Mode/ActiveServiceTypes
/Mode/{mode}/Arrivals
/Occupancy/CarPark/{id}
/Occupancy/CarPark
/Occupancy/ChargeConnector/{ids}
/Occupancy/ChargeConnector
/Occupancy/BikePoints/{ids}
/Place/Meta/Categories
/Place/Meta/PlaceTypes
/Place/Address/Streets/{Postcode}
/Place/Type/{types}
/Place/{id

In [20]:
pp(swagger_paths)

{'/AccidentStats/{year}': {'get': {'tags': ['AccidentStats'],
                                   'summary': 'Gets all accident details for '
                                              'accidents occuring in the '
                                              'specified year',
                                   'operationId': 'AccidentStats_Get',
                                   'consumes': [],
                                   'produces': ['application/json',
                                                'text/json',
                                                'application/xml',
                                                'text/xml'],
                                   'parameters': [{'name': 'year',
                                                   'in': 'path',
                                                   'description': 'The year '
                                                                  'for which '
                                                    

In [51]:
from pathlib import Path

# parts[0] is the leading "/", so parts[1] is the top-level API name
top_levels = sorted({Path(p).parts[1] for p in swagger_paths})
for p in top_levels:
    print(p)

AccidentStats
AirQuality
BikePoint
Cabwise
Journey
Line
Mode
Occupancy
Place
Road
Search
StopPoint
TravelTimes
Vehicle


---

Of this list, I'm not interested in:

- AccidentStats
- BikePoint
- Cabwise
- Occupancy
- Road
- Vehicle

...and I am potentially interested in:

- AirQuality
- Journey
- Line
- Mode
- Place
- Search
- StopPoint
- TravelTimes

This reduces the list of full paths to those within the APIs of interest:

In [55]:
apis_of_interest = "AirQuality Journey Line Mode Place Search StopPoint TravelTimes".split()
for a in apis_of_interest:
    for p in swagger_paths:
        if Path(p).parts[1] == a:
            print(p)
    print()

/AirQuality

/Journey/Meta/Modes
/Journey/JourneyResults/{from}/to/{to}

/Line/Meta/Modes
/Line/Meta/Severity
/Line/Meta/DisruptionCategories
/Line/Meta/ServiceTypes
/Line/{ids}
/Line/Mode/{modes}
/Line/Route
/Line/{ids}/Route
/Line/Mode/{modes}/Route
/Line/{id}/Route/Sequence/{direction}
/Line/{ids}/Status/{StartDate}/to/{EndDate}
/Line/{ids}/Status
/Line/Search/{query}
/Line/Status/{severity}
/Line/Mode/{modes}/Status
/Line/{id}/StopPoints
/Line/{id}/Timetable/{fromStopPointId}
/Line/{id}/Timetable/{fromStopPointId}/to/{toStopPointId}
/Line/{ids}/Disruption
/Line/Mode/{modes}/Disruption
/Line/{ids}/Arrivals/{stopPointId}

/Mode/ActiveServiceTypes
/Mode/{mode}/Arrivals

/Place/Meta/Categories
/Place/Meta/PlaceTypes
/Place/Address/Streets/{Postcode}
/Place/Type/{types}
/Place/{id}
/Place
/Place/{type}/At/{Lat}/{Lon}
/Place/{type}/overlay/{z}/{Lat}/{Lon}/{width}/{height}
/Place/Search

/Search
/Search/BusSchedules
/Search/Meta/SearchProviders
/Search/Meta/Categories
/Search/Meta/Sorts



Judging by the number of endpoints, the simplest API is `AirQuality` (a single endpoint), and the most complicated is either `Line` or `StopPoint`.
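
To put numbers on that impression, the endpoints per top-level API can be tallied with `collections.Counter`. This is a minimal sketch: `count_endpoints` is a helper name I've made up, and `sample_paths` is a small hand-copied subset of the paths printed earlier, so the counts it prints are for the sample only. In the notebook you would pass `swagger_paths` instead.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_endpoints(paths):
    """Count how many paths fall under each top-level API name."""
    # parts[0] is the leading "/", so parts[1] is the top-level API name
    return Counter(PurePosixPath(p).parts[1] for p in paths)


# Stand-in subset copied from the path listing; use swagger_paths in the notebook
sample_paths = [
    "/AirQuality",
    "/Journey/Meta/Modes",
    "/Journey/JourneyResults/{from}/to/{to}",
    "/Mode/ActiveServiceTypes",
    "/Mode/{mode}/Arrivals",
    "/Line/{ids}",
    "/Line/{ids}/Status",
    "/Line/{id}/StopPoints",
]

endpoint_counts = count_endpoints(sample_paths)
# Print from most to fewest endpoints
for api, n in endpoint_counts.most_common():
    print(f"{api}: {n}")
```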

To rank them in order of interest:

- Line
- StopPoint
- Place
- Journey
- Search
- TravelTimes
- Mode
- AirQuality

This is a very rough ranking, given how unfamiliar I am with what these APIs contain, but it's intuitive to me that `Line` and `StopPoint` would be the necessary basis for any computation of interest:
- What tube line am I on?
- What tube stop am I going to?

You'd also be interested in tying these lines/stops to locations via the `Place` API.

I personally am less interested in using the ready-made `Journey` or `Search` APIs (though maybe I could). I think I'd rather build something myself than just call a pre-made one.

I'm not sure what's in the `TravelTimes` API, and `Mode` might be an overview (?), while `AirQuality` is not something I'm particularly interested in investigating.
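
One way to resolve that uncertainty without leaving the notebook is to read the `summary` string that the Swagger file stores for each operation. The sketch below assumes the structure visible in the printed output earlier (path, then HTTP method, then a dict with `summary` and `operationId`). `summarise_api` is a hypothetical helper name, and `sample_paths_spec` is a tiny stand-in for `swagger_paths`, with its strings copied verbatim from that output.

```python
from pathlib import PurePosixPath


def summarise_api(paths_spec, api_name):
    """Return (METHOD, path, summary) tuples for every operation under one top-level API."""
    rows = []
    for path, operations in paths_spec.items():
        if PurePosixPath(path).parts[1] != api_name:
            continue
        for method, details in operations.items():
            # Skip anything that is not an operation dict (defensive; not seen in the output)
            if not isinstance(details, dict):
                continue
            # Fall back to an empty string if an operation has no summary
            rows.append((method.upper(), path, details.get("summary", "")))
    return rows


# Stand-in for swagger_paths, trimmed from the output shown earlier
sample_paths_spec = {
    "/AccidentStats/{year}": {
        "get": {
            "summary": "Gets all accident details for accidents occuring in the specified year",
            "operationId": "AccidentStats_Get",
        }
    },
    "/AirQuality": {
        "get": {"summary": "Gets air quality data feed", "operationId": "AirQuality_Get"}
    },
}

for method, path, summary in summarise_api(sample_paths_spec, "AirQuality"):
    print(f"{method} {path}: {summary}")
```

In the notebook, calling `summarise_api(swagger_paths, "TravelTimes")` or `summarise_api(swagger_paths, "Mode")` should list whatever those APIs expose.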

## `Line` API

- When I looked into the `Line/Route` API, most of the routes I could see were bus routes.
  - 11 routes had `modeName` set to `tube`.
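
That mode breakdown can be reproduced by tallying the `modeName` field of each entry returned by `/Line/Route`. A hedged sketch: it assumes the response is a JSON list of objects that each carry `modeName` (the only field name confirmed in my notes). `count_by_mode` is a made-up helper, `sample_lines` and its `id` values are invented stand-ins rather than real API output, and the commented-out request is only how I'd expect to fetch the real data.

```python
from collections import Counter


def count_by_mode(lines):
    """Tally line records by their 'modeName' field."""
    # Records without a modeName are grouped under "unknown" rather than dropped
    return Counter(line.get("modeName", "unknown") for line in lines)


# Invented stand-in records; real data might come from something like (untested):
#   lines = httpx.get("https://api.tfl.gov.uk/Line/Route").json()
sample_lines = [
    {"id": "example-bus-1", "modeName": "bus"},
    {"id": "example-bus-2", "modeName": "bus"},
    {"id": "example-tube-1", "modeName": "tube"},
]

mode_counts = count_by_mode(sample_lines)
print(mode_counts.most_common())
```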

## `StopPoint` API

- When I looked into the `StopPoint/Mode/` API, I could filter by `mode=tube` and got back a long list of tube stations with info on their various facilities (like escalators).
- Another interesting endpoint is `/?lat={lat}&lon={lon}&stopTypes={stopTypes}`, which gives the stop points within a given radius of a location.
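
As a sketch of how those two requests might be built, the helpers below assemble the URLs with the standard library only. The host is taken from the Swagger URL used at the top, and the query parameter names come from my notes above. The function names, the `/StopPoint` prefix on the second URL, the coordinates, the comma-joining of stop types and the `NaptanMetroStation` stop type are all assumptions to check against the Swagger file.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.tfl.gov.uk"


def stop_points_by_mode_url(mode):
    """URL for listing the stop points of one mode, e.g. 'tube'."""
    return f"{BASE_URL}/StopPoint/Mode/{quote(mode, safe='')}"


def stop_points_near_url(lat, lon, stop_types):
    """URL for stop points around a location, filtered by stop type."""
    query = urlencode(
        {
            "lat": lat,
            "lon": lon,
            # Assumption: multiple stop types are passed comma-separated
            "stopTypes": ",".join(stop_types),
        }
    )
    return f"{BASE_URL}/StopPoint/?{query}"


print(stop_points_by_mode_url("tube"))
# Coordinates and stop type below are illustrative values only
print(stop_points_near_url(51.5, -0.12, ["NaptanMetroStation"]))
```

Either URL could then be passed to `httpx.get(...)` in the same way as the Swagger URL.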

## Goals

Even without going into a complex setup of all of these methods, I might want to just get the basics of lines and stops, to then build on top of.
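
A minimal sketch of that starting point, assuming the `/Line/Mode/{modes}` and `/Line/{id}/StopPoints` paths listed earlier return JSON lists whose items carry `id` and `commonName` fields respectively (field names I have not verified here). `build_line_stops` is a made-up name, and `fetch_json` is a hypothetical callable that takes a path and returns parsed JSON, so the HTTP client can be swapped out or faked. The demo uses invented data rather than a live request.

```python
def build_line_stops(fetch_json, mode="tube"):
    """Map each line id of a mode to the names of its stop points."""
    line_stops = {}
    for line in fetch_json(f"/Line/Mode/{mode}"):
        line_id = line["id"]  # assumed field name
        stops = fetch_json(f"/Line/{line_id}/StopPoints")
        line_stops[line_id] = [stop["commonName"] for stop in stops]  # assumed field name
    return line_stops


# Fake responses keyed by path; ids and names are invented for illustration
FAKE_RESPONSES = {
    "/Line/Mode/tube": [{"id": "line-a"}, {"id": "line-b"}],
    "/Line/line-a/StopPoints": [{"commonName": "Stop One"}, {"commonName": "Stop Two"}],
    "/Line/line-b/StopPoints": [{"commonName": "Stop Two"}, {"commonName": "Stop Three"}],
}

# A dict lookup stands in for the HTTP call
line_stops = build_line_stops(FAKE_RESPONSES.__getitem__)
print(line_stops)

# With a live client this might become (untested):
#   fetch = lambda path: httpx.get("https://api.tfl.gov.uk" + path).json()
#   line_stops = build_line_stops(fetch)
```

Passing the fetcher in as an argument keeps the mapping logic testable offline, which seems useful while the response shapes are still being explored.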