# Berlin Tourist Guide

This notebook is a tutorial on how to load data from web services and do useful things with it.

Imagine that a tourist lands at Berlin's [Tegel Airport](https://en.wikipedia.org/wiki/Berlin_Tegel_Airport) in the morning and has his "connecting" flight from [Schönefeld Airport](https://en.wikipedia.org/wiki/Berlin_Sch%C3%B6nefeld_Airport) in the evening. Sounds weird? His airline thought that there would be a single airport by the time it scheduled the connection.

As the tourist has never been in Berlin before and has less than a day available, we are asked to come up with a plan of sights that he can visit with a car rented from a car-sharing service on his way from Tegel to Schönefeld.

After a short discussion, we agree on the `sights` list below.

In [None]:
arrival = "Berlin Tegel Airport (TXL), Berlin"

sights = [
    "Alexanderplatz, Berlin",
    "Brandenburger Tor, Pariser Platz, Berlin",
    "Checkpoint Charlie, Friedrichstraße, Berlin",
    "Kottbusser Tor, Berlin",
    "Mauerpark, Berlin",
    "Siegessäule, Berlin",
    "Reichstag, Platz der Republik, Berlin",
    "Soho House Berlin, Torstraße, Berlin",
    "Tempelhofer Feld, Berlin",
]

departure = "Berlin Schönefeld Airport (SXF), Berlin"

With just the street addresses, however, we cannot find a shortest route. We need $(lat,lon)$ coordinates instead. While we could just open a site like [Google Maps](https://www.google.com/maps) in a web browser, we really don't want to have to copy & paste the coordinates by hand and decide to use a [web API](https://en.wikipedia.org/wiki/Web_API) offered by [Google](https://www.google.com).

## Geocoding

In order to obtain coordinates for the given street addresses above, a process called **geocoding**, we will use the [Google Maps API](https://developers.google.com/maps/documentation/geocoding/start).

First, read this [documentation](https://developers.google.com/maps/documentation/geocoding/start), register a developer account, and create an API key that is necessary for everything to work (Google just wants to know the people using its services).

Then, assign the API key as a text string to the `key` variable below. The first 100,000 requests per day are free, so no costs will be incurred for this case study!

In [None]:
key = "paste_your_key_here"

To use external web services, our application needs to make HTTP requests just like our web browser does when surfing the internet.

Luckily, we do not have to implement this on our own. Instead, we will use the official Python Client for the Google Maps Services provided by Google in one of its corporate [GitHub repositories](https://github.com/googlemaps/google-maps-services-python). Go there and read the short documentation. Then, install the third-party library *googlemaps* with the `pip` command line tool.

In [None]:
! pip install googlemaps

Let's instantiate a client object that provides us with a lot of methods to talk to the API.

In [None]:
import googlemaps

In [None]:
api = googlemaps.Client(key=key)

In [None]:
api

Let's look at some of the object's methods and attributes.

In [None]:
[x for x in dir(api) if not x.startswith("_")]

To ask for all kinds of data associated with a street address, we just call the `geocode()` method with the address as the sole argument.

For example, let's search for Brandenburg Gate.

We receive a list with a single dictionary in it implying only one known "place" at the address. Unfortunately, the dictionary is pretty dense and hard to read.

In [None]:
api.geocode("Brandenburger Tor, Pariser Platz, Berlin")

Let's capture the first search result in the response in a variable `brandenburg_gate` and "pretty print" it with the help of the [pprint()](https://docs.python.org/3/library/pprint.html#pprint.pprint) function in the [pprint](https://docs.python.org/3/library/pprint.html) module in the Standard Library.

In [None]:
from pprint import pprint

In [None]:
response = api.geocode("Brandenburger Tor, Pariser Platz, Berlin")

In [None]:
brandenburg_gate = response[0]

The dictionary has several keys that are of use to us. *formatted_address* is a cleanly formatted version of the address, such as you would enter into a car's onboard navigation system. *geometry* is a nested dictionary with several coordinates representing the place, of which *location* is the one we need for our calculations. Lastly, *place_id* is a unique identifier that allows us to obtain further information about the address from other APIs by Google.

In [None]:
pprint(brandenburg_gate)
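
As a rough illustration of how the three keys described above can be read out of such a nested dictionary, here is a minimal sketch. It works on a hand-written stand-in called `sample_hit` (a hypothetical name) instead of a live API response: the coordinates are only approximate and the `place_id` is a made-up placeholder, so no API key is needed to run it.

```python
# Hand-written stand-in for a single geocoding hit.
# The values are illustrative only and were NOT returned by Google.
sample_hit = {
    "formatted_address": "Pariser Platz, 10117 Berlin, Germany",
    "geometry": {"location": {"lat": 52.5163, "lng": 13.3777}},
    "place_id": "EXAMPLE_PLACE_ID",
}

# Top-level keys are accessed directly ...
address = sample_hit["formatted_address"]
place_id = sample_hit["place_id"]

# ... while the coordinates sit two levels deep.
latitude = sample_hit["geometry"]["location"]["lat"]
longitude = sample_hit["geometry"]["location"]["lng"]

print(address)
print((latitude, longitude))
print(place_id)
```

Note that the API abbreviates longitude as `lng`, not `lon`.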

## The Place Class

To keep our code readable and maintainable, we will create a `Place` class to manage the API results in a clean way.

The `__init__()` method takes the initial `street_address` as its argument and stores it on `self` after parsing the place's name out of it (the part before the first comma). Also, the instance attributes `latitude`, `longitude`, and `place_id` are initialized to `None`.

The `sync_from_google()` method takes as its `client` argument a reference to a client instance from the *googlemaps* library that has access to the API via a valid key and synchronizes the place's data with Google Maps. In particular, it updates the `address` with the formatted version and stores the values for `latitude`, `longitude`, and `place_id`. It simply returns `self` to enable method chaining.

Also, let us put a read-only `location` property on the class that returns the `latitude` and `longitude` as a tuple.

In [None]:
class Place:
    """A place with a connection to a Google Maps API search result."""

    def __init__(self, street_address):
        """Create a new place object.

        Args:
            street_address (str): street address of the place
        """
        self.name = ...  # extract the name part out of the street_address
        self.address = ...
        self.latitude = ...
        self.longitude = ...
        self.place_id = ...

    def __repr__(self):
        if self.place_id:
            return "<{}({!r})>".format(self.__class__.__name__, self.name)
        return "{}({!r})".format(self.__class__.__name__, self.address)

    def sync_from_google(self, client):
        """Obtain the coordinates, clean address, and ID for a place object from Google.

        Args:
            client (googlemaps.client.Client): access to the Google Maps API
        """
        response = ...  # make an API call here
        first_hit = ...
        self.address = ...
        self.latitude = ...  # reach deep into the nested dictionary if necessary
        self.longitude = ...
        self.place_id = ...
        return ...  # return the instance itself to enable method chaining

    @property
    def location(self):
        return ...
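
The three ideas used in the skeleton above (parsing a name out of an address, a read-only property, and method chaining) can be sketched in isolation. The `Waypoint` class and its `set_coordinates()` method below are hypothetical and deliberately do not talk to Google; they are not meant as the solution to the exercise.

```python
class Waypoint:
    """Hypothetical stand-in used only to illustrate the three ideas."""

    def __init__(self, street_address):
        # The name is the part before the first comma.
        self.name = street_address.split(",")[0].strip()
        self.address = street_address
        self.latitude = None
        self.longitude = None

    def set_coordinates(self, latitude, longitude):
        """Store the coordinates and return the instance itself."""
        self.latitude = latitude
        self.longitude = longitude
        return self  # returning self is what enables method chaining

    @property
    def location(self):
        # No setter is defined, so the property is read-only.
        return (self.latitude, self.longitude)


# Instantiation and update are chained in a single expression.
gate = Waypoint("Brandenburger Tor, Pariser Platz, Berlin").set_coordinates(
    52.5163, 13.3777
)
print(gate.name, gate.location)
```

Because `location` has no setter, an assignment like `gate.location = (0, 0)` raises an `AttributeError`.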

Let's try out our new class.

In [None]:
brandenburg_gate = Place("Brandenburger Tor, Pariser Platz, Berlin")

In [None]:
brandenburg_gate

Now we can obtain the geo-data from Google in a clean way. As we enabled method chaining for `sync_from_google()`, we get back the instance after calling the method.

The angle brackets "<" and ">" around the string representation indicate that we cannot type the string back into a code cell and get the same state back. This is because after synchronization with Google Maps, the `brandenburg_gate` object has some attributes set that we cannot know without asking Google. We could have accepted the attributes from Google as optional arguments in the `__init__()` method, but we chose not to in order to keep the class simple.

In [None]:
brandenburg_gate.sync_from_google(api)

In [None]:
brandenburg_gate

In [None]:
brandenburg_gate.address

In [None]:
brandenburg_gate.place_id

In [None]:
brandenburg_gate.location

### The Place Class revisited: Batch Synchronization with the Google Maps API

Let us add a class method `from_addresses()` that takes a mandatory argument `addresses` that is a list of strings and an optional argument `client` (defaulting to `None`) and returns a list of `Place` instances, one for each string. If `client` is provided, the place instances are synchronized with Google right away.

In [None]:
class Place:
    """A place with a connection to a Google Maps API search result."""

    def __init__(self, street_address):
        ...

    def __repr__(self):
        ...

    def sync_from_google(self, client):
        ...

    @property
    def location(self):
        ...

    @classmethod
    def from_addresses(cls, addresses, client=None):
        """Create new place objects.

        Args:
            addresses (list of strings): a list of street addresses of the places
            client (googlemaps.client.Client): access to the Google Maps API;
                if provided, sync the places with Google before returning;
                defaults to None
        Returns:
            list(Place)
        """
        places = ...  # initialize a data structure to collect intermediate objects
        for ... in ...:
            place = ...
            if client:
                ...  # sync with Google
            places.append(...)
        ...
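
For readers unfamiliar with class methods as alternative constructors, the pattern can be sketched on a toy example. The `Stop` class, its `check()` method, and the `from_labels()` constructor are hypothetical names chosen for illustration only.

```python
class Stop:
    """Hypothetical minimal class that only shows the classmethod pattern."""

    def __init__(self, label):
        self.label = label
        self.checked = False

    def check(self):
        """Mark the stop as checked and return it for chaining."""
        self.checked = True
        return self

    @classmethod
    def from_labels(cls, labels, check=False):
        """Create one Stop per label, optionally checking each right away."""
        stops = []
        for label in labels:
            stop = cls(label)  # cls instead of Stop keeps subclasses working
            if check:
                stop.check()
            stops.append(stop)
        return stops


plain = Stop.from_labels(["A", "B"])
checked = Stop.from_labels(["A", "B"], check=True)
print([stop.checked for stop in plain], [stop.checked for stop in checked])
```

The optional flag plays the same role as the optional `client` argument described above: one call creates the objects, and the same call can also prepare them.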

Let's try out the alternative constructor for both cases, with and without `client` provided.

In [None]:
Place.from_addresses(sights)

In [None]:
Place.from_addresses(sights, client=api)

## Visualizations

Geo-data is easiest to understand when plotted on a map. We use the third-party library *folium* to achieve that. Go to the [GitHub repository](https://github.com/python-visualization/folium) and read how the library works. Then, install it with the `pip` command line tool.

In [None]:
! pip install folium

Let's create an empty map of Berlin.

In [None]:
import folium

In [None]:
berlin = folium.Map(location=(52.513186, 13.3944349), zoom_start=14)

`folium.Map` instances are shown as interactive maps in Jupyter notebooks whenever they are the last expression in a code cell.

In [None]:
berlin

In order to put something on the map, folium works with so-called `Marker` objects. Review the docstring of `folium.Marker` first. Then, we create a marker `m` with the location data of Brandenburg Gate. Note that we use HTML tags in the `popup` argument to format the output on the map in a nicer way.

In [None]:
folium.Marker?

In [None]:
m = folium.Marker(
    location=brandenburg_gate.location,
    popup="<b>{}</b><br/>({})".format(brandenburg_gate.name, brandenburg_gate.address),
    tooltip=brandenburg_gate.name,
)

Now we put the marker on the map with its `add_to()` method.

In [None]:
m.add_to(berlin)

In [None]:
berlin

### The Place Class revisited: Marker Representation

We implement an `as_marker()` method that returns a `Marker` instance when called on a `Place` instance. The method takes an optional `color` argument that uses folium's `Icon` type to control the color of the marker.

In [None]:
class Place:
    """A place with a connection to a Google Maps API search result."""

    def __init__(self, street_address):
        ...

    def __repr__(self):
        ...

    def sync_from_google(self, client):
        ...

    @property
    def location(self):
        ...

    @classmethod
    def from_addresses(cls, addresses, client=None):
        ...

    def as_marker(self, color="blue"):
        """Create a folium Marker representation of the place.

        Args:
            color (str): color of the marker, defaults to "blue"
        Returns:
            folium.Marker
        Raises:
            RuntimeError: if the place is not yet synchronized with Google
        """
        if not self.place_id:
            raise RuntimeError("Must synchronize with Google first!")
        return folium.Marker(
            location=...,
            popup=...,
            tooltip=...,
            icon=folium.Icon(color=color)
        )

We create a new `Place` instance and convert it into a `folium.Marker` object.

In [None]:
brandenburg_gate = Place("Brandenburger Tor, Pariser Platz, Berlin")

Note that we need the location data from Google first to create a marker. Without synchronization, we get a `RuntimeError`.

In [None]:
brandenburg_gate.as_marker()

Observe the elegant use of method chaining again.

In [None]:
brandenburg_gate.sync_from_google(api).as_marker()

To make use of the new functionality further below, we need to create the `Place` instances anew.

In [None]:
places = Place.from_addresses(sights, client=api)

In [None]:
places

### The Map Class

To make folium's `Map` class work even better with our `Place` instances, we write our own `Map` class wrapping folium's. This is an example of the so-called [adapter pattern](https://en.wikipedia.org/wiki/Adapter_pattern) in software engineering. We also add further functionality to the class throughout this tutorial.

The `__init__()` method takes mandatory `name`, `center`, `start`, `end`, and `places` arguments. `name` is just there for convenience, `center` is used as the map's initial center, `start` and `end` are `Place` instances, and `places` is a list of `Place` instances. Also, it accepts an optional `initial_zoom` argument. The method creates a `folium.Map` instance that is stored as an "implementation detail" on the instance variable `_map`. Also, to design `Map` as an immutable type, we store all passed-in arguments on hidden variables. Lastly, `__init__()` puts markers for each place on the `_map` object ("green" and "red" for the `start` and `end` locations, and "blue" for the places to be visited).

The `add_marker()` instance method allows us to put arbitrary markers on the map and is also used internally by the `__init__()` method. We also build method chaining into it.

To maintain the automatic rendering of folium's maps in Jupyter notebooks, we simply return the hidden `_map` variable in the `show()` method.

In [None]:
class Map:
    """A map with plotting and routing capabilities."""

    def __init__(self, name, center, start, end, places, initial_zoom=12):
        """Create a new map instance.

        Args:
            name (str): name of the map
            center (float, float): coordinates of the map's center
            start (Place): start of the tour
            end (Place): end of the tour
            places (list of Places): the places to be visited
            initial_zoom (int): initial zoom level of the map, defaults to 12
        """
        ...  # store name, center, start, and end as implementation details
        ...
        ...
        ...
        self._map = folium.Map(...)

        ...  # add start as a green marker using the add_marker() method below
        ...  # add end as a red marker using the add_marker() method below
        for place in places:
            ...  # add place as a marker using the add_marker() method below

    def __repr__(self):
        return "<Map of {}>".format(self._name)

    def show(self):
        """Return a folium.Map representation of the map."""
        ...

    def add_marker(self, marker):
        """Add a marker to the map.

        Args:
            marker (folium.Marker): marker to be put on the map
        """
        ...  # call the add_to() method on a folium.Marker instance with the hidden _map variable
        return ...
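
The wrapping idea behind the `Map` class can be illustrated without folium. In the sketch below, `Canvas` plays the role of the third-party class and `Board` is the wrapper around it; both names are hypothetical, and the sketch is not the solution to the exercise above.

```python
class Canvas:
    """Hypothetical stand-in for a third-party class we want to wrap."""

    def __init__(self):
        self.items = []

    def draw(self, item):
        self.items.append(item)


class Board:
    """Wrapper that hides a Canvas and offers a chainable interface."""

    def __init__(self, name):
        self._name = name
        self._canvas = Canvas()  # kept as an implementation detail

    def __repr__(self):
        return "<Board of {}>".format(self._name)

    def add(self, item):
        """Delegate to the wrapped object and return self for chaining."""
        self._canvas.draw(item)
        return self

    def show(self):
        """Hand out the wrapped object, e.g., for rendering."""
        return self._canvas


board = Board("demo").add("first").add("second")
print(board, board.show().items)
```

The wrapper decides which parts of the wrapped object are exposed, which is what makes it possible to add domain-specific behavior later without touching the third-party class.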

Let's put all the sights, the two airports, and three more places, the Bundeskanzleramt, the Olympic Stadium, and the East Side Gallery, on the map. Implementing method chaining everywhere creates a nice and compact "language".

In [None]:
berlin = (
    Map("Berlin", center=(52.5015154, 13.4066838),
        start=Place(arrival).sync_from_google(api),
        end=Place(departure).sync_from_google(api),
        places=places, initial_zoom=10)
    .add_marker(Place("Bundeskanzleramt, Willy-Brandt-Straße, Berlin")
                .sync_from_google(api).as_marker(color="orange"))
    .add_marker(Place("Olympiastadion, Berlin")
                .sync_from_google(api).as_marker(color="orange"))
    .add_marker(Place("East Side Gallery, Berlin")
                .sync_from_google(api).as_marker(color="orange"))
)

In [None]:
berlin

In [None]:
berlin.show()

## Distance Matrices

Before we can find out the best order in which to visit all the sights, we need to find out the pairwise distances between all points. While Google also offers a [Directions API](https://developers.google.com/maps/documentation/directions/start) and a [Distance Matrix API](https://developers.google.com/maps/documentation/distance-matrix/start), we choose to calculate the air distances using the third-party library [geopy](https://github.com/geopy/geopy), whose documentation you find [here](https://geopy.readthedocs.io/en/stable/). *geopy* is a very popular library that we could also have used for geocoding with the Google Maps API.

Let's first install *geopy* with the `pip` command line utility.

In [None]:
! pip install geopy

We need *geopy* primarily to turn pairs of $(lat,lon)$ coordinates into distances while taking the earth's curvature into account. This is implemented "under the hood": *geopy* provides `great_circle()` to calculate the so-called [orthodromic distance](https://en.wikipedia.org/wiki/Great-circle_distance) between two places on a sphere.

In [None]:
from geopy.distance import great_circle

For quick reference, read the docstring.

In [None]:
great_circle?

For example, let's calculate the air distance between the two airports. `great_circle()` returns a custom `Distance` object that can be accessed as a `float` with either the `km` or the `meters` property.

In [None]:
tegel = Place(arrival).sync_from_google(api)
schoenefeld = Place(departure).sync_from_google(api)

In [None]:
great_circle(tegel.location, schoenefeld.location)

In [None]:
great_circle(tegel.location, schoenefeld.location).km

In [None]:
great_circle(tegel.location, schoenefeld.location).meters
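
To get a feeling for what a great-circle calculation involves, here is a minimal sketch based on the well-known haversine formula. It is not *geopy*'s implementation; the function name `haversine_km()` and the earth radius of 6371 km are assumptions made for this illustration, so results may differ slightly from what `great_circle()` returns.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; geopy may use another value


def haversine_km(origin, destination):
    """Approximate the great-circle distance between two (lat, lon) pairs."""
    lat1, lon1 = (radians(value) for value in origin)
    lat2, lon2 = (radians(value) for value in destination)
    # Haversine of the central angle between the two points
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of latitude is roughly 111 km anywhere on the sphere.
one_degree = haversine_km((52.0, 13.0), (53.0, 13.0))
print(round(one_degree, 2))
```

The distance is symmetric in its two arguments, which is the property exploited further below when the pairwise distances are collected.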

### The Place Class revisited: Distance to another Place

We add a `distance_to()` instance method on the `Place` class that takes an `other` argument (that must also be a `Place` instance) and returns the distance in meters as an integer.

In [None]:
class Place:
    """A place with a connection to a Google Maps API search result."""

    def __init__(self, street_address):
        ...

    def __repr__(self):
        ...

    def sync_from_google(self, client):
        ...

    @property
    def location(self):
        ...

    @classmethod
    def from_addresses(cls, addresses, client=None):
        ...

    def as_marker(self, color="blue"):
        ...

    def distance_to(self, other):
        """Calculate the distance in meters.

        Args:
            other (Place): the other place to calculate the distance to
        Returns:
            int
        Raises:
            RuntimeError: if one of the places is not yet synchronized with Google
        """
        if not self.place_id or not other.place_id:
            raise RuntimeError("Must synchronize both places with Google first!")
        return int(...)  # use great_circle() and return the .meters attribute as an integer

Let's try out the new functionality.

In [None]:
tegel = Place(arrival).sync_from_google(api)
schoenefeld = Place(departure).sync_from_google(api)

If done right, object-oriented code reads almost like plain English.

In [None]:
tegel.distance_to(schoenefeld)

To make use of the new method, we need to create the `Place` instances anew once more.

In [None]:
places = Place.from_addresses(sights, client=api)

### The Map Class revisited: Pairwise Distances

Now we add a read-only `distances` property on our `Map` class. As we are working with air distances, we observe that these are symmetric, which halves the number of distances we need to calculate. We use the [combinations()](https://docs.python.org/3/library/itertools.html#itertools.combinations) function in the [itertools](https://docs.python.org/3/library/itertools.html) module in the Standard Library that gives us all possible $r$-tuples in a list-like object where $r$ is just $2$ in our case.

`distances` takes the hidden `_start`, `_end`, and `_places` attributes and returns a dictionary with keys consisting of all pairs of places and their distances in meters as the corresponding values. As this operation is rather costly and we built the `Map` class to be immutable anyway, we "cache" the distances in a hidden instance attribute `_distances` the first time we calculate them (this attribute must also be initialized in the `__init__()` method).

In [None]:
from itertools import combinations

Let's look at an easy example of using `combinations()` to understand what it does. It gives us all the $2$-tuples from a list of four `numbers` disregarding the order of the tuples' elements.

In [None]:
numbers = [1, 2, 3, 4]

for x, y in combinations(numbers, 2):
    print(x, y)
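
Building on the pairs above, a symmetric lookup table can be sketched as follows. The `points` dictionary with positions on a line is a hypothetical toy input; the point is only that each pair is calculated once but stored under both orderings.

```python
from itertools import combinations

# Hypothetical toy data: three points located on a straight line.
points = {"A": 0, "B": 3, "C": 7}

distances = {}
for first, second in combinations(points, 2):
    distance = abs(points[first] - points[second])  # calculated only once
    distances[(first, second)] = distance  # stored from first to second
    distances[(second, first)] = distance  # and from second to first

print(len(distances))
print(distances[("A", "C")], distances[("C", "A")])
```

With $3$ points there are $3$ unordered pairs and hence $3 \cdot 2 = 6$ entries in the dictionary.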

In [None]:
class Map:
    """A map with plotting and routing capabilities."""

    def __init__(self, name, center, start, end, places, initial_zoom=12):
        ...  # also initialize the cached _distances variable

    def __repr__(self):
        ...

    def show(self):
        ...

    def add_marker(self, marker):
        ...

    @property
    def distances(self):
        """Return a dictionary with the pairwise distances of all places.

        Implementation note: The results of the calculations are cached.
        """
        if not self._distances:
            distances = ...  # initialize a data structure to collect the mappings
                             # from tuples of Places to their respective distances
            all_pairs = combinations([...] + self._places, ...)  # complete the list out of which
                                                                 # the pairs are drawn
            for first, second in all_pairs:
                distance = ...  # calculate the distance from first to second
                distances[...] = distance  # store the distance both from first to second
                distances[...] = distance  # but also from second to first
            self._distances = distances
        return self._distances

We pretty print the total distance matrix.

In [None]:
berlin = Map(
    "Berlin", center=(52.5015154, 13.4066838),
    start=Place(arrival).sync_from_google(api),
    end=Place(departure).sync_from_google(api),
    places=places, initial_zoom=10
)

In [None]:
pprint(berlin.distances)

How can we be sure the matrix contains all possible pairs? As we have $9$ sights on our list plus the start and the end points of the tour, there must be $11 \cdot 10 = 110$ distances. The $0$ distances from a place to itself are not part of the distance matrix.

In [None]:
(len(places) + 2) * ((len(places) + 2) - 1)

In [None]:
len(berlin.distances)

## Route Optimization

Let us find the cost-minimal order of travelling from one airport to the other while visiting all the sights.

This problem can be expressed as finding the shortest so-called [Hamiltonian path](https://en.wikipedia.org/wiki/Hamiltonian_path) from `start` to `end`, i.e., a path that visits each intermediate node exactly once. With the "trick" of assuming the distance of travelling from the `end` to the `start` to be $0$ and thereby effectively merging the two airports into a single node, the problem can be transformed into a so-called [travelling salesman problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem) (TSP).

The TSP is a very hard problem to solve but also very well studied in the literature. Assuming symmetric distances, a TSP with $n$ nodes has $\frac{(n-1)!}{2}$ possible routes. The numerator is $(n-1)!$ rather than $n!$ because any node can serve as the start/end of the round trip, and we divide by $2$ because every route can be travelled in either direction at the same cost.

Starting with about $n = 20$, the TSP is almost impossible to solve by exhaustive enumeration in a reasonable amount of time. Luckily, we do not have that many sights to visit, and so we can use a [brute force](https://en.wikipedia.org/wiki/Brute-force_search) approach and just iterate over all possible routes to find the shortest.
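
To see how quickly the number of routes grows, the formula $\frac{(n-1)!}{2}$ can be evaluated for a few values of $n$. The helper name `n_routes()` is made up for this sketch.

```python
from math import factorial


def n_routes(n):
    """Number of distinct routes in a symmetric TSP with n nodes."""
    return factorial(n - 1) // 2


for n in (5, 10, 15, 20):
    print(n, n_routes(n))
```

Every additional node multiplies the count by the previous number of nodes, which is why enumerating all routes stops being practical so quickly.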

In our case, we "only" need to try out $181440$ possible routes (by treating the two airports as one node, $n$ becomes $10$).

In [None]:
from math import factorial

In [None]:
factorial(len(places) + 1 - 1) // 2

Analyzing the problem a bit further, we realize that all we need is a list of permutations of the sights as the two airports will always be the first and last location.

The [permutations()](https://docs.python.org/3/library/itertools.html#itertools.permutations) function in the [itertools](https://docs.python.org/3/library/itertools.html) module in the Standard Library helps us build the exhaustive search. Let's see a small example to understand how it works.

In [None]:
from itertools import permutations

In [None]:
numbers = [1, 2, 3]

for permutation in permutations(numbers):
    print(permutation)

However, if we just use this approach, we are actually trying out redundant routes. For example, transferred to our case, the tuples `(1, 2, 3)` and `(3, 2, 1)` represent the same route as the distances are symmetric and the traveller could be going in either direction. To obtain the unique routes, we use an `if` condition in a "tricky" way by only accepting routes where the first node has a smaller value than the last.

In [None]:
for permutation in permutations(numbers):
    if permutation[0] < permutation[-1]:
        print(permutation)

In order to compare `Place` instances with the `<` operator, we would actually have to implement the `__lt__()` magic method (and, for a complete ordering, some others). Otherwise, we get a `TypeError` like this.

In [None]:
Place(arrival) < Place(departure)

A quick and dirty solution is to use the [hash()](https://docs.python.org/3/library/functions.html#hash) built-in function, which maps any hashable object to an integer that stays the same for the object's lifetime and is primarily used when the object serves as a key in a dictionary.

In [None]:
hash(Place(arrival)) < hash(Place(departure))
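
For comparison, the "clean" alternative to the `hash()` trick is to give a class a real ordering. The sketch below does this for a hypothetical `Label` class using [functools.total_ordering](https://docs.python.org/3/library/functools.html#functools.total_ordering) from the Standard Library, which derives the remaining comparison methods from `__eq__()` and `__lt__()`. Ordering by name is an arbitrary choice made for the illustration.

```python
from functools import total_ordering


@total_ordering
class Label:
    """Hypothetical class whose instances are ordered by their name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Label):
            return NotImplemented
        return self.name == other.name

    def __lt__(self, other):
        if not isinstance(other, Label):
            return NotImplemented
        return self.name < other.name

    def __hash__(self):
        # Defining __eq__() removes the default __hash__(), so we restore
        # hashability to keep instances usable as dictionary keys.
        return hash(self.name)


print(Label("Alexanderplatz") < Label("Mauerpark"))
print(Label("Mauerpark") >= Label("Alexanderplatz"))
```

For the route filter in this tutorial any consistent ordering works, which is why the shortcut via `hash()` is sufficient.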

As the generator expression below shows, combining the `permutations()` function with an `if` check results in the correct number of routes to be iterated over.

In [None]:
sum(1 for route in permutations(places) if hash(route[0]) < hash(route[-1]))

To implement our brute force algorithm, we split the logic into two methods.

First, we create an instance method `evaluate()` that takes a `route` argument that is a tuple of `Place` instances and returns the total distance of the route. Observe that this method uses the `distances` property repeatedly, which is why we built in caching above.

Second, we create an instance method `brute_force()` that needs no arguments and iterates over all possible routes to find the shortest. Note that although we assumed the `start` and `end` nodes to be the same node when we reduced the case to a TSP, we need to treat them in a special way because each sight lies at a different distance from each of the two airports. We achieve this by deriving two routes out of every permutation of intermediate nodes, one for each direction we could take.

### The Map Class revisited: Travelling Salesman Problem

In [None]:
class Map:
    """A map with plotting and routing capabilities."""

    def __init__(self, name, center, start, end, places, initial_zoom=12):
        ...

    def __repr__(self):
        ...

    def show(self):
        ...

    def add_marker(self, marker):
        ...

    @property
    def distances(self):
        ...

    def evaluate(self, route):
        """Calculate the total distance of a route.

        Args:
            route (tuple of Places): the ordered nodes of a tour
        Returns:
            int
        """
        cost = ...  # Initialize to a start value that makes sense
        # Iterate over all pairs of nodes
        origin = ...  # use the first element as the first origin
        for destination in ...:  # iterate over the remaining tuple elements
            cost += self.distances[...]  # look up the distance for the OD pair
            origin = ...  # update the origin before the next iteration
        return cost

    def brute_force(self):
        """Calculate the shortest route by brute force."""
        # Assume a very high cost to start with
        min_cost = ...  # Initialize to a start value that makes sense
        # Find all permutations of intermediate nodes to visit
        for permutation in (x for x in permutations(...)  # iterate over all permutations
                              if hash(...) < hash(...)):  # of intermediate nodes as in the
                                                          # above generator expression
            # Travel through the intermediate nodes in both directions
            for route in (permutation, permutation[::-1]):
                # Check if a route is cheaper than all routes seen before
                route = (...,) + route + (...,)  # extend the route tuple to
                                                 # include the start and end
                cost = ...  # calculate the cost for the route
                if cost < min_cost:
                    min_cost = ...  # update the minimal cost and
                    best_route = ...  # best route seen so far
        # Plot the route on the map
        folium.PolyLine(
            [x.location for x in best_route],
            color="orange", weight=3, opacity=1
        ).add_to(self._map)
        # Enable method chaining
        return self
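
The two-step logic (evaluate a route, then search over all routes) can also be sketched independently of the `Map` class. The version below is a simplified illustration under stated assumptions: it uses hypothetical points on a grid with Manhattan distances, and it iterates over all permutations instead of applying the `hash()` filter combined with reversing each permutation.

```python
from itertools import permutations

# Hypothetical toy data: all points lie on a straight line.
coords = {"start": (0, 0), "a": (1, 0), "b": (2, 0), "c": (3, 0), "end": (4, 0)}


def dist(first, second):
    """Manhattan distance between two named points (an arbitrary choice)."""
    (x1, y1), (x2, y2) = coords[first], coords[second]
    return abs(x1 - x2) + abs(y1 - y2)


def evaluate(route):
    """Sum up the distances of all consecutive pairs in the route."""
    return sum(dist(origin, destination)
               for origin, destination in zip(route, route[1:]))


def brute_force(start, end, stops):
    """Try every order of the stops and keep the cheapest route."""
    best_route, min_cost = None, float("inf")
    for permutation in permutations(stops):
        route = (start,) + permutation + (end,)
        cost = evaluate(route)
        if cost < min_cost:
            min_cost, best_route = cost, route
    return best_route, min_cost


best_route, min_cost = brute_force("start", "end", ["c", "a", "b"])
print(best_route, min_cost)
```

Iterating over every permutation already covers both travel directions, so this variant needs no reversal step; the filter used in `brute_force()` above reaches the same set of routes by a different path.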

Let's finally find a route for our tourist.

In [None]:
berlin = Map(
    "Berlin", center=(52.4915154, 13.4066838),
    start=Place(arrival).sync_from_google(api),
    end=Place(departure).sync_from_google(api),
    places=places, initial_zoom=12
)

In [None]:
berlin.brute_force().show()