# Object-Oriented Python

During this session, we will explore the Object-Oriented paradigm in Python, building on everything we did with Pandas in previous sessions. We will be working with the same data about aircraft monitoring the latest Tour de France.

In [None]:
import pandas as pd

df = pd.read_json("data/tour_de_france.json.gz")

There are three main principles around OOP:
- **encapsulation**: objects embed properties (attributes, methods);
- **interface**: objects expose and document services while hiding their inner workings;
- **factorisation**: objects/classes with similar behaviour are grouped together.

A common way of working with Python is to implement **protocols**. Protocols are informal interfaces defined by a set of methods allowing an object to play a particular role in the system. For instance, for an object to behave as an iterable, you don't need to subclass an abstract class Iterable or explicitly implement an interface Iterable: it is enough to implement the special method `__iter__`, or even just `__getitem__` (we will go through these concepts below).
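
As a minimal sketch of this idea (the `Countdown` class is a hypothetical example, not part of the session's code), an object that only implements `__getitem__` can already be iterated, provided it accepts integer indices starting at 0 and raises `IndexError` when it runs out of elements:

```python
class Countdown:
    """Hypothetical example: iterable without subclassing anything."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # Without __iter__, Python tries indices 0, 1, 2... until IndexError
        if index < 0 or index > self.start:
            raise IndexError(index)
        return self.start - index


countdown_values = list(Countdown(3))
print(countdown_values)  # [3, 2, 1, 0]
```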

Let's have a look at the built-in function `sorted`: it expects an **iterable** structure of **comparable** objects and returns a sorted list of these objects:

In [None]:
sorted([-2, 4, 0])

However, it fails when objects are not comparable:

In [None]:
sorted([-1, 1+1j, 1-2j])

We can then write our own ComparableComplex class and implement a comparison based on the modulus. The **comparable** protocol expects the `<` operator to be defined (special method: `__lt__`).

In [None]:
class ComparableComplex(complex):
    def __lt__(self, other):
        # Compare complex numbers by their modulus
        return abs(self) < abs(other)


# Now this works: note the input is not a list but a generator.
sorted(ComparableComplex(i) for i in [-1, 1 + 1j, 1 - 2j])

We will be working with different views of pandas DataFrames for trajectories and collections of trajectories. Before we go any further, let's recall two ways to factorise behaviours in Object-Oriented Programming: **inheritance** and **composition**.

The best choice is not always obvious, and it often takes experience to weigh the pros and cons of both approaches.

In our previous example, our ComparableComplex *did not offer much more* than complex numbers. As long as we don't need to compare them, we could have *put them in a list together* with regular complex numbers *without loss of generality*: after all, a ComparableComplex **is** a complex. That's a good sign for **inheritance**.

If we think about our trajectories, we will build them around pandas DataFrames. Trajectories will probably have a single attribute: the dataframe. It could be tempting to inherit from `pd.DataFrame`; it will probably work fine in the beginning but problems will occur sooner than expected (most likely with inconsistent interfaces). We **model** trajectories and collections of trajectories with dataframes, but a trajectory **is not** a dataframe. Be reasonable and go for **composition**. 

So now we can start.

- The `__init__` special method defines a constructor. `self` is necessary: it represents the current object.  
  Note that **the constructor does not return anything**.

In [None]:
class FlightCollection:
    def __init__(self, data):
        self.data = data


class Flight:
    def __init__(self, data):
        self.data = data

In [None]:
FlightCollection(df)

## Special methods

We have not done much at this point: just two classes holding a dataframe as an attribute. Even the output representation is the default one, based on the class name and the object's address in memory.

- We can **override** the special `__repr__` method (which **returns** a string: **do NOT** `print`!) in order to display a more relevant output. You may use the number of rows in the underlying dataframe, for instance.
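
To illustrate the mechanism without giving away the exercise, here is a small sketch on a hypothetical `Point` class, unrelated to the flight data. The method builds and returns a string, and Python decides when to display it:

```python
class Point:
    """Hypothetical toy class, used only to illustrate __repr__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Return the string: printing here would leave the method
        # returning None, which makes repr() raise a TypeError
        return f"Point(x={self.x}, y={self.y})"


origin = Point(0, 0)
print(repr(origin))  # Point(x=0, y=0)
```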

<div class='alert alert-warning'>
    <b>Exercise:</b> Write a relevant <code>__repr__</code> method.
</div>

In [None]:
# %load solutions/flight_repr.py


In [None]:
FlightCollection(df)

Note that we passed the dataframe to the constructor. We want to keep it that way (we will see why later). However, we may want to create a different type of constructor to read directly from the JSON file. There is a special decorator for that.

- `@classmethod` is a decorator to put before a method. It makes it a **class method**, i.e. you call it on the class and not on the object. The first parameter is no longer `self` (the instance) but, by convention, `cls` (the class).
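
As a sketch of this pattern on a hypothetical `Temperature` class (not part of the session's code), a class method can serve as an alternative constructor: it prepares the argument and then calls `cls(...)`, which runs the regular `__init__`:

```python
class Temperature:
    """Hypothetical toy class, used only to illustrate @classmethod."""

    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is the class itself: calling it runs the regular constructor
        return cls((fahrenheit - 32) * 5 / 9)


# Called on the class, not on an instance
boiling = Temperature.from_fahrenheit(212)
print(boiling.celsius)  # 100.0
```

Because the method calls `cls` rather than `Temperature`, a subclass calling `from_fahrenheit` gets an instance of the subclass.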

<div class='alert alert-warning'>
    <b>Exercise:</b> Write a relevant <code>read_json</code> class method.
</div>

In [None]:
# %load solutions/flight_json.py


In [None]:
collection = FlightCollection.read_json("data/tour_de_france.json.gz")

Now we want to make this `FlightCollection` iterable.

- The special method to implement is `__iter__`. This method takes no argument other than `self` and **yields** elements one after the other.
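
Here is a minimal sketch of a generator-based `__iter__`, written on a hypothetical `Playlist` class rather than on the flight data:

```python
class Playlist:
    """Hypothetical toy class wrapping a list of titles."""

    def __init__(self, titles):
        self.titles = titles

    def __iter__(self):
        # Each yield hands one element to the caller, then execution
        # resumes here when the next element is requested
        for title in self.titles:
            yield title.upper()


playlist = Playlist(["intro", "outro"])
for title in playlist:
    print(title)
```

Each call to `__iter__` creates a fresh generator, so the same object can be iterated several times.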

<div class='alert alert-warning'>
    <b>Exercise:</b> Write a relevant <code>__iter__</code> method which yields Flight instances.
</div>

Of course, you should reuse the code from the last session about iteration.

In [None]:
# %load solutions/flight_iter.py


In [None]:
collection = FlightCollection.read_json("data/tour_de_france.json.gz")

for flight in collection:
    print(flight)

<div class='alert alert-warning'>
    <b>Exercise:</b> Write a relevant <code>__repr__</code> method for Flight including the callsign, the aircraft icao24 code and the day of the flight.
</div>

In [None]:
# %load solutions/flight_nice_repr.py


In [None]:
for flight in collection:
    print(flight)

<div class='alert alert-success'>
    <b>Note:</b> Since our FlightCollection is iterable, we can pass it to any function accepting iterable structures.
</div>

In [None]:
list(collection)

<div class='alert alert-warning'>
    <b>Warning:</b> However, sorting will not work here, because Flight instances cannot be compared unless we specify the criterion on which to compare them.
</div>

In [None]:
sorted(collection)

In [None]:
sorted(collection, key=lambda x: x.min("timestamp"))

<div class='alert alert-warning'>
    <b>Exercise:</b> Implement the missing method so that a FlightCollection can be sorted.
</div>

In [None]:
# %load solutions/flight_sort.py


In [None]:
sorted(collection)

## Data visualisation

See the following snippet of code for plotting trajectories on a map.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
from cartopy.crs import EuroPP, PlateCarree

fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(projection=EuroPP()))
ax.coastlines("50m")

for flight in collection:
    flight.data.plot(
        ax=ax,
        x="longitude",
        y="latitude",
        legend=False,
        transform=PlateCarree(),
        color="steelblue",
    )

ax.set_extent((-5, 10, 42, 52))
ax.set_yticks([])

<div class='alert alert-warning'>
    <b>Exercise:</b> Implement a plot method to make the job even simpler.
</div>

In [None]:
# %load solutions/flight_plot.py


In [None]:
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(projection=EuroPP()))
ax.coastlines("50m")

for flight in collection:
    flight.plot(ax, color="steelblue")

ax.set_extent((-5, 10, 42, 52))
ax.set_yticks([])

## Indexation

So far, we have implemented everything that is necessary to iterate over structures.  
This means we have all we need to yield elements one after the other.

Note that:
- Python does not assume your structure has a length.  
  (There are some infinite iterators, like the one yielding natural integers one after the other.)
- Python cannot guess for you how you want to index your flights.
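
The first point can be checked with `itertools.count` from the standard library, which yields integers forever. This is only a small sketch, independent of the flight data:

```python
from itertools import count, islice

naturals = count()  # yields 0, 1, 2, ... without ever stopping

# We can still consume the first few elements lazily
first_five = list(islice(naturals, 5))
print(first_five)  # [0, 1, 2, 3, 4]

# Asking for a length fails: count objects define no __len__
try:
    len(naturals)
except TypeError as exc:
    print(f"TypeError: {exc}")
```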


In [None]:
len(collection)

In [None]:
collection['ASR172B']

There are many ways to proceed with indexing. We may want to select flights with a specific callsign, or a specific icao24 code. Also, if only one Flight is returned, we want a Flight object. If two or more segments are contained in the underlying dataframe, we want to stick to a FlightCollection.
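
The idea of returning either a single element or a smaller collection can be sketched on plain strings. The `WordCollection` class below is hypothetical, and selecting by first letter is only an assumption made for the illustration:

```python
class WordCollection:
    """Hypothetical toy class: a collection of words indexed by prefix."""

    def __init__(self, words):
        self.words = words

    def __len__(self):
        return len(self.words)

    def __getitem__(self, prefix):
        selected = [w for w in self.words if w.startswith(prefix)]
        if not selected:
            raise KeyError(prefix)
        if len(selected) == 1:
            # A single match: return the element itself
            return selected[0]
        # Several matches: stick to a (smaller) collection
        return WordCollection(selected)


words = WordCollection(["apple", "avocado", "banana"])
print(len(words))       # 3
print(words["b"])       # banana
print(len(words["a"]))  # 2
```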

<div class="alert alert-warning">
    <b>Exercise:</b> Implement a <code>__len__</code> special method, then a <code>__getitem__</code> special method that will return a Flight or a FlightCollection (depending on the selection) wrapping data corresponding to the given callsign or icao24 code.
</div>

In [None]:
# %load solutions/flight_index.py


In [None]:
collection = FlightCollection.read_json("data/tour_de_france.json.gz")
collection

In [None]:
collection["3924a0"]

In [None]:
collection["ASR172B"]

In [None]:
from collections import defaultdict

count = defaultdict(int)
for flight in collection["ASR172B"]:
    count[flight.icao24] += 1

count

As we can see here, this method for indexing is not convenient enough. We could select the only flight `collection["ASR172B"]["3924a0"]` but with the current implementation, there is no way to separate the 18 other flights.

<div class='alert alert-warning'>
    <b>Exercise:</b> Implement a different <code>__getitem__</code> method that checks the type of the index: filter on callsign/icao24 if the key is a <code>str</code>, filter on the day of the flight if the key is a <code>pd.Timestamp</code>.
</div>
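
As a hint, checking the type of the key with `isinstance` may look like the following sketch. The `Logbook` class and its `(name, day)` entries are hypothetical and use the standard `datetime.date` rather than `pd.Timestamp`:

```python
from datetime import date


class Logbook:
    """Hypothetical toy class holding (name, day) tuples."""

    def __init__(self, entries):
        self.entries = entries

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, key):
        # Dispatch on the type of the key
        if isinstance(key, str):
            selected = [e for e in self.entries if e[0] == key]
        elif isinstance(key, date):
            selected = [e for e in self.entries if e[1] == key]
        else:
            raise TypeError(f"unsupported key type: {type(key).__name__}")
        return Logbook(selected)


logbook = Logbook(
    [
        ("alpha", date(2019, 7, 17)),
        ("alpha", date(2019, 7, 18)),
        ("bravo", date(2019, 7, 18)),
    ]
)
print(len(logbook["alpha"]))                    # 2
print(len(logbook[date(2019, 7, 18)]))          # 2
print(len(logbook["alpha"][date(2019, 7, 18)])) # 1
```

Since each selection returns a new `Logbook`, selections can be chained.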

In [None]:
# %load solutions/flight_index_time.py


In [None]:
collection = FlightCollection.read_json("data/tour_de_france.json.gz")

In [None]:
collection["ASR172B"][pd.Timestamp("2019-07-18")]

<div class='alert alert-warning'>
    <b>Exercise:</b> Plot all trajectories flying on July 18th. How can they be sure not to collide with each other?
</div>

In [None]:
# %load solutions/flight_plot_july18.py
