![IE](../img/ie.png)

# Sessions 7 & 8: Object-Oriented Programming

### Juan Luis Cano Rodríguez <jcano@faculty.ie.edu> - Master in Business Analytics and Big Data (2019-04-08)

## What are "objects" anyway?

So far we have learned how to define variables, functions and modules in Python, and we have been using objects defined in other libraries, for example pandas `DataFrame`s or matplotlib `Figure`s. In very simple terms, an *object* is something that can optionally have:

* Object-bound variables, called **properties**
* Object-bound functions, called **methods**

If the object *properties* can change, we say the object has a **state**, and also that it's **mutable**. Otherwise, it's **stateless** and **immutable**. A typical example of this difference is lists (mutable) versus tuples (immutable):

In [1]:
my_list = [1, 2, 3]
my_list

[1, 2, 3]

In [2]:
my_list.append(4)
my_list

[1, 2, 3, 4]

In [3]:
# The operator that creates tuples is not the parentheses:
# it is the comma!
my_tuple = 1, 2, 3  # Notice that I don't need parentheses!
my_tuple

(1, 2, 3)

In [4]:
print(dir(my_tuple))  # Nothing that allows us to change the state of the tuple

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']


In [5]:
my_tuple[0] = 99

TypeError: 'tuple' object does not support item assignment

<div class="alert alert-info">Immutable objects have the advantage that they can usually be <strong>hashed</strong>, that is: they can be transformed, using a hash function, into an integer that identifies the value of that object (equal objects always produce equal hashes, although two different objects may occasionally share one). Built-in mutable containers such as lists can't, because the hash would have to change every time the state of the object changed. <strong>Dictionary keys have to be hashable objects.</strong></div>

In [6]:
{
    my_tuple: "my_tuple"
}

{(1, 2, 3): 'my_tuple'}

In [7]:
hash(my_tuple)

2528502973977326415

In [8]:
{
    my_list: "my_list"
}

TypeError: unhashable type: 'list'

In [9]:
hash(my_list)

TypeError: unhashable type: 'list'
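As a small complement to the cells above, here is a sketch of two details that are easy to miss: equal tuples produce equal hashes, and a tuple is only hashable when everything inside it is hashable too. The variable names are made up for illustration.

```python
# Equal tuples produce equal hashes, so either one finds the same dictionary key
point_a = (1, 2, 3)
point_b = (1, 2, 3)
same_hash = hash(point_a) == hash(point_b)
print(same_hash)

# A tuple containing a list cannot be hashed, even though the tuple is immutable
nested = (1, [2, 3])
try:
    hash(nested)
    nested_is_hashable = True
except TypeError as exc:
    nested_is_hashable = False
    print(f"TypeError: {exc}")

# frozenset is the immutable (and therefore hashable) counterpart of set
tags = frozenset({"a", "b"})
lookup = {tags: "found"}
print(lookup[frozenset({"b", "a"})])
```

The last line works because two frozensets with the same elements are equal and hash to the same value, regardless of the order in which the elements were written.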

## Classes and instances

Objects are defined by **instantiating a class**. A **class** is a *template* for new objects, where we define its behavior, and an **instance** is a particular realization of that class.

![Metaphors](../img/mould.jpg)

### Example

We want to model the behavior of the users of our company's product, to later study how much time they spend, what their preferences are and so forth. Let's create a `User` class:

In [10]:
class User:
    pass

type(User)

type

Our `User` **class** is of type `type`, which means that it can be used to create new objects. Now, let's create two instances:

In [11]:
user1 = User()
user2 = User()

In [12]:
user1

<__main__.User at 0x7fb228120630>

In [13]:
user2

<__main__.User at 0x7fb2281205f8>

In [14]:
type(user1), type(user2)

(__main__.User, __main__.User)

We have two **instances** of `User`: `user1` and `user2`. With a slight abuse of notation, we would say we have *two `User` objects*, or just *two `User`s*.

### Using the instance: `self`

Let's add a very simple **method** to demonstrate a very important concept in Python: the *explicit `self`*. Remember that a method is like a function that is bound to the object, and can use its properties. Methods are defined like this:

In [21]:
class User:
    def test(self):
        print(f"This is {self}")

In [22]:
user1 = User()
user1.test()

This is <__main__.User object at 0x7fb22d376080>


Why are methods (instead of plain functions) interesting? Because of **duck typing**:

> "If it walks like a duck and it quacks like a duck, then it must be a duck"
> -- https://en.wikipedia.org/wiki/Duck_typing

If something has a method that I need, I don't care about its type.

In [15]:
def do_stuff(obj):
    return obj.mean()

In [17]:
import numpy as np
import pandas as pd

print(do_stuff(np.arange(5)))
print(do_stuff(pd.Series([0, 1, 2, 3, 4])))

2.0
2.0
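The same idea works with our own classes. The following sketch uses two hypothetical, unrelated classes (`Duck` and `Robot`) that happen to share a `speak` method; the function never checks their type.

```python
class Duck:  # hypothetical class, for illustration only
    def speak(self):
        return "Quack!"


class Robot:  # unrelated to Duck: no common parent besides object
    def speak(self):
        return "Beep!"


def make_it_speak(thing):
    # No isinstance() check: we only rely on the .speak() method being there
    return thing.speak()


sounds = [make_it_speak(obj) for obj in (Duck(), Robot())]
print(sounds)

# An object without the method fails only at the moment we try to use it
try:
    make_it_speak(42)
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

The price of this flexibility is that mistakes show up at runtime, when the missing method is actually called.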


Notice how we called `user1.test()` **without passing an extra argument**? This is because Python is automatically passing the instance. It's the equivalent of doing this (**never do this**):

In [23]:
User.test(user1)

This is <__main__.User object at 0x7fb22d376080>


In fact, if we define a method without a first parameter, it will fail when we call it:

In [24]:
class TestClass:
    def test():
        pass  # Don't do anything

t = TestClass()
t.test()  # Fails

TypeError: test() takes 0 positional arguments but 1 was given

This first parameter can be called anything, but **everybody uses `self`**. Remember, conventions are important to minimize surprise and enhance collaboration!
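To see where the instance "travels", it can help to look at a method without calling it. This is a minimal sketch with a hypothetical `Greeter` class: accessing the method through the instance gives a *bound method* that remembers the instance, whereas accessing it through the class gives a plain function.

```python
class Greeter:  # hypothetical class, for illustration only
    def hello(self):
        return f"hello from {type(self).__name__}"


g = Greeter()

bound = g.hello              # not called yet: this is a bound method
print(bound.__self__ is g)   # the instance is stored inside the bound method
print(bound())               # so no argument is needed when calling it

plain = Greeter.hello        # accessed through the class: a plain function
print(plain(g))              # here the instance must be passed by hand
```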

### Intermezzo: f-strings

In [18]:
f"This is {user1}"  # Python >= 3.6

'This is <__main__.User object at 0x7fb228120630>'

In [19]:
"This is {}".format(user1)  # Python < 3.6, equivalent

'This is <__main__.User object at 0x7fb228120630>'
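Besides inserting values, f-strings accept conversion flags and format specifications after the expression. A short sketch with made-up values:

```python
name = "John Doe"
ratio = 0.87654

quoted = f"{name!r}"        # !r applies repr(), so strings keep their quotes
rounded = f"{ratio:.2f}"    # format spec: two decimal places
percent = f"{ratio:.1%}"    # the same number shown as a percentage
aligned = f"{name:>12}|"    # right-aligned in a field of 12 characters

print(quoted)
print(rounded)
print(percent)
print(aligned)
```

The same format specifications work with `str.format`, for example `"{:.2f}".format(ratio)`.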

In [43]:
User.test(user1)  # DON'T use! (Although it's equivalent)

This is <__main__.User object at 0x7fbf1e8424e0>


In [36]:
# %timeit User.test(user1)  # They have about the same performance 
# %timeit user1.test()

### Initializing our instances

Our `User` objects are not very useful yet. We will now add some properties, like their `name` and their `signup_date`. Nothing stops me from adding any property to my objects:

In [25]:
...

Ellipsis

In [26]:
user1.this_property = ...  # "99", whatever

In [27]:
user1.this_property

Ellipsis

However, this is considered bad practice, and can confuse editors and static analysis tools. These properties should be specified on creation, so that I cannot have a user without `name` and `signup_date`. For that, Python provides us with a special method, `__init__`<sup>1</sup>, that **initializes**<sup>2</sup> the object:

In [28]:
class User:
    # "dunder init" = double underscore init
    def __init__(self, name, signup_date):
        self.name = name
        self.signup_date = signup_date

In [29]:
import datetime as dt

In [30]:
user1 = User(name="John Doe", signup_date=dt.datetime.now())

In [31]:
user1.name, user1.signup_date

('John Doe', datetime.datetime(2019, 4, 11, 11, 51, 15, 466843))

<div class="alert alert-warning"><sup>1</sup>Not to be confused with the <code>__init__.py</code> file we used to put our code in!</div>
<div class="alert alert-warning"><sup>2</sup>Sometimes this method is called the <em>constructor</em>, but strictly speaking, in Python the constructor is <code>__new__</code>, and you should rarely need to override it. The difference is that the constructor <em>returns an instance</em>, whereas the initializer <em>works with an already created instance and should return <code>None</code></em>.</div>
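The order in which the constructor and the initializer run can be observed with a small experiment. This is only a sketch with a hypothetical `Traced` class that records each call; in everyday code you would normally write `__init__` alone.

```python
class Traced:  # hypothetical class, for illustration only
    calls = []  # class-level list recording which special methods ran

    def __new__(cls, *args, **kwargs):
        cls.calls.append("__new__")
        # Constructor: creates and returns the (still empty) instance
        return super().__new__(cls)

    def __init__(self, value):
        type(self).calls.append("__init__")
        # Initializer: receives the instance already created and fills it in
        self.value = value


traced = Traced(10)
print(Traced.calls)
print(traced.value)
```

`__new__` runs first and hands the new instance to `__init__`, which is why `self` already exists inside the initializer.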

That's something! However, there are several things we can improve:

* It can be cumbersome to specify the date every time, and it would be nice to have some default.
* The default representation of the instances contains some hexadecimal memory address and nothing else. It would be nice to at least see the user name and the signup date.
* Nothing stops me from changing the `name` and `signup_date` of an existing user:

In [32]:
user1.name = "John Doe Jr."
user1.name

'John Doe Jr.'

### Exercise

* Make `signup_date` optional by providing a default value (be careful, there's a trap!)
* Make the `__repr__` method return a string containing the `name` and `signup_date`, which will override the default representation

In [34]:
class User:
    def __init__(self, name, signup_date=None):
        if signup_date is None:
            signup_date = dt.datetime.now()

        self.name = name
        self.signup_date = signup_date

    def __repr__(self):
        return f"User(name='{self.name}', signup_date={repr(self.signup_date)})"

In [35]:
user1 = User("John Doe")
user1

User(name='John Doe', signup_date=datetime.datetime(2019, 4, 11, 12, 7, 35, 803057))

<div class="alert alert-danger">Watch out for default parameter values! They are evaluated only once, <strong>when the function is defined</strong>, and then shared by every call:</div>

In [36]:
def foo(my_list=[1, 2, 3]):
    my_list.append(4)
    print(my_list)

In [37]:
foo()

[1, 2, 3, 4]


In [38]:
foo()

[1, 2, 3, 4, 4]


In [39]:
foo()

[1, 2, 3, 4, 4, 4]


In [40]:
def foo(date=dt.datetime.now()):
    print(date)

In [41]:
foo()

2019-04-11 12:07:41.733726


In [42]:
foo()

2019-04-11 12:07:41.733726
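The usual way around this trap is the `None` sentinel already used in `User.__init__`. The sketch below applies it to the list example; `append_four` and `broken` are hypothetical names chosen for illustration.

```python
def append_four(my_list=None):
    if my_list is None:
        my_list = [1, 2, 3]  # a fresh list is created on every call
    my_list.append(4)
    return my_list


first = append_four()
second = append_four()
print(first, second)       # both calls start from a new [1, 2, 3]
print(first is second)     # and they do not share the same list object


def broken(my_list=[1, 2, 3]):
    my_list.append(4)
    return my_list


broken()
# The default value lives on the function object itself, which is why it persists
print(broken.__defaults__)
```

Inspecting `__defaults__` shows that the mutated list is stored on the function, so every later call without arguments sees the accumulated changes.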


### Extra: date formatting

In [43]:
dt.datetime.now().isoformat()  # ISO 8601
# If you don't like it, there's http://strftime.org/

'2019-04-11T12:08:13.182415'

In [44]:
user1.signup_date.strftime("%Y ::: %d")

'2019 ::: 11'
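Formatting also has an inverse: `strptime` parses a string back into a `datetime` using the same format codes. A minimal sketch with a fixed, made-up date (note that `fromisoformat` needs Python 3.7 or later):

```python
import datetime as dt

moment = dt.datetime(2019, 4, 11, 12, 8, 13)

# Format with explicit codes: day/month/year hours:minutes
text = moment.strftime("%d/%m/%Y %H:%M")
print(text)

# Parse it back with the same format string; the seconds were never written,
# so they come back as zero
parsed = dt.datetime.strptime(text, "%d/%m/%Y %H:%M")
print(parsed)

# ISO 8601 strings round-trip without specifying any format
restored = dt.datetime.fromisoformat(moment.isoformat())
print(restored == moment)
```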

### Protecting properties

In Python, *there are no private attributes* (neither properties nor methods), and in fact everything can be accessed<sup>1</sup>. However, we can "hide" them by default in autocomplete and other environments by using a leading underscore `_`: such attributes are usually called **protected**.

A common pattern for making a property read-only is to:

1. Make it protected
2. Create a "getter" using the `@property` decorator, which gets the value of the protected property with a public name

<small><sup>1</sup>This philosophy used to be summarized by the sentence "we are all consenting adults here", which is used less often nowadays.</small>

In [45]:
class User:
    def __init__(self, name, signup_date=None):
        if signup_date is None:
            signup_date = dt.datetime.now()

        self._name = name
        self._signup_date = signup_date

    @property
    def name(self):
        return self._name

    @property
    def signup_date(self):
        return self._signup_date

    def __repr__(self):
        return f"User(name='{self.name}', signup_date='{self.signup_date}')"

In [46]:
user1 = User("John Doe")

In [47]:
user1.name

'John Doe'

In [48]:
user1.name = "Jane Doe"

AttributeError: can't set attribute
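Keep in mind that the getter only protects the *public* name. The sketch below, using a hypothetical `Account` class built with the same pattern, shows that the underlying protected attribute can still be reassigned; the leading underscore is a convention, not a lock.

```python
class Account:  # hypothetical class following the same read-only pattern
    def __init__(self, owner):
        self._owner = owner

    @property
    def owner(self):
        return self._owner


acc = Account("John Doe")

# Assigning through the public name is rejected, as with User.name
try:
    acc.owner = "Jane Doe"
    blocked = False
except AttributeError:
    blocked = True
print(blocked)

# Assigning the protected attribute works, but breaks the convention
acc._owner = "Jane Doe"
print(acc.owner)
```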

<div class="alert alert-warning">If you see tutorials mentioning "true private variables", they are wrong!</div>

In [49]:
class Test:
    def __init__(self, name):
        self.__name = name  # Not what you think!

In [50]:
t1 = Test("This name")

In [51]:
t1.__name

AttributeError: 'Test' object has no attribute '__name'

In [52]:
t1._Test__name  # These are *NOT* "private" properties

'This name'
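What the double underscore actually does is *name mangling*: inside a class body, `__name` is rewritten to `_ClassName__name`. Its purpose is to avoid accidental clashes between a class and its subclasses, not to hide data. A sketch with two hypothetical classes:

```python
class Base:  # hypothetical class, for illustration only
    def __init__(self):
        self.__token = "base"    # stored as _Base__token

    def base_token(self):
        return self.__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"   # stored as _Child__token: no clash with Base

    def child_token(self):
        return self.__token


c = Child()
print(c.base_token(), c.child_token())
print(sorted(vars(c)))  # both mangled names live side by side on the instance
```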

---

### Inheritance

In [53]:
class SpecialUser(User):
    def __init__(self, name, age, signup_date=None):
        # Initializes self._name and self._signup_date
        super().__init__(name, signup_date)

        self._age = age

    @property
    def age(self):
        return self._age

    def greet(self):
        print(f"Hi! I'm {self.name}")

In [54]:
s_user1 = SpecialUser("John Doe", 27)
#s_user1

In [55]:
s_user1.name

'John Doe'

In [56]:
s_user1.greet()

Hi! I'm John Doe
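A subclass can also *override* a method of its parent and still reuse the original through `super()`. The following sketch uses trimmed-down, hypothetical classes (`MiniUser` and `MiniSpecialUser`) so that it runs on its own.

```python
class MiniUser:  # hypothetical, trimmed-down version of User
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        return self._name

    def describe(self):
        return f"user {self.name}"


class MiniSpecialUser(MiniUser):
    def describe(self):
        # Override the parent method, reusing its result through super()
        return "special " + super().describe()


su = MiniSpecialUser("John Doe")
print(su.describe())

# An instance of the subclass is also an instance of the parent class
print(isinstance(su, MiniUser))
print(issubclass(MiniSpecialUser, MiniUser))

# The method resolution order lists where Python looks for attributes
mro_names = [cls.__name__ for cls in MiniSpecialUser.__mro__]
print(mro_names)
```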


<div class="alert alert-warning">Python supports multiple inheritance as well, which must be handled with care: see for example the <a href="https://www.wikiwand.com/en/Multiple_inheritance#/The_diamond_problem">Diamond problem</a>.</div>
<div class="alert alert-warning">Now that you have discovered inheritance, you might be tempted to use it everywhere. Lots of very subtle mistakes can be introduced by abusing inheritance or using it in wrong ways, see for example <a href="https://softwareengineering.stackexchange.com/a/238184/15297">this amusing story</a>, which explains the <a href="https://www.wikiwand.com/en/Liskov_substitution_principle">(Barbara) Liskov substitution principle</a>, and this article about <a href="http://www.thedigitalcatonline.com/blog/2014/08/20/python-3-oop-part-3-delegation-composition-and-inheritance/">composition and inheritance</a>.</div>
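As a rough illustration of composition, consider a hypothetical `Team` class that must reject duplicate members. Instead of inheriting from `list` (and exposing `insert`, `extend` and every other way of bypassing the check), it *contains* a list and exposes only the operations it can guarantee.

```python
class Team:  # hypothetical class, for illustration only
    def __init__(self):
        self._members = []  # a Team *has* a list, it *is not* a list

    def add(self, name):
        if name in self._members:
            raise ValueError(f"{name} is already in the team")
        self._members.append(name)

    def __len__(self):
        return len(self._members)


team = Team()
team.add("Ana")
team.add("Luis")
print(len(team))

# List methods that would skip the duplicate check are simply not available
print(hasattr(team, "insert"))
```

This is one possible design, not a rule: inheritance remains a good fit when the subclass really can stand in for its parent everywhere.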

### More special methods

https://docs.python.org/3/reference/datamodel.html

In [57]:
class WickedList(list):
    def __len__(self):
        return 42

In [58]:
a = WickedList([1, 2, 3])

In [59]:
a

[1, 2, 3]

In [60]:
len(a)

42

In [64]:
class Number:
    def __add__(self, other):
        return 42

In [65]:
num = Number()

In [66]:
num + 1

42
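The toy classes above return a constant on purpose. A slightly more realistic sketch is a hypothetical `Vector2D` class that implements `__repr__`, `__eq__` and `__add__` together, returning `NotImplemented` when the other operand is of an unsupported type so that Python can raise the usual `TypeError`.

```python
class Vector2D:  # hypothetical class, for illustration only
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vector2D({self.x}, {self.y})"

    def __eq__(self, other):
        if not isinstance(other, Vector2D):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):
        if not isinstance(other, Vector2D):
            return NotImplemented  # lets Python raise TypeError for v + 1
        return Vector2D(self.x + other.x, self.y + other.y)


total = Vector2D(1, 2) + Vector2D(3, 4)
print(total)
print(total == Vector2D(4, 6))

try:
    total + 1
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Note that a class defining `__eq__` without `__hash__` becomes unhashable, so these vectors cannot be used as dictionary keys, which is consistent with them being mutable.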