# Advanced Python & Object-Oriented Python

Contains material from: *Effective Python: 59 Specific Ways to Write Better Python*, by Brett Slatkin, Addison-Wesley, 2015.

---

> Panos Louridas, Associate Professor <br />
> Department of Management Science and Technology <br />
> Athens University of Economics and Business <br />
> louridas@aueb.gr

## Objects and Attributes

* Programmers coming from other programming languages are used to writing getter and setter methods.

* So for example they may be tempted to write something like the following.

In [1]:
class OldResistor(object):
    
    def __init__(self, ohms):
        self._ohms = ohms
        
    def get_ohms(self):
        return self._ohms
    
    def set_ohms(self, ohms):
        self._ohms = ohms

* This, however, leads to the following kind of code, which is not Pythonic at all.

In [2]:
r0 = OldResistor(50e3)
print('Before: %5r' % r0.get_ohms())
r0.set_ohms(10e3)
print('After: %5r' % r0.get_ohms())


Before: 50000.0
After: 10000.0


* This can lead to the following clumsy code:

In [3]:
r0.set_ohms(r0.get_ohms() + 5e3)
r0.get_ohms()

15000.0

* In Python, we start by implementing our objects with simple public attributes.

In [4]:
class Resistor(object):
    
    def __init__(self, ohms):
        self.ohms = ohms
        self.voltage = 0
        self.current = 0

r1 = Resistor(50e3)
print(r1.ohms)
r1.ohms = 10e3
print(r1.ohms)
r1.ohms += 5e3
print(r1.ohms)

50000.0
10000.0
15000.0


* If we want to enforce some special behavior when an attribute is set, we can use the `@property` decorator.

* In the following example, we subclass `Resistor` so that we can modify the `current` by changing the `voltage`.

In [5]:
class VoltageResistance(Resistor):
    
    def __init__(self, ohms):
        super().__init__(ohms)

    @property
    def voltage(self):
        return self._voltage
    
    @voltage.setter
    def voltage(self, voltage):
        self._voltage = voltage
        self.current = self._voltage / self.ohms

* Now, assigning to the `voltage` property will run the voltage setter method, which will update the `current` attribute.

In [6]:
r2 = VoltageResistance(1e3)
print('Before: %5r amps' % r2.current)
r2.voltage = 10
print('After: %5r amps' % r2.current)

Before:     0 amps
After:  0.01 amps


* Specifying a setter on a property allows us to perform type checking and validation. 

* For example, here is how we can define a class that ensures that all resistance values are above zero ohms.

In [7]:
class BoundedResistance(Resistor):
    
    def __init__(self, ohms):
        super().__init__(ohms)
        
    @property
    def ohms(self):
        return self._ohms
    
    @ohms.setter
    def ohms(self, ohms):
        if ohms <= 0:
            raise ValueError('%f ohms must be > 0' % ohms)
        self._ohms = ohms

* Now if we try to assign an invalid resistance we'll get an exception.

In [8]:
r3 = BoundedResistance(1e3)
r3.ohms = 0

ValueError: 0.000000 ohms must be > 0
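* The setter also runs when the attribute is assigned inside `__init__()`, so an invalid value is rejected at construction time as well. The following is a minimal, self-contained sketch that repeats the two classes from above; the value `-5` is only an illustrative choice.

```python
class Resistor:

    def __init__(self, ohms):
        # In BoundedResistance this assignment goes through the property setter.
        self.ohms = ohms
        self.voltage = 0
        self.current = 0


class BoundedResistance(Resistor):

    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, ohms):
        if ohms <= 0:
            raise ValueError('%f ohms must be > 0' % ohms)
        self._ohms = ohms


try:
    BoundedResistance(-5)   # illustrative invalid value
except ValueError as e:
    message = str(e)
    print(message)
```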

* We can even use the `@property` decorator to make attributes from parent classes immutable:

In [9]:
class FixedResistance(Resistor):
    
    def __init__(self, ohms):
        super().__init__(ohms)
        
    @property
    def ohms(self):
        return self._ohms
    
    @ohms.setter
    def ohms(self, ohms):
        if hasattr(self, '_ohms'):
            raise AttributeError("Can't set attribute")
        self._ohms = ohms

* Now if we try to set the property after construction, we'll get an exception.

In [10]:
r4 = FixedResistance(1e3)
r4.ohms = 2e3

AttributeError: Can't set attribute

## Decorators

* Decorators wrap a function, changing its behavior.

* In essence, when we write:
  ```python
  @my_decorator
  def my_function():
      pass
  ```
  it is equivalent to the following:
  ```python
  my_function = my_decorator(my_function)
  ```

* As an example, let us write a decorator that times a function.

In [11]:
import time

def time_execution(f):

    def decorator(*args, **kwargs):
        start = time.time()
        r = f(*args, **kwargs)
        end = time.time()
        print(end - start)
        return r

    return decorator

def fib(n):
    a, b = 0, 1
    for i in range(2, n):
        a, b = b, a + b
    return b

fib = time_execution(fib)
result = fib(1000)
print(result)

7.128715515136719e-05
26863810024485359386146727202142923967616609318986952340123175997617981700247881689338369654483356564191827856161443356312976673642210350324634850410377680367334151172899169723197082763985615764450078474174626


* In practice, instead of doing it like that, we use some syntactic sugar:

In [12]:
import time

def time_execution(f):

    def decorator(*args, **kwargs):
        start = time.time()
        r = f(*args, **kwargs)
        end = time.time()
        print(end - start)
        return r

    return decorator

@time_execution
def fib(n):
    a, b = 0, 1
    for i in range(2, n):
        a, b = b, a + b
    return b

result = fib(1000)
print(result)

9.608268737792969e-05
26863810024485359386146727202142923967616609318986952340123175997617981700247881689338369654483356564191827856161443356312976673642210350324634850410377680367334151172899169723197082763985615764450078474174626


* As an additional example, suppose that we want to print the arguments and the return value of a function call.

* This can be useful if we want to trace the function calls in a recursive function.

* This is a decorator that will do the job:

In [13]:
def trace(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r, %r) -> %r' %
             (func.__name__, args, kwargs, result))
        return result
    return wrapper

@trace
def fibonacci(n):
    """Return the n-th Fibonacci number"""
    if n in (0, 1):
        return n
    return (fibonacci(n - 2) + fibonacci(n - 1))

fibonacci(3)

fibonacci((1,), {}) -> 1
fibonacci((0,), {}) -> 0
fibonacci((1,), {}) -> 1
fibonacci((2,), {}) -> 1
fibonacci((3,), {}) -> 2


2

* This is all fine, but the value returned by the decorator doesn't know it's named `fibonacci`:

In [14]:
print(fibonacci)

<function trace.<locals>.wrapper at 0x107a39ae8>


* This happens because the `trace` function returns the `wrapper` it defines.

* Therefore it's the `wrapper` function that is assigned to the `fibonacci` name.

* This can create problems with debuggers, and also with the built-in help system.

In [15]:
help(fibonacci)

Help on function wrapper in module __main__:

wrapper(*args, **kwargs)



* To get around the problem, we use the `@wraps` decorator from the `functools` module.

* This decorator copies the metadata of the original function (its name, docstring, and so on) to the wrapper we define.

In [16]:
from functools import wraps

def trace(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r, %r) -> %r' %
             (func.__name__, args, kwargs, result))
        return result
    return wrapper

@trace
def fibonacci(n):
    """Return the n-th Fibonacci number"""
    if n in (0, 1):
        return n
    return (fibonacci(n - 2) + fibonacci(n - 1))

fibonacci(3)

fibonacci((1,), {}) -> 1
fibonacci((0,), {}) -> 0
fibonacci((1,), {}) -> 1
fibonacci((2,), {}) -> 1
fibonacci((3,), {}) -> 2


2

* Now the help system (and everything else besides) will work as intended:

In [17]:
help(fibonacci)

Help on function fibonacci in module __main__:

fibonacci(n)
    Return the n-th Fibonacci number
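* We can also inspect the copied metadata directly. The sketch below repeats the `trace` decorator without the printing, to keep the output short; it relies on the `__wrapped__` attribute that `functools.wraps` adds to the wrapper.

```python
from functools import wraps


def trace(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        # Tracing output omitted here; we only care about the metadata.
        return func(*args, **kwargs)
    return wrapper


@trace
def fibonacci(n):
    """Return the n-th Fibonacci number"""
    if n in (0, 1):
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)


print(fibonacci.__name__)              # name copied from the original function
print(fibonacci.__doc__)               # docstring copied as well
print(fibonacci.__wrapped__.__name__)  # the undecorated function is still reachable
```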



## Generator Expressions

* We have seen that list comprehensions are an important feature of Python.

* However, we must be careful if we use list comprehensions over a very large amount of data.

* The reason is that, as all results are put into a list, we may end up with a very big list containing lots of data.


* The following list comprehension iterates over the lines of this file and calculates the length of each one of them.

In [18]:
value = [ len(x) for x in open('advanced_python.ipynb')]
value

[2,
 12,
 4,
 28,
 17,
 19,
 27,
 6,
 6,
 15,
 53,
 10,
 133,
 10,
 13,
 10,
 54,
 66,
 62,
 25,
 5,
 5,
 4,
 28,
 17,
 19,
 27,
 6,
 6,
 15,
 35,
 10,
 109,
 10,
 82,
 5,
 5,
 4,
 24,
 25,
 17,
 19,
 27,
 6,
 6,
 18,
 15,
 36,
 14,
 39,
 35,
 18,
 33,
 35,
 14,
 39,
 32,
 5,
 5,
 4,
 28,
 17,
 19,
 27,
 6,
 6,
 15,
 86,
 5,
 5,
 4,
 24,
 25,
 17,
 19,
 30,
 6,
 6,
 16,
 6,
 23,
 30,
 15,
 27,
 25,
 7,
 6,
 6,
 15,
 32,
 46,
 27,
 44,
 5,
 5,
 4,
 28,
 17,
 19,
 27,
 6,
 6,
 15,
 52,
 5,
 5,
 4,
 24,
 25,
 17,
 19,
 30,
 6,
 6,
 16,
 6,
 15,
 22,
 17,
 8,
 8,
 27,
 21,
 37,
 6,
 6,
 15,
 42,
 20,
 5,
 5,
 4,
 28,
 17,
 19,
 27,
 6,
 6,
 15,
 82,
 5,
 5,
 4,
 24,
 25,
 17,
 19,
 30,
 6,
 6,
 16,
 6,
 23,
 30,
 15,
 19,
 19,
 18,
 7,
 6,
 6,
 15,
 33,
 14,
 39,
 34,
 34,
 34,
 10,
 29,
 24,
 24,
 24,
 24,
 21,
 5,
 5,
 4,
 28,
 17,
 19,
 27,
 6,
 6,
 15,
 119,
 10,
 120,
 5,
 5,
 4,
 24,
 26,
 17,
 19,
 30,
 6,
 6,
 18,
 15,
 44,
 14,
 39,
 40,
 10,
 23,
 32,
 38,
 14,
 29,
 41,
 41,
 55

* To solve the problem, Python provides *generator expressions*. 

* Generator expressions do not materialize the whole output sequence when they are run.

* Instead, they evaluate to an iterator that yields one item at a time from the expression.

* Here is a generator expression for the previous task:

In [19]:
it = (len(x) for x in open('advanced_python.ipynb'))
it

<generator object <genexpr> at 0x107a16f10>

* Now, each time we need the length of the next unread line, we call `next()`:

In [20]:
print(next(it))
print(next(it))

2
12


* A nice feature of generator expressions is that they can be chained together.

* Here we take the iterator returned by the generator expression and use it as input to another generator expression.

In [21]:
roots = ((x, x**0.5) for x in it)

print(next(roots))

(4, 2.0)
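* Note that the chained generator pulls its items from the underlying one, so both advance together and both can be consumed only once. The sketch below mirrors the cells above, but uses a small in-memory list (the hypothetical `lines`) instead of a file, so that it is self-contained.

```python
# Hypothetical stand-in for the lines of a file; lengths are 2, 12, 4 and 9.
lines = ['a\n', 'b' * 11 + '\n', 'abc\n', 'x' * 8 + '\n']

lengths = (len(x) for x in lines)
first_two = [next(lengths), next(lengths)]   # consumes the first two lengths

# Chain a second generator expression on top of the first one.
roots = ((n, n ** 0.5) for n in lengths)
third = next(roots)        # continues from where `lengths` stopped
remaining = list(roots)    # drains what is left
leftover = list(lengths)   # the underlying generator is exhausted as well

print(first_two, third, remaining, leftover)
```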


## Iterators

* An iterator is an object representing a stream of data; this object returns the data one element at a time. 

* A Python iterator must support a method called `__next__()` that takes no arguments and always returns the next element of the stream. 

* If there are no more elements in the stream, `__next__()` must raise the `StopIteration` exception. 

* Iterators don’t have to be finite, though; it’s perfectly reasonable to write an iterator that produces an infinite stream of data.

* The `next()` function retrieves the next item from an iterator by calling its `__next__()` method.

* The built-in `iter()` function takes an arbitrary object and tries to return an iterator that will return the object’s contents or elements, raising `TypeError` if the object doesn’t support iteration. 

* Several of Python’s built-in data types support iteration, the most common being lists and dictionaries.

In [22]:
L = [1,2,3]
it = iter(L)
it

<list_iterator at 0x107a4c940>

In [23]:
print(next(it))
print(next(it))

1
2
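* The sketch below walks through the rest of the protocol described above: exhausting the iterator raises `StopIteration`, `next()` accepts an optional default to return instead, and `iter()` raises `TypeError` for an object that does not support iteration. The variable names are illustrative only.

```python
L = [1, 2, 3]
it = iter(L)
values = [next(it), next(it), next(it)]   # the stream is now exhausted

try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

# A second argument to next() is returned instead of raising StopIteration.
fallback = next(it, 'done')

try:
    iter(42)                # an int does not support iteration
    int_is_iterable = True
except TypeError:
    int_is_iterable = False

print(values, exhausted, fallback, int_is_iterable)
```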


## Iterable

* An object is called an *iterable* if we can get an iterator for it.

* If an object contains an `__iter__()` method, that method will be called when we call `iter()` on the object.

* Then, the `__iter__()` method should return an iterator.

* In the statement `for X in Y`, `Y` must be an iterator or some object for which `iter()` can create an iterator. 

* These two statements are equivalent:
  ```python
  for i in iter(obj):
      print(i)
  ```

  ```python
  for i in obj:
      print(i)
  ```

* For example, let's examine the following Fibonacci class from [Dive into Python](http://www.diveintopython3.net/iterators.html):

In [24]:
class Fib:
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration 
        self.a, self.b = self.b, self.a + self.b
        return fib  

In [25]:
for n in Fib(1000):
    print(n, end=' ')

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 
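* To make the protocol explicit, the following sketch spells out roughly what the `for` loop does behind the scenes: it calls `iter()` once and then `next()` repeatedly until `StopIteration` is raised. It repeats the `Fib` class so that it runs on its own; the limit `20` is an arbitrary choice.

```python
class Fib:
    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib


it = iter(Fib(20))       # calls Fib.__iter__()
collected = []
while True:
    try:
        n = next(it)     # calls Fib.__next__()
    except StopIteration:
        break            # this is how the for loop knows when to stop
    collected.append(n)

print(collected)
```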

## Generators

* Generators are a way to create iterators.

* A generator returns an iterator that returns a stream of values.

* In particular, a function containing the `yield` keyword is a generator function.

* When you call a generator function, it doesn’t return a single value; instead it returns a generator object that supports the iterator protocol. 

In [26]:
def generate_ints(n):
    for i in range(n):
        yield i

* On executing the `yield` expression, the generator outputs the value of `i`, similar to a return statement. 

* The big difference between `yield` and a `return` statement is that on reaching a `yield` the generator’s state of execution is suspended and local variables are preserved. 

* On the next call to the generator’s `__next__()` method, the function will resume executing.

In [27]:
gen = generate_ints(3)
gen  

<generator object generate_ints at 0x1078b1b48>

In [28]:
print(next(gen))
print(next(gen))
print(next(gen))

0
1
2
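* Because the local variables survive between calls, a generator function can often replace a hand-written iterator class. As a sketch, here is one possible way to write the Fibonacci iterator from the previous section as a generator function; the name `fib_gen` is our own.

```python
def fib_gen(max):
    """Yield the Fibonacci numbers that do not exceed max."""
    a, b = 0, 1
    while a <= max:
        yield a              # execution is suspended here; a and b are kept
        a, b = b, a + b


for n in fib_gen(1000):
    print(n, end=' ')
```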


* The following is a more interesting example, where we traverse a tree in-order.

In [29]:
class Tree:
    
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

    def __repr__(self, level=0, indent="    "):
        s = level*indent + repr(self.label)
        if self.left:
            s = s + "\n" + self.left.__repr__(level+1, indent)
        if self.right:
            s = s + "\n" + self.right.__repr__(level+1, indent)
        return s
    
    def __iter__(self):
        return inorder(self)

* Create a tree from a list:

In [30]:
def tree(list):
    n = len(list)
    if n == 0:
        return None
    i = n // 2
    return Tree(list[i], tree(list[:i]), tree(list[i+1:]))

* We can then create a tree:

In [31]:
t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

* And then we can define the function that traverses the tree in-order:

In [32]:
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x

        yield t.label

        for x in inorder(t.right):
            yield x

In [33]:
for x in t:
    print(' '+x, end='')

 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
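* In Python 3.3 and later, the inner `for ... yield` loops can be replaced by `yield from`, which delegates to another generator. The sketch below is a trimmed-down variant of the classes above (no `__repr__`, and the helper takes a parameter named `items`), not the original code.

```python
class Tree:

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

    def __iter__(self):
        return inorder(self)


def tree(items):
    """Build a balanced tree from a sequence; an empty sequence gives None."""
    if not items:
        return None
    i = len(items) // 2
    return Tree(items[i], tree(items[:i]), tree(items[i + 1:]))


def inorder(t):
    if t:
        yield from inorder(t.left)    # delegate to the left subtree
        yield t.label
        yield from inorder(t.right)   # delegate to the right subtree


t = tree("ABCDEFG")
print(' '.join(t))
```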

## Threads and Processes

* Suppose we want to do something computationally intensive with Python.

* For example, see this naive number factorization algorithm.

In [34]:
def factorize(number):
    for i in range(1, number + 1):
        if number % i == 0:
            yield i

* Factoring a set of numbers serially can take quite a long time.

In [40]:
from time import time

numbers = [22139079, 21214759, 21516637, 21852285, 2139801]
start = time()
for number in numbers:
    print(list(factorize(number)))
end = time()
print('Took %.3f seconds' % (end - start))

[1, 3, 557, 1671, 13249, 39747, 7379693, 22139079]
[1, 17, 461, 2707, 7837, 46019, 1247927, 21214759]
[1, 29, 97, 2813, 7649, 221821, 741953, 21516637]
[1, 3, 5, 7, 13, 15, 21, 35, 39, 49, 65, 91, 105, 147, 195, 245, 273, 455, 637, 735, 1365, 1911, 2287, 3185, 6861, 9555, 11435, 16009, 29731, 34305, 48027, 80045, 89193, 112063, 148655, 208117, 240135, 336189, 445965, 560315, 624351, 1040585, 1456819, 1680945, 3121755, 4370457, 7284095, 21852285]
[1, 3, 713267, 2139801]
Took 5.996 seconds


* We might be tempted to speed up the computation using multiple threads.

* Threads are supported in Python via the `threading` library, so we would write something like the following.

In [41]:
from threading import Thread

class FactorizeThread(Thread):
    
    def __init__(self, number):
        super().__init__()
        self.number = number
     
    def factorize(self, number):
        for i in range(1, number + 1):
            if number % i == 0:
                yield i

    def run(self):
        self.factors = list(self.factorize(self.number))

* Then we can start a thread for factorizing each number in parallel.

In [42]:
start = time()
threads = []
for number in numbers:
    thread = FactorizeThread(number)
    thread.start()
    threads.append(thread)
    
for thread in threads:
    thread.join()
end = time()
print('Took %.3f seconds' % (end - start))

Took 7.010 seconds


* You may be surprised to see that this is taking even longer than before!

* That is because of the Global Interpreter Lock (GIL).

* The GIL synchronizes threads so that only one of them executes Python code at a time, which effectively limits their execution to a single core.

* It prevents the Python interpreter from being affected by preemptive multithreading, where one thread takes control of a program by interrupting another thread.

* Should that happen at an unexpected time, the interpreter state might get corrupted.

* Then why have threads at all in Python?

* Threads will work fine, and throughput will increase, if they make *blocking I/O calls*.

* For example, if we perform blocking I/O and intensive computations simultaneously, we may consider moving the I/O calls to threads.
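* As a rough illustration, the sketch below uses `time.sleep()` as a stand-in for a blocking I/O call (an assumption made to keep the example self-contained). While a thread is blocked in this way it releases the GIL, so the waits overlap; the function name `fake_io` and the delays are made up for the example.

```python
import time
from threading import Thread


def fake_io(delay):
    # Stand-in for a blocking I/O call such as a network request.
    time.sleep(delay)


start = time.time()
threads = [Thread(target=fake_io, args=(0.2,)) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.time() - start

# Run serially, the five calls would need about one second in total.
print('Took %.3f seconds' % elapsed)
```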

* To achieve true parallelism, we should use the `multiprocessing` library.

* That provides us with parallel processes that have a thread-like interface.

* So our example would be as follows.

In [43]:
from multiprocessing import Process

class FactorizeProcess(Process):
    
    def __init__(self, number):
        super().__init__()
        self.number = number
     
    def factorize(self, number):
        for i in range(1, number + 1):
            if number % i == 0:
                yield i

    def run(self):
        self.factors = list(self.factorize(self.number))

In [45]:
start = time()

processes = []
for number in numbers:
    process = FactorizeProcess(number)
    process.start()
    processes.append(process)
    
for process in processes:
    process.join()
end = time()
print('Took %.3f seconds' % (end - start))

Took 1.716 seconds
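* Note that each process has its own memory, so the `self.factors` attribute set inside `run()` exists only in the child process and is not visible in the parent. One possible way to get the results back is `multiprocessing.Pool`, which sends the return values to the parent for us. The sketch below assumes it is run as a script (hence the `__main__` guard); the numbers and the pool size are arbitrary.

```python
from multiprocessing import Pool


def factorize(number):
    """Return all divisors of number using naive trial division."""
    return [i for i in range(1, number + 1) if number % i == 0]


if __name__ == "__main__":
    numbers = [2139801, 1671, 2707]   # small, illustrative inputs
    with Pool(processes=2) as pool:
        # map() runs factorize in the worker processes and collects
        # the returned lists in the parent, in input order.
        results = pool.map(factorize, numbers)
    for number, factors in zip(numbers, results):
        print(number, factors)
```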


* In general, it is not a good idea to share memory between processes (that's one reason why programming with threads is error-prone).

* However, if we really want to do that, we can do it using a [Value](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Value) or an [Array](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Array).

* Moreover, for more flexibility, we can use the [multiprocessing.sharedctypes](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.sharedctypes) module.
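* As a minimal sketch of the first option, the following shares an integer counter between four processes through a `Value`. The helper `add_many` and the counts are our own choices; the lock that comes with the `Value` is held around each update because `+=` on a shared value is not atomic. Like the previous process examples, it is meant to be run as a script.

```python
from multiprocessing import Process, Value


def add_many(counter, n):
    """Increment the shared counter n times."""
    for _ in range(n):
        with counter.get_lock():   # guard the read-modify-write
            counter.value += 1


if __name__ == "__main__":
    counter = Value('i', 0)        # 'i' is the type code for a C int
    workers = [Process(target=add_many, args=(counter, 1000))
               for _ in range(4)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(counter.value)           # all four processes updated the same value
```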

* In general, if you want to distribute tasks in different processes, it is probably worth using some dedicated framework to do it.

* A popular choice is [Celery](http://www.celeryproject.org/).

* Also popular is [Luigi](https://luigi.readthedocs.io/en/stable/) (built by Spotify), which helps in building job pipelines.

## Web Service Development with Flask

* Flask is a web development micro-platform.

* It is particularly convenient for fast development and deployment of relatively small applications.

* It is very popular and is in widespread use.

* To install Flask, you only need to use `pip`:
  ```bash
  pip install flask
  ```

* This is the Hello World! program in Flask.

* It is a web server that returns a Hello World! web page.

In [6]:
from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "<h1>Hello World!</h1>"

if __name__ == "__main__":
    app.run()

 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)


* We save this program in a file, say `hello_world.py`.

* We run the program with:
  ```bash
  python hello_world.py
  ```
  
* Then the web server starts and is available at http://127.0.0.1:5000/.


* Suppose we want to have a parameterized URL.

* We would change our program as follows:

In [1]:
from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "<h1>Hello World!</h1>"

@app.route('/user/<name>')
def user(name):
    return '<h1>Hello user %s!</h1>' % name

if __name__ == "__main__":
    app.run()

 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
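* To check a route without starting the server, we can use Flask's built-in test client. The sketch below also passes the URL parameter through `escape()` from `markupsafe` (a package installed together with Flask), so that any HTML in the name is shown as text rather than interpreted; this escaping is our addition and is not part of the program above. The user name `Ada` is just an example.

```python
from flask import Flask
from markupsafe import escape

app = Flask(__name__)


@app.route('/user/<name>')
def user(name):
    # escape() turns characters such as < and > into HTML entities.
    return '<h1>Hello user %s!</h1>' % escape(name)


# The test client issues requests directly to the application object.
client = app.test_client()
response = client.get('/user/Ada')
print(response.status_code)
print(response.get_data(as_text=True))
```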


* In a proper application, we would put all the logic we need inside such a function.

* This could be connecting and reading from the database, writing to it, etc.

* Or it could consist of handling pandas data.

* Or running scikit-learn tasks and returning the results to the users.


* To deploy a Flask application on a server machine, we can put it into a standalone WSGI container.

* Probably the easiest option is to use [Gunicorn](http://gunicorn.org/).

* We can install it with pip:
  ```bash
  pip install gunicorn
  ```

* If we run:
  ```bash
  gunicorn --bind 127.0.0.1:5000 hello_world:app
  ```
  the application will start running through Gunicorn.

* We may want to start four concurrent workers to handle our requests. This we could do with:
  ```bash
  gunicorn -w 4 -b 127.0.0.1:5000 hello_world:app
  ```

* Note that the above means that our site will listen only on the local interface (127.0.0.1), so we will not be able to connect to it from the outside world!

* To do that, we need to substitute `127.0.0.1` with the real IP address of our server machine.

* Or we can have Gunicorn run behind a reverse proxy, such as [Apache](https://httpd.apache.org/) or [nginx](https://nginx.org/en/).