# DSGA1007 - Programming for Data Science Lab
- Modules
- Packages
- Object Oriented Programming

## Relationship with Clean Code and Software Architecture
Remember last class when we were talking about **software structure**? Today we will review in detail the building blocks available to create **structure** in our code.

As we discussed last week too, when you're designing software for the *long run* you want to make it **Maintainable**.

There are also other aspects involving *the people who actually use the code*; let's call them *the client*. You want *the client* to be happy, so once you give them a solution to their problem, you don't want to create more problems for them. In particular, once you have committed to a particular *way of doing things* or a particular communication *interface* with *the client*, you should try not to break it.

This notion is called [**backward compatibility**](https://en.wikipedia.org/wiki/Backward_compatibility). 

Ideally you should be able to extend your functionality without breaking that *contract* you made with *the client*.
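
As a minimal sketch of this idea, assume a hypothetical function `normalize` that clients already call with a single argument (neither the function nor its `scale` parameter comes from the course material). One common way to extend it without breaking the contract is to add a keyword argument with a default value:

```python
# Version 1 of a hypothetical library function that clients already call
# as normalize(values).
def normalize_v1(values):
    """Scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


# Version 2 adds a feature through a keyword argument with a default,
# so every existing call keeps working and returns the same result.
def normalize(values, scale=1.0):
    """Scale values so that they sum to `scale` (1.0 by default)."""
    total = sum(values)
    return [scale * v / total for v in values]


old_call = normalize([1, 1, 2])             # existing client code, unchanged
new_call = normalize([1, 1, 2], scale=100)  # new functionality, opt-in
print(old_call)  # [0.25, 0.25, 0.5]
print(new_call)  # [25.0, 25.0, 50.0]
```

Renaming the function or making `scale` a required argument would, by contrast, break every existing call.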

## Module
From the "Learning Python Book":
> The **highest-level program organization unit**, which packages program code and data for **reuse**, and provides self-contained namespaces that minimize variable name clashes across your programs.
As we’ve seen, Python programs are composed of **multiple module files linked together by import statements**, and each module file is **a package of variables**—that is, a namespace. Just as importantly, each module is a self-contained namespace: **one module file cannot see the names defined in another file unless it explicitly imports that other file**. Because of this, modules serve to minimize name collisions in your code—because each file is a self-contained namespace, the names in one file cannot clash with those in another, even if they are spelled the same way. 


To use a specific module inside your Python *script*, you need to import that module. 

Some ways to do this are:

``` 
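import <module> as <md> # client fetches the module as a whole, under the alias <md>
from <module> import <something> # client fetches a particular name from the module
importlib.reload(<module>)	# re-executes an already imported module (requires `import importlib`; the older `imp` module was removed in Python 3.12); useful when developing a module and using it within Jupyter notebooks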
```
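
For a concrete version of the forms above, here is a small sketch that uses only standard-library modules (`math` and `json` are stand-ins chosen for illustration; in practice you would reload a module you are editing yourself):

```python
import importlib
import json
import math as m            # fetch the whole module under a short alias
from math import sqrt       # fetch a single name from the module

print(m.pi)                 # names are reached through the module alias
print(sqrt(16))             # the imported name is used directly

# Reloading re-executes the module's code and returns the module object.
json = importlib.reload(json)
print(json.dumps({"reloaded": True}))
```

Note that `reload` does not update names that were copied out earlier with `from <module> import <something>`; those still point to the old objects until the `from` import is run again.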

## Package
From the Learning Python Book:
> In addition to a module name, an import can name a directory path. A directory of Python code is said to be a package, so such imports are known as package imports. In effect, a package import turns a directory on your computer into another Python namespace, with attributes corresponding to the subdirectories and module files that the directory contains.

### Package \_\_init\_\_.py Files
From the Learning Python Book:
> If you choose to use package imports, there is one more constraint you must follow: at least until Python 3.3, each directory named within the path of a package import statement must contain a file named \_\_init\_\_.py, or your package imports will fail.
> The \_\_init\_\_.py files can contain Python code, just like normal module files. Their names are special because their code is run automatically the first time a Python program imports a directory, and thus serves primarily as a hook for performing initialization steps required by the package. These files can also be completely empty, though, and sometimes have additional roles.

If you are curious, keep reading about it on page 711 of the book.
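
To see package imports and `__init__.py` in action, the following sketch builds a throwaway package in a temporary directory and imports it. The package name `demo_shapes_pkg` and its modules are hypothetical, made up for this illustration. Both modules define the names `name` and `area`, yet they do not clash because each module is its own namespace:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk:
#   <tmp>/demo_shapes_pkg/__init__.py
#   <tmp>/demo_shapes_pkg/square.py
#   <tmp>/demo_shapes_pkg/circle.py
root = Path(tempfile.mkdtemp())
pkg = root / "demo_shapes_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("initialized = True\n")
(pkg / "square.py").write_text(
    "name = 'square'\ndef area(side):\n    return side * side\n")
(pkg / "circle.py").write_text(
    "name = 'circle'\ndef area(radius):\n    return 3.14159 * radius * radius\n")

sys.path.insert(0, str(root))   # make the parent directory importable
importlib.invalidate_caches()

import demo_shapes_pkg.square
from demo_shapes_pkg import circle

print(demo_shapes_pkg.initialized)   # set when __init__.py ran on first import
print(demo_shapes_pkg.square.name, demo_shapes_pkg.square.area(3))
print(circle.name, circle.area(1))
```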

## Object Oriented Programming (OOP)
From the Learning Python Book:
> One note up front: in Python, OOP is entirely optional, and you don’t need to use classes just to get started. You can get plenty of work done with simpler constructs such as functions, or even simple top-level script code. Because using classes well requires some up-front planning, they tend to be of more interest to people who work in strategic mode (doing long-term product development) than to people who work in tactical mode (where time is in very short supply).

### Classes and Objects
Think of a **Class** as a structure that defines what an Object of that Class should look like. An **Object** is a specific **instance** of that Class (a *thing* that complies with the specifications the Class made).

<font color='#9f2561'>
ex: Class is a cookie cutter and cookies are objects. Since there can be many *types* of cookie cutters, we can think of them as being objects too. </font>

Let's define a very simple class:

In [1]:
class MyClass:
    """A simple example class""" # Docstring describing the Class
    i = [12345]
    def f(self): # What does this self parameter mean?
        return 'hello world'

What can we do with it?
Class objects support two kinds of operations: 
1. Attribute references
2. Instantiation

In [2]:
print('Class, will you give me your attribute i?', MyClass.i)

Class, will you give me your attribute i? [12345]


In [3]:
first_object = MyClass()
print('type of object', type(first_object))

type of object <class '__main__.MyClass'>


### Be careful with unexpected results in class variables

In [None]:
second_object = MyClass()
second_object.i.append('Second object changed the value of i')

print('Class, will you give me your attribute i?', MyClass.i)
print('First object, will you give me your attribute i?', first_object.i)
print('Second object, will you give me your attribute i?', second_object.i)

If your intention was for each object to have its own value, then this should be an instance variable (an attribute belonging to a single object and not to the whole class object).
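
The difference can be sketched with two hypothetical classes (`SharedList` and `OwnList` are illustrative names, not part of the lab code). The first keeps the list on the class, the second creates a new list for every object inside `__init__`:

```python
class SharedList:
    items = []                    # class variable: one list shared by all instances


class OwnList:
    def __init__(self):
        self.items = []           # instance variable: a new list per object


a, b = SharedList(), SharedList()
a.items.append('added through a')
print(b.items)                    # ['added through a'], b sees the change

c, d = OwnList(), OwnList()
c.items.append('added through c')
print(d.items)                    # [], d is unaffected
```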

The way we build MyClass objects so far is also not very expressive. Let's create a constructor for MyClass.

In [4]:
class MyClass:
    """A simple example class"""
    i = 12345
    
    def __init__(self, eenie, meenie, miney, mo):
        self.eenie = eenie
        self.meenie = meenie
        self.miney = miney
        self.mo = mo
    
    def f(self):
        return 'hello world'

In [5]:
my_rhyme = MyClass('eenie', 'meenie', 'minie', 'mo')
print('Get me mo attribute:', my_rhyme.mo)

Get me mo attribute: mo


### In Python attributes are public by default

If you are familiar with Java, you might recognize the modifiers `public`, `private` and `protected`. We will not go into the [details](https://docs.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html), but let's just say that every *name* that a Class declares may or may not be publicly accessible to *the client*. In Python **attributes are public by default**: the default behavior is to let *the client* see and modify them. If this is not what you, *the designer*, would like, then you have to explicitly **hide** them from *the client*.

Is there an issue with that?

Imagine you want to restrict the values of a certain attribute.
Let's say we want each attribute in MyClass to have at most as many letters as the attribute name has.

In [None]:
# How can I avoid this? Can I avoid it without breaking backward compatibility?
my_rhyme.mo = 'looooooooonger'

In [6]:
class MyClass:
    """A simple example class"""
    i = 12345
    
    def __init__(self, eenie, meenie, miney, mo):
        self.eenie = eenie
        self.meenie = meenie
        self.miney = miney
        self.mo = mo
    
    def f(self):
        return 'hello world'
    
    @property
    def mo(self):
        return self.__mo
    
    @mo.setter
    def mo(self, mo):
        if len(mo) > len('mo'):
            raise AttributeError
        else:
            self.__mo = mo

In [7]:
my_rhyme = MyClass('eenie', 'meenie', 'minie', 'mo')
my_rhyme.mo = 'looooooooonger'

AttributeError: 
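
The key point is that the property keeps the `obj.mo` syntax the client already uses, so validation is added without changing the interface. Below is a reduced sketch with a hypothetical class `Rhyme`. It deliberately differs from the cell above in two ways, both of which are choices made for this illustration only: it stores the value in a single-underscore attribute and raises `ValueError` with a message.

```python
class Rhyme:
    """Hypothetical class with one validated attribute."""

    def __init__(self, mo):
        self.mo = mo              # goes through the setter below

    @property
    def mo(self):
        return self._mo

    @mo.setter
    def mo(self, value):
        if len(value) > len('mo'):
            raise ValueError(f"'mo' accepts at most {len('mo')} characters")
        self._mo = value


rhyme = Rhyme('mo')
rhyme.mo = 'me'                   # same attribute syntax clients already use
print(rhyme.mo)                   # me

try:
    rhyme.mo = 'looooooooonger'
except ValueError as error:
    print('Rejected:', error)
print(rhyme.mo)                   # me, the invalid assignment was refused
```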

### Inheritance and Composition
**Inheritance** is used when you want to build a class of objects that is VERY SIMILAR to another class of objects, but has some particular behavior of its own. Remember the example we gave last class about the class **animal** and the class **dog**. If a dog behaves pretty much like an animal, but also **barks**, then inheritance is a good fit.

**Composition** is used to build an object as a collection of components that work together. In the same example, if we had **four-legged** animals and **domestic** animals, we could also think of a **dog** as the composition of these parts, each describing some animal behavior.

Design patterns suggest that we should favor **Composition** over **Inheritance**.
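
The two approaches can be sketched side by side. All class and method names below (`Animal`, `Dog`, `FourLegged`, `Domestic`, `ComposedDog`) are illustrative assumptions based on the animal and dog example, not code from the previous class:

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f'{self.name} is an animal'


# Inheritance: a Dog IS an Animal with one extra behavior.
class Dog(Animal):
    def bark(self):
        return 'Woof!'


# Composition: a dog HAS components that each provide some behavior.
class FourLegged:
    def walk(self):
        return 'walks on four legs'


class Domestic:
    def home(self):
        return 'lives with people'


class ComposedDog:
    def __init__(self, name):
        self.name = name
        self.legs = FourLegged()
        self.habitat = Domestic()

    def describe(self):
        return f'{self.name} {self.legs.walk()} and {self.habitat.home()}'


print(Dog('Rex').describe(), Dog('Rex').bark())
print(ComposedDog('Rex').describe())
```

With composition, swapping `FourLegged` for another component changes the behavior without touching a class hierarchy, which is one reason it is often preferred.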

### Polymorphism and Exceptions
According to Leonardo Giordani in [this](http://blog.thedigitalcatonline.com/blog/2014/08/21/python-3-oop-part-4-polymorphism/#.WBtlLOErInU) blog post:
> EAFP is a Python acronym that stands for easier to ask for forgiveness than permission. This coding style is highly pushed in the Python community because it completely relies on the [duck typing concept](https://en.wikipedia.org/wiki/Duck_typing), thus fitting well with the language philosophy.
> The concept behind EAFP is fairly easy: instead of checking if an object has a given attribute or method before actually accessing or using it, just trust the object to provide what you need and manage the error case. This can be probably better understood by looking at some code.

Example:

In [1]:
class Duck:
    def quack(self):
        print("Quaaaaaack!")
    def feathers(self):
        print("The duck has white and gray feathers.")

class Person:
    def quack(self):
        print("The person imitates a duck.")
    def feathers(self):
        print("The person takes a feather from the ground and shows it.")
    def name(self):
        print("John Smith")

def in_the_forest(duck):
    duck.quack()
    duck.feathers()

def game():
    donald = Duck()
    john = Person()
    in_the_forest(donald)
    in_the_forest(john)

game()

Quaaaaaack!
The duck has white and gray feathers.
The person imitates a duck.
The person takes a feather from the ground and shows it.
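
To contrast EAFP with the alternative of checking first (often called "look before you leap"), here is a small sketch. The classes `Whistle` and `Rock` and both helper functions are hypothetical names introduced only for this comparison:

```python
class Whistle:
    def quack(self):
        return 'Quack-like whistle!'


class Rock:
    pass                          # has no quack method at all


def quack_checking_first(thing):
    # Look before you leap: ask permission by checking for the method
    if hasattr(thing, 'quack'):
        return thing.quack()
    return 'cannot quack'


def quack_eafp(thing):
    # EAFP: trust the object, and handle the error if it cannot comply
    try:
        return thing.quack()
    except AttributeError:
        return 'cannot quack'


for thing in (Whistle(), Rock()):
    print(quack_checking_first(thing), '|', quack_eafp(thing))
```

Both functions return the same results here. The EAFP version simply never inspects the object up front, which is what makes it fit duck typing.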


# NOW, let's get to work!
### <font color='blue'>Exercise 1 – Some Module [in development] with a Weird Class</font>
A friend asked you to take a look at his module to help him with some development of his super weird Class.

In [None]:
import examples.some_module
from examples.some_module import WeirdInt

Explain what the **WeirdInt** is and what it does in a particular weird way.

**Your answer here:**

Since you are developing this module, it is probable that you changed something *on the fly*...

Modify the **WeirdInt** so that the **`__add__`** method returns the average of its input parameters.

In [None]:
import examples.some_module as sm

In [None]:
x = sm.WeirdInt(2)
y = sm.WeirdInt(10)
assert x + y == 6

What happened? How can you fix this **without restarting your Jupyter notebook's kernel**?

In [None]:
# Your answer here:

# Try instantiating some other example numbers (say 6 and 4) and check the assertion accordingly


### <font color='blue'>Exercise 2 – Exception: A Person is not a duck!</font>
Modify the class *Person* above so that the person no longer has the appropriate *Duck* qualities.
What happens in this case when you call the in_the_forest function on a Person?

**Your answer:**

Do you think you should always check the type of an input parameter in a Function? Why?

In [None]:
class Person:
    pass # Define the class Person


# Run game() and handle exceptions if necessary

### <font color='blue'>Exercise 3 – Design and build a simple library for performing Data Science experiments</font>
Please do this exercise in groups of two students.
You can start by designing your library on paper and then go about coding it in Python.

The library should be compliant with these specifications:
- Multiple **Data Sources** to build your final **Dataset** (start with 2, but it should be extensible to more)
    - The result after data is loaded should be the same, no matter what the data source was.
    - After data is loaded, the library should be able to perform a dataset split (the percentages of training, validation and testing are specified).
- Multiple **Models** to test
    - Parameters for each model might be different.
    - Results should be given the same way for all models (they should be consistent).
- Run **Configuration**
    - It is composed of general parameters that specify how to run the algorithms (independent of the model) and the model configuration itself. For example, the number of training iterations, the learning rate and the loss function are independent of the model, while the number of leaf nodes in a tree-based model depends on the model. Both are part of the configuration you need to specify.
    - It has to be possible to keep track of the model configurations that have been tried.
- **Results**
    - You should be able to keep track of results for the training and validation sets for every iteration of the algorithm.
    - Once the execution is finished, you should save those results (in a file, for instance).
    
You can build this as a single **Package** with multiple **Modules**.
    
Interaction with the library:
1. Each run should execute a single model according to a chosen configuration. The configuration should be shared between the data loading and the model execution itself (check out the Singleton Design Pattern as a way to accomplish this).
2. Once the results are obtained for a model you should save them (there has to be a one-to-one mapping between configuration and results)
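
As a starting hint for the shared configuration, one common way to sketch a Singleton in Python is to override `__new__` so that every instantiation returns the same object. The class name `RunConfiguration` and its `settings` dictionary are assumptions made for this illustration; your design may look quite different:

```python
class RunConfiguration:
    """Hypothetical configuration holder; only one instance ever exists."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            # First call: create the single instance and its settings
            cls._instance = super().__new__(cls)
            cls._instance.settings = {}
        return cls._instance


loader_view = RunConfiguration()          # e.g. used by the data loading code
loader_view.settings['learning_rate'] = 0.01

model_view = RunConfiguration()           # e.g. used by the model code
print(model_view is loader_view)          # True
print(model_view.settings)                # {'learning_rate': 0.01}
```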

#### **Notice that you don't need to implement any ML algorithm, you can output reasonable *dummy* results at each point. What is important is that you achieve all requirements.**

# References
- Lutz, *Learning Python*.
- [Python Course](http://www.python-course.eu/python3_properties.php)
- [Polymorphism](http://blog.thedigitalcatonline.com/blog/2014/08/21/python-3-oop-part-4-polymorphism/#.WBtlLOErInU)