# Software engineering for data scientists in Python

Modularity:
- divide code into shorter, functional units
- improve readability
- improve maintainability

### Conventions and PEP 8

`pycodestyle` reports PEP 8 violations.

I was basically already applying these rules:
- spaces around operators and after commas
- blank lines between distinct ideas
- imports go at the top of the file
- comments start with '# '
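
A minimal sketch showing the conventions above in one place. The function `circle_area` and its parameters are made up for illustration.

```python
# Imports go at the top of the file
import math


def circle_area(radius, scale=1.0):
    """Return the area of a circle, multiplied by an optional scale factor."""
    # Comments start with '# '; operators are surrounded by spaces
    area = math.pi * radius ** 2

    # A blank line separates this step from the previous idea
    return area * scale


# A space follows each comma in the argument list
print(circle_area(2, scale=0.5))
```

Running `pycodestyle` on a file like this should report nothing, assuming the default settings.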

### Writing a package

The package name should be lowercase; use underscores only if necessary.
- put a file named `__init__.py` in the `package_name` directory
- call `import package_name` from the CWD, where `script.py` lives

### Adding functionality

Add a `utils.py` file containing the functionality to the `package_name` folder.
- Then import it, either from `package_name/__init__.py` (relative import) or from the script in the CWD

In [None]:
# Python 3 or above: relative import, placed in package_name/__init__.py
from .utils import func

# Alternative: absolute import from the script in the CWD
import package_name.utils

# Use the function
package_name.utils.func()

# Example of a subpackage
from sklearn.preprocessing import OneHotEncoder
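
A self-contained sketch of the layout described above, built in a temporary directory so it can be run anywhere. The package name `demo_pkg_sketch` and the body of `func` are placeholders, not part of the original notes.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Temporary directory standing in for the CWD that holds script.py
work_dir = Path(tempfile.mkdtemp())
package_dir = work_dir / "demo_pkg_sketch"  # hypothetical package name
package_dir.mkdir()

# utils.py holds the functionality
(package_dir / "utils.py").write_text(
    "def func():\n    return 'hello from utils'\n"
)

# __init__.py re-exports func with a relative import
(package_dir / "__init__.py").write_text("from .utils import func\n")

# Make the directory importable, as if the script were run from it
sys.path.insert(0, str(work_dir))
importlib.invalidate_caches()

import demo_pkg_sketch
import demo_pkg_sketch.utils

print(demo_pkg_sketch.func())        # reachable through __init__.py
print(demo_pkg_sketch.utils.func())  # reachable through the submodule
```

Both calls reach the same function; the relative import in `__init__.py` only shortens the path the caller has to type.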

### Making your package portable

We need two files in the work_dir:
- `requirements.txt`, in the following form

In [None]:
# Needed packages/versions
matplotlib
numpy==1.15.4
pycodestyle>=2.4.0

`pip install -r requirements.txt` installs all the required packages.

- setup.py

In [None]:
from setuptools import setup

setup(name='my_package',
      version='0.0.1',
      description='An example package',
      author='Damian',
      author_email='damien.fournier@oobien.com',
      packages=['my_package'],
      install_requires=['matplotlib', 'numpy==1.15.4', 'pycodestyle>=2.4.0'])

## Adding classes to a package

In [None]:
# Same syntax as for importing functions

from .documents import Document   # Document is a class defined in /wd/my_package/documents.py

### Inheritance 

In [None]:
# Import ParentClass object
from .parent_class import ParentClass

# Create a child class with inheritance
class ChildClass(ParentClass):
    def __init__(self):
        # Call parent's __init__ method
        ParentClass.__init__(self)
        # Add attribute unique to child class
        self.child_attribute = "I'm a child class attribute!"

# Create a ChildClass instance
child_class = ChildClass()

### Multilevel inheritance and super

In [3]:
class Parent:
    def __init__(self):
        print("I'm a parent!")

class Child(Parent):
    def __init__(self):
        Parent.__init__(self)
        print("I'm a child!")

class SuperChild(Parent):
    def __init__(self):
        super().__init__()
        print("I'm a super child!")

class Grandchild(SuperChild):
    def __init__(self):
        super().__init__()
        print("I'm a grandchild!")

grandchild = Grandchild()

I'm a parent!
I'm a super child!
I'm a grandchild!


In [4]:
child = Child()

I'm a parent!
I'm a child!
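
A small sketch of why the output above comes out in that order: `super()` follows the method resolution order (MRO) of the class. The classes are rewritten here to record calls in a list instead of printing; the `calls` attribute is an addition for illustration.

```python
class Parent:
    def __init__(self):
        # Record the call instead of printing it
        self.calls = ["Parent"]


class SuperChild(Parent):
    def __init__(self):
        super().__init__()  # resolves to Parent.__init__
        self.calls.append("SuperChild")


class Grandchild(SuperChild):
    def __init__(self):
        super().__init__()  # resolves to SuperChild.__init__
        self.calls.append("Grandchild")


# The MRO lists the classes super() walks through, nearest first
mro_names = [cls.__name__ for cls in Grandchild.__mro__]
print(mro_names)

# Each __init__ runs after its parent's, so the parent is recorded first
grandchild_calls = Grandchild().calls
print(grandchild_calls)
```

With `super()`, a class does not have to name its parent explicitly, so renaming or swapping the parent only requires changing the class statement.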


## Documentation

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## Unit testing

In [2]:
# Example using doctest

def square(x):
    """Square the number x
    
    :param x: number to square
    :return: x squared
    
    >>> square(3)
    9
    """
    return x ** x  # deliberate bug (should be x ** 2), caught by doctest below

In [4]:
import doctest

doctest.testmod()

**********************************************************************
File "__main__", line 9, in __main__.square
Failed example:
    square(3)
Expected:
    9
Got:
    27
**********************************************************************
1 items had failures:
   1 of   1 in __main__.square
***Test Failed*** 1 failures.


TestResults(failed=1, attempted=1)
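
The failure above comes from `x ** x` (3 ** 3 = 27) being used where `x ** 2` was intended. A sketch of the corrected function follows. It runs the doctest through `DocTestFinder` and `DocTestRunner` directly rather than `doctest.testmod()`, so that only this one function is checked; that choice is mine, not something the notes prescribe.

```python
import doctest


def square(x):
    """Square the number x

    :param x: number to square
    :return: x squared

    >>> square(3)
    9
    """
    return x ** 2


# Collect the examples found in square's docstring and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(square, name="square", globs={"square": square}):
    runner.run(test)

# summarize() returns a TestResults(failed, attempted) named tuple
results = runner.summarize(verbose=False)
print(results)
```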

If you write full docstrings with examples, doctest is a simple way to minimally test your functions.

In [None]:
# doctest is good for smaller tests; larger test suites go in a tests/ directory
# work_dir
#       - setup.py
#       - requirements.txt
#       - text_analyzer
#           - __init__.py
#           - document.py
#       - tests
#           - test_unit.py
#           - subpackage_tests
#               - test_x.py
#           - subpackage2_tests
#               - test_y.py

In [None]:
# Test function names must start with "test"
# working in workdir/tests/test_document.py
from collections import Counter

from text_analyzer import Document

# Test tokens attribute on Document object
def test_document_tokens():
    doc = Document('a e i o u')

    assert doc.tokens == ['a', 'e', 'i', 'o', 'u']

# Test edge case of blank document
def test_document_empty():
    doc = Document('')

    assert doc.tokens == []
    assert doc.word_counts == Counter()

To run tests  
`~/work_dir $ pytest`
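
To try the tests without the real `text_analyzer` package, here is a sketch using a stand-in `Document` class. The stand-in is hypothetical: it tokenizes by splitting on whitespace, which may differ from what the real class does.

```python
from collections import Counter


class Document:
    """Hypothetical stand-in for text_analyzer.Document."""

    def __init__(self, text):
        self.text = text
        # Assumption: tokens are whitespace-separated words
        self.tokens = text.split()
        self.word_counts = Counter(self.tokens)


# Test tokens attribute on Document object
def test_document_tokens():
    doc = Document('a e i o u')

    assert doc.tokens == ['a', 'e', 'i', 'o', 'u']


# Test edge case of blank document
def test_document_empty():
    doc = Document('')

    assert doc.tokens == []
    assert doc.word_counts == Counter()


# pytest would discover these by name; calling them directly also works
test_document_tokens()
test_document_empty()
print("all tests passed")
```

Saved as a file whose name starts with `test_`, the same functions would be collected when `pytest` is run from the work_dir.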

## Additional tools

- [Sphinx](https://www.sphinx-doc.org/en/master/) is a Python documentation generator. It builds documentation from docstrings.
- [Travis CI](https://travis-ci.org/) provides continuous integration. It runs the tests automatically when code is added and can also run them on a schedule.
- [Codecov](https://about.codecov.io/) shows where your project's tests can be improved.
- [Code Climate](https://codeclimate.com/) analyzes your code for improvements in readability (for example, it points out code that is not modular).

### Using Sphinx

In [None]:
from text_analyzer import Document

class SocialMedia(Document):
    """Analyze text data from social media
    
    :param text: social media text to analyze

    :ivar hashtag_counts: Counter object containing counts of hashtags used in text
    :ivar mention_counts: Counter object containing counts of @mentions used in text
    """
    def __init__(self, text):
        Document.__init__(self, text)
        self.hashtag_counts = self._count_hashtags()
        self.mention_counts = self._count_mentions()
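
The class above relies on `_count_hashtags` and `_count_mentions`, which the notes do not show. A runnable sketch under stated assumptions: `Document` is replaced by a minimal stand-in that only stores the text, and both counting methods are hypothetical regex-based implementations.

```python
import re
from collections import Counter


class Document:
    """Hypothetical stand-in for text_analyzer.Document."""

    def __init__(self, text):
        self.text = text


class SocialMedia(Document):
    """Analyze text data from social media

    :param text: social media text to analyze

    :ivar hashtag_counts: Counter object containing counts of hashtags used in text
    :ivar mention_counts: Counter object containing counts of @mentions used in text
    """
    def __init__(self, text):
        Document.__init__(self, text)
        self.hashtag_counts = self._count_hashtags()
        self.mention_counts = self._count_mentions()

    def _count_hashtags(self):
        # Assumption: a hashtag is '#' followed by word characters
        return Counter(re.findall(r"#\w+", self.text))

    def _count_mentions(self):
        # Assumption: a mention is '@' followed by word characters
        return Counter(re.findall(r"@\w+", self.text))


post = SocialMedia("#python is fun @alice #python")
print(post.hashtag_counts)
print(post.mention_counts)
```

The `:param` and `:ivar` fields in the docstring are the parts Sphinx picks up when it generates the class documentation.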