<img src="images/inmas.png" width=130x align=right />

# Notebook 11 - Elements of Software Engineering

Material covered in this notebook:

- Understanding Coding Standards
- The need for documentation
- Source Code Management - Version Control and git
- Working from the command line of a terminal
- Testing your code - the importance of unit testing and regression testing

### Prerequisites
Notebook 10


In this module you will learn the basics of software engineering and best practices typically used in the industry 

While a Jupyter notebook is a great tool for interactive scripting, writing tutorials, as well as coding prototypes, larger projects are typically characterized by Python modules shared amongst team members

To make it easier for multiple contributors to work on the same project, it is often necessary to adopt common coding best practices

### Coding Standards
As Python is a very flexible language, there are often multiple ways to achieve the same goal. For example, a simple loop can be coded by using an index

```python
for i in range(len(x)):
```
or using an iterator,
```python
for item in x:
```
or even with a forever loop such as:
```python
while 1:
```
with an appropriate `break` condition
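
As a minimal, runnable sketch, the three styles can be compared on a small list (the list `x` and its contents are made up for the illustration). All three loops collect the same items:

```python
x = ['a', 'b', 'c']   # hypothetical data for the illustration

# 1. Loop over an index
by_index = []
for i in range(len(x)):
    by_index.append(x[i])

# 2. Loop over the items directly (iterator)
by_item = []
for item in x:
    by_item.append(item)

# 3. "Forever" loop with an explicit break condition
by_while = []
i = 0
while True:
    if i >= len(x):
        break
    by_while.append(x[i])
    i += 1

print(by_index == by_item == by_while)
```

When both the index and the item are needed, the built-in `enumerate()` provides them together.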

### Iterators or not?
- It is recommended to use iterators whenever possible
- In a few situations, however, iterators can lead to problems
- What are these cases?


Hint: Avoid modifying containers while iterating. For example:

In [None]:
mylist  = list(range(10))                         
for x in mylist:                         
    print(x, ', ', end='')             
    print(mylist.pop(0), ', ', end='')                  
    print(mylist.pop(0))
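
Two common ways to avoid this problem are sketched below, assuming the goal is to drop some items from a list (the variable names and the 'remove the even numbers' rule are only an illustration): iterate over a copy of the list, or build a new list.

```python
mylist = list(range(10))

# Option 1: iterate over a copy (mylist[:]) while modifying the original
for x in mylist[:]:
    if x % 2 == 0:
        mylist.remove(x)

# Option 2: build a new list instead of modifying the old one
original = list(range(10))
odd_only = [x for x in original if x % 2 != 0]

print(mylist)
print(odd_only)
```

The second option leaves the original container untouched, which is often easier to reason about.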

### Inquire about your team's coding standards
The purpose of this introduction is not to propose a given coding standard, but rather to make you aware that coding standards do exist and that you should (1) think about adopting a given style in your own Python scripts and (2) inquire about the existence of such a standard in your future assignments

A good place to start is to read the de facto __[guide to Python scripting](https://peps.python.org/pep-0008/)__ (known as PEP 8) from the author of Python himself (PEP stands for Python Enhancement Proposal)

If you join a new team, ask for the agreed-upon coding standard, if any

If none, propose one!

### The `black` formatter

Coding standards are intrinsically boring but are a necessary tool in team environments 

- Fortunately, __[automated tools](https://github.com/life4/awesome-python-code-formatters)__ exist for reformatting the code in order to comply with parts of the standard involving where to put spaces (or not)

- We demonstrate using `black` starting from some 'bad' code

Run the cell on the next slide to load the file contents in the cell

In [None]:
%load Code_10/badstyle.py

### Running `black` on your code is easy

Let's run the `black` automated source code formatter on the file and reload:

In [None]:
! black Code_10/badstyle.py

#### Done. Now let's compare!

In [None]:
%load Code_10/badstyle.py

### Strings style and line length can be personal
- As strings can be delimited by either ' ' or " ", some users develop a preference
- PEP 8 does not recommend one over the other for ordinary strings and only asks for consistency; for triple-quoted strings, it recommends double quotes
- By default, `black` converts strings to the double-quote style
    - this can be turned off with the `black -S` option
- Line length can also be configured with the `-l n` option (lines of at most n characters; the default is 88)

`black` can also be used on notebooks

There is no reason for you not to use it!


### You won't remember your code in 6 months!
Another unpopular topic is documentation

Prototype code is generally poorly documented, as the code is in a constant state of change

Python offers multiple ways by which code can be documented as it gets written, specifically named arguments and docstrings for functions and modules

Make it a habit to use descriptive names for variables and named arguments

The main customer of your code is you: make sure you will be able to understand your own code 6 months after not seeing it

In the following cell, we define a function with a docstring:

In [None]:
def double(x):
    '''Return twice the number provided in the argument'''
    return 2*x

The text gets stored in the `__doc__` attribute of the function

In [None]:
double.__doc__

Multiple lines can also be used such as in:

In [None]:
def add_binary(a, b):
    '''
    Return a string representing the sum of two decimal numbers in binary digits.

            Parameters:
                    a (int): A decimal integer
                    b (int): Another decimal integer

            Returns:
                    Binary string of the sum of a and b
    '''
    binary_sum = bin(a+b)[2:]
    return binary_sum

### Coding interface is now easier to use
Now any user can call the help() function and know what the function does:

In [None]:
add_binary(1024, 1)

In [None]:
help(add_binary)    # Shows the function signature followed by add_binary.__doc__
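
Descriptive names and named arguments document the call site as well. The sketch below uses a hypothetical function `scale_vector()` (not part of the workshop files); the bare `*` in its signature makes `factor` and `offset` keyword-only, so that every call has to spell out what each number means:

```python
def scale_vector(vector, *, factor=1.0, offset=0.0):
    '''
    Return a new list with each component multiplied by factor, then shifted by offset.

            Parameters:
                    vector (list of float): The components to transform
                    factor (float): Multiplicative factor applied to each component
                    offset (float): Value added to each component after scaling
    '''
    return [factor * component + offset for component in vector]

# The names at the call site explain what each number means
result = scale_vector([1.0, 2.0, 3.0], factor=2.0, offset=1.0)
print(result)
```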

### Source Code Management
Jupyter Notebook allows for checkpoints and the ability to recall previous versions

In team environments, however, `git` is often the preferred tool for incorporating contributions from various team members

The basic functionality of `git` is relatively simple to master and might even be useful to you when you start writing your thesis in LaTeX.

A good place to start is to read a __[short tutorial](https://git-scm.com/doc)__ on git and install it on your computer

This __[web site](https://git-scm.com/)__ is the main resource for the `git` source code management tool

There are also multiple websites (e.g., __[github](https://github.com/)__, __[gitlab](https://about.gitlab.com/)__, ...) that can host your project for free

Industries often use these cloud services or an internal server for that purpose

### Modules, Parameter Files, and Command Line Basics
Software reusability can significantly reduce development costs

The main mechanism by which code can be re-used in Python is through modules

In this section we will go beyond the Jupyter notebook and implement a few modules from the command line

To start, open a terminal window using the Anaconda PowerShell Prompt on Windows or the equivalent on macOS or Linux. Let's first see if Python is in your path by running:
```
python --version
```
Note that on Windows, a shell started from the Anaconda Navigator will have the PATH variable properly configured 

### Command line basics
Good! Next, let's change to the directory where you stored the files from the Workshop and where you will write your own modules

If you have never used a command line, the following commands will get you 90% of the work done:
- **pwd**: Print working directory
- **cd** *dirpath*: Change directory to *dirpath* (../ to climb one level)
- **mkdir** *newdir*: Make a new directory called *newdir*
- **ls**: List the files contained in the current directory
- **mv**: Rename a file or move it to a directory
- **cp**: Copy a file

### Running Python from the command line
A Python script is run as:
> python main.py

<br>or, if you want to pass arguments to the script

> python main.py arg1 arg2 ...

Most scripts require (re-)configurable parameters to run

This is different from using a single-file Jupyter notebook, where parameters tend to be hardcoded in the notebook

### Isolate and pass parameters to make algorithms more flexible
For re-usability of the code, it is beneficial to isolate the parameters that will make the algorithm more universally re-usable

There are multiple ways to pass configuration values to a Python script, generally:
1. Through a parameter file
2. Through command line arguments
3. Through a Graphical User Interface

Here, we will mainly address 1 and 2 and leave the GUI topic for another time

### A first draft
The following simple code reads a parameter file and creates and assigns variables as described in the file:

In [None]:
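def readParameters(filename):
    '''Read run-time parameters from a text file and return them as a dict'''
    parameters = {}
    with open(filename, 'r') as file:    # the file is closed automatically
        for line in file:
            variable, value = [word.strip() for word in line.split('=')]
            variable = variable.replace(' ', '_')    # replace() returns a new string
            pythoncode = variable + '=' + value
            exec(pythoncode, {}, parameters)    # the new variable is stored in the dict
    return parameters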

Notice how this function generates new code as it reads the parameter file

This is one of the benefits of Python being a dynamically-typed, interpreted language
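
Because `exec()` runs whatever text it is given, a parameter file from an untrusted source could execute arbitrary code. One possible alternative, shown here only as a sketch, stores the values in a dictionary and converts them with `ast.literal_eval()`, which accepts Python literals (numbers, strings, lists, ...) but not arbitrary expressions. The function name `parseParameterLines()` and the sample lines are assumptions made for the illustration; this is not the code of the `params` module.

```python
import ast

def parseParameterLines(lines):
    '''Return a dict of parameters from an iterable of "name = value" lines'''
    parameters = {}
    for line in lines:
        name, value = [word.strip() for word in line.split('=', 1)]
        name = name.replace(' ', '_')
        # literal_eval only accepts literals, it never calls functions
        parameters[name] = ast.literal_eval(value)
    return parameters

# Hypothetical content of a parameter file
sample = ["n steps = 100", "tolerance = 1.e-6", "label = 'run A'"]
print(parseParameterLines(sample))
```

The price to pay is that values such as `2*pi` are no longer accepted, which may or may not be acceptable for a given project.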

### Writing robust code
The code on the next page contains known and new elements:
- Documenting the program
- Using `try` and `except` for handling potential errors
- Modularizing most of the algorithm for clarity and reusability
- Ensuring that the main program is not loaded as a module

Modularization is an important method which reduces the size of your code

Always remember that a line of code is more of a liability than an asset: less is better

Every cut and paste should raise a red flag in your mind: it is a missed opportunity for a function or a module

### Putting it all together
- We will demonstrate running a script from the command line

- For that purpose, we provide you with a file called *main_1.py* located in the *Code_11* directory

- This file is shown on the next slide

#### Use mouse scrolling to see all the code.

```python
#!/usr/bin/env python3
'''
A prototypical main file demonstrating how to read parameters from a file
Martin-D. Lacasse, JHU 2022
'''

import sys
import params

# Print Usage
def printHelp(name):
    print("Usage: ", name, "filename.par")
    sys.exit(1)

def run():
    try:
        filename = sys.argv[1]
    except IndexError:
        # No file name was given on the command line
        printHelp(sys.argv[0])

    myDico = params.readParameters(filename)
    params.printParameters(myDico)

#####################################################################
# This is the main program
if __name__ == '__main__':
    run()
else:
    print("Error: Can't import main script as a module.", repr(__name__))      
```

Notice how function definitions are separated from the main part of the program, and how the script prints an error message if it is imported as a module instead of being run directly

This practice will force you to modularize your code and design an architecture that can be more easily understood and maintained

From the file tab of Jupyter, navigate to the *Code_11* directory and open the `main_1.py` file in a separate tab. Alternatively, you can use your preferred code editor.

Also look at the module (file) `params.py` which is imported here

We can now run this main file from Jupyter using the *bang* (!) operator as in the following command:

In [None]:
!python Code_11/main_1.py Code_11/parameters.txt

### Running from the command line
Alternatively, you can run this Python script from a terminal by using
```shell
python main_1.py parameters.txt
```
from the Code_11 directory

Open a shell, change directory to Code_11, and run the command above

### Arguments passed on the command line
Another way to pass parameters is through command-line arguments. This is typically achieved using the *getopt* module in Python, which is modeled on the `getopt()` function of the Unix C library. The following file `main_2.py` shows a typical usage of the `getopt()` function.

The code is shown on the next slide. Use mouse scrolling to see it all.

#### Use mouse scrolling to see all the code.
```python
#!/usr/bin/env python3

# A prototypical main file with parameters from command line arguments
# Martin-D. Lacasse, JHU 2022

import sys
import getopt
import params

# Print Usage
def printHelp(name):
    print("Usage: ", name, "[-h] [-a a_param] [-b b_param] [-c c_param] [-s sourceCode] [-d desiredOutcome]")
    sys.exit(1)

# Parse options from command line
def processCommandLineArgs(argv):
    progName = argv[0]
    argList =  argv[1:]
    # Default values
    a, b, c, sourceCode, desiredOutcome = 0, 0, 0, 'none', 'failure'

    # Options
    options = "ha:b:c:s:d:"

    # Long options for parameters
    longOptions = ["help", "a=", "b=", "c=", "sourceCode=", "desiredOutcome="]

    try:
        # Parsing arguments
        opts, vals = getopt.getopt(argList, options, longOptions)

        # Checking each argument
        for opt, val in opts:
            if opt in ("-h", "--help"):
                printHelp(progName)
            elif opt in ("-a", "--a"):
                a = int(val)
            elif opt in ("-b", "--b"):
                b = int(val)
            elif opt in ("-c", "--c"):
                c = int(val)
            elif opt in ("-s", "--sourceCode"):
                sourceCode = val
            elif opt in ("-d", "--desiredOutcome"):
                desiredOutcome = val

    except getopt.error as err:
        print(str(err))
        printHelp(progName)  # prints the usage and exits, so nothing runs after it

    # print("Opt is", opt)
    # print("Val is", val)
    return a, b, c, sourceCode, desiredOutcome

def run():
    a, b, c, sourceCode, desiredOutcome = processCommandLineArgs(sys.argv)
    print('Arguments are:', 'a=', a, 'b=', b, 'c=', c, 'sourceCode=', sourceCode, 'desiredOutcome=', desiredOutcome)


#####################################################################
# This is the main program
if __name__ == "__main__":
    run()
else:
    print("Error: Can't import main script as a module.", repr(__name__))
```

### Passing arguments on the command line
You should have noticed in the code how the strings provided through the command line need to be converted to int.

This script will read parameters from the command line and override the default values defined when the variables are first initialized. 


In [None]:
!python Code_11/main_2.py -a 2 -b 3
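
The standard library also provides the `argparse` module, which builds the usage message and converts the types for you. The following sketch, which is not part of the workshop files, handles the same options as `main_2.py`; the function name `parseArgs()`, the description, and the help texts are assumptions made for the illustration.

```python
import argparse

def parseArgs(argList):
    '''Parse the same options as main_2.py from a list of strings'''
    parser = argparse.ArgumentParser(description='Command-line parameters demo')
    parser.add_argument('-a', '--a', type=int, default=0, help='value of a')
    parser.add_argument('-b', '--b', type=int, default=0, help='value of b')
    parser.add_argument('-c', '--c', type=int, default=0, help='value of c')
    parser.add_argument('-s', '--sourceCode', default='none', help='source code')
    parser.add_argument('-d', '--desiredOutcome', default='failure', help='desired outcome')
    return parser.parse_args(argList)

# In a script, one would pass sys.argv[1:]; a fixed list is used here
args = parseArgs(['-a', '2', '-b', '3'])
print(args.a, args.b, args.c, args.sourceCode, args.desiredOutcome)
```

Here the conversion to `int` is requested with `type=int`, and the `-h` option is generated automatically.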

### Unit and Regression Testing
Once we develop our own modules, it is important to define tests that verify the conditions of use of our new algorithms. In most team environments, contributors are asked to run all tests before pushing their changes to the common code repository. As an additional incentive, source code management tools (Subversion, git, etc.) have a 'blame' functionality which shows who last changed each line, and thus who is responsible for broken code.

Testing a function in isolation is called __unit testing__. Testing how the functions interact with one another is usually called integration testing, while re-running the existing tests after each change, to make sure that nothing that used to work is now broken, is called __regression testing__. Let's look at our newly created module to read parameters as an example.

A robust parameter reader should detect:
- duplicate entries
- missing or poorly formatted assignments
- lines starting with a '#' and treat them as comments
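
As a sketch of what such checks could look like, the function below parses lines of text rather than a file, so that it can be exercised without touching the disk. The name `parseLines()`, the choice of `ValueError`, and keeping the values as strings are assumptions made for the illustration; the actual `params` module may behave differently.

```python
def parseLines(lines):
    '''Return a dict of "name = value" pairs, with the values kept as strings'''
    parameters = {}
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue                      # skip blank lines and comments
        name, separator, value = line.partition('=')
        name, value = name.strip(), value.strip()
        if not separator or not name or not value:
            raise ValueError(f'line {number}: expected "name = value"')
        if name in parameters:
            raise ValueError(f'line {number}: duplicate entry "{name}"')
        parameters[name] = value
    return parameters

# Hypothetical file content
sample = ['# time step', 'dt = 0.01', '', 'n = 100']
print(parseLines(sample))

try:
    parseLines(['n = 1', 'n = 2'])
except ValueError as err:
    print('Rejected:', err)
```

Raising an exception with the line number makes it easier for the user to find the faulty entry in the parameter file.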

### Unit testing in Python
We will now introduce the common approach to write unit tests in Python

These tests are themselves functions whose names start with the `test_` prefix; a test passes when it runs without raising an error. They typically live in a separate file located in the same directory as your module

In the interest of time, we will use simple numerical examples to illustrate how to proceed

Before we start, however, we need to discuss (yes, again) an important point about float comparison 

Let's start with a case where it works:

Say that we have the following function which normalizes a 3D vector that we represent as a tuple (NumPy already provides tools for this, e.g., `numpy.linalg.norm()`; the function is only used here as a short representative example)

In [None]:
def normalize(v):
    from math import sqrt
    norm = sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2])
    if norm == 0.:
        v = (0., 0., 0.)
    else:
        v = (v[0]/norm, v[1]/norm, v[2]/norm)
    return v

Writing this function raises a few questions:
1. Should the vector be normalized in-place or should a new tuple be created? (Libraries typically implement both approaches and use the past participle to distinguish between the two, e.g., 'normalize()' vs. 'normalized()'.)
2. How should a zero vector be detected to avoid a division by zero?

To answer the second question, we will need a comparison involving floating numbers

We can get away with a simple comparison here, but in most other cases, we must use something more sophisticated

For that purpose, the concept of machine epsilon is necessary: this is the difference between 1 and the next larger representable floating-point number, so that adding a sufficiently smaller number to unity returns unity because of round-off. Running the following code will give an estimate of the value of machine epsilon on your system:

In [None]:
eps = 1.0
while 1 + eps > 1:
    eps /= 2
eps *= 2

print('Your machine epsilon is about: ', eps)
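
The estimate can be compared with the value that Python itself reports in `sys.float_info.epsilon`. The sketch below repeats the loop above inside a function (the name `estimateEpsilon()` is an assumption made for the illustration) so that the two values can be printed side by side:

```python
import sys

def estimateEpsilon():
    '''Return the smallest power of 2 which, added to 1, gives a result greater than 1'''
    eps = 1.0
    while 1 + eps > 1:
        eps /= 2
    return 2 * eps

print('Estimated:', estimateEpsilon())
print('Reported :', sys.float_info.epsilon)
```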

Because the numerator in the `normalize()` function decreases at a similar rate as the denominator, we only have a 'division by zero' problem when the vector is truly 0. Let's check our claim by trying to make our function fail:

In [None]:
normalize([eps, 0, 0])

In [None]:
normalize( (1.e-30, 1.e-21, 1.e-32) )

In [None]:
normalize((0., 0., 0.))

Great! A simple comparison with 0 seems to work.

Let's now consider this simple equality (please run):

In [None]:
a = 0.1
b = 0.2
c = 0.3
a + b == c

Surprised? This comparison involves floating-point numbers, whose representation is only accurate to within a relative error of about epsilon. A better way to test is as follows:

In [None]:
abs((a + b) - c) < abs(c)*eps

Notice how we scale epsilon by `abs(c)` on the right-hand side: epsilon is defined with respect to 1, so the tolerance must be made relative to the size of the numbers being compared. We are now ready to write our first unit test.
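
The standard library also provides `math.isclose()` for this kind of relative comparison. As a short sketch, with its default relative tolerance (`rel_tol=1e-09`, which is much looser than machine epsilon):

```python
import math

a = 0.1
b = 0.2
c = 0.3

print(a + b == c)               # exact comparison
print(math.isclose(a + b, c))   # relative comparison, rel_tol=1e-09 by default

# Comparisons with zero need an absolute tolerance, since any relative
# tolerance of zero is zero
print(math.isclose(1.e-12, 0.0))
print(math.isclose(1.e-12, 0.0, abs_tol=1.e-9))
```

The appropriate tolerance depends on how round-off errors accumulate in the computation being tested, so it is usually chosen case by case.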

### Writing test functions
In the interest of time, we will write a test to verify existing functions. Say that you just wrote a faster trigonometric library. Let's write a test that will verify our 'new' cos() function. This test needs some help. We will fix it in the exercise.

In [None]:
import math
def mycos(x):
    '''Placeholder to define my own cosine function'''
    return math.cos(x)

def test_mycos1():
    '''Test mycos() with respect to inverse function over float range'''
    import random
    i = 0
    while (i < 10000):
        i += 1
        x = 2*random.random() - 1.
        assert (mycos(math.acos(x)) == x)
        
test_mycos1()

Notice the use of `assert` which is instrumental in error checking, even in day-to-day code

Test functions typically reside in a module called, say, `test_mytrig` if your library is `mytrig`. Once written, these tests can be automated and run each time before you commit new code, to make sure that your shiny new features have not broken anything in the existing software framework.

Automation can be implemented with various tools, including the `unittest` library, but these topics are beyond the scope of this introductory tutorial. These libraries also have the benefit of providing broader comparison functions such as `assertEqual()`, which detect the type of the objects being compared (string, list, tuple, dict, set, etc.) and behave accordingly.
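
To give an idea of what this looks like with `unittest`, the short sketch below checks a small function; the class name `TestDouble` and the choice of the function under test are assumptions made for the illustration. Running the suite through `unittest.TextTestRunner` instead of `unittest.main()` keeps the example usable inside a notebook cell.

```python
import unittest

def double(x):
    '''Return twice the number provided in the argument'''
    return 2*x

class TestDouble(unittest.TestCase):
    def test_integer(self):
        self.assertEqual(double(21), 42)

    def test_float(self):
        # assertAlmostEqual rounds the difference to 7 decimal places by default
        self.assertAlmostEqual(double(0.1), 0.2)

    def test_list(self):
        # For lists, assertEqual reports which elements differ when it fails
        self.assertEqual(double([1, 2]), [1, 2, 1, 2])

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDouble)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print('All tests passed:', result.wasSuccessful())
```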

### Test-driven development
Testing is an important part of software development. It is generally thought of as a way to discover and prevent bugs.

Another interesting use of testing is design through testing: the tests are written first, and the code is then written to make them pass. This differs from the typical waterfall method, which consists of designing software to meet some pre-defined capabilities and then implementing it.

__[Test-driven development](https://en.wikipedia.org/wiki/Test-driven_development)__ can be a great way to accelerate your project!

### Key Points

- Coding standards do exist and you should inquire about best practices
- Teams use source code management tools such as `git` that can also be beneficial to your PhD work
- Code is a liability more than an asset. Never cut and paste - use functions.
- Modules are a great way to achieve re-usability. A long self-contained notebook is more appropriate for completing an assignment. Multiple modular files are better suited for devising, testing, and managing a scalable project.
- Drop the mouse for a sec! A little knowledge of command line functions can get you a long way. 
- Software development in teams is often governed by established software design approaches and human engineering methods.
- Test your software. These tests can also help you design your code.
- Make a habit of documenting your code: You'll be the first one to thank you.

### Further reading
- [Coding styles](https://docs.python-guide.org/writing/style/#idioms)
- [Test-driven development](https://en.wikipedia.org/wiki/Test-driven_development)