# Lecture 05: Workflow and debugging

[Download on GitHub](https://github.com/NumEconCopenhagen/lectures-2021)

[<img src="https://mybinder.org/badge_logo.svg">](https://mybinder.org/v2/gh/NumEconCopenhagen/lectures-2021/master?urlpath=lab/tree/05/Workflow_and_debugging.ipynb)

1. [Programming is more than writing code](#Programming-is-more-than-writing-code)
2. [Debugging](#Debugging)
3. [VSCode](#VSCode)
4. [Modules](#Modules)
5. [Git](#Git)
6. [Summary](#Summary)


You will learn how to **structure** and **comment** your code and **document** it for later use. You will learn how to **debug** your code using print, **assert** and try/except statements. You will learn how to write **modules** and **run scripts** from a terminal in **VSCode** and how to share your code with others through **Git**.

In [1]:
import math
import numpy as np

<a id="Programming-is-more-than-writing-code"></a>

# 1. Programming is more than writing code

You seldom write some code, run it, get the right results, and then never use it again.

 * Firstly: You make errors (bugs) when you code.
 * Secondly: You need to share your code with colleagues and your future self.

Transparent **macro- and microstructure** is important for preventing errors, for finding errors, and for making your code interpretable for others and your future self. **No code is self-explanatory**, even though it might seem so when you write it.

**Cleaning, commenting and documenting code takes time**, but is a crucial aspect of good programming.

In **scientific programming**, a transparent program structure and good documentation is also a cornerstone in securing **replicability**. 

## 1.1 Structure

**Macrostructure** (wrt. folders and files):

1. **One folder** for each project with ALL required files.
2. **End goal**: One file to run it all.
3. **Module files** (.py): Define functions, classes, etc. Perhaps different modules for different kinds of tasks (solving, simulating, plotting).
4. **Notebook files** (.ipynb): Call functions, classes etc. and explain and present the results.
5. **Larger projects:** Sub-folders for data, figures, etc. (*not relevant now*).

**Workflow:**

1. **Notebooks (.ipynb):** Work with them in JupyterLab.
2. **Modules (.py):** Work with them in VSCode.

**Microstructure:** Official [PEP8 guideline](https://www.python.org/dev/peps/pep-0008/).

**My recommendations:**

1. **Code layout:**
    * **Indentation:** Four spaces
    * **Line length:** Max of 79 characters (wrap line + indent properly)
    * **Strings:** Use single or double quote (be consistent)
    * **White space:**
        * After comma: ``x = [1, 2, 3]`` (not required)
        * Around assignment: ``x = y``
        * After colon: ``if x == 2: print(x)``
2. **Naming conventions:** Short, but also precise
    * **Modules:** Lower case with potential underscores (e.g. ``numecon`` or ``num_econ``)
    * **Classes:** Camel case (e.g. ``ConsumerClass``)
    * **Variables, functions and methods:** Lower case with potential underscores
3. **Ordered section comments:** Break your code into sections
    * Give each section a name and a place in the ordering
    * Level 1: a, b, c etc.
    * Level 2: i, ii, iii, iv etc.
    * Level 3: o, oo, ooo, oooo etc.
4. **Line comments:** Small additional hints
    * Again, short and precise
    * Avoid just explaining what the code does (must provide additional information)
5. **Docstrings:** Should be written for all functions, methods and classes (see how below).

**More on names:**

1. Normally avoid using ``l``, ``I``, ``O`` or any special characters.
2. Unused variables and non-public methods should start with a ``_``.
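
The naming conventions above can be sketched in a few lines. This is only an illustration: the class ``ConsumerClass`` and its methods are hypothetical and are not used elsewhere in the lecture.

```python
import math


class ConsumerClass:
    """Hypothetical consumer with log utility (illustration only)."""

    def __init__(self, income):
        self.income = income  # attribute: lower case

    def _is_affordable(self, c):
        """Non-public helper: the leading underscore signals internal use."""
        return 0 < c <= self.income

    def log_utility(self, c):
        """Return log utility of consumption c, or -inf if c is not feasible."""
        if not self._is_affordable(c):
            return -math.inf
        return math.log(c)


consumer = ConsumerClass(income=10.0)
for _ in range(2):  # the loop variable is unused, so it is named _
    print(consumer.log_utility(1.0))
```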

**Two different perspectives on comments:**

1. The comments explain to humans what the code does.
2. The code makes the computer do what the comments say.

**Note:** A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.

**Example:**

In [2]:
import math

# a. name for section
alpha = 1
beta = 2
x = [-3, -2, -1, 1, 2, 3]

# b. name for section
def my_function(x,alpha,beta):
    """ explain what the function does (docstring)
    
    Args:
    
        x (float): explanation
        alpha (float): explanation
        beta (float): explanation
        
    Returns:
    
        y (float): explanation
    
    """
    
    y = x**2 
    return y

# c. name for section
for i in range(len(x)):
    
    # i. name for sub-section
    y = my_function(x[i],alpha,beta)
    
    # ii. name for sub-section
    cond = y > 0 # non-positive not allowed due to log (line comment)
    
    # iii. name for sub-section
    if cond:
        print(math.log(y))

2.1972245773362196
1.3862943611198906
0.0
0.0
1.3862943611198906
2.1972245773362196


**Try:** Write ``my_function(`` and press <kbd>Shift</kbd>+<kbd>Tab</kbd> 

**Recommendation:** Try to think about which sections and sub-sections you need beforehand.

<a id="Debugging"></a>

# 2. Debugging

[Why is a programming error called a bug?](https://www.youtube.com/watch?v=rhFSG-VyR_E)

**General advice:**

1. Code is always partly a black box: Print and plot results to convince yourself (and others) that your results are sensible.
2. Errors are typically caused by something very simple, so look for that first.
3. If Python raises an error, first try to locate the line where the error occurs.
4. Your code can often run, but still give you unexpected behavior.
5. Include ``if``, ``print`` and ``assert`` statements to catch errors.

**Most of the time spent programming is debugging!!** Even when the final code is simple, it can take a lot of trial-and-error to get there.

**Assertions:** Whenever you know something about your variables (e.g. that they should be positive), you should assert this. If the assertion does not hold, Python raises an ``AssertionError``.

In [3]:
x = -2
y = x**2
assert y > 0, f'x = {x}, y = {y}'

**Task:** Make the above assertion fail.
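
Assertions are also useful inside functions. A minimal sketch, where ``safe_sqrt`` is a hypothetical helper not used elsewhere in this lecture, shows how the message after the comma ends up in the raised ``AssertionError``:

```python
def safe_sqrt(x):
    """Hypothetical helper: square root that insists on a non-negative input."""
    assert x >= 0, f'x = {x} is negative'
    return x**0.5


try:
    safe_sqrt(-4)
except AssertionError as e:
    message = str(e)  # the text after the comma in the assert statement
    print(f'assertion failed: {message}')
```

Assertions are skipped when Python runs with the ``-O`` flag, so they are best suited for catching programming mistakes during development.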

## 2.1 Example

Consider the following code:

In [4]:
a = 0.8
xlist = [-1,2,3]

def myfun(xlist,a):
    y = 0
    for x in xlist:
        z = x**a
        y += z
    return y

myfun(xlist,a)

(3.340308817497993+0.5877852522924732j)

**Problem:** Our result is a complex number. We did not expect that. Why does this problem arise?

**Find the error with print:**

In [5]:
def myfun(xlist,a):
    y = 0
    for x in xlist:
        z = x**a
        print(f'x = {x} -> {z}') # temp
        y += z
    return y

myfun(xlist,a)

x = -1 -> (-0.8090169943749473+0.5877852522924732j)
x = 2 -> 1.7411011265922482
x = 3 -> 2.4082246852806923


(3.340308817497993+0.5877852522924732j)
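
The print output points to ``x = -1``: in Python, raising a negative number to a non-integer power with ``**`` returns a complex number rather than an error. A small sketch of this behavior, using only built-in Python and the ``math`` module:

```python
import math

print((-1)**2)    # integer exponent: the result stays real

z = (-1)**0.8     # fractional exponent: the result is complex
print(type(z))

# math.pow only works with real numbers and raises an error instead
try:
    math.pow(-1, 0.8)
except ValueError as e:
    print(f'math.pow raised ValueError: {e}')
```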

**Solution with an assert:**

In [6]:
def myfun(xlist,a):
    y = 0
    for x in xlist:
        z = x**a
        assert np.isreal(z), f'z is not real for x = {x}, but {z}'
        y += z
    return y

try:
    myfun(xlist,a)
except AssertionError:
    print('assertion failed')

assertion failed


**Solution with if and raise:**

In [7]:
def myfun(xlist,a):
    y = 0
    for x in xlist:
        z = x**a
        if not np.isreal(z):
            print(f'z is not real for x = {x}, but {z}')
            # a bare raise only works inside an except block, so raise an explicit exception
            raise ValueError('z is not real')
        y += z
    return y

try:
    myfun(xlist,a)
except ValueError:
    print('error found')

z is not real for x = -1, but (-0.8090169943749473+0.5877852522924732j)
error found


**Note:** You could also decide that the function should return e.g. \\( -\infty \\) when it encounters a complex number.

In [8]:
def myfun(xlist,a):
    y = 0
    for x in xlist:
        z = x**a
        if not np.isreal(z):
            return -np.inf
        y += z
    return y

myfun(xlist,a)

-inf

## 2.2 Numpy warnings

In [9]:
xlist = [-1,2,3]
def f(xlist):
    y = np.empty(len(xlist))
    for i,x in enumerate(xlist):
        y[i] = np.log(x)
    return y

f(xlist)

array([       nan, 0.69314718, 1.09861229])

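The call above makes NumPy emit a ``RuntimeWarning`` (invalid value encountered in log), because the log of a negative number is undefined. The code still runs and returns ``nan`` for that entry. You can **ignore all warnings**: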

In [10]:
def f(xlist):
    y = np.empty(len(xlist))
    for i,x in enumerate(xlist):
        with np.errstate(all='ignore'):
            y[i] = np.log(x)
    return y

f(xlist)

array([       nan, 0.69314718, 1.09861229])

**Better:** Decide what the code should do.

In [11]:
def f(xlist):
    y = np.empty(len(xlist))
    for i,x in enumerate(xlist):
        if x <= 0:
            y[i] = -np.inf
        else:
            y[i] = np.log(x)
    return y

f(xlist)

array([      -inf, 0.69314718, 1.09861229])
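
Another option is to turn the warnings into errors, so that the problem cannot go unnoticed. A minimal sketch using ``np.errstate(all='raise')``, where the name ``f_strict`` is only chosen for this illustration:

```python
import numpy as np


def f_strict(xlist):
    """Like f, but any floating-point warning becomes an exception."""
    y = np.empty(len(xlist))
    with np.errstate(all='raise'):
        for i, x in enumerate(xlist):
            y[i] = np.log(x)
    return y


try:
    f_strict([-1, 2, 3])
except FloatingPointError as e:
    print(f'error found: {e}')
```

Inside the ``with`` block NumPy raises a ``FloatingPointError`` instead of printing a warning, which can then be handled with try/except like any other error.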

## 2.3 Scope bugs

Global variables are dangerous:

In [12]:
# a. define a function to multiply a variable by 5
a = 5
def f(x):
    return a*x

# many lines of code
# many lines of code
# many lines of code

# z. setup the input and call f
y = np.array([3,3])
a = np.mean(y)
b = np.mean(f(y))
print(b)

9.0


**Question:** What is the error?

**Conclusion:** Never use global variables. Use a positional or a keyword argument instead.
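
In the cell above, the global ``a`` is rebound to ``np.mean(y) = 3.0`` before ``f`` is called, so ``f`` multiplies by 3 instead of 5. A minimal sketch of the recommended alternative, with ``a`` passed as a keyword argument with a default value:

```python
import numpy as np


def f(x, a=5):
    """Multiply x by a; a is an explicit keyword argument, not a global."""
    return a*x


y = np.array([3, 3])
a = np.mean(y)      # rebinding the global a no longer affects f
b = np.mean(f(y))
print(b)            # 15.0
```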

**Useful tool I:** The variable inspector. 

1. **Install:** Step 3.5 [here](https://numeconcopenhagen.netlify.com/guides/python-setup/)
2. **Open it:** Right-click and choose "Open Variable Inspector"

**Useful tool II:** The console. 

1. **Install:** Done automatically
2. **Open it:** Right-click and choose "New Console for Notebook"

## 2.4 Index bugs

In [13]:
# a. setup
N = 10
x = np.linspace(1.3,8.2,N)
y = 9.2

# b. count all entries in x below y
i = 0
try:
    while x[i] < y:
        i += 1
except IndexError:
    print('error found')

error found


**Task:** Solve the problem.
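
One possible approach, sketched under the assumption that the goal is simply the number of entries in ``x`` below ``y``: all entries are below ``y``, so the loop runs past the end of ``x`` unless the index bound is checked first.

```python
import numpy as np

# a. setup
N = 10
x = np.linspace(1.3, 8.2, N)
y = 9.2

# b. check the index bound before reading x[i]
i = 0
while i < N and x[i] < y:
    i += 1
print(i)

# c. vectorized alternative
count = np.sum(x < y)
print(count)
```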

<a id="VSCode"></a>

# 3. VSCode

**Central benefits of VSCode:**

1. **Good editor** (easy to move across and within files)
2. **Linting** (find errors before you run the code)
3. **Run scripts**
4. **Interactive sessions**
5. **Integrated git** (to share your code online) (see below)
6. **Debugging** (not today)

**Example:** We go through this [guide](https://numeconcopenhagen.netlify.com/guides/vscode-basics/) together.

<a id="Modules"></a>

# 4. Modules

Long notebooks can be very hard to read. Code is structured better in modules saved in .py files.

1. Open VSCode
2. Locate the folder with your notebook
3. Create mymodule.py
4. In the notebook: ``import mymodule``
5. All functions in mymodule.py are now available in the notebook with the prefix ``mymodule.``
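
As an illustration, a module file could look like the sketch below. The contents are hypothetical: the function ``square`` is an assumption for this example and is not the ``myfun`` from the module used in the lecture.

```python
# hypothetical contents of mymodule.py (illustration only)

def square(x):
    """Return x squared."""
    return x**2


if __name__ == '__main__':
    # runs only when the file is executed as a script from a terminal,
    # e.g. with `python mymodule.py`; it is skipped on `import mymodule`
    print(square(3))
```

After ``import mymodule`` in the notebook, this function would be called as ``mymodule.square(3)``.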

**Trick:** Use the ``%load_ext autoreload`` magic with ``%autoreload 2``. Then your modules are automatically reloaded each time you run a cell. Without it, a module is not reloaded after the first import, so later edits to the .py file are ignored.

In [14]:
%load_ext autoreload
%autoreload 2

In [15]:
import mymodule

In [16]:
try:
    mymodule.myfun(2)
except:
    print('error found')

error found


<a id="Git"></a>

# 5. Git

The purpose of git is to allow you to easily share your code with collaborators and track the changes each of you make.

We go through this [guide](https://numeconcopenhagen.netlify.com/guides/vscode-git/) together.

**Note:** You will later be given repositories named **github.com/NumEconCopenhagen/projects-2020-YOURGROUPNAME**

<a id="Summary"></a>

# 6. Summary

**This lecture:** We have discussed

1. Structuring and commenting on code
2. Debugging (try-except, assert, warnings)
3. Writing and running Python in VSCode
4. Git (version control)

**Inaugural project and group-sign up:** See the details under *Project 0: Inaugural project* [here](https://numeconcopenhagen.netlify.com/exercises/).

**Need help finding a group:** Write in [this thread](https://github.com/NumEconCopenhagen/lectures-2020/issues/3).

**Deadline for hand-in of inaugural project:** 16th of March.