# Programming style

- Style applies to software!
- The goal is clarity above all else, though conciseness is also desirable.
- Very analogous to writing prose.
- Expect to iteratively edit/refine/improve as you would text.
- There are style guides (e.g., https://www.python.org/dev/peps/pep-0008/), but they tend to focus on formatting (how many spaces to indent, etc.) rather than on clarity, which is harder to prescribe.

## Documentation

- Document as you write your code.
    - There will never be a time in your life when you say, "I have nothing else to do, I think I'll go back and document some of my old work."
    - This ties directly into Jupyter's philosophy of documenting your research.
- Helpful hint: document first and foremost for yourself.  Don't think of documentation as a burden/duty to help others; think of it for its benefit to you.
- Expect to refine both code and documentation to achieve greater simplicity and concision.

## Naming

Use good variable names, yes, but here's a better, more nuanced principle:

> **The larger a variable's scope, the more descriptive a name it should have.**

Example:

```python
my_list = [7, 19, -3]
accum = 0
for v in my_list:
    accum += v
```

If `v` is used only in this loop, a single-letter variable name is fine.  Calling it `value` or `the_current_value` will not really improve clarity.  But what if `v` were a global variable?

```python
v = 37 # ??
```

The difference here is really one of scope.  A variable used only within a small section of code (a loop, a small block of code, a short function) gains meaning from the context it sits in, particularly if the code is following a well-known idiom.  Of course, if the meaning is not clear from the context, then a better/longer name is warranted regardless of the scope.
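
To make the contrast concrete, here is a small sketch; the names `SECONDS_PER_DAY` and `days_to_seconds` are hypothetical, made up for illustration. The module-level constant is visible to all code in the file, so it earns a descriptive name, while the comprehension variable `d` lives for a single line and can stay short.

```python
# Module-level (global) name: visible to all code in this file,
# so it gets a descriptive name.
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_seconds(day_counts):
    """Return a list with each day count converted to seconds."""
    # `d` exists only inside this one-line comprehension; a single letter is fine.
    return [d * SECONDS_PER_DAY for d in day_counts]


print(days_to_seconds([1, 2]))  # [86400, 172800]
```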

Functions have large scope, since they can be called from anywhere, so they should have descriptive names.  Consider what happens when they don't:

```python
def xa(xb, xc, xd):
    xe = str(xb) + "/" + str(xc) + "/" + str(xd)
    print(xe)
```

A meaningful name really helps:

```python
def print_date(xb, xc, xd):
    xe = str(xb) + "/" + str(xc) + "/" + str(xd)
    print(xe)
```

But that's not enough.  The arguments (the function's "signature") are integral to understanding the function.  Further, the signature, and documentation generally, describes not just what the code *does* but what *it's supposed to do*.  Without understanding the function's *intention*, how can we gauge its correctness?

```python
def print_date(year, month, day):
    joined = str(year) + "/" + str(month) + "/" + str(day)
    print(joined)
```

The scope of `joined` is small, so a single-letter name would not really diminish clarity here.

```python
def print_date(year, month, day):
    s = str(year) + "/" + str(month) + "/" + str(day)
    print(s)
```

In an editing pass, we might notice that `s` serves little purpose.

```python
def print_date(year, month, day):
    print(str(year) + "/" + str(month) + "/" + str(day))
```

## A few more style suggestions

There are many programming best practices, but here are three to start with.

> **Gather `import` statements at the top.**

Analogous to listing ingredients at the top of a recipe.
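
A minimal sketch of what that looks like in practice. The modules are standard-library ones chosen only for illustration, and `days_between` and `circle_area` are hypothetical helpers; the point is that a reader sees every dependency before reading any logic.

```python
# Ingredients first: every module this file depends on, in one place.
import datetime
import math


def days_between(start, end):
    """Return the number of whole days from date `start` to date `end`."""
    return (end - start).days


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2


print(days_between(datetime.date(2017, 1, 1), datetime.date(2017, 3, 1)))  # 59
```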

> **A function should do one thing well.**

If you're having trouble documenting a function (and you *are* documenting it, right?), that's a sign that it might need to be refactored.
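
For example, one possible refactoring of `print_date` separates building the date string from printing it. Each function can now be documented in one sentence, and `format_date` (a name introduced here for illustration) can be reused wherever a date string is needed without printing anything.

```python
def format_date(year, month, day):
    """Return the date as a Y/M/D string."""
    return str(year) + "/" + str(month) + "/" + str(day)


def print_date(year, month, day):
    """Print the date in Y/M/D format."""
    print(format_date(year, month, day))


print_date(2017, 3, 14)  # prints 2017/3/14
```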

> **Don't repeat yourself.**

Replace repeated constants with a variable.  Consolidate repeated code into a function.  Then, if something needs to change in the future, it has to be changed in only one place.
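
A small sketch of both steps, with made-up areas: the conversion factor (exactly 2.54 cm per inch) is given one name, and the repeated formula moves into a function. The name `square_inches_to_cm2` is hypothetical.

```python
# Before: the same conversion factor and the same formula appear twice.
area_a_cm2 = 3.0 * 2.54 * 2.54
area_b_cm2 = 5.0 * 2.54 * 2.54

# After: the constant has one name and the formula lives in one function.
CM_PER_INCH = 2.54


def square_inches_to_cm2(area_in2):
    """Convert an area from square inches to square centimeters."""
    return area_in2 * CM_PER_INCH ** 2


area_a_cm2 = square_inches_to_cm2(3.0)
area_b_cm2 = square_inches_to_cm2(5.0)
```

If the conversion ever needed to change (say, to a different unit), only `CM_PER_INCH` or the body of `square_inches_to_cm2` would need editing.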

## Docstrings

"Docstrings" tie documentation into Python's help system. Particularly useful when writing code others will use.

```python
def print_date(year, month, day):
    "Prints the date in Y/M/D format."
    print(str(year) + "/" + str(month) + "/" + str(day))
```

```python
help(print_date)
```

```
Help on function print_date in module __main__:

print_date(year, month, day)
    Prints the date in Y/M/D format.
```



Use triple quotes for multiline strings.

```python
def print_date(year, month, day):
    """Prints the date
    in Y/M/D format."""
    print(str(year) + "/" + str(month) + "/" + str(day))
```
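
For longer docstrings, a common convention (from PEP 257) is a one-line summary, a blank line, then the details; PEP 257 also suggests phrasing the summary as a command ("Print...") rather than a description ("Prints..."). The sketch below follows that layout. The docstring is stored on the function object as `__doc__`, which is what `help` displays.

```python
def print_date(year, month, day):
    """Print the date in Y/M/D format.

    Arguments are the year, month, and day, in that order.
    Each is converted with str(), so no zero-padding is added.
    """
    print(str(year) + "/" + str(month) + "/" + str(day))


# help() reads the same string, which is stored on the function object.
print(print_date.__doc__.splitlines()[0])  # Print the date in Y/M/D format.
```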

## Assertions

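- An assertion raises an exception (`AssertionError`) if a condition is not satisfied.
- Good for catching programming errors, not so much for user entry errors.  Assertions are skipped entirely when Python runs with the `-O` (optimize) flag, so they can't be relied on to validate input.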
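
For checking what a user types, the usual alternative is an explicit test that raises an exception such as `ValueError`, which the calling code can catch and report. A sketch, assuming a hypothetical `parse_volume` that receives the raw text a user entered:

```python
def parse_volume(text):
    """Convert user-entered text to a positive volume, or raise ValueError."""
    volume = float(text)  # float() itself raises ValueError on non-numeric text
    # Unlike `assert`, this explicit check cannot be switched off.
    if volume <= 0:
        raise ValueError("volume must be positive, got " + repr(text))
    return volume


try:
    parse_volume("-3")
except ValueError as err:
    print("Please re-enter:", err)
```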

```python
def calc_bulk_density(mass, volume):
    "Return dry bulk density = powder mass / powder volume."
    assert volume > 0
    return mass/volume
```

```python
calc_bulk_density(2.5, -3)
```

```
AssertionError
```

Every function is written with an intended usage in mind.  Assertions are particularly useful for checking that arguments match the intention, i.e., that arguments have the expected type and meet whatever other criteria are expected of them.  Assertions act as a sanity check, but also as documentation.  Without assertions, Python blithely charges on.

```python
print_date(2.7, False, [19, "what the?!"])
```

```
2.7/False/[19, 'what the?!']
```


```python
def print_date(year, month, day):
    "Prints the date in Y/M/D format.  Arguments should be integers."
    assert type(year) == type(month) == type(day) == int
    print(str(year) + "/" + str(month) + "/" + str(day))
```

```python
print_date(2.7, False, [19, "what the?!"])
```

```
AssertionError
```

You can add custom messages to assertions.

```python
def print_date(year, month, day):
    "Prints the date in Y/M/D format.  Arguments should be integers."
    assert type(year) == type(month) == type(day) == int, "bad argument type, expecting an int"
    assert year > 0, "bad year, out of range"
    assert 1 <= month <= 12, "bad month, out of range"
    print(str(year) + "/" + str(month) + "/" + str(day))
```

```python
print_date(2017, 0, 5)
```

```
AssertionError: bad month, out of range
```

## Exercise

The following code prints the sum of the squares of a list of numbers.  See if you can simplify it to make it clearer.  Focus not on variable names, but on the structure of the code.  Bonus points if you can rewrite the code using a one-line list comprehension.

```python
my_list = [3, 5, -19]
answer = None # don't know it yet
length_of_list = len(my_list)
list_of_squares = []
for index in range(0, length_of_list):
    value = my_list[index]
    square = value*value
    list_of_squares.append(square)
the_sum = 0
for square_value in list_of_squares:
    the_sum = the_sum + square_value
answer = the_sum
print(answer)
```

```
395
```


One suggested editing approach: start by simplifying the first loop.  There's no need to reference list elements by index:

```python
my_list = [3, 5, -19]
answer = None # don't know it yet
list_of_squares = []
for value in my_list:
    square = value*value
    list_of_squares.append(square)
the_sum = 0
for square_value in list_of_squares:
    the_sum = the_sum + square_value
answer = the_sum
print(answer)
```

```
395
```


We don't really need two loops, do we?

```python
my_list = [3, 5, -19]
answer = None # don't know it yet
the_sum = 0
for value in my_list:
    the_sum = the_sum + value*value
answer = the_sum
print(answer)
```

```
395
```


We can also get rid of the redundant `answer` variable.

```python
my_list = [3, 5, -19]
the_sum = 0
for value in my_list:
    the_sum = the_sum + value*value
print(the_sum)
```

```
395
```


Even better, as a list comprehension:

```python
my_list = [3, 5, -19]
print(sum([v*v for v in my_list]))
```

```
395
```
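
A closely related variant drops the square brackets. `sum` then consumes a generator expression, which produces the squares one at a time instead of first building a full list; the result is the same.

```python
my_list = [3, 5, -19]
# Without the square brackets, sum() consumes a generator expression,
# so no intermediate list of squares is built.
print(sum(v*v for v in my_list))  # 395
```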


Consider: if you weren't told in advance what the above code did, and had to discover for yourself, which version would you more easily and quickly understand?

The version below is perhaps even better, because it embodies another guideline:

> **Reuse existing code where possible.**

```python
import numpy
my_list = [3, 5, -19]
print(numpy.dot(my_list, my_list))
```

```
395
```
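
If adding a NumPy dependency isn't warranted, the standard library has reusable pieces too. A sketch using the built-in `map` with `operator.mul`, which multiplies the two sequences element by element before `sum` adds the products:

```python
import operator

my_list = [3, 5, -19]
# map() applies operator.mul pairwise to the two sequences; sum() adds the products.
print(sum(map(operator.mul, my_list, my_list)))  # 395
```

On Python 3.12 and later, `math.sumprod(my_list, my_list)` computes the same sum of products in a single call.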
